The Perfect Way to Generate Web Content

August 97



Since about late last year, I have been involved in generating several kinds of web content. At the World Wide Widgets plant where I work, we have an intranet, which is like the Internet except that it is local to the plant. As I have described in a previous column, I am also involved in generating a CD-ROM using HTML content for our customers. And finally, I have produced a Personal Home Page on the Internet (www.intrlink.net/~demattia) that contains all my back (and unexpurgated) columns for ComputorLink, and one of these days will also include my long-awaited book-in-progress on where to go boating in the Inland Empire.

All of these documents have two things in common. First, the output format must be HTML, which stands for HyperText Markup Language, which is sort of a variation on SGML, which stands for Standard Generalized Markup Language. And second, all of my content was originally written with the WordPerfect word processor. For me, the intranet, CD-ROM, and Internet documents are rarely original; they are generated as secondary documents from some original produced for another purpose. As an example, I write these columns using WordPerfect for the magazine, but I store the unedited versions of the columns on my web page in HTML format.

SGML has been around for a number of years. Its original function was to provide a common method of transporting written documents between various flavors of word processors. That function is no longer needed, since all the serious word processors are smart enough to read documents from, or write documents in, the format of most other serious word processors.

If that were all SGML ever did, it would have served a useful, if now obsolete, purpose. However, to simplify their lives, the originators of SGML also narrowed the functionality of what word processors were allowed to do, at least in a transportable format. To get the documents back into some semblance of the original format in the new word processor, the SGML designers let the importing functions make certain assumptions, mostly that people who write with word processors are idiots and that they, the designers, are smarter than your average bear.

Consequently, they stripped out all font style characteristics and most font size characteristics, provided only one method of outlining, and indicated that they would decide where best to place your graphics within the content. It is rather like your College English professor taking a paper you have written, making some scribbling marks on it, and producing something ten times more interesting than what you handed in (yes, that did happen to me). Except in this case it is some stupid program that thinks it knows better how to format a document than I do, and I do not believe that such is the case.

So, HTML, in concert with your friendly local browser, does the same thing. When I first started surfing the web, I always wondered why certain sites would put up a sign indicating that their site was "best viewed with Netscape 3" or whatever. That is because the author of the content has very little control over how a document looks when displayed by a particular browser. What control he has comes a) from what a particular browser lets him do, and I can testify that different browsers will display the same content in drastically different ways, and b) from extensions to the HTML specification that some browsers have implemented. These proprietary extensions do give you some extra control over your document, but if you use them, chances are they will not work when viewed with some other browser.

Yes, HTML is standardized, with the current standard being Version 3.2. To see what the standard consists of (and it is amazing how small the standard, and the syntax, really is), you can go to www.w3.org, the site of the body that maintains all the WWW (hence the name w3.org) standards, including the current HTML definitions. You can download the 3.2 document for free, or you can go to your local bookstore and pay $45 for a book on HTML that in 500 pages will essentially tell you what the 20-page standard does. But unless you are really into got-no-life hacking, you really don't much care what the standard says, as long as you have a tool that does understand the standard and can hide it from you. Tim Berners-Lee, the father of the web, claims to be amazed that anybody would actually generate content by directly inserting HTML commands, rather than using a tool.

So anyway, that was the first surprise to me, that my perfectly coiffed WP documents would look like death warmed over after these snobby PhDs that processed my HTML content got done 'fixing' things for me. For instance, all your extra white space is stripped from your document. Any filler blanks, tabs, even carriage returns, are squeezed down to a single space. Graphics get stuffed into parts of your document that have no relationship to the picture. You only have a choice of six modes of font type (the heading levels, which set size and bolding), and no choice of font face (other than proportional or fixed).
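To see what the browser does to filler blanks, here is a small Python sketch. The function name collapse_whitespace is made up for illustration, and it only approximates what a browser does with ordinary text (text inside a preformatted block is treated differently): each run of blanks, tabs, and carriage returns comes out as a single space.

```python
def collapse_whitespace(text: str) -> str:
    """Approximate how a browser treats white space in ordinary HTML text.

    Every run of blanks, tabs, and line breaks is folded into one space,
    and leading and trailing white space is dropped.
    """
    # str.split() with no arguments splits on any run of white space
    return " ".join(text.split())


source = "Widget   price:\t\t$5\n\n   (each)"
print(collapse_whitespace(source))  # prints: Widget price: $5 (each)
```

So the carefully tabbed price column you built in the word processor arrives as one run-on line.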

Because almost all of my proposed content was originally generated in WordPerfect 6, I upgraded ($80) to Corel WordPerfect 7, which was the first major word processor to directly generate HTML output, even before the Evil Empire's word processor did it natively. WP7 has a new function which it calls the Internet Publisher, which can be selected from its file menu. When you run this function, basically all your hand-crafted formatting controls get tossed, except where there is an HTML (version 1+) equivalent. You also get a new set of buttons on the button bar that are HyperText specific. So, for instance, all your font styles get stripped, as do most of the font sizes. However, you can go back using a button in the HyperText menu and declare a line to be a specific Header size, the only font sizing available in standard HTML. Tables, interestingly enough, convert properly. Most graphics convert properly, including getting converted to GIF format, although for each document, the graphic is stored in a subdirectory under the document under a funny name. WP7 does not support forms, but WP8 is available now, and it likely does.
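For the curious, the Header sizes come down to six tags, h1 (the biggest) through h6 (the smallest). The Python sketch below shows the idea; the function name heading is my own invention, and I am not claiming this is how WP7 does it internally.

```python
from html import escape


def heading(level, text):
    """Wrap text in one of the six standard HTML heading tags, h1 to h6."""
    if not isinstance(level, int) or not 1 <= level <= 6:
        raise ValueError("HTML only defines heading levels 1 through 6")
    # escape() turns characters like & and < into their HTML-safe forms
    return "<h{0}>{1}</h{0}>".format(level, escape(text))


print(heading(1, "Where to Go Boating"))
print(heading(2, "Lakes & Rivers"))
```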

The fun has just begun. After you put back what formatting you are allowed to do within the constraints of the HTML spec, you have to add your HyperText jumps, something that of course would not have existed in your original printed document. WP7 has buttons to do this for you. Just highlight a section of text with your mouse, click on the proper button, and fill in the URL address of a jump. If you are pointing to a particular location inside a document, WP7 considers that location to be a "bookmark" and gives you a dialog box for naming the bookmark to jump to. At the "jumped to" location in the target document, you do in fact insert a WP7 bookmark.
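Behind the buttons, a jump and a bookmark are both anchor tags in the HTML file. Here is a hedged Python sketch of roughly what ends up in the file; the helper names jump and bookmark and the file name chapter2.htm are made up for illustration, and I have not checked that WP7 writes exactly these strings.

```python
from html import escape


def jump(url: str, text: str) -> str:
    """Return the HTML for a piece of text that jumps to url when clicked."""
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


def bookmark(name: str, text: str) -> str:
    """Return the HTML for a named target (a bookmark) that a jump can land on."""
    return '<a name="{}">{}</a>'.format(escape(name, quote=True), escape(text))


# A jump into the middle of another page is the page address,
# then a "#", then the bookmark name.
print(jump("chapter2.htm#assaying", "See the section on assaying"))
print(bookmark("assaying", "Assaying the Ore"))
```

The first line goes wherever the reader clicks; the second goes in chapter2.htm at the spot where the reader should land.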

So, if I have a large document, perhaps already broken into several chapters, I will first further subdivide it into smaller Sections of one or two pages each. This lets your browser import the content in a reasonable amount of time. Also, if the user chooses to print the document, the browser will only let them print a whole "page" (section) at a time, and you don't want to make it too big. Then I will generate a HyperText table of contents on the first page, pointing to all the chapters of the document. Within each chapter I will generate HyperText section jumps. And where there may already have been an index, I would use the bookmark and HyperText jumps to let the user click on an index topic, and thereby jump directly to that part of the proper Section that contains the relevant material.
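If you have many chapters, a HyperText table of contents is simple enough that a few lines of Python can write it. This is only a sketch under my own assumptions: the function name table_of_contents and the chapter file names are made up, and the list is a plain bulleted list of jumps.

```python
from html import escape


def table_of_contents(chapters):
    """Build an HTML bulleted list of jumps, one per (file name, title) pair."""
    lines = ["<ul>"]
    for filename, title in chapters:
        lines.append('  <li><a href="{}">{}</a></li>'.format(
            escape(filename, quote=True), escape(title)))
    lines.append("</ul>")
    return "\n".join(lines)


chapters = [
    ("chap1.htm", "Choosing Your Pig"),
    ("chap2.htm", "Assaying the Ore"),
]
print(table_of_contents(chapters))
```

Paste the result into the first page, and each chapter title becomes a jump to that chapter's file.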

And this is where you will spend most of your time. WP7 does a pretty good job of converting what you already have into Web format. But a web document is really a different document than just a converted printed page. The hyperjumps make all the difference between a web document and a printed document. Already, where I have converted this content, I much prefer the web version of the document over the printed version, because the jumps that are embedded in the web form make it much easier to find stuff in the document.

So, now you have converted your tome on "Using Truffle Sniffing Pigs to Find Gold Bearing Ore", and you want to put this on the web for all to see. First you need a home page. If you use a local Internet Service Provider, like one of the many that advertise in ComputorLink, they can help you to get started, and if this is a Personal page, it might even be free. My ISP provides such a service. Once he sets up your home page for you, which is essentially a directory on his system, he then provides you with some method to get your content from your workstation to your directory on his machine. This will be some variation of a File Transfer Protocol program. So all you need to do when you have added a new chapter to your tome is to connect to your ISP and send your new HTML pages over to your directory, and they are instantly available for anybody in the world to see.
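Your ISP will probably hand you a point-and-click File Transfer Protocol program, but the transfer itself can be sketched in Python with the standard ftplib module. Everything specific here is an assumption on my part: the function names upload_pages and publish, the *.htm file pattern, and whatever host, login, and directory your own ISP gives you.

```python
from ftplib import FTP
from pathlib import Path


def upload_pages(ftp, local_dir, pattern="*.htm"):
    """Send every file in local_dir matching pattern to the current remote directory.

    ftp is an already logged-in ftplib.FTP connection (or anything with a
    compatible storbinary method). Returns the list of file names sent.
    """
    sent = []
    for path in sorted(Path(local_dir).glob(pattern)):
        with path.open("rb") as handle:
            # STOR asks the server to store the file under the given name
            ftp.storbinary("STOR " + path.name, handle)
        sent.append(path.name)
    return sent


def publish(host, user, password, remote_dir, local_dir):
    """Connect, log in, change to the home page directory, and upload."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        return upload_pages(ftp, local_dir)
```

A call would look something like publish("ftp.example.net", "myname", "mypassword", "public_html", "tome"), where every one of those values is a placeholder for what your ISP tells you.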

The problem, of course, is to let the world know that you are there. And that is where Search Engines come in. This should probably take a whole article by itself, but suffice it to say that most of the search engine companies (Alta Vista being the main one) have some method where you can E-mail them, and they will add your page to their 'bot searches. However, it has been reported in the trade press, and I can so verify, that these search engines are months behind in doing their crawling, and stuff I put out there three months ago still has not been indexed. This is supposedly because of the enormous growth of the Internet, or maybe because the search engine companies are not willing to add to their 'bot capabilities.

Since WP7 was among the first such products out there, it does have what I would call bugs, although they would probably call them unreleased fixes. If you have used styles in your original document to change fonts and more importantly change from proportional to non proportional fonts, the converter does a rather bad job. It takes the command and applies it to the first line of the text, but does not apply it to the remaining selected lines. So if you have selected courier font for a paragraph, all that will get converted is the first line of the paragraph. It also does not do a real good job of positioning graphics in your document. I found that it takes a lot of fiddling around to get everything arranged properly.

Would I use WP7 to generate original web content directly? It can. It has a wizard that gets you started. But there are a lot of web generators out there that cost a fraction of what a full-fledged word processor does and that will do the job, considering the rather strict limitations that you have to live with when generating HTML content. And, of course, you can use a text editor and just insert HTML commands directly -- the content is all ASCII, after all, and the HTML commands are just funny strings contained by angle brackets <>. But if your content is going to be mostly recycled word processor stuff, I would just stick with the word processor that you already know and use it, if it supports an HTML export.
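To show how little there is to those funny strings, here is a tiny hand-typed page held in a Python string, along with a few lines that pick the commands back out using the standard html.parser module. The page content and the class name TagLister are made up for illustration.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect the name of every opening tag seen in a document."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# The kind of page you could type into any text editor
page = """<html>
<head><title>Truffle Pigs and Gold</title></head>
<body>
<h1>Using Truffle Sniffing Pigs to Find Gold Bearing Ore</h1>
<p>Start with a <b>hungry</b> pig.</p>
</body>
</html>"""

lister = TagLister()
lister.feed(page)
print(lister.tags)
```

Seven commands in all, and everything else in the file is plain text.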





Afterwords



With this article, ComputorLink entered a new area, Ventura County, California. This is the first time that we have published outside of the Washington area. There are other expansion plans for next year.

The original article was published as is in the Spokane edition, but a different editor made some significant editorial changes for the Ventura edition. Specifically, the Ventura edition deleted the address of this, my web page. Which is too bad, because the people in Spokane have had a chance, good or bad, to read these articles, and the Ventura people have not, but they have no way to know that they are here. Oh well. The editor is always right.