Since late last year, I have been involved in generating several kinds of web content.
At the World Wide Widgets plant I work at, we have an intranet, which is like the Internet except
that it is local to the plant. And, as I have described in a previous column, I am involved in
generating a CD-ROM of HTML content for our customers. And finally, I have also produced
a personal home page on the Internet (www.intrlink.net/~demattia) that contains all my back
(and unexpurgated) columns for ComputorLink, and that one of these days will also include my
long-awaited book-in-progress on where to go boating in the Inland Empire.
All of these documents have two things in common. First, the output format must be HTML
(HyperText Markup Language), which is a sort of variation on SGML (Standard Generalized
Markup Language). Second, all of my content was originally written with the WordPerfect
word processor. For me, the intranet, CD-ROM, and Internet documents are rarely original;
they are generated as secondary documents from some other original document produced for
some other purpose. For example, I write these columns in WordPerfect for the magazine, but I
store the unedited versions of the columns on my web page in HTML format.
SGML has been around for a number of years. Its original function was to provide a common
method of transporting written documents between various flavors of word processors. That
function is no longer needed, since all the serious word processors are smart enough to read and
write the formats of most other serious word processors.
If that was all SGML ever did, it would have served a useful, if now obsolete, purpose. However,
to simplify their lives, the originators of SGML also narrowed what word processors were
allowed to do, at least in a transportable format. To get the documents back into some
semblance of their original format in the new word processor, the SGML designers let the
importing functions make certain assumptions, mostly that people who write with word
processors are idiots and that they, the designers, are smarter than your average bear.
Consequently, they stripped out all font style characteristics and most font size characteristics,
provided only one method of outlining, and decided that they knew best where to place your
graphics within the content. It is rather like your college English professor taking a paper you
have written, making some scribbling marks on it, and producing something ten times more
interesting than what you handed in (yes, that did happen to me). Except in this case it is some
stupid program that thinks it knows better than I do how to format a document, and I do not
believe that such is the case.
So, HTML, in concert with your friendly local browser, does the same thing. When I first started
surfing the web, I always wondered why certain sites would put up a sign indicating that their
site was "best viewed with Netscape 3" or whatever. That is because the author of the
content has very little control over how a document looks when displayed by a particular
browser. What control the author does have comes from (a) what a particular browser lets him
do, and I can testify that different browsers will display the same content in drastically
different ways, and (b) extensions to the HTML specification that some browsers have
implemented. These extensions do give you some extra control over your document, but if you
use them, chances are they will not work when viewed in some other browser.
Yes, HTML is standardized, with the current standard being Version 3.2. To see what the
standard consists of (and it is amazing how small the standard, and the syntax, really is), you
can go to www.w3.org, the site of the World Wide Web Consortium, the body that maintains all
the World Wide Web (hence "w3") standards, including the current HTML definitions. You can
download the 3.2 document for free, or you can go to your local bookstore and pay $45 for a book
on HTML that in 500 pages will essentially tell you what the 20-page standard does. But unless
you are really into got-no-life hacking, you don't much care what the standard says, as long as
you have a tool that does understand the standard and can hide it from you. Tim Berners-Lee,
the father of the web, claims to be amazed that anybody would actually generate content by
directly inserting HTML commands, rather than using a tool.
So anyway, that was the first surprise to me: my perfectly coiffed WP documents would look
like death warmed over after these snobby PhDs who processed my HTML content got done
'fixing' things for me. For instance, all extra white space is stripped from your document: runs
of filler blanks, tabs, even carriage returns are all collapsed into a single space. Graphics get
stuffed into parts of your document that have no relationship to the picture. You have a choice
of only six heading levels (which set size and bolding), and no choice of font face (other than
proportional or fixed).
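To make the white space complaint concrete, here is a rough Python model of what a browser does with ordinary text (anything outside a <PRE> block): every run of blanks, tabs, and line breaks becomes a single space. It is a simplification that ignores the rest of a browser's layout rules, and the function name is my own invention.

```python
import re

# HTML treats space, tab, carriage return, line feed, and form feed as
# white space. (Python's \s would also match non-breaking spaces, which
# browsers do NOT collapse, so the set is spelled out explicitly.)
_HTML_WHITESPACE_RUN = re.compile(r"[ \t\r\n\f]+")


def collapse_whitespace(text: str) -> str:
    """Roughly how a browser renders text outside <PRE>: each run of
    white space becomes one space. Trimming the ends is a simplification."""
    return _HTML_WHITESPACE_RUN.sub(" ", text).strip()


if __name__ == "__main__":
    typed = "Widgets\tper shift:     42\n\n    (day crew)"
    print(collapse_whitespace(typed))  # Widgets per shift: 42 (day crew)
```

That is why tab-aligned columns from a WordPerfect document fall apart in a browser; the <PRE> element and HTML tables are the usual ways around it.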
Because almost all of my proposed content was originally generated in WordPerfect 6, I
upgraded ($80) to Corel WordPerfect 7, which was the first major word processor to directly
generate HTML output, even before the Evil Empire's word processor did it natively. WP7 now
has a new function, called the Internet Publisher, that can be selected from its File menu.
When you run this function, basically all your hand-crafted formatting controls get tossed, except
where there is an HTML (version 1+) equivalent. You also get a new set of HyperText-specific
buttons on the button bar. So, for instance, all your font styles get stripped, as do most of the
font sizes. However, you can go back using a button in the HyperText menu and declare a line to
be a specific heading size, the only font sizing available in standard HTML. Tables, interestingly
enough, convert properly. Most graphics convert properly too, including getting converted to GIF
format, although for each document the graphics are stored in a subdirectory under the
document, with a funny name. WP7 does not support forms, but WP8 is available now, and it
likely does.
The fun has just begun. After you put back whatever formatting the HTML spec allows, you
have to add your HyperText jumps, something that of course would not have existed in your
original printed document. WP7 has buttons to do this for you. Just highlight a section of text
with your mouse, click on the proper button, and fill in the URL address of the jump. If you are
pointing to a specific location within a document, WP7 calls that location a "bookmark" and
gives you a dialog box for choosing it. At the "jumped to" location in the target document, you
do in fact insert a WP7 bookmark.
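In plain HTML, the jump and the bookmark are the two halves of the anchor tag: <A HREF="file.html#name"> for the jump, and <A NAME="name"> at the landing spot. The Python sketch below just builds both strings. The helper names and file names are hypothetical, and I have not checked that WP7's output looks exactly like this.

```python
from html import escape


def jump(label: str, target_file: str, bookmark: str = "") -> str:
    """A hypertext jump. With a bookmark name it lands on that spot inside
    target_file; with an empty target_file it stays in the current page."""
    href = f"{target_file}#{bookmark}" if bookmark else target_file
    return f'<A HREF="{escape(href)}">{escape(label)}</A>'


def bookmark(name: str, text: str) -> str:
    """The landing spot: a named anchor, HTML's version of a WP7 bookmark."""
    return f'<A NAME="{escape(name)}">{escape(text)}</A>'


if __name__ == "__main__":
    print(jump("Training your pig", "chap02.html", "training"))
    # <A HREF="chap02.html#training">Training your pig</A>
    print(bookmark("training", "Training your pig"))
    # <A NAME="training">Training your pig</A>
```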
So, if I have a large document, perhaps already broken into several chapters, I will first further
subdivide it into smaller Sections of one or two pages each. This lets the browser load the
content in a reasonable amount of time; also, if the user chooses to print the document, the
browser will only let them print a whole "page" (Section) at a time, and you don't want that to be
too big. Then I will generate a HyperText table of contents on the first page, with links to all the
chapters of the document. Within each chapter I will generate HyperText jumps to its Sections.
And where there may already have been an index, I use bookmarks and HyperText jumps to let
the user click on an index topic and jump directly to the part of the proper Section that contains
the relevant material.
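When each Section is its own file, the table of contents on the first page is just a list of jumps. Here is a minimal Python sketch, assuming one file per Section; the titles and the sectNN.html file names are made up for illustration. (HTML 3.2 lets you leave off the closing </LI> tag, so the sketch does.)

```python
from html import escape


def table_of_contents(sections: list[tuple[str, str]]) -> str:
    """Build a bulleted list of jumps, one per Section file.

    `sections` holds (title, filename) pairs in reading order."""
    lines = ["<H2>Contents</H2>", "<UL>"]
    for title, filename in sections:
        lines.append(f'<LI><A HREF="{escape(filename)}">{escape(title)}</A>')
    lines.append("</UL>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_of_contents([
        ("Choosing a Pig", "sect01.html"),
        ("Reading the Ore", "sect02.html"),
    ]))
```

The same loop, pointed at bookmark names instead of file names, would produce the per-chapter Section jumps or an index page.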
And this is where you will spend most of your time. WP7 does a pretty good job of converting
what you already have into Web format. But a web document is really a different document than
just a converted printed page. The hyperjumps make all the difference between a web document
and a printed document. Already, wherever I have converted content, I much prefer to use the
web version of the document over the printed version, because the jumps embedded in the web
form of the document make it much easier to find stuff.
So, now you have converted your tome on "Using Truffle Sniffing Pigs to Find Gold Bearing
Ore", and you want to put this on the web for all to see. First you need a home page. If you use
a local Internet Service Provider, like one of the many that advertise in ComputorLink, they can
help you get started, and if this is a personal page, it might even be free. My ISP provides
such a service. Once he sets up your home page for you, which is essentially a directory on his
system, he then provides you with some method to get your content from your workstation to
your directory on his machine. This will be some variation of a File Transfer Protocol program.
So all you need to do when you have added a new chapter to your tome is to connect to your ISP,
and send your new HTML pages over to your directory, and they are instantly available for
anybody in the world to see.
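If your ISP gives you plain FTP access, the "send your new pages over" step can be scripted. This is a sketch using Python's standard ftplib module; the host name, login, and remote directory are placeholders you would get from your ISP, and it only sends a flat list of files (WP7's graphics subdirectory would need the same treatment). Keep in mind that plain FTP sends your password unencrypted.

```python
from ftplib import FTP
from pathlib import Path


def upload_pages(ftp, pages):
    """Copy each local file into the current remote directory.

    `ftp` is a logged-in ftplib.FTP, or anything with a compatible
    storbinary() method. Returns the remote names that were sent."""
    sent = []
    for page in pages:
        path = Path(page)
        with path.open("rb") as fh:
            # STOR creates or overwrites the file on the server.
            ftp.storbinary(f"STOR {path.name}", fh)
        sent.append(path.name)
    return sent


def publish(host, user, password, remote_dir, pages):
    """Connect, log in, move to the home-page directory, and upload."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        return upload_pages(ftp, pages)
```

A call would look like publish("ftp.example.net", "your_login", "your_password", "public_html", ["index.html", "sect01.html"]), where every argument is a placeholder.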
The problem, of course, is to let the world know that you are there. And that is where search
engines come in. This topic probably deserves a whole article by itself, but suffice it to say that
most of the search engine companies (AltaVista being the main one) have some method where
you can E-mail them, and they will add your page to their 'bot searches. However, it has been
reported in the trade press, and I can confirm, that these search engines are months behind in
their crawling, and stuff I put out there three months ago still has not been indexed. This is
supposedly because of the enormous growth of the Internet, or maybe because the search engine
companies are not willing to add to their 'bot capabilities.
Since WP7 was among the first such products out there, it does have what I would call bugs,
although they would probably call them unreleased fixes. If you have used styles in your
original document to change fonts, and more importantly to change from proportional to
non-proportional fonts, the converter does a rather bad job. It takes the command and applies it
to the first line of the text, but does not apply it to the remaining selected lines. So if you have
selected Courier font for a paragraph, all that will get converted is the first line of the paragraph.
It also does not do a very good job of positioning graphics in your document. I found that it takes
a lot of fiddling around to get everything arranged properly.
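To show what the first-line-only bug amounts to, here is a small Python sketch contrasting it with converting the whole selection. I don't know exactly which tags WP7 emits; using <TT> (HTML's typewriter-font tag) for the buggy version is my assumption. For the correct version, <PRE> is one reasonable choice, because it keeps both the fixed-pitch font and the line breaks.

```python
from html import escape


def first_line_only(lines: list[str]) -> str:
    """Mimics the bug: only the first line gets the fixed-pitch font."""
    if not lines:
        return ""
    first, *rest = lines
    return "\n".join([f"<TT>{escape(first)}</TT>"] + [escape(s) for s in rest])


def whole_paragraph(lines: list[str]) -> str:
    """Converts the whole selection. <PRE> shows every line in a fixed-pitch
    font and keeps the line breaks a browser would otherwise collapse."""
    if not lines:
        return ""
    return "<PRE>\n" + "\n".join(escape(s) for s in lines) + "\n</PRE>"
```

Fed a two-line parts list, the first function leaves the second line in the proportional font, which is exactly the ragged result described above.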
Would I use WP7 to generate original web content directly? It certainly can, and it has a wizard
that gets you started. But, considering the rather strict limitations you have to live with when
generating HTML content, there are a lot of web page generators out there, costing a fraction of
what a full-fledged word processor does, that will do the job. And, of course, you can use a text
editor and just insert HTML commands directly; the content is all ASCII, after all, and the HTML
commands are just funny strings enclosed in angle brackets (< >). But if your content is going to
be mostly recycled word processor stuff, I would just stick with the word processor that you
already know, if it supports HTML export.
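For the text-editor route, this is roughly how little a complete page needs. The Python below only assembles the ASCII text you would otherwise type by hand. The DOCTYPE line is the one the W3C's HTML 3.2 specification uses; the function name, title, and paragraph are placeholders.

```python
from html import escape

DOCTYPE_32 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">'


def minimal_page(title: str, paragraphs: list[str]) -> str:
    """Return a small but complete HTML 3.2 document as plain text."""
    lines = [
        DOCTYPE_32,
        "<HTML>",
        "<HEAD>",
        f"<TITLE>{escape(title)}</TITLE>",
        "</HEAD>",
        "<BODY>",
        f"<H1>{escape(title)}</H1>",
    ]
    # <P> starts a paragraph; HTML 3.2 allows the closing </P> to be omitted.
    lines += [f"<P>{escape(p)}" for p in paragraphs]
    lines += ["</BODY>", "</HTML>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(minimal_page("Boating in the Inland Empire", ["Chapter 1 is coming soon."]))
```

Save that text to a file ending in .html and any browser will open it.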
Afterwords
With this article, ComputorLink expanded into a new area, Ventura County, California. This is the
first time that we have published outside of the Washington area. There are other expansion
plans for next year.
The original article was published as-is in the Spokane edition, but a different editor made some significant editorial changes for the Ventura edition. Specifically, the Ventura edition deleted the address of this web page of mine, which is too bad: the people in Spokane have had a chance, good or bad, to read these articles, but the Ventura people have not, and they have no way to know that the articles are here. Oh well. The editor is always right.