There are certain elements that need to be included in the HTML of every page in a Web site. The easiest way to do this is to base all new pages on a standardized page template. This is just a starter file that you open to begin a new page.
Common things that might be included in a page template:
Top of page
Document type declaration (DTD)
Character encoding
Fundamental page structure tags: HTML, HEAD, BODY
Page title
LINK tag or JavaScript call to load CSS style sheets
META tags for PICS content labeling, if used
"Breadcrumb" or other navigation menu
Top-level heading
Bottom of page
ADDRESS tags
Document last-modified date, manual text or JavaScript call
Copyright notice
Author's name
Author's email address or mailto link (see below)
Closing BODY and HTML tags
The page title and top-level heading are elements whose text must always be filled in by the page author. The page title becomes the user's default bookmark label.
The date last modified, at the bottom of each page, can be done manually as text. It's also possible to make those page dates automatic, using this simple external JavaScript file.
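As a rough illustration of the automatic approach, the sketch below writes the date inline using the browser's standard document.lastModified property. It is an assumption about how such a script could work, not the external JavaScript file mentioned above, and the date format shown will vary by browser.

```html
<P>Last modified:
<SCRIPT TYPE="text/javascript">
// Write the file date that the browser reports for this page
document.write(document.lastModified);
</SCRIPT>
</P>
```

Users with JavaScript disabled will see only the "Last modified:" label, which is one argument for typing the date manually.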
The document type declaration (DOCTYPE, loosely called the DTD after the document type definition it names) goes at the very top of each page's code, and identifies the type or version of HTML/XHTML being used. If there's no declaration, the browser goes into quirks mode, in which it's ready to try to cope with any sort of malformed HTML. Including one tells the browser what the rule book is supposed to be, and makes for faster page rendering. Most people coding in modern HTML should probably use this one:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
( rest of page )
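Putting the checklist items together, a starter template might look something like the sketch below. The character encoding value, the style sheet name site.css, the home page name index.html, and the placeholder text are all assumptions for illustration; substitute whatever your own site uses. The PICS META tags are left out, since they apply only if you use content labeling.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<!-- Character encoding; ISO-8859-1 is only an example value -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<!-- Page title: must be filled in for every page -->
<TITLE>Page title goes here</TITLE>
<!-- Style sheet; site.css is a hypothetical filename -->
<LINK REL="stylesheet" TYPE="text/css" HREF="site.css">
</HEAD>
<BODY>
<!-- Breadcrumb navigation; index.html is a hypothetical filename -->
<P><A HREF="index.html">Home</A> &gt; This page</P>
<!-- Top-level heading: must be filled in for every page -->
<H1>Top-level heading goes here</H1>

<!-- Page content goes here -->

<HR>
<ADDRESS>
Last modified: (date)<BR>
Copyright &copy; (year) Author's name<BR>
<A HREF="mailto:jdoe@isp.com?subject=Feedback">Feedback</A>
</ADDRESS>
</BODY>
</HTML>
```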
Once you have the page template filled in, you're simply going to start writing. Your user-friendly WYSIWYG authorship tool is going to take care of the gory details in the background, inserting body paragraphs, second-, third-, and possibly fourth-level subheadings, bulleted and numbered lists, definition lists, hyperlink references to other pages, horizontal rules, block quotes, and whatever else you end up using.
Meta keyword and description tags in the page header were big for a while in the late 1990s, but were heavily abused and spammed. They became pretty much ignored by modern search engines before I ever got around to researching them. I can't express how happy it makes me that I actually managed to ignore this until it went away. Here are a couple of articles on the subject:
Death of a Meta Tag
http://searchenginewatch.com/sereport/article.php/2165061
An End to Meta Tags
http://www.traffick.com/article.asp?aID=102
Website authors use various feedback email strategies, to allow readers to send them mail without becoming a spam target:
For more about avoiding spam, see my Spam defense page in the Net navigation section.
Do the names Java and JavaScript confuse you? You're not alone. My Java vs. JavaScript page explains, and has a couple widgets on it you can use to check if your PC and browser support both systems.
<A HREF="mailto:jdoe@isp.com?subject=Feedback">Feedback</A>
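One commonly used strategy, shown here only as a sketch and not necessarily one this page recommends, is to write the address in the plain mailto link as numeric character references. Browsers decode these back to jdoe@isp.com, while the simplest address-harvesting programs may miss them. Treat this as weak protection at best, since a harvester can decode the references as easily as a browser can.

```html
<!-- Same feedback link as the plain mailto example, with the address
     jdoe@isp.com written as numeric character references -->
<A HREF="mailto:&#106;&#100;&#111;&#101;&#64;&#105;&#115;&#112;&#46;&#99;&#111;&#109;?subject=Feedback">Feedback</A>
```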
We think of our modern PC displays as being big and crisp, and they are, compared to what we had before. But current displays still have a significantly lower resolution than laser-printed material on paper. Without really being aware of it, people read online material somewhat more slowly, and with measurably less patience.
In order to keep from losing your reader's attention, your online writing must be different from writing for print media. For starters, it needs to be condensed to about half the length you'd use for print.
You can also communicate better if you write online content in inverted pyramids, as newspaper people are taught: start with the conclusion or main idea, follow immediately with the main supporting points, then supply the background detail. Both in online and newspaper content, the idea is that an impatient reader can quit at any time and still have gotten the high points of the item.
People rarely read online content the way they'd read a novel, line by line straight through. Web users scan the page looking for a single piece of information, tucked away somewhere in your content; they want to find that information as quickly as they can, and then return to what they were doing before. So they try to find the target topic while reading as little non-target information as possible.
You can help them by making your content scan-able: one idea per paragraph, as tightly written as possible, with the main points visually highlighted, using boldface, italics, or hyperlinks. (Blue hyperlinks function visually as emphasis; use that to your advantage whenever possible.)
Horrible example:
Of course, it's also possible to OVER-USE EMPHASIS. Excessive use of different kinds of emphasis in the same paragraph, such as boldface, italics, ALL CAPS, entirely too many words used as link text in hyperlinks, AND MIXTURES THEREOF, can lead to an ONLINE VERSION of something called P.T. BARNUM SYNDROME !!!
Don't worry, I won't do that any more. Aren't you glad I only used one font?
Any hyperlink has two parts: the link text, the brief text phrase selected to be shown in blue and underline, which the user clicks to follow the hyperlink, and the link address, the filename or Internet address of the location to go to when the link is clicked.
External hyperlinks are links to other Web pages on the Internet, which are not part of your own Web site. Their link addresses must be full Internet URLs. (For discussion of URL Internet addresses, see the Parts of a URL section of my Basic browser tips page.)
Internal hyperlinks are links to locations within your own Web site. These are normally used in link lists, as part of your carefully-designed internal navigation scheme, and sometimes as cross-references within page text, such as the reference in the previous paragraph. An internal hyperlink can connect to a location on a different page in your site, or to a location on the same page.
There's no way to identify internal vs. external hyperlinks by looking at them; they're both presented as blue underlined text. You should always make it clear from context which is which. Most of your internal hyperlinks should be part of your site's navigation system, and therefore fairly obvious. Cross references within page text should include context wording indicating that they are internal links, as above. External hyperlinks in text should include enough information within and around them to make it clear not only that the link is external, but who sponsors the other site and what information the user will find there.
An internal hyperlink to a different page can simply use the filename of the page as its link address. (If you have HTML files for your site in different directories, you may also have to provide a directory path in front of the filename.) Some browsers will then display the top of the linked page. Some browsers will "remember" a last-viewed location for each page, and jump to that location if a link doesn't specify a particular location.
In order to link to a specified location on a page, the location must first be identified in the HTML code, by labeling a text phrase with anchor tags. These look similar to hyperlink tags, with a specified anchor NAME instead of an HREF hyperlink address, but are not visible in the browser; they only serve to identify target locations for internal hyperlinks. The full link address to a defined anchor location is the filename (with pathname if needed) plus a number sign and the anchor name text, such as <A HREF="browsing.html#URL">.
When you link to an anchor on the same page, normally as part of your site navigation design, the link address will just be the number sign and the anchor name. Whenever an internal link address has no filename, the browser will assume an anchor location within the current page.
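The anchor and link forms described above might look like this in practice. The sketch reuses the browsing.html#URL address from the example; placing the anchor inside an H2 heading with the text "Parts of a URL" is an assumption for illustration.

```html
<!-- In browsing.html: label the target location with a named anchor -->
<H2><A NAME="URL">Parts of a URL</A></H2>

<!-- On a different page: filename plus number sign plus anchor name -->
<A HREF="browsing.html#URL">Parts of a URL</A>

<!-- Within browsing.html itself: the filename can be left off -->
<A HREF="#URL">Parts of a URL</A>
```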
Internal hyperlinks can use either relative addressing or absolute addressing. Most sites will want to use relative addressing for all internal links. If all the HTML files of a site are in the same directory, a relative address will simply consist of the page's filename, with or without an anchor name. When page files are in multiple directories and pathnames are necessary, an absolute address will always begin with a slash, and is interpreted as a directory path starting from the root directory: the site's top-level directory on a Web server, or the root of the disk when you browse local files. A relative address with a pathname will begin with a directory name, a filename, or ../ (meaning "up one directory") rather than a slash, and is interpreted as a path starting from the directory of the current page.
Since the PCs where authors write Web pages will generally have different directory structures than the Web server where the pages will be published, it makes a lot more sense to use relative addressing for internal links. If you use relative addressing, your site will navigate and browse exactly the same on your local hard disk as it will when you are looking at it on the Web server.
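To make the difference concrete, here is a small sketch of the same kind of internal link written several ways. The tips directory is a hypothetical layout invented for the example; only browsing.html and its URL anchor come from the text.

```html
<!-- Relative: target is in the same directory as the current page -->
<A HREF="browsing.html">Basic browser tips</A>

<!-- Relative: target is in a subdirectory of the current directory -->
<A HREF="tips/browsing.html#URL">Parts of a URL</A>

<!-- Relative: up one directory level, then down into another -->
<A HREF="../tips/browsing.html">Basic browser tips</A>

<!-- Absolute: the leading slash starts the path at the root, so this
     link can resolve differently on a local disk than on the server -->
<A HREF="/tips/browsing.html">Basic browser tips</A>
```

The first three keep working when the whole site is copied from a local disk to the server, as long as the files stay in the same positions relative to each other.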
Beginning Web authors tend to make hyperlinks like this:
You can find discussion of URL Internet addresses here. Here is a link to the top of my basic browser tips page.
This is awkward and confusing to readers. You have to look pretty carefully at that to determine which "here" link goes where, don't you?
It's better to let the link text be part of the meaningful flow of the rest of the text, and make the choice of the link text work for you, both as visual emphasis and to identify where the link goes:
For discussion of URL Internet addresses, see the Parts of a URL section of my Basic browser tips page.
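In HTML, the two versions might be coded roughly as follows. Which words carry the links is an assumption reconstructed from the wording of the examples, and the link addresses reuse browsing.html and its URL anchor.

```html
<!-- Awkward: the link text says nothing about the destination -->
<P>You can find discussion of URL Internet addresses
<A HREF="browsing.html#URL">here</A>.
<A HREF="browsing.html">Here</A> is a link to the top of my basic
browser tips page.</P>

<!-- Better: the link text names the destination -->
<P>For discussion of URL Internet addresses, see the
<A HREF="browsing.html#URL">Parts of a URL</A> section of my
<A HREF="browsing.html">Basic browser tips</A> page.</P>
```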
There's a long and perfectly legitimate tradition of using underline in ordinary English writing; sometimes to identify certain types of bibliographic references, or merely for emphasis. Unfortunately the Web has co-opted underline format to identify hyperlinks; it's a done deal.
If you allow anything on a Web page to be underlined that's not a hyperlink, I guarantee you readers will try to click on it. When it doesn't work, they will be at worst confused and at best annoyed. I get suckered by this myself on a regular basis; put me in the "annoyed" column.
It is true that non-link underlined text will normally be presented as black, rather than the blue or purple of typical unvisited and visited hyperlinks. Unsophisticated users will often not be aware of this color distinction. Sophisticated Web users know that Web authors can and do change link colors, and may try it anyway. Some versions of the Opera browser even default to presenting visited links in black rather than purple.
In fact, since any non-Web document these days stands a good chance of being put on the Web sooner or later, I think it's time to give up all use of underline as simple emphasis. Underlined titles in bibliographies or bibliographic references in text can be converted more gracefully, if they can be made into a hyperlink to the Web version of the document.
Everybody has a favorite Web browser. The natural impulse for a new Web author is to test-view pages in their usual browser. No matter what that is, there will be lots of Web users who use something else.
If you're writing content for an intranet where everyone is required to use a certain browser, obviously you only need to test it with that. If you're writing for the global Internet, or for an intranet where users may choose their browser, or intranet content which might someday be published via Internet or extranet, you really should test with some other browsers.
I think testing with other browsers should be considered in several levels of descending priority:
If you or your boss can afford $60 a month, BrowserCam is an interesting resource for browser compatibility testing. You can select a browser and version, OS and screen resolution, and quickly receive a screen capture of exactly what your code looks like on that platform. They also offer remote login for testing of dynamic features.
BrowserCam should pretty much do away with any need for multiple PCs, virtual machine or emulator software, and similar heroic measures in support of cross-platform HTML testing. Anybody who can boot Windows can install Firefox and Opera along with MSIE, and test with BrowserCam for all other platforms deemed relevant. Of course, the lazier sort of corporate Webmaster probably hopes his boss doesn't find out about BrowserCam.
The Opera browser has some handy features for Web developers. On Windows, Opera 7/8's default source code viewer is WordPad; Opera 9 includes its own native source code viewer/editor with syntax highlighting, which by default comes up as a browser tab. You can also specify any editor you want (Tools, Preferences, Advanced, Programs). To open a page for editing, you can just display it in Opera and do View, Source (or press Control-F3). I've never understood why other browsers don't do this, at least as an option.
Another Opera feature that can be useful for Web developers is View, Zoom, also available on the View toolbar. I'm sure this was originally designed with user accessibility in mind, but you can also use it to quickly model what a page would look like for someone with a bigger monitor or higher screen resolution. Try Zoom factors in the 60-90% range. You can also use keypad +/- to zoom in 10% increments, and keypad * to reset to 100%. You can approximate this kind of modeling with IE's text size controls in the View menu, but Opera's Zoom scales the graphics right along with the text and IE's controls don't.
External Web pages should also be tested with JavaScript disabled in the browser. Only 2-4% of this site's users run with JavaScript disabled, but I believe external Web pages should be functional and usable without it.
Anyone who slaps a "this page is best viewed with Browser X" label on a Web page appears to be yearning for the bad old days, before the Web, when you had very little chance of reading a document written on another computer, another word processor, or another network. (Tim Berners-Lee in Technology Review, July 1996)
I'd go a bit farther than that quote now: putting "best viewed with" language on a Web page anytime since about 2001 amounts to an admission of incompetence. I think that applies to statements about screen resolution as well as browsers, although that aspect is more controversial.
I think it's perfectly legitimate for an organization to decide not to try to make its Web code compatible with browser versions older than a certain threshold, or particular browsers with very small user bases. I just think nobody should have to have MSIE or Windows to use any particular Web page.
See also the Browsers section of my HTML links page for download sources for various browser versions.
What is a Web server and what does it do? The term is used for the hardware, the software, or both. A Web server is a computer that's running and connected to the Internet all the time. It runs server software, such as Apache or Microsoft IIS, that responds to HTTP requests from Web users by sending out HTML files and graphics. Nobody can see your Web pages but you until they are on the Web server. There are Web server programs for many operating systems, including Windows NT, OS/2, and even MS-DOS, but most Web servers in use on the Net run on Unix. A very common one is Apache, which is free, open-source software.
If you're lucky, your user-friendly WYSIWYG authorship tool will have a button that says "Publish" and everything will get loaded on the server and work fine. (No doubt you'll have to tell the software where your server and home directory are first.)
If you have to do some troubleshooting, see my page on ftp and troubleshooting. Common troubles might include:
Get a link tester. Use it. Your favorite Internet software site will list them for various operating systems, possibly categorized under "validation." They're also sometimes known as "Web spiders."
People rearrange their sites, move or rename pages, move them to different servers, or sometimes a whole site just disappears from the Web. All of these are bad things, and some of them can be avoided with better site management practices. Any of them can cause external hyperlinks on your own pages, which were good when you added them, to produce the all-too-familiar 404 Not found error, or at least something unexpected. Some people call this effect linkrot.
Of course, it's also possible to get bad internal or external hyperlinks from your own coding typos.
Valid hyperlinks are very important. They generate confidence in you and your Web site, and contribute to positive user attitudes on the Web generally. The more Webmasters keep their links current, the better the whole Web works. On the flip side, finding more than a few broken links in your site can quickly destroy your reader's respect for your material, and cause him or her to give up on you and move on.
A link tester will start from a specified home-page URL and follow all links. It can tell if a link is internal or external by the form of the address; if the link is internal, it investigates all hyperlinks on the linked page as well. Some testers write to database files while they run. After the check run is complete, a link tester will generate user reports in various formats, often including HTML.
You look over the report, maybe print it, and investigate the flagged links. A link may really be broken, the target page may have been moved with a polite redirect from the target site, the link may have timed out during the test run, or the other site may just have all robot programs locked out.
If you maintain your site with a link tester, maybe you should let your readers know about it, to bolster their confidence. Put a note like "links checked 7/4/06 by Whizbang," near the top of your home page. You can even link to the home page of your link-tester program, which will make the authors happy, and let your readers see what technology you're using, if they're curious.
You might want to check with your ISP: they may object to users running robot programs. You'll probably have better luck if you run your link tester in off-peak hours anyway, if only so that fewer links "time out" during the test. I always run mine after about 10:00 PM.
Xenu's Link Sleuth is an amazing free link tester that uses something called preemptive multithreading to test sites super fast. By that I mean a site that takes hours to check in another link tester can be completely link-tested in Xenu in about fifteen minutes. You may want to set the option View, Show broken links only for the best view of the test in progress. You'll definitely want that option for checking the broken links later.
Xenu has another function at File, Retry broken links, that lets you re-test just the bad links. This can eliminate some false errors such as timeouts, before you start checking errors. Try changing the setting at Options, Preferences, Parallel Threads to a lower value before running the broken-links re-test: it seems to cause Xenu to give each link more time. On dialup, I've had good results with a threads setting of 16-20 for the initial full-site test and 4-5 for re-testing broken links.
After a link test is finished, Xenu outputs HTML reports in your system default browser, which you can print and refer to, or save, if you like. You can print the report right after the link test, or save the full test results to a file with extension XEN and print or save HTML reports later. I suggest turning off all report options except Broken links, ordered by links, to print a check-off list you can use to track your progress while you investigate the bad links. This report also lists your pages that contain each link URL.
You can also refer directly to Xenu's simple spreadsheet-like display. To see the broken links sorted by URL, to match the report I recommend above, click on the first-column Address header button. You can right-click a broken-link record in Xenu and pick Open in Browser, which will open the exact URL Xenu found in your code in your system default Web browser, or at least try. When you find out the true status of that URL, you can go back to Xenu, right-click again and pick Properties, and see another list of your pages that have that link URL.
Many link testers (but not Xenu, yet) have a default option to look for a file on remote servers called robots.txt and obey instructions found there. Some sites, notably including Yahoo, use this system to completely exclude all robot/spider programs. You probably don't want to just turn off this kind of feature, though, because more responsible sites use robots.txt to keep robot programs out of dynamic content that would only confuse them, or in some cases even lead to an infinite loop.
In Xenu I believe sites that exclude all spiders produce the output status forbidden request. You can make Xenu skip their URLs: in the screen you use to start a link test (File, Check URL) there's an option labeled Do not check any URLs beginning with this. Type or paste a URL into the text box and click Add.
CyberSpyder is another link tester I used to use. You might also want to look at the link testers list on Open Directory.