v1.0.0 / 01 jun 03 / greg goebel / public domain
* The "World-Wide Web (WWW)" revolutionized the Internet by creating a much easier way to obtain information over the network, linking vast numbers of "web sites" and "web pages" that could be easily searched and inspected.
The basis of the Web is the formatting language used to create web pages, the "Hypertext Markup Language (HTML)". HTML is not only used to format documents, but also to control their links and operation. This document provides a short introduction to HTML.
* HTML provides a set of "tags" that are embedded in documents to be
displayed by a Web browser. These tags provide instructions on how a Web
browser will format and otherwise deal with a document displayed in the
browser. Typical HTML code might have the form:
<TITLE>This Is The Document Title</TITLE>
<H1>[1.0] Introductory Topics</H1>
Twas Bryllig, and the slithy toves
Did gyre and gimbel in the wabe.
All mimsy were the borogoves,
And the mome raths outgrabe.
<HTML>...</HTML>       Defines an HTML document (optional).
<HEAD>...</HEAD>       Defines head of document (optional).
<BODY>...</BODY>       Defines body of document (optional).
<H1>...</H1>           First-level header.
<H2>...</H2>           Second-level header.
<P>                    Paragraph break.
<PRE>...</PRE>         Preformatted text block.
<UL>...</UL>           Unnumbered list.
<LI>                   List item.
<EM>...</EM>           Emphasis (usually italicized).
<STRONG>...</STRONG>   Strong emphasis (usually bold).
<BR>                   Forced line break.
<HR>                   Horizontal rule.
<HTML> ... </HTML>
<HEAD> ... </HEAD>
<BODY> ... </BODY>
Text to format and display.
The remaining tags in the list are more interesting. First, there is the
"<TITLE>" tag, which simply declares the title of the document:
The actual title of the web page that the web browser displays can be
generated by the "<H1>" tag, which defines a first-level header:
<H1>Welcome To Coyote's Website</H1>
If there are any sub-headings within the text of the document to be displayed,
they can be generated with the "<H2>" tag, which defines a second-level header:
<H2>Just Who Is This Coyote Guy, Anyway?</H2>
Text in the body of the document will be "filled" to the margins of the web
browser display. Spacing and blank lines between paragraphs will be ignored,
and so paragraphs need to be broken apart with the "<P>" tag.
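A minimal sketch of the "<P>" tag from the tag list separating two paragraphs; the sample sentences are invented for illustration:

```html
This is the first paragraph. The browser fills it to the
margins no matter how the lines are broken in the source.
<P>
This is the second paragraph, set off from the first by
the paragraph tag.
```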
If the author wants text to be printed "as is", without filling, the "<PRE>"
(preformatted block) tag must be used instead. For example, if the following
text were to be filled, it would all be scrunched up on one line and would
look terrible, so it is marked as preformatted text:
Name: Wyle Coyote
Marital Status: Single
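As a sketch of the markup involved, the two lines above would be wrapped in "<PRE>" tags roughly as follows; putting the tags on lines of their own and the column spacing are illustrative choices:

```html
<PRE>
Name:            Wyle Coyote
Marital Status:  Single
</PRE>
```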
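Some characters, such as "<", ">", and "&", have special meaning in HTML and can cause trouble if they appear in the text to be displayed. This problem can be avoided by replacing these characters with "escape"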
codes. The characters of concern and their escapes are as follows:
"<" = &lt;
">" = &gt;
"&" = &amp;
'"' = &quot;
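For instance, to display a tag literally on a page rather than have the browser interpret it, the escapes might be used as in this sketch (both sentences are invented for illustration):

```html
To make text bold, wrap it in &lt;STRONG&gt; ... &lt;/STRONG&gt; tags.
<P>
Tom &amp; Jerry said &quot;hello&quot;.
```

A browser shows the escaped characters as ordinary "<", ">", "&", and double-quote characters instead of treating them as markup.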
<H1>Welcome To Coyote's Web Page</H1>
Welcome to Wyle Coyote's web page! Despite my busy career as a
predator and certified Genius, I have had time to establish a
presence on the Internet, and am now on-line for Coyote lovers everywhere!
<H2>Just Who Is This Coyote Guy, Anyway?</H2>
Allow me to introduce myself. I am Wyle Coyote, Esquire, IQ 200,
graduate class of 1950 from the Southwest Technical Institute. I have
a degree in Advanced Predation and specialize in trap technologies.
Being a predator is a challenging career. Attempting to capture
clever and fleet-footed game is by no means simple and in fact can
present significant hazards to the unwary. However, being a Genius I
find that the difficulties involved only make the game more interesting.
And now for the dry personal statistics, for those of you inclined to
worry about minor details:
Name: Wyle Coyote
Marital Status: Single
If you have questions, please feel free to email me at:
* Given the ability to make elementary web pages, the next step is to add other formatting capabilities. The first new formatting capability is the "list".
HTML defines various list formats, but the simplest is the unnumbered list,
defined by the "<UL>" tag. Each item in the list is defined by the "<LI>"
tag. For example:
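A minimal sketch of an unnumbered list; the items are borrowed from the Acme example later in this document:

```html
<UL>
<LI> Jet Skates
<LI> Bat-Man Suits
<LI> Land Mines
</UL>
```

A browser typically displays each item on its own line, indented and marked with a bullet.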
Another common need in HTML documents is text styling, such as bold, italic,
underlined, and so on. There are a lot of different tags to define text
styles, but three are sufficient for most needs: "<EM>", for emphatic text
(usually italic); "<STRONG>", for strong emphasis (normally bold); and
"<CITE>", for citations. For example:
This example demonstrates <EM>emphatic text</EM>.
This example demonstrates <STRONG>bold text</STRONG>.
A citation: <CITE>WAR AND PEACE</CITE>
* For an example that puts these items together:
* The Acme Corporation of Albuquerque, New Mexico, is a for-profit concern
that focuses on a wide range of <EM>reliable, safe, and well-built</EM>
products. Acme is a people-oriented concern that provides an excellent
work environment and prospects for advancement.
* Acme offers an extensive list of useful products:
<LI> Jet Skates
<LI> Bat-Man Suits
<LI> Explosive Devices:
<LI> Land Mines
<H2>For Further Information</H2>
* Please contact Acme at:
1948 Roadrunner Drive
Albuquerque, NEW MEXICO, USA 87109
For product support information: <STRONG>firstname.lastname@example.org</STRONG>
<TITLE> ... <PRE> ... </TITLE> ... </PRE>
* It is easy to insert bitmap image files in a web page. These files should
generally be in the popular .GIF or .JPEG format, and can be specified with
the "<IMG>" image tag:
<IMG ALIGN=TOP SRC="another.gif">
<IMG SRC="somegfx.gif" ALT="Some Funny Graphics">
The image tag can be embedded in a line of text:
I <IMG SRC="heart.gif"> My German Shepherd
* So far all the tags discussed have focused on document cosmetics. However, anyone who uses the web knows that the real usefulness of a web page is obtained through "hyperlinking", in which pointers are set up on a web page to specify other web pages that the surfer can load with a mouse click.
HTML defines a hyperlink with the "anchor" tag:
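A minimal sketch of the anchor tag; the address "http://www.example.com/" and the label text are placeholders, not taken from the original text:

```html
<A HREF="http://www.example.com/index.html">Visit The Example Site</A>
```

The "HREF" attribute gives the target of the link, and the text between "<A>" and "</A>" is the label that the surfer clicks on.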
* This leads to a discussion of relevant features of a computer file system. Personal computers all use a "hierarchical" file system (as do, with minor differences, UN*X workstations) and it is important to understand this concept to build a website.
Early personal computers featured a "flat" file system, meaning that each disk contained a single directory and a set of files. However, a modern PC's file system has a directory that lists both files and lower-level "subdirectories". Each subdirectory may list files of its own as well as further subdirectories, and so on (until the disk runs out of space). This scheme defines an upside-down "tree" (a "hierarchy") of directories.
For example, consider the following hierarchical file system:
This is a very simple example, but it illustrates the basic ideas of a hierarchical file system.
At the "top" of this upside-down "tree" is a directory named "C:\", or (ignoring the drive specifier), simply "\". This is always the name of the topmost directory on a PC disk; since it is the directory from which the rest of the directory "tree" grows, it is called the "root directory", or simply "root".
The root directory in this example contains three files, as well as three subdirectories: "Utils", "Web", and "Tmp".
The three subdirectories in this example store different sets of files: "Utils" stores various utility programs, "Web" stores HTML files, and "Tmp" stores temporary files. Furthermore, the "Web" subdirectory has two subdirectories of its own: "Priv" and "Pub", to store HTML files of personal and public interest respectively.
Please remember, this is only an arbitrary example. Any convenient organization can be defined; subdirectories may have any name; files can generally be stored wherever they are convenient. A hierarchical file system allows a user to create a neat organization for the files on his or her machine.
* Having a hierarchical file system implies a need to be able to describe the location of files in it, which leads to the idea of a "pathname".
Suppose a file (like, say, filelist.txt) is stored in the root directory of
the "C:" drive; then it can be located by prefixing the file name with the
drive ID and root directory name:
* Writing out full pathnames for files can be tiresome, but, fortunately, it is also possible to refer to a file by its "relative pathname". The full pathname describes a file from any directory on the PC. The relative pathname describes a file "relative" to the directory the user is "in" at the time (the "current" directory).
As the simplest example, if the current directory is on the "C:" drive,
there is no need to specify "C:" in the path, since it's assumed:
Relative pathnames make life easier, if at the cost of occasional confusion. Just to add to this confusion, let's add another wrinkle: the current directory can be referred to simply as ".", and (more important), the "parent" directory of the current directory can be referred to as "..".
The "parent directory" is the opposite of a "subdirectory". That is, since
"Utils" is a subdirectory of the root directory, then the root directory is
the parent directory of the "Utils" directory. For example, if "Utils" is
the current directory, then "filelist.txt" can be given by:
Second, the "path separator" character on UN*X is "/", while it is "\" on a PC:
C:\Web\Priv\bdays.html # PC format
/Web/Priv/bdays.html # UN*X format
So, if a website is designed with hyperlinks using absolute pathnames, it will work if the user has complete access privileges to the server, and completely break if any outside surfer tries to access the pages. This means that the web pages on the site should in general be linked using relative pathnames. This also makes it easier to modify the website.
For example, a link to a product file from one page might have the form:
<A HREF="../index.html">Return To Main Page</A>
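As a sketch of relative linking, assume a page stored in the "Web/Pub" directory of the earlier file system example. The file name "products.html" and the exact layout are hypothetical; note that "/" is the separator used inside "HREF" values, whatever the server's own file system uses:

```html
<!-- Hypothetical links on a page stored in Web/Pub -->

<!-- A file in the same directory as this page -->
<A HREF="products.html">Products</A>

<!-- A file in the parent directory (Web) -->
<A HREF="../index.html">Return To Main Page</A>

<!-- A file in a sibling directory (Web/Priv) -->
<A HREF="../Priv/bdays.html">Birthdays</A>
```

Because none of these links names the drive or the root directory, the whole "Web" tree can be moved to another location without breaking them.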
* Note the use of the filename "index.html". This is a conventional default
file name that a web server will usually send automatically (over HTTP at
least) if no specific file is requested. This is the file that is accessed
when surfing to a website where no target file is specified, just the server:
* The hyperlinks shown so far allow linking to other websites or to other
files on the same website, but it is also possible to "mark" places within a
web page so that they can be linked to. Such "markers" are defined by
another variation on the "anchor" tag:
<H2><A NAME="m3">SECTION 3</A></H2>
<A HREF="myfile.html#m3">Section 3 -- An Overview</A>
<A HREF="#m3">Go To Section 3</A>
* While all the hyperlinks shown so far use text to define the label used,
bitmap images can be used as well:
<A HREF="../index.html"><IMG SRC="prev.gif"></A>
A somewhat more advanced technique allows you to use selected regions of an image to link to files. Creating such a "clickable image" is done with the "MAP" and "AREA" tags.
For an example, let's say we have a bitmap named "testmap.gif" that has four button-like square regions on it, as follows:
This bitmap is 175 pixels (picture dots) on a side. Each square region is 50 pixels wide and high, and the margins and spacing between the squares are all 25 pixels. We want to link the region defined by "B11" to a web page named "file11.html", and similarly link "B12" to "file12.html", "B21" to "file21.html", and "B22" to "file22.html". Clicking on one of the squares should bring up the appropriate file, but to prevent confusion clicking on the margins or spacing between the squares should do nothing.
The first thing that needs to be done to work with MAP and AREA is to define
the coordinates of the squares within the bitmap. The bitmap is regarded as an
X,Y grid, with the 0,0 coordinate at the top left corner. Dimensions of
elements within the bitmap are defined as offsets from that corner, with X
increasing to the right and Y increasing downward. An image is tied to its
map with the "USEMAP" attribute of the image tag:
<IMG BORDER=0 SRC="testmap.gif" USEMAP="#maplist">
The map is set up by the MAP and AREA tags as follows:
<MAP NAME="maplist">
<AREA SHAPE="rect" COORDS="25,25,75,75" HREF="file11.html">
<AREA SHAPE="rect" COORDS="100,25,150,75" HREF="file12.html">
<AREA SHAPE="rect" COORDS="25,100,75,150" HREF="file21.html">
<AREA SHAPE="rect" COORDS="100,100,150,150" HREF="file22.html">
</MAP>
In this example, we're setting up a list of rectangles to be used as
clickable elements, but the AREA tag can also be used to set up circles,
with the coordinates specified as "X,Y,radius" as follows:
<AREA SHAPE="circle" COORDS="25,25,15" HREF="circtest.html">
<AREA SHAPE="poly" COORDS="77,44,119,44,98,3,77,44" HREF="polytest.html">
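Putting the pieces of the rectangle example together, a complete page might look like the following sketch. The "MAP" wrapper named "maplist" is implied by the "#maplist" reference in the image tag; the page title and the "ALT" strings are assumptions added for illustration:

```html
<HTML>
<HEAD>
<TITLE>Clickable Image Test</TITLE>
</HEAD>
<BODY>
<!-- USEMAP points at the MAP whose NAME is "maplist" -->
<IMG BORDER=0 SRC="testmap.gif" USEMAP="#maplist" ALT="Four buttons">
<MAP NAME="maplist">
<!-- COORDS for "rect" are left,top,right,bottom in pixels -->
<AREA SHAPE="rect" COORDS="25,25,75,75"     HREF="file11.html" ALT="B11">
<AREA SHAPE="rect" COORDS="100,25,150,75"   HREF="file12.html" ALT="B12">
<AREA SHAPE="rect" COORDS="25,100,75,150"   HREF="file21.html" ALT="B21">
<AREA SHAPE="rect" COORDS="100,100,150,150" HREF="file22.html" ALT="B22">
</MAP>
</BODY>
</HTML>
```

Since no "AREA" covers the 25-pixel margins or the gaps between the squares, clicking there does nothing, as intended.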
* HTML provides a large number of bells and whistles for the website author. A few handy ones are discussed here.
The previous discussion of hyperlinks focused on linking from one HTML file
to another, but the hyperlinks can point to any type of file, and the web
browser will use the file extension to determine what should be done with
the file. Typical file types include:
.html: HTML document
.txt: plain text
.gif: GIF bitmap image
.jpg: JPEG bitmap image
.jpeg: JPEG bitmap image
.wav: Windows audio file
.ps: PostScript file
.mov: QuickTime movie
.mpeg: MPEG movie
.mpg: MPEG movie
.mp3: MP3 audio file
One nice thing to add to a web page is a background bitmap using the tag:
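The tag in question is "<BODY>", which accepts a "BACKGROUND" attribute naming the bitmap. A minimal sketch, assuming a hypothetical bitmap file named "paper.gif":

```html
<BODY BACKGROUND="paper.gif">
Text to format and display.
</BODY>
```

The browser tiles the bitmap to fill the window. The "<BODY>" tag also accepts color attributes, given as hexadecimal red-green-blue values: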
<BODY BGCOLOR="#ff0000"> Background color.
<BODY TEXT="#00ff00"> Text color.
<BODY LINK="#0000ff"> Link color.
<BODY VLINK="#ffffff"> Visited link color.
<A HREF="mailto:email@example.com">Wyle Coyote</A>
* While most HTML tags deal with the formatting and display of information on a web page, there is also a tag, the META tag, that is used to provide information about the web page, as well as provide control instructions to a web browser.
The standard META tag has the syntax:
<META name="META_tag_name" content="META_tag_value">
<META name="author" content="gv_goebel">
<META name="description" content="This page provides a survey of
superconductive physics, materials, and technology">
<META name="keywords" content="Superconductivity, BCS theory, YBCO">
<META name="robots" content="INDEX,FOLLOW">
* The META options that control a web browser have an alternate format:
<META http-equiv="META_tag_name" content="META_tag_value">
<META http-equiv="refresh" content="10;http://www.newsite.com/">
<META http-equiv="expires" content="0">
<META http-equiv="expires" content="Tue, 13 Feb 2001 12:00:00 GMT">
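META tags are conventionally placed in the "<HEAD>" section of a document. The following sketch reuses values from the examples above; the title text is an assumption added for illustration:

```html
<HEAD>
<TITLE>A Survey Of Superconductivity</TITLE>
<!-- Descriptive information about the page -->
<META name="author" content="gv_goebel">
<META name="keywords" content="Superconductivity, BCS theory, YBCO">
<META name="robots" content="INDEX,FOLLOW">
<!-- Control instruction for the browser -->
<META http-equiv="expires" content="0">
</HEAD>
```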
* The tools discussed so far can be used to build a reasonable web page. These tools are very simple. Once they are understood, the real problem becomes one of organizing the effort.
Planning a website involves three considerations: content, cosmetics, and organization.
* The "Content" issue is common sense: The topic, scope, and target audience of the web page need to be clearly defined and understood. One secondary issue is maintenance: it is very common for people to put together websites and then forget about them. Such dead sites are known as "cobwebsites". This is a waste of time for all concerned, so a website should not be built unless there is the intent to keep it up to date.
* The "Cosmetics" issue is a little more troublesome. Novice website builders will tend to clutter their pages with graphics and bells and whistles. In reality, there is a tradeoff between complexity and utility. Graphics-intensive websites, for example, may look pretty, but they will take a long time to load into a user's browser and may simply irritate users.
Sometimes "features" can be counterproductive. Some people, for example, find blinking text extremely obnoxious, and many web authors set up colors or backgrounds that make web pages painfully hard to read. Pop-up windows are fine for warnings but are otherwise an intrusive irritant. Another irritant is a website that brings up a separate browser window for each new page accessed. As a surfer can easily bring up a second window using the alternate mouse button if it's desired, there's usually no good reason to do it for the surfer.
As a rule: don't get too cute. At best, people ignore most of the bells and whistles after the third time they see them; at worst, they get increasingly annoyed with them.
As a related issue, it is important not to make too many assumptions about how a web page will be displayed. Different web browsers will handle HTML according to their own settings, and a web page that assumes a specific browser configuration will give bad results on another browser. It is useful to access your own pages from somebody else's machine to see how they look and work.
* The "Organization" issue is the hardest one to deal with. The information to be presented by the web page needs to be structured in a way that makes it clear and easy to access. This is entirely a matter of writing ability and style, but some hints can be provided:
Each module corresponds to a separate web page in the website, and the subsections correspond to different headings in the same web page. Headings for each page and subsection should be clear and self-explanatory.
It is very important to have a clean and simple hyperlinking scheme, organized hierarchically or in some other structured fashion. This makes hyperlinking easier to test, and also prevents all the hyperlinks from being broken if a minor change is made in the website. Such breakage happens very easily if the hyperlinks have been set up in a haphazard fashion.
* Now for some specific details. As far as hyperlinking schemes go, a strict hierarchy is one of the simplest approaches:
In this case, each page links to its children and to its parent. There is no jumping over levels in the hierarchy. This means that rearranging the pages or otherwise modifying the site will break the minimum number of hyperlinks. Adding hyperlinks back to the index page is workable, though it increases the complexity somewhat.
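As a sketch of what a page in the middle of such a hierarchy might contain, the following links only to its parent and to its own children. All file and directory names here are hypothetical:

```html
<!-- Hypothetical mid-level page in a strict hierarchy -->
<H1>Products</H1>

<!-- One link up to the parent page -->
<A HREF="../index.html">Up To Main Page</A>

<!-- Links down to the child pages, one directory per child -->
<UL>
<LI><A HREF="skates/index.html">Jet Skates</A>
<LI><A HREF="mines/index.html">Land Mines</A>
</UL>
```

If a child directory is renamed, only the one link on its parent page needs to be changed.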
Related chapters of a large document that are organized as web pages can be chained to each other in a ring running through the table-of-contents page:
These are only representative organizations. Any rational scheme may work just as well, but they do illustrate the need to keep hyperlinks well organized.
If these schemes are hyperlinked using graphics labels in the form of arrows, it can be a bit confusing for a novice to figure out the scheme. One useful trick is to use the "ALT" feature in the bitmap tag code to give a string of text explaining what the function of the bitmap is: "BACK", "GO TO INDEX", "NEXT CHAPTER", and "PREVIOUS CHAPTER". The user can then get a hint about what the bitmap does by resting the mouse over it for a moment.
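A minimal sketch of arrow bitmaps labeled with "ALT" text; the chapter and bitmap file names are hypothetical:

```html
<A HREF="chap2.html"><IMG SRC="prev.gif" ALT="PREVIOUS CHAPTER"></A>
<A HREF="index.html"><IMG SRC="index.gif" ALT="GO TO INDEX"></A>
<A HREF="chap4.html"><IMG SRC="next.gif" ALT="NEXT CHAPTER"></A>
```

Whether the "ALT" text appears when the mouse rests over the bitmap depends on the browser; some browsers take their hover text from the "TITLE" attribute instead. The "ALT" text is also what is displayed when graphics are not loaded.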
* Once the hyperlink organization has been selected, then there is the
question of organizing the hyperlink labels. They should be grouped in
fairly small groups to make them easy to inspect. It is useful to organize
them as lists:
<LI><A HREF="Priv/etc.html">Other Data</A>
o *Other Data*
<A HREF="gfx/moon.gif">MoonScene</A> |
<A HREF="gfx/mars.gif">MarScape</A> |
Fun bitmaps: *MoonScene* | *MarsScape* | *EarthInSpace*
* Simple tables can be easily provided for a website using preformatted text
blocks. One trick is that the underscore ("_") character is better for
building lines than the dash ("-") character, since underscores tend to be
displayed as continuous lines:
value 1: 3.11
value 2: 2.05 value 1: 2.37
value 3: 3.06 value 2: 3.44
value 4: 1.05 value 3: 1.03
------------------------ value 4: 0.21
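As a further sketch of the technique, a small table ruled with underscores might be written as follows. The layout and column headings are assumptions; the values are borrowed from the example above:

```html
<PRE>
 ______________________

 item          value
 ______________________

 value 1       3.11
 value 2       2.05
 value 3       3.06
 value 4       1.05
 ______________________
</PRE>
```

The columns line up because browsers normally display preformatted text in a fixed-width font.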
Modest-size bitmaps are generally preferable in this respect to large ones, and simple GIF images involving a few solid colors and simple patterns are much faster to load than, say, JPEG photographic images.
In the case of web archives of graphics images, one clean little trick is to provide a set of shrunk-down "thumbnail" images of the archive's contents as hyperlinks to the full images.
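A minimal sketch of thumbnail links, reusing the "gfx/moon.gif" and "gfx/mars.gif" files from the earlier link example; the "_sm" thumbnail file names are hypothetical:

```html
<!-- Each small bitmap is the label of a link to the full-size image -->
<A HREF="gfx/moon.gif"><IMG SRC="gfx/moon_sm.gif" ALT="MoonScene"></A>
<A HREF="gfx/mars.gif"><IMG SRC="gfx/mars_sm.gif" ALT="MarScape"></A>
```

The thumbnails should be separate, genuinely smaller files; simply displaying the full image at a reduced size would still force the browser to load the whole file.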
One useful graphic is a "banner". This is a bitmap with title information for the website or a specific document that normally measures 468 x 60 pixels. There are "banner exchanges" available on the web that allow people to submit banners so they will circulate to other users along with a link to the site advertised by the banner.
Some surfers will turn off graphics to get greater speed in accessing websites. Please make sure that the site works even when graphics are not loaded.
* One peculiarity of web pages is that they are not read like books. They are read like scrolls. While it is useful in both books and web pages to have high-priority information first in the text and lower-priority information last, one eccentricity of web pages is that if you have a list of materials in chronological order, it may be better to put the most recent materials first, rather than last, since they are most often accessed.
* A few final comments on website design.
* The previous sections in this document have described how to construct elementary web pages, but there are more sophisticated tools available for building fancier web pages. These features will be mentioned briefly, with no discussion of details, as they are beyond the scope of this short document:
There are many books available on advanced HTML for those who wish to pursue these features. Various tools are now available to make building web pages automatic. However, it still remains useful to understand the basics of HTML, since a web page can be a complicated "machine" that will function much better if the builder knows how it works.
* I originally wrote these materials during the 1990s, but I eliminated them from my website in 2001 as I didn't feel they were getting any attention. I later realized they had some value and it made no sense just to keep them in my archives gathering dust, so I restored them to the site in 2003.
As I had lost the revision history I restored it as "v1.0.0", meaning it was a first-release document. This is not actually true, but as any earlier version would have had a two-part revcode (for example "v1.2") instead of a three-part revcode it should allow any earlier version to be detected, on the unlikely chance there's one floating around on the Internet.
As I wrote this stuff somewhat informally as notes for my own use, I never kept a source list on it. It doesn't really matter in this case, since if the examples in this document work as advertised there's no reason to question their validity.
* Revision history:
v1.0.0 / 01 jun 03 / gvg