HTML: The Definitive Guide

4.10 Special Character Encoding

For the most part, characters within HTML documents that are not part of a tag are rendered as-is by the browser. However, some characters have special meaning and are not directly rendered, while other characters can't be typed into the source document from a conventional keyboard. Special characters need either a special name or a numeric character encoding for inclusion in an HTML document.

Special HTML Characters

As the discussion and examples leading up to this section have made obvious, three characters in HTML source documents have special meaning: the less-than sign (<), greater-than sign (>), and ampersand (&). These characters delimit tags and special character references. They'll confuse a browser if left dangling alone or used with improper tag syntax, so you've got to go out of your way to include their actual, literal characters in your HTML documents. The only exception is that these characters may appear literally within the <listing> and <xmp> tags.
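As a minimal sketch, here is how the three characters might be written so that they display literally, using the entity names lt, gt, and amp (the reference syntax itself is explained under Inserting Special Characters). The sentences are made-up sample text, not taken from the book:

```html
<!-- &lt; and &gt; display as literal less-than and greater-than signs instead of delimiting a tag. -->
<p>Start each paragraph with the &lt;p&gt; tag.</p>

<!-- &amp; displays as a literal ampersand instead of starting a character reference. -->
<p>Research &amp; Development</p>
```

A browser should display the first line with the p tag spelled out in angle brackets as ordinary text, and the second line as "Research & Development".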

Similarly, you've got to use a special encoding to include a double quote character within a quoted string, or to include a special character that doesn't appear on your keyboard but is part of the ISO Latin-1 character set implemented and supported by most browsers.
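The two cases above might look like the following sketch. The file name logo.gif and the sample text are hypothetical, chosen only for illustration. The entity names quot, eacute, and copy and the numeric positions 233 and 169 come from the Latin-1 set:

```html
<!-- &quot; stands in for a double quote inside a double-quoted attribute value. -->
<!-- logo.gif is a hypothetical file name. -->
<img src="logo.gif" alt="The &quot;Definitive&quot; logo">

<!-- Characters missing from most keyboards, written once by entity name and once by number. -->
<!-- 233 is the Latin-1 position of e-acute; 169 is the copyright sign. -->
<p>caf&eacute; and caf&#233; should display identically.</p>
<p>&copy; Example Press and &#169; Example Press should display identically.</p>
```

Without the &quot; references, the first inner quote would end the attribute value early, and the rest of the text would be misread as further attributes.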

Inserting Special Characters

To include a special character in your HTML document, you enclose either its standard entity name or a hash mark (#) and its numeric position in the Latin-1 standard character set[2] inside a leading ampersand and an ending semicolon, without any spaces in between. Whew. That's a long explanation for what is really a simple thing to do, as the following example illustrates. It shows how to include a greater-than sign in a snippet of code by using the character's entity name. It also demonstrates how to include a greater-than sign in your HTML text by referencing its Latin-1 numeric value:

[2] The very familiar ASCII character set is a subset of the more comprehensive Latin-1 character set. Composed by the well-respected International Organization for Standardization (ISO), the Latin-1 set is a list of all the letters, numbers, punctuation marks, and so on, commonly used by Western language writers, organized by number and encoded with special names. Appendix D contains the complete Latin-1 character set and encodings.

if a &gt; b, then t = 0
if a &#62; b, then t = 0

Both examples cause the text to be rendered as:

if a > b, then t = 0

The complete set of character entity values and names is in Appendix D. You could write an entire HTML document using character encoding, but that would be silly.
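To see why, consider this small sketch, which spells the four-letter word "HTML" entirely with numeric references (72, 84, 77, and 76 are the positions of H, T, M, and L):

```html
<!-- Both paragraphs should display the same four letters. -->
<p>&#72;&#84;&#77;&#76;</p>
<p>HTML</p>
```

The encoded form is several times longer and much harder to read in the source, with no difference in what the browser displays.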

