When you create an electronic text file of any kind, you may be
typing letters, but the file does not contain letters. No computer file does.
It stores zeroes and ones in a code that a piece of software can read and then
translate into the characters you see when the file is opened and displayed.
How a character looks when it is displayed depends on two pieces of
information: the character code and the font. In a word processor you can do
some fancy things this way using unusual fonts like Wingdings, but on the Web
font choices are more limited, which is why smart people don’t use fonts like
Wingdings for Web pages. Wingdings is a Windows-only font, so it would not be
available in a large number of browsers.
In a pure text environment, like ASCII text, no font information
is used at all. But there are still a number of characters available. Strictly
speaking, ASCII is a 7-bit code with 2^7 = 128 characters; the ISO-8859
character sets extend it by using a full 8-bit byte to code each character.
Since each bit can be either a zero or a one, 8 bits means you can have
2^8 = 256 possible characters. The way these codes are used has been
standardized by the International Organization for Standardization (ISO) for
various languages. A good Web editor will declare the character set you are
using in the head section using a META tag like this:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
This says that the character set being used in the document is
the ISO-8859-1 character set, which is the Latin-1 set used by many Western
European languages. Other languages use other character sets as needed, which
is how Russian, Greek, etc. pages can be created.
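To see why the declared character set matters, here is a small sketch in
Python (chosen only for illustration; a Web page itself needs no such code).
It decodes one and the same byte under three ISO-8859 character sets using
Python's built-in codecs. The byte value 225 is an arbitrary choice for the
demonstration.

```python
# The same 8-bit value stands for different characters in different
# character sets. The value 225 (hex E1) is an arbitrary example.
raw = bytes([225])

latin1 = raw.decode("iso-8859-1")    # Western European (Latin-1)
greek = raw.decode("iso-8859-7")     # Greek
cyrillic = raw.decode("iso-8859-5")  # Cyrillic, used for Russian

print(latin1, greek, cyrillic)

# One byte has 8 bits, so it can hold 2**8 different values.
print(2 ** 8)
```

If the browser is told the wrong character set, it reads the right bytes but
shows the wrong characters.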
The ISO-8859-1 character set has 256 possible characters, in theory,
numbered from 0 to 255. Any of these characters can be used in a Web page by
first using the escape character, the Ampersand (&), followed by the number
(with a # sign in front), and finally a semicolon (;). The Ampersand is the
signal to the browser that something different is going to happen, and that
it should not display the characters that follow, but instead treat them as
a special code. The semicolon at the end tells the browser that the special
code is finished, and that it should resume displaying the characters that it
finds in the text.
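The ampersand, number, semicolon pattern can be sketched in a few lines of
Python. The helper name numeric_reference is made up for this illustration;
html.unescape is part of Python's standard library and decodes such codes
roughly the way a browser would.

```python
import html


def numeric_reference(ch):
    """Build the &#N; code for one character (hypothetical helper)."""
    return "&#" + str(ord(ch)) + ";"


# Going from a character to its code ...
print(numeric_reference("©"))

# ... and from codes embedded in text back to characters.
print(html.unescape("&#169; 2024 Caf&#233;"))
```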
Your keyboard handles a lot of this stuff for you. When you type
on it, it sends the numerical code for your letters, punctuation, etc. and the
software displays it just fine. So while anything you can type on a keyboard
has a numerical code, in practice you never need to think about it. It is only
when you need a special character that is not on the
keyboard that you need to know the code. Here are the ISO-8859-1 entities:
Description | Character | Code | Name |
---|---|---|---|
Unused*(see note) | Unused | `&#0;` | |
Unused | Unused | `&#1;` | |
Unused | Unused | `&#2;` | |
Unused | Unused | `&#3;` | |
Unused | Unused | `&#4;` | |
Unused | Unused | `&#5;` | |
Unused | Unused | `&#6;` | |
Unused | Unused | `&#7;` | |
Unused | Unused | `&#8;` | |
horizontal tab | | `&#9;` | |
line feed | | `&#10;` | |
Unused | Unused | `&#11;` | |
Unused | Unused | `&#12;` | |
Unused | Unused | `&#13;` | |
Unused | Unused | `&#14;` | |
Unused | Unused | `&#15;` | |
Unused | Unused | `&#16;` | |
Unused | Unused | `&#17;` | |
Unused | Unused | `&#18;` | |
Unused | Unused | `&#19;` | |
Unused | Unused | `&#20;` | |
Unused | Unused | `&#21;` | |
Unused | Unused | `&#22;` | |
Unused | Unused | `&#23;` | |
Unused | Unused | `&#24;` | |
Unused | Unused | `&#25;` | |
Unused | Unused | `&#26;` | |
Unused | Unused | `&#27;` | |
Unused | Unused | `&#28;` | |
Unused | Unused | `&#29;` | |
Unused | Unused | `&#30;` | |
Unused | Unused | `&#31;` | |
space | | `&#32;` | |
exclamation mark | ! | `&#33;` | |
double quotation mark | " | `&#34;` | `&quot;` **(see note) |
pound sign/number sign | # | `&#35;` | |
dollar sign | $ | `&#36;` | |
percentage sign | % | `&#37;` | |
ampersand | & | `&#38;` | `&amp;` |
apostrophe | ' | `&#39;` | |
left parenthesis | ( | `&#40;` | |
right parenthesis | ) | `&#41;` | |
asterisk | * | `&#42;` | |
plus sign | + | `&#43;` | |
comma | , | `&#44;` | |
hyphen | - | `&#45;` | |
period (full stop) | . | `&#46;` | |
solidus (slash) | / | `&#47;` | |
digit 0 | 0 | `&#48;` | |
digit 1 | 1 | `&#49;` | |
digit 2 | 2 | `&#50;` | |
digit 3 | 3 | `&#51;` | |
digit 4 | 4 | `&#52;` | |
digit 5 | 5 | `&#53;` | |
digit 6 | 6 | `&#54;` | |
digit 7 | 7 | `&#55;` | |
digit 8 | 8 | `&#56;` | |
digit 9 | 9 | `&#57;` | |
colon | : | `&#58;` | |
semi-colon | ; | `&#59;` | |
less-than sign | < | `&#60;` | `&lt;` |
equal sign | = | `&#61;` | |
greater-than sign | > | `&#62;` | `&gt;` |
question mark | ? | `&#63;` | |
commercial at | @ | `&#64;` | |
capital A | A | `&#65;` | |
capital B | B | `&#66;` | |
capital C | C | `&#67;` | |
capital D | D | `&#68;` | |
capital E | E | `&#69;` | |
capital F | F | `&#70;` | |
capital G | G | `&#71;` | |
capital H | H | `&#72;` | |
capital I | I | `&#73;` | |
capital J | J | `&#74;` | |
capital K | K | `&#75;` | |
capital L | L | `&#76;` | |
capital M | M | `&#77;` | |
capital N | N | `&#78;` | |
capital O | O | `&#79;` | |
capital P | P | `&#80;` | |
capital Q | Q | `&#81;` | |
capital R | R | `&#82;` | |
capital S | S | `&#83;` | |
capital T | T | `&#84;` | |
capital U | U | `&#85;` | |
capital V | V | `&#86;` | |
capital W | W | `&#87;` | |
capital X | X | `&#88;` | |
capital Y | Y | `&#89;` | |
capital Z | Z | `&#90;` | |
left square bracket | [ | `&#91;` | |
reverse solidus (back slash) | \ | `&#92;` | |
right square bracket | ] | `&#93;` | |
caret | ^ | `&#94;` | |
horizontal bar (underscore) | _ | `&#95;` | |
grave accent | \` | `&#96;` | |
small a | a | `&#97;` | |
small b | b | `&#98;` | |
small c | c | `&#99;` | |
small d | d | `&#100;` | |
small e | e | `&#101;` | |
small f | f | `&#102;` | |
small g | g | `&#103;` | |
small h | h | `&#104;` | |
small i | i | `&#105;` | |
small j | j | `&#106;` | |
small k | k | `&#107;` | |
small l | l | `&#108;` | |
small m | m | `&#109;` | |
small n | n | `&#110;` | |
small o | o | `&#111;` | |
small p | p | `&#112;` | |
small q | q | `&#113;` | |
small r | r | `&#114;` | |
small s | s | `&#115;` | |
small t | t | `&#116;` | |
small u | u | `&#117;` | |
small v | v | `&#118;` | |
small w | w | `&#119;` | |
small x | x | `&#120;` | |
small y | y | `&#121;` | |
small z | z | `&#122;` | |
left curly brace | { | `&#123;` | |
vertical bar | \| | `&#124;` | |
right curly brace | } | `&#125;` | |
tilde | ~ | `&#126;` | |
delete (control character) | Unused | `&#127;` | |
euro sign | € | `&#128;` | |
Unused | Unused | `&#129;` | |
low single quotation mark | ‚ | `&#130;` | |
function | ƒ | `&#131;` | |
low left rising double quote | „ | `&#132;` | |
ellipsis | … | `&#133;` | |
dagger mark | † | `&#134;` | |
double dagger | ‡ | `&#135;` | |
letter modifying circumflex | ˆ | `&#136;` | |
per thousand sign | ‰ | `&#137;` | |
capital S caron or hacek | Š | `&#138;` | |
left single angle quotation mark | ‹ | `&#139;` | |
capital OE ligature | Œ | `&#140;` | |
Unused | Unused | `&#141;` | |
capital Z caron or hacek | Ž | `&#142;` | |
Unused | Unused | `&#143;` | |
Unused | Unused | `&#144;` | |
left single quotation mark | ‘ | `&#145;` | |
right single quotation mark | ’ | `&#146;` | |
left double quotation mark | “ | `&#147;` | |
right double quotation mark | ” | `&#148;` | |
round solid bullet | • | `&#149;` | |
en dash | – | `&#150;` | |
em dash | — | `&#151;` | |
small tilde | ˜ | `&#152;` | |
trademark | ™ | `&#153;` | |
small s caron or hacek | š | `&#154;` | |
right single angle quotation mark | › | `&#155;` | |
small oe ligature | œ | `&#156;` | |
Unused | Unused | `&#157;` | |
small z caron or hacek | ž | `&#158;` | |
capital Y umlaut | Ÿ | `&#159;` | |
non-breaking space | | `&#160;` | `&nbsp;` |
inverted exclamation mark | ¡ | `&#161;` | `&iexcl;` |
cent sign | ¢ | `&#162;` | `&cent;` |
pound sign | £ | `&#163;` | `&pound;` |
currency sign | ¤ | `&#164;` | `&curren;` |
yen sign | ¥ | `&#165;` | `&yen;` |
broken vertical bar | ¦ | `&#166;` | `&brvbar;` |
section sign | § | `&#167;` | `&sect;` |
spacing dieresis | ¨ | `&#168;` | `&uml;` |
copyright sign | © | `&#169;` | `&copy;` |
feminine ordinal indicator | ª | `&#170;` | `&ordf;` |
angle quotation mark, left | « | `&#171;` | `&laquo;` |
negation sign | ¬ | `&#172;` | `&not;` |
soft hyphen | | `&#173;` | `&shy;` |
circled R registered sign | ® | `&#174;` | `&reg;` |
spacing macron | ¯ | `&#175;` | `&macr;` |
degree sign | ° | `&#176;` | `&deg;` |
plus-or-minus sign | ± | `&#177;` | `&plusmn;` |
superscript 2 | ² | `&#178;` | `&sup2;` |
superscript 3 | ³ | `&#179;` | `&sup3;` |
spacing acute | ´ | `&#180;` | `&acute;` |
micro sign | µ | `&#181;` | `&micro;` |
paragraph sign | ¶ | `&#182;` | `&para;` |
middle dot | · | `&#183;` | `&middot;` |
spacing cedilla | ¸ | `&#184;` | `&cedil;` |
superscript 1 | ¹ | `&#185;` | `&sup1;` |
masculine ordinal indicator | º | `&#186;` | `&ordm;` |
angle quotation mark, right | » | `&#187;` | `&raquo;` |
fraction 1/4 | ¼ | `&#188;` | `&frac14;` |
fraction 1/2 | ½ | `&#189;` | `&frac12;` |
fraction 3/4 | ¾ | `&#190;` | `&frac34;` |
inverted question mark | ¿ | `&#191;` | `&iquest;` |
capital A, grave accent | À | `&#192;` | `&Agrave;` |
capital A, acute accent | Á | `&#193;` | `&Aacute;` |
capital A, circumflex accent | Â | `&#194;` | `&Acirc;` |
capital A, tilde | Ã | `&#195;` | `&Atilde;` |
capital A, dieresis or umlaut mark | Ä | `&#196;` | `&Auml;` |
capital A, ring | Å | `&#197;` | `&Aring;` |
capital AE diphthong (ligature) | Æ | `&#198;` | `&AElig;` |
capital C, cedilla | Ç | `&#199;` | `&Ccedil;` |
capital E, grave accent | È | `&#200;` | `&Egrave;` |
capital E, acute accent | É | `&#201;` | `&Eacute;` |
capital E, circumflex accent | Ê | `&#202;` | `&Ecirc;` |
capital E, dieresis or umlaut mark | Ë | `&#203;` | `&Euml;` |
capital I, grave accent | Ì | `&#204;` | `&Igrave;` |
capital I, acute accent | Í | `&#205;` | `&Iacute;` |
capital I, circumflex accent | Î | `&#206;` | `&Icirc;` |
capital I, dieresis or umlaut mark | Ï | `&#207;` | `&Iuml;` |
capital Eth, Icelandic | Ð | `&#208;` | `&ETH;` |
capital N, tilde | Ñ | `&#209;` | `&Ntilde;` |
capital O, grave accent | Ò | `&#210;` | `&Ograve;` |
capital O, acute accent | Ó | `&#211;` | `&Oacute;` |
capital O, circumflex accent | Ô | `&#212;` | `&Ocirc;` |
capital O, tilde | Õ | `&#213;` | `&Otilde;` |
capital O, dieresis or umlaut mark | Ö | `&#214;` | `&Ouml;` |
multiplication sign | × | `&#215;` | `&times;` |
capital O, slash | Ø | `&#216;` | `&Oslash;` |
capital U, grave accent | Ù | `&#217;` | `&Ugrave;` |
capital U, acute accent | Ú | `&#218;` | `&Uacute;` |
capital U, circumflex accent | Û | `&#219;` | `&Ucirc;` |
capital U, dieresis or umlaut mark | Ü | `&#220;` | `&Uuml;` |
capital Y, acute accent | Ý | `&#221;` | `&Yacute;` |
capital THORN, Icelandic | Þ | `&#222;` | `&THORN;` |
small sharp s, German (sz ligature) | ß | `&#223;` | `&szlig;` |
small a, grave accent | à | `&#224;` | `&agrave;` |
small a, acute accent | á | `&#225;` | `&aacute;` |
small a, circumflex accent | â | `&#226;` | `&acirc;` |
small a, tilde | ã | `&#227;` | `&atilde;` |
small a, dieresis or umlaut mark | ä | `&#228;` | `&auml;` |
small a, ring | å | `&#229;` | `&aring;` |
small ae diphthong (ligature) | æ | `&#230;` | `&aelig;` |
small c, cedilla | ç | `&#231;` | `&ccedil;` |
small e, grave accent | è | `&#232;` | `&egrave;` |
small e, acute accent | é | `&#233;` | `&eacute;` |
small e, circumflex accent | ê | `&#234;` | `&ecirc;` |
small e, dieresis or umlaut mark | ë | `&#235;` | `&euml;` |
small i, grave accent | ì | `&#236;` | `&igrave;` |
small i, acute accent | í | `&#237;` | `&iacute;` |
small i, circumflex accent | î | `&#238;` | `&icirc;` |
small i, dieresis or umlaut mark | ï | `&#239;` | `&iuml;` |
small eth, Icelandic | ð | `&#240;` | `&eth;` |
small n, tilde | ñ | `&#241;` | `&ntilde;` |
small o, grave accent | ò | `&#242;` | `&ograve;` |
small o, acute accent | ó | `&#243;` | `&oacute;` |
small o, circumflex accent | ô | `&#244;` | `&ocirc;` |
small o, tilde | õ | `&#245;` | `&otilde;` |
small o, dieresis or umlaut mark | ö | `&#246;` | `&ouml;` |
division sign | ÷ | `&#247;` | `&divide;` |
small o, slash | ø | `&#248;` | `&oslash;` |
small u, grave accent | ù | `&#249;` | `&ugrave;` |
small u, acute accent | ú | `&#250;` | `&uacute;` |
small u, circumflex accent | û | `&#251;` | `&ucirc;` |
small u, dieresis or umlaut mark | ü | `&#252;` | `&uuml;` |
small y, acute accent | ý | `&#253;` | `&yacute;` |
small thorn, Icelandic | þ | `&#254;` | `&thorn;` |
small y, dieresis or umlaut mark | ÿ | `&#255;` | `&yuml;` |
*Note 1: The first 32 characters (0-31) are
non-printing control characters reserved since the dark ages for really geeky
computer science stuff, like controlling line printers attached to mainframe
computers. There is no reason I can think of why they need to be reserved in
the ISO-8859-1 character set, other than inertia, but in any case they are still
reserved. Codes 127-159 are control codes in ISO-8859-1 as well; the characters
shown in the table for 128-159 are the ones browsers display for those codes,
and they come from the Windows-1252 character set. If you want to see what the
control characters are, go to
http://en.wikipedia.org/wiki/ASCII
**Note 2: The last column gives named equivalents
for many of the numerical codes. Support for some of these names is not as good
as for the numerical codes.
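If you would rather look up a name than scan the table, Python's standard
library ships a mapping from numerical codes to the HTML 4 names. The sketch
below uses html.entities.codepoint2name for that; the helper name named_entity
and the sample codes are made up for this illustration.

```python
from html.entities import codepoint2name


def named_entity(code):
    """Return the named form for a numerical code, or None if it has no name
    (hypothetical helper)."""
    name = codepoint2name.get(code)
    return "&" + name + ";" if name else None


# Print the numerical form next to the named form for a few sample codes.
for code in (33, 38, 169, 175, 233):
    print(code, "&#%d;" % code, named_entity(code))
```

Codes without a name, such as 33 for the exclamation mark, come back as None,
which matches the empty cells in the last column of the table.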