HTML stands for HyperText Markup Language. It is a coding language that uses a method called markup to create hypertext. HTML was originally defined as an application of a more general markup language called SGML, which stands for Standard Generalized Markup Language, and as it evolves it is gradually returning to those roots by way of XML, a simplified subset of SGML. This evolution of HTML is worth knowing at least a little about, since HTML is not set in stone. The changes that are occurring have their reasons, mostly in terms of creating capabilities that previous versions were lacking.
In 1989, Tim Berners-Lee, working at the European particle physics laboratory known as CERN (Conseil Européen pour la Recherche Nucléaire), proposed a system to allow scientists to share papers with others using electronic networking methods. His idea became what is called the World-Wide Web. Since these documents were to be shared, some common method of coding them needed to be developed. Berners-Lee suggested that it be based on the already existing SGML. Here are a few quotes from a 1990 CERN memo that Berners-Lee wrote:
HyperText is a way to link and access information of various kinds as a web of nodes in which the user can browse at will. It provides a single user-interface to large classes of information (reports, notes, data-bases, computer documentation and on-line help).
We propose a simple scheme incorporating servers already available at CERN...
A program which provides access to the hypertext world we call a browser...
It would be inappropriate for us (rather than those responsible) to suggest specific areas, but experiment online help, accelerator online help, assistance for computer center operators, and the dissemination of information by central services such as the user office and CN [Computing & Networks] and ECP [Electronics & Computing for Physics] divisions are obvious candidates.
WorldWideWeb (or W3) intends to cater for these services across the HEP [High Energy Physics] community.
As you can see, Tim Berners-Lee put all of the basic pieces into place.
In 1992, when there were all of 50 web servers in the world, CERN released its portable Web browser as freeware. Marc Andreessen, who was working at the National Center for Supercomputing Applications (NCSA), created a browser called Mosaic, which was released in 1993. Shortly after that, he left NCSA to co-found Netscape. The first version of the Netscape browser implemented HTML 1.0.
In 1992, Berners-Lee and the CERN team released the first draft of HTML 1.0, which was finalized in 1993. This specification was so simple it could be printed on one side of a sheet of paper, but even then it contained the idea that has become central in the recent evolution of HTML: the separation between logical structures and presentational elements. This is, IMHO, the single most important idea to grasp in learning HTML. In 1994, work on HTML 2.0 began in the Internet Engineering Task Force's HTML Working Group, and it was published as RFC 1866 in 1995. This group was later disbanded in favor of the World Wide Web Consortium (http://www.w3.org), which continues to develop HTML.
Netscape was just one of a number of browsers available. Mosaic was still offered by NCSA, Lynx was available on Unix machines, and a few other companies were creating browsers. One of them, Spyglass, licensed its browser code to Microsoft, and that code became the basis for Internet Explorer. Each browser contains, at its heart, a rendering engine, which is the code that tells it how to take your HTML and turn it into something you can see on the screen. What happened at this point is that each company, most particularly Netscape and Microsoft, started to develop its own "extensions" to HTML, often going in different directions. This problem bedevils us to this day, though the upcoming Netscape 6 browser may resolve it by being 100% compliant with the published HTML standards. We are still waiting to see what this will look like.
The World Wide Web Consortium (W3C), which had taken over HTML development, attempted to create some standardization in HTML 3.0. But there was so much argument over what should be included that it never got beyond the draft stage. Finally, in 1996 a consensus version, HTML 3.2, was issued. This added features like tables and text flowing around images to the official specification, while maintaining backwards compatibility with HTML 2.0. HTML 3.2 is also a convenient place to mark the divergence in practice from the separation that Berners-Lee first made between logical structures and presentational elements, and as the Web took off in popularity, this breakdown became widespread and serious. An example is the widespread use of tables and transparent "shim" GIFs to create page layout. While this creates pages that are visually correct, the logical structure of the page is pretty much destroyed, and such pages are frequently useless to anyone using a text browser or a text-to-speech reader. The main focus of the W3C since then has been to rectify the situation.
The W3C released the HTML 4.0 specification at the end of 1997 and followed with HTML 4.01 in 1999, which mostly corrected a few errors in the 4.0 specification. HTML 4.0 attempted to correct some of the more egregious errors that 3.2 had allowed (encouraged?) designers to commit, particularly by introducing support for Cascading Style Sheets. But in fact the W3C has abandoned HTML as the default standard in favor of XML, a stricter, simplified subset of SGML. There will probably never be another HTML specification.
XHTML is the successor to HTML. The "X" stands for Extensible. XHTML 1.0 is a reformulation of HTML 4.01 within XML (Extensible Markup Language), which is far more rigorous, and is intended to start moving the creation of Web pages away from HTML. It was released earlier this year (January 2000) and is the most current standard for creating Web pages. It introduces some interesting changes in coding. For example, every element now has to be closed, including paragraphs, and empty elements such as BR are written in a self-closing form (<br />). Other tags, like the FONT tag, are left out of the Strict version of XHTML in favor of using Cascading Style Sheets to control all presentational elements.
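To see what the stricter closing rule means in practice, here is a minimal sketch in Python (my own illustration, not anything from the XHTML specification) that feeds markup to the XML parser in Python's standard library, xml.etree.ElementTree. The helper name is_well_formed is hypothetical. Note that this only checks XML well-formedness, which is the rule that catches unclosed tags; it does not validate a page against the XHTML 1.0 DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML (hypothetical helper).

    This checks well-formedness only (every element closed and properly
    nested); it does NOT validate against the XHTML 1.0 DTD.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Old-style HTML: the paragraph tags are left open, which HTML tolerated.
html_style = "<body><p>First paragraph<p>Second paragraph</body>"

# XHTML style: every paragraph is explicitly closed.
xhtml_style = "<body><p>First paragraph</p><p>Second paragraph</p></body>"

# Empty elements such as BR must also be closed, using the self-closing form.
line_break = "<p>Line one<br />Line two</p>"

for name, markup in [("html_style", html_style),
                     ("xhtml_style", xhtml_style),
                     ("line_break", line_break)]:
    print(name, is_well_formed(markup))
```

The old-style markup is rejected because the parser reaches the closing body tag while the paragraph elements are still open, which an HTML browser would quietly forgive but an XML parser will not.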
Now, while standards are wonderful, that does not mean that browsers follow them. No browser currently available is fully compliant with HTML 4.0, which is already two and a half years old. Support for Cascading Style Sheets (CSS), for instance, is spotty and incomplete in all browsers. Also, each browser (rendering engine) interprets the specifications in different ways, leading to the eternal complaint of pages looking different in different browsers. Plus, most browsers have tried to maintain backwards compatibility with older standards, which complicates things when a newer standard invalidates some aspect of an older standard.
As I mentioned before, Netscape 6, which is still in development, is claimed to be 100% compliant with HTML 4.0, XHTML 1.0, and CSS1, and partially compliant with CSS2. If they can pull it off, this would be wonderful for Web developers. But we have to wait and see what happens. Also, Netscape is not the only browser on the market. The market leader, Microsoft's Internet Explorer, has better standards support than Netscape among current browsers, but Microsoft appears to have dropped full compliance from its plans for IE and has received a lot of criticism from the Web developer community on that account. Netscape, meanwhile, has decided to drop backwards compatibility from its rendering engine in order to get a lean, efficient, standards-compliant browser. It is entirely possible, therefore, that many pages that work fine now will stop working in Netscape 6 because they use methods that are no longer acceptable.