What Is XML?
XML stands for Extensible Markup Language (often written as
eXtensible Markup Language to justify the acronym). XML is a
set of rules for defining semantic tags that break a document
into parts and identify those parts. It is a meta-markup
language: it defines the syntax in which other domain-specific,
semantic, structured markup languages are written.
XML Is a Meta-Markup Language
The first thing you need to understand about XML is that it
isn’t just another markup language like the Hypertext Markup
Language (HTML) or troff. These languages define a fixed set
of tags that describe a fixed number of elements. If the markup
language you use doesn’t contain the tag you need, you’re
out of luck. You can wait for the next version of the markup
language, hoping that it includes the tag you need, but then
you’re at the mercy of what the vendor chooses to
include.
XML, however, is a meta-markup language. It’s a language
in which you make up the tags you need as you go along.
These tags must be organized according to certain general
principles, but they’re quite flexible in their meaning. For
instance, if you’re working on genealogy and need to describe
people, births, deaths, burial sites, families, marriages,
divorces, and so on, you can create tags for each of these.
You don’t have to force your data to fit into paragraphs, list
items, strong emphasis, or other very general categories.
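As a minimal sketch of this idea, the snippet below embeds a small genealogy record in Python and reads it with the standard library's generic XML parser. The tag names (FAMILY, PERSON, NAME, BIRTH, DEATH, BURIAL_SITE), the ID attribute, and the sample data are all invented for illustration; nothing in XML itself prescribes them.

```python
import xml.etree.ElementTree as ET

# Hypothetical genealogy markup: every tag name here is made up for this example.
GENEALOGY_XML = """<FAMILY>
  <PERSON ID="p1">
    <NAME>Ada Example</NAME>
    <BIRTH>1850</BIRTH>
    <DEATH>1912</DEATH>
    <BURIAL_SITE>Hillside Cemetery</BURIAL_SITE>
  </PERSON>
  <PERSON ID="p2">
    <NAME>Ben Example</NAME>
    <BIRTH>1848</BIRTH>
  </PERSON>
</FAMILY>
"""

def list_people(xml_text):
    """Return (name, birth year) pairs for each PERSON element."""
    root = ET.fromstring(xml_text)
    return [(p.findtext("NAME"), p.findtext("BIRTH")) for p in root.iter("PERSON")]

print(list_people(GENEALOGY_XML))
```

The parser needs no advance knowledge of PERSON or BURIAL_SITE; it only requires that the document follow XML's general rules, such as properly nested and closed tags.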
The tags you create can be documented in a Document Type Definition (DTD).
You’ll learn more about DTDs in Part II of this book. For now, think of a DTD as a
vocabulary and a syntax for certain kinds of documents. For example, the MOL.DTD
in Peter Murray-Rust’s Chemical Markup Language (CML) describes a vocabulary
and a syntax for the molecular sciences: chemistry, crystallography, solid state
physics, and the like. It includes tags for atoms, molecules, bonds, spectra, and so
on. This DTD can be shared by many different people in the molecular sciences
field. Other DTDs are available for other fields, and you can also create your own.
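To make the idea of a vocabulary plus a syntax concrete, here is a sketch of a tiny DTD embedded in a document. The element names are hypothetical and are not taken from MOL.DTD or CML. Note one assumption about the tooling: Python's standard xml.etree.ElementTree parser reads past the declarations but does not validate against them, so this example only confirms that the document is well-formed. A validating parser would be needed to enforce the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: a MOLECULE holds a NAME followed by one or more ATOMs.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE MOLECULE [
  <!ELEMENT MOLECULE (NAME, ATOM+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT ATOM (#PCDATA)>
  <!ATTLIST ATOM COUNT CDATA #REQUIRED>
]>
<MOLECULE>
  <NAME>Water</NAME>
  <ATOM COUNT="2">H</ATOM>
  <ATOM COUNT="1">O</ATOM>
</MOLECULE>
"""

def atom_counts(xml_text):
    """Map each ATOM's symbol to its COUNT attribute as an integer."""
    root = ET.fromstring(xml_text)
    return {atom.text: int(atom.get("COUNT")) for atom in root.findall("ATOM")}

print(atom_counts(DOCUMENT))
```

The ELEMENT declarations supply the vocabulary (which tags exist) and the syntax (a MOLECULE must contain a NAME and at least one ATOM, in that order).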
XML defines a meta-syntax that domain-specific markup languages like MusicML,
MathML, and CML must follow. If an application understands this meta-syntax, it
can automatically parse all the languages built on it. A
browser does not need to know in advance each and every tag that might be used
by thousands of different markup languages. Instead it discovers the tags used by
any given document as it reads the document or its DTD. The detailed instructions
about how to display the content of these tags are provided in a separate style
sheet that is attached to the document.
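The point about discovering tags while reading can be sketched with a generic parser. The function below knows nothing about any particular vocabulary; it simply reports the tags it encounters. Both sample documents use invented tag names, not the real MusicML or CML vocabularies.

```python
import xml.etree.ElementTree as ET

def discover_tags(xml_text):
    """Return the distinct tag names in a document, in order of first appearance."""
    seen = []
    for element in ET.fromstring(xml_text).iter():
        if element.tag not in seen:
            seen.append(element.tag)
    return seen

# Two unrelated, made-up vocabularies handled by the same code.
MUSIC = "<SCORE><MEASURE><NOTE>C4</NOTE><NOTE>E4</NOTE></MEASURE></SCORE>"
CHEMISTRY = "<MOLECULE><ATOM>H</ATOM><ATOM>H</ATOM><BOND>single</BOND></MOLECULE>"

print(discover_tags(MUSIC))      # ['SCORE', 'MEASURE', 'NOTE']
print(discover_tags(CHEMISTRY))  # ['MOLECULE', 'ATOM', 'BOND']
```

The same function handles both documents because each follows the shared XML syntax. What the tags mean, and how they should look on screen, is left to whatever consumes the result, such as a style sheet.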
For example, consider Schrödinger’s equation:
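Written in LaTeX notation, the time-dependent form for a single particle moving in one dimension is shown below. The choice of this particular form, and the symbols psi (wave function), V (potential), and m (mass), are assumptions made for illustration.

```latex
i\hbar \frac{\partial \psi(x,t)}{\partial t}
  = -\frac{\hbar^{2}}{2m} \frac{\partial^{2} \psi(x,t)}{\partial x^{2}}
  + V(x)\,\psi(x,t)
```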
Scientific papers are full of equations like this, but scientists have been waiting
eight years for the browser vendors to support the tags needed to write even the
most basic math. Musicians are in a similar bind, since Netscape Navigator and
Internet Explorer don’t support sheet music.
XML means you don’t have to wait for browser vendors to catch up with what you
want to do. You can invent the tags you need, when you need them, and tell the
browsers how to display these tags.