Extensible Markup Language (XML) - Getting Started

Extensible Markup Language was fairly new to web technologies when I first heard of it in 2000; at the time, the version 1.0 specification from the W3C was only two years old. Since its introduction it has found many applications. XML has had two applications in my own development work. The first is SOAP and web services, and the other is markup describing data. For example, if I had employee information, I would describe it with tags: firstname, lastname, address, city, state, and zipcode.
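
As a minimal sketch of that idea, an employee record using those tags might look like the following. The root element name "employee" and all of the values are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative record only: the element names follow the tags listed
     above, while the root name and the values are invented -->
<employee>
  <firstname>Jane</firstname>
  <lastname>Doe</lastname>
  <address>123 Main Street</address>
  <city>Springfield</city>
  <state>IL</state>
  <zipcode>62701</zipcode>
</employee>
```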

I won’t get into SOAP and web services here; I’ll do that another time. To start with XML, it is best to start with the ideas behind XML itself, XSLT, DTD, and XML Schema.

To get into the importance of XML, let’s first talk about well-formed markup versus valid markup. Well-formed markup means, first, that every start tag has a matching end tag and, second, that all markup is properly nested. Here are a couple of examples:

Example: Missing Closing Tag
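
A minimal sketch of this error, assuming a hypothetical employee record with invented values. The fragment is deliberately not well-formed, so a parser should reject it.

```xml
<!-- NOT well-formed: <lastname> is opened but never closed, so the
     parser meets </employee> while still inside lastname -->
<employee>
  <firstname>Jane</firstname>
  <lastname>Doe
</employee>
```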

Example: Improper Nested Tags
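
A minimal sketch of this error, again with a hypothetical employee record. Here every start tag has an end tag, but two of them overlap instead of nesting.

```xml
<!-- NOT well-formed: <city> is opened inside <address>, but <address>
     is closed first, so the two elements overlap -->
<employee>
  <address>123 Main Street <city>Springfield</address></city>
</employee>
```

Closing the inner element first, `<address>123 Main Street <city>Springfield</city></address>`, would make the fragment well-formed.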

Valid markup goes one step further: the document follows a set of rules that say how the markup may be used, such as which elements are allowed, in what order, and with which children. These rules enforce constraints. For instance, the W3C defines such rules for HTML. The two popular ones are Strict and Transitional. There are other document types, but these seem to be the most widely used.

Now that we understand these two important aspects of XML, the next topics are XML Schema and DTD. Both are used to validate a document, but they differ. An XML Schema is itself written in XML and can describe data types as well as structure, while a DTD (Document Type Definition) has a narrower use: it enforces constraints on structure only.

Example: DTD
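
A minimal sketch of a DTD for the employee record, written here as an internal subset so that the rules and a sample document fit in one listing. The root name "employee" and the sample values are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- The DTD sits between the square brackets of the DOCTYPE declaration -->
<!DOCTYPE employee [
  <!-- employee must contain exactly these children, in this order -->
  <!ELEMENT employee (firstname, lastname, address, city, state, zipcode)>
  <!-- each child holds parsed character data (text) only -->
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT lastname  (#PCDATA)>
  <!ELEMENT address   (#PCDATA)>
  <!ELEMENT city      (#PCDATA)>
  <!ELEMENT state     (#PCDATA)>
  <!ELEMENT zipcode   (#PCDATA)>
]>
<employee>
  <firstname>Jane</firstname>
  <lastname>Doe</lastname>
  <address>123 Main Street</address>
  <city>Springfield</city>
  <state>IL</state>
  <zipcode>62701</zipcode>
</employee>
```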

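Creating a DTD is simple. It lists the elements that are expected in the XML document. The Employee element has children, and each child is specified as "#PCDATA" (parsed character data). This tells the parser that the text inside the tag will be parsed. There is no "#CDATA" content type for elements. CDATA, written without the "#", appears in a DTD as an attribute type meaning plain character data, and a CDATA section in a document marks text that the parser will not treat as markup.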

Example: XML Schema
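
A minimal sketch of an XML Schema for the same employee record. The type name "EmployeeType", the root name "employee", and the choice of xs:string for every field are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named complex type: the allowed child elements, in order -->
  <xs:complexType name="EmployeeType">
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname"  type="xs:string"/>
      <xs:element name="address"   type="xs:string"/>
      <xs:element name="city"      type="xs:string"/>
      <xs:element name="state"     type="xs:string"/>
      <xs:element name="zipcode"   type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Root element, declared last and bound to the type above -->
  <xs:element name="employee" type="EmployeeType"/>

</xs:schema>
```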

As you can see, XML Schemas are more complex. There is a definition for a sequence of elements, and the root element is defined at the end. Another point to notice is the definition of complex types. A complex type describes an element that contains child elements or attributes and enforces constraints on that content.

The last component of XML is the XML stylesheet (XSLT, Extensible Stylesheet Language Transformations). If you look at the example below, HTML markup is embedded in the stylesheet. The “xsl:for-each” tag has a “select” attribute that chooses which data will be placed in its position as the dataset is rendered and styled for the browser.

Example: XML Stylesheet (XSLT)
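
A minimal sketch of a stylesheet that renders a list of employees as an HTML table. It assumes a hypothetical input document whose root is "employees" and which contains repeated "employee" elements with the child tags used earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Match the document root and emit the HTML page around the data -->
  <xsl:template match="/">
    <html>
      <body>
        <h2>Employees</h2>
        <table border="1">
          <tr>
            <th>First name</th>
            <th>Last name</th>
            <th>City</th>
          </tr>
          <!-- select picks the nodes to loop over: one table row
               is produced per employee element -->
          <xsl:for-each select="employees/employee">
            <tr>
              <td><xsl:value-of select="firstname"/></td>
              <td><xsl:value-of select="lastname"/></td>
              <td><xsl:value-of select="city"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```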

What is shown above are the basic concepts of XML. These documents can get long and complicated, so I would recommend looking for software to help you manage them. One vendor I would recommend is Altova. It offers a trial period, though long-term use requires a purchase. Altova was very handy in speeding up project completion.