Overview
XML is marked-up Unicode text. It is extensible, meaning anyone can make up their own tags and vocabularies. Below is an example of an XML document:
<dvd:DVD ISBN="0-7907-4261-6"
    xmlns:dvd="http://seattlecentral.org/faculty/sconge/DVD"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://seattlecentral.org/faculty/sconge/DVD DVD.xsd">
  <dvd:title>The Matrix</dvd:title>
  <dvd:studio>Warner Brothers</dvd:studio>
  <dvd:rating>R</dvd:rating>
  <dvd:genre>SciFi</dvd:genre>
  <dvd:format>Wide Screen</dvd:format>
  <dvd:purchaseDate>2003-08-05</dvd:purchaseDate>
  <dvd:purchasePrice>14.99</dvd:purchasePrice>
  <dvd:actors>
    <dvd:actor>Keanu Reeves</dvd:actor>
    <dvd:actor>Lawrence Fishburne</dvd:actor>
    <dvd:actor>Carrie Anne Moss</dvd:actor>
    <dvd:actor>Hugo Weaving</dvd:actor>
  </dvd:actors>
  <dvd:extras>
    <dvd:extra>Commentaries</dvd:extra>
    <dvd:extra>script</dvd:extra>
    <dvd:extra>games</dvd:extra>
  </dvd:extras>
  <dvd:description>The world is an elaborate computer simulation. People are batteries for the machines</dvd:description>
</dvd:DVD>
So why is XML important?
- XML is operating-system and application neutral
- It is human readable
- Because it has a strict structure, it is easy to write programs to digest and manipulate it
- It has become the preferred way to represent and store documents of all kinds
- It has become the preferred way to represent and transfer data between programs, especially on the web
- Because of the increasing number of documents in XML and the increasing amount of data transferred as XML, it has become crucial for databases to be able to read, produce, and store XML
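Because the structure is strict, a few lines of code are enough to digest a document like the DVD example in this section. The sketch below uses Python's standard xml.etree.ElementTree module on a shortened copy of that document. The helper names local_name and to_dict are made up for this illustration, and the approach assumes a flat record whose nested elements are simple lists.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the DVD document from this section.
DVD_XML = """<dvd:DVD ISBN="0-7907-4261-6"
    xmlns:dvd="http://seattlecentral.org/faculty/sconge/DVD">
  <dvd:title>The Matrix</dvd:title>
  <dvd:rating>R</dvd:rating>
  <dvd:actors>
    <dvd:actor>Keanu Reeves</dvd:actor>
    <dvd:actor>Lawrence Fishburne</dvd:actor>
  </dvd:actors>
</dvd:DVD>"""


def local_name(tag):
    """Drop the '{namespace-uri}' part ElementTree puts in front of a tag name."""
    return tag.rsplit("}", 1)[-1]


def to_dict(xml_text):
    """Turn a DVD document into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    record = dict(root.attrib)  # start with the attributes, e.g. ISBN
    for child in root:
        if len(child):  # the element has children: collect their text in a list
            record[local_name(child.tag)] = [c.text for c in child]
        else:
            record[local_name(child.tag)] = child.text
    return record


record = to_dict(DVD_XML)
print(record["title"])   # The Matrix
print(record["actors"])
```

The same loop works for any document with this shape, which is the practical payoff of the strict structure.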
XML must be "well-formed", which means it must follow these rules, among others:
- Element names cannot contain spaces and must start with a letter or an underscore
- Element names are case sensitive
- Every element must have a closing tag. An empty element may instead be self-closed: <element/>
- There must be exactly one "root" element that encloses all other elements
- All attribute values must be quoted
- All elements must be properly nested
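The rules above can be checked by handing text to any XML parser, since a conforming parser rejects input that is not well-formed. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name is_well_formed and the sample strings are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One sample that follows the rules, then one violation per rule.
samples = {
    "well-formed": '<dvd id="1"><title>The Matrix</title><extra/></dvd>',
    "name starts with digit": "<1dvd/>",
    "case mismatch": "<dvd><Title>The Matrix</title></dvd>",
    "missing closing tag": "<dvd><title>The Matrix</dvd>",
    "two root elements": "<dvd/><dvd/>",
    "unquoted attribute": "<dvd id=1><title>The Matrix</title></dvd>",
    "bad nesting": "<dvd><title><b>The Matrix</title></b></dvd>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted. Each of the others breaks exactly one rule from the list.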
Other aspects
XML documents have other aspects that need mention. A document usually begins with an XML declaration, for example <?xml version="1.0"?>. The declaration is not required, but it is recommended. If you use it, version="1.0" is a required attribute.
Namespaces
The "xmlns" attribute designates a namespace. A namespace serves two purposes. One is to group related items. The other is to distinguish, or "disambiguate", the document's elements, that is, to keep them distinct from elements in other documents that might have the same names. The dvd: prefix is an alias for that namespace.
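To see the disambiguation at work, the sketch below parses a small document in which two vocabularies both define a title element. The shelf element, the book namespace URI (http://example.org/book), and the book title are hypothetical and exist only for this illustration. The DVD namespace URI is the one used in this section. The code uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

DVD_NS = "http://seattlecentral.org/faculty/sconge/DVD"
BOOK_NS = "http://example.org/book"  # made-up namespace for the illustration

DOC = f"""<shelf xmlns:dvd="{DVD_NS}" xmlns:book="{BOOK_NS}">
  <dvd:title>The Matrix</dvd:title>
  <book:title>An Example Book</book:title>
</shelf>"""

root = ET.fromstring(DOC)

# The parser replaces each prefix with its namespace URI, so the two
# title elements end up with different full names.
tags = [child.tag for child in root]

# The prefix is only an alias: "d" works as well as "dvd" as long as it
# is mapped to the same URI.
dvd_titles = [e.text for e in root.findall("d:title", {"d": DVD_NS})]

print(tags)
print(dvd_titles)
```

The full names take the form {namespace-uri}title, which is why the two title elements never collide even though their local names are identical.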
Valid Documents and Schema
A document is well-formed if it follows the rules above. It is "valid" if it also conforms to a Schema or DTD (Document Type Definition). A Schema is an XML file that describes and limits the structure of another XML document. An XML document does not have to have an associated schema, but it can be "validated" to show that it conforms to one. Below is the schema for the DVD document.
<xsd:schema targetNamespace="http://seattlecentral.org/faculty/sconge/DVD"
    xmlns:dvd="http://seattlecentral.org/faculty/sconge/DVD"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified">
  <xsd:annotation>
    <xsd:documentation>This Schema validates DVD xml files.</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="DVD">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string" minOccurs="1" />
        <xsd:element name="studio" type="xsd:string" />
        <xsd:element name="rating">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="G" />
              <xsd:enumeration value="PG13" />
              <xsd:enumeration value="R" />
              <xsd:enumeration value="X" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="genre">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="Drama" />
              <xsd:enumeration value="Comedy" />
              <xsd:enumeration value="Mystery" />
              <xsd:enumeration value="Western" />
              <xsd:enumeration value="SciFi" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="format">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="Wide Screen" />
              <xsd:enumeration value="Standard" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="purchaseDate" type="xsd:date" />
        <xsd:element name="purchasePrice" type="xsd:float" />
        <xsd:element name="actors">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="actor" type="xsd:string" maxOccurs="unbounded" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="extras">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="extra" type="xsd:string" maxOccurs="unbounded" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="description" type="xsd:string" />
      </xsd:sequence>
      <xsd:attribute name="ISBN" type="xsd:string" use="required" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
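Because a schema is itself XML, a program can read it like any other document. The sketch below is not a schema validator (Python's standard library does not include one, and real validation needs a validating parser). It only pulls the enumerated values for one element out of a trimmed copy of the schema above and tests a value against them, to show what "limits the structure" means in practice. The function names allowed_values and rating_is_allowed are invented for this illustration.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
NS = {"xsd": XSD}

# Trimmed copy of the DVD schema: only title and rating are kept.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="DVD">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string" minOccurs="1" />
        <xsd:element name="rating">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="G" />
              <xsd:enumeration value="PG13" />
              <xsd:enumeration value="R" />
              <xsd:enumeration value="X" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def allowed_values(schema_text, element_name):
    """List the xsd:enumeration values declared directly for element_name."""
    root = ET.fromstring(schema_text)
    for el in root.iter("{%s}element" % XSD):
        if el.get("name") == element_name:
            enums = el.findall("xsd:simpleType/xsd:restriction/xsd:enumeration", NS)
            return [e.get("value") for e in enums]
    return []


def rating_is_allowed(rating):
    """Check one value against the rating enumeration only (not full validation)."""
    return rating in allowed_values(SCHEMA, "rating")


print(allowed_values(SCHEMA, "rating"))
print(rating_is_allowed("R"), rating_is_allowed("PG"))
```

A DVD document whose rating is PG would be well-formed but not valid, because PG is not one of the values the schema enumerates.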
XML in SQL Server
SQL Server 2005 has these XML features:
- It can store XML as a native data type and validate that data against a stored XML Schema
- It can output relational data from SQL as XML
- It can "shred" XML documents into relational columns and rows
- It can import and export XML documents
- It supports XPath and XQuery
- It supports a programming interface for XML with SQLXML
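The idea of "shredding" and its reverse can be sketched outside SQL Server. The example below does not use SQL Server or any of its features. It is an analogy in Python, using the standard sqlite3 and xml.etree.ElementTree modules, that turns the actors in a shortened DVD document into relational rows and then wraps rows back up as XML. The table name dvd_actor and the function names are invented for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"dvd": "http://seattlecentral.org/faculty/sconge/DVD"}

DVD_XML = """<dvd:DVD ISBN="0-7907-4261-6"
    xmlns:dvd="http://seattlecentral.org/faculty/sconge/DVD">
  <dvd:title>The Matrix</dvd:title>
  <dvd:actors>
    <dvd:actor>Keanu Reeves</dvd:actor>
    <dvd:actor>Hugo Weaving</dvd:actor>
  </dvd:actors>
</dvd:DVD>"""


def shred_actors(xml_text):
    """Shred: produce one (isbn, actor) row per actor element."""
    root = ET.fromstring(xml_text)
    isbn = root.get("ISBN")
    return [(isbn, a.text) for a in root.findall("dvd:actors/dvd:actor", NS)]


def rows_to_xml(rows):
    """Reverse direction: wrap relational rows in XML elements."""
    actors = ET.Element("actors")
    for isbn, name in rows:
        ET.SubElement(actors, "actor", {"isbn": isbn}).text = name
    return ET.tostring(actors, encoding="unicode")


# An in-memory SQLite database stands in for the relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dvd_actor (isbn TEXT, actor TEXT)")
conn.executemany("INSERT INTO dvd_actor VALUES (?, ?)", shred_actors(DVD_XML))
rows = conn.execute("SELECT isbn, actor FROM dvd_actor ORDER BY rowid").fetchall()

print(rows)
print(rows_to_xml(rows))
```

In a database with built-in XML support, both directions are handled by the server itself rather than by application code like this.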
A couple of notes: XPath is a language that lets you locate a particular value by following the path to a "node". XQuery extends XPath and lets you write complex queries against XML documents.
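Path-following can be tried without a database. Python's standard xml.etree.ElementTree module supports a limited subset of XPath (not the full language, and not XQuery), which is enough to show the idea on a shortened copy of the DVD document. The variable names are chosen for this illustration, and the text predicate in the last query assumes Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

NS = {"dvd": "http://seattlecentral.org/faculty/sconge/DVD"}

DVD_XML = """<dvd:DVD ISBN="0-7907-4261-6"
    xmlns:dvd="http://seattlecentral.org/faculty/sconge/DVD">
  <dvd:title>The Matrix</dvd:title>
  <dvd:actors>
    <dvd:actor>Keanu Reeves</dvd:actor>
    <dvd:actor>Lawrence Fishburne</dvd:actor>
    <dvd:actor>Hugo Weaving</dvd:actor>
  </dvd:actors>
  <dvd:extras>
    <dvd:extra>Commentaries</dvd:extra>
    <dvd:extra>script</dvd:extra>
  </dvd:extras>
</dvd:DVD>"""

root = ET.fromstring(DVD_XML)  # root is the dvd:DVD element

# Follow the path actors -> actor from the root; [1] selects the first actor.
first_actor = root.find("dvd:actors/dvd:actor[1]", NS).text

# ".//" searches at any depth below the current node.
extras = [e.text for e in root.findall(".//dvd:extra", NS)]

# A predicate can also test an element's text.
found = root.find(".//dvd:actor[.='Hugo Weaving']", NS)

print(first_actor)
print(extras)
print(found is not None)
```

Each path is written relative to the root element, and the dvd prefix in the paths is resolved through the NS mapping rather than through the prefix declared in the document.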