XML (Extensible Markup Language)
A structured markup language for describing text data so that it can be stored and transported across the internet.
- Properties
- Self-describing
- Not a programming language
- Carries data but does not display it
- Tags not predefined
- Requirements
- Should be well-formed
- Correct syntax
- Opening and closing tags match
- Tags properly nested
- Only legal characters used
- Should be valid according to DTD (Document Type Definition)
- Tags used correctly
- Tags all declared
- Attributes all declared
- Must also be well-formed
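As a rough illustration of the well-formedness rules above, the sketch below simply tries to parse a few strings with Python's standard `xml.etree.ElementTree`. The helper name `is_well_formed` and the sample documents are made up for this note. The standard-library parser is non-validating, so this checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule.
good = "<note><to>Ann</to><body>Hi</body></note>"
mismatched = "<note><to>Ann</body></note>"   # closing tag does not match
badly_nested = "<a><b></a></b>"              # tags overlap instead of nesting
illegal_char = "<note>1 < 2</note>"          # a bare '<' is not legal in text

for doc in (good, mismatched, badly_nested, illegal_char):
    print(is_well_formed(doc), doc)
```

Only the first document should be reported as well-formed; the others each break one of the rules listed above.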
Basic Syntax
<?xml version="1.0" ?>
<!DOCTYPE doctype [
<!-- declarations -->
<!ELEMENT element (nested)>
<!ELEMENT nested type>
<!ATTLIST element
attr type value>
]>
<doctype>
<element attr="attribute">
<nested>text &entity;</nested>
</element>
</doctype>
- Elements
- Text
<text>text</text>
- Empty
<empty/>
- Attributes
<element attr="attribute">...
- Often easier than nested tags for retrieving a component, changing the format, etc.
- Text
- Entities: symbols that are replaced by other text before the document is displayed, e.g. &lt; for <
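The pieces listed above (text elements, empty elements, attributes, entities) can be inspected with a small Python sketch using the standard `xml.etree.ElementTree` module. The sample document reuses the placeholder names from the syntax template and relies only on the predefined entities `&amp;` and `&lt;`; it is an illustration, not a complete tour of the API.

```python
import xml.etree.ElementTree as ET

# Sample document built from the placeholder names used in the notes.
doc = """
<element attr="attribute">
  <nested>fish &amp; chips &lt; 5</nested>
  <empty/>
</element>
""".strip()

root = ET.fromstring(doc)

nested = root.find("nested")   # text element
empty = root.find("empty")     # empty element

print(root.tag)                # name of the root element
print(root.get("attr"))        # attribute value, looked up by name
print(nested.text)             # entities are already replaced by the parser
print(empty.text, len(empty))  # no text and no children
```

Note that the attribute is read with a single lookup by name, while the nested text requires finding the child element first.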
Document Type
Describes the schema of an XML document, which can be viewed as a node-labeled tree.
- W3C recommendation
- Elements and attributes that can appear
- Which elements are child elements
- Data types
- DTD
- Verifies correctness
- Describes format
- Sharable
- Standard or user-defined
- Examples
- Technical report
- Novel
- Software manual
- etc.
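To make "verifies correctness" a little more concrete, here is a toy check in Python. It is not a DTD validator: the `SCHEMA` dictionary is a hypothetical, simplified stand-in for a DTD that only records which child elements and attributes each declared element may have, and it ignores order, cardinality and data types. The element names (`report`, `section`, `para`, ...) are invented for a technical-report document type.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: allowed children and attributes per element.
SCHEMA = {
    "report": {"children": {"title", "section"}, "attrs": {"id"}},
    "title": {"children": set(), "attrs": set()},
    "section": {"children": {"title", "para"}, "attrs": set()},
    "para": {"children": set(), "attrs": {"lang"}},
}


def violations(root):
    """Walk the tree and collect declaration violations as strings."""
    problems = []
    for node in root.iter():
        rule = SCHEMA.get(node.tag)
        if rule is None:
            problems.append(f"undeclared element <{node.tag}>")
            continue
        for name in node.attrib:
            if name not in rule["attrs"]:
                problems.append(f"undeclared attribute {name!r} on <{node.tag}>")
        for child in node:
            # Undeclared children are reported when the walk reaches them.
            if child.tag in SCHEMA and child.tag not in rule["children"]:
                problems.append(f"<{child.tag}> not allowed inside <{node.tag}>")
    return problems


valid = ET.fromstring(
    '<report id="r1"><title>T</title>'
    '<section><title>S</title><para lang="en">x</para></section></report>'
)
invalid = ET.fromstring('<report date="today"><para>x</para><footnote/></report>')

print(violations(valid))
print(violations(invalid))
```

Both sample documents are well-formed; only the second one breaks the declarations, which is the distinction between well-formedness and validity.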
Standard Document Types
- Text
- TEI, DocBook, NITF
- Data
- CML, AIML
- Mixed
- Often custom, e.g. bug reports
Twig Pattern Query
Returns the parts of an XML document whose structure matches a small tree-shaped (twig) query.
Given a twig query $$q$$ with $$k$$ nodes and an XML document $$d$$, a match is a set of nodes $$n_1, \dots, n_k$$ of $$d$$ such that
- $$n_i$$ has the same label as the $$i$$-th node of $$q$$
- The edge relationships are satisfied
- Edge
- /: parent-child
- //: ancestor-descendant
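The definition above can be turned into a naive matcher. The Python sketch below is a brute-force illustration, not an efficient twig-join algorithm: `QNode`, `candidates`, `embed` and `twig_matches` are names invented for this note, and the sample document is hypothetical. Each query node stores its label and the edge type (`/` or `//`) linking it to its parent.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from itertools import product


@dataclass
class QNode:
    label: str
    axis: str = "/"   # edge to the parent query node: "/" or "//"
    children: list = field(default_factory=list)


def candidates(node, axis):
    """Document nodes reachable from `node` over the given edge type."""
    if axis == "/":
        return list(node)                             # direct children
    return [d for d in node.iter() if d is not node]  # proper descendants


def embed(q, n):
    """All matches of the query subtree at `q` with `q` mapped to `n`.

    Each match lists document nodes in pre-order of the query.
    """
    if q.label != n.tag:
        return []
    per_child = []
    for qc in q.children:
        found = [m for d in candidates(n, qc.axis) for m in embed(qc, d)]
        if not found:
            return []  # one query branch has no image, so q fails at n
        per_child.append(found)
    # Pick one match per query child and concatenate them.
    return [[n] + [x for part in combo for x in part]
            for combo in product(*per_child)]


def twig_matches(query, root):
    """Try every document node as the image of the query root."""
    return [m for n in root.iter() for m in embed(query, n)]


root = ET.fromstring(
    "<book><title>XML</title><chapter><title>Intro</title>"
    "<section><title>Basics</title></section></chapter></book>"
)

# book with a child title and a descendant section
q1 = QNode("book", children=[QNode("title", "/"), QNode("section", "//")])
# book with any descendant title
q2 = QNode("book", children=[QNode("title", "//")])

print(len(twig_matches(q1, root)))
print([m[1].text for m in twig_matches(q2, root)])
```

With `/` only the title directly under `book` qualifies, whereas `//` also reaches the titles nested inside `chapter` and `section`.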