XML (Extensible Markup Language)

A markup language for structuring text data so that it can be stored and transported, for example across the internet.

  • Properties
    • Self-describing
    • Not a programming language
    • Does not display data on its own; it only describes and carries it
    • Tags are not predefined; authors define their own
  • Requirements
    • Must be well-formed
      • Correct syntax
      • Start and end tags match
      • Tags properly nested
      • Characters legal
    • May additionally be valid against a DTD (Document Type Definition)
      • Tags used correctly
      • Tags all declared
      • Attributes all declared
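
The well-formedness half of these requirements can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for illustration. Note that this parser does not validate against a DTD, so it only covers the first requirement.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, bad nesting, illegal characters, etc.
        return False


print(is_well_formed("<a><b>ok</b></a>"))   # tags match and nest
print(is_well_formed("<a><b>bad</a></b>"))  # tags overlap instead of nesting
print(is_well_formed("<a>unclosed"))        # end tag missing
```

Checking validity (declared tags and attributes, correct usage) needs a validating parser, which the standard library module above is not.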

Basic Syntax

<?xml version="1.0" ?>
<!DOCTYPE doctype [
    <!-- declarations -->
    <!ELEMENT doctype (element)>
    <!ELEMENT element (nested)>
    <!ELEMENT nested (#PCDATA)>
    <!ATTLIST element
        attr CDATA #REQUIRED>
    <!ENTITY entity "replacement text">
]>
<doctype>
    <element attr="attribute">
        <nested>text &entity;</nested>
    </element>
</doctype>
  • Elements
    • Text
      • <text>text</text>
    • Empty
      • <empty/>
    • Attributes
      • <element attr="attribute">...
      • Often easier than child elements for selecting a component, changing its formatting, etc.
  • Entities: named references (e.g. &entity;) that the parser replaces with other text before the document is displayed
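
To see how elements, attributes, and entities look to a program, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The document is a variant of the syntax example with an `<empty/>` element added; the entity declaration and its replacement text are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document; the entity value "replacement text" is illustrative.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doctype [
    <!ENTITY entity "replacement text">
]>
<doctype>
    <element attr="attribute">
        <nested>text &entity;</nested>
    </element>
    <empty/>
</doctype>"""

root = ET.fromstring(DOC)
element = root.find("element")
nested = element.find("nested")

attr_value = element.get("attr")      # attribute looked up by name
nested_text = nested.text             # entity reference is already expanded
empty_text = root.find("empty").text  # an empty element carries no text

print(attr_value)
print(nested_text)
print(empty_text)
```

The parser expands the internal entity before the application sees the text, which is the "replaced before displayed" behaviour described above.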

Document Type

Describes the schema of an XML document, which can itself be viewed as a node-labeled tree.

  • W3C recommendation
    • Elements and attributes that can appear
    • Which elements are child elements
    • Data types
  • DTD
    • Verifies correctness
    • Describes format
    • Sharable
    • Standard or user-defined
  • Examples
    • Technical report
    • Novel
    • Software manual
    • etc.

Standard Document Types

  • Text
    • TEI, DocBook, NITF
  • Data
    • CML, AIML
  • Mixed
    • Often custom, e.g. bug reports

Twig Pattern Query

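A twig pattern is a small tree-shaped query. It returns the parts of the XML document whose structure matches the pattern.

Given a twig query $$q$$ with $$k$$ nodes and an XML document $$d$$, a match is a sequence of nodes $$n_1, \dots, n_k$$ in $$d$$ such that:

  1. $$n_i$$ has the same label as the $$i$$-th node of $$q$$
  2. Every edge of $$q$$ is satisfied by the corresponding pair of nodes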

  • Edge
    • /: parent-child
    • //: ancestor-descendant
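
The definition above can be turned into a naive matcher. The sketch below is a brute-force illustration, not an optimized twig-join algorithm; `QNode`, `candidates`, `matches_at`, and `twig_matches` are hypothetical names, and the sample document is invented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from itertools import product


@dataclass
class QNode:
    """One node of a twig query (hypothetical representation)."""
    label: str
    # Each entry is (axis, QNode), where axis is "/" or "//".
    children: list = field(default_factory=list)


def candidates(elem, axis):
    """Document nodes reachable from `elem` over one query edge."""
    if axis == "/":
        return list(elem)  # parent-child: direct children only
    # ancestor-descendant: every proper descendant, in document order
    return [d for d in elem.iter() if d is not elem]


def matches_at(q, elem):
    """All matches of the twig rooted at `q` with `q` mapped to `elem`.

    Each match lists the matched elements in preorder of the query.
    """
    if elem.tag != q.label:  # condition 1: labels agree
        return []
    per_child = []
    for axis, child in q.children:  # condition 2: every edge is satisfied
        found = [m for c in candidates(elem, axis) for m in matches_at(child, c)]
        if not found:
            return []
        per_child.append(found)
    return [[elem] + [n for part in combo for n in part]
            for combo in product(*per_child)]


def twig_matches(q, root):
    """Try every document node as the image of the query root."""
    return [m for e in root.iter() for m in matches_at(q, e)]


doc = ET.fromstring(
    "<book><title>XML</title>"
    "<chapter><section><title>Twigs</title></section></chapter></book>"
)

# book with a child title and a descendant section
q1 = QNode("book", [("/", QNode("title")), ("//", QNode("section"))])
# book with any descendant title
q2 = QNode("book", [("//", QNode("title"))])

print(len(twig_matches(q1, doc)))
print([m[1].text for m in twig_matches(q2, doc)])
```

The first query has a single match because only one `title` is a direct child of `book`; the second finds both titles because `//` also reaches the one nested under `section`.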
