MAPXML Primer


Goals:

MAPXML is a programmer's tool for mapping XML source into other formats.

Installation and Dependency:

MAPXML requires the following Python libraries to be available for import:
types, string, rparsexml, time


MAPXML is similar in some respects to XSL (the Extensible Stylesheet Language) or XSLT (XSL Transformations). The top-level interface for specifying a mapping is a MapController instance. A MapController maps one XML format to another by recursive tag translation.

BASIC USAGE

In basic usage, initialize a MapController

    M = MapController()

and assign substitution strings or MapNodes for each tag. (These substitutions are covered in more detail in the sections below.)

Then process text using

    translatedtext = M.process(originaltext)

Every tag that appears in originaltext should have an assigned substitution string or MapNode in M (directly, or through a default substitution for unknown tags). If a tag has none, you will get errors.

THE TOP LEVEL SUBSTITUTION

Every MapController must have a top-level substitution that defines where "%(__content__)s" is placed, for example

    M[''] = ''' <html>
         %(__content__)s
         </BODY>
         </html>
         '''

Here the empty tag '' signifies the top level. In this particular example the start of the <BODY> tag might be introduced in the translation of another tag (like <DOCUMENTTITLE>).

SUBSTITUTION STRINGS:

You can either assign a substitution string to a tag name (for tags with content) or assign a MapNode for tags which either do not have content or require other special handling (such as attribute defaults and transforms).

FOR TAGS WITH CONTENT, you can directly assign a string (which internally is converted into a MapNode), for example

    M["H1"] = "<br><br><font size=+3><b>%(__content__)s</b></font><br>"

In this case the H1 tag has content (<H1>this is the content</H1>) but no attributes that affect the translation. Note that the marker "%(__content__)s" indicates the location of the recursively translated content. Also note the "%" and the "s" in this marker: both are required and easy to omit (the "%" is the format operator, and the "s" indicates that the value is formatted as a string).

    <H1>This is the content</H1>

translates to

    <br><br><font size=+3><b>This is the content</b></font><br>
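
The marker syntax matches Python's dictionary-based "%" string formatting. The following is a minimal sketch, assuming that substitution strings behave like ordinary "%" templates filled from a dictionary; it does not call MAPXML itself, and the names template, values, and broken are illustrative only.

```python
# Assumption: a substitution string behaves like a plain Python "%" template
# that is filled from a dictionary of named values.
template = "<br><br><font size=+3><b>%(__content__)s</b></font><br>"
values = {"__content__": "This is the content"}
result = template % values
print(result)

# Leaving out the trailing "s" makes the template invalid: Python reads the
# next character ("<") as the conversion type and rejects it.
broken = "<b>%(__content__)</b>"
try:
    broken % values
    failed = False
except ValueError:
    failed = True
print(failed)
```

A missing "s" is therefore worth checking first when a substitution string raises a formatting error.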

FOR TAGS WITHOUT CONTENT, or tags which require special handling, assign an explicit MapNode

    M["AUTHOR"] = MapNode(None, "<em>Aaron Watters</em>")

The AUTHOR tag has no content and no attributes that affect translation.

    <AUTHOR/>

translates to

    <em>Aaron Watters</em>

CONTAINED TAGS ARE RECURSIVELY TRANSLATED. Using the translations declared above

    <H1><AUTHOR/></H1>

becomes

    <br><br><font size=+3><b><em>Aaron Watters</em></b></font><br>
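
Recursive translation can be pictured with a short stand-alone sketch. It is not MAPXML's implementation: it parses with the standard library's xml.etree.ElementTree instead of rparsexml, and the function translate_sketch and the plain dictionary of templates are hypothetical stand-ins for MapController and MapNode.

```python
import xml.etree.ElementTree as ET

def translate_sketch(element, templates):
    """Translate an element by translating its children first (hypothetical helper)."""
    parts = [element.text or ""]
    for child in element:
        parts.append(translate_sketch(child, templates))
        parts.append(child.tail or "")  # text that follows the child tag
    values = dict(element.attrib)       # attributes become named values
    values["__content__"] = "".join(parts)
    return templates[element.tag] % values

templates = {
    "H1": "<br><br><font size=+3><b>%(__content__)s</b></font><br>",
    "AUTHOR": "<em>Aaron Watters</em>",
}
result = translate_sketch(ET.fromstring("<H1><AUTHOR/></H1>"), templates)
print(result)
```

The inner AUTHOR tag is translated first, and its output becomes the content of the enclosing H1 substitution.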

FOR TAGS WITH ATTRIBUTES include the appropriate attributes in the substitution string, as required. For example,

    M["RETIRED"] = "<em>%(TITLE)s</em> <b>%(__content__)s</b> <em>Emeritas</em>"

The RETIRED tag has content and one attribute (TITLE) which is required and affects translation.

    <RETIRED TITLE="Dr.">Carl Fungus</RETIRED>

becomes

    <em>Dr.</em> <b>Carl Fungus</b> <em>Emeritas</em>

Similarly, for

    M["EXCLAMATION"] = '<FONT COLOR="RED"><B>%(TEXT)s</B>, <EM>%(RECIPIENT)s</EM></FONT>'

EXCLAMATION has no content and two attributes (TEXT and RECIPIENT) which are required and affect formatting.

    <EXCLAMATION TEXT="Holy rusty metal" RECIPIENT="Batman"/>

becomes

    <FONT COLOR="RED"><B>Holy rusty metal</B>, <EM>Batman</EM></FONT>

ATTRIBUTE DEFAULTS can be assigned using a MapNode assigned to a tag. For example, an alternate RETIRED tag might assign "Dr." as the default value of TITLE.

    R = MapNode("<em>%(TITLE)s</em> <b>%(__content__)s</b> <em>Emeritas</em>")
    R.default["TITLE"] = "Dr."
    M["RETIRED"] =  R

Then using AUTHOR and RETIRED

    <RETIRED><AUTHOR/></RETIRED>

becomes

    <em>Dr.</em> <b><em>Aaron Watters</em></b> <em>Emeritas</em>

(Note the default "Dr." substitution). I wish ;(.
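
One way to think about attribute defaults is as a dictionary that the tag's actual attributes override before the template is filled. The sketch below assumes that behaviour; fill_with_defaults is a hypothetical helper, not part of MAPXML.

```python
def fill_with_defaults(template, attributes, content, defaults):
    """Fill a "%" template, letting real attributes override defaults (hypothetical helper)."""
    values = dict(defaults)    # start from the defaults
    values.update(attributes)  # attributes present on the tag take precedence
    values["__content__"] = content
    return template % values

retired = "<em>%(TITLE)s</em> <b>%(__content__)s</b> <em>Emeritas</em>"
defaults = {"TITLE": "Dr."}

with_default = fill_with_defaults(retired, {}, "Carl Fungus", defaults)
with_attribute = fill_with_defaults(retired, {"TITLE": "Prof."}, "Carl Fungus", defaults)
print(with_default)
print(with_attribute)
```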

ATTRIBUTE TRANSFORMS can be assigned using a MapNode. For example to uppercase the titles of all retirees:

    R = MapNode("<em>%(TITLE)s</em> <b>%(__content__)s</b> <em>Emeritas</em>")
    from string import upper
    R.addTransform("TITLE", upper)
    M["RETIRED"] =  R

Then

    <RETIRED TITLE="Dr.">Carl Fungus</RETIRED>

becomes

    <em>DR.</em> <b>Carl Fungus</b> <em>Emeritas</em>
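
An attribute transform can be sketched as a function applied to one attribute's value before substitution. The helper fill_with_transforms below is hypothetical and only illustrates the idea. Note that string.upper exists only in Python 2; in Python 3 the equivalent callable is str.upper, which this sketch uses.

```python
def fill_with_transforms(template, attributes, content, transforms):
    """Apply per-attribute functions, then fill the "%" template (hypothetical helper)."""
    values = dict(attributes)
    for name, func in transforms.items():
        if name in values:
            values[name] = func(values[name])
    values["__content__"] = content
    return template % values

retired = "<em>%(TITLE)s</em> <b>%(__content__)s</b> <em>Emeritas</em>"
result = fill_with_transforms(
    retired, {"TITLE": "Dr."}, "Carl Fungus", {"TITLE": str.upper}
)
print(result)
```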

MapNode.transformContent(transform) applies a transformation to the recursively generated content of a tag (be careful when using this one!).

MapNodes can alter the translation definitions for tags encountered in their content. For example

    M["Title"] = "<H1>%(__content__)s</H1>"
    ...
    C = MapNode("<hr>Chapter<hr> %(__content__)s")
    C["Title"] = "<H3>%(__content__)s</H3>"
    M["Chapter"] = C

Then a title outside of the Chapter context defaults to H1 but a title inside a Chapter context defaults to H3.
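
Context-dependent definitions can be modelled as a lookup table that is extended while translating the content of a particular tag. The sketch below is an assumption about the general technique, not MAPXML's code: translate_scoped, the overrides dictionary, and the Doc wrapper tag are hypothetical, and parsing uses xml.etree.ElementTree rather than rparsexml.

```python
import xml.etree.ElementTree as ET

def translate_scoped(element, templates, overrides):
    """Translate recursively; a tag may replace templates used inside its content."""
    inner = dict(templates)
    inner.update(overrides.get(element.tag, {}))  # applies to the children only
    parts = [element.text or ""]
    for child in element:
        parts.append(translate_scoped(child, inner, overrides))
        parts.append(child.tail or "")
    values = dict(element.attrib)
    values["__content__"] = "".join(parts)
    return templates[element.tag] % values

templates = {
    "Doc": "%(__content__)s",
    "Title": "<H1>%(__content__)s</H1>",
    "Chapter": "<hr>Chapter<hr> %(__content__)s",
}
overrides = {"Chapter": {"Title": "<H3>%(__content__)s</H3>"}}

source = "<Doc><Title>A</Title><Chapter><Title>B</Title></Chapter></Doc>"
result = translate_scoped(ET.fromstring(source), templates, overrides)
print(result)
```

The first Title is outside any Chapter and uses the H1 definition; the second is inside a Chapter and picks up the H3 definition.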

A "TAG NAME" OF None provides a "default" substitution that applies to unknown tags. Assign one of the following, as in

    M[None] = "" # erase unknown tags
    M[None] = "%(__content__)s" # echo content for unknown tags
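
The effect of the two defaults can be compared with a small stand-alone sketch. It assumes that a default is simply the template used when a tag name has no entry of its own; translate_with_default is a hypothetical helper built on xml.etree.ElementTree, not MAPXML's implementation.

```python
import xml.etree.ElementTree as ET

def translate_with_default(element, templates):
    """Translate recursively, falling back to templates[None] for unknown tags."""
    parts = [element.text or ""]
    for child in element:
        parts.append(translate_with_default(child, templates))
        parts.append(child.tail or "")
    values = dict(element.attrib)
    values["__content__"] = "".join(parts)
    template = templates.get(element.tag, templates[None])
    return template % values

tree = ET.fromstring("<H1>Hi <X>there</X></H1>")
h1 = "<b>%(__content__)s</b>"

erased = translate_with_default(tree, {"H1": h1, None: ""})
echoed = translate_with_default(tree, {"H1": h1, None: "%(__content__)s"})
print(erased)
print(echoed)
```

With the empty default the unknown X tag and its content disappear; with the echoing default only the tag itself is dropped and its content is kept.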