
#####
Howto
#####

XIST is an extensible HTML/XML generator written in Python. It was
developed as a replacement for an HTML preprocessor named HSC and
borrows some features and ideas from it. It also borrows the basic
ideas (XML/HTML elements as Python objects) from HTMLgen or HyperText.

(If you're impatient, there's also a list of examples that shows what
can be done with XIST.)


========
Overview
========

XIST can be used as a compiler that reads an input XML file and
generates a transformed output file, or it can be used for generating
XML dynamically inside a web server (but note that handling object
trees is slower than simply sending string fragments). In either case
generating the final HTML or XML output requires the following three
steps:

  * Generating a source XML tree: This can be done either by parsing an
    XML file, or by directly constructing the tree (as HTMLgen and
    HyperText do) as a tree of Python objects. XIST provides a very
    natural and pythonic API for that.
  * Converting the source tree into a target tree: This target tree can
    be an HTML tree, an SVG tree, an XSL-FO tree or any other XML tree
    you like. Every node class provides a convert method for performing
    this conversion. For your own XML element types you have to define
    your own element classes and implement an appropriate convert
    method. This is possible for processing instructions and entity
    references too.
  * Publishing the target tree: For generating the final output a
    Publisher object is used that generates the encoded byte string
    fragments that can be written to an output stream (or yielded from
    a WSGI application, etc.).
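
The three steps above can be sketched in a few lines of plain Python 3. This is not XIST code: the classes Text, Element, b and cool below are tiny stand-ins written only for this illustration, and their methods take fewer arguments than the real ones described later in this document.

```python
# Toy stand-ins for the three steps; none of these classes are XIST's own.
from html import escape


class Text:
    def __init__(self, value):
        self.value = str(value)

    def convert(self):
        return self  # text is already in its final form

    def publish(self):
        yield escape(self.value)


class Element:
    name = None  # set by subclasses

    def __init__(self, *content):
        # Plain strings become Text nodes; nodes are kept as they are.
        self.content = [c if hasattr(c, "publish") else Text(c) for c in content]

    def convert(self):
        # Default: keep the element type and convert the children.
        return type(self)(*(child.convert() for child in self.content))

    def publish(self):
        yield "<%s>" % self.name
        for child in self.content:
            yield from child.publish()
        yield "</%s>" % self.name


class b(Element):
    name = "b"


class cool(Element):
    name = "cool"

    def convert(self):
        # Map the source vocabulary to the target vocabulary.
        return b(*self.content, " is cool!").convert()


source = cool("Python")                              # step 1: build the source tree
target = source.convert()                            # step 2: convert to the target tree
output = "".join(target.publish()).encode("utf-8")   # step 3: publish as bytes
print(output)
```

Keeping the steps separate means the same source tree can be converted and published more than once.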


======================
Constructing XML trees
======================

Like any other XML tree API, XIST provides the usual classes:

  * Element for XML elements;
  * Attr for attributes;
  * Attrs for attribute mappings;
  * Text for text data;
  * Frag for document fragments (a Frag object is simply a list of
    nodes);
  * Comment for XML comments (e.g. <!-- the comment -->);
  * ProcInst for processing instructions (e.g. <?php echo $spam;?>);
  * Entity for entity references (e.g. &parrot;) and
  * DocType for document type declarations (e.g. <!DOCTYPE html PUBLIC
    ...>).


---------------------------
XML trees as Python objects
---------------------------

XIST works somewhat differently from a normal DOM API. Instead of only
one element class, XIST has one class for every element type. All the
elements from different XML vocabularies known to XIST are defined in
modules in the ll.xist.ns subpackage. (Of course it's possible to
define additional namespaces for your own XML vocabulary.) The
definition of HTML can be found in ll.xist.ns.html, for example.

Every element class has a constructor of the form:

    __init__(self, *content, **attrs)


Positional arguments (i.e. items in content) will be the child nodes of
the element node. Keyword arguments will be attributes. You can pass
most of Python's builtin types to such a constructor. Strings (str and
unicode) and integers will be automatically converted to Text objects.
Constructing an HTML element works like this:

    from ll.xist.ns import html

    node = html.div(
       "Hello ",
       html.a("Python", href="http://www.python.org/"),
       " world!"
    )


For attribute names that collide with Python keywords or are not legal
identifiers (most notably class in HTML) the attribute name must be
slightly modified, so that it's a legal Python identifier (for class an
underscore is appended):

    node = html.div(
       "Hello world!",
       class_="greeting"
    )


(Don't worry: This modified attribute name will be mapped to the real
official attribute name once the output is generated.)

You can pass attributes as a dictionary too:

    node = html.div(
       "Hello world!",
       {
          "class_": "greeting",
          "id": 42,
          "title": "Greet the world"
       }
    )
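
To make the calling convention concrete, here is a rough Python 3 sketch of a constructor that sorts its arguments in the way described above. The Element class and its _set_attrs helper are hypothetical and much simpler than the real thing. In particular, the sketch renames class_ immediately, whereas XIST maps the name only when the output is generated.

```python
# Hypothetical miniature element class; XIST's real constructor does more.
class Element:
    def __init__(self, *content, **attrs):
        self.content = []
        self.attrs = {}
        for item in content:
            if isinstance(item, dict):
                # A dictionary among the positional arguments sets attributes.
                self._set_attrs(item)
            elif isinstance(item, Element):
                self.content.append(item)
            else:
                # Strings and integers are stored as text.
                self.content.append(str(item))
        self._set_attrs(attrs)

    def _set_attrs(self, mapping):
        for name, value in mapping.items():
            # Assumption for this sketch: a single trailing underscore is
            # dropped right away ("class_" becomes "class").
            xmlname = name[:-1] if name.endswith("_") else name
            self.attrs[xmlname] = str(value)


node = Element("Hello ", 42, {"class_": "greeting"}, id=7)
print(node.content)
print(node.attrs)
```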



-----------------------------------
Generating XML trees from XML files
-----------------------------------

XML trees can also be generated by parsing XML files. For this the
module ll.xist.parsers provides several functions:

    def parseString(text, base=None, sysid=None, **parserargs)
    def parseURL(url, base=None, sysid=None, **parserargs)
    def parseFile(filename, base=None, sysid=None, **parserargs)
    def parse(stream, base=None, sysid=None, **parserargs)


parseString is for parsing strings (str and unicode) and parseURL is
for parsing resources from URLs. With parseFile you can parse local
files and with parse you can directly parse from a file-like object.

All four functions create a parser internally, parse the supplied
source document and return the resulting object tree.

For example, parsing a string can be done like this:

    from ll.xist import parsers
    from ll.xist.ns import html

    node = parsers.parseString(
       '<p>Hello <a href="http://www.python.org/">Python</a> world!</p>'
    )


For further info about the rest of the arguments to the parsing
functions, see the documentation for ll.xist.parsers.Parser.


==============================================
Defining new elements and converting XML trees
==============================================

To be able to parse an XML file, you have to provide an element class
for every element type that appears in the file. These classes either
come from modules provided by XIST or you can define your own. Defining
your own element class for an element named cool works like this:

    from ll.xist import xsc
    from ll.xist.ns import html

    class cool(xsc.Element):
       def convert(self, converter):
          node = html.b(self.content, u" is cool!")
          return node.convert(converter)


You have to derive your new class from xsc.Element. The name of the
class will be the element name. For element type names that are invalid
Python identifiers, you can use the class attribute xmlname in the
element class to override the element name.

To be able to convert an element of this type to a new XML tree
(probably HTML in most cases), you have to implement the convert
method. In this method you can build a new XML tree from the content
and attributes of the object.

Using this new element is simple:

    >>> node = cool("Python")
    >>> print node.conv().asBytes()
    <b>Python is cool!</b>


conv simply calls convert with a default converter argument. We'll come
to converters in a minute. asBytes is a method that converts the node
to a byte string. This method will be explained when we discuss the
publishing interface.

Note that it is vital for your own convert methods that you recursively
call convert on your own content, because otherwise some unconverted
nodes might remain in the tree. Let's define a new element:

    class python(xsc.Element):
       def convert(self, converter):
          return html.a(u"Python", href=u"http://www.python.org/")


Now we can do the following:

    >>> node = cool(python())
    >>> print node.conv().asBytes()
    <b><a href="http://www.python.org/">Python</a> is cool!</b>


But if we forget to call convert for our own content, i.e. if the
element cool was written like this:

    class cool(xsc.Element):
       def convert(self, converter):
          return html.b(self.content, " is cool!")


we would get:

    >>> node = cool(python())
    >>> print node.conv().asBytes()
    <b><python /> is cool!</b>


Furthermore convert should never modify self, because convert might be
called multiple times for the same node.
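
Why convert must leave self alone can be seen in a small standalone Python 3 sketch. Node, convert_pure and convert_mutating are made-up names for this illustration and are not part of XIST.

```python
# Toy node type; not an XIST class.
class Node:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)

    def render(self):
        inner = "".join(
            c if isinstance(c, str) else c.render() for c in self.children
        )
        return "<%s>%s</%s>" % (self.name, inner, self.name)


def convert_pure(node):
    # Builds a new node; the source node is left untouched.
    return Node("b", *node.children, " is cool!")


def convert_mutating(node):
    # Anti-pattern: changes the source node in place.
    node.children.append(" is cool!")
    node.name = "b"
    return node


source = Node("cool", "Python")
first = convert_pure(source).render()
second = convert_pure(source).render()

bad_source = Node("cool", "Python")
bad_first = convert_mutating(bad_source).render()
bad_second = convert_mutating(bad_source).render()

print(first, second)
print(bad_first, bad_second)
```

The pure version gives the same result on every call, while the mutating version appends its text again each time it runs.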


----------
Converters
----------

conv is a convenience method that creates a default converter for you
and calls convert. This converter is created once and is passed to all
convert calls. It is used to store parameters for the conversion
process and it allows elements to pass information to other nodes. You
can also call convert yourself, which would look like this:

    from ll.xist import converters
    from ll.xist.ns import html

    node = cool(python())
    node = node.convert(converters.Converter())


You can pass the following arguments to the Converter constructor:

root
    root (which defaults to None) is the root URL for the conversion
    process. When you want to resolve a link in some of your own
    convert methods, the URL must be interpreted relative to this root
    URL (you can use URLAttr.forInput for that).
mode
    mode (which defaults to None) works the same way as modes in XSLT.
    You can use this for implementing different conversion modes.
stage
    stage (which defaults to "deliver") allows you to implement multi
    stage conversion: Suppose that you want to deliver a dynamically
    constructed web page with XIST that contains results from a
    database query and the current time. The data in the database
    changes infrequently, so it doesn't make sense to do the query on
    every request. The query is done every few minutes and the
    resulting HTML tree is stored in the servlet (using any of the
    available Python servlet technologies). For this conversion the
    stage would be "cache" and your database XML element would do the
    query when stage=="cache". Your time display element would do the
    conversion when stage=="deliver" and simply return itself when
    stage=="cache", so it would still be part of the cached XML tree
    and would be converted to HTML on every request.
target
    target (which defaults to ll.xist.ns.html) specifies what the
    output should be. Values must be namespace subclasses (see below
    for an explanation of namespaces).
lang
    lang (which defaults to None) is the language in which the result
    tree should be. This can be used in the convert method to implement
    different conversions for different languages, e.g.:

        class note(xsc.Element):
           def convert(self, converter):
              if converter.lang==u"de":
                 title = u"Anmerkung"
              elif converter.lang==u"en":
                 title = u"Note"
              else:
                 title = u"???"
              node = xsc.Frag(
                 html.h1(title),
                 html.div(self.content)
              )
              return node.convert(converter)


Additional arguments are passed when a converter is created in the
context of a make script.
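
The two-stage scenario described under stage can be played through with a small Python 3 model. Everything here (Converter, Text, Frag, dbquery, now) is a simplified stand-in written for this sketch, not the XIST classes of the same names, and the database query is faked with a constant string.

```python
import time


class Converter:
    # Only the stage parameter is modelled here.
    def __init__(self, stage="deliver"):
        self.stage = stage


class Text:
    def __init__(self, value):
        self.value = value

    def convert(self, converter):
        return self

    def render(self):
        return self.value


class Frag:
    def __init__(self, *nodes):
        self.nodes = list(nodes)

    def convert(self, converter):
        return Frag(*(node.convert(converter) for node in self.nodes))

    def render(self):
        return "".join(node.render() for node in self.nodes)


class dbquery:
    calls = 0  # counts how often the (pretend) query runs

    def convert(self, converter):
        if converter.stage == "cache":
            dbquery.calls += 1
            return Text("42 rows")  # stand-in for a real query result
        return self

    def render(self):
        return "<dbquery/>"


class now:
    def convert(self, converter):
        if converter.stage == "deliver":
            return Text(time.strftime("%H:%M:%S"))
        return self  # stays in the cached tree unconverted

    def render(self):
        return "<now/>"


page = Frag(dbquery(), Text(" at "), now())
cached = page.convert(Converter(stage="cache"))             # every few minutes
response = cached.convert(Converter(stage="deliver")).render()  # every request
print(cached.render())
print(response)
```

The cached tree still contains the unconverted now element, so each delivery fills in the current time without repeating the query.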


----------
Attributes
----------

Setting and accessing the attributes of an element works via the
dictionary interface:

    >>> node = html.a(u"Python", href=u"http://www.python.org/")
    >>> print node[u"href"].asBytes()
    http://www.python.org/
    >>> del node[u"href"]
    >>> print node[u"href"].asBytes()

    >>> node[u"href"] = u"http://www.python.org/"
    >>> print node[u"href"].asBytes()
    http://www.python.org/


All attribute values are instances of subclasses of the class Attr.
Available subclasses are:

  * TextAttr, for normal text attributes;
  * URLAttr, for attributes that are URLs;
  * BoolAttr, for boolean attributes (for such an attribute only its
    presence is important; its value will always be the same as the
    attribute name when publishing);
  * IntAttr, for integer attributes;
  * ColorAttr, for color attributes (e.g. #ffffff).

IntAttr and ColorAttr mostly serve as documentation of the attribute's
purpose. Neither class adds any functionality.

Attr itself is derived from Frag so it is possible to use all the
sequence methods on an attribute.

Unset attributes will be treated like empty ones so the following is
possible:

    del node["spam"]
    node["spam"].append("ham")


This also means that after del node["spam"][:] the attribute will be
empty again and will be considered to be unset. Such attributes will be
skipped when publishing.

The main purpose of this is to allow you to construct values
conditionally and then use those values as attribute values:

    import random

    if random.random() < 0.5:
       class_ = None
    else:
       class_ = u"foo"

    node = html.div(u"foo", class_=class_)


In 50% of the cases the generated div element will not have a class
attribute.
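
The "empty means unset" rule can be imitated in plain Python 3 with a dictionary that creates empty values on demand. Attrs, set_attr and publish_attrs below are hypothetical helpers for this sketch only; how XIST implements the rule internally is not shown in this document.

```python
class Attrs(dict):
    # Looking up an unset attribute yields a new, empty value.
    def __missing__(self, name):
        value = self[name] = []
        return value


def set_attr(attrs, name, value):
    # None means "leave the attribute unset".
    del attrs[name][:]
    if value is not None:
        attrs[name].append(str(value))


def publish_attrs(attrs):
    # Empty attributes are skipped when publishing.
    return "".join(
        ' %s="%s"' % (name, "".join(parts))
        for name, parts in sorted(attrs.items())
        if parts
    )


attrs = Attrs()
attrs["spam"].append("ham")
with_value = publish_attrs(attrs)

del attrs["spam"][:]
emptied = publish_attrs(attrs)

set_attr(attrs, "class", None)
unset = publish_attrs(attrs)

set_attr(attrs, "class", "foo")
with_class = publish_attrs(attrs)

print(repr(with_value), repr(emptied), repr(unset), repr(with_class))
```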


Defining attributes
###################

When you define a new element you have to specify the attributes
allowed for this element. For this use the class attribute Attrs (which
must be a class derived from xsc.Element.Attrs) and define the
attributes by deriving them from one of the existing attribute classes.
We could extend our example element in the following way:

    class cool(xsc.Element):
       class Attrs(xsc.Element.Attrs):
          class adj(xsc.TextAttr): pass

       def convert(self, converter):
          node = xsc.Frag(self.content, u" is")
          if u"adj" in self.attrs:
             node.append(u" ", html.em(self[u"adj"]))
          node.append(u" cool!")
          return node.convert(converter)


and use it like this:

    >>> node = cool(python(), adj=u"totally")
    >>> print node.conv().asBytes()
    <a href="http://www.python.org/">Python</a> is <em>totally</em> cool!



Default attributes
##################

It is possible to define default values for attributes via the class
attribute default:

    class cool(xsc.Element):
       class Attrs(xsc.Element.Attrs):
          class adj(xsc.TextAttr):
             default = u"absolutely"

       def convert(self, converter):
          node = xsc.Frag(self.content, u" is")
          if u"adj" in self.attrs:
             node.append(u" ", html.em(self[u"adj"]))
          node.append(u" cool!")
          return node.convert(converter)


Now if we instantiate the class without specifying adj we'll get the
default:

    >>> node = cool(python())
    >>> print node.conv().asBytes()
    <a href="http://www.python.org/">Python</a> is <em>absolutely</em> cool!


If we want a cool instance without an adj attribute, we can pass None
as the attribute value:

    >>> node = cool(python(), adj=None)
    >>> print node.conv().asBytes()
    <a href="http://www.python.org/">Python</a> is cool!



Attribute value sets
####################

It's possible to specify that an attribute has a fixed set of allowed
values. This can be done with the class attribute values. We could
extend our example to look like this:

    class cool(xsc.Element):
       class Attrs(xsc.Element.Attrs):
          class adj(xsc.TextAttr):
             default = "absolutely"
             values = (u"absolutely", u"totally", u"very")

       def convert(self, converter):
          node = xsc.Frag(self.content, u" is")
          if u"adj" in self.attrs:
             node.append(" ", html.em(self[u"adj"]))
          node.append(u" cool!")
          return node.convert(converter)


These values won't be checked when we create our cool instance. Only
when such a node is parsed will a warning be issued:

    >>> s = '<cool adj="pretty"><python/></cool>'
    >>> node = parsers.parseString(s)
    /home/walter/pythonroot/ll/xist/xsc.py:1665: IllegalAttrValueWarning: Attribute value 'pretty' not allowed for __main__:cool.Attrs.adj.
      warnings.warn(errors.IllegalAttrValueWarning(self))


The warning will also be issued if we publish such a node, but note
that for warnings Python's warning framework is used, so the warning
will be printed only once (but of course you can change that with
warnings.filterwarnings):

    >>> node = cool(python(), adj=u"pretty")
    >>> print node.asBytes()
    /home/walter/pythonroot/ll/xist/xsc.py:1665: IllegalAttrValueWarning: Attribute value 'pretty' not allowed for __main__:cool.Attrs.adj.
      warnings.warn(errors.IllegalAttrValueWarning(self))
    <cool adj="pretty"><python /></cool>
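
The "printed only once" behaviour comes from Python's warnings module, not from XIST, and can be tried with the standard library alone. In this Python 3 sketch IllegalValueWarning, check and count_warnings are made-up names, and warnings.simplefilter is used as a shorter relative of warnings.filterwarnings.

```python
import warnings


class IllegalValueWarning(UserWarning):
    """Hypothetical stand-in for a validation warning."""


def check(value, allowed=("absolutely", "totally", "very")):
    if value not in allowed:
        warnings.warn(IllegalValueWarning("value %r not allowed" % value))


def count_warnings(action):
    # Record warnings instead of printing them, using the given filter action.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, IllegalValueWarning)
        for _ in range(3):
            check("pretty")
    return len(caught)


default_count = count_warnings("default")  # once per location
always_count = count_warnings("always")    # every occurrence
print(default_count, always_count)
```

With the "default" action the three identical warnings from the same source line are reported once. With "always" each one is reported.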



Required attributes
###################

Finally it's possible to specify that an attribute is required. This
again will only be checked when parsing or publishing. To specify that
an attribute is required simply add the class attribute required with
the value True. The attribute alt of the class ll.xist.ns.html.img is
such an attribute, so we'll get:

    >>> from ll.xist.ns import html
    >>> node = html.img(src="eggs.png")
    >>> print node.asBytes()
    /home/walter/pythonroot/ll/xist/xsc.py:2046: RequiredAttrMissingWarning: Required attribute 'alt' missing in ll.xist.ns.html:img.Attrs.
      warnings.warn(errors.RequiredAttrMissingWarning(self, attrs.keys()))
    <img src="eggs.png" />



----------
Namespaces
----------

Now that you've defined your own elements, you have to tell the parser
about them, so they can be instantiated when a file is parsed. This is
done with namespace classes.

Namespace classes can be thought of as object oriented versions of XML
namespaces. Two class attributes can be used to configure the
namespace: xmlname specifies the default namespace prefix to use for
the namespace and xmlurl is the namespace name. All element classes
nested inside the namespace class belong to the namespace.

It's also possible to define your namespace class without any nested
element classes and later add those classes to the namespace by
attribute assignment or with the class method update, which expects a
dictionary as an argument. All objects found in the values of the
dictionary will be added to the namespace as attributes. So you can put
the namespace class at the end of your Python module after all the
element classes are defined and add all the objects from the local
scope to the namespace. Your complete namespace might look like this:

    class python(xsc.Element):
       def convert(self, converter):
          return html.a(
             u"Python",
             href=u"http://www.python.org/"
          )

    class cool(xsc.Element):
       def convert(self, converter):
          node = html.b(self.content, u" is cool!")
          return node.convert(converter)

    class __ns__(xsc.Namespace):
       xmlname = "foo"
       xmlurl = "http://www.example.com/foo"
    __ns__.update(vars())


All defined namespace classes will be registered with the parser
automatically, so all elements belonging to the namespace will be used
when parsing files.


Namespaces as modules
#####################

It is convenient to define all classes that belong to one namespace in
one Python module. However this means that the resulting module will
only contain one "interesting" object: the namespace class. To make
using this class more convenient, it's possible to turn the namespace
class into a module by using the class method makemod instead of
update:

    class python(xsc.Element):
       def convert(self, converter):
          return html.a(
             u"Python",
             href=u"http://www.python.org/"
          )

    class cool(xsc.Element):
       def convert(self, converter):
          node = html.b(self.content, u" is cool!")
          return node.convert(converter)

    class __ns__(xsc.Namespace):
       xmlname = "foo"
       xmlurl = "http://www.example.com/foo"
    __ns__.makemod(vars())


Suppose that the above code is in the file foo.py. Doing an import foo
will then give you the namespace class instead of the module:

    >>> import foo
    >>> foo
    <foo:__ns__ namespace name=u"foo" url=u"http://www.example.com/foo" with 2 elements from "foo.py" at 0x87654321>
    >>> foo.python
    <foo:python element at 0x87654321>



Global attributes
#################

You can define global attributes belonging to a certain namespace in
the same way as defining local attributes belonging to a certain
element type: Define a nested Attrs class inside the namespace class
(derived from ll.xist.xsc.Namespace.Attrs):

    from ll.xist import xsc

    class __ns__(xsc.Namespace):
       xmlname = "foo"
       xmlurl = "http://www.example.com/foo"

       class Attrs(xsc.Namespace.Attrs):
          class foo(xsc.TextAttr): pass
    __ns__.makemod(vars())


Setting and accessing such an attribute can be done like this:

    >>> from ll.xist.ns import html
    >>> import foo
    >>> node = html.div(u"foo", {(foo, u"foo"): u"bar"})
    >>> str(node[foo, u"foo"])
    'bar'


An alternate way of specifying a global attribute in a constructor
looks like this:

    >>> from ll.xist.ns import html
    >>> import foo
    >>> node = html.div(u"foo", foo.Attrs(foo=u"baz"))
    >>> str(node[foo, u"foo"])
    'baz'



Subclassing namespaces
######################

Each element class that belongs to a namespace can access its namespace
via the class attribute __ns__. When you're subclassing namespace
classes, the elements in the base namespace will be automatically
subclassed too. Of course you can explicitly subclass an element class
too. The following example shows the usefulness of this feature. Define
your base namespace like this and put it into navns.py:

    from ll.xist import xsc
    from ll.xist.ns import html

    languages = [
       (u"Python", u"http://www.python.org/"),
       (u"Perl", u"http://www.perl.org/"),
       (u"PHP", u"http://www.php.net/"),
       (u"Java", u"http://java.sun.com/")
    ]

    class navigation(xsc.Element):
       def convert(self, converter):
          node = self.__ns__.links()
          for (name, url) in languages:
             node.append(self.__ns__.link(name, href=url))
          return node.convert(converter)

    class links(xsc.Element):
       def convert(self, converter):
          node = self.content
          return node.convert(converter)

    class link(xsc.Element):
       class Attrs(xsc.Element.Attrs):
          class href(xsc.URLAttr): pass

       def convert(self, converter):
          node = html.div(html.a(self.content, href=self[u"href"]))
          return node.convert(converter)

    class __ns__(xsc.Namespace):
       xmlname = "nav"
       xmlurl = "http://www.example.com/nav"
    __ns__.makemod(vars())


This namespace defines a navigation element that generates divs with
links to various homepages for programming languages. We can use it
like this:

    >>> import navns
    >>> print navns.navigation().conv().asBytes()
    <div><a href="http://www.python.org/">Python</a></div>
    <div><a href="http://www.perl.org/">Perl</a></div>
    <div><a href="http://www.php.net/">PHP</a></div>
    <div><a href="http://java.sun.com/">Java</a></div>


(Of course the output will all be on one line.)

Now we can define a derived namespace (in the file nav2ns.py) that
overrides the element classes links and link to change how the
navigation looks:

    from ll.xist import xsc
    from ll.xist.ns import html

    import navns

    class __ns__(navns):
       class links(navns.links):
          def convert(self, converter):
             node = html.table(
                self.content,
                border=0,
                cellpadding=0,
                cellspacing=0,
                class_=u"navigation",
             )
             return node.convert(converter)

       class link(navns.link):
          def convert(self, converter):
             node = html.tr(
                html.td(
                   html.a(
                      self.content,
                      href=self[u"href"],
                   )
                )
             )
             return node.convert(converter)
    __ns__.makemod(vars())


When we use the navigation element from the derived namespace we'll get
the following output:

    >>> import nav2ns
    >>> print nav2ns.navigation().conv().asBytes()
    <table border="0" cellpadding="0" cellspacing="0" class="navigation">
    <tr><td><a href="http://www.python.org/">Python</a></td></tr>
    <tr><td><a href="http://www.perl.org/">Perl</a></td></tr>
    <tr><td><a href="http://www.php.net/">PHP</a></td></tr>
    <tr><td><a href="http://java.sun.com/">Java</a></td></tr>
    </table>


(Again, all on one line.)

Notice that we automatically got an element class nav2ns.navigation,
that this element class inherited the convert method from its base
class and that the call to convert on the derived class did instantiate
the link classes from the derived namespace.
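
The underlying idea is ordinary late binding: navigation looks up link and links through its namespace instead of naming a fixed class. The Python 3 sketch below imitates this with class methods, where cls plays the role of self.__ns__. Nav and Nav2 are invented for the illustration, and XIST's automatic subclassing of the element classes is not modelled.

```python
class Nav:
    @classmethod
    def link(cls, name):
        return "<div>%s</div>" % name

    @classmethod
    def links(cls, items):
        return "".join(items)

    @classmethod
    def navigation(cls, names):
        # Looks up link and links on cls, so a derived class can replace them.
        return cls.links(cls.link(name) for name in names)


class Nav2(Nav):
    @classmethod
    def link(cls, name):
        return "<tr><td>%s</td></tr>" % name

    @classmethod
    def links(cls, items):
        return "<table>%s</table>" % "".join(items)


base_output = Nav.navigation(["Python", "Perl"])
derived_output = Nav2.navigation(["Python", "Perl"])
print(base_output)
print(derived_output)
```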


Namespaces as conversion targets
################################

The converter argument passed to the convert method has an attribute
target which is a namespace class and specifies the target namespace to
which self should be converted.

You can check which conversion is wanted with issubclass. Once this is
determined you can use element classes from the target to create the
required XML object tree. This makes it possible to customize the
conversion by passing a derived namespace to the convert method. To
demonstrate this, we change our example namespace to use the conversion
target like this:

    import navns

    class __ns__(navns):
       class links(navns.links):
          def convert(self, converter):
             node = converter.target.table(
                self.content,
                border=0,
                cellpadding=0,
                cellspacing=0,
                class_=u"navigation",
             )
             return node.convert(converter)

       class link(navns.link):
          def convert(self, converter):
             target = converter.target
             node = target.tr(
                target.td(
                   target.a(
                      self.content,
                      href=self[u"href"],
                   )
                )
             )
             return node.convert(converter)


What we might want to do is have all links (i.e. all ll.xist.ns.html.a
elements) generated with an attribute target="_top". For this we derive
a new namespace from ll.xist.ns.html and override the a element:

    from ll.xist.ns import html

    class __ns__(html):
       class a(html.a):
          def convert(self, converter):
             node = html.a(self.content, self.attrs, target=u"_top")
             return node.convert(converter)


Now we can pass this namespace as the conversion target and all links
will have a target="_top".


-----------------------------
Validation and content models
-----------------------------

When generating HTML you might want to make sure that your generated
code doesn't contain any illegal tag nesting (i.e. something bad like
<p><p>Foo</p></p>). The module ll.xist.ns.html does this automatically:

    >>> from ll.xist.ns import html
    >>> node = html.p(html.p(u"foo"))
    >>> print node.asBytes()
    /home/walter/pythonroot/ll/xist/sims.py:238: WrongElementWarning: element <ll.xist.ns.html:p> may not contain element <ll.xist.ns.html:p>
      warnings.warn(WrongElementWarning(node, child, self.elements))
    <p><p>foo</p></p>


For your own elements you can specify the content model too. This is
done by setting the class attribute model inside the element class.
model must be an object that provides a checkvalid method. This method
will be called during parsing or publishing with the element as an
argument. When a validation violation is detected, the Python warning
framework should be used to issue a warning.

The module ll.xist.sims contains several classes that provide simple
validation methods: Empty can be used to ensure that the element
doesn't have any content (like br and img in HTML). Any allows any
content. NoElements will warn about elements from the same namespace
(elements from other namespaces will be OK). NoElementsOrText will warn
about elements from the same namespace and non-whitespace text content.
Elements will only allow the elements specified in the constructor.
ElementsOrText will only allow the elements specified in the
constructor and text.

None of these classes will check the number of child elements or their
order.

For more info see the sims module.
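
As a rough picture of what such a model object looks like, here is a self-contained Python 3 sketch of an object with a checkvalid method that warns about unexpected child elements. The classes Elements, WrongElementWarning, Element, li, p and ul are re-created for this example and only borrow their names from the text above. They are not the sims implementation, and the validate helper is hypothetical.

```python
import warnings


class WrongElementWarning(UserWarning):
    pass


class Element:
    model = None  # element classes may set a content model here

    def __init__(self, *content):
        self.content = list(content)


class Elements:
    """Allow only the given element classes as child elements."""

    def __init__(self, *elements):
        self.elements = elements

    def checkvalid(self, node):
        for child in node.content:
            if isinstance(child, Element) and not isinstance(child, self.elements):
                warnings.warn(WrongElementWarning(
                    "element %s may not contain element %s"
                    % (type(node).__name__, type(child).__name__)
                ))


class li(Element):
    pass


class p(Element):
    pass


class ul(Element):
    model = Elements(li)


def validate(node):
    # Collect the warning messages instead of printing them.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if node.model is not None:
            node.model.checkvalid(node)
    return [str(w.message) for w in caught]


good = validate(ul(li("one"), li("two")))
bad = validate(ul(li("one"), p("oops")))
print(good)
print(bad)
```

Like the classes described above, this sketch checks neither the number of children nor their order.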


--------
Entities
--------

In the same way as defining new element types, you can define new
entities. But to be able to use the new entities in an XML file you
have to use a parser that supports reporting undefined entities to the
application via skippedEntity (SGMLOPParser and ExpatParser in the
module ll.xist.parsers do that). The following example is adapted from
the module ll.xist.ns.abbr:

    from ll.xist import xsc
    # Imported under an alias because the entity class below is also named html
    from ll.xist.ns import html as html_

    class html(xsc.Entity):
       def convert(self, converter):
          return html_.abbr(
             u"HTML",
             title=u"Hypertext Markup Language",
             lang=u"en"
          )


You can use this entity in your XML files like this:

    <cool adj="very">&html;</cool>



-----------------------
Processing instructions
-----------------------

Defining processing instructions works just like elements and entities.
Derive a new class from ll.xist.xsc.ProcInst and implement convert. The
following example implements a processing instruction that returns an
uppercase version of its content as a text node.

    class upper(xsc.ProcInst):
       def convert(self, converter):
          return xsc.Text(self.content.upper())


It can be used in an XML file like this:

    <cool><?upper foo?></cool>


There are namespaces containing processing instruction classes that
don't provide a convert method. These processing instruction objects
will then be published as XML processing instructions. One example is
the namespace ll.xist.ns.php.

Other namespaces (like ll.xist.ns.jsp) contain processing instruction
classes, but they will be published in a different (not XML compatible)
format. For example ll.xist.ns.jsp.expression("foo") will be published
as <%= foo %>.


====================
Publishing XML trees
====================

After creating the XML tree and converting the tree into its final
output form, you have to write the resulting tree to a file. This can
be done with the publishing API. Three methods that use the publishing
API are bytes, asBytes and write. bytes is a generator that will yield
the complete 8-bit XML string in fragments. asBytes returns the
complete 8-bit XML string.

You can specify the encoding with the parameter encoding (with "utf-8"
being the default). Unencodable characters will be escaped with
character references when possible (i.e. inside text nodes; for
comments or processing instructions you'll get an exception):

    >>> from ll.xist import xsc
    >>> from ll.xist.ns import html
    >>> s = u"A\xe4\u03a9\u8a9e"
    >>> node = html.div(s)
    >>> print node.asBytes(encoding="ascii")
    <div>A&#228;&#937;&#35486;</div>
    >>> print node.asBytes(encoding="iso-8859-1")
    <div>Aä&#937;&#35486;</div>
    >>> print xsc.Comment(s).asBytes(encoding="ascii")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/walter/pythonroot/ll/xist/xsc.py", line 600, in asBytes
        publisher.publish(stream, self, base)
      File "/home/walter/pythonroot/ll/xist/publishers.py", line 205, in publish
        self.node.publish(self)
      File "/home/walter/pythonroot/ll/xist/xsc.py", line 1305, in publish
        publisher.write(self.content)
      File "/usr/local/lib/python2.3/codecs.py", line 178, in write
        data, consumed = self.encode(object, self.errors)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128)
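
The same effect can be reproduced with Python's own codec error handlers, without XIST. The Python 3 sketch below uses the standard "xmlcharrefreplace" handler. Whether XIST uses this handler internally is not stated here, so treat it as an analogy only.

```python
s = "A\xe4\u03a9\u8a9e"

# Characters the target encoding lacks become numeric character references.
ascii_text = s.encode("ascii", "xmlcharrefreplace")
latin1_text = s.encode("iso-8859-1", "xmlcharrefreplace")

# Without a replacement handler the encoder raises instead, which is
# the situation for comments and processing instructions.
try:
    s.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc

print(ascii_text)
print(latin1_text)
print(strict_error)
```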


When you include an XML header or an HTML meta header, XIST will
automatically insert the correct encoding when publishing:

    >>> from ll.xist import xsc
    >>> from ll.xist.ns import xml, meta
    >>> e = xsc.Frag(xml.XML10(), u"\n", meta.contenttype())
    >>> print e.conv().asBytes(encoding="iso-8859-15")
    <?xml version='1.0' encoding='iso-8859-15'?>
    <meta content="text/html; charset=iso-8859-15" http-equiv="Content-Type" />


Another useful parameter is xhtml, which specifies whether you want pure
HTML or XHTML as output:

xhtml==0
    This will give you pure HTML, i.e. no final / for elements with an
    empty content model, so you'll get e.g. <br> in the output.
    Elements that don't have an empty content model, but are empty will
    be published with a start and end tag (i.e. <div></div>).
xhtml==1
    This gives HTML compatible XHTML. Elements with an empty content
    model will be published like this: <br /> (This is the default).
xhtml==2
    This gives full XML output. Every empty element will be published
    with an empty tag (without an additional space): <br/> or <div/>.
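
The three modes can be summarised in a small Python 3 function. publish_empty is a hypothetical helper written for this overview and not part of XIST. The result for a content-less div in mode 1 is an assumption, since the description above only covers elements with an empty content model for that mode.

```python
def publish_empty(name, empty_model, xhtml=1):
    """Render an element without content in one of the three xhtml modes."""
    if xhtml not in (0, 1, 2):
        raise ValueError("xhtml must be 0, 1 or 2")
    if xhtml == 2:
        # Full XML: every empty element becomes an empty tag.
        return "<%s/>" % name
    if empty_model:
        # Elements like br: bare tag in HTML, " />" in compatible XHTML.
        return "<%s>" % name if xhtml == 0 else "<%s />" % name
    # Assumption: modes 0 and 1 both use a start and end tag here.
    return "<%s></%s>" % (name, name)


rows = {
    (name, mode): publish_empty(name, empty, mode)
    for name, empty in (("br", True), ("div", False))
    for mode in (0, 1, 2)
}
for key in sorted(rows):
    print(key, rows[key])
```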

Writing a node to a file can be done with the method write:

    >>> from ll.xist.ns import html
    >>> node = html.div(u"", html.br(), u"")
    >>> node.write(open("foo.html", "wb"), encoding="ascii")


All of these methods use the method publish internally. publish gets
passed an instance of ll.xist.publishers.Publisher.


===============
Searching trees
===============

There are two methods available for iterating through an XML tree and
finding nodes in the tree: The walk method and XFind expressions.


---------------
The walk method
---------------

The method walk is a generator. You pass a callable object to walk
which is used for determining which part of the tree should be searched
and which nodes should be returned.

ll.xist.xsc provides several useful predefined classes for specifying
what should be returned from walk: FindType will search only the first
level of the tree and will return any node that is an instance of one
of the classes passed to the constructor. So if you have an instance of
ll.xist.ns.html.ul named node you could search for all
ll.xist.ns.html.li elements inside with the following code:

    for cursor in node.content.walk(xsc.FindType(html.li)):
       print unicode(cursor.node)


FindTypeAll can be used when you want to search the complete tree. The
following example extracts all the links on the Python home page:

    from ll.xist import xsc, parsers
    from ll.xist.ns import html

    node = parsers.parseURL("http://www.python.org/", tidy=True)

    for cursor in node.walk(xsc.FindTypeAll(html.a)):
       print cursor.node[u"href"]


This gives the output:

    http://www.python.org/
    http://www.python.org/search/
    http://www.python.org/download/
    http://www.python.org/doc/
    http://www.python.org/Help.html
    http://www.python.org/dev/
    ...


The following example will find all external links on the Python home
page:

    from ll.xist import xsc, parsers
    from ll.xist.ns import html

    node = parsers.parseURL("http://www.python.org/", tidy=True)

    def isextlink(cursor):
       if isinstance(cursor.node, html.a) and not unicode(cursor.node[u"href"]).startswith(u"http://www.python.org"):
          return (True, xsc.entercontent)
       return (xsc.entercontent,)

    for cursor in node.walk(isextlink):
       print cursor.node[u"href"]


This gives the output:

    http://www.jython.org/
    http://sourceforge.net/tracker/?atid=105470&group%5fid=5470
    http://sourceforge.net/tracker/?atid=305470&group%5fid=5470
    http://sourceforge.net/cvs/?group%5fid=5470
    http://www.python-in-business.org/
    http://www.europython.org/
    mailto:webmaster@python.org
    ...


The callable (isextlink in the example) will be called for each node
visited. The cursor argument has an attribute node that is the node in
question. For the other attributes see the Cursor class.

The callable must return a sequence with the following entries:

ll.xist.xsc.entercontent
    enter the content of this element and continue searching;
ll.xist.xsc.enterattrs
    enter the attributes of this element and continue searching;
boolean value
    If true, the node will be part of the result.

The sequence will be "executed" in the order you specify. To change the
top down traversal from our example to a bottom up traversal we could
change isextlink to the following (note the swapped tuple entries):

    def isextlink(cursor):
       if isinstance(cursor.node, html.a) and not unicode(cursor.node[u"href"]).startswith(u"http://www.python.org"):
          return (xsc.entercontent, True)
       return (xsc.entercontent,)


Note that the cursor yielded from walk will be reused by subsequent
next calls, so you should not modify the cursor and you can't rely on
attributes of the cursor after reentry to walk.
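
How the order of the entries in the returned sequence determines the
traversal order can be illustrated without XIST. The following is a
simplified sketch in Python 3 syntax: Node, walk and ENTERCONTENT are
hypothetical stand-ins, the callable receives the node itself instead
of a cursor, and attributes are ignored:

```python
ENTERCONTENT = object()  # hypothetical stand-in for xsc.entercontent


class Node:
    """A minimal tree node with a name and child nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = children


def walk(node, decide):
    """Yield nodes, executing the entries returned by decide in order."""
    for step in decide(node):
        if step is ENTERCONTENT:
            # Descend into the children at this point of the sequence.
            for child in node.children:
                yield from walk(child, decide)
        elif step:
            # A true value means: the node itself is part of the result.
            yield node


tree = Node("a", Node("b", Node("c")), Node("d"))

top_down = [n.name for n in walk(tree, lambda n: (True, ENTERCONTENT))]
bottom_up = [n.name for n in walk(tree, lambda n: (ENTERCONTENT, True))]
print(top_down)   # ['a', 'b', 'c', 'd']
print(bottom_up)  # ['c', 'b', 'd', 'a']
```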


-----------------
XFind expressions
-----------------

A second method exists for iterating through a tree: XFind expressions.
An XFind expression looks somewhat like an XPath expression, but is
implemented as a pure Python expression (overloading the division
operators).
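
To give an idea of how overloading the division operator can yield
path-like expressions, here is a small sketch. It is not how XIST
implements XFind: Finder and the node classes below are hypothetical,
and the code uses Python 3, where the method is named __truediv__
(Python 2 called it __div__):

```python
class Finder:
    """Iterable of nodes that supports ``finder / nodetype``."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __iter__(self):
        return iter(self.nodes)

    def __truediv__(self, nodetype):
        # Children of all nodes produced so far that are instances
        # of nodetype; the result can be divided again.
        return Finder(
            child
            for node in self.nodes
            for child in node.children
            if isinstance(child, nodetype)
        )


class Node:
    def __init__(self, *children):
        self.children = children

    def __truediv__(self, nodetype):
        # A single node behaves like a finder that produces only itself.
        return Finder([self]) / nodetype


class ul(Node):
    pass


class li(Node):
    pass


class p(Node):
    pass


tree = ul(li(p()), p(), li())
items = list(tree / li)           # the two li children of the ul
paragraphs = list(tree / li / p)  # the p inside the first li only
print(len(items), len(paragraphs))
```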

Our example from above that searched for li elements inside a ul
element can be rewritten as follows:

    for child in node/html.li:
       print unicode(child)


An XFind expression returns an iterator for certain parts of the XML
tree. In an XFind expression a/b, a must be either a node or an
iterator producing nodes (note that an XFind expression itself is such
an iterator, so a itself might be an XFind expression). b must be an
XFind operator.

Every subclass of ll.xist.xsc.Node is an XFind operator. If b is such
a subclass, a/b will produce every child node of the nodes from a that
is an instance of b. If b is an attribute class, you will get attribute
nodes instead of child nodes. Other XFind operators can be found in the
module ll.xist.xfind. The all operator will produce every node in the
tree (except for attributes):

    from ll.xist import xfind
    from ll.xist.ns import html

    node = html.div(
       html.div(
          html.div(id=3),
          html.div(id=4),
          id=2,
       ),
       html.div(
          html.div(id=6),
          html.div(id=7),
          id=5,
       ),
       id=1
    )

    for child in node/xfind.all:
       print child["id"]


The output of this is:

    1
    2
    3
    4
    5
    6
    7


The following example demonstrates how to find all links on the Python
homepage via an XFind expression:

    from ll.xist import xfind, parsers
    from ll.xist.ns import html

    node = parsers.parseURL("http://www.python.org/", tidy=True)
    for link in node/xfind.all/html.a:
       print link["href"]


An all operator in the middle of an XFind expression can be
abbreviated. The XFind expression from the last example
(node/xfind.all/html.a) can be rewritten like this: node//html.a.

Another XFind operator is contains. It acts as a filter, i.e. the nodes
produced by a/xfind.contains(b) are a subset of the nodes produced by
a, those that contain child nodes of type b. Searching for all links on
the Python home page that contain images can be done like this:

    >>> from ll.xist import xfind, parsers
    >>> from ll.xist.ns import html
    >>> node = parsers.parseURL("http://www.python.org/", tidy=True)
    >>> for link in node//html.a/xfind.contains(html.img):
    ...     print link["href"]
    ...
    http://www.python.org/
    http://www.python.org/psf/donations.html
    http://www.opensource.org/


Note that using the all operator twice in an XFind expression currently
won't give you the expected result, as nodes might be produced twice.

Calling __getitem__ on an XFind operator gives you an item operator.
Such an item operator only returns a specific item (or slice) of those
nodes returned by the base iterator. An example:

    >>> from ll.xist.ns import html
    >>> e = html.table(html.tr(html.td(j) for j in xrange(i, i+3)) for i in xrange(1, 10, 3))
    >>> print e.pretty().asBytes()
    <table>
       <tr>
          <td>1</td>
          <td>2</td>
          <td>3</td>
       </tr>
       <tr>
          <td>4</td>
          <td>5</td>
          <td>6</td>
       </tr>
       <tr>
          <td>7</td>
          <td>8</td>
          <td>9</td>
       </tr>
    </table>
    >>> # Every cell
    >>> for td in e/html.tr/html.td:
    ...     print td
    ...
    1
    2
    3
    4
    5
    6
    7
    8
    9
    >>> # Every first cell in each row
    >>> for td in e/html.tr/html.td[0]:
    ...     print td
    1
    4
    7
    >>> # Every cell in the first row
    >>> for td in e/html.tr[0]/html.td:
    ...     print td
    1
    2
    3
    >>> # The first of all cells
    >>> for td in e/(html.tr/html.td)[0]:
    ...     print td
    1
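
The difference between the last three expressions is only where the
index binds. As an analogy (plain Python 3 lists instead of XIST nodes,
so nothing here is XIST API), the same three selections look like this:

```python
from itertools import chain, islice

# The table from the example as a list of rows.
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Like e/html.tr/html.td[0]: the index is applied within each row.
first_cell_of_each_row = [row[0] for row in rows]

# Like e/html.tr[0]/html.td: the index selects the row first.
cells_of_first_row = list(rows[0])

# Like e/(html.tr/html.td)[0]: the index is applied to the combined
# sequence of all cells.
first_of_all_cells = list(islice(chain.from_iterable(rows), 1))

print(first_cell_of_each_row)  # [1, 4, 7]
print(cells_of_first_row)      # [1, 2, 3]
print(first_of_all_cells)      # [1]
```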



==================
Manipulating trees
==================

XIST provides many methods for manipulating an XML tree.

The method withsep can be used to put a separator node between the
child nodes of an Element or a Frag:

    >>> from ll.xist import xsc
    >>> from ll.xist.ns import html
    >>> node = html.div(*xrange(10))
    >>> print node.withsep(", ").asBytes()
    <div>0, 1, 2, 3, 4, 5, 6, 7, 8, 9</div>


The method shuffled returns a shuffled version of the Element or Frag:

    >>> from ll.xist import xsc
    >>> from ll.xist.ns import html
    >>> node = html.div(*xrange(10))
    >>> print node.shuffled().withsep(", ").asBytes()
    <div>8, 1, 3, 6, 7, 5, 2, 9, 4, 0</div>


There are methods named reversed and sorted that return a reversed or
sorted version of an element or fragment:

    >>> from ll.xist import xsc
    >>> from ll.xist.ns import html
    >>> def key(n):
    ...    return unicode(n)
    >>> node = html.div(8,4,2,1,9,6,3,0,7,5)
    >>> print node.sorted(key=key).reversed().withsep(",").asBytes()
    <div>9,8,7,6,5,4,3,2,1,0</div>


The method mapped recursively walks the tree and generates a new tree,
where all the nodes are mapped through a function. An example: To
replace Python with Parrot in every text node on the Python page, do
the following:

    from ll.xist import xsc, parsers

    def p2p(node, converter):
       if isinstance(node, xsc.Text):
          node = node.replace(u"Python", u"Parrot")
          node = node.replace(u"python", u"parrot")
       return node

    node = parsers.parseURL("http://www.python.org/", tidy=True)
    node = node.mapped(p2p)
    node.write(open("parrot_index.html", "wb"))


The function must either return a new node, in which case this new node
will be used instead of the old one, or return the old node to tell
mapped that it should recursively continue with the content of the
node.
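
The contract described above (a new node replaces the old one, the same
node means "continue with the content") can be sketched on a simplified
tree. In this Python 3 sketch nested lists stand in for elements and
strings for text nodes; mapped here is a hypothetical free function
that, unlike the XIST method, does not pass a converter:

```python
def mapped(node, function):
    """Return a new tree with every node passed through function."""
    result = function(node)
    if result is node and isinstance(node, list):
        # The function returned the old node: continue recursively
        # with the content and build a new container from it.
        result = [mapped(child, function) for child in node]
    return result


def p2p(node):
    # Only text (str) is changed; containers are returned unchanged.
    if isinstance(node, str):
        return node.replace("Python", "Parrot").replace("python", "parrot")
    return node


tree = ["Python home", ["about python", ["docs"]]]
new_tree = mapped(tree, p2p)
print(new_tree)  # ['Parrot home', ['about parrot', ['docs']]]
```

The original tree is left untouched, which matches the statement that
mapped generates a new tree.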


====
URLs
====

For URL handling XIST uses the module ll.url. Refer to its
documentation for the basic functionality (especially regarding the
methods __div__ and relative).

When XIST parses an XML resource it uses a so-called "base" URL. This
base URL can be passed to all parsing functions. If it isn't specified
it defaults to the URL of the resource being parsed. This base URL will
be prepended to all URLs that are read during parsing:

    >>> from ll.xist import parsers
    >>> from ll.xist.ns import html
    >>> node = parsers.parseString('<img src="eggs.png"/>', base="root:spam/index.html")
    >>> print node.asBytes()
    <img src="root:spam/eggs.png" />


For publishing, a base URL can be specified too. URLs will be published
relative to this base URL, with the exception of relative URLs in the
tree. This means:

  * When you have a relative URL (e.g. #top) generated by a convert
    call, this URL will stay the same when publishing.
  * Base URLs for parsing should never be relative: Relative base URLs
    will be prepended to all relative URLs in the file, but this will
    not be reverted for publishing. In most cases the base URL should
    be a root URL when you parse local files.
  * When you parse remote web pages you can either omit the base
    argument, in which case it defaults to the URL being parsed and
    links, images, etc. on the page will still point back to their
    original location, or use the empty URL URL() as the base, so
    you'll get all URLs in the page as they are.
  * When XIST is used as a compiler for static pages, you're going to
    read source XML files, do a conversion and write the result to a
    new target file. In this case you should probably use the URL of
    the target file for both parsing and publishing. Let's assume we
    have a URL #top in the source file. When we use the "real" file
    names for parsing and publishing like this:

        node = parsers.parseFile("spam.htmlxsc", base="root:spam.htmlxsc")
        node = node.conv()
        node.write(open("spam.html", "wb"), base="root:spam.html")


    the following will happen: The URL #top will be parsed as
    root:spam.htmlxsc#top. After conversion this will be written to
    spam.html relative to the URL root:spam.html, which results in
    spam.html#top, which works, but is not what you want.

    When you use root:spam.html both for parsing and publishing, #top
    will be written to the target file as expected.
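
Why the base URL used for parsing matters can be illustrated with the
standard library. The sketch below is only an analogy: it uses
urllib.parse.urljoin from Python 3 instead of ll.url, and a made-up
http://example.com/ base, because urljoin does not know the root:
scheme:

```python
from urllib.parse import urljoin

# Hypothetical base URL of the document being parsed.
base = "http://example.com/spam/index.html"

# A relative URL is resolved against the directory of the base URL.
image = urljoin(base, "eggs.png")

# A fragment-only URL becomes tied to the name of the base document,
# which is why the name used as the parsing base ends up in the link.
anchor = urljoin(base, "#top")

print(image)   # http://example.com/spam/eggs.png
print(anchor)  # http://example.com/spam/index.html#top
```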


===================
Pretty printing XML
===================

The method pretty can be used for pretty printing XML. It returns a new
version of the node, with additional white space between the elements:

    from ll.xist.ns import html
    node = html.html(
       html.head(
          html.title(u"foo"),
       ),
       html.body(
          html.div(
             html.h1(u"The ", html.em(u"foo"), u" page!"),
             html.p(u"Welcome to the ", html.em(u"foo"), u" page."),
          ),
       ),
    )

    print node.pretty().asBytes()


This will print:

    <html>
       <head>
          <title>foo</title>
       </head>
       <body>
          <div>
             <h1>The <em>foo</em> page!</h1>
             <p>Welcome to the <em>foo</em> page.</p>
          </div>
       </body>
    </html>


Element content will only be modified if it doesn't contain Text nodes,
so mixed content will not be touched.


=============================================
Automatic generation of image size attributes
=============================================

The module ll.xist.ns.htmlspecials contains an element autoimg that
extends ll.xist.ns.html.img. When converted to HTML via the convert
method the size of the image will be determined and the height and
width attributes will be set accordingly (if those attributes are not
set already).


=====================
Embedding Python code
=====================

It's possible to embed Python code into XIST XML files. For this XIST
supports two new processing instructions: pyexec and pyeval (in the
module ll.xist.ns.code). The content of pyexec will be executed when
the processing instruction node is converted.

The result of a call to convert for a pyeval processing instruction is
whatever the Python code in the content returns. The processing
instruction content is treated as the body of a function, so you can
put multiple return statements there. The converter is available as the
parameter converter inside the processing instruction. For example,
consider the following XML file:

    <?pyexec
       # sum
       def gauss(top=100):
          sum = 0
          for i in xrange(top+1):
             sum += i
          return sum
    ?>
    <b><?pyeval return gauss()?></b>


Parsing this file and calling convert results in the following:

    <b>5050</b>
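
The idea of treating the content of a processing instruction as the
body of a function can be sketched with exec. This is an illustration
under assumptions, not XIST's implementation: the functions pyexec and
pyeval, the shared namespace dictionary and the internal name
__pyeval__ are all hypothetical, and the code uses Python 3 syntax:

```python
import textwrap


def pyexec(code, namespace):
    """Execute code for its side effects, e.g. function definitions."""
    exec(textwrap.dedent(code), namespace)


def pyeval(code, namespace, converter=None):
    """Run code as the body of a function and return its result."""
    body = textwrap.indent(textwrap.dedent(code), "    ")
    # Wrapping the content in a def is what makes return statements
    # legal and makes converter available as a parameter.
    exec("def __pyeval__(converter):\n" + body, namespace)
    return namespace["__pyeval__"](converter)


namespace = {}
pyexec(
    """
    def gauss(top=100):
        return sum(range(top + 1))
    """,
    namespace,
)
result = pyeval("return gauss()", namespace)
print(result)  # 5050
```

Because both calls share one namespace, the function defined in the
first step is visible in the second, mirroring the XML example above.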

