Programming with the Document Object Model

This section is of a technical nature and is for readers who are familiar with C++, Java, and JavaScript programming. You may wish to skip this section if you are solely interested in viewing and creating techexplorer documents.

Basic DOM concepts

The Document Object Model (DOM) consists of a collection of interfaces that provide document traversal, node creation, and structure modification operations to external clients, allowing these clients to interact with the content of a document object. The design of the DOM interfaces closely follows the XML standard notions of a document containing elements and attributes, and is primarily intended to be used as a programming interface for scripting XML documents.

Since the DOM interfaces are defined as collections of methods, they do not presuppose any particular representation of a Document in memory, nor do they impose any given syntax on linearized forms of a Document. As a result, the DOM interfaces may be used to manipulate virtually any tree-structured data containing Text at the leaves of the tree.

techexplorer DOM Documents

Since techexplorer LaTeX and Mathematical Markup Language documents are represented internally as trees of nodes containing text, it is possible to provide a DOM interface on these internal data structures by defining the appropriate methods for each of the internal techexplorer node types. The techexplorer DOM API allows external programs to call these methods, implemented as part of the core techexplorer plugin and ActiveX control, using either the Java or the C++ binding of these methods provided by techexplorer.

Java

The Java binding of the techexplorer DOM methods is exposed using a collection of Java classes found in the package ibm.techexplorer.dom. Objects from these classes forward DOM method calls to techexplorer, allowing applications to treat these objects as if they were the actual objects present in the techexplorer document.

C++

The C++ binding of the techexplorer DOM methods is exposed using a collection of C++ classes found in the directory AddIns/DOM. Objects from these classes forward DOM method calls to techexplorer, and handle memory management of DOM objects as they are created, added to, and removed from the techexplorer document.

Core DOM interfaces

Node

The Core DOM interfaces are defined by the DOM specification. Most, but not all, of these interfaces inherit from the Node interface, which supplies basic tree traversal operations such as getParentNode, getFirstChild, and getNextSibling.

Document

The Document interface encapsulates the top-level structure of a document, and also provides methods for creating new nodes (createElement, createAttribute) belonging to the document.

Element

Internal nodes in the document tree are represented using the Element interface, which inherits from Node and provides operations for accessing the attributes associated with the node, such as getAttribute and setAttribute.

Text

Leaf nodes in a document tree consist of document text, represented by the Text interface defined by the DOM specification. Text nodes may be removed and inserted into the tree just as any other type of node, but there are also methods for modifying the content of the text, such as appendData and splitText.
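
The text-editing methods mentioned above can be tried outside techexplorer. The following is a minimal sketch that assumes the JDK's built-in org.w3c.dom implementation as a stand-in for the techexplorer binding; the class name TextNodeSketch and the helper newDocument are hypothetical, and techexplorer itself may behave differently for its own node types.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical class; uses the JDK DOM, not the techexplorer binding.
class TextNodeSketch {
    // Creates an empty standard DOM Document.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns { data kept by the original node, data of the split-off node, child count }.
    static String[] demo() {
        Document doc = newDocument();
        Element parent = doc.createElement("compound");
        Text text = doc.createTextNode("this is");
        parent.appendChild(text);

        // appendData extends the content of the existing Text node in place.
        text.appendData(" a test");

        // splitText cuts the node at the offset; the tail becomes the next sibling.
        Text tail = text.splitText(4);

        int children = parent.getChildNodes().getLength();
        return new String[] { text.getData(), tail.getData(), String.valueOf(children) };
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Because the Text node already has a parent when splitText is called, the new node is inserted into the tree automatically, so the parent ends up with two Text children.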

Attribute

Element nodes in the document tree may have an associated collection of attributes, which are essentially key-value pairs that modify the behavior of the element. A DOM Attribute is also a Node, but it is not considered to be a child of the Element it modifies, so it does not appear in the list of children exposed by the usual Node traversal methods.
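
The distinction between attributes and children can be checked with a short experiment. This is a sketch only, assuming the JDK's standard org.w3c.dom implementation rather than techexplorer; the class name AttributeSketch is hypothetical, and the element and attribute names are borrowed from the LaTeX tables topic purely for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class; uses the JDK DOM, not the techexplorer binding.
class AttributeSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns { number of children, number of attributes, 1 if the Attr has no parent }.
    static int[] demo() {
        Document doc = newDocument();
        Element elt = doc.createElement("array-elt");
        elt.setAttribute("rowspan", "2");
        elt.setAttribute("colspan", "3");
        elt.appendChild(doc.createTextNode("x"));

        // Attributes are reached through getAttributes / getAttributeNode,
        // never through the child list.
        Attr rowspan = elt.getAttributeNode("rowspan");
        int parentIsNull = (rowspan.getParentNode() == null) ? 1 : 0;

        return new int[] {
            elt.getChildNodes().getLength(),
            elt.getAttributes().getLength(),
            parentIsNull
        };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("children=" + r[0] + " attributes=" + r[1] + " attrHasNoParent=" + r[2]);
    }
}
```

The element reports one child (the Text node) and two attributes, and the Attr node reports no parent even though it belongs to the element.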

Calling the standard DOM methods

The following topics provide an introduction to the use of some of the standard DOM methods for traversing, creating, and modifying techexplorer nodes.

Document traversal

There are two standard ways of traversing the DOM document tree using the methods provided by the Node and NodeList interfaces.

The first method asks the node for its list of child nodes, then retrieves the children from the list one at a time by index.

Node node = ... some input node ...
NodeList list = node.getChildNodes();
int count = list.getLength();
for ( int i = 0; i < count; i += 1 ) {
    Node child = list.item(i);
    ... operate on the child node ...
}

The second method asks the node for its first child, then repeatedly calls getNextSibling to move to the next child in the list. (This method also works with getLastChild and getPreviousSibling.)

Node node = ... some input node ...
Node child = node.getFirstChild();
while ( child != null ) {
    ... operate on the child node ...
    child = child.getNextSibling();
}

Either method may be useful depending on the needs of the application.
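
Both traversal styles extend naturally to a recursive walk over a whole subtree. The sketch below assumes the JDK's standard DOM parser as a stand-in for a document supplied by techexplorer; the class name TraversalSketch, the helper parse, and the sample XML are hypothetical. It counts the descendants of a node once with each style so the results can be compared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; uses the JDK DOM, not the techexplorer binding.
class TraversalSketch {
    // Parses an XML string into a standard DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Style 1: index into the NodeList returned by getChildNodes.
    static int countByList(Node node) {
        NodeList list = node.getChildNodes();
        int total = 0;
        for (int i = 0; i < list.getLength(); i += 1) {
            total += 1 + countByList(list.item(i));
        }
        return total;
    }

    // Style 2: follow the sibling chain starting at getFirstChild.
    static int countBySiblings(Node node) {
        int total = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            total += 1 + countBySiblings(child);
        }
        return total;
    }

    // Returns the descendant count of the root element computed with each style.
    static int[] demo() {
        Document doc = parse("<compound><a>first</a><b>second</b></compound>");
        Node root = doc.getDocumentElement();
        return new int[] { countByList(root), countBySiblings(root) };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("by list: " + r[0] + ", by siblings: " + r[1]);
    }
}
```

For the sample input both counts are 4: two child elements and one Text node under each. The sibling-chain style avoids repeated index lookups, while the list style is convenient when the position of each child matters.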

Node creation

Creating new nodes in a DOM document is accomplished using the methods provided by the Document interface. The general pattern is that the application obtains access to the Document object, then calls the appropriate creation method with the arguments needed to create the desired node type.

Creating a new Element node requires knowing the name of the node type to be created. In an XML or MathML document this name corresponds to the element tag name in the document. In a LaTeX document, techexplorer defines an element tag name for each of the internal node types used by techexplorer to represent syntax elements from the LaTeX source.

Document doc = ... reference to the Document object ...
String input = "compound";
Element node = doc.createElement( input );
String output = node.getNodeName();

Creating a new Text node requires the application to provide the text that will stand as the contents of the new node.

Document doc = ... reference to the Document object ...
String input = "this is a test";
Text text = doc.createTextNode( input );
String output = text.getData();
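
The two creation fragments above can be combined into one runnable program. This is a minimal sketch assuming the JDK's standard org.w3c.dom implementation in place of the Document object obtained from techexplorer; the class name CreationSketch is hypothetical. It also shows that a freshly created node is not yet part of the tree.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical class; uses the JDK DOM, not the techexplorer binding.
class CreationSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns { element name, text data, "true" if neither node has a parent yet }.
    static String[] demo() {
        Document doc = newDocument();

        // The tag name passed to createElement is what getNodeName reports.
        Element node = doc.createElement("compound");

        // The string passed to createTextNode is what getData reports.
        Text text = doc.createTextNode("this is a test");

        // Created nodes belong to the document but are not attached to the tree.
        boolean detached = node.getParentNode() == null && text.getParentNode() == null;

        return new String[] { node.getNodeName(), text.getData(), String.valueOf(detached) };
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```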

The DocumentExplorer is a good way to discover, for any input document, the names of the element nodes used internally by techexplorer to represent that document, as well as the parent-child relationships inherent in the document. This information can be extremely useful in determining what Elements and Text nodes to create to achieve a given effect.

The techexplorer node types may be classified into those that accept zero or more children, and those that require a fixed number of children. Those node types that accept zero or more children are typically created with zero children. Those that require a fixed number of children are generally created with acceptable default (usually empty) values for those children.

Structure modification

The node creation methods provided by the Document interface construct only one node at a time. After a node is created, it may be inserted into or removed from the document tree using the structure modification methods provided by the Node interface.

For nodes that accept zero or more children, the appendChild method is a convenient way of adding nodes to the tree:

Document doc = ... reference to the Document object ...
Element node = doc.createElement( "compound" );
Node child1 = doc.createTextNode( "first" );
Node child2 = doc.createTextNode( "second" );
Node child3 = doc.createTextNode( "third" );
node.appendChild( child1 );
node.appendChild( child2 );
node.appendChild( child3 );

The insertBefore method can also be used to add a new child at a specific point in the list of children.

The removeChild method removes a child node from its position in its parent. The removed node may then be reused for other purposes.
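
The three methods described so far (appendChild, insertBefore, and removeChild) can be exercised together. The sketch below assumes the JDK's standard DOM as a stand-in for techexplorer; the class name ModificationSketch is hypothetical. It builds a list of children, inserts one in the middle, then removes a child and reuses it at the end of the list.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class; uses the JDK DOM, not the techexplorer binding.
class ModificationSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Joins the values of the children of a node with commas.
    static String childValues(Node parent) {
        StringBuilder out = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(c.getNodeValue());
        }
        return out.toString();
    }

    static String demo() {
        Document doc = newDocument();
        Element node = doc.createElement("compound");
        Node first = doc.createTextNode("first");
        Node second = doc.createTextNode("second");
        Node third = doc.createTextNode("third");

        node.appendChild(first);
        node.appendChild(third);

        // insertBefore places the new child ahead of an existing reference child.
        node.insertBefore(second, third);

        // removeChild returns the detached node, which can then be reused.
        Node removed = node.removeChild(first);
        node.appendChild(removed);

        return childValues(node);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

After these calls the children appear in the order second, third, first.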

For nodes that require a fixed number of children, the replaceChild method can be used to create the desired structure:

Document doc = ... reference to the Document object ...
Element link = doc.createElement( "document-link" );
Node child1 = doc.createTextNode( "first" );
Node child2 = doc.createTextNode( "second" );
link.replaceChild( child1, link.getFirstChild() );
link.replaceChild( child2, link.getLastChild() );

For these node types, the appendChild and insertBefore methods raise an exception, since the result of these methods would create an invalid node tree. For the same reason, the removeChild method for such nodes instead replaces the child being removed with an empty node.
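
The fixed-arity pattern can be imitated with the standard DOM for experimentation. This is a sketch under stated assumptions: the JDK DOM does not create default children, so the hypothetical helper newLink appends two placeholder elements (the name "empty" is invented for illustration) to mimic what the text describes techexplorer doing for such node types. The class name FixedAritySketch is also hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class; uses the JDK DOM, not the techexplorer binding.
class FixedAritySketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumption: imitates a node type that is created with two empty children.
    static Element newLink(Document doc) {
        Element link = doc.createElement("document-link");
        link.appendChild(doc.createElement("empty"));
        link.appendChild(doc.createElement("empty"));
        return link;
    }

    // Returns "childCount:firstValue:lastValue" after both placeholders are replaced.
    static String demo() {
        Document doc = newDocument();
        Element link = newLink(doc);

        // replaceChild takes the new child first and the child to replace second.
        link.replaceChild(doc.createTextNode("first"), link.getFirstChild());
        link.replaceChild(doc.createTextNode("second"), link.getLastChild());

        return link.getChildNodes().getLength()
                + ":" + link.getFirstChild().getNodeValue()
                + ":" + link.getLastChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because each call swaps one child for another, the number of children stays at two throughout, which is the invariant the fixed-arity node types require.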

techexplorer DOM extensions

techexplorer supplies extensions to the DOM interfaces that fill two roles not addressed by the DOM specification:

document creation methods that DOM explicitly leaves unspecified
convenience methods for creating DOM trees from linearized strings

Document creation

Typically, a techexplorer Document is created when parsed from an existing file containing either LaTeX or MathML syntax. On parsing, techexplorer creates the Document object, and the external application need only access this object using the appropriate method.

For example, from Java, once the applet has accessed the techexplorer object provided by the techexplorer API, the getDocumentNode method can be used to obtain the DOM Document object:

techexplorer techexpl = ... reference to the plugin or ActiveX control object ...
Document document = techexpl.getDocumentNode();
... create and modify nodes as needed ...

Convenience methods

The getDocumentNode method actually returns an object of class TEDocument (from package ibm.techexplorer.dom) that implements the Document interface. In addition, TEDocument objects provide convenience methods for creating more complex node structures.

Document doc = ... reference to the Document object ...
TEDocument tedoc = (TEDocument) doc;
Node node = tedoc.createFromTexString( "\\docLink{foo}{bar}" );
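
The idea of building a subtree from a linearized string can be illustrated without techexplorer. The sketch below is an analogy only: it assumes the JDK's standard DOM and XML input instead of TeX, and the helper createFromXmlString, the class name FromStringSketch, and the child element names in the sample string are all hypothetical. It is not how createFromTexString is implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class; uses the JDK DOM, not the techexplorer binding.
class FromStringSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the string in a scratch document, then copies the root into the target.
    static Node createFromXmlString(Document target, String xml) {
        try {
            Document scratch = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // importNode makes a deep copy owned by the target document.
            return target.importNode(scratch.getDocumentElement(), true);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // Returns "name:childCount:ownedByTarget" for the imported subtree.
    static String demo() {
        Document doc = newDocument();
        Node node = createFromXmlString(doc,
                "<document-link><target>foo</target><label>bar</label></document-link>");
        return node.getNodeName()
                + ":" + node.getChildNodes().getLength()
                + ":" + (node.getOwnerDocument() == doc);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

As with a node returned by a creation method, the imported subtree is owned by the document but still has to be attached with one of the structure modification methods.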

Using the DOM for LaTeX documents

The following topics highlight several areas of interest when using the DOM to access LaTeX documents.

LaTeX tables

The content model (that is, the child elements) of array elements has changed since version 2.5 in several ways:

The rows of the array are collected into a single <rows> element. Previously the rows were direct children of the array element.
The columns of the array are presented under a new <cols> element. Previously the columns were only implicit in the content model.
The elements of the array are enclosed in an <array-elt> element. Previously the elements were direct children of the row.
Interrow material (\hline) is now enclosed in an <inter-rows> element that contains a sequence of (n+1) <inter-row> elements.
Intercolumn material (|@) is now enclosed in an <inter-cols> element that contains a sequence of (n+1) <inter-col> elements.
Row alignment is now enclosed in an <align-rows> element that contains a sequence of n <align-row> elements.
Column alignment (lrcp) is now enclosed in an <align-cols> element that contains a sequence of n <align-col> elements.

So a typical array element would look like:

     <array>
         <rows>
             <array-row>
                 <array-elt>
                 ...
             ...
         <cols>
             <array-col>
                 <array-elt>
                 ...
             ...
         <inter-rows>
             <inter-row>
             ...
         <inter-cols>
             <inter-col>
             ...
         <align-rows>
             <align-row>
             ...
         <align-cols>
             <align-col>
             ...
     </array>

There are two additional attributes defined for the array elements:

     modrows="true|false"
     modcols="true|false"

Setting 'modrows="true"' allows an application to perform DOM operations on the <rows> child element of the array. Setting 'modrows="false"' signals that the modifications are complete, and that the structure of the <cols> element should be recomputed.

Setting 'modcols="true"' allows an application to perform DOM operations on the <cols> child element of the array. Setting 'modcols="false"' signals that the modifications are complete, and that the structure of the <rows> element should be recomputed.

Array elements may be given attributes:

     rowspan="n"
     colspan="n"

indicating the array element spans the given number of rows and/or columns.

Quick start guide for porting existing DOM applications

Set modrows/modcols before making modifications to the array element
Append array rows to the <rows> element rather than the array element
Ensure each array element is contained in an <array-elt> node
Append interrow material to the <inter-row> element
Append intercolumn material to the <inter-col> element
Append column alignment material to the <align-col> element
Clear modrows/modcols after making modifications to the array element
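
The row-related steps of the checklist above might look as follows in code. This is a sketch only, assuming the JDK's standard DOM as a stand-in: there the modrows attribute is an ordinary attribute with no side effects, and the <rows> element must be created by hand, whereas in techexplorer the array element would already exist in the document. The class name ArrayPortingSketch and the helper buildArray are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class; uses the JDK DOM, not the techexplorer binding.
class ArrayPortingSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <array><rows><array-row><array-elt>...</array-elt>...</array-row>...</rows></array>.
    static Element buildArray(Document doc, String[][] cells) {
        Element array = doc.createElement("array");
        Element rows = doc.createElement("rows");
        array.appendChild(rows);

        // Step 1: set modrows before modifying the rows.
        array.setAttribute("modrows", "true");

        for (String[] rowCells : cells) {
            Element row = doc.createElement("array-row");
            for (String cell : rowCells) {
                // Each array element is wrapped in its own <array-elt> node.
                Element elt = doc.createElement("array-elt");
                elt.appendChild(doc.createTextNode(cell));
                row.appendChild(elt);
            }
            // Rows are appended to <rows>, not to the array element itself.
            rows.appendChild(row);
        }

        // Final step: clear modrows to signal that the modifications are complete.
        array.setAttribute("modrows", "false");
        return array;
    }

    // Returns "rowCount:cellsInFirstRow:modrows" for a 2 x 2 sample array.
    static String demo() {
        Document doc = newDocument();
        Element array = buildArray(doc, new String[][] { { "a", "b" }, { "c", "d" } });
        Element rows = (Element) array.getFirstChild();
        return rows.getChildNodes().getLength()
                + ":" + rows.getFirstChild().getChildNodes().getLength()
                + ":" + array.getAttribute("modrows");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Interrow, intercolumn, and alignment material would be handled along the same lines, using the element names listed in the checklist.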

IBM techexplorer Hypermedia Browser is a trademark of the IBM Corporation. Send comments and questions to techexpl@us.ibm.com.