Discover key features of DOM Level 3 Core, Part 1



Manipulating and comparing nodes, handling text and user data

Level: Intermediate

Arnaud Le Hors (lehors@us.ibm.com), Senior Software Engineer, IBM
Elena Litani (elitani@ca.ibm.com), Staff Software Developer, IBM

19 August 2003

In this two-part article, the authors present some of the key features introduced by the W3C Document Object Model (DOM) Level 3 Core Working Draft and show you how to use them, with examples in Java code. This first part covers manipulating nodes and text, and attaching user data to nodes.

The Document Object Model (DOM) is one of the most widely available APIs. It provides a structural representation of an XML document, enabling users to access and modify its contents. The DOM Level 3 Core specification, which is now in Last Call status, is the latest in a series of DOM specifications produced by the W3C. It provides a set of enhancements that make several common operations much simpler to perform, and that make possible certain things you simply could not do before. It also supports the latest versions of related standards, such as Namespaces in XML, XML Information Set, and XML Schema, and thus provides a more complete view of the XML data in memory.

The first part of this article covers operations on nodes; the second part focuses on operations on documents and type information, and explains how to use DOM in Xerces.

Renaming and moving nodes from one document to another
In DOM Level 2, renaming a node was a relatively expensive operation: You had to create a new node, copy all the data to the new node, insert the new node into the tree, and delete the old one.
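To make those DOM Level 2 steps concrete, here is a minimal Java sketch of a manual element rename. It assumes the JDK's built-in DOM obtained through JAXP, and the class and method names (Level2Rename, renameElement, newDocument) are hypothetical. It also ignores things a real application might care about, such as references other code still holds to the old element.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class Level2Rename {

    // Creates an empty, namespace-aware document with the JDK's built-in DOM
    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: renames an element using only DOM Level 2 calls
    static Element renameElement(Element old, String namespaceURI, String qualifiedName) {
        Document doc = old.getOwnerDocument();

        // 1. Create a new node with the new name
        Element renamed = doc.createElementNS(namespaceURI, qualifiedName);

        // 2. Copy the data: clone every attribute onto the new element...
        NamedNodeMap attrs = old.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            renamed.setAttributeNodeNS((Attr) attrs.item(i).cloneNode(true));
        }
        // ...and move the children over (appendChild detaches each one from 'old')
        while (old.getFirstChild() != null) {
            renamed.appendChild(old.getFirstChild());
        }

        // 3 and 4. Put the new node in the tree in place of the old one
        Node parent = old.getParentNode();
        if (parent != null) {
            parent.replaceChild(renamed, old);
        }
        return renamed;
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element root = doc.createElementNS("http://example.com", "root");
        doc.appendChild(root);
        Element street = doc.createElementNS("http://example.com", "street");
        street.setAttributeNS(null, "id", "s1");
        street.appendChild(doc.createTextNode("Main St"));
        root.appendChild(street);

        Element address = renameElement(street, "http://example.com", "address");
        System.out.println(root.getFirstChild().getNodeName());     // address
        System.out.println(address.getAttributeNS(null, "id"));      // s1
        System.out.println(address.getFirstChild().getNodeValue());  // Main St
    }
}
```

Any variable that still points at the old element now refers to a detached, empty node, which is part of what makes a single built-in rename call attractive.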

The Document interface of DOM Level 3 now has a new method that does all this for you: renameNode allows you to rename an attribute or an element in the tree in a single call. It is important to note that while this operation attempts to simply change the name of the existing node, in some cases the implementation may not be able to actually rename the node. Instead, it may be forced to create a new node with the new name and replace the existing node with the new one. The reason is that the DOM is designed to work on many different types of implementations, and in some of them changing the name of an element or attribute is not as simple as changing a field in an object. For example, in Web browsers renaming an element "P" to "INPUT" would amount to transforming a paragraph into a form field, which may be neither possible nor desirable. So instead, the browser creates a new node and replaces the old one with it. Nevertheless, this is largely transparent to you: renameNode returns the renamed node, which may or may not be the object you passed in, so as long as you use its return value you end up with a node that has the name you want.

Often, you have two documents in memory and you would like to merge them, or to include part of one document in the other. In DOM Level 2 you could do something similar by using the importNode method on the Document interface. However, this method does not alter the original tree. Instead, it creates a clone of the source node and its descendants that you can then insert into the destination document. This is fine if that's what you want, but it's somewhat annoying if what you really want is to move the node from one document to another. Not only does it force you to clean up the source nodes that are left behind, but it can also be expensive if the subtree you're moving is large.
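The DOM Level 2 workaround, copy and then clean up, might look like the following sketch. The class and method names (Level2Move, moveByCopy) are hypothetical, and the JDK's built-in DOM is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class Level2Move {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: "moves" a subtree using DOM Level 2 calls only
    static Node moveByCopy(Node source, Element destParent) {
        Document dest = destParent.getOwnerDocument();
        // importNode leaves the source untouched and returns a deep copy
        // owned by the destination document
        Node copy = dest.importNode(source, true);
        destParent.appendChild(copy);
        // The original is still in the source tree: remove it by hand
        Node oldParent = source.getParentNode();
        if (oldParent != null) {
            oldParent.removeChild(source);
        }
        return copy;
    }

    public static void main(String[] args) throws Exception {
        Document doc1 = newDocument();
        Element orders = doc1.createElement("orders");
        doc1.appendChild(orders);
        Element item = doc1.createElement("item");
        item.appendChild(doc1.createTextNode("widget"));
        orders.appendChild(item);

        Document doc2 = newDocument();
        Element archive = doc2.createElement("archive");
        doc2.appendChild(archive);

        Node copy = moveByCopy(item, archive);
        System.out.println(copy == item);                          // false: a brand-new node
        System.out.println(orders.hasChildNodes());                // false: original removed
        System.out.println(archive.getFirstChild().getNodeName()); // item
    }
}
```

For a large subtree, every node is duplicated before the originals are thrown away, which is the cost the text refers to.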

With DOM Level 3, you can now do this more efficiently with adoptNode. This method, also found on the Document interface, moves a subtree from one document to another: it removes the subtree from its parent, if it has one, and changes the ownerDocument of every node in it. Listing 1 shows how easy it can be to rename nodes and move elements between documents.

Listing 1. Moving elements and renaming nodes


// Renaming nodes
Element element = document.createElementNS("http://example.com", "street");
// renameNode is declared to return a Node, hence the cast; if the
// implementation can rename the node in place, the object returned
// is the same one that was originally created
element = (Element) document.renameNode(element, "http://example.com", "address");

// adopting the previously created node into a different document;
// adoptNode detaches it from its parent, if any, and changes its ownerDocument
Node adoptedNode = document2.adoptNode(element);
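Listing 1 is a fragment. A self-contained version that you can compile and run might look like the following. It assumes the JDK's built-in DOM implementation (obtained through JAXP), and the class and method names (Listing1Demo, renameAndAdopt) are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class Listing1Demo {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: renames the element to {http://example.com}address
    // and moves it into another document, as in Listing 1
    static Node renameAndAdopt(Element element, Document target) {
        Document source = element.getOwnerDocument();
        // Always use the returned node: it may be a replacement for 'element'
        Node renamed = source.renameNode(element, "http://example.com", "address");
        // The same node, now owned by 'target' (or null if the move failed)
        return target.adoptNode(renamed);
    }

    public static void main(String[] args) throws Exception {
        Document document = newDocument();
        Document document2 = newDocument();

        Element element = document.createElementNS("http://example.com", "street");
        element.appendChild(document.createTextNode("Main St"));

        Node adopted = renameAndAdopt(element, document2);
        if (adopted == null) {
            throw new IllegalStateException("node could not be adopted");
        }
        document2.appendChild(adopted); // becomes the document element of document2

        System.out.println(document2.getDocumentElement().getLocalName()); // address
        System.out.println(adopted.getOwnerDocument() == document2);       // true
        System.out.println(adopted.getTextContent());                      // Main St
    }
}
```

Unlike the importNode approach, nothing is copied: the subtree, including its text child, simply changes owner.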

Again, because the DOM is designed to work on many different types of implementations, and because the source and destination documents may belong to two different implementations, moving nodes from one document to the other may not always be possible. In this case, adoptNode throws a NOT_SUPPORTED_ERR DOMException that you can catch. You only need to handle this case if your application actually works with several DOM implementations at the same time.
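If your application does need to cope with nodes that cannot be adopted, one defensive pattern is to try adoptNode first and fall back to the DOM Level 2 copy-and-remove approach. The following is a sketch of that idea, not a prescribed API: moveInto and SafeMove are hypothetical names, and the helper treats both a NOT_SUPPORTED_ERR exception and a null return value as failure, because the final DOM Level 3 Core Recommendation specifies that adoptNode returns null when the source node comes from a different implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SafeMove {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: adopt if possible, otherwise copy and remove
    static Node moveInto(Node node, Document target) {
        Node moved = null;
        try {
            moved = target.adoptNode(node);
        } catch (DOMException e) {
            if (e.code != DOMException.NOT_SUPPORTED_ERR) {
                throw e; // some other problem: don't hide it
            }
        }
        if (moved == null) {
            moved = target.importNode(node, true); // deep copy owned by 'target'
            Node parent = node.getParentNode();
            if (parent != null) {
                parent.removeChild(node);          // clean up the original
            }
        }
        return moved;
    }

    public static void main(String[] args) throws Exception {
        Document source = newDocument();
        Element root = source.createElement("root");
        source.appendChild(root);
        Element child = source.createElement("child");
        root.appendChild(child);

        Document target = newDocument();
        Node moved = moveInto(child, target);
        System.out.println(moved.getOwnerDocument() == target); // true
        System.out.println(root.hasChildNodes());               // false
    }
}
```

With a single implementation, as here, the fallback branch never runs and the helper behaves exactly like adoptNode.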

Comparing nodes
DOM Level 3 brings a set of methods to compare nodes in many different ways. These include methods to test whether two nodes are equal or are the same node, and to find out how they are positioned relative to each other in the document tree. You are probably familiar with the concepts of identity and equality. In the Java language, identity is tested with the == operator, whereas equality is tested with a method such as equals. For two objects to be identical, they have to be the same object in memory; for two objects to be equal, all they need is to have the same characteristics. Therefore, two objects that are identical are equal, but two objects that are equal are not necessarily identical.

DOM Level 3 defines what it takes for two nodes to be equal and provides a method, isEqualNode on Node, to perform this test. For example, if you create two empty element nodes named "foo" without any attributes, they are equal, even though they are not identical.
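The "foo" example can be checked directly with a short program. This assumes the JDK's built-in DOM; the class name EqualNodes is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EqualNodes {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element a = doc.createElement("foo");
        Element b = doc.createElement("foo");

        System.out.println(a == b);           // false: two distinct objects
        System.out.println(a.isEqualNode(b)); // true: same name, no attributes, no children

        b.setAttribute("id", "x");
        System.out.println(a.isEqualNode(b)); // false: the attributes now differ
    }
}
```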

You could use the == operator to test for identity; however, some DOM implementations with a complex internal structure do not expose their objects directly as nodes, but instead create proxies that are returned to the application, and they may create more than one proxy for the same node. This means that the object returned by a DOM operation such as getFirstChild may be different every time you call that method, even if nothing else has changed. In this case, if you compare the returned objects with ==, you will find that they are not identical, even though they really are references to the same node inside the implementation. The way to find this out is to use isSameNode, which tells you whether the objects you are looking at are proxies for the same node or genuinely different nodes.

Consistent with what we said about identical objects being equal, isEqualNode always returns "true" if isSameNode returns "true". But two nodes that are equal are not necessarily the same.
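The difference between isSameNode and isEqualNode shows up as soon as you clone a node. In the following sketch (hypothetical class name SameNodes, JDK built-in DOM assumed), two calls to getFirstChild yield the same node, while a clone is equal to the original but not the same node. The JDK's DOM returns its node objects directly, so == would give the same answers here; isSameNode is the portable choice.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SameNodes {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        root.appendChild(doc.createElement("child"));

        // In a proxy-based implementation these could be two different
        // objects; isSameNode compares the underlying nodes instead
        Node first = root.getFirstChild();
        Node again = root.getFirstChild();
        System.out.println(first.isSameNode(again));  // true
        System.out.println(first.isEqualNode(again)); // true: same implies equal

        Node copy = first.cloneNode(true);
        System.out.println(copy.isSameNode(first));   // false: a different node
        System.out.println(copy.isEqualNode(first));  // true: but an equal one
    }
}
```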

The last addition that helps you compare nodes is the compareDocumentPosition method. This method allows you to find out how two nodes are positioned with respect to each other in the document tree. No more searching through your old books for the best algorithm to find out whether one node is positioned before the other in the tree. This method tells you all you need to know: whether one node is a descendant or an ancestor of the other, whether it comes before or after it, and so on. The result is a bit mask, so several of these facts can be reported at once.
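In the Java binding, a.compareDocumentPosition(b) returns a short bit mask describing where b is relative to a, with one DOCUMENT_POSITION_* constant (defined on org.w3c.dom.Node) per bit. The following sketch decodes that mask into words for a small tree; the describe helper and the class name are hypothetical, and the JDK's built-in DOM is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DocumentPosition {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: spells out where b is relative to a
    static String describe(Node a, Node b) {
        short pos = a.compareDocumentPosition(b);
        if (pos == 0) {
            return "same node";
        }
        StringBuilder sb = new StringBuilder();
        if ((pos & Node.DOCUMENT_POSITION_DISCONNECTED) != 0) sb.append("disconnected ");
        if ((pos & Node.DOCUMENT_POSITION_CONTAINS) != 0) sb.append("ancestor ");
        if ((pos & Node.DOCUMENT_POSITION_CONTAINED_BY) != 0) sb.append("descendant ");
        if ((pos & Node.DOCUMENT_POSITION_PRECEDING) != 0) sb.append("preceding ");
        if ((pos & Node.DOCUMENT_POSITION_FOLLOWING) != 0) sb.append("following ");
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Builds <root><a><a1/></a><b/></root>
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element a = doc.createElement("a");
        Element a1 = doc.createElement("a1");
        Element b = doc.createElement("b");
        root.appendChild(a);
        a.appendChild(a1);
        root.appendChild(b);

        System.out.println(describe(a, b));  // following
        System.out.println(describe(b, a));  // preceding
        System.out.println(describe(a, a1)); // descendant following
        System.out.println(describe(a1, a)); // ancestor preceding
        System.out.println(describe(a, a));  // same node
    }
}
```

A descendant is also reported as following, and an ancestor as preceding, so test individual bits rather than comparing the whole value against a single constant.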

In addition, what might look like a convenience function can actually be more than that. Operations like compareDocumentPosition are likely to be performed more efficiently by the implementation than by you, because the implementation knows what works best with its internal structure. For example, an operation that requires you to traverse the tree forces you to choose between walking it by getting the first child and then each next sibling, or by getting the list of child nodes and iterating over it. Depending on what the internal structure really looks like, one approach may be faster than the other. But you have no way to determine this, and even if you did, what is best for one implementation may not be best for another. If instead you call a method such as compareDocumentPosition and let the implementation traverse the tree for you, it can use whichever strategy suits its internal structure best. DOM Level 3 Core has several such additions; one of them is the textContent attribute, described in the following section.
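As an illustration of this trade-off, here are two ways to ask "is this node an ancestor of that one?". The first walks the parent chain by hand; the second defers to compareDocumentPosition and lets the implementation choose its own strategy. Both helpers and the class name AncestorCheck are hypothetical, the JDK's built-in DOM is assumed, and the two versions should agree on any tree.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class AncestorCheck {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    // Do-it-yourself: walk up from 'node'. Uses isSameNode rather than ==
    // so that it also works with proxy-based implementations.
    static boolean isAncestorByWalking(Node ancestor, Node node) {
        for (Node n = node.getParentNode(); n != null; n = n.getParentNode()) {
            if (n.isSameNode(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Delegated: the implementation decides how to answer
    static boolean isAncestorByPosition(Node ancestor, Node node) {
        short pos = ancestor.compareDocumentPosition(node);
        return (pos & Node.DOCUMENT_POSITION_CONTAINED_BY) != 0;
    }

    public static void main(String[] args) throws Exception {
        // Builds <root><section><para/></section><aside/></root>
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element section = doc.createElement("section");
        Element para = doc.createElement("para");
        Element aside = doc.createElement("aside");
        root.appendChild(section);
        section.appendChild(para);
        root.appendChild(aside);

        Node[][] pairs = { {root, para}, {section, para}, {aside, para}, {para, section} };
        for (Node[] p : pairs) {
            System.out.println(p[0].getNodeName() + " is ancestor of " + p[1].getNodeName()
                    + ": " + isAncestorByWalking(p[0], p[1])
                    + " / " + isAncestorByPosition(p[0], p[1]));
        }
    }
}
```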

Handling text
Until now, to replace the text content of an element node, you had to remove its children, create a Text node with the new content, and insert it as a child of the Element node, as shown in Listing 2. Retrieving the content also required several steps, since you had to walk the element's descendants and concatenate the data of every Text node you found.

Listing 2. Replacing the text content of an element with DOM Level 2.


// Assuming elem has two children: a comment and
// a text node
NodeList list = elem.getChildNodes();
// The NodeList is live, so iterate backwards: removing a
// child then never shifts an index still to be visited
for (int i = list.getLength() - 1; i >= 0; i--) {
        elem.removeChild(list.item(i));
}
elem.appendChild(document.createTextNode("content"));
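Retrieving the text of an element with DOM Level 2 meant writing a recursive walk like the following sketch. collectText and Level2Text are hypothetical names, and the JDK's built-in DOM is assumed. The helper concatenates Text and CDATA section descendants in document order and skips comments and processing instructions, so for an element it should produce the same string as the Level 3 getTextContent method.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class Level2Text {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: DOM Level 2 way of reading an element's text
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(child.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                case Node.ENTITY_REFERENCE_NODE:
                    sb.append(collectText(child)); // recurse into the subtree
                    break;
                default:
                    break; // comments, processing instructions: ignored
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Builds <p><!--draft-->Hello, <b>world</b></p>
        Document doc = newDocument();
        Element p = doc.createElement("p");
        p.appendChild(doc.createComment("draft"));
        p.appendChild(doc.createTextNode("Hello, "));
        Element b = doc.createElement("b");
        b.appendChild(doc.createTextNode("world"));
        p.appendChild(b);

        System.out.println(collectText(p)); // Hello, world
    }
}
```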

With DOM Level 3 it is now much easier to retrieve and set the text content of an Element node. The new read/write textContent attribute makes each of these a single step. Setting it removes all the child nodes and, unless the new value is null or empty, replaces them with a single Text node containing the given string. Getting it returns the concatenated text content of the node and its descendants; comments and processing instructions are skipped.

Listing 3. Retrieving the text content of an element and modifying it with DOM Level 3.


String oldContent = elem.getTextContent();
elem.setTextContent("content");
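The following sketch (hypothetical class name TextContentDemo, JDK built-in DOM assumed) shows both directions of textContent on an element that mixes a comment, a Text node, and a child element:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TextContentDemo {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        // Builds <p><!--draft-->Hello, <b>world</b></p>
        Document doc = newDocument();
        Element elem = doc.createElement("p");
        elem.appendChild(doc.createComment("draft"));
        elem.appendChild(doc.createTextNode("Hello, "));
        Element b = doc.createElement("b");
        b.appendChild(doc.createTextNode("world"));
        elem.appendChild(b);

        // Getting: text of all descendants, the comment is skipped
        String oldContent = elem.getTextContent();
        System.out.println(oldContent);                                         // Hello, world

        // Setting: all three children are replaced by a single Text node
        elem.setTextContent("content");
        System.out.println(elem.getChildNodes().getLength());                   // 1
        System.out.println(elem.getFirstChild().getNodeType() == Node.TEXT_NODE); // true

        // Setting an empty string just removes the children
        elem.setTextContent("");
        System.out.println(elem.hasChildNodes());                               // false
    }
}
```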

This also makes it straightforward to create elements that simply contain a piece of text -- all you need to do is create the element and set its textContent. This basically gets the Text nodes out of the way and lets you deal with the text in your document more directly.
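For example, a small record-like structure can be built without creating any Text nodes explicitly. textElement and TextElements are hypothetical names and the JDK's built-in DOM is assumed; with DOM Level 2 you would have needed an extra createTextNode and appendChild call for each element.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TextElements {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    // Hypothetical helper: creates <name>text</name> in one step
    static Element textElement(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        return e;
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element address = doc.createElement("address");
        doc.appendChild(address);
        address.appendChild(textElement(doc, "street", "Main St"));
        address.appendChild(textElement(doc, "city", "Springfield"));

        System.out.println(address.getFirstChild().getTextContent()); // Main St
        System.out.println(address.getTextContent());                 // Main StSpringfield
    }
}
```

Note that getTextContent on the parent simply concatenates its descendants' text, with no separator.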

Another useful addition is the new wholeText attribute on the Text interface. It returns all the text contained in the logically adjacent text nodes. In practice, this means that when you reach a Text child of an element, you can get all the text at that position in the document in one call. You no longer have to worry about the possibility that your text is held by several adjacent Text nodes that need to be concatenated; the wholeText attribute gives you the answer directly.
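The following sketch (hypothetical class name WholeTextDemo, JDK built-in DOM assumed) builds an element whose text is split across three adjacent Text nodes, followed by a comment and one more Text node. The comment ends the run of logically adjacent nodes, so the text after it is not part of the whole text.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class WholeTextDemo {

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element p = doc.createElement("p");
        Text first = doc.createTextNode("Hello, ");
        Text second = doc.createTextNode("wor");
        p.appendChild(first);
        p.appendChild(second);
        p.appendChild(doc.createTextNode("ld"));
        p.appendChild(doc.createComment("note")); // breaks the run of text nodes
        Text last = doc.createTextNode("!");
        p.appendChild(last);

        System.out.println(first.getData());       // "Hello, " only
        System.out.println(first.getWholeText());  // Hello, world
        System.out.println(second.getWholeText()); // Hello, world
        System.out.println(last.getWholeText());   // !
    }
}
```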

User data
In many cases, the DOM does not actually contain all the data you have in your application; it's only one part of it. In fact, a DOM node often relates to some other object in your application. The challenge is managing the relationship between the two structures. In the past, to do this you had to store a reference to the DOM node in your structure or, if that was impossible, keep yet another structure, such as a hash table, to record how to go from one structure to the other. As a result, it could be a real pain to keep these in sync when the DOM mutated. In particular, nodes could be modified or deleted without your ever knowing about it, and therefore without your having a chance to update your own structure accordingly.

DOM Level 3 can do a lot of this work for you. First, it allows you to store a reference to your application object on a Node. The object is associated with a key that you can use to retrieve that object later. You can have as many objects on a Node as you want; all you need to do is use different keys. Second, you can register a handler that is called when anything that could affect your own structure occurs. These are events such as a node being cloned, imported to another document, deleted, or renamed. With this, you can now much more easily manage the data you associate with your DOM. You no longer have to worry about maintaining the two in parallel. You simply need to implement the appropriate handler and let it be called whenever you modify your DOM tree. And you can do this with the flexibility of using a global handler or a different one on each node as you see fit. In any case, when something happens to a node on which you have attached some data, the handler you registered is called and provides you with all the information you need to update your own structure accordingly.
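In the Java binding, these features correspond to Node.setUserData(key, data, handler), Node.getUserData(key), and the UserDataHandler callback interface, whose handle method receives the operation, the key, the data, and the source and destination nodes. The following sketch attaches a hypothetical application object (Customer) to an element and registers a single handler (CopyOnCloneHandler, also hypothetical) that re-attaches the object whenever the node is cloned or imported, since user data is not carried over to copies automatically. It assumes the JDK's built-in DOM, which calls the handler during cloneNode as the specification requires.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

public class UserDataDemo {

    static final String KEY = "app.customer"; // hypothetical key; any string works

    // Hypothetical application object that a DOM node relates to
    static final class Customer {
        final String name;
        Customer(String name) { this.name = name; }
    }

    // One handler instance can be shared by many nodes
    static final class CopyOnCloneHandler implements UserDataHandler {
        final List<Short> seen = new ArrayList<>(); // operations received, for inspection

        @Override
        public void handle(short operation, String key, Object data, Node src, Node dst) {
            seen.add(operation);
            if (operation == NODE_CLONED || operation == NODE_IMPORTED) {
                // The copy does not inherit user data: attach the same object
                dst.setUserData(key, data, this);
            }
        }
    }

    static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element element = doc.createElement("customer");
        Customer acme = new Customer("ACME");
        CopyOnCloneHandler handler = new CopyOnCloneHandler();

        element.setUserData(KEY, acme, handler);
        Customer back = (Customer) element.getUserData(KEY);
        System.out.println(back.name);                      // ACME

        Node copy = element.cloneNode(true);               // handler runs with NODE_CLONED
        System.out.println(copy.getUserData(KEY) == acme); // true, thanks to the handler
    }
}
```

Whether to share one handler, as here, or register a different one per node is up to you; the key is what lets several independent objects live on the same node.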

Conclusion
We've shown you how DOM Level 3 Core can make your life easier when working with nodes, whether it is renaming a node, moving nodes from one document to another, or comparing them. We've also shown you how DOM Level 3 Core lets you access and modify the text content of your document in a more natural way than having to deal with Text nodes that tend to get in the way. Finally, we've explained how you can use DOM Level 3 Core to more easily maintain your own structure that is associated with the DOM.

In Part 2, we will show you other interesting features of DOM Level 3 Core, such as how to bootstrap and get your hands on a DOMImplementation object without having any implementation-dependent code in your application, how the DOM maps to the XML Infoset, how to revalidate your document in memory, and how to use DOM Level 3 Core in Xerces.

Resources

Read about the DOM Level 2 Core W3C Recommendation.
Get familiar with the latest DOM Level 3 Core Last Call draft.
Learn about the Xerces2 DOM implementation.
Download the latest Xerces-J parser.
Part 2 of this article series (developerWorks, August 2003) introduces other DOM Level 3 Core features, such as "bootstrap", revalidation of the DOM in memory, and the early implementation of this API in the Apache Xerces2 project.
Find more XML resources on the developerWorks XML zone, including the introductory tutorial Understanding DOM (developerWorks, July 2003).
Check out IBM WebSphere Studio Site Developer, a robust, easy-to-use development environment for creating, building, and maintaining dynamic Web sites, applications, and Web services.
Find out how you can become an IBM Certified Developer in XML and related technologies.

About the authors
Arnaud Le Hors is a Senior Software Engineer at IBM, and is part of the XML Standards Strategy Group. He represents IBM in various Working Groups of W3C, such as XML Core and DOM. He's one of the editors of the DOM Level 1, 2, and 3 Core specifications. Arnaud is also one of the developers of Xerces and one of the designers of Xerces2. You can reach him at lehors@us.ibm.com.

Elena Litani is a Staff Software Developer at the IBM Toronto Lab. She is one of the lead developers of Xerces2. For the last two years, Elena has been representing IBM in the W3C DOM Working Group. You can reach her at elitani@ca.ibm.com.
