Discover key features of DOM Level 3 Core, Part 1
Manipulating and comparing nodes, handling text and user data
Arnaud Le Hors (lehors@us.ibm.com), Senior Software Engineer, IBM
Elena Litani (elitani@ca.ibm.com), Staff Software Developer, IBM
19 August 2003
In this two-part article, the authors present some of the
key features brought by the W3C Document Object Model (DOM) Level 3 Core
Working Draft and show you how to use them with examples in Java code.
This first part covers manipulating nodes and text, and attaching user
data onto nodes.
The Document Object Model (DOM) is one of the most widely available
APIs. It provides a structural representation of an XML document, enabling
users to access and modify its contents. The DOM Level 3 Core
specification, which is now in Last Call status, is the latest in a series
of DOM specifications produced by the W3C. It provides a set of
enhancements that make several common operations much simpler to perform,
and make possible certain things you simply could not do before. It also
supports the latest version of different standards, such as Namespaces in
XML, XML Information Set, and XML Schema, and thus provides a more
complete view of the XML data in memory.
The first part of this article covers operations on nodes; the second part
focuses on operations on documents and type information, and explains how
to use DOM in Xerces.
Renaming and moving nodes from one document to another
In DOM Level 2, renaming a node was a relatively expensive operation: you
had to create a new node, copy all the data to the new node, insert the
new node into the tree, and delete the old one.
The Document interface of DOM Level 3 now has a new method
that does all this for you: renameNode allows you to rename
an attribute or an element in the tree in one single call. It is important
to note that while this operation attempts to simply change the name of
the existing node, in some cases, the implementation may not be able to
actually rename the node. Instead, it may be forced to create a new node
with the new name and replace the existing node with the new node. The
reason is that the DOM is designed to work on many different types of
implementations, and in some of them changing the name of an element or
attribute is not as simple as changing a field in an object. For example,
in Web browsers renaming an element "P" to "INPUT" would translate into
transforming a paragraph into a form field, which may be neither really
possible nor desirable. So instead, the browser creates a new node and
replaces the old one with the new one. Nevertheless, all this is
transparent to you, as you still end up with a node that has the name you
want.
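The following is a minimal sketch of that advice, not a definitive implementation. It assumes the DOM implementation bundled with the JDK, and the class name RenameSketch and the element names are hypothetical. The point it illustrates is that you should always continue with the node returned by renameNode, because it may or may not be the object you passed in.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical example class; names are for illustration only.
final class RenameSketch {
    // Creates an empty, namespace-aware document with the JDK's default parser.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <customer><street kind="home"/></customer>, renames "street" to
    // "address", and returns the node that is in the tree afterwards.
    static Element renameStreetToAddress() {
        Document doc = newDocument();
        Element root = doc.createElementNS("http://example.com", "customer");
        Element street = doc.createElementNS("http://example.com", "street");
        street.setAttribute("kind", "home");
        doc.appendChild(root);
        root.appendChild(street);

        // Keep the return value: the implementation may have replaced the node.
        Node renamed = doc.renameNode(street, "http://example.com", "address");
        return (Element) renamed;
    }

    public static void main(String[] args) {
        Element address = renameStreetToAddress();
        System.out.println(address.getLocalName());
        System.out.println(address.getAttribute("kind"));
        System.out.println(address.getParentNode().getNodeName());
    }
}
```

Whether the node is renamed in place or replaced, its attributes and its position in the tree are expected to carry over, which is what the three printed values let you check.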
Often, you have two documents in memory and you would like to merge or
include a part of one document into another. In DOM Level 2 you could do
something similar to this by using the importNode method on the Document
interface. However, this method does not alter the original tree.
Instead, it creates a clone of the source node and its descendants that
you can then insert into the destination document. This is fine if a copy
is what you want, but it's somewhat annoying if what you really want is
to move the node from one document to another. Copying not only forces
you to clean up the source nodes that are left behind, but it can also be
expensive if the subtree you're moving is large.
With DOM Level 3, you can now do this more efficiently with adoptNode.
This method, also found on the Document interface, moves a subtree from
one document to another by changing the ownerDocument of the nodes in the
subtree. Listing 1 shows how easy it can be to move elements between
documents and rename nodes.
Listing 1. Moving elements and renaming nodes
// Renaming nodes
Element element = document.createElementNS("http://example.com", "street");
// if implementation can rename the node, element returned
// is the same object as was originally created
element = document.renameNode(element, "http://example.com", "address");
// adopting previously created node to a different document
Node adoptedNode = document2.adoptNode(element);
Again, because the DOM is designed to work on many different types of
implementations, and because the source document and the destination
document may belong to two different types of implementations, moving
nodes from one document to the other may not be possible. In this case,
adoptNode throws a NOT_SUPPORTED_ERR DOMException that you can catch.
However, you only need to handle this case if your application actually
deals with multiple DOM implementations at the same time.
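One way to handle a refused adoption is to fall back to the DOM Level 2 approach of copying and then detaching. The sketch below is one possible shape for such a helper, assuming the JDK's bundled DOM implementation. The class AdoptSketch and the method moveInto are hypothetical names. It also treats a null return value as a failure, on the assumption that some versions of the specification report a failed adoption that way instead of throwing.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical example class; names are for illustration only.
final class AdoptSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Moves "node" into "target". Tries adoptNode first; if adoption is
    // refused, copies the subtree with importNode and detaches the original.
    static Node moveInto(Document target, Node node) {
        Node moved = null;
        try {
            moved = target.adoptNode(node);
        } catch (DOMException e) {
            if (e.code != DOMException.NOT_SUPPORTED_ERR) {
                throw e; // some other problem: do not hide it
            }
        }
        if (moved == null) {
            // Assumption: a null result also means the adoption failed.
            moved = target.importNode(node, true);
            Node parent = node.getParentNode();
            if (parent != null) {
                parent.removeChild(node);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        Document source = newDocument();
        Document target = newDocument();
        Element sourceRoot = source.createElement("source");
        Element item = source.createElement("item");
        source.appendChild(sourceRoot);
        sourceRoot.appendChild(item);
        Element targetRoot = target.createElement("target");
        target.appendChild(targetRoot);

        // The moved node still has to be inserted into the target tree.
        Node moved = moveInto(target, item);
        targetRoot.appendChild(moved);

        System.out.println(target.isSameNode(moved.getOwnerDocument()));
        System.out.println(sourceRoot.hasChildNodes());
    }
}
```

Note that adopting or importing a node only changes which document owns it. You still have to insert the result where you want it in the destination tree.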
Comparing nodes
DOM Level 3 brings a set of methods to compare nodes in many different
ways. These include methods to test whether two nodes are equal, whether
they are the same, and how they are positioned relative to each other in
the document tree.
You are probably familiar with the concepts of identity and equality. In
the Java language, identity is tested with the == operator, whereas
equality is tested with a method such as equals. For two objects to be
identical, they have to be the same object in memory. For two objects to
be equal, all they need is to have the same characteristics. Therefore,
two objects that are identical are equal, but two objects that are equal
are not necessarily identical.
DOM Level 3 defines what it takes for two nodes to be equal and
provides a method, isEqualNode on Node, to perform this test. For
example, if you create two empty element nodes named "foo" without any
attributes, they are equal, even though they are not identical.
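The "foo" example above can be sketched as follows. This is only an illustration, assuming the JDK's bundled DOM implementation; the class EqualNodeSketch and its helper are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; names are for illustration only.
final class EqualNodeSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates two separate, empty elements with the same name.
    static Element[] twoEmptyElements(String name) {
        Document doc = newDocument();
        return new Element[] { doc.createElement(name), doc.createElement(name) };
    }

    public static void main(String[] args) {
        Element[] pair = twoEmptyElements("foo");
        Element first = pair[0];
        Element second = pair[1];

        // Two distinct objects with the same characteristics.
        System.out.println("identical: " + (first == second));
        System.out.println("equal: " + first.isEqualNode(second));

        // Once the characteristics differ, the nodes stop being equal.
        second.setAttribute("id", "x");
        System.out.println("equal after adding an attribute: " + first.isEqualNode(second));
    }
}
```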
You could use something like == to test for identity; however, some DOM
implementations with a complex internal structure do not expose their
objects directly as nodes, but create proxies that are returned to the
application. And they may create more than one of these proxies for the
same node. This means that the object returned by a DOM operation, such
as getFirstChild, may be different every time you call that method --
even if nothing else has changed. In this case, if you compare the
identity of the returned objects, you will find that they are not
identical. However, they really are references to the same node inside
the implementation. The way to find this out is to use isSameNode. This
tells you whether what you are looking at are proxies to the same object
or objects that are actually different.
Consistent with what we said about identical objects always being equal,
isEqualNode always returns "true" if isSameNode returns "true". But two
nodes that are equal are not necessarily the same.
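As an illustration, here is a small lookup helper that relies on isSameNode rather than == or isEqualNode. It is a sketch under the assumption that you run it on the JDK's bundled DOM implementation; the class SameNodeSketch and the method indexOfSame are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical example class; names are for illustration only.
final class SameNodeSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the position of "target" in "list", or -1 if it is not there.
    // isSameNode keeps working even if the implementation hands out proxies.
    static int indexOfSame(NodeList list, Node target) {
        for (int i = 0; i < list.getLength(); i++) {
            if (list.item(i).isSameNode(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        Element first = doc.createElement("foo");
        Element second = doc.createElement("foo");
        doc.appendChild(root);
        root.appendChild(first);
        root.appendChild(second);

        // The two children are equal, so isEqualNode could not tell them apart.
        System.out.println(first.isEqualNode(second));
        System.out.println(indexOfSame(root.getChildNodes(), second));
    }
}
```

A lookup based on isEqualNode would stop at the first "foo" element here, which is why identity and equality need separate tests.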
The last addition that helps you compare nodes is the
compareDocumentPosition method. This method allows you to find out how
two nodes are positioned with respect to each other in the document tree.
No more searching through your old books for the best algorithm to find
out whether one node is positioned before the other one in the tree. This
method tells you all you need to know: whether one node is a descendant
or an ancestor of the other, whether it is before or after, and so on.
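The result of compareDocumentPosition is a bit mask built from the DOCUMENT_POSITION_* constants on Node, and it describes where the argument sits relative to the node you call the method on. The sketch below turns that mask into a readable label. The class PositionSketch, the method describe, and the labels are hypothetical, and the order in which the bits are tested is just one reasonable choice.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical example class; names and labels are for illustration only.
final class PositionSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Describes where "other" is located relative to "reference".
    static String describe(Node reference, Node other) {
        short position = reference.compareDocumentPosition(other);
        if (position == 0) {
            return "same node";
        }
        if ((position & Node.DOCUMENT_POSITION_DISCONNECTED) != 0) {
            return "disconnected";
        }
        // Containment is tested first because a contained node is also
        // reported as following, and a containing node as preceding.
        if ((position & Node.DOCUMENT_POSITION_CONTAINED_BY) != 0) {
            return "descendant";
        }
        if ((position & Node.DOCUMENT_POSITION_CONTAINS) != 0) {
            return "ancestor";
        }
        if ((position & Node.DOCUMENT_POSITION_PRECEDING) != 0) {
            return "preceding";
        }
        if ((position & Node.DOCUMENT_POSITION_FOLLOWING) != 0) {
            return "following";
        }
        return "implementation-specific";
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        Element a = doc.createElement("a");
        Element b = doc.createElement("b");
        Element a1 = doc.createElement("a1");
        doc.appendChild(root);
        root.appendChild(a);
        root.appendChild(b);
        a.appendChild(a1);

        System.out.println(describe(root, a1));
        System.out.println(describe(a1, root));
        System.out.println(describe(a, b));
        System.out.println(describe(b, a));
    }
}
```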
In addition, what might look like a convenience function can actually
be more than that. Indeed, operations like
compareDocumentPosition are likely to be done more
efficiently by the implementation than by you, thanks to its knowledge of
what works best with its internal structure. For example, an operation
that would require you to traverse the tree would force you to choose
between traversing the tree by getting the first child and then its next
sibling, or by getting the list of child nodes and iterating over it.
Depending on what the internal structure really looks like, one method may
be faster than the other. But you have no way to determine this, and even
if you did, what is best for one implementation may not be best for
another. On the other hand, if you use a method such as
compareDocumentPosition and defer to the implementation to traverse the
tree for you, the implementation can pick the approach that suits its
internal structure. DOM Level 3 Core has several such functions; one of
these is textContent, described in the following section.
Handling text
Until now, to replace the text content of an element node, you had to
remove its children, create a Text node with the new content, and insert
it as a child of the Element node. Retrieving the content also required
several steps. Listing 2 shows the replacement case.
Listing 2. Replacing the text content of an element with DOM Level 2
// Assuming elem has two children: a comment and a text node.
// The NodeList returned by getChildNodes is live, so remove the
// first child until none remain instead of indexing into a list
// that shrinks while you iterate over it.
while (elem.hasChildNodes()) {
    elem.removeChild(elem.getFirstChild());
}
elem.appendChild(document.createTextNode("content"));
With DOM Level 3 it is now much easier to retrieve and set text content
on an Element node through the new read/write textContent attribute.
Setting this attribute removes all the child nodes and, unless the new
value is empty, replaces them with a single text node. Getting this
attribute returns the concatenated text content of this node and its
descendants.
Listing 3. Retrieving the text content of an element and modifying it
with DOM Level 3
String oldContent = elem.getTextContent();
elem.setTextContent("content");
This also makes it straightforward to create elements that simply
contain a piece of text -- all you need to do is create the element and
set its textContent. This basically gets the Text nodes out of the way
and lets you deal with the text in your document more directly.
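Here is a short sketch of both directions, assuming the JDK's bundled DOM implementation. The class TextContentSketch and the helper textElement are hypothetical names. It builds a small mixed-content element, reads its text in one call, and then replaces its children by setting new text.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; names are for illustration only.
final class TextContentSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates an element that contains just a piece of text.
    static Element textElement(Document doc, String name, String text) {
        Element element = doc.createElement(name);
        element.setTextContent(text);
        return element;
    }

    // Builds <p>Hello <b>DOM</b><!--note--> Level 3</p>.
    static Element sampleParagraph(Document doc) {
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode("Hello "));
        p.appendChild(textElement(doc, "b", "DOM"));
        p.appendChild(doc.createComment("note"));
        p.appendChild(doc.createTextNode(" Level 3"));
        return p;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element p = sampleParagraph(doc);

        // Reading gathers the text of the element and its descendants.
        System.out.println(p.getTextContent());

        // Writing drops every child and leaves a single text node behind.
        p.setTextContent("replaced");
        System.out.println(p.getChildNodes().getLength());
        System.out.println(p.getTextContent());
    }
}
```

In this sketch the comment's content is not part of the value returned for the element, so the text you get back is the text a reader of the document would see.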
Another useful addition is the new wholeText attribute on
the Text interface. This returns all the text contained in
the logically-adjacent text nodes. In practice, this means that when you
look at the child node of an element and it's a Text node,
you can get all the text that is at that position in the document in one
call. You no longer have to worry about the possibility that your text is
being held by several adjacent Text nodes that need to be
concatenated. The wholeText attribute gives you the answer
you want directly.
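To make this concrete, the sketch below creates an element whose text is split across two adjacent Text nodes and compares the value of the first node with its wholeText. It assumes the JDK's bundled DOM implementation; the class WholeTextSketch and its helper are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical example class; names are for illustration only.
final class WholeTextSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Appends each string as its own Text node and returns the first one.
    static Text appendSplitText(Element parent, String firstPart, String secondPart) {
        Document doc = parent.getOwnerDocument();
        Text first = doc.createTextNode(firstPart);
        parent.appendChild(first);
        parent.appendChild(doc.createTextNode(secondPart));
        return first;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element greeting = doc.createElement("greeting");
        doc.appendChild(greeting);
        Text first = appendSplitText(greeting, "Hello, ", "world");

        // The node's own value covers only its own characters...
        System.out.println(first.getNodeValue());
        // ...while wholeText also includes the adjacent text node.
        System.out.println(first.getWholeText());
    }
}
```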
User data
In many cases, the DOM does not actually contain all the data you have in
your application; it's only one part of it. In fact, a DOM node often
relates to some other object in your application. The challenge is
managing the relationship between the two structures. In the past, you
had to store a reference to the DOM node in your structure or, if that
was impossible, keep yet another structure, such as a hash table, to
record how to go from one structure to the other. As a result, it could
be a real pain to maintain these when the DOM mutated. In particular,
nodes could be modified or deleted without you ever knowing about it,
leaving you no chance to update your own structure accordingly.
DOM Level 3 can do a lot of this work for you. First, it allows you to
store a reference to your application object on a Node. The object is
associated with a key that you can use to retrieve that object later. You
can have as many objects on a Node as you want; all you need to do is use
different keys. Second, you can register a
handler that is called when anything that could affect your own structure
occurs. These are events such as a node being cloned, imported to another
document, deleted, or renamed. With this, you can now much more easily
manage the data you associate with your DOM. You no longer have to worry
about maintaining the two in parallel. You simply need to implement the
appropriate handler and let it be called whenever you modify your DOM
tree. And you can do this with the flexibility of using a global handler
or a different one on each node as you see fit. In any case, when
something happens to a node on which you have attached some data, the
handler you registered is called and provides you with all the information
you need to update your own structure accordingly.
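In the Java binding, this is done with setUserData and getUserData on Node, together with the UserDataHandler interface. The sketch below shows one possible arrangement, assuming the JDK's bundled DOM implementation. The class UserDataSketch, the key string, and the event log are hypothetical, and the policy of copying the data onto clones and imported copies is just one choice an application might make.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

// Hypothetical example class; names are for illustration only.
final class UserDataSketch {
    static final String KEY = "example.customerRecord"; // hypothetical key
    static final List<String> EVENTS = new ArrayList<>();

    // A single shared handler that logs what happened to the node and
    // copies the attached object onto clones and imported copies.
    static final UserDataHandler HANDLER = new UserDataHandler() {
        @Override
        public void handle(short operation, String key, Object data, Node src, Node dst) {
            switch (operation) {
                case UserDataHandler.NODE_CLONED:
                case UserDataHandler.NODE_IMPORTED:
                    EVENTS.add(operation == UserDataHandler.NODE_CLONED ? "cloned" : "imported");
                    if (dst != null) {
                        dst.setUserData(key, data, this);
                    }
                    break;
                case UserDataHandler.NODE_ADOPTED:
                    EVENTS.add("adopted");
                    break;
                case UserDataHandler.NODE_RENAMED:
                    EVENTS.add("renamed");
                    break;
                case UserDataHandler.NODE_DELETED:
                    EVENTS.add("deleted");
                    break;
                default:
                    break;
            }
        }
    };

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Associates an application object with a node under KEY.
    static void attach(Node node, Object applicationObject) {
        node.setUserData(KEY, applicationObject, HANDLER);
    }

    // Retrieves the application object, or null if none is attached.
    static Object lookup(Node node) {
        return node.getUserData(KEY);
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element customer = doc.createElement("customer");
        doc.appendChild(customer);

        Object record = "application record #1"; // stands in for a real object
        attach(customer, record);

        Node copy = customer.cloneNode(true);
        System.out.println(lookup(copy) == record);
        System.out.println(EVENTS);
    }
}
```

Using different keys lets several parts of an application attach their own objects to the same node, and passing a different handler per node instead of the shared one is equally possible.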
Conclusion
We've shown you how DOM Level 3 Core can make your life easier when
working with nodes, whether it is renaming a node, moving nodes from one
document to another, or comparing them. We've also shown you how DOM
Level 3 Core lets you access and modify the text content of your document
in a more natural way than having to deal with Text nodes that tend to
get in the way. Finally, we've explained how you can use DOM Level 3 Core
to more easily maintain your own structure that is associated with the
DOM.
In Part 2, we will show you other interesting features of DOM Level 3
Core, such as how to bootstrap and get your
hands on a DOMImplementation object without having any
implementation-dependent code in your application, how the DOM maps to the
XML Infoset, how to revalidate your document in memory, and how to use DOM
Level 3 Core in Xerces.
Resources
- Read about the DOM Level 2 Core W3C
Recommendation.
- Get familiar with the latest DOM Level 3 Core Last
Call draft.
- Learn about the Xerces2 DOM
implementation.
- Download the latest Xerces-J
parser.
- Part 2 of this article
series (developerWorks, August
2003) introduces other DOM Level 3 Core features, such as "bootstrap",
revalidation of the DOM in memory, and the early implementation of this
API in the Apache Xerces2 project.
- Find more XML resources on the developerWorks XML
zone, including the introductory tutorial Understanding DOM (developerWorks, July
2003).
- Check out IBM WebSphere Studio Site Developer, a robust, easy-to-use
development environment for creating, building, and maintaining dynamic
Web sites, applications, and Web services.
- Find out how you can become an IBM Certified Developer in
XML and related technologies.
About the authors
Arnaud Le Hors is a Senior Software Engineer at IBM, and is part of the
XML Standards Strategy Group. He represents IBM in various Working Groups
of W3C, such as XML Core and DOM. He's one of the editors of the DOM
Level 1, 2, and 3 Core Specifications. Arnaud is also one of the
developers of Xerces and one of the designers of Xerces2. You can reach
him at lehors@us.ibm.com.
Elena Litani is a Staff Software Developer at the IBM Toronto Lab. She is
one of the lead developers of Xerces2. For the last two years, Elena has
been representing IBM in the W3C DOM Working Group. You can reach her at
elitani@ca.ibm.com.