Tip: Attributes in ContentHandler
Extracting (more) data from XML documents

Level: Introductory

Brett McLaughlin (brett@oreilly.com)
Author, O'Reilly and Associates
21 August 2003

The one aspect of data processing with ContentHandler that the author didn't cover in his last tip was attribute processing. While attributes are most commonly used for information transfer between an XML document and an XML processor, they also often contain valuable business data. In this tip, Brett shows you how SAX handles attributes and reports them to your handler, as well as how you can use code to extract attribute data.

If you've been following along with this series of tips, you may be expecting to read about the SAX ErrorHandler interface -- that's what I promised at the end of the last tip, and that was certainly my intention. However, I've received several requests and suggestions for coverage of one last aspect of the ContentHandler interface, which of course I've been discussing for several tips now. Since the request was a good one, and involved another very common part of XML processing, I thought it was worth dealing with now. (For those of you who are just pining over error handling and the like, I hope you can hang on until my next tip!)

The request, of course, was for XML attribute processing. After the last several tips, I trust you're confident setting up, registering, and using ContentHandlers, and that you have no problem locating a specific element, or getting its textual content. What I left out, though, was how to obtain an attribute's value. This turns out to be a pretty simple process, so I'll breeze through it in this tip.

First, you need to locate the attribute (or attributes) that you want the value for. To accomplish this, you should begin by figuring out which element the attribute appears on. This can be done by looking at the XML document you're interested in (Listing 1 shows a simple example), or by browsing a DTD (shown in Listing 2) or XML Schema. All are valid approaches -- pick the one you prefer.

Listing 1. A simple XML document
<?xml version="1.0"?>

<root>
  <some-element some-attribute="value">Some content in the element</some-element>
  <some-other-element>
    <child age="1" birthDate="06/02/2003">
      More content
    </child>
  </some-other-element>
</root>

Listing 2. A simple XML DTD (for Listing 1)
<!ELEMENT root (some-element*, some-other-element+)>

<!ELEMENT some-element (#PCDATA)>
<!ATTLIST some-element
          some-attribute  CDATA #REQUIRED
>

<!ELEMENT some-other-element (child+)>

<!ELEMENT child (#PCDATA)>
<!ATTLIST child
          age                   CDATA #REQUIRED
          birthDate             CDATA #REQUIRED
>

For the sake of this example, assume that you're looking for the birthDate attribute. Whether you look at the XML document or the DTD (or an XML Schema), you should be able to determine that the birthDate attribute is attached to the child element. So your first task is to locate that element. Of course, you already know how to do that, so this is a piece of cake -- if you've forgotten, Listing 3 is a quick refresher.

Listing 3. Finding the child element
  public void startElement (String uri, String localName,
          String qName, Attributes atts)
    throws SAXException {

    if (localName.equals("child")) {
      // Deal with the attributes
    }
  }
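
If you want to try the callback outside a larger application, the following is a minimal sketch of a complete harness. The class name ChildLocator, the run() helper, and the embedded XML string (a well-formed copy of the Listing 1 document) are my own assumptions for illustration, not part of SAX. It uses the JAXP SAXParserFactory and switches on namespace awareness, because JAXP parsers typically leave localName empty otherwise.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: locates each child element and records
// how many attributes it carries.
class ChildLocator extends DefaultHandler {

    // Well-formed copy of the sample document, embedded for the demo
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<root>"
        + "<some-element some-attribute=\"value\">Some content</some-element>"
        + "<some-other-element>"
        + "<child age=\"1\" birthDate=\"06/02/2003\">More content</child>"
        + "</some-other-element>"
        + "</root>";

    // One entry per child element found: its attribute count
    final List<Integer> attributeCounts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes atts) throws SAXException {
        if (localName.equals("child")) {
            attributeCounts.add(atts.getLength());
        }
    }

    // Parses the embedded document and returns the recorded counts
    static List<Integer> run() {
        ChildLocator handler = new ChildLocator();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Needed so that localName is populated in startElement()
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.attributeCounts;
    }

    public static void main(String[] args) {
        System.out.println("child elements found: " + run().size());
    }
}
```

If you prefer to leave namespace processing off, comparing against qName instead of localName is the usual alternative.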

Now, you are going to start working with a new SAX class: Attributes. To be accurate, this is actually an interface, and your parser vendor provides some type of implementation of this interface. In either case, you're only going to be dealing with the public interface methods, so don't worry about what goes on under the hood. To get started, take a look at Listing 4, which is the Attributes interface in all its glory.

Listing 4. The SAX Attributes interface
package org.xml.sax;

public interface Attributes
{
    ////////////////////////////////////////////////
    // Indexed access.
    ////////////////////////////////////////////////

    public abstract int getLength ();

    public abstract String getURI (int index);

    public abstract String getLocalName (int index);

    public abstract String getQName (int index);

    public abstract String getType (int index);

    public abstract String getValue (int index);

    ////////////////////////////////////////////////
    // Name-based queries
    ////////////////////////////////////////////////

    public int getIndex (String uri, String localName);

    public int getIndex (String qName);

    public abstract String getType (String uri, String localName);

    public abstract String getType (String qName);

    public abstract String getValue (String uri, String localName);

    public abstract String getValue (String qName);
}

This should be pretty easy to understand; the rest of the tip isn't going to be anything revelatory. As the comments of this interface indicate, you can access an attribute by either its name or its index. If you know the name of the attribute (as you do in the make-believe example -- birthDate), I recommend using name-based queries, as shown in Listing 5.

Listing 5. Finding the birthDate attribute
  public void startElement (String uri, String localName,
          String qName, Attributes atts)
    throws SAXException {

    if (localName.equals("child")) {
      String childValue = atts.getValue("", "birthDate");
      // Do something with the value
    }
  }

Simple enough, right? Notice that I could have used the version that takes a qName (getValue(String qName)) and written getValue("birthDate") to get the same result, but I generally prefer to pass in a URI and local name, just for self-documentation. The attribute in this case has no namespace URI, so an empty string works just fine.
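
To see the two name-based lookups side by side, here is a small sketch under a few assumptions: the class name BirthDateLookup, its fields, and the "nickname" attribute are invented for the demo, and the XML string is a trimmed, well-formed version of the sample document. It also shows that getValue() returns null when the element has no attribute with the requested name, which is worth checking for before you use the value.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class comparing the name-based getValue() variants
class BirthDateLookup extends DefaultHandler {

    static final String XML =
        "<root><some-other-element>"
        + "<child age=\"1\" birthDate=\"06/02/2003\">More content</child>"
        + "</some-other-element></root>";

    String byUriAndLocalName;           // result of getValue(uri, localName)
    String byQName;                     // result of getValue(qName)
    String missing = "not visited yet"; // overwritten when child is seen

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes atts) throws SAXException {
        if (localName.equals("child")) {
            // The attribute has no namespace, so the URI is the empty string
            byUriAndLocalName = atts.getValue("", "birthDate");
            // The qName variant; this relies on the parser reporting
            // qNames, which the JDK's bundled parser does
            byQName = atts.getValue("birthDate");
            // "nickname" is not on the element, so this yields null
            missing = atts.getValue("", "nickname");
        }
    }

    // Parses the embedded document and returns the populated handler
    static BirthDateLookup run() {
        BirthDateLookup handler = new BirthDateLookup();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        BirthDateLookup result = run();
        System.out.println("By URI and local name: " + result.byUriAndLocalName);
        System.out.println("By qName: " + result.byQName);
        System.out.println("Missing attribute: " + result.missing);
    }
}
```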

You can also use index-based access for your attribute work. This isn't so common when you know the name of the attribute. In fact, it's downright dangerous in those cases. The SAX specification doesn't guarantee that XML attributes are going to be reported in the same order that they appear in the document being processed. This means that even if you can visually verify that a specific attribute appears second in the list of attributes on an element, it won't necessarily be reported to the startElement() method as the second in the attribute list. So you really shouldn't rely on index-based access for a specific named attribute.
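
If you do need an index for a named attribute (say, to call several of the indexed methods for it), one safe approach is to ask the Attributes object for the index at runtime rather than hard-coding a position. The sketch below assumes a hypothetical class named IndexByName and a trimmed, well-formed copy of the sample document; getIndex() itself is part of the Attributes interface and returns -1 when the attribute isn't present.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: resolves an attribute's index by name
// instead of assuming its position in the list.
class IndexByName extends DefaultHandler {

    static final String XML =
        "<root><some-other-element>"
        + "<child age=\"1\" birthDate=\"06/02/2003\">More content</child>"
        + "</some-other-element></root>";

    int birthDateIndex = -1;   // index reported by the parser, if any
    int missingIndex = 0;      // set to -1 once the lookup has failed
    String birthDate;          // value fetched through the resolved index

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes atts) throws SAXException {
        if (localName.equals("child")) {
            // Let the parser say where birthDate is in this list
            birthDateIndex = atts.getIndex("", "birthDate");
            if (birthDateIndex >= 0) {
                birthDate = atts.getValue(birthDateIndex);
            }
            // An attribute that is not on the element has index -1
            missingIndex = atts.getIndex("", "nickname");
        }
    }

    // Parses the embedded document and returns the populated handler
    static IndexByName run() {
        IndexByName handler = new IndexByName();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        IndexByName result = run();
        System.out.println("birthDate value: " + result.birthDate);
        System.out.println("Index of missing attribute: " + result.missingIndex);
    }
}
```

The index is only meaningful for the Attributes object you got it from, so resolve it again on every startElement() call.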

That said, index-based access is still really useful. For example, it allows you to check out all attributes, and then to get the name and value for each. Consider the code in Listing 6, which does just that, all using index-based access.

Listing 6. Inspecting all attributes for the child element

  public void startElement (String uri, String localName,
          String qName, Attributes atts)
    throws SAXException {

    if (localName.equals("child")) {
      int numAtts = atts.getLength();
      for (int i=0; i<numAtts; i++) {
        String attName = atts.getQName(i);
        String value = atts.getValue(i);
        System.out.println(" * Attribute named " + attName +
                    " found, with value '" + value + "'");
      }
    }
  }
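
Printing is fine for a quick look, but you'll often want to hold on to what you found. As a rough sketch, the handler below walks the attribute list by index and copies each name and value into a map. The class name AttributeCollector, the collect() helper, and the embedded XML string are assumptions made for this example. Because the reporting order isn't guaranteed, the code looks values up by name afterward rather than relying on the map's iteration order.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: gathers every attribute of the child
// element using index-based access.
class AttributeCollector extends DefaultHandler {

    static final String XML =
        "<root><some-other-element>"
        + "<child age=\"1\" birthDate=\"06/02/2003\">More content</child>"
        + "</some-other-element></root>";

    // Attribute qName -> value, in the order the parser reported them
    final Map<String, String> found = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes atts) throws SAXException {
        if (localName.equals("child")) {
            int numAtts = atts.getLength();
            for (int i = 0; i < numAtts; i++) {
                found.put(atts.getQName(i), atts.getValue(i));
            }
        }
    }

    // Parses the given document and returns the collected attributes
    static Map<String, String> collect(String xml) {
        AttributeCollector handler = new AttributeCollector();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.found;
    }

    public static void main(String[] args) {
        Map<String, String> atts = collect(XML);
        System.out.println("age = " + atts.get("age"));
        System.out.println("birthDate = " + atts.get("birthDate"));
    }
}
```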

Well, I really am done with the ContentHandler interface this time. In my next tip, I will indeed move on to ErrorHandler, and show exactly how it can be used to handle everything from a misplaced angle bracket to a missing attribute. I'll also show you how a single parser (represented by an instance of an XMLReader) can have multiple handlers registered to it. For those of you who came to this article all jazzed up about error handling, sorry! Until my next tip, I'll see you online. And let me know what you think; as you can see from this article, it does make a difference.

About the author
Brett McLaughlin has been working in computers since the Logo days (remember the little triangle?). He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor to the EJBoss project, an open source EJB application server, and Cocoon, an open source XML Web-publishing engine.