Tip: Set up a SAX parser
Use properties and features in SAX parsers

Level: Introductory

Brett McLaughlin (brett@oreilly.com)
Author, O'Reilly and Associates
2 July 2003

This is the first in a series of tips that will serve as a comprehensive guide to using XML from the Java programming language. I begin with coverage of the SAX API. This tip reviews getting an instance of a SAX parser and setting various features and properties on that parser. Also, be sure to participate in the developerWorks XML and Java technology forum, hosted by Brett McLaughlin.

Working with XML from Java is a pretty rich topic; multiple APIs are available, and many of these make working with XML as easy as reading lines from a text document. Tree-based APIs like DOM present an in-memory XML structure that is optimal for GUIs and editors, and stream-based APIs like SAX are great for high-performance applications that only need to get at a document's data. In this series of tips, I walk you through the use of XML from Java, starting with the basics. Along the way, you'll learn lots of tricks that many of the pros don't even know about, so stick around even if you already have some XML experience.

I begin with SAX -- the Simple API for XML. While this API is probably the hardest of the Java and XML APIs to master, it's also arguably the most powerful. Additionally, most other API implementations (like DOM parsers, JDOM, dom4j, and so forth) are based in part on a SAX parser. Understanding SAX gives you a head start on everything else you do in XML and the Java language. In this tip specifically, I'll cover getting an instance of a SAX parser and setting some basic features and properties of that parser.

Note: I'm assuming you have downloaded a SAX-compliant parser, such as Apache Xerces-J (see Resources for links). The Apache site has a wealth of information on how to get things set up, but basically you just need to drop the downloaded JAR files into your CLASSPATH. These examples assume that your parser is available for use.

Getting a parser
The first step in working with SAX is getting an instance of a parser. In SAX, the parser is represented by an implementation of the org.xml.sax.XMLReader interface. I covered this in detail in a previous tip ("Achieving vendor independence with SAX" -- see Resources), so I won't spend much time on it here. Listing 1 shows the correct way to get a new SAX parser instance without writing vendor-dependent code.

Listing 1. Getting a SAX parser instance

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

With this approach, you need to set the system property org.xml.sax.driver to the class name of the parser you want to load. This is a vendor-specific class; for Xerces it should be org.apache.xerces.parsers.SAXParser. You specify this argument with the -D switch to the java launcher (the command that runs your program, not the compiler):


java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser some.sample.Class

Of course, you want to ensure that the class specified exists and is on your class path.
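
To check which parser the factory actually loaded, you can print the value of the system property next to the class name of the returned instance. The following is a minimal sketch; the class name ParserLookup and the method loadedParserClass() are made up for illustration. It assumes a JDK that bundles a default parser, in which case the factory typically falls back to that parser when org.xml.sax.driver is not set. Newer JDKs also mark XMLReaderFactory as deprecated, hence the suppressed warning.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical class name, used only for this sketch.
class ParserLookup {

    /** Creates a parser through the SAX factory and returns the name of the class that was loaded. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs but still present
    static String loadedParserClass() throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        return parser.getClass().getName();
    }

    public static void main(String[] args) throws SAXException {
        // Prints null when no -Dorg.xml.sax.driver=... argument was passed to the java command
        System.out.println("org.xml.sax.driver  = " + System.getProperty("org.xml.sax.driver"));
        System.out.println("Loaded parser class = " + loadedParserClass());
    }
}
```

If the property names a class that is not on the class path, createXMLReader() throws a SAXException, which makes a misconfigured driver easy to spot.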

Features
Once you have an instance of your parser, you need to configure it. Note that this isn't the same as setting up the parser to deal with errors, content, or structures in XML; instead, configuration is the process of actually telling the parser how to behave. You may turn on validation, turn off namespace checking, and expand entities. These behaviors are totally independent of a specific XML document, and therefore involve interaction with your new parser instance.

Note: For those of you who are overly anxious (I know you're out there), I will indeed be dealing with content, error handling, and the like. However, those subjects will be addressed in future tips, so you'll have to check back. For now, just concentrate on configuration, features, and properties.

You can configure parsers in two ways: features and properties. Features involve turning on or off a specific piece of functionality, like validation. Properties involve setting the value of a specific item that the parser uses, like the location of a schema to validate all documents against. I'll deal with features first, and then look at properties in the next section.

Features are set, not surprisingly, through a method on your parser called setFeature(). The syntax looks like that in Listing 2.

Listing 2. Setting features on a SAX parser

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

String featureName = "some feature URI";
boolean featureOn = true;

try {
  parser.setFeature(featureName, featureOn);
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting feature: " + e.getMessage());
}

This is pretty self-explanatory; the key is knowing the common features available to SAX parsers. Each feature is identified by a specific URI. A complete list of these URIs is available online at the SAX Web site (see Resources). Some of the most common features are validation and namespace processing. Listing 3 shows an example of setting both of these features.

Listing 3. Some common features

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

try {
  // Turn on validation
  parser.setFeature("http://xml.org/sax/features/validation", true);
  // Ensure namespace processing is on (the default)
  parser.setFeature("http://xml.org/sax/features/namespaces", true);
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting feature: " + e.getMessage());
}
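
XMLReader also offers getFeature(), which reports the current state of a feature. Combining the two calls lets you confirm that a setting actually took effect instead of only logging a failure. The sketch below assumes a parser is available through the same factory call as the listings; FeatureCheck and trySetFeature() are hypothetical names, not part of SAX.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class, used only for this sketch.
class FeatureCheck {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Tries to set a feature; returns true only if the parser then reports the requested state. */
    static boolean trySetFeature(XMLReader parser, String uri, boolean on) {
        try {
            parser.setFeature(uri, on);
            return parser.getFeature(uri) == on;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The parser does not know the URI, or cannot change the feature right now
            return false;
        }
    }

    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs but still present
    public static void main(String[] args) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        System.out.println("validation set: " + trySetFeature(parser, VALIDATION, true));
        System.out.println("namespaces set: " + trySetFeature(parser, NAMESPACES, true));
    }
}
```

Returning a boolean instead of printing inside the helper leaves the decision about what to do with an unsupported feature to the calling code.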

Note that while parsers have several standard SAX features, they are free to add their own vendor-specific features. For example, Apache Xerces-J adds features that allow for dynamic validation and the continuance of processing after encountering a fatal error. Consult your parser vendor's documentation for the relevant feature URIs.
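
Because vendor-specific features exist only on some parsers, code that wants to stay portable can probe for them before relying on them. The sketch below asks the parser about the two Xerces features mentioned above, using the feature URIs as I understand them from the Xerces documentation; treat the URIs as assumptions and check them against your parser's documentation. VendorFeatureProbe and probe() are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical class name, used only for this sketch.
class VendorFeatureProbe {
    // Xerces-specific feature URIs (assumed; verify against your parser's documentation)
    static final String[] XERCES_FEATURES = {
        "http://apache.org/xml/features/validation/dynamic",
        "http://apache.org/xml/features/continue-after-fatal-error"
    };

    /** Maps each vendor feature URI to whether this parser lets you read its state. */
    static Map<String, Boolean> probe(XMLReader parser) {
        Map<String, Boolean> recognized = new LinkedHashMap<>();
        for (String uri : XERCES_FEATURES) {
            try {
                parser.getFeature(uri); // the value is ignored; only success matters here
                recognized.put(uri, true);
            } catch (SAXException e) {
                // SAXNotRecognizedException or SAXNotSupportedException
                recognized.put(uri, false);
            }
        }
        return recognized;
    }

    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs but still present
    public static void main(String[] args) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        probe(parser).forEach((uri, ok) -> System.out.println(uri + " recognized: " + ok));
    }
}
```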

Properties
Once you understand features, making sense of properties is easy. They behave in exactly the same manner, except that properties take an object as an argument where features take in a boolean value. You use the setProperty() method for this purpose, as shown in Listing 4.

Listing 4. Setting properties on a SAX parser

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

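String propertyName = "some property URI";
// Placeholder value; pass whatever object type the property expects
Object propertyValue = "some property value";

try {
  parser.setProperty(propertyName, propertyValue);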
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting property: " + e.getMessage());
}

The same error-handling framework is in play here, so you can easily duplicate code between the two types of configuration options. As with features, SAX provides a standard set of properties, and vendors can add their own extensions. Common SAX-standard properties allow for setting a Lexical Handler and a Declaration Handler (two handlers I'll discuss in later tips). Parsers like Apache Xerces extend these with, for example, the ability to set the input buffer size and the location of an external schema to use in validation. Listing 5 shows a few properties in action.

Listing 5. Some common properties

// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader(); 

try {
  // Set the size of the parser's input buffer (a Xerces-specific property)
  parser.setProperty("http://apache.org/xml/properties/input-buffer-size", 
      Integer.valueOf(2048));
  // Set a LexicalHandler
  parser.setProperty("http://xml.org/sax/properties/lexical-handler", 
      new MyLexicalHandler());
} catch (SAXNotRecognizedException e) {
  System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
  System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
  System.err.println("Error in setting property: " + e.getMessage());
}
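
Listing 5 refers to a MyLexicalHandler class that isn't defined in this tip. As a stand-in, here is a minimal sketch of a lexical handler that collects the comments in a document. It extends org.xml.sax.ext.DefaultHandler2, which supplies empty implementations of the LexicalHandler callbacks; the class name CommentCollector and the method collect() are hypothetical, and the sample document is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical stand-in for the MyLexicalHandler class named in Listing 5.
class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();

    /** LexicalHandler callback: invoked for each XML comment the parser reports. */
    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }

    /** Parses the given XML text and returns the comments the parser reported. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs but still present
    static List<String> collect(String xml) throws SAXException, IOException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        CommentCollector handler = new CommentCollector();
        // Register the handler through the standard SAX property
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<root><!-- a note --><child/></root>"));
    }
}
```

The handler is passed as the property value, which is why properties take an Object where features take a boolean.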

With an understanding of features and properties, you can make your parser do almost anything. Once you understand setting up your parser in this fashion, you're ready for my next tip, which will discuss building a basic content handler. Until then, I'll see you online in the XML and Java technology forum.

Resources

About the author
Brett McLaughlin has been working in computers since the Logo days. (Remember the little triangle?) He currently specializes in building application infrastructure using Java-related technologies. He has spent the last several years implementing these infrastructures at Nextel Communications and Allegiance Telecom, Inc. Brett is one of the co-founders of the Java Apache project Turbine, which builds a reusable component architecture for Web application development using Java servlets. He is also a contributor of the EJBoss project, an open source EJB application server, and Cocoon, an open source XML Web-publishing engine.

