Use properties and features in SAX parsers
Brett McLaughlin (brett@oreilly.com), Author, O'Reilly and Associates
2 July 2003
This is the first in a series of tips
that will serve as a comprehensive guide to using XML from the Java
programming language. I begin with coverage of the SAX API. This tip
reviews getting an instance of a SAX parser and setting various features
and properties on that parser. Also, be sure to participate in the
developerWorks XML and Java technology forum, hosted by Brett
McLaughlin.
Working with XML from Java is a pretty rich topic; multiple APIs are
available, and many of these make working with XML as easy as reading
lines from a text document. Tree-based APIs like DOM present an in-memory
XML structure that is optimal for GUIs and editors, and stream-based APIs
like SAX are great for high-performance applications that only need to get
at a document's data. In this series of tips, I walk you through the use
of XML from Java, starting with the basics. Along the way, you'll learn
lots of tricks that many of the pros don't even know about, so stick
around even if you already have some XML experience.
I begin with SAX -- the Simple API for XML. While this API is probably
the hardest of the Java and XML APIs to master, it's also arguably the
most powerful. Additionally, most other API implementations (like DOM
parsers, JDOM, dom4j, and so forth) are based in part on a SAX parser.
Understanding SAX gives you a head start on everything else you do in XML
and the Java language. In this tip specifically, I'll cover getting an
instance of a SAX parser and setting some basic features and properties of
that parser.
Note: I'm assuming you have downloaded a SAX-compliant
parser, such as Apache Xerces-J (see Resources
for links). The Apache site has a wealth of information on how to get
things set up, but basically you just need to drop the downloaded JAR
files into your CLASSPATH. These examples assume that your
parser is available for use.
Getting a parser
The first step in working with SAX is getting an instance of a
parser. In SAX, the parser is represented by an instance of the
org.xml.sax.XMLReader interface. I covered this in detail in a
previous tip ("Achieving vendor independence with SAX" -- see Resources),
so I won't spend much time on it here. Listing 1 shows the correct way to
get a new SAX parser instance without writing vendor-dependent code.
Listing 1. Getting a SAX parser instance
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
With this approach, you set the system property
org.xml.sax.driver to the class name of the parser you want
to load. This is a vendor-specific class; for Xerces it should be
org.apache.xerces.parsers.SAXParser. You specify this
property with the -D switch to the java launcher:
java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser some.sample.Class
Of course, you want to ensure that the class specified exists and is on
your class path.
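To confirm which implementation was actually loaded, a small check along these lines may help. This is only a sketch: the class name ParserLookup is made up for illustration, and it assumes that when org.xml.sax.driver is unset the runtime supplies a default parser (recent JDKs do; an older standalone SAX distribution may throw a SAXException instead). Note also that XMLReaderFactory is marked deprecated as of Java 9, so expect a compiler note.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical class name, used only for this sketch.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
class ParserLookup {

    // Returns whatever XMLReader implementation the factory resolves.
    static XMLReader newParser() throws SAXException {
        return XMLReaderFactory.createXMLReader();
    }

    public static void main(String[] args) throws SAXException {
        // Prints "null" when the property was not passed with -D.
        System.out.println("org.xml.sax.driver = "
                + System.getProperty("org.xml.sax.driver"));

        XMLReader parser = newParser();
        System.out.println("Parser class: " + parser.getClass().getName());
    }
}
```

Running it once with and once without the -D switch shows whether the property is being picked up.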
Features
Once you have an instance of your parser, you need to configure it. Note that this isn't
the same as setting up the parser to deal with errors, content, or
structures in XML; instead, configuration is the process of actually
telling the parser how to behave. You may turn on validation, turn off
namespace checking, and expand entities. These behaviors are totally
independent of a specific XML document, and therefore involve interaction
with your new parser instance.
Note: For those of you who are overly anxious (I know you're
out there), I will indeed be dealing with content, error handling, and the
like. However, those subjects will be addressed in future tips, so you'll
have to check back. For now, just concentrate on configuration, features,
and properties.
You can configure parsers in two ways: features and properties.
Features involve turning on or off a specific piece of
functionality, like validation. Properties involve setting the
value of a specific item that the parser uses, like the location of a
schema to validate all documents against. I'll deal with features first,
and then look at properties in the next section.
Features are set, not surprisingly, through a method on your parser
called setFeature(). The syntax looks like that in Listing 2.
Listing 2. Setting features on a SAX parser
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
String featureName = "some feature URI";
boolean featureOn = true;
try {
    parser.setFeature(featureName, featureOn);
} catch (SAXNotRecognizedException e) {
    System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
    System.err.println("Error in setting feature: " + e.getMessage());
}
This is pretty self-explanatory; the key is knowing the common features
available to SAX parsers. Each feature is identified by a specific URI. A
complete list of these URIs is available online at the SAX Web site (see
Resources).
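Because features are read as well as written by URI, one way to see what a given parser does with the standard set is to query each one with getFeature(). The sketch below assumes five of the standard SAX2 feature names; the FeatureReport class and its report() helper are hypothetical names chosen for illustration, and the states printed will depend on the parser you have installed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class for this sketch.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
class FeatureReport {

    static final String PREFIX = "http://xml.org/sax/features/";

    // A few of the standard SAX2 feature names (appended to PREFIX).
    static final String[] STANDARD = {
        "validation",
        "namespaces",
        "namespace-prefixes",
        "external-general-entities",
        "external-parameter-entities"
    };

    // Maps each feature name to "true", "false", or a short failure reason.
    static Map<String, String> report(XMLReader parser) {
        Map<String, String> states = new LinkedHashMap<>();
        for (String name : STANDARD) {
            String state;
            try {
                state = String.valueOf(parser.getFeature(PREFIX + name));
            } catch (SAXNotRecognizedException e) {
                state = "not recognized";
            } catch (SAXNotSupportedException e) {
                state = "not supported";
            }
            states.put(name, state);
        }
        return states;
    }

    public static void main(String[] args) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        report(parser).forEach((name, state) ->
                System.out.println(name + " = " + state));
    }
}
```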
Two of the most common features are validation and namespace processing.
Listing 3 shows an example of setting both of these features.
Listing 3. Some common features
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
try {
    // Turn on validation
    parser.setFeature("http://xml.org/sax/features/validation", true);
    // Ensure namespace processing is on (the default)
    parser.setFeature("http://xml.org/sax/features/namespaces", true);
} catch (SAXNotRecognizedException e) {
    System.err.println("Unknown feature specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.err.println("Unsupported feature specified: " + e.getMessage());
} catch (SAXException e) {
    System.err.println("Error in setting feature: " + e.getMessage());
}
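To see that the validation feature really changes parser behavior, the following sketch parses the same small document twice, once with validation off and once with it on, and counts the validity errors reported. The class name ValidationToggle and the sample document are invented for illustration, and the anonymous error handler is there only to make the effect observable; error handling proper is a topic for a later tip.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical class name and sample document, used only for this sketch.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
class ValidationToggle {

    // The DTD allows only <to> inside <note>, so <from> is a validity error.
    static final String DOC =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>"
        + "<note><from>x</from></note>";

    // Parses DOC and returns how many recoverable errors were reported.
    static int countValidityErrors(boolean validate)
            throws SAXException, IOException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setFeature("http://xml.org/sax/features/validation", validate);

        final int[] errors = {0};
        parser.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++; // validity problems arrive here
            }
        });

        parser.parse(new InputSource(new StringReader(DOC)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("validation off: " + countValidityErrors(false));
        System.out.println("validation on:  " + countValidityErrors(true));
    }
}
```

The document is well formed in both runs; only the validating run should report problems, because validity is checked only when the feature is on.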
Note that while parsers have several standard SAX features, they are
free to add their own vendor-specific features. For example, Apache
Xerces-J adds features that allow for dynamic validation and the
continuance of processing after encountering a fatal error. Consult your
parser vendor's documentation for the relevant feature URIs.
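Since a vendor-specific feature may not exist in every parser, code that is meant to stay portable can treat such features as optional. The sketch below is one way to do that, not a prescribed pattern: the OptionalFeatures class, its Result enum, and the trySet() helper are hypothetical names, and the two apache.org URIs are the Xerces feature URIs for dynamic validation and continuing after a fatal error as best I recall them, so verify them against your parser's documentation. The third URI is deliberately bogus.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class for this sketch.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
class OptionalFeatures {

    enum Result { APPLIED, NOT_RECOGNIZED, NOT_SUPPORTED }

    // Attempts to set a feature and reports the outcome instead of throwing.
    static Result trySet(XMLReader parser, String uri, boolean on) {
        try {
            parser.setFeature(uri, on);
            return Result.APPLIED;
        } catch (SAXNotRecognizedException e) {
            return Result.NOT_RECOGNIZED;
        } catch (SAXNotSupportedException e) {
            return Result.NOT_SUPPORTED;
        }
    }

    public static void main(String[] args) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        String[] uris = {
            // Assumed Xerces feature URIs; check your vendor's documentation.
            "http://apache.org/xml/features/validation/dynamic",
            "http://apache.org/xml/features/continue-after-fatal-error",
            // Deliberately bogus, to show the "not recognized" path.
            "http://example.com/features/no-such-feature"
        };
        for (String uri : uris) {
            System.out.println(uri + " -> " + trySet(parser, uri, true));
        }
    }
}
```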
Properties
Once you understand features, making sense of properties is easy. They behave in
exactly the same manner, except that properties take an object as an
argument where features take a boolean value. You use the
setProperty() method for this purpose, as shown in Listing 4.
Listing 4. Setting properties on a SAX parser
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
String propertyName = "some property URI";
// Placeholder value; the required type depends on the property being set
Object propertyValue = "some property value";
try {
    parser.setProperty(propertyName, propertyValue);
} catch (SAXNotRecognizedException e) {
    System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
    System.err.println("Error in setting property: " + e.getMessage());
}
The same error-handling framework is in play here, so you can easily
duplicate code between the two types of configuration options. As with
features, SAX provides a standard set of properties, and vendors can add
their own extensions. Common SAX-standard properties allow for setting a
Lexical Handler and a Declaration Handler (two handlers I'll discuss in
later tips). Parsers like Apache Xerces extend these with, for example,
the ability to set the input buffer size and the location of an external
schema to use in validation. Listing 5 shows a few properties in
action.
Listing 5. Some common properties
// Obtain an instance of an XMLReader implementation from a system property
XMLReader parser = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
try {
    // Set the size of the parser's input buffer (a Xerces-specific property)
    parser.setProperty("http://apache.org/xml/properties/input-buffer-size",
                       Integer.valueOf(2048));
    // Set a LexicalHandler (MyLexicalHandler stands for your own implementation)
    parser.setProperty("http://xml.org/sax/properties/lexical-handler",
                       new MyLexicalHandler());
} catch (SAXNotRecognizedException e) {
    System.err.println("Unknown property specified: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.err.println("Unsupported property specified: " + e.getMessage());
} catch (SAXException e) {
    System.err.println("Error in setting property: " + e.getMessage());
}
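For a complete, runnable view of a property in use, the sketch below registers a lexical handler and parses a short in-memory document. The CommentCollector class is a hypothetical stand-in for the MyLexicalHandler in Listing 5; it extends org.xml.sax.ext.DefaultHandler2, which supplies empty implementations of the LexicalHandler callbacks, and overrides only comment(). It assumes the parser in use supports the standard lexical-handler property.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical LexicalHandler that records the text of every XML comment.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
class CommentCollector extends DefaultHandler2 {

    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        // The parser passes the comment body without the <!-- --> delimiters.
        comments.add(new String(ch, start, length));
    }

    // Parses the given XML text and returns the comments found in it.
    static List<String> collect(String xml) throws SAXException, IOException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        CommentCollector handler = new CommentCollector();
        // The property value is the handler object itself.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler",
                           handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<root><!--first--><a/><!--second--></root>"));
    }
}
```

Without the setProperty() call the parse still succeeds, but the comments are silently dropped, because comments are not reported through the content handler.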
With an understanding of features and properties, you can make your
parser do almost anything. Once you understand setting up your parser in
this fashion, you're ready for my next tip, which will discuss building a
basic content handler. Until then, I'll see you online in the XML and Java technology forum.
Resources
About the author
Brett McLaughlin has been working in computers since the Logo days.
(Remember the little triangle?) He currently specializes in building
application infrastructure using Java-related technologies. He has
spent the last several years implementing these infrastructures at
Nextel Communications and Allegiance Telecom, Inc. Brett is one of
the co-founders of the Java Apache project Turbine, which builds a
reusable component architecture for Web application development
using Java servlets. He is also a contributor of the EJBoss project,
an open source EJB application server, and Cocoon, an open source
XML Web-publishing engine.