Java programming dynamics, Part 1: Classes and class loading
A look at classes and what goes on as they're loaded by a JVM
This article kicks off a new series covering a family of topics that I call Java programming dynamics. These topics range from the basic structure of the Java binary class file format, through run-time metadata access using reflection, all the way to modifying and constructing new classes at run time. The common thread running through all this material is the idea that programming the Java platform is much more dynamic than working with languages that compile straight to native code. If you understand these dynamic aspects, you can do things with Java programming that can't be matched in any other mainstream programming language.

In this article, I cover some of the basic concepts that underlie these dynamic features of the Java platform. These concepts revolve around the binary format used to represent Java classes, including what happens when these classes are loaded into the JVM. Not only does this material provide a foundation for the rest of the articles in the series, it also demonstrates some very practical concerns for developers working on the Java platform.

A class in binary

The binary class format is actually defined by the JVM specification.
Normally these class representations are generated from Java language
source code by a compiler, and they're usually stored in files with a
.class extension.
So what does this class format actually look like? Listing 1 gives the source code for a (very) short class, along with a partial hexadecimal display of the class file output by the compiler:

Listing 1. Source and (partial) binary for Hello.java
Inside the binary
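Every class file opens with the four-byte signature 0xCAFEBABE, followed by the minor and major version numbers and the constant pool count. As a rough illustration, the sketch below decodes only those first 10 bytes. The class name ClassHeaderDump and its Header type are made up for this example, and main assumes that the class file for java.lang.Object can be read as a resource, which holds on common JDKs but is not guaranteed everywhere.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of any standard API.
public class ClassHeaderDump {

    /** The fixed-size fields at the start of a class file, in file order. */
    public static final class Header {
        public final int magic;             // 4 bytes, always 0xCAFEBABE
        public final int minorVersion;      // 2 bytes, unsigned
        public final int majorVersion;      // 2 bytes, unsigned
        public final int constantPoolCount; // 2 bytes, unsigned, as stored in the file

        Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
            this.magic = magic;
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
        }
    }

    /** Decodes the first 10 bytes of a class file. */
    public static Header parse(byte[] bytes) {
        if (bytes.length < 10) {
            throw new IllegalArgumentException("need at least 10 bytes");
        }
        // ByteBuffer is big-endian by default, matching the class file format.
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("missing 0xCAFEBABE signature");
        }
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;
        return new Header(magic, minor, major, count);
    }

    public static void main(String[] args) throws IOException {
        byte[] head = new byte[10];
        // Assumption: Object.class is reachable as a resource on this JDK.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                throw new IOException("Object.class not available as a resource");
            }
            new DataInputStream(in).readFully(head);
        }
        Header h = parse(head);
        System.out.printf("magic=0x%08X minor=%d major=%d constant_pool_count=%d%n",
                h.magic, h.minorVersion, h.majorVersion, h.constantPoolCount);
    }
}
```

The values printed by main depend on the JDK that supplies Object.class, so they will generally differ from the version numbers discussed in this article.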
The rest of the data is less entertaining. Following the signature are a pair of class format version numbers (in this case, for minor version 0 and major version 46 -- 0x2e in hexadecimal -- as generated by the 1.4.1 javac), then a count of entries in the constant pool. The entry count (in this case 26, or 0x001a) is followed by the actual constant pool data. This is where all the constants used by the class definition are stored. It includes class and method names, signatures, and strings (which you can recognize in the text interpretation to the right of the hexadecimal dump), along with various binary values.

Items in the constant pool are variable length, with the first byte of each item identifying the type of item and how it should be decoded. I won't go into the details of all that here -- there are many references available if you're interested, starting with the actual JVM specification. The key point is just that the constant pool contains all the references to other classes and methods used by this class, along with the actual definitions for this class and its methods. The constant pool can easily make up half or more of the binary class size, though the average proportion is probably less.

Following the constant pool are several items that reference constant pool entries for the class itself, its super class, and interfaces. These items are followed by information about the fields and methods, which are themselves represented as complex structures. The executable code for methods is present in the form of code attributes contained within the method definitions. This code is in the form of instructions for the JVM, generally called bytecode, which is one of the topics for the next section.

Attributes are used for several defined purposes in the Java class format, including the already-mentioned bytecode, constant values for fields, exception handling, and debugging information. These purposes aren't the only possible uses for attributes, though.
From the beginning, the JVM specification has required JVMs to ignore attributes of unknown types. This requirement gives flexibility for extending the use of attributes to serve other purposes in the future, such as providing meta-information needed by frameworks that work with user classes -- an approach that the Java-derived C# language has used extensively. Unfortunately, no hooks have yet been provided for making use of this flexibility at the user level.

Bytecode and stacks

This virtual machine is actually fairly simple. It uses a stack architecture, meaning instruction operands are loaded to an internal stack before they're used. The instruction set includes all the normal arithmetic and logical operations, along with conditional and unconditional branches, load/store, call/return, stack manipulation, and several special types of instructions. Some of the instructions include immediate operand values that are directly encoded into the instruction. Others directly reference values from the constant pool.

Even though the virtual machine is simple, the implementations aren't necessarily so. Early (first generation) JVMs were basically interpreters for the virtual machine bytecode. These actually were relatively simple, but suffered from severe performance problems -- interpreting code is always going to take longer than executing native code. To reduce these performance problems, second generation JVMs added just-in-time (JIT) translation. The JIT technique compiles Java bytecode to native code before executing it for the first time, giving much better performance for repeated executions. Current generation JVMs go even further, using adaptive techniques to monitor program execution and selectively optimize heavily used code.

Loading the classes

Rather than a separate step, linking classes is part of the job performed by the JVM when it loads them into memory. This adds some overhead as classes are initially loaded, but also provides a high level of flexibility for Java applications. For example, applications can be written to use interfaces with the actual implementations left unspecified until run time. This late binding approach to assembling an application is used extensively in the Java platform, with servlets being one common example.

The rules for loading classes are spelled out in detail in the JVM
specification. The basic principle is that classes are only loaded when
needed (or at least appear to be loaded this way -- the JVM has some
flexibility in the actual loading, but must maintain a fixed sequence of
class initialization). Each class that gets loaded may have other classes
that it depends on, so the loading process is recursive. The classes in
Listing 2 show how this recursive loading works. The

Setting the parameter

This is only a partial listing of the most important parts -- the full
trace consists of 294 lines, most of which I deleted for this listing. The
initial set of class loads (279, in this case) are all triggered by the
attempt to load the

The portion of the listing after the

A lot happens inside the JVM when a class is loaded and initialized,
including decoding the binary class format, checking compatibility with
other classes, verifying the sequence of bytecode operations, and finally
constructing a

Off the beaten (class) path

Bootstrap isn't the only class loader. For starters, a JVM defines an
extension class loader for loading classes from standard Java extension
APIs, and a system class loader for loading classes from the general class
path (including your application classes). Applications can also define
their own class loaders for special purposes (such as run-time reloading
of classes). Such added
class loaders are derived from the java.lang.ClassLoader class.

Each class loader also keeps a reference to a parent class loader, defining a tree of class loaders with the bootstrap loader at the root. When an instance of a particular class (identified by name) is needed, whichever class loader initially handles the request normally checks with its parent class loader first before trying to load the class directly. This applies recursively if there are multiple layers of class loaders, so it means that a class will normally be visible not only within the class loader that loaded it, but also to all descendant class loaders. It also means that if a class can be loaded by more than one class loader in a chain, the one furthest up the tree will be the one that actually loads it.

There are many circumstances where multiple application class loaders are used by Java programs. One example is within the J2EE framework. Each J2EE application loaded by the framework needs to have a separate class loader to prevent classes in one application from interfering with other applications. The framework code itself will also use one or more other class loaders, again to prevent interference to or from applications. The complete set of class loaders makes up a tree-structured hierarchy with different types of classes loaded at each level.

Trees of loaders

Figure 1. Tomcat class loaders

In this type of environment, keeping track of the proper loader to use
for requesting a new class can be messy. Because of this, the
setContextClassLoader and getContextClassLoader methods were added to the
java.lang.Thread class in the Java 2 platform.
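The parent links between class loaders described above can be followed with ClassLoader.getParent(). The following is a minimal sketch, not taken from the original article, that walks upward from the current thread's context class loader (Thread.getContextClassLoader()). The class name LoaderChain is hypothetical. It also relies on an implementation detail of common JVMs: the bootstrap loader is usually reported as null rather than as a ClassLoader object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
public class LoaderChain {

    /**
     * Collects the given loader and each of its ancestors, child first.
     * The walk stops when getParent() returns null, which common JVMs use
     * to stand for the bootstrap loader at the root of the tree.
     */
    public static List<ClassLoader> chain(ClassLoader start) {
        List<ClassLoader> loaders = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            loaders.add(cl);
        }
        return loaders;
    }

    public static void main(String[] args) {
        // The loader a framework would set for code running on this thread.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        for (ClassLoader cl : chain(context)) {
            System.out.println(cl);
        }
        System.out.println("(bootstrap loader, typically reported as null)");

        // Core classes are loaded at the root, so their loader is typically null.
        System.out.println("String loaded by: " + String.class.getClassLoader());
    }
}
```

The loader names printed vary by JVM version and by how the program is launched, so treat the output as a way to explore a particular environment rather than as a fixed result.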
The flexibility of being able to load independent sets of classes is an
important feature of the Java platform. Useful as this feature is, though,
it can create confusion in some cases. One confusing aspect is the
continuing issue of dealing with JVM classpaths. In the Tomcat hierarchy
of class loaders shown in Figure 1, for instance, classes loaded by the
Common class loader will never be able to directly access (by name)
classes loaded by a Web application. The only way to tie these together is
through the use of interfaces visible to both sets of classes. In this
case, that includes the

Problems can arise when code is moved between class loaders for any reason. For instance, when J2SE 1.4 moved the JAXP API for XML processing into the standard distribution, it created problems for many environments where applications had previously relied on loading their own chosen implementations of the XML APIs. With J2SE 1.3, this can be done just by including the appropriate JAR file in the user class path. In J2SE 1.4, the standard versions of these APIs are now in the extensions class path, so these will normally override any implementations present in the user class path.

Other types of confusion are also possible when using multiple class
loaders. Figure 2 shows an example of a class identity
crisis that results when an interface and associated implementation
are each loaded by two separate class loaders. Even though the names and
binary implementations of the interfaces and classes are the same, an
instance of the class from one loader cannot be recognized as implementing
the interface from the other loader. This confusion could be resolved in
Figure 2 by moving the interface class

Figure 2. Class identity crisis

Conclusions

The cost of the Java platform's flexibility in this area is somewhat higher overhead when starting an application. Hundreds of separate classes need to be loaded by the JVM before it can start executing even the simplest application code. This startup cost generally makes the Java platform better suited to long-running, server-type applications than to frequently used small programs. Server applications also benefit the most from the flexibility of run-time assembly of code, so it's no surprise that the Java platform has become increasingly favored for this type of development.

In Part 2 of this series, I'll cover an introduction to using another aspect of the Java platform's dynamic underpinnings: the Reflection API. Reflection gives your executing code access to internal class information. This can be a great tool for building flexible code that can be hooked together at run time without the need for any source code links between classes. But as with most tools, you need to know when and how to use it to best advantage. Check back to find out the tricks and trade-offs of effective reflection in Part 2 of Java programming dynamics.