Java generics without the pain, Part 3: Overcoming the limitations of generics in the JSR-14 prototype compiler

Eric E. Allen (eallen@cs.rice.edu), Ph.D. candidate, Java programming languages team, Rice University

9 April 2003

Java developer and researcher Eric Allen continues his discussion of generic types in JSR-14 and Tiger with a look at the ramifications of adding support for new operations on "naked" type parameters in generic types.
In this installment of our series on the addition of generic types to the Java language, we'll consider one of the two limitations on the use of generics that we haven't yet discussed: support for new operations on "naked" type parameters (such as new T() in a class C<T>).

As I mentioned last month, Tiger and JSR-14 implement generic types over the Java language using "type erasure." With type erasure, generic types are used for type checking only; afterwards, they are replaced with their upper bounds. Naturally, erasure interferes with expressions such as new T().

If the bound on T were, say, Object, such an expression would be erased to new Object() and, regardless of how T was instantiated (String, List, URLClassLoader, and so on), the new operation would make a new instance of Object. Clearly, that's not what we want.
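To see what erasure leaves behind at run time, here's a minimal sketch (ordinary generic code that any generics-aware compiler, such as the JSR-14 prototype or Tiger, will accept; the class name is mine). Because both instantiations erase to the same class, there is no per-instantiation information left for an expression like new T() to consult:

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
  public static void main(String[] args) {
    List<String> strings = new ArrayList<String>();
    List<Integer> numbers = new ArrayList<Integer>();

    // After erasure, both instantiations share a single runtime class,
    // so nothing distinguishes List<String> from List<Integer> at run time.
    System.out.println(strings.getClass() == numbers.getClass()); // prints true
  }
}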
To add support for expressions such as new T(), as well as the other type-dependent operations we discussed last time, such as casts and instanceof expressions, we'd have to adopt some
implementation strategy other than type erasure (such as keeping a
separate class for each generic instantiation). But in the case of
new operations, there are other issues that must be
addressed.
In particular, many fundamental language-design issues have to be
decided upon to make such an addition to the Java language a reality.
Valid constructor calls

First off, in order to form a legal new expression on a type parameter, such as new T(), it's necessary to make sure that we're calling a constructor that is valid for every instantiation of T. But because all we know about T is that it's a subtype of its declared bound, we have no idea what constructors an instantiation of T will have. This problem could be handled in one of three ways:
- Require all instantiations of a type parameter to include a zeroary
constructor.
- Throw an exception whenever a run-time instantiation of the generic
class does not include a needed constructor.
- Modify the language syntax to include more elaborate bounds on the
type parameters.
No. 1: Require a zeroary constructor

We could simply require all instantiations of a type parameter to include a zeroary constructor. This solution has the advantage of being very simple. It also has a precedent: existing Java technologies that deal with similar problems, such as JavaBeans technology, solve them by requiring a zeroary constructor. A big disadvantage to this approach, however, is that for many classes there is no reasonable zeroary constructor.

For instance, any class representing a non-empty container would naturally take arguments representing its elements in the constructor. Including a zeroary constructor would force us to first create an instance and only then do the initialization we would have done in the constructor call. But that practice can lead to problems. (You might want to read the April 2002 installment of this column, "The Run-on Initializer bug pattern," for details; see Resources.)
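As a small illustration of the pattern this rule forces (the class and method names here are mine, not from any proposal), a container that really wants its elements at construction time ends up split into a zeroary constructor plus a separate initialization call:

class Pair<T> {
  private T first;
  private T second;

  // The zeroary constructor the rule demands; the fields start out null.
  Pair() {}

  // Initialization that naturally belongs in a constructor is deferred to a
  // separate call -- exactly the run-on initializer pattern.
  void init(T first, T second) {
    this.first = first;
    this.second = second;
  }

  T first() { return this.first; }
  T second() { return this.second; }
}

Among other things, the fields can no longer be declared final, and every other method has to cope with a possibly uninitialized Pair.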
No. 2: Throw an exception when a needed constructor is missing

Another way of handling this problem would be to throw an exception whenever a run-time instantiation of the generic class does not include a needed constructor. Notice that the exception must be thrown at run time: because of the Java language's incremental compilation model, we can't statically determine all of the instantiations of a generic class that will occur at run time. For example, suppose we have a set of generic classes as follows:

Listing 1. New operation on a "naked" type parameter
class C<T> {
  T makeT() {
    return new T();
  }
}

class D<S> {
  C<S> makeC() {
    return new C<S>();
  }
}
Now, in class D<S>, an instance of class C<S> is constructed. Then, in the body of class C, a zeroary constructor on S will be called. Does such a zeroary constructor exist? The answer, of course, is that it depends on the instantiation of S!

If S is instantiated as, say, String, then the answer is yes. If it's instantiated as, say, Integer, then the answer is no. But when compiling classes D and C, we have no idea what instantiations of D<S> other classes may construct. Even if we had the whole program available for analysis (which we almost never do with Java programs), we would have to do a rather costly flow analysis to determine where potential problems with constructors could occur.
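The String-versus-Integer distinction is easy to see with the reflection API, which is essentially how any run-time check for a missing constructor would have to behave (the class and method names below are mine):

public class ZeroaryCheck {
  // Mirrors what a run-time check behind new T() would have to do: look up a
  // zeroary constructor reflectively and fail if it isn't there.
  static Object makeInstance(Class clazz) throws Exception {
    return clazz.newInstance();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(makeInstance(String.class));  // fine: String has a zeroary constructor
    System.out.println(makeInstance(Integer.class)); // throws InstantiationException
  }
}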
Furthermore, the sorts of errors we would get from this technique would be particularly hard for the programmer to diagnose and repair. For example, suppose the programmer is familiar only with the header of class D. He sees that the bound on D's type parameter is the default bound (Object). Given that information, he has no reason to believe that an instantiation of D that meets the declared type bound (such as D<Integer>) would cause an error. In fact, it may not cause an error for quite a long time, until someone finally calls method makeC and then calls method makeT on the instantiation of C. Then we will get a signaled error, but it will be long after the real problem occurred -- the bad instantiation of class D.

What's more, the stack trace of the signaled error may not even include any method calls on that faulty instance of D! Now let's suppose that the programmer doesn't have access to the source code for class C. He will be entirely in the dark as to what the problem is or how to fix his code unless he can manage to get hold of the maintainer of class C and get a clue.
No. 3: Modify syntax for more elaborate bounds

Another possibility is to modify the language syntax to include more elaborate bounds on the type parameters.
These bounds could specify the set of available constructors that must be
present in any instantiation of the parameter. Then, inside the generic
class definition, the only constructors that could be called are those
declared in the bound.
Also, client classes that instantiate our generic class must do so with
classes that meet the declared constraint on what constructors exist. That
way, the parameter declaration would serve as a contract between the class
and its clients, and we could statically check that both obey the
contract.
This approach has many advantages over the other two, allowing us to
keep the expressiveness of the second approach and the same degree of
static checking as in the first. But it also has problems to overcome.
For one thing, the type parameter declarations can easily become wordy.
We'd probably need some form of syntactic sugar to make these augmented
parameter declarations tolerable. Also, if augmented parameter
declarations were added in a future version beyond Tiger, we'd have to
ensure that these augmented declarations would be compatible with existing
compiled generic classes.
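Until such bounds exist, the closest we can get to this kind of contract in the language today is to state it explicitly as an interface that the client must implement. The names below are purely illustrative, not part of JSR-14 or any proposal:

// The "bound," spelled as an ordinary interface: a promise of something that
// plays the role of a zeroary constructor for T.
interface Maker<T> {
  T make();
}

// The generic class calls only what the contract declares, so the call is
// statically checked for every instantiation.
class Container<T> {
  private final Maker<T> maker;

  Container(Maker<T> maker) { this.maker = maker; }

  T makeT() {
    return maker.make(); // stands in for new T()
  }
}

A client that instantiates Container<String> must supply a Maker<String>, and a client that cannot produce one fails to compile rather than failing at run time -- the same division of responsibility a constructor bound would give us, at the cost of some boilerplate.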
If support for type-dependent operations on generic types is ever added
to Java programming, it's not yet clear what form it would take. But from
the perspective of which approach will keep Java code as robust as
possible (and as easy to fix as possible when it does break), option three
is definitely the way to go.
However, there is another problem with new expressions
that is even more serious.
Polymorphic recursion

The more serious problem is the potential for polymorphic recursion in class definitions. Polymorphic recursion occurs when a generic class instantiates itself in its own body. Consider the following devious example:

Listing 2. A self-referential generic class
class C<T> {
  public Object nest(int n) {
    if (n == 0) return this;
    else return new C<C<T>>().nest(n - 1);
  }
}
Suppose a client class creates a new instance of C<Object> and calls, say, nest(1000). Then, in the process of executing method nest(), a new instantiation C<C<Object>> will be constructed and nest(999) will be called on it. Then an instantiation C<C<C<Object>>> will be constructed, and so on, until a thousand separate instantiations of class C have been made. Of course, I chose the number 1000 arbitrarily; in general, we have no way of knowing what integers will be passed to method nest at run time. In fact, they could be passed in as user input.
Why is this a problem? Because if we support type-dependent operations
on generic types by constructing a separate class for each instantiation,
then we will have no way of knowing which classes we'll need to construct
until the program is run. But how can that work if the class loader looks
for existing class files for each class it loads?
Again, there are a few possibilities here:
- Place an upper limit on the number of instantiations of a generic
class that a program can make.
- Statically forbid polymorphic recursion.
- Construct new instantiation classes on demand while the program is
running.
No. 1: Place an upper limit on instantiations

We could put an upper limit on the number of instantiations of a generic class that a program can make. Then, during compilation, we could determine a finite bound on the set of legal instantiations and simply generate class files for all instantiations within that bound.
This approach is similar to what is done in the C++ standard template
library (and that should give us reason to worry that it's not a good
approach). The problem with this approach is that, like signaling errors
for bad constructor calls, the programmer will have no way of predicting
that a given run of his program will crash. For example, suppose the bound
on the number of instantiations were 42 and the nest() method
mentioned earlier was being called with a user-supplied argument. Then, so
long as the user types numbers below 42, everything works. When the user
types 43, our house of cards crashes around him. Now imagine the poor
maintainer of the code who's given the job of putting the pieces back
together and trying to find out what's so special about the magic number
42.
No. 2: Statically forbid polymorphic recursion

Why don't we just issue a command to the compiler like "statically forbid polymorphic recursion"? (Sigh. If only it were that simple.) Of course, many programmers, myself included, would object that such a policy would inhibit the use of many important design patterns.
For example, in a generic class List<T>, do you really want to prevent the construction of a List<List<T>>? Returning such a list from a method could be useful for building many very common data structures.

In fact, it turns out that we couldn't prevent polymorphic recursion even if we wanted to. Just like static detection of bad generic constructor calls, forbidding polymorphic recursion conflicts with incremental class compilation. This fact may be obscured by our simple example earlier, in which the polymorphic recursion occurs as a direct self-reference. But in general, the self-reference may take arbitrary levels of indirection through a multitude of classes compiled at different times. Again, that's because one generic class may instantiate another with its own type parameters.
Here's an example involving polymorphic recursion across two classes:

Listing 3. Mutually recursive polymorphic recursion
class C<T> {
  public Object potentialNest(int n) {
    if (n == 0) return this;
    else return new D<T>().nest(n - 1);
  }
}

class D<S> {
  public Object nest(int n) {
    return new C<C<S>>().potentialNest(n);
  }
}
There is no polymorphic recursion evident in either class C or D, but an expression like new D<C<Object>>().nest(1000) will cause 1000 instantiations of class C.
Potentially, we could add new attributes to class files indicating all
the various generic type instantiations inside the class and then analyze
these instantiations for recursion when compiling other classes. But
still, we would have to provide the programmer with weird and
counter-intuitive error messages.
In the code above, where would we signal an error? In the compilation of class D, or in the compilation of a client class that included the meddlesome expression new D<C<Object>>().nest(1000)? Either way, unless the programmer has access to the source code for class C, he will have no way of predicting when compilation errors would occur.
No. 3: Construct new instantiation classes on the fly

Another approach is to construct new instantiation classes on demand while the program is running. At first, this approach may seem entirely incompatible with the Java runtime. But in fact, all that is necessary to implement this strategy is a modified class loader that constructs new instantiation classes from a "template" class file.
The JVM spec already allows programmers to use modified class loaders;
in fact, they're used by many popular Java applications such as Ant,
JUnit, and DrJava. The disadvantage of this approach is that the modified
class loader would have to be distributed along with the application for
it to run on older JVMs. Because class loaders tend to be small, this
overhead should not be high.
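In outline, such a class loader hooks into the standard loading mechanism through findClass(), generating the bytes for an instantiation class the first time it is requested. The sketch below is mine -- the template-patching details are elided and the helper method is hypothetical, not taken from any actual implementation:

class InstantiationClassLoader extends ClassLoader {
  InstantiationClassLoader(ClassLoader parent) {
    super(parent);
  }

  // Invoked by loadClass() only when the parent loader finds no class file --
  // that is, when the requested class is a generic instantiation that exists
  // only as a template.
  protected Class findClass(String name) throws ClassNotFoundException {
    byte[] bytes = fillTemplate(name);
    return defineClass(name, bytes, 0, bytes.length);
  }

  // Hypothetical helper: read the template class file for the generic class
  // encoded in 'name' and fill in the constant-pool "holes" with the concrete
  // type arguments of this instantiation.
  private byte[] fillTemplate(String name) throws ClassNotFoundException {
    throw new ClassNotFoundException(name); // details omitted in this sketch
  }
}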
Let's look at a working example of this approach.
NextGen example: Modified class loader

The previous approach -- solving the problem of polymorphic recursion with a modified class loader that constructs generic type instantiations on demand -- is the approach adopted by the NextGen extension to the Java language. The modified class loader uses template files that look almost exactly like ordinary class files, except for "holes" in the constant pool that are filled in at load time for each instantiation class. Non-generic classes are unaffected.
At the Rice University JavaPLT programming languages lab, we've
recently released a prototype of the NextGen compiler, an extension of the
GJ generic Java compiler that includes support for type-dependent
operations (casts, instanceof tests, new
expressions) on type parameters. In this prototype implementation, we have
employed a modified class loader to support polymorphic recursion. This
prototype is available as a free download (see Resources).
Conclusion

As the above considerations demonstrate, adding full-fledged run-time support for generics to the Java language requires solving many subtle design issues. If these issues are not handled well, the benefits of generic types can easily be outweighed by decreases in expressiveness and robustness. Hopefully, Java programming will continue to develop in a direction that maintains a high degree of both of these attributes.
Next time, we'll finish our discussion of generic types by examining perhaps the most powerful way they could be applied: adding mixins (classes with a parametric parent type) to the language. We'll relate such a formulation of mixins to previous discussions of this powerful language feature and weigh the advantages and disadvantages of adding mixins through generic types.
Resources
- Read the complete Diagnosing Java
code series by Eric Allen on developerWorks. In
particular, "Killer combo -- Mixins,
Jam, and unit testing" (December 2002) and "The case for static
types" (June 2002) can bolster your knowledge of generic types and
the Java type system.
- Learn how the practice of including extraneous zeroary constructors
can cause problems in the author's April 2002 column, "Diagnosing Java code: The
Run-on Initializer bug pattern."
- Get a jump on generics in Java by downloading the JSR-14 prototype
compiler; it includes the sources for a prototype compiler written
in the extended language, a jar file containing the class files for
running and bootstrapping the compiler, and a jar file containing stubs
for the collection classes.
- Eric Allen has a new book on the subject of bug patterns, Bug Patterns
in Java (Apress, 2002), which presents a methodology for
diagnosing and debugging computer programs by focusing on bug patterns,
Extreme Programming methods, and ways to craft powerful, testable, and
extensible software.
- You can download the NextGen prototype
compiler right now.
- Also, check out DrJava, a free Java
IDE that supports interactive evaluation of Java statements and
expressions, as well as generic Java syntax and compilation.
- Martin Fowler's Web site contains much useful information about
effective refactoring.
- Follow the discussion of adding generic types to Java by reading the
Java Community Process proposal, JSR-14.
- Keith Turner offers another look at this topic with his article "Catching more errors at
compile time with Generic Java" (developerWorks, March
2001).
- This paper, "Automatic Code Generation
from Design Patterns" (PDF), from IBM Research, describes the
architecture and implementation of a tool that automates the
implementation of design patterns.
- Find hundreds more Java technology resources on the developerWorks Java
technology zone.
About the author

Eric Allen sports a broad range of hands-on knowledge of technology and the computer industry. With a B.S. in computer science and mathematics from Cornell University and an M.S. in computer science from Rice University, Eric is currently a Ph.D. candidate on the Java programming languages team at Rice. Eric is a project manager for and a founding member of the DrJava project, an open-source Java IDE designed for beginners; he is also the lead developer of the university's experimental compiler for the NextGen programming language, an extension of the Java language with added experimental features. Contact Eric at eallen@cs.rice.edu.