Java generics without the pain, Part 3: Overcoming the limitations of generics in the JSR-14 prototype compiler

Eric E. Allen (eallen@cs.rice.edu), Ph.D. candidate, Java programming languages team, Rice University

9 April 2003

Java developer and researcher Eric Allen continues his discussion of generic types in JSR-14 and Tiger with a look at the ramifications of adding support for new operations on "naked" type parameters in generic types.
In this installment of our series on the addition of generic types to the Java language, we'll consider one of the two limitations on the use of generics that we haven't yet discussed: support for new operations on "naked" type parameters (such as new T() in a class C<T>).

As I mentioned last month, Tiger and JSR-14 implement generic types over the Java language using "type erasure." With type erasure, generic types are used for type checking only; afterwards, they are replaced with their upper bounds. Naturally, erasure interferes with expressions such as new T().

If the bound on T were, say, Object, such an expression would be erased to new Object() and, regardless of how T was instantiated (String, List, URLClassLoader, and so on), the new operation would make a new instance of Object. Clearly, that's not what we want.
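To see what erasure leaves behind at run time, here's a minimal sketch (ordinary generic code that any generics-aware compiler, such as the JSR-14 prototype or Tiger, will accept; the class name is mine). Because both instantiations erase to the same class, there is no per-instantiation information left for an expression like new T() to consult:

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
  public static void main(String[] args) {
    List<String> strings = new ArrayList<String>();
    List<Integer> numbers = new ArrayList<Integer>();

    // After erasure, both instantiations share a single runtime class,
    // so nothing distinguishes List<String> from List<Integer> at run time.
    System.out.println(strings.getClass() == numbers.getClass()); // prints true
  }
}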
To add support for expressions such as new T(), as well as the other type-dependent operations we discussed last time, such as casts and instanceof expressions, we'd have to adopt some
implementation strategy other than type erasure (such as keeping a
separate class for each generic instantiation). But in the case of
new operations, there are other issues that must be
addressed.
In particular, many fundamental language-design issues have to be
decided upon to make such an addition to the Java language a reality.
Valid constructor calls

First off, in order to form a legal new expression on a type parameter, such as new T(), it's necessary to make sure that we're calling a constructor that is valid for every instantiation of T. But because all we know about T is that it's a subtype of its declared bound, we have no idea what constructors an instantiation of T will have. This problem could be handled in one of three ways:
- Require all instantiations of a type parameter to include a zeroary
constructor.
- Throw an exception whenever a run-time instantiation of the generic
class does not include a needed constructor.
- Modify the language syntax to include more elaborate bounds on the
type parameters.
No. 1: Require a zeroary constructor

We could simply require all instantiations of a type parameter to include a zeroary constructor. This solution has the advantage of being very simple. It also has a precedent: existing Java technologies that deal with similar problems, such as JavaBeans technology, solve them by requiring a zeroary constructor. A big disadvantage to this approach, however, is that for many classes there is no reasonable zeroary constructor.

For instance, any class representing a non-empty container would naturally take arguments representing its elements in the constructor. Including a zeroary constructor would force us to first create an instance and only then do the initialization we would have done in the constructor call. But that practice can lead to problems. (You might want to read the April 2002 installment of this column, "The Run-on Initializer bug pattern," for details; see Resources.)
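As a small illustration of the pattern this rule forces (the class and method names here are mine, not from any proposal), a container that really wants its elements at construction time ends up split into a zeroary constructor plus a separate initialization call:

class Pair<T> {
  private T first;
  private T second;

  // The zeroary constructor the rule demands; the fields start out null.
  Pair() {}

  // Initialization that naturally belongs in a constructor is deferred to a
  // separate call -- exactly the run-on initializer pattern.
  void init(T first, T second) {
    this.first = first;
    this.second = second;
  }

  T first() { return this.first; }
  T second() { return this.second; }
}

Among other things, the fields can no longer be declared final, and every other method has to cope with a possibly uninitialized Pair.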
No. 2: Throw an exception when a needed constructor is missing

Another way of handling this problem would be to throw an exception whenever a run-time instantiation of the generic class does not include a needed constructor. Notice that the exception must be thrown at run time: because of the Java language's incremental compilation model, we can't statically determine all of the instantiations of a generic class that will occur at run time. For example, suppose we have a set of generic classes as follows:

Listing 1. New operation on a "naked" type parameter
class C<T> {
  T makeT() {
    return new T();
  }
}

class D<S> {
  C<S> makeC() {
    return new C<S>();
  }
}
Now, in class D<S>, an instance of class C<S> is constructed. Then, in the body of class C, a zeroary constructor on S will be called. Does such a zeroary constructor exist? The answer, of course, is that it depends on the instantiation of S!

If S is instantiated as, say, String, then the answer is yes. If it's instantiated as, say, Integer, then the answer is no. But when compiling classes D and C, we have no idea what instantiations of D<S> other classes may construct. Even if we had the whole program available for analysis (which we almost never do with Java programs), we would have to do a rather costly flow analysis to determine where potential problems with constructors could occur.
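The String-versus-Integer distinction is easy to see with the reflection API, which is essentially how any run-time check for a missing constructor would have to behave (the class and method names below are mine):

public class ZeroaryCheck {
  // Mirrors what a run-time check behind new T() would have to do: look up a
  // zeroary constructor reflectively and fail if it isn't there.
  static Object makeInstance(Class clazz) throws Exception {
    return clazz.newInstance();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(makeInstance(String.class));  // fine: String has a zeroary constructor
    System.out.println(makeInstance(Integer.class)); // throws InstantiationException
  }
}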
Furthermore, the sorts of errors we would get from this technique would be particularly hard for the programmer to diagnose and repair. For example, suppose the programmer is familiar only with the header of class D. He sees that the bound on D's type parameter is the default bound (Object). Given that information, he has no reason to believe that an instantiation of D that meets the declared type bound (such as D<Integer>) would cause an error. In fact, it may not cause an error for quite a long time, until someone finally calls method makeC and then calls method makeT on the instantiation of C. Then we will get a signaled error, but it will be long after the real problem occurred -- the bad instantiation of class D.

What's more, the stack trace of the signaled error may not even include any method calls on that faulty instance of D! Now let's suppose that the programmer doesn't have access to the source code for class C. He will be entirely in the dark as to what the problem is or how to fix his code unless he can manage to get hold of the maintainer of class C and get a clue.
No. 3: Modify syntax for more elaborate bounds

Another possibility is to modify the language syntax to include more elaborate bounds on the type parameters.
These bounds could specify the set of available constructors that must be
present in any instantiation of the parameter. Then, inside the generic
class definition, the only constructors that could be called are those
declared in the bound.
Also, client classes that instantiate our generic class must do so with
classes that meet the declared constraint on what constructors exist. That
way, the parameter declaration would serve as a contract between the class
and its clients, and we could statically check that both obey the
contract.
This approach has many advantages over the other two, allowing us to
keep the expressiveness of the second approach and the same degree of
static checking as in the first. But it also has problems to overcome.
For one thing, the type parameter declarations can easily become wordy.
We'd probably need some form of syntactic sugar to make these augmented
parameter declarations tolerable. Also, if augmented parameter
declarations were added in a future version beyond Tiger, we'd have to
ensure that these augmented declarations would be compatible with existing
compiled generic classes.
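Until such bounds exist, the closest we can get to this kind of contract in the language today is to state it explicitly as an interface that the client must implement. The names below are purely illustrative, not part of JSR-14 or any proposal:

// The "bound," spelled as an ordinary interface: a promise of something that
// plays the role of a zeroary constructor for T.
interface Maker<T> {
  T make();
}

// The generic class calls only what the contract declares, so the call is
// statically checked for every instantiation.
class Container<T> {
  private final Maker<T> maker;

  Container(Maker<T> maker) { this.maker = maker; }

  T makeT() {
    return maker.make(); // stands in for new T()
  }
}

A client that instantiates Container<String> must supply a Maker<String>, and a client that cannot produce one fails to compile rather than failing at run time -- the same division of responsibility a constructor bound would give us, at the cost of some boilerplate.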
If support for type-dependent operations on generic types is ever added
to Java programming, it's not yet clear what form it would take. But from
the perspective of which approach will keep Java code as robust as
possible (and as easy to fix as possible when it does break), option three
is definitely the way to go.
However, there is another problem with new expressions
that is even more serious.
Polymorphic recursion

The more serious problem is the potential for polymorphic recursion in class definitions. Polymorphic recursion occurs when a generic class instantiates itself in its own body. Consider the following devious example:

Listing 2. A self-referential generic class
class C<T> {
  public Object nest(int n) {
    if (n == 0) return this;
    else return new C<C<T>>().nest(n - 1);
  }
}
Suppose a client class creates a new instance of C<Object> and calls, say, nest(1000). Then, in the process of executing method nest(), a new instantiation C<C<Object>> will be constructed and nest(999) will be called on it. Then an instantiation C<C<C<Object>>> will be constructed, and so on, until a thousand separate instantiations of class C have been made. Of course, I chose the number 1000 arbitrarily; in general, we have no way of knowing what integers will be passed to method nest at run time. In fact, they could be passed in as user input.
Why is this a problem? Because if we support type-dependent operations
on generic types by constructing a separate class for each instantiation,
then we will have no way of knowing which classes we'll need to construct
until the program is run. But how can that work if the class loader looks
for existing class files for each class it loads?
Again, there are a few possibilities here:
- Place an upper limit on the number of instantiations of a generic
class that a program can make.
- Statically forbid polymorphic recursion.
- Construct new instantiation classes on demand while the program is
running.
No. 1: Place an upper limit on instantiations

We could put an upper limit on the number of instantiations of a generic class that a program can make. Then, during compilation, we could determine a finite bound on the set of legal instantiations and simply generate class files for all instantiations within that bound.
This approach is similar to what is done in the C++ standard template
library (and that should give us reason to worry that it's not a good
approach). The problem with this approach is that, like signaling errors
for bad constructor calls, the programmer will have no way of predicting
that a given run of his program will crash. For example, suppose the bound
on the number of instantiations were 42 and the nest() method
mentioned earlier was being called with a user-supplied argument. Then, so
long as the user types numbers below 42, everything works. When the user
types 43, our house of cards crashes around him. Now imagine the poor
maintainer of the code who's given the job of putting the pieces back
together and trying to find out what's so special about the magic number
42.
No. 2: Statically forbid polymorphic recursion

Why don't we just issue a command to the compiler like "statically forbid polymorphic recursion"? (Sigh. If only it were that simple.) Of course, many programmers, myself included, would object that such a policy would inhibit the use of many important design patterns.
For example, in a generic class List<T>, do you really want to prevent the construction of a List<List<T>>? Returning such a list from a method could be useful for building many very common data structures.

In fact, it turns out that we couldn't prevent polymorphic recursion even if we wanted to. Just like static detection of bad generic constructor calls, forbidding polymorphic recursion conflicts with incremental class compilation. This fact may be obscured by our simple example earlier, in which the polymorphic recursion occurs as a direct self-reference. But in general, the self-reference may take arbitrary levels of indirection through a multitude of classes compiled at different times. Again, that's because one generic class may instantiate another with its own type parameters.
Here's an example involving polymorphic recursion across two classes:

Listing 3. Mutually recursive polymorphic recursion
class C<T> {
  public Object potentialNest(int n) {
    if (n == 0) return this;
    else return new D<T>().nest(n - 1);
  }
}

class D<S> {
  public Object nest(int n) {
    return new C<C<S>>().potentialNest(n);
  }
}
There is no polymorphic recursion evident in either class C or D, but an expression like new D<C<Object>>().nest(1000) will cause 1000 instantiations of class C.
Potentially, we could add new attributes to class files indicating all
the various generic type instantiations inside the class and then analyze
these instantiations for recursion when compiling other classes. But
still, we would have to provide the programmer with weird and
counter-intuitive error messages.
In the code above, where would we signal an error? In the compilation of class D, or in the compilation of a client class that included the meddlesome expression new D<C<Object>>().nest(1000)? Either way, unless the programmer has access to the source code for class C, he will have no way of predicting when compilation errors would occur.
No. 3: Construct new instantiation classes on the fly

Another approach is to construct new instantiation classes on demand while the program is running. At first, this approach may seem entirely incompatible with the Java runtime. But in fact, all that is necessary to implement this strategy is a modified class loader that constructs new instantiation classes from a "template" class file.
The JVM spec already allows programmers to use modified class loaders;
in fact, they're used by many popular Java applications such as Ant,
JUnit, and DrJava. The disadvantage of this approach is that the modified
class loader would have to be distributed along with the application for
it to run on older JVMs. Because class loaders tend to be small, this
overhead should not be high.
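In outline, such a class loader hooks into the standard loading mechanism through findClass(), generating the bytes for an instantiation class the first time it is requested. The sketch below is mine -- the template-patching details are elided and the helper method is hypothetical, not taken from any actual implementation:

class InstantiationClassLoader extends ClassLoader {
  InstantiationClassLoader(ClassLoader parent) {
    super(parent);
  }

  // Invoked by loadClass() only when the parent loader finds no class file --
  // that is, when the requested class is a generic instantiation that exists
  // only as a template.
  protected Class findClass(String name) throws ClassNotFoundException {
    byte[] bytes = fillTemplate(name);
    return defineClass(name, bytes, 0, bytes.length);
  }

  // Hypothetical helper: read the template class file for the generic class
  // encoded in 'name' and fill in the constant-pool "holes" with the concrete
  // type arguments of this instantiation.
  private byte[] fillTemplate(String name) throws ClassNotFoundException {
    throw new ClassNotFoundException(name); // details omitted in this sketch
  }
}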
Let's look at a working example of this approach.
NextGen example: Modified class loader

The previous approach -- solving the problem of polymorphic recursion with a modified class loader that constructs generic type instantiations on demand -- is the approach adopted by the NextGen extension to the Java language. The modified class loader uses template files that look almost exactly like ordinary class files, except for "holes" in the constant pool that are filled in at load time for each instantiation class. Non-generic classes are unaffected.
At the Rice University JavaPLT programming languages lab, we've
recently released a prototype of the NextGen compiler, an extension of the
GJ generic Java compiler that includes support for type-dependent
operations (casts, instanceof tests, new
expressions) on type parameters. In this prototype implementation, we have
employed a modified class loader to support polymorphic recursion. This
prototype is available as a free download (see Resources).
Conclusion

As the above considerations demonstrate, adding full-fledged run-time support for generics to the Java language requires solving many subtle design issues. If these issues are not handled well, the benefits of generic types can easily be outweighed by decreases in expressiveness and robustness. Hopefully, Java programming will continue to develop in a direction that maintains a high degree of both of these attributes.
Next time, we'll finish our discussion of generic types by examining perhaps the most powerful way they could be applied: adding mixins (classes with a parametric parent type) to the language. We'll relate such a formulation of mixins to previous discussions of this powerful language feature and weigh the advantages and disadvantages of adding mixins through generic types.
Resources
- Read the complete Diagnosing Java
code series by Eric Allen on developerWorks. In
particular, "Killer combo -- Mixins,
Jam, and unit testing" (December 2002) and "The case for static
types" (June 2002) can bolster your knowledge of generic types and
the Java type system.
- Learn how the practice of including extraneous zeroary constructors
can cause problems in the author's April 2002 column, "Diagnosing Java code: The
Run-on Initializer bug pattern."
- Get a jump on generics in Java by downloading the JSR-14 prototype
compiler; it includes the sources for a prototype compiler written
in the extended language, a jar file containing the class files for
running and bootstrapping the compiler, and a jar file containing stubs
for the collection classes.
- Eric Allen has a new book on the subject of bug patterns, Bug Patterns
in Java (Apress, 2002), which presents a methodology for
diagnosing and debugging computer programs by focusing on bug patterns,
Extreme Programming methods, and ways to craft powerful, testable, and
extensible software.
- You can download the NextGen prototype
compiler right now.
- Also, check out DrJava, a free Java
IDE that supports interactive evaluation of Java statements and
expressions, as well as generic Java syntax and compilation.
- Martin Fowler's Web site contains much useful information about
effective refactoring.
- Follow the discussion of adding generic types to Java by reading the
Java Community Process proposal, JSR-14.
- Keith Turner offers another look at this topic with his article "Catching more errors at
compile time with Generic Java" (developerWorks, March
2001).
- This paper, "Automatic Code Generation
from Design Patterns" (PDF), from IBM Research, describes the
architecture and implementation of a tool that automates the
implementation of design patterns.
- Find hundreds more Java technology resources on the developerWorks Java
technology zone.
About the author

Eric Allen sports a broad range of hands-on knowledge of technology and the computer industry. With a B.S. in computer science and mathematics from Cornell University and an M.S. in computer science from Rice University, Eric is currently a Ph.D. candidate on the Java programming languages team at Rice. Eric is a project manager for and a founding member of the DrJava project, an open-source Java IDE designed for beginners; he is also the lead developer of the university's experimental compiler for the NextGen programming language, an extension of the Java language with added experimental features. Contact Eric at eallen@cs.rice.edu.