Love or hate it, static type checking can make code more robust
Eric E. Allen (eallen@cs.rice.edu), Ph.D. candidate, Java programming languages team, Rice University
1 June 2002
Many popular languages -- such as Ruby,
Python, and other so-called "scripting" languages -- have moved away
from static type checking as a means to help improve the reliability of
code. Still, static type checking can be one of the key weapons in a
powerful arsenal for preventing and detecting bugs. In this
article, Eric Allen makes a case for static type checking, explains why
we should be glad that the Java language supports it, and discusses how
it can be made even better.
Static types -- most programmers love them or hate them. Advocates
boast that static types allow them to produce cleaner and more reliable
code than they could without them. Detractors moan about the added
complexity that static types require of a program.
Granted, static types are not a free lunch; they can be tedious to work
with at times. However, if our main concern is to keep bugs out of our
code, then, taken as a whole, Java programming is better off for having
and using static types. Why? Static type checking:
- Improves robustness through early error detection
- Increases performance by moving required checks from run time to compile time
- Compensates for the weaknesses of unit testing
Let's examine these reasons in more detail and take a look at static
type checking in combination with pair programming.
Improve robustness via early detection
Static type checking improves program robustness.
Why? Because it helps locate errors as early as they possibly can be --
before the program is run. The logic here is inescapable: the
sooner a bug is identified, the easier it is to diagnose,
and the less data will be corrupted by the errant
computation.
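To make the contrast concrete, here is a minimal sketch. The class and method names are hypothetical and invented for this illustration. The Object-based method mimics a dynamically checked call: a wrong argument compiles and is noticed only when the cast actually executes. The String-typed method turns the same mistake into a compile-time error, shown here as a commented-out line.

```java
// Hypothetical illustration; EarlyDetection is not part of the article's code.
class EarlyDetection {

    // Untyped style: any Object is accepted, so a wrong argument is
    // discovered only when this cast executes at run time.
    static int untypedLength(Object text) {
        return ((String) text).length();
    }

    // Typed style: the parameter type states the requirement, and the
    // compiler enforces it at every call site before the program runs.
    static int typedLength(String text) {
        return text.length();
    }

    public static void main(String[] args) {
        System.out.println(untypedLength("abc"));
        System.out.println(typedLength("abc"));

        try {
            // Compiles without complaint, fails only when executed.
            untypedLength(Integer.valueOf(42));
        } catch (ClassCastException e) {
            System.out.println("Found only at run time: " + e);
        }

        // The typed equivalent never gets that far:
        // typedLength(Integer.valueOf(42)); // does not compile: incompatible types
    }
}
```

If the faulty call sat in a rarely executed branch, the untyped version could ship with the bug intact, while the typed version could not be compiled at all.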
Locating and diagnosing a bug before the program is run is the ideal
state. This advantage makes static type checking a great success story in
programming language design because it is one of the few ways to
automatically detect bugs in a program before it is run, and because it
can perform this task in an acceptable amount of time. By "acceptable
amount of time," I mean time linear in the length of the program (with a
small constant factor) as opposed to the cubic or even exponential time
required by other forms of automated checking (many aren't guaranteed to
ever finish at all).
Granted, the more powerful a type system is, the easier it is to
program in (and the more errors it will detect). I don't deny that the
simplicity of the Java type system leaves much to be desired; this
simplicity often gets in the way, forcing us to circumvent it with casts.
But this situation is slowly improving.
The Sun JSR14 compiler adds a limited form of generic (or
parameterized) types to the language; we can be confident that its
adoption into the language is just a matter of time because it is now
firmly in the community process. More advanced language extensions such as
NextGen promise to further build on the increased expressiveness provided
by JSR14. That's a good thing, because, in many contexts, NextGen helps to
alleviate some of the added complexity necessary even in JSR14. See the Resources
section for more information on this issue. (Besides the JSR14 link, there
are some articles on parametric polymorphism.)
The benefits of static type checking extend beyond robustness, though.
It can also safeguard your program's performance.
The path to NextGen
GJ is a
compiler written by Martin Odersky that allows users to add
generic, or parameterized, types to the Java language
without sacrificing compatibility with legacy code. Its forerunner
was known as Pizza; it was an experimental language with several new
features that were adopted into Java 1.2. GJ improved Pizza by only
adding generic types. At Rice University, one of the programming
languages technology groups has implemented a compiler for an
upward-compatible version of GJ, called NextGen.
GJ compiles source to bytecode by means of type erasure, a
method that replaces every instance of each type variable with that
variable's upper bound. It also allows type variables to be declared
for specific methods, rather than for whole classes.
The NextGen language was jointly developed by Robert Cartwright
of Rice University and Guy Steele of Sun Microsystems. This language
adds the ability to perform run-time checks of type variables to GJ.
It supports type parameterization of individual methods, and
supports inner classes and run-time operations on generic types. It
doesn't allow type variables to be instantiated with primitive
types.
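The type erasure described in the sidebar can be observed in a small sketch. It assumes a Java release with generics (Java 5 or later), whose generics follow the GJ erasure design. The class ErasureDemo and its methods are hypothetical names for this illustration. Reflection is used only to inspect the compiled signature.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of type erasure; not code from GJ or NextGen.
class ErasureDemo {

    // A type variable declared for a single method. Its upper bound is
    // Comparable, so erasure replaces T with Comparable in the bytecode.
    static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // After erasure, both lists are plain ArrayList objects at run time.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        return strings.getClass() == numbers.getClass();
    }

    // Looks up max by its erased parameter types and reports the
    // erased return type.
    static Class<?> erasedReturnType() {
        try {
            Method m = ErasureDemo.class.getDeclaredMethod(
                    "max", Comparable.class, Comparable.class);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same run-time class: " + sameRuntimeClass());
        System.out.println("erased return type: " + erasedReturnType());
        System.out.println("max(3, 7) = " + max(3, 7));
    }
}
```

Because the type arguments are gone at run time, an erasure-based compiler cannot support operations that depend on them, which is the gap the sidebar says NextGen sets out to close.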
Increase performance by reducing needed checks
In a safe language (and by "safe" I mean a language
that does not allow us to break its own abstractions), various checks on
the types of arguments passed to methods, as well as checks on the types
of accessed fields, are required. If these checks are not
done statically, then they must be done at run time.
Performing these required checks takes time, and languages that perform
them at run time incur a corresponding performance hit. When an invariant
is checked statically, we can eliminate the need to check it at run time,
speeding up the program. So, not only does static type checking allow us
to produce more robust code; it lets us produce more efficient code as
well.
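One place where this trade-off is visible in Java itself is array stores versus generic collections. The sketch below is an illustration under assumptions: it uses generics as they exist in Java 5 and later, and the class CheckPlacement is a hypothetical name. Because arrays are covariant, the compiler cannot prove that a store is safe, so the JVM checks each store into an object array at run time. For the generic list, the unsafe aliasing is rejected statically, so no store check is needed when the element is added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of where a required check is performed.
class CheckPlacement {

    // Arrays: a String[] may be viewed as an Object[], so the store below
    // passes the compiler and must be checked by the JVM when it executes.
    static boolean storeFailsAtRunTime() {
        Object[] slots = new String[1];
        try {
            slots[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Generic lists: the equivalent aliasing is a compile-time error, so
    // the add call needs no run-time store check.
    static int addWithoutStoreCheck() {
        List<String> names = new ArrayList<String>();
        // List<Object> alias = names; // does not compile: incompatible types
        names.add("Ada");
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println("array store rejected at run time: " + storeFailsAtRunTime());
        System.out.println("list size after add: " + addWithoutStoreCheck());
    }
}
```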
Traditionally, static type checking during compilation is considered to
be inefficient. Linking together various type references across files can
take a long time for large programs in languages such as C/C++ because the
various files must be combined into one large executable with each
compilation. But the Java language avoids this issue entirely because
classes are compiled separately and loaded on demand into the JVM. There
is no need to link all referenced files into a single executable -- so
there is no corresponding slow-down during compilation.
So now, what do we say to those who claim that static typing is
unnecessary in the context of unit tests?
Exceeding the limits of unit tests
As regular readers of this column know, I am a strong
advocate of unit testing. It is a best practice to cover your programs
with unit tests. However, I will be the first to acknowledge the
limitations of unit testing.
A unit test can only test the behavior of a program during a specific
run, with specific inputs. Granted, in the narrow context of that run, we
can test the deep properties of the program.
In contrast, type checking checks shallow properties, but it
does so for all potential runs of the program, with any possible inputs.
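The difference between the two kinds of guarantee can be sketched as follows. The class Stats and its average method are hypothetical, invented for this illustration. The type checker establishes for every possible call that the argument is an int[] and the result is an int, which is a shallow property. The hand-rolled test establishes a deep property, the actual value, but only for the one input it tries.

```java
// Hypothetical illustration; Stats is not part of the article's code.
class Stats {

    // Type-correct for all inputs, but only behaves as intended for
    // non-empty arrays.
    static int average(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // A "unit test" for one specific run with one specific input.
        if (average(new int[] {2, 4, 6}) != 4) {
            throw new AssertionError("average of 2, 4, 6 should be 4");
        }
        System.out.println("test passed for {2, 4, 6}");

        // An input the test never tried: it type-checks, yet it fails.
        try {
            average(new int[0]);
        } catch (ArithmeticException e) {
            System.out.println("untested input failed: " + e);
        }
    }
}
```

Neither check subsumes the other: the test says nothing about the empty array, and the type checker says nothing about the value returned.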
Combining type checking and unit tests
Just as stories and unit tests complement one another
when specifying the behavior of a program, unit tests and static type
checking complement one another when identifying and stamping out bugs.
The combined effect of these two means of bug elimination is greater than
the sum of their individual effects.
Some designers and programmers will note that the kinds of errors
experienced in sophisticated programs are much deeper than those caught by
static type checking; therefore, they conclude that the static type system
hinders more than it helps. And without a doubt, a statically typed
language can make programs more verbose and can even prevent us from
writing some programs that would never cause any errors.
There are always tradeoffs. Using static types is not a free lunch.
Often, the program we write for the statically typed language will be more
complex than what we could write outside the type system. But even
"sophisticated" programs sport the kinds of shallow errors that are the
special prey of static type checking.
Even if these shallow errors are eliminated in a program, refactoring
can easily reproduce them. If we are to adopt the extreme programming
philosophy of perpetual refactoring, we will want to catch such shallow
errors as soon as they are introduced. (On the flip side, unit tests help
to catch deeper errors that occur under refactoring. The two concepts do
add up to a sum greater than their parts.) Static type checking works
quite well in the context of extreme programming.
A conflict between type checking and unit tests
Nevertheless, there is one conflict between
static type checking and unit testing that should be mentioned. Extreme
programming instructs us to interweave the writing of unit
tests with the writing of the code that makes those tests pass.
Each set of unit tests helps to specify a new aspect of functionality
and should be written right before the code that allows us to pass those
tests. Ideally, we'd like to compile those tests immediately after writing
them, so we can be sure they're ready to go.
But there's a problem here: the new tests won't pass static type
checking until we have defined the classes and methods to which the tests
refer. These classes and methods can be stubs that we fill in later,
but unless we have something in place, the references to them in the tests
won't make sense to the static checker.
Consider the relatively simple example in Listing 1, which shows a test
class for an implementation of multi-sets (an abstract
data structure):
Listing 1. A test class for an implementation of multi-sets
import junit.framework.*;
import java.io.*;

/**
 * A test class for MultiSet.
 */
public class MultiSetTest extends TestCase {

    // Sample elements and the multi-sets built from them.
    private static String W = "w";
    private static String X = "x";
    private static String Y = "y";
    private static String Z = "z";
    private static MultiSet<String> EMPTY = new MultiSet<String>();
    private static MultiSet<String> XY = new MultiSet<String>(X, Y);
    private static MultiSet<String> YZ = new MultiSet<String>(Y, Z);
    private static MultiSet<String> XYZ = new MultiSet<String>(X, Y, Z);
    private static MultiSet<String> XYY = new MultiSet<String>(X, Y, Y);
    private static MultiSet<String> WXY = new MultiSet<String>(W, X, Y);

    /**
     * Constructor.
     * @param name the name of the test case
     */
    public MultiSetTest(String name) {
        super(name);
    }

    /**
     * Creates a test suite for JUnit to run.
     * @return a test suite based on the methods in this class
     */
    public static Test suite() {
        return new TestSuite(MultiSetTest.class);
    }

    // Asserts that key occurs exactly value times in set.
    private void _assertOrder(MultiSet set, String key, int value) {
        assertEquals("order for key " + key, value, set.order(key));
    }

    public void testEmpty() {
        _assertOrder(EMPTY, X, 0);
        _assertOrder(EMPTY, Y, 0);
        _assertOrder(EMPTY, Z, 0);
    }

    public void testOrder() {
        _assertOrder(XY, X, 1);
        _assertOrder(XY, Y, 1);
        _assertOrder(YZ, Y, 1);
        _assertOrder(YZ, Z, 1);
    }

    public void testAdd() {
        MultiSet added = XY.add(YZ);
        _assertOrder(added, X, 1);
        _assertOrder(added, Y, 2);
        _assertOrder(added, Z, 1);
    }

    public void testSubset() {
        assertTrue(XY.subset(XYZ));
        assertTrue(YZ.subset(XYZ));
        assertTrue(! YZ.subset(XY));
        assertTrue(! XY.subset(YZ));
        assertTrue(! XYZ.subset(XY));
        assertTrue(! XYZ.subset(YZ));
        assertTrue(! XYY.subset(XYZ));
        assertTrue(! XYZ.subset(XYY));
    }

    public void testSubtract() {
        MultiSet XYYZ = XY.add(YZ);
        assertEquals(YZ, XYYZ.subtract(WXY));
        assertEquals(YZ, XYYZ.subtract(XY));
        assertEquals(XY, XYYZ.subtract(YZ));
        assertEquals(EMPTY, EMPTY.subtract(YZ));
        assertEquals(EMPTY, YZ.subtract(YZ));
    }

    public void testUnion() {
        assertEquals(XYZ, XY.union(YZ));
    }

    public void testIsEmpty() {
        assertTrue(EMPTY.isEmpty());
        assertTrue(! XY.isEmpty());
    }
}
Where is class MultiSet? What about the methods
union(), isEmpty(), and the like?
A type checker wouldn't have any better idea about the location of
these classes and methods than you do, so although this code compiles in
my environment, it won't compile for you. That is, it won't compile until
you write an implementation of class MultiSet with all of the
appropriate methods. Remember, in a statically typed language, you can't
compile new unit tests until you at least generate stubs for the classes
and methods you're trying to test.
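As an illustration, a hand-written stub along the following lines should be enough for the references in Listing 1 to pass static type checking. Everything here is an assumption inferred from the calls in the listing rather than the author's actual MultiSet: the signatures, the varargs constructor (a modern Java convenience standing in for the zero-, two-, and three-argument constructors the test uses), and the choice to throw UnsupportedOperationException from each stubbed method. StubDemo is a hypothetical driver. In a real project MultiSet would be declared public in its own file.

```java
// Hypothetical driver showing that the stub type-checks but does nothing yet.
class StubDemo {
    public static void main(String[] args) {
        MultiSet<String> xy = new MultiSet<String>("x", "y");
        MultiSet<String> yz = new MultiSet<String>("y", "z");
        try {
            xy.union(yz);
        } catch (UnsupportedOperationException e) {
            System.out.println("compiles, but the test would fail: " + e.getMessage());
        }
    }
}

// Stub for MultiSet. All signatures are assumptions inferred from Listing 1.
class MultiSet<T> {

    // Accepts any number of initial elements; the stub ignores them.
    @SafeVarargs
    MultiSet(T... elements) {
    }

    // Number of occurrences of key in this multi-set.
    int order(T key) { throw notImplemented(); }

    MultiSet<T> add(MultiSet<T> other) { throw notImplemented(); }

    MultiSet<T> subtract(MultiSet<T> other) { throw notImplemented(); }

    MultiSet<T> union(MultiSet<T> other) { throw notImplemented(); }

    boolean subset(MultiSet<T> other) { throw notImplemented(); }

    boolean isEmpty() { throw notImplemented(); }

    private static UnsupportedOperationException notImplemented() {
        return new UnsupportedOperationException("stub: not implemented yet");
    }
}
```

With such a stub in place the tests compile and then fail at run time, which is the state test-first development expects before the real implementation is written.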
This conflict can be easily alleviated through the use of test-oriented
development tools. Specifically, you need a development tool that can read
over a unit test, accumulate the class and method references (and
appropriate signatures) necessary for that test to pass static type
checking, and then generate the stub classes.
If you consider what the design of such a development tool would look
like, it becomes pretty clear that the plan for a test-oriented
development tool would look exactly like a static type checker, except
that, instead of generating errors, it would simply accumulate a log of
the stubs that it needs to generate. We are currently implementing a
static checker for NextGen with a "stub-generating" mode that does exactly
this.
Pair programming: Another shallow error check
Another complement to the shallow but general error
detection capabilities of static type checking is pair programming,
one of the tenets of extreme programming. Multiple intelligent agents
checking each other's work is a great way to eliminate many of those
shallow errors.
Another effective means to achieve this effect is through open source
coding. When code can be made open source, robustness tends to improve --
after all, you then have more than two pairs of programmers' eyes going
over the code, looking for the smallest of "gotchas." As Eric Raymond of
"The Cathedral and the Bazaar" fame puts it (in what he dubs Linus's Law),
"Given enough eyeballs, all bugs are shallow."
Going beyond simple type checking
Of course, for the same reasons that static type
checking can be beneficial, more advanced forms of static checking can be
too. The terms "static checking" and "static analysis" are more general
notions than just checking types -- they refer to any mechanism for
analyzing the text of a program to determine how it will behave at run
time.
As other teams have shown, the Java language can be extended to include
other forms of static checking, such as limited static verification of
assertions. Another direction for future work is to add various "soft
typing systems" on top of the Java language, in which operations such as
casts can be verified to succeed in certain contexts, but unverified casts
are not prohibited.
When eliminating bugs, we should attempt to bring all arms to bear on
the problem, developing new and effective static checking systems to check
as many invariants as possible. In the next few articles, we'll explore
some of the static analysis tools available for Java programming, both as
prototypes and as production tools.
Resources
- Here are some details on the Java Specification Request "JSR 14: Add Generic Types To
The Java Programming Language." The idea is to extend the language with
generic (also known as parameterized) types.
- Eric Allen covered various proposals for adding generics to Java
programming in the February 2000 JavaWorld article, "Behold the
power of parametric polymorphism."
- Paul Rogers addressed the relationship between types and sub-type
polymorphism in this April 2001 JavaWorld article, "Reveal
the magic behind subtype polymorphism."
- David F. Bacon's talk, "Fast
and Effective Optimization of Statically Typed Object-Oriented
Programs," in which he offers an optimization algorithm for
statically typed object-oriented languages, is available from IBM
Research (in PDF format).
- Eric Raymond's "The
Cathedral and the Bazaar" carries a discussion on the increased
robustness inherent in open source software.
- Are you looking for thorough, conceptual resources on such issues in
programming as:
- Synthesizing object-oriented and functional design to promote
reuse
- Reduction semantics for classes
- Modular and polymorphic set-based analysis
- Static debugging via componential set-based analysis
- Direct- and continuation-passing-style optimizing compilers
These are just a few of the topics covered in available
papers, theses, dissertations, and technical reports at the Rice University
Programming Languages Team site.
- You can never say it enough! Two excellent resources for designing,
testing, and implementing killer Java code are the Extreme Programming site
and the JUnit site
(which provides links to a multitude of sources on program testing
methods).
- Read all of Eric Allen's Diagnosing
Java code articles.
- Find other Java technology resources on the developerWorks Java
technology zone.
About the author
Eric Allen has a bachelor's degree in computer
science and mathematics from Cornell University and is a Ph.D.
candidate in the Java programming languages team at Rice University.
Before returning to Rice to finish his degree, Eric was the lead
Java software developer at Cycorp, Inc. He has also moderated the
Java Beginner discussion forum at JavaWorld. His research
concerns the development of semantic models and static analysis
tools for the Java language, both at the source and bytecode levels.
Eric is the lead developer of Rice's experimental compiler for the
NextGen programming language, an extension of the Java language with
added language features, and is a project manager of DrJava, an
open-source Java IDE designed for beginners. Contact Eric at eallen@cs.rice.edu.