Diagnosing Java code: The case for static types




Love or hate it, static type checking can make code more robust

Level: Introductory

Eric E. Allen (eallen@cs.rice.edu)
Ph.D. candidate, Java programming languages team, Rice University
1 June 2002

Many popular languages -- such as Ruby, Python, and other so-called "scripting" languages -- have moved away from static type checking as a means to help improve the reliability of code. Still, static type checking can be one of the key weapons in a powerful arsenal for preventing and detecting bugs. In this article, Eric Allen makes a case for static type checking, explains why we should be glad that the Java language supports it, and discusses how it can be made even better.

Static types -- most programmers love them or hate them. Advocates boast that static types allow them to produce cleaner and more reliable code than they could without them. Detractors moan about the added complexity that static types require of a program.

Granted, static types are not a free lunch; they can be tedious to work with at times. However, if our main concern is to keep bugs out of our code, then, taken as a whole, Java programming is better off for having and using static types. Why? Static type checking:

Improves robustness through early error detection
Increases performance by making required checks at the best times
Supplements the weaknesses of unit testing

Let's examine these reasons in more detail and take a look at static type checking in combination with pair programming.

Improve robustness via early detection
Static type checking improves program robustness. Why? Because it helps locate errors as early as they can possibly be located -- before the program is run. The logic here is inescapable: the sooner a bug is identified, the easier the problem is to diagnose, and the less data the errant computation will have corrupted.

Locating and diagnosing a bug before the program is run is the ideal state. This advantage makes static type checking a great success story in programming language design because it is one of the few ways to automatically detect bugs in a program before it is run, and because it can perform this task in an acceptable amount of time. By "acceptable amount of time," I mean time linear in the length of the program (with a small constant factor) as opposed to the cubic or even exponential time required by other forms of automated checking (many aren't guaranteed to ever finish at all).

Granted, the more powerful a type system is, the easier it is to program in (and the more errors it will detect). I don't deny that the simplicity of the Java type system leaves much to be desired; this simplicity often gets in the way, forcing us to circumvent it with casts. But this situation is slowly improving.

The Sun JSR14 compiler adds a limited form of generic (or parameterized) types to the language; we can be confident that its adoption into the language is just a matter of time because it is now firmly in the community process. More advanced language extensions such as NextGen promise to further build on the increased expressiveness provided by JSR14. That's a good thing, because, in many contexts, NextGen helps to alleviate some of the added complexity necessary even in JSR14. See the Resources section for more information on this issue. (Besides the JSR14 link, there are some articles on parametric polymorphism.)
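To make the trade between casts and generic types concrete, here is a minimal sketch written with the generics syntax that JSR14 proposes. The class and method names are hypothetical and chosen only for illustration. The first method circumvents the type system with a cast and fails at run time; the second states the element type and leaves nothing to cast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative, not from the article.
class CastVersusGeneric {

    // Without generics the element type is invisible to the checker,
    // so a wrong element is discovered only by the cast at run time.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsAtRunTime() {
        List names = new ArrayList();
        names.add("Ada");
        names.add(Integer.valueOf(42));      // compiles, but is a bug
        try {
            for (Object o : names) {
                String s = (String) o;       // ClassCastException on the Integer
                s.length();
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // With generics the same mistake is rejected before the program runs.
    static int totalLength() {
        List<String> names = new ArrayList<String>();
        names.add("Ada");
        // names.add(Integer.valueOf(42));   // compile-time error if uncommented
        int total = 0;
        for (String s : names) {             // no cast needed
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("raw list failed at run time: " + rawListFailsAtRunTime());
        System.out.println("total length: " + totalLength());
    }
}
```

The cast in the first method is exactly the kind of escape hatch a simple type system forces on us; the parameterized list removes both the cast and the run-time failure it can cause.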

The benefits of static type checking extend beyond robustness, though. It can also safeguard your program's performance.

The path to NextGen
GJ is a compiler written by Martin Odersky that allows users to add generic, or parameterized, types to the Java language without sacrificing compatibility with legacy code. Its forerunner was known as Pizza; it was an experimental language with several new features that were adopted into Java 1.2. GJ refined Pizza by adding only generic types.
At Rice University, one of the programming languages technology groups has implemented a compiler for an upward-compatible version of GJ, called NextGen.
GJ compiles source to bytecode by means of type erasure, a method that replaces every instance of each type variable with that variable's upper bound. It also allows type variables to be declared for specific methods, rather than for whole classes.

The NextGen language was jointly developed by Robert Cartwright of Rice University and Guy Steele of Sun Microsystems. This language adds the ability to perform run-time checks of type variables to GJ. It supports type parameterization of individual methods, and supports inner classes and run-time operations on generic types. It doesn't allow type variables to be instantiated with primitive types.
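The consequences of type erasure can be observed directly in any erasure-based compiler. The sketch below assumes standard Java generics, which follow the erasure approach described for GJ; the class and method names are hypothetical. It shows a type variable declared for a single method, and it shows that two differently parameterized lists share one run-time class, which is why run-time operations on type variables need something more, such as what NextGen adds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative, not from the article.
class ErasureDemo {

    // A type variable declared for a single method rather than a whole class.
    static <T> T firstOf(List<T> items) {
        return items.get(0);
    }

    // After erasure both lists share one run-time class, so the element
    // type cannot be recovered or tested at run time.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("erasure");
        String first = firstOf(words);   // T is inferred as String; no cast
        System.out.println(first);
        System.out.println(sameRuntimeClass());
        // Not expressible under erasure: "x instanceof T" or "new T()".
    }
}
```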

Increase performance by reducing needed checks
In a safe language (and by "safe" I mean a language that does not allow us to break its own abstractions), various checks on the types of arguments passed to methods, as well as checks on the types of accessed fields, are required. If these checks are not done statically, then they must be done at run time.

Performing these required checks takes time, and languages that perform them at run time incur a corresponding performance hit. When an invariant is checked statically, we can eliminate the need to check it at run time, speeding up the program. So, not only does static type checking allow us to produce more robust code; it lets us produce more efficient code as well.
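As a rough illustration of where the cost goes, the following sketch contrasts a method that must check its argument types on every call with one whose argument types were already verified by the compiler. The class and method names are hypothetical, and the example makes no claim about how large the difference is in practice.

```java
// Hypothetical demo class; the names are illustrative, not from the article.
class CheckedArea {

    // Dynamic style: the parameter types say nothing, so every call
    // must pay for explicit run-time type checks.
    static int areaDynamic(Object width, Object height) {
        if (!(width instanceof Integer) || !(height instanceof Integer)) {
            throw new IllegalArgumentException("expected two Integers");
        }
        return ((Integer) width).intValue() * ((Integer) height).intValue();
    }

    // Static style: the compiler has already verified that both arguments
    // are ints, so no type check is left to perform at run time.
    static int areaStatic(int width, int height) {
        return width * height;
    }

    public static void main(String[] args) {
        System.out.println(areaDynamic(Integer.valueOf(3), Integer.valueOf(4)));
        System.out.println(areaStatic(3, 4));
    }
}
```

The invariant "both arguments are integers" is the same in both methods; only the time at which it is established differs.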

Traditionally, static type checking during compilation is considered to be inefficient. Linking together various type references across files can take a long time for large programs in languages such as C/C++ because the various files must be combined into one large executable with each compilation. But the Java language avoids this issue entirely because classes are compiled separately and loaded on demand into the JVM. There is no need to link all referenced files into a single executable -- so there is no corresponding slow-down during compilation.

So now, what do we say to those who claim that static typing is unnecessary in the context of unit tests?

Exceeding the limits of unit tests
As regular readers of this column know, I am a strong advocate of unit testing. Covering your programs with unit tests is a best practice. However, I will be the first to acknowledge the limitations of unit testing.

A unit test can only test the behavior of a program during a specific run, with specific inputs. Granted, in the narrow context of that run, we can test the deep properties of the program.

In contrast, type checking checks shallow properties, but it does so for all potential runs of the program, with any possible inputs.
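The difference between "one run" and "all runs" can be seen in a small sketch. Assume a loosely typed method with a rarely used branch (all names here are hypothetical). A unit test that exercises only the common branch passes, while the shallow error in the other branch waits for some later run. Giving the parameter a precise type lets the checker rule out that error for every run.

```java
// Hypothetical demo class; the names are illustrative, not from the article.
class ShallowBugDemo {

    // Loosely typed: the rarely used branch hides a bad cast.
    static String labelLoose(Object count, boolean verbose) {
        if (verbose) {
            return "count=" + ((String) count).trim();   // fails when count is an Integer
        }
        return "count=" + count;
    }

    // Precisely typed: the same cast would be rejected by the compiler,
    // because an Integer can never be a String.
    static String labelTyped(Integer count, boolean verbose) {
        if (verbose) {
            // return "count=" + ((String) count).trim();   // does not compile
            return "count=" + count.intValue();
        }
        return "count=" + count;
    }

    public static void main(String[] args) {
        // A test of the common path alone would report success.
        System.out.println(labelLoose(Integer.valueOf(5), false));
        System.out.println(labelTyped(Integer.valueOf(5), true));
    }
}
```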

Combining type checking and unit tests
Just as stories and unit tests complement one another when specifying the behavior of a program, unit tests and static type checking complement one another when identifying and stamping out bugs. The combined effect of these two means of bug elimination is greater than the sum of their individual effects.

Some designers and programmers will note that the kinds of errors experienced in sophisticated programs are much deeper than those caught by static type checking; therefore, they conclude that the static type system hinders more than it helps. And without a doubt, a statically typed language can make programs more verbose and can even prevent us from writing some programs that would never cause any errors.

There are always tradeoffs. Using static types is not a free lunch. Often, the program we write for the statically typed language will be more complex than what we could write outside the type system. But even "sophisticated" programs sport the kinds of shallow errors that are the special prey of static type checking.

Even if these shallow errors are eliminated in a program, refactoring can easily reproduce them. If we are to adopt the extreme programming philosophy of perpetual refactoring, we will want to catch such shallow errors as soon as they are introduced. (On the flip side, unit tests help to catch deeper errors that occur under refactoring. The two concepts do add up to a sum greater than their parts.) Static type checking works quite well in the context of extreme programming.

A conflict between type checking and unit tests
Nevertheless, there is one conflict between static type checking and unit testing that should be mentioned. Extreme programming instructs us to interweave the writing of unit tests with the writing of the code that those tests exercise.

Each set of unit tests helps to specify a new aspect of functionality and should be written right before the code that allows us to pass those tests. Ideally, we'd like to compile those tests immediately after writing them, so we can be sure they're ready to go.

But there's a problem here: the new tests won't pass static type checking until we have defined the classes and methods to which the tests refer. These classes and methods can be stubs that we fill in later, but unless we have something in place, the references to them in the tests won't make sense to the static checker.

Consider the relatively simple example in Listing 1, which shows a test class for an implementation of multi-sets (an abstract data structure):

Listing 1. A test class for an implementation of multi-sets



import junit.framework.*;

/**
 * A test class for MultiSet.
 *
 */
public class MultiSetTest extends TestCase {
  private static String W = "w";
  private static String X = "x";
  private static String Y = "y";
  private static String Z = "z";

  private static MultiSet<String> EMPTY = new MultiSet<String>();
  private static MultiSet<String> XY = new MultiSet<String>(X, Y);
  private static MultiSet<String> YZ = new MultiSet<String>(Y, Z);
  private static MultiSet<String> XYZ = new MultiSet<String>(X, Y, Z);
  private static MultiSet<String> XYY = new MultiSet<String>(X, Y, Y);
  private static MultiSet<String> WXY = new MultiSet<String>(W, X, Y);

  /**
   * Constructor.
   * @param name the name of this test case
   */
  public MultiSetTest(String name) {
    super(name);
  }

  /**
   * Creates a test suite for JUnit to run.
   * @return a test suite based on the methods in this class
   */
  public static Test suite() {
    return new TestSuite(MultiSetTest.class);
  }

  private void _assertOrder(MultiSet<String> set, String key, int value) {
    assertEquals("order for key " + key, value, set.order(key));
  }

  public void testEmpty() {
    _assertOrder(EMPTY, X, 0);
    _assertOrder(EMPTY, Y, 0);
    _assertOrder(EMPTY, Z, 0);
  }

  public void testOrder() {
    _assertOrder(XY, X, 1);
    _assertOrder(XY, Y, 1);
    _assertOrder(YZ, Y, 1);
    _assertOrder(YZ, Z, 1);
  }

  public void testAdd() {
    MultiSet<String> added = XY.add(YZ);
    _assertOrder(added, X, 1);
    _assertOrder(added, Y, 2);
    _assertOrder(added, Z, 1);
  }

  public void testSubset() {
    assertTrue(XY.subset(XYZ));
    assertTrue(YZ.subset(XYZ));
    assertTrue(! YZ.subset(XY));
    assertTrue(! XY.subset(YZ));
    assertTrue(! XYZ.subset(XY));
    assertTrue(! XYZ.subset(YZ));
    assertTrue(! XYY.subset(XYZ));
    assertTrue(! XYZ.subset(XYY));
  }

  public void testSubtract() {
    MultiSet<String> XYYZ = XY.add(YZ);
    assertEquals(YZ, XYYZ.subtract(WXY));
    assertEquals(YZ, XYYZ.subtract(XY));
    assertEquals(XY, XYYZ.subtract(YZ));
    assertEquals(EMPTY, EMPTY.subtract(YZ));
    assertEquals(EMPTY, YZ.subtract(YZ));
  }

  public void testUnion() {
    assertEquals(XYZ, XY.union(YZ));
  }

  public void testIsEmpty() {
    assertTrue(EMPTY.isEmpty());
    assertTrue(! XY.isEmpty());
  }
}

Where is class MultiSet? What about the methods union(), isEmpty(), and the like?

A type checker wouldn't have any better idea about the location of these classes and methods than you do, so although this code compiles in my environment, it won't compile for you. That is, it won't compile until you write an implementation of class MultiSet with all of the appropriate methods. Remember, in a statically typed language, you can't compile new unit tests until you at least generate stubs for the classes and methods you're trying to test.
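A set of stubs sufficient to get Listing 1 past the type checker might look like the following sketch. The constructor arities and method names are read off the test class; the parameter and return types are assumptions inferred from how the test uses each method, and the class is left package-private only so that the sketch can sit in any file.

```java
// Hypothetical stub for the MultiSet class that Listing 1 refers to.
// Signatures are inferred from the test; none of the behavior is implemented.
class MultiSet<T> {

    // Constructors do nothing, so the static fields in the test class
    // can be initialized without error.
    public MultiSet() { }
    public MultiSet(T first, T second) { }
    public MultiSet(T first, T second, T third) { }

    // Number of occurrences of key in this multi-set.
    public int order(T key) { throw notYet(); }

    public MultiSet<T> add(MultiSet<T> other) { throw notYet(); }

    public boolean subset(MultiSet<T> other) { throw notYet(); }

    public MultiSet<T> subtract(MultiSet<T> other) { throw notYet(); }

    public MultiSet<T> union(MultiSet<T> other) { throw notYet(); }

    public boolean isEmpty() { throw notYet(); }

    // Every stubbed method fails loudly until it is filled in.
    private static UnsupportedOperationException notYet() {
        return new UnsupportedOperationException("stub: not implemented yet");
    }
}
```

With these stubs in place the test class compiles, and every test fails at run time until the corresponding method is written, which is the starting point that test-first development expects.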

This conflict can be easily alleviated through the use of test-oriented development tools. Specifically, you need a development tool that can read over a unit test, accumulate the class and method references (and appropriate signatures) necessary for that test to pass static type checking, and then generate the stub classes.

If you consider what the design of such a development tool would look like, it becomes pretty clear that the plan for a test-oriented development tool would look exactly like a static type checker, except that, instead of generating errors, it would simply accumulate a log of the stubs that it needs to generate. We are currently implementing a static checker for NextGen with a "stub-generating" mode that does exactly this.
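The "accumulate a log instead of reporting errors" idea can be sketched in a few lines. This is only an illustration under heavy assumptions: the class StubLog and its methods are hypothetical, it is not the design of the NextGen checker, and it skips the hard part (inferring each signature from the test source) by taking the signature as plain strings.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the NextGen checker's actual design.
class StubLog {

    // Class name -> missing member declarations, in first-seen order.
    private final Map<String, Set<String>> missing =
            new LinkedHashMap<String, Set<String>>();

    // Called at the point where a checker would otherwise report
    // an unresolved method reference.
    void recordMissingMethod(String className, String returnType,
                             String name, String params) {
        Set<String> members = missing.get(className);
        if (members == null) {
            members = new LinkedHashSet<String>();
            missing.put(className, members);
        }
        // The set keeps one entry per distinct declaration.
        members.add("  public " + returnType + " " + name + "(" + params + ") {\n"
                + "    throw new UnsupportedOperationException(\"stub\");\n"
                + "  }\n");
    }

    // Emits the source text of one stub class per recorded class name.
    String generate() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Set<String>> entry : missing.entrySet()) {
            out.append("public class ").append(entry.getKey()).append(" {\n");
            for (String member : entry.getValue()) {
                out.append(member);
            }
            out.append("}\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        StubLog log = new StubLog();
        log.recordMissingMethod("MultiSet", "boolean", "isEmpty", "");
        log.recordMissingMethod("MultiSet", "boolean", "isEmpty", "");  // repeated reference
        log.recordMissingMethod("MultiSet", "int", "order", "Object key");
        System.out.print(log.generate());
    }
}
```

A repeated reference to the same method is logged once, so the generated class contains a single stub per distinct signature.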

Pair programming: Another shallow error check
Another complement to the shallow but general error detection capabilities of static type checking is pair programming, one of the tenets of extreme programming. Multiple intelligent agents checking each other's work is a great way to eliminate many of those shallow errors.

Another effective means to achieve this effect is through open source coding. When code can be made open source, robustness tends to improve -- after all, you then have more than two pairs of programmers' eyes going over the code, looking for the smallest of "gotchas." As Eric Raymond of "The Cathedral and the Bazaar" fame puts it (in what he dubs Linus's Law), "Given enough eyeballs, all bugs are shallow."

Going beyond simple type checking
Of course, for the same reasons that static type checking can be beneficial, more advanced forms of static checking can be too. The terms "static checking" and "static analysis" are more general notions than just checking types -- they refer to any mechanism for analyzing the text of a program to determine how it will behave at run time.

As other teams have shown, the Java language can be extended to include other forms of static checking, such as limited static verification of assertions. Another direction for future work is to add various "soft typing systems" on top of the Java language, in which operations such as casts can be verified to succeed in certain contexts, but unverified casts are not prohibited.

When eliminating bugs, we should attempt to bring all arms to bear on the problem, developing new and effective static checking systems to check as many invariants as possible. In the next few articles, we'll explore some of the static analysis tools available for Java programming, both as prototypes and as production tools.

Resources

Here are some details on the Java Specification Request "JSR 14: Add Generic Types To The Java Programming Language." The idea is to extend the language with generic (also known as parameterized) types.
Eric Allen covered various proposals for adding generics to Java programming in the February 2000 JavaWorld article, "Behold the power of parametric polymorphism."
Paul Rogers addressed the relationship between types and sub-type polymorphism in this April 2001 JavaWorld article, "Reveal the magic behind subtype polymorphism."
David F. Bacon's talk, "Fast and Effective Optimization of Statically Typed Object-Oriented Programs," in which he offers an optimization algorithm for statically typed object-oriented languages, is available from IBM Research (in PDF format).
Eric Raymond's "The Cathedral and the Bazaar" carries a discussion on the increased robustness inherent in open source software.
Are you looking for thorough, conceptual resources on such issues in programming as:
- Synthesizing object-oriented and functional design to promote reuse
- Reduction semantics for classes
- Modular and polymorphic set-based analysis
- Static debugging via componential set-based analysis
- Direct- and continuation-passing-style optimizing compilers
These are just a few of the topics covered in available papers, theses, dissertations, and technical reports at the Rice University Programming Languages Team site.
You can never say it enough! Two excellent resources for designing, testing, and implementing killer Java code are the Extreme Programming site and the JUnit site (which provides links to a multitude of sources on program testing methods).
Read all of Eric Allen's Diagnosing Java code articles.
Find other Java technology resources on the developerWorks Java technology zone.

About the author
Eric Allen has a bachelor's degree in computer science and mathematics from Cornell University and is a Ph.D. candidate on the Java programming languages team at Rice University. Before returning to Rice to finish his degree, Eric was the lead Java software developer at Cycorp, Inc. He has also moderated the Java Beginner discussion forum at JavaWorld. His research concerns the development of semantic models and static analysis tools for the Java language, both at the source and bytecode levels. Eric is the lead developer of Rice's experimental compiler for the NextGen programming language, an extension of the Java language with added language features, and is a project manager of DrJava, an open-source Java IDE designed for beginners. Contact Eric at eallen@cs.rice.edu.
