Avoid unnecessary mutation and access to make code robust
and easier to maintain
Eric E. Allen (eallen@cs.rice.edu), Ph.D. candidate, Java programming
languages team, Rice University
1 January 2003
This month, Eric Allen explains how avoiding
and controlling gratuitous mutation is key to retaining code robustness
while making the code easier to maintain. He focuses on such concepts as
functional style code crafting and ways of marking fields, methods, and
classes to handle and prevent mutability. Also, Eric explains the role
of unit testing and refactoring in this task, and offers two tools to
aid in refactoring efforts.
Effective debugging begins with good programming. Designing a program
to be easy to maintain is one of the most difficult challenges a
programmer faces, in part because programs are often maintained by
programmers other than those who originated the code. To maintain such
programs effectively, new programmers have to be able to quickly learn how
the program works, a task that's done most easily if small parts of the
program can be understood in isolation from the whole.
We'll outline some of the ways that programs can be written to help
make them more easily understood and maintained, looking at the issues of
mutability, decipherability, private methods, final methods, final
classes, local code, unit tests, and refactoring.
Mutability and decipherability
First up is the issue of mutability. Parts
of a program are most easily understood in isolation if the data that each
part works on is not altered by other, remote parts of a program during a
computation.
Too much information
For example, consider a program using an instance of a container class where
the constituent links can be modified. Every time the container is passed
from one part of the program to another in a method call, and every time a
new expression is evaluated with the container passed as a constructor
argument, an opportunity exists for the container to be altered outside
the control of the calling method.
We can't really be sure of our understanding of the calling method,
much less our ability to diagnose a bug, until we first understand how
each of the methods it calls modifies the container. If each of these
called methods calls other modifying methods in turn, the amount of code
the maintenance programmer must read to understand a single method can
quickly balloon out of control.
For this reason, it can be highly advantageous to use different classes
for mutable and immutable containers. In the immutable versions, the
fields of the container can be marked as final.
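To make the contrast concrete, here is a minimal sketch in present-day Java. The class names MutableBag, ImmutableBag, and BagDemo, and the audit method, are hypothetical and do not come from any library; they only illustrate how a callee can alter a mutable container behind the caller's back, while the immutable version cannot be altered at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mutable container: any method that receives it can change it.
class MutableBag {
    private final List<String> items = new ArrayList<>();

    void add(String item) { items.add(item); }
    int size() { return items.size(); }
}

// Hypothetical immutable counterpart: its contents are fixed at construction.
final class ImmutableBag {
    private final List<String> items;

    ImmutableBag(List<String> source) {
        // Copy defensively so later changes to 'source' cannot leak in.
        this.items = Collections.unmodifiableList(new ArrayList<>(source));
    }

    int size() { return items.size(); }

    // "Adding" builds a new bag and leaves this one untouched.
    ImmutableBag with(String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        return new ImmutableBag(copy);
    }
}

class BagDemo {
    // A callee that quietly alters its argument.
    static void audit(MutableBag bag) { bag.add("audit-marker"); }

    public static void main(String[] args) {
        MutableBag mutable = new MutableBag();
        mutable.add("a");
        audit(mutable);                        // the caller's bag has changed
        System.out.println(mutable.size());    // prints 2

        ImmutableBag fixed = new ImmutableBag(Collections.singletonList("a"));
        ImmutableBag bigger = fixed.with("b"); // 'fixed' is unaffected
        System.out.println(fixed.size() + " " + bigger.size()); // prints 1 2
    }
}
```

To understand a caller of audit, we have to read audit itself. A method that takes an ImmutableBag needs no such inspection.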
Functional style to the rescue
Writing code so that it constructs new data as
opposed to modifying old data is known as functional style because
the methods of the program act like mathematical functions whose behavior
is described solely in terms of the output returned for each input.
The often overlooked advantage of functional style is that the
individual components of the program are far more easily understood in
isolation. If the data manipulated by a method is never altered by any of
the operations performed in its body, then all a programmer has to do to
understand that method is understand the results returned by those
operations. Compare this to the scenario above in which a method calls
several other methods, each of which modifies the very data structures the
method operates on.
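The difference between the two styles can be sketched with a small value class. MutablePoint and ImmutablePoint are hypothetical names chosen for illustration; the only point being made is that the functional version returns new data instead of changing the receiver.

```java
// Mutating style: translate() changes the receiver, so every holder of the
// reference observes the change.
class MutablePoint {
    int x;
    int y;

    MutablePoint(int x, int y) { this.x = x; this.y = y; }

    void translate(int dx, int dy) { x += dx; y += dy; }
}

// Functional style: translate() behaves like a mathematical function.
// The result depends only on the inputs, and the receiver never changes.
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }

    ImmutablePoint translate(int dx, int dy) {
        return new ImmutablePoint(x + dx, y + dy);
    }
}
```

With ImmutablePoint, a reader can understand any method that uses translate just by knowing what it returns, without asking who else holds a reference to the same point.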
One nice feature of the Java language is that it allows us to use the
final keyword, as a directive to the type checker, to declare
when we want certain data to be immutable.
Avoiding mutation with the final keyword is a good way to
nail down the behavior of a class's methods. Every time a field is
modified, it has the potential to alter the behavior of the methods that
refer to it. Additionally, marking a field as final lets
other programmers who read the program know instantly that the field is
never reassigned, no matter how large the whole program is. For
example, consider the following class hierarchy for representing
immutable lists.
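abstract class List {...}
class Empty extends List {...}
class Cons extends List {
    private final Object first;  // first element of the list
    private final List rest;     // all remaining elements

    // Final fields must be assigned exactly once, here in the constructor.
    Cons(Object first, List rest) {
        this.first = first;
        this.rest = rest;
    }
}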
All fields in these classes are marked as final. Is that
enough to ensure that instances of these classes are immutable? Not quite.
Even when a field is marked as final, it's
important to remember that the object the field refers to may itself have
non-final components. Any part of the program that refers to those
components may be affected when they are altered, regardless of whether
the field itself is altered. In the example above, although the
constituent elements of the list can't be replaced, we have to check that
those elements themselves don't contain non-final fields that may be
modified.
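The limits of final can be seen in a few lines. In this sketch, LabelHolder is a hypothetical class whose only field is final but refers to a mutable java.lang.StringBuilder. The field can never point to a different object, yet the object it points to can still change.

```java
// Hypothetical class: the field is final, but the object it refers to is not
// immutable.
final class LabelHolder {
    private final StringBuilder label;

    LabelHolder(StringBuilder label) {
        this.label = label; // no defensive copy, so the caller keeps an alias
    }

    String label() { return label.toString(); }
}

class LabelHolderDemo {
    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("draft");
        LabelHolder holder = new LabelHolder(text);

        text.append("-v2"); // mutates the object behind the final field

        System.out.println(holder.label()); // prints draft-v2
    }
}
```

Storing an immutable String instead, or copying the builder in the constructor, would close this gap.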
In the list example, although a list may contain mutable elements, we can see
that the sequence of elements stored in a given list is immutable by
reasoning as follows: instances of Empty (that is, lists of length
zero) contain no elements at all; therefore, they can't be modified.
Instances of Cons (non-empty lists) contain two fields, both
final. The first field contains the first element of the list
and can't be reassigned; the second contains a list containing all remaining
elements. If the contents of this list are immutable, then so is the
containing list.
But the list contained in this second field has a length one less than
the length of the containing list, so if we knew that all lists of length
n were immutable, we'd know that lists of length n + 1 were
also immutable. Since we already know that zero-length lists are
immutable, we also know that lists of length 1, 2, 3, and so on are also
immutable.
Tracing through the connections of a data structure like this can be
tedious, but it pays off when you can determine global properties of
such a structure, such as immutability.
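The inductive argument can be mirrored in code. The following is a sketch only: ImmutableList, EmptyList, and ConsList are hypothetical stand-ins for the List, Empty, and Cons classes shown earlier (renamed to avoid confusion with java.util.List), and the length method and accessors are assumptions added for illustration, since the bodies of the original classes are elided.

```java
// Hypothetical completion of the immutable list hierarchy.
abstract class ImmutableList {
    abstract int length();
}

// Base case: a list of length zero has no fields and so nothing to modify.
final class EmptyList extends ImmutableList {
    @Override
    int length() { return 0; }
}

// Inductive case: a list of length n + 1 is a final element plus a final
// reference to a list of length n.
final class ConsList extends ImmutableList {
    private final Object first;
    private final ImmutableList rest;

    ConsList(Object first, ImmutableList rest) {
        this.first = first;
        this.rest = rest;
    }

    Object first() { return first; }
    ImmutableList rest() { return rest; }

    @Override
    int length() { return 1 + rest.length(); }
}
```

Because nothing can be changed in place, "adding" an element means building a new ConsList in front of an existing list. The existing list is shared by both and is observably the same before and after.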
Controlling mutation
The best strategy to defend against unexpected mutation is to simply avoid all
mutation whenever possible. Only when there is a compelling reason to
mutate (such as when it vastly simplifies the structure of the code)
should we make use of it. When mutation can be avoided, the payoff can be
enormous (in terms of lower maintenance costs and increased
robustness).
Even when there is a compelling reason to mutate data, it's best to try
to control that mutation, to limit the potential damage as much as
possible. Iterators and Streams are great examples of data structures
explicitly designed to control mutation by allowing us to walk over a
series of elements in a regular and well-defined fashion, rather than
explicitly modifying some handle on the elements.
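As one illustration of mutation kept under control, the sketch below confines the only changing state to an iterator object. Countdown is a hypothetical class; java.util.Iterator and java.lang.Iterable are the standard interfaces. Clients walk the elements without ever touching a shared index or cursor.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical immutable sequence: start, start - 1, ..., 1.
final class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) { this.start = start; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            // The only mutable state lives here, private to one traversal.
            private int remaining = start;

            @Override
            public boolean hasNext() { return remaining > 0; }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return remaining--;
            }
        };
    }
}
```

Each call to iterator() starts a fresh traversal, so one loop over a Countdown cannot disturb another.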
Private methods
Just as setting fields to final helps limit outside influences on the
value of a field, setting them to private helps to limit the
influence they have on other parts of the program. If a field is private,
we can be certain that no other parts of the program depend on it
directly. If we eliminate the field and replace the internal
representation of the class data, we need only worry about fixing the
methods inside of the class to access the new data properly.
In the earlier example, notice that the fields of class
Cons are private. That way, we can control how those elements
are accessed through getters and the like. If a future maintainer of our
lists ever wanted to change their internal representation (for
example, perhaps it turns out that on certain platforms, array-based lists
are more efficient), that programmer could do so without modifying or even
looking at any of the clients of those lists. The only work required is to
rewrite the getters to take the appropriate action with the new data.
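A small sketch shows why private fields make this kind of change safe. SpanByLength and SpanByEnd are hypothetical classes standing for two successive versions of the same class. They store different private data but answer the same getters, so client code written against one would not notice a switch to the other.

```java
// Version 1 (hypothetical): stores a start offset and a length.
final class SpanByLength {
    private final int start;
    private final int length;

    SpanByLength(int start, int end) {
        this.start = start;
        this.length = end - start;
    }

    int start()  { return start; }
    int end()    { return start + length; }
    int length() { return length; }
}

// Version 2 (hypothetical): stores both endpoints instead. Only the getters
// had to be rewritten; their signatures and results are unchanged.
final class SpanByEnd {
    private final int start;
    private final int end;

    SpanByEnd(int start, int end) {
        this.start = start;
        this.end = end;
    }

    int start()  { return start; }
    int end()    { return end; }
    int length() { return end - start; }
}
```

Had the fields been public, every client that read length directly would have had to be found and fixed by hand.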
Final methods, final classes, and understanding code locally
In contrast to marking fields as
final, marking a method as final is often
claimed to be at odds with OO design goals because it inhibits inheritance
polymorphism. But when trying to understand the behavior of a large
program, it helps to know what methods are not overridden.
Now it's absolutely true that good OO design involves using a great
deal of inheritance. In fact, inheritance is central to many OO design
patterns. But that doesn't mean that we should allow every method we write
to be overridden. Often a program will implicitly rely on certain key
methods not being overridden. By marking such methods as
final, we allow other programmers to better understand
the behavior of expressions that call the method.
Additionally, marking classes as final can be a great
boost to decipherability. It can really help to know at a glance which
classes are never subclassed in a program. In fact, I would argue that the
only classes that shouldn't be marked as final are classes
that are actually subclassed in a program and classes that, as an inherent
part of the program design, are intended to be subclassed from outside
components.
Some may say that this concept will straitjacket future maintainers
of the code, keeping them from being able to extend the code. I say it
will certainly not restrict them. If future maintainers of a program need
to extend it to include a subclass where none existed before, it's not
hard to delete the final keyword on the corresponding class
and recompile it, provided they have access to the source code (and if
they don't have access to it, in what sense are they "maintainers" of that
code?).
Meanwhile, that added keyword serves as a form of automatically
verified documentation of an important invariant about the program
("automatically verified" because the program won't even compile if the
documentation is violated). By forcing developers to consciously choose
when they want to eliminate such an invariant, we can help to reduce the
introduction of errors.
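Here is a minimal sketch of both uses of final, with hypothetical class names Report and SalesReport. The abstract class is designed for subclassing, but the method the rest of the program relies on is final. The leaf class is final because nothing is meant to extend it.

```java
// Designed for subclassing, so the class itself is not final.
abstract class Report {
    // final: callers can rely on this exact sequence of steps, because no
    // subclass is able to override render().
    public final String render() {
        return header() + body() + "\n-- end --";
    }

    // Intended extension points.
    protected String header() { return "REPORT\n"; }
    protected abstract String body();
}

// final: a reader knows at a glance that no subclass of SalesReport exists.
final class SalesReport extends Report {
    @Override
    protected String body() { return "sales: 42"; }
}
```

If a maintainer later tries to override render() or to extend SalesReport, the compiler rejects the change until the final keyword is deliberately removed.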
Unit tests and mutation
As always, unit tests can help in understanding side-effecting code. If a
suite of unit tests adequately documents the effects of the methods in a
program, then a programmer can understand each method more quickly just by
reading its unit tests. Of course, the big question is whether the unit
tests really do cover the effects adequately. Coverage analysis tools like
Clover can help here to some degree.
Notice, however, that unit tests themselves are much easier to write
for strictly functional methods. To test strictly functional methods, all
that's involved is to call these methods with various representative
inputs and check their outputs (and make sure they throw exceptions when
they should).
When testing methods that modify the state of data structures, one must
first perform the operations necessary to put the input data into the
state expected by the method and then, after calling the method, check
that every modification of the data expected by clients was performed
correctly.
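The difference in testing effort shows up even in a tiny case. The sketch below uses plain Java checks rather than a test framework so that it stands alone. SortingExamples, SortingChecks, and their methods are hypothetical names, while java.util.Arrays is the standard utility class.

```java
import java.util.Arrays;

class SortingExamples {
    // Strictly functional: returns a new sorted array, leaves the input alone.
    static int[] sorted(int[] input) {
        int[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy);
        return copy;
    }

    // Mutating: rearranges the caller's array in place.
    static void sortInPlace(int[] data) {
        Arrays.sort(data);
    }
}

class SortingChecks {
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    // Functional method: supply an input, compare the output.
    static void testSorted() {
        int[] input = {3, 1, 2};
        check(Arrays.equals(SortingExamples.sorted(input), new int[] {1, 2, 3}),
              "sorted() should return the elements in ascending order");
        check(Arrays.equals(input, new int[] {3, 1, 2}),
              "sorted() should not touch its input");
    }

    // Mutating method: build the starting state, call the method, then
    // inspect the state that clients will observe afterwards.
    static void testSortInPlace() {
        int[] data = {3, 1, 2};
        SortingExamples.sortInPlace(data);
        check(Arrays.equals(data, new int[] {1, 2, 3}),
              "sortInPlace() should leave the array sorted");
    }
}
```

For a method this small the two tests look alike. The gap widens as the mutated state grows, because every affected field of every affected object has to be set up beforehand and checked afterwards.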
Wrapping up with refactoring tools
These tips can be great when writing new code, but
what about when you have to maintain old code that is barely decipherable?
Refactor, refactor, refactor.
Although refactoring old code takes time, that time is well spent,
especially with all the tool support for refactoring now available for
Java code. There are now many powerful tools for automatically refactoring
Java code, tools that preserve key invariants automatically.
One of the most full-featured tools for refactoring Java code is the
IDEA development environment. This environment provides automatic support
for a significant chunk of Martin Fowler's refactoring patterns. Another
tool I have found to be very useful is CodeGuide, a German IDE. Although
its list of automatic refactorings is small compared to IDEA, it showcases
an extraordinarily powerful feature -- continuous compilation.
While you're typing new code, CodeGuide analyzes it and tells you if
anything in the project has broken (of course, there is a short delay to
prevent it from signaling errors on every keystroke).
Although continuous compilation negatively affects responsiveness, it
can be well worth the wait in certain contexts. For example, you can type
final in front of a field and instantly see if anything in
the project breaks. If not, you know that the field isn't modified
anywhere in the program. Similarly, you can type private in
front of a field and instantly get a list of all outside accesses to the
field (in the form of errors).
Another great feature of CodeGuide is that it provides seamless support
for the JSR-14 experimental extension with generic types (scheduled for
official addition in Java 1.5).
Although writing code for decipherability can take a lot more time and
effort, it can help to increase the lifetime and the robustness of your
code, and it can significantly enhance the quality of life for those who
face the task of maintaining it. Finally, refactoring old code to be more
maintainable takes time but pays for itself the next time you have to fix
a bug.
Resources
- Eric Allen has a new book on the subject of bug patterns, a notion
first introduced in this column: Bug Patterns
in Java (Apress, 2002).
- Martin Fowler's Web site
contains much useful information about effective refactoring.
- For more guidelines on using the
final keyword, see "Java
theory and practice: Is that your final answer?"
(developerWorks, October 2002) by Brian Goetz.
- Examine seven principles to build a base for code design with
testing in mind in "Diagnosing
Java code: Designing 'testable' applications"
(developerWorks, September 2001).
- Explore the complete Diagnosing
Java code series.
- Follow the discussion of adding generic types to the Java language
by reading the Java Community Process proposal, JSR-14.
- "Catching
more errors at compile time with Generic Java," by Keith Turner
(developerWorks, March 2001), offers a look at how Generic Java
provides an elegant way to implement generic utility classes,
alleviating the need to cast and allowing more errors to be caught at
compile time.
- "Automatic
Code Generation from Design Patterns," from IBM Research, describes
the architecture and implementation of a tool that automates the
implementation of design patterns.
- Find hundreds more Java technology articles and tutorials on the
developerWorks Java technology
zone.
About the author
Eric Allen possesses a broad range of hands-on
knowledge of technology and the computer industry. With a B.S. in
computer science and mathematics from Cornell University and an M.S.
in computer science from Rice University, Eric is currently a Ph.D.
candidate in the Java programming languages team at Rice. Eric's
research concerns the development of semantic models and static
analysis tools for the Java language at the source and bytecode
levels. He is also concerned with the verification of security
protocols through semantic formalisms and type checking. Eric is
a project manager for and a founding member of the DrJava project,
an open-source Java IDE designed for beginners; he is also the lead
developer of the university's experimental compiler for the NextGen
programming language, an extension of the Java language with added
experimental features. Eric has moderated several Java forums for
the online magazine JavaWorld. In addition to these
activities, Eric teaches software engineering to Rice University's
computer science undergraduates. You can contact Eric at eallen@cs.rice.edu.