Don't let the "this" reference escape during construction
Brian Goetz (brian@quiotix.com), Principal Consultant, Quiotix Corp
1 June 2002
The Java language offers a flexible and
seemingly simple threading facility that makes it easy to incorporate
multithreading into your applications. However, concurrent programming
in Java applications is more complicated than it looks: there are
several subtle (and not so subtle) ways to create data races and other
concurrency hazards in Java programs. In this installment of Java
theory and practice, Brian looks at a common threading hazard:
allowing the this reference to escape during construction.
This harmless-looking practice can cause unpredictable and undesirable
results in your Java programs.
Testing and debugging multithreaded programs is extremely difficult,
because concurrency hazards often do not manifest themselves uniformly or
reliably. Most threading problems are unpredictable by their nature, and
may not occur at all on certain platforms (like uniprocessor systems) or
below a certain level of load. Because testing multithreaded programs for
correctness is so difficult and bugs can take so long to appear, it
becomes even more important to develop applications with thread safety in
mind from the beginning. In this article, we're going to explore how a
particular thread-safety problem -- allowing the this
reference to escape during construction (which we'll call the escaped
reference problem) -- can create some very undesirable results. We'll
then establish some guidelines for writing thread-safe constructors.
Following "safe construction" techniques
Analyzing programs for thread-safety violations
can be very difficult and requires specialized experience. Fortunately,
and perhaps surprisingly, creating thread-safe classes from the outset is
not as difficult, although it requires a different specialized skill:
discipline. Most concurrency errors stem from programmers attempting to
break the rules in the name of convenience, perceived performance
benefits, or just plain laziness. Like many other concurrency problems,
you can avoid the escaped reference problem by following a few simple
rules when you write constructors.
Hazardous race conditions
Most concurrency hazards boil down to some sort
of data race. A data race, or race condition, occurs when
multiple threads or processes are reading and writing a shared data item,
and the final result depends on the order in which the threads are
scheduled. Listing 1 gives an example of a simple data race in which a
program may print either 0 or 1, depending on the scheduling of the
threads.
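public class DataRace {
    static int a = 0;

    public static void main(String[] args) {
        // Start the second thread, then race with it to update a
        new MyThread().start();
        a = 1;
    }

    public static class MyThread extends Thread {
        public void run() {
            // May print 0 or 1, depending on scheduling and visibility
            System.out.println(a);
        }
    }
}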
The second thread could be scheduled immediately, printing the initial
value of 0 for a. Alternatively, the second thread might
not run immediately, resulting in the value 1 being printed
instead. The output of this program may depend on the JDK you are using,
the scheduler of the underlying operating system, or random timing
artifacts. Running it multiple times could produce different results.
Visibility hazards
There is actually another data race in Listing 1, besides the obvious race of
whether the second thread starts executing before or after the first
thread sets a to 1. The second race is a visibility race: the
two threads are not using synchronization, which would ensure visibility
of data changes across threads. Because there's no synchronization, if the
second thread runs after the assignment to a is completed by
the first thread, changes made by the first thread may or may not
be immediately visible to the second thread. It is possible that the
second thread might still see a as having a value of 0 even
though the first thread already assigned it a value of 1. This second
class of data race, where two threads are accessing the same variable in
the absence of proper synchronization, is a complicated subject, but
fortunately you can avoid this class of data race by using synchronization
whenever you are reading a variable that might have been last written by
another thread, or writing a variable that might next be read by another
thread. We won't be exploring this type of data race further here, but see
the "Synching
up with the Java Memory Model" sidebar and the Resources
section for more information on this complicated issue.
Synching up with the Java Memory Model
The synchronized keyword in Java programming enforces
mutual exclusion: it ensures that only one thread is
executing a given block of code at a given time. But synchronization
-- or the lack thereof -- also has other more subtle consequences on
multiprocessor systems with weak memory models (that is, platforms
that don't necessarily provide cache coherency). Synchronization
ensures that changes made by one thread become visible to
other threads in a predictable manner. On some architectures, in the
absence of synchronization, different threads may see memory
operations appear to have been executed in a different order than
they actually were executed. This is confusing, but normal -- and
critical for achieving good performance on these platforms. If you
just follow the rules -- synchronize every time you read a variable
that might have been written by another thread or write a variable
that may be read next by another thread -- then you won't have any
problems. See the Resources
section for more information.
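As a minimal sketch of the rule above, the hypothetical VisibleValue class below (not one of the article's listings) guards every read and write of a with the same lock. The reader thread polls through the synchronized getter, so once the main thread's write has completed, a later read is guaranteed to observe it and the loop terminates. Synchronization here addresses only visibility; which thread runs first is still up to the scheduler.

```java
// Hypothetical example: all names here are illustrative assumptions.
class VisibleValue {
    private int a = 0;

    // Both accessors lock the same monitor (this), so a write made
    // before the lock is released is visible to the next reader.
    public synchronized void set(int value) {
        a = value;
    }

    public synchronized int get() {
        return a;
    }

    // Starts a reader that polls until it observes a non-zero value,
    // then returns the value the reader saw.
    static int awaitWrite() {
        final VisibleValue shared = new VisibleValue();
        final int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            int v;
            while ((v = shared.get()) == 0) {
                Thread.yield();
            }
            seen[0] = v;
        });
        reader.start();

        shared.set(1);
        try {
            // join() also makes the reader's write to seen[0] visible here
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

If the two accessors were not synchronized (and a were not volatile), nothing would guarantee that the polling loop ever sees the new value.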
Don't publish the "this" reference during construction
One of the mistakes that can introduce a data
race into your class is to expose the this reference to
another thread before the constructor has completed. Sometimes the
reference is explicit, such as directly storing this in a
static field or collection, but other times it can be implicit, such as
when you publish a reference to an instance of a non-static inner class in
a constructor. Constructors are not ordinary methods -- they have special
semantics for initialization safety. An object is assumed to be in a
predictable, consistent state after the constructor has completed, and
publishing a reference to an incompletely constructed object is dangerous.
Listing 2 shows an example of introducing this sort of race condition into
a constructor. It may look harmless, but it contains the seeds of serious
concurrency problems.
public class EventListener {
    public EventListener(EventSource eventSource) {
        // do our initialization
        ...
        // register ourselves with the event source
        eventSource.registerListener(this);
    }

    public void onEvent(Event e) {
        // handle the event
    }
}
On first inspection, the EventListener class looks
harmless. The registration of the listener, which publishes a reference to
the new object where other threads might be able to see it, is the last
thing that the constructor does. But even ignoring all the Java Memory
Model (JMM) issues such as differences in visibility across threads and
memory access reordering, this code still is in danger of exposing an
incompletely constructed EventListener object to other
threads. Consider what happens when EventListener is
subclassed, as in Listing 3:
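import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RecordingEventListener extends EventListener {
    // Declared as List: Collections.synchronizedList() does not return an ArrayList
    private final List list;

    public RecordingEventListener(EventSource eventSource) {
        // The superclass constructor registers "this" before list is assigned
        super(eventSource);
        list = Collections.synchronizedList(new ArrayList());
    }

    public void onEvent(Event e) {
        list.add(e);
        super.onEvent(e);
    }

    public Event[] getEvents() {
        return (Event[]) list.toArray(new Event[0]);
    }
}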
Because the Java language specification requires that a call to
super() be the first statement in a subclass constructor, our
not-yet-constructed event listener is already registered with the event
source before we can finish the initialization of the subclass fields. Now
we have a data race for the list field. If the event source
decides to send an event from within the registration call, or we just get
unlucky and an event arrives at exactly the wrong moment,
RecordingEventListener.onEvent() could get called while
list still has the default value of null, and
would then throw a NullPointerException. Instance
methods like onEvent() shouldn't have to code against final
fields not being initialized.
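The failure described above does not even need a second thread to show up. The sketch below is a simplified, hypothetical variant of Listings 2 and 3 (String events, and an assumed EagerSource that delivers an event from inside registerListener()), so the NullPointerException occurs deterministically on a single thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the article's classes.
class EscapeDemo {
    interface Listener {
        void onEvent(String event);
    }

    // Assumed event source that fires an event during registration.
    static class EagerSource {
        void registerListener(Listener listener) {
            listener.onEvent("hello");
        }
    }

    static class BaseListener implements Listener {
        BaseListener(EagerSource source) {
            // "this" escapes before any subclass constructor body has run
            source.registerListener(this);
        }

        public void onEvent(String event) {
            // handle the event
        }
    }

    static class RecordingListener extends BaseListener {
        private final List<String> events;

        RecordingListener(EagerSource source) {
            super(source);               // onEvent() is called in here
            events = new ArrayList<>();  // too late: assigned after the callback
        }

        @Override
        public void onEvent(String event) {
            events.add(event);           // events is still null during super()
        }
    }

    // Returns true if constructing the subclass throws NullPointerException.
    static boolean constructionFails() {
        try {
            new RecordingListener(new EagerSource());
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

With a source that delivers events from another thread, the same window exists; it is just hit less predictably.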
The problem with Listing
2 is that EventListener published a reference to the
object being constructed before construction was complete. While it might
have looked like the object was almost fully constructed, and
therefore passing this to the event source seemed safe, looks
can be deceiving. Publishing the this reference from within
the constructor, as in Listing 2, is a time bomb waiting to explode.
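One common way to avoid publishing this from the constructor is to move registration into a static factory method, so that the listener is fully constructed before the event source ever sees it. This is a sketch of one possible remedy rather than something the listings above prescribe; the names SafeRegistration, EagerSource, and create() are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of construct-then-register via a factory method.
class SafeRegistration {
    interface Listener {
        void onEvent(String event);
    }

    // Assumed event source that fires an event during registration.
    static class EagerSource {
        void registerListener(Listener listener) {
            listener.onEvent("hello");
        }
    }

    // final, so no subclass constructor can run after registration
    static final class RecordingListener implements Listener {
        private final List<String> events = new ArrayList<>();

        // Private constructor: it only initializes state, it never publishes "this"
        private RecordingListener() {
        }

        // Construction completes first; only then is the reference published.
        static RecordingListener create(EagerSource source) {
            RecordingListener listener = new RecordingListener();
            source.registerListener(listener);
            return listener;
        }

        public synchronized void onEvent(String event) {
            events.add(event);
        }

        synchronized int count() {
            return events.size();
        }
    }
}
```

A caller would write SafeRegistration.RecordingListener.create(source) instead of invoking a constructor. Even an event delivered during registration now finds the events list initialized.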
Don't implicitly expose the "this" reference
It is possible to create the escaped reference
problem without using the this reference at all. Non-static
inner classes maintain an implicit copy of the this reference
of their enclosing object, so creating an anonymous inner class instance and
passing it to an object visible from outside the current thread has all
the same risks as exposing the this reference itself.
Consider Listing 4, which has the same basic problem as Listing 2, but
without explicit use of the this reference:
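public class EventListener2 {
    public EventListener2(EventSource eventSource) {
        // EventListener is assumed here to be an interface (or a class with a
        // no-argument constructor); the anonymous instance holds an implicit
        // reference to the enclosing EventListener2 object
        eventSource.registerListener(
            new EventListener() {
                public void onEvent(Event e) {
                    eventReceived(e);
                }
            });
    }

    public void eventReceived(Event e) {
    }
}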
The EventListener2 class has the same disease as its
EventListener cousin in Listing
2: a reference to the object under construction is being published --
in this case indirectly -- where another thread can see it. If we were to
subclass EventListener2, we would have the same problem where
the subclass method could be called before the subclass constructor
completes.
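To make the implicit reference visible, the hypothetical sketch below registers an anonymous inner class from a constructor and records what the enclosing object looked like when the callback ran. Widget, EagerSource, and sawNullName are illustrative assumptions, and the event source is assumed to call back during registration so that the result is deterministic.

```java
// Hypothetical sketch: an anonymous inner class leaks the enclosing "this".
class ImplicitEscape {
    interface Listener {
        void onEvent(String event);
    }

    // Assumed event source that fires an event during registration.
    static class EagerSource {
        void registerListener(Listener listener) {
            listener.onEvent("hello");
        }
    }

    static class Widget {
        private final String name;
        boolean sawNullName;

        Widget(EagerSource source) {
            // No explicit "this" appears, yet the anonymous class carries
            // an implicit Widget.this and can call back into this object.
            source.registerListener(new Listener() {
                public void onEvent(String event) {
                    eventReceived(event);
                }
            });
            name = "widget"; // assigned only after the callback has run
        }

        void eventReceived(String event) {
            // Observes the final field before the constructor assigned it
            sawNullName = (name == null);
        }
    }
}
```

After new ImplicitEscape.Widget(new ImplicitEscape.EagerSource()), the sawNullName flag is true: the callback observed a final field in its default state.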
Don't start threads from within constructors
A special case of the problem in Listing
4 is starting a thread from within a constructor, because often when
an object owns a thread, either that thread is an inner class or we pass
the this reference to its constructor (or the class itself
extends the Thread class). If an object is going to own a
thread, it is best if the object provides a start() method,
just like Thread does, and starts the thread from the
start() method instead of from the constructor. While this
does expose some implementation details (such as the possible existence of
an owned thread) of the class via the interface, which is often not
desirable, in this case the risks of starting the thread from the
constructor outweigh the benefit of implementation hiding.
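A minimal sketch of that advice follows, assuming a hypothetical Poller class that owns a worker thread. The constructor only creates the thread; a separate start() method, mirroring Thread.start(), is what actually runs it, so the worker can never observe a partially constructed Poller.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example of an object that owns a thread.
class Poller {
    private final AtomicInteger polls = new AtomicInteger();
    private final Thread worker;

    Poller() {
        // Creating the thread is fine: it does not run until start() is
        // called, and the reference to it stays private to this object.
        worker = new Thread(new Runnable() {
            public void run() {
                polls.incrementAndGet();
            }
        });
    }

    // Called by the client after construction has completed.
    void start() {
        worker.start();
    }

    // Waits for the worker to finish and reports how often it polled.
    int awaitPolls() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return polls.get();
    }
}
```

The cost is the one the text mentions: callers must remember the extra start() call, and the existence of the owned thread shows through the interface.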
What do you mean by "publish"?
Not all uses of the this reference
during construction are harmful, only those that publish the reference
where other threads can see it. Determining whether it is safe to share
the this reference with another object requires detailed
understanding of that object's visibility and what that object will do
with the reference. Listing 5 contains some examples of safe and unsafe
practices with respect to letting the this reference escape
during construction:
public class Safe {
    private Object me;
    private Set set = new HashSet();
    private Thread thread;

    public Safe() {
        // Safe because "me" is not visible from any other thread
        me = this;
        // Safe because "set" is not visible from any other thread
        set.add(this);
        // Safe because MyThread won't start until construction is complete
        // and the constructor doesn't publish the reference
        thread = new MyThread(this);
    }

    public void start() {
        thread.start();
    }

    private class MyThread extends Thread {
        private Object theObject;

        public MyThread(Object o) {
            this.theObject = o;
        }
        ...
    }
}
public class Unsafe {
    public static Unsafe anInstance;
    public static Set set = new HashSet();
    private Set mySet = new HashSet();
    private Thread thread;

    public Unsafe() {
        // Unsafe because anInstance is globally visible
        anInstance = this;
        // Unsafe because SomeOtherClass.anInstance is globally visible
        SomeOtherClass.anInstance = this;
        // Unsafe because SomeOtherClass might save the "this" reference
        // where another thread could see it
        SomeOtherClass.registerObject(this);
        // Unsafe because set is globally visible
        set.add(this);
        // Unsafe because we are publishing a reference to mySet
        mySet.add(this);
        SomeOtherClass.someMethod(mySet);
        // Unsafe because the "this" object will be visible from the new
        // thread before the constructor completes
        // (MyThread as in the Safe class, assumed to be accessible here)
        thread = new MyThread(this);
        thread.start();
    }

    public Unsafe(Collection c) {
        // Unsafe because "c" may be visible from other threads
        c.add(this);
    }
}
As you can see, many of the unsafe constructs in the
Unsafe class bear a significant resemblance to the safe
constructs in the Safe class. Determining whether the
this reference can become visible to another thread can be
tricky. The best strategy is to avoid using the this
reference at all (directly or indirectly) in constructors. In reality,
however, that's not always possible. Just remember to be very careful with
the this reference and with creating instances of non-static
inner classes in constructors.
More reasons not to let references escape during construction
The practices detailed above for
thread-safe construction take on even more importance when we consider the
effects of synchronization. For example, when thread A starts thread B,
the Java Language Specification (JLS) guarantees that all variables that
were visible to thread A when it starts thread B are visible to thread B,
which is effectively like having an implicit synchronization in
Thread.start(). If we start a thread from within a
constructor, the object under construction is not completely constructed,
and so we lose these visibility guarantees.
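The guarantee on Thread.start() can be illustrated with a small sketch. In the hypothetical StartVisibility class below, config is deliberately neither volatile nor accessed under a lock; the new thread is still guaranteed to see the value written before start() was called. The class and field names are assumptions for illustration.

```java
// Hypothetical example of the visibility guarantee provided by Thread.start().
class StartVisibility {
    // Plain field: no volatile, no synchronization
    static int config;

    // Writes config, then starts a thread that reads it; returns what
    // the started thread observed.
    static int observe() {
        final int[] seen = new int[1];

        config = 42;                       // written before start()
        Thread reader = new Thread(() -> seen[0] = config);
        reader.start();                    // earlier writes are visible to reader

        try {
            reader.join();                 // reader's write to seen[0] is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

The guarantee covers only what was written before the call to start(). A field of the enclosing object that the constructor assigns after starting the thread gets no such protection.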
Because of some of its more confusing aspects, the JMM is being revised
under Java Community Process JSR 133, which will (among other things)
change the semantics of volatile and final to
bring them more in line with general intuition. For example, under the
current JMM semantics, it is possible for a thread to see a
final field have more than one value over its lifetime. The
new memory model semantics will prevent this, but only if a constructor is
defined properly -- which means not letting the this
reference escape during construction.
Conclusion
Making a
reference to an incompletely constructed object visible to another thread
is clearly undesirable. After all, how can we tell the properly
constructed objects from the incomplete ones? But by publishing a
reference to this from inside a constructor -- either
directly or indirectly through inner classes -- we do just that, and
invite unpredictable results. To prevent this hazard, try to avoid using
this, creating instances of inner classes, or starting
threads from constructors. If you cannot avoid using this
either directly or indirectly in a constructor, be very sure that you are
not making the this reference visible to other threads.
Resources
- Doug Lea's Concurrent
Programming in Java, Second Edition (Addison-Wesley, 1999) is a
masterful book on the subtle issues surrounding multithreaded
programming in Java applications.
- Synchronization
and the Java Memory Model is an excerpt from Doug Lea's book that
focuses on the actual meaning of
synchronized.
- "Double-checked
locking: Clever, but broken" (JavaWorld, February 2001) and
"Can
double-checked locking be fixed?" (JavaWorld, May 2001)
explore the JMM and the surprising consequences of failing to
synchronize in certain situations.
- In "Double-checked
locking and the Singleton pattern" (developerWorks, May
2002), Peter Haggar gives a step-by-step explanation of how strange
things can happen when you fail to synchronize.
- Semantics
of Multithreaded Java (PDF) details the proposed changes in the Java
Memory Model as a result of JSR 133.
- In "Writing
multithreaded Java applications" (developerWorks, February
2001), Alex Roetter gives a basic overview of threads, synchronization,
and locking in Java classes.
- Read all of Brian Goetz's Java
theory and practice columns.
- Find other Java technology content in the developerWorks Java
technology zone.
About the author
Brian Goetz is a software consultant and has been a
professional software developer for the past 15 years. He is a
Principal Consultant at Quiotix, a software development
and consulting firm located in Los Altos, California. See Brian's published and
upcoming articles in popular industry publications. You can
contact Brian at brian@quiotix.com.