How you reference objects can seriously affect the garbage collector
Jack Shirazi (jack@JavaPerformanceTuning.com), Director, JavaPerformanceTuning.com
Kirk Pepperdine (kirk@JavaPerformanceTuning.com), CTO, JavaPerformanceTuning.com
26 Aug 2003
Intrepid optimizers Jack Shirazi
and Kirk Pepperdine, Director and CTO of JavaPerformanceTuning.com,
follow performance discussions all over the Internet, expanding and
clarifying the issues they encounter in this column. This month, they
set their sights on the Java Games Web site to see how game developers
identify and then resolve problems that appear when their application
doesn't release objects for garbage
collection.
If you think of game developers as the Formula One drivers of the Java
programming world, then you can understand why this group places so much
emphasis on performance. The performance problems that these developers
face on a daily basis often stretch the bounds of what we mere mortals
typically see. Where do you find these people? One place is at the Java
Games community Web site (see Resources). Though there may not be a lot of
server-side activity on this site, looking at what these bit-twiddlers
face every day can often yield precious nuggets that we all can benefit
from. So let's get our game on!
Object leaks
Game programmers are no different from other programmers -- they still need to
understand the subtleties of the Java runtime environment, such as garbage
collection. Garbage collection can be one of the more difficult concepts
to wrap your mind around, as it isn't always obvious how to debug heap
problems. Many discussions seem to start with, or
end with, "I'm having a problem with garbage collection."
Let's say you're getting out-of-memory errors. You've fired up your
profiler to look for the problem, but you've gotten nowhere. At that
stage it's tempting to believe that the bug is in the
JVM heap management, rather than in your application. But, as explained
more than once by Java Games' resident experts, the JVM doesn't have any
substantiated object leaks. The garbage collector has proved to be
generally accurate in determining which objects are dead and then
reclaiming their space. So if you are getting out-of-memory errors, it is
extremely likely that your application is experiencing "unintentional
object retention."
Memory leaks versus unintentional retention
What is the difference between a memory leak and
unintentional object retention? When it comes to programs written in the
Java language, nothing really. They both basically mean that your
application is retaining references to objects that you don't intend to
reference. The classic example is adding objects into a collection to keep
track of those objects, but forgetting to remove them from the collection
when you no longer need to keep track of them. Because the collection can
keep growing without bound, and doesn't ever get smaller, at some point
you can have so many objects in the collection (or referenced by elements
in the collection) that you fill up the heap and get out-of-memory errors.
The garbage collector cannot reclaim those objects you think you are
finished with, because as far as the garbage collector is concerned, the
application can still access them at any time through that collection, so
they couldn't possibly be garbage.
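As a minimal sketch of that classic case, assume a hypothetical registry class that tracks buffers in a static list. The class and method names below are invented for illustration and do not come from any real application; the only point is that the list keeps every element reachable until the element is removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from a real application.
class RetentionSketch {
    // A long-lived collection: everything in it stays reachable.
    private static final List<byte[]> tracked = new ArrayList<>();

    // Leaky: registers the buffer but never unregisters it.
    static void useAndForget(int size) {
        byte[] buffer = new byte[size];
        tracked.add(buffer);
        // ... work with buffer ...
    }   // buffer goes out of scope, yet 'tracked' still references the array

    // Fixed: removes the buffer once it is no longer needed.
    static void useAndRelease(int size) {
        byte[] buffer = new byte[size];
        tracked.add(buffer);
        try {
            // ... work with buffer ...
        } finally {
            // Arrays compare by identity, so this removes exactly this buffer.
            tracked.remove(buffer);
        }
    }

    static int trackedCount() {
        return tracked.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) useAndForget(1024);
        System.out.println("tracked after useAndForget:  " + trackedCount());
        for (int i = 0; i < 1000; i++) useAndRelease(1024);
        System.out.println("tracked after useAndRelease: " + trackedCount());
    }
}
```

The count grows with every call to useAndForget and stays flat across calls to useAndRelease, which is the difference between retaining objects and releasing them.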
In languages without garbage collection, like C++, there is a difference between
a memory leak and unintentional object retention. C++ programs can have
unintentional object retention just like Java programs can. But C++
programs can also have real memory leaks, where objects are no longer
reachable by the application but the memory never gets released back to
the system. Thankfully, in Java programs, this latter type of memory leak
is not possible. We prefer to use the term "unintentional object
retention" for the memory problems that make Java programmers tear their
hair out, so we can distinguish ourselves from all those other programmers
who have to deal with more retrograde languages.
Tracking retained objects
So what do you do if you have unintentional object
retention? Well, first you need to determine which objects are being
unintentionally retained, and then you need to find which objects are
referencing them. Then you've got to figure out where they should be released. The
easiest way to identify these objects is by using a profiler that can
snapshot the heap, compare object counts between snapshots,
track objects, find back references to objects, and force garbage
collections. With such a profiler, the procedure to follow is relatively
straightforward:
- Wait until the application has reached the steady state, where you
would expect most new objects to be temporary objects that can be garbage
collected; typically this is after all the application initializations
have finished.
- Force a garbage collection, and take an object snapshot of the
heap.
- Do whatever work it is that is causing unintentionally retained
objects.
- Force another garbage collection and then take a second object
snapshot of the heap.
- Compare the two snapshots to see which objects have increased in
number from the first snapshot to the next. Because you forced garbage
collections before the snapshots, the objects left should all be objects
referenced by the application, and comparing the two snapshots should
identify exactly those newly created objects that are being retained by
the application.
- Using your knowledge of the application, determine from the snapshot
comparison which of the objects are being unintentionally
retained.
- Track back-references to find which objects are referencing the
unintentionally retained objects, until you reach the root object that
is causing the problem.
After following this procedure, you will
know how to cure the problem.
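Without a profiler, the first steps of the procedure can be roughly approximated with the standard Runtime API. The sketch below is a crude stand-in, not a heap snapshot: it only measures how much the used heap grows across a piece of work, and it assumes the JVM honors System.gc(), which is only a request. The class name, the method names, and the simulated retained list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Crude stand-in for a profiler's snapshot comparison; all names are hypothetical.
class HeapDeltaSketch {
    // Simulates the bug under investigation: objects that are never released.
    static final List<byte[]> retained = new ArrayList<>();

    // Requests a collection, then reports the heap currently in use.
    // System.gc() is only a request; the JVM is free to ignore it.
    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Measure, do the suspect work, measure again, and return the growth in bytes.
    static long growthDuring(Runnable work) {
        long before = usedHeapAfterGc();
        work.run();
        long after = usedHeapAfterGc();
        return after - before;
    }

    public static void main(String[] args) {
        long growth = growthDuring(() -> {
            for (int i = 0; i < 8; i++) {
                retained.add(new byte[1024 * 1024]); // 1 MB each, never removed
            }
        });
        System.out.println("heap growth across the suspect work: " + growth + " bytes");
    }
}
```

A growth figure that keeps rising each time the same work is repeated suggests retention, but it does not say which objects are retained or what references them; for that, the back-reference tracking of a real profiler is still needed.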
Explicitly nulling variables
Staying on the subject of garbage collection, one
really fascinating discussion concerned whether there was a performance
advantage to explicitly nulling variables. Nulling a variable is simply
explicitly assigning null to the variable, as opposed to just
letting references go out of scope.
Listing 1. Local scope
public static String scopingExample(String string) {
    StringBuffer sb = new StringBuffer();
    sb.append("hello ").append(string);
    sb.append(", nice to see you!");
    return sb.toString();
}
When the method is executing, the runtime stack holds a reference to
the StringBuffer object created in the first line. As long as
the method is executing, the reference to the StringBuffer
object prevents that object from being considered as garbage. After the
method is terminated, the variable sb goes out of scope, and
the runtime stack eliminates the reference to that
StringBuffer object. There is no longer any reference to the
StringBuffer object, and now it can be garbage collected.
This elimination of the reference is equivalent to nulling the
sb variable just after the method completes.
Wrong scoping
So if the JVM does the equivalent of the nulling for you, how can explicitly
nulling a variable ever help? For correctly scoped variables, there is no
benefit. But let's look at another version of the
scopingExample method, and this time we'll incorrectly scope
the sb variable.
Listing 2. Static scope
static StringBuffer sb = new StringBuffer();
public static String scopingExample(String string) {
    sb = new StringBuffer();
    sb.append("hello ").append(string);
    sb.append(", nice to see you!");
    return sb.toString();
}
Now sb is a static variable, so it lasts as long as the
class remains loaded in the JVM. Every time the method is executed, a new
StringBuffer object is created and referenced by the
variable. At that point the StringBuffer object previously
referenced by the sb variable becomes dead, making it a
candidate for garbage collection. The newly created StringBuffer,
however, stays referenced by sb after the method returns. This means that
it is being held onto by the application for much
longer than it needs to be -- possibly forever if no one ever calls
scopingExample again.
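The retention can be observed directly. The sketch below reuses the method body from the listings; the class name ScopingSketch and the method names misScoped and correctlyScoped are assumptions made for this illustration.

```java
// Illustrative sketch; the class and method names are assumptions.
class ScopingSketch {
    // Mis-scoped on purpose: a static field where a local variable would do.
    static StringBuffer sb;

    static String misScoped(String string) {
        sb = new StringBuffer();
        sb.append("hello ").append(string);
        sb.append(", nice to see you!");
        return sb.toString();
    }   // sb still references the buffer after the method returns

    // Preferred: the buffer is referenced only by a local variable.
    static String correctlyScoped(String string) {
        StringBuffer local = new StringBuffer();
        local.append("hello ").append(string);
        local.append(", nice to see you!");
        return local.toString();
    }   // no reference to the buffer survives the call

    public static void main(String[] args) {
        misScoped("Kirk");
        System.out.println("static field still holds a buffer: " + (sb != null));
        sb = null; // workaround when the field cannot simply be removed
        correctlyScoped("Jack");
        System.out.println("static field after local version:  " + (sb != null));
    }
}
```

After misScoped returns, the static field still references the last buffer; with the local version nothing outlives the call, so no nulling is needed.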
A pathological example
Even so, would explicitly nulling that variable
improve performance? We would have found it difficult to believe that one
object more or less could have much effect on performance, until we saw an
example given by a Sun engineer at Java Games involving an unfortunately
sized object.
Listing 3. Object in old space
private static Object bigObject;
public static void test(int size) {
    long startTime = System.currentTimeMillis();
    long numObjects = 0;
    while (true) {
        //bigObject = null; //explicit nulling
        //SizableObject could simply be a large array, e.g. byte[]
        //In the JavaGaming discussion it was a BufferedImage
        bigObject = new SizableObject(size);
        long endTime = System.currentTimeMillis();
        ++numObjects;
        // We print stats for every two seconds
        if (endTime - startTime >= 2000) {
            System.out.println("Objects created per 2 seconds = " + numObjects);
            startTime = endTime;
            numObjects = 0;
        }
    }
}
This example simply loops, creating a large object and assigning it to
the same variable, reporting the number of objects created every two
seconds. Modern JVMs use a generational garbage
collection scheme, creating young objects in one space (called Eden) and then moving
them to another space if they survive past the first garbage collection.
Collecting objects in Eden, the young generation space where new objects
are created, is much faster than garbage collecting objects in the "old"
generation space. But if Eden is full and no space can be reclaimed, the
live objects in Eden must be moved to the old generation to make room for
new objects. Without the explicit null assignment, if the object being
created is large enough, then Eden gets full and the garbage collector
cannot reclaim the currently referenced object. Consequently, the object
gets moved to the old generation space and takes more time to garbage
collect.
With the explicit null assignment, the previous object is already
unreachable when the new one is created, so Eden can be reclaimed each time
and garbage collection is much faster. In fact, with
the explicit nulling, the loop creates five times as many objects in two
seconds as without the explicit nulling -- but only if you choose objects
that are big enough to just fill Eden, about 500 kilobytes for the default
1.4 JVM configuration on Windows. That's a fivefold performance difference
due to one null assignment! But do note that this
performance difference arises because the variable is scoped incorrectly,
for which the null assignment is simply a workaround, and also because the
object is very large. The whole discussion is further extended in the
article "Nulling variables and garbage collection" (see Resources).
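To try the comparison yourself, here is a bounded variant of the loop that runs for a fixed time and returns the count instead of looping forever. It is only a sketch: it assumes a plain byte[] in place of SizableObject, the class and method names are hypothetical, and the fivefold figure was reported for the default 1.4 JVM on Windows, so other JVMs, collectors, or heap settings may show a much smaller difference or none at all.

```java
// Bounded variant of the nulling experiment; names are assumptions for illustration.
class NullingBenchSketch {
    // Deliberately mis-scoped, so the last object stays referenced between iterations.
    private static Object bigObject;

    // Creates objects of the given size for roughly 'millis' milliseconds
    // and returns how many were created.
    static long run(int size, long millis, boolean nullFirst) {
        long deadline = System.currentTimeMillis() + millis;
        long numObjects = 0;
        while (System.currentTimeMillis() < deadline) {
            if (nullFirst) {
                bigObject = null;       // explicit nulling before the allocation
            }
            bigObject = new byte[size]; // stands in for SizableObject
            ++numObjects;
        }
        return numObjects;
    }

    public static void main(String[] args) {
        int size = 500 * 1024; // roughly the size quoted for the 1.4 default Eden
        System.out.println("without nulling: " + run(size, 2000, false));
        System.out.println("with nulling:    " + run(size, 2000, true));
    }
}
```

The interesting setting is an object size close to the size of Eden, so it is worth varying the size argument rather than relying on the 500 KB figure.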
Best practice
That was an interesting example, but it is worth emphasizing that the best
practice is to correctly scope variables, and to not explicitly null
them. Although explicitly nulling variables should normally have no
effect, there are also pathological examples where it could have a
significant negative effect on performance. For example, iteratively or
recursively nulling elements of a collection where the collection object
would otherwise be eligible for garbage collection actually adds overhead to a
program rather than helping the garbage collector. Keep in mind that the
example here was a deliberately mis-scoped one, essentially a case of
unintentional object retention.
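The collection case can be sketched as follows. Both methods leave the list unreachable once they return, but the second does one extra write per element first. The class and method names are hypothetical, and the returned write count is only there to make the extra work visible.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the names are assumptions.
class DropReferenceSketch {
    static List<Object> build(int n) {
        List<Object> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(new Object());
        }
        return list;
    }

    // Preferred: the list simply goes out of scope; no per-element work.
    static void useAndDrop(int n) {
        List<Object> list = build(n);
        // ... use list ...
    }

    // Unnecessary: nulls every slot of a list that is about to become
    // unreachable anyway. Returns the number of extra writes performed.
    static int useNullAndDrop(int n) {
        List<Object> list = build(n);
        // ... use list ...
        int writes = 0;
        for (int i = 0; i < list.size(); i++) {
            list.set(i, null);
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        useAndDrop(100_000);
        System.out.println("extra writes when nulling: " + useNullAndDrop(100_000));
    }
}
```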
Resources
About the authors
Jack Shirazi is the Director of JavaPerformanceTuning.com and author of Java Performance
Tuning (O'Reilly). Jack was an early adopter of Java, and for
the last few years has consulted primarily for the financial sector,
focusing on Java performance. Contact Jack at jack@JavaPerformanceTuning.com.
Kirk Pepperdine is the Chief Technical
Officer at JavaPerformanceTuning.com and has been focused on Object
technologies and performance tuning for the last 15 years. Kirk is a
co-author of the book ANT Developer's
Handbook (SAMS). Contact Kirk at kirk@JavaPerformanceTuning.com.