Prevent mistypings to the String class



Contents:

Zooming in on String mistypes

The problem and some solutions

Prevention methods

At the end of your string

Resources

About the author

Take full advantage of the Java language's polymorphism

Level: Intermediate

Fernando Ribeiro (fribeiro@bol.com.br)
Consultant
August 2002

The conversion of objects to strings (or stringification) can cause problems in Java programming unless you remember that string representations are rarely used in solid object-oriented applications. In this article, systems analyst and programmer Fernando Ribeiro builds on Eric Allen's bug pattern concept and explains how mis-stringification can be a bug pattern; he discusses the diagnostics of this elusive pitfall and expounds on the benefits of type safety.

Stringification is the conversion of an object to a string; for the purposes of this article, mis-stringification refers to mistyping an element to the String class. The examples in this article will show that a product code, for example, is rarely a string, yet many developers type it to the String class and thereby give up the benefits of polymorphism in object-oriented programming.

Although it may seem simply a matter of style (since an insidious attribute of the mis-stringification "bug pattern" is that it causes no errors at any time, not even at test time), avoiding mistyping to the String class allows you to take full advantage of the Java language's inherent feature of polymorphism. In practical terms, avoiding this pattern is the best way to combat it, and the best way to avoid it is to define a specific type for most elements in your code. By doing so, you will ensure the reliability of your system by making sure that each class type is appropriate for its job. This solution may add some overhead to your system's performance, but the trade-off is a much more reliable system.

We'll be discussing this pattern in the context of an enterprise system, and in this article we'll examine one way to detect this bug: the mis-overloading of a method. (We don't discuss repairing the bug much in this article because simply avoiding string representations is the best and most common way to solve the problem.)

Zooming in on String mistypes
I like to look at this problem of mistyping objects to the String class much as I would a bug pattern. So, let's refer to this misadventure as the mis-stringification bug pattern. (For more on bug patterns, see Eric Allen's Diagnosing Java code column available in Resources.)

A few definitions
For those of you who aren't acquainted with all of the terms used in this article, these definitions should help get you up to speed:

UML (Unified Modeling Language): A language for specifying, visualizing, constructing, and documenting software-system artifacts; it simplifies the software-design process by providing a plan, or "blueprint," for construction.

OCL (Object Constraint Language): The expression language for the Unified Modeling Language (UML); it has the characteristics of a pure expression language (cannot change anything in the model), a modeling language (all implementation issues are out of scope and cannot be expressed), and a formal language (all constructs have a formally defined meaning).

Type-safe, type safety: A UML model element (such as a field or operation) assigned a type whose structure and behavior most closely match the specification of the element.

Stringification: Converting an object to a string.

Polymorphism: In object-oriented programming, a programming language's ability to process objects differently depending on their type.

Method overloading: The ability in object-oriented applications to define several methods with the same name, in the same class or in derived classes, that differ in the number or types of their parameters.

Before we continue, permit me a quick word on the concept of type safety. When a UML model element is type-safe, its structure and behavior closely match its specification or, in other words, it has been developed specifically for its purpose. An example may help: a "key" parameter to an operation that searches an indexed list is not a string but only an object that, like any other Java object, may be stringified by calling the toString() : String method. The difference is that strings may be substringed, concatenated, and so on; keys cannot. They are keys, not strings.

In the code examples in this article, we will be using a fictitious enterprise system that encompasses the delivery and tracking functions for the products of the automotive industry. We will be defining classes for this system, including a Vehicle class (when we discuss individual vehicle details) and a more generic Product class (as an example of a generic enterprise product catalog).

In a type-safe application, the type of the color field, the return type of the getColor() method, and the type of the color parameter of the setColor() method are all Color, not String: the getter returns the color of the vehicle, not its string representation. Listing 1 shows the mis-stringified version, in which all three are typed to String.

Listing 1. The mis-stringified vehicle


/**
 * The vehicle
 **/
public class Vehicle {

    /**
     * Construct a vehicle
     **/
    public Vehicle() {
    }

    /**
     * The color of a vehicle
     **/
    private String color;

    /**
     * Get the color of a vehicle
     * @return The color of a vehicle
     **/
    public String getColor() {
        return this.color;
    }

    /**
     * Set the color of a vehicle
     * @param color A color
     **/
    public void setColor(String color) {
        this.color = color;
    }

}
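For contrast with Listing 1, a type-safe version of the vehicle might look like the following sketch. The Color class here is a hypothetical stand-in invented for illustration (it is not java.awt.Color and does not appear in the original listings), and TypeSafeVehicle is likewise a made-up name chosen to avoid clashing with the Vehicle class above.

```java
/**
 * A hypothetical color type (an assumption for this sketch, not part of
 * the original listings). Equality is based on the string representation.
 **/
class Color {

    private final String name;

    Color(String name) {
        if (name == null)
            throw new IllegalArgumentException("name must not be null");
        this.name = name;
    }

    public boolean equals(Object b) {
        if (!(b instanceof Color))
            return false;
        return this.toString().equals(b.toString());
    }

    public int hashCode() {
        return this.toString().hashCode();
    }

    public String toString() {
        return this.name;
    }

}

/**
 * The vehicle, with its color typed as Color rather than String
 **/
class TypeSafeVehicle {

    /**
     * The color of a vehicle
     **/
    private Color color;

    public Color getColor() {
        return this.color;
    }

    public void setColor(Color color) {
        this.color = color;
    }

}
```

With these signatures, a call such as setColor("9BGRD08Z01G167984") no longer compiles, because a String is not assignable to a Color parameter.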

This bug pattern is found in many enterprise systems, including product catalogs. For an example, look at the following code, which defines a Product class:

Listing 2. The mis-stringified product


/**
 * The product
 **/
public class Product {

    /**
     * Construct a product
     **/
    public Product() {
    }

    /**
     * Construct a product
     * @param code A code
     **/
    public Product(String code) {
        this.setCode(code);
    }

    /**
     * The code of a product
     **/
    private String code;

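    public boolean equals(Object b) {
        if (!(b instanceof Product))
            return false;
        String code = this.getCode();
        String other = ((Product)b).getCode();
        if (code == null) // avoid a NullPointerException when no code is set
            return other == null;
        return code.equals(other);
    }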

    protected void finalize() {
        this.setCode(null);
    }

    /**
     * Get the code of a product
     * @return The code of a product
     **/
    public String getCode() {
        return this.code;
    }

    public int hashCode() {
        String code = this.getCode(); // local reference, so the null check and the call see the same value
        if (code == null)
            return 0;
        return code.hashCode();
    }

    /**
     * Set the code of a product
     * @param code A code
     **/
    public void setCode(String code) {
        this.code = code;
    }

    public String toString() {
        return new String();
    }

}

A few comments about the design of the Product class in the above code:

The first constructor is empty and takes no arguments.
The second constructor takes a code.
The codes compose -- are a part of -- the product.
The string representations of the products are empty strings.
The products are equaled by their codes.
The hash codes of the products are the hash codes of their codes.

Let's take a look at some code examples for the last two items.

Products are equaled by their codes
Here is an OCL constraint to illustrate this point:


context Product::equals(b : Object) : boolean
    pre: b.oclIsKindOf(Product)
    post: result = self.getCode().equals(b.oclAsType(Product).getCode())

The hash code of products and their codes are the same
Here is an OCL constraint to illustrate this point:


context Product::hashCode() : int post:
    let code : String = self.getCode() in
    if code.oclIsUndefined() then
        result = 0
    else
        result = code.hashCode()
    endif
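The two constraints above amount to the usual equals/hashCode contract: products with equal codes are equal and share a hash code. The following sketch exercises that contract with a stripped-down, null-safe variant of the Listing 2 class; the names CodedProduct and ContractDemo and the sample code "A-100" are made up for illustration.

```java
/**
 * A stripped-down product keyed by a string code (hypothetical name)
 **/
class CodedProduct {

    private final String code;

    CodedProduct(String code) {
        this.code = code;
    }

    public String getCode() {
        return this.code;
    }

    public boolean equals(Object b) {
        if (!(b instanceof CodedProduct))
            return false;
        String other = ((CodedProduct) b).getCode();
        // two products without a code are treated as equal
        return this.code == null ? other == null : this.code.equals(other);
    }

    public int hashCode() {
        // mirrors the OCL postcondition: 0 when the code is undefined
        return this.code == null ? 0 : this.code.hashCode();
    }

}

class ContractDemo {

    public static void main(String[] args) {
        CodedProduct a = new CodedProduct("A-100");
        CodedProduct b = new CodedProduct("A-100");
        CodedProduct c = new CodedProduct(null);

        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(c.hashCode());                 // 0
        System.out.println(a.equals(c));                  // false
    }

}
```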

(The OCL constraints in this article are based on the OCL 2.0 submission -- for example, "oclIsNew" doesn't exist in OCL 1.4. For more on OCL, see Resources.)

Here's why an occurrence of the mis-stringification bug pattern can hamstring your ability to produce good code: a product code is not a string, because it may require structure and behavior beyond what is available in the String class.

Also, a product code may require specializations (such as sales or engineering product codes). And some products may be coded many times -- the engineering code may be used by the logistics systems; the logistics code may be used by the sales systems; the engineering, logistics, and sales codes may be used by the e-business systems. These usage requirements are red flags to developers, steering them toward the practice of developing a new, specific type for each kind of product code.

So why does this problem occur? And how do we fix or avoid it?

The problem and some solutions
The problem occurs because most programmers don't employ type safety in object-oriented applications. (Remember, we think it is worth the extra effort to define a new type specific to the requirements rather than rely on existing types that may not match closely enough and may cause problems.) The benefits of type-safe applications are illustrated by the following vehicle problem, in which vehicles (cars and trucks) are delivered by different ships. Take a look at the following code:


/**
 * Deliver a vehicle
 * @param vehicle A vehicle
 **/
public void deliver(String vehicle) {
    // is it a car or a truck?
}

The deliver(vehicle : String) : void method implements the delivery of strings (sad but true) instead of vehicles because any string is assignable to the vehicle parameter. This really isn't a solution to the problem.

The Vehicle type, like the one used in the next code block, is a much better match to what we want the invoker to pass to this method.


/**
 * Deliver a vehicle
 * @param vehicle A vehicle
 **/
public void deliver(Vehicle vehicle) {
    // who delivers a vehicle?
}

The deliver(vehicle : Vehicle) : void method implements the delivery of vehicles but, because cars and trucks -- all the vehicles in this context -- are not delivered by the same ship, it also isn't a complete solution to the problem.

Look at this next bit of code:


/**
 * Deliver a car
 * @param car A car
 **/
public void deliverCar(String car) {
    // delivered by the first ship
}

/**
 * Deliver a truck
 * @param truck A truck
 **/
public void deliverTruck(String truck) {
    // delivered by the second ship
}

This isn't a good solution either, because the invokers of the deliverCar(car : String) : void and deliverTruck(truck : String) : void methods are forced to differentiate cars from trucks themselves.

Finally, take a look at the following code:


/**
 * Deliver a car
 * @param car A car
 **/
public void deliver(Car car) {
    // delivered by the first ship
}

/**
 * Deliver a truck
 * @param truck A truck
 **/
public void deliver(Truck truck) {
    // delivered by the second ship
}

The invokers of the deliver(car : Car) : void and deliver(truck : Truck) : void methods aren't forced to differentiate cars from trucks by method name, because method overloading allows a developer to implement the same behavior for several argument lists. This approach is appropriate in object-oriented applications.

So far, the code examples we've covered have used method overloading in conjunction with a feature of the Java compiler -- method narrowing -- that searches for the best match to an operation requested by the invoker. This search is based not only on the name of the method but also on the number of its parameters and their declared types. (For more on method narrowing, see Resources.)
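A point worth keeping in mind is that this matching happens at compile time, against the declared types of the arguments. The sketch below uses made-up Vehicle, Car, and Truck classes and a hypothetical Dock class whose deliver methods return a label instead of shipping anything, so that the chosen overload is visible.

```java
class Vehicle { }

class Car extends Vehicle { }

class Truck extends Vehicle { }

/**
 * A hypothetical dock with one deliver method per kind of vehicle
 **/
class Dock {

    public String deliver(Vehicle vehicle) {
        return "unassigned";
    }

    public String deliver(Car car) {
        return "first ship";
    }

    public String deliver(Truck truck) {
        return "second ship";
    }

}

class OverloadDemo {

    public static void main(String[] args) {
        Dock dock = new Dock();
        Vehicle declaredAsVehicle = new Car();

        System.out.println(dock.deliver(new Car()));         // first ship
        System.out.println(dock.deliver(new Truck()));       // second ship
        // The overload is picked from the declared type (Vehicle),
        // not from the runtime type (Car).
        System.out.println(dock.deliver(declaredAsVehicle)); // unassigned
    }

}
```

Invokers that hold only a Vehicle reference therefore do not reach the Car or Truck overloads automatically.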

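The deliver(vehicle : Vehicle) : void method replaces both the deliver(car : Car) : void and deliver(truck : Truck) : void methods when cars and trucks are delivered by the same ship. The source code of the invokers doesn't need to change, because a Car or Truck argument is accepted wherever a Vehicle is expected. (Under the binary compatibility rules of the Java Language Specification, changing a parameter type amounts to deleting the old method and adding a new one, so previously compiled invokers do need to be recompiled.) This is the power of polymorphism, waiting to be unleashed by Java applications.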

Prevention methods
The "golden rule" to avoiding problems with stringification is this:

String representations of objects should be the only strings in type-safe applications.

The following code and UML class diagram illustrate a clean design for a type-safe object-oriented application.

A look at a type-safe product
The following block is a well-designed, type-safe product.

Listing 3. The type-safe product


/**
 * The product
 **/
public class Product {

    /**
     * Construct a product
     **/
    public Product() {
    }

    /**
     * Construct a product
     * @param code A code
     **/
    public Product(ProductCode code) {
        this.setCode(code);
    }

    /**
     * The code of a product
     **/
    private ProductCode code;

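    public boolean equals(Object b) {
        if (!(b instanceof Product))
            return false;
        ProductCode code = this.getCode();
        ProductCode other = ((Product)b).getCode();
        if (code == null) // avoid a NullPointerException when no code is set
            return other == null;
        return code.equals(other);
    }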
    
    protected void finalize() {
        this.setCode(null);
    }

    /**
     * Get the code of a product
     * @return The code of a product
     **/
    public ProductCode getCode() {
        return this.code;
    }

    public int hashCode() {
        ProductCode code = this.getCode(); // local reference, so the null check and the call see the same value
        if (code == null)
            return 0;
        return code.hashCode();
    }

    /**
     * Set the code of a product
     * @param code A code
     **/
    public void setCode(ProductCode code) {
        this.code = code;
    }

    public String toString() {
        return new String();
    }

}

Figure 1. The UML class diagram of the type-safe product

A look at the type-safe product code
In this section, we'll examine the product code and ProductCode class.

Listing 4. The product code


/**
 * The product code
 **/
public class ProductCode {

    /**
     * Construct a product code
     **/
    public ProductCode() {
    }

    public boolean equals(Object b) {
        if (!(b instanceof ProductCode))
            return false;
        return this.toString().equals(b.toString());
    }

    public int hashCode() {
        return this.toString().hashCode();
    }

    public String toString() {
        return new String();
    }

}

A quick note: At this point, some developers would question the wisdom of having toString() return a new String on every call. I've checked this approach with other developers, including Joshua Bloch, the author of Effective Java Programming Language Guide, and it seems to be the best solution at this time. Calling intern() to reuse the pooled instance would be awkward, and because the variable that holds the return value tends to be short-lived, performance is assumed not to be an issue.
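What the trade-off looks like in practice can be seen in a few lines. This sketch relies only on documented java.lang.String behavior; the class name StringIdentityDemo is made up.

```java
class StringIdentityDemo {

    public static void main(String[] args) {
        String first = new String();
        String second = new String();

        // Each call to new String() creates a distinct object...
        System.out.println(first == second);                   // false
        // ...but the two objects are equal by value,
        System.out.println(first.equals(second));              // true
        // and intern() maps both to the single pooled instance.
        System.out.println(first.intern() == second.intern()); // true
    }

}
```

Because equals() and hashCode() in ProductCode compare string representations by value, the lack of shared identity does not affect correctness.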

A few comments about the design of the ProductCode class:

The constructor is empty and takes no parameters.
The product codes are equaled by their string representations.
The hash codes of the product codes are the hash codes of their string representations.
The string representations of the product codes are empty strings.

Let's look a bit closer at the last three items.

Product codes are equaled by their string representations
Here is an OCL constraint to illustrate this point:


context ProductCode::equals(b : Object) : boolean
    pre: b.oclIsKindOf(ProductCode)
    post: result = self.toString().equals(b.toString())

The hash codes of product codes and their string representations are the same
Here is an OCL constraint to illustrate this point:


context ProductCode::hashCode() : int post:
    result = self.toString().hashCode()

Product code string representations are empty strings
Here is an OCL constraint to illustrate this point:


context ProductCode::toString() : String post:
    result.oclIsNew()

Examining interface implementation
Some interfaces may be easily implemented by subclasses of the ProductCode class:

Cloneable
Comparable
Serializable

Let's illuminate these subclass interface implementations with code examples. We'll begin with Cloneable:


public Object clone() throws CloneNotSupportedException {
    return super.clone();
}

And here's an example of an interface implementation of Comparable:


public int compareTo(Object b) {
    if (!(b instanceof ProductCode))
        throw new ClassCastException();
    return toString().compareTo(b.toString());
}

Following is a demonstration of the string representation of the product codes being changed by a subclass of the ProductCode class:


ProductCode pc = new ProductCode() {
    public String toString() {
        return "9BGRD08Z01G167984";
    }
};

Notice that the syntax isn't particularly beautiful in the last example.
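A named subclass reads more cleanly than the anonymous class. The following sketch is one possible shape, not the author's implementation: EngineeringProductCode, its value field, and the ProductCodeDemo class are hypothetical, compareTo uses the generic form of Comparable rather than the raw form shown above, and ProductCode is reduced to a minimal copy of Listing 4 so that the block compiles on its own.

```java
import java.io.Serializable;

/**
 * Minimal copy of the ProductCode class from Listing 4
 **/
class ProductCode {

    public boolean equals(Object b) {
        if (!(b instanceof ProductCode))
            return false;
        return this.toString().equals(b.toString());
    }

    public int hashCode() {
        return this.toString().hashCode();
    }

    public String toString() {
        return new String();
    }

}

/**
 * A hypothetical engineering product code
 **/
class EngineeringProductCode extends ProductCode
        implements Cloneable, Comparable<ProductCode>, Serializable {

    private static final long serialVersionUID = 1L;

    private final String value;

    EngineeringProductCode(String value) {
        if (value == null)
            throw new IllegalArgumentException("value must not be null");
        this.value = value;
    }

    public Object clone() throws CloneNotSupportedException {
        return super.clone();
    }

    public int compareTo(ProductCode b) {
        // ordered by string representation, like equals() and hashCode()
        return this.toString().compareTo(b.toString());
    }

    public String toString() {
        return this.value;
    }

}

class ProductCodeDemo {

    public static void main(String[] args) throws CloneNotSupportedException {
        EngineeringProductCode a = new EngineeringProductCode("9BGRD08Z01G167984");
        EngineeringProductCode b = new EngineeringProductCode("9BGRD08Z01G167984");

        System.out.println(a.equals(b));         // true
        System.out.println(a.compareTo(b));      // 0
        System.out.println(a.equals(a.clone())); // true
    }

}
```

Equality, hashing, and ordering all go through toString(), so the subclass only has to supply the string representation.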

At the end of your string
The String class is a final class. It may not be extended for a very good reason: the class itself already provides all the behavior used by Java applications. Inheriting the ProductCode class from String (as some developers would like to do) would be as awkward as using the string representation of a product code instead of the product code itself to compose a product.

Employing type safety to avoid the mis-stringification bug pattern will take extra time (to create new, more specific types), will likely not increase your system's performance, but will always increase your system's reliability.

The benefits of polymorphism go hand in hand with the practice of type safety, and mis-stringification is one more reason to care about it and to understand that it is not just a matter of style.

I'd like to thank the authors of the OCL spec, Jos Warmer and Anneke Kleppe of Klasse Objecten, for their comments and the support they offered in crafting this article.

Resources

For more on bug patterns, see the Diagnosing Java code columns by Eric Allen.
A good source of information on the Unified Modeling Language and its expression language, OCL, is Granville Miller's column, Java modeling.
Two more excellent resources on OCL and UML are the UML 1.4 specification and the OCL 2.0 submission from Boldsoft, Rational Software Corporation, IONA, and Adaptive Ltd.
A great roundup on OCL -- definition, history of development, links to other resources -- can be found on the IBM OCL page.
A useful tool to understand OCL is the IBM OCL Parser 0.3.
A good resource on method narrowing, a useful Java feature, can be found in this Sun technical article.
Find other Java-related resources on the developerWorks Java technology zone.

About the author
Fernando Ribeiro is a senior systems analyst and programmer in Brazil. Fernando has been using C++, Java and UML for six years in several industries, and recently was a member of a JCP expert group engaged in internationalizing J2EE applications for a major global IT service company. You can contact him at fribeiro@bol.com.br.
