Wednesday, May 6, 2009

And That's Final, Not!

This post is about C++ and Java, but it really offers some necessary background material to explore an interesting issue facing Scala. What follows is hopefully widely known by C++ and Java veterans, but it's still worth reviewing here so that we're all on the same page when we talk about Scala in the near future.

C++ fans are often encouraged not to use #defines for their constants, in part because the preprocessor has no notion of types.  For example, Scott Meyers champions this idea.  Instead of writing...
#define PI 3.14159

...which performs a simple textual substitution everywhere in the source file, the following is usually preferred:
const double PI = 3.14159;

The latter alternative helps the compiler supply more meaningful error messages. With the #define, the preprocessor replaces every occurrence of PI before the compiler runs, so the compiler never sees the lexeme "PI" and can't include it in any error messages.

We don't have a preprocessor in Java. We also lack a usable const keyword. Instead we use final to describe variables whose values will not change. Unfortunately, just as static has multiple meanings in C++, final has multiple meanings in Java.

In C++, methods are non-virtual by default, and must be given a special keyword, virtual, to denote that they are polymorphic. The Java philosophy is different. In Java, methods are virtual by default, and must be given a special keyword, final, to denote that they cannot be overridden. So this keyword pulls double duty in Java: for methods final means non-virtual, and for fields it implies constant.
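Here is a minimal sketch of that double duty. The class names (FinalMeaningsDemo, Circle, NamedCircle) are made up for illustration, and reflection is used only to confirm which members carry the final modifier.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class; none of these names come from a real library.
class FinalMeaningsDemo
{
    static class Circle
    {
        // On a field, final means "assigned exactly once".
        final double radius = 2.0;

        // On a method, final means "cannot be overridden by a subclass".
        final double area() { return 3.14159 * radius * radius; }

        // Without final, a Java method is virtual: subclasses may override it.
        String name() { return "circle"; }
    }

    static class NamedCircle extends Circle
    {
        @Override String name() { return "named circle"; }
        // Overriding area() here would be a compile-time error.
    }

    // True if the named no-argument method of Circle is declared final.
    static boolean isFinalMethod(String methodName)
    {
        try {
            return Modifier.isFinal(Circle.class.getDeclaredMethod(methodName).getModifiers());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the named field of Circle is declared final.
    static boolean isFinalField(String fieldName)
    {
        try {
            return Modifier.isFinal(Circle.class.getDeclaredField(fieldName).getModifiers());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        Circle c = new NamedCircle();
        System.out.println(c.name());                // dispatches to NamedCircle.name
        System.out.println(isFinalMethod("area"));   // final, so non-virtual
        System.out.println(isFinalMethod("name"));   // not final, so virtual
        System.out.println(isFinalField("radius"));  // final, so assigned once
    }
}
```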

Another difference with C++ is that we don't have standalone variables in Java. We put them inside a class as below. In order to explore the issue at the heart of this blog entry, we deliberately do not make the field below static.

public class MyClass extends YourClass
{
    public final double PI = 3.14159;
    //
    //... details omitted
}

A nearly equivalent way to define PI would be in a constructor. It's noteworthy that the final instance fields of a class can only be assigned where they are declared, in an instance initializer block, or in a constructor. A "set" method to change a final field would not compile.

public class MyClass extends YourClass
{
    public final double PI;

    public MyClass()
    {
        PI = 3.14159;
    }
    //
    //... details omitted
}
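To make the assignment rule concrete, here is a small sketch using a hypothetical class, FinalAssignmentDemo. Each path through the constructor assigns the final field exactly once, and the commented-out setter is the kind of method the compiler would reject.

```java
// Hypothetical class for illustration only.
class FinalAssignmentDemo
{
    private final double pi;

    FinalAssignmentDemo(boolean precise)
    {
        // Every path through the constructor must assign the final field exactly once.
        if (precise) {
            pi = 3.14159;
        } else {
            pi = 3.14;
        }
    }

    double getPi() { return pi; }

    // Uncommenting this setter makes the class fail to compile, because pi is
    // final and may not be assigned outside a constructor or initializer:
    //
    // void setPi(double value) { pi = value; }

    public static void main(String[] args)
    {
        System.out.println(new FinalAssignmentDemo(true).getPi());
        System.out.println(new FinalAssignmentDemo(false).getPi());
    }
}
```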

At first glance, the two ways of defining the final PI in Java appear equivalent. But they are not. In fact, they are different in a crucial way that we'll explore in a subsequent post. Programmers who don't understand when final doesn't really mean final risk writing programs with undesired behavior.

In case you're on an interview...


A standard interview question is to ask a candidate to contrast inheritance in C++ and Java. The expected answer includes something like, "Well, C++ has multiple inheritance and Java doesn't."

But there's another difference, and folks who make the following observation display a valuable insight into the differences between the languages. "Well, I can truly call a virtual function from a Java constructor, but I can only appear to call a virtual function from a C++ constructor."

Let's digest this statement. If I try to call a virtual function from a C++ base class constructor, I'm only going to get the base class's version, not the derived class's version. In other words, a base class constructor cannot peer down into the code of a class that inherits from it.

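// C++
#include <iostream>
using namespace std;

class Base
{
public:
    virtual void f();
    Base();
};
void Base::f() { cout << "Base" << endl; }
Base::Base() { f(); }   // runs Base::f, even while a Derived is being constructed
struct Derived : Base { void f() { cout << "Derived" << endl; } };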

The rules of C++ prevent the implementation Derived::f from being executed within Base::Base. While the Base constructor runs, the object is still treated as a Base. In other words, f does not behave as a virtual function when called within a constructor.

But in Java...


Such behavior contrasts sharply with Java. In Java, the derived class's implementation does get executed. Consider the ostensibly equivalent program below.

// Java
public class Base
{
    public void f() { System.out.println("Base"); }
    public Base() { f(); }
}

public class Derived extends Base
{
    @Override public void f()
    {
        System.out.println("Derived");
    }
}

public class Main
{
    public static void main(String[] args)
    {
        new Derived();
        //
        // "Derived" gets printed, not "Base"
    }
}

Java's approach might seem to be an advantage, but it comes with a hefty price. If the Derived class has a constructor, it fires after the base class constructor. That means that the f method of the derived class executes before the derived class constructor.

Reread that and let it sink in. It implies that if the derived class constructor has any initializations to perform or invariants to enforce before Derived::f fires, then we're in trouble. Let's demonstrate this with an example.

public abstract class Abstract
{
    public Abstract() { showPi(); }
    public abstract void showPi();
}

public class Concrete extends Abstract
{
    final double PI;
    public Concrete() { this.PI = 3.14159; }
    @Override public void showPi()
    {
        System.out.println(PI);
    }
}

public class Main
{
    public static void main(String[] args)
    {
        new Concrete();
        //
        // "0.0" gets printed, not "3.14159"
    }
}

How can this be? It's as if the final PI value has changed. In fact, that's exactly what has happened. When a new Java object is allocated from the heap, all its fields are zeroed out. So when the constructor of Abstract fires, the memory location where PI lives contains zero. Later on, when the constructor for Concrete fires, that memory location is overwritten by 3.14159. Any subsequent attempts to call showPi will print "3.14159".
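The sequence of events can be traced with a small variation on the classes above. This is only a sketch: the wrapper class FinalTimelineDemo, the shared LOG list, and the run helper are my own additions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class; LOG and run() exist only to record the order of events.
class FinalTimelineDemo
{
    static final List<String> LOG = new ArrayList<>();

    static abstract class Abstract
    {
        Abstract()
        {
            LOG.add("Abstract constructor starts");
            showPi();   // dispatches to Concrete.showPi before Concrete's constructor body runs
            LOG.add("Abstract constructor ends");
        }
        abstract void showPi();
    }

    static class Concrete extends Abstract
    {
        final double PI;

        Concrete()
        {
            // The implicit super() call has already completed at this point.
            LOG.add("Concrete constructor assigns PI");
            this.PI = 3.14159;
        }

        @Override void showPi() { LOG.add("showPi sees " + PI); }
    }

    // Builds one Concrete, calls showPi again afterwards, and returns the event log.
    static List<String> run()
    {
        LOG.clear();
        Concrete c = new Concrete();
        c.showPi();   // second call, after construction is complete
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args)
    {
        run().forEach(System.out::println);
    }
}
```

The log shows showPi reporting 0.0 in between the two Abstract constructor entries, and 3.14159 only after the Concrete constructor has made its assignment.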

How serious is this problem in Java? I argue that it's not too serious, as long as developers are trained in this behavior, and they know what to expect. The greater dangers come from language quirks that surprise the coder, or from the clever coder who tries to exploit the poorly lit street corners of the language.

There are a few reasons why this problem is not too awful. First, the behavior is still deterministic. The fields of the object are all zeroed out when it's allocated from the heap, so there is no surprising cruft left in those memory addresses. No matter how many times I run my program above, I'm always going to print "0.0" and not some random bits.
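The same determinism holds for other field types. In the sketch below (the names DefaultValuesDemo, Probe, and Sample are hypothetical), the superclass constructor takes a snapshot of the subclass's final fields before they have been assigned, and it always sees the default values: 0, false, and null.

```java
// Hypothetical classes for illustration only.
class DefaultValuesDemo
{
    static abstract class Probe
    {
        final String snapshot;

        // describe() runs before the subclass constructor has assigned its fields.
        Probe() { snapshot = describe(); }

        abstract String describe();
    }

    static class Sample extends Probe
    {
        final int count;
        final boolean ready;
        final String label;

        Sample()
        {
            count = 42;
            ready = true;
            label = "pi";
        }

        @Override String describe()
        {
            return count + " " + ready + " " + label;
        }
    }

    public static void main(String[] args)
    {
        Sample s = new Sample();
        System.out.println(s.snapshot);    // state seen by the superclass constructor: 0 false null
        System.out.println(s.describe());  // state after construction: 42 true pi
    }
}
```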

Second, it's prudent for constructors only to call methods that are themselves final (meaning non-virtual). This is a common coding convention, and embracing it leads to code that's easier to understand and maintain. Tools like the fb-contrib plugin for FindBugs can enforce this convention.
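A minimal sketch of this convention follows, using made-up names (SafeConstructionDemo, Base, Derived). Because describe is final, the Base constructor can never end up running subclass code that depends on fields which have not been assigned yet.

```java
// Hypothetical classes for illustration only.
class SafeConstructionDemo
{
    static class Base
    {
        final String greeting;

        // Safe: describe() is final, so no subclass code can run here.
        Base() { greeting = describe(); }

        final String describe() { return "Base"; }
    }

    static class Derived extends Base
    {
        final String suffix;

        Derived() { suffix = "!"; }

        // Declaring describe() here would not compile, so the Base constructor
        // never observes the unassigned suffix field.
        String shout() { return greeting + suffix; }
    }

    // Builds a Derived and returns the combined text.
    static String run()
    {
        return new Derived().shout();
    }

    public static void main(String[] args)
    {
        System.out.println(run());   // prints "Base!"
    }
}
```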

Finally, classes that extend base classes know what their superclass is. It's a bit difficult to sneak dodgy behavior into a base class without being seen by the designer of the derived class, particularly when your tools will detect it. Consider that the source code of the child class itself will specify the particular base class it extends.

How serious these non-final finals are in Scala, however, may be another matter. We'll investigate this in the near future.
