Natural Software: May 2009

Wednesday, May 27, 2009

Ruby Unless Scala

Now that I'm programming primarily in Scala, I find myself missing a couple of cool tricks that Ruby offers. For example, it's neat to be able to put "unless" modifiers at the end of a line. Even if you've never seen Ruby (or Perl) before, it's easy to guess what the following code does.

# Ruby
print total unless total.zero?

It should be no surprise that if the total is zero, then nothing happens. But if the total is not zero, then it's printed.

To my knowledge, Scala has no such concept. However, with all the power Scala offers for creating internal DSLs, it might be fun to try to emulate this syntax. This post is more about demonstrating what can be done with Scala than it is about championing the use of unless modifiers in one's code.

My first attempt failed. I thought I would create a RichBoolean, analogous to a RichInt, so that I could effectively add some methods onto the Boolean class. My new class needed an "unless" method, but it had to be right associative. So, borrowing the trick used by the cons operator (::), its name would have to end in a colon.


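import scala.language.implicitConversions

class RichBoolean(b: Boolean) {
  // The name ends in ':', so "x unless_: b" is a call to b.unless_:(x)
  def unless_:(block: => Unit): Unit = {
    if (!b) block
  }
}

implicit def booleanToRichBoolean(b: Boolean): RichBoolean = {
  new RichBoolean(b)
}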

There's a lot going on up there, so let's try to tease it apart and explain it. The RichBoolean class is basically a wrapper around the Boolean class. We've effectively added an "unless_:" method to that class. The implicit function tells the compiler to convert a Boolean into a RichBoolean whenever it appears that someone is trying to call an "unless_:" method on it.

This is the standard Scala way to add methods to a class. Some languages are more open, and allow the addition of methods directly after the class is defined. Scala offers the same freedom, but with better control. Unless you're importing the booleanToRichBoolean function, you don't get the automatic conversion. I know that some folks are nervous about implicits. But because of this control, I find them safer than open classes.

Another noteworthy feature of the above is the arrow symbol (=>) in front of Unit in the parameter list. It means that the block is passed into the "unless_:" method by name, not by value. In other words, we hope that the block doesn't get evaluated before "unless_:" executes, but only inside that method, and only when b is false.

Passing by name is a remarkably powerful language feature. To the imperative programmer, it might seem like passing by reference in C++ or Fortran, but it's actually more subtle. We're not passing the address of the result of some calculation. We're actually passing a pointer to the code that computes the result. Methods that accept pass-by-name have the option to skip the calculation entirely, when that makes sense. Consider how efficient that can make a logging API!
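To see why that matters for logging, here's a minimal sketch. The Logger class, its debug method, and expensiveMessage are hypothetical, not any particular logging library's API. The only thing that matters is the "=> String" parameter type: when logging is disabled, the message expression is never evaluated at all.

```scala
object ByNameLogging {
  // Counts how often the (pretend) expensive message is actually built.
  var messagesBuilt = 0

  def expensiveMessage(): String = {
    messagesBuilt += 1
    "state dump: " + (1 to 5).mkString(",")
  }

  // Hypothetical logger. Because msg is passed by name (=> String),
  // the argument expression is evaluated only if the body uses msg.
  final class Logger(enabled: Boolean) {
    def debug(msg: => String): Unit =
      if (enabled) println("DEBUG " + msg)
  }

  // Logs once with a disabled logger and once with an enabled one, and
  // returns how many messages had been built after each call.
  def demo(): (Int, Int) = {
    messagesBuilt = 0
    new Logger(enabled = false).debug(expensiveMessage())
    val afterDisabled = messagesBuilt
    new Logger(enabled = true).debug(expensiveMessage())
    (afterDisabled, messagesBuilt)
  }
}
```

Calling ByNameLogging.demo() should report (0, 1): the disabled logger skips the work entirely, and the enabled one builds the message exactly once.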

Finally, Scala forces us to include the underscore in the "unless_:" method name: an identifier can't mix letters with operator characters such as the colon unless an underscore separates them. There's an important lesson here that's more widely applicable than this example. Never end identifiers with underscores. If one winds up next to a colon, as in a type ascription like "x_: Int", the compiler reads "x_:" as a single identifier, and you're in for a confusing error.

I tried to test out this code with a little function. I could pass either true or false into it and see what happened. It compiled fine. It just didn't do what I expected.


def demonstrate_ruby_syntax(flag: Boolean) = {
  println("flag is " + flag) unless_: flag
}

Sure enough, flag gets promoted to a RichBoolean, and the infix "unless_:" fires. But no matter whether I pass in true or false, the println always executes. This is a little surprising because we passed block by name and not by value.

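At first I was at a bit of a loss to explain this. A little instrumenting showed that the block containing the println statement executes outside the "unless_:" method, not inside it. The explanation lies in how Scala desugars a right-associative operator: "e1 op: e2" becomes "{ val x = e1; e2.op:(x) }". The left operand is evaluated into a temporary before the method is ever called, so our by-name parameter only ever sees the finished result. (Scala 2.13 later changed this rule so that a by-name parameter receives the left operand unevaluated.)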

By way of comparison, suppose we used the cons operator (::) to construct a List[Int] as follows...


  val list = 1 / 0 :: Nil

This blows up because of the divide by zero, and the stack trace reveals that the exception occurs before getting into the cons method. But that's no surprise here: the scaladocs for the cons method say that its argument is passed by value, not by name.
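To make the colon rule concrete, here is a minimal sketch. The Stack class and its push_: method are hypothetical, invented only for illustration. The point is that a method whose name ends in a colon is called on its right-hand operand, so "a push_: s" means "s.push_:(a)", exactly as "1 :: Nil" means "Nil.::(1)".

```scala
object RightAssocDemo {
  // Hypothetical immutable stack, used only to show right associativity.
  final case class Stack(items: List[String]) {
    // The name ends in ':', so "x push_: s" is rewritten to "s.push_:(x)".
    def push_:(item: String): Stack = Stack(item :: items)
  }

  val empty: Stack = Stack(Nil)

  // Operators ending in ':' also group to the right:
  // "b" push_: "a" push_: empty  ==  "b" push_: ("a" push_: empty)
  val viaOperator: Stack = "b" push_: "a" push_: empty

  // The same thing with the method calls spelled out.
  val viaMethod: Stack = empty.push_:("a").push_:("b")
}
```

Either way, the stack's items come out as List("b", "a"), the same order that "b" :: "a" :: Nil produces.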

So, unable to make my RichBoolean idea work, I next tried to put a wrapper class around the block itself. This had the advantage of letting me get rid of the colon cruft on the unless method name. I also don't think it's any more dangerous, despite the implicit, because the unless method signature admits only a Boolean.


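package ruby

import scala.language.implicitConversions

object Unless {

  class UnlessClass(block: => Unit) {
    // block is by-name: it runs only here, and only when b is false
    def unless(b: Boolean): Unit = {
      if (!b) block
    }
  }

  implicit def unitToUnlessClass(block: => Unit): UnlessClass = {
    new UnlessClass(block)
  }
}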

Happily, this approach worked. It also demonstrates a neat fact. The implicit function accepts block by name, and the UnlessClass constructor does too. Yet, block doesn't execute until the unless method is called with a false argument. This means that the Scala compiler is smart enough to let the by-name cascade through (at least) two calls.

All I have to do now is...


import ruby.Unless._

// details omitted...

  def demonstrate_ruby_syntax(flag: Boolean) = {
    println("flag is " + flag) unless flag
  }

... and my Ruby-esque unless modifier syntax works as expected. The printing only occurs when the flag is false.
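Here's a self-contained sketch of the same idea that can be checked without reading stdout. It assumes Scala 2.13 (hence the implicitConversions import) and swaps println for a hypothetical record helper that appends to a buffer; otherwise UnlessClass and the implicit conversion mirror the ruby.Unless object above.

```scala
import scala.language.implicitConversions

object UnlessDemo {
  // Mirrors ruby.Unless: the wrapped block is passed by name.
  class UnlessClass(block: => Unit) {
    def unless(b: Boolean): Unit = if (!b) block
  }

  implicit def unitToUnlessClass(block: => Unit): UnlessClass =
    new UnlessClass(block)

  // Hypothetical stand-in for println that we can inspect afterwards.
  private val log = scala.collection.mutable.ListBuffer.empty[String]
  def record(line: String): Unit = log += line

  // Returns whatever was recorded while running the Ruby-style statement.
  def demonstrate(flag: Boolean): List[String] = {
    log.clear()
    record("flag is " + flag) unless flag
    log.toList
  }
}
```

demonstrate(false) records one line, while demonstrate(true) records nothing, because the by-name block never runs.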

There's one more enhancement we can consider. Our code only compiled because println returns Unit. But what if we had some other routine that returned some other type? In such a case, we're relying on the side effects, and not the computational result of the function. This is an imperative rather than functional style, but since Scala lives in both worlds, it would still be nice to be able to use the unless modifier syntax. Consider the following contrived example.


def myfunc(flag: Boolean): Int = {
  println("myfunc flag is " + flag)
  42
}

Happily, generics can come to our rescue. By parameterizing our UnlessClass, we can implicitly convert to it from arbitrary types.


class UnlessClass[T](block: => T) {
  def unless(b: Boolean): Unit = {
    if (!b) block
  }
}

implicit def toUnlessClass[T](block: => T): UnlessClass[T] = {
  new UnlessClass[T](block)
}

Note that our new unless method still returns Unit, because we only use this construct where the return value of methods like myfunc is deliberately discarded.


  def demonstrate_ruby_syntax(flag: Boolean) = {
    myfunc(flag) unless flag
  }
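As a quick check of the generic version, here's a sketch that counts calls instead of printing. The calls counter is a hypothetical addition, and this myfunc is a stand-in for the one above; the UnlessClass[T] and toUnlessClass shapes match the post. Scala 2.13 is assumed.

```scala
import scala.language.implicitConversions

object GenericUnlessDemo {
  // Generic wrapper: T is whatever type the wrapped expression produces.
  class UnlessClass[T](block: => T) {
    // Returns Unit: the block's value, if computed, is discarded.
    def unless(b: Boolean): Unit = if (!b) block
  }

  implicit def toUnlessClass[T](block: => T): UnlessClass[T] =
    new UnlessClass[T](block)

  var calls = 0

  // Stand-in for myfunc: returns an Int, but we only care about the side effect.
  def myfunc(flag: Boolean): Int = {
    calls += 1
    42
  }

  // Returns how many times myfunc ran under the Ruby-style statement.
  def demonstrate(flag: Boolean): Int = {
    calls = 0
    myfunc(flag) unless flag
    calls
  }
}
```

Here Int has no unless method, so the compiler reaches for toUnlessClass with T inferred as Int, and myfunc runs only when flag is false.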

In summary, by emulating the Ruby unless modifier, we've demonstrated a few of the Scala language features that allow rich DSLs to be created. Along the way we learned about right associativity, implicits, passing by name, and generics.

Wednesday, May 20, 2009

Overriding Scala def With val

Last time, we created a little toy class hierarchy to demonstrate Scala injection. We also illustrated how Scala's powerful type system can keep us out of trouble. This time, we're going to explore some design tradeoffs that emerge from choosing Scala def or Scala val.

To review, we created an abstract base class that stores masses, and always reports their values in kilograms. We extended that with immutable classes that are initialized with values in various units.


abstract class Mass {
  def kilograms: Double
}

class Kilograms(kg: Double) extends Mass {
  def kilograms = kg
}

class Grams(grams: Double) extends Mass {
  def kilograms = grams / 1000.0
}

Suppose that the calculation to convert grams into kilograms was difficult and lengthy. Then the Grams implementation of the kilograms method might get us into trouble, because we'd be repeating that work needlessly every time it was called.


class Grams(grams: Double) extends Mass {
  def kilograms: Double = {
    Thread.sleep(5000) // pretend to think hard
    grams / 1000.0
  }
}

The above class constructs instantly, but every time somebody calls the kilograms method on an instance, it takes a long time. This is sad because Grams is immutable. We'd like some way to save the output of the calculation instead of the input.

Let's use the javap tool to peer into what Scala is doing under the hood. The constructor argument grams is called a class parameter in Scala-ese. Class parameters used outside of constructors, as grams is used in the kilograms method, become full-fledged private fields of the class. Consider the following (edited) snippet.


$ javap -private Grams
Compiled from "Grams.scala"
public class Grams extends Mass
    private final double grams;
    public Grams(double);
    public double kilograms();

Amazingly, Scala allows us to override the abstract "def kilograms" in Mass with a "val kilograms" in Grams. This is a lovely language feature, but it's worth spending a little energy to understand what's going on under the hood.

Let's change our kilograms def into a val in our derived classes. The following class is slow to construct, but each call to kilograms completes instantly.


class Grams(grams: Double) extends Mass {
  val kilograms: Double = {
    Thread.sleep(5000) // pretend to think hard
    grams / 1000.0
  }
}

Take a moment to digest the tradeoff. The first version is small in memory, containing only one double field, the grams class parameter. It constructs quickly, but each call to kilograms takes a long time. The second version constructs slowly, but all calls to kilograms are quick. We would prefer the first design if we expect the users of the class to call kilograms no more than once, and the second design if we expect the users to call kilograms multiple times on each Grams instance.
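Because Thread.sleep makes the tradeoff slow to observe, here's a sketch that counts conversions instead. The slowConvert helper, the conversions counter, and the GramsDef/GramsVal names are hypothetical stand-ins for the two Grams designs above.

```scala
object DefVersusVal {
  abstract class Mass {
    def kilograms: Double
  }

  // Counts how many times the "expensive" conversion actually runs.
  var conversions = 0

  private def slowConvert(grams: Double): Double = {
    conversions += 1 // stands in for Thread.sleep(5000)
    grams / 1000.0
  }

  // First design: recompute on every call.
  class GramsDef(grams: Double) extends Mass {
    def kilograms: Double = slowConvert(grams)
  }

  // Second design: compute once, during construction.
  class GramsVal(grams: Double) extends Mass {
    val kilograms: Double = slowConvert(grams)
  }

  // Returns the conversion count after construction, then after three calls.
  def countFor(make: Double => Mass): (Int, Int) = {
    conversions = 0
    val m = make(500.0)
    val afterConstruction = conversions
    (1 to 3).foreach(_ => m.kilograms)
    (afterConstruction, conversions)
  }

  def defDesign: (Int, Int) = countFor(new GramsDef(_))
  def valDesign: (Int, Int) = countFor(new GramsVal(_))
}
```

The def design should report (0, 3), with nothing done at construction and one conversion per call, while the val design should report (1, 1), with all of the work done up front.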

In the second design, the grams class parameter appears to be used nowhere but in the constructor itself when the "val kilograms" is defined. So, one might expect that it will not become a real field in the Grams class. Trusty javap confirms this suspicion. Consider the following (again edited) snippet.


$ javap -private Grams
Compiled from "Grams.scala"
public class Grams extends Mass
    private final double kilograms;
    public Grams(double);
    public double kilograms();

Note that under the hood, despite being declared a val in the Scala source code, kilograms is also a method. A moment's reflection (no pun intended) will tell us that it has to be a method. Grams is a concrete class that extends an abstract class with a pure virtual kilograms method. So even though the Scala source hides it, kilograms is still a method of Grams.

What is that public kilograms method up to? Again we appeal to javap, and learn that it's doing nothing except returning the double stored in the private kilograms field. Just as we might have expected.


public double kilograms();
  Code:
   Stack=2, Locals=1, Args_size=1
   0:   aload_0
   1:   getfield #30;     // Field kilograms:D
   4:   dreturn

The above is much shorter than the previous version, which performed the expensive calculation. Again, we conclude that the criteria to prefer one design over the other rests on the expected usage patterns of our class, as explored above.

We should also ask ourselves whether it's possible to delay the expensive calculation, possibly indefinitely, in case it's never needed. This third design would represent the classic programming tradeoff between space and time, and we'll take it up in a later post.
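The post leaves that third design for later, so treat this only as one plausible sketch, not the author's eventual solution. Scala's lazy val defers the calculation until the first call and then caches the result. The conversions counter is a hypothetical stand-in for the expensive work. In Scala 2, the price is an extra initialization flag in each instance and some synchronization on first access.

```scala
object LazyGrams {
  abstract class Mass {
    def kilograms: Double
  }

  var conversions = 0

  // A lazy val is computed on first access and then cached, so
  // construction is cheap and repeat calls are quick.
  class Grams(grams: Double) extends Mass {
    lazy val kilograms: Double = {
      conversions += 1 // stands in for the expensive work
      grams / 1000.0
    }
  }

  // Returns the count after construction, the count after two accesses,
  // and the sum of those two accesses.
  def demo(): (Int, Int, Double) = {
    conversions = 0
    val g = new Grams(250.0)
    val afterConstruction = conversions
    val total = g.kilograms + g.kilograms // two accesses, one computation
    (afterConstruction, conversions, total)
  }
}
```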

In summary, we've seen that it's possible to override a Scala def with a Scala val. Under the hood, the override is still implemented by a method. The javap tool is very useful to help us figure out what's going on, and one would do well to understand the design tradeoffs of each approach. Scala's marriage of object oriented programming with functional programming is made in heaven. We can use inheritance and exploit immutability, enjoying the flexibility to make considered design choices.

Wednesday, May 13, 2009

Sweet Scala Injection

Before getting back to non-final finals, let's consider a fun diversion. The second most famous equation in Physics is Newton's second law of motion, F = ma. When you apply a force F to an object of mass m, it accelerates at rate a.

Of course, your arithmetic only gives you the right answer if you're consistent in the measurement system you pick. There are two major systems of units in use. One is the metric system, or SI (Système International d'Unités), formerly called the mks system. The letters stand for meter, kilogram, and second, which are the principal units used to measure length, mass, and time.

The other main system in use is also the metric system. (Gotcha.) It's called the cgs system, whose letters stand for centimeter, gram, and second. You have to take care to keep your units straight to use formulas like F = ma. The units for force are named newtons in the mks system and dynes in the cgs system. But if you multiply a gram times an acceleration recorded in meters per second per second, you'll get neither a newton nor a dyne.
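A quick worked conversion shows how far apart the two force units are, and what happens when the systems are mixed. Everything here follows from F = ma together with 1 kg = 10^3 g and 1 m = 10^2 cm.

```latex
% A force unit is a mass unit times an acceleration unit (F = ma).
1\,\mathrm{N} = 1\,\mathrm{kg}\cdot\mathrm{m\,s^{-2}}
  = (10^{3}\,\mathrm{g})(10^{2}\,\mathrm{cm})\,\mathrm{s^{-2}}
  = 10^{5}\,\mathrm{g\cdot cm\,s^{-2}}
  = 10^{5}\,\mathrm{dyn}

% Mixing systems: grams times meters per second squared
1\,\mathrm{g}\cdot\mathrm{m\,s^{-2}} = 10^{-3}\,\mathrm{N} = 10^{2}\,\mathrm{dyn}
```

So a gram times a meter per second squared is a thousandth of a newton, or a hundred dynes, and is neither unit unless you convert.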

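The 1999 loss of the Mars Climate Orbiter, whose ground software reported thruster impulse in pound-force seconds where newton seconds were expected, is a painful demonstration that units really matter.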

Since I like to blog about how thinking like a scientist makes me a better coder, I'll mention that it's unnatural to think, "oh, this book weighs two." Such a sentence might be grammatically correct, but without specifying the units, it's meaningless.

Scala has an especially thoughtful type system, and we can press it into service to keep our units straight when we do calculations. In this (and the next) post, we'll create a toy program to illustrate one or two Scala goodies.

Kilograms and grams both measure mass. It's not too much of a stretch to use the "is-a" relationship in an object oriented language to capture this notion. In what follows, Kilograms and Grams inherit from Mass.


abstract class Mass {
  def kilograms: Double
}

class Kilograms(kg: Double) extends Mass {
  def kilograms = kg
}

class Grams(grams: Double) extends Mass {
  def kilograms = grams / 1000.0
}

Our base class has a kilograms method that returns the amount of mass in the mks units. All our calculations will be done in mks units, but the programmer is free to initialize a mass variable with either kilograms or grams.

Now let's construct a Force class. In a full-fledged example, we'd probably make it an abstract class extended by Newtons and Dynes. But we don't need such a complete solution here to demonstrate the ideas. Give the class an accelerates method, which tells how much the given force in newtons will accelerate a specified mass.


class Force(newtons: Double) {
  def accelerates(mass: Mass) =
    (newtons / mass.kilograms) + " meters per sec^2"
}

Note that the accelerates method doesn't care whether it's passed a value in kilograms or in grams. All it's demanding is a mass, and since that offers a method to take us into mks-land, we can assuredly report our acceleration in meters per second per second.

Now, let's define a force of half a newton, and run a little program to see how much this force will accelerate a couple of masses. In each case below, there's no ambiguity about whether each mass is expressed in kilograms or grams, because the units are explicitly specified.


object MyApp extends App {
  val force = new Force(0.5)
  println(force accelerates (new Kilograms(4.0)))
  println(force accelerates (new Grams(100)))
  //
  // "0.125 meters per sec^2"
  // "5.0 meters per sec^2"
}

The parentheses around "new Kilograms(4.0)" are actually redundant, which might surprise a Java programmer. Scala also lets us omit the dot between force and accelerates, which arguably improves readability.

So, the above works, but specifying "new Kilograms" everywhere I need to define a mass is a hassle. More importantly, it hurts readability, because there is no "new" anywhere in my mental model of the F = ma equation.

Fortunately, Scala offers injections, which can pretty up the source code. In C++, I can construct an instance on the stack without calling new. Although all instances in Scala live on the heap, I find the syntax reminiscent of C++ constructors.

We want to be able to write "Kilograms(4.0)" instead of "new Kilograms(4.0)" when we use our concrete Mass classes. To do this, create a Scala companion object of the same name as the class, and give it an apply method.


object Kilograms {
  def apply(kg: Double) = new Kilograms(kg)
}

object Grams {
  def apply(grams: Double) = new Grams(grams)
}

These functions are called injections. Basically, they are factory methods on the companion objects, but we don't need to call apply explicitly: "Kilograms(4.0)" is shorthand for "Kilograms.apply(4.0)". This is the same syntactic sugar that lets us write "List(1, 2, 3)", which calls List.apply (List itself is abstract, so "new List(1, 2, 3)" wouldn't even compile). It pretties up our code nicely.


  println(force accelerates Kilograms(4.0))
  println(force accelerates Grams(100))
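Pulling the pieces together, here's a self-contained sketch that should compile as a single file under Scala 2.13. The InjectionDemo wrapper object is only there to keep the names together, and accelerates uses string interpolation rather than +. Otherwise it's the Mass, Force, and companion-object code from this post.

```scala
object InjectionDemo {
  abstract class Mass {
    def kilograms: Double
  }

  class Kilograms(kg: Double) extends Mass {
    def kilograms: Double = kg
  }

  class Grams(grams: Double) extends Mass {
    def kilograms: Double = grams / 1000.0
  }

  // Companion objects with apply methods: the "injections".
  object Kilograms {
    def apply(kg: Double): Kilograms = new Kilograms(kg)
  }

  object Grams {
    def apply(grams: Double): Grams = new Grams(grams)
  }

  class Force(newtons: Double) {
    def accelerates(mass: Mass): String =
      s"${newtons / mass.kilograms} meters per sec^2"
  }

  val force = new Force(0.5)
}
```

With InjectionDemo._ imported, "force accelerates Kilograms(4.0)" gives "0.125 meters per sec^2" and "force accelerates Grams(100)" gives "5.0 meters per sec^2", matching the output listed earlier.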

Note that we have made a tradeoff for this sweetness. We had to write more code (the injections) when defining our classes, so we could make life easier on the users of the classes. But this is almost always the way to go. Readability is important.

Readability is also the reason that the accelerates method takes a Mass instance and not a plain Double. The extra word "Kilograms" or "Grams" doesn't help the computer, but it does help the human.

(However, the astute reader will have noticed that the kilograms method of the Grams class is inefficient. It performs a double precision floating point calculation every time it is called, even though the instance itself is immutable. If only there were a way to save the result of the calculation instead of the inputs, then we could run faster without worsening our memory footprint. Contemplating this is a topic for another day.)

In conclusion, tastefully applied Scala injections enhance readability. And they're more digestible than Martian soil coming towards you at a rate of, uhm, really fast.

Wednesday, May 6, 2009

And That's Final, Not!

This post is about C++ and Java, but it really offers some necessary background material to explore an interesting issue facing Scala. What follows is hopefully widely known by C++ and Java veterans, but it's still worth reviewing here so that we're all on the same page when we talk about Scala in the near future.

C++ fans are often encouraged not to use #defines for their constants, in part because the preprocessor has no notion of types. For example, Scott Meyers champions this idea. Instead of writing...

#define PI 3.14159

...which performs a simple textual substitution everywhere in the source file, the following is usually preferred:

const double PI = 3.14159;

The latter alternative helps the compiler supply more meaningful error messages. If the preprocessor were used instead, then the compiler would never have heard of the lexeme "PI", and couldn't include it in any error messages.

We don't have a preprocessor in Java. We also lack a usable const keyword. Instead we use final to describe variables whose values will not change. Unfortunately, just as static has multiple meanings in C++, final has multiple meanings in Java.

In C++, methods are non-virtual by default, and must be given a special keyword, virtual, to denote that they are polymorphic. The Java philosophy is different. In Java, methods are virtual by default, and must be given a special keyword, final, to denote that they can not be overridden. So this keyword pulls double duty in Java: for methods final means non-virtual, and for fields it implies constant.

Another difference with C++ is that we don't have standalone variables in Java. We put them inside a class as below. In order to explore the issue at the heart of this blog entry, we deliberately do not make the field below static.


public class MyClass extends YourClass
{
    public final double PI = 3.14159;
    //
    //... details omitted
}

A nearly equivalent way to define PI would be in a constructor. It's noteworthy that the final fields of a class can only be assigned where they are declared, in an instance initializer block, or in a constructor. A "set" method to change a final field would not compile.


public class MyClass extends YourClass
{
    public final double PI;

    public MyClass()
    {
        PI = 3.14159;
    }
    //
    //... details omitted
}

At first glance, the two ways of defining the final PI in Java appear equivalent. But they are not. In fact, they are different in a crucial way that we'll explore in a subsequent post. Programmers that don't understand when final doesn't really mean final risk writing programs with undesired behavior.

In case you're on an interview...

A standard interview question is to ask a candidate to contrast inheritance in C++ and Java. The expected answer includes something like, "Well, C++ has multiple inheritance and Java doesn't."

But there's another difference, and folks who make the following observation display a valuable insight into the differences between the languages. "Well, I can truly call a virtual function from a Java constructor, but I can only appear to call a virtual function from a C++ constructor."

Let's digest this statement. If I call a virtual function from a C++ base class constructor, I'm only going to get the base class's version, not the derived class's version. In other words, it's forbidden for a base class constructor to peer down into the code of a class that inherits from it.


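// C++
#include <iostream>
class Base
{
public:
    Base() { f(); }   // calls Base::f, even while constructing a Derived
    virtual void f() { std::cout << "Base" << std::endl; }
};

class Derived : public Base
{
public:
    virtual void f() { std::cout << "Derived" << std::endl; }
};

int main() { Derived d; }   // prints "Base", not "Derived"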

The rules of C++ prevent the implementation Derived::f from being executed within Base::Base. In other words, f does not behave as a virtual function when called within a constructor.

But in Java...

Such behavior contrasts sharply with Java. In Java, the derived class's implementation does get executed. Consider the ostensibly equivalent program below.


// Java (all in one file, Main.java)
class Base
{
    public void f() { System.out.println("Base"); }
    public Base() { f(); }
}

class Derived extends Base
{
    @Override public void f()
    {
        System.out.println("Derived");
    }
}

public class Main
{
    public static void main(String[] args)
    {
        new Derived();
        //
        // "Derived" gets printed, not "Base"
    }
}

Java's approach might seem to be an advantage, but it comes with a hefty price. If the Derived class has a constructor, it fires after the base class constructor. That means that the f method of the derived class executes before the derived class constructor.

Reread that and let it sink in. It implies that if the derived class constructor has any initializations to perform or invariants to enforce before Derived::f fires, then we're in trouble. Let's demonstrate this with an example.


// Java (all in one file, Main.java)
abstract class Abstract
{
    public Abstract() { showPi(); }   // virtual call from a constructor
    public abstract void showPi();
}

class Concrete extends Abstract
{
    final double PI;
    public Concrete() { this.PI = 3.14159; }   // runs after Abstract()
    @Override public void showPi()
    {
        System.out.println(PI);
    }
}

public class Main
{
    public static void main(String[] args)
    {
        new Concrete();
        //
        // "0.0" gets printed, not "3.14159"
    }
}

How can this be? It's as if the final PI value has changed. In fact, that's exactly what has happened. When a new Java object is allocated from the heap, all its fields are zeroed out. So when the constructor of Abstract fires, the memory location where PI lives contains zero. Later on, when the constructor for Concrete fires, that memory location is overwritten by 3.14159. Any subsequent attempts to call showPi will print "3.14159".

How serious is this problem in Java? I argue that it's not too serious, as long as developers are trained in this behavior, and they know what to expect. The greater dangers come from language quirks that surprise the coder, or from the clever coder who tries to exploit the poorly lit street corners of the language.

There are a few reasons why this problem is not too awful. First, the behavior is still deterministic. The fields of the object are all zeroed out when it's allocated from the heap, so there is no surprising cruft left in those memory addresses. No matter how many times I run my program above, I'm always going to print "0.0" and not some random bits.

Second, it's prudent for constructors only to call methods that are themselves final (meaning non-virtual). This is a common coding convention, and embracing it leads to code that's easier to understand and maintain. Tools like the fb-contrib plugin for Findbugs can enforce this convention.

Finally, classes that extend base classes know what their superclass is. It's a bit difficult to sneak dodgy behavior into a base class without being seen by the designer of the derived class, particularly when your tools will detect it. Consider that the source code of the child class itself will specify the particular base class it extends.

How serious these non-final finals are in Scala, however, may be another matter. We'll investigate this in the near future.