Natural Software: June 2009

Wednesday, June 24, 2009

More Scala Using RAISIN

Last time, we offered a minimally functional emulation of C#'s using syntax, to manage resources elegantly in Scala. We defined a curried function, whose second argument was a simple block of code. We'll refine that approach and try to bring about the remaining goals we set for ourselves for this feature.


  def using[T <% Disposable]
  (resource: T)(block: => Unit) = {
    try {
      block
    }
    finally {
      resource.dispose
    }
  }

One problem with our first cut was that the object encapsulating the managed resource had a larger scope than we wanted. Since we constructed our FileHandle instance outside of the block that used it, one could accidentally access it after it had been disposed.


  val handle = new FileHandle("trouble")
  using(handle) {
    handle.read
    handle.write(42)
  }
  // big trouble below!
  handle.read

What we really need is not to pass a Unit into the using function, but a function that accepts the resource as its argument. In other words, we'd like to be able to make a useful function and pass that as an argument into the using method.


  def useful_function(handle: FileHandle): Unit = {
    handle.read
    handle.write(42)
  }

  // pseudo-code to capture the idea
  //
  using(new FileHandle("good"), useful_function)

That's the gist of what we want to do, but we don't want all the cruft of declaring the useful function separately. Happily, Scala allows us to use function literals to write the above very economically.


  using(new FileHandle("good")) { handle =>
    handle.read
    handle.write(42)
  }
  //
  // handle is not visible down here and
  // can't be abused, Yay

For this to work, we have to refine our using method. All we have to do is change the second argument from the by-name type => Unit to the function type T => Unit, and make sure to call the block with the expected T resource.


  def using[T <% Disposable]
  (resource: T)(block: T => Unit) {
    try {
      block(resource)
    }
    finally {
      resource.dispose
    }
  }

Our using function is pretty powerful now. Without any modifications, it works with closures as well as plain function literals. Let's alter the client code a bit to demonstrate. The following function literal is a closure because it captures i, which is defined outside the curly braces delimiting the code passed into using.


  def demonstrate_closure(i: Int) = {
    using (new FileHandle("simple")) { handle =>
      handle.read
      handle.write(i)
    }
  }

Still, there are additional things we can do in the body of our using method. For example, we could take special action if the resource passed in were null. Alternatively, we could wrap the dispose calls inside a try-catch block to prevent them from emitting exceptions.
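Here is a minimal sketch of both ideas, not a definitive implementation. It assumes a stand-in Disposable trait, uses an upper bound in place of our view bound to keep the example short, and invents a Grumpy resource whose dispose always fails. Rejecting null with require is just one possible "special action."

```scala
// Hypothetical stand-in for the Disposable trait from the earlier post.
trait Disposable {
  def dispose(): Unit
}

object DefensiveUsing {
  // Simplification: an upper bound replaces the post's view bound.
  def using[T <: Disposable](resource: T)(block: T => Unit): Unit = {
    // Special action for null: fail fast, before the block can touch it.
    require(resource != null, "resource must not be null")
    try {
      block(resource)
    } finally {
      try {
        resource.dispose()
      } catch {
        // Swallow failures from dispose so they cannot mask an
        // exception thrown by the block.
        case _: Exception => ()
      }
    }
  }
}

// Hypothetical resource whose dispose always fails.
class Grumpy extends Disposable {
  var used = false
  def dispose(): Unit = throw new IllegalStateException("dispose failed")
}
```

With this version, a call such as DefensiveUsing.using(new Grumpy) { g => g.used = true } completes normally even though dispose throws.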

C++ uses compile-time overloading to choose different behaviors for some functions. For example, the new operator comes in different overloaded flavors. One takes a throwaway argument of type nothrow_t to indicate that the desired version of new will return NULL when it fails, instead of throwing an exception.

In Scala, a tried and true way to choose different behaviors at compile time is with import statements. For example, if you want a mutable Set in Scala, you


  import scala.collection.mutable.Set

This inherits from the same Set trait as the immutable version, so the logic where the class is used is clean. Although the C++ nothrow_t concept is interesting, Scala's approach appears to have a better separation of concerns, and results in uncluttered code.

If we are so inclined, we can do something analogous with our using method. We could choose to import from one package where the implementation swallows Throwables emitted by dispose. Or, we could import from another where they are allowed to propagate. In other words, we can handle exceptions quite intelligently, and customize our behavior depending on context.
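To make that concrete, here is one way it might look. The object names Quiet and Strict are hypothetical, as is the FailsOnDispose resource, and an upper bound again stands in for the view bound. The client code is identical in both cases; only the import differs.

```scala
// Hypothetical stand-in for the Disposable trait from the earlier post.
trait Disposable { def dispose(): Unit }

// Two hypothetical homes for using; the import picks the policy.
object Quiet {
  def using[T <: Disposable](resource: T)(block: T => Unit): Unit =
    try block(resource)
    finally {
      try resource.dispose()
      catch { case _: Exception => () } // failure is swallowed
    }
}

object Strict {
  def using[T <: Disposable](resource: T)(block: T => Unit): Unit =
    try block(resource)
    finally resource.dispose() // failure propagates to the caller
}

// Hypothetical resource whose dispose always fails.
class FailsOnDispose extends Disposable {
  def dispose(): Unit = throw new IllegalStateException("dispose failed")
}

object QuietClient {
  import Quiet._
  def run(): Boolean = {
    using(new FailsOnDispose) { _ => () }
    true // reached, because the failure was swallowed
  }
}

object StrictClient {
  import Strict._
  def run(): Boolean =
    try {
      using(new FailsOnDispose) { _ => () }
      true
    } catch {
      case _: IllegalStateException => false // the failure reached us
    }
}
```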

Finally, let's consider whether we can avoid needing to nest using clauses, and manage the disposal of multiple resources more elegantly. This is possible, but there's one important subtlety that we have to worry about.


def using[T <% Disposable, U <% Disposable]
(resource: T, _resource2: => U)(block: (T,U) => Unit) {
  try {
    val resource2 = _resource2
    try {
      block(resource, resource2)
    }
    finally {
      resource2.dispose
    }
  }
  finally {
    resource.dispose
  }
}

Note that the _resource2 argument is passed by name, and not by value. We don't actually access it until declaring the val resource2 inside the outer try block. This means that if the construction of resource2 fails, we will still call dispose on the other resource.

Let's demonstrate this. Suppose our first resource object constructs okay, but the second one throws an exception in its constructor. This is standard behavior for a RAISIN class, which disallows partially constructed instances.


  def two_resources() = {
    using (new FileHandle("okay"), new FileHandle("bad")) {
      (first, second) =>
      second.write(first.read)
    }
  }

If that second FileHandle constructor fires before entering the using method, then we have a resource leak! The first FileHandle is never disposed. But, because we pass the second argument by name, the second constructor does not fire before entering the using method. We're essentially passing a pointer to the constructor into the using function, which calls it.

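Why pass just the second one by name and not the first one? Did we just get lucky? No. Scala evaluates its arguments from left to right, so the first resource is the first thing constructed. If its constructor throws, nothing has been acquired yet and there is nothing to leak.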
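We can check the failure path with a small experiment. This is only a sketch: Res is a hypothetical resource that logs its acquisition and disposal and can be told to fail in its constructor, and an upper bound stands in for the view bound.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the Disposable trait from the earlier post.
trait Disposable { def dispose(): Unit }

object TwoResources {
  val log = ListBuffer.empty[String]

  // Hypothetical resource: construction fails when ok is false.
  class Res(name: String, ok: Boolean = true) extends Disposable {
    if (!ok) throw new IllegalStateException("cannot acquire " + name)
    log += "acquire " + name
    def dispose(): Unit = log += "dispose " + name
  }

  // Same shape as the two-resource using method above.
  def using[T <: Disposable, U <: Disposable](resource: T, _resource2: => U)(block: (T, U) => Unit): Unit = {
    try {
      val resource2 = _resource2 // the second constructor fires here
      try block(resource, resource2)
      finally resource2.dispose()
    } finally resource.dispose()
  }

  def run(): List[String] = {
    log.clear()
    try {
      using(new Res("first"), new Res("second", ok = false)) { (a, b) =>
        log += "block"
      }
    } catch {
      case _: IllegalStateException => log += "caught"
    }
    log.toList
  }
}
```

Calling TwoResources.run() yields List("acquire first", "dispose first", "caught"). The block never runs, yet the first resource is still released.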

A consequence of this choice is that we cannot access the _resource2 argument more than once inside the using method. Note that it's accessed exactly once when defining the val resource2. Otherwise, the constructor would be called again and again inside the using method. That would be an even worse resource leak, and would probably malfunction.
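The repeated evaluation is easy to observe. In this sketch, Counted is a hypothetical class that counts how many times its constructor runs, and careless and careful are illustrative names.

```scala
object ByNameCount {
  var constructed = 0

  // Hypothetical resource that counts its own constructions.
  class Counted { constructed += 1 }

  // Reads the by-name argument twice, so the constructor runs twice.
  def careless(r: => Counted): (Counted, Counted) = (r, r)

  // Reads it once into a val, as the using method does.
  def careful(r: => Counted): (Counted, Counted) = {
    val once = r
    (once, once)
  }

  def carelessCount(): Int = { constructed = 0; careless(new Counted); constructed }
  def carefulCount(): Int = { constructed = 0; careful(new Counted); constructed }
}
```

Here carelessCount() returns 2 and carefulCount() returns 1.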

We've now shown that our C# emulation meets all but one of our goals. This is impressive because the Scala behavior is superior even to C# itself, for example with regard to limiting the scope of variables. The remaining goal is to demonstrate how our using construct can work with legacy classes such as java.io.FileInputStream that do not extend Disposable. We'll take up this cause in the near future, after a detour into some decidedly non-standard C++. But the punchline is, we had the foresight to use view bounds and not upper bounds, so we're well prepared.
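As a preview, here is a rough sketch of how a view might adapt a legacy class. Everything here is an assumption and the eventual solution may look different. The view closeableToDisposable and the TrackedStream class are hypothetical, and the view bound is spelled out as the implicit parameter it stands for, so the sketch does not depend on the <% syntax.

```scala
import java.io.Closeable

// Hypothetical stand-in for the Disposable trait from the earlier post.
trait Disposable { def dispose(): Unit }

object LegacyUsing {
  // Hypothetical view: adapts any java.io.Closeable to Disposable.
  implicit val closeableToDisposable: Closeable => Disposable =
    c => new Disposable { def dispose(): Unit = c.close() }

  // A view bound T <% Disposable is shorthand for this implicit parameter.
  def using[T](resource: T)(block: T => Unit)(implicit view: T => Disposable): Unit =
    try block(resource)
    finally view(resource).dispose()
}

// Hypothetical legacy class: it has close but knows nothing of Disposable.
class TrackedStream extends Closeable {
  var closed = false
  def close(): Unit = { closed = true }
}

object LegacyDemo {
  import LegacyUsing._
  def run(): Boolean = {
    // Kept in a val only so we can inspect it afterwards.
    val s = new TrackedStream
    using(s) { in => assert(!in.closed) }
    s.closed
  }
}
```

LegacyDemo.run() returns true: the stream was open inside the block and closed afterwards, without TrackedStream ever extending Disposable.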

In summary, we've shown how to emulate the C# using syntax in Scala, to enable RAISIN style programming. We were remarkably successful at bullet-proofing our resource management with surprisingly few lines of code. We handled many edge cases, offered flexibility, and achieved ambitious goals. Along the way, we encountered function literals, closures, pass by name, generics, view bounds, import statements, and (presently) implicits.

This was a lovely exercise because so many different aspects of Scala had to come together in harmony. It's clear that API designers must master these features to produce high quality code, but even casual programmers would do well to learn them.

Wednesday, June 17, 2009

Scala Using RAISIN

Last time, we touched on RAISIN, and considered Java's inability to support this programming style to be an important deficiency of the language. We also promised to explore whether Scala could emulate the C# approach to deterministic destructors. We take up that challenge presently, and we're going to find that a wide variety of Scala features all come together to make this happen.

Implementing RAISIN is a little tougher than our Ruby "unless" modifier, where the task was pretty narrow and well understood. So before we begin, let's capture the goals we should set for emulating -- and surpassing -- the C# "using" syntax inside Scala.

Beautiful, readable code
Obliging the user to do very little
Handling multiple resources at once
Preventing stale objects from being accessed
Prefer immutable & avoid nulls
Intelligent exception handling
Flexible enough for arbitrary resources

Beautiful, readable code

This is always the prime directive. Suppose we had our FileHandle class, and we have to get rid of its associated resource after we use it. We should tolerate nothing uglier than what we'd see in C#.


  // Scala wishful thinking
  //
  val handle = new FileHandle("myfile")
  using(handle) {
    // Either of the following methods might
    // throw, but that's okay.
    //
    handle.read
    handle.write(42)
  }

Obliging the user to do very little

We really want to avoid having to repeat all the try-finally scaffolding in the user's code, which Java would require. We also don't want the user to have to understand the details of how to free up the resources. Maybe something as simple as...

import csharp._

...should be sufficient to make the using syntax available to the programmer's code.

Handling multiple resources at once

Rather than nesting one using clause inside another, it would be nice to follow C#'s practice of allowing multiple resources inside one using statement. This also aligns with the functionality afforded by C++, in which we can put multiple objects on the stack inside the same block, illustrated below.


// C++
{
  FileHandle const h1 = // details omitted
  FileHandle const h2 = // details omitted

  // Use h1 and h2 freely here.  Even if the
  // construction of h2 failed, h1 still
  // gets released.  That's important
  //
}

Preventing stale objects from being accessed

This is an opportunity for our Scala solution to shine. Reconsidering our first example above, we'd like the handle to have the smallest possible scope.


  val handle = new FileHandle("myfile")
  using(handle) {
    // Either of the following methods might
    // throw, but that's okay.
    //
    handle.read
    handle.write(42)
  }

// It would be nice if we could somehow make the
// compiler prevent spurious accesses of the handle
// down here.  We want to deny access to disposed
// objects.

Prefer immutable & avoid nulls

We'd like to use val rather than var wherever we can. This is analogous to using Java final when declaring variables. We'd also like to be assured that the resource is constructed correctly, and not null.

These desires may compel us to put the initialization, meaning the resource acquisition, somehow inside the using clause where it can be managed well.

Intelligent exception handling

It's a well known coding practice in C++ to code destructors so that they do not emit exceptions. However, no such convention exists for common Java classes. For example, the java.io.FileInputStream.close method throws java.io.IOException. We need a way to handle such exceptions intelligently.

Flexible enough for arbitrary resources

In C++, any class can have a meaningful destructor, so previously designed classes can be used in the RAISIN style. In C#, we're constrained to use only classes that implement the IDisposable interface, and the cleanup has to be done in the Dispose method.

This means that ordinary classes like java.io.FileInputStream, which has a close method instead of a dispose method, will pose some difficulties when we try to wrap them in a C#-like "using" clause. Yet, Scala is powerful, and it's a reasonable goal to overcome these limitations.

With all these goals in mind, let's not try to bite off too much at once. Last time, our zeroth cut defined a Disposable trait and a FileHandle that extends it. This time, we'll also want a using function that accepts a Disposable object and a block of code to be executed.


// First cut...
package csharp

object Using {
  def using[T <% Disposable](resource: T)(block: => Unit) {
    try {
      block
    }
    finally {
      resource.dispose
    }
  }
}

There's a lot going on in that method, so let's tease it apart carefully. First, it's a parameterized function, where the resource argument must be of type T. The <% notation is a view bound. It means that type T must inherit from Disposable or be transformable into Disposable by an implicit.

(It's not obvious yet why we need view bounds, or even an upper bound. This is just a little adumbration for how we're going to achieve some of our trickier goals, such as "preventing stale objects from being accessed," and "flexible enough for arbitrary resources." We won't get there in this post, but have patience.)

Second, the using method has two argument lists, rather than a single list of comma delimited arguments. Put another way, using is a curried function, as evidenced by two sets of parentheses instead of just one. This syntax allows the second argument to be a block of code in curly braces, rather than something inside using's parentheses.

Third, note that the arrow notation implies that the block is passed by name, not by value. This means that the code won't actually execute until block is called inside the try clause of the using method. It does not execute before using is entered.
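The difference in timing is easy to see side by side. In the following sketch, byValue and byName are illustrative names, and the log simply records the order in which things happen.

```scala
import scala.collection.mutable.ListBuffer

object ByNameOrder {
  val log = ListBuffer.empty[String]

  // By value: the argument is evaluated before the method is entered.
  def byValue(block: Int): Int = { log += "inside"; block }

  // By name: the argument is evaluated where the method refers to it.
  def byName(block: => Int): Int = { log += "inside"; block }

  def run(): List[String] = {
    log.clear()
    byValue { log += "value block"; 1 }
    byName { log += "name block"; 2 }
    log.toList
  }
}
```

ByNameOrder.run() yields List("value block", "inside", "inside", "name block"). Only the by-name block runs after the method has been entered, which is what lets using wrap it in try-finally.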

Since our toy FileHandle class (defined in a previous post) inherits from Disposable, we can write the following.


import csharp.Using._
import raisin.FileHandle

object Main {

  def simple_usage = {
    val handle = new FileHandle("simple")
    using(handle) {
      handle.read
      handle.write(42)
    }
  }

  // details omitted
}

That's not bad for a first cut. We've achieved our first two goals, but we still have a long way to go in future posts to make progress on the others.

In summary, we've taken some steps towards implementing RAISIN in Scala, taking the C# using syntax as a model. Along the way, we've seen view bounds, curried functions, and pass-by-name. The latter two language features allow the user's code to be beautiful.

Wednesday, June 10, 2009

Software Development Process

A process is the collection of practices followed in an organization. It identifies the hats worn by people, and the artifacts they produce and consume. It names the responsibilities that the workers fulfill, and the workflows through which their artifacts pass. Also, a process likely includes at least some of the tools used, because automation is a big part of getting things done.

Examples of software development processes include RUP (Rational Unified Process), Scrum, and Waterfall. To make a coding analogy, one might argue that a certain project instantiates a development process just as an object instantiates a class.

A process not only reflects the activities of the participants, it also guides their efforts. However, keeping with the coding analogy, the humans are the virtual machine in which the process instance runs. Therefore, people are the heart of any process, and processes are always malleable. Even if a process purports to be rigid, it will not likely be followed for very long.

Processes can be documented, but a process description is no more a real process than a virus is a living cell.

The metaphor is apt. Practices are captured in memes. For example, champions of test driven development self identify as "test infected." Very few developers who have not actually tried TDD and seen that it changes the way code gets designed could have gleaned this effect just from reading a book.

A good process will reproduce, evolve, and spread its success far and wide. But just as some organisms can't live in some environments, the ecosystem has to be receptive to the practices embraced in a process for them to take root. There are no "best practices." Context is everything.

Successful processes arm decision makers with timely information, and offer guidance for resolving problems. As a corollary, the more empowered the workers are, the more freely available such information must be, because there are more decision makers shaping progress. The contrapositive also follows. Without transparency, success rests on the talents of just a few privileged individuals.

Useful processes permit the measurement of and influence over:

Quality
Costs
Progress
Growth

By Quality, of course we mean customer satisfaction. What's not quite so obvious is that many people in the organization wear the customer hat for various artifacts and services during development.

By Costs, we mean the financial expenditures for salary, tools, training, hardware, and so on. (This is sometimes more difficult than it would appear, because a single software effort could have multiple funders, each interested in different features being developed.)

By Progress, we mean the maturing of the artifacts, such as code, documentation, and models, into a consumable or sellable state. Often, the careful monitoring of progress is especially important to certain stakeholders.

By Growth, we mean the professional growth of the human beings who are developing goods. This includes skills improvement, job satisfaction, value to the organization, and contributions to the profession and the art.

Wednesday, June 3, 2009

Hey Scala, Finalize This

Years ago, when I moved from C++ to Java, I expected to miss a few things. My daily workhorses like templates & the STL were not available in the new environment. When it came to API design, which is really mini-language design, I could no longer rely on operator overloading. Even little efficiencies like inline functions were denied me.

But, it turns out, I didn't really long for any of those things as much as I anticipated. What I really missed, I mean what I felt no programmer could live without, was the deterministic destructor.

Bjarne Stroustrup champions RAISIN, Resource Acquisition IS INitialization. The idea is to represent acquired resources as class instances on the stack. So when those objects fall out of scope, a destructor fires and frees the associated resource.

A nice advantage of RAISIN is that it doesn't matter how you leave the scope. The code could simply return, or it could emit an exception. The burden falls on the class designer to remember to free up the resources. This is vastly superior to obliging every user of the class to remember to free the resources, in every place where it's used.

Note that we're not talking about manual memory management here. Resources include file handles and mutexes and database connections and whatnot. Acquisition occurs when the instance is initialized. Let's see an example.


//C++
void raisin_example(std::string name)
{
    FileHandle const handle = FileHandle(name);

    // use handle here, maybe some code throws
    // an exception, but that's okay
    //
}

All the cleanup work is done once and for all in FileHandle::~FileHandle(), and that always fires when the handle object falls out of scope. The simplicity and safety of the above contrasts sharply with Java.


// Java
void hardly_raisin(String name)
{
    FileHandle handle = null;
    try
    {
        handle = new FileHandle(name);

        // use handle here, maybe some code throws
        // an exception, but that's okay
        //
    }
    finally
    {
        if (null != handle) handle.close();
    }
}

Wow. That's a lot of scaffolding for something that every user has to remember to get right every single time. There's a lot of opportunity for things to go wrong here. We can't even make the handle const (or final), because it would then be out of scope of the finally clause. Java gets even worse if there are several resources, and we have to release them in the reverse of the order in which they were acquired.

Coming to Java required a profound shift in programming style. The lack of support for a good coding practice like RAISIN is one of the serious deficiencies of the language. I used to marvel at how much Java code I had written since leaving C++, as if that was some kind of proof that RAISIN really wasn't so important. But now that I program in Scala, I shudder to think about how many lines of that Java code were just scaffolding.

Java's success despite this weakness says something about how important Java's strengths are. In other words, in the marketplace, garbage collection apparently trumps RAISIN. Thinking like a scientist, it's fun to speculate about what language features are more valuable than others, using trial by market as a grand laboratory.

The C# designers addressed this Java deficiency, after a fashion, by creating the IDisposable interface, and a convenient language syntax for cleaning up the objects that implement it.


// C#
void raisin_example(String name)
{
    FileHandle handle = new FileHandle(name);
    using(handle)
    {

        // use handle here, maybe some code throws
        // an exception, but that's okay
        //
    }
}

That's not too shabby. All we have to do is oblige the FileHandle class to implement the IDisposable interface and its Dispose method, which executes at the end of the using clause. The cleanup code that the C++ programmer would have to put into a destructor goes into the Dispose method instead.

Like Java, Scala also lacks deterministic destructors. However, Scala is powerful enough to emulate the C# "using" syntax. Let's make a zeroth cut at this in Scala. We'll define a Disposable trait, which we'll use in subsequent posts.


// Disposable.scala
package csharp

trait Disposable {
  def dispose(): Unit
}

Finally, let's contrive a FileHandle class that extends csharp.Disposable. We'll use this in subsequent posts, too.


// FileHandle.scala
package raisin

class FileHandle(name: String) extends csharp.Disposable {

  // constructor acquires resource here
  //

  override def dispose(): Unit = {
    // release resource here
    //
  }

  // Nice things you can do with a FileHandle
  //
  def read(): Int = { /* details omitted */ }
  def write(i: Int): Unit = { /* details omitted */ }
}
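Until we have a using construct, cleaning up in Scala takes the same hand-written scaffolding as in Java. The sketch below shows this with a hypothetical ToyHandle, an in-memory stand-in that merely records whether it has been disposed. The real FileHandle's details remain omitted.

```scala
// Stand-in for csharp.Disposable, repeated so the sketch is self-contained.
trait Disposable { def dispose(): Unit }

// Hypothetical in-memory stand-in for FileHandle; it only records state.
class ToyHandle(name: String) extends Disposable {
  private var open = true
  def isOpen: Boolean = open
  override def dispose(): Unit = { open = false }
  def read(): Int = {
    if (!open) throw new IllegalStateException(name + " is disposed")
    42
  }
}

object ManualCleanup {
  // Hand-written scaffolding: the handle is released on every exit path.
  def readOnce(handle: ToyHandle): Int =
    try handle.read()
    finally handle.dispose()
}
```

After ManualCleanup.readOnce(h) returns, h.isOpen is false, and a further h.read() throws. Nothing stops the caller from trying, which is exactly the kind of stale access we'd like the compiler to rule out.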

Last time, we showed how the Ruby unless modifier could be implemented in Scala. Next time, we'll take some steps towards bringing RAISIN to Scala.