Monday, November 23, 2009

AspectJ and Scala

What is the "atom" of software? If you consider an atom to be the smallest thing with which you can work, while continuing to do chemistry, then what's the software analogue?

My first thought was that an atom is a file. I can jar them up to make molecules, and string peptides of them together to make OSGi bundles. At some point the inorganic chemistry of programming becomes the protein-rich biochemistry of software engineering.

Or maybe an atom is the largest thing that, in isolation, can't possibly have a bug in it. Something like an instruction. Or maybe even a fully unit-tested class or method.

But, the history of the atom allows our analogy to grow richer, and weirder. Originally, atoms were a computational aid. They were discovered as a way to predict the outcomes of macroscopic chemical reactions. Even up until around 1905, there were still a handful of practicing chemists who didn't believe in atoms, except as a calculational tool.

But real they are, regardless of the intended meanings of the symbols chemists use to denote them. So, atoms feel more like aspects to me. Always there lurking in a program’s behavior, even if not represented using aspect syntax in the source code.

If I have a class implementing the public API of some library, then I log all the incoming calls. That logging is an aspect, even though I might have duplicated those slf4j calls in a dozen places. If I have code that takes care to release resources after I've acquired and used them, then that's another aspect. And if I've forgotten the finally clause somewhere, then that bug is a contaminant in the reactants, which makes my program behave differently than my chemistry equations would predict.

The trouble with hand-implemented aspects like repetitive logging calls or finally clauses -- even when you remember all of them -- goes beyond the biz logic pollution that they impose. All that duplicated code permits inconsistencies. For example, the log message in this method here looks a little different than the one over there.

And that's a bug escape. Because the log scraper that customer support is using, which you didn't even know about, is going to malfunction on that logging call that's only half a bubble off plumb.

It's a bit like isotopes of atoms. Not all carbon atoms are alike. You used a carbon-12 here, but whoops you used carbon-14 over there. And we know that one can decay on you. You used mostly protium, but here's a deuterium, so the heavy water that you made from it has measurably different physical properties (like boiling point), even though the chemical properties are the same.

Keeping with the analogy, hand-implemented aspects take you out of ordinary chemistry and force you to worry about nuclear and physical effects. It would be better to elevate aspects in the code to natively supported compiler constructs, like classes, so everybody is using the same isotopes.

That's why using AspectJ and Scala together tops my list of exciting things to do. I think of AspectJ as an external DSL that allows me to define pointcuts into my Scala code. Pretty much all my code, including the advice, continues to be written in Scala itself. The real virtue of AspectJ lies in the weaving.

And for some elegant work on internal DSL alternatives, refer to the paper by Daniel Spiewak and Tian Zhao about an AOP implementation in Scala.

So, rather than worrying about polluting my biz logic with code that better belongs in aspects, I'm now on guard against letting my biz logic leak into my aspects. And this is a much happier place to be.

Come to think of it, the false promise of object oriented programming was to offer reuse. This never really happened because classes are the wrong size to be reusable. Too small to be independently deployable, and too large to exclude application-specific implementation details. Instead, OO's importance comes from the organizing principles it champions. But I wonder if aspects, devoid of custom biz logic, might take us closer to reusable software. Components and libraries are reusable in the large. But might some group (or period) of little aspect atoms be reusable in the small?

Thursday, October 29, 2009

Principled Concordion

There are a couple of ways to look at a hiatus from blogging. Either you are so successful that you rationalize you're too busy or important to reflect on all the wonderful things happening, or you've grown too slothful to find something exciting enough to share. I recently went to the No Fluff Just Stuff conference, and it has recharged my batteries, much as it did the last time I attended. Life offers only the palest excuses to avoid thoughtful introspection, or to fail to discover shareworthy things.

Analogies, Analogies


Lewis Carroll famously asked, why is a raven like a writing desk? The question remains the archetypal example of a riddle deliberately concocted to have no solution. Nevertheless, I love contrived analogies for a couple of reasons. First, they are useful because they can communicate profound ideas economically. Second, they are whimsical and can warp the mind into discovering new ideas.

Which brings me to why developing software is like modeling a pendulum. To predict the future behavior of a simple harmonic oscillator, I need to know both the current position and the current velocity. Without two pieces of information, I can't solve the differential equation, and chart the pendulum bob's trajectory. Sorry, but there's just no getting around needing two values. Blame mathematics itself.

We often attack software projects like this. We figure out our current state and use it to predict where we need to go. This is a bit like taking a snapshot of the customer's expectations or requirements, and marching in the correct direction. The problem is, taking these measurements is really hard. And just a small error in requirements can lead to unsatisfied customers.

But, there's another way to solve differential equations. I still need two pieces of information, but they don't both have to be "initial" conditions. In a Dirichlet problem, I'm given an initial and a final position. From these, I can figure out the intermediate positions and the velocities.
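The contrast can be sketched with the oscillator's own equations (a quick sketch, using the standard simple-harmonic-oscillator solution):

```latex
\ddot{x}(t) = -\omega^2 x(t)
\qquad\Longrightarrow\qquad
x(t) = A\cos(\omega t) + B\sin(\omega t)

\text{Initial-value problem: given } x(0), \dot{x}(0):\quad
A = x(0), \qquad B = \dot{x}(0)/\omega

\text{Dirichlet problem: given } x(0), x(T) \text{ (with } \sin(\omega T) \neq 0\text{)}:\quad
A = x(0), \qquad B = \frac{x(T) - x(0)\cos(\omega T)}{\sin(\omega T)}
```

Either way, pinning down the two constants A and B takes exactly two independent pieces of information; the two problems differ only in where those measurements are taken.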

We should (and the better among us do) develop software like this. By capturing requirements as stories, and expressing them as executable tests, we reduce our measurement errors. Moreover, our trajectory is anchored by the end condition, not merely by initial guesses, so we're less likely to march off into the weeds.

Concordion

Consequently, I'm becoming increasingly enamored with Concordion. There are many descriptions of the tool available on the web, so I won't parrot them. Instead, I'd like to offer a different perspective, which I hope will not offend the Concordion community.

Concordion is an organizing principle, which helps one design acceptance tests and other tests of software. To be pedantic, it's actually an instantiation of such a principle, much as Smalltalk is an example of object oriented programming. My take on it is this: a family of automated tests deserves human-readable views into it, with appropriate encapsulation or elision of distracting details, such as execution order.

Concordion is often compared with FitNesse, but infrequently contrasted with it. FitNesse drives tests. I can go to a web page, push a button, and see my test run. Concordion, however, is a view into tests. I go to a web page to see results, which probably came from a continuous integration server. This difference is profound.

You can find a dozen books on object oriented programming, particularly the older ones, that sing the praises of OO because it permits code reuse. In the real world, reuse turns out to be the least compelling reason to embrace object oriented programming. The real value of OO principles lies in the improved organization of the resulting code. We mean "improved" here for human readability, not necessarily performance or computer efficiency.

Analogously, you can find many books about automated testing and the virtues it brings to software development. But a neglected advantage of good automation is that it offers ways to organize tests. Well presented tests are superior expressions of requirements.

With Concordion, I can design web page views into my tests. I leave many details, such as the order in which tests run, to my continuous integration server. For example, tests with similar setup requirements can be grouped together. But I can organize the presentation of the results any way that I want. For example, tests can be organized by sprint, or by module, or by cross-cutting feature.

Concordion makes software development look more like a Dirichlet problem, where I can keep the end in mind from the very beginning. Thinking of Concordion not as a tool, but as a principle, will shape how I program. And I still have much to learn about how one does that well.

Wednesday, June 24, 2009

More Scala Using RAISIN

Last time, we offered a minimally functional emulation of C#'s using syntax, to manage resources elegantly in Scala. We defined a curried function, whose second argument was a simple block of code. We'll refine that approach and try to bring about the remaining goals we set for ourselves for this feature.

def using[T <% Disposable]
    (resource: T)(block: => Unit) = {
  try {
    block
  }
  finally {
    resource.dispose
  }
}

One problem with our first cut was that the object encapsulating the managed resource had a larger scope than we wanted. Since we constructed our FileHandle instance outside of the block that used it, one could accidentally access it after it had been disposed.

val handle = new FileHandle("trouble")
using(handle) {
  handle.read
  handle.write(42)
}
// big trouble below!
handle.read

What we really need is not to pass a Unit into the using function, but a function that accepts the resource as its argument. In other words, we'd like to be able to make a useful function and pass that as an argument into the using method.

def useful_function(handle: FileHandle): Unit = {
  handle.read
  handle.write(42)
}

// pseudo-code to capture the idea
//
using(new FileHandle("good"), useful_function)

That's the gist of what we want to do, but we don't want all the cruft of declaring the useful function separately. Happily, Scala allows us to use function literals to write the above very economically.

using(new FileHandle("good")) { handle =>
  handle.read
  handle.write(42)
}
//
// handle is not visible down here and
// can't be abused. Yay!

For this to work, we have to refine our using method. All we have to do is change the second argument from the by-name block of type => Unit to a function of type T => Unit, and make sure to call the block with the expected T resource.

def using[T <% Disposable]
    (resource: T)(block: T => Unit) {
  try {
    block(resource)
  }
  finally {
    resource.dispose
  }
}

Our using function is pretty powerful now. Without any modifications, it works with closures as well as function literals. Let's alter the client code a bit to demonstrate. The following is a closure and not a function literal because i is not defined inside the curly braces demarcating the code passed into using.

def demonstrate_closure(i: Int) = {
  using(new FileHandle("simple")) { handle =>
    handle.read
    handle.write(i)
  }
}

Still, there are additional things we can do in the body of our using method. For example, we could take special action if the resource passed in were null. Alternatively, we could wrap the dispose calls inside a try-catch block to prevent them from emitting exceptions.
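A sketch of both refinements together (the name usingSafely and the policy choices are my assumptions, not part of the original design; the Disposable trait is repeated, with a plain upper bound, just to keep the snippet self-contained):

```scala
object SafeUsing {
  trait Disposable { def dispose(): Unit }

  // Hypothetical defensive variant of using: a null resource is
  // silently skipped, and dispose-time Throwables are suppressed so
  // they cannot mask an exception thrown by the block itself.
  def usingSafely[T <: Disposable](resource: T)(block: T => Unit): Unit = {
    if (resource == null) return
    try {
      block(resource)
    } finally {
      try { resource.dispose() }
      catch { case _: Throwable => () } // swallow dispose-time errors
    }
  }
}
```

Whether swallowing a dispose-time Throwable is the right policy depends on context, which is exactly the point of the next refinement.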

C++ uses compile-time overloading to choose different behaviors for some functions. For example, the new operator comes in different overloaded flavors. One takes a throwaway argument of type nothrow_t to indicate that the desired version of new will return NULL when it fails, instead of throwing an exception.

In Scala, a tried and true way to choose different behaviors at compile time is via import statements. For example, if you want a mutable Set in Scala, you write

import scala.collection.mutable.Set

This inherits from the same Set trait as the immutable version, so the logic where the class is used is clean. Although the C++ nothrow_t concept is interesting, Scala's approach appears to have a better separation of concerns, and results in uncluttered code.

If we are so inclined, we can do something analogous with our using method. We could choose to import from one package where the implementation swallows Throwables emitted by dispose. Or, we could import from another where they are allowed to propagate. In other words, we can handle exceptions quite intelligently, and customize our behavior depending on context.
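As a sketch of that idea (the object names Lenient and Strict are my invention, and the Disposable trait is repeated for self-containment), the two policies could live side by side, with client code selecting one purely through its import:

```scala
trait Disposable { def dispose(): Unit }

// Two hypothetical policy objects with identical signatures; a client
// chooses behavior with `import Lenient.using` or `import Strict.using`,
// and the call sites stay identical.
object Lenient {
  def using[T <: Disposable](resource: T)(block: T => Unit): Unit = {
    try block(resource)
    finally {
      try resource.dispose()
      catch { case _: Throwable => () } // swallow dispose-time Throwables
    }
  }
}

object Strict {
  def using[T <: Disposable](resource: T)(block: T => Unit): Unit = {
    try block(resource)
    finally resource.dispose() // dispose-time Throwables propagate
  }
}
```

Just as with mutable versus immutable Set, the logic where using is called stays uncluttered; only the import changes.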

Finally, let's consider whether we can avoid needing to nest using clauses, and manage the disposal of multiple resources more elegantly. This is possible, but there's one important subtlety that we have to worry about.

def using[T <% Disposable, U <% Disposable]
    (resource: T, _resource2: => U)(block: (T, U) => Unit) {
  try {
    val resource2 = _resource2
    try {
      block(resource, resource2)
    }
    finally {
      resource2.dispose
    }
  }
  finally {
    resource.dispose
  }
}

Note that the _resource2 argument is passed by name, and not by value. We don't actually access it until declaring the val resource2 inside the outer try block. This means that if the construction of resource2 fails, we will still call dispose on the other resource.

Let's demonstrate this. Suppose our first resource object constructs okay, but the second one throws an exception in its constructor. This is standard behavior for a RAISIN class, which disallows partially constructed instances.

def two_resources() = {
  using(new FileHandle("okay"), new FileHandle("bad")) {
    (first, second) =>
      second.write(first.read)
  }
}

If that second FileHandle constructor fires before entering the using method, then we have a resource leak! The first FileHandle is never disposed. But, because we pass the second argument by name, the second constructor does not fire before entering the using method. We're essentially passing a pointer to the constructor into the using function, which calls it.

Why pass just the second one by name and not the first one? Did we just get lucky? No. Scala evaluates its arguments from left to right.

A consequence of this choice is that we cannot access the _resource2 argument more than once inside the using method. Note that it's accessed exactly once when defining the val resource2. Otherwise, the constructor would be called again and again inside the using method. That would be an even worse resource leak, and would probably malfunction.
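We can watch both rules at work in a tiny standalone sketch (the names here are mine) that mirrors the two-resource using signature, with a by-value first parameter and a by-name second one:

```scala
object EvalOrderDemo {
  val log = new scala.collection.mutable.ListBuffer[String]

  def make(tag: String): String = { log += ("constructing " + tag); tag }

  // first parameter by value, second by name, like using's resources
  def pair(a: String, b: => String): Unit = {
    log += "entered pair"
    val bb = b // the second "constructor" fires here, exactly once
    log += (a + ", " + bb)
  }

  def run(): List[String] = {
    log.clear()
    pair(make("first"), make("second"))
    log.toList
  }
}
```

Running it shows make("first") firing before pair is entered, while make("second") fires only inside it, at the val that caches the by-name argument.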

We've now shown that our C# emulation meets all but one of our goals. This is impressive because the Scala behavior is superior even to C# itself, for example with regard to limiting the scope of variables. The remaining goal is to demonstrate how our using construct can work with legacy classes such as java.io.FileInputStream that do not extend Disposable. We'll take up this cause in the near future, after a detour into some decidedly non-standard C++. But the punchline is, we had the foresight to use view bounds and not upper bounds, so we're well prepared.
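As a taste of what that foresight buys us (this anticipates the promised follow-up; the adapter below is my sketch, not necessarily the eventual solution), an implicit view can turn a legacy class with a close method into a Disposable, which is exactly the shape a view bound accepts:

```scala
import scala.language.implicitConversions

object LegacyAdapter {
  import java.io.InputStream

  trait Disposable { def dispose(): Unit } // repeated for self-containment

  // hypothetical view: any legacy InputStream becomes a Disposable,
  // so a view-bounded using could accept it unchanged
  implicit def streamToDisposable(in: InputStream): Disposable =
    new Disposable { def dispose(): Unit = in.close() }
}
```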

In summary, we've shown how to emulate the C# using syntax in Scala, to enable RAISIN style programming. We were remarkably successful at bullet-proofing our resource management with surprisingly few lines of code. We handled many edge cases, offered flexibility, and achieved ambitious goals. Along the way, we encountered function literals, closures, pass by name, generics, view bounds, import statements, and (presently) implicits.

This was a lovely exercise because so many different aspects of Scala had to come together in harmony. It's clear that API designers must master these features to produce high quality code, but even casual programmers would do well to learn them.

Wednesday, June 17, 2009

Scala Using RAISIN

Last time, we touched on RAISIN, and considered Java's inability to support this programming style to be an important deficiency of the language.  We also promised to explore whether Scala could emulate the C# approach to deterministic destructors.  We take up that challenge presently, and we're going to find that a wide variety of Scala features all come together to make this happen.

Implementing RAISIN is a little tougher than our Ruby "unless" modifier, where the task was pretty narrow and well understood.  So before we begin, let's capture the goals we should set for emulating -- and surpassing -- the C# "using" syntax inside Scala.
  • Beautiful, readable code
  • Obliging the user to do very little
  • Handling multiple resources at once
  • Preventing stale objects from being accessed
  • Prefer immutable & avoid nulls
  • Intelligent exception handling
  • Flexible enough for arbitrary resources
Beautiful, readable code

This is always the prime directive.  Suppose we had our FileHandle class, and we have to get rid of its associated resource after we use it.  We should tolerate nothing uglier than what we'd see in C#.

// Scala wishful thinking
//
val handle = new FileHandle("myfile")
using(handle) {
  // Either of the following methods might
  // throw, but that's okay.
  //
  handle.read
  handle.write(42)
}

Obliging the user to do very little

We really want to avoid having to repeat all the try-finally scaffolding in the user's code, which Java would require.  We also don't want the user to have to understand the details of how to free up the resources.  Maybe something as simple as...
import csharp._
...should be sufficient to make the using syntax available to the programmer's code.

Handling multiple resources at once

Rather than nesting one using clause inside another, it would be nice to follow C#'s practice of allowing multiple resources inside one using statement.  This also aligns with the functionality afforded by C++, in which we can put multiple objects on the stack inside the same block, illustrated below.

// C++
{
  FileHandle const h1 = // details omitted
  FileHandle const h2 = // details omitted

  // Use h1 and h2 freely here. Even if the
  // construction of h2 failed, h1 still
  // gets released. That's important.
  //
}

Preventing stale objects from being accessed

This is an opportunity for our Scala solution to shine.  Reconsidering our first example above, we'd like the handle to have the smallest possible scope.

val handle = new FileHandle("myfile")
using(handle) {
  // Either of the following methods might
  // throw, but that's okay.
  //
  handle.read
  handle.write(42)
}

// It would be nice if we could somehow make the
// compiler prevent spurious accesses of the handle
// down here. We want to deny access to disposed
// objects.

Prefer immutable & avoid nulls

We'd like to use val rather than var wherever we can.  This is analogous to using Java final when declaring variables.  We'd also like to be assured that the resource is constructed correctly, and not null.

These desires may compel us to put the initialization, meaning the resource acquisition, somehow inside the using clause where it can be managed well.

Intelligent exception handling

It's a well-known coding practice in C++ to code destructors so that they do not emit exceptions.  However, no such convention exists for common Java classes.  For example, the java.io.FileInputStream.close method throws java.io.IOException.  We need a way to handle such exceptions intelligently.

Flexible enough for arbitrary resources

In C++, any class can have a meaningful destructor, so previously designed classes can be used in the RAISIN style.  In C#, we're constrained to use only classes that inherit from the IDisposable interface, and the cleanup has to be done in the dispose method.

This means that ordinary classes like java.io.FileInputStream, which has a close method instead of a dispose method, will pose some difficulties when we try to wrap them in a C#-like "using" clause.  Yet, Scala is powerful, and it's a reasonable goal to overcome these limitations.

With all these goals in mind, let's not try to bite off too much at once.  Last time, our zeroth cut defined a Disposable trait and a FileHandle that extends it.  This time, we'll also want a using function that accepts a Disposable object and a block of code to be executed.

// First cut...
package csharp

object Using {
  def using[T <% Disposable](resource: T)(block: => Unit) {
    try {
      block
    }
    finally {
      resource.dispose
    }
  }
}

There's a lot going on in that method, so let's tease it apart carefully.  First, it's a parameterized function, where the resource argument must be of type T.  The <% notation is a view bound.  It means that type T must inherit from Disposable or be transformable into Disposable by an implicit.

(It's not obvious yet why we need view bounds rather than a plain upper bound.  This is just a little adumbration for how we're going to achieve some of our trickier goals, such as "preventing stale objects from being accessed," and "flexible enough for arbitrary resources."  We won't get there in this post, but have patience.)

Second, the using method has two argument lists, rather than a single list of comma delimited arguments. Put another way, using is a curried function, as evidenced by two sets of parentheses instead of just one.  This syntax allows the second argument to be a block of code in curly braces, rather than something inside using's parentheses.

Third, note that the arrow notation implies that the block is passed by name, not by value.  This means that the code won't actually execute until block is called inside the try clause of the using method.  It does not execute before using is entered.
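Both of those features, currying and pass by name, show up in this tiny standalone sketch (the names here are mine, not from the post):

```scala
object CurryDemo {
  // Curried like using: the second parameter list holds a single
  // by-name block, so callers can supply it in curly braces.
  def repeat(n: Int)(block: => Unit): Unit = {
    var i = 0
    while (i < n) { block; i += 1 }
  }
}
```

A call like CurryDemo.repeat(3) { println("hi") } reads almost like built-in syntax, and when n is zero the block never runs at all, confirming that it was passed by name rather than evaluated at the call site.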

Since our toy FileHandle class (defined in a previous post) inherits from Disposable, we can write the following.


import csharp.Using._

object Main {

  def simple_usage = {
    val handle = new FileHandle("simple")
    using(handle) {
      handle.read
      handle.write(42)
    }
  }

  // details omitted
}

That's not bad for a first cut.  We've achieved our first two goals, but we still have a long way to go in future posts to make progress on the others.

In summary, we've taken some steps towards implementing RAISIN in Scala, taking the C# using syntax as a model.  Along the way, we've seen view bounds, curried functions, and pass-by-name.  The latter two language features allow the user's code to be beautiful.

Wednesday, June 10, 2009

Software Development Process

A process is the collection of practices followed in an organization.  It identifies the hats worn by people, and the artifacts they produce and consume.  It names the responsibilities that the workers fulfill, and the workflows through which their artifacts pass.  Also, a process likely includes at least some of the tools used, because automation is a big part of getting things done.

Examples of software development processes include RUP (Rational Unified Process), Scrum, and Waterfall.  To make a coding analogy, one might argue that a certain project instantiates a development process just as an object instantiates a class.

A process not only reflects the activities of the participants, it also guides their efforts.  However, keeping with the coding analogy, the humans are the virtual machine in which the process instance runs.  Therefore, people are the heart of any process, and processes are always malleable.  Even if a process purports to be rigid, it will not likely be followed for very long.

Processes can be documented, but a process description is no more a real process than a virus is a living cell.

The metaphor is apt.  Practices are captured in memes.  For example, champions of test driven development self identify as "test infected."  Very few developers who have not actually tried TDD and seen that it changes the way code gets designed could have gleaned this effect just from reading a book.

A good process will reproduce, evolve, and spread its success far and wide.  But just as some organisms can't live in some environments, the ecosystem has to be receptive to the practices embraced in a process for them to take root.  There are no "best practices."  Context is everything.

Successful processes arm decision makers with timely information, and offer guidance for resolving problems.  As a corollary, the more empowered the workers are, the more freely available such information must be, because there are more decision makers shaping progress.  The contrapositive also follows.  Without transparency, success rests on the talents of just a few privileged individuals.

Useful processes permit the measurement of and influence over:
  • Quality
  • Costs
  • Progress
  • Growth
By Quality, of course we mean customer satisfaction.  What's not quite so obvious is that many people in the organization wear the customer hat for various artifacts and services during development.

By Costs, we mean the fiduciary expenditures for salary, tools, training, hardware, and so on.  (This is sometimes more difficult than it would appear, because a single software effort could have multiple funders, each interested in different features being developed.)

By Progress, we mean the maturing of the artifacts, such as code, documentation, and models, into a consumable or sellable state.  Often, the careful monitoring of progress is especially important to certain stakeholders.

By Growth, we mean the professional growth of the human beings who are developing goods.  This includes skills improvement, job satisfaction, value to the organization, and contributions to the profession and the art.

Wednesday, June 3, 2009

Hey Scala, Finalize This

Years ago, when I moved from C++ to Java, I expected to miss a few things. My daily workhorses like templates & the STL were not available in the new environment. When it came to API design, which is really mini-language design, I could no longer rely on operator overloading. Even little efficiencies like inline functions were denied me.

But, it turns out, I didn't really long for any of those things as much as I anticipated. What I really missed, I mean what I felt no programmer could live without, was the deterministic destructor.

Bjarne Stroustrup champions RAISIN, Resource Acquisition IS INitialization. The idea is to represent acquired resources as class instances on the stack. So when those objects fall out of scope, a destructor fires and frees the associated resource.

A nice advantage of RAISIN is that it doesn't matter how you leave the scope. The code could simply return, or it could emit an exception. The burden falls on the class designer to remember to free up the resources. This is vastly superior to obliging every user of the class to remember to free the resources, in every place where it's used.

Note that we're not talking about manual memory management here. Resources include file handles and mutexes and database connections and whatnot. Acquisition occurs when the instance is initialized. Let's see an example.

// C++
void raisin_example(std::string name)
{
  FileHandle const handle = FileHandle(name);

  // use handle here, maybe some code throws
  // an exception, but that's okay
  //
}

All the cleanup work is done once and for all in FileHandle::~FileHandle(), and that always fires when the handle object falls out of scope. The simplicity and safety of the above contrasts sharply with Java.

// Java
void hardly_raisin(String name)
{
  FileHandle handle = null;
  try
  {
    handle = new FileHandle(name);

    // use handle here, maybe some code throws
    // an exception, but that's okay
    //
  }
  finally
  {
    if (null != handle) handle.close();
  }
}
Wow. That's a lot of scaffolding for something that every user has to remember to get right every single time. There's a lot of opportunity for things to go wrong here. We can't even make the handle const (or final), because it would then be out of scope of the finally clause. Java gets even worse if there are several resources, and we have to release them in the reverse order in which they were acquired.

Coming to Java required a profound shift in programming style. The lack of support for a good coding practice like RAISIN is one of the serious deficiencies of the language. I used to marvel at how much Java code I had written since leaving C++, as if that was some kind of proof that RAISIN really wasn't so important. But now that I program in Scala, I shudder to think about how many lines of that Java code were just scaffolding.

Java's success despite this weakness says something about how important Java's strengths are. In other words, in the marketplace, garbage collection apparently trumps RAISIN. Thinking like a scientist, it's fun to speculate about what language features are more valuable than others, using trial by market as a grand laboratory.

The C# designers addressed this Java deficiency, after a fashion, by creating Disposable objects, and a convenient language syntax for cleaning them up.

// C#
void raisin_example(String name)
{
  FileHandle handle = new FileHandle(name);
  using(handle)
  {
    // use handle here, maybe some code throws
    // an exception, but that's okay
    //
  }
}

That's not too shabby. All we have to do is oblige the FileHandle class to implement the IDisposable interface and its Dispose method, which executes at the end of the using clause. The clean up code that the C++ programmer would have to put into a destructor goes into the Dispose method instead.

Like Java, Scala also lacks deterministic destructors. However, Scala is powerful enough to emulate the C# "using" syntax. Let's make a zeroth cut at this in Scala. We'll define a Disposable trait, which we'll use in subsequent posts.

// Disposable.scala
package csharp

trait Disposable {
  def dispose(): Unit
}

Finally, let's contrive a FileHandle class that extends csharp.Disposable. We'll use this in subsequent posts, too.

// FileHandle.scala
package raisin

class FileHandle(name: String) extends csharp.Disposable {

  // constructor acquires resource here
  //

  override def dispose(): Unit = {
    // release resource here
    //
  }

  // Nice things you can do with a FileHandle
  //
  def read(): Int = { /* details omitted */ }
  def write(i: Int): Unit = { /* details omitted */ }
}

Last time, we showed how the Ruby unless modifier could be implemented in Scala. Next time, we'll take some steps towards bringing RAISIN to Scala.

Wednesday, May 27, 2009

Ruby Unless Scala

Now that I'm programming primarily in Scala, I find myself missing a couple of cool tricks that Ruby offers. For example, it's neat to be able to put "unless" modifiers at the end of a line. Even if you've never seen Ruby (or Perl) before, it's easy to guess what the following code does.

# Ruby
print total unless total.zero?

It should be no surprise that if the total is zero, then nothing happens. But if the total is not zero, then it's printed.

To my knowledge, Scala has no such concept. However, with all the power Scala offers for creating internal DSLs, it might be fun to try to emulate this syntax. This post is more about demonstrating what can be done with Scala than it is about championing the use of unless modifiers in one's code.

My first attempt failed. I thought I would create a RichBoolean, analogous to a RichInt, so that I could effectively add some methods onto the Boolean class. My new class needed an "unless" method, but it had to be right associative. So, borrowing the trick used in the cons operator, it would have to end in a colon.

class RichBoolean(b: Boolean) {
  def unless_:(block: => Unit) {
    if (!b) block
  }
}

implicit def booleanToRichBoolean(b: Boolean) = {
  new RichBoolean(b)
}

There's a lot going on up there, so let's try to tease it apart and explain it. The RichBoolean class is basically a wrapper around the Boolean class. We've effectively added an "unless_:" method to that class. The implicit function tells the compiler to convert a Boolean into a RichBoolean whenever it appears that someone is trying to call an "unless_:" method on it.

This is the standard Scala way to add methods to a class. Some languages are more open, and allow the addition of methods directly after the class is defined. Scala offers the same freedom, but with better control. Unless you're importing the booleanToRichBoolean function, you don't get the automatic conversion. I know that some folks are nervous about implicits. But because of this control, I find them safer than open classes.
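To make that scoping control concrete, here's a minimal sketch. (The Conversions object and the negated method are invented for illustration; they aren't part of the earlier example.)

```scala
object Conversions {
  class RichBoolean(b: Boolean) {
    def negated: Boolean = !b
  }
  // The conversion only fires where this definition is in scope.
  implicit def booleanToRichBoolean(b: Boolean): RichBoolean =
    new RichBoolean(b)
}

// Without this import, true.negated would not compile;
// with it, the compiler quietly wraps the Boolean for us.
import Conversions._
assert(true.negated == false)
```

That import is the "better control" mentioned above: the conversion is opt-in, file by file, rather than globally patched onto Boolean.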

Another noteworthy feature of the above is the arrow symbol. This implies that the block is being passed into the "unless_:" method by name, and not by value. In other words, we hope that the block doesn't get evaluated before "unless_:" executes, but only inside of that method when b is false.

Passing by name is a remarkably powerful language feature. To the imperative programmer, it might seem like passing by reference in C++ or Fortran, but it's actually more subtle. We're not passing the address of the result of some calculation. We're actually passing a pointer to the code that computes the result. Methods that accept pass-by-name have the option to skip the calculation entirely, when that makes sense. Consider how efficient that can make a logging API!
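Here's a sketch of that logging win. (The Log object and its debug method are invented for illustration.) Because msg is passed by name, a disabled logger never pays for building the message string at all.

```scala
object Log {
  var enabled = false
  // msg is by-name: the string is only built if logging is on.
  def debug(msg: => String): Unit =
    if (enabled) println(msg)
}

var built = 0
def expensiveMessage(): String = { built += 1; "details..." }

Log.debug(expensiveMessage()) // disabled: message never constructed
assert(built == 0)
Log.enabled = true
Log.debug(expensiveMessage()) // enabled: constructed exactly once
assert(built == 1)
```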

Finally, Scala forces us to include the underscore in the "unless_:" method name, so that there's no ambiguity about whether the colon symbol is part of the lexeme. There's an important lesson here that's more widely applicable than this example. Never end lexemes with underscores. If they happen to wind up next to a colon, they may run into trouble.

I tried to test out this code with a little function. I could pass either true or false into it and see what happened. It compiled fine. It just didn't do what I expected.

def demonstrate_ruby_syntax(flag: Boolean) = {
println("flag is " + flag) unless_: flag
}

Sure enough, flag gets promoted to a RichBoolean, and the infix "unless_:" fires. But no matter whether I pass in true or false, the println always executes. This is a little surprising because we passed block by name and not by value.

A little instrumenting showed that the block containing the println statement executes outside the "unless_:" method, not inside it. The explanation lies in how Scala treats right-associative operators: an expression like "e unless_: flag" is evaluated as "{ val x = e; flag.unless_:(x) }", so the left operand is computed eagerly into a temporary before the call, and the by-name parameter never sees the unevaluated block.

By way of comparison, suppose we used the cons operator (::) to construct a List[Int] as follows...

val list = 1 / 0 :: Nil

This blows up because of the divide by zero, but the stack trace reveals that the exception occurs before getting into the cons method. However, the cons method scaladocs say that the argument is passed by value, not by name, so we'd expect exactly that here.
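The same eager evaluation of the left operand is what sinks our unless_: attempt. A small probe makes it concrete (a sketch reusing the RichBoolean machinery from above; sideEffect is invented for illustration):

```scala
class RichBoolean(b: Boolean) {
  def unless_:(block: => Unit): Unit = if (!b) block
}
implicit def booleanToRichBoolean(b: Boolean): RichBoolean =
  new RichBoolean(b)

var ran = false
def sideEffect(): Unit = { ran = true }

// Desugars roughly to { val x = sideEffect(); true.unless_:(x) },
// so the left operand runs even though the flag is true.
sideEffect() unless_: true
assert(ran) // the block ran anyway
```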

So, unable to make my RichBoolean idea work, I next tried to put a wrapper class around the block itself. This had the advantage of letting me get rid of the colon cruft on the unless method name. I also don't think it's any more dangerous, despite the implicit, because the unless method signature admits only a Boolean.

package ruby

object Unless {

class UnlessClass(block: => Unit) {
def unless(b: Boolean) = {
if (!b) block
}
}

implicit def
unitToUnlessClass(block: => Unit): UnlessClass = {
new UnlessClass(block)
}
}

Happily, this approach worked. It also demonstrates a neat fact. The implicit function accepts block by name, and the UnlessClass constructor does too. Yet, block doesn't execute until the unless method is called with a false argument. This means that the Scala compiler is smart enough to let the by-name cascade through (at least) two calls.
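We can verify the cascade with a quick check (a sketch restating the Unless machinery from above; probe is invented for illustration). The block threads unevaluated through both the implicit conversion and the constructor, and only fires when the flag is false.

```scala
class UnlessClass(block: => Unit) {
  def unless(b: Boolean): Unit = if (!b) block
}
implicit def unitToUnlessClass(block: => Unit): UnlessClass =
  new UnlessClass(block)

var runs = 0
def probe(): Unit = { runs += 1 }

probe() unless true   // by-name all the way: probe never runs
assert(runs == 0)
probe() unless false  // flag is false: probe runs once
assert(runs == 1)
```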

All I have to do now is...

import ruby.Unless._

// details omitted...

def demonstrate_ruby_syntax(flag: Boolean) = {
println("flag is " + flag) unless flag
}

... and my Ruby-esque unless modifier syntax works as expected. The printing only occurs when the flag is false.

There's one more enhancement we can consider. Our code only compiled because println returns Unit. But what if we had some other routine that returned some other type? In such a case, we're relying on the side effects, and not the computational result of the function. This is an imperative rather than functional style, but since Scala lives in both worlds, it would still be nice to be able to use the unless modifier syntax. Consider the following contrived example.

def myfunc(flag: Boolean): Int = {
println("myfunc flag is " + flag)
42
}

Happily, generics can come to our rescue. By parameterizing our UnlessClass, we can implicitly convert to it from arbitrary types.

class UnlessClass[T](block: => T) {
def unless(b: Boolean): Unit = {
if (!b) block
}
}

implicit def
toUnlessClass[T](block: => T): UnlessClass[T] = {
new UnlessClass[T](block)
}

Note that our new unless method still returns Unit because we only use this construct where the return value of methods like myfunc is deliberately discarded.

def demonstrate_ruby_syntax(flag: Boolean) = {
myfunc(flag) unless flag
}

In summary, by emulating the Ruby unless modifier, we've demonstrated a few of the Scala language features that allow rich DSLs to be created. Along the way we learned about right associativity, implicits, passing by name, and generics.

Wednesday, May 20, 2009

Overriding Scala def With val

Last time, we created a little toy class hierarchy to demonstrate Scala injection. We also illustrated how Scala's powerful type system can keep us out of trouble. This time, we're going to explore some design tradeoffs that emerge from choosing Scala def or Scala val.

To review, we created an abstract base class that stores masses, and always reports their values in kilograms. We extended that with immutable classes that are initialized with values in various units.

abstract class Mass {
def kilograms: Double
}

class Kilograms(kg: Double) extends Mass {
def kilograms = kg
}

class Grams(grams: Double) extends Mass {
def kilograms = grams / 1000.0
}

Suppose that the calculation to convert grams into kilograms was difficult and lengthy. Then the Grams implementation of the kilograms method might get us into trouble, because we'd be repeating that work needlessly every time it was called.

class Grams(grams: Double) extends Mass {
def kilograms: Double = {
Thread.sleep(5000) // pretend to think hard
grams / 1000.0
}
}

The above class constructs instantly, but every time somebody calls the kilograms method on an instance, it takes a long time. This is sad because Grams is immutable. We'd like some way to save the output of the calculation instead of the input.

Let's use the javap tool to peer into what Scala is doing under the hood. The constructor argument grams is called a class parameter in Scala-ese. Class parameters used outside of constructors, as grams is used in the kilograms method, become full-fledged private fields of the class. Consider the following (edited) snippet.

$ javap -private Grams
Compiled from "Grams.scala"
public class Grams extends Mass
private final double grams;
public Grams(double);
public double kilograms();

Amazingly, Scala allows us to override the abstract "def kilograms" in Mass with a "val kilograms" in Grams. This is a lovely language feature, but it's worth spending a little energy to understand what's going on under the hood.

Let's change our kilograms def into a val in our derived classes. The following class is slow to construct, but each call to kilograms completes instantly.

class Grams(grams: Double) extends Mass {
val kilograms: Double = {
Thread.sleep(5000) // pretend to think hard
grams / 1000.0
}
}

Take a moment to digest the tradeoff. The first version is small in memory, containing only one double field, the grams class parameter. It constructs quickly, but each call to kilograms takes a long time. The second version constructs slowly, but all calls to kilograms are quick. We would prefer the first design if we expect the users of the class to call kilograms no more than once, and the second design if we expect the users to call kilograms multiple times on each Grams instance.

In the second design, the grams class parameter appears to be used nowhere but in the constructor itself when the "val kilograms" is defined. So, one might expect that it will not become a real field in the Grams class. Trusty javap confirms this suspicion. Consider the following (again edited) snippet.

$ javap -private Grams
Compiled from "Grams.scala"
public class Grams extends Mass
private final double kilograms;
public Grams(double);
public double kilograms();

Note that under the hood, despite being declared a val in the Scala source code, kilograms is also a method. A moment's reflection (no pun intended) will tell us that it has to be a method. Grams is a concrete class that extends an abstract class with a pure virtual kilograms method. So even though the Scala source hides it, kilograms is still a method of Grams.

What is that public kilograms method up to? Again we appeal to javap, and learn that it's doing nothing except returning the double stored in the private kilograms field. Just as we might have expected.

public double kilograms();
Code:
Stack=2, Locals=1, Args_size=1
0: aload_0
1: getfield #30; // Field kilograms:D
4: dreturn

The above is much shorter than the previous version, which performed the expensive calculation. Again, we conclude that the criteria to prefer one design over the other rests on the expected usage patterns of our class, as explored above.

We should also ask ourselves whether it's possible to delay the expensive calculation, possibly indefinitely, in case it's never needed. This third design would represent the classic programming tradeoff between space and time, and we'll take it up in a later post.
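As a preview, Scala's lazy val gives us exactly that third design. Here's a sketch: the conversion runs at most once, and only if kilograms is ever asked for.

```scala
abstract class Mass { def kilograms: Double }

class Grams(grams: Double) extends Mass {
  // Evaluated at most once, and only on first access.
  lazy val kilograms: Double = {
    // pretend the conversion here is expensive
    grams / 1000.0
  }
}

val g = new Grams(500.0)    // constructs instantly
assert(g.kilograms == 0.5)  // first call pays the cost
assert(g.kilograms == 0.5)  // later calls reuse the cached result
```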

In summary, we've seen that it's possible to override a Scala def with a Scala val. Under the hood, the override is still implemented by a method. The javap tool is very useful to help us figure out what's going on, and one would do well to understand the design tradeoffs of each approach. Scala's marriage of object oriented programming with functional programming is made in heaven. We can use inheritance and exploit immutability, enjoying the flexibility to make considered design choices.

Wednesday, May 13, 2009

Sweet Scala Injection

Before getting back to non-final finals, let's consider a fun diversion. The second most famous equation in Physics is Newton's second law of motion, F = ma. When you apply a force F to an object of mass m, it accelerates at rate a.

Of course, your arithmetic only gives you the right answer if you're consistent in the measurement system you pick. There are two major systems of units in use. One is the metric system or SI (System International), formerly called the mks system. The letters stand for meter, kilogram, and second, which are the principal units used to measure length, mass, and time.

The other main system in use is also the metric system. (Gotcha.) It's called the cgs system, whose letters stand for centimeter, gram, and second. You have to take care to keep your units straight to use formulas like F = ma. The units for force are named newtons in the mks system and dynes in the cgs system. But if you multiply a gram times an acceleration recorded in meters per second per second, you'll get neither a newton nor a dyne.

The loss of the Mars Climate Orbiter -- destroyed because one team worked in pound-force seconds while the other expected newton seconds -- is a painful demonstration that units really matter.

Since I like to blog about how thinking like a scientist makes me a better coder, I'll mention that it's unnatural to think, "oh, this book weighs two." Such a sentence might be grammatically correct, but without specifying the units, it's meaningless.

Scala has an especially thoughtful type system, and we can press it into service to keep our units straight when we do calculations. In this (and the next) post, we'll create a toy program to illustrate one or two Scala goodies.

Kilograms and grams both measure mass. It's not too much of a stretch to use the "is-a" relationship in an object oriented language to capture this notion. In what follows, Kilograms and Grams inherit from Mass.

abstract class Mass {
def kilograms: Double
}

class Kilograms(kg: Double) extends Mass {
def kilograms = kg
}

class Grams(grams: Double) extends Mass {
def kilograms = grams / 1000.0
}

Our base class has a kilograms method that returns the amount of mass in the mks units. All our calculations will be done in mks units, but the programmer is free to initialize a mass variable with either kilograms or grams.

Now let's construct a Force class. In a full-fledged example, we'd probably make it an abstract class extended by Newtons and Dynes. But we don't need such a complete solution here to demonstrate the ideas. Give the class an accelerates method, which tells how much the given force in newtons will accelerate a specified mass.

class Force(newtons: Double) {
def accelerates(mass: Mass) =
(newtons / mass.kilograms) + " meters per sec^2"
}

Note that the accelerates method doesn't care whether it's passed a value in kilograms or in grams. All it's demanding is a mass, and since that offers a method to take us into mks-land, we can assuredly report our acceleration in meters per second per second.

Now, let's define a force of half a newton, and run a little program to see how much this force will accelerate a couple of masses. In each case below, there's no ambiguity about whether each mass is expressed in kilograms or grams, because the units are explicitly specified.

object MyApp extends Application {
val force = new Force(0.5)
println(force accelerates (new Kilograms(4.0)))
println(force accelerates (new Grams(100)))
//
// "0.125 meters per sec^2"
// "5.0 meters per sec^2"
}

The parentheses around the "new Kilograms(4.0)" are actually redundant, but that might surprise a Java programmer. Scala also lets us omit the dot between force and accelerates, which arguably improves readability.

So, the above works, but specifying "new Kilograms" everywhere I need to define a mass is a hassle. More importantly, it hurts readability, because there is no "new" anywhere in my mental model of the F = ma equation.

Fortunately, Scala offers injections, which can pretty up the source code. In C++, I can construct an instance on the stack without calling new. Although all instances in Scala live on the heap, I find the syntax reminiscent of C++ constructors.

We want to be able to write "Kilograms(4.0)" instead of "new Kilograms(4.0)" when we use our concrete Mass classes. To do this, create a Scala companion object of the same name as the class, and give it an apply method.

object Kilograms {
def apply(kg: Double) = new Kilograms(kg)
}

object Grams {
def apply(grams: Double) = new Grams(grams)
}

These functions are called injections. Basically, they are factory methods on the companion objects, but we don't need to call apply explicitly. This is the same syntactic sugar that allows us to write "List(1, 2, 3)", which quietly calls apply on the List companion object. It pretties up our code nicely.

println(force accelerates Kilograms(4.0))
println(force accelerates Grams(100))

Note that we have made a tradeoff for this sweetness. We had to write more code (the injections) when defining our classes, so we could make life easier on the users of the classes. However, this is almost always the way to go. Readability is important.

Readability is also the reason that the accelerates method takes a Mass instance and not a plain Double. The extra word "Kilograms" or "Grams" doesn't help the computer, but it does help the human.

(However, the astute reader will have noticed that the kilograms method of the Grams class is inefficient. It performs a double precision floating point calculation every time it is called, even though the instance itself is immutable. If only there were a way to save the result of the calculation instead of the inputs, then we could run faster without worsening our memory footprint. Contemplating this is a topic for another day.)

In conclusion, tastefully applied Scala injections enhance readability. And they're more digestible than Martian soil coming towards you at a rate of, uhm, really fast.

Wednesday, May 6, 2009

And That's Final, Not!

This post is about C++ and Java, but it really offers some necessary background material to explore an interesting issue facing Scala. What follows is hopefully widely known by C++ and Java veterans, but it's still worth reviewing here so that we're all on the same page when we talk about Scala in the near future.

C++ fans are often encouraged not to use #defines for their constants, in part because the preprocessor has no notion of types.  For example, Scott Meyers champions this idea.  Instead of writing...
#define PI 3.14159

...which performs a simple textual substitution everywhere in the source file, the following is usually preferred:
const double PI = 3.14159;

The latter alternative helps the compiler supply more meaningful error messages. With the preprocessor, the compiler never hears of the lexeme "PI", and can't include it in any error messages.

We don't have a preprocessor in Java. We also lack a usable const keyword. Instead we use final to describe variables whose values will not change. Unfortunately, just as static has multiple meanings in C++, final has multiple meanings in Java.

In C++, methods are non-virtual by default, and must be given a special keyword, virtual, to denote that they are polymorphic. The Java philosophy is different. In Java, methods are virtual by default, and must be given a special keyword, final, to denote that they can not be overridden. So this keyword pulls double duty in Java: for methods final means non-virtual, and for fields it implies constant.

Another difference with C++ is that we don't have standalone variables in Java. We put them inside a class as below. In order to explore the issue at the heart of this blog entry, we deliberately do not make the field below static.

public class MyClass extends YourClass
{
public final double PI = 3.14159;
//
//... details omitted
}

A nearly equivalent way to define PI would be in a constructor. It's noteworthy that the final fields of a class can only be defined where they are declared, or in a constructor. A "set" method to change a final field would not compile.

public class MyClass extends YourClass
{
public final double PI;

public MyClass()
{
PI = 3.14159;
}
//
//... details omitted
}

At first glance, the two ways of defining the final PI in Java appear equivalent. But they are not. In fact, they are different in a crucial way that we'll explore in a subsequent post. Programmers that don't understand when final doesn't really mean final risk writing programs with undesired behavior.

In case you're on an interview...


A standard interview question is to ask a candidate to contrast inheritance in C++ and Java. The expected answer includes something like, "Well, C++ has multiple inheritance and Java doesn't."

But there's another difference, and folks who make the following observation display a valuable insight into the differences between the languages. "Well, I can truly call a virtual function from a Java constructor, but I can only appear to call a virtual function from a C++ constructor."

Let's digest this statement. If I call a virtual function from a C++ base class constructor, I'm only going to get the base class's version, not the derived class's version. In other words, it's forbidden for a base class constructor to peer down into the code of a class that inherits from it.

// C++
class Base
{
public:
virtual void f();
Base();
};
void Base::f() { cout << "Base" << endl; }
Base::Base() { f(); } // always runs Base::f

class Derived : public Base
{
public:
virtual void f() { cout << "Derived" << endl; }
};
// new Derived() prints "Base", not "Derived"

The rules of C++ prevent the implementation Derived::f from being executed within Base::Base. In other words, f does not behave as a virtual function when called within a constructor.

But in Java...


Such behavior contrasts sharply with Java. In Java, the derived class's implementation does get executed. Consider the ostensibly equivalent program below.

// Java
public class Base
{
public void f() { System.out.println("Base"); }
public Base() { f(); }
}

public class Derived extends Base
{
@Override public void f()
{
System.out.println("Derived");
}
}

public class Main
{
public static void main(String[] args)
{
new Derived();
//
// "Derived" gets printed, not "Base"
}
}

Java's approach might seem to be an advantage, but it comes with a hefty price. If the Derived class has a constructor, it fires after the base class constructor. That means that the f method of the derived class executes before the derived class constructor.

Reread that and let it sink in. It implies that if the derived class constructor has any initializations to perform or invariants to enforce before Derived::f fires, then we're in trouble. Let's demonstrate this with an example.

public abstract class Abstract
{
public Abstract() { showPi(); }
public abstract void showPi();
}

public class Concrete extends Abstract
{
final double PI;
public Concrete() { this.PI = 3.14159; }
@Override public void showPi()
{
System.out.println(PI);
}
}

public class Main
{
public static void main(String[] args)
{
new Concrete();
//
// "0.0" gets printed, not "3.14159"
}
}

How can this be? It's as if the final PI value has changed. In fact, that's exactly what has happened. When a new Java object is allocated from the heap, all its fields are zeroed out. So when the constructor of Abstract fires, the memory location where PI lives contains zero. Later on, when the constructor for Concrete fires, that memory location is overwritten by 3.14159. Any subsequent attempts to call showPi will print "3.14159".

How serious is this problem in Java? I argue that it's not too serious, as long as developers are trained in this behavior, and they know what to expect. The greater dangers come from language quirks that surprise the coder, or from the clever coder who tries to exploit the poorly lit street corners of the language.

There are a few reasons why this problem is not too awful. First, the behavior is still deterministic. The fields of the object are all zeroed out when it's allocated from the heap, so there is no surprising cruft left in those memory addresses. No matter how many times I run my program above, I'm always going to print "0.0" and not some random bits.

Second, it's prudent for constructors only to call methods that are themselves final (meaning non-virtual). This is a common coding convention, and embracing it leads to code that's easier to understand and maintain. Tools like the fb-contrib plugin for Findbugs can enforce this convention.

Finally, classes that extend base classes know what their superclass is. It's a bit difficult to sneak dodgy behavior into a base class without being seen by the designer of the derived class, particularly when your tools will detect it. Consider that the source code of the child class itself will specify the particular base class it extends.

How serious these non-final finals are in Scala, however, may be another matter. We'll investigate this in the near future.
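As a hedged preview of that investigation, the same trap can be reproduced in Scala, where vals play the role of final fields. (The class names here are invented for illustration; this is a sketch, not the full story.)

```scala
abstract class AbstractPi {
  showPi()               // runs during superclass construction
  def showPi(): Unit
}

var observed = -1.0
class ConcretePi extends AbstractPi {
  val pi: Double = 3.14159  // assigned only after AbstractPi's constructor
  def showPi(): Unit = { observed = pi }
}

val c = new ConcretePi()
assert(observed == 0.0)   // pi was still zero when showPi fired
assert(c.pi == 3.14159)   // but it's fully initialized afterwards
```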

Wednesday, April 29, 2009

Scala Dependency Injection

First in a series... This post flows from my attempts to understand and apply the lessons from the "Modular Programming Using Objects" chapter of the Programming in Scala book by Odersky, Spoon, and Venners. As much as I love that book, I felt that the chapter was a bit too terse for my poor brain, and there's a lot of exploration possible for the ideas found there. In other words, maybe all this is obvious to everybody but me, but here goes...

I'm a constructor injection chauvinist. I don't hate setter injection, but I avoid it if I'm able. I do appreciate that how one does inversion of control is somewhat a matter of taste. But a couple of defenses of my preference come to mind.

First, I like my finals. In Java, member fields assigned in a constructor can be final, and that prevents me from accidentally changing things I shouldn't change. Poka-yoke has saved many a developer many a time. I think it was noted software developer Harry Callahan who advised us to know our limitations. On second thought, I think he might have been speaking in a different context, but I'm well aware of the kinds of programming mistakes I'm prone to make.

Second, a once-used set method looks like dangling cruft. It's usually public so as to be callable by frameworks, so it lessens the signal to noise ratio of the class's source code. How? Well, the users of the class must be educated not to call the special set method. I'm troubled by a method with a standard name that suggests a particular usage, but then behaves unpredictably if that usage is attempted.

Additionally, the instantiators of the class must be educated to call the special set method, and not to use the class before doing so. It's never a good idea to surprise the coder, and constructors that don't finish constructing will enable partially built objects to exist. Maybe this is just a violation of Poka-yoke again.

In Scala, instead of finals, we have vals. And in addition to dependency injection frameworks like Guice or Spring, we have a lovely way within the language to assemble object graphs. It could well be argued that such frameworks are merely clunky compensators for weaknesses in the Java language itself, such as the lack of mixins.

Imagine an AutoPilot object that needs to ask questions of a FuelSensor object. The fuel sensor has a remaining_liters method that the auto pilot might need to call from time to time. So our object graph comprises an auto pilot object with a pointer to a fuel sensor. This graph has to be instantiated when the program starts.


class FuelSensor {
def remaining_liters: Int = { /* blah blah */ }
}

class AutoPilot(
private[this] val fuel_sensor: FuelSensor) {
// blah blah
}


A typical Scala approach to dependency injection will encapsulate the initialization of the object graph inside a trait that can be "with"ed into the application.


trait ProductionEnvironment {
val the_fuel_sensor = new FuelSensor()
val the_auto_pilot = new AutoPilot(the_fuel_sensor)
}

object MyApp extends Application
with ProductionEnvironment {
// blah blah
}


Of course, one can initialize Scala object graphs using Spring XML files or Guice annotations, but the trait approach has a nice advantage: if you make a spelling mistake, it's a compilation error, not a runtime problem. Eventually, we're going to see that it enjoys other niceties, too.

In real life, I'll have many environments. For example, when I want to unit test my auto pilot class, I might do something like the following.


trait AutoPilotTestEnvironment {
val the_fuel_sensor = new FuelSensor {
override def remaining_liters: Int = {
// mock implementation here
}
}
val the_auto_pilot = new AutoPilot(the_fuel_sensor)
}


In the above example, I'm free to use TestNG or ScalaTest if I prefer. Moreover, I can opt for a separate MockFuelSensor class instead of an anonymous one inside the trait. Don't let such details be distracting. The real point is that instead of being in XML-heck with Spring, I can create specific environment traits to assemble meaningful object graphs. And the compiler helps me.

There's a second concrete advantage of the Scala "in-language" approach to dependency injection (DI). I can use Object Oriented (OO) principles -- that is, the separation of the general from the specific -- to organize different configurations thoughtfully.

Suppose for example, that there were two flavors of fuel sensors. Let's emend our code example a bit. A couple of concrete fuel sensor implementations would inherit from the abstract fuel sensor type.


abstract class FuelSensor {
def remaining_liters: Int
// blah blah
}

class JetFuelSensor extends FuelSensor {
def remaining_liters: Int = { /* blah blah */ }
}

class PropellorFuelSensor extends FuelSensor {
def remaining_liters: Int = { /* blah blah */ }
}


The beauty here is that I can create mixins to mirror the inheritance hierarchy of the objects being initialized. Our production environment trait becomes abstract, leaving configuration-specific mixins to handle the varying construction details.


trait ProductionEnvironment {
val the_fuel_sensor: FuelSensor
val the_auto_pilot = new AutoPilot(the_fuel_sensor)
}

trait JetFuelEnvironment {
val the_fuel_sensor = new JetFuelSensor
}

trait PropellorFuelEnvironment {
val the_fuel_sensor = new PropellorFuelSensor
}

object JetApplication extends Application
with JetFuelEnvironment
with ProductionEnvironment {
// blah blah
}


This feels right. Knowledge about how to construct concrete objects can be collocated with their class definitions, if so desired. I can (no pun intended) mix and match my mixin environments to assemble the exact configuration I want for a given application. Spelling errors are detected early (at compile time).

Now for sure, I could do much of this in Spring by including XML fragments inside master configuration files, but I think it's much nicer on the human to use genuine, language-supported OO features. IDE (Integrated Development Environment) support is natural, and that's a big win here.

Stay tuned for more thoughts about Scala dependency injection, and for more refinements of our example. We still have to deal with a handful of "real world" considerations as we transform our toy system into an industrial strength solution. The goal of this post was just to throw up a straw man, whom we can clothe in armor as we go along.

In summary, Scala's support for mixins offers a nice, in-language way to initialize object graphs. Though perfectly compatible with dependency injection frameworks, Scala offers an approach that enjoys a couple of advantages. First, because configurations are code, certain errors are detected early. Second, configuration details can be partitioned meaningfully in traits, and then assembled in a more user-friendly fashion than XML files or annotations.