Friday, December 10, 2010

Klingon Scala

English is a "subject-verb-object" language.  This denotes the ordinary order of words in an English sentence.  For example, "He loves her," reveals his feelings, but not hers.  When we write code in an object oriented language, we tend to choose words that reflect this practice.

In Scala, given a Set ns of Integers, we can use the Set's contains method to ask whether a given number is in the Set.

  val ns = Set(8, 15, 17)
  println(ns contains 42)  //false
  println(ns contains 17)  //true

In a sense, contains is a binary operator that carries ordered pairs to Booleans.  We emphasize that, unlike other binary operators such as `+`, this one is not commutative.  A snippet like "42 contains ns" would mean something else entirely, and doesn't even compile.

A DSL with ∈

Klingon (Tlingan) is an "object-verb-subject" language.  Translating word-by-word from such a language, "Her loves He", or more properly, "She is-loved-by him" again tells us about his feelings, but not hers.  Sometimes when writing code, it would be easier on the reader to shuffle the order of our operands.

  println(42 `∈` ns)

This of course does not compile because there is no such `∈` method of integers. However, when the gain in readability is worth the effort, Scala offers a way to write such expressive code.

  class MyElement[X](x :X) {
    def `∈`(xs :Set[X]) = xs contains x
  }
  implicit toMyElement[X](x :X) = new MyElement(x)

This approach contrasts a bit with monkey patching found in other languages.  On one hand, the Scala approach tends to be a bit more verbose, since a new class is defined.  On the other hand, Scala allows careful control over the modification's scope.  Instead of globally altering the integer type as monkey patching would do, Scala affects the code only where the implicit function is imported.

Friday, December 3, 2010

Scala Duck Typing, Almost

There are a couple of different approaches to type systems, and I'm not talking about the whole static vs. dynamic thing. The nominative approach requires subtypes to extend base types explicitly. The structural approach allows types to be equivalent if they merely have the same methods. Scala supports both.

Nominative Example

trait Printable { def print :Unit }

class Nominative extends Printable {
def print { println("Nominative") }
}

If we define a function that accepts Printable instances, then it will happily accept Nominative instances, too.

def nominative(p :Printable) = {
p.print
}

nominative(new Nominative)

Structural Example

Because our Structural class below does not explicitly extend Printable, the compiler does not let us pass its instances into the nominative function, even though it has a suitable print method. And sometimes, that's exactly the kind of type safety we want.

class Structural {
def print { println("Structural") }
}

nominative(new Structural) // does NOT compile

But, other times it isn't. Scala is powerful enough to support structural types, whose definitions look like traits but without names. We can use the type keyword to give our structural type an alias.

type CanPrint = { def print :Unit }

def structural(p :CanPrint) = {
p.print
}

structural(new Nominative)
structural(new Structural) // compiles!

A nice feature here is that our Structural class was defined before CanPrint, so structural typing is useful when we must adapt old code to a new purpose.

An Interesting Idiom

Finally, let's consider an interesting non-legacy case. Suppose we want structural typing, and we also have full control over our class definitions. It sure would be nice to be able to have the compiler check that our signatures match up.

Unfortunately (or perhaps fortunately, since it's not completely clear what it should mean), the following does not compile.

class DoesNotCompile extends CanPrint {

So instead, let's use the Predef.identity generic function to ensure that our class does indeed have the correct structure.

class AnotherStructural {
identity[CanPrint](this)

def print :Unit = {
println("AnotherStructural")
}
}

If we had misspelled or forgotten the print method, our class would not have compiled.


Monday, November 23, 2009

AspectJ and Scala

What is the "atom" of software? If you consider an atom to be the smallest thing with which you can work, while continuing to do chemistry, then what's the software analogue?

My first thought was that an atom is a file. I can jar them up to make molecules, and string peptides of them together to make OSGi bundles. At some point the inorganic chemistry of programming becomes the protein-rich biochemistry of software engineering.

Or maybe an atom is the largest thing that, in isolation, can't possibly have a bug in it. Something like an instruction. Or maybe even a fully unit-tested class or method.

But, the history of the atom allows our analogy to grow richer, and weirder. Originally, atoms were a computational aid. They were discovered as a way to predict the outcomes of macroscopic chemical reactions. Even up until around 1905, there were still a handful of practicing chemists who didn't believe in atoms, except as a calculational tool.

But real they are, regardless of the intended meanings of the symbols chemists use to denote them. So, atoms feel more like aspects to me. Always there lurking in a program’s behavior, even if not represented using aspect syntax in the source code.

If I have a class implementing the public API of some library, then I log all the incoming calls. That logging is an aspect, even though I might have duplicated those slf4j calls in a dozen places. If I have code that takes care to release resources after I've acquired and used them, then that's another aspect. And if I've forgotten the finally clause somewhere, then that bug is a contaminant in the reactants, which makes my program behave differently than my chemistry equations would predict.

The trouble with hand-implemented aspects like repetitive logging calls or finally clauses -- even when you remember all of them -- goes beyond the biz logic pollution that they impose. All that duplicated code permits inconsistencies. For example, the log message in this method here looks a little different than the one over there.

And that's a bug escape. Because the log scraper that customer support is using, which you didn't even know about, is going to malfunction on that logging call that's only half a bubble off plumb.

It's a bit like isotopes of atoms. Not all carbon atoms are alike. You used a carbon-12 here, but whoops you used carbon-14 over there. And we know that one can decay on you. You used mostly protium, but here's a deuterium, so the heavy water that you made from it has measurably different physical properties (like boiling point), even though the chemical properties are the same.

Keeping with the analogy, hand-implemented aspects take you out of ordinary chemistry and force you to worry about nuclear and physical effects. It would be better to elevate aspects in the code to natively supported compiler constructs, like classes, so everybody is using the same isotopes.

That's why using AspectJ and Scala together tops my list of exciting things to do. I think of AspectJ as an external DSL that allows me to define pointcuts into my Scala code. Pretty much all my code, including the advice, continues to be written in Scala itself. The real virtue of AspectJ lies in the weaving.

And for some elegant work on internal DSL alternatives, refer to the paper by Daniel Spiewak and Tian Zhao about an AOP implementation in Scala.

So, rather than worrying about polluting my biz logic with code that better belongs in aspects, I'm now on guard against letting my biz logic leak into my aspects. And this is a much happier place to be.

Come to think of it, the false promise of object oriented programming was to offer reuse. This never really happened because classes are the wrong size to be reusable. Too small to be independently deployable, and too large to exclude application-specific implementation details. Instead, OO 's importance comes from the organizing principles it champions. But I wonder if aspects, devoid of custom biz logic, might take us closer to reusable software. Components and libraries are reusable in the large. But might some group (or period) of little aspect atoms be reusable in the small?

Thursday, October 29, 2009

Principled Concordion

There are a couple of ways to look at a hiatus from blogging. Either you are so successful that you rationalize you're too busy or important to reflect on all the wonderful things happening, or you've grown too slothful to find something exciting enough to share. I recently went to the No Fluff Just Stuff conference, and it has recharged my batteries, much as it did the last time I attended. Life offers only the palest excuses to avoid thoughtful introspection, or to fail to discover shareworthy things.

Analogies, Analogies


Lewis Carroll famously asked, why is a raven like writing desk? The question remains the archetypal example of a riddle deliberately concocted to have no solution. Nevertheless, I love contrived analogies for a couple of reasons. First, they are useful because they can communicate profound ideas economically. Second, they are whimsical and can warp the mind into discovering new ideas.

Which brings me to why developing software is like modeling a pendulum. To predict the future behavior of a simple harmonic oscillator, I need to know both the current position and the current velocity. Without two pieces of information, I can't solve the differential equation, and chart the pendulum bob's trajectory. Sorry, but there's just no getting around needing two values. Blame mathematics itself.

We often attack software projects like this. We figure out our current state and use it to predict where we need to go. This is a bit like taking a snapshot of the customer's expectations or requirements, and marching in the correct direction. The problem is, taking these measurements is really hard. And just a small error in requirements can lead to unsatisfied customers.

But, there's another way to solve differential equations. I still need two pieces of information, but they don't both have to be "initial" conditions. In a Dirichlet problem, I'm given an initial and a final position. From these, I can figure out the intermediate positions and the velocities.

We should (and the better among us do) develop software like this. By capturing requirements as stories, and expressing them as executable tests, we reduce our measurement errors. Moreover, our trajectory is anchored by the end condition, not mearly by initial guesses, so we're less likely to march off into the weeds.

Concordion

Consequently, I'm becoming increasingly enamored with Concordion. There are many descriptions of the tool available on the web, so I won't parrot them. Instead, I'd like to offer a different perspective, which I hope will not offend the Concordion community.

Concordion is an organizing principle, which helps one design acceptance tests and other tests of software. To be pedantic, it's actually an instantiation of such a principle, much as Smalltalk is an example of object oriented programming. My take on it is this: a family of automated tests deserves human-readable views into them, with appropriate encapsulation or elision of distracting details, such as execution order.

Concordion is often compared with FitNesse, but infrequently contrasted with it. FitNesse drives tests. I can go to a web page, push a button, and see my test run. Concordion, however, is a view into tests. I go to a web page to see results, which probably came from a continuous integration server. This difference is profound.

You can find a dozen books on object oriented programming, particularly the older ones, that sing the praises of OO because it permits code reuse. In the real world, reuse turns out to be the least compelling reason to embrace object oriented programming. The real value of OO principles lie in the improved organization of the resulting code. We mean "improved" here for human readability, not necessarily performance or computer efficiency.

Analogously, you can find many books about automated testing and the virtues it brings to software development. But a neglected advantage of good automation is that it offers ways to organize tests. Well presented tests are superior expressions of requirements.

With Concordion, I can design web page views into my tests. I leave many details, such as the order in which tests run, to my continuous integration server. For example, tests with similar setup requirements can be grouped together. But I can organize the presentation of the results any way that I want. For example, tests can be organized by sprint, or by module, or by cross-cutting feature.

Concordion makes software development look more like a Dirichlet problem, where I can keep the end in mind from the very beginning. Thinking of Concordion not as a tool, but as a principle, will shape how I program. And I still have much to learn about how one does that well.

Wednesday, June 24, 2009

More Scala Using RAISIN

Last time, we offered a minimally functional emulation of C#'s using syntax, to manage resources elegantly in Scala. We defined a curried function, whose second argument was a simple block of code. We'll refine that approach and try to bring about the remaining goals we set for ourselves for this feature.

def using[T <% Disposable]
(resource: T)(block: => Unit) = {
try {
block
}
finally {
resource.dispose
}
}

One problem with our first cut was that the object encapsulating the managed resource had a larger scope than we wanted. Since we constructed our FileHandle instance outside of the block that used it, one could accidentally access it after it had been disposed.

val handle = new FileHandle("trouble")
using(handle) {
handle.read
handle.write(42)
}
// big trouble below!
handle.read

What we really need is not to pass a Unit into the using function, but a function that accepts the resource as its argument. In other words, we'd like to be able to make a useful function and pass that as an argument into the using method

def useful_function(handle: FileHandle): Unit = {
handle.read
handle.write(42)
}

// pseudo-code to capture the idea
//
using(new FileHAndle("good"), useful_function)

That's the gist of what we want to do, but we don't want all the cruft of declaring the useful function separately. Happily, Scala allows us to use function literals to write the above very economically.

using(new FileHandle("good")) { handle =>
handle.read
handle.write(42)
}
//
// handle is not visible down here and
// can't be abused, Yay

For this to work, we have to refine our using method. All we have to do is change the second argument from type Unit to the function T => Unit, and make sure to call the block with the expected T resource.

def using[T <% Disposable]
(resource: T)(block: T => Unit) {
try {
block(resource)
}
finally {
resource.dispose
}
}

Our using function is pretty powerful now. Without any modifications, it works with closures as well as function literals. Let's alter the client code a bit to demonstrate. The following is a closure and not a function literal because i is not defined inside the curly braces demarking the code passed into using.

def demonstrate_closure(i: Int) = {
using (new FileHandle("simple")) { handle =>
handle.read
handle.write(i)
}
}

Still, there are additional things we can do in the body of our using method. For example, we could take special action if the resource passed in were null. Alternatively, we could wrap the dispose calls inside a try-catch block to prevent them from emitting exceptions.

C++ uses compile-time overloading to choose different behaviors for some functions. For example, the new operator comes in different overloaded flavors. One takes a throwaway argument of type nothrow_t to indicate that the desired version of new will return NULL when it fails, instead of throwing an exception.

In Scala, a tried and true way to choose different behaviors at compile time is by the import statements. For example, if you want a mutable Set in Scala, you

import scala.collection.mutable.Set

This inherits from the same Set trait as the immutable version, so the logic where the class is used is clean. Although the C++ nothrow_t concept is interesting, Scala's approach appears to have a better separation of concerns, and results in uncluttered code.

If we are so inclined, we can do something analogous with our using method. We could choose to import from one package where the implementation swallows Throwables emitted by dispose. Or, we could import from another where they are allowed to propagate. In other words, we can handle exceptions quite intelligently, and customize our behavior depending on context.

Finally, let's consider whether we can avoid needing to nest using clauses, and manage the disposal of multiple resources more elegantly. This is possible, but there's one important subtlety that we have to worry about.

def using[T <% Disposable, U <% Disposable]
(resource: T, _resource2: => U)(block: (T,U) => Unit) {
try {
val resource2 = _resource2
try {
block(resource, resource2)
}
finally {
resource2.dispose
}
}
finally {
resource.dispose
}
}

Note that the _resource2 argument is passed by name, and not by value. We don't actually access it until declaring the val resource2 inside the outer try block. This means that if the construction of resource2 fails, we will still call dispose on the other resource.

Let's demonstrate this. Suppose our first resource object constructs okay, but the second one throws an exception in its constructor. This is standard behavior for a RAISIN class, which disallows partially constructed instances.

def two_resources() = {
using (new FileHandle("okay"), new FileHandle("bad")) {
(first, second) =>
second.write(first.read)
}
}

If that second FileHandle constructor fires before entering the using method, then we have a resource leak! The first FileHandle is never disposed. But, because we pass the second argument by name, the second constructor does not fire before entering the using method. We're essentially passing a pointer to the constructor into the using function, who calls it.

Why pass just the second one by name and not the first one? Did we just get lucky? No. Scala evaluates its arguments from left to right.

A consequence of this choice is that we cannot access the _resource2 argument more than once inside the using method. Note that it's accessed exactly once when defining the val resource2. Otherwise, the constructor would be called again and again inside the using method. That would be an even worse resource leak, and would probably malfunction.

We've now shown that our C# emulation meets all but one of our goals. This is impressive because the Scala behavior is superior even to C# itself, for example with regard to limiting the scope of variables. The remaining goal is to demonstrate how our using construct can work with legacy classes such as java.io.File that do not extend Disposable. We'll take up this cause in the near future, after a detour into some decidedly non-standard C++. But the punchline is, we had the foresight to use view bounds and not upper bounds, so we're well prepared.

In summary, we've shown how to emulate the C# using syntax in Scala, to enable RAISIN style programming. We were remarkably successful at bullet-proofing our resource management with surprisingly few lines of code. We handling many edge cases, offered flexibility, and achieved ambitious goals. Along the way, we encountered function literals, closures, pass by name, generics, view bounds, import statements, and (presently) implicits.

This was a lovely exercise because so many different aspects of Scala had to come together in harmony. It's clear that API designers must master these features to produce high quality code, but even casual programmers would do well to learn them.

Wednesday, June 17, 2009

Scala Using RAISIN

last time, we touched on RAISIN, and considered Java's inability to support this programming style to be an important deficiency of the language.  We also promised to explore whether Scala could emulate the C# approach to deterministic destructors.  We take up that challenge presently, and we're going to find that a wide variety of Scala features all come together to make this happen.

Implementing RAISIN is a little tougher than our Ruby "unless" modifier, where the task was pretty narrow and well understood.  So before we begin, let's capture the goals we should set for emulating -- and surpassing -- the C# "using" syntax inside Scala.
  • Beautiful, readable code
  • Obliging the user to do very little
  • Handling multiple resources at once
  • Preventing stale objects from being accessed
  • Prefer immutable & avoid nulls
  • Intelligent exception handling
  • Flexible enough for arbitrary resources
Beautiful, readable code

This is always the prime directive.  Suppose we had our FileHandle class, and we have to ge rid of its associated reource after we use it.  We should tolerate nothing uglier than what we'd see in C#.

// Scala wishful thinking
//
val handle = new FileHandle("myfile")
using(handle) {
// Either of the following methods might
// throw, but that's okay.
//
handle.read
handle.write(42)
}

Obliging the user to do very little

We really want to avoid having to repeast all the try-finally scaffolding in the user's code, which Java would require.  We also don't wan tht use to have to understand the details of how to free up the resources.  Maybe something as simple as...
import csharp._
...should be sufficient to make the using syntax available to the programmer's code.

Handling multiple resources at once

Rather than nesting one using clause inside another, it would be nice to follow C#'s practice of allowing multiple resources inside one using statement.  This also aligns with th functionality afforded by C++, in which we can put multiple objects on the stack inside the same block, illustrated below.

// C++
{
FileHandle const h1 = // details omitted
FileHandle const h2 = // details omitted

// Use h1 and h2 freely here. Even if the
// construction of h2 failed, h1 still
// gets released. That's important
//
}

Preventing stale objecgts from being accessed

This is an opportunity for our Scala solution to shine.  Reconsidering our first example above, We'd like the handle to have the smallest possible scope.

val handle = new FileHandle("myfile")
using(handle) {
// Either of the following methods might
// throw, but that's okay.
//
handle.read
handle.write(42)
}

// It would be nice if we could somehow make the
// compiler prevent spurious accesses of the handle
// down here. We want to deny access to disposed
// objects.

Prefer immutable & avoid nulls

We'd like to use val rather than var wherever we can.  This is analogous to using Java final when declaring variables.  We'd also like to be assured that the resource is constructed correctly, and not null.

These desires may may compel us to put the initialization, meaning the resource acquisition, somehow inside the using clause where it can be managed well.

Intelligent exception handling

It's a well known coding practice in C++ to code destructors so that they do not emit exceptions.  However, no such convention exists for common Java classes.  For example, the java.io.File.close method throws java.io.IOException.  We need a way to handle such exceptions intelligently.

Flexible enough for arbitrary resources

In C++, any class can have a meaningful destructor, so previously designed classes can be used in the RAISIN style.  In C#, we're constrained to use only classes that inherit from the IDisposable interface, and the cleanup has to be done in the dispose method.

This means that ordinary classes like java.io.File, which has a close method instead of a dispose method, will pose some difficulties when trying to wrap it in a C#-like "using" clause.  Yet, Scala is powerful, and it's a reasonable goal to overcome these limitations.

Will all these goals in mind, let's not try to bite off too much at once.  Last time, our zeroth cut defined a Disposable trait and a FileHandle that extends it.  This time, we'll also want a using function that accepts a Disposable object and a block of code to be executed.

// First cut...
package csharp

object Using {
def using[T <% Disposable](resource: T)(block: => Unit) {
try {
block
}
finally {
resource.dispose
}
}
}

There's a lot going on in that method, so let's tease it apart carefully.  First, it's a parameterized function, where the resource argument must be of type T.  The <% notation is a view bound.  It means that type T must inherit from Disposable or be transformable into Disposable by an implicit.

(It's not obvious yet why we need view bounds, or even an upper bound.  This is just a little adumbration for how we're going to achieve some of our trickier goals, such as "preventing stale objects from being accessed," and "flexible enough for arbitrary resources."  We won't get there in this post, but have patience.)

Second, the using method has two argument lists, rather than a single list of comma delimited arguments. Put another way, using is a curried function, as evidenced by two sets of parentheses instead of just one.  This syntax allows the second argument to be a block of code in curly braces, rather than something inside using's parentheses.

Third, note that the arrow notation implies that the block is passed by name, not by value.  This means that the code won't actually execute until block is called inside the try clause of the using method.  It does not execute before using is entered.

Since our toy FileHandle class (defined in a previous post) inherits from Disposable, then we can write the following.


import csharp.Using._

object Main {

def simple_usage = {
val handle = new FileHandle("simple")
using(handle) {
handle.read
handle.write(42)
}
}

// details omitted

That's not bad for a first cut.  We've achieved our first two goals, but we still have a long way to go in future posts to make progress on the others.

In summary, we've taken some steps towards implementing RAISIN in Scala, taking the C# using syntax as a model.  Along the way, we've seen view bounds, curried functions, and pass-by-name.  The latter two language features allow the user's code to be beautiful.

Wednesday, June 10, 2009

Software Development Process

A process is the collection of practices followed in an organization.  it identifies the hats worn by people, and the artifacts they produce and consume.  It names the responsibilities that the workers fulfill, and the workflows through which their artifacts pass.  Also, a process likely includes at least some of the tools used, because automation is a big part of getting things done.

Examples of software development processes include RUP (Rational Unified Process), Scrum, and Waterfall.  To make a coding analogy, one might argue that a certain project instantiates a development process just as an object instantiates a class.

A process not only reflects the activities of the participants, it also guides their efforts.  however, keeping with the coding analogy, the humans are the virtual machine in which the process instance runs.  Therefore, people are the heart of any process, and processes are always malleable.  Even if a process purports to be rigid, it will not likely be followed for very long.

Processes can be documented, but a process description is no more a real process than a virus is a living cell.

The metaphor is apt.  Practices are captured in memes.  For example, champions of test driven development self identify as "test infected."  Very few developers who have not actually tried TDD and seen that it changes the way code gets designed could have gleaned this effect just from reading a book.

A good process will reproduce, evolve, and spread its success far and wide.  But just as some organisms can't live in some environments, the ecosystem has to be receptive to the practices embraced in a process for them to take root.  There are no "best practices."  Context is everything.

Successful processes arm decision makers with timely information, and offer guidance for resolving problems.  As a corollary, the more empowered the workers are, the more freely available such information must be, because there are more decision makers shaping progress.  The contrapositive also follows.  Without transparency, success rests on the talents of just a few privileged individuals.

Useful processes permit the measurement of and influence over:
  • Quality
  • Costs
  • Progress
  • Growth
By Quality, of course we mean customer satisfaction.  What's not quite so obvious is that many people in the organization wear the customer hat for various artifacts and services during development.

By Costs, we mean the fiduciary expenditures for salary, tools, training, hardware, and so on.   (This is sometimes more difficult than it would appear, beause a single software effort could have multiple funders, each interested in different features being developed.)

By Progress, we mean the maturing of the artifacts, such as code, documentation, and models, into a consumable or sellable state.  Often, the careful monitoring of progress is especially important to certain stakeholders.

By Growth, we mean the professional growth of the human beings who are developing goods.  This includes skills improvement, job satisfaction, value to the organization, and contributions to the profession and the art.