Improving on Constructors

Constructors, as they appear in mainstream object-oriented languages, have numerous issues. Directly allocating objects with constructors creates coupling, and since most languages cannot abstract over constructors, we must resort to techniques like Factory patterns or Dependency Injection to provide the abstraction.

These issues seem to be well understood (or at least well documented), so I thought I’d bring up a less dangerous but no less annoying issue: when I try to code in a mostly-functional style, the approach to construction in C++, Java and C# forces me to write way too much boilerplate.

For example, imagine I am defining some C# classes to represent a simple lambda-calculus AST:


class Exp { ... }
class Abs : Exp { ... }
class App : Exp { ... }
class Var : Exp { ... }

Immutable Objects

I’d like to work with objects that are immutable once constructed. That means that I will not expose their fields, and will expose only “getter” properties. If I’d like each Exp to have some information on its range in the source code (using a value type SourceRange) I might write a canonical Exp as:


public class Exp
{
    public Exp( SourceRange range )
    {
        _range = range;
    }

    public SourceRange Range { get { return _range; } }

    private SourceRange _range;
}

For an immutable class with a single attribute, I’ve had to write a surprising amount of boilerplate. I’ve had to write the type of the attribute (SourceRange) three times, and variations on the name of the attribute (range, Range, _range) six times.

If I were using Scala, though, I could express the original intent quite compactly:


class Exp( val Range : SourceRange )
{}

This notation defines both a parameter to the primary constructor of Exp and a read-only property Range that gives access to the value passed into the constructor.

Derived Classes

So it appears that Scala can eliminate our boilerplate in Exp, but what happens in our derived classes? Starting with a canonical C# encoding again, here is Abs:


public class Abs : Exp
{
    public Abs( SourceRange range,
                string name,
                Exp body )
        : base( range )
    {
        _name = name;
        _body = body;
    }

    public string Name { get { return _name; } }
    public Exp Body { get { return _body; } }

    private string _name;
    private Exp _body;
}

The boilerplate for the new properties is the same as before. What is new, though, is that we are forced to re-state the attributes of the base class in our new constructor. While this seems like a relatively small annoyance at first, we end up having to repeat this boilerplate in each subclass we add. If the base class has a non-trivial number of attributes, this obviously gets proportionally worse.

In this case Scala doesn’t provide a solution to avoid this kind of boilerplate:


class Abs( range : SourceRange,
           val name : String,
           val body : Exp )
    extends Exp(range)
{}

Extending the Base Class

So what’s so bad about this per-subclass boilerplate? The dogmatic answer is that it is a violation of Once and Only Once. A more pragmatic answer arises if we need to alter or extend the base class.

Suppose we decide to add a Type attribute to Exp. This attribute might have a default value (e.g. null), so existing call sites that create expressions do not need to be updated. How much code do we have to edit to achieve this?

Adding a new field and property to Exp is relatively easy, as is adding a new Exp constructor with an additional parameter. In addition, though, we’d have to update every subclass of Exp to include another constructor with the new parameter.

This is a serious compromise in modularity. If we are creating a class library used by other programmers or other organizations then we may not even have access to all subclasses. This means there are certain edits that we cannot make to the base class.
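To make the cost of that edit concrete, here is a minimal C# sketch of what it might look like. The names ExpType (standing in for whatever the Type attribute would hold) and the two-field SourceRange struct are assumptions made up for illustration; neither definition comes from the code above.

```csharp
// Hypothetical stand-ins, assumed for illustration only.
public struct SourceRange
{
    public SourceRange( int start, int end ) { Start = start; End = end; }
    public readonly int Start;
    public readonly int End;
}

public class ExpType { }

public class Exp
{
    // Existing constructor: kept so that old call sites still compile.
    public Exp( SourceRange range )
        : this( range, null )
    {}

    // New constructor carrying the added attribute.
    public Exp( SourceRange range, ExpType type )
    {
        _range = range;
        _type = type;
    }

    public SourceRange Range { get { return _range; } }
    public ExpType Type { get { return _type; } }

    private readonly SourceRange _range;
    private readonly ExpType _type;
}

public class Abs : Exp
{
    // Existing constructor: forwards a null type.
    public Abs( SourceRange range, string name, Exp body )
        : this( range, null, name, body )
    {}

    // This overload has to be written by hand here, and again in every
    // other subclass of Exp, before any caller can pass a type.
    public Abs( SourceRange range, ExpType type, string name, Exp body )
        : base( range, type )
    {
        _name = name;
        _body = body;
    }

    public string Name { get { return _name; } }
    public Exp Body { get { return _body; } }

    private readonly string _name;
    private readonly Exp _body;
}
```

The change to Exp itself is small; the second Abs constructor is the part that would have to be repeated in App, Var, and any subclass living in someone else's code.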

A Possible Compromise

If we sacrificed the goal of having immutable objects, we could use C# auto-implemented properties to avoid the per-subclass boilerplate:


public class Exp
{
    public SourceRange Range { get; set; }
}

public class Abs : Exp
{
    public string Name { get; set; }
    public Exp Body { get; set; }
}

With this approach we would then use the object initializer syntax when constructing an instance:


var abs = new Abs{ Range = new SourceRange(...),
                   Name = "x",
                   Body = ... };

Adding a Type property to Exp could then be accomplished without affecting every subclass. Clients who create expressions could freely set the new property in their object initializers.

There are two big downsides to this approach, though. The first is that we have sacrificed the immutability of our objects: every property has both a getter and a setter. The second is that clients can now create uninitialized or partially-initialized objects by omitting one or more of the “required” attributes from their initializer.
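Both downsides are easy to reproduce. The sketch below repeats the property-based classes and adds a hypothetical Pitfalls helper; that helper and the two-field SourceRange struct are assumptions made up for illustration.

```csharp
// Hypothetical stand-in for the SourceRange value type.
public struct SourceRange
{
    public int Start { get; set; }
    public int End { get; set; }
}

public class Exp
{
    public SourceRange Range { get; set; }
}

public class Abs : Exp
{
    public string Name { get; set; }
    public Exp Body { get; set; }
}

public static class Pitfalls
{
    // Compiles even though Range and Body are never set:
    // Body is left null and Range is left at its default value.
    public static Abs PartiallyInitialized()
    {
        return new Abs { Name = "x" };
    }

    // Any code holding a reference can change the node after construction.
    public static void Rename( Abs abs, string newName )
    {
        abs.Name = newName;
    }
}
```

Nothing in the class definitions distinguishes a “required” attribute from an optional one, so the compiler has no basis for rejecting the first method.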

You can decide for yourself whether that is an appropriate solution. I for one find it distasteful, and dislike that newer .NET technologies like WPF and XAML seem to be encouraging this style.

Doing Better

Ideally we’d have a solution that combines the declarative style and guaranteed initialization of the Scala approach with the easy extensibility of the C# automatic-property approach. It turns out that CLOS (the Common Lisp Object System) and its descendant Dylan already use a solution along these lines.

Casting our example into idiomatic Dylan, we would have:


define class <exp> (<object>)
    constant slot range :: <source-range>, required-init-keyword: range:;
end class;

define class <abs> (<exp>)
    constant slot name :: <string>, required-init-keyword: name:;
    constant slot body :: <exp>, required-init-keyword: body:;
end class;

A user could then create an expression using the standard make function (the Dylan equivalent of the new operator in other languages):


let abs = make(<abs>,
               range: someRange,
               name: "x",
               body: ... );

Because Dylan and CLOS are dynamic languages, failure to provide all required parameters yields a runtime rather than compile-time error. Except for this, however, the Dylan approach provides exactly the combination of benefits described above.
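For readers who do not know Dylan, the same trade-off can be roughly imitated in C#. This is only a sketch of the idea, not how Dylan implements init-keywords: the InitArgs class, its With and Required methods, and the SourceRange struct are all hypothetical names invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the SourceRange value type.
public struct SourceRange
{
    public SourceRange( int start, int end ) { Start = start; End = end; }
    public readonly int Start;
    public readonly int End;
}

// A bag of keyword arguments, loosely playing the role of init-keywords.
public class InitArgs
{
    private readonly Dictionary<string, object> _values =
        new Dictionary<string, object>();

    // Returns this so that calls can be chained.
    public InitArgs With( string keyword, object value )
    {
        _values[keyword] = value;
        return this;
    }

    // A missing keyword is reported at run time, not at compile time.
    public T Required<T>( string keyword )
    {
        object value;
        if( !_values.TryGetValue( keyword, out value ) )
            throw new ArgumentException( "missing required keyword: " + keyword );
        return (T) value;
    }
}

public class Exp
{
    public Exp( InitArgs args )
    {
        _range = args.Required<SourceRange>( "range" );
    }

    public SourceRange Range { get { return _range; } }

    private readonly SourceRange _range;
}

public class Abs : Exp
{
    // The whole bag is forwarded, so a new attribute on Exp
    // would need no change to this constructor.
    public Abs( InitArgs args )
        : base( args )
    {
        _name = args.Required<string>( "name" );
        _body = args.Required<Exp>( "body" );
    }

    public string Name { get { return _name; } }
    public Exp Body { get { return _body; } }

    private readonly string _name;
    private readonly Exp _body;
}
```

A caller would write something like new Abs( new InitArgs().With( "range", someRange ).With( "name", "x" ).With( "body", someBody ) ). The objects stay immutable and each class declares only its own attributes, but, as with Dylan, leaving out a required keyword is only caught when the constructor runs.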

Conclusion

Object initialization is a thorny issue in many modern object-oriented languages. In order to gain the benefits of both safety and extensibility, we should be willing to look at a wide variety of languages for inspiration.
