Saturday, October 28, 2006

Shorthand Interfaces

Joshua Flanagan has written a fluent API for creating readable regular expressions. Very cool. (Thanks for the link, Bevan.)

But... when I see a fluent interface, I usually feel that something’s not quite right. It’s as if the API designer had a beautiful concept in mind, but the end result was a workaround: an attempt to express an elegant DSL in a host language that doesn’t really support DSLs. In particular, I don’t like the verbosity I see in fluent APIs.

So, I took Joshua’s example as a starting point and came up with a more concise alternative. I think of it as a "shorthand interface": it flows with fluency, but is also concise. (By the way, this "shorthand interface" is in C#; it would probably be impossible in Java. More on that point below...)

Here is Joshua’s fluent example:
Pattern findGamesPattern = Pattern.With
.Literal("<div")
.WhiteSpace.Repeat.ZeroOrMore
.Literal("class=\"game\"")
.WhiteSpace.Repeat.ZeroOrMore
.Literal("id=\"")
.NamedGroup("gameId", Pattern.With.Digit.Repeat.OneOrMore)
.Literal("-game\"")
.NamedGroup("content", Pattern.With.Anything.Repeat.Lazy.ZeroOrMore)
.Literal("<!--gameStatus")
.WhiteSpace.Repeat.ZeroOrMore
.Literal("=")
.WhiteSpace.Repeat.ZeroOrMore
.NamedGroup("gameState", Pattern.With.Digit.Repeat.OneOrMore)
.Literal("-->");
And here’s what I’d prefer to write:
"<div",
Space >= 0,
"class=\"game\"",
Space >= 0,
"id=\"",
Group(Digit >= 1).Named("gameId"),
"-game\"",
Group(Anything >= Lazy(0)).Named("content"),
"<!--gameStatus",
Space >= 0,
"=",
Space >= 0,
Group(Digit >= 1).Named("gameState"),
"-->"
This is valid C#. The key points to note are:
  • Unlike the fluent example, this is not one big expression. Instead, it is a list of expressions, delimited with commas. We can pass the whole list into a factory method that takes a "params" argument ("varargs" in Java-speak).
  • With a suitable implicit conversion operator, the string literals will be automatically converted into the objects that we need, so we don't need to wrap them in a method called "Literal()".
  • With suitable operator overloading, we can use operators like >= to specify how many matches are required for the various elements.
  • That just leaves the fact that I’ve used unqualified property and method names, for example "Space" and "Group" instead of, say, "RegexHelper.Space" and "RegexHelper.Group". This issue is a pet peeve of mine: for DSL-like usage, the need to qualify methods is a pain. Fortunately there are several solutions which, while not perfect, seem justifiable in the context of readable DSLs. Firstly, if you have control of the classes concerned, you may be able to declare the methods in a base class. Or, in C# 3.0, extension methods on "object" (effectively extending "this") come close, although an extension method still has to be invoked through an instance, as in "this.Group(...)". Of course, if qualifying your method names doesn’t bug you, then you don’t need a solution :-)
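To make the first three points concrete, here is a minimal sketch, assuming .NET's System.Text.RegularExpressions as the engine underneath. The names Element, Shorthand and Concat are hypothetical, chosen only for illustration; they are not part of Joshua's API or of the .NET Framework. Space is deliberately left qualified (Shorthand.Space) here, since unqualified names are the separate issue discussed in the last point. The code uses modern C# syntax rather than the C# 2.0 of the time.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical element type; the names here are illustrative only.
public class Element
{
    // The .NET regex text this element contributes.
    public string Pattern { get; }

    public Element(string pattern) { Pattern = pattern; }

    // Implicit conversion: a plain string becomes an escaped literal,
    // so no Literal("...") wrapper is needed.
    public static implicit operator Element(string literal) =>
        new Element(Regex.Escape(literal));

    // Operator overloading: "e >= n" means "at least n repetitions of e".
    public static Element operator >=(Element e, int min) =>
        new Element("(?:" + e.Pattern + "){" + min + ",}");

    // C# insists that >= and <= be overloaded together (error CS0216),
    // so "e <= n" is given the meaning "at most n repetitions of e".
    public static Element operator <=(Element e, int max) =>
        new Element("(?:" + e.Pattern + "){0," + max + "}");
}

public static class Shorthand
{
    // Stand-in for the article's Space: one whitespace character.
    public static readonly Element Space = new Element(@"\s");

    // "params": the comma-separated list arrives as a single array.
    public static Regex Concat(params Element[] parts) =>
        new Regex(string.Concat(parts.Select(p => p.Pattern)));
}
```

For example, Shorthand.Concat("<div", Shorthand.Space >= 0, "class=\"game\"") builds the pattern <div(?:\s){0,}class="game". One design consequence worth noting: because C# requires comparison operators in pairs, choosing >= for "at least" forces some meaning onto <= as well.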

Here’s a full example of the "shorthand syntax" so you can see how the method names are resolved. I have declared the methods and properties as members of the class SmartRegex, and so that I don’t have to qualify them, I’m using the constructor of a derived class as a "container" for the list of expressions. Here, GameFinder is a subclass of SmartRegex that represents our desired regex. It contains nothing but this constructor:
class GameFinder : SmartRegex
{
    public GameFinder() : base(
        "<div",
        Space >= 0,
        "class=\"game\"",
        Space >= 0,
        "id=\"",
        Group(Digit >= 1).Named("gameId"),
        "-game\"",
        Group(Anything >= Lazy(0)).Named("content"),
        "<!--gameStatus",
        Space >= 0,
        "=",
        Space >= 0,
        Group(Digit >= 1).Named("gameState"),
        "-->"
    ) { }
}

SmartRegex findGamesPattern = new GameFinder();


That's the entire regex example: a class that defines our "game finder" regex, and a variable where we instantiate the class.
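The post doesn't show SmartRegex itself, so here is a minimal sketch of one way it could be written, assuming .NET's System.Text.RegularExpressions as the back end. Only the names GameFinder uses (Space, Digit, Anything, Group, Named, Lazy) come from the example; Part, GroupPart, LazyCount and ToRegex are hypothetical, and treating Anything as "any character including newlines" is an assumption. One constraint shapes the design: the arguments to base(...) are evaluated before the GameFinder object exists and cannot use "this", so the helpers must be static members of the base class.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical SmartRegex base class: one possible way to back GameFinder.
public class SmartRegex
{
    // Marker produced by Lazy(n): "at least n repetitions, matched lazily".
    public struct LazyCount
    {
        public readonly int Min;
        public LazyCount(int min) { Min = min; }
    }

    // One piece of a pattern, holding its .NET regex text.
    public class Part
    {
        public readonly string Pattern;
        public Part(string pattern) { Pattern = pattern; }

        // String literals convert to escaped literal parts.
        public static implicit operator Part(string literal) =>
            new Part(Regex.Escape(literal));

        // p >= n: at least n greedy repetitions; <= is the required partner.
        public static Part operator >=(Part p, int min) =>
            new Part("(?:" + p.Pattern + "){" + min + ",}");
        public static Part operator <=(Part p, int max) =>
            new Part("(?:" + p.Pattern + "){0," + max + "}");

        // p >= Lazy(n): at least n lazy repetitions; <= is again the partner.
        public static Part operator >=(Part p, LazyCount lazy) =>
            new Part("(?:" + p.Pattern + "){" + lazy.Min + ",}?");
        public static Part operator <=(Part p, LazyCount lazy) =>
            new Part("(?:" + p.Pattern + "){0," + lazy.Min + "}?");
    }

    // Result of Group(...): a numbered group that Named(...) turns into (?<name>...).
    public class GroupPart : Part
    {
        private readonly Part inner;
        public GroupPart(Part inner) : base("(" + inner.Pattern + ")") { this.inner = inner; }

        public Part Named(string name) =>
            new Part("(?<" + name + ">" + inner.Pattern + ")");
    }

    // Static, because a subclass's base(...) arguments cannot touch "this".
    protected static readonly Part Space = new Part(@"\s");
    protected static readonly Part Digit = new Part(@"\d");
    // Any character including newlines (an assumption about "Anything").
    protected static readonly Part Anything = new Part(@"[\s\S]");
    protected static GroupPart Group(Part p) => new GroupPart(p);
    protected static LazyCount Lazy(int min) => new LazyCount(min);

    private readonly Regex regex;

    // "params" collects the comma-separated list passed up by the subclass.
    protected SmartRegex(params Part[] parts)
    {
        regex = new Regex(string.Concat(parts.Select(p => p.Pattern)));
    }

    public Regex ToRegex() => regex;
}
```

With a base class along these lines, the GameFinder class above compiles as written, and new GameFinder().ToRegex().Match(html) exposes the gameId, content and gameState groups.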

I hope this post serves as "food for thought". I find it interesting that the "shorthand interface" relies on operator overloading, implicit conversions, and "varargs" (methods taking variable numbers of arguments). Java has historically lacked all of those features: varargs only arrived in Java 1.5, and Java still has neither operator overloading nor user-defined implicit conversions. Because Java lacked those features, Java developers used the fluent style as a workaround.

Do we need the same workaround in C#?


PS: escaping all those quote characters above annoyed me (as it does every time I work with strings containing double quotes), so I made this feature suggestion to Microsoft.