Saturday, May 05, 2007

Implementing Workflow with Persistent Iterators

Why do workflow tools use diagrams and XML instead of regular programming languages? It’s a good question. When it comes to the core programming tasks (looping, conditionals and abstraction), XML does a lousy job. (As Koz’s post on BPEL explains.) As for workflow diagrams, they're just a visual representation of the XML.

So why are smart people writing workflows in XML instead of Ruby, C# or Java? If we ignore marketing, hype and the fact that XML is "just so darn cool", we’re left with one good reason. It is the "showstopper" that prevents workflow development in ordinary languages.

In this post, I’m going to describe that showstopper. I’m also going to solve it. With the solution presented here, workflows really can be written in ordinary languages.

By the way, the solution presented here is in C#. As far as I can tell, it is not possible in Java. It might be possible in Python or Ruby...

The Problem

The problem is simple: workflows have really big pauses in them. The execution of a workflow may be paused for days, weeks or even months while it waits for some external event. Normal programming languages don’t handle this well. Imagine a long routine that represents the logic of a particular workflow. You can’t suspend the method at the halfway point, wait for three weeks (maybe even rebooting the machine while you wait), and then have it pick up exactly where it left off, resuming at the exact line where it paused with all its local variables exactly as they were before.

Here’s a simplified example of the problem:

public void ReviewDocument()
{
    SendToReviewer();
    WaitForReviewersReply(); // might take weeks
    PublishDocument();
}
How do we get the method to resume at the right place when the reviewer finally replies? You can’t just call the method again, because it would start again from the beginning.

I believe this is why smart people are writing workflows in XML instead of real languages. They construct a representation of the logic in XML, and write a workflow engine which can (re)serialize that representation (complete with its internal state) at each pause in the workflow. When the wait is over, everything gets deserialized and it resumes where it left off.
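To make the contrast concrete, here is a minimal sketch (hypothetical, not taken from any real workflow product) of why the data-driven approach makes pausing easy: when the workflow is a list of steps, the engine's entire state is a step index, which is trivial to save and reload. The names `DataDrivenWorkflow` and `RunUntilPause` are invented for illustration, and a string array stands in for the XML.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: the workflow is data (a list of step names), and the
// engine's only state is the index of the next step to run.
public static class DataDrivenWorkflow
{
    // The "program": a declarative list of steps, as an XML file might describe it.
    public static readonly string[] Steps =
        { "SendToReviewer", "WaitForReviewersReply", "PublishDocument" };

    // Runs steps starting at 'nextStep' until it reaches a wait step or the end.
    // Returns the index the caller should persist (e.g. write to disk).
    public static int RunUntilPause(int nextStep, List<string> log)
    {
        while (nextStep < Steps.Length)
        {
            string step = Steps[nextStep];
            nextStep++;
            if (step.StartsWith("WaitFor", StringComparison.Ordinal))
            {
                log.Add("paused at " + step);
                return nextStep; // next time, resume after the wait step
            }
            log.Add("ran " + step);
        }
        return nextStep;
    }

    public static void Main()
    {
        var log = new List<string>();
        int saved = RunUntilPause(0, log);   // first run: stops at the wait
        // ... weeks later, possibly after a reboot, reload 'saved' from disk ...
        saved = RunUntilPause(saved, log);   // second run: finishes the workflow
        Console.WriteLine(string.Join(", ", log));
    }
}
```

The price is that anything beyond a straight sequence of steps (loops, conditionals, local variables) has to be encoded in the data as well, which is exactly where XML gets awkward.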

Ironically, all this XML stuff is unnecessary in C# 2.0 and later, which leads us to the solution...

Solution: Part 1 – Persistent Iterators

In C# 2.0 and later, you really can get a method to stop and then resume where it left off. The secret is the new "yield" keyword. In .NET, a method with one or more "yields" in it is called an iterator.

Here’s an example. The first time the caller asks it for a value it returns "One", the second time "Two", and the third time "Three":

public IEnumerable<string> Foo()
{
    yield return "One";
    yield return "Two";
    yield return "Three";
}
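C# makes this possible with some special behind-the-scenes magic: the compiler rewrites the method into a hidden class whose fields record where the method is up to and the local variables that must survive between yields. But there’s a catch: all that magic happens in memory. That’s no use if we want a three-week pause complete with system reboots.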
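For the curious, here is roughly what that hidden class amounts to for the `Foo` example above. This is a simplified hand-written equivalent, not the compiler's actual output (the real class has a generated name such as `<Foo>d__0`, and its layout is an implementation detail). The point is that the method's whole position lives in ordinary fields, which is what makes saving it possible.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified, hand-written equivalent of the state machine the compiler
// generates for an iterator yielding "One", "Two", "Three".
public sealed class FooIterator : IEnumerator<string>
{
    // Where to resume: 0 = not started, 1..3 = after the nth yield, -1 = finished.
    private int state;
    private string current;

    public string Current => current;
    object IEnumerator.Current => current;

    public bool MoveNext()
    {
        switch (state)
        {
            case 0: current = "One";   state = 1; return true;
            case 1: current = "Two";   state = 2; return true;
            case 2: current = "Three"; state = 3; return true;
            default: state = -1; return false;
        }
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}

public static class FooIteratorDemo
{
    public static void Main()
    {
        var it = new FooIterator();
        while (it.MoveNext())
            Console.WriteLine(it.Current);   // One, Two, Three (one per line)
    }
}
```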

To achieve our goal we have to serialize all that magic out to disk. We call the method once, serialize the iterator state to disk, then three weeks later we deserialize the iterator state and make the next call to the method.

That might sound difficult, but as it turns out, it’s not. Thanks to Microsoft’s implementation of "yield" (and the flexibility of .NET serialization) it only takes a few lines of code. Here it is in action:



Note that I ran the program three times. Each time it got the next value from the sample method. That confirms that state is being persisted outside the process, which means it can cope with arbitrarily long delays. You could run it today and have it yield "One", reboot overnight, and still have it yield "Two" when you run it tomorrow.
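As a rough illustration of the idea (not necessarily the approach taken in the downloadable source), here is a minimal sketch that captures an iterator's position by reading the fields of the compiler-generated object with reflection, then writes them into a fresh instance. It relies on compiler implementation details (hidden field names like `<>1__state`), and it stops at an in-memory dictionary: surviving a reboot would additionally require writing that dictionary to disk with a serializer of your choice. `IteratorState`, `Capture` and `Restore` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Sketch: persist an iterator's position by copying the fields of the
// compiler-generated state machine. Depends on compiler implementation details.
public static class IteratorState
{
    private const BindingFlags AllInstanceFields =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    // Copies every instance field of the iterator object into a dictionary.
    // In a real engine, this dictionary is what you would write to disk.
    public static Dictionary<string, object> Capture(object iterator) =>
        iterator.GetType()
                .GetFields(AllInstanceFields)
                .ToDictionary(f => f.Name, f => f.GetValue(iterator));

    // Writes previously captured values into a fresh iterator of the same type.
    public static void Restore(object iterator, Dictionary<string, object> saved)
    {
        foreach (FieldInfo f in iterator.GetType().GetFields(AllInstanceFields))
        {
            if (saved.TryGetValue(f.Name, out object value))
                f.SetValue(iterator, value);
        }
    }
}

public static class IteratorDemo
{
    public static IEnumerator<string> Foo()
    {
        yield return "One";
        yield return "Two";
        yield return "Three";
    }

    public static void Main()
    {
        // "Run 1": advance once, save the state, then discard the iterator.
        IEnumerator<string> first = Foo();
        first.MoveNext();
        Console.WriteLine(first.Current);    // One
        var saved = IteratorState.Capture(first);

        // "Run 2": create a brand-new iterator, restore the fields, continue.
        IEnumerator<string> second = Foo();
        IteratorState.Restore(second, saved);
        second.MoveNext();
        Console.WriteLine(second.Current);   // Two
    }
}
```

Because the restored fields include the state number, the fresh iterator resumes after the first `yield` instead of starting from the beginning. This is also why the technique is sensitive to how the compiler generates iterators.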

(Download source code.)

This is the first part of our solution: persist state between yields. (Update: See Microsoft's comments here).

Solution: Part 2 – Yield "When to Call Me Back"

Let's recap what we've got. We now have a way to call a method, have it stop part way through, pause for a reeeeeeally long time, and then have it resume where it left off. That’s exactly what we need to solve our workflow problem.

There is only one more thing we need to figure out: what should the method return when it yields? Assuming that the method will be called by some hypothetical workflow engine, it should return something which tells the engine when to make the next call.

  • For instance, if the method needs to pause until 7:30 am, it should yield an object which says "Call me back at 7:30".
  • If the method needs to pause until data is updated for customer number 123, it should yield an object which says, "Call me back when customer #123 is updated".

This lets us write our workflow as a perfectly ordinary method, yielding only when a long pause is required.

There must be a degree of polymorphism in the yielded objects, since some of them represent a delay until a fixed time, some a delay until certain data is updated, and others perhaps a delay until a user replies to a system-generated email, and so on. The engine will also need some "smarts" to interpret these objects appropriately: roughly equivalent to the logic in an ordinary XML workflow engine, although different in the implementation details.
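Here is a minimal sketch of both halves, assuming a simple polling engine: a small hierarchy of yielded objects, and an engine check that interprets each kind. The class names `CallBackAt`, `CallBackWhenCustomerUpdated` and `WaitEngine` are hypothetical; `EventToWaitFor` borrows the name used in the workflow examples in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base type for anything a workflow can yield.
public abstract class EventToWaitFor { }

// "Call me back at 7:30."
public sealed class CallBackAt : EventToWaitFor
{
    public DateTime When { get; }
    public CallBackAt(DateTime when) { When = when; }
}

// "Call me back when customer #123 is updated."
public sealed class CallBackWhenCustomerUpdated : EventToWaitFor
{
    public int CustomerId { get; }
    public CallBackWhenCustomerUpdated(int customerId) { CustomerId = customerId; }
}

// The engine's "smarts": decide whether a yielded wait is over, given what the
// engine currently knows (the time, and which customers have been updated).
public static class WaitEngine
{
    public static bool IsSatisfied(EventToWaitFor wait, DateTime now, ISet<int> updatedCustomers) =>
        wait switch
        {
            CallBackAt t => now >= t.When,
            CallBackWhenCustomerUpdated c => updatedCustomers.Contains(c.CustomerId),
            _ => throw new NotSupportedException($"Unknown wait type: {wait.GetType().Name}")
        };

    public static void Main()
    {
        var updated = new HashSet<int>();
        EventToWaitFor[] waits =
        {
            new CallBackAt(new DateTime(2007, 5, 5, 7, 30, 0)),
            new CallBackWhenCustomerUpdated(123)
        };

        DateTime now = new DateTime(2007, 5, 5, 7, 0, 0);
        foreach (var w in waits)
            Console.WriteLine(IsSatisfied(w, now, updated));   // False, False

        now = now.AddMinutes(30);
        updated.Add(123);
        foreach (var w in waits)
            Console.WriteLine(IsSatisfied(w, now, updated));   // True, True
    }
}
```

A production engine would probably not poll; it would more likely index pending waits by time and by data key and react to timers or change notifications. The `switch` keeps the interpretation in the engine, as described above; giving each class a virtual method instead would work equally well.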

At this point I must apologize, because I have not had time to put together a working example. If there is enough interest in this post, I’ll try to publish one in the future. In the meantime, here’s how you might write a simple workflow:

public IEnumerator<EventToWaitFor> Foo()
{
    DoSomething();

    // Ask the engine to call back in 5 hours
    yield return new TimeDelayEvent(new TimeSpan(5, 0, 0));

    DoSomethingElse();
}
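To show how an engine might drive such a method, here is a minimal runnable sketch. `EventToWaitFor` and `TimeDelayEvent` are minimal stand-ins for the hypothetical types in the example above, `DoSomething` and `DoSomethingElse` are replaced by log entries, and `Resume` is an invented name for one "tick" of the engine. The engine calls `MoveNext`, reads `Current` to learn what to wait for, and treats a `false` return as the end of the workflow.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-ins for the hypothetical types in the example above.
public abstract class EventToWaitFor { }

public sealed class TimeDelayEvent : EventToWaitFor
{
    public TimeSpan Delay { get; }
    public TimeDelayEvent(TimeSpan delay) { Delay = delay; }
}

public static class DelayDemo
{
    // The workflow from the text; the real work is replaced by log entries.
    public static IEnumerator<EventToWaitFor> Foo(List<string> log)
    {
        log.Add("DoSomething");

        // Ask the engine to call back in 5 hours
        yield return new TimeDelayEvent(new TimeSpan(5, 0, 0));

        log.Add("DoSomethingElse");
    }

    // One "tick" of a toy engine: resume the workflow and report what it is
    // waiting for next, or null once the workflow has finished.
    public static EventToWaitFor Resume(IEnumerator<EventToWaitFor> workflow) =>
        workflow.MoveNext() ? workflow.Current : null;

    public static void Main()
    {
        var log = new List<string>();
        var wf = Foo(log);

        EventToWaitFor wait = Resume(wf);   // runs DoSomething, then pauses
        Console.WriteLine($"Call back in {((TimeDelayEvent)wait).Delay.TotalHours} hours");

        // A real engine would persist 'wf' here and schedule the callback.
        wait = Resume(wf);                  // runs DoSomethingElse, then finishes
        Console.WriteLine(wait == null ? "Workflow finished" : "Still waiting");
        Console.WriteLine(string.Join(", ", log));
    }
}
```

The gap between the two `Resume` calls is where the persistence from Part 1 would come in.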
To make this really workable, a little syntactic sugar may help, so in the next example I have introduced a method called "Until", just to make things read more smoothly. Again, I must apologize for not having more time to explain this example, so please leave a comment below if I have failed to explain it properly.

public IEnumerator<EventToWaitFor> ApproveDocument(Document doc)
{
    // Declared outside the loop so the while condition can see it
    ReviewAction review;
    do
    {
        // Initiate a review and yield (i.e. wait) until it is complete
        review = new ReviewAction("qa@example.com", doc);
        yield return Until(review.Completed);  // might take weeks

        // (omitted) ... email original author to notify them
        // of review outcome ...

    } while (!review.DocumentIsAcceptable);

    // Yield (i.e. wait) until scheduled publication date
    yield return Until(doc.PublicationDate);  // might take days
    PublishDocument(doc);
}
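One possible shape for the `Until` sugar is a set of overloads that wrap their argument in the right kind of yielded object. This sketch is hypothetical: it assumes `doc.PublicationDate` is a `DateTime` and models `review.Completed` as the name of an external signal (a string), since the post doesn't define those types. `UntilTime`, `UntilSignal` and `WorkflowSugar` are invented names.

```csharp
using System;

// Hypothetical base type for anything a workflow can yield.
public abstract class EventToWaitFor { }

// Wait until a fixed point in time, e.g. a publication date.
public sealed class UntilTime : EventToWaitFor
{
    public DateTime When { get; }
    public UntilTime(DateTime when) { When = when; }
}

// Wait until a named external signal arrives, e.g. "review 42 completed".
// A plain string keeps the yielded object easy to persist.
public sealed class UntilSignal : EventToWaitFor
{
    public string SignalName { get; }
    public UntilSignal(string signalName) { SignalName = signalName; }
}

public static class WorkflowSugar
{
    // Overloads chosen by argument type, so a workflow body can simply say
    // "yield return Until(...)" for either kind of wait.
    public static EventToWaitFor Until(DateTime when) => new UntilTime(when);
    public static EventToWaitFor Until(string signalName) => new UntilSignal(signalName);

    public static void Main()
    {
        Console.WriteLine(Until(new DateTime(2007, 6, 1)).GetType().Name);   // UntilTime
        Console.WriteLine(Until("review-42-completed").GetType().Name);      // UntilSignal
    }
}
```

Inside a workflow class, `Until` could be inherited from a workflow base class or brought into scope with `using static WorkflowSugar;`. Yielding plain data such as a date or a signal name, rather than a delegate, keeps the yielded objects easy to persist along with the iterator.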
This is the second part of our solution: yield for large pauses, returning an object which says when to resume.

Conclusion

Modern programming languages express logic far better than XML. With persistent iterators, they can also handle the long pauses required in workflows.

At present, there is no production-quality workflow engine which supports this model. But perhaps there might be soon...

Update: see this. Microsoft may change implementation details that affect this approach, but they recognise the usefulness of the technique, so hopefully it will be a case of changing it rather than breaking it.