Saturday, January 31, 2009

Things I'd like to see in the Entity Framework

With the announcement that the Entity Framework (aka EF or LINQ to Entities) will become the favoured Microsoft ORM, many people have blogged about how important it will be for EF to achieve better ease-of-use in .NET 4.0.
I'd like to add my 2c worth on what I think EF should include. (By the way, I've never worked with EF, and won't until at least the .NET 4 version, so for all I know some of these things might be present already. Basically, this list is my attempt to capture some of the lesser-known lessons learned from using LINQ to SQL.)
1. Metamodel
An accessible metamodel that is at least as capable as that in LINQ to SQL (i.e. MetaModel and friends). On my current project the metamodel has saved us over and over again - allowing us to do things that we needed but couldn't do with just the basic generated entities. I think it's extremely important to have this kind of "back door" that lets programmers step outside of the POCO model and access mapping-related state and metadata.

2. Lifecycle information in constructor
A way to tell the difference between explicit construction of an object in code, versus construction by EF as the object is loaded out of the database. This is a gap that we've seen in LINQ to SQL. There are partial methods for various parts of the object lifecycle (such as creation and pre-save validation) but there is no way to tell whether the creation event is happening due to a load from the database, or the "new"-ing of an instance by application code. It seems wrong to have a full set of "hooks" into the object lifecycle except for this.
My suggestion would be to have the generated entity class include two constructors. One would be a parameterised constructor. Its parameter would be an enum that indicates whether creation is due to loading or some other reason. E.g.
public Person(CreationContext context)
{.. }
where context may be something like CreationContext.NewInstance or CreationContext.Loading.
The other generated constructor would simply look like this:
public Person() : this(CreationContext.NewInstance) { }
So, when we write "new Person()" in code, the parameterless constructor passes the NewInstance value through to the "real" constructor. When the framework loads an object from the database, it should call the parameterised constructor directly, passing CreationContext.Loading. Then, at construction time, the object can always tell the purpose for which it is being constructed: does it represent a brand-new entity, or is it, conceptually, an existing entity which is being "deserialized" from the database?
The most obvious use for this is objects which, when created, should always have particular child objects. E.g. an order that always has at least one order line. If the order doesn't know why its constructor has been called, then it cannot go ahead and make a default child instance - one might already exist in the database in the CreationContext.Loading scenario.
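To make the order example concrete, here is a minimal sketch of how the two constructors might fit together. Everything here is an assumption for illustration: CreationContext is the enum I'm proposing (it doesn't exist in EF or LINQ to SQL), and Order and OrderLine are hypothetical hand-written classes rather than real generated entities.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum: tells the entity why it is being constructed.
public enum CreationContext
{
    NewInstance, // application code wrote "new Order()"
    Loading      // the framework is materialising an existing row
}

// Hypothetical child entity.
public class OrderLine
{
}

// Hypothetical parent entity that must always have at least one line.
public class Order
{
    private readonly List<OrderLine> _lines = new List<OrderLine>();

    public IList<OrderLine> Lines
    {
        get { return _lines; }
    }

    // What application code calls; forwards to the "real" constructor.
    public Order() : this(CreationContext.NewInstance)
    {
    }

    // What the framework would call directly, passing Loading.
    public Order(CreationContext context)
    {
        if (context == CreationContext.NewInstance)
        {
            // Brand-new order: safe to create the default child.
            _lines.Add(new OrderLine());
        }
        // When loading, do nothing: the existing lines come from the database.
    }
}
```

With this in place, "new Order()" starts out with one default line, while "new Order(CreationContext.Loading)" starts out empty and waits for the framework to populate it.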
3. Don't presume or require a big impedance mismatch
There are some advantages to LINQ to SQL's very direct mapping approach. In particular, because the database is virtually identical to the object model, the team only has to learn, understand and remember one model. This means that Ubiquitous Language extends all the way down to the database. (Particularly important if SQL, rather than objects, will be the basis for report generation; but still useful even if you're not doing SQL reporting, IMHO.)
As soon as you get complex mappings, which are touted as an advantage of EF, the team has to understand two models, and the relationship between them.
In some applications, such mapping is indeed necessary. Perhaps an existing database doesn't match good object design; or the object design is complex enough to require non-trivial mappings.
However, in projects where the same team is building the application and database from scratch, and where the object model is not too complex, it can be a good thing to have a simple mapping that's virtually one-to-one. EF should allow this, and the EF documentation should present it as a valid option.
4. Hand-writable entities
Property getters and setters, on domain entities, should be simple enough to write by hand. This can be done by some kind of AOP (to inject "smarts" into ordinary-looking properties) or by ActiveSharp. LightSpeed is a good example of an ORM in which all the "smarts" of property setting and getting reside behind simple hand-writable properties.
[Disclaimer: I wrote ActiveSharp and LightSpeed (optionally) uses it. My point here is not to promote my own code, but to promote the idea that entity properties should be concise. Concise properties, whether hand-written or codegen'd, make the code much easier to work with.]
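As a rough illustration of what "simple enough to write by hand" could mean, here is a minimal sketch in which the "smarts" (here, just change tracking) live in a base class. EntityBase, Set and IsDirty are hypothetical names made up for this example; this is not the ActiveSharp or LightSpeed API, and it passes the property name explicitly to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class that holds the "smarts" behind each property.
public abstract class EntityBase
{
    private readonly HashSet<string> _changed = new HashSet<string>();

    // True once any property has been set to a different value.
    public bool IsDirty
    {
        get { return _changed.Count > 0; }
    }

    // Assigns the field and records the change, but only if the value differs.
    protected void Set<T>(ref T field, T value, string propertyName)
    {
        if (EqualityComparer<T>.Default.Equals(field, value))
        {
            return;
        }
        field = value;
        _changed.Add(propertyName);
    }
}

// The entity itself stays short enough to type by hand.
public class Person : EntityBase
{
    private string _name;

    public string Name
    {
        get { return _name; }
        set { Set(ref _name, value, "Name"); }
    }
}
```

The point is only the shape of the Name property: one line for the getter and one for the setter, with everything else hidden in the base class.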
5. Object relationships should be able to span diagrams (and assemblies)
An annoying limitation of LINQ to SQL (and LINQ to Entities) is that, in practice, all your entities must be in one designer diagram. I say that because, if they are not in one diagram, then you can't have meaningful relationships (and lazy loading) between objects in different diagrams.
This relates to point 4, above. If properties are simple enough to write by hand, then you can use several different diagrams and then "stitch them together" using hand-coded relationship properties.
E.g. you might have one diagram focusing on entities related to sales, and another on entities related to production. For the (hopefully few) relationships that go from entities in the Sales diagram to entities in the Production diagram, you can create them by hand as long as hand-authoring is a viable option.
I'm planning to do this on my current LINQ to SQL project, possibly using some of this code.
Finally, such a solution should allow not just entities to be defined on different diagrams, but also those diagrams (and their generated entities) to live in different assemblies/projects.
Well, that's the end of my 2c worth.... for now ;-)