Saturday, January 31, 2009

Things I'd like to see in the Entity Framework

With the announcement that the Entity Framework (aka EF or LINQ to Entities) will become the favoured Microsoft ORM, many people have blogged about how important it will be for EF to achieve better ease-of-use in .NET 4.0.

I'd like to add my 2c worth on what I think EF should include. (By the way, I've never worked with EF, and won't until at least the .NET 4 version, so for all I know some of these things might be present already. Basically, this list is my attempt to capture some of the lesser-known lessons learned from using LINQ to SQL.)

1. Metamodel

An accessible metamodel that is at least as capable as that in LINQ to SQL (i.e. MetaModel and friends). On my current project the metamodel has saved us over and over again - allowing us to do things that we needed but couldn't do with just the basic generated entities. I think it's extremely important to have this kind of "back door" that lets programmers step outside of the POCO model and access mapping-related state and metadata.

2. Lifecycle information in constructor

A way to tell the difference between explicit construction of an object in code, versus construction by EF as the object is loaded out of the database. This is a gap that we've seen in LINQ to SQL. There are partial methods for various parts of the object lifecycle (such as creation and pre-save validation) but there is no way to tell whether the creation event is happening due to a load from the database, or the "new"-ing of an instance by application code. It seems wrong to have a full set of "hooks" into the object lifecycle except for this.

My suggestion would be to have the generated entity class include two constructors. One would be a parameterised constructor. Its parameter would be an enum that indicates whether creation is due to loading or some other reason. E.g.

public Person(CreationContext context)

{.. }

where context may be something like CreationContext.NewInstance or CreationContext.Loading.

The other generated constructor would simply look like this

public Person():this(CreationContext.NewInstance)

{...}

So, when we write "new Person()" in code, the parameterless constructor passes the NewInstance parameter through to the "real" constructor. When the framework loads an object from the database, it should call the parameterized constructor directly, passing CreationContext.Loading. Then, at construction time, the object can always tell the purpose for which it is being constructed: does it represent a brand-new entity, or is it, conceptually, an existing entity which is being "deserialized" from the database?

The most obvious use for this is objects which, when created, should always have particular child objects. E.g. an order that always has at least one order line. If the order doesn't know why its constructor has been called, then it cannot go ahead and make a default child instance - one might already exist in the database in the CreationContext.Loading scenario.
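A minimal C# sketch of the order example, under the assumptions of this wish-list: CreationContext is the hypothetical enum proposed above (it does not exist in LINQ to SQL or EF), and Order and OrderLine are illustrative hand-written classes rather than generated entities.

```csharp
using System.Collections.Generic;

// Hypothetical enum from the suggestion above; not part of any shipping framework.
public enum CreationContext { NewInstance, Loading }

public class OrderLine { }

public class Order
{
    public List<OrderLine> Lines { get; } = new List<OrderLine>();

    // Used by application code: "new Order()" always means a brand-new entity.
    public Order() : this(CreationContext.NewInstance) { }

    // The "real" constructor. The framework would call this directly,
    // passing CreationContext.Loading, when materialising a row.
    public Order(CreationContext context)
    {
        if (context == CreationContext.NewInstance)
        {
            // Brand-new order: safe to create the mandatory first line.
            Lines.Add(new OrderLine());
        }
        // When loading, do nothing here: the existing lines come from the database.
    }
}
```

With this shape, new Order() starts with one default line, while new Order(CreationContext.Loading) starts empty and leaves the child collection for the framework to populate.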

3. Don't presume or require a big impedance mismatch

There are some advantages to LINQ to SQL's very direct mapping approach. In particular, because the database is virtually identical to the object model the team only has to learn, understand and remember one model. This means that Ubiquitous Language extends all the way down to the database. (Particularly important if SQL, rather than objects, will be the basis for report generation; but still useful even if you're not doing SQL reporting, IMHO.)

As soon as you get into the complex mappings, which are touted as an advantage of EF, the team has to understand two models, and the relationship between them.

In some applications, such mapping is indeed necessary. Perhaps an existing database doesn't match good object design; or the object design is complex enough to require non-trivial mappings.

However, in projects where the same team is building the application and database from scratch, and where the object model is not too complex, it can be a good thing to have a simple mapping that's virtually one-to-one. EF should allow this, and the EF documentation should present it as a valid option.

4. Hand-writable entities

Property getters and setters, on domain entities, should be simple enough to write by hand. This can be done by some kind of AOP (to inject "smarts" into ordinary-looking properties) or by ActiveSharp. LightSpeed is a good example of an ORM in which all the "smarts" of property setting and getting reside behind simple hand-writable properties.

[Disclaimer: I wrote ActiveSharp and LightSpeed (optionally) uses it. My point here is not to promote my own code, but to promote the idea that entity properties should be concise. Concise properties, whether hand-written or codegen'd, make the code much easier to work with.]

5. Object relationships should be able to span diagrams (and assemblies)

An annoying limitation of LINQ to SQL (and LINQ to Entities) is that, in practice, all your entities must be in one designer diagram. I say that because, if they are not in one diagram, then you can't have meaningful relationships (and lazy loading) between objects in different diagrams.

This relates to point 4, above. If properties are simple enough to write by hand, then you can use several different diagrams and then "stitch them together" using hand-coded relationship properties.

E.g. you might have one diagram focusing on entities related to sales, and another on entities related to production. For the (hopefully few) relationships that go from entities in the Sales diagram to entities in the Production diagram, you can create them by hand as long as hand-authoring is a viable option.

I'm planning to do this on my current LINQ to SQL project, possibly using some of this code.

Finally, such a solution should not just allow entities to be defined on different diagrams, but also for those diagrams (and their generated entities) to be in different assemblies/projects.

Well, that's the end of my 2c worth.... for now ;-)

Tuesday, December 02, 2008

What's up with P&P?

What's up with the interface between Microsoft's Patterns and Practices group (P&P) and the wider community?

P&P are writing a new version of their guidance for architecture on the Microsoft platform, and they're asking for community feedback. But by and large the community isn't giving any! And, when feedback is offered, P&P aren't necessarily replying.

For instance:

After the P&P Knowledge base project had been up for about 2 months, I counted exactly three meaningful comments on the substance of what MS had written. Of those three comments, two were from me(!), and to this day those two remain unanswered!

Confusingly, there are now three different CodePlex projects in which P&P are seeking community input. There's the App Arch Guide knowledge base, the App Arch Guide Book, and the App Arch Community Contribution project (which, as if to prove my point, is completely empty)!

So, community, what's up! Has no-one got anything to say about architecture!!!!

And Microsoft, what's up with you? What are you doing to make this work?

Update 1 Feb 09: Microsoft recently contacted me to follow up on my questions. Thanks :-) As much as I appreciate that contact (and I do), the overall lack of engagement seems to remain. I still don't see much meaningful involvement from/with the community. The Community Contribution project is still empty, except for a brief statement of its purpose, which has only been read 38 times.

Tuesday, November 18, 2008

Hosting a window from one process inside another

Every time this topic comes up, I seem to have lost my bookmarks on it. So, here's a blog entry so I won't lose them again...

Under Windows it is possible to visually "dock" the main window of one process inside a window belonging to another process. You get the visual effect of one program, but there are still two completely separate exes involved.

I once used this to "host" an EXE inside Internet Explorer. We wrote a little tiny ActiveX (this was in the dark ages before managed code). All the ActiveX did was start our EXE, and then "dock" the EXE's main window inside the client area of the ActiveX. It looked like our EXE was the ActiveX - but our EXE had no idea that any of this was going on. It was just doing its thing, running as an independent process.

The secret is the Windows API function SetParent. You can use it to set a window from process A as the parent of the main window of process B.

I haven't done this for a while. As I recall you need to make another call to make the hosted window look like a borderless child window (SetWindowLong IIRC) and I think I also had to detect resizing of the "host" and programmatically resize the child.
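For readers working in managed code, here is a rough P/Invoke sketch of those steps. It is an illustration, not the original ActiveX code: WindowHost, Dock and ToBorderlessChildStyle are hypothetical names, the constants are the standard winuser.h values, and the style is changed before reparenting because that is what the SetParent documentation appears to recommend. It only does anything useful on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

public static class WindowHost
{
    // Standard Win32 constants (winuser.h).
    public const int GWL_STYLE = -16;
    public const int WS_CHILD = 0x40000000;
    public const int WS_CAPTION = 0x00C00000;    // title bar and border
    public const int WS_THICKFRAME = 0x00040000; // sizing border

    [DllImport("user32.dll", SetLastError = true)]
    static extern IntPtr SetParent(IntPtr hWndChild, IntPtr hWndNewParent);

    [DllImport("user32.dll", SetLastError = true)]
    static extern int GetWindowLong(IntPtr hWnd, int nIndex);

    [DllImport("user32.dll", SetLastError = true)]
    static extern int SetWindowLong(IntPtr hWnd, int nIndex, int dwNewLong);

    [DllImport("user32.dll", SetLastError = true)]
    static extern bool MoveWindow(IntPtr hWnd, int x, int y, int width, int height, bool repaint);

    // Pure helper: strip the caption and sizing border, mark the window as a child.
    public static int ToBorderlessChildStyle(int style)
    {
        return (style & ~(WS_CAPTION | WS_THICKFRAME)) | WS_CHILD;
    }

    // Docks 'child' (e.g. another process's main window) inside 'host'.
    // Call again with the new size whenever the host is resized.
    public static void Dock(IntPtr child, IntPtr host, int width, int height)
    {
        int style = GetWindowLong(child, GWL_STYLE);
        SetWindowLong(child, GWL_STYLE, ToBorderlessChildStyle(style));
        SetParent(child, host);
        MoveWindow(child, 0, 0, width, height, true);
    }
}
```

A typical call, assuming a Windows Forms host, would pass Process.MainWindowHandle of the started EXE as child and a panel's Handle as host, then call Dock again from the panel's resize event.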

Here are some links on the subject, following a rather brief Google:

http://geekswithblogs.net/gyoung/archive/2006/04/26/76521.aspx

http://www.codeguru.com/forum/showthread.php?threadid=234862

http://www.codeproject.com/KB/miscctrl/winwordcontrol.aspx

I'm fairly sure I learnt this technique from something on Microsoft's site, in about 2000. But this post says they are no longer recommending it (at least, not for hosting Office apps) so perhaps that explains why I can't find the original MS post.

Monday, November 03, 2008

LINQ to SQL Presentation

Here is a copy of the presentation I gave at the 2008 Christchurch Code Camp. (zipped ppt)

One of the key points of the presentation was to cover what LINQ to SQL includes, both out-of-the-box, and with the additions that we have been able to build on top of it at Optimation. I put together the list that follows from two sources: Ayende's list of 25 things your OR Mapper must do; plus other things that we found useful. Ayende's points are in normal font; my additions to his are shown in italics.

Out of the box, you get:

CRUD and querying
Transactions
(Bi-directional) associations
Lazy loading
Polymorphic queries (single-table inheritance only)
"Dirty Tracking" with ability to return full change set
Loading properties without loading whole object
Identity Map
Concurrency Control
Acceptable debuggability
Safe multi-threading (1 datacontext = 1 user on 1 thread is the general rule of thumb)
Well-defined exception policy
Lifecycle events (create/update etc)
Composite primary keys
Automatic dependency ordering when saving changes
Paging support (via Skip/Take)
Aggregation support (group/max/min etc)
Original value tracking
Property change notifications
Runtime SQL logging
Ability to generate database from object model
Persistence by reachability
Enhanced "reflection" via LINQ to SQL MetaModel
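To make the paging and aggregation items concrete, here is a small sketch using the standard LINQ operators. QuerySketch, Page and MaxByGroup are hypothetical helper names. The code is written against IQueryable so that, as far as I know, the same calls would be translated to SQL when the source is a LINQ to SQL table; the usage below runs it against an in-memory sequence instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QuerySketch
{
    // Paging via Skip/Take. pageIndex is zero-based. An ordered source is
    // required so that the pages are deterministic.
    public static List<T> Page<T>(IOrderedQueryable<T> source, int pageIndex, int pageSize)
    {
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }

    // Aggregation: the maximum value within each group.
    public static Dictionary<TKey, int> MaxByGroup<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> key,
        Expression<Func<T, int>> value)
    {
        return source.GroupBy(key, value)
                     .Select(g => new { g.Key, Max = g.Max() })
                     .ToDictionary(x => x.Key, x => x.Max);
    }
}
```

For example, with var numbers = Enumerable.Range(1, 25).AsQueryable(), the call QuerySketch.Page(numbers.OrderBy(n => n), 2, 10) returns the third page (21 to 25), and QuerySketch.MaxByGroup(numbers, n => n % 2, n => n) returns 24 for the even group and 25 for the odd group.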

With our add-ons to LINQ to SQL you (or at least we ;-) also get

Undo (both to "as fetched" and "current DB state")
Flexible eager load
Unit testability
Test: is this object new?
Check mapping against DB
In-memory savepoints (like DB savepoints, but for in-memory entities which haven't been saved yet)
Proper serialization of entities (none of the usual LINQ to SQL serialization restrictions)
Clone trees of related entities (clone parent with children)
Delete-any-object (even if it is a new unsaved one; automatically sever relationships to other objects)
Get data context from object (based on this with only very minor changes)
Get reachable objects (approximates "find me all objects in the datacontext")

You get only limited support for:

Caching at unit-of-work (datacontext) level
Custom field types (limited to built-in types, enums, xml, .Parse()-able strings and binary ISerializable. Here is the valuable and hard-to-find reference page)

You get no support at all for:

Caching at application-wide level (other than that which is done for you by SQL Server itself)
Cache invalidation policies
Update batching
Cascading update/delete in object model (must rely on DB to do this instead, if you want it)

All in all, it's a fairly strong list. Stronger, I suspect, than many people expect.

As for the rest of the presentation, some of it will make sense from written slides, and some won't ;-) A couple of notes here might help:
  • The solution to comments on entity properties, and refreshing the designer, can be found here. It looks real good, although I haven't got round to trying it yet myself.

  • Here are the hyperlinks to the change set and association bugs. NOTE: Microsoft have just announced that they will fix the latter in .NET 4.0, which goes to prove that they really are going to include LINQ to SQL fixes in that release.


  • Using the "mapping checker" code. These three points will help if you want to use it: LinqUtil.GetModel is our wrapper for MappingSource.GetModel() - see later slides for details; LinqUtil.IsMappedType() is our wrapper for calling IsEntity on a metaType returned from the MetaModel; and Uow.All(type) is our wrapper for DataContext.GetTable(type). Simply copy the code, replacing our wrappers as noted above.


That's all for now. As noted in the presentation, please do comment below or email me.

Sunday, November 02, 2008

Future of LINQ to SQL

Yesterday, I did a presentation on LINQ to SQL.  At about the time I was talking about how great it was, Microsoft was announcing that LINQ to SQL will not be their recommended solution in .NET 4!  They will be recommending LINQ to Entities instead.

Here, in no particular order, are some initial thoughts:

  1. Sorry I wasn't well-informed enough to mention this at the presentation!  It seems like the news hit the net after I left for the Code Camp.  I did hear there'd been speculation from outside Microsoft, but I mistakenly dismissed it as an over-reaction to the recent emphasis on Entity Framework. 
  2. The news is not as bad as it sounds.  The best post on the topic seems to be this one: http://damieng.com/blog/2008/10/31/linq-to-sql-next-steps.  Read that post now, if you have not done so already.  Basically, LINQ to SQL will still be available and there will even be improvements and bug fixes.  In particular "we [Microsoft] are going to make sure LINQ to SQL continues to operate as it should. This doesn’t just mean making sure what we had works in .NET 4.0 but also fixing a number of issues that have arisen as people pick it up for more advanced projects and put it into production environments".
  3. To successfully execute this strategy, Microsoft will have to bring the best of LINQ to SQL to LINQ to Entities - in particular, they will have to bring lightness, speed and simplicity to cases where the mapping is simple and relatively direct.  I hope the recent addition of Andrew Peters to the team will help in that regard, due to his experience on Mindscape's LightSpeed ORM.
  4. I think it will be very important for LINQ to Entities to expose the equivalent of LINQ to SQL's MetaModel - complete with the ability to use it to implement workarounds, as I demonstrated in my presentation.  (We use the MetaAccessors to do all kinds of things). We've been able to use it to plug virtually all the "holes" we've found in LINQ to SQL, and a similar capability is a must in LINQ to Entities if it is to become the favoured product.
  5. Microsoft: to make this kind of announcement, about a product that has been out for less than 12 months, is not a good look!  (LINQ to SQL's production release was roughly 11 months ago).  This is particularly true in the area of data access, where there were already plenty of jokes about a long history of technical changes - ODBC, RDO, DAO, ADO, OLEDB, now ADO.NET.
  6. For projects you start today, with .NET 3.5, I believe the advice I gave in the presentation is still accurate.  For simple mapping consider products like LINQ to SQL and LightSpeed; for complex mapping consider products like LINQ to Entities and NHibernate.   If Microsoft achieves the goals mentioned above, my advice might change for new projects that are started after .NET 4 is released.

Tuesday, October 21, 2008

Presenting at Christchurch Code Camp 2008

I'm presenting at the Christchurch Code Camp on 1 November.

I'm presenting on LINQ-to-SQL and also on my open source project, ActiveSharp. If you were coming to a presentation on either of those topics, what would you like to hear? (This is your chance to shape the presentations, because I haven't finished writing them yet ;-)

At the moment, the abstracts for the two presentations look like this:

Title: Extending LINQ to SQL

Abstract: How to build on LINQ to SQL to add extra capabilities and work around the product’s limitations. The presentation will share techniques we’ve used at Optimation to "push the envelope" when using LINQ to SQL. I’ll outline the base techniques, how we’ve applied them, and the minor limitations that remain unresolved. For those new to LINQ to SQL, the presentation will begin with a whirlwind introduction to the product. It will conclude with some thoughts on how LINQ fits into the "big picture" of .NET ORM tools.


and

Title: Low-Level .NET – a look at ActiveSharp

Abstract: How to get started on working with MSIL, the low-level "assembly" language for .NET. The presentation is based on my experiences writing ActiveSharp, a library which parses MSIL at runtime and also emits its own runtime-generated methods. I’ll introduce the key concepts, cover some techniques to help you "cheat" while you learn, and include examples from the ActiveSharp codebase.


What do you think? Am I focusing on the right points? Are there other aspects of these subjects which you'd prefer me to cover instead? All suggestions welcome...

Saturday, October 04, 2008

How eBoostr Works

I'm using eBoostr to speed up compile times in Visual Studio.

My only complaint with the product was the lack of documentation on what it caches and when. You can view the list of all the files that it is caching, but I couldn't find any documentation on how it actually chooses which files to cache. So, I emailed eBoostr technical support, and they sent back a very helpful and informative reply - which they kindly allowed me to quote in this post.

eBoostr works as follows:
  1. It monitors all read/write requests to the hard disk, and gathers statistics on the most frequently-read files
  2. By default it will "update cache contents each hour during computer idle time automatically". You can also manually request a cache rebuild (if your computer has no idle time(!), or you just want the cache to reflect your most recent usage patterns).
  3. It doesn't mess with write operations. Each write goes straight to the real disk. If the file was cached, the cached copy is invalidated. The cached copy will not become valid again until the next cache rebuild.
  4. If you are using both RAM and USB caching, RAM "has the highest priority" (so gets the most frequently used files, I presume). Files in RAM are not cached on USB.
At first, the algorithm struck me as a little crude. But on reflection I realised that it makes sense. By rebuilding the cache in batch mode (either automatically or manually) eBoostr can make sure that the cache reflects the most commonly-used files. Contrast that with the other approach, which would be to move a file into the cache as soon as it is read - possibly meaning that an infrequently-read file, which maybe you are only going to read once, will bump a more frequently-used file out of the cache.
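The behaviour described above can be modelled in a few lines. This is a toy illustration only, not eBoostr's actual code: BatchRebuiltCache and its members are invented names, the cache holds file paths rather than file contents, and the RAM/USB tiers are left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy model of a cache that is rebuilt in batch from read statistics.
public class BatchRebuiltCache
{
    private readonly int capacity;
    private readonly Dictionary<string, int> readCounts = new Dictionary<string, int>();
    private readonly HashSet<string> cached = new HashSet<string>();

    public BatchRebuiltCache(int capacity) { this.capacity = capacity; }

    // Every read is counted. Returns true on a cache hit.
    // A miss does NOT pull the file into the cache.
    public bool Read(string path)
    {
        readCounts.TryGetValue(path, out int count);
        readCounts[path] = count + 1;
        return cached.Contains(path);
    }

    // Writes go straight to disk; any cached copy is invalid until the next rebuild.
    public void Write(string path)
    {
        cached.Remove(path);
    }

    // Run at idle time or on request: keep only the most frequently read files.
    public void Rebuild()
    {
        cached.Clear();
        foreach (var path in readCounts.OrderByDescending(p => p.Value)
                                       .Take(capacity)
                                       .Select(p => p.Key))
        {
            cached.Add(path);
        }
    }
}
```

Because a miss never changes the cache contents, a file that is read once cannot evict a frequently read one; it can only win a slot at the next rebuild if its read count has earned it.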

Subjectively, eBoostr seems to be working well for me. And as I've mentioned previously, it seems to be getting very high cache hit ratios in my typical daily usage.