Tuesday, April 27, 2010

This blog -- taking a break

Prompted, in part, by my annoyance at Google dropping the FTP support this blog relies on, but mostly by a lack of time, I won't be doing any work on this blog for the foreseeable future.

Commenting on all posts will be disabled.

However I do have plans to write new material on my other "non-technical" blog, especially about "people skills for geeks".  (A topic on which I recently posted this video).

Thanks to everyone who's read and commented on this blog over the past 4 years.

-- John

Memory Leaks in Managed Code

(This is a repost of an old article I wrote in early 2006, before I started this blog)

When writing a Windows Forms application, it's useful to display the current memory usage in the "About" box. It comes in handy when troubleshooting. It usually proves that memory usage is not the problem, but sometimes it proves the opposite: the application is chewing up more and more RAM.

How can that be? How can a managed application leak memory?

Types of Managed Memory Leaks

There are two main types of "leak" in managed applications:

1. Unintended references keeping managed objects alive

You're not using an object any more, so you expect that the garbage collector will clean it up. But the garbage collector doesn't. Why? There is only one possible reason: something, somewhere, still has a reference to the object. Perhaps you put the object in some global (i.e. static) list, or perhaps it is referred to by some other object which you are still using.

So this isn't a "real" leak at all, it just looks like one. You've forgotten about a reference to your object, but the garbage collector hasn't. It sees the reference and keeps the object alive.

Of course, a reference only counts if it can be (recursively) traced back to a "root" object which is still in use - i.e. a static field or a local variable/parameter in a currently executing method. References from other objects which are, themselves, due for garbage collection will not keep an object alive. In fact, the garbage collector never even sees them. (It just starts from the roots and recursively visits reachable objects.)

Event handlers are a common cause of this problem. You create objects A and B then run some code like this:

A.SomeEvent += new EventHandler(B.SomeMethod);

Later, you finish using B, but you're still using A. Your on-going use of A keeps B alive, since A refers to B by way of the event handler. If you want B's lifetime to be shorter than A's, you have to unhook the event handler with a line like this:

A.SomeEvent -= new EventHandler(B.SomeMethod);  //note the '-='

Of course, event handlers aren't the only cause of this problem, any reference between objects will do. Consequently, these problems can be difficult to track down. You can't think what the remaining reference is, but there must be one there, somewhere. To help track it down try a tool like SciTech .NET Memory Profiler. Or, if you prefer free development tools, try Microsoft's SOS debugger extension. But be warned, it can be a bit complicated.
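To make the event-handler case concrete, here's a minimal sketch (the Publisher/Subscriber names are my own, chosen for illustration):

```csharp
using System;

// Publisher plays the role of A, Subscriber the role of B.
class Publisher
{
    public event EventHandler SomeEvent;

    public void Raise()
    {
        if (SomeEvent != null)
            SomeEvent(this, EventArgs.Empty);
    }
}

class Subscriber
{
    public int CallCount;

    public void SomeMethod(object sender, EventArgs e)
    {
        CallCount++;
    }
}

static class Demo
{
    // Subscribe, raise, unsubscribe, raise again. Returns 1, because
    // the second Raise finds no handler attached.
    public static int Run()
    {
        Publisher a = new Publisher();
        Subscriber b = new Subscriber();

        a.SomeEvent += new EventHandler(b.SomeMethod); // a now references b
        a.Raise();

        // While subscribed, b stays alive for as long as a does -
        // the event's delegate list holds a reference to b.

        a.SomeEvent -= new EventHandler(b.SomeMethod); // unhook: b can now be collected
        a.Raise();                                     // no handler runs

        return b.CallCount;
    }
}
```

The unhook works even though a *new* EventHandler instance is passed to `-=`, because delegates compare by target and method, not by identity.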

2. Managed objects holding unmanaged resources

There are two variants of this problem:

(a) Badly-written managed objects, which don't clean up after themselves. Of course, you don't write any of these, so let's move straight on to (b)…

(You don't write bad objects because you know that a good managed object should always implement IDisposable if it uses unmanaged resources. A good managed programmer will use that interface to clean the object up when he or she has finished with it. If the programmer forgets, the good managed object will clean up the unmanaged resources anyway when the garbage collector collects the object.)

(b) Small managed objects, which use significant amounts of unmanaged memory. The .NET 1.1 garbage collector cannot see the unmanaged memory. It just sees the small managed part, so it thinks there's no need for a collection. (If it did do a collection, the well-written managed object would indeed free the unmanaged memory, but the garbage collector doesn't realise a collection is required.)

Bitmaps are a classic example of this problem, since they have a small managed component and a large unmanaged part. Under .NET 2.0, there is a new method which such objects can call to inform the garbage collector of their true size. It will solve this problem (as long as it's used correctly).
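The method in question is GC.AddMemoryPressure (paired with GC.RemoveMemoryPressure). Here's a sketch of a wrapper that uses it; the UnmanagedBuffer class is hypothetical, invented for illustration:

```csharp
using System;
using System.Runtime.InteropServices;

// A small managed object wrapping a large unmanaged allocation.
// AddMemoryPressure tells the GC about the object's true cost, so it
// knows a collection may be worthwhile even though the managed part is tiny.
class UnmanagedBuffer : IDisposable
{
    private IntPtr _buffer;
    private readonly long _size;
    private bool _disposed;

    public UnmanagedBuffer(long size)
    {
        _size = size;
        _buffer = Marshal.AllocHGlobal((IntPtr)size);
        GC.AddMemoryPressure(size); // declare the unmanaged bytes to the GC
    }

    public long Size
    {
        get { return _size; }
    }

    public void Dispose()
    {
        Cleanup();
        GC.SuppressFinalize(this);
    }

    ~UnmanagedBuffer()
    {
        Cleanup(); // safety net if the caller forgets to Dispose
    }

    private void Cleanup()
    {
        if (_disposed) return;
        Marshal.FreeHGlobal(_buffer);
        GC.RemoveMemoryPressure(_size); // always pair Add with Remove
        _disposed = true;
    }
}
```

Every AddMemoryPressure call must eventually be matched by a RemoveMemoryPressure call with the same value, otherwise the GC's picture of memory drifts ever upward.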

Other Types of Managed Memory Problems

Two other leak-like problems are possible (although relatively unlikely, in practice, I'd say). The first relates to "pinned" objects preventing the garbage collector from moving objects around. This is an internal .NET implementation issue, and has been significantly improved in .NET 2. The second relates to declaring lots of large objects and never letting them leave scope. In the unlikely event that you find yourself doing that - and suffering unacceptable memory usage - you should stop doing it :-)

This page itemizes all the kinds of memory leaks that I've listed here (although it fails to mention managed objects holding unmanaged resources).

What to Do
  • Always examine memory usage when you test your application. Don't assume everything will be OK simply because you're using managed code. Test it.

  • If you're writing a GUI application, consider displaying both managed and unmanaged memory usage in the About box, using GC.GetTotalMemory(true) and Process.GetCurrentProcess().PrivateMemorySize respectively. (See this page, and this one, for an explanation of exactly what PrivateMemorySize means.) Open the About box from time to time during your testing, to check that the figures look reasonable.

  • Call Dispose on IDisposable objects when you have finished with them. If an object implements IDisposable it is saying "please clean me up when you've finished with me". The object may hold unmanaged memory, or other resources such as files or GDI handles. Call Dispose() or get the using statement to do it for you.

  • Be careful with global (static) members. Don't overuse them. Why? If you have a lot of statics, you may find it harder to understand which of them are keeping objects alive. (This recommendation is for your benefit, not the garbage collector's.)

  • Get to know your garbage collector. Here's a good starting point and some more advanced stuff too.
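For the About-box figures in the second bullet, something along these lines works (PrivateMemorySize64 is the .NET 2.0 name; on .NET 1.1 the property is PrivateMemorySize):

```csharp
using System;
using System.Diagnostics;

static class MemoryFigures
{
    // Returns the managed heap size (after a forced collection, so the
    // figure is accurate) and the private bytes of the whole process,
    // which includes unmanaged memory.
    public static void Get(out long managedBytes, out long privateBytes)
    {
        managedBytes = GC.GetTotalMemory(true);
        privateBytes = Process.GetCurrentProcess().PrivateMemorySize64;
    }
}
```

In the About box itself you'd just format both numbers, e.g. `string.Format("Managed: {0:N0} bytes", managedBytes)`. A large and growing gap between the two figures suggests unmanaged memory is the thing that's leaking.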

What NOT to Do

  • In general, do not call GC.Collect(). Leave it to the garbage collector to decide when a collection is required. Having said that, I like to force a collection when my About box opens, to ensure the figures are accurate. (Here's how to force a complete collection - but like I said, you should almost never do so.)

  • Don't null-out local variables. In general, there is no need to set local variables to null when you have finished using them. In Release builds (unlike Debug builds), a local variable becomes eligible for garbage collection immediately after the last line that uses it. If you add a line after that, to set the variable to null, the new line does no good. It will either (very slightly) extend the lifetime of the object or it will have no effect at all (if the JIT compiler realizes the line is useless and optimizes it away). So keep your code clean and readable by avoiding pointless assignments to null. (When should you set things to null? I can think of a couple of situations: (1) When a small object with a long lifetime refers to a large object with a short lifetime. Say you have two objects, A and B, and A contains a field that refers to B. If B is a very large object and you want its lifetime to be significantly shorter than that of A, then consider setting the field to null when you've finished with B. But don't go overboard - you probably won't need to do this very often. (2) Another situation, relating specifically to servers with threads waiting on external resources, is described here. Make sure you read the comments at the end.)

  • Don't get paranoid about garbage collector performance. It is easy to underestimate just how good the .NET garbage collector is. For instance, how many developers realise that a .NET app can create and destroy around 50 million small, short-lived objects per second? (Admittedly, small, short-lived objects are the garbage collector's best-case scenario - but that also happens to be the most common scenario in real-world apps.)
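For completeness, the usual pattern for forcing the complete collection mentioned in the first bullet (again: almost never do this, outside of something like an About box):

```csharp
using System;

static class FullCollect
{
    public static void Collect()
    {
        GC.Collect();                  // collect unreachable objects
        GC.WaitForPendingFinalizers(); // let their finalizers run to completion
        GC.Collect();                  // collect whatever the finalizers released
    }
}
```

The second Collect matters: objects with finalizers survive the first collection (they sit on the finalization queue), so only a follow-up pass actually reclaims them.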

Happy coding!

Friday, January 15, 2010

Debugging IEnumerable

I often find enumerables difficult to inspect in the debugger. Here's a trick I just discovered.


If you have a variable like this:

IEnumerable foo = ...

you can inspect it more easily by first writing this in the Immediate Window:

foo = foo.ToList()

Press Enter in the Immediate Window, and foo changes from a lazily-evaluated sequence (which is hard to inspect) into a real List, which you can easily inspect by hovering over it and expanding its Non-public members -> items.

Sunday, October 25, 2009

Exporting High-Res Graphics from Excel

I've recently been battling with technology, trying to save some Excel charts as high-res (900 dpi) files.
Here's the technique I finally came up with. (It also works for PowerPoint slides):
1. Save the chart as a PDF from Office (The PDF output is vector-based, so is very high quality)
2. Use a command line like the one below to have Ghostscript convert the PDF to a suitable tiff file.  (In my case, I want the final images to be 4 inches wide, at 900 dpi, but here I have specified an output width of 5 inches (72 * 5 = 360 points), then I'll crop the images back to 4in in the next step.)
gswin32c -dSAFER -dBATCH -dNOPAUSE -sFONTPATH="E:\windows\fonts" -sDEVICE=tiffgray -r900 -dDEVICEWIDTHPOINTS=360 -dDEVICEHEIGHTPOINTS=221 -dPDFFitPage -sOutputFile=%~n1.tiff %1

Key parameters are -dPDFFitPage, to cause resizing when using PDFs as input; -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS, to set the size of the output in points (1/72 of an inch each); and -r, to set the DPI.
3. Finally, open the tiff in Photoshop (or similar) to crop and make any other final adjustments, then save from Photoshop.

Monday, August 31, 2009

Fix the Cross Functional Button on Toshiba Portege M200

I recently bought a second-hand Toshiba M200 tablet PC. It's great!

I just had one problem, the little 4-way button, which looks like a mini joystick and is technically known as the "Cross Functional Button", basically didn't work. It was more of a non-functional button, really.

It is supposed to allow you to scroll in all 4 directions, which is a very important thing when in tablet mode, because you don't have access to the keyboard. But, it just output numbers instead of scrolling.

I searched and searched and found nothing particularly helpful. Eventually I went to the Toshiba site, found the "Toshiba Tablet PC Buttons Driver", installed it, and my problem was solved!
I have no idea why, out of all the pages I searched, no-one mentioned this.  I found one where someone had the exact same problem, but resorted to a much more complicated solution.  So, this post is here for the next person who has the same problem.  Hopefully, you'll find this post and save yourself the hours of frustration that I had.  Just install the driver ;-)
PS while you're there, install the accelerometer driver too, if you have this problem.

Saturday, January 31, 2009

Things I'd like to see in the Entity Framework

With the announcement that the Entity Framework (aka EF or LINQ to Entities) will become the favoured Microsoft ORM, many people have blogged about how important it will be for EF to achieve better ease-of-use in .NET 4.0.
I'd like to add my 2c worth on what I think EF should include. (By the way, I've never worked with EF, and won't until at least the .NET 4 version, so for all I know some of these things might be present already. Basically, this list is my attempt to capture some of the lesser-known lessons learned from using LINQ to SQL.)
1. Metamodel
An accessible metamodel that is at least as capable as that in LINQ to SQL (i.e. MetaModel and friends). On my current project the metamodel has saved us over and over again - allowing us to do things that we needed but couldn't do with just the basic generated entities. I think it's extremely important to have this kind of "back door" that lets programmers step outside of the POCO model and access mapping-related state and metadata.

2. Lifecycle information in constructor
A way to tell the difference between explicit construction of an object in code, versus construction by EF as the object is loaded out of the database. This is a gap that we've seen in LINQ to SQL. There are partial methods for various parts of the object lifecycle (such as creation and pre-save validation) but there is no way to tell whether the creation event is happening due to a load from the database, or the "new"-ing of an instance by application code. It seems wrong to have a full set of "hooks" into the object lifecycle except for this.
My suggestion would be to have the generated entity class include two constructors. One would be a parameterised constructor. Its parameter would be an enum that indicates whether creation is due to loading or some other reason. E.g.
public Person(CreationContext context)
{ ... }
where context may be something like CreationContext.NewInstance or CreationContext.Loading.
The other generated constructor would simply look like this:
public Person() : this(CreationContext.NewInstance) { }
So, when we write "new Person()" in code, the parameterless constructor passes the NewInstance parameter through to the "real" constructor. When the framework loads an object from the database, it should call the parameterized constructor directly, passing CreationContext.Loading. Then, at construction time, the object can always tell the purpose for which it is being constructed: does it represent a brand-new entity, or is it, conceptually, an existing entity which is being "deserialized" from the database?
The most obvious use for this is objects which, when created, should always have particular child objects. E.g. an order that always has at least one order line. If the order doesn't know why its constructor has been called, then it cannot go ahead and make a default child instance - one might already exist in the database in the CreationContext.Loading scenario.
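Here's a sketch of how the order example would look under this proposal. To be clear, CreationContext and the entity classes here are hypothetical - this is the suggested pattern, not an existing EF or LINQ to SQL API:

```csharp
using System.Collections.Generic;

// Hypothetical enum the framework would pass to entity constructors.
public enum CreationContext { NewInstance, Loading }

public class OrderLine { }

public class Order
{
    private readonly List<OrderLine> _lines = new List<OrderLine>();

    // "Real" constructor: the framework would call this with Loading
    // when materializing the entity from the database.
    public Order(CreationContext context)
    {
        if (context == CreationContext.NewInstance)
        {
            // Safe to enforce the invariant here: a brand-new order gets
            // a default line. A loaded order's lines come from the database,
            // so we must not add one in that case.
            _lines.Add(new OrderLine());
        }
    }

    // What "new Order()" in application code would call.
    public Order() : this(CreationContext.NewInstance) { }

    public IList<OrderLine> Lines
    {
        get { return _lines; }
    }
}
```

With this in place, `new Order()` always satisfies the "at least one line" rule, while a framework load leaves line creation entirely to the mapper.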
3. Don't presume or require a big impedance mismatch
There are some advantages to LINQ to SQL's very direct mapping approach. In particular, because the database is virtually identical to the object model, the team only has to learn, understand and remember one model. This means that the Ubiquitous Language extends all the way down to the database. (Particularly important if SQL, rather than objects, will be the basis for report generation; but still useful even if you're not doing SQL reporting, IMHO.)
As soon as you get the complex mappings, which are touted as an advantage of EF, the team has to understand two models, and the relationship between them.
In some applications, such mapping is indeed necessary. Perhaps an existing database doesn't match good object design; or the object design is complex enough to require non-trivial mappings.
However, in projects where the same team is building the application and database from scratch, and where the object model is not too complex, it can be a good thing to have a simple mapping that's virtually one-to-one. EF should allow this, and the EF documentation should present it as a valid option.
4. Hand-writable entities
Property getters and setters, on domain entities, should be simple enough to write by hand. This can be done by some kind of AOP (to inject "smarts" into ordinary-looking properties) or by ActiveSharp. LightSpeed is a good example of an ORM in which all the "smarts" of property setting and getting reside behind simple hand-writable properties.
[Disclaimer: I wrote ActiveSharp and LightSpeed (optionally) uses it. My point here is not to promote my own code, but to promote the idea that entity properties should be concise. Concise properties, whether hand-written or codegen'd, make the code much easier to work with.]
5. Object relationships should be able to span diagrams (and assemblies)
An annoying limitation of LINQ to SQL (and LINQ to Entities) is that, in practice, all your entities must be in one designer diagram. I say that because, if they are not in one diagram, then you can't have meaningful relationships (and lazy loading) between objects in different diagrams.
This relates to point 4, above. If properties are simple enough to write by hand, then you can use several different diagrams and then "stitch them together" using hand-coded relationship properties.
E.g. you might have one diagram focusing on entities related to sales, and another on entities related to production. For the (hopefully few) relationships that go from entities in the Sales diagram to entities in the Production diagram, you can create them by hand as long as hand-authoring is a viable option.
I'm planning to do this on my current LINQ to SQL project, possibly using some of this code.
Finally, such a solution should not just allow entities to be defined on different diagrams, but also for those diagrams (and their generated entities) to be in different assemblies/projects.
Well, that's the end of my 2c worth.... for now ;-)

Tuesday, December 02, 2008

What's up with P&P?

What's up with the interface between Microsoft's Patterns and Practices group (P&P) and the wider community?

P&P are writing a new version of their guidance for architecture on the Microsoft platform, and they're asking for community feedback. But by and large the community isn't giving any! And, when feedback is offered, P&P aren't necessarily replying.

For instance:

After the P&P Knowledge base project had been up for about 2 months, I counted exactly three meaningful comments on the substance of what MS had written. Of those three comments, two were from me(!), and to this day those two remain unanswered!

Confusingly, there are now three different CodePlex projects in which P&P are seeking community input. There's the App Arch Guide knowledge base, the App Arch Guide Book, and the App Arch Community Contribution project (which, as if to prove my point, is completely empty)!

So, community, what's up! Has no-one got anything to say about architecture!!!!

And Microsoft, what's up with you? What are you doing to make this work?

Update 1 Feb 08: Microsoft recently contacted me to follow up on my questions. Thanks :-) As much as I appreciate that contact (and I do), the overall lack of engagement seems to remain. I still don't see much meaningful involvement from/with the community. The Community Contribution project is still empty, except for a brief statement of its purpose, which has only been read 38 times.

Tuesday, November 18, 2008

Hosting a window from one process inside another

Every time this topic comes up, I seem to have lost my bookmarks on it. So, here's a blog entry so I won't lose them again...

Under Windows it is possible to visually "dock" the main window of one process inside a Window belonging to another process. You get the visual effect of one program, but there are still two completely separate exes involved.

I once used this to "host" an EXE inside Internet Explorer. We wrote a tiny ActiveX control (this was in the dark ages before managed code). All the ActiveX did was start our EXE, and then "dock" the EXE's main window inside the client area of the ActiveX. It looked like our EXE was the ActiveX - but our EXE had no idea that any of this was going on. It was just doing its thing, running as an independent process.

The secret is the Windows API function SetParent. You can use it to set a window from process A as the parent of the main window of process B.

I haven't done this for a while. As I recall, you need to make another call to make the hosted window look like a borderless child window (SetWindowLong, IIRC) and I think I also had to detect resizing of the "host" and programmatically resize the child.
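From memory, the calls involved look roughly like this. This is a sketch only - Windows-specific P/Invoke, with error handling and the resize tracking omitted:

```csharp
using System;
using System.Runtime.InteropServices;

// Re-parents another process's top-level window into one of our windows.
static class WindowDocker
{
    [DllImport("user32.dll", SetLastError = true)]
    static extern IntPtr SetParent(IntPtr hWndChild, IntPtr hWndNewParent);

    [DllImport("user32.dll", SetLastError = true)]
    static extern int GetWindowLong(IntPtr hWnd, int nIndex);

    [DllImport("user32.dll", SetLastError = true)]
    static extern int SetWindowLong(IntPtr hWnd, int nIndex, int dwNewLong);

    const int GWL_STYLE = -16;
    const int WS_CHILD = 0x40000000;
    const int WS_CAPTION = 0x00C00000;
    const int WS_THICKFRAME = 0x00040000;

    public static void Dock(IntPtr childMainWindow, IntPtr hostWindow)
    {
        // Make the other process's window look like a borderless child:
        // add WS_CHILD, strip the caption and sizing frame.
        int style = GetWindowLong(childMainWindow, GWL_STYLE);
        style = (style | WS_CHILD) & ~(WS_CAPTION | WS_THICKFRAME);
        SetWindowLong(childMainWindow, GWL_STYLE, style);

        // Then dock it inside the host window. The caller should also
        // watch for the host resizing and call MoveWindow on the child.
        SetParent(childMainWindow, hostWindow);
    }
}
```

The caller would obtain `childMainWindow` however suits - e.g. `Process.Start(...)` followed by waiting for `MainWindowHandle` to become non-zero.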

Here are some links on the subject, following a rather brief Google:




I'm fairly sure I learnt this technique from something on Microsoft's site, in about 2000. But this post says they are no longer recommending it (at least, not for hosting Office apps) so perhaps that explains why I can't find the original MS post.