A Little For Thought

Is Parallel.ForEach the New Black?

The forthcoming release of version 4 of the .NET CLR includes quite a few spiffing features, from Background Garbage Collection and Historical Debugging to Auto-Implemented Properties and underscore-free line breaks in VB. But perhaps one of the more head-turning items is the new Task Parallel Library (formerly known as the Parallel Framework Extensions, or PFX), and its prominent representative, the System.Threading.Tasks.Parallel class.

If this is the first you are hearing of this new Parallel framework, put down whatever you are drinking, because you’ll only end up spit-taking it all over your bulky off-white CRT monitor. And then you’ll have to reboot Windows ME. Well, you’d probably have to reboot Windows ME in any case, but still. Make sure you’ve saved your work in the VB6 IDE. I digress.

Serial Killer

Two of the more attractive features of the Parallel class are its For and ForEach methods. These can replace the standard for and foreach loops with structures that will process your Business by as many Ferrets as your hardware can bite off at once, rather than by just one domesticated weasel at a time. The syntax, for the uninitiated, is as follows.

If your old and busted iterator reads thusly:

IList<Stuff> things = MyData.GetStuff();
foreach (Stuff thing in things)
{
    // ... do work with each thing ...
}

Your new face-melting parallelized code will look like this:

IList<Stuff> things = MyData.GetStuff();
Parallel.ForEach<Stuff>(things, (thing) =>
{
    // ... do work with each thing, now on multiple threads ...
});

That’s it. No spawning new threads, no Mutexes, no Semaphores. Just juice when you ask for it. Clearly, this is the best thing that has ever happened to .NET development, and we should go throw a big search and replace party for all of our embarrassing foreach code. Serial loops are the new GOTO.
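For the curious, here is a complete, compilable sketch of both loops side by side. The Stuff type, the stand-in for MyData.GetStuff(), and the per-item work are all hypothetical fillers, since the article leaves them abstract:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical payload type standing in for the article's Stuff.
class Stuff
{
    public int Id;
}

class Program
{
    static void Main()
    {
        // Stand-in for MyData.GetStuff().
        IList<Stuff> things = new List<Stuff>
        {
            new Stuff { Id = 1 },
            new Stuff { Id = 2 },
            new Stuff { Id = 3 }
        };

        // Old and busted: one domesticated weasel at a time.
        foreach (Stuff thing in things)
        {
            Process(thing);
        }

        // New hotness: the runtime partitions the collection across
        // worker threads and blocks until every item is processed.
        Parallel.ForEach(things, thing =>
        {
            Process(thing);
        });
    }

    // Hypothetical per-item work.
    static void Process(Stuff thing)
    {
        Console.WriteLine(thing.Id);
    }
}
```

Note that Parallel.ForEach makes no promise about ordering; if your loop body depends on items arriving in sequence, or mutates shared state, the search-and-replace party is over before it starts.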


Your Attention, Please

At this point, the attentive among us may wonder (as did Richard Campbell on .NET Rocks! show #517) why, if it is that simple, Microsoft didn’t spare us the code-fu and just make loops parallel-enabled under the CLR 4 hood. Luckily, Microsoft’s Jason Olson dispatched this notion pretty handily:

The problem is, in a generic way, there is no way that we can guarantee that it’s safe to do that, because there’s no way beforehand to determine… how much granularity is in the calculation itself. If you’re just looping through an array of numbers, and adding two numbers together, the overhead you’re going to incur from bringing in all these parallel primitives isn’t going to justify the overhead that is coming from the actual calculations being done. So there is some domain-specific knowledge that the programmer has to have to say, ‘Yes, this is somewhere that I need to parallelize.’
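Olson’s adding-numbers example is easy to sketch. In the version below (my own illustration, not from the show), the per-iteration work is a single addition, and the parallel version also has to dodge a data race on the shared total, so it pays for the thread-local overload of Parallel.For plus an Interlocked combine. That machinery can easily cost more than the additions it spreads out:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class GranularityDemo
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(0, 1000000).ToArray();

        // Serial: each iteration does almost no work.
        long serialSum = 0;
        foreach (int n in numbers) serialSum += n;

        // Parallel: a naive shared accumulator would be a data race,
        // so each thread keeps a local running total and the totals
        // are combined once per thread at the end.
        long parallelSum = 0;
        Parallel.For(0, numbers.Length,
            () => 0L,                                // per-thread local init
            (i, state, local) => local + numbers[i], // the "work": one addition
            local => Interlocked.Add(ref parallelSum, local));

        Console.WriteLine(serialSum == parallelSum); // same answer, more overhead
    }
}
```

Both loops produce the same sum; only a profiler can tell you which one got there faster on your hardware, which is exactly Olson’s point about domain-specific knowledge.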

But how will we know when it’s time to saddle up .NET 4 and wield the parallel blade?

It comes down to measure, measure, measure, measure, measure, measure. You never know where the actual bottleneck is for sure until you actually have physical numbers and physical proof in your hand that says ‘here’s the place that’s the problem.’
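In practice, "measure" can start as simply as a Stopwatch around each version of the loop. A minimal sketch (the Work method is a hypothetical stand-in for your actual loop body):

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class MeasureDemo
{
    // Hypothetical stand-in for a real unit of work.
    static void Work()
    {
        Thread.SpinWait(100000);
    }

    static void Main()
    {
        const int iterations = 100;

        var serial = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) Work();
        serial.Stop();

        var parallel = Stopwatch.StartNew();
        Parallel.For(0, iterations, i => Work());
        parallel.Stop();

        // Only physical numbers can tell you whether parallelizing paid off.
        Console.WriteLine("Serial:   {0} ms", serial.ElapsedMilliseconds);
        Console.WriteLine("Parallel: {0} ms", parallel.ElapsedMilliseconds);
    }
}
```

Run it on the actual target hardware with realistic data sizes; a loop that wins on an eight-core build server can lose on a two-core laptop.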

So, we still have to pay attention to the needs of each specific application, each component, and each block of code before we can navigate such a choice successfully? The CLR will handle the heavy lifting and the drudgery of threading, but we still need to have an understanding of what it is that our code is actually asking of the machine?