The Two Types of Collection

When I was at university our very forward thinking lecturer (Dr Bacon as I remember) introduced us to the most amazing up-to-date programming environment: C on UNIX. Storing an item in a collection meant declaring an array, or if you are really being adventurous, using malloc to dynamically allocate memory. That was it. Back then it was practically state of the art. As well as making for the kind of excellent training that you curse at the time but realize 10 years later you should be eternally thankful for having been made to do it.

What a difference from today when the .NET Framework comes with so many collection classes out of the box that I bet most .NET developers don’t even know what to do with some of them. (Test of your knowledge: Do you know the difference between a Dictionary and a HashSet? WITHOUT Googling it!). In the C++ world, you’re equally well blessed, as the standard template library has long since gone mainstream and boasts an equally rich set of choices.

Now here’s a question for you: What’s the fundamental difference between LinkedList<T> and – well, basically everything else in System.Collections.Generics?

Think about it before you read on.

You just read on without stopping to think, didn’t you. I saw you.

The clue is in the signature of LinkedList’s various Add methods. Look at one of them:

public LinkedListNode<T> AddFirst(T value)

It returns a LinkedListNode<T>. And that LinkedListNode is crucial. Its purpose is to know where it is in the list, and without it, moving around the list is a bit harder (though still possible).

Coll2Types_LinkedList

Linked List (with Nodes)

And that LinkedListNode is the difference we’re looking for. All the other collections in System.Generics will let you add an item without anything other than the collection itself needing to know that that item is in the collection (or at least: Anything that’s visible outside the collection. I don’t care about private implementation details). Your item can sit there all pure and happy and oblivious to the fact that not only is it no longer a collection-virgin, but it maybe it now even sits rather promiscuously inside 3 or 4 different collections. All at the same time (gasp).

But linked lists are different. The whole point of a linked list is that it allows efficient insertion, deletion, and navigation, all by having each object store references to its neighbours in the list. And that means an object can’t be in the list unless it knows it’s there. Now of course at this point you’re dying to tell me that I’m slightly lying and really the thing that knows its inside a linked list is merely the LinkedListNode instance. That node also contains a reference to the underlying object and the underlying object still doesn’t know that it’s in the list. That’s fair enough: That class design gives excellent separation of concerns (separating the need to know that where you are in the list from everything else about the underlying object). But it doesn’t change the fact that something connected with the object has to know that it is in the list and where it is.

So that’s one way that I personally categorise collections. You have collections where the members don’t need to know they are in the collection, and collections where they do. Offhand I don’t think anyone’s given a name to those categories, but my personal terminology is that I call the things whose purpose is to know where they are nodes, and I call collections like LinkedList nodal collections.

Of course it may have occurred to you that a categorization that puts one collection on the side and everything else sitting together seems a tad one-sided. Well sorry, I wasn’t aiming for an even split, I was looking for something meaningful that impacts how I code, and in particular how I design my own collection classes. And yes I often find that the MS classes, rich though they are, don’t fulfil my needs and I need to write new collections. Things like:

  • A dictionary that can store multiple values with the same key.
  • A single-valued dictionary that allows quick lookup of key from value as well as value from key.
  • A graph, (in the mathematical sense, like the diagram. Yes, some of my code requires very complicated data structures!).
  • Or of course that common favourite, the tree.

Coll2Types_Graph

A graph. Probably requires nodes!

And one thing that I’ve never read in any best practices book, but which I’ve found repeatedly is that if I design any collection in such a way that objects need to know they are in the collection, my code complexity in the classes that consume that collection seems to practically quadruple. There’s various factors at work there. You get bidirectional relationship madness because the collection needs to know who its nodes are, AND the nodes need to know who the collection is, added to the restriction that the nodes are often the kind of object that really ought to be immutable. And that adds a memory footprint because there are a lot of nodes (each one needs its own data structure to figure out where it is too). You get the division of responsibilities between node and collection: Do you go to the node to add a new object or the collection? It all gets fuzzy, and of course everything related to the collection is a generic class too, which doesn’t really add complexity, but sure makes the code harder to read. You quickly end up with one unholy mess.

In short. My golden rule for collections is:

Don’t design your collection as a nodal collection that needs a public associated node class unless you really have to.

I suspect someone in the Microsoft .NET team independently figured this out long ago because almost all of their collection classes have apparently been designed not to require public nodes.

Of course avoiding nodal collections isn’t always possible. A linked list would simply not be able to function effectively without nodes that know they are in the list. And the same is probably true of things like graphs and trees. But if I think I need a node class: I always look more carefully first.

Advertisements

10 thoughts on “The Two Types of Collection

  1. Simon, just curious, what was the need for and how did you implement the dictionary which allowing same keys?

    I was thinking of storing it internally in a dictionary(of TKey, List (of TValue)), and overriding the Add() to add it to the List. Is this the optimal approach?

    1. Hi Prabin, the most common reason I need a dictionary that allows multiple values with the same key is if I need to partition a collection (for example, say I have a collection of People instances, and I want to be able to directly access just the ones who live in the UK, or just the ones who live in France, etc.).

      Your thinking is close to what I’d do, except that I’d typically use containment rather than inheritance: Define a MultiValueDictionary class with Dictionary<TKey, List<TValue>> as its main contained member. Usually I’m thinking of just getting something working quickly for the problem at hand, and containment allows me to do that by selectively defining only the public methods I’m going to need. By the way, Dictionary.Add() isn’t virtual so can’t be overridden.

      1. Evidently there comes a point when it’s time to leave pedants’ corner and rejoin the real world… What I was saying is that as far as I’m aware, there’s no Dictionary<T> class, the correct something like Dictionary<TKey, TValue>

  2. On your desire for a dictionary that can store multiple values with the same key: .NET partly addresses that. There is the Lookup type, which implements your wishes. The only potential problem is that values of the type are immutable, so you can’t modify them: you need to create brand new instances.

Comment on this article (your first comment on TechieSimon will be moderated)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s