Friday, December 17, 2004

Library design is hard

For as long as I can remember, I've had an ongoing project to build a graph library (that's graph as in abstract data type, not graph as in pretty picture...) I remember years ago putting something together in C. In C, all the problems were about representation and generality - as anyone who has ever added a "next" pointer to a structure in C to create a linked list knows, sometimes writing generic structures in C is harder than implementing them inline every time you need them.

Nowadays, I use Python. And of course, Python is high level, and all these low-level problems go away. Long ago, Guido wrote an article explaining how to implement graphs in Python - it's beautifully simple. (You need to accept the constraint that vertices have to be hashable objects, but I've never hit an issue where that is a problem). So graphs in Python are easy, like everything else :-). And writing a little library of algorithms will be pretty simple, too.

Yeah, right.

I'm still working on my ongoing project to write a graph library. It's just the level of the problem that has changed. I no longer bother with a graph class, or ways of building graphs. That's easy - just use Guido's adjacency-list dictionary representation. And as long as I write my algorithms to use the minimum functionality needed, they will work with any graph implementation. Call it dynamic polymorphism, or duck typing, or interface-based programming - whatever it is, it's hugely liberating when you're trying to write algorithms. But it doesn't make designing the interface to your code any easier.

Take graph searches as an example. A depth-first search is easy to write, but there are some fiddly details to get right. Nothing hard, but worth checking a book if you haven't written a DFS for a while. Much like binary search, which is easy enough to write yourself, but the edge cases are fiddly enough to make a library version worth having.

But to use a DFS, there are a number of points you might want to interact with the algorithm. Sometimes you just want the vertices in the order they are discovered - a generator is ideal here. Other times, you'd like to see the tree edges. Again, maybe you want all the edges as they are processed, but you want to know the type - tree or not.

Designing an interface that satisfies all of these requirements is a real pain. I've tried a number of ways:

Generator - great, but limited. If you just want a stream of things back, a generator is clearly the right answer. But that's not always what you want, and you can't write something that's sometimes a generator, sometimes not...

Callbacks - messy call signature. There are maybe 5 points in the DFS algorithms where you might want to do something. Even using optional keyword arguments, that's not going to look pretty.

Visitor - Basically, lump all the callback functions into a single visitor object, which you pass to the call. This one is my current favourite, and with a bit of work you can make everything optional, so there's not much boilerplate involved. But you need to write a class for each use, which increases the overhead a bit.

Subclassing - Make the DFS algorithm a class, and require users to subclass it to implement their callbacks. This is something I've seen a lot, but never in Python. It might be good, but it doesn't feel "Pythonic".

So I'm still writing and throwing away attempts at writing general graph libraries. It's given me a great deal of respect for the people who design library code - every time I use a library whose interface feels "just right", I think about how easy it would be to write something that would be a pain to use. And when I have to put up with a module that feels a bit clumsy, I find it easier now to remind myself how much worse it could have been.

And I've still got my graph library project to keep me interested for years yet...

(Thanks to Philip Eby for his recent series of articles about Python and Java, which got me thinking about this)


Post a Comment

<< Home