Thursday, April 21, 2005

Switches, blocks and Generators

There's been quite a discussion going on on python-dev about blocks (à la Ruby), spinning off into switch statements. I can't say that I have any strong opinions, as I've never had any real-life code which suffers from the lack of these constructs. I have, however, had occasions when code I have been thinking about looked like it might benefit.

Fredrik Lundh points out in the discussion that iterators cover many of the uses of blocks as callbacks - rather than writing something like
 @myfunc(args):
block
of statements...
(using one of the many syntax variations proposed) you can often use
 for data in myfunc(args):
block
of statements
instead. Here data can be a dummy (and the iterator yields no useful value, but just uses yield as a placeholder) or can pass information "into" the statement block. A good example of this structure is cElementTree's iterparse function.

I'm not 100% convinced that every type of block construct being discussed can be transformed into this style, but I'd be willing to bet that many can - and that many callback-style APIs would benefit as well (witness cElementTree, and my previous musings on graph traversal). Actually, the discussion points out that the main lack is where the transformation would need a yield inside a try...finally block (which Python doesn't currently allow).

Of course, this leads us into switch statements. Graph traversal and iterparse share a common need to "call back" with multiple types of event. That's easy - just yield ("event_type", data) and use a switch to choose the processing to do.

Python's canonical switch is
   if key == "value1":
process1
elif key == "value2":
process2
...
else:
default
Disadvantages:
  • It is fairly long-winded
  • It takes a linear search of the options
  • It's not "visibly" a switch on the value of a single variable
It's difficult to take the "long-winded" argument seriously - one line to specify the switch value, then the (essential) code to execute. I have to admit, I was stretching to come up with this one.

OK, linear time. Is this really an issue? How many cases will the average switch have? Can you honestly claim that a linear search of maybe 10 options is going to kill your application? I know, inner loop, rich comparison, yada yada... - I'll deal with that below. Scratch this one for the moment.

So we're left with the last one. This does have some substance, in my view. The if tests don't make it clear that it's the same variable being tested each time (and the fact that the variable gets repeated is a possible source of problems). Also, if chains get used for lots of things other than switches, so there's no visual clue as to what's going on.

OK, so there are some issues with if-chains. What else can we try?

The other big switch idiom is the dictionary mapping value->callable. Here, we'd do
   def process1():
...
def process2():
...
def default():
...
switch = {
"value1": process1,
"value2": process2
}

fn = get(switch, key, default)
fn()
Quite neat, and if you don't need a default, you can even go with
   switch[key]()
which starts to get a little obfuscated, but expresses what we're doing pretty clearly. And it's got a constant lookup time (at the cost of requiring hashable keys), so that fixes the linear time issue above, as well.

But it does need a lot of temporary names, and a lot of definitions "up front". While I take Guido's point that namespaces are there to be used, there's a namespace in my brain as well, and that's pretty cluttered. I just can't always think of meaningful names - "value1_processing" looks silly, and "foo" is just a cop-out.

One possibility is to use a class, and a bit of introspection:
    class Switch(object):
def __call__(self, key):
fn = getattr(self, "case_" + key, self.default)
return fn()
def default(self):
pass

class _(Switch):
def case_value1(self):
process1
def case_value2(self):
process2
def default(self):
default stuff
_()() # Ugly, I know...
Lots of negatives here - that _()() is as ugly as sin, the need for the values to be Python identifiers (essentially). I'm not sure it improves over the raw dictionary.

But as I said, I've never needed to do this in real code yet. (That's not a claim that it's not useful - just a disclaimer that I've no real experience of the issues :-)) There are enough possibilities open to me, though, that I'm not sure there's any real justification for new syntax here.

Conclusions? I'm not sure. The use of generators as a way of structuring callback-style code is a really neat idea. It bears some serious exploration - I'm sure we're a long way from understanding the full value of generators in Python yet. I'm very unsure on switches, though. I know that I'm put off the generator-callback idiom by the need for a switch-type structure, but that may just be an aversion to switches in general, rather than to the syntax.

I think I need to go back to my graph traversal code, and rewrite it in generator style, then use it a bit and see how it feels.