Thursday, April 21, 2005

Switches, blocks and Generators

There's been quite a discussion going on on python-dev about blocks (à la Ruby), spinning off into switch statements. I can't say that I have any strong opinions, as I've never had any real-life code which suffers from the lack of these constructs. I have, however, had occasions when code I have been thinking about looked like it might benefit.

Fredrik Lundh points out in the discussion that iterators cover many of the uses of blocks as callbacks - rather than writing something like
 @myfunc(args):
block
of statements...
(using one of the many syntax variations proposed) you can often use
 for data in myfunc(args):
block
of statements
instead. Here data can be a dummy (and the iterator yields no useful value, but just uses yield as a placeholder) or can pass information "into" the statement block. A good example of this structure is cElementTree's iterparse function.

I'm not 100% convinced that every type of block construct being discussed can be transformed into this style, but I'd be willing to bet that many can - and that many callback-style APIs would benefit as well (witness cElementTree, and my previous musings on graph traversal). Actually, the discussion points out that the main lack is where the transformation would need a yield inside a try...finally block (which Python doesn't currently allow).

Of course, this leads us into switch statements. Graph traversal and iterparse share a common need to "call back" with multiple types of event. That's easy - just yield ("event_type", data) and use a switch to choose the processing to do.

Python's canonical switch is
   if key == "value1":
process1
elif key == "value2":
process2
...
else:
default
Disadvantages:
  • It is fairly long-winded
  • It takes a linear search of the options
  • It's not "visibly" a switch on the value of a single variable
It's difficult to take the "long-winded" argument seriously - one line to specify the switch value, then the (essential) code to execute. I have to admit, I was stretching to come up with this one.

OK, linear time. Is this really an issue? How many cases will the average switch have? Can you honestly claim that a linear search of maybe 10 options is going to kill your application? I know, inner loop, rich comparison, yada yada... - I'll deal with that below. Scratch this one for the moment.

So we're left with the last one. This does have some substance, in my view. The if tests don't make it clear that it's the same variable being tested each time (and the fact that the variable gets repeated is a possible source of problems). Also, if chains get used for lots of things other than switches, so there's no visual clue as to what's going on.

OK, so there are some issues with if-chains. What else can we try?

The other big switch idiom is the dictionary mapping value->callable. Here, we'd do
   def process1():
...
def process2():
...
def default():
...
switch = {
"value1": process1,
"value2": process2
}

fn = get(switch, key, default)
fn()
Quite neat, and if you don't need a default, you can even go with
   switch[key]()
which starts to get a little obfuscated, but expresses what we're doing pretty clearly. And it's got a constant lookup time (at the cost of requiring hashable keys), so that fixes the linear time issue above, as well.

But it does need a lot of temporary names, and a lot of definitions "up front". While I take Guido's point that namespaces are there to be used, there's a namespace in my brain as well, and that's pretty cluttered. I just can't always think of meaningful names - "value1_processing" looks silly, and "foo" is just a cop-out.

One possibility is to use a class, and a bit of introspection:
    class Switch(object):
def __call__(self, key):
fn = getattr(self, "case_" + key, self.default)
return fn()
def default(self):
pass

class _(Switch):
def case_value1(self):
process1
def case_value2(self):
process2
def default(self):
default stuff
_()() # Ugly, I know...
Lots of negatives here - that _()() is as ugly as sin, the need for the values to be Python identifiers (essentially). I'm not sure it improves over the raw dictionary.

But as I said, I've never needed to do this in real code yet. (That's not a claim that it's not useful - just a disclaimer that I've no real experience of the issues :-)) There are enough possibilities open to me, though, that I'm not sure there's any real justification for new syntax here.

Conclusions? I'm not sure. The use of generators as a way of structuring callback-style code is a really neat idea. It bears some serious exploration - I'm sure we're a long way from understanding the full value of generators in Python yet. I'm very unsure on switches, though. I know that I'm put off the generator-callback idiom by the need for a switch-type structure, but that may just be an aversion to switches in general, rather than to the syntax.

I think I need to go back to my graph traversal code, and rewrite it in generator style, then use it a bit and see how it feels.

7 Comments:

Anonymous Anonymous said...

I think the if..elif.. is the clearest one. The fact that the condition doesn't depend on a single variable is in my opinion not a disadvantage. In many cases, it is an advantage to have different condition styles. One gets more flexible without losing the readability of the code.
If you really need a speed optimazation then profile the code and reorder it.

However, as soon as you need a dynamic case statement (where conditions may be added or removed dynamically as in OS software) then the dictionary style is perfect. But, it is more difficult to read and to debug it.

The switch as a class style seems to me too complicated and the code is not very readable.

April 23, 2005 6:40 pm  
Anonymous Anonymous said...

How about:
_cases = {}
def cases(key):
____def record(fun):
________store = _cases.setdefault(fun.__name__, {})
________store[key] = fun
________fun.__name__ = '%s__%s' % (fun.__name__, key)
________return store
____return record

Which can be used like:

@cases('value1')
def once(a, b): return a+b
@cases('value2')
def once(a, b): return a-b

...
once['value2'](13,16)

April 23, 2005 8:05 pm  
Blogger Paul Moore said...

woolfy: Overall, I agree that the if variation is probably best. But people keep on wanting "switch statements", myself included, so there must be something not quite right about it. I think that it's the fact that chains of ifs are too flexible - if you really are choosing one option from many, based on the value of a single variable, the if chain doesn't give you the hint that it really is that simple.

I agree that the class stuff was pretty much a failed experiment. But I can imagine a use in some situations (graph searching). I'll save that for a future post, after I've experimented, though.

anonymous: Your decorator won't work if you have two switch statements in the same code (because of the global _cases variable). You probably realise that - I suspect it can be fixed, at the cost of complicating the decorator.

I've not worked out yet if I like this sort of use of decorators. I've seen similar things (and also far worse abuses!) in a number of places. I have a vague feeling that there's a core "good style" for decorators, beyond which you shouldn't go, although I can't define it properly yet...

April 23, 2005 10:23 pm  
Anonymous Anonymous said...

Hi,

Nice article. From a more conceptual point of view, I've always considered that :

1. If I need more than three if/else in the same code block, then there is a problem with my code design and not with anything else. Then most of the time, you start digging and you understand where you were wrong. Multiple if/else should stop existing IMO.

2. The switch statement is evil because it gives you the possibility to create a long list of different cases which are against good code programming.

I think it is a really a good thing that python doesn't offer such a possibility.

There is always a better solution IMO.

April 25, 2005 8:32 am  
Anonymous Anonymous said...

Remember that "switch" is, in its purest form, very restrictive - only constants can be used for the cases, and the variable to test must be of a compatible type. Contrast that with what goes on in an "if...elif...else" statement and you'll observe that they're quite different.

Not that Python couldn't do with some shorthand for a "switch"-like construct that does what "if...elif...else" does (where the evaluation of dictionary contents before a choice is made would be inconvenient), but this is mostly a syntax issue that could be solved with stuff like macros, not yet more runtime baggage and a multitude of PEPs.

April 25, 2005 8:35 am  
Anonymous Anonymous said...

Re: linear time, IMHO optimization should not take priority over readability

Re: long-windedness and being visible, how about this:

myInterestingVar = 'asdf'
#Function to implement switch-like
#behaviour
#(by adding this comment we make it
# visible)
def eq( b ):
__return myInterestingVar == b

if eq( 1 ):
  print 'process a one'
elif eq( 'text' ):
  print 'process text'
elif eq( 'asdf' ):
  print 'process some asdf'
else:
  print 'do the default'

April 25, 2005 4:32 pm  
Anonymous Anonymous said...

I use dictionaries where a switch would do:


class foo:
def __init__(self):
self.switch = {"value1":self.value1,
"value2":self.value2}

def doSomething(self, input, args):
try:
self.switch[input](args)
except KeyError:
# default case
pass

def value1(self, args):
# do something
pass

def value2(self, args):
# do something else
pass

Sorry about that formatting, pre isn't accepted.

I find this more readable than a long if/elif sequence, and it separates everything into small methods which is easier to read than long methods.

Python dictionaries are O(1) to access, so this is fast. No iterating through a many statements one at a time. It only works on one variable (like C/C++), but most of the time that is what you need.

Args is often a list, so each method can have a differnet number of arguments. (though you have to check to be sure you have the right number, a disadvantage)

April 25, 2005 5:28 pm  

Post a Comment

<< Home