Random Ramblings

Monday, July 10, 2006

Comment Spam

It's been a long time since I posted, but it's nice to see the blog has carried on without me. Or not :-(

I've just spent far too long deleting comment spam. I've turned on word verification for comments - hopefully, that will keep things under control. We'll see.

It's depressing how guilty I feel about having this sort of junk appear on the internet associated with my name.

Saturday, September 24, 2005

TurboGears

Sigh. It's been a long time since my last post.

Anyway, I've just discovered TurboGears. Looks like a really nice Web development framework - they even have a movie, just like Rails :-) Actually, the site calls it a "megaframework", the idea being that it isn't just another framework, but it's actually a combination of existing frameworks.

The demo (build a Wiki in 20 minutes) is really neat. Things I especially like:

It uses CherryPy, which is probably my current favourite web framework for Python.
AJAX support built in (via MochiKit)

The database support uses SQLObject, which is pretty cool, although it doesn't support Oracle, which is a shame from my POV. There are ongoing suggestions that Oracle support could be added to SQLObject, though - it seems to be mainly lack of Oracle experience from the current developers, and lack of Oracle experts offering help, that are holding this up. Of course, I could offer help, but I doubt that I'd actually be able to devote the necessary time...

The easy_install setup looks seriously neat. However, I'm still nervous about how it will work alongside my existing Python installation, which has CherryPy and SQLObject already installed as traditional bdist_wininst installers. But I've muttered enough about that on the distutils-sig, and I don't want to be a pain about it. Let's just say that setuptools hasn't really addressed integration with platform package management yet. But I do think that setuptools is the way to go, and now that I have TurboGears as a basis, maybe I can offer some actual help at last (I'm a package user, not a package builder, so I'm stuck until packages start being published which use setuptools).

Definitely one for my "must investigate" list.

Wednesday, May 04, 2005

Finding Nevow

I know, I'm sorry. I couldn't resist the pun.

I'd really like to like Nevow. It feels like a really neat idea. There are lots of parts that simply feel like a good way to write code (some of these are not unique to Nevow, or are inherited from Twisted) - interfaces, stan, livepage, guard, renderers, etc.

But some bits of the whole package really bug me. Far out of proportion to their real importance, I'll freely admit, but enough to put me off every time I try to use it.

The lack of documentation is a pain. Things are getting better - in Nevow 0.4.1 there are some good basic documents in the package, but it still feels like diving into the examples and source, and experimenting, is the only way of finding out what's going on. And that sucks. It's not unique to Nevow, but Nevow's structure makes it feel worse. Maybe that's the interface stuff - I suspect it is - but whatever the reason, I'm forever finding odd, neat magic tricks to achieve things (I remember the moment when I finally found out that you can use ISession(context) to get at the session object) but with no clear feel for the general principle involved. (OK, the context implements the ISession interface, and I can look in the code to find out what other interfaces it implements, but what's the logic? The context isn't a session, so adaptation isn't being used in the sense I understand. Why not use context.session with a getattr hook? How does knowing I can get the session this way help me to deduce that I can get similar things like the request?) I can ask on the mailing list or on IRC - the Nevow community is very helpful - but there are only so many dumb newbie questions you can ask before you start being a pain, even in the most helpful group.

Also, there's a feeling that things aren't really stable yet. The examples supplied produce reams of deprecation warnings of one form or another. I know I can suppress them, but it feels like there's something wrong. Should I not be using Twisted 2.0 yet? Should I be getting Nevow from subversion rather than using the last release? Or is this OK, and I shouldn't worry?

And the guard stuff, while it's a neat way of doing authentication, really, really bugs me with its redirects to ?__session_just_started__=1 and other such things (I've seen a few variations). These internal details should not be visible to the user! What happens if I bookmark something like this by mistake, not noticing that it's not the URL I originally entered?

And the formatting (or lack of...) of the generated HTML is a pain, too. I know it makes no difference to the browser, and I can always run the output through something like HTML-tidy if I want a neatly formatted version, but I do use "View source" as a debugging tool. And I can't with Nevow.

I'm sorry - this isn't much more than a rant. I've brought these issues up on the twisted-web mailing list, but no-one seems particularly motivated to do anything about them. I can understand why, and I certainly don't want to make any of these things into some sort of crusade - particularly as I can't offer any fixes. But they put me off using Nevow, which I do think is a shame (for me - the developers aren't going to miss me :-))

I'll keep trying to like Nevow, but I can't see myself actually writing any real applications with it for a while yet...

Tuesday, May 03, 2005

Python web frameworks and Rails

There's been a lot of discussion on the Python web SIG recently, prompted by the good press that Rails for Ruby has been getting. I'm not entirely sure it's going anywhere specific - it's not at all clear that there is a problem to be "fixed". But it does make me think once again about web development in Python.

I'm not actually a web developer. More than that, I have no actual requirement to produce any sort of web application in my job. But we do have some web applications, which we developed using Oracle's HTML DB. So to some extent, my view on Python web development is prompted by the question "why did we end up using HTML DB rather than Python?" Before I can talk about that, I need to start with a basic explanation of what HTML DB is. It's a web development environment, shipped with the Oracle 10g database - it gives you a web-based interface, which makes it very easy to build simple CRUD (Create, Read, Update, Delete) applications. The key points are:

Rapid wizard-based development of CRUD applications
Decent look and feel (tabbed pages, graphical buttons, etc)
Lots of features to make sophisticated stuff "easy" as part of the initial application (built in search and pagination for reports, "are you sure?" prompts for deletes, things like that)

The nasty problems come once you have built your initial application, and are trying to maintain it. Like many wizard-style code generators, what you get from the wizard is complex and quite subtle, and you can get in a real mess when you try to modify it. And a lot of the slick features are produced by generating a lot of code, so when you try to add something similar without the benefit of the wizard, it's nigh-on impossible.

So every time we need to add a feature to the application, I feel that it's a maintenance nightmare, and we'd have been far better with a nice maintainable Python codebase. But I couldn't compete with the initial productivity of HTML DB (I was the Python advocate, a colleague built the HTML DB version).

So, the place I'd like to see more help for Python web developers is in the form of tutorial documentation and application templates. Something that looks nice (I don't have any design skills, and I suspect many Python developers are in the same boat). Displaying and updating a database table is pretty much the same regardless of the details of the table columns, so a generic CRUD application shouldn't be impossible.

I'm not too hung up on which framework I use - I'd like to see this sort of thing in any of them. Combinations I've tried to use have included

Twisted + Nevow
Quixote
CherryPy

For the database layer, SQLObject seems to be the most common choice, although non-SQL formats such as Durus aren't completely implausible. For my need, I'd want to store data in an Oracle database, and as SQLObject doesn't support Oracle at the moment, I'd be comfortable with a "raw" DB-API database layer.
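To give a feel for how little a generic "raw" DB-API layer needs, here is a minimal sketch. It is only an illustration under some loud assumptions: it uses the standard library's sqlite3 module as a stand-in for Oracle (purely so that it runs anywhere), and the GenericTable class and the people table are hypothetical names invented for the example. sqlite3 uses ? placeholders; other DB-API drivers may use a different paramstyle.

```python
import sqlite3


class GenericTable:
    """CRUD helper for a single table (hypothetical, for illustration only)."""

    def __init__(self, conn, table, key, columns):
        # Table and column names are interpolated into the SQL text, so they
        # must come from trusted code, never from user input.
        self.conn, self.table, self.key, self.columns = conn, table, key, columns

    def create(self, **values):
        cols = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        cur = self.conn.execute(
            f"INSERT INTO {self.table} ({cols}) VALUES ({marks})",
            tuple(values.values()),
        )
        return cur.lastrowid

    def read(self):
        cols = ", ".join([self.key] + self.columns)
        return self.conn.execute(
            f"SELECT {cols} FROM {self.table} ORDER BY {self.key}"
        ).fetchall()

    def update(self, key_value, **values):
        sets = ", ".join(f"{col} = ?" for col in values)
        self.conn.execute(
            f"UPDATE {self.table} SET {sets} WHERE {self.key} = ?",
            tuple(values.values()) + (key_value,),
        )

    def delete(self, key_value):
        self.conn.execute(
            f"DELETE FROM {self.table} WHERE {self.key} = ?", (key_value,)
        )


# sqlite3 in-memory database stands in for the real (Oracle) database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, team TEXT)")
people = GenericTable(conn, "people", "id", ["name", "team"])

first = people.create(name="Ann", team="DBA")
second = people.create(name="Bob", team="Web")
people.update(first, team="Ops")
people.delete(second)
rows = people.read()
```

Nothing in the class depends on which columns the table has, which is the point being made above: the same four operations would sit behind any simple list/edit page.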

So that's my impression - Python has lots of good things to offer in terms of maintainability and flexibility, but we lose on the "quick start" front. People want something "right now", and they don't seem to mind sacrificing longer-term maintainability for it. That almost certainly isn't true for "real" web developments, but in my environment, the quick intranet hack is king. For better or worse :-(

And that's where I see Ruby on Rails getting the attention - the 10-minute demo videos, showing how you can build an application from nothing in no time at all. So what if it's only a toy? It works, and it's a basis. Maybe Ian Bicking's Paste is going in that direction - I don't know, I've not looked closely at it yet. I can't get comfortable with the application generator style which Rails uses, and Paste seems to have picked up on. But that could start a whole new post, and I don't know Paste well enough yet to comment on it.

Thursday, April 21, 2005

Switches, blocks and Generators

There's been quite a discussion on python-dev about blocks (à la Ruby), spinning off into switch statements. I can't say that I have any strong opinions, as I've never had any real-life code which suffers from the lack of these constructs. I have, however, had occasions when code I have been thinking about looked like it might benefit.

Fredrik Lundh points out in the discussion that iterators cover many of the uses of blocks as callbacks - rather than writing something like

    @myfunc(args):
        block
        of statements...

(using one of the many syntax variations proposed) you can often use

    for data in myfunc(args):
        block
        of statements

instead. Here data can be a dummy (and the iterator yields no useful value, but just uses yield as a placeholder) or can pass information "into" the statement block. A good example of this structure is cElementTree's iterparse function.
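Both flavours can be sketched in a few lines. This is an illustration rather than anything from the python-dev thread: the first loop uses iterparse from the standard library's xml.etree.ElementTree (where the cElementTree implementation lives in current Python versions), and the second uses a made-up generator called announced whose yield is only a placeholder.

```python
import io
import xml.etree.ElementTree as ET

XML = b"<root><item>one</item><item>two</item></root>"

# Case 1: the iterator passes information "into" the block.
# iterparse yields (event, element) pairs; the loop body is the "block".
seen = []
for event, elem in ET.iterparse(io.BytesIO(XML), events=("end",)):
    if elem.tag == "item":
        seen.append(elem.text)


# Case 2: the yielded value is a dummy; the generator just wraps the block.
def announced(label, log):
    """Hypothetical helper: run the loop body once between two log entries."""
    log.append("begin " + label)
    yield  # placeholder - the caller's loop body runs at this point
    log.append("end " + label)


log = []
for _ in announced("work", log):
    log.append("body")
```

One caveat with the second form: if the loop body raises or breaks out, the code after the yield is not resumed by the loop, so the "end" entry would be missing.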

I'm not 100% convinced that every type of block construct being discussed can be transformed into this style, but I'd be willing to bet that many can - and that many callback-style APIs would benefit as well (witness cElementTree, and my previous musings on graph traversal). Actually, the discussion points out that the main lack is where the transformation would need a yield inside a try...finally block (which Python doesn't currently allow).

Of course, this leads us into switch statements. Graph traversal and iterparse share a common need to "call back" with multiple types of event. That's easy - just yield ("event_type", data) and use a switch to choose the processing to do.
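As a rough sketch of that shape, here is a depth-first graph walk that yields ("enter", node) and ("leave", node) events, consumed by an if-chain. The walk function, the event names and the little graph are all invented for the example, and it leans on yield from, which needs a modern Python.

```python
def walk(graph, start):
    """Depth-first traversal yielding ("enter" | "leave", node) events."""
    visited = set()

    def visit(node):
        visited.add(node)
        yield ("enter", node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                yield from visit(neighbour)
        yield ("leave", node)

    yield from visit(start)


# Adjacency-list graph: a -> b, a -> c, b -> c
graph = {"a": ["b", "c"], "b": ["c"], "c": []}

trace = []
for event, node in walk(graph, "a"):
    # The "switch" on the event type, written as a plain if-chain.
    if event == "enter":
        trace.append("+" + node)
    elif event == "leave":
        trace.append("-" + node)
```

The traversal logic stays inside the generator, and the caller only decides what to do with each kind of event.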

Python's canonical switch is

    if key == "value1":
        process1
    elif key == "value2":
        process2
    ...
    else:
        default

Disadvantages:

It is fairly long-winded
It takes a linear search of the options
It's not "visibly" a switch on the value of a single variable

It's difficult to take the "long-winded" argument seriously - one line to specify the switch value, then the (essential) code to execute. I have to admit, I was stretching to come up with this one.

OK, linear time. Is this really an issue? How many cases will the average switch have? Can you honestly claim that a linear search of maybe 10 options is going to kill your application? I know, inner loop, rich comparison, yada yada... - I'll deal with that below. Scratch this one for the moment.

So we're left with the last one. This does have some substance, in my view. The if tests don't make it clear that it's the same variable being tested each time (and the fact that the variable gets repeated is a possible source of problems). Also, if chains get used for lots of things other than switches, so there's no visual clue as to what's going on.

OK, so there are some issues with if-chains. What else can we try?

The other big switch idiom is the dictionary mapping value->callable. Here, we'd do

    def process1():
        ...
    def process2():
        ...
    def default():
        ...
    switch = {
        "value1": process1,
        "value2": process2
    }

    fn = switch.get(key, default)
    fn()

Quite neat, and if you don't need a default, you can even go with

   switch[key]()

which starts to get a little obfuscated, but expresses what we're doing pretty clearly. And it's got a constant lookup time (at the cost of requiring hashable keys), so that fixes the linear time issue above, as well.
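For event-style code, where each case also needs the event's data, the same idiom might look like the sketch below. The handler names and event keys are made up for illustration; the only real machinery is dict.get supplying the default.

```python
def handle_start(data):
    return "start:" + data


def handle_end(data):
    return "end:" + data


def handle_unknown(data):
    return "ignored"


# Maps each event type to the callable that processes it.
HANDLERS = {"start": handle_start, "end": handle_end}


def dispatch(key, data):
    # dict.get falls back to the default handler when the key is missing.
    return HANDLERS.get(key, handle_unknown)(data)


results = [dispatch(key, "x") for key in ("start", "end", "comment")]
```

The lookup is a single hash probe however many cases there are, but every case still needs its own named function.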

But it does need a lot of temporary names, and a lot of definitions "up front". While I take Guido's point that namespaces are there to be used, there's a namespace in my brain as well, and that's pretty cluttered. I just can't always think of meaningful names - "value1_processing" looks silly, and "foo" is just a cop-out.

One possibility is to use a class, and a bit of introspection:

    class Switch(object):
        def __call__(self, key):
            fn = getattr(self, "case_" + key, self.default)
            return fn()
        def default(self):
            pass

    class _(Switch):
        def case_value1(self):
            process1
        def case_value2(self):
            process2
        def default(self):
            default stuff
    _()(key) # Ugly, I know...

Lots of negatives here - that _()(key) is as ugly as sin, and the values need to be Python identifiers (essentially). I'm not sure it improves over the raw dictionary.

But as I said, I've never needed to do this in real code yet. (That's not a claim that it's not useful - just a disclaimer that I've no real experience of the issues :-)) There are enough possibilities open to me, though, that I'm not sure there's any real justification for new syntax here.

Conclusions? I'm not sure. The use of generators as a way of structuring callback-style code is a really neat idea. It bears some serious exploration - I'm sure we're a long way from understanding the full value of generators in Python yet. I'm very unsure on switches, though. I know that I'm put off the generator-callback idiom by the need for a switch-type structure, but that may just be an aversion to switches in general, rather than to the syntax.

I think I need to go back to my graph traversal code, and rewrite it in generator style, then use it a bit and see how it feels.

Monday, February 21, 2005

Test-driven development

At last - a Python-related posting!

I've been thinking about trying test-driven development for a while, but have never really had a good project. But I've found something which looks worth a go. FWIW, it's a skill planner for ToME. There's one available already, but it uses Curses and so isn't available on Windows. And anyway, it's in Perl, so it clearly needs rewriting :-)

Anyway, off we go. The obvious place to start is with a "Skill" class, which isn't difficult to set up. Or rather, to set up a test for.


import unittest

class SkillTests(unittest.TestCase):
    def test_create(self):
        combat = Skill(0.8)

From here on, it's pretty smooth going. The hard bit is to add functionality which is just enough to make the tests pass. Also, something the articles I've read don't make clear is that test-driven development doesn't avoid the need for good design sense. (It's not that they hide the fact, it's more that it's an "obvious" assumption that I missed). You still have to think about how you want to use your classes, it's just that you document that use in test cases rather than in specifications (or more likely on bits of paper which you then lose...)

The first major test comes when I want to add some non-trivial functionality. By this point, my Skill class has a "multiplier", and "points" and "value" attributes. The points can be set, and the value is a derived attribute, basically points * multiplier. There is a restriction that the value cannot exceed 50, which I implement at the moment as a constraint on how the points attribute is set. But more on this later.
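The class at this stage might look something like the sketch below. This is a guess at the shape rather than the actual code: in particular, raising ValueError when the cap would be exceeded is an assumption (the constraint could just as well clamp), and the 0.5 multiplier is chosen only to keep the arithmetic exact.

```python
import unittest


class Skill:
    """Hypothetical sketch: value = points * multiplier, capped at 50."""

    MAX_VALUE = 50

    def __init__(self, multiplier):
        self.multiplier = multiplier
        self._points = 0

    @property
    def points(self):
        return self._points

    @points.setter
    def points(self, new_points):
        # The cap is enforced when points are set, not when value is read.
        # Raising here is an assumed policy, not something the post specifies.
        if new_points * self.multiplier > self.MAX_VALUE:
            raise ValueError("value would exceed %d" % self.MAX_VALUE)
        self._points = new_points

    @property
    def value(self):
        # Derived attribute: never stored, always computed from points.
        return self._points * self.multiplier


class SkillTests(unittest.TestCase):
    def test_value_is_points_times_multiplier(self):
        combat = Skill(0.5)
        combat.points = 10
        self.assertEqual(combat.value, 5.0)

    def test_points_rejected_when_value_would_exceed_cap(self):
        combat = Skill(0.5)
        with self.assertRaises(ValueError):
            combat.points = 101
        self.assertEqual(combat.points, 0)
```

Putting the limit in the points setter keeps value a pure derived attribute, but it also ties the cap to a single skill's own points, which is exactly what dependent skills would break.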

Now, however, I want to add the concept of dependent skills, so that points spent on one skill can affect the value of another skill. My class won't handle this at all. So I'm going to have to do some major refactoring, which means that I need some confidence that I won't break anything. Have I got this confidence? Well.... not really. I think that before I start, I'd like to add some more tests to make sure my basic skill class behaves exactly as I want. Whatever that is - there are clearly still some design decisions to make.

That's an interesting insight in itself. I identified the need to tighten up the basic spec before moving on because I didn't feel the necessary confidence that my tests covered everything. The articles never mentioned that one, either :-)

After some thinking, however, I come to two conclusions:

I can't think of any more tests I want to add

I've a sneaking suspicion I'm writing the whole class backwards, and the interface I'll ultimately want is not the one I'm currently designing

The first one makes me feel better about my current set of tests, so that's OK. The second one could be an issue, but I'll park it for now, and trust that I'll be able to fix it when I really need to. It does make me wonder, though. I'll be refactoring my tests at that stage, which, while not wrong as such, feels risky. What if I delete a test which is no longer correct, but replace it with a weaker one which my implementation passes "by accident"? No, I should wait and see. Use the YAGNI principle, and carry on regardless...

As I progress adding tests, I discover a very interesting thing. I know I want to write a test for the case where a skill is at its maximum, and points are added to a subskill (one which adds bonus points to its "parent" when it is increased). This could cause a skill to exceed its maximum, so I need a test here. But I don't know what behaviour I want, so I don't know how to complete the test! That's excellent - the test-driven approach has teased out a fundamental design issue I'd have missed otherwise.

I'm enjoying this. But I'll stop blogging now as I actually need to think about my design before I can proceed.

(Maybe I should wait to post until I've finished. These flow-of-consciousness posts help me, but I don't know if they make enough sense to be worth preserving for posterity :-))

Thursday, February 10, 2005

Threading and performance

I've got a program that needs to connect to 100+ Oracle databases, really fast (it's generating data for a web page). So what I did was to write a Python script, using threads and cx_Oracle to do the connections in parallel. It's pretty good, under 10 seconds on a good day to get all my data.
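The general shape of that kind of script can be sketched as follows. This is not the original code: it uses concurrent.futures from the modern standard library rather than hand-rolled threads, the database names are invented, and fetch_status is a stand-in that just sleeps, where a real version would open a connection with the database driver, run the query and close the connection again.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical database names, standing in for the real list of 100+.
DATABASES = ["db%03d" % n for n in range(100)]


def fetch_status(name):
    """Stand-in for connect + query + close against one database."""
    time.sleep(0.05)  # simulates network latency; no real connection is made
    return name, "OK"


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    # map() returns results in the same order as DATABASES.
    results = dict(pool.map(fetch_status, DATABASES))
elapsed = time.perf_counter() - start
```

Because each worker spends nearly all its time waiting on I/O, the threads overlap well: done one after another, the simulated calls would take about 5 seconds, whereas 20 workers get through them in a fraction of that.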

But this is the script I'm interfacing to a Java program, and I was having trouble getting data back - weird stuff like the same query giving different results each time. And no good way of getting tracebacks or debugging prints out (don't ask...) So I thought, if I rewrite in Java to pump data round via serialised objects, maybe it'll work better and I won't have to do the hard debugging.

So I rewrote the framework in Java - spawn off the threads, do a connect, and close the connection. Time to run (no actual query yet!) was 1 minute 10 seconds. Gack. Maybe it's because I'm using the JDBC thin driver - I'll try the Oracle "OCI" driver. No, that's as bad if not slower :-(

It's possible, I suppose, that JDBC is just a lot slower than cx_Oracle, but a quick test says not (6 sec for 10 connections in Python, 5 sec in Java). So it looks like threading in Java is really, really slow. Probably because there is a lot of (basically unnecessary) locking, from too many things being serialised by default. But I haven't got time to work out why, or work around it. Phooey. Strike one for Java again.

Actually, this quote from the Oracle JDBC documentation probably explains it: "all Oracle JDBC API methods are synchronized". So does that mean the connection attempt gets synchronised? The Sun documentation for DriverManager.getConnection doesn't say anything - it's possible that means that by default it isn't synchronised. But maybe the underlying Oracle connection method is - I don't know how to tell. Anyway, it doesn't matter - Java threads are not fast enough for my need whatever the reason.

One day I'll blog about something other than Java. Maybe.