Monday, February 21, 2005

Test-driven development

At last - a Python-related posting!

I've been thinking about trying test-driven development for a while, but have never really had a good project. But I've found something which looks worth a go. FWIW, it's a skill planner for ToME. There's one available already, but it uses Curses and so isn't available on Windows. And anyway, it's in Perl, so it clearly needs rewriting :-)

Anyway, off we go. The obvious place to start is with a "Skill" class, which isn't difficult to set up. Or rather, to set up a test for.

    import unittest

    class SkillTests(unittest.TestCase):
        def test_create(self):
            combat = Skill(0.8)

From here on, it's pretty smooth going. The hard bit is adding only just enough functionality to make the tests pass. Also, something the articles I've read don't make clear is that test-driven development doesn't avoid the need for good design sense. (It's not that they hide the fact, it's more that it's an "obvious" assumption that I missed.) You still have to think about how you want to use your classes; it's just that you document that use in test cases rather than in specifications (or more likely on bits of paper which you then lose...)

The first major test comes when I want to add some non-trivial functionality. By this point, my Skill class has a "multiplier", and "points" and "value" attributes. The points can be set, and the value is a derived attribute, basically points * multiplier. There is a restriction that the value cannot exceed 50, which I implement at the moment as a constraint on how the points attribute is set. But more on this later.
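For the record, here is a minimal sketch of roughly what the class looks like at this stage. It is an illustration rather than the real code: the name MAX_VALUE and the choice to raise ValueError (instead of, say, clamping) are assumptions made for the example.

```python
class Skill:
    """Sketch of a skill whose value is derived from points * multiplier."""

    MAX_VALUE = 50  # cap on the derived value (the name is an assumption)

    def __init__(self, multiplier):
        self.multiplier = multiplier
        self._points = 0

    @property
    def points(self):
        return self._points

    @points.setter
    def points(self, new_points):
        # Assumption: reject the assignment rather than silently clamping it.
        if new_points * self.multiplier > self.MAX_VALUE:
            raise ValueError("value would exceed %d" % self.MAX_VALUE)
        self._points = new_points

    @property
    def value(self):
        # Derived attribute: never stored, always recomputed from points.
        return self._points * self.multiplier
```

Putting the check in the points setter keeps value a pure calculation, which is the "constraint on how the points attribute is set" described above.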

Now, however, I want to add the concept of dependent skills, so that points spent on one skill can affect the value of another skill. My class won't handle this at all. So I'm going to have to do some major refactoring, which means that I need some confidence that I won't break anything. Have I got this confidence? Well.... not really. I think that before I start, I'd like to add some more tests to make sure my basic skill class behaves exactly as I want. Whatever that is - there are clearly still some design decisions to make.

That's an interesting insight in itself. I identified the need to tighten up the basic spec before moving on because I didn't feel the necessary confidence that my tests covered everything. The articles never mentioned that one, either :-)

After some thinking, however, I come to two conclusions:

  1. I can't think of any more tests I want to add

  2. I've a sneaking suspicion I'm writing the whole class backwards, and the interface I'll ultimately want is not the one I'm currently designing

The first one makes me feel better about my current set of tests, so that's OK. The second one could be an issue, but I'll park it for now, and trust that I'll be able to fix it when I really need to. It does make me wonder, though. I'll be refactoring my tests at that stage, which, while not wrong as such, feels risky. What if I delete a test which is no longer correct, but replace it with a weaker one which my implementation passes "by accident"? No, I should wait and see. Use the YAGNI principle, and carry on regardless...

As I add more tests, I discover a very interesting thing. I know I want to write a test for the case where a skill is at its maximum, and points are added to a subskill (one which adds bonus points to its "parent" when it is increased). This could cause a skill to exceed its maximum, so I need a test here. But I don't know what behaviour I want, so I don't know how to complete the test! That's excellent - the test-driven approach has teased out a fundamental design issue I'd have missed otherwise.
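One way to keep an open question like this visible is to record it as a skipped test, so it shows up in every test run until the decision is made. This is only a sketch: the test name is made up, and the three candidate behaviours in the comments are possibilities listed for illustration, not decisions.

```python
import unittest


class SubskillAtMaximumTests(unittest.TestCase):
    @unittest.skip("design undecided: parent skill is already at its cap")
    def test_subskill_points_when_parent_at_maximum(self):
        # Hypothetical: the parent/subskill API does not exist yet.
        # Candidate behaviours to choose between before finishing this test:
        #   - reject the points added to the subskill,
        #   - accept them and hold the parent's value at the cap,
        #   - accept them and let the parent exceed the cap.
        self.fail("decide the expected behaviour, then replace this line")
```

The skip reason is printed by the test runner in verbose mode, which makes it a slightly more durable reminder than a bit of paper.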

I'm enjoying this. But I'll stop blogging now as I actually need to think about my design before I can proceed.

(Maybe I should wait to post until I've finished. These stream-of-consciousness posts help me, but I don't know if they make enough sense to be worth preserving for posterity :-))

Thursday, February 10, 2005

Threading and performance

I've got a program that needs to connect to 100+ Oracle databases, really fast (it's generating data for a web page). So what I did was to write a Python script, using threads and cx_Oracle to do the connections in parallel. It's pretty good, under 10 seconds on a good day to get all my data.
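The general shape of the script is one thread per database, with the results collected into a shared dictionary. The sketch below is a simplified illustration, not the actual script: fetch_all and its fetch argument are hypothetical names, and the database call itself is left to the caller so the example runs without Oracle.

```python
import threading


def fetch_all(targets, fetch):
    """Call fetch(target) for every target in its own thread.

    Returns a dict mapping each target to its result, or to the
    exception raised while fetching it.
    """
    results = {}
    lock = threading.Lock()

    def worker(target):
        try:
            outcome = fetch(target)
        except Exception as exc:  # one failed database should not lose the rest
            outcome = exc
        with lock:
            results[target] = outcome

    threads = [threading.Thread(target=worker, args=(t,)) for t in targets]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In real use, fetch would be a function that opens a connection with cx_Oracle.connect, runs the query and returns the rows; the connection details are omitted here. Threads help in this case because each one spends nearly all its time waiting on the network.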

But this is the script I'm interfacing to a Java program, and I was having trouble getting data back - weird stuff like the same query giving different results each time. And there was no good way of getting tracebacks or debugging prints out (don't ask...). So I thought, if I rewrite in Java to pump data round via serialised objects, maybe it'll work better and I won't have to do the hard debugging.

So I rewrote the framework in Java - spawn off the threads, do a connect, and close the connection. Time to run (no actual query yet!) was 1 minute 10 seconds. Gack. Maybe it's because I'm using the JDBC thin driver - I'll try the Oracle "OCI" driver. No, that's as bad if not slower :-(

It's possible, I suppose, that JDBC is just a lot slower than cx_Oracle, but a quick test says not (6 sec for 10 connections in Python, 5 sec in Java). So it looks like threading in Java is really, really slow. Probably because there is a lot of (basically unnecessary) locking, from too many things being serialised by default. But I haven't got time to work out why, or work around it. Phooey. Strike one for Java again.

Actually, this quote from the Oracle JDBC documentation probably explains it: "all Oracle JDBC API methods are synchronized". So does that mean the connection attempt gets synchronised? The Sun documentation for DriverManager.getConnection doesn't say anything - it's possible that means that by default it isn't synchronised. But maybe the underlying Oracle connection method is - I don't know how to tell. Anyway, it doesn't matter - Java threads are not fast enough for my need whatever the reason.

One day I'll blog about something other than Java. Maybe.

Monday, February 07, 2005

Nice things about Java

After my previous moans about Java, it's probably only fair that I say something nice about it. If you've been following our story so far, I have a Java stored procedure in an Oracle database, which calls Runtime.exec to fire off a Python program which generates some results for me.

The programs communicate via stdout, using a home-brew text format (I gave up on XML) - it's not very complex, but a surprising amount of the Java code is dedicated to parsing it. Then it occurred to me that I could use Java's serialization features to pass data round - if the external program was in Java. A quick test later, and it surprised me - it just works. And no problems pushing serialized data through stdout either (this is on Windows, where text mode is a permanent thorn in my side).

That was surprisingly neat. The code is still a bit verbose to my mind for what it does, but it all interoperates just like that. Yes, I know Python could do the same via pickle, and I know I can't fault Python for not being able to write Java serialized objects any more than I can fault Java for not being able to read Python pickles, but that's not the point. It's a language vs library thing again - Java's libraries are very rich, like Python's. The style differs radically but I'd rather dislike the style than lack the functionality.
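For comparison, the pickle equivalent might look something like the sketch below. It assumes a modern Python 3 (which post-dates this post), where binary streams are explicit; send and receive are hypothetical helper names, and an in-memory buffer stands in for the pipe between the two processes.

```python
import io
import pickle


def send(obj, stream):
    """Write one pickled object to a binary stream."""
    pickle.dump(obj, stream, protocol=pickle.HIGHEST_PROTOCOL)
    stream.flush()


def receive(stream):
    """Read one pickled object back from a binary stream."""
    return pickle.load(stream)


# Between real processes the writer would use sys.stdout.buffer and the
# reader would use the child's stdout pipe; BytesIO stands in for both here.
pipe = io.BytesIO()
send({"db": "example", "rows": [1, 2, 3]}, pipe)
pipe.seek(0)
print(receive(pipe))
```

As with Java serialization, this only works when both ends speak the same language, and pickle data should only ever be loaded from a source you trust.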

And some things just aren't as neat in Python. Passing binary data via stdout is dodgy in Python because of text-mode issues. I couldn't see how Java dealt with text mode at first, until I realised that was what the Reader/Writer interfaces were for. At the bottom level, all IO in Java is binary, and you have to wrap a codec round a stream to get character IO. Python can't be that "pure" because of the historical use of strings as byte buffers. Whether Python could become stricter, I don't know. And whether it could do so while still remaining "Pythonic", I'm even less sure.
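The same "binary at the bottom, codec wrapped round it" layering can be sketched with the io module found in later versions of Python (Python 3, which did not exist when this was written). Here io.TextIOWrapper plays roughly the role of Java's Reader/Writer classes; the in-memory byte buffers are just stand-ins for real streams.

```python
import io

raw = io.BytesIO()  # bottom layer: bytes only
writer = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
writer.write("caf\u00e9\n")  # characters go in ...
writer.flush()
encoded = raw.getvalue()  # ... encoded bytes come out
print(encoded)

# Reading reverses the process: wrap a decoder round a byte stream.
reader = io.TextIOWrapper(io.BytesIO(encoded), encoding="utf-8", newline="")
decoded = reader.read()
print(decoded)
```

Passing newline explicitly switches off the platform-dependent line-ending translation, which is the part of text mode that causes the trouble on Windows.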

But it's nice, for a change, to be able to sling data round without enduring "fear of text mode" all the time.

So, I'm getting more tolerant of Java. Not a fan, not even a convert, but at least not as biased against it. After all, we're all consenting Von Neumann machines when it comes down to it...