Friday, January 28, 2005

Bloglines related feeds

Got this one from Simon Brunning and Andy Todd. The idea is, you look for your own blog on Bloglines, then click on the "Related Feeds" link. Post the top 5 (or more) on your blog.

With mine, it helps if you say "Include feeds I'm subscribed to", but maybe that's just because my blog doesn't have much content yet...
  2. Python owns us
  3. Jeremy Hylton's Web Log
  4. Ian Bicking: A Blog
  5. Ted Leung on the air
  6. Lambda the Ultimate - Programming Languages Weblog
  7. Python News
  8. Simon Willison's Weblog
  9. ongoing
  10. Martin Fowler's Bliki
If I exclude blogs I'm subscribed to, there are a lot more Java blogs, maybe because I've posted a couple of Java rants. There's also GIGO: words unreadable aloud. I'm not at all sure how to take that...

Thursday, January 20, 2005

XML - even more aaargh

I was right. That Java-calls-Python code I wrote about did indeed get complex enough that I was spending too much effort writing code to parse data coming in via a single pipe (the stdout of my Python program). I needed to pass error codes, as well as "real" data, and the real data had multiple data types.

Oh, well. Why write a home-grown parser of my own? Java supports XML, so I grit my teeth...

As expected, converting the Python program to write XML is easy. Ten minutes, max. So let's grab all those neat Java XML libraries, and enter a world of standards-based heaven. Or not. SAX seems to be the obvious choice, but that seems to imply that I have to write my own state machine code to handle the SAX events. So remind me - why is writing my own state machine easier than writing my own parser? I hate writing state machines; I always get the end conditions wrong :-(
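For the record, here is roughly what that state machine looks like. This is only a sketch: the XML layout (a `<result>` holding `<row>` elements, each holding `<col>` elements) is made up for illustration and is not the format my Python program actually writes, and the `RowHandler` name is equally hypothetical. The two private fields are the "state"; forgetting to reset them in `endElement` is exactly the kind of end-condition bug I mean.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for an assumed <result><row><col>..</col></row></result> layout.
class RowHandler extends DefaultHandler {
    final List<List<String>> rows = new ArrayList<List<String>>();
    private List<String> currentRow;    // non-null only while inside <row>
    private StringBuilder currentText;  // non-null only while inside <col>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("row")) {
            currentRow = new ArrayList<String>();
        } else if (qName.equals("col")) {
            currentText = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append rather than assign.
        if (currentText != null) {
            currentText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("col")) {
            // Assumes every <col> sits inside a <row>, as in the made-up layout.
            currentRow.add(currentText.toString());
            currentText = null;
        } else if (qName.equals("row")) {
            rows.add(currentRow);
            currentRow = null;
        }
    }

    // Convenience wrapper: parse a whole document held in a string.
    static List<List<String>> parse(String xml) throws Exception {
        RowHandler handler = new RowHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
        return handler.rows;
    }
}
```

Error codes and typed columns would each need another branch and probably another state field, which is where it stops being fun.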

Maybe I'll try DOM. Who knows, it may be easier. It's ugly and verbose in other languages, but I'm getting used to the concept that everything is ugly and verbose in Java, so maybe it'll seem fine.

Sorry, but I needed that rant. We now return you to your scheduled program.

Monday, January 10, 2005

Java - aaargh

Normally, I code in Python. It's easy and quick to write fairly powerful scripts, and the standard library is very extensive.

But just recently, I have had to write some Java. I've written the same code in Python, but it needed migrating into an Oracle database (don't ask why, it's a long story), which supports Java in the server, so I started converting the code.

Initially, it wasn't too hard - the main bits of the code used Python's threading library and the DB API, which aren't too dissimilar to Java's Thread class and JDBC. But that fell by the wayside because Oracle's JVM serialises threads - a multithreaded program runs one thread at a time. Pah. The whole point of using threading was to run 100+ database queries in parallel! Add to that the fact that the JDBC calls were failing mysteriously, and I decided to back off from this.

So I resurrected my Python script, and decided to use Runtime.exec to fire it off and read its results. This works, and seems to be a good compromise. But I now need to transfer resultsets back from the Python program to Java. The only serious contender is to transfer the data as text on stdout, which means I have to parse text data in Java.
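The Runtime.exec part looks something like the sketch below. It is not my actual code: the `ScriptRunner` class is hypothetical, and it is written against a modern JVM (generics and so on), which the Oracle one may well not be. It also only reads stdout; a script that writes a lot to stderr could fill that pipe and block, so real code would need to drain it too.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: run an external command and collect its stdout as lines.
class ScriptRunner {
    // Read every line from the stream until end of input.
    static List<String> readLines(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        List<String> lines = new ArrayList<String>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Start the command, read its stdout to the end, then check the exit code.
    static List<String> run(String[] command) throws IOException, InterruptedException {
        Process process = Runtime.getRuntime().exec(command);
        try {
            List<String> lines = readLines(process.getInputStream());
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("command exited with code " + exitCode);
            }
            return lines;
        } finally {
            process.destroy(); // releases the process's streams
        }
    }
}
```

A call would be along the lines of `ScriptRunner.run(new String[] {"python", "query.py"})`, where `query.py` is a stand-in name. Each returned line still has to be picked apart by hand.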

Boy, is that hard :-( I'd forgotten how many hoops low-level languages like Java and C++ can make you jump through just to parse text data! What do I do to split a string into words? Java 1.4 has String.split, but that's regex-based, and I'm pretty sure the Oracle JVM isn't version 1.4. So does that mean I have to roll my own, with indexOf and substring, and all the error checking and boundary condition nonsense that is just built into Python's str.split()? Ack. I don't know if I can face looking at numeric conversion, string escaping, etc.
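For what it's worth, rolling my own comes out roughly like this. It is a minimal sketch with a hypothetical `ManualSplit` class, and it copies the behaviour of Python's str.split(sep) with an explicit one-character separator (empty fields are kept), not the whitespace-collapsing no-argument form. I've used generics for readability; a pre-1.5 JVM would need a raw List and casts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split on a single delimiter using only indexOf and substring.
class ManualSplit {
    static List<String> split(String text, char delim) {
        List<String> fields = new ArrayList<String>();
        int start = 0;                          // index where the current field begins
        int end = text.indexOf(delim, start);   // index of the next delimiter, or -1
        while (end != -1) {
            fields.add(text.substring(start, end));
            start = end + 1;                    // skip over the delimiter itself
            end = text.indexOf(delim, start);
        }
        // Whatever follows the last delimiter is the final field (possibly empty).
        fields.add(text.substring(start));
        return fields;
    }
}
```

So `ManualSplit.split("a,b,,c", ',')` gives the four fields `a`, `b`, an empty string and `c`, and an empty input gives a single empty field, the same as Python does with an explicit separator.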

I can just feel it - this way lies XML. Why waste all that energy parsing data when there are ready-made XML libraries to do it for me? But why do I need to use something as heavyweight as XML to just transfer some tabular data?


I don't have any problem with Java as a language. It's just another flavour of C-like syntax, and as such has its warts, but works fine. And the library is nothing if not extensive. But things just generally seem hard. I suspect it's a mindset of the Java community. It can't be a bad thing in itself, given the success of Java. But I sure don't get it...

Oh well, back to the fiddly code. Was the index 0-based or 1-based again?