Threading and performance
I've got a program that needs to connect to 100+ Oracle databases, really fast (it's generating data for a web page). So what I did was to write a Python script, using theads and cx_Oracle to do the connections in parallel. It's pretty good, under 10 seconds on a good day to get all my data.
But this is the script I'm interfacing to a Java program, and I was having trouble getting data back - weird stuff like the same query giving different results each time. And no good way of getting tracebacks or debugging prints out (don't ask...) So I thought, if I rewrite in Java to pump data round via serialised objects, maybe it'll work better and I won't have to do the hard debugging.
So I rewrote the framework in Java - spawn off the threads, do a connect, and close the connection. Time to run (no actual query yet!) was 1 minute 10 seconds. Gack. Maybe it's because I'm using the JDBC thin driver - I'll try the Oracle "OCI" driver. No, that's as bad if not slower :-(
It's possible, I suppose, that JDBC is just a lot slower than cx_Oracle, but a quick test says not (6 sec for 10 connections in Python, 5 sec in Java). So it looks like threading in Java is really, really slow. Probably because there is a lot of (basically unnecessary) locking, from too many things being serialised by default. But I haven't got time to work out why, or work around it. Phoey. Strike one for Java again.
Actually, this quote from the Oracle JDBC documentation probably explains it: "all Oracle JDBC API methods are synchronized". So does that mean the connection attempt gets synchronised? The Sun documentation for DriverManager.getConnection doesn't say anything - it's possible that means that by default it isn't synchronised. But maybe the underlying Oracle connection method is - I don't know how to tell. Anyway, it doesn't matter - Java threads are not fast enough for my need whatever the reason.
One day I'll blog about something other than Java. Maybe.
0 Comments:
Post a Comment
<< Home