Friday, October 24, 2008

SQL comparison updates: Informix, JDBC

A reader of my SQL implementation comparison page suggested that I keep a change log for the page. Currently, I don't want to keep a detailed log, but I'll add blog posts here when I've made changes which are somewhat significant, tagging the posts 'sqlpage'.

Today, I added a new section related to JDBC, because I'm tired of having to spend time looking up details like JDBC driver names and connection URLs in manuals. (I don't work much with Java, so I keep forgetting where to find this stuff when I need it.) The section is far from complete.
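As an illustration of the kind of details the new section collects, here is a small Python lookup table of driver class names and connection URL shapes. The values are the typical ones for the 2008-era drivers, quoted from memory, so double-check against the vendor manuals; the `jdbc_url` helper is purely illustrative.

```python
# Quick-reference table of common JDBC driver class names and connection URL
# templates. Values are typical for 2008-era drivers -- verify against the
# vendor manuals before relying on them.
JDBC_CHEATSHEET = {
    "postgresql": ("org.postgresql.Driver",
                   "jdbc:postgresql://{host}:5432/{database}"),
    "mysql":      ("com.mysql.jdbc.Driver",
                   "jdbc:mysql://{host}:3306/{database}"),
    "db2":        ("com.ibm.db2.jcc.DB2Driver",
                   "jdbc:db2://{host}:50000/{database}"),
    "informix":   ("com.informix.jdbc.IfxDriver",
                   "jdbc:informix-sqli://{host}:1526/{database}:INFORMIXSERVER={server}"),
}

def jdbc_url(dbms, **kwargs):
    """Fill in the URL template for the given DBMS (illustrative helper)."""
    driver, template = JDBC_CHEATSHEET[dbms]
    return template.format(**kwargs)

print(jdbc_url("postgresql", host="dbhost", database="test"))
```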

I also started coverage of the Informix database, as I've recently become responsible for an important Informix database at work. Informix was once a very important player in the database game, but the story goes that the Informix company suddenly started getting managed very badly -- and at some point IBM bought Informix. Since then, IBM's strategy for Informix has been rather unclear. I have a personal, very weakly founded theory: I think IBM is positioning Informix as the place where the exciting new developments happen first; and when the new features are known to work well, they are introduced to DB2. This may not be a bad role for Informix.

Wednesday, October 15, 2008

Flash 10

Adobe released generation 10 of its Flash player software today.

Good:
Adobe provides a "yum" channel for the Flash player, so my PCs were automatically updated. This is nice and responsible: Many security problems on home computers are actually due to 3rd party software which isn't automatically patched when the user performs a base system update. After fixing a problem (see below), Flash 10 seemed to run fine.

Bad:
Unfortunately, Adobe still doesn't seem to provide x86_64 versions of the player. Consequently, 64-bit PCs need to have a bunch of -- otherwise irrelevant -- compatibility software installed. The need to maintain parallel 32-bit and 64-bit software worlds can also be tricky.

I wonder what's keeping Adobe from making life a little bit easier for users by adding a 64-bit download option. Which reminds me that it's probably time to take a closer look at The Lively Kernel and Moonlight soon.

Flash 10 consistently crashed on my home PC until I added a 32-bit curl package to the system (Flash 10 seems to link against libcurl -- something earlier versions didn't). Strangely, the Flash RPM package doesn't declare a dependency on libcurl, although it's clearly needed.

Thursday, October 09, 2008

DB2 lets you drop a parent table

DB2 is usually a rather strict database system: It doesn't allow you to drop a procedure which is being used by a function. It uses pessimistic locking. It typically forces you to back up a tablespace after you aggressively load data into a table, unless the database uses circular logging. Etc.

So I found it surprising that it allows you to drop a table which is referred to by a foreign key. DB2 doesn't even warn you that the child table(s) have lost a potentially important constraint. That's evil.
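Here's a sketch of the failure mode, using SQLite through Python purely as a stand-in for DB2 (with an empty child table, SQLite happens to be just as permissive; the table names are made up):

```python
# Why silently dropping a parent table is dangerous: the child table is left
# carrying a foreign key clause that points at a non-existent table.
# SQLite stands in for DB2 here purely for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # FK enforcement is off by default
conn.execute("CREATE TABLE parent (parent_id INT PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
                  child_id  INT PRIMARY KEY,
                  parent_id INT REFERENCES parent(parent_id))""")

conn.execute("DROP TABLE parent")          # succeeds -- no warning at all

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
# 'parent' is gone, yet 'child' still declares a foreign key against it;
# later inserts into child will fail in confusing ways.
print(tables)
```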

None of the other DBMSes I normally compare lets you drop a parent table like that: PostgreSQL refuses (unless you add a CASCADE option to the DROP statement), and MSSQL refuses. Not even MySQL lets you do it.

As always, MySQL has its little surprises:
CREATE TABLE child (
  child_id INT NOT NULL,
  parent_id INT NOT NULL,
  whatever VARCHAR(50) NOT NULL,
  PRIMARY KEY(child_id, parent_id),
  CONSTRAINT child_fk FOREIGN KEY (parent_id) REFERENCES parent
);
ERROR 1005 (HY000): Can't create table './test/child.frm' (errno: 150)

So what is the reason behind that confusing error message (which also qualifies as evil)? The referenced column must be stated explicitly: "REFERENCES parent" won't do; it has to be "REFERENCES parent(parent_id)"...

Monday, October 06, 2008

Recursive SQL, soon also in PostgreSQL

SQL:1999-style recursive SQL has been added to the development version of PostgreSQL. Consequently, three DBMSes (DB2, MSSQL, PostgreSQL) will soon support the "WITH [RECURSIVE]" construct. Recursive SQL is becoming mainstream. (Oracle also has recursive SQL, but implemented in a limited and non-standard way.)
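For readers who haven't met the construct: here is what a standard WITH RECURSIVE query looks like, run through Python's sqlite3 module for convenience (modern SQLite supports the construct too; the employee table and names are made up for illustration):

```python
# The SQL:1999 "WITH RECURSIVE" construct, shown on a toy employee hierarchy.
# The same standard syntax works in DB2, MSSQL, and (soon) PostgreSQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INT PRIMARY KEY, name TEXT, boss_id INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "alice", None),   # alice is the top boss
                  (2, "bob",   1),
                  (3, "carol", 1),
                  (4, "dave",  2)])

rows = conn.execute("""
    WITH RECURSIVE org(id, name) AS (
        SELECT id, name FROM emp WHERE id = 1          -- anchor member
        UNION ALL
        SELECT e.id, e.name                            -- recursive member
        FROM emp e JOIN org o ON e.boss_id = o.id
    )
    SELECT name FROM org ORDER BY id
""").fetchall()

names = [name for (name,) in rows]
print(names)   # alice plus everyone under her, transitively
```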

Saturday, October 04, 2008

Open Source Days 2008, Saturday

Today was actually a work day for me, as we carried out a major service window. Fortunately, my duties during the service window were limited, so I managed to sneak over and attend a few sessions at Open Source Days while waiting for some SAN and operating system patches to be applied by co-workers.

I attended the last half of Jan Wieck's talk about Slony-I (not "Slony-l", as written in the conference agenda). Slony-I is an asynchronous master-slave replication program for the PostgreSQL database management system. Unfortunately, I don't work much with PostgreSQL in my current job, but if I did, I'd certainly try out Slony-I. It can be useful for scalability (think: read-only database slaves in a CMS cluster) and continuous backup to an offline location. It can also be used when upgrading PostgreSQL, resulting in close to zero downtime, because Slony-I can (to a certain degree) replicate between different PostgreSQL versions. Slony-I has some rather impressive replication routing features, so that you can have master->slave->slave->slave.

This talk was an example of why I like participating in conferences with open source focus: Jan was very clear about Slony-I's limitations and weaknesses -- contrary to some corporate guy who might not be lying, but who might be suppressing unfortunate facts. Slony-I has a number of weak points: It's rather complex to install, configure, and manage. And the current version 1 does some dirty tricks with the system schema (will be cleaned up in version 2).

Jan once had a plan for multi-master replication in Slony-I, but that idea has been dropped for now. Fine with me: Although it sounds cool, I would have a hard time trusting such a feature anyway, thinking about the implementation complexity it would entail.

Next, Magnus Hagander spoke about Hidden gems of PostgreSQL. Magnus works at Redpill, which provides 24x7 PostgreSQL support (among a number of other services). As far as I know, Redpill has recently opened an office in Denmark -- which means that it's now possible to sign up for local PostgreSQL support in our little pond.

Magnus went through a few selected PostgreSQL features, meaning that he had the time to explain them properly:
  • DDL operations (such as dropping a table) are transactional in PostgreSQL. Magnus presented this as a rather exclusive feature which few other DBMSes have. Actually, DB2 has the feature, and it's a mixed blessing: Transactions are a tremendously handy and time-saving feature, including transactional DDL. But if DDLs are transactional, it also means that a user with very low privileges can lock the system catalog by performing a DDL and not committing -- meaning that other users (potentially equipped with loads of high permissions) are blocked from completing DDL operations. I assume that PostgreSQL's transactional DDL suffers from the same drawback(?) By the way, Magnus pointed out a serious drawback with performing DDLs in some other DBMSes that don't have transactional DDL: They may carry out an implicit commit when a DDL statement is executed; this leaves potential for rather uncomfortable situations.
    Update, Monday, Oct 6: PostgreSQL doesn't suffer from the problem described above for DB2.
  • PostgreSQL now has built-in full text indexing (FTI), based on a somewhat cleaned up version of "Tsearch2" which used to be an add-on to PostgreSQL. The FTI can be used in a simple way, but you can also configure it in very specific and powerful ways, using language-specific dictionaries and/or specialized parsers and stemmers.
  • Finally, Magnus went through a few of the packages in PostgreSQL's "contrib" add-on. The crypto add-on is something I'd very much like to have in DB2.
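The transactional DDL mentioned in the first bullet is easy to demonstrate. SQLite (used here through Python, purely for convenience) has it too: a CREATE TABLE that is rolled back leaves no trace, whereas a DBMS without transactional DDL would have committed implicitly at the CREATE.

```python
# Transactional DDL in a nutshell: roll back a CREATE TABLE and the table
# never existed as far as the database is concerned. SQLite stands in for
# PostgreSQL/DB2 here for illustration.
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("BEGIN")
conn.execute("CREATE TABLE scratch (x INT)")
conn.execute("ROLLBACK")                                  # undoes the DDL too

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)   # no trace of 'scratch'
```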
After the talks, I went to the SSLUG booth to have a look at the extremely small PC which was on display there. Fascinating stuff. I really like the trend towards down-scaled and cheaper PCs, exemplified also by the EEE (which was everywhere at the conference). At the booth, I had a chat with Phillip S. Bøgh who told me that for a typical desktop PC, 81% of its energy consumption actually happens during production, long before it's sold to the customer. The corollary is that there is value in keeping old hardware alive, instead of buying new equipment whenever some large software company decides to try to force us to buy new products featuring new heights of bloat.

Open Source Days 2008, Friday

For several years, there has been an annual two-day open source conference in Denmark. It has had different names in the past ("Linux 98", "Open Networks 99", "Linuxforum 200{0,2,3,4,5,6,7}"), but nowadays, it's called "Open Source Days".

I've attended the conference almost every year. This year is no exception, although I may miss out on most of the Saturday talks.

Here are my notes from Friday.

OpenID, by Simon Josefsson
Users of authentication-requiring web applications normally have an unfortunate choice: Use the same one or two passwords at all web sites, or store passwords in the local browser or a local password "wallet". The first option is clearly not attractive, because a rogue web site administrator could use your credentials to log in as you on other web sites. The second option is troublesome if you use several PCs, or if your PC is stolen (workstations are often not regularly backed up). OpenID brings a good solution to this dilemma: Create an account at an OpenID provider which you choose to trust (I use myOpenID, currently). Then, you can use that account at all sites supporting OpenID logins (several weblog sites, Plaxo, Stack Overflow, etc). OpenID can also make life easier for web site developers.

Simon Josefsson went through the OpenID protocol, superficially (time was limited). In a comparison with other authentication systems, he noted that OpenID is based on a voluntary trust relationship between website and authenticator, in contrast with--e.g.--SAML. OpenID can only be used in a web context. All in all, OpenID is a rather simple and light-weight protocol.

The main potential security problem with OpenID is phishing, but Simon noted that this is a problem with other systems as well: Even though the system may use non-web-browser password dialogs, such dialogs can be rather closely mimicked using Flash. The most effective solution to the phishing threat is to avoid relying (exclusively) on passwords, through SMS-based one-time codes, one-time code dongles, etc. Simon's company produces an elegant, small USB device which emulates USB keyboards; when you press a button on the device, a long password is emitted. In combination with an encryption system, this results in very secure authentication.

Where I work, we face an identity handling challenge: We need to have the authenticator convey a list of group memberships for an account to the web application. OpenID has deliberately been kept simple, so there is no dedicated solution for that. But Simon noted that OpenID 2 includes an assertion mechanism which can--in principle--be used to communicate any kind of attribute about a user to a web site.
Unfortunately, we can't really use OpenID for the aforementioned challenge, but I would certainly look at OpenID if I were to implement an authentication system elsewhere.

Using Puppet to manage Linux and Mac platforms in an enterprise environment, by Nigel Kersten
Ever since I heard a recent interview with Luke Kanies, I've wanted to know more about Puppet. Luke has an interesting statement about system administrators: Sysadmins need to move to a higher level, by adopting some of the methodology used in software development. This relates to version control, traceability, abstraction, and code reuse. I very much agree on this.

Without having personally tried Puppet yet, I think it's somewhat fair to characterize it as a modern cfengine, and as the unix world's version of the Microsoft world's SMS tool (SMS having better reporting facilities, while Puppet probably has better scripting features). Puppet has gained a lot of attention in Linux and Mac sysadm circles lately. Kersten is part of a team managing more than 10000 Linux and Mac internal workstations at Google.

Puppet is written in the Ruby programming language. So it was reassuring to hear that Nigel Kersten is "a Python guy": Puppet is not just being hyped because it happens to be written in Ruby.

Random notes from the talk: I learned that Puppet can actually work offline: Many rules will work without network dependencies. And it seems that Puppet can be a good way to implement local adjustments to software packages without having to mess around with local re-packaging. Puppet goes out of its way to avoid adjusting configuration files if there is no need (nice: that way, file mtimes don't mysteriously change without file content changes). Unfortunately, it sounds like there are issues to be worked out regarding Puppet installations on workstations where SELinux is in enforcing mode.

Nigel has heard from no one with personal experience of getting Puppet running on AIX. And as we are (for better or for worse) using AIX on the majority of unix installations where I work, I probably can't justify fiddling with Puppet, currently.

By the way: RedMonk has a podcast where Nigel Kersten is interviewed. (RedMonk's podcasts generally have too much random chit-chat for my taste, but this interview is actually good, as far as I remember).

PostgreSQL
During a lunch break, I had a talk with Magnus Hagander at Redpill's exhibition booth. Magnus Hagander is one of the developers of my favorite database system, PostgreSQL. PostgreSQL is generally very conservative/unaggressive by default, in order to be a good citizen on any server installation. But often, the administrator of a PostgreSQL installation actually wants it to be aggressive. I asked Magnus Hagander for the top three PostgreSQL parameters he generally adjusts in PostgreSQL installations. His answer: shared_buffers, work_mem, checkpoint_segments (and effective_cache_size).
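For reference, those parameters live in postgresql.conf. A fragment might look something like the following; the values are purely illustrative (not Magnus' recommendations), so size them for your own hardware:

```ini
# postgresql.conf -- illustrative values only, tune for your own machine
shared_buffers = 512MB          # main database buffer cache
work_mem = 32MB                 # memory per sort/hash operation
checkpoint_segments = 16        # WAL segments between checkpoints
effective_cache_size = 2GB      # planner hint: OS + database cache size
```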

Open Source Virtualization, an Overview, by Kris Buytaert
I've been using Xen virtualization for a while, both at home and at work. And I regularly touch IBM's Advanced POWER Virtualization as well as VMWare ESX. In other words, I'm interested in virtualization, not just from a theoretical perspective.

So I went to Kris Buytaert's talk for an update on the status of open source virtualization technologies. Kris Buytaert went through the history of open source virtualization. He listed three virtualization products which he currently recommends: Xen for servers, VirtualBox for desktops, and Linux-VServer for mass hosting of web-servers. And he mentioned the openQRM solution which can be used for setting up a local cloud, as far as I understood. He had some surprising statements: If you have the choice between full, VT-based virtualization and paravirtualization, then go for paravirtualization, for performance reasons. Live migration is of little practical use (contrary to experiences where I work). It sounded like Kris is somewhat skeptical with regard to KVM; on the other hand, Kris described how Xen has been moving further and further away from the open source world, ever since it was bought by Citrix (Citrix: How can you let this slip away?)

Best practices, by Poul-Henning Kamp
The best practice concept is starting to annoy me. I've often heard obviously stupid solutions being motivated by "but that's best practice!"; the statement is often heard from someone with highly superficial knowledge about the related field. Recently, Mogens Nørgaard had some good comments about the phenomenon in his video column (in Danish only).

In his talk, Poul-Henning was also skeptical about the best practice term. He joked about people asking for operations to be done in a way which is "best practice, or better!". Apart from that, Poul-Henning went through various recommendations for programmers, C-programmers in particular: Do use assertions, and don't compile them away. Use code generation when possible. Print out your code, zoomed out to fit in few pages; surprisingly, that can reveal patterns in your code which you didn't realize. Use lints and other code checkers, and try compiling your code with different compilers on different platforms. Certainly good advice, but the talk left me wondering: How about changing to a programming language with better type safety, instead of all the band-aids? (I believe that Poul-Henning once touched upon this in another context, basically stating that C is the only realistic language for a systems programmer, for various reasons.)

Many people have high regard for Poul-Henning, the coder. At this talk, however, the loudness of the applause was in the guru-admiration league. --Which was a bit out of proportion for the talk, in my opinion.

Wednesday, October 01, 2008

JAOO 2008, Wednesday

Everything is dead and crap -- but here comes DCI
James O. Coplien held a talk called Not your Grandfather's Architecture: Taking Architecture into the Agile World - take 2. The talk was highly unstructured, but entertaining. James declared a lot of concepts to be crap and/or dead, such as runtime polymorphism, class diagrams, Java, and aspect orientation. E.g., class diagrams were declared a waste of time, because the end user doesn't care about classes -- however, object diagrams were somehow very good (do users care about objects?). Much of the criticism was highly appropriate, but unclearly motivated. Next, James described a paradigm which was presented as brand new, and as the way to "save" object orientation: DCI, short for Data, Context, Interaction. The paradigm was illustrated through C++ code examples, and seemed to rely on a special way of combining C++ class templates with multiple inheritance -- mixins, effectively. The goal of the new paradigm is allegedly to increase code readability. However confusingly the concepts were illustrated, I left with a feeling that DCI may actually be something we will see more of.

WeDo
Next, I went to a session about Lego's new WeDo product, presented in a clear and inspiring way by Eik Thyrsted Brandsgård. WeDo is a robotics Lego product, like Lego's Mindstorms NXT. It will be available next year. WeDo has connections to the OLPC project; consequently, Lego has put an effort into making the product inexpensive. WeDo is also targeted at an audience younger than that of NXT, i.e. age 7+. Surprisingly, the presentation contained a lot of idealistic statements - and it was quite convincing: Through worldwide Lego robotics competitions, kids may have a chance to become fans of technically creative peers -- instead of the teen-band of the year (my interpretation). And due to being affordable, WeDo may reach a large number of schools and end-users, triggering creativity. Several components of WeDo are open source. Apart from that, Eik described Lego's development model where they take advantage of having partners world wide: A new idea is spawned in Denmark in the morning and made ready for consumption by software developers in the US. When the Danes go home, the Americans spend the day implementing it. Finally, the implementation is sent to the QA people in India. And the cycle starts over. In other words: Lego isn't just outsourcing for financial reasons. The talk was highly refreshing: Instead of paradigm/architecture name dropping (which some of JAOO's other presentations were full of), this was a talk which actually gave me a strong urge to write some code! (My son will actually turn seven, soon...)

The Lively Kernel
Before Dan Ingalls' talk, The Lively Kernel, I thought that Google's web-based office applications marked the frontier of online web applications. I was wrong. Ingalls performed an hour of utmost sorcery, dragging, dropping, copying and morphing highly live objects around. The Lively Kernel is a collection of JavaScript, using SVG as "canvas" instead of HTML. Think of the Lively Kernel as Flash, only built on open standards and without the plugin requirement (as long as you use a modern browser). Lively Kernel even includes a browser based development environment (profiler included), and it uses WebDAV for versioning. I wasn't sure if/how Lively Kernel applications can be integrated with existing web pages. If they can't be well integrated, they may suffer from the un-webbiness problem which is discussed in an episode of the Stackoverflow podcast series. Looking forward, Ingalls mentioned that online collaboration features are somewhere around the corner. SVG has gained momentum now that most modern browsers support it and Wikipedia has started using it extensively; this could be another move forward. I wonder when Microsoft will wake up and add built-in SVG support in Internet Explorer. By the way: FLOSS Weekly has an interview with Ingalls about the Lively Kernel.

Pattern of boredom
I'm a big fan of domain specific languages (first and foremost: SQL). So I chose to attend Patterns of Internal DSLs by Martin Fowler. External DSLs are like yacc grammars, while internal DSLs are DSLs embedded into the host language. The talk seemed to be aimed at the (many) poor people forced to work with a language lacking expressiveness. Although Fowler did mention how some things can be done rather elegantly in Ruby, most of the talk described various ways of twisting and abusing a Java-like language in order to express data in a way where chatty boilerplate syntax doesn't hide the essence of the data. (This reminds me of yet another episode of the Stackoverflow podcast series where Spolsky distinguishes between languages in which you can easily express tree literals, and those where you can't.) I would recommend that Java programmers simply accept some lack of expressiveness and don't spend any time on the kind of DSL hacks which Fowler presented.
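For contrast, here's the method-chaining flavour of internal DSL in a language where it comes out reasonably cleanly. This is my own toy sketch, not code from Fowler's talk; the trick is simply that each call returns the object itself, so a query reads almost like the SQL it produces:

```python
# A toy internal DSL in the method-chaining style: each builder method
# returns self, so calls chain into something resembling SQL.
# My own sketch for illustration, not an example from Fowler's talk.
class Query:
    def __init__(self, table):
        self._table = table
        self._columns = ["*"]
        self._conditions = []

    def select(self, *columns):
        self._columns = list(columns)
        return self                      # returning self enables chaining

    def where(self, condition):
        self._conditions.append(condition)
        return self

    def sql(self):
        stmt = "SELECT %s FROM %s" % (", ".join(self._columns), self._table)
        if self._conditions:
            stmt += " WHERE " + " AND ".join(self._conditions)
        return stmt

q = Query("users").select("name", "email").where("age > 30").sql()
print(q)   # SELECT name, email FROM users WHERE age > 30
```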

Parallel extensions to .Net
It's commonly heard that CPUs basically don't grow faster any more, so we need to employ several CPU cores if we want more performance. Naturally, this entails that the multiple CPU cores can actually be fed some work -- which can be a serious challenge. In some parts of the IT world, the "parallelization problem" has already been solved: Server-side web applications can easily spread work to several cores, because the work of a web server is inherently concurrent. And SQL databases use a language which is for the most part very declarative, so the better DBMSes are actually able to chunk up work without special effort from the systems developer. But there are other cases where programmers need to implement parallel execution in more or less explicit ways, e.g. in traditional desktop applications, games, and in high-performance computing (HPC). The last JAOO talk I went to was on this subject.

Concurrent Programming with Parallel Extensions to .NET was a suitable name for Joe Duffy's talk. Microsoft is working on making it easy for .Net developers to choose parallel versions of operations and data structures. Joe's talk gave good and comprehensive insight into that. The extensions are not finished yet, but a preview is available from Microsoft's Parallel Computing Development Center. However, even after the hard work from Microsoft's developers, parallel .Net will still be very much in the "sharp knives" category, and Joe strongly urged people to get acquainted with Haskell which he described as the One True North in this space (Duffy has a Haskell logo tattooed on his arm). It's always encouraging to hear a Microsoft employee recommend something in the open source world. Even more encouraging was that Joe told me that he'd heard that parallel extensions for .Net has already been committed for the Mono project.
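To make the "parallel versions of operations" idea concrete, here is the shape of a parallel map, sketched in Python rather than .NET (this is my own illustration of the general pattern, not Microsoft's API):

```python
# The essence of a parallel "map": hand each item of work to a pool of
# workers instead of looping sequentially. Sketched with Python's
# concurrent.futures; for CPU-bound Python work you'd want
# ProcessPoolExecutor instead (threads keep this sketch simple and portable).
from concurrent.futures import ThreadPoolExecutor

def expensive(n):
    # stand-in for real per-item work
    return n * n

items = list(range(10))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(expensive, items))   # input order is preserved

print(results)
```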

Conference over
That was it for JAOO 2008 (although some people were going to stay for a few more days, for Scrum tutorials). After the last talks, there were meet the speakers sessions, but I (and many others) didn't have time for that. Maybe JAOO should arrange meet the speakers sessions every day next year?