public virtual MemoryStream: On Object/Relational Mapping Tools

Monday, October 6, 2003 1:42 AM

On Object/Relational Mapping Tools

Object/Relational mapping tools sound wonderful in theory. You get a type-safe object-view to your database, there's no more need to do any kind of manual SQL coding, database changes typically only need changes in one place in your code-base, sometimes the O/R tool performs a certain level of caching for you, etc. etc.

Unfortunately in practice things aren't always quite that wonderful. The main problem is that, because you don't write the SQL yourself, you typically have very little control over the SQL that is used to communicate with your database. Some SQL may be impossible to generate - try to write a CMP finder for example that returns rows from more than one table/object, or try to update more than one record at a time. And for actions that are possible, there is oftentimes no guarantee as to how many SQL statements are executed to get the desired result. For instance, if you have a one-to-many relationship between COMPANY and ORDER, and you retrieve all COMPANY objects that adhere to certain criteria, and then get all their ORDERS, many O/R mapping tools will end up executing N+1 SQL queries: 1 to return the N COMPANY objects, and N more for their associated ORDER objects. This is a totally unscalable approach of course.

To make things worse, because this happens behind the scenes, you oftentimes don't even find out about it until you start working with large datasets and your system begins to slow down to a crawl. Once you get there, how do you know what code caused the sudden drop in performance? Running a profiler on your application is generally a good idea, but when using a O/R mapping tool, 9 out of 10 times the performance hit will have been caused by too much SQL and a good JDBC logger can be used to pinpoint the problem.

Now many JDBC drivers have an option to perform logging themselves. Unfortunately they oftentimes spit out way more information than you need and finding the repetitive SQL statements becomes like searching for a needle in a haystack.

I recommend instead using an open source tool called p6spy for this purpose. p6spy is JDBC logger that sits in between your application and the real JDBC driver. It intercepts all calls to the real JDBC driver this way and subsequently logs a single line per executed SQL statement. It is easy to install and highly configurable through a properties file - allowing the user to for instance only log certain actions, or log a stack trace along with every SQL statement log to easily trace the SQL back to the code that caused it.

Using a tool like p6spy it becomes possible to catch the N+1 problem even in small datasets where it hasn't yet become a performance problem. You can then try to tweak the O/R mapping tool settings or your interaction with the tool and hopefully change things or the better. I highly recommend it.

TrackBack URL for this entry: http://www.hutteman.com/scgi-bin/mt/mt-tb.cgi/93

Comments

JBDC logging
Luke Hutteman writes about p6spy which "is [a] JDBC logger that sits in between your application and the real JDBC driver. It intercepts all calls to the real JDBC driver this way and subsequently logs a single line per executed

Trackback from Random Stuff at October 6, 2003 4:06 AM

I use hibernate for O/R mapping here's how I deal with this issue:

With properly configured mapping xml files Hibernate does not generate N+1 SQL for one to many relationships (using the Oracle dialect + outer joins enabled anyway). When things aren't properly configured I just turn SQL logging on.

Posted by Chad Johnson at October 6, 2003 1:54 PM

I claim that the benefits of explicit mapping exceed those of automatic O/R mapping by far. There's more to code in the beginning, that's pretty much all that speaks for O/R. I've wasted 1 1/2 years on an O/R mapping infrastructure that did everything from clever data retrieval to smartt caching and we always came back to the simple fact that "just code the damn thing" yields far superior, more manageable and maintainable results.

Posted by Clemens Vasters at October 6, 2003 4:29 PM

Have you tried one of the modern ORM tools like Toplink or Hibernate?

Posted by Christian Bauer at October 7, 2003 5:43 AM

I used TopLink extensively at a previous job. It was definately more advanced than what a standard CMP solution will give you, and can be used to work around most of the ORM pitfalls. At the same time though, the more advanced features weren't always bug-free back then and support was rather poor. We would run into clustering problems for instance and would have to wait 3 days for the initial response to our support request. A lot more time would pass betweent that initial request and the final resolution of the problem, if one ever came :-(

Now this was over two years ago so things may have improved since then, but based on that experience I'd probably rather try a free open-source solution like hibernate than an expensive close-source product like TopLink.

Posted by Luke Hutteman at October 7, 2003 1:56 PM

This discussion has been closed. If you wish to contact me about this post, you can do so by email.