Open Source Databases Comparison

The following chart will illustrate the comparison between several open source databases. The comparison will cover: PostgresSQL 8.2, MySQL 5, Derby 10.3, and HSQLDB 1.8 in terms of their limitations and strength. The goal of this analysis is to provide enough background information in the process of decision making in choosing a suitable open source database for PRP and which will also be easy to install and probably able to be bundle together with the PRP without extra installation.

Selection Criteria

  • FREE or Open Source
  • Multiple platform support
  • Easy to install and use
  • Can be bundle in code so little or no extra installation is required
  • Can be scalable, i.e. load balancing, size limit, etc.
  • Reliable, i.e. backup, replication, etc.
  • Efficient, i.e. indexing, fast search, etc.
  PostgreSQL 8.2 MySQL 5.0 Apache Derby 10.3 HSQLDB 1.8 H2
OS support First windows supported version, only support linux and windows, special compilation requires for other platforms Supports Windows/Linux/Unix/Mac OSX Any OS that can run Java Any OS that can run Java Any OS that can run Java
Documentation Full pre and post installation documentations Extensive pre and post installation documentations Extensive pre and post installation documentations Full pre and post installation documentations Extensive pre and post installation documentations
License BSD License - Free Depends - GPL for open source or Commercial for commercial Apache License v2 - Free BSD License - Free Modified MPL - Free
Maximum Database Size Unlimited Unlimited Unlimited Unlimited (depends on memory) Unlimited (depends on memory)
Maximum Number of Tables Unlimited Unlimited Unlimited Unlimited (depends on memory) Unlimited (depends on memory)
Maximum Table Size 32 TB 64 TB Unlimited 8 GB ?
Maximum Row Size 1.6 TB 8 kB Unlimited Unlimited (depends on memory) ?
Maximum Field Size 1 GB 255 B 2 GB Unlimited (depends on memory) ?
Maximum Rows per Table Unlimited Unlimited (Limited by tablespace of 64TB) Unlimited Unlimited (depends on memory) Unlimited (depends on memory)
Maximum Columns per Table 250 - 1600 depending on column types 1000 columns 1012 columns Unlimited (depends on memory) Unlimited (depends on memory)
Maximum Indexes per Table Unlimited 64 indexes 32767 indexes Unlimited (depends on memory) Unlimited (depends on memory)
Indexes 32 fields - B-Tree, R-Tree, GiST, and Hash 16 fields - B-Tree 16 fields ? ?
Interfaces ODBC, JDBC, C, etc. ODBC, JDBC, C, etc. JDBC JDBC JDBC
Transactional ACID-compliant InnoDB type tables only - ACID-compliant Yes ? Yes
Bind Variables Yes Yes Yes ? ?
Stored Procedures Yes - SQL, PL/Tcl, PL/pgSQL Yes Yes Yes Yes
Trigger Yes - SQL, PL/Tcl, PL/pgSQL Yes Yes Yes Yes
Row Level Locking Yes Yes Yes ? Yes
Timeout Yes No Yes - deadlock detection and timeout Yes Yes
Error Code Yes Yes Yes Yes Yes
Date Time Broad range of date/time formats Broad range of date/time formats Broad range of date/time formats Broad range of date/time formats Broad range of date/time formats
Scalability/SMP Each connection uses one CPU, but spread the processes of different connections across available CPUs SMP is supported by using OS threads. All connections are executed in one thread, queries are not using multiple CPUs Yes, embedded engine does all work in client threads
Multithread listener, but execute queries one at a time
Load Balancing Select queries are distributed among the master and slave servers in random manner. Other queries are executed on master and replicated to slaves MySQL clusters - handle transactions in a round robin manner N/A N/A N/A
Replication Slony1 replication solution, not core Master-Slave replication and Chained replication Master-Slave replication with manual fail-over N/A N/A
Online Backup Write-ahead log and PostgreSQL point-in-time recovery, also Hot Standby System Online Backup with Mysqldump, InnoDB Hot Backup (commercial add-on), MySQL Clusters commited log Yes - Online backup utility Yes - backup manually data files and redo logs Yes - built-in tool
Cluster PGCluster in-memory, shared-nothing, replicatied to 4 replicas. If 1 node fails, currenct transaction fails even all other nodes survive N/A N/A Yes?
Ease of Embedding in Java-based Application Needs a lot of customizations Can use MySQL Embedded Almost no customization requires Almost no customization requires Almost no customization requires, and run as server after embedded

Performance comparison
There is a published page about the performance comparison between PostgreSQL, MySQL, Derby, HSQLDB, and H2.
http://www.h2database.com/html/performance.html

History of Derby from Wikipedia
Apache Derby originated at Cloudscape Inc, an Oakland, California start-up founded in 1996 to develop Java database technology. The first release of the database engine, then called JBMS, was in 1997. Subsequently the product was renamed Cloudscape and releases were made about every six months.

In 1999 Informix Software, Inc., acquired Cloudscape, Inc. In 2001 IBM acquired the database assets of Informix Software, including Cloudscape. The database engine was re-branded to IBM Cloudscape and releases continued, mainly focusing on embedded use with IBM's Java products and middleware.

In August 2004 IBM contributed the code to the Apache Software Foundation as Derby, an incubator project sponsored by the Apache DB project. In July 2005 the Derby project graduated from the Apache incubator and is now being developed as a sub-project of the DB Top Level Project at Apache. Prior to Derby's graduation from incubation, Sun joined the Derby project with an intent to use Derby as a component in their own products, and with the release of Java 6 in December 2006, Sun started packaging Derby in the JDK branded as Java DB.

In March 2007 IBM announced that they would withdraw marketing and support for the Cloudscape product, but would continue to contribute to the Apache Derby project.

Conclusion
My own findings and choice of preferences:
1) H2
2) Derby
3) HSQLDB
4) MySQL
5) PostgreSQL
If Java is the programming language of choice for PRP, then H2 or Derby should be the most suitable choice from my opinion since they satisfied most of the selection criteria and also Derby is Java DB which is the standard for Java embedded database. However the only disadvantage of Derby is that under embedded mode it cannot run as server so may cause some problem later on, on the other hand H2 offers the exact functionality and can be run as server even when embedded in application.

Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.
  1. Sep 21, 2007

    Anonymous says:

    It is not true that Derby cannot run as a server while it's running in embedded ...

    It is not true that Derby cannot run as a server while it's running in embedded mode. See Embedded server example in Derby Server and Administration Guide: http://db.apache.org/derby/docs/10.3/adminguide/radminembeddedserverex.html

    It is also important to point out that Derby is a fully ACID-compliant database, whereas H2 lacks the D (durability). See for instance http://weblogs.java.net/blog/davidvc/archive/2007/05/that_little_iss.html

  2. Sep 21, 2007

    Anonymous says:

    I find the performance comparison on the H2 page a bit biased. The total number...

    I find the performance comparison on the H2 page a bit biased. The total number shows a significant difference between H2 and HSQLDB. If you look closer at the numbers, this is a result of significant difference in performance in only one of the tests. For most tests, the performance is similar.

    With respect to Derby performance, it looks from the numbers as it is done without reusing prepared statements. Derby is not the best at ad-hoc execution of statements, but much more efficient if the application is reusing prepared statements. Also, Derby guarantees durability while H2 does not. There is a performance penalty for that. The performance test is single-threaded. If you need to a database with multiple connections in parallel, you will see that Derby, PostgreSQL and MySQL is more suited than H2.

  3. Sep 23, 2007

    Anonymous says:

    ACID-compliance: The problem is a bit more complex than. Apache Derby claims to ...

    ACID-compliance: The problem is a bit more complex than. Apache Derby claims to be ACID compliant, but is in fact not. Apache Derby does not call FileDescriptor.sync() on each commit. See also http://www.h2database.com/html/advanced.html#durability_problems

    > I find the performance comparison on the H2 page a bit biased.
    If there is a problem with the algorithm or the test setup, please tell me! Apache Derby did get faster in the last release, the numbers will be updated in the next two weeks or so.

    > It looks from the numbers as it is done without reusing prepared statements
    No, PreparedStatement are use. See also the benchmark source code, and http://www.h2database.com/html/performance.html ("Wherever possible, the test cases use prepared statements.")

    > There is a performance penalty for that
    No, not the way Apache Derby does 'try to be' ACID compliant (it uses RandomAccessFile(.., "rws"). Apache Derby is slow for other reasons, mainly architecture.

    > The performance test is single-threaded.
    Some benchmarks yes, but not all (BenchB). In most situations, embedded databases will not only be accessed by one application, so it does make sense to mainly test with one connection.

    > multiple connections in parallel
    It is true that H2 does not yet support fine grained locking / MVCC like the others. But MVCC is already partially implemented (see docs), and it doesn't look like this will impact performance a lot (maybe 10% or so).

    Thomas (maintainer of H2)

  4. Sep 23, 2007

    Anonymous says:

    About the missing data in the matrix: H2 Maximum Database Size: 256 GB for the ...

    About the missing data in the matrix:

    H2 Maximum Database Size: 256 GB for the data, excluding LOBs. (http://www.h2database.com/html/faq.html#size_limit)
    (does not depends on the memory size)

    H2 Maximum Number of Tables: correct (it does currently depend on the memory size, and unfortunately this was already a problem for somebody who tried to create 'tens of thousands of tables')

    H2 Maximum Table Size: 256 GB, excluding LOBs.

    H2 Maximum Row Size: 256 GB excluding LOBs.

    H2 Interfaces: JDBC; others (ODBC, C, etc.) via PostgreSQL compatibility

    H2 and HSQLDB Bind Variables: Yes

    H2 and HSQLDB Row Level Locking: No (H2: table level locking, experimental MVCC)

    H2 Scalability/SMP: Multithread listener, but execute queries one at a time by default (multi-threaded is not default yet)

    Apache Derby Replication: Yes? (Experimental?)

    H2 Cluster: Yes

  5. Dec 12, 2007

    Anonymous says:

    Can you help me please, when can I find scripts for perfomance tests?

    Can you help me please, when can I find scripts for perfomance tests?

  6. Dec 13, 2007

    Matt Zukowski says:

    One huge factor to take into consideration when choosing a database is the ease ...

    One huge factor to take into consideration when choosing a database is the ease of finding people with expertise in that system. I think this is especially important for educators, who are not necessarily programmers/database specialists themselves, and must rely on others to do the dirty work. Whatever the advantages of the more obscure systems like H2, these must be weighted against the availability of people who are able to develop, administer, and support these databases.

    My suggestion would be that if you are unsure about your ability to call upon people able to resolve your technical problems, try to stick to the most commonly used, generally accepted standards (in the case of databases, this being MySQL, and maybe HSQLDB/Derby in the Java world). What you sacrifice in performance or development integration, you gain in your ability to keep your application running down the line.

  7. Jan 22, 2008

    Anonymous says:

    Any info available on Max Connections allowed for each DBS? Thanks! Bill ...

    Any info available on Max Connections allowed for each DBS?

    Thanks!

    Bill

    wep3@bellsouth.net

  8. Feb 02, 2008

    Anonymous says:

    Apache Derby's performance is incredibly bad when compared to H2. I wouldn't eve...

    Apache Derby's performance is incredibly bad when compared to H2. I wouldn't even consider derby.

  9. Mar 19, 2008

    Anonymous says:

    Are you talking about Apache Derby 10.3? http://jazoon.com/download/presentation...

    Are you talking about Apache Derby 10.3?
    http://jazoon.com/download/presentations/860.pdf

  10. Aug 29, 2008

    Anonymous says:

    One annoying thing in Derby, it Does not support Boolean Data Type, while being ...

    One annoying thing in Derby, it Does not support Boolean Data Type, while being more advanced than H2. I know some would debate on using Integer with 0's and 1's, but in doing that are'nt you increasing the code size.

    I also had difficulty in Starting the Server with Specified Store Directory in HSQLDB while being embedded in the Application.

  11. Feb 19

    Anonymous says:

    How are you. Greatness is more than potential. It is the execution of that poten...

    How are you. Greatness is more than potential. It is the execution of that potential. Beyond the raw talent. You need the appropriate training. You need the discipline. You need the inspiration. You need the drive. Help me! Can not find sites on the: Sustiva 2008. I found only this - <a href="http://www.time-to-move.net/Members/Sustiva">sustiva side effects</a>. Sustiva, thornhill, a addition with foley hoag, does 13 possible union and withdrawl research cuts fees in his organization. Hiv " to be video spent intellectually from collecting healthmaps; hiv " but end control angriber, sustiva. Best regards :cool:, Alfonso from Estonia.

  12. May 08

    Anonymous says:

    Good chart. Some updates that might change your mind regarding the order of your...

    Good chart. Some updates that might change your mind regarding the order of your recommendations. Sun purchases MySQL, Michael Widenius, the original creator of the MySQL left the project over differences and started "Maria". Now Oracle has bought MySQL. PostgreSQL is looking pretty good now.