• Open Source Databases Comparison
Skip to end of metadata
Go to start of metadata

The following chart will illustrate the comparison between several open source databases. The comparison will cover: PostgresSQL 8.2, MySQL 5, Derby 10.3, and HSQLDB 1.8 in terms of their limitations and strength. The goal of this analysis is to provide enough background information in the process of decision making in choosing a suitable open source database for PRP and which will also be easy to install and probably able to be bundle together with the PRP without extra installation.

Selection Criteria

  • FREE or Open Source
  • Multiple platform support
  • Easy to install and use
  • Can be bundle in code so little or no extra installation is required
  • Can be scalable, i.e. load balancing, size limit, etc.
  • Reliable, i.e. backup, replication, etc.
  • Efficient, i.e. indexing, fast search, etc.

 

PostgreSQL 8.2

MySQL 5.0

Apache Derby 10.3

HSQLDB 1.8

H2

OS support

First windows supported version, only support linux and windows, special compilation requires for other platforms

Supports Windows/Linux/Unix/Mac OSX

Any OS that can run Java

Any OS that can run Java

Any OS that can run Java

Documentation

Full pre and post installation documentations

Extensive pre and post installation documentations

Extensive pre and post installation documentations

Full pre and post installation documentations

Extensive pre and post installation documentations

License

BSD License - Free

Depends - GPL for open source or Commercial for commercial

Apache License v2 - Free

BSD License - Free

Modified MPL - Free

Maximum Database Size

Unlimited

Unlimited

Unlimited

Unlimited (depends on memory)

Unlimited (depends on memory)

Maximum Number of Tables

Unlimited

Unlimited

Unlimited

Unlimited (depends on memory)

Unlimited (depends on memory)

Maximum Table Size

32 TB

64 TB

Unlimited

8 GB

?

Maximum Row Size

1.6 TB

8 kB

Unlimited

Unlimited (depends on memory)

?

Maximum Field Size

1 GB

255 B

2 GB

Unlimited (depends on memory)

?

Maximum Rows per Table

Unlimited

Unlimited (Limited by tablespace of 64TB)

Unlimited

Unlimited (depends on memory)

Unlimited (depends on memory)

Maximum Columns per Table

250 - 1600 depending on column types

1000 columns

1012 columns

Unlimited (depends on memory)

Unlimited (depends on memory)

Maximum Indexes per Table

Unlimited

64 indexes

32767 indexes

Unlimited (depends on memory)

Unlimited (depends on memory)

Indexes

32 fields - B-Tree, R-Tree, GiST, and Hash

16 fields - B-Tree

16 fields

?

?

Interfaces

ODBC, JDBC, C, etc.

ODBC, JDBC, C, etc.

JDBC

JDBC

JDBC

Transactional

ACID-compliant

InnoDB type tables only - ACID-compliant

Yes

?

Yes

Bind Variables

Yes

Yes

Yes

?

?

Stored Procedures

Yes - SQL, PL/Tcl, PL/pgSQL

Yes

Yes

Yes

Yes

Trigger

Yes - SQL, PL/Tcl, PL/pgSQL

Yes

Yes

Yes

Yes

Row Level Locking

Yes

Yes

Yes

?

Yes

Timeout

Yes

No

Yes - deadlock detection and timeout

Yes

Yes

Error Code

Yes

Yes

Yes

Yes

Yes

Date Time

Broad range of date/time formats

Broad range of date/time formats

Broad range of date/time formats

Broad range of date/time formats

Broad range of date/time formats

Scalability/SMP

Each connection uses one CPU, but spread the processes of different connections across available CPUs

SMP is supported by using OS threads. All connections are executed in one thread, queries are not using multiple CPUs

Yes, embedded engine does all work in client threads

Multithread listener, but execute queries one at a time

Load Balancing

Select queries are distributed among the master and slave servers in random manner. Other queries are executed on master and replicated to slaves

MySQL clusters - handle transactions in a round robin manner

N/A

N/A

N/A

Replication

Slony1 replication solution, not core

Master-Slave replication and Chained replication

Master-Slave replication with manual fail-over

N/A

N/A

Online Backup

Write-ahead log and PostgreSQL point-in-time recovery, also Hot Standby System

Online Backup with Mysqldump, InnoDB Hot Backup (commercial add-on), MySQL Clusters commited log

Yes - Online backup utility

Yes - backup manually data files and redo logs

Yes - built-in tool

Cluster

PGCluster

in-memory, shared-nothing, replicatied to 4 replicas. If 1 node fails, currenct transaction fails even all other nodes survive

N/A

N/A

Yes?

Ease of Embedding in Java-based Application

Needs a lot of customizations

Can use MySQL Embedded

Almost no customization requires

Almost no customization requires

Almost no customization requires, and run as server after embedded

Performance comparison
There is a published page about the performance comparison between PostgreSQL, MySQL, Derby, HSQLDB, and H2.
http://www.h2database.com/html/performance.html

History of Derby from Wikipedia
Apache Derby originated at Cloudscape Inc, an Oakland, California start-up founded in 1996 to develop Java database technology. The first release of the database engine, then called JBMS, was in 1997. Subsequently the product was renamed Cloudscape and releases were made about every six months.

In 1999 Informix Software, Inc., acquired Cloudscape, Inc. In 2001 IBM acquired the database assets of Informix Software, including Cloudscape. The database engine was re-branded to IBM Cloudscape and releases continued, mainly focusing on embedded use with IBM's Java products and middleware.

In August 2004 IBM contributed the code to the Apache Software Foundation as Derby, an incubator project sponsored by the Apache DB project. In July 2005 the Derby project graduated from the Apache incubator and is now being developed as a sub-project of the DB Top Level Project at Apache. Prior to Derby's graduation from incubation, Sun joined the Derby project with an intent to use Derby as a component in their own products, and with the release of Java 6 in December 2006, Sun started packaging Derby in the JDK branded as Java DB.

In March 2007 IBM announced that they would withdraw marketing and support for the Cloudscape product, but would continue to contribute to the Apache Derby project.

Conclusion
My own findings and choice of preferences:
1) H2
2) Derby
3) HSQLDB
4) MySQL
5) PostgreSQL
If Java is the programming language of choice for PRP, then H2 or Derby should be the most suitable choice from my opinion since they satisfied most of the selection criteria and also Derby is Java DB which is the standard for Java embedded database. However the only disadvantage of Derby is that under embedded mode it cannot run as server so may cause some problem later on, on the other hand H2 offers the exact functionality and can be run as server even when embedded in application.

  • No labels

10 Comments

  1. Anonymous

    It is not true that Derby cannot run as a server while it's running in embedded mode. See Embedded server example in Derby Server and Administration Guide: http://db.apache.org/derby/docs/10.3/adminguide/radminembeddedserverex.html

    It is also important to point out that Derby is a fully ACID-compliant database, whereas H2 lacks the D (durability). See for instance http://weblogs.java.net/blog/davidvc/archive/2007/05/that_little_iss.html

  2. Anonymous

    I find the performance comparison on the H2 page a bit biased. The total number shows a significant difference between H2 and HSQLDB. If you look closer at the numbers, this is a result of significant difference in performance in only one of the tests. For most tests, the performance is similar.

    With respect to Derby performance, it looks from the numbers as it is done without reusing prepared statements. Derby is not the best at ad-hoc execution of statements, but much more efficient if the application is reusing prepared statements. Also, Derby guarantees durability while H2 does not. There is a performance penalty for that. The performance test is single-threaded. If you need to a database with multiple connections in parallel, you will see that Derby, PostgreSQL and MySQL is more suited than H2.

  3. Anonymous

    ACID-compliance: The problem is a bit more complex than. Apache Derby claims to be ACID compliant, but is in fact not. Apache Derby does not call FileDescriptor.sync() on each commit. See also http://www.h2database.com/html/advanced.html#durability_problems

    > I find the performance comparison on the H2 page a bit biased.
    If there is a problem with the algorithm or the test setup, please tell me! Apache Derby did get faster in the last release, the numbers will be updated in the next two weeks or so.

    > It looks from the numbers as it is done without reusing prepared statements
    No, PreparedStatement are use. See also the benchmark source code, and http://www.h2database.com/html/performance.html ("Wherever possible, the test cases use prepared statements.")

    > There is a performance penalty for that
    No, not the way Apache Derby does 'try to be' ACID compliant (it uses RandomAccessFile(.., "rws"). Apache Derby is slow for other reasons, mainly architecture.

    > The performance test is single-threaded.
    Some benchmarks yes, but not all (BenchB). In most situations, embedded databases will not only be accessed by one application, so it does make sense to mainly test with one connection.

    > multiple connections in parallel
    It is true that H2 does not yet support fine grained locking / MVCC like the others. But MVCC is already partially implemented (see docs), and it doesn't look like this will impact performance a lot (maybe 10% or so).

    Thomas (maintainer of H2)

  4. Anonymous

    About the missing data in the matrix:

    H2 Maximum Database Size: 256 GB for the data, excluding LOBs. (http://www.h2database.com/html/faq.html#size_limit)
    (does not depends on the memory size)

    H2 Maximum Number of Tables: correct (it does currently depend on the memory size, and unfortunately this was already a problem for somebody who tried to create 'tens of thousands of tables')

    H2 Maximum Table Size: 256 GB, excluding LOBs.

    H2 Maximum Row Size: 256 GB excluding LOBs.

    H2 Interfaces: JDBC; others (ODBC, C, etc.) via PostgreSQL compatibility

    H2 and HSQLDB Bind Variables: Yes

    H2 and HSQLDB Row Level Locking: No (H2: table level locking, experimental MVCC)

    H2 Scalability/SMP: Multithread listener, but execute queries one at a time by default (multi-threaded is not default yet)

    Apache Derby Replication: Yes? (Experimental?)

    H2 Cluster: Yes

  5. One huge factor to take into consideration when choosing a database is the ease of finding people with expertise in that system. I think this is especially important for educators, who are not necessarily programmers/database specialists themselves, and must rely on others to do the dirty work. Whatever the advantages of the more obscure systems like H2, these must be weighted against the availability of people who are able to develop, administer, and support these databases.

    My suggestion would be that if you are unsure about your ability to call upon people able to resolve your technical problems, try to stick to the most commonly used, generally accepted standards (in the case of databases, this being MySQL, and maybe HSQLDB/Derby in the Java world). What you sacrifice in performance or development integration, you gain in your ability to keep your application running down the line.

  6. Anonymous

    Any info available on Max Connections allowed for each DBS?

    Thanks!

    Bill

    wep3@bellsouth.net

  7. Anonymous

    Apache Derby's performance is incredibly bad when compared to H2. I wouldn't even consider derby.

    1. Anonymous

  8. Anonymous

    One annoying thing in Derby, it Does not support Boolean Data Type, while being more advanced than H2. I know some would debate on using Integer with 0's and 1's, but in doing that are'nt you increasing the code size.

    I also had difficulty in Starting the Server with Specified Store Directory in HSQLDB while being embedded in the Application.

  9. Anonymous

    Good chart. Some updates that might change your mind regarding the order of your recommendations. Sun purchases MySQL, Michael Widenius, the original creator of the MySQL left the project over differences and started "Maria". Now Oracle has bought MySQL. PostgreSQL is looking pretty good now.