The following chart will illustrate the comparison between several open source databases. The comparison will cover: PostgresSQL 8.2, MySQL 5, Derby 10.3, and HSQLDB 1.8 in terms of their limitations and strength. The goal of this analysis is to provide enough background information in the process of decision making in choosing a suitable open source database for PRP and which will also be easy to install and probably able to be bundle together with the PRP without extra installation.
Selection Criteria
- FREE or Open Source
- Multiple platform support
- Easy to install and use
- Can be bundle in code so little or no extra installation is required
- Can be scalable, i.e. load balancing, size limit, etc.
- Reliable, i.e. backup, replication, etc.
- Efficient, i.e. indexing, fast search, etc.
|
PostgreSQL 8.2 |
MySQL 5.0 |
Apache Derby 10.3 |
HSQLDB 1.8 |
H2 |
|---|---|---|---|---|---|
OS support |
First windows supported version, only support linux and windows, special compilation requires for other platforms |
Supports Windows/Linux/Unix/Mac OSX |
Any OS that can run Java |
Any OS that can run Java |
Any OS that can run Java |
Documentation |
Full pre and post installation documentations |
Extensive pre and post installation documentations |
Extensive pre and post installation documentations |
Full pre and post installation documentations |
Extensive pre and post installation documentations |
License |
BSD License - Free |
Depends - GPL for open source or Commercial for commercial |
Apache License v2 - Free |
BSD License - Free |
Modified MPL - Free |
Maximum Database Size |
Unlimited |
Unlimited |
Unlimited |
Unlimited (depends on memory) |
Unlimited (depends on memory) |
Maximum Number of Tables |
Unlimited |
Unlimited |
Unlimited |
Unlimited (depends on memory) |
Unlimited (depends on memory) |
Maximum Table Size |
32 TB |
64 TB |
Unlimited |
8 GB |
? |
Maximum Row Size |
1.6 TB |
8 kB |
Unlimited |
Unlimited (depends on memory) |
? |
Maximum Field Size |
1 GB |
255 B |
2 GB |
Unlimited (depends on memory) |
? |
Maximum Rows per Table |
Unlimited |
Unlimited (Limited by tablespace of 64TB) |
Unlimited |
Unlimited (depends on memory) |
Unlimited (depends on memory) |
Maximum Columns per Table |
250 - 1600 depending on column types |
1000 columns |
1012 columns |
Unlimited (depends on memory) |
Unlimited (depends on memory) |
Maximum Indexes per Table |
Unlimited |
64 indexes |
32767 indexes |
Unlimited (depends on memory) |
Unlimited (depends on memory) |
Indexes |
32 fields - B-Tree, R-Tree, GiST, and Hash |
16 fields - B-Tree |
16 fields |
? |
? |
Interfaces |
ODBC, JDBC, C, etc. |
ODBC, JDBC, C, etc. |
JDBC |
JDBC |
JDBC |
Transactional |
ACID-compliant |
InnoDB type tables only - ACID-compliant |
Yes |
? |
Yes |
Bind Variables |
Yes |
Yes |
Yes |
? |
? |
Stored Procedures |
Yes - SQL, PL/Tcl, PL/pgSQL |
Yes |
Yes |
Yes |
Yes |
Trigger |
Yes - SQL, PL/Tcl, PL/pgSQL |
Yes |
Yes |
Yes |
Yes |
Row Level Locking |
Yes |
Yes |
Yes |
? |
Yes |
Timeout |
Yes |
No |
Yes - deadlock detection and timeout |
Yes |
Yes |
Error Code |
Yes |
Yes |
Yes |
Yes |
Yes |
Date Time |
Broad range of date/time formats |
Broad range of date/time formats |
Broad range of date/time formats |
Broad range of date/time formats |
Broad range of date/time formats |
Scalability/SMP |
Each connection uses one CPU, but spread the processes of different connections across available CPUs |
SMP is supported by using OS threads. All connections are executed in one thread, queries are not using multiple CPUs |
Yes, embedded engine does all work in client threads |
Multithread listener, but execute queries one at a time |
|
Load Balancing |
Select queries are distributed among the master and slave servers in random manner. Other queries are executed on master and replicated to slaves |
MySQL clusters - handle transactions in a round robin manner |
N/A |
N/A |
N/A |
Replication |
Slony1 replication solution, not core |
Master-Slave replication and Chained replication |
Master-Slave replication with manual fail-over |
N/A |
N/A |
Online Backup |
Write-ahead log and PostgreSQL point-in-time recovery, also Hot Standby System |
Online Backup with Mysqldump, InnoDB Hot Backup (commercial add-on), MySQL Clusters commited log |
Yes - Online backup utility |
Yes - backup manually data files and redo logs |
Yes - built-in tool |
Cluster |
PGCluster |
in-memory, shared-nothing, replicatied to 4 replicas. If 1 node fails, currenct transaction fails even all other nodes survive |
N/A |
N/A |
Yes? |
Ease of Embedding in Java-based Application |
Needs a lot of customizations |
Can use MySQL Embedded |
Almost no customization requires |
Almost no customization requires |
Almost no customization requires, and run as server after embedded |
Performance comparison
There is a published page about the performance comparison between PostgreSQL, MySQL, Derby, HSQLDB, and H2.
http://www.h2database.com/html/performance.html
History of Derby from Wikipedia
Apache Derby originated at Cloudscape Inc, an Oakland, California start-up founded in 1996 to develop Java database technology. The first release of the database engine, then called JBMS, was in 1997. Subsequently the product was renamed Cloudscape and releases were made about every six months.
In 1999 Informix Software, Inc., acquired Cloudscape, Inc. In 2001 IBM acquired the database assets of Informix Software, including Cloudscape. The database engine was re-branded to IBM Cloudscape and releases continued, mainly focusing on embedded use with IBM's Java products and middleware.
In August 2004 IBM contributed the code to the Apache Software Foundation as Derby, an incubator project sponsored by the Apache DB project. In July 2005 the Derby project graduated from the Apache incubator and is now being developed as a sub-project of the DB Top Level Project at Apache. Prior to Derby's graduation from incubation, Sun joined the Derby project with an intent to use Derby as a component in their own products, and with the release of Java 6 in December 2006, Sun started packaging Derby in the JDK branded as Java DB.
In March 2007 IBM announced that they would withdraw marketing and support for the Cloudscape product, but would continue to contribute to the Apache Derby project.
Conclusion
My own findings and choice of preferences:
1) H2
2) Derby
3) HSQLDB
4) MySQL
5) PostgreSQL
If Java is the programming language of choice for PRP, then H2 or Derby should be the most suitable choice from my opinion since they satisfied most of the selection criteria and also Derby is Java DB which is the standard for Java embedded database. However the only disadvantage of Derby is that under embedded mode it cannot run as server so may cause some problem later on, on the other hand H2 offers the exact functionality and can be run as server even when embedded in application.

10 Comments
Hide/Show CommentsSep 21, 2007
Anonymous
It is not true that Derby cannot run as a server while it's running in embedded mode. See Embedded server example in Derby Server and Administration Guide: http://db.apache.org/derby/docs/10.3/adminguide/radminembeddedserverex.html
It is also important to point out that Derby is a fully ACID-compliant database, whereas H2 lacks the D (durability). See for instance http://weblogs.java.net/blog/davidvc/archive/2007/05/that_little_iss.html
Sep 21, 2007
Anonymous
I find the performance comparison on the H2 page a bit biased. The total number shows a significant difference between H2 and HSQLDB. If you look closer at the numbers, this is a result of significant difference in performance in only one of the tests. For most tests, the performance is similar.
With respect to Derby performance, it looks from the numbers as it is done without reusing prepared statements. Derby is not the best at ad-hoc execution of statements, but much more efficient if the application is reusing prepared statements. Also, Derby guarantees durability while H2 does not. There is a performance penalty for that. The performance test is single-threaded. If you need to a database with multiple connections in parallel, you will see that Derby, PostgreSQL and MySQL is more suited than H2.
Sep 23, 2007
Anonymous
ACID-compliance: The problem is a bit more complex than. Apache Derby claims to be ACID compliant, but is in fact not. Apache Derby does not call FileDescriptor.sync() on each commit. See also http://www.h2database.com/html/advanced.html#durability_problems
> I find the performance comparison on the H2 page a bit biased.
If there is a problem with the algorithm or the test setup, please tell me! Apache Derby did get faster in the last release, the numbers will be updated in the next two weeks or so.
> It looks from the numbers as it is done without reusing prepared statements
No, PreparedStatement are use. See also the benchmark source code, and http://www.h2database.com/html/performance.html ("Wherever possible, the test cases use prepared statements.")
> There is a performance penalty for that
No, not the way Apache Derby does 'try to be' ACID compliant (it uses RandomAccessFile(.., "rws"). Apache Derby is slow for other reasons, mainly architecture.
> The performance test is single-threaded.
Some benchmarks yes, but not all (BenchB). In most situations, embedded databases will not only be accessed by one application, so it does make sense to mainly test with one connection.
> multiple connections in parallel
It is true that H2 does not yet support fine grained locking / MVCC like the others. But MVCC is already partially implemented (see docs), and it doesn't look like this will impact performance a lot (maybe 10% or so).
Thomas (maintainer of H2)
Sep 23, 2007
Anonymous
About the missing data in the matrix:
H2 Maximum Database Size: 256 GB for the data, excluding LOBs. (http://www.h2database.com/html/faq.html#size_limit)
(does not depends on the memory size)
H2 Maximum Number of Tables: correct (it does currently depend on the memory size, and unfortunately this was already a problem for somebody who tried to create 'tens of thousands of tables')
H2 Maximum Table Size: 256 GB, excluding LOBs.
H2 Maximum Row Size: 256 GB excluding LOBs.
H2 Interfaces: JDBC; others (ODBC, C, etc.) via PostgreSQL compatibility
H2 and HSQLDB Bind Variables: Yes
H2 and HSQLDB Row Level Locking: No (H2: table level locking, experimental MVCC)
H2 Scalability/SMP: Multithread listener, but execute queries one at a time by default (multi-threaded is not default yet)
Apache Derby Replication: Yes? (Experimental?)
H2 Cluster: Yes
Dec 13, 2007
Matt Zukowski
One huge factor to take into consideration when choosing a database is the ease of finding people with expertise in that system. I think this is especially important for educators, who are not necessarily programmers/database specialists themselves, and must rely on others to do the dirty work. Whatever the advantages of the more obscure systems like H2, these must be weighted against the availability of people who are able to develop, administer, and support these databases.
My suggestion would be that if you are unsure about your ability to call upon people able to resolve your technical problems, try to stick to the most commonly used, generally accepted standards (in the case of databases, this being MySQL, and maybe HSQLDB/Derby in the Java world). What you sacrifice in performance or development integration, you gain in your ability to keep your application running down the line.
Jan 22, 2008
Anonymous
Any info available on Max Connections allowed for each DBS?
Thanks!
Bill
wep3@bellsouth.net
Feb 02, 2008
Anonymous
Apache Derby's performance is incredibly bad when compared to H2. I wouldn't even consider derby.
Mar 19, 2008
Anonymous
Are you talking about Apache Derby 10.3?
http://jazoon.com/download/presentations/860.pdf
Aug 29, 2008
Anonymous
One annoying thing in Derby, it Does not support Boolean Data Type, while being more advanced than H2. I know some would debate on using Integer with 0's and 1's, but in doing that are'nt you increasing the code size.
I also had difficulty in Starting the Server with Specified Store Directory in HSQLDB while being embedded in the Application.
May 08, 2010
Anonymous
Good chart. Some updates that might change your mind regarding the order of your recommendations. Sun purchases MySQL, Michael Widenius, the original creator of the MySQL left the project over differences and started "Maria". Now Oracle has bought MySQL. PostgreSQL is looking pretty good now.