Storing Objects in a Database

Source: comp.database.object, comp.object
Date: 28-Jul-97

I'm working in a project where we need to do a mapping between our object model and a traditional relational/navigational database. We need to be able to connect to a various different database types, like Btrieve, Informix, DB2, via ODBC and so on. I.e. we need total independence of the database.

Have anyone done this before and have any clues? Right now we are considering writing the layers by ourselves, but it takes time and time cost.

Robert C. Martin explained the problems:

The problem actually has very little to do with OO and RDB. The problem has to do with OO and DB. That is, the problem exists with any kind of database, including OO databases.

The reason is that applications are dynamic, and databases are static. There is a paradigm boundary that must be crossed when communicating between these two domains.

Consider. A database, any type of database, is a repository of information. An application, on the other hand, is a manipulator of information. One is static and the other is dynamic. The database is a representation of information, and the application is a representation of behavior. Two different paradigms.

In the case of the applications, our goal is to encapsulate the data and expose the behavior. In the case of the database, our goal is to expose the data and remove the behavior altogether. This is the boundary that is difficult to cross.

Now you might think that in an OO database, the object in the database possess complex behavior and so the database can encapsulate data and expose behavior just like an application. However there are forces that prevent this from being an effective strategy.

The database is typically depended upon by many different applications. The applications are not depended upon by the database. Thus the applications are free to change without affecting the database, but the database cannot change without affecting all the applications that use it. This means that the database is extremely stable (i.e. difficult to change) and the applications are relatively instable (easier to change).

This difference in stability is the crux of the problem. If the behaviors used by the applications are part of the database, then those behaviors are as difficult to change as the database. i.e. when you change the behavior of an object in the database you affect all the other applications that depend upon the database. The more applications that use the database the more intolerable this becomes.

Thus, the designers of the database are driven to export the behavior from the database and into the applications, thus reinforcing the paradigm crossing.

Why is the paradigm crossing difficult? There are two reasons. First the modeling needs are reversed. We must cross from a domain in which data is hidden to a domain in which data is exposed. This by itself would be enough to make matters difficult. However, there is another more significant factor.

Databases are difficult to change, and yet we need to change them often! Sometimes we simply need to change the schema. Other times we need to rip out a DBMS and put in a new DBMS. e.g. swapping Sybase for Oracle or something like that.

In order to allow that to happen, we have to protect the applications from the database. We have to build a layer in each application that hides the details of the database from the rest of the application. This layer is what bridges the paradigm boundary and protects the application from changes to the database.

The layer depends upon the database, and depends upon the application. The application does not depend upon the layer. (For details see Chapter 4 of my book: Designing Object Oriented C++ Applications using the Booch Method.) The main idea is to use the Proxy pattern, or some other design pattern that inverts the dependencies between the application and the layer.

This layer is horribly dependent. When the schema changes, the layer changes too. When the application changes, the layer changes too. The layer is a focus for maintenance. It breeds nightmares.

[...]

BTW, the difficulty in crossing this layer is not simply a result of using OO. Any well designed application in any paradigm has the same problems. Encapsulation has been a principle of software design for much longer than OO has been popular. In any application where behavior is exposed and data is encapsulated, the application/database problem arises. OO heightens the problem somewhat because it enforces the encapsulation to the point where it is difficult to break the rules, but it is not the cause of the problem. The cause of the problem is the desire for well designed applications.

Joshua Duhl suggested some commercial solutions:

My first bit of advice is don't build it yourself -- it will be very costly to build and to maintain. For the money that you'd put into building it yourself you can buy something that will get you a lot farther, with much less, but not necessarily no, problems. That being said, there are a bunch of products on the market which may help you get where you want to go. You may, in fact, depending on the product you select (assuming you select one :-) have to do some bit of tailoring of the solution to fit your needs exactly....

What follows is my personal FAQ answer for this question (having researched and presented on the topic numerous times... perhaps I can submit it to a FAQ site? :-)

There are a host of products available for connecting OO languages to relational and possibly non-relational data stores. Each product can offer a range of features. The list below outlines some of the possible features, but you will need to check with the vendors for specific support of these features. They include:

a specific OO language binding (C++, Smalltalk, Java)
database independence
transparency (to the database & perhaps to the mapping layer itself)
a degree of flexibility of mapping (e.g., kinds of mapping, subtyping, and whether it is compiled in code or stored outside code, etc.)
direction of engineering (forward -- starting with OO model and generating relational schema or reverse -- starting w/ relational schema/database and generating an OO layer in the language)
support for mapping OO collection classes to database constructs
SQL support and generation (generates SQL, stored procedures, calls existing SQL code, allows access to the SQL, etc.)
Concurrency control (adds additional concurrency options, supports multiple concurrency options, uses only database's concurrency control)
transactional capabilities (throughput, multithreading, etc.)
synchronization resolution (between in memory and RDBMS changes)
performance tuning options (prefetch, look ahead, multithreads...)
multiple simultaneous database access
multiple mappings to the same database
multiuser support
mapping creation (by hand, automatic, semi-automatic, by GUI, by textual description)
specific databases (e.g., Informix, Sybase, Oracle, etc.)
API support (e.g., ODBC, JDBC, etc.)

They also vary in the kinds of applications they are best suited for. Some are best for single users. Others scale up well for multi-users. Some are good for multi-database apps. Some are good for simultaneous updates of ODB and RDBs. Almost all generate the SQL, but they hide it or allow access to it to varying degrees.

Vendors & products include (with some brief and incomplete notes):

Persistence, PowerTier:: C++ or Java, small to large scale, multi-user, application server, mostly forward engineering + some reverse, hard coded mappings, mostly single database, SQL accessible if needed
ONTOS, Integrator:: C++, repository based & uses meta data to map between databases and COM/DCOM objects, GUI mapping, generates running/deployable COM/DCOM components directly from mapping, optimized for Windows NT, supports Visual Basic, C++, Java
Object Design, DBConnect:: C++, ODBMS based, reverse engineering, hard coded mappings, performance tuning options, single database, text based mapping definition, client side, SQL accessible if needed
POET Software, SQL Object Factory:: C++, Java, small-medium scale, forward engineering, multi-database, flexible mapping, soft coded mappings, multi-user (first release of product)
Rogue Wave, DBTools.h++:: C++, low level class library, SQL exposed, hard coded mappings
The Object People, Top Link:: Smalltalk and Java, class library based, flexible table-to-object, object-to-table and collection mappings, reverse engineering supported, GUI based mapping tool, trigger-based change notification with client or server side, pre and post function invocation.
GemStone, GemConnect:: Smalltalk, small-large scale, application server, default mappings with user override (you can rewrite them), SQL accessible, extensible API, repository holds mappings
Thought Inc, Cocobase:: Java, soft coded mappings, multiple mappings to same schema, JDBC based, object schema stored in RDB with relational data, multi-table updateable views, client side, GUI based tool
Novera, Database Wizard:: Java, flexible mapping, automated mappings, reverse engineering, powerful query language, GUI mapping, client side, performance tuning options, SQL hidden but accessible, additional Java classes, integrated with JBusiness (a Java application server), uses JDBC underneath
Ardent Software (formerly O2 Technology/Unidata), JRB and CRB (Java and C++ Relational Binding):: Java/C++ to relational database mapping. JRB is 100% pure Java; runs on any JDBC compliant driver; maps Java classes to hidden SQL queries; manages an objects cache to optimize the performance. Optimized versions for Oracle and Sybase. Has strong query support, supports ODMG OQL and Java Binding, strong collection support.
Watershed Technologies, Relational Object Framework (ROF):: Java, flexible table-to-object, object-to-table and collection mappings, forward and reverse engineering supported, can function as application server or client side, multiple caching options, can generate stored procedures, mapping stored as meta data, strong collection support, uses ODMG OQL and Java Binding.
Objectmatter, Business Sight Framework:: Java, flexible table-to-object, object-to-table and collection mappings, forward and reverse engineering supported, mapping stored as meta data, collections only, no query support.
Software Tree, JDX:: Java, can function as client side or single application server with multiple clients, caching in application server, lookahead/prefetch support, limited support for collections, mapping exists as meta data, cannot generate stored procedures but can call existing ones.
2Link Consulting, DbGen:: Generates Java code to map to RDBMS; uses JDBC; 100% Pure JavaBean; supports reverse engineering.
Sun Microsystems, Java Blend:: Java Blend: Java, supports forward and reverse engineering (Java --> DB, DB --> Java); multiple concurrency control options (pessimistic, optimistic and advanced optimistic with lock groups), table-to-object mapping, uses ODMG OQL and Java Bindings; GUI based; generates Java and SQL; generates stored procedures; prefetch based on relationship chains.

You will need to check with the vendors to see which specific databases they support. Most support ODBC or JDBC and some also have native database API support.

There are other smaller products but they are much less well known. So which you want to use largely depends on what your application needs, how much flexibility you need, what scale your application is, and how much you expect the tool to do for you.

And I add:

KORMA, generates Java and COM components based on a relational database schema

FORM, Framework for Object-Relational Mapping

JOP, Java Object Persistence (free library)

More Info:

Barry & Associates, ODBMS FAQ

Kyle Brown and Bruce G. Whitenack, Crossing Chasms: A Pattern Language for Object-RDBMS Integration

Wolfgang Keller and Jens Coldewey, Relational Database Access Layers - A Pattern Language

H. Vogelsang and U. Brinkschulte, Persistent Objects In A Relational Database (postscript)

Mark L. Fussell, Foundations of Object-Relational Mapping

Arthur M. Keller (from Persistence), Papers on object-oriented databases

Joseph W. Yoder, Joe's Object-Oriented Mapping Pages

Cetus links about Databases: Mapping Objects to Relations