Databases, ACID Compliance, NoSQL, and More

NoSQL has been in the media for the last couple years as one of the new marketing buzzwords and you may be wondering exactly what it is, what it can do, and how it can fit into your current infrastructure.

One of the first things to do is describe exactly what is meant by SQL and RDBMS and then proceed to explain the characteristics of databases and NoSQL databases in particular.

SQL and RDBMS

SQL stands for Structured Query Language.  When we refer to SQL we are talking about statements such as “SELECT * from my_Table;”  That statement will go and ask a database to return all of the rows from a relational database management system (RDBMS).  SQL acts as the interface language to a relational database.  A relational database is a specific type of database with data stored in rows structured to a rigidly defined schema.  One of the reasons that RDBMS are so powerful is that they are originally based on Set Theory and the Relational Model from Mathematics.  This allows for some powerful mathematical concepts to be translated into queries and vice versa allowing for surprising versatility in storing and analyzing data in disparate tables. See here for one of the earlier papers on relational databases.  You will often hear of a RDBMS referred to as a SQL database, this is not completely true as there are Non-RDBMS databases that have SQL-like front-end languages but most individuals interchange RDBMS and SQL when referring to relatinoal databases.  Common SQL/RDBMS are MySQL, Oracle Database, Sybase, IBM DB2, MSSQL, etc.

NoSQL

NoSQL is currently being redefined by many as “Not Only SQL” rather than “Not SQL” due to the development of SQL-like front-ends for Non-relational databases.  The term NoSQL doesn’t really mean that there isn’t SQL available but rather the backend database doesn’t follow the relational model.  NoSQL now encompasses many different types of Non-relational databases such as Document, Graph, Columnar and Key-value databases being the most commonly discussed.  They each have their own benefits and drawbacks which have been espoused in many other blog posts from other sources.

The main point to bring out with the different database types is that whereas prior to the late 90’s early 00’s when a database was being referred to, 99 out of 100 times, a relational database was being discussed.  Since the development of NoSQL database projects, there has been a movement in the technology sector to use the appropriate database for the data which is to be stored rather than automatically throwing everything into a relational database.  Some databases are extremely fast from a read perspective while others offer extreme scalability.  While this expansion of database types has increased the workload and planning for developers, it has also created an opportunity to implement tools which may suit a particular type of job 100’s of times better than a typical RDBMS.  Commonly mentioned NoSQL databases are Redis, Riak, CouchDB, HBase, MongoDB, Neo4j, etc.

This all sounds fantastic when it is first heard and the thought that everyone in every situation should immediately re-evaluate their database choice may occur.  There are other issues however.  Beyond support, which always develops if there is a need, there needs to be a discussion between ACID compliance, conflict resolution, scalability, etc.

ACID Compliance

If you’ve dealt with databases before, you’ve most likely heard of ACID.  ACID stands for Atomicity, Consistency, Isolation and Durability.  Here’s a quick (and by no means complete) discussion of the ACID concepts.

  • Atomicity refers to making sure that a transaction COMPLETELY succeeds or fails COMPLETELY.  So that if part of the transaction fails, none of it completes.  This is important when making changes in a relational database with Referential Integrity so that if multiple tables are being edited, the transaction doesn’t half complete and then you’re left with a mess
  • Consistency means that the transaction won’t put the database into an invalid state.  So that if there are constraints or triggers, those won’t be violated by the transaction.
  • Isolation means that if multiple transactions occur at the same time, they will force the database into the same state if they had been run one at a time.
  • Durability means that once a transaction has occured, the database will hold that state even in the event of an outside event such as a power loss or error.  This means that the database actually wrote the data to non-volatile memory (i.e. not RAM but actual storage) so a power loss won’t wipe the previous state.

There are many more complications to those ACID concepts but those are the general descriptions.  Obviously, when dealing with the monolithic systems that predominated early computing efforts, ACID compliance was a hurdle but an achievable one.  Today, the focus is on highly available distributed computing that can replicate changes across nodes that may be on separate continents.  The mathematical theorem known as the CAP theorem states that it is impossible for a distributed system to provide a guarantee of Consistency, Availability and Partition tolerance all at the same time.  (Note that consistency in CAP Theorem is slightly different than ACID consistency)

As a result, ACID compliance becomes an issue so other compliance models have been created.  To date (and to my knowledge) there isn’t a NoSQL database that exhibits true ACID compliance.  There are some that have a limited amount of ACID compliance but not full compliance.  However, there are many that achieve similar goals through a different process.

BASE compliance (Eventual Consistency)

BASE stands for Basically Available, Soft state, Eventual consistency and is used when dealing with distributed computing to handle consistency.  The key from our standpoint is Eventual Consistency.  Eventual Consistency means that at some point, all of the data sources will show the same data.  That may not occur at the immediate point of the transaction in all databases but replication will eventually push that change out.  How long this “eventual” time period may be is defined by the individual database paradigm.

Conflict Resolution

As a consequence, conflict resolution must be handled.  What if a change is being propagated, but not completed yet, from Node A to Node B.  Before the change can be finished, another transaction on Node B modifies the data that Node A wanted to modify.  As a result, there is a conflict and this needs to be handled.  Different databases handle these conflicts differently and the actual way conflicts are managed may affect which database is selected for a specific application.

ACID vs BASE

The main difference between ACID and BASE compliance is that ACID ensures that at the point in time of the transaction ACID compliance is respected.  BASE compliance allows for that compliance to be violated for a time as long as it eventually gets to a compliant end state.  This difference is largely due to the nature of distributed systems and the robustness required to perform replication across the different database nodes.  A nice primer on ACID vs BASE from the Association of Computing Machinery can be found here.

Scalability

Today, scalability is on the mind of everyone planning database applications.  How much can an application and its data store grow if need be.  Monolithic systems handle this by adding more physical resources.  Distributed systems may handle this by adding more nodes.  How a database supports scalability is another important feature to consider when selecting a database.

RDBMS and SQL vs NoSQL

So after that long-winded explanation, what does all of that discussion have to do with the difference between RDBMS and NoSQl?  It really comes down to the application and environment that is being created.  Distributed database systems aren’t going to have full ACID compliance and thus need to be investigated for the pros/cons of their conflict resolution, data management, etc.  Also, certain NoSQL types are much more valuable for how they store data and handle the relationships between individual datum points.  Scalability can become a major issue, what happens when the application needs to grow to support more users?

In my view, the question isn’t which one is inherently better between RDBMS and NoSQL.  Rather, the question to be asked is which tool is best.  Using the right database for the correct application is much more important than using the latest technology if it doesn’t actually fit your needs.