Our Articles

A Monthly Article from our Speakers

Current Article of the month

Articles of 2020

Articles of 2019

Articles of 2018

Articles of 2017

Articles of 2016

Articles of 2015

Articles of 2014

Articles of 2013

Articles of 2012

Articles of 2011

Articles of 2010

Articles of 2009

Articles of 2008

Articles of 2007

Articles of 2006

Articles of 2005

Articles of 2004

Articles of 2003

Articles of 2002

Rick van der LansIs NoSQL Turning Into SQL?

by Rick van der Lans

August 2014

 

Undoubtedly, big data is one of the biggest trends in the IT industry. Big data involves storing, managing, and analyzing massive amounts of data. Today, organizations have a choice with respect to storing data: they can choose a SQL-based solution or a NoSQL-based solution, such as Hadoop, Cassandra, or MongoDB. Especially this second group has received a lot of attention the last few years. Lately, many products have become available that wrap a NoSQL system with a SQL layer. In other words, NoSQL systems are turning into SQL systems. But why is that?

It’s hard to give a definition of NoSQL, because they’re all different. They have only two things in common. First, they have been categorized in a group called NoSQL. Second, they all have a highly scalable MPP architecture that allows them to spread processing and storage over hundreds of nodes. Their internal architecture is ideal for scale-out, whereas classic SQL databases have an architecture designed for scale-up. But that’s where the similarity ends.
NoSQL products are very different from SQL systems. They don’t organize their data in flat tables consisting of columns, they don’t support data integrity, they are not restricted to flat relational structures but allow for hierarchical structures in the data, and they don’t support a high-level language such as SQL, but low-level APIs and languages.
These different database concepts and interfaces have been implemented for very good reasons. It allows them to be used in big data environments where the storage and processing demands are high to extremely high.
However, almost all the popular tools for reporting and analytics require data to be organized in SQL tables and expect a SQL interface; they don’t work with NoSQL. In addition, because of the low-level APIs they support productivity is lower than with SQL, and finally, not all developers (especially within BI departments) have expertise with programming in languages such as Java or Python.
For such reasons, to use the well-known reporting and analytical tools together with NoSQL requires that data is first copied from a NoSQL system to a SQL system. As can be imagined, this copying process takes time, and storing big data twice (purely to get a SQL interface) is expensive. It’s better if the data stays within the NoSQL system and can be accessed through a SQL interface.
The good news is that more and more solutions have become available for SQL-fication of NoSQL. These tools that allow applications to access data stored in NoSQL systems by using SQL. These SQL-fication solutions can be classified as follows:
Dedicated driver: Many solutions offer a SQL interface on one or more NoSQL systems. Some also allow access to SQL data sources and can federate data stored in multiple data sources. These drivers are responsible for transforming incoming SQL requests in the languages supported by the NoSQL system. They hide the complexities of the low-level languages. Examples are Apache Hive, Cassandra CQL, Cloudera Impala, DataDirect Cloud, Facebook Presto, Hortonworks Stinger, IBM BigSQL, MapR Drill, Quest Toad for Cloud Databases, and Salesforce Phoenix.
Data virtualization server: Data virtualization servers offer comparable SQL-to-NoSQL translation features as the previous category. There are three main differences between these categories. Data virtualization servers support more powerful query optimization features when accessing the data sources, they have more mature federation technologies, and they offer data security capabilities and design environments. Examples are Cisco/Composite Information Server, Denodo Platform, Informatica IDS, RedHat JBoss Data Virtualization, and Stone Bond Enterprise Enabler Virtuoso.
SQL database server: Some of the SQL database servers give developers a choice: SQL tables can be stored in the native database or in a NoSQL system such as Hadoop. Where the data is stored is hidden for the applications. Examples are Actian Paraccell, EMC Greenplum UAP, Hadapt, Microsoft Polybase, and Teradata Aster database.
Note that SQL-fication of NoSQL is not as simple as it may seem, especially because performance is such an important aspect. Here are some of the issues these solutions have to tackle:
• Some of the data in NoSQL systems is schema-less. How does the SQL interface turn this schema-less data into relational data with a schema?
• Most NoSQL systems allow that records belonging to the same table have different sets of columns. In SQL this concept is not supported.
• Many NoSQL systems support non-relational constructs such as hierarchical structures. In relational terminology these would be called nested tables or non-first normal form tables (NF2 tables). Somehow the SQL-fication solutions have to flatten these hierarchical structures to flat relational structures and back.
• Joining data in SQL is a very common thing to do. Therefore, SQL database servers are overloaded with features to execute joins fast. Many NoSQL systems are not fast when joining data. They prefer data to be stored in some denormalized way to minimize the need for joins. It will be a challenge for SQL-fication solutions to run SQL joins fast.
The SQL-fication of NoSQL has started and will continue to evolve. The expectation is that this year more new products will be released and improved versions of existing ones will become available. This is a good thing. Organizations are not interested in NoSQL because they don’t like SQL, it’s because they need their scalability and performance. However, they also want the productivity and maintainability of SQL. SQL-fication will offer them the best of both worlds: the performance of NoSQL and the productivity of SQL.