An explanation of NoSQL databases such as open source MongoDB, Cassandra and Redis, and how they compare to relational databases such as MySQL.
This article is part of a Penton Technology special report on big data.
NoSQL databases have emerged as a key tool for organizations battling the data deluge. What does NoSQL actually mean, and what advantages does it deliver for data storage needs? Here's everything you need to know about NoSQL.
For starters, let's make clear that NoSQL is not a specific database product. It's a term that refers to a general category of database, which different vendors have implemented in different ways.
Yet all NoSQL products share a common defining characteristic: as the term implies, they do not use the relational model of traditional SQL-style databases, such as the venerable MySQL.
What's a "Traditional" Database?
Understanding exactly what this means requires a quick primer on how most databases have typically worked for the past several decades. When you use a relational database like MySQL, you have to define ahead of time where your data is going to live. You create tables, store different pieces of data inside different tables and retrieve data based on the table structure.
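To make the define-first workflow concrete, here is a minimal sketch in Python. It uses SQLite, which ships with Python, as a stand-in for MySQL; the customers table, its columns and the sample row are assumptions invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here so the example runs without a server.
# The "customers" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")

# Step 1: define the structure before storing anything.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)

# Step 2: store data that fits the declared columns.
conn.execute(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    ("Ada", "ada@example.com"),
)
conn.commit()

# Step 3: retrieve data by naming the table and columns.
rows = conn.execute(
    "SELECT name, email FROM customers WHERE name = ?", ("Ada",)
).fetchall()

# Data that does not match the declared structure is rejected.
schema_error = None
try:
    conn.execute(
        "INSERT INTO customers (name, email, twitter) VALUES (?, ?, ?)",
        ("Grace", "grace@example.com", "@grace"),
    )
except sqlite3.OperationalError as exc:
    schema_error = str(exc)
```

The failed insert at the end is the point: storing a new kind of field means changing the table definition first.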
MySQL and other relational databases are great if you know ahead of time what structure your data will take, and have a sense of how much data you need to store. But what if your storage needs are less predictable? What if they need to be highly scalable? Relational databases work less well in those situations.
NoSQL's Advantages: Simplicity, Scalability and Openness
That's where NoSQL comes in. NoSQL databases allow you to stuff data into a database without defining a formal storage structure ahead of time. That means you do not need to write as much code for an app to interact with a database. It also means you can retrieve data quickly without having to tell your program where exactly to find what you want within a large, rigid database structure.
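As a rough illustration of storing data without a predefined structure, the sketch below keeps records as free-form Python dictionaries. The DocumentStore class and its insert and find methods are hypothetical, written only for this example; they are not the API of MongoDB or any other NoSQL product.

```python
import itertools


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, doc):
        # No schema check: any set of fields is accepted.
        doc_id = next(self._ids)
        self._docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria):
        # Return every document whose fields match all given criteria.
        return [
            doc
            for doc in self._docs.values()
            if all(k in doc and doc[k] == v for k, v in criteria.items())
        ]


store = DocumentStore()
# Two records with different fields live side by side.
store.insert({"name": "Ada", "email": "ada@example.com"})
store.insert({"name": "Grace", "twitter": "@grace", "tags": ["navy", "cobol"]})

matches = store.find(name="Grace")
```

Nothing had to be declared before the second record introduced new fields. The trade-off in this sketch is that the store does nothing to keep records consistent with one another.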
NoSQL databases also tend to scale better, because they're designed to run easily in distributed or clustered environments. In other words, a NoSQL database can run on top of multiple servers at the same time and still look to your app like a single database. That makes it easy to add more storage quickly if you suddenly have a lot more data to store -- which is a key advantage in an era when the cloud and IoT devices have imposed rapidly changing data storage needs on organizations.
NoSQL's support for distributed storage makes it different from traditional databases, which were designed before clusters and the cloud became the norm. True, you can "shard" relational databases, which means distributing them across multiple hosts, but it is more complicated than doing the same with NoSQL databases. It also tends to require more expensive hardware, whereas NoSQL databases can shard on cheap commodity servers.
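The idea of several servers looking like one database can be sketched in a few lines. In the example below, each "server" is just a Python dictionary, and the ShardedStore class, the key names and the hash-modulo routing rule are all assumptions chosen for simplicity. Production databases use more elaborate placement and replication schemes.

```python
import hashlib


class ShardedStore:
    """Toy key-value store spread over several nodes (hypothetical sketch)."""

    def __init__(self, node_count):
        # Each dict stands in for one server in a cluster.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)


# The caller sees one store; routing to nodes happens behind the scenes.
store = ShardedStore(node_count=3)
for i in range(100):
    store.put(f"sensor-{i}", {"reading": i})

reading = store.get("sensor-42")
```

The application code only calls put and get. One known weakness of plain modulo routing is that changing the number of nodes moves most keys to a different node, which is part of why real systems do something more sophisticated.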
The third key advantage that most NoSQL databases offer is that they're open source. Several traditional relational databases, including MySQL, are now open source as well. But they were not always open (at least in their ancestral forms), and they are still limited in some ways by a proprietary legacy that encouraged vendor lock-in. For example, even though the MySQL code is open source, the documentation that you need to make the most of MySQL is less openly available to the community.
Issues like these have not arisen in a serious way on the NoSQL front, probably because NoSQL databases came into widespread use within the last decade (technically, they have a much longer history, but that's fodder for a different post), when open source was already a commonly accepted practice.
Who's Developing NoSQL?
Again, NoSQL is a type of database, not a specific database product. There is now a wide variety of NoSQL implementations. But at the core of most of them is one of three main NoSQL projects: MongoDB, Cassandra and Redis.
The list of NoSQL vendors is longer. That's because, like many other open source projects (think OpenStack or Linux), these core NoSQL databases are available as distributions from multiple companies. IBM has a Cassandra offering, for example, even though it does not actually develop Cassandra. MongoDB is an open source project, but there is also a MongoDB company, which offers a commercial version of the database.