Introduction to Cassandra
Cassandra is an open source distributed storage system designed to store and manage large amounts of data across commodity servers. It can serve as a real-time operational data store for online transactional applications and as a read-intensive database for large-scale systems. Because its nodes connect peer-to-peer rather than as masters and slaves, Cassandra has no single point of failure. It also automatically divides the data across all the nodes in the cluster, while the administrator has the authority to control which data is replicated and how many replicas of the data are created.
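To make the idea of dividing data across nodes and keeping several replicas more concrete, here is a minimal Python sketch. It is a toy model and not Cassandra's real partitioner: the class name ToyRing, the node names, the MD5-based token function and the "next nodes clockwise" placement rule are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (toy hash, not Cassandra's real partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Hypothetical model: each node owns one token; replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Sort nodes by token so the list can be walked like a ring.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, key: str):
        """Return the nodes that would hold copies of `key` in this toy model."""
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


# Assumed example cluster: four peers, three copies of every key.
ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", ring.replicas_for(key))
```

No node in the sketch is special: any peer can compute where a key lives, and the replication factor is the knob the administrator controls.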
Features of Cassandra:
1.> Distributed and decentralized =>
Cassandra is distributed, which means it can run on multiple machines while appearing to the user as a single system. It is decentralized in that every node is a peer and there is no master node. Running Cassandra on a single node is also fine for getting started and learning how it works. It is also designed to perform well across multiple datacenters, including a single cluster that spans geographically separate datacenters.
2.> Elastic Scalability =>
Scalability is the ability of a system to serve a greater number of requests with little degradation in performance. There are two kinds of scalability: vertical scaling, which adds memory and other hardware to an existing machine, and horizontal scaling, which adds more machines to share the load.
Horizontal scalability has a special property known as elastic scalability, where you can scale your cluster up and down smoothly.
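One way to see why adding a machine can be smooth is to look at how key ownership changes when a node joins. The sketch below assumes a consistent-hashing style ring with one token per node. The function names, node names and MD5-based token are hypothetical, and real clusters typically assign many tokens per node to spread the load more evenly.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Toy ring position for a string (not Cassandra's real partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes):
    """Return (token, node) pairs sorted by token."""
    return sorted((token(node), node) for node in nodes)


def owner(ring, key: str) -> str:
    """The first node clockwise from the key's token owns the key."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_right(tokens, token(key)) % len(ring)
    return ring[index][1]


keys = [f"key-{i}" for i in range(1000)]
small = build_ring(["node-a", "node-b", "node-c"])
large = build_ring(["node-a", "node-b", "node-c", "node-d"])  # node-d joins

# Only keys that now belong to the new node change owner; the rest stay put.
moved = [k for k in keys if owner(small, k) != owner(large, k)]
print(f"{len(moved)} of {len(keys)} keys change owner when node-d joins")
```

Scaling down is the reverse: if node-d leaves, only the keys it owned need a new home.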
3.> High Availability and fault tolerance =>
Cassandra is highly available, which means any failed node in the cluster can be replaced without downtime. Data can also be replicated across multiple datacenters to improve performance and avoid downtime.
4.> Tunable consistency =>
Cassandra lets you tune the consistency of the system. It is often described as offering “Eventual Consistency,” where replicas converge over time rather than agreeing instantly. Tunable consistency means you can choose, per operation, how many replicas must respond: you can increase the consistency of your database, but you have to compromise on availability.
The above diagram shows that if we increase availability, consistency decreases, and vice versa.
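The trade-off can also be worked through with simple arithmetic. The following sketch assumes the commonly cited rule that a read is guaranteed to see the latest write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). ONE, QUORUM and ALL are Cassandra consistency levels; the helper function names are hypothetical.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must acknowledge an operation at a given level."""
    required = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # a majority of replicas
        "ALL": replication_factor,
    }
    return required[level]


def reads_see_latest_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


def tolerated_failures(level: str, replication_factor: int) -> int:
    """How many replicas can be down while the operation can still succeed."""
    return replication_factor - replicas_required(level, replication_factor)


RF = 3  # assumed replication factor for the example
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(
        f"write={write_level:<6} read={read_level:<6} "
        f"consistent={reads_see_latest_write(write_level, read_level, RF)} "
        f"write survives {tolerated_failures(write_level, RF)} replica(s) down"
    )
```

With RF = 3, writing and reading at ONE tolerates two failed replicas but gives no overlap guarantee, while writing at ALL guarantees overlap but fails as soon as one replica is down.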
5.> High Performance =>
Cassandra has been developed to perform exceptionally well under a heavy load of requests. It provides fast and consistent write throughput even on basic commodity hardware. It also lets us keep these properties, with no loss of performance, while adding more servers to a cluster.
6.> Row-oriented =>
Cassandra is a row-oriented database, where a row can have one or more columns. Rows are also allowed to have different numbers of columns (the first row can have one column while a second row has several), which is known as being sparse. As it is not a relational database, it can represent data structures as sparse multidimensional hash tables.
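A rough way to picture a sparse multidimensional hash table is a dictionary of dictionaries, keyed first by row and then by column. This is only a mental model in Python, not how Cassandra stores data on disk; the row keys, column names and the get_column helper are invented for the example.

```python
# Outer dict: row key -> row. Inner dict: column name -> value.
rows = {
    "user:1": {"name": "Ada"},
    "user:2": {"name": "Grace", "email": "grace@example.com", "city": "Arlington"},
}


def get_column(row_key: str, column: str, default=None):
    """Look up one cell; a missing row or column simply yields `default`."""
    return rows.get(row_key, {}).get(column, default)


# Rows carry only the columns they actually have, so no empty cells are stored.
for row_key, columns in rows.items():
    print(row_key, "has", len(columns), "column(s):", sorted(columns))

print(get_column("user:1", "email"))  # this row has no "email" column
```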
7.> CQL =>
Cassandra Query Language, referred to as CQL, is the default and primary interface to the Cassandra database. CQL is an SQL-like query language.
Limitations of Cassandra
1.> Cassandra does not support joins.
2.> Cassandra does not support foreign keys, so it cannot enforce referential integrity on behalf of the user.
3.> A key has to be unique in its scope. If not, data will be overwritten.
4.> A failed operation may leave partial changes behind, as Cassandra does not offer fully atomic, transactional operations with rollback.
5.> Recovering from failure is a manual effort in Cassandra.
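The third limitation, that a write to an existing key overwrites data rather than being rejected, can be pictured with a small sketch. ToyTable is a hypothetical in-memory stand-in and not a Cassandra API; it assumes upsert-style writes, where an insert for an existing key replaces the columns it names and leaves the others alone.

```python
class ToyTable:
    """Hypothetical in-memory table with upsert-style writes (not a Cassandra API)."""

    def __init__(self):
        self._rows = {}

    def insert(self, key: str, **columns):
        # No uniqueness check: an existing key is silently updated in place.
        self._rows.setdefault(key, {}).update(columns)

    def get(self, key: str) -> dict:
        # Return a copy so callers cannot mutate the stored row.
        return dict(self._rows.get(key, {}))


users = ToyTable()
users.insert("user:1", name="Ada", city="London")
users.insert("user:1", name="Grace")  # same key: "name" is overwritten
print(users.get("user:1"))  # {'name': 'Grace', 'city': 'London'}
```

Because nothing signals the collision, the application has to make sure keys are unique in their scope.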
Cassandra is derived from Bigtable (data model) and Dynamo (architecture), two of the most well-known and powerful databases today. Its exceptional performance on large datasets has helped make it powerful and popular in its own right. NoSQL databases are currently becoming a very popular and important part of the database landscape, and Cassandra is a major reason for this. Used appropriately, it can deliver real benefits.