wiki.accu.org


Back to 2014-proposals

Title: If the CAP fits - The Apache Cassandra Database
Proposer: Gavin Heavyside
Type: Tutorial
Duration: 90 mins
Description:
Apache Cassandra (http://cassandra.apache.org/) is one of the world's most popular open-source distributed databases. Originally created by Facebook to power their Inbox Search, it was open-sourced in 2008, became a top-level Apache project in 2010, and is now used by companies all over the world including household names like Spotify and Netflix.

Cassandra is designed to handle large amounts of data with high availability across many commodity servers (even spanning datacentres), with no single point of failure, and has extremely high performance. Distributed systems are often thought of in terms of the CAP theorem (Consistency, Availability, Partition Tolerance): when the network partitions, a system has to choose between staying consistent and staying available. Marrying the Amazon Dynamo distributed system model with the Google BigTable data model, Cassandra lets us tune the consistency of each read and write, so we can decide how much consistency to trade for high availability. Cassandra 1.2 introduced CQL3, a SQL-like query language that makes using Cassandra easier than ever.
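To make tunable consistency concrete, here is a minimal Python sketch of the Dynamo-style arithmetic behind it, assuming a single datacentre and ignoring multi-datacentre levels such as LOCAL_QUORUM. With replication factor N, a write acknowledged by W replicas and a read answered by R replicas are guaranteed to share at least one replica when R + W > N. The helper names `quorum` and `is_strongly_consistent` are hypothetical illustrations, not part of Cassandra or any of its drivers.

```python
# Sketch (assumption: single datacentre) of Dynamo-style tunable consistency.
# N = replication factor, W = replicas that must acknowledge a write,
# R = replicas that must answer a read. Function names are hypothetical.

def quorum(replication_factor: int) -> int:
    """Replicas the QUORUM consistency level waits for: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N),
    so a read always reaches at least one replica holding the latest write."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # (write level, W, read level, R) combinations for a replication factor of 3
    combos = [("ONE", 1, "ONE", 1), ("QUORUM", q, "QUORUM", q), ("ALL", n, "ONE", 1)]
    for w_name, w, r_name, r in combos:
        overlap = is_strongly_consistent(n, w, r)
        print(f"N={n}, write {w_name}, read {r_name}: overlap guaranteed = {overlap}")
```

Writing and reading at QUORUM is a common choice when a read must see the latest acknowledged write; ONE for both favours latency and availability and accepts eventual consistency.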

In this session we'll take a look at Apache Cassandra, its ecosystem, its data model, and what you need to consider when deploying it. We'll learn how to live with the implications of the CAP theorem, and even embrace eventual consistency, and we'll look at best practices for structuring data to achieve fast read performance. The key considerations for ensuring scalability and high performance will be covered, and we'll learn about CQL3 and how it can make accessing and querying your data simple even if you're new to Cassandra. All of this will be demonstrated with reference to the source code of a live application reading from and writing to a Cassandra database.
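One idea behind structuring data for fast reads is that a CQL3 table with a compound primary key, for example `PRIMARY KEY ((user_id), event_time)`, keeps all rows for one partition key together, ordered by the clustering column. The Python below is a conceptual, in-memory sketch of that layout only, assuming a single node with no replication, memtables or SSTables; the class `WideRowTable` and its methods are hypothetical. It shows why "latest N events for a user" becomes one partition lookup plus a contiguous slice.

```python
from bisect import bisect_left
from collections import defaultdict


class WideRowTable:
    """Hypothetical model of a table keyed by (partition key, clustering key).
    Rows are grouped per partition and kept sorted by clustering key."""

    def __init__(self):
        # partition key -> list of (clustering key, value), sorted by clustering key
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        rows = self._partitions[partition_key]
        # (ck,) sorts before any (ck, value), so this finds the first row with this key
        i = bisect_left(rows, (clustering_key,))
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, value)  # same primary key: overwrite (upsert)
        else:
            rows.insert(i, (clustering_key, value))  # keep the partition sorted

    def latest(self, partition_key, limit):
        """Newest `limit` values in one partition, newest first."""
        rows = self._partitions.get(partition_key, [])  # .get avoids creating empty partitions
        start = max(len(rows) - limit, 0)
        return [value for _, value in reversed(rows[start:])]


if __name__ == "__main__":
    events = WideRowTable()
    events.insert("alice", 100, "login")
    events.insert("alice", 300, "logout")
    events.insert("alice", 200, "play")
    events.insert("bob", 150, "login")
    print(events.latest("alice", 2))
```

Because the table is designed around the query, the read touches a single partition instead of gathering rows from across the cluster; the cost is that data is often duplicated into one table per query pattern.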