wiki.volution.ro -- CiprianDorinCraciun/Teaching/2010-2011/AdvancedDataStructures/Cassandra

Contents

NOSQL Overview
Cassandra Overview

NOSQL Overview

The acronym:

Not only SQL;
No SQL;

The main models:

columnar (see some examples);
key-value (see some examples);
document (see some examples);
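The three models can be made concrete by storing the same made-up user record in each shape. The sketch below uses plain Python dicts as stand-ins for the stores, so no real database API is involved. The key `user:hermes`, the e-mail addresses and the variable names are hypothetical.

```python
import json

# key-value: an opaque value stored under a key; only the client can interpret it
key_value_store = {
    "user:hermes": json.dumps({"email": "hermes@example.com", "active": True}),
}

# document: the value is a structured document the store itself can inspect
document_store = {
    "hermes": {"email": "hermes@example.com", "active": True},
}

# columnar (Cassandra-style): row key -> {column name: value};
# different rows may carry different sets of columns
users_column_family = {
    "hermes": {"email": "hermes@example.com", "active": "true"},
    "larry": {"email": "larry@example.com"},  # no 'active' column at all
}

# reading the same attribute in each model
email_kv = json.loads(key_value_store["user:hermes"])["email"]  # client decodes first
email_doc = document_store["hermes"]["email"]
email_col = users_column_family["hermes"]["email"]
```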

The columnar model (see also Wikipedia):

differences between the relational and the columnar data models;
see the "Twissandra" example below;

Advantages:

scalability;
availability;
plus the media hype;

Disadvantages:

schema-less (i.e. it's very easy to become sloppy);
lack of (native) joins (e.g. SELECT * FROM users, messages WHERE users.id = messages.user_id);
lack of (native) filters (e.g. SELECT * FROM users WHERE users.date_added < '2010-01-01' AND users.active);
lack of (or limited) range queries (e.g. SELECT * FROM users WHERE users.date_added BETWEEN '2009-01-01' AND '2009-12-31');
lack of data modeling / querying tools;
lack of standardized data model and API;
highly denormalized;
query analysis takes precedence over data modeling;
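When the store offers no native joins or filters, queries like the SQL examples above must be carried out by the application. The minimal sketch below assumes two hypothetical column families, `users` and `messages`, modelled as Python dicts that map a row key to its columns. The helper names are also made up for illustration.

```python
# Hypothetical column families: row key -> {column name: value}
users = {
    "u1": {"name": "hermes", "date_added": "2009-11-03", "active": "true"},
    "u2": {"name": "larry", "date_added": "2010-02-14", "active": "true"},
    "u3": {"name": "curly", "date_added": "2009-05-20", "active": "false"},
}
messages = {
    "m1": {"user_id": "u1", "body": "hello"},
    "m2": {"user_id": "u2", "body": "hi there"},
    "m3": {"user_id": "u1", "body": "bye"},
}

def join_users_messages(users, messages):
    """Client-side equivalent of
    SELECT * FROM users, messages WHERE users.id = messages.user_id."""
    result = []
    for msg_id, msg in messages.items():
        user = users.get(msg["user_id"])  # one extra lookup per message
        if user is not None:
            result.append((user["name"], msg_id, msg["body"]))
    return result

def filter_users(users):
    """Client-side equivalent of
    SELECT * FROM users WHERE date_added < '2010-01-01' AND active;
    without an index this means scanning every row."""
    return sorted(
        key for key, cols in users.items()
        if cols.get("date_added", "") < "2010-01-01" and cols.get("active") == "true"
    )
```

The date comparison works on plain strings only because the dates are stored in ISO `YYYY-MM-DD` form. The usual NOSQL alternative to such scans is to denormalize: the application maintains an extra "index" row at write time, shaped around the queries it needs.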

Cassandra Overview

Data model levels:

cluster (roughly the equivalent of the database server(s) in an RDBMS);
keyspace (the equivalent of a database / schema in an RDBMS);
row (somewhat the same as in an RDBMS);
column family (oversimplifying, it can be thought of as a table in an RDBMS);
column (somewhat the same as in an RDBMS, except that different rows may have different columns);
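One way to picture these levels is as nested maps. The sketch below follows the nesting used by the Twissandra example: keyspace, then column family, then row key, then column. The keyspace name `Twissandra` and the helper `get_column` are hypothetical. Real Cassandra columns also carry a timestamp, which is omitted here.

```python
# keyspace -> column family -> row key -> column name -> column value
cluster = {
    "Twissandra": {                  # keyspace (hypothetical name)
        "User": {                    # column family
            "hermes": {              # row key
                "password": "****",  # column name -> column value
            },
        },
    },
}

def get_column(cluster, keyspace, column_family, row_key, column_name):
    """Return one column value, or None if any level is missing."""
    return (cluster.get(keyspace, {})
                   .get(column_family, {})
                   .get(row_key, {})
                   .get(column_name))
```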

Data model examples:

Twissandra (analysis and code) -- a proof-of-concept Twitter clone:

# User, Friends, Followers, Tweet, and Userline are all column families hosted in the same keyspace.

User = {
    'hermes': {
        'password': '****',
        # (other properties)
    },
}

Friends = {
    'hermes': {
        # friend id: timestamp of when the friendship was added
        'larry': '1267413962580791',
        'curly': '1267413990076949',
        'moe'  : '1267414008133277',
    },
}

Followers = {
    'hermes': {
        # follower id: timestamp of when the follower was added
        'larry': '1267413962580791',
        'curly': '1267413990076949',
        'moe'  : '1267414008133277',
    },
}

Tweet = {
    '7561a442-24e2-11df-8924-001ff3591711': {
        'username': 'hermes',
        'body': 'Trying out Twissandra. This is awesome!',
    },
}

Userline = {
    'hermes': {
        # timestamp of tweet: tweet id
        1267414247561777: '7561a442-24e2-11df-8924-001ff3591711',
        1267414277402340: 'f0c8d718-24e2-11df-8924-001ff3591711',
        1267414305866969: 'f9e6d804-24e2-11df-8924-001ff3591711',
        1267414319522925: '02ccb5ec-24e3-11df-8924-001ff3591711',
    },
}
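The layout above is heavily denormalized. A friendship is recorded in both `Friends` and `Followers`, and a user's tweets are reached through `Userline`, whose column names are timestamps. The sketch below shows the writes and the read this layout implies. It uses in-memory dicts as hypothetical stand-ins for the column families, and it is neither Twissandra's actual code nor a Cassandra client. The function names are made up.

```python
import time
import uuid

# Hypothetical in-memory stand-ins for the column families above:
# row key -> {column name: column value}
Friends, Followers, Tweet, Userline = {}, {}, {}, {}

_last_ts = 0

def now_micros():
    """Microseconds since the epoch, forced to be strictly increasing so that
    two tweets posted in the same microsecond do not overwrite each other here."""
    global _last_ts
    _last_ts = max(_last_ts + 1, int(time.time() * 1_000_000))
    return _last_ts

def add_friend(username, friend):
    ts = str(now_micros())
    # the same relationship is written twice, once for each direction
    Friends.setdefault(username, {})[friend] = ts
    Followers.setdefault(friend, {})[username] = ts

def post_tweet(username, body):
    tweet_id = str(uuid.uuid1())  # time-based UUID, like the Tweet row keys above
    Tweet[tweet_id] = {"username": username, "body": body}
    Userline.setdefault(username, {})[now_micros()] = tweet_id
    return tweet_id

def get_userline(username, count=20):
    """Newest first: sort the timestamp column names of one Userline row,
    then fetch each tweet by id (the 'join' happens in the client)."""
    row = Userline.get(username, {})
    newest = sorted(row, reverse=True)[:count]
    return [Tweet[row[ts]] for ts in newest]
```

Reading a userline takes one slice of a single row plus one lookup per tweet. No join is involved, which is why the relationships are duplicated at write time.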

Principles:

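read path (the Memtable and the relevant SSTables are consulted, and their versions are merged by timestamp);
write path (the mutation is appended to the commit log and then applied to the Memtable);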
commit log;
Memtable-SSTable;
SSTable;
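The commit log / Memtable / SSTable principles can be sketched as a toy, single-node model:

- A write is appended to the commit log and applied to the in-memory Memtable.
- A full Memtable is flushed to an immutable SSTable sorted by key.
- A read consults the Memtable and every SSTable, then keeps the version with the newest timestamp.

Here a key stands in for a column. The class name and the `memtable_limit` flush threshold are hypothetical. Real Cassandra decides when to flush from configured thresholds and also uses Bloom filters and compaction, none of which is modelled.

```python
import bisect

class ColumnFamilyStore:
    """Toy single-node model of the commit log / Memtable / SSTable path.
    Each key holds one (timestamp, value) version; the newest timestamp wins."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (distinct keys)
        self.commit_log = []   # append-only record of writes, kept for crash recovery
        self.memtable = {}     # in-memory, mutable: key -> (timestamp, value)
        self.sstables = []     # immutable lists of (key, (timestamp, value)), sorted by key

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))   # 1. durability first
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:    # 2. then the Memtable
            self.memtable[key] = (timestamp, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # write the Memtable out as a new sorted, immutable SSTable
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # these writes now live in an SSTable

    def read(self, key):
        versions = []
        if key in self.memtable:
            versions.append(self.memtable[key])
        for table in self.sstables:
            i = bisect.bisect_left(table, (key,))  # binary search on the sorted keys
            if i < len(table) and table[i][0] == key:
                versions.append(table[i][1])
        return max(versions)[1] if versions else None  # reconcile by timestamp
```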

Keyspace partitioning:

(figures omitted: Dynamo; Riak)
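Below is a minimal sketch of Dynamo-style keyspace partitioning, in which nodes and keys are placed on the same hash ring. A key belongs to the first node clockwise from its token, and further replicas go to the following nodes. Using MD5 for tokens is an assumption loosely modelled on Cassandra's RandomPartitioner, and the node names and tokens are made up. The sketch ignores virtual nodes and rack- or datacenter-aware replica placement.

```python
import bisect
import hashlib

def token(key):
    """Position of a key on the ring (MD5 is an assumption for this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node name: token}; each node owns the range
        # (previous node's token, its own token], wrapping around the ring
        self.entries = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key, replication_factor=3):
        """Primary = first node clockwise from the key's token;
        the other replicas are the next nodes around the ring."""
        n = len(self.entries)
        start = bisect.bisect_left(self.tokens, token(key)) % n  # wrap past the end
        return [self.entries[(start + i) % n][1]
                for i in range(min(replication_factor, n))]
```

Adding a node takes over only part of one neighbour's range, which is what makes this scheme suited to incremental scaling.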

Operations:

(figures omitted: write operation; read operation)
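Below is a rough sketch of replicated writes and reads with tunable consistency. It assumes N replicas per key, where a write must be acknowledged by W of them and a read consults R of them. Cassandra expresses these choices as consistency levels such as ONE, QUORUM and ALL. The classes and functions are hypothetical, and hinted handoff and read repair are left out.

```python
def quorum(n):
    """Majority of n replicas (what a QUORUM consistency level waits for)."""
    return n // 2 + 1

class Replica:
    def __init__(self):
        self.data = {}   # key -> (timestamp, value)
        self.up = True   # toggled to simulate node failures

def coordinator_write(replicas, key, value, timestamp, w):
    live = [r for r in replicas if r.up]
    if len(live) < w:
        return False  # too few replicas up: refuse the write up front
    for r in live:
        old = r.data.get(key)
        if old is None or timestamp >= old[0]:  # newer timestamp wins
            r.data[key] = (timestamp, value)
    return True  # every live replica acknowledged, and there are at least w

def coordinator_read(replicas, key, r_count):
    live = [r for r in replicas if r.up]
    if len(live) < r_count:
        raise RuntimeError("not enough live replicas")
    responses = [r.data.get(key) for r in live[:r_count]]
    found = [resp for resp in responses if resp is not None]
    return max(found)[1] if found else None  # newest timestamp wins
```

With W = R = quorum(N), R + W > N. Every read set therefore overlaps every acknowledged write set, so at least one response carries the newest timestamp.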

Involved data-structures:

Links:

Cassandra Data Model (from the 0.6 release documentation);
Cassandra Data Model (the current explanation);
Cassandra Data Model (a work-in-progress explanation);
Cassandra Architecture Overview;
Cassandra Architecture Internals;
Cassandra -- A Decentralized Structured Storage System;
BigTable Model with Cassandra and HBase;
see also my Cassandra assessment;
