NoSQL Overview
The acronym:
- Not only SQL;
- No SQL;
The main models:
- columnar (see some examples);
- key-value (see some examples);
- document (see some examples);
The columnar model (see also Wikipedia):
- differences between the relational and columnar data model;
- see the "Twissandra" example below;
Advantages:
- scalability;
- availability;
- plus the media hype;
Disadvantages:
- schema-less (i.e. it's very easy to become sloppy);
- lack of (native) joins (e.g. SELECT * FROM users, messages WHERE users.id = messages.user_id);
- lack of (native) filters (e.g. SELECT * FROM users WHERE users.date_added < '2010-01-01' AND users.active);
- lack of (or limited) range queries (e.g. SELECT * FROM users WHERE users.date_added BETWEEN '2009-01-01' AND '2009-12-31');
- lack of data modeling / querying tools;
- lack of standardized data model and API;
- highly denormalized;
- query analysis takes precedence over data modeling;
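To make the missing-join limitation concrete, here is a minimal sketch using plain Python dictionaries in place of a real store. No actual NoSQL client is involved, and the names `users`, `messages`, `messages_by_user`, `join_client_side`, and `bodies_for` are hypothetical. It contrasts a join emulated in application code with the usual workaround: a second, denormalized structure shaped after the query that has to be answered.

```python
# Two "tables" in a key-value style store, modelled as plain dicts.
users = {
    "u1": {"name": "alice"},
    "u2": {"name": "bob"},
}
messages = {
    "m1": {"user_id": "u1", "body": "hello"},
    "m2": {"user_id": "u2", "body": "hi"},
    "m3": {"user_id": "u1", "body": "bye"},
}


def join_client_side(users, messages):
    """Emulate `users JOIN messages` in the application: scans every message."""
    return sorted(
        (users[m["user_id"]]["name"], m["body"]) for m in messages.values()
    )


def build_messages_by_user(messages):
    """Denormalized index (user id -> message ids), kept next to the data."""
    index = {}
    for message_id, message in messages.items():
        index.setdefault(message["user_id"], []).append(message_id)
    return index


messages_by_user = build_messages_by_user(messages)


def bodies_for(user_id):
    """Answer "messages of one user" with key lookups only, no scan."""
    return [messages[mid]["body"] for mid in messages_by_user.get(user_id, [])]
```

In a real deployment the index would be updated on every write rather than rebuilt, which is why the queries have to be known before the data layout is chosen.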
Cassandra Overview
Data model levels:
- cluster (equivalent of servers in RDBMS);
- keyspace (equivalent of databases / schemas for RDBMS);
- row (roughly the same as in an RDBMS);
- column family (oversimplifying, it can be thought of as a table in an RDBMS);
- column (roughly the same as in an RDBMS);
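One way to picture these levels is as nested maps. The sketch below is only an illustration in plain Python, not Cassandra's API: the keyspace name `Twissandra` and the helper names `get_column` and `insert_column` are assumptions, and a real column also carries a timestamp, which is left out here.

```python
# keyspace -> column family -> row key -> column name -> column value
cluster = {
    "Twissandra": {                  # keyspace (name assumed for illustration)
        "User": {                    # column family
            "hermes": {              # row key
                "password": "****",  # column name: column value
            },
        },
    },
}


def get_column(cluster, keyspace, column_family, row_key, column_name):
    """Follow the levels top-down to reach a single column value."""
    return cluster[keyspace][column_family][row_key][column_name]


def insert_column(cluster, keyspace, column_family, row_key, column_name, value):
    """Rows are created on first write; there is no per-row schema to declare."""
    rows = cluster[keyspace][column_family]
    rows.setdefault(row_key, {})[column_name] = value
```

For example, `insert_column(cluster, "Twissandra", "User", "larry", "password", "secret")` creates the row `larry` on the fly, and two rows of the same column family are free to hold different columns.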
Data model examples:
Twissandra (analysis and code) -- a proof-of-concept Twitter clone:
# User, Friends, Followers, and Userline are all column families hosted in the same keyspace.

User = {
    'hermes': {
        'password': '****',
        # (other properties)
    },
}

Friends = {
    'hermes': {
        # friend id: timestamp of when the friendship was added
        'larry': '1267413962580791',
        'curly': '1267413990076949',
        'moe'  : '1267414008133277',
    },
}

Followers = {
    'hermes': {
        # friend id: timestamp of when the followership was added
        'larry': '1267413962580791',
        'curly': '1267413990076949',
        'moe'  : '1267414008133277',
    },
}

Tweet = {
    '7561a442-24e2-11df-8924-001ff3591711': {
        'username': 'hermes',
        'body': 'Trying out Twissandra. This is awesome!',
    },
}

Userline = {
    'hermes': {
        # timestamp of tweet: tweet id
        1267414247561777: '7561a442-24e2-11df-8924-001ff3591711',
        1267414277402340: 'f0c8d718-24e2-11df-8924-001ff3591711',
        1267414305866969: 'f9e6d804-24e2-11df-8924-001ff3591711',
        1267414319522925: '02ccb5ec-24e3-11df-8924-001ff3591711',
    },
}
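The layout above replaces a join with two key lookups: the user's row in Userline lists tweet ids ordered by timestamp, and each id is then fetched from Tweet. The following is a minimal sketch of that read path over plain dictionaries, not Twissandra's actual code. The `timeline` function is hypothetical, and the body of the second tweet is made up for illustration because the example data gives only one tweet body.

```python
Tweet = {
    '7561a442-24e2-11df-8924-001ff3591711': {
        'username': 'hermes',
        'body': 'Trying out Twissandra. This is awesome!',
    },
    'f0c8d718-24e2-11df-8924-001ff3591711': {
        'username': 'hermes',
        'body': '(hypothetical second tweet)',  # not part of the original data
    },
}

Userline = {
    'hermes': {
        # timestamp of tweet: tweet id
        1267414247561777: '7561a442-24e2-11df-8924-001ff3591711',
        1267414277402340: 'f0c8d718-24e2-11df-8924-001ff3591711',
    },
}


def timeline(username, limit=10):
    """Return the bodies of a user's most recent tweets, newest first."""
    columns = Userline.get(username, {})
    # Column names are timestamps, so sorting them in reverse gives recency order.
    newest_first = sorted(columns.items(), reverse=True)[:limit]
    return [Tweet[tweet_id]['body'] for _, tweet_id in newest_first]
```

The sort is done in Python only because a dict is used here; in a store that keeps columns ordered by name, the same result would come from a reversed slice of the row.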
Principles:
Keyspace partitioning:
(Dynamo) (Riak)
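Dynamo-style stores partition the key space with consistent hashing: nodes and row keys are hashed onto the same ring, and a key belongs to the first node found clockwise from its position. The sketch below illustrates that idea under simplifying assumptions (one token per node, MD5 chosen only as a convenient hash). The `Ring` class and its methods are hypothetical names, not the API of Dynamo, Riak, or Cassandra.

```python
import bisect
import hashlib


def token(key):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Each node owns the arc that ends at its own token.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, row_key):
        """Return the first node clockwise from the key's token."""
        index = bisect.bisect_left(self._tokens, token(row_key))
        return self._ring[index % len(self._ring)][1]  # wrap past the last token

    def replicas_for(self, row_key, count):
        """Return the owner plus the next nodes clockwise, `count` in total."""
        start = bisect.bisect_left(self._tokens, token(row_key))
        count = min(count, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]
```

The property that matters for scalability is that adding a node only takes keys away from its clockwise neighbour; all other keys stay where they were.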
Operations:
(Write) (Read)
Involved data-structures:
- sorted index;
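A sorted index is what makes slices over column names cheap: once the names are kept in order, a range is located with two binary searches instead of a scan. Here is a minimal in-memory sketch using Python's `bisect` module. The `SortedColumns` class is a hypothetical illustration, not Cassandra's internal structure.

```python
import bisect


class SortedColumns:
    """Columns of one row, kept sorted by column name."""

    def __init__(self):
        self._names = []   # sorted column names (the index)
        self._values = {}  # column name -> column value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # keep the index ordered
        self._values[name] = value            # an existing column is overwritten

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(name, self._values[name]) for name in self._names[lo:hi]]
```

With timestamps as column names, as in the Userline column family, `slice(t1, t2)` returns the tweets posted between two moments, which is the kind of range query available even without a general query language.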
Links:
- Cassandra Data Model (from the 0.6 release documentation);
- Cassandra Data Model (the current explanation);
- Cassandra Data Model (a work-in-progress explanation);
- see also my Cassandra assessment;