wiki.volution.ro -- CiprianDorinCraciun/Notes/Public/NoSqlComparison

Cassandra

it inherits part of Google's Bigtable architecture (the data file layout) and part of Amazon's Dynamo distribution model;

architectural simplicity:
- all nodes are equal in responsibility;
- there is no single point of failure;
- aware of node organization into racks and data-centers;
data consistency:
- the consistency level can be tweaked: one, quorum, or all (ref);
- uses a commit log, which guarantees recovery (ref, ref);
- the number of replicas can be configured;
data IO:
- high write throughput (ref, ref);
network interface:
- the Thrift interface is brittle and can easily bring a node down (ref);
backups:
- it is possible to back up and restore each node individually (ref);
deployment:
- to deploy, just copy the jars and the config file, then start the node;
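
As a rough illustration of the consistency levels listed above, the sketch below computes how many replicas must answer for one, quorum, and all, given a configured number of replicas. The function names are hypothetical and this is not the Cassandra client API; it only assumes the usual definition of a quorum as a strict majority of the replicas.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Return how many replicas must acknowledge an operation.

    Hypothetical helper for illustration only; not part of any Cassandra API.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    level = level.lower()
    if level == "one":
        return 1
    if level == "quorum":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "all":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for level in ("one", "quorum", "all"):
        print(level, replicas_required(level, 3))
    print(read_sees_latest_write("quorum", "quorum", 3))
    print(read_sees_latest_write("one", "one", 3))
```

With three replicas, writing and reading at quorum touches two replicas each, so the two sets always share at least one node; writing and reading at one does not give that overlap.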

data distribution:
- with the ring topology and the initial tokens, the data can easily become unbalanced when the order-preserving partitioner is used (by default it uses the random partitioner, which hashes the keys and effectively turns the store into a hash table);
data consistency:
- conflicts are resolved based on the client-provided timestamp;
data IO:
- reads can go to any node, which then proxies them to the replicas (ref);
- the proxy turns one read into multiple reads against multiple replicas (see the consistency level) (ref);
- high read latency (ref);
- limited support for in-memory caching (it is unclear from the documentation how caching is done, but I suspect that it always reads from the file, on the assumption that the operating system handles file system caching);
schema:
- it is unclear whether adding a new column family requires a cluster restart;
Java heritage:
- it is quite easy to trigger an OutOfMemory condition and bring down a node (ref);
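
The data distribution point above can be illustrated with a toy model of the ring. This is a minimal sketch, not Cassandra's implementation: the token space size, the evenly spaced initial tokens, the way the order-preserving token is derived from the key bytes, and all function names are assumptions made for illustration. The only detail taken from Cassandra is that the random partitioner derives tokens from an MD5 hash of the key.

```python
import bisect
import hashlib
from collections import Counter

TOKEN_SPACE = 2 ** 128  # toy token space; the size is an assumption


def node_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial tokens, one per node (hypothetical setup)."""
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


def order_preserving_token(key: str) -> int:
    """Token that keeps key order: the first 16 key bytes as a big-endian integer."""
    raw = key.encode("utf-8")[:16].ljust(16, b"\x00")
    return int.from_bytes(raw, "big")


def random_token(key: str) -> int:
    """Token derived from the MD5 hash of the key."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def owner(token: int, tokens: list[int]) -> int:
    """Index of the first node whose token is >= the key token, wrapping around the ring."""
    return bisect.bisect_left(tokens, token) % len(tokens)


def distribution(keys, tokens, token_fn) -> list[int]:
    """Number of keys owned by each node, in node order."""
    counts = Counter(owner(token_fn(k), tokens) for k in keys)
    return [counts.get(i, 0) for i in range(len(tokens))]


if __name__ == "__main__":
    tokens = node_tokens(4)
    keys = [f"user{i:05d}" for i in range(1000)]
    print("order preserving:", distribution(keys, tokens, order_preserving_token))
    print("random:          ", distribution(keys, tokens, random_token))
```

In this model, keys that share a prefix such as "user" map to neighbouring tokens, so the order-preserving variant places all of them on a single node, while the hashed variant spreads them across the ring at the cost of losing key order.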