Monday, April 19, 2010

Cassandra Insert Throughput

** 0.5.1

Test Cluster:
DELL 2950 1*CPU Intel Xeon 5310 (4 cores)
5 nodes
1 node: 2GB heap for Cassandra JVM
4 nodes: 4GB heap for Cassandra JVM

Commit log and data are stored on the same disks.
25 client threads run on 5 nodes.

Data Model:
Keyspace Name = “Test”
Column Family Name = “ABC”
CompareWith for Column = LongType
Column Name = Timestamp (LongType), Value = 400 bytes binary
Billions of keys, thousands of columns.
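
As a rough illustration of the data model above, here is a minimal sketch of how one column could be built on the client side before it is handed to the Thrift insert call. The helper names (encode_long_column_name, make_column) are hypothetical, and the millisecond timestamp unit is an assumption; the notes only say that the column name is a timestamp compared as LongType and that the value is 400 bytes of binary data. LongType compares names as 8-byte longs, so the name is packed as an 8-byte big-endian value.

```python
import os
import struct
import time

VALUE_SIZE = 400  # bytes per column value, as stated in the data model


def encode_long_column_name(timestamp: int) -> bytes:
    """Pack a timestamp as an 8-byte big-endian signed long (LongType layout)."""
    return struct.pack(">q", timestamp)


def make_column(timestamp=None):
    """Return a (name, value) pair shaped like one column in this test.

    The millisecond unit is an assumption; the original notes do not
    say which unit the timestamp used.
    """
    if timestamp is None:
        timestamp = int(time.time() * 1000)
    name = encode_long_column_name(timestamp)
    value = os.urandom(VALUE_SIZE)  # stand-in for the 400-byte binary payload
    return name, value


if __name__ == "__main__":
    name, value = make_column()
    print(len(name), len(value))
```

For non-negative timestamps, the packed names sort in the same order as the numbers themselves, so columns within a row stay in time order.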

Partitioner = dht.RandomPartitioner
MemtableSizeInMB = 64
ReplicationFactor = 3
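
To show why RandomPartitioner spreads keys evenly regardless of what the keys look like, here is a small sketch of the token computation as I understand it: the MD5 digest of the key, read as a signed big-endian integer, with the absolute value taken. This is an approximation written from memory, not code copied from Cassandra, and the function name random_partitioner_token is hypothetical.

```python
import hashlib


def random_partitioner_token(key: str) -> int:
    """Approximate a RandomPartitioner token: abs(signed big-endian MD5(key)).

    This is an assumption about how the partitioner hashes keys,
    not the actual Cassandra implementation.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


if __name__ == "__main__":
    for key in ("key-1", "key-2", "key-3"):
        print(key, random_partitioner_token(key))
```

Similar keys land on unrelated tokens, so with ReplicationFactor = 3 each key's three replicas end up spread around the ring rather than clustered on one node.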

Client: Thrift client interface
Call: Client.insert(..)
Consistency level (write) = 1 (ConsistencyLevel.ONE)

Total inserted 1,076,333,461 columns.
Disk use: 302GB + 283GB + 335GB + 186GB + 276GB = 1,382GB (rough estimate: 400B * 1G columns = 400GB, * 3 replicas = 1,200GB)
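
The disk arithmetic can be checked with a few lines of Python. This sketch uses only the figures from these notes and assumes decimal gigabytes (1GB = 10^9 bytes); it counts value bytes only, so column names, timestamps, keys, and index files all fall under "overhead". The function name disk_summary is made up for this example.

```python
NODE_DISK_GB = [302, 283, 335, 186, 276]  # per-node disk use from the notes
TOTAL_COLUMNS = 1_076_333_461
VALUE_BYTES = 400
REPLICATION_FACTOR = 3


def disk_summary():
    """Compare measured disk use with the raw replicated value payload."""
    measured_gb = sum(NODE_DISK_GB)
    # Assumes decimal gigabytes (10**9 bytes) and counts value bytes only.
    payload_gb = TOTAL_COLUMNS * VALUE_BYTES / 1e9
    replicated_gb = payload_gb * REPLICATION_FACTOR
    overhead_ratio = measured_gb / replicated_gb
    return measured_gb, replicated_gb, overhead_ratio


if __name__ == "__main__":
    measured, replicated, ratio = disk_summary()
    print(f"measured:   {measured} GB")
    print(f"replicated: {replicated:.1f} GB")
    print(f"ratio:      {ratio:.2f}")
```

Using the exact column count instead of the rounded 1G gives a somewhat larger expected payload than the 1,200GB estimate, which brings it closer to the measured total.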

While inserting: 1000 SSTables on each node. Query latency is about 1 to 3 seconds.
After a long quiet period: 10 SSTables on each node (very big files; for example, one SSTable data file is 144GB). Query latency is in milliseconds.

Result: 18,000 columns/second
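
To put the throughput figure in context, this sketch derives the implied run time and payload bandwidth from the numbers in these notes. It assumes that 18,000 columns/second is the aggregate rate for the whole cluster and that the rate was steady for the whole run; neither is stated explicitly above. The function name throughput_summary is made up for this example.

```python
TOTAL_COLUMNS = 1_076_333_461
COLUMNS_PER_SECOND = 18_000  # assumed to be the cluster-wide aggregate rate
VALUE_BYTES = 400
REPLICATION_FACTOR = 3


def throughput_summary():
    """Derive run time and payload bandwidth from the reported insert rate."""
    hours = TOTAL_COLUMNS / COLUMNS_PER_SECOND / 3600
    # Value bytes only; column names and Thrift framing are not counted.
    client_mb_per_s = COLUMNS_PER_SECOND * VALUE_BYTES / 1e6
    replicated_mb_per_s = client_mb_per_s * REPLICATION_FACTOR
    return hours, client_mb_per_s, replicated_mb_per_s


if __name__ == "__main__":
    hours, client_mb, replicated_mb = throughput_summary()
    print(f"implied run time:   {hours:.1f} hours")
    print(f"client payload:     {client_mb:.1f} MB/s")
    print(f"replicated payload: {replicated_mb:.1f} MB/s")
```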


** 0.6.0
Only 4 nodes in this test.

JVM GC is a problem with a big heap.
Memory and GC are always the bottleneck and a big issue for Java-based infrastructure software!
https://issues.apache.org/jira/browse/CASSANDRA-896 (LinkedBlockingQueue issue, fixed in jdk-6u19)

0.5.1 seems to have performed better.
0.6.0 eats more memory.