Monday, April 19, 2010

Cassandra Insert Throughput

** 0.5.1

Test Cluster:
DELL 2950 1*CPU Intel Xeon 5310 (4 cores)
5 nodes
1 node: 2GB heap for Cassandra JVM
4 nodes: 4GB heap for Cassandra JVM

Commit log and data are stored on the same disks.
25 client threads run on 5 nodes.

Data Model:
Keyspace Name = “Test”
Column Family Name = “ABC”
CompareWith for Column = LongType
Column Name = Timestamp (LongType), Value = 400 bytes binary
Billions of keys, thousands of columns.
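As an illustration of the column layout above, here is a minimal sketch of how one such column could be built on the client side. It assumes the timestamp is in milliseconds and that a LongType name is sent as an 8-byte big-endian long; both are assumptions (check the byte order expected by your Cassandra version), and `make_column` is a hypothetical helper, not part of any Cassandra client.

```python
import os
import struct
import time

VALUE_SIZE = 400  # bytes per column value, as in the data model above


def make_column(timestamp_ms):
    """Build one (name, value) pair for the "ABC" column family.

    Assumption: the LongType column name is an 8-byte big-endian long.
    The value is 400 bytes of arbitrary binary data.
    """
    name = struct.pack(">q", timestamp_ms)  # 8-byte signed long, big-endian
    value = os.urandom(VALUE_SIZE)          # stand-in for the real payload
    return name, value


if __name__ == "__main__":
    # Assumption: timestamps are milliseconds since the epoch.
    name, value = make_column(int(time.time() * 1000))
    print(len(name), len(value))
```

With big-endian encoding, non-negative timestamps sort the same way as raw bytes as they do as numbers, which keeps the columns of a row in time order.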

Partitioner = dht.RandomPartitioner
MemtableSizeInMB = 64
ReplicationFactor = 3

Client: Thrift client interface
Consistency Level (write) = ONE

Total inserted 1,076,333,461 columns.
Disk use: 302GB + 283GB + 335GB + 186GB + 276GB = 1,382GB (rough estimate: 400 bytes * 1G columns = 400GB, * 3 replicas = 1,200GB)
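The estimate above can be redone with the exact column count. This is a sketch only: it counts just the 400-byte values (no column names, keys, indexes, or other SSTable overhead) and assumes 1GB = 10^9 bytes, which may not match how the disk figures were measured.

```python
COLUMNS = 1_076_333_461                   # total inserted columns
VALUE_BYTES = 400                         # payload per column
REPLICATION_FACTOR = 3
NODE_DISK_GB = [302, 283, 335, 186, 276]  # measured disk use per node

GB = 10**9  # assumption: decimal gigabytes

# Payload only: column names, row keys and index files are not counted.
raw_payload_gb = COLUMNS * VALUE_BYTES / GB
replicated_gb = raw_payload_gb * REPLICATION_FACTOR
measured_gb = sum(NODE_DISK_GB)
overhead_gb = measured_gb - replicated_gb

if __name__ == "__main__":
    print(f"payload, one copy:      {raw_payload_gb:8.1f} GB")
    print(f"payload, {REPLICATION_FACTOR} replicas:     {replicated_gb:8.1f} GB")
    print(f"measured on disk:       {measured_gb:8.1f} GB")
    print(f"difference (overhead):  {overhead_gb:8.1f} GB")
```

Under these assumptions, the measured total is within roughly 10% of the replicated payload size.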

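While inserting: about 1,000 SSTables on each node. The latency of a query is about 1 to 3 seconds.
After the cluster has been quiet for a long time (presumably once compaction has caught up): about 10 SSTables per node, some of them very big (for example, one SSTable data file is 144GB).
The latency of a query is then in milliseconds.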

Result: 18,000 columns/second
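A few derived numbers help put this result in context. The sketch below assumes the 18,000 columns/second rate was sustained over the whole run and that the 25 client threads are the total across all nodes; neither is stated explicitly above, so treat the output as rough estimates only.

```python
RATE_COLUMNS_PER_S = 18_000       # measured insert throughput
TOTAL_COLUMNS = 1_076_333_461     # total inserted columns
VALUE_BYTES = 400                 # payload per column
CLIENT_THREADS = 25               # assumption: total threads, not per node

# Payload bandwidth, before replication and ignoring names and keys.
payload_mb_per_s = RATE_COLUMNS_PER_S * VALUE_BYTES / 1e6

# Average rate each client thread would have to sustain.
columns_per_thread_per_s = RATE_COLUMNS_PER_S / CLIENT_THREADS

# Assumption: the rate held constant for the whole load.
load_hours = TOTAL_COLUMNS / RATE_COLUMNS_PER_S / 3600

if __name__ == "__main__":
    print(f"payload bandwidth:  {payload_mb_per_s:.1f} MB/s")
    print(f"per client thread:  {columns_per_thread_per_s:.0f} columns/s")
    print(f"estimated load time: {load_hours:.1f} hours")
```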

** 0.6.0
Only 4 nodes.

JVM GC is a problem with a big heap.
Memory and GC always seem to be the bottleneck and a big issue for Java-based infrastructure software! (LinkedBlockingQueue issue, fixed in jdk-6u19)

It seems 0.5.1 performed better.
0.6.0 eats more memory.

