Big Data Engineering, Practices and Research: April 2010

** 0.5.1

Test Cluster:

Dell PowerEdge 2950, 1 × Intel Xeon 5310 CPU (4 cores)

5 nodes

1 node: 2GB heap for Cassandra JVM

4 nodes: 4GB heap for Cassandra JVM

Commit-log and Data stored on same disks.

25 client threads run on 5 nodes.

Data Model:

Keyspace Name = “Test”

Column Family Name = “ABC”

CompareWith for Column = LongType

Column Name = Timestamp (LongType), Value = 400 bytes binary

Billions of keys, thousands of columns.

Partitioner = dht.RandomPartitioner

MemtableSizeInMB = 64

ReplicationFactor = 3

Use Thrift Client Interface

Client.insert(..)

Consistency Level (write) = 1
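The column layout described above can be sketched as follows. This is only an illustration of the payload shape, not the original test client: the helper name `make_column`, the use of milliseconds for the timestamp, and the random value bytes are assumptions. The actual Client.insert(..) call is left out because it needs the Thrift-generated Cassandra bindings.

```python
import os
import struct
import time

VALUE_SIZE = 400  # bytes per column value, from the data model above


def make_column(timestamp_ms=None):
    """Build one (name, value) pair shaped like the test data.

    The column name is a timestamp packed as an 8-byte big-endian long,
    which is the width a LongType comparator expects. Milliseconds are an
    assumption; the post does not state the unit.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    name = struct.pack(">q", timestamp_ms)
    value = os.urandom(VALUE_SIZE)  # random filler, assumed for illustration
    return name, value


name, value = make_column(1271635200000)
print(len(name), len(value))  # prints: 8 400
```

For non-negative timestamps, the packed names sort in the same order as the numbers themselves, so columns in a row stay ordered by time.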

Total inserted 1,076,333,461 columns.

Disk use: 302 GB + 283 GB + 335 GB + 186 GB + 276 GB = 1,382 GB (rough estimate: 400 B × ~1 G columns ≈ 400 GB, × 3 replicas ≈ 1,200 GB)
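The disk-use arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes decimal gigabytes (1 GB = 10^9 bytes) and counts only the 400-byte values, so keys, column names, timestamps and index files all show up as "overhead".

```python
# Per-node disk use reported in the post, in GB.
per_node_gb = [302, 283, 335, 186, 276]
measured_gb = sum(per_node_gb)

columns = 1_076_333_461      # total columns inserted
value_bytes = 400            # value size per column
replication_factor = 3

# Raw value bytes across all replicas, in decimal GB (assumption).
raw_gb = columns * value_bytes * replication_factor / 1e9

# Ratio of measured disk use to raw replicated value bytes.
overhead = measured_gb / raw_gb

# Even spread of the raw bytes over the 5 nodes, for comparison.
even_share_gb = raw_gb / len(per_node_gb)

print(measured_gb)               # prints: 1382
print(round(raw_gb, 1))          # prints: 1291.6
print(round(overhead, 2))        # prints: 1.07
print(round(even_share_gb, 1))   # prints: 258.3
```

Using the exact column count rather than the rounded 1 G puts the expected raw size at about 1,292 GB, so the measured 1,382 GB is roughly 7% above the value bytes alone. The per-node figures (186 GB to 335 GB) also spread noticeably around the even share of about 258 GB.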

While inserting: 1000 SSTables on each node. Query latency is about 1 to 3 seconds.

After a long quiet period: 10 SSTables (very big files; for example, one SSTable data file is 144GB).

Query latency is then in the millisecond range.

Result: 18,000 columns/second
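To put the throughput figure in context, the sketch below derives what it would imply if 18,000 columns/second were the average over the whole run. The post does not state the run duration, so the time figures are implied values under that assumption, not measurements.

```python
columns = 1_076_333_461   # total columns inserted
rate = 18_000             # columns per second, as reported
value_bytes = 400
replication_factor = 3

# Implied run time if the rate were sustained for the whole load (assumption).
seconds = columns / rate
hours = seconds / 3600

# Client-side payload rate, and the rate after replication across the cluster.
payload_mb_s = rate * value_bytes / 1e6
replicated_mb_s = payload_mb_s * replication_factor

print(round(hours, 1))            # prints: 16.6
print(round(payload_mb_s, 1))     # prints: 7.2
print(round(replicated_mb_s, 1))  # prints: 21.6
```

Only value bytes are counted here, so the real write volume per second (including keys, column names and commit-log traffic) would be somewhat higher.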

** 0.6.0

Only 4 nodes.

JVM GC is a problem with a big heap.

Memory and GC are always the bottleneck and a big issue for Java-based infrastructure software!

http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts

https://issues.apache.org/jira/browse/CASSANDRA-896 (LinkedBlockingQueue issue, fixed in jdk-6u19)

0.5.1 seems to have performed better.

0.6.0 uses more memory.

Slides: Cassandra 0.6.0 insert throughput (presentation by Schubert Zhang).


Monday, April 19, 2010

Cassandra Insert Throughput
