Even though we have removed Cassandra from all our products, we would like to share our work here.
Why did we remove Cassandra from our products? Because:
(1) There are major flaws in Cassandra's implementation, especially in its local storage engine layer, i.e. the SSTable and indexing.
(2) Combining Bigtable and Dynamo is a fundamental mistake. Dynamo's hash-ring architecture is an obsolete technology for scaling, and its consistency and replication policies are also unusable for big-data storage.
Sunday, June 12, 2011
Saturday, July 10, 2010
My comments to "Cassandra at Twitter Today"
Someone said the Twitter blog post "Cassandra at Twitter Today" is a big blow to Cassandra's reputation.
It is being heatedly discussed at http://news.ycombinator.com/item?id=1502756
Here are my comments:
1. Cassandra is very young! In particular, the design and implementation of its local storage and local indexing are immature.
2. Poor read performance is also due to the poor local storage implementation.
3. The local storage, indexing, and persistence structures are not stable. They need to be re-designed and re-implemented. If Twitter moves data to the current Cassandra, they will have to migrate again later for the new local storage, indexing, and persistence structures.
4. There are many good techniques in Cassandra and other open-source projects (such as Hadoop, HBase, etc.), but they are not production-ready. Understand the details of these techniques and implement them in your own projects/products.
Monday, April 19, 2010
Cassandra Insert Throughput
** 0.5.1
Test Cluster:
DELL 2950, 1× Intel Xeon 5310 CPU (4 cores)
5 nodes
1 node: 2GB heap for the Cassandra JVM
4 nodes: 4GB heap for the Cassandra JVM
Commit log and data are stored on the same disks.
25 client threads running across the 5 nodes.
Data Model:
Keyspace Name = “Test”
Column Family Name = “ABC”
CompareWith for Column = LongType
Column Name = Timestamp (LongType), Value = 400 bytes binary
Billions of keys, thousands of columns per key.
Partitioner = dht.RandomPartitioner
MemtableSizeInMB = 64MB
ReplicationFactor = 3
Uses the Thrift client interface:
Client.insert(..)
Consistency Level (write) = 1
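The insert pattern above can be sketched as follows. This is an illustrative Python sketch (the helper name `make_column` and the row-key format are my own, not from the test): a LongType column name is an 8-byte big-endian long, here derived from a timestamp, and each value is a 400-byte binary blob.

```python
import os
import struct
import time

def make_column(timestamp_micros):
    """Build one (name, value) pair as inserted in the test:
    a LongType column name (8-byte big-endian long) and a
    400-byte binary value."""
    name = struct.pack(">q", timestamp_micros)  # big-endian keeps LongType sort order
    value = os.urandom(400)                     # 400 bytes of binary payload
    return name, value

# One row key holds thousands of timestamp-named columns;
# the test spreads billions of such keys across the cluster.
key = "row-0000000001"
name, value = make_column(int(time.time() * 1_000_000))
print(len(name), len(value))  # 8 400
```

Because the comparator is LongType, columns within a row stay sorted by their numeric timestamp, which is what makes time-ordered reads of a row cheap.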
Total inserted 1,076,333,461 columns.
Disk use: 302GB + 283GB + 335GB + 186GB + 276GB = 1,382GB (estimate: 400B × ~1G columns = 400GB, × 3 replicas = 1,200GB)
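As a back-of-envelope check on those numbers (my arithmetic, not part of the original test): the value bytes alone account for about 1,200GB after replication, so the remaining ~180GB is plausibly per-column overhead (column names, timestamps, and SSTable index/filter structures).

```python
# Sanity-check the disk usage: ~1.076 billion 400-byte values, replicated 3x.
columns = 1_076_333_461
raw_gb = columns * 400 / 2**30        # value bytes only, in GiB
replicated_gb = raw_gb * 3            # ReplicationFactor = 3
measured_gb = 302 + 283 + 335 + 186 + 276
print(round(raw_gb), round(replicated_gb), measured_gb)  # 401 1203 1382
```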
While inserting: about 1,000 SSTables on each node; query latency is about 1–3 seconds.
After the cluster has been idle for a long time (compaction completed): about 10 SSTables per node (very big files, e.g. one 144GB SSTable data file); query latency is in milliseconds.
Result: 18,000 columns/second aggregate insert throughput.
** 0.6.0
Only 4 nodes.
JVM GC with a big heap.
Memory and GC are always the bottleneck and a big issue for Java-based infrastructure software!
https://issues.apache.org/jira/browse/CASSANDRA-896 (LinkedBlockingQueue issue, fixed in jdk-6u19)
It seems 0.5.1 performed better; 0.6.0 eats more memory.
Cassandra 0.6.0 insert throughput