Monday, August 17, 2009

Hybrid row and column store! Hybrid lookup and MapReduce queries?

- Hybrid row and column store!

In our own practice, we have found that a hybrid of row-oriented and column-oriented storage is a realistic choice. The inspiration came from Bigtable's column-family concept.

Now Vertica 3.5 has moved from a pure columnar store to a hybrid one. The feature is called "Column Grouping", and it is the major part of FlexStore, Vertica's set of enhancements for storing and processing columnar data. I take FlexStore to mean "Flexible Store": users can define their column groups flexibly.
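To make column grouping concrete, here is a minimal Python sketch, assuming a simple in-memory layout. It is not Vertica's FlexStore implementation; the class and method names (`ColumnGroupedTable`, `insert`, `scan`, `groups_for`) are hypothetical. Columns are partitioned into user-defined groups; each group is stored row-wise as tuples, while different groups live in separate stores, so a scan reads only the groups that hold the requested columns.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple


class ColumnGroupedTable:
    """Hypothetical hybrid layout (a sketch, not Vertica's actual FlexStore code).

    Columns are partitioned into user-defined groups. Inside a group, values
    are stored row-wise as tuples; across groups, each group has its own
    separate store, so a scan only reads the groups holding the requested columns.
    """

    def __init__(self, groups: Dict[str, Sequence[str]]) -> None:
        self.groups: Dict[str, Tuple[str, ...]] = {g: tuple(cols) for g, cols in groups.items()}
        # Map each column to (group name, position inside that group's tuples).
        self.locate: Dict[str, Tuple[str, int]] = {}
        for g, cols in self.groups.items():
            for i, col in enumerate(cols):
                if col in self.locate:
                    raise ValueError(f"column {col!r} is assigned to more than one group")
                self.locate[col] = (g, i)
        # One row-oriented store (a list of tuples) per column group.
        self.stores: Dict[str, List[Tuple[Any, ...]]] = {g: [] for g in self.groups}
        self.num_rows = 0

    def insert(self, row: Dict[str, Any]) -> None:
        # Split the incoming row into one tuple per group first, so a missing
        # column raises KeyError before any group store is modified.
        split = {g: tuple(row[c] for c in cols) for g, cols in self.groups.items()}
        for g, values in split.items():
            self.stores[g].append(values)
        self.num_rows += 1

    def groups_for(self, columns: Sequence[str]) -> Set[str]:
        # The set of group stores a scan over `columns` would have to read.
        return {self.locate[c][0] for c in columns}

    def scan(self, columns: Sequence[str]) -> List[Tuple[Any, ...]]:
        # Reassemble the requested columns row by row, reading only the needed groups.
        positions = [self.locate[c] for c in columns]
        return [tuple(self.stores[g][r][i] for g, i in positions) for r in range(self.num_rows)]


t = ColumnGroupedTable({"hot": ["id", "price"], "cold": ["description", "notes"]})
t.insert({"id": 1, "price": 9.5, "description": "pen", "notes": ""})
t.insert({"id": 2, "price": 3.0, "description": "cup", "notes": "fragile"})
print(t.scan(["id", "price"]))        # [(1, 9.5), (2, 3.0)]
print(t.groups_for(["id", "price"]))  # {'hot'}
```

The two extremes fall out as special cases: one group holding every column is a pure row store, and one group per column is a pure column store. Grouping columns that are usually read together sits in between.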

Hybrid is the trend. I like Bigtable's model abstraction; it is simple and flexible.

- Hybrid lookup and MapReduce queries?

Low-latency lookups and high-latency ad-hoc MapReduce queries seem to contradict each other. I am not sure it makes sense to support both in one data system, but sometimes it seems to be needed.
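As a rough illustration of what "both in one data system" could mean, here is a minimal single-process Python sketch, assuming a hypothetical `HybridStore` class: the same records are reachable through a hash-index lookup (touching one record) and through a MapReduce-style full scan (touching every record). In a real system the scan path would run as distributed batch jobs, for example on Hadoop; everything here runs in one process and the names are illustrative only.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Any]


class HybridStore:
    """Hypothetical store exposing two access paths over the same records."""

    def __init__(self) -> None:
        # A hash index keyed by primary key serves point lookups.
        self.records: Dict[str, Record] = {}

    def put(self, key: str, record: Record) -> None:
        self.records[key] = record

    def lookup(self, key: str) -> Optional[Record]:
        # Low-latency path: a single hash probe touches one record.
        return self.records.get(key)

    def map_reduce(
        self,
        mapper: Callable[[str, Record], Iterable[Tuple[Any, Any]]],
        reducer: Callable[[Any, List[Any]], Any],
    ) -> Dict[Any, Any]:
        # High-latency path: scan every record, group the mapped (key, value)
        # pairs by key (the "shuffle"), then reduce each group.
        groups: Dict[Any, List[Any]] = defaultdict(list)
        for key, record in self.records.items():
            for out_key, value in mapper(key, record):
                groups[out_key].append(value)
        return {k: reducer(k, vs) for k, vs in groups.items()}


store = HybridStore()
store.put("u1", {"country": "CN", "clicks": 3})
store.put("u2", {"country": "US", "clicks": 5})
store.put("u3", {"country": "CN", "clicks": 4})
print(store.lookup("u2"))  # {'country': 'US', 'clicks': 5}
totals = store.map_reduce(lambda k, r: [(r["country"], r["clicks"])], lambda k, vs: sum(vs))
print(totals)  # {'CN': 7, 'US': 5}
```

The tension shows up in the cost of each path: `lookup` does constant work per call, while `map_reduce` always reads the whole data set, so a system serving both has to keep long scans from degrading lookup latency.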

Hive is one of the best examples of an easy-to-use tool for expressing MapReduce jobs, or a data warehouse built on top of them. It offers no real-time lookup, though. In fact, after reading about the DAG abstraction in the Hive paper at VLDB'09, I realized that merging MapReduce into SQL is not easy work.

As expected, Dr. Stonebraker's Vertica 3.5 now also integrates with Hadoop MapReduce. HadoopDB, meanwhile, is Hive + Hadoop + PostgreSQL. Unlike Greenplum, AsterData, and HadoopDB, Vertica does not integrate MapReduce into SQL for now.

