Big Data Engineering, Practices and Research: DBMS

Daniel Abadi has a blog post here:

http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html

I want to leave a comment on it, and I am posting my correction here:

Comparing the two groups is meaningless, because they target different applications. I think the post just adds to the confusion. What Mr. Stonebraker said is also not right.

I think the only source of this confusion is the term "column" in Group A. It is not the traditional "column" of the RDBMS world, and your example of a traditional spreadsheet table is not the real target of Group A either. The "column" name in Group A is in fact data (not schema).

If we have to change the term, I think we could rename "column" in Group A to "end-key". In fact, Bigtable itself uses the term "qualifier" for it.

In short:

(1) Group A's "column" is in data, not schema.

(2) Group B's "column" is in schema.

They differ in concept and in target application.
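To make the distinction concrete, here is a minimal sketch of the two data models. The table contents and the names group_a, group_b, qualifiers, and columns are hypothetical, chosen only for illustration; they are not taken from Bigtable or from any column-store product.

```python
# Group A (Bigtable-style): each row is a sparse map. The "column" name
# (qualifier) is stored with the data and can differ from row to row.
group_a = {
    "user:1": {"friend:alice": "2009-10-01", "friend:bob": "2009-10-02"},
    "user:2": {"friend:carol": "2010-03-30"},
}

# Group B (column-store RDBMS): the columns are fixed by the schema, and
# the values of each column are stored together.
group_b_schema = ("user_id", "friend", "since")
group_b = {
    "user_id": [1, 1, 2],
    "friend": ["alice", "bob", "carol"],
    "since": ["2009-10-01", "2009-10-02", "2010-03-30"],
}


def qualifiers(table):
    """Collect the distinct qualifiers; this requires scanning the data."""
    return sorted({q for row in table.values() for q in row})


def columns(schema):
    """The columns are known from the schema without reading any data."""
    return list(schema)
```

In the Group A table, adding a friend adds a new qualifier; in the Group B table, it adds a new row and leaves the three columns unchanged.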

Recently, two well-known vendors of analytic DBMSs, Vertica and Aster Data, announced their integration with Hadoop. Analytic DBMSs and Hadoop each address distinct but complementary problems in managing large data.

Vertica:

Currently it is a light integration.

ETL, ELT, data cleansing, data mining, etc.
Moving data between Hadoop and Vertica.
InputFormat (InputSplit, VerticaRecord; pushes relational map operations down by parameterizing the database query).
OutputFormat (writes to an existing table or creates a new one).
Hadoop developers can easily push Map operations down to Vertica databases in parallel by specifying parameterized queries, which yield pre-aggregated data for each mapper.
Supports the Hadoop Streaming interface.
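The idea of pushing map work down through a parameterized query can be sketched as follows. This is not Vertica's connector API: sqlite3 stands in for the database so that the sketch runs anywhere, and the clicks table, QUERY, PARAMS, and mapper_input are hypothetical names.

```python
import sqlite3

# Stand-in for a database table (assumption: sqlite3 replaces Vertica here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (region TEXT, url TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [("east", "/a", 3), ("east", "/b", 2), ("west", "/a", 5), ("west", "/a", 1)],
)

# One parameterized query; each parameter value defines one input split.
QUERY = (
    "SELECT url, SUM(hits) FROM clicks "
    "WHERE region = ? GROUP BY url ORDER BY url"
)
PARAMS = ["east", "west"]


def mapper_input(param):
    """Rows one mapper would receive, already filtered and aggregated."""
    return conn.execute(QUERY, (param,)).fetchall()


# Each "mapper" gets the pre-aggregated result for its own parameter.
splits = {p: mapper_input(p) for p in PARAMS}
```

Because the filtering and the SUM run inside the database, each mapper starts from a small pre-aggregated result instead of the raw rows.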

Typical usages:
(1) Raw data -> Hadoop (ETL) -> Vertica (for fast ad-hoc queries, near real time)
(2) Vertica -> Hadoop (ETL) -> Vertica (for fast ad-hoc queries, near real time)
(3) Vertica -> Hadoop (sophisticated queries for analysis or mining)

We can expect to see tighter integration and higher performance.

References
[1] The Scoop on Hadoop and Vertica: http://databasecolumn.vertica.com/2009/09/the_scoop_on_hadoop_and_vertic.html
[2] Using Vertica as a Structured Data Repository for Apache Hadoop: http://www.vertica.com/MapReduce
[3] Cloudera DBInputFormat interface: http://www.cloudera.com/blog/2009/03/06/database-access-with-hadoop/
[4] Managing Big Data with Hadoop and Vertica: http://www.vertica.com/resourcelogin?type=pdf&item=ManagingBigDatawithHadoopandVertica.pdf

AsterData:

AsterData already provides in-database MapReduce.

The new Aster-Hadoop Data Connector uses Aster’s patent-pending SQL-MapReduce capabilities for two-way, high-speed data transfer between Apache Hadoop and Aster Data’s massively parallel data warehouse.

Users can run ETL processing or data mining in Hadoop, and then pull that data into Aster for interactive queries or ad-hoc analytics on massive data scales.
The Connector utilizes key new SQL-MapReduce functions to provide ultra-fast, two-way data loading between HDFS (Hadoop Distributed File System) and Aster Data’s MPP Database.
Parallel loader.
LoadFromHadoop: Parallel data loading from HDFS to Aster nCluster.
LoadToHadoop: Parallel data loading from Aster nCluster to HDFS.
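The parallel loading described above can be illustrated with a toy sketch. It does not use Aster's LoadFromHadoop function or any real HDFS client: the part files are in-memory stand-ins, and part_files, load_part, and parallel_load are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for HDFS part files produced by a Hadoop job.
part_files = {
    "part-00000": ["a\t1", "b\t2"],
    "part-00001": ["c\t3"],
    "part-00002": ["d\t4", "e\t5", "f\t6"],
}


def load_part(name):
    """Parse one part file into (key, value) rows, as one loader would."""
    rows = []
    for line in part_files[name]:
        key, value = line.split("\t")
        rows.append((key, int(value)))
    return rows


def parallel_load(names, workers=3):
    """Load every part file concurrently and gather rows in file order."""
    loaded = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input names.
        for rows in pool.map(load_part, names):
            loaded.extend(rows)
    return loaded


table = parallel_load(sorted(part_files))
```

Each worker handles a disjoint set of files, so the load time shrinks as workers are added, as long as the source and the target can keep up.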

Key advantages of Aster’s Hadoop Connector include:

High-performance: Fast, parallel data transfer between Hadoop and Aster nCluster.
Ease-of-use: Analysts can now seamlessly invoke a SQL command for ultra-simple import of Hadoop-MapReduce jobs, for deeper data analysis. Aster intelligently and automatically parallelizes the load.
Data Consistency: Aster Data's data integrity and transactional consistency capabilities treat the data load as a 'transaction', ensuring that the data load or export is always consistent and can be carried out while other queries are running in parallel in Aster.
Extensibility: Customers can easily further extend the Connector using SQL-MapReduce, to provide further customization for their specific environment.

The typical usages are similar to those for Vertica.

References
[1] Aster Data Announces Seamless Connectivity With Hadoop: http://www.nearshorejournal.com/2009/10/aster-data-announces-seamless-connectivity-with-hadoop/
http://www.asterdata.com/news/091001-Aster-Hadoop-connector.php
[2] DBMS2 - MapReduce tidbits http://www.dbms2.com/2009/10/01/mapreduce-tidbits/#more-983
[3] Aster Data Blog: Aster Data Seamlessly Connects to Hadoop, http://www.asterdata.com/blog/index.php/2009/10/05/aster-data-seamlessly-connects-to-hadoop/

Another case of integrating an analytic DBMS with Hadoop is the HadoopDB project: http://db.cs.yale.edu/hadoopdb/hadoopdb.html

Tuesday, March 30, 2010

Please don't puzzle on Column-Stores

Friday, October 2, 2009

The Integration of Analytic DBMS and Hadoop
