Vertica:
Currently the integration between Vertica and Hadoop is lightweight.
- ETL, ELT, data cleansing, data mining, etc.
- Moving data between Hadoop and Vertica.
- InputFormat (InputSplit, VerticaRecord; pushes relational map operations down to the database by parameterizing the query).
- OutputFormat (writes to an existing table or creates a new one).
- Hadoop developers can easily push Map operations down to Vertica databases in parallel by specifying parameterized queries, so that each mapper receives pre-aggregated data.
- Supports the Hadoop streaming interface.
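The parameterized-query push-down described above can be sketched roughly as follows. This is not the Vertica connector API: sqlite3 stands in for the database, and the table, the query, and the helper names (make_splits, run_mapper) are all hypothetical. The point is only that each parameter value becomes one split, and the database returns already-aggregated rows for that split.

```python
import sqlite3


def make_splits(params):
    """One split per parameter value (hypothetical: mirrors one input split per mapper)."""
    return [(p,) for p in params]


def run_mapper(conn, query, split):
    """Run the parameterized query for one split; the database does the aggregation."""
    return conn.execute(query, split).fetchall()


def demo():
    # sqlite3 is only a stand-in for the analytic database in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (region TEXT, url TEXT, hits INTEGER)")
    conn.executemany(
        "INSERT INTO clicks VALUES (?, ?, ?)",
        [("us", "/a", 3), ("us", "/b", 5), ("eu", "/a", 2), ("eu", "/a", 4)],
    )
    # The '?' placeholder is filled per split, so each mapper sees only its slice.
    query = (
        "SELECT url, SUM(hits) FROM clicks "
        "WHERE region = ? GROUP BY url ORDER BY url"
    )
    results = {}
    for split in make_splits(["us", "eu"]):
        results[split[0]] = run_mapper(conn, query, split)
    conn.close()
    return results
```

In a real job the loop over splits would be replaced by mappers running in parallel, each holding its own database connection.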
Typical usages:
(1) Raw Data -> Hadoop (ETL) -> Vertica (for fast ad-hoc query, near realtime)
(2) Vertica -> Hadoop (ETL) -> Vertica (for fast ad-hoc query, near realtime)
(3) Vertica -> Hadoop (sophisticated query for analysis or mining)
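As an illustration of the ETL step in usage (1), here is a minimal sketch of a Hadoop-streaming-style mapper that cleans raw records before they are loaded into the database. The input layout (timestamp,user,url) and the function names are assumptions made for this example; a streaming job would feed the mapper from standard input and write its output to standard output.

```python
def clean_record(line):
    """Parse a raw 'timestamp,user,url' line (hypothetical layout).

    Returns a tab-separated record, or None if the line is malformed.
    """
    parts = [p.strip() for p in line.strip().split(",")]
    if len(parts) != 3 or not all(parts):
        return None
    ts, user, url = parts
    # Normalize the user name so later joins and aggregations are consistent.
    return "\t".join((ts, user.lower(), url))


def map_stream(lines):
    """Yield cleaned records, silently dropping malformed input lines."""
    for line in lines:
        record = clean_record(line)
        if record is not None:
            yield record
```

Under Hadoop streaming, a small driver script would call map_stream(sys.stdin) and print each record; the tab-separated output can then be bulk-loaded into the target table.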
We can expect to see tighter integration and higher performance in the future.
References
[1] The Scoop on Hadoop and Vertica: http://databasecolumn.vertica.com/2009/09/the_scoop_on_hadoop_and_vertic.html
[2] Using Vertica as a Structured Data Repository for Apache Hadoop: http://www.vertica.com/MapReduce
[3] Cloudera DBInputFormat interface: http://www.cloudera.com/blog/2009/03/06/database-access-with-hadoop/
[4] Managing Big Data with Hadoop and Vertica: http://www.vertica.com/
AsterData:
AsterData already provides in-database MapReduce.
The new Aster-Hadoop Data Connector utilizes Aster’s patent-pending SQL-MapReduce capabilities for two-way, high-speed data transfer between Apache Hadoop and Aster Data’s massively parallel data warehouse.
- Use Hadoop for ETL processing or data mining, and then pull that data into Aster for interactive queries or ad-hoc analytics at massive data scale.
- The Connector utilizes key new SQL-MapReduce functions to provide ultra-fast, two-way data loading between HDFS (Hadoop Distributed File System) and Aster Data’s MPP Database.
- Parallel loader.
- LoadFromHadoop: Parallel data loading from HDFS to Aster nCluster.
- LoadToHadoop: Parallel data loading from Aster nCluster to HDFS.
Key advantages of Aster’s Hadoop Connector include:
- High-performance: Fast, parallel data transfer between Hadoop and Aster nCluster.
- Ease-of-use: Analysts can now seamlessly invoke a SQL command for ultra-simple import of Hadoop-MapReduce jobs, for deeper data analysis. Aster intelligently and automatically parallelizes the load.
- Data Consistency: Aster Data's data integrity and transactional consistency capabilities treat the data load as a 'transaction', ensuring that the data load or export is always consistent and can be carried out while other queries are running in parallel in Aster.
- Extensibility: Customers can easily further extend the Connector using SQL-MapReduce, to provide further customization for their specific environment.
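The "load as a transaction" idea from the Data Consistency point can be sketched as follows. This is not Aster's implementation or API: sqlite3 stands in for the warehouse, the events table and load_batches helper are hypothetical, and the batches are loaded sequentially rather than in parallel. It only shows the all-or-nothing behavior: if any batch fails, nothing from the load becomes visible.

```python
import sqlite3


def load_batches(conn, batches):
    """Load all batches in one transaction; roll back everything on any error."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            for batch in batches:
                conn.executemany(
                    "INSERT INTO events (id, payload) VALUES (?, ?)", batch
                )
        return True
    except sqlite3.Error:
        return False


def count_rows(conn):
    """Number of rows currently visible in the (hypothetical) events table."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

A failed load (for example, one that hits a duplicate key halfway through) leaves the table exactly as it was before the load started.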
The typical usages are similar to those for Vertica.
References
[1] Aster Data Announces Seamless Connectivity With Hadoop: http://www.nearshorejournal.com/2009/10/aster-data-announces-seamless-connectivity-with-hadoop/
http://www.asterdata.com/news/091001-Aster-Hadoop-connector.php
[2] DBMS2 - MapReduce tidbits: http://www.dbms2.com/2009/10/01/mapreduce-tidbits/#more-983
[3] Aster Data Blog: Aster Data Seamlessly Connects to Hadoop: http://www.asterdata.com/blog/index.php/2009/10/05/aster-data-seamlessly-connects-to-hadoop/
Another case of integrating an analytic DBMS with Hadoop is the HadoopDB project: http://db.cs.yale.edu/hadoopdb/hadoopdb.html