Recently, two well-known vendors of analytic DBMSs, Vertica and Aster Data, announced integration with Hadoop. Analytic DBMSs and Hadoop each address distinct but complementary problems in managing large data sets.
Vertica: Currently, this is a light integration.
- ETL, ELT, data cleansing, data mining, etc.
- Moving data between Hadoop and Vertica.
- InputFormat (InputSplit, VerticaRecord; pushes down relational map operations by parameterizing the database query).
- OutputFormat (writes to an existing table or creates a new one).
- Makes it easy for Hadoop developers to push down Map operations to Vertica databases in parallel by specifying parameterized queries, which return pre-aggregated data for each mapper.
- Supports the Hadoop streaming interface.
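The parameterized-query push-down can be illustrated with a small Python sketch. This is not Vertica's connector API. An in-memory stand-in is used instead: a sqlite3 database file plays the role of Vertica, and the `sales` table, `PUSHDOWN_QUERY`, `run_split` and `pushdown_map` are hypothetical names. Each parameter value acts as one input split, so each "mapper" receives rows that the database has already filtered and aggregated.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parameterized query: each mapper supplies one value for "?",
# so the filter and the GROUP BY aggregation run inside the database.
PUSHDOWN_QUERY = (
    "SELECT region, SUM(amount) FROM sales "
    "WHERE day = ? GROUP BY region ORDER BY region"
)


def load_sample_data(db_path):
    """Create a toy 'sales' table (sqlite3 stands in for Vertica here)."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount INTEGER)")
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?)",
            [
                ("2009-09-01", "east", 10),
                ("2009-09-01", "east", 5),
                ("2009-09-01", "west", 7),
                ("2009-09-02", "east", 3),
                ("2009-09-02", "west", 8),
                ("2009-09-02", "west", 2),
            ],
        )
    conn.close()


def run_split(db_path, day):
    """One 'mapper': run the query for a single parameter value (one split)."""
    conn = sqlite3.connect(db_path)  # each worker opens its own connection
    try:
        rows = conn.execute(PUSHDOWN_QUERY, (day,)).fetchall()
    finally:
        conn.close()
    # A real mapper would emit these pre-aggregated rows as key/value pairs.
    return day, rows


def pushdown_map(db_path, params):
    """Run one split per parameter value concurrently, like parallel mappers."""
    with ThreadPoolExecutor(max_workers=max(1, len(params))) as pool:
        return dict(pool.map(lambda day: run_split(db_path, day), params))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "demo.db")
        load_sample_data(db)
        results = pushdown_map(db, ["2009-09-01", "2009-09-02"])
        for day in sorted(results):
            print(day, results[day])
```

The list of parameter values decides how the work is divided among mappers. In the real connector, the InputFormat handles splits and query execution, so user code like this sketch is not needed.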
Typical usages:
(1) Raw data -> Hadoop (ETL) -> Vertica (for fast ad-hoc queries, near real time)
(2) Vertica -> Hadoop (ETL) -> Vertica (for fast ad-hoc queries, near real time)
(3) Vertica -> Hadoop (sophisticated queries for analysis or mining)
We can expect tighter integration and higher performance in future releases.
References
[1] The Scoop on Hadoop and Vertica:
http://databasecolumn.vertica.com/2009/09/the_scoop_on_hadoop_and_vertic.html
[2] Using Vertica as a Structured Data Repository for Apache Hadoop:
http://www.vertica.com/MapReduce
[3] Cloudera DBInputFormat interface:
http://www.cloudera.com/blog/2009/03/06/database-access-with-hadoop/
[4] Managing Big Data with Hadoop and Vertica:
http://www.vertica.com/resourcelogin?type=pdf&item=ManagingBigDatawithHadoopandVertica.pdf

Aster Data: Aster Data already provides in-database MapReduce.
The new Aster-Hadoop Data Connector uses Aster’s patent-pending SQL-MapReduce capabilities for two-way, high-speed data transfer between Apache Hadoop and Aster Data’s massively parallel data warehouse.
- Use Hadoop for ETL processing or data mining, then pull the results into Aster for interactive queries or ad-hoc analytics at massive data scales.
- The Connector utilizes key new SQL-MapReduce functions to provide ultra-fast, two-way data loading between HDFS (Hadoop Distributed File System) and Aster Data’s MPP Database.
- Parallel loader.
- LoadFromHadoop: Parallel data loading from HDFS to Aster nCluster.
- LoadToHadoop: Parallel data loading from Aster nCluster to HDFS.
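The general parallel-loading pattern can be sketched in Python. This is not how LoadFromHadoop works internally; it makes no claim about that function's arguments. The sketch has two steps. First, it spreads the part files of an HDFS directory across worker nodes round-robin. Second, every worker loads its share at the same time. The file names, the worker names, and the `read_file` callable are all hypothetical, and a plain dict stands in for HDFS.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_round_robin(files, workers):
    """Plan which worker loads which HDFS part file (round-robin)."""
    plan = {worker: [] for worker in workers}
    for i, path in enumerate(files):
        plan[workers[i % len(workers)]].append(path)
    return plan


def load_on_worker(worker, files, read_file):
    """Stand-in for one node pulling its share of the files."""
    rows = []
    for path in files:
        rows.extend(read_file(path))
    return worker, rows


def parallel_load(files, workers, read_file):
    """Load every file exactly once, with all workers running concurrently."""
    plan = assign_round_robin(files, workers)
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [
            pool.submit(load_on_worker, worker, paths, read_file)
            for worker, paths in plan.items()
        ]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    # Hypothetical HDFS directory listing: part file name -> its rows.
    fake_hdfs = {
        "part-00000": [("a", 1), ("b", 2)],
        "part-00001": [("c", 3)],
        "part-00002": [("d", 4)],
    }
    loaded = parallel_load(sorted(fake_hdfs), ["worker1", "worker2"], fake_hdfs.__getitem__)
    for worker, rows in loaded.items():
        print(worker, rows)
```

Exporting in the other direction (the LoadToHadoop case) would run the same way in reverse: each worker writes its own partition to a separate HDFS file, so no single node becomes a bottleneck.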
Key advantages of Aster’s Hadoop Connector include:
- High-performance: Fast, parallel data transfer between Hadoop and Aster nCluster.
- Ease-of-use: Analysts can invoke a single SQL command to import the output of Hadoop MapReduce jobs for deeper data analysis. Aster intelligently and automatically parallelizes the load.
- Data Consistency: Aster Data's data integrity and transactional consistency capabilities treat the data load as a 'transaction', ensuring that the data load or export is always consistent and can be carried out while other queries are running in parallel in Aster.
- Extensibility: Customers can extend the Connector using SQL-MapReduce to customize it for their specific environment.
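Treating a load as a transaction means the load is all-or-nothing. The small sketch below shows that idea using Python's built-in sqlite3 module. Aster's own mechanism is internal to nCluster, so this illustrates only the concept. The `events` table and both helper functions are hypothetical. The NOT NULL constraint is there only so that a bad row can make a batch fail.

```python
import sqlite3


def create_events_table(conn):
    """Hypothetical target table; NOT NULL lets us trigger a failing row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )


def transactional_load(conn, rows):
    """Insert all rows as one transaction: either every row is kept or none is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)
        return True
    except sqlite3.Error:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_events_table(conn)
    print(transactional_load(conn, [(1, "a"), (2, "b")]))   # good batch
    print(transactional_load(conn, [(3, "c"), (4, None)]))  # row 4 violates NOT NULL
    print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```

The second batch fails on its last row. Row 3, inserted just before it, is rolled back as well, so the table still holds only the two rows from the first batch. This is the property that lets a load run safely alongside other queries.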
The typical usages are similar to those for Vertica.
References
[1] Aster Data Announces Seamless Connectivity With Hadoop:
http://www.nearshorejournal.com/2009/10/aster-data-announces-seamless-connectivity-with-hadoop/
http://www.asterdata.com/news/091001-Aster-Hadoop-connector.php
[2] DBMS2 - MapReduce tidbits:
http://www.dbms2.com/2009/10/01/mapreduce-tidbits/#more-983
[3] Aster Data Blog: Aster Data Seamlessly Connects to Hadoop:
http://www.asterdata.com/blog/index.php/2009/10/05/aster-data-seamlessly-connects-to-hadoop/

Another case of integrating an analytic DBMS with Hadoop is the HadoopDB project:
http://db.cs.yale.edu/hadoopdb/hadoopdb.html