Friday, October 2, 2009

The Integration of Analytic DBMS and Hadoop

Recently, two well-known analytic DBMS vendors, Vertica and Aster Data, announced integrations with Hadoop. Analytic DBMSs and Hadoop address distinct but complementary problems in managing large data sets.

Vertica:

Currently the integration is lightweight:
  • Use cases: ETL, ELT, data cleansing, data mining, etc.
  • Moves data between Hadoop and Vertica.
  • InputFormat (InputSplit, VerticaRecord; relational map operations are pushed down by parameterizing the database query).
  • OutputFormat (writes to an existing table or creates a new one).
  • Hadoop developers can easily push Map operations down to Vertica databases in parallel by specifying parameterized queries, which yield pre-aggregated data for each mapper.
  • Supports the Hadoop streaming interface.
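
The push-down idea in the list above can be illustrated with a minimal sketch. It is not Vertica's connector API: sqlite3 stands in for the database, and the table `sales`, the query template, and the helper names are all hypothetical. It only shows how one parameterized query, bound once per mapper, hands each mapper pre-aggregated rows instead of raw data.

```python
import sqlite3

# Hypothetical query template; the "?" is bound once per mapper.
QUERY = (
    "SELECT region, COUNT(*), SUM(amount) "
    "FROM sales WHERE region = ? GROUP BY region"
)

def make_db():
    """Build an in-memory table standing in for a Vertica table (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("east", 5), ("west", 7), ("west", 3), ("west", 1)],
    )
    return conn

def run_mapper(conn, param):
    """One 'mapper': the database aggregates, the mapper receives the result."""
    return conn.execute(QUERY, (param,)).fetchall()

def pushdown(conn, params):
    """One input split per parameter value."""
    return {p: run_mapper(conn, p) for p in params}

if __name__ == "__main__":
    for region, rows in pushdown(make_db(), ["east", "west"]).items():
        print(region, rows)
```

In a real job the mappers would run on different nodes and query the database in parallel; the loop here only mimics that fan-out.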

Typical usages:

(1) Raw Data -> Hadoop (ETL) -> Vertica (for fast ad-hoc queries, near real time)
(2) Vertica -> Hadoop (ETL) -> Vertica (for fast ad-hoc queries, near real time)
(3) Vertica -> Hadoop (sophisticated queries for analysis or mining)
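
As a rough illustration of the Hadoop (ETL) step in usage (1), here is a sketch of a cleansing mapper written for the Hadoop streaming interface, which feeds input lines on stdin and collects output lines from stdout. The input layout (comma-separated `user, page, millis`) and the cleaning rules are assumptions made up for this example.

```python
import sys

def clean_line(line):
    """Return a tab-separated record, or None if the line is malformed."""
    parts = [p.strip() for p in line.rstrip("\n").split(",")]
    if len(parts) != 3:
        return None
    user, page, millis = parts
    if not user or not millis.isdigit():
        return None
    # Normalize the user name so later joins are case-insensitive.
    return "\t".join([user.lower(), page, millis])

def run(stdin, stdout):
    """Streaming-style loop: one raw line in, at most one clean record out."""
    for line in stdin:
        record = clean_line(line)
        if record is not None:
            stdout.write(record + "\n")

if __name__ == "__main__":
    run(sys.stdin, sys.stdout)
```

The cleaned, tab-separated output is the kind of data that would then be bulk-loaded into the analytic DBMS for ad-hoc queries.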

We can expect to see tighter integration and higher performance.

References
[1] The Scoop on Hadoop and Vertica: http://databasecolumn.vertica.com/2009/09/the_scoop_on_hadoop_and_vertic.html
[2] Using Vertica as a Structured Data Repository for Apache Hadoop: http://www.vertica.com/MapReduce
[3] Cloudera DBInputFormat interface: http://www.cloudera.com/blog/2009/03/06/database-access-with-hadoop/
[4] Managing Big Data with Hadoop and Vertica: http://www.vertica.com/resourcelogin?type=pdf&item=ManagingBigDatawithHadoopandVertica.pdf

Aster Data:

Aster Data already provides in-database MapReduce.

The new Aster-Hadoop Data Connector utilizes Aster’s patent-pending SQL-MapReduce capabilities for two-way, high-speed data transfer between Apache Hadoop and Aster Data’s massively parallel data warehouse.
  • Use Hadoop for ETL processing or data mining, then pull that data into Aster for interactive queries or ad-hoc analytics at massive data scale.
  • The Connector utilizes key new SQL-MapReduce functions to provide ultra-fast, two-way data loading between HDFS (Hadoop Distributed File System) and Aster Data’s MPP Database.
  • Parallel loader.
  • LoadFromHadoop: Parallel data loading from HDFS to Aster nCluster.
  • LoadToHadoop: Parallel data loading from Aster nCluster to HDFS.
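
The parallel loading described above can be pictured with a small, generic sketch. It does not use the Connector or the real LoadFromHadoop/LoadToHadoop functions, whose syntax is not shown here: a dictionary stands in for files in HDFS, and the helper names are hypothetical. The point is only that the file list is split into shares and each worker loads its share concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def assign_round_robin(paths, num_workers):
    """Split the file list into one share per worker."""
    return [paths[i::num_workers] for i in range(num_workers)]

def load_share(source, share):
    """One worker: read its files and return the rows it loaded."""
    rows = []
    for path in share:
        rows.extend(source[path])
    return rows

def parallel_load(source, num_workers):
    """Load every file exactly once, one list of rows per worker."""
    shares = assign_round_robin(sorted(source), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda share: load_share(source, share), shares))

if __name__ == "__main__":
    # Hypothetical part files, as a MapReduce job might leave them.
    fake_hdfs = {"part-00000": [1, 2], "part-00001": [3], "part-00002": [4, 5]}
    print(parallel_load(fake_hdfs, num_workers=2))
```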

Key advantages of Aster’s Hadoop Connector include:
  • High-performance: Fast, parallel data transfer between Hadoop and Aster nCluster.
  • Ease-of-use: Analysts can now seamlessly invoke a SQL command for ultra-simple import of Hadoop-MapReduce jobs, for deeper data analysis. Aster intelligently and automatically parallelizes the load.
  • Data Consistency: Aster Data's data integrity and transactional consistency capabilities treat the data load as a 'transaction', ensuring that the data load or export is always consistent and can be carried out while other queries are running in parallel in Aster.
  • Extensibility: Customers can easily further extend the Connector using SQL-MapReduce, to provide further customization for their specific environment.
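
To make the "load as a transaction" point concrete, here is a minimal sketch of an all-or-nothing batch load. It uses sqlite3 purely as a stand-in and says nothing about how Aster implements it; the table `events` and the helper names are hypothetical.

```python
import sqlite3

def make_db():
    """In-memory table standing in for the warehouse table (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn

def load_batch(conn, rows):
    """Insert all rows or none. Returns True if the batch was committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO events (id, payload) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False

def count(conn):
    """Number of rows currently visible in the table."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

if __name__ == "__main__":
    conn = make_db()
    print(load_batch(conn, [(1, "a"), (2, "b")]), count(conn))
    # The second batch contains a duplicate id, so none of it is kept.
    print(load_batch(conn, [(3, "c"), (1, "dup")]), count(conn))
```

Because the failed batch is rolled back as a whole, queries never observe a half-loaded table.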

The typical usages are similar to those for Vertica.

References
[1] Aster Data Announces Seamless Connectivity With Hadoop: http://www.nearshorejournal.com/2009/10/aster-data-announces-seamless-connectivity-with-hadoop/
http://www.asterdata.com/news/091001-Aster-Hadoop-connector.php
[2] DBMS2 - MapReduce tidbits: http://www.dbms2.com/2009/10/01/mapreduce-tidbits/#more-983
[3] Aster Data Blog: Aster Data Seamlessly Connects to Hadoop: http://www.asterdata.com/blog/index.php/2009/10/05/aster-data-seamlessly-connects-to-hadoop/

Another example of integrating an analytic DBMS with Hadoop is the HadoopDB project: http://db.cs.yale.edu/hadoopdb/hadoopdb.html
