Sunday, September 27, 2009

Tips for the Hadoop Runtime Environment


Setting up a cluster to run Hadoop, HBase, Hive, and similar systems involves more than configuring and tuning those open-source programs. The system hardware, the system software, and a few useful utilities should also be considered, both to improve performance and to ease maintenance.


1. Cluster Facilities and Hardware [1]

(1) Data center:

Usually, we run Hadoop/HBase/Hive in a single data center.

(2) Servers:

Clusters are often either capacity bound or CPU bound.

The 1U or 2U configuration is usually used.

The storage on each node is usually not too dense (<= 4 TB per node is recommended).

Commodity server: 2x4 core CPU, 16 GB RAM, 4x1TB SATA, 2x1 GE NIC

Use ECC RAM and cheap hard drives: 7200 RPM SATA.

Start with a standard 64-bit box for both masters and workers.
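To make the "capacity bound" point concrete, here is a rough back-of-the-envelope sketch of how much user data a cluster of such commodity servers can hold. The function name is hypothetical, the replication factor of 3 is the HDFS default, and the 25% reserved for the OS, logs, and MapReduce intermediate data is an assumed figure, not a measured one.

```python
# Rough HDFS capacity estimate; every default below is an illustrative assumption.

def usable_hdfs_tb(nodes, disks_per_node=4, tb_per_disk=1.0,
                   replication=3, non_dfs_fraction=0.25):
    """Return the approximate amount of user data (in TB) the cluster can hold.

    non_dfs_fraction is the share of raw disk kept for the OS, logs and
    MapReduce intermediate data (an assumed value; tune it for your workload).
    """
    raw_tb = nodes * disks_per_node * tb_per_disk   # total raw disk space
    dfs_tb = raw_tb * (1.0 - non_dfs_fraction)      # space left for HDFS blocks
    return dfs_tb / replication                     # each block is stored `replication` times


if __name__ == "__main__":
    for n in (10, 20, 40):
        print("%d nodes -> about %.1f TB of user data" % (n, usable_hdfs_tb(n)))
```

With these assumptions, the raw 4 TB per node shrinks to roughly 1 TB of user data per node, which is worth keeping in mind when sizing the cluster.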

(3) Network:

Gigabit Ethernet, 2 level tree, 5:1 oversubscription to core

May want redundancy at top of rack and core

Usually, for a small cluster, all nodes are under a single GE switch.
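The 5:1 oversubscription figure can be turned into simple arithmetic. The sketch below assumes one active 1 GE NIC per node and a hypothetical rack of 40 nodes; both numbers and the function names are illustrative only.

```python
# Oversubscription arithmetic for a two-level tree; inputs are assumptions.

def required_uplink_gbps(nodes_per_rack, nic_gbps=1.0, oversubscription=5.0):
    """Uplink bandwidth a rack switch needs for the given oversubscription ratio."""
    return nodes_per_rack * nic_gbps / oversubscription


def worst_case_cross_rack_gbps(nic_gbps=1.0, oversubscription=5.0):
    """Per-node bandwidth to the core when every node in the rack sends at once."""
    return nic_gbps / oversubscription


if __name__ == "__main__":
    print("uplink needed for 40 nodes: %.1f Gbit/s" % required_uplink_gbps(40))
    print("worst-case cross-rack share per node: %.2f Gbit/s"
          % worst_case_cross_rack_gbps())
```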

(4) RAID configuration: RAID0

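If each machine has two or more disks, RAID0 can provide better disk throughput than other RAID levels. Whether it also beats JBOD is an open question that we have not tested. RAID0 itself offers no redundancy, but HDFS keeps multiple replicas of the data, so the cluster can tolerate disk failures and keep the data safe.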


2. System Software

(1) Linux:

RedHat 5+ or CentOS 5+ (recommended). We currently use CentOS 5.3 (x64).

(2) Local File System:

Ext3 is ok.

We usually set aside a separate disk partition for Hadoop, then create and mount a dedicated local file system on it.

Mount it with noatime and nodiratime to improve performance. By default, Linux updates the access time (atime) of files and directories on every read, which is unnecessary in most cases. [2]

-- Edit /etc/fstab:

e.g. /dev/VolGroup00/LogVol02 /data ext3 defaults,noatime,nodiratime 1 2

-- Remount:

mount -o remount /data
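On a cluster with many nodes it is easy to miss a partition. The following is a minimal sketch, not a definitive tool, that scans fstab-style text and lists the ext3 mount points that still lack noatime. The function name and the second sample entry (/dev/sdb1 on /data2) are made up for illustration.

```python
# Sketch: find ext3 mount points in fstab-style text that lack "noatime".

def mounts_missing_noatime(fstab_text, fs_types=("ext3",)):
    """Return the mount points of the given file system types without noatime."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue                      # not a complete fstab entry
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in fs_types and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


if __name__ == "__main__":
    # The first entry is the example from this post; the second is hypothetical.
    sample = (
        "/dev/VolGroup00/LogVol02 /data ext3 defaults,noatime,nodiratime 1 2\n"
        "/dev/sdb1 /data2 ext3 defaults 1 2\n"
    )
    print(mounts_missing_noatime(sample))
```

To check a real node, pass the contents of /etc/fstab (or /proc/mounts, which uses the same first four fields) instead of the sample string.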

(3) Swappiness configuration: [3]

The "swappiness" tunable was added to the memory management subsystem of the Linux kernel in version 2.6. A high swappiness value makes the kernel more willing to page out application text in favour of another application or even the file-system cache. The default value is 60 (see mm/vmscan.c).

If a node ends up swapping, you will start seeing erratic behavior and very slow GC runs, and HBase RegionServers are likely to be killed off when their ZooKeeper sessions time out and ZooKeeper assumes the RegionServer is dead. We suggest setting vm.swappiness to 0 or another low number (e.g. 10) and then observing the state of swap.

-- Edit /etc/sysctl.conf : vm.swappiness=0

-- To check the current value on a running system: sysctl vm.swappiness
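To "observe the state of swap", a small script can read the same values that sysctl and free report. This is a sketch that assumes the standard Linux /proc layout (/proc/sys/vm/swappiness and the SwapTotal/SwapFree lines of /proc/meminfo); the function names are hypothetical.

```python
# Sketch: report vm.swappiness and current swap usage from /proc (Linux only).
import os


def swap_used_kb(meminfo_text):
    """Return SwapTotal - SwapFree (in kB) parsed from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("SwapTotal", "SwapFree"):
            values[key] = int(rest.split()[0])   # e.g. "  2097148 kB" -> 2097148
    return values["SwapTotal"] - values["SwapFree"]


def read_text(path):
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    if os.path.exists("/proc/sys/vm/swappiness") and os.path.exists("/proc/meminfo"):
        print("vm.swappiness = " + read_text("/proc/sys/vm/swappiness").strip())
        print("swap in use: %d kB" % swap_used_kb(read_text("/proc/meminfo")))
    else:
        print("no Linux /proc file system found; nothing to check")
```

A swap usage that keeps growing on a RegionServer node is a sign that the heap sizes are too large for the available RAM.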

(4) Linux default file handle limit: [4]

Currently HBase is a file handle glutton. To raise the per-user file handle limit, edit /etc/security/limits.conf on all nodes and add:

* - nofile 32768
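The new limit only applies to sessions started after the change, so it is worth verifying from the account that runs the daemons. A minimal sketch using Python's standard resource module follows; the helper name nofile_ok is hypothetical, and 32768 is simply the value from the limits.conf line above.

```python
# Sketch: check the open-file limit of the current process (Unix only).
import resource

RECOMMENDED_NOFILE = 32768  # the value used in limits.conf above


def nofile_ok(soft_limit, wanted=RECOMMENDED_NOFILE):
    """Return True if the soft open-file limit is at least `wanted`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True                       # unlimited is always enough
    return soft_limit >= wanted


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit: %s, hard limit: %s" % (soft, hard))
    print("OK" if nofile_ok(soft) else "too low for HBase, check limits.conf")
```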

(5) Java: latest JRE/JDK 1.6, and GC options [5]

On machines with 4-16 cores, Hadoop, HBase, and our other Java applications should use the GC option "-XX:+UseConcMarkSweepGC".

On machines with 2 cores, they should use the GC options "-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode".
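The rule above can be scripted so that each node picks its own flags. This sketch makes one assumption beyond the text: the post only covers 2-core and 4-16-core machines, so treating "2 cores or fewer" as the incremental case and everything else as plain CMS is my own simplification. The helper name gc_flags is hypothetical; HADOOP_OPTS is the variable normally set in conf/hadoop-env.sh (HBase uses HBASE_OPTS in hbase-env.sh).

```python
# Sketch: choose the Java 6 era GC flags from the core count.
import multiprocessing


def gc_flags(cores):
    """Return the GC options recommended above for a machine with `cores` cores."""
    flags = ["-XX:+UseConcMarkSweepGC"]
    if cores <= 2:
        # Assumption: anything at or below 2 cores gets incremental mode.
        flags.append("-XX:+CMSIncrementalMode")
    return " ".join(flags)


if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    # Print a line that could be pasted into conf/hadoop-env.sh.
    print('export HADOOP_OPTS="$HADOOP_OPTS %s"' % gc_flags(cores))
```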

(6) Apache Ant 1.7.1 [6]

(7) Useful Linux utilities:

top, sar, iostat, iftop, vmstat, nfsstat, strace, dmesg, and friends

iostat is especially useful for disk I/O analysis.

(8) Useful Java utilities:

jps, jstack, jconsole

(9) Compression native library: Gzip and LZO [7]

(10) Ganglia:

Used to integrate metrics from Hadoop, HBase, Hive, applications, and the Linux system.


References:

[1] Hadoop and Cloudera, Managing Petabytes with Open Source, Jeff Hammerbacher, Aug. 21 http://indico.cern.ch/conferenceDisplay.py?confId=59791 or http://bit.ly/NXH6p

[2] Set noatime of local file system: http://www.chinaz.com/Server/Linux/0515L0032009.html

[3] Linux Swappiness: http://www.sollers.ca/blog/2008/swappiness/

[4] HBase FAQ: http://wiki.apache.org/hadoop/Hbase/FAQ

[5] Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning, http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

[6] Apache Ant: http://ant.apache.org/

[7] http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html

78 comments:

  1. The one surprise to me is recommending RAID over JBOD. I'd heard the opposite, e.g.
    http://www.nabble.com/RAID-vs.-JBOD-td21404366.html

  2. Thanks Dave. It seems JBOD gains better performance. We should have a test. The post from Runping Qi at Yahoo is interesting.
