Big Data Engineering, Practices and Research: Tips of Hadoop Runtime Environment

Sunday, September 27, 2009

Tips of Hadoop Runtime Environment

java

To setup a cluster for running Hadoop/HBase/Hive, etc., besides the configuration and tuning of these open-source programs themselves, the system hardware and software, and some useful utilities, should also be considered to improve the system performance and to ease the system maintenance.

1. Cluster Facilities and Hardware [1]

(1) Data center:

Usually, we run Hadoop/HBase/Hive in a single data center.

(2) Servers:

Clusters are often either capacity bound or CPU bound.

The 1U or 2U configuration is usually used.

The storage capability of each node is usually not too dense (<= 4TB is recommended).

Commodity server: 2x4 core CPU, 16 GB RAM, 4x1TB SATA, 2x1 GE NIC

Use ECC RAM and cheap hard drives: 7200 RPM SATA.

Start with standard 64-bit box for masters and workers.

(3) Network:

Gigabit Ethernet, 2 level tree, 5:1 oversubscription to core

May want redundancy at top of rack and core

Usually, for a small cluster, all nodes are under a single GE switch.

(4) RAID configuration: RAID0

If there are two or more disks in each machine, RAID0 can provide better disk throughput than other RAID levels (and JBOD?). The multiple data replicas of HDFS can tolerate failure and guarantee the data safety.

2. System Software

(1) Linux:

RedHat5+ or CentOS5+ (recommended). Now, we use CentOS5.3-x64.

(2) Local File System:

Ext3 is ok.

We usually configure a separate disk partition for Hadoop used local file system, create and mount separate local file system for Hadoop.

Mount with noatime and nodiratime for performance improvements. Default, Linux will update the atime of files and directories, which is unnecessary in most cases. [2]

-- Edit /etc/fstab:

e.g. /dev/VolGroup00/LogVol02 /data ext3 defaults,noatime,nodiratime 1 2

-- remound:

mount -o remount /data

(3) Swappiness configuration: [3]

With the introduction of version 2.6, the new variable "swappiness" was added in the Linux kernel memory management subsystem and a tunable was created for it. High value of swappiness will make the kernel page out application text in favour of another application or even file-system cache. The default value is 60 (see mm/vmscan.c).

If you end up swapping, you're going to start seeing some weird behavior and very slow GC runs, and likely killing off HBase regionservers as ZooKeeper times out and assume the RegionServer is dead. Suggest setting vm.swappiness = 0 or other low number (e.g. 10), and observe the state of swap.

-- Edit /etc/sysctl.conf : vm.swappiness=0

-- To check the current value on a running system: sysctl vm.swappiness

(4) Linux default file handle limit: [4]

Currently HBase is a file handle glutton. To up the users' file handles, edit /etc/security/limits.conf on all nodes.

* - nofile 32768

(5) Java: JRE/JDK1.6 latest and GC options [5]

For machine with 4-16 cores, our Hadoop/HBase and other java applications should use GC option as: “-XX:+UseConcMarkSweepGC”.

For machine with 2 cores, should use GC option as: “-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode”.

(6) Apache ANT 1.7.1

(7) Useful Linux utilities:

top, sar, iostat, iftop, vmstat, nfsstat, strace, dmesg, and friends

Especially iostat is very useful for disk I/O analysis.