Saturday, September 12, 2009

HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs

1. Intruduction

HFile is a mimic of Google’s SSTable. Now, it is available in Hadoop HBase-0.20.0. And the previous releases of HBase temporarily use an alternate file format – MapFile[4], which is a common file format in Hadoop IO package. I think HFile should also become a common file format when it becomes mature, and should be moved into the common IO package of Hadoop in the future.

Following words of SSTable are from section 4 of Google’s Bigtable paper.

The Google SSTable file format is used internally to store Bigtable data. An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings. Operations are provided to look up the value associated with a specified key, and to iterate over all key/value pairs in a specified key range. Internally, each SSTable contains a sequence of blocks (typically each block is 64KB in size, but this is configurable). A block index (stored at the end of the SSTable) is used to locate blocks; the index is loaded into memory when the SSTable is opened. A lookup can be performed with a single disk seek: we first find the appropriate block by performing a binary search in the in-memory index, and then reading the appropriate block from disk. Optionally, an SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching disk.[1]

The HFile implements the same features as SSTable, but may provide more or less.

2. File Format

Data Block Size

Whenever we say Block Size, it means the uncompressed size.

The size of each data block is 64KB by default, and is configurable in HFile.Writer. It means the data block will not exceed this size more than one key/value pair. The HFile.Writer starts a new data block to add key/value pairs if the current writing block is equal to or bigger than this size. The 64KB size is same as Google’s [1].

To achieve better performance, we should select different block size. If the average key/value size is very short (e.g. 100 bytes), we should select small blocks (e.g. 16KB) to avoid too many key/value pairs in each block, which will increase the latency of in-block seek, because the seeking operation always finds the key from the first key/value pair in sequence within a block.

Maximum Key Length

The key of each key/value pair is currently up to 64KB in size. Usually, 10-100 bytes is a typical size for most of our applications. Even in the data model of HBase, the key (rowkey+column family:qualifier+timestamp) should not be too long.

Maximum File Size

The trailer, file-info and total data block indexes (optionally, may add meta block indexes) will be in memory when writing and reading of an HFile. So, a larger HFile (with more data blocks) requires more memory. For example, a 1GB uncompressed HFile would have about 15600 (1GB/64KB) data blocks, and correspondingly about 15600 indexes. Suppose the average key size is 64 bytes, then we need about 1.2MB RAM (15600X80) to hold these indexes in memory.

Compression Algorithm

- Compression reduces the number of bytes written to/read from HDFS.

- Compression effectively improves the efficiency of network bandwidth and disk space

- Compression reduces the size of data needed to be read when issuing a read

To be as low friction as necessary, a real-time compression library is preferred. Currently, HFile supports following three algorithms:

(1)NONE (Default, uncompressed, string name=”none”)

(2)GZ (Gzip, string name=”gz”)

Out of the box, HFile ships with only Gzip compression, which is fairly slow.

(3)LZO(Lempel-Ziv-Oberhumer, preferred, string name=”lzo”)

To achieve maximal performance and benefit, you must enable LZO, which is a lossless data compression algorithm that is focused on decompression speed.

Following figures show the format of an HFile.

In above figures, an HFile is separated into multiple segments, from beginning to end, they are:

- Data Block segment

To store key/value pairs, may be compressed.

- Meta Block segment (Optional)

To store user defined large metadata, may be compressed.

- File Info segment

It is a small metadata of the HFile, without compression. User can add user defined small metadata (name/value) here.

- Data Block Index segment

Indexes the data block offset in the HFile. The key of each index is the key of first key/value pair in the block.

- Meta Block Index segment (Optional)

Indexes the meta block offset in the HFile. The key of each index is the user defined unique name of the meta block.

- Trailer

The fix sized metadata. To hold the offset of each segment, etc. To read an HFile, we should always read the Trailer firstly.

The current implementation of HFile does not include Bloom Filter, which should be added in the future.

3. LZO Compression

LZO is now removed from Hadoop or HBase 0.20+ because of GPL restrictions. To enable it, we should install native library firstly as following. [6][7][8][9]

(1) Download LZO:, and build.

# ./configure --build=x86_64-redhat-linux-gnu --enable-shared --disable-asm

# make

# make install

Then the libraries have been installed in: /usr/local/lib

(2) Download the native connector library, and build.

Copy hadoo-0.20.0-core.jar to ./lib.

# ant compile-native

# ant jar

(3) Copy the native library (build/native/ Linux-amd64-64) and hadoop-gpl-compression-0.1.0-dev.jar to your application’s lib directory. If your application is a MapReduce job, copy them to hadoop’s lib directory. Your application should follow the $HADOOP_HOME/bin/hadoop script to ensure that the native hadoop library is on the library path via the system property -Djava.library.path=. [9]

4. Performance Evaluation


4 slaves + 1 master

Machine: 4 CPU cores (2.0G), 2x500GB 7200RPM SATA disks, 8GB RAM.

Linux: RedHat 5.1 (2.6.18-53.el5), ext3, no RAID, noatime

1Gbps network, all nodes under the same switch.

Hadoop-0.20.0 (1GB heap), lzo-2.0.3

Some MapReduce-based benchmarks are designed to evaluate the performance of operations to HFiles, in parallel.

Total key/value entries: 30,000,000.

Key/Value size: 1000 bytes (10 for key, and 990 for value). We have totally 30GB of data.

Sequential key ranges: 60, i.e. each range have 500,000 entries.

Use default block size.

The entry value is a string, each continuous 8 bytes are a filled with a same letter (A~Z). E.g. “BBBBBBBBXXXXXXXXGGGGGGGG……”.

We set to avoid client side bottleneck.

(1) Write

Each MapTask for each range of key, which writes a separate HFile with 500,000 key/value entries.

(2) Full Scan

Each MapTask scans a separate HFile from beginning to end.

(3) Random Seek a specified key

Each MapTask opens one separate HFile, and selects a random key within that file to seek it. Each MapTask runs 50,000 (1/10 of the entries) random seeks.

(4) Random Short Scan

Each MapTask opens one separate HFile, and selects a random key within that file as a beginning to scan 30 entries. Each MapTask runs 50,000 scans, i.e. scans 50,000*30=1,500,000 entries.

This table shows the average entries which are written/seek/scanned per second, and per node.

In this evaluation case, the compression ratio is about 7:1 for gz(Gzip), and about 4:1 for lzo. Even through the compression ratio is just moderate, the lzo column shows the best performance, especially for writes.

The performance of full scan is much better than SequenceFile, so HFile may provide better performance to MapReduce-based analytical applications.

The random seek in HFiles is slow, especially in none-compressed HFiles. But the above numbers already show 6X~10X better performance than a disk seek (10ms). Following Ganglia charts show us the overhead of load, CPU, and network. The random short scan makes the similar phenomena.


[1] Google, Bigtable: A Distributed Storage System for Structured Data,

[2] HBase-0.20.0 Documentation,

[3] HFile code review and refinement.

[4] MapFile API:

[5] Parallel LZO: Splittable Compression for Hadoop.

[6] Using LZO in Hadoop and HBase:

[7] LZO:

[8] Hadoop LZO native connector library:

[9] Hadoop Native Libraries Guide:


  1. Might be interesting doing a contrast with tfile which was just committed to hadoop trunk (and 0.20.0)

  2. @stack,
    Yes, we are also interested in TFile and others.

  3. i am having hard time understanding the value of a data block. What would HBase lose if concept of block is removed?
    You could compress whole file (in fact compression would be better), you could build an index for whole file. You can make file available by replicating it.

    1. 1. Think about read:
      If you want to read one row or one column, you must decompress the whole non-blocked file.

      2. Think about data cache.

  4. Big data is now taking the guesswork out of discerning which individuals are the best targets for a particular product. To know more about SAP, Visit Big data training in chennai

  5. Very nice post here and thanks for it .I always like and such a super contents of these post.Excellent and very cool idea and great content of different kinds of the valuable information's.
    Hadoop Training in Chennai

  6. Thank you for taking the time and sharing this information with us. It was indeed very helpful and insightful while being straight forward and to the point.
    mcdonaldsgutscheine | startlr | saludlimpia

  7. Now Big Data is an emerging technology thanks for sharing such a wonderful information...
    Best Hadoop Training in chennai

  8. I never get bored while reading your article because, they are becomes a more and more interesting from the starting lines until the end.salesforce training in hyderabad

  9. Great Post. Keep sharing such kind of noteworthy information.

    IoT Training in Chennai | IoT Courses in Chennai

  10. This comment has been removed by the author.

  11. I would like to thank you for your nicely written post, its informative and your writing style encouraged me to read it till end. Thanks

    angularjs-Training in annanagar

    angularjs Training in chennai

    angularjs Training in chennai

    angularjs Training in bangalore

  12. You rock particularly for the high caliber and results-arranged offer assistance. I won't reconsider to embrace your blog entry to anyone who needs and needs bolster about this and safety course in chennai

  13. My rather long internet look up has at the end of the day been compensated with pleasant insight
    nebosh course in chennai

  14. Thank you for your post.The post contain a valuable information is very useful to me.
    Qlikview Training
    Application Packagining Training
    Python Training

  15. Thanks for Sharing this Wonderfull Post

    Big Data Training in Chennai

  16. I wanted to thank you for this great read!! I definitely enjoying every little bit of it I have you bookmarked to check out new stuff you article.
    online Python certification course | python training in OMR | python training course in chennai

  17. Great post! I am actually getting ready to across this information, It’s very helpful for this blog.Also great with all of the valuable information you have Keep up the good work you are doing well.
    Online DevOps Certification Course - Gangboard
    Best Devops Training institute in Chennai

  18. I’ve desired to post about something similar to this on one of my blogs and this has given me an idea. Cool Mat.

    Java training in Chennai | Java training in Bangalore

    Java interview questions and answers | Core Java interview questions and answers

  19. Thank you for this post. Thats all I are able to say. You most absolutely have built this blog website into something speciel. You clearly know what you are working on, youve insured so many corners.thanks

    Data Science Training in Chennai | Data Science course in anna nagar

    Data Science course in chennai | Data science course in Bangalore

    Data Science course in marathahalli | Data Science course in btm layout

  20. This is most informative and also this post most user friendly and super navigation to all posts... Thank you so much for giving this information to me.. 

    best rpa training in chennai | rpa online training |
    rpa training in chennai |
    rpa training in bangalore
    rpa training in pune
    rpa training in marathahalli
    rpa training in btm

  21. Hey, Wow all the posts are very informative for the people who visit this site. Good work! We also have a Website. Please feel free to visit our site. Thank you for sharing.
    Well written article.Thank You Sharing with Us android development for beginners | future of android development 2018 |
    android device manager location histor

  22. Thanks for such a nice article on Blueprism.Amazing information of Blueprism you have . Keep sharing and updating this wonderful blog on Blueprism
    Thanks and regards,
    blue prism training in chennai
    blue prism training institute in chennai
    Blueprism certification in chennai

  23. This is extremely great information for these blog!! And Very good work. It is very interesting to learn from to easy understood. Thank you for giving information. Please let us know and more information get post to link.
    ibm websphere training

  24. i am really interested in learning Bid data. but will you tell the online platfrom to start learning this technology

  25. I think things like this are really interesting. I absolutely love to find unique places like this. It really looks super creepy though!! Roles and reponsibilities of hadoop developer | hadoop developer skills Set | hadoop training course fees in chennai | Hadoop Training in Chennai Omr

  26. Good job! Fruitful article. I like this very much. It is very useful for my research. It shows your interest in this topic very well. I hope you will post some more information about the software. Please keep sharing!!
    SEO Training Center in Chennai
    SEO Institutes in Chennai
    SEO Course Chennai
    Best digital marketing course in chennai
    Digital marketing course chennai
    Digital Marketing Training Institutes in Chennai

  27. Write more; that’s all I have to say. It seems as though you relied on the video to make your point.
    fire and safety course in chennai

  28. Good job in presenting the correct content with the clear explanation. The content looks real with valid information. Good Work

    DevOps is currently a popular model currently organizations all over the world moving towards to it. Your post gave a clear idea about knowing the DevOps model and its importance.

    Good to learn about DevOps at this time.

    devops training in chennai | devops training in chennai with placement | devops training in chennai omr | devops training in velachery | devops training in chennai tambaram | devops institutes in chennai | devops certification in chennai | trending technologies list 2018

  29. Incredible post! I am really preparing to over this data, It's exceptionally useful for this blog.Also extraordinary with the majority of the profitable data you have Keep up the great work you are doing admirably. Presently Big Data is a developing innovation, this exceptionally enlightening post.

    Siva Prasad

    Hadoop Training in Bangalore

  30. Good job in presenting the correct content with the clear explanation. The content looks real with valid information. Good Work

    DevOps is currently a popular model currently organizations all over the world moving towards to it. Your post gave a clear idea about knowing the DevOps model and its importance.

    Good to learn about DevOps at this time.

    devops training in chennai | devops training in chennai with placement | devops training in chennai omr | devops training in velachery | devops training in chennai tambaram | devops institutes in chennai | devops certification in chennai | trending technologies list 2018

  31. Thanks for such a great article here. I was searching for something like this for quite a long time and at last, I’ve found it on your blog. It was definitely interesting for me to read about their market situation nowadays.AngularJS Training in Chennai | Best AngularJS Training Institute in Chennai

  32. Nice blog...
    Thanks for sharing useful information and it's helps a lot and your giving clear explanation on this topic.
    And keep on continue to posting here.
    I got a good website regarding Online IT courses, after a long search in google, i hope it will help to you.
    AWS Developer Training

  33. Thanks for your post. This is excellent information. The list of your blogs is very helpful for those who want to learn, It is amazing!!! You have been helping many application.
    best selenium training in chennai | best selenium training institute in chennai selenium training in chennai

  34. Thank you so much for your information,its very useful and helful to me.Keep updating and sharing. Thank you.
    RPA training in chennai | UiPath training in chennai

  35. Such a Great Article!! I learned something new from your blog. Amazing stuff. I would like to follow your blog frequently. Keep Rocking!!
    Blue Prism training in chennai | Best Blue Prism Training Institute in Chennai

  36. Positive site, where did u come up with the information on this posting?I have read a few of the articles on your website now, and I really like your style. Thanks a million and please keep up the effective work.

    best machine learning institutes in chennai
    artificial intelligence and machine learning course in chennai
    machine learning classroom training in Chennai
    Android training in Chennai
    PMP training in chennai

  37. Hi, Thanks a lot for your explanation which is really nice. I have read all your posts here. It is amazing!!!
    Keeps the users interest in the website, and keep on sharing more, To know more about our service:
    Please free to call us @ +91 9884412301 / 9600112302

    Openstack course training in Chennai | best Openstack course in Chennai | best Openstack certification training in Chennai | Openstack certification course in Chennai | openstack training in chennai omr | openstack training in chennai velachery

  38. It’s great to come across a blog every once in a while, that isn’t the same out of date rehashed material. Fantastic read.
    safety course in chennai

  39. Nice post. By reading your blog, i get inspired and this provides some useful information. Thank you for posting this exclusive post for our vision....

    data science online training
    sas online training
    linux online training
    aws online training
    testing tools online training
    devops online training
    salesforce online training

  40. I am waiting for your more posts like this or related to any other informative topic.
    Testing tools Training
    Tibco Training

  41. Wow!! Really a nice Article. Thank you so much for your efforts. Definitely, it will be helpful for others. I would like to follow your blog. Share more like this. Thanks Again.
    iot training in Chennai | Best iot Training Institute in Chennai

  42. Hi Very Nice Blog I Have Read Your Post It Is Very Informative And Useful Thanks For Posting And Sharing With Us.
    Carpet Cleaning Sunshine Coast
    Carpet Cleaners Sunshine Coast

  43. I’ve bookmarked your site, and I’m adding your RSS feeds to my Google account.
    nebosh course in chennai

  44. Nice post. By reading your blog, i get inspired and this provides some useful information. Thank you for posting this exclusive post for our vision....
    vmware online training
    tableau online training
    qlikview online training
    python online training
    java online training
    sql online training
    cognos online training

  45. I simply wanted to thank you so much again. I am not sure the things that I might have gone through without the type of hints revealed by you regarding that situation.
    occupational health and safety course in chennai

  46. Appreciating the persistence, you put into your blog and detailed information you provide.
    iosh safety course in chennai

  47. it is really explainable very well and i got more information from your blog.
    sap-sd training
    sap-security training