Wednesday, January 6, 2010

Jeff Dean and Sanjay Ghemawat's good advice on MapReduce

I'd like to keep a copy here, since this paper[1] matches my own views so closely, both on the MapReduce model and on the practices for implementing large-dataset management and processing.

In the paper, Jeffrey Dean and Sanjay Ghemawat reply to Stonebraker and DeWitt's misconceptions about MapReduce. In fact, these misconceptions are obvious and easy for us to recognize.

It is also a good guide for improving the implementation of Hadoop and the other members of its family. I suggest reading it carefully.

Dean and other scientists from Google always give us clear and reasonable explanations of their technologies and practices. But sometimes people from other organizations leave us with puzzles.

In addition to the five pieces of "witchcraft" that Google revealed in the following papers:
Slide 6: Google Cluster and WorkQueue Cluster Management

The following papers, articles, and keynotes are well worth reading carefully:
Jeff Dean keynote at LADIS09 (Designs, Lessons and Advice from Building Large Distributed Systems):
Jeff Dean keynote at WSDM09 (Challenges in Building Large-Scale Information Retrieval Systems):
Jeff Dean Stanford-295-talk (Software Engineering Advice from Building Large-Scale Distributed Systems):
Jeff Dean "Handling Large Datasets at Google":
Jeff Dean "A Behind the Scenes Tour":

And the following so-called GFS-II articles:
Sean Quinlan: GFS: Evolution on Fast-forward (