Entries from December 1, 2009 - December 31, 2009

Monday
Dec 28, 2009

The mathematics behind Hadoop-based systems

I wish I had known this a year ago. Now, with some simple mathematics, I can finally answer:

  • Why doesn't the speed of my workflow double when I double the amount of processing power?
  • Why does a 10% failure rate cause my runtime to go up by 300%?
  • How does optimizing out 30% of my workflow runtime cause the runtime to decrease by 80%?
  • How many machines should I have in my cluster to be adequately performant and fault-tolerant?

All of these questions are neatly answered by one simple equation:

Click to read more ...