Monday, Dec 28, 2009
The mathematics behind Hadoop-based systems
I wish I had known this a year ago. Now, with some simple mathematics, I can finally answer:
- Why doesn't the speed of my workflow double when I double the amount of processing power?
- Why does a 10% failure rate cause my runtime to go up by 300%?
- How can optimizing away 30% of my workflow's runtime cause the total runtime to decrease by 80%?
- How many machines should I have in my cluster to be adequately performant and fault-tolerant?
All of these questions are neatly answered by one simple equation: