thoughts from the red planet

Entries from December 1, 2009 - December 31, 2009

The mathematics behind Hadoop-based systems

Monday, December 28, 2009

I wish I had known this a year ago. Now, with some simple mathematics, I can finally answer:

Why doesn't the speed of my workflow double when I double the amount of processing power?
Why does a 10% failure rate cause my runtime to go up by 300%?
How does optimizing out 30% of my workflow runtime cause the runtime to decrease by 80%?
How many machines should I have in my cluster to be adequately performant and fault-tolerant?

All of these questions are neatly answered by one simple equation.

Copyright © 2012-2019, Nathan Marz. All rights reserved.