binomial distribution - Probability of system failure in a distributed network

Saturday, 14 January 2012

binomial distribution - Probability of system failure in a distributed network

I am trying to build a mathematical model of the availability of a file in a distributed file-system. The system works like this: a node $x$ stores a file $f$ (encoed using erasure codes) at $rb$ remotes nodes, $r$ is the replication-factor where $b$ is a constant. Erasure-coded files have the property that the file can be restored $iff$ at least $b$ of the remote nodes are available and return its part of the file.

The simplest approach to this is to assume that all remote nodes are independent of each other and have the same availability $p$. With these assumptions the availability of a file becomes $$pf = sum_{i=b}^{rb} binom{rb}{i} p^i(1 - p)^{rb - i}$$

Unfortunately these assumptions can introduce a non-neligible error, as shown by this paper: http://deim.urv.cat/~lluis.pamies/uploads/Main/icpp09-paper.pdf.

The other extreme is to calculate the probability of each possible combination of availaible/non-available node and take the sum of all these outcomes (which is sort of what they suggest in the paper above, just more formally than what I just described). This approach is more correct but has a computational cost of $O(2^{rb})$.

Do you guys have any ideas of a good approximation which introduces less error than the binomial distribution-aproximation but with better computational cost than $O(2^{rb})$?

You can assume that the availability-data of each node is a set of tuples consisting of $(measurement-date, node measuring, node being measured, succes/failure-bit)$

Answer Desk

Saturday, 14 January 2012

binomial distribution - Probability of system failure in a distributed network

No comments:

Post a Comment