Saturday, May 10, 2014

Notes: Distributed System -- Paxos

what is instance
  • An instance is the algorithm for choosing one value.
  • A round refers to a proposer's single attempt of a Phase 1 + Phase 2. A node can have multiple rounds in an instance of Paxos. A round id is globaly unique per instance across all nodes. This is sometimes called proposal number.
  • A node may take on several roles; most notably Proposer and Acceptor. In my answers, I'll assume each node takes on both roles.

A naive implementation of Paxos is not guaranteed to terminate because multiple nodes can leap-frog Prepare phases. There are two ways of getting around this. One is to have a random backoff before starting new Prepare phases. The second is to route all requests to a designated leader, that acts as proposer (The leader is chosen by a Paxos instance. See also Multi-paxos) 

roundId = i*M + index[node]where i is the ith round this node is starting (that is i is unique per node per paxos instance, and is monotonically increasing).

reference:  http://stackoverflow.com/questions/5850487/questions-about-paxos-implementation/10151660#10151660
----------------------
more paxos: Michael Deardeuff on stackoverflow
pseudo code
http://stackoverflow.com/questions/14435646/paxos-value-choice/14472334#14472334
 --------------------
Round ids (aka proposal numbers) should be increasing and must be unique per instance across all nodes.

-------------------------------
Paxos notes in Chinese
http://blog.csdn.net/m_vptr/article/details/8014220
 -------------------------------------------
No timeout in Paxos???? see below
----------------------------------
finish How to Build a Highly Available System, from section 6.4

------------------------------------------------------
lessons-learned-from-paxos: http://blog.willportnoy.com/2012/06/lessons-learned-from-paxos.html
explained a more practical Paxos

You may even choose to introduce nondeterminism: for example, I use randomized exponential backoff as a response to timeouts to allow a set of Paxos replicas to settle on a persistent leader allowing for faster consensus.

To make code deterministic for repeatable tests, it must have the same inputs. And the same inputs means the same network i/o, thread scheduling, random number generation - effectively all interaction with the outside world.

about timeout in Paxos
  1. Many theoretical algorithms from distributed systems use the Asynchronous Model of time, where there is no shared notion of time across different nodes. One technique used to reduce these methods to practice is to introduce timeouts on certain operations. There would be a timeout for the replica trying to become leader, a timeout for the non-leader replicas watching for a leader replica that has gone offline, and a timeout for the phase 2 rounds of voting. There is an important detail in the last timeout though - the better implementations of Paxos allow multiple instances to reach consensus in parallel (without picking a fixed factor of concurrency, as Stoppable Paxosdescribes). But it simply takes longer, even if purely by network latency, to drive more instances to consensus. So in the end, you either vary your timeout in proportion to the number of requests or you limit the number of concurrent requests so that the timeout is sufficient.

#11 how to avoid massive amount of messages passing in phase1
#15 queue with timeout can work as batching

----------------------------------------




Sunday, May 4, 2014

Hashing

hash function
page 264, clrs
h.k/ D bm.kA mod 1/c

Division Method:
h(k) = k mod m
where m is ideally prime

Multiplication Method:
h(k) = [(a k) mod 2w] (w >> r)
where a is a random odd integer between 2^(w-1) and 2^w, k is given by w bits, and m = table
size = 2r.

universal hashing

----------------------------------
Karp Rabin_wiki, Rolling Hash_wiki

However, there are two problems with this. First, because there are so many different strings, to keep the hash values small we have to assign some strings the same number. This means that if the hash values match, the strings might not match; we have to verify that they do, which can take a long time for long substrings. Luckily, a good hash function promises us that on most reasonable inputs, this won't happen too often, which keeps the average search time within an acceptable range.

hash functions used for karp rabin, go to wiki hash function
The key to the Rabin–Karp algorithm's performance is the efficient computation of hash values of the successive substrings of the text. One popular and effective rolling hash function treats every substring as a number in some base, the base being usually a large prime. For example, if the substring is "hi" and the base is 101, the hash value would be 104 × 1011 + 105 × 1010 = 10609 (ASCII of 'h' is 104 and of 'i' is 105)

multiple pattern search
The Rabin–Karp algorithm is inferior for single pattern searching to Knuth–Morris–Pratt algorithm, Boyer–Moore string search algorithm and other faster single pattern string searching algorithms because of its slow worst case behavior. However, it is an algorithm of choice for multiple pattern search.

------------------------------------------
Probing 
Liner probing
quadratic probing 
Double hashing
h(k, i) = (h1(k) + i * h2(k) mod m