Saturday, May 10, 2014

Notes: Distributed System -- Paxos

what is instance
  • An instance is the algorithm for choosing one value.
  • A round refers to a proposer's single attempt of a Phase 1 + Phase 2. A node can have multiple rounds in an instance of Paxos. A round id is globaly unique per instance across all nodes. This is sometimes called proposal number.
  • A node may take on several roles; most notably Proposer and Acceptor. In my answers, I'll assume each node takes on both roles.

A naive implementation of Paxos is not guaranteed to terminate because multiple nodes can leap-frog Prepare phases. There are two ways of getting around this. One is to have a random backoff before starting new Prepare phases. The second is to route all requests to a designated leader, that acts as proposer (The leader is chosen by a Paxos instance. See also Multi-paxos) 

roundId = i*M + index[node]where i is the ith round this node is starting (that is i is unique per node per paxos instance, and is monotonically increasing).

reference:  http://stackoverflow.com/questions/5850487/questions-about-paxos-implementation/10151660#10151660
----------------------
more paxos: Michael Deardeuff on stackoverflow
pseudo code
http://stackoverflow.com/questions/14435646/paxos-value-choice/14472334#14472334
 --------------------
Round ids (aka proposal numbers) should be increasing and must be unique per instance across all nodes.

-------------------------------
Paxos notes in Chinese
http://blog.csdn.net/m_vptr/article/details/8014220
 -------------------------------------------
No timeout in Paxos???? see below
----------------------------------
finish How to Build a Highly Available System, from section 6.4

------------------------------------------------------
lessons-learned-from-paxos: http://blog.willportnoy.com/2012/06/lessons-learned-from-paxos.html
explained a more practical Paxos

You may even choose to introduce nondeterminism: for example, I use randomized exponential backoff as a response to timeouts to allow a set of Paxos replicas to settle on a persistent leader allowing for faster consensus.

To make code deterministic for repeatable tests, it must have the same inputs. And the same inputs means the same network i/o, thread scheduling, random number generation - effectively all interaction with the outside world.

about timeout in Paxos
  1. Many theoretical algorithms from distributed systems use the Asynchronous Model of time, where there is no shared notion of time across different nodes. One technique used to reduce these methods to practice is to introduce timeouts on certain operations. There would be a timeout for the replica trying to become leader, a timeout for the non-leader replicas watching for a leader replica that has gone offline, and a timeout for the phase 2 rounds of voting. There is an important detail in the last timeout though - the better implementations of Paxos allow multiple instances to reach consensus in parallel (without picking a fixed factor of concurrency, as Stoppable Paxosdescribes). But it simply takes longer, even if purely by network latency, to drive more instances to consensus. So in the end, you either vary your timeout in proportion to the number of requests or you limit the number of concurrent requests so that the timeout is sufficient.

#11 how to avoid massive amount of messages passing in phase1
#15 queue with timeout can work as batching

----------------------------------------




No comments:

Post a Comment