Saturday, June 28, 2014

Notes: Distributed System -- Chubby, Zookeeper

Use of Chubby or Zookeeper
  • Group membership and name services
    By having each node register an ephemeral znode for itself (and any roles it might be fulfilling), you can use ZooKeeper as a replacement for DNS within your cluster. Nodes that go down are automatically removed from the list, and your cluster always has an up-to-date directory of the active nodes.
  • Distributed mutexes and master election
    We discussed these potential uses for ZooKeeper above in connection with sequential nodes. These features can help you implement automatic fail-over within your cluster, coordinate concurrent access to resources, and make other decisions in your cluster safely.
  • Asynchronous message passing and event broadcasting
    Although other tools are better suited to message passing when throughput is the main concern, I’ve found ZooKeeper to be quite useful for building a simple pub/sub system when needed. In one case, a cluster needed a long sequence of actions to be performed in the hours after a node was added or removed in the cluster. On demand, the sequence of actions was loaded into ZooKeeper as a group of sequential nodes, forming a queue. The “master” node processed each action at the designated time and in the correct order. The process took several hours and there was a chance that the master node might crash or be decommissioned during that time. Because ZooKeeper recorded the progress on each action, another node could pick up where the master left off in the event of any problem.
  • Centralized configuration management
    Using ZooKeeper to store your configuration information has two main benefits. First, new nodes only need to be told how to connect to ZooKeeper and can then download all other configuration information and determine the role they should play in the cluster for themselves. Second, your application can subscribe to changes in the configuration, allowing you to tweak the configuration through a ZooKeeper client and modify the cluster’s behavior at run-time.
Reference:
http://blog.cloudera.com/blog/2013/02/how-to-use-apache-zookeeper-to-build-distributed-apps-and-why/

http://www.ibm.com/developerworks/library/bd-zookeeper/

awesome read:  https://developer.yahoo.com/hadoop/tutorial/module6.html#intro
-------------------------------------------
group membership:
example
The Group Membership Service (GMS) uses self-defined system membership. Processes can join or leave the distributed system at any time. The GMS communicates this information to every other member in the system, with certain consistency guarantees. Each member in the group participates in membership decisions, which ensures that either all members see a new member or no members see it.
The membership coordinator, a key component of the GMS, handles "join" and "leave" requests, and also handles members that are suspected of having left the system. The system automatically elects the oldest member of the distributed system to act as the coordinator, and it elects a new one if the member fails or is unreachable. The coordinator's basic purpose is to relay the current membership view to each member of the distributed system and to ensure the consistency of the view at all times.
Because the SQLFire distributed system is dynamic, you can add or remove members in a very short time period. This makes it easy to reconfigure the system to handle added demand (load).The GMS permits the distributed system to progress under conditions in which a statically-defined membership system could not. A static model defines members by host and identity, which makes it difficult to add or remove members in an active distributed 

------------------------------------
Note:
write: only through master
read: any nodes can handle it

use: leader election, lock, consensus/voting?
more uses: http://ksjeyabarani.blogspot.com/2012/11/zookeeper-as-distributed-consensus.html
zookeeper recipe: http://zookeeper.apache.org/doc/current/recipes.html