browse svn | home page
YDB (Yang’s Database) is a simple replicated memory store, developed for the purpose of researching various approaches to recovery in such OLTP-optimized databases as VOLTDB (formerly H-Store/Horizontica).
Currently, the only recovery implemented mechanism is to have the first-joining replica serialize its entire database state and send that to the joining node.
If you start a system of n replicas, then the leader will wait for n-1 of them to join before it starts issuing transactions. (Think of n-1 as the minimum number of replicas the system requires before it is willing to process transactions.) Then when replica n joins, it will need to catch up to the current state of the system, and it will do so by contacting that first replica and receiving a complete dump of its DB state.
The leader will report the current txn seqno to the joiner, and start streaming txns beyond that seqno to the joiner, which the joiner will push onto its backlog. It will also instruct that first replica to snapshot its DB state at this txn seqno and prepare to send it to the recovering node as soon as it connects.
To start a leader to manage 3 replicas, run:
This will listen on port 7654. Then to start the first two replicas, run:
./ydb localhost 7654 7655 ./ydb localhost 7654 7656
This means “connect to the leader at localhost:7654, and listen on port 7655.” The replicas have to listen for connections from other replicas (namely the recovering replica).
The recovering replica then joins:
./ydb localhost 7654 7657
It will connect to the first replica (on port 7655) and receive a DB dump from it.
To terminate the system, send a sigint (ctrl-c) to the leader, and a clean shutdown should take place. The replicas dump their DB state to a tmp file, which you can then verify to be identical.
foreach event if event == departure remove replica if event == join add replica send init msg to new replica who else is in the system which txn we're on start sending txns to new replica start handling responses from new replica read responses up till the current seqno
start listening for conns from new replicas generate recovery msg from map send recovery msg to new replica send join msg to leader recv init msg from leader start recving txns from leader if map is caught up apply txn directly else push onto backlog foreach replica connect to replica recv recovery msg from replica apply the state apply backlog
Expose program options.
Add test suite.
Run some benchmarks, esp. on multiple physical hosts.
Figure out why things are running so slowly with >2 replicas.
Add a variant of the recovery scheme so that the standing replicas can just send any snapshot of their DB beyond a certain seqno. The joiner can simply discard from its leader-populated backlog any txns before the seqno of the actual state it receives. This way, no communication between the leader and the standing replicas needs to take place, and the replicas don’t need to wait for the new guy to join before they can continue processing txns.
Add a recovery scheme to recover from multiple replicas simultaneously.
Add richer transactions/queries/operations.
Add disk-based recovery methods.
YDB is released under the GNU GPL3.
Copyright 2008 Yang Zhang.
All rights reserved.
Back to assorted.sf.net.