YDB (Yang’s Database) is a simple replicated memory store, developed for the purpose of researching various approaches to recovery in such OLTP-optimized databases as VOLTDB (formerly H-Store/Horizontica).
Currently, the only recovery implemented mechanism is to have the first-joining replica serialize its entire database state and send that to the joining node.
If you start a system of n replicas, then the leader will wait for n-1 of them to join before it starts issuing transactions. (Think of n-1 as the minimum number of replicas the system requires before it is willing to process transactions.) Then when replica n joins, it will need to catch up to the current state of the system, and it will do so by contacting that first replica and receiving a complete dump of its DB state.
The leader will report the current txn seqno to the joiner, and start streaming txns beyond that seqno to the joiner, which the joiner will push onto its backlog. It will also instruct that first replica to snapshot its DB state at this txn seqno and prepare to send it to the recovering node as soon as it connects.
Requirements:
To start a leader to manage 3 replicas, run:
./ydb 3
This will listen on port 7654. Then to start the first two replicas, run:
./ydb localhost 7654 7655
./ydb localhost 7654 7656
This means “connect to the leader at localhost:7654, and listen on port 7655.” The replicas have to listen for connections from other replicas (namely the recovering replica).
The recovering replica then joins:
./ydb localhost 7654 7657
It will connect to the first replica (on port 7655) and receive a DB dump from it.
To terminate the system, send a sigint (ctrl-c) to the leader, and a clean shutdown should take place. The replicas dump their DB state to a tmp file, which you can then verify to be identical.
foreach event
if event == departure
remove replica
if event == join
add replica
send init msg to new replica
who else is in the system
which txn we're on
start sending txns to new replica
start handling responses from new replica
read responses up till the current seqno
start listening for conns from new replicas
generate recovery msg from map
send recovery msg to new replica
send join msg to leader
recv init msg from leader
start recving txns from leader
if map is caught up
apply txn directly
else
push onto backlog
foreach replica
connect to replica
recv recovery msg from replica
apply the state
apply backlog
Expose program options.
Add test suite.
Run some benchmarks, esp. on multiple physical hosts.
Figure out why things are running so slowly with >2 replicas.
Add a variant of the recovery scheme so that the standing replicas can just send any snapshot of their DB beyond a certain seqno. The joiner can simply discard from its leader-populated backlog any txns before the seqno of the actual state it receives. This way, no communication between the leader and the standing replicas needs to take place, and the replicas don’t need to wait for the new guy to join before they can continue processing txns.
Add a recovery scheme to recover from multiple replicas simultaneously.
Add richer transactions/queries/operations.
Add disk-based recovery methods.
YDB is released under the GNU GPL3.
Copyright 2008 Yang Zhang.
All rights reserved.
Back to assorted.sf.net.