YDB

Overview

YDB (Yang’s Database) is a simple replicated in-memory store, developed to research approaches to recovery in OLTP-optimized databases such as VoltDB (formerly H-Store/Horizontica).

Currently, the only implemented recovery mechanism is to have the first replica that joined serialize its entire database state and send it to the joining node.

If you start a system of n replicas, then the leader will wait for n-1 of them to join before it starts issuing transactions. (Think of n-1 as the minimum number of replicas the system requires before it is willing to process transactions.) Then when replica n joins, it will need to catch up to the current state of the system, and it will do so by contacting that first replica and receiving a complete dump of its DB state.
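The join rule above can be sketched as a small state tracker. This is only an illustration: `JoinTracker` and `JoinKind` are hypothetical names rather than types from the YDB sources, and the sketch assumes n >= 2 so that a standing replica always exists to recover from.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the YDB sources.
enum class JoinKind { Initial, Recovering };

class JoinTracker {
 public:
  explicit JoinTracker(std::size_t n) : n_(n), joined_(0), issuing_(false) {
    assert(n >= 2);  // assumption: recovery needs at least one standing replica
  }

  // Record a join and report whether the newcomer has to recover.
  JoinKind on_join() {
    // A replica that arrives after txns have started has missed some of them.
    JoinKind kind = issuing_ ? JoinKind::Recovering : JoinKind::Initial;
    ++joined_;
    if (joined_ + 1 >= n_) issuing_ = true;  // n-1 replicas are now present
    return kind;
  }

  bool issuing() const { return issuing_; }

 private:
  std::size_t n_;       // configured number of replicas
  std::size_t joined_;  // replicas that have joined so far
  bool issuing_;        // has the leader started issuing txns?
};
```

With n = 3, the first two joins return `JoinKind::Initial` and the third returns `JoinKind::Recovering`.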

The leader reports the current txn seqno to the joiner and starts streaming txns beyond that seqno to it; the joiner pushes these onto its backlog. The leader also instructs the first replica to snapshot its DB state at this txn seqno and to send the snapshot to the recovering node as soon as that node connects.
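A minimal sketch of the joiner's side of this exchange follows. The `Txn`, `Snapshot`, and `Joiner` types are hypothetical stand-ins (the real messages are Protocol Buffers types not shown here), a txn is reduced to a single key/value write, and the snapshot's seqno is assumed to be the last txn reflected in the dumped state.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>

// Hypothetical types; YDB's actual message formats are not shown in this README.
struct Txn {
  std::uint64_t seqno;  // position in the leader's order
  int key;
  int value;
};

using Db = std::map<int, int>;

struct Snapshot {
  std::uint64_t seqno;  // assumption: last txn already reflected in `state`
  Db state;
};

class Joiner {
 public:
  Joiner() : last_applied_(0), caught_up_(false) {}

  // Called for every txn the leader streams to this node.
  void on_txn(const Txn& t) {
    if (caught_up_) {
      apply(t);
    } else {
      backlog_.push_back(t);  // state not received yet, so defer
    }
  }

  // Called once, when the standing replica's dump arrives.
  void on_snapshot(const Snapshot& s) {
    db_ = s.state;
    last_applied_ = s.seqno;
    while (!backlog_.empty()) {  // replay everything that was deferred
      apply(backlog_.front());
      backlog_.pop_front();
    }
    caught_up_ = true;
  }

  const Db& db() const { return db_; }
  std::uint64_t last_applied() const { return last_applied_; }

 private:
  void apply(const Txn& t) {
    assert(t.seqno == last_applied_ + 1);  // txns arrive in order, without gaps
    db_[t.key] = t.value;
    last_applied_ = t.seqno;
  }

  Db db_;
  std::deque<Txn> backlog_;
  std::uint64_t last_applied_;
  bool caught_up_;
};
```

Because the snapshot is taken at exactly the seqno reported to the joiner, the first backlogged txn is the one immediately after the snapshot, which is what the assertion in `apply` checks.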

Setup

Requirements:

Boost 1.37
C++ Commons svn r1082
GCC 4.3.2
Lazy C++ 2.8.0
Protocol Buffers 2.0.0
State Threads 1.8

Usage

To start a leader to manage 3 replicas, run:

./ydb 3

This will listen on port 7654. Then to start the first two replicas, run:

./ydb localhost 7654 7655
./ydb localhost 7654 7656

This means “connect to the leader at localhost:7654, and listen on port 7655.” The replicas have to listen for connections from other replicas (namely the recovering replica).
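The two invocation forms can be told apart by argument count alone. The following is a sketch of how such arguments might be interpreted, not the parsing code YDB actually uses; `Config`, `parse_positive`, and `parse_args` are hypothetical names, and only the leader port 7654 comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical configuration record for one ydb process.
struct Config {
  bool is_leader = false;
  int num_replicas = 0;     // leader only
  std::string leader_host;  // replica only
  int leader_port = 0;      // replica only
  int listen_port = 0;
};

const int kLeaderPort = 7654;  // the port the leader is described as listening on

// Parse a whole string as a positive int; reject trailing junk such as "76x".
bool parse_positive(const std::string& s, int& out) {
  try {
    std::size_t pos = 0;
    int v = std::stoi(s, &pos);
    if (pos != s.size() || v <= 0) return false;
    out = v;
    return true;
  } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
    return false;
  }
}

// `args` excludes the program name: {"3"} or {"localhost", "7654", "7655"}.
bool parse_args(const std::vector<std::string>& args, Config& out) {
  Config c;
  if (args.size() == 1) {  // leader: number of replicas to manage
    c.is_leader = true;
    c.listen_port = kLeaderPort;
    if (!parse_positive(args[0], c.num_replicas)) return false;
  } else if (args.size() == 3) {  // replica: leader host, leader port, own port
    c.leader_host = args[0];
    if (!parse_positive(args[1], c.leader_port)) return false;
    if (!parse_positive(args[2], c.listen_port)) return false;
  } else {
    return false;
  }
  assert(c.listen_port > 0);  // every accepted config has a port to listen on
  out = c;
  return true;
}
```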

The recovering replica then joins:

./ydb localhost 7654 7657

It will connect to the first replica (on port 7655) and receive a DB dump from it.

To terminate the system, send SIGINT (Ctrl-C) to the leader, and a clean shutdown should take place. Each replica dumps its DB state to a temporary file; you can then compare these files to verify that they are identical.

Pseudo-code

Leader

foreach event
  if event == departure
    remove replica
  if event == join
    add replica
    send init msg to new replica
      who else is in the system
      which txn we're on
    start sending txns to new replica
    start handling responses from new replica
      read responses up till the current seqno
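The membership bookkeeping in the leader pseudo-code might look roughly like this in C++. Networking is omitted, `InitMsg` and `LeaderState` are hypothetical names, and "which txn we're on" is assumed to mean the seqno of the next txn to be issued.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical init message: the two fields named in the pseudo-code.
struct InitMsg {
  std::vector<std::string> peers;  // who else is in the system
  std::uint64_t seqno;             // which txn we're on (assumed: next to issue)
};

class LeaderState {
 public:
  // Join event: build the init message, then add the replica.
  InitMsg on_join(const std::string& addr) {
    assert(std::find(replicas_.begin(), replicas_.end(), addr) == replicas_.end());
    InitMsg msg{replicas_, next_seqno_};  // peers exclude the newcomer itself
    replicas_.push_back(addr);
    return msg;
  }

  // Departure event: remove the replica.
  void on_departure(const std::string& addr) {
    replicas_.erase(std::remove(replicas_.begin(), replicas_.end(), addr),
                    replicas_.end());
  }

  // Issue a txn and return its seqno.
  std::uint64_t issue_txn() { return next_seqno_++; }

  const std::vector<std::string>& replicas() const { return replicas_; }

 private:
  std::vector<std::string> replicas_;
  std::uint64_t next_seqno_ = 0;
};
```

A replica that joins after two txns have been issued receives an init message with `seqno == 2` and the addresses of the replicas that joined before it.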

Replica

start listening for conns from new replicas
  generate recovery msg from map
  send recovery msg to new replica
send join msg to leader
recv init msg from leader
start recving txns from leader
  if map is caught up
    apply txn directly
  else
    push onto backlog
foreach replica
  connect to replica
  recv recovery msg from replica
  apply the state
  apply backlog

Todo

Expose program options.
Add test suite.
Add benchmarking information, e.g.:
- txns/second normally
- txns/second during recovery
- time to recover
- bytes used to recover
Run some benchmarks, esp. on multiple physical hosts.
Figure out why things are running so slowly with >2 replicas.
Add a variant of the recovery scheme so that the standing replicas can just send any snapshot of their DB beyond a certain seqno. The joiner can simply discard from its leader-populated backlog any txns before the seqno of the actual state it receives. This way, no communication between the leader and the standing replicas needs to take place, and the replicas don’t need to wait for the new guy to join before they can continue processing txns.
Add a recovery scheme to recover from multiple replicas simultaneously.
Add richer transactions/queries/operations.
Add disk-based recovery methods.
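The proposed variant in which a standing replica sends any sufficiently recent snapshot could be sketched as below. It is not implemented in YDB; `recover_from_any_snapshot` and the types are hypothetical, the backlog is assumed to be ordered by seqno, and the snapshot's seqno is assumed to be the last txn it reflects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>

// Hypothetical types; a txn is reduced to a single key/value write.
struct Txn {
  std::uint64_t seqno;
  int key;
  int value;
};

using Db = std::map<int, int>;

struct Snapshot {
  std::uint64_t seqno;  // assumption: last txn already reflected in `state`
  Db state;
};

// Install the snapshot, drop backlog entries it already covers, replay the
// rest. Returns how many backlogged txns were discarded.
std::size_t recover_from_any_snapshot(Db& db, std::deque<Txn>& backlog,
                                      const Snapshot& snap) {
  db = snap.state;
  std::size_t discarded = 0;
  while (!backlog.empty() && backlog.front().seqno <= snap.seqno) {
    backlog.pop_front();  // already included in the snapshot
    ++discarded;
  }
  // The snapshot must be recent enough to meet the start of the backlog.
  assert(backlog.empty() || backlog.front().seqno == snap.seqno + 1);
  for (const Txn& t : backlog) db[t.key] = t.value;
  backlog.clear();
  return discarded;
}
```

The assertion captures the "beyond a certain seqno" condition: a snapshot older than the first backlogged txn would leave a gap that the joiner cannot fill.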

License

YDB is released under the GNU GPL3.

Contact

Back to assorted.sf.net.