Monday, March 21, 2016

pg_keeper - Simple failover module for PostgreSQL

Here I introduce pg_keeper, a new module for PostgreSQL that I wrote.
pg_keeper is a deliberately simple failover module implemented as a PostgreSQL background worker.
With this module, you can set up a highly available replication environment with one master and one standby.

You no longer need to configure complex, dedicated HA middleware.
pg_keeper does the following two things:
  • polls the primary server
  • promotes the standby server when the primary stops responding
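
The two steps above can be sketched as a small shell loop. This is only an illustration of the poll-then-promote idea, not pg_keeper's actual code (pg_keeper is a C background worker). The function name, its arguments, the failure threshold, and the host name in the usage comment are all assumptions made for this example.

```shell
# Illustrative sketch only; pg_keeper itself runs inside PostgreSQL.
# poll_and_promote CHECK_CMD PROMOTE_CMD MAX_FAILURES INTERVAL
#   CHECK_CMD     command that succeeds while the primary is reachable (hypothetical)
#   PROMOTE_CMD   command that promotes the standby (hypothetical)
#   MAX_FAILURES  consecutive failed polls before promoting
#   INTERVAL      seconds to wait between polls
poll_and_promote() {
    check_cmd=$1
    promote_cmd=$2
    max_failures=$3
    interval=$4
    failures=0

    while :; do
        if sh -c "$check_cmd" >/dev/null 2>&1; then
            failures=0    # primary answered: reset the counter
        else
            failures=$((failures + 1))
            echo "could not reach primary server, $failures time(s)"
            if [ "$failures" -ge "$max_failures" ]; then
                echo "promoting standby server"
                sh -c "$promote_cmd"
                return $?
            fi
        fi
        sleep "$interval"
    done
}

# Possible usage (host name and port are placeholders):
#   poll_and_promote "pg_isready -h master-host -p 5432" \
#                    "pg_ctl promote -D slave-data" 2 5
```

Resetting the counter on every successful poll means that only consecutive failures trigger a promotion, so a single dropped connection does not cause a failover.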

1. Installation

You can get the source code from GitHub.
<https://github.com/MasahikoSawada/pg_keeper>

 $ git clone https://github.com/MasahikoSawada/pg_keeper.git
 $ cd pg_keeper
 $ make USE_PGXS=1
 $ su
 # make install USE_PGXS=1

2. Set up pg_keeper

Please refer to the README in the repository.


3. Starting the two servers

On the master server:
$ pg_ctl start -D master-data
On the standby server:
$ pg_ctl start -D slave-data
The running processes are:
14084 pts/2 S 0:00 /home/postgres/pgsql/9.5/bin/postgres -D master-data
14086 ? Ss 0:00 postgres: checkpointer process
14087 ? Ss 0:00 postgres: writer process
14088 ? Ss 0:00 postgres: wal writer process
14089 ? Ss 0:00 postgres: autovacuum launcher process
14090 ? Ss 0:00 postgres: stats collector process
14105 pts/2 S 0:00 /home/postgres/pgsql/9.5/bin/postgres -D slave-data
14109 ? Ss 0:00 postgres: startup process waiting for 000000010000000000000003
14110 ? Ss 0:00 postgres: checkpointer process
14111 ? Ss 0:00 postgres: writer process
14112 ? Ss 0:00 postgres: stats collector process
14113 ? Ss 0:00 postgres: bgworker: pg_keeper
14114 ? Ss 0:00 postgres: wal receiver process streaming 0/3000090
14115 ? Ss 0:00 postgres: wal sender process postgres [local] streaming 0/3000090

The output above shows that PID 14113 is the pg_keeper process, which is polling the master server.
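
To pick the pg_keeper worker out of a process listing like the one above, a small filter can help. This is a sketch that assumes the `ps x` column layout (PID, TTY, STAT, TIME, COMMAND) and the process title shown above; the function name keeper_pid is made up for this example, and the title may differ in other PostgreSQL versions.

```shell
# keeper_pid: read `ps x`-style output on stdin and print the PID of
# every line whose command is exactly "postgres: bgworker: pg_keeper".
# Comparing whole fields keeps the filter from matching its own awk process.
keeper_pid() {
    awk '$5 == "postgres:" && $6 == "bgworker:" && $7 == "pg_keeper" { print $1 }'
}

# Possible usage:
#   ps x | keeper_pid
```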

4. Switching primary server

Here, I kill the primary server process with kill -9:
$ kill -9 14084
After that, pg_keeper promotes the standby server, emitting the following log messages:


LOG:  could not get tuple from primary server at 1 time(s)
LOG:  could not get tuple from primary server at 2 time(s)
LOG:  promote standby server to primary server
LOG:  received promote request
LOG:  redo done at 0/5000028
LOG:  selected new timeline ID: 2
LOG:  archive recovery complete
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
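
If you want to check for the failover from a script, one option is to scan the standby's server log for the messages shown above. The following is a rough sketch; the function name promoted_in_log and the log file path in the usage comment are assumptions, since the log location depends on your logging configuration.

```shell
# promoted_in_log LOGFILE
# Succeeds only if the log contains the pg_keeper promotion message
# followed later by the "ready to accept connections" message.
promoted_in_log() {
    logfile=$1
    awk '
        /promote standby server to primary server/ { seen = 1 }
        seen && /database system is ready to accept connections/ { ok = 1 }
        END { exit (ok ? 0 : 1) }
    ' "$logfile"
}

# Possible usage (the path is a placeholder):
#   promoted_in_log slave-data/pg_log/postgresql.log && echo "standby was promoted"
```

Another way to confirm the result is to run SELECT pg_is_in_recovery() on the former standby, which returns false once the promotion has finished.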


Pull requests, issues, and feedback are very welcome.
Please check it out and test it.




