Monday, March 21, 2016

pg_keeper - Simple failover module for PostgreSQL

I introduce a new module for PostgreSQL called pg_keeper I made, here.
pg_keeper is a quite simplified failover module based on back ground worker for PostgreSQL.
Using this module, you can set up high available replication environment with 1-master and 1-standby.

And you don't need complexly configuration for dedicated HA middleware any more.
pg_keeper does following two things,
  • pool to primary server
  • promote standby server

1. Installaiton

You can get this source code from github.
<https://github.com/MasahikoSawada/pg_keeper>

 $ git clone git@github.com:MasahikoSawada/pg_keeper.git
 $ cd pg_keeper
 $ make USE_PGXS=1
 $ su
 # make install USE_PGXS=1

2. Set up pg_keeper

Please refer to README.


 3. Starting of two servers

In master server,
$ pg_ctl start -D master-data
In slave server,
$ pg_ctl start -D slave-data
Processes are,
14084 pts/2 S 0:00 /home/postgres/pgsql/9.5/bin/postgres -D master-data
14086 ? Ss 0:00 postgres: checkpointer process
14087 ? Ss 0:00 postgres: writer process
14088 ? Ss 0:00 postgres: wal writer process
14089 ? Ss 0:00 postgres: autovacuum launcher process
14090 ? Ss 0:00 postgres: stats collector process
14105 pts/2 S 0:00 /home/postgres/pgsql/9.5/bin/postgres -D slave-data
14109 ? Ss 0:00 postgres: startup process waiting for 000000010000000000000003
14110 ? Ss 0:00 postgres: checkpointer process
14111 ? Ss 0:00 postgres: writer process
14112 ? Ss 0:00 postgres: stats collector process
14113 ? Ss 0:00 postgres: bgworker: pg_keeper
14114 ? Ss 0:00 postgres: wal receiver process streaming 0/3000090
14115 ? Ss 0:00 postgres: wal sender process postgres [local] streaming 0/3000090

The above shows PID 14113 is pg_keeper process. pg_keeper is pooling to master server.

4. Switching primary server

 In here, I kill the primary server process by "kill -9 14084" command.
$ kill -9 14084
After that, pg_keeper promotes standby server while emitting following log messages,


LOG:  could not get tuple from primary server at 1 time(s)
LOG:  could not get tuple from primary server at 2 time(s)
LOG:  promote standby server to primary server
LOG:  received promote request
LOG:  redo done at 0/5000028
LOG:  selected new timeline ID: 2
LOG:  archive recovery complete
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started


Pull request, Issue and feedback are very welcome.
Please checkout and test it.





Tuesday, February 10, 2015

Evolution of GIN index - PostgreSQL9.4



0. Introduction
PostgreSQL9.4, the latest version of PostgreSQL has been released at 18th Dec, 2014. This changes contain not only various newly functions but also evaluation existing function. One of them, the evolution of that GIN index is used for full text search, are
  1. Compression of GIN index
  2. Improvement of query speed in specialized case
This article would show the detail of these improvement. If you thought that what's the GIN index? or what's the full text search?, you can read and refer to this(Japanese).

1. Compression of GIN index
In version 9.4, the size of GIN index has reduced in large amounts using by compression method called "verbyte encoding". The image are followings.


In 9.3 or earlier, posting list has each ctid of tuple(block number and offset number). In contrast version 9.4, posting list has only a number which show the difference between one before item. Therefore, we can reduce the size of index 19 bytes in above case. The difference which shows its block id and its offset number is used under the following rules.
  • Use 1-6 bytes for showing difference
  • Therefore, we can show number up to 127  in one byte.
  • 8th bit is set to '1' for showing over 128.
  • Therefore, the setting 1 to 8th bit means "we use also next byte for showing difference"
  • The first 10 bits shows offset number, other bits show block number
For example,
127 = 01111111
128 = 11111111 00000001

2. Improvemtn of query speed in specialized case
In full text searching, some keys are used in searching are generated based on query key word. And postgres returns query result using by these keys. (For example, pg_bigm generates 2-characters keys). version 9.3 or earlier has been using all keys(ctid) of each posting list to judge when postgres search tuple which matches to all key of posting list, so far.



But, in above case, there are no need to judge all keys. It's just enough to use two keys "と京" has. That is, postgres need to judge in "some ctids whose posting list less most keys has" to avoid unnecessary scans.

In version 9.4, postgres uses keys whose posting list is less most keys9.4 has, to match keys. This changes leaded to improve performance.  But it does not mean for all cases. It will be effective for query which has rare data and frequent data, especially.

Friday, December 19, 2014

GINインデックスの進化 - PostgreSQL9.4

この記事はPostgreSQL Advent Calendar 2014の12/19担当分です。
昨日の12月18日、ついにPostgreSQL9.4がリリースされました!!

0. はじめに
PostgreSQL9.4では様々な新機能が組み込まれたほか、既存機能の進化もありました。 その中でも全文検索等に使われる”GINインデックス”では、
  1. インデックスサイズの縮小
  2. 特定ケースでの検索速度向上
の2つの大きな進化がありました。
本記事では、これらの内容について具体的にどのような改善があったかをご紹介いたします。GINインデックスって?全文検索って? という方はこちら(pg_bigmを用いた全文検索のしくみ)をご参照ください。

1. インデックスサイズの縮小
9.4ではGINインデックスサイズが大幅に縮小されました。
”varbyteエンコーディング”といわれる圧縮方法を用いて、GINインデックスのPostingListを圧縮することでインデックスサイズの縮小を実現しています。ざっくり書くと下図のようになります。


9.3までのPostingListの各要素にはタプルのctid(ブロック番号とオフセット番号)が記録されていたのに対し、9.4のPostingListの各要素には"一つ前の要素からの差分"が記録されています。このようにすることで、上記の例では19bytesの圧縮が実現しています。
そして、”ひとつ前の要素の差分”を記録する際の”差分”は以下のように表しています。

  • 1 - 6Byte(s)で差分を表現
  • 最初の10bitはオフセット番号、残りはブロック番号で使用
  • 各1Byteでは127までの数字を表すことができる
  • 128以上の数字は第8ビット目を1にする
  • 「第8ビット目が1」= 「次の1byteも含めてサイズを計算する
例えば、
127 = 01111111
128 = 11111111 00000001
という感じ。

2. 特定ケースでの検索速度向上
全文検索では検索キーワードを元にいくつかのキー生成します。そして、キーを元にGINインデックスを検索し、各キーのPostingListの内容を照らしあわせて、結果を返します
(たとえば、pg_bigmでは検索キーワードを2文字ずつ一文字ずらしながらキーを生成します)。これまでは、各PostingListすべてにマッチしたタプルを見つける際に、タプルのctidが最小のものから順番にすべてのctidについて判別を行っていました。


しかし、例えば上記のケースの場合、”と京”にぶら下がっているPostingListには2つしか要素が無いため、すべてのctidについて判別していく必要はありまあせん。上記のような場合は”一番要素数の少ないPostingListのctid”だけ(上記例の場合は(1,2)と(2,1)のみ)をチェックすれば無駄な処理をスキップすることができます。


9.4からは上図のように、各キーが持つPostingListの要素数の”一番少ないPostingList”を元に結果を判別していきます。この変更がGINインデックスの検索性能を向上させたました。ただすべての検索において劇的な向上があったわけではなく、特に「稀なデータとよく出るデータを含んだ検索」をする時がこの改善効果が大きく出ます。

3. まとめ
PostgreSQL9.4のGIN改善内容の紹介はいかがでしたでしょうか?圧縮によりインデックスサイズも小さくなり、検索性能も向上しました。また、9.4に新しく追加されたJSONB型もGINインデックスと組み合わせることで検索性能を向上させることができます。9.4の正式にリリースされたので、今後のGINインデックスのさらなる発展を期待しながら使ってみましょう!
(まさか9.4リリースの次の日の担当になるとは。たまたま新機能についてと書こうと思ってたからよかった。)

明日のPostgreSQL Advent Calendar 2014はkwatchさんです。よろしくおねがいします!

Saturday, September 13, 2014

pg_dbms_stats - Fixing relation and column statistics and planning, and managing stable query performance - PostgreSQL

pg_dbms_stats, which is a little popular module in Japan, fix relation(it means table or index) and column statistics, and then make stable query planning. This article mainly shows how to operate it with simple scenario as follows. If you want to know basically usage (i.g., installation, creation of extension), please check out the official manual.
  1. Lock relation and column statistics
  2. Confirmation of state of locked statistics
  3. Confirmation of statistics used by planner
1. Lock relation and column statistics
dbms_stats.lock() function with one argument locks both relation and column statistics. Also we can specify two (schema, relation), or three(schema, relation, column) arguments to this function.
=# SELECT relname, relpages, reltuples, relallvisible FROM pg_class WHERE relname = 'hoge';
+-----------+------------+-------------+---------------+
| relname | relpages | reltuples | relallvisible |
+-----------+------------+-------------+---------------+
| hoge      |      1       |  100       |             1    |
+-----------+------------+-------------+---------------+
(1 row)
=# SELECT dbms_stats.lock('hoge');
+------+
| lock |
+------+
| hoge |
+------+
(1 row)
2. Confirmation of state of locked statistics
The each locked statistics are stored into dbms_stats._relation_stats_locked or dbms_stats._column_stats_locked.
=# SELECT * FROM dbms_stats._relation_stats_locked;
+-[ RECORD 1 ]-----+-------------------------------------------------+
| relid                      | 16384                                                |
| relname                | public.hoge                                         |
| relpages               | 1                                                         |
| reltuples               | 100                                                     |
| relallvisible           | 1                                                         |
| curpages              | 1                                                         |
| last_analyze          |                                                           |
| last_autoanalyze   | 2014-08-18 22:35:13.744151+09 |
+------------------+-------------------------------------------------------+
=# SELECT * FROM dbms_stats._column_stats_locked;
+-[ RECORD 1 ]+-------+
| starelid        | 16384 |
| staattnum   | 1     |
| stainherit    |  f     |
| stanullfrac  | 0     |
| stawidth     | 4     |
| stadistinct  | 1     |
| stakind1     | 1     |
| stakind2     | 3     |
| stakind3     | 0     |
| stakind4     | 0     |
| stakind5     | 0     |
| staop1       | 96    |
| staop2       | 97    |
| staop3       | 0     |
| staop4       | 0     |
| staop5       | 0     |
| stanumbers1 | {1}   |
| stanumbers2 | {1}   |
| stanumbers3 |       |
| stanumbers4 |       |
| stanumbers5 |       |
| stavalues1  | {0}   |
| stavalues2  |       |
| stavalues3  |       |
| stavalues4  |       |
| stavalues5  |       |
+-------------+-------+

3. Confirmation of statistics used by planner
The planner use both dbms_stats.column_stats_effective and dbms_stats.relation_stats_effective tables actually. These tables are merged two tables(original one and locked one) tables. i.g., The locked statistics are used preferentially if there are, and then original statistics are used for rest of unlocked statistics.

You might want to say after reading this article, it is possible same operation with disable analyzing  and autovacuum. Note that pg_dbms_stats has also actually physical size, which is used for planning, into dbms_stats.relation_stats_locked. So planning will be stable.

Monday, September 8, 2014

Customize psql prompting format - PostgreSQL

I introduce about how to display line number on psql command, and how to customize them here. Actually this function needs latest version of PostgreSQL (from 9.5).
Here is development version manual.

The manual says,
%l
The line number inside the current statement, starting from 1.
We can set variable settings to psql output style, output format, prompt characters and so on. One of them is showing line number on psql.
You guys may used to it for now, it's really simple as default.
e.g., we can set prompt formatting and sample operation as follows.
postgres=# \set PROMPT1 '[%/][%>](%l)%R%# '
[postgres][55432](1)=# \set PROMPT2 '[%/][%>](%l)%R%# '
[postgres][55432](1)=# SELECT
[postgres][55432](2)=# aaaaaa -- error position
[postgres][55432](3)=# FROM
[postgres][55432](4)=# hoge; 
ERROR:  column "aaaaaa" does not exist
LINE 2: aaaaaa
        ^ 
How is this?
The prompt character means database name(%/), port number,(%>) line number (%l)from left. Please see describe them in manual.

If you feel waste to set prompting characters whenever execution of psql then psql read setting file (~/.psqlrc) before lunching itself. So if we write them to setting file then we don't need to set manually each time. Please try that. The sample .psqlrc file is following.
$ cat ~/.psqlrc
\pset border 2
\set PROMPT2 '[%/][%>](%l)%R%# '
\set PROMPT1 '[%/][%>](%l)%R%# '