PostgreSQL9.4, the latest version of PostgreSQL has been released at 18th Dec, 2014. This changes contain not only various newly functions but also evaluation existing function. One of them, the evolution of that GIN index is used for full text search, are
- Compression of GIN index
- Improvement of query speed in specialized case
1. Compression of GIN index
In version 9.4, the size of GIN index has reduced in large amounts using by compression method called "verbyte encoding". The image are followings.
- Use 1-6 bytes for showing difference
- Therefore, we can show number up to 127 in one byte.
- 8th bit is set to '1' for showing over 128.
- Therefore, the setting 1 to 8th bit means "we use also next byte for showing difference"
- The first 10 bits shows offset number, other bits show block number
127 = 01111111
128 = 11111111 00000001
2. Improvemtn of query speed in specialized case
In full text searching, some keys are used in searching are generated based on query key word. And postgres returns query result using by these keys. (For example, pg_bigm generates 2-characters keys). version 9.3 or earlier has been using all keys(ctid) of each posting list to judge when postgres search tuple which matches to all key of posting list, so far.
But, in above case, there are no need to judge all keys. It's just enough to use two keys "と京" has. That is, postgres need to judge in "some ctids whose posting list less most keys has" to avoid unnecessary scans.
In version 9.4, postgres uses keys whose posting list is less most keys9.4 has, to match keys. This changes leaded to improve performance. But it does not mean for all cases. It will be effective for query which has rare data and frequent data, especially.