PostgreSQL cluster: partitioning with plproxy (part II)

In the last post I described how to set up plproxy and create a basic horizontally partitioned cluster. Now we will take a look at another real-life use: building a read-only cluster for your database.

Distributing read-only load

The simplest real-world use for plproxy is redirecting read-only queries to read-only replicas of the master database. The replicas can be filled with data via Londiste, which is part of the SkyTools package (a setup tutorial can be found here), or with Slony, which is a more heavyweight solution. In my own experience Slony is also harder to set up and maintain, though for the time being it is definitely better documented.
A typical read-only cluster could look like the following diagram. The databases marked with the letter (P) are connection poolers. We use PgBouncer, but pgpool is also an option.
The poolers are needed to minimize the number of open connections to a database, and because execution plans are cached per connection, reusing pooled connections also keeps those plans available. Of course, everything also works fine without the poolers. Dashed bold arrows represent replication.
Figure: read-only cluster

In this setup the plproxy functions determine the database to which each query is redirected. Read-write queries go to the master database, and read-only queries are distributed to the read-only replicas according to the algorithm you define.
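To make the routing idea concrete, here is a minimal Python sketch of it. It is only a model of the decision, not plproxy itself: the `CLUSTERS` and `FUNCTION_CLUSTER` tables and the `route` helper are hypothetical names, the connection strings mirror the write, ro1, and ro2 databases used in this post, and picking a random replica is assumed as the simplest possible distribution algorithm.

```python
import random

# Hypothetical connection strings mirroring the databases used in this post.
CLUSTERS = {
    "write": ["host=127.0.0.1 dbname=write"],
    "readonly": ["host=127.0.0.1 dbname=ro1", "host=127.0.0.1 dbname=ro2"],
}

# Which cluster each function call is sent to: add_user writes, login only reads.
FUNCTION_CLUSTER = {"add_user": "write", "login": "readonly"}


def route(function_name, rng=random):
    """Return the connection string a call to function_name would be sent to."""
    cluster = FUNCTION_CLUSTER[function_name]
    # Choosing any member at random models the simplest distribution algorithm.
    return rng.choice(CLUSTERS[cluster])


if __name__ == "__main__":
    print(route("add_user"))  # always the master
    print(route("login"))     # one of the two replicas
```

A different algorithm (round robin, weighting by replica capacity) would only change the body of `route`; callers would not notice.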
Setting up replication itself is relatively easy once you have gotten through the painful SkyTools installation process.
First let us create both replicas, from the write database to ro1 and ro2. The configuration file for ro1 looks like this:
replica1.ini

[londiste]
job_name = londiste_master_to_r1
provider_db = dbname=write
subscriber_db = dbname=ro1
# it will be used as sql ident so no dots/spaces
pgq_queue_name = londiste.write
pidfile = %(job_name)s.pid
logfile = %(job_name)s.log
use_skylog = 0

replica2.ini is basically the same; only the job name and the subscriber database name need to be changed. Now let's install Londiste on the provider (write) and the subscribers (ro1, ro2) and start the replication daemons:

mbpro:~/temp kristokaiv$ londiste.py replica1.ini provider install
mbpro:~/temp kristokaiv$ londiste.py replica1.ini subscriber install
mbpro:~/temp kristokaiv$ londiste.py replica2.ini subscriber install
mbpro:~/temp kristokaiv$ londiste.py replica1.ini replay -d
mbpro:~/temp kristokaiv$ londiste.py replica2.ini replay -d

The next thing you need to do is set up the ticker process on the database where the writes are performed. The ticker creates sync events, so running it at shorter intervals reduces latency. My configuration file looks like this:
ticker_write.ini

[pgqadm]
job_name = ticker_write
db = dbname=write
# how often to run maintenance [minutes]
maint_delay_min = 1
# how often to check for activity [secs]
loop_delay = 0.1
logfile = %(job_name)s.log
pidfile = %(job_name)s.pid
use_skylog = 0

To start the ticker as a daemon just run:

mbpro:~/temp kristokaiv$ pgqadm.py ticker_write.ini ticker -d

Let's create a simple table that we will replicate from the master to the read-only databases:

mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key, password text);" write
mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key, password text);" ro1
mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key, password text);" ro2

And add it to replication:

mbpro:~/temp kristokaiv$ londiste.py replica1.ini provider add users
mbpro:~/temp kristokaiv$ londiste.py replica1.ini subscriber add users
mbpro:~/temp kristokaiv$ londiste.py replica2.ini subscriber add users

After some time the tables should be up to date. Insert a new record in the write database and check that it is delivered to both read-only databases.
The functions to insert into and select from the users table:

CREATE OR REPLACE FUNCTION public.add_user(
    in i_username text,
    in i_password text,
    out status_code text
) AS $$
BEGIN
    PERFORM 1 FROM users WHERE username = i_username;
    IF NOT FOUND THEN
        INSERT INTO users (username, password) VALUES (i_username, i_password);
        status_code = 'OK';
    ELSE
        status_code = 'user exists';
    END IF;
    RETURN;
END; $$ LANGUAGE plpgsql SECURITY DEFINER;

GRANT EXECUTE ON FUNCTION public.add_user(
    in i_username text,
    in i_password text,
    out status_code text
) TO plproxy;

CREATE OR REPLACE FUNCTION login(
    in i_username text,
    in i_password text,
    out status_code text
) AS $$
BEGIN
    SELECT 'OK' FROM users u WHERE username = i_username AND password = i_password INTO status_code;
    IF NOT FOUND THEN status_code = 'FAILED'; END IF;
    RETURN;
END; $$ LANGUAGE plpgsql SECURITY DEFINER;

GRANT EXECUTE ON FUNCTION login(
    in i_username text,
    in i_password text,
    out status_code text
) TO plproxy;

For the convenience of those actually trying to follow these steps, here is the proxy database's cluster configuration:

CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions (cluster_name text)
RETURNS SETOF text AS $$
BEGIN
    -- one connection string per partition of the requested cluster
    IF cluster_name = 'readonly' THEN
        RETURN NEXT 'host=127.0.0.1 dbname=ro1';
        RETURN NEXT 'host=127.0.0.1 dbname=ro2';
        RETURN;
    ELSIF cluster_name = 'write' THEN
        RETURN NEXT 'host=127.0.0.1 dbname=write';
        RETURN;
    END IF;
    RAISE EXCEPTION 'no such cluster: %', cluster_name;
END; $$ LANGUAGE plpgsql SECURITY DEFINER;

CREATE OR REPLACE FUNCTION plproxy.get_cluster_config(
    in cluster_name text,
    out key text,
    out val text)
RETURNS SETOF record AS $$
BEGIN
    RETURN;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text) RETURNS int AS $$
    SELECT 1;
$$ LANGUAGE SQL;

The last thing left to do is to create the plproxy function definitions that redirect login calls to the read-only databases and add_user calls to the write database:

CREATE OR REPLACE FUNCTION public.login(
    in i_username text,
    in i_password text,
    out status_code text
) AS $$
CLUSTER 'readonly'; RUN ON ANY;
$$ LANGUAGE plproxy;

CREATE OR REPLACE FUNCTION public.add_user(
    in i_username text,
    in i_password text,
    out status_code text
) AS $$
CLUSTER 'write';
$$ LANGUAGE plproxy;

That is it, the read-only cluster is ready. Note that even though creating such a read-only cluster looks like a simple and quick fix for your performance problems, it is not a silver bullet. Asynchronous replication often creates more problems than it solves, so be careful to replicate only non-time-critical data, or provide a fallback for when data is not found (e.g. the proxy function first checks the read-only database and, if the data is not found, looks it up in the write database).
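The fallback idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `login_with_fallback` is a hypothetical helper, and the two dicts stand in for the users table on a lagging replica and on the write database.

```python
def login_with_fallback(username, password, replica, master):
    """Check the replica first; if the user is missing there, ask the master.

    replica and master are plain dicts (username -> password) standing in
    for the users table on a read-only replica and on the write database.
    """
    for users in (replica, master):
        if username in users:
            return "OK" if users[username] == password else "FAILED"
    return "FAILED"


if __name__ == "__main__":
    master = {"alice": "secret", "bob": "hunter2"}
    replica = {"alice": "secret"}  # bob has not been replicated yet
    print(login_with_fallback("bob", "hunter2", replica, master))
```

Note that this only covers rows that have not arrived yet. A row that exists on the replica but is stale (for example, an old password) is still answered from the replica, which is why time-critical data is better left off the replicas.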

About this entry

You’re currently reading “PostgreSQL cluster: partitioning with plproxy (part II),” an entry on <(-_-)> on PostgreSQL

Published: September 2, 2007 / 6:41 pm

Category: cluster, partitioning, plproxy, postgresql

ahti 9.20.07 / 8am

Hi

I have been reading your blog from the very beginning, and it looks like I am also the first commenter.
These tutorials seem quite practical, so I finally got around to it and started experimenting in a virtual machine, first with a Postgres database and perhaps later with a Postgres cluster across several virtual machines.
Keep up the good work 🙂

Jon 10.18.07 / 6pm

Excellent article. I’ve been playing around with all the skytools for a while and your 2-parter here does a great job of summarizing / implementing these features.

kaiv 10.19.07 / 8pm

Thank you. Do you have any suggestions on what I should write about in my next post?

Dragan Zubac 3.14.08 / 11pm

Hello

Very nice of Skype to publish PostgreSQL code as open source. I have only one question regarding all these partitioning/clustering issues: if you have partitioned your table across various servers/nodes and you have a large percentage of updates, how do you deal with the 'vacuum' operation, which is a must in situations with updates?

Any thoughts ?

Sincerely

Dragan

Anonymous 3.15.08 / 8am

No special magic involved, vacuum runs on every node separately. Of course, the ability to do quick switchovers even lets you cluster data or do non-blocking ALTER TABLEs on the switchover servers.

And by non-blocking I meant, of course, that it doesn't block the live users.

steve 4.10.08 / 2am

For example, say you have one partition where a few power users are located and you want to move these power users onto a separate partition to balance the load. How would you move users from one partition to another? Maybe this will be discussed in another post.

kaiv 4.10.08 / 1pm

cluster ‘yourclustername’; run on yourcustomfunction(username);

For example, when you have 4 partitions, you can split normal users across 3 of them and send power users to the 4th one, like this:

yourcustomfunction()
lookup if user is power user from a table, if so return 3
otherwise return hashtext(user) % 3

The partitioning function can be whatever you like: partitioning by first name, by product category, etc. Hashing is just good for even distribution of load.
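As a rough illustration of the custom partitioning function described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: `POWER_USERS` stands in for the lookup table, `partition_for` is a hypothetical name, and `zlib.crc32` is only a stand-in for PostgreSQL's hashtext(). Unlike crc32, hashtext() can return negative values, so a real partitioning function would have to handle that before taking the remainder.

```python
import zlib

POWER_USERS = {"bigcorp"}   # hypothetical stand-in for the power-user lookup table
POWER_PARTITION = 3         # the fourth of four partitions (numbered from 0)
NORMAL_PARTITIONS = 3       # partitions 0..2 serve everyone else


def partition_for(username):
    """Return the partition number a user's data lives on."""
    if username in POWER_USERS:
        return POWER_PARTITION
    # crc32 is a stand-in for hashtext(); it is always non-negative.
    return zlib.crc32(username.encode("utf-8")) % NORMAL_PARTITIONS


if __name__ == "__main__":
    for name in ("bigcorp", "alice", "bob"):
        print(name, partition_for(name))
```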

steve 4.10.08 / 6pm

Kaiv,

I guess you can write a custom function to move all the records related to the power users from the partition1 tables to the partition2 tables. The main thing here is to pick primary keys for the tables during the design phase so that there aren't any primary key collisions between the partition1 tables and the partition2 tables.

Any advice on how to wisely choose a primary key (auto increment by offset, or hash) to avoid primary key collisions among the partitions' tables when moving records related to power users?

kaiv 4.11.08 / 1pm

I understand that power users are already existing users, so you should not have issues with id collisions there anyway. But when talking about numeric primary keys on partitions, incrementing PKs by the number of partitions like this:
part 0, start=0 increment by 4
part 1, start=1 increment by 4
part 2, start=2 increment by 4
part 3, start=3 increment by 4
is probably not the best idea(tm); it can get a bit complicated to manage when you need to split your cluster into more partitions. It's better to keep a large enough PK (bigint) that consists of two parts: first a partition id that is unique system-wide, and then the local incrementor.
part 1, start=1<<32 increment by 1
part 2, start=2<<32 increment by 1
part 3, start=3<<32 increment by 1
part 4, start=4<<32 increment by 1
I really doubt that you will hit the int limit on any of the partitions; by that time you will most likely have split the partitions already anyway.
The downside of this is of course that the PK ids are not sequential.
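The two-part key can be sketched in Python as plain bit arithmetic. The helper names `make_id` and `split_id` are hypothetical, and the 32-bit width of the local counter is taken from the `<<32` starts listed above; the range checks assume the result has to fit a signed 64-bit bigint.

```python
LOCAL_BITS = 32                      # width of the per-partition counter
LOCAL_MASK = (1 << LOCAL_BITS) - 1   # lowest 32 bits


def make_id(partition_id, local_counter):
    """Combine a partition id and a local counter into one bigint-sized key."""
    if not 0 <= partition_id < (1 << 31):
        raise ValueError("partition id does not fit a signed 64-bit key")
    if not 0 <= local_counter <= LOCAL_MASK:
        raise ValueError("local counter overflowed its 32 bits")
    return (partition_id << LOCAL_BITS) | local_counter


def split_id(pk):
    """Recover (partition_id, local_counter) from a combined key."""
    return pk >> LOCAL_BITS, pk & LOCAL_MASK


if __name__ == "__main__":
    first = make_id(1, 0)
    print(first)            # 4294967296, i.e. 1<<32
    print(split_id(first))  # (1, 0)
```

A side benefit of this layout is that the partition a row was created on can be read straight from its key, without a lookup.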

Andrew Grytsenko 6.26.08 / 6pm

Is it possible to set up PgBouncer so that when one database fails, the data is redirected to a backup database?
If not, are there any "ready to use" solutions to that problem?

Anonymous 6.26.08 / 10pm

You would not believe how complicated it is to decide under what conditions to do so 🙂 But leaving that aside, I think you can do it with an external script that changes the run on connection aliases.

joaocosme 7.2.08 / 1pm

Would you mind if I translate your how-to to PT_BR (Brazilian Portuguese)? I'm from the Brazilian PostgreSQL community.

darkanthey 7.10.08 / 12pm

Can we create a function with a variable argument list with the help of plproxy? For example, I need to call a function to do an update, but only at call time will I know which columns will be changed.

Anonymous 7.10.08 / 1pm

You can write normal plpgsql functions on the proxy layer, so you could do that inside a plpgsql function. Maybe I misunderstood you though. What exactly are you trying to accomplish?

darkanthey 7.10.08 / 4pm

For example, like this. But this is not a beautiful method. What I need is an analog of *args.

CREATE OR REPLACE FUNCTION "Account".updater_user(_name character varying, _email character varying, _id bigint)
RETURNS void AS $BODY$
DECLARE
    sql_argument varchar;
BEGIN
    sql_argument := '';
    if _name notnull then
        sql_argument := 'name = ''' || _name || '''';
    end if;
    if _email notnull then
        if sql_argument != '' then
            sql_argument := sql_argument || ',';
        end if;
        sql_argument := sql_argument || 'email = ''' || _email || '''';
    end if;
    sql_argument := sql_argument || ' where id = ' || _id;
    execute 'update "Account"."Account" set ' || sql_argument;
END;
$BODY$ LANGUAGE plpgsql;

kaiv 7.15.08 / 11am

Dynamic SQL in big databases is something I would never do. It's perfectly OK for small back-office-like applications, but not at our data volumes.

realqi.cn 10.22.09 / 8am

Hi, I have quoted it on my blog. Is that OK?

Kristo 10.23.09 / 10am

Yes, it's OK as long as you link to it.

PostgreSQL partitioning with plproxy (part II) | Yet Another SA's Blog 12.19.09 / 12pm

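[…] I found this article while studying and reposted it here; the original requires getting around the firewall, and the original is here. This article is not the original, I just quote it from https://kaiv.wordpress.com and […]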

brealania 12.31.09 / 6pm

Sorry for commenting off topic … which wordpress template are you using? It looks great!!

kaiv 1.1.10 / 10am

Hemingway by Kyle Neath

