<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>&#60;(-_-)&#62; on PostgreSQL</title>
	<atom:link href="http://kaiv.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://kaiv.wordpress.com</link>
	<description>don't rely on luck, count on it</description>
	<lastBuildDate>Fri, 30 Sep 2011 15:38:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='kaiv.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>&#60;(-_-)&#62; on PostgreSQL</title>
		<link>http://kaiv.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://kaiv.wordpress.com/osd.xml" title="&#60;(-_-)&#62; on PostgreSQL" />
	<atom:link rel='hub' href='http://kaiv.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Defining constants in Pl/pgSQL, with a twist</title>
		<link>http://kaiv.wordpress.com/2008/03/02/defining-constants-in-plpgsql-with-a-twist/</link>
		<comments>http://kaiv.wordpress.com/2008/03/02/defining-constants-in-plpgsql-with-a-twist/#comments</comments>
		<pubDate>Sun, 02 Mar 2008 18:06:49 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[postgresql]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tips&tricks]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/?p=29</guid>
		<description><![CDATA[For the past few months i have been thinking about how to implement constants (like for example order status codes) in your code so that you maintain readability even when the constants are defined as integers and you can&#8217;t do anything about it. Of course i would prefer having status codes as text (eg. order [...]]]></description>
			<content:encoded><![CDATA[<p>For the past few months I have been thinking about how to implement constants (for example order status codes) so that the code stays readable even when the constants are defined as integers and you can&#8217;t do anything about it. Of course I would prefer having status codes as text (e.g. order CREATED, DELIVERED, REVERSED), as in PostgreSQL this doesn&#8217;t create a noticeable performance penalty and there are no limitations on text column sizes. But you don&#8217;t just refactor all your database code and go berserk with ALTER COLUMN statements, as this is way too much work. So you still have the integer status codes in your table and have to define lengthy <code>C_ORDER_DELIVERED CONSTANT int := 1;</code> blocks in the DECLARE section of your functions to keep the code readable. This gets extremely annoying when there are, let&#8217;s say, 20 different statuses: should I declare them all in the DECLARE section or just the ones used by the function? However, this can be avoided with a small trick&#8230;<br />
<span id="more-29"></span></p>
<h2>Handling constants as record type attributes</h2>
<p>Then one day I discovered that our Java team implements constants as a simple function that returns one record: each status&#8217;s textual code is the name of an OUT parameter, and the integer value of the status is stored in that parameter.</p>
<pre>
<code>
CREATE OR REPLACE FUNCTION order_statuses(
    OUT CREATED int,
    OUT DELIVERED int,
    OUT CANCELLED int
) RETURNS record AS
$$
BEGIN
     CREATED := 1;
     DELIVERED := 2;
     CANCELLED := 3;
END;
$$ LANGUAGE plpgsql;
</code></pre>
<p>This is a really neat idea, you only have to select the output of the function into a record type variable and Voilà!</p>
<pre>
<code>
C_ORDER = order_statuses();
SELECT * FROM orders WHERE key_user = i_username AND key_status = C_ORDER.DELIVERED;
</code></pre>
<p><b><br />
Note that this approach also has its drawbacks: the planner cannot match the query against partial indexes, and when you are scanning a table solely on this key, hardcoding the constant gives you a more accurate plan because the planner can then use the statistics collected for the column.<br />
</b></p>
<p>There is no need to declare anything besides the C_ORDER record, and the code is readable, with no integer constants inside.<br />
However, such an approach has its shortcomings: when you add a new status you need to drop the previous constant-returning function and create a new one, which is quite annoying when you have the same statuses stored in a table and could fetch them from there. But a function must always return a predefined, fixed set of output parameters, so it cannot return a record of variable length.</p>
<h2>Creating records with variable length</h2>
<p>Then I figured out that this approach actually works if we make one little exception to our &#8220;no SQL should be dynamically created inside the DB&#8221; policy. It&#8217;s perfectly OK to create dynamic SQL if you are not doing any data access within it and use it just for converting data. (Doing data fetching inside dynamically generated SQL makes monitoring the database performance and access patterns a pain in the ass.)</p>
<p>So here is what i came up with:</p>
<pre>
<code>
CREATE TABLE classificator.classificator(
    id_classificator serial PRIMARY KEY,
    classificator text,
    code int,
    lookup text,
    description text
);

CREATE INDEX idx_classificator ON classificator.classificator(classificator);

INSERT INTO classificator.classificator (classificator, code, lookup, description) VALUES ('order_status',1,'CREATED','Order created');
INSERT INTO classificator.classificator (classificator, code, lookup, description) VALUES ('order_status',2,'DELIVERED','Order delivered');
INSERT INTO classificator.classificator (classificator, code, lookup, description) VALUES ('order_status',3,'CANCELLED','Order cancelled');

CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

CREATE OR REPLACE FUNCTION classificator.get_constant_structure_sql(
    IN i_classificator text,
    OUT exec_str text
) RETURNS text AS $$
DECLARE
    _lookup text[];
    _key_integer int[];
    _keys text;
    _vals text;
BEGIN
    -- collect the quoted lookup names and their integer codes in the same row order
    SELECT array_accum('"'||c.lookup||'"'),
           array_accum(c.code)
      FROM classificator.classificator c
     WHERE c.classificator = i_classificator
      INTO _lookup,
           _key_integer;

    -- an aggregate query always returns one row, so FOUND is always true here;
    -- test for an empty result array instead
    IF array_upper(_lookup, 1) IS NULL THEN
         RETURN;
    END IF;

    _keys = array_to_string(_lookup, ', ');
    _vals = array_to_string(_key_integer, ', ');
    exec_str = 'SELECT * FROM (VALUES ('||_vals||')) AS t ('||_keys||')';
    RETURN;
END
$$ LANGUAGE plpgsql SECURITY DEFINER;
</code></pre>
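<p>To make the string building easier to follow outside the database, here is a minimal Python sketch of the same idea. The function name build_constant_sql and the in-memory list of (code, lookup) pairs are assumptions for illustration only; they stand in for the rows of classificator.classificator and are not part of the original setup.</p>

```python
def build_constant_sql(rows):
    """Build a one-row SELECT whose column names are the lookup codes.

    rows is a list of (code, lookup) pairs, standing in for the rows of
    classificator.classificator that belong to one classificator name.
    """
    if not rows:
        return None  # nothing to build, mirrors the early RETURN in PL/pgSQL
    # Double any embedded quote so each lookup stays a valid quoted identifier.
    keys = ", ".join('"' + lookup.replace('"', '""') + '"' for _, lookup in rows)
    # Codes are integers, so they can be written into the SQL text as-is.
    vals = ", ".join(str(int(code)) for code, _ in rows)
    return "SELECT * FROM (VALUES (" + vals + ")) AS t (" + keys + ")"


statuses = [(3, "CANCELLED"), (2, "DELIVERED"), (1, "CREATED")]
print(build_constant_sql(statuses))
```

<p>The values and the column names are built from the same row order, which is the property that keeps each code attached to the right name.</p>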
<p>As you can see, I don&#8217;t care much about the 3NF (Third Normal Form) practice of keeping all status codes in separate tables. I personally am not a big fan of 3NF as it has its own drawbacks: having a swarm of small status code &amp; other similar tables in your system makes it too difficult to locate anything. However, if you like, you can write this function so that it handles only one table (e.g. it creates the status codes record only for the table order_statuses).</p>
<p>Using it inside a function is really simple, as you can see from the following example:</p>
<pre>
<code>
...
DECLARE
    _rec_struct text;
    C_ORDER record;
BEGIN
    _rec_struct = classificator.get_constant_structure_sql('order_status');
    EXECUTE _rec_struct INTO C_ORDER;
    SELECT * FROM orders WHERE key_status = C_ORDER."DELIVERED";
...
</code></pre>
<p>The code is clean, simple and readable, and takes only 3 lines for every constant set that you need defined. It also works nicely as a check: code that refers to a constant that is not yet in the table fails instead of silently using a wrong value.</p>
<p>Here&#8217;s a small data dump to visualize what&#8217;s going on inside these functions:</p>
<pre>
<code>
select * from classificator.classificator;
 id_classificator | classificator | code |  lookup   |   description
------------------+---------------+------+-----------+-----------------
                1 | order_status  |    1 | CREATED   | Order created
                2 | order_status  |    2 | DELIVERED | Order delivered
                3 | order_status  |    3 | CANCELLED | Order cancelled
(3 rows)

select * from classificator.get_constant_structure_sql('order_status');
                                  exec_str
-----------------------------------------------------------------------------
 SELECT * FROM (VALUES (3, 2, 1)) AS t ("CANCELLED", "DELIVERED", "CREATED")
(1 row)

SELECT * FROM (VALUES (3, 2, 1)) AS t ("CANCELLED", "DELIVERED", "CREATED");
 CANCELLED | DELIVERED | CREATED
-----------+-----------+---------
         3 |         2 |       1
(1 row)
</code></pre>
<p>So now we have the possibility to create a record inside every PL/pgSQL function that holds all the constants, changes dynamically when new constants are added, and spares us a lot of code in the DECLARE sections of our functions. Hope this idea was useful; I for sure plan to implement it wherever possible <img src='http://s1.wp.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<h2>Promo corner</h2>
<p><img src="http://c.skype.com/i/images/logos/skype_logo.png" alt="skype" /> is <a href="http://jobs.skype.ee/vacancy.html?ref=QA-BE-DATA-EE">looking for a DB QA engineer</a>, the job is located in Tallinn. The team is small and brilliant, this is a great opportunity if you are a quick learner and are interested how big database clusters actually work!</p>
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2008/03/02/defining-constants-in-plpgsql-with-a-twist/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>

		<media:content url="http://c.skype.com/i/images/logos/skype_logo.png" medium="image">
			<media:title type="html">skype</media:title>
		</media:content>
	</item>
		<item>
		<title>PostgreSQL substring search</title>
		<link>http://kaiv.wordpress.com/2007/12/11/postgresql-substring-search/</link>
		<comments>http://kaiv.wordpress.com/2007/12/11/postgresql-substring-search/#comments</comments>
		<pubDate>Tue, 11 Dec 2007 21:07:30 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[postgresql]]></category>
		<category><![CDATA[query tuning]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tips&tricks]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/12/11/postgresql-substring-search/</guid>
		<description><![CDATA[As most of you have probably discovered there is no nice way to do substring search in PostgreSQL however where&#8217;s a will there&#8217;s a way. The following post is about overcoming this limit, as with all such cases it is somewhat specific to the problem that we are solving and has it&#8217;s drawbacks. However i [...]]]></description>
			<content:encoded><![CDATA[<p>As most of you have probably discovered, there is no nice way to do substring search in PostgreSQL; however, where there&#8217;s a will there&#8217;s a way. This post is about overcoming that limit. As with all such cases, the solution is somewhat specific to the problem we are solving and has its drawbacks. However, I hope it might be of some help to the really desperate folks.<span id="more-27"></span>This somewhat hackish approach relies on the full text search contrib module called <a href="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/">Tsearch2</a> by Oleg Bartunov and Teodor Sigaev. I tried to use their trigram module first but it didn&#8217;t really help me out, so here is the solution. Let&#8217;s create a really simple substring search on email addresses. For input we have the following table:</p>
<pre>
<code>
    Table "public.tmp_email"
  Column  |   Type   | Modifiers
----------+----------+-----------
 email    | text     |
 trigrams | tsvector |
</code></pre>
<p>trigrams is a column added solely to help us search emails by substring; it contains all the unique sequential 3-letter combinations used in the email field. For example, for the email &#8216;kristo@eesti.ee&#8217; it would consist of</p>
<pre>o@e est sti .ee @ee sto ris ees ist ti. kri i.e to@</pre>
<p>We will do this with a simple PL/Python function (it can be done in PL/pgSQL as well, but it looks nicer in Python):</p>
<pre>
<code>
CREATE FUNCTION text_to_trigrams(i_txt text) RETURNS text AS $$
    trigrams = {}
    for i in range(len(i_txt)-2):
        trigrams[i_txt[i:i+3]] = 1
    return ' '.join(trigrams.keys())
$$ LANGUAGE plpythonu SECURITY DEFINER;
</code></pre>
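<p>For trying the trigram splitting outside the database, a standalone Python sketch of the same logic could look like this. It returns a set instead of a space-separated string, which is a choice made for this illustration only.</p>

```python
def text_to_trigrams(text):
    """Return the set of unique 3-character substrings of text."""
    # A string shorter than 3 characters gives an empty range, so an empty set.
    return {text[i:i + 3] for i in range(len(text) - 2)}


trigrams = text_to_trigrams("kristo@eesti.ee")
print(" ".join(sorted(trigrams)))
```

<p>The set holds the same 13 trigrams as listed for &#8216;kristo@eesti.ee&#8217; above, only in a different order.</p>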
<p>In a production system you would probably be calculating the trigrams in an update/insert trigger on the table doing the same operation provided in the following query:</p>
<pre>
<code>
update tmp_email set trigrams = to_tsvector('simple',text_to_trigrams(email));
</code></pre>
<p>The principle itself is really simple: after splitting up the string in the email column into these unique 3-letter combinations, we create a full text index on the trigrams column, which enables us to search for a predefined set of trigrams inside a string.</p>
<pre>
<code>
create index idx_email_substring on tmp_email using gin (trigrams);
</code></pre>
<p>Be warned that the index is quite hefty. In my test case with live e-mail addresses the index was just a bit bigger than the table itself; remember, I warned you about the drawbacks <img src='http://s1.wp.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  However, the search itself performs quite acceptably. Before doing the search we have to create the search condition. This is achieved by building a query from the trigrams that we are looking for, e.g. we want to find users whose e-mail address includes the substring &#8216;to@skype&#8217; (the equivalent of LIKE &#8216;%to@skype%&#8217;):</p>
<pre>
<code>
test=# select replace(text_to_trigrams('to@skype'),' ',' &amp; ');
              replace
-----------------------------------
 o@s &amp; sky &amp; @sk &amp; ype &amp; to@ &amp; kyp
</code></pre>
<p>This query will find all the trigrams that are contained within the string and replace the spaces we needed for tsvector creation with &amp; operations needed for the tsquery. So using this generated string we can do the search in the following way:</p>
<pre><code>
test=# explain analyze select * from tmp_email where trigrams @@ to_tsquery('simple','o@s &amp; sky &amp; @sk &amp; ype &amp; to@ &amp; kyp');
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_email_substring on tmp_email  (cost=0.00..15.50 rows=14 width=64) (actual time=0.086..3.349 rows=2 loops=1)
   Index Cond: (trigrams @@ '''o'' &amp; ''s'' &amp; ''sky'' &amp; ''sk'' &amp; ''ype'' &amp; ''to'' &amp; ''kyp'''::tsquery)
 Total runtime: 3.379 ms

test=# explain analyze select count(*) from tmp_email ;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=676.60..676.61 rows=1 width=0) (actual time=45.656..45.657 rows=1 loops=1)
   -&gt;  Seq Scan on tmp_email  (cost=0.00..642.28 rows=13728 width=0) (actual time=0.009..27.038 rows=13728 loops=1)
 Total runtime: 45.701 ms
(3 rows)

test=# select count(*) from tmp_email ;
 count
-------
 13728
</code></pre>
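<p>The search condition step can also be sketched in plain Python. The helper names and the sample addresses below are made up for illustration. The sketch adds one thing the queries above do not show: a final recheck of the real substring, because a row can contain every trigram of the search string without containing the string itself.</p>

```python
def text_to_trigrams(text):
    """Return the set of unique 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def to_tsquery_string(substring):
    """Join the trigrams of substring with the tsquery AND operator."""
    return " & ".join(sorted(text_to_trigrams(substring)))


def substring_search(rows, substring):
    """Filter rows by trigram containment, then recheck the real substring."""
    wanted = text_to_trigrams(substring)
    # Step 1: candidate rows contain every wanted trigram (what the index answers).
    candidates = [row for row in rows if wanted.issubset(text_to_trigrams(row))]
    # Step 2: trigrams may occur without being adjacent, so verify the substring.
    return [row for row in candidates if substring in row]


emails = ["kristo@skype.net", "to@sky-kype.com", "someone@example.com"]
print(to_tsquery_string("to@skype"))
print(substring_search(emails, "to@skype"))
```

<p>Here the second address passes the trigram filter but is dropped by the recheck, so only the first address is returned.</p>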
<p>I know that this is not the nicest way to do this, but until we have some generic substring search implemented in PostgreSQL this is at least one way to do it for certain cases. If you want to know in more detail how the solution described above works, I suggest you take a look at the <a href="http://conference.postgresql.org/download/TFCKUpload/73.pdf">GIN spec</a>. Be aware that this currently only works if the substring is at least 3 characters long, since a shorter string yields no trigrams to search for.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/12/11/postgresql-substring-search/feed/</wfw:commentRss>
		<slash:comments>35</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>
	</item>
		<item>
		<title>Getting current time inside a transaction</title>
		<link>http://kaiv.wordpress.com/2007/11/02/getting-current-time-inside-a-transaction/</link>
		<comments>http://kaiv.wordpress.com/2007/11/02/getting-current-time-inside-a-transaction/#comments</comments>
		<pubDate>Fri, 02 Nov 2007 15:07:07 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[postgresql]]></category>
		<category><![CDATA[tips&tricks]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/11/02/getting-current-time-inside-a-transaction/</guid>
		<description><![CDATA[This has already been on the lists but i hope it will be useful to somebody who has not time to read all of them. If you are doing stuff inside a transaction you may have noticed that the the results of NOW(), CURRENT_TIMESTAMP etc. will always return the same value, eg. test=# begin; BEGIN [...]]]></description>
			<content:encoded><![CDATA[<p>This has already been on the lists, but I hope it will be useful to somebody who has no time to read all of them. If you are doing stuff inside a transaction you may have noticed that NOW(), CURRENT_TIMESTAMP etc. always return the same value, e.g.</p>
<pre>
<code>
test=# begin;
BEGIN
test=# select now();
              now
-------------------------------
 2007-11-02 16:57:38.011621+02
(1 row)

test=# select now();
              now
-------------------------------
 2007-11-02 16:57:38.011621+02
(1 row)

test=# commit;
COMMIT
</code>
</pre>
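<p>The time returned is always the transaction start time (I don&#8217;t remember exactly whether that is the time of the BEGIN statement or of the first SQL statement). However, this can be overcome: there is a function that, for reasons I don&#8217;t know/remember, acts differently and reads the wall clock on every call &#8211; it&#8217;s called timeofday(). It returns text, hence the cast to timestamptz in the example. Since PostgreSQL 8.2 there is also clock_timestamp(), which returns the current wall-clock time as a timestamptz directly.</p>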
<pre>
<code>
test=# begin;
BEGIN
test=# select timeofday()::timestamptz;
           timeofday
-------------------------------
 2007-11-02 17:01:15.042648+02
(1 row)

test=# select timeofday()::timestamptz;
           timeofday
-------------------------------
 2007-11-02 17:01:19.266846+02
(1 row)

test=# commit;
COMMIT
</code>
</pre>
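<p>The difference between the two behaviours can be pictured with a toy Python model. This is only an illustration of the semantics shown in the two sessions above, not how PostgreSQL implements them; the class ToyTransaction is invented for this sketch.</p>

```python
import time


class ToyTransaction:
    """Toy model only: remembers its start time like a transaction does."""

    def __init__(self):
        # Fixed once, when the toy transaction begins.
        self.started_at = time.time()

    def now(self):
        # Like now(): always the remembered start time.
        return self.started_at

    def timeofday(self):
        # Like timeofday(): reads the wall clock on every call.
        return time.time()


tx = ToyTransaction()
first_now = tx.now()
first_clock = tx.timeofday()
time.sleep(0.05)
print(tx.now() == first_now)          # the remembered value does not move
print(tx.timeofday() - first_clock)   # seconds elapsed on the wall clock
```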
<p>Thanks to Ze / Hannu for pointing this out to me (bow)</p>
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/11/02/getting-current-time-inside-a-transaction/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>
	</item>
		<item>
		<title>Skytools database scripting framework &amp; PgQ</title>
		<link>http://kaiv.wordpress.com/2007/10/19/skytools-database-scripting-framework-pgq/</link>
		<comments>http://kaiv.wordpress.com/2007/10/19/skytools-database-scripting-framework-pgq/#comments</comments>
		<pubDate>Fri, 19 Oct 2007 18:20:53 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[postgresql]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[pgq]]></category>
		<category><![CDATA[skytools]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/10/19/skytools-database-scripting-framework-pgq/</guid>
		<description><![CDATA[introduction In this post we will look at the skytools scripting framework in general and also look into writing simple queue consumers. There are quite a lot of tasks in the database that don&#8217;t need immediate completion and can aswell run in the background. A simple example for this would be sending out e-mails of [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>In this post we will look at the skytools scripting framework in general and also look into writing simple queue consumers. There are quite a lot of tasks in the database that don&#8217;t need immediate completion and can just as well run in the background. A simple example would be sending out e-mails on user creation, password reminders etc. Usually these kinds of batch jobs are done with a queue table: the script fetches rows from it and removes each row after completion. PgQ lets us do this even more conveniently and performs a lot better.<br />
<span id="more-21"></span><br />
PgQ is the queueing solution that powers the londiste replication. If you have londiste installed in your database, you also have PgQ installed.<br />
So first of all let&#8217;s bring out a few key points why it&#8217;s better to use PgQ instead of queue tables:</p>
<ul>
<li>Performance : cleanup is done by rotating between 3 tables and using truncate to get rid of old data, no need for delete queries</li>
<li>Scalability : one PgQ queue can have basically unlimited consumers that keep their own high watermark and share the data</li>
<li>Retry queue: if a queue message can not be processed instantly it can be moved to the retry queue that will automatically reinsert the events into main queue later</li>
</ul>
<p>We will not look at any of the above topics in this post, but I promise I will do so sometime in the future&#8230;</p>
<h3>Setup</h3>
<p>If you already have londiste installed you don&#8217;t have to do anything.<br />
If not, locate the txid.sql &amp; pgq.sql files inside the skytools framework and load them into the database in that order.<br />
Create a configuration file for the ticker and start the ticker daemon. If you don&#8217;t know what I&#8217;m talking about, look at the ticker setup section in <a href="http://kaiv.wordpress.com/2007/09/02/postgresql-cluster-partitioning-with-plproxy-part-ii/">clustering with plproxy part II</a>.<br />
Create the queue:<br />
<code><br />
select * from pgq.create_queue('mailer');<br />
</code><br />
This should generate the result &#8216;1&#8217;, indicating that Marko&#8217;s brain was in set_no_documentation = 1 mode while writing the code <img src='http://s1.wp.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> <br />
More informative would be to do a select on the pgq.queue or in older versions pgq.queue_config table:<br />
<code><br />
queries=# select * from pgq.queue;<br />
-[ RECORD 1 ]------------+------------------------------<br />
queue_id                 | 1<br />
queue_name               | mailer<br />
queue_ntables            | 3<br />
queue_cur_table          | 0<br />
queue_rotation_period    | 02:00:00<br />
queue_switch_step1       | 1057688<br />
queue_switch_step2       | 1057688<br />
queue_switch_time        | 2007-10-19 18:57:55.775194+03<br />
queue_external_ticker    | f<br />
queue_ticker_max_count   | 500<br />
queue_ticker_max_lag     | 00:00:03<br />
queue_ticker_idle_period | 00:01:00<br />
queue_data_pfx           | pgq.event_1<br />
queue_event_seq          | pgq.event_1_id_seq<br />
queue_tick_seq           | pgq.event_1_tick_seq<br />
</code></p>
<h3>A peek inside the queue</h3>
<p>The queue data itself is stored in 3+1 tables: pgq.event_X is the parent table that the other 3 inherit from:<br />
<code><br />
queries=# \dt pgq.event_<br />
pgq.event_1         pgq.event_1_0       pgq.event_1_1       pgq.event_1_2       pgq.event_template<br />
</code><br />
So if you want to look at the data inside the queue, the main table is enough.<br />
The event itself is simple: it consists of 2 fields that are filled by the user, ev_type &amp; ev_data, both of which are text fields.<br />
An example of ev_type field values could be &#8216;I&#8217;, &#8216;U&#8217;, &#8216;D&#8217; for replication actions (Insert, Update, Delete).<br />
ev_data contains all the data that you want to send. It can be a single value, but we usually go for urlencoded strings:<br />
<code><br />
"user=kristokaiv&amp;email=kristo.kaiv@geemail.com"<br />
</code><br />
as the tools in skytools framework support this format.</p>
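<p>As a rough illustration of the format, here is how such a string can be produced and parsed with Python&#8217;s standard library. The standard urllib.parse module is used here as a stand-in; the skytools helpers themselves are not shown, and their exact escaping rules may differ.</p>

```python
from urllib.parse import parse_qsl, urlencode

# Producer side: encode key/value pairs into one ev_data string.
ev_data = urlencode({"user": "kristokaiv", "email": "kristo.kaiv@geemail.com"})

# Consumer side: decode the string back into a dict of fields.
fields = dict(parse_qsl(ev_data))

print(ev_data)
print(fields["email"])
```

<p>Encoding keeps the payload a single text value, which fits the text type of the ev_data field, while the consumer still gets named fields back.</p>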
<h3>Putting messages to queue</h3>
<p>For inserting events we ourselves usually use the following trick:<br />
create a dummy table that is used only to define the event&#8217;s structure<br />
<code><br />
queries=# create schema queue;<br />
CREATE SCHEMA<br />
queries=# create table queue.welcome_email(username text, language text, firstname text, lastname text);<br />
CREATE TABLE<br />
</code><br />
add a pgq.logutriga trigger to the table. What pgq.logutriga does is urlencode the inserted column/value pairs (column1=value1&amp;column2=value2...)<br />
and insert the result into the queue given as the first parameter. The last parameter is either &#8216;SKIP&#8217; or &#8216;OK&#8217;: &#8216;SKIP&#8217; means that the row is discarded after the trigger has processed it, &#8216;OK&#8217; means that the row is actually inserted into the table as well. Both are useful, but mostly you don&#8217;t need the data in the table for anything other than debugging.<br />
<code><br />
queries=# CREATE TRIGGER ins_to_queue BEFORE INSERT ON queue.welcome_email FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('mailer', 'welcome_email', 'SKIP');<br />
CREATE TRIGGER<br />
</code><br />
or if you don&#8217;t have the latest PgQ version then:<br />
<code><br />
queries=# CREATE TRIGGER ins_to_queue BEFORE INSERT ON queue.welcome_email FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('mailer','SKIP');<br />
CREATE TRIGGER<br />
</code><br />
So finally we come to the actual event creation:<br />
<code><br />
queries=# insert into queue.welcome_email (username, language, firstname, lastname) values ('kristokaiv','Kristo','Kaiv','kristo.kaiv@geemail.com');<br />
INSERT 0 0<br />
queries=# select * from pgq.event_1 where ev_data like '%kristokaiv%';<br />
-[ RECORD 1 ]------------------------------------------------------------------------------------<br />
ev_id     | 2<br />
ev_time   | 2007-10-19 19:56:53.353656+03<br />
ev_txid   | 1057748<br />
ev_owner  |<br />
ev_retry  |<br />
ev_type   | I:<br />
ev_data   | username=kristokaiv&amp;language=Kristo&amp;firstname=Kaiv&amp;lastname=kristo.kaiv%40geemail.com<br />
ev_extra1 | queue.welcome_email<br />
ev_extra2 |<br />
ev_extra3 |<br />
ev_extra4 |<br />
</code><br />
As you can see, for every column in our dummy table we have the inserted value, and everything is nicely encoded. The name of the table on which the trigger resides is also added to one of the extra columns (ev_extra1). This enables us to use multiple dummy queue tables that insert into one queue.</p>
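<p>Because the source table name travels with every event, a consumer can route events from several dummy tables that share one queue. The following is only a sketch of that idea in plain Python: the second table name (queue.password_reset) and both handler functions are hypothetical and exist purely for illustration.</p>

```python
def send_welcome_email(fields):
    # Placeholder handler; a real one would build and send the message.
    return "welcome mail for %s" % fields["username"]


def send_password_reset(fields):
    # Hypothetical handler for a second, hypothetical dummy table.
    return "password reset for %s" % fields["username"]


# Map the source table (the value found in ev_extra1) to its handler.
HANDLERS = {
    "queue.welcome_email": send_welcome_email,
    "queue.password_reset": send_password_reset,
}


def dispatch(source_table, fields):
    """Route one decoded event to the handler registered for its table."""
    handler = HANDLERS.get(source_table)
    if handler is None:
        raise ValueError("no handler for events from %s" % source_table)
    return handler(fields)


result = dispatch("queue.welcome_email", {"username": "kristokaiv"})
print(result)
```

<p>Raising an error for an unknown table is a deliberate choice in this sketch: silently dropping events from a table nobody registered would be much harder to notice.</p>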
<p>If you don&#8217;t like this solution much, or would like to use a different encoding, you are absolutely free to do so. If dummy tables are not your weapon of choice but urlencoding is fine, you could, for example, create the following function:</p>
<pre>
<code>
CREATE OR REPLACE FUNCTION public.urlencode (text, text)  RETURNS text AS $$
    import skytools
    key_value = {args[0]:args[1]}
    return skytools.db_urlencode (key_value)
$$ LANGUAGE plpythonu VOLATILE SECURITY DEFINER;
</code>
</pre>
<p>And later on explicitly concatenate the key/value pairs:<br />
<code><br />
perform pgq.insert_event('mailer','welcome_email',urlencode('username','kristo.kaiv')||'&amp;'||urlencode('email','kristo.kaiv@geemail.com'));<br />
</code><br />
but the dummy-table-based solution is, imho, quite easy to maintain.</p>
<h3>Consuming the event</h3>
<p>This is what our consumer looks like:</p>
<pre>
<code>
import sys, pgq, skytools

class Mailer(pgq.Consumer):
    def sendWelcomeMail(self, params):
        """try to send mail, return true on success, false on failure"""
        # placeholder: no mail is actually sent here
        return True

    def process_batch(self, src_db, batch_id, ev_list):
        for ev in ev_list:
            # ev.data holds the urlencoded ev_data column; decode it into a dict
            d = skytools.db_urldecode(ev.data)
            self.log.debug("event : %s | type : %s | inserted by : %s" % (d, ev.type, ev.extra1))
            if not self.sendWelcomeMail(d):
                # stop before tagging the event as done
                sys.exit(1)
            ev.tag_done()

if __name__ == '__main__':
    # arguments: config section name, name of the connection string parameter, command line arguments
    script = Mailer("mailer_daemon","src_db",sys.argv[1:])
    script.start()
</code>
</pre>
<p>Quite short, isn&#8217;t it?<br />
I didn&#8217;t actually remove the mail-sending part; it was never there. I admit I have absolutely no idea how to send an e-mail from Python. Sending mail probably isn&#8217;t the topic you are interested in anyway, so let&#8217;s skip it.<br />
The parameters that are given to pgq.Consumer are:</p>
<ul>
<li>name of the configuration section in config file : &#8220;mailer_daemon&#8221;</li>
<li>name of parameter from config file that contains the connection string to PgQ database : &#8220;src_db&#8221;</li>
<li>command line arguments</li>
</ul>
<p>Our configuration file looks like this:<br />
<code><br />
[mailer_daemon]<br />
job_name          = mailer_daemon<br />
src_db            = dbname=queries<br />
pgq_queue_name    = mailer<br />
logfile           = %(job_name)s.log<br />
pidfile           = %(job_name)s.pid<br />
</code></p>
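<p>The %(job_name)s placeholders in logfile and pidfile are filled in from the job_name value of the same section. As a rough illustration, the sketch below parses the same text with Python's standard configparser module. It assumes that the SkyTools config loader interpolates values the same way, which the file format suggests but which is not verified here.</p>

```python
import configparser

# The configuration from above, inlined so the example is self-contained.
CONFIG_TEXT = """
[mailer_daemon]
job_name          = mailer_daemon
src_db            = dbname=queries
pgq_queue_name    = mailer
logfile           = %(job_name)s.log
pidfile           = %(job_name)s.pid
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
section = parser["mailer_daemon"]

# Only the first "=" separates key and value, so the connection string survives intact.
print(section["src_db"])
# %(job_name)s is replaced with the job_name value from the same section.
print(section["logfile"])
print(section["pidfile"])
```

<p>Deriving the file names from job_name means that copying the config for a second job only requires changing one value.</p>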
<h3>Running the script &amp; monitoring</h3>
<p>Start the script with:<br />
<code><br />
python mailer.py mailer.conf -v<br />
</code><br />
-v means that all self.log.debug output will also be displayed, which is useful for debugging what your script does. To run the script as a background process, use the -d switch. If you start the process in the background, be sure you have a pidfile defined in the script config.<br />
The script will start producing output that looks like this:<br />
<code><br />
mbpro:~/temp kristokaiv$ python mailer.py mailer.conf -v<br />
2007-10-19 20:59:59,968 7634 DEBUG Attaching<br />
2007-10-19 21:00:00,002 7634 DEBUG event : {'hi!': None} | type : 4 | inserted by : None<br />
2007-10-19 21:00:00,002 7634 DEBUG event : {'username': 'kristokaiv', 'lastname': 'kristo.kaiv@geemail.com', 'firstname': 'Kaiv', 'language': 'Kristo'} | type : I: | inserted by : queue.welcome_email<br />
2007-10-19 21:00:00,011 7634 INFO {count: 2, duration: 0.0260310173035}<br />
2007-10-19 21:00:00,018 7634 INFO {count: 0, duration: 0.00694489479065}<br />
</code><br />
You can monitor the queue status in the following way:</p>
<pre>
<code>
mbpro:~/temp kristokaiv$ pgqadm.py ticker2.ini status
Postgres version: 8.2.3   PgQ version: 2.1.5

Event queue                                    Rotation        Ticker   TLag
------------------------------------------------------------------------------
mailer                                          3/7200s    500/3s/60s     2s
</code>
</pre>
<p>If you need to automate reporting of the queue status, you can get it by calling<br />
<code><br />
select * from pgq.get_queue_info();<br />
</code><br />
in the database. This concludes the short introduction to writing queue-based batch jobs using PgQ and the SkyTools framework.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/10/19/skytools-database-scripting-framework-pgq/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>
	</item>
		<item>
		<title>PostgreSQL cluster: partitioning with plproxy (part II)</title>
		<link>http://kaiv.wordpress.com/2007/09/02/postgresql-cluster-partitioning-with-plproxy-part-ii/</link>
		<comments>http://kaiv.wordpress.com/2007/09/02/postgresql-cluster-partitioning-with-plproxy-part-ii/#comments</comments>
		<pubDate>Sun, 02 Sep 2007 18:41:27 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[cluster]]></category>
		<category><![CDATA[partitioning]]></category>
		<category><![CDATA[plproxy]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/09/02/postgresql-cluster-partitioning-with-plproxy-part-ii/</guid>
		<description><![CDATA[In the last post I described how you can set up plproxy and create a basic horizontally partitioned cluster. Now we will take a look at another real-life use: building a read-only cluster for your database. Distributing read-only load The simplest real-world use for plproxy is redirecting read-only queries to [...]]]></description>
			<content:encoded><![CDATA[<p>In the last post I described how you can set up plproxy and create a basic horizontally partitioned cluster. Now we will take a look at another real-life use: building a read-only cluster for your database.<br />
<span id="more-17"></span><br />
<strong>Distributing read-only load</strong></p>
<p>The simplest real-world use for plproxy is redirecting read-only queries to read-only replicas of the master database. The replicas can be filled with data via <a href="http://pgfoundry.org/projects/skytools/">Londiste</a>, which is part of the SkyTools package (a setup tutorial can be found <a href="http://pgsql.tapoueh.org/londiste.html">here</a>), or with <a href="http://slony.info/">Slony</a>, which is a more heavyweight solution and, in my own experience, harder to set up and maintain, though for the time being definitely better documented.<br />
A typical read-only cluster could look like the following schema. The databases marked with the letter (P) are connection poolers. We ourselves use <a href="http://pgfoundry.org/projects/pgbouncer/">PgBouncer</a>, but <a href="http://pgpool.projects.postgresql.org/">pgpool</a> is also an option.<br />
The poolers are needed to minimize the number of open connections to a database; in addition, execution plans are cached per connection. Of course, everything will also work fine without the poolers. Dashed bold arrows represent replication.<br />
<img src="http://kaiv.files.wordpress.com/2007/09/read_only_cluster_238x250shkl.jpg?w=600" alt="read only cluster" /></p>
<p>In this setup the plproxy functions determine the database to which the query is redirected. Read-write queries go to the master database, and read-only queries are distributed to the read-only replicas based on the algorithm you define.<br />
Setting up replication itself is relatively easy once you have got through the painful SkyTools installation process.<br />
First, let us set up replication from the write database to both replicas, ro1 and ro2. The ro1 configuration file looks like this:<br />
<strong>replica1.ini</strong></p>
<pre>
[londiste]
job_name = londiste_master_to_r1
provider_db = dbname=write
subscriber_db = dbname=ro1
# it will be used as sql ident so no dots/spaces
pgq_queue_name = londiste.write
pidfile = %(job_name)s.pid
logfile = %(job_name)s.log
use_skylog = 0</pre>
<p>replica2.ini is basically the same; only the job name and the subscriber database name need to be changed. Now let’s install Londiste on the provider (write) and the subscribers (ro1, ro2) and start the replication daemons:</p>
<pre>
mbpro:~/temp kristokaiv$ londiste.py replica1.ini provider install
mbpro:~/temp kristokaiv$ londiste.py replica1.ini subscriber install
mbpro:~/temp kristokaiv$ londiste.py replica2.ini subscriber install
mbpro:~/temp kristokaiv$ londiste.py replica1.ini replay -d
mbpro:~/temp kristokaiv$ londiste.py replica2.ini replay -d</pre>
<p>The next thing you need to do is set up the ticker process on the database where the writes are performed. The ticker creates sync events, so running it at shorter intervals will reduce latency. My configuration file looks like this:<br />
<strong>ticker_write.ini</strong></p>
<pre>
[pgqadm]
job_name = ticker_write
db = dbname=write
# how often to run maintenance [minutes]
maint_delay_min = 1
# how often to check for activity [secs]
loop_delay = 0.1
logfile = %(job_name)s.log
pidfile = %(job_name)s.pid
use_skylog = 0</pre>
<p>To start the ticker as a daemon just run:</p>
<pre>
mbpro:~/temp kristokaiv$ pgqadm.py ticker_write.ini ticker -d</pre>
<p>Let’s create a simple table that we will replicate from the master to the read-only databases:</p>
<pre>
mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key, password text);" write
mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key, password text);" ro1
mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key, password text);" ro2</pre>
<p>And add it to replication:</p>
<pre>
mbpro:~/temp kristokaiv$ londiste.py replica1.ini provider add users
mbpro:~/temp kristokaiv$ londiste.py replica1.ini subscriber add users
mbpro:~/temp kristokaiv$ londiste.py replica2.ini subscriber add users</pre>
<p>After some time the tables should be up to date. Insert a new record in the write database and check that it’s delivered to both read-only databases.<br />
The functions to insert into and select from the users table:</p>
<pre>
CREATE OR REPLACE FUNCTION public.add_user(
    in i_username text,
    in i_password text,
    out status_code text
) AS $$
BEGIN
    PERFORM 1 FROM users WHERE username = i_username;
    IF NOT FOUND THEN
        INSERT INTO users (username, password) VALUES (i_username, i_password);
        status_code = 'OK';
    ELSE
        status_code = 'user exists';
    END IF;
    RETURN;
END; $$ LANGUAGE plpgsql SECURITY DEFINER;

GRANT EXECUTE ON FUNCTION public.add_user(
    in i_username text,
    in i_password text,
    out status_code text
) TO plproxy;

CREATE OR REPLACE FUNCTION login(
    in i_username text,
    in i_password text,
    out status_code text
) AS $$
BEGIN
    SELECT 'OK' FROM users u WHERE username = i_username AND password = i_password INTO status_code;
    IF NOT FOUND THEN status_code = 'FAILED'; END IF;
    RETURN;
END; $$ LANGUAGE plpgsql SECURITY DEFINER;

GRANT EXECUTE ON FUNCTION login(
    in i_username text,
    in i_password text,
    out status_code text
) TO plproxy;</pre>
<p>For the convenience of those actually trying to follow these steps, here is the proxy database’s<br />
<strong>cluster config</strong>:</p>
<pre>
CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions (cluster_name text)
RETURNS SETOF text AS $$
BEGIN
    IF cluster_name = 'readonly' THEN
        RETURN NEXT 'host=127.0.0.1 dbname=ro1';
        RETURN NEXT 'host=127.0.0.1 dbname=ro2';
        RETURN;
    ELSIF cluster_name = 'write' THEN
        RETURN NEXT 'host=127.0.0.1 dbname=write';
        RETURN;
    END IF;
    RAISE EXCEPTION 'no such cluster: %', cluster_name;
END; $$ LANGUAGE plpgsql SECURITY DEFINER;

CREATE OR REPLACE FUNCTION plproxy.get_cluster_config(
    in cluster_name text,
    out key text,
    out val text)
RETURNS SETOF record AS $$
BEGIN
    RETURN;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text) RETURNS int AS $$
    SELECT 1;
$$ LANGUAGE SQL;</pre>
<p>The last thing left to do is to create the plproxy function definitions that will redirect login function calls to the read-only databases and add_user calls to the write database:</p>
<pre>
CREATE OR REPLACE FUNCTION public.login(
    in i_username text,
    in i_password text,
    out status_code text
) AS $$
CLUSTER 'readonly'; RUN ON ANY;
$$ LANGUAGE plproxy;

CREATE OR REPLACE FUNCTION public.add_user(
    in i_username text,
    in i_password text,
    out status_code text
) AS $$
CLUSTER 'write';
$$ LANGUAGE plproxy;</pre>
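<p>The routing that these two definitions describe can be pictured with a small Python sketch. It is not how plproxy is implemented; the route function and the CLUSTERS dictionary are made up for illustration, and random.choice merely mimics RUN ON ANY picking some partition of the cluster.</p>

```python
import random

# Connection strings as returned by plproxy.get_cluster_partitions above.
CLUSTERS = {
    "readonly": ["host=127.0.0.1 dbname=ro1", "host=127.0.0.1 dbname=ro2"],
    "write": ["host=127.0.0.1 dbname=write"],
}


def route(function_name, rng=random):
    """Return the connection string a proxied call would be sent to."""
    if function_name == "login":
        # RUN ON ANY: any partition of the read-only cluster will do.
        return rng.choice(CLUSTERS["readonly"])
    if function_name == "add_user":
        # The write cluster has exactly one partition.
        return CLUSTERS["write"][0]
    raise ValueError("no proxy function named %s" % function_name)


print(route("add_user"))
print(route("login"))
```

<p>Each login call may land on a different replica, which is what spreads the read load, while every add_user call ends up on the single write database.</p>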
<p>That’s it, the read-only cluster is ready. Note that even though creating such a read-only cluster seems a simple and quick solution to your performance problems, it is not a silver bullet. Asynchronous replication often creates more problems than it solves, so be careful to replicate only non-time-critical data, or provide a fallback for when data is not found (e.g. the proxy function first checks the read-only database and, if the data is not found there, looks it up in the write db).</p>
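<p>The fallback idea can be sketched in a few lines of Python. Plain dictionaries stand in for the two databases here, and the function name lookup_with_fallback is hypothetical; in a real setup this logic would live in a function on the proxy database.</p>

```python
def lookup_with_fallback(username, read_replica, write_db):
    """Return the user's record, preferring the read-only replica."""
    record = read_replica.get(username)
    if record is None:
        # The replica may simply be lagging, so ask the master before giving up.
        record = write_db.get(username)
    return record


# Plain dicts stand in for the two databases in this sketch.
write_db = {"alice": "secret1", "bob": "secret2"}
read_replica = {"alice": "secret1"}  # "bob" has not been replicated yet

print(lookup_with_fallback("alice", read_replica, write_db))
print(lookup_with_fallback("bob", read_replica, write_db))
print(lookup_with_fallback("carol", read_replica, write_db))
```

<p>The trade-off is that every genuine miss now costs a second lookup on the write database, so this only pays off when misses are rare.</p>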
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/09/02/postgresql-cluster-partitioning-with-plproxy-part-ii/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>

		<media:content url="http://kaiv.files.wordpress.com/2007/09/read_only_cluster_238x250shkl.jpg" medium="image">
			<media:title type="html">read only cluster</media:title>
		</media:content>
	</item>
		<item>
		<title>PostgreSQL cluster: partitioning with plproxy (part I)</title>
		<link>http://kaiv.wordpress.com/2007/07/27/postgresql-cluster-partitioning-with-plproxy-part-i/</link>
		<comments>http://kaiv.wordpress.com/2007/07/27/postgresql-cluster-partitioning-with-plproxy-part-i/#comments</comments>
		<pubDate>Fri, 27 Jul 2007 16:22:33 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[cluster]]></category>
		<category><![CDATA[partitioning]]></category>
		<category><![CDATA[plproxy]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/07/27/postgresql-cluster-partitioning-with-plproxy-part-i/</guid>
		<description><![CDATA[Skype has developed many handy tools for creating a database cluster and this series of posts is intended to shed some light on their rather undocumented features. At the base of it stands plproxy. The best way to describe its features would be &#8220;dblink on steroids&#8221;. This short tutorial will explain how to install plproxy, [...]]]></description>
			<content:encoded><![CDATA[<p>Skype has developed many handy tools for creating a database cluster and this series of posts is intended to shed some light on their rather undocumented features. At the base of it stands plproxy. The best way to describe its features would be &#8220;dblink on steroids&#8221;. This short tutorial will explain how to install plproxy, do simple remote database calls and set up a simple horizontally partitioned database cluster.<br />
<span id="more-11"></span></p>
<p><strong>Partitioning for dummies</strong></p>
<p>Partitioning lets you distribute the database load and data between multiple database servers.<br />
The principle itself is simple. Let&#8217;s say you have one table that contains the users&#8217; login credentials, but the problem is that millions of users log in to their accounts daily. This of course creates a lot of load, not to mention the huge table needed for storage. First we need a criterion for choosing which server holds which data. We could use the username&#8217;s first character: usernames beginning with a-j go to the first server and usernames beginning with k-z go to the second. It does work, but one of the servers will probably get more load than the other. The most common option is to choose the partition based on the hash of the primary key value, in our case the username. Using a hash function will distribute the users between the servers very evenly. What you need to know about hashing is that a hash function basically calculates a number from any input it can handle:</p>
<pre>
select hashtext('kristokaiv1') = 1116512480
select hashtext('kristokaiv2') = 1440348351
select hashtext('kristokaiv3') = -219299073
</pre>
<p>How it works internally is beyond the scope of this post.<br />
Now let&#8217;s say we have 2 partitions. Then we could get the partition number from the username hash like this:</p>
<pre>
partition nr = hashtext($1) &amp; 1
</pre>
<p>The &amp; 1 gives us the last bit of the number, which is either 0 or 1, and that is the number we use to choose the partition the user data will be stored in. If it&#8217;s 0 the data goes to partition 0, and if it&#8217;s 1 the data goes to partition 1:</p>
<pre>
select hashtext('kristokaiv1') &amp; 1 = 0 -&gt; partition 0
select hashtext('kristokaiv2') &amp; 1 = 1 -&gt; partition 1
select hashtext('kristokaiv3') &amp; 1 = 1 -&gt; partition 1
</pre>
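<p>The same calculation can be replayed outside the database. The sketch below takes the three hash values shown earlier in this post as given (it does not reimplement hashtext) and applies the bit mask in Python; the helper name partition_for is made up for the example.</p>

```python
# Hash values copied from the hashtext() examples in this post.
HASHES = {
    "kristokaiv1": 1116512480,
    "kristokaiv2": 1440348351,
    "kristokaiv3": -219299073,
}


def partition_for(hash_value, partition_count=2):
    """Pick a partition by masking the low bits of the hash."""
    # The mask only works when partition_count is a power of two.
    return hash_value & (partition_count - 1)


for username, hash_value in HASHES.items():
    print(username, partition_for(hash_value))
```

<p>Masking works for negative hash values too, which is one reason to prefer it over a remainder operation that could return a negative partition number.</p>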
<p><strong>How plproxy works</strong></p>
<p>The concept itself is rather simple &#8211; plproxy is a new language created inside the PostgreSQL database that enables you to make remote database calls, much as you would with dblink. The syntax is really straightforward &#8211; the following statement creates a new plproxy function in the database that, when run, will connect to the database remotedb, execute the function get_user_email(text) there and return the results.</p>
<pre>
localdb=#
CREATE FUNCTION get_user_email(username text) RETURNS text AS $$
    CONNECT 'dbname=remotedb host=123.456.123.21 user=myuser';
$$ LANGUAGE plproxy;
</pre>
<p>Let&#8217;s create a dummy function in the remotedb that will respond to the call:</p>
<pre>
remotedb=#
create function get_user_email(text)
returns text as $$
    select 'me@somewhere.com'::text;
$$ language sql;
</pre>
<p>On execution we will see exactly the same results as we would when executing the query on remotedb</p>
<pre>
localdb=#
select * from get_user_email('tere');
get_user_email
------------------
me@somewhere.com
(1 row)
</pre>
<p>Of course this is just a really simple example, and I will get back to the more complex syntax later. Let&#8217;s first take a look at how to install the plproxy language.</p>
<p><strong>Installing plproxy</strong></p>
<p>Plproxy can be downloaded from <a href="http://pgfoundry.org/projects/plproxy/">http://pgfoundry.org/projects/plproxy/</a>, but I strongly suggest you get the newest version from the pgfoundry CVS; instructions on how to set it up are <a href="http://pgfoundry.org/scm/?group_id=1000207">here</a>. You need to have the PostgreSQL development environment installed, and the folder containing the PostgreSQL configuration info tool (pg_config) needs to be included in your $PATH variable. If those prerequisites are met, the installation is simple:</p>
<pre>
$ make
$ make install
$ make installcheck
</pre>
<p>If you don&#8217;t manage to get it working by yourself, there is always the <a href="http://lists.pgfoundry.org/mailman/listinfo/plproxy-users">mailing list</a> to help you get started.<br />
The final step is to install the language into the database. This can&#8217;t be done the way it is for other languages (with the createlang utility); instead you have to execute the plproxy.sql file, which creates the language call handler. I found the file with locate plproxy.sql, but it should be somewhere under contrib.</p>
<pre>
$ psql -f /usr/local/pgsql/share/contrib/plproxy.sql queries
CREATE FUNCTION
CREATE LANGUAGE
</pre>
<p>Now everything should be done and you can test the setup with the simple plproxy function in the syntax example.</p>
<p><strong>Setting up our first cluster</strong></p>
<p>Let&#8217;s create a simple cluster that consists of 3 databases (in my example they are all running on the same PostgreSQL instance): one proxy database called queries and 2 partitions, queries_0000 and queries_0001. Horizontal partitioning is done based on username. It&#8217;s the most common way of partitioning, as most of the data in the database is usually user related, e.g. users&#8217; logins, orders, payments, settings&#8230;<br />
<img src="http://kaiv.files.wordpress.com/2007/09/cluster_250x149shkl.jpg?w=600" alt="cluster setup" /></p>
<p>The database cluster setup is stored inside plpgsql functions that plproxy calls.<br />
There are 3 functions that you _MUST_ create for the cluster configuration to work properly. So let&#8217;s create them on the proxy database &#8220;queries&#8221;.</p>
<p>1) plproxy.get_cluster_version(cluster_name text)<br />
This function is called on each request and is used to determine whether the configuration for a cluster has changed. If the version number it returns is higher than the cached version number, the partition configuration is reloaded. Let&#8217;s start with the first version of our configuration like this:</p>
<pre>
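CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text) RETURNS int AS $$
BEGIN
    IF cluster_name = 'queries' THEN
        RETURN 1;
    END IF;
    -- fail with a clear message instead of falling off the end of the function
    RAISE EXCEPTION 'no such cluster: %', cluster_name;
END;
$$ LANGUAGE plpgsql;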
</pre>
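<p>The version check described above boils down to a small caching pattern. The Python sketch below illustrates that pattern only; it is not plproxy's actual implementation, and the class name ClusterConfigCache and its two callback parameters are invented for the example.</p>

```python
class ClusterConfigCache:
    """Reloads the partition list only when the version number increases."""

    def __init__(self, get_version, get_partitions):
        self._get_version = get_version
        self._get_partitions = get_partitions
        self._cached_version = None
        self._partitions = []
        self.reload_count = 0

    def partitions(self):
        version = self._get_version()  # cheap check on every request
        if self._cached_version is None or version > self._cached_version:
            # Expensive part: fetch the full partition list again.
            self._partitions = self._get_partitions()
            self._cached_version = version
            self.reload_count += 1
        return self._partitions


# A dict stands in for the configuration functions on the proxy database.
state = {"version": 1, "partitions": ["dbname=queries_0000", "dbname=queries_0001"]}
cache = ClusterConfigCache(lambda: state["version"], lambda: list(state["partitions"]))

cache.partitions()
cache.partitions()        # same version: served from the cache
state["version"] = 2
cache.partitions()        # higher version: reloaded
print(cache.reload_count)
```

<p>In practice this means that whenever you change the partition list, you also have to bump the number returned by get_cluster_version, otherwise the old list stays cached.</p>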
<p>2) plproxy.get_cluster_partitions(cluster_name text)<br />
The function should return the connection strings for all partitions in the correct order.<br />
The total count must be a power of 2. This is an unreasonable limitation that can easily be overcome, but let&#8217;s discuss that in another post.</p>
<pre>
CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text) RETURNS SETOF text AS $$
BEGIN
    IF cluster_name = 'queries' THEN
        RETURN NEXT 'host=127.0.0.1 dbname=queries_0000';
        RETURN NEXT 'host=127.0.0.1 dbname=queries_0001';
        RETURN;
    END IF;
    RAISE EXCEPTION 'no such cluster: %', cluster_name;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
</pre>
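<p>Since the partition count has to be a power of 2, it can be worth checking a partition list before deploying it. Here is a minimal Python sketch of such a check; the helper name is_power_of_two is an assumption for illustration and is not part of plproxy.</p>

```python
def is_power_of_two(count):
    """True for 1, 2, 4, 8, ...: numbers with exactly one bit set."""
    return count > 0 and (count & (count - 1)) == 0


# The partition list returned by the function above.
partitions = [
    "host=127.0.0.1 dbname=queries_0000",
    "host=127.0.0.1 dbname=queries_0001",
]

print(is_power_of_two(len(partitions)))
print(is_power_of_two(3))
```

<p>A power of two has a single bit set, so subtracting one flips that bit and sets all lower ones, which makes the bitwise AND of the two numbers zero.</p>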
<p>If the PostgreSQL username is not specified in the connection string, the name of CURRENT_USER will be used. As plproxy does not know any passwords, the partition databases should trust connections from the proxy database.</p>
<p>3) plproxy.get_cluster_config(cluster_name text)<br />
This is the equivalent of an init file. It should return the configuration parameters as key &#8211; value pairs. All of them are optional but you still need the dummy placeholder function:</p>
<pre>
CREATE OR REPLACE FUNCTION plproxy.get_cluster_config (cluster_name text, out key text, out val text)
RETURNS SETOF record AS $$
BEGIN
    RETURN;
END;
$$ LANGUAGE plpgsql;
</pre>
<p>The details of configuration parameters and what they do can be found in the plproxy documentation.</p>
<p>Now the setup is complete and we can start playing around with our new cluster.<br />
Let&#8217;s create a new table to store usernames on both partitions</p>
<pre>
queries_0000=# CREATE TABLE users (username text PRIMARY KEY);
queries_0001=# CREATE TABLE users (username text PRIMARY KEY);
</pre>
<p>We must also create, on both partitions, a function that is used to insert new usernames into the table:</p>
<pre>
CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS text AS $$
BEGIN
    PERFORM 1 FROM users WHERE username = i_username;
    IF NOT FOUND THEN
        INSERT INTO users (username) VALUES (i_username);
        RETURN 'user created';
    ELSE
        RETURN 'user already exists';
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
</pre>
<p>Now let&#8217;s create the proxy function on the proxy database that will call the partitions:</p>
<pre>
queries=#
CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS TEXT AS $$
    CLUSTER 'queries'; RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
</pre>
<p>Filling the partitions with random data:</p>
<pre>
SELECT insert_user('user_number_'||generate_series::text) FROM generate_series(1,10000);
</pre>
<p>Now if we go to the partition databases we will see that both of them are filled<br />
and the distribution is quite even.</p>
<pre>
queries_0001 count(*) -&gt; 5071
queries_0000 count(*) -&gt; 4930
</pre>
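<p>If you want to experiment with the spread without a database at hand, the idea can be simulated in Python. Note the assumption: zlib.crc32 is used here purely as a stand-in for hashtext, so the counts it produces will differ from the ones above.</p>

```python
import zlib
from collections import Counter


def partition_for(username, partition_count=2):
    # zlib.crc32 stands in for hashtext(); the two produce different numbers.
    return zlib.crc32(username.encode("utf-8")) & (partition_count - 1)


# Same usernames as generated by the insert_user() call above.
counts = Counter(partition_for("user_number_%d" % n) for n in range(1, 10001))

for partition, count in sorted(counts.items()):
    print("partition %d: %d users" % (partition, count))
```

<p>Passing a different power-of-two partition_count shows how the same usernames would be redistributed if the cluster grew.</p>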
<p>To be continued&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/07/27/postgresql-cluster-partitioning-with-plproxy-part-i/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>

		<media:content url="http://kaiv.files.wordpress.com/2007/09/cluster_250x149shkl.jpg" medium="image">
			<media:title type="html">cluster setup</media:title>
		</media:content>
	</item>
		<item>
		<title>Decreasing the index size on wide columns</title>
		<link>http://kaiv.wordpress.com/2007/07/23/decreasing-the-index-size-on-wide-columns/</link>
		<comments>http://kaiv.wordpress.com/2007/07/23/decreasing-the-index-size-on-wide-columns/#comments</comments>
		<pubDate>Mon, 23 Jul 2007 21:41:27 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[data mining]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[query tuning]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tips&tricks]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/07/23/decreasing-the-index-size-on-wide-columns/</guid>
		<description><![CDATA[If you have indexes on columns that are quite large but are not used for range scans (greater than, less than, between) then it is a wise choice to compact the index to cut back on the disk usage overhead and free up some memory that you desperately need. For example if you have a [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=kaiv.wordpress.com&amp;blog=1381779&amp;post=10&amp;subd=kaiv&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>If you have indexes on large columns that are never used for range scans (greater than, less than, between), it is worth compacting the index: this cuts disk usage and frees memory that you probably need for something else.</p>
<p>For example, if you have a column username that stores usernames as text, an average value could be about 24B wide, e.g. &#8216;mister_nice@hotmail.com&#8217;. With 1 million users that is roughly 24MB just for the values, and when the index is accessed regularly the database does its best to keep all of it in memory.</p>
<p>The solution itself is simple: create a functional index on the column, computed with a <a href="http://en.wikipedia.org/wiki/Hash_functions">hash algorithm</a>. With the PostgreSQL built-in hash function hashtext(text), the space needed to store each value drops from 24B to 4B (actual index tuples also carry extra information that is used internally).</p>
<pre>CREATE INDEX my_ix ON users (hashtext(username));</pre>
<p>This statement creates a much more compact index, as it only has to store 4B per user. A functional index no longer keeps the raw data; it stores the result of the hash function instead. For &#8216;mister_nice@hotmail.com&#8217; that would be hashtext(&#8216;mister_nice@hotmail.com&#8217;) = 1408893908, a 4B integer. The select statement used to look up a user is as follows:</p>
<pre>SELECT * FROM users WHERE hashtext(username) = hashtext($1) AND username = $1;</pre>
<p>Hashing can create <a href="http://en.wikipedia.org/wiki/Hash_collision">hash collisions</a> (different usernames can have the same hash), which is why we also need the exact match criteria (username = $1). This select uses the hashtext index to find the 1..N rows that match the calculated hash and then keeps only those whose username is an exact match.</p>
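<p>One way to make sure the exact match condition is never forgotten is to wrap the lookup in a function. This is only a sketch: the name find_user is hypothetical, and it assumes the users table and the hashtext index from above.</p>
<pre>-- find_user is a made-up name; users and username come from the example above
CREATE FUNCTION find_user(text) RETURNS SETOF users AS $$
    SELECT *
    FROM users
    WHERE hashtext(username) = hashtext($1)  -- can use the hashtext index
      AND username = $1;                      -- rejects hash collisions
$$ LANGUAGE sql STABLE;

-- usage
SELECT * FROM find_user('mister_nice@hotmail.com');</pre>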
<p>Just a few numbers from a quick test:<br />
100K rows, 24B of text each<br />
Size of btree(text) = 516096 bytes<br />
Size of btree(hashtext(text)) = 245760 bytes<br />
This is a 52% reduction in index size.</p>
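<p>To repeat a measurement like this on your own system, something along these lines should work. The table and index names are made up for the test, the generated usernames are simply padded to 24 characters, and the resulting sizes will differ with PostgreSQL version and data.</p>
<pre>-- hypothetical test table with 100K usernames of 24 characters each
CREATE TABLE users_test (username text);
INSERT INTO users_test
SELECT 'user_' || lpad(i::text, 19, '0')
FROM generate_series(1, 100000) AS i;

CREATE INDEX users_test_text_ix ON users_test (username);
CREATE INDEX users_test_hash_ix ON users_test (hashtext(username));

-- index sizes in bytes
SELECT pg_relation_size('users_test_text_ix') AS text_ix_bytes,
       pg_relation_size('users_test_hash_ix') AS hash_ix_bytes;</pre>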
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/07/23/decreasing-the-index-size-on-wide-columns/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>
	</item>
		<item>
		<title>Last row in a history table</title>
		<link>http://kaiv.wordpress.com/2007/07/19/last-row-in-a-history-table/</link>
		<comments>http://kaiv.wordpress.com/2007/07/19/last-row-in-a-history-table/#comments</comments>
		<pubDate>Thu, 19 Jul 2007 12:58:29 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[postgresql]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tips&tricks]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/07/19/last-row-in-a-history-table/</guid>
		<description><![CDATA[When writing analytical queries where you need to get the last rows in the table per user there is a nice feature in PostgreSQL that doesn&#8217;t need you to rely on subqueries. The following query selects the last event for every user in a status history table: select distinct on (key_user) * from order_status_log where [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=kaiv.wordpress.com&amp;blog=1381779&amp;post=6&amp;subd=kaiv&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>When writing analytical queries where you need the last row per user in a table, PostgreSQL has a nice feature that saves you from relying on subqueries. The following query selects the last event for every user in a status history table:</p>
<pre>
<strong>select distinct on (key_user) * from order_status_log where key_status=7 order by key_user, deliver_time desc;</strong>
explain:
Unique  (cost=48603.10..50390.54 rows=109180 width=78)
    -&gt;  Sort  (cost=48603.10..49496.82 rows=357487 width=78)
    Sort Key: key_user, deliver_time
        -&gt;  Seq Scan on user_number  (cost=0.00..15629.34 rows=357487 width=78)
        Filter: (key_status = 7)</pre>
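<p>Note that the expressions used in DISTINCT ON must also be the leftmost expressions of the ORDER BY clause! The remaining ORDER BY columns (here deliver_time desc) decide which row of each group is kept.</p>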
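<p>For comparison, this is roughly what the same question looks like with a subquery. It is a sketch against the same order_status_log table; note that unlike DISTINCT ON it returns more than one row for a user when several rows share the latest deliver_time.</p>
<pre>-- subquery variant: find the latest deliver_time per user, then join back
select l.*
from order_status_log l
join (select key_user, max(deliver_time) as last_time
      from order_status_log
      where key_status = 7
      group by key_user) latest
  on latest.key_user = l.key_user
 and latest.last_time = l.deliver_time
where l.key_status = 7;</pre>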
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/07/19/last-row-in-a-history-table/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>
	</item>
		<item>
		<title>Faster insert for multiple rows</title>
		<link>http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/</link>
		<comments>http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/#comments</comments>
		<pubDate>Thu, 19 Jul 2007 12:06:57 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[postgresql]]></category>
		<category><![CDATA[query tuning]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tips&tricks]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/</guid>
		<description><![CDATA[As you all probably know the fastest way to get data into the database is the COPY statement, but there is also a means to speed up inserting multiple rows. PostgreSQL supports inserting multiple rows with one INSERT statement. The syntax is as follows: INSERT INTO tablename (column, column...) VALUES (row1_val1, row1_val2...), (row2_val1, row2_val2)..; I [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=kaiv.wordpress.com&amp;blog=1381779&amp;post=7&amp;subd=kaiv&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>As you all probably know, the fastest way to get data into the database is the COPY statement, but there is also a way to speed up inserting multiple rows. PostgreSQL supports inserting multiple rows with one INSERT statement. The syntax is as follows:</p>
<p><code>
<pre>INSERT INTO tablename (column, column...) VALUES (row1_val1, row1_val2...), (row2_val1, row2_val2...), ...;</pre>
<p></code></p>
<p>I did some quick tests to compare the performance of multiple insert statements inside one transaction with that of one multi-row insert. The test set was 100K records and I ran the tests several times; the difference in performance was consistent across runs, showing that the multi-row insert is approximately 4 times faster than normal insert statements. A multi-row insert is parsed and prepared only once, which avoids the per-statement overhead of 100K separate commands.<br />
<span id="more-7"></span><br />
Test:<br />
<code></p>
<pre>
tmpdb=# create table things (things_id serial primary key, thing text);
NOTICE:  CREATE TABLE will create implicit sequence "things_things_id_seq" for serial column "things.things_id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "things_pkey" for table "things"
CREATE TABLE
</pre>
<p></code></p>
<p><strong>Multi-row insert SQL file</strong><br />
<code></p>
<pre>
insert into things (thing) values ('thing nr. 0'),
('thing nr. 1'),
('thing nr. 2'),
('thing nr. 3'),
...
('thing nr. 99999'),
('thing nr. 100000');
</pre>
<p></code></p>
<p><strong>Multiple insert statements SQL file</strong><br />
<code></p>
<pre>
begin;
insert into things (thing) values ('thing nr. 0');
insert into things (thing) values ('thing nr. 1');
insert into things (thing) values ('thing nr. 2');
....
insert into things (thing) values ('thing nr. 99999');
insert into things (thing) values ('thing nr. 100000');
commit;
</pre>
<p></code></p>
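<p>One possible way to time the two files from psql is sketched below. The file names are assumptions (they are not given above), and \gset needs psql 9.3 or newer.</p>
<pre>-- multirow_insert.sql and single_inserts.sql are hypothetical file names
truncate things;
select clock_timestamp() as started \gset
\i multirow_insert.sql
select clock_timestamp() - :'started'::timestamptz as multirow_elapsed;

-- start from an empty table again before the second run
truncate things;
select clock_timestamp() as started \gset
\i single_inserts.sql
select clock_timestamp() - :'started'::timestamptz as single_elapsed;</pre>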
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>
	</item>
		<item>
		<title>NOW() vs &#8216;NOW&#8217;</title>
		<link>http://kaiv.wordpress.com/2007/07/17/now-vs-now/</link>
		<comments>http://kaiv.wordpress.com/2007/07/17/now-vs-now/#comments</comments>
		<pubDate>Tue, 17 Jul 2007 18:55:04 +0000</pubDate>
		<dc:creator>kaiv</dc:creator>
				<category><![CDATA[postgresql]]></category>
		<category><![CDATA[query tuning]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://kaiv.wordpress.com/2007/07/17/now-vs-now/</guid>
		<description><![CDATA[The following examples explain how postgresql query planner deals with queries that use the current time as a parameter: FALSE: explain SELECT * FROM orders WHERE creation_date &#62; NOW() - interval '30 mins' AND key_status = 5; ----------------------------------------------------------------------------------- Seq Scan on orders (cost=0.00..9370300.50 rows=1964090 width=217) Filter: ((creation_date &#62; (now() - '00:30:00'::interval)) AND (key_status = 5)) [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=kaiv.wordpress.com&amp;blog=1381779&amp;post=5&amp;subd=kaiv&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The following examples show how the PostgreSQL query planner deals with queries that use the current time as a parameter:<br />
<code></p>
<pre>
<b>INCORRECT:</b>
explain SELECT * FROM orders WHERE creation_date &gt; NOW() - interval '30 mins' AND key_status = 5;
-----------------------------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..9370300.50 rows=1964090 width=217)
   Filter: ((creation_date &gt; (now() - '00:30:00'::interval)) AND (key_status = 5))
</pre>
<p></code><br />
In the first case the planner sees the timestamp parameter <code>NOW()</code> as a function call and therefore can&#8217;t estimate the number of rows matching the criteria, as the result of a function call cannot be known at planning time (except for immutable functions). This results in a suboptimal execution plan (sequential scan).</p>
<p><code></p>
<pre>
<b>CORRECT:</b>
explain SELECT * FROM orders WHERE creation_date &gt; 'NOW'::timestamptz - interval '30 mins' AND key_status = 5;
--------------------------------------------------------------------------------------------
 Index Scan using idx_order_creation_date on orders  (cost=0.00..98.03 rows=587 width=217)
   Index Cond: (creation_date &gt; '2006-03-03 15:07:22.492173+02'::timestamp with time zone)
   Filter: (key_status = 5)
</pre>
<p></code></p>
<p>In the second case the planner handles &#8216;NOW&#8217; as a constant and is therefore able to estimate how many rows have a greater value, resulting in an optimal execution plan (index range scan).</p>
<p>Instead of &#8216;NOW&#8217; you can also use CURRENT_TIMESTAMP and other similar keywords, which you don&#8217;t have to quote. They are listed in the date/time section of the PostgreSQL documentation.</p>
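<p>The same parse-time conversion that helps the planner here can surprise you where a statement is stored and reused. The PostgreSQL documentation warns about this for column defaults; the table below is a made-up illustration of that warning, not something taken from the examples above.</p>
<pre>-- order_audit is a hypothetical table
create table order_audit (
    id         serial primary key,
    -- 'NOW' becomes a fixed timestamp when this statement is parsed,
    -- so every row gets the time the table was created
    frozen_at  timestamptz default 'NOW'::timestamptz,
    -- now() is evaluated again for every insert
    created_at timestamptz default now()
);</pre>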
]]></content:encoded>
			<wfw:commentRss>http://kaiv.wordpress.com/2007/07/17/now-vs-now/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c111344c9bac47033d707eec98451507?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">kaiv</media:title>
		</media:content>
	</item>
	</channel>
</rss>
