I've been working on some PHP code for use in a postgres database system. The code already supports both MySQL and Postgres, but there is a significant speed delta between the two in very basic things.
Discrepancies between query speeds isn't really what interests me. I know that due to architectural differences, MySQL is going to be slightly faster in a low traffic situation, with my (untested) assumption being that it will degrade to something similar to Postgres performance under high traffic yields. The big issue has become connection time to Postgres via a TCP socket in PHP. Using my own laptop as a test case, I'm running 7.4 postgres, a copy of Apache v1.whatever, and PHP 4.3.4 and receiving connect times of 19.88902* ms or larger (degree of variance is about 3%), while running MySQL the connection time is significantly lower. Why?
Looking through the ext/pgsql PHP code I haven't found any significant deviations that would suggest a reasoning for the time differential. That has left the use of the postgres libraries as the major barrier. Having read through the postgres mailing lists, the biggest suggestion is that I shouldn't be killing my Apache processes too quickly (not a valid reason in this case) to benefit from pconnects (duh). Outside of that I haven't found much of any reasoning or analysis on the subject.
* All times have been created through the addition of gettimeofday() calls before and after the PQconnectdb() call. Calls to zend_hash_update did not significantly influence the timing.
[EDIT: corrected function call name from PGconnectdb to PQconnectdb. Added category]
Posted by Dan at March 22, 2004 04:31 AMYou see this in many other database systems as well. Because they have a deeper session-model, they front-load alot of initialization work to connection time. Why are using persistent connections not possible?
Posted by: George Schlossnagle at March 22, 2004 06:04 AMDefine persistent connection in this setting. A PHP pg_pconn(), or an SRM (or some other application infrastructure based) system connection? In the first case, once an Apache child has been asked to establish a connection things run fine until the child dies, and the connection is relegated to recreation status. The second option has yet to be tried completely, but is on schedule to do so in the near future.
Posted by: Dan Kalowsky at March 22, 2004 06:31 AMMySQL is only faster with simple selects than PostgreSQL. Performance will degrade to much much worse than PostgreSQL in heavy-usage scenarios, and PostgreSQL is also a clear winner with complex selects (benefiting from things as functional and partial indexes, among others).
Connect time is certainly slower in PostgreSQL. However, you should wonder if it really matters. The pconnect() model of PHP is broken anyway, as it opens way too much database connections on a busy webserver. A webserver with 500 threads having 500 db connections open at the same time will slow MySQL to a standstill, and PostgreSQL doesn't really like that scenario either (although it copes much better).
Using normal connects is a lot faster, since you will almost never have more than, say, 50 connections open at the same time. Other webservers use a pool of persistent db connections for all webserver threads, allowing those 500 webserver threads to use the same 50 persistent db connections without connect delays. This model is obviously the fastest.
If you have a scenario where all that matters is low-delay responses to a few users, by all means use MySQL. Or better yet, stop using an SQL database entirely and switch to some other storage method (flat files, shared memory, whatever) as relational databases aren't built for this scenario and won't perform as good as other methods.
Posted by: Vinz at March 23, 2004 04:40 AMMySQL is only faster with simple selects than PostgreSQL. Performance will degrade to much much worse than PostgreSQL in heavy-usage scenarios, and PostgreSQL is also a clear winner with complex selects (benefiting from things as functional and partial indexes, among others).
Connect time is certainly slower in PostgreSQL. However, you should wonder if it really matters. The pconnect() model of PHP is broken anyway, as it opens way too much database connections on a busy webserver. A webserver with 500 threads having 500 db connections open at the same time will slow MySQL to a standstill, and PostgreSQL doesn't really like that scenario either (although it copes much better).
Using normal connects is a lot faster, since you will almost never have more than, say, 50 connections open at the same time. Other webservers use a pool of persistent db connections for all webserver threads, allowing those 500 webserver threads to use the same 50 persistent db connections without connect delays. This model is obviously the fastest.
If you have a scenario where all that matters is low-delay responses to a few users, by all means use MySQL. Or better yet, stop using an SQL database entirely and switch to some other storage method (flat files, shared memory, whatever) as relational databases aren't built for this scenario and won't perform as good as other methods.
Posted by: Vinz at March 23, 2004 04:40 AMI think the crucial difference here is that postgres fork()s a child for each incoming connection. This is bound to be more expensive than mysql's threaded approach. Connection pooling largely addresses the issue, but as Vinz already mentioned, PHP's similar, but different "persistent connections" model isn't really a solution.
Posted by: Marko at March 23, 2004 08:09 AMYour title is a bit misleading, when I first saw it I though you would be comparing a query, not how long it takes to connect. I wasn't really surprised by your finding that MySQL can make a connection much faster than PostgreSQL, but it did make me curious as to how much faster. So I did a few simple benchmarks to test this out. You can see my
benchmark results for yourself (http://www.randomnetworks.com/joseph/blog/?eid=100), but the long and the short of it is that MySQL wins when not using persistent connections and PostgreSQL wins when you do.
I'm not a postgres user (though I have used it), but I am a big Oracle user, and the process model seems similar. The biggest key in my experience is to ensure that you have a correctly sized, stable, and long-lived pool of apache children. If the overhead of the connect is amortized over 100,000 requests, it becomes insignificant. One often overlooked aspect of this is sizing your spare children settings to ensure yoou don't get rolling spikes as children are created and auto-reaped.
Posted by: George Schlossnagle at March 24, 2004 06:46 AMThere's an extensive discussion by postgresql developers on forking and connection time between mysql threading model.
http://go.jitbot.com/pgsql-prefork
Apparently, forking and connection time is very minimal compared to the time it takes to execute a query.
The question isn't on connection time vs query time. It's understood that some queries will take time a large amount of time to execute and can only be optimized to a certain point. The connection time debate on that thread seems to be handling by the OS on how to deal with idle connections. I tend to agree with them that most of this arguement/discussion is pointless, as a fully threaded model would work a lot better, and that is in the works
Posted by: Dan Kalowsky at June 17, 2004 10:03 AM