November 04, 2004

Comments on PDO

I've received a few emails from various users of PHP asking for my opinion on the PDO extension. It was written by Wez, who for all I've ever seen is a rather intelligent programmer, and my initial impression is that its purpose is to unify the many database functionalities into one extensible interface. Admirable goal in my opinion. Stig and I had talked about this years ago and obviously never did anything with the concept. Partly because as I thought about it more, it seemed that implementing such a beast (to me at least) would be re-inventing the ODBC wheel (unified abstracted database accessor functionality that is industry supported).

Not to toot ODBC's horn, because my initial guess is that PDO works significantly faster when utilizing native driver connections, and it probably also works faster in ODBC mode by not using a dynamic cursor like ext/odbc does. In any case my general opinion is it looks nice, seems to work, isn't documented very well (yet), and I have no idea what the general long term goal is for its development.

The lingering question: is PDO going to (A) split the PHP database support, (B) attempt to nullify the current generation of extensions, or (C) just peacefully co-exist? I have a feeling its original intention was to be a C style system, but I see it becoming more of an A+B case as time goes on.

Posted by Dan at 11:18 AM | Comments (0)

August 13, 2004

PHP's odbc_fetch functions

Dave Lawson sent a collection of patches for the ODBC system to the php-internals list a little while ago. The first basically takes out the useless ID numbering system ODBC has in place and relies upon the system in Zend for resource management; the second forces an ecalloc instead of an emalloc for the SQLExtendedFetch failure points.

While the first is something I've toyed with doing a lot, I've just never felt it was a big enough deal to bother with. The second patch is one I'm not entirely sure about. While his patch works fine, I think the basic assumption and means of correcting it are not the best possible path, mainly because it forces a reliance upon the behavior of a section of code instead of upon the return status/codes of a series of functions. Regardless, it needs testing, and due to circumstances beyond my control I cannot test it completely/thoroughly at the moment. As such, I've committed the ecalloc patch for testing and use in the generated snapshots. Hopefully if something is not working, someone will email me and let me know. The platform of interest is MS Access, which has historically been the most finicky of the ODBC systems.

Oh yeah, I should mention that it's not that the ecalloc patch is wrong or going to be harmful in any way at the moment. In fact, for what it does, it's really quite a nice temporary solution to a more systemic problem of the ODBC module. My point of concern with it comes from future evolution of the ODBC module, where this behavior may not be assured, or more importantly not recognized/remembered when extending the ODBC module.
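
To make the trade-off concrete, here is a minimal C sketch of the general idea. It uses plain calloc() as a stand-in for ecalloc() (the Zend allocators are not available outside PHP), and the names fake_fetch, row_buffer_new and fetch_row are hypothetical, not taken from the patch or from ext/odbc. It shows both halves of the argument: a zero-filled buffer is harmless to read after a failed fetch, but it is the return code that actually says whether a row arrived.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a fetch call: on failure it leaves the
 * buffer untouched and reports the problem through its return value. */
static int fake_fetch(char *buf, size_t len, int should_fail)
{
    if (should_fail) {
        return -1;              /* buffer is NOT written on failure */
    }
    strncpy(buf, "row data", len - 1);
    buf[len - 1] = '\0';
    return 0;
}

/* Allocate a zero-filled row buffer. calloc() plays the role of
 * ecalloc() here: a failed fetch leaves an empty string behind
 * instead of whatever bytes malloc() happened to hand back. */
static char *row_buffer_new(size_t len)
{
    assert(len > 0);
    return calloc(len, 1);
}

/* The return code, not the buffer contents, decides success.
 * Returns 1 if a row was fetched, 0 otherwise. */
static int fetch_row(char *buf, size_t len, int should_fail)
{
    return fake_fetch(buf, len, should_fail) == 0;
}
```

Relying only on the zeroed buffer works as long as every caller remembers that an empty buffer may mean "fetch failed"; checking the status in something like fetch_row() keeps that knowledge in one place.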

Posted by Dan at 06:16 PM | Comments (0)

July 13, 2004

PHP 5

It seems that PHP 5 is finally out. At least the CVS is finally open again and we can get back to contributing code.

Posted by Dan at 09:55 PM | Comments (0)

March 24, 2004

postgres furthered...

Not to add more gas to the fire here, but Joseph Scott took some time to try and prove or disprove my findings over on his blog. Very cool, minor comments on it though:

- My tests were done using gettimeofday() because each connection was done via the C codebase, not directly in PHP where you also get the added overhead of parsing time and connection hashing. As an aside, the PHP microtime() function just calls gettimeofday() with a little additional math to turn the microsecond field into fractional seconds (tv_usec / 1000000.00).

- I wanted to ensure that each connection was being freshly established, not following the PHP hash to see if a connection currently exists or not. To do this any and all connection timings were done in PHP-4.3.4/ext/pgsql.c:557 or thereabouts. The same was done for the mysql and mysqli systems. This could explain why the significant difference was found in the PHP *_pconnect() tests, which often are essentially the same C code as connect (only one gets added to an internal PHP hash).
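
The microtime()-style arithmetic mentioned in the first point can be sketched in C as follows. This is an illustration rather than PHP's actual source: the helper names and the MICRO_IN_SEC macro are chosen for the sketch, and the only assumption is that the microsecond field is divided by one million to obtain fractional seconds.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

#define MICRO_IN_SEC 1000000.0

/* Fractional-second part of a timeval: microseconds scaled down
 * to seconds. */
static double usec_fraction(const struct timeval *tv)
{
    assert(tv != NULL);
    return (double)tv->tv_usec / MICRO_IN_SEC;
}

/* Whole timestamp as seconds with a fractional part. */
static double timeval_to_seconds(const struct timeval *tv)
{
    return (double)tv->tv_sec + usec_fraction(tv);
}
```

Calling gettimeofday(&tv, NULL) and passing &tv to timeval_to_seconds() gives a single floating-point timestamp that is convenient to subtract.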

I'm not trying to suggest that there are flaws in Joseph's method, rather it's interesting to note despite the differences in approach the results are rather similar (minus the *_pconnect() testing).

Thanks to everyone who has provided constructive criticism via email and comments.

Posted by Dan at 05:30 AM | Comments (0)

March 22, 2004

postgres ... turtle or hare?

I've been working on some PHP code for use in a postgres database system. The code already supports both MySQL and Postgres, but there is a significant speed delta between the two in very basic things.

Discrepancies between query speeds aren't really what interests me. I know that due to architectural differences, MySQL is going to be slightly faster in a low traffic situation, with my (untested) assumption being that it will degrade to something similar to Postgres performance under high traffic. The big issue has become connection time to Postgres via a TCP socket in PHP. Using my own laptop as a test case, I'm running postgres 7.4, a copy of Apache v1.whatever, and PHP 4.3.4 and receiving connect times of 19.88902* ms or larger (degree of variance is about 3%), while running MySQL the connection time is significantly lower. Why?

Looking through the ext/pgsql PHP code I haven't found any significant deviations that would suggest a reasoning for the time differential. That has left the use of the postgres libraries as the major barrier. Having read through the postgres mailing lists, the biggest suggestion is that I shouldn't be killing my Apache processes too quickly (not a valid reason in this case) to benefit from pconnects (duh). Outside of that I haven't found much of any reasoning or analysis on the subject.

* All times have been created through the addition of gettimeofday() calls before and after the PQconnectdb() call. Calls to zend_hash_update did not significantly influence the timing.
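
For anyone wanting to repeat the measurement, the bracketing described in the footnote looks roughly like the C sketch below. It is not the code that was added to ext/pgsql: sample_workload() is a hypothetical stand-in for the PQconnectdb() call so the sketch compiles without libpq, and elapsed_ms() and time_call_ms() are names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed time between two gettimeofday() samples, in milliseconds. */
static double elapsed_ms(const struct timeval *start,
                         const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_usec - start->tv_usec) / 1000.0;
}

/* Hypothetical stand-in for the call being measured. */
static void sample_workload(void)
{
    volatile unsigned long sum = 0;
    unsigned long i;

    for (i = 0; i < 100000UL; i++) {
        sum += i;
    }
}

/* Sample the clock immediately before and after one call to fn. */
static double time_call_ms(void (*fn)(void))
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    fn();
    gettimeofday(&end, NULL);

    return elapsed_ms(&start, &end);
}
```

Keeping the two gettimeofday() calls as close to the measured call as possible is what keeps hashing and parsing overhead out of the number.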

[EDIT: corrected function call name from PGconnectdb to PQconnectdb. Added category]

Posted by Dan at 04:31 AM | Comments (9)

March 10, 2004

More blobs

It seems that yet again, another DB2 user created a patch to support BLOBs in PHP. Unfortunately, the patch uses a type, SQL_BLOB, that only exists in DB2 land. The first reaction of course is, well why not preface the code with #if defined(HAVE_IBM_DB2) #endif markers, creating a DB2 specific area. My real issue with this is that it creates DB2 specific functionality, wherein users will expect to see the same functionality on ABC database as well. Yes the code already has plenty of sections that do this, but I'm attempting to slowly remove such requirements from the codebase.

Actually, I don't blame the DB2 community for using this method. Not only does it seem to be clean and relatively easy, it does work for them. I just believe there has to be a way in pure ODBC language to reproduce this so that all may bask in the glory that is BLOB support.

Posted by Dan at 06:46 AM | Comments (0)

October 14, 2003

Operators in PHP

There's an odd conversation about a proposal to add a new operator to PHP for regular expressions. To start things off, I don't like regexes at all. I find them useful for some tasks, but overall, they're confusing, bulky, slow, and just a pain in the ass to debug.

George basically summed up my feelings towards the whole thing with this post:


Now _that's_ the best idea so far! I actually propose a set of operators

=~ PCRE
=-) POSIX (happy, because there on every system)
>={ strstr (evil, because it's confusing - which is needle, which is haystack?)

Posted by Dan at 01:06 PM | Comments (0)

August 28, 2003

BLOB support for PHP

For those who haven't been paying attention today to the PHP mailing lists, I broke the build, or at least those that follow and use the ODBC extension by default (*cough* Windows *cough*). Why did I break the build?

The patch provided by Clara Liu was supposed to add/simplify BLOB support within PHP's current architecture. The catch (there's always a catch) is that there is no default data type for a BLOB in ODBC. As such, Clara decided to emulate BLOB behavior via a SQL_LONGVARBINARY type, which should work just fine. The build problem showed its ugly face because the patch she sent expects a #define for a type of SQL_BLOB that just does not exist on the 3 major driver managers I support (iODBC, unixODBC, and Windows).

There are two options to fixing this that I see, but I do need people to test them.

The first patch simply adds in a "#define SQL_BLOB SQL_LONGVARCHAR" to the system and hopes all is right.

The second patch removes the use of the word SQL_BLOB as a case type, and hopes that REGISTER_LONG_CONSTANT will properly convert a SQL_BLOB over to a SQL_LONGVARBINARY.
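
As a rough illustration of how the two ideas fit together, here is a C sketch of a guarded fallback define. It is not either of the patches: it maps SQL_BLOB to SQL_LONGVARBINARY (the type the BLOB emulation is described as using), the numeric value is repeated from the standard ODBC headers only so the sketch compiles without a driver manager, and is_binary_long_type() and check_blob_mapping() are hypothetical names.

```c
#include <assert.h>

/* Value as found in the standard ODBC headers (sqlext.h); repeated
 * here only so the sketch compiles without a driver manager. */
#ifndef SQL_LONGVARBINARY
#define SQL_LONGVARBINARY (-4)
#endif

/* DB2's headers define SQL_BLOB; iODBC, unixODBC and Windows do not.
 * Fall back to the closest standard type when it is missing. */
#ifndef SQL_BLOB
#define SQL_BLOB SQL_LONGVARBINARY
#endif

/* Returns 1 when a column type should be fetched as binary long data. */
static int is_binary_long_type(int sql_type)
{
    switch (sql_type) {
    case SQL_LONGVARBINARY:
#if SQL_BLOB != SQL_LONGVARBINARY
    case SQL_BLOB:              /* a distinct case only where the values differ */
#endif
        return 1;
    default:
        return 0;
    }
}

/* Quick self-check of the mapping. */
static void check_blob_mapping(void)
{
    assert(is_binary_long_type(SQL_BLOB) == 1);
    assert(is_binary_long_type(SQL_LONGVARBINARY) == 1);
}
```

The preprocessor guard around the extra case label matters: once SQL_BLOB is an alias for SQL_LONGVARBINARY, listing both in the same switch would be a duplicate case value and the build would break again.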

Any testing of these patches would be appreciated since I cannot test them currently. As always comments and feedback are looked upon fondly. Email, comments, and trackbacks all work very well.

So why did I commit this basically untested patch? To get it tested, first off. I had a handful of people claiming it worked great for them, but they were all limited to the same DB types and this patch needed further testing. Many of those wanting to test were Windows users without compilers, and as such the only way to enable their testing was to give them a snapshot. Seeing as I have no Windows machine around me, this was my best option. I don't have much hope that these current patches will even be tested, but I could be proven wrong.

On another note, if none of this works I'll abandon the attempts to make this work.

Posted by Dan at 03:16 PM | Comments (2)

August 20, 2003

PHP Blob support

Clara Liu, of Zealworks, had sent me a patch many moons ago that incorporated blob and clob support into the PHP ODBC functions. After having looked over it a while I decided that I really didn't understand how the patch would work, and I was hesitant to add such a change to the PHP source. Looking at my saved emails from the conversation, she never really explained how or why this patch would work either, so I put it off until I could look at it later. What was so confusing about the patch? She copied the SQL_LONGVARBINARY portions of code and just renamed them to SQL_BLOB. It seems now that there are a significant number of users asking for BLOB support within the PHP system, and as such I would like to present her patch to those users for testing. Just be nice to my slow internet connection please.

The patch is done from a recent snapshot of CVS for PHP 5, but should be easily backported to PHP 4.3 systems. You can find the patch file at http://www.deadmime.org/~dank/blob_patch. If this works, thank Clara Liu for her efforts somehow :)

If this works or does not work for you, please let me know via blog comments, trackbacks, or even email. Thanks for your time.

Posted by Dan at 06:51 PM | Comments (1)

ODBC fetch speed

While talking with Edin the other day, he complained about how painfully slow the PHP ODBC fetch system is for him. He provided me with some numbers and a sample query to back his complaints. While I don't remember the exact timings he had, it looks something like this:

Connecting - 0.5 seconds
Executing - 0.1 seconds
Fetching - 2.7+ seconds

Fetching is obviously what he was complaining about, as the query was simply returning 70 rows of single data elements. I suggested that he do one minor hack to his PHP source and recompile. After doing this hack, his fetching time went from that astronomically high number down to a lightning quick 0.1 seconds.

This hack is wonderful, and I actually tried to implement it back in the PHP 4.2.0 days, only to discover it broke a significant portion of Microsoft based clients.

You can implement this hack yourself too by changing your cursor type from SQL_CURSOR_DYNAMIC to SQL_CURSOR_FORWARD_ONLY within the php_odbc.c file (only the two instances of SQLSetStmt please). Mind you, I do not support such changes and cannot ask for anything more than feedback if it worked for you (via a TB or a Comment would be nice). You should see a dramatic change as well in your fetch speeds.
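
The shape of the change can be sketched in C as below. To be clear about the assumptions: this is not php_odbc.c. The constants carry the values found in the ODBC headers, but struct fake_stmt and fake_set_stmt_option() are hypothetical stand-ins for a real statement handle and the driver manager's statement-option call, and prepare_cursor() goes one step further than the hack by making the cursor type a choice instead of a hard-coded value.

```c
#include <assert.h>
#include <stddef.h>

/* Numeric values as found in the ODBC headers (sqlext.h), repeated so
 * the sketch compiles without a driver manager installed. */
#ifndef SQL_CURSOR_TYPE
#define SQL_CURSOR_TYPE          6
#define SQL_CURSOR_FORWARD_ONLY  0UL
#define SQL_CURSOR_DYNAMIC       2UL
#endif

/* Hypothetical stand-in for a statement handle. */
struct fake_stmt {
    unsigned long cursor_type;
};

/* Hypothetical stand-in for the statement-option call; a real build
 * would call into the driver manager here. Returns 0 on success. */
static int fake_set_stmt_option(struct fake_stmt *stmt, int option,
                                unsigned long value)
{
    assert(stmt != NULL);
    if (option != SQL_CURSOR_TYPE) {
        return -1;
    }
    stmt->cursor_type = value;
    return 0;
}

/* Pick the cursor type: scrolling needs a scrollable cursor such as
 * the dynamic one; plain top-to-bottom reads can use forward-only. */
static int prepare_cursor(struct fake_stmt *stmt, int need_scrolling)
{
    unsigned long type = need_scrolling ? SQL_CURSOR_DYNAMIC
                                        : SQL_CURSOR_FORWARD_ONLY;
    return fake_set_stmt_option(stmt, SQL_CURSOR_TYPE, type);
}
```

A result set that is only ever read from first row to last, like the 70-row query above, is the case where forward-only is enough.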

Supposedly with IBM's DB2 v8 (and greater) you can do this same alteration via a few words of SQL at the end of your query. For example, adding "FOR READ ONLY" will convert the cursor to a FORWARD_ONLY cursor and speed up your queries as well. The reason this works within the DB2 system is that it seems they dropped support for dynamic cursors a while back. If anyone else has found that their DBs also support this kind of cursor alteration, let me know.

Posted by Dan at 06:36 PM | Comments (0)

July 15, 2003

PHP ODBC Environments

In the past, one of the big issues with using an ODBC system via PHP has been the ability to control the environment. Many features and functionality disappear when a developer no longer can set a permanent cursor type, scroll length, or insert random other option here. Strangely enough, PHP has been able to work fairly successfully without such functionality for a substantial time, but it's becoming apparent that this functionality will need to be added (witness the use of the Microsoft cursor). As such, I would like to introduce to you the new (but not yet improved) function:

odbc_env()

odbc_env() will take a series of parameters in an array and attempt to set them based upon a key/value system in the array. It will not report any errors back though at this time, and just run through the entire list. Yuck! WTF? How can you do that you say? Well, I haven't figured out a better solution. In cases where the second of 30 or so options fail, do I not make the environment? What if the option exists on only some platforms and not others?
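
One possible middle ground, sketched in C under stated assumptions: keep running through the whole list, but remember which options failed so the caller can decide what to do. None of this is the odbc_env() implementation. struct env_option, env_setter, apply_env_options() and demo_setter() are all hypothetical names, and the setter is a stand-in for whatever call would actually set an ODBC environment attribute.

```c
#include <assert.h>
#include <stddef.h>

/* One key/value pair, mirroring an entry in the options array. */
struct env_option {
    int  key;
    long value;
    int  failed;   /* set to 1 if the option could not be applied */
};

/* Hypothetical setter type; a real build would wrap the ODBC
 * environment-attribute call here. Returns 0 on success. */
typedef int (*env_setter)(int key, long value);

/* Try every option, never stopping early. Returns the number of
 * options that failed so the caller can decide what to do. */
static size_t apply_env_options(struct env_option *opts, size_t count,
                                env_setter set)
{
    size_t failures = 0;
    size_t i;

    assert(opts != NULL || count == 0);
    assert(set != NULL);

    for (i = 0; i < count; i++) {
        opts[i].failed = (set(opts[i].key, opts[i].value) != 0);
        failures += (size_t)opts[i].failed;
    }
    return failures;
}

/* Hypothetical setter used for demonstration: rejects odd keys. */
static int demo_setter(int key, long value)
{
    (void)value;
    return (key % 2 == 0) ? 0 : -1;
}
```

This keeps the "option exists on only some platforms" case from aborting the whole environment while still leaving a trail for anyone who cares about a specific option.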

The return value of this function is a handle to the ODBC environment, which you can use to do a variety of functions now. For example:
odbc_set_env_attr()
odbc_get_env_attr()
odbc_data_source()
odbc_connect()

The biggest part of this function is that you actually do not need to even know this functionality exists. If you've been using ODBC fairly happily all along without any problems, you can safely ignore this functionality and not notice any service degradation.

Posted by Dan at 07:01 AM | Comments (0)

July 11, 2003

PHP's New ODBC

In the past I've threatened to re-write large portions of the ext/odbc system for PHP. A little while back, I decided to finally shut up and begin the re-write. A few things have changed since that initial email, and, hopefully, over the next few days I'll be able to highlight some of the changes and discuss how they will affect the everyday PHP/ODBC user.

Today's topic of interest: Database support

At this time I intend to fully drop support for native driver interfaces. Why? A couple of reasons come to mind.

First, and biggest, is that I have no way to test on database X. I don't see any real reason why any of the changes being made won't be supported by an ODBC v3 compliant interface, but I will not make that assertion blindly.

Second, the purported speed increase in many of the more mainstream commercial databases is not entirely correct. In fact many would call this a flat out database myth. Ken North performed a rather detailed examination of speed differences between Oracle 8/9 native interfaces and ODBC interfacing (via Data Direct). Oddly enough this information is in direct conflict with the testing results Georg Richter and I have collected through MyODBC on a MySQL database which showed ODBC to be significantly slower (code maturity?).

In any case, the final result (as I read it) from this research is that while there are sections each is marginally better at, the overall effect is that neither is significantly better performance wise. As such this leaves me, the developer, with the convenience factor, better known as the least amount of work I need to do to make things happy. All signs point towards a Driver Manager only world, much along the lines of the Perl DB interface.

Third, it seems to me rather odd to support a native interface for a technology that is designed to work as a non-native interface. While not a technical reason at all, in an odd way it makes sense to me.

As I see it right now there are three major ODBC Driver Managers to support: Microsoft, OpenLink Software's iODBC, and unixODBC. If you have another you think should be supported, please let me know via comments, email, or a TrackBack.

Posted by Dan at 09:47 AM | Comments (0)

June 02, 2003

PHP and Namespaces

It seems as if the namespace issue has come to a head now. Marcus Boerger has posted some commentary on a recent commit by Stas that seems to remove their support.

While I can't say I've kept up on the debate (it's been brewing for a long time), mainly because to me it seems rather silly not to support namespaces, the reasonings highlighted by Marcus for removal seem rather weak.

The : character is not something I would expect to work in a namespace... ever. While you could argue for a naming sequence that utilizes such a format, I could also argue for a naming sequence which doesn't. The point being it's syntactic sugar and need not be a point of contention as it's not a technical issue, but rather an individual style issue. Style conformance is not an objective PHP is setting out to solve (unless it's for CVS commits), so I would suggest leaving the style issues to individual PHP users and ensuring that all functionality works. Easier said than done I know.

I'm still a bit lost on the whole problem with import though. I'll have to do a bit of back-reading and research before I say too much. So far though, it sounds like it was designed to work one way, but now needs to be used in others.

Posted by Dan at 09:28 AM | Comments (1)

May 23, 2003

PHP, ODBC, and everything in between

For about the last two years I've been threatening to rewrite large portions of ext/odbc (the Unified ODBC module) for PHP. Frighteningly enough, I've started. The basic goal is to provide a more modern interface, reflecting interface functionality found in some of the more popular database extensions (i.e. MySQL, pgsql), and a series of other improvements.

Work progresses and I believe the new functionality options have all been hammered out to the point of being static. The goal is to have this finished for inclusion in PHP5, as it readily breaks backwards compatibility completely. The problem with this is I've no real indicators of how many people use the ODBC extension anymore, nor does anyone really give me feedback when requested. I take the silence as being acceptance of the proposed changes, and will implement features/functionality that I deem necessary.

Essentially I've only added a few functions, but have drastically changed the underlying code to do this. For example you can now gain access to the odbc_environment, providing a larger amount of control to an ODBC developer. The first step was to upgrade from ODBC v2 to something more substantial (like ODBC v3.5). This transition is proving to be a bottleneck of sorts, but not the largest of them.

The largest bottleneck so far hasn't been a code issue, but rather my ODBC driver manager. When Apple introduced Jaguar they began to bundle a version of iODBC, but provided their own interface (Applications -> Utilities -> ODBC Manager) to it. Unfortunately, the Apple interface is sorely lacking in documentation, labeling, and ease of use. It seems as if the standard Apple development guidelines were thrown out, beaten with the ugly stick, abandoned, and left for dead when this gem was in the test and release phase. This becomes a problem when one is unable to diagnose why a DSN is not being found by the system.

Originally I had an installation of iODBC working with the Virtuoso database and was happily plugging along developing ODBC based test cases and systems. My upgrade to Jaguar bothered nothing. Eventually I decided to move to PostgreSQL to do some work on a more functional database. Exit usability, enter problems. I built the pgsqlodbc driver just fine for iODBC. After plugging in the appropriate values, I discovered that iODBC could not connect to the pgsql installation. Thinking I had built the binary wrong, I recompiled it and tried again. The same result. More interesting is that the odbctest program can no longer identify any DSN I had previously entered (???).

Further testing provides no solutions. I re-installed the iODBC manager only to discover that it too is still fubar, and does not recognize anything either. Now neither iODBC nor the Apple ODBC Manager seems to be working, leaving me high and dry on the development front. The iODBC help forum admins have suggested I've built PHP wrong, but when even their odbctest program doesn't work, I don't believe it's a PHP issue (I've also rebuilt numerous times since then). It seems others are having the same problems on the board too. Apple's help system seems to be devoid of any information regarding the ODBC Manager and how to make it work properly.

So I'm asking for help from those of you who read this. I know OpenLink provides a driver to connect to pgsql back-ends, but they charge a fee for this binary, which I can't/won't pay. I'd be willing to move to unixODBC, but have been stopped by the dreaded dlopen issue (and no, I don't want to install dl_compat). I'd really like to know what has gone wrong here, and how to fix it.

Help.

Posted by Dan at 09:19 PM | Comments (6)

May 07, 2003

bundled software

Sterling has put some serious effort (I'm insanely jealous of the free time) into integrating libxml with PHP, to the point of requesting that it now replace the expat library, thus requiring it to be bundled. In the past, one of my constant arguments on the internals@ (then php-dev@) list has been to stop this practice as it provides no added benefit to the system as a whole. The interesting bit is Brad and I had run this discussion before. The end result was to not bundle libxml, but it did not really resolve any technical merits either. Let me explain my stance on this a bit.

- Advantage one: bundling ensures a specific base version of some library.

The problem isn't wanting to establish a base version of library code, but rather trying to keep in sync with an actively maintained library. One could argue that it need only be updated when the community as a whole requests/requires it. Unfortunately, this doesn't take into consideration the power of a squeaky wheel that will complain feature XYZ doesn't exist in your software, but does in product ABC. In the case of actively maintained software it may become a full time position to have someone maintain concurrent source (libxml is actively developed). In the world of open source software, requiring a developer to do anything is not an exceptionally prosperous route for anyone.

- Advantage two: bundling allows fixes/patches to the code

A very true idea, but shouldn't such patches be passed back to the maintainers? In the case where they have been ignored (i.e. libgd), shouldn't this be reason enough to branch the code base in question and provide a new venue of distribution? It makes little to no sense to create a proprietary version of a code base, when the work could be useful to others. This also ties into a problem found in Advantage 3.

- Advantage three: bundling allows my software to ensure functionality

Typically in a server based software, those installing it are cognizant of what they are trying to accomplish. With this in mind, an installer will (typically) take the time to read what is required and necessary for such functionality to be enabled, i.e. FAQs, configure help, and potentially the manual. I won't say that all installers do, as one can quickly scan the PHP mailing lists to prove me wrong. This type of reasoning though leads to a downward spiral that, in my mind, ends with an application of an enormous size.

Where does the line get drawn for what libraries get bundled and what libraries don't? If you're really worried about providing a base line functionality always, it surely would be best to bundle every external dependency, right? The point being this is what a configure script is designed to do. Locate, identify, and use the requested functionality. When the necessary support is not found, throw up an error and give the person installing a chance to correct the error or remove the configure option from their install.

Leaving the installation of external dependencies to the user base has a few added side benefits. First, it only requires that maintainers of the interface to the library keep up to speed with any API changes, which typically should be few and far between (given for any amount of usability).

Second, it removes the onus on your software from the point of blame. As with all software today, bugs and holes are rather prevalent no matter what QA process is employed. The hope is to keep these to a minimum and their impact even less. By bundling external software you are now (potentially) adding functionality to a system without the sysadmin's knowledge. This is a dangerous path to take with respect to system security, and is (in some cases) considered a trojan horse. If a vulnerability is discovered in this software, it now becomes the library bundler's fault for including the piece of vulnerable software. More importantly, if a sysadmin does not realize that this software is installed, they may not upgrade a vulnerable library, leading to system compromise. In either case, the onus of non-secure software is placed upon a project. This is a stigma that is near impossible to remove once the seed has been planted.

Third if a vulnerability is discovered in the bundled software, getting users to upgrade an entire product installation for one library provides no added benefit. Wasn't this the point of using shared libraries in the first place? If you have already made customizations to the bundled software, the end user cannot just simply grab the latest version and update it.

So why not just unbundle everything?

Realize that unbundling an already established element becomes increasingly difficult. Within the PHP project there is a mindset that can be summed up as 'if it exists now it should exist always'. In other words, user perceived rules (i.e. expat is bundled) shouldn't be broken, and this is typically a good mantra to take for providing backwards compatibility.

On the one hand I agree with this, an established user base is hard to steer towards a new goal. On the other hand, I'd rather see PHP be minimalist. :) In either case I think libxml should not be bundled...

Posted by Dan at 07:52 AM | Comments (1) | TrackBack