Anthony Gentile

Postgres Cursors and PHP

October 4, 2010, 8:35am

I recently found myself in a situation where I needed to send query results to a CSV file and gzip it for some report generation. To accomplish this, I created a script that would call Postgres' COPY command and then run gzip on the resulting file. The code looks something like this:

// execute my script and continue on without waiting for it to finish
exec('/path/to/my/script.php > /dev/null 2>&1 &');

...and the script.php file

// script.php snippet
set_time_limit(0);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->beginTransaction();
$pdo->exec('SET transaction_read_only=true');
$sql = "SELECT a.num AS one, b.num AS two, a.num * b.num AS result FROM (SELECT generate_series AS num FROM generate_series(1, 100)) a, (SELECT generate_series AS num FROM generate_series(1, 1000)) b"

$save_file = '/tmp/blah.csv';
$pdo->exec("COPY ({$sql}) TO '{$save_file}' WITH CSV HEADER");
if (file_exists($save_file)) {
    exec("gzip -9 $save_file");
}
$pdo->rollback();

This is all fine and dandy, unless the database server is not on the same box as the web server. In that case, if you want to store the gzipped CSV file on the web server, you need to get it from the database server to the web server. You could do this with an NFS mount or rsync, but for my use case I wanted to see if I could keep it within PHP and the web server. This poses some problems. Depending on the query, your result set might be very large: so large that it exhausts PHP's allocated memory when you try to shove the whole result set into an array. As far as I know, PDO does not support fetching query results in chunks (unless you are dealing with LOBs), which is what I need to do.

With the help of Robert Treat and Hubert Lubaczewski, I was made aware that Postgres has database-level cursors that one can use to run a query and delay fetching the result set. For those unfamiliar with database cursors, they are simply a way for the client to iterate through a result set. Using PDO, I would be able to declare a cursor for the SQL statement and then fetch through the result set in whatever batch size I wanted. Let me show the refactored code:

set_time_limit(0);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->beginTransaction();
$pdo->exec('SET transaction_read_only=true');
$sql = "SELECT a.num AS one, b.num AS two, a.num * b.num AS result FROM (SELECT generate_series AS num FROM generate_series(1, 100)) a, (SELECT generate_series AS num FROM generate_series(1, 1000)) b"

$save_file = '/tmp/blah.csv';

$pdo->exec("DECLARE name_of_cursor NO SCROLL CURSOR FOR ({$sql})");
$stmt = $pdo->query("FETCH 200 FROM name_of_cursor");
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);

while(count($results) > 0) {
    $c = count($results);
    for ($i = 0; $i < $c; $i++) {
        file_put_contents($save_file, implode(',',$results[$i]) . "\n", FILE_APPEND | LOCK_EX);
    }
    $stmt = $pdo->query("FETCH 200 FROM name_of_cursor");
    $results = $stmt->fetchAll(PDO::FETCH_ASSOC);
}

if (file_exists($save_file)) {
    exec("gzip -9 $save_file");
}
$pdo->rollback();

A couple of things to note here. When declaring the cursor I have set NO SCROLL, which means I expect to only move forward, never backward, when fetching rows. This gives Postgres a chance to make things a little faster for me. You can, however, move around as documented here. Something else to consider is how many rows to fetch at a time: depending on your memory limit and what you are fetching (text fields, blobs, etc.), you could exhaust memory fetching only 200 rows at a time. To combat this, I decided to do some calculation to figure out what my fetch row count should be. Here is the code:

set_time_limit(0);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->beginTransaction();
$pdo->exec('SET transaction_read_only=true');
$sql = "SELECT a.num AS one, b.num AS two, a.num * b.num AS result FROM (SELECT generate_series AS num FROM generate_series(1, 100)) a, (SELECT generate_series AS num FROM generate_series(1, 1000)) b"

$save_file = '/tmp/blah.csv';

$pdo->exec("DECLARE name_of_cursor NO SCROLL CURSOR FOR ({$sql})");
$mem_limit = getMemoryLimit();

$initial_mem = memory_get_usage();
$initial_fetch_count = 100;
$stmt = $pdo->query("FETCH $initial_fetch_count FROM name_of_cursor");
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);
$fetch_mem = memory_get_usage();
$fetch_mem = $fetch_mem - $initial_mem;
// calculate how many rows we can safely fetch per batch: $fetch_mem is the
// memory used by $initial_fetch_count rows, so divide the available memory
// by the per-row cost
$fetch_count = floor($mem_limit / ($fetch_mem / $initial_fetch_count));

while(count($results) > 0) {
    $c = count($results);
    for ($i = 0; $i < $c; $i++) {
        file_put_contents($save_file, implode(',',$results[$i]) . "\n", FILE_APPEND | LOCK_EX);
    }
    $stmt = $pdo->query("FETCH $fetch_count FROM name_of_cursor");
    $results = $stmt->fetchAll(PDO::FETCH_ASSOC);
}

if (file_exists($save_file)) {
    exec("gzip -9 $save_file");
}
$pdo->rollback();


// adapted from http://www.php.net/manual/en/faq.using.php#78405
function getMemoryLimit($reserve_amount = 20, $reserve_unit = 'm')
{
    $value = ini_get('memory_limit');
    if (is_numeric($value)) {
        // a memory_limit of -1 means "no limit"; fall back to a conservative
        // 128M here so the fetch count calculation above stays sane
        return (int) $value > 0 ? (int) $value : 128 * 1048576;
    }

    $qty = substr($value, 0, strlen($value) - 1 );
    $unit = strtolower(substr($value, strlen( $value ) - 1));
    switch ($unit) {
        case 'k':
            $multiplier = 1024;
        break;
        case 'm':
            $multiplier = 1048576;
        break;
        case 'g':
            $multiplier = 1073741824;
        break;
    }

    if (isset($multiplier)) {
        $qty *= $multiplier;
        if ($reserve_unit == $unit) {
            $reserve = ($reserve_amount * $multiplier);
            if ($qty - $reserve > 0) {
                $qty -= $reserve;
            }
        }
    }
    return (int) $qty;
}

So what I am doing here is fetching an initial 100 rows of whatever SQL statement I may want to run. I then find out how much memory was used to capture those results, and use that number together with my total available memory to figure out how many rows to fetch on each subsequent pass (a rough example with made-up numbers follows below). Hopefully this post has been helpful. Postgres has some cool features that I am only recently discovering, and it has been fun to take advantage of them within PHP.
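
To put made-up numbers on that calculation: if memory_limit is 128M, getMemoryLimit() with its default 20M reserve returns 113,246,208 bytes. If the initial FETCH of 100 rows grows memory usage by roughly 1MB, the per-row cost is about 10KB, and floor(113246208 / (1048576 / 100)) works out to around 10,800 rows per FETCH. The exact figures will vary wildly with your schema; the point is only that the batch size scales with whatever headroom your memory_limit actually leaves.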

EDIT: I made a wrong assumption about the abilities of PDO; please refer to Wez Furlong's comment below for some additional examples of how to implement cursors with PDO.

Take a look at this article, and scroll down to "Paging data, scrolling cursors, positioned updates": http://www.ibm.com/developerworks/data/library/techarticle/dm-0505furlong/
Alternatively, here's a short code snippet that uses a scrolling cursor to fetch data from postgres:

$db = new PDO('pgsql:');

$q = $db->prepare('select * from severities', array(
  PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL
));
$q->execute();
foreach ($q as $row) {
  var_dump($row);
}

Both of these things enable scrolling cursors. For the postgres driver, this is basically doing under the covers what you do in your code above.
The IBM article shows how to seek around in the resultset (admittedly, that's for ODBC; I don't recall how that works with the postgres driver).
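
To round out Wez's example, here is a rough sketch of seeking with PDO's fetch orientation constants. I have not verified which orientations the postgres driver actually honours, and the DSN, table name, and offset are just placeholders, so treat this as a sketch rather than a recipe:

// sketch only: assumes a driver that supports PDO::CURSOR_SCROLL
// together with the PDO::FETCH_ORI_* orientations
$db = new PDO('pgsql:dbname=mydb'); // hypothetical DSN
$q = $db->prepare('SELECT * FROM severities', array(
    PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL
));
$q->execute();

// jump to an absolute position in the result set (offset 10 here)
$row = $q->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_ABS, 10);

// then step back one row at a time
$prev = $q->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_PRIOR);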

An additional comment from Sean Coates

To copy a table from a remote postgres server to a file on the local filesystem you can do:

local_linux> psql -U myusername -h remote_pg_server -p 5432 my_remote_db
psql> \copy data_tb to 'my_localfile';

In other words, using "copy" (an SQL command) assumes the destination file is on the same file system as the Postgres server, while "\copy" (a psql command) assumes the destination file is on the machine where psql is being run.
You can do the same to copy from a local file to a table on a remote db.
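
For example, the reverse direction simply flips the \copy (reusing the same placeholder names as above):

local_linux> psql -U myusername -h remote_pg_server -p 5432 my_remote_db
psql> \copy data_tb from 'my_localfile';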

Cheers.