| Sample report | Documentation | Download | Project page | Credits |
If you have a busy PostgreSQL or MySQL database application, you might want to analyze the queries to see if they can be improved. Here's a little utility to help with that.
To use PQA, you'll need Ruby, PQA itself, and of course a PostgreSQL or MySQL installation.
Here's an example report generated from data collected on RubyForge's database activity.
You can install PQA via RubyGems by downloading the RubyGem file and doing a:
gem install pqa-1.6.gem
And then you can run PQA like this:
pqa -file /path/to/your/logfile -normalize -top 5
Or, if you want to use the zip file, download it and unzip it into any directory. Then open a command line window, cd into the pqa-1.6 directory, and run it like this:
[tom@hal lib]$ ./pqa.rb -file ../sample/pglog_sample.log -normalize -top 5 645 queries (65 unique), longest ran in 0.370524 seconds), parsed in 0.088793 seconds ### Queries by type SELECTs: 593 (92%) INSERTs: 51 (8%) UPDATEs: 1 (0%) ### 5 most frequent queries 61 times: SELECT total FROM forum_group_list_vw WHERE group_forum_id='' 46 times: SELECT plugin_id, plugin_name FROM plugins 46 times: INSERT INTO activity_log (day,hour,group_id,browser,ver,platform,time,page,type) VALUES (0,'','','','','','','',''); 40 times: select classname from supported_languages where language_code = '' 40 times: SELECT language_code FROM supported_languages WHERE language_id=''
Parameters:
-file - required - The name of the data file to analyze
-top N - optional - "Top N queries", defaults to 10.
-format N - optional - Either "text" or "html", defaults to "text"
-normalize - optional - Removes quoted items and various bits of whitespaces
-logtype - optional - The type of log file: syslog, pglog, or mysql - defaults to pglog
Here are some recommendations on configuring postgresql.conf:
| Variable | Recommending setting | Notes |
|---|---|---|
| log_statement | true | This reports the actual SQL statements; turn this one on. |
| log_duration | true | This reports how long it took to run each statement; turn this one on too. |
| log_pid | true | This reports the process id (pid) of the client making the query. This is needed to correctly track messages from multiple clients. Turn this one on. |
| log_min_duration_statement | time in ms (see note) | This setting allows only queries that are taking more than the specified number of milliseconds are reported. All SQL statements that run for the time specified or longer will be logged with their duration. Setting this to zero will print all queries and their durations. It is not necessary to enable log_statement or log_duration to use feature. log_min_duration_statement was introduced with PostgreSQL 7.4. |
| log_connection | false | This reports when connections are established. Turn this one off. |
| log_timestamp | false | This reports the time at which each event occurs. Turn this one off as well. |
| log_error_verbosity | default | This controls how verbose the log file is; 'default' is the format PQA expects. |
| lc_messages | 'C' | This ensures that the log file will contain the English messages - that's necessary because PQA looks for strings like "Duration" |
There are two ways to collect data from your PostgreSQL database server:
postmaster
startup script and redirect output into a file. On my RedHat 8
workstation I've got PostgreSQL installed from RPM, and so the
appropriate file is /etc/rc.d/init.d/postgresql and the script snippet looks like this:
su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -o '-i' -D $PGDATA -p /usr/bin/postmaster -o '-p ${PGPORT}' start > /var/lib/pgsql/data/serverlog.log 2>&1" < /dev/null
And then restart the database.
syslog parameter in postgresql.conf
to at least 1. If you don't want to have all the other system messages
mixed up with your query data, you should change your syslogd
configuration. This is located in /etc/syslog.conf. To tell syslog to put every PostgreSQL messages in the /var/log/postgresql you have to add a line like this one :
local0.* /var/log/postgresql
local0 being the same as the syslog_facility setting of postgresql.conf. You should also tell syslog not to log those messages in /var/log/messages. Just add local0.none to the line :
*.info;mail.none;authpriv.none;cron.none;local0.none /var/log/messagesDon't forget to restart or reload syslog in order to apply your changes (for example
service syslog restart). Then restart the database.
And to collect data from MySQL:
/etc/my.cnf by adding a line to the mysqld section: log=/var/log/mysql-queries.log
touch /var/log/mysql-queries.log && chown mysql:mysql /var/log/mysql-queries.log
/etc/rc.d/init.d/mysql restart
Note that PQA assumes that the ruby binary is in /usr/local/bin/. If it's somewhere else, like /usr/bin/, you'll need to edit the first line of pqa.rb to reflect that location.
If there's anything we can add to make this documentation more helpful, please let us know about it by posting to the project forums. Thanks!
PHP programmers may be interested in a similar application written by Havard Eide - EPCQueryAnalyzer.
Contributors
Tom Copeland wrote an article (here's the errata) about PQA which might be helpful reading. It's a bit outdated, though, and it includes some statements about syslog that reflect my ignorance rather than syslog's actual capabilities :-)
The article states that to turn off logging, you can just comment out the syslog configuration settings and do a postgresql reload. That won't work; instead, you'll need to change the settings back to what they were originally and then postgresql reload. Thanks to Neil Conway for pointing this out to me.
The article also has the wrong name for the PostgreSQL IRC channel - it's #postgresql, not #pgsql. Thanks to Robert Treat for noticing this.