PostgreSQL#
Note
The Deploy documentation covers many topics relating to PostgreSQL, including: configuration, maintenance, and usage by clients. This page addresses topics relating to software development.
See also
Connect to a database#
Connect to the database using a connection string stored in the DATABASE_URL
environment variable.
In Python, connect to the database using dj-database-url if using Django, or psycopg2.connect() otherwise.
To set the search path for a PostgreSQL connection, append to the connection string:
?options=-csearch_path%3Dmyschema,public
?currentSchema=myschema,public
Identify the client#
Database administrators need to identify the sources of queries, in order to notify developers of inefficient queries or alert users whose queries will be interrupted by maintenance. For example:
Django applications set the application_name query string parameter in the PostgreSQL connection URI, or use a service-specific user
Kingfisher Summarize uses the psycopg2 package, and adds
/* kingfisher-summarize {identifier} */
as a comment to expensive queriesKingfisher Colab uses the ipython-sql package, and adds the Google Colaboratory notebook URL as a comment to all queries
Construct SQL statements#
Follow best practices to avoid accidental errors and SQL injection. The code samples below use the psycopg2 Python package.
Pass parameters to SQL queries, using the second argument to the
execute
method. This adapts the Python value’s type (likebool
,int
,str
) to the correct SQL representation:cur.execute("SELECT * FROM data WHERE data->>'date' > %(date)s", {'date': '2020-01-01'})
DO NOT use string interpolation (
%
):cur.execute("SELECT * FROM data WHERE data->>'date' > '%(date)s'" % {'date': '2020-01-01'}) # WRONG
DO NOT use string concatenation (
+
):cur.execute("SELECT * FROM data WHERE data->>'date' > '" + '2020-01-01' + "'") # WRONG
AVOID using literal values:
cur.execute("SELECT * FROM data WHERE data->>'date' > '2020-01-01'") # AVOID
For example, if you forget that dates are represented as strings in SQL, you might do the following, which evaluates
2020-12-31
to1977
, which will match everything in the database:cur.execute("SELECT * FROM data WHERE data->>'date' > 2020-12-31") # BROKEN
Use named placeholders like
%(collection_id)s
. This allows you to use the same placeholder multiple times in the query, while only having to pass a single parameter, and to edit and re-order your query without re-ordering your parameters.cur.execute(""" SELECT * FROM release WHERE collection_id = %(collection_id)s UNION SELECT * FROM record WHERE collection_id = %(collection_id)s AND ocid = %(ocid)s """, {'collection_id': 1, 'ocid': 'ocds-213czf-1'})
AVOID use anonymous placeholders (
%s
):cur.execute(""" SELECT * FROM release WHERE collection_id = %s UNION SELECT * FROM record WHERE collection_id = %s AND ocid = %s """, (1, 1, 'ocds-213czf-1')) # AVOID
If you are writing a query template in which you want to substitute column names or table names, use the
format
method and theSQL
andIdentifier
classes (documentation):from psycopg2.sql import SQL, Identifier cur.execute(SQL("SELECT * FROM {table}").format(table=Identifier('collection')))
You can use this together with passing parameters:
cur.execute(SQL("SELECT * FROM {table} WHERE id = %(id)s").format( table=Identifier('collection')), {'id': 1})
Remember to format the
SQL()
object. DO NOT format the string itself:cur.execute(SQL("SELECT * FROM {table} WHERE id = %(id)s".format( table='collection'), {'id': 1}) # WRONG
DO NOT use string interpolation (
%
):cur.execute("SELECT * FROM %s" % 'collection') # WRONG
DO NOT use string concatenation (
+
):cur.execute("SELECT * FROM " + 'collection') # WRONG
AVOID using anonymous placeholders:
cur.execute(SQL("SELECT * FROM {}".format('collection')) # AVOID
Generate Entity Relationship Diagram#
Install SchemaSpy
Download the PostgreSQL JDBC Driver
Rename the JAR files to
schemaspy.jar
andpostgresql.jar
Move the JAR files to a preferred location
Run SchemaSpy, using appropriate values for the -db
(database name), -s
(schema, optional), -u
(user) and -p
(password, optional) arguments:
java -jar schemaspy.jar -t pgsql -dp postgresql.jar -host localhost -db DATABASE -s SCHEMA -u USER -p PASSWORD -o schemaspy -norows
Use either the schemaspy/diagrams/summary/relationships.real.compact.png
or schemaspy/diagrams/summary/relationships.real.large.png
file and check the schemaspy/diagrams/orphans/
directory.