PostgreSQL

Note

The Deploy documentation covers many topics relating to PostgreSQL, including: configuration, maintenance, and usage by clients. This page addresses topics relating to software development.

Connect to a database

Connect to the database using a connection string stored in the DATABASE_URL environment variable.

In Python, connect to the database using dj-database-url if using Django, or psycopg2.connect() otherwise.

To set the search path for a PostgreSQL connection, append to the connection string:

psycopg2
?options=-csearch_path%3Dmyschema,public
dj-database-url
?currentSchema=myschema,public

Identify the client

Database administrators need to identify the sources of queries, in order to notify developers of inefficient queries or alert users whose queries will be interrupted by maintenance. For example:

  • Django applications set the application_name query string parameter in the PostgreSQL connection URI, or use a service-specific user

  • Kingfisher Summarize uses the psycopg2 package, and adds /* kingfisher-summarize {identifier} */ as a comment to expensive queries

  • Kingfisher Colab uses the ipython-sql package, and adds the Google Colaboratory notebook URL as a comment to all queries

Define tables

  • In PostgreSQL, use TEXT instead of other character types, as there is no performance difference.

  • Use NOT NULL with character types, as recommended by Django.

  • Use NOT NULL with JSON types, and set the default to an empty object, array or string.

  • In Python, do not set default values to {} or []. In Django, use default=dict and default=list. In Pydantic (including SQLModel), use default_factory=dict and default_factory=list.

See also

Django models

Name conventions

  • Timestamp columns: created_at, updated_at and deleted_at. (Some projects use created and modified.)

Load (or dump) data

Use the copy meta-command instead of the COPY command, so that file accessibility and privileges are those of the user, not the server – such that no SQL superuser privileges are required.

Construct SQL statements

Follow best practices to avoid accidental errors and SQL injection. The code samples below use the psycopg2 Python package.

  • Pass parameters to SQL queries, using the second argument to the execute method. This adapts the Python value’s type (like bool, int, str) to the correct SQL representation:

    cur.execute("SELECT * FROM data WHERE data->>'date' > %(date)s", {'date': '2020-01-01'})
    

    DO NOT use string interpolation (%):

    cur.execute("SELECT * FROM data WHERE data->>'date' > '%(date)s'" % {'date': '2020-01-01'})  # WRONG
    

    DO NOT use string concatenation (+):

    cur.execute("SELECT * FROM data WHERE data->>'date' > '" + '2020-01-01' + "'")  # WRONG
    

    AVOID using literal values:

    cur.execute("SELECT * FROM data WHERE data->>'date' > '2020-01-01'")  # AVOID
    

    For example, if you forget that dates are represented as strings in SQL, you might do the following, which evaluates 2020-12-31 to 1977, which will match everything in the database:

    cur.execute("SELECT * FROM data WHERE data->>'date' > 2020-12-31")  # BROKEN
    
  • Use named placeholders like %(collection_id)s. This allows you to use the same placeholder multiple times in the query, while only having to pass a single parameter, and to edit and re-order your query without re-ordering your parameters.

    cur.execute("""
        SELECT * FROM release WHERE collection_id = %(collection_id)s
        UNION
        SELECT * FROM record WHERE collection_id = %(collection_id)s AND ocid = %(ocid)s
    """, {'collection_id': 1, 'ocid': 'ocds-213czf-1'})
    

    AVOID use anonymous placeholders (%s):

    cur.execute("""
        SELECT * FROM release WHERE collection_id = %s
        UNION
        SELECT * FROM record WHERE collection_id = %s AND ocid = %s
    """, (1, 1, 'ocds-213czf-1'))  # AVOID
    
  • If you are writing a query template in which you want to substitute column names or table names, use the format method and the SQL and Identifier classes (documentation):

    from psycopg2.sql import SQL, Identifier
    
    cur.execute(SQL("SELECT * FROM {table}").format(table=Identifier('collection')))
    

    You can use this together with passing parameters:

    cur.execute(SQL("SELECT * FROM {table} WHERE id = %(id)s").format(
        table=Identifier('collection')), {'id': 1})
    

    Remember to format the SQL() object. DO NOT format the string itself:

    cur.execute(SQL("SELECT * FROM {table} WHERE id = %(id)s".format(
        table='collection'), {'id': 1})  # WRONG
    

    DO NOT use string interpolation (%):

    cur.execute("SELECT * FROM %s" % 'collection')  # WRONG
    

    DO NOT use string concatenation (+):

    cur.execute("SELECT * FROM " + 'collection')  # WRONG
    

    AVOID using anonymous placeholders:

    cur.execute(SQL("SELECT * FROM {}".format('collection'))  # AVOID
    

Paginate rows

Do not use LIMIT with OFFSET. OFFSET becomes more inefficient as its value increases. Instead, filter on the table’s primary key, which has near-constant performance. For example:

SELECT id, mycolumn
FROM mytable
WHERE
    id > %s
    AND myfilter = %s
ORDER BY id
LIMIT 1000

Format code

Format SQL files with pg_format, which has web and command-line interfaces.

Web

  1. Open https://sqlformat.darold.net

  2. Paste your SQL text

  3. Set Functions to Lower case

  4. Click Format my code

CLI

On macOS, using Homebrew, install it with:

brew install pgformatter

Then, change into the project’s directory and run, for example:

find . -name '*.sql' -exec pg_format -f 1 -o {} {} \;

Generate entity relationship diagram

  1. Install SchemaSpy

  2. Download the PostgreSQL JDBC Driver

  3. Rename the JAR files to schemaspy.jar and postgresql.jar

  4. Move the JAR files to a preferred location

Run SchemaSpy, using appropriate values for the -db (database name), -s (schema, optional), -u (user) and -p (password, optional) arguments:

java -jar schemaspy.jar -t pgsql -dp postgresql.jar -host localhost -db DATABASE -s SCHEMA -u USER -p PASSWORD -o schemaspy -norows

Use either the schemaspy/diagrams/summary/relationships.real.compact.png or schemaspy/diagrams/summary/relationships.real.large.png file and check the schemaspy/diagrams/orphans/ directory.

Reference