Performance

Software performance depends on many choices: language (like Rust versus Python), framework (like FastAPI versus Django), architecture (e.g. map-reduce), networking (e.g. batch requests), etc. Many of these choices are costly to change later (e.g. requiring a full rewrite).

Profiling

Use profiling to:

  • Identify slow dependencies, in case faster alternatives can be easily swapped in

  • Find major hotspots, like a loop that runs in exponential time instead of quadratic time

  • Find minor hotspots, if changing language, etc. is too costly

Once a hotspot is found, the solution might be to:

See also

  • Scalene for CPU, GPU and memory statistical profiling

  • Austin for CPU and memory statistical profiling, including running processes

  • psrecord to chart CPU and memory usage, including running processes

  • psutil

CPU

  • cProfile is a deterministic profiler, measures functions, and lacks support for threads. For example:

    cat packages.json | python -m cProfile -o code.prof ocdskit/__main__.py compile > /dev/null
    gprof2dot -f pstats code.prof | dot -Tpng -o output.png
    open output.png
    
  • py-spy is a statistical profiler, measures lines, and supports threads, subprocesses and native extensions. Its top subcommand can attach to a running process.

  • pprofile is a statistical profiler (and very slow deterministic profiler), measures lines, and supports threads and PyPy.

  • vmprof is a statistical profiler, measures functions or lines, and supports threads and PyPy (and is aware of JIT).

  • timeit is a deterministic profiler for code snippets.
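As a rough illustration of the two standard-library tools in the list above, the following sketch runs cProfile in-process (instead of via python -m cProfile) and then times the same function with timeit. The slow_sum workload and the profile_call helper are hypothetical names introduced only for this example; they are not part of any project mentioned here.

```python
import cProfile
import io
import pstats
import timeit


def slow_sum(n):
    """Hypothetical workload: sum of squares using a Python-level loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Write the report to a string instead of stdout, sorted by cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)

# timeit repeats the call many times, which smooths out noise in a single run.
elapsed = timeit.timeit(lambda: slow_sum(10_000), number=20)
print(f"20 runs of slow_sum(10_000): {elapsed:.4f}s")
```

The report lists functions, not lines, which matches cProfile's function-level granularity; a line-level view would need one of the other profilers listed.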

Other profilers

Memory

Tip

When profiling a Django project, ensure that DEBUG = False, for example by running the process with env DJANGO_ENV=production.

Memory profilers have two use cases: reducing memory consumption (like in data processing) and fixing memory leaks (like in long-running processes). Tools for reducing memory consumption typically measure peaks and draw flamegraphs; that said, they can also be used to find memory leaks, by generating work that leaks memory.

When evaluating memory usage in production, remember the differences between heap memory and resident memory. In particular, resident memory is not returned to the operating system immediately after objects are freed.
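To make the heap side of that distinction concrete, here is a minimal sketch using the standard-library tracemalloc module, which is not one of the tools named in this section. It reports memory allocated through Python's allocators, not the process's resident memory. The build_rows workload and the measure_peak helper are hypothetical names used only for illustration.

```python
import tracemalloc


def build_rows(n):
    """Hypothetical workload: materializes n small dicts in memory."""
    return [{"id": i, "name": f"row-{i}"} for i in range(n)]


def measure_peak(func, *args):
    """Return (result, current_bytes, peak_bytes) traced while func runs."""
    tracemalloc.start()
    try:
        result = func(*args)
        # current: bytes still allocated; peak: highest value since start().
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


rows, current, peak = measure_peak(build_rows, 50_000)
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```

Resident memory as seen by the operating system (for example, via psutil or psrecord) can stay higher than these numbers after objects are released, so the two measurements should not be expected to agree.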

Optimizations

  • Set __slots__ on classes or slots=True on dataclasses that are instantiated frequently.

    “The space saved over using __dict__ can be significant. Attribute lookup speed can be significantly improved as well.”

Reference