Recovery recipes

platform v0.9.11, verified 2026-05-14

These recipes are written for platform operators of a Delphi deployment. They assume you have:

  • SSH access to the service hosts, with the ability to run docker compose on each.
  • IAM permission to read/write the relevant AWS SSM parameters and Secrets Manager secrets (or your equivalent secret store).
  • A backup strategy already in place for Postgres and configuration. Delphi does not enforce a specific backup tool; verify yours before you need it.

Stop-and-ask points

Every recipe has explicit STOP markers. They mark steps that can lose data or extend the outage if you skip the surrounding check. Treat them as hard gates, not suggestions.
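
If you script any part of a recipe, one way to keep a STOP marker a hard gate is to make the script refuse to continue without a typed confirmation. The following is a minimal sketch only: `stop_gate` is a hypothetical helper, not something Delphi or update.sh provides, and the confirmation word is an arbitrary choice.

```shell
# Hypothetical helper, not part of Delphi.
# Prints the check to confirm, then waits for the operator to type the
# exact word "continue". Any other input, or empty input, aborts.
stop_gate() {
  local check="$1"
  local answer=""
  printf 'STOP: %s\nType "continue" to proceed: ' "$check" >&2
  # read fails on end-of-input; keep going so the comparison below decides.
  read -r answer || true
  if [ "$answer" != "continue" ]; then
    echo "Aborted at STOP gate." >&2
    return 1
  fi
}
```

A script would call it as `stop_gate "Backup verified?" || exit 1` immediately before the step the marker guards. Requiring a full word rather than a single keypress makes it harder to pass the gate by reflex.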

Recipes in this section

  • Redeploy a service — re-run update.sh cleanly, including the case where a migration partly succeeded.
  • Rotate secrets — change a value in Secrets Manager (or equivalent), get every container to re-read it, with the right restart order.
  • Restore Postgres — bring back the platform database from a backup. The most disruptive recipe — read the whole page before starting.
  • Replay stuck jobs — when the Ops service is healthy but a queue stalled.
  • Regain superuser access — when no human can log in as superuser anymore.

Before any recipe

Capture a baseline so you can compare the state before and after the recipe:

docker compose ps
docker compose images
df -h

From SigNoz, also snapshot the error rate per service for the affected window. The first thing you'll want to know after the fix is whether the error rate actually dropped, and that needs a "before" number to compare against.
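
The command-line part of the baseline is easier to compare later if it is saved to files rather than left in terminal scrollback. A minimal sketch, assuming you run it from the directory that holds the compose file; `run_and_save`, `capture_baseline`, and the output file names are hypothetical, not part of Delphi.

```shell
# Hypothetical helpers, not part of Delphi.
# run_and_save FILE CMD [ARGS...]: run CMD and store its output in FILE.
run_and_save() {
  local out_file="$1"
  shift
  # Keep stderr too, so a failing command leaves its error in the file.
  if ! "$@" > "$out_file" 2>&1; then
    echo "warning: '$*' failed; see $out_file" >&2
  fi
}

# capture_baseline DIR: write the three baseline commands' output under DIR.
capture_baseline() {
  local out_dir="$1"
  mkdir -p "$out_dir" || return 1
  run_and_save "$out_dir/compose-ps.txt" docker compose ps
  run_and_save "$out_dir/compose-images.txt" docker compose images
  run_and_save "$out_dir/df.txt" df -h
  echo "baseline written to $out_dir"
}
```

For example, `capture_baseline "baseline-$(date -u +%Y%m%dT%H%M%SZ)"` writes a timestamped directory. Running it again after the recipe gives a second directory that you can compare against the first with `diff -r`. A command that fails is reported as a warning instead of stopping the capture, so one unhealthy service does not cost you the rest of the baseline.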