Bootstrap and init

platform v0.9.11, verified 2026-05-14

Once the instance is provisioned with the right bootstrap .env and a way to fetch the config bundle from S3, every service uses the same init.sh flow. The same flow runs on first boot and on every subsequent restart — it is idempotent.

The whole point of this flow is that secrets never land on disk. fetch-env.sh prints export VAR=value lines to stdout; init.sh evaluates the output into the shell, then docker compose up inherits the environment. There is no .env file in /opt/services/<role>/ — init.sh actively shreds any leftover one as a safety net.

What `init.sh` does, step by step

Every service ships its own init.sh (each one specialized for that role's needs), but they all follow the same outline. The lines below are taken from the API service's script; every other service is structurally similar.

1. Load the bootstrap environment

DEPLOYMENT_ENV="/opt/deployment/.env"
source "$DEPLOYMENT_ENV"

This brings ENVIRONMENT, HOSTNAME, NAMESPACE, ECR_REGISTRY, ECR_TAG, CONFIG_BUCKET, CONFIG_REF, AWS credentials, and any proxy variables into the shell. None of these are fetched from AWS at runtime; they were all placed on the host by cloud-init.
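Because every later step depends on these variables, a script can fail fast when one is missing. The following is a minimal sketch, not taken from the shipped init.sh; require_vars is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the shipped init.sh): fail fast when a
# bootstrap variable is unset or empty after sourcing the bootstrap .env.
require_vars() {
    local missing=0 name value
    for name in "$@"; do
        # Indirect lookup of the variable named by $name. The names come from
        # the calling script itself, so eval only ever sees trusted input.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing bootstrap variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `require_vars ENVIRONMENT NAMESPACE ECR_REGISTRY ECR_TAG CONFIG_BUCKET CONFIG_REF || exit 1` placed right after the source line would stop the run before any AWS call is attempted.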

2. Pull the latest config bundle from S3 (unless `--skip-fetch`)

if [ "$SKIP_FETCH" = false ] && command -v fetch-config &>/dev/null; then
    fetch-config
fi

fetch-config is a one-liner placed by cloud-init that runs aws s3 sync s3://${CONFIG_BUCKET}/deployments/<slug>/${CONFIG_REF}/<role>/ /opt/services/<role>/ plus the same for common/. After this step the host has the latest docker-compose.yaml, init.sh, vars.yaml, and any service-specific files (Caddyfile, otel-collector-config.yaml, …) for the configured CONFIG_REF.
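The sync described above could look roughly like the sketch below. DEPLOYMENT_SLUG and ROLE are hypothetical variable names standing in for the <slug> and <role> placeholders, and the function names are illustrative; the real fetch-config placed by cloud-init may be written differently.

```shell
# Sketch only: DEPLOYMENT_SLUG and ROLE are hypothetical names standing in for
# the <slug> and <role> placeholders; the real fetch-config may differ.
config_prefix() {
    # Build the S3 prefix for one directory of the bundle (a role or "common").
    printf 's3://%s/deployments/%s/%s/%s/\n' \
        "$CONFIG_BUCKET" "$DEPLOYMENT_SLUG" "$CONFIG_REF" "$1"
}

fetch_config_sketch() {
    # Sync the role directory first, then the shared common/ directory.
    aws s3 sync "$(config_prefix "$ROLE")" "/opt/services/$ROLE/" &&
    aws s3 sync "$(config_prefix common)" "/opt/services/common/"
}
```

Keeping the prefix construction in one function makes it easy to print the exact source path when a sync pulls an unexpected bundle.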

If you are running update.sh --config-ref <new-ref> to roll a config change, this is the step that pulls the new bundle.

3. Configure Docker log rotation (first run only)

if [ ! -f /etc/docker/daemon.json ] || ! grep -q max-size ...; then
    cat > /etc/docker/daemon.json <<DAEMONJSON
    { "log-driver": "json-file", "log-opts": { "max-size": "20m", "max-file": "3" } }
DAEMONJSON
    systemctl restart docker
fi

Docker logs ship to SigNoz over OTel; the local rotation just prevents disk fill if the OTel collector is briefly down.

4. Auto-detect `PRIVATE_IP` if not supplied

Some scripts (API, Voice) auto-detect PRIVATE_IP from the first 10.x interface and persist it to /opt/deployment/.env if it wasn't already set. This keeps the bootstrap .env minimal in cloud-init and lets the host figure out its own private IP at runtime.
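One way to implement this detection is sketched below. It is an assumption about the approach, not the code from the API or Voice scripts, and both function names are hypothetical.

```shell
# Sketch only: the shipped scripts may detect the address differently.
# Reads `ip -4 -o addr show` style lines on stdin and prints the first 10.x address.
first_10x_ip() {
    awk '{ split($4, a, "/"); if (a[1] ~ /^10\./) { print a[1]; exit } }'
}

persist_private_ip() {
    # $1 = env file, $2 = detected address; append only when the key is absent.
    grep -q '^PRIVATE_IP=' "$1" 2>/dev/null || printf 'PRIVATE_IP=%s\n' "$2" >> "$1"
}
```

On a host this would be wired up as `PRIVATE_IP="$(ip -4 -o addr show | first_10x_ip)"` followed by `persist_private_ip /opt/deployment/.env "$PRIVATE_IP"`. Because the append is skipped when the key already exists, repeated runs leave the file unchanged.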

5. Resolve env from SSM / Secrets Manager into memory

The single most important step:

COMMON_DIR="$(cd "$SERVICE_DIR/../common" && pwd)"
ENV_EXPORTS="$("$COMMON_DIR/fetch-env.sh" \
    --manifest "$SERVICE_DIR/vars.yaml" \
    --bootstrap "$DEPLOYMENT_ENV" \
    --format export)"
eval "$ENV_EXPORTS"

fetch-env.sh reads vars.yaml, fetches every declared variable from the right source (bootstrap, SSM, Secrets Manager (SM), external secret ARN, default, or compose.template), and prints export VAR=value lines to stdout. The eval lifts them into the shell that is about to run docker compose up.
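The capture-then-eval pattern can be tried in isolation with a stand-in for the resolver. In the sketch below, fake_fetch_env is a hypothetical stub, the values are made up, and the single-quoting of values is an assumption about how the export lines are emitted.

```shell
# Sketch only: fake_fetch_env is a hypothetical stand-in for fetch-env.sh.
# It prints shell-quoted export lines to stdout, mirroring the contract above.
fake_fetch_env() {
    printf "export DB_HOST='%s'\n" "10.0.1.20"
    printf "export DB_PASSWORD='%s'\n" "s3cr3t with spaces"
}

# Capture first, then eval: if resolution fails, nothing half-resolved is evaluated.
if ENV_EXPORTS="$(fake_fetch_env)"; then
    eval "$ENV_EXPORTS"
else
    echo "env resolution failed" >&2
    exit 1
fi
```

After the eval, any child process sees the values (for example `sh -c 'echo "$DB_HOST"'`), yet nothing was written to disk. Quoting matters here: an unquoted value containing spaces or shell metacharacters would be misread by eval.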

The full resolution order, namespace conventions, and gotchas are in Environment resolution.

6. Service-specific "magic"

This is the per-service variation. Each init.sh sandwiches one or more of these between step 5 and docker compose up:

Service	Magic
API	`prepare-tls.sh` decodes `INTERNAL_CA_CRT_B64` from SM into `./tls/ca.crt` (mounted into the container); falls back to an empty placeholder file when the variable is not provided.
Web	Caddy auto-issues Let's Encrypt for `DOMAIN_TELWEB`. TelWeb entrypoint: Prisma `migrate deploy` then baseline `prisma db seed`; operators run `docker compose exec -it voiceai-telweb delphi-setup` after the container is healthy — see First use.
Voice	Copies `log-to-span` binary into the right path; `prepare-tls.sh` for the optional Postgres/Redis CA bundle.
TelPro	Maps `LOG_LEVEL` → numeric `RTP_LOG_LEVEL` / `DEBUG_LEVEL`; resolves TLS via Certbot, SM-decoded PEM, manual files, or self-signed (in priority order).
Database	`prepare-tls-server.sh` decodes `INTERNAL_TLS_CERT_B64` / `INTERNAL_TLS_KEY_B64` / `INTERNAL_CA_CRT_B64` from SM, sets ownership for postgres + pgbouncer.
Media	TLS resolution in three branches (`MEDIA_TLS_*_B64` env-PEM → existing files → self-signed); writes `tls/ca-for-clients.pem` for downstream consumers to trust.
SigNoz	Clones the upstream SigNoz repo; converts named volumes to bind mounts on `/mnt/signoz-data`; configures redsocks-based transparent proxy via Squid.
Ops	Picks SMTP vs SES email transport from `EMAIL_TRANSPORT`; static AWS keys must be unset when `AWS_USE_INSTANCE_PROFILE=true`.
Squid	None — the container is configured entirely via `squid.conf`.
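As an illustration of the API row, the decode-or-placeholder behaviour might be written as below. This is a sketch under assumptions: write_ca_bundle is a hypothetical name, and the real prepare-tls.sh may handle permissions and errors differently.

```shell
# Sketch only: the real prepare-tls.sh may differ. Decodes a base64 PEM from an
# environment variable into a file, or leaves an empty placeholder so the bind
# mount in docker-compose.yaml still has a file to attach.
write_ca_bundle() {
    # $1 = base64-encoded PEM (may be empty), $2 = destination path
    mkdir -p "$(dirname "$2")"
    if [ -n "$1" ]; then
        printf '%s' "$1" | base64 -d > "$2" || return 1
    else
        : > "$2"
    fi
    chmod 644 "$2"
}
```

It would be called as `write_ca_bundle "${INTERNAL_CA_CRT_B64:-}" ./tls/ca.crt`. The placeholder matters because Docker creates a directory at a bind-mount source path that does not exist, which would break a file mount.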

7. Log into the registry and pull images (unless `--skip-images`)

aws ecr get-login-password --region "${AWS_REGION:-eu-central-1}" | \
    docker login --username AWS --password-stdin "${ECR_REGISTRY}"
docker compose pull

8. Safety: shred any leftover `.env`

if [ -f "$SERVICE_DIR/.env" ]; then
    shred -u "$SERVICE_DIR/.env" 2>/dev/null || rm -f "$SERVICE_DIR/.env"
fi

This catches the case where an operator accidentally left a .env from manual debugging — the live secrets the containers are about to use are in the parent shell's environment, not on disk.

9. Start Compose

docker compose up -d --force-recreate --remove-orphans

The current shell (with all the eval'd exports) is what docker compose inherits. Compose then injects each variable into the right container according to the environment: entries and ${VAR} substitutions in docker-compose.yaml.

10. Healthcheck loop and image pruning

Each service waits for its primary container to report ready (/health/live, pg_isready, redis-cli ping, etc.), then prunes dangling and unused Docker images so disk usage stays bounded.
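A bounded retry loop is one common way to implement the readiness wait. The sketch below assumes a hypothetical helper named wait_until; the attempt counts and probes used by the shipped scripts are not shown in this page.

```shell
# Sketch only: wait_until is a hypothetical helper, not the shipped loop.
# Retries a probe command until it succeeds or the attempts run out.
wait_until() {
    # $1 = max attempts, $2 = seconds between attempts, rest = probe command
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For the Database host this could be `wait_until 30 2 docker exec voiceai-postgres pg_isready -U voiceai`, followed by a prune such as `docker image prune -af` once the probe succeeds. Bounding the attempts keeps a broken container from hanging the script indefinitely.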

Same flow, every restart

The same init.sh runs on every restart. Use the standard flags to skip steps that don't apply:

--restart-only — skip fetch-config and image pulls; just re-resolve env and docker compose up. Use this after a value changes in SSM / Secrets Manager.
--skip-images — fetch-config runs but no docker pull; useful when iterating on a docker-compose.yaml change in the bundle without rolling images.
--skip-fetch — don't sync from S3; useful if you're testing local edits to /opt/services/<role>/ (be careful — the next update.sh will overwrite them).
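The three flags reduce to two booleans. The sketch below shows one plausible mapping; SKIP_FETCH appears in the script excerpt above, while SKIP_IMAGES and parse_flags are assumed names.

```shell
# Sketch only: SKIP_IMAGES and parse_flags are assumed names.
SKIP_FETCH=false
SKIP_IMAGES=false

parse_flags() {
    for arg in "$@"; do
        case "$arg" in
            --restart-only) SKIP_FETCH=true; SKIP_IMAGES=true ;;
            --skip-images)  SKIP_IMAGES=true ;;
            --skip-fetch)   SKIP_FETCH=true ;;
            *) echo "unknown flag: $arg" >&2; return 2 ;;
        esac
    done
}
```

Under this mapping, `--restart-only` behaves the same as passing `--skip-fetch --skip-images` together, and environment resolution still runs in every case.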

The deployment-manager script update.sh (shared across all services) is the recommended entry point in steady state — it updates /opt/deployment/.env (notably ECR_TAG / CONFIG_REF), prompts for confirmation, takes a backup, and then calls init.sh. Full rundown in init.sh and update.sh.

Smoke test the deployment

Once every service is up:

Check	How
`docker compose ps` clean on every host	SSH via Bastion to each host; expect every container `Up` and any healthchecks `healthy`.
Postgres reachable	From the Database host: `docker exec -it voiceai-postgres pg_isready -U voiceai`.
Redis reachable	From the Database host: `docker exec -it voiceai-redis redis-cli -a "$REDIS_PASSWORD" PING`.
API healthy through the LB	`curl https://${DOMAIN_API}/health` should return `200`.
Dashboard loads	Browse to `https://${DOMAIN_TELWEB}`; first load may take 30–40s while migrations + baseline seed run.
SigNoz UI loads	Browse to `https://${DOMAIN_SIGNOZ}`; you should see traces from every service.
OTel pipeline	On any service host: `curl http://10.0.1.10:4318/v1/traces` (4xx is fine — it proves reachability).
First call (with a SIP carrier)	Place a call to a DID routed to TelPro. Check `voiceai-telphi` logs for an OpenAI realtime session opening.
First call (WebRTC)	Open the WebRTC phone in TelWeb's call dialer; the connection should succeed in one click.
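The HTTP checks in the table can be scripted. The sketch below uses two hypothetical helpers, http_status and is_reachable. Treating 2xx through 4xx as "reachable" follows the OTel row; treating 5xx as a failure is a choice made for this sketch.

```shell
# Sketch only: http_status and is_reachable are hypothetical helpers.
http_status() {
    # Prints the HTTP status code, or 000 when no connection could be made.
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

is_reachable() {
    # 2xx-4xx means something answered; 000 (no response) and 5xx count as failures.
    case "$1" in
        [2-4][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_reachable "$(http_status http://10.0.1.10:4318/v1/traces)" && echo "OTel collector reachable"` reproduces the OTel pipeline check. The API check is stricter, because that endpoint should return exactly 200.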

If any step fails, the per-service operations pages have a Troubleshooting tab keyed on the symptom you are seeing.

When TelWeb is Up (healthy), continue to First use on TelWeb (delphi-setup). After that, day-2 ops live in the per-service operations pages and the configuration section.
