
Bootstrap and init

platform v0.9.11, verified 2026-05-14

Once the instance is provisioned with the right bootstrap .env and a way to fetch the config bundle from S3, every service uses the same init.sh flow. The same flow runs on first boot and on every subsequent restart — it is idempotent.

The whole point of this flow is that secrets never land on disk. fetch-env.sh prints export VAR=value lines to stdout; init.sh evaluates that output into the shell, and docker compose up then inherits the environment. There is no .env file in /opt/services/<role>/, and init.sh actively shreds any leftover one as a safety net.

What init.sh does, step by step

Every service ships its own init.sh (each one specialized for that role's needs), but they all follow the same outline. The lines below are taken from the API service's script; every other service is structurally similar.

1. Load the bootstrap environment

DEPLOYMENT_ENV="/opt/deployment/.env"
source "$DEPLOYMENT_ENV"

This brings ENVIRONMENT, HOSTNAME, NAMESPACE, ECR_REGISTRY, ECR_TAG, CONFIG_BUCKET, CONFIG_REF, AWS credentials, and any proxy variables into the shell. None of these are fetched from AWS at this point; cloud-init wrote them into the file at provisioning time.
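Because the file is simply sourced, a missing key only shows up later as an empty expansion (for example an S3 path with a blank CONFIG_REF). A minimal fail-fast guard, assuming it is acceptable to abort right after the source line, can check the required names with Bash indirect expansion. The helper name require_vars is hypothetical; the variable names come from the paragraph above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail fast if any named bootstrap variable is unset or empty.
# ${!name:-} is Bash indirect expansion: it reads the variable whose name is in $name.
require_vars() {
  local name
  local missing=()
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "bootstrap .env is missing: ${missing[*]}" >&2
    return 1
  fi
}

# Example, right after `source "$DEPLOYMENT_ENV"`:
#   require_vars ENVIRONMENT NAMESPACE ECR_REGISTRY ECR_TAG CONFIG_BUCKET CONFIG_REF || exit 1
```

HOSTNAME is left out of the example list because Bash always sets it, and AWS credentials are left out because a host may rely on an instance profile instead.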

2. Pull the latest config bundle from S3 (unless --skip-fetch)

if [ "$SKIP_FETCH" = false ] && command -v fetch-config &>/dev/null; then
  fetch-config
fi

fetch-config is a one-liner placed by cloud-init that runs aws s3 sync s3://${CONFIG_BUCKET}/deployments/<slug>/${CONFIG_REF}/<role>/ /opt/services/<role>/ plus the same for common/. After this step the host has the latest docker-compose.yaml, init.sh, vars.yaml, and any service-specific files (Caddyfile, otel-collector-config.yaml, …) for the configured CONFIG_REF.
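The real fetch-config is written by cloud-init and is not reproduced here. As a rough sketch of what it does, assuming the shared bundle lives at the same prefix under common/ and lands in /opt/services/common/ (the directory init.sh later resolves as "$SERVICE_DIR/../common"), the two syncs look like this. The function names bundle_uri and fetch_config_sketch are hypothetical, and slug and role stand in for the <slug> and <role> placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction of fetch-config, not the file cloud-init writes.

# Print the S3 prefix for one bundle directory (a role name or "common").
bundle_uri() {
  local bucket="$1" slug="$2" ref="$3" dir="$4"
  printf 's3://%s/deployments/%s/%s/%s/\n' "$bucket" "$slug" "$ref" "$dir"
}

# Sync the role bundle, then the shared common/ bundle, into /opt/services/.
# Relies on CONFIG_BUCKET and CONFIG_REF from the bootstrap .env.
fetch_config_sketch() {
  local slug="$1" role="$2" dir
  for dir in "$role" common; do
    aws s3 sync "$(bundle_uri "$CONFIG_BUCKET" "$slug" "$CONFIG_REF" "$dir")" \
      "/opt/services/$dir/" || return 1
  done
}
```

Changing CONFIG_REF changes only the third path segment, which is why rolling a config change is just a matter of updating CONFIG_REF and re-running the sync.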

If you are running update.sh --config-ref <new-ref> to roll a config change, this is the step that pulls the new bundle.

3. Configure Docker log rotation (first run only)

if [ ! -f /etc/docker/daemon.json ] || ! grep -q max-size ...; then
cat > /etc/docker/daemon.json <<DAEMONJSON
{ "log-driver": "json-file", "log-opts": { "max-size": "20m", "max-file": "3" } }
DAEMONJSON
systemctl restart docker
fi

Docker logs ship to SigNoz over OTel; the local rotation only caps on-disk logs at three 20 MB files per container, so the disk cannot fill up if the OTel collector is briefly down.

4. Auto-detect PRIVATE_IP if not supplied

Some scripts (API, Voice) auto-detect PRIVATE_IP from the first 10.x interface and persist it to /opt/deployment/.env if it wasn't already set. This keeps the bootstrap .env minimal in cloud-init and lets the host figure out its own private IP at runtime.
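A minimal sketch of that detection, assuming the script parses the one-line-per-address output of `ip -4 -o addr show` (the exact command the API and Voice scripts use may differ). The helper names first_private_ip and persist_private_ip are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of PRIVATE_IP auto-detection.

# Read `ip -4 -o addr show` output on stdin; print the first 10.x IPv4 address.
# In that format field 3 is "inet" and field 4 is "ADDRESS/PREFIX".
first_private_ip() {
  awk '$3 == "inet" && $4 ~ /^10\./ { split($4, a, "/"); print a[1]; exit }'
}

# Append PRIVATE_IP to the env file only if the file does not already define it.
persist_private_ip() {
  local env_file="$1" ip="$2"
  if ! grep -q '^PRIVATE_IP=' "$env_file" 2>/dev/null; then
    printf 'PRIVATE_IP=%s\n' "$ip" >> "$env_file"
  fi
}

# Example wiring inside init.sh:
#   if [ -z "${PRIVATE_IP:-}" ]; then
#     PRIVATE_IP="$(ip -4 -o addr show | first_private_ip)"
#     persist_private_ip /opt/deployment/.env "$PRIVATE_IP"
#   fi
```

Persisting only when the key is absent keeps the step idempotent: later runs find PRIVATE_IP already set and leave the file alone.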

5. Resolve env from SSM / Secrets Manager into memory

The single most important step:

COMMON_DIR="$(cd "$SERVICE_DIR/../common" && pwd)"
ENV_EXPORTS="$("$COMMON_DIR/fetch-env.sh" \
--manifest "$SERVICE_DIR/vars.yaml" \
--bootstrap "$DEPLOYMENT_ENV" \
--format export)"
eval "$ENV_EXPORTS"

fetch-env.sh reads vars.yaml, fetches every declared variable from the right source (bootstrap, SSM, SM, external secret ARN, default, or compose.template), and prints export VAR=value lines to stdout. The eval lifts them into the shell that's about to run docker compose up.
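Since init.sh feeds that output straight into eval, every value has to be shell-quoted on the way out; otherwise a secret containing spaces, quotes, or `$` would be split or re-expanded. A minimal sketch of the emitting side, assuming the emitter can use Bash's `printf '%q'` (the function name emit_export is hypothetical, not part of fetch-env.sh's documented interface):

```bash
#!/usr/bin/env bash
# Hypothetical emitter: print one `export NAME=value` line that eval can read back safely.
# printf %q escapes spaces, quotes, $ and other shell metacharacters in the value.
emit_export() {
  local name="$1" value="$2"
  # Refuse names that are not valid shell identifiers rather than emitting them.
  if [[ ! "$name" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
    echo "invalid variable name: $name" >&2
    return 1
  fi
  printf 'export %s=%q\n' "$name" "$value"
}

# Consumer side, as in step 5:
#   ENV_EXPORTS="$(emit_export DB_PASSWORD 'p@ss word$1')"
#   eval "$ENV_EXPORTS"
```

With %q the round trip through eval reproduces the original value byte for byte, which is what makes the eval in step 5 safe.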

The full resolution order, namespace conventions, and gotchas are in Environment resolution.

6. Service-specific "magic"

This is the per-service variation. Each init.sh sandwiches one or more of these between step 5 and docker compose up:

| Service | Magic |
|---|---|
| API | prepare-tls.sh decodes INTERNAL_CA_CRT_B64 from SM into ./tls/ca.crt (mounted into the container); falls back to an empty placeholder file when the value is not provided. |
| Web | Caddy auto-issues a Let's Encrypt certificate for DOMAIN_TELWEB. The TelWeb entrypoint runs Prisma migrate deploy, then the baseline prisma db seed. Operators run docker compose exec -it voiceai-telweb delphi-setup once the container is healthy; see First use. |
| Voice | Copies the log-to-span binary into the right path; runs prepare-tls.sh for the optional Postgres/Redis CA bundle. |
| TelPro | Maps LOG_LEVEL → numeric RTP_LOG_LEVEL / DEBUG_LEVEL; resolves TLS via Certbot, an SM-decoded PEM, manual files, or a self-signed certificate (in that priority order). |
| Database | prepare-tls-server.sh decodes INTERNAL_TLS_CERT_B64 / INTERNAL_TLS_KEY_B64 / INTERNAL_CA_CRT_B64 from SM and sets ownership for postgres + pgbouncer. |
| Media | TLS resolution in three branches (MEDIA_TLS_*_B64 env PEM → existing files → self-signed); writes tls/ca-for-clients.pem for downstream consumers to trust. |
| SigNoz | Clones the upstream SigNoz repo; converts named volumes to bind mounts on /mnt/signoz-data; configures a redsocks-based transparent proxy via Squid. |
| Ops | Picks SMTP or SES email transport from EMAIL_TRANSPORT; static AWS keys must be unset when AWS_USE_INSTANCE_PROFILE=true. |
| Squid | None; the container is configured entirely via squid.conf. |
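The API row's decode-or-placeholder behavior can be sketched as follows. This is an illustration under assumptions, not the contents of prepare-tls.sh; the helper name decode_pem_or_placeholder is hypothetical. The placeholder matters because when the host path of a bind mount is missing, Docker may create a directory in its place, and the container would then find a directory where it expects ca.crt.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the decode-or-placeholder step (not the real prepare-tls.sh).
# Decode a base64 PEM into a file, or create an empty placeholder so the
# bind mount declared in docker-compose.yaml always points at a regular file.
decode_pem_or_placeholder() {
  local b64="$1" dest="$2"
  mkdir -p "$(dirname "$dest")" || return 1
  if [ -n "$b64" ]; then
    printf '%s' "$b64" | base64 -d > "$dest" || return 1
  else
    : > "$dest"   # empty file, no certificate
  fi
  # A CA certificate is public; 644 lets a non-root container user read the mount.
  chmod 644 "$dest"
}

# Example (API host):
#   decode_pem_or_placeholder "${INTERNAL_CA_CRT_B64:-}" ./tls/ca.crt
```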

7. Log into the registry and pull images (unless --skip-images)

aws ecr get-login-password --region "${AWS_REGION:-eu-central-1}" | \
docker login --username AWS --password-stdin "${ECR_REGISTRY}"
docker compose pull

8. Safety: shred any leftover .env

if [ -f "$SERVICE_DIR/.env" ]; then
  shred -u "$SERVICE_DIR/.env" 2>/dev/null || rm -f "$SERVICE_DIR/.env"
fi

This catches the case where an operator accidentally left a .env from manual debugging — the live secrets the containers are about to use are in the parent shell's environment, not on disk.

9. Start Compose

docker compose up -d --force-recreate --remove-orphans

The current shell (with all the eval'd exports) is what docker compose inherits. Compose then injects each variable into the right container according to the environment: entries and ${VAR} substitutions in docker-compose.yaml.
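This is why fetch-env.sh emits export lines rather than plain assignments: only exported variables reach child processes such as docker compose. A quick way to see the difference, using bash -c as a stand-in for the compose process (the variable names are made up for the demo):

```bash
#!/usr/bin/env bash
# Show which variables a child process can see. bash -c stands in for
# `docker compose up`; PLAIN_VAR and EXPORTED_VAR are made-up demo names.
child_sees() {
  # The child receives the variable name as $1 and expands ${!1} itself.
  bash -c 'printf "%s\n" "${!1:-<unset>}"' child "$1"
}

demo_inheritance() {
  PLAIN_VAR=only-in-this-shell              # plain assignment: not exported
  export EXPORTED_VAR=visible-to-children   # what each fetch-env.sh line does
  echo "PLAIN_VAR    -> $(child_sees PLAIN_VAR)"
  echo "EXPORTED_VAR -> $(child_sees EXPORTED_VAR)"
}

# In a fresh shell, demo_inheritance prints:
#   PLAIN_VAR    -> <unset>
#   EXPORTED_VAR -> visible-to-children
```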

10. Healthcheck loop and image pruning

Each service waits for its primary container to report ready (/health/live, pg_isready, redis-cli ping, etc.), then prunes dangling and unused Docker images so disk usage stays bounded.
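A minimal sketch of such a wait loop, assuming a simple poll-until-deadline design; the helper name wait_until and the timeout values are hypothetical, and each real init.sh has its own probe. The example probe reuses the Database host's container name and user from the smoke-test table.

```bash
#!/usr/bin/env bash
# Hypothetical retry helper for the post-start readiness check.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command...>
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))   # SECONDS is Bash's elapsed-time counter
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example (Database host): prune only after Postgres reports ready.
#   wait_until 60 2 docker exec voiceai-postgres pg_isready -U voiceai \
#     && docker image prune -af
```

Gating the prune on readiness means a failed start leaves the previous images on disk, which keeps a rollback cheap.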

Same flow, every restart

The same init.sh runs on every restart. Use the standard flags to skip steps that don't apply:

  • --restart-only: skip fetch-config and image pulls; just re-resolve env and run docker compose up. Use this after a value changes in SSM / Secrets Manager.
  • --skip-images: fetch-config runs but docker compose pull does not; useful when iterating on a docker-compose.yaml change in the bundle without rolling images.
  • --skip-fetch: don't sync from S3; useful if you're testing local edits to /opt/services/<role>/ (careful: the next update.sh run will overwrite them).
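One way these flags could map onto the step toggles, as a sketch: SKIP_FETCH matches the variable step 2 checks, while SKIP_IMAGES and the function name parse_init_flags are assumptions, not names confirmed by the scripts.

```bash
#!/usr/bin/env bash
# Hypothetical flag parser for init.sh. SKIP_FETCH matches step 2's check;
# SKIP_IMAGES is an assumed name for the toggle that guards step 7.
parse_init_flags() {
  SKIP_FETCH=false
  SKIP_IMAGES=false
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --restart-only) SKIP_FETCH=true; SKIP_IMAGES=true ;;
      --skip-images)  SKIP_IMAGES=true ;;
      --skip-fetch)   SKIP_FETCH=true ;;
      *) echo "unknown flag: $1" >&2; return 2 ;;
    esac
    shift
  done
}

# In init.sh:
#   parse_init_flags "$@" || exit
```

Resetting both toggles at the top keeps the parse deterministic: running with no flags always means the full flow.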

The deployment-manager script update.sh (shared across all services) is the recommended entry point in steady state — it updates /opt/deployment/.env (notably ECR_TAG / CONFIG_REF), prompts for confirmation, takes a backup, and then calls init.sh. Full rundown in init.sh and update.sh.
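The .env update that update.sh performs (for example bumping ECR_TAG or CONFIG_REF) boils down to "replace the key if present, else append it". A minimal sketch of that operation, assuming a temp-file-plus-rename approach; set_env_key is a hypothetical name and this is not the actual update.sh code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# KEY= line in place or appending one if the key is absent.
set_env_key() {
  local file="$1" key="$2" value="$3" tmp
  # Temp file next to the target so the final mv is a same-filesystem rename.
  # mktemp creates it with mode 0600, which suits a file holding credentials.
  tmp="$(mktemp "${file}.XXXXXX")" || return 1
  # Pass key/value via ENVIRON to avoid awk -v escape processing of backslashes.
  if ! K="$key" V="$value" awk '
      index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
      { print }
      END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
    ' "$file" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  mv "$tmp" "$file"
}

# Example:
#   set_env_key /opt/deployment/.env CONFIG_REF "$NEW_REF"
```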

Smoke test the deployment

Once every service is up:

| Check | How |
|---|---|
| docker compose ps clean on every host | SSH via the Bastion to each host; expect every container Up and every healthcheck healthy. |
| Postgres reachable | From the Database host: docker exec -it voiceai-postgres pg_isready -U voiceai. |
| Redis reachable | From the Database host: docker exec -it voiceai-redis redis-cli -a "$REDIS_PASSWORD" PING. |
| API healthy through the LB | curl https://${DOMAIN_API}/health should return 200. |
| Dashboard loads | Browse to https://${DOMAIN_TELWEB}; the first load may take 30–40 s while migrations and the baseline seed run. |
| SigNoz UI loads | Browse to https://${DOMAIN_SIGNOZ}; you should see traces from every service. |
| OTel pipeline | On any service host: curl http://10.0.1.10:4318/v1/traces (a 4xx response is fine; it proves reachability). |
| First call (with a SIP carrier) | Place a call to a DID routed to TelPro; check the voiceai-telphi logs for an OpenAI realtime session opening. |
| First call (WebRTC) | Open the WebRTC phone in TelWeb's call dialer; the connection should succeed in one click. |
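The two curl checks in the table can be scripted. A minimal sketch, assuming curl's `-w '%{http_code}'` output (curl prints 000 when no HTTP response arrives); the helper names http_code and otel_code_ok are hypothetical. Treating any status below 500 as reachable follows the table's "4xx is fine" rule; flagging 5xx is a design choice of this sketch, since a 5xx points at a collector-side problem rather than a network one.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test helpers.

# Print only the HTTP status code; curl prints 000 if no response arrived.
http_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Succeed for 2xx/3xx/4xx; fail for 000 (no connection), 5xx, or garbage.
otel_code_ok() {
  local code="$1"
  [[ "$code" =~ ^[0-9]{3}$ ]] && (( 10#$code >= 200 && 10#$code < 500 ))
}

# Examples:
#   otel_code_ok "$(http_code http://10.0.1.10:4318/v1/traces)" && echo "OTel reachable"
#   [ "$(http_code "https://${DOMAIN_API}/health")" = 200 ] && echo "API healthy"
```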

If any step fails, the per-service operations pages have a Troubleshooting tab keyed on the symptom you are seeing.

When TelWeb is Up (healthy), continue to First use on TelWeb (delphi-setup). After that, day-2 ops live in the per-service operations pages and the configuration section.