Version: 0.9.12

Provision instances

platform v0.9.11, verified 2026-05-14

Step 3 creates the actual cloud instances. You can use any cloud and any provisioning tool; the public contract is that every instance comes up with:

  • Network connectivity to the private subnet, AWS APIs (via Squid for private hosts), and the container registry.
  • A populated /opt/deployment/.env (the bootstrap environment).
  • The deployment bundle synced from S3 to /opt/services/<role>/ (or just enough to bootstrap a fetch-config helper that does the sync on the first init.sh run).
  • Docker Engine and Docker Compose installed.
  • IAM credentials with the grants from Prerequisites.

What every instance needs

Regardless of provider, every service instance must have these things in place when init.sh runs:

Bootstrap .env at /opt/deployment/.env

Cloud-init writes this file. It contains only infrastructure-level values — not application config. Application config comes from SSM / Secrets Manager later.

# /opt/deployment/.env
ENVIRONMENT=staging
HOSTNAME=staging-api-01
SERVER_ROLE=api

# Container registry
ECR_REGISTRY=123456.dkr.ecr.eu-central-1.amazonaws.com
ECR_TAG=v0.9.11

# Config bundle source
CONFIG_BUCKET=ki-kombinat-delphi-configs
CONFIG_REF=v0.9.11

# AWS access
NAMESPACE=voiceai/staging
AWS_REGION=eu-central-1
# Static keys; leave both unset to rely on an instance profile / IRSA
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...

# Optional
HTTP_PROXY=http://10.0.1.5:3128
HTTPS_PROXY=http://10.0.1.5:3128
NO_PROXY=localhost,127.0.0.1,10.0.0.0/8

# Optional (Database role only, when using managed RDS)
DB_MASTER_SECRET_ARN=arn:aws:secretsmanager:eu-central-1:123456:secret:rds!cluster-xyz-...

Why these are bootstrap and not in SSM: they are needed before the instance can talk to AWS at all (AWS_REGION, credentials), or they describe this instance rather than the deployment (HOSTNAME, SERVER_ROLE). The corresponding vars.yaml entries are flagged source: local so fetch-env.sh knows to read them from this file rather than from AWS.

For the full bootstrap-vs-SSM-vs-SM decision, see vars.yaml schema.
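
Because init.sh depends on these values, it can help to verify the file before the first run. The following is a minimal sketch, not part of the reference tooling: the function name missing_bootstrap_keys and the list of required keys are assumptions, with the list taken from the entries in the example above that are not marked optional.

```shell
#!/usr/bin/env bash
# Assumed required keys: everything in the example that is not marked optional.
REQUIRED_BOOTSTRAP_KEYS="ENVIRONMENT HOSTNAME SERVER_ROLE ECR_REGISTRY ECR_TAG CONFIG_BUCKET CONFIG_REF NAMESPACE AWS_REGION"

# Print each required key that has no non-empty KEY=value line in the file.
# Returns 0 when every key is present, 1 otherwise.
missing_bootstrap_keys() {
  local env_file="$1" key status=0
  for key in $REQUIRED_BOOTSTRAP_KEYS; do
    # Anchoring at ^ skips commented-out lines such as "# NAMESPACE=...".
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}
```

A provisioning script could call `missing_bootstrap_keys /opt/deployment/.env` as its last step and fail the boot if anything is printed.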

Tooling on the host

init.sh and fetch-env.sh need:

  • bash, curl, python3, jq, yq (the script aborts without them).
  • Docker Engine + Compose plugin.
  • AWS CLI v2.
  • A fetch-config helper that runs aws s3 sync s3://${CONFIG_BUCKET}/deployments/<slug>/${CONFIG_REF}/${SERVER_ROLE}/ /opt/services/${SERVER_ROLE}/ (and common/ to /opt/services/common/). The reference cloud-init drops a one-liner fetch-config script into /usr/local/bin/.
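
The tooling requirements above can be sketched as a preflight check plus a small fetch-config helper. This is an illustration under stated assumptions, not the reference one-liner: missing_tools, sync_dir, fetch_config, and the DRY_RUN switch are hypothetical names, DEPLOYMENT_SLUG is a hypothetical variable standing in for <slug> (the bootstrap .env shown earlier does not define it), and common/ is assumed to sit next to the role directory under the same ref.

```shell
#!/usr/bin/env bash
# Print every named tool that is not on PATH; return 1 if any is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Sync one S3 prefix to a local directory. DRY_RUN=1 prints the command instead.
sync_dir() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "aws s3 sync $1 $2"
  else
    mkdir -p "$2" && aws s3 sync "$1" "$2"
  fi
}

# Pull the role bundle and the shared common/ bundle.
# DEPLOYMENT_SLUG is a hypothetical variable standing in for <slug>.
fetch_config() {
  : "${CONFIG_BUCKET:?}" "${DEPLOYMENT_SLUG:?}" "${CONFIG_REF:?}" "${SERVER_ROLE:?}"
  local base="s3://${CONFIG_BUCKET}/deployments/${DEPLOYMENT_SLUG}/${CONFIG_REF}"
  sync_dir "${base}/${SERVER_ROLE}/" "/opt/services/${SERVER_ROLE}/" &&
    sync_dir "${base}/common/" "/opt/services/common/"
}
```

Running `missing_tools bash curl python3 jq yq docker aws` lists whatever still needs installing, and `DRY_RUN=1 fetch_config` shows the two sync commands without touching S3.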

IAM access path

Instances reach AWS via either static keys in the bootstrap .env (simple, what the Hetzner reference uses) or an instance profile (recommended on AWS EC2 — no static credentials on disk).

For instance-profile-based instances, leave AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY unset in the bootstrap .env. The AWS SDK default chain prefers static keys when present and quietly bypasses IMDS, which is the most common cause of CredentialsProviderError on EC2 deployments.

Containers running with network_mode: bridge need --http-put-response-hop-limit 2 on the EC2 instance metadata options so the Docker bridge hop doesn't cut off IMDS. See the Ops operations page for the full IMDS / SES checklist.
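
Both points can be checked from a shell. The sketch below is an illustration only: credential_mode and set_imds_hop_limit are hypothetical helper names. The first reports which credential path a bootstrap .env selects, and the second wraps the AWS CLI call that raises the hop limit. That call needs credentials allowed to perform ec2:ModifyInstanceMetadataOptions.

```shell
#!/usr/bin/env bash
# Print "static" when the file sets a non-empty access key or secret key,
# otherwise "instance-profile".
credential_mode() {
  if grep -Eq '^AWS_(ACCESS_KEY_ID|SECRET_ACCESS_KEY)=.+' "$1"; then
    echo "static"
  else
    echo "instance-profile"
  fi
}

# Raise the IMDS hop limit to 2 for one EC2 instance so that containers
# behind the Docker bridge can still reach the metadata service.
set_imds_hop_limit() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-put-response-hop-limit 2 \
    --http-endpoint enabled
}
```

On an EC2 host that is meant to use its instance profile, `credential_mode /opt/deployment/.env` printing `static` points to leftover keys that will shadow IMDS.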

Provisioning order

Provision foundation services first, because later services depend on their private IPs, DNS names, or telemetry endpoints:

  1. Squid → SigNoz → Database → Media (foundation services).
  2. TelPro → Web (services with public IPs and DNS).
  3. API instances → load balancer.
  4. Voice instances.
  5. Ops (Scaler + Tasker).
  6. Bastion (knows every other private IP).
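
If provisioning is scripted, the order above can be encoded once and reused. A minimal sketch, assuming lowercase role names that mirror SERVER_ROLE=api (only api appears in the source; the other names and the helper order_roles are hypothetical):

```shell
#!/usr/bin/env bash
# Assumed role names, listed in the provisioning order described above.
PROVISION_ORDER="squid signoz database media telpro web api voice ops bastion"

# Print the given roles, one per line, in provisioning order.
# Roles that are not in PROVISION_ORDER are silently dropped.
order_roles() {
  local wanted=" $* " role
  for role in $PROVISION_ORDER; do
    case "$wanted" in
      *" $role "*) echo "$role" ;;
    esac
  done
}
```

For example, `order_roles bastion api squid` prints squid, api, and bastion in that order, so a loop over its output provisions Squid before anything that depends on the proxy.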

Provider-specific choices

Anything that runs Linux + Docker can host a Delphi service. Your cloud-init, image template, Ansible playbook, or cloud-native provisioning needs to:

  1. Install Docker Engine + Compose plugin, AWS CLI v2, jq, yq.
  2. Write /opt/deployment/.env with bootstrap values appropriate for this instance.
  3. Place a fetch-config helper that knows how to pull /deployments/<slug>/<ref>/<role>/ from S3.
  4. Create /opt/services/${SERVER_ROLE}/ and run init.sh once.
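
Step 2 is the part that differs most between providers, so here is a minimal sketch of it, assuming the provisioning tool exposes the deployment-wide values as environment variables and passes the per-instance values as arguments. The function name write_bootstrap_env and its argument order are hypothetical, and restricting the file to its owner is a suggestion rather than a documented requirement.

```shell
#!/usr/bin/env bash
# Usage: write_bootstrap_env <target> <environment> <hostname> <role>
# Deployment-wide values are read from the caller's environment.
write_bootstrap_env() {
  local target="$1" environment="$2" hostname="$3" role="$4"
  # Stop before touching the file if a deployment-wide value is missing.
  : "${ECR_REGISTRY:?}" "${ECR_TAG:?}" "${CONFIG_BUCKET:?}" "${CONFIG_REF:?}" \
    "${NAMESPACE:?}" "${AWS_REGION:?}"
  mkdir -p "$(dirname "$target")" || return 1
  # Subshell keeps the umask local; 077 makes a newly created file owner-only.
  (
    umask 077
    cat > "$target" <<EOF
ENVIRONMENT=${environment}
HOSTNAME=${hostname}
SERVER_ROLE=${role}
ECR_REGISTRY=${ECR_REGISTRY}
ECR_TAG=${ECR_TAG}
CONFIG_BUCKET=${CONFIG_BUCKET}
CONFIG_REF=${CONFIG_REF}
NAMESPACE=${NAMESPACE}
AWS_REGION=${AWS_REGION}
EOF
  )
}
```

A cloud-init runcmd entry could then call `write_bootstrap_env /opt/deployment/.env staging staging-api-01 api`. Static keys, proxy settings, and DB_MASTER_SECRET_ARN are left out here and would be appended only on hosts that need them.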

Provider differences are mostly outside Delphi itself:

| Need | Typical implementation |
| --- | --- |
| Private network | VPC / private subnet / cloud private network shared by every service. |
| Public entry points | Static IPs or load balancers for TelPro, Web, and API. |
| Firewalls | Security groups / firewall rules matching each service's operations page. |
| Persistent storage | Attached volumes for Database, SigNoz, and Media where used. |
| IAM | Instance profile / role where available; static bootstrap keys only when no instance identity exists. |
| Scaling | Cloud-provider adapter used by the Ops Scaler, if autoscaling is enabled. |

After that, the service-specific operations pages apply unchanged.

Common cloud-init mistakes

| Symptom | Cause |
| --- | --- |
| init.sh aborts with NAMESPACE not set | Bootstrap .env missing NAMESPACE. Cloud-init didn't write it. |
| aws ecr get-login-password fails | IAM policy missing ecr:GetAuthorizationToken, or bootstrap creds unset on a non-instance-profile host. |
| aws ssm get-parameters-by-path returns empty | Step 2 not run yet, or NAMESPACE mismatch between bootstrap and seeded SSM paths. |
| Containers can't reach signoz.<env>.delphi | OTEL_BACKEND_HOST defaults to 10.0.1.10; SigNoz instance not up or wrong private IP. |
| Containers can't pull from upstream registries | Squid not up, or HTTP_PROXY / HTTPS_PROXY not exported. |
| Database instance: Data directory does not exist | Block storage volume not formatted / not mounted at /mnt/data. |
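
Two of these causes are quick to confirm on the host. The sketch below is a rough diagnostic, not part of init.sh: unexported_proxy_vars and data_volume_mounted are hypothetical helper names, and the mount check relies on the mountpoint utility from util-linux being installed.

```shell
#!/usr/bin/env bash
# Print the proxy variables that child processes would not inherit.
# Returns 0 when both are exported, 1 otherwise.
unexported_proxy_vars() {
  local name status=0
  for name in HTTP_PROXY HTTPS_PROXY; do
    # printenv only sees exported variables, unlike a plain shell expansion.
    if ! printenv "$name" >/dev/null 2>&1; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Return 0 when the path (default /mnt/data) is a mount point.
data_volume_mounted() {
  mountpoint -q "${1:-/mnt/data}"
}
```

A variable that is set with `HTTP_PROXY=...` but never exported still shows up in `unexported_proxy_vars`, which matches the "not exported" case in the table.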

Next

Continue to Bootstrap and init.