Provision instances
Step 3 is creating the actual cloud instances. You can use any cloud and any provisioning tool — the public contract is that every instance comes up with:
- Network connectivity to the private subnet, AWS APIs (via Squid for private hosts), and the container registry.
- A populated `/opt/deployment/.env` (the bootstrap environment).
- The deployment bundle synced from S3 to `/opt/services/<role>/` (or just enough to bootstrap a `fetch-config` helper that does the sync on the first `init.sh` run).
- Docker Engine and Docker Compose installed.
- IAM credentials with the grants from Prerequisites.
What every instance needs
Regardless of provider, every service instance must have these things in place when init.sh runs:
Bootstrap .env at /opt/deployment/.env
Cloud-init writes this file. It contains only infrastructure-level values — not application config. Application config comes from SSM / Secrets Manager later.
```bash
# /opt/deployment/.env
ENVIRONMENT=staging
HOSTNAME=staging-api-01
SERVER_ROLE=api

# Container registry
ECR_REGISTRY=123456.dkr.ecr.eu-central-1.amazonaws.com
ECR_TAG=v0.9.11

# Config bundle source
CONFIG_BUCKET=ki-kombinat-delphi-configs
CONFIG_REF=v0.9.11

# AWS access
NAMESPACE=voiceai/staging
AWS_REGION=eu-central-1
AWS_ACCESS_KEY_ID=AKIA...    # or rely on instance profile / IRSA
AWS_SECRET_ACCESS_KEY=...

# Optional
HTTP_PROXY=http://10.0.1.5:3128
HTTPS_PROXY=http://10.0.1.5:3128
NO_PROXY=localhost,127.0.0.1,10.0.0.0/8

# Optional (Database role only, when using managed RDS)
DB_MASTER_SECRET_ARN=arn:aws:secretsmanager:eu-central-1:123456:secret:rds!cluster-xyz-...
```
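On a cloud-init-driven provider this file is typically produced from user-data. A minimal shell equivalent of that step might look like the sketch below — the values are placeholders (no secrets shown, static keys omitted as if relying on an instance profile), and the `install`/heredoc approach is just one way to express what cloud-init's `write_files` does:

```shell
#!/usr/bin/env bash
# Hypothetical provisioning step: write the bootstrap .env with
# placeholder, non-secret values. Real providers usually do this via
# cloud-init write_files; the paths are the ones this page defines.
set -euo pipefail

install -d -m 0755 /opt/deployment
cat > /opt/deployment/.env <<'EOF'
ENVIRONMENT=staging
HOSTNAME=staging-api-01
SERVER_ROLE=api
ECR_REGISTRY=123456.dkr.ecr.eu-central-1.amazonaws.com
ECR_TAG=v0.9.11
CONFIG_BUCKET=ki-kombinat-delphi-configs
CONFIG_REF=v0.9.11
NAMESPACE=voiceai/staging
AWS_REGION=eu-central-1
EOF

chmod 0600 /opt/deployment/.env   # may hold credentials; root-only
```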
Why these are bootstrap values and not in SSM: they are needed before the instance can talk to AWS at all (AWS_REGION, credentials), or they describe this instance rather than the deployment (HOSTNAME, SERVER_ROLE). The corresponding vars.yaml entries are flagged `source: local` so fetch-env.sh knows to read them from this file rather than from AWS.
For the full bootstrap-vs-SSM-vs-SM decision, see vars.yaml schema.
Tooling on the host
init.sh and fetch-env.sh need:
- `bash`, `curl`, `python3`, `jq`, `yq` (the script aborts without them).
- Docker Engine + Compose plugin.
- AWS CLI v2.
- A `fetch-config` helper that runs `aws s3 sync s3://${CONFIG_BUCKET}/deployments/<slug>/${CONFIG_REF}/${SERVER_ROLE}/ /opt/services/${SERVER_ROLE}/` (and `common/` to `/opt/services/common/`). The reference cloud-init drops a one-liner `fetch-config` script into `/usr/local/bin/`.
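Expanded from that one-liner, the helper amounts to something like the following sketch. The two `aws s3 sync` invocations mirror the paths above; `<slug>` is deployment-specific and left as a placeholder:

```shell
#!/usr/bin/env bash
# /usr/local/bin/fetch-config -- sketch of the helper described above.
# Reads bootstrap values, then syncs the role bundle and the common bundle.
set -euo pipefail
source /opt/deployment/.env

# <slug> is your deployment slug; substitute it when templating this script.
aws s3 sync "s3://${CONFIG_BUCKET}/deployments/<slug>/${CONFIG_REF}/${SERVER_ROLE}/" \
  "/opt/services/${SERVER_ROLE}/" --region "${AWS_REGION}"
aws s3 sync "s3://${CONFIG_BUCKET}/deployments/<slug>/${CONFIG_REF}/common/" \
  "/opt/services/common/" --region "${AWS_REGION}"
```

Because `aws s3 sync` is idempotent, re-running `fetch-config` after a failed first boot is safe.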
IAM access path
Instances reach AWS via either static keys in the bootstrap .env (simple, what the Hetzner reference uses) or an instance profile (recommended on AWS EC2 — no static credentials on disk).
For instance-profile-based instances, leave AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY unset in the bootstrap .env. The AWS SDK default credential chain prefers static keys whenever they are present and quietly bypasses IMDS, so a stale or partial key pair left in the file is the most common cause of CredentialsProviderError on EC2 deployments.
Containers running with network_mode: bridge need --http-put-response-hop-limit 2 on the EC2 instance metadata options so the Docker bridge hop doesn't cut off IMDS. See the Ops operations page for the full IMDS / SES checklist.
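The hop-limit change can be baked into a launch template's metadata options or applied to a running instance. With the AWS CLI it looks like this (the instance ID is a placeholder):

```shell
# Allow one extra network hop (the Docker bridge) when fetching
# IMDSv2 tokens; also enforce token-based (v2-only) metadata access.
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 2
```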
Recommended provisioning order
Provision foundation services first because later services depend on their private IPs, DNS, or telemetry endpoints:
- Squid → SigNoz → Database → Media (foundation services).
- TelPro → Web (services with public IPs and DNS).
- API instances → load balancer.
- Voice instances.
- Ops (Scaler + Tasker).
- Bastion (knows every other private IP).
Provider-specific choices
Anything that runs Linux + Docker can host a Delphi service. Your cloud-init, image template, Ansible playbook, or cloud-native provisioning needs to:
- Install Docker Engine + Compose plugin, AWS CLI v2, `jq`, `yq`.
- Write `/opt/deployment/.env` with bootstrap values appropriate for this instance.
- Place a `fetch-config` helper that knows how to pull `/deployments/<slug>/<ref>/<role>/` from S3.
- Create `/opt/services/${SERVER_ROLE}/` and run `init.sh` once.
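Put together, the tail end of a first boot reduces to a few lines regardless of provider. This is a sketch, not the reference implementation — the paths come from this page, and the assumption is only that `fetch-config` runs before `init.sh`:

```shell
# First-boot tail end: pull the bundle, then hand off to init.sh.
set -euo pipefail
source /opt/deployment/.env

mkdir -p "/opt/services/${SERVER_ROLE}" /opt/services/common
/usr/local/bin/fetch-config
cd "/opt/services/${SERVER_ROLE}"
./init.sh
```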
Provider differences are mostly outside Delphi itself:
| Need | Typical implementation |
|---|---|
| Private network | VPC / private subnet / cloud private network shared by every service. |
| Public entry points | Static IPs or load balancers for TelPro, Web, and API. |
| Firewalls | Security groups / firewall rules matching each service's operations page. |
| Persistent storage | Attached volumes for Database, SigNoz, and Media where used. |
| IAM | Instance profile / role where available; static bootstrap keys only when no instance identity exists. |
| Scaling | Cloud-provider adapter used by the Ops Scaler, if autoscaling is enabled. |
After that, the service-specific operations pages apply unchanged.
Common cloud-init mistakes
| Symptom | Cause |
|---|---|
| `init.sh` aborts with `NAMESPACE not set` | Bootstrap `.env` missing `NAMESPACE`. Cloud-init didn't write it. |
| `aws ecr get-login-password` fails | IAM policy missing `ecr:GetAuthorizationToken`, or bootstrap creds unset on a non-instance-profile host. |
| `aws ssm get-parameters-by-path` returns empty | Step 2 not run yet, or `NAMESPACE` mismatch between bootstrap and seeded SSM paths. |
| Containers can't reach `signoz.<env>.delphi` | `OTEL_BACKEND_HOST` defaults to `10.0.1.10`; SigNoz instance not up or wrong private IP. |
| Containers can't pull from upstream registries | Squid not up, or `HTTP_PROXY` / `HTTPS_PROXY` not exported. |
| Database instance: `Data directory does not exist` | Block storage volume not formatted / not mounted at `/mnt/data`. |
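For the credential and SSM rows, two standard AWS CLI calls run on the host (with the bootstrap .env sourced) usually localize the problem quickly — the first shows which identity is actually in effect, the second whether the seeded namespace matches:

```shell
source /opt/deployment/.env

# Which identity is active: static keys from .env, or the instance profile?
aws sts get-caller-identity

# Is anything seeded under the namespace this host expects?
aws ssm get-parameters-by-path \
  --path "/${NAMESPACE}/" --recursive --region "${AWS_REGION}" \
  --query 'Parameters[].Name'
```

An empty list from the second call with a correct identity from the first points at a Step 2 / `NAMESPACE` mismatch rather than an IAM problem.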
Next
Continue to Bootstrap and init.