Instance and container debugging
Use Monitoring in SigNoz first when telemetry is flowing: it gives you correlated logs, traces, metrics, and call spans without touching hosts. Drop to an instance when telemetry is missing, containers are unhealthy, startup fails before logs reach SigNoz, or you need to verify files, mounts, ports, and Docker state directly.
This page is the shared command set for every service host. Service-specific symptoms stay in each service page's Troubleshooting tab.
Get onto the host
SSH through the Bastion or your cloud provider's approved path, then enter the service directory:
```shell
cd /opt/services/<service>
```
Common service directories:
| Role | Directory |
|---|---|
| Web | /opt/services/web |
| API | /opt/services/api |
| Voice | /opt/services/voice |
| TelPro | /opt/services/telpro |
| Database | /opt/services/database |
| Media | /opt/services/media |
| Ops | /opt/services/ops |
| Squid | /opt/services/squid |
| SigNoz | /opt/services/signoz for deployment scripts; upstream compose runs under /opt/signoz-repo/deploy/docker/ |
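If you hop between hosts often, a small helper can cut down on path typos. This is a hypothetical convenience function, not part of the standard tooling; the role names mirror the table above.

```shell
# Hypothetical helper (not shipped on the hosts): map a role name from the
# table above to its service directory, refusing unknown roles.
svc_dir() {
  case "$1" in
    web|api|voice|telpro|database|media|ops|squid|signoz)
      printf '/opt/services/%s\n' "$1" ;;
    *)
      echo "unknown service role: $1" >&2
      return 1 ;;
  esac
}

# Usage: cd "$(svc_dir api)"
```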
Container state
```shell
docker compose ps
docker compose ps --all
```
Look for containers that show Exited, are Restarting, or are missing a healthy status. If the issue is service-specific, jump to that service's operations page.
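On a busy host, the status strings above can be filtered with a small pipe. A minimal sketch, assuming Docker's default status wording (which can vary slightly between Compose versions):

```shell
# Sketch: reduce `docker compose ps --all` output to containers that are
# Exited, Restarting, or failing their health check.
flag_problem_containers() {
  grep -E 'Exited|Restarting|\(unhealthy\)' || true
}

# docker compose ps --all | flag_problem_containers
```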
Logs
Use local logs when the container never started cleanly or when SigNoz has stopped receiving telemetry from this host:
```shell
docker compose logs --tail=200
docker compose logs --tail=200 <container>
docker compose logs -f <container>
```
Prefer SigNoz for longer investigations because it keeps logs correlated with traces, call IDs, and service names. Avoid pasting full logs publicly if they may contain credentials, phone numbers, SIP headers, or customer data.
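When even a 200-line tail is too much to read, a quick filter helps. A sketch assuming common log-level strings; exact formats differ per service:

```shell
# Sketch: narrow a log tail down to error/warning lines before reading.
errors_only() {
  grep -iE '\b(error|fatal|panic|warn(ing)?)\b' || true
}

# docker compose logs --tail=200 <container> | errors_only
```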
Health checks
Inspect Docker health output when `docker compose ps` reports a container as unhealthy:
```shell
docker inspect --format '{{json .State.Health}}' <container> | jq
```
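The health JSON usually contains the failing probe's output, which is the fastest clue. A minimal jq sketch (assuming jq is installed, as the command above already does):

```shell
# Sketch: print the most recent health-check attempt from Docker's
# .State.Health JSON; its exit code and output usually explain "unhealthy".
last_health_probe() {
  jq -r '.Log[-1] | "exit=\(.ExitCode) output=\(.Output)"'
}

# docker inspect --format '{{json .State.Health}}' <container> | last_health_probe
```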
Common next steps:
- Web: confirm TelWeb finished migrations and baseline seed before running `delphi-setup`.
- API: check the load balancer target health and WebSocket stickiness.
- Voice: check TelSys / TelPhi containers and Redis reachability.
- TelPro: check public ports, RTP ranges, Janus / TURN reachability, and Redis dispatcher state.
- Database: check Postgres, Redis, PgBouncer, and mounted volume state.
- SigNoz: check ClickHouse disk and query-service health.
Restart without rolling artifacts
When configuration changed in SSM / Secrets Manager and you only need to re-resolve env:
```shell
./update.sh --restart-only
```
For a full pull of config and images:
```shell
./init.sh
```
The common init.sh / update.sh flow and supported flags are documented in Bootstrap and init and the service runbooks.
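The choice between the two commands can be summarized as a tiny dispatcher. This is purely illustrative, a hypothetical wrapper rather than a script that exists on the hosts:

```shell
# Hypothetical sketch of the decision above: config-only changes need an env
# re-resolve, anything touching images needs the full pull. Prints the
# command instead of running it.
refresh_command() {
  case "$1" in
    config)    echo "./update.sh --restart-only" ;;
    image|all) echo "./init.sh" ;;
    *) echo "usage: refresh_command config|image|all" >&2; return 1 ;;
  esac
}
```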
Environment resolution
Runtime variables are resolved into the parent shell before docker compose up; service secrets should not be left on disk. To debug missing variables:
- Check that `/opt/deployment/.env` exists and contains bootstrap-only values (`ENVIRONMENT`, `NAMESPACE`, `ECR_TAG`, `CONFIG_BUCKET`, `CONFIG_REF`, AWS bootstrap access).
- Re-run `./init.sh` or `./update.sh --restart-only` and watch where `fetch-env.sh` fails.
- Check the matching service's `vars.yaml` entry and whether the value should come from SSM, Secrets Manager, a default, or bootstrap.
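The first check can be scripted as a quick presence test. A minimal sketch; the variable names passed in are the examples from this page, and the real list should come from the service's `vars.yaml` entry:

```shell
# Sketch: report which expected variable names are unset or empty in the
# current shell. Uses bash indirection (${!name}), so run it under bash.
report_missing_vars() {
  local missing=0 name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "MISSING: $name"
      missing=1
    fi
  done
  return "$missing"
}

# report_missing_vars ENVIRONMENT NAMESPACE ECR_TAG CONFIG_BUCKET CONFIG_REF
```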
Do not paste full environment dumps into tickets or chat. Share the variable names, source (ssm, sm, local, default), and the exact error message.
See Configuration, vars.yaml schema, and Environment resolution.
Network checks
From the affected host:
```shell
# SigNoz OTLP receiver
nc -vz 10.0.1.10 4317
nc -vz 10.0.1.10 4318

# Squid egress proxy, if this host depends on it
nc -vz <squid-private-ip> 3128

# Database services, from private-network hosts
nc -vz <database-private-ip> 5432
nc -vz <database-private-ip> 6379
```
If network checks fail, verify security groups / firewall rules, private IPs, and whether the dependency service is healthy.
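The checks above can be batched into one loop. A sketch that falls back to bash's built-in `/dev/tcp` redirection, so it works even when `nc` is not installed; substitute the real private IPs for your environment:

```shell
# Sketch: report reachability of one host:port pair. Connection attempts are
# capped at 3 seconds via coreutils `timeout`.
check_endpoint() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "OK   $host:$port"
  else
    echo "FAIL $host:$port"
  fi
}

# check_endpoint 10.0.1.10 4317
# check_endpoint 10.0.1.10 4318
```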
OTel collector checks
Every service host should have an OTel collector container or host-network collector configured to export to SigNoz:
```shell
docker compose ps | grep -i otel
docker compose logs --tail=200 voiceai-otel-collector
```
Symptoms:
| Symptom | Likely issue | Next step |
|---|---|---|
| Collector cannot reach 10.0.1.10:4317 | Network route, SigNoz down, or firewall | Check SigNoz operations and network reachability. |
| Service logs exist locally but not in SigNoz | Collector pipeline or exporter failure | Check collector logs for export errors. |
| No local logs and no SigNoz telemetry | Container never started or crashed early | Use docker compose ps --all and container logs. |
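For the "logs exist locally but not in SigNoz" case, the collector's own logs are the place to look. A sketch using error strings that are typical of the OpenTelemetry Collector but not an exhaustive list; the container name follows this page's example:

```shell
# Sketch: scan collector logs for common export-failure patterns.
otel_export_errors() {
  grep -iE 'export(er)?.*(fail|error)|connection refused|deadline exceeded' || true
}

# docker compose logs --tail=200 voiceai-otel-collector | otel_export_errors
```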
Disk and mounts
Stateful services depend on mounted volumes:
```shell
df -h
lsblk
mount | grep -E 'data|signoz|media|postgres'
```
Pay special attention to:
- Database data volume (`/mnt/data`).
- SigNoz ClickHouse volume (`/mnt/signoz-data`).
- Media storage and TLS material.
- Docker disk usage when image pulls fail.
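To spot a volume that is about to fill, the `df` output can be thresholded. A minimal sketch reading POSIX `df -P` output on stdin so the column layout is predictable:

```shell
# Sketch: flag filesystems at or above a usage threshold (default 80%).
flag_full_disks() {
  local limit=${1:-80}
  awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%/, "", use)            # "90%" -> "90"
    if (use + 0 >= limit) print $6, $5
  }'
}

# df -P | flag_full_disks 80
```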
When to escalate
Capture:
- service role and host;
- `docker compose ps` output;
- relevant `docker compose logs --tail=200 <container>` output with secrets removed;
- whether SigNoz is receiving logs / traces for that host;
- `ECR_TAG`, `CONFIG_REF`, and the failing command;
- links to the relevant SigNoz trace, dashboard, or alert if available.
Then continue with the service-specific troubleshooting page or the broader Support entry point.
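Before attaching logs, a quick scrub helps honor the "secrets removed" requirement. The patterns below (bearer tokens, AWS-style access key IDs, long base64-like runs) are illustrative only, not a complete scrubber; always review the output by hand.

```shell
# Sketch: strip obvious secrets from a log tail before pasting it into a
# ticket. Patterns are examples, not a guarantee of full redaction.
redact_logs() {
  sed -E \
    -e 's/(Bearer )[^[:space:]]+/\1REDACTED/g' \
    -e 's/AKIA[0-9A-Z]{16}/REDACTED_AWS_KEY/g' \
    -e 's|[A-Za-z0-9+/=]{40,}|REDACTED_BLOB|g'
}

# docker compose logs --tail=200 <container> | redact_logs > logs-for-ticket.txt
```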
See also
- Monitoring in SigNoz — dashboards, alerts, traces, and logs.
- Bootstrap and init — how `init.sh`, S3 config sync, and `fetch-env.sh` work.
- Configuration — variable sources and resolution order.