Version: 0.9.12

Instance and container debugging

platform v0.9.11, verified 2026-05-14

Use Monitoring in SigNoz first when telemetry is flowing: it gives you correlated logs, traces, metrics, and call spans without touching hosts. Drop to an instance when telemetry is missing, containers are unhealthy, startup fails before logs reach SigNoz, or you need to verify files, mounts, ports, and Docker state directly.

This page is the shared command set for every service host. Service-specific symptoms stay in each service page's Troubleshooting tab.

Get onto the host

SSH through the Bastion or your cloud provider's approved path, then enter the service directory:

cd /opt/services/<service>

Common service directories:

Role	Directory
Web	`/opt/services/web`
API	`/opt/services/api`
Voice	`/opt/services/voice`
TelPro	`/opt/services/telpro`
Database	`/opt/services/database`
Media	`/opt/services/media`
Ops	`/opt/services/ops`
Squid	`/opt/services/squid`
SigNoz	`/opt/services/signoz` for deployment scripts; upstream compose runs under `/opt/signoz-repo/deploy/docker/`

Container state

docker compose ps
docker compose ps --all

Look for containers that are Exited, Restarting, or missing a healthy status. If the issue is service-specific, jump to that service's operations page.
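To surface only the problem containers, a small helper along these lines can be sourced into the shell. It is a sketch: `list_problem_containers` is a hypothetical name, and it relies on plain `docker ps` filters, so it reports every container on the host rather than only the current compose project.

```shell
# Hypothetical helper: list containers that are exited, restarting, or unhealthy.
# Output is one "name: status" line per container, de-duplicated.
list_problem_containers() {
  local fmt='{{.Names}}: {{.Status}}'
  {
    docker ps --all --filter status=exited     --format "$fmt"
    docker ps --all --filter status=restarting --format "$fmt"
    docker ps --all --filter health=unhealthy  --format "$fmt"
  } | sort -u
}
```

An empty result means nothing is exited, restarting, or unhealthy. It does not catch a container that is missing entirely, so still compare against the expected list for the role.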

Logs

Use local logs when the container never started cleanly or when SigNoz has stopped receiving telemetry from this host:

docker compose logs --tail=200
docker compose logs --tail=200 <container>
docker compose logs -f <container>

Prefer SigNoz for longer investigations because it keeps logs correlated with traces, call IDs, and service names. Avoid pasting full logs publicly if they may contain credentials, phone numbers, SIP headers, or customer data.
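When a log excerpt does have to be shared, a filter like the one below can mask the most obvious sensitive values first. This is a minimal sketch, not a complete scrubber: `redact_logs` is a hypothetical name, and the patterns (key names containing password, secret, or token, and digit runs of 10 to 15 characters) are illustrative assumptions. They will miss other secret formats and may also mask long numeric IDs or epoch timestamps, so review the output before posting it.

```shell
# Hypothetical filter: mask likely secrets and phone-number-like digit runs.
# The patterns are illustrative assumptions; extend them for your own log formats.
redact_logs() {
  sed -E \
    -e 's/([A-Za-z0-9_]*(PASSWORD|password|SECRET|secret|TOKEN|token)[A-Za-z0-9_]*[=:][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g' \
    -e 's/\+?[0-9]{10,15}/[PHONE]/g'
}

# Example use:
#   docker compose logs --tail=200 <container> | redact_logs
```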

Health checks

Inspect Docker health output when docker compose ps says unhealthy:

docker inspect --format '{{json .State.Health}}' <container> | jq

Common next steps:

Web: confirm TelWeb finished migrations and baseline seed before running delphi-setup.
API: check the load balancer target health and WebSocket stickiness.
Voice: check TelSys / TelPhi containers and Redis reachability.
TelPro: check public ports, RTP ranges, Janus / TURN reachability, and Redis dispatcher state.
Database: check Postgres, Redis, PgBouncer, and mounted volume state.
SigNoz: check ClickHouse disk and query-service health.

Restart without rolling artifacts

When configuration changed in SSM / Secrets Manager and you only need to re-resolve environment variables:

./update.sh --restart-only

For a full pull of config and images:

./init.sh

The common init.sh / update.sh flow and supported flags are documented in Bootstrap and init and the service runbooks.

Environment resolution

Runtime variables are resolved into the parent shell before docker compose up; service secrets should not be left on disk. To debug missing variables:

Check that /opt/deployment/.env exists and contains bootstrap-only values (ENVIRONMENT, NAMESPACE, ECR_TAG, CONFIG_BUCKET, CONFIG_REF, AWS bootstrap access).
Re-run ./init.sh or ./update.sh --restart-only and watch where fetch-env.sh fails.
Check the matching service's vars.yaml entry and whether the value should come from SSM, Secrets Manager, a default, or bootstrap.

Do not paste full environment dumps into tickets or chat. Share the variable names, source (ssm, sm, local, default), and the exact error message.
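The first check, and the rule about sharing names rather than values, can be sketched as two small helpers. Both names are hypothetical, and both assume the file uses plain `KEY=VALUE` lines (no `export` prefix). The key list is taken from the bootstrap values named above. AWS bootstrap access is left out because its variable names are not specified here.

```shell
# Hypothetical helpers; both assume a plain KEY=VALUE env file.

# Print variable names only, never values, so the output is safe to share.
env_names() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | sort
}

# Print each bootstrap key missing from the file; return non-zero if any is missing.
missing_bootstrap_keys() {
  local file="$1" key missing=0
  for key in ENVIRONMENT NAMESPACE ECR_TAG CONFIG_BUCKET CONFIG_REF; do
    if ! grep -q "^${key}=" "$file"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# Example use:
#   missing_bootstrap_keys /opt/deployment/.env
#   env_names /opt/deployment/.env
```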

See Configuration, vars.yaml schema, and Environment resolution.

Network checks

From the affected host:

# SigNoz OTLP receiver
nc -vz 10.0.1.10 4317
nc -vz 10.0.1.10 4318

# Squid egress proxy, if this host depends on it
nc -vz <squid-private-ip> 3128

# Database services, from private-network hosts
nc -vz <database-private-ip> 5432
nc -vz <database-private-ip> 6379

If network checks fail, verify security groups / firewall rules, private IPs, and whether the dependency service is healthy.
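To run several of these probes in one pass and get a one-line result per target, a wrapper along these lines may help. It is a sketch: `check_port` and `check_ports` are hypothetical names, and the 3-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: probe one TCP port with a 3-second timeout.
check_port() {
  local host="$1" port="$2"
  if nc -z -w 3 "$host" "$port" </dev/null >/dev/null 2>&1; then
    echo "OK   $host:$port"
  else
    echo "FAIL $host:$port"
    return 1
  fi
}

# Probe every "host port" pair read from stdin; return non-zero if any probe failed.
check_ports() {
  local host port rc=0
  while read -r host port; do
    [ -n "$host" ] || continue
    check_port "$host" "$port" || rc=1
  done
  return "$rc"
}

# Example use:
#   printf '%s\n' '10.0.1.10 4317' '10.0.1.10 4318' | check_ports
```

A `FAIL` line only says the TCP connection did not open within the timeout. It does not distinguish a blocked route from a dependency that is down.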

OTel collector checks

Every service host should have an OTel collector container or host-network collector configured to export to SigNoz:

docker compose ps | grep -i otel
docker compose logs --tail=200 voiceai-otel-collector

Symptoms:

Symptom	Likely issue	Next step
Collector cannot reach `10.0.1.10:4317`	Network route, SigNoz down, or firewall	Check SigNoz operations and network reachability.
Service logs exist locally but not in SigNoz	Collector pipeline or exporter failure	Check collector logs for export errors.
No local logs and no SigNoz telemetry	Container never started or crashed early	Use `docker compose ps --all` and container logs.
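For the second row, collector logs can be narrowed to lines that look like export or connection failures. The filter below is a sketch with a hypothetical name, and its match patterns are assumptions about typical collector and gRPC error text rather than an exhaustive list. If it finds nothing, read the unfiltered logs before concluding the exporter is healthy.

```shell
# Hypothetical filter: keep only log lines that look like export or connection failures.
# The patterns are assumptions, not a complete list of collector error messages.
collector_export_errors() {
  grep -iE 'exporting failed|connection refused|deadline exceeded|no such host'
}

# Example use:
#   docker compose logs --tail=500 voiceai-otel-collector | collector_export_errors
```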

Disk and mounts

Stateful services depend on mounted volumes:

df -h
lsblk
mount | grep -E 'data|signoz|media|postgres'

Pay special attention to:

Database data volume (/mnt/data).
SigNoz ClickHouse volume (/mnt/signoz-data).
Media storage and TLS material.
Docker disk usage when image pulls fail.
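A quick threshold check on the volumes listed above can be sketched as follows. The function names are hypothetical and the default limit of 85 percent is an arbitrary example value, not a documented platform threshold.

```shell
# Hypothetical helper: print used-space percentage (no % sign) for the filesystem holding a path.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when a path is at or above a limit (default 85, an arbitrary example value).
warn_if_full() {
  local path="$1" limit="${2:-85}" used
  used="$(disk_usage_pct "$path")"
  [ -n "$used" ] || return 2
  if [ "$used" -ge "$limit" ]; then
    echo "WARN $path is ${used}% full"
    return 1
  fi
  echo "OK   $path is ${used}% full"
}

# Example use:
#   warn_if_full /mnt/data
#   warn_if_full /mnt/signoz-data 90
```

For Docker's own disk usage, `docker system df` gives a breakdown by images, containers, volumes, and build cache.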

When to escalate

Capture:

service role and host;
docker compose ps output;
relevant docker compose logs --tail=200 <container> output with secrets removed;
whether SigNoz is receiving logs / traces for that host;
ECR_TAG, CONFIG_REF, and the failing command;
links to the relevant SigNoz trace, dashboard, or alert if available.

Then continue with the service-specific troubleshooting page or the broader Support entry point.
