Version: 0.9.12

Instance and container debugging

Platform v0.9.11, verified 2026-05-14

Use Monitoring in SigNoz first when telemetry is flowing: it gives you correlated logs, traces, metrics, and call spans without touching hosts. Drop to an instance when telemetry is missing, containers are unhealthy, startup fails before logs reach SigNoz, or you need to verify files, mounts, ports, and Docker state directly.

This page is the shared command set for every service host. Service-specific symptoms stay in each service page's Troubleshooting tab.

Get onto the host

SSH through the Bastion or your cloud provider's approved path, then enter the service directory:

cd /opt/services/<service>

Common service directories:

| Role | Directory |
|---|---|
| Web | /opt/services/web |
| API | /opt/services/api |
| Voice | /opt/services/voice |
| TelPro | /opt/services/telpro |
| Database | /opt/services/database |
| Media | /opt/services/media |
| Ops | /opt/services/ops |
| Squid | /opt/services/squid |
| SigNoz | /opt/services/signoz for deployment scripts; upstream compose runs under /opt/signoz-repo/deploy/docker/ |
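If you move between hosts often, the directory table can be encoded as a small shell function. This is a hypothetical convenience helper, not part of the deployment scripts; the name `svc_dir` and the `compose` argument are illustrative assumptions.

```shell
# Hypothetical convenience helper (not part of the deployment scripts):
# print the service directory for a role, following the directory table.
svc_dir() {
  local role
  role="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$role" in
    web|api|voice|telpro|database|media|ops|squid)
      printf '/opt/services/%s\n' "$role" ;;
    signoz)
      # SigNoz keeps deployment scripts and the upstream compose project in different places.
      if [ "$2" = "compose" ]; then
        printf '/opt/signoz-repo/deploy/docker\n'
      else
        printf '/opt/services/signoz\n'
      fi ;;
    *)
      printf 'unknown role: %s\n' "$1" >&2
      return 1 ;;
  esac
}

# Example: cd "$(svc_dir voice)"
```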

Container state

docker compose ps
docker compose ps --all

Look for containers that show Exited or Restarting, or that do not report a healthy status. If the issue is service-specific, jump to that service's operations page.
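On hosts with many containers, a small filter can surface only the problem rows. This is a sketch assuming Compose v2, where `docker compose ps` accepts a Go template via `--format` and exposes `.Name` and `.Status` (the same fields as its default table). The function name `flag_containers` is hypothetical.

```shell
# Hypothetical filter: reads "name<TAB>status" lines and prints containers that are
# not up, or that are up but unhealthy, still starting their healthcheck, or paused.
flag_containers() {
  awk -F '\t' '
    $2 !~ /^Up /                                   { print $1 ": " $2; next }
    $2 ~ /\((unhealthy|health: starting|Paused)\)/ { print $1 ": " $2 }
  '
}

# Usage on a service host:
# docker compose ps --all --format '{{.Name}}\t{{.Status}}' | flag_containers
```

Containers without a healthcheck show only "Up ..." and are treated as fine as long as they are running.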

Logs

Use local logs when the container never started cleanly or when SigNoz has stopped receiving telemetry from this host:

docker compose logs --tail=200
docker compose logs --tail=200 <container>
docker compose logs -f <container>

Prefer SigNoz for longer investigations because it keeps logs correlated with traces, call IDs, and service names. Avoid pasting full logs publicly if they may contain credentials, phone numbers, SIP headers, or customer data.
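When you do need to share an excerpt, a rough redaction pass helps. This is a minimal sketch, not a complete scrubber: the patterns (10 to 15 digit runs for phone numbers, a few common secret key names, `Authorization` headers) are assumptions. It will over-redact long numeric IDs and can miss other formats, so still read the output before posting it.

```shell
# Hypothetical redaction filter for line-oriented logs (GNU or BSD sed with -E).
redact_logs() {
  sed -E \
    -e 's/\+?[0-9]{10,15}/<number>/g' \
    -e 's/((password|passwd|secret|token|api_key|PASSWORD|SECRET|TOKEN|API_KEY)"?[[:space:]]*[:=][[:space:]]*)[^[:space:],]+/\1<redacted>/g' \
    -e 's/(Authorization:[[:space:]]*).*/\1<redacted>/'
}

# Usage: docker compose logs --tail=200 <container> | redact_logs
```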

Health checks

Inspect Docker health output when docker compose ps says unhealthy:

docker inspect --format '{{json .State.Health}}' <container> | jq .
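If `jq` is not installed on the host, the same information can be read with a Go template passed to `docker inspect`. This sketch relies on Docker's `.State.Health` fields (`Status`, `FailingStreak`, and `Log` entries with `ExitCode` and `Output`); the wrapper name `health_summary` is hypothetical.

```shell
# Hypothetical wrapper: print health status, failing streak, and the exit code and
# output of each probe result Docker has retained for the container.
health_summary() {
  docker inspect --format '{{if .State.Health}}status={{.State.Health.Status}} failing_streak={{.State.Health.FailingStreak}}{{range .State.Health.Log}}
exit={{.ExitCode}} {{.Output}}{{end}}{{else}}no healthcheck configured{{end}}' "$1"
}

# Usage: health_summary <container>
```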

Common next steps:

  • Web: confirm TelWeb finished migrations and baseline seed before running delphi-setup.
  • API: check the load balancer target health and WebSocket stickiness.
  • Voice: check TelSys / TelPhi containers and Redis reachability.
  • TelPro: check public ports, RTP ranges, Janus / TURN reachability, and Redis dispatcher state.
  • Database: check Postgres, Redis, PgBouncer, and mounted volume state.
  • SigNoz: check ClickHouse disk and query-service health.

Restart without rolling artifacts

When configuration has changed in SSM or Secrets Manager and you only need to re-resolve environment variables and restart:

./update.sh --restart-only

For a full pull of config and images:

./init.sh

The common init.sh / update.sh flow and supported flags are documented in Bootstrap and init and the service runbooks.

Environment resolution

Runtime variables are resolved into the parent shell before docker compose up; service secrets should not be left on disk. To debug missing variables:

  1. Check that /opt/deployment/.env exists and contains bootstrap-only values (ENVIRONMENT, NAMESPACE, ECR_TAG, CONFIG_BUCKET, CONFIG_REF, AWS bootstrap access).
  2. Re-run ./init.sh or ./update.sh --restart-only and watch where fetch-env.sh fails.
  3. Check the matching service's vars.yaml entry and whether the value should come from SSM, Secrets Manager, a default, or bootstrap.
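Step 1 can be checked without printing any values. This is a sketch assuming the bootstrap file uses plain `KEY=value` or `export KEY=value` lines; the function name is hypothetical, and the AWS bootstrap credential keys are not checked because this page does not name them.

```shell
# Hypothetical check: report bootstrap keys that are missing or empty in the .env file.
# Prints key names only, never values, so the output is safe to share.
check_bootstrap_env() {
  local env_file="${1:-/opt/deployment/.env}" status=0 key
  if [ ! -r "$env_file" ]; then
    printf 'missing or unreadable: %s\n' "$env_file"
    return 1
  fi
  for key in ENVIRONMENT NAMESPACE ECR_TAG CONFIG_BUCKET CONFIG_REF; do
    # Accept KEY=value or export KEY=value; "KEY=" with nothing after it counts as empty.
    if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=[^[:space:]]" "$env_file"; then
      printf 'missing or empty: %s\n' "$key"
      status=1
    fi
  done
  return "$status"
}
```

A quoted empty value such as `KEY=""` is not detected by this sketch.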

Do not paste full environment dumps into tickets or chat. Share the variable names, source (ssm, sm, local, default), and the exact error message.

See Configuration, vars.yaml schema, and Environment resolution.

Network checks

From the affected host:

# SigNoz OTLP receiver
nc -vz 10.0.1.10 4317
nc -vz 10.0.1.10 4318

# Squid egress proxy, if this host depends on it
nc -vz <squid-private-ip> 3128

# Database services, from private-network hosts
nc -vz <database-private-ip> 5432
nc -vz <database-private-ip> 6379

If network checks fail, verify security groups / firewall rules, private IPs, and whether the dependency service is healthy.
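To run several checks in one pass without hanging on filtered ports, the `nc` calls can be wrapped in a loop with a connect timeout. This is a sketch assuming an `nc` that supports `-z` and `-w` (both OpenBSD and traditional netcat do); `check_ports` is a hypothetical name and the IPs are the placeholders used on this page.

```shell
# Hypothetical wrapper: probe each host:port with a 3-second timeout and print
# one OK/FAIL line per target. Returns 1 if any target is unreachable.
check_ports() {
  local rc=0 target host port
  for target in "$@"; do
    host="${target%:*}"
    port="${target##*:}"
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      printf 'OK   %s\n' "$target"
    else
      printf 'FAIL %s\n' "$target"
      rc=1
    fi
  done
  return "$rc"
}

# Usage: check_ports 10.0.1.10:4317 10.0.1.10:4318 <squid-private-ip>:3128
```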

OTel collector checks

Every service host should have an OTel collector container or host-network collector configured to export to SigNoz:

docker compose ps | grep -i otel
docker compose logs --tail=200 voiceai-otel-collector

Symptoms:

| Symptom | Likely issue | Next step |
|---|---|---|
| Collector cannot reach 10.0.1.10:4317 | Network route, SigNoz down, or firewall | Check SigNoz operations and network reachability. |
| Service logs exist locally but not in SigNoz | Collector pipeline or exporter failure | Check collector logs for export errors. |
| No local logs and no SigNoz telemetry | Container never started or crashed early | Use docker compose ps --all and container logs. |
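For the "export errors" case, filtering the collector's recent output narrows things quickly. The grep patterns are assumptions based on common OpenTelemetry Collector and gRPC error text; adjust them to what your collector version actually logs. The function name is hypothetical, and the default target reuses the collector name from the command above.

```shell
# Hypothetical filter: show likely export failures from the collector's recent logs.
# Prints nothing and returns non-zero when no line matches.
collector_export_errors() {
  local target="${1:-voiceai-otel-collector}"
  docker compose logs --no-color --tail=500 "$target" 2>&1 \
    | grep -Ei 'exporting failed|dropping data|connection refused|deadline exceeded|unavailable'
}
```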

Disk and mounts

Stateful services depend on mounted volumes:

df -h
lsblk
mount | grep -E 'data|signoz|media|postgres'

Pay special attention to:

  • Database data volume (/mnt/data).
  • SigNoz ClickHouse volume (/mnt/signoz-data).
  • Media storage and TLS material.
  • Docker disk usage when image pulls fail.
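A quick way to spot full volumes is to filter `df` output by a usage threshold. This sketch assumes POSIX `df -P` output (column 5 is capacity such as `91%`, column 6 the mount point) and mount points without spaces; the 85% default and the function name are illustrative. For Docker's own image and volume usage, `docker system df` gives a per-type summary.

```shell
# Hypothetical check: print mount points at or above a usage threshold (default 85%).
disk_over() {
  local limit="${1:-85}"
  df -P | awk -v limit="$limit" '
    NR > 1 {
      pct = $5
      sub(/%/, "", pct)               # "91%" -> "91"
      if (pct + 0 >= limit + 0) print $6 " " $5
    }'
}

# Usage: disk_over 80
```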

When to escalate

Before escalating, capture:

  • service role and host;
  • docker compose ps output;
  • relevant docker compose logs --tail=200 <container> output with secrets removed;
  • whether SigNoz is receiving logs / traces for that host;
  • ECR_TAG, CONFIG_REF, and the failing command;
  • links to the relevant SigNoz trace, dashboard, or alert if available.
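Part of this checklist can be gathered in one go. This is a sketch assuming it runs from the service directory and that ECR_TAG and CONFIG_REF live in /opt/deployment/.env; it deliberately reads only those two keys. The function name is hypothetical, and container logs still need the manual redaction pass described under Logs.

```shell
# Hypothetical helper: print the non-secret facts from the escalation checklist.
escalation_summary() {
  local env_file="${1:-/opt/deployment/.env}"
  printf 'host: %s\n' "$(hostname)"
  printf 'service dir: %s\n' "$(pwd)"
  # Only ECR_TAG and CONFIG_REF are read; other bootstrap values are never printed.
  grep -E '^[[:space:]]*(export[[:space:]]+)?(ECR_TAG|CONFIG_REF)=' "$env_file" \
    | sed -E 's/^[[:space:]]*(export[[:space:]]+)?//'
  printf '\n--- docker compose ps --all ---\n'
  docker compose ps --all
}
```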

Then continue with the service-specific troubleshooting page or the broader Support entry point.

See also