# Monitoring in SigNoz
SigNoz is the central place to investigate a Delphi deployment. Every service host runs an OpenTelemetry Collector that forwards logs, traces, metrics, and health signals to the SigNoz service at 10.0.1.10:4317 (OTLP gRPC) or 10.0.1.10:4318 (OTLP HTTP). The SigNoz UI is exposed through the Web service's Caddy proxy at https://${DOMAIN_SIGNOZ}.
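If you need to confirm a host can reach the SigNoz intake at all, a quick smoke test from the service host is enough. This is a minimal sketch: it assumes `nc` is installed on the host, and relies on the fact that an empty JSON body is a valid no-op OTLP/HTTP export.

```sh
# Can this host reach the SigNoz OTLP endpoints? Run from the service host.
nc -vz 10.0.1.10 4317   # OTLP gRPC port reachable?

# An empty JSON body is a valid no-op OTLP/HTTP export; expect HTTP 200.
curl -s -o /dev/null -w 'traces endpoint: %{http_code}\n' \
  -X POST http://10.0.1.10:4318/v1/traces \
  -H 'Content-Type: application/json' -d '{}'
```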
For running the SigNoz backend itself, see SigNoz operations. This page covers using SigNoz once telemetry is flowing.
## What lands in SigNoz
| Signal | Examples | Typical use |
|---|---|---|
| Traces | TelAPI WebSocket lifecycle, TelPhi conversation spans, provider calls, flow execution | Follow a request or call end-to-end. |
| Logs | Application logs, tasker jobs, API errors, multiline Kamailio / RTPEngine / Janus output | Search for failures by service, trace ID, or call ID. |
| Metrics | Container CPU / memory, request rates, error rates, collector health, ClickHouse storage | Dashboards and alerts. |
| SIP / call spans | Kamailio SIP ladder, RTPEngine media session events, Asterisk / TelSys channel handling | Debug call setup, teardown, routing, and no-audio cases. |
Kamailio, RTPEngine, and Janus logs are converted into OTLP spans by the log-to-span sidecar on TelPro; voice-side call handling arrives through the Voice collectors.
## Dashboards
Custom Delphi dashboards ship with the deployment bundle under the SigNoz service's `signoz/dashboards/` directory. Import them in SigNoz via Settings > Dashboards > Import.
| Dashboard | File | Answers |
|---|---|---|
| Delphi Overview | `voiceai-overview.json` | Is the platform healthy? Are calls flowing? Are errors rising? |
| Call Statistics | `voiceai-call-statistics.json` | Call volume, duration, success / failure, traffic trends. |
| Containers | `voiceai-containers.json` | CPU, memory, restarts, and container-level saturation. |
| Infrastructure | `voiceai-infrastructure.json` | Disk, network, CPU per host. |
The file names keep the legacy `voiceai-*` prefix that ships in the bundle; the dashboard titles appear as Delphi in the SigNoz UI.
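If you prefer scripting the import over clicking through the UI, a loop like the following is a reasonable sketch. Both the `/api/v1/dashboards` endpoint and the `SIGNOZ_API_TOKEN` bearer auth are assumptions; verify them against your SigNoz version's API documentation before relying on this.

```sh
# Bulk-import the bundled dashboards. The endpoint and the token variable
# are assumptions -- confirm against your SigNoz version's API docs.
for f in signoz/dashboards/voiceai-*.json; do
  curl -s -X POST "https://${DOMAIN_SIGNOZ}/api/v1/dashboards" \
    -H "Authorization: Bearer ${SIGNOZ_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary "@${f}" \
    && echo "imported ${f}"
done
```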
## Alerts worth creating
Start with a small set of loud but actionable alerts, then tune thresholds per deployment:
| Area | Alert idea | Why it matters |
|---|---|---|
| Telemetry intake | No spans or logs received from a service for 5-10 minutes. | The service may be down, or the local OTel collector cannot reach SigNoz. |
| Call failures | Error rate or failed-call count exceeds a threshold. | Catches carrier, TelPro, Voice, and provider regressions. |
| Web / API | 5xx responses or request latency spike. | User-visible outage signal. |
| Voice | TelPhi provider latency or error spans spike. | AI provider or proxy path degraded. |
| Containers | Restart count increases, CPU / memory stays high. | Service instability before users report symptoms. |
| Storage | ClickHouse or data volume above 80%. | SigNoz retention will fail or queries will degrade. |
| Collectors | Collector export errors to 10.0.1.10:4317. | Telemetry gap: local debugging may be needed. |
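The Storage and Collectors rows are easy to spot-check by hand before the alerts exist. The container names below (`signoz-clickhouse`, `otel-collector`) are assumptions; match them to your compose files. `system.disks` is a standard ClickHouse system table.

```sh
# Storage: how full are the ClickHouse disks? Run on the SigNoz host.
docker exec signoz-clickhouse clickhouse-client -q \
  "SELECT name,
          formatReadableSize(free_space)  AS free,
          formatReadableSize(total_space) AS total
   FROM system.disks"

# Collectors: any recent export failures from a host's local collector?
docker logs --since 15m otel-collector 2>&1 \
  | grep -iE 'exporter|failed to (push|export)' | tail -n 20
```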
## Investigating a call
Every call surfaces three correlated identifiers:
| Identifier | Where it lives | Used to find |
|---|---|---|
| Trace ID | SigNoz, TelPhi logs, TelWeb conversation row | Full distributed trace across TelAPI / TelPhi / TelSys / Kamailio. |
| Call ID | Kamailio / Asterisk / TelPhi | SIP ladder, RTPEngine sessions, ARI events. |
| Conversation ID | Postgres Conversation row, TelWeb URL | Persisted timeline, transcripts, QA scoring. |
As of v0.9.11, all three identifiers are surfaced on the TelWeb conversation detail page.
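When you only have a conversation ID, the trace ID can be pulled straight from Postgres. The `DATABASE_URL` connection string is illustrative; the table and column names follow the Conversation row described above.

```sh
# Resolve a conversation ID to its trace ID (DATABASE_URL is illustrative).
psql "$DATABASE_URL" -c \
  "SELECT \"id\", \"traceId\" FROM \"Conversation\" WHERE \"id\" = '<conversation-id>';"
```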
### TelWeb call detail tabs
| Tab | Shows |
|---|---|
| Debug | Logs, span tree, and SIP ladder for the trace. Logs query the SigNoz Logs API filtered to `trace_id = <traceId>`. |
| Timeline | Channel events ordered by ChannelMessage.timestamp. Chat, browser actions, audio segments, control. |
| Flow run | Per-node execution of the Flow Builder graph with inputs / outputs at each step. |
| QA | QA scoring results (enqueued on flow finalisation / hangup). |
| Token | Token usage breakdown by provider. |
| Action | Browser action invocations and their results. |
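The Debug tab's log query can be reproduced by hand when you need the raw records. The sketch below is heavily hedged: SigNoz's logs query API has changed across versions, so treat the path and query syntax as assumptions and check your deployment's API docs first.

```sh
# Sketch only: fetch the Debug tab's logs straight from SigNoz. Every
# detail here (path, query syntax, token) is an assumption -- verify
# against your SigNoz version before relying on it.
curl -s "https://${DOMAIN_SIGNOZ}/api/v1/logs?q=trace_id%20in%20('<traceId>')&limit=100" \
  -H "Authorization: Bearer ${SIGNOZ_API_TOKEN}"
```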
### Query patterns
Use these as starting filters in SigNoz Logs / Traces:
```
service.name = "telphi"    AND trace_id = "<id>"   # Full conversation trace
service.name = "telapi"    AND trace_id = "<id>"   # Channel WebSocket lifecycle
service.name = "kamailio"  AND call_id = "<id>"    # SIP ladder for the call
service.name = "rtpengine" AND call_id = "<id>"    # Media session
service.name = "telsys"    AND call_id = "<id>"    # Asterisk channel handling
```
### Common investigations
| Question | Start in SigNoz / TelWeb | Then check |
|---|---|---|
| Why did this call hang up? | TelWeb conversation > Debug > span tree. | TelPro / Voice troubleshooting if the terminating span points at SIP or Asterisk. |
| Why is there no audio? | Debug > SIP ladder for the call_id; RTPEngine spans. | TelPro no-audio matrix and RTP firewall ranges. |
| Which provider was slow? | Token + Debug logs filtered to service.name = telphi. | Voice logs if provider spans are missing. |
| Did billing or subscription checks fire? | Flow/API spans and TelAPI logs. | Ops / Tasker logs for background reconciliation. |
| Was QA scored? | QA tab and Tasker spans. | Ops if qaScoring jobs are absent. |
| Why is telemetry missing? | Check whether all services stopped reporting at once or only one host is missing. | Instance and container debugging and SigNoz operations. |
## When to debug on the instance
Use SigNoz first when telemetry is present. Drop to the host or container when:
- a service or host has stopped sending telemetry entirely;
- Docker health checks are failing but no fresh logs appear in SigNoz;
- the local OTel collector cannot export to 10.0.1.10:4317;
- startup fails before the collector is running;
- you need to validate local files, mounts, ports, or `docker compose` state.
The shared command set lives in Instance and container debugging. Service-specific symptoms stay in each Operations page's Troubleshooting tab.
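A minimal first pass on the host usually looks like this; the service placeholder and port details are illustrative, and the full command set lives in Instance and container debugging.

```sh
# Container state and health on the host
docker compose ps

# Recent local logs for one service (replace <service>)
docker compose logs --since 10m <service>

# Is the local collector listening on its OTLP receiver ports?
ss -ltnp | grep -E ':(4317|4318)'
```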
## Stitching a retained trace into a conversation
When a known-good trace has aged out of a tenant's view but is still retained in SigNoz, you can re-link a Conversation row to that trace ID to make the Debug tab usable for support sessions:
```sql
UPDATE "Conversation"
SET "traceId" = '<retained-trace-id>'
WHERE "id" = '<conversation-id>';
```
The mapping is informational; no other data moves. Use it during outage post-mortems or paired support sessions.
## See also
- SigNoz operations — run and repair the SigNoz backend.
- Instance and container debugging — SSH / Docker checks when telemetry is missing.
- TelPro operations — SIP, RTP, Janus, TURN.
- Voice operations — TelPhi and Asterisk call handling.