# Monitoring in SigNoz
SigNoz is the central place to investigate a Delphi deployment. Every service host runs an OpenTelemetry Collector that forwards logs, traces, metrics, and health signals to the SigNoz service at 10.0.1.10:4317 (OTLP gRPC) or 10.0.1.10:4318 (OTLP HTTP). The SigNoz UI is exposed through the Web service's Caddy proxy at https://${DOMAIN_SIGNOZ}.
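If you need to confirm a host can reach the SigNoz intake at all, a quick smoke test from the service host is enough. This is a minimal sketch: it assumes `nc` is installed on the host, and relies on the fact that an empty JSON body is a valid no-op OTLP/HTTP export.

```sh
# Can this host reach the SigNoz OTLP endpoints? Run from the service host.
nc -vz 10.0.1.10 4317   # OTLP gRPC port reachable?

# An empty JSON body is a valid no-op OTLP/HTTP export; expect HTTP 200.
curl -s -o /dev/null -w 'traces endpoint: %{http_code}\n' \
  -X POST http://10.0.1.10:4318/v1/traces \
  -H 'Content-Type: application/json' -d '{}'
```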
For running the SigNoz backend itself, see SigNoz operations. This page covers using SigNoz once telemetry is flowing.
## What lands in SigNoz
| Signal | Examples | Typical use |
|---|---|---|
| Traces | TelAPI WebSocket lifecycle, TelPhi conversation spans, provider calls, flow execution | Follow a request or call end-to-end. |
| Logs | Application logs, tasker jobs, API errors, multiline Kamailio / RTPEngine / Janus output | Search for failures by service, trace ID, or call ID. |
| Metrics | Container CPU / memory, request rates, error rates, collector health, ClickHouse storage | Dashboards and alerts. |
| SIP / call spans | Kamailio SIP ladder, RTPEngine media session events, Asterisk / TelSys channel handling | Debug call setup, teardown, routing, and no-audio cases. |
Kamailio, RTPEngine, and Janus logs are converted into OTLP spans by the log-to-span sidecar on TelPro; voice-side call handling arrives through the Voice collectors.
## Dashboards
Custom Delphi dashboards ship with the deployment bundle under the SigNoz service's `signoz/dashboards/` directory. Import them in SigNoz via Settings > Dashboards > Import.
| Dashboard | File | Answers |
|---|---|---|
| Delphi Overview | `voiceai-overview.json` | Is the platform healthy? Are calls flowing? Are errors rising? |
| Call Statistics | `voiceai-call-statistics.json` | Call volume, duration, success / failure, traffic trends. |
| Containers | `voiceai-containers.json` | CPU, memory, restarts, and container-level saturation. |
| Infrastructure | `voiceai-infrastructure.json` | Disk, network, CPU per host. |
The file names keep the legacy `voiceai-*` prefix that ships in the bundle; the dashboard titles appear as Delphi in the SigNoz UI.
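If you prefer scripting the import over clicking through the UI, a loop like the following is a reasonable sketch. Both the `/api/v1/dashboards` endpoint and the `SIGNOZ_API_TOKEN` bearer auth are assumptions; verify them against your SigNoz version's API documentation before relying on this.

```sh
# Bulk-import the bundled dashboards. The endpoint and the token variable
# are assumptions -- confirm against your SigNoz version's API docs.
for f in signoz/dashboards/voiceai-*.json; do
  curl -s -X POST "https://${DOMAIN_SIGNOZ}/api/v1/dashboards" \
    -H "Authorization: Bearer ${SIGNOZ_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary "@${f}" \
    && echo "imported ${f}"
done
```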
## Alerts worth creating
Start with a small set of loud but actionable alerts, then tune thresholds per deployment:
| Area | Alert idea | Why it matters |
|---|---|---|
| Telemetry intake | No spans or logs received from a service for 5-10 minutes. | The service may be down, or the local OTel collector cannot reach SigNoz. |
| Call failures | Error rate or failed-call count exceeds a threshold. | Catches carrier, TelPro, Voice, and provider regressions. |
| Web / API | 5xx responses or request latency spike. | User-visible outage signal. |
| Voice | TelPhi provider latency or error spans spike. | AI provider or proxy path degraded. |
| Containers | Restart count increases, CPU / memory stays high. | Service instability before users report symptoms. |
| Storage | ClickHouse or data volume above 80%. | SigNoz retention will fail or queries will degrade. |
| Collectors | Collector export errors to 10.0.1.10:4317. | Telemetry gap: local debugging may be needed. |
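The Storage and Collectors rows are easy to spot-check by hand before the alerts exist. The container names below (`signoz-clickhouse`, `otel-collector`) are assumptions; match them to your compose files. `system.disks` is a standard ClickHouse system table.

```sh
# Storage: how full are the ClickHouse disks? Run on the SigNoz host.
docker exec signoz-clickhouse clickhouse-client -q \
  "SELECT name,
          formatReadableSize(free_space)  AS free,
          formatReadableSize(total_space) AS total
   FROM system.disks"

# Collectors: any recent export failures from a host's local collector?
docker logs --since 15m otel-collector 2>&1 \
  | grep -iE 'exporter|failed to (push|export)' | tail -n 20
```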
## Investigating a call
Every call surfaces three correlated identifiers:
| Identifier | Where it lives | Used to find |
|---|---|---|
| Trace ID | SigNoz, TelPhi logs, TelWeb conversation row | Full distributed trace across TelAPI / TelPhi / TelSys / Kamailio. |
| Call ID | Kamailio / Asterisk / TelPhi | SIP ladder, RTPEngine sessions, ARI events. |
| Conversation ID | Postgres Conversation row, TelWeb URL | Persisted timeline, transcripts, QA scoring. |
As of v0.9.11, all three identifiers are surfaced on the TelWeb conversation detail page.
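When you only have a conversation ID, the trace ID can be pulled straight from Postgres. The `DATABASE_URL` connection string is illustrative; the table and column names follow the Conversation row described above.

```sh
# Resolve a conversation ID to its trace ID (DATABASE_URL is illustrative).
psql "$DATABASE_URL" -c \
  "SELECT \"id\", \"traceId\" FROM \"Conversation\" WHERE \"id\" = '<conversation-id>';"
```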
### TelWeb call detail tabs
| Tab | Shows |
|---|---|
| Debug | Logs, span tree, and SIP ladder for the trace. Logs query the SigNoz Logs API filtered to `trace_id = <traceId>`. |
| Timeline | Channel events ordered by ChannelMessage.timestamp. Chat, browser actions, audio segments, control. |
| Flow run | Per-node execution of the Flow Builder graph with inputs / outputs at each step. |
| QA | QA scoring results (enqueued on flow finalisation / hangup). |
| Token | Token usage breakdown by provider. |
| Action | Browser action invocations and their results. |
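The Debug tab's log query can be reproduced by hand when you need the raw records. The sketch below is heavily hedged: SigNoz's logs query API has changed across versions, so treat the path and query syntax as assumptions and check your deployment's API docs first.

```sh
# Sketch only: fetch the Debug tab's logs straight from SigNoz. Every
# detail here (path, query syntax, token) is an assumption -- verify
# against your SigNoz version before relying on it.
curl -s "https://${DOMAIN_SIGNOZ}/api/v1/logs?q=trace_id%20in%20('<traceId>')&limit=100" \
  -H "Authorization: Bearer ${SIGNOZ_API_TOKEN}"
```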
### Query patterns
Use these as starting filters in SigNoz Logs / Traces:
```
service.name = "telphi"    AND trace_id = "<id>"   # Full conversation trace
service.name = "telapi"    AND trace_id = "<id>"   # Channel WebSocket lifecycle
service.name = "kamailio"  AND call_id = "<id>"    # SIP ladder for the call
service.name = "rtpengine" AND call_id = "<id>"    # Media session
service.name = "telsys"    AND call_id = "<id>"    # Asterisk channel handling
```
### Common investigations
| Question | Start in SigNoz / TelWeb | Then check |
|---|---|---|
| Why did this call hang up? | TelWeb conversation > Debug > span tree. | TelPro / Voice troubleshooting if the terminating span points at SIP or Asterisk. |
| Why is there no audio? | Debug > SIP ladder for the call_id; RTPEngine spans. | TelPro no-audio matrix and RTP firewall ranges. |
| Which provider was slow? | Token + Debug logs filtered to service.name = telphi. | Voice logs if provider spans are missing. |
| Did billing or subscription checks fire? | Flow/API spans and TelAPI logs. | Ops / Tasker logs for background reconciliation. |
| Was QA scored? | QA tab and Tasker spans. | Ops if qaScoring jobs are absent. |
| Why is telemetry missing? | Check whether all services stopped reporting at once or only one host is missing. | Instance and container debugging and SigNoz operations. |
## When to debug on the instance
Use SigNoz first when telemetry is present. Drop to the host or container when:
- a service or host has stopped sending telemetry entirely;
- Docker health checks are failing but no fresh logs appear in SigNoz;
- the local OTel collector cannot export to 10.0.1.10:4317;
- startup fails before the collector is running;
- you need to validate local files, mounts, ports, or `docker compose` state.
The shared command set lives in Instance and container debugging. Service-specific symptoms stay in each Operations page's Troubleshooting tab.
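A minimal first pass on the host usually looks like this; the service placeholder and port details are illustrative, and the full command set lives in Instance and container debugging.

```sh
# Container state and health on the host
docker compose ps

# Recent local logs for one service (replace <service>)
docker compose logs --since 10m <service>

# Is the local collector listening on its OTLP receiver ports?
ss -ltnp | grep -E ':(4317|4318)'
```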
## Stitching a retained trace into a conversation
When a known-good trace has aged out of a tenant's view but is still retained in SigNoz, you can re-link a Conversation row to that trace ID to make the Debug tab usable for support sessions:
```sql
UPDATE "Conversation"
SET "traceId" = '<retained-trace-id>'
WHERE "id" = '<conversation-id>';
```
The mapping is informational; no other data moves. Use it during outage post-mortems or paired support sessions.
## See also
- SigNoz operations — run and repair the SigNoz backend.
- Instance and container debugging — SSH / Docker checks when telemetry is missing.
- TelPro operations — SIP, RTP, Janus, TURN.
- Voice operations — TelPhi and Asterisk call handling.