
Monitoring in SigNoz

platform v0.9.11 · verified 2026-05-14

SigNoz is the central place to investigate a Delphi deployment. Every service host runs an OpenTelemetry collector that forwards logs, traces, metrics, and health signals to the SigNoz service at 10.0.1.10:4317 (OTLP gRPC) / 10.0.1.10:4318 (OTLP HTTP). The SigNoz UI is exposed through the Web service's Caddy proxy at https://${DOMAIN_SIGNOZ}.
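When telemetry looks absent, the first question is whether the intake endpoints are even reachable from the host in question. The sketch below is a generic TCP reachability probe against the two OTLP ports named above; it checks connectivity only, not that OTLP payloads are accepted.

```python
import socket

SIGNOZ_HOST = "10.0.1.10"          # SigNoz OTLP intake (from this page)
OTLP_PORTS = {"grpc": 4317, "http": 4318}

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for proto, port in OTLP_PORTS.items():
        state = "reachable" if port_open(SIGNOZ_HOST, port) else "UNREACHABLE"
        print(f"OTLP {proto} {SIGNOZ_HOST}:{port} -> {state}")
```

An unreachable port here points at networking or the SigNoz stack itself; a reachable port with no data points at the local collector's export pipeline.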

For running the SigNoz backend itself, use SigNoz operations. This page is about using SigNoz once telemetry is flowing.

What lands in SigNoz

| Signal | Examples | Typical use |
| --- | --- | --- |
| Traces | TelAPI WebSocket lifecycle, TelPhi conversation spans, provider calls, flow execution | Follow a request or call end-to-end. |
| Logs | Application logs, tasker jobs, API errors, multiline Kamailio / RTPEngine / Janus output | Search for failures by service, trace ID, or call ID. |
| Metrics | Container CPU / memory, request rates, error rates, collector health, ClickHouse storage | Dashboards and alerts. |
| SIP / call spans | Kamailio SIP ladder, RTPEngine media session events, Asterisk / TelSys channel handling | Debug call setup, teardown, routing, and no-audio cases. |

Kamailio, RTPEngine, and Janus logs are converted into OTLP spans by the log-to-span sidecar on TelPro. Voice-side call handling lands through Voice collectors.
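The log-to-span conversion can be pictured as a small parser: pull the call identifier out of each raw log line and attach it as a span attribute so the line becomes searchable by `call_id`. The line format and helper below are purely illustrative assumptions; the real sidecar on TelPro parses actual Kamailio / RTPEngine / Janus output.

```python
import re
from typing import Optional

# Hypothetical format for illustration: lines embedding "call_id=<value>".
CALL_ID_RE = re.compile(r"call_id=([\w.@-]+)")

def line_to_span_attrs(line: str, service: str) -> Optional[dict]:
    """Map one log line to OTLP-style span attributes, or None if no call ID."""
    m = CALL_ID_RE.search(line)
    if m is None:
        return None
    return {
        "service.name": service,   # which component emitted the line
        "call_id": m.group(1),     # the correlation key used throughout SigNoz
        "log.body": line.strip(),  # original line, kept for context
    }
```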

Dashboards

Custom Delphi dashboards ship with the deployment bundle under the SigNoz service's signoz/dashboards/ directory. Import them in SigNoz via Settings > Dashboards > Import.

| Dashboard | File | Answers |
| --- | --- | --- |
| Delphi Overview | `voiceai-overview.json` | Is the platform healthy? Are calls flowing? Are errors rising? |
| Call Statistics | `voiceai-call-statistics.json` | Call volume, duration, success / failure, traffic trends. |
| Containers | `voiceai-containers.json` | CPU, memory, restarts, and container-level saturation. |
| Infrastructure | `voiceai-infrastructure.json` | Disk, network, CPU per host. |

The file names keep the legacy `voiceai-*` prefix that ships in the bundle; the dashboard titles themselves appear as Delphi in the SigNoz UI.

Alerts worth creating

Start with a small set of noisy-but-actionable alerts, then tune thresholds per deployment:

| Area | Alert idea | Why it matters |
| --- | --- | --- |
| Telemetry intake | No spans or logs received from a service for 5-10 minutes. | The service may be down, or the local OTel collector cannot reach SigNoz. |
| Call failures | Error rate or failed-call count exceeds a threshold. | Catches carrier, TelPro, Voice, and provider regressions. |
| Web / API | 5xx responses or request latency spike. | User-visible outage signal. |
| Voice | TelPhi provider latency or error spans spike. | AI provider or proxy path degraded. |
| Containers | Restart count increases, CPU / memory stays high. | Service instability before users report symptoms. |
| Storage | ClickHouse or data volume above 80%. | SigNoz retention will fail or queries will degrade. |
| Collectors | Collector export errors to 10.0.1.10:4317. | Telemetry gap: local debugging may be needed. |
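The "telemetry intake" alert reduces to a last-seen check: flag any service whose most recent span or log is older than the threshold. A minimal sketch, assuming you already have per-service last-seen timestamps (e.g. from a SigNoz metrics query):

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

def silent_services(last_seen: Dict[str, datetime],
                    threshold: timedelta = timedelta(minutes=10),
                    now: Optional[datetime] = None) -> List[str]:
    """Return services whose newest span/log is older than the threshold."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_seen.items() if now - ts > threshold)
```

Tune the threshold per deployment: low-traffic tenants may legitimately go quiet for longer than ten minutes.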


Investigating a call

Every call surfaces three correlated identifiers:

| Identifier | Where it lives | Used to find |
| --- | --- | --- |
| Trace ID | SigNoz, TelPhi logs, TelWeb conversation row | Full distributed trace across TelAPI / TelPhi / TelSys / Kamailio. |
| Call ID | Kamailio / Asterisk / TelPhi | SIP ladder, RTPEngine sessions, ARI events. |
| Conversation ID | Postgres Conversation row, TelWeb URL | Persisted timeline, transcripts, QA scoring. |

In v0.9.11 these identifiers are available on the TelWeb conversation detail page.

TelWeb call detail tabs

| Tab | Shows |
| --- | --- |
| Debug | Logs, span tree, and SIP ladder for the trace. Logs query the SigNoz Logs API filtered to `trace_id = <traceId>`. |
| Timeline | Channel events ordered by `ChannelMessage.timestamp`. Chat, browser actions, audio segments, control. |
| Flow run | Per-node execution of the Flow Builder graph with inputs / outputs at each step. |
| QA | QA scoring results (enqueued on flow finalisation / hangup). |
| Token | Token usage breakdown by provider. |
| Action | Browser action invocations and their results. |

Query patterns

Use these as starting filters in SigNoz Logs / Traces:

```text
service.name = "telphi"    AND trace_id = "<id>"   # Full conversation trace
service.name = "telapi"    AND trace_id = "<id>"   # Channel WebSocket lifecycle
service.name = "kamailio"  AND call_id = "<id>"    # SIP ladder for the call
service.name = "rtpengine" AND call_id = "<id>"    # Media session
service.name = "telsys"    AND call_id = "<id>"    # Asterisk channel handling
```
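During an incident it helps to generate the full filter set from the identifiers in hand rather than retyping them. This hypothetical helper (not part of the platform) emits the filters above, keyed by service:

```python
from typing import Dict

# Services queried by trace ID vs. by SIP call ID, per the patterns above.
TRACE_SERVICES = ("telphi", "telapi")
CALL_SERVICES = ("kamailio", "rtpengine", "telsys")

def trace_filters(trace_id: str) -> Dict[str, str]:
    """SigNoz filter strings for trace-scoped services."""
    return {s: f'service.name = "{s}" AND trace_id = "{trace_id}"'
            for s in TRACE_SERVICES}

def call_filters(call_id: str) -> Dict[str, str]:
    """SigNoz filter strings for call-scoped (SIP/media) services."""
    return {s: f'service.name = "{s}" AND call_id = "{call_id}"'
            for s in CALL_SERVICES}
```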

Common investigations

| Question | Start in SigNoz / TelWeb | Then check |
| --- | --- | --- |
| Why did this call hang up? | TelWeb conversation > Debug > span tree. | TelPro / Voice troubleshooting if the terminating span points at SIP or Asterisk. |
| Why is there no audio? | Debug > SIP ladder for the call_id; RTPEngine spans. | TelPro no-audio matrix and RTP firewall ranges. |
| Which provider was slow? | Token + Debug logs filtered to `service.name = telphi`. | Voice logs if provider spans are missing. |
| Did billing or subscription checks fire? | Flow/API spans and TelAPI logs. | Ops / Tasker logs for background reconciliation. |
| Was QA scored? | QA tab and Tasker spans. | Ops if qaScoring jobs are absent. |
| Why is telemetry missing? | Check whether all services stopped reporting at once or only one host is missing. | Instance and container debugging and SigNoz operations. |

When to debug on the instance

Use SigNoz first when telemetry is present. Drop to the host or container when:

- a service or host has stopped sending telemetry entirely;
- Docker health checks are failing but no fresh logs appear in SigNoz;
- the local OTel collector cannot export to 10.0.1.10:4317;
- startup fails before the collector is running;
- you need to validate local files, mounts, ports, or docker compose state.

The shared command set lives in Instance and container debugging. Service-specific symptoms stay in each Operations page's Troubleshooting tab.

Stitching a retained trace into a conversation

When a known-good trace has aged out of a tenant's view but is still retained in SigNoz, you can re-link a Conversation row to that trace ID to make the Debug tab usable for support sessions:

```sql
UPDATE "Conversation"
SET "traceId" = '<retained-trace-id>'
WHERE "id" = '<conversation-id>';

The mapping is informational only; no other data moves. Use it during outage post-mortems or paired support sessions.
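If you script this, run the statement with bound parameters rather than interpolating the IDs into the SQL. The sketch below uses an in-memory SQLite table purely as a stand-in for the Postgres Conversation table to show the parameterized shape; against the real database you would run the same statement through your Postgres client (where `?` placeholders become `%s`).

```python
import sqlite3

# Stand-in schema for illustration only; the real table lives in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Conversation" ("id" TEXT PRIMARY KEY, "traceId" TEXT)')
conn.execute('INSERT INTO "Conversation" VALUES (?, ?)', ("conv-1", None))

def relink_trace(conn, conversation_id: str, trace_id: str) -> int:
    """Point a Conversation row at a retained trace ID; returns rows changed."""
    cur = conn.execute(
        'UPDATE "Conversation" SET "traceId" = ? WHERE "id" = ?',
        (trace_id, conversation_id),
    )
    conn.commit()
    return cur.rowcount  # 0 means the conversation ID did not match a row
```

A return value of 0 is worth checking for: it usually means a typo in the conversation ID rather than a successful no-op.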
