Ops service operations

platform v0.9.11 · verified 2026-05-14

The Ops service runs infrastructure management and background processing.

  • Scaler — autoscales API and Voice instances based on utilization metrics. Provider-agnostic via per-provider shell scripts (scaleUp.sh, scaleDown.sh, scalingStatus.sh).
  • Tasker — runs scheduled and queued background tasks (DB backups, maintenance, recording processing, email notifications) on a Redis-backed job queue.

Both components use Redis-based leader election with separate keys, so each component has exactly one active leader processing at a time.
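The leader-election pattern can be sketched as follows. This is a minimal simulation of the common Redis `SET key value NX PX <ttl>` lock idiom, with an in-memory class standing in for Redis so the example is self-contained; the key names and TTL are assumptions, not the service's actual values:

```python
import time

class FakeRedis:
    """In-memory stand-in for Redis SET ... NX PX <ttl>."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set_nx_px(self, key: str, value: str, ttl_ms: int) -> bool:
        now = time.monotonic()
        current = self.store.get(key)
        if current is not None and current[1] > now:
            return False  # lock still held by another instance
        self.store[key] = (value, now + ttl_ms / 1000)
        return True

def try_become_leader(redis: FakeRedis, component: str, instance_id: str,
                      ttl_ms: int = 10_000) -> bool:
    # Separate keys per component: the Scaler and Tasker leaders
    # are elected independently.
    return redis.set_nx_px(f"leader:{component}", instance_id, ttl_ms)

r = FakeRedis()
print(try_become_leader(r, "scaler", "scaler-a"))  # True: lock acquired
print(try_become_leader(r, "scaler", "scaler-b"))  # False: scaler-a leads
print(try_become_leader(r, "tasker", "tasker-a"))  # True: separate key
```

A real implementation would also renew the TTL while leading and step down cleanly on shutdown.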

Containers

Container                 Base                                          Purpose
voiceai-scaler            Node 24-alpine                                Distributed scaling orchestrator
voiceai-tasker            Node 24-alpine                                Background job + cron runner
voiceai-otel-collector    otel/opentelemetry-collector-contrib:0.150.1  Telemetry collector

Neither the Scaler nor the Tasker exposes an HTTP health endpoint. Their health is determined by container status and log output.

Scaler decision loop

  1. Every SCALING_EVALUATION_INTERVAL_MS (default 30 s), the leader reads utilization metrics from Redis and compares them against the scaleUpThreshold / scaleDownThreshold and min/max instance bounds from ServerGroup.scalingConfig in Postgres.
  2. On scale up, fetch optional bootstrap secrets from Secrets Manager (secretsName), generate a cloud-init config via generate-cloud-init.sh, and run scaleUp.sh.
  3. Poll scalingStatus.sh until the server reports ready, then wait for a service heartbeat in Redis (~5 min for cloud-init plus container startup).
  4. For Voice, append the new instance to the Kamailio dispatcher set in Redis; for API, the managed load balancer picks it up via labels.
  5. Apply the cooldown, which gates the next scaling decision.
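The threshold-and-cooldown logic in step 1 and step 5 can be sketched as a pure decision function. The field names follow the text above (scaleUpThreshold, scaleDownThreshold, min/max from scalingConfig); the exact config shape and the 0-1 utilization scale are assumptions for illustration:

```python
def decide(utilization: float, instances: int, cfg: dict,
           cooldown_until: float, now: float) -> str:
    """One evaluation pass of the scaling decision loop (sketch)."""
    if now < cooldown_until:
        return "hold"  # cooldown gates the next decision
    if utilization > cfg["scaleUpThreshold"] and instances < cfg["maxInstances"]:
        return "scale_up"
    if utilization < cfg["scaleDownThreshold"] and instances > cfg["minInstances"]:
        return "scale_down"
    return "hold"

cfg = {"scaleUpThreshold": 0.80, "scaleDownThreshold": 0.30,
       "minInstances": 2, "maxInstances": 10}
print(decide(0.92, 4, cfg, cooldown_until=0, now=100))    # scale_up
print(decide(0.92, 4, cfg, cooldown_until=200, now=100))  # hold: cooling down
print(decide(0.10, 2, cfg, cooldown_until=0, now=100))    # hold: at minInstances
```

Keeping the decision pure (metrics and config in, action out) makes the loop easy to test independently of Redis, Postgres, and the provider scripts.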

Scale-down reverses the flow: remove the instance from routing, drain active calls (Voice waits up to 60 min), delete the server, and clean up Redis and Postgres state.
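The ordering of the scale-down flow is the important part, and it can be sketched with hypothetical stand-ins for each step (the function names below are placeholders, not the service's real API; deletion ultimately goes through the provider's scaleDown.sh):

```python
events = []

# Hypothetical stand-ins for the real steps.
def remove_from_routing(s): events.append(("unroute", s))
def drain(s):               events.append(("drain", s))    # Voice: up to 60 min
def delete_server(s):       events.append(("delete", s))   # provider scaleDown.sh
def cleanup_state(s):       events.append(("cleanup", s))  # Redis + Postgres

def scale_down(server: str) -> None:
    # Order matters: stop routing new traffic before draining,
    # drain before deleting, and clean state last so a crash
    # mid-flow leaves recoverable records behind.
    remove_from_routing(server)
    drain(server)
    delete_server(server)
    cleanup_state(server)

scale_down("voice-3")
print([e[0] for e in events])  # ['unroute', 'drain', 'delete', 'cleanup']
```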

See also