SIP Telephony — Zero to Hero v1 · 2026-04-25
PART 06 · STEPS 76–85

Production Concerns

Once calls start connecting, an entirely different family of bugs starts to matter: caller-ID attestation, 60-second drops, light-mode proxies that let the ACK bypass them, retransmit timers, NAT topology, encryption boundaries, trunk concurrency caps, codec CPU, CDR plumbing, and KVKK-grade recording compliance. This part walks the production landscape using the real Garanti M08 incident and the 24 April 2026 KKB outbound test as scaffolding.

10 steps · 3 demos · ~22 minutes
STEP 76 · STIR/SHAKEN

STIR/SHAKEN — caller-ID attestation

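Anti-spoofing for North American calls. The originating carrier's signing service (STI-AS) signs a PASSporT, a JWT defined in RFC 8225, asserting "yes, this caller-ID belongs to us", and carries it in the SIP Identity header. The terminating carrier's verification service (STI-VS) checks the signature against the signer's public certificate; if verification fails, the terminating carrier may label the call as suspected spam or block it before it reaches the endpoint.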

Figure: Caller (+1 415 555 0102) → originating carrier (STI-AS signs the Identity JWT with its private key, attest A/B/C) → SIP / IMS carrying "Identity: ..." → terminating carrier (STI-VS verifies the signature with the public cert) → callee sees "Verified".

In our config, the warning "Endpoint dyn-…: Option 'stir_shaken' is no longer supported. Use 'stir_shaken_profile' instead." is cosmetic: Asterisk 22.8.2 deprecated the old syntax. To silence it, replace stir_shaken with stir_shaken_profile in the endpoint block; the new option names a profile defined in stir_shaken.conf.

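We do not currently sign or verify STIR/SHAKEN on outbound TR calls. If we expand to US calls, this becomes mandatory. The attestation levels mean: A (full), the signer vouches for both the customer originating the call and its right to use that caller-ID; B (partial), the signer knows the call origin but cannot vouch for the number; C (gateway), the signer merely received the call, for example from an international gateway, and is relaying it.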
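To make the moving parts concrete, here is a minimal sketch of what an STI-AS places in the Identity header: a PASSporT (RFC 8225) with the SHAKEN extension (RFC 8588), whose attest claim carries the A/B/C level. It assumes the ES256 signing step happens elsewhere (it needs a real private key and is omitted here), and the certificate URL, destination number, and origid are placeholders, not our config.

```python
import base64
import json
import time
import uuid

ATTEST_LEVELS = {"A", "B", "C"}  # full, partial, gateway


def b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWT compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Inverse of b64url: restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def compact_json(obj: dict) -> bytes:
    """Deterministic JSON: no whitespace, keys sorted."""
    return json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8")


def unsigned_shaken_passport(orig_tn, dest_tn, attest, x5u, origid=None, iat=None):
    """Return 'header.payload' of a SHAKEN PASSporT. The ES256 signature is NOT computed."""
    if attest not in ATTEST_LEVELS:
        raise ValueError(f"attest must be one of {sorted(ATTEST_LEVELS)}")
    header = {"alg": "ES256", "typ": "passport", "ppt": "shaken", "x5u": x5u}
    payload = {
        "attest": attest,
        "dest": {"tn": [dest_tn.lstrip("+")]},  # TNs are written without the leading '+'
        "iat": int(time.time()) if iat is None else iat,
        "orig": {"tn": orig_tn.lstrip("+")},
        "origid": origid or str(uuid.uuid4()),
    }
    return f"{b64url(compact_json(header))}.{b64url(compact_json(payload))}"


def read_attest(token: str) -> str:
    """Read the attestation level. A real STI-VS verifies the signature first."""
    payload = json.loads(b64url_decode(token.split(".")[1]))
    return payload["attest"]
```

For example, unsigned_shaken_passport("+14155550102", "+14155550199", "B", "https://cert.example.org/passport.pem") produces a token whose attest reads back as "B". The signature is deliberately absent: reading attest from an unverified token is exactly what a verifier must never trust.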

STEP 77 · 60-SECOND DROP

The 60-second drop

When a call drops at almost exactly 60 seconds, the cause is missing RTP, not the timer. The timer (rtptimeout=60, spelled rtp_timeout on PJSIP endpoints) is doing its job. Find the missing ACK, the missing route, or the missing firewall rule.

The Garanti M08 6-hypothesis trail is the canonical example of this symptom — six different ways to cause one drop pattern. Walk through it below.
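Why "exactly 60 s" points at media: an inactivity timer like rtptimeout fires a fixed interval after the last RTP packet it saw. The sketch below is a simplified model, not Asterisk's code, assuming the timer is armed at answer and re-armed by every received packet. If no RTP ever arrives, the hangup lands at answer + timeout, which is the suspiciously round number you see in the CDR.

```python
def rtp_timeout_hangup(answer_s, packet_times_s, timeout_s=60.0, call_end_s=None):
    """When would an RTP inactivity timer hang up? Returns None if it never fires.

    Simplified model: the timer is armed at answer and re-armed by every RTP
    packet received after answer; it fires once timeout_s passes with no packet.
    """
    last_seen = answer_s
    for t in sorted(packet_times_s):
        if t < answer_s:
            continue  # ignore early media before answer in this model
        if t - last_seen >= timeout_s:
            break  # the timer already fired before this packet arrived
        last_seen = t
    deadline = last_seen + timeout_s
    if call_end_s is not None and call_end_s < deadline:
        return None  # the call ended normally before the timer fired
    return deadline
```

With answer at 0 and no packets at all, rtp_timeout_hangup(0.0, []) returns 60.0. Audio that flows for 30 s and then stops gives 90.0 instead, so a drop pinned to 60 s after answer suggests no RTP was received after answer at all.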

Demo 1 — Garanti M08: the 60-second drop incident, 2026

Every outbound call to Garanti's VVB dropped at exactly 60 s. Six hypotheses were on the table; each card lists what was tested, the evidence, and the verdict. One of them is right.

H1 · Firewall ACL gap (5060 / RTP)
Tested: Splunk capture on Garanti's perimeter firewall during a call.
Evidence: UDP 5060 INVITE/200 OK observed. RTP 10000–10499 flowed in both directions. No drops at the firewall.
Verdict: Ruled out for the proxy path. The direct UA-to-UA path was not yet tested.
H2 · NAT / address rewrite
Tested: Inspected the SDP c= line and packet source IPs at both ends.
Evidence: Pure IP routing on Garanti's MPLS, with no NAT in the path. SDP IPs matched packet source IPs.
Verdict: Ruled out. Symmetric NAT not present.
H3 · Light-mode proxy ACK bypass
Tested: Captured the INVITE on KASP, then watched for the ACK on KASP and on the direct VVB↔Asterisk path.
Evidence: INVITE traversed KASP (10.231.16.153). 200 OK came back through KASP. ACK never reached Asterisk on either path. KASP added no Record-Route header, so it was not part of the dialog's route set.
Verdict: Confirmed root cause.
H4 · Codec / transport mismatch
Tested: Compared the SDP offer vs answer; checked transport on both 5060 UDP and 5060 TCP.
Evidence: ulaw negotiated cleanly. RTP audio actually streamed for the first 60 s, which proves the codec is fine.
Verdict: Ruled out. A codec mismatch fails at call setup (typically 488 Not Acceptable Here), at second 0, not at 60.
H5 · identify match miss
Tested: Verified that match=10.231.16.153/32 in [providers-identify] resolved to KASP.
Evidence: pjsip show endpoints in Asterisk classified the inbound INVITE under the providers endpoint. Auth passed.
Verdict: Ruled out. The INVITE was accepted and dialed; there was just no ACK afterwards.
H6 · Header parse failure
Tested: Compared raw INVITE bytes with Asterisk's parsed dialog state.
Evidence: All custom headers ASCII. No silent drops. PJSIP parsed the INVITE without error and the dialplan ran.
Verdict: Ruled out. (Not the same trap as Anadolu A4.)
Root cause: H3. KASP ran in light mode and inserted no Record-Route, so the ACK went directly UA-to-UA, where no firewall rule allowed 5060. The fix was a one-line ACL change opening 5060 on the direct path.
Real incident, Garanti BBVA POC, March–April 2026.
STEP 78 · LIGHT-MODE PROXY

SIP "light mode" proxy

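A SIP proxy in "light mode" forwards the initial INVITE and its responses but does not insert itself into the dialog: it adds no Record-Route header. (The property that matters is record-routing, not statefulness; a stateless proxy can record-route and a stateful one can choose not to.) Consequence: the dialog's route set does not include the proxy, so in-dialog requests such as ACK and BYE bypass it and go end-to-end, UA-to-UA.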

If your firewall opens 5060 only between the proxy and your Asterisk, ACK is lost. This is exactly the Garanti M08 root cause. Fix: open 5060 directly UA-to-UA, in addition to the proxy path.

Record-routing proxy (Record-Route inserted)

Figure: UA-A → Proxy (inserts Record-Route) → UA-B. INVITE, 200 OK, and ACK each take both hops through the proxy; all packets pass through it.
Firewall only needs UA↔Proxy on 5060. Predictable. Kamailio behaves this way when its route block calls record_route().

Light-mode proxy (no Record-Route)

Figure: UA-A → KASP (light mode, no Record-Route) → UA-B. INVITE and 200 OK pass through KASP; the ACK goes directly UA→UA and is BLOCKED. No firewall rule for the direct path → 60 s drop.
Firewall must also allow UA↔UA direct on 5060. The Garanti fix was a one-line ACL change after six hypotheses' worth of investigation.
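The routing rule behind both diagrams fits in a few lines. Per RFC 3261 §12.1.2, the caller (UAC) builds its route set from the Record-Route headers of the 2xx, in reverse order, and sends in-dialog requests such as ACK and BYE to the first entry, or straight to the remote Contact when the set is empty. This sketch assumes loose routing (;lr). The 192.0.2.x addresses and the ALLOWED_5060 firewall table are hypothetical stand-ins; only the KASP address comes from the incident.

```python
import re

KASP = "10.231.16.153"  # light-mode proxy from the Garanti incident
ASTERISK = "192.0.2.10"  # hypothetical Asterisk address
VVB = "192.0.2.20"  # hypothetical VVB address

# Hypothetical firewall: 5060 allowed only between each UA and the proxy.
ALLOWED_5060 = {frozenset({VVB, KASP}), frozenset({KASP, ASTERISK})}


def host_of(uri: str) -> str:
    """Extract the host from a SIP URI such as '<sip:user@192.0.2.10:5060;lr>'."""
    match = re.search(r"sips?:(?:[^@>;]+@)?([^:;>]+)", uri)
    if not match:
        raise ValueError(f"not a SIP URI: {uri!r}")
    return match.group(1)


def ack_next_hop(record_route, remote_contact):
    """Next hop for ACK/BYE at the UAC, assuming loose routing."""
    route_set = list(reversed(record_route))  # the UAC reverses Record-Route
    return host_of(route_set[0] if route_set else remote_contact)


def ack_survives(sender, record_route, remote_contact):
    """Would the firewall let this ACK through on 5060?"""
    hop = ack_next_hop(record_route, remote_contact)
    return frozenset({sender, hop}) in ALLOWED_5060
```

With Record-Route ["<sip:10.231.16.153;lr>"], the VVB's ACK goes to KASP and passes. With an empty Record-Route (light mode), it goes to Asterisk's Contact, which is the pair the firewall never opened.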
Try this on KKB
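$ ssh freya@192.168.35.197 "docker exec freya-asterisk asterisk -rx 'pjsip show contacts'"
$ ssh freya@192.168.35.197 "docker exec freya-asterisk asterisk -rx 'pjsip set logger on'"
$ # the logger prints to the Asterisk console, so attach one to watch it live:
$ ssh -t freya@192.168.35.197 "docker exec -it freya-asterisk asterisk -rvvv"
$ # place a call, look for Record-Route headers in the 200 OK, then run: pjsip set logger off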
STEP 79 · TIMER B

SIP timer B

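Default 32 s (64 × T1). Timer B fires when an INVITE gets no response at all, not even a 100 Trying; once a provisional response arrives, Timer B no longer applies. The UA then gives up and treats the transaction as a 408 Request Timeout, generated locally rather than received on the wire. KKB campaign 879c7bb0 had one such timeout on +905537721287 at 20:38 UTC: the carrier never responded to the INVITE.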

You can shorten Timer B (timer_b in the [system] section of pjsip.conf, in milliseconds), but you lose tolerance to slow PSTN gateways. The default 32 s is 64 × T1 and covers RFC 3261's retransmit ladder over UDP: with T1 = 500 ms, the gaps between sends double (0.5, 1, 2, 4, 8, 16 s), so the INVITE goes out seven times in total (the original plus six retransmissions) before Timer B gives up.
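The ladder is easy to get wrong by one step, so here it is as arithmetic. A minimal sketch assuming RFC 3261 defaults over UDP (T1 = 500 ms, Timer B = 64 × T1). Over TCP there are no retransmissions, but Timer B still runs.

```python
def invite_schedule(t1_s=0.5, timer_b_s=None):
    """Send times of an INVITE client transaction over UDP, plus when Timer B fires.

    Timer A starts at T1 and doubles after each retransmission (for INVITE it is
    not capped at T2). Timer B defaults to 64 * T1.
    """
    if timer_b_s is None:
        timer_b_s = 64 * t1_s
    sends = [0.0]  # the original INVITE
    interval = t1_s
    while sends[-1] + interval < timer_b_s:
        sends.append(sends[-1] + interval)
        interval *= 2
    return sends, timer_b_s
```

With the defaults, invite_schedule() gives sends at 0, 0.5, 1.5, 3.5, 7.5, 15.5, and 31.5 s and Timer B at 32.0 s. Halving Timer B to 16 s (timer_b = 16000) drops the last retransmission, leaving six sends.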

Demo 2 — Timer B simulator (T1 = 500 ms, doubles to 32 s)
The simulator compresses 32 s of simulated SIP time into ~3 real seconds. Retransmissions land at 0.5, 1.5, 3.5, 7.5, 15.5, and 31.5 s after the original INVITE (Timer A doubling from T1), and Timer B ends the transaction at 32 s.
STEP 80 · SYMMETRIC NAT

Symmetric NAT and direct_media

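Symmetric NAT allocates a separate external port for each (src-IP, src-port, dst-IP, dst-port) tuple. The address a peer learns via STUN is therefore only valid toward the STUN server, so two peers behind symmetric NAT generally cannot talk directly even with STUN; they need a TURN relay.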

Setting direct_media = no sidesteps this entirely: Asterisk stays in the media path and relays the RTP. Keeping media on Asterisk is also what lets our agent record, transcribe, and run the LLM on the audio. Two reasons, one knob, and it stays off.

NAT type | Mapping rule | STUN works? | Direct media works?
Full cone | Same external port for any peer | Yes | Yes
Restricted cone | External port reused; remote IP gated | Yes | Yes, if STUN-warmed
Port-restricted | Remote IP + port both gated | Yes | Sometimes
Symmetric | New external port per (src, dst) tuple | No | No, needs TURN
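The split between the cone rows and the symmetric row comes down to what the NAT keys its mapping on. A toy model, assuming a NAT that hands out external ports sequentially (real NATs often randomise them) and ignoring the inbound filtering rules that distinguish the three cone types: with endpoint-independent mapping, the port STUN discovers is the port a peer will see; with symmetric mapping it is not.

```python
import itertools


class ToyNat:
    """Toy NAT mapping model; inbound filtering is deliberately ignored."""

    def __init__(self, symmetric: bool, first_port: int = 40000):
        self.symmetric = symmetric
        self._ports = itertools.count(first_port)
        self._table = {}

    def external_port(self, src, dst):
        """External port for packets from internal src=(ip, port) to dst=(ip, port)."""
        # Cone NATs key only on the internal endpoint; symmetric NATs also key on dst.
        key = (src, dst) if self.symmetric else src
        if key not in self._table:
            self._table[key] = next(self._ports)
        return self._table[key]


def stun_port_usable(nat, rtp_src, stun_server, peer):
    """Is the port learned via STUN the same one the peer will actually see?"""
    learned = nat.external_port(rtp_src, stun_server)
    return learned == nat.external_port(rtp_src, peer)
```

For a cone NAT, stun_port_usable returns True; for the symmetric NAT it returns False because the peer gets a fresh port. That mismatch is the reason the table's last row needs TURN, or an Asterisk relay with direct_media = no.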
STEP 81 · TLS VS SRTP

TLS vs SRTP

TLS — signaling

Encrypts SIP messages: INVITE, 200 OK, REGISTER, headers, request URIs. Not the audio. Cheap. Recommended on any inter-network leg.

SRTP — audio

Encrypts RTP payloads with AES. Needs a key exchange (SDES in SDP, or DTLS-SRTP). More CPU, slightly higher per-packet overhead.

You can have one without the other, with one caveat: SRTP keyed via SDES without TLS protects little, because the keys travel in cleartext SDP. Most enterprise deployments use TLS without SRTP: signaling is small and contains identifiers, while audio is high-bandwidth and stays inside trusted MPLS. KKB and Garanti both run unencrypted UDP today because the traffic never leaves the customer's controlled network.
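To put a number on "slightly higher per-packet overhead", here is a back-of-envelope sketch for G.711 at 20 ms packetisation over IPv4. It assumes the common AES_CM_128_HMAC_SHA1_80 suite, which appends a 10-byte authentication tag and no MKI. Link-layer headers are left out, and AES counter mode does not pad, so the payload size is unchanged.

```python
IPV4, UDP, RTP = 20, 8, 12  # header sizes in bytes
SRTP_AUTH_TAG = {"AES_CM_128_HMAC_SHA1_80": 10, "AES_CM_128_HMAC_SHA1_32": 4}


def g711_ip_bitrate(ptime_ms=20, srtp_suite=None):
    """IP-layer bitrate in bit/s for one direction of a G.711 stream."""
    payload = 8 * ptime_ms  # 8000 samples/s x 1 byte/sample = 8 bytes per ms
    tag = SRTP_AUTH_TAG[srtp_suite] if srtp_suite else 0
    packet = IPV4 + UDP + RTP + payload + tag
    packets_per_s = 1000 / ptime_ms
    return packet * 8 * packets_per_s
```

Plain RTP comes out at 80 kbit/s per direction and SRTP with the 80-bit tag at 84 kbit/s, a 5% bandwidth cost. The CPU cost of AES is the other half of the trade-off and depends on hardware.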

STEP 82 · CONCURRENT CAPS

Concurrent channel limits

Every trunk provider has a per-account ceiling on concurrent channels. Exceeding it typically triggers 503 Service Unavailable from the provider. Separately, a provider can block individual destinations, which shows up as 603 Decline for those numbers. Verimor caps and per-destination blocks both showed up in the 24 April 2026 KKB outbound test.

Demo 3 — Verimor concurrent dialing (KKB, 24 April 2026)
9 calls · 5 decline (cause 41, same numbers across runs) · 1 timeout · the remaining 3 connect
Numbers grounded in KKB campaign 879c7bb0 (20:38 UTC, 24 April 2026): +905374705251, +905510154221, +905532890209, +905053864612, and +905535918355 all returned SIP 603 with Q.850 cause 41 across multiple runs. Despite cause 41's label ("Temporary failure"), the same numbers failing on every run points to a stable carrier-side block, not transient congestion.

You can rate-limit on our side from the campaign-worker: cap concurrency and add an inter-INVITE gap. See step 95.
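One way to express both knobs, sketched with asyncio: a semaphore caps calls in flight, and a lock plus a timestamp enforces a minimum gap between INVITEs. place_call is a hypothetical coroutine standing in for whatever actually originates the call and returns a final SIP status. The values for max_concurrent and gap_s should come from the carrier's stated ceiling, not from this example.

```python
import asyncio


async def dial_campaign(numbers, place_call, max_concurrent=5, gap_s=0.2):
    """Dial all numbers with at most max_concurrent calls in flight and at least
    gap_s seconds between consecutive INVITEs. Returns [(number, status), ...]
    in input order. place_call is a hypothetical coroutine(number) -> SIP status.
    """
    in_flight = asyncio.Semaphore(max_concurrent)
    launch_lock = asyncio.Lock()
    loop = asyncio.get_running_loop()
    last_launch = float("-inf")

    async def dial_one(number):
        nonlocal last_launch
        async with in_flight:  # held until place_call returns, i.e. the call ends
            async with launch_lock:  # serialise INVITE launches to enforce the gap
                wait = last_launch + gap_s - loop.time()
                if wait > 0:
                    await asyncio.sleep(wait)
                last_launch = loop.time()
            return number, await place_call(number)

    return await asyncio.gather(*(dial_one(n) for n in numbers))
```

Holding the semaphore for the whole call, rather than just the INVITE, is the point: the carrier counts established channels against the cap, not only call attempts.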

STEP 83 · CODEC COST

Codec transcoding cost

Each codec conversion uses CPU. Asterisk on modern x86 can handle hundreds of ulaw ↔ alaw transcodings without breaking a sweat (8 kHz mono, simple lookup tables). Opus or G.722 transcoding is more expensive — watch CPU when adding HD codecs.

Codec pair | CPU per channel | Notes
ulaw ↔ alaw | ~0.1% | Lookup table, no DSP work
ulaw ↔ G.722 | ~0.5% | 16 kHz upsampling + ADPCM
ulaw ↔ Opus | ~1.5% | Resampling, frame boundary alignment
G.711 passthrough | 0% | No transcoding; the goal
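Why ulaw ↔ alaw is nearly free: every µ-law byte maps to exactly one A-law byte, so the whole conversion collapses into a 256-entry table built once. A sketch in the style of the classic public-domain g711.c routines; Asterisk does this in C (codec_a_mu), and its tables may round differently at the edges.

```python
SEG_AEND = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)  # A-law segment ends


def ulaw_to_linear(u: int) -> int:
    """Decode one G.711 mu-law byte to a 16-bit linear sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # remove the 0x84 bias
    return -magnitude if u & 0x80 else magnitude


def linear_to_alaw(pcm: int) -> int:
    """Encode one 16-bit linear sample as a G.711 A-law byte."""
    pcm >>= 3  # A-law works on 13-bit values
    if pcm >= 0:
        mask = 0xD5  # sign bit set for positive values, even bits inverted
    else:
        mask = 0x55
        pcm = -pcm - 1
    seg = next((i for i, end in enumerate(SEG_AEND) if pcm <= end), 8)
    if seg >= 8:
        return 0x7F ^ mask  # out of range: clip to the largest code
    aval = seg << 4
    aval |= (pcm >> (1 if seg < 2 else seg)) & 0x0F
    return aval ^ mask


# The whole ulaw -> alaw conversion is one 256-entry table, built once.
ULAW_TO_ALAW = bytes(linear_to_alaw(ulaw_to_linear(u)) for u in range(256))


def transcode_ulaw_to_alaw(frame: bytes) -> bytes:
    """Per-packet work is a single table lookup per byte."""
    return frame.translate(ULAW_TO_ALAW)
```

A 20 ms ulaw frame (160 bytes) becomes 160 table lookups, with no filtering or resampling. The HD rows in the table involve resampling and a real encoder, which is where the CPU goes.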

Rule of thumb: pick a codec the carrier already speaks. We standardise on ulaw + alaw at every TR customer and let the agent's STT/TTS handle the higher-fidelity end.

STEP 84 · CDR

CDR — Call Detail Records

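Asterisk writes a CDR row per call with standard fields such as caller-ID, called number, start/answer/end times, duration, billsec, and disposition. Hangup cause, SIP response, and recording filename are not standard CDR columns; in a row like the one below they come from custom CDR variables.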

The CSV backend writes to /var/log/asterisk/cdr-csv/Master.csv; the ODBC/MySQL backends write to a database. We rely primarily on the database CDR plus the dashboard's own call records, with the CSV file as a backup. The current "Unable to open … cdr-csv/Master.csv" warning at KKB appears because the directory does not exist in the container. It is cosmetic, but it does mean the CSV backup is not being written.

accountcode    kkb-prod
src            908502427127
dst            +905532890209
dcontext       from-trunk-outbound
channel        PJSIP/providers-00000123
start          2026-04-24 20:38:14
answer         (never answered)
end            2026-04-24 20:38:18
duration       4
billsec        0
disposition    FAILED
hangupcause    41 (Temporary failure)
sipresponse    603 Decline
uniqueid       1761341894.123
recordingfile  (no answer, no recording)

When the dashboard team asks "what does our column X mean?", this row is your answer. Open work item from KKB step 99: surface SIP response code + Q.850 cause directly in the dashboard call detail page, so triage doesn't need shell access.
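The triage that work item asks for can be prototyped against exported CDR rows. A sketch assuming each row is a dict with dst, sipresponse, and hangupcause fields shaped like the example row above (a leading number, optionally followed by a label). The Q.850 map covers only a handful of common causes. It flags destinations that failed the same way on every attempt, which is how the five 603 / cause 41 numbers from the KKB test stood out.

```python
from collections import defaultdict

# A few common Q.850 cause values (not exhaustive).
Q850 = {
    16: "Normal call clearing",
    17: "User busy",
    19: "No answer from user",
    21: "Call rejected",
    34: "No circuit/channel available",
    41: "Temporary failure",
}


def leading_int(value) -> int:
    """'603 Decline' -> 603, '41 (Temporary failure)' -> 41, 603 -> 603."""
    return int(str(value).split()[0])


def stable_failures(rows, min_attempts=2):
    """Destinations that failed with the same SIP code and Q.850 cause on every attempt."""
    outcomes = defaultdict(list)
    for row in rows:
        outcome = (leading_int(row["sipresponse"]), leading_int(row["hangupcause"]))
        outcomes[row["dst"]].append(outcome)
    flagged = {}
    for dst, seen in outcomes.items():
        sip, cause = seen[0]
        same_every_time = all(o == seen[0] for o in seen)
        if len(seen) >= min_attempts and sip >= 300 and same_every_time:
            flagged[dst] = f"SIP {sip} / Q.850 {cause} ({Q850.get(cause, 'unknown')})"
    return flagged
```

A destination that returns an identical failure across several runs is worth taking off the list or raising with the carrier. A one-off failure stays unflagged until it repeats.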

STEP 85 · RECORDING COMPLIANCE

Recording compliance

In practice, compliance with Türkiye's PDPL (KVKK) for call recordings means:

  • Consent disclosure at call start.
  • Audit logs of access to recordings.
  • Time-bounded retention: data must be deleted once it is no longer needed for its purpose; customer contracts typically set 1–3 years.

Our agent plays a disclosure (configurable per workflow). Recordings live in S3/MinIO with bucket policies; access goes through the dashboard, which logs every fetch.
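Retention is easiest to enforce in the bucket itself rather than in a cleanup job someone has to remember. A sketch of an S3 lifecycle rule that expires one customer's recordings after the contracted period. It assumes recordings sit under a per-customer prefix (the recordings/<customer>/ layout and the bucket name are hypothetical) and approximates months as 365/12 days. The returned dict has the shape boto3's put_bucket_lifecycle_configuration accepts, and MinIO honours S3 expiration rules.

```python
import math


def retention_days(months: int) -> int:
    """Approximate contract months as days (365/12 per month), rounded up."""
    if months <= 0:
        raise ValueError("retention must be a positive number of months")
    return math.ceil(months * 365 / 12)


def recording_lifecycle(customer: str, months: int) -> dict:
    """S3 lifecycle configuration that expires one customer's recordings."""
    return {
        "Rules": [
            {
                "ID": f"kvkk-retention-{customer}",
                "Filter": {"Prefix": f"recordings/{customer}/"},  # hypothetical layout
                "Status": "Enabled",
                "Expiration": {"Days": retention_days(months)},  # counted from object creation
            }
        ]
    }


# With boto3 (client pointed at MinIO or S3), roughly:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="recordings-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=recording_lifecycle("kkb", 24))
```

A 24-month contract becomes a 730-day expiration. Because expiry runs server-side, deletion happens even if no one opens the dashboard.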

Production checklist — 10 things to verify before a customer goes live

01 · Concurrent caps confirmed: per-trunk cap known, and campaign-worker concurrency set below the carrier ceiling.
02 · Recording compliance: disclosure in the opening prompt, dashboard download audit on, KVKK-compliant.
03 · KVKK retention: bucket lifecycle policy in MinIO/S3 set to 12–36 months per customer contract.
04 · CDR backup: database CDR confirmed writing; CSV fallback directory exists or the warning is silenced.
05 · Codec discipline: endpoint allow=ulaw,alaw only, so no surprise Opus / G.722 transcoding.
06 · rtp_timeout safety: 60 s in place; confirm the dashboard alerts on rtp-timeout-rate > 1%.
07 · Firewall ACL hygiene: 5060 + RTP in both directions, AND the direct UA-to-UA path if the proxy is light-mode.
08 · Transport confirmed: verify with an active tcpdump, not just the docs; a UDP vs TCP mismatch fails silently.
09 · Header sanity: all custom header names ASCII; run a smoke INVITE before go-live.
10 · Monitoring: Datadog/Sentry on agent + dashboard; alert on 5xx rate and answer-rate drops.
Try this on KKB
$ ssh freya@192.168.35.197 \
    "docker exec freya-asterisk asterisk -rx 'cdr show status'"
$ ssh freya@192.168.35.197 \
    "docker exec freya-asterisk asterisk -rx 'pjsip show endpoint providers'" \
    | grep -E 'allow|direct_media|rtp_timeout|force_rport|rewrite_contact'
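To turn that grep into a pass/fail check for the codec, rtp timeout, and direct_media rules, here is a sketch that parses the "name : value" lines pjsip show endpoint prints. The exact value rendering (for example (ulaw|alaw) for allow, false for direct_media) is an assumption about the CLI output; adjust the accepted values if your Asterisk prints them differently.

```python
import re


def parse_show_endpoint(text: str) -> dict:
    """Collect 'option : value' lines from `pjsip show endpoint` output."""
    options = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and re.fullmatch(r"[a-z_]+", key):  # skips headers like 'Endpoint:'
            options.setdefault(key, value.strip())
    return options


def audit_endpoint(text: str, rtp_timeout: int = 60) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    opts = parse_show_endpoint(text)
    problems = []
    codecs = set(re.findall(r"[a-z0-9_]+", opts.get("allow", "").lower()))
    if codecs != {"ulaw", "alaw"}:
        problems.append(f"allow = {opts.get('allow')!r}; expected ulaw and alaw only")
    if opts.get("direct_media", "").lower() not in {"false", "no", "0"}:
        problems.append(f"direct_media = {opts.get('direct_media')!r}; must be off")
    if opts.get("rtp_timeout") != str(rtp_timeout):
        problems.append(f"rtp_timeout = {opts.get('rtp_timeout')!r}; expected {rtp_timeout}")
    return problems
```

Feed it the captured output of the ssh command; any returned line is a go-live blocker worth fixing before the first campaign.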
Checkpoint 6

Why is our direct_media = no non-negotiable today?


Two reasons that compound: (1) topology — symmetric NAT plus the typical TR enterprise firewall layout means peers cannot reliably negotiate end-to-end RTP, so Asterisk relaying audio is the only path that works through every customer's network; (2) product — our voice agent must record, transcribe, and inject TTS into the audio stream. Direct media would route RTP UA-to-UA and bypass our pipeline entirely. As long as we are an AI agent — not a pure SIP proxy — direct media stays off.