SIP Telephony — Zero to Hero v1 · 2026-04-25
PART 03 · STEPS 26–40

Media, RTP, NAT

SIP sets up the call; RTP carries the audio. Once you understand the 12-byte header, the symmetric-port trick, the 60-second safety net, and where NAT chews everything up, ninety percent of "the audio sounds bad" tickets become trivial.

15 steps · 4 demos · ~22 minutes

26 · rtp packet

RTP packet structure

Every audio frame is wrapped in a 12-byte RTP header followed by the encoded payload. The header is small, fixed-shape, and tells you everything you need to align packets in time, drop duplicates, and reassemble streams from multiple senders.

Payload type 0 is G.711 µ-law and 8 is A-law, both static assignments; 101 is a dynamic payload type that by convention carries DTMF events (telephone-event). The 16-bit sequence number lets you detect lost or reordered packets; the 32-bit timestamp drives the playout buffer; the SSRC identifies the source so a mixer can demultiplex streams.

Demo 1 — interactive RTP header
RTP header layout (bit offsets 0 to 31 per 32-bit word):

  field              size      notes
  V (version)        2 bits    value 2
  P (padding)        1 bit
  X (extension)      1 bit
  CC (CSRC count)    4 bits
  M (marker)         1 bit
  PT (payload type)  7 bits
  sequence number    16 bits
  timestamp          32 bits   increments by samples per packet
  SSRC               32 bits   synchronization source; random, identifies the sender
  payload            N bytes   160 bytes for G.711 @ 20 ms

  12-byte header in total, followed by a payload of N bytes.

Each field has a fixed bit position. The receiver parses bytes 0–11, then the codec consumes the rest.
12-byte fixed header. PT=0 means G.711 µ-law @ 8 kHz; sample-tick is 8000/sec, so a 20 ms packet bumps the timestamp by 160.
# a typical Asterisk RTP frame on the wire (PT=0, ulaw)
80 00 7a 31  ad 4e f6 80   91 4c 28 d3   // V=2 PT=0 seq=31281 ts=2907633280 ssrc=0x914c28d3
ff fe ff fc ... // 160 bytes of µ-law samples
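
To make the field layout concrete, here is a minimal Python sketch that decodes the 12 header bytes from the frame above. The function name `parse_rtp_header` is made up for illustration; it assumes no CSRC entries and no header extension, which matches the sample.

```python
import struct

# The 12 header bytes from the sample frame above.
header = bytes.fromhex("80007a31ad4ef680914c28d3")

def parse_rtp_header(buf: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550 layout)."""
    if len(buf) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    # Network byte order: two single bytes, 16-bit seq, 32-bit ts, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", buf[:12])
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": b0 & 0x0F,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }

fields = parse_rtp_header(header)
for name, value in fields.items():
    print(f"{name:13} {value}")

# Timestamp step for G.711: 8000 ticks per second * 20 ms per packet.
ticks_per_packet = 8000 * 20 // 1000
print("ticks per packet:", ticks_per_packet)
```

Everything after byte 11 is payload and goes to the codec untouched.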

27 · rtcp

RTCP — the side-channel

Every RTP stream has a paired RTCP stream. By convention RTCP runs on RTP port + 1. RTCP carries receiver reports (jitter, loss, round-trip estimate) and sender reports (NTP-anchored timestamp, packet/byte counts) about every 5 seconds.

When a customer says "the call sounded bad", RTCP is where the evidence lives. pjsip show channelstats summarizes the most recent receiver report per channel — jitter in ms, fraction lost, sequence gaps. If your jitter is over 30 ms or loss is over 1%, you have a network problem, not an Asterisk problem.
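
For intuition about where the jitter number in a receiver report comes from, the sketch below implements the running interarrival-jitter estimate defined in RFC 3550 (each new sample moves the estimate by 1/16 of the difference). The helper names and the input format (arrival time in seconds plus RTP timestamp) are assumptions for illustration, and timestamp wrap-around is ignored. The threshold check simply restates the rule of thumb from the paragraph above.

```python
def interarrival_jitter_ms(packets, clock_rate=8000):
    """packets: iterable of (arrival_time_seconds, rtp_timestamp).

    Returns the RFC 3550 running jitter estimate, converted to milliseconds.
    """
    jitter = 0.0          # kept in RTP clock ticks
    prev_transit = None
    for arrival_s, rtp_ts in packets:
        # Relative transit time: arrival (in ticks) minus the sender's timestamp.
        transit = arrival_s * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter / clock_rate * 1000.0

def looks_like_network_problem(jitter_ms, loss_fraction):
    """Rule of thumb from the text: jitter over 30 ms or loss over 1%."""
    return jitter_ms > 30.0 or loss_fraction > 0.01

# Perfectly paced 20 ms packets give (near) zero jitter.
paced = [(0.00, 0), (0.02, 160), (0.04, 320)]
print(round(interarrival_jitter_ms(paced), 3))
```

A single packet arriving 16 ms late moves the estimate by only 1 ms, which is why the reported jitter lags behind short bursts.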

28 · codec sdp

Codec negotiation in SDP

Each a=rtpmap: line registers a codec under a payload-type number. The order of payload types on the m=audio line is the offerer's preference. The answerer picks from that list and typically answers with a single audio codec PT, plus telephone-event when DTMF events are in use.

Asterisk's disallow=all; allow=ulaw,alaw in pjsip.conf means: tell the peer we only speak ulaw and alaw. If they offer only Opus, we send 488 Not Acceptable Here.

// snippet from an INVITE's SDP
m=audio 18342 RTP/AVP 8 0 101
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
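
The negotiation rule can be sketched in a few lines of Python. This is an illustrative model, not Asterisk's implementation: `ALLOW_TO_ENCODING`, `offered_codecs`, and `pick_codec` are hypothetical names, and the sketch simply follows the offerer's order, whereas a real endpoint may apply its own preference.

```python
SDP = """\
m=audio 18342 RTP/AVP 8 0 101
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
"""

# Hypothetical mapping from allow= names to SDP encoding names.
ALLOW_TO_ENCODING = {"ulaw": "PCMU", "alaw": "PCMA"}

def offered_codecs(sdp):
    """Return [(payload_type, encoding_name)] in the offerer's preference order."""
    order, names = [], {}
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # m=audio <port> <proto> <pt> <pt> ...
            order = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[int(pt)] = encoding.split("/")[0]
    # Static PTs offered without an rtpmap line come back with name None here.
    return [(pt, names.get(pt)) for pt in order]

def pick_codec(sdp, allow):
    """First offered codec that is also allowed, or None (answer 488)."""
    allowed = {ALLOW_TO_ENCODING[name] for name in allow}
    for pt, encoding in offered_codecs(sdp):
        if encoding in allowed:
            return pt, encoding
    return None

print(pick_codec(SDP, ["ulaw", "alaw"]))
```

With the INVITE above, the first match is PT 8 (PCMA) because the offerer listed it first. An offer containing only Opus yields None, which corresponds to the 488 case.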

29 · symmetric rtp

Symmetric vs asymmetric RTP

Symmetric RTP means an endpoint sends media from the same IP and port on which it receives media. Almost universal in practice. The setting rtp_symmetric = yes in our pjsip.conf makes Asterisk send RTP back to whatever address and port the peer's packets actually came from, even if the SDP advertised something different. This is the single line that saves us when peers are behind NAT.

Without it, you politely follow the SDP into a black hole. With it, you reply to the address that successfully reached you, which is by definition routable.
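
The decision itself is tiny. The following Python sketch models it under the assumption that the media stack tracks the source address of the last RTP packet it received; `rtp_destination` and the port 61234 are made up for illustration.

```python
from typing import Optional, Tuple

Addr = Tuple[str, int]

def rtp_destination(sdp_addr: Addr, observed_src: Optional[Addr], symmetric: bool) -> Addr:
    """Where to send outgoing RTP.

    sdp_addr:     address/port advertised in the peer's SDP (c= and m= lines)
    observed_src: source of the last RTP packet actually received, if any
    symmetric:    models rtp_symmetric = yes
    """
    if symmetric and observed_src is not None:
        return observed_src   # reply to the address that reached us
    return sdp_addr           # follow the SDP, even into a black hole

advertised = ("192.168.1.10", 10042)   # private address from the SDP
observed = ("203.0.113.7", 61234)      # hypothetical NAT mapping seen on the wire

print(rtp_destination(advertised, observed, symmetric=True))
print(rtp_destination(advertised, observed, symmetric=False))
```

Note that until the first packet arrives there is nothing to learn from, so the SDP address is still used as the initial target.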

30 · nat breaks

Why NAT breaks SIP

The SIP message body (SDP) carries the IP address (c= line) and port (m= line) where media should be sent. If your endpoint is behind NAT, those fields contain a private IP. The peer dutifully sends RTP into the void.

Three responses, in increasing order of complexity:

  1. Peer-side fix: tell Asterisk to ignore the SDP c= and use the actual packet source (rtp_symmetric=yes).
  2. UA-side fix: rewrite the SDP with the public IP using external_media_address in pjsip.conf.
  3. Tunnel: ICE / STUN / TURN. Required for browser endpoints.
Demo 2 — NAT visualizer
Phone 192.168.1.10:10042 → NAT (carrier / corporate) 203.0.113.7:? → Asterisk res_pjsip :8000–8500, with a STUN server on port 3478.
SDP advertises a private IP. Without help, return RTP from Asterisk dies at the NAT boundary. rtp_symmetric=yes lets Asterisk reply to the observed source port; STUN lets the phone rewrite its own SDP using its discovered public mapping.
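
When triaging a capture, a quick check for a private address in the c= line saves time. The Python sketch below is an illustration with hypothetical helper names. It tests against the RFC 1918 ranges explicitly because Python's `ipaddress.is_private` also flags documentation ranges such as 203.0.113.0/24.

```python
import ipaddress

# RFC 1918 private ranges, listed explicitly (see note above).
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def sdp_connection_address(sdp):
    """Return the address from the first c= line, or None."""
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 ") or line.startswith("c=IN IP6 "):
            return line.split()[2]
    return None

def advertises_private_ip(sdp):
    """True if the SDP tells the peer to send media to an RFC 1918 address."""
    addr = sdp_connection_address(sdp)
    if addr is None:
        return False
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and any(ip in net for net in RFC1918)

behind_nat = "v=0\nc=IN IP4 192.168.1.10\nm=audio 10042 RTP/AVP 0\n"
rewritten = "v=0\nc=IN IP4 203.0.113.7\nm=audio 10042 RTP/AVP 0\n"

print(advertises_private_ip(behind_nat))
print(advertises_private_ip(rewritten))
```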

31 · stun

STUN — discover your public IP

A STUN server answers the simplest possible question: "from my point of view, your packet came from IP X, port Y." The UA sends one binding request, parses the response, and now knows its own external mapping. It writes that into SDP. Works for most NATs, including the full-cone and restricted-cone variants common on residential and corporate networks, but not for symmetric NAT.

coturn is our STUN/TURN server. It runs on every Freya host (coturn-config/turnserver.conf). Default port: UDP/TCP 3478.
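
On the wire, the exchange is small enough to sketch by hand. The Python below builds a binding request and decodes the XOR-MAPPED-ADDRESS attribute from a response, following the STUN message format (20-byte header with the magic cookie 0x2112A442). The function names are made up, only IPv4 is handled, and the sketch does no networking; actually sending the request to a server on UDP 3478 is left out.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request(transaction_id=None):
    """20-byte STUN header: type, length 0, magic cookie, 96-bit transaction ID."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id

def parse_xor_mapped_address(message):
    """Return (ip, port) from a binding response, or None if not present."""
    if len(message) < 20:
        return None
    _msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)          # port XOR top 16 cookie bits
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)         # attributes are 32-bit aligned
    return None

request = build_binding_request()
print(len(request), request[:8].hex())
```

The address is XOR-ed with the cookie so that NATs which rewrite anything resembling their own public IP inside payloads leave it alone.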

32 · turn

TURN — relay when STUN fails

For symmetric NAT (rare, but real on certain mobile carriers), STUN fails: the public port the STUN server saw is not the same one the call peer would see. You need a relay. TURN is a STUN extension that forwards media on your behalf. It adds latency and burns bandwidth, but it works when nothing else does.

WebRTC clients (browser-side) need TURN as a fallback. SIP trunks usually do not — carriers control their own NAT and provision symmetric paths.

33 · ice

ICE — gather and prioritize candidates

Interactive Connectivity Establishment. Each side gathers a list of candidate addresses — host (local), server-reflexive (via STUN), relayed (via TURN) — exchanges them in SDP, then probes pairs in priority order until a working pair is found.

icesupport = yes in rtp.conf enables it. Without ICE, NAT traversal is fragile; with ICE, browsers can join calls reliably.
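
"Priority order" has a concrete definition. The sketch below uses the candidate priority formula from the ICE specification (RFC 8445) with its recommended type preferences; the function name and the default local preference of 65535 (a single-interface host) are choices made for this illustration.

```python
# Recommended type preferences from RFC 8445: higher means tried earlier.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}

def candidate_priority(cand_type, component_id=1, local_preference=65535):
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id).

    component_id is 1 for RTP and 2 for RTCP.
    """
    return (
        (TYPE_PREFERENCE[cand_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )

for cand_type in ("host", "srflx", "relay"):
    print(cand_type, candidate_priority(cand_type))
# host  2130706431
# srflx 1694498815
# relay 16777215
```

These are the numbers you will see in a=candidate: lines of a browser's SDP: host candidates are probed first, server-reflexive (STUN) next, and relayed (TURN) last.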

Discovery: STUN

  • "What's my public IP?"
  • One round-trip to port 3478
  • Cheap: no media passes through the server
  • Fails on symmetric NAT

Relay: TURN

  • Relays media through the TURN server
  • Authenticated, eats bandwidth
  • Works through any NAT
  • Adds 5–30 ms latency

Orchestration: ICE

  • Gathers host + STUN + TURN candidates
  • Prioritizes & probes pairs
  • Picks the best working path
  • Required for WebRTC

34 · sbc

SBC — the carrier-side translator

Session Border Controller. A box that sits between two SIP networks and translates everything: codecs, transport (UDP/TCP/TLS), IP versions, SDP rewriting, header normalization, security policy.

Anadolu Sigorta uses a Genesys SBC at 10.1.137.60/61. It:

  • Hides their internal IP map from us.
  • Enforces TLS / TCP transport.
  • Strips or adds custom headers.
  • Polices the RTP port range.

When you debug an incident with an SBC in the path, every assumption you made about end-to-end transparency is wrong. Always capture on both sides of the SBC, not just yours.

35 · rtp timeout

RTP timeout — the 60-second safety net

rtp.conf has rtptimeout = 60 (on PJSIP endpoints the same limit is expressed as the rtp_timeout option in pjsip.conf). If Asterisk does not see RTP for 60 seconds, it tears the channel down. This is not a bug and not a tuning knob. It is a safety feature that catches:

  • Lost ACK — call is "up" in signaling but no media confirms it.
  • One-way audio — we send, they don't.
  • Silent disconnects — peer rebooted; nothing tells us.

Whenever a customer says "calls drop at exactly 60 seconds", the answer is always "your ACK is lost or your RTP path is broken in one direction". Find the missing message; do not raise the timeout.
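
The mechanism can be modelled as a simple watchdog. This Python sketch is only an illustration of the behaviour described above, not Asterisk's implementation; the class and method names are hypothetical and times are plain seconds.

```python
RTP_TIMEOUT_S = 60  # mirrors rtptimeout = 60 from the text

class RtpWatchdog:
    """Tracks when RTP was last received on a channel."""

    def __init__(self, started_at, timeout_s=RTP_TIMEOUT_S):
        self.last_rtp_at = started_at
        self.timeout_s = timeout_s

    def on_rtp_packet(self, now):
        # Any received packet resets the clock.
        self.last_rtp_at = now

    def should_hang_up(self, now):
        # Signaling state is irrelevant; only inbound media counts.
        return now - self.last_rtp_at >= self.timeout_s

# One-way audio: we keep sending, but nothing ever arrives.
silent = RtpWatchdog(started_at=0.0)
print(silent.should_hang_up(59.0), silent.should_hang_up(60.0))

# Healthy call: a packet at t=30 pushes the deadline out to t=90.
healthy = RtpWatchdog(started_at=0.0)
healthy.on_rtp_packet(30.0)
print(healthy.should_hang_up(60.0), healthy.should_hang_up(90.0))
```

Because only received packets reset the clock, raising the timeout merely delays the drop while the broken path stays broken.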

Demo 3 — 60-second RTP timeout
timeline (10× accelerated: 9 s real = 90 s simulated), marked at 0:00, 0:30, 1:00, 1:30
"Drop ACK" and "One-way RTP" both terminate at exactly 1:00, because Asterisk fires rtptimeout regardless of whether the SIP dialog thinks the call is up. The fix is upstream: find the missing ACK or the blocked RTP direction.

36 · one-way audio

One-way audio

Classic symptoms:

  • Caller hears callee, callee hears nothing (or vice versa).
  • Call drops at 60 seconds because rtptimeout fires on the silent side.

Causes, in order of likelihood:

  1. SDP carries an unreachable IP (NAT not handled).
  2. RTP firewall is asymmetric — one direction allowed, the other blocked.
  3. Wrong RTP port range advertised vs allowed by the firewall.
  4. Codec mismatch sneaks through (rare; usually 488 instead).
Try this on KKB
$ asterisk -rx "rtp set debug on"
$ asterisk -rx "rtp set debug off"   # turn it off, it's noisy

If only one direction is logged, you have your answer.
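
On a busy box it is easier to count directions than to eyeball them. The Python sketch below tallies sent and received packets per peer from captured debug output. The sample lines are an assumed approximation of what the debug output looks like; check the pattern against what your Asterisk version actually prints. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "Got  RTP packet from <ip:port> (...)" / "Sent RTP packet to <ip:port> (...)"
LINE = re.compile(r"(Got|Sent)\s+RTP packet (?:from|to)\s+(\S+)")

SAMPLE_LOG = """\
Got  RTP packet from    198.51.100.9:18342 (type 08, seq 031281, ts 000160, len 000160)
Sent RTP packet to      198.51.100.9:18342 (type 08, seq 004711, ts 000160, len 000160)
Sent RTP packet to      203.0.113.7:61234 (type 00, seq 000001, ts 000160, len 000160)
"""

def direction_counts(log_text):
    """Count packets per (peer, direction), where direction is 'Got' or 'Sent'."""
    counts = Counter()
    for match in LINE.finditer(log_text):
        direction, peer = match.group(1), match.group(2)
        counts[(peer, direction)] += 1
    return counts

def one_way_peers(log_text):
    """Peers for which only one direction shows up in the log."""
    counts = direction_counts(log_text)
    peers = {peer for peer, _direction in counts}
    return sorted(
        peer for peer in peers
        if counts[(peer, "Got")] == 0 or counts[(peer, "Sent")] == 0
    )

print(one_way_peers(SAMPLE_LOG))
```

In the sample, the second peer only ever appears in "Sent" lines, which is the one-way pattern described above.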

37 · codec mismatch

Codec mismatch troubleshooting

If you see 488 Not Acceptable Here on an INVITE, the offerer's codec list does not intersect with your allow=. Common causes:

  • Customer offers G.722 only; we allow ulaw/alaw only.
  • Customer offers Opus (rare, usually a WebRTC client).
  • Customer offers a proprietary codec we don't have a license for.

Fix path: capture the SDP in the INVITE, see what they actually offered, then either update allow= to add their codec or push back on them to enable ulaw. Never silently transcode something exotic — CPU cost in production scales linearly with concurrent calls.

38 · direct media

Direct media vs media-relay

  • direct_media = yes — once both peers know each other's media address, RTP flows peer-to-peer. Asterisk drops out of the media path. Saves CPU; loses visibility.
  • direct_media = no — Asterisk relays every RTP packet. CPU cost, but Asterisk can do MixMonitor (recording), DTMF detection, transcoding, and fork the audio to the AI agent.

Our default is direct_media = no. We need to record calls and run the agent on the media stream. Direct media would make Asterisk invisible to the audio, which defeats the entire architecture.

39 · port range

RTP port range

rtp.conf:

rtpstart = 10000
rtpend   = 10499

Each call leg uses one even/odd pair (RTP + RTCP). 500 ports = up to ~250 simultaneous legs. If you hit the limit, calls fail to negotiate media — symptom looks like a codec mismatch but is actually port exhaustion.
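
The capacity arithmetic, spelled out as a small Python sketch. The function names are made up for illustration, and the model assumes exactly one RTP/RTCP pair per leg as stated above.

```python
def rtp_leg_capacity(rtpstart: int, rtpend: int) -> int:
    """Number of call legs the range can carry at one RTP/RTCP pair per leg."""
    ports = rtpend - rtpstart + 1   # the range is inclusive on both ends
    return ports // 2

def utilisation(active_legs: int, rtpstart: int, rtpend: int) -> float:
    """Fraction of the pair budget currently in use."""
    return active_legs / rtp_leg_capacity(rtpstart, rtpend)

print(rtp_leg_capacity(10000, 10499))        # 250
print(utilisation(200, 10000, 10499))        # 0.8
```

Alerting well before utilisation reaches 1.0 is cheaper than diagnosing a phantom codec mismatch later.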

Demo 4 — port allocation (50 of 500 slots, scaled)
Each cell = 10 ports (up to five concurrent legs, at one RTP/RTCP pair per leg). Firewall ACLs must permit UDP 10000–10499 in both directions between Asterisk and the carrier's RTP range. This is the single most common firewall bug at customer onboarding.

40 · srtp

SRTP — encrypted media

Secure RTP (RFC 3711). Encrypts the payload and authenticates the whole packet (header and payload). Keys are exchanged either in the SDP under a=crypto: (SDES: keys travel in the signaling, so SIP must run over TLS) or via a DTLS handshake on the media port (DTLS-SRTP, used by WebRTC).

We do not currently use SRTP on any production trunk. WebRTC does, mandatorily. When a customer requires SRTP, you enable it on the endpoint with media_encryption = sdes and ensure SIP runs over TLS so the key exchange itself is protected.

; pjsip.conf endpoint snippet for SRTP
[customer-trunk]
type = endpoint
transport = transport-tls
media_encryption = sdes
media_encryption_optimistic = no   ; refuse plaintext fallback
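
To see why SDES needs TLS, it helps to look at what the a=crypto: line actually contains. The Python sketch below generates and parses one for the AES_CM_128_HMAC_SHA1_80 suite, whose inline key material is a 16-byte master key followed by a 14-byte master salt, base64-encoded. The helper names are hypothetical, and the sketch ignores optional lifetime, MKI, and session parameters beyond skipping them.

```python
import base64
import secrets

def make_sdes_crypto_line(tag=1, suite="AES_CM_128_HMAC_SHA1_80"):
    """Build an a=crypto: line with fresh key material for the given suite."""
    key_and_salt = secrets.token_bytes(30)   # 16-byte master key + 14-byte master salt
    inline = base64.b64encode(key_and_salt).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{inline}"

def parse_sdes_crypto_line(line):
    """Return (tag, suite, raw_key_material) from an a=crypto: line."""
    parts = line[len("a=crypto:"):].split()
    tag, suite, key_param = parts[0], parts[1], parts[2]
    _method, key_info = key_param.split(":", 1)      # method is "inline"
    key_b64 = key_info.split("|", 1)[0]              # drop optional lifetime / MKI
    return int(tag), suite, base64.b64decode(key_b64)

line = make_sdes_crypto_line()
tag, suite, key_material = parse_sdes_crypto_line(line)
print(line)
print(tag, suite, len(key_material))
```

Anyone who can read the SDP can read the master key, so without TLS on the signaling leg the media encryption protects nothing.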
Checkpoint 3

A new customer's calls drop at exactly 60 seconds. Their SBC is behind NAT. List four hypotheses, in the order you would check them.

Answer:
  1. Lost ACK after 200 OK. Signaling thinks the call is up; no media flows because the dialog never confirmed at the SBC. Check pjsip set logger on for missing ACK from us, and SBC logs for retransmits.
  2. One-way RTP (we send, they drop). Their NAT or firewall blocks our return path. Check rtp set debug on — packets in but none out, or vice versa.
  3. SDP advertises an unreachable IP. Their SBC put a private IP in the c= line. Confirm rtp_symmetric = yes on our endpoint; if it's already on, capture and verify we're sending to the observed source instead.
  4. Firewall blocks UDP 10000–10499 inbound. The carrier's NAT permits outbound but the customer's firewall blocks inbound RTP. Test with nc -u to a known port; ask their network team to confirm both directions.

Never raise rtptimeout. The 60-second drop is the symptom; the broken ACK or one-way path is the cause.