A phone call is two streams, not one
Every phone call has two completely independent paths between the two parties:
- Signaling — the control plane. "I want to call you", "ringing", "you answered", "we hung up". Tiny text messages.
- Media — the voice plane. The actual audio samples. Big, continuous, real-time.
These run on different protocols, different ports, and often different network paths. Most of the bugs you will debug come from the two planes disagreeing: signaling says the call is up but media never flows, or media flows for a while and then stops.
Burn this in: "signaling vs media" is the first question you ask of any incident.
PSTN, ISDN, SS7 — the legacy world
The Public Switched Telephone Network ran on circuit-switched copper for a century. Telephone exchanges set up a continuous electrical path. Later, ISDN digitized it, and SS7 (Signaling System 7) carried out-of-band signaling between exchanges.
You will not configure SS7. But the language of telephony (caller ID, ring tones, busy signals, DTMF, Q.850 cause codes) comes from the circuit-switched PSTN, ISDN, and SS7 world and is still spoken by SIP.
Q.850 cause codes show up constantly. The campaign that just failed at KKB returned Reason: Q.850;cause=41 — that's a PSTN-era "temporary failure" code, tunneled through SIP.
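To make the tunneling concrete, here is a minimal Python sketch that pulls the Q.850 cause out of a SIP Reason header and looks it up. The function name and the lookup table are illustrative, not part of our stack, and the table lists only a handful of well-known causes.

```python
import re

# A few well-known Q.850 cause values; deliberately not exhaustive.
Q850_CAUSES = {
    1: "Unallocated (unassigned) number",
    16: "Normal call clearing",
    17: "User busy",
    18: "No user responding",
    19: "No answer from user (user alerted)",
    21: "Call rejected",
    34: "No circuit/channel available",
    41: "Temporary failure",
}

def parse_q850_reason(header_line):
    """Return (cause, description) from a SIP Reason header, or None if it carries no Q.850 cause."""
    match = re.search(r"Q\.850\s*;\s*cause\s*=\s*(\d+)", header_line, re.IGNORECASE)
    if match is None:
        return None
    cause = int(match.group(1))
    return cause, Q850_CAUSES.get(cause, "unknown cause")

print(parse_q850_reason("Reason: Q.850;cause=41"))  # (41, 'Temporary failure')
```

A Reason header that carries a SIP cause instead (for example `Reason: SIP;cause=486`) returns None here, which is a useful hint that the failure did not originate on the PSTN side.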
The shift to VoIP
VoIP = "Voice over IP". Same call, but signaling and media ride on IP packets instead of dedicated circuits. Three big benefits:
- Cheap — shares the data network.
- Programmable — every call is a message you can inspect.
- Endpoint-agnostic — a softphone, a hardware phone, an Asterisk dialplan, a browser, a voice AI agent — they all look the same to the signaling layer.
The cost: VoIP exposes you to NAT, firewalls, jitter, and codec negotiation. We will spend half this guide on those.
SIP vs H.323 vs WebRTC vs MGCP
Four major VoIP protocol families. SIP is what we use. WebRTC shares building blocks with SIP deployments (SDP, ICE, RTP), but its signaling is application-defined. Asterisk speaks SIP natively and can bridge SIP calls to WebRTC endpoints.
| Protocol | Year | Where you see it |
|---|---|---|
| H.323 | 1996 | Cisco enterprise, video conferencing legacy |
| MGCP / Megaco | 1999 | Cable carriers, controlled endpoints |
| SIP | 1999 | Almost everything modern (our stack) |
| WebRTC | 2011 | Browsers, mobile apps |
The four SIP entities
The SIP RFC defines four logical roles. A single piece of software can play more than one.
- User Agent (UA) — anything that originates or terminates calls. A phone, a softphone, Asterisk acting as the called party.
- Proxy — forwards SIP requests. Doesn't terminate the call. Cisco KASP at Garanti is a proxy.
- Registrar — accepts REGISTER requests, builds a phone-book of where each user can be reached.
- Redirect server — replies "go ask over there" instead of forwarding.
In our stack, Asterisk is a UA. The customer's SBC is usually a proxy or a back-to-back UA (B2BUA).
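The registrar's "phone-book" can be pictured as a small lookup table with expiry. The sketch below is a toy model in Python, not how Asterisk or any SBC implements it: the class name, method names, and addresses are hypothetical, and it keeps one contact per user where a real registrar may keep several.

```python
import time

class Registrar:
    """Toy location service: address-of-record -> (contact URI, expiry time)."""

    def __init__(self):
        self.bindings = {}

    def register(self, aor, contact, expires=3600, now=None):
        """Record where a user can be reached; expires=0 removes the binding."""
        now = time.time() if now is None else now
        if expires == 0:
            self.bindings.pop(aor, None)
        else:
            self.bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now=None):
        """Return the current contact URI, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self.bindings.get(aor)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]

registrar = Registrar()
registrar.register("sip:alice@example.com", "sip:alice@192.0.2.15:5060", expires=60)
print(registrar.lookup("sip:alice@example.com"))
```

A proxy would consult a table like this to decide where to forward an incoming request, while a redirect server would hand the contact back to the caller instead.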
Layered model and default ports
A SIP message is text (like HTTP). It rides on UDP, TCP, or TLS, which rides on IP. The audio (RTP) rides on UDP/IP separately.
+-----------+     SIP signaling      +-----------+
| Asterisk  | <----- UDP/5060 -----> |    SBC    |
|           | <----- RTP/10000+ ---> |           |
+-----------+      audio media       +-----------+
| Port | Protocol | Use |
|---|---|---|
| 5060 | UDP/TCP | SIP signaling (cleartext) |
| 5061 | TLS | SIP over TLS (encrypted signaling) |
| 8089 | WSS | SIP over WebSocket-Secure (WebRTC) |
| 10000–10499 | UDP | RTP/RTCP media (range set by rtpstart/rtpend in rtp.conf; the stock sample config uses 10000–20000) |
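Because SIP is plain text, a few lines of Python are enough to pick a message apart. The sketch below is a simplified illustration only: the sample INVITE is hand-written and abbreviated, the addresses are placeholders, and the parser ignores repeated headers, folded lines, and compact header names.

```python
def parse_sip_message(raw):
    """Split a SIP message into start line, header dict (lower-cased names), and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

# Hand-written, abbreviated example; a real INVITE carries more headers and an SDP body.
EXAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line, headers, body = parse_sip_message(EXAMPLE_INVITE)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, headers["via"])
```

The Via header shows the transport and port the signaling actually used, which is the first thing to check against the port table when signaling does not arrive.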
RTP carries the media
Real-time Transport Protocol. Each packet is a small header (12 bytes) followed by a slice of audio, typically 20 ms, which works out to about 50 packets per second per direction (30 ms packets give about 33).
The two key fields:
- Payload Type — which codec. 0 = G.711 µ-law, 8 = G.711 A-law, 9 = G.722, 101 = DTMF events.
- SSRC — synchronization source, identifies one stream.
RTP has no built-in security or flow control. Its companion RTCP carries receiver reports (jitter, packet loss). When carriers grade your call quality, they read RTCP.
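To see where the payload type and SSRC sit, the following Python sketch decodes the fixed 12-byte RTP header as laid out in RFC 3550. The sample packet is built by hand for illustration (its sequence number, timestamp, and SSRC are arbitrary), and the sketch does not handle CSRC lists or header extensions.

```python
import struct

def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Hand-built sample: version 2, payload type 0 (G.711 µ-law),
# followed by 160 payload bytes (20 ms of 8 kHz audio).
sample = struct.pack("!BBHII", 0x80, 0, 1234, 160000, 0xDEADBEEF) + bytes(160)
header = parse_rtp_header(sample)
print(header["payload_type"], header["sequence"], hex(header["ssrc"]))
```

When media "flows for a while, then stops", watching for a change in SSRC or a jump in sequence numbers in a capture is often a quick way to tell whether the sender restarted its stream.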
Codecs you will actually meet
Freya negotiates ulaw, alaw on every trunk. You will rarely touch the others — but you'll see them in SDP offers and need to recognize where they belong.
| Codec | Sample rate | Bitrate (kbps) | Quality (MOS) | CPU cost |
|---|---|---|---|---|
$ ssh freya@192.168.35.197 \
"docker exec freya-asterisk asterisk -rx 'pjsip show endpoints' | head -40"
You should see endpoint definitions naming the codecs.
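To recognize codecs in an SDP offer, it helps to know that the `m=audio` line lists payload type numbers in preference order and the `a=rtpmap` lines name them. The Python sketch below extracts that list; the SDP text is a hand-written example with placeholder addresses, and the helper name is illustrative.

```python
# Static payload types from RFC 3551 that may appear without an rtpmap line.
# G722 is registered with an 8000 clock rate even though it samples at 16 kHz.
STATIC_PAYLOAD_TYPES = {0: "PCMU/8000", 8: "PCMA/8000", 9: "G722/8000", 18: "G729/8000"}

def audio_codecs(sdp):
    """Return [(payload_type, encoding)] for the first audio m= line, in offer order."""
    offered = []
    rtpmap = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio ") and not offered:
            # Format: m=audio <port> <proto> <pt> <pt> ...
            offered = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding
    return [(pt, rtpmap.get(pt) or STATIC_PAYLOAD_TYPES.get(pt, "unknown")) for pt in offered]

EXAMPLE_SDP = """v=0
o=- 20518 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 10000 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
"""

print(audio_codecs(EXAMPLE_SDP))
```

PCMU and PCMA are the SDP names for ulaw and alaw, so an offer like this one matches what the trunks negotiate.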
DTMF — three ways to send a digit
Dual-Tone Multi-Frequency. The "press 1 for sales" tones. Three ways to carry them across IP:
| Mode | How | Reliability |
|---|---|---|
| In-band audio | Actual tones in the RTP stream | Fragile — G.729 distorts them |
| RFC 2833 / RFC 4733 | Special RTP packets (payload type 101) | Most common, our default |
| SIP INFO | Out-of-band SIP messages | Reliable but hops through every proxy |
Our default in pjsip.conf is dtmf_mode = rfc4733. Anadolu A27 (the IVR DTMF-swallow bug) was a timing problem on this exact path — the customer's IVR played an announcement that masked the early DTMF.
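For reading captures on this path, it may help to know what an RFC 4733 event looks like on the wire: a 4-byte payload holding the event code, an end flag, a volume, and a duration in RTP timestamp units. The Python sketch below decodes one; the function name is illustrative and the sample bytes are hand-built, not taken from the Anadolu incident.

```python
import struct

# Event codes 0-15 map to the sixteen DTMF keys.
DTMF_EVENTS = "0123456789*#ABCD"

def parse_telephone_event(payload, clock_rate=8000):
    """Decode the 4-byte RFC 4733 telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "digit": DTMF_EVENTS[event] if event < len(DTMF_EVENTS) else None,
        "end": bool(flags & 0x80),           # E bit: this packet ends the event
        "volume_dbm0": -(flags & 0x3F),      # 6-bit power level, sign dropped on the wire
        "duration_ms": duration * 1000 / clock_rate,
    }

# Hand-built sample: digit "1", end bit set, -10 dBm0, 800 timestamp units (100 ms at 8 kHz).
sample = struct.pack("!BBH", 1, 0x80 | 10, 800)
print(parse_telephone_event(sample))
```

In a timing problem like the one described, the duration and end-flag fields show exactly when the digit started and finished relative to the announcement audio.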
$ ssh freya@192.168.35.197 \
"docker exec freya-asterisk asterisk -rx 'pjsip show endpoint trunk' | grep -i dtmf"
Quality metrics
If a customer says "calls sound bad", quantify it:
| Metric | What it means | Threshold |
|---|---|---|
| Jitter | Variance in inter-packet arrival time | >30 ms = audible |
| Packet loss | RTP packets dropped | >1% degrades, >5% breaks |
| MOS | Mean Opinion Score (1–5) | >4.0 is good |
| Round-trip latency | Talker to listener and back | >300 ms = talk-over |
These come out of RTCP and from pjsip show channelstats on Asterisk.
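The jitter figure RTCP reports is a running estimate rather than a raw variance. As a rough illustration, the Python sketch below applies the RFC 3550 interarrival jitter formula to a list of arrival times and RTP timestamps; the input format and function name are assumptions for this example, and timestamp wraparound is ignored.

```python
def interarrival_jitter(packets, clock_rate=8000):
    """Estimate jitter in ms from (arrival_time_seconds, rtp_timestamp) pairs.

    Uses the RFC 3550 running estimate: J += (|D| - J) / 16, where D is the
    change in transit time between consecutive packets.
    """
    jitter = 0.0
    previous_transit = None
    for arrival, rtp_timestamp in packets:
        # Transit time expressed in RTP timestamp units.
        transit = arrival * clock_rate - rtp_timestamp
        if previous_transit is not None:
            delta = abs(transit - previous_transit)
            jitter += (delta - jitter) / 16.0
        previous_transit = transit
    return jitter * 1000.0 / clock_rate

# 20 ms packets at 8 kHz; the third packet arrives 16 ms late.
packets = [(0.000, 0), (0.020, 160), (0.056, 320)]
print(round(interarrival_jitter(packets), 3))  # 1.0
```

Because each new sample moves the estimate by only one sixteenth of the difference, a single late packet barely registers; sustained values above the 30 ms threshold mean the delay variation is persistent.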
$ ssh freya@192.168.35.197 \
"docker exec freya-asterisk asterisk -rx 'pjsip show channelstats'"
What two protocols carry signaling and media for a SIP call, and on which default ports?
Signaling uses SIP, typically on UDP/TCP port 5060 (or TLS on 5061, WSS on 8089). Media uses RTP on a UDP port range set in rtp.conf, 10000–10499 in the port table. The two streams ride independent paths; "signaling up but media missing" is the most common voice bug.