SIP Telephony — Zero to Hero v1 · 2026-04-25
PART 01 · STEPS 1–10

Telephony Fundamentals

Before any SIP message makes sense, you need the mental model: every phone call is two independent streams (signaling and media), born of a century-old PSTN vocabulary, now riding IP. Burn this in and the rest of the stack falls into place.

10 steps · 3 demos · ~25 minutes
01 · signaling-vs-media

A phone call is two streams, not one

Every phone call has two completely independent paths between the two parties:

  • Signaling — the control plane. "I want to call you", "ringing", "you answered", "we hung up". Tiny text messages.
  • Media — the voice plane. The actual audio samples. Big, continuous, real-time.

These run on different protocols, different ports, and often different network paths. 95% of bugs you will debug come from the two planes disagreeing: signaling says the call is up but media never flows. Or media flows for a while, then stops.

Burn this in: "signaling vs media" is the first question you ask of any incident.

Demo · Two streams animator
Watch the two channels run independently. Try Drop ACK to see how the media plane never starts when signaling fails to complete.
02 · pstn-isdn-ss7

PSTN, ISDN, SS7 — the legacy world

The Public Switched Telephone Network ran on circuit-switched copper for a century. Telephone exchanges set up a continuous electrical path. Later, ISDN digitized it, and SS7 (Signaling System 7) carried out-of-band signaling between exchanges.

You will not configure SS7. But the language of telephony (caller ID, ring tones, busy signals, DTMF, Q.850 cause codes) grew up in the PSTN, ISDN, and SS7 world and is still spoken by SIP.

Q.850 cause codes show up constantly. The campaign that just failed at KKB returned Reason: Q.850;cause=41 — that's a PSTN-era "temporary failure" code, tunneled through SIP.
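
To make the Reason header concrete, here is a minimal sketch that pulls the cause number out of a header line and maps it to its Q.850 meaning. The function name `q850_cause` and the short lookup table are illustrative assumptions; the table covers only a handful of well-known causes, not the full Q.850 list.

```python
import re

# A few well-known ITU-T Q.850 cause values. Deliberately incomplete.
Q850_CAUSES = {
    1: "Unallocated (unassigned) number",
    16: "Normal call clearing",
    17: "User busy",
    18: "No user responding",
    19: "No answer from user (user alerted)",
    21: "Call rejected",
    34: "No circuit/channel available",
    41: "Temporary failure",
    102: "Recovery on timer expiry",
}

# Matches e.g. 'Reason: Q.850;cause=16;text="Normal call clearing"'
_REASON_RE = re.compile(
    r"^\s*Reason\s*:\s*Q\.850\s*;.*?\bcause\s*=\s*(\d+)", re.IGNORECASE
)


def q850_cause(header_line):
    """Return (code, description) for a Q.850 Reason header line, else None."""
    match = _REASON_RE.match(header_line)
    if match is None:
        return None  # not a Reason header, or not the Q.850 protocol
    code = int(match.group(1))
    return code, Q850_CAUSES.get(code, "unknown cause")


print(q850_cause("Reason: Q.850;cause=41"))  # (41, 'Temporary failure')
```

A header that carries a different protocol, such as `Reason: SIP;cause=200`, returns `None` here, because only the Q.850 form is matched.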

Demo · Q.850 cause-code lookup
Try 16 (normal hangup), 17 (busy), 41 (the KKB failure), 102 (recovery on timer expiry, in practice a timeout). Arrow keys cycle through known causes.
03 · voip

The shift to VoIP

VoIP = "Voice over IP". Same call, but signaling and media ride on IP packets instead of dedicated circuits. Three big benefits:

  • Cheap — shares the data network.
  • Programmable — every call is a message you can inspect.
  • Endpoint-agnostic — a softphone, a hardware phone, an Asterisk dialplan, a browser, a voice AI agent — they all look the same to the signaling layer.

The cost: VoIP exposes you to NAT, firewalls, jitter, and codec negotiation. We will spend half this guide on those.

04 · signaling-protocols

SIP vs H.323 vs WebRTC vs MGCP

Four major VoIP protocol families. SIP is what we use. WebRTC reuses many concepts from the SIP world (SDP, ICE, RTP) but its signaling is application-defined, so it is not a signaling protocol in its own right. Asterisk speaks SIP natively and can bridge calls to WebRTC endpoints.

Protocol        Year  Where you see it
H.323           1996  Cisco enterprise, video conferencing legacy
MGCP / Megaco   1999  Cable carriers, controlled endpoints
SIP             1999  Almost everything modern (our stack)
WebRTC          2011  Browsers, mobile apps
05 · sip-entities

The four SIP entities

The SIP RFC (RFC 3261) defines four logical roles. A single piece of software can play more than one.

  • User Agent (UA) — anything that originates or terminates calls. A phone, a softphone, Asterisk acting as the called party.
  • Proxy — forwards SIP requests. Doesn't terminate the call. Cisco KASP at Garanti is a proxy.
  • Registrar — accepts REGISTER requests, builds a phone-book of where each user can be reached.
  • Redirect server — replies "go ask over there" instead of forwarding.

In our stack, Asterisk is a UA. The customer's SBC is usually a proxy or a back-to-back UA (B2BUA).
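
The registrar's "phone-book" can be pictured as a table from a user's public address (the address-of-record) to the contact where that user is currently reachable. The sketch below is a toy model under stated assumptions: the class name `LocationTable` and its methods are hypothetical, it keeps a single contact per user, and it skips authentication entirely. A real registrar does considerably more.

```python
import time


class LocationTable:
    """Toy model of a registrar's bindings: address-of-record -> contact."""

    def __init__(self):
        self._bindings = {}  # aor -> (contact_uri, expires_at)

    def register(self, aor, contact, expires=3600, now=None):
        """Store or refresh a binding; expires=0 removes it, as in REGISTER."""
        now = time.time() if now is None else now
        if expires == 0:
            self._bindings.pop(aor, None)
        else:
            self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now=None):
        """Return the current contact for aor, or None if absent or expired."""
        now = time.time() if now is None else now
        entry = self._bindings.get(aor)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]


table = LocationTable()
table.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", expires=60)
print(table.lookup("sip:alice@example.com"))  # sip:alice@192.0.2.10:5060
```

A proxy that receives a request for `sip:alice@example.com` would consult a table like this to decide where to forward it; a redirect server would return the contact to the caller instead.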

06 · layered-model

Layered model and default ports

A SIP message is text (like HTTP). It rides on UDP, TCP, or TLS, which rides on IP. The audio (RTP) rides on UDP/IP separately.

+-----------+     SIP signaling          +-----------+
|  Asterisk |  <----- UDP/5060 ----->    |   SBC     |
|           |  <----- RTP/10000+ ---->   |           |
+-----------+      audio media           +-----------+

Port           Protocol  Use
5060           UDP/TCP   SIP signaling (cleartext)
5061           TLS       SIP over TLS (encrypted signaling)
8089           WSS       SIP over WebSocket-Secure (WebRTC)
10000–10499    UDP       RTP/RTCP media (Asterisk default range)
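
Because a SIP message is plain text with an HTTP-like shape (a start line, headers, a blank line, then a body), it can be split apart with ordinary string handling. The following is a rough sketch, not a compliant parser: the function name `parse_sip_message` is an assumption, the example addresses are made up, repeated headers such as Via overwrite each other, and compact header forms and line folding are ignored.

```python
def parse_sip_message(raw):
    """Split a SIP message into (start_line, headers, body).

    Header names are lower-cased. If a header repeats, the last value wins,
    which is a simplification a real parser would not make.
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Made-up example request using documentation-only addresses.
EXAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line, headers, body = parse_sip_message(EXAMPLE_INVITE)
method, request_uri, version = start_line.split(" ", 2)
print(method, request_uri, headers["call-id"])
```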
07 · rtp

RTP carries the media

Real-time Transport Protocol. Tiny header (12 bytes), then typically 20 ms of audio samples (some codecs use 30 ms). At 20 ms per packet, that is 50 packets per second per direction.
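
As a worked example of what that packet rate costs on the wire, the sketch below computes the per-direction bandwidth of one call. It assumes G.711 (8 kHz sampling, one byte per sample), IPv4, and no link-layer overhead; the function name `g711_bandwidth_kbps` is illustrative.

```python
def g711_bandwidth_kbps(ptime_ms=20):
    """Approximate one-way IP bandwidth of a G.711 RTP stream, in kbit/s."""
    sample_rate_hz = 8000      # G.711 samples at 8 kHz
    bytes_per_sample = 1       # one companded byte per sample
    payload = sample_rate_hz * ptime_ms // 1000 * bytes_per_sample
    packet = payload + 12 + 8 + 20   # + RTP, UDP and IPv4 headers
    packets_per_second = 1000 / ptime_ms
    return packet * 8 * packets_per_second / 1000


print(g711_bandwidth_kbps(20))  # 80.0
```

With 20 ms packets the payload is 160 bytes, the headers add 40 more, and 50 such packets per second come to 80 kbit/s in each direction, noticeably more than the codec's nominal 64 kbit/s.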

The two key fields:

  • Payload Type — which codec. 0 = G.711 µ-law, 8 = G.711 A-law, 9 = G.722, 101 = DTMF events (101 is a convention; dynamic types like this one are assigned in SDP).
  • SSRC — synchronization source, identifies one stream.

RTP has no built-in security or flow control. Its companion RTCP carries receiver reports (jitter, packet loss). When carriers grade your call quality, they read RTCP.
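
The fixed 12-byte header can be unpacked with the standard `struct` module. This is a sketch based on the header layout in RFC 3550; the function name `parse_rtp_header` is an assumption, and CSRC entries and header extensions, which follow the fixed part when present, are not decoded.

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("too short to be an RTP packet")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # always 2 for RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # 0 = G.711 µ-law, 8 = G.711 A-law, ...
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Synthetic packet: version 2, payload type 0, then 160 bytes of "audio".
example = struct.pack("!BBHII", 0x80, 0, 1, 160, 0xDEADBEEF) + bytes(160)
header = parse_rtp_header(example)
print(header["version"], header["payload_type"], hex(header["ssrc"]))
```

When two streams show up on the same port in a capture, comparing the `ssrc` values is usually the quickest way to tell them apart.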

08 · codecs

Codecs you will actually meet

Freya negotiates ulaw, alaw on every trunk. You will rarely touch the others — but you'll see them in SDP offers and need to recognize where they belong.
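
To recognize codecs in an SDP offer, read the `m=audio` line for the offered payload types and the `a=rtpmap` lines for their names (ulaw appears as PCMU, alaw as PCMA). The sketch below does only that; the function name `audio_codecs` and the sample SDP are made up for illustration, and it looks at the first audio section only.

```python
# Names for a few static payload types, used when an rtpmap line is absent.
STATIC_PAYLOAD_TYPES = {0: "PCMU/8000", 8: "PCMA/8000", 9: "G722/8000", 18: "G729/8000"}


def audio_codecs(sdp):
    """Return (port, [codec names in offered order]) for the first audio stream."""
    port = None
    offered = []
    rtpmap = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio ") and port is None:
            fields = line.split()      # m=audio <port> <proto> <pt> <pt> ...
            port = int(fields[1])
            offered = [int(pt) for pt in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, _, name = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = name
    names = [
        rtpmap.get(pt, STATIC_PAYLOAD_TYPES.get(pt, f"unknown({pt})"))
        for pt in offered
    ]
    return port, names


EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 1 1 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 10000 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
    "a=fmtp:101 0-16",
])

print(audio_codecs(EXAMPLE_SDP))
# (10000, ['PCMU/8000', 'PCMA/8000', 'telephone-event/8000'])
```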

Demo · Codec comparison
Columns: Codec, Sample rate, Bitrate (kbps), Quality (MOS), CPU cost.
Click a row to see where each codec is used in production.
Bars are scaled to the heaviest codec in the set. CPU cost is qualitative, measured per RTP frame.
Try this on KKB
$ ssh freya@192.168.35.197 \
    "docker exec freya-asterisk asterisk -rx 'pjsip show endpoints' | head -40"

You should see the configured endpoints listed. The codecs themselves appear on the allow line of pjsip show endpoint <name>.

09 · dtmf

DTMF — three ways to send a digit

Dual-Tone Multi-Frequency. The "press 1 for sales" tones. Three ways to carry them across IP:

Mode                 How                                     Reliability
In-band audio        Actual tones in the RTP stream          Fragile; G.729 distorts them
RFC 2833 / RFC 4733  Special RTP packets (payload type 101)  Most common, our default
SIP INFO             Out-of-band SIP messages                Reliable but hops through every proxy

Our default in pjsip.conf is dtmf_mode = rfc4733. Anadolu A27 (the IVR DTMF-swallow bug) was a timing problem on this exact path — the customer's IVR played an announcement that masked the early DTMF.
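
When chasing a DTMF problem in a capture, it helps to know what sits inside those payload-type-101 packets. The sketch below decodes the 4-byte telephone-event payload as laid out in RFC 4733: an event code, an end-of-event bit, a volume, and a duration in RTP timestamp units. The function name `parse_telephone_event` is an assumption, and the 8000 Hz clock mentioned in the comment is the usual rate, not something read from the packet.

```python
import struct

# Event codes 0-15 map to the 16 DTMF keys.
EVENT_TO_DIGIT = "0123456789*#ABCD"


def parse_telephone_event(payload):
    """Decode an RFC 4733 telephone-event payload (the bytes after the RTP header)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload is 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "digit": EVENT_TO_DIGIT[event] if event < 16 else None,
        "end": bool(flags & 0x80),     # E bit: set on the final packets of a digit
        "volume": flags & 0x3F,        # tone power, expressed as -dBm0
        "duration": duration,          # in timestamp units (usually 8000 per second)
    }


# Digit "1", end bit set, volume 10, duration 800 units (100 ms at 8000 Hz).
example = struct.pack("!BBH", 1, 0x80 | 10, 800)
print(parse_telephone_event(example))
# {'digit': '1', 'end': True, 'volume': 10, 'duration': 800}
```

A single key press produces several of these packets with a growing `duration`, and the last few carry the end bit, so timing questions like the one above come down to when the first and last packets of each digit arrive relative to the IVR prompt.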

Try this on KKB
$ ssh freya@192.168.35.197 \
    "docker exec freya-asterisk asterisk -rx 'pjsip show endpoint trunk' | grep -i dtmf"
10 · quality-metrics

Quality metrics

If a customer says "calls sound bad", quantify it:

Metric              What it means                            Threshold
Jitter              Variance in inter-packet arrival time    >30 ms = audible
Packet loss         RTP packets dropped                      >1% degrades, >5% breaks
MOS                 Mean Opinion Score (1–5)                 >4.0 is good
Round-trip latency  Talker to listener and back              >300 ms = talk-over
These come out of RTCP and from pjsip show channelstats on Asterisk.
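
The jitter figure that RTCP reports is a running estimate, not a plain variance. The sketch below follows the interarrival jitter formula from RFC 3550: for each packet, compare how far apart packets arrived with how far apart they were sent, and fold the absolute difference into a smoothed value with gain 1/16. The function name `interarrival_jitter_ms` and the input shape are assumptions, and timestamp wraparound is ignored.

```python
def interarrival_jitter_ms(packets, clock_rate=8000):
    """Estimate RTP interarrival jitter in milliseconds.

    packets: iterable of (arrival_time_seconds, rtp_timestamp) in arrival order.
    clock_rate: RTP timestamp units per second (8000 for G.711).
    """
    jitter = 0.0          # running estimate, in timestamp units
    previous_transit = None
    for arrival, rtp_timestamp in packets:
        # Relative transit time: arrival converted to timestamp units,
        # minus the sender's timestamp. Only differences matter.
        transit = arrival * clock_rate - rtp_timestamp
        if previous_transit is not None:
            d = abs(transit - previous_transit)
            jitter += (d - jitter) / 16.0
        previous_transit = transit
    return jitter / clock_rate * 1000.0


# Three 20 ms packets; the third arrives 10 ms late.
print(interarrival_jitter_ms([(0.00, 0), (0.02, 160), (0.05, 320)]))
```

A single late packet moves the estimate only a little, because of the 1/16 smoothing. Sustained values near the 30 ms threshold in the table mean the delay variation is persistent, not a one-off.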

Try this on KKB
$ ssh freya@192.168.35.197 \
    "docker exec freya-asterisk asterisk -rx 'pjsip show channelstats'"
Checkpoint 1

What two protocols carry signaling and media for a SIP call, and on which default ports?

Answer:

Signaling uses SIP, typically on UDP/TCP port 5060 (or TLS on 5061, WSS on 8089). Media uses RTP on a UDP port range — Asterisk defaults to 10000–10499. The two streams ride independent paths; "signaling up but media missing" is the most common voice bug.