Part 07 — Troubleshooting Tools

86 pjsip show endpoints

Asterisk's confession booth: `pjsip show *`

Before any packet capture, before any tcpdump, ask Asterisk what it thinks. `pjsip show endpoints` is the first thing to type when a customer says "calls are failing". If a trunk's Endpoint state is anything other than In use or Not in use, suspect the Asterisk side first: in most cases you have a config error and you can stop debugging the network.
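The state check above can be scripted. A minimal sketch, assuming the usual `pjsip show endpoints` layout where each data row reads `Endpoint:  <name>  <state>  <N of M>`; the function name `flag_bad_endpoints` is hypothetical, and the column layout may differ between Asterisk versions:

```shell
# flag_bad_endpoints: read `pjsip show endpoints` output on stdin and print
# every endpoint whose state is neither "In use" nor "Not in use".
# Assumption: data rows look like " Endpoint:  name  State  0 of inf".
flag_bad_endpoints() {
  awk '
    # Skip the legend row, whose second field starts with "<".
    $1 == "Endpoint:" && $2 !~ /^</ {
      state = ""
      # The last three fields are the channel count ("0 of inf").
      for (i = 3; i <= NF - 3; i++) state = state (state ? " " : "") $i
      if (state != "In use" && state != "Not in use") print $2 ": " state
    }
  '
}
```

Pipe the command output from the "Try this on KKB" step into it; an empty result means every endpoint is in one of the two healthy states.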

The four pjsip show commands you will reach for ninety percent of the time are endpoints, contacts, channels, and identifies. Below is the cheat sheet.

Demo 1 — Tools cheat sheet (interactive)

Command	What it does	Snippet

All commands run inside docker exec freya-asterisk on KKB. The cheat sheet covers ~95% of day-to-day Freya SIP triage.
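Since every command goes through the same ssh and docker exec prefix, a small wrapper saves typing. This is a sketch, not part of the Freya tooling: `ast`, `triage_snapshot`, and the `KKB_HOST` variable are hypothetical names, and the host and container name are the ones used in this section.

```shell
# Hypothetical default; override KKB_HOST for another box.
KKB_HOST="${KKB_HOST:-freya@192.168.35.197}"

# ast: run one Asterisk CLI command inside the freya-asterisk container.
# "$*" joins the arguments into a single command string, so the command
# itself must not contain single quotes.
ast() {
  ssh "$KKB_HOST" "docker exec freya-asterisk asterisk -rx '$*'"
}

# triage_snapshot: print the four everyday views, each under a banner.
triage_snapshot() {
  local view
  for view in endpoints contacts channels identifies; do
    printf '=== pjsip show %s ===\n' "$view"
    ast pjsip show "$view"
  done
}
```

Run `triage_snapshot > snapshot.txt` at the start of an incident and you have a dated record of what Asterisk believed before anyone touched the config.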

Try this on KKB

$ ssh freya@192.168.35.197 \
    "docker exec freya-asterisk asterisk -rx 'pjsip show endpoints'"

87 live SIP capture

The reflex: `pjsip set logger on` + `docker logs -f`

When pjsip's snapshots aren't enough, watch the dialog happen. pjsip set logger on tells Asterisk to print every inbound and outbound SIP message to the same log stream as the rest of the agent — which means docker logs -f becomes your live SIP scope. Place a test call, capture the INVITE / 100 / 180 / 200 / ACK arc, then turn the logger off so the logs stop drowning in SIP.

Pair the logger with grep -E 'INVITE|200 OK|ACK|BYE|SIP/2.0|488|503' to keep only the request and status lines that matter. If you need the raw bytes (header parsing bugs, suspected transport mismatch), drop to tcpdump in step 89.

# 1. flip logger on
ssh freya@192.168.35.197 "docker exec freya-asterisk asterisk -rx 'pjsip set logger on'"

# 2. tail and filter (2>&1 so lines the container writes to stderr are filtered too)
ssh freya@192.168.35.197 "docker logs -f freya-asterisk 2>&1" \
  | grep -E 'INVITE|200 OK|ACK|BYE|SIP/2.0|488|503'

# 3. place the test call from the customer side

# 4. flip logger off (otherwise prod logs drown)
ssh freya@192.168.35.197 "docker exec freya-asterisk asterisk -rx 'pjsip set logger off'"
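Step 4 is the one people forget. One way to make it automatic is to wrap steps 1, 2, and 4 in a function that switches the logger off on exit. This is a sketch under assumptions: `sip_watch` and `KKB_HOST` are hypothetical names, and `--tail 0` is added so `docker logs` starts at "now" instead of replaying history.

```shell
KKB_HOST="${KKB_HOST:-freya@192.168.35.197}"

# sip_watch: logger on, tail filtered SIP lines, logger off when you stop.
# The body is a subshell (parentheses) so the traps do not leak into
# your interactive shell.
sip_watch() (
  cli() { ssh "$KKB_HOST" "docker exec freya-asterisk asterisk -rx '$1'"; }

  # Runs on every exit path of the subshell, including Ctrl-C.
  trap 'cli "pjsip set logger off" >/dev/null' EXIT
  trap 'exit 130' INT TERM

  cli 'pjsip set logger on' >/dev/null || exit 1
  ssh "$KKB_HOST" "docker logs -f --tail 0 freya-asterisk 2>&1" \
    | grep -E 'INVITE|200 OK|ACK|BYE|SIP/2.0|488|503'
)
```

Start `sip_watch`, place the test call, then press Ctrl-C. The EXIT trap sends `pjsip set logger off` before the function returns.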

What you are looking for in the stream:

No INVITE at all → packet never reached Asterisk. Move to tcpdump; suspect H1 firewall or H5 identify.
INVITE with no 100 Trying within 200ms → Asterisk silently dropped it. This was the Garanti M08 root cause. Check the identify match block (H5).
488 Not Acceptable Here → codec or transport mismatch (H4). Read the SDP carefully.
200 OK then call drops at ~32s or ~60s → ACK lost on the return path. Classic stateless proxy symptom (H3).
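If you saved the stream to a file, the first three checks in the list above can be applied mechanically. A rough sketch: `classify_sip_log` is a hypothetical helper, it assumes the log contains a single test call, and it cannot see timing, so the 200ms window and the ~32s / ~60s drops still need a human reading the timestamps.

```shell
# classify_sip_log: map a saved SIP log (one test call) to the first
# matching symptom from the list above.
classify_sip_log() {
  local log="$1"
  if ! grep -Eq 'INVITE [^ ]+ SIP/2\.0' "$log"; then
    echo "no-invite: packet never reached Asterisk; tcpdump, suspect H1 or H5"
  elif ! grep -Eq 'SIP/2\.0 100 ' "$log"; then
    echo "invite-without-100: silent drop; check the identify match block (H5)"
  elif grep -Eq 'SIP/2\.0 488 ' "$log"; then
    echo "488: codec or transport mismatch (H4); read the SDP"
  else
    echo "setup-ok: if the call drops at ~32s or ~60s, suspect H3"
  fi
}
```

Usage: `classify_sip_log /tmp/test-call.log`, where the path is whatever file you redirected the filtered stream into.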

88 sngrep

sngrep — terminal Wireshark for SIP

When you can't get a customer to share a Wireshark capture (compliance, no shell, bureaucracy), sngrep on our side gives you 80% of the same view. It captures live, groups packets by SIP dialog, and renders a ladder diagram per call right inside the terminal. Use the arrow keys to select a dialog, Enter for the ladder, F2 to save the dialog as a pcap.

Demo 2 — sngrep dialog list (click into the panel, then use ↑ ↓, Enter)

sngrep — Capture: any port 5060 — 5 dialogs [F1] help [F2] save [Enter] flow

Idx	Method	From	To	Source	Msgs	State

↑↓ select Enter ladder / filter q quit

Real sngrep is faster than this mock and uses ncurses, but the muscle memory transfers. On KKB run sudo sngrep -d any port 5060 (-d selects the capture device; -i only makes the match expression case-insensitive).

The killer feature is per-dialog grouping: ten concurrent calls become ten clean rows, not 600 interleaved messages. When a dialog row goes red ("FAIL"), it's already pre-grouped — open it and the ladder tells you which leg failed.

89 tcpdump & Wireshark

tcpdump on the host, Wireshark on your laptop

For media plane debugging — jitter, packet loss, codec negotiation, listening to actual audio — sngrep is too narrow. Capture full UDP with tcpdump, scp the pcap to your laptop, and open it in Wireshark. The trick is filtering only what matters: SIP on 5060 plus the RTP port range Freya uses (10000–10499).

# -t allocates a TTY so sudo can prompt and Ctrl-C reaches tcpdump on the host
ssh -t freya@192.168.35.197 \
  "sudo tcpdump -i any -w /tmp/call.pcap udp portrange 10000-10499 or port 5060"
# place a call, then Ctrl-C the tcpdump
scp freya@192.168.35.197:/tmp/call.pcap .
open call.pcap   # macOS opens it in Wireshark
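If you would rather not babysit the Ctrl-C, a time-boxed capture is one alternative. This is a sketch: `capture_filter`, `timed_capture`, and `KKB_HOST` are hypothetical names, and it assumes GNU `timeout` is installed on the host. The port numbers default to the ones above.

```shell
KKB_HOST="${KKB_HOST:-freya@192.168.35.197}"

# capture_filter: build the BPF expression for SIP plus the RTP range.
# Args: sip_port rtp_low rtp_high (defaults: 5060 10000 10499).
capture_filter() {
  local sip_port="${1:-5060}" rtp_lo="${2:-10000}" rtp_hi="${3:-10499}"
  printf 'udp portrange %s-%s or port %s\n' "$rtp_lo" "$rtp_hi" "$sip_port"
}

# timed_capture: capture for N seconds, then let timeout stop tcpdump
# with SIGTERM so the pcap is closed cleanly.
# Args: seconds (default 60), remote output path (default /tmp/call.pcap).
timed_capture() {
  local seconds="${1:-60}" out="${2:-/tmp/call.pcap}"
  ssh -t "$KKB_HOST" \
    "sudo timeout $seconds tcpdump -i any -w $out $(capture_filter)"
}
```

`timed_capture 90` gives you ninety seconds to place the call; the scp and Wireshark steps are unchanged.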

Inside Wireshark, three menu paths are all you need:

Telephony → VoIP Calls → pick the call → Flow Sequence for the ladder.
Telephony → RTP → Stream Analysis for jitter, packet loss, sequence gaps.
Telephony → RTP → RTP Player to listen. If the audio plays cleanly here, your problem is upstream of RTP. If it doesn't, the problem is RTP itself (NAT, port range, codec).

What the Flow Sequence view looks like (mock screenshot):

Wireshark · call.pcap · Flow Sequence

Hosts: VVB 10.231.16.148 · KASP 10.231.16.153 · Asterisk 10.231.x.120

Time	Message
12:37:16.108	SIP/SDP INVITE sip:agent@.120 (CSeq 101)
12:37:16.302	SIP/SDP INVITE sip:agent@.120 (forwarded)
12:37:23.118	SIP 502 No More Routes (KASP timeout)
12:37:23.122	SIP 502 returned to VVB · call drops

Read this: KASP forwarded the INVITE at 12:37:16.302 and waited. Asterisk never replied — no 100 Trying, no 180, no 200. After 7 seconds KASP's own timer fired and it returned 502 No More Routes. Absence of a packet, not presence of an error. That is exactly the signature of H5 (identify match miss) — the canonical Garanti M08 root cause.
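The "7 seconds" is just the gap between two rows of the ladder. When you are reading timers off a capture it helps to compute the gap rather than eyeball it. A small sketch (`seconds_between` is a hypothetical helper; both timestamps must fall on the same day):

```shell
# seconds_between: elapsed seconds between two HH:MM:SS.mmm timestamps.
seconds_between() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    printf "%.3f\n", (y[1]*3600 + y[2]*60 + y[3]) - (x[1]*3600 + x[2]*60 + x[3])
  }'
}

# Forwarded INVITE to KASP giving up:
seconds_between 12:37:16.302 12:37:23.118   # prints 6.816
```

A gap that lands on a round number of seconds with nothing in between usually means a timer fired on the waiting side, which is the silence this capture shows.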

90 6-hypothesis tree

The 6-hypothesis decision tree

Every recurring SIP integration failure we have ever debugged at Freya falls into one of six bins. freya-onprem/playbooks/sip-asterisk-debug.md codifies them — Ricky's writeup of the Garanti M08 / M09 arc. Memorize this tree. Walking it in order means you do the cheap network checks before the expensive config tuning, which means you fix incidents in 30 minutes instead of three days.

Walk the tree. Each yes/no narrows you to a concrete next command:

Demo 3 — Walk the 6-hypothesis tree

Quote from the playbook: "Media plane is cheaper to rule out than pjsip config tuning. Signaling silence is the hardest thing to spot because it's an absence of a log line, not a visible error."

For at-a-glance scanning when you don't have time for the tree:

H1: Firewall ACL gap

UDP 5060 or RTP 10000–10499 blocked. Most common on Day 1 of a new trunk.

Check: tcpdump on Asterisk; do you see the INVITE arriving?

H2: NAT / address rewrite

SDP carries an unreachable container-internal IP. external_media_address missing or wrong.

Check: look at the SDP c= line; is it a private IP?

H3: Stateless / light proxy

ACK and BYE bypass the proxy. Calls drop at exactly 32s or 60s.

Check: is there a Record-Route header? If not, open 5060 direct UA-to-UA.

H4: Codec / transport mismatch

UDP vs TCP, ulaw vs G.722. Often surfaces as 488 Not Acceptable Here.

Check: compare the transport and the SDP codecs offered in the INVITE against the 488 Not Acceptable Here.

H5: Identify match miss

The trunk's match= block doesn't include the carrier's actual source IP. Asterisk drops the INVITE silently.

Check: pjsip show endpoints; is the source IP in match=?

H6: Header parse failure

Non-ASCII custom header names; missing required headers; malformed CSeq.

Check: tcpdump and inspect bytes; ASCII-only field names?
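The NAT check (SDP c= line, private or not) is easy to script against a saved INVITE or logger snippet. A minimal sketch with hypothetical helper names; "private" here means the RFC 1918 ranges, which is only a hint, because the real question is whether the far end can route to that address:

```shell
# sdp_connection_ips: print every IPv4 address found on an SDP c= line.
# Reads the named files, or stdin when called without arguments.
sdp_connection_ips() {
  sed -n 's/^c=IN IP4 \([0-9.]*\).*/\1/p' "$@"
}

# is_private_ipv4: succeed for 10/8, 172.16/12, and 192.168/16.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_sdp: label each c= address as private or public.
check_sdp() {
  sdp_connection_ips "$@" | while read -r ip; do
    if is_private_ipv4 "$ip"; then
      echo "$ip private"
    else
      echo "$ip public"
    fi
  done
}
```

A c= address in Docker's default bridge range (172.17.x.x) in an outbound SDP is the typical sign that external_media_address is missing.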

Discipline: every time you face a SIP issue, write down which hypotheses you have ruled out and the evidence that rules them out. Save the artefact (pcap, log snippet, screenshot). The customer will ask the same question two weeks later, and you will be glad you have receipts.
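Keeping receipts is easier when it costs one command. A sketch of a notes helper: `rule_out`, the `EVIDENCE_LOG` variable, and the TSV layout are all hypothetical, not something the playbook prescribes.

```shell
# Where the notes go; hypothetical default, override per incident.
EVIDENCE_LOG="${EVIDENCE_LOG:-./sip-triage-notes.tsv}"

# rule_out: append one tab-separated line:
#   UTC timestamp, hypothesis, evidence, artefact path ("-" if none).
rule_out() {
  local hypothesis="$1" evidence="$2" artefact="${3:--}"
  printf '%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$hypothesis" "$evidence" "$artefact" >> "$EVIDENCE_LOG"
}
```

For example: `rule_out H1 "INVITE visible in tcpdump on Asterisk" /tmp/call.pcap`. Two weeks later, `grep H1 sip-triage-notes.tsv` is your receipt.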

Checkpoint 7

Pick a real-or-imagined incident: "calls fail with 503 Service Unavailable from our trunk". Walk through the six hypotheses in order. Which can you rule out without leaving your laptop?

503 Service Unavailable is a server-side rejection — Asterisk (or our trunk) actually answered, just unhappily. That alone rules out the silent-drop hypotheses and narrows the tree:

H1 firewall — partially rule out. We got a SIP response, so the path is open in both directions for signaling. Still possible the RTP range is blocked, but the 503 itself proves 5060 is fine.
H5 identify — rule out. If identify didn't match, Asterisk would have dropped the INVITE silently (H5 signature), not returned 503. pjsip show endpoints from your laptop confirms the trunk is registered.
H6 header parse — rule out from the laptop by tcpdump'ing one INVITE and inspecting bytes for ASCII-only field names.
H2 NAT — keep. Asterisk could be returning 503 because the agent pool is unreachable from its own network namespace. Check pjsip show channels — if no inbound channels are landing, 503 likely means the dialplan can't route.
H3 stateless proxy — unlikely cause of 503; Record-Route issues cause mid-call drops, not synchronous 503s. Defer.
H4 codec / transport — keep. 488 Not Acceptable Here is more typical, but some carriers translate codec mismatches into 503. Check the SDP m= line in the failing INVITE.

From your laptop, in order: pjsip show endpoints (confirms trunk online, kills H5), pjsip show channels + tcpdump on a test INVITE (kills H6, narrows H2 vs H4), then pjsip set logger on on a live test call to read the rejection reason. The only hypothesis that requires the customer firewall team is H1 RTP, and you only escalate that after the laptop checks come back clean.
