Architecting Element Call: Escaping Docker Bottlenecks, Double Encryption, and WebRTC Port Ranges

Deploying a native, high-performance video conferencing stack for a sovereign Matrix homeserver requires far more than just spinning up a few containers. At the core of Element Call’s architecture are two critical components: LiveKit (operating as the Selective Forwarding Unit, or SFU) and Coturn (acting as the STUN/TURN relay).

When engineering the communications stack for our sovereign appliances, we found that blindly following generic WebRTC deployment tutorials leads to avoidable performance bottlenecks, excessive memory overhead, and fundamental misunderstandings about what is actually encrypted, and where.

If you have ever watched your Docker daemon hang indefinitely on startup, or wondered why a TURN relay is even necessary alongside an SFU, this architectural deep-dive will clarify exactly how to build a zero-compromise Element Call environment.

1. The Docker Daemon Murderer: UDP Port Ranges

The most common mistake in containerized WebRTC deployments is exposing massive blocks of UDP ports. Standard documentation often suggests allocating a range like 60000-60200 for media transport.

When you map a wide port range in a docker-compose.yml file, Docker does not write a single blanket iptables rule. It installs forwarding rules for each port and, with its default userland proxy enabled, also spawns a separate docker-proxy process for every published port in the range, potentially one more per address family on IPv6-enabled hosts. Mapping hundreds of ports therefore launches hundreds of processes that together can consume hundreds of megabytes of RAM (more for wider ranges), spike CPU usage during initialization, and make container startup agonizingly slow.
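
To put rough numbers on this, the sketch below counts how many docker-proxy processes a compose-style port mapping would spawn and multiplies that by a per-process memory figure. It is a back-of-the-envelope estimate, not a measurement: `estimate_docker_proxy` is a hypothetical helper, `rss_mib_per_proxy` is a placeholder you should replace with what `ps` reports on your own host, and `ip_families` encodes the assumption of one proxy per published port per address family.

```python
import re
from dataclasses import dataclass

# Compose short syntax "HOST[-HOST_END]:CONTAINER[-CONTAINER_END][/PROTO]".
# Host-IP prefixes such as "127.0.0.1:" are not handled in this sketch.
PORT_SPEC = re.compile(r"^(\d+)(?:-(\d+))?:(\d+)(?:-(\d+))?(?:/(tcp|udp))?$")


@dataclass
class ProxyEstimate:
    published_ports: int
    proxy_processes: int
    est_rss_mib: float


def estimate_docker_proxy(spec: str, ip_families: int, rss_mib_per_proxy: float) -> ProxyEstimate:
    """Rough docker-proxy count and memory for one port mapping (hypothetical helper)."""
    m = PORT_SPEC.match(spec)
    if m is None:
        raise ValueError(f"unsupported port spec: {spec!r}")
    host_start = int(m.group(1))
    host_end = int(m.group(2) or m.group(1))
    cont_start = int(m.group(3))
    cont_end = int(m.group(4) or m.group(3))
    if host_end < host_start:
        raise ValueError("host range is reversed")
    if (host_end - host_start) != (cont_end - cont_start):
        raise ValueError("host and container ranges must be the same size")
    ports = host_end - host_start + 1
    # Assumption: one userland proxy per published port per host address family.
    processes = ports * ip_families
    return ProxyEstimate(ports, processes, processes * rss_mib_per_proxy)
```

With the 60000-60200 range, both address families, and an assumed 4 MiB per proxy, the estimate is 402 processes and about 1,608 MiB; a single 7882/udp mapping comes to 2 processes. Publishing fewer ports is one fix; setting `"userland-proxy": false` in Docker's daemon.json, or using host networking, avoids the per-port processes altogether.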

The Architectural Fix: UDP Multiplexing. We eliminated the UDP port range entirely. LiveKit supports UDP multiplexing: setting a single rtc.udp_port makes it carry every participant’s media through one UDP socket, in our case 7882/udp. The SFU demultiplexes that socket internally, matching each participant’s first ICE connectivity check by the username fragment (ufrag) it issued during signalling, then routing that participant’s subsequent DTLS and SRTP packets by their source address. With a single published UDP port instead of hundreds, the stack initializes almost instantly and docker-proxy memory bloat disappears, without, in our experience, sacrificing concurrency or call capacity.
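
In practice a UDP mux demultiplexes on ICE state rather than on SRTP headers. The sketch below models that, assuming the approach used by Pion-style UDP muxes (LiveKit is built on Pion, but this is not LiveKit’s code): the SFU registers the ufrag it handed each participant, learns the participant’s source address from their first STUN Binding Request, and routes every later packet from that address to the same session. `UdpMuxSketch`, `expect`, and `route` are hypothetical names, and message-integrity checks, address changes, and timeouts are ignored.

```python
import struct
from typing import Dict, Optional, Tuple

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value from RFC 5389
STUN_BINDING_REQUEST = 0x0001
STUN_ATTR_USERNAME = 0x0006

Addr = Tuple[str, int]


def stun_username(packet: bytes) -> Optional[str]:
    """Return the USERNAME attribute of a STUN Binding Request, or None."""
    # STUN messages start with two zero bits and a 20-byte header.
    if len(packet) < 20 or (packet[0] & 0xC0) != 0:
        return None
    msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    if cookie != STUN_MAGIC_COOKIE or msg_type != STUN_BINDING_REQUEST:
        return None
    end = 20 + msg_len
    if end > len(packet):
        return None
    offset = 20
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        value = packet[offset + 4:offset + 4 + attr_len]
        if attr_type == STUN_ATTR_USERNAME:
            return value.decode("utf-8", errors="replace")
        offset += 4 + attr_len + (-attr_len % 4)  # values are padded to 4 bytes
    return None


class UdpMuxSketch:
    """Routes datagrams from one shared UDP socket to per-participant sessions."""

    def __init__(self) -> None:
        self.by_ufrag: Dict[str, str] = {}  # SFU-side ufrag -> session
        self.by_addr: Dict[Addr, str] = {}  # learned client address -> session

    def expect(self, local_ufrag: str, session_id: str) -> None:
        """Register the ufrag the SFU gave this participant during signalling."""
        self.by_ufrag[local_ufrag] = session_id

    def route(self, packet: bytes, remote: Addr) -> Optional[str]:
        username = stun_username(packet)
        if username is not None:
            # A check sent to us carries USERNAME "<our ufrag>:<their ufrag>".
            session = self.by_ufrag.get(username.split(":", 1)[0])
            if session is not None:
                self.by_addr[remote] = session  # remember where this client sends from
            return session
        # DTLS and SRTP packets carry no ufrag, so route them by source address.
        return self.by_addr.get(remote)
```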

2. Protocol Realities: Why UDP is Mandatory and TCP is a Last Resort

In our configuration, we explicitly open 7882/udp alongside a 7881/tcp fallback. Understanding why UDP is prioritized requires looking at how transport layers handle packet loss.

Real-time media must travel over UDP. UDP is connectionless and never retransmits: if a packet carrying part of a video frame is lost in transit, it is simply gone, and the receiver’s jitter buffer and decoder conceal the gap or skip ahead to the next frame. The user sees, at worst, a brief and usually unnoticeable glitch.

TCP, conversely, guarantees reliable, in-order delivery. If a segment is lost, the receiver cannot pass any later data to the application until the retransmission arrives. In a video call, this causes head-of-line blocking: the video freezes and stutters, and the audio turns robotic, while the protocol recovers stale data that is no longer relevant to the real-time conversation.
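
The difference can be shown with a toy timeline: packets are sent every 20 ms with a fixed one-way delay, one packet is lost, and TCP’s retransmitted copy is assumed to arrive a fixed time later. The model is deliberately simplified (no congestion control, no jitter, and a hypothetical `retransmit_ms` standing in for TCP’s real recovery time); it only illustrates in-order release.

```python
from typing import Dict, Set


def udp_delivery(n: int, interval_ms: int, delay_ms: int, lost: Set[int]) -> Dict[int, int]:
    """Each datagram reaches the app on arrival; lost datagrams never do."""
    return {i: i * interval_ms + delay_ms for i in range(n) if i not in lost}


def tcp_delivery(n: int, interval_ms: int, delay_ms: int, lost: Set[int],
                 retransmit_ms: int) -> Dict[int, int]:
    """In-order delivery: nothing behind a gap is released until the gap is filled."""
    delivered: Dict[int, int] = {}
    release = 0
    for i in range(n):
        arrival = i * interval_ms + delay_ms
        if i in lost:
            arrival += retransmit_ms  # hypothetical extra delay for the retransmitted copy
        # Segment i cannot be handed to the app before segment i - 1.
        release = max(release, arrival)
        delivered[i] = release
    return delivered
```

With six packets, a 50 ms one-way delay, packet 2 lost, and a 200 ms retransmission penalty, UDP hands packet 3 to the application at 110 ms and never delivers packet 2, while TCP holds packets 2 through 5 until 290 ms: one lost packet stalls everything queued behind it.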

We prioritize 7882/udp for raw performance. However, our infrastructure must also survive highly restrictive corporate firewalls, which often drop unrecognized UDP traffic entirely. LiveKit therefore also offers an ICE candidate on 7881/tcp: when every UDP connectivity check fails, ICE selects the TCP path and the media itself flows over TCP, trading the head-of-line penalty for connectivity.

3. The Coturn Misconception: SFUs Do Not Replace Relays

There is a pervasive myth among system administrators: “If I am using an SFU like LiveKit to handle routing, I don’t need a STUN/TURN server.” This is architecturally false.

To understand why, you have to separate the destination from the path:

  • LiveKit (The SFU): The destination. It receives your client’s video track and selectively forwards it to the other room participants, so your device uploads its stream once instead of sending a separate copy to every other participant.
  • Coturn (The TURN Server): The path. It relays media across network topologies that would otherwise block it.

NAT on its own, even Carrier-Grade NAT (CGNAT), is usually survivable here, because the client initiates the connection toward LiveKit’s public address. The real problem is a restrictive corporate firewall or proxy that only permits outbound traffic to a handful of ports and may drop UDP entirely. During the Interactive Connectivity Establishment (ICE) phase, every connectivity check the client sends toward LiveKit’s ports fails, and no direct path to the SFU can be established.

Without a TURN server, that user gets a black screen. Coturn solves this by acting as a public relay: the restricted client connects to Coturn on a port its network permits, and Coturn forwards that media to LiveKit. (Pro-tip: We deploy Coturn using network_mode: host to bypass Docker’s NAT entirely, letting it allocate its ephemeral relay ports directly on the host interface with no docker-proxy processes and no container-networking overhead.)
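
Which path wins is decided by ICE candidate-pair priorities. The sketch below uses the candidate and pair priority formulas and the recommended type preferences from RFC 8445 to rank three paths: direct UDP to 7882, direct TCP to 7881, and a relay through Coturn. The local-preference values that put UDP ahead of TCP are assumptions (real ICE stacks choose their own), the client is assumed to be the controlling agent, and `select_pair` simplifies nomination to "the highest-priority pair whose checks succeeded".

```python
from typing import Dict, Optional, Set, Tuple

# RFC 8445 recommended type preferences.
TYPE_PREF = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}

# Assumed local preferences: this sketch ranks UDP above TCP here;
# real ICE stacks pick their own values.
LOCAL_PREF = {"udp": 65535, "tcp": 32767}


def candidate_priority(cand_type: str, transport: str, component_id: int = 1) -> int:
    """RFC 8445 candidate priority: type pref, then local pref, then component."""
    return ((1 << 24) * TYPE_PREF[cand_type]
            + (1 << 8) * LOCAL_PREF[transport]
            + (256 - component_id))


def pair_priority(g: int, d: int) -> int:
    """RFC 8445 pair priority; g = controlling agent's candidate, d = controlled's."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


# (client candidate, SFU candidate) for each path; the client is assumed controlling.
PATHS: Dict[str, Tuple[int, int]] = {
    "direct-udp-7882": (candidate_priority("host", "udp"), candidate_priority("host", "udp")),
    "direct-tcp-7881": (candidate_priority("host", "tcp"), candidate_priority("host", "tcp")),
    # Relayed candidate allocated on Coturn, paired with the SFU's UDP candidate.
    "relay-via-coturn": (candidate_priority("relay", "udp"), candidate_priority("host", "udp")),
}


def select_pair(succeeded: Set[str]) -> Optional[str]:
    """Pick the highest-priority path whose connectivity checks succeeded."""
    if not succeeded:
        return None  # ICE failed: no media path at all
    return max(succeeded, key=lambda name: pair_priority(*PATHS[name]))
```

If the network lets every check through, direct UDP wins; if UDP is blocked, TCP wins; the relay is chosen only when both direct paths fail, because a relayed candidate’s type preference of 0 drags its pair to the bottom. With no successful pair at all, the call cannot connect, which is the black-screen case.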

4. The “Double Encryption” Trap: Why We Dropped TURNS

Security audits frequently flag the use of standard STUN/TURN (port 3478) and blindly mandate the use of STUNS/TURNS (port 5349 wrapped in TLS). We deliberately chose to omit port 5349 and run purely on 3478.

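Why? Because wrapping a TURN relay in TLS double-encrypts media that is already encrypted. It adds no confidentiality to the media payload, while adding TLS handshake and per-record cryptographic cost and, because TLS runs over TCP, exposing the client-to-relay leg to the head-of-line blocking described in section 2.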

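By protocol design, WebRTC mandates that all media be encrypted with SRTP (Secure Real-time Transport Protocol), with the keys negotiated through a DTLS handshake (DTLS-SRTP). The media payload traversing the Coturn server is therefore already opaque: neither the relay server nor the ISP can decrypt the audio or video. Wrapping this already-encrypted traffic in a secondary TLS tunnel solely to satisfy a checkbox creates administrative overhead (managing TLS certificates inside the Coturn container) without improving media confidentiality; the most it adds is hiding TURN control messages and RTP header metadata from observers on the client-to-relay leg. The real trade-off is reachability, not security: TURN over TLS, often served on 443/tcp, is the usual way through networks that only allow HTTPS-like traffic, and by dropping it we accept that clients on such networks cannot reach the relay.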
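
To make "opaque" concrete: any hop on the path, Coturn included, can classify each datagram by its first byte using the ranges defined in RFC 7983, and can read the fixed 12-byte RTP header, which SRTP authenticates but does not encrypt. Everything after that header is ciphertext. The function names below are illustrative, and the sketch ignores CSRC lists and header extensions.

```python
import struct
from typing import NamedTuple


def classify(packet: bytes) -> str:
    """Classify a datagram by its first byte using the ranges from RFC 7983."""
    if not packet:
        return "empty"
    b = packet[0]
    if b <= 3:
        return "stun"
    if 16 <= b <= 19:
        return "zrtp"
    if 20 <= b <= 63:
        return "dtls"
    if 64 <= b <= 79:
        return "turn-channel"
    if 128 <= b <= 191:
        return "rtp-or-rtcp"
    return "unknown"


class RtpHeader(NamedTuple):
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def visible_rtp_header(packet: bytes) -> RtpHeader:
    """Read the fixed 12-byte RTP header that SRTP leaves in cleartext."""
    if len(packet) < 12 or classify(packet) != "rtp-or-rtcp":
        raise ValueError("not an RTP/RTCP packet")
    if 192 <= packet[1] <= 223:
        raise ValueError("RTCP packet (RFC 5761 demultiplexing), not RTP")
    _, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    # Low 7 bits of the second byte are the payload type; the top bit is the marker.
    return RtpHeader(b1 & 0x7F, seq, ts, ssrc)
```

A relay or ISP therefore still sees packet timing, sizes, payload types, sequence numbers, and SSRCs, but not the audio or video itself.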

A Note on End-to-End Encryption (E2EE): DTLS-SRTP provides hop-by-hop encryption (Client -> SFU -> Client). The SFU decrypts SRTP on each leg in order to route the tracks. For true E2EE, where even the LiveKit server cannot read the media, Element Call uses LiveKit’s end-to-end encryption, which hooks the browser’s Insertable Streams (WebRTC Encoded Transform) API to encrypt each encoded frame, in the spirit of SFrame (Secure Frame), with keys exchanged through Matrix. Because frames are encrypted before they enter the WebRTC transport layer, the media stays confidential regardless of the transit path, provided the keys themselves remain secret.

5. Securing the Perimeter: Application vs. Network Layer Authentication

A sovereign Matrix homeserver should be invisible to the public. In our stack, the Tuwunel Matrix backend is strictly locked behind a VPN, enforced by a Traefik middleware (vpn-only@docker). Yet, to function correctly, the LiveKit media ports and the lk-jwt-service must remain exposed to the public internet.

Does this public exposure compromise the VPN perimeter? No. The security simply shifts from the network layer to the application layer.

Here is the chain of cryptographic trust:

  1. A client connects to the secure VPN.
  2. They authenticate natively with the private Tuwunel Matrix homeserver and request a short-lived Matrix OpenID token from it.
  3. To join a voice/video channel, the client sends that OpenID token to the public LiveKit JWT Service (/sfu/get).
  4. The JWT Service validates the OpenID token against the Tuwunel homeserver over the internal Docker network, which confirms the user’s Matrix ID.
  5. Because an external attacker cannot reach the VPN-locked Matrix server, they cannot obtain a valid OpenID token in the first place, and the JWT Service rejects their request.
  6. For valid users, the JWT Service issues a short-lived LiveKit JWT, signed with the LiveKit API secret, granting them access to the call’s room on the LiveKit SFU.
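
The token step can be sketched with the Python standard library alone. In lk-jwt-service the credential the client presents is a short-lived Matrix OpenID token rather than its main access token, and the service checks it against the homeserver’s federation `openid/userinfo` endpoint; here that check is an injected `verify_openid` callable so the sketch runs offline. The claim layout (`iss` as the API key, `sub` as the participant identity, a `video` grant with `room` and `roomJoin`) follows LiveKit’s access-token format, but `handle_sfu_get`, the one-hour lifetime, the identity, and the room name are hypothetical, and this is not the service’s actual code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Callable, Optional


def b64url(data: bytes) -> str:
    """Base64url without padding, as required for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_livekit_token(api_key: str, api_secret: str, identity: str, room: str,
                       ttl_s: int = 3600, now: Optional[int] = None) -> str:
    """Build an HS256 JWT shaped like a LiveKit room-join token."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,         # LiveKit looks up the signing secret by API key
        "sub": identity,        # participant identity inside the room
        "nbf": issued,
        "exp": issued + ttl_s,  # short-lived; one hour is a placeholder
        "video": {"room": room, "roomJoin": True},
    }
    segments = [
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(api_secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url(signature)])


def handle_sfu_get(openid_token: str, room: str,
                   verify_openid: Callable[[str], Optional[str]],
                   api_key: str, api_secret: str, now: Optional[int] = None) -> Optional[str]:
    """Hypothetical /sfu/get handler: verify the Matrix identity, then mint a token."""
    # In production, verify_openid would query the homeserver's
    # /_matrix/federation/v1/openid/userinfo endpoint and return the "sub" user ID.
    user_id = verify_openid(openid_token)
    if user_id is None:
        return None  # unknown or expired token: reject the request
    return mint_livekit_token(api_key, api_secret, user_id, room, now=now)
```

A request carrying a token the homeserver does not recognise returns None; a valid one yields a JWT that LiveKit verifies with the same shared API secret and that expires after `ttl_s` seconds.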

By exposing the media ports and token service publicly, we let heavy, high-bandwidth UDP video traffic take the direct path across the open internet instead of bottlenecking inside the VPN tunnel, while access control stays anchored to the VPN-only homeserver: without Matrix credentials from that server, no one receives a LiveKit token, and without a token, no one joins a call.