Voice on MLS — Standardizing What Was Already Encrypted

Editor’s note (2026-06-02): a clarification on two points below. Voice frames are encrypted and authenticated under a key exported from the MLS group state that is shared per epoch by everyone in the channel. The per-sender ratchet described as a goal here — binding each frame to an individual sender so co-participants can’t impersonate one another — is implemented but gated off by default; it ships once it passes live two-client validation. And the “~2 second” window I mention is the brief grace period during which a just-rotated-out key is still accepted so packets already in flight decrypt; with the VOICE-09 fix that grace is now ~10 seconds by default, while the rekey itself happens immediately on every join and leave. The site copy was updated to match; this post is left as written.

A few weeks ago I wrote about retrofitting post-quantum cryptography across the stack. DM text key exchange got hybrid X25519 + ML-KEM-768. The QUIC transport got a PQ key exchange group. Voice got mentioned almost in passing — “voice packets are already E2E encrypted with per-channel keys.” That was true. The voice path was using the same X25519+ML-KEM-768 hybrid wrap as DM text to distribute the per-call symmetric key, then encrypting frames with XChaCha20-Poly1305. Everything was hybrid. Everything was post-quantum.

So why am I writing about voice encryption again?

The honest answer: it was custom code

The voice keying scheme worked. It was correct. It was even post-quantum. But it was Voidcom-original. I designed it, I implemented it, I wrote tests for it. We’re five friends building Voidcom — three of us writing code, two testing — and on the crypto specifically I’m the one wearing the hat. Five friends in pre-beta cannot afford a third-party crypto audit, and trusting “Max wrote it carefully” is not the same as “this primitive has been audited by the cryptography community.” Custom crypto without an audit means “trust me.” That’s not a stance I want to ship security on.

There were also three real flaws I’d been avoiding looking at too closely:

No per-sender authenticity. Every member of a voice channel shared the same symmetric key. That meant any member of the channel could technically forge audio frames as any other member. The sender_id field in each packet was authenticated data — sure — but the underlying key wasn’t bound to a specific sender’s identity. If someone got into a channel they shouldn’t be in, the protocol couldn’t tell you whose voice you were actually hearing.
No forward secrecy on leave. When someone left a voice channel, they kept the key. Forever. If they were quietly recording packets the whole time, they could decrypt anything they’d captured until I manually rotated. Manual rotation is the kind of operational hygiene that breaks down exactly when you can’t afford it to.
It didn’t scale. Every membership change was an O(N) re-wrap — fine at 8 people, painful at 80, miserable at 800. Voidcom is pre-beta. We don’t have 800 people in a voice channel today. But “doesn’t scale” is the kind of debt that becomes a fire drill exactly when you don’t want one.
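
To make the first flaw concrete, here is a minimal Python sketch. It is not Voidcom's code: HMAC-SHA256 stands in for the AEAD tag (the real path used XChaCha20-Poly1305), and the function names and the placeholder key are invented for illustration. The point is only that when every member holds the same key, authenticating sender_id says nothing about who actually produced the frame.

```python
import hashlib
import hmac


def seal_tag(shared_key: bytes, sender_id: bytes, frame: bytes) -> bytes:
    """Tag a frame under the channel-wide key, with sender_id as authenticated data."""
    # Length-prefix sender_id so (sender_id, frame) pairs cannot be confused.
    msg = len(sender_id).to_bytes(2, "big") + sender_id + frame
    return hmac.new(shared_key, msg, hashlib.sha256).digest()


def verify(shared_key: bytes, sender_id: bytes, frame: bytes, tag: bytes) -> bool:
    """Check the tag; this is all a receiver holding the shared key can do."""
    return hmac.compare_digest(seal_tag(shared_key, sender_id, frame), tag)


# Placeholder key for illustration: every channel member holds this same value.
channel_key = bytes(32)

# Bob is a legitimate member, so he can label his own frame as Alice's.
forged_frame = b"audio-frame-produced-by-bob"
forged_tag = seal_tag(channel_key, b"alice", forged_frame)

# The receiver's check passes, because the key is not bound to any one sender.
forgery_accepted = verify(channel_key, b"alice", forged_frame, forged_tag)
```

The tag does its job against outsiders, who lack the key. It cannot distinguish between insiders, which is exactly the gap described above.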

I’d written an internal research doc laying these out and asking the obvious question: do we patch this, or do we replace the whole thing?

Discord did the hard work first

There’s a protocol for this. It’s called MLS — Messaging Layer Security, RFC 9420 — and it’s the IETF’s standardized answer to “how do you do continuous group key agreement at scale.” Wire uses it. Element uses it. And Discord uses it.

Discord’s variant is called DAVE, and they published the protocol whitepaper. They didn’t have to. They could have shipped it as a black box. Instead they wrote it down in detail — how the voice gateway acts as MLS Delivery Service, how External Senders work, where they skip the standard MLS PCS leaf rotation because real-time media sessions are ephemeral, what their post-leave decryption window looks like (~2 seconds, by the way). It’s a generous piece of engineering writing.

Once I’d read that paper, the structure was obvious. The voice signaling service we already had mapped cleanly onto the MLS Delivery Service role. The voice SFU could carry MLS Welcome / Commit / Proposal messages without a new gRPC service. We could lift Discord’s trade-offs (skip PCS leaf rotation, batch proposals at the gateway, ~2s window) almost wholesale.

So that’s what we did. Voidcom voice now runs on MLS 1.0, the same protocol family as DAVE. We learned the architecture from Discord’s papers and shipped our own implementation on top of mls-rs, AWS Labs’ open-source MLS library.

What MLS gives us beyond audit pedigree

The audit pedigree alone would have been enough to justify the migration — moving from “Voidcom-original protocol I’d need to convince you to trust” to “RFC 9420, implemented on top of an open-source library, used by Wire, Element, and Discord.” But MLS also fixes all three of the flaws I listed above:

Per-sender signature keys. In MLS, each member has their own signing keypair for proposals and commits. The audio frames themselves still ride a symmetric AEAD (we kept XChaCha20-Poly1305 for that), but the symmetric key is exported from MLS group state, and the trust in who can validly modify that state is rooted in per-member signatures. Member A can no longer forge group operations as member B. The “any member can spoof any other” flaw goes away as a structural property, not a patch.

Forward secrecy on leave. MLS uses a TreeKEM construction in which every commit starts a new epoch with a fresh group key. When someone leaves, the next commit advances the epoch, and within ~2 seconds the departing member’s key is dead. They can no longer decrypt new frames. We took this window directly from DAVE; it’s the right balance between forward secrecy and tolerating in-flight packets across the rotation.
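
The window is easiest to picture from the receiver's side. The following is a hedged Python sketch of a key ring that keeps a just-retired epoch key usable for a short grace period, so packets already in flight still decrypt. The class name, its methods, and the explicit `now` parameter are all assumptions made for illustration; this is not the mls-rs API or Voidcom's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class EpochKeyRing:
    """Holds the current epoch key plus recently retired keys (hypothetical helper)."""

    grace_seconds: float = 2.0
    current: Optional[Tuple[int, bytes]] = None
    # epoch -> (key, expiry time); entries are dropped once the grace period ends
    retired: Dict[int, Tuple[bytes, float]] = field(default_factory=dict)

    def rotate(self, epoch: int, key: bytes, now: float) -> None:
        """Install a new epoch key and start the grace timer on the old one."""
        if self.current is not None:
            old_epoch, old_key = self.current
            self.retired[old_epoch] = (old_key, now + self.grace_seconds)
        self.current = (epoch, key)

    def key_for(self, epoch: int, now: float) -> Optional[bytes]:
        """Return the key for a frame's epoch, or None if it must be rejected."""
        if self.current is not None and self.current[0] == epoch:
            return self.current[1]
        entry = self.retired.get(epoch)
        if entry is None:
            return None
        key, expiry = entry
        if now >= expiry:
            del self.retired[epoch]  # grace period over: forget the old key
            return None
        return key


ring = EpochKeyRing(grace_seconds=2.0)
ring.rotate(epoch=1, key=b"epoch-1-key", now=0.0)
ring.rotate(epoch=2, key=b"epoch-2-key", now=10.0)  # someone left; new epoch
```

Note that the grace period only affects which old frames a receiver still accepts. New frames are sealed under the new epoch key, which the departed member never receives.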

Scaling. TreeKEM rekeys are O(log N). I benchmarked our implementation: at N=99 (Discord’s product cap), a single Add commit costs ~1.5 ms on the committer side; a Welcome message to a new joiner is ~25 KB. The old O(N) re-wrap at 99 members would have been painful enough that we’d have had to put a low cap on voice channels and fight to raise it. MLS removes the conversation.
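
A back-of-envelope comparison shows why the asymptotics matter. This Python sketch counts public-key operations per membership change under two simplifying assumptions: the old scheme re-wraps the new key once per remaining member, and the tree is a full, balanced binary tree with no blank nodes. Real TreeKEM commits can cost more than this best case, and the function names are illustrative only.

```python
def pairwise_rewrap_ops(n: int) -> int:
    """Old scheme: one key wrap for every member other than the one rotating."""
    return max(n - 1, 0)


def tree_path_ops(n: int) -> int:
    """Best-case tree scheme: about one encryption per level on the path to the root.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    """
    return (n - 1).bit_length() if n > 1 else 0


# Channel sizes mentioned in the post, plus the 99-member cap.
comparison = {n: (pairwise_rewrap_ops(n), tree_path_ops(n)) for n in (8, 80, 99, 800)}
```

Under these assumptions a 99-member channel goes from 98 wraps per change to 7 path encryptions, and the gap widens as the channel grows.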

What we kept: post-quantum hybrid

Here’s the part where Voidcom voice diverges from Discord DAVE.

The MLS specification doesn’t dictate a ciphersuite. It defines the protocol — the TreeKEM, the External Sender extension, the export-secret derivation — and lets implementations pick which HPKE primitive does the actual key agreement. Discord DAVE picked ciphersuite 2 — MLS_128_DHKEMP256_AES128GCM_SHA256_P256. Classical NIST P-256 ECDH for HPKE, AES-128-GCM, SHA-256, ECDSA-P256 signatures. Solid. Audited. Post-quantum? No.

If we’d followed DAVE’s choice, we’d have regressed from where Voidcom already was. The old custom voice path was wrapping per-channel keys with X25519+ML-KEM-768 hybrid HPKE. DM text was already on that hybrid. Switching to MLS without a PQ ciphersuite would have meant: voice loses harvest-now-decrypt-later resistance to gain protocol standardization.

That’s not a trade I was willing to make.

So we picked ciphersuite 65100 — ML_KEM_768_X25519, the XWing combiner. HPKE built on a hybrid of ML-KEM-768 (the NIST-finalized post-quantum KEM) and X25519 (the classical one). AWS LC — AWS’s FIPS 140-3-validated cryptographic library — implements it natively. Same primitive Discord uses internally for some of their other crypto, just not the bit DAVE specifies.

Voidcom voice is now on the same MLS protocol family as Discord DAVE, but on a post-quantum hybrid ciphersuite where DAVE itself is still classical. That’s not a competition — DAVE made a tradeoff that made sense for them at the time, and they may swap when the IETF MLS hybrid-suite draft lands. We made a different call because our DM stack was already PQ-hybrid and we weren’t going to let voice regress.

The marketing version of this is “stronger than DAVE on the PQ axis.” The honest version is “we made a different ciphersuite choice for a specific reason.”

Bonus: we standardized DM crypto too

While we had AWS LC linked in for the MLS work, we did one more piece of cleanup. The DM channel-key wrap and per-recipient file-key wrap had been using my hand-rolled hybrid combiner — concat(X25519_ss, ML-KEM_ss) → HKDF-SHA512 → XChaCha20-Poly1305. Correct, but custom. With AWS LC already in the build, replacing that combiner with cs.hpke_seal() / cs.hpke_open() for the same ML_KEM_768_X25519 ciphersuite was the obvious next step.
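
For readers who have not seen one, the shape of that legacy combiner can be sketched in standard-library Python. This is an illustration, not the code that was removed: the empty salt, the info label, and the placeholder shared secrets are assumptions the post does not specify, and the final XChaCha20-Poly1305 step is left out because the standard library has no implementation of it.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha512().digest_size  # 64 bytes


def hkdf_sha512(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF as defined in RFC 5869, instantiated with SHA-512."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869 default: HashLen zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha512).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine_hybrid(x25519_ss: bytes, mlkem_ss: bytes, info: bytes) -> bytes:
    """concat(X25519_ss, ML-KEM_ss) -> HKDF-SHA512 -> 32-byte wrapping key."""
    return hkdf_sha512(x25519_ss + mlkem_ss, salt=b"", info=info, length=32)


# Placeholder shared secrets; real ones come from X25519 and ML-KEM-768.
wrap_key = combine_hybrid(b"\x01" * 32, b"\x02" * 32, info=b"example-wrap-label")
```

Nothing here is exotic, which is the argument for deleting it: every choice in a combiner like this (ordering, salt, label) is a decision someone has to defend, and a standard HPKE call makes those decisions once for everyone.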

The custom Voidcom crypto code surface dropped from ~600 LOC to ~100 LOC. The remaining ~100 lines are mostly a byte-reorder shim (the legacy keypair format had X25519 || ML-KEM ordering; AWS LC’s XWing uses ML-KEM || X25519 internally) and a tiny amount of FFI plumbing. Everything cryptographically interesting is now one HPKE call into AWS LC. Voice MLS keying, DM channel-key wrap, per-recipient file DEK wrap — three different surfaces, one audited primitive.
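
The byte-reorder shim is simple enough to sketch. The Python below assumes the shim operates on public keys and uses the standard sizes for an X25519 public key (32 bytes) and an ML-KEM-768 encapsulation key (1184 bytes); the post does not say which key material the real shim touches, and the function names are hypothetical.

```python
X25519_PK_LEN = 32      # X25519 public key size in bytes
MLKEM768_PK_LEN = 1184  # ML-KEM-768 encapsulation key size in bytes
HYBRID_PK_LEN = X25519_PK_LEN + MLKEM768_PK_LEN


def legacy_to_xwing_order(legacy_pk: bytes) -> bytes:
    """Reorder X25519 || ML-KEM (legacy layout) into ML-KEM || X25519."""
    if len(legacy_pk) != HYBRID_PK_LEN:
        raise ValueError(f"expected {HYBRID_PK_LEN} bytes, got {len(legacy_pk)}")
    x25519_pk = legacy_pk[:X25519_PK_LEN]
    mlkem_pk = legacy_pk[X25519_PK_LEN:]
    return mlkem_pk + x25519_pk


def xwing_to_legacy_order(xwing_pk: bytes) -> bytes:
    """Inverse of legacy_to_xwing_order, for reading keys back in the old layout."""
    if len(xwing_pk) != HYBRID_PK_LEN:
        raise ValueError(f"expected {HYBRID_PK_LEN} bytes, got {len(xwing_pk)}")
    mlkem_pk = xwing_pk[:MLKEM768_PK_LEN]
    x25519_pk = xwing_pk[MLKEM768_PK_LEN:]
    return x25519_pk + mlkem_pk


# Distinct filler bytes make the two halves easy to tell apart.
legacy_example = b"\xaa" * X25519_PK_LEN + b"\xbb" * MLKEM768_PK_LEN
reordered = legacy_to_xwing_order(legacy_example)
```

The strict length check is the part worth keeping in any real version: a shim that silently accepts a wrong-sized key turns a format bug into a cryptographic one.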

The Stage caveat

I can’t write a post about MLS voice without being honest about what isn’t covered.

Voice rooms above 99 participants are a separate channel type in Voidcom called Stage. Stage rooms are server-mediated rather than end-to-end encrypted. The server can read Stage audio. This is the same posture Discord takes for its Stage channels — DAVE applies to non-Stage voice only. The reason in both cases is the same: per-frame MLS in rooms of hundreds is a different engineering problem (commit-ordering throughput at the gateway, per-receiver commit-processing cost, batching strategies that maintain forward secrecy under sustained churn). It’s not impossible — Discord may eventually solve it; we may eventually solve it — but it isn’t solved today by MLS-as-DAVE-uses-it.

Rather than pretend Stage rooms are E2E or hide them in fine print, we made Stage a separate channel type with different UI and a clearly different security posture. If you’re in a regular voice channel (≤99 participants, or a DM voice call), you’re under MLS E2E. If you’re in a Stage room, you’re under server-mediated encryption. The line is drawn explicitly.

Honest about the limits

A few things that aren’t fixed by this work and that I want to call out before someone else does:

Ed25519 signatures inside MLS are still classical. ML-KEM and X25519 do the key encapsulation; the actual MLS proposals and commits are signed with Ed25519, which is not post-quantum. Why didn’t we hybridize the signatures too? Because the post-quantum threat model is asymmetric: a quantum adversary can decrypt recorded ciphertext from today (harvest-now-decrypt-later), but they cannot retroactively forge signatures from today — by the time they have a CRQC, your call is over. PQ urgency is for KEMs, not signatures. When the IETF MLS hybrid-signature draft (draft-ietf-mls-combiner) lands and mls-rs adopts it, we swap.
MLS leaf credentials are ephemeral per session. They’re not bound to your long-term Voidcom identity keys (the X25519+ML-KEM hybrid you use for DM). That’s a known follow-up — binding voice MLS identity to the long-term identity would let us build “verify your friend’s voice fingerprint” UX on top. It’s design work I haven’t done yet.
The voice gateway is trusted to order commits. It’s the External Sender for membership change proposals. A compromised gateway can’t read media — that’s the property MLS gives us — but it can selectively delay proposals or refuse to add a member. This is the same trust assumption DAVE makes; the gateway isn’t a passive forwarder.
The wrapper code on top of mls-rs is still mine. ~600 lines of voidcom_crypto::mls. That part isn’t audited externally; it’d be the natural target if and when we can afford a review. The line count matches the ~600 lines of custom crypto primitives we started with, but glue around an audited library is a much lower-risk surface to defend than hand-rolled primitives.
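
On the gateway point, "ordering commits" can be pictured as a small sequencer: exactly one commit wins each epoch, and anything built on a stale epoch is rejected and must be retried. The Python sketch below is a generic illustration of that Delivery Service role under assumed names; it is not Voidcom's gateway and it ignores proposals, batching, and authentication entirely.

```python
from typing import List, Tuple


class CommitSequencer:
    """Accepts at most one commit per epoch (hypothetical, simplified gateway role)."""

    def __init__(self) -> None:
        self.epoch = 0
        self.log: List[Tuple[int, bytes]] = []  # (epoch the commit was based on, commit)

    def submit(self, based_on_epoch: int, commit: bytes) -> bool:
        """Return True if the commit is accepted and the epoch advances."""
        if based_on_epoch != self.epoch:
            return False  # stale or future epoch: sender must rebase and retry
        self.log.append((based_on_epoch, commit))
        self.epoch += 1
        return True


sequencer = CommitSequencer()
first = sequencer.submit(0, b"commit-from-alice")
second = sequencer.submit(0, b"commit-from-bob")  # raced with Alice, based on epoch 0
third = sequencer.submit(1, b"commit-from-bob-rebased")
```

The sketch also shows where the trust sits: whoever runs `submit` decides which commit wins and when, which is the delay-or-refuse power described above, even though it never sees a media key.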

The takeaway, for anyone else building this kind of thing

Custom cryptography is a debt. Even when it’s correct. Even when it’s post-quantum. Even when you wrote tests, even when you understand the literature, even when you’re conscientious about it. The debt is the audit you can’t afford to commission.

Standardized cryptography inherits the audit community. RFC 9420 has been beaten on by far more eyes than I’ll ever have. AWS LC is FIPS 140-3 validated and ships in production at companies with real budget for this. mls-rs is open source, reviewed by the AWS Labs team and the wider MLS community. None of this means it’s perfect — but it means the ratio of “code I personally have to defend” to “code that’s been independently scrutinized” is dramatically better.

For five friends in pre-beta, the right call is to push as much as possible toward standards. Custom multiplies your own audit cost. Standard lets you inherit the audits that everyone before you already paid for.

I’ll keep building. I’ll keep reading. And the next time I find a piece of Voidcom code that’s correct but custom and could be standard instead, I’ll make the same trade.

— Max