Blog

Technical notes on speech, language, and machine learning.

Autoregressive Models for Speech

Speech Synthesis Jun 2026

EnCodec: High-Fidelity Neural Audio Codec with Streaming and Variable Bitrate

Encoder/decoder architecture, residual vector quantization (RVQ) with EMA codebook updates, the MS-STFT discriminator, the loss balancer, streaming vs. non-streaming modes, variable bitrate, and ablation results.

Speech Synthesis Jun 2026

Codec-based TTS Pipeline: RVQ, Semantic Tokens, and Acoustic Tokens

RVQ mechanics, the codebook delay pattern, semantic vs. acoustic tokens, codebook collapse, exposure bias, streaming implementation, and a comparison of EnCodec, DAC, and Mimi.