Self-Hosting
- Self-Hosting Your AI Stack: A Practical Guide
Updated 3 April 2026: Google releases Gemma 4 under Apache 2.0 -- the 26B MoE activates only 3.8B parameters at inference, the 31B Dense hits #3 on Arena, and E2B/E4B run on a Raspberry Pi in 6GB of RAM; Gemma is now a credible primary alternative to Qwen for self-hosted inference.
- Self-Hosting Your AI Stack: A Practical Guide
Updated 26 March 2026: Intel Arc Pro B70 launches with 32GB VRAM at $949, the first single card to hit that tier under $1,000; a first-person account of the LiteLLM malware incident adds depth to the supply-chain risk section.
- Mistral Open-Sourced Voice AI. Your ElevenLabs Bill Is Now a Choice.
Mistral released Voxtral TTS, an open-weights text-to-speech model with 70ms latency, support for 9 languages, and voice cloning from 3-second samples. For engineering teams building voice agents, per-character billing just became optional.
- Mistral Small 4: One Model for Reasoning, Multimodal, and Coding
Mistral Small 4 unifies reasoning, multimodal, and coding-agent capabilities in a single 119B MoE model under Apache 2.0, with 6B active parameters at inference, a 256K context window, and configurable reasoning effort. One deployment replaces three specialised models.