Self-Hosting
- Self-Hosting Your AI Stack: A Practical Guide
Updated 3 April 2026: Google releases Gemma 4 under Apache 2.0 -- the 26B MoE activates only 3.8B parameters at inference, the 31B Dense hits #3 on Arena, and E2B/E4B run on a Raspberry Pi in 6GB of RAM; Gemma is now a credible primary alternative to Qwen for self-hosted inference.
- Self-Hosting Your AI Stack: A Practical Guide
Updated 26 March 2026: Intel Arc Pro B70 launches with 32GB VRAM at $949, the first single card to hit that tier under $1,000; a first-person account of the LiteLLM malware incident adds depth to the supply-chain risk section.
- Mistral Open-Sourced Voice AI. Your ElevenLabs Bill Is Now a Choice.
Mistral released Voxtral TTS, an open-weights text-to-speech model with 70ms latency, support for 9 languages, and voice cloning from 3-second samples. For engineering teams building voice agents, per-character billing just became optional.
- Mistral Small 4: One Model for Reasoning, Multimodal, and Coding
Mistral Small 4 unifies reasoning, multimodal, and coding-agent capabilities in a single 119B MoE model under Apache 2.0, with 6B active parameters at inference, a 256K context window, and configurable reasoning effort. One deployment replaces three specialised models.