Version History: Self-Hosting Your AI Stack: A Practical Guide
Changelog
| Date | Summary |
|---|---|
| 26 Mar 2026 | Intel Arc Pro B70 and B65 launch with 32GB VRAM at $949, adding a new competitive option at the 32GB inference tier; first-person LiteLLM malware incident account adds depth to the supply-chain risk section. |
| 25 Mar 2026 | Google TurboQuant achieves zero-accuracy-loss KV cache compression at ICLR 2026 (429 HN points); Ensu by Ente ships privacy-first local LLM app with E2E encrypted sync (293 HN points); GitHub Copilot data usage policy updated. |
| 24 Mar 2026 | LiteLLM 1.82.7/1.82.8 on PyPI compromised with credential-stealing payload (182 HN points); supply-chain risk hits AI middleware directly; Hypura adds storage-tier-aware Apple Silicon inference scheduler for models exceeding RAM. |
| 23 Mar 2026 | iPhone 17 Pro demonstrated running 400B MoE via SSD streaming at 0.6 tok/s on A19 Pro, extending the Flash-MoE SSD streaming story from MacBook Pro to mobile Apple Silicon. |
| 22 Mar 2026 | Flash-MoE runs Qwen3.5-397B on a MacBook Pro M3 Max 48GB via SSD streaming at 4.4 tok/s; Apple Silicon inference ceiling updated. |
| 21 Mar 2026 | Quiet day, thesis holds. |
| 20 Mar 2026 | Supermicro co-founder charged in $2.5B smuggling plot adds a supply-chain risk signal; MacBook M5 Pro running Qwen3.5 extends the Apple Silicon inference case. |
| 19 Mar 2026 | KittenTTS (<25MB) extends local voice inference beyond Apple Silicon to any hardware. |
| 18 Mar 2026 | Quiet day, thesis holds. |
| 17 Mar 2026 | Quiet day, thesis holds. |
| 16 Mar 2026 | Quiet day, thesis holds. |
| 15 Mar 2026 | Quiet day, thesis holds. |
| 14 Mar 2026 | Quiet day, thesis holds. |
| 13 Mar 2026 | Qatar helium shutdown adds near-term GPU supply risk to the hardware calculus; CanIRun.ai (463 HN points) is a useful model-compatibility checker for self-hosters. |
| 12 Mar 2026 | RTX 5090 (32GB VRAM) adds a single-card path to Qwen3.5-35B-A3B; hardware section updated. |
| 11 Mar 2026 | Microsoft BitNet shows 100B 1-bit models running at readable speed on a single CPU, adding nuance to the CPU inference section. |
| 10 Mar 2026 | Quiet day, thesis holds. |
| 9 Mar 2026 | US Appeals Court rules TOS updates by email bind users via implied consent, strengthening the vendor lock-in argument for self-hosting. |
| 8 Mar 2026 | Quiet day, thesis holds. |
| 7 Mar 2026 | Quiet day, thesis holds. |
| 6 Mar 2026 | A quieter day; nothing today shifts the thesis. |
| 5 Mar 2026 | PersonaPlex 7B enables local full-duplex voice on Apple Silicon; AMD Ryzen AI 400 desktop (50 TOPS NPU, AM5). |
| 4 Mar 2026 | Apple MacBook Neo ($599, A18 Pro, ships Mar 11). |
| 2 Mar 2026 | Initial publication |
Snapshots
No dated snapshots yet.