Version History: Self-Hosting Your AI Stack: A Practical Guide


Changelog

Date – Summary
26 Mar 2026 – Intel Arc Pro B70 and B65 launch with 32GB VRAM at $949, adding a new competitive option at the 32GB inference tier; a first-person account of the LiteLLM malware incident adds depth to the supply-chain risk section.
25 Mar 2026 – Google TurboQuant achieves zero-accuracy-loss KV cache compression at ICLR 2026 (429 HN points); Ensu by Ente ships privacy-first local LLM app with E2E encrypted sync (293 HN points); GitHub Copilot data usage policy updated.
24 Mar 2026 – LiteLLM 1.82.7/1.82.8 on PyPI compromised with credential-stealing payload (182 HN points); supply-chain risk hits AI middleware directly; Hypura adds storage-tier-aware Apple Silicon inference scheduler for models exceeding RAM.
23 Mar 2026 – iPhone 17 Pro demonstrated running 400B MoE via SSD streaming at 0.6 tok/s on A19 Pro, extending the Flash-MoE SSD streaming story from MacBook Pro to mobile Apple Silicon.
22 Mar 2026 – Flash-MoE runs Qwen3.5-397B on a MacBook Pro M3 Max 48GB via SSD streaming at 4.4 tok/s; Apple Silicon inference ceiling updated.
21 Mar 2026 – Quiet day, thesis holds.
20 Mar 2026 – Supermicro co-founder charged in $2.5B smuggling plot adds supply-chain risk signal; MacBook M5 Pro running Qwen3.5 extends the Apple Silicon inference case.
19 Mar 2026 – KittenTTS (<25MB) extends local voice inference beyond Apple Silicon to any hardware.
18 Mar 2026 – Quiet day, thesis holds.
17 Mar 2026 – Quiet day, thesis holds.
16 Mar 2026 – Quiet day, thesis holds.
15 Mar 2026 – Quiet day, thesis holds.
14 Mar 2026 – Quiet day, thesis holds.
13 Mar 2026 – Qatar helium shutdown adds near-term GPU supply risk to the hardware calculus; CanIRun.ai (463 HN points) is a useful model-compatibility checker for self-hosters.
12 Mar 2026 – RTX 5090 (32GB VRAM) adds a single-card path to Qwen3.5-35B-A3B; hardware section updated.
11 Mar 2026 – Microsoft BitNet shows 100B 1-bit models running at readable speed on a single CPU, adding nuance to the CPU inference section.
10 Mar 2026 – Quiet day, thesis holds.
9 Mar 2026 – US Appeals Court rules TOS updates by email bind users via implied consent, strengthening the vendor lock-in argument for self-hosting.
8 Mar 2026 – Quiet day, thesis holds.
7 Mar 2026 – Quiet day, thesis holds.
6 Mar 2026 – A quieter day; nothing today that shifts the thesis.
5 Mar 2026 – PersonaPlex 7B enables local full-duplex voice on Apple Silicon; AMD Ryzen AI 400 desktop (50 TOPS NPU, AM5).
4 Mar 2026 – Apple MacBook Neo ($599, A18 Pro, ships Mar 11).
2 Mar 2026 – Initial publication.

Snapshots

No dated snapshots yet.