Local-Ai
- Intel Arc Pro B70: 32GB for $949 and What It Does to the Inference Cost Equation
Intel launched the Arc Pro B70 on 25 March 2026 -- 32GB of GDDR6, 608 GB/s of bandwidth, $949. That's more VRAM than Nvidia's $1,800 RTX Pro 4000 Blackwell at roughly half the price. The VRAM-per-dollar calculus for local AI inference just shifted.
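For a sense of scale, here's the per-gigabyte arithmetic implied above. The B70 figures come from the launch summary; the RTX Pro 4000 Blackwell's 24GB capacity is an assumption added for illustration:

```python
# Per-GB cost comparison from the figures above. The B70 numbers are
# from the launch summary; the RTX Pro 4000 Blackwell's 24GB capacity
# is an assumption added for illustration.
cards = {
    "Intel Arc Pro B70": {"price_usd": 949, "vram_gb": 32},
    "Nvidia RTX Pro 4000 Blackwell": {"price_usd": 1800, "vram_gb": 24},  # assumed 24GB
}

for name, c in cards.items():
    print(f"{name}: ${c['price_usd'] / c['vram_gb']:.0f} per GB of VRAM")
# Roughly $30/GB vs $75/GB -- the shift the headline is about.
```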
- Building a Local AI Machine: Three Builds at £500, £800, and £1500
A practical buying guide for engineers who want to run local AI models and agents in 2026. Three tiers at £500, £800, and £1500, with honest assessments of what each actually runs.
- How MoE Sparsity and Apple Silicon SSD Architecture Make 397B Local Inference Possible
Flash-MoE runs a 397-billion-parameter model on a MacBook Pro with 5.5GB of active RAM by combining MoE weight sparsity with Apple Silicon's direct SSD-to-GPU memory architecture. This is a specific technical convergence, not a general trick, and understanding why it works on Apple Silicon but not on a standard PC changes how you think about hardware selection for local inference.
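A rough sketch of what that sparsity buys, using only figures quoted in the summaries on this page; the resident fraction is a back-of-envelope estimate, not a measurement:

```python
# What MoE sparsity buys: only a small slice of the checkpoint has to be
# resident per token. Both figures come from the summaries on this page;
# the resident fraction is a back-of-envelope estimate, not a measurement.
total_on_disk_gb = 209   # full quantized checkpoint streamed from SSD
active_ram_gb = 5.5      # working set reported for Flash-MoE

resident_fraction = active_ram_gb / total_on_disk_gb
print(f"Resident per token: ~{resident_fraction:.1%} of the checkpoint")
# ~2.6% -- a dense model of the same size would need all 209GB in memory at once.
```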
- Nvidia's Open-Source Play: Nemotron 3 and the Agentic Token Tax
Running agentic AI workflows through closed APIs is getting expensive fast. Nvidia's Nemotron 3 Super is the most credible open-weight answer yet -- but the hardware strategy underneath it is worth understanding before you reach for the Ollama docs.
- Running a 397B Model on 48GB: Flash-MoE and the Active-Parameter Insight
Dan Woods streamed a 209GB MoE model from SSD on a 48GB MacBook Pro and got 5-5.7 tokens per second. The key insight: memory constraints on local inference are about active parameters, not total ones. MoE architecture changes the math entirely.
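Back-of-envelope arithmetic on those numbers, assuming decimal gigabytes; it shows why the 48GB ceiling constrains the active slice rather than the 397B total:

```python
# Rough arithmetic on the numbers above, assuming decimal gigabytes.
total_params = 397e9   # total parameters
file_bytes = 209e9     # 209GB checkpoint on disk

bits_per_param = file_bytes * 8 / total_params
print(f"~{bits_per_param:.1f} bits per parameter")  # ~4.2 -- roughly 4-bit quantization

# The binding constraint is the active slice, not the 397B total:
unified_memory_gb = 48
print(f"Dense residency would need ~{file_bytes / 1e9:.0f}GB; the machine has {unified_memory_gb}GB.")
```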
- Open-Weight vs Frontier: How Close Is the Accuracy Gap Really?
Benchmark scores for open-weight models have converged with frontier cloud models on many tasks. But benchmarks measure what benchmarks measure. This is what the data actually says about where the gap is real and where it has closed.
- Tinybox vs Apple Silicon vs Project Digits: Which Local AI Box for Engineering Teams
Three different philosophies for running AI locally: raw GPU VRAM (Tinybox), unified memory that just works (Apple Silicon), and the Nvidia stack in a compact box (Project Digits). This is a decision guide, not a benchmark sheet.
- Local AI Inference Has Crossed a Threshold
Three things converged in 2026: hardware that can actually run useful models, open-weight models that match cloud quality for most engineering tasks, and economics that make the API-forever assumption look increasingly expensive. The architectural question has shifted from 'can you run AI locally?' to 'why are you paying per-token when you don't have to?'
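A minimal break-even sketch of that economics claim; every figure in it (hardware cost, API rate, daily token volume) is an assumed placeholder, and it ignores electricity, depreciation, and output-quality differences:

```python
# Break-even sketch: one-off hardware cost vs per-token API spend.
# Every figure below is an assumed placeholder, not a quote, and the
# model ignores electricity, depreciation, and output-quality differences.
hardware_cost_usd = 2000       # assumed local inference box
api_cost_per_mtok = 10.0       # assumed blended $ per 1M tokens
tokens_per_day = 5_000_000     # assumed agentic workload

daily_api_cost = tokens_per_day / 1e6 * api_cost_per_mtok
breakeven_days = hardware_cost_usd / daily_api_cost
print(f"API spend ~${daily_api_cost:.0f}/day -> hardware pays back in ~{breakeven_days:.0f} days")
```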