Inference
- How MoE Sparsity and Apple Silicon SSD Architecture Make 397B Local Inference Possible
Flash-MoE runs a 397-billion-parameter model on a MacBook Pro with 5.5 GB of active RAM by combining two properties: MoE weight sparsity, which routes each token through only a small subset of experts so only a sliver of the 397B weights is needed at any moment, and Apple Silicon's direct SSD-to-GPU memory architecture, which streams those weights in on demand. This is a specific technical convergence, not a general trick, and understanding why it works on Apple Silicon but not on a standard PC changes how you think about hardware selection for local inference.
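To see why the active footprint can be so much smaller than the total parameter count, it helps to work the arithmetic. The sketch below uses entirely hypothetical numbers (expert count, top-k routing, shared-parameter split, and quantization are assumptions, not the real model's architecture) to show how top-k expert routing shrinks the per-token working set from hundreds of billions of parameters to a few gigabytes:

```python
# Hypothetical MoE sizing sketch -- all figures below are illustrative
# assumptions, not the actual architecture of the 397B model.
TOTAL_PARAMS = 397e9       # 397B total parameters
SHARED_PARAMS = 10e9       # attention, embeddings, router: always resident (assumed)
NUM_EXPERTS = 128          # experts per MoE layer (assumed)
TOP_K = 2                  # experts routed per token (assumed)
BYTES_PER_PARAM = 0.5      # 4-bit quantized weights (assumed)

# Only TOP_K of NUM_EXPERTS experts fire per token, so only that
# fraction of the expert weights must be in memory at once.
expert_params = TOTAL_PARAMS - SHARED_PARAMS
active_expert_params = expert_params * (TOP_K / NUM_EXPERTS)
active_params = SHARED_PARAMS + active_expert_params
active_bytes = active_params * BYTES_PER_PARAM

print(f"total parameters:  {TOTAL_PARAMS / 1e9:.0f}B")
print(f"active parameters: {active_params / 1e9:.1f}B")
print(f"active weight memory: {active_bytes / 1e9:.1f} GB")
```

Under these assumed numbers roughly 96% of the expert weights can stay on SSD at any given moment; the exact active-RAM figure depends on the real model's expert count, routing width, and quantization, plus runtime overhead such as the KV cache.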