Commissioned, Curated and Published by Russ. Researched and written with AI.


What’s New This Week

The Trivy supply chain attack (March 19-21, 2026) is a reminder that the security tooling in your CI pipeline is itself an attack surface – but that’s covered in the Security signal. On the hardware side: Nvidia’s RTX 50-series (Blackwell consumer) is rolling out, with the RTX 5090 confirmed at 32GB GDDR7. Pricing has not stabilised enough to recommend it, and 4090-class cards haven’t dropped in price in response yet. Hold off on Blackwell unless you need the bleeding edge. The RX 7900 XTX remains the VRAM-per-pound leader at the high end.


Changelog

Date        | Summary
23 Mar 2026 | Initial publication.

The Principle: VRAM Is the Constraint

For local LLM inference, one number matters more than any other: VRAM. The rule of thumb at standard quantisation (Q4/Q5) is roughly 0.6-0.8GB of VRAM per billion parameters plus runtime overhead – the often-quoted 2GB per billion applies to unquantised FP16 weights. A 13B model needs around 8-10GB. A 70B model needs around 40GB. These are floors, not ceilings – context window and batch size push the number up.
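To make that arithmetic concrete, here is a minimal sketch in Python. The bytes-per-parameter figures and the flat overhead constant are assumptions of mine (Q4 ≈ 0.6 bytes per parameter, ~1.5GB for KV cache and runtime buffers), not measured values, so treat the output as a floor estimate.

```python
# Rough VRAM floor estimate for local LLM inference.
# Assumed constants, not measured: bytes per parameter at each quantisation,
# plus a flat ~1.5GB for KV cache and runtime buffers.
BYTES_PER_PARAM = {"q4": 0.6, "q5": 0.69, "q8": 1.0, "fp16": 2.0}

def estimate_vram_gb(params_billion: float, quant: str = "q4",
                     overhead_gb: float = 1.5) -> float:
    """Weights plus fixed overhead. Long contexts and big batches
    push the real figure higher."""
    return params_billion * BYTES_PER_PARAM[quant] + overhead_gb

for size in (7, 13, 34, 70):
    print(f"{size}B @ Q4: ~{estimate_vram_gb(size):.1f} GB")
# 7B ~5.7GB, 13B ~9.3GB, 34B ~21.9GB, 70B ~43.5GB – consistent
# with the tiers below.
```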

The practical thresholds in 2026:

  • 8GB VRAM: 3-7B models comfortably. Entry point, fine for experimentation, not recommended if you’re running agents.
  • 12GB VRAM: 7-8B well, 13B in a squeeze. Workable but you’ll feel the limit quickly.
  • 16GB VRAM: 13-14B models smoothly, 30B with aggressive quantisation. The practical agent floor. Most people running local coding assistants or agent loops sit here.
  • 24GB VRAM: 30-34B comfortably, quantised 70B possible. Where serious inference work lives.
  • 48GB+: 70B cleanly, 120B with quantisation. Research territory or production pipelines.

The implication: any GPU you buy in 2026 should have at least 12GB VRAM. Ideally 16GB. The 8GB cards are already a compromise and models are not getting smaller.
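If you are unsure what your current card actually offers, the check is one call. A minimal sketch assuming a PyTorch install – the ROCm build of PyTorch exposes the same torch.cuda API on AMD cards:

```python
# Report the name and total VRAM of the first visible GPU.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No GPU visible to PyTorch – CPU inference territory.")
```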


£500 “The Enabler” – CPU Inference Focus

For a first local AI machine or a tight budget, you don’t need a discrete GPU. The AMD Ryzen 5 8600G has an integrated Radeon 760M that can allocate up to 8GB of shared VRAM from system RAM, running 7B models at reasonable speed.

Component   | Choice            | Approx. Cost
CPU         | AMD Ryzen 5 8600G | £190
Motherboard | B650 budget       | £80
RAM         | 64GB DDR5         | £90
Storage     | 1TB NVMe SSD      | £55
Case + PSU  | 550W 80+ Gold     | £70
Total       |                   | ~£500

The 64GB RAM matters here: it feeds the iGPU with enough headroom and also enables CPU inference on larger models via llama.cpp. With CPU inference, a 13B Q4 model is usable – just slow (a few tokens per second). The iGPU path gets you 7B at something approaching interactive speed.
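As a concrete example of the CPU path, a minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python). The model path is a placeholder and the thread count should match your physical cores; for the iGPU route you would build llama.cpp with a GPU backend (e.g. Vulkan) and raise n_gpu_layers above zero.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/13b-chat.Q4_K_M.gguf",  # placeholder: any 13B Q4 GGUF
    n_gpu_layers=0,   # 0 = pure CPU inference; >0 offloads layers on a GPU/iGPU build
    n_ctx=4096,       # context window; larger contexts need more RAM
    n_threads=6,      # match the 8600G's physical core count
)

out = llm("Q: Why does VRAM matter for local inference? A:", max_tokens=64)
print(out["choices"][0]["text"])
```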

What it runs: 7B-13B models via CPU inference, 7B reasonably fast via iGPU. What it doesn’t: Fast inference on anything over 13B. Who it’s for: First local AI machine, experimenting, tight budget.

£800 “The Sweet Spot” – 16GB VRAM on CUDA

This is the sweet spot for 2026. The RTX 4060 Ti 16GB is the specific recommendation: 16GB VRAM at the lowest price point of any 16GB card, full CUDA support, and 13-14B models at 20-30 tokens/sec. That’s fast enough for real work.

Component   | Choice            | Approx. Cost
CPU         | AMD Ryzen 7 7700X | £185
GPU         | RTX 4060 Ti 16GB  | £300
Motherboard | B650 mid-range    | £100
RAM         | 32GB DDR5         | £60
Storage     | 2TB NVMe Gen4     | £80
Case + PSU  | 750W 80+ Gold     | £95
Total       |                   | ~£800

The Intel i7-14700F (~£200) is a reasonable CPU alternative if you find a better deal. On the GPU, make sure you buy specifically the 16GB variant of the RTX 4060 Ti – the 8GB version sells at a similar price point and is not recommended (see What to Avoid below).

What it runs: 13-14B models smoothly, 30B quantised, full agent loops, ComfyUI for image work. What it doesn’t: 70B at useful speed. Who it’s for: Engineers running daily agent work, coding assistants, local LLM power users.
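To verify the 20-30 tokens/sec figure on your own hardware, a rough timing loop is enough. A sketch assuming a 13-14B Q4 GGUF on disk (the filename is a placeholder) and llama-cpp-python built with CUDA support:

```python
import time
from llama_cpp import Llama

llm = Llama(
    model_path="./models/14b.Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,   # -1 = offload every layer to the GPU
    n_ctx=4096,
    verbose=False,
)

start = time.perf_counter()
out = llm("Write three sentences about GPU memory.", max_tokens=200)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
# A 4060 Ti 16GB with a 13-14B Q4 model should land near the
# 20-30 tok/s quoted above.
```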

£1500 “The Serious Setup” – Maximum VRAM Per Pound

At this tier, the GPU choice is the interesting one. The RX 7900 XTX at £600-650 gives you 24GB GDDR6 at roughly half the price of an RTX 4090. ROCm support with llama.cpp is now production-ready – something you couldn’t say with confidence a year ago.

Component   | Choice                   | Approx. Cost
CPU         | AMD Ryzen 9 7950X        | £380
GPU         | RX 7900 XTX 24GB         | £625
Motherboard | X670E                    | £180
RAM         | 64GB DDR5                | £110
Storage     | 2TB NVMe Gen4 + 2TB data | £140
Case + PSU  | 1000W 80+ Gold           | £140
Total       |                          | ~£1,500

The Ryzen 9 7950X (16 cores) earns its place here: parallel inference requests, heavy compilation, and fine-tuning runs all benefit from real core counts. The 16-core chip handles cases where the GPU is waiting on CPU-side processing.
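To see where those cores earn their keep, serve several requests at once. A sketch assuming llama.cpp’s bundled server is running locally with parallel slots enabled (e.g. llama-server -m model.gguf --parallel 4); the URL, port, and prompts are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/v1/chat/completions"  # llama-server's OpenAI-compatible endpoint
PROMPTS = [
    "Summarise RAID levels.",
    "Explain PCIe lanes.",
    "What is GDDR6?",
    "Define quantisation.",
]

def ask(prompt: str) -> str:
    r = requests.post(URL, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

# Four requests in flight at once: tokenisation, HTTP handling, and
# CPU-side pre/post-processing all scale with real core count.
with ThreadPoolExecutor(max_workers=4) as pool:
    for answer in pool.map(ask, PROMPTS):
        print(answer[:80], "...")
```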

For the GPU, if you prefer the CUDA ecosystem: the RTX 4080 Super at ~£750 is the alternative. You get 16GB instead of 24GB, but better Tensor cores, deeper framework support, and no ROCm dependency. The right choice depends on your tooling requirements.

What it runs: 34B models cleanly, 70B quantised at ~15-20 tokens/sec, small model fine-tuning. Who it’s for: Running multiple models, production agent pipelines, serious local inference.


GPU Quick Reference

Budget     | Pick                   | VRAM | Notes
Under £250 | Intel Arc B580         | 12GB | Surprise pick. llama.cpp’s Vulkan/SYCL backends work. Best VRAM:price at this tier.
£300-400   | RTX 4060 Ti 16GB       | 16GB | Recommended. RTX 4070 12GB if CUDA perf matters more than VRAM.
£450-600   | RX 7900 GRE 16GB       | 16GB | Solid AMD option. ROCm production-ready.
£550-650   | RTX 4070 Ti Super 16GB | 16GB | Good CUDA card, competitive at this range.
£650-750   | RX 7900 XTX 24GB       | 24GB | Best VRAM:price at the high end for AI workloads specifically.
£1,200+    | RTX 4090 24GB          | 24GB | Fastest consumer card. RX 7900 XTX closes the gap on LLM inference specifically.

AMD vs Nvidia in 2026: The CUDA ecosystem remains deeper for AI tooling – fine-tuning, Triton kernels, some research code, and many commercial tools are CUDA-first. For pure inference with llama.cpp or Ollama, AMD ROCm is now production-ready and the gap is small. Choose AMD for VRAM budget; choose Nvidia for ecosystem breadth and fine-tuning use cases.
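One practical consequence: with Ollama, the application code is identical whichever vendor sits underneath, because the same local REST API fronts both the ROCm and CUDA builds. A minimal sketch assuming Ollama is running and a model has been pulled (the model name is just an example):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    json={"model": "llama3.1:8b", "prompt": "Why does VRAM matter?", "stream": False},
    timeout=300,
)
body = resp.json()
print(body["response"])

# eval_count / eval_duration (nanoseconds) give a quick tokens/sec
# figure on either vendor.
print(f'{body["eval_count"] / (body["eval_duration"] / 1e9):.1f} tok/s')
```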


What to Avoid

Any GPU with less than 12GB VRAM for new purchases in 2026. Models are not getting smaller. An 8GB card bought today will feel the ceiling within months.

The RTX 4060 Ti 8GB specifically. The 16GB version exists at a similar price. There is no reason to buy the 8GB variant.

Pre-built “AI PCs” from OEMs. They typically ship mediocre GPUs at premium prices with inadequate cooling. The “AI PC” label is marketing. Build it yourself.

Mining GPUs on the secondary market. VRAM runs hot under sustained load, and mining rigs run sustained load continuously. Degraded VRAM, no warranty, and sellers have every incentive not to disclose the card’s history.


Gaming & General Purpose Self-Build

The same three price points, optimised for games and everyday use rather than AI inference. Gaming builds prioritise GPU clock speed, single-core CPU performance, and fast memory – different priorities from the AI tiers above.

£500 “The 1080p Machine”

Budget gaming that doesn’t compromise where it counts. The GPU gets the lion’s share of the budget.

Component   | Choice           | Approx. Cost
CPU         | AMD Ryzen 5 5600 | £90
GPU         | RX 7600 XT 16GB  | £210
Motherboard | B550 mid-range   | £65
RAM         | 32GB DDR4-3600   | £40
Storage     | 1TB NVMe SSD     | £50
Case + PSU  | 550W 80+ Gold    | £65
Total       |                  | ~£520

The RX 7600 XT’s 16GB is unusual at this price – most cards at this budget ship with 8GB, which is getting tight for modern titles. At 1080p max settings this build handles anything currently released. The Ryzen 5 5600 remains excellent value for gaming: fast single-core clocks and no need for DDR5 keep costs down.

Upgrade path: drop in an RX 7700 XT or RTX 4060 Ti later without touching anything else.

£800 “The 1440p Sweet Spot”

The real gaming sweet spot. The RTX 4070 handles 1440p at max settings with DLSS – and at this price nothing else touches it for the combination of performance and efficiency.

Component   | Choice           | Approx. Cost
CPU         | AMD Ryzen 5 7600 | £150
GPU         | RTX 4070 12GB    | £350
Motherboard | B650             | £90
RAM         | 32GB DDR5-6000   | £65
Storage     | 2TB NVMe Gen4    | £80
Case + PSU  | 750W 80+ Gold    | £90
Total       |                  | ~£825

DDR5-6000 matters here – AMD Ryzen 7000 series benefits measurably from fast memory in CPU-limited scenarios. The RTX 4070 is the sweet spot for DLSS 3 (frame generation) which effectively doubles perceived frame rate in supported titles. At 1440p this rig doesn’t need to compromise.

Alternative GPU: RX 7800 XT 16GB (~£280) saves £70 and gives you 4GB more VRAM – trade DLSS for VRAM headroom and FSR 3.

£1500 “The 4K Powerhouse”

The Ryzen 7 7800X3D is the standout choice at this tier – its 3D V-Cache delivers a bigger lead in CPU-limited games than any competing chip. Pair it with the RTX 4080 Super for 4K high/ultra framerates with headroom to spare.

Component   | Choice              | Approx. Cost
CPU         | AMD Ryzen 7 7800X3D | £290
GPU         | RTX 4080 Super 16GB | £750
Motherboard | X670                | £160
RAM         | 32GB DDR5-6000      | £70
Storage     | 2TB NVMe Gen4       | £90
Case + PSU  | 850W 80+ Gold       | £120
Total       |                     | ~£1,480

The 7800X3D’s 3D V-Cache is the reason for this specific CPU choice – in CPU-bound gaming scenarios it can outperform chips costing twice as much. The RTX 4080 Super handles 4K with DLSS Quality mode (effectively native quality) at high framerates across modern titles.

Note: if you also want to run local AI inference on this build, swap the RTX 4080 Super for an RX 7900 XTX (£640, saves ~£110) and accept the DLSS trade-off – you gain 8GB VRAM (24GB total) at lower cost. See the AI inference builds above for context on when that matters.


The Hardware Landscape: What to Track

This section will update as the market moves.

Nvidia Blackwell (RTX 50-series): Consumer rollout is underway. The RTX 5090 at 32GB GDDR7 is confirmed, but pricing has not stabilised. Not yet recommended over the established 40-series unless you specifically need the latest architecture for research work. An RTX 5080 at 16GB is expected – watch how it positions against the 4060 Ti 16GB.

AMD RDNA 4: RX 9000-series details emerging. RDNA 4 is expected to improve ROCm performance further. Watch for any 20GB+ VRAM options in the mid-range.

Memory pricing trends: DDR5 pricing continues to fall. Budget more aggressively on RAM capacity – 64GB is achievable cheaply now and matters for CPU-side inference and iGPU configurations.

Notable prebuilts: The Tinybox and DGX Station GB300 are interesting at the far end of the budget range but outside self-build territory. Apple Silicon (M3/M4 Ultra) remains a competitive option for unified memory configurations – 192GB of unified memory for 70B inference is compelling if you’re already in the Apple ecosystem, though at significantly higher cost per GB of effective VRAM.