Commissioned, Curated and Published by Russ. Researched and written with AI.


What’s New This Week

Intel launched the Arc Pro B70 today: 32GB VRAM at $949 – the highest VRAM single-card option under $1,000, aimed explicitly at local AI inference. Separately, the memory pricing crisis is now spreading to system RAM, storage, and CPUs, putting pressure on the budget build assumptions in this guide. Buy carefully.


Changelog

Date          Summary
26 Mar 2026   Intel Arc Pro B70 launched with 32GB VRAM at $949, while the memory shortage broadened to system RAM, storage, and CPUs.
24 Mar 2026   No material updates – quieter news day for self-build hardware.
23 Mar 2026   Initial publication with Blackwell hold advice and three-tier build recommendations.

The Principle: VRAM Is the Constraint

For local LLM inference, one number matters more than any other: VRAM. The rule of thumb is roughly 0.6-0.7GB of VRAM per billion parameters at standard quantisation (Q4/Q5), plus overhead for context. A 13B model needs around 8-10GB. A 70B model needs around 40GB. These are floors, not ceilings – context window and batch size push the number up.

The practical thresholds in 2026:

  • 8GB VRAM: 3-7B models comfortably. Entry point, fine for experimentation, not recommended if you’re running agents.
  • 12GB VRAM: 7-8B well, 13B in a squeeze. Workable but you’ll feel the limit quickly.
  • 16GB VRAM: 13-14B models smoothly, 30B with aggressive quantisation. The practical agent floor. Most people running local coding assistants or agent loops sit here.
  • 24GB VRAM: 30-34B comfortably, quantised 70B possible. Where serious inference work lives.
  • 48GB+: 70B cleanly, 120B with quantisation. Research territory or production pipelines.

The implication: any GPU you buy in 2026 should have at least 12GB VRAM. Ideally 16GB. The 8GB cards are already a compromise and models are not getting smaller.
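As a sanity check, the rule of thumb and the tier list above can be expressed in a few lines of Python. The per-billion and overhead constants are illustrative assumptions tuned to match the figures in this section (13B → ~8-10GB, 70B → ~40GB+), not measured values:

```python
def estimate_vram_gb(params_billions, gb_per_billion=0.6, overhead_gb=1.0):
    """Rough VRAM needed for Q4/Q5 inference.

    gb_per_billion and overhead_gb are assumed constants chosen to match
    this guide's figures; real usage varies with quantisation scheme,
    context length, and batch size.
    """
    return params_billions * gb_per_billion + overhead_gb

# Map model sizes to the smallest common VRAM tier that fits them
tiers = [8, 12, 16, 24, 48]
for p in (7, 13, 34, 70):
    need = estimate_vram_gb(p)
    fit = next((t for t in tiers if t >= need), None)
    print(f"{p:>3}B -> ~{need:.1f}GB -> {fit or '48+'}GB card")
```

With these constants a 13B model lands just under 9GB (a 12GB card, "in a squeeze"), 34B lands at the 24GB tier, and 70B needs the 48GB tier – matching the thresholds above.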


£500 “The Enabler” – CPU Inference Focus

For a first local AI machine or a tight budget, you don’t need a discrete GPU. The AMD Ryzen 5 8600G has an integrated Radeon 760M that can allocate up to 8GB of shared VRAM from system RAM, running 7B models at reasonable speed.

Component     Choice               Approx. Cost
CPU           AMD Ryzen 5 8600G    £190
Motherboard   B650 budget          £80
RAM           64GB DDR5            £90
Storage       1TB NVMe SSD         £55
Case + PSU    550W 80+ Gold        £70
Total                              ~£500

The 64GB RAM matters here: it feeds the iGPU with enough headroom and also enables CPU inference on larger models via llama.cpp. With CPU inference, a 13B Q4 model is usable – just slow (a few tokens per second). The iGPU path gets you 7B at something approaching interactive speed.

What it runs: 7B-13B models via CPU inference, 7B reasonably fast via iGPU. What it doesn’t: Fast inference on anything over 13B. Who it’s for: First local AI machine, experimenting, tight budget.

£800 "The Sweet Spot" – The Recommended Build

This is the sweet spot for 2026. The RTX 4060 Ti 16GB is the specific recommendation: 16GB VRAM at the lowest price point of any 16GB card, full CUDA support, and 13-14B models at 20-30 tokens/sec. That's fast enough for real work.

Component     Choice               Approx. Cost
CPU           AMD Ryzen 7 7700X    £185
GPU           RTX 4060 Ti 16GB     £300
Motherboard   B650 mid-range       £100
RAM           32GB DDR5            £60
Storage       2TB NVMe Gen4        £80
Case + PSU    750W 80+ Gold        £95
Total                              ~£800

The Intel Core i7-14700F (~£200) is a reasonable CPU alternative if you find a better deal. On the GPU, the RTX 4060 Ti 16GB is specifically the 16GB variant – the 8GB version exists at a similar price point and is not recommended (see What to Avoid below).

What it runs: 13-14B models smoothly, 30B quantised, full agent loops, ComfyUI for image work. What it doesn’t: 70B at useful speed. Who it’s for: Engineers running daily agent work, coding assistants, local LLM power users.
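The throughput figures in this guide (a few tokens/sec on CPU, 20-30 tokens/sec on the RTX 4060 Ti) follow from memory bandwidth: at batch size 1, generating each token reads every weight once, so decode speed is roughly bandwidth divided by model size. A sketch – the bandwidth numbers are published specs, and the efficiency factor is an assumed fudge for real-world overhead:

```python
def decode_tokens_per_sec(bandwidth_gb_s, model_gb, efficiency=0.6):
    """Bandwidth-bound decode estimate: tok/s ~= eff * BW / weight bytes.

    efficiency is an assumed factor covering KV-cache reads, kernel
    overhead, and imperfect memory utilisation; real values vary.
    """
    return efficiency * bandwidth_gb_s / model_gb

MODEL_13B_Q4_GB = 8.0  # approximate Q4 weight size for a 13B model

# Dual-channel DDR5-5200 (~83 GB/s) vs RTX 4060 Ti (288 GB/s GDDR6)
cpu = decode_tokens_per_sec(83, MODEL_13B_Q4_GB)
gpu = decode_tokens_per_sec(288, MODEL_13B_Q4_GB)
print(f"CPU DDR5: ~{cpu:.0f} tok/s   RTX 4060 Ti: ~{gpu:.0f} tok/s")
```

This is why VRAM capacity decides *what* you can run, but memory bandwidth decides *how fast* – and why CPU inference on 13B models is usable but slow.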

£1500 “The Serious Setup” – Maximum VRAM Per Pound

At this tier, the GPU choice is the interesting one. The RX 7900 XTX at £600-650 gives you 24GB GDDR6 at roughly half the price of an RTX 4090. ROCm support with llama.cpp is now production-ready, which wasn’t confidently true a year ago.

Component     Choice                     Approx. Cost
CPU           AMD Ryzen 9 7950X          £380
GPU           RX 7900 XTX 24GB           £625
Motherboard   X670E                      £180
RAM           64GB DDR5                  £110
Storage       2TB NVMe Gen4 + 2TB data   £140
Case + PSU    1000W 80+ Gold             £140
Total                                    ~£1500

The Ryzen 9 7950X (16 cores) earns its place here: parallel inference requests, heavy compilation, and fine-tuning runs all benefit from real core counts. The 16-core chip handles cases where the GPU is waiting on CPU-side processing.

For the GPU, if you prefer the CUDA ecosystem: the RTX 4080 Super at ~£750 is the alternative. You get 16GB instead of 24GB, but better Tensor cores, deeper framework support, and no ROCm dependency. The right choice depends on your tooling requirements.

What it runs: 34B models cleanly, 70B quantised at ~15-20 tokens/sec, small model fine-tuning. Who it’s for: Running multiple models, production agent pipelines, serious local inference.


GPU Quick Reference

Budget       Pick                     VRAM   Notes
Under £250   Intel Arc B580           12GB   Surprise pick. Vulkan/SYCL backends in llama.cpp work. Best VRAM:price at this tier.
£300-400     RTX 4060 Ti 16GB         16GB   Recommended. RTX 4070 12GB if CUDA perf matters more than VRAM.
£450-600     RX 7900 GRE 16GB         16GB   Solid AMD option. ROCm production-ready.
£550-650     RTX 4070 Ti Super 16GB   16GB   Good CUDA card, competitive at this range.
£650-750     RX 7900 XTX 24GB         24GB   Best VRAM:price at the high end for AI workloads specifically.
£1,200+      RTX 4090 24GB            24GB   Fastest consumer card. RX 7900 XTX closes the gap on LLM inference specifically.

AMD vs Nvidia in 2026: The CUDA ecosystem remains deeper for AI tooling – fine-tuning, Triton kernels, some research code, and many commercial tools are CUDA-first. For pure inference with llama.cpp or Ollama, AMD ROCm is now production-ready and the gap is small. Choose AMD for VRAM budget; choose Nvidia for ecosystem breadth and fine-tuning use cases.
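The VRAM-per-pound comparisons in the table are easy to recompute as prices move. The single figures below are assumed midpoints of the price ranges above, not live quotes:

```python
# (card, assumed price in £, VRAM in GB) – prices are illustrative
cards = [
    ("Intel Arc B580",       230, 12),
    ("RTX 4060 Ti 16GB",     300, 16),
    ("RX 7900 GRE",          500, 16),
    ("RTX 4070 Ti Super",    600, 16),
    ("RX 7900 XTX",          625, 24),
    ("RTX 4090",            1200, 24),
]

# Sort by price per GB of VRAM, cheapest first
for name, price, vram in sorted(cards, key=lambda c: c[1] / c[2]):
    print(f"{name:<20} £{price / vram:5.2f} per GB")
```

At these assumed prices the B580 and 4060 Ti 16GB lead on VRAM per pound, the 7900 XTX beats every Nvidia card above the budget tier, and the 4090 is the most expensive VRAM on the list – consistent with the table's notes.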


What to Avoid

Any GPU with less than 12GB VRAM for new purchases in 2026. Models are not getting smaller. An 8GB card bought today will feel the ceiling within months.

The RTX 4060 Ti 8GB specifically. The 16GB version exists at a similar price. There is no reason to buy the 8GB variant.

Pre-built “AI PCs” from OEMs. They typically ship mediocre GPUs at premium prices with inadequate cooling. The “AI PC” label is marketing. Build it yourself.

Mining GPUs on the secondary market. VRAM runs hot under sustained load and mining rigs run sustained load continuously. Degraded VRAM, no warranty, and the seller’s incentive is to not tell you this.


Gaming & General Purpose Self-Build

The same three price points, optimised for games and everyday use rather than AI inference. Gaming builds prioritise GPU clock speed, single-core CPU performance, and fast memory – different priorities from the AI tiers above.

£500 “The 1080p Machine”

Budget gaming that doesn’t compromise where it counts. The GPU gets the lion’s share of the budget.

Component     Choice               Approx. Cost
CPU           AMD Ryzen 5 5600     £90
GPU           RX 7600 XT 16GB      £210
Motherboard   B550 mid-range       £65
RAM           32GB DDR4-3600       £40
Storage       1TB NVMe SSD         £50
Case + PSU    550W 80+ Gold        £65
Total                              ~£520

The RX 7600 XT’s 16GB is unusual at this price – most cards at this budget ship with 8GB, which is increasingly tight in modern titles. At 1080p max settings this build handles anything currently released. The Ryzen 5 5600 remains excellent value for gaming: fast single-core clocks, and sticking with DDR4 keeps costs down.

Upgrade path: drop in an RX 7700 XT or RTX 4060 Ti later without touching anything else.

£800 “The 1440p Sweet Spot”

The real gaming sweet spot. The RTX 4070 handles 1440p at max settings with DLSS – and at this price nothing else touches it for the combination of performance and efficiency.

Component     Choice               Approx. Cost
CPU           AMD Ryzen 5 7600     £150
GPU           RTX 4070 12GB        £350
Motherboard   B650                 £90
RAM           32GB DDR5-6000       £65
Storage       2TB NVMe Gen4        £80
Case + PSU    750W 80+ Gold        £90
Total                              ~£825

DDR5-6000 matters here – AMD Ryzen 7000 series benefits measurably from fast memory in CPU-limited scenarios. The RTX 4070 is the sweet spot for DLSS 3 (frame generation) which effectively doubles perceived frame rate in supported titles. At 1440p this rig doesn’t need to compromise.

Alternative GPU: RX 7800 XT 16GB (~£280) saves £70 and gives you 4GB more VRAM – trade DLSS for VRAM headroom and FSR 3.

£1500 “The 4K Powerhouse”

The Ryzen 7 7800X3D is the standout choice at this tier – its 3D V-Cache gives it a bigger lead in CPU-limited games than any rival chip near its price. Pair it with the RTX 4080 Super for 4K high/ultra framerates with headroom to spare.

Component     Choice                Approx. Cost
CPU           AMD Ryzen 7 7800X3D   £290
GPU           RTX 4080 Super 16GB   £750
Motherboard   X670                  £160
RAM           32GB DDR5-6000        £70
Storage       2TB NVMe Gen4         £90
Case + PSU    850W 80+ Gold         £120
Total                               ~£1,480

The 7800X3D’s 3D V-Cache is the reason for this specific CPU choice – in CPU-bound gaming scenarios it can outperform chips costing twice as much. The RTX 4080 Super handles 4K with DLSS Quality mode (effectively native quality) at high framerates across modern titles.

Note: if you also want to run local AI inference on this build, swap the RTX 4080 Super for an RX 7900 XTX (£640, saves ~£110) and accept the DLSS trade-off – you gain 8GB VRAM (24GB total) at lower cost. See the AI inference builds above for context on when that matters.


The Hardware Landscape: What to Track

This section will update as the market moves.

Intel Arc Pro B70 (NEW - March 26): 32GB VRAM, ~$949. Intel’s Big Battlemage chip finally appeared – but as a Pro workstation/AI card, not the consumer B770 gamers were waiting for. For local inference specifically, 32GB VRAM at this price is the best VRAM-per-pound option in the sub-$1,000 range. Caveat: Intel’s OneAPI/compute ecosystem is less mature than CUDA or ROCm for inference tooling. Worth watching for price normalisation and real-world llama.cpp benchmarks before recommending over AMD or Nvidia at this tier.

Nvidia Blackwell (RTX 50-series): Consumer rollout is underway. The RTX 5090 with 32GB GDDR7 is confirmed available, but pricing has not stabilised, and it is not recommended over the established 40-series unless you specifically need the latest architecture for research work.

Critically: no new Nvidia gaming GPUs are expected in 2026 (The Information, March 23). The RTX 50 Super refresh exists in design but production is deprioritised due to the GDDR7 shortage, with Nvidia cutting gaming GPU production by up to 40% as TSMC capacity redirects to datacenter accelerators. The RTX 60 series (Rubin architecture) has been pushed to 2028 – the longest gap between GPU generations since the late 1990s. GTC 2026 (March 16-19) confirmed Vera Rubin datacenter systems on track for H2 2026, reinforcing that datacenter demand continues to crowd out consumer GPU supply.

For self-builders, this is actually useful clarity: the RTX 40-series is your stable, available value tier through at least 2026. Hold on Blackwell unless you need it specifically; the advice is now stronger.

AMD RDNA 4: RX 9000-series is now shipping. AMD released FSR 4.1 for RDNA 4 GPUs on March 21, delivering improved Ray Regeneration and higher FPS – the algorithm is shared with Sony’s PSSR. A DLL leak suggests FSR 4.1 may eventually expand to RDNA 3 (RX 7000-series), which would benefit the 7900 XTX and 7000-series cards recommended above.

Memory pricing trends: GDDR7 supply is under active pressure from AI datacenter demand competing with consumer GPU production. Tom’s Hardware now explicitly frames the GPU market as an “AI-driven pricing crisis”. DDR5 for system RAM continues to fall – budget aggressively on capacity (64GB is cheap now). For GPU planning: elevated pricing on new cards is likely to persist through 2026 due to the GDDR7 constraint.

Component pricing warning: The memory shortage is broadening. TechRadar reports the crisis is now spreading to NVMe storage and CPUs – one gaming PC maker warned “the CPU shortage is getting more serious”, with CPU prices forecast to increase up to 15%. Budget estimates in this guide are based on March 2026 pricing. Build sooner rather than later on a budget build, or wait for the dust to settle if you’re in no rush.

Secondhand market: The RTX 5070 became the most-installed discrete GPU on Steam in the February 2026 hardware survey (228% growth in one month), meaning RTX 40-series cards are entering the secondhand market in volume as users upgrade. Budget builds should watch used prices for the RTX 4070 and 4070 Super in the coming months – supply is increasing.

Notable prebuilts: The Tinybox and DGX Station GB300 are interesting at the far end of the budget range but outside self-build territory. Apple Silicon (M3/M4 Ultra) remains a competitive option for unified memory configurations – 192GB of unified memory for 70B inference is compelling if you’re already in the Apple ecosystem, though at significantly higher cost per GB of effective VRAM.