Signal vs Noise: How We Decide What Actually Matters
Commissioned, Curated and Published by Russ. Researched and written with AI. This is the living version of this post. View versioned snapshots in the changelog below.
Disclaimer: This post reflects the editorial approach of this blog as of its last-updated date. It will evolve. That’s the point.
This is a living document, last updated on 6 March 2026. If something has changed, the changelog at the bottom will say what and why.
What’s New This Week
6 March 2026. Two stories today that add concrete data points to the post’s thesis.
GPT-5.4 dropped from OpenAI this morning. 874 points and nearly 700 comments on HN – exactly the practitioner signal threshold the post describes. The substantive discussion centres on the 1M context window at a flat rate per token, no surcharge beyond 200k, putting it meaningfully ahead of current alternatives on context at this price point. GPT-5.4 supersedes GPT-5.3-Codex, a model many practitioners were only recently integrating. The Gergely sweater moment, every week. The pace is not slowing.
The more thematically sharp story: 406.fail, a proposed standard protocol for rejecting low-effort AI-generated pull requests, got 212 points on HN with a solid comment thread. The project – RAGS (Rejection of Artificially Generated Slop) – gives maintainers a canonical URL to paste when closing AI-generated noise contributions. This is practitioners building signal/noise filtering infrastructure at the code contribution layer. The post describes the information noise problem; this shows the same dynamic now operating on codebases. The signal-to-noise problem is not staying confined to news feeds.
Changelog
| Date | Summary |
|---|---|
| 6 Mar 2026 | GPT-5.4 practitioner signal; 406.fail shows noise problem expanding to code contributions. |
| 5 Mar 2026 | Alibaba Qwen leadership departure – practitioner signal for open weights users. |
| 4 Mar 2026 | Added: JVG quantum algorithm watch item, SteerEval controllability benchmark, UniG2U-Bench multimodal finding, Knuth “Claude’s Cycles” signal. |
| 2 Mar 2026 | Initial publication |
The Problem: There Is Too Much
Let’s start with a moment that captures the current situation better than any graph could.
Gergely Orosz – The Pragmatic Engineer – wore a Gemini 3 sweater to a conference talk. By the time he was done speaking, Gemini 3.1 Pro had dropped. The sweater was already outdated. He posted about it with the weary good humour of someone who has accepted that this is just what the industry is like now.
That story is funny. It is also a pretty accurate description of what it feels like to try to follow AI news in 2026.
A model that was genuinely frontier-class six weeks ago is mid-tier today. Capabilities that would have been headline news eighteen months ago now get a footnote in a changelog. The pace is not slowing down. If anything, it is accelerating, and the coverage ecosystem has scaled to match it in volume but not in quality.
Consider what you would need to follow, if you were trying to follow everything:
- Newsletters (The Pragmatic Engineer, TLDR, Import AI, The Batch, Stratechery, and dozens more)
- Podcasts (Latent Space, Lex Fridman, Dwarkesh, The TWIML Podcast, and on and on)
- Preprint servers (arXiv gets dozens of relevant papers a day)
- Twitter/X (researchers, engineers, founders, journalists, all posting in real time)
- Hacker News (aggregating from all of the above, plus things they missed)
- Discord servers (model-specific communities, research groups, indie hacker spaces)
- YouTube (demos, teardowns, conference talks, explainers)
- Company blogs, earnings calls, researcher Substacks
Nobody follows all of this. Anyone who tells you they do is either lying or not sleeping. The ecosystem is too large, too fast, and too unevenly distributed in quality.
This creates two failure modes, and most people end up in one of them.
Failure mode one: FOMO-driven reading. You try to keep up. You add another newsletter, follow another researcher, bookmark more tabs. You feel perpetually behind. Every week there is something you missed and you spend more time catching up than actually thinking. The reading becomes the job, and the thinking gets squeezed out.
Failure mode two: tuning out entirely. The volume overwhelms you. You stop trying. You catch things second-hand, in team meetings or through colleagues, always slightly late. You lose the thread. When something genuinely important happens, you hear about it but don’t have enough context to know why it matters.
Both of these are understandable responses to a genuinely difficult situation. Neither is particularly useful.
The third path – the one this blog tries to embody – is to build a filter and be explicit about it. That means being honest about what we read, how we decide what matters, and what we’re probably missing.
The Sources: What We Actually Read and Why
A filter is only as good as its inputs. Here is what actually goes into the reading pile, and why each source earns its place.
Hacker News
HN remains the most reliable real-time signal for practitioner reaction to anything in the tech space. Not because the posts are always good – many of them are just links to press releases – but because the comments often are.
The heuristic here is simple: if something gets 500+ points and the comment thread has substantive technical discussion, it matters. If it gets 500+ points and the comments are mostly “this is huge” or “AI is going to change everything,” that’s a different signal. High points with low-quality comments usually means something was interesting to a general audience but didn’t move the technical needle.
The inverse is also useful. When something drops and HN barely notices, or when the comments are mostly sceptical practitioners pointing out limitations, that tells you something too. Absence of practitioner enthusiasm is data.
We treat HN as a filter and aggregator, not a source. It surfaces things from everywhere else. Its real value is the community reaction, not the original content.
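The heuristic above can be sketched as a toy classifier. To be clear about what is assumed: the 500-point threshold comes from the text, but `Story`, `classify`, and the `substantive_comment_ratio` cut-off of 0.3 are invented for illustration – in practice, judging comment quality is a human call, not a computed number.

```python
from dataclasses import dataclass

# Hypothetical sketch of the HN heuristic described above.
# The 500-point threshold is from the post; the 0.3 comment-quality
# cut-off is an invented stand-in for a human judgment.

@dataclass
class Story:
    points: int
    substantive_comment_ratio: float  # fraction of comments with technical depth, 0..1

def classify(story: Story) -> str:
    """Map a story to one of the signal classes the post describes."""
    if story.points >= 500 and story.substantive_comment_ratio >= 0.3:
        return "practitioner signal"        # high points + real technical discussion
    if story.points >= 500:
        return "general-audience interest"  # high points, "this is huge" comments
    if story.substantive_comment_ratio >= 0.3:
        return "niche signal"               # low points, but practitioners engaged
    return "noise"

# Example: an 874-point thread with substantive discussion
print(classify(Story(points=874, substantive_comment_ratio=0.5)))  # practitioner signal
```

The fourth branch matters as much as the first: a thread that scores low on both axes is the “absence of practitioner enthusiasm” case discussed below, and that absence is itself data.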
Simon Willison’s Weblog (simonwillison.net)
Simon Willison’s blog has one of the highest signal-to-noise ratios of any single source in the space. Every post is either an experiment he has actually run, an observation from hands-on work, or a careful synthesis of something he has read. There is no hype. There are no predictions dressed as analysis.
What makes it particularly useful is the methodology on display. He shows his working. When he tests something, he tells you how. When he is uncertain, he says so. When a previous finding gets updated by new evidence, he notes it.
The hit rate on “things Simon wrote about that turned out to matter” is high enough that this blog treats it as a primary input. If he is paying attention to something, we probably should be too.
The Pragmatic Engineer (Gergely Orosz)
The Pragmatic Engineer is valuable for a specific reason: it takes an engineering leadership perspective on things that most AI coverage treats as purely technical. Gergely’s sourcing is unusually rigorous for the newsletter space – he checks things, quotes people, and is explicit when something is uncertain.
The sweater story is instructive here too. Someone who can laugh at themselves for being slightly behind on a news cycle is more trustworthy than someone who projects certainty. The Pragmatic Engineer is honest about what it doesn’t know. That’s rarer than it should be.
It is particularly useful for anything that touches engineering organisations, hiring, tooling adoption, and the gap between what vendors claim and what teams are actually experiencing.
Latent Space (swyx)
Latent Space earns its place through depth. The podcast and the writing both go substantially further into technical territory than most general-audience AI coverage. swyx has a good instinct for synthesis – for finding the thread that connects several things that look separate on the surface.
The practical value here is for topics where you need to actually understand what is happening, not just that something happened. When a new architecture drops, when there is a shift in how people are approaching a particular problem, Latent Space is usually one of the first places to engage with it seriously.
arXiv and HuggingFace Papers
We do not read everything on arXiv. Nobody does, nobody should, and anyone recommending you try is setting you up for failure mode one.
The approach instead is to read what the practitioner community is already discussing. If a paper is getting attention on Twitter/X from researchers whose work we respect, or if it surfaces on HN with a strong comment thread, that is the filter. The paper gets read. Everything else gets skimmed at best.
HuggingFace’s paper discussion threads are underrated here. They often surface practitioner reaction faster than anywhere else, and the people commenting are frequently the ones who have actually tried to replicate or apply the work.
Primary Sources
For anything significant – a major model release, a company shift, an acquisition – we go to the primary source. The company blog. The earnings call transcript. The researcher’s own post or thread.
The intermediary layer (news articles, newsletter summaries) often introduces errors, strips context, and adds framing that wasn’t in the original. Reading the press release is faster than reading three articles about the press release, and usually more accurate.
This is especially true for earnings calls, which are dense but contain information that rarely makes it into coverage intact.
Absence as Signal
One thing that doesn’t get discussed enough: when something is not being discussed in the practitioner community, that is also information.
If a company announces a new model and the HN thread is thin, the Twitter/X engineering community is mostly quiet, and Simon Willison doesn’t write anything about it – that tells you something. Either the model isn’t doing anything new, or the access is too limited for anyone to have tested it, or the target audience isn’t practitioners at all.
A lot of AI coverage treats all announcements as equally significant. The practitioner community’s reaction is a decent proxy for whether something actually is.
The Filters: How We Decide What Makes It
Reading widely is necessary but not sufficient. Everything that gets read still has to pass some filters before it becomes something on this blog.
The “so what” test. Does this change anything for someone building with or thinking about AI? A benchmark result alone does not pass this test. A benchmark result that crosses a capability threshold that unlocks a new class of applications, with a cost implication that changes the economics, does. The question is always: so what does someone actually do differently as a result of this?
The practitioner signal. Is the engineering community actually using this, reacting to this, or building with this? Or is the reaction mostly from the media and marketing layer? PR dressed as news is the majority of AI coverage. The practitioner filter cuts most of it.
The “still true in six months” test. Will this matter beyond the news cycle? Some things are genuinely durable: a shift in what is economically possible, a new architectural approach that is getting adopted, a change in how a major platform works. Other things are interesting for a week and then irrelevant. The specific benchmark score for a model that will be superseded next month fails this test. The underlying capability improvement that drove that score might pass it.
The first-principles test. Does this represent something genuinely new, or is it an iteration on existing patterns? Both can be worth covering, but they are different things. Iteration is often more practically useful than novelty – something that makes an existing approach 40% cheaper is often more significant than something that does a new thing badly. But they need to be distinguished.
Negative filtering. Some things we explicitly do not cover, or cover only rarely:
- Most crypto/web3 AI. The intersection of these two hype cycles does not produce much that matters for practitioners.
- “AI will do X” predictions without evidence. Predictions are not news. They are content.
- Benchmark announcements without context. A number on a leaderboard is not analysis.
- Product launches from companies with no track record. Launches are cheap. Shipped products that people use are not.
This list is not exhaustive and it is not absolute. Exceptions happen. But the default is to skip these categories.
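The filters above amount to a checklist: clear every positive test, and don’t fall into a default-skip category without an explicit exception. A minimal sketch, with invented field names (the real filters are editorial judgment calls, not booleans in a database):

```python
# Hypothetical encoding of the editorial filters as a checklist.
# Field names ("so_what", "durable", etc.) are invented for illustration.

FILTERS = {
    "so_what":          "Does someone do something differently because of this?",
    "practitioner":     "Is the engineering community using or reacting to this?",
    "durable":          "Will this still matter in six months?",
    "first_principles": "Is this genuinely new or an iteration – and which?",
}

SKIP_BY_DEFAULT = {
    "crypto/web3 AI",
    "evidence-free predictions",
    "context-free benchmark announcements",
    "launches from companies with no track record",
}

def passes(item: dict) -> bool:
    """An item must clear every filter and avoid default-skip categories,
    unless it is explicitly flagged as an exception."""
    if item.get("category") in SKIP_BY_DEFAULT and not item.get("exception"):
        return False
    return all(item.get(key, False) for key in FILTERS)

story = {"category": "model release", "so_what": True, "practitioner": True,
         "durable": True, "first_principles": True}
print(passes(story))  # True
```

Note that the negative filter runs first and the positive tests are conjunctive: failing any one of them is enough to drop an item, which is what makes the default “skip” rather than “cover”.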
The Mistakes We Make and How We Handle Them
The pace means we sometimes get things wrong. This section exists to be honest about that.
We update when we’re wrong. The living document model on this blog is not just a formatting choice. It exists because the field moves fast enough that something that was accurate at publication may need revision three months later. When we update a post, the changelog shows what changed and why. Not all at once, and not always immediately, but it happens.
AI-generated content can be confidently incorrect. This blog uses AI tools in parts of its production process. That creates a specific failure mode: content that sounds authoritative but contains errors that a human expert would catch. The curation and editing step exists to catch these. It does not catch everything. If you find something factually wrong, the feedback mechanism at the bottom of the post exists for exactly this.
We have biases, and we know some of them. The coverage here skews heavily toward the Western, English-language AI ecosystem. Chinese labs – Baidu, DeepSeek, Moonshot, Zhipu, and others – do genuinely significant work that gets less attention here than it deserves. We are actively trying to correct this, but the language barrier and the different publication culture mean we are probably still under-covering it.
We also likely over-index on practitioners relative to researchers. The practitioner framing is deliberate – this blog is for people building things, not for academics – but it means we probably miss things that are significant in the research community but haven’t yet surfaced as practical tools or techniques.
The changelog is the accountability mechanism. When a post gets a significant update, the date, the version link, and a brief note on what changed appear in the changelog table at the top. This is not perfect but it is better than pretending posts are static documents in a field that is not.
What We’re Not
It is probably useful to be explicit about this.
Not a news service. We do not cover everything. We cover what we think matters, at a weekly cadence that allows for some reflection rather than immediate reaction. Things fall through. That is a feature, not a bug. If something genuinely matters, it will still matter next week.
Not objective. We have a point of view. We think that is more honest than performing false balance. “On one hand, this model is impressive; on the other hand, some people are sceptical” is not analysis. It is hedging dressed as journalism. We make calls. Sometimes they are wrong. See above.
Not comprehensive. The scope is deliberately narrow: what matters for senior engineers and technical leaders thinking about how AI affects their work and the systems they build. That excludes a lot. Policy, ethics at a philosophical level, consumer applications, the broader social implications – these are real and important topics, and this blog is not the place for them.
Not infallible. This has been covered above, but it bears repeating in its own section because it is the thing most AI content gets wrong. The field is moving fast enough that confident, comprehensive coverage is not achievable. The honest position is: here is what we think we know, here is how confident we are, and here is the mechanism for updating when we’re wrong.
A Practical Guide for Building Your Own Filter
If any of the above is useful, the most useful thing might be to adapt it rather than consume it.
Pick three to five sources and read them deeply, not twenty sources shallowly. The marginal value of source number twelve is close to zero. The depth you can apply to three sources you trust is substantially higher than the breadth you can achieve across twenty.
Treat Hacker News as a filter, not a source. It aggregates from everywhere. Use it to surface what the community is reacting to, then go read the original thing. The comments are the signal; the links are the index.
Follow five to ten practitioners whose judgment you trust, and pay attention over time. Not their Twitter presence. Their actual work, their posts, their talks. The goal is to calibrate your sense of whose intuitions tend to be right, so that when they react to something, you have context for why that reaction matters.
Let the practitioner community be your first filter. If the engineers building things are not talking about it, it probably does not matter yet. This is not a perfect heuristic – sometimes things matter before the community catches up – but it cuts a lot of noise.
Give yourself permission to not know everything. The pace is genuinely impossible to fully track. Anyone who appears to be tracking all of it is either not doing much else, or is summarising at a level of depth that is not actually useful. The goal is to know enough to think clearly about the things that matter for your work, not to achieve total coverage of a field that is expanding faster than any individual can absorb.
The filter is not about missing less. It is about being more confident in what you do see.
If something here is wrong, outdated, or missing context, the feedback link below is the place for it. The changelog will reflect any significant corrections.
Sources referenced in this post: simonwillison.net, The Pragmatic Engineer, Latent Space, Hacker News, arXiv, HuggingFace Papers.