Opinion · 15 min read · Article · 04

The 'listicle penalty' isn't about listicles

What everyone is calling Google's 'listicle penalty' isn't about format. It is about self-elevation without methodology — and the originator of the data has already said so.

Ibrahim Furkan Ozcelik · works on GEO and AI search

Published May 30, 2026 · Updated July 7, 2026 · Source: ibrahimfurkanozcelik.com

Sometime between Lily Ray's February 3, 2026 Substack post and the April–May trade press recaps, a narrative hardened: Google penalized listicles. Search Engine Land wrote it. Search Engine Roundtable wrote it. Sword and the Script wrote it. Half a dozen SEO newsletters wrote it. The phrase 'self-promotional listicle penalty' became the working shorthand for what the December 2025 broad core update did to a particular kind of B2B SaaS content.

The narrative is wrong. Or, more precisely, the narrative is a flattened version of an argument the person who started it has since publicly walked back. Ray's own March 18 follow-up was explicit: 'comparison content can absolutely be a legitimate and useful content format when done thoughtfully and in moderation. The problem isn't the tactic itself; it's the scale.' Her May 13 piece went further: 'the tools themselves are not the problem, but the implementation can be.' The originator of the data does not hold the position the trade press attributes to her.

Devesh Khanal at Grow and Convert reached the same conclusion on May 26 with a piece titled 'Self-Promotional Listicles Aren't the Problem. Bad Content Is.' This essay starts from his thesis and pushes further, because the case has gotten more decisive than either Ray or Khanal has yet argued in public: there is a falsification test of the format hypothesis that nobody has applied, a technical counter-mechanism that nobody has engaged, and an existence proof for transparent self-promotion that has been hiding in plain sight for nine months. Together they make a stronger claim than 'listicles aren't the problem.' They make the claim: the signal Google is downweighting is unearned self-elevation, and the format is the scapegoat.

Quality · 00

High quality content in the age of AI: how methodology became the signal

For most of SEO's history, 'high quality content' meant comprehensive, well-researched, and authoritative. In the generative engine optimization era, high quality has acquired a methodological dimension: does the author explain how they arrived at their conclusion? Was it tested? By whom? On what dataset?

AI engines are retrieval systems. When they extract a claim from a page, they can cross-reference it against other indexed sources. A page that lists 'Top 11 Best Enterprise SEO Agencies' with scoring criteria, named tools, and disclosed bias is a verifiable node in an information network. A page that lists the same agencies without criteria is unverifiable — an island the retrieval system cannot trust.

This is why the 'listicle penalty' narrative missed the point. The sites Lily Ray documented as casualties were not penalized because they used the listicle format. They were penalized because they shared a methodology problem: company self-ranked #1, no disclosed scoring, AI-generated body text, no first-hand testing. The format was the substrate the signals clustered on. Strip the bad signals and the format survives — the data says so explicitly.

The following sections apply a falsification test to the format hypothesis, surface the technical counter-mechanism nobody engaged, and produce the existence proof that was hiding in plain sight for nine months.

Misreading · 01

What Ray actually said

Ray's February 3 post is the canonical artifact in the listicle-penalty narrative. Reread it. She examined seven B2B SaaS sites that lost between 29% and 49% of SISTRIX U.S. visibility starting in mid-to-late January 2026, a measurement window that falls after the December 2025 broad core update had completed. She did not name any of the seven (the largest was an '$8B B2B brand' with a −49% drop). She inventoried 191 self-promotional listicles across the affected subfolders. She ran originality.ai against samples and got a 100% AI-generated confidence score on multiple pages.

The signals she flagged were specific: company self-ranking #1 without independent verification; year-swap titles (2026 swapped in over a 2025 page with no substantive update); AI-generated text at scale; missing first-hand testing; programmatic templates; schema misuse, including fabricated AggregateRating; artificial dateModified refreshes. The pattern she described was a conjunction of behaviors, not a format.

Her March 18 piece corrected the trade-press distortion before it had fully crystallised: comparison content is legitimate; scale is the problem. Her May 13 piece corrected it again, explicitly: the tools are not the problem, the implementation is. The March 18 post predates the April–May recaps, and the May 13 post landed during that window, yet the recaps continued to cite her February 3 work as evidence of a categorical penalty. The originator of the data was already in a different place than the discourse built on top of it.

If you only read one Ray post, read May 13. If you've only read February 3, you have an out-of-date model of her position.

“The problem isn't the tactic itself; it's the scale.”
— Lily Ray · March 18, 2026

Falsification · 02

The Popper test the discourse never applied

Seer Interactive published a 2 million-citation analysis on February 25, 2026 covering November 2025 through February 2026. They found two things at the same time. ChatGPT's listicle citations declined 30% month-over-month between December and January, dropping from 17.2% of citations to 15.5%. Google AI Overviews' listicle citations, measured against the same prompts, were 'trending up or staying flat.'

This is the cleanest possible falsification of the format hypothesis. If the signal Google and the LLMs were downweighting was the listicle as a structural form, both ChatGPT and AI Overviews would move in the same direction on identical inputs. They did not. Same listicle URLs. Same prompts. Opposite trajectories. Whatever changed in December and January, it cannot be the format itself — because the format is invariant across the two platforms and the platforms diverged.
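The logic of that test fits in a few lines. What follows is a minimal sketch, not part of Seer's analysis: the function name and the encoding of each platform's trend as -1 (down), 0 (flat), or +1 (up) are illustrative assumptions. Only the reported directions come from the source.

```python
# Sketch of the falsification logic. Only the reported directions come from
# the Seer analysis; the -1 / 0 / +1 encoding is an illustrative assumption.


def format_hypothesis_survives(directions: dict[str, int]) -> bool:
    """The format hypothesis predicts every platform demotes listicles.

    It survives only if every observed direction is negative.
    """
    return all(direction < 0 for direction in directions.values())


observed = {
    "ChatGPT": -1,              # listicle citation share declined
    "Google AI Overviews": 0,   # 'trending up or staying flat' (0 or +1)
}

print(format_hypothesis_survives(observed))  # False
```

Encoding AI Overviews as +1 instead of 0 gives the same result, which is the point: any non-negative direction on one platform is enough to contradict a prediction that both platforms demote the same format.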

Every published commentary on the Seer data treated the divergence as a curiosity ('listicles in AIOs are doing better, weird'). None of them treated it as what it actually is: a controlled experiment in which the format hypothesis fails. The hypothesis 'listicles are getting demoted because they're listicles' makes a prediction. The data refutes it. The remaining work is figuring out what the platforms were actually filtering on — and the answer has to be something downstream of the format itself.

Mechanism · 03

The counter-mechanism nobody engaged

On March 1, 2026, Carolyn Holzman published a forensic analysis arguing the demotion mechanism is technical, not editorial. Her case: Google's Helpful Content classifier treats jump-link fragment identifiers as duplicate-page signals, and on February 2, 2026 it demoted not just self-promotional B2B listicles but also Google's own developer documentation and Grokipedia — neither of which is a self-promotional anything. If a single classifier change hits a fragment-heavy listicle, Google's own developer docs, and a Wikipedia clone on the same day, the proximate cause is not 'AI is judging your editorial intent.' The proximate cause is a duplicate-detection rule firing on a shared structural feature.

She might be wrong. The mechanism might be partial, or one of several stacked changes. But the silence about her counter-mechanism is itself the finding. The SEO trade press reached for the narrative explanation — Google is cracking down on bad behavior — and skipped the technical one. The narrative explanation is more shareable; the technical one is more falsifiable. Independent writers should engage the falsifiable one even when it spoils the better story.

The two findings — Seer's divergence and Holzman's mechanism — together strip the case for a categorical listicle penalty down to almost nothing. The cross-platform behavior rules out a format-as-signal hypothesis. The Google-docs-and-Grokipedia comparison rules out a self-promotion-as-signal hypothesis at the level Holzman identifies. What's left for the format-penalty story to defend?

Existence proof · 04

The existence proof in plain sight

iPullRank publishes a piece called 'Top 11 Best Enterprise SEO Agencies of 2026.' iPullRank is listed at position #1. The first paragraph of the post reads: 'Obviously, we're going to kick off this listicle with ourselves. Yes, yes, we know, SEO faux pas.' The methodology section names the specific tool used to measure citations (Profound), the topic set, and the platforms covered. The author is Garrett Sussman; the framing is Mike King's. The page was published June 26, 2025 and last updated March 10, 2026 — that is, during the alleged crackdown, with the year-swap pattern the trade press flagged as a demotion signal.

The page is still live. It still ranks. Mike King is still publishing under his name. If 'self-promotional listicle ranked #1' were the signal Google's classifier downweighted, this page would have lost visibility on the exact dates Ray's anonymous casualties did. It didn't. The difference between iPullRank's #1 self-listing and the SaaS casualties Ray documented is not the format and not the self-placement. It is everything else: disclosed bias, named methodology, named author, real measurement, no template scaling.

Evertune published a 25,000-URL analysis on May 19, 2026 — after the listicle-penalty narrative had peaked. Of roughly 400 million citations across all models, 63% pointed to listicles. Half of the URLs Evertune analysed were listicles. Listicles, in May 2026, are still the dominant cited format in AI search. The narrative outran the data.

“Across nearly 400 million citations from all models, 63% pointed to listicles.”
— Evertune · May 19, 2026

Signal stack · 05

What is actually being penalized

Drop the format frame. The signal Google's classifier appears to act on is a conjunction — not any single property, but the co-occurrence of several. Read down the rows below and ask, against any given page, how many fire. Pages that fire none are fine. Pages that fire one or two are fine. Pages that fire five at once are the ones losing visibility.

Self-rank #1: Publisher's own product placed at #1 in their own listicle without disclosed bias or independent verification.
AI-generated body: Page body returns >90% AI confidence on detectors. Not because AI generation is itself banned, but because at this density it correlates with absent first-hand experience.
Year-swap freshness: Title updated from 2025 to 2026 with no substantive content rewrite, with dateModified manipulated to look fresh.
No disclosed methodology: No section explaining what was tested, what tool was used, what criteria were applied, when prices were checked.
Programmatic scale: 100+ near-identical pages produced from a template. The pattern at site level, not the page in isolation.
Schema manipulation: Fabricated AggregateRating, fake Review schema, claims the page cannot substantiate.

These are the signals Lily Ray's casualties shared. They are the signals iPullRank's surviving page does not. They are the signals Google's May 15, 2026 Search Central guidance described as 'seeking inauthentic mentions across the web,' framing the problem as manipulation intent rather than form. They are also — and this is the part the trade press has not engaged with — the signals an honest publisher can choose to avoid without giving up the listicle format.

The reframe is not 'listicles are good.' It is 'these six signals are what's being penalized, and listicles are simply where they tend to cluster.' Avoid the signals; keep the format. The data says it works.
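The conjunction framing in this section can be made concrete with a small sketch. It is an illustration of the argument, not a model of Google's classifier: the field names are hypothetical, each signal is reduced to a yes/no flag, and the threshold of five simply mirrors 'five at once' in the text. Where the real cut-off sits between two and five is unknown.

```python
from dataclasses import dataclass, fields


@dataclass
class PageSignals:
    """One boolean per signal in the stack; field names are hypothetical."""
    self_rank_first_undisclosed: bool = False
    ai_generated_body: bool = False
    year_swap_freshness: bool = False
    no_disclosed_methodology: bool = False
    programmatic_scale: bool = False
    schema_manipulation: bool = False


def signals_fired(page: PageSignals) -> int:
    """Count how many of the six signals are present on a page."""
    return sum(bool(getattr(page, f.name)) for f in fields(page))


def looks_like_casualty(page: PageSignals, threshold: int = 5) -> bool:
    """Flag the co-occurrence of signals, not any single property.

    The default threshold is an assumption taken from 'five at once'.
    """
    return signals_fired(page) >= threshold


# Two hypothetical pages: one fires a single signal, one fires five.
single_signal = PageSignals(year_swap_freshness=True)
templated = PageSignals(True, True, True, True, True, False)

print(signals_fired(single_signal), looks_like_casualty(single_signal))  # 1 False
print(signals_fired(templated), looks_like_casualty(templated))          # 5 True
```

Counting flags rather than checking for a listicle format is the design choice that matters here: the page type never appears as an input.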

Test · 06

The Self-Elevation Test

Apply this to any self-promotional listicle on your own site before publishing. If honest answers to all six questions come out in the page's favor, the demotion narrative does not describe your page.

01
Did a human use the products you're listing?
First-hand experience is the load-bearing E-E-A-T signal. If the answer is 'we read the docs,' the listicle is research-summary content and should not pretend to be a tested ranking.
02
Is your own product placed honestly?
If your product is genuinely the best by the criteria you publish, place it #1 and disclose. If it isn't, place it where it actually lands. iPullRank's #1 disclosure works because the criteria are visible; if the criteria were hidden, the placement would be a tell.
03
Is the methodology visible on the page?
A 'how we tested' section, even a paragraph, separates a journalistic listicle from a marketing one. Name the tool, the time period, the criteria.
04
Does the page link to alternatives, including competitors?
AI engines verify claims against other sources. A page that links only to itself is unverifiable in the retrieval model. Allsopp's research is explicit on this.
05
Did anything substantive change since the last update?
If the only change is the year in the title, you are running the freshness-manipulation pattern Ray named. Substantive updates — new data, new entries, removed entries — earn the dateModified.
06
Is the page the product of a template firing across 100 URLs, or a deliberate edit?
Scale is the multiplier. A single well-researched self-promotional listicle is the iPullRank pattern. The same template across 200 URLs is the casualty pattern.
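Question 05 is the easiest of the six to check mechanically. Here is a rough sketch, assuming you keep the previous version of the page text: the function name and the sample strings are hypothetical, and the check is deliberately crude, since it only compares plain text after masking four-digit years.

```python
import re

# Matches four-digit years of the form 20xx as whole words.
YEAR = re.compile(r"\b20\d{2}\b")


def only_year_changed(old_text: str, new_text: str) -> bool:
    """Return True if the two versions differ only in 20xx years."""
    if old_text == new_text:
        return False  # nothing changed at all, so there is no year swap
    # Mask every year with a placeholder, then compare what is left.
    return YEAR.sub("<YEAR>", old_text) == YEAR.sub("<YEAR>", new_text)


old = "Top 11 Best Tools of 2025\nWe tested each tool for a month."
swapped = "Top 11 Best Tools of 2026\nWe tested each tool for a month."
updated = swapped + "\nTwo entries were removed after retesting."

print(only_year_changed(old, swapped))  # True
print(only_year_changed(old, updated))  # False
```

A True result means the refresh changed nothing but the year, which is the pattern question 05 asks about. A False result says only that something else changed, not that the change was substantive; that judgment still belongs to an editor.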

FAQ · 07

Frequently asked questions

Q · 01

Is there an actual Google penalty against listicles?

No. Google has not announced a format-targeted policy. Its May 15, 2026 Search Central guidance frames the problem as 'seeking inauthentic mentions across the web,' which targets manipulation intent, not form. Its spokesperson Jennifer Kutz told The Verge in April 2026 that Google is working to combat 'low-quality listicle content,' with the qualifier on quality, not on format. The data — Seer's cross-platform divergence and Evertune's 63% citation share — is consistent with no format penalty.

Q · 02

Then why did the seven SaaS sites Lily Ray analysed lose visibility?

Because they ran a conjunction of signals — self-rank #1, AI-generated body, year-swap freshness, no methodology, programmatic scale, schema manipulation — at the same time. The format was the substrate the conjunction lived on. Strip the conjunction and the format survives. The proof is iPullRank's #1-self-listed enterprise SEO listicle, still ranking, still updated, still cited.

Q · 03

Aren't you contradicting Lily Ray by writing this?

No. Ray made the same argument explicitly in her March 18 and May 13 posts: the tactic is legitimate, the implementation is the problem, the scale matters. This essay engages the discourse downstream of her work, not her work itself.

Q · 04

What about the Holzman fragment-identifier theory?

It is the strongest published technical counter-mechanism. It might be wrong; it might be partially right. The reframe in this essay does not depend on Holzman being correct — the Seer cross-platform divergence and the iPullRank existence proof falsify the categorical-format-penalty story on their own. Holzman matters because she is the only writer who looked for a mechanism instead of a morality tale, and the SEO community should engage that work rather than ignore it.

Q · 05

What should I do with my self-promotional listicle on Monday morning?

Run the six-question Self-Elevation Test against it. If it answers cleanly, leave the page alone — it does not match the casualty pattern. If three or more answers fail, the work is not 'remove the listicle.' The work is to fix the failing signals: write the methodology section, add first-hand testing notes, link to competitors, change the placement if your product does not earn the rank, and do a substantive update on the next refresh.
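That triage can be sketched as a short function. The question keys and the returned wording are hypothetical labels for the six questions of the Self-Elevation Test. The 'three or more' threshold comes from the answer above; the branch for one or two failures is an assumption, since the essay does not prescribe an action for that case.

```python
# Hypothetical keys, one per question in the Self-Elevation Test.
QUESTIONS = [
    "first_hand_use",
    "honest_placement",
    "visible_methodology",
    "links_to_alternatives",
    "substantive_update",
    "deliberate_edit_not_template",
]


def triage(answers: dict[str, bool]) -> tuple[str, list[str]]:
    """Map pass/fail answers to an action and the list of failing questions.

    A missing answer is treated as a failure.
    """
    failing = [q for q in QUESTIONS if not answers.get(q, False)]
    if not failing:
        return "leave the page alone", failing
    if len(failing) >= 3:
        return "fix the failing signals, keep the listicle", failing
    # Assumption: the essay gives no rule for one or two failures.
    return "address the failures on the next refresh", failing


clean = {q: True for q in QUESTIONS}
weak = dict(clean, first_hand_use=False, visible_methodology=False,
            links_to_alternatives=False)

print(triage(clean)[0])  # leave the page alone
print(triage(weak)[0])   # fix the failing signals, keep the listicle
```

No branch returns 'remove the listicle', which matches the advice: the output is always a list of signals to repair.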

Q · 06

Are AI engines going to keep citing listicles?

On current evidence, yes. Evertune's May 19, 2026 analysis of 25,000 URLs and 400 million citations found listicles received 63% of all model citations. ChatGPT specifically showed a 30% month-over-month decline December to January and may continue compressing toward the surviving honest pages, but AI Overviews held flat or rose on the same prompts. Both directions are consistent with the reframe: AI engines are filtering on quality signals, not on format.

Q · 07

What does 'high quality content for SEO' actually mean in 2026?

High-quality content means methodology over production value. It answers: who wrote this, what did they test, what criteria did they use, and what data backs the claim? For AI engines specifically, quality is verified by cross-referencing claims against other indexed sources. A page that shows its work — named authors, disclosed methodology, external citations, first-hand testing notes — becomes a trusted node. A page that asserts conclusions without evidence is unverifiable. The format (listicle, how-to, essay) is secondary to the methodology behind it.

Sources

Lily Ray. Is Google Finally Cracking Down on Self-Promotional Listicles? (Feb 3, 2026). lilyraynyc.substack.com
Lily Ray. Your GEO Strategy Might Be Destroying Your SEO (Mar 18, 2026). lilyraynyc.substack.com
Lily Ray. It Works Until It Doesn't (May 13, 2026). lilyraynyc.substack.com
Seer Interactive. The listicle window is closing in AI search: 30% decline MoM (Feb 25, 2026). seerinteractive.com
Carolyn Holzman. Forensic analysis: Is Google cracking down on listicles? (Mar 1, 2026). carolynholzman.github.io
Evertune in Search Engine Land. AI search loves listicles: What 25,000 URLs reveal about citations (May 19, 2026). searchengineland.com
Devesh Khanal / Grow and Convert. Self-promotional listicles aren't the problem. Bad content is. (May 26, 2026). growandconvert.com
Glen Allsopp / Detailed on Ahrefs. Do self-promotional 'Best' lists boost ChatGPT visibility? (Dec 4, 2025). ahrefs.com
iPullRank. Top 11 Best Enterprise SEO Agencies of 2026 (last updated Mar 10, 2026). ipullrank.com
Google Search Central. Optimizing your website for generative AI features on Google Search (May 15, 2026). developers.google.com
Glenn Gabe. The Core Before Christmas: December 2025 broad core update analysis (Jan 5, 2026). gsqi.com
Aggarwal et al. GEO: Generative Engine Optimization (KDD 2024). arxiv.org