The BBC's RSS Feed
Due to the incorrect way the BBC’s RSS 2.0 feed handles guids, RSS readers are repeatedly left displaying duplicate articles.
Let’s have a look at why this happens with a sample article from their feed:
<item>
<title>
<![CDATA[
'We fell off the face of the earth': Dad-daughter duo who took on 7,500 miles for TV
]]>
</title>
<description>
<![CDATA[
Molly Clifford and her father are part of this year's line up for the BBC's Race Across the World.
]]>
</description>
<link>
https://www.bbc.com/news/articles/c9951jrr18no?at_medium=RSS&at_campaign=rss
</link>
<guid isPermaLink="false">https://www.bbc.com/news/articles/c9951jrr18no#3</guid>
<pubDate>Fri, 03 Apr 2026 05:19:07 GMT</pubDate>
<media:thumbnail width="240" height="135" url="https://ichef.bbci.co.uk/ace/standard/240/cpsprodpb/bb22/live/0bdf4fa0-2db9-11f1-934f-036468834728.jpg"/>
</item>
Specifically, let’s focus on the guid:
<guid isPermaLink="false">https://www.bbc.com/news/articles/c9951jrr18no#3</guid>
What I’ve seen the BBC doing is incrementing the suffix after the # and, as per the RSS 2.0 specification below, RSS readers tend to treat each incremented guid as a new entry:
guid stands for globally unique identifier. It’s a string that uniquely identifies the item. When present, an aggregator may choose to use this string to determine if an item is new.
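To make the failure mode concrete, here is a minimal Python sketch of the guid-only check an aggregator might perform. The function name `new_items` and the in-memory `seen` set are my own illustration, not code from Gobbler or any particular reader; the `#2` guid is the earlier value recorded for the same article.

```python
def new_items(seen_guids, fetched):
    """Return the fetched items whose guid has not been seen before.

    Hypothetical helper: mirrors the spec's "use this string to determine
    if an item is new" by comparing guids as opaque strings.
    """
    fresh = []
    for item in fetched:
        if item["guid"] not in seen_guids:
            seen_guids.add(item["guid"])
            fresh.append(item)
    return fresh


seen = set()

# First fetch: the article arrives with the #2 suffix.
first = new_items(seen, [{"guid": "https://www.bbc.com/news/articles/c9951jrr18no#2"}])

# Later fetch: same article, suffix incremented to #3.
second = new_items(seen, [{"guid": "https://www.bbc.com/news/articles/c9951jrr18no#3"}])

# Both fetches report one "new" item, so the reader shows the article twice.
print(len(first), len(second))
```

Because the guid is compared as an opaque string, nothing tells the reader that the two values refer to the same article.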
Gobbler has fetched the above article twice, and the title changed between fetches:
| guid | title | content hash |
|---|---|---|
| https://www.bbc.com/news/articles/c9951jrr18no#2 | ‘We fell off the face of the earth’: Dad and daughter raced across world but had to keep it secret | a8159e96 |
| https://www.bbc.com/news/articles/c9951jrr18no#3 | ‘We fell off the face of the earth’: Dad-daughter duo who took on 7,500 miles for TV | 17cbc6b7 |
Strictly speaking, the RSS 2.0 specification doesn’t prohibit a guid from changing. Additionally, there are no update semantics available (e.g., an updatedDate element) in the 2.0 specification. So, in this scenario with a change of title, an incremented guid is almost justifiable.
However, this isn’t always the case. Let’s look at a different example in the Gobbler database:
| guid | title | content hash |
|---|---|---|
| https://www.bbc.com/news/articles/cyv1q9gz39do#0 | How English-only condolences undid one of Canada’s top CEOs | 8845f9d6 |
| https://www.bbc.com/news/articles/cyv1q9gz39do#1 | How English-only condolences undid one of Canada’s top CEOs | 8845f9d6 |
| https://www.bbc.com/news/articles/cyv1q9gz39do#3 | How English-only condolences undid one of Canada’s top CEOs | 8845f9d6 |
Gobbler has fetched this article three times. The article hasn’t changed at all: same title, same content, and same published date[1], all confirmed by the identical content hash. This is simply not justifiable. There is no reason to change the guid if the article hasn’t changed.
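On the reader side, one possible workaround is to ignore the fragment and lean on the content hash instead. The sketch below is an assumption on my part, not how Gobbler actually works: `canonical_guid` and `classify` are hypothetical names, and stripping the fragment is only safe for a feed like this one, where the part after the `#` behaves like a revision counter.

```python
from urllib.parse import urldefrag


def canonical_guid(guid):
    """Drop the '#N' suffix so every revision maps to the same key."""
    base, _fragment = urldefrag(guid)
    return base


def classify(store, guid, content_hash):
    """Label a fetched item as 'new', 'duplicate' or 'updated'.

    `store` maps canonical guid -> last seen content hash (hypothetical schema).
    """
    key = canonical_guid(guid)
    previous = store.get(key)
    store[key] = content_hash
    if previous is None:
        return "new"
    if previous == content_hash:
        return "duplicate"
    return "updated"


# The five fetches recorded in the two tables above.
fetches = [
    ("https://www.bbc.com/news/articles/cyv1q9gz39do#0", "8845f9d6"),
    ("https://www.bbc.com/news/articles/cyv1q9gz39do#1", "8845f9d6"),
    ("https://www.bbc.com/news/articles/cyv1q9gz39do#3", "8845f9d6"),
    ("https://www.bbc.com/news/articles/c9951jrr18no#2", "a8159e96"),
    ("https://www.bbc.com/news/articles/c9951jrr18no#3", "17cbc6b7"),
]

store = {}
results = [classify(store, guid, digest) for guid, digest in fetches]
print(results)
# ['new', 'duplicate', 'duplicate', 'new', 'updated']
```

With this approach the unchanged article is stored once, and the retitled one is recognised as an update rather than a second entry.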
What could the BBC do differently?
First, don’t change the guid when the article content hasn’t changed. Just don’t.
Second, if the article has been updated, use <atom:updated> in the <item>. The feed declares the Atom namespace and already uses it:
<atom:link href="https://feeds.bbci.co.uk/news/uk/rss.xml" rel="self" type="application/rss+xml"/>
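If the BBC did add the element, a reader could keep the guid stable and use the timestamp to decide whether to refresh its stored copy. The following Python sketch shows roughly how that might look. The sample feed is hypothetical (the real feed does not carry `<atom:updated>` on items, and the timestamp is made up), and `read_items` and `needs_refresh` are names I've invented for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical item: stable guid (no '#N' suffix) plus an atom:updated element.
SAMPLE = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <guid isPermaLink="false">https://www.bbc.com/news/articles/c9951jrr18no</guid>
      <pubDate>Fri, 03 Apr 2026 05:19:07 GMT</pubDate>
      <atom:updated>2026-04-03T09:00:00Z</atom:updated>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Yield (guid, updated) pairs; updated is None when the element is absent."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        guid = item.findtext("guid")
        updated_text = item.findtext(f"{ATOM}updated")
        updated = None
        if updated_text:
            # Older Python versions reject a trailing 'Z', so normalise it first.
            updated = datetime.fromisoformat(updated_text.strip().replace("Z", "+00:00"))
        yield guid, updated


def needs_refresh(stored, guid, updated):
    """True when the item is unknown or its atom:updated is newer than the stored one."""
    last = stored.get(guid)
    return updated is not None and (last is None or updated > last)


stored = {}
for guid, updated in read_items(SAMPLE):
    if needs_refresh(stored, guid, updated):
        stored[guid] = updated
```

The guid answers "is this the same article?" and the timestamp answers "has it changed?", which is the separation the incrementing suffix currently blurs.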
Lastly, and this is a bit of a stretch goal, put the full content of each article in the feed instead of a summary.
Footnotes
1. I couldn’t fit everything in the table.