Feeds, well-formed XML and Vista

The Microsoft RSS Blog just announced that Vista will only accept RSS feeds that are well-formed XML.

I agree with Nick, who commented "This is the right thing to do, and I'm glad you're doing it - thanks". I'd like to add some emphasis to that statement, though: "This is the right thing to do, and I'm glad *you're* doing it - thanks".

See, neither Nick's aggregator nor mine requires well-formed XML. This is because there are a lot of non-well-formed feeds out there, and the typical aggregator user doesn't care about XML specs; they just want to see the feed content. If you require well-formed XML, something as small as a single "&" in a single post will invalidate the entire feed for as long as that post remains in the feed (which can be weeks, depending on how often the feed is updated).
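To make the single-"&" point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`, which, like any conforming XML parser, rejects the whole document at the first well-formedness error. The two-item feed and the `item_titles` helper are made up for illustration: the first item is fine, but one bare "&" in the second title means neither item comes back.

```python
import xml.etree.ElementTree as ET

# A made-up two-item RSS feed; only the second title contains a bare "&".
FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>A perfectly good post</title></item>
<item><title>Salt & pepper</title></item>
</channel></rss>"""


def item_titles(feed_text):
    """Return the item titles, or None if the feed is not well-formed XML."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError as err:
        print("Rejected the whole feed:", err)
        return None
    return [item.findtext("title") for item in root.iter("item")]


# The good first item is lost along with the bad one.
print(item_titles(FEED))
# Escaping the ampersand makes both items readable again.
print(item_titles(FEED.replace("Salt & pepper", "Salt &amp; pepper")))
```

A strict parser has no notion of "the rest of the feed was fine"; that kind of recovery is exactly what lenient aggregators add on top.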

Microsoft being stricter than we are does have the following positive results, though:

  • It will most likely reduce the number of invalid feeds out there, making it easier for everyone to parse feeds.
  • Microsoft gets positive press for "doing the right thing".
  • For those feeds that still break, it may be another reason for people to look for alternative aggregators that can read that feed they're interested in.
Maybe there's an exception to Postel's law after all.

TrackBack URL for this entry: http://www.hutteman.com/scgi-bin/mt/mt-tb.cgi/198
Comments

Am I mistaken in my understanding that almost all complex querystrings violate well-formed xml protocols? That would lead me to believe that the protocol is poorly written.

Posted by Don McArthur at November 4, 2005 7:58 PM

Hmm part of the problem is that those ppl who are responsible for mal-formed xml feeds prolly don't know that it is mal-formed. If programs were to notify the user when a feed is malformed (and how), yet still try and show it in its best form would be the best.

Posted by Factory at November 4, 2005 8:16 PM
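One way an aggregator could act on Factory's suggestion is to parse strictly first, report where the feed breaks, and only then attempt a repair. The Python sketch below is hypothetical and is not how SharpReader or Vista behave; the only repair it tries (escaping bare ampersands via the made-up `BARE_AMP` pattern) is an assumption chosen for illustration, and any other error is left to propagate.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical repair rule: an "&" that does not start a named, decimal,
# or hexadecimal character reference is treated as a bare ampersand.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_with_warning(feed_text):
    """Return (root_element, warning); warning is None for well-formed input.

    If strict parsing fails, the error location is recorded and parsing is
    retried once after escaping bare ampersands. If that still fails, the
    second ParseError propagates to the caller.
    """
    try:
        return ET.fromstring(feed_text), None
    except ET.ParseError as err:
        line, column = err.position
        warning = f"Feed is not well-formed XML (line {line}, column {column}): {err}"
    repaired = BARE_AMP.sub("&amp;", feed_text)
    return ET.fromstring(repaired), warning


root, warning = parse_with_warning("<rss><channel><title>Q&A</title></channel></rss>")
print(warning)                            # tells the user what was wrong and where
print(root.find("channel/title").text)    # still shows the content
```

Keeping the warning separate from the parsed result lets the UI show the feed and flag it at the same time, rather than choosing between silent leniency and a blank page.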

Feeds, well-formed XML and Vista

Trackback from Lorenzo Barbieri @ UGIblogs! at November 4, 2005 9:11 PM

Don: if someone creates a feed by simply concatenating a few strings together, a querystring containing "&" would indeed make the XML non-well-formed. If you're using an XML library to build your feed, though, the "&" is escaped to "&amp;", which is valid. Not everybody uses an XML library, however.

Posted by Luke Hutteman at November 4, 2005 9:28 PM
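The difference between concatenating strings and using an XML library can be seen in a few lines of Python. This sketch uses the standard `xml.etree.ElementTree` module and the Google search URL that comes up later in this thread; the element names are just RSS-style examples.

```python
import xml.etree.ElementTree as ET

URL = "http://www.google.com/search?hl=en&q=xml"

# Naive string concatenation: the "&" goes into the output unescaped.
concatenated = "<item><link>" + URL + "</link></item>"

# Building the same element with an XML library escapes "&" on serialization.
item = ET.Element("item")
ET.SubElement(item, "link").text = URL
serialized = ET.tostring(item, encoding="unicode")

print(concatenated)  # <item><link>http://www.google.com/search?hl=en&q=xml</link></item>
print(serialized)    # <item><link>http://www.google.com/search?hl=en&amp;q=xml</link></item>
```

The library never asks the caller to think about escaping, which is why feeds generated this way rarely have the problem.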

So, dumb question then: why does SharpReader not quite know how to deal with feeds that contain odd tags in the body of the message? Mostly, these are feeds with enclosures and the "iframe" tag, but I have also seen it with Amazon references. Here are some examples:
http://www.girldetective.net/wp-rss2.php (iframe problem)
http://innovationchallenge.typepad.com/innovation_challen/rss.xml (iframe problem)

Thanks, Jack

Posted by Jack Vinson at November 5, 2005 1:07 AM

Prisoner's Dilemma

Trackback from Sam Ruby at November 5, 2005 5:11 AM

Luke, thanks, but sometimes I see feed validators throwing an error if there is more than one equals sign in the querystring, too. Or am I misreading that?

Posted by Don McArthur at November 5, 2005 2:55 PM

Don, if the URL is something like http://www.google.com/search?hl=en&q=xml, and the ampersand isn't escaped, an XML parser will start parsing an entity reference at the ampersand, and will find it's invalid when it gets an equals sign instead of a semicolon. This is probably what you're seeing.

Posted by Carey Evans at November 5, 2005 4:29 PM
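Carey's explanation is easy to check with Python's standard `xml.etree.ElementTree`. In this sketch (the `link_parses` helper is made up for illustration), the number of equals signs turns out not to matter: a URL with one parameter parses fine, the unescaped two-parameter URL fails at "&q=", and the escaped form parses back to the original URL.

```python
import xml.etree.ElementTree as ET


def link_parses(url_as_written):
    """Return True if <link> containing the text exactly as written is well-formed."""
    try:
        ET.fromstring("<link>" + url_as_written + "</link>")
        return True
    except ET.ParseError:
        return False


print(link_parses("http://www.google.com/search?q=xml"))            # True: "=" alone is fine
print(link_parses("http://www.google.com/search?hl=en&q=xml"))      # False: "&q" starts a broken entity reference
print(link_parses("http://www.google.com/search?hl=en&amp;q=xml"))  # True: escaped ampersand

# The parser turns "&amp;" back into "&", so the reader sees the real URL.
print(ET.fromstring("<link>http://www.google.com/search?hl=en&amp;q=xml</link>").text)
```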

Well formed feeds in Vista
Just in case you haven't heard, Vista will require well-formed XML for RSS feeds. This will help us to avoid the inevitable scenario of "breaking the bugs" in the future. Your small aggregator software provider with 50,000 users may be...

Trackback from Randy Holloway Unfiltered at November 5, 2005 4:31 PM

(Later) results:

  • Apps built on top of the RSS platform in Vista lose data quietly.
  • Apps needing reliable support (one of the stated goals of the decision to go draconian) can't use the Vista RSS platform.
Or do you think Microsoft platform decisions can (still) cause event horizons?

Posted by Jeremy Dunck at November 5, 2005 4:52 PM

Jack: SharpReader filters out iframes for security purposes; see http://diveintomark.org/archives/2003/06/12/how_to_consume_rss_safely

Posted by Luke Hutteman at November 6, 2005 1:17 AM
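For readers curious what "filtering out iframes" involves, here is a hypothetical Python sketch built on the standard `html.parser` module. It is not SharpReader's code: the `BLOCKED_TAGS` set is a simplified, illustrative blocklist in the spirit of the article linked above, and a production sanitizer would normally use an allowlist and also handle things like `javascript:` URLs. It drops blocked elements together with their content and removes inline `on*` event-handler attributes.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative blocklist only; a production sanitizer would use an allowlist.
BLOCKED_TAGS = {"iframe", "frame", "frameset", "script", "object", "embed", "style"}


class BlockedTagStripper(HTMLParser):
    """Re-serializes an HTML fragment, dropping blocked elements and their content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than zero while inside a blocked element

    def _render(self, tag, attrs, self_closing=False):
        # Drop inline event handlers such as onclick; re-escape the other attributes.
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on"):
                continue
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + ("/>" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag not in BLOCKED_TAGS and not self.skip_depth:
            self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def strip_blocked_tags(fragment):
    parser = BlockedTagStripper()
    parser.feed(fragment)
    parser.close()  # flushes any buffered trailing text
    return "".join(parser.out)


print(strip_blocked_tags('<p>Hi<iframe src="http://example.com/ad">fallback</iframe> there</p>'))
```

Note that this only helps when the iframe arrives as real markup. If a feed escapes its HTML one time too many, the tags reach the reader as literal text instead, and no tag filter will see them.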

Thanks for the reply, Luke. I wonder if there isn't something funny about those feeds, then, because I _see_ the iframe tags in my feed: the tags and underlying code are displayed as text. I'd much rather they be completely stripped, as you suggest. The current Girl Detective feed (linked above), for example, has a content:encoded section that contains a bunch of iframes. When I view it in SharpReader, these come through as flat text.

Posted by Jack Vinson at November 6, 2005 11:55 PM

While subscribed to:

http://xml.weather.yahoo.com/forecastrss?p=70448

I noticed that items are sorted numerically rather than chronologically. For example, an item at 10:53 am comes before one at 12:53 am, even though 12:53 am on the same day is actually earlier than 10:53 am.

Posted by Justin Goldberg at November 9, 2005 1:06 PM
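The symptom Justin describes is what happens when 12-hour times are compared as text instead of as times. This Python sketch uses made-up sample times (it makes no assumption about the actual item format of that Yahoo! weather feed) and compares sorting the strings with sorting by a parsed clock value.

```python
from datetime import datetime

times = ["1:53 pm", "12:53 am", "10:53 am"]


def clock_key(text):
    """Parse a 12-hour clock string such as '12:53 am' into a comparable time."""
    return datetime.strptime(text, "%I:%M %p").time()


print(sorted(times))                 # ['10:53 am', '12:53 am', '1:53 pm']  (text order)
print(sorted(times, key=clock_key))  # ['12:53 am', '10:53 am', '1:53 pm']  (clock order)
```

This only orders times within a single day; items from different days need the full date, which in RSS 2.0 is carried by the item's pubDate.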

FWIW, I'm having no problems using SharpReader 0.9.6 and .NET Framework 2.0, which I noticed quite by accident on the M$ download site (Windows Update).

Posted by at November 11, 2005 4:58 PM

My SharpReader has also stopped working all of a sudden (2005 framework installed). If I add a new feed, I get the error:

The type initializer for "System.Net.HttpWebRequest" threw an exception.

If I click on Send Error Report, nothing seems to happen. The program is useless now, as it cannot add new feeds or retrieve old ones.

Si

Posted by at November 17, 2005 3:04 AM

Si - it seems like some people are running into issues with SharpReader after installing the 2005 framework, while others (like myself, and also the commenter above you) do not. The error looks similar to what happens when Service Pack 1 for .NET 1.1 is not installed (see this FAQ entry). One possible resolution for that issue is to remove the SharpReader.exe.config file from your SharpReader directory and restart SharpReader. Give that a try and see if it fixes your issue.

Posted by Luke Hutteman at November 17, 2005 2:20 PM

Thanks Luke I will do that and let you know what happens

Si

Posted by Si at November 17, 2005 2:46 PM

Aye, it works - thanks for the tip. I didn't spot that FAQ entry before!

Si

Posted by Si at November 17, 2005 2:53 PM

Isn't the success of a protocol more about a level of acceptance than strict compliance? How many web pages adhere strictly to web standards? I can't help thinking that people will simply use alternative readers if the built-in Vista one doesn't work. It remains to be seen whether a critical mass will be reached where either RSS providers are forced to fix their data, or alternative readers dominate in a 'not strictly RSS but good enough to be practical' world.

Posted by Adam Ralph at November 30, 2005 12:23 PM

Have they taken on the role of teacher? Were they asked to do that? I bet they were not. And what's in it for them?

Posted by at January 6, 2006 5:19 PM

If they are confident of what they are doing let them do it. But I think it's preventing a lot of users from using the service.

Posted by Alan at January 6, 2006 5:22 PM

Hmm part of the problem is that those ppl who are responsible for mal-formed xml feeds prolly don't know that it is mal-formed. If programs were to notify the user when a feed is malformed (and how), yet still try and show it in its best form would be the best.

Yes, I really don't understand why, e.g., web browsers don't helpfully flag malformed source documents even as they do their best to render them. That much should be trivially easy to do.

Posted by Hamilton Lovecraft at July 13, 2006 7:54 PM