The Microsoft RSS Blog just announced that Vista will only accept RSS feeds that are well-formed XML.
I agree with Nick, who commented "This is the right thing to do, and I'm glad you're doing it - thanks". I'd like to add some emphasis to that statement though: "This is the right thing to do, and I'm glad you're doing it - thanks".
See, neither Nick's aggregator nor mine requires well-formed XML. This is because there are a lot of non-well-formed feeds out there, and the typical aggregator user doesn't care about XML specs; they just want to see the feed content. And if you require well-formed XML, something as small as a single unescaped "&" in a single post will invalidate the entire feed for as long as that post remains in the feed (which can be weeks, depending on the update frequency of the feed).
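To make the "single ampersand" point concrete, here is a minimal Python sketch using the standard library's strict XML parser. The feed text and the item_titles helper are made up for illustration; they are not taken from any real feed or aggregator.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed; only the second title contains a bare "&".
FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First post</title></item>
<item><title>Tips & tricks</title></item>
</channel></rss>"""


def item_titles(feed_text):
    """Return all item titles, or None if the feed is not well-formed."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # A strict parser rejects the whole document, not just the bad item.
        return None
    return [item.findtext("title") for item in root.iter("item")]


print(item_titles(FEED))                            # None
print(item_titles(FEED.replace(" & ", " &amp; ")))  # both titles
```

With a strict parser, the well-formed first item is lost along with the broken second one, which is why one bad post can hide an entire feed.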
Microsoft being more strict than us has the following positive results though:
Am I mistaken in my understanding that almost all complex querystrings violate well-formed xml protocols? That would lead me to believe that the protocol is poorly written.
Posted by Don McArthur at November 4, 2005 7:58 PM
Hmm, part of the problem is that those ppl who are responsible for malformed XML feeds prolly don't know that they are malformed. It would be best if programs notified the user when a feed is malformed (and how), yet still tried to show it in its best form.
Posted by Factory at November 4, 2005 8:16 PM
Feeds, well-formed XML and Vista
Don: if someone creates a feed by simply concatenating a few strings together, a complex querystring would indeed make the XML non-well-formed. If you're using an XML library to build your feed though, the "&" would be escaped to "&amp;", which is valid. Not everybody uses an XML library though.
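A short Python sketch of the difference between concatenating strings and letting an XML library do the escaping. The URL is just an example querystring with two parameters, and is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

url = "http://www.google.com/search?hl=en&q=xml"

# Naive string concatenation copies the bare "&" straight into the markup.
concatenated = "<item><link>" + url + "</link></item>"

# Escaping by hand before concatenating fixes it...
escaped_by_hand = "<item><link>" + escape(url) + "</link></item>"

# ...and an XML library escapes text for you when it serializes the tree.
item = ET.Element("item")
ET.SubElement(item, "link").text = url
built = ET.tostring(item, encoding="unicode")


def is_well_formed(text):
    """True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(concatenated))     # False
print(is_well_formed(escaped_by_hand))  # True
print(built)  # <item><link>http://www.google.com/search?hl=en&amp;q=xml</link></item>
```

Parsing the library-built item gives back the original URL with a plain "&", so the escaping is invisible to anyone reading the feed through a parser.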
Posted by Luke Hutteman at November 4, 2005 9:28 PM
So, dumb question then. Why does SharpReader not quite know how to deal with feeds that contain odd tags in the body of the message? Mostly, these are feeds with enclosures and the "iframe" tag, but I have also seen it with Amazon references. Here are some examples:
http://www.girldetective.net/wp-rss2.php (iframe problem)
http://innovationchallenge.typepad.com/innovation_challen/rss.xml (iframe problem)
Thanks, Jack
Posted by Jack Vinson at November 5, 2005 1:07 AM
Luke, thanks, but sometimes I see feed validators throwing an error if there is more than one equals sign in the querystring, too. Or am I misreading that?
Posted by Don McArthur at November 5, 2005 2:55 PM
Don, if the URL is something like http://www.google.com/search?hl=en&q=xml, and the ampersand isn't escaped, an XML parser will start parsing an entity reference at the ampersand, and will find it's invalid when it gets an equals sign instead of a semicolon. This is probably what you're seeing.
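A small Python sketch of how a validator-style check might surface this: it reports where the strict parser gave up. The first_error helper is a hypothetical name, and the exact wording and column of the message depend on the underlying parser.

```python
import xml.etree.ElementTree as ET


def first_error(xml_text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


raw = "<link>http://www.google.com/search?hl=en&q=xml</link>"
fixed = raw.replace("&", "&amp;")

print(first_error(raw))    # a (line, column, message) tuple pointing into the URL
print(first_error(fixed))  # None
```

The complaint comes from the unescaped ampersand, not from the number of equals signs; once the ampersand is written as "&amp;", the same querystring parses cleanly.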
Posted by Carey Evans at November 5, 2005 4:29 PM
Well formed feeds in Vista
Just in case you haven't heard, Vista will require well-formed XML for RSS feeds. This will help us to avoid the inevitable scenario of "breaking the bugs" in the future. Your small aggregator software provider with 50,000 users may be...
(Later) results:
Or do you think Microsoft platform decisions can (still) cause event horizons?
Jack: SharpReader filters out iframes for security purposes; see http://diveintomark.org/archives/2003/06/12/how_to_consume_rss_safely
Thanks for the reply, Luke. I wonder if there isn't something funny about those feeds then. Because I _see_ the iframe tags in my feed: the tags and underlying code are displayed as text. I'd much rather they be completely stripped, as you suggest. The current Girl Detective feed (linked above), for example, has a content:encoded section that contains a bunch of iframes. When I view in SharpReader these come through as flat text.
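For readers wondering what "completely stripped" could look like, here is a rough Python sketch that drops iframe elements from a fragment of item HTML. It is not how SharpReader does its filtering (that is not described here); IframeStripper and strip_iframes are hypothetical names, and the sketch assumes the content has already been unescaped into real markup.

```python
from html.parser import HTMLParser


class IframeStripper(HTMLParser):
    """Re-emit HTML while dropping <iframe> elements and their contents.

    Comments and declarations are silently dropped as well, and an unclosed
    <iframe> swallows the rest of the input; acceptable for a sketch only.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # greater than 0 while inside an <iframe>

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "iframe" and self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "iframe":
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")


def strip_iframes(html_text):
    parser = IframeStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = '<p>Before <iframe src="http://example.com/ad"></iframe>after &amp; done</p>'
print(strip_iframes(sample))  # <p>Before after &amp; done</p>
```

A real sanitizer would normally work from a whitelist of allowed tags rather than a blacklist of one, as the linked "how to consume RSS safely" article argues.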
While subscribed to:
http://xml.weather.yahoo.com/forecastrss?p=70448
I noticed that items are sorted numerically, but the resulting order is not correct timewise. For example, an item at 10:53 am comes before 12:53 am, when 12:53 am the same day is actually before 10:53 am.
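The difference between the two orderings can be shown with a few lines of Python. This only illustrates the symptom described above; how the feed or the reader actually sorts is not known, and the sample times are invented.

```python
from datetime import datetime

times = ["10:53 am", "12:53 am", "1:53 pm", "12:53 pm"]

# Sorting on the leading hour number puts 10:53 am ahead of 12:53 am.
by_number = sorted(times, key=lambda t: int(t.split(":")[0]))


def clock_time(text):
    """Parse a 12-hour clock string; 12:53 am becomes 00:53."""
    return datetime.strptime(text.upper(), "%I:%M %p").time()


# Sorting on the parsed time gives the order the clock would.
by_time = sorted(times, key=clock_time)

print(by_number)  # ['1:53 pm', '10:53 am', '12:53 am', '12:53 pm']
print(by_time)    # ['12:53 am', '10:53 am', '12:53 pm', '1:53 pm']
```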
Posted by Justin Goldberg at November 9, 2005 1:06 PM
FWIW, I'm having no problems using SharpReader 0.9.6 and .NET Framework 2.0, which I noticed quite by accident on the M$ download site (Windows Update)
Posted by at November 11, 2005 4:58 PM
My SharpReader has also stopped working all of a sudden (2005 framework installed). If I add a new feed, I get the error:
The type initializer for "System.Net.HttpWebRequest" threw an exception.
If I click on Send Error Report, nothing seems to happen. The program is useless now as it cannot add new or retrieve old feeds.
Si
Posted by at November 17, 2005 3:04 AM
Si - it seems like some people are running into issues with SharpReader after installing the 2005 framework, while others (like myself, and also the commenter above you) do not. The error seems to be similar to what happens when service pack 1 for .NET 1.1 is not installed (see this faq entry). One possible resolution for that issue is to remove the SharpReader.exe.config file from your SharpReader directory and restart SharpReader. Give that a try and see if that fixes your issue.
Posted by Luke Hutteman at November 17, 2005 2:20 PM
Thanks Luke, I will do that and let you know what happens
Si
Posted by Si at November 17, 2005 2:46 PM
Aye it works - thanks for the tip, didn't spot that FAQ entry before!
Si
Posted by Si at November 17, 2005 2:53 PM
Isn't the success of a protocol more about a level of acceptance than strict compliance? How many web pages adhere strictly to web standards? I can't help thinking that people will simply use alternative readers if a built-in Vista one doesn't work. It remains to be seen whether a critical mass will be reached where RSS providers are forced to fix up their data, or whether alternative readers will dominate in a 'not strictly RSS but good enough to be practical' world.
Posted by Adam Ralph at November 30, 2005 12:23 PM
Have they taken on the role of a teacher? But were they asked to do that? I bet they were not. And what is the use of it for them?
Posted by at January 6, 2006 5:19 PM
If they are confident of what they are doing let them do it. But I think it's preventing a lot of users from using the service.
Posted by Alan at January 6, 2006 5:22 PM
"Hmm, part of the problem is that those ppl who are responsible for malformed XML feeds prolly don't know that they are malformed. It would be best if programs notified the user when a feed is malformed (and how), yet still tried to show it in its best form."
Yes, I really don't understand why, e.g., web browsers don't helpfully flag malformed source documents even as they do their best to render them. That much should be trivially easy to do.
Posted by Hamilton Lovecraft at July 13, 2006 7:54 PM
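The "flag it but still show it" idea from the last few comments can be sketched in Python as a strict parse with a lenient fallback. This is only an illustration under stated assumptions: parse_leniently and BARE_AMP are hypothetical names, the only repair attempted is escaping bare ampersands, and a real aggregator would need many more recovery rules.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" that does not start a named or numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_leniently(feed_text):
    """Return (root, warning); warning is None when the feed was well-formed.

    root is None when the single repair attempted here was not enough.
    """
    try:
        return ET.fromstring(feed_text), None
    except ET.ParseError as err:
        warning = f"Feed is not well-formed: {err}"
    # Try again after escaping bare ampersands, keeping the warning to show the user.
    repaired = BARE_AMP.sub("&amp;", feed_text)
    try:
        return ET.fromstring(repaired), warning
    except ET.ParseError:
        return None, warning


root, warning = parse_leniently("<item><title>Tips & tricks</title></item>")
print(root.findtext("title"))  # Tips & tricks
print(warning is not None)     # True
```

The caller can render whatever was recovered and still show the warning next to the feed, so the reader sees the content and the publisher has a chance to hear about the problem.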