SharpReader 0.9.7.0 is now available at sharpreader.net.

Changes since the last version are:

  • Run internal browser in restricted security zone in order to make IE responsible for blocking restricted content, instead of just doing so by parsing and stripping tags.
  • Allow embedded CSS styles in item descriptions (previously disabled because of JavaScript exploits, which are now caught by the security zone).
  • Support both <commentRSS> and <commentRss>, as there was some confusion about the proper capitalization of this element.
  • Fixed linebreak handling for some feeds.
  • Improved handling of relative URLs in Atom feeds (Sam Ruby's feed, for instance).
  • Now displaying enclosure links at the bottom of the item description.
  • Fixed installer to no longer complain if only .NET 2.0 is installed.
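
To illustrate the <commentRSS> / <commentRss> item above, here is a minimal Python sketch of a capitalization-tolerant lookup. It is not SharpReader's actual code; the function name find_comment_feed, the sample item, and the example.com URL are all made up for the example.

```python
import xml.etree.ElementTree as ET

def find_comment_feed(item):
    """Return the comment-feed URL of an <item>, ignoring tag capitalization.

    Hypothetical helper: accepts commentRSS, commentRss, or any other casing.
    """
    for child in item:
        # ElementTree spells namespaced tags as "{namespace}local";
        # keep only the local part before comparing.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name.lower() == "commentrss":
            return (child.text or "").strip()
    return None

# Made-up sample item that uses the all-caps spelling.
SAMPLE = """<item xmlns:wfw="http://wellformedweb.org/CommentAPI/">
  <title>Example post</title>
  <wfw:commentRSS>http://example.com/comments/1.xml</wfw:commentRSS>
</item>"""

item = ET.fromstring(SAMPLE)
print(find_comment_feed(item))  # http://example.com/comments/1.xml
```

Comparing the lowercased local name means either spelling is accepted without listing every variant.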

Phil Ringnalda:

Oh, and Luke? Pretty nice showing for a one-person unpaid hobby aggregator, mate

Thanks Phil :-)

I'm actually not that surprised SharpReader managed to get all these tests right; what does surprise me is that 9 out of 11 aggregators don't...

The Microsoft RSS Blog just announced that Vista will only accept RSS feeds that are well-formed XML.

I agree with Nick, who commented "This is the right thing to do, and I'm glad you're doing it - thanks". I'd like to add some emphasis to that statement though: "This is the right thing to do, and I'm glad you're doing it - thanks".

See, neither Nick's aggregator nor mine requires well-formed XML. This is because there are a lot of non-well-formed feeds out there, and the typical aggregator user doesn't care about XML specs; they just want to see the feed content. And if you require well-formed XML, something as small as a single unescaped "&" in a single post will invalidate the entire feed for as long as that post remains in the feed (which can be weeks, depending on the update frequency of the feed).
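
To make the "&" problem concrete, here is a small Python sketch using the standard library's strict XML parser. The two feed strings are made-up examples, and is_well_formed is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Made-up feeds: identical except for how the ampersand is written.
GOOD_FEED = "<rss><channel><item><title>Tom &amp; Jerry</title></item></channel></rss>"
BAD_FEED = "<rss><channel><item><title>Tom & Jerry</title></item></channel></rss>"

print(is_well_formed(GOOD_FEED))  # True
print(is_well_formed(BAD_FEED))   # False
```

The strict parser rejects the whole document, not just the offending item, which is why one bad post can take out an entire feed.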

Microsoft being more strict than us has the following positive results though:

  • It will most likely reduce the number of invalid feeds out there, making it easier for everyone to parse feeds.
  • Microsoft gets positive press for "doing the right thing".
  • For those feeds that still break, it may be another reason for people to look for alternative aggregators that can read that feed they're interested in.

Maybe there's an exception to Postel's law after all.

My hosting company ran into some issues this weekend that, besides causing a two day outage for both my blog and for sharpreader.net, also potentially caused some email to get lost. If you sent me anything on Friday, Saturday or Sunday, you may need to resend it - I'm not sure how much is lost for good and how much will be redelivered later :-(

Also, if anyone has any positive experiences with hosting a 50+ GB/month site at a reasonable price (I currently only pay $17/month), please let me know. This wasn't the first outage I've had, nor do I expect it to be the last. Maybe it's time to move on.

update: looks like my hosting company still has some issues to be worked out; I can't send any emails through Outlook because I'm getting some weird "503 valid RCPT command must precede DATA" error (though sending through the web-based interface seems to work fine), and for some reason my Movable Type install is not showing any of your comments. Comments have not been lost though, as I can see them through the MT admin interface, and I'm also getting the email notifications (I'm actually getting those twice now... weird) - I just need to figure out why it's not rebuilding the pages correctly...

update 2: email issue has been fixed - for some reason I had to use some Outlook setting that wasn't needed before their servers crashed... now all I need is to figure out what's going on with MT...

update 3: turned out that all comments were in a pending status and needed to be manually approved (the email notifications conveniently failed to mention this though). My MT-Blacklist was set up to only force moderation on old posts, but since the crash-recovery it now seems to force it on new ones as well. Oh well - I'm long overdue for an upgrade to MT 3.2 + SpamLookup anyway; guess it's time to stop procrastinating (but not tonight).

SharpReader 0.9.6.0 is now available at sharpreader.net.

Changes since the last version are:

  • Support for Atom 1.0.
  • Listening on port 5335 is now done on localhost only. This should prevent your firewall from showing an alert message.
  • You can now change the url of a feed through the feed properties pane.
  • If your subscriptions.xml file somehow gets corrupted, SharpReader now reads the previous version upon startup (previously, this would crash SharpReader on startup).
  • Fixed "The server committed an HTTP protocol violation." errors that are often thrown by WordPress feeds because of an invalid header. SharpReader now ignores these bad headers instead of throwing an exception.
  • Fixed a bug in the feed-discovery code.
  • Improved HTTP redirect handling.
  • Uninstaller now asks whether to remove all settings and cache files.
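
As a rough illustration of the localhost-only listening mentioned above, here is a Python sketch (SharpReader itself is a .NET application, so this is not its actual code). The function name open_local_listener is hypothetical; only the port number comes from the list above.

```python
import socket

def open_local_listener(port=5335):
    """Open a TCP listener that only accepts connections from this machine.

    Binding to 127.0.0.1 instead of 0.0.0.0 (all interfaces) means nothing
    outside the local machine can reach the port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen(1)
    return sock
```

Calling open_local_listener() returns a socket bound to the loopback address only; the caller is responsible for closing it when done.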

Over a month after I first noticed a large number of downloads of SharpReader0940, the requests are still pouring in. Since the file in question has long been removed, it does not create any major bandwidth issues anymore, but I'm still getting tired of all the entries in my error and access logs.

I therefore decided to go through my access-log once more in order to block the IP-addresses of the worst offenders. In doing so, I noticed a few interesting things:

  • While most IPs that request SharpReader0940 only do that, some also generate "real" traffic (requests for web-pages, images, etc.). This real traffic typically has a different UserAgent compared to the 0940 downloads.
  • An interesting mix of requests came from 129.33.49.251 (bi01p1.nc.us.ibm.com): a number of 0940 downloads, some regular page requests, and then some requests from UserAgent "w3search". What's IBM up to?
  • The 2 biggest offenders (64.233.172.35 and 64.233.173.80) each generated over 10,000 requests over a 9-day period. Both did also have a small amount of "real" traffic though. A search for "64.233.172 ip" indicates this traffic comes from Google's Web Accelerator. My guess is that the traffic is really from people who happen to have GWA installed on their system, but it's possible that GWA itself is at fault. After all, why would they download the same file over and over again? Doesn't GWA cache files?

Considering the requests are still coming in after a month, and the same (no longer existing) file is still being requested, I doubt this is a deliberate DoS attack - a would-be attacker would've either stopped by now, or switched to another file that is still there.
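
For anyone wanting to do a similar pass over their own logs, here is a minimal Python sketch of counting requests per IP for one file. It assumes Apache "combined" log format (this post doesn't say which format the server uses), and the function name worst_offenders, the sample lines, their addresses, and their timestamps are all made up for the example.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Captures the client IP and the
# requested path from lines like:
#   1.2.3.4 - - [date] "GET /file HTTP/1.1" 404 210 "referrer" "useragent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def worst_offenders(lines, path_fragment, min_requests):
    """Return (ip, count) pairs, busiest first, for IPs that requested a path
    containing path_fragment at least min_requests times."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and path_fragment.lower() in match.group("path").lower():
            counts[match.group("ip")] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= min_requests]

# Made-up log lines using documentation-only IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2006:10:00:00 +0000] "GET /SharpReader0940.zip HTTP/1.1" 404 210 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [01/Jan/2006:10:02:00 +0000] "GET /SharpReader0940.zip HTTP/1.1" 404 210 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [01/Jan/2006:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '198.51.100.7 - - [01/Jan/2006:10:03:00 +0000] "GET /SharpReader0940.zip HTTP/1.1" 404 210 "-" "Mozilla/5.0"',
]

print(worst_offenders(SAMPLE_LOG, "sharpreader0940", min_requests=2))
# [('192.0.2.10', 2)]
```

The threshold is a judgment call: set it too low and you block people who simply followed a stale link once.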

Since all these requests still have a Firefox/Mozilla UserAgent, I'm thinking it's probably some kind of Firefox plugin that's misbehaving; maybe a built-in test-case that due to some bug ends up downloading this file or something.

Anyway, I've included the now-blocked IPs below. If you find you can no longer access sharpreader.net, you may want to check if you're coming from one of them. Also, if anyone from IBM who may be able to shed some light on the traffic coming from their system is reading this (Sam? Mark? you there?;-), I'd appreciate the info.

12.226.103.254 209.82.25.186 24.32.44.157 66.91.159.234 69.241.33.69
128.104.179.188 211.28.74.17 24.7.71.132 67.153.2.226 69.245.58.57
128.32.14.222 212.76.39.230 24.79.132.97 67.167.238.80 69.39.97.204
128.82.79.37 213.168.5.18 59.163.68.130 67.180.141.4 70.110.253.24
130.216.226.102 216.136.66.212 59.92.97.34 67.79.29.151 70.25.201.2
136.152.160.76 216.177.114.26 60.234.147.237 68.104.146.137 70.25.27.110
138.89.49.107 216.222.3.2 63.203.23.109 68.11.35.102 70.28.185.4
141.151.200.193 216.78.207.5 64.185.5.10 68.120.154.154 70.92.30.56
141.151.218.105 216.87.87.233 64.231.252.117 68.188.226.231 71.108.20.247
141.153.177.191 217.196.241.113 64.233.172.35 68.189.139.61 71.242.90.69
151.205.246.211 218.186.218.137 64.233.173.80 68.226.5.17 71.3.202.117
193.1.184.254 24.119.235.155 65.115.96.254 68.37.62.16 72.14.192.5
199.33.173.215 24.119.251.146 65.121.32.222 68.38.86.88 72.14.194.28
201.141.27.217 24.147.255.39 65.199.60.87 68.39.182.246 80.163.170.70
201.141.31.126 24.165.242.237 65.204.229.11 68.45.42.253 81.211.226.196
201.141.67.135 24.168.251.203 65.211.194.101 68.71.120.209 82.36.128.78
201.252.2.226 24.173.2.239 65.223.100.46 68.95.238.138 83.147.147.130
202.156.2.234 24.188.78.43 65.24.182.180 69.110.157.71 83.228.4.48
203.110.151.78 24.214.164.6 65.25.141.70 69.162.41.147 83.76.250.181
206.47.125.163 24.219.210.118 65.30.161.189 69.163.42.1 83.78.169.125
207.190.68.6 24.238.181.140 65.93.134.195 69.197.252.213 84.230.85.27
207.215.1.60 24.24.110.55 66.177.132.89 69.211.12.213 84.92.82.201
209.241.220.227 24.251.148.61 66.223.205.191 69.237.149.177 85.81.76.9

My sharpreader.net domain ran out of bandwidth earlier today. Checking the access log showed that this was due to a huge number of downloads of SharpReader 0.9.4.0 - a version that's not even actively linked to from sharpreader.net.

All of these downloads have http://news.bbc.co.uk/rss/newsonline_world_edition/front_page/rss091.xml as the referrer, and all (except for a single MSIE request) have some version of Mozilla/5.0 (Firefox/Gecko) as the UserAgent. The BBC RSS feed does not contain any links to sharpreader0940.zip either.

I have no clue as to what is causing this. There are over 1000 unique IPs, some of which only request the file once, others multiple times at either random or fixed intervals (oftentimes as close as 2 minutes apart).

If anyone reading this has any idea on what may be going on here, please leave a comment below - thanks.

I just realized that I've accidentally let my sharpreader.net domain expire - oops!

I'm trying to get this resolved ASAP and get the site back up.

UPDATE: though my ISP/Registrar never actually responded to my support request, someone obviously did read it as the site is back up again. Would've been nice if they'd sent me a reminder email before this happened though...

Copyright © 2003, 2004 Luke Hutteman