Traffic Update

More than a month after I first noticed large numbers of downloads of SharpReader0940, the requests are still pouring in. Since the file in question was removed long ago, they no longer cause any major bandwidth issues, but I'm getting tired of all the entries in my error and access logs.

I therefore decided to go through my access log once more in order to block the IP addresses of the worst offenders. In doing so, I noticed a few interesting things:

  • While most IPs that request SharpReader0940 only do that, some also generate "real" traffic (requests for web pages, images, etc.). This real traffic typically has a different UserAgent than the 0940 downloads.
  • An interesting mix of requests came from (…): a number of 0940 downloads, some regular page requests, and then some requests from UserAgent "w3search". What's IBM up to?
  • The two biggest offenders (… and …) each generated over 10,000 requests over a nine-day period. Both also had a small amount of "real" traffic, though. A search for "64.233.172 ip" indicates this traffic comes from Google's Web Accelerator. My guess is that the traffic really comes from people who happen to have GWA installed on their systems, but it's possible that GWA itself is at fault. After all, why would it download the same file over and over again? Doesn't GWA cache files?
Considering the requests are still coming in after a month, and the same (no longer existing) file is still being requested, I doubt this is a deliberate DoS attack; a would-be attacker would have either stopped by now or switched to another file that still exists.
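The log pass described above, splitting each client IP's requests into 0940 downloads versus "real" traffic and noting which UserAgents show up for each, could be sketched in a few lines of Python. This is a minimal sketch, not the script actually used here: it assumes Apache's "combined" log format, and the function name `profile_clients` is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumes Apache's "combined" log format:
#   ip ident user [time] "METHOD PATH PROTOCOL" status bytes "referer" "user-agent"
# The referer/user-agent part is optional so "common" format lines still parse.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def profile_clients(lines, needle="SharpReader0940"):
    """Split each client IP's requests into requests whose path contains
    `needle` and all other ("real") requests, recording the User-Agents
    seen for each kind."""
    hits = Counter()   # ip -> number of requests for `needle`
    other = Counter()  # ip -> number of all other requests
    agents = defaultdict(lambda: {"needle": set(), "other": set()})
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that don't look like a log entry
        ip, path = m.group("ip"), m.group("path")
        agent = m.group("agent") or "-"
        if needle in path:
            hits[ip] += 1
            agents[ip]["needle"].add(agent)
        else:
            other[ip] += 1
            agents[ip]["other"].add(agent)
    return hits, other, agents
```

Passing it an open log file (whatever path your server writes to) and printing `hits.most_common(10)` next to `other[ip]` and the two UserAgent sets would list the worst offenders along with how much "real" traffic each also generated.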

Since all these requests still have a Firefox/Mozilla UserAgent, I'm thinking it's probably some kind of Firefox plugin that's misbehaving; maybe a built-in test case that, due to some bug, ends up downloading this file or something.

Anyway, I've included the now-blocked IPs below. If you find you can no longer access, you may want to check if you're coming from one of them. Also, if anyone from IBM who can shed some light on the traffic coming from their systems reads this (Sam? Mark? You there? ;-), I'd appreciate the info.


If they are blocked, how can they read the post? :)

Posted by orangy at July 28, 2005 6:53 AM

I did a scan for mentions of SharpReader within the IBM intranet. It found three occurrences, only one of which contained a link, and that one was to

I see nothing which could help explain what is going on here.

Posted by Sam Ruby at July 28, 2005 7:00 AM

I don't know if this is related or not, but the stats for the Spanning Partners feed show a strange pattern: a handful of SharpReader clients (~25) show up one day and then are gone for a few days, then are back for a few days, and so on. The number changes slightly, but is always between 20 and 25. At least that's what FeedBurner is reporting.

Posted by Charlie Wood at July 28, 2005 7:38 AM

> If they are blocked, how can they read the post?

blocked from, not

Posted by Luke Hutteman at July 28, 2005 9:13 AM

Sam: I'm not so much looking for links to SharpReader on your intranet (even in the unlikely event that they point directly to the, this still wouldn't explain where all the other traffic comes from), but your proxy logs may show which internal IPs are responsible for these requests. Knowing the actual system that (some of) the traffic originates from could help a lot in figuring out what application/plugin is causing it.

Posted by Luke Hutteman at July 28, 2005 9:19 AM

Hey Luke, I have Google Web Accelerator. When I visited the the first time after reading this article, I got the page but with no CSS. I hit refresh, and got 403/Forbidden. I hit refresh again, and got the right page with the styles. Weirdly enough, I could just keep hitting refresh and Firefox would just cycle through the above three over and over again even after clearing my cache. I then turned off GWA for and refreshed. It now shows the page with CSS as it should look, and refreshing has no effect (which is good). So I don't know what that means or if it helps you, but there you go.

Posted by Jack at July 28, 2005 7:01 PM

Were the uncached GWA requests from before you took it 404, or after? The zip *should* have been cached (though who knows what their caching algo looks like), but a 404 should not be cached (it's 404 Dunno, not 404 I Don't Want To Talk To You No More, You Empty Headed Animal Food Trough Wiper). A 410 Gone, on the other hand, should be cached, and Mark will fart in their general direction if they ever request something a second time after getting a 410.

Posted by Phil Ringnalda at July 28, 2005 8:28 PM

Why not contact Google with this info? Perhaps there is an unknown error in their tool which this info might help fix. It's a long shot, but better than nothing.

My stats package (wusage) logs every request, even if the file isn't actually downloaded. It also logs that file's bandwidth, even if it's not downloaded. For example, my CSS file is only downloaded once per user, but it's logged as downloaded once per page view. So the stats say the file takes up about 1 GB per month, but that's far from the truth.

Even considering the above, the many requests from those few IPs are really odd.

I googled "SharpReader0940" and found some links like these:

Posted by Bobby at July 28, 2005 8:51 PM
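Bobby's observation can be checked against the raw access log. A browser that already has the CSS file cached typically revalidates it with a conditional request, which the server answers with 304 Not Modified and no body; Apache's `%b` field logs "-" in that case. The sketch below assumes Apache's common or combined log format and compares the number of requests per path with the bytes actually sent. `bytes_by_path` is a hypothetical helper, and whether wusage itself counts this way is an assumption, not something the comment establishes.

```python
import re
from collections import Counter

# Assumes Apache common/combined format. The field after the status code is
# %b, which Apache logs as "-" when no body bytes are sent (e.g. for a 304).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def bytes_by_path(lines):
    """Return two Counters: requests per path, and body bytes actually sent per path."""
    requests = Counter()
    sent = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip malformed lines
        path = m.group("path")
        requests[path] += 1
        if m.group("bytes") != "-":
            sent[path] += int(m.group("bytes"))
    return requests, sent
```

On a log like the one described, `requests` for the CSS file would grow with every page view, while `sent` would grow only with the full 200 responses.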

Phil: I agree that a 404 should not be cached, but I'd expect GWA to be smart enough to realize that if the file wasn't there a few seconds ago, it's probably not necessary to try to download it again now, especially after it has already received a couple thousand 404s...

Posted by Luke Hutteman at July 28, 2005 10:03 PM

Hey, how's it going?

Regarding the GWA issue, I found this on their FAQ (…):

4. Does Google Web Accelerator speed up all web pages?

No, it doesn't. For security reasons, Google Web Accelerator won't speed up pages encrypted with the HTTPS: protocol (such as bank records pages). Also, Google Web Accelerator only speeds up web pages, not large data downloads such as MP3 and streaming video files.

That would explain why the GWA is not caching the file as might be expected from a "caching" technology.

2. How does Google Web Accelerator work?

Google Web Accelerator uses various strategies to make your web pages load faster, including:

  • Sending your page requests through Google machines dedicated to handling Google Web Accelerator traffic.
  • Storing copies of frequently looked at pages to make them quickly accessible.
  • Downloading only the updates if a web page has changed slightly since you last viewed it.
  • Prefetching certain pages onto your computer in advance.
  • Managing your Internet connection to reduce delays.
  • Compressing data before sending it to your computer.

Posted by Craig Spargo at July 29, 2005 2:57 AM

Hey, looking at the page you mentioned, I notice that the headers say Cache-Control: private, so this wouldn't be served out of the cache from Google Web Accelerator. So I don't think that the GWA is involved in this (not 100% sure, but that's my strong guess).

Best wishes,

Posted by GoogleGuy at July 29, 2005 2:54 PM
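Phil's 404-versus-410 distinction and GoogleGuy's point about `Cache-Control: private` both come down to the HTTP/1.1 caching rules of the time (RFC 2616). As a rough sketch only, and not a description of how GWA actually decides what to cache, a shared cache's "may I store this?" check could look like the following; the function name is hypothetical and the simplifications are listed in the docstring.

```python
# Status codes that RFC 2616, section 13.4, says a cache MAY store and reuse
# by default (subject to expiration). 410 is in this list; 404 is not.
CACHEABLE_BY_DEFAULT = {200, 203, 206, 300, 301, 410}


def shared_cache_may_store(status, cache_control=""):
    """Rough check of whether a shared cache (e.g. a caching proxy) may
    store a response, given its status code and Cache-Control header.

    Simplifications: treats any `private` directive as covering the whole
    response (RFC 2616 also allows `private="field-name"`), ignores the
    Range-support requirement for 206, and ignores headers that explicitly
    allow caching of other status codes."""
    directives = {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    # no-store forbids storing at all; private forbids shared caches (14.9.1).
    if "no-store" in directives or "private" in directives:
        return False
    return status in CACHEABLE_BY_DEFAULT
```

Under these rules a proxy may keep a 410 Gone for a removed file but not a plain 404, and a page marked `private` should not be stored in a shared cache at all. Later HTTP specifications (RFC 7231) do list 404 as cacheable by default, so this reflects the 2005-era rules.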

Perhaps it's referenced from a search engine's cached page?

Posted by poooooof at September 9, 2005 2:22 PM

Maybe I did something wrong, I'm not sure. But I have e-mailed Luke and have not received any kind of response. I am finally UN-INSTALLING SharpReader. I realize that it is free, and not necessarily free of bugs, but I cannot get the damn thing to STOP opening. It is 'frozen' on a feed, and I cannot open ANY other feed, or any other window on my computer for that matter, without it opening... This has been going on for some time now. I have uninstalled and reinstalled it at least twice. Now I am removing EVERYTHING from the computer.

Anyone experienced this?? I try to close the program, only to have it open moments later. It continues to show the same feed, unread, over and over and over...


Posted by Christian Connett at September 13, 2005 11:11 AM
This discussion has been closed. If you wish to contact me about this post, you can do so by email.