Traffic Update

Over a month after I first noticed large numbers of downloads of SharpReader0940, the requests are still pouring in. Since the file in question has long been removed, it no longer creates any major bandwidth issues, but I'm still getting tired of all the entries in my error and access logs.

I therefore decided to go through my access-log once more in order to block the IP-addresses of the worst offenders. In doing so, I noticed a few interesting things:

  • While most IPs that request SharpReader0940 only do that, some also generate "real" traffic (requests for web pages, images, etc.). This real traffic typically has a different UserAgent than the 0940 downloads.
  • An interesting mix of requests came from 129.33.49.251 (bi01p1.nc.us.ibm.com): a number of 0940 downloads, some regular page requests, and then some requests from UserAgent "w3search". What's IBM up to?
  • The 2 biggest offenders (64.233.172.35 and 64.233.173.80) each generated over 10,000 requests over a 9-day period. Both also had a small amount of "real" traffic, though. A search for "64.233.172 ip" indicates this traffic comes from Google's Web Accelerator. My guess is that the traffic is really from people who happen to have GWA installed on their system, but it's possible that GWA itself is at fault. After all, why would they download the same file over and over again? Doesn't GWA cache files?
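
For anyone who wants to run a similar pass over their own logs, here is a minimal sketch in Python. It assumes the access log is in Apache's "combined" format, which is a guess, since the post does not say which server or log format was involved. The names `tally` and `worst_offenders` and the default threshold are made up for illustration. It counts, per IP, the requests for the missing file versus everything else, and records which UserAgents showed up in each group.

```python
import re
from collections import Counter, defaultdict

# Assumption: Apache "combined" log format, e.g.
# IP - - [date] "GET /path HTTP/1.1" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally(lines, needle="SharpReader0940"):
    """Split each IP's requests into hits on `needle` and all other traffic."""
    target_hits = Counter()
    other_hits = Counter()
    # For each IP, the set of UserAgents seen for each kind of request.
    agents = defaultdict(lambda: {"target": set(), "other": set()})
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        ip = match["ip"]
        if needle in match["request"]:
            target_hits[ip] += 1
            agents[ip]["target"].add(match["agent"])
        else:
            other_hits[ip] += 1
            agents[ip]["other"].add(match["agent"])
    return target_hits, other_hits, agents


def worst_offenders(target_hits, threshold=1000):
    """Return IPs with at least `threshold` hits on the file, busiest first."""
    return [ip for ip, count in target_hits.most_common() if count >= threshold]
```

Passing an open log file to `tally` and then the first result to `worst_offenders` would give a candidate block list. Comparing `agents[ip]["target"]` with `agents[ip]["other"]` is one way to spot the UserAgent mismatch described above.
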
Considering the requests are still coming in after a month, and the same (no longer existing) file is still being requested, I doubt this is a deliberate DoS attack - a would-be attacker would've either stopped by now, or switched to another file that is still there.

Since all these requests still have a Firefox/Mozilla UserAgent, I'm thinking it's probably some kind of Firefox plugin that's misbehaving; maybe a built-in test case that, due to some bug, ends up downloading this file or something.

Anyway, I've included the now-blocked IPs below. If you find you can no longer access sharpreader.net, you may want to check if you're coming from one of them. Also, if anyone from IBM who may be able to shed some light on the traffic coming from their system is reading this (Sam? Mark? you there?;-), I'd appreciate the info.

12.226.103.254 209.82.25.186 24.32.44.157 66.91.159.234 69.241.33.69
128.104.179.188 211.28.74.17 24.7.71.132 67.153.2.226 69.245.58.57
128.32.14.222 212.76.39.230 24.79.132.97 67.167.238.80 69.39.97.204
128.82.79.37 213.168.5.18 59.163.68.130 67.180.141.4 70.110.253.24
130.216.226.102 216.136.66.212 59.92.97.34 67.79.29.151 70.25.201.2
136.152.160.76 216.177.114.26 60.234.147.237 68.104.146.137 70.25.27.110
138.89.49.107 216.222.3.2 63.203.23.109 68.11.35.102 70.28.185.4
141.151.200.193 216.78.207.5 64.185.5.10 68.120.154.154 70.92.30.56
141.151.218.105 216.87.87.233 64.231.252.117 68.188.226.231 71.108.20.247
141.153.177.191 217.196.241.113 64.233.172.35 68.189.139.61 71.242.90.69
151.205.246.211 218.186.218.137 64.233.173.80 68.226.5.17 71.3.202.117
193.1.184.254 24.119.235.155 65.115.96.254 68.37.62.16 72.14.192.5
199.33.173.215 24.119.251.146 65.121.32.222 68.38.86.88 72.14.194.28
201.141.27.217 24.147.255.39 65.199.60.87 68.39.182.246 80.163.170.70
201.141.31.126 24.165.242.237 65.204.229.11 68.45.42.253 81.211.226.196
201.141.67.135 24.168.251.203 65.211.194.101 68.71.120.209 82.36.128.78
201.252.2.226 24.173.2.239 65.223.100.46 68.95.238.138 83.147.147.130
202.156.2.234 24.188.78.43 65.24.182.180 69.110.157.71 83.228.4.48
203.110.151.78 24.214.164.6 65.25.141.70 69.162.41.147 83.76.250.181
206.47.125.163 24.219.210.118 65.30.161.189 69.163.42.1 83.78.169.125
207.190.68.6 24.238.181.140 65.93.134.195 69.197.252.213 84.230.85.27
207.215.1.60 24.24.110.55 66.177.132.89 69.211.12.213 84.92.82.201
209.241.220.227 24.251.148.61 66.223.205.191 69.237.149.177 85.81.76.9

Comments

If they are blocked, how can they read the post? :)

Posted by orangy at July 28, 2005 6:53 AM

I did a scan for mentions of SharpReader within the IBM intranet. It found three occurrences, only one of which contained a link, and that one was to http://www.sharpreader.net/.

I see nothing which could help explain what is going on here.

Posted by Sam Ruby at July 28, 2005 7:00 AM

I don't know if this is related or not, but the stats for the Spanning Partners feed show a strange behavior: a handful of SharpReader clients (~25) show up one day and then are gone for a few days, then are back for a few days, etc. The number changes slightly, but is always between 20 and 25. At least that's what FeedBurner is reporting.

Posted by Charlie Wood at July 28, 2005 7:38 AM

> If they are blocked, how can they read the post?

blocked from sharpreader.net, not hutteman.com

Posted by Luke Hutteman at July 28, 2005 9:13 AM

Sam: I'm not so much looking for links to SharpReader on your intranet (even in the unlikely event that they point directly to the 0940.zip, this still wouldn't explain where all the other traffic comes from), but your proxy logs may show which internal IPs are responsible for these requests. Knowing the actual system that (some of) the traffic originates from could help a lot in figuring out what application/plugin is causing it.

Posted by Luke Hutteman at July 28, 2005 9:19 AM

Hey Luke, I have Google Web Accelerator. When I visited sharpreader.net for the first time after reading this article, I got the page but with no CSS. I hit refresh, and got 403/Forbidden. I hit refresh again, and got the right page with the styles. Weirdly enough, I could just keep hitting refresh and Firefox would just cycle through the above three over and over again, even after clearing my cache. I then turned off GWA for sharpreader.net and refreshed. It now shows the page with CSS as it should look, and refreshing has no effect (which is good). So I don't know what that means or if it helps you, but there you go.

Posted by Jack at July 28, 2005 7:01 PM

Were the uncached GWA requests from before you took it 404, or after? The zip *should* have been cached (though who knows what their caching algo looks like), but a 404 should not be cached (it's 404 Dunno, not 404 I Don't Want To Talk To You No More, You Empty Headed Animal Food Trough Wiper). A 410 Gone, on the other hand, should be cached, and Mark will fart in their general direction if they ever request something a second time after getting a 410.

Posted by Phil Ringnalda at July 28, 2005 8:28 PM

Why not contact Google with this info? Perhaps there is an unknown error in their tool which this info might help to fix? It's a long shot, but better than nothing.

My stats package (wusage) logs every request even if the file isn't downloaded. It also logs that file's bandwidth, even if it's not downloaded. So for example, my CSS file only downloads once per user, but is logged as downloaded once per page view. So it says the file takes up about 1 GB per month, but that's far from the truth.

Even considering the above, the many requests from those few IPs are really odd.

I googled "SharpReader0940" and found some links like these:
http://www.tweakers.net/meuktracker/5654?&niv=1
http://msmvps.com/xpditif/archive/2004/02/29/3417.aspx

Posted by Bobby at July 28, 2005 8:51 PM

Phil: I agree that a 404 should not be cached, but I'd expect GWA to be smart enough to realize that if the file wasn't there a few seconds ago, it's probably not necessary to try and download it again now - especially after they've already received a couple thousand 404's...

Posted by Luke Hutteman at July 28, 2005 10:03 PM

Hey, how's it going?

Regarding the GWA issue, I found this on their FAQ (http://webaccelerator.google.com/support.html):

4. Does Google Web Accelerator speed up all web pages?

No, it doesn't. For security reasons, Google Web Accelerator won't speed up pages encrypted with the HTTPS: protocol (such as bank records pages). Also, Google Web Accelerator only speeds up web pages, not large data downloads such as MP3 and streaming video files.

That would explain why the GWA is not caching the file as might be expected from a "caching" technology.

2. How does Google Web Accelerator work?

Google Web Accelerator uses various strategies to make your web pages load faster, including:

  • Sending your page requests through Google machines dedicated to handling Google Web Accelerator traffic.
  • Storing copies of frequently looked at pages to make them quickly accessible.
  • Downloading only the updates if a web page has changed slightly since you last viewed it.
  • Prefetching certain pages onto your computer in advance.
  • Managing your Internet connection to reduce delays.
  • Compressing data before sending it to your computer.

Posted by Craig Spargo at July 29, 2005 2:57 AM

Hey, looking at the page you mentioned, I notice that the headers say Cache-Control: private, so this wouldn't be served out of the cache from Google Web Accelerator. So I don't think that the GWA is involved in this (not 100% sure, but that's my strong guess).

Best wishes,
GoogleGuy

Posted by GoogleGuy at July 29, 2005 2:54 PM

Perhaps it's a reference from a search engine's cached page?

Posted by poooooof at September 9, 2005 2:22 PM

Maybe I did something wrong, not sure. But I have e-mailed Luke, and have not received any kind of response. I am finally UN-INSTALLING SharpReader. I realize that it is free, and not necessarily free of bugs, but I cannot get the damn thing to STOP opening. It is 'frozen' on a feed, and I cannot have ANY other feed, or window on my computer for that matter, without it opening... This has been going on now for some time. I have uninstalled and reinstalled, at least twice. Now I am removing EVERYTHING from the computer.

Anyone experienced this?? I try to close the program, only to have it open moments later. It continues to show the same feed, unread, over and over and over...

AHH!

Posted by Christian Connett at September 13, 2005 11:11 AM