Wednesday, July 27, 2005 11:11 PM
Over a month after I first noticed a large number of downloads of SharpReader0940, the requests are still pouring in. Since the file in question has long been removed, it no longer creates any major bandwidth issues, but I'm still getting tired of all the entries in my error and access logs.
I therefore decided to go through my access-log once more in order to block the IP-addresses of the worst offenders. In doing so, I noticed a few interesting things:
- While most IPs that request SharpReader0940 only do that, some also generate "real" traffic (requests for web pages, images, etc.). This real traffic typically has a different UserAgent from the 0940 downloads.
- An interesting mix of requests came from 18.104.22.168 (bi01p1.nc.us.ibm.com): a number of 0940 downloads, some regular page requests, and then some requests from UserAgent "w3search". What's IBM up to?
- The 2 biggest offenders (22.214.171.124 and 126.96.36.199) each generated over 10,000 requests over a 9-day period, though both also had a small amount of "real" traffic. A search for "64.233.172 ip" indicates this traffic comes from Google's Web Accelerator. My guess is that the traffic is really from people who happen to have GWA installed on their system, but it's possible that GWA itself is at fault. After all, why would they download the same file over and over again? Doesn't GWA cache files?
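The kind of log pass described above can be sketched roughly as follows. This is only an illustration, not the script actually used: it assumes the access log is in Apache's "combined" format, and the function names `count_requests_by_ip` and `worst_offenders` are made up for this example.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_requests_by_ip(lines, needle):
    """Count, per client IP, the requests whose path contains `needle`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that do not parse are skipped rather than guessed at.
        if match and needle in match.group("path"):
            counts[match.group("ip")] += 1
    return counts


def worst_offenders(lines, needle, threshold):
    """Return (ip, count) pairs with at least `threshold` hits, busiest first."""
    counts = count_requests_by_ip(lines, needle)
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]
```

Something like `worst_offenders(open("access.log"), "SharpReader0940", 1000)` would then list the candidates for blocking; the file name and the threshold of 1000 are placeholders.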
Considering the requests are still coming in after a month, and the same (no longer existing) file is still being requested, I doubt this is a deliberate DoS attack - a would-be attacker would've either stopped by now, or switched to another file that is still there.
Since all these requests still have a Firefox/Mozilla UserAgent, I'm thinking it's probably some kind of Firefox plugin that's misbehaving; maybe a built-in test case that, due to some bug, ends up downloading this file or something.
Anyway, I've included the now-blocked IPs below. If you find you can no longer access sharpreader.net, you may want to check if you're coming from one of them. Also, if anyone from IBM reads this and can shed some light on the traffic coming from their system (Sam? Mark? you there?;-), I'd appreciate the info.
|188.8.131.52 ||184.108.40.206 ||220.127.116.11 ||18.104.22.168 ||22.214.171.124 |
|126.96.36.199 ||188.8.131.52 ||184.108.40.206 ||220.127.116.11 ||18.104.22.168 |
|22.214.171.124 ||126.96.36.199 ||188.8.131.52 ||184.108.40.206 ||220.127.116.11 |
|18.104.22.168 ||22.214.171.124 ||126.96.36.199 ||188.8.131.52 ||184.108.40.206 |
|220.127.116.11 ||18.104.22.168 ||22.214.171.124 ||126.96.36.199 ||188.8.131.52 |
|184.108.40.206 ||220.127.116.11 ||18.104.22.168 ||22.214.171.124 ||126.96.36.199 |
|188.8.131.52 ||184.108.40.206 ||220.127.116.11 ||18.104.22.168 ||22.214.171.124 |
|126.96.36.199 ||188.8.131.52 ||184.108.40.206 ||220.127.116.11 ||18.104.22.168 |
|22.214.171.124 ||126.96.36.199 ||188.8.131.52 ||184.108.40.206 ||220.127.116.11 |
|18.104.22.168 ||22.214.171.124 ||126.96.36.199 ||188.8.131.52 ||184.108.40.206 |
|220.127.116.11 ||18.104.22.168 ||22.214.171.124 ||126.96.36.199 ||188.8.131.52 |
|184.108.40.206 ||220.127.116.11 ||18.104.22.168 ||22.214.171.124 ||126.96.36.199 |
|188.8.131.52 ||184.108.40.206 ||220.127.116.11 ||18.104.22.168 ||22.214.171.124 |
|126.96.36.199 ||188.8.131.52 ||184.108.40.206 ||220.127.116.11 ||18.104.22.168 |
|22.214.171.124 ||126.96.36.199 ||188.8.131.52 ||184.108.40.206 ||220.127.116.11 |
|18.104.22.168 ||22.214.171.124 ||126.96.36.199 ||188.8.131.52 ||184.108.40.206 |
|220.127.116.11 ||18.104.22.168 ||22.214.171.124 ||126.96.36.199 ||188.8.131.52 |
|184.108.40.206 ||220.127.116.11 ||18.104.22.168 ||22.214.171.124 ||126.96.36.199 |
|188.8.131.52 ||184.108.40.206 ||220.127.116.11 ||18.104.22.168 ||22.214.171.124 |
|126.96.36.199 ||188.8.131.52 ||184.108.40.206 ||220.127.116.11 ||18.104.22.168 |
|22.214.171.124 ||126.96.36.199 ||188.8.131.52 ||184.108.40.206 ||220.127.116.11 |
|18.104.22.168 ||22.214.171.124 ||126.96.36.199 ||188.8.131.52 ||184.108.40.206 |
|220.127.116.11 ||18.104.22.168 ||22.214.171.124 ||126.96.36.199 ||188.8.131.52 |
If they are blocked, how can they read the post? :)
I did a scan for mentions of SharpReader within the IBM intranet. It found three occurrences, only one of which contained a link, and that one was to http://www.sharpreader.net/.
I see nothing which could help explain what is going on here.
I don't know if this is related or not, but the stats for the feed for Spanning Partners show a strange behavior: a handful of SharpReader clients show up one day (~25) and then are gone for a few days, then are back for a few days, etc. The number changes slightly, but is always between 20-25. At least that's what FeedBurner is reporting.
> If they are blocked, how can they read the post?
blocked from sharpreader.net, not hutteman.com
Sam: I'm not so much looking for links to SharpReader on your intranet (even in the unlikely event that they point directly to the 0940.zip, this still wouldn't explain where all the other traffic comes from), but your proxy logs may show which internal IPs are responsible for these requests. Knowing the actual system that (some of) the traffic originates from could help a lot in figuring out what application/plugin is causing it.
Hey Luke, I have Google Web Accelerator. When I visited sharpreader.net for the first time after reading this article, I got the page but with no CSS. I hit refresh, and got 403/Forbidden. I hit refresh again, and got the right page with the styles. Weirdly enough, I could just keep hitting refresh and Firefox would cycle through those three results over and over again, even after clearing my cache. I then turned off GWA for sharpreader.net and refreshed. It now shows the page with CSS as it should look, and refreshing has no effect (which is good). So I don't know what that means or if it helps you, but there you go.
Were the uncached GWA requests from before you took it 404, or after? The zip *should* have been cached (though who knows what their caching algo looks like), but a 404 should not be cached (it's 404 Dunno, not 404 I Don't Want To Talk To You No More, You Empty Headed Animal Food Trough Wiper). A 410 Gone, on the other hand, should be cached, and Mark will fart in their general direction if they ever request something a second time after getting a 410.
Why not contact Google with this info? Perhaps there is an unknown error in their tool which this info might help to fix? It's a long shot but better than nothing.
In my stats (wusage) it logs every request even if the file isn't downloaded. It also logs that file's bandwidth, even if it's not downloaded. So for example, my CSS file is only downloaded once per user, but is logged as downloaded once per page view. So it says the file takes up about 1 GB per month, but that's far from the truth.
Even considering the above, the many requests from those few IPs are really odd.
I googled "SharpReader0940" and found some links like these:
Phil: I agree that a 404 should not be cached, but I'd expect GWA to be smart enough to realize that if the file wasn't there a few seconds ago, it's probably not necessary to try and download it again now - especially after they've already received a couple thousand 404's...
Hey, how's it going?
Regarding the GWA issue, I found this on their FAQ (http://webaccelerator.google.com/support.html):
4. Does Google Web Accelerator speed up all web pages?
No, it doesn't. For security reasons, Google Web Accelerator won't speed up pages encrypted with the HTTPS: protocol (such as bank records pages). Also, Google Web Accelerator only speeds up web pages, not large data downloads such as MP3 and streaming video files.
That would explain why the GWA is not caching the file as might be expected from a "caching" technology.
2. How does Google Web Accelerator work?
Google Web Accelerator uses various strategies to make your web pages load faster, including:
- Sending your page requests through Google machines dedicated to handling Google Web Accelerator traffic.
- Storing copies of frequently looked at pages to make them quickly accessible.
- Downloading only the updates if a web page has changed slightly since you last viewed it.
- Prefetching certain pages onto your computer in advance.
- Managing your Internet connection to reduce delays.
- Compressing data before sending it to your computer.
Hey, looking at the page you mentioned, I notice that the headers say Cache-Control: private, so this wouldn't be served out of the cache from Google Web Accelerator. So I don't think that the GWA is involved in this (not 100% sure, but that's my strong guess).
Perhaps it's referenced from a search engine's cached page?
Maybe I did something wrong, not sure. But I have e-mailed Luke, and have not received any kind of response. I am finally UN-INSTALLING SharpReader. I realize that it is free, and not necessarily free of bugs, but I cannot get the damn thing to STOP opening. It is 'frozen' on a feed, and I cannot have ANY other feed, or window on my computer for that matter, without it opening... This has been going on now for some time. I have uninstalled and reinstalled, at least twice. Now I am removing EVERYTHING from the computer.
Anyone experienced this?? I try to close the program, only to have it open moments later. It continues to show the same feed, unread, over and over and over...