public virtual MemoryStream: Traffic Update

Wednesday, July 27, 2005 11:11 PM

Traffic Update

More than a month after I first noticed large numbers of downloads of SharpReader0940, the requests are still pouring in. Since the file in question was removed long ago, it no longer causes any major bandwidth issues, but I'm getting tired of all the entries in my error and access logs.

I therefore decided to go through my access log once more to block the IP addresses of the worst offenders. In doing so, I noticed a few interesting things:

While most IPs that request SharpReader0940 do only that, some also generate "real" traffic (requests for web pages, images, etc.). This real traffic typically has a different UserAgent than the 0940 downloads.
An interesting mix of requests came from 129.33.49.251 (bi01p1.nc.us.ibm.com): a number of 0940 downloads, some regular page requests, and then some requests from UserAgent "w3search". What's IBM up to?
The 2 biggest offenders (64.233.172.35 and 64.233.173.80) each generated over 10,000 requests over a 9-day period. Both also generated a small amount of "real" traffic, though. A search for "64.233.172 ip" indicates this traffic comes from Google's Web Accelerator (GWA). My guess is that the traffic really comes from people who happen to have GWA installed on their systems, but it's possible that GWA itself is at fault. After all, why would it download the same file over and over again? Doesn't GWA cache files?
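For anyone wanting to do a similar pass over their own logs, here is a minimal Python sketch of counting requests per IP for a single file. It assumes Apache's "combined" log format and a hypothetical path containing "SharpReader0940"; the post doesn't say what format the logs were in or how the counting was actually done, and the default threshold is an arbitrary illustrative value.

```python
import re
from collections import Counter

# Apache combined log format (an assumption, not confirmed by the post):
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+'
)

def worst_offenders(lines, needle="SharpReader0940", threshold=1000):
    """Return (ip, count) pairs for IPs that requested a path containing
    `needle` at least `threshold` times, most frequent first."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and needle in m.group("path"):
            counts[m.group("ip")] += 1
    # most_common() sorts by count, highest first.
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]
```

Passing an open log file works directly, since iterating over a file yields its lines: `worst_offenders(open("access_log"))`, where the file name is hypothetical.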

Considering the requests are still coming in after a month, and the same (no longer existing) file is still being requested, I doubt this is a deliberate DoS attack: a would-be attacker would have either stopped by now or switched to a file that still exists.

Since all these requests still have a Firefox/Mozilla UserAgent, I suspect some kind of misbehaving Firefox plugin; maybe a built-in test case that, due to some bug, ends up downloading this file.

Anyway, I've included the now-blocked IPs below. If you find you can no longer access sharpreader.net, you may want to check whether you're coming from one of them. Also, if anyone from IBM who can shed some light on the traffic coming from their systems reads this (Sam? Mark? you there? ;-), I'd appreciate the info.

12.226.103.254	209.82.25.186	24.32.44.157	66.91.159.234	69.241.33.69
128.104.179.188	211.28.74.17	24.7.71.132	67.153.2.226	69.245.58.57
128.32.14.222	212.76.39.230	24.79.132.97	67.167.238.80	69.39.97.204
128.82.79.37	213.168.5.18	59.163.68.130	67.180.141.4	70.110.253.24
130.216.226.102	216.136.66.212	59.92.97.34	67.79.29.151	70.25.201.2
136.152.160.76	216.177.114.26	60.234.147.237	68.104.146.137	70.25.27.110
138.89.49.107	216.222.3.2	63.203.23.109	68.11.35.102	70.28.185.4
141.151.200.193	216.78.207.5	64.185.5.10	68.120.154.154	70.92.30.56
141.151.218.105	216.87.87.233	64.231.252.117	68.188.226.231	71.108.20.247
141.153.177.191	217.196.241.113	64.233.172.35	68.189.139.61	71.242.90.69
151.205.246.211	218.186.218.137	64.233.173.80	68.226.5.17	71.3.202.117
193.1.184.254	24.119.235.155	65.115.96.254	68.37.62.16	72.14.192.5
199.33.173.215	24.119.251.146	65.121.32.222	68.38.86.88	72.14.194.28
201.141.27.217	24.147.255.39	65.199.60.87	68.39.182.246	80.163.170.70
201.141.31.126	24.165.242.237	65.204.229.11	68.45.42.253	81.211.226.196
201.141.67.135	24.168.251.203	65.211.194.101	68.71.120.209	82.36.128.78
201.252.2.226	24.173.2.239	65.223.100.46	68.95.238.138	83.147.147.130
202.156.2.234	24.188.78.43	65.24.182.180	69.110.157.71	83.228.4.48
203.110.151.78	24.214.164.6	65.25.141.70	69.162.41.147	83.76.250.181
206.47.125.163	24.219.210.118	65.30.161.189	69.163.42.1	83.78.169.125
207.190.68.6	24.238.181.140	65.93.134.195	69.197.252.213	84.230.85.27
207.215.1.60	24.24.110.55	66.177.132.89	69.211.12.213	84.92.82.201
209.241.220.227	24.251.148.61	66.223.205.191	69.237.149.177	85.81.76.9
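If the site runs Apache (an assumption; the post doesn't say how the blocking was done), a list like this can be turned into access rules mechanically. The sketch below emits Apache 2.0/2.2-style `Order`/`Allow`/`Deny` directives; the function name `deny_rules` is hypothetical, and Apache 2.4 would express the same thing with `Require not ip` inside a `<RequireAll>` block instead.

```python
import ipaddress

def deny_rules(ips):
    """Build an Apache 2.0/2.2-style access block that allows everyone
    except the listed IPv4 addresses (hypothetical helper)."""
    # IPv4Address() rejects malformed entries with a ValueError; the set
    # removes duplicates, and sorting gives a stable, numeric order.
    addrs = sorted({ipaddress.IPv4Address(ip.strip()) for ip in ips if ip.strip()})
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {addr}" for addr in addrs]
    return "\n".join(lines)
```

Because the table above is whitespace-separated, `deny_rules(table_text.split())` would cover the whole list, where `table_text` is a hypothetical string holding the table.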


Comments

If they are blocked, how can they read the post? :)

Posted by orangy at July 28, 2005 6:53 AM

I did a scan for mentions of SharpReader within the IBM intranet. It found three occurrences, only one of which contained a link, and that one was to http://www.sharpreader.net/.

I see nothing which could help explain what is going on here.

Posted by Sam Ruby at July 28, 2005 7:00 AM

I don't know if this is related, but the stats for the Spanning Partners feed show strange behavior: a handful of SharpReader clients (~25) show up one day, are gone for a few days, then are back for a few days, etc. The number changes slightly, but is always between 20 and 25. At least that's what FeedBurner is reporting.

Posted by Charlie Wood at July 28, 2005 7:38 AM

> If they are blocked, how can they read the post?

blocked from sharpreader.net, not hutteman.com

Posted by Luke Hutteman at July 28, 2005 9:13 AM

Sam: I'm not so much looking for links to SharpReader on your intranet (even in the unlikely event that they point directly to the 0940.zip, this still wouldn't explain where all the other traffic comes from), but your proxy logs may show which internal IPs are responsible for these requests. Knowing the actual system that (some of) the traffic originates from could help a lot in figuring out what application/plugin is causing it.

Posted by Luke Hutteman at July 28, 2005 9:19 AM

Hey Luke, I have Google Web Accelerator. When I visited sharpreader.net for the first time after reading this article, I got the page but with no CSS. I hit refresh and got 403 Forbidden. I hit refresh again and got the right page with the styles. Weirdly enough, I could keep hitting refresh and Firefox would cycle through those three responses over and over again, even after I cleared my cache. I then turned off GWA for sharpreader.net and refreshed. It now shows the page with CSS as it should look, and refreshing has no effect (which is good). I don't know what that means or if it helps you, but there you go.

Posted by Jack at July 28, 2005 7:01 PM

Were the uncached GWA requests from before you took it 404, or after? The zip *should* have been cached (though who knows what their caching algo looks like), but a 404 should not be cached (it's 404 Dunno, not 404 I Don't Want To Talk To You No More, You Empty Headed Animal Food Trough Wiper). A 410 Gone, on the other hand, should be cached, and Mark will fart in their general direction if they ever request something a second time after getting a 410.
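Phil's rule of thumb matches RFC 2616 section 13.4, the HTTP/1.1 spec of the time: 200, 203, 206, 300, 301 and 410 responses may be cached by default, while any other status, 404 included, needs explicit freshness information such as an `Expires` header or a `max-age` directive. (The later RFC 7231 added 404 to the cacheable-by-default list.) A simplified Python sketch of that check; the helper name is hypothetical, and it ignores `Vary`, validators and request-side directives:

```python
from typing import Mapping, Optional

# Status codes RFC 2616 section 13.4 lets a cache store and reuse by default.
DEFAULT_CACHEABLE = frozenset({200, 203, 206, 300, 301, 410})

def reusable_by_default(status: int, headers: Optional[Mapping[str, str]] = None) -> bool:
    """Simplified RFC 2616 check: may a cache reuse this response later?"""
    h = {k.lower(): v.lower() for k, v in (headers or {}).items()}
    cache_control = h.get("cache-control", "")
    if "no-store" in cache_control:
        return False  # caching explicitly forbidden, whatever the status
    if status in DEFAULT_CACHEABLE:
        return True   # e.g. 410 Gone needs no extra headers
    # Any other status (404 Not Found included) needs explicit freshness info.
    return "expires" in h or "max-age" in cache_control
```

So a bare 404 comes back `False`, while the same 404 sent with `Cache-Control: max-age=60` comes back `True`.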

Posted by Phil Ringnalda at July 28, 2005 8:28 PM

Why not contact Google with this info? Perhaps there is an unknown error in their tool that this info might help fix. It's a long shot, but better than nothing.

My stats tool (wusage) logs every request even if the file isn't downloaded. It also counts that file's bandwidth, even if it's not downloaded. For example, my CSS file only downloads once per user, but is logged as downloaded once per page view. So the stats say the file takes up about 1 GB per month, but that's far from the truth.
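Assuming the extra CSS hits are conditional requests answered with `304 Not Modified` (which carry no body, so Apache's `%b` field logs the size as `-`), summing the logged byte counts gives a much lower figure than charging the full file size for every hit. A minimal Python sketch comparing the two, again assuming Apache's common/combined log format; `bandwidth` and the `file_sizes` lookup are hypothetical:

```python
import re

# host ident user [time] "METHOD path PROTOCOL" status bytes ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" \d{3} (?P<bytes>\S+)'
)

def bandwidth(lines, file_sizes):
    """Return (actual, naive) byte totals for the given log lines.

    actual: sum of the logged response sizes ('-' means no body, e.g. a 304).
    naive:  the full file size charged for every hit, as the comment describes.
    """
    actual = naive = 0
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that don't look like log entries
        if m.group("bytes") != "-":
            actual += int(m.group("bytes"))
        naive += file_sizes.get(m.group("path"), 0)
    return actual, naive
```

With one full `200` download of a 4,000-byte stylesheet followed by two `304` revalidations, this reports 4,000 bytes actually sent against 12,000 bytes under the naive count.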

Even considering the above, the many requests from those few IPs are really odd.

I googled "SharpReader0940" and found some links like these:
http://www.tweakers.net/meuktracker/5654?&niv=1
http://msmvps.com/xpditif/archive/2004/02/29/3417.aspx

Posted by Bobby at July 28, 2005 8:51 PM

Phil: I agree that a 404 should not be cached, but I'd expect GWA to be smart enough to realize that if the file wasn't there a few seconds ago, it's probably not necessary to try downloading it again now, especially after it has already received a couple thousand 404s...
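What Luke describes is essentially a negative cache: remember recent misses and hold off on re-requesting them for a while. A hypothetical Python sketch of the idea; the class name and the 5-minute default TTL are illustrative assumptions, not anything GWA is known to do, and for simplicity it treats 404 and 410 the same way:

```python
import time

class NegativeCache:
    """Remember URLs that recently returned 404/410 and suppress repeat
    fetches for `ttl` seconds (hypothetical sketch)."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so the behaviour can be tested
        self._misses = {}       # url -> time the miss was recorded

    def record(self, url, status):
        if status in (404, 410):
            self._misses[url] = self.clock()
        else:
            self._misses.pop(url, None)  # the URL works again; forget the miss

    def should_fetch(self, url):
        recorded = self._misses.get(url)
        if recorded is None:
            return True
        if self.clock() - recorded >= self.ttl:
            del self._misses[url]        # entry expired; allow one retry
            return True
        return False
```

A client would call `should_fetch(url)` before each request and `record(url, status)` after it, so thousands of repeat requests for a missing file collapse into roughly one per TTL window.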

Posted by Luke Hutteman at July 28, 2005 10:03 PM

Hey, how's it going?

Regarding the GWA issue, I found this on their FAQ (http://webaccelerator.google.com/support.html):

4. Does Google Web Accelerator speed up all web pages?

No, it doesn't. For security reasons, Google Web Accelerator won't speed up pages encrypted with the HTTPS: protocol (such as bank records pages). Also, Google Web Accelerator only speeds up web pages, not large data downloads such as MP3 and streaming video files.

That would explain why the GWA is not caching the file as might be expected from a "caching" technology.

2. How does Google Web Accelerator work?

Google Web Accelerator uses various strategies to make your web pages load faster, including:

Sending your page requests through Google machines dedicated to handling Google Web Accelerator traffic.
Storing copies of frequently looked at pages to make them quickly accessible.
Downloading only the updates if a web page has changed slightly since you last viewed it.
Prefetching certain pages onto your computer in advance.
Managing your Internet connection to reduce delays.
Compressing data before sending it to your computer.

Posted by Craig Spargo at July 29, 2005 2:57 AM

Hey, looking at the page you mentioned, I notice that the headers say Cache-Control: private, so this wouldn't be served out of the cache from Google Web Accelerator. So I don't think that the GWA is involved in this (not 100% sure, but that's my strong guess).

Best wishes,
GoogleGuy

Posted by GoogleGuy at July 29, 2005 2:54 PM
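The rule GoogleGuy is relying on comes from RFC 2616 section 14.9.1: a bare `Cache-Control: private` marks the response as intended for a single user, so a shared cache (such as a proxy serving many users) must not store it, and `no-store` rules out caching entirely. When `private` lists field names, only those headers are restricted. A simplified Python check; the function name is hypothetical and the header parsing is deliberately naive:

```python
def shared_cache_may_store(cache_control: str) -> bool:
    """Simplified RFC 2616 check: may a shared cache store a response
    carrying this Cache-Control header value?"""
    for raw in cache_control.split(","):
        name, _, value = raw.strip().partition("=")
        name = name.strip().lower()
        if name == "no-store":
            return False  # no cache may store it
        if name == "private" and not value:
            return False  # bare 'private': the whole response is per-user
    return True
```

For example, `"private"` and `"Private, max-age=0"` both yield `False`, while `"public, max-age=3600"` yields `True`.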

Perhaps it's referenced from a search engine's cached page?

Posted by poooooof at September 9, 2005 2:22 PM

Maybe I did something wrong, not sure. But I have e-mailed Luke, and have not received any kind of response. I am finally UN-INSTALLING SharpReader. I realize that it is free, and not necessarily free of bugs, but I cannot get the damn thing to STOP opening. It is 'frozen' on a feed, and I cannot have ANY other feed, or window on my computer for that matter, without it opening... This has been going on now for some time. I have uninstalled and reinstalled, at least twice. Now I am removing EVERYTHING from the computer.

Anyone experienced this?? I try to close the program, only to have it open moments later. It continues to show the same feed, unread, over and over and over...

AHH!

Posted by Christian Connett at September 13, 2005 11:11 AM
