To cut down on the variables that are set on the server, I try to turn off session management for spiders so that no session variables need to be created. I do this based on user agents and blacklisted IP addresses. Recently, however, I have been getting a slew of hits from what I assume are spiders that send a regular browser user agent:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
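To make the problem concrete, here is a minimal sketch of the kind of check described above, written in Python for illustration (it is not the code running on my server). The pattern, the blacklist entries, and the function name `should_enable_sessions` are all hypothetical; the IP addresses come from the reserved documentation ranges.

```python
import re

# Hypothetical spider signatures; a real list would be much longer.
SPIDER_AGENT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

# Hypothetical blacklist (addresses from the reserved documentation ranges).
BLACKLISTED_IPS = {"192.0.2.10", "192.0.2.11"}


def should_enable_sessions(user_agent: str, remote_addr: str) -> bool:
    """Return False when the request looks like it came from a spider."""
    if SPIDER_AGENT_PATTERN.search(user_agent):
        return False
    if remote_addr in BLACKLISTED_IPS:
        return False
    return True


# A self-identifying crawler is caught by the user agent check.
print(should_enable_sessions("Googlebot/2.1 (+http://www.google.com/bot.html)", "198.51.100.7"))

# The browser-like user agent above passes, so a session gets created.
print(should_enable_sessions("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", "198.51.100.8"))
```

A check like this can only catch spiders that identify themselves or that come back from a known address, which is exactly where it falls down here.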
Since I can't filter on that user agent, I thought I would blacklist the IP addresses, but the spider seems to be sending a randomized remote address with each page request. The following IP addresses all came from some sort of crawler within two minutes:
22.214.171.124 (3 hits)
126.96.36.199 (2 hits)
188.8.131.52 (2 hits)
184.108.40.206 (2 hits)
220.127.116.11 (2 hits)
18.104.22.168 (3 hits)
I know that it was a crawler because every request had the same HTTP referer, my home page, and not all of the requested pages are linked from the home page. That means the referer was being set manually. This is so irritating! Now I have dozens upon dozens of sessions being created on the server, each of which will last 20 minutes without ever being used twice. That is poor memory management.
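The referer observation could, in principle, be turned into a rough heuristic. The sketch below is Python for illustration only, and everything in it is an assumption: the home page URL, the set of paths linked from the home page, and the function name `looks_like_forged_referer` are made up. It flags a request that claims to come from the home page but asks for a page the home page does not link to.

```python
# Hypothetical home page URL and the paths it actually links to.
HOME_PAGE = "http://www.example.com/"
PAGES_LINKED_FROM_HOME = {"/", "/about", "/blog"}


def looks_like_forged_referer(path: str, referer: str) -> bool:
    """Flag requests that claim the home page as referer for an unlinked page."""
    return referer == HOME_PAGE and path not in PAGES_LINKED_FROM_HOME


# A page not linked from the home page, yet "referred" by it: suspicious.
print(looks_like_forged_referer("/archive/old-post", HOME_PAGE))

# A page that really is linked from the home page: not flagged.
print(looks_like_forged_referer("/about", HOME_PAGE))
```

This would need an up-to-date list of home page links, and a bookmarked or typed URL with a stale referer could trip it, so it is a hint at best and not proof of a spider.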
Why is the spider doing this? I suppose it is meant to stop people from serving up different content to spiders, but that is not my purpose. Having no session management does not serve different content. It just turns off certain server-side tracking. Uggg.