What pages do bad bots look for?
Last Updated: 2020-08-01 14:28:20 UTC
by Jan Kopriva (Version: 1)
I’ve been wondering for some time now which pages and paths are visited the most by “bad” bots: scrapers, data harvesters and other automated scanners that disregard the exclusions set in robots.txt. To find out, I set up a little experiment. I placed a robots.txt file on one of my domains that disallowed access to commonly used paths and PHP pages which might be of interest to bots (login.php, /wp-admin/, etc.), configured the server to return an HTTP 200 response for these paths and pages, and started logging details about the requests sent to them.
To avoid as much legitimate or manually generated traffic as possible, I did this on a domain pointing to a server that ran none of the common content management systems.
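The setup described above could be approximated with a few lines of Python. This is only a minimal sketch, not the configuration actually used for the experiment: the `TRAP_PATHS` list contains just the two examples named in the text, and the helper names (`build_robots_txt`, `is_trap`, `HoneypotHandler`, `run`), the log file name and the port are all assumptions made for illustration.

```python
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical trap list: only the two examples mentioned in the text.
TRAP_PATHS = ["/login.php", "/wp-admin/"]


def build_robots_txt(paths):
    """Return a robots.txt body that disallows every trap path for all bots."""
    lines = ["User-agent: *"] + [f"Disallow: {p}" for p in paths]
    return "\n".join(lines) + "\n"


def is_trap(request_path, paths=TRAP_PATHS):
    """True if the request targets a disallowed page or anything under a disallowed directory."""
    path = request_path.split("?", 1)[0]  # ignore the query string
    return any(path == p or (p.endswith("/") and path.startswith(p)) for p in paths)


class HoneypotHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/robots.txt":
            status, body = 200, build_robots_txt(TRAP_PATHS).encode()
        elif is_trap(self.path):
            # A well-behaved bot never gets here, so every hit is worth logging.
            logging.info("trap hit: %s %s %s UA=%r", self.client_address[0],
                         self.command, self.path, self.headers.get("User-Agent"))
            status, body = 200, b"OK\n"
        else:
            status, body = 404, b"Not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # Treat POST requests the same way (the request body is not read in this sketch).
    do_POST = do_GET


def run(port=8080):
    """Start the honeypot; blocks until interrupted."""
    logging.basicConfig(filename="honeypot.log", level=logging.INFO,
                        format="%(asctime)s %(message)s")
    HTTPServer(("0.0.0.0", port), HoneypotHandler).serve_forever()
```

Calling `run()` starts the listener. In practice, the same effect is more likely to be achieved with a rewrite rule and the access log of an existing web server.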
The captured requests were a mixed bag, as one might expect. Some were simple one-shot HTTP GET requests, while others were part of multi-request scans. Some had no parameters set, while others carried generic SQL injection or XSS payloads or tried to “blindly” exploit vulnerabilities specific to common content management systems.
For our purposes, however, this is beside the point as we’re more interested in finding out which pages were looked for the most. I went over the logs and put the “top 10” most commonly requested pages for the past 12 months in the following table, along with the number of times each path or page was hit.
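A tally like this can be produced with a short script. The sketch below assumes the requests were logged in the common/combined access log format, where the request line appears in double quotes; the original post does not say which format was used, and the function name `top_paths` is made up for this example. Query strings are stripped so that requests carrying different payloads still count toward the same page.

```python
import re
from collections import Counter

# Matches the quoted request line of a common/combined log entry,
# e.g. "GET /wp-login.php HTTP/1.1", and captures the requested target.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')


def top_paths(log_lines, n=10):
    """Return the n most requested paths as (path, hits) pairs, most hits first."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # skip malformed or non-HTTP entries
        path = match.group(1).split("?", 1)[0]  # drop the query string
        counts[path] += 1
    return counts.most_common(n)
```

With a log file at hand, `top_paths(open("access.log"))` (the file name is a placeholder) returns the ten most requested paths and their hit counts.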
Although finding wp-login.php in first place is hardly surprising, the results are interesting. Given the fairly large early drop in the number of requests, it seems that one might be able to catch a significant portion of interesting “bad” bot behavior with just a single-page (or four- or five-page) honeypot. In other words, if you’ve ever wondered where to place a “honeypage” on your server in order for it to be effective, the top paths in the table above are probably a good start.
Aug 1st 2020
with a wp-login.php page that clearly thanks the cracker for their kind attempt and tells them that it has been logged. We see you and are watching you! Ya, I know most will just breeze past that, but it will slow down some, especially those just starting.
The trick will be doing something with that IP or small range of IPs to induce more pressure. Search engines could be set to include, in searches made from those addresses, articles such as "how to break your cracking habit" and "what your local jails are really like", plus ads for lawyers specializing in milking/representing crackers in court. Any of the gatekeeper sites, including social media, could do something similar. It is important not to block the primary service, and to do this only for a shortish while, just to make it clear that this IP, or one very close to it, was recently doing something naughty (engage social pressure). The perpetrators will know they've been seen, and that their actions can have a direct and negative impact on them. This would be enough to stop some early explorers from sliding down the path towards script kiddiness and beyond, and to increase the cost/effort for those who still try.
Networks that see such responses popping up can then know that they've been used. If the source is internal, there are some internal discussions to have. If no internal source is found, then perhaps there is a security issue that needs addressing, and this is encouragement to rectify the security gaps (oh, I guess the open WiFi wasn't a good idea).