24 Nov, 2023

Defending Against Deceptive Crawlers. The Battle Against Encrypted GET Request Spam

As technology develops rapidly, cyber security is becoming an integral part of an organisation’s defence strategy. One attack that stands out among today’s threats is the spamming of encrypted GET requests from many different IP addresses. What makes this type of attack particularly sophisticated is that the requests are sent under the name of well-known applications.

Customer enquiry and immediate team response

CQR was approached by a customer with alarming news: their website was suddenly experiencing unusual issues. The client discovered that their hosting account had unexpectedly run out of free space, accompanied by an abnormally high volume of website traffic. This surge was traced back to an intricate attack involving the spamming of encrypted GET requests. The customer’s logs showed a series of requests coming from Google’s IP addresses, all targeting different URLs on the site in rapid succession on the same day. The requests, made using the GET method, contained search parameters in Chinese, typically associated with dating service advertisements, which is a telltale sign of spam. Each request was also disguised as originating from Googlebot, as evidenced by the spoofed User-Agent strings.

Essence of the Attack

On detailed analysis, we found that all these requests were processed successfully by the server, as indicated by the 200 status code. This was a major concern, since the nature of the requests was inconsistent with the typical behavior of the legitimate Googlebot. To explain this to a non-technical audience: imagine Googlebot as a good robot that visits websites to help them show up in Google searches. In this case, it was as if someone was wearing a Googlebot costume to trick the website. Our cybersecurity experts identified two likely tactics used in this attack:

User-Agent Spoofing – Attackers might have altered the User-Agent in their HTTP requests to masquerade as Googlebot. This can be done quite easily using tools like cURL or programming libraries, and is a common tactic to bypass security measures that trust traffic from reputable crawlers like Google’s.

Using Open Proxies or VPN Services – The attackers could have routed their traffic through open proxy servers or VPN services, some of which might use IP ranges associated with Google. This method makes it challenging to trace the true origin of the attack and can mislead security systems that trust traffic from such IP addresses.
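Both tactics rely on a site trusting the Googlebot name at face value. One widely used check, which Google itself documents, is a reverse DNS lookup on the requesting IP followed by a forward lookup on the returned hostname. The sketch below illustrates that idea and is not the tooling CQR used in this case. The function names are our own, the forward lookup covers IPv4 only, and the resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Hostname suffixes Google documents for Googlebot's reverse DNS names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname recorded for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True only if the PTR name is a Google host AND resolves back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The forward lookup defeats attackers who control their own PTR record.
        return ip in forward(host)
    except OSError:
        return False
```

A request that carries a Googlebot User-Agent but fails this check can be treated as spoofed, whatever its headers claim.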

As a consequence of these tactics, the website under attack started showing up in search engine results with the specific content the attackers had used, known as “payloads.” This led to the website unintentionally hosting irrelevant and potentially harmful content. The implication of such an incident is significant. It can harm the website’s credibility, potentially lead to search engine penalties, and distort the site’s traffic analytics.

Illustrating the impact, a screenshot showed that the website began appearing at the top of Google search results for the exact payloads inserted by the attackers. This not only spreads the unwanted content further but also raises serious questions about the integrity and reliability of the affected website in the eyes of its users and search engines alike.

Problem solution

Upon discovering the sophisticated attack on the customer’s online platform, the CQR team swiftly took decisive measures to mitigate the impact and secure the system. Recognizing the urgency of the situation, the team first temporarily disabled the search functionality on the platform. This move aimed to prevent further exploitation of the vulnerability and buy the team time to assess the extent of the attack.

In their effort to fortify the platform against future incursions, the CQR team made specific edits to the robots.txt and .htaccess files. The adjustments aimed to tighten access control and thwart unauthorized entities, particularly those attempting to manipulate the website through the encrypted GET requests.

  1. Robots.txt Modifications:

    • To counter the recent attack, the team implemented decisive changes within the robots.txt file, restricting the bot’s access to the directories that had been under assault. This involved disallowing access to specific directories that were targeted during the recent sophisticated incursion. By strategically limiting the bot’s reach to these compromised directories, the team effectively mitigated the impact of the attack on vulnerable areas of the website.

    • Additionally, a proactive measure was taken to adjust the robots.txt configuration, focusing on regulating the frequency of legitimate search engine crawlers. This precautionary step aimed to prevent strain on the platform’s resources, guarding against potential risks associated with excessive crawling attempts. Such attempts could be indicative of a broader threat landscape, including the potential for a distributed denial-of-service (DDoS) attack. This fine-tuning of access controls within the robots.txt file not only addressed the recent attack but also contributed to an enhanced and resilient defense strategy against future cyber threats.

  2. .htaccess File Adjustments:

    • The team inserted rules in the .htaccess file to filter and block incoming requests that closely matched the patterns of the malicious encrypted GET requests. This involved creating specific rules based on the characteristics of the attack, such as the distinctive UTF-8 encoded queries.

    • Furthermore, they implemented IP filtering to block the known IP addresses associated with the attack. This served as an additional layer of defense against the perpetrators, preventing their access to the platform and reducing the likelihood of further attempts.

    • To counter potential variations in the attack methodology, the team set up anomaly detection rules in the .htaccess file. These rules checked incoming requests for patterns that deviated from the norm, enabling the system to identify and block suspicious activity in real time.
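The filtering logic described in the list above can be sketched in a few lines. The team's actual .htaccess rules are not reproduced in this article, so the example below expresses the same two ideas (an IP blocklist and a pattern match on the decoded query string) in Python rather than Apache syntax. The blocked network is a documentation-only placeholder, the `s` search parameter is hypothetical, and the sketch assumes a site with no legitimate Chinese-language searches, which would otherwise be blocked too.

```python
import ipaddress
import re
from urllib.parse import unquote, urlsplit

# Hypothetical blocklist; real entries would come from the site's own logs.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# CJK Unified Ideographs, the script the spam search parameters were written in.
CJK = re.compile(r"[\u4e00-\u9fff]")


def should_block(client_ip: str, request_target: str) -> bool:
    """Decide whether a request matches the attack's known characteristics."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # Percent-decode the query as UTF-8 before matching, so that
    # "%E4%BA%A4" is inspected as the character it encodes.
    query = unquote(urlsplit(request_target).query)
    return bool(CJK.search(query))


if __name__ == "__main__":
    print(should_block("198.51.100.4", "/?s=%E4%BA%A4%E5%8F%8B"))
    print(should_block("198.51.100.4", "/?s=shoes"))
```

Decoding before matching matters here: a rule that only inspects the raw URL would have to enumerate percent-encoded byte sequences instead of matching the characters themselves.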

The CQR team dived into the backend infrastructure and performed a thorough check of the MySQL database. The main focus was on cleaning the database of the malicious and extraneous queries that had infiltrated it during the attack.

This meticulous cleanup operation not only aimed at eliminating traces of the recent incident but also bolstered the system’s resilience against future infiltration attempts. The team’s commitment to database hygiene played a pivotal role in restoring the platform’s integrity.

Understanding the importance of real-time protection, the CQR team leveraged Cloudflare’s “I’m Under Attack” mode to alleviate the strain on the platform’s CPU. By activating this feature, the team added a further layer of defense against potential DDoS attacks and reduced the system’s susceptibility to resource-intensive activities.

Summarising the results

In understanding the multifaceted motivations behind the recent cyber attack on our client’s website, it becomes clear that each motive required a targeted and strategic response. From attempts to tarnish the company’s reputation to disrupting competing services, using the attack as a diversion for more serious cybercrimes, and spreading spam or fraudulent messages, the attackers displayed a range of objectives.

Our team, acknowledging these deceptive tactics, acted swiftly to mitigate the impact. We implemented advanced filters to catch and block dubious traffic, particularly focusing on those posing as Googlebot and those routing through known proxy or VPN channels. Essentially, we set up sophisticated digital ‘bouncers’ at the website’s entrance to halt these disguised intruders. Furthermore, we advised our client to monitor their website traffic closely for similar patterns and recommended the deployment of more comprehensive intrusion detection and prevention systems for enduring security.

We also made specific adjustments to the website’s robots.txt and .htaccess files. These files function as the website’s rulebooks, dictating which parts should be accessible and to whom. By refining these files, we enhanced the site’s security against unauthorized crawlers and malicious requests.
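To make the "rulebook" idea concrete, the sketch below feeds a small robots.txt to Python's standard-library parser and asks it what a crawler may fetch. The rules shown are illustrative assumptions and not the client's actual file: the `/search/` path and the 10-second delay are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: keep crawlers out of the search endpoint that was
# being abused and ask them to slow down.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser the same questions a well-behaved crawler would ask.
print(parser.can_fetch("Googlebot", "/search/?q=test"))
print(parser.can_fetch("Googlebot", "/about"))
print(parser.crawl_delay("Googlebot"))
```

Two caveats apply. robots.txt is advisory, so it only influences crawlers that choose to honour it, and a client merely pretending to be Googlebot will ignore it; that is why the server-side .htaccess rules were needed as well. Google also does not honour the Crawl-delay directive, so that line affects only crawlers that support it.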

Our client expressed satisfaction with the swift resolution of the problem and appreciated the comprehensive recommendations provided for future security enhancements. This successful intervention reinforces the importance of staying alert and proactively adapting to the evolving landscape of digital threats.
