Introducing subCrawl – a framework for the analysis and clustering of hacking tools found using open directories

Josh Stroschein (Independent researcher), Patrick Schläpfer (HP) & Alex Holland (HP)

From phishing kits to command-and-control (C2) panels, web shells and directories containing multiple samples of malware, open directories can provide a wealth of insight into threat actor operations. But how can we discover open directories associated with malicious activity? And once we discover them, what are the next steps for identifying interesting content? Furthermore, is it possible to compare the artifacts found and draw conclusions about which threat actors use which tools, correlate compromised hosts by the tools found, and discover how the sites were compromised?

To answer these questions, we implemented the open-source framework subCrawl. subCrawl is written in Python 3 and provides a modular framework for discovering open directories, identifying unique content through signatures, and organizing the data with optional output modules, such as MISP.

Open directories are simply folders on a public web server whose contents are listed with direct links to every file. While open directories can be used to legitimately share files, such as images and documents, threat actors often overlook the fact that their own directories are exposed in the same way. As a result, open directories can provide insight into the structure, tools and malware being used by many threat actors. This oversight can provide direct access to the tools they’ve placed on a server, such as open or password-protected web shells, source code for prevalent C2 panels such as Azorult, Pony and Agent Tesla, and proxy scripts for QuakBot. Open directories can not only lead to a deeper understanding of malware operations, but also help disrupt ongoing campaigns or create protective measures against them.
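To illustrate what an open directory looks like to a crawler, here is a minimal sketch in Python. It is not taken from subCrawl: the names `ListingParser` and `parse_open_directory` are hypothetical, and the "Index of" title check is only a common heuristic for auto-generated listings, assumed here for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collects the page title and every href found in anchor tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_open_directory(base_url, html):
    """Return absolute URLs of listed entries, or None if the page
    does not look like a directory listing (heuristic, assumed)."""
    parser = ListingParser()
    parser.feed(html)
    if not parser.title.strip().lower().startswith("index of"):
        return None
    entries = []
    for href in parser.links:
        # Skip column-sorting links and links that leave the directory.
        if href.startswith(("?", "/", "..")):
            continue
        entries.append(urljoin(base_url, href))
    return entries
```

Entries ending in a slash are subdirectories, so a crawler could feed them back into the same function to walk the tree.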

To make sense of the information found in the scanned open directories, we use our framework subCrawl to aggregate the data with fuzzy hashes, web server information, the scripting languages used and more. This approach allows for the creation of unique signatures that can be used to track tool usage across multiple hosts and cluster threat actor activities. To help manage the hosts explored and the data collected, we create consolidated MISP events, which enable us to cluster the artifacts found and draw interesting conclusions about the use of tools and possible website compromise scenarios.
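The idea of grouping near-identical tools across hosts can be sketched as follows. This is an illustration under stated assumptions, not subCrawl's implementation: it swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in for a real fuzzy hash, and the function names, the greedy grouping strategy and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1]; a stand-in for comparing fuzzy hashes."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster(samples, threshold=0.8):
    """Greedily group sample names whose content is similar.

    `samples` maps a name (for example host plus path) to file content.
    Each new sample joins the first cluster whose first member is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for name, content in samples.items():
        for group in clusters:
            if similarity(samples[group[0]], content) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Given two web shells that differ only in a parameter name and one unrelated HTML page, this groups the two shells together and leaves the page on its own. Comparing against a single representative keeps the sketch short; it is a simplification and can miss matches that a pairwise comparison would find.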

In this talk, we will present the open-source framework subCrawl, which reflects our approach for hunting open directories. We will also explore our methodology to detect and cluster malicious content using publicly available threat feeds with the support of the well-known tool MISP, which helps us to store the data in a structured form and cluster it. Based on our aggregated data set, we will give insight into our most significant findings by evaluating the hacking tools used by threat actors and describing possible connections between compromised websites.

Got a question about this presentation? To get in touch with the speakers, contact Patrick Schläpfer by email on [email protected] or on Twitter at @stoerchl.

Josh Stroschein

Independent researcher

Josh is an experienced malware analyst and reverse engineer and has a passion for sharing his knowledge with others. He is the Director of Training for OISF, where he leads all training activity for the foundation and is also responsible for academic outreach and developing research initiatives. Josh is an accomplished trainer, providing training in the aforementioned subject areas at BlackHat, DerbyCon, Toorcon, Hack-In-The-Box, Suricon, and other public and private venues. Josh is an Assistant Professor of Cyber Security at Dakota State University where he teaches malware analysis and reverse engineering, an author on Pluralsight, and a threat researcher for HP.

@jstrosch

Patrick Schläpfer

Patrick is a malware analyst at HP with interests in a wide range of security areas. He focused on cybersecurity during his studies, where he developed a particular interest in malware analysis. After graduation, he worked on a scientific project at a university that built a dynamic malware analysis system using code similarity clustering. He gained further experience as a security analyst with responsibilities in threat intelligence at a Swiss bank. At the beginning of 2021, Patrick joined HP's Threat Research team as a malware analyst. He conducts analyses of new threats, using the results to improve HP's security products and sharing them with the community.

@stoerchl

Alex Holland

Alex Holland is a malware analyst at HP and is based in Cambridge, UK. He enjoys tracking malware families, admiring process trees and finding exciting ways of visualizing samples. Prior to joining HP, Alex worked in a variety of security roles, including incident response and malware analysis. Alex is also an accomplished speaker and has presented his research at leading security conferences across the globe.

@cryptogramfan