Cyber Threat Intelligence - TOR Investigations

Introduction

In this article, I will guide you through my thought process while conducting an investigation, aiming to identify and collect sources of intelligence. Specifically, we will focus on open-source intelligence related to the Deep and Dark Web, with the primary goal of monitoring two kinds of sources: markets and cybercriminal activity. I will present a case I worked on some time ago, but I won’t provide details about the case itself. Instead, I will discuss the tools I used, how to conduct a hunt and collect intelligence, and the analysis techniques involved. As you join me on this journey, I hope you don’t get lost in my thought process.

We will focus on OSINT and dark web analysis for threat intelligence: scouring forums and marketplaces, infiltrating those forums, mining threat intelligence, and making use of dark web search engines and blockchain analytics.

Anatomy of the Dark Web

We cover the concept of the dark web and provide a brief overview of how it works, focusing on the ecosystem and intricate infrastructure of the Tor network, dark web marketplaces, the role of cryptocurrencies, and the dynamics of underground forums and chat rooms. Alright, what is the dark web? If you’re reading this article, you likely already know the answer, but let’s cover it one more time. The dark web is a hidden part of the internet that operates beyond the reach of conventional search engines like Google or DuckDuckGo. It requires specialized software to access—yes, you guessed it—the Tor network. However, to better understand the dark web, it’s essential to first grasp the different layers that make up the internet ecosystem. The web can be divided into three distinct layers: the surface web, the deep web, and the dark web, as far as I know.

So, we all know the surface web, right? For example, you’re reading this article on the surface web, and this blog is already indexed by search engines, making it openly available to the public. But the deep web is another beast: it’s significantly larger than the surface web and encompasses all web content that is not indexed by search engines. It consists of password-protected pages, membership sites, private databases, online private systems, and other dynamically generated web content, so accessing it requires specific credentials or a direct link. You often hear people confuse the deep web with the dark web, or vice versa, but they are distinct from each other. How? The dark web is only a small portion of the deep web, one that is intentionally concealed and requires specialized software to access. Reaching it involves anonymity tools such as, yes, the Tor network. The dark web primarily hosts websites with “.onion” domain extensions and is renowned for unsavory activities, including illegal marketplaces, cybercrime forums, and other hidden services. Anonymity is a core characteristic of the dark web, making it appealing to those seeking privacy or engaging in illegal activities.

At its core, Tor functions as a multi-layered encryption network that obscures users’ digital footprints using a cryptographic technique known as onion routing. When a user accesses the dark web through Tor, their internet traffic is encapsulated in multiple layers of encryption, akin to an onion’s protective layers. This encryption encapsulation ensures that each data packet remains unintelligible to any single node in the network, thereby safeguarding the user’s identity.
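
To make the layering concrete, here is a toy sketch of the idea in Python. It uses symmetric keys for illustration only; real Tor negotiates keys per circuit and per relay, so treat this purely as a mental model.

  # Toy illustration of onion routing's layered encryption (not real Tor).
  # Assumes the third-party 'cryptography' package; relay names are made up.
  from cryptography.fernet import Fernet

  # One symmetric key per relay in the circuit: guard -> middle -> exit.
  relay_keys = {name: Fernet.generate_key() for name in ("guard", "middle", "exit")}

  payload = b"GET / HTTP/1.1"

  # The client wraps the payload once per relay, innermost layer first (exit).
  onion = payload
  for name in ("exit", "middle", "guard"):
      onion = Fernet(relay_keys[name]).encrypt(onion)

  # Each relay peels exactly one layer and only ever sees its neighbors;
  # the exit removes the last layer and forwards the original payload.
  for name in ("guard", "middle", "exit"):
      onion = Fernet(relay_keys[name]).decrypt(onion)

  assert onion == payload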


Tor’s modus operandi relies on a diverse constellation of volunteer-operated nodes and relays distributed across the globe. Each data packet hops through a series of these intermediary nodes, with each node only privy to its immediate predecessor and successor. This orchestrated relay of information obscures the original sender’s identity, effectively masking the user’s digital trail.

Exit nodes, the final stop in this complex journey, strip the last layer of encryption from the data packet and forward it to its intended destination. Each relay removes only a single layer, so the exit node is the only point in the Tor network that sees the traffic as the destination will receive it, while still knowing nothing about the original sender. This strategic arrangement prevents any single node from having full visibility into both the source and the final destination of the data. In essence, the Tor network weaves a complex web of encryption and redirection, rendering user identities inscrutable and online activities nearly untraceable. As we proceed deeper into the dark web’s ecosystem, this robust infrastructure becomes the backbone of the various marketplaces and interactions that characterize the dark web.

Case Study

Alright, so the case was simple. The profile we were hunting and their operation had already been investigated by other researchers; our task was simply to collect, analyze, and disseminate actionable information (TTPs). When I received the case, there was already some internal research available, so it was easy to focus solely on the profiles and fingerprints of their activity across the web, and to uncover any hidden connections within the operation. So, it was a straightforward assignment.

So, like any assignment, you have a name or domain of interest and the information to get you started. First, I was assigned to identify the relationship between two onion sites and demonstrate that they are owned by the same entity or individual. Additionally, I was tasked with tracing their blockchain fingerprints to a registered cryptocurrency exchange.

Simple, right? However, as you proceed through the investigation and hunt, you uncover a lot more information and details that may have been ignored or missed, and you have to decide how to proceed.

Keep this in mind as we cover more. As you can see below, we have two targets: one is a marketplace, the other is more like a personal service. Data on these sites is collected and analyzed to determine the most likely explanation for what we see, and the two may well be run by the same individual or entity, since they share many characteristics and other small details. However, this is still just an assumption. We need facts and proof that they are indeed run or operated by the same actor, and that’s what we’re going to do.

There are several methodologies for analyzing CTI once it has been collected. Structured analytic techniques are a set of methodologies designed to improve the accuracy and consistency of intelligence analysis. Some common SATs used in CTI analysis include:

  1. Analysis of Competing Hypotheses (ACH): This involves evaluating multiple competing hypotheses to determine the most likely explanation for a given set of data. This has already been done.

  2. Key Assumptions Check: This technique involves identifying and challenging the key assumptions underlying an intelligence assessment, ensuring that they are well-founded and not based on unfounded beliefs or biases.

  +---------+--------------------------------------------------------------------+
  | Site    | Description                                                        |
  +---------+--------------------------------------------------------------------+
  | Site A  | Operates as a market, facilitating the sale of hacking             |
  |         | tools and personally identifiable information (PII).               |
  +---------+--------------------------------------------------------------------+
  | Site B  | Appears to be a platform offering hacking services.                |
  +---------+--------------------------------------------------------------------+

Alright, time to dig into this. This is how I hunt. Some researchers try to gather as much information as possible so they can analyze and organize the data later. I, however, like to approach it from an offensive standpoint: “Where did they mess up?” A simple detail or vulnerability that I can use to de-anonymize the operation or launch an offensive attack against the target. Some of my colleagues have told me this is not the typical mindset of a threat researcher, but it has always worked for me. So, the first step is information gathering. Like I said, we already have some research done, but I’d like to go over the information and conduct my own research.

To conduct an investigation, you need a profile, most likely a cybercriminal one. You don’t have to do anything illegal; you just need to appear as one. For that, you’ll need credibility, such as existing forum members to vouch for you, or you can pay for entry. When it comes to forums, establishing connections and using social engineering to infiltrate them and build a reputation can be effective. Alternatively, all of this can be bypassed by targeting accounts that already have reputation and credibility and using them as puppets, with no need to build a profile of your own. There are plenty of accounts on these dark web forums that can easily be taken over, especially since we’re not dealing with a multi-million-dollar company that has its own red team. Depending on who you ask, though, this falls under “legality and ethics,” and hacking laws differ across jurisdictions.

Something that is even more important is not to take operational security (OPSEC) lightly in a hunt or investigation. Always practice good OPSEC and follow the process. Be aware of what you’re doing, how you interact with and move through these sites, and what you click on. Simply put, it boils down to one thing: control over information and actions to prevent any attempt at turning them against you. CALI (Capabilities, Activities, Limitations, and Intentions) is a simple checklist outlining the operation’s must-haves.

But how does this relate to the site itself? Since you’re not investigating a forum, how does this come into play? Well, this will come into play once you identify persons of interest who run or claim to run this site, and many of them have accounts across different TOR-based forums. But let’s not get ahead of ourselves here. Let’s start by performing a simple information gathering on site A.

Info Gathering & Research

So, Site A: your typical Tor-based marketplace. The site was relatively new and not especially popular, but it had been running for some time with a low profile. It featured the usual suspects: wildlife sales, phishing toolkits, stolen identification, and so on.

The first step is to crawl and collect data from the sites. This involves searching for specific keywords and returning metadata in order to discover new links and content, possible email addresses, and Bitcoin addresses. Performing service fingerprinting on discovered services is also crucial. However, for now, let’s focus on data collection.

Lucky for us, there are crawlers designed to collect data from sites on the Tor network. For example, TorBot is a tool that can crawl any URL hosted on the Tor network and save website data to an output JSON file. Another tool called TorCrawl can also crawl any URL on the Tor network. In addition to saving site data to an output HTML file, it can search for specific keywords and return metadata about the crawl itself.
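
Under the hood, these crawlers do something simple: push HTTP requests through the local Tor SOCKS proxy and follow links. A minimal sketch, assuming Tor is listening on 127.0.0.1:9050, requests[socks] is installed, and the onion address is a placeholder:

  import requests

  TOR_PROXIES = {
      "http": "socks5h://127.0.0.1:9050",   # socks5h makes Tor resolve .onion names
      "https": "socks5h://127.0.0.1:9050",
  }

  def fetch(url: str) -> str:
      # Fetch a single page through Tor; callers handle retries and politeness.
      resp = requests.get(url, proxies=TOR_PROXIES, timeout=60)
      resp.raise_for_status()
      return resp.text

  html = fetch("http://placeholderonionaddress.onion/")
  print(len(html), "bytes retrieved")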

There are other tools available as well. I usually leverage Selenium for its flexibility, both to handle the unique authentication requirements of various marketplaces and to gather the exact data we need for our analysis of illicit digital goods. For example, I gather and crawl domains surfaced by search engines like Ahmia or those posted in CTI-sharing researcher communities.
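
A rough sketch of that setup, assuming a local Tor SOCKS proxy on 127.0.0.1:9050, geckodriver available on the PATH, and a placeholder onion URL; the actual login and scraping steps depend entirely on the target marketplace:

  from selenium import webdriver

  options = webdriver.FirefoxOptions()
  options.set_preference("network.proxy.type", 1)                  # manual proxy config
  options.set_preference("network.proxy.socks", "127.0.0.1")
  options.set_preference("network.proxy.socks_port", 9050)
  options.set_preference("network.proxy.socks_remote_dns", True)   # resolve .onion over Tor

  driver = webdriver.Firefox(options=options)
  try:
      driver.get("http://placeholderonionaddress.onion/login")
      # ...log in, paginate listings, and save driver.page_source for analysis...
      print(driver.title)
  finally:
      driver.quit()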

The list goes on; it can encompass thousands of sites, with checks to sort the active ones from the dead. That way, you end up with a kind of phone book that gets you where you need to go for analysis. Once we’ve scraped the data, it’s time to sift through the noise and extract Indicators of Compromise (IoCs) such as IP addresses, URLs, or file hashes. We proactively search through the datasets to identify threats before they hit. There’s so much you can pull, but let’s get back to our subject, Site A.
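
Before we do, here is a quick sketch of what that extraction pass can look like. The regexes are intentionally loose and will need tuning against real data.

  import re

  ONION_RE = re.compile(r"\b[a-z2-7]{16,56}\.onion\b")
  EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
  BTC_RE   = re.compile(r"\b(?:bc1[a-z0-9]{25,39}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b")

  def extract_indicators(text: str) -> dict:
      # Deduplicate and sort whatever the crawl turned up.
      return {
          "onion_links": sorted(set(ONION_RE.findall(text))),
          "emails": sorted(set(EMAIL_RE.findall(text))),
          "btc_addresses": sorted(set(BTC_RE.findall(text))),
      }

  # indicators = extract_indicators(html)  # 'html' from the crawl sketched earlier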

Once I’ve extracted the links and endpoints present at the time, I usually shift my focus to URLs with parameters, like foo.php?id=2. I run a simple fuzz for files (pdf, png, xml, etc.) and intel (emails, BTC, ETH, etc.). I can’t tell you how many vulnerabilities stem from the outdated packages and software these sites run, so you usually look for SQL injection, directory traversal, and the like (more on this later). Once the data is gathered and you have an idea of the overall site, you move to the other site, “B”, and do exactly the same. With data from both, you start to cross-reference intel and build a report. Usually, around this time, I start noticing the resemblance between the two sites: the same technology is used, along with some small shared details. All of this must be captured and reported as you conduct the investigation; it’s like writing a penetration test report. You document as you move, because .onion sites often go down for prolonged periods or disappear entirely.
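
Because of that, I archive every page I touch as I go. A minimal sketch of the habit: store the raw HTML with a timestamp and a URL hash so the evidence survives even if the site vanishes mid-investigation (the file layout here is just my own convention).

  import hashlib, json, pathlib
  from datetime import datetime, timezone

  ARCHIVE_DIR = pathlib.Path("archive")
  ARCHIVE_DIR.mkdir(exist_ok=True)

  def archive_page(url: str, html: str) -> pathlib.Path:
      # File name = UTC timestamp + short hash of the URL, for easy sorting.
      digest = hashlib.sha256(url.encode()).hexdigest()[:16]
      stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
      path = ARCHIVE_DIR / f"{stamp}_{digest}.json"
      path.write_text(json.dumps({"url": url, "fetched_at": stamp, "html": html}))
      return path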

Alright, we’re trying to find a link between two different onion sites. Now, past the information gathering phase, we start by identifying the administrators and popular vendors of site “A”. So, we begin by creating an intelligence profile, taking note of the following:

  • Username / Alias
  • Date of account creation / online and offline times (map out an activity pattern)
  • PGP public key (Important! Reused keys indicate related accounts)
  • Type of merchandise offered
  • Methods of contact
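
To keep these notes comparable across markets and forums, I normalize them into a simple record. The field names below are my own convention, not a standard schema, and the example values are placeholders.

  from dataclasses import dataclass, field
  from typing import Optional

  @dataclass
  class ActorProfile:
      alias: str
      source_site: str
      created: Optional[str] = None          # date of account creation, if visible
      last_seen: Optional[str] = None        # helps map out an activity pattern
      pgp_fingerprint: Optional[str] = None  # reused keys tie accounts together
      merchandise: list = field(default_factory=list)
      contact_methods: list = field(default_factory=list)

  # Hypothetical entry for the market admin; real values are redacted here.
  admin_site_a = ActorProfile(alias="<redacted>", source_site="site-a-placeholder.onion",
                              pgp_fingerprint="<fingerprint>")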

Now, it’s time to translate the intel we collected into actors, events, and attributes. First things first, remember that thing about the offensive approach? Well, in the data, I noticed a couple of things.

The site “B” is running on the ‘Apache web server,’ which is fairly standard. However, what gets interesting is that the site operator seems to have forgotten to disable the Apache status module, also known as mod_status or server-status. This module provides information about the requests Apache is currently serving and has recently served. The information includes:

  • The time the server was last started/restarted and the duration it has been running.
  • Averages, including the number of requests per second, the number of bytes served per second, and the average number of bytes per request.
  • Details about the current hosts and requests being processed.
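
When the module is left exposed like this, the status page can be pulled straight over Tor. A hedged sketch, reusing the proxy settings from the crawler above; the onion address is a placeholder.

  import requests

  TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
                 "https": "socks5h://127.0.0.1:9050"}

  # '?auto' asks mod_status for the machine-readable variant of the page.
  resp = requests.get("http://site-b-placeholder.onion/server-status?auto",
                      proxies=TOR_PROXIES, timeout=60)
  if resp.ok:
      print(resp.text)   # uptime, request rates, and per-request details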

The status page leaks plenty of other juicy intel as well, but we keep moving as per the playbook. Meanwhile, two profiles of interest showed up in the data for site “A”: one labeled as Admin and another as a vendor. Both were engaged in multiple activities, such as leaking sensitive data. I decided to focus on gathering more information about these profiles.

My goal is to determine whether I can link them to any profiles or accounts on other sites or forums. Starting with the administration of Site A, it is now time to cross-reference that intelligence profile and search for any useful information. I was able to trace the username across various forums, one of which is xss[.]is. In case you’re unfamiliar, xss is a Russian forum that hosts discussions on vulnerabilities, exploitation, malware, and various other cyber-related topics. A great forum.

To thoroughly scope the profile, I initiated a quick crawl to retrieve a link. Upon checking the link, I received no response, indicating that the site is no longer operational. Additionally, the associated profile has been inactive since its last recorded activity in 2021, which means the link is down and the profile was created before all this mess. So I ran the site through the Wayback Machine to see if any snapshots are available.
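
For scripted checks, the Wayback Machine exposes a public CDX API that lists every capture of a URL; a small sketch, with the dead domain replaced by a placeholder.

  import requests

  CDX = "http://web.archive.org/cdx/search/cdx"
  params = {"url": "dead-clearnet-site.example/*", "output": "json",
            "fl": "timestamp,original,statuscode", "collapse": "digest"}

  rows = requests.get(CDX, params=params, timeout=30).json()
  for ts, original, status in rows[1:]:        # first row is the header
      print(ts, status, original)
  # Each capture is then viewable at https://web.archive.org/web/<timestamp>/<original>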


The site contains snapshots dating back to approximately 2018, with additional snapshots from 2019, 2020, and 2021. It’s time to conduct a manual inspection of these snapshots.

As we delve into the snapshots, we discover subdomains labeled ‘Sub’ and ‘Services.’ Interestingly, it turns out that the site was offering pen-testing services before it went offline. What’s even more intriguing is that before it transformed into a service, it started out as a personal blog in the 2018 snapshot, featuring articles on hacking, tools, and related topics.

As we carefully examine all of this, we managed to collect the following information:

+-------------------------------------------+----------------------------------+
|                Site A                     |              Site B              |
+-------------------------------------------+----------------------------------+
| Bitcoin Address                           | Bitcoin Address                  |
| A name linked to the administration alias | Email Address                    |
| Country of residence                      |                                  |
| Email Address                             |                                  |
+-------------------------------------------+----------------------------------+

Now that we’ve managed to collect and analyze some valuable information on the administration of Site A, and we know that the Admin has an interest in hacking services, we still need a key piece of evidence linking the site to them. I did some research and hunted the email address to see where it was registered and what services it had been used with, all of which led me to a set of digital footprints of one individual across the web, from social networks like LinkedIn to a YouTube channel and so on. With a possible identification of the owner of Site A, it’s time to examine the evidence provided by the bitcoin addresses.

Blockchain Forensics On Transactions

From a single Bitcoin address, various insights can be derived, including the total number of transactions, the sources and amounts of incoming funds, the destinations and amounts of outgoing funds, a historical timeline of transactions, and identification of other associated Bitcoin addresses within the same wallet. This is where websites such as WalletExplorer and OXT become relevant and come into play.

Usually, the first approach is to look for patterns and correlations that link multiple addresses. We also map the flow of funds and the relationships between addresses to uncover suspicious activity or money-laundering schemes, and we extract and analyze additional data associated with transactions, such as timestamps, to gain further insights.
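
WalletExplorer is mostly browsed by hand, so for scripted lookups this sketch swaps in Blockstream’s public Esplora API (a substitution on my part, not something the original investigation depended on); the address is a placeholder.

  import requests

  BASE = "https://blockstream.info/api"

  def address_txs(addr: str) -> list:
      # Most recent transactions touching the address, as JSON.
      return requests.get(f"{BASE}/address/{addr}/txs", timeout=30).json()

  for tx in address_txs("bc1q-placeholder-address"):
      when = tx.get("status", {}).get("block_time")
      print(tx["txid"], when, len(tx["vin"]), "inputs", len(tx["vout"]), "outputs")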

With these tools, we are able to identify any other bitcoin addresses owned by the same wallet.

When we input the address into the explorer, the displayed data includes transaction records, each with specific information like dates and the amounts sent or received. Notably, one of the transactions received funds from an unfamiliar sender (address beginning with “06f”), allowing us to discern the shared ownership of these addresses and subsequently unveil the complete wallet.
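
That “shared ownership” inference is the classic common-input-ownership (co-spend) heuristic: addresses spent together as inputs of one transaction are very likely controlled by the same wallet. A toy version over the Esplora-style transaction JSON from the previous sketch:

  def cospent_addresses(txs: list) -> set:
      # Collect every address that appears alongside others as a transaction input.
      cluster = set()
      for tx in txs:
          inputs = {vin["prevout"].get("scriptpubkey_address")
                    for vin in tx["vin"] if vin.get("prevout")}
          inputs.discard(None)
          if len(inputs) > 1:        # multiple inputs spent together
              cluster |= inputs
      return cluster

  # cluster = cospent_addresses(address_txs("bc1q-placeholder-address"))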

With a transaction history dating back to 2019, we now have a time frame that matches our investigation. Let’s proceed to scrutinize the individual transactions associated with each of these Bitcoin addresses.

These two sites are related since their bitcoin addresses come from the same wallet, confirming that the individuals behind them are the same.

The transaction history shows how funds have moved in and out of the address, potentially revealing patterns or connections to other addresses.

Most of the transactions paid into these accounts resemble normal transactions when viewed on the blockchain. However, upon further examination, some transactions involve multiple addresses, possibly indicating the use of a bitcoin mixing service. This is normal, as many actors use a mixing service, or cryptocurrency tumbler, to guarantee anonymity by essentially scrambling the addresses and the payments made.


The bitcoin address likely belongs to an exchange or a well-used bitcoin tumbling service, which would explain the large number of bitcoin addresses held in its wallet and its ability to essentially scramble transactions.

Here, we have successfully conducted research to establish the relationship between the two sites, confirming that they are indeed owned by the same person. Additionally, we have tracked down the market administration. However, it’s important to note that using open-source information on the dark web and the blockchain can only take you so far.

Conclusion

In wrapping up, our journey through this investigation has really highlighted the fascinating process of gathering intelligence. We delved deep, carefully analyzing data and drawing on various tools: open-source intelligence (OSINT) sources, blockchain explorers, and web-scraping techniques. It’s remarkable how piecing together information from different places can uncover connections between seemingly unrelated things and reveal shady activities. I hope you’ve learned something from this. Until next time.