Cyber Pulse Academy


Wordlist Scanning (T1595.003)

Ultimate Guide to Active Scanning - Wordlist Scanning: Attack & Defense


Wordlist Scanning is a targeted reconnaissance technique where attackers systematically test a list of common or likely words against a target system to discover hidden resources, files, directories, or parameters that aren't publicly linked. This practical guide transforms the official MITRE data into actionable intelligence for both red and blue teams.






Understanding Wordlist Scanning in Simple Terms

Imagine a thief checking every window and door in a large office building at night. They don't know which ones are unlocked, but they have a list of every possible entrance: main doors, fire exits, loading bays, even old windows. They methodically try each one until they find an entry point. That's Wordlist Scanning in the digital world.


Instead of physical doors, attackers use automated tools to try thousands of common names for web pages (/admin), directories (/backup), files (/config.json), or API endpoints (/api/v1/users). These resources are often hidden from public navigation but still accessible if you know the exact address. The attacker's "wordlist" is their digital skeleton key ring, filled with keys for common locks.



Decoding the Jargon: Key Terms for Wordlist Scanning

| Term | Simple Definition | Why It Matters |
| --- | --- | --- |
| Wordlist / Dictionary | A text file containing a list of possible directory or file names (e.g., "admin", "backup", "test", "config"). | The attacker's ammunition. Quality wordlists (like common.txt or directory-list-2.3-medium.txt) dramatically increase success rates. |
| HTTP Status Code | A 3-digit number returned by a web server (200=OK, 404=Not Found, 403=Forbidden, 301=Redirect). | Attackers filter results based on these codes. A "200" on /backup.zip is a major find. A "403" might still indicate a protected resource worth targeting. |
| Brute-force vs. Smart Scanning | Brute-force tries every possible character combination; smart scanning uses curated, context-aware wordlists. | Modern Wordlist Scanning is "smart", using lists tailored to CMS (WordPress, Joomla), frameworks, or specific industries. |
| Resource Discovery | The goal of the technique: uncovering hidden but accessible assets. | This is the critical first step that leads to data leaks, admin panel access, or vulnerable components. |
| Rate Limiting | A defense that blocks an IP after too many rapid requests. | Forces attackers to scan slowly ("low and slow") to avoid detection, turning a 5-minute scan into a 5-day campaign. |
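
The effect of rate limiting on scan time is simple arithmetic. The sketch below is illustrative only: `estimate_scan_seconds` is a hypothetical helper, the wordlist size and request rates are made-up numbers, and it assumes the scanner sends one request per bare word plus one per word/extension pair.

```python
def estimate_scan_seconds(words: int, extensions: int, requests_per_second: float) -> float:
    """Rough scan duration: one request per bare word plus one per word/extension pair."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    total_requests = words * (1 + extensions)
    return total_requests / requests_per_second


# Illustrative numbers only: a 200,000-word list with 3 extensions.
fast = estimate_scan_seconds(200_000, 3, 100)  # unthrottled scanner
slow = estimate_scan_seconds(200_000, 3, 2)    # throttled to stay under a rate limit
print(f"fast: {fast / 60:.0f} minutes, slow: {slow / 86_400:.1f} days")
```

Under these assumptions the same wordlist takes a couple of hours at 100 requests per second and several days at 2, which is why a well-tuned rate limit changes the economics of a scan even when it cannot stop it.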

The Attacker's Playbook: Executing Wordlist Scanning

Step-by-Step Breakdown

An attacker doesn't just run a tool and get lucky. Professional Wordlist Scanning follows a deliberate process:

  1. Target Selection & Scope: Identify the target (e.g., https://target-company.com). Use passive reconnaissance (T1592) to gather tech stack hints (is it WordPress? An API server?).
  2. Wordlist Curation: Choose or craft the right wordlist. A generic list is a start, but a list for "WordPress 6.0 directories" or "common API endpoints" is far more effective.
  3. Tool Configuration: Set up the scanner (like gobuster or ffuf). Configure:
    • Target URL
    • Wordlist path
    • Request rate (to avoid tripping alarms)
    • Extensions to try (e.g., .php, .bak, .zip)
    • Filter rules (ignore 404s, flag 200s and 403s)
  4. Execution & Evasion: Run the scan, often using proxies, Tor, or cloud VMs to hide the source IP. "Low and slow" is the mantra against vigilant defenders.
  5. Analysis & Triage: Review the output. A discovered /phpinfo.php is a critical find. /backup_2023.sql is a potential goldmine. These become targets for the next phase (Initial Access or Collection).
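
Steps 3 and 5 above can be sketched without touching the network. This is a simplified illustration, not how any particular scanner is implemented: `candidate_paths`, `triage`, and the `INTERESTING` mapping are hypothetical names, and the responses are simulated rather than fetched.

```python
from typing import Iterable, Iterator

# Status codes a scanner operator typically reviews (an assumption for this sketch).
INTERESTING = {200: "found", 301: "redirect", 302: "redirect", 403: "exists but protected"}


def candidate_paths(words: Iterable[str], extensions: Iterable[str]) -> Iterator[str]:
    """Yield /word and /word.ext for every word, skipping blanks and comment lines."""
    exts = [e.lstrip(".") for e in extensions]
    for raw in words:
        word = raw.strip()
        if not word or word.startswith("#"):
            continue
        yield f"/{word}"
        for ext in exts:
            yield f"/{word}.{ext}"


def triage(results: Iterable[tuple[str, int]]) -> list[tuple[str, str]]:
    """Keep only responses worth a manual look; 404s and other codes are dropped."""
    return [(path, INTERESTING[status]) for path, status in results if status in INTERESTING]


print(list(candidate_paths(["admin", "", "backup"], [".bak", "zip"])))

# Simulated responses stand in for real HTTP requests in this sketch.
simulated = [("/admin", 403), ("/admin.bak", 404), ("/backup.zip", 200)]
for path, verdict in triage(simulated):
    print(path, verdict)
```

Defenders can use the same expansion logic to predict which paths a given wordlist would probe on their own sites.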

Red Team Analogy & Mindset

Think like a burglar casing a building. You're not breaking in yet. You're checking for an unlocked side door, an open window on the second floor, or a dumpster with blueprints. Your goal is to find the easiest, least conspicuous point of entry. In red teaming, wordlist scanning is that crucial "casing" phase. You're mapping the attack surface that the blue team might have forgotten. Success isn't just about finding something; it's about finding something useful without getting caught.


Tools & Command-Line Examples

  • ffuf (Fuzz Faster U Fool): Modern, fast, highly customizable. The current tool of choice.
  • gobuster: Popular for directory and DNS brute-forcing. Simple and effective.
  • Dirb/Dirbuster: Older but reliable, with built-in wordlists.

Example ffuf Command:

ffuf -w /usr/share/wordlists/dirb/common.txt -u https://target.com/FUZZ -t 50 -mc 200,403 -rate 100

# Breakdown:
# -w : Path to wordlist
# -u : Target URL with "FUZZ" where words get inserted
# -t : Number of concurrent threads
# -mc : Match HTTP status codes 200 (OK) and 403 (Forbidden)
# -rate : Requests per second (keep low to evade detection)

Example gobuster Command:


gobuster dir -u https://api.target.com -w ./api-wordlist.txt -x php,json,bak -t 20 --delay 200ms

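# dir : Directory/file enumeration mode
# -u : Target base URL
# -w : Path to wordlist
# -x : File extensions to append to each word
# -t : Number of concurrent threads
# --delay : Time each thread waits between requests (200ms here), which slows the scan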

Real-World Campaign Example

Advanced Persistent Threat (APT) groups routinely use Wordlist Scanning in their reconnaissance phases. For example, APT29 (Cozy Bear), associated with Russian intelligence, has been documented using automated scanning tools to discover exposed services and vulnerable web applications as a precursor to targeted attacks.


In a campaign analyzed by Mandiant, the group conducted extensive internet-wide scanning to identify vulnerable VPN appliances and external web servers of targeted organizations in the government and healthcare sectors. This scanning, which included wordlist-style discovery of specific management portals and APIs, enabled them to build a target list for subsequent exploitation.


This highlights that even sophisticated state-sponsored actors rely on this fundamental technique. Done carefully, it is neither "noisy" nor "amateur"; it is effective intelligence gathering.


The Defender's Handbook: Stopping Wordlist Scanning

Blue Team Analogy & Detection Philosophy

Think like a building security guard watching a thousand doors on a monitor. A single person trying one door is normal. Someone methodically trying every door, in sequence, at a steady pace, is suspicious. Your job isn't to stop the first knock, but to detect the pattern of systematic testing.


The defender's philosophy for catching Wordlist Scanning hinges on two concepts: volume and failure rate. Legitimate users request known, linked resources. Attackers request many unknown, non-existent resources (resulting in 404s) with occasional, surprising successes (200s on obscure paths). Your security controls must spot this anomalous behavior.
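
The two signals, volume and failure rate, can be computed per client from any source of (IP, status) pairs. The following is a minimal sketch: the function names and the thresholds (100 requests, 80% failures) are assumptions to tune against your own baseline, and the IPs come from documentation ranges.

```python
from collections import defaultdict


def failure_profile(events):
    """Map each client IP to (total requests, share of responses that were 404)."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for ip, status in events:
        totals[ip] += 1
        if status == 404:
            misses[ip] += 1
    return {ip: (totals[ip], misses[ip] / totals[ip]) for ip in totals}


def suspects(events, min_requests=100, min_404_ratio=0.8):
    """IPs that are both high-volume and mostly failing. Thresholds are assumptions."""
    return sorted(
        ip for ip, (total, ratio) in failure_profile(events).items()
        if total >= min_requests and ratio >= min_404_ratio
    )


events = (
    [("203.0.113.7", 404)] * 190 + [("203.0.113.7", 200)] * 10      # scanner-like
    + [("198.51.100.2", 200)] * 300 + [("198.51.100.2", 404)] * 5   # normal visitor
)
print(suspects(events))
```

Requiring both conditions keeps a busy but legitimate client (high volume, low failure rate) and a single mistyped URL (high failure rate, low volume) out of the result.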


SOC Reality Check: What to Look For

In your SIEM, you'll rarely see an alert titled "WORDLIST SCAN DETECTED." Instead, you see correlated events:

  • Web Server Logs: A spike in 404 "Not Found" errors from a single IP address.
  • WAF Alerts: Requests for commonly probed paths (e.g., /wp-admin, /phpmyadmin) from non-internal IPs, particularly on sites that do not run that software.
  • Rate Limiting Triggers: Your CDN or edge firewall blocking an IP for too many requests.
  • The Clincher: The same IP that generated 1000 404s suddenly gets a 200 OK on /backup/prod.db. This is the "smoking gun" sequence.
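
The "smoking gun" sequence is easy to express in code once events are ordered by time. This is a sketch under stated assumptions: `smoking_guns` is a hypothetical helper, events arrive as (ip, path, status) tuples in chronological order, and the default threshold of 100 prior 404s is arbitrary.

```python
def smoking_guns(events, miss_threshold=100):
    """Return (ip, path) pairs where a 200 follows at least `miss_threshold` 404s.

    `events` is an iterable of (ip, path, status) tuples in chronological order.
    """
    misses = {}
    hits = []
    for ip, path, status in events:
        if status == 404:
            misses[ip] = misses.get(ip, 0) + 1
        elif status == 200 and misses.get(ip, 0) >= miss_threshold:
            hits.append((ip, path))
    return hits


events = [("203.0.113.7", f"/guess{i}", 404) for i in range(1000)]
events += [("203.0.113.7", "/backup/prod.db", 200), ("198.51.100.2", "/index.html", 200)]
print(smoking_guns(events))
```

Each returned path is something the scanner actually found, so it doubles as a prioritized list of resources to lock down or remove.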

Threat Hunter's Eye: Practical Query

Here is a Sigma rule you can adapt for detecting potential wordlist scanning activity. It looks for a source IP generating 404 responses for a high number of distinct paths within a short timeframe, a classic signature of this technique. Treat the threshold as a starting point and tune it against your own traffic.

# Sigma Rule: Potential Directory/Wordlist Brute-Force Scanning
# Author: MITRE ATT&CK Field Guide
# Reference: T1595.003 - Wordlist Scanning
# Note: the "| count(...)" aggregation below is legacy Sigma condition syntax.
title: High Volume of 404 Errors from Single Source
id: a1b2c3d4-5678-40ef-8a1b-c2d3e4f5a6b7  # placeholder; generate your own UUID
status: experimental
description: Detects a source IP requesting an excessive number of distinct URIs that return HTTP 404, which may indicate directory or file brute-forcing (wordlist scanning).
author: Your Blue Team
logsource:
    category: webserver
detection:
    selection:
        sc-status: 404
    timeframe: 5m
    # Count distinct requested paths per client IP within the timeframe
    condition: selection | count(cs-uri-stem) by c-ip > 150
fields:
    - c-ip
    - cs-host
    - cs-uri-stem
falsepositives:
    - Web crawlers (bots) with broken logic
    - Legacy clients requesting old resources
    - Penetration testing activity
level: medium
tags:
    - attack.reconnaissance
    - attack.t1595.003

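You can convert this Sigma rule to your specific SIEM (Splunk, Elasticsearch, Microsoft Sentinel, formerly Azure Sentinel) with a Sigma converter. The legacy sigmac tool understands aggregation conditions of this form; its successor, sigma-cli (built on pySigma), expresses count thresholds as correlation rules, so the rule may need restructuring there.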
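
If you want to prototype the same logic before committing to a SIEM query, it can be approximated in a few lines of Python. This is a sketch, not a translation produced by any Sigma tool: `scan_alerts` is a hypothetical function, it uses a sliding window (a backend may use fixed time buckets instead), and it assumes events are already sorted by timestamp.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def scan_alerts(events, window=timedelta(minutes=5), threshold=150):
    """Return IPs that exceed `threshold` distinct 404 paths inside any `window`.

    `events` is an iterable of (timestamp, ip, path, status), sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> (timestamp, path) of 404s still in the window
    alerted = set()
    for ts, ip, path, status in events:
        if status != 404:
            continue
        q = recent[ip]
        q.append((ts, path))
        while q and ts - q[0][0] > window:
            q.popleft()
        # Recomputing the set per event is fine for a sketch, not for high volume.
        if len({p for _, p in q}) > threshold:
            alerted.add(ip)
    return alerted


start = datetime(2024, 1, 1, 12, 0, 0)
fast = [(start + timedelta(seconds=i), "203.0.113.7", f"/w{i}", 404) for i in range(200)]
slow = [(start + timedelta(minutes=i), "198.51.100.2", f"/w{i}", 404) for i in range(200)]
print(scan_alerts(sorted(fast + slow, key=lambda e: e[0])))
```

The second client in the example sends the same 200 guesses at one per minute and is never flagged, which shows why a short-window threshold alone does not catch "low and slow" scans.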


Key Data Sources for Detection

  • Web Server Access Logs (Apache, Nginx, IIS): The primary source. Ensure you log: Client IP, Timestamp, Request Method, URI, Status Code, User-Agent.
  • Web Application Firewall (WAF) Logs: Captures blocked requests and threat intelligence-based alerts for known malicious paths.
  • CDN/Proxy Logs (Cloudflare, Akamai): Often see the scanning traffic first and can provide rate-limiting data.
  • Network Security Monitoring (NSM) / IDS: Tools like Zeek (formerly Bro) can generate http.log files perfect for this analysis.
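
To work with access logs outside a SIEM, the fields listed above first have to be extracted from each line. The sketch below assumes the widely used Combined Log Format of Apache and Nginx; `LOG_PATTERN` and `parse_line` are hypothetical names, the sample line is invented, and custom log formats will need a different pattern.

```python
import re
from datetime import datetime

# Combined Log Format: ip ident user [time] "method uri proto" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one Combined Log Format line into a dict, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["status"] = int(record["status"])
    # Month abbreviations are parsed with the current locale (English assumed here).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
          '"GET /backup.zip HTTP/1.1" 404 153 "-" "Mozilla/5.0"')
print(parse_line(sample))
```

Lines that do not match, such as malformed requests, return None and should be counted separately rather than silently dropped, since scanners sometimes produce them.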

Building Resilience: Mitigation Strategies for Wordlist Scanning

Actionable Mitigation Controls

Don't just try to detect scanning; make the scan fruitless and obvious.

  • Remove Unnecessary Resources: The single best mitigation. If there's no hidden /backup folder, it can't be found. Conduct regular asset inventories and purge old test files, development pages, and forgotten admin portals from production.
  • Implement Strong Access Controls: For legitimate but sensitive resources (e.g., /admin, /api/internal), enforce authentication before the application layer. Use network segmentation or allowlisting so they are inaccessible from the public internet.
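  • Deploy a Web Application Firewall (WAF): Configure it to rate-limit requests from a single IP, especially those resulting in 404s. Many WAFs also ship scanner-detection rules that flag the default signatures of common scanning tools. Note that "Directory Traversal" signatures target a different attack (../ path manipulation) and will not catch wordlist scanning on their own.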
  • Obfuscate with Care: While not a security control, using non-obvious directory names (e.g., /c4f8e9j3/ instead of /admin/) can defeat generic wordlists. Do not rely on this alone (security through obscurity).
  • Monitor and Respond: Have a playbook for when scanning is detected. This can range from simply blocking the IP at the firewall to initiating enhanced monitoring on the resources they discovered.
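
The idea of rate-limiting specifically on 404s can be sketched as a small in-memory tracker. This is an illustration of the concept, not the behavior of any specific WAF: `NotFoundThrottle` is a hypothetical class, the limits are arbitrary defaults, and a production deployment would need shared state and handling for clients behind a common NAT or proxy.

```python
import time
from collections import defaultdict, deque


class NotFoundThrottle:
    """Block a client once it produces too many 404s within a sliding window."""

    def __init__(self, max_misses=20, window_seconds=60.0, clock=time.monotonic):
        self.max_misses = max_misses
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._misses = defaultdict(deque)  # ip -> timestamps of recent 404s

    def record(self, ip, status):
        """Call after each response; only 404s count against the client."""
        if status == 404:
            self._misses[ip].append(self.clock())

    def is_blocked(self, ip):
        """True while the client has more than `max_misses` 404s in the window."""
        q = self._misses.get(ip)
        if not q:
            return False
        cutoff = self.clock() - self.window_seconds
        while q and q[0] < cutoff:
            q.popleft()
        return len(q) > self.max_misses


throttle = NotFoundThrottle(max_misses=3, window_seconds=10)
for _ in range(4):
    throttle.record("203.0.113.7", 404)
print(throttle.is_blocked("203.0.113.7"))
```

Counting only 404s means ordinary visitors browsing linked pages never approach the limit, while a scanner guessing paths reaches it quickly.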

Red vs. Blue: A Quick Comparison

Attacker's Goal (Red Team) Defender's Action (Blue Team)
Discover hidden, accessible assets (files, directories, panels). Minimize the attack surface by removing unnecessary public assets.
Remain stealthy by scanning slowly ("low and slow") to evade rate limits. Detect patterns, not just volume, by looking for high 404 rates over extended periods.
Use targeted wordlists based on the identified technology stack. Harden specific technologies and monitor logs for requests to common vulnerable paths for your stack.
Leverage discovered resources for initial access or data collection. Protect necessary resources with strong authentication and network controls, making discovery irrelevant.

Wordlist Scanning Cheat Sheet

🔴 Red Flag

A single external IP address generating hundreds of HTTP 404 (Not Found) errors within minutes, especially if followed by a successful 200 (OK) on an obscure path like /archive.zip or /phpinfo.php.

🛡️ Blue's Best Move

Conduct a quarterly "forgotten asset" hunt. Use the same scanning tools (like ffuf) against your own public-facing assets. Find and remove or protect what shouldn't be publicly accessible before an attacker does.

🔍 Hunt Here

Web server access logs. Correlate client IPs with status code counts. Build a dashboard tracking the top sources of 404 errors, and investigate any of those IPs that also received successful (200) responses for pages not linked from the public site.


Conclusion and Next Steps

Wordlist Scanning remains a foundational and highly effective reconnaissance technique precisely because it works. Defenders often overlook the "forgotten" parts of their public-facing infrastructure, creating opportunities for persistent attackers.

Moving from theory to practice is key:

  • For Blue Teams: Run the provided Sigma rule in your environment. Schedule a monthly review of your top 404-generating IPs. Most importantly, scan yourself to find and fix hidden assets.
  • For Red Teams: Hone your skills with ffuf or gobuster. Practice crafting targeted wordlists and operating stealthily to evade common detection rules.

To deepen your understanding of the reconnaissance phase, explore related techniques such as T1592 (Gather Victim Host Information) and T1589 (Gather Victim Identity Information).



Remember, in cybersecurity, what you don't know can hurt you. Shine a light on your dark corners before an adversary does.

