From Docker Logs to hono-honeypot: 200 Attack Patterns

hono-honeypot is an open-source Hono middleware that blocks vulnerability scanners, bot crawlers, and brute-force probes before they reach application logic. The 200 plus patterns inside it were not curated from a blog roundup. They came from months of watching docker compose logs stream across the production apps I run, writing regexes against the noise, and letting scanner traffic teach me what the internet throws at a fresh box in 2026. This post is how the package got to v1.2.2, what the logs showed, and why 410 beats 404.

TL;DR

I keep two or three terminals open on production boxes watching docker compose logs in real time. Over months of that practice, the attack-pattern dataset wrote itself.

Shipped the result as hono-honeypot on npm, MIT licensed, zero runtime dependencies, v1.2.2 as of April 2026.

200 plus regex patterns, sub-millisecond matching, works on every Hono runtime: Cloudflare Workers, Bun, Deno, Node.js, Vercel Edge, Fastly Compute.

Default response is 410 Gone, not 404. Search engines de-index 410 URLs faster.

Optional strike and ban store for IP-level throttling; MemoryStore ships built in, HoneypotStore interface lets you wire Redis or any KV.

Why I watched Docker logs for months in the first place

The observability practice on every production app I operate is deliberately unglamorous: two or three terminals pinned to docker compose logs -f, scrolled in real time, watched like a dashboard. Not log aggregation, not Loki, not a hosted SIEM. Plain stdout, coloured by the access-log format I wrote into the proxy layer of each app: IP, Cloudflare country code, method, path, status, latency, user agent. One line per request. My bio covers the 15-year arc behind the habit. When you actually read every line your box serves for a week, the legitimate traffic gets boring and the noise floor becomes an attacker census.

A small VPS behind Cloudflare receives hundreds of thousands of scanner requests per day, all of them probing for software the box does not run. PHP scanners sweep for .php endpoints on a Next.js box. Mirai descendants hammer /HNAP1/ and /boaform/ against a site that was never a router. Spring Boot Actuator probes hit a TypeScript app. Laravel Ignition payloads keep arriving in 2026. The volume and taxonomy of scanner traffic on an anonymous small site in a low-prestige IP range is indistinguishable from what a Fortune 500 company sees at its edge. The attacker does not know or care what you built.

What the logs actually showed over months of reading

I kept a scratchpad of every probe category that appeared more than a handful of times. Over a few months the list stabilised into a taxonomy. The high-frequency categories were predictable: PHP and WordPress paths, admin panel enumeration, .git and .env discovery, framework config files (next.config.ts, nuxt.config.ts, vercel.json, netlify.toml, serverless.yml), version-control directories, dependency manifests (package.json, composer.json, Gemfile, requirements.txt), SSH and credential files (/.ssh/, /id_rsa, /.npmrc, /.aws/), and the ever-present .DS_Store.

The middle-frequency categories were the instructive ones. IoT and router exploits (/HNAP1/, /boaform/, /GponForm/) pour in from Mirai-family botnets. Microsoft Exchange and SharePoint webshell paths (/owa/, /ecp/, /_layouts/, /_vti_bin/) stay active years after the headline CVEs. Self-hosted app probes (/nextcloud/, /owncloud/, /WebInterface/ for CrushFTP) track the vulnerability cycle of popular self-hosted software. Observability and internal-tool endpoints (/grafana/, /kibana/, /prometheus/, /jira/, /confluence/, /geoserver/) get swept on the assumption that if the app is popular, it is exposed. CI/CD surfaces (/jenkins/, /portainer/, /gitea/, /gitlab/) plus Kubernetes and container discovery (/metrics, /healthz, /readyz, /livez, /.dockerenv) come through continuously.

The low-frequency but high-signal categories were the ones most roundups miss. Vite dev-server arbitrary file reads (CVE-2025-30208) show up as /@fs/, /@vite/, /@id/ probes against production deploys that never shipped a dev server. SSRF cloud-metadata attempts (169.254.169.254, /latest/meta-data) test whether the origin will proxy them to an EC2 metadata endpoint. Java enterprise surfaces (/WEB-INF, /manager/html, /solr, /actuator) arrive in rotation. Laravel and Django debug endpoints (/_ignition, /__debug__) are tried in case someone shipped with debug on. And the most entertaining category is brute-force directory tokens: /old, /new, /test, /demo, /script, /2017, /2024, on the theory that somewhere behind one of those is an unprotected staging environment.

Once the taxonomy stabilised, the package wrote itself. Every category became a regex group, with smart anchoring to keep false positives off the legitimate surface. /admin at the root gets blocked; /api/admin does not. The dataset came from requests against a paid API product and a handful of client dashboards I operate; the taxonomy generalises well beyond that because scanners do not target specific tech stacks.
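The anchoring idea can be sketched in a few lines. This is a minimal illustration, not the package's actual ruleset: the `PROBE_PATTERNS` array and the `isProbe` helper are hypothetical names, and the four regexes are simplified stand-ins for the real categories.

```typescript
// Hypothetical patterns for illustration only; the real ruleset is larger
// and its regexes may be shaped differently.
const PROBE_PATTERNS: RegExp[] = [
  /^\/admin(?:\/|$)/i,                      // /admin only when it is the first path segment
  /\.php(?:\/|$)/i,                         // .php endpoints at any depth
  /^\/\.(?:git|env|ssh|aws)\b/i,            // dotfile discovery at the root
  /^\/(?:HNAP1|boaform|GponForm)(?:\/|$)/i, // router and IoT exploit paths
];

// Returns true when the request path matches any probe pattern.
function isProbe(path: string): boolean {
  return PROBE_PATTERNS.some((re) => re.test(path));
}

console.log(isProbe("/admin"));     // root-level admin panel probe
console.log(isProbe("/api/admin")); // legitimate namespaced route
```

The `^` anchor is what keeps `/api/admin` off the block list: the pattern only fires when `admin` is the first segment, so an application route that happens to contain the word further down the path passes through.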

Why 410 Gone beats 404 for honeypot responses

The default response in hono-honeypot is 410 Gone, not 404 Not Found. The choice is not stylistic. It is a deliberate, under-used signal you can send to both a search engine and a scanner.

A 404 tells a crawler "this path does not exist right now, try again later". The URL stays in the crawler queue and keeps tying up crawl budget. A 410 says "this resource is gone and will not come back". Google de-indexes a 410 URL faster than a 404, which is exactly what you want for an attack path that some careless bot decided to probe on your origin. The fewer honeypot URLs live in crawl queues, the less noise you see over time.

The body is empty. An empty response saves bandwidth across millions of probes and removes any ability for a scanner to fingerprint the honeypot via error-page content. A known-bot user agent gets the same 410 so legitimate crawlers see the same "gone" signal, but they do not accumulate strike counts; banning a real Googlebot or ClaudeBot IP would torch a site's SEO and AI-citation surface overnight. The existing Next.js proxy on this site runs the same 410-with-bot-allowlist pattern, and the same reasoning sits behind my write-up on whether llms.txt actually matters for a personal brand.
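As a rough sketch of that decision, assuming a simple user-agent allowlist: the names `KNOWN_BOT_UA`, `ProbeResponse`, and `respondToProbe` are hypothetical, and the package's own allowlist and option names may differ.

```typescript
// Hypothetical allowlist; the real list of crawlers is longer.
const KNOWN_BOT_UA = /\b(?:Googlebot|Bingbot|ClaudeBot|GPTBot|PerplexityBot)\b/i;

interface ProbeResponse {
  status: number;       // 410 unless the deployment overrides it
  body: null;           // always empty, so there is nothing to fingerprint
  countStrike: boolean; // false for allowlisted crawlers
}

// Every matched probe gets the same status and empty body.
// Only the strike accounting differs for known crawlers.
function respondToProbe(userAgent: string | undefined, status = 410): ProbeResponse {
  const isKnownBot = userAgent !== undefined && KNOWN_BOT_UA.test(userAgent);
  return { status, body: null, countStrike: !isKnownBot };
}
```

User-agent strings can be spoofed, so this only shows the control flow: the response is identical for everyone, and the allowlist affects nothing but whether a strike is recorded.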

The strike store: optional, stateless by default

The middleware is stateless out of the box. Every matched probe gets a 410, no memory of who sent it, no throttling. Stateless is the right default because a stateful security layer with no fallback becomes the site's weakest link when its backing store hiccups.

Stateful mode is opt-in. The HoneypotStore interface is small, the package ships a MemoryStore for single-node deployments, and the README documents a Redis example using INCR plus EXPIRE for the strike window and SETEX for the ban TTL. Three strikes inside an hour ships as the default threshold; two was too aggressive for noisy NAT IPs, four let the bad actors stick around too long. IP extraction is a chain: cf-connecting-ip first, then the first address in x-forwarded-for, then x-real-ip, then 'unknown'. Unknown IPs never accumulate strikes because tracking them would ban every request behind a header-stripping proxy.
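The header chain described above could look roughly like this. The `HeaderLookup` type and `extractClientIp` function are illustrative names, not the package's exports.

```typescript
// Hypothetical helper; takes any function that returns a header value by name.
type HeaderLookup = (name: string) => string | null | undefined;

function extractClientIp(getHeader: HeaderLookup): string {
  // 1. Cloudflare's client IP header wins when present.
  const cf = getHeader("cf-connecting-ip")?.trim();
  if (cf) return cf;

  // 2. Otherwise take the first (left-most) address in x-forwarded-for.
  const forwarded = getHeader("x-forwarded-for")?.split(",")[0]?.trim();
  if (forwarded) return forwarded;

  // 3. Then x-real-ip.
  const real = getHeader("x-real-ip")?.trim();
  if (real) return real;

  // 4. Nothing usable: callers should skip strike tracking for this value.
  return "unknown";
}
```

In a Hono handler the lookup can be as small as `(name) => c.req.header(name)`. Keeping the function independent of the framework also makes the chain easy to unit test.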

Stateful mode buys one thing: a scanner IP that burns through ten patterns in a minute gets 24 hours of 410 with no real work on your origin. Stateless mode already answers in sub-millisecond regex time; stateful mode removes even that for 24 hours after a ban fires. At high-volume origins it compounds; at low-volume ones the memory store is enough.
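A minimal in-memory sketch of the strike window and ban TTL follows. `StrikeTracker` is a hypothetical class written for this post; it is not the package's MemoryStore and does not implement the HoneypotStore interface. The defaults mirror the numbers above: three strikes, a one-hour window, a 24-hour ban.

```typescript
// Hypothetical tracker; the clock is injectable so the logic can be tested.
class StrikeTracker {
  private strikes = new Map<string, { count: number; windowEnds: number }>();
  private bans = new Map<string, number>(); // ip -> ban expiry in ms
  private maxStrikes: number;
  private windowMs: number;
  private banMs: number;
  private now: () => number;

  constructor(
    maxStrikes = 3,
    windowMs = 60 * 60 * 1000,
    banMs = 24 * 60 * 60 * 1000,
    now: () => number = () => Date.now(),
  ) {
    this.maxStrikes = maxStrikes;
    this.windowMs = windowMs;
    this.banMs = banMs;
    this.now = now;
  }

  isBanned(ip: string): boolean {
    const until = this.bans.get(ip);
    if (until === undefined) return false;
    if (until > this.now()) return true;
    this.bans.delete(ip); // ban expired, clean up lazily
    return false;
  }

  // Records one strike; returns true when this strike triggers a ban.
  recordStrike(ip: string): boolean {
    if (ip === "unknown") return false; // never track unidentifiable clients
    const t = this.now();
    const entry = this.strikes.get(ip);
    // Start a fresh window if there is none or the old one has expired.
    const current = entry !== undefined && entry.windowEnds > t
      ? entry
      : { count: 0, windowEnds: t + this.windowMs };
    current.count += 1;
    if (current.count >= this.maxStrikes) {
      this.strikes.delete(ip);
      this.bans.set(ip, t + this.banMs);
      return true;
    }
    this.strikes.set(ip, current);
    return false;
  }
}
```

A request handler would call `isBanned` first and short-circuit with the configured status, then call `recordStrike` only when a pattern matches and the client is not an allowlisted crawler. A Redis-backed version would replace the two maps with a counter key that expires with the window and a ban key that expires with the ban TTL.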

Runtime shape: why Hono, and why every runtime

Hono was the right substrate because its middleware model runs the same code on Cloudflare Workers, Bun, Deno, Node.js, Vercel Edge, and Fastly Compute. A developer who deploys one app on Workers and another on a Bun box should not have to fork their security middleware. Zero runtime dependencies is the same decision: every runtime has its own pathology for transitive deps, and shipping a dep graph that breaks Fastly Compute but works on Workers is exactly the rough edge users do not debug. Hono is also still the tightest web framework for edge runtimes in 2026, and its middleware surface is small enough to audit in an afternoon.

I shipped on Hono rather than Next.js because this site already runs the Next.js version in proxy.ts, built with the same operator mindset that defends SQLite for a solo founder. Porting the idea into a Hono package reaches a much larger audience than keeping it as a private file. The two are the same idea shaped for two different runtimes: both run on Node-level APIs, both log in a human-readable access-log format, and the npm package is the version any Hono developer can drop in with three lines.

The `AGENTS.md` file and why it ships with the package

One detail most roundups never mention: the package ships an AGENTS.md at the repo root. It is short, written for an AI coding agent working on the package, and it encodes the rules the maintainer would otherwise have to restate every time Claude Code or Cursor opens the repo. Which files are load-bearing. Which parts are free to change without a version bump. How the test matrix maps to the runtime list. How to add a pattern category without widening the false-positive surface.

In 2026, shipping AGENTS.md in an open-source npm package is not a courtesy. Agents are now a plurality of the readers opening a production repo for the first time, and a repo that encodes its own rules gets better pull requests and fewer naive regressions. The model does not have to read the whole history to make the right call; the file tells it where the landmines are. Every npm package I ship now includes one.

What I would do differently in 2027

Three items are queued. First, richer pattern metadata: every regex carrying a category tag, a first-observed date, and a reference CVE where one applies, so users can filter the ruleset by recency or severity. Second, per-pattern telemetry hooks: a structured event when a regex fires, so users can wire their own analytics without parsing stdout. Third, content-type-specific response helpers (empty body for HTML probes, a terse JSON blob for API probes), so users do not have to wire their own. None ship in v1.2.2 because the core job, blocking 200 plus patterns at sub-millisecond speed with a sane default, has to stay the priority. Optional complexity costs more than it delivers when most users install and forget.

The real limit of the log-watching method is that it is retrospective. A novel attack class shows up in the noise before you know what to call it, and by the time a regex exists the CVE may already have its own Wikipedia page. That is a feature: patterns that survive my scratchpad are patterns scanners still use. The week scanners stop sending Mirai-family /HNAP1/ traffic is the week that regex retires. None have retired yet.

FAQ

What is `hono-honeypot` and when should I use it?

hono-honeypot is an open-source Hono middleware that blocks 200 plus attack patterns before your app handler runs. Install it on any Hono app you deploy to a public URL, from a Cloudflare Worker to a Node container. It is useful anywhere scanner noise currently reaches your app code: unnecessary compute, polluted access logs, inflated error budgets. The install is three lines, the default is stateless, and the package has zero runtime dependencies so it drops into Workers and Fastly Compute without ceremony.

How did you pick the 200 plus patterns in the middleware?

The patterns came from months of reading docker compose logs on production boxes I run, categorising recurring probe paths, and testing regex candidates against a corpus of real scanner traffic. No roundup, no curated OWASP list. Every category in the ruleset was in the access log of a small site enough times to earn its place. The shape stabilises fast because scanners recycle attack classes; once you have a month of noise, the categories barely move. I keep the list current by re-reading the logs on new CVEs.

Why does `hono-honeypot` return `410 Gone` instead of `404`?

A 410 Gone is a stronger signal than a 404 Not Found. Google and Bing de-index 410 URLs faster because the status explicitly says the resource will not return, so scanner-discovered URLs drop out of crawl queues quickly. A 404 leaves URLs in the queue for retry. The body is empty to save bandwidth and to deny scanners a fingerprint surface. The status is configurable if a specific deployment needs 404 or 403, but the default is deliberate and worth keeping.

Does the honeypot ever block real users or search engine crawlers?

The pattern set uses smart anchoring to avoid blocking legitimate app routes; /admin at the root is blocked, but /api/admin inside an actual admin API namespace is allowed. Known search and AI crawlers (Googlebot, Bingbot, ClaudeBot, GPTBot, Perplexity, and the rest of the 2026 allowlist) get the same 410 on honeypot paths so they de-index the probe URLs, but they never accumulate strikes and never get banned. Banning a real Googlebot IP would torch a site's SEO for 24 hours; the allowlist is the guardrail.

Do I need Redis to use the strike and ban feature?

No. The package ships with a MemoryStore that works for single-node deployments out of the box. The strike-and-ban system is opt-in; the default middleware is stateless. When you do want distributed banning across a load-balanced deployment, the HoneypotStore interface is two methods and the README includes a worked ioredis example using INCR, EXPIRE, and SETEX to implement the strike window and the ban TTL. Any KV store with atomic increment can back it: Redis, Cloudflare KV, Deno KV, or a custom adapter.

Why ship an `AGENTS.md` file in an open-source npm package?

Because in 2026, the plurality of contributors opening an open-source package for the first time are AI coding agents, and the repo that encodes its own rules gets better work done inside it. AGENTS.md lists the load-bearing files, the parts that are free to change, the test matrix across runtimes, and the rule for adding a pattern without widening false positives. The payoff is measurable in cleaner pull requests and drastically less context-gathering when Claude Code or Cursor opens the repo. Every npm package I ship now includes one by default.

Where is the source code and how do I contribute?

The package is MIT licensed and lives at github.com/ph33nx/hono-honeypot. Issues and pull requests are welcome. The repo layout is intentionally flat: patterns in one file, the middleware in another, the store interface in a third, runtime test matrices in a CI workflow. Reading the AGENTS.md at the root before opening a PR saves time; it documents which files are load-bearing and how the test matrix maps to the supported runtime list. Pattern contributions should come with a reference attack class and a note on false-positive risk before they land.