docs(wiki): add Anti-Track v2 page + sidebar entry

docs: anti-tracking v2 design — layered block/poison/anonymize
2026-06-30 10:00:52 +00:00 · 2026-06-17 15:15:28 +02:00 · 2026-06-17 15:15:28 +02:00
4 changed files with 455 additions and 0 deletions
--- a/docs/superpowers/specs/2026-06-17-anti-tracking-v2-design.md
+++ b/docs/superpowers/specs/2026-06-17-anti-tracking-v2-design.md
@@ -0,0 +1,332 @@
+# Anti-Tracking v2 — Layered Block / Poison / Anonymize
+
+- **Date:** 2026-06-17
+- **Package:** `secubox-toolbox` (+ small wires into `secubox-dns-guard`)
+- **Status:** Design approved, pending implementation plan
+- **Related:** `social_graph`/`social.py` cartography, `escalate.py` (#527),
+  `autolearn`, `protective_mode.py`, `ad_ghost.py`
+
+---
+
+## 1. Goal
+
+Give LAN clients behind the SecuBox transparent WAF a privacy layer that:
+
+1. **Minimises external attack surface** — blocks pure trackers at three depths
+   (DNS refuse → exclusive-IP nft-drop → HTTP 204).
+2. **Defeats profiling without breaking sites** — trackers that are load-bearing
+   (or whose blocking would reveal us as a blocker) are *poisoned*: the client
+   presents a **stable fabricated identity** the target accepts, so the tracker
+   builds a coherent profile of a person who does not exist.
+3. **Anonymises the residue** — always-on header hygiene (DNT/GPC, strip
+   operator/carrier headers) on every flow.
+
+Derived from operator intent: *"remove all of a page's external requests so that
+only requests to the requested site remain… use the domains of tracking cookies
+to blacklist… correlate the banned IPs"* (the block/surface side) and
+*"persistent cookies… fake content that the target still accepts… make noise…
+anonymise my traces and silence their relevance"* (the poison side).
+
+### Decisions locked during brainstorming
+
+| Question | Decision |
+|---|---|
+| Scope | One combined spec, both block+poison |
+| Default posture | Layered: **block pure, poison load-bearing, anonymize residue** |
+| Poison identity | **Stable fake per (client, tracker)** — coherent fiction, no rotation |
+| Noise | **Passive signal degradation only** — no new traffic (board-constrained) |
+| IP correlation | **Exclusive-tracker IPs only**, CDN/cloud allowlisted |
+| First-party-only | **Opt-in "Fort Knox" mode, per-site** (off by default) |
+
+---
+
+## 2. Architecture (Approach C)
+
+A new pure-Python **brain** (`privacy.py`) holds all policy and the fake-identity
+jar. One new thin **mitm addon** (`privacy_guard.py`) applies the brain's verdict
+in the hot path. The **offline pipeline** reuses the existing hourly `autolearn`
+timer and proven nft/DNS machinery — no new daemon.
+
+```
+┌── HOT PATH (mitm worker, per flow) ──────────────────────────────┐
+│  privacy_guard.py  (NEW thin addon)                              │
+│    request hook  → privacy.verdict(flow, client) → one of:       │
+│        allow | block(204) | poison | anonymize | fortknox-block  │
+│    response hook → rewrite Set-Cookie per the chosen fake jar    │
+│         └─→ privacy.py  (NEW brain, pure Python, unit-tested)    │
+│               • classify(host)      tracker? pure vs load-bearing│
+│               • fake_jar(client,trk) stable fabricated identity  │
+│               • verdict(...)        the layered policy           │
+│               • fortknox(site,host) same-registrable-domain?     │
+│               • reads: learned-trackers.txt, exclusive-ip set,   │
+│                        cdn-allowlist, filters.json toggles       │
+└──────────────────────────────────────────────────────────────────┘
+            ▲ reads lists                       │ records observations
+            │                                   ▼
+┌── OFFLINE PATH (hourly autolearn timer / existing store) ────────┐
+│  social_graph + store.py   → observe 3rd-party tracking cookies  │
+│  autolearn (EXTENDED)       → learned-trackers.txt               │
+│                               + cookie-derived blacklist (NEW)    │
+│                               + exclusive-tracker-IP set (NEW)    │
+│  escalate.py (EXTENDED)     → nft-drop exclusive-tracker IPs      │
+│  dns-guard feed (NEW wire)  → refuse blacklisted domains at DNS   │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+### Files
+
+**New**
+- `mitmproxy_addons/privacy_guard.py` — request/response hooks, applies verdict only.
+- `secubox_toolbox/privacy.py` — brain: classify, jar, verdict, fortknox, list loaders.
+- `tests/test_privacy.py`, `tests/test_privacy_autolearn.py` — unit + offline tests.
+- `www/privacy/` (or a tab in the existing toolbox webui) — Privacy / Anti-Track panel.
+- `data/cdn-allowlist.txt` — Cloudflare/Fastly/Akamai/Google/AWS published ranges.
+
+**Extended**
+- `sbin/secubox-toolbox-autolearn` — two new outputs (blacklist signal, exclusive-IP set).
+- `secubox_toolbox/escalate.py` — consume `exclusive-tracker-ips.txt` → nft sets with TTL.
+- `secubox_toolbox/social.py` / `social_graph.py` — flag cross-site tracking-cookie setters.
+- `secubox_toolbox/filters.py` — new toggle defaults (Section 5).
+- small `dns-guard` sync hook (write blacklist into its blocklist input + reload).
+- `systemd/secubox-toolbox-mitm.service` — add `-s …/privacy_guard.py`.
+
+**Untouched / still authoritative**
+- `social.py` cartography schema, `store.py` core tables, nft `secubox_blacklist`
+  `blacklist_v4/v6` sets, consent tiers r0/r1/r2.
+
+### Boundary
+
+`privacy_guard.py` performs **no** classification or state math — it only applies
+`privacy.verdict()`. All subtle logic (fake-id jar, layered verdict, fortknox) lives
+in `privacy.py`, testable without mitmproxy. The hot path does only in-memory
+dict/regex lookups; all resolve/correlate work is offline in `autolearn`.
+
+---
+
+## 3. Layered verdict (`privacy.verdict(flow, client)`)
+
+Evaluated per request; first match wins.
+
+```
+1. FORT KNOX armed for this site?  (site ∈ fortknox_sites)
+   ├─ host NOT same-registrable-domain as the page's site → BLOCK (any 3rd-party)
+   └─ else (first-party)                                   → ALLOW
+
+2. Is host a tracker?  classify(host)
+   ├─ NOT a tracker        → ALLOW (+ anonymize headers if privacy_anonymize)
+   ├─ PURE tracker         → BLOCK  (204 at HTTP; also DNS-refused, nft-dropped if its IP is exclusive)
+   └─ LOAD-BEARING tracker → POISON
+```
+
+### Classification — pure vs load-bearing
+
+A host is a **tracker** if it matches the static `_TRACKER` patterns **or**
+`learned-trackers.txt`.
+
+- **PURE** (safe to block): has no first-party role. It has never been observed
+  serving non-tracking content, and its requests look like beacons (an
+  analytics/collect/pixel path, or an `Accept` header that does not ask for
+  page-critical `text/html` or script content). `autolearn` promotes a host to PURE
+  only after it is confirmed beacon-only across ≥2 sites.
+- **LOAD-BEARING** (poison, never block): also serves needed content (tag managers,
+  CDN-hosted JS), or blocking it has broken pages before.
+- **Default-unknown tracker → POISON.** Fail-safe: when in doubt, poison (never
+  breaks a page), never block.
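A minimal sketch of the first-match-wins order above, not taken from the patch. It assumes a simplified signature `verdict(site, host, lists)` instead of the spec's `verdict(flow, client)`, a hypothetical `Lists` container, and a naive two-label `registrable()` helper where real code would need the Public Suffix List.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"                    # answered with HTTP 204
    POISON = "poison"
    FORTKNOX_BLOCK = "fortknox-block"


@dataclass
class Lists:
    # Hypothetical in-memory view of the lists privacy.py is said to read.
    fortknox_sites: set = field(default_factory=set)
    pure: set = field(default_factory=set)
    load_bearing: set = field(default_factory=set)
    learned: set = field(default_factory=set)


def registrable(host: str) -> str:
    # Simplification: keep the last two labels. Real code needs the Public
    # Suffix List ("example.co.uk" is mishandled here).
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def verdict(site: str, host: str, lists: Lists) -> Verdict:
    try:
        # 1. Fort Knox: only same-registrable-domain requests survive.
        if registrable(site) in lists.fortknox_sites:
            if registrable(host) != registrable(site):
                return Verdict.FORTKNOX_BLOCK
            return Verdict.ALLOW
        # 2. Tracker classification; unknown trackers fall through to POISON.
        if host in lists.pure:
            return Verdict.BLOCK
        if host in lists.load_bearing or host in lists.learned:
            return Verdict.POISON
        return Verdict.ALLOW
    except Exception:
        # Section 6.3: any failure degrades to allow, never to a broken page.
        return Verdict.ALLOW
```

Because a learned host that is neither confirmed pure nor known load-bearing lands on `POISON`, the fail-safe default is encoded by ordering alone rather than by a separate branch.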
+
+### POISON action
+
+- **Request hook:** replace outbound `Cookie` values for that tracker with the
+  client's stable fake identity from the jar (forge, not drop → request accepted).
+- **Passive signal degradation:** referer → site root; fake-but-consistent
+  `Sec-CH-UA`/locale/screen hints; pin `DNT:1`, `Sec-GPC:1`.
+- **Response hook:** rewrite the tracker's `Set-Cookie` so the *next* request keeps
+  presenting the same fake id (jar persists it); drop re-identification cookies that
+  don't match the jar.
+
+### ANONYMIZE (always-on hygiene, all flows incl. allowed first-party)
+
+Strip operator/carrier headers (existing `protective_mode` list: MSISDN, x-acr,
+x-wap-*, forwarded, etc.), pin DNT/Sec-GPC. Active whenever `privacy_anonymize`.
+
+### Relationship to `protective_mode.py`
+
+`protective_mode`'s `spoof` (drop-cookie) becomes the **anonymize** primitive; the
+new **poison** verdict supersedes drop for load-bearing trackers. `protective_mode`
+is refactored to call `privacy.py` rather than carry its own host list — one source
+of truth for tracker patterns.
+
+---
+
+## 4. Stable-fake-identity jar
+
+One persistent fabricated identity per `(client, tracker)` that looks real, never
+rotates, and never derives from the client's real data.
+
+### Generation — deterministic, no stored secret in clear
+
+```
+fake_id(client, tracker, cookie_name) =
+    HMAC( server_seed , client_mac_hash || registrable(tracker) || cookie_name )
+    → shaped to the cookie's observed format
+```
+
+- `server_seed`: `/etc/secubox/secrets/privacy-jar.key`, mode 0600, owner
+  `secubox-toolbox`, generated once in postinst. Not in TOML, not in git (CSPN
+  secrets rule).
+- **Deterministic ⇒ stateless to compute.** Same inputs always yield the same id,
+  so the identity is persistent ("rémanent") across worker restarts and identical
+  across all 4 R3 workers with **no shared state**.
+- `client_mac_hash` is the already-hashed client id from `store.py` (never raw MAC).
+- `client_mac_hash` is an HMAC input, so each client gets its own fake id (no
+  cross-client leakage).
+
+### Format shaping — so the target accepts it
+
+`privacy.py` carries a **format-profile table** keyed by cookie name/pattern
+(`_ga` → `GA1.2.<int>.<int>`, UUID, base64 blob, `<ts>.<rand>`, …). It renders the
+HMAC output into the right shape so the value is syntactically valid. Unknown
+cookie → generic opaque token matching the observed length/charset. Observed shapes
+come from `social_graph`'s existing passive cookie observation (names + formats
+only, never real values).
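A sketch of the derivation and shaping steps, under stated assumptions: HMAC-SHA256 as the MAC (the spec only says `HMAC`), length-prefixed fields as the encoding of `||` (the spec does not define a delimiter), and hypothetical helper names `fake_digest`, `shape_ga` and `shape_opaque`. The numeric ranges used for the `_ga` shape are illustrative, not observed values.

```python
import hashlib
import hmac


def fake_digest(seed: bytes, client_mac_hash: str, tracker: str, cookie_name: str) -> bytes:
    # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    parts = (client_mac_hash.encode(), tracker.encode(), cookie_name.encode())
    msg = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(seed, msg, hashlib.sha256).digest()


def shape_ga(digest: bytes) -> str:
    # Render the digest as GA1.2.<int>.<int>, the `_ga` shape named in the spec.
    client = int.from_bytes(digest[:4], "big") % 2_000_000_000
    # A fixed pseudo-timestamp derived from the digest, so it never changes.
    stamp = 1_600_000_000 + int.from_bytes(digest[4:8], "big") % 100_000_000
    return f"GA1.2.{client}.{stamp}"


def shape_opaque(digest: bytes, length: int, alphabet: str) -> str:
    # Generic token matching an observed length and charset.
    stream = digest
    while len(stream) < length:
        stream += hashlib.sha256(stream).digest()
    # Modulo mapping is slightly biased, which is acceptable for a decoy value.
    return "".join(alphabet[b % len(alphabet)] for b in stream[:length])
```

Since nothing here depends on stored state, any worker holding the same seed renders the same cookie value for the same `(client, tracker, cookie_name)` triple.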
+
+### Persistence & overrides
+
+The deterministic path needs no storage. A small new `privacy_jar` table in the
+existing `toolbox.db` is used **only** for exceptions: a server-assigned id we must
+echo back verbatim (we store *our* fake mapping, never the real one), or an operator
+pin/reset. TTL-purged like other tables; `reset_client()` (RGPD) also wipes that
+client's jar rows.
+
+### Failure mode
+
+If `privacy-jar.key` is missing/unreadable, fall back to **anonymize-drop** (old
+spoof) rather than emit a guessable/zero id — fail toward privacy, never toward a
+weak fake.
+
+---
+
+## 5. Offline pipeline (in the hourly `autolearn` timer)
+
+Produces three files the hot path / DNS layer read. No new daemon, no hot-path cost.
+All writes atomic (temp + rename).
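The temp + rename pattern can be sketched as follows. `atomic_write` is a hypothetical helper name; the temp file is created in the target directory so that `os.replace` stays on one filesystem, which is what makes the swap atomic on POSIX.

```python
import os
import tempfile


def atomic_write(path: str, lines) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename never crosses devices.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            for line in lines:
                fh.write(line + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # data is on disk before the name changes
        os.replace(tmp, path)      # readers see the old file or the new one
    except BaseException:
        # Leave the previous list in place and clean up the partial temp file.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

A reader in the hot path that opens `learned-trackers.txt` during a refresh therefore never sees a half-written list.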
+
+### 5.1 Cookie-derived blacklist signal (NEW)
+
+Extend `social_graph` observation to flag a domain as a **tracking-cookie setter**
+when, across `social_edges`/`social_nodes`, it:
+
+- sets a third-party cookie (domain ≠ page registrable domain), **and**
+- the cookie id is reused across **≥2 distinct sites** (cross-site correlation =
+  the definition of a tracking cookie), **and**
+- is set **pre-consent** (`pre_consent_hits` column already exists).
+
+`autolearn` folds these into `learned-trackers.txt` tagged reason `cookie-xsite`.
+This realises the operator's *"use the domains of tracking cookies to blacklist."*
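One way to read the three conditions above, as a sketch. The `CookieObs` record is hypothetical (the real `social_edges`/`social_nodes` schema is not shown here), `cookie_id` is assumed to be a hashed identifier rather than a real value, and only pre-consent observations are counted toward the ≥2-site threshold, which is an interpretation of the spec.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class CookieObs(NamedTuple):
    setter: str        # registrable domain that set the cookie
    site: str          # registrable domain of the page being visited
    cookie_id: str     # hashed cookie id (assumption: never the raw value)
    pre_consent: bool  # set before the user answered a consent prompt


def cookie_xsite_setters(observations: Iterable[CookieObs]) -> Set[str]:
    sites_by_id = defaultdict(set)
    for obs in observations:
        # Third-party and pre-consent observations only.
        if obs.setter != obs.site and obs.pre_consent:
            sites_by_id[(obs.setter, obs.cookie_id)].add(obs.site)
    # Same id seen on at least two distinct sites: cross-site correlation.
    return {setter for (setter, _), sites in sites_by_id.items() if len(sites) >= 2}
```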
+
+### 5.2 Exclusive-tracker IP set (NEW)
+
+For each learned tracker domain, resolve it (reuse `escalate._resolve_ips`) and
+accumulate a `domain→IPs` map in `toolbox.db`. An IP qualifies for nft-drop **only
+if**:
+
+- every domain ever seen resolving to it is a known tracker (no first-party
+  co-resident), **and**
+- it is **not** in the CDN/cloud allowlist (`data/cdn-allowlist.txt`:
+  Cloudflare/Fastly/Akamai/Google/AWS published ranges, refreshable).
+
+Qualifying IPs → `exclusive-tracker-ips.txt`. `escalate.py` (extended) adds them to
+the existing `inet secubox_blacklist blacklist_v4/v6` sets **with a TTL** (default
+4h, auto-renewed while still qualifying) so a reassigned IP ages out. Every
+add/remove is appended to `/var/log/secubox/audit.log` (reason + TTL) — auditable
+and reversible per CSPN.
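The two-part gate can be sketched with the standard `ipaddress` module. `exclusive_tracker_ips` and its argument shapes are assumptions for illustration; the example ranges in the usage note are documentation-only networks, not real CDN ranges.

```python
import ipaddress
from typing import Dict, Iterable, Set


def exclusive_tracker_ips(
    domain_ips: Dict[str, Iterable[str]],  # every domain ever seen -> its IPs
    trackers: Set[str],                    # learned tracker domains
    cdn_ranges: Iterable[str],             # CIDR strings from the allowlist
) -> Set[str]:
    networks = [ipaddress.ip_network(r) for r in cdn_ranges]
    domains_by_ip: Dict[str, Set[str]] = {}
    for domain, ips in domain_ips.items():
        for ip in ips:
            domains_by_ip.setdefault(ip, set()).add(domain)

    qualifying = set()
    for ip, domains in domains_by_ip.items():
        # Gate 1: no first-party co-resident on this address.
        if not domains <= trackers:
            continue
        # Gate 2: never touch shared CDN/cloud address space.
        addr = ipaddress.ip_address(ip)
        if any(addr.version == net.version and addr in net for net in networks):
            continue
        qualifying.add(ip)
    return qualifying
```

With `domain_ips = {"px.trk.example": ["203.0.113.10", "198.51.100.7"], "shop.example": ["198.51.100.7"]}`, the shared address `198.51.100.7` is excluded because a non-tracker also resolves to it.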
+
+### 5.3 DNS-guard wiring (NEW, defense-in-depth)
+
+A small sync step writes the learned blacklist into `secubox-dns-guard`'s blocklist
+input and reloads it, so pure trackers are **refused at DNS** before a connection is
+attempted — covers flows that bypass the proxy. Domains remain HTTP-204'd by
+`privacy_guard` too (belt and suspenders).
+
+### Layered block, three depths
+
+DNS refuse → exclusive-IP nft-drop → HTTP 204. A pure tracker hits whichever fires
+first; **load-bearing trackers hit none** (poisoned, not blocked).
+
+### Guardrails (CSPN / board safety)
+
+- CDN allowlist is the hard gate against collateral on shared IPs.
+- Every list write atomic; every nft/DNS mutation audit-logged with reason + TTL.
+- Turning the master `privacy_enforce` toggle off disables **all** active
+  blocking/dropping (observe-only) without a restart.
+
+---
+
+## 6. Config, UI, error handling, tests
+
+### 6.1 Config (`/etc/secubox/toolbox/filters.json`, hot-reloaded, default-safe)
+
+| Key | Default | Effect |
+|---|---|---|
+| `privacy_enforce` | `false` | master switch; off = observe-only |
+| `privacy_poison` | `true` | forge a stable fake id for load-bearing trackers; `false` = fall back to anonymize-drop |
+| `privacy_anonymize` | `true` | always-on header hygiene (DNT/GPC, strip operator headers) |
+| `privacy_ip_drop` | `false` | enable exclusive-tracker IP nft-drop |
+| `privacy_dns_feed` | `true` | sync learned blacklist into dns-guard |
+| `fortknox_sites` | `[]` | per-site first-party-only opt-in list |
+
+`privacy_enforce` ships **off**, so the feature deploys dark, soaks in observe-only
+mode, and is then armed. `privacy_ip_drop` is gated separately because it is the
+highest-collateral action.
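A sketch of default-safe toggle loading, assuming the file content is passed in as text. `privacy_toggles` and `ip_drop_active` are hypothetical names; the defaults are copied from the table above, and any malformed or wrongly typed value falls back to its default.

```python
import json

# Defaults copied from the 6.1 table.
PRIVACY_DEFAULTS = {
    "privacy_enforce": False,
    "privacy_poison": True,
    "privacy_anonymize": True,
    "privacy_ip_drop": False,
    "privacy_dns_feed": True,
    "fortknox_sites": [],
}


def privacy_toggles(raw_json: str) -> dict:
    try:
        data = json.loads(raw_json)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        data = {}
    if not isinstance(data, dict):
        data = {}
    toggles = {}
    for key, default in PRIVACY_DEFAULTS.items():
        value = data.get(key, default)
        # Reject wrong types ("yes" instead of true) rather than guess.
        if not isinstance(value, type(default)):
            value = default
        toggles[key] = list(value) if isinstance(value, list) else value
    return toggles


def ip_drop_active(toggles: dict) -> bool:
    # The nft drop needs both the master switch and its own gate.
    return bool(toggles["privacy_enforce"] and toggles["privacy_ip_drop"])
```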
+
+### 6.2 UI — "Privacy / Anti-Track" panel (toolbox webui)
+
+Per-client tracker cartography (already collected); learned blacklist with reasons
+(`cookie-xsite`, `opgrade`, `threat-intel`); exclusive-IP drop list with TTLs;
+poison counters (fake identities served); Fort-Knox per-site arm/disarm. Reads route
+through the aggregator like every other module; write/toggle actions POST to the
+toolbox API. No new socket.
+
+### 6.3 Error handling — fail toward privacy, never toward breakage
+
+- Missing `privacy-jar.key` → anonymize-drop fallback.
+- `learned-trackers.txt` / IP set / CDN allowlist unreadable → treat as empty (no
+  false blocks), log once.
+- Any exception in `verdict()` → `allow` + anonymize (never 500 a client's page);
+  counter incremented.
+- DNS-guard / nft sync failure → audit-log, leave previous list in place (atomic
+  writes ⇒ no partial state).
+- `privacy_guard` never blocks the worker loop — in-memory lookups only; resolve/
+  correlate is offline.
+
+### 6.4 Tests
+
+- `privacy.py` units (no mitmproxy): classify pure vs load-bearing; `verdict()` each
+  branch incl. Fort-Knox; jar determinism (same inputs→same id; different client→
+  different id; format shaping per cookie pattern); fail-safe paths.
+- `autolearn`: cookie-xsite detection (id reused ≥2 sites, pre-consent); exclusive-IP
+  gate (first-party co-resident → excluded, CDN range → excluded); atomic writes.
+- Integration: synthetic flow through `privacy_guard` hooks — cookie forged not
+  dropped, headers anonymized, 204 on pure tracker.
+- CSPN: audit entry for every nft/DNS mutation with reason + TTL; `reset_client`
+  wipes jar rows.
+
+---
+
+## 7. Rollout
+
+1. Ship with `privacy_enforce=false` (observe-only). Verify cartography + learned
+   lists populate; no client impact.
+2. Soak window; review the learned blacklist and would-be exclusive-IP set in the UI
+   for false positives.
+3. Arm `privacy_enforce=true` (poison + anonymize + DNS feed + HTTP block).
+4. Separately arm `privacy_ip_drop=true` after the exclusive-IP set looks clean.
+5. Fort-Knox is per-site opt-in throughout — never auto-armed.
+
+Deploy respects the board rules: no mass daemon restart (only
+`secubox-toolbox-mitm.service` reload for the new addon), shared `/…/secubox`
+parents stay 0755, secret in `/etc/secubox/secrets/` 0600.
+
+---
+
+## 8. Out of scope (future specs)
+
+- Active decoy traffic generation (bandwidth/CPU cost, IP-reputation risk).
+- Rotating / pooled-k-anonymity identities (this spec chose a stable identity per
+  (client, tracker)).
+- Default-on first-party isolation with auto-grown allowlist.
--- a/wiki/Anti-Track.md
+++ b/wiki/Anti-Track.md
@@ -0,0 +1,113 @@
+# Anti-Track v2
+
+**[EN](Anti-Track)** | [FR](Anti-Track-FR) | **🟡 WALL · 🟣 MIND** | 🔒
+
+> Bloque · Empoisonne · Anonymise — *ta vie privée n'est pas à vendre.*
+
+![Anti-Track v2 — Bloque, Empoisonne, Anonymise](images/anti-track-v2-poster.png)
+
+Anti-Track is the SecuBox privacy layer that runs inside the transparent WAF
+(`secubox-toolbox` / mitmproxy). It protects every device on your LAN without any
+client-side install: trackers are stopped, fooled, or scrubbed before they ever
+profile you.
+
+---
+
+## 🟡 The three layers (WALL)
+
+Every request to a known tracker is handled by one of three actions. The system
+**fails safe**: when unsure, it poisons (never breaks a page) rather than blocks.
+
+### 1️⃣ Bloque — pure trackers
+
+Hosts that exist *only* to track (analytics beacons, pixels, data brokers) are
+blocked at three depths:
+
+| Depth | Mechanism |
+|-------|-----------|
+| 🧱 **DNS refuse** | the domain never resolves — cheapest, catches even non-proxied flows |
+| 🛑 **IP drop** | nftables drop for IPs that serve *exclusively* trackers (CDN/cloud ranges are allowlisted, never dropped) |
+| ✉️ **HTTP 204** | the proxy answers the tracker call with an empty success |
+
+A host is promoted to "pure" only after it is confirmed beacon-only across **≥2
+sites**. *Ici, pas de quartier.*
+
+### 2️⃣ Empoisonne — load-bearing trackers
+
+Trackers that also carry needed content (tag managers, CDN-hosted scripts) can't be
+blocked without breaking the page. Instead they get a **stable fake identity**: the
+client presents fabricated-but-valid cookie values the target accepts, so the
+tracker builds a coherent profile of *a person who does not exist*.
+
+- The fake identity is **persistent** ("rémanent") — same fiction every visit, no
+  rotation tell that would reveal a blocker.
+- It is **per-device, per-tracker**, and never derived from your real data.
+- Other signals (referer, UA hints, locale, screen) are degraded in the same flow —
+  *des infos bidon pour des pisteurs perdus.*
+
+### 3️⃣ Anonymise — every flow
+
+Always-on hygiene applied to all traffic, including the legitimate first-party site:
+
+- strip operator/carrier headers (`MSISDN`, `x-acr`, `x-wap-*`, `X-Forwarded-For`…),
+  plus `Referer` to trackers and re-identification `ETag`s
+- pin `DNT: 1` and `Sec-GPC: 1`
+
+*Propre, léger, anonyme.*
+
+---
+
+## 🔒 Fort Knox — first-party-only (opt-in)
+
+For sensitive sites you can arm **Fort Knox** per-site: every third-party request is
+blocked — tracker or not — leaving only the requested site itself. Maximum surface
+reduction. It breaks many normal sites (embeds, CDNs, fonts), so it is **off by
+default** and armed site-by-site.
+
+> *Nous ne voyons rien. Nous ne gardons rien. Vous gardez le contrôle.*
+
+---
+
+## 🟢 Configuration (ROOT)
+
+Toggles live in `/etc/secubox/toolbox/filters.json` and hot-reload (no restart):
+
+| Key | Default | Effect |
+|-----|---------|--------|
+| `privacy_enforce` | `false` | master switch — off = **observe-only** (watch, never act) |
+| `privacy_poison` | `true` | forge a stable fake identity for load-bearing trackers |
+| `privacy_anonymize` | `true` | always-on header hygiene (DNT/GPC, strip operator headers) |
+| `privacy_ip_drop` | `false` | nft-drop exclusive-tracker IPs |
+| `privacy_dns_feed` | `true` | feed the learned blacklist into `secubox-dns-guard` |
+| `fortknox_sites` | `[]` | per-site first-party-only opt-in list |
+
+**Observe-only first.** Anti-Track deploys *dark*: it watches and learns who tracks
+you, you review the findings in the dashboard, then you arm enforcement. *On
+regarde, mais on ne conserve pas.*
+
+---
+
+## 🟣 How it learns (MIND)
+
+The blacklist is not a static list — it is learned from your own traffic, hourly:
+
+- **cookie-xsite** — a domain that sets a third-party cookie whose id is reused
+  across ≥2 of your sites (the textbook definition of a tracking cookie)
+- **opgrade** — operator-grade / data-broker hosts seen cross-site
+- **threat-intel** — IOC feeds (ThreatFox, Feodo, SSLBL)
+
+Every DNS block and nft drop is written to the immutable audit log
+(`/var/log/secubox/audit.log`) with a reason and a TTL, so nothing is permanent and
+everything is reviewable.
+
+---
+
+## See also
+
+- [[Android-ToolBox]] — one-tap R3 onboarding for client devices
+- [[Browser-Extension]] — tracker cartography in the browser
+- Design spec: `docs/superpowers/specs/2026-06-17-anti-tracking-v2-design.md`
+
+---
+
+*© 2026 CyberMind · Notre-Dame-du-Cruet, Savoie*
--- a/wiki/_Sidebar.md
+++ b/wiki/_Sidebar.md
@@ -40,6 +40,7 @@

 ### 🟣 MIND — Modules

+* [[Anti-Track]] 🛡️ bloque · empoisonne · anonymise
 * [[MODULES-EN|Modules]] 🇬🇧
 * [[MODULES-FR]] 🇫🇷
 * [[MODULES-DE]] 🇩🇪
--- a/wiki/images/README.md
+++ b/wiki/images/README.md
@@ -0,0 +1,9 @@
+# Wiki images
+
+Local image assets referenced by wiki pages.
+
+## Expected files
+
+| File | Used by | Notes |
+|------|---------|-------|
+| `anti-track-v2-poster.png` | [[Anti-Track]] | Comic-style hero poster (Bloque · Empoisonne · Anonymise). Portrait, ~1024×1536. |
Author	SHA1	Message	Date
CyberMind-FR	981acc4b44	docs(wiki): add Anti-Track v2 page + sidebar entry	2026-06-17 15:15:28 +02:00
CyberMind-FR	bd54d82ae2	docs: anti-tracking v2 design — layered block/poison/anonymize	2026-06-17 15:15:28 +02:00