Compare commits

...

11 Commits

Author SHA1 Message Date
4922aada7f docs: HISTORY + WIP for the 2026-06-14 toolbox sprint (2.6.23→2.6.36)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 18:42:32 +02:00
CyberMind
2523333fc8
Merge pull request #591 from CyberMind-FR/fix/autolearn-exclude-antibot
fix(autolearn): exclude anti-bot from auto-block (#589 follow-up)
2026-06-14 17:39:06 +02:00
8b22d0ff62 fix(autolearn): exclude anti-bot vendors from auto-block (#589 follow-up)
Anti-bot WAFs (Datadome/PerimeterX) are often the visited site's own in-path
WAF — auto-blocking them breaks the page. Learner now feeds only
operator-grade/data-broker classified trackers + threat-intel domains;
cross-site threshold 4->2. secubox-toolbox 2.6.36.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 17:38:44 +02:00
CyberMind
f5a4c73248
Merge pull request #590 from CyberMind-FR/feature/589-autolearn-bad-trackers-actors-from-threa
Autolearn bad trackers/actors → ad_ghost block set (#589)
2026-06-14 17:34:39 +02:00
d8a75487ac feat(toolbox): autolearn bad trackers/actors → ad_ghost block set (closes #589)
sbin/secubox-toolbox-autolearn (+ hourly timer) builds a high-confidence
learned-trackers.txt from threat-intel domain IOCs + cross-site domains
classified anti-bot/operator-grade (>=4 sites). Conservative (no plain
CDNs). ad_ghost loads it (mtime-cached), 204s learned hosts too, gated by
the autolearn filter (default on). postinst enables timer + first run.
secubox-toolbox 2.6.35. Unit-tested (learner criteria + learned block).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 17:34:18 +02:00
CyberMind
630cb81e03
Merge pull request #588 from CyberMind-FR/feature/587-eye-graph-domain-suffix-country-cloud-nu
Cartographie: domain-nugget cloud view (#587)
2026-06-14 17:21:02 +02:00
0f3f57e7bb feat(toolbox): domain-nugget cloud view, grouped by country (closes #587)
New '🏷️ Domaines' toggle: trackers folded to eTLD+1, packed as cloud-nugget
bubbles grouped by country (country→domain d3.pack), sized by hits,
tier-coloured, flag+name, click→domain summary. IPs hidden. Mirrors the
donut pack. secubox-toolbox 2.6.34.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 17:20:40 +02:00
CyberMind
7c14bdfe5e
Merge pull request #586 from CyberMind-FR/feature/578-banner-shareable-top-1-pin-quick-button
Shared broadcast pin in every banner (#578)
2026-06-14 16:44:16 +02:00
062131608f feat(toolbox): shared broadcast pin in every banner (closes #578)
Operator-set 📌 pin (or top-1 tracker) in /run/secubox/pin.json shown as
the first chip in every R2/R3 banner (24h). api: GET/POST /admin/pin +
/admin/pin/ui setter (auto-fill from top tracker). inject_banner renders it.
secubox-toolbox 2.6.33.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 16:43:56 +02:00
CyberMind
1de8b29865
Merge pull request #585 from CyberMind-FR/feature/577-perf-video-photo-cdn-proxy-cache-shared
Shared media proxy-cache (default off) (#577)
2026-06-14 15:14:35 +02:00
3b8daf964e feat(toolbox): shared media proxy-cache, default off (closes #577)
mitmproxy_addons/media_cache.py — one upstream fetch serves all R2/R3
clients: cacheable GET media/static (image/video-segment/audio/font/css/js)
on disk (/var/cache/secubox/toolbox/media), keyed by URL. Safety: 16MB/obj
cap gated on Content-Length (large video passthrough, no RAM hold), 2GB LRU,
skips Range/auth/Set-Cookie/no-store, fail-open. Opt-in filter media_cache.
api /admin/cache stats + WebUI toggle + launcher + postinst dir.
secubox-toolbox 2.6.32. Unit-tested (HIT across clients, cap, range, segment).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 15:13:59 +02:00
15 changed files with 775 additions and 5 deletions


@@ -3,6 +3,49 @@
---
## 2026-06-14 — ToolBoX privacy/perf sprint : 2.6.23 → 2.6.36, all live on gk2
Large feature sprint on `secubox-toolbox` (built + merged + deployed live,
kbin healthy) + clients + two live fixes. Each shipped via PR + merge +
build + deploy.
**Toolbox (`secubox-toolbox` 2.6.23 → 2.6.36):**
- #560 protective mode — tracker alerting + active **spoofer** (strip
operator/tracking headers, drop 3rd-party cookies, DNT/GPC). Live in
`spoof` on the 4 R3 workers + R2.
- #566 modular **filters** (`/etc/secubox/toolbox/filters.json`, WebUI
`/admin/filters/ui`) + R3+/R4 **ad/banner ghoster** (ad-hiding CSS +
204 ad/tracker hosts ; savings → banner quick-stats).
- #584 ad ghosting = **collapse** (no placeholder ; reverted #576 black-hole).
- #577 shared **media proxy-cache** (image/video-segment, 16 MB/obj cap,
2 GB LRU, default OFF/opt-in) — `/admin/cache`.
- #589/#591 **autolearn** bad trackers → ad_ghost block set (threat-intel
domains + operator-grade cross-site ; anti-bot excluded) + hourly timer.
- #553/#549 cartographie **donut** (continent→country) + #587
**domain-nugget** cloud (country→eTLD+1) + #575 **IP nodes hidden**
(flag+name only) + #555 **favicons** of major sites (never IPs).
- #545/#572 banner: neon → colourful **emoji-chip garland** ;
inspected→**protected** on R3+/R4 ; #578 shared **pin** broadcast
(`/admin/pin/ui`).
- #570 DPI **media/content-type statistifier** + donut (`/admin/media/ui`).
- #574 webext popup **protection panel** ; #568 top-tracker list capped 5.
- #562 `/ca/fingerprint` surfaces the **R3 CA** (D5:E4:3A) on the tunnel.
- #581 **postinst fix** : enabled units get a real `restart` on upgrade
(was leaving the portal dead → kbin 503 ; bit us twice).
- #516 review (#564): `detect_antibot` → (vendor, **is_challenge**),
response-level (cf-mitigated / non-200 token) — deployment vs challenge.
**Clients:** Android APK **v0.3.0** (real zero-tap : launch + boot
auto-onboard) ; webext **v0.1.4** (crash-fix const-ext, favicons, popup
protection panel) — both served from the cabine + GitHub releases.
**Live fixes:** Nextcloud iPhone photo sync (disabled broken
`files_antivirus` + raised PHP upload limits) ; kbin 503 root-caused →
#581.
**Open / blocked:** #592 unified webmail-hub (Gmail OAuth2 + Gandi + OVH) —
design filed, BLOCKED on a Google OAuth client + operator decisions.
## 2026-06-13 — Browser extension : emancipate cartographie live (ref #532)
New client `clients/webext-toolbox/` (MV3 Firefox `.xpi` + Chromium),


@@ -1,5 +1,32 @@
# WIP — Work In Progress
*Updated: 2026-06-14*
---
## 🔄 2026-06-14 : ToolBoX privacy/perf sprint — 2.6.36 live (see HISTORY)
Everything merged + deployed on gk2 (kbin healthy, `secubox-toolbox 2.6.36`).
Full details in HISTORY 2026-06-14. Summary:
- ✅ Protective spoof (#560), modular filters + ad-ghoster (#566, collapse
#584), opt-in media cache (#577), autolearn (#589/#591), DPI media donut
(#570), donut + domain-nugget cartographie (#553/#587, IPs hidden #575,
favicons #555), garland banner + pin (#572/#578), webext popup panel
(#574), /ca/fingerprint R3 (#562), postinst restart fix (#581),
detect_antibot deployment-vs-challenge (#564).
- ✅ Clients: APK v0.3.0 (zero-tap launch+boot), webext v0.1.4.
- ✅ Live fixes: Nextcloud iPhone photos (files_antivirus off + PHP
limits), kbin 503 (#581).
### ⬜ Next Up
- **#592 secubox-webmail-hub** (Gmail OAuth2 + Gandi + OVH, unified inbox):
design filed, **BLOCKED**: needs a Google OAuth client (client_id/
secret/redirect) + a vhost name + a decision on whether Phase 1 is read-only.
Phase 1 IMAP (Gandi/OVH) can start without OAuth on "start phase 1".
- User side: re-trust the R3 CA `D5:E4:3A` on the iPhone (HTTPS banner);
test the Nextcloud photo upload; enable `media_cache` if wanted
(`/admin/filters/ui`) and monitor `/admin/cache`.
---


@@ -1,3 +1,63 @@
secubox-toolbox (2.6.36-1~bookworm1) bookworm; urgency=medium

  * fix(autolearn): exclude anti-bot vendors from the auto-block list (#589
    follow-up). Anti-bot WAFs (Datadome/PerimeterX) often sit in the visited
    site's own path, so auto-blocking them would break the page. The learner
    now feeds only OPERATOR-GRADE/data-broker classified trackers (+ threat-
    intel domains); cross-site threshold lowered 4→2.

 -- Gerald KERMA <devel@cybermind.fr>  Sun, 14 Jun 2026 16:50:00 +0200

secubox-toolbox (2.6.35-1~bookworm1) bookworm; urgency=medium

  * Autolearn bad trackers/actors (#589) — feeds ad_ghost's block set.
    - sbin/secubox-toolbox-autolearn (+ hourly timer) builds a HIGH-
      confidence list /var/lib/secubox/toolbox/learned-trackers.txt from
      (1) threat-intel domain IOCs (threatfox malicious) and (2) cross-site
      domains CLASSIFIED anti-bot/operator-grade seen on >=4 sites.
      Conservative — plain cross-site CDNs are NOT learned.
    - ad_ghost.py loads it (mtime-cached) and 204s learned hosts too
      (X-SecuBox-Ghost: learned), gated by the new `autolearn` filter
      (default on). postinst enables the timer + runs once.

 -- Gerald KERMA <devel@cybermind.fr>  Sun, 14 Jun 2026 16:30:00 +0200

secubox-toolbox (2.6.34-1~bookworm1) bookworm; urgency=medium

  * Cartographie: domain-nugget cloud view (#587). New "🏷️ Domaines" toggle
    (alongside 🍩 Donuts / 👁️ Œil) — trackers folded to eTLD+1 and packed as
    cloud-nugget bubbles grouped by country (country→domain d3.pack), sized by
    hits, tier-coloured, flag+name labels, click → domain summary. IPs hidden.

 -- Gerald KERMA <devel@cybermind.fr>  Sun, 14 Jun 2026 15:45:00 +0200

secubox-toolbox (2.6.33-1~bookworm1) bookworm; urgency=medium

  * Shared broadcast pin in every banner (#578). An operator-set "📌 pin"
    (or the top-1 tracker of the day) is stored in /run/secubox/pin.json and
    shown as the first chip in EVERY R2/R3 client's banner (24 h window) —
    a shared signal around all banner injections. api: GET/POST /admin/pin
    + GET /admin/pin/ui setter (with a "use top-1 tracker" auto-fill from
    the social aggregate). inject_banner reads + renders it.

 -- Gerald KERMA <devel@cybermind.fr>  Sun, 14 Jun 2026 15:15:00 +0200

secubox-toolbox (2.6.32-1~bookworm1) bookworm; urgency=medium

  * Shared media proxy-cache (#577) — DEFAULT OFF (opt-in filter `media_cache`).
    - mitmproxy_addons/media_cache.py : cacheable GET media/static (image /
      video segment / audio / font / css / js) stored on disk under
      /var/cache/secubox/toolbox/media, keyed by URL, served from cache to
      EVERY R2/R3 client (one upstream fetch → all clients). Safety rails
      for the RAM-light board : 16 MB/object cap gated on Content-Length
      (large/progressive video passed through, never RAM-held by us), 2 GB
      on-disk LRU, never caches Range/auth/Set-Cookie/no-store, fail-open.
    - filters: media_cache toggle (WebUI panel). api: GET /admin/cache
      (hits/misses/hit-rate/MB served/size). postinst: cache dir.
      Wired into the mitm-wg launcher (request hook serves, response stores).

 -- Gerald KERMA <devel@cybermind.fr>  Sun, 14 Jun 2026 14:45:00 +0200

secubox-toolbox (2.6.31-1~bookworm1) bookworm; urgency=medium

  * ad_ghost: remove ad placeholders entirely (#584, reverses #576). Ghosted


@@ -61,6 +61,9 @@ SBXFILTERS
# 4. Storage dir (SQLite + future PDF reports)
install -d -m 0750 -o secubox-toolbox -g secubox-toolbox /var/lib/secubox/toolbox
# #577 : shared media proxy-cache dir (opt-in via filters ; 2 GB LRU).
install -d -m 0750 -o secubox-toolbox -g secubox-toolbox /var/cache/secubox/toolbox/media 2>/dev/null || \
mkdir -p /var/cache/secubox/toolbox/media
# #536 : Android APK serve dir + best-effort fetch of the latest
# release asset (so GET /wg/toolbox.apk serves it locally/offline).
# Non-blocking : if there's no release yet / no network, the endpoint
@@ -208,6 +211,11 @@ fi
# until the operator opts in via a SECUBOX_ESCALATE_* drop-in.
systemctl enable secubox-escalate.timer 2>/dev/null || true
systemctl start secubox-escalate.timer 2>/dev/null || true
# #589 : autolearn bad-tracker timer (hourly) + a first run now so
# the learned list exists immediately for ad_ghost.
systemctl enable secubox-toolbox-autolearn.timer 2>/dev/null || true
systemctl start secubox-toolbox-autolearn.timer 2>/dev/null || true
/usr/sbin/secubox-toolbox-autolearn 2>&1 | head -1 || true
fi
fi


@@ -35,6 +35,10 @@ override_dh_installsystemd:
	# Install the secondary unit manually (dh_installsystemd expects 1 unit/pkg).
	install -d debian/secubox-toolbox/lib/systemd/system
	install -m 0644 systemd/secubox-toolbox-mitm.service debian/secubox-toolbox/lib/systemd/system/
	# #589 : autolearn bad-tracker learner + hourly timer
	install -m 0755 sbin/secubox-toolbox-autolearn debian/secubox-toolbox/usr/sbin/
	install -m 0644 systemd/secubox-toolbox-autolearn.service debian/secubox-toolbox/lib/systemd/system/
	install -m 0644 systemd/secubox-toolbox-autolearn.timer debian/secubox-toolbox/lib/systemd/system/
	# Phase 6.P (#496) : systemd drop-ins for RuntimeMaxSec=6h on mitm + mitm-wg
	install -d debian/secubox-toolbox/lib/systemd/system/secubox-toolbox-mitm.service.d
	install -m 0644 systemd/secubox-toolbox-mitm.service.d/10-runtime-max.conf \


@@ -46,6 +46,43 @@ _AD_HOST = re.compile(
    re.IGNORECASE,
)

# #589 — auto-learned bad hosts (threat-intel + classified cross-site
# trackers), rebuilt hourly by secubox-toolbox-autolearn. Loaded with an
# mtime check so a fresh learn takes effect within ~60 s, no restart.
_LEARNED_PATH = "/var/lib/secubox/toolbox/learned-trackers.txt"
_learned: set = set()
_learned_mtime = 0.0
_learned_check = 0.0
_2L_TLD = ("co.uk", "com.au", "co.jp", "co.nz", "com.br", "co.za", "gouv.fr")


def _registrable(host: str):
    host = (host or "").split(":")[0].lower().strip(".")
    if not host or host.replace(".", "").isdigit() or ":" in host:
        return None
    p = host.split(".")
    if len(p) <= 2:
        return host
    last2 = ".".join(p[-2:])
    return ".".join(p[-3:]) if (last2 in _2L_TLD and len(p) >= 3) else last2


def _learned_set() -> set:
    global _learned, _learned_mtime, _learned_check
    now = time.time()
    if now - _learned_check < 60:
        return _learned
    _learned_check = now
    try:
        m = os.path.getmtime(_LEARNED_PATH)
        if m != _learned_mtime:
            with open(_LEARNED_PATH, encoding="utf-8") as f:
                _learned = {ln.strip().lower() for ln in f if ln.strip()}
            _learned_mtime = m
    except Exception:
        pass
    return _learned


# Cosmetic hide selectors, grouped so the WebUI can toggle each category.
_COSMETIC = {
    "ads": (
@@ -124,10 +161,18 @@ class AdGhost:
        if not _is_r3plus(flow):
            return
        host = flow.request.pretty_host or ""
        blocked = bool(_AD_HOST.search(host))
        learned = False
        if not blocked and f.get("autolearn", True):
            reg = _registrable(host)
            if reg and (reg in _learned_set() or host.lower() in _learned_set()):
                blocked = learned = True
        if blocked:
            flow.response = http.Response.make(
                204, b"", {"X-SecuBox-Ghost": "learned" if learned else "blocked"})
            _counts["blocked_requests"] += 1
            if learned:
                _counts["learned_blocks"] = _counts.get("learned_blocks", 0) + 1
            _counts["bytes_saved_est"] += _EST_BYTES_PER_REQ
            _flush()

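The host-matching logic in the `AdGhost` hunk above can be exercised without mitmproxy. Below is a minimal standalone sketch, not the addon itself: it reuses the same hand-picked two-level suffix list (far smaller than the Public Suffix List), assumes a hypothetical `AD_HOST` pattern because the real `_AD_HOST` regex is not part of this diff, and returns the `X-SecuBox-Ghost` tag instead of building a 204 response.

```python
from __future__ import annotations

import re

# Same hand-picked two-level suffixes as the diff; NOT the full Public Suffix List.
TWO_LEVEL = ("co.uk", "com.au", "co.jp", "co.nz", "com.br", "co.za", "gouv.fr")

# HYPOTHETICAL stand-in for ad_ghost's _AD_HOST regex (its real pattern is not shown).
AD_HOST = re.compile(r"(^|\.)(doubleclick\.net|adservice\.google\.com)$", re.IGNORECASE)


def registrable(host: str) -> str | None:
    """Fold a host to its eTLD+1, or None for empty / IPv4-looking hosts."""
    host = (host or "").split(":")[0].lower().strip(".")
    if not host or host.replace(".", "").isdigit():
        return None
    parts = host.split(".")
    if len(parts) <= 2:
        return host
    last2 = ".".join(parts[-2:])
    return ".".join(parts[-3:]) if last2 in TWO_LEVEL else last2


def ghost_decision(host: str, learned: set[str], autolearn: bool = True) -> str | None:
    """Return the ghost tag ("blocked" or "learned"), or None to let the request through."""
    if AD_HOST.search(host):
        return "blocked"  # the static regex is checked first
    if autolearn:
        reg = registrable(host)
        if reg and (reg in learned or host.lower() in learned):
            return "learned"
    return None


if __name__ == "__main__":
    print(ghost_decision("px.bad-broker.com", {"bad-broker.com"}))
```

Because the static regex wins, a host present in both sources is reported as `blocked`, which matches the header choice in the hunk; the learned-list lookup only runs for hosts the regex let through.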

@@ -452,6 +452,9 @@ def _banner_html_dynamic(sha1: str, ctx: dict, csp_strict: bool,
    # renders correctly regardless of page charset (some legacy pages declare
    # iso-8859-1 which would mangle our raw UTF-8 emoji bytes).
    right_parts = [f"{_ncr(ctx['status_icon'])} {ctx['status']}"]
    # #578 — shared broadcast pin first, so every banner shows it.
    if ctx.get("pin"):
        right_parts.insert(0, "&#x1F4CC; " + _ncr(ctx["pin"]))  # 📌
    if ctx["flag"]:
        # Phase 6.M (#496) : flags are Unicode "regional indicator" pairs
        # (🇫🇷 = U+1F1EB + U+1F1F7). NCR-encoded pairs do NOT join into a
@@ -707,6 +710,18 @@ class InjectBanner:
        except Exception:
            ctx["ghost_blocked"] = 0
            ctx["ghost_kb"] = 0
        # #578 — shared broadcast pin (operator/top-1), shown in every
        # client's banner. Fresh window 24 h.
        ctx["pin"] = ""
        try:
            import json as _json
            import time as _time
            with open("/run/secubox/pin.json", "r", encoding="utf-8") as _pf:
                _p = _json.load(_pf)
            if _p.get("text") and (_time.time() - _p.get("ts", 0)) < 86400:
                ctx["pin"] = str(_p["text"])[:80]
        except Exception:
            pass
        csp_strict = _detect_csp_strict(flow)
        report_url = _report_url_for(flow)
        level_label = _level_label(flow)

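The pin read in `InjectBanner` and the write in `POST /admin/pin` share a small JSON record (`text`, `url`, `ts`, `by`). A sketch of both sides as plain functions, assuming the same 24 h window and 80-character cap. The temp-file-plus-`os.replace` write is an assumption of this sketch (the endpoint in this diff uses `Path.write_text`), chosen so a concurrent banner render cannot read a half-written file.

```python
from __future__ import annotations

import json
import os
import tempfile
import time

PIN_TTL = 86400  # 24 h freshness window, as in the inject_banner hunk
PIN_MAX = 80     # characters kept for the banner chip


def read_pin(path: str, now: float | None = None) -> str:
    """Return the pin text if the file is valid JSON and younger than PIN_TTL, else ""."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as fh:
            rec = json.load(fh)
    except (OSError, ValueError):
        return ""  # fail-open: a missing or corrupt pin never breaks the banner
    if not isinstance(rec, dict):
        return ""
    text, ts = rec.get("text"), rec.get("ts", 0)
    if not text or not isinstance(ts, (int, float)):
        return ""
    return str(text)[:PIN_MAX] if (now - ts) < PIN_TTL else ""


def write_pin(path: str, text: str, url: str = "", now: float | None = None) -> dict:
    """Same record shape as POST /admin/pin; empty text clears the pin (ts = 0)."""
    now = time.time() if now is None else now
    text = text[:PIN_MAX].strip()
    rec = {"text": text, "url": url[:300].strip(),
           "ts": int(now) if text else 0, "by": "admin"}
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(rec, fh)
    os.replace(tmp, path)  # atomic swap on POSIX: readers see the old or the new file
    return rec


if __name__ == "__main__":
    pin = os.path.join(tempfile.mkdtemp(), "pin.json")
    write_pin(pin, "Top tracker: doubleclick.net")
    print(read_pin(pin))
```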

@@ -0,0 +1,250 @@
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
#
# #577 — shared media proxy-cache. One fetch serves every R2/R3 client:
# cacheable GET media/static (image / video segment / audio / font / css /
# js) is stored on disk, keyed by URL, and served from cache on subsequent
# requests from ANY client — saving upstream bandwidth + latency.
#
# Safety rails for the RAM-light cabine:
#  - DEFAULT OFF (filter `media_cache` toggle) ; instantly killable.
#  - per-object cap (16 MB) gated on Content-Length BEFORE buffering, so a
#    large/progressive video is passed through, never cached/RAM-held by us.
#  - 2 GB on-disk LRU budget, evicted oldest-first.
#  - never caches Range/partial, authenticated, Set-Cookie, no-store/private.
#  - fail-open everywhere : a cache error must never break the flow.
from __future__ import annotations

import hashlib
import json
import os
import re
import sys
import time

from mitmproxy import http

try:
    if "/usr/lib/secubox/toolbox" not in sys.path:
        sys.path.insert(0, "/usr/lib/secubox/toolbox")
    from secubox_toolbox.filters import get_filters
except Exception:
    def get_filters(force: bool = False):
        return {"media_cache": False}

CACHE_DIR = "/var/cache/secubox/toolbox/media"
STATS = "/run/secubox/media_cache.json"
MAX_OBJ = 16 * 1024 * 1024  # 16 MB / object
MAX_TOTAL = 2 * 1024 * 1024 * 1024  # 2 GB on disk
DEFAULT_TTL = 3600  # 1 h when upstream gives no max-age
_CACHEABLE = ("image/", "video/", "audio/", "font/", "text/css",
              "javascript", "ecmascript", "application/font",
              "application/vnd.ms-fontobject")
_MAXAGE = re.compile(r"max-age\s*=\s*(\d+)", re.IGNORECASE)

# in-memory index (mitmproxy hooks run single-threaded in the event loop)
_index: dict = {}  # key -> {"size": int, "exp": float, "atime": float, "ct": str}
_total = 0
_stats = {"hits": 0, "misses": 0, "stored": 0, "evicted": 0,
          "bytes_served": 0, "since": int(time.time())}
_last_flush = 0.0


def _key(url: str) -> str:
    return hashlib.sha256(url.encode("utf-8", "ignore")).hexdigest()


def _paths(key: str):
    d = os.path.join(CACHE_DIR, key[:2])
    return os.path.join(d, key), os.path.join(d, key + ".m")


def _enabled() -> bool:
    try:
        return bool(get_filters().get("media_cache"))
    except Exception:
        return False


def _cacheable_ct(ct: str) -> bool:
    ct = (ct or "").split(";", 1)[0].strip().lower()
    return bool(ct) and any(f in ct for f in _CACHEABLE)


def _flush_stats(force: bool = False) -> None:
    global _last_flush
    now = time.time()
    if not force and (now - _last_flush) < 5:
        return
    _last_flush = now
    try:
        os.makedirs(os.path.dirname(STATS), exist_ok=True)
        with open(STATS, "w", encoding="utf-8") as f:
            json.dump({**_stats, "objects": len(_index),
                       "bytes_cached": _total, "updated": int(now)}, f)
    except Exception:
        pass


def _load_index() -> None:
    """Rebuild the index from disk on startup (bounded, best-effort)."""
    global _total
    try:
        for sub in os.listdir(CACHE_DIR):
            d = os.path.join(CACHE_DIR, sub)
            if not os.path.isdir(d):
                continue
            for name in os.listdir(d):
                # skip metadata files and leftovers of an interrupted write
                if name.endswith((".m", ".tmp")):
                    continue
                fp = os.path.join(d, name)
                try:
                    st = os.stat(fp)
                    meta = {}
                    mp = fp + ".m"
                    if os.path.exists(mp):
                        with open(mp, encoding="utf-8") as mf:
                            meta = json.load(mf)
                    _index[name] = {"size": st.st_size,
                                    "exp": meta.get("exp", 0),
                                    "atime": st.st_atime,
                                    "ct": meta.get("ct", "")}
                    _total += st.st_size
                except Exception:
                    pass
    except FileNotFoundError:
        pass


def _evict_if_needed() -> None:
    global _total
    if _total <= MAX_TOTAL:
        return
    # oldest atime first
    for key, e in sorted(_index.items(), key=lambda kv: kv[1]["atime"]):
        if _total <= MAX_TOTAL:
            break
        body, meta = _paths(key)
        try:
            os.remove(body)
        except OSError:
            pass
        try:
            os.remove(meta)
        except OSError:
            pass
        _total -= e["size"]
        _index.pop(key, None)
        _stats["evicted"] += 1


class MediaCache:
    def __init__(self):
        try:
            os.makedirs(CACHE_DIR, exist_ok=True)
            _load_index()
        except Exception:
            pass

    # ── serve from cache (request hook) ──
    def request(self, flow: http.HTTPFlow) -> None:
        global _total
        if not _enabled():
            return
        r = flow.request
        if r.method != "GET":
            return
        if "range" in r.headers or "authorization" in r.headers:
            return
        key = _key(r.pretty_url or "")
        e = _index.get(key)
        if not e:
            _stats["misses"] += 1
            return
        if e["exp"] and e["exp"] < time.time():
            return  # stale — let it revalidate/refetch (and re-store)
        body_path, _m = _paths(key)
        try:
            with open(body_path, "rb") as f:
                body = f.read()
        except OSError:
            # body file vanished: drop the entry AND release its size budget
            _total -= e["size"]
            _index.pop(key, None)
            return
        e["atime"] = time.time()
        try:
            os.utime(body_path, None)
        except OSError:
            pass
        _stats["hits"] += 1
        _stats["bytes_served"] += len(body)
        _flush_stats()
        flow.response = http.Response.make(
            200, body,
            {"Content-Type": e.get("ct") or "application/octet-stream",
             "X-SecuBox-Cache": "HIT",
             "Cache-Control": "public, max-age=300"},
        )

    # ── store to cache (response hook) ──
    def response(self, flow: http.HTTPFlow) -> None:
        global _total
        if not _enabled() or not flow.response:
            return
        r = flow.request
        resp = flow.response
        if r.method != "GET" or resp.status_code != 200:
            return
        if "range" in r.headers or "authorization" in r.headers:
            return
        if resp.headers.get("x-secubox-cache") == "HIT":
            return
        cc = (resp.headers.get("cache-control", "") or "").lower()
        if "no-store" in cc or "private" in cc:
            return
        if "set-cookie" in resp.headers:
            return
        if not _cacheable_ct(resp.headers.get("content-type", "")):
            return
        # size gate on the HEADER — never cache (nor force-buffer) > MAX_OBJ
        try:
            clen = int(resp.headers.get("content-length", "0") or "0")
        except (TypeError, ValueError):
            clen = 0
        if clen <= 0 or clen > MAX_OBJ:
            return
        try:
            body = resp.content or b""
        except Exception:
            return
        if not body or len(body) > MAX_OBJ:
            return
        # freshness window
        m = _MAXAGE.search(cc)
        ttl = int(m.group(1)) if m else DEFAULT_TTL
        if ttl <= 0:
            return
        key = _key(r.pretty_url or "")
        body_path, meta_path = _paths(key)
        ct = (resp.headers.get("content-type", "") or "").split(";")[0]
        exp = time.time() + ttl
        try:
            os.makedirs(os.path.dirname(body_path), exist_ok=True)
            tmp = body_path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(body)
            os.replace(tmp, body_path)
            with open(meta_path, "w", encoding="utf-8") as f:
                json.dump({"ct": ct, "exp": exp,
                           "url": (r.pretty_url or "")[:300]}, f)
        except Exception:
            return
        old = _index.get(key, {}).get("size", 0)
        _total += len(body) - old
        _index[key] = {"size": len(body), "exp": exp,
                       "atime": time.time(), "ct": ct}
        _stats["stored"] += 1
        _evict_if_needed()
        _flush_stats()


addons = [MediaCache()]

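The store gate in the `response` hook above is easier to reason about (and unit-test) as a pure function. A sketch, assuming headers are passed as plain dicts with lower-case keys (mitmproxy's own `Headers` are case-insensitive), together with the oldest-`atime`-first loop from `_evict_if_needed` minus the file deletions:

```python
from __future__ import annotations

import re

MAX_OBJ = 16 * 1024 * 1024
DEFAULT_TTL = 3600
_CACHEABLE = ("image/", "video/", "audio/", "font/", "text/css",
              "javascript", "ecmascript", "application/font",
              "application/vnd.ms-fontobject")
_MAXAGE = re.compile(r"max-age\s*=\s*(\d+)", re.IGNORECASE)


def store_ttl(method: str, status: int, req_headers: dict, resp_headers: dict) -> int | None:
    """Return the TTL in seconds to cache a response with, or None if it must not be stored."""
    if method != "GET" or status != 200:
        return None
    if "range" in req_headers or "authorization" in req_headers:
        return None
    cc = (resp_headers.get("cache-control") or "").lower()
    if "no-store" in cc or "private" in cc or "set-cookie" in resp_headers:
        return None
    ct = (resp_headers.get("content-type") or "").split(";", 1)[0].strip().lower()
    if not ct or not any(tag in ct for tag in _CACHEABLE):
        return None
    try:
        clen = int(resp_headers.get("content-length") or "0")
    except ValueError:
        clen = 0
    if clen <= 0 or clen > MAX_OBJ:  # unknown size (chunked) or too big: pass through
        return None
    m = _MAXAGE.search(cc)
    ttl = int(m.group(1)) if m else DEFAULT_TTL
    return ttl if ttl > 0 else None


def evict_oldest(index: dict[str, dict], total: int, budget: int) -> tuple[int, list[str]]:
    """Drop least-recently-used entries (smallest atime) until total <= budget."""
    evicted: list[str] = []
    for key, entry in sorted(index.items(), key=lambda kv: kv[1]["atime"]):
        if total <= budget:
            break
        total -= entry["size"]
        evicted.append(key)
    for key in evicted:
        del index[key]
    return total, evicted
```

One caveat on the size rail: mitmproxy fires the `response` event only after the full body has been read, so the `Content-Length` gate keeps large objects off the cache disk but does not by itself stop the proxy from buffering them. Avoiding that would need body streaming (for example mitmproxy's `stream_large_bodies` option), which this diff does not show.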

@@ -0,0 +1,106 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
#
# #589 — autolearn bad trackers/actors. Builds a HIGH-CONFIDENCE block list
# that ad_ghost consults (in addition to its static ad-host regex), from:
#  1. threat-intel domain IOCs (threatfox malicious C2/malware domains) ;
#  2. cross-site OPERATOR-GRADE / data-broker tracker domains
#     (social_host_meta.opgrade_vendor) seen on >= MIN_SITES sites.
# Deliberately conservative — a plain cross-site CDN (fonts, shared assets)
# is NOT learned, and ANTI-BOT vendors are NOT learned either : a site's own
# WAF (Datadome/PerimeterX) sits in the 1st-party path, so blocking it would
# break the site. So live R3 users don't get legit sites broken. Run hourly
# by secubox-toolbox-autolearn.timer ; output read by ad_ghost (cached).
from __future__ import annotations

import json
import os
import sqlite3
import sys
import time

DB = "/var/lib/secubox/toolbox/toolbox.db"
OUT = "/var/lib/secubox/toolbox/learned-trackers.txt"
MIN_SITES = 2  # cross-site threshold for operator-grade trackers
MAX_ENTRIES = 8000
_2L = ("co.uk", "com.au", "co.jp", "co.nz", "com.br", "co.za", "gouv.fr")


def registrable(host: str):
    host = (host or "").split(":")[0].lower().strip(".")
    if not host or host.replace(".", "").isdigit() or ":" in host:
        return None
    p = host.split(".")
    if len(p) <= 2:
        return host
    last2 = ".".join(p[-2:])
    return ".".join(p[-3:]) if (last2 in _2L and len(p) >= 3) else last2


def main() -> int:
    learned: set[str] = set()
    # sqlite3.connect() would silently create an empty, root-owned DB if the
    # file does not exist yet (e.g. the postinst first run), so check first.
    if not os.path.exists(DB):
        sys.stderr.write(f"autolearn: {DB} not found, nothing to learn\n")
        return 0
    try:
        c = sqlite3.connect(DB, timeout=10)
        c.row_factory = sqlite3.Row
    except Exception as e:
        sys.stderr.write(f"autolearn: cannot open {DB}: {e}\n")
        return 0
    # 1) threat-intel malicious domains (high confidence).
    try:
        for r in c.execute("SELECT DISTINCT ioc FROM threat_intel WHERE type='domain'"):
            d = registrable(r["ioc"])
            if d:
                learned.add(d)
    except Exception:
        pass
    ti = len(learned)
    # 2) cross-site OPERATOR-GRADE / data-broker trackers ONLY. Anti-bot
    #    vendors are deliberately excluded — they're frequently the visited
    #    site's own WAF (in-path), so blocking them breaks the page.
    try:
        classified = set()
        for r in c.execute(
                "SELECT tracker_domain FROM social_host_meta "
                "WHERE opgrade_vendor IS NOT NULL"):
            d = registrable(r["tracker_domain"])
            if d:
                classified.add(d)
        # distinct 1st-party sites per registrable tracker domain
        sites: dict[str, set] = {}
        for r in c.execute("SELECT tracker_domain, sites_jsonl FROM social_nodes"):
            d = registrable(r["tracker_domain"])
            if not d or d not in classified:
                continue
            try:
                for s in json.loads(r["sites_jsonl"] or "[]"):
                    sites.setdefault(d, set()).add(s)
            except Exception:
                pass
        for d, ss in sites.items():
            if len(ss) >= MIN_SITES:
                learned.add(d)
    except Exception:
        pass
    c.close()
    learned.discard(None)
    out = sorted(learned)[:MAX_ENTRIES]
    try:
        tmp = OUT + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write("\n".join(out) + ("\n" if out else ""))
        os.replace(tmp, OUT)
    except Exception as e:
        sys.stderr.write(f"autolearn: write failed: {e}\n")
        return 0
    sys.stderr.write(
        f"autolearn: {len(out)} hosts learned ({ti} threat-intel + "
        f"{len(out) - ti} classified cross-site) @ {int(time.time())}\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

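The cross-site criterion in `main()` boils down to: fold each tracker host, keep only hosts with an operator-grade classification, and count distinct first-party sites. A sketch against an in-memory SQLite database, assuming a HYPOTHETICAL minimal schema containing only the columns the two queries touch (the real `toolbox.db` tables are not shown in this diff), a made-up vendor value, and a simplified last-two-labels fold instead of `registrable()`:

```python
from __future__ import annotations

import json
import sqlite3

MIN_SITES = 2  # threshold after #591 (was 4)


def fold(host: str) -> str:
    """Simplified eTLD+1 (last two labels); the learner's registrable() also handles co.uk etc."""
    return ".".join((host or "").lower().strip(".").split(".")[-2:])


def cross_site_learned(conn: sqlite3.Connection, min_sites: int = MIN_SITES) -> set[str]:
    """Operator-grade tracker domains seen on at least min_sites distinct first-party sites."""
    classified = {
        fold(d) for (d,) in conn.execute(
            "SELECT tracker_domain FROM social_host_meta WHERE opgrade_vendor IS NOT NULL")
    }
    sites: dict[str, set[str]] = {}
    for domain, sites_json in conn.execute(
            "SELECT tracker_domain, sites_jsonl FROM social_nodes"):
        d = fold(domain)
        if d not in classified:
            continue  # no operator-grade classification (plain CDNs, anti-bot): never learned
        try:
            sites.setdefault(d, set()).update(json.loads(sites_json or "[]"))
        except (ValueError, TypeError):
            pass  # a malformed row is skipped, not fatal
    return {d for d, ss in sites.items() if len(ss) >= min_sites}


def demo_db() -> sqlite3.Connection:
    """In-memory DB with a HYPOTHETICAL minimal schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE social_host_meta (tracker_domain TEXT, opgrade_vendor TEXT);
        CREATE TABLE social_nodes (tracker_domain TEXT, sites_jsonl TEXT);
    """)
    conn.executemany("INSERT INTO social_host_meta VALUES (?, ?)",
                     [("px.broker.com", "BrokerCo"), ("cdn.fonts.net", None)])
    conn.executemany("INSERT INTO social_nodes VALUES (?, ?)", [
        ("px.broker.com", '["news.fr", "shop.de"]'),
        ("sync.broker.com", '["news.fr"]'),
        ("cdn.fonts.net", '["a.com", "b.com", "c.com"]'),
    ])
    return conn


if __name__ == "__main__":
    print(sorted(cross_site_learned(demo_db())))
```

Here the font CDN is seen on three sites but is never learned because it has no operator-grade classification, while `broker.com` qualifies with two distinct sites across its two subdomains.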

@@ -110,7 +110,7 @@ fi
# ad_ghost (#566) runs right after protective_mode: for R3+/R4 it 204s known
# ad/tracker hosts (bandwidth save) at request time and injects ad-hiding CSS
# on HTML responses. Gated by the modular filter config (toolbox WebUI).
for addon in inject_xff utiq_defense protective_mode ad_ghost media_cache local_store social_graph inject_banner dpi cookies avatar ja4 soc_relay cert_pin_detect media_stats; do
    ARGS+=(-s "$ADDON_DIR/${addon}.py")
done


@@ -2455,6 +2455,106 @@ _MEDIA_EMOJI = {
}


@router.get("/admin/cache")
async def admin_cache() -> dict:
    """#577 — shared media cache stats (hits/misses/bytes served/size)."""
    import json as _json
    from pathlib import Path as _P
    out: dict = {"hits": 0, "misses": 0, "stored": 0, "evicted": 0,
                 "bytes_served": 0, "objects": 0, "bytes_cached": 0,
                 "since": None, "updated": None}
    try:
        st = _P("/run/secubox/media_cache.json")
        if st.exists():
            out.update(_json.loads(st.read_text()))
    except Exception:
        pass
    tot = (out.get("hits", 0) + out.get("misses", 0)) or 1
    out["hit_rate"] = round(100 * out.get("hits", 0) / tot, 1)
    out["mb_served"] = round(out.get("bytes_served", 0) / 1048576, 1)
    out["mb_cached"] = round(out.get("bytes_cached", 0) / 1048576, 1)
    try:
        from .filters import get_filters as _gf
        out["enabled"] = bool(_gf().get("media_cache"))
    except Exception:
        out["enabled"] = False
    return out


_PIN_PATH = "/run/secubox/pin.json"


@router.get("/admin/pin")
async def admin_pin() -> dict:
    """#578 — the shared broadcast pin shown in every client's banner."""
    import json as _json
    from pathlib import Path as _P
    cur = {"text": "", "url": "", "ts": 0, "by": ""}
    try:
        p = _P(_PIN_PATH)
        if p.exists():
            cur.update(_json.loads(p.read_text()))
    except Exception:
        pass
    return cur


@router.post("/admin/pin")
async def admin_pin_set(request: Request) -> dict:
    """#578 — set/clear the shared pin (broadcast to all banners). Empty
    text clears it."""
    import json as _json
    import time as _time
    from pathlib import Path as _P
    try:
        body = await request.json()
    except Exception:
        body = {}
    text = (str(body.get("text", "")) if isinstance(body, dict) else "")[:80].strip()
    url = (str(body.get("url", "")) if isinstance(body, dict) else "")[:300].strip()
    rec = {"text": text, "url": url, "ts": int(_time.time()) if text else 0, "by": "admin"}
    try:
        _P(_PIN_PATH).parent.mkdir(parents=True, exist_ok=True)
        _P(_PIN_PATH).write_text(_json.dumps(rec))
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    return rec


@router.get("/admin/pin/ui", response_class=HTMLResponse)
async def admin_pin_ui() -> HTMLResponse:
    """#578 — minimal pin setter; can auto-fill the current top-1 tracker."""
    html = """<!doctype html><html lang=fr><meta charset=utf-8>
<meta name=viewport content="width=device-width,initial-scale=1">
<title>Pin partagé ToolBoX</title>
<style>
body{background:#0a0a0f;color:#e8e6d9;font:14px system-ui,sans-serif;max-width:520px;margin:30px auto;padding:0 18px}
h1{color:#c9a84c;font-size:18px} label{display:block;color:#6b6b7a;font-size:12px;margin:12px 0 4px}
input{width:100%;padding:8px;border-radius:6px;border:1px solid #333;background:#14141c;color:#e8e6d9}
button{margin:14px 8px 0 0;padding:9px 14px;border-radius:6px;border:1px solid #c9a84c;background:#c9a84c;color:#0a0a0f;font-weight:700;cursor:pointer}
button.alt{background:transparent;color:#00d4ff;border-color:#00d4ff}
button.danger{background:transparent;color:#e63946;border-color:#e63946}
#msg{color:#00ff41;min-height:18px;margin-top:10px} .muted{color:#6b6b7a;font-size:12px}
</style>
<h1>📌 Pin partagé (toutes les bannières)</h1>
<p class=muted>Un message épinglé, diffusé dans la bannière de TOUS les clients R2/R3 (24 h).</p>
<label>Texte du pin <input id=text maxlength=80 placeholder="ex: traceur #1 du jour — doubleclick.net"></label>
<label>Lien (optionnel) <input id=url maxlength=300 placeholder="https://…"></label>
<button id=save>📌 Épingler</button>
<button class=alt id=top> Utiliser le traceur #1</button>
<button class=danger id=clear>Retirer</button>
<p id=msg></p>
<script>
const $=s=>document.querySelector(s);
fetch('/admin/pin').then(r=>r.json()).then(p=>{$('#text').value=p.text||'';$('#url').value=p.url||'';});
function post(t,u){return fetch('/admin/pin',{method:'POST',headers:{'content-type':'application/json'},body:JSON.stringify({text:t,url:u})}).then(r=>r.json()).then(()=>{$('#msg').textContent='✓ diffusé';setTimeout(()=>$('#msg').textContent='',1500);});}
$('#save').onclick=()=>post($('#text').value,$('#url').value);
$('#clear').onclick=()=>{$('#text').value='';$('#url').value='';post('','');};
$('#top').onclick=()=>fetch('/admin/social-aggregate?hours=24').then(r=>r.json()).then(d=>{const t=(d.by_tracker_domain||[])[0];if(t){$('#text').value='Traceur #1 : '+t.tracker_domain+' ('+t.hits+' hits)';}else{$('#msg').textContent='aucun traceur';}});
</script></html>"""
    return HTMLResponse(content=html)


@router.get("/admin/media")
async def admin_media() -> dict:
    """#570 — DPI media/content-type statistics for the donut UI."""
@@ -2570,6 +2670,9 @@ async def admin_filters_ui() -> HTMLResponse:
<h2>Ghosting pub (R3+/R4)</h2>
<label><input type=checkbox data-k=ad_ghost> Masquer pubs/bannières/widgets (cosmétique)</label>
<label><input type=checkbox data-k=ad_ghost_block> Bloquer les hôtes pub/traceurs (économise la bande passante)</label>
<h2>Cache média partagé (#577)</h2>
<label><input type=checkbox data-k=media_cache> Cache média/photo/vidéo partagé (2 Go, 1 fetch tous les clients)</label>
<h2>Catégories ghosting</h2>
<label><input type=checkbox data-c=ads> · catégorie : publicités</label>
<label><input type=checkbox data-c=consent_nag> · catégorie : bandeaux cookies/consentement</label>
<label><input type=checkbox data-c=newsletter> · catégorie : pop-ups newsletter</label>

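`GET /admin/cache` only derives ratios from the JSON that `media_cache.py` flushes to `/run/secubox/media_cache.json` (at most every 5 s). A sketch of that derivation as a pure function, assuming the raw file content is passed in as a string; unlike the endpoint's `or 1` trick it returns `0.0` explicitly when there were no lookups, which yields the same value.

```python
from __future__ import annotations

import json

MIB = 1048576


def cache_summary(raw: str | None, enabled: bool) -> dict:
    """Derive the /admin/cache view from the addon's stats JSON (hit rate in %, sizes in MiB)."""
    out = {"hits": 0, "misses": 0, "stored": 0, "evicted": 0,
           "bytes_served": 0, "objects": 0, "bytes_cached": 0,
           "since": None, "updated": None}
    if raw:
        try:
            out.update(json.loads(raw))
        except (ValueError, TypeError):
            pass  # unreadable stats file: report zeros rather than fail
    lookups = out["hits"] + out["misses"]
    out["hit_rate"] = round(100 * out["hits"] / lookups, 1) if lookups else 0.0
    out["mb_served"] = round(out["bytes_served"] / MIB, 1)
    out["mb_cached"] = round(out["bytes_cached"] / MIB, 1)
    out["enabled"] = enabled
    return out


if __name__ == "__main__":
    print(cache_summary('{"hits": 3, "misses": 1}', True)["hit_rate"])
```

Note that a stale entry in `request` returns without incrementing `misses`, so the hit rate reported this way counts only fresh lookups.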

@ -21,6 +21,8 @@ DEFAULTS: Dict = {
"protective": "spoof", # off | alert | spoof (tracker spoofer)
"ad_ghost": True, # R3+/R4 silent ad/banner/widget ghosting
"ad_ghost_block": True, # 204 known ad/tracker hosts (save bandwidth)
"media_cache": False, # #577 shared media proxy-cache (opt-in)
"autolearn": True, # #589 also block auto-learned bad hosts
"ad_ghost_categories": { # cosmetic ghost groups
"ads": True,
"consent_nag": True,
@@ -72,7 +74,7 @@ def set_filters(patch: Dict) -> Dict:
if ck in DEFAULTS["ad_ghost_categories"]})
elif k == "protective" and v in _VALID_PROTECTIVE:
cur["protective"] = v
- elif k in ("banner", "ad_ghost", "ad_ghost_block"):
+ elif k in ("banner", "ad_ghost", "ad_ghost_block", "media_cache", "autolearn"):
cur[k] = bool(v)
try:
os.makedirs(os.path.dirname(FILTERS_PATH), exist_ok=True)
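Judging from the visible branches, `set_filters()` whitelists keys: a patch key that matches none of the `elif` tests falls through and is dropped, which is why `media_cache` and `autolearn` had to join the boolean tuple before their checkboxes could persist. A minimal sketch of that merge rule with the file write left out; `merge_filters()`, `_BOOL_KEYS` and the trimmed `DEFAULTS` below are hypothetical stand-ins for illustration, not the module's actual code:

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical stand-in for DEFAULTS: the hunk only shows part of the dict,
# so the "banner" default and the category list here are assumptions.
DEFAULTS: Dict[str, Any] = {
    "banner": True,
    "protective": "spoof",
    "ad_ghost": True,
    "ad_ghost_block": True,
    "media_cache": False,
    "autolearn": True,
    "ad_ghost_categories": {"ads": True, "consent_nag": True, "newsletter": True},
}
_VALID_PROTECTIVE = {"off", "alert", "spoof"}
_BOOL_KEYS = ("banner", "ad_ghost", "ad_ghost_block", "media_cache", "autolearn")


def merge_filters(cur: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a UI patch to a copy of the current filters (no persistence)."""
    out = deepcopy(cur)  # never mutate the caller's dict (or DEFAULTS)
    for k, v in patch.items():
        if k == "ad_ghost_categories" and isinstance(v, dict):
            # Only known category names survive; values are coerced to bool.
            out["ad_ghost_categories"].update(
                {ck: bool(cv) for ck, cv in v.items()
                 if ck in DEFAULTS["ad_ghost_categories"]})
        elif k == "protective" and isinstance(v, str) and v in _VALID_PROTECTIVE:
            out["protective"] = v
        elif k in _BOOL_KEYS:
            out[k] = bool(v)
        # Any other key, or an invalid protective mode, is silently dropped.
    return out
```

For example, `merge_filters(DEFAULTS, {"autolearn": 0, "protective": "bogus"})` turns autolearn off but leaves `protective` at `"spoof"`.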

@@ -0,0 +1,12 @@
[Unit]
Description=SecuBox ToolBoX — autolearn bad trackers/actors (#589)
Documentation=https://github.com/CyberMind-FR/secubox-deb/issues/589
After=secubox-toolbox.service
[Service]
Type=oneshot
ExecStart=/usr/sbin/secubox-toolbox-autolearn
Nice=10
IOSchedulingClass=idle
# best-effort, never block boot
TimeoutStartSec=120
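The unit above only runs `/usr/sbin/secubox-toolbox-autolearn`; the selection rule lives in that script. Per the #589 commit messages and the follow-up, the learner keeps threat-intel domain IOCs plus cross-site trackers classified operator-grade or data-broker that were seen on at least 2 sites, and skips anti-bot vendors (often the visited site's own in-path WAF) and plain CDNs. A sketch of that rule under assumed data shapes; the `TrackerStat` record, its classification strings, and the `learn()`/`render_list()` helpers are hypothetical:

```python
from typing import Dict, Iterable, Set, TypedDict


class TrackerStat(TypedDict):
    # Hypothetical shape of one cross-site tracker record.
    classification: str  # e.g. "anti-bot", "operator-grade", "data-broker", "cdn"
    sites: int           # number of distinct visited sites it appeared on


# Anti-bot vendors and plain CDNs are deliberately absent from this set.
LEARNABLE = {"operator-grade", "data-broker"}
MIN_SITES = 2  # cross-site threshold after the follow-up (previously 4)


def learn(ioc_domains: Iterable[str], trackers: Dict[str, TrackerStat],
          min_sites: int = MIN_SITES) -> Set[str]:
    """Threat-intel IOCs are taken as-is; trackers must pass both tests."""
    learned = {d.strip().lower() for d in ioc_domains if d.strip()}
    for dom, st in trackers.items():
        if st["classification"] in LEARNABLE and st["sites"] >= min_sites:
            learned.add(dom.lower())
    return learned


def render_list(learned: Set[str]) -> str:
    """One host per line, sorted, as for a learned-trackers.txt-style file."""
    return "".join(d + "\n" for d in sorted(learned))
```

Keeping the rule as an allow-list of classifications, rather than a deny-list, means any new classification the classifier introduces is excluded by default, which fits the conservative intent.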

@@ -0,0 +1,11 @@
[Unit]
Description=SecuBox ToolBoX — hourly autolearn of bad trackers (#589)
[Timer]
OnBootSec=10min
OnUnitActiveSec=1h
Persistent=true
RandomizedDelaySec=5min
[Install]
WantedBy=timers.target
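With `OnBootSec=10min` and `OnUnitActiveSec=1h`, the list is rebuilt about 10 minutes after boot and then about an hour after each run, each elapse delayed by up to 5 minutes of `RandomizedDelaySec` jitter. The consumer therefore only needs to re-read the file when it changes; the #589 commit message says ad_ghost loads it mtime-cached and gates the 204 on the `autolearn` filter. A minimal sketch of such a loader, assuming one host per line and that a learned domain also covers its subdomains (both assumptions; `LearnedHosts` and `should_block()` are hypothetical names):

```python
import os
from typing import FrozenSet, Optional


class LearnedHosts:
    """Re-reads a learned-hosts file only when its mtime changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime_ns: Optional[int] = None
        self._hosts: FrozenSet[str] = frozenset()

    def hosts(self) -> FrozenSet[str]:
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            # Learner has not run yet: nothing extra to block.
            self._mtime_ns, self._hosts = None, frozenset()
            return self._hosts
        if mtime_ns != self._mtime_ns:
            with open(self.path, encoding="utf-8") as fh:
                # Assumed format: one host per line, '#' comments, blanks ignored.
                self._hosts = frozenset(
                    ln.strip().lower() for ln in fh
                    if ln.strip() and not ln.strip().startswith("#"))
            self._mtime_ns = mtime_ns
        return self._hosts

    def should_block(self, host: str, autolearn: bool) -> bool:
        """True when autolearn is on and host, or a parent domain, is learned."""
        if not autolearn:
            return False
        learned = self.hosts()
        labels = host.lower().rstrip(".").split(".")
        # Assumption: learning "tracker.example" also covers "x.tracker.example".
        return any(".".join(labels[i:]) in learned for i in range(len(labels)))
```

Comparing `st_mtime_ns` integers avoids float-equality surprises, and a single `os.stat()` per lookup is cheap next to re-parsing the file on every request.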

@@ -85,9 +85,21 @@
function draw(graph) {
if (!graph) return;
if (view === 'donuts') renderDonuts(graph);
else if (view === 'nuggets') renderNuggets(graph);
else render(graph);
}
// #587 — registrable domain (eTLD+1) for the domain-nugget view.
const _2L = ['co.uk', 'com.au', 'co.jp', 'co.nz', 'com.br', 'co.za', 'gouv.fr'];
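// Approximation: only the two-level suffixes in _2L are known (no Public Suffix List).
function registrable(host) {
host = (host || '').toLowerCase();
// IPv6 literal (bare or [bracketed]): no registrable domain. Check before
// stripping ':port', which would otherwise turn '2001:db8::1' into '2001'.
if (host.startsWith('[') || (host.match(/:/g) || []).length > 1) return null;
host = host.split(':')[0].replace(/^\.+|\.+$/g, '');
if (!host || /^\d{1,3}(\.\d{1,3}){3}$/.test(host)) return null;
const p = host.split('.');
if (p.length <= 2) return host;
const last2 = p.slice(-2).join('.');
return _2L.includes(last2) ? p.slice(-3).join('.') : last2;
}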
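The folding rule is small enough to check by hand. Below is a Python rendering of the same idea for experimentation, using the same seven hard-coded two-level suffixes. It is an approximation, not the Public Suffix List, so a host under an unlisted suffix collapses to that suffix (`x.co.in` becomes `co.in`). It tests for IPv6 literals before stripping a `:port` suffix, since splitting on `':'` first would reduce `2001:db8::1` to `'2001'`. The function and constant names are illustrative:

```python
import re
from typing import Optional

# Same hard-coded list as the JS _2L; NOT the Public Suffix List.
TWO_LEVEL = {"co.uk", "com.au", "co.jp", "co.nz", "com.br", "co.za", "gouv.fr"}
IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def registrable(host: Optional[str]) -> Optional[str]:
    """Approximate eTLD+1 for a host[:port]; None for IP literals or empty."""
    host = (host or "").lower()
    if host.startswith("[") or host.count(":") > 1:
        return None                        # IPv6 literal, bracketed or bare
    host = host.split(":")[0].strip(".")   # drop :port and stray dots
    if not host or IPV4.fullmatch(host):
        return None                        # empty or IPv4 literal
    parts = host.split(".")
    if len(parts) <= 2:
        return host
    last2 = ".".join(parts[-2:])
    return ".".join(parts[-3:]) if last2 in TWO_LEVEL else last2
```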
// Inject the view toggle (🍩 donuts ⇄ 👁️ œil) once, above the svg.
function ensureToggle() {
if (document.getElementById('view-toggle') || !svgEl) return;
@@ -107,6 +119,7 @@
return b;
};
bar.appendChild(mk('donuts', '🍩 Donuts'));
bar.appendChild(mk('nuggets', '🏷️ Domaines'));
bar.appendChild(mk('eye', '👁️ Œil'));
svgEl.parentNode.insertBefore(bar, svgEl);
syncToggle();
@@ -516,6 +529,77 @@
ndEl.hidden = false;
}
// #587 — domain-nugget cloud : trackers folded to eTLD+1, packed as
// cloud bubbles grouped by country (country→domain d3.pack). IPs hidden.
function _shortDom(d) { d = d || ''; return d.length > 16 ? d.slice(0, 15) + '…' : d; }
function focusDomain(d) {
if (!ndEl) return;
bind('nd_domain', (d.flag || '') + ' ' + d.domain);
bind('nd_country', d.flag || '—'); bind('nd_asn', '—');
bind('nd_cdn', d.tier || '—'); bind('nd_antibot', '—'); bind('nd_opgrade', '—');
bind('nd_sites', '~' + Math.round(d.value) + ' hits');
bind('nd_first_seen', '—'); bind('nd_last_seen', '—');
ndEl.hidden = false;
}
function renderNuggets(graph) {
clearGraph();
const { W, H } = svgSize();
svg.attr('viewBox', `0 0 ${W} ${H}`);
bind('total_trackers', graph.stats.total_trackers || 0);
bind('total_sites', graph.stats.total_sites || 0);
updateAntibotTile(graph.stats.antibot_sites || 0, graph.stats.antibot_vendors || []);
updateOpgradeTile(graph.stats.opgrade_sites || 0, graph.stats.opgrade_vendors || []);
const byDom = new Map();
for (const n of (graph.nodes || [])) {
const d = registrable(n.domain);
if (!d) continue; // skip IPs
let e = byDom.get(d);
if (!e) { e = { domain: d, hits: 0, flag: n.country_flag || '', cc: n.country_iso || '??', tier: n.tier || 'other' }; byDom.set(d, e); }
e.hits += (n.hits || 0);
if (!e.flag && n.country_flag) { e.flag = n.country_flag; e.cc = n.country_iso || '??'; }
}
const doms = [...byDom.values()];
if (!doms.length) return;
const byCC = new Map();
for (const d of doms) {
const k = d.cc || '??';
if (!byCC.has(k)) byCC.set(k, { cc: k, flag: d.flag, list: [] });
byCC.get(k).list.push(d);
}
const root = { name: 'root', children: [...byCC.values()].map(c => ({
name: c.cc, flag: c.flag, country: true,
children: c.list.map(d => ({ leaf: true, domain: d.domain, flag: d.flag, tier: d.tier, value: Math.max(d.hits, 1) })),
})) };
const h = d3.hierarchy(root).sum(d => d.value || 0)
.sort((a, b) => (b.value || 0) - (a.value || 0));
d3.pack().size([W, H]).padding(d => d.depth === 1 ? 12 : 3)(h);
const content = svg.append('g').attr('class', 'content');
content.append('g').selectAll('circle.cc')
.data(h.descendants().filter(d => d.depth === 1)).join('circle')
.attr('cx', d => d.x).attr('cy', d => d.y).attr('r', d => d.r)
.attr('fill', 'rgba(0,212,255,0.05)')
.attr('stroke', 'rgba(0,212,255,0.4)').attr('stroke-dasharray', '2,3');
content.append('g').selectAll('text.cc')
.data(h.descendants().filter(d => d.depth === 1)).join('text')
.attr('x', d => d.x).attr('y', d => d.y - d.r + 13).attr('text-anchor', 'middle')
.attr('fill', 'var(--cyber-cyan,#00d4ff)').attr('font-size', 12).attr('font-weight', 'bold')
.text(d => (d.data.flag || '🏴') + ' ' + (d.data.name || '?'));
const leaves = content.append('g').selectAll('g.nug')
.data(h.leaves()).join('g')
.attr('class', 'node').attr('transform', d => `translate(${d.x},${d.y})`)
.style('cursor', 'pointer').on('click', (ev, d) => focusDomain(d.data));
leaves.append('circle').attr('r', d => d.r)
.attr('fill', d => (TIER[d.data.tier] || TIER.other).c).attr('fill-opacity', 0.85)
.attr('stroke', '#0a0a0f').attr('stroke-width', 0.6);
leaves.append('text').attr('text-anchor', 'middle').attr('dy', '.35em')
.attr('font-size', d => Math.max(7, Math.min(d.r * 0.45, 12)))
.attr('fill', '#0a0a0f')
.text(d => d.r > 14 ? ((d.data.flag ? d.data.flag + ' ' : '') + _shortDom(d.data.domain)) : (d.r > 8 ? _shortDom(d.data.domain) : ''));
leaves.append('title').text(d => d.data.domain + ' — ~' + Math.round(d.data.value) + ' hits');
}
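The grouping step in `renderNuggets()` is independent of D3: nodes are folded to a registrable domain, hits are summed per domain, and domains are bucketed by country before `d3.hierarchy`/`d3.pack` lay them out. A Python sketch of that data shaping, taking the folding function as a parameter; `build_nugget_tree()` is an illustrative name and the node field names are copied from the JS:

```python
from typing import Any, Callable, Dict, List, Optional


def build_nugget_tree(nodes: List[Dict[str, Any]],
                      fold: Callable[[str], Optional[str]]) -> Optional[Dict[str, Any]]:
    """country -> domain hierarchy, shaped like the JS `root` object."""
    by_dom: Dict[str, Dict[str, Any]] = {}
    for n in nodes:
        d = fold(n.get("domain") or "")
        if not d:
            continue  # IPs and empty hosts are hidden
        e = by_dom.get(d)
        if e is None:
            e = by_dom[d] = {"domain": d, "hits": 0,
                             "flag": n.get("country_flag") or "",
                             "cc": n.get("country_iso") or "??",
                             "tier": n.get("tier") or "other"}
        e["hits"] += n.get("hits") or 0
        if not e["flag"] and n.get("country_flag"):
            # First flagged node decides the domain's country.
            e["flag"] = n["country_flag"]
            e["cc"] = n.get("country_iso") or "??"
    if not by_dom:
        return None
    by_cc: Dict[str, Dict[str, Any]] = {}
    for e in by_dom.values():  # dicts keep insertion order, like a JS Map
        g = by_cc.setdefault(e["cc"], {"cc": e["cc"], "flag": e["flag"], "list": []})
        g["list"].append(e)
    return {"name": "root", "children": [
        {"name": g["cc"], "flag": g["flag"], "country": True,
         "children": [{"leaf": True, "domain": e["domain"], "flag": e["flag"],
                       "tier": e["tier"], "value": max(e["hits"], 1)}
                      for e in g["list"]]}
        for g in by_cc.values()]}
```

As in the JS, the first node seen for a domain fixes its tier, and each leaf gets `value = max(hits, 1)` so a zero-hit domain still receives a visible bubble, since `d3.pack` sizes circles by value.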
// ─── focus / detail panel ───
function focusNode(node, linkSel) {
if (node.kind !== 'tracker') { ndEl.hidden = true; return; }