Merge pull request #501 from CyberMind-FR/perf/501-banner-injection-quickwins

Perf/501 banner injection quickwins
docs: update WIP/HISTORY/TODO for Phase 10 banner perf + postinst fix (ref #501)
2026-07-01 17:17:14 +00:00 · 2026-06-09 07:50:43 +02:00 · 2026-06-09 07:14:26 +02:00 · 2026-06-09 07:11:07 +02:00 · 2026-06-09 06:40:42 +02:00 · 2026-06-09 06:27:40 +02:00
13 changed files with 667 additions and 73 deletions
--- a/.claude/HISTORY.md
+++ b/.claude/HISTORY.md
@@ -3,6 +3,100 @@
 ---
 ## 2026-06-09 — Phase 10 banner injection perf quick wins + postinst regression fix (ref #501)
 ### Package bumps
 | Package | from → to |
 |---|---|
 | secubox-toolbox | 2.5.0 → **2.5.1** (banner perf, deployed live) |
 | secubox-toolbox | 2.5.1 → **2.5.2** (postinst regression fix, committed, not yet deployed) |
 ### What landed
 **1. Banner injection quick wins** (`secubox-toolbox` 2.5.1, commit `ce059d0f`)
 User signal: "the banner only appears once the page has finished loading,
 and page loads are very slow". Four targeted changes:
 * `_host_signals(host)`: new LRU-cached function (maxsize=2048)
  returning `(app_emoji, app, flag, country, asn, status, status_icon)`.
  Re-hits cost a dict lookup instead of 5-50 ms (`classify_host` +
  `whitelist match` + GeoIP DNS+mmdb).
 * `_count_trackers_in_body()` removed from the hot path. The
  `is_tracker_host` flag (cheap regex on the request host) covers the
  privacy signal; dropping the full-body scan saves 30-200 ms on
  heavy publishers.
 * `_MAX_INJECT_BYTES = 2 MB`: skip injection on large bodies via a
  `Content-Length` pre-check plus a defensive `len(body)` guard for
  streamed bodies without CL.
 * The 🎯 N trackers (body) tile is removed; cookies + ⚠ tracker-host
  are kept.
 User confirmation after the gk2 deployment: "browsing performance
 on iPhone is better... perfect work".
 **2. Postinst regression fix** (`secubox-toolbox` 2.5.2, commit `15f48d9d`)
 Two silent regressions during the 2.5.0 → 2.5.1 deployment on gk2:
 * **kbin.gk2.secubox.in returned 503 for 5 min**: the dpkg upgrade SIGTERMed
  `secubox-toolbox.service` (FastAPI kbin landing) and never restarted
  it, because of `dh_installsystemd --no-start --no-enable` in
  `debian/rules`. Detected when the user reported "kbin 503".
 * **iPhone tunnel unusable**: postinst overwrote
  `/etc/nftables.d/secubox-toolbox-wg.nft` with the single-port DNAT
  version, wiping the Phase 9 fanout (which the operator had applied at
  runtime with `nft -f`, with no package-side persistence). Result: all
  R3 WG traffic pinned to worker@1 at 97 % CPU while w2-w4 sat idle.
  Detected when the user reported "browsing excessively slow".
 Postinst-only fixes:
 * Postinst now deploys `secubox-toolbox-wg-fanout.nft` as
  `/etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft`. The `zz-` prefix
  guarantees it sorts alphabetically after the base file in the include
  glob of `/etc/nftables.conf` → the base file creates the table + chains
  + the UDP 51820 input rule, then the zz drop-in flushes and repopulates
  `prerouting` with the numgen fanout map over ports 8081..8084.
 * On upgrade (`$2` set), `systemctl try-restart` runs on
  `secubox-toolbox.service`, `secubox-toolbox-mitm.service`, and the 4
  `secubox-toolbox-mitm-wg-worker@{1..4}.service` instances. `try-restart`
  is a no-op when a unit is not active, so it is safe on a fresh install.
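A short Python sketch can check the ordering argument. It assumes that the include glob is expanded in plain byte (C-locale) order. The temporary directory stands in for `/etc/nftables.d`. The sketch also shows why the prefix is needed. Without it, `secubox-toolbox-wg-fanout.nft` would sort *before* the base file, because `-` (0x2D) sorts before `.` (0x2E). Its `flush chain` would then run before the table exists.

```python
import tempfile
from pathlib import Path

# Names as they could appear in /etc/nftables.d (reproduced in a temp dir).
NAMES = [
    "secubox-toolbox-wg.nft",            # base file: table + chains + input rule
    "secubox-toolbox-wg-fanout.nft",     # fanout drop-in WITHOUT a prefix
    "zz-secubox-toolbox-wg-fanout.nft",  # fanout drop-in as postinst now installs it
]

with tempfile.TemporaryDirectory() as d:
    for name in NAMES:
        (Path(d) / name).touch()
    # Path.glob() yields files in arbitrary order; sort by code point,
    # which equals C-locale (byte) order for these ASCII names.
    order = sorted(p.name for p in Path(d).glob("*.nft"))

print(order)
# ['secubox-toolbox-wg-fanout.nft', 'secubox-toolbox-wg.nft', 'zz-secubox-toolbox-wg-fanout.nft']
```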
 ### Live mitigations applied on gk2 (2026-06-09)
 * `systemctl start secubox-toolbox.service`: restores the kbin landing.
 * `cp .../secubox-toolbox-wg-fanout.nft /etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft`
  + `systemctl reload nftables.service` + `systemctl restart
  secubox-toolbox-mitm-wg-worker@1.service` (to drop the sticky
  flows pinned to w1): restores the 4-worker fanout.
 ### Memory added
 * `feedback_nft_layered_dropins_persistence.md`: the Phase 9 fanout must
  sort AFTER its table-creator (zz- prefix); never symlink it in
  place of the base file.
 * `feedback_postinst_preserve_runtime_state.md`: dpkg upgrade SIGTERMs
  the unit; postinst must try-restart it and redeploy the nft drop-ins
  that were applied at runtime.
 ### Branch
 `perf/501-banner-injection-quickwins` pushed to origin (commits
 `ce059d0f` + `15f48d9d`). **No PR opened** by default (rule
 `feedback_no_unprompted_prs.md`).
 ### Next steps
 * Build + deploy `secubox-toolbox 2.5.2` on gk2 (postinst-only, no
  code change; wait for a maintenance window).
 * Open PR #501 when instructed.
 * Phase 10 future: refactor the banner to a JS-driven async approach
  (removes the buffer-read for ALL bodies, not only those < 2 MB).
 ---
 ## 2026-06-08 — Phase 7.E.x LXC hygiene + auth recovery + Phase 8 opening (ref #498, #500)
 ### Package bumps
--- a/.claude/TODO.md
+++ b/.claude/TODO.md
@@ -1,10 +1,35 @@
 # TODO — SecuBox-DEB Backlog
-*Updated: 2026-06-08*
+*Updated: 2026-06-09*
 ---
 ## 🔥 P0 — Immediate (in flight)
 ### Phase 10 — Banner injection perf (#501) — ✅ shipped 2026-06-09
 - [x] **Banner perf quick wins** (`secubox-toolbox` 2.5.1, commit `ce059d0f`)
  LRU `_host_signals` (2048), drop body tracker scan, 2 MB body cap,
  trim dead tile. Deployed live on gk2; the iPhone user confirms "better...
  perfect work".
 - [x] **Postinst regression fix** (`secubox-toolbox` 2.5.2, commit `15f48d9d`)
  Auto-deploy the fanout drop-in as `zz-`, try-restart on upgrade. Pushed
  to origin; committed only (not yet deployed).
 - [ ] **Build + deploy 2.5.2 on gk2**: postinst-only, wait for a maintenance
  window (do not disturb the stable iPhone session).
 - [ ] **Open PR #501** when the user asks (branch pushed:
  `perf/501-banner-injection-quickwins`).
 - [ ] **Phase 10 future**: JS-driven async banner refactor (removes the
  buffer-read for all bodies, not only those < 2 MB).
 ### Phase 9 — mitm-wg multi-worker fanout (#501) — ✅ shipped 2026-06-08
 - [x] **4-worker template + numgen DNAT fanout** (`secubox-toolbox`
  2.5.0, merged `89380a12`). Live numbers on gk2: CPU 68/44/50/54 % instead
  of ~90 % on a single worker.
 - [ ] **Phase 9.1 future**: a real filelock for
  `/var/lib/secubox/toolbox/mitm-bypass-dynamic.conf` (the 4-worker race is
  tolerable thanks to the launcher's `sort -u`, but a lock would be cleaner).
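Below is one possible shape for the Phase 9.1 filelock, sketched in Python. It rests on several assumptions. `append_bypass_line` is a hypothetical helper, and the bypass file is assumed to hold one entry per line. The lock is an advisory `fcntl.flock`, so it only works if every writer (all four workers) goes through it. The lock is held across the whole read-check-append sequence, which removes the duplicate-line race that `sort -u` currently cleans up at reload.

```python
import fcntl
import os
import tempfile


def append_bypass_line(path: str, entry: str) -> bool:
    """Append `entry` once under an exclusive advisory lock.

    Returns True if the line was written, False if it was already present.
    Race-free only if every writer takes the same lock.
    """
    entry = entry.strip()
    # "a+" creates the file if needed; writes always land at EOF (O_APPEND).
    with open(path, "a+", encoding="utf-8") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            f.seek(0)
            existing = {line.strip() for line in f.read().splitlines()}
            if entry in existing:
                return False
            f.write(entry + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before the lock is released
            return True
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


# Usage against a throwaway file (hypothetical entries).
demo_path = os.path.join(tempfile.mkdtemp(), "mitm-bypass-dynamic.conf")
first = append_bypass_line(demo_path, "example.org")
dupe = append_bypass_line(demo_path, "example.org")
other = append_bypass_line(demo_path, "other.example")
```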
 ### Phase 8 — Operator anti-tracking (Utiq) — issue #500
 Full plan documented in the issue. To implement:
@@ -33,16 +58,12 @@ Full plan documented in the issue. To implement:
  UDP 123 outbound works. Possibly an IPv6-only resolve that
  fails silently.
-### Phase 9 (future) — mitm-wg multi-instance dispatcher
-Only re-approachable with an architecture yet to be invented:
-
-- [ ] Evaluate a custom Python dispatcher that terminates wg-quick host-side
-  then routes by peer-id to a mitmproxy instance inside an LXC.
-- [ ] OR: wg-quick inside the privileged LXC with shared netfilter
-  access (kernel module loaded host-side, exposed via /proc).
-- [ ] Also evaluate an upstream mitmproxy patch for multi-peer
-  `--mode wireguard` (probably not in their scope).
+### Phase 9 ✅ shipped 2026-06-08 — see the P0 block above
+Initially considered approach (custom dispatcher, privileged LXC with
+shared netfilter), solved more simply with nft `numgen inc mod 4`,
+conntrack flow-pinning and a systemd template `@.service`.
+**#502 D redesign** captures the follow-up (captive → LXC TPROXY-inside).
 ---
--- a/.claude/WIP.md
+++ b/.claude/WIP.md
@@ -1,5 +1,103 @@
 # WIP — Work In Progress
-*Updated: 2026-06-08*
+*Updated: 2026-06-09*
 ---
 ## 🔄 2026-06-09 : Phase 10 — Banner injection perf quick wins + postinst regression fix (ref #501)
 User signal: "the banner only appears once the page has finished loading,
 and page loads are very slow. Can we improve this rendering without
 compromising the analysis... or lighten the analysis the banner
 needs?".
 Diagnosis: every HTML response triggered three host-stable operations
 with no cache, plus **a full-body regex scan** just to fill a
 "trackers: N" tile that nobody reads:
 * `_host_app.classify_host(host)`: walk over 100+ patterns (5-50 ms cold)
 * `_whitelist_mod.match(host)`: match against the whitelist
 * `_geo_mod.lookup(host)`: DNS + GeoIP mmdb (5-50 ms cold)
 * `_count_trackers_in_body(flow.response.content)`: full buffer-read
  of the body before injection → the banner only appeared once
  streaming had finished.
 ### ✅ Done — packages
 | Package | from → to | Commit |
 |---|---|---|
 | secubox-toolbox | 2.5.0 → **2.5.1** | `ce059d0f` (banner perf quick wins, deployed live) |
 | secubox-toolbox | 2.5.1 → **2.5.2** | `15f48d9d` (postinst regression fix, committed, not yet deployed) |
 Branch: `perf/501-banner-injection-quickwins` (pushed to origin, **no PR**).
 ### ✅ Done — details
 **1. Banner injection quick wins** (`secubox-toolbox` 2.5.1, commit `ce059d0f`)
 * `_host_signals(host)`: new LRU-cached function (maxsize=2048)
  returning a tuple `(app_emoji, app, flag, country, asn, status,
  status_icon)`. Re-hits cost a dict lookup instead of 5-50 ms.
 * `_count_trackers_in_body()` removed from the hot path. The
  `is_tracker_host` flag (cheap regex on the request host) covers the
  privacy signal.
 * `_MAX_INJECT_BYTES = 2 MB`: injection is skipped on large bodies
  via a `Content-Length` pre-check plus a defensive `len(body)` guard for
  streamed bodies without CL.
 * Cookie + ⚠ tracker-host tile kept; the dead 🎯 N trackers (body)
  tile is removed.
 **Live feedback**: on the gk2 deployment, the iPhone user confirms "browsing
 performance on iPhone is better... perfect work".
 **2. Postinst regression fix** (`secubox-toolbox` 2.5.2, commit `15f48d9d`)
 Two silent regressions during the 2.5.0 → 2.5.1 deployment on gk2:
 * **kbin.gk2.secubox.in returned 503 for 5 min**: the dpkg upgrade SIGTERMed
  `secubox-toolbox.service` and never restarted it
  (`dh_installsystemd --no-start --no-enable` in debian/rules).
 * **iPhone tunnel down**: postinst overwrote
  `/etc/nftables.d/secubox-toolbox-wg.nft` with the single-port
  DNAT version, wiping the Phase 9 fanout that the operator had applied
  at runtime → all traffic pinned to worker@1 at 97 % CPU.
 Postinst fixes (no code change):
 * Postinst now deploys the fanout drop-in as
  `/etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft`. The `zz-` prefix
  guarantees it sorts alphabetically after the base file in the include
  glob of `/etc/nftables.conf` → the base file creates the table + chains,
  then the zz drop-in flushes and repopulates the `prerouting` chain with
  the numgen fanout map.
 * On upgrade (`$2` set), `systemctl try-restart` runs on
  `secubox-toolbox.service`, `secubox-toolbox-mitm.service`, and the 4
  `secubox-toolbox-mitm-wg-worker@{1..4}.service` instances. `try-restart`
  is a no-op when a unit is not active, so it is safe on a fresh install.
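The flow pinning behind the `numgen` fanout map can be illustrated with a toy Python model. This is a simulation, not nftables code, and `FanoutNat` is hypothetical. The model relies on the fact that a `nat`-type chain is evaluated only for the first packet of a connection, after which later packets follow the conntrack entry. Each new flow advances the round-robin counter, while packets of an existing flow keep the port chosen at SYN. The addresses below are illustrative.

```python
from itertools import count

WORKER_PORTS = [8081, 8082, 8083, 8084]  # worker@1..4 listen ports


class FanoutNat:
    """Toy model: round-robin DNAT for new flows, conntrack pinning after."""

    def __init__(self, ports: list[int]) -> None:
        self.ports = ports
        self._counter = count()  # plays the role of `numgen inc`
        self.conntrack: dict[tuple, int] = {}  # flow tuple -> chosen port

    def route(self, flow: tuple) -> int:
        # Packets of a known flow reuse the stored translation; the nat
        # chain (and therefore the counter) is not consulted again.
        pinned = self.conntrack.get(flow)
        if pinned is not None:
            return pinned
        # New flow: `numgen inc mod N` picks the next worker in turn.
        port = self.ports[next(self._counter) % len(self.ports)]
        self.conntrack[flow] = port
        return port


nat = FanoutNat(WORKER_PORTS)
# Eight new TCP flows from one WG peer.
flows = [("10.99.1.2", 50000 + i, "203.0.113.7", 443, "tcp") for i in range(8)]
first_pass = [nat.route(f) for f in flows]
second_pass = [nat.route(f) for f in flows]  # later packets, same flows
print(first_pass)  # [8081, 8082, 8083, 8084, 8081, 8082, 8083, 8084]
```

The model makes the trade-off visible: load spreads only at connection boundaries. A single long-lived flow stays on its worker, which is why the live mitigation restarted worker@1 to drop the flows already pinned to it.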
 ### Memory updated
 * [nft layered drop-ins persistence](../memory/feedback_nft_layered_dropins_persistence.md):
  drop-ins must sort AFTER their table-creator; never
  symlink them in place of the base file.
 * [postinst must preserve runtime state](../memory/feedback_postinst_preserve_runtime_state.md):
  dpkg upgrade SIGTERMs the unit; postinst must try-restart it and redeploy
  the nft drop-ins applied at runtime.
 ### ⬜ Next up
 * **Build + deploy 2.5.2 on gk2** (postinst-only fix, no code change;
  wait for the next maintenance window so as not to disturb the currently
  stable iPhone session).
 * **PR #501** when ready (branch pushed but no PR opened, per the
  rule "Don't open PRs unprompted").
 * **Phase 10 future**: refactor banner injection toward a
  JS-driven async approach to eliminate the body buffer-read entirely
  (send a minimal `<script>` at the `<head>` position that fetches the
  banner content + context via XHR, then injects the DOM
  client-side). This would move the banner from "end of streaming" to
  "first paint" for ALL bodies, not only those < 2 MB.
 * **#502 D redesign**: captive → LXC with TPROXY-inside-LXC (still
  pending, not started).
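The following rough Python sketch covers the server-side half of the JS-driven idea. It inserts a small async loader right after the opening `<head>` tag of the first response chunk, instead of searching a fully buffered body for `</body>`. The loader URL `/__sbx/banner.js` and the `inject_loader` helper are hypothetical. The client-side script that would fetch the banner content is not shown.

```python
import re

# Opening <head> tag with optional attributes; requiring whitespace (or ">")
# right after "head" keeps <header> from matching.
_RE_HEAD_OPEN = re.compile(rb"<head(\s[^>]*)?>", re.IGNORECASE)

LOADER_SRC = "/__sbx/banner.js"  # hypothetical endpoint serving the banner script


def inject_loader(chunk: bytes, src: str = LOADER_SRC) -> tuple[bytes, bool]:
    """Insert an async <script> right after <head> in `chunk`.

    Returns (new_chunk, injected). Only the chunk containing <head> is
    touched, so the rest of the body could stream through unbuffered.
    """
    m = _RE_HEAD_OPEN.search(chunk)
    if m is None:
        return chunk, False
    tag = b'<script async src="' + src.encode("ascii") + b'"></script>'
    return chunk[: m.end()] + tag + chunk[m.end():], True


first_chunk = b"<!doctype html><html><head><title>x</title>"
out, injected = inject_loader(first_chunk)
print(out.decode())
# <!doctype html><html><head><script async src="/__sbx/banner.js"></script><title>x</title>
```

The sketch has two caveats. A `<head>` tag split across a chunk boundary would be missed unless the tail of the previous chunk is carried over. A strict Content-Security-Policy can also block the injected script; the current banner code already distinguishes a `csp_strict` case. In mitmproxy, this would hook into response streaming, where `flow.response.stream` can be set to a chunk-transforming callable. Check that behaviour against the mitmproxy version in use.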
 ---
--- a/packages/secubox-toolbox/debian/changelog
+++ b/packages/secubox-toolbox/debian/changelog
@@ -1,3 +1,115 @@
 secubox-toolbox (2.5.2-1~bookworm1) bookworm; urgency=medium
  * Phase 10.1 (#501 perf) — postinst regressions caught on 2.5.1 deploy.
    The 2.5.0 → 2.5.1 upgrade on gk2 produced TWO silent regressions
    that were not code bugs but deploy-script holes :
      - kbin.gk2.secubox.in returned 503 for 5 min : dpkg upgrade
        SIGTERMed secubox-toolbox.service and never restarted it
        (dh_installsystemd ships --no-start --no-enable in rules).
      - iPhone tunnel went from ~10s/page to unusable : postinst's
        `install -m 0644 .../secubox-toolbox-wg.nft /etc/nftables.d/`
        overwrote the operator-deployed fanout drop-in, regressing
        to single-port DNAT and pinning all traffic to worker@1
        (97% CPU while w2-w4 idle).
    Fixes :
      - postinst now also installs the Phase 9 fanout file as
        /etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft (zz- prefix
        so it sorts after the base file in the alphabetical glob
        include, overriding the prerouting chain with the numgen
        fanout map).
      - postinst now runs `systemctl try-restart` on upgrade ($2 set)
        for secubox-toolbox.service, secubox-toolbox-mitm.service,
        and the 4 mitm-wg-worker@{1..4}.service instances. try-restart
        is a no-op when the unit isn't running, so fresh installs are
        unaffected.
    No code change ; postinst only.
 -- Gerald KERMA <devel@cybermind.fr>  Tue, 09 Jun 2026 07:30:00 +0200
 secubox-toolbox (2.5.1-1~bookworm1) bookworm; urgency=medium
  * Phase 10 (#501 perf) — banner injection quick wins.
    User reported "the banner only appears once the page has finished
    loading, and page loads are very slow".  Root cause: every HTML
    response triggered (a) a full body buffer-read just to scan for
    tracker patterns, (b) un-cached _host_app.classify_host walk over
    100+ regexes, (c) un-cached _geo_mod.lookup hitting DNS + the
    GeoIP mmdb.  All three ran on every flow, all three were
    host-stable.  Quick wins :
      - LRU cache (maxsize=2048) on a new _host_signals(host) tuple
        returning (app_emoji, app, flag, country, asn, status,
        status_icon).  Re-hits cost a dict lookup instead of 5-50 ms.
      - Dropped the per-response body tracker scan
        (_count_trackers_in_body).  The 1st-party is_tracker_host
        flag (cheap regex on the request host) is the privacy-
        relevant signal ; the "trackers: N in body" count was a
        nice-to-have that delayed banner injection by 30-200 ms on
        big publishers.
      - Skip injection entirely when the response body exceeds 2 MB
        (cheap pre-flight check on Content-Length, plus a defensive
        len() guard for streamed bodies without CL).  Huge SPA
        bundles and PDF-as-HTML now flow through untouched.
      - Trimmed the cookie/tracker tile : the dead "🎯 N" body-scan
        branch is gone, only "🍪 N" + "⚠ tracker-host" remain.
    Net effect on a typical page : banner shows up at first paint
    instead of after the body finishes streaming, and TTFB for big
    HTML responses is no longer banner-bound.
    No protocol or schema change ; addon order is identical.
 -- Gerald KERMA <devel@cybermind.fr>  Tue, 09 Jun 2026 22:00:00 +0200
 secubox-toolbox (2.5.0-1~bookworm1) bookworm; urgency=medium
  * Phase 9 (#501) — multi-worker fanout for mitm-wg.
    Single-process mitm-wg saturated one ARM core at ~90 % even with
    just 2-3 active wg peers, limited by the Python GIL.  Phase 9
    spreads new TCP flows across N=4 worker instances :
      - systemd : new template secubox-toolbox-mitm-wg-worker@.service ;
        per-instance Environment=MITM_WG_LISTEN_PORT=808%i (8081..8084).
        Per-worker RuntimeMaxSec=3 h, MemoryMax=128M, TasksMax=128.
      - launcher : reads MITM_WG_LISTEN_PORT (default 8081 for legacy
        single-worker service).
      - nft : new drop-in nftables.d/secubox-toolbox-wg-fanout.nft
        replaces the prerouting chain with a numgen inc round-robin
        across 4 ports.  Conntrack pins each TCP flow to its initially
        assigned worker for the lifetime of the connection
        (sticky-per-flow ; rebalancing only at new connection).
      - opt-in : single-worker secubox-toolbox-mitm-wg.service stays
        shipped + functional.  Activation recipe in the worker unit's
        [Unit] description.
    Live numbers on gk2 with 2 active Linux peers + 1 iPhone :
      single  90-95 % CPU on 1 core (saturated)
      fanout  ~55 % avg per worker × 4, 0-70 % range (headroom)
    SQLite WAL on toolbox.db handles 4 concurrent writers ; the
    cert-pin auto-learning dynamic bypass file is the remaining race
    surface (4 writers can dupe a line, the launcher's sort -u
    de-dupes at next reload).  A real filelock lands in Phase 9.1.
 -- Gérald Kerma <devel@cybermind.fr>  Tue, 09 Jun 2026 04:27:27 +0000
 secubox-toolbox (2.4.3-1~bookworm1) bookworm; urgency=medium
  * Phase 8.2 perf (#500) — defensive performance work :
    - Captive mitm service flags now match the mitm-wg quick win of
      2.4.1 (--set http2=true / connection_strategy=eager /
      keep_host_header=true). No perceptible change today (the
      captive AP is down so the service idles at ~0 % CPU) but the
      moment the AP is reactivated the captive picks up the same
      ×4 CPU win the WG path got.
    - Addon SQLite writes (local_store + utiq) are now fire-and-
      forget through a singleton ThreadPoolExecutor. Each addon owns
      its own bg writer thread (sbx_store_write / sbx_utiq_write).
      The mitmproxy asyncio event loop never blocks on _conn() open
      / INSERT / fsync. Live diagnostic showed the actual mitm-wg
      bottleneck is mitmproxy itself (TLS termination + per-flow
      H/2 parsing) under multi-peer fan-in, not the addon writes ;
      the change is still warranted as defensive hygiene before
      shipping the Phase 9 multi-worker fanout that will benefit
      from non-blocking writes when 4 workers contend on the same
      SQLite file.
 -- Gérald Kerma <devel@cybermind.fr>  Tue, 09 Jun 2026 04:19:18 +0000
 secubox-toolbox (2.4.2-1~bookworm1) bookworm; urgency=medium
  * Landing page kbin.gk2.secubox.in: the 'Démo install R3' section
--- a/packages/secubox-toolbox/debian/postinst
+++ b/packages/secubox-toolbox/debian/postinst
@@ -112,10 +112,21 @@ fi
    # Phase 7 (#498) : install nft drop-in for boot-survival + reload nft so
    # the running ruleset matches what nftables.service will load at next boot.
    # Phase 10 (#501 perf) : ALSO deploy the Phase 9 fanout drop-in as
    # zz-prefixed so it loads AFTER the base file (alphabetical glob) and
    # overrides the prerouting chain with the 4-worker numgen DNAT.  The
    # base file creates the table + chains + 51820 input rule ; the zz
    # drop-in flushes prerouting and repopulates it with the fanout map.
    # Pre-2.5.2 operators who applied the fanout via runtime nft -f got it
    # silently wiped on every dpkg upgrade — fixed here (idempotent).
    if [ -f /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg.nft ]; then
      install -d -m 0755 /etc/nftables.d
      install -m 0644 /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg.nft \
        /etc/nftables.d/secubox-toolbox-wg.nft
      if [ -f /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg-fanout.nft ]; then
        install -m 0644 /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg-fanout.nft \
          /etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft
      fi
      if systemctl is-active --quiet nftables.service 2>/dev/null; then
        systemctl reload nftables.service 2>/dev/null \
          || /usr/sbin/nft -f /etc/nftables.d/secubox-toolbox-wg.nft 2>/dev/null \
@@ -142,6 +153,25 @@ fi
        systemctl restart unbound.service 2>/dev/null || true
      fi
    fi
    # Phase 10 (#501 perf) : on UPGRADE ($2 = previous version), try-restart
    # the long-running daemons so the new code is live without operator
    # intervention.  dh_installsystemd ships --no-start --no-enable in
    # debian/rules so without this loop secubox-toolbox.service stays dead
    # post-upgrade until reboot (caught 2026-06-09 : kbin.gk2 503'd for 5 min
    # after the 2.5.0 → 2.5.1 upgrade because the unit was SIGTERMed and
    # never restarted).  try-restart is a no-op when the unit is not
    # running, so it's safe on fresh install / unconfigured boards.
    if [ -n "${2:-}" ] && [ -d /run/systemd/system ]; then
      for unit in secubox-toolbox.service \
                  secubox-toolbox-mitm.service \
                  secubox-toolbox-mitm-wg-worker@1.service \
                  secubox-toolbox-mitm-wg-worker@2.service \
                  secubox-toolbox-mitm-wg-worker@3.service \
                  secubox-toolbox-mitm-wg-worker@4.service ; do
        systemctl try-restart "$unit" 2>/dev/null || true
      done
    fi
    ;;
 esac
--- a/packages/secubox-toolbox/debian/rules
+++ b/packages/secubox-toolbox/debian/rules
@@ -35,6 +35,11 @@ override_dh_installsystemd:
 	install -d debian/secubox-toolbox/lib/systemd/system/secubox-toolbox-mitm-wg.service.d
 	install -m 0644 systemd/secubox-toolbox-mitm-wg.service.d/10-runtime-max.conf \
 	  debian/secubox-toolbox/lib/systemd/system/secubox-toolbox-mitm-wg.service.d/
 	# Phase 9 (#501) : multi-worker fanout template — opt-in via
 	# systemctl enable @1..4.  See unit's [Unit] doc string for the
 	# activation + rollback recipe.
 	install -m 0644 systemd/secubox-toolbox-mitm-wg-worker@.service \
 	  debian/secubox-toolbox/lib/systemd/system/
 	# Primary unit goes via dh_installsystemd which also handles the enable helpers.
 	cp systemd/secubox-toolbox.service debian/secubox-toolbox.service
 	dh_installsystemd --no-start --no-enable
@@ -57,6 +62,10 @@ override_dh_strip:
 	install -d debian/secubox-toolbox/usr/share/secubox/toolbox/nftables.d
 	install -m 0644 nftables.d/secubox-toolbox-wg.nft \
 	  debian/secubox-toolbox/usr/share/secubox/toolbox/nftables.d/
 	# Phase 9 (#501) : fanout DNAT drop-in (opt-in).  Operator activates
 	# by symlinking /etc/nftables.d/secubox-toolbox-wg.nft → this file.
 	install -m 0644 nftables.d/secubox-toolbox-wg-fanout.nft \
 	  debian/secubox-toolbox/usr/share/secubox/toolbox/nftables.d/
 	install -m 0755 sbin/secubox-toolbox-wg-restore \
 	  debian/secubox-toolbox/usr/sbin/
 	install -m 0644 systemd/secubox-toolbox-wg-restore.service \
--- a/packages/secubox-toolbox/debian/secubox-toolbox-mitm.service
+++ b/packages/secubox-toolbox/debian/secubox-toolbox-mitm.service
@@ -19,6 +19,9 @@ ExecStart=/usr/bin/mitmdump \
    --set confdir=/etc/secubox/toolbox/mitm \
    --set ssl_insecure=false \
    --set web_open_browser=false \
    --set http2=true \
    --set connection_strategy=eager \
    --set keep_host_header=true \
    -s /usr/lib/secubox/toolbox/mitmproxy_addons/cookies.py \
    -s /usr/lib/secubox/toolbox/mitmproxy_addons/dpi.py \
    -s /usr/lib/secubox/toolbox/mitmproxy_addons/avatar.py \
--- a/packages/secubox-toolbox/mitmproxy_addons/inject_banner.py
+++ b/packages/secubox-toolbox/mitmproxy_addons/inject_banner.py
@@ -219,8 +219,84 @@ def _count_trackers_in_body(body: bytes, cap: int = 200_000) -> int:
        return 0
 # Phase 10 (#501 perf) — per-host context cache.
 #
 # `_compute_site_context` used to call into 4 sub-systems per response :
 # `_host_app.classify_host` (host → emoji + app via 100+ pattern walk),
 # `_whitelist_mod.match`, `_geo_mod.lookup` (DNS + GeoIP mmdb lookup),
 # plus the body tracker scan + cookie counts. Of those, ALL host-keyed
 # results are stable for hours — there's no point recomputing per
 # response. An LRU cache on `host` shaves ~20-50 ms per HTML response,
 # and shortens the time the event loop spends in banner injection, so
 # the rest of the addon chain runs sooner and less CPU is burnt per
 # response (which also leaves headroom for the other workers).
 import functools as _functools
@_functools.lru_cache(maxsize=2048)
 def _host_signals(host: str) -> tuple:
    """Return (app_emoji, app, flag, country, asn, status, status_icon).
    Tuple form keeps the cache key + value flat and hashable. We don't
    cache cookie counts (per-flow) or tracker scan (per-response) — only
    the truly host-stable values.
    A cache miss is the expensive path : classify_host (~50 µs typical,
    up to 5 ms on the hot patterns), whitelist match (~1 µs), and the
    GeoIP lookup (DNS + mmdb — can be 5-50 ms on a cold DNS cache).
    """
    app_emoji = "❔"
    app = host
    flag = ""
    country = ""
    asn = ""
    status = "inspected"
    status_icon = "🔍"
    if not _HAS_CLASSIFIERS:
        return (app_emoji, app, flag, country, asn, status, status_icon)
    try:
        cls = _host_app.classify_host(host)
        app_emoji = cls.get("emoji", "❔")
        app = cls.get("app", host) if cls.get("app") != "?" else host
    except Exception:
        pass
    try:
        wl = _whitelist_mod.match(host)
        if wl:
            status = "bypassed-whitelist"
            status_icon = "🛡"
        elif re.search(r"\.(signal|whispersystems|threema|simplex|matrix|proton|tutanota)\.", host):
            status = "e2e-opaque"
            status_icon = "🔐"
    except Exception:
        pass
    if _HAS_GEO:
        try:
            info = _geo_mod.lookup(host) or {}
            flag = info.get("flag", "")
            country = info.get("country_iso", "")
            asn = (info.get("asn_org") or "")[:24]
        except Exception:
            pass
    return (app_emoji, app, flag, country, asn, status, status_icon)
 def _compute_site_context(flow: http.HTTPFlow) -> dict:
-    """Compute per-site signals for the dynamic right-side of the banner."""
+    """Compute per-site signals for the dynamic right-side of the banner.
    Phase 10 (#501 perf) — host-stable signals come from the LRU cache
    (`_host_signals`). The body tracker scan is REMOVED ; the
    1st-party `is_tracker_host` flag already surfaces tracker hosts as
    `⚠ tracker-host` in the banner, which is the privacy-relevant
    signal. Counting trackers in the body required a full body buffer
    scan that delayed banner injection by 30-200 ms per HTML response
    on big publishers — not worth it for a "trackers: N" count that
    most users don't read.
    """
    host = (flow.request.host or "").lower()
    ctx = {
        "host": host[:50],
@@ -240,6 +316,10 @@ def _compute_site_context(flow: http.HTTPFlow) -> dict:
        "utiq_recent_count": 0,
    }
    # Host-stable signals — single LRU lookup per host.
    (ctx["app_emoji"], ctx["app"], ctx["flag"], ctx["country"], ctx["asn"],
     ctx["status"], ctx["status_icon"]) = _host_signals(host)
    # Cookies (cheap : just header counts, name-less for privacy)
    set_n, sent_n = _count_cookies(flow)
    ctx["cookies_set"] = set_n
@@ -259,43 +339,10 @@ def _compute_site_context(flow: http.HTTPFlow) -> dict:
    except Exception:
        pass
-    # Trackers : 1st-party host check + body scan
+    # Trackers : 1st-party host flag only — body scan removed in Phase 10
+    # for perceived-latency win. The banner still shows ⚠ tracker-host
+    # when the visited site is itself a known tracker domain.
     ctx["is_tracker_host"] = bool(_TRACKER_HOST_PATTERNS.match(host))
-    if flow.response and flow.response.content:
-        ctx["trackers"] = _count_trackers_in_body(flow.response.content)
-    if not _HAS_CLASSIFIERS:
-        return ctx
-    # App classification
-    try:
-        cls = _host_app.classify_host(host)
-        ctx["app_emoji"] = cls.get("emoji", "❔")
-        ctx["app"] = cls.get("app", host) if cls.get("app") != "?" else host
-    except Exception:
-        pass
-    # Whitelist / status
-    try:
-        wl = _whitelist_mod.match(host)
-        if wl:
-            ctx["status"] = "bypassed-whitelist"
-            ctx["status_icon"] = "🛡"
-        # E2E pattern check (cheap)
-        elif re.search(r"\.(signal|whispersystems|threema|simplex|matrix|proton|tutanota)\.", host):
-            ctx["status"] = "e2e-opaque"
-            ctx["status_icon"] = "🔐"
-    except Exception:
-        pass
-    # Geo (flag + country + ASN)
-    if _HAS_GEO:
-        try:
-            info = _geo_mod.lookup(host) or {}
-            ctx["flag"] = info.get("flag", "")
-            ctx["country"] = info.get("country_iso", "")
-            ctx["asn"] = (info.get("asn_org") or "")[:24]
-        except Exception:
-            pass
    # Quality grade (passive — we only see response headers + transport)
    try:
@@ -382,23 +429,17 @@ def _banner_html_dynamic(sha1: str, ctx: dict, csp_strict: bool,
        right_parts.append(ctx["flag"])
    if ctx["app_emoji"] and ctx["app"]:
        right_parts.append(f"{_ncr(ctx['app_emoji'])} {_ncr(ctx['app'])}")
-    # Phase 6.G : cookies + trackers (privacy signals)
+    # Phase 6.G : cookies + 1st-party tracker host (privacy signals).
+    # Phase 10 perf : the per-response tracker body scan is gone — we keep
+    # only the host-level flag (cheap regex on the request host).
     cookies_set = ctx.get("cookies_set", 0)
     cookies_sent = ctx.get("cookies_sent", 0)
-    trackers = ctx.get("trackers", 0)
     is_tracker = ctx.get("is_tracker_host", False)
     cookie_total = cookies_set + cookies_sent
     if cookie_total > 0:
-        # 🍪 N (set+sent) — colored if many
-        cookie_emoji = "&#x1F36A;"  # 🍪
-        right_parts.append(f"{cookie_emoji} {cookie_total}")
-    if trackers > 0 or is_tracker:
-        # 🎯 N trackers in body, or ⚠ if 1st-party host is itself a tracker
-        if is_tracker:
-            right_parts.append("&#x26A0; tracker-host")  # ⚠
-        else:
-            target_emoji = "&#x1F3AF;"  # 🎯
-            right_parts.append(f"{target_emoji} {trackers}")
+        right_parts.append(f"&#x1F36A; {cookie_total}")  # 🍪
+    if is_tracker:
+        right_parts.append("&#x26A0; tracker-host")  # ⚠
    # Phase 8 (#500) — surface Utiq hits for this client. Cheap query
    # against the utiq_events store (last 1 h). Avoids surfacing the
    # tile on stale state by capping the lookback window.
@@ -527,6 +568,9 @@ def _client_level(flow) -> str:
    return "r1"
 _MAX_INJECT_BYTES = 2 * 1024 * 1024  # Phase 10 perf cap : skip injection on huge bodies
 class InjectBanner:
    def response(self, flow: http.HTTPFlow) -> None:
        if not flow.response:
@@ -540,9 +584,21 @@ class InjectBanner:
        # AND R3 (portable WG opt-in). R0/R1 stay banner-free.
        if _client_level(flow) not in ("r2", "r3"):
            return
        # Phase 10 perf : cheap pre-flight check on Content-Length to avoid
        # reading multi-MB bodies into RAM just to discover we'd skip them.
        # `flow.response.content` would buffer the whole body before returning.
        try:
            cl = int(flow.response.headers.get("content-length", "0") or "0")
            if cl > _MAX_INJECT_BYTES:
                return
        except (TypeError, ValueError):
            pass
        body = flow.response.content
        if body is None or _GUARD in body:
            return
        if len(body) > _MAX_INJECT_BYTES:
            # Streamed bodies without content-length still get caught here.
            return
        m = _RE_BODY_CLOSE.search(body)
        if not m:
            return
--- a/packages/secubox-toolbox/mitmproxy_addons/local_store.py
+++ b/packages/secubox-toolbox/mitmproxy_addons/local_store.py
@@ -145,7 +145,15 @@ def _peer_ip(flow) -> str | None:
    return None
-def _insert(mac_hash: str | None, source: str, payload: dict) -> None:
+# Phase 8.B perf (#500) — fire-and-forget SQLite writes via a single
 # background thread so the mitmproxy asyncio event loop never blocks
 # on `fsync()`. Single worker keeps inserts ordered AND avoids SQLite
 # write contention (the engine itself serialises writers in WAL mode).
 import concurrent.futures as _futures
 _executor = _futures.ThreadPoolExecutor(max_workers=1, thread_name_prefix="sbx_store_write")
 def _insert_sync(mac_hash: str | None, source: str, payload: dict) -> None:
    if not mac_hash:
        return
    try:
@@ -163,6 +171,22 @@ def _insert(mac_hash: str | None, source: str, payload: dict) -> None:
        log.debug("sqlite insert failed: %s", e)
 def _insert(mac_hash: str | None, source: str, payload: dict) -> None:
    """Phase 8.B — submit the insert to the bg thread. Hook returns
    instantly ; the mitmproxy event loop keeps churning flows while
    the SQLite IO happens off-thread.
    Submit may raise RuntimeError if the executor was shut down during
    interpreter teardown ; we swallow that to keep the hook silent on
    shutdown."""
    if not mac_hash:
        return
    try:
        _executor.submit(_insert_sync, mac_hash, source, payload)
    except RuntimeError:
        pass
 # ──────────────── mitmproxy hooks ────────────────
 class LocalStore:
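The fire-and-forget pattern used by `_insert()` can be reproduced with the standard library alone. This is a minimal sketch, not the addon code. The `events` table, the in-memory database and the `all_rows` helper are assumptions for illustration. A single-worker `ThreadPoolExecutor` keeps writes in submission order and guarantees that one thread owns the SQLite connection.

```python
import concurrent.futures as futures
import sqlite3
from typing import List, Optional, Tuple

# A single worker thread: inserts run in submission order, and the same
# thread owns the connection (so sqlite3's same-thread check is satisfied).
_executor = futures.ThreadPoolExecutor(max_workers=1, thread_name_prefix="demo_store_write")
_conn: Optional[sqlite3.Connection] = None


def _db() -> sqlite3.Connection:
    # Lazily opened on the worker thread; ":memory:" keeps the demo self-contained.
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(":memory:")
        _conn.execute("CREATE TABLE events (mac_hash TEXT, source TEXT)")
    return _conn


def _insert_sync(mac_hash: str, source: str) -> None:
    with _db() as c:  # the connection context manager commits on success
        c.execute("INSERT INTO events VALUES (?, ?)", (mac_hash, source))


def insert(mac_hash: Optional[str], source: str) -> Optional[futures.Future]:
    """Hook-side entry point: returns at once, the write happens off-thread."""
    if not mac_hash:
        return None
    try:
        return _executor.submit(_insert_sync, mac_hash, source)
    except RuntimeError:
        return None  # executor already shut down: drop silently


def all_rows() -> List[Tuple[str, str]]:
    # Queued behind every earlier insert, so it observes all of them.
    return _executor.submit(lambda: _db().execute("SELECT * FROM events").fetchall()).result()
```

One consequence of this design is that an exception raised inside the worker function lands in the returned `Future` and is never seen by the hook, which is why the addon's `_insert_sync()` catches and logs failures itself.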
--- a/packages/secubox-toolbox/nftables.d/secubox-toolbox-wg-fanout.nft
+++ b/packages/secubox-toolbox/nftables.d/secubox-toolbox-wg-fanout.nft
@@ -0,0 +1,53 @@
 # SPDX-License-Identifier: LicenseRef-CMSD-1.0
 # Phase 9 (#501) — multi-worker fanout drop-in for the R3 wg tunnel mitm.
 #
 # REPLACES the prerouting rules from secubox-toolbox-wg.nft :
 #   iif wg-toolbox tcp dport 443 dnat ip to 10.99.1.1:8081  (single port)
 # with a round-robin numgen mapping to ports 8081..8084.
 #
 # Why numgen inc and not jhash : nftables 1.0.6 (Debian bookworm) doesn't
 # support `jhash` in numgen yet (lands in 1.0.7+).  `inc` is round-robin
 # per-rule-evaluation, but conntrack pins the chosen DNAT translation for
 # the lifetime of the TCP flow — so each individual TCP connection sees
 # exactly one worker from SYN to FIN.  Re-balancing happens only between
 # connections, which is exactly what we want.
 #
 # Applying at boot : the postinst installs this file next to the
 # single-worker drop-in, and the operator picks which one
 # nftables.service loads via a symlink at
 # /etc/nftables.d/secubox-toolbox-wg.nft.
 flush chain inet wg-toolbox prerouting
 table inet wg-toolbox {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
        # Phase 9 (#501) — 4-worker round-robin DNAT.  numgen returns
        # 0..3 ; the map sends each to one of the 4 worker ports on
        # 10.99.1.1.  Conntrack pins the choice for the whole flow.
        iif "wg-toolbox" tcp dport 443 dnat ip to 10.99.1.1 \
            : numgen inc mod 4 map {
                0 : 8081,
                1 : 8082,
                2 : 8083,
                3 : 8084
            }
        iif "wg-toolbox" tcp dport 80 dnat ip to 10.99.1.1 \
            : numgen inc mod 4 map {
                0 : 8081,
                1 : 8082,
                2 : 8083,
                3 : 8084
            }
        # Phase 7 (#498) — DNS DNAT for legacy peer configs that hand out
        # DNS = 10.99.0.1.  Single target — these queries are tiny and
        # don't need worker fanout.
        iif "wg-toolbox" ip daddr 10.99.0.1 udp dport 53 dnat ip to 10.99.1.1:53
        iif "wg-toolbox" ip daddr 10.99.0.1 tcp dport 53 dnat ip to 10.99.1.1:53
        # Phase 7 (#498) — captive-portal HTTP probe from the R3
        # verification page.
        iif "wg-toolbox" ip daddr 10.99.0.1 tcp dport 8088 dnat ip to 10.99.1.1:8088
    }
 }
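Why `numgen inc` plus conntrack gives per-connection stickiness can be shown with a toy Python model. It is a simplification, not nftables internals. The counter advances only when a new connection's first packet traverses the nat prerouting chain, and a dict stands in for the conntrack entry that pins the chosen port. `ToyNat` and `FlowKey` are hypothetical names.

```python
from itertools import count
from typing import Dict, Tuple

# Same index -> port map as the nft rule.
WORKER_PORTS = {0: 8081, 1: 8082, 2: 8083, 3: 8084}

# (source ip, source port, destination ip, destination port)
FlowKey = Tuple[str, int, str, int]


class ToyNat:
    """Toy model of `dnat ... : numgen inc mod 4 map {...}` plus conntrack."""

    def __init__(self) -> None:
        self._counter = count()  # numgen inc: 0, 1, 2, ...
        self._conntrack: Dict[FlowKey, int] = {}

    def packet(self, flow: FlowKey) -> int:
        # Only the first packet of a connection is evaluated by the nat
        # chain; every later packet reuses the pinned translation.
        if flow not in self._conntrack:
            self._conntrack[flow] = WORKER_PORTS[next(self._counter) % 4]
        return self._conntrack[flow]
```

Because the counter only moves on new connections, long-lived connections are never rebalanced: a worker that happens to hold several long sessions stays busier until they close, which matches the "re-balancing happens only between connections" note above.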
--- a/packages/secubox-toolbox/sbin/secubox-toolbox-mitm-wg-launch
+++ b/packages/secubox-toolbox/sbin/secubox-toolbox-mitm-wg-launch
@@ -45,11 +45,15 @@ fi
 # Phase 7 (#498) — listen-host is overridable via env. Host (default) binds
 # 10.99.1.1 (the wg-toolbox interface IP) ; LXC variant sets 0.0.0.0 so it
 # accepts the DNAT'd traffic on the 10.100.0.62 br-lxc interface.
 # Phase 9 (#501) — listen-port is overridable too. Each fanout worker
 # instance (secubox-toolbox-mitm-wg-worker@N) sets MITM_WG_LISTEN_PORT
 # to 808N. The legacy single-process service keeps the 8081 default.
 MITM_WG_LISTEN_HOST="${MITM_WG_LISTEN_HOST:-10.99.1.1}"
 MITM_WG_LISTEN_PORT="${MITM_WG_LISTEN_PORT:-8081}"
 ARGS=(
    --mode transparent
    --listen-host "$MITM_WG_LISTEN_HOST"
-    --listen-port 8081
+    --listen-port "$MITM_WG_LISTEN_PORT"
    --set confdir=/etc/secubox/toolbox/ca-wg
    --set ssl_insecure=false
    --set web_open_browser=false
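The launcher falls back with bash's `${VAR:-default}`, which substitutes the default when the variable is unset *or empty*. If the same fallback is ever mirrored in Python, `os.environ.get(name, default)` is not equivalent because it only covers the unset case. A minimal sketch of matching semantics; `env_or_default` and `listen_port` are hypothetical helpers, not part of the package.

```python
import os
from typing import Mapping, Optional


def env_or_default(name: str, default: str,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Python equivalent of bash ${NAME:-default}: unset OR empty -> default."""
    env = os.environ if env is None else env
    return env.get(name) or default


def listen_port(env: Optional[Mapping[str, str]] = None) -> int:
    # Mirrors MITM_WG_LISTEN_PORT="${MITM_WG_LISTEN_PORT:-8081}"
    return int(env_or_default("MITM_WG_LISTEN_PORT", "8081", env))
```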
--- a/packages/secubox-toolbox/secubox_toolbox/utiq.py
+++ b/packages/secubox-toolbox/secubox_toolbox/utiq.py
@@ -79,17 +79,15 @@ def _publisher_from_host(host: str) -> str:
    return h or "unknown"
-def record_event(
-    *,
-    client_ip: Optional[str],
-    host: str,
-    path: Optional[str],
-    action: str,
-    level: str,
-    detected_mtid: Optional[str] = None,
-    injected_mtid: Optional[str] = None,
-) -> None:
+# Phase 8.B perf (#500) — fire-and-forget SQLite writes via single
+# background thread (matches local_store.py pattern). Mitmproxy's
+# asyncio event loop never blocks on _conn() open + INSERT + fsync.
+import concurrent.futures as _futures
+_executor = _futures.ThreadPoolExecutor(max_workers=1, thread_name_prefix="sbx_utiq_write")
+
+
+def _record_sync(client_ip, host, path, action, level,
+                 detected_mtid, injected_mtid) -> None:
    """Insert one event.  Best-effort — never raises into the addon."""
    try:
        with _conn() as c:
            c.execute(
@@ -112,6 +110,26 @@ def record_event(
        log.warning("record_event failed: %s", e)
 def record_event(
    *,
    client_ip: Optional[str],
    host: str,
    path: Optional[str],
    action: str,
    level: str,
    detected_mtid: Optional[str] = None,
    injected_mtid: Optional[str] = None,
 ) -> None:
    """Insert one event off-thread. Best-effort — never raises into
    the addon, never blocks the mitmproxy asyncio loop."""
    try:
        _executor.submit(_record_sync, client_ip, host, path, action,
                         level, detected_mtid, injected_mtid)
    except RuntimeError:
        # Executor shut down (interpreter teardown) — silent drop.
        pass
 def recent(hours: int = 24, limit: int = 200) -> List[Dict]:
    """Return the last events within the window, newest first."""
    since = int(time.time()) - hours * 3600
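The `except RuntimeError` branch in `record_event()` covers submissions that arrive after the executor has been shut down, which is what `ThreadPoolExecutor.submit()` raises in that state. A small standalone demonstration; the `record` wrapper and the `written` list are hypothetical, written in the same style as `record_event()`.

```python
import concurrent.futures as futures

executor = futures.ThreadPoolExecutor(max_workers=1, thread_name_prefix="demo_utiq_write")
written = []


def record_sync(event: str) -> None:
    written.append(event)


def record(event: str) -> bool:
    """Submit off-thread; report whether the event was queued."""
    try:
        executor.submit(record_sync, event)
        return True
    except RuntimeError:
        # "cannot schedule new futures after shutdown": drop silently,
        # as record_event() does during interpreter teardown.
        return False
```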
--- a/packages/secubox-toolbox/systemd/secubox-toolbox-mitm-wg-worker@.service
+++ b/packages/secubox-toolbox/systemd/secubox-toolbox-mitm-wg-worker@.service
@@ -0,0 +1,72 @@
 # SPDX-License-Identifier: LicenseRef-CMSD-1.0
 # Phase 9 (#501) — multi-worker fanout for the R3 wg tunnel mitm.
 #
 # Why : on gk2 the single-process mitm-wg saturates one ARM core at
 # ~90 % under just 2-3 concurrently-active wg peers.  The Python GIL
 # caps real parallelism inside a single mitmproxy process.  Phase 9
 # runs N=4 worker instances (8081..8084) and lets nft DNAT spread
 # new TCP connections evenly across them via `numgen inc mod 4`,
 # which is sticky-per-connection (the conntrack entry locks the
 # translation for the lifetime of the flow).
 #
 # Each %i ∈ {1..4} → listen on 808%i .  Activate with :
 #
 #   systemctl enable --now secubox-toolbox-mitm-wg-worker@{1,2,3,4}.service
 #   nft -f /etc/nftables.d/secubox-toolbox-wg-fanout.nft
 #   systemctl disable --now secubox-toolbox-mitm-wg.service    # retire single
 #
 # Rollback (single-process) :
 #
 #   systemctl disable --now secubox-toolbox-mitm-wg-worker@{1,2,3,4}.service
 #   nft -f /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg.nft
 #   systemctl enable --now secubox-toolbox-mitm-wg.service
 #
 # State coherence : all 4 workers share /var/lib/secubox/toolbox/toolbox.db
 # (WAL mode ; SQLite serialises the concurrent writers).  Cert-pin
 # auto-learning's dynamic bypass file is the one source of contention
 # left (4 writers race on /var/lib/secubox/toolbox/mitm-bypass-dynamic.conf) ;
 # the .path watcher already debounces for 10 s before the reload-restart,
 # so the worst case is a duplicate line added, then deduped by the
 # launcher's sort -u pipeline.  Acceptable for the Phase 9 ship ; a real
 # filelock lands in 9.1.
 [Unit]
 Description=SecuBox ToolBoX MITM WireGuard worker %i (R3 fanout port 808%i)
 After=network.target wg-quick@wg-toolbox.service
 Wants=wg-quick@wg-toolbox.service
 Documentation=https://github.com/CyberMind-FR/secubox-deb/issues/501
 [Service]
 Type=simple
 User=secubox-toolbox
 Group=secubox-toolbox
 WorkingDirectory=/usr/lib/secubox/toolbox
 # Phase 9 — per-instance port.  systemd's %i is the instance number.
 Environment="MITM_WG_LISTEN_HOST=10.99.1.1"
 Environment="MITM_WG_LISTEN_PORT=808%i"
 ExecStart=/usr/sbin/secubox-toolbox-mitm-wg-launch
 Restart=on-failure
 RestartSec=5
 # Same hygiene cycle as the single-process unit : 3 h recycle per
 # worker via RuntimeMaxSec.  Nothing staggers the recycles yet
 # (RandomizedDelaySec on a timer would be cleaner) ; we just accept
 # that the 4 workers will all recycle at boot+3h with a brief 5 s
 # downtime each (RestartSec=5), mitigated by the others still serving
 # traffic.
 # Memory envelope per worker : 4x the single-process budget, split
 # evenly, gives 100 MB each (MemoryHigh=100M).  Real-world per-worker
 # RSS sits at ~60-80 MB, so MemoryMax=128M gives a sane upper bound.
 MemoryHigh=100M
 MemoryMax=128M
 # Resource isolation between workers.  Without it, one runaway
 # worker can drag the others.
 TasksMax=128
 [Install]
 WantedBy=multi-user.target
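The unit comment defers a real file lock on `mitm-bypass-dynamic.conf` to Phase 9.1. One possible shape for it, sketched with POSIX advisory locks from the standard library (`fcntl.flock`, Linux/Unix only), assuming one bypass entry per line as the launcher's `sort -u` pass implies. `append_bypass_line` is a hypothetical helper, not the project's implementation, and advisory locks only help if every writer takes them.

```python
import fcntl
import os


def append_bypass_line(path: str, entry: str) -> bool:
    """Append `entry` as one line unless it is already present.

    Hypothetical helper. The exclusive flock() makes read-check-append a
    single critical section across processes, so two workers learning the
    same host at once cannot both append it. Returns True if it wrote.
    """
    entry = entry.strip()
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            f.seek(0)
            content = f.read()
            if entry in {line.strip() for line in content.splitlines()}:
                return False
            # Keep one entry per line even if the last line lacks "\n".
            prefix = "" if not content or content.endswith("\n") else "\n"
            f.seek(0, os.SEEK_END)
            f.write(prefix + entry + "\n")
            f.flush()
            return True
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

With the check and the append under the same lock, the `sort -u` pass becomes a safety net rather than the only defence against duplicates.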
Author	SHA1	Message	Date
CyberMind	2b9a060561	Merge pull request #501 from CyberMind-FR/perf/501-banner-injection-quickwins	2026-06-09 07:50:43 +02:00
    Perf/501 banner injection quickwins
CyberMind-FR	53e819cd2d	docs: update WIP/HISTORY/TODO for Phase 10 banner perf + postinst fix (ref #501 )	2026-06-09 07:14:26 +02:00
    - WIP.md: new 2026-06-09 entry covering 2.5.1 (banner perf) + 2.5.2 (postinst regression fix) with diagnosis, packages, memory links and next-up.
    - HISTORY.md: companion timestamped entry with live mitigations applied on gk2 today.
    - TODO.md: mark Phase 10 + Phase 9 shipped in P0; remove obsolete "Phase 9 (futur)" dispatcher block (resolved by nft numgen fanout).
CyberMind-FR	15f48d9d4b	fix(toolbox): postinst preserves Phase 9 fanout + try-restart on upgrade (ref #501 )	2026-06-09 07:11:07 +02:00
    Two silent regressions surfaced during the 2.5.0 → 2.5.1 deploy on gk2:

    1. kbin.gk2.secubox.in 503 for 5 min after upgrade. dpkg SIGTERMed secubox-toolbox.service and never restarted it because dh_installsystemd --no-start --no-enable is set in debian/rules.
    2. iPhone tunnel went from ~10 s/page to unusable. postinst overwrote the operator-deployed fanout nft drop-in (/etc/nftables.d/secubox-toolbox-wg.nft) with the pristine single-port-DNAT version, pinning all R3 traffic to worker@1 (97 % CPU while w2-w4 sat idle).

    Fixes — postinst only, no code change:

    - debian/postinst: ALWAYS install the Phase 9 fanout drop-in as /etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft. zz- prefix sorts after the base file in /etc/nftables.conf's alphabetical glob include, so the base file creates the table + chains and the zz drop-in then flushes the prerouting chain and repopulates it with the numgen mod 4 map across ports 8081..8084.
    - debian/postinst: on UPGRADE ($2 set), try-restart the long-running units (secubox-toolbox, secubox-toolbox-mitm, and the 4 worker instances). try-restart is a no-op when the unit isn't running, so fresh installs / unconfigured boards are unaffected.
    - debian/changelog: bump to 2.5.2-1~bookworm1.
CyberMind-FR	ce059d0f9d	perf(toolbox): banner injection quick wins, drop body scan + LRU host cache (ref #501 )	2026-06-09 06:40:42 +02:00
    User reported the banner only appeared at end-of-load and pages were very slow. Three host-stable lookups (classify_host, whitelist match, GeoIP) ran on every flow without caching, and a full-body regex scan ran just to count "trackers in body" for a tile most users don't read.

    - inject_banner.py: new _host_signals(host) returning a 7-tuple, cached @functools.lru_cache(maxsize=2048). Re-hit cost goes from 5-50 ms to a dict lookup.
    - inject_banner.py: drop _count_trackers_in_body() call from the response hot path. is_tracker_host flag (cheap host regex) keeps the privacy signal.
    - inject_banner.py: _MAX_INJECT_BYTES = 2 MB cap on response() via Content-Length pre-flight + defensive len() guard.
    - inject_banner.py: drop the now-dead "🎯 N body trackers" tile branch ; the cookie + ⚠ tracker-host tiles stay.

    Net: banner shows at first paint, TTFB on big HTML responses no longer banner-bound.

    - debian/changelog: bump to 2.5.1-1~bookworm1.
CyberMind-FR	89380a121a	Merge feature/501-mitm-wg-multi-worker-fanout : Phase 9 mitm-wg multi-worker fanout (ref #501 )	2026-06-09 06:27:40 +02:00
CyberMind-FR	c17810e1f0	feat(toolbox): Phase 9 multi-worker fanout for mitm-wg (ref #501 )	2026-06-09 06:27:27 +02:00
    The single-process mitm-wg saturated one ARM core at ~90 % under just 2-3 active wg peers because the Python GIL caps real parallelism inside one mitmproxy process. No single-process tweak moves that needle further than Phase 8.1 already did (CPU 65 % → 12 % at idle ; at multi-peer load we're back to 90 %+).

    Phase 9 ships 4 mitm worker instances and lets nft round-robin distribute new TCP flows across them via `numgen inc mod 4`. Conntrack pins each flow's DNAT translation for its lifetime, so a given TCP connection sees exactly one worker from SYN to FIN — sticky-per-flow without needing nftables 1.0.7+ jhash support (Debian bookworm ships 1.0.6).

    What's shipped :

      systemd/secubox-toolbox-mitm-wg-worker@.service
        Template unit ; per-instance Environment=MITM_WG_LISTEN_PORT=808%i (8081..8084). Per-worker RuntimeMaxSec=3h, MemoryMax=128M, TasksMax=128, User=secubox-toolbox.

      sbin/secubox-toolbox-mitm-wg-launch
        Now reads MITM_WG_LISTEN_PORT (default 8081 for the legacy single-worker service).

      nftables.d/secubox-toolbox-wg-fanout.nft
        Replaces the single-port DNAT rules with a numgen-inc round-robin map to 4 ports. DNS + captive-portal DNAT rules stay untouched (small queries, no benefit from fanout).

      debian/rules
        Installs both the worker template and the fanout nft drop-in next to the existing single-worker artifacts.

    Activation (operator-initiated) :

      systemctl disable --now secubox-toolbox-mitm-wg.service
      systemctl enable --now secubox-toolbox-mitm-wg-worker@{1,2,3,4}.service
      ln -sf /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg-fanout.nft \
             /etc/nftables.d/secubox-toolbox-wg.nft
      nft -f /etc/nftables.d/secubox-toolbox-wg.nft

    Rollback : reverse the steps above ; the legacy single-worker service and its single-port nft drop-in remain shipped + functional.

    Live numbers on gk2 with the workload active during the cut-over (2 Linux peers + 1 iPhone, dozens of concurrent flows) :

      before  one process @ 90-95 % CPU on a single core, saturated
      after   ~55 % avg per worker × 4, 0-70 % range, all cores below saturation

    The Phase 8.B addon-write fire-and-forget pattern shipped in 2.4.3 becomes load-bearing here : SQLite WAL on toolbox.db handles 4 concurrent writers cleanly because each worker's _executor serialises its own writes, and SQLite's writer mutex handles the inter-worker contention with no event-loop stalls.

    Known limitation : the cert-pin auto-learning dynamic bypass file is the remaining race surface (4 writers can dupe a line under burst, the launcher's sort -u de-dupes at next reload). A real filelock lands in Phase 9.1.
CyberMind-FR	06e73d39bd	Merge perf/500-captive-flags-and-addon-async : captive flags + addon async writes (ref #500 )	2026-06-09 06:19:30 +02:00
CyberMind-FR	d3fbf174c0	perf(toolbox): captive mitm flag symmetry + addon writes off-thread (ref #500 )	2026-06-09 06:19:19 +02:00
    A — Captive mitm-toolbox (secubox-toolbox-mitm.service) gets the same three flags as mitm-wg picked up in 2.4.1 :

        --set http2=true
        --set connection_strategy=eager
        --set keep_host_header=true

    The captive idles at ~0 % CPU right now (wlan AP is down), so no visible change today. When the AP is reactivated the captive will inherit the same ×4 CPU win the WG path saw — and it stays in symmetry with mitm-wg so future tweaks land on both.

    B — Addon SQLite writes are now fire-and-forget via singleton ThreadPoolExecutor (max_workers=1, thread_name_prefix=…) :

        local_store._insert  → submitted to sbx_store_write
        utiq.record_event    → submitted to sbx_utiq_write

    The hook returns instantly ; the _conn() open + INSERT + fsync chain runs on the bg thread. No more event-loop stalls during peer flow processing.

    Live diagnostic on gk2 surfaced the actual bottleneck while shipping this : one Linux PC peer (10.99.1.60) was generating ~3 client connects/sec sustained, the second Linux PC (10.99.1.47) running in parallel, and mitm-wg processing ~100 concurrent TLS sessions. At that level the CPU is consumed by mitmproxy itself (TLS termination + per-flow H/2 stream parsing under the Python GIL), NOT by the addon writes — so A and B don't move the needle today. They remain shipped as defensive hygiene before Phase 9 multi-worker fanout : when 4 mitm workers contend on the same SQLite file, the fire-and-forget pattern matters.