Compare commits


4 Commits

Author SHA1 Message Date
CyberMind
2b9a060561
Merge pull request #501 from CyberMind-FR/perf/501-banner-injection-quickwins
Perf/501 banner injection quickwins
2026-06-09 07:50:43 +02:00
53e819cd2d docs: update WIP/HISTORY/TODO for Phase 10 banner perf + postinst fix (ref #501)
  - WIP.md: new 2026-06-09 entry covering 2.5.1 (banner perf) + 2.5.2
    (postinst regression fix) with diagnosis, packages, memory links
    and next-up.
  - HISTORY.md: companion timestamped entry with live mitigations applied
    on gk2 today.
  - TODO.md: mark Phase 10 + Phase 9 shipped in P0; remove obsolete
    "Phase 9 (future)" dispatcher block (resolved by nft numgen fanout).
2026-06-09 07:14:26 +02:00
15f48d9d4b fix(toolbox): postinst preserves Phase 9 fanout + try-restart on upgrade (ref #501)
Two silent regressions surfaced during the 2.5.0 → 2.5.1 deploy on gk2:

  1. kbin.gk2.secubox.in 503 for 5 min after upgrade.  dpkg SIGTERMed
     secubox-toolbox.service and never restarted it because
     dh_installsystemd --no-start --no-enable is set in debian/rules.

  2. iPhone tunnel went from ~10 s/page to unusable.  postinst overwrote
     the operator-deployed fanout nft drop-in
     (/etc/nftables.d/secubox-toolbox-wg.nft) with the pristine
     single-port-DNAT version, pinning all R3 traffic to worker@1
     (97 % CPU while w2-w4 sat idle).

Fixes — postinst only, no code change:

  - debian/postinst: ALWAYS install the Phase 9 fanout drop-in as
    /etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft.  zz- prefix sorts
    after the base file in /etc/nftables.conf's alphabetical glob
    include, so the base file creates the table + chains and the zz
    drop-in then flushes the prerouting chain and repopulates it with
    the numgen mod 4 map across ports 8081..8084.

  - debian/postinst: on UPGRADE ($2 set), try-restart the long-running
    units (secubox-toolbox, secubox-toolbox-mitm, and the 4 worker
    instances).  try-restart is a no-op when the unit isn't running,
    so fresh installs / unconfigured boards are unaffected.

  - debian/changelog: bump to 2.5.2-1~bookworm1.
2026-06-09 07:11:07 +02:00
ce059d0f9d perf(toolbox): banner injection quick wins, drop body scan + LRU host cache (ref #501)
User reported the banner only appeared at end-of-load and pages were
very slow.  Three host-stable lookups (classify_host, whitelist match,
GeoIP) ran on every flow without caching, and a full-body regex scan
ran just to count "trackers in body" for a tile most users don't read.

  - inject_banner.py: new _host_signals(host) returning a 7-tuple,
    cached @functools.lru_cache(maxsize=2048).  Re-hit cost goes
    from 5-50 ms to a dict lookup.
  - inject_banner.py: drop _count_trackers_in_body() call from the
    response hot path.  is_tracker_host flag (cheap host regex)
    keeps the privacy signal.
  - inject_banner.py: _MAX_INJECT_BYTES = 2 MB cap on response()
    via Content-Length pre-flight + defensive len() guard.
  - inject_banner.py: drop the now-dead "🎯 N body trackers" tile
    branch; the cookie + ⚠ tracker-host tiles stay.

Net: banner shows at first paint, TTFB on big HTML responses no
longer banner-bound.

  - debian/changelog: bump to 2.5.1-1~bookworm1.
2026-06-09 06:40:42 +02:00
6 changed files with 419 additions and 60 deletions


@@ -3,6 +3,100 @@
---
## 2026-06-09 — Phase 10 banner injection perf quick wins + postinst regression fix (ref #501)
### Package bumps
| Package | from → to |
|---|---|
| secubox-toolbox | 2.5.0 → **2.5.1** (banner perf, deployed live) |
| secubox-toolbox | 2.5.1 → **2.5.2** (postinst regression fix, code-only) |
### What landed
**1. Banner injection quick wins** (`secubox-toolbox` 2.5.1, commit `ce059d0f`)
User signal: "the banner only appears at the end of loading and page
loads are very slow". Four targeted changes:
* `_host_signals(host)`: new LRU-cached function (maxsize=2048)
returning `(app_emoji, app, flag, country, asn, status, status_icon)`.
Re-hits cost a dict lookup instead of 5-50 ms (`classify_host` +
`whitelist match` + GeoIP DNS+mmdb).
* `_count_trackers_in_body()` removed from the hot path. The
`is_tracker_host` flag (cheap regex on the request host) covers the
privacy signal; dropping the full-body scan saves 30-200 ms on heavy
publishers.
* `_MAX_INJECT_BYTES = 2 MB`: skip injection on large bodies via a
`Content-Length` pre-check plus a defensive `len(body)` guard for
streamed bodies without CL.
* The 🎯 N trackers (body) tile is removed; the cookie and ⚠ tracker-host
tiles are kept.
User confirmation after the gk2 deploy: "browsing performance
on iPhone is better... perfect work".
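A minimal sketch of the host-keyed caching idea behind `_host_signals`, using the standard `functools.lru_cache`. `_slow_lookup`, `host_signals` and the `CALLS` counter are hypothetical stand-ins for the real `classify_host` + whitelist + GeoIP calls, which are not reproduced here.

```python
import functools

CALLS = {"n": 0}  # counts how often the expensive path actually runs


def _slow_lookup(host: str) -> tuple:
    """Hypothetical stand-in for classify_host + whitelist match + GeoIP."""
    CALLS["n"] += 1
    return ("", host, "", "", "", "inspected", "🔍")


@functools.lru_cache(maxsize=2048)
def host_signals(host: str) -> tuple:
    # Keyed on the (already lower-cased) host: only the first flow to a
    # host pays for the lookup; later flows get the cached tuple.
    return _slow_lookup(host)


for h in ["example.com", "example.com", "cdn.example.net", "example.com"]:
    host_signals(h)

info = host_signals.cache_info()
print(CALLS["n"], info.hits, info.misses)  # → 2 2 2
```

Returning an immutable tuple matters here: `lru_cache` hands the same object to every caller, so a mutable dict could be modified by one flow and leak into the next. The cache has no TTL; a whitelist or GeoIP database update only becomes visible once the entry is evicted, the process restarts, or `host_signals.cache_clear()` is called.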
**2. Postinst regression fix** (`secubox-toolbox` 2.5.2, commit `15f48d9d`)
Two silent regressions during the 2.5.0 → 2.5.1 deploy on gk2:
* **kbin.gk2.secubox.in 503 for 5 min**: the dpkg upgrade SIGTERMed
`secubox-toolbox.service` (FastAPI kbin landing) and never
restarted it, because of `dh_installsystemd --no-start --no-enable` in
`debian/rules`. Detected when the user reported "kbin 503".
* **iPhone tunnel unusable**: postinst overwrote
`/etc/nftables.d/secubox-toolbox-wg.nft` with the single-port
DNAT version, wiping the Phase 9 fanout (which the operator had deployed
at runtime with `nft -f`, with no package-side persistence). Result: all
R3 WG traffic pinned to worker@1 at 97 % CPU, w2-w4 idle. Detected
when the user reported "browsing way too slow".
Postinst-only fixes:
* Postinst now deploys `secubox-toolbox-wg-fanout.nft` as
`/etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft`. The `zz-` prefix
guarantees it sorts alphabetically after the base file in the glob include
of `/etc/nftables.conf`, so the base file creates the table + chains + UDP
51820 input rule, then the zz drop-in flushes and repopulates `prerouting`
with the numgen fanout map on ports 8081..8084.
* On upgrade (`$2` set), `systemctl try-restart` runs on
`secubox-toolbox.service`, `secubox-toolbox-mitm.service`, and the 4
instances `secubox-toolbox-mitm-wg-worker@{1..4}.service`. `try-restart`
is a no-op if the unit is not active, so it is safe on a fresh install.
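The `zz-` trick relies on nft loading the files matched by a wildcard `include` in alphabetical order. A small sketch of that ordering check, assuming the two file names from the text plus a hypothetical unprefixed name `fanout-secubox-toolbox-wg.nft`; `load_order` only sorts names to mimic nft (for lowercase ASCII names like these, Python's `sorted()` should give the same order), it does not call nft.

```python
BASE = "secubox-toolbox-wg.nft"            # creates table + chains
GOOD = "zz-secubox-toolbox-wg-fanout.nft"  # flushes + repopulates prerouting
BAD = "fanout-secubox-toolbox-wg.nft"      # hypothetical unprefixed name


def load_order(names):
    """Mimic nft: files matched by a wildcard include load in alphabetical order."""
    return sorted(names)


def loads_after_base(dropin: str) -> bool:
    order = load_order([BASE, dropin])
    return order.index(dropin) > order.index(BASE)


print(loads_after_base(GOOD), loads_after_base(BAD))  # → True False
```

Only a drop-in whose name sorts after the base file gets the last word on the `prerouting` chain, which is why the prefix is part of the fix rather than cosmetic.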
### Live mitigations applied on gk2 (2026-06-09)
* `systemctl start secubox-toolbox.service`: restores the kbin landing.
* `cp .../secubox-toolbox-wg-fanout.nft /etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft`
+ `systemctl reload nftables.service` + `systemctl restart
secubox-toolbox-mitm-wg-worker@1.service` (to drop the sticky
flows pinned to w1): restores the 4-worker fanout.
### Memory added
* `feedback_nft_layered_dropins_persistence.md`: the Phase 9 fanout must
sort AFTER its table-creator (zz- prefix); never symlink it in
place of the base file.
* `feedback_postinst_preserve_runtime_state.md`: dpkg upgrade SIGTERMs
the unit; postinst must try-restart and redeploy the nft drop-ins
applied at runtime.
### Branch
`perf/501-banner-injection-quickwins` pushed to origin (commits
`ce059d0f` + `15f48d9d`). **No PR opened** by default (rule
`feedback_no_unprompted_prs.md`).
### Next steps
* Build + deploy `secubox-toolbox 2.5.2` on gk2 (postinst-only, no
code change; wait for a maintenance window).
* Open PR #501 on instruction.
* Future Phase 10: refactor the banner to a JS-driven async approach (removes the
buffer-read for ALL bodies, not only those < 2 MB).
---
## 2026-06-08 — Phase 7.E.x LXC hygiene + auth recovery + Phase 8 opening (ref #498, #500)
### Package bumps


@@ -1,10 +1,35 @@
# TODO — SecuBox-DEB Backlog
*Updated: 2026-06-08*
*Updated: 2026-06-09*
---
## 🔥 P0 — Immediate (in flight)
### Phase 10 — Banner injection perf (#501) — ✅ shipped 2026-06-09
- [x] **Banner perf quick wins** (`secubox-toolbox` 2.5.1, commit `ce059d0f`):
LRU `_host_signals` (2048), drop body tracker scan, 2 MB body cap,
trim dead tile. Deployed live on gk2; iPhone confirms "better...
perfect work".
- [x] **Postinst regression fix** (`secubox-toolbox` 2.5.2, commit `15f48d9d`):
auto-deploy the fanout drop-in as `zz-`, try-restart on upgrade. Pushed to
origin, code-only (not deployed yet).
- [ ] **Build + deploy 2.5.2 on gk2**: postinst-only, wait for a maintenance
window (do not disturb the stable iPhone session).
- [ ] **Open PR #501** on user instruction (branch pushed:
`perf/501-banner-injection-quickwins`).
- [ ] **Future Phase 10**: JS-driven async banner refactor (removes the
buffer-read for all bodies, not only < 2 MB).
### Phase 9 — mitm-wg multi-worker fanout (#501) — ✅ shipped 2026-06-08
- [x] **4-worker template + numgen DNAT fanout** (`secubox-toolbox`
2.5.0, merged `89380a12`). Live numbers on gk2: CPU 68/44/50/54 % instead
of a single ~90 %.
- [ ] **Future Phase 9.1**: real filelock for
`/var/lib/secubox/toolbox/mitm-bypass-dynamic.conf` (the 4-worker race is
tolerable thanks to the launcher's `sort -u`, but a proper lock would be cleaner).
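For the Phase 9.1 item, one possible shape of a "real filelock" is an advisory `fcntl.flock` held around a read, dedupe and append cycle. This is a sketch only: `append_bypass_entry` is a hypothetical helper, the file is assumed to hold one host per line, and an advisory lock only protects writers that all go through the same locking code.

```python
import fcntl
import os
import tempfile


def append_bypass_entry(path: str, entry: str) -> None:
    """Append `entry` as one line unless already present, under an exclusive lock."""
    # "a+" creates the file if missing and lets us read it; writes always go to EOF.
    with open(path, "a+", encoding="utf-8") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks while another worker holds it
        try:
            f.seek(0)
            existing = set(f.read().splitlines())
            if entry not in existing:
                f.seek(0, os.SEEK_END)
                f.write(entry + "\n")
                f.flush()  # hand the data to the OS before the lock is released
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


# Usage: three appends; the duplicate is skipped instead of relying on `sort -u`.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "mitm-bypass-dynamic.conf")
    for host in ["a.example", "b.example", "a.example"]:
        append_bypass_entry(path, host)
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)  # → ['a.example', 'b.example']
```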
### Phase 8 — Operator anti-tracking (Utiq) — issue #500
Full plan documented in the issue. To implement:
@@ -33,16 +58,12 @@ Full plan documented in the issue. To implement:
UDP 123 outbound works. Possibly an IPv6-only resolve that
fails silently.
### Phase 9 (future) — mitm-wg multi-instance dispatcher
### Phase 9 ✅ shipped 2026-06-08 — see the P0 block above
Only worth revisiting with an architecture yet to be designed:
- [ ] Evaluate a custom Python dispatcher that terminates wg-quick host-side,
then routes by peer-id to a mitmproxy instance inside an LXC.
- [ ] OR: wg-quick inside the privileged LXC with shared netfilter
access (kernel module loaded host-side, exposed via /proc).
- [ ] Also evaluate an upstream mitmproxy patch for multi-peer
`--mode wireguard` (probably outside their scope).
Initially considered approach (custom dispatcher, privileged LXC with
shared netfilter): resolved more simply via nft `numgen inc mod 4`
+ conntrack flow-pinning + systemd template `@.service`.
**#502 D redesign** captures the follow-up (captive → LXC TPROXY-inside).
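A toy simulation of how `numgen inc mod 4` DNAT combines with conntrack flow pinning: only a flow's first packet evaluates the counter, and conntrack replays the same translation for the rest of that flow. `NumgenFanout` and the `(ip, port)` flow keys are hypothetical simplifications; the ports 8081..8084 are the ones named for the fanout map, and their mapping to worker@1..4 in order is an assumption.

```python
import itertools
from typing import Dict, List, Tuple

Flow = Tuple[str, int]  # (client IP, client UDP port): a simplified flow key


class NumgenFanout:
    """Toy model of `numgen inc mod N` DNAT plus conntrack flow pinning."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.counter = itertools.count()      # numgen inc: 0, 1, 2, ...
        self.conntrack: Dict[Flow, int] = {}  # flow -> DNAT target port

    def route(self, flow: Flow) -> int:
        # Only a new flow reaches the DNAT rule; known flows reuse their entry.
        if flow not in self.conntrack:
            slot = next(self.counter) % len(self.ports)
            self.conntrack[flow] = self.ports[slot]
        return self.conntrack[flow]


fan = NumgenFanout([8081, 8082, 8083, 8084])  # presumably worker@1..4
flows = [("10.0.0.2", 50000 + i) for i in range(6)]
first = [fan.route(f) for f in flows]
again = [fan.route(f) for f in flows]  # later packets of the same flows
print(first)           # → [8081, 8082, 8083, 8084, 8081, 8082]
print(first == again)  # → True
```

The model also shows why a ruleset change alone does not rebalance traffic: flows already in the conntrack table keep their old target until they end.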
---


@@ -1,5 +1,103 @@
# WIP — Work In Progress
*Updated: 2026-06-08*
*Updated: 2026-06-09*
---
## 🔄 2026-06-09 : Phase 10 — Banner injection perf quick wins + postinst regression fix (ref #501)
User signal: "the banner only appears at the end of loading and
page loads are very slow. Can we improve this rendering without
compromising the analysis... or lighten the analysis the
banner needs?".
Diagnosis: every HTML response triggered three uncached
host-stable operations and **a full-body regex scan** just to
fill a "trackers: N" tile that nobody reads:
* `_host_app.classify_host(host)`: walk over 100+ patterns (5-50 ms cold)
* `_whitelist_mod.match(host)`: match against the whitelist
* `_geo_mod.lookup(host)`: DNS + mmdb GeoIP (5-50 ms cold)
* `_count_trackers_in_body(flow.response.content)`: full buffer-read
of the body before injection, so the banner only appeared at the end of
streaming.
### ✅ Done — packages
| Package | from → to | Commit |
|---|---|---|
| secubox-toolbox | 2.5.0 → **2.5.1** | `ce059d0f` (banner perf quick wins, deployed live) |
| secubox-toolbox | 2.5.1 → **2.5.2** | `15f48d9d` (postinst regression fix, code only) |
Branch: `perf/501-banner-injection-quickwins` (pushed to origin, **no PR**).
### ✅ Done — details
**1. Banner injection quick wins** (`secubox-toolbox` 2.5.1, commit `ce059d0f`)
* `_host_signals(host)`: new LRU-cached function (maxsize=2048)
returning a tuple `(app_emoji, app, flag, country, asn, status,
status_icon)`. Re-hits cost a dict lookup instead of 5-50 ms.
* `_count_trackers_in_body()` removed from the hot path. The
`is_tracker_host` flag (cheap regex on the request host) covers the
privacy signal.
* `_MAX_INJECT_BYTES = 2 MB`: skip injection on large bodies
via a `Content-Length` pre-check plus a defensive `len(body)` guard for
streamed bodies without CL.
* Cookies + ⚠ tracker-host tile kept; the dead 🎯 N trackers (body)
tile is removed.
**Live feedback**: on the gk2 deployment, the iPhone user confirms "browsing
performance on iPhone is better... perfect work".
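A standalone sketch of the two-stage size check (`Content-Length` pre-check, then a defensive `len(body)` guard). `should_inject`, the plain lower-case `dict` used for headers, and the `b"<!-- banner -->"` guard marker are assumptions for illustration; the real code uses mitmproxy's case-insensitive headers and its own `_GUARD` value.

```python
from typing import Mapping, Optional

MAX_INJECT_BYTES = 2 * 1024 * 1024  # 2 MB, same value as _MAX_INJECT_BYTES
GUARD = b"<!-- banner -->"          # hypothetical marker; the real _GUARD is not shown


def declared_length(headers: Mapping[str, str]) -> int:
    """Content-Length as int; a missing or malformed header counts as 0 (unknown)."""
    try:
        return int(headers.get("content-length", "0") or "0")
    except (TypeError, ValueError):
        return 0


def should_inject(headers: Mapping[str, str], body: Optional[bytes]) -> bool:
    # 1. Cheap pre-flight: a declared size over the cap skips the body entirely.
    if declared_length(headers) > MAX_INJECT_BYTES:
        return False
    # 2. No body, or the banner is already there: nothing to do.
    if body is None or GUARD in body:
        return False
    # 3. Defensive guard for bodies that arrived without a Content-Length.
    return len(body) <= MAX_INJECT_BYTES


print(should_inject({"content-length": "5000000"}, b""))          # → False
print(should_inject({}, b"x" * (3 * 1024 * 1024)))                 # → False
print(should_inject({"content-length": "abc"}, b"<html></html>"))  # → True
```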
**2. Postinst regression fix** (`secubox-toolbox` 2.5.2, commit `15f48d9d`)
Two silent regressions during the 2.5.0 → 2.5.1 deploy on gk2:
* **kbin.gk2.secubox.in 503 for 5 min**: the dpkg upgrade SIGTERMed
`secubox-toolbox.service` and never restarted it
(`dh_installsystemd --no-start --no-enable` in debian/rules).
* **iPhone tunnel down**: postinst overwrote
`/etc/nftables.d/secubox-toolbox-wg.nft` with the single-port
DNAT version, wiping the Phase 9 fanout the operator had deployed at
runtime, so all traffic was pinned to worker@1 at 97 % CPU.
Postinst fixes (no code change):
* Postinst now deploys the fanout drop-in as
`/etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft`. The `zz-` prefix
guarantees it sorts alphabetically after the base file in the glob include
of `/etc/nftables.conf`, so the base file creates the table + chains, then
the zz drop-in flushes and repopulates the `prerouting` chain with the numgen
fanout map.
* On upgrade (`$2` set), `systemctl try-restart` runs on
`secubox-toolbox.service`, `secubox-toolbox-mitm.service`, and the 4
instances `secubox-toolbox-mitm-wg-worker@{1..4}.service`. `try-restart`
is a no-op if the unit is not active, so it is safe on a fresh install.
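The upgrade-only guard (`[ -n "${2:-}" ]` plus the `/run/systemd/system` check) can be modelled as a pure function, which makes the fresh-install and upgrade cases easy to test. A sketch, assuming the snippet runs in postinst's `configure` branch, where dpkg passes the previously configured version as `$2` (empty on a first install); `units_to_try_restart` is hypothetical and does not call systemctl.

```python
from typing import List, Optional

BASE_UNITS = ["secubox-toolbox.service", "secubox-toolbox-mitm.service"]
WORKER_TEMPLATE = "secubox-toolbox-mitm-wg-worker@{}.service"


def units_to_try_restart(prev_version: Optional[str], systemd_running: bool) -> List[str]:
    """Model of the postinst guard `[ -n "${2:-}" ] && [ -d /run/systemd/system ]`.

    prev_version mirrors $2 of `postinst configure`: the previously configured
    version on upgrade, empty or absent on a first install."""
    if not prev_version or not systemd_running:
        return []  # fresh install or no systemd: leave units alone
    return BASE_UNITS + [WORKER_TEMPLATE.format(i) for i in range(1, 5)]


print(len(units_to_try_restart("2.5.1-1~bookworm1", True)))  # → 6
print(units_to_try_restart("", True))                          # → []
print(units_to_try_restart(None, True))                        # → []
```

`systemctl try-restart` only acts on units that are active at the moment it runs; a unit that is already inactive at that point is left stopped.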
### Memory updated
* [nft layered drop-ins persistence](../memory/feedback_nft_layered_dropins_persistence.md):
drop-ins must sort AFTER their table-creator; never
symlink them in place of the base file.
* [postinst must preserve runtime state](../memory/feedback_postinst_preserve_runtime_state.md):
dpkg upgrade SIGTERMs the unit; postinst must try-restart and redeploy
the nft drop-ins applied at runtime.
### ⬜ Next up
* **Build + deploy 2.5.2 on gk2** (postinst-only fix, no code change;
wait for the next maintenance window, do not disturb the currently stable
iPhone session).
* **PR #501** when ready (branch pushed but no PR opened, per
rule "Don't open PRs unprompted").
* **Future Phase 10**: refactor banner injection towards a
JS-driven async approach to remove the body buffer-read entirely
(send a minimal `<script>` at the `<head>` position that fetches the
banner content + context via XHR, then injects the DOM
client-side). This would move the banner from "end of streaming" to
"first paint" for ALL bodies, not only those < 2 MB.
* **#502 D redesign**: captive → LXC with TPROXY-inside-LXC (still
pending, not started).
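A rough sketch of the "minimal `<script>` at the `<head>` position" idea for the future Phase 10 refactor. The loader path `/__secubox/banner.js` (which assumes the proxy answers that path itself), the 4 KiB search window and `inject_loader` are all hypothetical; the client-side script that would fetch the banner and its context and build the DOM is not shown.

```python
import re

# Hypothetical loader: the document does not name the script or its URL.
LOADER = b'<script src="/__secubox/banner.js" async></script>'
HEAD_OPEN = re.compile(rb"<head\b[^>]*>", re.IGNORECASE)
SEARCH_WINDOW = 4096  # only scan the start of the document (assumed value)


def inject_loader(body: bytes) -> bytes:
    """Insert LOADER right after the opening <head> tag, if it appears early."""
    m = HEAD_OPEN.search(body, 0, SEARCH_WINDOW)
    if m is None:
        return body  # no <head> near the top: pass the page through untouched
    return body[: m.end()] + LOADER + body[m.end():]


page = b"<!doctype html><html><head><title>x</title></head><body>hi</body></html>"
out = inject_loader(page)
print(out.decode())
```

The sketch still receives the complete body as bytes; it only bounds the search to the first few KiB, so the injection cost no longer grows with page size.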
---


@@ -1,3 +1,63 @@
secubox-toolbox (2.5.2-1~bookworm1) bookworm; urgency=medium
* Phase 10.1 (#501 perf) — postinst regressions caught on 2.5.1 deploy.
The 2.5.0 → 2.5.1 upgrade on gk2 produced TWO silent regressions
that were not code bugs but deploy-script holes:
- kbin.gk2.secubox.in returned 503 for 5 min: dpkg upgrade
SIGTERMed secubox-toolbox.service and never restarted it
(debian/rules passes --no-start --no-enable to dh_installsystemd).
- iPhone tunnel went from ~10s/page to unusable: postinst's
`install -m 0644 .../secubox-toolbox-wg.nft /etc/nftables.d/`
overwrote the operator-deployed fanout drop-in, regressing
to single-port DNAT and pinning all traffic to worker@1
(97% CPU while w2-w4 idle).
Fixes:
- postinst now also installs the Phase 9 fanout file as
/etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft (zz- prefix
so it sorts after the base file in the alphabetical glob
include, overriding the prerouting chain with the numgen
fanout map).
- postinst now runs `systemctl try-restart` on upgrade ($2 set)
for secubox-toolbox.service, secubox-toolbox-mitm.service,
and the 4 mitm-wg-worker@{1..4}.service instances. try-restart
is a no-op when the unit isn't running, so fresh installs are
unaffected.
No code change; postinst only.
-- Gerald KERMA <devel@cybermind.fr> Tue, 09 Jun 2026 07:30:00 +0200
secubox-toolbox (2.5.1-1~bookworm1) bookworm; urgency=medium
* Phase 10 (#501 perf) — banner injection quick wins.
User reported "the banner only appears at the end of loading and
page loads are very slow". Root cause: every HTML
response triggered (a) a full body buffer-read just to scan for
tracker patterns, (b) un-cached _host_app.classify_host walk over
100+ regexes, (c) un-cached _geo_mod.lookup hitting DNS + the
GeoIP mmdb. All three ran on every flow, all three were
host-stable. Quick wins:
- LRU cache (maxsize=2048) on a new _host_signals(host) tuple
returning (app_emoji, app, flag, country, asn, status,
status_icon). Re-hits cost a dict lookup instead of 5-50 ms.
- Dropped the per-response body tracker scan
(_count_trackers_in_body). The 1st-party is_tracker_host
flag (cheap regex on the request host) is the privacy-
relevant signal; the "trackers: N in body" count was a
nice-to-have that delayed banner injection by 30-200 ms on
big publishers.
- Skip injection entirely when the response body exceeds 2 MB
(cheap pre-flight check on Content-Length, plus a defensive
len() guard for streamed bodies without CL). Huge SPA
bundles and PDF-as-HTML now flow through untouched.
- Trimmed the cookie/tracker tile: the dead "🎯 N" body-scan
branch is gone, only "🍪 N" + "⚠ tracker-host" remain.
Net effect on a typical page : banner shows up at first paint
instead of after the body finishes streaming, and TTFB for big
HTML responses is no longer banner-bound.
No protocol or schema change; addon order is identical.
-- Gerald KERMA <devel@cybermind.fr> Mon, 08 Jun 2026 22:00:00 +0200
secubox-toolbox (2.5.0-1~bookworm1) bookworm; urgency=medium
* Phase 9 (#501) — multi-worker fanout for mitm-wg.


@@ -112,10 +112,21 @@ fi
# Phase 7 (#498) : install nft drop-in for boot-survival + reload nft so
# the running ruleset matches what nftables.service will load at next boot.
# Phase 10 (#501 perf) : ALSO deploy the Phase 9 fanout drop-in as
# zz-prefixed so it loads AFTER the base file (alphabetical glob) and
# overrides the prerouting chain with the 4-worker numgen DNAT. The
# base file creates the table + chains + 51820 input rule ; the zz
# drop-in flushes prerouting and repopulates it with the fanout map.
# Pre-2.5.2 operators who applied the fanout via runtime nft -f got it
# silently wiped on every dpkg upgrade — fixed here (idempotent).
if [ -f /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg.nft ]; then
install -d -m 0755 /etc/nftables.d
install -m 0644 /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg.nft \
/etc/nftables.d/secubox-toolbox-wg.nft
if [ -f /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg-fanout.nft ]; then
install -m 0644 /usr/share/secubox/toolbox/nftables.d/secubox-toolbox-wg-fanout.nft \
/etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft
fi
if systemctl is-active --quiet nftables.service 2>/dev/null; then
systemctl reload nftables.service 2>/dev/null \
|| /usr/sbin/nft -f /etc/nftables.d/secubox-toolbox-wg.nft 2>/dev/null \
@@ -142,6 +153,25 @@ fi
systemctl restart unbound.service 2>/dev/null || true
fi
fi
# Phase 10 (#501 perf) : on UPGRADE ($2 = previous version), try-restart
# the long-running daemons so the new code is live without operator
# intervention. debian/rules passes --no-start --no-enable to
# dh_installsystemd, so without this loop secubox-toolbox.service stays dead
# post-upgrade until reboot (caught 2026-06-09: kbin.gk2 503'd for 5 min
# after the 2.5.0 → 2.5.1 upgrade because the unit was SIGTERMed and
# never restarted). try-restart is a no-op when the unit is not
# running, so it's safe on fresh install / unconfigured boards.
if [ -n "${2:-}" ] && [ -d /run/systemd/system ]; then
for unit in secubox-toolbox.service \
secubox-toolbox-mitm.service \
secubox-toolbox-mitm-wg-worker@1.service \
secubox-toolbox-mitm-wg-worker@2.service \
secubox-toolbox-mitm-wg-worker@3.service \
secubox-toolbox-mitm-wg-worker@4.service ; do
systemctl try-restart "$unit" 2>/dev/null || true
done
fi
;;
esac


@@ -219,8 +219,84 @@ def _count_trackers_in_body(body: bytes, cap: int = 200_000) -> int:
return 0
# Phase 10 (#501 perf) — per-host context cache.
#
# `_compute_site_context` used to call into 4 sub-systems per response:
# `_host_app.classify_host` (host → emoji + app via 100+ pattern walk),
# `_whitelist_mod.match`, `_geo_mod.lookup` (DNS + GeoIP mmdb lookup),
# plus the body tracker scan + cookie counts. Of those, ALL host-keyed
# results are stable for hours, so there's no point recomputing them per
# response. An LRU cache on `host` shaves ~20-50 ms per HTML response and
# shortens the time banner injection blocks the worker's event loop, so
# other flows and the rest of the addon chain can progress sooner.
import functools as _functools
@_functools.lru_cache(maxsize=2048)
def _host_signals(host: str) -> tuple:
"""Return (app_emoji, app, flag, country, asn, status, status_icon).
Tuple form keeps the cache key + value flat and hashable. We don't
cache cookie counts (per-flow) or tracker scan (per-response); only
the truly host-stable values are cached.
A cache miss is the expensive path: classify_host (~50 µs typical,
up to 5 ms on the hot patterns), whitelist match (~1 µs), and the
GeoIP lookup (DNS + mmdb can be 5-50 ms on a cold DNS cache).
"""
app_emoji = ""
app = host
flag = ""
country = ""
asn = ""
status = "inspected"
status_icon = "🔍"
if not _HAS_CLASSIFIERS:
return (app_emoji, app, flag, country, asn, status, status_icon)
try:
cls = _host_app.classify_host(host)
app_emoji = cls.get("emoji", "")
app = cls.get("app", host) if cls.get("app") != "?" else host
except Exception:
pass
try:
wl = _whitelist_mod.match(host)
if wl:
status = "bypassed-whitelist"
status_icon = "🛡"
elif re.search(r"\.(signal|whispersystems|threema|simplex|matrix|proton|tutanota)\.", host):
status = "e2e-opaque"
status_icon = "🔐"
except Exception:
pass
if _HAS_GEO:
try:
info = _geo_mod.lookup(host) or {}
flag = info.get("flag", "")
country = info.get("country_iso", "")
asn = (info.get("asn_org") or "")[:24]
except Exception:
pass
return (app_emoji, app, flag, country, asn, status, status_icon)
def _compute_site_context(flow: http.HTTPFlow) -> dict:
"""Compute per-site signals for the dynamic right-side of the banner."""
"""Compute per-site signals for the dynamic right-side of the banner.
Phase 10 (#501 perf) — host-stable signals come from the LRU cache
(`_host_signals`). The body tracker scan is REMOVED; the
1st-party `is_tracker_host` flag already surfaces tracker hosts as
`⚠ tracker-host` in the banner, which is the privacy-relevant
signal. Counting trackers in the body required a full body buffer
scan that delayed banner injection by 30-200 ms per HTML response
on big publishers, which is not worth it for a "trackers: N" count that
most users don't read.
"""
host = (flow.request.host or "").lower()
ctx = {
"host": host[:50],
@@ -240,6 +316,10 @@ def _compute_site_context(flow: http.HTTPFlow) -> dict:
"utiq_recent_count": 0,
}
# Host-stable signals — single LRU lookup per host.
(ctx["app_emoji"], ctx["app"], ctx["flag"], ctx["country"], ctx["asn"],
ctx["status"], ctx["status_icon"]) = _host_signals(host)
# Cookies (cheap : just header counts, name-less for privacy)
set_n, sent_n = _count_cookies(flow)
ctx["cookies_set"] = set_n
@@ -259,43 +339,10 @@ def _compute_site_context(flow: http.HTTPFlow) -> dict:
except Exception:
pass
# Trackers : 1st-party host check + body scan
# Trackers : 1st-party host flag only — body scan removed in Phase 10
# for perceived-latency win. The banner still shows ⚠ tracker-host
# when the visited site is itself a known tracker domain.
ctx["is_tracker_host"] = bool(_TRACKER_HOST_PATTERNS.match(host))
if flow.response and flow.response.content:
ctx["trackers"] = _count_trackers_in_body(flow.response.content)
if not _HAS_CLASSIFIERS:
return ctx
# App classification
try:
cls = _host_app.classify_host(host)
ctx["app_emoji"] = cls.get("emoji", "")
ctx["app"] = cls.get("app", host) if cls.get("app") != "?" else host
except Exception:
pass
# Whitelist / status
try:
wl = _whitelist_mod.match(host)
if wl:
ctx["status"] = "bypassed-whitelist"
ctx["status_icon"] = "🛡"
# E2E pattern check (cheap)
elif re.search(r"\.(signal|whispersystems|threema|simplex|matrix|proton|tutanota)\.", host):
ctx["status"] = "e2e-opaque"
ctx["status_icon"] = "🔐"
except Exception:
pass
# Geo (flag + country + ASN)
if _HAS_GEO:
try:
info = _geo_mod.lookup(host) or {}
ctx["flag"] = info.get("flag", "")
ctx["country"] = info.get("country_iso", "")
ctx["asn"] = (info.get("asn_org") or "")[:24]
except Exception:
pass
# Quality grade (passive — we only see response headers + transport)
try:
@@ -382,23 +429,17 @@ def _banner_html_dynamic(sha1: str, ctx: dict, csp_strict: bool,
right_parts.append(ctx["flag"])
if ctx["app_emoji"] and ctx["app"]:
right_parts.append(f"{_ncr(ctx['app_emoji'])} {_ncr(ctx['app'])}")
# Phase 6.G : cookies + trackers (privacy signals)
# Phase 6.G : cookies + 1st-party tracker host (privacy signals).
# Phase 10 perf : the per-response tracker body scan is gone — we keep
# only the host-level flag (cheap regex on the request host).
cookies_set = ctx.get("cookies_set", 0)
cookies_sent = ctx.get("cookies_sent", 0)
trackers = ctx.get("trackers", 0)
is_tracker = ctx.get("is_tracker_host", False)
cookie_total = cookies_set + cookies_sent
if cookie_total > 0:
# 🍪 N (set+sent) — colored if many
cookie_emoji = "&#x1F36A;" # 🍪
right_parts.append(f"{cookie_emoji} {cookie_total}")
if trackers > 0 or is_tracker:
# 🎯 N trackers in body, or ⚠ if 1st-party host is itself a tracker
if is_tracker:
right_parts.append("&#x26A0; tracker-host") # ⚠
else:
target_emoji = "&#x1F3AF;" # 🎯
right_parts.append(f"{target_emoji} {trackers}")
right_parts.append(f"&#x1F36A; {cookie_total}") # 🍪
if is_tracker:
right_parts.append("&#x26A0; tracker-host") # ⚠
# Phase 8 (#500) — surface Utiq hits for this client. Cheap query
# against the utiq_events store (last 1 h). Avoids surfacing the
# tile on stale state by capping the lookback window.
@@ -527,6 +568,9 @@ def _client_level(flow) -> str:
return "r1"
_MAX_INJECT_BYTES = 2 * 1024 * 1024 # Phase 10 perf cap : skip injection on huge bodies
class InjectBanner:
def response(self, flow: http.HTTPFlow) -> None:
if not flow.response:
@@ -540,9 +584,21 @@ class InjectBanner:
# AND R3 (portable WG opt-in). R0/R1 stay banner-free.
if _client_level(flow) not in ("r2", "r3"):
return
# Phase 10 perf: cheap pre-flight check on Content-Length so multi-MB
# bodies are skipped before `flow.response.content` decodes them
# (content-encoding such as gzip is undone on access). mitmproxy has
# already read the raw body by the time the `response` hook runs.
try:
cl = int(flow.response.headers.get("content-length", "0") or "0")
if cl > _MAX_INJECT_BYTES:
return
except (TypeError, ValueError):
pass
body = flow.response.content
if body is None or _GUARD in body:
return
if len(body) > _MAX_INJECT_BYTES:
# Streamed bodies without content-length still get caught here.
return
m = _RE_BODY_CLOSE.search(body)
if not m:
return