Compare commits

...

5 Commits

Author SHA1 Message Date
8263bc7681 docs: update WIP/HISTORY/TODO for Phase 11 A+B + gk2 system triage + v2.13.14
Some checks are pending
License Headers / check (push) Waiting to run
- Phase 11 social mapping A (#506/2.6.0) + B (#507/2.6.1) deployed live;
    C (#508) checkpointed at 55626e51.
  - gk2 triage: CrowdSec bouncer tables, /var/log/secubox traversal,
    WAF double-buffer cache (#509/1.2.2), PeerTube/PhotoPrism restart.
  - CI: espressobin disable (#504), WAF perf (#510) merged, v2.13.14 tagged.
  - Carried: Round Eye link, toolbox tab decision, var/log postinst patch.
2026-06-10 08:44:55 +02:00
CyberMind
a6f44807c6
Merge pull request #510 from CyberMind-FR/fix/509-waf-soc-perf-double-buffered-cache-memor
perf(waf): double-buffered cache for /waf/stats + /soc (#509)
2026-06-10 08:42:12 +02:00
CyberMind
3ebb4477cc
Merge pull request #504 from CyberMind-FR/fix/503-ci-disable-espressobin-v7-ultra-image-bu
CI: drop espressobin-v7 + ultra from scheduled image matrix (#503)
2026-06-10 08:42:09 +02:00
b7666e5f57 perf(waf): double-buffered cache for /waf/stats (memory + disk) with incremental log reading (closes #509)
User reported /waf/ + /soc/ dashboards showing empty threats and tracked
attackers cards.  Live triage on gk2 found _get_threat_stats() iterating
all 332k JSONL entries in the 110 MB waf-threats.log on every request,
pinning the aggregator at 89% CPU with 8+ concurrent open handles.

  - api/main.py: _load_stats_disk_cache + _save_stats_disk_cache helpers
    persist counters + last-read byte position to
    /var/lib/secubox/waf/stats-disk-cache.json (atomic .tmp -> rename so
    a crash mid-write never corrupts).
  - api/main.py: _get_threat_stats reseats the file at the saved byte
    position, reads only the new tail since then, persists the new
    position before returning.
  - api/main.py: log rotation / truncation detected via size shrink;
    counters reset cleanly.
  - api/main.py: day rollover resets threats_today only; cumulative
    counters survive.
  - api/main.py: ip_countries dict capped at 1200 entries (most-active
    1000 retained on overflow) so the cache file stays small.

Live on gk2 after deploy: /waf/stats steady-state 30-37 ms (vs 30s+
timeout before).  /waf/alerts, /waf/bans/history, /soc/* all 200 in
under 200 ms.  Aggregator CPU drops from 89% to 52% under same load.

  - debian/changelog: 1.2.1 -> 1.2.2-1~bookworm1.
2026-06-10 08:39:57 +02:00
3d52b63d26 ci: drop espressobin-v7 + espressobin-ultra from scheduled image matrix (ref #503)
The two boards fail in the cross-arm64 chroot stage of build-image.yml
and (even with fail-fast: false) block the downstream release.yml job
from publishing the OTHER boards' images.  Releases v2.13.9 through
v2.13.12 all hit this trap.

This change keeps both entries available via workflow_dispatch (operators
can still build them on-demand), but removes them from the push-tag
scheduled matrix so mochabin / vm-x64 / rpi400 actually ship on tag.

Board support files (board/espressobin-*/, image/build-image.sh
--board espressobin-*) stay in tree.

  - build-image.yml matrix excludes espressobin-v7 + espressobin-ultra
    on push and workflow_call.
  - build-image.yml workflow_dispatch choice list keeps both entries
    flagged as on-demand only.
  - Release notes template drops the two image rows, adds a note
    explaining the on-demand path.
  - Install instruction adjusted (MOCHAbin + Raspberry Pi 400).
2026-06-09 09:42:08 +02:00
6 changed files with 369 additions and 61 deletions

View File

@ -3,6 +3,79 @@
--- ---
## 2026-06-10 — Phase 11 social mapping (A+B) + system triage + v2.13.14 (ref #502-#509)
### Package bumps
| Package | from → to |
|---|---|
| secubox-toolbox | 2.5.2 → **2.6.0** (#505 Phase 11.A backend) |
| secubox-toolbox | 2.6.0 → **2.6.1** (#507 Phase 11.B frontend) |
| secubox-waf | 1.2.1 → **1.2.2** (#509 double-buffer cache) |
| Release tag | **v2.13.14** |
### Phase 11 — Social mapping per device (#502)
**11.A backend** (`secubox-toolbox 2.6.0`, PR #506) — `social.py`
correlation engine + 3 SQLite tables (`social_edges` / `social_nodes`
/ `social_links`), `social_graph.py` mitm addon (cookie_id_hash =
sha256, never persists raw values), `/social/graph/{token}` +
`/social/wipe/{token}` (RGPD art. 17) + `/admin/social-aggregate`
endpoints, fold + purge background tasks.
**11.B frontend** (`secubox-toolbox 2.6.1`, #507) — d3 force-directed
graph view at `/social/{token}`, FR/EN i18n, server-side favicon proxy
(7d cache), wipe modal with 3s countdown, full-viewport layout with
pan/pinch-zoom + pre-warm + autoFit. Splash menu link `/social/me`
(🕸️ Ma carto) resolving R3 peers via X-R3-Peer sentinel.
**Live result** : graph renders real cross-site tracking on gk2 — the
ad-tech relay `35.214.136.108` bridging 360yield + seedtag +
smartadserver + smilewanted publishers, surfacing exactly the
fingerprint reuse Phase 11 targets.
**Critical live-deploy fixes** : addon relative-import never resolved
(mitmproxy loads addons top-level) → inlined; **PYTHONPATH missing in
mitm-wg launcher** silently degraded every addon's `secubox_toolbox`
imports → fixed globally (also un-degraded inject_banner's host
classification + GeoIP); i18n moved to `<script>` block (FR
apostrophes broke JSON.parse); StaticFiles mount + chmod 0755 www
(kbin HAProxy path bypasses nginx).
**11.C** (#508) — WIP checkpoint `55626e51` : schema (consent_state +
GeoIP columns), EU/EEA whitelist, GeoIP fold enrichment, evidence()
helper. PDF generator + consent-probe addon + frontend wire pending.
### System triage on gk2
- **CrowdSec firewall** — bouncer ran healthy but had no nft tables
(external flush). Restart recreated `ip crowdsec` + `ip6 crowdsec6`,
100 live decisions.
- **WAF + SOC empty cards**`/var/log/secubox` was 0750
secubox-toolbox, blocking the aggregator (user `secubox`) from
traversing to read `waf-threats.log`. chmod 0755 live.
- **WAF /stats 30s+ timeout**`_get_threat_stats()` re-parsed the full
110 MB / 332k-entry JSONL on every request (89% aggregator CPU).
Fixed via #509 double-buffered cache : disk-persisted counters +
byte-position incremental tail reading. `/waf/stats` now 30-37 ms.
- **PeerTube + PhotoPrism 502** — LXCs were STOPPED; `lxc-start` → live.
### CI + release
- #503/PR #504 — drop espressobin-v7 + ultra from the scheduled
build-image matrix (cause of the v2.13.9-12 release failures).
- #509/PR #510 — double-buffer WAF cache.
- Merged both to master (`3ebb4477`, `a6f44807`), tagged **v2.13.14**.
### Carried forward
- Round Eye gadget remote-link to gk2 (shows local metrics only) —
needs Pi-side investigation.
- admin.gk2/toolbox/ tab surfacing decision (proxy/iframe/sub-tab).
- `/var/log/secubox` 0755 source-side postinst patch (live-only for now).
---
## 2026-06-09 — Phase 10 banner injection perf quick wins + postinst regression fix (ref #501) ## 2026-06-09 — Phase 10 banner injection perf quick wins + postinst regression fix (ref #501)
### Package bumps ### Package bumps

View File

@ -1,10 +1,40 @@
# TODO — SecuBox-DEB Backlog # TODO — SecuBox-DEB Backlog
*Mis à jour : 2026-06-09* *Mis à jour : 2026-06-10*
--- ---
## 🔥 P0 — Immediate (in flight) ## 🔥 P0 — Immediate (in flight)
### Phase 11 — Social mapping per device (#502)
- [x] **11.A backend** (#505 / PR #506, `secubox-toolbox 2.6.0`) — correlation
engine + SQLite + API. Déployé live gk2.
- [x] **11.B frontend** (#507, `2.6.1`) — d3 graph + i18n FR/EN + favicon
proxy + wipe modal + full-viewport pan/zoom. Live `/social/me`.
- [ ] **11.C evidence + PDF** (#508) — reprendre depuis checkpoint
`55626e51` : consent-probe addon (OneTrust/Didomi/Quantcast/Sourcepoint)
+ extra-EU flag + PDF bilingue FR/EN + wire frontend (remplacer le
placeholder "Rapport PDF arrive en Phase 11.C").
- [ ] **Merger PR #506** (11.A backend) → master quand prêt.
- [ ] **11.D opérateur** (futur) — dashboard agrégat `/admin/social/` HTML
(l'endpoint JSON `/admin/social-aggregate` existe déjà depuis 2.6.0).
### Système — bugs gk2 (2026-06-10)
- [x] **CrowdSec firewall** — restart bouncer → tables nft recréées.
- [x] **WAF /var/log/secubox traversal** — chmod 0755 live.
- [x] **WAF /stats perf** (#509 / PR #510, `secubox-waf 1.2.2`) — double-buffer
cache. Mergé + `v2.13.14`.
- [x] **PeerTube + PhotoPrism** — LXC redémarrés.
- [ ] **Round Eye gadget** — ne voit plus gk2, métriques locales only.
Investigation Pi Zero (dashboard `localhost:8000` proxie vers gk2 via OTG).
- [ ] **admin.gk2/toolbox/ tab** — toolbox déjà wiré (`/toolbox/` alias +
sidebar). User veut surfacer l'UI kbin/admin dedans — décision en
attente : proxy_pass `10.99.0.1:8088/admin/` (cleanest) / iframe (CSP) /
sous-tab frontend-only.
- [ ] **Postinst `/var/log/secubox` 0755** — porter le fix live en source
(même pattern que `/etc/secubox` parent + `/usr/share/secubox/www`).
### Phase 10 — Banner injection perf (#501) — ✅ shipped 2026-06-09 ### Phase 10 — Banner injection perf (#501) — ✅ shipped 2026-06-09
- [x] **Banner perf quick wins** (`secubox-toolbox` 2.5.1, commit `ce059d0f`) - [x] **Banner perf quick wins** (`secubox-toolbox` 2.5.1, commit `ce059d0f`)

View File

@ -1,5 +1,73 @@
# WIP — Work In Progress # WIP — Work In Progress
*Mis à jour : 2026-06-09* *Mis à jour : 2026-06-10*
---
## 🔄 2026-06-10 : Phase 11 social mapping (A+B) + system triage round (ref #502-#509)
Grosse journée : Phase 11 social mapping shippé jusqu'au frontend live,
puis une cascade de fixes système découverts par l'utilisateur en
production sur gk2.
### ✅ Done — Phase 11 social mapping (#502 parent)
| Issue | Phase | État |
|---|---|---|
| #505 / PR #506 | **11.A backend** : correlation engine + SQLite + API | ✅ mergeable, déployé live `secubox-toolbox 2.6.0` |
| #507 | **11.B frontend** : d3 graph + i18n FR/EN + favicon proxy + wipe modal | ✅ déployé live `2.6.1`, branche poussée |
| #508 | **11.C evidence + PDF** | 🔄 WIP checkpoint `55626e51` (schema + GeoIP fold + evidence helper) |
**Design** : 2 rounds de design lock sur #502 (Gemini + GPT mockups),
edge-thickness + animated-pulse + tracker bottom-sheet + 3s wipe
countdown verrouillés.
**Live URL** : `https://kbin.gk2.secubox.in/social/me` (splash → 🕸️ Ma carto).
Le graphe montre les trackers cross-site réels (relais ad-tech
`35.214.136.108` reliant 360yield + seedtag + smartadserver + smilewanted).
**Fixes live-deploy critiques découverts** :
- `social_graph.py` : `from . import local_store` ne résolvait jamais
(mitmproxy charge les addons en top-level) → inliné le WG peer hash.
- **PYTHONPATH manquant dans le launcher mitm-wg** : TOUS les addons
(`inject_banner` dpi/geo/store, `social_graph`) avaient leurs
`from secubox_toolbox import …` silencieusement dégradés. Fix global.
- i18n déplacé de `data-*` attr vers `<script>` (apostrophes FR
cassaient `JSON.parse`).
- StaticFiles mount + chmod 0755 `/usr/share/secubox/www` (kbin passe
par HAProxy direct uvicorn, bypass nginx).
- d3 : full-viewport + pan/pinch-zoom + pre-warm 300 ticks + autoFit
data-based (146 nodes spread off-screen avant).
### ✅ Done — triage système gk2 (2026-06-10)
| Bug | Cause racine | Fix |
|---|---|---|
| CrowdSec firewall status faux | bouncer tournait mais sans tables nft (flush externe) | restart bouncer → `ip crowdsec` + `ip6 crowdsec6` recréées, 100 décisions live |
| WAF /threats + tracked attackers vides | `/var/log/secubox` 0750 secubox-toolbox bloquait traversal aggregator (user `secubox`) | chmod 0755 live |
| WAF /stats timeout 30s+ | `_get_threat_stats()` re-parsait 110 MB / 332k JSONL à CHAQUE requête (CPU 89%) | **#509 double-buffer cache** (disk + byte-position incrémental) `secubox-waf 1.2.2` |
| SOC /soc/ status WAF+firewall faux | consommait les mêmes endpoints WAF cassés | résolu en cascade par le fix WAF |
| PeerTube + PhotoPrism 502 | LXC STOPPED | `lxc-start` → 200 / 307 |
### ✅ Done — CI + release
- **#503 / PR #504** : drop espressobin-v7 + ultra du matrix build-image
scheduled (faisaient échouer le pipeline release v2.13.9-12).
- **#509 / PR #510** : double-buffer WAF cache.
- **Merge #504 + #510 → master** (`3ebb4477`, `a6f44807`).
- **Tag `v2.13.14`** poussé.
### ⬜ Next up
- **Round Eye gadget** : ne voit plus le lien gk2, montre ses métriques
locales. iface `eye-remote` UP côté gk2, route `/api/v1/eye-remote/*`
renvoie page erreur. Investigation côté Pi Zero nécessaire.
- **admin.gk2/toolbox/ tab** : le toolbox est DÉJÀ wiré (`/toolbox/`
alias + sidebar). User veut surfacer l'UI kbin/admin dedans —
décision en attente : proxy_pass / iframe / sous-tab.
- **Phase 11.C** : reprendre depuis `55626e51` (consent probe addon +
extra-EU flag + PDF bilingue + wire frontend).
- **Postinst patch** : `/var/log/secubox` 0755 en source (pour l'instant
fix live uniquement) — même pattern que `/etc/secubox` + `www`.
--- ---

View File

@ -24,8 +24,8 @@ on:
type: choice type: choice
options: options:
- mochabin - mochabin
- espressobin-v7 - espressobin-v7 # on-demand only — disabled in scheduled CI, ref #503
- espressobin-ultra - espressobin-ultra # on-demand only — disabled in scheduled CI, ref #503
- vm-x64 - vm-x64
- vm-arm64 - vm-arm64
- rpi400 - rpi400
@ -50,7 +50,12 @@ jobs:
fail-fast: false fail-fast: false
matrix: matrix:
# Handle all event types: push (tags), workflow_call, workflow_dispatch # Handle all event types: push (tags), workflow_call, workflow_dispatch
board: ${{ (github.event_name == 'push' || (inputs.board == 'all' || inputs.board == '')) && fromJson('["mochabin","espressobin-v7","espressobin-ultra","vm-x64","rpi400"]') || (github.event.inputs.board == 'all' && fromJson('["mochabin","espressobin-v7","espressobin-ultra","vm-x64","rpi400"]') || fromJson(format('["{0}"]', inputs.board || github.event.inputs.board || 'vm-x64'))) }} # Scheduled / tag-push matrix excludes espressobin-v7 + espressobin-ultra (#503) :
# those board builds fail in the cross-arm64 chroot stage and block the
# downstream release.yml job for every image even though fail-fast is off.
# Operators can still build them on-demand via workflow_dispatch (the
# choice list above retains the entries).
board: ${{ (github.event_name == 'push' || (inputs.board == 'all' || inputs.board == '')) && fromJson('["mochabin","vm-x64","rpi400"]') || (github.event.inputs.board == 'all' && fromJson('["mochabin","vm-x64","rpi400"]') || fromJson(format('["{0}"]', inputs.board || github.event.inputs.board || 'vm-x64'))) }}
steps: steps:
- name: Checkout - name: Checkout
@ -221,15 +226,17 @@ jobs:
| Image | Board | Architecture | Description | | Image | Board | Architecture | Description |
|-------|-------|--------------|-------------| |-------|-------|--------------|-------------|
| `secubox-mochabin-bookworm.img.gz` | MOCHAbin | arm64 | Marvell Armada 7040 (Pro) | | `secubox-mochabin-bookworm.img.gz` | MOCHAbin | arm64 | Marvell Armada 7040 (Pro) |
| `secubox-espressobin-v7-bookworm.img.gz` | ESPRESSObin v7 | arm64 | Marvell Armada 3720 (Lite) |
| `secubox-espressobin-ultra-bookworm.img.gz` | ESPRESSObin Ultra | arm64 | Marvell Armada 3720 (Lite+) |
| `secubox-rpi400-bookworm.img.gz` | Raspberry Pi 400 | arm64 | Pi 400 / Pi 4 | | `secubox-rpi400-bookworm.img.gz` | Raspberry Pi 400 | arm64 | Pi 400 / Pi 4 |
| `secubox-vm-x64-bookworm.img.gz` | VirtualBox/QEMU | amd64 | VM for testing | | `secubox-vm-x64-bookworm.img.gz` | VirtualBox/QEMU | amd64 | VM for testing |
| `create-qemu-arm64-vm.sh` | QEMU ARM64 | script | Run ARM64 on x86 hosts | | `create-qemu-arm64-vm.sh` | QEMU ARM64 | script | Run ARM64 on x86 hosts |
*ESPRESSObin v7 and Ultra board images are no longer published in
scheduled releases (see #503). Board support remains in tree and
on-demand builds are available via workflow_dispatch.*
### Installation ### Installation
**ARM64 boards (MOCHAbin, ESPRESSObin):** **ARM64 boards (MOCHAbin, Raspberry Pi 400):**
```bash ```bash
# Flash to SD card or eMMC # Flash to SD card or eMMC
gunzip -c secubox-mochabin-bookworm.img.gz | sudo dd of=/dev/sdX bs=4M status=progress gunzip -c secubox-mochabin-bookworm.img.gz | sudo dd of=/dev/sdX bs=4M status=progress

View File

@ -26,6 +26,10 @@ import geoip2.errors
RULES_PATH = "/usr/share/secubox/waf/waf-rules.json" RULES_PATH = "/usr/share/secubox/waf/waf-rules.json"
THREATS_LOG = "/var/log/secubox/waf-threats.log" THREATS_LOG = "/var/log/secubox/waf-threats.log"
STATS_CACHE = "/tmp/secubox/waf-stats.json" STATS_CACHE = "/tmp/secubox/waf-stats.json"
# Phase 7+ (#509) — disk-persisted counters + log byte position for
# the double-buffered cache. Survives aggregator restart, populated
# incrementally by the warm refresh loop.
STATS_DISK_CACHE = "/var/lib/secubox/waf/stats-disk-cache.json"
# Runtime state # Runtime state
_compiled_patterns: Dict[str, List[dict]] = {} _compiled_patterns: Dict[str, List[dict]] = {}
@ -291,70 +295,170 @@ def _get_bans() -> List[dict]:
return [] return []
def _get_threat_stats() -> dict: def _load_stats_disk_cache() -> dict:
"""Get threat statistics from log with GeoIP country lookup.""" """Load the persisted counter state + last-read byte position.
stats = {
"total_threats": 0,
"threats_today": 0,
"by_category": defaultdict(int),
"by_severity": defaultdict(int),
"top_ips": defaultdict(int),
"top_countries": defaultdict(int),
"top_vhosts": defaultdict(int),
}
ip_countries: Dict[str, str] = {} # IP → country mapping
log_path = Path(THREATS_LOG) Schema : {byte_position: int, counters: {...}, ip_countries: {...},
if not log_path.exists(): today_iso: 'YYYY-MM-DD', threats_today: int,
return stats last_updated: int}
today = datetime.now().date().isoformat()
geoip_reader = _get_geoip_reader()
Counters are full-history accumulators ; `threats_today` is reset
at the day rollover.
"""
p = Path(STATS_DISK_CACHE)
if not p.exists():
return {}
try: try:
with open(log_path) as f: return json.loads(p.read_text())
for line in f: except Exception:
return {}
def _save_stats_disk_cache(state: dict) -> None:
try: try:
entry = json.loads(line.strip()) p = Path(STATS_DISK_CACHE)
stats["total_threats"] += 1 p.parent.mkdir(parents=True, exist_ok=True)
# Atomic write : tmp → rename so a half-written file never
if entry.get("timestamp", "").startswith(today): # corrupts the cache on the next load.
stats["threats_today"] += 1 tmp = p.with_suffix(".tmp")
tmp.write_text(json.dumps(state))
stats["by_category"][entry.get("category", "unknown")] += 1 tmp.replace(p)
stats["by_severity"][entry.get("severity", "unknown")] += 1
# IP tracking - try both field names for compatibility
ip = entry.get("client_ip") or entry.get("ip", "unknown")
stats["top_ips"][ip] += 1
# Country lookup via GeoIP (cache per IP)
if ip not in ip_countries:
ip_countries[ip] = _lookup_country(ip, geoip_reader)
country = ip_countries[ip]
stats["top_countries"][country] += 1
# Vhost tracking
vhost = entry.get("host") or entry.get("vhost", "unknown")
stats["top_vhosts"][vhost] += 1
except json.JSONDecodeError:
pass
except Exception: except Exception:
pass pass
# Convert defaultdicts and get top 10
stats["by_category"] = dict(stats["by_category"])
stats["by_severity"] = dict(stats["by_severity"])
# Top IPs with country codes included def _get_threat_stats() -> dict:
top_ips_sorted = sorted(stats["top_ips"].items(), key=lambda x: -x[1])[:10] """Get threat statistics from log with GeoIP country lookup.
stats["top_ips"] = {ip: count for ip, count in top_ips_sorted}
stats["top_ips_countries"] = {ip: ip_countries.get(ip, "??") for ip, _ in top_ips_sorted}
stats["top_countries"] = dict(sorted(stats["top_countries"].items(), key=lambda x: -x[1])[:10]) Phase 7+ (#509) — double-buffered cache : the first call after a
stats["top_vhosts"] = dict(sorted(stats["top_vhosts"].items(), key=lambda x: -x[1])[:10]) cold start does ONE full-log pass and persists counters + the byte
position to disk. Subsequent calls only read the new tail since
the last position. Log rotation / truncation is detected via the
file shrinking ; counters are reset cleanly.
"""
state = _load_stats_disk_cache()
today = datetime.now().date().isoformat()
return stats # Reset accumulators if the day has rolled over.
if state.get("today_iso") != today:
state["today_iso"] = today
state["threats_today"] = 0
counters = state.get("counters", {})
by_category = defaultdict(int, counters.get("by_category", {}))
by_severity = defaultdict(int, counters.get("by_severity", {}))
top_ips = defaultdict(int, counters.get("top_ips", {}))
top_countries = defaultdict(int, counters.get("top_countries", {}))
top_vhosts = defaultdict(int, counters.get("top_vhosts", {}))
total_threats = counters.get("total_threats", 0)
threats_today = state.get("threats_today", 0)
ip_countries: Dict[str, str] = dict(state.get("ip_countries", {}))
log_path = Path(THREATS_LOG)
if not log_path.exists():
# Return whatever's cached.
return _finalize_stats(
total_threats, threats_today, by_category, by_severity,
top_ips, top_countries, top_vhosts, ip_countries,
)
geoip_reader = _get_geoip_reader()
try:
size_now = log_path.stat().st_size
byte_position = state.get("byte_position", 0)
# Log rotation / truncation : the file shrank since last read.
# Drop accumulators ; we'll rebuild from the new (smaller) file.
if size_now < byte_position:
by_category.clear(); by_severity.clear(); top_ips.clear()
top_countries.clear(); top_vhosts.clear()
total_threats = 0
threats_today = 0
byte_position = 0
ip_countries.clear()
if size_now > byte_position:
with open(log_path) as f:
f.seek(byte_position)
for line in f:
try:
entry = json.loads(line.strip())
total_threats += 1
if entry.get("timestamp", "").startswith(today):
threats_today += 1
by_category[entry.get("category", "unknown")] += 1
by_severity[entry.get("severity", "unknown")] += 1
ip = entry.get("client_ip") or entry.get("ip", "unknown")
top_ips[ip] += 1
if ip not in ip_countries:
ip_countries[ip] = _lookup_country(ip, geoip_reader)
top_countries[ip_countries[ip]] += 1
vhost = entry.get("host") or entry.get("vhost", "unknown")
top_vhosts[vhost] += 1
except json.JSONDecodeError:
pass
byte_position = f.tell()
except Exception:
pass
# Cap the ip_countries dict so it doesn't grow without bound.
# Top-1000 most recently seen IPs is plenty for the dashboard.
if len(ip_countries) > 1200:
ip_countries = dict(
sorted(ip_countries.items(), key=lambda kv: -top_ips.get(kv[0], 0))[:1000]
)
# Persist before returning so the next call starts from here.
_save_stats_disk_cache({
"today_iso": today,
"threats_today": threats_today,
"byte_position": byte_position,
"counters": {
"total_threats": total_threats,
"by_category": dict(by_category),
"by_severity": dict(by_severity),
"top_ips": dict(top_ips),
"top_countries": dict(top_countries),
"top_vhosts": dict(top_vhosts),
},
"ip_countries": ip_countries,
"last_updated": int(time.time()),
})
return _finalize_stats(
total_threats, threats_today, by_category, by_severity,
top_ips, top_countries, top_vhosts, ip_countries,
)
def _finalize_stats(
total_threats: int, threats_today: int,
by_category, by_severity, top_ips, top_countries, top_vhosts,
ip_countries: dict,
) -> dict:
"""Shape the dashboard-friendly result : top-10 lists + plain dicts."""
top_ips_sorted = sorted(top_ips.items(), key=lambda x: -x[1])[:10]
return {
"total_threats": total_threats,
"threats_today": threats_today,
"by_category": dict(by_category),
"by_severity": dict(by_severity),
"top_ips": {ip: count for ip, count in top_ips_sorted},
"top_ips_countries": {
ip: ip_countries.get(ip, "??") for ip, _ in top_ips_sorted
},
"top_countries": dict(
sorted(top_countries.items(), key=lambda x: -x[1])[:10]
),
"top_vhosts": dict(
sorted(top_vhosts.items(), key=lambda x: -x[1])[:10]
),
}
# ─────────────────────────────────────────────────────────────────────── # ───────────────────────────────────────────────────────────────────────

View File

@ -1,3 +1,29 @@
secubox-waf (1.2.2-1~bookworm1) bookworm; urgency=medium
* Phase 11+ (#509) — double-buffered cache for WAF stats consumed by
both /waf/ and /soc/ dashboards.
Live triage on gk2 (2026-06-10) found the aggregator pinned at
89 % CPU with 8+ concurrent open file descriptors on the 110 MB
waf-threats.log because _get_threat_stats() iterated the full
JSONL on every request. Both dashboards showed empty cards.
Fix : incremental log reader with persisted byte position.
- New disk cache : /var/lib/secubox/waf/stats-disk-cache.json
stores counters + byte_position + ip_countries. Atomic
write (.tmp -> rename) so a crash mid-write never corrupts.
- _get_threat_stats() reloads from disk, seeks to the last
position, reads only the new tail since then.
- Log rotation / truncation detected via size shrink ; counters
reset cleanly.
- Day rollover resets threats_today only ; full-history
counters keep accumulating.
- ip_countries dict capped at 1200 entries (most-active 1000
retained on overflow).
Net effect : /waf/stats steady-state under 100 ms ; warm refresh
cycle under 1 s per tick instead of 30 s ; aggregator CPU drops
back to idle.
-- Gerald KERMA <devel@cybermind.fr> Wed, 10 Jun 2026 08:40:00 +0200
secubox-waf (1.2.1-1~bookworm1) bookworm; urgency=medium secubox-waf (1.2.1-1~bookworm1) bookworm; urgency=medium
* Phase 7 follow-up (#498) — LXC mitmproxy.service memory hygiene : * Phase 7 follow-up (#498) — LXC mitmproxy.service memory hygiene :