Hermes — self-healing supervisor stack
Multi-service stack that supervises ELC AI Agent instances. Spawned from the feat/corone-self-healing-agent branch; the keb workspace was the reference implementation, hence keb’s commit author identity is Julian (Hermes Self-Healing Port).
Components (3 systemd units)
| Service | Port | Purpose | Status (2026-05-03) |
|---|---|---|---|
| hermes-webui | 127.0.0.1:8787 | Python supervisor + Web UI (uvicorn) | activating |
| hermes-dashboard | 127.0.0.1:9119 | Hermes admin dashboard (/root/.local/bin/hermes dashboard) | active |
| hermes-cloudflare | (tunnel) | Cloudflare Tunnel exposing hermes.corone.monster → 127.0.0.1:8787 | activating |
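The states in the Status column can be re-checked from a script. The following is a minimal sketch, assuming the systemd unit names match the Service column and that systemctl is on the PATH. The helper names (unit_states, stuck_units) and the injectable run parameter are illustrative and not part of Hermes.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

# Unit names taken from the Service column above (assumption: the real
# unit files use exactly these names).
UNITS = ("hermes-webui", "hermes-dashboard", "hermes-cloudflare")


def _systemctl_is_active(unit: str) -> str:
    """Return the state word printed by `systemctl is-active <unit>`."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit only means "not active"
    )
    return result.stdout.strip() or "unknown"


def unit_states(
    units: Iterable[str] = UNITS,
    run: Callable[[str], str] = _systemctl_is_active,
) -> Dict[str, str]:
    """Map each unit name to its reported state (active, activating, ...)."""
    return {unit: run(unit) for unit in units}


def stuck_units(states: Dict[str, str]) -> List[str]:
    """Units that are not fully up; `activating` counts as not up."""
    return sorted(u for u, s in states.items() if s != "active")
```

Calling stuck_units(unit_states()) on the host would list the units that need attention. A unit that stays in activating across several checks is usually worth inspecting with journalctl -u for that unit.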
Public endpoint
- https://hermes.corone.monster (via Cloudflare Tunnel; see cloudflared config)
- nginx vhost: /etc/nginx/sites-enabled/hermes.corone.monster
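Because the tunnel forwards to 127.0.0.1:8787, a plain TCP probe of that address helps tell "Web UI is down" apart from "tunnel is down". This is a minimal sketch. The function name origin_is_listening and the injectable connect parameter are assumptions made for illustration.

```python
import socket
from typing import Callable


def origin_is_listening(
    host: str = "127.0.0.1",
    port: int = 8787,
    timeout: float = 2.0,
    connect: Callable = socket.create_connection,
) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    conn.close()
    return True
```

If this returns True while the public URL still fails, the problem is more likely in the tunnel or DNS than in the Web UI process.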
Code locations
- Agent: /root/.hermes/hermes-agent/ (Python venv)
- Web UI: /root/hermes-webui/server.py
- Dashboard binary: /root/.local/bin/hermes
- cloudflared: /root/.local/bin/cloudflared
Function
Watches ELC instances (corone-app:3000, kebahagiaan-app:3001):
- Restart on crash via systemd
- Captures the error.unexpected lifecycle event (added in 1.3.7)
- Catches process-level uncaughtException and unhandledRejection
- Dispatches notifications externally
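The capture-and-notify path could look roughly like the sketch below. Only the event name error.unexpected and the two instance names and ports come from this page. The payload keys (type, instance, message), the send callback, and the function names are hypothetical and do not describe the actual ELC lifecycle schema.

```python
from typing import Callable, Mapping, Optional

# Instance names and ports as listed above.
WATCHED = {"corone-app": 3000, "kebahagiaan-app": 3001}


def format_alert(event: Mapping[str, str]) -> Optional[str]:
    """Return an alert line for error.unexpected events, else None.

    The keys `type`, `instance` and `message` are a hypothetical payload
    shape chosen for this sketch.
    """
    if event.get("type") != "error.unexpected":
        return None
    instance = event.get("instance", "unknown")
    if instance not in WATCHED:
        return None  # ignore events from instances that are not watched
    detail = event.get("message", "no detail")
    return f"[{instance}:{WATCHED[instance]}] error.unexpected: {detail}"


def dispatch(event: Mapping[str, str], send: Callable[[str], None]) -> bool:
    """Forward the alert through `send`; report whether anything was sent."""
    alert = format_alert(event)
    if alert is None:
        return False
    send(alert)
    return True
```

Passing the sender in as a callback keeps the filtering logic independent of the notification channel, so the same function can be exercised with a list's append method in tests and with a real sender in production.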
Related ELC commits (on corone-julian branch)
- e2a1365: feat(lifecycle) dispatch error.unexpected on uncaught exception/rejection
- a6cba6a: feat(notifications) error.unexpected event type for process-level crashes
Why it matters
Without Hermes, an ELC crash → 502 → user manually restarts. With Hermes:
- Process dies → systemd starts new one
- Lifecycle event flagged → Hermes notifies
- Dashboard at :9119 shows recent restart history
- Cloudflare Tunnel keeps URL alive even if local nginx hiccups
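How the dashboard collects its restart history is not described here. As one possible cross-check, systemd exposes an NRestarts property per service on reasonably recent versions, which can be read with systemctl show. The sketch below assumes that property is available. The unit name passed in is up to the caller, and the function names are illustrative.

```python
import subprocess
from typing import Callable


def _show_nrestarts(unit: str) -> str:
    """Raw output of `systemctl show -p NRestarts <unit>`."""
    result = subprocess.run(
        ["systemctl", "show", "-p", "NRestarts", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def parse_nrestarts(output: str) -> int:
    """Extract the integer from an `NRestarts=<n>` line; -1 if absent."""
    for line in output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "NRestarts" and value.strip().isdigit():
            return int(value.strip())
    return -1


def restart_count(unit: str, show: Callable[[str], str] = _show_nrestarts) -> int:
    """Number of automatic restarts systemd reports for `unit`."""
    return parse_nrestarts(show(unit))
```

A count that keeps climbing between checks suggests a crash loop rather than a one-off failure.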
Cross-refs
- elc-ai-agent — supervised target
- elc-release-1.3.7 — release that adds error.unexpected
- nginx-vhost-map — public routing