Compare commits

...

6 Commits

Author SHA1 Message Date
Admin
4831c74acc feat(observability+tts): OTel logs for backend/runner/ui; add kokoro-fastapi (GPU) and pocket-tts (CPU) to homelab
Some checks failed
Release / Check ui (push) Successful in 47s
Release / Docker / caddy (push) Successful in 48s
Release / Test backend (push) Successful in 3m24s
CI / Check ui (pull_request) Successful in 52s
CI / Test backend (pull_request) Successful in 3m6s
CI / Docker / caddy (pull_request) Failing after 38s
Release / Upload source maps (push) Failing after 33s
Release / Docker / ui (push) Failing after 2m41s
Release / Docker / backend (push) Successful in 2m48s
Release / Docker / runner (push) Failing after 1m1s
Release / Gitea Release (push) Has been skipped
CI / Docker / ui (pull_request) Successful in 1m48s
CI / Docker / backend (pull_request) Successful in 1m56s
CI / Docker / runner (pull_request) Successful in 1m45s
- otelsetup.Init now returns a *slog.Logger wired to the OTLP log exporter
  so all slog output is shipped to Loki with embedded trace IDs
- backend and runner both adopt the new OTel-bridged logger
- runner.runScrapeTask and runAudioTask now emit structured OTel spans
- ui/hooks.server.ts adds BatchLogRecordProcessor alongside existing trace exporter
- homelab: add kokoro-fastapi GPU service (ghcr.io/remsky/kokoro-fastapi-gpu)
  using deploy.resources.reservations for NVIDIA GPU, exposed internally on :8880
- homelab: add pocket-tts CPU service (ghcr.io/kyutai-labs/pocket-tts) on :8000
- runner KOKORO_URL hardcoded to http://kokoro-fastapi:8880 (fixes DNS failure
  for the stale kokoro.kalekber.cc hostname)
2026-03-27 16:05:22 +05:00
Admin
7e5e0495cf ci: retrigger after fixing runner DNS — job containers now use dnsmasq with AAAA filtering
Some checks failed
CI / Check ui (pull_request) Successful in 46s
CI / Test backend (pull_request) Successful in 2m56s
CI / Docker / ui (pull_request) Failing after 49s
CI / Docker / backend (pull_request) Successful in 1m47s
CI / Docker / caddy (pull_request) Successful in 5m45s
CI / Docker / runner (pull_request) Successful in 1m53s
2026-03-27 10:32:56 +05:00
Admin
188685e1b6 ci: retrigger to rebuild Docker backend image after transient IPv6 failure
Some checks failed
CI / Test backend (pull_request) Failing after 29s
CI / Docker / backend (pull_request) Has been skipped
CI / Docker / runner (pull_request) Has been skipped
CI / Check ui (pull_request) Successful in 53s
CI / Docker / ui (pull_request) Successful in 1m27s
CI / Docker / caddy (pull_request) Successful in 5m15s
2026-03-27 10:12:19 +05:00
Admin
3271a5f3e6 fix(ui): use resourceFromAttributes instead of new Resource to fix verbatimModuleSyntax TS error
Some checks failed
CI / Test backend (pull_request) Successful in 36s
CI / Check ui (pull_request) Successful in 46s
CI / Docker / backend (pull_request) Failing after 41s
CI / Docker / runner (pull_request) Successful in 1m38s
CI / Docker / ui (pull_request) Successful in 1m42s
CI / Docker / caddy (pull_request) Successful in 6m27s
2026-03-27 10:01:29 +05:00
Admin
ee3ed29316 Add OTel distributed tracing to backend, ui, and runner
Some checks failed
CI / Check ui (pull_request) Failing after 44s
CI / Docker / ui (pull_request) Has been skipped
CI / Test backend (pull_request) Successful in 3m30s
CI / Docker / backend (pull_request) Successful in 2m28s
CI / Docker / caddy (pull_request) Successful in 5m22s
CI / Docker / runner (pull_request) Successful in 1m52s
Go backend:
- Add OTel SDK + otelhttp middleware deps (go.mod)
- New internal/otelsetup package: init OTLP/HTTP TracerProvider from env vars
- cmd/backend/main.go: call otelsetup.Init() after logger + ctx setup
- internal/backend/server.go: wrap mux with otelhttp.NewHandler() before
  sentryhttp, so all HTTP spans are recorded

SvelteKit UI:
- Add @opentelemetry/sdk-node, exporter-trace-otlp-http, resources,
  semantic-conventions
- hooks.server.ts: init NodeSDK when OTEL_EXPORTER_OTLP_ENDPOINT is set;
  graceful shutdown on SIGTERM/SIGINT

Config:
- docker-compose.yml: pass OTEL_EXPORTER_OTLP_ENDPOINT + OTEL_SERVICE_NAME
  to backend, runner, and ui services
- homelab/docker-compose.yml: fix runner OTel endpoint to HTTP port 4318
- Doppler prd: OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.libnovel.cc
- Doppler prd_homelab: OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318

All services no-op gracefully when the env var is unset (local dev).
2026-03-26 23:00:52 +05:00
Admin
a39f660a37 Migrate tooling to homelab; add OTel observability stack
All checks were successful
CI / Test backend (pull_request) Successful in 34s
CI / Check ui (pull_request) Successful in 33s
CI / Docker / backend (pull_request) Successful in 1m26s
CI / Docker / runner (pull_request) Successful in 1m50s
CI / Docker / ui (pull_request) Successful in 1m3s
CI / Docker / caddy (pull_request) Successful in 5m53s
- Remove GlitchTip, Umami, Fider, Gotify, Uptime Kuma, Dozzle, Watchtower
  from prod docker-compose (now run on homelab)
- Add dozzle-agent on prod (127.0.0.1:7007) for homelab Dozzle to connect to
- Remove corresponding subdomain blocks from Caddyfile (now routed via
  Cloudflare Tunnel from homelab)
- Add homelab/docker-compose.yml: unified homelab stack with all migrated
  tooling services plus full OTel stack (Tempo 2.6.1, Loki, Prometheus,
  OTel Collector, Grafana)
- Add homelab/otel/: Tempo, Loki, Prometheus, OTel Collector configs +
  Grafana provisioning (datasources + dashboards)
- Add homelab/dozzle/users.yml for Dozzle auth
2026-03-26 21:22:43 +05:00
21 changed files with 2163 additions and 250 deletions

View File

@@ -204,41 +204,11 @@
}
# ── Tooling subdomains ────────────────────────────────────────────────────────
feedback.libnovel.cc {
import security_headers
reverse_proxy fider:3000
}
# ── GlitchTip: error tracking ─────────────────────────────────────────────────
errors.libnovel.cc {
import security_headers
reverse_proxy glitchtip-web:8000
}
# ── Umami: page analytics ─────────────────────────────────────────────────────
analytics.libnovel.cc {
import security_headers
reverse_proxy umami:3000
}
# ── Dozzle: Docker log viewer ─────────────────────────────────────────────────
logs.libnovel.cc {
import security_headers
reverse_proxy dozzle:8080
}
# ── Uptime Kuma: uptime monitoring ────────────────────────────────────────────
uptime.libnovel.cc {
import security_headers
reverse_proxy uptime-kuma:3001
}
# ── Gotify: push notifications ────────────────────────────────────────────────
push.libnovel.cc {
import security_headers
reverse_proxy gotify:80
}
# feedback.libnovel.cc, errors.libnovel.cc, analytics.libnovel.cc,
# logs.libnovel.cc, uptime.libnovel.cc, push.libnovel.cc, grafana.libnovel.cc
# are now routed via Cloudflare Tunnel directly to the homelab (192.168.0.109).
# No Caddy rules needed here — Cloudflare handles TLS termination and routing.
# ── PocketBase: exposed for homelab runner task polling ───────────────────────
# Allows the homelab runner to claim tasks and write results via the PB API.
# Admin UI is also accessible here for convenience.

BIN
backend/backend Executable file

Binary file not shown.

View File

@@ -26,6 +26,7 @@ import (
"github.com/libnovel/backend/internal/config"
"github.com/libnovel/backend/internal/kokoro"
"github.com/libnovel/backend/internal/meili"
"github.com/libnovel/backend/internal/otelsetup"
"github.com/libnovel/backend/internal/storage"
)
@@ -70,6 +71,19 @@ func run() error {
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()
// ── OpenTelemetry tracing + logs ──────────────────────────────────────────
otelShutdown, otelLog, err := otelsetup.Init(ctx, version)
if err != nil {
return fmt.Errorf("init otel: %w", err)
}
if otelShutdown != nil {
defer otelShutdown()
// Replace the plain slog logger with the OTel-bridged one so all
// structured log lines are forwarded to Loki with trace IDs attached.
log = otelLog
log.Info("otel tracing + logs enabled", "endpoint", os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"))
}
// ── Storage ──────────────────────────────────────────────────────────────
store, err := storage.NewStore(ctx, cfg, log)
if err != nil {

View File

@@ -25,6 +25,7 @@ import (
"github.com/libnovel/backend/internal/kokoro"
"github.com/libnovel/backend/internal/meili"
"github.com/libnovel/backend/internal/novelfire"
"github.com/libnovel/backend/internal/otelsetup"
"github.com/libnovel/backend/internal/runner"
"github.com/libnovel/backend/internal/storage"
)
@@ -70,6 +71,19 @@ func run() error {
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()
// ── OpenTelemetry tracing + logs ─────────────────────────────────────────
otelShutdown, otelLog, err := otelsetup.Init(ctx, version)
if err != nil {
return fmt.Errorf("init otel: %w", err)
}
if otelShutdown != nil {
defer otelShutdown()
// Switch to the OTel-bridged logger so all structured log lines are
// forwarded to Loki with trace IDs attached.
log = otelLog
log.Info("otel tracing + logs enabled", "endpoint", os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"))
}
// ── Storage ─────────────────────────────────────────────────────────────
store, err := storage.NewStore(ctx, cfg, log)
if err != nil {

View File

@@ -9,14 +9,19 @@ require (
require (
github.com/andybalholm/brotli v1.1.1 // indirect
github.com/cenkalti/backoff/v5 v5.0.3 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/getsentry/sentry-go v0.43.0 // indirect
github.com/go-ini/ini v1.67.0 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/golang-jwt/jwt/v5 v5.3.1 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0 // indirect
github.com/klauspost/compress v1.18.2 // indirect
github.com/klauspost/cpuid/v2 v2.2.11 // indirect
github.com/klauspost/crc32 v1.3.0 // indirect
@@ -28,10 +33,27 @@ require (
github.com/redis/go-redis/v9 v9.18.0 // indirect
github.com/rs/xid v1.6.0 // indirect
github.com/tinylib/msgp v1.6.1 // indirect
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
go.opentelemetry.io/contrib/bridges/otelslog v0.17.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0 // indirect
go.opentelemetry.io/otel v1.42.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.18.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.42.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.42.0 // indirect
go.opentelemetry.io/otel/log v0.18.0 // indirect
go.opentelemetry.io/otel/metric v1.42.0 // indirect
go.opentelemetry.io/otel/sdk v1.42.0 // indirect
go.opentelemetry.io/otel/sdk/log v0.18.0 // indirect
go.opentelemetry.io/otel/trace v1.42.0 // indirect
go.opentelemetry.io/proto/otlp v1.9.0 // indirect
go.uber.org/atomic v1.11.0 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/crypto v0.48.0 // indirect
golang.org/x/sys v0.41.0 // indirect
golang.org/x/text v0.34.0 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20260209200024-4cfbd4190f57 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20260209200024-4cfbd4190f57 // indirect
google.golang.org/grpc v1.79.2 // indirect
google.golang.org/protobuf v1.36.11 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)

View File

@@ -1,5 +1,7 @@
github.com/andybalholm/brotli v1.1.1 h1:PR2pgnyFznKEugtsUo0xLdDop5SKXd5Qf5ysW+7XdTA=
github.com/andybalholm/brotli v1.1.1/go.mod h1:05ib4cKhjx3OQYUY22hTVd34Bc8upXjOLL2rKwwZBoA=
github.com/cenkalti/backoff/v5 v5.0.3 h1:ZN+IMa753KfX5hd8vVaMixjnqRZ3y8CuJKRKj1xcsSM=
github.com/cenkalti/backoff/v5 v5.0.3/go.mod h1:rkhZdG3JZukswDf7f0cwqPNk4K0sa+F97BxZthm/crw=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
@@ -8,14 +10,23 @@ github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/r
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U=
github.com/getsentry/sentry-go v0.43.0 h1:XbXLpFicpo8HmBDaInk7dum18G9KSLcjZiyUKS+hLW4=
github.com/getsentry/sentry-go v0.43.0/go.mod h1:XDotiNZbgf5U8bPDUAfvcFmOnMQQceESxyKaObSssW0=
github.com/go-ini/ini v1.67.0 h1:z6ZrTEZqSWOTyH2FlglNbNgARyHG8oLW9gMELqKr06A=
github.com/go-ini/ini v1.67.0/go.mod h1:ByCAeIL28uOIIG0E3PJtZPDL8WnHpFKFOtgjp+3Ies8=
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0 h1:HWRh5R2+9EifMyIHV7ZV+MIZqgz+PMpZ14Jynv3O2Zs=
github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0/go.mod h1:JfhWUomR1baixubs02l85lZYYOm7LV6om4ceouMv45c=
github.com/klauspost/compress v1.18.2 h1:iiPHWW0YrcFgpBYhsA6D1+fqHssJscY/Tm/y2Uqnapk=
github.com/klauspost/compress v1.18.2/go.mod h1:R0h/fSBs8DE4ENlcrlib3PsXS61voFxhIs2DeRhCvJ4=
github.com/klauspost/cpuid/v2 v2.0.1/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
@@ -45,6 +56,32 @@ github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu
github.com/tinylib/msgp v1.6.1 h1:ESRv8eL3u+DNHUoSAAQRE50Hm162zqAnBoGv9PzScPY=
github.com/tinylib/msgp v1.6.1/go.mod h1:RSp0LW9oSxFut3KzESt5Voq4GVWyS+PSulT77roAqEA=
github.com/xyproto/randomstring v1.0.5/go.mod h1:rgmS5DeNXLivK7YprL0pY+lTuhNQW3iGxZ18UQApw/E=
go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ64=
go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y=
go.opentelemetry.io/contrib/bridges/otelslog v0.17.0 h1:NFIS6x7wyObQ7cR84x7bt1sr8nYBx89s3x3GwRjw40k=
go.opentelemetry.io/contrib/bridges/otelslog v0.17.0/go.mod h1:39SaByOyDMRMe872AE7uelMuQZidIw7LLFAnQi0FWTE=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0 h1:OyrsyzuttWTSur2qN/Lm0m2a8yqyIjUVBZcxFPuXq2o=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0/go.mod h1:C2NGBr+kAB4bk3xtMXfZ94gqFDtg/GkI7e9zqGh5Beg=
go.opentelemetry.io/otel v1.42.0 h1:lSQGzTgVR3+sgJDAU/7/ZMjN9Z+vUip7leaqBKy4sho=
go.opentelemetry.io/otel v1.42.0/go.mod h1:lJNsdRMxCUIWuMlVJWzecSMuNjE7dOYyWlqOXWkdqCc=
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.18.0 h1:icqq3Z34UrEFk2u+HMhTtRsvo7Ues+eiJVjaJt62njs=
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.18.0/go.mod h1:W2m8P+d5Wn5kipj4/xmbt9uMqezEKfBjzVJadfABSBE=
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.42.0 h1:THuZiwpQZuHPul65w4WcwEnkX2QIuMT+UFoOrygtoJw=
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.42.0/go.mod h1:J2pvYM5NGHofZ2/Ru6zw/TNWnEQp5crgyDeSrYpXkAw=
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.42.0 h1:uLXP+3mghfMf7XmV4PkGfFhFKuNWoCvvx5wP/wOXo0o=
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.42.0/go.mod h1:v0Tj04armyT59mnURNUJf7RCKcKzq+lgJs6QSjHjaTc=
go.opentelemetry.io/otel/log v0.18.0 h1:XgeQIIBjZZrliksMEbcwMZefoOSMI1hdjiLEiiB0bAg=
go.opentelemetry.io/otel/log v0.18.0/go.mod h1:KEV1kad0NofR3ycsiDH4Yjcoj0+8206I6Ox2QYFSNgI=
go.opentelemetry.io/otel/metric v1.42.0 h1:2jXG+3oZLNXEPfNmnpxKDeZsFI5o4J+nz6xUlaFdF/4=
go.opentelemetry.io/otel/metric v1.42.0/go.mod h1:RlUN/7vTU7Ao/diDkEpQpnz3/92J9ko05BIwxYa2SSI=
go.opentelemetry.io/otel/sdk v1.42.0 h1:LyC8+jqk6UJwdrI/8VydAq/hvkFKNHZVIWuslJXYsDo=
go.opentelemetry.io/otel/sdk v1.42.0/go.mod h1:rGHCAxd9DAph0joO4W6OPwxjNTYWghRWmkHuGbayMts=
go.opentelemetry.io/otel/sdk/log v0.18.0 h1:n8OyZr7t7otkeTnPTbDNom6rW16TBYGtvyy2Gk6buQw=
go.opentelemetry.io/otel/sdk/log v0.18.0/go.mod h1:C0+wxkTwKpOCZLrlJ3pewPiiQwpzycPI/u6W0Z9fuYk=
go.opentelemetry.io/otel/trace v1.42.0 h1:OUCgIPt+mzOnaUTpOQcBiM/PLQ/Op7oq6g4LenLmOYY=
go.opentelemetry.io/otel/trace v1.42.0/go.mod h1:f3K9S+IFqnumBkKhRJMeaZeNk9epyhnCmQh/EysQCdc=
go.opentelemetry.io/proto/otlp v1.9.0 h1:l706jCMITVouPOqEnii2fIAuO3IVGBRPV5ICjceRb/A=
go.opentelemetry.io/proto/otlp v1.9.0/go.mod h1:xE+Cx5E/eEHw+ISFkwPLwCZefwVjY+pqKg1qcK03+/4=
go.uber.org/atomic v1.11.0 h1:ZvwS0R+56ePWxUNi+Atn9dWONBPp/AUETXlHW0DxSjE=
go.uber.org/atomic v1.11.0/go.mod h1:LUxbIzbOniOlMKjJjyPfpl4v+PKK2cNJn91OQbhoJI0=
go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc=
@@ -57,6 +94,14 @@ golang.org/x/sys v0.41.0 h1:Ivj+2Cp/ylzLiEU89QhWblYnOE9zerudt9Ftecq2C6k=
golang.org/x/sys v0.41.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/text v0.34.0 h1:oL/Qq0Kdaqxa1KbNeMKwQq0reLCCaFtqu2eNuSeNHbk=
golang.org/x/text v0.34.0/go.mod h1:homfLqTYRFyVYemLBFl5GgL/DWEiH5wcsQ5gSh1yziA=
google.golang.org/genproto/googleapis/api v0.0.0-20260209200024-4cfbd4190f57 h1:JLQynH/LBHfCTSbDWl+py8C+Rg/k1OVH3xfcaiANuF0=
google.golang.org/genproto/googleapis/api v0.0.0-20260209200024-4cfbd4190f57/go.mod h1:kSJwQxqmFXeo79zOmbrALdflXQeAYcUbgS7PbpMknCY=
google.golang.org/genproto/googleapis/rpc v0.0.0-20260209200024-4cfbd4190f57 h1:mWPCjDEyshlQYzBpMNHaEof6UX1PmHcaUODUywQ0uac=
google.golang.org/genproto/googleapis/rpc v0.0.0-20260209200024-4cfbd4190f57/go.mod h1:j9x/tPzZkyxcgEFkiKEEGxfvyumM01BEtsW8xzOahRQ=
google.golang.org/grpc v1.79.2 h1:fRMD94s2tITpyJGtBBn7MkMseNpOZU8ZxgC3MMBaXRU=
google.golang.org/grpc v1.79.2/go.mod h1:KmT0Kjez+0dde/v2j9vzwoAScgEPx/Bw1CYChhHLrHQ=
google.golang.org/protobuf v1.36.11 h1:fV6ZwhNocDyBLK0dj+fg8ektcVegBBuEolpbTQyBNVE=
google.golang.org/protobuf v1.36.11/go.mod h1:HTf+CrKn2C3g5S8VImy6tdcUvCska2kB7j23XfzDpco=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=

View File

@@ -33,6 +33,7 @@ import (
"github.com/libnovel/backend/internal/kokoro"
"github.com/libnovel/backend/internal/meili"
"github.com/libnovel/backend/internal/taskqueue"
"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)
// Dependencies holds all external services the backend server depends on.
@@ -170,9 +171,17 @@ func (s *Server) ListenAndServe(ctx context.Context) error {
mux.HandleFunc("POST /api/progress/{slug}", s.handleSetProgress)
mux.HandleFunc("DELETE /api/progress/{slug}", s.handleDeleteProgress)
// Wrap mux with OTel tracing (no-op when no TracerProvider is set),
// then with Sentry for panic recovery and error reporting.
var handler http.Handler = mux
handler = otelhttp.NewHandler(handler, "libnovel.backend",
otelhttp.WithMessageEvents(otelhttp.ReadEvents, otelhttp.WriteEvents),
)
handler = sentryhttp.New(sentryhttp.Options{Repanic: true}).Handle(handler)
srv := &http.Server{
Addr: s.cfg.Addr,
Handler: sentryhttp.New(sentryhttp.Options{Repanic: true}).Handle(mux),
Handler: handler,
ReadTimeout: 15 * time.Second,
WriteTimeout: 60 * time.Second,
IdleTimeout: 60 * time.Second,

View File

@@ -0,0 +1,108 @@
// Package otelsetup initialises the OpenTelemetry SDK for the LibNovel backend.
//
// It reads two environment variables:
//
// OTEL_EXPORTER_OTLP_ENDPOINT — OTLP/HTTP endpoint, e.g. http://otel-collector:4318
// OTEL_SERVICE_NAME — service name reported in traces (default: "backend")
//
// When OTEL_EXPORTER_OTLP_ENDPOINT is empty the function is a no-op: it
// returns a nil shutdown func and the default slog.Logger, so callers never
// need to branch on it.
//
// Usage in main.go:
//
// shutdown, log, err := otelsetup.Init(ctx, version)
// if err != nil { return err }
// if shutdown != nil { defer shutdown() }
package otelsetup
import (
"context"
"fmt"
"log/slog"
"os"
"time"
"go.opentelemetry.io/contrib/bridges/otelslog"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp"
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
otellog "go.opentelemetry.io/otel/log/global"
"go.opentelemetry.io/otel/sdk/log"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)
// Init sets up TracerProvider and LoggerProvider that export via OTLP/HTTP.
//
// Returns:
// - shutdown: flushes and stops both providers (nil when OTel is disabled).
// - logger: an slog.Logger bridged to OTel logs (falls back to default when disabled).
// - err: non-nil only on SDK initialisation failure.
func Init(ctx context.Context, version string) (shutdown func(), logger *slog.Logger, err error) {
endpoint := os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
if endpoint == "" {
return nil, slog.Default(), nil // OTel disabled — not an error
}
serviceName := os.Getenv("OTEL_SERVICE_NAME")
if serviceName == "" {
serviceName = "backend"
}
// ── Shared resource ───────────────────────────────────────────────────────
res, err := resource.New(ctx,
resource.WithAttributes(
semconv.ServiceName(serviceName),
semconv.ServiceVersion(version),
),
)
if err != nil {
return nil, slog.Default(), fmt.Errorf("otelsetup: create resource: %w", err)
}
// ── Trace provider ────────────────────────────────────────────────────────
traceExp, err := otlptracehttp.New(ctx,
otlptracehttp.WithEndpoint(endpoint),
otlptracehttp.WithInsecure(), // collector is on the internal Docker network
)
if err != nil {
return nil, slog.Default(), fmt.Errorf("otelsetup: create OTLP trace exporter: %w", err)
}
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(traceExp),
sdktrace.WithResource(res),
sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.2))),
)
otel.SetTracerProvider(tp)
// ── Log provider ──────────────────────────────────────────────────────────
logExp, err := otlploghttp.New(ctx,
otlploghttp.WithEndpoint(endpoint),
otlploghttp.WithInsecure(),
)
if err != nil {
return nil, slog.Default(), fmt.Errorf("otelsetup: create OTLP log exporter: %w", err)
}
lp := log.NewLoggerProvider(
log.WithProcessor(log.NewBatchProcessor(logExp)),
log.WithResource(res),
)
otellog.SetLoggerProvider(lp)
// Bridge slog → OTel logs. Structured fields and trace IDs are forwarded
// automatically; Grafana can correlate log lines with Tempo traces.
otelLogger := otelslog.NewLogger(serviceName)
shutdown = func() {
shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
_ = tp.Shutdown(shutCtx)
_ = lp.Shutdown(shutCtx)
}
return shutdown, otelLogger, nil
}

View File

@@ -22,6 +22,10 @@ import (
"sync/atomic"
"time"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
"github.com/libnovel/backend/internal/bookstore"
"github.com/libnovel/backend/internal/domain"
"github.com/libnovel/backend/internal/kokoro"
@@ -301,6 +305,14 @@ func (r *Runner) newOrchestrator() *orchestrator.Orchestrator {
// runScrapeTask executes one scrape task end-to-end and reports the result.
func (r *Runner) runScrapeTask(ctx context.Context, task domain.ScrapeTask) {
ctx, span := otel.Tracer("runner").Start(ctx, "runner.scrape_task")
defer span.End()
span.SetAttributes(
attribute.String("task.id", task.ID),
attribute.String("task.kind", task.Kind),
attribute.String("task.url", task.TargetURL),
)
log := r.deps.Log.With("task_id", task.ID, "kind", task.Kind, "url", task.TargetURL)
log.Info("runner: scrape task starting")
@@ -340,8 +352,10 @@ func (r *Runner) runScrapeTask(ctx context.Context, task domain.ScrapeTask) {
if result.ErrorMessage != "" {
r.tasksFailed.Add(1)
span.SetStatus(codes.Error, result.ErrorMessage)
} else {
r.tasksCompleted.Add(1)
span.SetStatus(codes.Ok, "")
}
log.Info("runner: scrape task finished",
@@ -384,6 +398,15 @@ func (r *Runner) runCatalogueTask(ctx context.Context, task domain.ScrapeTask, o
// runAudioTask executes one audio-generation task.
func (r *Runner) runAudioTask(ctx context.Context, task domain.AudioTask) {
ctx, span := otel.Tracer("runner").Start(ctx, "runner.audio_task")
defer span.End()
span.SetAttributes(
attribute.String("task.id", task.ID),
attribute.String("book.slug", task.Slug),
attribute.Int("chapter.number", task.Chapter),
attribute.String("audio.voice", task.Voice),
)
log := r.deps.Log.With("task_id", task.ID, "slug", task.Slug, "chapter", task.Chapter, "voice", task.Voice)
log.Info("runner: audio task starting")
@@ -407,6 +430,7 @@ func (r *Runner) runAudioTask(ctx context.Context, task domain.AudioTask) {
fail := func(msg string) {
log.Error("runner: audio task failed", "reason", msg)
r.tasksFailed.Add(1)
span.SetStatus(codes.Error, msg)
result := domain.AudioResult{ErrorMessage: msg}
if err := r.deps.Consumer.FinishAudioTask(ctx, task.ID, result); err != nil {
log.Error("runner: FinishAudioTask failed", "err", err)
@@ -441,6 +465,7 @@ func (r *Runner) runAudioTask(ctx context.Context, task domain.AudioTask) {
}
r.tasksCompleted.Add(1)
span.SetStatus(codes.Ok, "")
result := domain.AudioResult{ObjectKey: key}
if err := r.deps.Consumer.FinishAudioTask(ctx, task.ID, result); err != nil {
log.Error("runner: FinishAudioTask failed", "err", err)

View File

@@ -161,6 +161,8 @@ services:
KOKORO_URL: "${KOKORO_URL}"
KOKORO_VOICE: "${KOKORO_VOICE}"
GLITCHTIP_DSN: "${GLITCHTIP_DSN}"
OTEL_EXPORTER_OTLP_ENDPOINT: "${OTEL_EXPORTER_OTLP_ENDPOINT}"
OTEL_SERVICE_NAME: "backend"
healthcheck:
test: ["CMD", "/healthcheck", "http://localhost:8080/health"]
interval: 15s
@@ -217,8 +219,9 @@ services:
KOKORO_URL: "${KOKORO_URL}"
KOKORO_VOICE: "${KOKORO_VOICE}"
GLITCHTIP_DSN: "${GLITCHTIP_DSN}"
OTEL_EXPORTER_OTLP_ENDPOINT: "${OTEL_EXPORTER_OTLP_ENDPOINT}"
OTEL_SERVICE_NAME: "runner"
healthcheck:
# The runner writes /tmp/runner.alive on every poll.
# 120s = 2× the default 30s poll interval with generous headroom.
test: ["CMD", "/healthcheck", "file", "/tmp/runner.alive", "120"]
interval: 60s
@@ -267,6 +270,9 @@ services:
PUBLIC_UMAMI_SCRIPT_URL: "${PUBLIC_UMAMI_SCRIPT_URL}"
# GlitchTip client + server-side error tracking
PUBLIC_GLITCHTIP_DSN: "${PUBLIC_GLITCHTIP_DSN}"
# OpenTelemetry tracing
OTEL_EXPORTER_OTLP_ENDPOINT: "${OTEL_EXPORTER_OTLP_ENDPOINT}"
OTEL_SERVICE_NAME: "ui"
# OAuth2 providers
GOOGLE_CLIENT_ID: "${GOOGLE_CLIENT_ID}"
GOOGLE_CLIENT_SECRET: "${GOOGLE_CLIENT_SECRET}"
@@ -299,6 +305,19 @@ services:
timeout: 10s
retries: 5
# ─── Dozzle agent ────────────────────────────────────────────────────────────
# Exposes prod container logs to the Dozzle instance on the homelab.
# The homelab Dozzle connects here via DOZZLE_REMOTE_AGENT.
# Port 7007 is bound to localhost only — not reachable from the internet.
dozzle-agent:
image: amir20/dozzle:latest
restart: unless-stopped
command: agent
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
ports:
- "127.0.0.1:7007:7007"
# ─── CrowdSec bouncer registration ───────────────────────────────────────────
# One-shot: registers the Caddy bouncer with the CrowdSec LAPI and writes the
# generated API key to crowdsec/.crowdsec.env, which Caddy reads via env_file.
@@ -380,203 +399,6 @@ services:
WATCHTOWER_NOTIFICATION_URL: "${WATCHTOWER_NOTIFICATION_URL}"
DOCKER_API_VERSION: "1.44"
# ─── Shared PostgreSQL (Fider + GlitchTip + Umami) ───────────────────────────
# A single Postgres instance hosting three separate databases.
# PocketBase uses its own embedded SQLite; this postgres is only for the
# three new services below.
postgres:
image: postgres:16-alpine
restart: unless-stopped
environment:
POSTGRES_USER: "${POSTGRES_USER}"
POSTGRES_PASSWORD: "${POSTGRES_PASSWORD}"
POSTGRES_DB: postgres
expose:
- "5432"
volumes:
- postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "${POSTGRES_USER}"]
interval: 10s
timeout: 5s
retries: 5
# ─── Postgres database initialisation ────────────────────────────────────────
# One-shot: creates the fider, glitchtip, and umami databases if missing.
postgres-init:
image: postgres:16-alpine
depends_on:
postgres:
condition: service_healthy
environment:
PGPASSWORD: "${POSTGRES_PASSWORD}"
entrypoint: >
/bin/sh -c "
psql -h postgres -U ${POSTGRES_USER} -d postgres -tc \"SELECT 1 FROM pg_database WHERE datname='fider'\" | grep -q 1 ||
psql -h postgres -U ${POSTGRES_USER} -d postgres -c \"CREATE DATABASE fider\";
psql -h postgres -U ${POSTGRES_USER} -d postgres -tc \"SELECT 1 FROM pg_database WHERE datname='glitchtip'\" | grep -q 1 ||
psql -h postgres -U ${POSTGRES_USER} -d postgres -c \"CREATE DATABASE glitchtip\";
psql -h postgres -U ${POSTGRES_USER} -d postgres -tc \"SELECT 1 FROM pg_database WHERE datname='umami'\" | grep -q 1 ||
psql -h postgres -U ${POSTGRES_USER} -d postgres -c \"CREATE DATABASE umami\";
echo 'postgres-init: databases ready';
"
restart: "no"
# ─── Fider (user feedback & feature requests) ─────────────────────────────────
fider:
image: getfider/fider:stable
restart: unless-stopped
depends_on:
postgres-init:
condition: service_completed_successfully
postgres:
condition: service_healthy
expose:
- "3000"
environment:
BASE_URL: "${FIDER_BASE_URL}"
DATABASE_URL: "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/fider?sslmode=disable"
JWT_SECRET: "${FIDER_JWT_SECRET}"
# Email: Resend SMTP
EMAIL_NOREPLY: "noreply@libnovel.cc"
EMAIL_SMTP_HOST: "${FIDER_SMTP_HOST}"
EMAIL_SMTP_PORT: "${FIDER_SMTP_PORT}"
EMAIL_SMTP_USERNAME: "${FIDER_SMTP_USER}"
EMAIL_SMTP_PASSWORD: "${FIDER_SMTP_PASSWORD}"
EMAIL_SMTP_ENABLE_STARTTLS: "false"
# ─── GlitchTip DB migration (one-shot) ───────────────────────────────────────
glitchtip-migrate:
image: glitchtip/glitchtip:latest
depends_on:
postgres-init:
condition: service_completed_successfully
postgres:
condition: service_healthy
environment:
DATABASE_URL: "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/glitchtip"
SECRET_KEY: "${GLITCHTIP_SECRET_KEY}"
GLITCHTIP_DOMAIN: "${GLITCHTIP_DOMAIN}"
EMAIL_URL: "${GLITCHTIP_EMAIL_URL}"
DEFAULT_FROM_EMAIL: "noreply@libnovel.cc"
VALKEY_URL: "redis://valkey:6379/1"
command: "./manage.py migrate"
restart: "no"
# ─── GlitchTip web (error tracking UI + API) ─────────────────────────────────
glitchtip-web:
image: glitchtip/glitchtip:latest
restart: unless-stopped
depends_on:
glitchtip-migrate:
condition: service_completed_successfully
valkey:
condition: service_healthy
expose:
- "8000"
environment:
DATABASE_URL: "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/glitchtip"
SECRET_KEY: "${GLITCHTIP_SECRET_KEY}"
GLITCHTIP_DOMAIN: "${GLITCHTIP_DOMAIN}"
EMAIL_URL: "${GLITCHTIP_EMAIL_URL}"
DEFAULT_FROM_EMAIL: "noreply@libnovel.cc"
VALKEY_URL: "redis://valkey:6379/1"
PORT: "8000"
ENABLE_USER_REGISTRATION: "false"
healthcheck:
test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/0/')"]
interval: 15s
timeout: 5s
retries: 5
# ─── GlitchTip worker (background task processor) ─────────────────────────────
glitchtip-worker:
image: glitchtip/glitchtip:latest
restart: unless-stopped
depends_on:
glitchtip-migrate:
condition: service_completed_successfully
valkey:
condition: service_healthy
environment:
DATABASE_URL: "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/glitchtip"
SECRET_KEY: "${GLITCHTIP_SECRET_KEY}"
GLITCHTIP_DOMAIN: "${GLITCHTIP_DOMAIN}"
EMAIL_URL: "${GLITCHTIP_EMAIL_URL}"
DEFAULT_FROM_EMAIL: "noreply@libnovel.cc"
VALKEY_URL: "redis://valkey:6379/1"
SERVER_ROLE: "worker"
# ─── Umami (page analytics) ───────────────────────────────────────────────────
umami:
image: ghcr.io/umami-software/umami:postgresql-latest
restart: unless-stopped
depends_on:
postgres-init:
condition: service_completed_successfully
postgres:
condition: service_healthy
expose:
- "3000"
environment:
DATABASE_URL: "postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/umami"
APP_SECRET: "${UMAMI_APP_SECRET}"
healthcheck:
test: ["CMD", "curl", "-sf", "http://localhost:3000/api/heartbeat"]
interval: 15s
timeout: 5s
retries: 5
# ─── Dozzle (Docker log viewer) ───────────────────────────────────────────────
dozzle:
image: amir20/dozzle:latest
restart: unless-stopped
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./dozzle/users.yml:/data/users.yml:ro
expose:
- "8080"
environment:
DOZZLE_AUTH_PROVIDER: simple
DOZZLE_HOSTNAME: "logs.libnovel.cc"
healthcheck:
test: ["CMD", "/dozzle", "healthcheck"]
interval: 15s
timeout: 5s
retries: 5
# ─── Uptime Kuma (uptime monitoring) ──────────────────────────────────────────
uptime-kuma:
image: louislam/uptime-kuma:1
restart: unless-stopped
volumes:
- uptime_kuma_data:/app/data
expose:
- "3001"
healthcheck:
test: ["CMD", "extra/healthcheck"]
interval: 15s
timeout: 5s
retries: 5
# ─── Gotify (push notifications) ──────────────────────────────────────────────
gotify:
image: gotify/server:latest
restart: unless-stopped
volumes:
- gotify_data:/app/data
expose:
- "80"
environment:
GOTIFY_DEFAULTUSER_NAME: "${GOTIFY_ADMIN_USER}"
GOTIFY_DEFAULTUSER_PASS: "${GOTIFY_ADMIN_PASS}"
GOTIFY_SERVER_PORT: "80"
healthcheck:
test: ["CMD", "curl", "-sf", "http://localhost:80/health"]
interval: 15s
timeout: 5s
retries: 5
volumes:
minio_data:
pb_data:
@@ -586,6 +408,3 @@ volumes:
caddy_config:
caddy_logs:
crowdsec_data:
postgres_data:
uptime_kuma_data:
gotify_data:

453
homelab/docker-compose.yml Normal file
View File

@@ -0,0 +1,453 @@
# LibNovel homelab
#
# Runs on 192.168.0.109. Hosts:
# - libnovel runner (background task worker)
# - tooling: GlitchTip, Umami, Fider, Dozzle, Uptime Kuma, Gotify
# - observability: OTel Collector, Tempo, Loki, Prometheus, Grafana
# - cloudflared tunnel (public subdomains via Cloudflare Zero Trust)
# - shared Postgres for tooling DBs
#
# All secrets come from Doppler (project=libnovel, config=prd_homelab).
# Run with: doppler run -- docker compose up -d
#
# Public subdomains (via Cloudflare Tunnel — no ports exposed to internet):
# errors.libnovel.cc → glitchtip-web:8000
# analytics.libnovel.cc → umami:3000
# feedback.libnovel.cc → fider:3000
# logs.libnovel.cc → dozzle:8080
# uptime.libnovel.cc → uptime-kuma:3001
# push.libnovel.cc → gotify:80
# grafana.libnovel.cc → grafana:3000
services:
# ── Cloudflare Tunnel ───────────────────────────────────────────────────────
# Outbound-only encrypted tunnel to Cloudflare.
# Routes all public subdomains to their respective containers on this network.
# No inbound ports needed — cloudflared initiates all connections outward.
cloudflared:
image: cloudflare/cloudflared:latest
restart: unless-stopped
command: tunnel --no-autoupdate run --token ${CLOUDFLARE_TUNNEL_TOKEN}
environment:
CLOUDFLARE_TUNNEL_TOKEN: "${CLOUDFLARE_TUNNEL_TOKEN}"
# ── LibNovel Runner ─────────────────────────────────────────────────────────
# Background task worker. Connects to prod PocketBase, MinIO, Meilisearch
# via their public subdomains (pb.libnovel.cc, storage.libnovel.cc, etc.)
runner:
image: kalekber/libnovel-runner:latest
restart: unless-stopped
stop_grace_period: 135s
labels:
com.centurylinklabs.watchtower.enable: "true"
environment:
POCKETBASE_URL: "https://pb.libnovel.cc"
POCKETBASE_ADMIN_EMAIL: "${POCKETBASE_ADMIN_EMAIL}"
POCKETBASE_ADMIN_PASSWORD: "${POCKETBASE_ADMIN_PASSWORD}"
MINIO_ENDPOINT: "storage.libnovel.cc"
MINIO_ACCESS_KEY: "${MINIO_ROOT_USER}"
MINIO_SECRET_KEY: "${MINIO_ROOT_PASSWORD}"
MINIO_USE_SSL: "true"
MINIO_PUBLIC_ENDPOINT: "${MINIO_PUBLIC_ENDPOINT}"
MINIO_PUBLIC_USE_SSL: "${MINIO_PUBLIC_USE_SSL}"
MEILI_URL: "${MEILI_URL}"
MEILI_API_KEY: "${MEILI_API_KEY}"
VALKEY_ADDR: ""
GODEBUG: "preferIPv4=1"
KOKORO_URL: "http://kokoro-fastapi:8880"
KOKORO_VOICE: "${KOKORO_VOICE}"
RUNNER_WORKER_ID: "${RUNNER_WORKER_ID}"
RUNNER_POLL_INTERVAL: "${RUNNER_POLL_INTERVAL}"
RUNNER_MAX_CONCURRENT_SCRAPE: "${RUNNER_MAX_CONCURRENT_SCRAPE}"
RUNNER_MAX_CONCURRENT_AUDIO: "${RUNNER_MAX_CONCURRENT_AUDIO}"
RUNNER_TIMEOUT: "${RUNNER_TIMEOUT}"
RUNNER_METRICS_ADDR: "${RUNNER_METRICS_ADDR}"
RUNNER_SKIP_INITIAL_CATALOGUE_REFRESH: "true"
LOG_LEVEL: "${LOG_LEVEL}"
GLITCHTIP_DSN: "${GLITCHTIP_DSN}"
# OTel — send runner traces/metrics to the local collector (HTTP)
OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4318"
OTEL_SERVICE_NAME: "runner"
healthcheck:
test: ["CMD", "/healthcheck", "file", "/tmp/runner.alive", "120"]
interval: 60s
timeout: 5s
retries: 3
# ── Shared Postgres ─────────────────────────────────────────────────────────
# Hosts glitchtip, umami, and fider databases.
postgres:
image: postgres:16-alpine
restart: unless-stopped
environment:
POSTGRES_USER: "${POSTGRES_USER}"
POSTGRES_PASSWORD: "${POSTGRES_PASSWORD}"
POSTGRES_DB: postgres
expose:
- "5432"
volumes:
- postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "${POSTGRES_USER}"]
interval: 10s
timeout: 5s
retries: 5
# ── Postgres database initialisation ────────────────────────────────────────
postgres-init:
image: postgres:16-alpine
depends_on:
postgres:
condition: service_healthy
environment:
PGPASSWORD: "${POSTGRES_PASSWORD}"
entrypoint: >
/bin/sh -c "
psql -h postgres -U ${POSTGRES_USER} -d postgres -tc \"SELECT 1 FROM pg_database WHERE datname='fider'\" | grep -q 1 ||
psql -h postgres -U ${POSTGRES_USER} -d postgres -c \"CREATE DATABASE fider\";
psql -h postgres -U ${POSTGRES_USER} -d postgres -tc \"SELECT 1 FROM pg_database WHERE datname='glitchtip'\" | grep -q 1 ||
psql -h postgres -U ${POSTGRES_USER} -d postgres -c \"CREATE DATABASE glitchtip\";
psql -h postgres -U ${POSTGRES_USER} -d postgres -tc \"SELECT 1 FROM pg_database WHERE datname='umami'\" | grep -q 1 ||
psql -h postgres -U ${POSTGRES_USER} -d postgres -c \"CREATE DATABASE umami\";
echo 'postgres-init: databases ready';
"
restart: "no"
# ── GlitchTip DB migration ──────────────────────────────────────────────────
glitchtip-migrate:
image: glitchtip/glitchtip:latest
depends_on:
postgres-init:
condition: service_completed_successfully
postgres:
condition: service_healthy
environment:
DATABASE_URL: "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/glitchtip"
SECRET_KEY: "${GLITCHTIP_SECRET_KEY}"
GLITCHTIP_DOMAIN: "${GLITCHTIP_DOMAIN}"
EMAIL_URL: "${GLITCHTIP_EMAIL_URL}"
DEFAULT_FROM_EMAIL: "noreply@libnovel.cc"
VALKEY_URL: "redis://valkey:6379/1"
command: "./manage.py migrate"
restart: "no"
# ── GlitchTip web ───────────────────────────────────────────────────────────
glitchtip-web:
image: glitchtip/glitchtip:latest
restart: unless-stopped
depends_on:
glitchtip-migrate:
condition: service_completed_successfully
expose:
- "8000"
environment:
DATABASE_URL: "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/glitchtip"
SECRET_KEY: "${GLITCHTIP_SECRET_KEY}"
GLITCHTIP_DOMAIN: "${GLITCHTIP_DOMAIN}"
EMAIL_URL: "${GLITCHTIP_EMAIL_URL}"
DEFAULT_FROM_EMAIL: "noreply@libnovel.cc"
VALKEY_URL: "redis://valkey:6379/1"
PORT: "8000"
ENABLE_USER_REGISTRATION: "false"
healthcheck:
test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/0/')"]
interval: 15s
timeout: 5s
retries: 5
# ── GlitchTip worker ────────────────────────────────────────────────────────
glitchtip-worker:
image: glitchtip/glitchtip:latest
restart: unless-stopped
depends_on:
glitchtip-migrate:
condition: service_completed_successfully
environment:
DATABASE_URL: "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/glitchtip"
SECRET_KEY: "${GLITCHTIP_SECRET_KEY}"
GLITCHTIP_DOMAIN: "${GLITCHTIP_DOMAIN}"
EMAIL_URL: "${GLITCHTIP_EMAIL_URL}"
DEFAULT_FROM_EMAIL: "noreply@libnovel.cc"
VALKEY_URL: "redis://valkey:6379/1"
SERVER_ROLE: "worker"
# ── Umami ───────────────────────────────────────────────────────────────────
umami:
image: ghcr.io/umami-software/umami:postgresql-latest
restart: unless-stopped
depends_on:
postgres-init:
condition: service_completed_successfully
postgres:
condition: service_healthy
expose:
- "3000"
environment:
DATABASE_URL: "postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/umami"
APP_SECRET: "${UMAMI_APP_SECRET}"
healthcheck:
test: ["CMD", "curl", "-sf", "http://localhost:3000/api/heartbeat"]
interval: 15s
timeout: 5s
retries: 5
# ── Fider ───────────────────────────────────────────────────────────────────
fider:
image: getfider/fider:stable
restart: unless-stopped
depends_on:
postgres-init:
condition: service_completed_successfully
postgres:
condition: service_healthy
expose:
- "3000"
environment:
BASE_URL: "${FIDER_BASE_URL}"
DATABASE_URL: "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/fider?sslmode=disable"
JWT_SECRET: "${FIDER_JWT_SECRET}"
EMAIL_NOREPLY: "noreply@libnovel.cc"
EMAIL_SMTP_HOST: "${FIDER_SMTP_HOST}"
EMAIL_SMTP_PORT: "${FIDER_SMTP_PORT}"
EMAIL_SMTP_USERNAME: "${FIDER_SMTP_USER}"
EMAIL_SMTP_PASSWORD: "${FIDER_SMTP_PASSWORD}"
EMAIL_SMTP_ENABLE_STARTTLS: "false"
# ── Dozzle ──────────────────────────────────────────────────────────────────
# Watches both homelab and prod containers.
# Prod agent runs on 165.22.70.138:7007 (added separately to prod compose).
dozzle:
image: amir20/dozzle:latest
restart: unless-stopped
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./dozzle/users.yml:/data/users.yml:ro
expose:
- "8080"
environment:
DOZZLE_AUTH_PROVIDER: simple
DOZZLE_HOSTNAME: "logs.libnovel.cc"
DOZZLE_REMOTE_AGENT: "prod@165.22.70.138:7007"
healthcheck:
test: ["CMD", "/dozzle", "healthcheck"]
interval: 15s
timeout: 5s
retries: 5
# ── Uptime Kuma ─────────────────────────────────────────────────────────────
uptime-kuma:
image: louislam/uptime-kuma:1
restart: unless-stopped
volumes:
- uptime_kuma_data:/app/data
expose:
- "3001"
healthcheck:
test: ["CMD", "extra/healthcheck"]
interval: 15s
timeout: 5s
retries: 5
# ── Gotify ──────────────────────────────────────────────────────────────────
gotify:
image: gotify/server:latest
restart: unless-stopped
volumes:
- gotify_data:/app/data
expose:
- "80"
environment:
GOTIFY_DEFAULTUSER_NAME: "${GOTIFY_ADMIN_USER}"
GOTIFY_DEFAULTUSER_PASS: "${GOTIFY_ADMIN_PASS}"
GOTIFY_SERVER_PORT: "80"
healthcheck:
test: ["CMD", "curl", "-sf", "http://localhost:80/health"]
interval: 15s
timeout: 5s
retries: 5
# ── Valkey ──────────────────────────────────────────────────────────────────
# Used by GlitchTip for task queuing.
valkey:
image: valkey/valkey:7-alpine
restart: unless-stopped
expose:
- "6379"
volumes:
- valkey_data:/data
healthcheck:
test: ["CMD", "valkey-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
# ── OTel Collector ──────────────────────────────────────────────────────────
# Receives OTLP from backend/ui/runner, fans out to Tempo + Prometheus + Loki.
otel-collector:
image: otel/opentelemetry-collector-contrib:latest
restart: unless-stopped
volumes:
- ./otel/collector.yaml:/etc/otelcol-contrib/config.yaml:ro
expose:
- "4317" # OTLP gRPC
- "4318" # OTLP HTTP
- "8888" # Collector self-metrics (scraped by Prometheus)
depends_on:
- tempo
- prometheus
- loki
# No healthcheck — distroless image has no shell or curl
# ── Tempo ───────────────────────────────────────────────────────────────────
# Distributed trace storage. Receives OTLP from the collector.
tempo:
image: grafana/tempo:2.6.1
restart: unless-stopped
command: ["-config.file=/etc/tempo.yaml"]
volumes:
- ./otel/tempo.yaml:/etc/tempo.yaml:ro
- tempo_data:/var/tempo
expose:
- "3200" # Tempo query API (queried by Grafana)
- "4317" # OTLP gRPC ingest (collector → tempo)
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost:3200/ready"]
interval: 15s
timeout: 5s
retries: 5
# ── Prometheus ──────────────────────────────────────────────────────────────
# Scrapes metrics from backend (via prod), runner, and otel-collector.
prometheus:
image: prom/prometheus:latest
restart: unless-stopped
command:
- "--config.file=/etc/prometheus/prometheus.yaml"
- "--storage.tsdb.path=/prometheus"
- "--storage.tsdb.retention.time=30d"
- "--web.enable-remote-write-receiver"
volumes:
- ./otel/prometheus.yaml:/etc/prometheus/prometheus.yaml:ro
- prometheus_data:/prometheus
expose:
- "9090"
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost:9090/-/healthy"]
interval: 15s
timeout: 5s
retries: 5
# ── Loki ────────────────────────────────────────────────────────────────────
# Log aggregation. Receives logs from OTel collector. Replaces manual Dozzle
# tailing for structured log search.
loki:
image: grafana/loki:latest
restart: unless-stopped
command: ["-config.file=/etc/loki/loki.yaml"]
volumes:
- ./otel/loki.yaml:/etc/loki/loki.yaml:ro
- loki_data:/loki
expose:
- "3100"
# No healthcheck — distroless image has no shell or curl
# ── Grafana ─────────────────────────────────────────────────────────────────
# Single UI for traces (Tempo), metrics (Prometheus), and logs (Loki).
# Accessible at grafana.libnovel.cc via Cloudflare Tunnel.
grafana:
image: grafana/grafana:latest
restart: unless-stopped
depends_on:
- tempo
- prometheus
- loki
expose:
- "3000"
volumes:
- grafana_data:/var/lib/grafana
- ./otel/grafana/provisioning:/etc/grafana/provisioning:ro
environment:
GF_SERVER_ROOT_URL: "https://grafana.libnovel.cc"
GF_SECURITY_ADMIN_USER: "${GRAFANA_ADMIN_USER}"
GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_ADMIN_PASSWORD}"
GF_AUTH_ANONYMOUS_ENABLED: "false"
GF_FEATURE_TOGGLES_ENABLE: "traceqlEditor"
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost:3000/api/health"]
interval: 15s
timeout: 5s
retries: 5
# ── Kokoro-FastAPI (GPU TTS) ────────────────────────────────────────────────
# OpenAI-compatible TTS service backed by the Kokoro model, running on the
# homelab RTX 3050 (8 GB VRAM). Replaces the broken kokoro.kalekber.cc DNS.
# Voices match existing IDs: af_bella, af_sky, af_heart, etc.
# The runner reaches it at http://kokoro-fastapi:8880 via the Docker network.
kokoro-fastapi:
image: ghcr.io/remsky/kokoro-fastapi-gpu:latest
restart: unless-stopped
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
expose:
- "8880"
healthcheck:
test: ["CMD", "curl", "-sf", "http://localhost:8880/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s
# ── pocket-tts (CPU TTS) ────────────────────────────────────────────────────
# Lightweight CPU-only TTS using kyutai-labs/pocket-tts.
# OpenAI-compatible: POST /v1/audio/speech on port 8000.
# Voices: alba, marius, javert, jean, fantine, cosette, eponine, azelma.
# Not currently used by the runner (runner uses kokoro-fastapi), but available
# for experimentation / fallback.
pocket-tts:
image: ghcr.io/kyutai-labs/pocket-tts:latest
restart: unless-stopped
expose:
- "8000"
healthcheck:
test: ["CMD", "curl", "-sf", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s
# ── Watchtower ──────────────────────────────────────────────────────────────
# Auto-updates runner image when CI pushes a new tag.
# Only watches services with the watchtower label.
watchtower:
image: containrrr/watchtower:latest
restart: unless-stopped
volumes:
- /var/run/docker.sock:/var/run/docker.sock
command: --label-enable --interval 300 --cleanup
environment:
WATCHTOWER_NOTIFICATIONS: "${WATCHTOWER_NOTIFICATIONS}"
WATCHTOWER_NOTIFICATION_URL: "${WATCHTOWER_NOTIFICATION_URL}"
DOCKER_API_VERSION: "1.44"
volumes:
postgres_data:
valkey_data:
uptime_kuma_data:
gotify_data:
tempo_data:
prometheus_data:
loki_data:
grafana_data:

5
homelab/dozzle/users.yml Normal file
View File

@@ -0,0 +1,5 @@
users:
admin:
name: admin
email: admin@libnovel.cc
password: "$2y$10$4jqLza2grpxnQn0EGux2C.UmlSxRmOvH/J1ySzOBxMZgW6cA2TnmK"

View File

@@ -0,0 +1,68 @@
# OTel Collector config
#
# Receivers: OTLP (gRPC + HTTP) from backend, ui, runner
# Processors: batch for efficiency, resource detection for host metadata
# Exporters: Tempo (traces), Prometheus (metrics), Loki (logs)
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 5s
send_batch_size: 512
# Attach host metadata to all telemetry
resourcedetection:
detectors: [env, system]
timeout: 5s
exporters:
# Traces → Tempo
otlp/tempo:
endpoint: tempo:4317
tls:
insecure: true
# Metrics → Prometheus (remote write)
prometheusremotewrite:
endpoint: "http://prometheus:9090/api/v1/write"
tls:
insecure_skip_verify: true
# Logs → Loki (via OTLP HTTP endpoint)
otlphttp/loki:
endpoint: "http://loki:3100/otlp"
tls:
insecure: true
# Collector self-observability (optional debug)
debug:
verbosity: basic
extensions:
health_check:
endpoint: 0.0.0.0:13133
pprof:
endpoint: 0.0.0.0:1777
service:
extensions: [health_check, pprof]
pipelines:
traces:
receivers: [otlp]
processors: [resourcedetection, batch]
exporters: [otlp/tempo]
metrics:
receivers: [otlp]
processors: [resourcedetection, batch]
exporters: [prometheusremotewrite]
logs:
receivers: [otlp]
processors: [resourcedetection, batch]
exporters: [otlphttp/loki]

View File

@@ -0,0 +1,13 @@
# Grafana dashboard provisioning
# Points Grafana at the local dashboards directory.
# Drop any .json dashboard file into homelab/otel/grafana/provisioning/dashboards/
# and it will appear in Grafana automatically on restart.
apiVersion: 1
providers:
- name: libnovel
folder: LibNovel
type: file
options:
path: /etc/grafana/provisioning/dashboards

View File

@@ -0,0 +1,53 @@
# Grafana datasource provisioning
# Auto-configures Tempo, Prometheus, and Loki on first start.
# No manual setup needed in the UI.
apiVersion: 1
datasources:
- name: Tempo
type: tempo
uid: tempo
url: http://tempo:3200
access: proxy
isDefault: false
jsonData:
httpMethod: GET
serviceMap:
datasourceUid: prometheus
nodeGraph:
enabled: true
traceQuery:
timeShiftEnabled: true
spanStartTimeShift: "1h"
spanEndTimeShift: "-1h"
spanBar:
type: "Tag"
tag: "http.url"
lokiSearch:
datasourceUid: loki
- name: Prometheus
type: prometheus
uid: prometheus
url: http://prometheus:9090
access: proxy
isDefault: true
jsonData:
httpMethod: POST
exemplarTraceIdDestinations:
- name: traceID
datasourceUid: tempo
- name: Loki
type: loki
uid: loki
url: http://loki:3100
access: proxy
isDefault: false
jsonData:
derivedFields:
- datasourceUid: tempo
matcherRegex: '"traceID":"(\w+)"'
name: TraceID
url: "$${__value.raw}"

38
homelab/otel/loki.yaml Normal file
View File

@@ -0,0 +1,38 @@
# Loki config — minimal single-node setup
# Receives logs from OTel Collector. 30-day retention.
auth_enabled: false
server:
http_listen_port: 3100
grpc_listen_port: 9096
common:
instance_addr: 127.0.0.1
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
schema_config:
configs:
- from: 2024-01-01
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
limits_config:
retention_period: 720h # 30 days
compactor:
working_directory: /loki/compactor
delete_request_store: filesystem
retention_enabled: true

View File

@@ -0,0 +1,22 @@
# Prometheus config
# Scrapes OTel collector self-metrics and runner metrics endpoint.
# Backend metrics come in via OTel remote-write — no direct scrape needed.
global:
scrape_interval: 15s
evaluation_interval: 15s
external_labels:
environment: production
scrape_configs:
# OTel Collector self-metrics
- job_name: otel-collector
static_configs:
- targets: ["otel-collector:8888"]
# Runner JSON metrics endpoint (native format, no Prometheus client yet)
# Will be replaced by OTLP once runner is instrumented with OTel SDK.
- job_name: libnovel-runner
metrics_path: /metrics
static_configs:
- targets: ["runner:9091"]

45
homelab/otel/tempo.yaml Normal file
View File

@@ -0,0 +1,45 @@
# Tempo config — minimal single-node setup
# Stores traces locally. Grafana queries via the HTTP API on port 3200.
server:
http_listen_port: 3200
distributor:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
ingester:
trace_idle_period: 10s
max_block_bytes: 104857600 # 100MB
max_block_duration: 30m
compactor:
compaction:
block_retention: 720h # 30 days
storage:
trace:
backend: local
local:
path: /var/tempo/blocks
wal:
path: /var/tempo/wal
metrics_generator:
registry:
external_labels:
source: tempo
storage:
path: /var/tempo/generator/wal
remote_write:
- url: http://prometheus:9090/api/v1/write
send_exemplars: true
overrides:
defaults:
metrics_generator:
processors: [service-graphs, span-metrics]
generate_native_histograms: both

1184
ui/package-lock.json generated

File diff suppressed because it is too large Load Diff

View File

@@ -30,6 +30,11 @@
"dependencies": {
"@aws-sdk/client-s3": "^3.1005.0",
"@aws-sdk/s3-request-presigner": "^3.1005.0",
"@opentelemetry/exporter-logs-otlp-http": "^0.214.0",
"@opentelemetry/exporter-trace-otlp-http": "^0.214.0",
"@opentelemetry/resources": "^2.6.1",
"@opentelemetry/sdk-node": "^0.214.0",
"@opentelemetry/semantic-conventions": "^1.40.0",
"@sentry/sveltekit": "^10.45.0",
"cropperjs": "^1.6.2",
"ioredis": "^5.3.2",

View File

@@ -7,6 +7,33 @@ import { env as pubEnv } from '$env/dynamic/public';
import { log } from '$lib/server/logger';
import { createUserSession, touchUserSession, isSessionRevoked } from '$lib/server/pocketbase';
import { drain as drainPresignCache } from '$lib/server/presignCache';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-http';
import { BatchLogRecordProcessor } from '@opentelemetry/sdk-logs';
import { resourceFromAttributes } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
// ─── OpenTelemetry server-side tracing + logs ─────────────────────────────────
// No-op when OTEL_EXPORTER_OTLP_ENDPOINT is unset (e.g. local dev).
const otlpEndpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT;
if (otlpEndpoint) {
const sdk = new NodeSDK({
resource: resourceFromAttributes({
[ATTR_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME ?? 'ui',
[ATTR_SERVICE_VERSION]: pubEnv.PUBLIC_BUILD_VERSION ?? 'dev'
}),
traceExporter: new OTLPTraceExporter({ url: `${otlpEndpoint}/v1/traces` }),
logRecordProcessors: [
new BatchLogRecordProcessor(
new OTLPLogExporter({ url: `${otlpEndpoint}/v1/logs` })
)
]
});
sdk.start();
process.once('SIGTERM', () => sdk.shutdown().catch(() => {}));
process.once('SIGINT', () => sdk.shutdown().catch(() => {}));
}
// ─── Sentry / GlitchTip server-side error tracking ────────────────────────────
// No-op when PUBLIC_GLITCHTIP_DSN is unset (e.g. local dev).