Compare commits

...

4 Commits

Author SHA1 Message Date
Admin
29d0eeb7e8 docs: add architecture diagrams (D2 + Mermaid)
All checks were successful
CI / Scraper / Test (pull_request) Successful in 10s
CI / UI / Build (pull_request) Successful in 28s
CI / UI / Docker Push (pull_request) Has been skipped
CI / Scraper / Lint (pull_request) Successful in 1m5s
CI / Scraper / Docker Push (pull_request) Has been skipped
iOS CI / Build (pull_request) Successful in 8m7s
iOS CI / Test (pull_request) Successful in 16m14s
Adds docs/architecture.d2 and docs/architecture.mermaid.md showing the
docker-compose-new.yml service topology — storage, application, init
containers, and external dependencies with annotated connections.

Also includes the rendered docs/architecture.svg (D2 output).

View live: d2 --watch docs/architecture.d2
View in Gitea: navigate to docs/architecture.mermaid.md in the web UI.
2026-03-21 20:35:03 +05:00
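
For reference, the committed SVG can be regenerated from the D2 source with the d2 CLI (a usage sketch; input and output paths follow the repo layout named above):

```sh
# Re-render the diagram after editing the D2 source.
d2 docs/architecture.d2 docs/architecture.svg
```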
Admin
fabe9724c2 fix(scraper): add Brotli decompression to HTTP client
Some checks failed
CI / Scraper / Lint (pull_request) Failing after 21s
CI / Scraper / Test (pull_request) Failing after 21s
CI / UI / Build (pull_request) Failing after 21s
CI / Scraper / Docker Push (pull_request) Has been skipped
CI / UI / Docker Push (pull_request) Has been skipped
iOS CI / Build (pull_request) Successful in 3m53s
iOS CI / Test (pull_request) Successful in 6m32s
novelfire.net responds with Content-Encoding: br when the scraper
advertises 'gzip, deflate, br'. The client only handled gzip, so
Brotli-compressed bytes were fed raw into the HTML parser, producing
garbage: empty titles, zero chapters, and selector failures.

Added github.com/andybalholm/brotli and wired it into GetContent
alongside the existing gzip path.
2026-03-20 11:19:28 +05:00
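
The server-side behavior is easy to confirm from a shell (a sketch; assumes curl is installed and novelfire.net still serves Brotli when it is offered):

```sh
# Advertise the same encodings the scraper sends and inspect the response headers.
curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip, deflate, br' 'https://novelfire.net/' \
  | grep -i '^content-encoding'
# "content-encoding: br" means the body bytes are Brotli-compressed and must be
# decompressed before they reach the HTML parser.
```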
Admin
4c9bb4adde feat: add pb-init-v2.sh for v2 stack; wire into docker-compose-new.yml
All checks were successful
CI / Scraper / Lint (pull_request) Successful in 12s
CI / Scraper / Test (pull_request) Successful in 16s
CI / UI / Build (pull_request) Successful in 16s
CI / Scraper / Docker Push (pull_request) Has been skipped
CI / UI / Docker Push (pull_request) Has been skipped
iOS CI / Build (pull_request) Successful in 5m20s
iOS CI / Test (pull_request) Successful in 7m4s
Minimal PocketBase bootstrap for the v2 stack (backend + runner + ui-v2).
Creates only the 6 collections actually used by v2:
  books, chapters_idx, ranking, progress, scraping_tasks, audio_jobs

Drops v1-only collections (app_users, user_settings, audio_cache,
book_comments, comment_votes, user_library, user_sessions,
user_subscriptions) and unused fields (date_label, user_id/audio_time
on progress). heartbeat_at is included in create_collection from the
start and also covered by ensure_field for existing instances.

docker-compose-new.yml pb-init service now mounts pb-init-v2.sh.
2026-03-15 21:58:02 +05:00
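
A quick smoke test after the bootstrap runs (a sketch; uses PocketBase's collections list endpoint, and $TOKEN is a superuser token obtained the same way the script does below):

```sh
# List collection names; the six v2 collections should all be present.
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://pocketbase:8090/api/collections?perPage=200" |
  python3 -c 'import sys, json; print(sorted(c["name"] for c in json.load(sys.stdin)["items"]))'
# Expect audio_jobs, books, chapters_idx, progress, ranking and scraping_tasks
# in the output (PocketBase system collections may appear alongside them).
```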
Admin
22b6ee824e fix(pb-init): use python3 for JSON parsing in ensure_field; add heartbeat_at fields
All checks were successful
CI / Scraper / Test (pull_request) Successful in 14s
CI / UI / Build (pull_request) Successful in 17s
CI / UI / Docker Push (pull_request) Has been skipped
CI / Scraper / Lint (pull_request) Successful in 21s
CI / Scraper / Docker Push (pull_request) Has been skipped
iOS CI / Build (pull_request) Successful in 8m22s
iOS CI / Test (pull_request) Successful in 12m35s
The sed-based collection id and fields extraction was greedy and broke on
collections with multiple fields (grabbed the last field id instead of the
top-level collection id → PATCH to wrong URL → 404).

Rewrite ensure_field to use python3 for reliable JSON parsing. Also add the
missing heartbeat_at (date) field to scraping_tasks and audio_jobs, which was
never applied on the initial deploy because the bug prevented the PATCH.
2026-03-15 21:53:54 +05:00
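
The sed failure mode reproduces in isolation (illustrative JSON; col_123, f1 and f2 are made-up ids):

```sh
# Greedy .* anchors to the LAST "id" key, so a field id shadows the
# top-level collection id; python3 indexes the document unambiguously.
SCHEMA='{"id":"col_123","fields":[{"id":"f1","name":"slug"},{"id":"f2","name":"title"}]}'
echo "$SCHEMA" | sed 's/.*"id":"\([^"]*\)".*/\1/'    # -> f2 (wrong: a field id)
echo "$SCHEMA" | python3 -c 'import sys, json; print(json.load(sys.stdin)["id"])'    # -> col_123
```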
9 changed files with 552 additions and 12 deletions

docker-compose-new.yml

@@ -65,7 +65,7 @@ services:
       POCKETBASE_ADMIN_EMAIL: "${POCKETBASE_ADMIN_EMAIL:-admin@libnovel.local}"
       POCKETBASE_ADMIN_PASSWORD: "${POCKETBASE_ADMIN_PASSWORD:-changeme123}"
     volumes:
-      - ./scripts/pb-init.sh:/pb-init.sh:ro
+      - ./scripts/pb-init-v2.sh:/pb-init.sh:ro
    entrypoint: ["sh", "/pb-init.sh"]

# ─── Backend API ──────────────────────────────────────────────────────────────
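
On an already-deployed stack, the one-shot init service can be re-run by itself to pick up the new script (a sketch; pb-init is the service name implied by the excerpt above):

```sh
# Recreate and run only the bootstrap container against the running stack.
docker compose -f docker-compose-new.yml run --rm pb-init
```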

docs/architecture.d2 Normal file (+99 lines)

@@ -0,0 +1,99 @@
direction: right
# ─── External ─────────────────────────────────────────────────────────────────
novelfire: novelfire.net {
shape: cloud
style.fill: "#f0f4ff"
}
kokoro: Kokoro-FastAPI TTS {
shape: cloud
style.fill: "#f0f4ff"
}
browser: Browser / iOS App {
shape: person
style.fill: "#fff9e6"
}
# ─── Init containers (one-shot) ───────────────────────────────────────────────
init: Init containers {
style.fill: "#f5f5f5"
style.stroke-dash: 4
minio-init: minio-init {
shape: rectangle
label: "minio-init\n(mc: create buckets)"
}
pb-init: pb-init {
shape: rectangle
label: "pb-init\n(bootstrap collections)"
}
}
# ─── Storage ──────────────────────────────────────────────────────────────────
storage: Storage {
style.fill: "#eaf7ea"
minio: MinIO {
shape: cylinder
label: "MinIO :9000\n\nbuckets:\n libnovel-chapters\n libnovel-audio\n libnovel-avatars\n libnovel-browse"
}
pocketbase: PocketBase {
shape: cylinder
label: "PocketBase :8090\n\ncollections:\n books chapters_idx\n audio_cache progress\n scrape_jobs app_users\n ranking"
}
}
# ─── Application ──────────────────────────────────────────────────────────────
app: Application {
style.fill: "#eef3ff"
backend: backend {
shape: rectangle
label: "Backend API :8080\n(Go — HTTP API server)"
}
runner: runner {
shape: rectangle
label: "Runner\n(Go — background worker\nscraping + TTS jobs)"
}
ui: ui {
shape: rectangle
label: "SvelteKit UI :5252\n(adapter-node)"
}
}
# ─── Init → Storage deps ──────────────────────────────────────────────────────
init.minio-init -> storage.minio: create buckets {style.stroke-dash: 4}
init.pb-init -> storage.pocketbase: bootstrap schema {style.stroke-dash: 4}
# ─── App → Storage ────────────────────────────────────────────────────────────
app.backend -> storage.minio: blobs (chapters, audio,\navatars, browse)
app.backend -> storage.pocketbase: structured records\n(books, progress, jobs…)
app.runner -> storage.minio: write chapter markdown\n& audio MP3s
app.runner -> storage.pocketbase: read/update scrape jobs\nwrite book records
# ─── App internal ─────────────────────────────────────────────────────────────
app.ui -> app.backend: REST API calls\n(server-side)
# ─── External → App ───────────────────────────────────────────────────────────
app.runner -> novelfire: scrape\n(HTTP GET)
app.runner -> kokoro: TTS generation\n(HTTP POST)
# ─── Browser ──────────────────────────────────────────────────────────────────
browser -> app.ui: HTTPS :5252
browser -> storage.minio: presigned URLs\n(audio / chapter downloads)

docs/architecture.mermaid.md Normal file (+47 lines)

@@ -0,0 +1,47 @@
```mermaid
graph LR
%% ── External ──────────────────────────────────────────────────────────
NF([novelfire.net])
KK([Kokoro-FastAPI TTS])
CL([Browser / iOS App])
%% ── Init containers ───────────────────────────────────────────────────
subgraph INIT["Init containers (one-shot)"]
MI[minio-init\nmc: create buckets]
PI[pb-init\nbootstrap collections]
end
%% ── Storage ───────────────────────────────────────────────────────────
subgraph STORAGE["Storage"]
MN[(MinIO :9000\nchapters · audio\navatars · browse)]
PB[(PocketBase :8090\nbooks · chapters_idx\nranking · progress\nscraping_tasks · audio_jobs)]
end
%% ── Application ───────────────────────────────────────────────────────
subgraph APP["Application"]
BE[Backend API :8080\nGo HTTP server]
RN[Runner\nGo background worker]
UI[SvelteKit UI :5252]
end
%% ── Init → Storage ────────────────────────────────────────────────────
MI -.->|create buckets| MN
PI -.->|bootstrap schema| PB
%% ── App → Storage ─────────────────────────────────────────────────────
BE -->|blobs| MN
BE -->|structured records| PB
RN -->|chapter markdown & audio| MN
RN -->|read/update jobs & books| PB
%% ── App internal ──────────────────────────────────────────────────────
UI -->|REST API| BE
%% ── Runner → External ─────────────────────────────────────────────────
RN -->|scrape HTTP GET| NF
RN -->|TTS HTTP POST| KK
%% ── Client ────────────────────────────────────────────────────────────
CL -->|HTTPS :5252| UI
CL -->|presigned URLs| MN
```

docs/architecture.svg Normal file (+119 lines, 43 KiB)

File diff suppressed because one or more lines are too long

go.mod

@@ -10,6 +10,7 @@ require (
require (
	github.com/BurntSushi/toml v1.4.1-0.20240526193622-a339e1f7089c // indirect
+	github.com/andybalholm/brotli v1.2.0 // indirect
	github.com/davecgh/go-spew v1.1.1 // indirect
	github.com/dustin/go-humanize v1.0.1 // indirect
	github.com/go-ini/ini v1.67.0 // indirect

go.sum

@@ -1,5 +1,7 @@
github.com/BurntSushi/toml v1.4.1-0.20240526193622-a339e1f7089c h1:pxW6RcqyfI9/kWtOwnv/G+AzdKuy2ZrqINhenH4HyNs=
github.com/BurntSushi/toml v1.4.1-0.20240526193622-a339e1f7089c/go.mod h1:ukJfTF/6rtPPRCnwkur4qwRxa8vTRFBF0uk2lLoLwho=
+github.com/andybalholm/brotli v1.2.0 h1:ukwgCxwYrmACq68yiUqwIWnGY0cTPox/M94sVwToPjQ=
+github.com/andybalholm/brotli v1.2.0/go.mod h1:rzTDkvFWvIrjDXZHkuS16NPggd91W3kUSvPlQ1pLaKY=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=

scraper HTTP client (Go)

@@ -10,6 +10,8 @@ import (
"os"
"strings"
"time"
"github.com/andybalholm/brotli"
)
type httpClient struct {
@@ -106,16 +108,17 @@ func (c *httpClient) GetContent(ctx context.Context, req ContentRequest) (string
// net/http decompresses gzip automatically only when it sets the header
// itself; since we set Accept-Encoding explicitly we must do it ourselves.
body := resp.Body
if strings.EqualFold(resp.Header.Get("Content-Encoding"), "gzip") {
switch strings.ToLower(resp.Header.Get("Content-Encoding")) {
case "gzip":
gr, gzErr := gzip.NewReader(resp.Body)
if gzErr != nil {
return "", fmt.Errorf("http: gzip reader: %w", gzErr)
}
defer gr.Close()
body = gr
case "br":
body = io.NopCloser(brotli.NewReader(resp.Body))
}
// br (Brotli) decompression requires an external package; skip for now —
// the server will fall back to gzip or plain text for unknown encodings.
raw, err := io.ReadAll(body)
if err != nil {

scripts/pb-init-v2.sh Executable file (+257 lines)

@@ -0,0 +1,257 @@
#!/bin/sh
# pb-init-v2.sh — idempotent PocketBase collection bootstrap for the v2 stack
#
# Creates all collections required by libnovel v2 (backend + runner + ui-v2).
# Safe to re-run: POST returns 400/422 when a collection already exists; both
# are treated as success. The ensure_field helper adds fields to existing
# instances without touching fields that are already present.
#
# Collections created:
# books — book metadata
# chapters_idx — per-chapter index (title, number)
# ranking — novelfire ranking snapshots
# progress — per-session reading progress
# scraping_tasks — scrape job queue (runner ↔ backend)
# audio_jobs — TTS job queue (runner ↔ backend)
#
# Required env vars (with defaults matching docker-compose-new.yml):
# POCKETBASE_URL http://pocketbase:8090
# POCKETBASE_ADMIN_EMAIL admin@libnovel.local
# POCKETBASE_ADMIN_PASSWORD changeme123
set -e
PB_URL="${POCKETBASE_URL:-http://pocketbase:8090}"
PB_EMAIL="${POCKETBASE_ADMIN_EMAIL:-admin@libnovel.local}"
PB_PASSWORD="${POCKETBASE_ADMIN_PASSWORD:-changeme123}"
log() { echo "[pb-init-v2] $*"; }
# ─── 0. Ensure curl and python3 are available ────────────────────────────────
if ! command -v curl > /dev/null 2>&1; then
  apk add --no-cache curl > /dev/null 2>&1
fi
if ! command -v python3 > /dev/null 2>&1; then
  apk add --no-cache python3 > /dev/null 2>&1
fi

# ─── 1. Wait for PocketBase to be ready ──────────────────────────────────────
log "waiting for PocketBase at $PB_URL ..."
until curl -sf "$PB_URL/api/health" > /dev/null 2>&1; do
  sleep 2
done
log "PocketBase is up"
# ─── 2. Ensure the superuser exists ──────────────────────────────────────────
#
# On a fresh install PocketBase v0.23+ exposes a one-time install token in the
# /_/ redirect Location header. Use it to create the superuser if needed; on
# subsequent runs the token is gone and we fall through to normal auth.
log "ensuring superuser $PB_EMAIL exists ..."
LOCATION=$(curl -sf -o /dev/null -w "%{redirect_url}" "$PB_URL/_/" 2>/dev/null || true)
if echo "$LOCATION" | grep -q "pbinstal/"; then
INSTALL_TOKEN=$(echo "$LOCATION" | sed 's|.*pbinstal/||' | tr -d ' \r\n')
log "install token found — creating superuser via install endpoint"
curl -sf -X POST "$PB_URL/api/collections/_superusers/records" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $INSTALL_TOKEN" \
-d "{\"email\":\"$PB_EMAIL\",\"password\":\"$PB_PASSWORD\",\"passwordConfirm\":\"$PB_PASSWORD\"}" \
> /dev/null 2>&1 || true
log "superuser create attempted (may already exist)"
fi
# ─── 3. Authenticate and obtain a superuser token ────────────────────────────
log "authenticating as $PB_EMAIL ..."
AUTH_RESPONSE=$(curl -sf -X POST "$PB_URL/api/collections/_superusers/auth-with-password" \
  -H "Content-Type: application/json" \
  -d "{\"identity\":\"$PB_EMAIL\",\"password\":\"$PB_PASSWORD\"}")
TOKEN=$(echo "$AUTH_RESPONSE" | sed 's/.*"token":"\([^"]*\)".*/\1/')
if [ -z "$TOKEN" ] || [ "$TOKEN" = "$AUTH_RESPONSE" ]; then
  log "ERROR: failed to obtain auth token. Response: $AUTH_RESPONSE"
  exit 1
fi
log "auth token obtained"
# ─── 4. Helpers ──────────────────────────────────────────────────────────────
# create_collection NAME JSON_BODY
# POSTs to /api/collections. 400/422 = already exists → treated as success.
create_collection() {
  NAME="$1"
  BODY="$2"
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST "$PB_URL/api/collections" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    -d "$BODY")
  case "$STATUS" in
    200|201) log "created collection: $NAME" ;;
    400|422) log "collection already exists (skipped): $NAME" ;;
    *)       log "WARNING: unexpected status $STATUS for collection: $NAME" ;;
  esac
}
# ensure_field COLLECTION FIELD_NAME FIELD_TYPE
#
# Uses python3 to parse the collection schema, then PATCHes the full fields
# array with the new field appended — only if it is not already present.
# python3 is required to correctly extract the top-level collection id from
# the JSON response (sed-based extraction is unreliable on multi-field schemas
# because the greedy pattern picks up a field id instead of the collection id).
ensure_field() {
  COLL="$1"
  FIELD_NAME="$2"
  FIELD_TYPE="$3"
  SCHEMA=$(curl -sf \
    -H "Authorization: Bearer $TOKEN" \
    "$PB_URL/api/collections/$COLL" 2>/dev/null)
  PARSED=$(echo "$SCHEMA" | python3 -c "
import sys, json
try:
    d = json.load(sys.stdin)
    fields = d.get('fields', [])
    exists = any(f.get('name') == '$FIELD_NAME' for f in fields)
    print('exists=' + str(exists))
    print('id=' + d.get('id', ''))
    if not exists:
        fields.append({'name': '$FIELD_NAME', 'type': '$FIELD_TYPE'})
        print('fields=' + json.dumps(fields))
except Exception as e:
    print('error=' + str(e))
" 2>/dev/null)
  if echo "$PARSED" | grep -q "^exists=True"; then
    log "field $COLL.$FIELD_NAME already exists — skipping"
    return
  fi
  COLLECTION_ID=$(echo "$PARSED" | grep "^id=" | sed 's/^id=//')
  if [ -z "$COLLECTION_ID" ]; then
    log "WARNING: could not get id for collection $COLL — skipping ensure_field"
    return
  fi
  NEW_FIELDS=$(echo "$PARSED" | grep "^fields=" | sed 's/^fields=//')
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -X PATCH "$PB_URL/api/collections/$COLLECTION_ID" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    -d "{\"fields\":${NEW_FIELDS}}")
  case "$STATUS" in
    200|201) log "patched $COLL — added field: $FIELD_NAME ($FIELD_TYPE)" ;;
    *)       log "WARNING: patch returned $STATUS when adding $FIELD_NAME to $COLL" ;;
  esac
}
# ─── 5. Collections ───────────────────────────────────────────────────────────
# books — one record per scraped novel
create_collection "books" '{
"name": "books",
"type": "base",
"fields": [
{"name": "slug", "type": "text", "required": true},
{"name": "title", "type": "text", "required": true},
{"name": "author", "type": "text"},
{"name": "cover", "type": "text"},
{"name": "status", "type": "text"},
{"name": "genres", "type": "json"},
{"name": "summary", "type": "text"},
{"name": "total_chapters", "type": "number"},
{"name": "source_url", "type": "text"},
{"name": "ranking", "type": "number"}
]
}'
# chapters_idx — lightweight chapter list (no content; content lives in MinIO)
create_collection "chapters_idx" '{
"name": "chapters_idx",
"type": "base",
"fields": [
{"name": "slug", "type": "text", "required": true},
{"name": "number", "type": "number", "required": true},
{"name": "title", "type": "text"}
]
}'
# ranking — periodic novelfire ranking snapshots
create_collection "ranking" '{
"name": "ranking",
"type": "base",
"fields": [
{"name": "rank", "type": "number", "required": true},
{"name": "slug", "type": "text", "required": true},
{"name": "title", "type": "text"},
{"name": "author", "type": "text"},
{"name": "cover", "type": "text"},
{"name": "status", "type": "text"},
{"name": "genres", "type": "json"},
{"name": "source_url", "type": "text"}
]
}'
# progress — per-session reading progress (no user accounts required)
create_collection "progress" '{
"name": "progress",
"type": "base",
"fields": [
{"name": "session_id", "type": "text", "required": true},
{"name": "slug", "type": "text", "required": true},
{"name": "chapter", "type": "number"}
]
}'
# scraping_tasks — scrape job queue consumed by the runner
create_collection "scraping_tasks" '{
"name": "scraping_tasks",
"type": "base",
"fields": [
{"name": "kind", "type": "text"},
{"name": "target_url", "type": "text"},
{"name": "from_chapter", "type": "number"},
{"name": "to_chapter", "type": "number"},
{"name": "worker_id", "type": "text"},
{"name": "status", "type": "text", "required": true},
{"name": "books_found", "type": "number"},
{"name": "chapters_scraped", "type": "number"},
{"name": "chapters_skipped", "type": "number"},
{"name": "errors", "type": "number"},
{"name": "error_message", "type": "text"},
{"name": "started", "type": "date"},
{"name": "finished", "type": "date"},
{"name": "heartbeat_at", "type": "date"}
]
}'
# audio_jobs — TTS generation queue consumed by the runner
create_collection "audio_jobs" '{
"name": "audio_jobs",
"type": "base",
"fields": [
{"name": "cache_key", "type": "text", "required": true},
{"name": "slug", "type": "text", "required": true},
{"name": "chapter", "type": "number", "required": true},
{"name": "voice", "type": "text"},
{"name": "worker_id", "type": "text"},
{"name": "status", "type": "text", "required": true},
{"name": "error_message", "type": "text"},
{"name": "started", "type": "date"},
{"name": "finished", "type": "date"},
{"name": "heartbeat_at", "type": "date"}
]
}'
# ─── 6. Schema migrations (idempotent — safe to re-run on existing instances) ─
#
# heartbeat_at was added after the initial v2 deploy. ensure_field is a no-op
# if the field already exists (e.g. fresh installs that ran this script from
# the start already have it from the create_collection call above).
ensure_field "scraping_tasks" "heartbeat_at" "date"
ensure_field "audio_jobs" "heartbeat_at" "date"
log "all collections ready"

scripts/pb-init.sh

@@ -98,22 +98,34 @@ ensure_field() {
-H "Authorization: Bearer $TOKEN" \
"$PB_URL/api/collections/$COLL" 2>/dev/null)
# Check if the field already exists (look for "name":"<FIELD_NAME>" in the fields array)
if echo "$SCHEMA" | grep -q "\"name\":\"$FIELD_NAME\""; then
# Use python3 to reliably parse the JSON schema.
PARSED=$(echo "$SCHEMA" | python3 -c "
import sys, json
try:
d = json.load(sys.stdin)
fields = d.get('fields', [])
exists = any(f.get('name') == '$FIELD_NAME' for f in fields)
print('exists=' + str(exists))
print('id=' + d.get('id', ''))
if not exists:
fields.append({'name': '$FIELD_NAME', 'type': '$FIELD_TYPE'})
print('fields=' + json.dumps(fields))
except Exception as e:
print('error=' + str(e))
" 2>/dev/null)
if echo "$PARSED" | grep -q "^exists=True"; then
log "field $COLL.$FIELD_NAME already exists — skipping"
return
fi
COLLECTION_ID=$(echo "$SCHEMA" | sed 's/.*"id":"\([^"]*\)".*/\1/')
if [ -z "$COLLECTION_ID" ] || [ "$COLLECTION_ID" = "$SCHEMA" ]; then
COLLECTION_ID=$(echo "$PARSED" | grep "^id=" | sed 's/^id=//')
if [ -z "$COLLECTION_ID" ]; then
log "WARNING: could not get id for collection $COLL — skipping ensure_field"
return
fi
# Extract current fields array and append the new field before the closing bracket.
CURRENT_FIELDS=$(echo "$SCHEMA" | sed 's/.*"fields":\(\[.*\]\).*/\1/')
TRIMMED=$(echo "$CURRENT_FIELDS" | sed 's/]$//')
NEW_FIELDS="${TRIMMED},{\"name\":\"${FIELD_NAME}\",\"type\":\"${FIELD_TYPE}\"}]"
NEW_FIELDS=$(echo "$PARSED" | grep "^fields=" | sed 's/^fields=//')
PATCH_BODY="{\"fields\":${NEW_FIELDS}}"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" \