Compare commits


8 Commits

Author SHA1 Message Date
root
8662aed565 feat: PDF single-chapter import, EPUB numbering fix, admin chapter split tool
- parsePDF: return all text as single 'Full Text' chapter (admin splits manually)
- parseEPUB: fix chapter numbering to use sequential counter not spine index
- Remove dead code: chaptersFromBookmarks, cleanChapterText, extractChaptersFromText, chapterHeadingRE; drop pdfcpu alias and regexp imports
- Backend: POST /api/admin/books/:slug/split-chapters endpoint — splits text on '---' dividers, optional '## Title' headers, writes chapters via WriteChapter
- UI: admin panel is now shown for all admin users regardless of source_url; chapter split tool appears when a book has a single 'Full Text' chapter and pre-fills from MinIO content
2026-04-09 23:59:24 +05:00
root
cdfa1ac5b2 fix(pdf): fix page ordering, Win-1252 quotes, and chapter header cleanup
Three fixes to PDF chapter extraction quality:

1. Page ordering: parse page number from pdfcpu filename (out_Content_page_N.txt)
   instead of using lexicographic sort index — fixes chapters bleeding into each
   other (e.g. Prologue text appearing inside Chapter 1).

2. Windows-1252 chars: map bytes 0x91-0x9F to proper Unicode (curly quotes U+2018/
   U+2019/U+201C/U+201D, em-dash U+2014, etc.) instead of raw Latin-1 control
   bytes that rendered as ◆ in the browser.

3. Chapter header cleanup: skip the first page of each bookmark range (decorative
   title art page) and strip any run-on title fragment at the start of the first
   body page (e.g. 'for New Journeys!I stood atop...' → 'I stood atop...'). The
   remaining sentence truncation is a fundamental limitation of this PDF's
   PUA-encoded body font (C2_1/Literata) — those glyphs cannot be decoded without
   the publisher's private ToUnicode mapping.
2026-04-09 23:43:09 +05:00
root
ffcdf5ee10 fix(pdf): replace dslipak/pdf with pdfcpu bookmark+content-stream extraction
Use PDF outline (bookmarks) for chapter titles and page ranges, then
extract text from per-page content streams via pdfcpu ExtractContent.
This avoids the indefinite hang caused by dslipak/pdf trying to resolve
custom font ToUnicode CMaps on publisher PDFs.

- parsePDF: decrypt → ExtractContent → Bookmarks → chaptersFromBookmarks
- chaptersFromBookmarks: flatten bookmark tree, skip front/back matter
  (Cover, Insert, Title Page, Copyright, Appendix), assign page ranges
- extractTextFromContentStream: handle TJ arrays (concat literal strings,
  skip hex glyph arrays and kerning numbers) + single Tj strings
- Falls back to paragraph-splitting when no bookmarks present
- Build verified; test PDF produces 9 chapters with proper titles
2026-04-09 22:36:58 +05:00
root
899c504d1f feat(import): move PDF parsing to backend; fix heartbeat/reap for import_tasks
- parsePDF function restored in import.go (body was orphaned outside function)
- ParseImportFile() called at upload time with 3-min timeout; chapters stored as JSON in MinIO
- runner.go: prefer ChaptersKey path (read pre-parsed JSON) over BookImport.Import()
- ImportChapterStore interface added; store wired in runner/main.go
- HeartbeatTask and ReapStaleTasks now include import_tasks collection
- parseImportTask now returns ChaptersKey in domain.ImportTask
- asynq_runner.go handleImportTask passes ChaptersKey
- pb-init-v3.sh: chapters_key field added to import_tasks schema
2026-04-09 21:19:43 +05:00
root
d82aa9d4b4 fix(import): decrypt owner-encrypted PDFs with pdfcpu; add imports bucket to minio-init
- parsePDF now attempts to strip encryption via pdfcpu (empty user password)
  before handing bytes to dslipak/pdf — fixes '256-bit encryption key' error
  on publisher PDFs that use owner-only encryption (copy/print restrictions)
- Add pdfcpu v0.11.1 as direct dependency (was already indirect)
- docker-compose.yml minio-init: add 'imports' and 'translations' buckets
  so a fresh deploy creates all required buckets
2026-04-09 20:08:12 +05:00
root
ae08382b81 fix(import): wire ImportFileStore to bypass Asynq type assertion; add pb-init collections
- Add ImportFileStore interface to bookstore package
- Add ImportFileStore field to backend.Dependencies
- Wire ImportFileStore: store in cmd/backend/main.go
- handlers_import.go: use s.deps.ImportFileStore.PutImportFile instead of
  broken s.deps.Producer.(*storage.Store) type assertion (fails when Asynq active)
- pb-init-v3.sh: add import_tasks and notifications collection definitions
2026-04-09 19:05:11 +05:00
root
b9f8008c2c chore: embed git credentials in remote URL; update AGENTS.md
2026-04-09 17:09:53 +05:00
root
d25cee3d8c fix(ci): track generated admin_nav_notifications.js to avoid CDN-dependent paraglide failure
2026-04-09 17:03:30 +05:00
20 changed files with 940 additions and 199 deletions

View File

@@ -47,11 +47,21 @@ Sub-directories have their own `AGENTS.md` with deeper context (e.g. `ios/AGENTS
- `release.yaml` — runs on `v*` tags (build Docker images, upload source maps, create Gitea release)
- Secrets: `DOCKER_USER`, `DOCKER_TOKEN`, `GITEA_TOKEN`, `GLITCHTIP_AUTH_TOKEN`
### Git credentials
Credentials are embedded in the remote URL — no `HOME=/root` or credential helper needed for push:
```
https://kamil:95782641Apple%24@gitea.kalekber.cc/kamil/libnovel.git
```
All git commands still use `HOME=/root` prefix for consistency (picks up `/root/.gitconfig` for user name/email), but push auth works without it.
### Releasing a new version
```bash
git tag v2.5.X -m "Short title\n\nOptional longer body"
git push origin v2.5.X
HOME=/root git tag v2.6.X -m "Short title"
HOME=/root git push origin v3-cleanup --tags
```
CI will build all Docker images, upload source maps to GlitchTip, and create a Gitea release automatically.

View File

@@ -189,6 +189,7 @@ func run() error {
ChapterImageStore: store,
Producer: producer,
TaskReader: store,
ImportFileStore: store,
SearchIndex: searchIndex,
Kokoro: kokoroClient,
PocketTTS: pocketTTSClient,

View File

@@ -192,21 +192,22 @@ func run() error {
deps := runner.Dependencies{
Consumer: consumer,
BookWriter: store,
BookReader: store,
AudioStore: store,
CoverStore: store,
TranslationStore: store,
BookImport: storage.NewBookImporter(store),
ChapterIngester: store,
SearchIndex: searchIndex,
Novel: novel,
Kokoro: kokoroClient,
PocketTTS: pocketTTSClient,
CFAI: cfaiClient,
LibreTranslate: ltClient,
Notifier: store,
Log: log,
BookWriter: store,
BookReader: store,
AudioStore: store,
CoverStore: store,
TranslationStore: store,
BookImport: storage.NewBookImporter(store),
ImportChapterStore: store,
ChapterIngester: store,
SearchIndex: searchIndex,
Novel: novel,
Kokoro: kokoroClient,
PocketTTS: pocketTTSClient,
CFAI: cfaiClient,
LibreTranslate: ltClient,
Notifier: store,
Log: log,
}
r := runner.New(rCfg, deps)

View File

@@ -3,7 +3,23 @@ module github.com/libnovel/backend
go 1.26.1
require (
github.com/getsentry/sentry-go v0.43.0
github.com/hibiken/asynq v0.26.0
github.com/hibiken/asynq/x v0.0.0-20260203063626-d704b68a426d
github.com/meilisearch/meilisearch-go v0.36.1
github.com/minio/minio-go/v7 v7.0.98
github.com/pdfcpu/pdfcpu v0.11.1
github.com/prometheus/client_golang v1.23.2
github.com/redis/go-redis/v9 v9.18.0
github.com/yuin/goldmark v1.8.2
go.opentelemetry.io/contrib/bridges/otelslog v0.17.0
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0
go.opentelemetry.io/otel v1.42.0
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.18.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.42.0
go.opentelemetry.io/otel/log v0.18.0
go.opentelemetry.io/otel/sdk v1.42.0
go.opentelemetry.io/otel/sdk/log v0.18.0
golang.org/x/net v0.51.0
)
@@ -13,12 +29,9 @@ require (
github.com/cenkalti/backoff/v5 v5.0.3 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/clipperhouse/uax29/v2 v2.2.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
github.com/dslipak/pdf v0.0.2 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/getsentry/sentry-go v0.43.0 // indirect
github.com/go-ini/ini v1.67.0 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
@@ -28,40 +41,25 @@ require (
github.com/hhrutter/lzw v1.0.0 // indirect
github.com/hhrutter/pkcs7 v0.2.0 // indirect
github.com/hhrutter/tiff v1.0.2 // indirect
github.com/hibiken/asynq v0.26.0 // indirect
github.com/hibiken/asynq/x v0.0.0-20260203063626-d704b68a426d // indirect
github.com/klauspost/compress v1.18.2 // indirect
github.com/klauspost/cpuid/v2 v2.2.11 // indirect
github.com/klauspost/crc32 v1.3.0 // indirect
github.com/mattn/go-runewidth v0.0.19 // indirect
github.com/meilisearch/meilisearch-go v0.36.1 // indirect
github.com/minio/crc64nvme v1.1.1 // indirect
github.com/minio/md5-simd v1.1.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/philhofer/fwd v1.2.0 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/client_golang v1.23.2 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.66.1 // indirect
github.com/prometheus/procfs v0.16.1 // indirect
github.com/redis/go-redis/v9 v9.18.0 // indirect
github.com/robfig/cron/v3 v3.0.1 // indirect
github.com/rs/xid v1.6.0 // indirect
github.com/spf13/cast v1.10.0 // indirect
github.com/tinylib/msgp v1.6.1 // indirect
github.com/yuin/goldmark v1.8.2 // indirect
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
go.opentelemetry.io/contrib/bridges/otelslog v0.17.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.67.0 // indirect
go.opentelemetry.io/otel v1.42.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.18.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.42.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.42.0 // indirect
go.opentelemetry.io/otel/log v0.18.0 // indirect
go.opentelemetry.io/otel/metric v1.42.0 // indirect
go.opentelemetry.io/otel/sdk v1.42.0 // indirect
go.opentelemetry.io/otel/sdk/log v0.18.0 // indirect
go.opentelemetry.io/otel/trace v1.42.0 // indirect
go.opentelemetry.io/proto/otlp v1.9.0 // indirect
go.uber.org/atomic v1.11.0 // indirect
@@ -77,5 +75,4 @@ require (
google.golang.org/grpc v1.79.2 // indirect
google.golang.org/protobuf v1.36.11 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)

View File

@@ -2,6 +2,10 @@ github.com/andybalholm/brotli v1.1.1 h1:PR2pgnyFznKEugtsUo0xLdDop5SKXd5Qf5ysW+7X
github.com/andybalholm/brotli v1.1.1/go.mod h1:05ib4cKhjx3OQYUY22hTVd34Bc8upXjOLL2rKwwZBoA=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/bsm/ginkgo/v2 v2.12.0 h1:Ny8MWAHyOepLGlLKYmXG4IEkioBysk6GpaRTLC8zwWs=
github.com/bsm/ginkgo/v2 v2.12.0/go.mod h1:SwYbGRRDovPVboqFv0tPTcG1sN61LM1Z4ARdbAV9g4c=
github.com/bsm/gomega v1.27.10 h1:yeMWxP2pV2fG3FgAODIY8EiRE3dy0aeFYt4l7wh6yKA=
github.com/bsm/gomega v1.27.10/go.mod h1:JyEr/xRbxbtgWNi8tIEVPUYZ5Dzef52k01W3YH0H+O0=
github.com/cenkalti/backoff/v5 v5.0.3 h1:ZN+IMa753KfX5hd8vVaMixjnqRZ3y8CuJKRKj1xcsSM=
github.com/cenkalti/backoff/v5 v5.0.3/go.mod h1:rkhZdG3JZukswDf7f0cwqPNk4K0sa+F97BxZthm/crw=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
@@ -12,14 +16,16 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78=
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc=
github.com/dslipak/pdf v0.0.2 h1:djAvcM5neg9Ush+zR6QXB+VMJzR6TdnX766HPIg1JmI=
github.com/dslipak/pdf v0.0.2/go.mod h1:2L3SnkI9cQwnAS9gfPz2iUoLC0rUZwbucpbKi5R1mUo=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U=
github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHkI4W8=
github.com/frankban/quicktest v1.14.6/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0=
github.com/getsentry/sentry-go v0.43.0 h1:XbXLpFicpo8HmBDaInk7dum18G9KSLcjZiyUKS+hLW4=
github.com/getsentry/sentry-go v0.43.0/go.mod h1:XDotiNZbgf5U8bPDUAfvcFmOnMQQceESxyKaObSssW0=
github.com/go-errors/errors v1.4.2 h1:J6MZopCL4uSllY1OfXM374weqZFFItUbrImctkmUxIA=
github.com/go-errors/errors v1.4.2/go.mod h1:sIVyrIiJhuEF+Pj9Ebtd6P/rEYROXFi3BopGUQ5a5Og=
github.com/go-ini/ini v1.67.0 h1:z6ZrTEZqSWOTyH2FlglNbNgARyHG8oLW9gMELqKr06A=
github.com/go-ini/ini v1.67.0/go.mod h1:ByCAeIL28uOIIG0E3PJtZPDL8WnHpFKFOtgjp+3Ies8=
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
@@ -29,6 +35,10 @@ github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang/protobuf v1.5.4 h1:i7eJL8qZTpSEXOPTxNKhASYpMn+8e5Q6AdndVa1dWek=
github.com/golang/protobuf v1.5.4/go.mod h1:lnTiLA8Wa4RWRcIUkrtSVa5nRhsEGBg48fD6rSs7xps=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/grpc-ecosystem/grpc-gateway/v2 v2.28.0 h1:HWRh5R2+9EifMyIHV7ZV+MIZqgz+PMpZ14Jynv3O2Zs=
@@ -50,6 +60,12 @@ github.com/klauspost/cpuid/v2 v2.2.11 h1:0OwqZRYI2rFrjS4kvkDnqJkKHdHaRnCm68/DY4O
github.com/klauspost/cpuid/v2 v2.2.11/go.mod h1:hqwkgyIinND0mEev00jJYCxPNVRVXFQeu1XKlok6oO0=
github.com/klauspost/crc32 v1.3.0 h1:sSmTt3gUt81RP655XGZPElI0PelVTZ6YwCRnPSupoFM=
github.com/klauspost/crc32 v1.3.0/go.mod h1:D7kQaZhnkX/Y0tstFGf8VUzv2UofNGqCjnC3zdHB0Hw=
github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/mattn/go-runewidth v0.0.19 h1:v++JhqYnZuu5jSKrk9RbgF5v4CGUjqRfBm05byFGLdw=
github.com/mattn/go-runewidth v0.0.19/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/meilisearch/meilisearch-go v0.36.1 h1:mJTCJE5g7tRvaqKco6DfqOuJEjX+rRltDEnkEC02Y0M=
@@ -66,42 +82,40 @@ github.com/pdfcpu/pdfcpu v0.11.1 h1:htHBSkGH5jMKWC6e0sihBFbcKZ8vG1M67c8/dJxhjas=
github.com/pdfcpu/pdfcpu v0.11.1/go.mod h1:pP3aGga7pRvwFWAm9WwFvo+V68DfANi9kxSQYioNYcw=
github.com/philhofer/fwd v1.2.0 h1:e6DnBTl7vGY+Gz322/ASL4Gyp1FspeMvx1RNDoToZuM=
github.com/philhofer/fwd v1.2.0/go.mod h1:RqIHx9QI14HlwKwm98g9Re5prTQ6LdeRQn+gXJFxsJM=
github.com/pingcap/errors v0.11.4 h1:lFuQV/oaUMGcD2tqt+01ROSmJs75VG1ToEOkZIZ4nE4=
github.com/pingcap/errors v0.11.4/go.mod h1:Oi8TUi2kEtXXLMJk9l1cGmz20kV3TaQ0usTwv5KuLY8=
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/prometheus/client_golang v1.20.5 h1:cxppBPuYhUnsO6yo/aoRol4L7q7UFfdm+bR9r+8l63Y=
github.com/prometheus/client_golang v1.20.5/go.mod h1:PIEt8X02hGcP8JWbeHyeZ53Y/jReSnHgO035n//V5WE=
github.com/prometheus/client_golang v1.23.2 h1:Je96obch5RDVy3FDMndoUsjAhG5Edi49h0RJWRi/o0o=
github.com/prometheus/client_golang v1.23.2/go.mod h1:Tb1a6LWHB3/SPIzCoaDXI4I8UHKeFTEQ1YCr+0Gyqmg=
github.com/prometheus/client_model v0.6.1 h1:ZKSh/rekM+n3CeS952MLRAdFwIKqeY8b62p8ais2e9E=
github.com/prometheus/client_model v0.6.1/go.mod h1:OrxVMOVHjw3lKMa8+x6HeMGkHMQyHDk9E3jmP2AmGiY=
github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk=
github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE=
github.com/prometheus/common v0.55.0 h1:KEi6DK7lXW/m7Ig5i47x0vRzuBsHuvJdi5ee6Y3G1dc=
github.com/prometheus/common v0.55.0/go.mod h1:2SECS4xJG1kd8XF9IcM1gMX6510RAEL65zxzNImwdc8=
github.com/prometheus/common v0.66.1 h1:h5E0h5/Y8niHc5DlaLlWLArTQI7tMrsfQjHV+d9ZoGs=
github.com/prometheus/common v0.66.1/go.mod h1:gcaUsgf3KfRSwHY4dIMXLPV0K/Wg1oZ8+SbZk/HH/dA=
github.com/prometheus/procfs v0.15.1 h1:YagwOFzUgYfKKHX6Dr+sHT7km/hxC76UB0learggepc=
github.com/prometheus/procfs v0.15.1/go.mod h1:fB45yRUv8NstnjriLhBQLuOUt+WW4BsoGhij/e3PBqk=
github.com/prometheus/procfs v0.16.1 h1:hZ15bTNuirocR6u0JZ6BAHHmwS1p8B4P6MRqxtzMyRg=
github.com/prometheus/procfs v0.16.1/go.mod h1:teAbpZRB1iIAJYREa1LsoWUXykVXA1KlTmWl8x/U+Is=
github.com/redis/go-redis/v9 v9.18.0 h1:pMkxYPkEbMPwRdenAzUNyFNrDgHx9U+DrBabWNfSRQs=
github.com/redis/go-redis/v9 v9.18.0/go.mod h1:k3ufPphLU5YXwNTUcCRXGxUoF1fqxnhFQmscfkCoDA0=
github.com/robfig/cron/v3 v3.0.1 h1:WdRxkvbJztn8LMz/QEvLN5sBU+xKpSqwwUO1Pjr4qDs=
github.com/robfig/cron/v3 v3.0.1/go.mod h1:eQICP3HwyT7UooqI/z+Ov+PtYAWygg1TEWWzGIFLtro=
github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ=
github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc=
github.com/rs/xid v1.6.0 h1:fV591PaemRlL6JfRxGDEPl69wICngIQ3shQtzfy2gxU=
github.com/rs/xid v1.6.0/go.mod h1:7XoLgs4eV+QndskICGsho+ADou8ySMSjJKDIan90Nz0=
github.com/spf13/cast v1.10.0 h1:h2x0u2shc1QuLHfxi+cTJvs30+ZAHOGRic8uyGTDWxY=
github.com/spf13/cast v1.10.0/go.mod h1:jNfB8QC9IA6ZuY2ZjDp0KtFO2LZZlg4S/7bzP6qqeHo=
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
github.com/tinylib/msgp v1.6.1 h1:ESRv8eL3u+DNHUoSAAQRE50Hm162zqAnBoGv9PzScPY=
github.com/tinylib/msgp v1.6.1/go.mod h1:RSp0LW9oSxFut3KzESt5Voq4GVWyS+PSulT77roAqEA=
github.com/xyproto/randomstring v1.0.5 h1:YtlWPoRdgMu3NZtP45drfy1GKoojuR7hmRcnhZqKjWU=
github.com/xyproto/randomstring v1.0.5/go.mod h1:rgmS5DeNXLivK7YprL0pY+lTuhNQW3iGxZ18UQApw/E=
github.com/yuin/goldmark v1.8.2 h1:kEGpgqJXdgbkhcOgBxkC0X0PmoPG1ZyoZ117rDVp4zE=
github.com/yuin/goldmark v1.8.2/go.mod h1:ip/1k0VRfGynBgxOz0yCqHrbZXhcjxyuS66Brc7iBKg=
github.com/zeebo/xxh3 v1.0.2 h1:xZmwmqxHZA8AI603jOQ0tMqmBr9lPeFwGg6d+xy9DC0=
github.com/zeebo/xxh3 v1.0.2/go.mod h1:5NWz9Sef7zIDm2JHfFlcQvNekmcEl9ekUZQQKCYaDcA=
go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ64=
go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y=
go.opentelemetry.io/contrib/bridges/otelslog v0.17.0 h1:NFIS6x7wyObQ7cR84x7bt1sr8nYBx89s3x3GwRjw40k=
@@ -124,12 +138,18 @@ go.opentelemetry.io/otel/sdk v1.42.0 h1:LyC8+jqk6UJwdrI/8VydAq/hvkFKNHZVIWuslJXY
go.opentelemetry.io/otel/sdk v1.42.0/go.mod h1:rGHCAxd9DAph0joO4W6OPwxjNTYWghRWmkHuGbayMts=
go.opentelemetry.io/otel/sdk/log v0.18.0 h1:n8OyZr7t7otkeTnPTbDNom6rW16TBYGtvyy2Gk6buQw=
go.opentelemetry.io/otel/sdk/log v0.18.0/go.mod h1:C0+wxkTwKpOCZLrlJ3pewPiiQwpzycPI/u6W0Z9fuYk=
go.opentelemetry.io/otel/sdk/log/logtest v0.18.0 h1:l3mYuPsuBx6UKE47BVcPrZoZ0q/KER57vbj2qkgDLXA=
go.opentelemetry.io/otel/sdk/log/logtest v0.18.0/go.mod h1:7cHtiVJpZebB3wybTa4NG+FUo5NPe3PROz1FqB0+qdw=
go.opentelemetry.io/otel/sdk/metric v1.42.0 h1:D/1QR46Clz6ajyZ3G8SgNlTJKBdGp84q9RKCAZ3YGuA=
go.opentelemetry.io/otel/sdk/metric v1.42.0/go.mod h1:Ua6AAlDKdZ7tdvaQKfSmnFTdHx37+J4ba8MwVCYM5hc=
go.opentelemetry.io/otel/trace v1.42.0 h1:OUCgIPt+mzOnaUTpOQcBiM/PLQ/Op7oq6g4LenLmOYY=
go.opentelemetry.io/otel/trace v1.42.0/go.mod h1:f3K9S+IFqnumBkKhRJMeaZeNk9epyhnCmQh/EysQCdc=
go.opentelemetry.io/proto/otlp v1.9.0 h1:l706jCMITVouPOqEnii2fIAuO3IVGBRPV5ICjceRb/A=
go.opentelemetry.io/proto/otlp v1.9.0/go.mod h1:xE+Cx5E/eEHw+ISFkwPLwCZefwVjY+pqKg1qcK03+/4=
go.uber.org/atomic v1.11.0 h1:ZvwS0R+56ePWxUNi+Atn9dWONBPp/AUETXlHW0DxSjE=
go.uber.org/atomic v1.11.0/go.mod h1:LUxbIzbOniOlMKjJjyPfpl4v+PKK2cNJn91OQbhoJI0=
go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
go.yaml.in/yaml/v2 v2.4.2 h1:DzmwEr2rDGHl7lsFgAHxmNz/1NlQ7xLIrlN2h5d1eGI=
go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU=
go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc=
@@ -146,6 +166,8 @@ golang.org/x/text v0.34.0 h1:oL/Qq0Kdaqxa1KbNeMKwQq0reLCCaFtqu2eNuSeNHbk=
golang.org/x/text v0.34.0/go.mod h1:homfLqTYRFyVYemLBFl5GgL/DWEiH5wcsQ5gSh1yziA=
golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
gonum.org/v1/gonum v0.16.0 h1:5+ul4Swaf3ESvrOnidPp4GZbzf0mxVQpDCYUQE7OJfk=
gonum.org/v1/gonum v0.16.0/go.mod h1:fef3am4MQ93R2HHpKnLk4/Tbh/s0+wqD5nfa6Pnwy4E=
google.golang.org/genproto/googleapis/api v0.0.0-20260209200024-4cfbd4190f57 h1:JLQynH/LBHfCTSbDWl+py8C+Rg/k1OVH3xfcaiANuF0=
google.golang.org/genproto/googleapis/api v0.0.0-20260209200024-4cfbd4190f57/go.mod h1:kSJwQxqmFXeo79zOmbrALdflXQeAYcUbgS7PbpMknCY=
google.golang.org/genproto/googleapis/rpc v0.0.0-20260209200024-4cfbd4190f57 h1:mWPCjDEyshlQYzBpMNHaEof6UX1PmHcaUODUywQ0uac=
@@ -154,9 +176,9 @@ google.golang.org/grpc v1.79.2 h1:fRMD94s2tITpyJGtBBn7MkMseNpOZU8ZxgC3MMBaXRU=
google.golang.org/grpc v1.79.2/go.mod h1:KmT0Kjez+0dde/v2j9vzwoAScgEPx/Bw1CYChhHLrHQ=
google.golang.org/protobuf v1.36.11 h1:fV6ZwhNocDyBLK0dj+fg8ektcVegBBuEolpbTQyBNVE=
google.golang.org/protobuf v1.36.11/go.mod h1:HTf+CrKn2C3g5S8VImy6tdcUvCska2kB7j23XfzDpco=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=

View File

@@ -96,11 +96,12 @@ func (p *Producer) CreateImportTask(ctx context.Context, task domain.ImportTask)
}
payload := ImportPayload{
PBTaskID: id,
Slug: task.Slug,
Title: task.Title,
FileType: task.FileType,
ObjectKey: task.ObjectKey,
PBTaskID: id,
Slug: task.Slug,
Title: task.Title,
FileType: task.FileType,
ObjectKey: task.ObjectKey,
ChaptersKey: task.ChaptersKey,
}
if err := p.enqueue(ctx, TypeImportBook, payload); err != nil {
// Non-fatal: PB record exists; runner will pick it up on next poll.

View File

@@ -48,9 +48,10 @@ type ScrapePayload struct {
// ImportPayload is the Asynq job payload for PDF/EPUB import tasks.
type ImportPayload struct {
PBTaskID string `json:"pb_task_id"`
Slug string `json:"slug"`
Title string `json:"title"`
FileType string `json:"file_type"` // "pdf" or "epub"
ObjectKey string `json:"object_key"` // MinIO path to uploaded file
PBTaskID string `json:"pb_task_id"`
Slug string `json:"slug"`
Title string `json:"title"`
FileType string `json:"file_type"` // "pdf" or "epub"
ObjectKey string `json:"object_key"` // MinIO path to uploaded file
ChaptersKey string `json:"chapters_key"` // MinIO path to pre-parsed chapters JSON
}

View File

@@ -1,6 +1,7 @@
package backend
import (
"context"
"encoding/json"
"fmt"
"io"
@@ -45,6 +46,8 @@ func (s *Server) handleAdminImport(w http.ResponseWriter, r *http.Request) {
ct := r.Header.Get("Content-Type")
var req importRequest
var objectKey string
var chaptersKey string
var chapterCount int
if strings.HasPrefix(ct, "multipart/form-data") {
if err := r.ParseMultipartForm(32 << 20); err != nil {
@@ -96,17 +99,38 @@ func (s *Server) handleAdminImport(w http.ResponseWriter, r *http.Request) {
return
}
// Upload to MinIO for actual import
// Parse PDF/EPUB on the backend (with timeout) and store chapters as JSON.
// The runner only needs to ingest pre-parsed chapters — no PDF parsing on the runner.
parseCtx, parseCancel := context.WithTimeout(r.Context(), 3*time.Minute)
defer parseCancel()
chapters, parseErr := storage.ParseImportFile(parseCtx, data, req.FileType)
if parseErr != nil || len(chapters) == 0 {
jsonError(w, http.StatusUnprocessableEntity, "could not parse file: "+func() string {
if parseErr != nil { return parseErr.Error() }
return "no chapters found"
}())
return
}
// Store raw file in MinIO (for reference/re-import).
objectKey = fmt.Sprintf("imports/%d_%s", time.Now().Unix(), header.Filename)
store, ok := s.deps.Producer.(*storage.Store)
if !ok {
if s.deps.ImportFileStore == nil {
jsonError(w, http.StatusInternalServerError, "storage not available")
return
}
if err := store.PutImportFile(r.Context(), objectKey, data); err != nil {
if err := s.deps.ImportFileStore.PutImportFile(r.Context(), objectKey, data); err != nil {
jsonError(w, http.StatusInternalServerError, "upload file: "+err.Error())
return
}
// Store pre-parsed chapters JSON in MinIO so runner can ingest without re-parsing.
chaptersJSON, _ := json.Marshal(chapters)
chaptersKey = fmt.Sprintf("imports/%d_%s_chapters.json", time.Now().Unix(), strings.TrimSuffix(header.Filename, filepath.Ext(header.Filename)))
if err := s.deps.ImportFileStore.PutImportChapters(r.Context(), chaptersKey, chaptersJSON); err != nil {
jsonError(w, http.StatusInternalServerError, "store chapters: "+err.Error())
return
}
chapterCount = len(chapters)
} else {
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
jsonError(w, http.StatusBadRequest, "parse body: "+err.Error())
@@ -142,6 +166,8 @@ func (s *Server) handleAdminImport(w http.ResponseWriter, r *http.Request) {
BookStatus: req.BookStatus,
FileType: req.FileType,
ObjectKey: objectKey,
ChaptersKey: chaptersKey,
ChaptersTotal: chapterCount,
InitiatorUserID: "",
})
if err != nil {
@@ -150,8 +176,9 @@ func (s *Server) handleAdminImport(w http.ResponseWriter, r *http.Request) {
}
writeJSON(w, 0, importResponse{
TaskID: taskID,
Slug: slug,
TaskID: taskID,
Slug: slug,
Preview: &importPreview{Chapters: chapterCount},
})
}
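
For orientation, a minimal sketch of the chapters blob this handler writes to MinIO. It assumes `bookstore.Chapter` carries no json struct tags (none appear in this diff), so `encoding/json` falls back to the exported field names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Chapter mirrors the bookstore.Chapter fields used in this changeset
// (Number, Title, Content); the real type lives in internal/bookstore.
type Chapter struct {
	Number  int
	Title   string
	Content string
}

func main() {
	chapters := []Chapter{{Number: 1, Title: "Full Text", Content: "I stood atop..."}}
	blob, _ := json.Marshal(chapters)
	fmt.Println(string(blob))
	// [{"Number":1,"Title":"Full Text","Content":"I stood atop..."}]
}
```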

View File

@@ -0,0 +1,141 @@
package backend
import (
"encoding/json"
"fmt"
"net/http"
"strings"
"github.com/libnovel/backend/internal/bookstore"
"github.com/libnovel/backend/internal/domain"
)
// handleAdminSplitChapters handles POST /api/admin/books/{slug}/split-chapters.
//
// Request body (JSON):
//
// { "text": "<full text with --- dividers and optional ## Title lines>" }
//
// The text is split on lines containing only "---". Each segment may start with
// a "## Title" line which becomes the chapter title; remaining lines are the
// chapter content. Sequential chapter numbers 1..N are assigned.
//
// Chapters 1..N are written via WriteChapter (upsert by number), overwriting
// any existing chapters with the same numbers; chapters beyond N are not
// deleted — use the dedup endpoint afterwards if needed.
func (s *Server) handleAdminSplitChapters(w http.ResponseWriter, r *http.Request) {
if s.deps.BookWriter == nil {
jsonError(w, http.StatusServiceUnavailable, "book writer not configured")
return
}
slug := r.PathValue("slug")
if slug == "" {
jsonError(w, http.StatusBadRequest, "slug is required")
return
}
var req struct {
Text string `json:"text"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
jsonError(w, http.StatusBadRequest, "parse body: "+err.Error())
return
}
if strings.TrimSpace(req.Text) == "" {
jsonError(w, http.StatusBadRequest, "text is required")
return
}
chapters := splitChapterText(req.Text)
if len(chapters) == 0 {
jsonError(w, http.StatusUnprocessableEntity, "no chapters produced from text")
return
}
for _, ch := range chapters {
var mdContent string
if ch.Title != "" && ch.Title != fmt.Sprintf("Chapter %d", ch.Number) {
mdContent = fmt.Sprintf("# %s\n\n%s", ch.Title, ch.Content)
} else {
mdContent = fmt.Sprintf("# Chapter %d\n\n%s", ch.Number, ch.Content)
}
domainCh := domain.Chapter{
Ref: domain.ChapterRef{Number: ch.Number, Title: ch.Title},
Text: mdContent,
}
if err := s.deps.BookWriter.WriteChapter(r.Context(), slug, domainCh); err != nil {
jsonError(w, http.StatusInternalServerError, fmt.Sprintf("write chapter %d: %s", ch.Number, err.Error()))
return
}
}
writeJSON(w, 0, map[string]any{
"chapters": len(chapters),
"slug": slug,
})
}
// splitChapterText splits text on "---" divider lines into a slice of
// bookstore.Chapter values. Each segment may optionally start with a
// "## Title" header line.
func splitChapterText(text string) []bookstore.Chapter {
lines := strings.Split(text, "\n")
// Collect raw segments split on "---" dividers.
var segments [][]string
cur := []string{}
for _, line := range lines {
if strings.TrimSpace(line) == "---" {
segments = append(segments, cur)
cur = []string{}
} else {
cur = append(cur, line)
}
}
segments = append(segments, cur) // last segment
var chapters []bookstore.Chapter
chNum := 0
for _, seg := range segments {
// Trim leading/trailing blank lines from the segment.
start, end := 0, len(seg)
for start < end && strings.TrimSpace(seg[start]) == "" {
start++
}
for end > start && strings.TrimSpace(seg[end-1]) == "" {
end--
}
seg = seg[start:end]
if len(seg) == 0 {
continue
}
// Check for a "## Title" header on the first line.
title := ""
contentStart := 0
if strings.HasPrefix(strings.TrimSpace(seg[0]), "## ") {
title = strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(seg[0]), "## "))
contentStart = 1
// Skip blank lines after the title.
for contentStart < len(seg) && strings.TrimSpace(seg[contentStart]) == "" {
contentStart++
}
}
content := strings.TrimSpace(strings.Join(seg[contentStart:], "\n"))
if content == "" {
continue
}
chNum++
if title == "" {
title = fmt.Sprintf("Chapter %d", chNum)
}
chapters = append(chapters, bookstore.Chapter{
Number: chNum,
Title: title,
Content: content,
})
}
return chapters
}
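
A minimal sketch of exercising the new endpoint, assuming a local backend on :8080 and omitting whatever admin auth the deployment enforces (both are assumptions, not part of this diff):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// One segment with an explicit "## Title" header, one without —
	// the handler titles the second segment "Chapter 2".
	text := "## Prologue\n\nIt began at dusk.\n\n---\n\nThe road went on.\n"
	body, _ := json.Marshal(map[string]string{"text": text})

	resp, err := http.Post(
		"http://localhost:8080/api/admin/books/my-book/split-chapters", // assumed host/port
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // on success the body carries {"chapters":2,"slug":"my-book"}
}
```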

View File

@@ -85,6 +85,9 @@ type Dependencies struct {
// BookWriter writes book metadata and chapter refs to PocketBase.
// Used by admin text-gen apply endpoints.
BookWriter bookstore.BookWriter
// ImportFileStore uploads raw PDF/EPUB files to MinIO for the runner to process.
// Always wired to the concrete *storage.Store (not the Asynq wrapper).
ImportFileStore bookstore.ImportFileStore
// AIJobStore tracks long-running AI generation jobs in PocketBase.
// If nil, job persistence is disabled (jobs still run but are not recorded).
AIJobStore bookstore.AIJobStore
@@ -244,6 +247,9 @@ func (s *Server) ListenAndServe(ctx context.Context) error {
// Admin data repair endpoints
mux.HandleFunc("POST /api/admin/dedup-chapters/{slug}", s.handleDedupChapters)
// Admin chapter split (imported books)
mux.HandleFunc("POST /api/admin/books/{slug}/split-chapters", s.handleAdminSplitChapters)
// Import (PDF/EPUB)
mux.HandleFunc("POST /api/admin/import", s.handleAdminImport)
mux.HandleFunc("GET /api/admin/import", s.handleAdminImportList)

View File

@@ -215,3 +215,14 @@ type BookImporter interface {
// Returns the extracted chapters or an error.
Import(ctx context.Context, objectKey, fileType string) ([]Chapter, error)
}
// ImportFileStore uploads raw import files to object storage.
// Kept separate from BookImporter so the HTTP handler can upload the file
// without a concrete type assertion, regardless of which Producer is wired.
type ImportFileStore interface {
PutImportFile(ctx context.Context, objectKey string, data []byte) error
// PutImportChapters stores the pre-parsed chapters JSON under the given key.
PutImportChapters(ctx context.Context, key string, data []byte) error
// GetImportChapters retrieves the pre-parsed chapters JSON.
GetImportChapters(ctx context.Context, key string) ([]byte, error)
}
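
Since the handler now depends only on this interface, a test can swap in an in-memory stand-in — a hypothetical sketch, not part of the diff:

```go
package bookstore_test

import (
	"context"
	"fmt"
)

// memImportStore is an in-memory stand-in satisfying bookstore.ImportFileStore.
type memImportStore struct {
	objects map[string][]byte
}

func (m *memImportStore) PutImportFile(ctx context.Context, objectKey string, data []byte) error {
	m.objects[objectKey] = data
	return nil
}

func (m *memImportStore) PutImportChapters(ctx context.Context, key string, data []byte) error {
	m.objects[key] = data
	return nil
}

func (m *memImportStore) GetImportChapters(ctx context.Context, key string) ([]byte, error) {
	data, ok := m.objects[key]
	if !ok {
		return nil, fmt.Errorf("no object %q", key)
	}
	return data, nil
}
```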

View File

@@ -178,6 +178,7 @@ type ImportTask struct {
FileName string `json:"file_name"`
FileType string `json:"file_type"` // "pdf" or "epub"
ObjectKey string `json:"object_key,omitempty"` // MinIO path to uploaded file
ChaptersKey string `json:"chapters_key,omitempty"` // MinIO path to pre-parsed chapters JSON
Author string `json:"author,omitempty"`
CoverURL string `json:"cover_url,omitempty"`
Genres []string `json:"genres,omitempty"`

View File

@@ -199,10 +199,11 @@ func (r *Runner) handleImportTask(ctx context.Context, t *asynq.Task) error {
return fmt.Errorf("unmarshal import payload: %w", err)
}
task := domain.ImportTask{
ID: p.PBTaskID,
Slug: p.Slug,
Title: p.Title,
FileType: p.FileType,
ID: p.PBTaskID,
Slug: p.Slug,
Title: p.Title,
FileType: p.FileType,
ChaptersKey: p.ChaptersKey,
}
r.tasksRunning.Add(1)
defer r.tasksRunning.Add(-1)

View File

@@ -15,6 +15,7 @@ package runner
import (
"context"
"encoding/json"
"fmt"
"log/slog"
"os"
@@ -49,6 +50,11 @@ type ChapterIngester interface {
IngestChapters(ctx context.Context, slug string, chapters []bookstore.Chapter) error
}
// ImportChapterStore retrieves pre-parsed chapter JSON blobs from object storage.
type ImportChapterStore interface {
GetImportChapters(ctx context.Context, key string) ([]byte, error)
}
// Config tunes the runner behaviour.
type Config struct {
// WorkerID uniquely identifies this runner instance in PocketBase records.
@@ -114,7 +120,12 @@ type Dependencies struct {
// CoverStore stores book cover images in MinIO.
CoverStore bookstore.CoverStore
// BookImport handles PDF/EPUB file parsing and chapter extraction.
// Kept for backward compatibility when ChaptersKey is not set.
BookImport bookstore.BookImporter
// ImportChapterStore retrieves pre-parsed chapter JSON blobs from MinIO.
// When set and the task has a ChaptersKey, the runner reads from here
// instead of calling BookImport.Import() (the new preferred path).
ImportChapterStore ImportChapterStore
// ChapterIngester persists extracted chapters into MinIO/PocketBase.
ChapterIngester ChapterIngester
// Notifier creates notifications for users.
@@ -675,6 +686,10 @@ func (r *Runner) runAudioTask(ctx context.Context, task domain.AudioTask) {
}
// runImportTask executes one PDF/EPUB import task.
// Preferred path: when task.ChaptersKey is set, it reads pre-parsed chapters
// JSON from MinIO (written by the backend at upload time) and ingests them.
// Fallback path: when ChaptersKey is empty, calls BookImport.Import() to
// parse the raw file on the runner (legacy behaviour, not used for new tasks).
func (r *Runner) runImportTask(ctx context.Context, task domain.ImportTask, objectKey string) {
ctx, span := otel.Tracer("runner").Start(ctx, "runner.import_task")
defer span.End()
@@ -682,10 +697,11 @@ func (r *Runner) runImportTask(ctx context.Context, task domain.ImportTask, obje
attribute.String("task.id", task.ID),
attribute.String("book.slug", task.Slug),
attribute.String("file.type", task.FileType),
attribute.String("chapters_key", task.ChaptersKey),
)
log := r.deps.Log.With("task_id", task.ID, "slug", task.Slug, "file_type", task.FileType)
log.Info("runner: import task starting")
log.Info("runner: import task starting", "chapters_key", task.ChaptersKey)
hbCtx, hbCancel := context.WithCancel(ctx)
defer hbCancel()
@@ -714,15 +730,33 @@ func (r *Runner) runImportTask(ctx context.Context, task domain.ImportTask, obje
}
}
if r.deps.BookImport == nil {
fail("book import not configured (BookImport dependency missing)")
return
}
var chapters []bookstore.Chapter
chapters, err := r.deps.BookImport.Import(ctx, objectKey, task.FileType)
if err != nil {
fail(fmt.Sprintf("import file: %v", err))
return
if task.ChaptersKey != "" && r.deps.ImportChapterStore != nil {
// New path: read pre-parsed chapters JSON uploaded by the backend.
raw, err := r.deps.ImportChapterStore.GetImportChapters(ctx, task.ChaptersKey)
if err != nil {
fail(fmt.Sprintf("get chapters JSON: %v", err))
return
}
if err := json.Unmarshal(raw, &chapters); err != nil {
fail(fmt.Sprintf("unmarshal chapters JSON: %v", err))
return
}
log.Info("runner: loaded pre-parsed chapters", "count", len(chapters))
} else {
// Legacy path: parse the raw file on the runner.
if r.deps.BookImport == nil {
fail("book import not configured (BookImport dependency missing)")
return
}
var err error
chapters, err = r.deps.BookImport.Import(ctx, objectKey, task.FileType)
if err != nil {
fail(fmt.Sprintf("import file: %v", err))
return
}
log.Info("runner: parsed chapters from file (legacy path)", "count", len(chapters))
}
if len(chapters) == 0 {
@@ -730,23 +764,12 @@ func (r *Runner) runImportTask(ctx context.Context, task domain.ImportTask, obje
return
}
// Store chapters via BookWriter
// Note: BookWriter.WriteChapters expects domain.Chapter, need conversion
var domainChapters []bookstore.Chapter
for _, ch := range chapters {
domainChapters = append(domainChapters, bookstore.Chapter{
Number: ch.Number,
Title: ch.Title,
Content: ch.Content,
})
}
// Store chapters via ChapterIngester
// Persist chapters via ChapterIngester.
if r.deps.ChapterIngester == nil {
fail("chapter ingester not configured")
return
}
if err := r.deps.ChapterIngester.IngestChapters(ctx, task.Slug, domainChapters); err != nil {
if err := r.deps.ChapterIngester.IngestChapters(ctx, task.Slug, chapters); err != nil {
fail(fmt.Sprintf("store chapters: %v", err))
return
}
@@ -786,7 +809,7 @@ func (r *Runner) runImportTask(ctx context.Context, task domain.ImportTask, obje
log.Error("runner: FinishImportTask failed", "err", err)
}
// Create notification for the user who initiated the import
// Notify the user who initiated the import.
if r.deps.Notifier != nil {
msg := fmt.Sprintf("Import completed: %d chapters from %s", len(chapters), task.Title)
targetUser := task.InitiatorUserID

View File

@@ -6,23 +6,19 @@ import (
"context"
"fmt"
"io"
"regexp"
"os"
"sort"
"strconv"
"strings"
"github.com/dslipak/pdf"
"github.com/libnovel/backend/internal/bookstore"
"github.com/libnovel/backend/internal/domain"
minio "github.com/minio/minio-go/v7"
"github.com/pdfcpu/pdfcpu/pkg/api"
"github.com/pdfcpu/pdfcpu/pkg/pdfcpu/model"
"golang.org/x/net/html"
)
// chapterHeadingRE matches common chapter heading patterns:
// "Chapter 1", "Chapter 1:", "Chapter 1 -", "CHAPTER ONE", "1.", "Part 1", etc.
var chapterHeadingRE = regexp.MustCompile(
`(?i)^(?:chapter|ch\.?|part|episode|book)\s+(\d+|[ivxlcdm]+)\b|^\d{1,4}[\.\)]\s+\S`)
type importer struct {
mc *minioClient
}
@@ -58,6 +54,7 @@ func (i *importer) Import(ctx context.Context, objectKey, fileType string) ([]bo
// chapter count and up to 3 preview lines (first non-empty line of each of
// the first 3 chapters). It is used by the analyze-only endpoint so users
// can preview chapter count before committing the import.
// Note: uses parsePDF which is backed by pdfcpu ExtractContent — fast, no hang risk.
func AnalyzeFile(data []byte, fileType string) (chapterCount int, firstLines []string, err error) {
var chapters []bookstore.Chapter
switch fileType {
@@ -90,35 +87,398 @@ func AnalyzeFile(data []byte, fileType string) (chapterCount int, firstLines []s
func parsePDF(data []byte) ([]bookstore.Chapter, error) {
r, err := pdf.NewReader(bytes.NewReader(data), int64(len(data)))
// decryptPDF strips encryption from a PDF using an empty user password.
// Returns the decrypted bytes, or an error if decryption is not possible.
// This handles the common case of "owner-only" encrypted PDFs (copy/print
// restrictions) which use an empty user password and open normally in readers.
func decryptPDF(data []byte) ([]byte, error) {
conf := model.NewDefaultConfiguration()
conf.UserPW = ""
conf.OwnerPW = ""
var out bytes.Buffer
err := api.Decrypt(bytes.NewReader(data), &out, conf)
if err != nil {
return nil, fmt.Errorf("open PDF: %w", err)
return nil, err
}
return out.Bytes(), nil
}
// ParseImportFile parses a PDF or EPUB and returns chapters.
// Unlike AnalyzeFile it respects ctx cancellation so callers can apply a timeout.
// For PDFs it first attempts to strip encryption with an empty password.
func ParseImportFile(ctx context.Context, data []byte, fileType string) ([]bookstore.Chapter, error) {
type result struct {
chapters []bookstore.Chapter
err error
}
ch := make(chan result, 1)
go func() {
var chapters []bookstore.Chapter
var err error
switch fileType {
case "pdf":
chapters, err = parsePDF(data)
case "epub":
chapters, err = parseEPUB(data)
default:
err = fmt.Errorf("unsupported file type: %s", fileType)
}
ch <- result{chapters, err}
}()
select {
case <-ctx.Done():
return nil, fmt.Errorf("parse timed out: %w", ctx.Err())
case r := <-ch:
return r.chapters, r.err
}
}
// pdfSkipBookmarks lists bookmark titles that are front/back matter, not story chapters.
// These are skipped when building the chapter list.
var pdfSkipBookmarks = map[string]bool{
"cover": true, "insert": true, "title page": true, "copyright": true,
"appendix": true, "color insert": true, "color illustrations": true,
}
// parsePDF extracts text from PDF bytes and returns it as a single chapter.
//
// The full readable text is returned as one chapter so the admin can manually
// split it into chapters via the UI using --- markers.
//
// Strategy:
// 1. Decrypt owner-protected PDFs (empty user password).
// 2. Extract raw content streams for every page using pdfcpu ExtractContent.
// 3. Concatenate text from all pages in order, skipping front matter
// (cover, title page, copyright — typically the first 10 pages).
func parsePDF(data []byte) ([]bookstore.Chapter, error) {
// Decrypt owner-protected PDFs (empty user password).
decrypted, err := decryptPDF(data)
if err == nil {
data = decrypted
}
// Extract per-page text so we can detect chapter boundaries.
numPages := r.NumPage()
if numPages == 0 {
return nil, fmt.Errorf("PDF has no pages")
conf := model.NewDefaultConfiguration()
conf.UserPW = ""
conf.OwnerPW = ""
// Extract all page content streams to a temp directory.
tmpDir, err := os.MkdirTemp("", "pdf-extract-*")
if err != nil {
return nil, fmt.Errorf("create temp dir: %w", err)
}
defer os.RemoveAll(tmpDir)
if err := api.ExtractContent(bytes.NewReader(data), tmpDir, "out", nil, conf); err != nil {
return nil, fmt.Errorf("extract PDF content: %w", err)
}
// Collect full text first with page markers so we can split by chapter.
entries, err := os.ReadDir(tmpDir)
if err != nil || len(entries) == 0 {
return nil, fmt.Errorf("PDF has no content pages")
}
// Parse page number from filename and build ordered text map.
pageTexts := make(map[int]string, len(entries))
maxPage := 0
for _, e := range entries {
pageNum := pageNumFromFilename(e.Name())
if pageNum <= 0 {
continue
}
raw, readErr := os.ReadFile(tmpDir + "/" + e.Name())
if readErr != nil {
continue
}
pageTexts[pageNum] = fixWin1252(extractTextFromContentStream(raw))
if pageNum > maxPage {
maxPage = pageNum
}
}
// Determine front-matter cutoff using bookmarks if available,
// otherwise skip the first 10 pages (cover/title/copyright).
bodyStart := 1
bookmarks, bmErr := api.Bookmarks(bytes.NewReader(data), conf)
if bmErr == nil {
for _, bm := range bookmarks {
title := strings.ToLower(strings.TrimSpace(bm.Title))
if !pdfSkipBookmarks[title] && bm.PageFrom > 0 {
// First non-front-matter bookmark — body starts here.
bodyStart = bm.PageFrom
break
}
}
} else if maxPage > 10 {
bodyStart = 11
}
// Concatenate all body pages.
var sb strings.Builder
fonts := make(map[string]*pdf.Font)
for i := 1; i <= numPages; i++ {
page := r.Page(i)
if page.V.IsNull() {
for p := bodyStart; p <= maxPage; p++ {
t := strings.TrimSpace(pageTexts[p])
if t == "" {
continue
}
text, err := page.GetPlainText(fonts)
if err != nil {
continue
}
sb.WriteString(text)
sb.WriteByte('\n')
sb.WriteString(t)
sb.WriteString("\n\n")
}
return extractChaptersFromText(sb.String()), nil
text := strings.TrimSpace(sb.String())
if text == "" {
return nil, fmt.Errorf("could not extract any text from PDF")
}
return []bookstore.Chapter{{
Number: 1,
Title: "Full Text",
Content: text,
}}, nil
}
// pageNumFromFilename extracts the page number from a pdfcpu content-stream
// filename like "out_Content_page_42.txt". Returns 0 if not parseable.
func pageNumFromFilename(name string) int {
// Strip directory prefix and extension.
base := name
if idx := strings.LastIndex(base, "/"); idx >= 0 {
base = base[idx+1:]
}
if idx := strings.LastIndex(base, "."); idx >= 0 {
base = base[:idx]
}
// Find last "_" and parse the number after it.
if idx := strings.LastIndex(base, "_"); idx >= 0 {
n, err := strconv.Atoi(base[idx+1:])
if err == nil && n > 0 {
return n
}
}
return 0
}
// win1252ToUnicode maps the Windows-1252 control range 0x80–0x9F to the
// Unicode characters they actually represent in that encoding.
// Standard Latin-1 maps these bytes to control characters; Win-1252 maps
// them to typographic symbols that appear in publisher PDFs.
var win1252ToUnicode = map[byte]rune{
0x80: '\u20AC', // €
0x82: '\u201A', // ‚
0x83: '\u0192', // ƒ
0x84: '\u201E', // „
0x85: '\u2026', // …
0x86: '\u2020', // †
0x87: '\u2021', // ‡
0x88: '\u02C6', // ˆ
0x89: '\u2030', // ‰
0x8A: '\u0160', // Š
0x8B: '\u2039', // ‹
0x8C: '\u0152', // Œ
0x8E: '\u017D', // Ž
0x91: '\u2018', // ' (left single quotation mark)
0x92: '\u2019', // ' (right single quotation mark / apostrophe)
0x93: '\u201C', // " (left double quotation mark)
0x94: '\u201D', // " (right double quotation mark)
0x95: '\u2022', // • (bullet)
0x96: '\u2013', // – (en dash)
0x97: '\u2014', // — (em dash)
0x98: '\u02DC', // ˜
0x99: '\u2122', // ™
0x9A: '\u0161', // š
0x9B: '\u203A', // ›
0x9C: '\u0153', // œ
0x9E: '\u017E', // ž
0x9F: '\u0178', // Ÿ
}
// fixWin1252 replaces Windows-1252 specific bytes (0x80–0x9F) in a string
// that was decoded as raw Latin-1 bytes with their proper Unicode equivalents.
func fixWin1252(s string) string {
// Fast path: if no bytes in 0x80–0x9F range, return unchanged.
needsFix := false
for i := 0; i < len(s); i++ {
b := s[i]
if b >= 0x80 && b <= 0x9F {
needsFix = true
break
}
}
if !needsFix {
return s
}
var sb strings.Builder
sb.Grow(len(s))
for i := 0; i < len(s); i++ {
b := s[i]
if b >= 0x80 && b <= 0x9F {
if r, ok := win1252ToUnicode[b]; ok {
sb.WriteRune(r)
continue
}
}
sb.WriteByte(b)
}
return sb.String()
}
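// Illustrative usage (added here for clarity, not part of the change itself):
// bytes 0x93/0x94/0x97 decoded as raw Latin-1 become proper punctuation:
//
//	fixWin1252("\x93Hi\x94 \x97 ok") // → "\u201CHi\u201D \u2014 ok"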
// extractTextFromContentStream parses a raw PDF content stream and extracts
// readable text from Tj and TJ operators.
//
// TJ arrays may contain a mix of literal strings (parenthesised) and hex glyph
// arrays. Only the literal strings are decoded — hex arrays require per-font
// ToUnicode CMaps and are skipped. Kerning adjustment numbers inside TJ arrays
// are also ignored (they're just spacing hints).
//
// Line breaks are inserted on ET / Td / TD / T* operators.
func extractTextFromContentStream(stream []byte) string {
s := string(stream)
var sb strings.Builder
i := 0
n := len(s)
for i < n {
// TJ array: [ ... ]TJ — collect all literal strings, skip hex & numbers.
if s[i] == '[' {
j := i + 1
for j < n && s[j] != ']' {
if s[j] == '(' {
// Literal string inside TJ array.
k := j + 1
depth := 1
for k < n && depth > 0 {
if s[k] == '\\' {
k += 2
continue
}
if s[k] == '(' {
depth++
} else if s[k] == ')' {
depth--
}
k++
}
lit := pdfUnescapeString(s[j+1 : k-1])
if hasPrintableASCII(lit) {
sb.WriteString(lit)
}
j = k
continue
}
j++
}
// Check if this is a TJ operator (skip whitespace after ']').
end := j + 1
for end < n && (s[end] == ' ' || s[end] == '\t' || s[end] == '\r' || s[end] == '\n') {
end++
}
if end+2 <= n && s[end:end+2] == "TJ" && (end+2 == n || !isAlphaNum(s[end+2])) {
i = end + 2
continue
}
i = j + 1
continue
}
// Single string: (string) Tj
if s[i] == '(' {
j := i + 1
depth := 1
for j < n && depth > 0 {
if s[j] == '\\' {
j += 2
continue
}
if s[j] == '(' {
depth++
} else if s[j] == ')' {
depth--
}
j++
}
lit := pdfUnescapeString(s[i+1 : j-1])
if hasPrintableASCII(lit) {
// Check for Tj operator.
end := j
for end < n && (s[end] == ' ' || s[end] == '\t') {
end++
}
if end+2 <= n && s[end:end+2] == "Tj" && (end+2 == n || !isAlphaNum(s[end+2])) {
sb.WriteString(lit)
i = end + 2
continue
}
}
i = j
continue
}
// Detect end of text object (ET) — add a newline.
if i+2 <= n && s[i:i+2] == "ET" && (i+2 == n || !isAlphaNum(s[i+2])) {
sb.WriteByte('\n')
i += 2
continue
}
// Detect Td / TD / T* — newline within text block.
if i+2 <= n && (s[i:i+2] == "Td" || s[i:i+2] == "TD" || s[i:i+2] == "T*") &&
(i+2 == n || !isAlphaNum(s[i+2])) {
sb.WriteByte('\n')
i += 2
continue
}
i++
}
return sb.String()
}
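// Illustrative usage (added here for clarity, not part of the change itself):
//
//	extractTextFromContentStream([]byte("[(Hel)-20(lo)]TJ T* (world) Tj"))
//	// → "Hello\nworld" — TJ literals concatenated, T* becomes a newline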
func isAlphaNum(b byte) bool {
return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9') || b == '_'
}
func hasPrintableASCII(s string) bool {
for _, c := range s {
if c >= 0x20 && c < 0x7F {
return true
}
}
return false
}
// pdfUnescapeString handles PDF string escape sequences.
func pdfUnescapeString(s string) string {
if !strings.ContainsRune(s, '\\') {
return s
}
var sb strings.Builder
i := 0
for i < len(s) {
if s[i] == '\\' && i+1 < len(s) {
switch s[i+1] {
case 'n':
sb.WriteByte('\n')
case 'r':
sb.WriteByte('\r')
case 't':
sb.WriteByte('\t')
case '(', ')', '\\':
sb.WriteByte(s[i+1])
default:
// Octal escape \ddd
if s[i+1] >= '0' && s[i+1] <= '7' {
end := i + 2
for end < i+5 && end < len(s) && s[end] >= '0' && s[end] <= '7' {
end++
}
val, _ := strconv.ParseInt(s[i+1:end], 8, 16)
sb.WriteByte(byte(val))
i = end
continue
}
sb.WriteByte(s[i+1])
}
i += 2
} else {
sb.WriteByte(s[i])
i++
}
}
return sb.String()
}
// ── EPUB parsing ──────────────────────────────────────────────────────────────
@@ -152,6 +512,7 @@ func parseEPUB(data []byte) ([]bookstore.Chapter, error) {
}
var chapters []bookstore.Chapter
chNum := 0
for i, href := range spineFiles {
fullPath := opfDir + href
content, err := epubFileContent(zr, fullPath)
@@ -162,12 +523,14 @@ func parseEPUB(data []byte) ([]bookstore.Chapter, error) {
if strings.TrimSpace(text) == "" {
continue
}
chNum++
title := titleMap[href]
if title == "" {
title = fmt.Sprintf("Chapter %d", i+1)
title = fmt.Sprintf("Chapter %d", chNum)
}
_ = i // spine index unused for numbering
chapters = append(chapters, bookstore.Chapter{
Number: i + 1,
Number: chNum,
Title: title,
Content: text,
})
@@ -464,80 +827,6 @@ func htmlToText(data []byte) string {
return strings.TrimSpace(strings.Join(out, "\n"))
}
// ── Chapter segmentation (shared by PDF and plain-text paths) ─────────────────
// extractChaptersFromText splits a block of plain text into chapters by
// detecting heading lines that match chapterHeadingRE.
// Falls back to paragraph-splitting when no headings are found.
func extractChaptersFromText(text string) []bookstore.Chapter {
lines := strings.Split(text, "\n")
type segment struct {
title string
number int
lines []string
}
var segments []segment
var cur *segment
chNum := 0
for _, line := range lines {
line = strings.TrimSpace(line)
if chapterHeadingRE.MatchString(line) {
if cur != nil {
segments = append(segments, *cur)
}
chNum++
// Try to parse the explicit chapter number from the heading.
if m := regexp.MustCompile(`\d+`).FindString(line); m != "" {
if n, err := strconv.Atoi(m); err == nil && n > 0 && n < 100000 {
chNum = n
}
}
cur = &segment{title: line, number: chNum}
} else if cur != nil && line != "" {
cur.lines = append(cur.lines, line)
}
}
if cur != nil {
segments = append(segments, *cur)
}
// Require segments to have meaningful content (>= 100 chars).
var chapters []bookstore.Chapter
for _, seg := range segments {
content := strings.Join(seg.lines, "\n")
if len(strings.TrimSpace(content)) < 50 {
continue
}
chapters = append(chapters, bookstore.Chapter{
Number: seg.number,
Title: seg.title,
Content: content,
})
}
// Fallback: no headings found — split by double newlines (paragraph blocks).
if len(chapters) == 0 {
paragraphs := strings.Split(text, "\n\n")
n := 0
for _, para := range paragraphs {
para = strings.TrimSpace(para)
if len(para) > 100 {
n++
chapters = append(chapters, bookstore.Chapter{
Number: n,
Title: fmt.Sprintf("Chapter %d", n),
Content: para,
})
}
}
}
return chapters
}
// ── Chapter ingestion ─────────────────────────────────────────────────────────
// IngestChapters stores extracted chapters for a book.

View File

@@ -654,13 +654,14 @@ func (s *Store) CreateImportTask(ctx context.Context, task domain.ImportTask) (s
"file_name": task.Slug + "." + task.FileType,
"file_type": task.FileType,
"object_key": task.ObjectKey,
"chapters_key": task.ChaptersKey,
"author": task.Author,
"cover_url": task.CoverURL,
"summary": task.Summary,
"book_status": task.BookStatus,
"status": string(domain.TaskStatusPending),
"chapters_done": 0,
"chapters_total": 0,
"chapters_total": task.ChaptersTotal,
"started": time.Now().UTC().Format(time.RFC3339),
"initiator_user_id": task.InitiatorUserID,
}
@@ -914,7 +915,7 @@ func (s *Store) FailTask(ctx context.Context, id, errMsg string) error {
}
// HeartbeatTask updates the heartbeat_at field on a running task.
// Tries scraping_tasks first, then audio_jobs, then translation_jobs.
// Tries scraping_tasks, audio_jobs, translation_jobs, then import_tasks.
func (s *Store) HeartbeatTask(ctx context.Context, id string) error {
payload := map[string]any{
"heartbeat_at": time.Now().UTC().Format(time.RFC3339),
@@ -925,7 +926,10 @@ func (s *Store) HeartbeatTask(ctx context.Context, id string) error {
if err := s.pb.patch(ctx, fmt.Sprintf("/api/collections/audio_jobs/records/%s", id), payload); err == nil {
return nil
}
return s.pb.patch(ctx, fmt.Sprintf("/api/collections/translation_jobs/records/%s", id), payload)
if err := s.pb.patch(ctx, fmt.Sprintf("/api/collections/translation_jobs/records/%s", id), payload); err == nil {
return nil
}
return s.pb.patch(ctx, fmt.Sprintf("/api/collections/import_tasks/records/%s", id), payload)
}
// ReapStaleTasks finds all running tasks whose heartbeat_at is either missing
@@ -943,7 +947,7 @@ func (s *Store) ReapStaleTasks(ctx context.Context, staleAfter time.Duration) (i
}
total := 0
for _, collection := range []string{"scraping_tasks", "audio_jobs", "translation_jobs"} {
for _, collection := range []string{"scraping_tasks", "audio_jobs", "translation_jobs", "import_tasks"} {
items, err := s.pb.listAll(ctx, collection, filter, "")
if err != nil {
return total, fmt.Errorf("ReapStaleTasks list %s: %w", collection, err)
@@ -1185,6 +1189,7 @@ func parseImportTask(raw json.RawMessage) (domain.ImportTask, error) {
FileName string `json:"file_name"`
FileType string `json:"file_type"`
ObjectKey string `json:"object_key"`
ChaptersKey string `json:"chapters_key"`
Author string `json:"author"`
CoverURL string `json:"cover_url"`
Genres string `json:"genres"` // stored as comma-separated
@@ -1219,6 +1224,7 @@ func parseImportTask(raw json.RawMessage) (domain.ImportTask, error) {
FileName: rec.FileName,
FileType: rec.FileType,
ObjectKey: rec.ObjectKey,
ChaptersKey: rec.ChaptersKey,
Author: rec.Author,
CoverURL: rec.CoverURL,
Genres: genres,
@@ -1266,6 +1272,20 @@ func (s *Store) PutImportFile(ctx context.Context, key string, data []byte) erro
return s.mc.putObject(ctx, "imports", key, "application/octet-stream", data)
}
// PutImportChapters stores a pre-parsed chapters JSON blob in MinIO.
func (s *Store) PutImportChapters(ctx context.Context, key string, data []byte) error {
return s.mc.putObject(ctx, "imports", key, "application/json", data)
}
// GetImportChapters retrieves the pre-parsed chapters JSON from MinIO.
func (s *Store) GetImportChapters(ctx context.Context, key string) ([]byte, error) {
data, err := s.mc.getObject(ctx, "imports", key)
if err != nil {
return nil, fmt.Errorf("get chapters object: %w", err)
}
return data, nil
}
func (s *Store) CoverExists(ctx context.Context, slug string) bool {
return s.mc.coverExists(ctx, CoverObjectKey(slug))
}
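On the worker side, the stored blob round-trips back into the same slice type the parsers produce (a hedged sketch; the helper name and error wrapping are illustrative, while []bookstore.Chapter is taken from the extraction code earlier in this diff):

// Sketch of the worker side (hypothetical helper): fetch and decode
// the pre-parsed chapters an import task points at.
func loadChapters(ctx context.Context, store *Store, task domain.ImportTask) ([]bookstore.Chapter, error) {
	blob, err := store.GetImportChapters(ctx, task.ChaptersKey)
	if err != nil {
		return nil, fmt.Errorf("load chapters: %w", err)
	}
	var chapters []bookstore.Chapter
	if err := json.Unmarshal(blob, &chapters); err != nil {
		return nil, fmt.Errorf("decode chapters: %w", err)
	}
	return chapters, nil
}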

View File

@@ -58,6 +58,8 @@ services:
mc mb --ignore-existing local/audio;
mc mb --ignore-existing local/avatars;
mc mb --ignore-existing local/catalogue;
mc mb --ignore-existing local/translations;
mc mb --ignore-existing local/imports;
echo 'buckets ready';
"
environment:

View File

@@ -299,6 +299,40 @@ create "translation_jobs" '{
{"name":"heartbeat_at", "type":"date"}
]}'
create "import_tasks" '{
"name":"import_tasks","type":"base","fields":[
{"name":"slug", "type":"text", "required":true},
{"name":"title", "type":"text", "required":true},
{"name":"file_name", "type":"text"},
{"name":"file_type", "type":"text"},
{"name":"object_key", "type":"text"},
{"name":"chapters_key", "type":"text"},
{"name":"author", "type":"text"},
{"name":"cover_url", "type":"text"},
{"name":"genres", "type":"text"},
{"name":"summary", "type":"text"},
{"name":"book_status", "type":"text"},
{"name":"worker_id", "type":"text"},
{"name":"initiator_user_id", "type":"text"},
{"name":"status", "type":"text", "required":true},
{"name":"chapters_done", "type":"number"},
{"name":"chapters_total", "type":"number"},
{"name":"error_message", "type":"text"},
{"name":"started", "type":"date"},
{"name":"finished", "type":"date"},
{"name":"heartbeat_at", "type":"date"}
]}'
create "notifications" '{
"name":"notifications","type":"base","fields":[
{"name":"user_id", "type":"text","required":true},
{"name":"title", "type":"text","required":true},
{"name":"message", "type":"text"},
{"name":"link", "type":"text"},
{"name":"read", "type":"bool"},
{"name":"created", "type":"date"}
]}'
create "ai_jobs" '{
"name":"ai_jobs","type":"base","fields":[
{"name":"kind", "type":"text", "required":true},

View File

@@ -0,0 +1,44 @@
/* eslint-disable */
import { getLocale, experimentalStaticLocale } from '../runtime.js';
/** @typedef {import('../runtime.js').LocalizedString} LocalizedString */
/** @typedef {{}} Admin_Nav_NotificationsInputs */
const en_admin_nav_notifications = /** @type {(inputs: Admin_Nav_NotificationsInputs) => LocalizedString} */ () => {
return /** @type {LocalizedString} */ (`Notifications`)
};
const ru_admin_nav_notifications = /** @type {(inputs: Admin_Nav_NotificationsInputs) => LocalizedString} */ () => {
return /** @type {LocalizedString} */ (`Уведомления`)
};
const id_admin_nav_notifications = /** @type {(inputs: Admin_Nav_NotificationsInputs) => LocalizedString} */ () => {
return /** @type {LocalizedString} */ (`Notifikasi`)
};
const pt_admin_nav_notifications = /** @type {(inputs: Admin_Nav_NotificationsInputs) => LocalizedString} */ () => {
return /** @type {LocalizedString} */ (`Notificações`)
};
const fr_admin_nav_notifications = /** @type {(inputs: Admin_Nav_NotificationsInputs) => LocalizedString} */ () => {
return /** @type {LocalizedString} */ (`Notifications`)
};
/**
* | output |
* | --- |
* | "Notifications" |
*
* @param {Admin_Nav_NotificationsInputs} inputs
* @param {{ locale?: "en" | "ru" | "id" | "pt" | "fr" }} options
* @returns {LocalizedString}
*/
export const admin_nav_notifications = /** @type {((inputs?: Admin_Nav_NotificationsInputs, options?: { locale?: "en" | "ru" | "id" | "pt" | "fr" }) => LocalizedString) & import('../runtime.js').MessageMetadata<Admin_Nav_NotificationsInputs, { locale?: "en" | "ru" | "id" | "pt" | "fr" }, {}>} */ ((inputs = {}, options = {}) => {
const locale = experimentalStaticLocale ?? options.locale ?? getLocale()
if (locale === "en") return en_admin_nav_notifications(inputs)
if (locale === "ru") return ru_admin_nav_notifications(inputs)
if (locale === "id") return id_admin_nav_notifications(inputs)
if (locale === "pt") return pt_admin_nav_notifications(inputs)
return fr_admin_nav_notifications(inputs)
});

View File

@@ -100,6 +100,58 @@
const genres = $derived(parseGenres(data.book?.genres ?? []));
const chapterList = $derived(data.chapters ?? []);
// ── Admin: split chapters (imported PDF/EPUB books) ──────────────────────
const isFullTextBook = $derived(
chapterList.length === 1 && chapterList[0].title === 'Full Text'
);
let splitText = $state('');
let splitSaving = $state(false);
let splitResult = $state<'saved' | 'error' | ''>('');
let splitError = $state('');
let splitOpen = $state(false);
$effect(() => {
// Pre-fill the textarea with chapter 1 content when the panel is opened.
if (splitOpen && !splitText && data.book?.slug && isFullTextBook) {
fetch(`/api/chapter-markdown/${encodeURIComponent(data.book.slug)}/1`)
.then((r) => r.ok ? r.text() : '')
.then((t) => {
// Strip leading "# Full Text\n\n" header if present.
splitText = t.replace(/^# Full Text\n\n/, '').trim();
})
.catch(() => {});
}
});
async function splitChapters() {
const slug = data.book?.slug;
if (splitSaving || !slug) return;
splitSaving = true;
splitResult = '';
splitError = '';
try {
const res = await fetch(`/api/admin/books/${encodeURIComponent(slug)}/split-chapters`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text: splitText })
});
if (res.ok) {
splitResult = 'saved';
splitOpen = false;
await invalidateAll();
} else {
const d = await res.json().catch(() => ({}));
splitError = (d as any).error ?? 'Unknown error';
splitResult = 'error';
}
} catch (e: any) {
splitError = e?.message ?? '';
splitResult = 'error';
} finally {
splitSaving = false;
}
}
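Server-side, the split-chapters endpoint this posts to is described (in the admin panel's help text below) as splitting on '---' dividers with optional '## Title' headers. A minimal Go sketch of that splitting rule, assuming the usual imports (fmt, regexp, strings) and the bookstore.Chapter type from this diff (hypothetical standalone function, not the actual handler):

// divider matches "---" alone on a line, with optional trailing spaces.
var divider = regexp.MustCompile(`(?m)^---\s*$`)

// Sketch: divide admin-edited full text into chapters. A segment may
// begin with "## Title" to override the default "Chapter N" title.
func splitOnDividers(text string) []bookstore.Chapter {
	var chapters []bookstore.Chapter
	n := 0
	for _, seg := range divider.Split(text, -1) {
		seg = strings.TrimSpace(seg)
		if seg == "" {
			continue // skip empty segments, e.g. leading/trailing dividers
		}
		n++
		title := fmt.Sprintf("Chapter %d", n)
		if strings.HasPrefix(seg, "## ") {
			line, rest, _ := strings.Cut(seg, "\n")
			title = strings.TrimSpace(strings.TrimPrefix(line, "## "))
			seg = strings.TrimSpace(rest)
		}
		chapters = append(chapters, bookstore.Chapter{Number: n, Title: title, Content: seg})
	}
	return chapters
}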
// ── Admin: rescrape ───────────────────────────────────────────────────────
let scraping = $state(false);
let scrapeResult = $state<'queued' | 'busy' | 'error' | ''>('');
@@ -979,7 +1031,7 @@
</a>
<!-- Admin panel (collapsed by default, admin only) -->
- {#if data.isAdmin && book.source_url}
+ {#if data.isAdmin}
<div>
<button
onclick={() => (adminOpen = !adminOpen)}
@@ -997,6 +1049,62 @@
{#if adminOpen}
<div class="px-4 py-3 border-t border-(--color-border) flex flex-col gap-5">
<!-- Chapter split tool (only for imported books with a single "Full Text" chapter) -->
{#if isFullTextBook}
<div class="flex flex-col gap-2">
<div class="flex items-center justify-between">
<p class="text-xs font-medium text-(--color-muted) uppercase tracking-wide">Split Chapters</p>
<button
onclick={() => { splitOpen = !splitOpen; splitResult = ''; splitError = ''; }}
class="text-xs text-(--color-muted) hover:text-(--color-text) transition-colors"
>
{splitOpen ? 'Hide' : 'Edit'}
</button>
</div>
{#if !splitOpen}
<p class="text-xs text-(--color-muted)">
This book has a single "Full Text" chapter. Use this tool to split it into chapters.
</p>
{/if}
{#if splitOpen}
<p class="text-xs text-(--color-muted)">
Insert <code class="bg-(--color-surface-3) px-1 rounded">---</code> on its own line to divide chapters.
Optionally start a segment with <code class="bg-(--color-surface-3) px-1 rounded">## Chapter Title</code>.
</p>
<textarea
bind:value={splitText}
rows="16"
class="w-full px-2 py-1.5 rounded bg-(--color-surface-3) border border-(--color-border) text-(--color-text) text-xs font-mono focus:outline-none focus:border-(--color-brand) resize-y"
placeholder="Paste or edit the full text here. Use --- to split chapters."
></textarea>
<div class="flex items-center gap-3 flex-wrap">
<button
onclick={splitChapters}
disabled={splitSaving || !splitText.trim()}
class="flex items-center gap-1.5 px-3 py-1.5 rounded text-xs font-medium transition-colors
{splitSaving || !splitText.trim() ? 'bg-(--color-surface-3) text-(--color-muted) cursor-not-allowed' : 'bg-(--color-brand)/20 text-(--color-brand-dim) hover:bg-(--color-brand)/40 border border-(--color-brand)/30'}"
>
{#if splitSaving}
<svg class="w-3 h-3 animate-spin" fill="none" viewBox="0 0 24 24"><circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"/><path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"/></svg>
Saving…
{:else}
Save chapters
{/if}
</button>
{#if splitResult === 'saved'}
<span class="text-xs text-green-400">Saved.</span>
{:else if splitResult === 'error'}
<span class="text-xs text-(--color-danger)">{splitError || 'Error.'}</span>
{/if}
</div>
{/if}
</div>
<hr class="border-(--color-border)" />
{/if}
<!-- Rescrape / range-scrape (only for scraped books with a source URL) -->
{#if book.source_url}
<!-- Rescrape -->
<div class="flex items-center gap-3 flex-wrap">
<button
@@ -1065,6 +1173,7 @@
</span>
{/if}
</div>
{/if}
<hr class="border-(--color-border)" />