# libnovel Project

Go web scraper for novelfire.net with TTS support via Kokoro-FastAPI.

## Architecture

```
scraper/
├── cmd/scraper/main.go              # Entry point: 'run' (one-shot) and 'serve' (HTTP server)
├── internal/
│   ├── orchestrator/orchestrator.go # Coordinates catalogue walk, metadata extraction, chapter scraping
│   ├── browser/                     # Browser client (content/scrape/cdp strategies) via Browserless
│   ├── novelfire/scraper.go         # novelfire.net-specific scraping logic
│   ├── server/server.go             # HTTP API (POST /scrape, POST /scrape/book)
│   ├── writer/writer.go             # File writer (metadata.yaml, chapter .md files)
│   └── scraper/interfaces.go        # NovelScraper interface definition
└── static/books/                    # Output directory for scraped content
```

## Key Concepts

- **Orchestrator**: Manages concurrency: catalogue streaming → per-book metadata goroutines → chapter worker pool
- **Browser Client**: 3 strategies (content/scrape/cdp) via a Browserless Chrome container
- **Writer**: Writes metadata.yaml and chapter markdown files to `static/books/{slug}/vol-0/1-50/`
- **Server**: HTTP API with async scrape jobs, a UI for browsing books/chapters, and a chapter-text endpoint for TTS

## Commands

```bash
# Build
cd scraper && go build -o bin/scraper ./cmd/scraper

# One-shot scrape (full catalogue)
./bin/scraper run

# Single book
./bin/scraper run --url https://novelfire.net/book/xxx

# HTTP server
./bin/scraper serve

# Tests
cd scraper && go test ./...
```

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| BROWSERLESS_URL | Browserless Chrome endpoint | http://localhost:3030 |
| BROWSERLESS_STRATEGY | content \| scrape \| cdp | content |
| SCRAPER_WORKERS | Chapter goroutines | NumCPU |
| SCRAPER_STATIC_ROOT | Output directory | ./static/books |
| SCRAPER_HTTP_ADDR | HTTP listen address | :8080 |
| KOKORO_URL | Kokoro TTS endpoint | http://localhost:8880 |
| KOKORO_VOICE | Default TTS voice | af_bella |
| LOG_LEVEL | debug \| info \| warn \| error | info |

## Docker

```bash
docker-compose up -d  # Starts browserless, kokoro, scraper
```

## Code Patterns

- Uses `log/slog` for structured logging
- Context-based cancellation throughout
- Worker-pool pattern in the orchestrator (channel + goroutines)
- Mutex for the single async job (409 on concurrent scrape requests)

## AI Context Tips

- Primary files to modify: `orchestrator.go`, `server.go`, `scraper.go`, `browser/*.go`
- To add a new source, implement the `NovelScraper` interface from `internal/scraper/interfaces.go`
- Skip the `static/` directory: generated content, not source

## Speed Up AI Sessions (Optional)

For faster AI context loading, use **Context7** (free, local indexing):

```bash
# Install and index once
npx @context7/cli@latest index --path . --ignore .aiignore

# After the first run, AI tools will query the index instead of re-scanning files
```

VSCode extension: https://marketplace.visualstudio.com/items?itemName=context7.context7