diff --git a/.dockerignore b/.dockerignore
index 2abdc7e..dce8f89 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,4 +1,5 @@
 .git
 .gitignore
 .env
-README.md
\ No newline at end of file
+README.md
+/adrs
\ No newline at end of file
diff --git a/adrs/01-epg-management.md b/adrs/01-epg-management.md
new file mode 100644
index 0000000..8a464a2
--- /dev/null
+++ b/adrs/01-epg-management.md
@@ -0,0 +1,294 @@
+# 1. EPG Management
+
+- **Status:** Proposed
+- **Date:** 2026-05-06
+
+## Context
+
+`pz8-relay` currently acts as a stable, authenticated reverse proxy for a single
+upstream IPTV playlist URL. The upstream URL changes frequently; the relay
+shields client devices from those changes.
+
+We now want the relay to also handle EPG (Electronic Program Guide) data so that
+configured devices get TV listings without any extra client-side configuration:
+
+1. The relayed URL points to an `m3u_plus` playlist. The playlist may or may
+   not declare an EPG via the `url-tvg=` attribute on its `#EXTM3U` header.
+2. We want to fetch one or more external EPGs (XMLTV / "TVXML" format, often
+   distributed gzipped, e.g. ),
+   merge them, and serve the result from the relay.
+3. Channel identifiers diverge between sources: the playlist uses
+   `tvg-id="Italia 1"` while open-epg uses `id="Italia 1.it"` (or similar).
+   The merged EPG must use ids that match the playlist's `tvg-id` values, or
+   client apps will not bind programmes to channels.
+4. The playlist file is ~10 MB, and EPG files can be tens of MB once
+   decompressed. All processing must stream: no full file in memory.
+
+## Decision
+
+### High-level pipeline
+
+A background worker, driven by `time.Ticker`, periodically refreshes both the
+playlist and the EPG in a single sequential pipeline:
+
+```
+fetch playlist ──► rewrite header (url-tvg → /epg) ──► atomic write playlist.m3u
+      │
+      └─► extract tvg-id set (during stream)
+              │
+              ▼
+      fetch EPG sources (configured + optional playlist EPG)
+              │
+              ▼
+      gunzip + stream parse
+              │
+              ▼
+      normalize ids + filter to tvg-id set
+              │
+              ▼
+      atomic write epg.xml
+```
+
+HTTP handlers (`/playlist`, `/epg`) serve the most recent atomically-written
+file via `http.ServeContent`. They never block on the worker.
+
+### Architectural shift: proxy → cache-and-serve (playlist only)
+
+Today `/` is a streaming reverse proxy to the upstream playlist URL. We replace
+this with a periodically-refreshed local copy served from disk. Rationale:
+
+- We must rewrite the `#EXTM3U` header anyway (to redirect `url-tvg` to our
+  `/epg` endpoint), and we need the playlist parsed to extract `tvg-id`s for
+  filtering the EPG. Doing both per-request, on every client poll, is wasteful.
+- A cached playlist is more resilient: if upstream is briefly down, devices
+  still get the last-known-good file.
+- Stream URLs _inside_ the playlist are unaffected. Clients still hit them
+  directly upstream; we are not proxying actual video.
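The rationale above rests on doing the header rewrite and the `tvg-id` extraction in one streaming pass per refresh. A minimal sketch of such a pass follows, under stated assumptions: `rewritePlaylist`, `rewriteString`, the `epgURL` parameter, and both regexes are hypothetical names chosen for illustration and are not taken from this change. It assumes EPG is enabled, so a missing `url-tvg` attribute is appended.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"regexp"
	"strings"
)

var (
	urlTvgRe = regexp.MustCompile(`url-tvg="([^"]*)"`)
	tvgIDRe  = regexp.MustCompile(`tvg-id="([^"]*)"`)
)

// rewritePlaylist streams src to dst one line at a time. It points the
// #EXTM3U header's url-tvg at epgURL and returns the header's original
// url-tvg value (empty if absent) plus the set of non-empty tvg-ids.
func rewritePlaylist(dst io.Writer, src io.Reader, epgURL string) (string, map[string]struct{}, error) {
	ids := make(map[string]struct{})
	origTvg, headerSeen := "", false
	r := bufio.NewReader(src) // ReadString has no fixed line-length limit
	for {
		line, err := r.ReadString('\n')
		if err != nil && err != io.EOF {
			return "", nil, err
		}
		// The header is assumed to be the first non-empty line.
		if !headerSeen && strings.TrimSpace(line) != "" {
			headerSeen = true
			if strings.HasPrefix(line, "#EXTM3U") {
				attr := `url-tvg="` + epgURL + `"`
				if m := urlTvgRe.FindStringSubmatch(line); m != nil {
					origTvg = m[1]
					line = urlTvgRe.ReplaceAllLiteralString(line, attr)
				} else {
					// Append the attribute before the line terminator.
					body := strings.TrimRight(line, "\r\n")
					line = body + " " + attr + line[len(body):]
				}
			}
		}
		if strings.HasPrefix(line, "#EXTINF:") {
			if m := tvgIDRe.FindStringSubmatch(line); m != nil && m[1] != "" {
				ids[m[1]] = struct{}{}
			}
		}
		if _, werr := io.WriteString(dst, line); werr != nil {
			return "", nil, werr
		}
		if err == io.EOF {
			return origTvg, ids, nil
		}
	}
}

// rewriteString is a small in-memory wrapper for examples and tests.
func rewriteString(in, epgURL string) (out, orig string, ids map[string]struct{}, err error) {
	var b strings.Builder
	orig, ids, err = rewritePlaylist(&b, strings.NewReader(in), epgURL)
	return b.String(), orig, ids, err
}

func main() {
	in := "#EXTM3U url-tvg=\"http://old.example/epg.xml\"\n" +
		"#EXTINF:-1 tvg-id=\"Italia 1\",Italia 1\nhttp://stream.example/1\n"
	out, orig, ids, err := rewriteString(in, "https://relay.example/epg")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("original url-tvg: %s, ids: %d\n%s", orig, len(ids), out)
}
```

In the worker, `dst` would presumably be the temp file that is later renamed into place, so every non-header line reaches disk unchanged.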
+
+### Configuration (env vars)
+
+| Var | Purpose | Default |
+| --- | --- | --- |
+| `PZ8_RELAY_TARGET_URL` | upstream m3u_plus playlist URL | required |
+| `PZ8_RELAY_USERNAME` / `PZ8_RELAY_PASSWORD` | basic auth | required |
+| `PZ8_RELAY_LISTEN_ADDR` | listen address | `:8080` |
+| `PZ8_RELAY_PUBLIC_URL` | public base URL of this relay (used to rewrite `url-tvg` to `/epg`) | required when EPG is enabled |
+| `PZ8_RELAY_EPG_URLS` | comma-separated list of EPG URLs (XMLTV, optionally gzipped) | empty (EPG disabled) |
+| `PZ8_RELAY_EPG_REFRESH` | refresh interval (Go duration, e.g. `12h`) | `12h` |
+| `PZ8_RELAY_PREFER_PLAYLIST_EPG` | use the playlist's own `url-tvg` when present (see below) | `false` |
+| `PZ8_RELAY_CACHE_DIR` | directory for the cached playlist + EPG file | `/var/cache/pz8-relay` |
+| `PZ8_RELAY_EPG_CONTENT_TYPE` | `Content-Type` header used by `/epg` | `application/xml` |
+
+EPG functionality is enabled iff `PZ8_RELAY_EPG_URLS` is non-empty **or**
+`PZ8_RELAY_PREFER_PLAYLIST_EPG=true`. When disabled, the relay falls back to
+the original behaviour (cached playlist served unchanged).
+
+### `PREFER_PLAYLIST_EPG` semantics
+
+- `false` _(default)_: only `PZ8_RELAY_EPG_URLS` are used. Any `url-tvg` in the
+  upstream playlist is ignored for fetching purposes (it is still rewritten in
+  the served playlist).
+- `true`: if the upstream playlist declares `url-tvg=URL`, that URL becomes the
+  _only_ EPG source for this refresh cycle. If the playlist has no `url-tvg`,
+  fall back to `PZ8_RELAY_EPG_URLS`.
+
+This is a deliberate "either/or" interpretation of "prefer". A future
+`MERGE_PLAYLIST_EPG=true` mode could add the playlist EPG on top of configured
+URLs if needed.
+
+### Playlist processing
+
+Streamed line-by-line. Two responsibilities:
+
+1. **Rewrite the `#EXTM3U` header** (always the first non-empty line):
+   - If it has `url-tvg="..."`, replace the URL with `/epg`.
+   - If it does not, append `url-tvg="/epg"` (only when EPG is
+     enabled).
+   - Capture the original `url-tvg` value (if any) for use by
+     `PREFER_PLAYLIST_EPG`.
+2. **Extract tvg-ids** from each `#EXTINF:` line via a small regex
+   (`tvg-id="([^"]*)"`). Empty/missing ids are skipped.
+
+Output is written to `/playlist.m3u.tmp` and `os.Rename`d into place
+once the stream completes.
+
+### EPG fetching, parsing, merging
+
+Per source URL:
+
+1. HTTP GET; if the response or URL ends in `.gz` (or `Content-Encoding: gzip`),
+   wrap the body in `gzip.NewReader`.
+2. Stream the body to a per-source uncompressed temp file
+   (`/epg-src-.xml.tmp`). Disk-buffered, not in memory.
+3. After all sources are downloaded, run a **two-pass merge** writing into
+   `/epg.xml.tmp`:
+   - **Pass 1: channels.** For each source in order, stream `<channel>`
+     elements with `xml.Decoder.Token`. For each channel:
+     - Normalize the id (see below).
+     - If it maps to a playlist tvg-id and the canonical id has not yet been
+       seen, rewrite the `id` attribute to the canonical (playlist) form and
+       emit it. Record `canonical_id → source_index` so programmes from
+       _other_ sources for that channel can be ignored.
+     - Otherwise drop.
+   - **Pass 2: programmes.** For each source, stream `<programme>` elements.
+     Emit only those whose normalized `channel` attribute maps to a canonical
+     id owned by this source, after rewriting the `channel` attribute.
+4. The merged output is written through a `gzip.Writer` so the canonical
+   on-disk artifact is `/epg.xml.gz`. `os.Rename` it into place.
+5. Delete the per-source temp files.
+
+Two passes over disk-buffered files (rather than streaming all sources in one
+pass) is the simplest correct approach: XMLTV files emit all channels before
+all programmes, but the relative order across sources is unconstrained. The
+disk overhead is bounded (tens of MB) and acceptable.
+
+### Serving the EPG with gzip
+
+The on-disk file is gzipped (XMLTV compresses ~10×, which is meaningful given
+typical sizes). The `/epg` handler dispatches based on `Accept-Encoding`:
+
+- **Client sends `gzip`**: serve `/epg.xml.gz` directly with
+  `Content-Encoding: gzip` and the configured `Content-Type`, via
+  `http.ServeContent` (handles `Last-Modified` / `If-Modified-Since`
+  automatically).
+- **Client does not**: open the file, wrap in `gzip.NewReader`, set
+  `Last-Modified` from the file mtime, handle `If-Modified-Since` manually
+  (one stat + one comparison), and stream the decompressed bytes out. No
+  `Content-Length` in this branch.
+
+Most modern client apps and HTTP libraries advertise gzip, so the cheap path
+is the common path.
+
+### Channel id normalization
+
+Goal: collapse "Italia 1" / "Italia 1.it" / "italia1" to the same key, then map
+each EPG channel to the playlist's exact `tvg-id` value.
+
+Algorithm (`idnorm.go`):
+
+1. Lowercase.
+2. Strip a trailing dotted country code if present (`.it`, `.uk`, `.de`,
+   `.com`, `.us`; a small explicit allow-list to avoid eating real id chars).
+3. Remove all non-alphanumeric characters.
+4. The result is the _normalization key_.
+
+At refresh time, build a map `normKey → playlist_tvg_id` from the playlist's
+extracted ids. During EPG processing, normalize each EPG channel id, look it
+up in this map, and rewrite it to the playlist value. Misses are dropped.
+
+This is heuristic and will mis-match in edge cases. We accept that for v1.
+A future `PZ8_RELAY_CHANNEL_MAP` file (manual `epg_id → tvg_id` overrides)
+is the planned escape hatch if the normalizer proves insufficient; it will not
+be built until the need is real.
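The normalization steps and the `normKey → playlist_tvg_id` lookup can be sketched as follows. This is an illustration, not the committed implementation: `normalizeID`, `buildIndex`, and `countrySuffixes` are hypothetical names, "alphanumeric" is read here as Unicode letters and digits, and keeping the first playlist id when two ids collide on the same key is an assumption the ADR does not specify.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// countrySuffixes mirrors the explicit allow-list given in the ADR.
var countrySuffixes = []string{".it", ".uk", ".de", ".com", ".us"}

// normalizeID lowercases id, strips one allow-listed trailing suffix,
// and removes every character that is not a letter or digit.
func normalizeID(id string) string {
	s := strings.ToLower(id)
	for _, suf := range countrySuffixes {
		if strings.HasSuffix(s, suf) {
			s = strings.TrimSuffix(s, suf)
			break
		}
	}
	var b strings.Builder
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// buildIndex maps normalization keys to the playlist's exact tvg-id values.
// Assumption: on a key collision the first playlist id wins.
func buildIndex(tvgIDs []string) map[string]string {
	idx := make(map[string]string, len(tvgIDs))
	for _, id := range tvgIDs {
		k := normalizeID(id)
		if k == "" {
			continue
		}
		if _, dup := idx[k]; !dup {
			idx[k] = id
		}
	}
	return idx
}

func main() {
	idx := buildIndex([]string{"Italia 1", "Rai 1"})
	for _, epgID := range []string{"Italia 1.it", "italia1", "RAI-1.it", "Canale 5.it"} {
		canonical, ok := idx[normalizeID(epgID)]
		fmt.Printf("%q -> %q (matched=%v)\n", epgID, canonical, ok)
	}
}
```

An EPG id that misses the index, such as `Canale 5.it` here, corresponds to a channel that the merge would drop.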
+
+### HTTP endpoints
+
+| Path | Method | Auth | Behaviour |
+| --- | --- | --- | --- |
+| `/playlist` | GET | basic auth | serves `/playlist.m3u` via `http.ServeContent`; `Content-Type: application/vnd.apple.mpegurl` |
+| `/epg` | GET | basic auth | serves the merged EPG; `Content-Type` from `PZ8_RELAY_EPG_CONTENT_TYPE` (default `application/xml`). Honors `Accept-Encoding: gzip` (see "Serving the EPG with gzip") |
+| `/healthz` | GET | none | 200 if both cache files exist and are < 2× refresh interval old, else 503 |
+
+For backwards compatibility with existing devices, `/` continues to serve the
+playlist (alias for `/playlist`).
+
+`Last-Modified` and `If-Modified-Since` are handled automatically by
+`http.ServeContent`, which is useful for client devices that poll.
+
+### Failure modes
+
+- **First refresh has not completed yet.** `/playlist` and `/epg` return 503
+  with a `Retry-After: 30` header.
+- **Subsequent refresh fails.** Keep serving the previous file; log at WARN.
+  The next tick retries.
+- **Single EPG source fails** (within an otherwise successful refresh).
+  Continue with the remaining sources; log at WARN. The merged output reflects
+  what we did get.
+- **Playlist refresh fails but EPG is configured.** Skip the EPG refresh for
+  this cycle (we cannot filter without a fresh tvg-id set; reusing the
+  previous one is acceptable as a future improvement but adds state).
+
+### Concurrency model
+
+- Single background goroutine drives the refresh loop; no fan-out.
+- HTTP handlers only do `http.ServeFile`-style reads on stable paths.
+- Writes are atomic via `os.Rename`, so readers either see the old file or the
+  new file, never a partial one. No locks needed.
+- A `sync.Once`-guarded "first refresh done" channel lets handlers respond 503
+  cleanly until the first cycle completes.
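The two mechanisms in the concurrency model, write-then-rename and the one-shot readiness channel, can be sketched as below. All names (`writeAtomic`, `readyGate`, `requireReady`, `roundTrip`) are hypothetical and the file names are placeholders. The sketch assumes the temp file and the final file live in the same directory, which is what makes `os.Rename` atomic on POSIX filesystems.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strings"
	"sync"
)

// writeAtomic streams src into path+".tmp" and renames it over path, so
// readers see either the old file or the complete new one.
func writeAtomic(path string, src io.Reader) error {
	tmpPath := path + ".tmp"
	f, err := os.Create(tmpPath)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, src); err != nil {
		f.Close()
		os.Remove(tmpPath)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmpPath)
		return err
	}
	return os.Rename(tmpPath, path)
}

// readyGate is a channel that is closed exactly once, after the first refresh.
type readyGate struct {
	once sync.Once
	ch   chan struct{}
}

func newReadyGate() *readyGate { return &readyGate{ch: make(chan struct{})} }

// markReady may be called after every refresh; only the first call closes ch.
func (g *readyGate) markReady() { g.once.Do(func() { close(g.ch) }) }

func (g *readyGate) isReady() bool {
	select {
	case <-g.ch:
		return true
	default:
		return false
	}
}

// requireReady answers 503 with Retry-After until the gate is open.
func (g *readyGate) requireReady(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !g.isReady() {
			w.Header().Set("Retry-After", "30")
			http.Error(w, "cache not ready", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// roundTrip writes content atomically into a fresh temp dir and reads it back.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "relay-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "playlist.m3u")
	if err := writeAtomic(path, strings.NewReader(content)); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	gate := newReadyGate()
	fmt.Println("ready before refresh:", gate.isReady())
	got, err := roundTrip("#EXTM3U\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	gate.markReady()
	fmt.Printf("ready after refresh: %v, cached bytes: %d\n", gate.isReady(), len(got))
}
```

Closing the channel rather than flipping a mutex-guarded flag keeps the handler-side check lock-free, in line with the "no locks needed" goal above.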
+
+### File layout
+
+Single Go module, multiple files in `package main` (the project is small):
+
+```
+main.go      // wiring, http server, basic-auth middleware, env parsing
+worker.go    // periodic refresh loop
+playlist.go  // streaming playlist read/rewrite + tvg-id extraction
+epg.go       // EPG fetch, gunzip, two-pass merge, id rewriting
+idnorm.go    // channel id normalization
+```
+
+If complexity grows we can promote to `internal/...` packages later.
+
+### Testing strategy
+
+Minimal, table-driven, focused on the parts where logic, not plumbing, can
+silently produce a wrong file. Go's `testdata/` convention houses fixtures.
+
+| File | What it covers |
+| --- | --- |
+| `idnorm_test.go` | Pure normalization. Table cases include `"Italia 1"` ↔ `"Italia 1.it"`, casing, embedded spaces/punctuation, the dotted-suffix allow-list (and ids that _look_ like they have a suffix but should be left alone). |
+| `playlist_test.go` | Stream the rewriter against small inline `#EXTM3U` inputs: (a) header with existing `url-tvg=` is replaced, (b) header without it gets one appended, (c) the original `url-tvg` value is captured for `PREFER_PLAYLIST_EPG`, (d) extracted `tvg-id` set matches expectation, (e) all non-header lines pass through byte-for-byte. |
+| `epg_test.go` | Merge two small XMLTV fixtures in `testdata/`: verify channel deduplication (first-source wins), id rewriting to canonical form, programmes filtered to known channels, programmes from the _non-owning_ source for a duplicate channel are dropped. Verify gunzip path by feeding one fixture as `.xml.gz`. |
+
+Out of scope for v1 tests:
+
+- HTTP wiring / handler glue (covered implicitly by manual smoke testing).
+- The `time.Ticker` worker loop (a thin shell around the testable units).
+- Filesystem atomicity (stdlib).
+
+Fixtures should be tiny (a handful of channels, a couple of programmes each)
+and committed under `testdata/`. No network in tests: every fetch path takes
+a `func(ctx) (io.ReadCloser, error)` or similar so tests inject readers.
+
+## Consequences
+
+**Positive**
+
+- Devices configured once with `/playlist` get both playlist and EPG
+  with no extra setup.
+- Upstream is hit on the refresh schedule, not per client request.
+- Atomic writes plus stable URLs make the relay robust to partial failures.
+- All processing is streaming/disk-buffered; memory footprint stays small
+  even as EPGs grow.
+
+**Negative**
+
+- We now own a cache directory. Container runs need a writable volume (the
+  distroless `nonroot` user must own the cache directory). The Dockerfile and
+  deployment must mount one or fall back to `/tmp`.
+- Channel id matching is heuristic; some channels will lack EPG data even
+  when both sides "know about" them.
+- Behaviour shift from live proxy to cached playlist is a semantic change:
+  if a stream endpoint inside the playlist is rotated by upstream, devices
+  see the old URL until the next refresh cycle.
+
+## Alternatives considered
+
+1. **Keep the playlist as a live reverse proxy and rewrite on the fly.**
+   Simpler in spirit but means re-fetching ~10 MB on every device poll, and
+   we still need a periodic playlist fetch for EPG filtering, so we end up
+   doing the work twice. Rejected.
+2. **Skip channel-id normalization; serve the EPG verbatim and ask the user
+   to fix client-side.** Defeats the purpose of "set it and forget it".
+   Rejected.
+3. **In-memory merged EPG.** A 50–200 MB string in memory is ugly for a
+   service that otherwise has a tiny footprint, and it loses the ability to
+   serve cleanly across restarts. Rejected.
+4. **Re-emit EPG as gzip on disk.** Saves disk and bandwidth but complicates
+   `If-Modified-Since` and conditional requests. Defer until measured.
+
+## Open questions
+
+_None outstanding._