# 1. EPG Management

- **Status:** Proposed
- **Date:** 2026-05-06

## Context

`pz8-relay` currently acts as a stable, authenticated reverse proxy for a single upstream IPTV playlist URL. The upstream URL changes frequently; the relay shields client devices from those changes.

We now want the relay to also handle EPG (Electronic Program Guide) data so that configured devices get TV listings without any extra client-side configuration:

1. The relayed URL points to an `m3u_plus` playlist. The playlist may or may not declare an EPG via the `url-tvg=` attribute on its `#EXTM3U` header.
2. We want to fetch one or more external EPGs (XMLTV / "TVXML" format, often distributed gzipped), merge them, and serve the result from the relay.
3. Channel identifiers diverge between sources: the playlist uses `tvg-id="Italia 1"` while open-epg uses `id="Italia 1.it"` (or similar). The merged EPG must use ids that match the playlist's `tvg-id` values, or client apps will not bind programmes to channels.
4. The playlist file is ~10 MB, and EPG files can be tens of MB once decompressed. All processing must stream, with no full file held in memory.

## Decision

### High-level pipeline

A background worker, driven by `time.Ticker`, periodically refreshes both the playlist and the EPG in a single sequential pipeline:

```
fetch playlist ──► rewrite header (url-tvg → /epg) ──► atomic write playlist.m3u
       │
       └─► extract tvg-id set (during stream)
                  │
                  ▼
       fetch EPG sources (configured + optional playlist EPG)
                  │
                  ▼
       gunzip + stream parse
                  │
                  ▼
       normalize ids + filter to tvg-id set
                  │
                  ▼
       atomic write epg.xml
```

HTTP handlers (`/playlist`, `/epg`) serve the most recent atomically-written file via `http.ServeContent`. They never block on the worker.

### Architectural shift: proxy → cache-and-serve (playlist only)

Today `/` is a streaming reverse proxy to the upstream playlist URL. We replace this with a periodically-refreshed local copy served from disk.

Rationale:

- We must rewrite the `#EXTM3U` header anyway (to redirect `url-tvg` to our `/epg` endpoint), and we need the playlist parsed to extract `tvg-id`s for filtering the EPG. Doing both per-request, on every client poll, is wasteful.
- A cached playlist is more resilient: if upstream is briefly down, devices still get the last-known-good file.
- Stream URLs _inside_ the playlist are unaffected. Clients still hit them directly upstream; we are not proxying actual video.

### Configuration (env vars)

| Var | Purpose | Default |
| --- | --- | --- |
| `PZ8_RELAY_TARGET_URL` | upstream m3u_plus playlist URL | required |
| `PZ8_RELAY_USERNAME` / `PZ8_RELAY_PASSWORD` | basic auth | required |
| `PZ8_RELAY_LISTEN_ADDR` | listen address | `:8080` |
| `PZ8_RELAY_PUBLIC_URL` | public base URL of this relay (used to rewrite `url-tvg` to `/epg`) | required when EPG is enabled |
| `PZ8_RELAY_EPG_URLS` | comma-separated list of EPG URLs (XMLTV, optionally gzipped) | empty (EPG disabled) |
| `PZ8_RELAY_EPG_REFRESH` | refresh interval (Go duration, e.g. `12h`) | `12h` |
| `PZ8_RELAY_PREFER_PLAYLIST_EPG` | use the playlist's own `url-tvg` when present (see below) | `false` |
| `PZ8_RELAY_CACHE_DIR` | directory for the cached playlist + EPG file | `/var/cache/pz8-relay` |
| `PZ8_RELAY_EPG_CONTENT_TYPE` | `Content-Type` header used by `/epg` | `application/xml` |

EPG functionality is enabled iff `PZ8_RELAY_EPG_URLS` is non-empty **or** `PZ8_RELAY_PREFER_PLAYLIST_EPG=true`. When EPG is disabled, the relay keeps its original behaviour from the client's point of view: the cached playlist is served unchanged.

### `PREFER_PLAYLIST_EPG` semantics

- `false` _(default)_: only `PZ8_RELAY_EPG_URLS` are used. Any `url-tvg` in the upstream playlist is ignored for fetching purposes (it is still rewritten in the served playlist).
- `true`: if the upstream playlist declares `url-tvg=URL`, that URL becomes the _only_ EPG source for this refresh cycle. If the playlist has no `url-tvg`, fall back to `PZ8_RELAY_EPG_URLS`.

This is a deliberate "either/or" interpretation of "prefer". A future `MERGE_PLAYLIST_EPG=true` mode could add the playlist EPG on top of configured URLs if needed.

### Playlist processing

Streamed line-by-line. Two responsibilities:

1. **Rewrite the `#EXTM3U` header** (always the first non-empty line):
   - If it has `url-tvg="..."`, replace the URL with `/epg`.
   - If it does not, append `url-tvg="/epg"` (only when EPG is enabled).
   - Capture the original `url-tvg` value (if any) for use by `PREFER_PLAYLIST_EPG`.
2. **Extract tvg-ids** from each `#EXTINF:` line via a small regex (`tvg-id="([^"]*)"`). Empty/missing ids are skipped.

Output is written to `/playlist.m3u.tmp` and `os.Rename`d into place once the stream completes.

### EPG fetching, parsing, merging

Per source URL:

1. HTTP GET; if the response or URL ends in `.gz` (or the response has `Content-Encoding: gzip`), wrap the body in `gzip.NewReader`.
2. Stream the body to a per-source uncompressed temp file (`/epg-src-.xml.tmp`). Disk-buffered, not in memory.
3. After all sources are downloaded, run a **two-pass merge** writing into `/epg.xml.tmp`:
   - **Pass 1 (channels).** For each source in order, stream `<channel>` elements with `xml.Decoder.Token`. For each channel:
     - Normalize the id (see "Channel id normalization").
     - If it maps to a playlist tvg-id and the canonical id has not yet been seen, rewrite the `id` attribute to the canonical (playlist) form and emit it. Record `canonical_id → source_index` so programmes from _other_ sources for that channel can be ignored.
     - Otherwise drop.
   - **Pass 2 (programmes).** For each source, stream `<programme>` elements. Emit only those whose normalized `channel` attribute maps to a canonical id owned by this source, after rewriting the `channel` attribute.
4. The merged output is written through a `gzip.Writer` so the canonical on-disk artifact is `/epg.xml.gz`. `os.Rename` it into place.
5. Delete the per-source temp files.

Two passes over disk-buffered files (rather than streaming all sources in one pass) is the simplest correct approach: XMLTV files emit all channels before all programmes, but the relative order across sources is unconstrained. The disk overhead is bounded (tens of MB) and acceptable.

### Serving the EPG with gzip

The on-disk file is gzipped (XMLTV compresses ~10×, which is meaningful given typical sizes). The `/epg` handler dispatches based on `Accept-Encoding`:

- **Client sends `gzip`**: serve `/epg.xml.gz` directly with `Content-Encoding: gzip` and the configured `Content-Type`, via `http.ServeContent` (handles `Last-Modified` / `If-Modified-Since` automatically).
- **Client does not**: open the file, wrap in `gzip.NewReader`, set `Last-Modified` from the file mtime, handle `If-Modified-Since` manually (one stat + one comparison), and stream the decompressed bytes out. No `Content-Length` in this branch.

Most modern client apps and HTTP libraries advertise gzip, so the cheap path is the common path.

### Channel id normalization

Goal: collapse "Italia 1" / "Italia 1.it" / "italia1" to the same key, then map each EPG channel to the playlist's exact `tvg-id` value.

Algorithm (`idnorm.go`):

1. Lowercase.
2. Strip a trailing dotted suffix if present (`.it`, `.uk`, `.de`, `.com`, `.us`; a small explicit allow-list of country codes and similar, to avoid eating real id characters).
3. Remove all non-alphanumeric characters.
4. The result is the _normalization key_.

At refresh time, build a map `normKey → playlist_tvg_id` from the playlist's extracted ids. During EPG processing, normalize each EPG channel id, look it up in this map, and rewrite it to the playlist value. Misses are dropped.

This is heuristic and will mis-match in edge cases. We accept that for v1.
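
The normalization steps and the `normKey → playlist_tvg_id` lookup can be sketched roughly as follows. This is an illustration under assumptions, not the final code: `normalizeID`, `buildIndex`, and `suffixAllowList` are hypothetical names, the leading/trailing whitespace trim is an addition, and letting the first playlist id win when two ids collapse to the same key is one possible policy.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// suffixAllowList mirrors the dotted suffixes named in the text.
var suffixAllowList = []string{".it", ".uk", ".de", ".com", ".us"}

// normalizeID returns the normalization key for a channel id:
// lowercase, strip one allow-listed trailing suffix, drop non-alphanumerics.
func normalizeID(id string) string {
	// TrimSpace is an assumption; it keeps a trailing space from hiding a suffix.
	s := strings.ToLower(strings.TrimSpace(id))
	for _, suf := range suffixAllowList {
		if len(s) > len(suf) && strings.HasSuffix(s, suf) {
			s = strings.TrimSuffix(s, suf)
			break
		}
	}
	var b strings.Builder
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// buildIndex maps normalization key -> exact playlist tvg-id.
// On key collisions the first playlist id wins (an assumption).
func buildIndex(tvgIDs []string) map[string]string {
	idx := make(map[string]string, len(tvgIDs))
	for _, id := range tvgIDs {
		key := normalizeID(id)
		if key == "" {
			continue
		}
		if _, seen := idx[key]; !seen {
			idx[key] = id
		}
	}
	return idx
}

func main() {
	idx := buildIndex([]string{"Italia 1", "Rai 1"})
	for _, epgID := range []string{"Italia 1.it", "italia1", "Canale 5.it"} {
		canonical, ok := idx[normalizeID(epgID)]
		fmt.Printf("%q -> %q (matched=%v)\n", epgID, canonical, ok)
	}
}
```

In this sketch a miss simply yields `ok == false`, which corresponds to dropping the channel during the merge.
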
A future `PZ8_RELAY_CHANNEL_MAP` file (manual `epg_id → tvg_id` overrides) is the planned escape hatch if the normalizer proves insufficient; it will not be built until the need is real.

### HTTP endpoints

| Path | Method | Auth | Behaviour |
| --- | --- | --- | --- |
| `/playlist` | GET | basic auth | serves `/playlist.m3u` via `http.ServeContent`; `Content-Type: application/vnd.apple.mpegurl` |
| `/epg` | GET | basic auth | serves the merged EPG; `Content-Type` from `PZ8_RELAY_EPG_CONTENT_TYPE` (default `application/xml`). Honors `Accept-Encoding: gzip` (see "Serving the EPG with gzip") |
| `/healthz` | GET | none | 200 if both cache files exist and are < 2× refresh interval old, else 503 |

For backwards compatibility with existing devices, `/` continues to serve the playlist (alias for `/playlist`).

`Last-Modified` and `If-Modified-Since` are handled automatically by `http.ServeContent`, which is useful for client devices that poll.

### Failure modes

- **First refresh has not completed yet.** `/playlist` and `/epg` return 503 with a `Retry-After: 30` header.
- **Subsequent refresh fails.** Keep serving the previous file; log at WARN. The next tick retries.
- **Single EPG source fails** (within an otherwise successful refresh). Continue with the remaining sources; log at WARN. The merged output reflects what we did get.
- **Playlist refresh fails but EPG is configured.** Skip the EPG refresh for this cycle (we cannot filter without a fresh tvg-id set; reusing the previous one is acceptable as a future improvement but adds state).

### Concurrency model

- Single background goroutine drives the refresh loop; no fan-out.
- HTTP handlers only do `http.ServeFile`-style reads on stable paths.
- Writes are atomic via `os.Rename`, so readers either see the old file or the new file, never a partial one. No locks needed.
- A `sync.Once`-guarded "first refresh done" channel lets handlers respond 503 cleanly until the first cycle completes.

### File layout

Single Go module, multiple files in `package main` (the project is small):

```
main.go      // wiring, http server, basic-auth middleware, env parsing
worker.go    // periodic refresh loop
playlist.go  // streaming playlist read/rewrite + tvg-id extraction
epg.go       // EPG fetch, gunzip, two-pass merge, id rewriting
idnorm.go    // channel id normalization
```

If complexity grows we can promote to `internal/...` packages later.

### Testing strategy

Minimal, table-driven, focused on the parts where logic, not plumbing, can silently produce a wrong file. Go's `testdata/` convention houses fixtures.

| File | What it covers |
| --- | --- |
| `idnorm_test.go` | Pure normalization. Table cases include `"Italia 1"` ↔ `"Italia 1.it"`, casing, embedded spaces/punctuation, the dotted-suffix allow-list (and ids that _look_ like they have a suffix but should be left alone). |
| `playlist_test.go` | Stream the rewriter against small inline `#EXTM3U` inputs: (a) header with existing `url-tvg=` is replaced, (b) header without it gets one appended, (c) the original `url-tvg` value is captured for `PREFER_PLAYLIST_EPG`, (d) extracted `tvg-id` set matches expectation, (e) all non-header lines pass through byte-for-byte. |
| `epg_test.go` | Merge two small XMLTV fixtures in `testdata/`: verify channel deduplication (first-source wins), id rewriting to canonical form, programmes filtered to known channels, programmes from the _non-owning_ source for a duplicate channel are dropped. Verify gunzip path by feeding one fixture as `.xml.gz`. |

Out of scope for v1 tests:

- HTTP wiring / handler glue (covered implicitly by manual smoke testing).
- The `time.Ticker` worker loop (a thin shell around the testable units).
- Filesystem atomicity (stdlib).

Fixtures should be tiny (a handful of channels, a couple of programmes each) and committed under `testdata/`. No network in tests: every fetch path takes a `func(ctx) (io.ReadCloser, error)` or similar so tests inject readers.

## Consequences

**Positive**

- Devices configured once with `/playlist` get both playlist and EPG with no extra setup.
- Upstream is hit on the refresh schedule, not per client request.
- Atomic writes plus stable URLs make the relay robust to partial failures.
- All processing is streaming/disk-buffered; memory footprint stays small even as EPGs grow.

**Negative**

- We now own a cache directory. Container runs need a writable volume (the distroless `nonroot` user must own the cache directory). The Dockerfile and deployment must mount one or fall back to `/tmp`.
- Channel id matching is heuristic; some channels will lack EPG data even when both sides "know about" them.
- Behaviour shift from live proxy to cached playlist is a semantic change: if a stream endpoint inside the playlist is rotated by upstream, devices see the old URL until the next refresh cycle.

## Alternatives considered

1. **Keep the playlist as a live reverse proxy and rewrite on the fly.** Simpler in spirit but means re-fetching ~10 MB on every device poll, and we still need a periodic playlist fetch for EPG filtering, so we end up doing the work twice. Rejected.
2. **Skip channel-id normalization; serve the EPG verbatim and ask the user to fix client-side.** Defeats the purpose of "set it and forget it". Rejected.
3. **In-memory merged EPG.** A 50–200 MB string in memory is ugly for a service that otherwise has a tiny footprint, and it loses the ability to serve cleanly across restarts. Rejected.
4. **Re-emit EPG as gzip on disk.** Saves disk and bandwidth but complicates `If-Modified-Since` and conditional requests. Defer until measured.

## Open questions

_None outstanding._