# 1. EPG Management
- **Status:** Proposed
- **Date:** 2026-05-06
## Context
`pz8-relay` currently acts as a stable, authenticated reverse proxy for a single
upstream IPTV playlist URL. The upstream URL changes frequently; the relay
shields client devices from those changes.
We now want the relay to also handle EPG (Electronic Program Guide) data so that
configured devices get TV listings without any extra client-side configuration:
1. The relayed URL points to an `m3u_plus` playlist. The playlist may or may
not declare an EPG via the `url-tvg=` attribute on its `#EXTM3U` header.
2. We want to fetch one or more external EPGs (XMLTV / "TVXML" format, often
distributed gzipped, e.g. <https://www.open-epg.com/files/italy1.xml.gz>),
merge them, and serve the result from the relay.
3. Channel identifiers diverge between sources: the playlist uses
`tvg-id="Italia 1"` while open-epg uses `id="Italia 1.it"` (or similar).
The merged EPG must use ids that match the playlist's `tvg-id` values, or
client apps will not bind programmes to channels.
4. The playlist file is ~10 MB, and EPG files can be tens of MB once
decompressed. All processing must stream — no full file in memory.
## Decision
### High-level pipeline
A background worker, driven by `time.Ticker`, periodically refreshes both the
playlist and the EPG in a single sequential pipeline:
```
fetch playlist ──► rewrite header (url-tvg → /epg) ──► atomic write playlist.m3u
└─► extract tvg-id set (during stream)
fetch EPG sources (configured + optional playlist EPG)
gunzip + stream parse
normalize ids + filter to tvg-id set
atomic write epg.xml.gz
```
HTTP handlers (`/playlist`, `/epg`) serve the most recent atomically-written
file via `http.ServeContent`. They never block on the worker.
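A rough sketch of the ticker-driven worker described above. The names `runWorker`, `refreshFunc` and `runN` are hypothetical, and running one cycle immediately at startup (rather than waiting a full interval) is an assumption this ADR does not state.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

// refreshFunc runs one playlist + EPG cycle (hypothetical signature).
type refreshFunc func(ctx context.Context) error

// runWorker runs refresh once at startup (an assumption), then on every tick
// until ctx is cancelled. Failures are only logged; the next tick retries.
func runWorker(ctx context.Context, interval time.Duration, refresh refreshFunc) {
	run := func() {
		if err := refresh(ctx); err != nil {
			log.Printf("WARN: refresh failed: %v", err)
		}
	}
	run()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if ctx.Err() != nil {
				return // cancelled while the tick was pending
			}
			run()
		}
	}
}

// runN is a demo helper: it cancels the worker after n cycles and returns
// how many cycles actually ran.
func runN(n int, interval time.Duration) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	count := 0
	runWorker(ctx, interval, func(context.Context) error {
		count++
		if count >= n {
			cancel()
		}
		return nil
	})
	return count
}

func main() {
	fmt.Println(runN(3, time.Millisecond)) // prints 3
}
```

Because the loop is sequential, a slow cycle simply delays the next one; `time.Ticker` drops ticks rather than queueing them.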
### Architectural shift: proxy → cache-and-serve (playlist only)
Today `/` is a streaming reverse proxy to the upstream playlist URL. We replace
this with a periodically-refreshed local copy served from disk. Rationale:
- We must rewrite the `#EXTM3U` header anyway (to redirect `url-tvg` to our
`/epg` endpoint), and we need the playlist parsed to extract `tvg-id`s for
filtering the EPG. Doing both per-request, on every client poll, is wasteful.
- A cached playlist is more resilient: if upstream is briefly down, devices
still get the last-known-good file.
- Stream URLs _inside_ the playlist are unaffected. Clients still hit them
directly upstream — we are not proxying actual video.
### Configuration (env vars)
| Var | Purpose | Default |
| ------------------------------------------- | --------------------------------------------------------------------------- | ---------------------------- |
| `PZ8_RELAY_TARGET_URL` | upstream m3u_plus playlist URL | required |
| `PZ8_RELAY_USERNAME` / `PZ8_RELAY_PASSWORD` | basic auth | required |
| `PZ8_RELAY_LISTEN_ADDR` | listen address | `:8080` |
| `PZ8_RELAY_PUBLIC_URL` | public base URL of this relay (used to rewrite `url-tvg` to `<public>/epg`) | required when EPG is enabled |
| `PZ8_RELAY_EPG_URLS` | comma-separated list of EPG URLs (XMLTV, optionally gzipped) | empty (EPG disabled) |
| `PZ8_RELAY_EPG_REFRESH` | refresh interval (Go duration, e.g. `12h`) | `12h` |
| `PZ8_RELAY_PREFER_PLAYLIST_EPG` | use the playlist's own `url-tvg` when present (see below) | `false` |
| `PZ8_RELAY_CACHE_DIR` | directory for the cached playlist + EPG file | `/var/cache/pz8-relay` |
| `PZ8_RELAY_EPG_CONTENT_TYPE` | `Content-Type` header used by `/epg` | `application/xml` |
EPG functionality is enabled iff `PZ8_RELAY_EPG_URLS` is non-empty **or**
`PZ8_RELAY_PREFER_PLAYLIST_EPG=true`. When disabled, the relay keeps its
original client-visible behaviour: the cached playlist is served with its
`#EXTM3U` header unchanged.
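As an illustration of the table and the enablement rule, the environment could be parsed along these lines. The `config` struct, `loadConfig`, and `splitURLs` are hypothetical names; treating only the literal string `true` as enabling `PREFER_PLAYLIST_EPG` is an assumption, and only a subset of the variables is shown.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
	"time"
)

// config holds a subset of the settings from the table (illustrative names).
type config struct {
	TargetURL         string
	PublicURL         string
	EPGURLs           []string
	EPGRefresh        time.Duration
	PreferPlaylistEPG bool
}

// epgEnabled implements "EPG_URLS non-empty or PREFER_PLAYLIST_EPG=true".
func (c config) epgEnabled() bool {
	return len(c.EPGURLs) > 0 || c.PreferPlaylistEPG
}

// splitURLs parses a comma-separated list, trimming blanks and dropping empties.
func splitURLs(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// loadConfig reads settings through getenv so tests can pass a map-backed lookup.
func loadConfig(getenv func(string) string) (config, error) {
	c := config{
		TargetURL:         getenv("PZ8_RELAY_TARGET_URL"),
		PublicURL:         strings.TrimRight(getenv("PZ8_RELAY_PUBLIC_URL"), "/"),
		EPGURLs:           splitURLs(getenv("PZ8_RELAY_EPG_URLS")),
		EPGRefresh:        12 * time.Hour, // documented default
		PreferPlaylistEPG: getenv("PZ8_RELAY_PREFER_PLAYLIST_EPG") == "true",
	}
	if c.TargetURL == "" {
		return c, errors.New("PZ8_RELAY_TARGET_URL is required")
	}
	if v := getenv("PZ8_RELAY_EPG_REFRESH"); v != "" {
		d, err := time.ParseDuration(v)
		if err != nil {
			return c, fmt.Errorf("PZ8_RELAY_EPG_REFRESH: %w", err)
		}
		c.EPGRefresh = d
	}
	if c.epgEnabled() && c.PublicURL == "" {
		return c, errors.New("PZ8_RELAY_PUBLIC_URL is required when EPG is enabled")
	}
	return c, nil
}

func main() {
	c, err := loadConfig(os.Getenv)
	fmt.Println(c.epgEnabled(), err)
}
```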
### `PREFER_PLAYLIST_EPG` semantics
- `false` _(default)_: only `PZ8_RELAY_EPG_URLS` are used. Any `url-tvg` in the
upstream playlist is ignored for fetching purposes (it is still rewritten in
the served playlist).
- `true`: if the upstream playlist declares `url-tvg=URL`, that URL becomes the
_only_ EPG source for this refresh cycle. If the playlist has no `url-tvg`,
fall back to `PZ8_RELAY_EPG_URLS`.
This is a deliberate "either/or" interpretation of "prefer". A future
`MERGE_PLAYLIST_EPG=true` mode could add the playlist EPG on top of configured
URLs if needed.
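The either/or rule is small enough to state as a single function. This is only a sketch; `selectSources` and its parameter names are hypothetical.

```go
package main

import "fmt"

// selectSources returns the EPG URLs to fetch for one refresh cycle.
// playlistTVG is the original url-tvg captured from the upstream header
// ("" if the playlist declares none).
func selectSources(prefer bool, playlistTVG string, configured []string) []string {
	if prefer && playlistTVG != "" {
		// Either/or: the playlist EPG replaces the configured list.
		return []string{playlistTVG}
	}
	return configured
}

func main() {
	cfg := []string{"https://www.open-epg.com/files/italy1.xml.gz"}
	fmt.Println(selectSources(false, "http://upstream.example/epg.xml", cfg))
	fmt.Println(selectSources(true, "http://upstream.example/epg.xml", cfg))
	fmt.Println(selectSources(true, "", cfg))
}
```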
### Playlist processing
Streamed line-by-line. Two responsibilities:
1. **Rewrite the `#EXTM3U` header** (always the first non-empty line):
   - If it has `url-tvg="..."`, replace the URL with `<PUBLIC_URL>/epg`.
   - If it does not, append `url-tvg="<PUBLIC_URL>/epg"` (only when EPG is
     enabled).
   - Capture the original `url-tvg` value (if any) for use by
     `PREFER_PLAYLIST_EPG`.
2. **Extract tvg-ids** from each `#EXTINF:` line via a small regex
   (`tvg-id="([^"]*)"`). Empty/missing ids are skipped.
Output is written to `<cache>/playlist.m3u.tmp` and `os.Rename`d into place
once the stream completes.
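The two responsibilities above can be combined in one streaming pass, roughly as sketched here. The function names are hypothetical, the sketch assumes EPG is enabled (so the header is always rewritten or extended), and the atomic temp-file write is left to the caller.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"regexp"
	"strings"
)

var (
	urlTVGRe = regexp.MustCompile(`url-tvg="([^"]*)"`)
	tvgIDRe  = regexp.MustCompile(`tvg-id="([^"]*)"`)
)

// rewriteHeader points url-tvg at epgURL (appending the attribute if absent)
// and returns the new line plus the original url-tvg value, if any.
func rewriteHeader(line, epgURL string) (string, string) {
	attr := `url-tvg="` + epgURL + `"`
	if m := urlTVGRe.FindStringSubmatch(line); m != nil {
		return urlTVGRe.ReplaceAllLiteralString(line, attr), m[1]
	}
	body := strings.TrimRight(line, "\r\n")
	return body + " " + attr + line[len(body):], "" // keep the original line ending
}

// rewritePlaylist streams src to dst line by line. It returns the original
// url-tvg and the set of non-empty tvg-ids found on #EXTINF lines.
func rewritePlaylist(dst io.Writer, src io.Reader, epgURL string) (string, map[string]struct{}, error) {
	ids := make(map[string]struct{})
	origTVG, headerDone := "", false
	w, r := bufio.NewWriter(dst), bufio.NewReader(src)
	for {
		line, err := r.ReadString('\n') // includes the terminator when present
		if !headerDone && strings.TrimSpace(line) != "" {
			headerDone = true
			if strings.HasPrefix(line, "#EXTM3U") {
				line, origTVG = rewriteHeader(line, epgURL)
			}
		}
		if strings.HasPrefix(line, "#EXTINF:") {
			if m := tvgIDRe.FindStringSubmatch(line); m != nil && m[1] != "" {
				ids[m[1]] = struct{}{}
			}
		}
		if _, werr := w.WriteString(line); werr != nil {
			return "", nil, werr
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", nil, err
		}
	}
	return origTVG, ids, w.Flush()
}

// rewriteString is an in-memory wrapper for demos and tests.
func rewriteString(in, epgURL string) (string, string, map[string]struct{}, error) {
	var out strings.Builder
	orig, ids, err := rewritePlaylist(&out, strings.NewReader(in), epgURL)
	return out.String(), orig, ids, err
}

func main() {
	in := "#EXTM3U\n#EXTINF:-1 tvg-id=\"Italia 1\",Italia 1\nhttp://upstream.example/1.ts\n"
	out, orig, ids, err := rewriteString(in, "https://relay.example/epg")
	fmt.Printf("%q %q %d %v\n", out, orig, len(ids), err)
}
```

`bufio.Reader.ReadString` is used instead of `bufio.Scanner` so that very long lines do not hit the scanner's default token limit and line endings pass through unchanged.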
### EPG fetching, parsing, merging
Per source URL:
1. HTTP GET; if the URL path ends in `.gz` or the response carries
   `Content-Encoding: gzip`, wrap the body in `gzip.NewReader`.
2. Stream the body to a per-source uncompressed temp file
(`<cache>/epg-src-<n>.xml.tmp`). Disk-buffered, not in memory.
3. After all sources are downloaded, run a **two-pass merge** writing into
   `<cache>/epg.xml.gz.tmp`:
   - **Pass 1: channels.** For each source in order, stream `<channel>`
     elements with `xml.Decoder.Token`. For each channel:
     - Normalize the id (see below).
     - If it maps to a playlist tvg-id and the canonical id has not yet been
       seen, rewrite the `id` attribute to the canonical (playlist) form and
       emit it. Record `canonical_id → source_index` so programmes from
       _other_ sources for that channel can be ignored.
     - Otherwise drop.
   - **Pass 2: programmes.** For each source, stream `<programme>` elements.
     Emit only those whose normalized `channel` attribute maps to a canonical
     id owned by this source, after rewriting the `channel` attribute.
4. The merged output is written through a `gzip.Writer` so the canonical
on-disk artifact is `<cache>/epg.xml.gz`. `os.Rename` it into place.
5. Delete the per-source temp files.
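The merge step can be sketched with `encoding/xml` tokens as below. All names are hypothetical; `lookup` stands in for "normalize, then map to the playlist tvg-id", the sources are any `io.ReadSeeker` (the temp files in this design), and the gzip writer, root attributes and non-UTF-8 charsets are left out.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// copyPass streams one source and re-emits the <elem> children of <tv> that it owns.
func copyPass(enc *xml.Encoder, src io.Reader, elem, attr string, i int, owner map[string]int, lookup func(string) (string, bool)) error {
	dec := xml.NewDecoder(src)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local == "tv" {
			continue // step into the root; ignore whitespace and end tags
		}
		keep := false
		for j, a := range start.Attr {
			if start.Name.Local != elem || a.Name.Local != attr {
				continue
			}
			if canon, known := lookup(a.Value); known {
				start.Attr[j].Value = canon // rewrite to the playlist form
				o, owned := owner[canon]
				if elem == "channel" && !owned {
					owner[canon], keep = i, true // first source wins
				} else if elem == "programme" {
					keep = owned && o == i
				}
			}
		}
		// Walk the whole subtree; tokens are copied only for kept elements.
		var t xml.Token = start
		for depth := 1; ; {
			if keep {
				if err := enc.EncodeToken(t); err != nil {
					return err
				}
			}
			if depth == 0 {
				break
			}
			if t, err = dec.Token(); err != nil {
				return err
			}
			switch t.(type) {
			case xml.StartElement:
				depth++
			case xml.EndElement:
				depth--
			}
		}
	}
}

// mergeEPG writes all kept channels (pass 1), then all kept programmes (pass 2).
func mergeEPG(dst io.Writer, sources []io.ReadSeeker, lookup func(string) (string, bool)) error {
	enc := xml.NewEncoder(dst)
	root := xml.StartElement{Name: xml.Name{Local: "tv"}}
	if err := enc.EncodeToken(root); err != nil {
		return err
	}
	owner := map[string]int{} // canonical id -> index of the owning source
	for _, p := range [][2]string{{"channel", "id"}, {"programme", "channel"}} {
		for i, src := range sources {
			if _, err := src.Seek(0, io.SeekStart); err != nil {
				return err
			}
			if err := copyPass(enc, src, p[0], p[1], i, owner, lookup); err != nil {
				return err
			}
		}
	}
	if err := enc.EncodeToken(root.End()); err != nil {
		return err
	}
	return enc.Flush()
}

// mergeStrings is an in-memory wrapper for demos and tests.
func mergeStrings(docs []string, lookup func(string) (string, bool)) (string, error) {
	var out strings.Builder
	srcs := make([]io.ReadSeeker, len(docs))
	for i, d := range docs {
		srcs[i] = strings.NewReader(d)
	}
	err := mergeEPG(&out, srcs, lookup)
	return out.String(), err
}

func main() {
	doc := `<tv><channel id="a.it"><display-name>A</display-name></channel></tv>`
	fmt.Println(mergeStrings([]string{doc}, func(id string) (string, bool) { return "a", id == "a.it" }))
}
```

Only the `owner` map is held in memory; element bodies are copied token by token, which keeps the footprint independent of EPG size.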
Two passes over disk-buffered files (rather than streaming all sources in one
pass) is the simplest correct approach: XMLTV files emit all channels before
all programmes, but the relative order across sources is unconstrained. The
disk overhead is bounded (tens of MB) and acceptable.
### Serving the EPG with gzip
The on-disk file is gzipped (XMLTV compresses ~10×, which is meaningful given
typical sizes). The `/epg` handler dispatches based on `Accept-Encoding`:
- **Client sends `gzip`**: serve `<cache>/epg.xml.gz` directly with
`Content-Encoding: gzip` and the configured `Content-Type`, via
`http.ServeContent` (handles `Last-Modified` / `If-Modified-Since`
automatically).
- **Client does not**: open the file, wrap in `gzip.NewReader`, set
`Last-Modified` from the file mtime, handle `If-Modified-Since` manually
(one stat + one comparison), and stream the decompressed bytes out. No
`Content-Length` in this branch.
Most modern client apps and HTTP libraries advertise gzip, so the cheap path
is the common path.
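The `Accept-Encoding` dispatch for `/epg` might look like the sketch below. `acceptsGzip` and `epgHandler` are hypothetical names, the header parsing is deliberately simplified (no wildcard handling), and answering 503 when the file cannot be opened is an assumption borrowed from the failure-mode rules.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"os"
	"strconv"
	"strings"
	"time"
)

// acceptsGzip reports whether an Accept-Encoding value lists gzip with q > 0.
func acceptsGzip(header string) bool {
	for _, part := range strings.Split(header, ",") {
		fields := strings.Split(part, ";")
		if !strings.EqualFold(strings.TrimSpace(fields[0]), "gzip") {
			continue
		}
		for _, p := range fields[1:] {
			p = strings.TrimSpace(p)
			if strings.HasPrefix(p, "q=") {
				if q, err := strconv.ParseFloat(p[2:], 64); err == nil && q == 0 {
					return false // explicitly refused
				}
			}
		}
		return true
	}
	return false
}

// epgHandler serves the gzipped EPG at path, decompressing on the fly for
// clients that do not advertise gzip.
func epgHandler(path, contentType string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		f, err := os.Open(path)
		if err != nil {
			w.Header().Set("Retry-After", "30")
			http.Error(w, "EPG not ready", http.StatusServiceUnavailable)
			return
		}
		defer f.Close()
		info, err := f.Stat()
		if err != nil {
			http.Error(w, "stat failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", contentType)
		w.Header().Add("Vary", "Accept-Encoding")
		if acceptsGzip(r.Header.Get("Accept-Encoding")) {
			// Cheap path: send the compressed bytes as stored.
			w.Header().Set("Content-Encoding", "gzip")
			http.ServeContent(w, r, "", info.ModTime(), f)
			return
		}
		// Manual conditional request: HTTP dates have one-second resolution.
		mod := info.ModTime().Truncate(time.Second)
		if ims, err := http.ParseTime(r.Header.Get("If-Modified-Since")); err == nil && !mod.After(ims) {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		zr, err := gzip.NewReader(f)
		if err != nil {
			http.Error(w, "corrupt EPG cache", http.StatusInternalServerError)
			return
		}
		defer zr.Close()
		w.Header().Set("Last-Modified", mod.UTC().Format(http.TimeFormat))
		_, _ = io.Copy(w, zr) // no Content-Length; the body is streamed
	}
}

func main() {
	fmt.Println(acceptsGzip("gzip, deflate"), acceptsGzip("identity"))
	_ = epgHandler("/var/cache/pz8-relay/epg.xml.gz", "application/xml")
}
```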
### Channel id normalization
Goal: collapse "Italia 1" / "Italia 1.it" / "italia1" to the same key, then map
each EPG channel to the playlist's exact `tvg-id` value.
Algorithm (`idnorm.go`):
1. Lowercase.
2. Strip a trailing dotted suffix if present (`.it`, `.uk`, `.de`, `.com`,
`.us`; a small explicit allow-list to avoid eating real id chars).
3. Remove all non-alphanumeric characters.
4. The result is the _normalization key_.
At refresh time, build a map `normKey → playlist_tvg_id` from the playlist's
extracted ids. During EPG processing, normalize each EPG channel id, look it
up in this map, and rewrite it to the playlist value. Misses are dropped.
This is heuristic and will mis-match in edge cases. We accept that for v1.
A future `PZ8_RELAY_CHANNEL_MAP` file (manual `epg_id → tvg_id` overrides)
is the planned escape hatch if the normalizer proves insufficient — not built
until the need is real.
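A minimal sketch of the normalizer and the lookup map follows. The function names are hypothetical. Two choices are assumptions not fixed by this ADR: "alphanumeric" is taken to mean Unicode letters and digits, and when two playlist ids collapse to the same key the first one wins.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// suffixes is the explicit allow-list of trailing dotted suffixes to strip.
var suffixes = []string{".it", ".uk", ".de", ".com", ".us"}

// normalize lowercases id, strips one allow-listed suffix, and removes every
// character that is not a letter or digit.
func normalize(id string) string {
	s := strings.ToLower(strings.TrimSpace(id))
	for _, suf := range suffixes {
		if len(s) > len(suf) && strings.HasSuffix(s, suf) {
			s = strings.TrimSuffix(s, suf)
			break
		}
	}
	var b strings.Builder
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// buildIndex maps normalization key -> exact playlist tvg-id.
// On key collisions the first playlist id wins (an assumption).
func buildIndex(tvgIDs []string) map[string]string {
	idx := make(map[string]string, len(tvgIDs))
	for _, id := range tvgIDs {
		key := normalize(id)
		if _, dup := idx[key]; key != "" && !dup {
			idx[key] = id
		}
	}
	return idx
}

// canonical resolves an EPG channel id to the playlist tvg-id, if any.
func canonical(idx map[string]string, epgID string) (string, bool) {
	id, ok := idx[normalize(epgID)]
	return id, ok
}

func main() {
	idx := buildIndex([]string{"Italia 1", "Rai 1"})
	fmt.Println(normalize("Italia 1.it"))
	fmt.Println(canonical(idx, "Italia 1.it"))
	fmt.Println(canonical(idx, "Canale 5.it"))
}
```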
### HTTP endpoints
| Path | Method | Auth | Behaviour |
| ----------- | ------ | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `/playlist` | GET | basic auth | serves `<cache>/playlist.m3u` via `http.ServeContent`; `Content-Type: application/vnd.apple.mpegurl` |
| `/epg` | GET | basic auth | serves the merged EPG; `Content-Type` from `PZ8_RELAY_EPG_CONTENT_TYPE` (default `application/xml`). Honors `Accept-Encoding: gzip` (see "Serving the EPG with gzip") |
| `/healthz` | GET | none | 200 if both cache files exist and are < 2× refresh interval old, else 503 |
For backwards compatibility with existing devices, `/` continues to serve the
playlist (alias for `/playlist`).
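`Last-Modified` and `If-Modified-Since` are handled automatically wherever
`http.ServeContent` is used (the playlist and the gzip `/epg` path); the
decompressing `/epg` branch handles them manually. This is useful for client
devices that poll.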
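The `/healthz` rule from the table reduces to an age comparison per cache file. In this sketch `fresh`, `healthStatus` and `healthzHandler` are hypothetical names, and the list of paths to check is supplied by the caller.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// fresh reports whether a file of the given age is younger than 2× the
// refresh interval.
func fresh(age, interval time.Duration) bool {
	return age < 2*interval
}

// healthStatus returns 200 only if every path exists and is fresh at time now.
func healthStatus(paths []string, now time.Time, interval time.Duration) int {
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil || !fresh(now.Sub(info.ModTime()), interval) {
			return http.StatusServiceUnavailable
		}
	}
	return http.StatusOK
}

// healthzHandler wires healthStatus into net/http.
func healthzHandler(paths []string, interval time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(healthStatus(paths, time.Now(), interval))
	}
}

func main() {
	fmt.Println(fresh(23*time.Hour, 12*time.Hour)) // true
	fmt.Println(fresh(24*time.Hour, 12*time.Hour)) // false
	fmt.Println(healthStatus([]string{"/nonexistent/playlist.m3u"}, time.Now(), 12*time.Hour))
}
```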
### Failure modes
- **First refresh has not completed yet.** `/playlist` and `/epg` return 503
with a `Retry-After: 30` header.
- **Subsequent refresh fails.** Keep serving the previous file; log at WARN.
The next tick retries.
- **Single EPG source fails** (within an otherwise successful refresh).
Continue with the remaining sources; log at WARN. The merged output reflects
what we did get.
- **Playlist refresh fails but EPG is configured.** Skip the EPG refresh for
this cycle (we cannot filter without a fresh tvg-id set; reusing the
previous one is acceptable as a future improvement but adds state).
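The per-cycle failure rules can be expressed with the steps injected as functions, which also keeps them testable. Everything here is a hypothetical sketch: the `steps` struct, `runCycle`, and `errUpstream` are illustrative names, and keeping the previous EPG when every source fails is an assumption, since this ADR does not cover that case.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var errUpstream = errors.New("upstream unavailable") // placeholder error for the demo

// steps bundles one cycle's operations as injectable functions (hypothetical).
type steps struct {
	refreshPlaylist func() (tvgIDs []string, err error)
	fetchSource     func(url string) error
	mergeEPG        func(fetched, tvgIDs []string) error
}

// runCycle applies the failure rules: a playlist failure skips the EPG step,
// while a single failed EPG source is logged and left out of the merge.
// It returns the sources that were merged.
func runCycle(s steps, sources []string) (merged []string, err error) {
	tvgIDs, err := s.refreshPlaylist()
	if err != nil {
		// Previous files stay in place; no fresh tvg-id set means no EPG refresh.
		return nil, fmt.Errorf("playlist refresh failed, skipping EPG: %w", err)
	}
	if len(sources) == 0 {
		return nil, nil // EPG disabled
	}
	for _, u := range sources {
		if err := s.fetchSource(u); err != nil {
			log.Printf("WARN: EPG source %s: %v", u, err)
			continue
		}
		merged = append(merged, u)
	}
	if len(merged) == 0 {
		// Assumption: with nothing fetched, keep serving the previous EPG.
		return nil, errors.New("no EPG source could be fetched")
	}
	return merged, s.mergeEPG(merged, tvgIDs)
}

func main() {
	s := steps{
		refreshPlaylist: func() ([]string, error) { return []string{"Italia 1"}, nil },
		fetchSource: func(u string) error {
			if u == "https://bad.example/epg.xml" {
				return errUpstream
			}
			return nil
		},
		mergeEPG: func(fetched, tvgIDs []string) error { return nil },
	}
	fmt.Println(runCycle(s, []string{"https://ok.example/epg.xml.gz", "https://bad.example/epg.xml"}))
}
```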
### Concurrency model
- Single background goroutine drives the refresh loop; no fan-out.
- HTTP handlers only do `http.ServeFile`-style reads on stable paths.
- Writes are atomic via `os.Rename`, so readers either see the old file or the
new file, never a partial one. No locks needed.
- A `sync.Once`-guarded "first refresh done" channel lets handlers respond 503
cleanly until the first cycle completes.
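The two building blocks of this model, the rename-based atomic write and the "first refresh done" gate, could be sketched as follows. `writeAtomic`, `readyGate` and `requireReady` are hypothetical names; the 503 and `Retry-After: 30` values are taken from the failure modes listed earlier.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"sync"
)

// writeAtomic streams content into path+".tmp" and renames it over path only
// if writing succeeded, so readers never observe a partial file.
func writeAtomic(path string, write func(io.Writer) error) error {
	tmp := path + ".tmp"
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	err = write(f)
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, path)
}

// readyGate is closed exactly once, after the first successful refresh.
type readyGate struct {
	once sync.Once
	ch   chan struct{}
}

func newReadyGate() *readyGate { return &readyGate{ch: make(chan struct{})} }

// markReady is safe to call after every cycle; only the first call has an effect.
func (g *readyGate) markReady() { g.once.Do(func() { close(g.ch) }) }

// ready reports, without blocking, whether the first refresh has completed.
func (g *readyGate) ready() bool {
	select {
	case <-g.ch:
		return true
	default:
		return false
	}
}

// requireReady answers 503 with Retry-After until the gate opens.
func (g *readyGate) requireReady(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !g.ready() {
			w.Header().Set("Retry-After", "30")
			http.Error(w, "first refresh pending", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	dir, err := os.MkdirTemp("", "pz8-relay-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	gate := newReadyGate()
	err = writeAtomic(filepath.Join(dir, "playlist.m3u"), func(w io.Writer) error {
		_, werr := io.WriteString(w, "#EXTM3U\n")
		return werr
	})
	if err == nil {
		gate.markReady()
	}
	fmt.Println(gate.ready(), err)
}
```

`os.Rename` is only atomic when the temp file and the target are on the same filesystem, which is why the temp file lives next to the final path.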
### File layout
Single Go module, multiple files in `package main` (the project is small):
```
main.go // wiring, http server, basic-auth middleware, env parsing
worker.go // periodic refresh loop
playlist.go // streaming playlist read/rewrite + tvg-id extraction
epg.go // EPG fetch, gunzip, two-pass merge, id rewriting
idnorm.go // channel id normalization
```
If complexity grows we can promote to `internal/...` packages later.
### Testing strategy
Minimal, table-driven, focused on the parts where logic — not plumbing — can
silently produce a wrong file. Go's `testdata/` convention houses fixtures.
| File | What it covers |
| ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `idnorm_test.go` | Pure normalization. Table cases include `"Italia 1"` vs. `"Italia 1.it"`, casing, embedded spaces/punctuation, the dotted-suffix allow-list (and ids that _look_ like they have a suffix but should be left alone). |
| `playlist_test.go` | Stream the rewriter against small inline `#EXTM3U` inputs: (a) header with existing `url-tvg=` is replaced, (b) header without it gets one appended, (c) the original `url-tvg` value is captured for `PREFER_PLAYLIST_EPG`, (d) extracted `tvg-id` set matches expectation, (e) all non-header lines pass through byte-for-byte. |
| `epg_test.go` | Merge two small XMLTV fixtures in `testdata/`: verify channel deduplication (first-source wins), id rewriting to canonical form, programmes filtered to known channels, programmes from the _non-owning_ source for a duplicate channel are dropped. Verify gunzip path by feeding one fixture as `.xml.gz`. |
Out of scope for v1 tests:
- HTTP wiring / handler glue (covered implicitly by manual smoke testing).
- The `time.Ticker` worker loop (a thin shell around the testable units).
- Filesystem atomicity (stdlib).
Fixtures should be tiny (a handful of channels, a couple of programmes each)
and committed under `testdata/`. No network in tests — every fetch path takes
a `func(ctx) (io.ReadCloser, error)` or similar so tests inject readers.
## Consequences
**Positive**
- Devices configured once with `<relay>/playlist` get both playlist and EPG
with no extra setup.
- Upstream is hit on the refresh schedule, not per client request.
- Atomic writes plus stable URLs make the relay robust to partial failures.
- All processing is streaming/disk-buffered; memory footprint stays small
even as EPGs grow.
**Negative**
- We now own a cache directory. Container runs need a writable volume (the
distroless `nonroot` user must own `<cache>`). The Dockerfile and
deployment must mount one or fall back to `/tmp`.
- Channel id matching is heuristic; some channels will lack EPG data even
when both sides "know about" them.
- Behaviour shift from live proxy to cached playlist is a semantic change —
if a stream endpoint inside the playlist is rotated by upstream, devices
see the old URL until the next refresh cycle.
## Alternatives considered
1. **Keep the playlist as a live reverse proxy and rewrite on the fly.**
Simpler in spirit but means re-fetching ~10 MB on every device poll, and
we still need a periodic playlist fetch for EPG filtering — so we end up
doing the work twice. Rejected.
2. **Skip channel-id normalization; serve the EPG verbatim and ask the user
to fix client-side.** Defeats the purpose of "set it and forget it".
Rejected.
3. **In-memory merged EPG.** A 50–200 MB string in memory is ugly for a
service that otherwise has a tiny footprint, and it loses the ability to
serve cleanly across restarts. Rejected.
4. **Re-emit EPG as gzip on disk.** Saves disk and bandwidth but complicates
`If-Modified-Since` and conditional requests. Defer until measured.
## Open questions
_None outstanding._