# 1. EPG Management

- **Status:** Proposed
- **Date:** 2026-05-06

## Context

`pz8-relay` currently acts as a stable, authenticated reverse proxy for a single
upstream IPTV playlist URL. The upstream URL changes frequently; the relay
shields client devices from those changes.

We now want the relay to also handle EPG (Electronic Program Guide) data so that
configured devices get TV listings without any extra client-side configuration:

1. The relayed URL points to an `m3u_plus` playlist. The playlist may or may
   not declare an EPG via the `url-tvg=` attribute on its `#EXTM3U` header.
2. We want to fetch one or more external EPGs (XMLTV / "TVXML" format, often
   distributed gzipped, e.g. <https://www.open-epg.com/files/italy1.xml.gz>),
   merge them, and serve the result from the relay.
3. Channel identifiers diverge between sources: the playlist uses
   `tvg-id="Italia 1"` while open-epg uses `id="Italia 1.it"` (or similar).
   The merged EPG must use ids that match the playlist's `tvg-id` values, or
   client apps will not bind programmes to channels.
4. The playlist file is ~10 MB, and EPG files can be tens of MB once
   decompressed. All processing must stream; no file is ever held in memory
   in full.

## Decision

### High-level pipeline

A background worker, driven by `time.Ticker`, periodically refreshes both the
playlist and the EPG in a single sequential pipeline:

```
fetch playlist ──► rewrite header (url-tvg → /epg) ──► atomic write playlist.m3u
      │
      └─► extract tvg-id set (during stream)
              │
              ▼
          fetch EPG sources (configured + optional playlist EPG)
              │
              ▼
          gunzip + stream parse
              │
              ▼
          normalize ids + filter to tvg-id set
              │
              ▼
          atomic write epg.xml.gz
```

HTTP handlers (`/playlist`, `/epg`) serve the most recent atomically written
file from the cache directory, via `http.ServeContent` except for the `/epg`
decompression fallback described below. They never block on the worker.

### Architectural shift: proxy → cache-and-serve (playlist only)

Today `/` is a streaming reverse proxy to the upstream playlist URL. We replace
this with a periodically refreshed local copy served from disk. Rationale:

- We must rewrite the `#EXTM3U` header anyway (to redirect `url-tvg` to our
  `/epg` endpoint), and we need the playlist parsed to extract `tvg-id`s for
  filtering the EPG. Doing both per request, on every client poll, is wasteful.
- A cached playlist is more resilient: if upstream is briefly down, devices
  still get the last-known-good file.
- Stream URLs _inside_ the playlist are unaffected. Clients still hit them
  directly upstream; we are not proxying actual video.

### Configuration (env vars)

| Var                                         | Purpose                                                                     | Default                      |
| ------------------------------------------- | --------------------------------------------------------------------------- | ---------------------------- |
| `PZ8_RELAY_TARGET_URL`                      | upstream m3u_plus playlist URL                                              | required                     |
| `PZ8_RELAY_USERNAME` / `PZ8_RELAY_PASSWORD` | basic auth                                                                  | required                     |
| `PZ8_RELAY_LISTEN_ADDR`                     | listen address                                                              | `:8080`                      |
| `PZ8_RELAY_PUBLIC_URL`                      | public base URL of this relay (used to rewrite `url-tvg` to `<public>/epg`) | required when EPG is enabled |
| `PZ8_RELAY_EPG_URLS`                        | comma-separated list of EPG URLs (XMLTV, optionally gzipped)                | empty (EPG disabled)         |
| `PZ8_RELAY_EPG_REFRESH`                     | refresh interval (Go duration, e.g. `12h`)                                  | `12h`                        |
| `PZ8_RELAY_PREFER_PLAYLIST_EPG`             | use the playlist's own `url-tvg` when present (see below)                   | `false`                      |
| `PZ8_RELAY_CACHE_DIR`                       | directory for the cached playlist and EPG files                             | `/var/cache/pz8-relay`       |
| `PZ8_RELAY_EPG_CONTENT_TYPE`                | `Content-Type` header used by `/epg`                                        | `application/xml`            |

EPG functionality is enabled iff `PZ8_RELAY_EPG_URLS` is non-empty **or**
`PZ8_RELAY_PREFER_PLAYLIST_EPG=true`. When disabled, the relay falls back to
its original behaviour, except that the playlist is still served unchanged
from the cache rather than proxied live.

### `PREFER_PLAYLIST_EPG` semantics

- `false` _(default)_: only `PZ8_RELAY_EPG_URLS` are used. Any `url-tvg` in the
  upstream playlist is ignored for fetching purposes (it is still rewritten in
  the served playlist).
- `true`: if the upstream playlist declares `url-tvg=URL`, that URL becomes the
  _only_ EPG source for this refresh cycle. If the playlist has no `url-tvg`,
  fall back to `PZ8_RELAY_EPG_URLS`.

This is a deliberate "either/or" interpretation of "prefer". A future
`MERGE_PLAYLIST_EPG=true` mode could add the playlist EPG on top of configured
URLs if needed.

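The either/or rule reduces to a small pure function, sketched here with a hypothetical name and signature; `playlistTVG` stands for the original `url-tvg` captured during the header rewrite.

```go
package main

// epgSources picks the EPG URLs for one refresh cycle under the "either/or"
// reading of PZ8_RELAY_PREFER_PLAYLIST_EPG. playlistTVG is the original
// url-tvg from the upstream #EXTM3U header ("" if the header had none).
func epgSources(prefer bool, playlistTVG string, configured []string) []string {
	if prefer && playlistTVG != "" {
		// The playlist's own EPG becomes the only source this cycle.
		return []string{playlistTVG}
	}
	// Default mode, or prefer=true without a url-tvg: use the configured list.
	return configured
}
```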
### Playlist processing

Streamed line by line. Two responsibilities:

1. **Rewrite the `#EXTM3U` header** (always the first non-empty line):
   - If it has `url-tvg="..."`, replace the URL with `<PUBLIC_URL>/epg`.
   - If it does not, append `url-tvg="<PUBLIC_URL>/epg"` (only when EPG is
     enabled).
   - Capture the original `url-tvg` value (if any) for use by
     `PREFER_PLAYLIST_EPG`.
2. **Extract tvg-ids** from each `#EXTINF:` line via a small regex
   (`tvg-id="([^"]*)"`). Empty or missing ids are skipped.

Output is written to `<cache>/playlist.m3u.tmp` and `os.Rename`d into place
once the stream completes.

### EPG fetching, parsing, merging

Per source URL:

1. HTTP GET; if the URL ends in `.gz` or the response carries
   `Content-Encoding: gzip`, wrap the body in `gzip.NewReader`. (Go's default
   `http.Transport` already decompresses responses for which it negotiated gzip
   itself and removes that header, so in practice the `.gz` suffix is the main
   signal.)
2. Stream the body to a per-source uncompressed temp file
   (`<cache>/epg-src-<n>.xml.tmp`). Disk-buffered, not in memory.
3. After all sources are downloaded, run a **two-pass merge** writing into
   `<cache>/epg.xml.gz.tmp`:
   - **Pass 1: channels.** For each source in order, stream `<channel>`
     elements with `xml.Decoder.Token`. For each channel:
     - Normalize the id (see below).
     - If it maps to a playlist tvg-id and the canonical id has not yet been
       seen, rewrite the `id` attribute to the canonical (playlist) form and
       emit it. Record `canonical_id → source_index` so programmes from
       _other_ sources for that channel can be ignored.
     - Otherwise drop.
   - **Pass 2: programmes.** For each source, stream `<programme>` elements.
     Emit only those whose normalized `channel` attribute maps to a canonical
     id owned by this source, after rewriting the `channel` attribute.
4. The merged output is written through a `gzip.Writer`, so the canonical
   on-disk artifact is `<cache>/epg.xml.gz`. `os.Rename` it into place.
5. Delete the per-source temp files.

Making two passes over disk-buffered files (rather than streaming all sources
in one pass) is the simplest correct approach: XMLTV files emit all channels
before all programmes, but the relative order across sources is unconstrained.
The disk overhead is bounded (tens of MB) and acceptable.

### Serving the EPG with gzip

The on-disk file is gzipped (XMLTV compresses roughly 10×, which is meaningful
given typical sizes). The `/epg` handler dispatches on `Accept-Encoding`:

- **Client accepts `gzip`**: serve `<cache>/epg.xml.gz` directly with
  `Content-Encoding: gzip` and the configured `Content-Type`, via
  `http.ServeContent` (which handles `Last-Modified` / `If-Modified-Since`
  automatically).
- **Client does not**: open the file, wrap it in `gzip.NewReader`, set
  `Last-Modified` from the file mtime, handle `If-Modified-Since` manually
  (one stat and one comparison), and stream the decompressed bytes out. No
  `Content-Length` is sent in this branch.

Most modern client apps and HTTP libraries advertise gzip, so the cheap path
is the common path.

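A minimal sketch of the `/epg` handler along those lines; `serveEPG` and `acceptsGzip` are hypothetical names. Setting `Vary: Accept-Encoding` is an addition, so that shared caches do not hand the gzipped body to a client that never asked for it, and answering a missing file with 503 plus `Retry-After` mirrors the failure modes below. The q-value parsing is deliberately small and ignores `*` wildcards.

```go
package main

import (
	"compress/gzip"
	"io"
	"net/http"
	"os"
	"strconv"
	"strings"
	"time"
)

// acceptsGzip reports whether an Accept-Encoding value allows gzip: the token
// is present and its q-value, if any, is not zero. Wildcards are ignored.
func acceptsGzip(h string) bool {
	for _, part := range strings.Split(h, ",") {
		name, params, _ := strings.Cut(strings.TrimSpace(part), ";")
		if !strings.EqualFold(strings.TrimSpace(name), "gzip") {
			continue
		}
		if q, ok := strings.CutPrefix(strings.TrimSpace(params), "q="); ok {
			if v, err := strconv.ParseFloat(q, 64); err == nil && v == 0 {
				return false
			}
		}
		return true
	}
	return false
}

// serveEPG serves the gzipped EPG at gzPath, decompressing on the fly for
// clients that do not accept gzip.
func serveEPG(w http.ResponseWriter, r *http.Request, gzPath, contentType string) {
	f, err := os.Open(gzPath)
	if err != nil {
		w.Header().Set("Retry-After", "30")
		http.Error(w, "EPG not available yet", http.StatusServiceUnavailable)
		return
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		http.Error(w, "stat failed", http.StatusInternalServerError)
		return
	}
	h := w.Header()
	h.Set("Content-Type", contentType)
	h.Set("Vary", "Accept-Encoding") // the body differs per Accept-Encoding

	if acceptsGzip(r.Header.Get("Accept-Encoding")) {
		h.Set("Content-Encoding", "gzip")
		// ServeContent handles Last-Modified, If-Modified-Since and Range.
		http.ServeContent(w, r, "", fi.ModTime(), f)
		return
	}

	// Decompressing branch: conditional GET by hand. HTTP dates have
	// one-second resolution, so truncate the mtime before comparing.
	mod := fi.ModTime().UTC().Truncate(time.Second)
	h.Set("Last-Modified", mod.Format(http.TimeFormat))
	if ims, err := http.ParseTime(r.Header.Get("If-Modified-Since")); err == nil && !mod.After(ims) {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	zr, err := gzip.NewReader(f)
	if err != nil {
		http.Error(w, "corrupt EPG cache", http.StatusInternalServerError)
		return
	}
	defer zr.Close()
	// No Content-Length: the decompressed size is not known up front.
	_, _ = io.Copy(w, zr)
}
```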
### Channel id normalization

Goal: collapse "Italia 1" / "Italia 1.it" / "italia1" to the same key, then map
each EPG channel to the playlist's exact `tvg-id` value.

Algorithm (`idnorm.go`):

1. Lowercase.
2. Strip a trailing dotted suffix if present (country codes `.it`, `.uk`,
   `.de`, `.us`, plus `.com`). This is a small explicit allow-list, to avoid
   eating real id characters.
3. Remove all non-alphanumeric characters.
4. The result is the _normalization key_.

At refresh time, build a map `normKey → playlist_tvg_id` from the playlist's
extracted ids. During EPG processing, normalize each EPG channel id, look it
up in this map, and rewrite it to the playlist value. Misses are dropped.

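A minimal sketch of the normalizer and the lookup map; `normKey`, `buildIndex`, and `resolve` are hypothetical names. It treats "alphanumeric" as Unicode letters and digits (so accented letters survive), and when two playlist ids collapse to the same key it keeps the first one; both are assumptions the text does not pin down.

```go
package main

import (
	"strings"
	"unicode"
)

// countrySuffixes is the explicit allow-list of dotted suffixes to strip.
var countrySuffixes = []string{".it", ".uk", ".de", ".com", ".us"}

// normKey lowercases id, strips at most one allow-listed dotted suffix,
// and drops every character that is not a letter or digit.
func normKey(id string) string {
	s := strings.ToLower(id)
	for _, suf := range countrySuffixes {
		if strings.HasSuffix(s, suf) {
			s = strings.TrimSuffix(s, suf)
			break
		}
	}
	var b strings.Builder
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// buildIndex maps normKey -> playlist tvg-id. If two playlist ids collide on
// the same key, the first one (in playlist order) wins.
func buildIndex(tvgIDs []string) map[string]string {
	idx := make(map[string]string, len(tvgIDs))
	for _, id := range tvgIDs {
		k := normKey(id)
		if _, dup := idx[k]; k != "" && !dup {
			idx[k] = id
		}
	}
	return idx
}

// resolve rewrites an EPG channel id to the playlist's exact tvg-id.
func resolve(idx map[string]string, epgID string) (string, bool) {
	id, ok := idx[normKey(epgID)]
	return id, ok
}
```

Because the dot is part of each allow-listed suffix, an id such as `Canale.5` keeps its trailing characters and normalizes to `canale5`.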
This is heuristic and will mismatch in edge cases. We accept that for v1.
A future `PZ8_RELAY_CHANNEL_MAP` file (manual `epg_id → tvg_id` overrides)
is the planned escape hatch if the normalizer proves insufficient; it will not
be built until the need is real.

### HTTP endpoints

| Path        | Method | Auth       | Behaviour                                                                                                                                                                  |
| ----------- | ------ | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `/playlist` | GET    | basic auth | serves `<cache>/playlist.m3u` via `http.ServeContent`; `Content-Type: application/vnd.apple.mpegurl`                                                                       |
| `/epg`      | GET    | basic auth | serves the merged EPG; `Content-Type` from `PZ8_RELAY_EPG_CONTENT_TYPE` (default `application/xml`). Honors `Accept-Encoding: gzip` (see _Serving the EPG with gzip_ above) |
| `/healthz`  | GET    | none       | 200 if every cache file in use (the playlist, plus the EPG when enabled) exists and is less than 2× the refresh interval old, else 503                                     |

For backwards compatibility with existing devices, `/` continues to serve the
playlist (alias for `/playlist`).

`Last-Modified` and `If-Modified-Since` are handled automatically by
`http.ServeContent` for `/playlist` and for the gzip branch of `/epg` (the
non-gzip branch handles them manually), which is useful for client devices
that poll.

### Failure modes

- **First refresh has not completed yet.** `/playlist` and `/epg` return 503
  with a `Retry-After: 30` header.
- **Subsequent refresh fails.** Keep serving the previous file; log at WARN.
  The next tick retries.
- **Single EPG source fails** (within an otherwise successful refresh).
  Continue with the remaining sources; log at WARN. The merged output reflects
  what we did get.
- **Playlist refresh fails but EPG is configured.** Skip the EPG refresh for
  this cycle. We cannot filter without a fresh tvg-id set; reusing the
  previous one is a possible future improvement but adds state.

### Concurrency model

- A single background goroutine drives the refresh loop; no fan-out.
- HTTP handlers only read files at stable paths (`http.ServeContent` over an
  opened file, or the `/epg` decompression fallback).
- Writes are atomic via `os.Rename`: the temp files live in the same cache
  directory, so the rename never crosses filesystems, and readers see either
  the old file or the new file, never a partial one. No locks are needed.
- A "first refresh done" channel, closed exactly once under `sync.Once`, lets
  handlers respond 503 cleanly until the first cycle completes.

### File layout

Single Go module, multiple files in `package main` (the project is small):

```
main.go      // wiring, HTTP server, basic-auth middleware, env parsing
worker.go    // periodic refresh loop
playlist.go  // streaming playlist read/rewrite + tvg-id extraction
epg.go       // EPG fetch, gunzip, two-pass merge, id rewriting
idnorm.go    // channel id normalization
```

If complexity grows we can promote to `internal/...` packages later.

### Testing strategy
|
||||
|
||||
Minimal, table-driven, focused on the parts where logic — not plumbing — can
|
||||
silently produce a wrong file. Go's `testdata/` convention houses fixtures.
|
||||
|
||||
| File | What it covers |
|
||||
| ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `idnorm_test.go` | Pure normalization. Table cases include `"Italia 1"` ↔ `"Italia 1.it"`, casing, embedded spaces/punctuation, the dotted-suffix allow-list (and ids that _look_ like they have a suffix but should be left alone). |
|
||||
| `playlist_test.go` | Stream the rewriter against small inline `#EXTM3U` inputs: (a) header with existing `url-tvg=` is replaced, (b) header without it gets one appended, (c) the original `url-tvg` value is captured for `PREFER_PLAYLIST_EPG`, (d) extracted `tvg-id` set matches expectation, (e) all non-header lines pass through byte-for-byte. |
|
||||
| `epg_test.go` | Merge two small XMLTV fixtures in `testdata/`: verify channel deduplication (first-source wins), id rewriting to canonical form, programmes filtered to known channels, programmes from the _non-owning_ source for a duplicate channel are dropped. Verify gunzip path by feeding one fixture as `.xml.gz`. |
|
||||
|
||||
Out of scope for v1 tests:

- HTTP wiring / handler glue (covered implicitly by manual smoke testing).
- The `time.Ticker` worker loop (a thin shell around the testable units).
- Filesystem atomicity (stdlib).

Fixtures should be tiny (a handful of channels, a couple of programmes each)
and committed under `testdata/`. No network in tests: every fetch path takes
a `func(ctx context.Context) (io.ReadCloser, error)` or similar, so tests can
inject readers.

## Consequences

**Positive**

- Devices configured once with `<relay>/playlist` get both playlist and EPG
  with no extra setup.
- Upstream is hit on the refresh schedule, not per client request.
- Atomic writes plus stable URLs make the relay robust to partial failures.
- All processing is streaming or disk-buffered; the memory footprint stays
  small even as EPGs grow.

**Negative**

- We now own a cache directory. Container runs need a writable volume (the
  distroless `nonroot` user must own `<cache>`). The Dockerfile and
  deployment must either mount one or fall back to `/tmp`.
- Channel id matching is heuristic; some channels will lack EPG data even
  when both sides "know about" them.
- The shift from live proxy to cached playlist is a semantic change: if
  upstream rotates a stream endpoint inside the playlist, devices see the old
  URL until the next refresh cycle.

## Alternatives considered

1. **Keep the playlist as a live reverse proxy and rewrite on the fly.**
   Simpler in spirit, but it means re-fetching ~10 MB on every device poll,
   and we still need a periodic playlist fetch for EPG filtering, so we end up
   doing the work twice. Rejected.
2. **Skip channel-id normalization; serve the EPG verbatim and ask the user
   to fix it client-side.** Defeats the purpose of "set it and forget it".
   Rejected.
3. **In-memory merged EPG.** A 50–200 MB string in memory is ugly for a
   service that otherwise has a tiny footprint, and it loses the ability to
   serve cleanly across restarts. Rejected.
4. **Store the merged EPG uncompressed on disk.** Keeps conditional-request
   handling entirely inside `http.ServeContent`, but XMLTV compresses roughly
   10×, so it wastes disk and bandwidth. Rejected in favour of gzip on disk
   with a decompressing fallback (see _Serving the EPG with gzip_).

## Open questions

_None outstanding._