# 1. EPG Management

- **Status:** Proposed
- **Date:** 2026-05-06

## Context

`pz8-relay` currently acts as a stable, authenticated reverse proxy for a single
upstream IPTV playlist URL. The upstream URL changes frequently; the relay
shields client devices from those changes.

We now want the relay to also handle EPG (Electronic Program Guide) data so that
configured devices get TV listings without any extra client-side configuration:

1. The relayed URL points to an `m3u_plus` playlist. The playlist may or may
   not declare an EPG via the `url-tvg=` attribute on its `#EXTM3U` header.
2. We want to fetch one or more external EPGs (XMLTV / "TVXML" format, often
   distributed gzipped, e.g. <https://www.open-epg.com/files/italy1.xml.gz>),
   merge them, and serve the result from the relay.
3. Channel identifiers diverge between sources: the playlist uses
   `tvg-id="Italia 1"` while open-epg uses `id="Italia 1.it"` (or similar).
   The merged EPG must use ids that match the playlist's `tvg-id` values, or
   client apps will not bind programmes to channels.
4. The playlist file is ~10 MB, and EPG files can be tens of MB once
   decompressed. All processing must stream; no file is ever held fully in
   memory.

## Decision

### High-level pipeline

A background worker, driven by `time.Ticker`, periodically refreshes both the
playlist and the EPG in a single sequential pipeline:

```
fetch playlist ──► rewrite header (url-tvg → /epg) ──► atomic write playlist.m3u
       │
       └─► extract tvg-id set (during stream)
                     │
                     ▼
   fetch EPG sources (configured + optional playlist EPG)
                     │
                     ▼
           gunzip + stream parse
                     │
                     ▼
   normalize ids + filter to tvg-id set
                     │
                     ▼
          atomic write epg.xml.gz
```

HTTP handlers (`/playlist`, `/epg`) serve the most recent atomically-written
file via `http.ServeContent`. They never block on the worker.

### Architectural shift: proxy → cache-and-serve (playlist only)

Today `/` is a streaming reverse proxy to the upstream playlist URL. We replace
this with a periodically-refreshed local copy served from disk. Rationale:

- We must rewrite the `#EXTM3U` header anyway (to redirect `url-tvg` to our
  `/epg` endpoint), and we need the playlist parsed to extract `tvg-id`s for
  filtering the EPG. Doing both per-request, on every client poll, is wasteful.
- A cached playlist is more resilient: if upstream is briefly down, devices
  still get the last-known-good file.
- Stream URLs _inside_ the playlist are unaffected. Clients still hit them
  directly upstream; we are not proxying actual video.

### Configuration (env vars)

| Var                                         | Purpose                                                                     | Default                      |
| ------------------------------------------- | --------------------------------------------------------------------------- | ---------------------------- |
| `PZ8_RELAY_TARGET_URL`                      | upstream m3u_plus playlist URL                                              | required                     |
| `PZ8_RELAY_USERNAME` / `PZ8_RELAY_PASSWORD` | basic auth                                                                  | required                     |
| `PZ8_RELAY_LISTEN_ADDR`                     | listen address                                                              | `:8080`                      |
| `PZ8_RELAY_PUBLIC_URL`                      | public base URL of this relay (used to rewrite `url-tvg` to `<public>/epg`) | required when EPG is enabled |
| `PZ8_RELAY_EPG_URLS`                        | comma-separated list of EPG URLs (XMLTV, optionally gzipped)                | empty (EPG disabled)         |
| `PZ8_RELAY_EPG_REFRESH`                     | refresh interval (Go duration, e.g. `12h`)                                  | `12h`                        |
| `PZ8_RELAY_PREFER_PLAYLIST_EPG`             | use the playlist's own `url-tvg` when present (see below)                   | `false`                      |
| `PZ8_RELAY_CACHE_DIR`                       | directory for the cached playlist + EPG file                                | `/var/cache/pz8-relay`       |
| `PZ8_RELAY_EPG_CONTENT_TYPE`                | `Content-Type` header used by `/epg`                                        | `application/xml`            |

EPG functionality is enabled if and only if `PZ8_RELAY_EPG_URLS` is non-empty
**or** `PZ8_RELAY_PREFER_PLAYLIST_EPG=true`. When disabled, the relay still
serves the cached playlist, but with its `#EXTM3U` header left unchanged.

### `PREFER_PLAYLIST_EPG` semantics

- `false` _(default)_: only `PZ8_RELAY_EPG_URLS` are used. Any `url-tvg` in the
  upstream playlist is ignored for fetching purposes (it is still rewritten in
  the served playlist).
- `true`: if the upstream playlist declares `url-tvg=URL`, that URL becomes the
  _only_ EPG source for this refresh cycle. If the playlist has no `url-tvg`,
  fall back to `PZ8_RELAY_EPG_URLS`.

This is a deliberate "either/or" interpretation of "prefer". A future
`MERGE_PLAYLIST_EPG=true` mode could add the playlist EPG on top of configured
URLs if needed.

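The either/or rule fits in a few lines. The sketch below uses a hypothetical `selectEPGSources` helper; the function name and the example URLs are illustrative only.

```go
package main

import "fmt"

// selectEPGSources returns the EPG URLs to fetch for one refresh cycle.
// playlistURL is the url-tvg value captured from the upstream header
// ("" when the playlist declares none).
func selectEPGSources(prefer bool, playlistURL string, configured []string) []string {
	if prefer && playlistURL != "" {
		return []string{playlistURL} // the playlist EPG replaces the configured list
	}
	return configured // default, and the fallback when url-tvg is absent
}

func main() {
	configured := []string{"https://www.open-epg.com/files/italy1.xml.gz"}
	fmt.Println(selectEPGSources(false, "http://upstream.example/epg.xml", configured))
	fmt.Println(selectEPGSources(true, "http://upstream.example/epg.xml", configured))
	fmt.Println(selectEPGSources(true, "", configured))
}
```
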
### Playlist processing

Streamed line-by-line. Two responsibilities:

1. **Rewrite the `#EXTM3U` header** (always the first non-empty line):
   - If it has `url-tvg="..."`, replace the URL with `<PUBLIC_URL>/epg`.
   - If it does not, append `url-tvg="<PUBLIC_URL>/epg"` (only when EPG is
     enabled).
   - Capture the original `url-tvg` value (if any) for use by
     `PREFER_PLAYLIST_EPG`.
2. **Extract tvg-ids** from each `#EXTINF:` line via a small regex
   (`tvg-id="([^"]*)"`). Empty/missing ids are skipped.

Output is written to `<cache>/playlist.m3u.tmp` and `os.Rename`d into place
once the stream completes.

### EPG fetching, parsing, merging

Steps 1 and 2 run per source URL; steps 3 to 5 run once per refresh cycle:

1. HTTP GET; if the URL ends in `.gz` or the response carries
   `Content-Encoding: gzip`, wrap the body in `gzip.NewReader`.
2. Stream the body to a per-source uncompressed temp file
   (`<cache>/epg-src-<n>.xml.tmp`). Disk-buffered, not in memory.
3. After all sources are downloaded, run a **two-pass merge** writing into
   `<cache>/epg.xml.tmp`:
   - **Pass 1: channels.** For each source in order, stream `<channel>`
     elements with `xml.Decoder.Token`. For each channel:
     - Normalize the id (see "Channel id normalization").
     - If it maps to a playlist tvg-id and the canonical id has not yet been
       seen, rewrite the `id` attribute to the canonical (playlist) form and
       emit it. Record `canonical_id → source_index` so programmes from
       _other_ sources for that channel can be ignored.
     - Otherwise drop.
   - **Pass 2: programmes.** For each source, stream `<programme>` elements.
     Emit only those whose normalized `channel` attribute maps to a canonical
     id owned by this source, after rewriting the `channel` attribute.
4. The merged output is written through a `gzip.Writer` so the canonical
   on-disk artifact is `<cache>/epg.xml.gz`. `os.Rename` it into place.
5. Delete the per-source temp files.

### Serving the EPG with gzip

The on-disk file is gzipped (XMLTV compresses ~10×, which is meaningful given
typical sizes). The `/epg` handler dispatches based on `Accept-Encoding`:

- **Client sends `gzip`**: serve `<cache>/epg.xml.gz` directly with
  `Content-Encoding: gzip` and the configured `Content-Type`, via
  `http.ServeContent` (handles `Last-Modified` / `If-Modified-Since`
  automatically).
- **Client does not**: open the file, wrap in `gzip.NewReader`, set
  `Last-Modified` from the file mtime, handle `If-Modified-Since` manually
  (one stat + one comparison), and stream the decompressed bytes out. No
  `Content-Length` in this branch.

Most modern client apps and HTTP libraries advertise gzip, so the cheap path
is the common path.

On the merge itself: two passes over disk-buffered files (rather than
streaming all sources in one pass) is the simplest correct approach. Each
XMLTV file emits all its channels before all its programmes, but the relative
order across sources is unconstrained. The disk overhead is bounded (tens of
MB) and acceptable.

### Channel id normalization

Goal: collapse "Italia 1" / "Italia 1.it" / "italia1" to the same key, then map
each EPG channel to the playlist's exact `tvg-id` value.

Algorithm (`idnorm.go`):

1. Lowercase.
2. Strip a trailing dotted suffix (country code or TLD) if present (`.it`,
   `.uk`, `.de`, `.com`, `.us`; a small explicit allow-list to avoid eating
   real id chars).
3. Remove all non-alphanumeric characters.
4. The result is the _normalization key_.

At refresh time, build a map `normKey → playlist_tvg_id` from the playlist's
extracted ids. During EPG processing, normalize each EPG channel id, look it
up in this map, and rewrite it to the playlist value. Misses are dropped.

This is heuristic and will mis-match in edge cases. We accept that for v1.
A future `PZ8_RELAY_CHANNEL_MAP` file (manual `epg_id → tvg_id` overrides)
is the planned escape hatch if the normalizer proves insufficient; it will not
be built until the need is real.

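The four steps translate almost directly into code. In this sketch the function names (`normalize`, `buildIndex`) are illustrative, "alphanumeric" is taken to mean Unicode letters and digits, and the first playlist id wins when two ids collapse to the same key; the last two points are assumptions the text leaves open.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// suffixes is the explicit allow-list of trailing dotted suffixes.
var suffixes = []string{".it", ".uk", ".de", ".com", ".us"}

// normalize returns the normalization key for a channel id.
func normalize(id string) string {
	s := strings.ToLower(id)
	for _, suf := range suffixes {
		if strings.HasSuffix(s, suf) {
			s = strings.TrimSuffix(s, suf)
			break // strip at most one suffix
		}
	}
	var b strings.Builder
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// buildIndex maps normalization keys to the playlist's exact tvg-id values.
func buildIndex(tvgIDs []string) map[string]string {
	idx := make(map[string]string, len(tvgIDs))
	for _, id := range tvgIDs {
		key := normalize(id)
		if key == "" {
			continue
		}
		if _, dup := idx[key]; !dup {
			idx[key] = id // assumption: first id wins on collisions
		}
	}
	return idx
}

func main() {
	idx := buildIndex([]string{"Italia 1", "Rai 1"})
	for _, epgID := range []string{"Italia 1.it", "italia1", "Unknown.de"} {
		canonical, ok := idx[normalize(epgID)]
		fmt.Printf("%q -> %q (matched=%v)\n", epgID, canonical, ok)
	}
}
```

Both `Italia 1.it` and `italia1` resolve to the playlist id `Italia 1`, while `Unknown.de` is a miss and would be dropped.
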
### HTTP endpoints

| Path        | Method | Auth       | Behaviour                                                                                                                                                                 |
| ----------- | ------ | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `/playlist` | GET    | basic auth | serves `<cache>/playlist.m3u` via `http.ServeContent`; `Content-Type: application/vnd.apple.mpegurl`                                                                      |
| `/epg`      | GET    | basic auth | serves the merged EPG; `Content-Type` from `PZ8_RELAY_EPG_CONTENT_TYPE` (default `application/xml`). Honors `Accept-Encoding: gzip` (see "Serving the EPG with gzip")     |
| `/healthz`  | GET    | none       | 200 if both cache files exist and are < 2× refresh interval old, else 503                                                                                                 |

For backwards compatibility with existing devices, `/` continues to serve the
playlist (alias for `/playlist`).

Wherever `http.ServeContent` is used, `Last-Modified` and `If-Modified-Since`
are handled automatically, which is useful for client devices that poll.

### Failure modes

- **First refresh has not completed yet.** `/playlist` and `/epg` return 503
  with a `Retry-After: 30` header.
- **Subsequent refresh fails.** Keep serving the previous file; log at WARN.
  The next tick retries.
- **Single EPG source fails** (within an otherwise successful refresh).
  Continue with the remaining sources; log at WARN. The merged output reflects
  what we did get.
- **Playlist refresh fails but EPG is configured.** Skip the EPG refresh for
  this cycle (we cannot filter without a fresh tvg-id set; reusing the
  previous one is acceptable as a future improvement but adds state).

### Concurrency model

- Single background goroutine drives the refresh loop; no fan-out.
- HTTP handlers only do `http.ServeFile`-style reads on stable paths.
- Writes are atomic via `os.Rename`, so readers either see the old file or the
  new file, never a partial one. No locks needed.
- A `sync.Once`-guarded "first refresh done" channel lets handlers respond 503
  cleanly until the first cycle completes.

### File layout

Single Go module, multiple files in `package main` (the project is small):

```
main.go       // wiring, http server, basic-auth middleware, env parsing
worker.go     // periodic refresh loop
playlist.go   // streaming playlist read/rewrite + tvg-id extraction
epg.go        // EPG fetch, gunzip, two-pass merge, id rewriting
idnorm.go     // channel id normalization
```

If complexity grows we can promote to `internal/...` packages later.

### Testing strategy

Minimal, table-driven, focused on the parts where logic, not plumbing, can
silently produce a wrong file. Go's `testdata/` convention houses fixtures.

| File               | What it covers |
| ------------------ | -------------- |
| `idnorm_test.go`   | Pure normalization. Table cases include `"Italia 1"` ↔ `"Italia 1.it"`, casing, embedded spaces/punctuation, the dotted-suffix allow-list (and ids that _look_ like they have a suffix but should be left alone). |
| `playlist_test.go` | Stream the rewriter against small inline `#EXTM3U` inputs: (a) header with existing `url-tvg=` is replaced, (b) header without it gets one appended, (c) the original `url-tvg` value is captured for `PREFER_PLAYLIST_EPG`, (d) extracted `tvg-id` set matches expectation, (e) all non-header lines pass through byte-for-byte. |
| `epg_test.go`      | Merge two small XMLTV fixtures in `testdata/`: verify channel deduplication (first-source wins), id rewriting to canonical form, programmes filtered to known channels, programmes from the _non-owning_ source for a duplicate channel are dropped. Verify gunzip path by feeding one fixture as `.xml.gz`. |

Out of scope for v1 tests:

- HTTP wiring / handler glue (covered implicitly by manual smoke testing).
- The `time.Ticker` worker loop (a thin shell around the testable units).
- Filesystem atomicity (stdlib).

Fixtures should be tiny (a handful of channels, a couple of programmes each)
and committed under `testdata/`. No network in tests: every fetch path takes
a `func(ctx) (io.ReadCloser, error)` or similar so tests inject readers.

## Consequences

**Positive**

- Devices configured once with `<relay>/playlist` get both playlist and EPG
  with no extra setup.
- Upstream is hit on the refresh schedule, not per client request.
- Atomic writes plus stable URLs make the relay robust to partial failures.
- All processing is streaming/disk-buffered; memory footprint stays small
  even as EPGs grow.

**Negative**

- We now own a cache directory. Container runs need a writable volume (the
  distroless `nonroot` user must own `<cache>`). The Dockerfile and
  deployment must mount one or fall back to `/tmp`.
- Channel id matching is heuristic; some channels will lack EPG data even
  when both sides "know about" them.
- Behaviour shift from live proxy to cached playlist is a semantic change:
  if a stream endpoint inside the playlist is rotated by upstream, devices
  see the old URL until the next refresh cycle.

## Alternatives considered

1. **Keep the playlist as a live reverse proxy and rewrite on the fly.**
   Simpler in spirit but means re-fetching ~10 MB on every device poll, and
   we still need a periodic playlist fetch for EPG filtering, so we end up
   doing the work twice. Rejected.
2. **Skip channel-id normalization; serve the EPG verbatim and ask the user
   to fix client-side.** Defeats the purpose of "set it and forget it".
   Rejected.
3. **In-memory merged EPG.** A 50–200 MB string in memory is ugly for a
   service that otherwise has a tiny footprint, and it loses the ability to
   serve cleanly across restarts. Rejected.
4. **Re-emit EPG as gzip on disk.** Saves disk and bandwidth but complicates
   `If-Modified-Since` and conditional requests. Defer until measured.

## Open questions

_None outstanding._