1. EPG Management

  • Status: Proposed
  • Date: 2026-05-06

Context

pz8-relay currently acts as a stable, authenticated reverse proxy for a single upstream IPTV playlist URL. The upstream URL changes frequently; the relay shields client devices from those changes.

We now want the relay to also handle EPG (Electronic Program Guide) data so that configured devices get TV listings without any extra client-side configuration:

  1. The relayed URL points to an m3u_plus playlist. The playlist may or may not declare an EPG via the url-tvg= attribute on its #EXTM3U header.
  2. We want to fetch one or more external EPGs (XMLTV / "TVXML" format, often distributed gzipped, e.g. https://www.open-epg.com/files/italy1.xml.gz), merge them, and serve the result from the relay.
  3. Channel identifiers diverge between sources: the playlist uses tvg-id="Italia 1" while open-epg uses id="Italia 1.it" (or similar). The merged EPG must use ids that match the playlist's tvg-id values, or client apps will not bind programmes to channels.
  4. The playlist file is ~10 MB, and EPG files can be tens of MB once decompressed. All processing must stream — no full file in memory.

Decision

High-level pipeline

A background worker, driven by time.Ticker, periodically refreshes both the playlist and the EPG in a single sequential pipeline:

fetch playlist  ──►  rewrite header (url-tvg → /epg)  ──►  atomic write playlist.m3u
       │
       └─► extract tvg-id set (during stream)
                                      │
                                      ▼
                          fetch EPG sources (configured + optional playlist EPG)
                                      │
                                      ▼
                              gunzip + stream parse
                                      │
                                      ▼
                          normalize ids + filter to tvg-id set
                                      │
                                      ▼
                                 atomic write epg.xml.gz

HTTP handlers (/playlist, /epg) serve the most recent atomically-written file via http.ServeContent. They never block on the worker.

Architectural shift: proxy → cache-and-serve (playlist only)

Today / is a streaming reverse proxy to the upstream playlist URL. We replace this with a periodically-refreshed local copy served from disk. Rationale:

  • We must rewrite the #EXTM3U header anyway (to redirect url-tvg to our /epg endpoint), and we need the playlist parsed to extract tvg-ids for filtering the EPG. Doing both per-request, on every client poll, is wasteful.
  • A cached playlist is more resilient: if upstream is briefly down, devices still get the last-known-good file.
  • Stream URLs inside the playlist are unaffected. Clients still hit them directly upstream — we are not proxying actual video.

Configuration (env vars)

| Var | Purpose | Default |
|---|---|---|
| PZ8_RELAY_TARGET_URL | upstream m3u_plus playlist URL | required |
| PZ8_RELAY_USERNAME / PZ8_RELAY_PASSWORD | basic auth | required |
| PZ8_RELAY_LISTEN_ADDR | listen address | :8080 |
| PZ8_RELAY_PUBLIC_URL | public base URL of this relay (used to rewrite url-tvg to <public>/epg) | required when EPG is enabled |
| PZ8_RELAY_EPG_URLS | comma-separated list of EPG URLs (XMLTV, optionally gzipped) | empty (EPG disabled) |
| PZ8_RELAY_EPG_REFRESH | refresh interval (Go duration, e.g. 12h) | 12h |
| PZ8_RELAY_PREFER_PLAYLIST_EPG | use the playlist's own url-tvg when present (see below) | false |
| PZ8_RELAY_CACHE_DIR | directory for the cached playlist + EPG file | /var/cache/pz8-relay |
| PZ8_RELAY_EPG_CONTENT_TYPE | Content-Type header used by /epg | application/xml |

EPG functionality is enabled iff PZ8_RELAY_EPG_URLS is non-empty or PZ8_RELAY_PREFER_PLAYLIST_EPG=true. When it is disabled, the relay keeps its playlist-only role: the cached playlist is served unchanged, with no url-tvg rewrite.
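
The configuration rules could be parsed roughly as follows. This is a sketch, not the implementation: the Config type, loadConfig, and the getenv parameter are hypothetical names, and treating only the literal string "true" as enabling PREFER_PLAYLIST_EPG is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Config mirrors the environment variables listed above.
type Config struct {
	TargetURL, Username, Password string
	ListenAddr, PublicURL         string
	CacheDir, EPGContentType      string
	EPGURLs                       []string
	EPGRefresh                    time.Duration
	PreferPlaylistEPG             bool
}

// EPGEnabled implements the "iff" rule: URLs configured, or playlist EPG preferred.
func (c Config) EPGEnabled() bool {
	return len(c.EPGURLs) > 0 || c.PreferPlaylistEPG
}

// loadConfig reads settings through getenv (os.Getenv in production).
func loadConfig(getenv func(string) string) (Config, error) {
	get := func(key, def string) string {
		if v := strings.TrimSpace(getenv(key)); v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		TargetURL:         get("PZ8_RELAY_TARGET_URL", ""),
		Username:          get("PZ8_RELAY_USERNAME", ""),
		Password:          getenv("PZ8_RELAY_PASSWORD"), // not trimmed: spaces may be significant
		ListenAddr:        get("PZ8_RELAY_LISTEN_ADDR", ":8080"),
		PublicURL:         strings.TrimRight(get("PZ8_RELAY_PUBLIC_URL", ""), "/"),
		CacheDir:          get("PZ8_RELAY_CACHE_DIR", "/var/cache/pz8-relay"),
		EPGContentType:    get("PZ8_RELAY_EPG_CONTENT_TYPE", "application/xml"),
		PreferPlaylistEPG: get("PZ8_RELAY_PREFER_PLAYLIST_EPG", "false") == "true",
	}
	for _, u := range strings.Split(get("PZ8_RELAY_EPG_URLS", ""), ",") {
		if u = strings.TrimSpace(u); u != "" {
			cfg.EPGURLs = append(cfg.EPGURLs, u)
		}
	}
	d, err := time.ParseDuration(get("PZ8_RELAY_EPG_REFRESH", "12h"))
	if err != nil {
		return Config{}, fmt.Errorf("PZ8_RELAY_EPG_REFRESH: %w", err)
	}
	if d <= 0 {
		return Config{}, errors.New("PZ8_RELAY_EPG_REFRESH must be positive")
	}
	cfg.EPGRefresh = d
	if cfg.TargetURL == "" || cfg.Username == "" || cfg.Password == "" {
		return Config{}, errors.New("target URL, username and password are required")
	}
	if cfg.EPGEnabled() && cfg.PublicURL == "" {
		return Config{}, errors.New("PZ8_RELAY_PUBLIC_URL is required when EPG is enabled")
	}
	return cfg, nil
}

func main() {
	env := map[string]string{
		"PZ8_RELAY_TARGET_URL": "http://upstream.example/playlist",
		"PZ8_RELAY_USERNAME":   "user",
		"PZ8_RELAY_PASSWORD":   "secret",
	}
	cfg, err := loadConfig(func(k string) string { return env[k] })
	fmt.Println(cfg.EPGEnabled(), cfg.EPGRefresh, err)
}
```

Taking getenv as a parameter keeps the parser testable without touching the process environment.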

PREFER_PLAYLIST_EPG semantics

  • false (default): only PZ8_RELAY_EPG_URLS are used. Any url-tvg in the upstream playlist is ignored for fetching purposes (it is still rewritten in the served playlist).
  • true: if the upstream playlist declares url-tvg=URL, that URL becomes the only EPG source for this refresh cycle. If the playlist has no url-tvg, fall back to PZ8_RELAY_EPG_URLS.

This is a deliberate "either/or" interpretation of "prefer". A future MERGE_PLAYLIST_EPG=true mode could add the playlist EPG on top of configured URLs if needed.
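
The either/or rule is small enough to state as a single function. A sketch, with selectEPGSources as a hypothetical name:

```go
package main

import "fmt"

// selectEPGSources returns the EPG URLs to fetch in this refresh cycle.
// playlistEPG is the url-tvg value captured from the upstream header ("" if absent).
func selectEPGSources(prefer bool, playlistEPG string, configured []string) []string {
	if prefer && playlistEPG != "" {
		return []string{playlistEPG} // the playlist's EPG replaces the configured list
	}
	return configured
}

func main() {
	configured := []string{"https://www.open-epg.com/files/italy1.xml.gz"}
	fmt.Println(selectEPGSources(false, "http://upstream.example/epg.xml", configured))
	fmt.Println(selectEPGSources(true, "http://upstream.example/epg.xml", configured))
	fmt.Println(selectEPGSources(true, "", configured))
}
```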

Playlist processing

Streamed line-by-line. Two responsibilities:

  1. Rewrite the #EXTM3U header (always the first non-empty line):
    • If it has url-tvg="...", replace the URL with <PUBLIC_URL>/epg.
    • If it does not, append url-tvg="<PUBLIC_URL>/epg" (only when EPG is enabled).
    • Capture the original url-tvg value (if any) for use by PREFER_PLAYLIST_EPG.
  2. Extract tvg-ids from each #EXTINF: line via a small regex (tvg-id="([^"]*)"). Empty/missing ids are skipped.

Output is written to <cache>/playlist.m3u.tmp and os.Renamed into place once the stream completes.

EPG fetching, parsing, merging

Per source URL:

  1. HTTP GET; if the response or URL ends in .gz (or Content-Encoding: gzip), wrap the body in gzip.NewReader.
  2. Stream the body to a per-source uncompressed temp file (<cache>/epg-src-<n>.xml.tmp). Disk-buffered, not in memory.
  3. After all sources are downloaded, run a two-pass merge writing into <cache>/epg.xml.tmp:
    • Pass 1 — channels. For each source in order, stream <channel> elements with xml.Decoder.Token. For each channel:
      • Normalize the id (see below).
      • If it maps to a playlist tvg-id and the canonical id has not yet been seen, rewrite the id attribute to the canonical (playlist) form and emit it. Record canonical_id → source_index so programmes from other sources for that channel can be ignored.
      • Otherwise drop.
    • Pass 2 — programmes. For each source, stream <programme> elements. Emit only those whose normalized channel attribute maps to a canonical id owned by this source, after rewriting the channel attribute.
  4. The merged output is written through a gzip.Writer so the canonical on-disk artifact is <cache>/epg.xml.gz. os.Rename it into place.
  5. Delete the per-source temp files.

Serving the EPG with gzip

The on-disk file is gzipped (XMLTV compresses ~10×, which is meaningful given typical sizes). The /epg handler dispatches based on Accept-Encoding:

  • Client sends gzip: serve <cache>/epg.xml.gz directly with Content-Encoding: gzip and the configured Content-Type, via http.ServeContent (handles Last-Modified / If-Modified-Since automatically).
  • Client does not: open the file, wrap in gzip.NewReader, set Last-Modified from the file mtime, handle If-Modified-Since manually (one stat + one comparison), and stream the decompressed bytes out. No Content-Length in this branch.

Most modern client apps and HTTP libraries advertise gzip, so the cheap path is the common path.

On the merge itself: two passes over disk-buffered files (rather than streaming all sources in one pass) is the simplest correct approach. XMLTV files emit all channels before all programmes, but the relative order across sources is unconstrained. The disk overhead is bounded (tens of MB) and acceptable.

Channel id normalization

Goal: collapse "Italia 1" / "Italia 1.it" / "italia1" to the same key, then map each EPG channel to the playlist's exact tvg-id value.

Algorithm (idnorm.go):

  1. Lowercase.
  2. Strip a trailing dotted suffix if present (.it, .uk, .de, .com, .us). This is a small explicit allow-list of country codes and similar suffixes, kept short to avoid eating real id characters.
  3. Remove all non-alphanumeric characters.
  4. The result is the normalization key.

At refresh time, build a map normKey → playlist_tvg_id from the playlist's extracted ids. During EPG processing, normalize each EPG channel id, look it up in this map, and rewrite it to the playlist value. Misses are dropped.

This is heuristic and will mis-match in edge cases. We accept that for v1. A future PZ8_RELAY_CHANNEL_MAP file (manual epg_id → tvg_id overrides) is the planned escape hatch if the normalizer proves insufficient — not built until the need is real.
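
The normalizer and lookup map could look like this. A sketch under assumptions: normalizeID and buildCanonicalMap are hypothetical names, "alphanumeric" is read as Unicode letters and digits rather than ASCII only, and when two playlist ids collapse to the same key the first one wins.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Explicit allow-list of trailing suffixes that are stripped before comparison.
var suffixes = []string{".it", ".uk", ".de", ".com", ".us"}

// normalizeID lowercases, strips one allow-listed suffix, and drops every
// character that is not a letter or digit.
func normalizeID(id string) string {
	s := strings.ToLower(strings.TrimSpace(id))
	for _, suf := range suffixes {
		if len(s) > len(suf) && strings.HasSuffix(s, suf) {
			s = strings.TrimSuffix(s, suf)
			break
		}
	}
	var b strings.Builder
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// buildCanonicalMap maps normalization key -> exact playlist tvg-id.
func buildCanonicalMap(tvgIDs []string) map[string]string {
	m := make(map[string]string, len(tvgIDs))
	for _, id := range tvgIDs {
		k := normalizeID(id)
		if _, dup := m[k]; k != "" && !dup {
			m[k] = id
		}
	}
	return m
}

func main() {
	canon := buildCanonicalMap([]string{"Italia 1", "Rai 1"})
	for _, epgID := range []string{"Italia 1.it", "italia1", "Canale 5.it"} {
		id, ok := canon[normalizeID(epgID)]
		fmt.Printf("%q -> %q (matched: %v)\n", epgID, id, ok)
	}
}
```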

HTTP endpoints

| Path | Method | Auth | Behaviour |
|---|---|---|---|
| /playlist | GET | basic auth | serves <cache>/playlist.m3u via http.ServeContent; Content-Type: application/vnd.apple.mpegurl |
| /epg | GET | basic auth | serves the merged EPG; Content-Type from PZ8_RELAY_EPG_CONTENT_TYPE (default application/xml). Honors Accept-Encoding: gzip (see "Serving the EPG with gzip") |
| /healthz | GET | none | 200 if both cache files exist and are < 2× refresh interval old, else 503 |

For backwards compatibility with existing devices, / continues to serve the playlist (alias for /playlist).

Last-Modified and If-Modified-Since are handled automatically by http.ServeContent, which is useful for client devices that poll.

Failure modes

  • First refresh has not completed yet. /playlist and /epg return 503 with a Retry-After: 30 header.
  • Subsequent refresh fails. Keep serving the previous file; log at WARN. The next tick retries.
  • Single EPG source fails (within an otherwise successful refresh). Continue with the remaining sources; log at WARN. The merged output reflects what we did get.
  • Playlist refresh fails but EPG is configured. Skip the EPG refresh for this cycle (we cannot filter without a fresh tvg-id set; reusing the previous one is acceptable as a future improvement but adds state).

Concurrency model

  • Single background goroutine drives the refresh loop; no fan-out.
  • HTTP handlers only do http.ServeFile-style reads on stable paths.
  • Writes are atomic via os.Rename, so readers either see the old file or the new file, never a partial one. No locks needed.
  • A sync.Once-guarded "first refresh done" channel lets handlers respond 503 cleanly until the first cycle completes.

File layout

Single Go module, multiple files in package main (the project is small):

main.go            // wiring, http server, basic-auth middleware, env parsing
worker.go          // periodic refresh loop
playlist.go        // streaming playlist read/rewrite + tvg-id extraction
epg.go             // EPG fetch, gunzip, two-pass merge, id rewriting
idnorm.go          // channel id normalization

If complexity grows we can promote to internal/... packages later.

Testing strategy

Minimal, table-driven, focused on the parts where logic — not plumbing — can silently produce a wrong file. Go's testdata/ convention houses fixtures.

| File | What it covers |
|---|---|
| idnorm_test.go | Pure normalization. Table cases include "Italia 1" vs "Italia 1.it", casing, embedded spaces/punctuation, the dotted-suffix allow-list (and ids that look like they have a suffix but should be left alone). |
| playlist_test.go | Stream the rewriter against small inline #EXTM3U inputs: (a) header with existing url-tvg= is replaced, (b) header without it gets one appended, (c) the original url-tvg value is captured for PREFER_PLAYLIST_EPG, (d) extracted tvg-id set matches expectation, (e) all non-header lines pass through byte-for-byte. |
| epg_test.go | Merge two small XMLTV fixtures in testdata/: verify channel deduplication (first-source wins), id rewriting to canonical form, programmes filtered to known channels, programmes from the non-owning source for a duplicate channel are dropped. Verify gunzip path by feeding one fixture as .xml.gz. |

Out of scope for v1 tests:

  • HTTP wiring / handler glue (covered implicitly by manual smoke testing).
  • The time.Ticker worker loop (a thin shell around the testable units).
  • Filesystem atomicity (stdlib).

Fixtures should be tiny (a handful of channels, a couple of programmes each) and committed under testdata/. No network in tests — every fetch path takes a func(ctx) (io.ReadCloser, error) or similar so tests inject readers.

Consequences

Positive

  • Devices configured once with <relay>/playlist get both playlist and EPG with no extra setup.
  • Upstream is hit on the refresh schedule, not per client request.
  • Atomic writes plus stable URLs make the relay robust to partial failures.
  • All processing is streaming/disk-buffered; memory footprint stays small even as EPGs grow.

Negative

  • We now own a cache directory. Container runs need a writable volume (the distroless nonroot user must own <cache>). The Dockerfile and deployment must mount one or fall back to /tmp.
  • Channel id matching is heuristic; some channels will lack EPG data even when both sides "know about" them.
  • Behaviour shift from live proxy to cached playlist is a semantic change — if a stream endpoint inside the playlist is rotated by upstream, devices see the old URL until the next refresh cycle.

Alternatives considered

  1. Keep the playlist as a live reverse proxy and rewrite on the fly. Simpler in spirit but means re-fetching ~10 MB on every device poll, and we still need a periodic playlist fetch for EPG filtering — so we end up doing the work twice. Rejected.
  2. Skip channel-id normalization; serve the EPG verbatim and ask the user to fix client-side. Defeats the purpose of "set it and forget it". Rejected.
  3. In-memory merged EPG. A 50–200 MB string in memory is ugly for a service that otherwise has a tiny footprint, and it loses the ability to serve cleanly across restarts. Rejected.
  4. Re-emit EPG as gzip on disk. Saves disk and bandwidth but complicates If-Modified-Since and conditional requests. Defer until measured.

Open questions

None outstanding.