# ASCII TTS Example — "Warm Machine Oracle"

A self-contained Python demo (mode #6: **TTS narration ASCII video**) that:

1. Synthesizes a short narration locally with `espeak-ng` (no API keys).
2. Renders a colored ASCII-glyph grid reactive to the narration's energy.
3. Muxes everything into an MP4 with `ffmpeg`.

The output is written to `output/tts_ascii_example.mp4`: about 10–12
seconds at 960×540, 24 fps.

## Creative concept

Mood: **warm machine oracle** — a glowing quote being typed into a living
terminal.

Visual arc:
- **Amber data fog.** Quiet, low-frequency noise field in amber. The
  terminal "hums".
- **Cyan/purple signal rings.** Concentric rings ripple outward from the
  centre, modulated by the narration's RMS and transients.
- **Punctuation star-map iris.** Around the line *"small text can hold a
  whole universe"*, the background briefly irises into a star map built
  from `.,'*+oO` glyphs.
- **Title card resolve.** A glowing title settles into place: *WARM
  MACHINE ORACLE — ascii / tts / signal*.

A typewriter quote overlay types the line in sync with the narration timing,
backed by a soft scanline/grain/bloom CRT-ish postprocess.

## Requirements

The script expects these locally available binaries and Python packages
(installed system-wide in this environment):

- `espeak-ng` (or `espeak` as fallback) — local TTS, no API keys.
- `ffmpeg`, `ffprobe` — encode + verify.
- Python 3.10+, `numpy`, `Pillow`. `scipy` is **not** required.

No `pip install` step is performed — the script will use what is present.
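
A preflight check for these tools could look roughly like the sketch below, which relies on the standard-library `shutil.which` lookup. The helper names `pick_tts_binary` and `missing_tools` are hypothetical and are not taken from the script.

```python
import shutil


def pick_tts_binary(which=shutil.which):
    """Return the first available TTS binary, preferring espeak-ng."""
    for name in ("espeak-ng", "espeak"):
        if which(name):
            return name
    raise RuntimeError("neither espeak-ng nor espeak found on PATH")


def missing_tools(which=shutil.which, tools=("ffmpeg", "ffprobe")):
    """List the required encoder tools that are not on PATH."""
    return [name for name in tools if which(name) is None]
```

Passing `which` as a parameter keeps the check easy to exercise on a machine where the real binaries are not installed.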

## Run

```sh
cd /root/src/ascii-tts-example
python3 render_tts_ascii.py
```

You should see numbered progress lines for each pipeline stage. On success
the script prints `[OK] ...tts_ascii_example.mp4 — video+audio, ~11s` and
exits 0.

## Output

```
output/tts_ascii_example.mp4   final video (H.264 + AAC, 960x540, 24fps)
_tmp/narration_raw.wav         raw espeak-ng synthesis
_tmp/narration_pad.wav         padded/trimmed audio used for mux
_tmp/frames/f_00000.png ...    individual rendered frames
_tmp/logs/*.log                full stdout+stderr for every subprocess
```

The `_tmp/` directory is safe to delete after the render.

## How it works (high level)

- **TTS** — `espeak-ng -v en+m3 -s 148 -p 38` synthesizes a slightly slow,
  lower-pitched male voice for the "oracle" feel (the `espeak-ng` defaults
  are 175 words per minute and pitch 50).
- **Audio features** — the WAV is decoded with `wave`, converted to mono
  float32, and per-video-frame RMS + transient (positive-rectified energy
  diff with exponential decay) signals are computed. These drive intensity
  and a centre-radiating pulse.
- **Field synthesis** — three numpy fields are summed with trapezoidal
  section weights: amber fog (sum of low-frequency sinusoids), cyan/purple
  rings (`sin(r·k - t·ω)` with falloff), and a sparse twinkling star map.
- **Glyph grid** — character sprites are pre-rasterised into an
  `(n_chars, cell_h, cell_w)` uint8 alpha atlas. Each frame indexes into
  the atlas with vectorised numpy lookups (`sprites[grid_indices]`), then
  transpose/reshape into a single full-canvas alpha image — no Python-level
  per-cell drawing loop.
- **Per-cell colour** — each section contributes a weighted colour
  (amber/teal-purple/hot-amber), normalised by the active section weights
  and multiplied by intensity.
- **Adaptive tonemap** — 96th-percentile normalisation keeps frames from
  flattening when the field's overall energy is low.
- **Postprocess** — alternate-row scanlines, additive monochrome grain,
  cheap "bloom" via Gaussian-blurred bright channel, and a soft vignette.
- **Overlays** — typewriter quote (with blinking cursor and a glowing
  blurred copy behind the sharp text) and a final title card, both drawn
  on top of the postprocessed image.
- **Font / palette safety** — preferred glyph palettes use Unicode block
  and symbol characters (`▒▓█●★`); each is probed against the chosen font via
  `font.getmask(ch)` and ASCII-safe fallbacks are used if any are missing.
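
Two of the steps above are compact enough to sketch in numpy: the per-frame audio features and the vectorised glyph-atlas lookup. This is an illustration under assumptions, not the script's actual code. The function names `frame_features` and `compose_alpha`, the `decay` constant, and the integer hop size are all hypothetical.

```python
import numpy as np


def frame_features(samples, sample_rate, fps=24, decay=0.85):
    """Per-video-frame RMS and a decaying transient envelope.

    `samples` is mono float32 in [-1, 1]; `decay` is an assumed constant.
    """
    hop = sample_rate // fps          # audio samples per video frame (integer hop,
                                      # the small rounding drift is ignored here)
    n_frames = len(samples) // hop
    windows = samples[: n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    # Positive-rectified energy difference: only rises in energy count.
    rise = np.maximum(np.diff(rms, prepend=rms[:1]), 0.0)
    transient = np.zeros_like(rms)
    level = 0.0
    for i, r in enumerate(rise.tolist()):   # one step per video frame
        level = max(r, level * decay)       # jump on a rise, otherwise decay
        transient[i] = level
    return rms, transient


def compose_alpha(sprites, grid_indices):
    """Tile an (n_chars, cell_h, cell_w) atlas into one full-canvas alpha image."""
    rows, cols = grid_indices.shape
    _, cell_h, cell_w = sprites.shape
    cells = sprites[grid_indices]     # (rows, cols, cell_h, cell_w)
    # Reorder to (rows, cell_h, cols, cell_w) so the reshape tiles the cells.
    return cells.transpose(0, 2, 1, 3).reshape(rows * cell_h, cols * cell_w)
```

The transpose places each cell's pixel rows directly under its grid row, so the final reshape lays the cells out left to right and top to bottom without a per-cell loop.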

## ffmpeg pipeline notes

The script never pipes raw frames to ffmpeg's stdin. Frames are written as
PNGs and ffmpeg reads them via the `f_%05d.png` sequence pattern. All
ffmpeg/ffprobe invocations redirect both stdout and stderr to log files in
`_tmp/logs/`, so no pipe buffer can fill up and deadlock a subprocess. The audio
is pre-padded (or trimmed) to exactly match the video duration in Python,
so the mux command is a simple `-map 0:v -map 1:a -t <dur>` with no
filter graph.
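
The mux step described above might be assembled along these lines. This is a hedged sketch: `build_mux_cmd` and `run_logged` are hypothetical helper names, and the encoder flags (`libx264`, `yuv420p`, `aac`) are assumptions chosen to match the H.264 + AAC output, not a copy of the script's command.

```python
import subprocess
from pathlib import Path


def build_mux_cmd(frames_dir, audio_path, out_path, fps, duration):
    """Assemble an ffmpeg argument list for PNG sequence + WAV -> MP4."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", str(Path(frames_dir) / "f_%05d.png"),  # numbered image sequence
        "-i", str(audio_path),
        "-map", "0:v", "-map", "1:a",
        "-t", f"{duration:.3f}",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",     # assumed encoder settings
        "-c:a", "aac",
        str(out_path),
    ]


def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr both written to log_path."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "wb") as log:
        proc = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                              stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

Because both output streams go straight to a file, the parent process never has to drain a pipe while the child is running.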

## Tuning

- **Narration text** — edit `NARRATION` near the top of the script.
- **Duration window** — `DUR_MIN`/`DUR_MAX` clamp the final video length.
- **Voice** — change the `-s` (speed), `-p` (pitch), `-v` (voice) arguments
  to `espeak-ng` inside `generate_tts()`.
- **Look** — palette colours (`C_AMBER`, `C_TEAL`, etc.) and glyph palettes
  (`PALETTE_*`) live at the top of the file.