2026-05-15 01:23:51 -04:00

# ASCII TTS Example — "Warm Machine Oracle"
A self-contained Python demo (mode #6: **TTS narration ASCII video**) that:
1. Synthesizes a short narration locally with `espeak-ng` (no API keys).
2. Renders a colored ASCII-glyph grid reactive to the narration's energy.
3. Muxes everything into an MP4 with `ffmpeg`.
The output sits at `output/tts_ascii_example.mp4`: about 10–12 seconds at
960×540, 24 fps.
## Creative concept
Mood: **warm machine oracle** — a glowing quote being typed into a living
terminal.
Visual arc:
- **Amber data fog.** Quiet, low-frequency noise field in amber. The
terminal "hums".
- **Cyan/purple signal rings.** Concentric rings ripple outward from the
centre, modulated by the narration's RMS and transients.
- **Punctuation star-map iris.** Around the line *"small text can hold a
whole universe"*, the background briefly irises into a star map built
from `` .,'`*`+`oO `` glyphs.
- **Title card resolve.** A glowing title settles into place: *WARM
MACHINE ORACLE — ascii / tts / signal*.
A typewriter quote overlay types the line in sync with the narration timing,
backed by a soft scanline/grain/bloom CRT-ish postprocess.
## Requirements
The script expects these locally available binaries and Python packages
(installed system-wide in this environment):
- `espeak-ng` (or `espeak` as fallback) — local TTS, no API keys.
- `ffmpeg`, `ffprobe` — encode + verify.
- Python 3.10+, `numpy`, `Pillow`. `scipy` is **not** required.
No `pip install` step is performed — the script will use what is present.
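
Since nothing is installed automatically, it can help to check the environment before rendering. The following is a minimal sketch of such a pre-flight check, not code taken from the script; the function names `find_tts_binary` and `missing_requirements` are hypothetical.

```python
import importlib.util
import shutil


def find_tts_binary():
    """Return the first TTS binary found on PATH, or None if neither exists."""
    for name in ("espeak-ng", "espeak"):
        if shutil.which(name):
            return name
    return None


def missing_requirements():
    """Return a list of missing tools and packages as readable strings."""
    missing = []
    if find_tts_binary() is None:
        missing.append("espeak-ng (or espeak)")
    for tool in ("ffmpeg", "ffprobe"):
        if shutil.which(tool) is None:
            missing.append(tool)
    # Pillow is imported as "PIL", so probe the import name, not the package name.
    for module in ("numpy", "PIL"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("missing: " + ", ".join(problems))
    else:
        print("all requirements present")
```

An empty list means the render should be able to start; anything else names what to install first.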
## Run
```sh
cd /root/src/ascii-tts-example
python3 render_tts_ascii.py
```
You should see numbered progress lines for each pipeline stage. On success
the script prints `[OK] ...tts_ascii_example.mp4 — video+audio, ~11s` and
exits 0.
## Output
```
output/tts_ascii_example.mp4 final video (H.264 + AAC, 960x540, 24fps)
_tmp/narration_raw.wav raw espeak-ng synthesis
_tmp/narration_pad.wav padded/trimmed audio used for mux
_tmp/frames/f_00000.png ... individual rendered frames
_tmp/logs/*.log full stdout+stderr for every subprocess
```
The `_tmp/` directory is safe to delete after the render.
## How it works (high level)
- **TTS** — `espeak-ng -v en+m3 -s 148 -p 38` synthesizes a slightly slow,
lower-pitched male voice for the "oracle" feel.
- **Audio features** — the WAV is decoded with `wave`, converted to mono
float32, and per-video-frame RMS + transient (positive-rectified energy
diff with exponential decay) signals are computed. These drive intensity
and a centre-radiating pulse.
- **Field synthesis** — three numpy fields are summed with trapezoidal
section weights: amber fog (sum of low-frequency sinusoids), cyan/purple
rings (`sin(r·k - t·ω)` with falloff), and a sparse twinkling star map.
- **Glyph grid** — character sprites are pre-rasterised into an
`(n_chars, cell_h, cell_w)` uint8 alpha atlas. Each frame indexes into
the atlas with a vectorised numpy lookup (`sprites[grid_indices]`), then
transposes and reshapes the result into a single full-canvas alpha image,
with no Python-level per-cell drawing loop.
- **Per-cell colour** — each section contributes a weighted colour
(amber/teal-purple/hot-amber), normalised by the active section weights
and multiplied by intensity.
- **Adaptive tonemap** — 96th-percentile normalisation keeps frames from
flattening when the field's overall energy is low.
- **Postprocess** — alternate-row scanlines, additive monochrome grain,
cheap "bloom" via Gaussian-blurred bright channel, and a soft vignette.
- **Overlays** — typewriter quote (with blinking cursor and a glowing
blurred copy behind the sharp text) and a final title card, both drawn
on top of the postprocessed image.
- **Font / palette safety** — preferred glyph palettes use Unicode block
characters (`▒▓█●★`); each is probed against the chosen font via
`font.getmask(ch)` and ASCII-safe fallbacks are used if any are missing.
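
Two of the steps above, the per-frame audio features and the atlas lookup, can be sketched in a few lines of numpy. This is an illustration under stated assumptions, not the script's actual code: the names `frame_features` and `compose_alpha`, the `decay` constant, and the peak-hold form of the transient envelope are all hypothetical choices.

```python
import numpy as np


def frame_features(samples, sample_rate, fps=24, decay=0.8):
    """Per-video-frame RMS and a decaying transient envelope.

    `samples` is mono float audio in [-1, 1]; `decay` is an assumed constant.
    """
    hop = sample_rate // fps                      # audio samples per video frame
    n_frames = len(samples) // hop
    windows = samples[: n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    # Positive-rectified difference: only rises in energy count as onsets.
    rise = np.maximum(np.diff(rms, prepend=rms[:1]), 0.0)
    transient = np.zeros_like(rms)
    level = 0.0
    for i, r in enumerate(rise):                  # one step per frame, not per sample
        level = max(r, level * decay)             # jump on onset, decay otherwise
        transient[i] = level
    return rms, transient


def compose_alpha(sprites, grid_indices):
    """Tile an (n_chars, cell_h, cell_w) atlas into one full-canvas alpha image."""
    rows, cols = grid_indices.shape
    _, cell_h, cell_w = sprites.shape
    cells = sprites[grid_indices]                 # (rows, cols, cell_h, cell_w)
    # Reorder to (rows, cell_h, cols, cell_w) so each cell's pixel rows line up
    # with its grid row, then flatten to a 2-D canvas.
    return cells.transpose(0, 2, 1, 3).reshape(rows * cell_h, cols * cell_w)
```

The transpose is the step that is easy to get wrong: reshaping `cells` directly would interleave pixels from different cells, while moving the `cell_h` axis next to `rows` first keeps every sprite contiguous on the canvas.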
## ffmpeg pipeline notes
The script never pipes raw frames to ffmpeg's stdin. Frames are written as
PNGs and ffmpeg reads them via the `f_%05d.png` sequence pattern. All ffmpeg/ffprobe
invocations redirect both stdout and stderr to log files in `_tmp/logs/`,
so there are no half-filled pipes or stderr-deadlock conditions. The audio
is pre-padded (or trimmed) to exactly match the video duration in Python,
so the mux command is a simple `-map 0:v -map 1:a -t <dur>` with no
filter graph.
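
The approach described above could look roughly like the sketch below. The helper names (`pad_or_trim`, `mux_command`, `run_logged`) are hypothetical, and the codec flags (`libx264`, `yuv420p`, `aac`) are assumptions chosen to match the stated H.264 + AAC output rather than the script's exact command line.

```python
import subprocess
from pathlib import Path

import numpy as np


def pad_or_trim(samples, sample_rate, duration_s):
    """Return `samples` cut or zero-padded to exactly `duration_s` seconds."""
    target = int(round(duration_s * sample_rate))
    if len(samples) >= target:
        return samples[:target]
    return np.pad(samples, (0, target - len(samples)))  # pads with zeros (silence)


def mux_command(frames_dir, audio_path, out_path, fps, duration_s):
    """Build the ffmpeg argument list; codec flags here are assumptions."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps), "-i", str(Path(frames_dir) / "f_%05d.png"),
        "-i", str(audio_path),
        "-map", "0:v", "-map", "1:a", "-t", f"{duration_s:.3f}",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
        str(out_path),
    ]


def run_logged(cmd, log_path):
    """Run `cmd` with stdout and stderr written to a log file, never a pipe."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "wb") as log:
        completed = subprocess.run(
            cmd, stdout=log, stderr=subprocess.STDOUT, stdin=subprocess.DEVNULL
        )
    return completed.returncode
```

Because the child process writes straight to a file, nothing in Python has to drain a pipe, which is what removes the deadlock risk when ffmpeg produces a lot of stderr output.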
## Tuning
- **Narration text** — edit `NARRATION` near the top of the script.
- **Duration window** — `DUR_MIN`/`DUR_MAX` clamp the final video length.
- **Voice** — change the `-s` (speed), `-p` (pitch), `-v` (voice) arguments
to `espeak-ng` inside `generate_tts()`.
- **Look** — palette colours (`C_AMBER`, `C_TEAL`, etc.) and glyph palettes
(`PALETTE_*`) live at the top of the file.
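
As an illustration of how a duration window like `DUR_MIN`/`DUR_MAX` can determine the frame count, here is a minimal sketch. The values 10.0 and 12.0 are assumptions based on the stated 10–12 second output, and `clamp_duration` and `frame_count` are hypothetical names, not functions known to exist in the script.

```python
import math

DUR_MIN = 10.0   # assumed value, in seconds
DUR_MAX = 12.0   # assumed value, in seconds
FPS = 24


def clamp_duration(narration_s, lo=DUR_MIN, hi=DUR_MAX):
    """Clamp the narration length into the allowed video duration window."""
    return min(max(narration_s, lo), hi)


def frame_count(duration_s, fps=FPS):
    """Number of frames needed to cover `duration_s` at `fps`."""
    return math.ceil(duration_s * fps)


if __name__ == "__main__":
    duration = clamp_duration(8.3)       # a short narration is stretched to DUR_MIN
    print(duration, frame_count(duration))
```

Under these assumptions, a narration shorter than the window is padded with silence up to `DUR_MIN`, and a longer one is trimmed to `DUR_MAX`.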