# ASCII TTS Example — "Warm Machine Oracle"

A self-contained Python demo (mode #6: **TTS narration ASCII video**) that:

1. Synthesizes a short narration locally with `espeak-ng` (no API keys).
2. Renders a colored ASCII-glyph grid reactive to the narration's energy.
3. Muxes everything into an MP4 with `ffmpeg`.

The output sits at `output/tts_ascii_example.mp4` — about 10–12 seconds at
960×540, 24 fps.

## Creative concept

Mood: **warm machine oracle** — a glowing quote being typed into a living
terminal.

Visual arc:
- **Amber data fog.** Quiet, low-frequency noise field in amber. The
  terminal "hums".
- **Cyan/purple signal rings.** Concentric rings ripple outward from the
  centre, modulated by the narration's RMS and transients.
- **Punctuation star-map iris.** Around the line *"small text can hold a
  whole universe"*, the background briefly irises into a star map built
  from ``.,'`*`+`oO`` glyphs.
- **Title card resolve.** A glowing title settles into place: *WARM
  MACHINE ORACLE — ascii / tts / signal*.

A typewriter quote overlay types the line in sync with the narration timing,
backed by a soft scanline/grain/bloom CRT-ish postprocess.

## Requirements

The script expects these locally available binaries and Python packages
(installed system-wide in this environment):

- `espeak-ng` (or `espeak` as fallback) — local TTS, no API keys.
- `ffmpeg`, `ffprobe` — encode + verify.
- Python 3.10+, `numpy`, `Pillow`. `scipy` is **not** required.

No `pip install` step is performed — the script will use what is present.

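
A quick preflight check along these lines can confirm the requirements before a render. This is a minimal sketch, not part of the script: the helper names `find_tts_binary` and `missing_requirements` are hypothetical, and it only tests for presence on `PATH` and importability, not versions.

```python
import importlib.util
import shutil


def find_tts_binary(candidates=("espeak-ng", "espeak")):
    """Return the path of the first TTS binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def missing_requirements(binaries=("ffmpeg", "ffprobe"), modules=("numpy", "PIL")):
    """List required binaries and importable modules that are absent."""
    missing = [b for b in binaries if shutil.which(b) is None]
    # Pillow is imported as "PIL", so that is the module name probed here.
    missing += [m for m in modules if importlib.util.find_spec(m) is None]
    if find_tts_binary() is None:
        missing.append("espeak-ng (or espeak)")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    print("all requirements present" if not problems else f"missing: {problems}")
```

An empty list means every binary and module was found.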
## Run

```sh
cd /root/src/ascii-tts-example
python3 render_tts_ascii.py
```

You should see numbered progress lines for each pipeline stage. On success
the script prints `[OK] ...tts_ascii_example.mp4 — video+audio, ~11s` and
exits 0.

## Output

```
output/tts_ascii_example.mp4     final video (H.264 + AAC, 960x540, 24fps)
_tmp/narration_raw.wav           raw espeak-ng synthesis
_tmp/narration_pad.wav           padded/trimmed audio used for mux
_tmp/frames/f_00000.png ...      individual rendered frames
_tmp/logs/*.log                  full stdout+stderr for every subprocess
```

The `_tmp/` directory is safe to delete after the render.

## How it works (high level)

- **TTS** — `espeak-ng -v en+m3 -s 148 -p 38` synthesizes a slightly slow,
  lower-pitched male voice for the "oracle" feel.
- **Audio features** — the WAV is decoded with `wave`, converted to mono
  float32, and per-video-frame RMS + transient (positive-rectified energy
  diff with exponential decay) signals are computed. These drive intensity
  and a centre-radiating pulse.
- **Field synthesis** — three numpy fields are summed with trapezoidal
  section weights: amber fog (sum of low-frequency sinusoids), cyan/purple
  rings (`sin(r·k - t·ω)` with falloff), and a sparse twinkling star map.
- **Glyph grid** — character sprites are pre-rasterised into an
  `(n_chars, cell_h, cell_w)` uint8 alpha atlas. Each frame indexes into
  the atlas with vectorised numpy lookups (`sprites[grid_indices]`), and the
  result is transposed and reshaped into a single full-canvas alpha image —
  no Python-level per-cell drawing loop.
- **Per-cell colour** — each section contributes a weighted colour
  (amber/teal-purple/hot-amber), normalised by the active section weights
  and multiplied by intensity.
- **Adaptive tonemap** — 96th-percentile normalisation keeps frames from
  flattening when the field's overall energy is low.
- **Postprocess** — alternate-row scanlines, additive monochrome grain,
  cheap "bloom" via a Gaussian-blurred bright channel, and a soft vignette.
- **Overlays** — typewriter quote (with blinking cursor and a glowing
  blurred copy behind the sharp text) and a final title card, both drawn
  on top of the postprocessed image.
- **Font / palette safety** — preferred glyph palettes use Unicode block
  and symbol characters (`▒▓█●★`); each is probed against the chosen font
  via `font.getmask(ch)` and ASCII-safe fallbacks are used if any are
  missing.

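
The **TTS** step above could be invoked roughly as follows. This is a sketch under assumptions: the voice, speed and pitch flags come from the text, but the helper names (`build_tts_command`, `synthesize_narration`) and the use of `-w` to write the WAV are illustrative and may differ from what the script's `generate_tts()` does.

```python
import shutil
import subprocess


def build_tts_command(text, wav_path, binary="espeak-ng"):
    """Build the argument list; -w tells espeak-ng to write a WAV file."""
    return [binary, "-v", "en+m3", "-s", "148", "-p", "38",
            "-w", str(wav_path), text]


def synthesize_narration(text, wav_path, log_path):
    """Run espeak-ng (or espeak as fallback) with all output sent to a log."""
    binary = shutil.which("espeak-ng") or shutil.which("espeak")
    if binary is None:
        raise RuntimeError("no espeak-ng/espeak binary found on PATH")
    with open(log_path, "w") as log:
        subprocess.run(build_tts_command(text, wav_path, binary),
                       stdin=subprocess.DEVNULL, stdout=log,
                       stderr=subprocess.STDOUT, check=True)
```

Passing the command as a list avoids shell quoting issues with punctuation in the narration text.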
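
The **Audio features** step can be illustrated with a small sketch. It assumes 16-bit PCM input (what `espeak-ng` normally writes); the `decay` constant and both function names are hypothetical choices for illustration.

```python
import wave

import numpy as np


def load_mono_float32(wav_path):
    """Decode a 16-bit PCM WAV into mono float32 samples in [-1, 1)."""
    with wave.open(str(wav_path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples, rate


def frame_features(samples, rate, fps=24, decay=0.8):
    """Return per-video-frame RMS and a decaying transient envelope."""
    hop = rate / fps  # audio samples per video frame (may be fractional)
    n_frames = max(1, int(np.ceil(len(samples) / hop)))
    rms = np.zeros(n_frames, dtype=np.float32)
    for i in range(n_frames):
        chunk = samples[int(i * hop):int((i + 1) * hop)]
        if chunk.size:
            rms[i] = np.sqrt(np.mean(chunk ** 2))
    # Positive-rectified frame-to-frame difference: only rises in energy count.
    onset = np.maximum(np.diff(rms, prepend=rms[0]), 0.0)
    transient = np.zeros_like(rms)
    level = 0.0
    for i, o in enumerate(onset):
        level = max(float(o), level * decay)  # jump on onsets, then decay
        transient[i] = level
    return rms, transient
```

The loops run once per video frame, not per sample, so a clip of roughly 11 seconds needs only a few hundred iterations.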
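
For the **Field synthesis** step, the following sketch shows one way to blend a fog field and a ring field with trapezoidal weights. All frequencies, section times and ramp lengths are made-up values, and the star map is omitted; only the `sin(r·k - t·ω)` ring form and the trapezoid idea come from the text.

```python
import numpy as np


def trapezoid(t, start, end, ramp):
    """Weight that ramps 0 to 1 after `start`, holds, and ramps down to 0 at `end`."""
    rise = (t - start) / ramp
    fall = (end - t) / ramp
    return float(np.clip(min(rise, fall), 0.0, 1.0))


def fog_field(cols, rows, t):
    """Sum of two low-frequency sinusoids, kept within [0, 1]."""
    y, x = np.mgrid[0:rows, 0:cols].astype(np.float32)
    return 0.5 + 0.25 * np.sin(0.11 * x + 0.3 * t) + 0.25 * np.sin(0.07 * y - 0.2 * t)


def ring_field(cols, rows, t, k=0.9, omega=4.0, falloff=0.05):
    """Concentric rings sin(r*k - t*omega), attenuated away from the centre."""
    y, x = np.mgrid[0:rows, 0:cols].astype(np.float32)
    r = np.hypot(x - (cols - 1) / 2.0, y - (rows - 1) / 2.0)
    return np.sin(r * k - t * omega) * np.exp(-falloff * r)


def mix_fields(t, cols=80, rows=30):
    """Blend the sections; overlapping trapezoids give smooth cross-fades."""
    w_fog = trapezoid(t, 0.0, 5.0, 1.0)    # assumed section: 0 s to 5 s
    w_ring = trapezoid(t, 3.0, 9.0, 1.0)   # assumed section: 3 s to 9 s
    return w_fog * fog_field(cols, rows, t) + w_ring * ring_field(cols, rows, t)
```

Because each weight is a scalar per frame, the whole blend stays a handful of array operations regardless of grid size.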
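
The **Glyph grid** lookup is easiest to see on a toy atlas. The sketch below assumes the `(n_chars, cell_h, cell_w)` layout described above; `compose_alpha` is a hypothetical name, and the two "glyphs" are just an empty and a fully opaque cell.

```python
import numpy as np


def compose_alpha(sprites, grid_indices):
    """Tile per-cell sprites into one (rows*cell_h, cols*cell_w) alpha image."""
    rows, cols = grid_indices.shape
    _, cell_h, cell_w = sprites.shape
    cells = sprites[grid_indices]  # fancy indexing: (rows, cols, cell_h, cell_w)
    # Bring each cell's pixel rows next to its grid row, then flatten the pairs.
    return cells.transpose(0, 2, 1, 3).reshape(rows * cell_h, cols * cell_w)


# Toy atlas: glyph 0 is empty, glyph 1 is fully opaque; cells are 2x3 pixels.
sprites = np.stack([np.zeros((2, 3), np.uint8), np.full((2, 3), 255, np.uint8)])
grid = np.array([[0, 1],
                 [1, 0]])
canvas = compose_alpha(sprites, grid)
print(canvas.shape)  # (4, 6)
```

The transpose matters: reshaping `(rows, cols, cell_h, cell_w)` directly would interleave pixels from different cells instead of laying the cells out side by side.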
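
The **Adaptive tonemap** idea can be sketched in a few lines. Only the 96th percentile comes from the text; the `floor` guard against all-dark frames and the clipping to `[0, 1]` are assumptions added so the example is safe to run.

```python
import numpy as np


def tonemap(intensity, percentile=96.0, floor=1e-3):
    """Scale a frame so its 96th-percentile value maps to 1.0, then clip."""
    ref = max(float(np.percentile(intensity, percentile)), floor)
    return np.clip(intensity / ref, 0.0, 1.0)


# A dim frame (peak 0.1) is stretched to use the full range.
dim = np.linspace(0.0, 0.1, 101)
bright = tonemap(dim)
```

Using a high percentile rather than the maximum keeps a few hot cells from dictating the scale of the whole frame.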
## ffmpeg pipeline notes

The script never pipes raw frames to ffmpeg's stdin. Frames are written as
PNGs and ffmpeg reads them via the `f_%05d.png` image-sequence pattern. All
ffmpeg/ffprobe invocations redirect both stdout and stderr to log files in
`_tmp/logs/`, so no pipe can fill up and deadlock a subprocess. The audio
is pre-padded (or trimmed) to exactly match the video duration in Python,
so the mux command is a simple `-map 0:v -map 1:a -t <dur>` with no
filter graph.
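
The pre-padding described above needs only the standard library. The following is a minimal sketch assuming signed 16-bit PCM, where zero bytes are silence; `pad_or_trim_wav` is a hypothetical helper name.

```python
import wave


def pad_or_trim_wav(src_path, dst_path, duration_s):
    """Write a copy of src_path that lasts exactly duration_s (whole samples)."""
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        data = src.readframes(src.getnframes())
    bytes_per_frame = params.sampwidth * params.nchannels
    target_frames = int(round(duration_s * params.framerate))
    target_bytes = target_frames * bytes_per_frame
    if len(data) < target_bytes:
        # Zero bytes are silence for signed 16-bit PCM (assumed input format).
        data += b"\x00" * (target_bytes - len(data))
    else:
        data = data[:target_bytes]
    with wave.open(str(dst_path), "wb") as dst:
        dst.setparams(params)   # the frame count in the header is fixed on close
        dst.writeframes(data)
    return target_frames
```

For example, `pad_or_trim_wav("_tmp/narration_raw.wav", "_tmp/narration_pad.wav", 11.0)` would produce the padded file listed in the Output section.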
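
A mux step in the spirit of these notes might look like the sketch below. The `-map` and `-t` options and the `f_%05d.png` pattern come from the text; the encoder choices (`libx264`, `yuv420p`, `aac`) and the helper names are assumptions consistent with the H.264 + AAC output, not a copy of the script's command.

```python
import subprocess
from pathlib import Path


def build_mux_command(frames_dir, audio_path, out_path, fps=24, duration_s=11.0):
    """Argument list for muxing a PNG sequence with a pre-padded WAV."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", str(Path(frames_dir) / "f_%05d.png"),  # printf-style sequence
        "-i", str(audio_path),
        "-map", "0:v", "-map", "1:a",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",    # assumed encoder settings
        "-c:a", "aac",
        "-t", f"{duration_s:.3f}",
        str(out_path),
    ]


def run_logged(cmd, log_path):
    """Run cmd with stdout+stderr in a log file, so no pipe can fill up."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdin=subprocess.DEVNULL, stdout=log,
                              stderr=subprocess.STDOUT, check=False)
    return proc.returncode
```

A non-zero return code from `run_logged` points at the matching file under `_tmp/logs/` for the full ffmpeg output.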
## Tuning

- **Narration text** — edit `NARRATION` near the top of the script.
- **Duration window** — `DUR_MIN`/`DUR_MAX` clamp the final video length.
- **Voice** — change the `-s` (speed), `-p` (pitch), `-v` (voice) arguments
  to `espeak-ng` inside `generate_tts()`.
- **Look** — palette colours (`C_AMBER`, `C_TEAL`, etc.) and glyph palettes
  (`PALETTE_*`) live at the top of the file.
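
To make the **Duration window** knob concrete, here is a sketch of how such a clamp could work. The values of `DUR_MIN` and `DUR_MAX` are guesses based on the 10 to 12 second output length, and `tail_s` and `clamp_duration` are hypothetical names; check the script for the real constants.

```python
DUR_MIN = 10.0  # assumed value, in seconds
DUR_MAX = 12.0  # assumed value, in seconds


def clamp_duration(narration_s, tail_s=1.0, fps=24):
    """Clamp narration length plus a tail to the window, snapped to whole frames."""
    dur = min(max(narration_s + tail_s, DUR_MIN), DUR_MAX)
    n_frames = int(round(dur * fps))
    return n_frames / fps, n_frames


print(clamp_duration(3.0))   # (10.0, 240): short narration is padded up
print(clamp_duration(20.0))  # (12.0, 288): long narration is cut off
```

Snapping to a whole number of frames keeps the video and the padded audio the same length.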