ASCII TTS Example — "Warm Machine Oracle"
A self-contained Python demo (mode #6: TTS narration ASCII video) that:
- Synthesizes a short narration locally with espeak-ng (no API keys).
- Renders a colored ASCII-glyph grid that reacts to the narration's energy.
- Muxes everything into an MP4 with ffmpeg.
The output sits at output/tts_ascii_example.mp4 — about 10–12 seconds at
960×540, 24 fps.
Creative concept
Mood: warm machine oracle — a glowing quote being typed into a living terminal.
Visual arc:
- Amber data fog. Quiet, low-frequency noise field in amber. The terminal "hums".
- Cyan/purple signal rings. Concentric rings ripple outward from the centre, modulated by the narration's RMS and transients.
- Punctuation star-map iris. Around the line "small text can hold a whole universe", the background briefly irises into a star map built from the .,'*+oO` glyphs.
- Title card resolve. A glowing title settles into place: WARM MACHINE ORACLE — ascii / tts / signal.
A typewriter quote overlay types the line in sync with the narration timing, backed by a soft scanline/grain/bloom CRT-ish postprocess.
Requirements
The script expects these locally available binaries and Python packages (installed system-wide in this environment):
- espeak-ng (or espeak as a fallback): local TTS, no API keys.
- ffmpeg, ffprobe: encode and verify.
- Python 3.10+ with numpy and Pillow (scipy is not required).
No pip install step is performed — the script will use what is present.
Run
cd /root/src/ascii-tts-example
python3 render_tts_ascii.py
You should see numbered progress lines for each pipeline stage. On success
the script prints a final status line ([OK] ...tts_ascii_example.mp4 — video+audio, ~11s)
and exits with status 0.
Output
output/tts_ascii_example.mp4 final video (H.264 + AAC, 960x540, 24fps)
_tmp/narration_raw.wav raw espeak-ng synthesis
_tmp/narration_pad.wav padded/trimmed audio used for mux
_tmp/frames/f_00000.png ... individual rendered frames
_tmp/logs/*.log full stdout+stderr for every subprocess
The _tmp/ directory is safe to delete after the render.
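The Requirements list says ffprobe is used to verify the result. The exact check the script performs is not described, so the following is only a minimal sketch, assuming a pass means "has a video stream, has an audio stream, and the container duration is close to the expected length". probe_command and check_probe are hypothetical helper names, and the 0.25 s tolerance is an arbitrary choice.

```python
import json


def probe_command(path: str) -> list[str]:
    """ffprobe invocation that reports stream types and container duration as JSON."""
    return ["ffprobe", "-v", "error",
            "-show_entries", "format=duration:stream=codec_type",
            "-of", "json", str(path)]


def check_probe(probe_json: str, expected_s: float, tol: float = 0.25) -> bool:
    """True if the probe shows video + audio streams and a duration within tol seconds."""
    info = json.loads(probe_json)
    kinds = {s.get("codec_type") for s in info.get("streams", [])}
    # ffprobe reports duration as a string; a missing value becomes NaN and fails the check.
    duration = float(info.get("format", {}).get("duration", "nan"))
    return {"video", "audio"} <= kinds and abs(duration - expected_s) <= tol
```

In keeping with the script's log-file convention, the JSON can be captured by sending ffprobe's stdout to a file under _tmp/logs/ and reading that file back.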
How it works (high level)
- TTS: espeak-ng -v en+m3 -s 148 -p 38 synthesizes a slightly slow, lower-pitched male voice for the "oracle" feel.
- Audio features: the WAV is decoded with wave and converted to mono float32, and two per-video-frame signals are computed: RMS and a transient (the positive-rectified energy difference with exponential decay). These drive intensity and a centre-radiating pulse.
- Field synthesis: three numpy fields are summed using trapezoidal section weights. They are the amber fog (a sum of low-frequency sinusoids), the cyan/purple rings (sin(r·k - t·ω) with falloff), and a sparse twinkling star map.
- Glyph grid: character sprites are pre-rasterised into an (n_chars, cell_h, cell_w) uint8 alpha atlas. Each frame indexes into the atlas with a vectorised numpy lookup (sprites[grid_indices]), then a transpose/reshape turns the result into a single full-canvas alpha image, with no Python-level per-cell drawing loop.
- Per-cell colour: each section contributes a weighted colour (amber, teal-purple, hot amber), normalised by the active section weights and multiplied by intensity.
- Adaptive tonemap: each frame is normalised by its 96th-percentile value, so quiet, low-energy passages still use the full brightness range instead of going flat.
- Postprocess: alternate-row scanlines, additive monochrome grain, a cheap "bloom" from a Gaussian-blurred bright channel, and a soft vignette.
- Overlays: a typewriter quote (with a blinking cursor and a glowing blurred copy behind the sharp text) and a final title card, both drawn on top of the postprocessed image.
- Font / palette safety: the preferred glyph palettes use Unicode shade, block, and symbol characters (▒▓█●★); each is probed against the chosen font via font.getmask(ch), and ASCII-safe fallbacks are used if any are missing.
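The Audio features step (per-frame RMS plus a decaying transient) might look like the sketch below. It assumes the transient uses a peak-hold with exponential decay; the bullet does not say whether the script uses peak-hold or a leaky sum, and decay=0.85 is a hypothetical constant.

```python
import numpy as np


def frame_features(samples: np.ndarray, sr: int, fps: int = 24,
                   decay: float = 0.85) -> tuple[np.ndarray, np.ndarray]:
    """Return per-video-frame RMS and a decaying transient envelope.

    `decay` is a hypothetical per-frame decay factor, not taken from the script.
    """
    # Ceil division in integers, so frame boundaries line up exactly with samples.
    n_frames = (len(samples) * fps + sr - 1) // sr
    rms = np.zeros(n_frames, dtype=np.float32)
    for i in range(n_frames):
        start, end = i * sr // fps, (i + 1) * sr // fps
        chunk = samples[start:end].astype(np.float64)
        if chunk.size:
            rms[i] = np.sqrt(np.mean(chunk ** 2))
    energy = rms.astype(np.float64) ** 2
    # Positive-rectified energy difference: only rising energy counts as a transient.
    onset = np.maximum(np.diff(energy, prepend=energy[:1]), 0.0)
    # Peak-hold with exponential decay: each spike fades by `decay` per frame.
    transient = np.zeros(n_frames, dtype=np.float32)
    level = 0.0
    for i, value in enumerate(onset):
        level = max(float(value), level * decay)
        transient[i] = level
    return rms, transient
```

A peak-hold keeps a single strong consonant visible for several frames, which suits a visual "pulse"; a leaky sum would instead let rapid syllables pile up.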
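The Field synthesis step combines a trapezoidal section weight with fields such as the rings term sin(r·k - t·ω). The sketch below shows one plausible shape for those pieces; every constant (k, omega, falloff, cell_aspect) and the helper names trapezoid, rings_field, and mix are hypothetical, and the fog and star-map fields are left out.

```python
import numpy as np


def trapezoid(t: float, t0: float, t1: float, t2: float, t3: float) -> float:
    """Section weight: 0 before t0, ramps to 1 by t1, holds until t2, back to 0 by t3."""
    rise = (t - t0) / max(t1 - t0, 1e-9)
    fall = (t3 - t) / max(t3 - t2, 1e-9)
    return float(np.clip(min(rise, fall), 0.0, 1.0))


def rings_field(cols: int, rows: int, t: float, pulse: float, k: float = 0.9,
                omega: float = 3.0, falloff: float = 0.06,
                cell_aspect: float = 0.5) -> np.ndarray:
    """Concentric rings sin(r*k - t*omega), faded with distance, values in [0, 1].

    cell_aspect (cell width / cell height) scales x so the rings look round on
    non-square character cells. All constants are hypothetical.
    """
    y, x = np.mgrid[0:rows, 0:cols].astype(np.float32)
    dx = (x - (cols - 1) / 2) * cell_aspect
    dy = y - (rows - 1) / 2
    r = np.sqrt(dx ** 2 + dy ** 2)
    # As t grows, each crest sits at a larger r, so the rings travel outward.
    ripple = 0.5 + 0.5 * np.sin(r * k - t * omega)
    # Exponential falloff with radius; `pulse` (e.g. the transient) boosts brightness.
    return ripple * np.exp(-falloff * r) * np.clip(0.4 + 0.6 * pulse, 0.0, 1.0)


def mix(fields: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Weighted sum of per-section fields."""
    return sum(w * f for w, f in zip(weights, fields))
```

For a given frame time t, each section would get a weight such as trapezoid(t, 3.0, 4.0, 6.0, 7.0) (illustrative timings), and mix([fog, rings, stars], weights) sums them.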
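The Glyph grid and Per-cell colour bullets describe an atlas lookup followed by a transpose/reshape. A minimal sketch of that technique follows; build_atlas, compose_alpha, and colorize are hypothetical names, and drawing each glyph at (0, 0) in its cell is a simplifying assumption.

```python
import numpy as np
from PIL import Image, ImageDraw


def build_atlas(chars: str, font, cell_w: int, cell_h: int) -> np.ndarray:
    """Rasterise each glyph once into an (n_chars, cell_h, cell_w) uint8 alpha atlas."""
    atlas = np.zeros((len(chars), cell_h, cell_w), dtype=np.uint8)
    for i, ch in enumerate(chars):
        img = Image.new("L", (cell_w, cell_h), 0)
        ImageDraw.Draw(img).text((0, 0), ch, fill=255, font=font)
        atlas[i] = np.asarray(img)
    return atlas


def compose_alpha(atlas: np.ndarray, grid_indices: np.ndarray) -> np.ndarray:
    """Turn a (rows, cols) grid of atlas indices into one full-canvas alpha image."""
    rows, cols = grid_indices.shape
    _, cell_h, cell_w = atlas.shape
    tiles = atlas[grid_indices]  # fancy indexing -> (rows, cols, cell_h, cell_w)
    # Reorder to (rows, cell_h, cols, cell_w) so pixel rows are contiguous, then flatten.
    return tiles.transpose(0, 2, 1, 3).reshape(rows * cell_h, cols * cell_w)


def colorize(alpha: np.ndarray, cell_rgb: np.ndarray) -> np.ndarray:
    """Multiply the alpha canvas by a per-cell (rows, cols, 3) colour in [0, 1]."""
    rows, cols, _ = cell_rgb.shape
    cell_h, cell_w = alpha.shape[0] // rows, alpha.shape[1] // cols
    # Upsample each cell colour to its cell_h x cell_w block of pixels.
    rgb = np.repeat(np.repeat(cell_rgb, cell_h, axis=0), cell_w, axis=1)
    return rgb * (alpha[..., None].astype(np.float32) / 255.0)
```

The per-frame cost is then a handful of array operations regardless of how many cells are on screen; only build_atlas loops in Python, and it runs once.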
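The Adaptive tonemap and Postprocess bullets can be approximated as below. This is a sketch under stated assumptions: the bloom threshold, blur radius, scanline factor, grain level, and vignette strength are all hypothetical, and the script may order or combine these steps differently.

```python
import numpy as np
from PIL import Image, ImageFilter


def tonemap(field: np.ndarray, pct: float = 96.0, eps: float = 1e-6) -> np.ndarray:
    """Scale so the pct-th percentile maps to 1.0, then clip to [0, 1]."""
    ref = float(np.percentile(field, pct))
    return np.clip(field / max(ref, eps), 0.0, 1.0)


def bloom(rgb: np.ndarray, threshold: float = 0.6, radius: float = 6.0,
          strength: float = 0.5) -> np.ndarray:
    """Add a Gaussian-blurred copy of the bright parts back onto the image."""
    bright = np.clip((rgb - threshold) / (1.0 - threshold), 0.0, 1.0)
    img = Image.fromarray((bright * 255).astype(np.uint8))
    glow = np.asarray(img.filter(ImageFilter.GaussianBlur(radius)), dtype=np.float32) / 255.0
    return np.clip(rgb + strength * glow, 0.0, 1.0)


def crt_finish(rgb: np.ndarray, rng: np.random.Generator, scan: float = 0.85,
               grain: float = 0.04, vignette: float = 0.35) -> np.ndarray:
    """Scanlines, monochrome grain, and a radial vignette on a float (H, W, 3) image."""
    out = rgb.astype(np.float32).copy()
    out[1::2] *= scan  # darken every other pixel row
    # One noise value per pixel, shared by R, G and B, so the grain is monochrome.
    out += rng.normal(0.0, grain, size=out.shape[:2])[..., None]
    h, w = out.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    r = np.hypot((x - w / 2) / (w / 2), (y - h / 2) / (h / 2))  # 0 at centre
    out *= (1.0 - vignette * np.clip(r, 0.0, 1.0) ** 2)[..., None]
    return np.clip(out, 0.0, 1.0)
```

Because tonemap divides by a percentile rather than the maximum, scaling the whole field down by a constant leaves the tonemapped frame unchanged, which is what keeps quiet sections from going dark.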
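For the typewriter overlay, one simple way to stay "in sync with the narration" is to reveal characters linearly between the time speech starts and the time it ends. The sketch below assumes exactly that; t_start and t_end are hypothetical inputs (for example, the first and last frames where RMS exceeds a threshold), and the "_" cursor glyph and 2 Hz blink rate are illustrative choices.

```python
def typed_text(quote: str, t: float, t_start: float, t_end: float,
               cursor_hz: float = 2.0) -> str:
    """Return the visible prefix of `quote` at time t, plus a blinking cursor."""
    span = t_end - t_start
    if span <= 0:
        progress = 1.0 if t >= t_end else 0.0
    else:
        progress = min(max((t - t_start) / span, 0.0), 1.0)
    # Floor, so a character appears only once its time slot has fully started.
    n_visible = int(progress * len(quote))
    # The cursor is shown for the first half of each 1 / cursor_hz blink period.
    cursor_on = int(t * cursor_hz * 2) % 2 == 0
    return quote[:n_visible] + ("_" if cursor_on else " ")
```

The returned string would be drawn twice, once blurred for the glow and once sharp on top, as the Overlays bullet describes.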
ffmpeg pipeline notes
The script never pipes raw frames to ffmpeg's stdin. Frames are written as
PNGs, and ffmpeg reads them through the f_%05d.png image-sequence pattern.
All ffmpeg/ffprobe invocations redirect both stdout and stderr to log files
in _tmp/logs/, so a child process can never stall writing to a full,
undrained pipe. The audio is padded (or trimmed) in Python to exactly the
video duration, so the mux command is a simple -map 0:v -map 1:a -t <dur>
with no filter graph.
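The pad-or-trim step described above can be sketched like this, assuming mono float samples and a 16-bit mono output WAV (as written to _tmp/narration_pad.wav). fit_to_duration and write_wav_mono16 are hypothetical helper names.

```python
import wave
from pathlib import Path

import numpy as np


def fit_to_duration(samples: np.ndarray, sr: int, duration_s: float) -> np.ndarray:
    """Zero-pad or trim mono audio to exactly round(duration_s * sr) samples."""
    target = int(round(duration_s * sr))
    if len(samples) >= target:
        return samples[:target]
    return np.pad(samples, (0, target - len(samples)))  # pad the tail with silence


def write_wav_mono16(path: Path, samples: np.ndarray, sr: int) -> None:
    """Write float samples in [-1, 1] as 16-bit little-endian mono PCM."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        wf.writeframes(pcm.tobytes())
```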
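The log-file redirection and the filter-free mux can be sketched as follows. run_logged and mux_command are hypothetical names; the codec flags (-c:v libx264 -pix_fmt yuv420p -c:a aac) are an assumption consistent with the "H.264 + AAC" output listed above, not the script's confirmed command line.

```python
import subprocess
from pathlib import Path


def run_logged(cmd: list[str], log_path: Path) -> int:
    """Run cmd with stdout and stderr both written to a log file; return the exit code."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "wb") as log:
        # No pipes: output goes straight to disk, so the child can never block on them.
        proc = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                              stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


def mux_command(frames_dir: str, wav_path: str, out_path: str,
                duration_s: float, fps: int = 24) -> list[str]:
    """ffmpeg command: PNG sequence + padded WAV -> MP4, no filter graph."""
    return ["ffmpeg", "-y",
            "-framerate", str(fps), "-i", str(Path(frames_dir) / "f_%05d.png"),
            "-i", str(wav_path),
            "-map", "0:v", "-map", "1:a", "-t", f"{duration_s:.3f}",
            "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
            str(out_path)]
```

Usage would be along the lines of run_logged(mux_command("_tmp/frames", "_tmp/narration_pad.wav", "output/tts_ascii_example.mp4", 11.0), Path("_tmp/logs/mux.log")), checking for a return code of 0.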
Tuning
- Narration text: edit NARRATION near the top of the script.
- Duration window: DUR_MIN/DUR_MAX clamp the final video length.
- Voice: change the -s (speed), -p (pitch), and -v (voice) arguments to espeak-ng inside generate_tts().
- Look: palette colours (C_AMBER, C_TEAL, etc.) and glyph palettes (PALETTE_*) live at the top of the file.
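A sketch of how a DUR_MIN/DUR_MAX clamp can feed the rest of the pipeline, assuming the clamped length is snapped to a whole number of frames so the padded audio and the frame count agree exactly. The values 10.0 and 12.0 are hypothetical stand-ins chosen to match the 10 to 12 second output described above; the real constants live in the script.

```python
DUR_MIN = 10.0  # hypothetical; the script's own DUR_MIN may differ
DUR_MAX = 12.0  # hypothetical; the script's own DUR_MAX may differ


def video_duration(narration_s: float, fps: int = 24,
                   dur_min: float = DUR_MIN, dur_max: float = DUR_MAX) -> tuple[float, int]:
    """Clamp the narration length to [dur_min, dur_max] and snap it to whole frames."""
    clamped = min(max(narration_s, dur_min), dur_max)
    n_frames = int(round(clamped * fps))
    # Returning n_frames / fps keeps audio padding and frame count exactly in step.
    return n_frames / fps, n_frames
```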