Training data as a readable ambient stream
Data Lava Lamp
A cyberpunk-ish drift of tokens, domains, labels, and human-feedback
traces. Leave it running for a few minutes and the weird texture of
model training data starts to come into focus.
What This Shows
Pre-training streams show the raw-ish text a model might learn broad
patterns from: web pages, educational pages, PDFs, code, and other
large corpora. Post-training streams are more intentional: prompts,
assistant replies, ratings, preference labels, and other human
feedback used to shape behavior once the base model exists.
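To make the difference between the two streams concrete, here is a minimal sketch of how their items might be modeled. The class names, field names, and the `stream_label` helper are all hypothetical, chosen for illustration; they are not taken from this prototype's code.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class PretrainingExcerpt:
    """A raw-ish chunk of text from a large corpus (hypothetical shape)."""
    text: str
    domain: str  # e.g. "web", "educational", "pdf", "code"


@dataclass
class PosttrainingExample:
    """A prompt/reply pair plus optional human feedback (hypothetical shape)."""
    prompt: str
    reply: str
    rating: Optional[int] = None      # a numeric score, if one was given
    preferred: Optional[bool] = None  # preference label against another reply


StreamItem = Union[PretrainingExcerpt, PosttrainingExample]


def stream_label(item: StreamItem) -> str:
    """Return the name of the stream an item belongs to."""
    if isinstance(item, PretrainingExcerpt):
        return "pre-training"
    return "post-training"
```

The pre-training item carries little beyond the text and where it came from, while the post-training item is built around a prompt, a reply, and whatever feedback was attached to it.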
This prototype uses tiny cached excerpts so it stays fast and stable.
The structure is set up so that live Hugging Face streams can replace
the cache later.
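One way the cached-now, live-later arrangement could look is sketched below, assuming the display only ever needs an endless iterator of text. `CachedExcerptSource` and `take` are hypothetical names, not part of this prototype. A live source would presumably expose the same iteration interface, for example by wrapping a dataset opened in streaming mode with the Hugging Face `datasets` library, but that part is an assumption and is left out here.

```python
import itertools
from typing import Iterable, Iterator, List


class CachedExcerptSource:
    """Loops forever over a small in-memory list of excerpts (hypothetical)."""

    def __init__(self, excerpts: List[str]) -> None:
        if not excerpts:
            raise ValueError("need at least one cached excerpt")
        self._excerpts = list(excerpts)

    def __iter__(self) -> Iterator[str]:
        # cycle() restarts from the first excerpt after the last one,
        # so the ambient display never runs out of text.
        return itertools.cycle(self._excerpts)


def take(source: Iterable[str], n: int) -> List[str]:
    """Pull the next n excerpts from any iterable source."""
    return list(itertools.islice(source, n))
```

Because `take` depends only on iteration, swapping the cached source for a live one would not require changing the code that consumes the stream.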