Training data as a readable ambient stream

Data Lava Lamp

A neon drift of tokens, domains, source hints, and human-feedback signals. Leave it running for a few minutes and the weird texture of model training data starts to come into focus.

Pre-training: web-scale text, floating excerpts

Stream

Stage: Pre-training
Trace: domains + tokens

Display

Now flowing

Samples ...
Source ...
License ...