Training data as a readable ambient stream

Data Lava Lamp

A neon drift of tokens, domains, source hints, and human-feedback signals. Leave it running for a few minutes and the weird texture of model training data starts to come into focus.

Pre-training: web-scale text, floating excerpts

Stream

Stage: Pre-training

Trace: domains + tokens

Display

Show context labels

Flow speed

Density
