Training data as a readable ambient stream

Data Lava Lamp

A cyberpunk-ish drift of tokens, domains, labels, and human-feedback traces. Leave it running for a few minutes and the weird texture of model training data starts to come into focus.

Pre-training · web-scale text · cached excerpts

Dataset channel

Stage: Pre-training

Trace: domains + tokens

Display

Show trace labels

Flow speed

Density

Current stream
