Training data as a readable ambient stream

Data Lava Lamp

A cyberpunk-ish drift of tokens, domains, labels, and human-feedback traces. Leave it running for a few minutes and the weird texture of model training data starts to come into focus.

Pre-training · web-scale text · cached excerpts

Dataset channel

Stage: Pre-training
Trace: domains + tokens
