Grass, Ontocord, and LAION jointly release the first video-audio interleaved dataset, VALID.
AI industry leaders Grass, Ontocord, and LAION have announced the joint release of VALID (Video-Audio Large Interleaved Dataset).
Built on the Grass video repository, the dataset contains 30 million audio clips interleaved with images and text, making it the industry's first video-audio interleaved dataset. VALID provides a new source of training data for multimodal AI models.