Researchers Discover Child Abuse Material in Major AI Image Generation Dataset

Researchers at the Stanford Internet Observatory have discovered that a dataset used to train AI image generation tools contains at least 1,008 validated instances of child sexual abuse material (CSAM). They warn that AI models trained on this dataset could generate new, realistic instances of CSAM.

LAION, the non-profit organization that created the dataset, stated that it has a zero-tolerance policy for illegal content and is temporarily taking its datasets offline to ensure they are safe before republishing them. LAION says it applied filters to detect and remove illegal content before the initial publication; however, the risk of CSAM entering the dataset has been a known concern since at least 2021, as the organization compiled billions of images scraped from the internet.

Previous reports indicate that LAION-5B, which pairs more than 5 billion images with descriptive captions, contains millions of problematic images, including pornography, violence, child nudity, and hate symbols. Notably, LAION founder Christoph Schuhmann acknowledged earlier this year that while he was not aware of any CSAM in the dataset, he had not examined the data in great depth.

Because legal restrictions prevent researchers from viewing CSAM directly for verification, the Stanford team relied on hash-based detection tools such as Microsoft's PhotoDNA to identify suspected material. They flagged 3,226 entries, many of which were confirmed by third parties such as the Canadian Centre for Child Protection.
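
Hash-based detection of this kind works by comparing a perceptual fingerprint of each image against fingerprints of known illegal material, so the images themselves never need to be viewed. PhotoDNA itself is proprietary and real hash lists are restricted to vetted organizations, so the sketch below is only a rough illustration of the general technique, using the open-source `imagehash` library with a hypothetical hash list and matching threshold.

```python
# Illustrative sketch of perceptual-hash matching; the hash list and
# threshold below are hypothetical, not real CSAM detection data.
from PIL import Image
import imagehash

# Hypothetical perceptual hashes of known illegal images, as would be
# supplied in practice by a child-safety organization.
KNOWN_BAD_HASHES = [imagehash.hex_to_hash("ffd8b16e4c8a0f13")]

HAMMING_THRESHOLD = 5  # max bit difference still treated as a match (assumed)

def is_suspected_match(image_path: str) -> bool:
    """Return True if the image's perceptual hash is close to a known-bad hash."""
    candidate = imagehash.phash(Image.open(image_path))
    # Subtracting two ImageHash objects gives their Hamming distance.
    return any(candidate - known <= HAMMING_THRESHOLD for known in KNOWN_BAD_HASHES)

if __name__ == "__main__":
    print(is_suspected_match("example.jpg"))
```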

Stability AI, founded by Emad Mostaque, trained its Stable Diffusion model on a subset of LAION-5B data. While the initial research version of Google's Imagen model was trained on LAION-400M, subsequent iterations do not use LAION datasets. A Stability AI spokesperson said the company prohibits the use of its platforms to create or edit CSAM and emphasized that its models were trained on a filtered subset of LAION-5B.

The newer Stable Diffusion 2 was trained on data with far less 'unsafe' content, making it harder for users to generate explicit images. Concerns remain, however, about Stable Diffusion 1.5, which lacks these protections. The Stanford paper's authors recommend that models based on Stable Diffusion 1.5 without safety measures be deprecated and their distribution halted where possible.
