TikTok Parent Accused of Using OpenAI's API to Develop Competing Models

ByteDance, the parent company of TikTok, is reportedly violating OpenAI’s terms of service by using OpenAI’s technology to build a competing large language model. According to The Verge, ByteDance is using OpenAI’s API to gather data for developing its own foundation model, currently called Project Seed. ByteDance's researchers have also been active elsewhere in generative AI, including work on sophisticated 3D generation models.

OpenAI's policies explicitly prohibit using the outputs of models like GPT-4 to create rival systems. ByteDance is nevertheless allegedly accessing OpenAI’s technology through Microsoft, which has similar restrictions in place, and has reportedly been consistently maxing out its API usage. According to the report, the API has been used throughout the development of Project Seed, including both model training and evaluation.

According to information obtained by The Verge, employee discussions on Lark, ByteDance’s internal messaging platform, revealed efforts to “whitewash” evidence of the company's alleged misuse of OpenAI’s technology. The company’s developers, based primarily in China, are said to have masked their use of OpenAI’s API through data desensitization, a technique normally used to protect sensitive business or personal information.

In response to these allegations, OpenAI confirmed that it has suspended ByteDance's account while it investigates. A ByteDance spokesperson said the company is committed to adhering to OpenAI's usage guidelines, stating, "We utilize GPT to enhance products and features in markets outside of China, while our self-developed model powers Doubao, which is exclusive to China."

Doubao is ByteDance's conversational AI system, which lets users interact through text and images. The spokesperson said that a small group of engineers had previously used OpenAI’s API for "an internal small experimental model that was never launched." According to the spokesperson, this practice was halted in April, and new internal protocols were put in place to ensure that text generated by GPT models does not enter the training datasets of ByteDance's proprietary models.

ByteDance also said that its engineering team now uses the GPT API mainly in a restricted capacity for evaluation and testing, such as score benchmarking. To ensure compliance, the company says it batch-samples its labeled data and compares it for similarity against OpenAI's outputs, with the aim of reducing the risk of inappropriate use by data annotators.

In the wake of ChatGPT’s rise in popularity, major Chinese tech firms, including ByteDance, Baidu, and Alibaba, have been racing to develop their own large language models. Recently, China unveiled a new supercomputer designed to bolster local efforts in AI model training, further highlighting the competitive landscape in the artificial intelligence sector.
