Disney's extensive archive, built over nearly a century, can make finding specific characters, scenes, or objects a daunting task. To simplify this process, a dedicated team from Disney’s Direct-to-Consumer & International Organization (DTCI) has developed the Content Genome (CG), a machine learning platform designed to automate the digital archiving of vast amounts of content.
The CG platform creates knowledge graphs filled with metadata—similar to what you see in Google search results for notable figures like Steve Jobs. This metadata allows AI applications to enhance search functionality, content discovery, and personalization features. As Anthony Accardo, Director of Research and Development at DTCI, explains, the platform helps animators quickly locate specific shots and sequences within Disney's archive, saving them time that would otherwise be spent sifting through video content on platforms like YouTube.
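The idea of a metadata knowledge graph can be sketched in a few lines: entities such as characters, episodes, and props become nodes, and typed relationships between them become edges that applications can traverse. The schema and names below are illustrative assumptions, not the actual Content Genome data model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    kind: str                      # e.g. "character", "episode", "prop"
    attrs: dict = field(default_factory=dict)

@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src_id, relation, dst_id)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def related(self, node_id: str, relation: str) -> list:
        """All nodes reachable from node_id via the given relation."""
        return [self.nodes[d] for s, r, d in self.edges
                if s == node_id and r == relation]

kg = KnowledgeGraph()
kg.add_node(Node("mickey", "character", {"name": "Mickey Mouse"}))
kg.add_node(Node("ep01", "episode", {"title": "Pilot"}))
kg.add_edge("mickey", "appears_in", "ep01")

# Query: which episodes does this character appear in?
print([n.attrs["title"] for n in kg.related("mickey", "appears_in")])  # ['Pilot']
```

A search or recommendation feature would issue queries like `related()` against a much larger graph; the structure is what lets metadata generated by one model serve many downstream applications.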
Initiated in 2016, the project aimed to equip Disney for the shift from traditional broadcast and home video distribution to a more consumer-centric digital video platform. "Building such a system from scratch is challenging," Accardo notes, emphasizing the necessity of a well-structured taxonomy for effective metadata management. A disorganized taxonomy can hinder the ability to utilize generated data effectively.
The team established what they refer to as the first automated tagging pipeline, as detailed in a recent Medium post. "Tagging content is crucial for DTCI's application of supervised learning, especially in custom use cases requiring specific detection," the DTCI team stated. This tagging process identifies nuanced story and character elements from structured data, including storylines and character motivations.
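A tagging pipeline of this kind can be thought of as a sequence of stages, each enriching a content segment with tags before the next stage runs. The stages below are toy stand-ins for real detectors (face, setting, action, and so on), invented here for illustration; the actual CG pipeline is not public.

```python
from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(segment: dict, stages: list) -> dict:
    """Apply each tagging stage to the segment in order."""
    for stage in stages:
        segment = stage(segment)
    return segment

# Toy stages: a real stage would run a trained model over segment frames.
def tag_faces(segment: dict) -> dict:
    segment["tags"].append(("face", "character_A"))
    return segment

def tag_setting(segment: dict) -> dict:
    segment["tags"].append(("setting", "castle"))
    return segment

result = run_pipeline({"id": "shot_12", "tags": []}, [tag_faces, tag_setting])
print(result["tags"])  # [('face', 'character_A'), ('setting', 'castle')]
```

Structuring the pipeline as composable stages is what makes it easy to add a new custom detector for a specific use case without touching the rest of the system.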
Utilizing existing facial recognition technology, the DTCI team successfully detected human faces across Disney’s movie and TV catalog. However, adapting this technology to recognize animated characters proved more complex. "Unlike live-action, animated faces don't always conform to typical human proportions," said Miquel Àngel Farré, DTCI’s Manager of Research and Development.
The team initially trained the system on two Disney Junior animated shows but saw limited success. They ultimately shifted to deep learning methods for better accuracy in detecting animated faces. By fine-tuning an existing Faster R-CNN object detection architecture, previously trained on a non-Disney dataset, they streamlined the training process. This approach allowed for quicker adaptation while mitigating the need for extensive new datasets.
The tagging process includes human oversight to ensure accuracy, particularly for consumer-facing features. "For consumer searches, accuracy is vital, so we validate results through our QA system," Accardo remarks.
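A common way to combine automated tagging with human QA is to gate on model confidence: high-confidence detections are approved automatically, while the rest are queued for human review before reaching consumer-facing search. The threshold and record format below are hypothetical, not details of DTCI's actual QA system.

```python
REVIEW_THRESHOLD = 0.9  # assumed cutoff for auto-approval

def route_detections(detections):
    """Split detections into auto-approved tags and a human review queue."""
    approved, needs_review = [], []
    for det in detections:
        if det["score"] >= REVIEW_THRESHOLD:
            approved.append(det)
        else:
            needs_review.append(det)
    return approved, needs_review

detections = [
    {"label": "mickey_mouse", "score": 0.97},
    {"label": "donald_duck", "score": 0.72},
]
approved, queue = route_detections(detections)
print(len(approved), len(queue))  # 1 1
```

Raising the threshold trades reviewer workload for precision, which matters most exactly where Accardo says it does: tags surfaced directly to consumers.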
This innovative technology has the potential to transform how consumers interact with Disney’s content. In theory, users could efficiently search for episodes featuring specific minor characters, props, or action sequences. Such advancements could significantly enhance recommendation systems, making content discovery more user-friendly.
Looking ahead, Accardo and the team aim to expand the system’s capabilities through multimodal machine learning techniques. By integrating natural language processing with other recognition technologies, they aspire to identify complex concepts within their content seamlessly. "Understanding context remains a challenge but is essential for advancing AI," Accardo states.
As DTCI continues to refine this technology, the prospect of automating human contextual understanding in content becomes increasingly exciting.