Remember Sakana AI? Almost a year ago, this Tokyo-based startup made waves in the artificial intelligence landscape with its high-profile founders from Google and an innovative merging-based approach to developing high-performing AI models. Today, the company has unveiled two exciting image-generation models: Evo-Ukiyoe and Evo-Nishikie.
These models, available on Hugging Face, are designed to generate images from text and image prompts, but they feature a unique twist. Rather than focusing on broad image generation styles, these models specialize in Japan’s cherished historic art form, ukiyo-e, which thrived from the 17th to 19th centuries. Sakana AI aims to revitalize this traditional art for contemporary audiences using cutting-edge AI technology.
What to Expect from Sakana AI's New Models
Ukiyo-e, or “pictures of the floating world,” emerged in the early 1600s and predominantly depicted themes like historical scenes, landscapes, and sumo culture. Originating as monochrome woodblock prints, the art form evolved into intricate full-color prints known as nishiki-e. Ukiyo-e's popularity waned in the 19th century with the advent of photography.
With the introduction of Evo-Ukiyoe and Evo-Nishikie, Sakana AI seeks to reintroduce this historic art style into popular culture. Evo-Ukiyoe is a text-to-image model that generates images reminiscent of ukiyo-e, especially when prompted with descriptive text related to traditional motifs such as cherry blossoms, kimonos, and birds. Interestingly, it can also create ukiyo-e-style artwork featuring modern elements like hamburgers and laptops, although the results may occasionally stray from the traditional style.
Evo-Ukiyoe builds on Evo-SDXL-JP, which Sakana created by applying its evolutionary model merging technique to Stability AI’s SDXL and other open diffusion models. To produce Evo-Ukiyoe, the company used LoRA (Low-Rank Adaptation) to fine-tune Evo-SDXL-JP on a dataset of over 24,000 captioned ukiyo-e artworks, sourced through a collaboration with the Art Research Center (ARC) at Ritsumeikan University in Kyoto.
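For readers unfamiliar with LoRA, the idea can be illustrated with a toy example. The sketch below is not Sakana's training code. It uses plain Python, tiny made-up dimensions, and hypothetical names (`lora_effective_weight`, `lora_a`, `lora_b`) to show the general technique: the base weight matrix stays frozen, and only two small low-rank matrices are trained and added on top, scaled by alpha / rank as in the usual LoRA formulation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * (B @ A), the weight a LoRA-adapted layer applies."""
    rank = len(lora_a)                # A has shape (rank, d_in)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)    # (d_out, rank) @ (rank, d_in) -> (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy sizes chosen for illustration only; real diffusion layers are far larger.
d_out, d_in, rank = 8, 8, 2
rng = random.Random(0)

# Frozen base weight (stands in for one layer of the pretrained model).
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# Trainable LoRA factors: A gets small random values, B starts at zero,
# so the adapted layer initially behaves exactly like the base layer.
lora_a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]

w_adapted = lora_effective_weight(w, lora_a, lora_b, alpha=4.0)

full_params = d_out * d_in            # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)   # parameters trained with LoRA
print(f"full fine-tuning: {full_params} params, LoRA: {lora_params} params")
```

Because only the two small factors are trained, the resulting adapter can be distributed as a compact set of weights that is applied on top of the base model.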
In their blog post, the company stated, “We curated diverse data across a spectrum of subjects, from complete artworks to face-centered pieces, utilizing the digital images in the ARC collection. We emphasized multi-colored nishiki-e with vibrant hues and diversity.”
Evo-Nishikie, the second model, is an image-to-image model that adds color to monochrome ukiyo-e prints. It can colorize historical black-and-white illustrations or refresh existing multi-colored nishiki-e prints. Users simply need to provide a source image along with instructions for the desired coloration.
Sakana AI developed this capability by conducting ControlNet training on Evo-Ukiyoe, incorporating fixed prompts and condition images.
Goals for Further Research and Development
Both models currently accept prompts only in Japanese and are still in their early stages. Even so, Sakana AI aims to showcase traditional “Japanese beauty” through AI and promote the appeal of the country’s culture globally. The company envisions applications in education and new avenues for enjoying classical literature.
Both models, along with the associated code, are available on Hugging Face. The repository includes a Python script and LoRA weights, provided under the Apache 2.0 license. Sakana notes, “This model is intended for research and development purposes only and should be regarded as an experimental prototype. It is not designed for commercial use or deployment in mission-critical environments, and its performance and outcomes are not guaranteed.”
So far, Sakana AI has secured $30 million in funding from various investors, including Lux Capital, which has a history of backing pioneering AI companies like Hugging Face, as well as Khosla Ventures, renowned for its early investment in OpenAI in 2019.