Staying informed about the rapidly evolving AI industry can be challenging. Until there’s an AI capable of doing it for you, here's a concise overview of recent developments in machine learning, alongside significant research and experiments we didn't cover in detail.
The Update in Midjourney's Terms of Service

Last week, Midjourney, an AI startup focused on generating images (and soon videos), quietly updated its terms of service concerning its policy on intellectual property disputes. The revision replaced humorous language with more formal legal terminology. The change signals Midjourney's belief that AI companies like it will win court battles against creators whose content is used as training data.
Generative AI models, including those developed by Midjourney, rely on vast datasets of images and text, typically scraped from public websites and online repositories. AI vendors argue that fair use, the legal doctrine allowing copyrighted works to be used in transformative ways, protects this practice. Many creators dispute that claim, particularly as studies emerge showing that AI models can reproduce portions of their training data verbatim.
Some companies have proactively addressed these concerns by striking licensing agreements and letting creators opt out of their training datasets. Others have committed to covering customers' legal fees in copyright lawsuits stemming from their use of generative AI tools. Midjourney has done neither.
In fact, Midjourney has been under scrutiny for its apparent disregard for copyright. The company previously maintained a list of thousands of artists, including illustrators and designers at major brands such as Hasbro and Nintendo, whose works were used, or were slated to be used, for model training. Studies also indicate that well-known TV and film franchises, including "Toy Story," "Star Wars," "Dune," and "Avengers," appear in its training dataset.
The Reality of Reward and Risk

There is a scenario in which this bet pays off for Midjourney: if the courts affirm that fair use applies, the startup can continue its current practice of scraping and training on copyrighted data, both old and new.

But the approach carries significant risk. Midjourney is currently in a strong position, reportedly generating around $200 million in revenue without outside investment. Still, legal fees are substantial, and if the courts determine that fair use does not apply, the ruling could be catastrophic for the company overnight.
Here are some other noteworthy AI stories from the past week:
- AI-Assisted Advertisement Controversy: Creators on Instagram criticized a director for using another creator's more complex and impressive work in a commercial without credit.
- EU's Warning to AI Platforms Ahead of Elections: European authorities are demanding that major tech firms explain their strategies for preventing potential electoral issues.
- Google DeepMind's Gaming AI: The company aims to develop an AI capable of being an effective gaming partner for cooperative gameplay by training an agent on numerous hours of 3D gaming.
- Benchmark Dilemmas: Many AI vendors claim their models outperform competitors based on certain metrics, though these benchmarks are often flawed.
- AI2 Incubator Secures $200M: The AI2 Incubator, which spun out of the nonprofit Allen Institute for AI, has attracted a substantial $200 million in computing resources to support early-stage startups.
- India's Regulatory Rollercoaster: The Indian government is grappling with the appropriate level of oversight for the AI sector, resulting in fluctuating regulations.
- Anthropic's New AI Models: Anthropic has introduced a new lineup of models, including Claude 3, which is said to rival OpenAI’s GPT-4. Early tests indicated impressive capabilities, though there are gaps in areas like current events knowledge.
- Deepfakes and Political Disinformation: A study from the Center for Countering Digital Hate (CCDH) explores the rising prevalence of AI-generated disinformation—specifically deepfake images related to elections—on X (formerly Twitter) over the last year.
- OpenAI vs. Elon Musk: OpenAI says it intends to seek dismissal of all claims made by X CEO Elon Musk in a recent lawsuit, arguing that Musk's role in OpenAI's founding did not significantly influence its trajectory.
- Rufus Review: Following Amazon's announcement of its new AI chatbot, Rufus, integrated within its shopping app, early access revealed some disappointing limitations in its capabilities.
More Innovations in Machine Learning
Exploring Molecules: AI models enhance our understanding of molecular dynamics and conformation, providing insights that would be difficult to achieve through traditional, costly methods. Verification remains crucial, but advancements like AlphaFold are making significant strides.
Microsoft introduced ViSNet, a new model focused on predicting structure-activity relationships, the links between molecular structure and biological activity. While still experimental and targeted at researchers, it highlights the intersection of advanced science and machine learning.
Researchers from the University of Manchester are investigating COVID-19 variants using large genetic datasets. Lead researcher Thomas House emphasized the need for improved analysis methods, with the team's results serving as a proof of concept for using machine learning to identify emerging variants.
AI is also capable of Designing Molecules, and several researchers are calling for ethical guidelines in this burgeoning field. Renowned computational biophysicist David Baker argues that the potential benefits of protein design far outweigh the risks. Even so, regulation must be navigated carefully, ensuring safety without stifling legitimate research.
At the University of Washington, atmospheric scientists have made intriguing claims based on AI analysis of 25 years of satellite imagery over Turkmenistan. Contrary to expectations that emissions would decline after the Soviet collapse, the findings suggest methane emissions may in fact have increased.
Language Model Limitations: Large language models train predominantly on English data, and researchers at EPFL found that Llama 2 appears to pivot through English internally even when translating between two other languages. They argue this may reflect a deeper issue: these models may structure their conceptual frameworks primarily around English. Diversifying training datasets could mitigate the problem and yield better overall performance.