Mitsubishi Scientist Explores: Do Deep Neural Networks Outperform Second Graders in Intelligence?

Anoop Cherian, a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL), recently compared the puzzle-solving abilities of advanced AI models such as ChatGPT and GPT-4 with those of second graders. His findings, detailed in the paper “Are Deep Neural Networks SMARTer than Second Graders?”, show how well deep neural networks handle basic reasoning tasks.

### Overview of Mitsubishi Electric Research Laboratories

Mitsubishi Electric Research Laboratories is a key subsidiary of the Mitsubishi Electric Corporation, dedicated to cutting-edge research and development. With over 1,600 filed patents, MERL focuses on application-driven basic research across various scientific domains. This includes not only artificial intelligence and robotics but also multi-physical systems and dynamic modeling.

As a senior principal research scientist in the computer vision group based in Cambridge, Massachusetts, Cherian’s work revolves around deep neural networks and machine learning, particularly in their applications to complex visual processing and reasoning tasks. He is currently exploring multimodal reasoning—a process that integrates various sensory inputs like audio, text, and visual data into cohesive reasoning frameworks. Cherian’s recent endeavors also include understanding cognitive models and how human-like knowledge can be embedded into machine learning methodologies.

### Insights from the Research

Cherian's recent paper originated from a desire to objectively evaluate the performance of deep neural networks, especially given reports of remarkable feats such as passing bar exams or creating artistic content. These accomplishments raise the question of how intelligent these models really are compared with humans.

Traditional IQ tests are the standard for evaluating human intelligence, leading Cherian to consider whether similar assessments could be applied to AI. The study utilized a series of puzzles designed for second graders from Olympiad competitions, aiming to determine if deep neural networks could solve these puzzles as effectively as young children could.

### Key Findings

Cherian’s research set out to evaluate deep neural networks' abilities to perform **simple multimodal algorithmic reasoning tasks** derived from puzzles designed for second graders. These tasks not only require basic comprehension skills but also the ability to derive algorithms—a level of reasoning that goes beyond mere visual identification.

The results were striking: while second graders achieved an accuracy rate of 70-75%, the latest deep neural networks, including advanced large language models, only performed at about 35%. This stark contrast suggests a significant gap in the reasoning abilities of AI compared to even young human learners.

### Models Under Examination

The study specifically tested several prominent AI models, including:

- ChatGPT

- GPT-4

- GPT-4 configured with Bing interfaces (creative, precise, balanced)

- Google’s Bard

The dataset initially included 101 visual-language puzzles. To accommodate the text-only processing capabilities of the models, a subset of puzzles was created in which the image component was removed or was irrelevant to the answer.

### Representative Puzzle Example

One example of a puzzle involves a scenario where an individual invites three friends to a pizza party, leading to a question about how many slices of pizza remain after each person consumes two slices. Despite its simplicity, many neural network models failed to consider the inviter as part of the group, displaying a consistent gap in logical reasoning.
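The arithmetic behind this puzzle can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function name `slices_remaining` and the total of 12 slices are assumptions, since the article does not say how large the pizza is. The sketch shows how the answer changes when the host is left out of the head count.

```python
def slices_remaining(total_slices: int, friends_invited: int,
                     slices_each: int, count_host: bool = True) -> int:
    """Return the slices left after everyone at the party has eaten."""
    # The host is at the party too, so the head count is friends + 1.
    people = friends_invited + (1 if count_host else 0)
    eaten = people * slices_each
    if eaten > total_slices:
        raise ValueError("not enough pizza for everyone")
    return total_slices - eaten


TOTAL_SLICES = 12  # assumed value; the article does not state the pizza's size

# Counting the host: 4 people eat 2 slices each.
correct = slices_remaining(TOTAL_SLICES, friends_invited=3, slices_each=2)

# The mistake described above: only the 3 invited friends are counted.
host_forgotten = slices_remaining(TOTAL_SLICES, friends_invited=3,
                                  slices_each=2, count_host=False)

print(correct, host_forgotten)
```

Under these assumed numbers the two head counts differ by one person, so the two answers differ by exactly two slices.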

### Discrepancies in Performance

Despite ChatGPT's impressive results in higher-stakes contexts like bar examinations, the findings underscore a disconnect in how these models handle simpler, context-based reasoning tasks. One likely reason is that diverse training datasets and course materials prepare models for specific challenges such as exams, whereas the puzzles call for more nuanced, multi-faceted reasoning that draws on several skills at once.

### Implications for Enterprises

These findings matter for businesses that use AI. Effective problem-solving in real-world applications hinges on three core concepts: **abstraction**, **reduction**, and **generalization**. These components play a crucial role in machine intelligence and must be understood and built into operational frameworks for optimal outcomes.

- **Abstraction** involves identifying the essential elements of a problem while disregarding unnecessary details, for instance, recognizing that a box-sorting task depends only on each box's weight.

- **Reduction** focuses on deriving efficient methods to solve the problem, like creating a sorting algorithm to stack heavier boxes below lighter ones.

- **Generalization** refers to applying learned solutions across various contexts, thereby enhancing the adaptability and problem-solving capacity of AI systems.
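The three concepts can be illustrated with the box-stacking example. The following is a minimal sketch rather than anything from Cherian's study: the `Box` class, its fields, and the `stacking_order` function are hypothetical names introduced here. Abstraction appears as keeping only the weight, reduction as reusing an ordinary sort, and generalization as accepting any item type through a `weight_of` function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Box:
    label: str
    weight_kg: float
    color: str  # a detail that is irrelevant to stacking


def stacking_order(items: Iterable[T],
                   weight_of: Callable[[T], float]) -> List[T]:
    """Return items ordered bottom to top, heaviest first."""
    # Abstraction: only the weight is consulted; other details are ignored.
    # Reduction: the stacking problem is reduced to a standard sort.
    return sorted(items, key=weight_of, reverse=True)


boxes = [
    Box("A", 2.5, "red"),
    Box("B", 7.0, "blue"),
    Box("C", 4.0, "green"),
]
stack = stacking_order(boxes, weight_of=lambda b: b.weight_kg)
bottom_to_top = [b.label for b in stack]

# Generalization: the same function works on a different representation.
crates = [("x", 10), ("y", 30)]
crate_stack = stacking_order(crates, weight_of=lambda c: c[1])

print(bottom_to_top, crate_stack)
```

Passing the weight accessor as a parameter is one design choice among several; it keeps the sorting logic independent of how a particular item stores its weight.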

### Addressing Risks

As organizations begin integrating AI models into their workflows, it's critical to approach the inherent risks with caution. The unpredictability of AI reasoning processes—often stochastic and difficult to define—poses significant challenges that could impact decisions and operations. Pragmatic strategies must be devised to manage potential inaccuracies and ensure that operational predictions remain within acceptable limits.

### Future Directions

Cherian's future research endeavors focus on expanding the puzzle dataset and refining approaches to develop neural networks capable of robust algorithmic reasoning. This work aims to bridge the gap between perceptual understanding and algorithmic reasoning, paving the way for enhanced AI capabilities in practical applications across a range of industries, from manufacturing to food processing.

Overall, Cherian's study offers crucial insights into the current state of AI intelligence and its implications for future applications, emphasizing the need for further development in the area of machine reasoning.
