Google is set to enhance its web development platform with tools that let users create applications from natural-language prompts, alongside powerful multimodal capabilities. In a recent Medium post, JavaScript engineer Bedros Pamboukian shared purported screenshots of upcoming AI features for MakerSuite, including support for Gemini, a highly anticipated multimodal AI model that will handle text, image, and audio inputs and outputs. Google has not publicly confirmed these features, and the screenshots suggest they are still in development, with several interface elements appearing unfinished.
### What’s Been Revealed?
The leaked features include a standout tool called Stubbs, designed to streamline the creation and sharing of AI-generated app prototypes with minimal effort. If accurate, Stubbs promises a user-friendly approach to web app development tailored for non-technical individuals. In addition to Stubbs, there’s a companion feature known as Stubbs Gallery, which will allow MakerSuite users to explore and modify existing prototypes. Notably, user-created Stubbs will remain private by default, with the option to share them with others.
Gemini, whose MakerSuite integration reportedly carries the codename 'Jetway', is expected to power the platform's multimodal functionality. This includes text recognition, object recognition, contextual understanding of images, and support for video and HTML in prompts. Pamboukian also indicated that Gemini will be integrated into Vertex AI, Google's development environment for AI applications.
### Additional Upcoming Features
Among the new functionalities are an autosave feature for MakerSuite, translation support for prompts in various languages, and integration with Google Drive for seamless importing of images and files into the editing environment.
### Google Gemini: Insights and Expectations
Google has been teasing Gemini since its announcement at the I/O event in May, where CEO Sundar Pichai highlighted multimodal capabilities significantly more advanced than those of previous models. The Gemini project is being developed by Google DeepMind, which combines expertise from the former Brain Team and DeepMind to offer a robust rival to industry leaders such as OpenAI's ChatGPT.
While details remain limited, Gemini's multimodal nature is confirmed: it can process and generate text, video, and image responses, and it is designed to access various tools and APIs for enhanced functionality.
### Facilitating Easier App Development
The growing interest in using AI to foster improved web app development is evident, and Google is advancing this objective with a new development environment called Project IDX. This initiative joins a host of other tools, such as MetaGPT and GitHub Copilot, which facilitate application building through natural language processing.
Recently, a former Google engineer introduced an innovative approach for constructing AI-powered web apps locally on devices rather than relying solely on cloud infrastructure. If the rumored addition of Stubbs is accurate, it could significantly democratize access to app development, according to Bradley Shimmin, chief analyst for AI and data analytics.
### The Landscape of Tech Leaks
It’s important to note that the information in Pamboukian's blog post has not been formally verified by Google. The origin of the screenshots remains unclear, though Pamboukian states they were obtained directly rather than through external sources. Historically, developers have been adept at uncovering unreleased features: in June, for instance, a developer revealed Instagram's plans for AI chatbots three months before Meta CEO Mark Zuckerberg officially announced them at the Connect 2023 event.
As Google rolls out these potentially transformative capabilities, the tech community eagerly anticipates further developments that will shape the future of app creation.