Running AI in the public cloud raises significant concerns for enterprises regarding data privacy and security. Consequently, many organizations opt to deploy AI on private clouds or on-premises environments. Together AI is among the vendors addressing these challenges with a focus on enabling efficient AI deployment in private clouds. The company recently announced its Together Enterprise Platform, which supports AI deployment in virtual private cloud (VPC) and on-premises settings.
Together AI debuted in 2023, aiming to simplify the enterprise use of open-source large language models (LLMs). The existing full-stack platform allows businesses to leverage open-source LLMs on its cloud service, while the new Together Enterprise Platform facilitates AI deployment in environments controlled by customers. This platform addresses critical business concerns around performance, cost-efficiency, and data privacy.
"As enterprises scale AI workloads, efficiency and cost become paramount, alongside a strong emphasis on data privacy," said Vipul Prakash, CEO of Together AI. "Organizations have established privacy and compliance policies within their cloud setups, and they care about model ownership as well."
Cost-Effective Solutions for Private Cloud AI Deployment
A key feature of the Together Enterprise Platform is its ability to empower organizations to manage and run AI models within their private cloud environment. This flexibility is vital for enterprises that have made significant investments in their IT infrastructure. The platform not only supports private clouds but also allows users to scale to Together’s cloud as needed.
An important advantage of the Together Enterprise Platform is its ability to significantly accelerate AI inference workloads. "We can often improve inference performance by two to three times and reduce the necessary hardware by 50%," Prakash explained. "This leads to considerable savings and increased capacity for enterprises to develop new products, models, and features." These gains come from a combination of optimized software and efficient hardware utilization.
"We utilize algorithmic techniques to optimize computation schedules on GPUs for maximum efficiency and minimal latency," Prakash elaborated. "Our approach includes speculative decoding, where a smaller model predicts the output of a larger one, thereby reducing the load on more resource-intensive models."
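To make the speculative decoding idea concrete, here is a minimal toy sketch of the general technique Prakash describes. The `draft_model` and `target_model` functions below are stand-ins invented for illustration (simple arithmetic rules over integer "tokens"), not Together AI's actual models or API; the point is the accept/reject loop, in which cheap draft tokens are verified against the expensive model and the output remains identical to decoding with the large model alone.

```python
# Toy speculative decoding sketch. Tokens are small integers, and the two
# "models" are deterministic placeholder functions standing in for real
# LLM forward passes.

def draft_model(context):
    # Cheap draft model: next token is (last token + 1) mod 10.
    return (context[-1] + 1) % 10

def target_model(context):
    # Expensive target model: agrees with the draft except at every
    # 4th position, where it diverges.
    nxt = (context[-1] + 1) % 10
    if len(context) % 4 == 0:
        nxt = (nxt + 5) % 10
    return nxt

def speculative_decode(context, steps, k=4):
    """Generate `steps` tokens: the draft model proposes k tokens per
    round, and the target model verifies them, keeping the longest
    accepted prefix plus its own correction on the first mismatch."""
    out = list(context)
    while len(out) - len(context) < steps:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target model verifies the proposals position by position.
        ctx = list(out)
        for t in proposal:
            expected = target_model(ctx)
            if expected == t:
                ctx.append(t)          # draft token accepted
            else:
                ctx.append(expected)   # keep the target's own token
                break                  # discard the rest of the draft
        out = ctx
    return out[len(context):][:steps]

print(speculative_decode([0], 8))  # identical to decoding with target_model alone
```

Because a token is only accepted when the target model would have produced it anyway, the output matches plain target-model decoding while most of the per-token work is done by the cheaper draft model.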
Orchestrating Multiple AI Models with the Mixture of Agents Approach
Another standout feature of the Together Enterprise Platform is its ability to manage multiple AI models within a single application or workflow. "In enterprises, it's common to use a mixture of open-source models, custom models, and models from various sources," noted Prakash. "The Together platform facilitates the orchestration of these diverse models, allowing for dynamic scaling based on feature demand."
Different methods exist for orchestrating AI models, from technologies like LangChain for integration to model routers that direct queries to the best model. Together AI, however, employs its unique Mixture of Agents method. This approach combines various agents, using simpler models as "proposers" to generate responses, which are then aggregated by a model that synthesizes these inputs for a superior answer.
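The proposer/aggregator structure described above can be sketched in a few lines. This is a hypothetical illustration of the general pattern, not Together AI's implementation: the proposer and aggregator functions below are placeholders for what would, in practice, be calls to separate LLMs.

```python
# Hypothetical Mixture of Agents sketch. Each "proposer" is a placeholder
# for a cheaper model that drafts an answer; the "aggregator" stands in
# for a model prompted with all drafts to synthesize a final response.

def proposer_a(prompt):
    return f"[A] concise answer to: {prompt}"

def proposer_b(prompt):
    return f"[B] detailed answer to: {prompt}"

def proposer_c(prompt):
    return f"[C] alternative answer to: {prompt}"

def aggregator(prompt, drafts):
    # A real aggregator is itself an LLM given the drafts as context;
    # here we just fold them into one synthesis string.
    numbered = "\n".join(f"{i + 1}. {d}" for i, d in enumerate(drafts))
    return (f"Synthesized answer to '{prompt}' "
            f"from {len(drafts)} drafts:\n{numbered}")

def mixture_of_agents(prompt, proposers, aggregate):
    # Layer 1: every proposer answers independently (parallelizable).
    drafts = [p(prompt) for p in proposers]
    # Layer 2: the aggregator reads all drafts and produces one reply.
    return aggregate(prompt, drafts)

result = mixture_of_agents("What is speculative decoding?",
                           [proposer_a, proposer_b, proposer_c],
                           aggregator)
print(result)
```

The design choice worth noting is that the proposers never see each other's answers; only the aggregator does, which keeps the first layer fully parallel while still letting the final response draw on all of the drafts.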
"We are dedicated to providing a computational and inference platform, and we find agentic AI workflows fascinating," Prakash stated. "Expect to see more innovative developments from Together AI on this front in the coming months."