Writer, a San Francisco-based startup founded in 2020, raised $100 million in September 2023 to expand its proprietary large language models (LLMs) for enterprise applications. While not as well-known as giants like OpenAI, Anthropic, or Meta, Writer is carving a niche with its in-house models, collectively named Palmyra. Esteemed companies such as Accenture, Vanguard, HubSpot, and Pinterest leverage Writer’s creativity and productivity platform powered by these models.
Recently, Stanford HAI's Center for Research on Foundation Models introduced a new benchmark called HELM Lite, which evaluates models through in-context learning: the LLM learns a task from a small set of examples supplied at inference time, with no additional training. Notably, while GPT-4 led the overall leaderboard, Writer's Palmyra X V2 and X V3 models performed "unexpectedly" well, ranking near the top despite their smaller size, according to Percy Liang, director of the Stanford center.
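In-context learning amounts to packing worked examples into the prompt itself. As a minimal sketch, the snippet below assembles a few-shot translation prompt; the task, examples, and helper function are illustrative, not drawn from the HELM Lite suite or any Writer API.

```python
# Minimal sketch of in-context learning via a few-shot prompt.
# The translation task and examples are illustrative only.

def build_few_shot_prompt(examples, query):
    """Assemble a prompt that teaches the task from examples at inference time."""
    lines = ["Translate English to French:"]
    for source, target in examples:
        lines.append(f"English: {source}\nFrench: {target}")
    # The model is expected to complete the final, unanswered example.
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

examples = [
    ("cheese", "fromage"),
    ("good morning", "bonjour"),
]
prompt = build_few_shot_prompt(examples, "thank you")
print(prompt)
```

The resulting string would be sent to the model as-is; the model infers the pattern from the two solved pairs and continues it for the final query.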
In the machine translation category, Palmyra excelled, achieving a top ranking. CEO May Habib highlighted this success in a LinkedIn post, noting, “Palmyra X is outperforming classic benchmarks, claiming the top position overall in MMLU and leading in the new translation tests.”
Habib emphasized the economic challenges enterprises face when deploying larger models like GPT-4, which was trained on 1.2 trillion tokens. She stated, “Generative AI use cases in 2024 need to be economically viable,” explaining that enterprises often struggle with high serving costs, and with prompts that must be reworked when providers distill their models and behavior shifts. She believes that Stanford HAI's benchmarking reflects real enterprise needs more accurately than other platforms like Hugging Face.
Writer initially targeted marketing teams and was co-founded by Habib and Waseem AlShikh, who previously ran Qordoba, an NLP and machine-translation company. In early 2023, Writer introduced the Palmyra series, with models ranging from 128 million to 20 billion parameters, and launched Knowledge Graph, which lets companies connect their business data to Palmyra and self-host the models.
“We offer a full stack solution, combining the model with a built-in retrieval-augmented generation (RAG) system,” said Habib, arguing that this avoids the inefficiency of shuttling data out to separate embedding models and back.
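Writer's built-in RAG pipeline is proprietary, but the general pattern is straightforward: retrieve the documents most relevant to a query, then prepend them to the prompt so the model can ground its answer. The sketch below is a generic, assumed illustration; real systems rank documents with learned embeddings rather than the simple keyword overlap used here.

```python
# Generic sketch of retrieval-augmented generation (RAG).
# Scoring uses illustrative keyword overlap, not a learned embedding model.
import re

def _words(text):
    """Lowercase and tokenize, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query; return the top k."""
    q_words = _words(query)
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & _words(doc)),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query, documents):
    """Prepend retrieved context so the model can ground its answer."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Refund requests are processed within 14 days.",
    "Our headquarters are in San Francisco.",
    "A refund requires the original receipt.",
]
print(build_rag_prompt("How do I get a refund?", docs))
```

Bundling retrieval with the model, as Habib describes, means the enterprise never has to operate this loop itself or move data between separate embedding and generation services.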
Habib advocates for smaller models paired with curated training data, despite claims from some experts that larger generalist models outperform specialized ones. She noted that the HELM Lite leaderboard showed medical LLMs beating GPT-4 in their domain, asserting, “When it comes to inference and cost, enterprises benefit from specialized models that are easier to manage and more economical.”