Since the launch of ChatGPT, I can hardly recall a meeting with prospects or customers where the topic of generative AI wasn’t raised. Businesses are eager to explore how they can harness this technology, from enhancing internal efficiency to innovating external products and services. Companies across all sectors are racing to implement generative AI solutions to remain competitive.
Although generative AI is still in its infancy, its capabilities are evolving rapidly. From vertical search and photo editing to writing assistance, the unifying factor is the use of conversational interfaces to enhance software usability and power. Chatbots, now rebranded as "copilots" and "assistants," are trending again. As best practices begin to emerge, the first step in developing a chatbot is to narrow the focus and start small.
A copilot acts as an orchestrator, supporting users in completing various tasks via a free text interface. With countless possible input prompts, it's crucial that all interactions are managed effectively and securely. Instead of attempting to address every task from the outset—which could lead to unmet user expectations—developers should concentrate on mastering one specific task before scaling.
At AlphaSense, for instance, we zeroed in on earnings call summarization as our initial task. This well-defined, high-value project aligns seamlessly with our existing workflows and offers significant benefits to our customer base. Throughout this process, we gained valuable insights into LLM development, model selection, training data generation, retrieval-augmented generation, and user experience design, paving the way for future expansions into open chat.
LLM Development: Open vs. Closed Models
As 2023 unfolded, the LLM performance landscape was clear: OpenAI's GPT-4 was leading, but well-funded competitors like Anthropic and Google were hot on its heels. While open-source solutions showed promise, they initially lagged behind closed models in text generation tasks.
My decade-long experience with AI indicated that open-source would make a remarkable comeback, and indeed it has. The open-source community has driven significant improvements in performance while reducing costs and latency. Models like LLaMA and Mistral lay powerful groundwork for innovation, and major cloud providers such as Amazon, Google, and Microsoft are increasingly backing a multi-vendor strategy that includes supporting and promoting open-source solutions.
While open source hasn’t yet surpassed closed models in published performance metrics, it has clearly outperformed them on several key trade-offs that developers confront in real-world applications. The 5 S's of Model Selection can guide developers in choosing the right model type:
1. Smarts: Fine-tuning allows open-source models to excel in narrow tasks, often outperforming closed models (a minimal fine-tuning sketch follows this list).
2. Spend: Open-source models carry no per-token fees; their costs are fixed GPU capacity and engineering effort, which makes them more cost-effective at scale than usage-based pricing.
3. Speed: By controlling the entire stack, developers can continuously optimize performance as the open-source community publishes new techniques almost daily. Training smaller models with insights from larger ones can dramatically reduce latency from seconds to milliseconds.
4. Stability: Closed models are prone to performance drift. With open-source, developers can collect training data and consistently retrain a fixed model, enabling systematic performance evaluations over time.
5. Security: Serving the model yourself gives end-to-end control of your data, and AI safety in general benefits from a robust open-source ecosystem.
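To make the "Smarts" point concrete, here is a minimal sketch of adapting an open-source model to a narrow task with LoRA fine-tuning via Hugging Face transformers and peft. The base model name, adapter hyperparameters, and target modules are illustrative assumptions, not recommendations or a description of any particular production setup.

```python
# Minimal LoRA fine-tuning sketch; model name and hyperparameters are
# illustrative assumptions, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # any open-source causal LM
tokenizer = AutoTokenizer.from_pretrained(base)  # used later by the training loop
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small adapter matrices instead of all base weights,
# which is what makes narrow-task fine-tuning cheap.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here, run a standard Trainer loop over task-specific
# (prompt, completion) pairs for the narrow task you care about.
```

The point is not the specific hyperparameters but that a modest adapter on top of an open base model is often enough to beat a general-purpose closed model on one well-scoped task.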
While closed models are essential for tailored enterprise solutions and prototyping novel use cases, open source will likely be the backbone for significant products where generative AI is vital to user experiences.
LLM Development: Training Your Model
To develop a high-performance LLM, prioritize creating the best dataset for your task. This doesn’t necessarily mean building the largest dataset; in many cases, exceptional performance on niche tasks can be achieved with just hundreds of high-quality examples. Additionally, your unique data assets and understanding of your specific domain can outpace closed model providers who gather training data across multiple use cases. At AlphaSense, our AI engineers, product managers, and financial analysts collaborate to develop annotation guidelines that ensure our datasets remain high-quality and relevant.
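As a rough illustration of what a small, high-quality task dataset can look like, here is a hypothetical JSONL layout for summarization examples that also records which annotation guideline was applied. The field names are assumptions for illustration, not AlphaSense's actual schema.

```python
import json

# Hypothetical JSONL schema for a narrow summarization task.
# Field names are illustrative, not an actual production schema.
examples = [
    {
        "document_id": "call-2024-q1-0001",
        "source_text": "<full earnings call transcript>",
        "summary": "<analyst-approved summary>",
        "guideline_version": "v3",   # which annotation guideline was applied
        "annotator": "analyst-17",   # useful for checking inter-annotator agreement
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A few hundred records in this shape, reviewed against written guidelines, tend to be worth far more than a large but noisy scrape.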
Distillation is a vital technique to maximize your investment in quality data. Open-source models come in various sizes, from 70 billion parameters down to smaller options like 3 billion. For many narrow tasks, these smaller models can provide sufficient capabilities while being more cost-effective and efficient. The distillation process involves training a large model with comprehensive, human-generated data and then using that model to create vast amounts of synthetic data for training smaller models. This approach allows for flexibility in optimizing user experiences across different performance, cost, and latency parameters.
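A rough sketch of that distillation loop: a large "teacher" model, already fine-tuned on human-labeled data, generates synthetic summaries at scale, and those pairs become training data for a much smaller "student." The helper functions below are hypothetical placeholders; any hosted or self-served large model could play the teacher role.

```python
import json

def teacher_summarize(document: str) -> str:
    """Hypothetical call to a large, already fine-tuned teacher model."""
    raise NotImplementedError  # e.g., a 70B model served behind an internal endpoint

def load_unlabeled_documents() -> list[str]:
    """Hypothetical loader for a large pool of unlabeled documents."""
    raise NotImplementedError

# Step 1: the teacher produces synthetic labels at scale.
with open("synthetic_train.jsonl", "w") as f:
    for doc in load_unlabeled_documents():
        f.write(json.dumps({"source_text": doc,
                            "summary": teacher_summarize(doc)}) + "\n")

# Step 2: fine-tune a small (e.g., ~3B) student model on synthetic_train.jsonl
# with the same training loop as before; the student trades a little quality
# for much lower cost and latency.
```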
RAG: Retrieval-Augmented Generation
In developing products with LLMs, it becomes apparent that the quality of the input dictates the quality of the output. For instance, while ChatGPT leverages vast internet resources, this can expose users to misleading or unsafe content; for businesses making pivotal decisions, that level of risk is unacceptable. This is where retrieval-augmented generation (RAG) comes into play. RAG focuses the LLM’s reasoning on authoritative content retrieved from an indexed database instead of relying solely on its training data. While current LLMs can process considerable text input, real-world applications often demand handling far larger volumes of data. For example, AlphaSense's database includes hundreds of billions of words, making effective context retrieval essential.
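The basic RAG pattern can be sketched in a few lines: retrieve a handful of authoritative passages from your index, then constrain the model to answer only from them. The search_index and generate helpers below are hypothetical stand-ins for whatever retrieval system and LLM you actually run.

```python
def search_index(query: str, top_k: int = 5) -> list[dict]:
    """Hypothetical retrieval call returning passages as {'id': ..., 'text': ...}."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Hypothetical call to the LLM (open-source or hosted)."""
    raise NotImplementedError

def answer_with_rag(question: str) -> str:
    passages = search_index(question, top_k=5)
    context = "\n\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    prompt = (
        "Answer using only the sources below. Cite source ids in brackets, "
        "and reply 'not found' if the sources do not contain the answer.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

The instruction to cite source ids and admit "not found" is what turns retrieval into grounding: the model's output stays tied to content you control.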
Invest more effort in building the information retrieval system than in training the LLM itself. Both keyword-based and vector-based retrieval have limitations, so a hybrid approach often works best. Grounding LLMs will be a dynamic area of generative AI research in the coming years, and focusing resources here is key.
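One common way to combine keyword and vector retrieval is reciprocal rank fusion: each ranker contributes a score based on a document's rank, so documents that score well in either list surface near the top. This is a minimal, self-contained sketch that assumes you already have two ranked lists of document ids from your keyword and vector systems.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) per list; k dampens the effect of
    top ranks so no single ranker dominates the fused ordering.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse keyword (e.g., BM25) results with vector-similarity results.
keyword_hits = ["doc_42", "doc_7", "doc_13"]
vector_hits = ["doc_7", "doc_99", "doc_42"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# doc_7 and doc_42 rise to the top because both rankers agree on them.
```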
User Experience and Design: Seamless Chat Integration
From a design standpoint, chatbots should integrate fluidly with existing platforms rather than feeling like an afterthought. They must provide distinct value while aligning with established design patterns. Effective guardrails make usage limits clear, handle unanswerable queries gracefully, and inject context automatically. Here are three vital integration considerations:
1. Chat vs. GUI: Users often prefer GUIs for routine workflows, as they effectively guide complex processes. Reserve chat functionalities for scenarios that require nuanced input where context isn’t easily predictable. Strategically decide when and where chat should be incorporated in your app.
2. Setting Context: LLMs struggle with context retention, and retrieval-based conversations can balloon to substantial word counts. Traditional search controls and filters help keep context bounded: allow users to set context explicitly or adjust it dynamically, minimizing cognitive load and improving the accuracy of responses.
3. Auditability: Ensure all generative AI outputs cite their original sources and are contextually auditable. Speedy verification is crucial for trust in and adoption of generative AI systems in business, so investing in this mechanism is essential (a minimal citation check is sketched after this list).
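A lightweight way to support auditability is to require bracketed source ids in the model's answer (as in the RAG prompt sketched earlier) and verify every citation against the passages that were actually retrieved, flagging anything unresolvable before the answer reaches the user. The function below is a hypothetical illustration of that check, not a description of any particular product's pipeline.

```python
import re

def audit_citations(answer: str, retrieved_ids: set[str]) -> dict:
    """Check that every bracketed citation in the answer maps to a retrieved passage."""
    cited = set(re.findall(r"\[([^\]]+)\]", answer))
    return {
        "cited_ids": cited,
        "unresolved": cited - retrieved_ids,       # cites nothing we retrieved: block or warn
        "uncited_sources": retrieved_ids - cited,  # retrieved but unused: informational only
        "has_citations": bool(cited),
    }

report = audit_citations(
    "Revenue grew 12% year over year [call-2024-q1-0001].",
    retrieved_ids={"call-2024-q1-0001", "call-2023-q4-0001"},
)
assert not report["unresolved"]  # every claim traces back to a source the user can open
```

Surfacing those source ids as clickable links in the interface is what turns the check into the fast, in-context verification users need to trust the answer.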
The launch of ChatGPT marked a pivotal moment for generative AI, illuminating the potential of next-generation AI-powered applications. As more companies and developers roll out and scale AI-driven chat solutions, adhering to these best practices is vital. Aligning technology with business strategy will enable the creation of innovative products that deliver long-term value. Focusing on mastering a single task while seeking opportunities to expand chatbot capabilities will set developers up for success.