Data orchestration plays a crucial role in moving data seamlessly between systems, and Apache Airflow, originally developed by Airbnb, has emerged as a leading tool for the job.
Recently, Astronomer, the primary commercial supporter of Apache Airflow, launched an update to its Astro platform, enhancing enterprise support, security, and management features. Initially designed for orchestrating data pipelines in analytics and business intelligence, Airflow is now increasingly utilized for artificial intelligence (AI) and machine learning (ML) workloads.
“Airflow excels at writing and executing data pipelines,” said Julian LaNeve, CTO of Astronomer. “By defining pipelines as code, users unlock virtually limitless possibilities.”
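To make the "pipelines as code" idea concrete: Airflow models a pipeline as a directed acyclic graph (DAG) of tasks whose dependencies are declared in Python. The sketch below does not use Airflow itself. It is a stdlib-only illustration that assumes three hypothetical tasks (`extract`, `transform`, `load`) and a hypothetical `run_pipeline` helper, and it uses Python's `graphlib.TopologicalSorter` to run the tasks in dependency order, roughly the ordering guarantee a scheduler such as Airflow's provides.

```python
from graphlib import TopologicalSorter

# Hypothetical task functions; in Airflow these would be operators or @task functions.
def extract(ctx):
    ctx["raw"] = [3, 1, 2]

def transform(ctx):
    ctx["clean"] = sorted(ctx["raw"])

def load(ctx):
    ctx["loaded"] = list(ctx["clean"])

TASKS = {"extract": extract, "transform": transform, "load": load}

# Each key lists its upstream tasks, i.e. the tasks that must finish first.
DEPENDENCIES = {
    "transform": {"extract"},
    "load": {"transform"},
}

def run_pipeline(tasks, dependencies):
    """Run every task once in dependency order; return (execution order, shared context)."""
    ctx = {}
    order = []
    sorter = TopologicalSorter(dependencies)
    for name in tasks:
        sorter.add(name)  # make sure tasks without dependencies are part of the graph
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    for name in sorter.static_order():
        tasks[name](ctx)
        order.append(name)
    return order, ctx

if __name__ == "__main__":
    order, ctx = run_pipeline(TASKS, DEPENDENCIES)
    print(order)          # ['extract', 'transform', 'load']
    print(ctx["loaded"])  # [1, 2, 3]
```

Because the graph is ordinary code, it can be reviewed, versioned, and tested like any other software, which is the flexibility the quote points to.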
The Importance of Airflow in Data Orchestration
LaNeve emphasized that Airflow's popularity has surged because it simplifies how organizations define, build, and deploy data pipelines. It integrates with major data platforms and cloud services such as Snowflake, Databricks, AWS, Microsoft Azure, and Google Cloud. While Airflow is easy for a single team to adopt, running it across an entire enterprise is considerably more complex. This is where Astronomer steps in, providing a managed service for Apache Airflow.
Astronomer enhances the open-source framework with added capabilities. “We've developed the Astronomer runtime, optimizing the open-source project for improved efficiency,” LaNeve explained.
Additionally, the Astro platform includes tools designed to streamline the creation of data pipelines. For instance, the Astro Cloud IDE offers a notebook-based environment for pipeline development. Astronomer is also expanding into observability, with a focus on understanding how data flows across the ecosystem.
Enhanced Connectivity and Upgrades with Astro
With the latest Astro platform update, Astronomer introduces significant enhancements. A key challenge in managing data pipelines is ensuring secure connections to data sources; the new connection management feature addresses this issue.
This feature serves as a centralized governance and security point for data pipelines. “Administrators can easily define connections to Snowflake, Databricks, and any other sources accessible via Airflow,” LaNeve stated.
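The article does not describe the interface of Astro's connection management. For comparison, open-source Airflow can read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a URI of the form `conn-type://login:password@host:port/schema?extra=value`. The helper below, `airflow_conn_env`, is a hypothetical function of our own (not an Airflow or Astro API) that builds such a pair. The login, password, schema, and extras in the usage are placeholders; which fields a given provider actually reads is defined in that provider's documentation.

```python
from urllib.parse import quote, urlencode

def airflow_conn_env(conn_id, conn_type, host="", login="", password="",
                     schema="", port=None, extra=None):
    """Hypothetical helper: build (ENV_VAR_NAME, URI) in Airflow's AIRFLOW_CONN_<CONN_ID> style."""
    # Credentials may contain characters such as '@' or '/', so percent-encode them.
    auth = ""
    if login or password:
        auth = quote(login, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    netloc = auth + quote(host, safe="") + (f":{port}" if port is not None else "")
    uri = f"{conn_type}://{netloc}"
    if schema:
        uri += "/" + quote(schema, safe="")
    if extra:
        # Extra settings travel as query parameters.
        uri += "?" + urlencode(extra)
    return f"AIRFLOW_CONN_{conn_id.upper()}", uri

if __name__ == "__main__":
    name, uri = airflow_conn_env(
        "snowflake_default", "snowflake",
        login="etl_user", password="p@ss/word", schema="PUBLIC",
        extra={"account": "my_account"},
    )
    print(f"{name}={uri}")
```

Keeping the definition in one place, whether an environment variable, a secrets backend, or a managed UI, is what lets administrators govern credentials centrally instead of scattering them across pipeline code.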
The update also makes it easier to upgrade and roll back data pipeline configurations. If a pipeline fails, users can quickly revert to a previous configuration, and the platform runs compatibility checks before applying an update so that it goes through smoothly.
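The upgrade-and-rollback pattern described here can be sketched generically: validate a proposed configuration before applying it, and keep the history of applied configurations so the previous one can be restored. This is not how Astro implements the feature. `DeploymentConfig`, `check_compatible`, and the "major versions may only step by one" rule are hypothetical, chosen only to show where a pre-apply check fits; the version strings are placeholders.

```python
from copy import deepcopy

class IncompatibleConfigError(Exception):
    """Raised when a proposed configuration fails the pre-apply check."""

def check_compatible(current, proposed):
    """Hypothetical rule: a runtime version is required, and the major version may rise by at most one."""
    if "runtime_version" not in proposed:
        raise IncompatibleConfigError("runtime_version is required")
    if current is None:
        return  # nothing to compare against on the first deployment
    cur_major = int(current["runtime_version"].split(".")[0])
    new_major = int(proposed["runtime_version"].split(".")[0])
    if new_major - cur_major > 1:
        raise IncompatibleConfigError(
            f"cannot jump from major version {cur_major} to {new_major}")

class DeploymentConfig:
    """Keeps every applied configuration so a failed rollout can be reverted."""

    def __init__(self):
        self._history = []

    @property
    def current(self):
        return self._history[-1] if self._history else None

    def apply(self, proposed):
        # Check first, so a rejected update leaves the current configuration untouched.
        check_compatible(self.current, proposed)
        self._history.append(deepcopy(proposed))

    def rollback(self):
        """Discard the latest configuration and return the one before it."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier configuration to roll back to")
        self._history.pop()
        return self.current

if __name__ == "__main__":
    cfg = DeploymentConfig()
    cfg.apply({"runtime_version": "1.0.0"})
    cfg.apply({"runtime_version": "2.0.0"})
    print(cfg.rollback())  # {'runtime_version': '1.0.0'}
```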
Astronomer’s Commitment to AI
Astronomer's platform is increasingly being used for AI workflows. In November, the company announced integrations with a range of AI vendors and tools, including OpenAI, Cohere, Pinecone, OpenSearch, Weaviate, and pgvector.
Astronomer also introduced a reference architecture for building and deploying large language model (LLM) applications. The public demonstration, available at ask.astronomer.io, showcases how to consolidate documentation from numerous sources using a retrieval-augmented generation (RAG) strategy.
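The core of a retrieval-augmented generation (RAG) strategy is to find the documents most relevant to a question and pass them to the model as context. The toy sketch below shows only that retrieve-then-augment step and is not Astronomer's reference architecture. A production system would use an embedding model and a vector store such as Pinecone, Weaviate, or pgvector; here a bag-of-words cosine similarity is swapped in so the example runs with the standard library alone, and no LLM is called. The function names and documents are hypothetical.

```python
import math
import re
from collections import Counter

def _vector(text):
    """Bag-of-words term counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = _vector(query)
    return sorted(documents, key=lambda d: _cosine(q, _vector(d)), reverse=True)[:k]

def build_prompt(query, documents, k=2):
    """Augment the question with retrieved context; the result would be sent to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

DOCS = [
    "Airflow schedules tasks defined in a DAG.",
    "Connections store credentials for external systems.",
    "Deferrable operators free worker slots while waiting.",
]

if __name__ == "__main__":
    print(build_prompt("How does Airflow store credentials for connections?", DOCS, k=1))
```

In a pipeline-driven setup, the documents and their embeddings would be refreshed on a schedule, which is where an orchestrator comes in.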
LaNeve envisions widespread use of Airflow and the Astro platform for training AI models. “To ensure your models are trained with the latest data, reliability is paramount, and that's precisely what Astronomer and Airflow were designed for.”