"Limited Access to LLMs? Snowflake Unveils Cross-Region Inference for Enhanced Availability"

Regional access to large language models (LLMs) can be a significant competitive advantage: faster access enables quicker innovation, while organizations that must wait risk falling behind.

However, the rapid pace of AI development often forces organizations to postpone adoption until new models are available within their tech stack. These delays typically stem from resource limitations, Western-centric release priorities, and gaps in multilingual support.

To address this pressing issue, Snowflake has announced the general availability of cross-region inference on Cortex AI. With a simple configuration, developers can now process requests in different regions, even if a specific model isn’t available locally. This allows seamless integration of new LLMs as they become accessible.

Organizations can securely utilize LLMs across the U.S., EU, and Asia Pacific and Japan (APJ) without incurring additional egress charges.

“Cross-region inference on Cortex AI allows you to seamlessly integrate with the LLM of your choice, regardless of regional availability,” states Arun Agarwal, who leads AI product marketing at Snowflake.

Enabling Cross-Region Inference

Cross-region inference is disabled by default and must be explicitly enabled before any data can traverse regions; developers must also specify which regions inference may use. If both regions run on Amazon Web Services (AWS), data traverses AWS's global network securely, benefiting from automatic encryption at the physical layer. If the regions span different cloud providers, traffic crosses the public internet encrypted with mutual transport layer security (mTLS). Notably, inputs, outputs, and service-generated prompts are neither stored nor cached; inference processing occurs solely in the cross-region environment.

To generate responses securely within Snowflake's framework, users must first set an account-level parameter to define where inference will take place. Cortex AI then automatically identifies an appropriate region for processing when a requested LLM isn’t available in the source region.

For example, if a user sets a parameter to “AWSUS,” inference can occur in either the U.S. East or West regions. Alternatively, setting “AWSEU” enables routing to central EU or Asia Pacific Northeast. Currently, target regions can only be configured within AWS; if cross-region is enabled in Azure or Google Cloud, requests will still be processed through AWS.
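In practice, enabling this comes down to one account-level statement. The sketch below uses the parameter name CORTEX_ENABLED_CROSS_REGION and the values documented by Snowflake at the time of writing; the exact names and supported values should be verified against the current Cortex AI documentation:

```sql
-- Sketch: enable cross-region inference at the account level.
-- The parameter defaults to 'DISABLED'; the values shown here follow
-- Snowflake's published docs and may differ from the article's "AWSUS"/
-- "AWSEU" shorthand -- verify before running.
ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US';

-- Or scope routing to EU regions instead:
-- ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_EU';
```

Once the parameter is set, no per-query configuration is needed; Cortex AI consults it whenever a requested model is missing from the source region.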

Agarwal illustrates this with a scenario involving Snowflake Arctic. If the model is unavailable in the source region (AWS U.S. East), cross-region inference routes the request to AWS U.S. West 2, with the response returned to the original region.

“All of this can be done with a single line of code,” Agarwal notes.
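That single line is an ordinary Cortex call; routing happens behind the scenes. A minimal sketch using the SNOWFLAKE.CORTEX.COMPLETE function and the snowflake-arctic model name from Snowflake's documentation:

```sql
-- Sketch: request a completion from Snowflake Arctic. If the model is not
-- hosted in the source region and cross-region inference is enabled, the
-- request is routed to a permitted region (e.g., AWS U.S. West 2) and the
-- response is returned to the source region transparently.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'snowflake-arctic',
    'Summarize cross-region inference in one sentence.'
);
```

The call is identical whether or not cross-region routing occurs; region selection is handled entirely by Cortex AI based on the account-level parameter.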

Users are billed credits for LLM usage in the source region, not the cross-region. Round-trip latency between regions depends on infrastructure and network conditions, but Snowflake anticipates it will be negligible compared with LLM inference latency.
