Skip to main content

Azure infrastructure requirements

This page describes the Azure infrastructure for running the ClearSpecs AI - Local LLM (offline) edition in Azure DevOps. Prompt execution does not use the ClearSpecs cloud assistant API; all model requests go only to an OpenAI-compatible endpoint that you host in Azure, subject to your network and proxy rules.

The supported topology is a model deployed in Azure Foundry, fronted by Azure API Management (APIM). You provision, secure, and operate that endpoint. For extension settings (Base URL, model, API key), see Configure an OpenAI-compatible endpoint. For distribution and overview, see Introduction.

Architecture overview

The ClearSpecs AI extension runs inside the Azure Boards UI in the user's browser on an Azure DevOps origin (for example https://dev.azure.com). When a user invokes an AI action, the extension issues a browser fetch to the configured chat/completions endpoint.

Two consequences drive the requirements below.

First, the call is cross-origin. The endpoint must return CORS headers that allow your Azure DevOps web origin, or the browser blocks the request even when the endpoint is reachable and healthy. Foundry has no configurable CORS surface, so APIM (or an Azure Function proxy) is required to add these headers. A direct browser-to-Foundry call cannot work.

Second, the endpoint is the trust boundary. Authentication, rate limiting, logging, and data residency are properties of the APIM and Foundry endpoint you stand up, not of the extension.

Prerequisites

RequirementNotes
Azure DevOps organization (or collection)With the privately distributed Local LLM extension installed and assigned to the project.
Permission to manage extension settingsAccess to Organization settings or Collection settings, then ClearSpecs AI, then the Local LLM tab. Settings are stored in Azure DevOps Extension Data under Microsoft's scoping rules.
Azure subscriptionWith a valid payment method.
Azure Foundry model deploymentOpenAI-compatible, reachable via APIM. See Azure Foundry: model deployment below.
Azure API Management instanceFronts Foundry and applies CORS and authentication. See Azure API Management below.

Extension configuration values

On the Local LLM tab you configure three settings:

  • Base URL — the HTTPS endpoint implementing the OpenAI-compatible chat/completions API. Trailing slashes are normalized. Point this at the APIM route, not at Foundry directly.
  • Model — the Azure Foundry deployment name.
  • API key — the APIM subscription key (or whatever credential the proxy enforces).

Azure Foundry: model deployment

See Deploy Foundry Models on Microsoft Learn.

You need:

  • An Azure subscription with a valid payment method.
  • The Cognitive Services Contributor role (or equivalent) on the Foundry resource, to create and manage deployments.
  • A Microsoft Foundry project under a Foundry resource.
  • For models from partners and community (for example Llama), an Azure Marketplace subscription and the permissions to create it. Models sold by Azure (for example Azure OpenAI models such as gpt-4o-mini) do not require this.

Deployment notes:

  • The deployment name is what the extension sends in the model parameter to route requests. Record this value; it becomes the Model field in the extension. You can rename the deployment from the default model name before deploying; if you do, the extension must use the chosen name.
  • The endpoint exposes an OpenAI-compatible surface under the /openai/v1/ path. The extension's Base URL should point at the APIM route that fronts this path.
  • Region and quota: deployments consume quota on a per-region, per-model basis, measured in Tokens-per-Minute (TPM). Choose a region where the model is available and allocate TPM per deployment. Hitting the quota ceiling blocks new deployments until you request more quota or reallocate TPM. Size TPM to expected concurrent ClearSpecs usage.

Azure API Management

See Configure AI Gateway (API Management) on Microsoft Learn.

APIM is required for this deployment. It adds the CORS headers the browser needs (Foundry has none), surfaces authentication to the extension, and provides token limits and quotas. AI Gateway in Foundry uses Azure API Management behind the scenes and can be enabled from the Foundry portal under Operate, then Admin console, then AI Gateway.

Permissions:

  • To create a new APIM instance: Contributor or Owner on the target resource group (or subscription).
  • To reuse an existing APIM instance: API Management Service Contributor (or Owner) on that instance.
  • Foundry portal access for the resource, for example Foundry Account Owner or Foundry Owner.

Instance requirements:

  • Create new provisions a Basic v2 SKU, intended for dev/test with SLA support. A free tier is available for APIM via AI Gateway; confirm current cost and free-tier eligibility against Azure API Management pricing.
  • Use existing requires the instance to be in the same Microsoft Entra tenant and subscription as the Foundry resource, on a v2 tier, not already associated with another AI Gateway, with the operator holding at least API Management Service Contributor on it.
  • For production or higher throughput, use a Standard v2 or Premium v2 tier.
  • If the Foundry resource has public network access disabled, the APIM instance must also be privately accessible. Use Standard v2 or Premium v2 with a private endpoint, or a Premium v2 injected into a virtual network.

What APIM provides for this deployment:

  • CORS headers allowing the Azure DevOps origin. This is the reason a browser client needs the proxy at all.
  • Authentication (subscription key or other) surfaced to the extension as the API key.
  • Token limits and quotas at the project level: multi-team containment, cost capping, and predictable usage ceilings for regulated workloads.

Note: New projects in the Foundry resource have AI Gateway enabled by default. Existing projects must be added to the gateway manually.

Mapping to the extension settings

Extension fieldValue
Base URLThe APIM route fronting the Foundry OpenAI-compatible endpoint (.../openai/v1/).
ModelThe Foundry deployment name.
API keyThe APIM subscription key (or whatever credential the proxy enforces).

CORS requirements

Because the Boards UI calls the endpoint from the browser, cross-origin rules apply:

  • The endpoint must return CORS headers that allow your Azure DevOps web origin (scheme and host, for example https://dev.azure.com).
  • Foundry has no configurable CORS surface. Place APIM (or an Azure Function proxy) in front and configure CORS there. A direct browser-to-Foundry call is blocked.

Misconfigured CORS is the most common failure mode. It presents as the request failing in the browser even though the model deployment is healthy and reachable from server-side tools. For Ollama or other self-hosted servers, see CORS guidance on the endpoint configuration page.

Networking and security

  • Transport: HTTPS/TLS end to end.
  • Authentication: the APIM subscription key (or other enforced credential) is supplied in the extension's API key field. It is stored in Azure DevOps Extension Data under Microsoft's scoping.
  • Data residency and privacy: the offline SKU sends prompt content only to the configured Base URL. No prompt data is sent to the ClearSpecs cloud assistant API. Data residency is determined by the Foundry deployment region.
  • Egress and private networking: if the Foundry resource or APIM sits behind a private endpoint or VNet, users' browsers must have a network route to the APIM endpoint.

Sizing and operational considerations

  • Throughput: size Foundry TPM quota to peak concurrent ClearSpecs usage. Where multiple teams share a deployment, govern them with APIM token limits to prevent one project from exhausting capacity.
  • Monitoring: verify traffic via APIM Monitoring, then Metrics (Requests), and Monitoring, then Logs (the GatewayLogs table). Expect 200 responses for healthy calls and 429 when a token limit is exceeded.
  • Quotas: plan for quota-increase lead time if expected usage approaches the regional model ceiling.