Azure Batch is a cloud-based service that allows you to run large-scale parallel and high-performance computing (HPC) workloads. With Azure Batch, you can easily scale out your computations to thousands of virtual machines (VMs) and process large amounts of data.
When creating an Azure Batch pool, it’s important to consider availability and reliability. One way to ensure high availability is to create a Batch pool across Azure Availability Zones. In this blog post, we’ll cover what Availability Zones are, the benefits of using them, and how to create an Azure Batch pool across Availability Zones.
What are Azure Availability Zones?
Azure Availability Zones are unique physical locations within an Azure region that are designed to provide high availability and resiliency. Each Availability Zone is made up of one or more datacenters with independent power, cooling, and networking. Azure regions that support Availability Zones have a minimum of three separate zones.
When you create an Azure Batch pool using a Virtual Machine Configuration, you can choose to provision it across Availability Zones by setting a zonal node placement policy. A pool created with a zonal policy helps protect your Batch compute nodes from Azure datacenter-level failures.
For example, if you create your pool with a zonal policy in an Azure region that supports three Availability Zones, and one datacenter in one Availability Zone has an infrastructure failure, your Batch pool will still have healthy nodes in the other two Availability Zones, so the pool will remain available for task scheduling.
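To make the arithmetic in this scenario concrete, here is a minimal Python sketch. It assumes nodes are assigned to zones round-robin, which is a simplification for illustration only and not a description of how Batch actually allocates nodes. The function names `spread_nodes` and `surviving_nodes` are hypothetical.

```python
def spread_nodes(node_count: int, zones: list[str]) -> dict[str, int]:
    """Assign nodes to zones round-robin.

    Illustrative assumption only: the real allocator balances nodes
    across zones on its own terms and may produce a different split.
    """
    placement = {zone: 0 for zone in zones}
    for i in range(node_count):
        placement[zones[i % len(zones)]] += 1
    return placement


def surviving_nodes(placement: dict[str, int], failed_zone: str) -> int:
    """Count the nodes that sit outside the failed zone."""
    return sum(count for zone, count in placement.items() if zone != failed_zone)


placement = spread_nodes(5, ["1", "2", "3"])
print(placement)                        # {'1': 2, '2': 2, '3': 1}
print(surviving_nodes(placement, "1"))  # 3
```

With five nodes over three zones, losing any single zone still leaves at least three nodes available for scheduling under this assumed split.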
Best practices for using Azure Batch pools across Availability Zones
Here are some best practices for using Azure Batch pools across Availability Zones:
Check regional support and other requirements
Before creating a Batch pool with a zonal policy, check that the Azure region you select supports Availability Zones and the requested VM SKU in more than one zone. You can validate this by calling the Resource Skus List API and checking the locationInfo field of resourceSku.
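The check described above can be sketched in Python against an already-downloaded Resource Skus List response. The sample entries below are hand-written for illustration (the zone values are not real availability data), and the helper names `zones_for_sku` and `supports_zonal_pool` are hypothetical. The sketch assumes each SKU entry carries `resourceType`, `name`, and a `locationInfo` list with `location` and `zones` fields.

```python
# Hand-written sample shaped like entries from the Resource Skus List API.
# The zone values are made up for illustration.
SAMPLE_SKUS = [
    {
        "resourceType": "virtualMachines",
        "name": "Standard_D2s_v3",
        "locationInfo": [{"location": "eastus", "zones": ["1", "2", "3"]}],
    },
    {
        "resourceType": "virtualMachines",
        "name": "Standard_A1",
        "locationInfo": [{"location": "eastus", "zones": []}],
    },
]


def zones_for_sku(skus: list[dict], vm_size: str, region: str) -> set[str]:
    """Collect the zones in which a VM size is offered in a region."""
    zones: set[str] = set()
    for sku in skus:
        if sku.get("resourceType") != "virtualMachines":
            continue
        if sku.get("name", "").lower() != vm_size.lower():
            continue
        for info in sku.get("locationInfo", []):
            if info.get("location", "").lower() == region.lower():
                zones.update(info.get("zones", []))
    return zones


def supports_zonal_pool(skus: list[dict], vm_size: str, region: str) -> bool:
    """A zonal pool needs the VM size in more than one zone."""
    return len(zones_for_sku(skus, vm_size, region)) > 1


print(supports_zonal_pool(SAMPLE_SKUS, "standard_d2s_v3", "eastus"))  # True
print(supports_zonal_pool(SAMPLE_SKUS, "standard_a1", "eastus"))      # False
```

Names are compared case-insensitively because pool definitions often use lowercase VM sizes such as `standard_a1`.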
For user subscription mode Batch accounts, make sure that the subscription in which you’re creating your pool does not have a zone offer restriction on the requested VM SKU. You can check this by calling the Resource Skus List API and checking the ResourceSkuRestrictions. If a zone restriction exists, you can submit a support ticket to remove the zone restriction.
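A similar sketch can surface zone restrictions for a single SKU entry. It assumes each restriction object has a `type` field (with `"Zone"` marking a zone restriction) and a `restrictionInfo` object listing the affected `locations` and `zones`. The sample entry is hand-written, and `restricted_zones` is a hypothetical helper name.

```python
# Hand-written sample SKU entry; the restriction shown is illustrative.
SAMPLE_SKU = {
    "resourceType": "virtualMachines",
    "name": "Standard_D2s_v3",
    "locationInfo": [{"location": "eastus", "zones": ["1", "2", "3"]}],
    "restrictions": [
        {
            "type": "Zone",
            "reasonCode": "NotAvailableForSubscription",
            "restrictionInfo": {"locations": ["eastus"], "zones": ["3"]},
        }
    ],
}


def restricted_zones(sku: dict, region: str) -> set[str]:
    """Return the zones of a region that are restricted for this SKU."""
    blocked: set[str] = set()
    for restriction in sku.get("restrictions", []):
        if restriction.get("type") != "Zone":
            continue  # location-wide restrictions are out of scope here
        info = restriction.get("restrictionInfo", {})
        locations = [loc.lower() for loc in info.get("locations", [])]
        if region.lower() in locations:
            blocked.update(info.get("zones", []))
    return blocked


offered = set(SAMPLE_SKU["locationInfo"][0]["zones"])
usable = offered - restricted_zones(SAMPLE_SKU, "eastus")
print(sorted(usable))  # ['1', '2']
```

If the usable set drops to one zone or fewer, that is the point at which a support ticket to lift the restriction becomes relevant.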
Disable inter-node communication if using a VM SKU that supports InfiniBand
You cannot create a Batch pool with a zonal policy if it has inter-node communication enabled and uses a VM SKU that supports InfiniBand.
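This rule is easy to check before submitting a pool definition. The following Python sketch validates a pool body shaped like the REST example later in this post. Whether the VM SKU supports InfiniBand is passed in by the caller, since how to look that up is outside the scope of this sketch, and `check_zonal_pool` is a hypothetical helper name.

```python
def check_zonal_pool(pool: dict, sku_supports_infiniband: bool) -> list[str]:
    """Return a list of problems that would block a zonal pool.

    `sku_supports_infiniband` is supplied by the caller; this sketch
    does not try to detect InfiniBand support itself.
    """
    vm_config = pool.get("virtualMachineConfiguration", {})
    policy = vm_config.get("nodePlacementConfiguration", {}).get("policy", "")
    is_zonal = policy.lower() == "zonal"
    inter_node = bool(pool.get("enableInterNodeCommunication", False))

    problems = []
    if is_zonal and inter_node and sku_supports_infiniband:
        problems.append(
            "Zonal policy cannot be combined with inter-node communication "
            "on a VM SKU that supports InfiniBand."
        )
    return problems


pool = {
    "id": "pool2",
    "virtualMachineConfiguration": {
        "nodePlacementConfiguration": {"policy": "Zonal"}
    },
    "enableInterNodeCommunication": True,
}
print(check_zonal_pool(pool, sku_supports_infiniband=True))   # one problem
print(check_zonal_pool(pool, sku_supports_infiniband=False))  # []
```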
Use a virtual network
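When you create a Batch pool with a zonal policy, Batch spreads the pool's nodes across the Availability Zones in the region rather than placing them all in one zone. If you deploy the pool into a virtual network, a single subnet is enough: Azure virtual networks and their subnets span every Availability Zone in a region, so nodes in different zones can still communicate with each other.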
Use managed disks
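Use managed disks for your VMs, but be aware of their redundancy model. Only zone-redundant storage (ZRS) managed disks are replicated across Availability Zones; locally redundant storage (LRS) disks are replicated within a single datacenter, so a node's disk becomes unavailable together with its zone. Keep any data that must survive a datacenter-level failure outside the node, for example in Azure Storage.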
Use Azure Monitor for monitoring and alerting
Use Azure Monitor to monitor your Batch pool and set up alerts for key metrics. Azure Monitor provides visibility into the health and performance of your Batch pool and can alert you when there are issues.
Examples:
- Creating a Batch Pool with a Zonal Policy using the Batch .NET SDK:
pool.DeploymentConfiguration.VirtualMachineConfiguration.NodePlacementConfiguration = new NodePlacementConfiguration()
{
    Policy = NodePlacementPolicyType.Zonal
};
- Creating a Batch Pool with a Zonal Policy using the Batch REST API:
POST {batchURL}/pools?api-version=YYYY-MM-version
client-request-id: 00000000-0000-0000-0000-000000000000
Request body:
{
    "id": "pool2",
    "vmSize": "standard_a1",
    "virtualMachineConfiguration": {
        "imageReference": {
            "publisher": "Canonical",
            "offer": "UbuntuServer",
            "sku": "18.04-lts"
        },
        "nodePlacementConfiguration": {
            "policy": "Zonal"
        },
        "nodeAgentSKUId": "batch.node.ubuntu 18.04"
    },
    "resizeTimeout": "PT15M",
    "targetDedicatedNodes": 5,
    "targetLowPriorityNodes": 0,
    "maxTasksPerNode": 3,
    "enableAutoScale": false,
    "enableInterNodeCommunication": false
}
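If you generate pool definitions from code, building the body as a dictionary and serializing it avoids hand-written JSON mistakes such as a missing comma. The Python sketch below assembles a subset of the fields from the request body above and prints it as JSON. It assumes the body is sent as a single JSON object containing the pool properties, it does not send any request, and `build_zonal_pool_body` is a hypothetical helper name.

```python
import json


def build_zonal_pool_body(pool_id: str, vm_size: str, node_count: int) -> dict:
    """Build a pool body with a zonal node placement policy.

    Field names and values mirror the REST example in this post;
    only a subset of the fields is included.
    """
    return {
        "id": pool_id,
        "vmSize": vm_size,
        "virtualMachineConfiguration": {
            "imageReference": {
                "publisher": "Canonical",
                "offer": "UbuntuServer",
                "sku": "18.04-lts",
            },
            "nodePlacementConfiguration": {"policy": "Zonal"},
            "nodeAgentSKUId": "batch.node.ubuntu 18.04",
        },
        "targetDedicatedNodes": node_count,
        "enableInterNodeCommunication": False,
    }


body = build_zonal_pool_body("pool2", "standard_a1", 5)
payload = json.dumps(body, indent=2)  # always yields well-formed JSON
print(payload)
```

Serializing with `json.dumps` also converts Python's `False` to the JSON literal `false` that the request body expects.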
In conclusion, Azure Batch pools can be provisioned across Availability Zones, which improves fault tolerance and resilience. A zonal policy helps protect Batch compute nodes from Azure datacenter-level failures, so the pool can stay available for task scheduling when one zone goes down. Before creating a zonal pool, confirm that the Azure region supports Availability Zones and that the requested VM SKU is available in more than one zone. For user subscription mode Batch accounts, also check for zone offer restrictions on the VM SKU in your subscription and have them removed if necessary. Finally, remember that a zonal pool cannot combine inter-node communication with a VM SKU that supports InfiniBand. By following these best practices and using the code examples in this post, you can create resilient Azure Batch pools across Availability Zones.
Docs: Create a pool across availability zones – Azure Batch | Microsoft Learn