Note: This blog post is part of a series centered around the topic of high availability in Azure:
I won’t be addressing scaling (horizontal or vertical), backups/restores, or resiliency/healing in these posts. Each of those topics deserves its own series; perhaps I’ll write about them in the future if time permits.
In Azure, the following entities are backed by Azure storage accounts: blobs, file shares, queues, NoSQL tables, Data Lake Storage Gen2 and unmanaged disks. In this blog post, we’ll go over the various redundancy options available for these storage accounts. We’ll compare & contrast them on how they replicate data, their replication latency, the disaster scenarios they cover and their SLAs.
Hopefully this blog post will serve as a cheat-sheet and help you choose the right Azure storage redundancy options for your use cases.
With LRS (locally redundant storage), your data is replicated three times across multiple fault domains & update domains within a single storage scale unit (all within a single datacenter). Note that all three replicas are addressed by a single endpoint (i.e. you can’t target individual replicas for read/write operations).
Replication latency: None. Data is written synchronously to all three replicas on every write request.
Disaster scenarios:
disaster type | service interruption? | data loss? | recovery possible? |
---|---|---|---|
hardware failure in physical rack/node | NO | NO [1] | N/A |
datacenter disaster | YES | YES | NO [2] |
availability zone disaster | ” | ” | ” |
regional disaster | ” | ” | ” |
geographic disaster | ” | ” | ” |
worldwide disaster | ” | ” | ” |
SLAs:
metric | SLA |
---|---|
object storage | >= 99.999999999% (11 nines) |
read requests (hot tier) | >= 99.9% (3 nines) |
read requests (cool tier) | >= 99% (2 nines) |
write requests (hot tier) | >= 99.9% (3 nines) |
write requests (cool tier) | >= 99% (2 nines) |
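To put these availability percentages in perspective, it helps to convert them into the amount of unavailability they permit. The snippet below is a rough sketch that assumes a 30-day month; the function name is illustrative and not part of any Azure SDK. Azure measures its storage SLA in terms of failed requests rather than wall-clock downtime, so treat the result as intuition rather than a contractual figure.

```python
def downtime_budget_minutes(availability_percent: float, days: int = 30) -> float:
    """Minutes of unavailability that a given availability percentage permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


# Hot tier (99.9%) vs. cool tier (99%) over an assumed 30-day month.
for label, pct in [("hot tier (99.9%)", 99.9), ("cool tier (99%)", 99.0)]:
    print(f"{label}: {downtime_budget_minutes(pct):.1f} minutes per 30-day month")
```

In other words, three nines leaves room for roughly 43 minutes of failed requests per month, while two nines leaves room for a bit over 7 hours.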
With ZRS (zone-redundant storage), your data is replicated across three availability zones within the same region (note that not all regions currently support availability zones). As with LRS, all three replicas are addressed by a single endpoint.
Replication latency: Very low. Data is written synchronously to all three replicas on every write request.
Disaster scenarios:
disaster type | service interruption? | data loss? | recovery possible? |
---|---|---|---|
hardware failure in physical rack/node | NO | NO [1] | N/A |
datacenter disaster | ” | ” | ” |
availability zone disaster | YES [2] | NO | N/A |
regional disaster | YES | YES | NO [3] |
geographic disaster | ” | ” | ” |
worldwide disaster | ” | ” | ” |
SLAs:
metric | SLA |
---|---|
object storage | >= 99.9999999999% (12 nines) |
read requests (hot tier) | >= 99.9% (3 nines) |
read requests (cool tier) | >= 99% (2 nines) |
write requests (hot tier) | >= 99.9% (3 nines) |
write requests (cool tier) | >= 99% (2 nines) |
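The durability figures are easier to reason about as an expected loss rate. Here is a minimal sketch, assuming the "nines" describe the probability that a given object survives one year and that losses are independent; the workload size of 10 million objects is a made-up example. It uses exact fractions because subtracting 0.99999999999 from 1 in floating point is imprecise.

```python
from fractions import Fraction


def expected_annual_loss(object_count: int, nines: int) -> Fraction:
    """Expected number of objects lost per year at a durability of `nines` nines."""
    annual_loss_probability = Fraction(1, 10 ** nines)
    return object_count * annual_loss_probability


objects = 10_000_000  # hypothetical workload size
for name, nines in [("LRS", 11), ("ZRS", 12)]:
    loss = expected_annual_loss(objects, nines)
    print(f"{name}: about one object lost every {float(1 / loss):,.0f} years")
```

Each extra nine cuts the expected loss by a factor of ten, which is why the jump from 11 to 12 nines matters more for very large object counts than for small ones.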
With GRS (geo-redundant storage), your data is replicated across two paired regions (within the same Azure geography) in a primary region + secondary region setup. This ensures that one regional replica remains available in the event of a regional disaster.
The primary region & the secondary region are addressed by separate endpoints. The secondary endpoint is generally inaccessible. However, in case of a failover, the secondary is promoted to primary and read + write access is enabled for this endpoint. Failovers are automatically initiated by Azure in the event of a regional disaster. Azure is also introducing user-initiated failovers, which are in preview at the time of writing this post.
Note: Both GRS (geo-redundant storage) and RA-GRS (read-access geo-redundant storage) are misnomers. They don’t create redundant copies across Azure geographies, only across paired regions within the same Azure geography.
Replication latency: Your data is first replicated synchronously within the primary region via LRS. The data is then replicated asynchronously to the secondary region (eventually consistent). Within the secondary region, it is replicated synchronously using LRS. The official SLA for Azure storage does not make any guarantees about the time needed for geo-replication.
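Because geo-replication is asynchronous, writes committed in the primary region after the most recent sync point may be missing from the secondary if a failover happens. Azure exposes a Last Sync Time property for geo-replicated accounts. The sketch below assumes you have already obtained that timestamp and that you keep your own log of write times; both inputs are hypothetical here and no SDK calls are made.

```python
from datetime import datetime, timezone


def writes_at_risk(last_sync_time: datetime, writes: dict[str, datetime]) -> list[str]:
    """Return names of writes made after the last sync point (possibly not yet geo-replicated)."""
    return sorted(name for name, written_at in writes.items() if written_at > last_sync_time)


last_sync = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical value
write_log = {  # hypothetical application-side write log
    "orders/1001.json": datetime(2020, 1, 1, 11, 58, tzinfo=timezone.utc),
    "orders/1002.json": datetime(2020, 1, 1, 12, 3, tzinfo=timezone.utc),
}
print(writes_at_risk(last_sync, write_log))
```

Anything this returns is a candidate for data loss in a failover, which is what the POSSIBLE entries in the disaster table below refer to.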
Disaster scenarios:
disaster type | service interruption? | data loss? | recovery possible? |
---|---|---|---|
hardware failure in physical rack/node | NO | NO | N/A |
datacenter disaster | YES [1] | POSSIBLE [2] | YES [3] |
availability zone disaster | ” | ” | ” |
regional disaster | ” | ” | ” |
geographic disaster | YES | YES | NO |
worldwide disaster | ” | ” | ” |
SLAs:
metric | SLA |
---|---|
object storage | >= 99.99999999999999% (16 nines) |
read requests (hot tier) | >= 99.9% (3 nines) |
read requests (cool tier) | >= 99% (2 nines) |
write requests (hot tier) | >= 99.9% (3 nines) |
write requests (cool tier) | >= 99% (2 nines) |
RA-GRS (read-access geo-redundant storage) works the same as GRS, but you always have read-only access to the secondary replica.
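With read access to the secondary, an application can fall back to it when the primary is unreachable. The sketch below assumes the public Azure cloud, where the secondary Blob endpoint is the account name with a `-secondary` suffix. `fetch` is a hypothetical caller-supplied function that takes a URL and returns the content; both helper names are illustrative and not part of any SDK.

```python
def blob_endpoints(account_name: str) -> dict[str, str]:
    """Build the primary and secondary Blob service endpoints for a storage account."""
    return {
        "primary": f"https://{account_name}.blob.core.windows.net",
        "secondary": f"https://{account_name}-secondary.blob.core.windows.net",
    }


def read_with_fallback(fetch, endpoints: dict[str, str], path: str):
    """Try the primary first; on a connection error, retry the read against the secondary."""
    try:
        return fetch(f"{endpoints['primary']}/{path}")
    except ConnectionError:
        # The secondary is eventually consistent, so this read may return stale data.
        return fetch(f"{endpoints['secondary']}/{path}")
```

Keep in mind that the secondary only accepts reads, so writes still fail until the primary recovers or a failover completes.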
Replication latency: Same as GRS.
Disaster scenarios:
disaster type | service interruption? | data loss? | recovery possible? |
---|---|---|---|
hardware failure in physical rack/node | NO | NO | N/A |
datacenter disaster | YES [1] | POSSIBLE [2] | YES [3] |
availability zone disaster | ” | ” | ” |
regional disaster | ” | ” | ” |
geographic disaster | YES | YES | NO |
worldwide disaster | ” | ” | ” |
SLAs:
metric | SLA |
---|---|
object storage | >= 99.99999999999999% (16 nines) |
read requests (hot tier) | >= 99.99% (4 nines) |
read requests (cool tier) | >= 99.9% (3 nines) |
write requests (hot tier) | >= 99.9% (3 nines) |
write requests (cool tier) | >= 99% (2 nines) |
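As a compact cheat-sheet, the disaster tables above can be folded into a small lookup. This is a simplified sketch of my reading of those tables: it only records the most severe disaster each option can recover from, and it ignores the difference between "no data loss" (ZRS) and "possible data loss" (GRS and RA-GRS, because of asynchronous replication). All names are illustrative.

```python
# Disaster types ordered from least to most severe; none of the options survive anything worse.
DISASTER_LEVELS = ["hardware", "datacenter", "availability zone", "regional"]

# Most severe disaster each option can recover from, per the tables above.
MAX_SURVIVABLE = {
    "LRS": "hardware",
    "ZRS": "availability zone",
    "GRS": "regional",
    "RA-GRS": "regional",
}


def candidate_options(worst_disaster: str, need_secondary_reads: bool = False) -> list[str]:
    """List redundancy options that can recover from `worst_disaster`."""
    required = DISASTER_LEVELS.index(worst_disaster)
    candidates = [
        option
        for option, limit in MAX_SURVIVABLE.items()
        if DISASTER_LEVELS.index(limit) >= required
    ]
    if need_secondary_reads:
        # Only RA-GRS exposes a readable secondary endpoint at all times.
        candidates = [option for option in candidates if option == "RA-GRS"]
    return candidates


print(candidate_options("availability zone"))
print(candidate_options("regional", need_secondary_reads=True))
```

From the candidates, you would then pick based on cost and on whether possible data loss during a failover is acceptable for your workload.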
That’s all for today folks! Comments? Suggestions? Thoughts? Would love to hear from you, please leave a comment below or send me a tweet.