How Azure Storage Handles Updating BLOBs

At an Azure Boot Camp this last week I was asked how the Azure CDN (Content Delivery Network) handles file updates if a file is being updated at the same moment one of the CDN nodes requests it. I wasn’t sure how the system would react, so I sent off an email to get the answer. Thanks to Steve Marx for the answer. If you are familiar with BLOB storage and the CDN, just skip to the fourth paragraph.

Before I give the answer, I want to give a little background on the moving parts involved in the scenario. First we have Azure BLOB Storage (BLOB stands for Binary Large Object), which, at its simplest, is a highly scalable file system in the cloud. Files you put in BLOB Storage can be exposed for anyone to access, or you can lock them down to give access only to those you want to share them with. So, if we have a file in BLOB Storage that can be accessed by requests at any time, what happens if you need to update the file? This was really the crux of the question, but the gentleman asking the question had thrown the additional aspect of the CDN into the mix.

The CDN is a way to geo-distribute your files across the globe [1]. Moving the files out to some 20+ edge nodes around the globe means that the files are closer to your global user base. When a user’s browser or application requests one of the files, they are directed to the nearest edge node. The benefit here is better performance for your users and offloading some of the work from your web servers. If the edge node has the file (and the file is within its TTL, or the file is past its TTL but is up to date) then the file is served from the edge node rather than going all the way back to the source BLOB storage account, which could be across the globe. If the edge node doesn’t have the file, or the file is past its time to live and a new version exists, then the edge node will request the updated file from the source BLOB Storage. What happens if that request comes in to the source BLOB storage account right when we are updating it? In reality it doesn’t matter if the request is coming in from the CDN or some other consumer; the storage system reacts the same way.
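To make the edge node decision above a little more concrete, here is a minimal sketch in Python. It is a toy, in-memory model and not the real CDN: `ToyOrigin`, `ToyEdgeNode`, the integer-based ETag, and the choice to restart the TTL clock after a successful freshness check are all my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    body: bytes
    etag: str
    fetched_at: float  # time the copy was fetched or last confirmed fresh


class ToyOrigin:
    """Hypothetical stand-in for the source BLOB storage account."""

    def __init__(self, body: bytes):
        self.body = body
        self.version = 0
        self.full_fetches = 0  # counts full downloads by edge nodes

    @property
    def etag(self) -> str:
        return f'"{self.version}"'

    def update(self, body: bytes) -> None:
        self.body = body
        self.version += 1

    def fetch(self):
        self.full_fetches += 1
        return self.body, self.etag


class ToyEdgeNode:
    """Hypothetical edge node applying the simplified rules from the text."""

    def __init__(self, origin: ToyOrigin, ttl_seconds: float):
        self.origin = origin
        self.ttl = ttl_seconds
        self._cache = None

    def get(self, now: float) -> bytes:
        entry = self._cache
        if entry is not None:
            # Within the TTL: serve the cached copy without asking the origin.
            if now - entry.fetched_at < self.ttl:
                return entry.body
            # Past the TTL but still up to date: keep serving the cached copy.
            # Assumption: the TTL clock restarts after this check.
            if entry.etag == self.origin.etag:
                entry.fetched_at = now
                return entry.body
        # No copy, or the copy is expired and stale: fetch from the origin.
        body, etag = self.origin.fetch()
        self._cache = CachedFile(body, etag, now)
        return body
```

With a 60 second TTL, an update made at the origin is not visible through the edge node until the TTL lapses, and an expired but unchanged file does not trigger a second full download.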

So, the good news is that BLOB Storage updates are atomic. That means the file isn’t replaced until the update is fully complete, and BLOB Storage will continue to serve the previous version until the new version is committed. We should never see any sort of corruption due to a file update being in progress.

There are two types of BLOBs: block BLOBs and page BLOBs. Both can be broken down into smaller parts and read or updated in pieces. As far as updates to block BLOBs go (via the Put Blob and Put Block List API requests), the changes are atomic, so there is never a BLOB with only partial changes that can be served.
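Here is a small sketch of why that matters for readers. It is a toy, in-memory model written in Python, not the Azure SDK or the REST API: the class `ToyBlockBlob` and its methods are hypothetical names that only mimic the idea of staging blocks and then committing a block list in one step. It also assumes every commit lists freshly staged blocks, which is simpler than the real service.

```python
class ToyBlockBlob:
    """Hypothetical model of a block BLOB: staged blocks stay invisible until committed."""

    def __init__(self):
        self._committed = b""  # what readers see
        self._staged = {}      # uploaded but uncommitted blocks, by block id

    def put_block(self, block_id: str, data: bytes) -> None:
        # Staging a block never changes what readers see.
        self._staged[block_id] = data

    def put_block_list(self, block_ids) -> None:
        # Build the full new content first, then swap it in with one assignment,
        # so a reader sees either the old version or the new one, never a mix.
        new_content = b"".join(self._staged[block_id] for block_id in block_ids)
        self._committed = new_content
        self._staged.clear()

    def get(self) -> bytes:
        return self._committed


blob = ToyBlockBlob()
blob.put_block("a", b"hello ")
blob.put_block("b", b"world")
blob.put_block_list(["a", "b"])

# Start an update: the new blocks are staged, but the old version is still served.
blob.put_block("c", b"goodbye ")
blob.put_block("d", b"world")
still_old = blob.get()

blob.put_block_list(["c", "d"])
now_new = blob.get()
```

In this sketch `still_old` is `b"hello world"` and `now_new` is `b"goodbye world"`: a request that arrives in the middle of the update gets the previous version in full.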

If you happen to be reading the BLOB (either block or page) in ranges, or not all at once because of the size of the file, then each read is atomic. If you read a part of a BLOB and it changes before you read the next part, you could get into a scenario where it has changed out from under you. To avoid this issue, use the ETag value returned in the response to make sure the BLOB hasn’t changed since you started to get the file. You can read more about ETags in HTTP on Wikipedia. Unless you are coding your own download methods and will be breaking files up into pieces, you don’t have to worry about this.
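If you do write your own chunked download, the pattern looks roughly like the sketch below. Again this is a toy Python model rather than real Azure code: `ToyBlob`, `PreconditionFailed`, `download_in_ranges`, and the `on_chunk` hook are hypothetical names I made up for the illustration. The idea it imitates is standard HTTP: remember the ETag from the first response, then send it as a condition (the `If-Match` header) on every later range request, so that a changed file produces an error instead of a silently mixed download.

```python
class PreconditionFailed(Exception):
    """Raised when the blob changed after the reader pinned its ETag."""


class ToyBlob:
    """Hypothetical blob whose ETag changes on every update."""

    def __init__(self, data: bytes):
        self._data = data
        self._version = 0

    @property
    def etag(self) -> str:
        return f'"{self._version}"'

    def replace(self, data: bytes) -> None:
        self._data = data
        self._version += 1

    def read_range(self, start: int, end: int, if_match=None):
        # Mimics a conditional ranged read: refuse if the pinned ETag is stale.
        if if_match is not None and if_match != self.etag:
            raise PreconditionFailed("blob changed during download")
        return self._data[start:end], self.etag


def download_in_ranges(blob: ToyBlob, chunk_size: int, on_chunk=None) -> bytes:
    # The first read pins the ETag for the rest of the download.
    first, etag = blob.read_range(0, chunk_size)
    parts = [first]
    offset = len(first)
    # A short (or empty) chunk means we have reached the end of the blob.
    while len(parts[-1]) == chunk_size:
        if on_chunk is not None:
            on_chunk(offset)  # hook used only to simulate a concurrent update
        chunk, _ = blob.read_range(offset, offset + chunk_size, if_match=etag)
        if not chunk:
            break
        parts.append(chunk)
        offset += len(chunk)
    return b"".join(parts)
```

When nothing changes, `download_in_ranges(ToyBlob(b"abcdefghij"), 4)` returns the whole file. If the blob is replaced between two range reads, the next read raises `PreconditionFailed`, and the caller can throw away what it has and start over.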

So, since the CDN request is coded to deal with a file that may shift underneath it, you can rest assured that requests coming in from the CDN while the file is being updated will not result in a corrupted file response.

Thanks again to Steve Marx who did the research, and thanks to the gentleman who asked the question. You learn something new all the time.

1. Note that my description of the CDN here simplifies how it decides what to cache and when to go get newer copies. To see the full details, check out the documentation.

Updated on 1/30/2011 – clearing up my statement of when the file is served from the edge node based on comments from Rajesh Kolla.