A few months ago, I started noticing a significant performance difference between our live and staging environments, specifically in the response time of our APIs. As the engineer handling the infrastructure of a Django application, it is one of my responsibilities to maintain, optimize, and continuously improve performance and make the most of our startup budget, so I took a deep dive to find the reason.
Here is a quick look at the application stack:
OS: Ubuntu 14.10 LTS, Web Server: Nginx, Framework: Django, API Framework: Django Rest Framework, Database: PostgreSQL, Queuing & Messaging: Celery backed by Redis, Cache: Redis.
Staging setup: Full stack on a single machine, EC2 t2.micro in US East region.
Live setup: A web/app server that runs everything but the DB on an A2 VM, and a PostgreSQL server on an A1 VM, both on Microsoft Azure. Both are located in the same availability zone in the Central US data centre and in the same virtual network.
According to the machine specs of both setups, the live site should be noticeably faster than the staging one when the data and all the software are exactly the same, but that wasn’t our case… something was definitely wrong :S
I started my investigation by doing some research and installing some tools to measure and test the performance of the main infrastructure components: CPU, memory, disk, network, etc. And… well, mainly it’s the disk:
Timing cached reads: 3468 MB in 2.00 seconds = 1733.71 MB/sec
Timing buffered disk reads: 88 MB in 3.15 seconds = 27.92 MB/sec
Timing cached reads: 21894 MB in 2.00 seconds = 10957.49 MB/sec
Timing buffered disk reads: 240 MB in 3.07 seconds = 78.17 MB/sec
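The two pairs of readings above have the shape of `hdparm -Tt` output. As a rough sketch of how to compare them, the snippet below parses the four lines and works out how many times faster the second set is than the first. The helper name `parse_throughput` and the labels `FIRST` and `SECOND` are only illustrative; the numbers are copied from the readings above.

```python
import re

# Readings copied verbatim from the two benchmark runs above.
FIRST = """\
Timing cached reads: 3468 MB in 2.00 seconds = 1733.71 MB/sec
Timing buffered disk reads: 88 MB in 3.15 seconds = 27.92 MB/sec
"""
SECOND = """\
Timing cached reads: 21894 MB in 2.00 seconds = 10957.49 MB/sec
Timing buffered disk reads: 240 MB in 3.07 seconds = 78.17 MB/sec
"""

# Captures the kind of read ("cached" or "buffered disk") and the MB/sec figure.
LINE = re.compile(r"Timing (?P<kind>[\w ]+?) reads: .*= (?P<rate>[\d.]+) MB/sec")


def parse_throughput(text):
    """Map each kind of read to its MB/sec figure (hypothetical helper)."""
    return {m.group("kind"): float(m.group("rate")) for m in LINE.finditer(text)}


first = parse_throughput(FIRST)
second = parse_throughput(SECOND)

# Throughput of the second run relative to the first, per kind of read.
ratios = {kind: second[kind] / first[kind] for kind in first}

for kind, ratio in ratios.items():
    print(f"{kind} reads: second set is {ratio:.1f}x the first")
```

With these figures, buffered disk reads come out at roughly 2.8x and cached reads at roughly 6.3x.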
That was reason enough for the slowness on the live site, and hence a sufficient technical reason to migrate our live site to AWS. But wait, what about the cost? Well…
On the other hand, if I wanted to use machines of the same (or approximately the same) size on AWS, I would use t2.small and t2.medium:
(0.026 + 0.052) × 744 ~= $38.714 monthly, ~= 37% of what Azure costs us for machines of the same size that perform much slower.
Surprisingly, AWS came out about 63% cheaper. That is not a major difference when running a couple of machines, but it gets serious when running tens of servers. (The difference could be even bigger with reserved AWS instances rather than on-demand ones, which adds an additional 30% to the cost saving.)
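Percentages like these are easy to mix up, so here is a small sketch of the arithmetic. It takes only the 37% ratio stated above as input; the variable names are illustrative.

```python
# Ratio taken from the text: the AWS bill is about 37% of the Azure bill.
aws_over_azure = 0.37

# "AWS is X% cheaper than Azure" is measured against the Azure bill.
aws_cheaper_pct = (1 - aws_over_azure) * 100

# "Azure is Y% more expensive than AWS" is measured against the AWS bill.
azure_pricier_pct = (1 / aws_over_azure - 1) * 100

print(f"AWS is {aws_cheaper_pct:.0f}% cheaper than Azure")
print(f"Azure is {azure_pricier_pct:.0f}% more expensive than AWS")
```

The same ratio gives two different percentages depending on which bill is the baseline: about 63% measured against Azure, about 170% measured against AWS.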
A few days ago I had the chance to migrate our live site to AWS. Just for fun, I ran a quick query on the average response time of our APIs to visualize the performance difference after the migration; the graph shown in the header of this article is the result.
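For readers who want to run a similar before/after comparison, here is a minimal sketch in Python. It is not the query I actually used: the rows, the cutover date, and the helper `average_by_phase` are all made up for illustration, and it assumes you can export one (date, response time in ms) pair per request.

```python
from datetime import date
from statistics import mean

# Hypothetical sample rows: (request date, response time in ms).
# These values are invented for illustration, not real measurements.
rows = [
    (date(2000, 1, 1), 400),
    (date(2000, 1, 2), 600),
    (date(2000, 1, 3), 100),
    (date(2000, 1, 4), 300),
]

# Hypothetical migration date; requests on or after it count as "after".
CUTOVER = date(2000, 1, 3)


def average_by_phase(rows, cutover):
    """Average response time before and after the cutover date."""
    before = [ms for day, ms in rows if day < cutover]
    after = [ms for day, ms in rows if day >= cutover]
    return {"before": mean(before), "after": mean(after)}


averages = average_by_phase(rows, CUTOVER)
print(averages)
```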