One of the main reasons for using cloud services is the ability to scale. Many people don’t take advantage of that, which is a shame, because it can result in huge cost savings.
The example I’m presenting today shows how you can scale Azure SQL to keep the running cost low while still getting good performance when you need it. The scenario I’m using Azure SQL for is log parsing; the data flow is as follows.
The Back Story
Application logs are collected using Application Insights. Since Application Insights only holds data for 30 days, I need another solution for looking at the bigger trends. The way I decided to go about it was to enable continuous export and store all the log data in Azure Storage. Azure Storage is a cheap place to keep data, but it is painful when you want to query it. Solutions do exist that allow you to query data in Azure Blob Storage; Azure SQL Data Warehouse has PolyBase support for Azure Storage, but unfortunately it doesn’t support the JSON files that the logs are exported as. An alternative would be Azure Data Lake Analytics, but that tool is not the easiest to work with, and the size of my log files currently doesn’t justify using it. Instead I decided to go with Azure Functions for handling my log files: my function reads the JSON entries I find interesting from the logs and inserts them directly into a table in Azure SQL. The table has only a few columns: id, timestamp, requesturl and jsondata (which is just the exported data for the request).
This setup is not the nicest, but it allows me to use the JSON features of Azure SQL to pull content out of the log JSON documents. If I knew beforehand exactly what I was looking for, it would be unnecessary to save the entire document in the SQL database; but since I don’t, and my requirements change over time, it is nice to have it all in there so I can change my queries and still have all the data available.
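To give an idea of what querying the jsondata column looks like, here is a minimal sketch. It is not taken from my actual setup: the table name (dbo.RequestLogs), the helper name and the JSON path ($.responseCode) are placeholders, so adjust them to match your exported Application Insights data.
[csharp]
// Sketch only: counts failed requests by reading a value out of the exported JSON
// with JSON_VALUE. Table name and JSON path are placeholders, not from my real setup.
// Uses System.Data.SqlClient, same as the other snippets in this post.
private static async Task<int> CountFailedRequestsAsync(DateTime since)
{
    var constring = Environment.GetEnvironmentVariable("SQL_DB");
    var query = @"
        SELECT COUNT(*)
        FROM dbo.RequestLogs
        WHERE [timestamp] >= @since
          AND CAST(JSON_VALUE(jsondata, '$.responseCode') AS INT) >= 400";

    using (SqlConnection conn = new SqlConnection(constring))
    using (SqlCommand cmd = new SqlCommand(query, conn))
    {
        cmd.Parameters.AddWithValue("@since", since);
        await conn.OpenAsync();
        return (int)await cmd.ExecuteScalarAsync();
    }
}
[/csharp]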
Unfortunately my table with all this JSON data quickly gets quite large (as expected), which means that working with the data is slow unless I give the database some extra power. The point of the database is just to pull out a few key metrics on a daily, weekly and monthly basis and present them in a Power BI report for business users. So having the database run at e.g. a Premium or Premium RS SKU all the time would be quite expensive for the few times the reports are used.
What I instead decided to do was to create some views over my log table in the database that return everything I’m interested in. Then, once a day when my timer-triggered function runs, I increase the SKU of the database from S1 to Premium RS 1, import the latest data from Azure Storage, select the recent data from the views and materialize it into a table for each view (a sketch of that step is shown below). When this process is done I scale the database back down automatically as part of the function code. The Power BI reports can then be built on these tables, which are fast because they only contain a limited number of rows.
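The materialization step could look something like the following sketch. The view and table names (vw_DailyRequestStats, DailyRequestStats) and the 30-day window are placeholders; my real views aren’t shown in this post.
[csharp]
// Sketch of materializing a view into a reporting table (one statement like this per view).
// Runs while the database is scaled up, so the expensive SELECT over the big log table
// is only paid for once per day. Names and the 30-day window are placeholders.
private static async Task MaterializeViews()
{
    var constring = Environment.GetEnvironmentVariable("SQL_DB");
    var query = @"
        TRUNCATE TABLE dbo.DailyRequestStats;
        INSERT INTO dbo.DailyRequestStats
        SELECT * FROM dbo.vw_DailyRequestStats
        WHERE [Date] >= DATEADD(day, -30, GETUTCDATE());";

    using (SqlConnection conn = new SqlConnection(constring))
    using (SqlCommand cmd = new SqlCommand(query, conn))
    {
        cmd.CommandTimeout = 600; // the SELECT over the large log table can take a while
        await conn.OpenAsync();
        await cmd.ExecuteNonQueryAsync();
    }
}
[/csharp]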
You could call what I built a small data warehouse, without using any of the normal data warehouse tools, instead relying on Azure SQL, Azure Functions and some C# code.
How to scale from Azure Functions
To automatically scale the database up from Azure Functions I use the following code:
[csharp]
// Scale the database up by changing its service objective (SKU).
// Replace <DATABASENAME> with the name of your database.
private static async Task ScaleDatabaseUp()
{
    var constring = Environment.GetEnvironmentVariable("SQL_DB");
    var query = @"
        ALTER DATABASE <DATABASENAME>
        MODIFY (SERVICE_OBJECTIVE = 'PRS1');";

    using (SqlConnection conn = new SqlConnection(constring))
    using (SqlCommand cmd = new SqlCommand(query, conn))
    {
        await conn.OpenAsync();
        // The ALTER DATABASE statement returns immediately; the actual
        // scaling continues in the background on the server.
        await cmd.ExecuteNonQueryAsync();
    }
}
[/csharp]
You have to replace <DATABASENAME> with the name of your database, and potentially also change the service objective (SKU). You can find the list of service objectives here.
Scaling the database up is not an instantaneous process; it can take quite some time depending on the database size and how busy your server is. Until the scaling is complete we don’t want to start our complex queries, so we have to poll the server to ask whether the scaling is done. The way I decided to do that is by calling the GetDatabaseTier() method below from a while loop.
[csharp]
// Returns the current edition and service objective of the database,
// read from sys.database_service_objectives.
public static async Task<DatabaseServiceTier> GetDatabaseTier()
{
    var query = @"
        SELECT
            db.name              [Database]
          , ds.edition           [Edition]
          , ds.service_objective [Service Objective]
        FROM sys.database_service_objectives ds
        JOIN sys.databases db ON ds.database_id = db.database_id";

    var constring = Environment.GetEnvironmentVariable("SQL_DB");
    using (SqlConnection conn = new SqlConnection(constring))
    using (SqlCommand cmd = new SqlCommand(query, conn))
    {
        await conn.OpenAsync();
        using (var reader = await cmd.ExecuteReaderAsync())
        {
            await reader.ReadAsync();
            return new DatabaseServiceTier
            {
                Database = reader[0] as string,
                Edition = reader[1] as string,
                ServiceObjective = reader[2] as string
            };
        }
    }
}
[/csharp]
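GetDatabaseTier() returns a small DTO that isn’t shown above; a minimal version, inferred from how it is used in these snippets, could look like this:
[csharp]
// Simple DTO holding the result of GetDatabaseTier().
// Property names are inferred from the usage above.
public class DatabaseServiceTier
{
    public string Database { get; set; }
    public string Edition { get; set; }
    public string ServiceObjective { get; set; }
}
[/csharp]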
Polling the database to get the service objective:
[csharp]
var serviceTier = await GetDatabaseTier();
int retry = 0;
if (serviceTier.ServiceObjective != "PRS1")
{
    await ScaleDatabaseUp();
    do
    {
        try
        {
            // Give the scaling operation some time before checking again.
            await Task.Delay(10000);
            serviceTier = await GetDatabaseTier();
        }
        catch (Exception e)
        {
            log.Error("Error while waiting for scale up to finish, retrying in 10 sec: " + e.ToString());
        }

        if (retry++ == 100)
        {
            log.Error("Retried 100 times, won't wait any more");
            return;
        }
    } while (serviceTier.ServiceObjective != "PRS1");
}
[/csharp]
With this code we know when the scaling has completed before we move on.
Once the heavy work is done, the database can be scaled back down.
[csharp]
// Scale the database back down to S1 once the heavy work is done.
// Replace <DATABASENAME> with the name of your database.
private static async Task ScaleDatabaseDown()
{
    var constring = Environment.GetEnvironmentVariable("SQL_DB");
    var query = @"
        ALTER DATABASE <DATABASENAME>
        MODIFY (SERVICE_OBJECTIVE = 'S1');";

    using (SqlConnection conn = new SqlConnection(constring))
    using (SqlCommand cmd = new SqlCommand(query, conn))
    {
        await conn.OpenAsync();
        await cmd.ExecuteNonQueryAsync();
    }
}
[/csharp]
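Putting the pieces together, the nightly timer-triggered function ends up roughly like the outline below. ImportLogsFromStorage() and MaterializeViews() are placeholders for the import and materialization steps described earlier; they are not shown in full in this post.
[csharp]
// Rough outline of the nightly timer-triggered function (csx-style signature).
// ImportLogsFromStorage() and MaterializeViews() are placeholders for the steps
// described earlier in the post.
public static async Task Run(TimerInfo timer, TraceWriter log)
{
    // Scale up and wait for the new service objective to become active.
    var serviceTier = await GetDatabaseTier();
    if (serviceTier.ServiceObjective != "PRS1")
    {
        await ScaleDatabaseUp();
        // ... poll with GetDatabaseTier() in a loop, as shown above ...
    }

    try
    {
        log.Info("Importing log files and refreshing reporting tables");
        await ImportLogsFromStorage(); // read the JSON logs from Azure Storage into the log table
        await MaterializeViews();      // select from the views into the reporting tables
    }
    finally
    {
        // Always scale back down, even if the heavy work fails.
        await ScaleDatabaseDown();
    }
}
[/csharp]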
In my setup I do the heavy work at night, and my tasks take around 30 minutes on the Premium RS 1 SKU. (If you haven’t started using Premium RS, consider it: you get really good performance for a third of the price of Premium, if you can live with the weaker SLA.)
My savings compared to running Premium RS 1 24×7 are quite significant, as you can imagine; how much you can save of course depends on your workloads. But one easy cost saving is to set this up for all your line-of-business applications that have no traffic outside of working hours.