IoT Azure Pipeline — Part 2

Continue building a pipeline to capture data and publish it for visualization.

Paul Bruffett
4 min read · Mar 16, 2021

Part 1

So now we have the environment set up and need to publish to the Azure Function. This can be done either from Visual Studio Code with the Azure Functions plug-in, or as a GitHub Action. The GitHub Action publishes the update to Azure every time we commit; that's what I've set up, and you can see it in the .github/workflows/main.yml portion of the repo.

To configure this you'll need to update "AZURE_FUNCTIONAPP_NAME" in main.yml and follow the steps from this guide, specifically "download your publish profile" and "add the GitHub secret". With that done and your main.yml updated, any changes you commit to your function app will automatically be pushed to the running version in Azure.

To check that your app is running you can use the Azure Portal: navigate to your function using the "Functions" option, then click its name.

That done, you'll see an option to "Monitor". This is where you can view logs from your function and check that it's running successfully.

Now that our function is writing output to Cosmos DB, we can query the records.

We'll use a Jupyter Notebook to run some queries against the data. First we'll need to grab the credentials for our Cosmos DB: the URL and either the Primary Key or Secondary Key. Add these to a JSON file called "cosmoskeys.json" in the same folder as the Jupyter Notebook. The other key you'll notice is required is "datapane"; this is the free service I'll be publishing the final report to. You can sign up here.
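A minimal sketch of loading those credentials and opening a client with the azure-cosmos SDK; the key names inside cosmoskeys.json are my own choice here:

```python
import json
from azure.cosmos import CosmosClient

# Load the credentials file described above; match the key names
# to whatever you put in cosmoskeys.json (these are assumptions).
with open("cosmoskeys.json") as f:
    keys = json.load(f)

client = CosmosClient(keys["url"], credential=keys["primary_key"])
```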

Our query selects all the records from the 'temp' container we created and began writing temperature readings to. We can also see how many Request Units we consumed; with Cosmos we provisioned 400 Request Units per second. Full scans like this are relatively cheap: my query returned 54,000 items and consumed 4.07 Request Units.
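A sketch of what that scan might look like, assuming a database named "iot"; the request charge of the last call is exposed by the Python SDK as a response header:

```python
# Database and container names are assumptions based on the article.
container = client.get_database_client("iot").get_container_client("temp")

items = list(container.query_items(
    query="SELECT * FROM c",
    enable_cross_partition_query=True,
))

# The RU charge of the most recent response, reported by Cosmos.
charge = container.client_connection.last_response_headers["x-ms-request-charge"]
print(f"{len(items)} items, {charge} RUs")
```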

Building a report

So what I want to produce is a view showing temperature by hour for the last few days and a heatmap of light by hour. Something like this:

Ok, so let’s prepare the raw data.

Now we need to create a DataFrame, filter out nulls, and then group by hour for charting.
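A sketch of that preparation with pandas; the column names are assumptions about the document schema:

```python
import pandas as pd

# items is the list returned by the Cosmos query above.
df = pd.DataFrame(items)
df = df.dropna(subset=["temperature"])          # filter out null readings
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Average the readings within each clock hour for charting.
hourly = df.set_index("timestamp")["temperature"].resample("h").mean()
```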

(Chart: gaps in readings make this ugly.)

A basic chart of temperature over time, averaged by hour. You can follow along with the previous steps using the Jupyter Notebook.
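A quick matplotlib plot of those hourly means is enough to reproduce a chart like the one above (just a sketch, not the notebook's exact code):

```python
import matplotlib.pyplot as plt

# Eyeball the hourly means computed above; gaps show as breaks.
hourly.plot(figsize=(10, 4))
plt.ylabel("Temperature")
plt.show()
```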

Aggregating and Improving

So let's build the real solution: a dashboard using Bokeh, which will look a bit better, and publish it to Datapane.

First, we need to build an aggregate collection in Cosmos. Our current solution will eventually stop scaling effectively; if we run it long enough, pushing aggregate queries to Cosmos will get more and more expensive. One solution is a supplemental timer-triggered Function that updates and publishes to a collection of readings already averaged by hour, and adds a flag to the source records to say they've been aggregated so we can ignore them.

Let's loop through all of our readings and create a dict of sums by hour, which we can then use to compute averages:
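Something along these lines, with field names assumed from the earlier schema:

```python
from collections import defaultdict

# One bucket of running totals per hour.
sums = defaultdict(lambda: {"temp": 0.0, "light": 0.0, "count": 0})

for item in items:
    hour = item["timestamp"][:13]        # e.g. "2021-03-14T09"
    sums[hour]["temp"] += item["temperature"]
    sums[hour]["light"] += item["light"]
    sums[hour]["count"] += 1

# The average for any hour is then sums[hour]["temp"] / sums[hour]["count"].
```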

Next, we'll update the existing records by adding a 'processed' flag set to 'true', so next time we can ignore them. We'll also write the aggregated records to a new collection in Cosmos called "dashboard".
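A sketch of both writes; the "dashboard" container name comes from the article, while the per-hour document shape is my own:

```python
# Assumes client, container, items, and sums from the cells above.
dashboard = client.get_database_client("iot").get_container_client("dashboard")

# Mark the raw readings so the next run can skip them.
for item in items:
    item["processed"] = "true"
    container.upsert_item(item)

# Persist one pre-aggregated document per hour.
for hour, totals in sums.items():
    dashboard.upsert_item({
        "id": hour,
        "temp": totals["temp"],
        "light": totals["light"],
        "count": totals["count"],
    })
```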

Examples of each of these can be seen in the Python notebook. Now that we’ve populated our Dashboard collection we can query it and build a more sophisticated report.

Automating Aggregates

If you want to automate the preparation of aggregates, another Azure Function can be used to query all readings that have not been processed and add them to the dashboard collection. An example can be seen here.
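A minimal sketch of such a timer-triggered function; the app-setting names and the NOT IS_DEFINED filter are my assumptions, not the linked example verbatim:

```python
import os

import azure.functions as func
from azure.cosmos import CosmosClient

def main(mytimer: func.TimerRequest) -> None:
    # Credentials come from the Function App's settings.
    client = CosmosClient(os.environ["COSMOS_URL"],
                          credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("iot").get_container_client("temp")

    # Only readings that have not been aggregated yet.
    unprocessed = list(container.query_items(
        query="SELECT * FROM c WHERE NOT IS_DEFINED(c.processed)",
        enable_cross_partition_query=True,
    ))
    # ...then aggregate by hour, upsert into "dashboard", and flag the
    # source records as processed, exactly as in the notebook cells above.
```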

Building the Dashboard

Now that we have the aggregates, let’s query them and build a dashboard in Bokeh.

This notebook queries the aggregates and publishes the report.

I'm building the dashboard to use only the last few days (3 in this case) of granular data for the temperature readings; this gives a better chart with a bit more nuance.
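A sketch of that windowed query, assuming ISO-8601 timestamp strings on the documents:

```python
from datetime import datetime, timedelta

# Keep only the last three days of granular readings.
cutoff = (datetime.utcnow() - timedelta(days=3)).isoformat()

recent = list(container.query_items(
    query="SELECT * FROM c WHERE c.timestamp > @cutoff",
    parameters=[{"name": "@cutoff", "value": cutoff}],
    enable_cross_partition_query=True,
))
```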

We’ll put the aggregates in a DataFrame, calculate the averages using the sum divided by the record count, and parse out the hours from the dates.
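That step might look like this, continuing with the aggregate document shape assumed above:

```python
import pandas as pd

# Pull every pre-aggregated document from the dashboard container.
agg = pd.DataFrame(list(dashboard.read_all_items()))

# Sum divided by record count gives the hourly averages.
agg["avg_temp"] = agg["temp"] / agg["count"]
agg["avg_light"] = agg["light"] / agg["count"]

# "id" holds the hour bucket, e.g. "2021-03-14T09".
agg["date"] = agg["id"].str[:10]
agg["hour"] = agg["id"].str[11:13].astype(int)
```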

Our chart will involve two figures: the heatmap of illuminance and the graph of temperature. Let's start with the graph of temperature.

Bokeh uses a ColumnDataSource for loading data, so we'll populate plot_source with the data we'd like to use.
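A sketch of that setup; plot_source is the article's name, while the column names and figure options are assumptions:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

recent_df = pd.DataFrame(recent)
recent_df["timestamp"] = pd.to_datetime(recent_df["timestamp"])

# plot_source feeds both lines on the temperature figure.
plot_source = ColumnDataSource(data={
    "time": recent_df["timestamp"],
    "temp": recent_df["temperature"],
    "light": recent_df["light"],
})

fig = figure(x_axis_type="datetime", title="Temperature and illuminance")
```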

We need a second y-axis so the two series can show their own units (illuminance on the left, temperature on the right); the added axis is dynamically scaled to sit just a bit above our maximum reading.
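In Bokeh that means an extra_y_ranges entry plus a LinearAxis laid out on the right; a sketch:

```python
from bokeh.models import LinearAxis, Range1d

# The default (left) axis carries illuminance; add a right-hand axis
# for temperature, scaled a bit past the maximum reading.
max_temp = recent_df["temperature"].max()
fig.extra_y_ranges = {"temp_range": Range1d(start=0, end=max_temp * 1.1)}
fig.add_layout(LinearAxis(y_range_name="temp_range",
                          axis_label="Temperature"), "right")
```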

Finally we plot the lines and add this figure to our visualization. Next we need to add the boxes for our illuminance over the longer time horizon:
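A sketch of both pieces: the two lines on the figure above, then a rect-based heatmap over the hourly aggregates:

```python
from bokeh.models import ColumnDataSource, LinearColorMapper
from bokeh.plotting import figure

# Illuminance uses the default axis; temperature uses the right-hand range.
fig.line(x="time", y="light", source=plot_source, legend_label="Illuminance")
fig.line(x="time", y="temp", source=plot_source, y_range_name="temp_range",
         legend_label="Temperature", color="red")

# One colored box per (date, hour) cell of average illuminance.
mapper = LinearColorMapper(palette="Viridis256",
                           low=agg["avg_light"].min(),
                           high=agg["avg_light"].max())
heat = figure(x_range=sorted(agg["date"].unique()),
              title="Illuminance by hour")
heat.rect(x="date", y="hour", width=1, height=1,
          source=ColumnDataSource(agg),
          fill_color={"field": "avg_light", "transform": mapper},
          line_color=None)
```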

Next we bring them together and get our final output:
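Stacking the figures and publishing might look like this; the report name is made up, and the publish call reflects Datapane's API at the time of writing:

```python
import datapane as dp
from bokeh.layouts import column

# Stack the temperature figure above the illuminance heatmap and publish.
report = dp.Report(dp.Plot(column(fig, heat)))
report.publish(name="iot-dashboard", open=True)
```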

