End to End MLOps on Azure

A complete MLOps pipeline with Terraform infrastructure as code, Python web scraping, and Azure ML for dataset and experiment management, training a FastAI language model.

Paul Bruffett
Jul 4, 2021 · 8 min read

After previously writing about my experience integrating GitHub with Azure ML workspaces, I set out to build a more comprehensive MLOps pipeline and solution. The only prerequisites are a GitHub repo and a Terraform Cloud account. The repository we’ll be working from is here.

Setting up the Pipeline

Since I want to develop an MLOps environment that is fully automated with infrastructure as code, I’ll be using Terraform and need a (free) Terraform Cloud account. A good overview of setting up Terraform Cloud with GitHub Actions can be found here (though it is AWS-centric, so some of it is not relevant).

Once you have a Terraform Cloud account, generate a token by going to the Tokens page, clicking on “Create an API token” and naming it something like “GitHub actions”. This token value should be added to the GitHub repo you want to automate under “Settings” then “Secrets”. Name the secret “TF_API_TOKEN”.

Next, create a workspace in Terraform Cloud (mine is named “tycho_model”), making sure you select “API-driven Workflow” and NOT “Version control workflow”. The API-driven workflow ensures that GitHub, rather than Terraform Cloud, drives changes to the infrastructure and its state.

Finally, in the Terraform Cloud workspace, we need to set our Azure secrets under “Variables” so that Terraform can inventory our subscription and perform actions. We also need to generate a role and credentials for Terraform to use when interacting with Azure; to do this, log in to the Azure Portal and use the Cloud Shell. A good guide for this can be found here.

First, log in to Azure, open the Cloud Shell, and run the az ad sp create-for-rbac command, scoped to your subscription;

This creates a new service principal. You’ll need to replace <subscription_id> with your own subscription ID. The command returns a name, password, and tenant; register these as “client_id”, “client_secret”, and “tenant_id”, respectively, along with “subscription_id”, in the Terraform Cloud variables for the workspace you created. The JSON output from the command should also be captured in its entirety in GitHub: register it in your repo’s secrets and name it “AZURE_CREDENTIALS”.
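For orientation, here is roughly how those same values map onto programmatic access to the workspace via the AzureML Python SDK. This is purely illustrative (the pipeline itself hands the secrets to Terraform Cloud and GitHub Actions); the placeholder names are assumptions.

```python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Illustrative only: the service-principal fields map onto SDK authentication.
auth = ServicePrincipalAuthentication(
    tenant_id="<tenant_id>",
    service_principal_id="<client_id>",             # the name/appId returned above
    service_principal_password="<client_secret>",   # the password returned above
)
ws = Workspace.get(
    name="<workspace_name>",
    subscription_id="<subscription_id>",
    resource_group="<resource_group>",
    auth=auth,
)
```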

Infrastructure as Code

Now we’re ready to check in or fork the repo. This includes the definition for our GitHub Action and the Terraform file defining our infrastructure. Let’s start with the main.tf file;

Here we’ve defined a few key things. The first is “organization”, which must match your Terraform Cloud organization name, followed by “workspaces”, which must likewise match the name of your workspace. We also reference the secrets we registered in Terraform Cloud so Terraform can access our Azure subscription.

Next we define the infrastructure in several blocks. I won’t cover all of these, but the notable one is our Azure Machine Learning workspace;

One piece of infrastructure not managed by Terraform is our AML compute cluster, which is defined along with the Azure ML config in .cloud/.azure. This is not ideal, and I would have preferred to define it in the Terraform config, but AML appears to touch the tags on a cluster whenever an experiment runs. Even if the tags are defined in Terraform, Terraform will detect a change and redeploy the cluster, and the time the redeploy takes leaves the new cluster in an error state (possibly because it puts my subscription over its Azure core quota).

GitHub Workflow

Now we need to invoke the Terraform file using the GitHub Action when we commit our code. This is captured under .github/workflows in the Terraform.yml config file.

This is a fairly boilerplate Terraform setup with the exception that I commented out the formatting check. This was failing on some valid configurations, such as when trying to include the AML Cluster Config. Formatting also didn’t seem to add much value given it’ll fail the Validation check later if there’s a real problem.

We then Plan and Apply the configuration with Terraform.

AML Training

Now that the environment is set up, we create a cluster (if it doesn’t already exist). As noted above, the cluster is defined in the .cloud/.azure/compute.json configuration;

Here I’m using an NV12 machine, which is significantly cheaper than the NC6v3. The cluster scales automatically, and nodes are retired after 5 minutes of inactivity.
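The compute.json config drives the cluster creation through the GitHub Action, but an equivalent cluster could be provisioned with the AzureML SDK. A minimal sketch, with illustrative cluster name and node counts:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()  # assumes a local workspace config file

# NV12 VM size, scale to zero, retire idle nodes after 5 minutes (300 seconds)
config = AmlCompute.provisioning_configuration(
    vm_size="Standard_NV12",
    min_nodes=0,
    max_nodes=1,
    idle_seconds_before_scaledown=300,
)

cluster = ComputeTarget.create(ws, name="gpu-cluster", provisioning_configuration=config)
cluster.wait_for_completion(show_output=True)
```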

Back to GitHub Actions

Our training execution steps are defined after the Terraform environment setup in the Terraform.yml file that drives our GitHub Workflow;

Here we’re connecting to the workspace, initializing the compute cluster and submitting our job. The first two of these look for JSON config files in /.cloud/.azure (workspace.json and compute.json, respectively). The last, our training run submission, looks for a config file in /code/train/run_config.yml.
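The GitHub Actions handle these steps declaratively from the JSON configs, but in SDK terms they amount to roughly the following; the experiment, cluster, and path names here are illustrative:

```python
from azureml.core import Experiment, ScriptRunConfig, Workspace
from azureml.core.compute import ComputeTarget

# Connect to the workspace and the compute cluster created earlier
ws = Workspace.from_config()
cluster = ComputeTarget(workspace=ws, name="gpu-cluster")

# Submit the training script to run on the cluster
src = ScriptRunConfig(source_directory="code/train",
                      script="train.py",
                      compute_target=cluster)
run = Experiment(ws, name="tycho-language-model").submit(src)
run.wait_for_completion(show_output=True)
```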

The run_config.yml file defines our training script and various environment configurations. Training executes in a Docker container using a Microsoft-managed base image that already includes the CUDA and cuDNN drivers;

We also reference the environment.yml dependency file in our run_config.yml. I’m using the FastAI library for training, so the environment.yml sets up PyTorch and FastAI.
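As a sketch of how that environment could be expressed with the SDK (the environment name and base image tag below are assumptions; the repo’s run_config.yml points at whichever Microsoft-managed CUDA image it specifies):

```python
from azureml.core import Environment

# Build the training environment from the conda spec referenced by run_config.yml
env = Environment.from_conda_specification(
    name="fastai-gpu",
    file_path="code/train/environment.yml",
)
# Microsoft-managed base image with CUDA/cuDNN preinstalled (illustrative tag)
env.docker.base_image = "mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.2-cudnn8-ubuntu18.04"
```

In the actual pipeline this is wired up through run_config.yml rather than built in code.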

Scraping & Registering Data

All of this, and only now are we getting to the model! I’m training a language model on Penny-Arcade posts, which go back over a decade and are predominantly authored by Jerry Holkins. The scraping is done by our ScrapePA.py file, which uses BeautifulSoup. News posts appear every Monday, Wednesday, and Friday, giving consistent URLs that can be parsed. I collect 3,500 news posts and store them for training.
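ScrapePA.py does the real work in the repo; a minimal sketch of the idea, with a hypothetical selector and URL, looks like this:

```python
import requests
from bs4 import BeautifulSoup

def scrape_post(url: str) -> str:
    """Fetch a single news post and return its text content."""
    resp = requests.get(url)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # The selector is illustrative; the real script targets the news-post
    # body on penny-arcade.com.
    body = soup.find("div", class_="post-body")
    return body.get_text(separator=" ", strip=True) if body else ""

text = scrape_post("https://www.penny-arcade.com/news/post/2021/07/02")  # hypothetical URL
```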

Once the data is captured it’s tagged with the date and registered as a Dataset in the Azure ML Workspace. This decouples our data collection and training activities.

This portion of the script checks whether our registered Dataset exists and, if so, whether it is older than 6 days;
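A sketch of what that check, and the subsequent registration, can look like with the Dataset API (the dataset name, tag format, and paths are assumptions):

```python
from datetime import datetime, timedelta
from azureml.core import Dataset, Datastore, Workspace

def needs_refresh(ws: Workspace, name: str = "tycho-posts", max_age_days: int = 6) -> bool:
    """True if the dataset isn't registered yet or its created_on tag is stale."""
    try:
        ds = Dataset.get_by_name(ws, name=name)
    except Exception:
        return True  # not registered yet
    created = datetime.strptime(ds.tags["created_on"], "%Y-%m-%d")
    return datetime.utcnow() - created > timedelta(days=max_age_days)

def register_posts(ws: Workspace, local_file: str, name: str = "tycho-posts") -> None:
    """Upload the scraped text and register it as a new, tagged dataset version."""
    datastore = Datastore.get_default(ws)
    datastore.upload_files([local_file], target_path="tycho", overwrite=True)
    ds = Dataset.File.from_files((datastore, "tycho/*"))
    ds.register(ws, name=name,
                tags={"created_on": datetime.utcnow().strftime("%Y-%m-%d")},
                create_new_version=True)
```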

The scraper doesn’t do an amazing job of removing junk in each post but seems serviceable.

Registered Dataset in Azure

This dataset is tagged with the created_on date and is versioned each time it’s updated.

Training a Model

Now that we’ve got infrastructure as code and a web scraper that registers a Dataset, we’re ready to talk about the model. I’m using a FastAI language model;

Initially, I had some problems getting PyTorch to use the container’s GPU, so I start by checking that there’s an available CUDA device for Torch. Next, I pre-train the model on the IMDB dataset. There are lots of other good datasets for this work, as well as pretrained models ready for use.

We’re fitting the model with metrics for accuracy and perplexity.
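A minimal sketch of this pre-training stage with FastAI (the learning rate and batch size are illustrative; the single epoch matches the run discussed later):

```python
import torch
from fastai.text.all import (URLs, untar_data, TextDataLoaders,
                             language_model_learner, AWD_LSTM,
                             accuracy, Perplexity)

# Fail fast if the container can't see the GPU
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"

# Pre-train a language model on IMDB
path = untar_data(URLs.IMDB)
dls = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1, bs=64)
learn = language_model_learner(dls, AWD_LSTM, metrics=[accuracy, Perplexity()]).to_fp16()
learn.fit_one_cycle(1, 2e-2)
```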

With this base model trained, we load the scraped web text, do a little preprocessing to strip out some characters, and truncate the beginning of each post (they always start with preamble junk). This is turned into a data block, and we replace the model’s data loaders, unfreeze all layers, and fit with a lower learning rate to (hopefully) fine-tune the model.
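Continuing the sketch above, the fine-tuning step has roughly this shape (the file name, column name, truncation offset, and hyperparameters are assumptions; the pre-trained vocab is reused so the swapped-in data loaders stay compatible):

```python
import pandas as pd
from fastai.text.all import DataBlock, TextBlock, ColReader, RandomSplitter

# Assumes the scraped posts were materialized as a DataFrame with a "text" column
df = pd.read_csv("tycho_posts.csv")
df["text"] = df["text"].str.slice(200)  # crude truncation of the preamble junk (offset illustrative)

block = DataBlock(blocks=TextBlock.from_df("text", is_lm=True, vocab=learn.dls.vocab),
                  get_x=ColReader("text"),
                  splitter=RandomSplitter(valid_pct=0.1))
dls_tycho = block.dataloaders(df, bs=64)

# Swap the dataloaders into the pre-trained learner, unfreeze, and fine-tune
learn.dls = dls_tycho
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
```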

Logging

OK, so let’s talk about logging. FastAI doesn’t natively support logging out to Azure ML workspaces or to the (compatible) MLflow. The existing MLflow logger targets an old version of FastAI and no longer works, which meant I had to write my own class to instrument FastAI for logging. You can see that in the train.py script;

This logger currently submits the loss back to Azure every few percent of training. It will log at most every 1%, but the check only happens when a batch completes, so the actual interval varies with the size of the dataset. The custom logger requires a metric name to be supplied, which allows different sets of epochs (in this case the IMDB and Tycho corpora) to be disambiguated in the logged metrics.
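This isn’t the exact implementation from train.py, but a FastAI callback along these lines captures the idea (the class and parameter names here are mine):

```python
from azureml.core import Run
from fastai.callback.core import Callback

class AzureRunLogger(Callback):
    "Sketch of a callback that logs training loss back to the Azure ML run."
    def __init__(self, metric_name: str, log_every: float = 0.01):
        self.metric_name = metric_name
        self.log_every = log_every          # log at most every 1% of training
        self.aml_run = Run.get_context()    # the Azure ML run this script executes in
        self.last_logged = 0.0

    def after_batch(self):
        # pct_train is the fraction of total training completed so far; the check
        # only runs when a batch finishes, so the real interval varies with data size
        if self.pct_train - self.last_logged >= self.log_every:
            self.aml_run.log(self.metric_name, float(self.learn.loss))
            self.last_logged = self.pct_train
```

It would be attached per fit, e.g. learn.fit_one_cycle(1, 2e-2, cbs=AzureRunLogger("imdb_loss")), so the IMDB and Tycho epochs land under different metric names.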

The logger also doesn’t perform any after-epoch validation; instead, the code after each fit evaluates the model and logs various metrics;

Here we evaluate the model and capture its metrics (in this case it returns loss, accuracy and perplexity but this code will accommodate more or fewer without breaking). We log the metrics back to Azure, then perform some inferencing to get text output from the model.
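Continuing the sketch, that post-fit block can look something like this; the metric prefix and prompt mirror the examples shown below:

```python
from azureml.core import Run

# learn.validate() returns the validation loss followed by each metric value
results = learn.validate()
names = ["loss"] + [m.name for m in learn.metrics]

run = Run.get_context()
for name, value in zip(names, results):   # zip tolerates more or fewer metrics
    run.log(f"imdb_{name}", float(value))

# Quick qualitative check: generate text from the language model
print(learn.predict("I liked this movie because", n_words=300))
```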

metric capture

Final Thoughts

As you’ll note from the current output (with 1 epoch of training on IMDB and 10 epochs of fine-tuning on the Tycho corpus), the model isn’t terribly good;

IMDB text when prompted for 300 words with “I liked this movie because”;

i liked this movie because it was a very interesting movie . Let ‘s see it , and go somewhere with the movie . Sun , Watching it . Also , what ‘s do it like ? I ‘m a

i liked this movie because it was so good and the acting was great . i was thinking , did i think it was I ‘m glad i did n’t watch I ‘ve seen such a decent picture . i thought I

Tycho fine-tuned text when prompted with “I have been playing a lot of”;

i have been playing a lot of it , a little weird to see , but i have with a couple of hours things that could be gone so long . ▁ i do n’t know what the bizarre I ’m playing on there now are regarding Jack Tretton , or I ’ve never been able to make it even more . ▁ Playing it is not a game where the maps is , it ’s a lot to say . ▁ It ’s a game you ’ve probably heard about — I ’ll talk about it later and it ’s not the same as much King of Cards in Modern Warfare 2 , as it is a game i played . ▁ I ’ve got a few devices i saw trying to get through a bunch of different things , and the Gears goes into this Friday , and i ca n’t find it enough to be the kind of thing i played it . ▁ The most recent Titanfall series was a origin , but i could n’t really do anything about it . ▁ It ’s not the Online Thing It ’s Edition , which i think is a kind of existence with it . ▁ But i do n’t want to feel bad , and it ’s important to learn that is the natural colors of gaming . ▁ I ’m not trying to be a decent version of the Routine universe , but it is n’t even my favorite thing . ▁ i do n’t know what i actually liked about the game , that is to say , I ’ll still consider the entire browser .

To do;

  • Test various hyperparameters, possibly refactor so parameter tuning can be accomplished with a library
  • Test pretrained language models as bases for fine tuning
  • Deploy algorithm for RESTful inferencing with submitted prompts


Paul Bruffett

Enterprise Architect specializing in data and analytics.