Create Custom R Models in Azure Machine Learning

Overview

This tutorial will walk users through building a Random Forest model in Azure Machine Learning and R. We will use the bike sharing dataset for this tutorial. This dataset includes the number of bikes rented for different weather conditions. From the dataset, we can build a model that will predict how many bikes will be rented during certain weather conditions. 

Building Custom R Models in Azure Machine Learning Studio

A Step-By-Step Tutorial

About the Data

Azure Machine Learning Studio has a couple dozen built-in machine learning algorithms. But what if you need an algorithm that is not there? What if you want to customize certain algorithms? Azure can use any R or Python based machine learning package and associated algorithms! It’s called the “create model” module. With it, you can leverage the entire open-sourced R and Python communities.

The Bike Sharing dataset is a great data set for exploring Azure ML’s new R-script and R-model modules. The R-script allows for easy feature engineering from date-times and the R-model module lets us take advantage of R’s randomForest library. The data can be obtained from Kaggle; this tutorial specifically uses their “train” dataset.

The Bike Sharing dataset has 10,886 observations, each one pertaining to a specific hour from the first 19 days of each month from 2011 to 2012. The dataset consists of 11 columns that record information about bike rentals: date-time, season, working day, weather, temp, “feels like” temp, humidity, wind speed, casual rentals, registered rentals, and total rentals.

Feature Engineering & Preprocessing

Complete Feature Engineering

There is an untapped wealth of prediction power hidden in the “datetime” column. However, it needs to be converted from its current form. Conveniently, Azure ML has a module for running R scripts, which can take advantage of R’s built-in functionality for extracting features from the date-time data.