Building Data Visualization Tools

How to Work with Maps


The content of this blog is based on examples/notes/experiments related to the material presented in the “Building Data Visualization Tools” module of the “Mastering Software Development in R” Specialization (Coursera) created by Johns Hopkins University [1].

Required Packages

  • ggplot2, a system for “declaratively” creating graphics, based on “The Grammar of Graphics.”
  • gridExtra, provides a number of user-level functions to work with “grid” graphics.
  • dplyr, a tool for working with data frame like objects, both in memory and out of memory.
  • viridis, the viridis color palette.
  • ggmap, a collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g Google Maps)
# If necessary to install a package run
# install.packages("packageName")

# Load packages
library(ggplot2)
library(gridExtra)
library(dplyr)
library(viridis)
library(ggmap)

Data

The ggplot2 package includes some datasets with geographic information. The ggplot2::map_data() function allows to get map data from the maps package (use ?map_data form more information).

Specifically the italy dataset [2] is used for some of the examples below. Please note that this dataset was prepared around 1989 so it is out of date especially information pertaining provinces (see ?maps::italy).

# Get the italy dataset from ggplot2
# Consider only the following provinces "Bergamo" , "Como", "Lecco", "Milano", "Varese"
# and arrange by group and order (ascending order)
italy_map <- ggplot2::map_data(map = "italy")
italy_map_subset <- italy_map %>%
  filter(region %in% c("Bergamo" , "Como", "Lecco", "Milano", "Varese")) %>%
  arrange(group, order)

Each observation in the dataframe defines a geographical point with some extra information:

  • long & lat, longitude and latitude of the geographical point
  • group, an identifier connected with the specific polygon points are part of
    • a map can be made of different polygons (e.g. one polygon for the main land and one for each islands, one polygon for each state, …)
  • order, the order of the point within the specific group
    • how the all of the points being part of the same group should be connected in order to create the polygon
  • region, the name of the province (Italy) or state (USA)
head(italy_map, 3)
##       long      lat group order        region subregion
## 1 11.83295 46.50011     1     1 Bolzano-Bozen      
## 2 11.81089 46.52784     1     2 Bolzano-Bozen      
## 3 11.73068 46.51890     1     3 Bolzano-Bozen      

How to work with maps

Having spatial information in the data gives the opportunity to map the data or, in other words, visualizing the information contained in the data in a geographical context. R has different possibilities to map data, from normal plots using longitude/latitude as x/y to more complex spatial data objects (e.g. shapefiles).

Mapping with ggplot2 package

The most basic way to create maps with your data is to use ggplot2, create a ggplot object and then, add a specific geom mapping longitude to x aesthetic and latitude to y aesthetic [4] [5]. This simple approach can be used to:

  • create maps of geographical areas (states, country, etc.)
  • map locations as points, lines, etc.

Create a map showing “Bergamo,” Como,” “Varese,” and “Milano” provinces in Italy using simple points…

When plotting simple points the geom_point function is used. In this case the polygon and order of the points is not important when plotting.

italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_point(aes(color = region))
data visualization tools

Author: Pier Lorenzo Paracchini

He is a generalist with a passion for people, data and technology. He has a Master of Science in Electronic Engineering from the Politecnico Di Milano and works as an enthusiast developer with a data scientist twist in the software innovation sector in Statoil. His journey in data science and machine learning started in 2014.

LinkedIn

Follow us on:

Create a map showing “Bergamo,” Como,” “Varese,” and “Milano” provinces in Italy using lines…

The geom_path function is used to create such plots. From the R documentation, geom_path “… connects the observation in the order in which they appear in the data.” When plotting using geom_path is important to consider the polygon and the order within the polygon for each point in the map.

The points in the dataset are grouped by region and ordered by order. If information about the region is not provided then the sequential order of the observations will be the order used to connect the points and, for this reason, “unexpected” lines will be drawn when moving from one region to the other. On the other hand if information about the region is provided using the group or color aesthetic, mapping to region, the “unexpected” lines are removed (see example below).

plot_1 <- italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_path() +
  ggtitle("No mapping with 'region', unexpected lines")

plot_2 <- italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_path(aes(group = region)) +
  ggtitle("With 'group' mapping")

plot_3 <- italy_map_subset %>%
  ggplot(aes(x = long, y = lat)) +
  geom_path(aes(color = region)) +
  ggtitle("With 'color' mapping")

grid.arrange(plot_1, plot_2, plot_3, ncol = 2, layout_matrix = rbind(c(1,1), c(2,3)))