Lab 06 – Interactive visualizations

Instructions

Obtain the GitHub repository you will use to complete the lab, which contains a starter file named lab06.Rmd. This lab shows you how to create interactive visualizations using the highcharter and leaflet packages. Carefully read the lab instructions and complete the exercises using the provided spaces within your starter file lab06.Rmd. Then, when you’re ready to submit, follow the directions in the How to submit section below.

Note on PDF submissions

Because of how they work, the interactive visualizations will not show up when you knit your R Markdown files to the PDF format. This is okay. You should still submit the PDF file for this assignment to Blackboard.

What are interactive visualizations?

This lab introduces you to interactive visualizations, which are a class of dynamic visualizations that satisify two criteria [1],

Human input: control of some aspect of the visual representation of information, or of the information being represented, must be available to a human

Response time: changes made by the human must be incorporated into the visualization in a timely manner

Compared to static visualizations, which are the type of visualization that we create using ggplot2, interactive visualizations allow us to include additional information in the plots that make up our R Markdown reports. As Andy Kirk explains in Data Visualisation: A Handbook for Data Driven Design, interactive visualizations, when used in the right circumstances, offer many advantages [2],

It expands the physical limits of what you can show in a given space.

It increases the quanity and broadens the variety of angles of analysis to serve different curiosities.

It facilitates manipulations of the data displayed to handle varied interrogations.

It increases the overall control and potential customisation of the experience.

It amplifies your creative license and the scope for exploring different techniques for engaging users.

Dashboards are a common way to implement an interactive visualization and also allow users to select the parts of a dataset they want to include in a plot. An example of a dashboard-based interactive visualization can be seen here: https://ajayk.shinyapps.io/csi_773/. While we won’t be building a full dashboard, we will utilize two R packages, highcharter and leaflet to add interactivity to our R Markdown documents and enhance the way we explore a new dataset.

About the dataset

Unique dataset

This dataset was scraped by Dr. Glasbrenner in January 2019 and it is original to CDS 102!

You will be working with a dataset consisting of rental property information scraped on January 21, 2019 from https://www.carolinadesigns.com, which is a website people can use to book vacation rentals in North Carolina’s Outer Banks. The information collected for each property includes rental rates, its location, its features, and if any special amenities accompany the property. The website only provides rental rates for properties and dates that haven’t been booked yet, so missing values under the rate_[month] columns were imputed using a predictive model trained on the available data.

The table below provides descriptions of the dataset’s 63 variables,

Variable Description
property_number Numeric identifier for rental property
property_name Name of rental property
rate_[month] Median weekly rental rate for a given month. [month] is the first three letters of a calandar month: jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec
latitude Latitude for a given property
longitude Longitude for a given property
region Region where property is located. The regions are Corolla, Duck, Kill Devil Hills, Kitty Hawk, Nags Head, and Southern Shores
waterfront Classifies the proximity of a property to the water. The classifications are oceanfront, semi-oceanfront, soundfront, and inland
check_in_day Indicates if the property’s check-in day is Friday, Saturday, or Sunday
beach_distance Distance the property is from the beach in yards
baths_full The number of full bathrooms a property has
baths_half The number of half bathrooms a property has
bed_[type] The number of beds of a certain type a property has. The values for [type] are california_kings, daybeds, double_bunks, double_trundles, doubles, futons, kings, pyramid_bunks, queen_over_queens, queen_sleep_sofas, queens, rollaways, sleepsofas, trundles, twin_bunks, twin_sleep_sofas, and twins
number_dvd_players The number of DVD players a property has
number_highchairs The number of highchairs a property has
number_of_bedrooms The total number of bedrooms a property has
number_of_master_bedrooms The number of master bedrooms a property has
number_outdoor_showers The number of outdoor showers a property has
number_parking_spaces The number of parking spaces a property has
number_tvs The number of televisions a property has
elevator Indicates if the property has an elevator
home_theater Indicates if the property has a home theater room
media_room Indicates if the property has a media room
recreation_room Indicates if the property has a recreation room
seasonal_fireplace Indicates if the property has a fireplace
basketball_hoop Indicates if the property has a basketball hoop
volleyball_area Indicates if the property has an area to play volleyball
cabana Indicates if the property has a cabana
charcoal_grill Indicates if the property has a charcoal grill
gas_grill Indicates if the property has a gas grill
hot_tub Indicates if the property has a hot tub
special_events_welcome Indicates if the property allows guests to host a special events
two_dogs_welcome_with_fee Indicates if up to two dogs are allowed on a property if guests pay a special fee
community_pool_access Indicates if renting a property grants access to a community pool
community_tennis Indicates if renting a property grants access to community tennis courts
discounted_golf_fees Indicates if guests receive a discount at a local golf course
firm_4pm_check_in Indicates if a property will not allow guests to check-in earlier than 4pm

A data-driven vacation

Imagine that you are in charge of planning a vacation to the Outer Banks in the month of July and you want to put together a list of top rental options for your friends/family to review and vote on. You go to the rental property website and you find that there are over 300 rental properties available. Unfortunately the search tool does not let you easily compare and contrast your options and you have no desire to manually go through and read all the individual property pages. You realize that this is a perfect opportunity to put your data science skills to use, so you scrape the website and put together a tabular dataset with information on the rental properties that you can more easily explore. Since you need to both explore what’s available as well as summarize your findings for your friends/family, you decide to put together some interactive visualizations.

Interactive plots using highcharter

The highcharter package is used to create interactive versions of the same types of plots we’ve created using ggplot2. The syntax for creating a basic plot using highcharter is also similar to ggplot2 and is summarized below,

data %>%            # The dataset
  hchart(
    "...",          # Plot type: "scatter", "line", "column", etc.
    hcaes(
      x = ...,      # Variable on x-axis
      y = ...,      # Variable on y-axis
      group = ...   # Apply different colors for groups defined in variable
    )
  )

The ellipses are placeholders. Also note that, within the hchart() function, the hcaes() function plays a similar role to aes() in ggplot2.

  1. Let’s begin our exploration by creating a scatter plot that shows the rental rates for the month of July versus the number of bedrooms a property has,

    obx %>%
      hchart(
        "scatter",
        hcaes(
          x = number_of_bedrooms,
          y = rate_jul
        )
      )

    If you hover the mouse cursor over the points, you should see a pop-up that lists the x (number_of_bedrooms) and y (rate_jul) values for the given point. Describe the trend that you see in this plot, and then use the pop-ups to figure out how many bedrooms the most expensive rental property has and how many bedrooms the least expensive rental property has.

One of the things we would like to do in our data exploration is to narrow our search towards properties that will have a cheaper rental rate. We suspect that an important factor that will affect rental prices is the region where a property is located.

  1. Use the following code template to help you compute some summary statistics about the July rental rates,

    regions_summary <- obx %>%
      group_by(...) %>%
      summarize(
        rate_mean = round(mean(rate_jul)),
        rate_minimum = min(rate_jul),
        rate_maximum = max(rate_jul)
      )

    Fill in the ellipses so that you group over the different regions. Which region has the highest overall average rental rate? Which region has the lowest overall average rental rate?

Let’s use the information we just computed in Exercise 2 to make an interactive visualization that we could show to our friends/family. We will create a bar chart showing the average July rate for each region, which we will enhance by customizing what is displayed in the pop-up when we hover our mouse cursor over each bar. This will be done using the hc_tooltip() and tooltip_table() functions from highcharter.

  1. Fill in the ellipses in the following code template to create a bar chart the displays the average July rate for each region,

    regions_summary %>%
      hchart(
        "column",
        hcaes(
          x = ...,
          y = ...
        )
      ) %>%
      hc_tooltip(
        useHTML = TRUE,
        pointFormat = tooltip_table(
          x = combine("Region", "Mean rate"),
          y = combine("{point.region}", "{point.rate_mean}")
        )
      )

    Hover your mouse over the bars for each region. Does the mean rate you see in the pop-up match the mean rate you computed in Exercise 2?

The display information in the pop-up box is set using the vectors passed to the x (names) and y (values) keywords of the tooltip_table() function. Take special note of the vector passed to the y keyword, which contains text values with curly braces, “{point.region}” and “{point.rate_mean}”. This is a special syntax that says, for the data point you currently have your mouse hovering over, display the value for the variable region or rate_mean. Any variable in the regions_summary data frame, not just region and rate_mean, can be displayed in the pop-up box.

  1. The pop-up boxes for the visualization you created in Exercise 3 currently list two pieces of information, the name of the region and the mean rate. While nice to see, this simply repeats what we are already looking at in the bar chart. The real power of interactive visualizations is when we can include additional information. Copy the code you wrote for Exercise 3 and modify the vectors passed to the x and y keywords in tooltip_table() so that each pop-up contains two additional lines of information, the Minimum rate and Maximum rate for each region.

We now have a nice interactive visualization that summarizes the average, minimum, and maximum rental rates that we can show our friends/family to show which regions contain the cheapest and most expensive rental properties. Let’s filter the dataset so that we can focus our search on properties in the region with the cheapest rates on average.

  1. Use the filter() function to filter the dataset so that it only contains rental properties from the region with the cheapest July rates on average. Assign this filtered data frame to a variable called obx_region.

Let’s wrap up by updating the scatter plot we created in Exercise 1 by using the filtered dataset obx_region and customizing the pop-up message for each data point.

  1. Complete the following code template to add pop-ups to the visualization that lists the following information,

    • Property number

    • Property name

    • July rate

    • Number of bedrooms

    • Number of parking spaces

    obx_region %>%
      hchart(
        "scatter",
        hcaes(
          x = number_of_bedrooms,
          y = rate_jul,
          group = waterfront
        )
      ) %>%
      hc_tooltip(
        useHTML = TRUE,
        pointFormat = tooltip_table(
          x = combine(...),
          y = combine(...)
        )
      )

    Does the waterfront category (oceanfront, semi-oceanfront, soundfront, inland) affect the July rate? If so, which category contains the cheapest rates overall?

Before we move on, let’s filter the dataset one more time.

  1. After checking with your family/friends, you have determined that you will need a rental property with six bedrooms. Filter obx_region so that it only contains properties with six bedrooms. In addition, if you determined in Exercise 6 that there is a waterfront category that has cheaper overall rates, also filter the dataset to only include properties within this category. Assign the filtered dataset to a variable called obx_filtered.

Interactive maps using leaflet

The leaflet package lets us create interactive maps, which can be very useful when working with spatial data encoded as latitude and longitude values. The basic syntax for creating an interactive map using leaflet is summarized below,

dataset %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(
    lat = ~<latitude_variable>,
    lng = ~<longitude_variable>
  )

where <latitude_variable> and <longitude_variable> are placeholders.

Important!

The tilde ~ immediately before <latitude_variable> and <longitude_variable> are important and must be included. For example, you would write lat = ~lat_var if lat_var is the column containing the latitude values.

Our goal is to build an interactive map that our friends/family could browse that shows the properties that meet our filter criteria. We will adopt an iterative approach to building our map by adding features to it one step at a time.

  1. Use the leaflet syntax summarized above to create a basic interactive map showing the properties in obx_filtered.

    Hint

    The variables containing the location data for each property are named latitude and longitude.

One interesting thing we can do with the interactive map that we’re building is add points of interest. For example, the first ever Duck Donuts store (which, appropriately enough, sells doughnuts) opened in the Outer Banks, which could be one possible place to visit. We can display the location for the Duck Donuts store as a circle so that it looks different from the other markers. We can also add a pop-up to it that displays the street address for the store.

  1. Copy the code you wrote for Exercise 8 and add the Duck Donuts location marker with street address pop-up to it as follows,

    <code_from_previous_exercise> %>%
      addCircleMarkers(
        lat = 36.1633527,
        lng = -75.7534337,
        popup = "Duck Donuts<br>1190 Duck Rd<br>Duck, NC 27949",
        popupOptions = popupOptions(closeButton = FALSE)
      ) %>%
      addPopups(
        lat = 36.1633527,
        lng = -75.7534337,
        popup = "Duck Donuts<br>1190 Duck Rd<br>Duck, NC 27949",
        options = popupOptions(closeButton = FALSE)
      )

We can add the same kind of pop-up that we used for the Duck Donuts marker to our regular property markers, which will allow a user to click the marker icon and see additional information about the property. Without this, our friends/family will not be able to understand which property corresponds with each icon.

  1. Copy the code you wrote for Exercise 9 and add a new input to the addMarkers() function called popup,

    addMarkers(
      lat = ~<latitude_variable>,
      lng = ~<longitude_variable>,
      popup = ~paste0(
        property_name,
        "<br>",
        "July weekly rate: $",
        rate_jul,
        "<br>",
        ...
      )
    )

    The ellipses is a placeholder. The pop-up message for each property is created by the inputs to the paste0() function, which concatenates the different inputs together into a single piece of text. You can see the pop-ups by clicking the markers for the different properties.

    Replace the ellipses with more inputs to the paste0() function so that the pop-up displays the following additional information about each property,

    • Number of bedrooms

    • Number of full bathrooms

    • Number of parking spots

    • Distance to the beach

    • Latitude

    • Longitude

    Important!

    The “<br>” inputs are line breaks that will make text display on a new line. This is analogous to what happens in a text editor when you press the Enter key on your keyboard.

While there is always more that you could add to these interactive maps, this should be sufficient for your friends/family to browse and decide which properties they think are the most promising choices for your upcoming vacation.

How to submit

To submit your lab, follow the two steps below. Your lab will be graded for credit after you’ve completed both steps!

  1. Save, commit, and push your completed R Markdown file so that everything is synchronized to GitHub. If you do this right, then you will be able to view your completed file on the GitHub website.

  2. Knit your R Markdown document to the PDF format, export (download) the PDF file from RStudio Server, and then upload it to Lab 6 posting on Blackboard.

Cheatsheets

You are encouraged to review and keep the following cheatsheets handy while working on this lab:

Credits

This lab is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The original idea for the lab along with the first version of the lab instructions were written by Ajay Kulkarni for CDS-102. Revised instructions and new exercises written by James Glasbrenner.

References

[1] Wikipedia contributors, “Interactive Visualization,” (2018).

[2] A. Kirk, Data Visualisation: A Handbook for Data Driven Design, 1st ed. (Sage Publications, Los Angeles, 2016).