How to Screen Scrape Using R

STEP 1: BACKGROUND

Screen scraping is a technique for gathering data from web pages, typically for further analysis or aggregation.

In this tutorial, we are going to screen scrape Google's “Best of 2017” app lists. Screen scraping lets us automate copying data from websites. For data wranglers, a number of libraries and packages have been developed to make screen scraping relatively straightforward. In Python, the package Beautiful Soup has a large following. In R, the package rvest has been getting a lot of traction.

We will use the rvest package to scrape data from the Google Best Apps of 2017 website and store it in a data frame. We will then use a few R packages to analyze the dataset further.
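
To give a feel for the scrape-then-store workflow before we touch the real site, here is a minimal sketch using rvest on a small inline HTML fragment. The markup, the CSS classes (.app, .title, .category), and the app names are made up for illustration; the actual Google page is structured differently, so treat this as an outline of the idea rather than the code used later in the tutorial.

```r
library(rvest)

# Hypothetical HTML standing in for a downloaded page.
# The class names and app names are assumptions for this example only.
page <- read_html('
  <div class="app"><span class="title">App One</span><span class="category">Games</span></div>
  <div class="app"><span class="title">App Two</span><span class="category">Tools</span></div>
')

# Select nodes with CSS selectors, then pull out their text
titles     <- html_text(html_nodes(page, ".app .title"))
categories <- html_text(html_nodes(page, ".app .category"))

# Store the scraped values in a data frame for later analysis
apps <- data.frame(
  title = titles,
  category = categories,
  stringsAsFactors = FALSE
)

print(apps)
```

When scraping a live page, the inline string would be replaced by the page's URL passed to read_html(), and the selectors would be chosen by inspecting that page's HTML.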

Note: The full R code can be downloaded here.