R for Data Science I: Intro to R and Data Visualization

Date: Friday, October 27, 2017
Location: Fagin Hall 114
Available to: Open Enrollment
Program Schedule: Workshops run full days, from 9 a.m. to 4 p.m., in two sessions: 9 a.m. to noon and 1 p.m. to 4 p.m. (students are expected to attend both). Each session consists of about 1.5 hours of lecture and guided programming, followed by an hour of pair programming, in which students complete exercises in pairs using a new dataset, and a half hour of reconvening to discuss and summarize.
Prerequisites: None
Description: 

The purpose of this sequence of four workshops is to take students from no experience in R or data science to a level of proficiency at which they can (1) visualize, (2) manipulate, and (3) apply introductory statistical methods to a dataset. Each of the four full-day workshops has a morning and an afternoon session consisting of lecture, pair programming, and review.

Because no programming experience is expected, the courses will teach each programming concept and its R implementation simultaneously. I believe that larger programming concepts are best taught after students have learned an implementation, so sessions will alternate between hands-on R programming and slides that codify the larger concept, where appropriate.

The course structure is loosely based on R for Data Science by Grolemund and Wickham (freely available at http://r4ds.had.co.nz/index.html). The main difference between this course and typical R courses is that this one forgoes the usual programming-based introduction (data types, etc.) in favor of beginning with real datasets and real analysis. We will incorporate important base R functions and programming techniques along the way, but they are not the focus.

I am a firm believer that knowing how to figure out code on your own is the most important skill a programmer can have, so the workshops will emphasize problem solving, help files, and extensive use of online resources.

Workshop I: Intro to R and Data Visualization

The biggest hurdle in learning R is simply not being intimidated by its code-based nature. This workshop introduces students to the basic structure of RStudio and to code-based commands, in the context of the important and rewarding task of visualizing data.

Session Objectives: Students will be able to:

  • Load a dataset into the workspace and explore its values manually.
  • Create variables in a data frame.
  • Use ggplot to create scatter plots, line plots, histograms, and violin plots.
  • Use numeric and logical vectors.

Commands learned:

  • Base R: <-, ?, read.csv(), head(), [], $, 1:4, c(), logical operators (<, >, ==), table(), order(), dim(), length()
  • Basic math and logical operations: max(), min(), which(), which.max(), sum(), mean(), &, |, ==, <, >, !=, NA, is.na()
  • Visualizations: library(), ggplot(), aes(), geom_point(), geom_line(), geom_histogram(), facet_wrap(), ggtitle(), scale_x_log10(), scale_color_continuous(), scale_color_discrete(), geom_text(), geom_smooth(), ggsave()

Datasets Used:

  • Smoking and birth weight data from “The Costs of Low Birth Weight,” Quarterly Journal of Economics, August 2005, 120(3): 1031-1083.
  • American Community Survey (ACS) Philadelphia wage data

About the Instructor

Jonathan Tannen, Ph.D., is a Director at Econsult Solutions, Inc. (ESI). Jonathan’s dissertation research used GIS and large-scale computational techniques to develop a Bayesian method for measuring the movement of neighborhood boundaries. Broadly, his work showed that gentrification between 2000 and 2010 in Philadelphia and other dense cities occurred through emergent boundaries moving through space (gentrified regions expanded, and blocks switched dichotomously) rather than through gradual block-level demographic change.

Jonathan was born and raised in West Philadelphia, which is still his home. From 2007 to 2009, he taught at West Philadelphia High School through Teach for America, an experience that heavily informs his understanding of cycles of poverty and the nature of segregation in Philadelphia.

Jonathan received his Ph.D. in Public Policy in Urban and Population Policy from the Woodrow Wilson School at Princeton University in June 2016. Jonathan’s research interests include GIS and spatial statistics. His research with Douglas Massey on trends in Black hypersegregation was cited by the New York Times Editorial Board. Jonathan received his BA in Physics and Math cum laude from Harvard University in 2007, and an M.S.Ed. in Urban Education from the University of Pennsylvania in 2009.

Location

TBD

Instructor: Jonathan Tannen
Cost: free for current students, $100 for Fels alumni, $200 for all others

Contact Information

Fels Institute of Government
University of Pennsylvania
3814 Walnut Street
Philadelphia, PA 19104

Phone: (215) 898-2600
Fax: (215) 746-2829

felsinstitute@sas.upenn.edu