R for Data Science II: Data Wrangling

Date: 
Friday, November 3, 2017
Location: 

Fagin Hall 114

Available to: 
Open Enrollment
Program Schedule: 
Workshops run full days, from 9 a.m. to 4 p.m., in two sessions: 9 a.m. to 12 p.m. and 1 p.m. to 4 p.m. Students are expected to attend both. Each session consists of about 1.5 hours of lecture and guided programming, followed by an hour of pair programming, in which students work through exercises in pairs using a new dataset, and a half hour of reconvening to discuss and summarize.
Prerequisites: 
None (R for Data Science Workshop I preferred)
Description: 

The purpose of this sequence of four workshops is to take students from no experience in R or data science to a level of proficiency at which they can (1) visualize, (2) manipulate, and (3) apply introductory statistical methods to a dataset. Each of the four workshops is a full day, with a morning and an afternoon session that each consist of lecture, pair programming, and review.

Because no programming experience is expected, the workshops teach each programming concept alongside its R implementation. I believe that larger programming concepts are best taught after students have learned an implementation, so sessions alternate between hands-on R programming and, where appropriate, slides that codify the larger concept.

The course structure is loosely based on R for Data Science by Grolemund and Wickham (freely available at http://r4ds.had.co.nz/index.html). The main difference between this course and typical R courses is that it forgoes the usual programming-based introduction (data types, etc.) in favor of starting with real datasets and real analysis. Important base R functions and programming techniques are incorporated along the way, but they are not the focus.

I am a firm believer that knowing how to figure out code on your own is the most important skill a programmer can have, so the workshops emphasize problem solving, help files, and extensive use of online resources.

Workshop II: Data Wrangling

By some estimates, as much as 90% of a researcher’s analysis time is spent exploring, summarizing, and manipulating data. This workshop introduces the commands students will use every time they open R; the concepts and commands covered here are enough to handle most everyday data tasks.

Session Objectives: Students will be able to:

  • Summarise, create, and sort variables in a data frame using dplyr.
  • Spread and gather data using tidyr.
  • Merge datasets.
  • Use factor variables.

Commands Learned:

  • Dataframe Commands: filter, mutate, select, group_by, summarise, arrange, desc, %>%.
  • Advanced Dataframe Commands: spread, gather, left_join, inner_join.
  • Base R: factor, paste, ifelse, as.character, as.numeric, write.csv, save, grepl.

Datasets Used:

  • Philadelphia 2017 Municipal Primary results.
  • Guns and crime data from Ian Ayres and John J. Donohue III, “Shooting Down the ‘More Guns Less Crime’ Hypothesis,” Stanford Law Review, 2003, Vol. 55, 1193-1312.

About the Instructor

Jonathan Tannen, Ph.D., is a Director at Econsult Solutions, Inc. (ESI). Jonathan’s dissertation research used GIS and large-scale computational techniques to develop a Bayesian method for measuring the movement of neighborhood boundaries. Broadly, his work showed that gentrification between 2000 and 2010 in Philadelphia and other dense cities occurred through emergent boundaries moving through space (gentrified regions expanded and blocks switched dichotomously) rather than through gradual block-level demographic change.

Jonathan was born and raised in West Philadelphia, which is still his home. From 2007 to 2009, he taught at West Philadelphia High School through Teach for America, an experience that heavily informs his understanding of cycles of poverty and the nature of segregation in Philadelphia.

Jonathan received his Ph.D. in Public Policy in Urban and Population Policy from the Woodrow Wilson School at Princeton University in June 2016. Jonathan’s research interests include GIS and spatial statistics. His research with Douglas Massey on trends in Black hypersegregation was cited by the New York Times Editorial Board. Jonathan received his BA in Physics and Math cum laude from Harvard University in 2007, and an M.S.Ed. in Urban Education from the University of Pennsylvania in 2009.

Location

TBD

Instructor: 
Jonathan Tannen
Cost: 
Current students: free, Fels alumni: $100, all others: $200

Contact Information

Fels Institute of Government
University of Pennsylvania
3814 Walnut Street
Philadelphia, PA 19104

Phone: (215) 898-2600
Fax: (215) 746-2829

felsinstitute@sas.upenn.edu