ICOS Big Data Summer Camp

University of Michigan

Ross School of Business R0210 - 701 Tappan Street, Central Campus
May 14th-18th 2018
9:00 am - 5:00 pm

General Information

Social and organizational life are increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. The ICOS Big Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.

Organizing committee: Jerry Davis, H. V. Jagadish, Cliff Lampe, and Brian Noble

Coordinators: Teddy DeWitt

Instructors: Abigail Azari, Jackie Cohen, Teddy DeWitt, Ronnie Lee, Jeff Lockhart, Patrick Park, Colleen Van Lent, Laura Wendlandt

Guides: Abigail Azari, Teddy DeWitt, Mana Heshmati, Ronnie Lee, Jeff Lockhart, Patrick Park

Speakers: Adriene Beltz, Elizabeth Bruch, Emilee Rader

Who: The course is aimed at graduate students and other researchers.

Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).

Contact: Please mail teddydew@umich.edu for more information.

Resources: Go here for example papers and data sources.


Schedule

Monday 9:00 Introduction with Jerry Davis (Intro ppt, pdf.)
9:45 Camp Overview with Teddy DeWitt
10:30 Coffee Break and Software Check
10:45 Setup and Command Line Introduction with Jeff Lockhart (BASH Command Sheet pdf)
12:00 - 1:00 Lunch from Pizza House (website)
12:15 Lunch Speaker Series - Emilee Rader (bio/website) (slides)
1:00 - 1:30 Lunch Clean Up and Coffee Pick Me Up
1:30 Introduction to SQL with Teddy DeWitt (Main slides: pdf, database, slides on joins: pdf)
3:00 Project Overview and Team Formation
4:00 Team Meetings and Software Check
5:00 Depart
Tuesday 9:00 Q & A and Software Check
9:30 Introduction to Python with Colleen Van Lent (notebook) (data)
12:00 - 1:00 Lunch from Picasso Cafe (website)
12:15 Lunch Speaker Series - Adriene Beltz (bio/website)
1:00 - 1:30 Lunch Clean Up and Coffee Pick Me Up
1:30 Data Structures and pandas with Ronnie Lee (notebook) (data)
4:00 Group Work
5:00 Depart
Wednesday 9:00 APIs & Web Scraping with Jeff Lockhart (slides) (Twitter API notebook) (Twitter data) (CFB scraping notebook) (calendar scraping notebook) (scraping data)
10:45 Break
11:00 Introduction to Map/Reduce Part I with Patrick Park (files)
12:00 - 1:00 Lunch from Jerusalem Garden (website)
12:15 Lunch Speaker Series - Elizabeth Bruch (bio/website)
1:00 - 1:30 Lunch Clean Up and Coffee Pick Me Up
1:30 Introduction to Map/Reduce Part II with Patrick Park
2:30 Break
2:45 Matplotlib I & Visualization with Abby Azari (PDF)
4:00 Group Work
5:00 Depart
Thursday 9:00 A Taste of Machine Learning with Ronnie Lee
10:00 Break
10:10 Introduction to Text Analysis with Laura Wendlandt (notebook), (slides)
11:20 Break
11:30 Matplotlib II with Abby Azari (VisualizationLesson.zip)
1:00 - 2:30 Lunch from Great Harvest Bread Co. (website)
1:30 University of Michigan Big Data Resources Panel
2:30 Questions and Follow-up
3:00 HACKATHON BEGINS
5:00 PIZZA ARRIVES - Pizza House (website)
5:00 - 10:00 Rooms at Ross will be available until the building closes at 10:00
Friday 9:00 HACKATHON!
10:00 HACKATHON!
11:00 HACKATHON!
12:00 Lunch from Afternoon Delight (website) - Come get some and keep working
1:00 HACKATHON!
3:00 Presentations Begin
5:00 Feedback
6:00 Celebration at Dominick's

Setup

To participate in the ICOS Big Data Summer Camp, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the start of camp.

Overview of the tools

Editor

When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of key words.

The Bash Shell

Bash is a commonly-used shell. Using a shell gives you more power to do more tasks more quickly with your computer.

Python

Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. While the 2.7 branch is commonly used, we are officially migrating to the 3.6 branch. All of the packages crucial for our purposes have been migrated. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all-in-one installer. One of the best is Anaconda by Continuum Analytics.
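If you want to confirm that your installation is ready before the software check, a short script along these lines may help. It is only a sketch: the package list below is an assumption based on the tools named in the schedule (pandas, Matplotlib, Jupyter), not an official requirements list, and the function name `check_environment` is made up for illustration.

```python
import importlib.util
import sys

# Assumed package list, based on the tools mentioned in the schedule.
# This is not an official requirements list for the camp.
PACKAGES = ["numpy", "pandas", "matplotlib", "notebook"]


def check_environment(packages=PACKAGES):
    """Report whether Python is 3.6+ and whether each package can be found."""
    report = {"python_ok": sys.version_info >= (3, 6)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(item + ": " + ("OK" if ok else "MISSING"))
```

Save it as a `.py` file and run it with the Python that Anaconda installed. Anything reported as MISSING is worth raising during one of the software checks.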

Jupyter Notebook

The Jupyter Notebook is a web-based interface for interactive computing. The Jupyter Notebook has support for over 100 programming languages, including those popular in data science such as R, Julia, Scala, and, most importantly, Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. The Jupyter Notebook comes pre-loaded on many all-in-one Python installers like Anaconda.

SQL

SQL is a specialized programming language used with databases. SQL is a declarative language for describing (declaring) the data you want from the database. We use SQLite Studio for the lessons. It is a relatively small application with a very intuitive GUI. We also recommend installing a Firefox plugin called SQLite Manager for the lessons. If for some reason SQLite Studio does not work for you, this will provide a good backup. Keep in mind that most web browsers have built-in memory allocation limits.
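To give a feel for what "declarative" means, here is a minimal sketch that uses Python's built-in `sqlite3` module in place of SQLite Studio. The table, column names, and rows are invented for illustration and are not part of the camp's lesson database. The query states what result is wanted (a count per department) and leaves it to the database to work out how to compute it.

```python
import sqlite3


def count_by_department():
    """Build a tiny in-memory database and count rows per department."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing saved to disk
    conn.execute("CREATE TABLE campers (name TEXT, department TEXT)")
    # Hypothetical example rows, made up for this sketch.
    conn.executemany(
        "INSERT INTO campers VALUES (?, ?)",
        [("Ana", "Sociology"), ("Ben", "Economics"), ("Chloe", "Sociology")],
    )
    # Declarative query: describe the desired result, not the steps to get it.
    rows = conn.execute(
        "SELECT department, COUNT(*) FROM campers "
        "GROUP BY department ORDER BY department"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(count_by_department())
```

The same `SELECT` statement can be pasted into the SQL editor of a GUI tool once a table with these columns exists.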

Windows Installation

Python

  • Download and install Anaconda. Very specifically download the graphical installer for the 3.6 version.
  • Use all of the defaults for installation except make sure to check Make Anaconda the default Python.

Editor (OPTIONAL)

Notepad++ is a popular free code editor for Windows. (NOTE: Be aware that you must add its installation directory to your system path in order to launch it from the command line or have other tools like Git launch it for you. Please ask a TA to help you with this if you are interested.)

Bash Shell on Windows

  • Windows 10 - Please go to the following website, which has great instructions for installing the Bash Shell for Windows.
  • Windows 8 and earlier - Please go to the following website and follow the instructions for installing Cygwin, a Unix-like environment that will give you access to Bash. The installation is slightly more involved. If you find yourself having trouble installing it, we will take care of it on the first day of camp.

SQLite Studio

Download SQLite Studio to your desktop and follow the directions for installation.

Mac OS X Installation

Python

  • Download and install Anaconda. Very specifically download the graphical installer for the 3.6 version.
  • Use all of the defaults for installation except make sure to check Make Anaconda the default Python.

Editor (OPTIONAL)

We recommend Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.

SQLite Studio

Download SQLite Studio to your desktop and follow the directions for installation.