ICOS Big Data Summer Camp

University of Michigan

Room R0210 - Ross School of Business - 701 Tappan Street, Central Campus
June 5th-9th 2017
9:00 am - 5:00 pm

General Information

Social and organizational life are increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. The ICOS Big Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.

Organizing committee: Jerry Davis, H. V. Jagadish, Cliff Lampe, and Brian Noble

Coordinators: Teddy DeWitt

Instructors: Sam Carton, Jackie Cohen, Jerry Davis, Teddy DeWitt, Ronnie Lee, Jeff Lockhart, Todd Schifeling, Colleen Van Lent

Guides: Nivi Karki, Ronnie Lee, Jeff Lockhart, Oskar Singer

Speakers: Pete Aceves, Cassandra Chambers, Fred Feinberg, Felix Kabo, Julian Katz-Samuels

Who: The course is aimed at graduate students and other researchers.

Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).

Contact: Please email teddydew@umich.edu for more information.

Resources: Go here for example papers and data sources.


Schedule

Monday 9:00 Introduction and Overview with Jerry Davis (Intro ppt, pdf.)
10:30 Break
10:45 Module 0 - Setup and Command Line Introduction with Jeff Lockhart (BASH Command Sheet pdf)
12:00 - 1:00 Lunch provided by Pizza House (link)
12:15 Lunch Speaker Series - Big Data PhD Student Panel with Pete Aceves, Cassandra Chambers, Teddy DeWitt, Jeff Lockhart, and Ronnie Lee (slides)
1:00 - 1:30 Lunch Clean Up and Coffee Pick Me Up
1:30 Module 1 - Introduction to SQL with Teddy DeWitt (Main slides: pdf, slides on joins: pdf)
3:00 Project Overview - Past Examples and Data Sources assignment slides
3:30 Snacks and Team Formation - Team Docs (scanned pdf)
4:00 Team Meetings
5:00 Depart
Tuesday 9:00 SQL Challenge (pdf) (ppt) [Necessary Additional Files: "Directors Schema" "Directors Data" "Retail DB"]
9:45 SQL Exercise Review
10:00 Break
10:15 Module 2 - Introduction to Python with Colleen Van Lent and Jackie Cohen (HTML, Notebook)
12:00 - 1:00 Lunch provided by Picasso Cafe (link)
12:15 Lunch Speaker Series - Fred Feinberg (Slides)
1:00 - 1:30 Lunch Clean Up and Coffee Pick Me Up
1:30 Module 3 - Structures and Scraping with Nivi Karki (NOTEBOOK, .csv File) and Todd Schifeling (Notebook1, Notebook2, Notebook3, Txt1, Txt2)
3:00 Group Work Scraping Examples
5:00 Depart
Wednesday 9:00 Python Review
9:45 Q and A
10:00 Break
10:15 Module 4 - Python Introduction to NLTK with Sam Carton (NOTEBOOK, csv1, csv2)
12:00 - 1:00 Lunch provided by Jerusalem Garden (link)
12:15 Lunch Speaker Series - Julian Katz-Samuels (ppt) (pdf)
1:00 - 1:30 Lunch Clean Up and Coffee Pick Me Up
1:30 Module 5 - APIs with Todd Schifeling (NOTEBOOK, setup file, tutorial) (bonus: semantic analyzer)
3:30 Group Work
5:00 Depart
Thursday 9:00 Python Review - Sit with your Teams
9:30 Module 6 - Introduction to Python for Data Analysis (notebook (complete with instructor solutions), data, baseball.tsv)
12:00 - 1:00 Lunch provided by Great Harvest Bread (link)
12:15 Lunch Speaker Series - Felix Kabo (pdf, txt)
1:00 - 1:30 Lunch Clean Up and Coffee Pick Me Up
1:30 Brock Palen - ARC-TS (webpage) (slides), Al Hero - MIDAS (website) (slides), and Kerby Shedden - CSCAR (website) (slides)
3:00 HACKATHON BEGINS
5:00 PIZZA ARRIVES
5:00 - 10:00 Rooms at Ross will be available until the building closes at 10:00 pm
Friday 9:00 HACKATHON!
10:00 HACKATHON!
11:00 HACKATHON!
12:00 Lunch provided by Afternoon Delight (link) - Come get some and keep working
1:00 HACKATHON!
3:00 Presentations Begin
5:00 Feedback
6:00 Celebration at Dominick's

Setup

To participate in the ICOS Big Data Summer Camp, you will need working copies of the software described below. Please install everything (or at least download the installers) before the camp starts.

Overview of the tools

Editor

When you're writing code, it's nice to have a text editor that is optimized for the job, with features like automatic color-coding of keywords.

The Bash Shell

Bash is a commonly used shell. A shell lets you run programs, manage files, and automate repetitive tasks by typing commands, which is often faster than doing the same work by pointing and clicking.

Python

Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. While the 2.7 branch is still commonly used, we are officially migrating to the 3.6 branch. All of the packages crucial for our purposes have been migrated. Installing the scientific packages for Python one at a time can be difficult, so we recommend an all-in-one installer. One of the best is Anaconda by Continuum Analytics.
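If you want to confirm which Python you ended up with after installing, a few lines of Python can report it. This is a minimal sketch, not part of the camp materials: the function name describe_python and the rule that any 3.x release at or above 3.6 counts as acceptable are assumptions made for illustration. It avoids newer syntax on purpose so that it also runs under 2.7 and can tell you that you are on the wrong branch.

```python
import sys


def describe_python(version_info=sys.version_info, executable=sys.executable):
    """Report which interpreter is running and whether it is 3.6 or newer.

    The "3.6 or newer" threshold is an assumption for illustration.
    """
    major, minor = version_info[0], version_info[1]
    return {
        "version": "{}.{}".format(major, minor),
        "executable": executable,  # path of the interpreter that ran this code
        "meets_camp_requirement": major == 3 and minor >= 6,
    }


if __name__ == "__main__":
    report = describe_python()
    for key in sorted(report):
        print("{}: {}".format(key, report[key]))
```

If the executable path does not point inside your Anaconda folder, another Python installation is probably being picked up first.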

Jupyter Notebook

The Jupyter Notebook is a web-based interface for interactive computing. It supports over 40 programming languages, including those popular in data science such as R, Julia, Scala, and, most importantly, Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. The Jupyter Notebook comes pre-loaded with many all-in-one Python installers like Anaconda.

SQL

SQL is a specialized programming language used with databases. SQL is a declarative language: you describe (declare) the data you want from the database rather than spelling out how to retrieve it. We use a Firefox plugin called SQLite Manager for the lessons.
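To give a feel for what "declarative" means, here is a small sketch that runs one SQL query through Python's built-in sqlite3 module instead of the Firefox plugin. The table name board_seats, its columns, and the rows are all made up for illustration and are not the camp's data.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE board_seats (director TEXT, company TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO board_seats VALUES (?, ?, ?)",
    [
        ("A. Smith", "Acme", 2016),
        ("B. Jones", "Acme", 2016),
        ("C. Lee", "Globex", 2016),
        ("A. Smith", "Globex", 2015),
    ],
)

# The query states WHAT we want (seats per company in 2016),
# not HOW to loop over the rows to compute it.
rows = conn.execute(
    """
    SELECT company, COUNT(*) AS seats
    FROM board_seats
    WHERE year = 2016
    GROUP BY company
    ORDER BY company
    """
).fetchall()

print(rows)  # [('Acme', 2), ('Globex', 1)]
```

The same SELECT statement could be pasted into any SQLite tool; only the Python wrapper around it is specific to this sketch.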

We also recommend installing SQLite Studio. Most web browsers have built-in memory allocation limits, which can make importing large amounts of data into a SQLite database challenging. SQLite Studio does not have this problem. It is also a relatively small application with a very intuitive GUI.
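As an alternative to either GUI tool, a large CSV file can also be loaded from Python in fixed-size batches, so that only a small part of the file is held in memory at a time. The following is a rough sketch under stated assumptions, not the method used in the camp: the function name import_csv, the batch size, and the choice to create untyped columns from the header row are all illustrative.

```python
import csv
import io
import itertools
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for SQLite (doubles embedded quotes)."""
    return '"{}"'.format(name.replace('"', '""'))


def import_csv(conn, table, csv_file, batch_size=1000):
    """Load rows from an open CSV file into `table`, batch_size rows at a time.

    Assumes the first row is a header and every later row has the same
    number of fields; a ragged row raises an sqlite3 error.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(quote_identifier(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS {} ({})".format(quote_identifier(table), columns)
    )
    insert_sql = "INSERT INTO {} VALUES ({})".format(
        quote_identifier(table), placeholders
    )
    total = 0
    while True:
        # islice pulls at most batch_size rows from the reader.
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break
        conn.executemany(insert_sql, batch)
        conn.commit()  # commit per batch so progress is saved as we go
        total += len(batch)
    return total


if __name__ == "__main__":
    sample = io.StringIO("name,company\nA,Acme\nB,Globex\nC,Acme\n")
    demo_conn = sqlite3.connect(":memory:")
    print(import_csv(demo_conn, "people", sample, batch_size=2))  # prints 3
```

For a real file, you would pass open("yourfile.csv", newline="") in place of the in-memory sample and connect to a database file instead of ":memory:". All values are stored as text in this sketch.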

Windows Installation

Python

  • Download and install Anaconda. Be sure to download the graphical installer for the Python 3.6 version.
  • Use all of the installation defaults, except make sure to check "Make Anaconda the default Python."

Editor

Notepad++ is a popular free code editor for Windows. (NOTE: Be aware that you must add its installation directory to your system path in order to launch it from the command line or have other tools like Git launch it for you. Please ask a TA to help you with this if you are interested.)
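To check whether a program can already be launched from the command line, you can ask Python to search your system path for it. This is a small sketch using the standard library's shutil.which; the helper name find_on_path is hypothetical, and the executable name "notepad++" is an assumption about how the editor is named on your machine.

```python
import os
import shutil
import sys


def find_on_path(program, search_path=None):
    """Return the full path of `program` if it is on the search path, else None.

    With search_path=None, the PATH environment variable is used.
    """
    return shutil.which(program, path=search_path)


if __name__ == "__main__":
    # Assumed executable name; adjust it if your installation differs.
    location = find_on_path("notepad++")
    if location is None:
        print("notepad++ was not found on your PATH")
    else:
        print("notepad++ found at", location)

    # Sanity check: the running interpreter is findable in its own folder.
    here = os.path.dirname(sys.executable)
    print(find_on_path(os.path.basename(sys.executable), search_path=here))
```

If the first message says the program was not found, its installation directory still needs to be added to your path.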

SQLite Studio

Download SQLite Studio to your desktop and follow the directions for installation.

Bash Shell on Windows

  • Windows 10 - Please go to the following website, which has great instructions for installing the Bash Shell for Windows.
  • Windows 8 and earlier - Please go to the following website and follow the instructions for installing Cygwin, a Unix-like environment that will give you access to Bash. The installation is slightly more involved. If you have trouble installing it, we will take care of it on the first day of camp.

Firefox SQLite Plugin

Please download the SQLite Manager plugin for Firefox. To install it:

  • Start Firefox.
  • Go to the plugin homepage.
  • Click the "Add Now" button.
  • Click "Install Now" on the dialog that appears after the download completes.
  • Restart Firefox when prompted.
  • Depending on Firefox version, either 1) Select "SQLite Manager" from the "Tools" menu or 2) Go to "customize" in main menu and drag SQLite into the menu.

Mac OS X Installation

Python

  • Download and install Anaconda. Be sure to download the graphical installer for the Python 3.6 version.
  • Use all of the installation defaults, except make sure to check "Make Anaconda the default Python."

Editor

We recommend Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.

SQLite Studio

Download SQLite Studio to your desktop and follow the directions for installation.

Firefox SQLite Plugin

Instead of using sqlite3 from the command line, we will use the SQLite Manager plugin for Firefox. To install it:

  • Start Firefox.
  • Go to the plugin homepage.
  • Click the "Add Now" button.
  • Click "Install Now" on the dialog that appears after the download completes.
  • Restart Firefox when prompted.
  • Depending on Firefox version, either 1) Select "SQLite Manager" from the "Tools" menu or 2) Go to "customize" in main menu and drag SQLite into the menu.