ICOS Big Data Summer Camp

University of Michigan

Room R0210 - Ross School of Business - 701 Tappan Street, Central Campus
May 19-22, 2014
9:00 am - 5:00 pm

General Information

Social and organizational life are increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. The ICOS Big Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.

Organizing committee: Jerry Davis, Cliff Lampe, Brian Noble, and Jason Owen-Smith

Instructors: Brian Noble, Matt Burton, Michael Cafarella, Colleen Van Lent, Felix Kabo, Russ Funk, Todd Schifeling

Helpers: Khevna Shah, Nick Repole, Guarav Singhal, Tyler Markvluwer, Jonathan Pevarnek, Matt Baumgartner, Matthew Sullivan

Who: The course is aimed at graduate students and other researchers.

Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).

Contact: Please email mcburton@umich.edu for more information.


Schedule

Monday
09:00 Introduction and Overview (Intro ppt, pdf. Assignment ppt, pdf)
10:15 Break
10:30 Hero's Journey #1 - Sarita Yardi Schoenebeck
11:15 Hero's Journey #2 - Cliff Lampe
12:00 Lunch break
1:00 Group formation & How to learn in groups: lessons from design teams, Brian Noble
2:00 The Setup & Command line with Matt Burton (Tutorials: install, command line)

Tuesday
09:00-11:00 Introduction to SQL with Mike Cafarella (Slides: ppt pdf)
10:00-10:15 Coffee Break
11:00-12:00 Using SQL with Felix Kabo (Slides: ppt pdf & data)
12:00-1:00 Lunch break
1:00-5:00 Group Work (play data)
4:00-5:00 Check-in and end of day discussion
SIGN IN SHEET

Wednesday
09:00-11:00 Introduction to Python with Colleen Van Lent (HTML, Notebook)
10:00-10:15 Coffee Break
11:00-12:00 Using Python with Russ Funk (slides, code, GitHub)
12:00-1:00 Lunch break
1:00-4:00 Group Work (scraping links)
4:00-5:00 Check-in and end of day discussion

Thursday
9:00-9:20 Now What? with Sharon Broude Geva
09:20-11:00 Introduction to APIs with Brian Noble (Install, Code, Lecture)
10:00-10:15 Coffee Break
11:00-12:00 Using APIs with Todd Schifeling (Slides: ppt, pdf, Code)
12:00-1:00 Lunch break
1:00-4:00 Group Work & Python + SQL (slides, code)
4:00-5:00 Check-in and end of day discussion

Thursday May 29th
1:00-4:00 Final Session with Group Presentations. Ross R0230
4:00-5:00 Dominicks!

Setup

To participate in the ICOS Big Data Summer Camp, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the camp starts.

Overview of the tools

Editor

When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of key words.

The Bash Shell

Bash is a commonly-used shell. Using a shell gives you more power to do more tasks more quickly with your computer.

Python

Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. We teach with Python version 2.7, since it is still the most widely used. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all-in-one installer.
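As a small illustration of the easy-to-read syntax mentioned above, here is a minimal sketch that counts how often each word appears in a sentence. The function name `count_words` and the sample text are made up for this example and are not part of the camp materials; the code is written so that it should run under Python 2.7 as well as Python 3.

```python
from __future__ import print_function  # print() behaves the same in Python 2.7 and 3

from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to how often it appears."""
    words = text.lower().split()
    return Counter(words)


# Hypothetical sample text, invented for illustration only.
sample = "big data is not always big"
counts = count_words(sample)

print("'big' appears", counts["big"], "times")
```

Even without prior programming experience, most of the lines can be read almost as plain English: lowercase the text, split it into words, count them.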

IPython Notebook

The IPython Notebook is a web-based interface for interactive computing with Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. The IPython Notebook comes pre-loaded on many all-in-one Python installers like Anaconda CE.

SQL

SQL is a specialized programming language used with databases. SQL is a declarative language for describing (declaring) the data you want from the database. We use a Firefox plugin called SQLite Manager for the lessons.
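To make "declarative" concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module rather than the Firefox plugin. The `people` table and its rows are hypothetical, invented only for this illustration. The query states which rows are wanted; the database decides how to find them.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, invented for illustration only.
cur.execute("CREATE TABLE people (name TEXT, department TEXT, year INTEGER)")
cur.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Ana", "Sociology", 2),
        ("Ben", "Information", 1),
        ("Chen", "Sociology", 4),
    ],
)

# Declarative query: say WHAT you want (Sociology students, sorted by year),
# not HOW the database should loop over the rows to find them.
cur.execute(
    "SELECT name, year FROM people WHERE department = ? ORDER BY year",
    ("Sociology",),
)
rows = cur.fetchall()
print(rows)

conn.close()
```

The `SELECT` statement itself is plain SQL, so the same query (with the value written in place of the `?` placeholder) should also work in other SQLite front ends.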

Windows Installation

Python

  • Download and install Anaconda CE.
  • Use all of the defaults for installation, except make sure to check "Make Anaconda the default Python."
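One way to confirm that the installation worked is to ask Python which version and which interpreter it is running. The sketch below is only a suggestion; the function name `describe_python` and the file name `check_python.py` are hypothetical. Save the code to a file such as `check_python.py`, then run `python check_python.py` from the command line. Since the camp teaches with Python 2.7, the reported version should be 2.7, and the interpreter path would be expected to point inside the Anaconda installation directory. The same check should work on Mac OS X.

```python
from __future__ import print_function  # print() behaves the same in Python 2.7 and 3

import sys


def describe_python():
    """Return the (major, minor) version and the path of the running interpreter."""
    return tuple(sys.version_info[:2]), sys.executable


version, location = describe_python()
print("Python version:", "%d.%d" % version)
print("Interpreter path:", location)

# The full version string often names the distribution that built the
# interpreter; whether it mentions Anaconda depends on the installer
# version (an assumption, not something the camp materials state).
print("Full version string:", sys.version)
```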

Editor

Notepad++ is a popular free code editor for Windows. Be aware that you must add its installation directory to your system path in order to launch it from the command line (or have other tools like Git launch it for you). Please ask your instructor to help you do this.

Firefox SQLite Plugin

Windows doesn't have sqlite3 available on the command line, so we will use this plugin for Firefox instead. To install it:

  • Start Firefox.
  • Go to the plugin homepage.
  • Click the "Add Now" button.
  • Click "Install Now" on the dialog that appears after the download completes.
  • Restart Firefox when prompted.
  • Select "SQLite Manager" from the "Tools" menu.

Mac OS X Installation

Python

  • Download and install Anaconda CE.
  • Use all of the defaults for installation, except make sure to check "Make Anaconda the default Python."

Editor

We recommend TextWrangler or Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.

Firefox SQLite Plugin

Instead of using sqlite3 from the command line, we will use this plugin for Firefox. To install it:

  • Start Firefox.
  • Go to the plugin homepage.
  • Click the "Add Now" button.
  • Click "Install Now" on the dialog that appears after the download completes.
  • Restart Firefox when prompted.
  • Select "SQLite Manager" from the "Tools" menu.