Social and organizational life are increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. The ICOS Big Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.
Organizing committee: Jerry Davis, H. V. Jagadish, Cliff Lampe, and Brian Noble
Instructors: Jon Atwell, Mike Anderson, Jerry Davis, Zakir Durumeric, Gareth Keeves, Cliff Lampe, Colleen Van Lent, Brian Noble, Katharina Reinecke, Eric Seymour
Helpers: David Adrian, Joshua Adkins, Antonio Deusany de Carvalho Junior, Ariana Mirian
Who: The course is aimed at graduate students and other researchers.
Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).
Contact: Please mail schifelt@umich.edu for more information.
Resources: Go here for example papers and data sources.
Monday | 09:00 | Introduction and Overview with Jerry Davis (Intro ppt, pdf. Assignment ppt, pdf) |
10:15 | Break | |
10:30 | Hero's Journey #1 - Cliff Lampe | |
11:15 | Hero's Journey #2 - Katharina Reinecke | |
12:00 | Lunch break | |
1:00 | Group formation & How to learn in groups: lessons from design teams, with Brian Noble | |
2:00 | The Setup & Command line with Todd Schifeling (Tutorials: install, command line) | |
Tuesday | 09:00-10:45 | Introduction to SQL with Mike Anderson (Slides: ppt pdf) |
10:45-11:00 | Coffee Break | |
11:00-12:00 | Using SQL with Eric Seymour (Materials: zip file) | |
12:00-1:00 | Lunch break | |
1:00-5:00 | Group Work (play data) | |
4:00-5:00 | Check-in and end of day discussion | |
Wednesday | 09:00-10:45 | Introduction to Python with Colleen Van Lent (HTML, Notebook) |
10:45-11:00 | Coffee Break | |
11:00-12:00 | Python with Jon Atwell (Link to code) | |
12:00-1:00 | Lunch break | |
1:00-4:00 | Group Work (scraping links) | |
4:00-5:00 | Check-in and end of day discussion | |
SIGN IN SHEET | ||
Thursday | 09:00-09:20 | Now What? with Sharon Broude Geva (Slides) |
09:20-10:45 | Introduction to APIs with Zakir Durumeric (Install, Code, Lecture) | |
10:45-11:00 | Coffee Break | |
11:00-12:00 | Using APIs with Gareth Keeves | |
12:00-1:00 | Lunch break | |
1:00-1:30 | Music API with Antonio Deusany de Carvalho Junior (notebook) | |
1:00-4:00 | Group Work & Python + SQL (slides, code) | |
4:00-5:00 | Check-in and end of day discussion | |
Thursday June 11th | 1:00-4:00 | Final Session with Group Presentations. Ross R0220 |
4:00-5:00 | Dominicks! |
To participate in the ICOS Big Data Camp, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the start of the camp.
When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of key words.
Bash is a commonly-used shell. Using a shell gives you more power to do more tasks more quickly with your computer.
Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. We teach with Python version 2.7, since it is still the most widely used. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all-in-one installer.
The IPython Notebook is a web-based interface for interactive computing with Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. The IPython Notebook comes pre-loaded on many all-in-one Python installers like Anaconda CE.
SQL is a specialized programming language used with databases. It is a declarative language: you describe (declare) the data you want from the database rather than spelling out how to retrieve it. We use a Firefox plugin called SQLite Manager for the lessons.
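To make "declarative" concrete, here is a minimal sketch that runs a SQL query from Python. It uses Python's built-in sqlite3 module rather than the SQLite Manager plugin used in the lessons, and the `tweets` table and its rows are made up purely for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented for this example.
conn.execute("CREATE TABLE tweets (author TEXT, retweets INTEGER)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [("alice", 12), ("bob", 3), ("alice", 5), ("carol", 7)],
)

# Declarative: the query states the result we want (total retweets
# per author, largest first). How to scan, group, and sort the rows
# is left to the database engine.
rows = conn.execute(
    "SELECT author, SUM(retweets) AS total "
    "FROM tweets GROUP BY author ORDER BY total DESC"
).fetchall()

for author, total in rows:
    print(author, total)

conn.close()
```

The same SELECT statement could be pasted into SQLite Manager against a table with these columns; only the surrounding Python is specific to this sketch.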
Notepad++ is a popular free code editor for Windows. Be aware that you must add its installation directory to your system path in order to launch it from the command line (or have other tools like Git launch it for you). Please ask your instructor to help you do this.
Windows doesn't have sqlite3 available on the command line, so we will use this plugin for Firefox instead.
To install it:
We recommend Text Wrangler or Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.
Instead of using sqlite3 from the command line, we will use this plugin for Firefox.
To install it: