Social and organizational life are increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. The ICOS Big Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.
Organizing committee: Jerry Davis, H. V. Jagadish, Cliff Lampe, and Brian Noble
Coordinators: Teddy DeWitt
Instructors: Sam Carton, Jackie Cohen, Jerry Davis, Teddy DeWitt, Ronnie Lee, Jeff Lockhart, Todd Schifeling, Colleen Van Lent
Guides: Nivi Karki, Ronnie Lee, Jeff Lockhart, Oskar Singer
Speakers: Pete Aceves, Cassandra Chambers, Fred Feinberg, Felix Kabo, Julian Katz-Samuels
Who: The course is aimed at graduate students and other researchers.
Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).
Contact: Please mail teddydew@umich.edu for more information.
Resources: Go here for example papers and data sources.
Monday | 9:00 | Introduction and Overview with Jerry Davis (Intro ppt, pdf.) |
10:30 | Break | |
10:45 | Module 0 - Setup and Command Line Introduction with Jeff Lockhart (BASH Command Sheet pdf) | |
12:00 - 1:00 | Lunch provided by Pizza House (link) | |
12:15 | Lunch Speaker Series - Big Data PhD Student Panel with Pete Aceves, Cassandra Chambers, Teddy DeWitt, Jeff Lockhart, and Ronnie Lee (slides) | |
1:00 - 1:30 | Lunch Clean Up and Coffee Pick Me Up | |
1:30 | Module 1 - Introduction to SQL with Teddy DeWitt (Main slides: pdf, slides on joins: pdf) | |
3:00 | Project Overview - Past Examples and Data Sources assignment slides | |
3:30 | Snacks and Team Formation - Team Docs (scanned pdf) | |
4:00 | Team Meetings | |
5:00 | Depart | |
Tuesday | 9:00 | SQL Challenge (pdf) (ppt) [Necessary Additional Files: "Directors Schema" "Directors Data" "Retail DB"] |
9:45 | SQL Exercise Review | |
10:00 | Break | |
10:15 | Module 2 - Introduction to Python with Colleen Van Lent and Jackie Cohen (HTML, Notebook) | |
12:00 - 1:00 | Lunch provided by Picasso Cafe (link) | |
12:15 | Lunch Speaker Series - Fred Feinberg (Slides) | |
1:00 - 1:30 | Lunch Clean Up and Coffee Pick Me Up | |
1:30 | Module 3 - Structures and Scraping with Nivi Karki (NOTEBOOK, .csv File) and Todd Schifeling (Notebook1, Notebook2, Notebook3, Txt1, Txt2) | |
3:00 | Group Work Scraping Examples | |
5:00 | Depart | |
Wednesday | 9:00 | Python Review |
9:45 | Q and A | |
10:00 | Break | |
10:15 | Module 4 - Python Introduction to NLTK with Sam Carton (NOTEBOOK, csv1, csv2) | |
12:00 - 1:00 | Lunch provided by Jerusalem Garden (link) | |
12:15 | Lunch Speaker Series - Julian Katz-Samuels (ppt) (pdf) | |
1:00 - 1:30 | Lunch Clean Up and Coffee Pick Me Up | |
1:30 | Module 5 - APIs with Todd Schifeling (NOTEBOOK, setup file, tutorial) (bonus: semantic analyzer) | |
3:30 | Group Work | |
5:00 | Depart | |
Thursday | 9:00 | Python Review - Sit with your Teams |
9:30 | Module 6 - Introduction to Python for Data Analysis (notebook (complete with instructor solutions), data, baseball.tsv) | |
12:00 - 1:00 | Lunch Provided by Great Harvest Bread (link) | |
12:15 | Lunch Speaker Series - Felix Kabo (pdf, txt) | |
1:00 - 1:30 | Lunch Clean Up and Coffee Pick Me Up | |
1:30 | Brock Palen - ARC-TS (webpage) (slides), Al Hero - MIDAS (website) (slides), and Kerby Shedden - CSCAR (website) (slides) | |
3:00 | HACKATHON BEGINS | |
5:00 | PIZZA ARRIVES | |
5:00 - 10:00 | Rooms at Ross will be available until building closes at 10 | |
Friday | 9:00 | HACKATHON! |
10:00 | HACKATHON! | |
11:00 | HACKATHON! | |
12:00 | Lunch provided by Afternoon Delight (link) - Come get some and keep working | |
1:00 | HACKATHON! | |
3:00 | Presentations Begin | |
5:00 | Feedback | |
6:00 | Celebration at Dominick's |
To participate in the ICOS Big Data Summercamp, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the camp starts.
When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of key words.
Bash is a commonly-used shell. Using a shell gives you more power to do more tasks more quickly with your computer.
Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. While the 2.7 branch is commonly used, we are officially migrating to the 3.6 branch. All of the packages crucial for our purposes have been migrated. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all-in-one installer. One of the best is Anaconda by Continuum Analytics.
The Jupyter Notebook is a web-based interface for interactive computing. The Jupyter Notebook has support for over 40 programming languages, including those popular in data science such as R, Julia, Scala, and, most importantly, Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. The Jupyter Notebook comes pre-loaded on many all-in-one Python installers like Anaconda.
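One way to confirm the installation worked is a short check script run from the command line. The sketch below is only an illustration: the package list is an assumption (the camp's actual requirements may differ), and `check_environment` is a hypothetical helper name, not part of any camp material.

```python
import importlib.util
import sys

# Assumed package list for illustration only; adjust it to the
# packages you are actually asked to install.
PACKAGES = ["sqlite3", "notebook", "nltk"]


def check_environment(packages, min_version=(3, 6)):
    """Return a dict mapping each check name to True or False."""
    version_key = "python>=%d.%d" % min_version
    results = {version_key: sys.version_info >= min_version}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        results[name] = importlib.util.find_spec(name) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_environment(PACKAGES).items():
        print("%-15s %s" % (name, "OK" if ok else "MISSING"))
```

Any line reported as MISSING points to a package to install (or reinstall) before the first session.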
SQL is a specialized programming language used with databases. SQL is a declarative language for describing (declaring) the data you want from the database. We use a Firefox plugin called SQLite Manager for the lessons.
We also recommend installing SQLite Studio. Most web browsers have built-in memory allocation limits, which can make importing large amounts of data into a SQLite database challenging. SQLite Studio does not have this problem. It is also a relatively small application with a very intuitive GUI.
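To make "declarative" concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module instead of the SQLite Manager plugin or SQLite Studio used in the lessons. The `staff` table, its rows, and the function name are made up for illustration and do not come from the camp's datasets.

```python
import sqlite3

# Hypothetical example data; not taken from the camp's datasets.
ROWS = [
    ("Ada", "Engineering", 120),
    ("Grace", "Engineering", 135),
    ("Linus", "Support", 80),
]


def average_salary_by_dept(rows):
    """Load rows into an in-memory SQLite database and aggregate them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary REAL)")
        # executemany inserts every row using one parameterized statement.
        conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)
        # The query states *what* we want (average salary per department);
        # SQLite decides *how* to scan, group, and sort the rows.
        query = """
            SELECT dept, AVG(salary)
            FROM staff
            GROUP BY dept
            ORDER BY dept
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(average_salary_by_dept(ROWS))
```

The same `SELECT` statement can be pasted into either GUI tool; only the way the data gets loaded differs.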
Notepad++ is a popular free code editor for Windows. (NOTE: Be aware that you must add its installation directory to your system path in order to launch it from the command line or have other tools like Git launch it for you. Please ask a TA to help you with this if you are interested.)
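A quick way to see whether an editor can be launched from the command line is to ask Python whether the command is on your system path. This is a rough sketch: the command names listed are assumptions about how each editor is typically invoked and may differ on your machine, and `on_path` is a hypothetical helper name.

```python
import shutil


def on_path(command):
    """Return True if `command` resolves to an executable on the system path."""
    return shutil.which(command) is not None


# Assumed command names; your installation may use different ones.
for editor in ("notepad++", "subl", "nano", "vi"):
    print("%-10s %s" % (editor, "found" if on_path(editor) else "not on path"))
```

If an editor you installed shows "not on path", its installation directory still needs to be added to the path.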
Download SQLite Studio to your desktop and follow the directions for installation.
Please download the SQLite Manager plugin for Firefox. To install it:
We recommend Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.
Download SQLite Studio to your desktop and follow the directions for installation.
Instead of using sqlite3 from the command line, we will use this plugin for Firefox. To install it: