Social and organizational life is increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. The ICOS Big Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.
Organizing committee: Jerry Davis, H. V. Jagadish, Cliff Lampe, and Brian Noble
Coordinators: Teddy DeWitt
Instructors: Abigail Azari, Jackie Cohen, Teddy DeWitt, Ronnie Lee, Jeff Lockhart, Patrick Park, Colleen Van Lent, Laura Wendlandt
Guides: Abigail Azari, Teddy DeWitt, Mana Heshmati, Ronnie Lee, Jeff Lockhart, Patrick Park
Speakers: Adriene Beltz, Elizabeth Bruch, Emilee Rader
Who: The course is aimed at graduate students and other researchers.
Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).
Contact: Please email teddydew@umich.edu for more information.
Resources: Go here for example papers and data sources.
Monday | 9:00 | Introduction with Jerry Davis (Intro ppt, pdf.) |
9:45 | Camp Overview with Teddy DeWitt | |
10:30 | Coffee Break and Software Check | |
10:45 | Setup and Command Line Introduction with Jeff Lockhart (BASH Command Sheet pdf) | |
12:00 - 1:00 | Lunch from Pizza House (website) | |
12:15 | Lunch Speaker Series - Emilee Rader (bio/website) (slides) | |
1:00 - 1:30 | Lunch Clean Up and Coffee Pick Me Up | |
1:30 | Introduction to SQL with Teddy DeWitt (Main slides: pdf, database, slides on joins: pdf) | |
3:00 | Project Overview and Team Formation | |
4:00 | Team Meetings and Software Check | |
5:00 | Depart | |
Tuesday | 9:00 | Q & A and Software Check |
9:30 | Introduction to Python with Colleen Van Lent (notebook) (data) | |
12:00 - 1:00 | Lunch from Picasso Cafe (website) | |
12:15 | Lunch Speaker Series - Adriene Beltz (bio/website) | |
1:00 - 1:30 | Lunch Clean Up and Coffee Pick Me Up | |
1:30 | Data Structures and pandas with Ronnie Lee (notebook) (data) | |
4:00 | Group Work | |
5:00 | Depart | |
Wednesday | 9:00 | APIs & Web Scraping with Jeff Lockhart (slides) (Twitter API notebook) (Twitter data) (CFB scraping notebook) (calendar scraping notebook) (scraping data) |
10:45 | Break | |
11:00 | Introduction to Map/Reduce Part I with Patrick Park (files) | |
12:00 - 1:00 | Lunch from Jerusalem Garden (website) | |
12:15 | Lunch Speaker Series - Elizabeth Bruch (bio/website) | |
1:00 - 1:30 | Lunch Clean Up and Coffee Pick Me Up | |
1:30 | Introduction to Map/Reduce Part II with Patrick Park | |
2:30 | Break | |
2:45 | Matplotlib I & Visualization with Abby Azari (PDF) | |
4:00 | Group Work | |
5:00 | Depart | |
Thursday | 9:00 | A Taste of Machine Learning with Ronnie Lee |
10:00 | Break | |
10:10 | Introduction to Text Analysis with Laura Wendlandt (notebook), (slides) | |
11:20 | Break | |
11:30 | Matplotlib II with Abby Azari (VisualizationLesson.zip) | |
1:00 - 2:30 | Lunch from Great Harvest Bread Co. (website) | |
1:30 | University of Michigan Big Data Resources Panel | |
2:30 | Questions and Follow-up | |
3:00 | HACKATHON BEGINS | |
5:00 | PIZZA ARRIVES - Pizza House (website) | |
5:00 - 10:00 | Rooms at Ross will be available until building closes at 10 | |
Friday | 9:00 | HACKATHON! |
10:00 | HACKATHON! | |
11:00 | HACKATHON! | |
12:00 | Lunch from Afternoon Delight (website) - Come get some and keep working | |
1:00 | HACKATHON! | |
3:00 | Presentations Begin | |
5:00 | Feedback | |
6:00 | Celebration at Dominick's |
To participate in the ICOS Big Data Summercamp, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the start of the camp.
When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of keywords.
Bash is a commonly used shell. Using a shell gives you more power to do more tasks more quickly with your computer.
Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. While the 2.7 branch is commonly used, we are officially migrating to the 3.6 branch. All of the packages crucial for our purposes have been migrated. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all-in-one installer. One of the best is Anaconda by Continuum Analytics.
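Once the installer has finished, a quick self-check can confirm that the right Python branch and the main packages are in place. The following is only a sketch: the package list is an assumption based on the tools named in the schedule (pandas, Matplotlib, the Jupyter Notebook), and `check_environment` is a hypothetical helper, not part of the camp materials. The script itself needs Python 3 to run.

```python
import importlib.util
import sys

# Assumed package list, taken from the tools named in the schedule;
# adjust it to match the actual installation instructions.
REQUIRED_PACKAGES = ["pandas", "matplotlib", "notebook"]


def check_environment(packages=REQUIRED_PACKAGES, min_version=(3, 6)):
    """Return a dict describing whether Python and each package look usable."""
    # Compare only (major, minor) against the minimum version we expect.
    report = {"python_ok": sys.version_info[:2] >= min_version}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print("{}: {}".format(item, "OK" if ok else "MISSING"))
```

Any line reported as MISSING is worth sorting out during one of the scheduled software checks.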
The Jupyter Notebook is a web-based interface for interactive computing. The Jupyter Notebook has support for over 100 programming languages, including those popular in data science such as R, Julia, Scala, and, most importantly, Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. The Jupyter Notebook comes pre-loaded on many all-in-one Python installers like Anaconda.
SQL is a specialized programming language used with databases. It is a declarative language: you describe (declare) the data you want from the database rather than how to retrieve it. We use SQLite Studio for the lessons. It is a relatively small application with a very intuitive GUI. We also recommend installing a Firefox plugin called SQLite Manager. Be aware that most web browsers have built-in memory allocation limits. If for some reason SQLite Studio does not work for you, the plugin will provide a good backup.
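To make the declarative idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. The `participants` table and its rows are invented for illustration and are not part of the camp's lesson database.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participants (name TEXT, department TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO participants VALUES (?, ?, ?)",
    [
        ("Ana", "Sociology", 2),
        ("Ben", "Information", 1),
        ("Chen", "Sociology", 4),
    ],
)

# The query declares the result we want (sociology students, ordered by
# year); it does not say how the database should scan or sort the table.
rows = conn.execute(
    "SELECT name, year FROM participants WHERE department = ? ORDER BY year",
    ("Sociology",),
).fetchall()

print(rows)
conn.close()
```

The same `SELECT` statement could be pasted into a GUI tool such as SQLite Studio; only the way the query is submitted differs.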
Notepad++ is a popular free code editor for Windows. (NOTE: Be aware that you must add its installation directory to your system path in order to launch it from the command line or have other tools like Git launch it for you. Please ask a TA to help you with this if you are interested.)
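One way to check whether an editor can already be launched from the command line is to ask Python where the shell would find it. This is a sketch only: `find_on_path` is a hypothetical helper, and `notepad++` is an assumed executable name that may differ on your machine.

```python
import shutil


def find_on_path(command):
    """Return the full path of `command` if it is on the system path, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    # "notepad++" is the assumed executable name; adjust if yours differs.
    location = find_on_path("notepad++")
    print(location or "not found on PATH")
```

If the result is "not found on PATH", the installation directory still needs to be added to the system path.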
Download SQLite Studio to your desktop and follow the directions for installation.
We recommend Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.