Social and organizational life is increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. UM Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.
Organizing committee: Elizabeth Bruch, Jerry Davis
Coordinator: Jeffrey W. Lockhart
Instructors: Teddy DeWitt (Management and Organizations), Nel Escher (EECS), Tom Fiore (Statistics and Math), Zander Furnas (Political Science), Jeffrey W. Lockhart (Sociology and PSC), Jerry Qiushi Yu (Political Science), Emily Sabo (Linguistics)
Speakers: Jerry Davis (Associate Dean, Ross), Joy Rohde (Interim Director, Science, Technology, and Public Policy Program), Abhraneel Sarma (HDI Lab)
Who: The course is aimed at graduate students and other researchers. This year we have 72 participants from over 30 different units on campus, including Anthropology, Bioinformatics, Climate and Space Sciences, Communication Studies, Design Science, Economics, Education, Ecology and Evolutionary Biology, Electrical Engineering and Computer Science, History, Human Genetics, Industrial and Manufacturing Systems Engineering, Industrial Operations Engineering, IT and ARC, Law, Management and Organizations, Mathematics, Political Science, Population Studies Center, Psychology, Public Policy, School for Environment and Sustainability, School of Information, Social Work, Sociology, School of Public Health, Statistics, Surgery, Transportation Research Institute, Urban Planning, and Women's Studies.
Requirements: Participants must bring a laptop and are encouraged to try installing the packages at the bottom of this page before camp. Installation support will be provided during camp, and online alternatives will be available as well.
Contact: Please email jwlock at umich dot edu for more information.
| Friday, June 14 | 12:00-5:00 pm | Optional pre-camp setup and technical help drop-in in Weiser 706 |
| Monday, June 17 | 9:00 | Introduction with Jerry Davis (slides, video) |
| | 9:45 | Camp overview with Jeff Lockhart (slides) |
| | 10:15 | Break with coffee from Panera |
| | 10:30 | Jupyter Setup and Intro to Python with Emily Sabo (slides, video) |
| | 12:00 | Lunch from Jerusalem Garden |
| | 1:00 | Python Part Two with Jerry Qiushi Yu (notebook) |
| | 2:15 | Python for Data and Statistics with Tom Fiore (files, video of lesson; download PeaZip if you need software to open zipped folders) |
| | 3:25 | Project Overview and Team Formation (slides) |
| | 5:00 | Depart |
| Tuesday | 9:00 | Machine Learning / Data Mining with Jeff Lockhart (slides, notebook, Chicago data, GSS data, video) |
| | 10:30 | Break with coffee from Panera |
| | 10:45 | Data Visualization with Tom Fiore (files, video) |
| | 12:15 | Lunch with guest speaker Abhraneel Sarma: "Visualization and Statistical Communication" (catering: Tio's; slides, video) |
| | 1:30 | APIs & Web Scraping with Nel Escher (slides, Scraping_Notebook, API_Notebook, API_Wrapper_Notebook, video part 1, video part 2) |
| | 3:00 | Group work time |
| | 5:00 | Depart |
| Wednesday | 9:00 | Network Analysis with Zander Furnas (files, video) |
| | 10:30 | Break with coffee from Panera |
| | 11:00 | Text Analysis / NLP with Emily Sabo (slides, handout, video) |
| | 12:30 | Lunch with guest speaker Joy Rohde: "Ethics and Politics of Computational Social Science" (catering: Afternoon Delight; slides) |
| | 2:00 | Command Line and BASH with Nel Escher (slides, files, BASH cheat sheet) |
| | 3:30 | Group work time |
| | 5:00 | Depart |
| Thursday | 9:00 | BIG Data, PySpark, and Hadoop with Jeff Lockhart (slides, notebook, video of lesson) |
| | 10:30 | Break with coffee from Panera |
| | 11:00 | SQL with Teddy DeWitt (materials, video) |
| | 12:15 | Lunch and Guest Lightning Talks (10th floor Weiser; catering: Jerusalem Garden; slides) |
| | 1:30 | Resource Fair (10th floor Weiser; slides) |
| | 2:30 | Hackathon! (Group work time) |
| | 5:00 | Pizza from Cottage Inn |
| | 5:00 | HACKATHON! Weiser will remain open for group work until the building closes at 10. |
| Friday | 9:00 | HACKATHON! |
| | 12:00 | Lunch from DiBella's |
| | 1:00 | HACKATHON! |
| | 3:00 | Presentations Begin |
| | 4:30 | Feedback |
| | 5:30 | Celebration at Dominick's |
To participate in the ICOS Big Data Summercamp, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the start of your bootcamp.
When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of key words.
Bash is a commonly-used shell. Using a shell gives you more power to do more tasks more quickly with your computer.
Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all in one installer. One of the best is Anaconda by Continuum Analytics.
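Once Anaconda is installed, a quick way to confirm that the scientific packages are available is to ask Python whether it can find them. The sketch below is only an illustration: the package list is an assumption (the camp's actual list may differ), and `check_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed package list for illustration only; the camp's actual list may differ.
PACKAGES = ["numpy", "pandas", "matplotlib", "scipy"]


def check_packages(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        status = "installed" if found else "MISSING"
        print(f"{name}: {status}")
```

Any package reported as MISSING is one to ask about during the setup drop-in.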
JupyterLab is a browser-based interface for interactive computing. JupyterLab has support for over 100 programming languages, including those popular in data science such as R, Julia, Scala, and most importantly, Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. JupyterLab comes pre-loaded in Anaconda.
SQL is a specialized programming language used with databases. SQL is a declarative language for describing (declaring) the data you want from the database. We use SQLite Studio for the lessons. Most web browsers have built-in memory allocation limits, whereas SQLite Studio is a relatively small standalone application with a very intuitive GUI.
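To illustrate what "declarative" means, here is a minimal sketch that uses Python's built-in `sqlite3` module instead of SQLite Studio. The `sessions` table and its rows are made up for this example. The query states which rows are wanted and leaves it to the database to work out how to retrieve them.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this illustration.
conn.execute("CREATE TABLE sessions (day TEXT, topic TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        ("Monday", "Intro to Python", 90),
        ("Tuesday", "Data Visualization", 90),
        ("Thursday", "SQL", 75),
    ],
)

# Declare the data we want: topics of sessions lasting at least 90 minutes.
rows = conn.execute(
    "SELECT topic FROM sessions WHERE minutes >= 90 ORDER BY topic"
).fetchall()
print(rows)

conn.close()
```

Note that the query contains no loop and no search logic; those details are left to SQLite.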
Set up Duo two-factor authentication so that you have access to secure and high-performance computing resources at UM.
Notepad++ is a popular free code editor for Windows. (NOTE: Be aware that you must add its installation directory to your system path in order to launch it from the command line or have other tools like Git launch it for you. Please ask a TA to help you with this if you are interested.)
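If you want to check whether a directory is already on your system path, a short Python sketch like the following can help. The helper name `on_path` is hypothetical, and the executable name passed to `shutil.which` is an assumption about how the editor is named on your machine.

```python
import os
import shutil


def on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return target in entries


if __name__ == "__main__":
    # shutil.which returns the full path of a command, or None if the
    # command cannot be found on PATH. The name below is an assumption.
    print(shutil.which("notepad++"))
```

If `shutil.which` prints `None`, the editor cannot be launched by name from the command line yet.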
Download SQLite Studio to your desktop and follow the directions for installation.
We recommend Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.
The default shell is usually bash, but if your machine is set up differently you can run it by opening a terminal and typing bash. There is no need to install anything.
Kate is one option for Linux users. In a pinch, you can use nano or vi, which should be pre-installed.
sqlite3 comes pre-installed on Linux. Alternatively, you may install the Firefox SQLite browser plugin described below.
We recommend the all-in-one scientific Python installer Anaconda. (Installation requires using the shell, so if you aren't comfortable doing the installation yourself, just download the installer and we'll help you at the boot camp.)
Open a terminal, type bash Anaconda- and then press tab. The name of the file you just downloaded should appear. Press enter to start the installer. Type yes and press enter to approve the license. Press enter to approve the default location for the files. Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
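After the installer finishes, you can check whether Anaconda really is the default Python by opening a new terminal, starting python, and running a sketch like the one below. The helper `looks_like_anaconda` is hypothetical, and its test is only a heuristic: it assumes the install directory has "conda" in its name, as the default Anaconda location does.

```python
import shutil
import sys


def looks_like_anaconda(path):
    """Heuristic: True if the interpreter path contains 'conda'.

    Assumes the install directory name includes 'conda' (for example,
    'anaconda3'); a custom install location would not be detected.
    """
    return "conda" in path.lower()


if __name__ == "__main__":
    # The interpreter running this script.
    print("Running interpreter:", sys.executable)
    # The interpreter that the shell finds first on PATH (None if not found).
    print("First python on PATH:", shutil.which("python"))
    print("Looks like Anaconda:", looks_like_anaconda(sys.executable))
```

If the last line prints False, the PATH change may not have taken effect yet; opening a new terminal window is a common first thing to try.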