UM Big Data Summer Camp

University of Michigan

Weiser Hall - 500 Church St., Central Campus
June 17th-21st 2019
9:00 am - 5:00 pm

General Information

Social and organizational life are increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. UM Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.

Organizing committee: Elizabeth Bruch, Jerry Davis

Coordinator: Jeffrey W. Lockhart

Instructors: Teddy DeWitt (Management and Organizations), Nel Escher (EECS), Tom Fiore (Statistics and Math), Zander Furnas (Political Science), Jeffrey W. Lockhart (Sociology and PSC), Jerry Qiushi Yu (Political Science), Emily Sabo (Linguistics)

Speakers: Jerry Davis (Associate Dean, Ross), Joy Rohde (Interim Director, Science, Technology, and Public Policy Program), Abhraneel Sarma (HDI Lab)

Who: The course is aimed at graduate students and other researchers. This year we have 72 participants from over 30 different units on campus, including Anthropology, Bioinformatics, Climate and Space Sciences, Communication Studies, Design Science, Economics, Education, Ecology and Evolutionary Biology, Electrical Engineering and Computer Science, History, Human Genetics, Industrial and Manufacturing Systems Engineering, Industrial Operations Engineering, IT and ARC, Law, Management and Organizations, Mathematics, Political Science, Population Studies Center, Psychology, Public Policy, School for Environment and Sustainability, School of Information, Social Work, Sociology, School of Public Health, Statistics, Surgery, Transportation Research Institute, Urban Planning, and Women's Studies.

Requirements: Participants must bring a laptop and are encouraged to try installing the packages at the bottom of this page before camp. Installation support will be provided during camp, and online alternatives will be available as well.

Contact: Please email jwlock at umich dot edu for more information.


Schedule

Friday, June 14 12:00-5:00 pm Optional pre-camp setup and technical help drop-in in Weiser 706.
Monday, June 17 9:00 Introduction with Jerry Davis Slides, Video
9:45 Camp overview with Jeff Lockhart Slides
10:15 Break with coffee from Panera
10:30 Jupyter Setup and Intro to Python with Emily Sabo Slides, video
12:00 Lunch from Jerusalem Garden
1:00 Python Part Two with Jerry Qiushi Yu notebook
2:15 Python for Data and Statistics with Tom Fiore Files (download PeaZip if you need software to open zipped folders), video of lesson
3:25 Project Overview and Team Formation Slides
5:00 Depart
Tuesday 9:00 Machine Learning / Data Mining with Jeff Lockhart Slides, Notebook, Chicago data, GSS data, video
10:30 Break with coffee from Panera
10:45 Data Visualization with Tom Fiore Files, video
12:15 Lunch with guest speaker Abhraneel Sarma: "Visualization and Statistical Communication" (Catering: Tio's) slides
1:30 APIs & Web Scraping with Nel Escher Slides Scraping_Notebook API_Notebook API_Wrapper_Notebook, video part 1, video part 2
3:00 Group work time
5:00 Depart
Wednesday 9:00 Network Analysis with Zander Furnas Files, Video
10:30 Break with coffee from Panera
11:00 Text Analysis / NLP with Emily Sabo Slides, handout, video
12:30 Lunch with guest speaker Joy Rohde: "Ethics and Politics of Computational Social Science" slides (catering: Afternoon Delight)
2:00 Command Line and BASH with Nel Escher Slides, Files, BASH Cheat Sheet
3:30 Group work time
5:00 Depart
Thursday 9:00 BIG Data, PySpark, and Hadoop with Jeff Lockhart Slides, Notebook, video of lesson
10:30 Break with coffee from Panera
11:00 SQL with Teddy DeWitt Materials, video
12:15 Lunch and Guest Lightning Talks (10th floor Weiser, catering: Jerusalem Garden) Slides
1:30 Resource Fair (10th floor Weiser) Slides
2:30 Hackathon! (Group work time)
5:00 Pizza from Cottage Inn
5:00 HACKATHON! Weiser will remain open for group work until the building closes at 10.
Friday 9:00 HACKATHON!
12:00 Lunch from DiBella's
1:00 HACKATHON!
3:00 Presentations Begin
4:30 Feedback
5:30 Celebration at Dominick's

Resources

Setup

To participate in the ICOS Big Data Summer Camp, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the start of camp.

Overview of the tools

Editor

When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of key words.

The Bash Shell

Bash is a commonly-used shell. Using a shell gives you more power to do more tasks more quickly with your computer.

Python

Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all-in-one installer. One of the best is Anaconda by Continuum Analytics.
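
Once Python is installed, a quick check along these lines can report which scientific packages are importable. This is only a sketch: the package list is an assumption chosen for illustration, not an official list of camp requirements.

```python
import importlib.util

# Hypothetical package list for illustration; adjust to what your lessons need.
PACKAGES = ["numpy", "pandas", "matplotlib", "scipy"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_packages(PACKAGES).items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

If a package shows as MISSING after installing Anaconda, the most likely cause is that a different Python installation is being run.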

Jupyter Notebook

JupyterLab is a browser-based interface for interactive computing. Jupyter has support for over 100 programming languages, including those popular in data science such as R, Julia, Scala - and most importantly, Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. JupyterLab comes pre-loaded in Anaconda.

SQL

SQL is a specialized programming language used with databases. SQL is a declarative language for describing (declaring) the data you want from the database. We use SQLite Studio for the lessons. It is a relatively small application with a very intuitive GUI, and it avoids the memory allocation limits that are built into most web browsers.
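
To give a feel for what "declarative" means, here is a minimal sketch using Python's built-in sqlite3 module rather than SQLite Studio. The table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database with a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (name TEXT, unit TEXT)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?)",
    [("Ana", "Sociology"), ("Ben", "Statistics"), ("Cy", "Sociology")],
)

# Declarative: describe the rows you want, not how to loop over them.
rows = conn.execute(
    "SELECT unit, COUNT(*) FROM participants GROUP BY unit ORDER BY unit"
).fetchall()
print(rows)  # [('Sociology', 2), ('Statistics', 1)]
conn.close()
```

The query states the desired result (a count per unit) and leaves it to the database to work out how to compute it.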

Windows Setup

Duo

Set up Duo two-factor authentication so that you have access to secure and high-performance computing resources at UM.

Python

  • Download and install Anaconda. Specifically, download the graphical installer for the 3.7 version.
  • Use all of the defaults for installation except make sure to check Make Anaconda the default Python.

Editor (OPTIONAL)

Notepad++ is a popular free code editor for Windows. (NOTE: Be aware that you must add its installation directory to your system path in order to launch it from the command line or have other tools like Git launch it for you. Please ask a TA to help you with this if you are interested.)

Bash Shell on Windows

  • Windows 10 - Please go to the following website which has great instructions for installing the Bash Shell for Windows.
  • Windows 8 and earlier - Please go to the following website and follow the instructions for installing Cygwin, a Unix-like environment that will give you access to Bash. The installation is slightly more involved. If you find yourself having trouble installing it, we will take care of it on the first day of camp.

SQLite Studio

Download SQLite Studio to your desktop and follow the directions for installation.

Mac OS X Installation

Duo

Set up Duo two-factor authentication so that you have access to secure and high-performance computing resources at UM.

Python

  • Download and install Anaconda. Specifically, download the graphical installer for the 3.7 version.
  • Use all of the defaults for installation except make sure to check Make Anaconda the default Python.

Editor (OPTIONAL)

We recommend Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.

SQLite Studio

Download SQLite Studio to your desktop and follow the directions for installation.

Linux

Duo

Set up Duo two-factor authentication so that you have access to secure and high-performance computing resources at UM.

Bash

The default shell is usually bash, but if your machine is set up differently you can run it by opening a terminal and typing bash. There is no need to install anything.
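
If you are unsure which shell your machine uses, a small sketch like this one can report it from Python. It assumes a Unix-like system where the SHELL environment variable is set by the login session; the function name is hypothetical.

```python
import os
import shutil


def describe_shell():
    """Return (login_shell, bash_path); either may be None if unknown."""
    login_shell = os.environ.get("SHELL")  # set by most Unix login sessions
    bash_path = shutil.which("bash")       # None if bash is not on the PATH
    return login_shell, bash_path


login_shell, bash_path = describe_shell()
print("Login shell:", login_shell or "unknown")
print("bash found at:", bash_path or "not found")
```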

Editor

Kate is one option for Linux users. In a pinch, you can use nano or vi, which should be pre-installed.

SQLite

sqlite3 comes pre-installed on Linux. Alternatively, you may install the Firefox SQLite browser plugin described below.

Python

We recommend the all-in-one scientific Python installer Anaconda. (Installation requires using the shell and if you aren't comfortable doing the installation yourself just download the installer and we'll help you at the boot camp.)

  1. Download the installer that matches your operating system and save it in your home folder.
  2. Open a terminal window.
  3. Type
    bash Anaconda-
    and then press tab. The name of the file you just downloaded should appear.
  4. Press enter. You will follow the text-only prompts. When there is a colon at the bottom of the screen press the down arrow to move down through the text. Type yes and press enter to approve the license. Press enter to approve the default location for the files. Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
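
After the installer finishes, you may want to confirm that the Anaconda distribution is the Python your terminal actually runs. The sketch below is one way to do that from a new terminal window. The "conda" substring test is only a heuristic assumed here for illustration, and the function name is hypothetical.

```python
import sys


def python_summary():
    """Return the interpreter path, version, and a guess at whether it is Anaconda."""
    path = sys.executable or ""
    # Heuristic only: Anaconda installs usually have "conda" in the install path
    # or in the build string reported by sys.version.
    looks_like_conda = "conda" in path.lower() or "conda" in sys.version.lower()
    return {
        "path": path,
        "version": sys.version.split()[0],
        "anaconda": looks_like_conda,
    }


info = python_summary()
print("Interpreter:", info["path"])
print("Version:", info["version"])
print("Looks like Anaconda:", info["anaconda"])
```

If the path points somewhere other than your Anaconda folder, the PATH change from the last installer step has probably not taken effect yet; opening a new terminal usually fixes this.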