# Scraping Reddit

After creating a Reddit account, go to https://www.reddit.com/prefs/apps. Create a new app, select the 'script' option, and set the redirect URI to http://localhost:8000.

Check out the PRAW documentation! https://praw.readthedocs.io/en/latest/

## Imports
- `pandas` is a package for handling and manipulating data.
- `praw` (Python Reddit API Wrapper) is a package that helps you use the Reddit API in Python.
- `time` is a standard-library module that lets you work with time in Python.
- `json` is a standard-library module for handling data in JSON format.

In [None]:
import pandas as pd
import praw
import time
import json

## Keys and Credentials
- You'll need the API keys from the app you created to use the Reddit API. Fill them in below.

In [None]:
keys = {
    'client_id': 'SHORT_STRING_OF_GARBAGE',
    'client_secret': 'LONGER_STRING_OF_GARBAGE'
}

app_name = "Data Camp"

user_info = {
    'username':'YOUR_USERNAME',
    'password':'YOUR_PASSWORD'
}
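
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. One possible alternative, sketched below, is to read them from environment variables. The variable names (`REDDIT_CLIENT_ID` and so on) and the helper `load_credentials` are hypothetical, chosen for this illustration; they are not defined by PRAW or Reddit.

```python
import os


def load_credentials(env=os.environ):
    """Build the keys and user_info dicts from environment variables.

    The variable names below are assumptions made for this sketch.
    """
    required = [
        "REDDIT_CLIENT_ID",
        "REDDIT_CLIENT_SECRET",
        "REDDIT_USERNAME",
        "REDDIT_PASSWORD",
    ]
    # fail early with a clear message if anything is unset or empty
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))

    keys = {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
    }
    user_info = {
        "username": env["REDDIT_USERNAME"],
        "password": env["REDDIT_PASSWORD"],
    }
    return keys, user_info
```

Calling `keys, user_info = load_credentials()` would then produce the same two dictionaries that are filled in by hand above.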

### Using keys
- Give your credentials and keys to `praw` so that it can use them to get the data you request.

In [None]:
reddit = praw.Reddit(
    client_id=keys['client_id'],
    client_secret=keys['client_secret'],
    user_agent=app_name,
    username=user_info['username'],
    password=user_info['password'],
)

## Using PRAW models

In [None]:
# get an instance of the PRAW subreddit model for the NASA subreddit
nasa = reddit.subreddit('nasa')

# get the 5 hottest posts in the NASA subreddit
nasa_submissions = nasa.hot(limit=5)

# loop through the posts and print their titles
for submission in nasa_submissions:
    print(submission.title)

### Try it out!

In the cell below, get the 10 hottest submissions for a subreddit of your choice and print out their titles.



In [None]:
# get an instance of the PRAW subreddit model for your chosen subreddit

# get the 10 hottest posts in your chosen subreddit

# loop through the posts and print their titles


[Challenge] Use the documentation to print the titles of the top submissions from user GovSchwarzenegger.

https://praw.readthedocs.io/en/latest/code_overview/models/redditor.html#praw.models.Redditor.top

In [None]:
# get top submissions for GovSchwarzenegger

# loop through the submissions and print titles


## A view of the Democratic 2020 Presidential Candidates
Some say it's hard to tell them apart. Let's get the top submissions from subreddits dedicated to different candidates and see if we can sort them out.

In [None]:
candidates = ['SandersForPresident', 'ElizabethWarren', 'JoeBiden', 'Pete_Buttigieg', 'Kamala', 'YangForPresidentHQ']

subreddits = [reddit.subreddit(i) for i in candidates]

In [None]:
all_candidates_posts = []

for subreddit in subreddits:
    subreddit_name = subreddit.display_name
    for submission in subreddit.top(limit=100):
        candidate_posts = {}
        candidate_posts["title"] = submission.title
        candidate_posts["score"] = submission.score
        candidate_posts["id"] = submission.id
        candidate_posts["url"] = submission.url
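        # created_utc is the submission time as a Unix timestamp (UTC)
        candidate_posts["created"] = submission.created_utc
        # author is a Redditor object (or None), so store its name as text
        candidate_posts["author"] = str(submission.author) if submission.author else None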
        candidate_posts["body"] = submission.selftext
        candidate_posts["name"] = subreddit_name
        all_candidates_posts.append(candidate_posts)
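
The loop body above can also be factored into a small helper, which makes the field handling easier to check without calling the API. This is a sketch, not part of PRAW: `submission_to_record` is a hypothetical name, and the `SimpleNamespace` object is a made-up stand-in carrying only the attributes the helper reads. It reads `created_utc`, assumes that value is a Unix timestamp in seconds, and assumes `author` can be `None` (for example when the account has been deleted).

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def submission_to_record(submission, subreddit_name):
    """Flatten one submission-like object into a plain dict."""
    return {
        "title": submission.title,
        "score": submission.score,
        "id": submission.id,
        "url": submission.url,
        # convert the Unix timestamp (seconds) to a timezone-aware datetime
        "created": datetime.fromtimestamp(submission.created_utc, tz=timezone.utc),
        # store the author's name as text; None if there is no author
        "author": str(submission.author) if submission.author is not None else None,
        "body": submission.selftext,
        "name": subreddit_name,
    }


# made-up stand-in for a PRAW submission, for illustration only
fake_submission = SimpleNamespace(
    title="Example title",
    score=42,
    id="abc123",
    url="https://example.com",
    created_utc=0,
    author=None,
    selftext="",
)

record = submission_to_record(fake_submission, "ExampleSubreddit")
print(record["created"].isoformat())  # 1970-01-01T00:00:00+00:00
print(record["author"])  # None
```

Inside the scraping loop, the helper would be used as `all_candidates_posts.append(submission_to_record(submission, subreddit_name))`.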

## Convert the data to pandas

In [None]:
df = pd.DataFrame(all_candidates_posts)
df.head()

In [None]:
# keep only the rows that came from the JoeBiden subreddit
jb = df[df["name"] == "JoeBiden"]
jb.head()
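
With the data in a DataFrame, per-subreddit comparisons take a line or two. The sketch below uses a few made-up rows in place of the scraped data (the titles, scores, and timestamps are placeholders, not real results) and assumes the `created` column holds Unix timestamps in seconds. The names `sample_posts`, `sample_df`, and `mean_scores` are introduced only for this illustration.

```python
import pandas as pd

# placeholder rows standing in for all_candidates_posts; values are invented
sample_posts = [
    {"title": "Post A", "score": 10, "created": 0, "name": "JoeBiden"},
    {"title": "Post B", "score": 30, "created": 86400, "name": "JoeBiden"},
    {"title": "Post C", "score": 50, "created": 172800, "name": "Kamala"},
]
sample_df = pd.DataFrame(sample_posts)

# convert Unix timestamps (seconds) to timezone-aware datetimes
sample_df["created_at"] = pd.to_datetime(sample_df["created"], unit="s", utc=True)

# average score per subreddit
mean_scores = sample_df.groupby("name")["score"].mean()
print(mean_scores)
```

The same two steps should carry over to `df`, provided its `created` column still holds numeric timestamps.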