Python Reddit Bot Tutorial

Sep 3, 2020

In this tutorial we will learn how to build a Python Reddit bot using the requests module. The bot will perform a single task: listening to all new comments on a subreddit and checking whether they contain any of a specified set of phrases. If a comment contains one or more of these phrases, the bot will send us a notification with the details of that comment and the phrases it matched.

Background for our Python Reddit bot

There are well-established tools and approaches that make the process of building Python Reddit bots extremely simple. Most such techniques require registering a developer account on Reddit and making calls to its API. There is also a popular library called PRAW, a wrapper over Reddit's API that does a lot of heavy lifting to make building Python Reddit bots much easier for the developer. However, this approach might be overkill if we only want to read the public information present on Reddit rather than publish our own content to the site. In that case we can avoid the hassle of creating a developer account or using a third-party Reddit API wrapper like PRAW, and instead tackle the problem in a much simpler and more lightweight manner. We will discuss this method in this Python Reddit bot tutorial!

Overview

We will use the popular Python HTTP client library requests to make our HTTP requests. Reddit provides a JSON endpoint for each subreddit that lists the latest comments across the entire subreddit. We simply need to download this JSON file periodically and parse it to obtain all the latest comments. This lets us keep track of all the comments and watch for any set of keywords, all without having to log in to Reddit or use a wrapper like PRAW.
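
For reference, the endpoint returns a Reddit "Listing" object. An abridged sketch of its shape, keeping only the fields we will actually read (the values here are illustrative), looks like this:

{
    "kind": "Listing",
    "data": {
        "children": [
            {
                "kind": "t1",
                "data": {
                    "author": "some_user",
                    "permalink": "/r/Python/comments/abc123/some_post/def456/",
                    "body": "Text of the comment",
                    "created_utc": 1599150000.0
                }
            }
        ]
    }
}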

Downloading and parsing the comments

We will start with downloading the JSON file for any specified subreddit and iterating through all its comments.

import requests

# Use a browser-like User-Agent; the default requests user-agent is
# blocked by Reddit's servers.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0"}


def get_comments(subreddit, limit=100):
    # Fetch the subreddit's latest comments as JSON.
    res = requests.get("https://www.reddit.com/r/%s/comments.json?limit=%d" %
                       (subreddit, limit), headers=headers)

    # The comments are nested under data -> children in the Listing object.
    comments = res.json()["data"]["children"]
    for c in comments:
        comment = c["data"]
        print("Author: " + comment["author"])
        print("Permalink: " + comment["permalink"])
        print("Body: " + comment["body"])
        print("=" * 30)


if __name__ == "__main__":
    get_comments("python")

Running the above piece of code we get the following output (parts of the output trimmed for brevity):

Author: metaperl
Permalink: /r/Python/comments/il69g7/made_a_cli_tool_called_gee_to_speed_up_my_common/g3q1edh/
Body: where is the source? what command-line arg parser did you use?
==============================
Author: metaperl
Permalink: /r/Python/comments/il6s2l/check_out_this_tool_i_made_for_viewing_filtering/g3q18jo/
Body: I'm rather  new on pandas so forgive a beginner question: does this overlap in features with a Jupyter notebook?
==============================
Author: alexmojaki
Permalink: /r/Python/comments/il6s2l/check_out_this_tool_i_made_for_viewing_filtering/g3q11ct/
Body: Reminds me of https://bamboolib.8080labs.com/
==============================
Author: sqlphilosopher
Permalink: /r/Python/comments/il7e9n/made_custom_command_called_https_in_linux_using/g3pykmb/
Body: Very cool, consider adding this to the official Debian repository
==============================
Author: perryplatt
Permalink: /r/Python/comments/il3ydr/map_creator_made_using_python/g3pyaqr/
Body: If you could include fog of war, you have the start of an rts.
==============================
Author: mattwandcow
Permalink: /r/Python/comments/ikod5j/automate_the_boring_stuff_with_python_online/g3py4jj/
Body: I passed the link to an interested friend. I grabbed this a while back, loved the course and bought the book for reference. Thank you for creating such a wonderful teaching tool
==============================
Author: Karki2002
Permalink: /r/Python/comments/il3ydr/map_creator_made_using_python/g3pxn5d/
Body: Thanks a lot for the support :)
==============================
Author: Karki2002
Permalink: /r/Python/comments/il3ydr/map_creator_made_using_python/g3px94o/
Body: Yeah, I’ve been looking for someone to help make some pixel art for me, but it doesn’t really come free. So I thought the next best thing would be to checkout Open Game Art... I’m shit at art.
==============================
Author: vreo
Permalink: /r/Python/comments/ikliwj/web_scraping_1010_with_python/g3psngk/
Body: Oh... hehehe woosh
==============================
Author: QuantumCoder002
Permalink: /r/Python/comments/il3ydr/map_creator_made_using_python/g3ps07p/
Body: upvoted this post in r/pygame and here, its just so awesome !!!!!!!
==============================

We can see that we are able to get the information needed to track any set of keywords on a periodic basis. Now let's look into the code.

The main activity in our code happens in the call to requests.get. This sends a GET request to a dynamically generated URL, which is the endpoint for the JSON file we need. After inspecting the response JSON we can quickly figure out its format; based on that, we iterate through all the comments by parsing the JSON file and reading the fields we are interested in. It's also important to set the User-Agent header to something custom or to emulate a real browser. Not doing so will use requests' default user-agent value, which is blocked by Reddit's servers.
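
Note that the code above assumes every request succeeds. Reddit can rate-limit unauthenticated clients (often with an HTTP 429 response), so it's worth checking the status code before parsing. A minimal sketch, using a hypothetical get_comments_safe variant that simply skips a failed fetch:

def get_comments_safe(subreddit, limit=100):
    res = requests.get("https://www.reddit.com/r/%s/comments.json?limit=%d" %
                       (subreddit, limit), headers=headers)
    if res.status_code != 200:
        # Rate-limited or otherwise rejected; skip this cycle.
        print("Request failed with status", res.status_code)
        return []
    return res.json()["data"]["children"]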

Setting up periodic fetching of data for our Python Reddit bot

We can now call a function to obtain the details of the latest comments on a subreddit. Next we need to do this on a periodic basis so we can scan for keywords 24x7. Let's jump into the code.

import requests
import time
from datetime import datetime, timezone

# Use a browser-like User-Agent; the default requests user-agent is
# blocked by Reddit's servers.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0"}

# Phrases to monitor, grouped under a "kind" label.
mock_keywords = {
    "art": ["pixel art", "image processing"],
    "env": ["virtualenv", "pyenv", "poetry"],
    "superlative": ["the best"],
}

# Authors whose comments should never trigger a notification.
muted_users = ["AutoModerator"]


def send_notification(title: str, body: str):
    # Mocked for now: just print the notification to the console.
    print("--- Message ---")
    print("Title:", title)
    print(body)
    print("-" * 30)


def fetch_comment_data_periodically(subreddit: str, limit: int = 100):
    # Create the permalink log file if it doesn't exist yet.
    with open("read_comment_permalinks.txt", "a") as f:
        pass

    while True:
        # Load the permalinks of comments we have already notified about.
        with open("read_comment_permalinks.txt", "r") as f:
            read_comments = [l.rstrip() for l in f.read().split("\n")]
        comments = get_comments(subreddit, limit)
        for comment, body in comments:
            matches = scan_comment_for_keywords(body, comment, mock_keywords)
            if not matches:
                continue
            if comment["permalink"] in read_comments:
                continue
            if comment["author"] in muted_users:
                continue

            # Convert the comment's UTC timestamp to local time.
            utc = datetime.utcfromtimestamp(
                comment["created_utc"]).replace(tzinfo=timezone.utc)
            local_time = utc.astimezone(tz=None)
            match_text = ", ".join([m["kind"] for m in matches])
            comment_text = "Author: " + comment["author"]
            comment_text += "\nPermalink: " + comment["permalink"]
            comment_text += "\nCreated: " + \
                local_time.strftime("%d %b, %H:%M:%S")
            comment_text += "\nBody: " + comment["body"]
            send_notification("Match for '%s'" % match_text, comment_text)
            # Record the permalink so we don't notify about it again.
            read_comments.append(comment["permalink"])
            with open("read_comment_permalinks.txt", "a") as f:
                f.write(comment["permalink"] + "\n")
        time.sleep(60)


def get_comments(subreddit: str, limit: int = 100):
    res = requests.get("https://www.reddit.com/r/%s/comments.json?limit=%d" %
                       (subreddit, limit), headers=headers)

    comments = res.json()["data"]["children"]
    results = []
    for c in comments:
        comment = c["data"]
        results.append((comment, comment["body"]))
    return results


def scan_comment_for_keywords(comment: str, comment_obj: dict, keywords: dict):
    # For each keyword group, record a match if any of its phrases
    # appears in the (lowercased) comment body.
    matches = []
    for k, phrases in keywords.items():
        for p in phrases:
            if p in comment.lower():
                matches.append({"kind": k, "comment": comment_obj})
                break
    return matches


if __name__ == "__main__":
    fetch_comment_data_periodically("python")

We start by creating a mock dictionary containing the set of keywords we want to monitor. For now we have mocked sending the notification, but this can easily be replaced for real usage: we have multiple options such as sending an email to ourselves, using a Discord/Telegram bot, or sending a web push notification. You can read our articles on the best Python automation tools for more options. The last function in our code performs a very simple substring search for each keyword in a given comment. We could improve its performance with more suitable data structures, but this will suffice for simple cases.
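
As one example, here is a minimal sketch of a send_notification that emails us instead, using the standard library's smtplib. The SMTP host, addresses, and password below are placeholders for illustration:

import smtplib
from email.message import EmailMessage


def send_notification(title: str, body: str):
    # Build a plain-text email from the notification title and body.
    msg = EmailMessage()
    msg["Subject"] = title
    msg["From"] = "bot@example.com"  # placeholder sender
    msg["To"] = "me@example.com"     # placeholder recipient
    msg.set_content(body)

    # Connect over SSL and send; credentials are placeholders.
    with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
        server.login("bot@example.com", "app-password")
        server.send_message(msg)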

The main addition in this code is the fetch_comment_data_periodically function, which runs in an infinite loop, sleeping for one minute after every cycle. In each cycle we fetch the JSON file from Reddit and read the comments, looking for keyword matches. If there is a match we send ourselves a notification. We also store the permalinks of matched comments in a text file, which we use to check whether we have already seen a comment, avoiding duplicate notifications.
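
One small refinement worth noting: as the permalink file grows, membership checks against a list get slower. Loading the permalinks into a set gives constant-time lookups instead. A minimal sketch of the change:

# Load seen permalinks into a set for O(1) membership checks.
with open("read_comment_permalinks.txt", "r") as f:
    read_comments = {line.rstrip() for line in f if line.strip()}

The read_comments.append(...) call later in the loop would then become read_comments.add(...).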

That's it for this tutorial! There's scope for further enhancement, such as setting up a daemon for this process so it runs unattended.


