Python for Everybody: Learn Python from Scratch

The Python for Everybody Capstone is an optional Honors assignment that involves retrieving and processing email data from the Sakai open-source project. Sakai is a learning management system used by many universities and organizations. The email data contains the messages exchanged by Sakai developers and users over several years. The goal of this assignment is to analyze the email data and visualize aspects of it, such as the frequency distribution of senders and the activity over time.
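As a rough illustration of what "frequency distribution of senders" means, the sketch below counts messages per sender in a few mbox-style lines. It assumes the standard mbox convention that each message begins with a separator line starting with "From " (no colon). The function name count_senders and the sample addresses are made up for this example and are not part of the course code.

```python
from collections import Counter


def count_senders(lines):
    """Count messages per sender in mbox-style text (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Each message starts with a "From " separator line; the header
        # line "From: ..." (with a colon) is deliberately not counted.
        if line.startswith("From "):
            parts = line.split()
            if len(parts) >= 2:
                counts[parts[1]] += 1
    return counts


# Invented sample data, only for illustration.
sample = [
    "From alice@example.org Sat Jan  5 09:14:16 2008",
    "Subject: test",
    "From: alice@example.org",
    "From bob@example.com Sat Jan  5 10:00:00 2008",
    "From alice@example.org Sun Jan  6 08:00:00 2008",
]

top = count_senders(sample).most_common(2)
print(top)
```

Sorting the counts with most_common gives the senders in descending order of activity, which is the kind of ranking the assignment visualizes.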

To complete this assignment, you will need to do the following steps:

  • Download the email data from this link and save it as a file named mbox.txt.
  • Run the gmane.py program, which will read the email data and store it in a SQLite database named content.sqlite. This program will also clean up the data and extract some features, such as the sender, the subject, the date, and the organization.
  • Run the gmodel.py program, which reads content.sqlite and creates a second SQLite database named index.sqlite. This program cleans up and normalizes the email data, for example by mapping the different addresses a sender has used to a single identity, so that the later analysis steps work on a smaller, more consistent data set.
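  • Run the gbasic.py program, which prints a summary of the most active senders and organizations. Then run the gword.py program, which produces a word cloud visualization of the most frequent words in the email data. You can view the word cloud by opening the file named gword.htm in a browser.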
  • Run the gline.py program, which will produce a timeline visualization of the email activity over time. You can view the timeline in a file named gline.htm.
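The store-then-analyze pattern behind these steps can be sketched with Python's built-in sqlite3 module. The example below is a minimal sketch and not the course's actual code: the table name Messages, its columns, and the sample rows are assumptions made for illustration, and an in-memory database stands in for a file such as content.sqlite.

```python
import sqlite3

# In-memory database for the sketch; the assignment itself writes to files.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema, not the real layout of content.sqlite.
cur.execute(
    "CREATE TABLE Messages ("
    "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, sent_at TEXT)"
)

# Invented sample rows with ISO 8601 timestamps.
rows = [
    ("alice@example.org", "Build failure", "2008-01-05T09:14:16"),
    ("bob@example.com", "Re: Build failure", "2008-01-05T10:00:00"),
    ("alice@example.org", "Release notes", "2008-02-11T08:30:00"),
]
cur.executemany(
    "INSERT INTO Messages (sender, subject, sent_at) VALUES (?, ?, ?)", rows
)
conn.commit()

# Activity over time: the first 7 characters of the timestamp give YYYY-MM.
cur.execute(
    "SELECT substr(sent_at, 1, 7) AS month, COUNT(*) "
    "FROM Messages GROUP BY month ORDER BY month"
)
per_month = cur.fetchall()
print(per_month)

conn.close()
```

Grouping by month in SQL keeps the aggregation inside the database, so a timeline script only needs to format the resulting (month, count) pairs for display.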

A simple example of the output of this assignment is shown below:
