Python for Everybody Capstone is an optional Honors assignment that involves retrieving and processing email data from the Sakai open source project. Sakai is a learning management system that is used by many universities and organizations. The email data contains the messages exchanged by the developers and users of Sakai over several years. The goal of this assignment is to analyze the email data and visualize some aspects of it, such as the frequency distribution of the senders and the activity over time.
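As a rough illustration of what "frequency distribution of the senders" and "activity over time" mean in practice, the sketch below counts messages per sender and per month. It assumes the data uses the classic mbox layout, where each message begins with an envelope line such as `From alice@example.edu Sat Jan  5 09:14:16 2008`. The sample text, the addresses, and the `summarize` helper are hypothetical and are not part of the assignment code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample in the classic mbox style (an assumption about the
# data format); the real mbox.txt is far larger.
SAMPLE = """\
From alice@example.edu Sat Jan  5 09:14:16 2008
Subject: first message
From bob@example.org Fri Jan  4 18:10:48 2008
Subject: second message
From alice@example.edu Mon Feb 11 11:37:30 2008
Subject: third message
"""


def summarize(text):
    """Count messages per sender and per calendar month (YYYY-MM)."""
    senders = Counter()
    months = Counter()
    for line in text.splitlines():
        # Envelope lines start with "From " (no colon); header lines use "From:".
        if not line.startswith("From "):
            continue
        parts = line.split()
        if len(parts) < 7:
            continue  # skip malformed envelope lines
        senders[parts[1]] += 1
        # parts[2:7] looks like: Sat Jan 5 09:14:16 2008
        stamp = datetime.strptime(" ".join(parts[2:7]), "%a %b %d %H:%M:%S %Y")
        months[stamp.strftime("%Y-%m")] += 1
    return senders, months


senders, months = summarize(SAMPLE)
print(senders.most_common(2))   # senders ordered by message count
print(sorted(months.items()))   # message counts in chronological order
```

The two counters correspond to the two views the assignment asks for: one keyed by sender, one keyed by time period.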
To complete this assignment, work through the following steps:
- Download the email data from this link and save it as a file named mbox.txt.
- Run the gmane.py program, which will read the email data and store it in a SQLite database named content.sqlite. This program will also clean up the data and extract some features, such as the sender, the subject, the date, and the organization.
- Run the gmodel.py program, which will create a second SQLite database named index.sqlite. This program will model the email data as a network of nodes and edges, where the nodes are the senders and the edges are the messages. It will also compute some metrics, such as the rank and the centrality, for each node.
- Run the gword.py program, which will produce a word cloud visualization of the most frequent words in the email data. You can view the word cloud in a file named gword.htm.
- Run the gline.py program, which will produce a timeline visualization of the email activity over time. You can view the timeline in a file named gline.htm.
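The steps above follow a store-then-query pattern, which can be sketched in a few lines. This is a minimal sketch and not the code of the assignment scripts: the `Messages` table, its columns, and the sample rows are hypothetical, and an in-memory database stands in for files such as content.sqlite.

```python
import sqlite3
from collections import Counter

# Hypothetical rows standing in for parsed messages: (sender, subject, ISO date).
ROWS = [
    ("alice@example.edu", "Re: gradebook import error", "2008-01-05"),
    ("bob@example.org", "gradebook export question", "2008-01-04"),
    ("alice@example.edu", "Re: gradebook import error", "2008-02-11"),
]

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages ("
    "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, sent_at TEXT)"
)
conn.executemany(
    "INSERT INTO Messages (sender, subject, sent_at) VALUES (?, ?, ?)", ROWS
)
conn.commit()

# Sender frequency: the kind of summary a "top senders" report needs.
top_senders = conn.execute(
    "SELECT sender, COUNT(*) AS n FROM Messages "
    "GROUP BY sender ORDER BY n DESC, sender"
).fetchall()

# Activity over time: substr(sent_at, 1, 7) keeps the 'YYYY-MM' prefix.
per_month = conn.execute(
    "SELECT substr(sent_at, 1, 7) AS month, COUNT(*) FROM Messages "
    "GROUP BY month ORDER BY month"
).fetchall()

# Word frequency over subjects: the raw counts behind a word cloud.
words = Counter()
for (subject,) in conn.execute("SELECT subject FROM Messages"):
    for word in subject.lower().split():
        word = word.strip(":,.!?")
        if len(word) >= 4:  # drop short tokens such as "re"
            words[word] += 1

conn.close()

print(top_senders)
print(per_month)
print(words.most_common(3))
```

Keeping the parsed messages in a database means each later step only needs a query, so the visualizations can be regenerated without re-reading the raw email data.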
A simple example of the output of this assignment is shown below: