Podcasting True Crime Cultural Data Analysis


The purpose of this project was to locate a humanities-focused dataset and analyze it using computational and quantitative methods at a larger scale than traditional methods would allow. The project concluded with a critical essay that uses computational techniques as a method alongside critical theory.


The dataset acquired for this project was a compilation of podcast feeds obtained from the PodcastRE Archive. I chose to track the growth of podcasts as a medium, and the types of categories assigned to them, through the metadata found in RSS feeds. In addition to tracking these general trends, I was also interested in focusing on true crime as an area for future study. I chose Python to carry out the computational methods and Tableau to visualize the results.

My interest in true crime as a genre stems from two separate and arguably related developments. The first is the debut and subsequent popularity of the podcast Serial in 2014. Serial is hosted by public radio personality Sarah Koenig and revisits the 1999 murder investigation surrounding the death of teenager Hae Min Lee. The second is that, while Apple had supported podcasts through iTunes since 2005, an iOS update in 2014 made the Podcasts app a default app on its devices. This update is thought to be a possible reason why the series garnered many more listeners than podcasts before it.

Data Collection 

The original dataset consisted of 2,383 podcast RSS feeds in XML format. Each file corresponded to an individual podcast feed that contained anywhere from one to 200+ episode entries. A sample RSS feed is pictured below. 
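As a rough illustration of this structure, the sketch below parses one feed with Python's standard library and counts its episode entries. The miniature feed and the function name summarize_feed are hypothetical stand-ins, not files or code from the project. It assumes the common RSS 2.0 layout in which each episode is an item element inside channel.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature feed; real feeds in the archive are far longer.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <pubDate>Fri, 03 Oct 2014 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Episode 2</title>
      <pubDate>Fri, 10 Oct 2014 10:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def summarize_feed(xml_text):
    """Return the show title and the number of episode entries in one feed."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    title = channel.findtext("title", default="")
    # In RSS 2.0, each <item> under <channel> is one episode entry.
    episodes = channel.findall("item")
    return title, len(episodes)


title, episode_count = summarize_feed(SAMPLE_FEED)
print(title, episode_count)
```

Running a function like this over every file would give a quick sense of how many episodes each feed contributes before any heavier processing.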

Parsing the Dataset

In my first attempt at computational methods, the sheer volume of the data proved too large for my Jupyter Notebook to process. It was clear that I needed to pare the data down so that I would have a smaller set to work with more effectively. The podcast files had no clear naming structure or organization, which meant that I had to go through the files manually and select the ones I wanted to include. Since I was interested in True Crime as a genre, I eliminated business, technology, and finance-related content because I didn’t think it fit the topics I wanted to study. Leaving these in could have provided a better sense of the growth of content within the medium, as well as a fuller picture of which iTunes categories are represented in this dataset.

Ultimately, I analyzed two sets of data. The first contained 180 podcast feeds and was made up of True Crime content as well as other entertainment, society and culture, and history content. The second contained 45 podcasts devoted solely to True Crime. Though these two sets are smaller than the original, my goal was to show the growth of podcasts as a medium as well as the growth of the True Crime genre within the larger set.

Resulting Data and Visualizations

Growth Over Time

To gauge growth over time, I used Python to pull the "publish date" of each episode from each dataset and export it to a CSV. I then visualized the data in Tableau to better understand the growth rate. According to the literature I consulted, podcasts as a medium became more sustainable once iTunes started supporting the feature in 2004. As you can see in the figure below (left), the medium has grown rapidly, especially in the last three years.
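A minimal sketch of this extraction step might look like the following. It is not the script used for the project. The sample feed, the function names, and the CSV column names are all hypothetical, and it assumes the publish date is stored in each item's pubDate element in the usual RFC 822 style. Episodes with a missing or unreadable date are simply skipped here.

```python
import csv
import io
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical miniature feed used only to demonstrate the steps.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item><title>Episode 1</title><pubDate>Fri, 03 Oct 2014 10:00:00 +0000</pubDate></item>
    <item><title>Episode 2</title><pubDate>Thu, 12 Feb 2015 10:00:00 +0000</pubDate></item>
    <item><title>Undated episode</title></item>
  </channel>
</rss>"""


def extract_publish_dates(xml_text):
    """Return (show, episode, ISO date) rows for every dated episode in a feed."""
    rows = []
    channel = ET.fromstring(xml_text).find("channel")
    show = channel.findtext("title", default="")
    for item in channel.findall("item"):
        raw = item.findtext("pubDate")
        if not raw:
            continue  # no publish date recorded for this episode
        try:
            published = parsedate_to_datetime(raw.strip())
        except (TypeError, ValueError):
            continue  # date string could not be parsed
        episode = item.findtext("title", default="")
        rows.append((show, episode, published.date().isoformat()))
    return rows


def write_rows(rows, handle):
    """Write the rows to an open text handle as CSV with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["show", "episode", "publish_date"])
    writer.writerows(rows)


# Written to an in-memory buffer here; a real run would open a .csv file instead.
buffer = io.StringIO()
write_rows(extract_publish_dates(SAMPLE_FEED), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Storing the date in ISO format (year-month-day) keeps the column easy for a tool such as Tableau to read as a date field.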

In the PodcastRE archive, the first True Crime series appears in 2007. Since this is the genre I was interested in exploring, I used this year as the starting point for all of my comparisons and observations. Additionally, I measured growth up to and after 2014. I felt this was important because it was the year that Serial premiered, which has been cited as integral to the inception and growth of the true crime genre. It is also, as noted before, the year of the iOS update that made the Podcasts app a default app on Apple devices, which is also considered to have contributed to the medium’s growth.
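The comparison around 2014 can be sketched as a simple per-year tally. The dates below are made up for illustration, and the function names are hypothetical. Treating 2014 itself as part of the "after" group is my assumption here, not a rule taken from the analysis.

```python
from collections import Counter

# Hypothetical publish dates in ISO format, such as those in an exported CSV.
PUBLISH_DATES = [
    "2007-05-01", "2009-03-14", "2013-11-02",
    "2014-10-03", "2014-12-18", "2015-02-12", "2016-06-30",
]


def episodes_per_year(dates, start_year=2007):
    """Count episodes per calendar year, ignoring years before start_year."""
    counts = Counter(int(date[:4]) for date in dates)
    return {year: counts[year] for year in sorted(counts) if year >= start_year}


def split_at(per_year, pivot_year=2014):
    """Total episodes before the pivot year and from the pivot year onward."""
    before = sum(n for year, n in per_year.items() if year < pivot_year)
    after = sum(n for year, n in per_year.items() if year >= pivot_year)
    return before, after


per_year = episodes_per_year(PUBLISH_DATES)
before, after = split_at(per_year)
print(per_year)
print("before 2014:", before, "| 2014 onward:", after)
```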

Growth rate of podcasts as a medium since 2004 when iTunes started supporting podcasts.

Growth rate of True Crime related podcast content since 2007. 

Podcast Categories

Before discussing my observations about the category field, I will first describe how Apple defines categories. Apple currently provides 15 parent categories that content creators can choose from to make their content findable. Within these 15 parent categories, there are sub-categories that content creators can also choose from. After selecting three parent categories, creators can add additional sub-categories.

I found that the majority of true crime content is designated under the Arts, News and Politics, and Society and Culture parent categories, with Government and Organizations and Comedy showing fewer designations. I did not find the News and Politics and Society and Culture designations surprising, as there are multiple news-related programs that have true crime series on various networks. Most notably, 20/20 has an investigative series that airs on Investigation Discovery, a television network dedicated to this type of content. Within the Society and Culture category, History and Personal Journals also have a sizeable number of designations. Depending on the specific series, this is not surprising either, as this kind of content may draw on personal letters, journals, or interviews. I was slightly surprised by the Comedy designation because it does not seem like a natural fit for this type of content. One example is My Favorite Murder, a true crime podcast that presents its material in a comedic way. Additionally, the creators of this podcast, Karen Kilgariff and Georgia Hardstark, openly talk about mental health issues that might otherwise be stigmatized.
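To make the parent and sub-category structure concrete, here is a rough sketch of how category designations could be tallied across feeds. It assumes that feeds declare categories with nested itunes:category elements under the iTunes podcast namespace. The two sample feeds and the function name feed_categories are hypothetical, and each parent category is counted once per feed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the itunes: tags in podcast feeds.
ITUNES_NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Hypothetical feeds reduced to their category tags.
SAMPLE_FEEDS = [
    """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Show A</title>
    <itunes:category text="Society &amp; Culture">
      <itunes:category text="History"/>
    </itunes:category>
    <itunes:category text="Comedy"/>
  </channel>
</rss>""",
    """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Show B</title>
    <itunes:category text="Society &amp; Culture">
      <itunes:category text="Personal Journals"/>
    </itunes:category>
    <itunes:category text="News &amp; Politics"/>
  </channel>
</rss>""",
]


def feed_categories(xml_text):
    """Return (parent, sub-category) pairs; sub-category is None when absent."""
    channel = ET.fromstring(xml_text).find("channel")
    pairs = []
    for parent in channel.findall("itunes:category", ITUNES_NS):
        parent_name = parent.get("text", "")
        children = parent.findall("itunes:category", ITUNES_NS)
        if not children:
            pairs.append((parent_name, None))
        for child in children:
            pairs.append((parent_name, child.get("text", "")))
    return pairs


parent_counts = Counter()
sub_counts = Counter()
for feed in SAMPLE_FEEDS:
    pairs = feed_categories(feed)
    # Count each parent category once per feed, however many sub-categories it has.
    parent_counts.update({parent for parent, _ in pairs})
    sub_counts.update((parent, sub) for parent, sub in pairs if sub is not None)

print(parent_counts.most_common())
print(sub_counts.most_common())
```

Counting parents per feed rather than per tag keeps a show with several sub-categories from inflating the total for its parent category.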

Categories found within the larger podcast dataset. 

Categories found within the True Crime dataset.
