Goal of Analysis

The goal of this analysis is to assess any correlations between the appearance, cancellation, and ratings of the television show Sherlock and references to “Sherlock Holmes” in books during the time period before, during, and after the run of Sherlock. This will, in part, help determine the viability, from a publishing perspective, of producing new Sherlock Holmes material.


A publishing lead time of 1-2 years is assumed, so changes in book (volume) publications are expected to lag the appearance, ratings, and cancellation of the show Sherlock by roughly that interval.
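The lead-time assumption can be made concrete with a small Python sketch that pairs each show-year value with the volume count published a given number of years later. The function name `align_with_lag` and the example numbers are hypothetical placeholders, not values from the datasets used in this analysis.

```python
def align_with_lag(show_by_year, volumes_by_year, lag_years):
    """Pair each show value with the volume count published `lag_years` later.

    Years with no matching shifted year in `volumes_by_year` are skipped.
    """
    pairs = []
    for year in sorted(show_by_year):
        shifted = year + lag_years
        if shifted in volumes_by_year:
            pairs.append((year, show_by_year[year], volumes_by_year[shifted]))
    return pairs


# Placeholder numbers for illustration only; they are NOT taken from the data.
example_show = {2010: 1.0, 2012: 2.0, 2014: 3.0}
example_volumes = {2011: 10, 2012: 20, 2013: 30, 2014: 40, 2015: 50}

# Try both ends of the assumed 1-2 year lead time.
for lag in (1, 2):
    print(lag, align_with_lag(example_show, example_volumes, lag))
```

Comparing the paired values at each candidate lag is one simple way to check whether a 1-year or a 2-year shift lines the two series up better.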

Cleaning and arranging the Google Books ngram dataset

The Google Books 2020 ngram (v3) dataset was cleaned and arranged in PowerShell: the row for the relevant ngram was selected, then converted from tab-separated records of comma-separated values to one comma-separated record per line, with a header line.

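# Select the row whose ngram is exactly "Sherlock Holmes" (the ngram is followed by a tab).
cat 2-00242-of-00589 | Select-String -CaseSensitive -Pattern "Sherlock Holmes\t" > "Sherlock Holmes ngram count.tsv"
# Put each yearly record on its own line and turn the leading ngram text into the CSV header.
(cat "Sherlock Holmes ngram count.tsv") -replace "\t", "`n" -replace "$\n" -replace "Sherlock Holmes", "Year,count,volume" > "Sherlock Holmes ngram count.csv"
# Drop the first line of the file and write the result back to the same file.
(gc "Sherlock Holmes ngram count.csv" | select -Skip 1) | sc "Sherlock Holmes ngram count.csv"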
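For readers who do not use PowerShell, the same reshaping can be sketched in Python. This is an illustrative equivalent rather than the method used above, and it assumes each line has the form of an ngram followed by tab-separated `year,count,volume` records. The function names and the sample line are hypothetical.

```python
import csv
import io


def ngram_line_to_rows(line, ngram="Sherlock Holmes"):
    """Split one ngram line into (year, count, volume) integer tuples."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != ngram:
        raise ValueError(f"unexpected ngram: {fields[0]!r}")
    # Each remaining field is assumed to be "year,count,volume".
    return [tuple(int(x) for x in field.split(",")) for field in fields[1:]]


def rows_to_csv(rows):
    """Render the rows as CSV text with the header used in this analysis."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Year", "count", "volume"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample line in the assumed layout; the numbers are placeholders.
sample = "Sherlock Holmes\t1900,12,3\t1901,15,4"
print(rows_to_csv(ngram_line_to_rows(sample)), end="")
```

Raising an error on an unexpected ngram guards against accidentally reshaping a neighbouring row from the same file.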

Sherlock Holmes Data Summaries

##     Position    Const             Created            Modified        
##  Min.   :1   Length:9           Length:9           Length:9          
##  1st Qu.:3   Class :character   Class :character   Class :character  
##  Median :5   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :5                                                           
##  3rd Qu.:7                                                           
##  Max.   :9                                                           
##   Description       Title               URL             Title.Type       
##  Min.   :52.00   Length:9           Length:9           Length:9          
##  1st Qu.:63.00   Class :character   Class :character   Class :character  
##  Median :71.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :69.78                                                           
##  3rd Qu.:76.00                                                           
##  Max.   :88.00                                                           
##   IMDb.Rating  Runtime..mins.       Year         Genres         
##  Min.   :8.0   Min.   :86.00   Min.   :2010   Length:9          
##  1st Qu.:9.0   1st Qu.:88.00   1st Qu.:2010   Class :character  
##  Median :9.0   Median :88.00   Median :2012   Mode  :character  
##  Mean   :9.0   Mean   :88.22   Mean   :2012                     
##  3rd Qu.:9.3   3rd Qu.:89.00   3rd Qu.:2014                     
##  Max.   :9.7   Max.   :89.00   Max.   :2014                     
##    Num.Votes     Release.Date        Directors        
##  Min.   :25502   Length:9           Length:9          
##  1st Qu.:26566   Class :character   Class :character  
##  Median :29716   Mode  :character   Mode  :character  
##  Mean   :30064                                        
##  3rd Qu.:30380                                        
##  Max.   :38324
##       Year          count           volume     
##  Min.   :2008   Min.   :23382   Min.   : 6501  
##  1st Qu.:2011   1st Qu.:30145   1st Qu.: 7787  
##  Median :2014   Median :36824   Median : 8701  
##  Mean   :2014   Mean   :36271   Mean   : 8816  
##  3rd Qu.:2016   3rd Qu.:41698   3rd Qu.: 9778  
##  Max.   :2019   Max.   :52985   Max.   :11003

Sherlock Holmes Data Plots


The graphs of the IMDb ratings and Google ngram data cover 2008, two years before the show Sherlock’s first appearance, to 2019, five years after the final appearance covered by the ratings data (2014). Over this range, the number of volumes referencing “Sherlock Holmes” shows a corresponding rise and fall in relation to the show’s appearance, increasing popularity, and cancellation. The assumption that a time shift would be reflected in the publication data was borne out.
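One way to put numbers on this rise and fall, sketched here in Python, is to compare average volume counts before, during, and after the show's run, with the window shifted by the assumed lead time. The 2010 and 2014 bounds come from the ratings summary above. The function name `period_means` and the example counts are hypothetical placeholders, not the actual ngram data.

```python
from statistics import mean


def period_means(volumes_by_year, first_year, last_year, lag_years=0):
    """Average volume counts before, during, and after a lag-shifted window."""
    start, end = first_year + lag_years, last_year + lag_years
    groups = {"before": [], "during": [], "after": []}
    for year, volume in volumes_by_year.items():
        if year < start:
            groups["before"].append(volume)
        elif year <= end:
            groups["during"].append(volume)
        else:
            groups["after"].append(volume)
    # An empty period yields None instead of raising StatisticsError.
    return {name: mean(vals) if vals else None for name, vals in groups.items()}


# Placeholder counts for illustration only; NOT the real ngram volumes.
example = {2008: 10, 2009: 10, 2010: 10, 2011: 30, 2012: 40,
           2013: 40, 2014: 30, 2015: 20, 2016: 10}

# Show run 2010-2014, shifted by an assumed 1-year publishing lead time.
print(period_means(example, 2010, 2014, lag_years=1))
```

A "during" mean that is clearly higher than both the "before" and "after" means would be consistent with the pattern described, although it would not by itself establish causation.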

The above would seem to suggest that the arrival on television of Sherlock Holmes-inspired content will result in increased publication of book-based Sherlock Holmes content. Its cancellation would seem to result in decreased publications referencing “Sherlock Holmes.”


The above analysis and dataset do not attempt to distinguish general “Sherlock Holmes” ngram references from those specifically related to the television show. This is intentional, because the interest here is in the relationship between book writing about “Sherlock Holmes” and Sherlock.

Future Analysis

The current analysis takes a top-down (publication) perspective rather than a bottom-up (purchaser) perspective. If data on actual sales of books containing the “Sherlock Holmes” ngram were available, it would provide insight into how the show’s performance may have affected sales rather than only publication trends.