The goal of this analysis is to assess correlations between the appearance, ratings, and cancellation of the television show Sherlock and references to “Sherlock Holmes” in books published before, during, and after the show’s run. The results will, in part, help determine whether producing new Sherlock Holmes material is viable from a publishing perspective.
A publishing lead time of 1-2 years is assumed, so book (volume) publications are expected to track the appearance, ratings, and cancellation of Sherlock with a corresponding time shift.
The Google Books 2020 ngram (v3) dataset was cleaned and arranged in PowerShell. The relevant ngram record was selected, then converted from tab-separated groups of comma-separated values into newline-separated rows of comma-separated values with a header line.
cat 2-00242-of-00589 | Select-String -CaseSensitive -Pattern "Sherlock Holmes\t" > "Sherlock Holmes ngram count.tsv"
(cat "Sherlock Holmes ngram count.tsv") -replace "\t", "`n" -replace "$\n" -replace "Sherlock Holmes", "Year,count,volume" > "Sherlock Holmes ngram count.csv"
(gc "Sherlock Holmes ngram count.csv" | select -Skip 1) | sc "Sherlock Holmes ngram count.csv"
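For readers working in R rather than PowerShell, the same reshaping can be sketched as follows. This is a minimal illustration, assuming each raw record holds the ngram text followed by tab-separated `year,match_count,volume_count` groups; `parse_ngram_line` and the sample line are hypothetical, and the sample numbers are invented rather than taken from the dataset.

```r
# Hypothetical helper: turn one raw ngram record into a data frame with the
# same three columns used in this report (Year, count, volume).
parse_ngram_line <- function(line) {
  # Split the record on tabs; the first field is the ngram text itself.
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  # Every remaining field is assumed to be "year,match_count,volume_count".
  records <- strsplit(fields[-1], ",", fixed = TRUE)
  parsed <- do.call(rbind, lapply(records, as.integer))
  data.frame(
    Year   = parsed[, 1],
    count  = parsed[, 2],
    volume = parsed[, 3]
  )
}

# Illustrative input only: these values are made up, not real ngram counts.
example_line <- "Sherlock Holmes\t2008,10,3\t2009,12,4\t2010,15,5"
example_df <- parse_ngram_line(example_line)
print(example_df)
```

Applied to the matching record from the raw 2-gram file, a helper along these lines should yield one row per year, which could then be filtered to the years of interest.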
summary(sherlock_imdb)
##     Position    Const             Created            Modified
##  Min.   :1   Length:9           Length:9           Length:9
##  1st Qu.:3   Class :character   Class :character   Class :character
##  Median :5   Mode  :character   Mode  :character   Mode  :character
##  Mean   :5
##  3rd Qu.:7
##  Max.   :9
##   Description       Title               URL             Title.Type
##  Min.   :52.00   Length:9           Length:9           Length:9
##  1st Qu.:63.00   Class :character   Class :character   Class :character
##  Median :71.00   Mode  :character   Mode  :character   Mode  :character
##  Mean   :69.78
##  3rd Qu.:76.00
##  Max.   :88.00
##   IMDb.Rating  Runtime..mins.       Year         Genres
##  Min.   :8.0   Min.   :86.00   Min.   :2010   Length:9
##  1st Qu.:9.0   1st Qu.:88.00   1st Qu.:2010   Class :character
##  Median :9.0   Median :88.00   Median :2012   Mode  :character
##  Mean   :9.0   Mean   :88.22   Mean   :2012
##  3rd Qu.:9.3   3rd Qu.:89.00   3rd Qu.:2014
##  Max.   :9.7   Max.   :89.00   Max.   :2014
##    Num.Votes     Release.Date        Directors
##  Min.   :25502   Length:9           Length:9
##  1st Qu.:26566   Class :character   Class :character
##  Median :29716   Mode  :character   Mode  :character
##  Mean   :30064
##  3rd Qu.:30380
##  Max.   :38324
summary(Sherlock_Holmes_ngram_count)
##       Year          count           volume
##  Min.   :2008   Min.   :23382   Min.   : 6501
##  1st Qu.:2011   1st Qu.:30145   1st Qu.: 7787
##  Median :2014   Median :36824   Median : 8701
##  Mean   :2014   Mean   :36271   Mean   : 8816
##  3rd Qu.:2016   3rd Qu.:41698   3rd Qu.: 9778
##  Max.   :2019   Max.   :52985   Max.   :11003
A review of the graphs of the IMDb ratings and the Google ngram data for 2008 (two years before Sherlock first aired) through 2019 (five years after the last episode in the ratings data, 2014) shows the number of volumes referencing “Sherlock Holmes” rising and falling in step with the show’s appearance, increasing popularity, and cancellation. The assumption that the publication data would reflect a time shift was borne out.
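One way to put a number on the time shift, rather than reading it from graphs alone, is to correlate the two yearly series at several candidate lags. The sketch below is illustrative only: `lagged_cor`, `show`, `books`, and the `value` column are hypothetical names, and the two series are invented so that the book series is an exact two-year-delayed copy of the show series. It is not the method used to produce the graphs in this report.

```r
# Hypothetical helper: correlate a show-side yearly series with a book-side
# yearly series after moving the book series back by `lag` years.
lagged_cor <- function(show, books, lag) {
  # Relabel each book year t as t - lag so it pairs with show year t - lag.
  shifted <- data.frame(Year = books$Year - lag, volume = books$volume)
  # Keep only the years present in both series after the shift.
  joined <- merge(show, shifted, by = "Year")
  cor(joined$value, joined$volume)
}

# Invented series (NOT the report's data): books lag the show by two years.
show  <- data.frame(Year = 2008:2015, value = c(0, 0, 3, 5, 6, 4, 2, 0))
books <- data.frame(Year = 2010:2017, volume = 100 * c(0, 0, 3, 5, 6, 4, 2, 0))

# Try lags of 0 to 3 years and keep the one with the highest correlation.
lags <- 0:3
cors <- sapply(lags, function(k) lagged_cor(show, books, k))
best_lag <- lags[which.max(cors)]
print(best_lag)  # 2 for this toy data, by construction
```

With the real data, the `volume` column of Sherlock_Holmes_ngram_count could play the role of `books`, while the show-side series would need a yearly measure built from sherlock_imdb (for example, mean IMDb.Rating per year). Because episodes aired in only a few distinct years, any such correlation would rest on very few points and should be read cautiously.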
This would seem to suggest that the arrival of Sherlock Holmes-inspired content on television results in increased publication of book-based Sherlock Holmes content, and that the cancellation of such content results in fewer publications referencing “Sherlock Holmes.”
The analysis and dataset above do not attempt to distinguish general “Sherlock Holmes” ngram references from those specifically related to the television show. This is intentional, because the interest here is in the relationship between Sherlock and book writing about “Sherlock Holmes.”
The current analysis takes a top-down (publication) perspective rather than a bottom-up (purchaser) perspective. If data on actual sales of books containing the “Sherlock Holmes” ngram were available, they would provide insight into how the show’s performance may have affected sales rather than only publication trends.