SherlockHolmes: An R Program to Analyze the Hidden Structure of Sherlock Holmes Stories by Statistical Pattern Analysis of Concordances
Barry Zeeberg
Background
This is the second manuscript documenting the Sherlock R language program. The first manuscript provided the motivation, methods, and preliminary results. This manuscript expands the types of studies and results.
The functions of the Sherlock package are designed to allow an integrated interaction of the user with the data, in the sense that the data can be viewed at a high level encompassing a broad overview, with the option to drill down to specific detail, such as specific correlated lead terms within the concordance. These views are all generated by default, and are accessible to the user through a structured hierarchical archive (Table 1).
I previously presented a scatter plot of fraction values as a function of the chronological order for the search pattern “Holmes,” across all 60 Sherlock Holmes stories. I will now expand that result with additional search patterns, namely “Watson” and “Sherlock.” Holmes was usually addressed as “Holmes” rather than by the more intimate “Sherlock,” except when addressed by his brother Mycroft, or when Watson might introduce him to someone as “Mr. Sherlock Holmes.” So it is not surprising that the fraction value (please see the previous manuscript for this and other definitions) is generally lower for the search pattern “Sherlock” than for “Holmes.”
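To make the comparison of fraction values concrete, here is a minimal sketch. It is written in Python rather than R and is not the Sherlock package's code. It also ASSUMES that a fraction value is the number of word tokens equal to the search pattern divided by the total number of word tokens; the authoritative definition is in the previous manuscript. The function name `fraction_value` and the toy sentence are hypothetical.

```python
import re


def fraction_value(text: str, pattern: str) -> float:
    """Share of word tokens in `text` that are exactly equal to `pattern`.

    ASSUMPTION: matches / total tokens is a stand-in definition used only
    for illustration; see the previous manuscript for the real one.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok == pattern)
    return hits / len(tokens)


# Toy sentence invented for illustration (not quoted from the stories).
toy = "Holmes smiled. Mr. Sherlock Holmes, said Watson, meet my friend."

for name in ("Holmes", "Sherlock", "Watson"):
    print(name, fraction_value(toy, name))
```

Under this assumed definition, a name that is used less often in direct address, such as “Sherlock,” would naturally receive a lower value than “Holmes” in the same text.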
Table 1. Directories, Files, Functions, and Figures
Overview: Inventory
The histogram of fraction values for the search pattern “Watson” shows the broad range of values that characterize different stories (Figure 1).
Figure 1. The histogram of fraction values for the search pattern “Watson”
The fraction value for “Holmes” tends to increase with
chronological order (Figure 2). Somewhat surprisingly, the opposite is
true for the more intimate “Sherlock” (Figure 3). Conan Doyle appears to
have drifted away from the more intimate presentation.
Figure 2. Scatter plot of fraction values as a function of the chronological order for the search pattern “Holmes,” across all 60 Sherlock Holmes stories.
Figure 3. Scatter plot of fraction values as a function of the chronological order for the search pattern “Sherlock,” across all 60 Sherlock Holmes stories.
These can be compared more directly in an overlay plot (Figure
4).
Figure 4. Scatter plot overlay of fraction values as a function of the chronological order for the search patterns “Holmes” and “Sherlock” across all 60 Sherlock Holmes stories.
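The increasing and decreasing tendencies read off the scatter plots could also be summarized numerically. The sketch below is one possible approach, not something the manuscript reports: it fits an ordinary least-squares slope of fraction value against chronological position (1, 2, 3, ...). It is written in Python rather than R, the function name `trend_slope` is hypothetical, and the input values are invented placeholders, not results from the stories.

```python
def trend_slope(fractions):
    """Least-squares slope of the values against their position 1..n.

    `fractions` must already be in chronological order and contain at
    least two values. A positive slope means the values tend to rise
    with chronological order; a negative slope means they tend to fall.
    """
    n = len(fractions)
    if n < 2:
        raise ValueError("need at least two values to estimate a slope")
    orders = range(1, n + 1)
    mean_x = sum(orders) / n
    mean_y = sum(fractions) / n
    sxx = sum((x - mean_x) ** 2 for x in orders)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(orders, fractions))
    return sxy / sxx


# Invented placeholder series, chosen only to show the sign convention.
rising = [1.0, 2.0, 3.0]
falling = [3.0, 2.0, 1.0]
print(trend_slope(rising), trend_slope(falling))
```

A slope is a coarse summary; a rank-based measure would be a reasonable alternative if the relationship is monotonic but not linear.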
Despite its lower fraction value relative to “Holmes,” the search pattern
“Sherlock” appears at least once in all but 3 of the 60 stories
(Figure 5). These absences occur among the chronologically latest
stories, which is consistent with my earlier suggestion that Conan Doyle
appears to have drifted away from the more intimate presentation.
Figure 5. At least 1 instance of the search string in a text. Stories are in chronological order from top to bottom.
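An “at least 1 instance” view of this kind reduces to a presence/absence table with one row per story and one column per search string. The following is a minimal Python sketch of that idea, not the Sherlock package's R implementation. The function names, the story titles “Story A” to “Story C,” and the toy texts are all hypothetical, and matching is done by plain substring search as a simplifying assumption.

```python
def presence_table(stories, patterns):
    """Map each story title to {pattern: True/False} for 'appears at least once'.

    `stories` is a dict of title -> text; dicts keep insertion order, so
    inserting stories chronologically keeps the rows chronological.
    """
    return {
        title: {p: (p in text) for p in patterns}
        for title, text in stories.items()
    }


def absent_stories(table, pattern):
    """Titles of stories in which `pattern` never appears."""
    return [title for title, row in table.items() if not row[pattern]]


def patterns_per_story(table):
    """Number of distinct search patterns found in each story."""
    return {title: sum(row.values()) for title, row in table.items()}


# Toy corpus invented for illustration (not quoted from the stories).
toy_stories = {
    "Story A": "Mr. Sherlock Holmes greeted Watson.",
    "Story B": "Holmes and Watson waited.",
    "Story C": "Holmes said nothing.",
}
toy_patterns = ["Holmes", "Sherlock", "Watson"]

table = presence_table(toy_stories, toy_patterns)
print(absent_stories(table, "Sherlock"))
print(patterns_per_story(table))
```

The per-story counts give a direct way to find stories in which many of the search patterns co-occur.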
Another search string that suggests itself is “Watson,” whose
chronology closely parallels that for “Holmes” (Figure 6).
Figure 6. Scatter plot overlay of fraction values as a function of the chronological order for the search patterns “Holmes” and “Watson” across all 60 Sherlock Holmes stories.
“Watson” appears in all but 1 story (data not shown). Several other
search strings are present in many of the stories, but none comes close
to the top 3 of “Holmes,” “Sherlock,” and “Watson” (Figure 7).
Figure 7. At least 1 instance of the search string in a text. Stories are in chronological order from top to bottom.
“The Adventure of the Empty House” stands out as
the only story in which all 5 of the search patterns appear. The
stars must have been in alignment for that one. In second place is “The
Valley of Fear,” in which 3 of the search patterns appear.
The scatter plot for the search patterns “Holmes,” “Sherlock,” and
“Mycroft” across all 60 Sherlock Holmes stories shows unusually high
fraction values for both “Mycroft” and “Sherlock” in one text (Figure
8).
Figure 8. Scatter plot overlay of fraction values as a function of the chronological order for the search patterns “Holmes,” “Sherlock,” and “Mycroft” across all 60 Sherlock Holmes stories.
That text is identified as “The Greek Interpreter” from the
tabulation in the archival files (Figures 9, 10).
Figure 9. A portion of the archival file for “Mycroft.”
Figure 10. A portion of the archival file for “Sherlock.”
The hypothesis is that Sherlock’s brother Mycroft would use the
more intimate and familiar form of address.
To examine this in more detail, we can review the corresponding cumulative distribution overlay plot (Figure 11).
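For readers unfamiliar with cumulative distribution plots, the sketch below shows how an empirical cumulative distribution can be computed from a list of per-story values, so that two search patterns can be overlaid on the same axes. It is a Python illustration of the general technique, not the Sherlock package's R code. The function name `ecdf` and the input numbers are hypothetical placeholders, not values from the stories.

```python
def ecdf(values):
    """Empirical cumulative distribution as a list of (x, F(x)) points.

    The values are sorted, and the i-th smallest value (1-based) is
    paired with i / n, the share of observations at or below it.
    """
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


# Invented placeholder values, one per story, for two search patterns.
toy_values = {
    "pattern 1": [0.3, 0.1, 0.2],
    "pattern 2": [0.05, 0.4, 0.1, 0.02],
}

# One curve per pattern; plotting these step curves together gives the overlay.
curves = {name: ecdf(vals) for name, vals in toy_values.items()}
for name, points in curves.items():
    print(name, points)
```

A story with an unusually high value shows up as an isolated point at the far right of its curve, well separated from the bulk of the distribution.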