SherlockHolmes Part II

Barry Zeeberg [aut, cre]

2023-03-28

SherlockHolmes: An R Program to Analyze the Hidden Structure of Sherlock Holmes Stories by Statistical Pattern Analysis of Concordances

 

 



Background

This is the second manuscript documenting the Sherlock R language program. The first manuscript provided the motivation, methods, and preliminary results. Now we will expand the types of studies and results.

The functions of the Sherlock package are designed to allow an integrated interaction of the user with the data, in the sense that the data can be viewed at a high level encompassing a broad overview, with the option to drill down to specific detail, such as specific correlated lead terms within the concordance. These views are all generated by default, and are accessible to the user through a structured hierarchical archive (Table 1).

I had previously presented a scatter plot of fraction values as a function of the chronological order for the search pattern “Holmes,” across all 60 Sherlock Holmes stories. I will now expand that result with additional search patterns, namely “Watson” and “Sherlock.” Holmes was usually addressed as “Holmes” rather than by the more intimate “Sherlock,” except when addressed by his brother Mycroft, or when Watson might introduce him to someone as “Mr. Sherlock Holmes.” So it is not surprising that the fraction value (please see the previous manuscript for this and other definitions) is generally lower for the search pattern “Sherlock” than for “Holmes.”
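To make the comparison of fraction values concrete, here is a minimal sketch in base R. It assumes, purely for illustration, that a fraction value is the number of word tokens matching the search pattern divided by the total number of tokens; the actual definition is given in the previous manuscript and may differ. The helper `fraction_value()` and the toy sentence are hypothetical and are not part of the package.

```r
# Hypothetical helper: fraction of word tokens in `text` equal to `pattern`.
# ASSUMPTION: this definition is for illustration only; see the previous
# manuscript for the definition actually used by the package.
fraction_value <- function(text, pattern) {
  # Split on runs of characters that are neither letters nor apostrophes
  tokens <- unlist(strsplit(text, "[^A-Za-z']+"))
  # Drop any empty tokens produced by leading separators
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) == 0) return(NA_real_)
  sum(tokens == pattern) / length(tokens)
}

# Toy sentence, not taken from the archive
toy <- "My dear Holmes said Watson as Sherlock Holmes rose"

# One fraction value per search pattern; names are the patterns themselves
fractions <- sapply(c("Holmes", "Sherlock", "Watson"),
                    function(p) fraction_value(toy, p))
print(fractions)
```

In this toy sentence “Holmes” matches two of the nine tokens and “Sherlock” only one, so the “Sherlock” value is the lower of the two, in line with the expectation stated above.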


Table 1. Directories, Files, Functions, and Figures


Overview: Inventory

The histogram of fraction values for the search pattern “Watson” shows the broad range of values that characterize different stories (Figure 1).

Figure 1. The histogram of fraction values for the search pattern “Watson”


The fraction value for “Holmes” tends to increase with chronological order (Figure 2). Somewhat surprisingly, the opposite is true for the more intimate “Sherlock” (Figure 3). Conan Doyle appears to have drifted away from the more intimate presentation.

Figure 2. Scatter plot of fraction values as a function of the chronological order for the search pattern “Holmes,” across all 60 Sherlock Holmes stories.

Figure 3. Scatter plot of fraction values as a function of the chronological order for the search pattern “Sherlock,” across all 60 Sherlock Holmes stories.
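One way to put a number on trends like those in Figures 2 and 3 is a rank correlation between chronological order and fraction value. The sketch below uses base R's `cor()` with `method = "spearman"`. The six data points are made up for illustration and are not values from the archive; the package itself may summarize the trend differently.

```r
# HYPOTHETICAL data: chronological rank of six stories and made-up
# fraction values for two search patterns (not taken from the archive)
chron_order   <- 1:6
frac_holmes   <- c(0.010, 0.012, 0.011, 0.015, 0.016, 0.018)
frac_sherlock <- c(0.0040, 0.0035, 0.0036, 0.0020, 0.0015, 0.0010)

# Spearman's rho measures monotonic association without assuming linearity
rho_holmes   <- cor(chron_order, frac_holmes,   method = "spearman")
rho_sherlock <- cor(chron_order, frac_sherlock, method = "spearman")

# A positive rho indicates an increase with chronological order,
# a negative rho indicates a decrease
print(c(Holmes = rho_holmes, Sherlock = rho_sherlock))
```

With these toy values the coefficient is positive for “Holmes” and negative for “Sherlock,” which is the pattern of opposite trends described in the text.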


These can be compared more directly in an overlay plot (Figure 4).

Figure 4. Scatter plot overlay of fraction values as a function of the chronological order for the search patterns “Holmes” and “Sherlock” across all 60 Sherlock Holmes stories.


Despite its lower fraction value relative to “Holmes,” “Sherlock” does appear at least once in all but 3 of the 60 stories (Figure 5). These absences occur among the chronologically latest stories, which is consistent with my earlier hypothesis that Conan Doyle appears to have drifted away from the more intimate presentation.

Figure 5. At least 1 instance of the search string in a text. Stories are in chronological order from top to bottom.
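The “at least 1 instance” view reduces each story to a present/absent flag. A minimal sketch of that reduction in base R follows, assuming a named vector of per-story occurrence counts in chronological order. The story labels and counts are hypothetical placeholders, not data from the archive.

```r
# HYPOTHETICAL occurrence counts of one search string in eight stories,
# listed in chronological order (labels and values are placeholders)
counts <- c(A = 5, B = 3, C = 7, D = 2, E = 4, F = 0, G = 1, H = 0)

# TRUE where the search string occurs at least once
present <- counts > 0

# Number of stories with no occurrence, and their chronological positions
n_absent         <- sum(!present)
absent_positions <- which(!present)

print(n_absent)
print(absent_positions)
```

If the absent positions cluster near the end of the vector, as they do in this toy example, the absences fall among the chronologically latest stories.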


Another search string that suggests itself is “Watson,” whose chronology pretty much parallels that for “Holmes” (Figure 6).

Figure 6. Scatter plot overlay of fraction values as a function of the chronological order for the search patterns “Holmes” and “Watson” across all 60 Sherlock Holmes stories.


“Watson” appears in all but 1 story (data not shown). Several other search strings have a significant presence throughout the stories, but none comes close to the top 3 search strings discussed above (Figure 7).


Figure 7. At least 1 instance of the search string in a text. Stories are in chronological order from top to bottom.


Unique among these is “The Adventure of the Empty House,” which is the only story in which all 5 of the search patterns appear. The stars must have been in alignment for that one. In second place is “The Valley of Fear,” in which 3 of the search patterns appear.
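Ranking stories by how many search patterns they contain amounts to taking row sums of a story-by-pattern presence matrix. The sketch below shows the idea in base R. The matrix, story names, and pattern names are hypothetical stand-ins and do not reproduce the data behind Figure 7.

```r
# HYPOTHETICAL presence matrix: rows are stories, columns are search patterns,
# TRUE means the pattern occurs at least once in that story
presence <- matrix(
  c(TRUE, TRUE,  TRUE, TRUE,  TRUE,
    TRUE, TRUE,  TRUE, FALSE, FALSE,
    TRUE, FALSE, TRUE, FALSE, FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("story_1", "story_2", "story_3"),
    c("pattern_a", "pattern_b", "pattern_c", "pattern_d", "pattern_e")
  )
)

# Number of distinct search patterns found in each story
patterns_per_story <- rowSums(presence)

# Stories in which every search pattern appears
all_present <- names(patterns_per_story)[patterns_per_story == ncol(presence)]

# Stories ranked from most to fewest patterns
print(sort(patterns_per_story, decreasing = TRUE))
print(all_present)
```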

The scatter plot for the search patterns “Holmes,” “Sherlock,” and “Mycroft” across all 60 Sherlock Holmes stories shows unusually high fraction values for both “Mycroft” and “Sherlock” in one text (Figure 8).


Figure 8. Scatter plot overlay of fraction values as a function of the chronological order for the search patterns “Holmes,” “Sherlock,” and “Mycroft” across all 60 Sherlock Holmes stories.


This text is identified as “The Greek Interpreter” from the tabulation in the archival files (Figures 9, 10).


Figure 9. A portion of the archival file for “Mycroft.”


Figure 10. A portion of the archival file for “Sherlock.”


The hypothesis is that Sherlock’s brother Mycroft would use the more intimate and familiar form of address.

To examine this in more detail, we can review the corresponding cumulative distribution overlay plot (Figure 11).