Jan Rybicki

Visualizing the femininity of the "Chawton House corpus"


Two main questions are addressed in this very brief presentation of the stylometric features of the "Chawton House corpus". First, the "Chawton novels", i.e. the anonymous ones and those written by known female authors, are compared in order to identify which of the anonymous texts might or might not have been written by a woman. To do this, a second corpus of somewhat more canonical female (Austen, Radcliffe, Burney, Edgeworth, Shelley, Porter) and male (Defoe, Swift, Johnson, Richardson, Fielding, Sterne, Smollett, Goldsmith, Walpole, Lewis, Beckford, Godwin, Scott, Peacock) authors is studied with Burrows's "zeta method" to establish a list of words characteristic of each of the two groups of "classic" writers.
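The zeta step can be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes Craig's variant of zeta, in which each text is cut into equal-sized segments and a word scores the proportion of primary-group segments that contain it plus the proportion of secondary-group segments that lack it (so scores range from 0 to 2). The function names, the segment size and the number of marker words are hypothetical choices made only for the example.

```python
# Hypothetical sketch of Craig-style zeta; names and defaults are assumptions.

def segment(tokens, size):
    """Cut a token list into consecutive, non-overlapping segments of `size`
    tokens; a trailing remainder shorter than `size` is dropped."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def zeta_scores(primary_texts, secondary_texts, segment_size=1000):
    """Score every word: share of primary segments containing it plus share
    of secondary segments not containing it (0 = avoided, 2 = preferred)."""
    primary = [set(s) for t in primary_texts for s in segment(t, segment_size)]
    secondary = [set(s) for t in secondary_texts for s in segment(t, segment_size)]
    vocabulary = set().union(*primary, *secondary)
    scores = {}
    for word in vocabulary:
        in_primary = sum(word in seg for seg in primary) / len(primary)
        in_secondary = sum(word in seg for seg in secondary) / len(secondary)
        scores[word] = in_primary + (1 - in_secondary)
    return scores


def marker_words(scores, n=500):
    """Return the n highest-scoring ("preferred") and n lowest-scoring
    ("avoided") words of the primary group; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:n], ranked[-n:]
```

Here the primary group would be, say, the female authors and the secondary group the male ones; the "preferred" and "avoided" lists together form the characteristic wordlist.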

This wordlist is then used to perform a cluster analysis of word frequencies in all the texts mentioned above. A clear division appears between the female and the male authors, with only a few misplacements (including a parody of the sentimental novel); one of the anonymous books consistently appears among those by Defoe et al.
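The clustering step can be sketched in the same spirit. The abstract does not say which distance measure or linkage method was used, so both are assumptions here: relative frequencies of the wordlist are converted to z-scores across all texts, texts are compared with a Delta-style mean absolute difference of those z-scores, and the closest groups are merged by average linkage until k clusters remain. All function names are hypothetical.

```python
# Hypothetical sketch: z-scored word frequencies, Delta-style distance and
# average-linkage agglomerative clustering (distance and linkage are assumptions).
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def relative_frequencies(tokens, wordlist):
    """Frequency of each wordlist item as a share of all tokens in the text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in wordlist]


def zscores(rows):
    """Standardize each column (word) across texts; constant columns keep sd 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def delta(a, b):
    """Mean absolute difference of two z-score vectors (Delta-style distance)."""
    return mean(abs(x - y) for x, y in zip(a, b))


def cluster_texts(texts, wordlist, k=2):
    """Merge the two closest clusters (average linkage) until k remain."""
    names = list(texts)
    vecs = zscores([relative_frequencies(texts[n], wordlist) for n in names])
    clusters = [[i] for i in range(len(names))]

    def link(c1, c2):
        return mean(delta(vecs[i], vecs[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(names[i] for i in c) for c in clusters]
```

In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would replace the hand-written merging loop; it is spelled out here only to keep the sketch dependency-free.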

The other comparison is limited to the known female writers from both inside and outside the "Chawton House corpus". As before, a characteristic wordlist is established for each of the two categories. Interestingly, the "famous" writers tend to use fewer words directly related to the emotions and plot events typical of the 18th-century sentimental novel. Once again, the two subgroups are almost perfectly separated.

SvD, November 2012

