Software for the analysis of linguistic data that I have developed or worked on:
The Multidimensional Analysis Tagger is a Windows program that replicates the tagger from Biber’s (1988) Variation across Speech and Writing for the multidimensional functional analysis of English texts, an approach generally applied in studies of text-type or genre variation. The program can generate a grammatically annotated version of the selected corpus, as well as the statistics needed to perform a text-type or genre analysis. It plots the input text or corpus on Biber’s (1988) Dimensions and determines its closest text type, as proposed in Biber’s (1989) A Typology of English Texts. Finally, the program offers a tool for visualising the Dimension features of an input text.
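In Biber’s (1988) method, a text’s score on a Dimension is obtained by normalising its feature counts (conventionally per 1,000 words), standardising each rate against reference means and standard deviations, and then adding the z-scores of positively loading features and subtracting those of negatively loading ones. The sketch below illustrates only that arithmetic. It is not the program’s code: the two feature names stand in for the much larger feature sets on each Dimension, and the reference means and standard deviations are invented placeholders.

```python
# Minimal sketch of Biber-style Dimension scoring.
# ASSUMPTION: feature names and reference (mean, sd) values are hypothetical,
# not the values used by the Multidimensional Analysis Tagger.

def normalise(counts: dict[str, int], n_tokens: int, per: int = 1000) -> dict[str, float]:
    """Convert raw feature counts to rates per `per` words."""
    return {feature: count * per / n_tokens for feature, count in counts.items()}


def dimension_score(
    rates: dict[str, float],
    ref_stats: dict[str, tuple[float, float]],
    positive: list[str],
    negative: list[str],
) -> float:
    """Standardise each rate against its reference (mean, sd), then add the
    z-scores of positively loading features and subtract the negative ones."""
    z = {f: (rates[f] - ref_stats[f][0]) / ref_stats[f][1] for f in positive + negative}
    return sum(z[f] for f in positive) - sum(z[f] for f in negative)


rates = normalise({"private_verbs": 30, "nouns": 500}, n_tokens=2000)
ref_stats = {"private_verbs": (18.0, 6.0), "nouns": (180.0, 35.0)}  # hypothetical
score = dimension_score(rates, ref_stats, positive=["private_verbs"], negative=["nouns"])
print(score)  # -2.5
```

Here the text has fewer private verbs than the reference mean (z = -0.5) and more nouns (z = 2.0), so both push its score towards the negative pole of the Dimension.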
The Great American Word Mapper allows users to map the relative frequencies of the 97,246 most common words in an 8.9-billion-word corpus of 890 million geocoded Tweets collected from across the contiguous United States between 11 October 2013 and 22 November 2014. The original app was created by Jack Grieve, Andrea Nini and Diansheng Guo for the Trees and Tweets project, funded by AHRC/ESRC/JISC/IMLS as part of Digging into Data 3. This Quartz version was redesigned by Nikhil Sonnad.
The four word-by-county regional data matrices used for the Word Mapper are available for download here. One matrix contains the relative frequencies (per billion words) of the 97,246 words across 3,075 counties; the other three contain the corresponding Getis-Ord Gi* z-scores, each calculated with a different nearest-neighbour spatial weights matrix.
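Getis-Ord Gi* expresses, as a z-score, whether a location and its neighbours together have a higher (hot spot) or lower (cold spot) value than expected from the overall mean: Gi* = (Σj wij xj − x̄ Σj wij) / (S √[(n Σj wij² − (Σj wij)²) / (n − 1)]), where x̄ and S are the mean and standard deviation of x over all n locations, and the sums over j include i itself. The sketch below computes a relative frequency per billion words and Gi* using binary k-nearest-neighbour weights that include each location itself. The choice of k, the planar Euclidean distance and the toy coordinates are assumptions for illustration; this page does not specify how the Word Mapper weights were built.

```python
import math

# Minimal sketch: relative frequency and Getis-Ord Gi* with k-NN weights.
# ASSUMPTIONS: binary weights, planar Euclidean distance, toy coordinates.


def per_billion(count: int, total_tokens: int) -> float:
    """Relative frequency of a word per billion words."""
    return count * 1_000_000_000 / total_tokens


def knn_weights(coords: list[tuple[float, float]], k: int) -> list[list[float]]:
    """Binary weights: each location gets weight 1 for itself (the 'star' in
    Gi*) and for its k nearest neighbours; ties are broken by index."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        w[i][i] = 1.0
        for _, j in dists[:k]:
            w[i][j] = 1.0
    return w


def getis_ord_gi_star(x: list[float], w: list[list[float]]) -> list[float]:
    """Gi* z-score for every location. Assumes x is not constant and no row
    of w covers every location (otherwise the denominator is zero)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean * mean)
    z = []
    for row in w:
        sw = sum(row)
        sw2 = sum(wij * wij for wij in row)
        num = sum(wij * xj for wij, xj in zip(row, x)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        z.append(num / den)
    return z


# Toy example: five points on a line with high values at the left end.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
values = [10.0, 10.0, 0.0, 0.0, 0.0]
print(getis_ord_gi_star(values, knn_weights(coords, k=1)))
```

Running the toy example gives a positive z-score at the left end (a hot spot) and negative scores at the right end. Varying k changes how local the clustering is, which is one reason a dataset might provide Gi* scores for several different neighbourhood sizes.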
See the papers and talks at the link above for more information. If you use the data in your research, please cite one or more of the papers.