On Monday Dr James Tompkinson (University of York) and I presented our talk on “Assessing the suitability of forensic authorship analysis methodologies for speech data” at the International Association for Forensic Phonetics and Acoustics (IAFPA) 2025 conference at Leiden University (The Hague), where we showed preliminary results on applying authorship analysis techniques to transcribed speech. You can find the slides of the talk here: https://zenodo.org/records/16308151.
Examining an author’s individual grammar
On Monday I delivered a talk at the Comparative Literature Goes Digital Workshop at the Digital Humanities 2025 conference. As part of this talk I have also prepared a tutorial on using our new authorship verification method, LambdaG, to produce text heatmaps for studying the idiosyncratic language of an author. This GitHub repository contains the abstract, a link to the tutorial and the slides of my talk: https://github.com/andreanini/lambdaG-case-study-DH2025.
Appearance on the Writing Wrongs podcast
A few months ago, I had the pleasure of being a guest on the ‘Writing Wrongs’ podcast. The episode covered the events of the Ayia Napa rape case and the evidence I presented at the trial. As in all other episodes, the hosts do an amazing job of explaining everything in detail but in a really accessible way. I highly recommend this episode, as well as the whole podcast! You can find it here: https://www.aston.ac.uk/research/forensic-linguistics/writing-wrongs
A corpus analysis of idiolectal n-grams
The slides and abstract of our talk “A corpus analysis of idiolectal n-grams” at #CL2025 are now available here: https://doi.org/10.5281/zenodo.15806985
New pre-print: “Linguistic Individuality in Lexicogrammatical Alternations”
My PhD student Michael Cameron has uploaded a pre-print of his latest work, “Linguistic Individuality in Lexicogrammatical Alternations”, which shows with a pre-registered experiment that individuals consistently select the same lexicogrammatical variants over time and do so differently from other individuals. This provides evidence for personalised entrenchment, which is an important factor in linguistic individuality (with obvious implications for forensic linguistics). You can find the pre-print here: https://doi.org/10.31234/osf.io/uvtrb.
“A Theory of Linguistic Individuality” now fully open access
My latest monograph, “A Theory of Linguistic Individuality for Authorship Analysis”, is now open access! You can download it for free here.
idiolect: a new R package for Forensic Authorship Analysis
I’m pleased to announce the release of version 1 of idiolect, my new package to carry out Forensic Authorship Analysis using R. The website of the package, https://andreanini.github.io/idiolect, contains a Get Started page with a brief tutorial. The package offers several well-known authorship analysis methods, including our new method LambdaG (https://arxiv.org/abs/2403.08462v1), as well as functions to calibrate Likelihood Ratios so as to express the strength of the evidence within the Likelihood Ratio Framework for forensic science.
The package contains functions that cover the typical workflow for authorship analysis for forensic problems:
- Input and preprocess data;
- Carry out an analysis (Delta, N-gram Tracing, the Impostors Method, LambdaG);
- Test the performance of the methods on ground truth data;
- Apply the method to the questioned text and calibrate a Likelihood Ratio;
- Explore the data using feature importance or other visualisations depending on the method, including using concordances.
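To give a flavour of the "carry out an analysis" step, here is a minimal sketch of one of the methods listed above, Delta, in its classic Burrows form. This is not the idiolect API (for that, see the Get Started page on the package website): it is a standalone Python illustration using only the standard library, and the function names, whitespace tokenisation and the `n_features` parameter are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word in a text (whitespace tokens)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty texts
    return [counts[word] / total for word in vocabulary]


def burrows_delta(questioned, known_texts, n_features=5):
    """Return {author: Delta} between a questioned text and each known text.

    known_texts is a dict mapping an author label to one text.
    Lower Delta means the two texts are stylistically closer.
    """
    all_texts = [questioned] + list(known_texts.values())

    # Features: the most frequent words in the pooled texts.
    pooled = Counter()
    for text in all_texts:
        pooled.update(text.lower().split())
    vocabulary = [word for word, _ in pooled.most_common(n_features)]

    # Relative frequencies, then z-scores per feature across all texts.
    freqs = [relative_frequencies(text, vocabulary) for text in all_texts]
    columns = list(zip(*freqs))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # constant feature: z = 0
    z = [[(f - m) / s for f, m, s in zip(row, means, sds)] for row in freqs]

    # Delta: mean absolute difference between z-score profiles.
    q_profile = z[0]
    return {
        author: mean(abs(a - b) for a, b in zip(q_profile, profile))
        for author, profile in zip(known_texts, z[1:])
    }


if __name__ == "__main__":
    known = {
        "A": "the cat and the dog and the bird",
        "B": "a fish or a frog or a newt",
    }
    print(burrows_delta("the cat and the dog and the bird", known))
```

In a real forensic problem the texts would be much longer, the number of features much larger, and the resulting scores would still need to be tested on ground truth data and calibrated into a Likelihood Ratio, as in the workflow above.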
Let me know if you have any feedback or questions!
Recruiting a PhD student for a fully-funded position
I’m pleased to announce that we are recruiting a PhD student for a fully-funded (tuition and stipend) position for the project: “Can a robot impersonate a human? Studying machines’ ability to mimic linguistic identity” funded by an ESRC North West Social Science Doctoral Training Partnership grant. The position is in collaboration with Naimuri and contains a substantial element of industrial experience.
The project will address the following research questions: (1) To what extent can LLMs impersonate a specific individual such that they can fool forensic linguistic detection? (2) How do we modify existing detection methods to mitigate the problems identified in (1)?
Details and application form can be found here: https://www.findaphd.com/phds/project/can-a-robot-impersonate-a-human-studying-machines-ability-to-mimic-linguistic-identity/?p172832.
Keynote: How Linguistics can help to eliminate unnecessary complexity in modern Natural Language Processing
Yesterday I had the pleasure of delivering a keynote talk at the Interdisciplinary Perspectives: Bridging Sociological Studies in the Digital Age conference, organised by the amazing Digital Humanities PhD students at King’s College London.
In the talk I argued that performance in a Natural Language Processing task, namely authorship analysis, can be significantly increased if insight and knowledge from (Cognitive) Linguistics are taken into consideration. Even though the field of NLP is now at a stage of wondering to what extent Linguistics is still needed, I made the case that the often-quoted statement “Every time I fire a linguist, the performance of the system goes up” tends to apply to certain strands of Linguistics that do not engage with the study of real-life language usage. Instead, Cognitive Linguistic Usage-Based approaches to the study of language are very compatible with modern advances in NLP. As shown in this case study, the application of Cognitive Linguistics theoretical frameworks can lead to better trade-offs between computational complexity and performance, reducing the number of computational steps, processing time, and model parameters.
The slides of my talk are available here: https://doi.org/10.5281/zenodo.11583694.
Invited talk at the Corpus Linguistics Symposium: Style and Authorship
Tomorrow afternoon I’ll give a keynote talk at the Corpus Linguistics Symposium: Style and Authorship at the University of Leeds. My talk will be on how our new authorship verification method, LambdaG, can be more transparent than the state of the art in visualising the important features that identify an author, and I will use Dickens as a case study. The event is hybrid and the talk is going to be recorded. For more information, the webpage of the event is: https://www.latl.leeds.ac.uk/research-satellites/corpus-linguistics/clstyleauthorship/.