Now that large language models (LLMs) have become part and parcel of how we write, there is a danger that linguistic diversity in style will be lost as more and more people lean on LLMs to expand on their ideas. I'll readily admit that I have used them in non-essential contexts, but I'll be categorical in stating that the content of this blog is exclusively an expression of my own writing style.
It will therefore become essential to develop stylometric tools that can distinguish one writer from another as well as flag content that is likely to be the result of heavy LLM use. Writers may now be called upon to demonstrate the development of their ideas through an evolving log of their work, in which it is evident how the work has naturally progressed over time.
In this post, however, I would like to outline what I believe would be a pragmatic approach towards such stylometric analysis.
Input
Obviously, the input would be a block of text; the more the merrier. Whether formatting is important is a good question; I can see how an author's attention to formatting detail may serve as an identifying marker.
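To make the formatting question a little more concrete, here is a minimal sketch of how a few formatting habits could be counted from raw text. The function name `formatting_features` and the particular habits it looks for are my own illustrative assumptions, not an established feature set; a real tool would need a far richer and better validated list.

```python
import re


def formatting_features(text: str) -> dict[str, float]:
    """Count a few formatting habits, scaled to occurrences per 1,000 characters.

    The feature names and patterns below are hypothetical examples only.
    """
    patterns = {
        # Two or more spaces after a full stop, before the next word.
        "double_space_after_period": r"\.  +(?=\S)",
        # Typographic (curly) single and double quotes.
        "curly_quotes": r"[\u2018\u2019\u201c\u201d]",
        # Straight quotes; note this also counts apostrophes.
        "straight_quotes_or_apostrophes": r"['\"]",
        # En and em dash characters.
        "dash_characters": r"[\u2013\u2014]",
        # A comma directly before "and"/"or" (a rough serial-comma proxy).
        "comma_before_conjunction": r",\s+(?:and|or)\b",
    }
    # Normalise by length so short and long samples are comparable.
    scale = 1000 / max(len(text), 1)
    return {name: len(re.findall(p, text)) * scale for name, p in patterns.items()}


sample = "First point.  Second point, third, and fourth."
for name, rate in formatting_features(sample).items():
    print(f"{name}: {rate:.1f}")
```

Scaling by text length is one possible design choice; it keeps the numbers comparable across samples, but rates computed from very short texts will be noisy, which is one more reason to want as much input as possible.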
Output
What should such a tool output?