AI Can Guess Your Location Based on Writing Style: A Modern Twist on Linguistic Analysis
Can AI detect where you are from just by analyzing your writing style? One of my favorite films is My Fair Lady, based on George Bernard Shaw's Pygmalion. In this classic tale, Audrey Hepburn's Eliza Doolittle transforms from a Cockney flower girl into a convincing duchess in a glittering gown, while Rex Harrison, as Professor Henry Higgins, talk-sings his way through the narrative. The plot turns on whether someone's accent can be changed enough to fool the upper class. What My Fair Lady illustrated in 1964, through its blend of music, dance, and dialogue, was something linguists have long understood: language can reveal your origins without you ever stating them.
This led me to wonder: could large language models (LLMs) serve as virtual Professor Higginses and pinpoint where a person is from just by their writing? To explore this, I developed HIGGINS GPT, an AI tool designed to analyze linguistic cues in short text samples. These cues include word choice, sentence structure, topic selection, grammatical peculiarities, spelling, idioms, metaphors, cultural references, and emotional tone. Even something as subtle as sentence length can offer a hint. For instance, longer, more complex sentences might point to a writer fluent in a Romance language, while shorter, more clipped sentences may suggest a different background. HIGGINS GPT combines these cues to make an educated guess about the writer's nationality or region.
To create HIGGINS GPT, I focused on training the model to recognize and distinguish the linguistic patterns associated with different parts of the world. This involved curating a diverse dataset of texts from a wide range of regions and cultures, so that the AI could identify a broad spectrum of linguistic markers. The model was then fine-tuned to improve its accuracy in detecting these subtle differences.
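To make two of the cues mentioned above concrete (spelling and sentence length), here is a minimal Python sketch. It is not how HIGGINS GPT works internally; the word lists, the function names `extract_cues` and `guess_spelling_convention`, and the simple majority rule are all assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
# A real system would need far larger, validated vocabularies.
BRITISH_SPELLINGS = {"colour", "favourite", "centre", "realise", "neighbour"}
AMERICAN_SPELLINGS = {"color", "favorite", "center", "realize", "neighbor"}


def extract_cues(text: str) -> dict:
    """Return a few surface-level stylistic cues from a short text sample."""
    # Crude sentence split on ., ! and ?; enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "british_spellings": sum(w in BRITISH_SPELLINGS for w in words),
        "american_spellings": sum(w in AMERICAN_SPELLINGS for w in words),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


def guess_spelling_convention(cues: dict) -> str:
    """Label the sample by whichever spelling list matched more often."""
    british = cues["british_spellings"]
    american = cues["american_spellings"]
    if british > american:
        return "british"
    if american > british:
        return "american"
    return "unknown"


if __name__ == "__main__":
    sample = "My favourite colour is blue. I live near the town centre."
    cues = extract_cues(sample)
    print(cues, guess_spelling_convention(cues))
```

With the sample sentence in the script, the sketch labels the text "british" because three words match the British list and none match the American one. A single cue like this is easy to fool, which is why the approach described above relies on many cues at once.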
To test HIGGINS GPT, I ran several experiments using comments from social media platforms. These platforms are a rich source of diverse writing styles, which makes them ideal for this kind of analysis. The results were intriguing: the AI was often able to predict the writer's origin accurately from their linguistic profile.
Of course, accuracy varies. A writer's level of education, multilingualism, and exposure to other cultures can all blur the signal. Even so, HIGGINS GPT showed a promising ability to offer insights into a writer's background.
The potential applications are broad. In sociolinguistics, the technology could help researchers understand how language evolves and varies across populations. In cybersecurity, it might help identify the source of suspicious activity. On a personal level, it could connect people with similar backgrounds or interests, supporting community engagement and cultural exchange.
While HIGGINS GPT is an exciting step forward, it also raises important ethical questions. Such technology could infringe on privacy, particularly if it is applied without consent. It is crucial to balance the benefits of linguistic analysis against the need to protect individual rights and data.
In summary, HIGGINS GPT shows that AI can often infer a person's geographical origin from their writing style. This technology, rooted in the principles of linguistics, opens new avenues for research and practical applications, but it also demands careful attention to its ethical implications.