Criminals, linguistics, literacy and attribution | WeLiveSecurity

Linguistics and some form of textual analysis can be helpful in analysing malware and scams. Regional attribution, though, still requires caution.

In an article I wrote recently for Infosecurity Magazine, Spelling Bee (Input from the Hive Mind), I touched on the topic of textual analysis (in a rather loose sense).

This was in response to some comments implying that it’s a good indicator of scamminess when a message uses US or UK spellings inappropriate to the region from which it’s supposed to originate. The main thrust of that part of my article was that the use of the -ize or -ise suffixes is not as cut and dried as some spelling and style checkers would have you believe, and that the use of Americanisms is not an infallible guide to origin in the 21st century. However much some of us might regret their encroachment into UK English…

In fact, the pseudo-French replacement of all instances of -ize with -ise is a fairly recent publishing fad with which many writers and publishers in the UK have never chosen to conform. And, of course, with the globalization of many commercial entities, it’s not uncommon for people whose first language is not English to learn the language from US-oriented sources, and that may also influence a company’s regional preference, linguistically speaking.


A Spelling Bee searching for its dictionary

Indeed, while poor English (of whatever regional variety) is often a clue that Something Is Phishy, even august financial institutions might sometimes slip up, or use unexpected regional idioms.

One point I made, however, was that ‘impeccable presentation doesn’t prove legitimacy’ and that other cues and clues may be more reliable.

While the recent report in The Register of two men arrested in connection with the CoinVault ransomware doesn’t provide any information related to phish-type social engineering and linguistic manipulation, it’s interesting to see that part of the case against these suspects seems to be based on the inclusion of phrases in ‘perfect Dutch’ sprinkled throughout the binary, indicating a Dutch connection.

I don’t have any privileged information about the case, and no reason at all to believe that the Dutch NHTCU’s conclusions aren’t justified. It is worth bearing in mind, though, that anti-malware analysts are generally careful to avoid drawing ‘authoritative’ forensic conclusions, in particular about attribution of the origin of malicious activity on the basis of linguistics, cultural references, timestamps and other attributes that might provide useful clues, but might also be deliberately introduced to mislead analysts for political or other reasons. Irritating as that caution may sometimes be to journalists and others, there are often good reasons for it.
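For readers wondering how phrases in a binary come to light in the first place, here is a minimal Python sketch of the general idea: pulling runs of printable ASCII out of a blob of bytes, roughly what the Unix strings utility does. The function name and the sample bytes (including the Dutch-looking phrases) are invented for illustration and have nothing to do with the actual CoinVault sample or the NHTCU’s methods. The caveat above applies in full: anyone can plant strings like these, so finding them says nothing certain about who wrote the code.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    # Printable ASCII spans 0x20 (space) to 0x7e (tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical bytes standing in for a binary; NOT taken from real malware.
sample = (
    b"\x00\x01Uw bestanden zijn versleuteld\x00\xff\x10"
    b"ab\x00betaal nu\x00"
)

found = extract_strings(sample)
for text in found:
    print(text)
```

The two-character run "ab" is dropped because it falls below the minimum length, which is the usual way of filtering out noise. Deciding whether the surviving phrases are ‘perfect Dutch’, and what (if anything) that implies, remains a job for a human analyst.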

David “Two bees or not two bees?” Harley