Malware simulators are not suitable for product testing

As previously mentioned here, I presented my 16th Virus Bulletin paper at the Virus Bulletin 2017 conference in Madrid on Thursday 5th October. The paper, ‘The (testing) world turned upside down’, is now available here by kind permission of Virus Bulletin. [1]

The paper doesn't address everything I ever wanted to say about product testing, but it does touch on quite a few of the topics that continue to raise my blood pressure, so my intention is to use it as a springboard from which to talk about some of those issues in a series of blog articles. In the first of those articles, I'm going to talk about malware simulation. Simulators are not always used specifically in product testing, and (I hope) they're not usually used with the deliberate intention of misleading, but the harsh truth is that people do tend to expect more from a simulation than it can deliver. [2]

How to bias testing

Here’s a (pseudonymous) quote from a 1993 vintage article in Virus News International [3] on how to bias testing. It's amazing, given the age of that article, how many of the techniques it describes (or their offspring) can still be seen in (mis)use in 2017:

“Don’t use viruses at all. Use simulated viruses. Assume that the simulation is perfect and that therefore all products should detect them.”

Subsequently, a well-known open letter [4] from the year 2000 objected to simulated viruses, with particular reference to the Rosenthal Utilities, a suite of programs considered by the security industry to be of no practical use and in some circumstances potentially dangerous. The signatories to that letter (and yes, I was one of them) noted that:

‘…using simulated viruses in a product review inverts the test results … because:

It rewards the product that incorrectly reports a non-virus as infected.
It penalizes a product that correctly recognizes the non-virus as not infected.’
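The inversion described in that letter can be shown with a little arithmetic. The following is a minimal sketch, not a real test methodology: the two products, their verdicts and the sample set of five simulated (non-malicious) files are all invented for illustration.

```python
# Toy scoring of a 'test' whose sample set consists only of simulated
# (non-malicious) files. Products and verdicts are hypothetical.

def naive_score(verdicts):
    """Fraction of samples flagged, as scored by a tester who assumes
    that every sample is real malware."""
    return sum(verdicts) / len(verdicts)

def false_positive_rate(verdicts, is_malicious):
    """Fraction of the non-malicious samples that were flagged."""
    clean_verdicts = [v for v, m in zip(verdicts, is_malicious) if not m]
    return sum(clean_verdicts) / len(clean_verdicts)

ground_truth = [False] * 5   # simulated viruses: none is really malicious
product_a = [True] * 5       # flags every simulation
product_b = [False] * 5      # correctly ignores every simulation

for name, verdicts in (("A", product_a), ("B", product_b)):
    print(name,
          "naive detection score:", naive_score(verdicts),
          "false positive rate:", false_positive_rate(verdicts, ground_truth))
```

Scored naively, product A gets full marks and product B gets none, yet every one of A's 'detections' is a false positive and B made no mistakes at all.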

(Bizarrely, one of the signatories of that letter still represents a testing organization that has recently (reportedly) conducted a sponsored test that uses ‘created’ or ‘simulated’ malware, as well as disabling layers of functionality. [5])

Around the time I first started to work closely with ESET, we saw yet more simulated viruses, courtesy of the ‘Untangled’ so-called Anti-Virus Fight Club test, [6] standing as representative of many other tests and reviews where the tester used:

…simulated viruses, virus fragments, or even “virus-like” programs, whatever that means.

EICAR: tests for installation, not detection

Strangely, the Untangled test used several instances of the EICAR test file, which is not exactly a simulation and is certainly not suitable for comparative testing [7] unless the test objective is along the lines of ascertaining ‘does this product detect the EICAR test file?’

True, most malware isn’t viral any more, but I don’t think that matters in this context. Nor, of course, is the EICAR file a virus, nor does it in any respect behave like one. In fact, the EICAR file isn’t really a simulation so much as an installation check, rather like the AMTSO security features check utilities (many of which use the EICAR file).

The EICAR file and the AMTSO (Anti-Malware Testing Standards Organization) feature settings checks ‘work’ because much of the industry agrees to recognize them, not as a simulation but as an indication that a security product is installed and ‘awake’, though it’s not a guarantee that it’s fully functional.

Perhaps it's a little unfortunate that AMTSO chose to include installation tests on its web site, in that – because the organization is so much associated with 'proper' comparative and certification testing – people might think that AMTSO's offering of installation tests in some way legitimizes their use in comparative tests. However, some people and organizations certainly seem to find installation tests useful, and AMTSO has made its views quite clear in a guidelines document on the Use and Misuse of Test Files.

Simulations and FPs

There's been an avalanche lately of malware simulators (especially ransomware simulators) that do not differ much from those installation checks and old-school virus simulators. Some virus simulators relied on embedding non-functional snippets of viral code, expecting security software to be triggered by recognition of a simple scan string, rather than by more thorough and complete identification mechanisms. However, this approach was based on a basic misunderstanding of how even stone-age signature detection worked. There was never a single 'universal' static signature for each malicious sample, used by all scanners to identify it.
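To see why an embedded snippet proves so little, consider the deliberately crude sketch below. It assumes two hypothetical scanners (`scanner_x` and `scanner_y`) that each picked a different byte sequence from the same sample as a scan string. The 'sample' is just a run of harmless bytes, and real products never worked quite this simply.

```python
# Harmless bytes standing in for the code of a real program.
SAMPLE = bytes(range(64))

# Two hypothetical scanners that chose different byte sequences
# from the same sample as their scan strings.
SCANNER_SIGNATURES = {
    "scanner_x": SAMPLE[4:12],
    "scanner_y": SAMPLE[40:52],
}

def detects(scanner, data):
    """Crude scan-string match: is the scanner's signature present in data?"""
    return SCANNER_SIGNATURES[scanner] in data

# A 'simulation' made by embedding one fragment of the sample in padding.
simulation = b"\xff" * 16 + SAMPLE[0:16] + b"\xff" * 16

for scanner in SCANNER_SIGNATURES:
    print(scanner,
          "detects sample:", detects(scanner, SAMPLE),
          "detects simulation:", detects(scanner, simulation))
```

Both toy scanners detect the full sample, but only `scanner_x` 'detects' the simulation, purely because the fragment happens to contain its scan string. That says nothing about which scanner is better at detecting the real thing.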

Nowadays, sole (or even frequent) reliance on static signatures in mainstream security products is, in any case, ancient history, or a figment of the imaginations of so-called next-gen marketroids.

Here's an extract from a paper that we (Lysa Myers, Eddy Willems and I) presented at AVAR several years ago, describing Doren Rosenthal's notorious (and by that time long-discredited) virus simulation:

"disliked by AV researchers in general, for a number of reasons:

They were based on the false premise that a product that detects a real virus should also detect a simulation of that virus.
The previous false premise is also based on the equally false premise that a virus signature is some unique footprint and that all security software uses the same signature. There is, of course, no reason why random virus fragments should be detected as if they were a real virus, and even less reason for several vendors to use the same signature."

To be fair, the registered version of the Rosenthal utilities did, in fact, include "a real if relatively innocuous virus."

This could therefore be described as a rare example of a simulation displaying something close to viral behaviour. However, the inappropriate use of Rosenthal’s programs by some testers in comparative testing "forced the AV industry to add detection of yet another non-virus to the impressive selection of garbage files, intendeds and other inappropriate objects it needs to detect if a product isn’t to be penalized for not detecting what it shouldn’t need to detect."

Present-day ransomware simulators tend to take a more generic approach: rather than impersonating specific malware, they may attempt to simulate malicious behaviour. However, they really don't behave much like real ransomware. In the Good Old Days, it was at least arguable that 'viral' behaviour (i.e. self-replication) was a usually-reliable indicator of malicious behaviour, which tended to make heuristic detection of 'real' viruses comparatively easy. However, ransomware simulators usually focus (understandably) on file encryption, which may or may not be malicious. After all, it's also a primary function of some security software! Ransim illustrates exactly this difficulty: in order to behave ethically, the simulator is designed to introduce its own files and then encrypt them.

But avoiding encrypting 'existing files on disk' is pretty much the opposite of what real ransomware does. There's no reason why a process that only encrypts files it has just created itself should be diagnosed as malicious, let alone as ransomware. Encryption is a legitimate function: it's only the context that makes it malicious. Otherwise, Microsoft Word's operations when a user chooses to password-protect an existing document should also be detected as ransomware, which is patently ridiculous. However, encryption covertly performed by ransomware on files that are accessed and modified without the authorization of their owner clearly is malicious.
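The role of context can be illustrated with a toy event model. This is a sketch under stated assumptions, not a description of how any real product works: the `FileEvent` record, the `user_authorized` flag and the process names are all hypothetical, and a real product has to infer authorization from far messier evidence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileEvent:
    process: str
    action: str                     # "create" or "encrypt"
    path: str
    user_authorized: bool = False   # hypothetical: did the file's owner ask for this?

def unauthorized_encryptions(events):
    """Return encrypt events on files the process did not create itself
    and that the owner did not authorize."""
    created = set()   # (process, path) pairs seen so far
    flagged = []
    for ev in events:
        if ev.action == "create":
            created.add((ev.process, ev.path))
        elif ev.action == "encrypt":
            own_file = (ev.process, ev.path) in created
            if not own_file and not ev.user_authorized:
                flagged.append(ev)
    return flagged

events = [
    FileEvent("simulator", "create", "/tmp/sim/a.txt"),
    FileEvent("simulator", "encrypt", "/tmp/sim/a.txt"),
    FileEvent("word_processor", "encrypt", "/home/u/report.doc", user_authorized=True),
    FileEvent("ransomware", "encrypt", "/home/u/photos.zip"),
]

flagged = unauthorized_encryptions(events)
print([ev.process for ev in flagged])
```

In this model only the process that encrypts someone else's existing file without authorization is flagged. The simulator, which only touches files it has just created, looks no more malicious than the word processor.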

So how does a simulator do what real ransomware does without being real malware? I am not convinced that's even realistically possible to do ethically and in absolute safety, but it can't be done usefully by blocking 'operations' without taking into account what suspected malware is really doing. Modern security software is not generally concerned with simplistic signature scanning, but with techniques that analyse how software behaves in order to establish whether it has some possible or probable malicious intent. It's the ability to distinguish between the legitimate and malicious execution of an operation that makes a competent security program effective. While it's too much to hope that security software will always get it right, it seems to me that there are only two ways that it can always detect simulated malware:

By flagging as suspicious every operation that might just be malicious, irrespective of the inevitable false positives and, even worse, leaving the user to decide whether to block or accept execution.
By detecting the simulator rather than the malicious process it's intended to emulate. In general, those who generate simulators cry foul when this is done, and it does, of course, invalidate the purpose of the simulation. But that's a problem that arises from the dubious nature of the simulation approach, not from the incompetence or malfeasance of security vendors.

It's not just that malware simulators generally fail to emulate real malware successfully. The security industry hasn’t agreed to recognize them as having some sort of EICAR-like functionality, either. If they were to be flagged by mainstream anti-malware, it probably ought to be as something analogous to a PUA (Potentially Unwanted Application). Something like ‘Quasi-Simulator Non-Malware’. Which is a polite way of saying ‘brain-dead intentional false positive’.

Pussyfooting

This might be the right time for me to introduce you to my cat.

We call him Ike. Short for EICAR.COM. He is a simulated tiger, but that doesn't make him suitable for tiger detection testing. (We also have a dog called Little Bobby Tables.)

Sims and Sinners

A simulation of an attack is, by definition, not a real attack, even if it's a 'good' simulation (which would be a novelty). It is someone’s conception of what an attack should look like, but is not truly malicious. In general, the security industry avoids detecting simulated attacks, as not only are they technically false positives, but they're also founded on dubious assumptions about how malware and anti-malware programs work. Simulations are usually based on a snapshot view of some malware, and disregard differences in malware and anti-malware technologies. They almost invariably mislead people who expect them to be accurate. In fact, they prove little or nothing about what a product detects except that it detects a specific simulation.

Companies choosing to detect simulators are like those companies who used to detect garbage files, simulations, etc. known to be present in commonly-used sample collections, to avoid being penalized in poor but influential tests. In other words, they avoid competitive disadvantage, but legitimize poor tests. In general, products that detect a simulation can and do merely detect the presence of a simulator, even if they also detect the generic type of breach the simulator is intended to represent or impersonate.

Sadly, simulations are not good indicators of how good a product is at detecting real malware, still less of how it compares in performance to other products.

References

[1] Harley, D. The (Testing) World Turned Upside Down. Virus Bulletin International Conference 2017. https://geekpeninsula.wordpress.com/2017/10/09/virus-bulletin-conference-paper-2017/

[2] Gordon, S. Are Good Virus Simulators Still A Bad Idea? 1996. http://www.sciencedirect.com/science/article/pii/1353485896844047

[3] Virus News International, November 1993, pp.40-41, 48. http://www.softpanorama.org/Malware/Reprints/virus_reviews.html

[4] Wells, J. et al. Open Letter. 2000. https://www.cybersoft.com/static/downloads/whitepapers/Open_Letter.pdf

[5] Ragan, S. Cylance accuses AV-Comparatives and MRG Effitas of fraud and software piracy. CSO Online. 2017. http://www.csoonline.com/article/3167236/security/cylance-accuses-avcomparatives-and-mrg-effitas-of-fraud-and-softwarepiracy.html

[6] Harley, D. Untangling the Wheat from the Chaff in Comparative Anti-Virus Reviews. Small Blue-Green World. 2007. https://geekpeninsula.files.wordpress.com/2013/09/av_comparative_guide.pdf

[7] Harley, D.; Myers, L.; Willems, E. Test Files and Product Evaluation: the Case for and against Malware Simulation. AVAR Conference 2010. http://www.welivesecurity.com/media_files/white-papers/AVAR-EICAR-2010.pdf