Signatures, product testing, and the lingering death of AV

Virus Bulletin recently published a paper of mine on Hype heuristics, signatures and the death of AV (again). If you’re one of the people who’ve followed my writing over the last couple of decades (hi, how are you both?) the general thrust of the paper won’t surprise you. After all, it’s only a couple of years since Larry Bridwell and myself asked in a paper for AVAR 2013 ‘Whatever happened to anti-virus?‘, looking at all the misconceptions and distortions the anti-malware industry has attracted. Especially the one about how scanners are supposed to detect only known malware, using static signatures. Er, no…

It’s not that the anti-malware industry – or the anti-virus industry, as its critics continue to call it, possibly in the hope that no-one will notice that it detects a good deal more than viruses and has done for many years – is beyond criticism. Indeed, perhaps the time has come for an informed re-evaluation of its role and context in the security industry. However, it does irritate me when criticism is based on lack of knowledge or, even worse, a desire to mislead in the hope of diverting market share from mainstream anti-malware towards their own product.

Martijn Grooten says in his introductory blog, that I have ‘a long history in the security industry — so long that he might even remember the days when anti-virus was indeed purely signature-based — and [have] heard it all before.’ Well, that depends on how you define the security industry: I didn’t actually start consulting for a vendor until I left the UK’s National Health Service in 2006, but my first dabblings in computer security go back to around 1986, when I worked at a medical school in London, though I didn’t have first-hand experience of malware until I moved to what is now Cancer Research UK. But I have certainly heard these misleading assertions before, far too many times.

There were tools around at that time that certainly weren’t signature-specific and had appeared pretty early on in the game, but I suspect that Martijn was thinking more of unequivocal known-malware scanners. I certainly remember the days when Virus Bulletin itself published 16-byte hexadecimal patterns that could be added to anti-virus programs and other utilities for detection of specific malware. (The fact that such lists were considered useful in a monthly magazine tells you something about the slow pace of malware distribution in those days.)

I haven’t gone all through my hard copy back numbers to find out exactly when VB stopped that practice, but it could be said that it was doomed almost from the beginning with the rise of polymorphism, and as scanners moved from simple string scanning, supplemented by wildcards and regular expressions, towards more complex algorithmic approaches. Well, doomed is an oversimplification – the fact that the feature continued into the second half of the 90s, when heuristic scanning was pretty well established, perhaps demonstrates that scanning technology was always multi-layered.

The Virus Bulletin archive page notes that “Industry veteran David Harley never tires of responding to such criticism.” Actually, I’m not sure about that: I am in fact very tired of it. But I do seem to have responded to it more times than I care to think about. In fact, while writing this article I suddenly realized that the same issue crops up in one of the first articles I wrote (with Andrew Lee) for ESET: Heuristic Analysis— Detecting Unknown Viruses.

Some of the most persistent myths in computing relate to virus and anti-virus (AV) technology. The widely-held belief that AV software can only detect specific, known viruses has been around since the early days of AV research. It wasn’t altogether true then; some of the first AV programs weren’t intended to detect specific viruses, but rather to detect or block virus-like behavior, or suspicious changes in files. And, it’s definitely not true now.

Nor was it true the first time I looked at the issue, long before I started to work with ESET. But there was a little more to this latest paper than the same old same old. The ‘resolutely independent security sceptic Kevin Townsend’ recently made some better-founded observations about his own distrust of the security industry that I felt deserved to be addressed. He has subsequently responded to my response (try and stay with the plot, please!) here: Still critical of the AV industry… after all these years. As before, many of his concerns centre on product testing and the use of testing results to imply perfect protection.

As I see it, the problem is this: it’s perfectly possible for a product to detect 100% of all the malware included in a test set, and perfectly reasonable for the vendor of a tested product to capitalize on a good result. However, test sample sets in mainstream tests are usually numbered in tens, maybe hundreds. Test sets have actually tended to shrink as the better testers have focused on dynamic testing, which is much more resource-intensive than old-school static testing, but gives a more accurate picture – if done properly! – of a product’s ability to detect that sample than static testing. Which is, of course why the Anti-Malware Testing Standards Organization AMTSO has promoted it so vigorously.

But if a test is based on a million samples – a quantity that would not really be practical except as a monstrous exercise in archaic static testing –the same problem persists. The number of known malicious samples is much, much greater than a million: let’s say hundreds of millions. And let’s not even think about the unknown samples…

It’s a bit of a sidebar, but here’s an oversimplified picture of why static versus dynamic matters. Suppose I give you a packet of assorted fruit candies (fruit sweets, for those of us who speak UK English…) and ask you to test that they’re the ‘right’ flavour. How might you go about it?

  1. You could look at the wrapper or even the sweet itself and assume that ‘if it’s red it’s strawberry, if it’s yellow it’s lemon, if it’s green it’s earwax…’ (Sorry, a bit of a Harry Potter moment there…) That’s (very) roughly analogous to first-generation string scanning, where a fairly superficial look at the program from the outside reveals characteristics that are known to characterize malware. Even then, you don’t necessarily need to have seen an identical sweetie to identify it. Different confectioners will tend to follow similar conventions, so there’s a good chance that if it’s yellow, it’s lemon-flavoured. Even a first-generation scanner might recognize generically that a hex string or a sequence of instructions is close enough to a known-malware signature (I hate that word!) to indicate a close relationship and malicious intent.
  2. You might run a chemical analysis against each sweet you test in the hope of finding out whether the colouring used matches the flavourings used. That’s actually closer to how modern scanners are likely to work, using a variety of techniques, including techniques that allow it to run a program in a safe virtual environment in order to see what it really does. Which suggests that the difference between static and dynamic isn’t as cut and dried as I’ve implied, but I did say this description would be oversimplified. In fact, it’s not far removed from the ways in which first-generation scanners worked: the real difference is the range of analytical techniques used by a modern scanner, from conceptually simple generic detections to sophisticated algorithms taking into account a wide range of factors.
  3. You can taste the sweets. Well, maybe you check for toxic substances first. (In fact, a significant factor in dynamic testing methodology is balancing the need to fool malware into thinking it isn’t sitting on a lab machine against the need to avoid spreading malware outside the lab environment.) That’s the dynamic end of the spectrum.

What have you proved when you’re lying in the ambulance with a bad case of poisoning, or at least have established that some or hopefully all the sweets in that bag are correctly labelled? Well, you haven’t proved that all the other sweets in all the other bags in the world are also correctly labelled (or not). You haven’t proved they’re non-toxic, either.

The samples that are used are a compromise, forced on the tester because testing a product against the total population isn’t practical. So for the test to be any use at all, the tester must persuade you that he used a representative sample. Maybe that sample set is as representative of the total sample population as the tester wants you to think, but how do you prove it? (An awful lot of tests – or a lot of awful tests – still simply assume that it is, and don’t give you any information on how the samples were selected.)

So it’s fine to say a product scored 100% in a given test (assuming it did, of course). But if marketing materials give the impression that this is the same as 100% detection of all malware or even all known malware, you shouldn’t trust them any more than Kevin does. (This may be the nearest we’ve ever been to agreeing on a testing issue…) That applies not only to vendor marketing, but also to testers: yes, most mainstream testers are in business and have to do marketing too.

Of course, a misleading claim may not be intentionally misleading, and that doesn’t necessarily mean that the product isn’t any good either, but it does mean that it isn’t marketed as accurately as it should be.

Author David Harley, ESET

Follow us

Copyright © 2016 ESET, All Rights Reserved.