Imperva, VirusTotal, and whether AV is useful

Introduction

I kind of hoped that the fuss about Imperva's somewhat discredited quasi-test, first publicized in Novermber, claiming that anti-virus detects less than 5% of new malware, would have died away by now. After all, there was enough criticism at the time to cause Imperva itself to modify its position both in the version of its report that it eventually made public (the version originally released to journalists seems to have been substantially different) and in a subsequent blog, However, it seems that some sectors of the media, notably the New York Times and The Register, haven't quite kept up with the plot. A second wave of reports ignores the methodological holes in Imperva's report and recycles its dubious statistics uncritically. Why dubious? Because, despite the protests of the AV community and of VirusTotal itself, the study by Imperva and the Israeli Institute of Technology was founded on the myth that VirusTotal reports provide a reliable way of ascertaining whether an AV product does or doesn't detect a given sample of malicious software.

Executive Summary

Imperva's study frustrated not only the AV research community, but anyone who cares about accurate testing and evaluation of security products and strategies. VirusTotal is a great service, but it's intended to give some idea of whether a given file is likely to be malicious. It simply isn't designed or suitable for evaluating the ability of one or more security products to detect it. At best, it tells you whether those products are capable of detecting it using the particular program module and configuration used by VirusTotal. (For more detail, see the November 28th section in the Timeline below.)

Using those data in that context to evaluate detection capability is a bit like evaluating the musicality of a rock band by listening to the drummer's backing vocals. Some drummers sing very well, but there's a bit more to most performances than the doo-wop in the background. And even a basic commercial anti-virus package does a lot more than look for static signatures extracted from malware already found and analysed. (Security suites include several other layers of security that reduce the chances that that malware will infect a system. I said reduce, not provide 100% protection!) The Imperva report leapt to a conclusion based on spurious statistics, and the company then tried to tell us that their claim (that less than 5% of new malware is detected by anti-virus) is borne out an AV-Test report. While the actual report wasn't linked, the screenshot included in Imperva's blog seemed to show the average industry detection results in three scenarios:

0-day malware attacks: avg = 87% (n=102)
malware discovered over last 2-3 months: avg = 98% (n=272,799)
widespread and prevalent malware: avg=100% (n=5,000)

5%, 87%: hardly any difference at all. ;-)

In fact, what that proves is that it’s perfectly possible for products to provide protection against malware proactively, not only by way of generic/heuristic detections (behaviour analysis, sandboxing, traffic analysis and so on), effective though they sometimes are, but also in terms of increased overall security through multi-layering.

I'm afraid I'm going to quote myself:

Personally (and in principle) I’d rather advocate a sound combination of defensive layers than advocate the substitution of one non-panacea for another, as vendors in other security spaces sometimes seem to. Actually, a modern anti-virus solution is already a compromise between malware-specific and generic detection, but I still wouldn’t advocate anti-virus as a sole solution, any more than I would IPS, or whitelisting, or a firewall.

No security researcher worth listening to is going to claim that anti-virus is the only security software you'll ever need, or that their software detects 100% of known and unknown malware, but then, there are no 100% effective solutions. While there may come a time when applications and operating systems are so secure that malware detection becomes unnecessary - at least, that's what people have been telling me since the 90s! - but to dismiss AV right now as not worth paying for is naive at best, irresponsible even. If there was no money to pay AV research specialists because there was no for-fee AV, the security industry in general would know a great deal less about malware than it does right now, and we'd all be the poorer for it.

Timeline

Around the end of November 2012, Imperva circulated the first version of their report among journalists. While no journalist shared the report itself with me, some described the basic methodology and requested comment: for instance, Kevin Townsend quoted myself and David Emm (of Kaspersky) in a piece for Infosecurity Magazine.

[27th March: Bill Brenner commented Better off without AV? Not yet. While he didn't comment on the essential invalidity of Imperva's 'test' he did make one or two valid points about Imperva's conclusions.]

November 28th: According to Kevin:

...This is the stark result of an analysis of 80 new viruses found and tested by Imperva. The company used the TOR network to scour the dark net to find the latest malware, and then developed an automated testing and monitoring process to test the samples using the VirusTotal website...The results obtained by Imperva are not encouraging. “The average detection rate of a newly created virus is 0%,” and “Typically, it takes four weeks for just 25% of antivirus vendors to detect a new virus.”

While the full article is worth reading, this was my own initial response, summarizing why VirusTotal should not be used as a means of measuring anti-virus detection performance:

Virus Total was never intended as a test of scanner performance, and isn't suitable for the job. This means, for instance, that some products will flag Possibly Unwanted applications as malware: on some products, this is because of default settings, in other cases, because VT have been asked to turn on a non-default option. In other words, some products as configured by VT may never detect certain samples because they're not unequivocally malicious. Others may be able to detect samples on-access, but not on-demand, because not all approaches to behaviour analysis and heuristics can be implemented in static/passive scanning.

If this was a real test, I'd be doubtful of its accuracy because there was no way of validating their results. We have no idea what samples they're looking at and whether they're correctly classified as malware, still less about their prevalence. In the absence of those data and of real testing that checks detection against on-demand and on-access scanning and using like-for-like , there is more than a whiff of marketing about this exercise.

Virus Total itself has spoken out several times against the misuse of the service as a substitute for testing. In fact, Julio Canto of Virus Total and I put together a paper a year or two ago that addresses this, among other issues:

http://go.eset.com/us/resources/white-papers/cfet2011_multiscanning_paper.pdf

Here are some extracts from that paper that cover most of the ground:

VirusTotal was not designed as a tool to perform AV comparative analyses, but to check suspicious samples with multiple engines, and to help AV labs by forwarding them the malware they failed to detect....

VirusTotal uses a group of very heterogeneous engines. AV products may implement roughly equivalent functionality in enormously different ways, and VT doesn’t exercise all the layers of functionality that may be present in a modern security product.
VirusTotal uses command-line versions: that also affects execution context, which may mean that a product fails to detect something it would detect in a more realistic context.
It uses the parameters that AV vendors indicate: if you think of this as a (pseudo)test, then consider that you’re testing vendor philosophy in terms of default configurations, not objective performance.
Some products are targeted for the gateway: gateway products are likely to be configured according to very different presumptions to those that govern desktop product configuration.
Some of the heuristic parameters employed are very sensitive, not to mention paranoid.

Conclusion
VirusTotal is self-described as a TOOL, not a SOLUTION: it’s a highly collaborative enterprise, allowing the industry and users to help each other. As with any other tool (especially other public multi-scanner sites), it’s better suited to some contexts than others. It can be used for useful research or can be misused for purposes for which it was never intended, and the reader must have a minimum of knowledge and understanding to interpret the results correctly. With tools that are less impartial in origin, and/or less comprehensively documented, the risk of misunderstanding and misuse is even greater.

[Also on the 28th of November, Intego blogged on the same topic: I somehow overlooked it at the time, but it makes several excellent points very cogently.]

4th December: Righard Zwienenberg commented at length on the Threatblog, making many of the same points and also addressing Imperva's initial contention that AV is not worth paying for.

6th December: the Dutch language site security.nl pointed to Righard's blog and gave right-of-reply to Imperva.

10th December: I expanded on the many objections to using VirusTotal as a substitute for testing in a blog for (ISC)2, and also addressed Imperva's somewhat evasive counterclaims as quoted at security.nl (in the same (ISC)2 article, and in English!)

13th December: Caroline Donnelly quoted Trend's Rik Ferguson in IT Pro: Rik observed that

“Simply scanning a collection of files, no matter how large or how well sourced misses the point of security software entirely..They were not exposing the products to threats in the way they would be in the wild.”

She also quoted Imperva's Tal Be'ery as saying:

"the evolving nature of security threats mean Ferguson’s recommendations may not work for every testing scenario."

Hmm. Not altogether to the point, in my humble opinion.

17th December: Imperva published a blog From A to V: Refuting Criticism of Our Antivirus Report in which it claimed to have acknowledged 'the limitations of our methodology' in the report. And, in fact, the publicly available copy of the study does include a summary of some of the issues raised subsequent to its original release. However, I didn't feel that either the blog or the modified report adequately addressed its quintessential weakness: that is a conclusion based on inaccurate data. I said so (and quite a lot more) in the article Imperva-ious to Criticism for an independent testing-focused blog.

1st January: the New York Times (in an article syndicated elsewhere) told us that The antivirus industry has a dirty little secret: its products are often not very good at stopping viruses. The article uncritically accepted the study's spurious statistics, as did Richard Chirgwin in the Register. Though both journalists did at least notice that Imperva's does have a commercial agenda. As Richard Chirgwin put it:

Imperva suggests enterprise security should devote more attention to detecting aberrant behavior in systems and servers. Which, unsurprisingly, happens to be the company’s own specialty.

January 2nd: my pseudonymous colleague Old Mac Bloggit - over at the Anti-Malware Testing blog - was totally enraged at all the misinformation and misrepresentation, and the sloppy journalism that overlooked or ignored all the material above: Journalism’s Dirty Little Secret. He wasn't the only one though: when I tweeted the link to that article, several AV people and other researchers retweeted it. While Kaspersky's Roel Schouwenberg tweeted:

Criticizing the AV industry is fine. But do it using proper research/tests. Repeat 2013 times:VirusTotal is not a testing tool.

VirusTotal's own Bernardo Quintero tweeted:

antivirus evaluation with less than 100 samples & using VirusTotal... is more like a joke than a study

Some of the media had a more balanced view, too. Tech News Daily Paul Wagenseil noted that Study Faulting Anti-Virus Effectiveness May Itself Be Flawed and gave Rik Ferguson, Graham Cluley, and an unnamed spokesman from Kaspersky a chance to comment, while Graeme Burton noted for Computing that

...the methodology of the study has been widely criticised by security specialists... VirusTotal is a website that analyses files and URLs to identify viruses, worms, trojans and other other malware – and not real-world threat exposure. Nor did the study take account of the different parameters in which the products could be run..

And for Channelnomics, Stefanie Hoffman noted that Anti-virus [is] Evolving, But Here To Stay, making the point that:

For solution providers, anti-virus by itself has long since become commoditized — its time as as profitable standalone come and gone. However, almost every solution provider will continue to carry some form of anti-virus in their portfolio, used in tandem with their own unique blend of security solutions and services.

January 3rd: Max Eddy, for PC Magazine, took a slightly different approach and asked for the opinions of professional testers (including comment from AV-Test, AV-Comparative, NSS Labs, and Dennis Labs) and found that Experts Slam Imperva Antivirus Study. Eddy remarked that:

Doom-and-gloom research like this always requires a hefty grain of salt, but after speaking with numerous industry experts an entire shaker might be necessary.

Randy Abrams (now with NSS) summed up:

It is rare that I encounter such an incredibly unsophisticated methodology, improper sample collection criteria, and unsupported conclusions wrapped up in a single PDF.

And Simon Edwards (Dennis Labs) commented as regards Imperva's claims that free AV performs better than for-fee products:

This is counter to all of our findings over many years of testing ... Almost without exception the best products are paid-for.

[I tried to move the discussion on: Going beyond Imperva and VirusTotal but with only partial success. Gunter Ollman joined in with a prime piece of anti-virus bashing on the 7th of January, to which I responded (somewhat testily) on 8th January in The death of AV. Yet again. And quoted Pierre Vandevenne at some length in The Posthumous Role of AV. Lysa Myers added another dollop of commonsense in That Anti-Virus Test You Read Might Not Be Accurate, and Here’s Why. And that is probably as far as I'll take the timeline in this blog article, but I don't think it'll be the end of the controversy.]

Conclusion

Forget about offending the AV industry if you like - no-one else worries about it - but consider whether you want to base your security strategy (at home or at work) on a PR exercise based on statistical misrepresentation and misunderstanding. Don't look for The One True [probably generic] Solution: look for combinations of solution that give you the best coverage at a price you can afford.

I'm thinking primarily about business users here, but the principle applies to home users too: the right free antivirus is a lot better than no protection, but the relatively low outlay for a competent security suite is well worth it for the extra layers of protection.

David Harley CITP FBCS CISSP
ESET Senior Research Fellow