AV-Comparatives, one of the major anti-malware testing organizations, has just announced its retrospective test for November. [Link removed - no longer available (2017)] Retrospective or "frozen" testing assesses the ability of one or more products to detect threats proactively, using techniques such as advanced heuristics rather than signature-based detection.

The test used new and unique samples received between 4th and 31st August, while the 16 products tested were "frozen": that is, they didn't receive updates or signatures after the 4th. This means, of course, that they could only detect these samples using proactive or generic techniques, since they had no access to signatures generated after the samples were discovered. That makes it a fair test of the detection capabilities of a product with good passive heuristics. It was an on-demand test, using static analysis. For advanced heuristics that use some form of behaviour analysis, dynamic testing might be fairer: however, it would be extraordinarily time- and resource-intensive to run a full dynamic test with such a large sample set. (Which is probably why fully dynamic testing is still so rare.) A really interesting feature of this test, though, is that the results take into account not only proactive detection, but also false positives.

As it happens, I've been doing quite a few presentations on testing in the past few weeks, and one of the points I've been making that usually generates lively discussion concerns the continual trade-off between maximizing "true positives" and minimizing false positives (FPs). (Of course, that's also a theme that features strongly in the AMTSO-approved document on the principles of testing.) When people talk about FPs, they usually think of embarrassing and very public incidents like Windows system files being incorrectly identified as malware. Of course, that happens: as the anti-malware industry tries to focus more and more on proactive detection, it's amazing that it doesn't happen more often. Far more common these days is the identification (or, arguably, misidentification) of corrupted or truncated samples as malware, even when such samples can't actually execute or do damage. While you can certainly argue that a program meant to be malicious should be detected even if it isn't able to deliver, there are as many views and ways of addressing that issue as there are vendors.

Then there's the issue of what constitutes a "suspicious" file. Some products come quite close to equating "executable" with "suspicious". Many come very close to diagnosing any file processed with a runtime packer as suspicious, even though there are still legitimate reasons for using runtime compression. There are justifiable reasons for taking this approach, and it's certainly fair to be cautious about scenarios like these (a rough sketch of this kind of scoring follows the list):

  • It’s packed (compressed, maybe multiply, with a runtime packer)
  • It’s packed with a known “black” packer (suggests malice)
  • It’s packed with a custom variant packer (suggests obfuscation and malice)
  • It’s packed, but was already a small executable (suggests obfuscation and malice)
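
To make that a little more concrete, here's a minimal sketch, in Python, of the kind of crude scoring such a heuristic might apply. It isn't how any particular product works, and every packer name, weight and threshold in it is an invented placeholder:

    from typing import Optional

    # Hypothetical set of "black" packers assumed (in this sketch) to appear only in malware.
    KNOWN_BLACK_PACKERS = {"examplecrypt", "evilpack"}

    def packer_suspicion_score(is_packed: bool,
                               packer_name: Optional[str] = None,
                               is_custom_variant: bool = False,
                               unpacked_size: Optional[int] = None) -> int:
        """Return a crude score: higher means 'more suspicious', not 'malicious'."""
        score = 0
        if not is_packed:
            return score
        score += 1                                  # packed at all (possibly multiply)
        if packer_name and packer_name.lower() in KNOWN_BLACK_PACKERS:
            score += 2                              # known "black" packer
        if is_custom_variant:
            score += 2                              # custom variant: deliberate obfuscation?
        if unpacked_size is not None and unpacked_size < 50_000:
            score += 1                              # small file packed anyway: why bother?
        return score

    # A small executable packed with a custom variant of a known "black" packer
    print(packer_suspicion_score(True, "evilpack", True, 20_000))   # prints 6

The point of the sketch is simply that every one of those score increments is a judgement about a class of files, not about a specific piece of malware, which is where the next point comes in.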

However, this isn't malware-specific detection. It's more like the defenses that evolved in the heyday of mass-mailers, i.e. blocking attachments with extensions like .EXE, .SCR, .COM and so on. In other words, rather than blacklisting a single malicious object, or a family of malware, a whole class of executable objects is being blocked. That's fine if the customer understands what is happening, but it potentially institutionalizes false positives. Furthermore, it's not a universal practice: maybe it should be, but that's a whole different discussion. So tests that assume that suspicious == malicious are making an inappropriate value judgement, and to see a test that takes these issues into account is very encouraging.
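
As a toy contrast between the two approaches, again in Python and with entirely made-up data: a hash blacklist blocks one specific known object, while extension blocking condemns a whole class of files regardless of whether any individual member is malicious.

    import hashlib

    # Hypothetical blacklist of specific malicious objects, identified by hash.
    BLOCKED_SHA256 = {"0" * 64}                       # placeholder, not a real malware hash
    # Class-based blocking: every file with these extensions is refused.
    BLOCKED_EXTENSIONS = {".exe", ".scr", ".com", ".pif"}

    def blocked_by_blacklist(file_bytes: bytes) -> bool:
        """Blocks a single known object (or known family member)."""
        return hashlib.sha256(file_bytes).hexdigest() in BLOCKED_SHA256

    def blocked_by_class(filename: str) -> bool:
        """Blocks an entire class of files, malicious or not."""
        return any(filename.lower().endswith(ext) for ext in BLOCKED_EXTENSIONS)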

And yes, we seem to have done pretty well, thank you for asking. We were among the front runners in detecting the test samples, and did even better in terms of minimizing false positives: that combination has put us in the very gratifying position of having the only product certified as Advanced+.

But let's be clear about this: we're not talking about absolute values here, but a continuum. Some companies have a much higher tolerance for false positives than others. And there are many other factors that you might need to consider when you evaluate a product. Certainly, you should read the full report, rather than take my word for anything. (Go to the AV-Comparatives site and check report number 20 on the comparatives page.) Still, it's always nice to find a major testing group prioritizing similar criteria to those that we value.

David Harley 
Director of Malware Intelligence