Congratulations to our friends at Virus Bulletin for yet another great conference (the 20th) in Vancouver this week. Congratulations also to our own Pierre-Marc Bureau, voted the best newcomer to the AV business at the conference.

By kind permission of Virus Bulletin, we've already put two of the papers written or co-authored by ESET researchers up on the White Papers page.

AV Testing Exposed by Peter Kosinár, Juraj Malcho, Richard Marko, and David Harley looks at the good, the bad and the ugly in testing. Here's the abstract:

As the number of security suites available on the market increases, so does the need for accurate tests to assess their detection capabilities and footprint, but accuracy and appropriate test methodology become more difficult to achieve. Good tests help consumers to make better-informed choices and help vendors to improve their software. But who really benefits when vendors tune products to look good in tests instead of maximizing their efficiency on the desktop?

Conducting detection testing may seem as simple as grabbing a set of (presumed) malware and scanning it. But simplicity isn’t always easy. Aspiring detection testers typically have limited testing experience, technical skills and resources. Constantly recurring errors and mistaken assumptions weaken the validity of test results, especially when inappropriate conclusions are drawn: for instance, when likely error margins on the order of whole percentage points are ignored, exaggerating or even reversing rankings.

We examine (in much more detail than previous analyses) typical problems such as inadequate, unrepresentative sizing of sample sets, limited diversity of samples, and the inclusion of garbage and non-malicious files (false positives), set in the context of 2010’s malware scene.

Performance and resource consumption metrics (e.g. memory usage, CPU overhead) can also be dramatically skewed by incorrect methodology, such as separating kernel and user data, and by poor choices about what constitutes ‘common’ file access.

We show how numerous methodological errors and inaccuracies can be amplified by misinterpretation of the results. We analyse historical data from different testing sources to determine their statistical relevance and significance, and demonstrate how easily results can drastically favour one tested product over the others.
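
The abstract’s point about whole-percentage-point error margins is worth pausing on. As a rough illustration (not taken from the paper, and using entirely made-up numbers for two hypothetical products), here’s a minimal Python sketch showing why a detection-rate gap of a point or so on a modest sample set usually isn’t enough to justify a ranking:

```python
import math

def wilson_interval(detections, sample_size, z=1.96):
    """Wilson score 95% confidence interval for a detection rate."""
    p = detections / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p + z**2 / (2 * sample_size)) / denom
    half_width = (z * math.sqrt(p * (1 - p) / sample_size
                                + z**2 / (4 * sample_size**2))) / denom
    return centre - half_width, centre + half_width

# Hypothetical products and results, purely for illustration.
samples = 2000                       # size of the (presumed) malware set
results = {"Product A": 1904,        # 95.2% detection
           "Product B": 1890}        # 94.5% detection

for name, hits in results.items():
    lo, hi = wilson_interval(hits, samples)
    print(f"{name}: {hits/samples:.1%} detected, "
          f"95% CI roughly {lo:.1%} to {hi:.1%}")

# On a 2,000-sample set each interval is about +/- 1 percentage point and
# the two intervals overlap heavily, so ranking A above B on this test
# alone is not statistically meaningful.
```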

Call of the WildList: Last Orders for WildCore-Based Testing? was written by David Harley and K7's Andrew Lee. Here's the abstract:

The well-documented problems with WildList testing derive from difficulties in adjusting to the 21st-century threat landscape. The (obviously overstretched) WildList Organization’s focus on self-replicating malware, which nowadays comprises just a small percentage of the whole range of malware types; the lengthy testing and validation process between the appearance of a specific malicious program and its inclusion on the list; and the availability of the underpinning test set to WildList participants are all cited as objections to the validity of WildList testing. Some vendors and testing organizations have criticized it heavily, and some vendors have even withdrawn from tests that rely heavily on it.

In line with AMTSO’s preference for dynamic over static testing, most mainstream testers have supplemented or replaced WildList testing with some form of dynamic methodology, which, done correctly, is assumed to be a better reflection of today’s user experience. So does WildList testing still have a place in testing and certification? Is it still a meaningful differentiator? If it isn’t, does that mean that sample validation is no longer considered a practical objective for testers, or is that a misreading of the AMTSO guidelines on dynamic testing?

This paper summarizes the static/dynamic debate, examining the contemporary relevance of the WildList and WildCore.

David Harley CITP FBCS CISSP
ESET Senior Research Fellow