AV Numbers Game

...I would suggest that you take any statement like "Grottyscan AntiVirus is best because it detects 200 million viruses" with a pinch of salt. Actually, a whole salt mine...

…I would suggest that you take any statement like “Grottyscan AntiVirus is best because it detects 200 million viruses” with a pinch of salt. Actually, a whole salt mine…

That Magic Lantern thing just keeps raising its head (and an ugly little head it is too, poor thing…) Earlier this week I was in Krems, Austria, for the EICAR conference,and the story was alluded to in a paper by Eric Filiol and Alan Zaccardelle called “Magic Lantern… Reloaded/Anti-Viral psychosis McAfee Case," though it was kind of peripheral to the other topics covered in that paper.

I picked up one of those other topics in a blog for SC Magazine, in particular the suggestion that the industry – or at any rate one of our competitors – exaggerates the size of the malware threat by detecting the same threat under more than one name. I'm quite clear in my own mind that this suggestion is the result of confusion between the number of threats and the number of detections (signatures, if you like) that a scanner has. I addressed that at some length for SC Mag, but the gist is this.

  • The total number of threats a company has detection for is arbitrary and subjective: it depends entirely on how you count. ESET doesn't, as a matter of policy, discuss that total as far as our engine is concerned, because it wouldn't be useful and would generally be misleading. 
  • Actually, the number of threats received on a daily basis is also pretty subjective, though it does it least give some idea of the magnitude of threats. In our case, I believe it's a count of unique binaries: other companies may do it differently. Oh look, I forgot to tell you what the number is. ;-)
  • Neither count has anything to do with detection names, which is one of the points of the paper: the authors suggest that because in one case, the same threat was detected at different times using 3-4 different names (four according to the presentation, if I remember correctly), it means that the vendor in question is quoting four times the real number of threats they process. This is a fallacy:
    • Detection names evolve as we learn more about a threat.
    • They may also be different according to the detection context (for example, whether paranoid heuristics are being used).
    • Detection names might actually be intended to give information to the lab that generated them as much as or more than to the customer.
    • Most importantly, there is no "standard" way of detecting a threat: there may be many ways of detecting it, and all of them valid.

Let's try the hypothetical example of a malicious program that uses a well-known malicious obfuscator. What do you call the detection?

  • Product A might give it a name that reflects the fact that the basecode clearly belongs to a certain malware subfamily. If the family includes 150 variants, each variant might have a different name.
  • Product B might see resemblances to a wider family and name it accordingly with a name that reflects a single generation detection that picks up 100,000 subvariants.
  • Product C might give it a name that reflects the obfuscator that was used.
  • Product D might give it a name that reflects the obfuscation algorithm.
  • Product E might detect everything that uses an obfuscator – any obfuscator. So the number of samples that C, D and E detect under a single name might vary widely between each product.
  • Product F might use a name that reflects one of its other characteristics: for example, the fact that it uses autorun.inf to install itself, if it can. In that case, a single name might cover millions of samples and thousands of otherwise unrelated families.

In this instance, two products might both detect every sample of our hypothetical malware that exists. Which is best, A or F? You might argue that A is best because it has more signatures, or that F is best because it detects more samples. Actually, you can't determine which is best, because they're counting different things, and anyway the detection names don't tell you anything about numbers in either instance.

In that blog, I said that I didn't know if the competitor in question publishes the total number of threats it detects. (That was literally true: I was getting my internet connection in a hotel, and for some reason it wouldn't connect to any of the search engines I tried. It's amazing how difficult it is to research a topic without a search engine…)

Kurt Wismer suggested that it does, and came up with this link from 2006. However, that looks to me like The Register's interpretation of some figures about detections, not numbers of individual threats. [Update: I owe thanks to Kurt, who has since found the post by Jimmy Kuo to which the Register article refers, and confirms that it's about detections, not raw threat volumes.]

In any event, I would certainly suggest that you take any statement like "Grottyscan AntiVirus is best because it detects 200 million viruses" with a pinch of salt. Actually, a whole salt mine.

ESET Senior Research Fellow


Sign up to receive an email update whenever a new article is published in our Ukraine Crisis – Digital Security Resource Center