Larry Seltzer posted an interesting item yesterday.  The article on "SW Tests Show Problems With AV Detections " is  based on an "Analyst's Diary" entry called "On the way to better testing."

Kaspersky did something rather interesting, though a little suspect. They created 20 perfectly innocent executable files, then created fake detections for ten of them. Then they uploaded (actually, the blog says re-uploaded) the files to Virus Total. They claim that after ten days "up to" 14 other vendors also had detection for the files for which Kaspersky had a fake detection.

(No, ESET wasn't one of them! We don't have a marketing axe to grind here.)

In fact, several vendors only detected one or two of those files, not the whole set. Furthermore, in at least one case all the files were detected as "suspicious" rather than malicious, and only by that company's online scanner: in that context, it's possibly defensible, certainly as a short-term flag.

So is there really a problem here? When Virus Total sends a sample of what may be a missed detection to the company whose scanner missed it, they probably don't expect the company simply to add detection without validating the sample: otherwise, a false positive could cascade through the industry in a very short time. [Update: there's an interesting related blog from Hispasec, who provide the Virus Total service, at - I'm afraid it's an automatic translation, but you get the idea... (Thanks, Pedro, for calling my attention to it.)]

But as Larry suggests, there are other factors at play there, rather than a simple mindset of "if Kaspersky detects it, it must be malicious". With all due respect to the company, no-one in this business has escaped problems with false positives, as John Leyden pointed out today in an article in the Register.

In the Kaspersky blog, Magnus suggests that this is really a problem with testing. Well, he has a point. Where tests are carried out with huge sample sets and minimal time and financial resources, it's inevitable that some testers will rely on "source reputation and multi-scanning" rather than manual validation. In fact, Ján Vrabec recently cited problems with a test where a much smaller test set was "validated" according to whether four or more scanners detected it as malicious.

However, AV companies (and the major test organizations) don't, in general, rely on these approaches. Obviously, a significant proportion of analysis for a virus lab has to be manual, though where tens of thousands of samples are received daily, there has to be as much automation as possible. So it may be that there are many factors at work here, such as:

  • Timing issues - a sample may be wrongly detected at one point because of "reputation sourcing" and detection removed after manual analysis. A Virus Total analysis report is a snapshot of one moment in time: it shouldn't be seen as a blot on a company's permanent record.
  • In fact, the same problems frequently pointed out in other contexts with the inappropriate use of Virus Total also apply here. VT uses a battery of command line scanners, and different products will behave differently in different contexts. Taken into account along with the timing issue mentioned above, the numbers cited here don't seem to mean much.
  • As Larry suggests, it's not unknown for an aggressive heuristic to generate a false positive. And as a spokesman for another product suggested, products with a wide range of functionality may react differently according to what backend, gateway or desktop settings are enabled.

In fact, the problem Kaspersky has flagged may have longer-lasting effects than the company has acknowledged. Larry Seltzer's article seems to say that subsequently, a "hello world" program generated with the same compiler and compiler settings was also flagged by at least two programs. Could this be a compiler-specific heuristic implemented as a result of Kaspersky's artificial false positives? If so, Kaspersky also bear some of the blame.

The sad thing is, that genuine problems may take second place here to the easy story of "who copies from whom." By going straight to the press and presenting journalists with the innocent executables, encouraging them to use Virus Total (inappropriately, in our view) to check their conclusions, Kaspersky has made inevitable what they flagged as a risk. Reproducible experimentation is a worthy target, as long as the experimental methodology is sound, but does that really apply here?

But the real problem here is that Magnus is perpetuating a fallacy that already worries many people in the industry. He suggests that the problem here is with static testing, and that the move to dynamic testing is going to fix things. However, that's neither the problem nor the cure in this case. The problem is non-validation, and the cure is validation. Right now, lots of tests claim to be dynamic, because that's what AMTSO advocates. Quite rightly, in principle: good dynamic testing has the potential to be a more accurate test of a product's efficacy than static testing. But good static testing remains a better approach than bad dynamic testing, which few testers are doing well at the moment. A significant part of that problem is, unfortunately, still validation.

The Research Team