Anti-malware testing on the Windows platform remains highly controversial, even after almost two decades of regular and frequent testing using millions of malware samples. By contrast, there is a tiny number of threats (by comparison) that affect OS X users, which suggests that it ought to be easier to test with a sample set that represents a high proportion of all known OS X threats. But there are also fewer prior tests on which to base test methodology, so establishing sound mainstream testing is trickier than you might think, not least because so few people have experience of such tests. But as both Macs and Mac malware increase in prevalence, the importance of testing software that’s intended to supplement the internal security of OS X increases, too.

OK. That’s more or less what it says in the abstract for our Virus Bulletin paper on the topic, but that’s because it happens to be what we think. :) Of course, we encourage you to read the paper – Mac Hacking: the Way to Better Testing? But this is the second article in a blog series based on the presentation rather than directly on the paper, offering a more concise summary of our views on Mac testing issues. The previous article is here.

The testers dilemma: Static testing in a dynamic threatscape

Comparative testing in the Mac world introduces an extra competitive layer. Products are not only in competition with each other, but with Apple, in that dynamic or on-access testing is only practical with samples for which Apple hasn’t implemented signature detection yet, or with samples that XProtect sigs may not catch in a real-life infection scenario. That is, for example, in an execution context where Xprotect.plist doesn’t kick in as expected and intended, because the utility doesn’t cover a specific infection vector.

In what ways might testing a Mac be easier? What can a tester do to make testing more real-world? Are there things that can reasonably be done that would make a test less realistic yet more fair and accurate?

Reconfiguring the environment and staying real-world

Admittedly, there are ways to ‘de-patch’ the relevant components of the OS, but is that real-world testing? If the OS is able to intervene because the malware is a variant that it recognizes, that’s real-world in a sense, but it’s not whole-product testing. At a time when mainstream testers are anxious to implement whole-product testing in accordance with AMTSO guidelines, it is difficult for testers to do so on OS X, whether for technical reasons or because of resource issues. (There are analogous issues on other platforms, especially mobile devices.) And that’s OK as long as it’s clear to readers of test reviews that what they’re looking at is a compromise, not a perfect reflection of a product’s capabilities in the real world. That’s because such a test is not a reliable guide to its capability regarding malware that isn’t already known and potentially neutralized. Furthermore, static testing isn’t conclusive proof of the detection that it would offer (or would have offered) in the absence of the operating system’s own defences.

Gatekeeper can be overridden manually, though that might still be a problem during an intensive test unless the Gatekeeper response is automated, or the utility is disabled completely.

The limitations of the signature-based XProtect.plist utility mean that it can be ‘evaded’. Certainly if you’re able to test before an XProtect signature is added, it’s unlikely that the utility will interfere with that testing segment. It’s possible, though, that static batch testing could still be derailed by the inclusion of samples for which a signature does exist.

That window before XProtect covers a new threat discovered by Apple’s own or other researchers has at times been days or weeks wide, leaving machines that are unprotected by mainstream anti-malware exposed to potential infection, though improved communication between Apple and the AV industry has significantly reduced this problem by facilitating the exchange of samples. Any window of opportunity that is available to the malware is also available to the tester. But that window closes when an Xprotect signature has been added. So most Mac testers have been almost entirely focused on retrospective testing with samples that are already ‘XProtected’ or at any rate assumed to be. Unless they have access to really fresh samples, which is unusual.

Testing Longitudinally

A test run on a fully-patched system running the latest OS X versions can only be real-time if the samples aren’t detectable by active operating system utilities when opened, executed, copied and so on. Otherwise the OS won’t allow malware to execute. Testers don’t necessarily have time to spend wrestling with OS X internals. They may not have the resources to acquire, validate and test with samples before the xprotect.plist window closes, or to test longitudinally (i.e. over time) so as to accommodate that window of opportunity. Testers who do this routinely tend to be certification testers who have the capacity to run longitudinal testing because that’s essentially what their vendor customers are paying for. What we have here is an aggravated version of the dilemma already faced by testers when considering what Windows version(s) and patch levels to test with.

Testing by De-Protecting

Disabling system-integrated protection moves you away from the ‘real world’, at least as most people experience it. Not only is the system ‘untypical’ of real-life user experience, but disabling one aspect of the inbuilt security may break something else. If you’re testing in a live network scenario, you may have just introduced a risk to other vulnerable systems.

Dynamic testing on an unpatched, de-XProtected system or an earlier OS that doesn’t include Xprotect is perhaps real-world in the very limited sense that unpatched and un-updated systems do undeniably exist in the real world, though the number of up-to-date scanners that will run under OS X versions predating Snow Leopard (OS X 10.6) has already decreased dramatically. And (unless you're doing platform-specific tests) something feels very wrong about using obsolescent OS system versions in order to test an additional layer of security that can’t be tested on a current OS version. Breaking’ a current OS (or, indeed, an app under test) in order to isolate a single layer of tested functionality is a long way from the principles of whole-product testing and can’t be representative of the average customer’s real-world experience, but remains a necessary compromise when testing with older malware.

Emulation and Obfuscation

Static testing of on-demand components rather than whole product testing is still too common, sometimes because static testing is cheaper and, in principle, simpler to implement, especially with large sample sets.

In many cases modern scanners do use emulation in on-demand scanning so that a program being scanned is allowed to execute harmlessly in a virtualized or emulated environment. Nonetheless, testing that assumes that ability in all contexts is not maintaining a level playing field. If sandboxing is less effective in some commercial-grade Mac security products, that may be because malicious Mac-directed programs have less need to be technically complex than their Windows-directed siblings, and that may be reflected in comparatively laid-back anti-malware technology. It’s not surprising if vendors don’t use resource-intensive technologies that are not – or not yet – needed in the context of OS X.

Mac product testing, however, is, historically, largely based at present on the (increasingly inaccurate) assumption that the difficulties of on-access scanning need not be addressed since Mac malware is less likely to be self-protected by the kind of anti-forensic obfuscation that characterizes so much Windows malware.

Some Mac scanners do make less use of advanced proactive detection techniques than commercial-grade Windows scanners do, even when they come from a company with Windows products. This is especially true in the context of Mac-specific threats, and in fact some products may be little more than a Mac-friendly shell around a ported Windows or Linux engine with added detection of Mac and cross-platform threats. However, it's an oversimplification to imply that any but the most basic scanners make no use of behaviour analysis. Fortunately, some testers are now not only recognizing the problem and taking measures to counter it, but are also focusing less on raw detection and more on whole product testing, considering a whole range of factors that contribute to protecting OS X systems. AMTSO, the Anti-Malware Testing Standards Organization, has published a guidelines document on whole product testing: AMTSO Whole Product Testing Guidelines .

David Harley and Lysa Myers
ESET North America