Behavioural program analysis is widely used for understanding malware behaviour, for creating rule-based detectors, and for clustering samples into malware families. However, this approach is ineffective when the behaviour of individual samples changes across different executions, owing to environment sensitivity, evasive techniques or time variability. While the inability to observe the complete behaviour of a program is a well-known limitation of dynamic analysis, the prevalence of this behaviour variability in the wild, and the behaviour components that are most affected by it, are still unknown. As the behavioural traces are typically collected by executing the samples in a controlled environment, the models created and tested using such traces do not account for the broad range of behaviours observed in the wild, and may result in a false sense of security.
In this paper, we conduct the first quantitative analysis of behavioural variability in Windows malware, potentially unwanted programs (PUP) and benign samples, using a novel dataset of 7.6M execution traces recorded on 5.4M real hosts from 113 countries. We analyse program behaviours at multiple granularities, and we show how they change across hosts and across time. We then analyse the invariant parts of the malware behaviours, and we show how this affects the effectiveness of malware detection using a common class of behavioural rules. Our findings have actionable implications for malware clustering and detection, and they emphasise that program behaviour in the wild depends on a subtle interplay of factors that may only be observed at scale, by monitoring malware on real hosts.
Got a question about this presentation? To get in touch with the speakers, contact Erin Avllazagaj on Twitter at @albocoder.