USENIX Security 2018

FANCI: Feature-based Automated NXDomain Classification and Intelligence

Samuel Schüppen, Dominik Teubert, Patrick Herrmann, Ulrike Meyer

Cited 164 times

Abstract

FANCI is a novel system for detecting infections with domain generation algorithm (DGA) based malware by monitoring non-existent domain (NXD) responses in DNS traffic. It relies on machine-learning-based classification of NXDs (i.e., domain names included in negative DNS responses) into DGA-related and benign NXDs. The features for classification are extracted exclusively from the individual NXD that is to be classified. We evaluate the system on malicious data generated by 59 DGAs from the DGArchive, data recorded in a large university's campus network, and data recorded on the internal network of a large company. We show that the system yields a very high classification accuracy at a low false positive rate, generalizes very well, and is able to identify previously unknown DGAs.

USENIX Association 27th USENIX Security Symposium 1165

We also show that FANCI generalizes very well; that is, it maintains its detection quality even when applied to data recorded in a network different from the one it was trained in. Applying FANCI, we were able to identify ten DGAs not included in the DGArchive at the time of writing. We reckon that at least four of them were completely unknown, while the others most likely result from unknown seeds or are variations of known DGAs. Finally, our system is very efficient with respect to both training (5.66 min on 92,102 samples) and prediction (0.0025 s per sample), such that it is even able to perform on-the-fly detection in large networks without sampling. FANCI's lightweight feature design and its generalizability allow for versatile application scenarios, including the use of its classification as a service and its use in large-scale networks as well as on home-grade hardware.

Preliminaries

In this section, we provide a brief overview of the types of malicious algorithmically-generated domains (mAGDs) that different DGAs generate and categorize the types of domain names that occur in NXD responses due to benign causes.
This is followed by an overview of the supervised learning classifiers we use in this work. Note that throughout this work, we always use NXD response to refer to the entire UDP packet (in rare cases, TCP is used) containing the DNS response. In contrast, we use NXD to refer to the bare domain name included in such a response.

Domain Names in NXD Responses

In order to highlight the diversity in the generation schemes used by different DGAs, Figure 1 illustrates example mAGDs of six different DGAs. Whereas mAGDs generated by Kraken, Corebot, and Torpig look completely random, the mAGDs of Matsnu are concatenations of genuine English words. mAGDs of Volatile-Cedar are all permutations of the same base domain name, and Dyre generates mAGDs of equal length that consist of a three-character prefix followed by a hash-like string.

In addition to NXDs generated by DGAs (i.e., mAGDs), there are mainly three groups of benign non-existent domains (bNXDs), originating from typing errors, misconfigurations, and misuse, respectively. The misconfiguration and misuse groups belong to the class of benign algorithmically-generated domains (bAGDs). Like mAGDs, bAGDs are generated algorithmically, but they originate from benign software and serve only benign purposes. Typing-error bNXDs are caused by humans misspelling existing domain names. Misconfiguration

Figure 1 (example mAGDs):
(a) Kraken: bknllsnbfzqr.net, cdzogoexis.tv, hdozpcy.com
(b) Corebot: 3lgrupwdivsfm2w4kng2iha.ddns.net, ojyvips6klsnqpy.in, af5fmb78sbuno4c.ws
(remaining panels, labels not preserved): salt-amount-pattern.com, company-depend.com, btkindasaladmw.com

#        Feature                           Output     F(d1)   F(d2)
13 †     Contains Digits                   binary     0       1
14 * †   Vowel Ratio                       rational   0.21    0.3
15 * †   Digit Ratio                       rational   0.0     0.2
16 * †   Alphabet Cardinality              integer    12      18
17 * †   Ratio of Repeated Characters      rational   0.25    0.33
18 * †   Ratio of Consecutive Consonants   rational   0.67    0.36
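
The six string features listed in the table above are named in this excerpt but not defined, so the following is only a minimal sketch of how features of this kind could be computed from a single domain name. Everything specific in it is an assumption rather than the paper's definition: the function name `extract_features`, the dictionary keys, the choice to divide ratios by the length of the dot-free name, counting "y" as a consonant, counting only consonant runs of length two or more, and measuring repeated characters relative to the alphabet cardinality. The input is assumed to have its public suffix already removed. Because the example domains d1 and d2 are not given here, the sketch is not expected to reproduce the F(d1) and F(d2) columns.

```python
from collections import Counter
from itertools import groupby

VOWELS = frozenset("aeiou")
# Assumption: "y" is treated as a consonant.
CONSONANTS = frozenset("bcdfghjklmnpqrstvwxyz")


def extract_features(name: str) -> dict:
    """Compute six string features for one domain name.

    `name` is assumed to have its public suffix already stripped;
    remaining dots are dropped before the features are computed.
    """
    s = name.lower().replace(".", "")
    if not s:
        raise ValueError("empty domain name")

    n = len(s)
    counts = Counter(s)  # occurrences of each distinct character

    # Sum the lengths of all runs of two or more consecutive consonants.
    consonant_run_len = 0
    for is_consonant, run in groupby(s, key=lambda c: c in CONSONANTS):
        run_len = len(list(run))
        if is_consonant and run_len >= 2:
            consonant_run_len += run_len

    return {
        "contains_digits": int(any(c.isdigit() for c in s)),
        "vowel_ratio": sum(c in VOWELS for c in s) / n,
        "digit_ratio": sum(c.isdigit() for c in s) / n,
        "alphabet_cardinality": len(counts),
        # Share of distinct characters that occur more than once.
        "repeated_char_ratio": sum(1 for k in counts.values() if k > 1) / len(counts),
        "consecutive_consonant_ratio": consonant_run_len / n,
    }


if __name__ == "__main__":
    # Two of the Figure 1 example names with the suffix removed by hand.
    for example in ("hdozpcy", "af5fmb78sbuno4c"):
        print(example, extract_features(example))
```

All six values depend only on the characters of the name being classified, which matches the statement in the abstract that features are extracted exclusively from the individual NXD. A real deployment would also need a public-suffix lookup, which is left out here.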