Malware detection plays a vital role in computer security. Modern machine learning approaches have centered on domain knowledge for extracting malicious features. However, the space of potential features is large, and manually identifying the best ones does not scale, especially given the diversity of malware.
In this paper, we propose Neurlux, a not-very-deep convolutional neural network (NVD-CNN) for malware detection. Neurlux does not rely on any feature engineering; instead, it learns automatically from the raw JSON reports produced by dynamic analysis, which detail the behavior of the analyzed binary. We run our experiments on reports from two different sandboxes so that our results are not biased toward a single analysis environment. We also evaluate neural networks trained on individual features to gauge the importance of each feature in malware detection. These networks borrow ideas from the field of document classification, using the word sequences present in the reports to predict whether a report comes from a malicious binary. Finally, we compare against a recent malware classification model, MalDy. Our results show that Neurlux outperforms both the individual feature models and MalDy, indicating that it learns effectively from the raw reports.
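
To make the document-classification view concrete, the following is a minimal sketch of how a raw JSON report might be turned into a fixed-length integer word sequence before being fed to a network. It is an illustration under stated assumptions, not the preprocessing used in this work: the function names (`report_to_tokens`, `build_vocab`, `encode`), the tokenization rule, the vocabulary size, the sequence length, and the sample report are all hypothetical.

```python
import re
from collections import Counter


def report_to_tokens(report_json: str) -> list[str]:
    """Treat the raw JSON text as a document and split it into lowercase words.

    Assumption: a word is any run of letters, digits, or underscores, so JSON
    keys and values are kept alike and punctuation is discarded.
    """
    return re.findall(r"[a-z0-9_]+", report_json.lower())


def build_vocab(token_lists: list[list[str]], max_words: int = 10000) -> dict[str, int]:
    """Map the most frequent words to integer ids.

    Ids 0 and 1 are reserved for padding and unknown words respectively.
    """
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def encode(tokens: list[str], vocab: dict[str, int], max_len: int = 16) -> list[int]:
    """Convert words to ids, then truncate or zero-pad to exactly max_len."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical report fragment; real sandbox reports are far larger.
sample_report = '{"behavior": {"api_calls": ["CreateFileW", "WriteFile"]}}'
sample_tokens = report_to_tokens(sample_report)
sample_vocab = build_vocab([sample_tokens])
sample_ids = encode(sample_tokens, sample_vocab)
```

Sequences of this form could then be passed through an embedding layer into a convolutional classifier, with the label indicating whether the report came from a malicious binary.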