Binary analysis with architecture and code section detection using supervised machine learning

Bryan Beckman, Jed Haile, Rita Foster

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Scopus citations

Abstract

When presented with an unknown binary, which may or may not be complete, the ability to determine information about it is critical to subsequent reverse engineering, particularly in discovering the binary's intended use and potentially malicious nature. This paper details techniques both to identify the machine architecture of the binary and to locate the important code segments within the file. This identification of unknown binaries combines byte histograms with various machine learning (ML) techniques in a framework we call 'What is it Binary', or WiiBin. Byte histograms are simple to calculate and do not rely on file headers or metadata, allowing for acceptable results when only a small portion of the original file is available, e.g., when encrypted and/or compressed sections are present in a binary. Utilizing WiiBin, we were able to accurately (>80%) determine the architecture of test binaries with as little as a 20% contiguous portion of the file present. We were also able to determine the location of code sections within a binary by utilizing the WiiBin framework. Ultimately, the more information that can be gleaned from a binary file, the easier it is to reverse engineer successfully.
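The byte-histogram feature mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the WiiBin implementation: the function names, the 4096-byte window, and the use of Shannon entropy as a companion feature are assumptions made for illustration (entropy appears only in the keyword list). The sketch shows how a header-independent, 256-bin feature vector might be computed from a contiguous fragment of a file, and how per-window histograms could be produced as input for a classifier that locates code sections.

```python
import math
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Return a 256-bin vector of normalized byte-value frequencies."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def shannon_entropy(hist: list[float]) -> float:
    """Shannon entropy in bits per byte for a normalized histogram."""
    return -sum(p * math.log2(p) for p in hist if p > 0)


def contiguous_fragment(data: bytes, fraction: float, start: int = 0) -> bytes:
    """Take one contiguous slice covering `fraction` of the file (hypothetical helper)."""
    length = max(1, int(len(data) * fraction))
    return data[start:start + length]


def windowed_histograms(data: bytes, window: int = 4096, step: int = 4096):
    """Yield (offset, histogram) pairs; the window size is an assumed value."""
    last_start = max(len(data) - window + 1, 1)
    for offset in range(0, last_start, step):
        yield offset, byte_histogram(data[offset:offset + window])
```

In a supervised setting, each histogram (optionally with its entropy appended) would serve as one feature vector, labeled with an architecture or with a code/non-code tag; the choice of classifier is left open here because the abstract does not name one.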

Original language: English
Title of host publication: Proceedings - 2020 IEEE Symposium on Security and Privacy Workshops, SPW 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 152-156
Number of pages: 5
ISBN (Electronic): 9781728193465
ISBN (Print): 9781728193465
State: Published - May 2020
Event: 2020 IEEE Symposium on Security and Privacy Workshops, SPW 2020 - Virtual, San Francisco, United States
Duration: May 21 2020 → …

Publication series

Name: Proceedings - 2020 IEEE Symposium on Security and Privacy Workshops, SPW 2020

Conference

Conference: 2020 IEEE Symposium on Security and Privacy Workshops, SPW 2020
Country/Territory: United States
City: Virtual, San Francisco
Period: 05/21/20 → …

Keywords

  • Algorithms
  • Architecture Identification
  • Binary
  • Byte Histogram
  • Endianness
  • Entropy
  • Machine Learning
