Does Representation Matter? Evaluating IRs for LLM-based Binary Decompilation

Authors

Tomás Pelayo-Benedet, Kevin Borgolte, Ricardo J. Rodríguez

Publication

Proceedings of the 9th Workshop on Binary Analysis Research (BAR), February 2026

Abstract

Binary decompilation remains an open challenge in reverse engineering. While recent approaches have begun to leverage the capabilities of large language models (LLMs), most continue to focus exclusively on disassembly as input, ignoring the intermediate representations (IRs) employed by static binary analysis tools and traditional decompilers.

In this paper, we present the first systematic evaluation of LLM-based decompilation using hierarchical IRs. In particular, we investigate how different levels of abstraction in IRs affect binary decompilation quality across five commercial LLMs. Our findings show that the choice of IR significantly influences performance: smaller models benefit markedly from high-level structured IRs, while larger models show stable performance across IR levels. Our evaluation also reveals a significant trade-off between recompilation success and functional correctness. Code decompiled from disassembly tends to recompile more reliably, but it is less often functionally correct. In contrast, code decompiled from high-level IRs more often retains the original functionality, albeit with slightly lower recompilation success rates. Furthermore, we find that cognitive complexity metrics, such as Halstead measures, are strong predictors of decompilation difficulty, while traditional structural metrics, such as cyclomatic complexity, offer limited insight. Finally, we outline the main research directions for improving binary decompilation by combining the advantages of static binary analysis techniques with the capabilities of modern LLMs.
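
For readers unfamiliar with the two metric families contrasted in the abstract, the following is a minimal sketch of the textbook definitions of the Halstead measures and of McCabe's cyclomatic complexity. It is not the paper's measurement pipeline: the function names, the hand-made token lists, and the example control-flow graph are all assumptions made for illustration.

```python
import math


def halstead(operators, operands):
    """Textbook Halstead measures from pre-tokenized operators and operands.

    How source code is split into operators and operands is tool-specific;
    the token lists passed in here are assumed to be given.
    """
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total counts
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


def cyclomatic(edges, nodes, components=1):
    """McCabe cyclomatic complexity M = E - N + 2P of a control-flow graph."""
    return edges - nodes + 2 * components


# Hypothetical tokens for the two statements: z = x + y; w = x * z;
metrics = halstead(
    operators=["=", "+", "=", "*"],
    operands=["z", "x", "y", "w", "x", "z"],
)

# Hypothetical CFG of a single if/else: 4 nodes, 4 edges, 1 component.
branch_complexity = cyclomatic(edges=4, nodes=4)
```

The Halstead measures depend on how many distinct and total tokens a function contains, whereas cyclomatic complexity depends only on the shape of the control-flow graph, so two functions with identical branching can differ widely in Halstead volume and effort.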

BibTeX

@inproceedings{bar2026-does-representation-matter,
  title     = {{Does Representation Matter? Evaluating IRs for LLM-based Binary Decompilation}},
  author    = {Pelayo-Benedet, Tomás and Borgolte, Kevin and Rodríguez, Ricardo J.},
  booktitle = {Proceedings of the 9th Workshop on Binary Analysis Research (BAR)},
  data      = {https://github.com/rub-softsec/does-representation-matter-dataset},
  date      = {2026-02-27},
  doi       = {10.14722/bar.2026.23077},
  editor    = {Bardin, Sébastien and Hauser, Christophe},
  location  = {San Diego, CA, USA},
  publisher = {Internet Society (ISOC)},
  volume    = {9}
}