
Learning Type Inference for Enhanced Dataflow Analysis

Seidel, Lukas ; Effendi, Sedick David Baker ; Pinho, Xavier ; Rieck, Konrad ; van der Merwe, Brink ; Yamaguchi, Fabian
Abstract

Statically analyzing dynamically-typed code is a challenging endeavor, as even seemingly trivial tasks such as determining the targets of procedure calls are non-trivial without knowing the types of objects at compile time. Addressing this challenge, gradual typing is increasingly added to dynamically-typed languages, a prominent example being TypeScript that introduces static typing to JavaScript. Gradual typing improves the developer's ability to verify program behavior, contributing to robust, secure and debuggable programs. In practice, however, users only sparsely annotate types directly. At the same time, conventional type inference faces performance-related challenges as program size grows. Statistical techniques based on machine learning offer faster inference, but although recent approaches demonstrate overall improved accuracy, they still perform significantly worse on user-defined types than on the most common built-in types. Limiting their real-world usefulness even more, they rarely integrate with user-facing applications. We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations. For effective result retrieval and re-integration, we extract usage slices from a program's code property graph. Comparing our approach against recent neural type inference systems, our model outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark, achieving 71.27% accuracy overall. Furthermore, we present JoernTI, an integration of our approach into Joern, an open source static analysis tool, and demonstrate that the analysis benefits from the additional type information. As our model allows for fast inference times even on commodity CPUs, making our system available through Joern leads to high accessibility and facilitates security research.
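
The call-target problem described in the abstract can be illustrated with a small TypeScript sketch. The classes and functions below (`FileStore`, `HttpClient`, `persistUntyped`, `persistTyped`) are hypothetical names chosen for illustration; they are not taken from the paper, CodeTIDAL5, or Joern.

```typescript
// Two unrelated classes that happen to share a method name.
class FileStore {
  save(data: string): string {
    return `file:${data}`;
  }
}

class HttpClient {
  // Extra field so that FileStore is not structurally assignable to HttpClient.
  endpoint: string = "https://example.invalid";

  save(data: string): string {
    return `http:${data}`;
  }
}

// `sink` is typed `any`, as it effectively is in unannotated JavaScript.
// From this function alone, a static analyzer cannot decide whether
// `sink.save` refers to FileStore.save, HttpClient.save, or something else.
function persistUntyped(sink: any, data: string): string {
  return sink.save(data);
}

// With an annotation (hand-written or predicted), the candidates narrow to
// HttpClient.save or an override in a subclass of HttpClient.
function persistTyped(sink: HttpClient, data: string): string {
  return sink.save(data);
}

const viaFile = persistUntyped(new FileStore(), "x");
const viaHttp = persistUntyped(new HttpClient(), "x");
const typedHttp = persistTyped(new HttpClient(), "x");
```

Both calls to `persistUntyped` are valid at runtime yet dispatch to different methods, which is the ambiguity a dataflow analysis faces without type information. A predicted annotation on `sink` would let the analysis follow only the relevant `save` implementation.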