XSearch

Introduction

In this work, we identify two fundamental challenges in existing semantic code search: (C1) the interpretability gap and (C2) the generalization collapse. Embedding-based retrievers perform well on in-distribution benchmarks, but they are often black boxes and fragile in real-world, out-of-distribution scenarios, where they frequently rely on shortcut cues such as leading tokens (e.g., function names) rather than on the code's true functional semantics.

We argue that the root cause lies in the learning paradigm: current retrievers mainly perform inductive matching via a single global query–code similarity, whereas developers need deductive verification—explicitly checking whether each functional requirement in the query is satisfied by the code. To bridge this gap, we introduce XSearch, which reformulates retrieval as a concept-to-code alignment problem and unifies explanations with ranking.

(i) Intrinsic Explainability via Deductive Alignment: XSearch decomposes a query into compositional concepts (e.g., actions, entities, constraints), identifies salient code spans, and computes an optimal concept-to-code alignment, producing traceable concept highlights and alignments that help users verify why a snippet is retrieved.
(ii) Robust Generalization beyond Shortcut Learning: By integrating concept alignment into both training and retrieval (instead of post-hoc explanations), XSearch encourages coverage of multiple functional aspects of the query and reduces reliance on superficial cues, leading to substantially improved retrieval robustness on out-of-distribution benchmarks—without increasing model size or training data.
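
To make the idea of concept-to-code alignment concrete, here is a minimal toy sketch. It is not the XSearch implementation: the names `tokens`, `similarity`, and `align` are hypothetical, the Jaccard token overlap is a stand-in for a learned scorer, and the brute-force one-to-one assignment is only one possible reading of "optimal alignment". Query concepts and code spans are given by hand here, whereas the model derives them itself.

```python
import re
from itertools import permutations


def tokens(text):
    """Lowercased alphanumeric tokens; punctuation and underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(concept, span):
    """Jaccard overlap of token sets (assumed stand-in for a learned concept-span scorer)."""
    a, b = tokens(concept), tokens(span)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align(concepts, spans):
    """Return the one-to-one concept-to-span assignment with the highest total similarity.

    Brute force over all assignments, so only suitable for a handful of items.
    """
    if len(concepts) > len(spans):
        raise ValueError("need at least as many spans as concepts")
    best_total, best_perm = -1.0, ()
    for perm in permutations(range(len(spans)), len(concepts)):
        total = sum(similarity(c, spans[j]) for c, j in zip(concepts, perm))
        if total > best_total:
            best_total, best_perm = total, perm
    pairs = [(c, spans[j], similarity(c, spans[j])) for c, j in zip(concepts, best_perm)]
    return pairs, best_total


# Hand-written concepts for a query such as "read a file, parse JSON, sort by key".
concepts = ["read file", "parse json", "sort by key"]
# Hand-picked code spans, deliberately listed in a different order.
spans = ["data.sort(key=get_id)", "text = open(path).read()", "data = json.loads(text)"]

pairs, total = align(concepts, spans)
for concept, span, score in pairs:
    print(f"{concept!r} -> {span!r} (score {score:.2f})")
```

Each printed pair plays the role of a traceable highlight: a concept with no well-matching span would surface as a low score, which is the kind of per-requirement check that a single global similarity value cannot expose.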

Dataset

The dataset contains 925,362 query-code pairs in total.
Each pair is annotated with fine-grained concept-to-code alignments.
It is the first explainability-oriented benchmark for semantic code search.

Referenced in the paper (Line 130).

Link to the Dataset