Zotero History
- Date item added to Zotero:: 2026-04-05
- First date annotations or notes modified:: 2026-04-11
- Last date annotations or notes modified:: 2026-04-11
- Export date:: 2026-04-11
DEO: Training-Free Direct Embedding Optimization for Negation-Aware Retrieval
Cite
Lee, T., Park, J., Hwang, S., & Jang, J. (2026). DEO: Training-Free Direct Embedding Optimization for Negation-Aware Retrieval (arXiv:2603.09185). arXiv. https://doi.org/10.48550/arXiv.2603.09185
TL;DR
Contribution:: Training-free negation-aware retrieval: with the encoder frozen, the query embedding is directly optimized with a contrastive loss to handle negation/exclusion queries
Pros:: Model- and modality-agnostic (applies to both BGE and CLIP), needs no additional training data, practical even on CPU (20 steps in 0.016 s)
Cons:: Performance depends on the LLM's query-decomposition quality (91.76% accuracy); other modalities such as audio and video are unverified
Study Snapshot
Key takeaway:: The negation-query problem can be solved by inference-time embedding optimization instead of fine-tuning, applied as a plug-in on top of existing models
Methods:: (1) Decompose the query into positive/negative sub-queries with an LLM (GPT-4.1-nano); (2) directly optimize the query embedding (Adam, 20 steps)
Outcomes:: NegConstraint: nDCG@10 +0.0738, MAP +0.1028 (BGE-large) / COCO-Neg: Recall@5 +6% (OpenAI CLIP) / further gains even on top of NegCLIP (+2.65%)
Results:: Query decomposition alone brings only marginal gains (RRF: MAP 0.6641); the core contribution is the embedding optimization (MAP 0.7379). Even a small LLM (Qwen2.5-1.5B) achieves meaningful improvements over the baselines
Implementations
- Official implementation: taegyeong-lee/DEO-negation-aware-retrieval
- Personal implementation: lots-o/paper-to-code (@leeDEOTrainingFreeDirect2026)
Meta
Author:: Lee, Taegyeong
Author:: Park, Jiwon
Author:: Hwang, Seunghyun
Author:: Jang, JooYoung
Title:: DEO: Training-Free Direct Embedding Optimization for Negation-Aware Retrieval
Short Title:: DEO
Year:: 2026
Citekey:: @leeDEOTrainingFreeDirect2026
itemType:: preprint
DOI:: 10.48550/arXiv.2603.09185
LINK
Abstract
Recent advances in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have enabled diverse retrieval methods. However, existing retrieval methods often fail to accurately retrieve results for negation and exclusion queries. To address this limitation, prior approaches rely on embedding adaptation or fine-tuning, which introduce additional computational cost and deployment complexity. We propose Direct Embedding Optimization (DEO), a training-free method for negation-aware text and multimodal retrieval. DEO decomposes queries into positive and negative components and optimizes the query embedding with a contrastive objective. Without additional training data or model updates, DEO outperforms baselines on NegConstraint, with gains of +0.0738 nDCG@10 and +0.1028 MAP@100, while improving Recall@5 by +6% over OpenAI CLIP in multimodal retrieval. These results demonstrate the practicality of DEO for negation- and exclusion-aware retrieval in real-world settings.
Reading notes
Problems
Highlight (1 page, edited: 2026-04-11)
However, existing retrieval methods often fail to accurately retrieve results for negation and exclusion queries.
Problems:
Existing dense retrieval fails to handle negation and exclusion queries accurately.
Example: for a query like "latest revenue forecast excluding 2024 results," the embedding does not capture the exclusion intent, so unwanted documents are retrieved near the top.
→ The core motivation behind DEO's design
Highlight (1 page, edited: 2026-04-11)
To address this limitation, prior approaches rely on embedding adaptation or fine-tuning, which introduce additional computational cost and deployment complexity.
Problems:
Limitations of existing remedies (fine-tuning, embedding adaptation):
Require large-scale GPU resources, training datasets, and heavy training cost
Limited practicality in resource-constrained environments
Attempts to fix the negation-query problem instead introduce high deployment complexity
Highlight (2 page, edited: 2026-04-11)
However, despite strong average performance, these models remain brittle to negation phenomena, including attribute negation (e.g., "not red"), absence (e.g., "no person"), and relational negation (e.g., "A is not left of B").
Problems:
Even leading vision-language models such as CLIP and BLIP are vulnerable to negation phenomena, including attribute negation (not red), absence (no person), and relational negation (A is not left of B), as empirically shown by NegBench (Alhamoud et al., 2025)
Prior Research
Highlight (2 page, edited: 2026-04-11)
Prior research has explored improvements through training strategies, distillation, and pre-training. Transfer learning on large-scale datasets such as MS MARCO (Singh et al., 2023) has also been widely adopted, though it is resource-intensive to construct
Prior Research:
Existing dense-retrieval research paradigms: (1) improved training strategies, (2) knowledge distillation, (3) pre-training, (4) transfer learning on large-scale datasets such as MS MARCO. All of them require labeled data and substantial GPU resources, exactly the preconditions DEO avoids. Zero-shot dense retrieval has emerged recently, but negation-query handling remains unresolved
Highlight (2 page, edited: 2026-04-11)
Beyond full model fine-tuning, recent work has explored methods for directly controlling or refining embedding space to improve retrieval. Representative approaches include projecting dense embeddings into interpretable sparse latent features and applying non-parametric optimization to directly adjust record embeddings for improved k-NN accuracy (Zeighami et al., 2024; Shevkunov et al.; Wang et al., 2024).
Prior Research:
Prior work on controlling embedding spaces:
SAE-based (Kang et al., 2025): project dense embeddings into interpretable sparse latent features for control
NUDGE (Zeighami et al., 2024): non-parametric fine-tuning that directly edits data-record embeddings to improve k-NN accuracy
→ Both require large datasets and GPU resources, the limitation DEO sets out to remove
Highlight (2 page, edited: 2026-04-11)
Prior work has attempted to address this issue through fine-tuning or task-specific regularization to improve sensitivity to negation (Wang et al., 2024; Zeighami et al., 2024), but such approaches typically incur substantial computational cost and offer limited controllability.
Prior Research:
Existing approaches to negation-aware retrieval:
Wang et al., 2024 / Zeighami et al., 2024: improve negation sensitivity via fine-tuning or task-specific regularization → high computational cost plus limited controllability
NegCLIP (Yuksekgonul et al., 2022): a CLIP variant explicitly fine-tuned for compositional understanding
→ DEO shows additional gains even on top of NegCLIP (Table 2)
Main Idea
Highlight (1 page, edited: 2026-04-11)
We propose Direct Embedding Optimization (DEO), a training-free method for negation-aware text and multimodal retrieval. DEO decomposes queries into positive and negative components and optimizes the query embedding with a contrastive objective.
Main Idea:
DEO's core idea:
Decompose the query into positive/negative components
Keep the encoder frozen and treat the query embedding itself as a learnable parameter
Optimize it directly with a contrastive loss at inference time
A training-free approach that realizes negation-aware retrieval without fine-tuning
Highlight (1 page, edited: 2026-04-11)
DEO is model- and modality-agnostic, generalizing across diverse embedding models and retrieval settings, and experiments demonstrate consistent improvements over baselines on both text and multimodal benchmarks.
Main Idea:
DEO's generality: it is not tied to a particular embedding model (BGE family, CLIP family) or modality (text, image). The same contrastive-loss optimization framework delivers consistent gains in both text retrieval and text-to-image retrieval
Methods
Image (3 page, edited: 2026-04-11)
Highlight (3 page, edited: 2026-04-11)
our model consists of two stages: (a) Decomposing the user query into positive and negative sub-queries. (b) Directly optimizing the embedding space of input query as a parameter by using contrastive loss.
Methods:
DEO's two-stage pipeline:
Query Decomposition: an LLM splits the input query into positive/negative sub-queries
Direct Embedding Optimization: the query embedding is optimized directly with a contrastive loss (encoder frozen; only the query embedding is treated as a learnable parameter)
→ The embedding is adjusted at inference time without any additional fine-tuning
Highlight (3 page, edited: 2026-04-11)
we employ a large language model (LLM) in a prompt-based setting to semantically analyze the input query and explicitly capture its negation or exclusion intent. The LLM then decomposes the original query into structured positive and negative sub-queries.
Methods:
LLM-based query decomposition:
Positive sub-queries: inclusion targets, including semantic expansions of the user's request
Negative sub-queries: exclusion targets that explicitly encode the exclusion intent
Example: a query about the characteristics of Bayreuth and the influence of Photomontage, each carrying an exclusion, is decomposed into 3 positive / 4 negative sub-queries
Uses GPT-4.1-nano, temperature = 0.1 (sketched below)
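A hedged sketch of this decomposition stage: the notes record GPT-4.1-nano at temperature 0.1, but the paper's actual prompt is not reproduced here, so `PROMPT` below is a hypothetical stand-in.

```python
# Hypothetical decomposition prompt; only the model name and temperature
# come from the paper's reported settings.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = ("Decompose the search query into positive sub-queries (what the "
          "user wants, including semantic expansions) and negative "
          "sub-queries (what the user excludes). Answer as JSON: "
          '{"positive": [...], "negative": [...]}')

def decompose_query(query: str) -> tuple[list[str], list[str]]:
    resp = client.chat.completions.create(
        model="gpt-4.1-nano",
        temperature=0.1,
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": PROMPT},
                  {"role": "user", "content": query}],
    )
    parsed = json.loads(resp.choices[0].message.content)
    return parsed.get("positive", []), parsed.get("negative", [])
```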
Highlight (4 page, edited: 2026-04-11)
we directly optimize the embedding of the input query at inference time while keeping the encoder frozen.
Methods:
Composition of the DEO loss:
(i) Attraction term: pulls the query embedding toward the positive embeddings
(ii) Repulsion term: pushes it away from the negative embeddings
(iii) Consistency term: preserves the original query's meaning
Adam optimizer for a fixed number of steps; encoder parameters stay unchanged
Text: 20 steps; text and multimodal use different loss-weight settings (the exact λ values did not survive this export; a sketch follows)
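A minimal PyTorch sketch of this optimization stage. Cosine similarity and the `lam_rep` / `lam_cons` weights are assumptions standing in for the paper's exact formulation, and `optimize_query_embedding` is a hypothetical name.

```python
import torch
import torch.nn.functional as F

def optimize_query_embedding(q0: torch.Tensor, pos: torch.Tensor,
                             neg: torch.Tensor, steps: int = 20,
                             lr: float = 1e-2, lam_rep: float = 1.0,
                             lam_cons: float = 1.0) -> torch.Tensor:
    """q0: (d,) original query embedding; pos/neg: (n, d) sub-query embeddings."""
    q0 = q0.detach()
    q = q0.clone().requires_grad_(True)       # the embedding is the only parameter
    opt = torch.optim.Adam([q], lr=lr)        # Adam, fixed step count (paper: 20)
    for _ in range(steps):
        opt.zero_grad()
        attract = -F.cosine_similarity(q.unsqueeze(0), pos).mean()  # (i) pull to positives
        repel = F.cosine_similarity(q.unsqueeze(0), neg).mean()     # (ii) push from negatives
        consist = -F.cosine_similarity(q, q0, dim=0)                # (iii) stay near original
        (attract + lam_rep * repel + lam_cons * consist).backward()
        opt.step()
    return q.detach()
```

With BGE-style encoders, `q0` and the sub-query embeddings would be the frozen encoder's [CLS] outputs, per the Implementation Details below.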
Highlight (4 page, edited: 2026-04-11)
Implementation Details. We used the [CLS] token representation for all embedding models,
Methods:
Implementation settings:
Embedding: the [CLS] token representation
Retrieval: FAISS library + cosine similarity (see the sketch after this list)
Query decomposition: GPT-4.1-nano (temperature = 0.1)
Baseline text models: BGE-M3, BGE-large-en-v1.5, BGE-small-en-v1.5
Baseline image models: OpenAI CLIP, CLIP-laion400m, CLIP-datacomp, NegCLIP
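A sketch of this retrieval setup, assuming precomputed corpus embeddings: cosine similarity is realized in FAISS as inner product over L2-normalized vectors.

```python
import faiss
import numpy as np

def build_index(doc_embs: np.ndarray) -> faiss.IndexFlatIP:
    embs = doc_embs.astype(np.float32).copy()
    faiss.normalize_L2(embs)                  # in-place L2 normalization
    index = faiss.IndexFlatIP(embs.shape[1])  # inner product == cosine after norm
    index.add(embs)
    return index

def search(index: faiss.IndexFlatIP, q: np.ndarray, k: int = 10):
    q = q.astype(np.float32).reshape(1, -1).copy()
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k)          # retrieve with the optimized embedding
    return ids[0], scores[0]
```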
Image (8 page, edited: 2026-04-11)
Methods:
[Figure 6 analysis] Embedding trajectories in CLIP's joint vision-language space:
A PCA projection visualizes the optimization path of the text query embedding
The embedding moves from its initial position toward the positive/ground-truth direction and away from the negatives
The same trajectory pattern seen in text retrieval also appears in the cross-modal setting → empirical evidence of DEO's modality-agnostic behavior (sketch below)
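A small sketch of how such a Figure-6-style view can be produced, assuming `trajectory` is a list of per-step query embeddings saved during optimization (recording it would need a one-line addition to the loop above).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_trajectory(trajectory: list[np.ndarray],
                    pos: np.ndarray, neg: np.ndarray) -> None:
    # Fit PCA jointly on the trajectory and the sub-query embeddings.
    xy = PCA(n_components=2).fit_transform(np.vstack([*trajectory, pos, neg]))
    t, p = len(trajectory), len(pos)
    plt.plot(xy[:t, 0], xy[:t, 1], "-o", label="query: step 0 -> T")
    plt.scatter(xy[t:t + p, 0], xy[t:t + p, 1], marker="^", label="positive")
    plt.scatter(xy[t + p:, 0], xy[t + p:, 1], marker="x", label="negative")
    plt.legend()
    plt.show()
```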
Limitations
Highlight (8 page, edited: 2026-04-11)
While DEO proves effective without fine-tuning, it relies on the ability of LLMs to correctly decompose user queries into positive and negative sub-queries. As shown in the Sec 4.3, the final retrieval performance may vary depending on the decomposition quality of the LLM.
Limitations:
DEO's performance depends directly on the LLM's decomposition quality → with a small LLM (Qwen2.5-1.5B) it trails the larger LLM (GPT-4.1-nano). Decomposition accuracy is 91.76%; the remaining 8.24% of misdecompositions directly affect retrieval performance → performance varies with LLM scale
Highlight (8 page, edited: 2026-04-11)
Future work could explore enhancing query decomposition with more robust LLMs, incorporating adaptive optimization strategies that automatically select loss balancing parameters per query, and extending DEO to diverse multimodal datasets beyond images, such as audio.
Limitations:
So far DEO is validated only on text-to-text and text-to-image retrieval. Extension to other modalities such as audio and video is untested → its generality as a universal multimodal retrieval system remains unproven
Key Concepts to Clarify
Highlight (3 page, edited: 2026-04-11)
(a) Given an input query containing negation, we use an LLM to decompose it into positive and negative sub-queries. (b) The input query embedding is then optimized with a contrastive loss by pulling it closer to positive query embeddings and pushing it farther from negative query embeddings, enabling negation- and exclusion-aware retrieval.
Key Concepts to Clarify:
What "contrastive loss" means in this paper: ordinarily a contrastive loss trains model parameters, but in DEO the encoder is completely frozen and the query embedding vector itself is treated as the optimization target (a learnable parameter); that is the key differentiator. "Training-free" means no model weights are updated; gradient-based optimization of the query embedding vector still takes place (formalized below)
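A hedged formalization of that distinction: only the query embedding $q$ is a parameter, while the encoder weights $\theta$ stay fixed. Written as plain gradient descent for clarity (the paper uses Adam), with a generic similarity $\mathrm{sim}$ and unspecified weights $\lambda$, since the exact values are not recorded in these notes.

```latex
\[
  \mathcal{L}(q) =
      -\,\mathrm{sim}\!\left(q, e_{\mathrm{pos}}\right)
      + \lambda_{\mathrm{rep}}\,\mathrm{sim}\!\left(q, e_{\mathrm{neg}}\right)
      - \lambda_{\mathrm{cons}}\,\mathrm{sim}\!\left(q, q^{(0)}\right),
  \qquad
  q^{(t+1)} = q^{(t)} - \eta\,\nabla_{q}\,\mathcal{L}\!\left(q^{(t)}\right),
  \quad \theta~\text{fixed}
\]
```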
Highlight (4 page, edited: 2026-04-11)
we evaluate performance on NegConstraint using nDCG@10 and MAP@100, while NevIR is evaluated using the Pairwise metric.
Key Concepts to Clarify:
nDCG@10 (Normalized Discounted Cumulative Gain): measures the relevance of the top-10 results with rank-based discounting; documents at higher ranks receive more weight; maximum value 1.0 (worked example after this list)
MAP@100 (Mean Average Precision): averages per-query precision over the top-100 results
Pairwise (NevIR): the rate of correctly choosing which of two documents better matches the negation query → a direct measure of negation discrimination
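A worked sketch of nDCG@10 as defined above (simplified by assuming every relevant document appears somewhere in the scored list).

```python
import math

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """relevances: graded relevance of the retrieved docs, in ranked order."""
    def dcg(rels: list[float]) -> float:
        # log2(i + 2) discounts rank i (0-based): rank 1 divides by log2(2) = 1.
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Relevant docs retrieved at ranks 1 and 4:
print(ndcg_at_k([1, 0, 0, 1, 0, 0, 0, 0, 0, 0]))  # ~0.88
```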
Highlight (6 page, edited: 2026-04-11)
We introduce an Only Decompose variant that retrieves using the averaged embedding of decomposed positive and negative sub-queries, and an Only Decompose (RRF) variant that retrieves separately for each sub-query and merges results using Reciprocal Rank Fusion (RRF) with k=60.
Key Concepts to Clarify:
RRF (Reciprocal Rank Fusion): an ensemble technique for combining ranked results from multiple queries/systems. Each document's score is the sum of 1/(k + rank) over the lists it appears in, where k is a smoothing parameter that dampens the effect of rank differences (k = 60 here). In Table 6, RRF alone still trails DEO by a wide margin (MAP 0.6641 vs 0.7379) → suggesting that embedding optimization is more effective than simply ensembling results
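A short sketch of RRF over per-sub-query ranked lists, using the k = 60 named in the highlight; ranks are 1-based.

```python
from collections import defaultdict

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # reciprocal-rank contribution
    return sorted(scores, key=scores.get, reverse=True)

# Fusing the result lists of two sub-queries:
print(rrf([["d1", "d2", "d3"], ["d3", "d1", "d4"]]))  # ['d1', 'd3', 'd2', 'd4']
```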
Results
Highlight (1 page, edited: 2026-04-11)
Without additional training data or model updates, DEO outperforms baselines on NegConstraint, with gains of +0.0738 nDCG@10 and +0.1028 MAP@100, while improving Recall@5 by +6% over OpenAI CLIP in multimodal retrieval.
Results:
Key numbers from the abstract:
NegConstraint: nDCG@10 +0.0738, MAP@100 +0.1028
COCO-Neg (multimodal): Recall@5 +6% over OpenAI CLIP
Achieved without additional training data or model updates
Image (5 page, edited: 2026-04-11)
Results:
[Table 1 analysis] NegConstraint benchmark performance:
BGE-small-en-v1.5: MAP 0.6702 → 0.7302 (+9.0%), nDCG@10 0.7372 → 0.7795 (+5.7%)
BGE-large-en-v1.5: MAP 0.6299 → 0.7327 (+16.3%), nDCG@10 0.7139 → 0.7877 (+10.3%)
BGE-M3: MAP 0.6374 → 0.7379 (+15.8%), nDCG@10 0.7250 → 0.7946 (+9.6%)
Largest gains (BGE-large-en-v1.5): MAP +0.1028, nDCG@10 +0.0738
The NevIR pairwise score also improves consistently across all models → confirms better negation discrimination
Image (5 page, edited: 2026-04-11)
Results:
[Table 2 analysis] COCO-Neg text-to-image Recall@5:
OpenAI CLIP: 0.4792 → 0.5392 (+6.00%), the largest gain
CLIP-laion400m: 0.5248 → 0.5737 (+4.89%)
CLIP-datacomp: 0.4984 → 0.5513 (+5.29%)
NegCLIP (fine-tuned): 0.6715 → 0.6980 (+2.65%)
NOTE: additional gains even on NegCLIP, which is explicitly fine-tuned for negation awareness → DEO is complementary to fine-tuning-based models
Highlight (7 page, edited: 2026-04-11)
To evaluate decomposition quality, we perform a binary correctness assessment on NegConstraint (Xu et al., 2025) using GPT-4.1-mini, measuring whether each output captures the intended positive and negative components of the original query. The decomposition achieves 91.76% accuracy, indicating that generated sub-queries are largely aligned with user intent.
Results:
Query decomposition quality evaluation:
Binary correctness assessment with GPT-4.1-mini reaches 91.76% accuracy
The generated positive/negative components largely match the original query intent
The remaining 8.24% of misdecompositions directly affect final retrieval performance → ties into the LLM dependence noted under Limitations
Highlight (8 page, edited: 2026-04-11)
On a CPU (AMD Ryzen 7 5800X 8-Core Processor, 64.0GB RAM), DEO with 20 optimization steps required a total of 0.016 seconds (average 0.000665 seconds per step), while 50 steps required 0.035 seconds (average 0.000640 seconds per step). On a GPU (NVIDIA GeForce RTX 3060 12GB), DEO with 20 optimization steps required a total of 0.033 seconds (average 0.00172 seconds per step), while 50 steps required 0.095 seconds (average 0.001932 seconds per step).
Results:
Measured computational efficiency:
CPU (AMD Ryzen 7 5800X, 64 GB): 20 steps = 0.016 s (0.000665 s per step)
GPU (RTX 3060 12GB): 20 steps = 0.033 s (0.00172 s per step)
→ Practical latency even on CPU → deployable in real-world, GPU-constrained environments. The LLM query-decomposition cost is the actual bottleneck
Ablation Study
Image (6 page, edited: 2026-04-11)
Ablation Study:
[Table 5 analysis] Performance across loss-weight combinations (BGE-M3 / OpenAI CLIP):
Best text setting → MAP 0.7379, nDCG@10 0.7946 (the exact weight values did not survive this export)
Best multimodal setting → Recall@5 0.5392
The best text setting puts the largest weight on the repulsion side → reflects negation intent aggressively
The best multimodal setting emphasizes the consistency/alignment context → preserves CLIP's joint vision-language space
Every configuration beats the baseline → the design is robust to hyperparameter variation
Image (6 page, edited: 2026-04-11)
Ablation Study:
[Table 6 analysis] Isolating each component's contribution (BGE-M3, NegConstraint):
BGE-M3 baseline: MAP 0.6374, nDCG@10 0.7250
Only Decompose (AVG): MAP 0.6451, nDCG@10 0.7312 (marginal gain)
Only Decompose (RRF): MAP 0.6641, nDCG@10 0.7417 (modest gain)
DEO Full: MAP 0.7379, nDCG@10 0.7946 (clear gain)
→ Query decomposition alone improves performance only marginally; the core contribution comes from the embedding-optimization stage. Decomposition plays a supporting role, supplying the direction for optimization
Highlight (6 page, edited: 2026-04-11)
As shown in Tables 3 and 4, GPT-4.1-nano consistently outperforms Qwen2.5-1.5B-Instruct across all embedding models on both NegConstraint and COCO-Neg, which we attribute to more precise query decompositions from the larger model. Nevertheless, even with Qwen2.5-1.5B-Instruct, our method achieves notable improvements over the baselines, indicating that DEO delivers consistent gains regardless of LLM scale.
Ablation Study:
[Table 3, 4 analysis] Performance by LLM backbone:
NegConstraint (BGE-M3): Qwen2.5-1.5B → MAP 0.7280, nDCG@10 0.7871 / GPT-4.1-nano → MAP 0.7379, nDCG@10 0.7946
COCO-Neg: NegCLIP with Qwen 0.6872 / NegCLIP with GPT 0.6980
→ The larger LLM wins thanks to more precise query decomposition, but even the small LLM (Qwen) achieves meaningful gains over the baselines → confirms DEO's robustness to LLM scale
Highlight (7 page, edited: 2026-04-11)
On NegConstraint (Figure 2), performance improves sharply when increasing steps from 0 to 20, and remains stable between 20 and 50 steps. However, beyond 100 steps, both nDCG@10 and MAP@100 gradually decline. On COCO-Neg (Figure 3), Recall@5 similarly improves from 0 to 20 steps, peaks around 50 steps, and slightly decreases beyond 100 steps. In both cases, 20 to 50 steps is sufficient to achieve strong performance, and we adopt 20 steps as the default setting across all experiments.
Ablation Study:
[Figure 2, 3 analysis] Effect of the number of optimization steps:
NegConstraint:
0 → 20 steps: sharp performance gains
20–50 steps: stable
Beyond 100 steps: nDCG@10 and MAP decline gradually (over-optimization)
COCO-Neg:
0 → 20 steps: improvement, with a peak around 50 steps
Beyond 100 steps: slight decrease
→ 20 steps adopted as the default → the best balance of performance and efficiency. Excessive optimization is interpreted as drifting away from the original query's meaning (what the consistency term guards against)





