A query can have different intents (true of most short queries)
There can be multiple tasks for the same interpretation
User intents are usually not clear
Find a balance between robustness and relevance
Traditional IR tends to rank by textual similarity between document and query,
which tends to focus the ranking on the most probable intent of the query,
which is only good for some users.
The target is to maximize the expected satisfaction (SAT) at each position.
Diversity Metrics
: Only relevant (R) or non-relevant (NR); ignores the diversity of the ranking
:
\begin{equation}
P\mbox{-}IA@k = \sum_{q_i} \Pr(q_i \mid q) \, P@k_{q_i}
\end{equation}
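The intent-aware precision above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the toy ranking, and the intent probabilities are all assumptions made up for the example.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k


def intent_aware_precision_at_k(ranking, intent_probs, relevant_by_intent, k):
    """P-IA@k: P@k computed per intent, weighted by Pr(q_i | q)."""
    return sum(
        prob * precision_at_k(ranking, relevant_by_intent.get(intent, set()), k)
        for intent, prob in intent_probs.items()
    )


# Toy data (hypothetical): two equally likely intents of one query.
ranking = ["d1", "d2", "d3", "d4"]
intent_probs = {"animal": 0.5, "car": 0.5}
relevant_by_intent = {"animal": {"d1", "d2"}, "car": {"d2", "d4"}}

score = intent_aware_precision_at_k(ranking, intent_probs, relevant_by_intent, k=2)
# P@2 is 1.0 for "animal" and 0.5 for "car", so score = 0.5 * 1.0 + 0.5 * 0.5 = 0.75
```

Note that d2, which is relevant to both intents, contributes to both per-intent precisions.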
Tends to give higher priority to documents that satisfy multiple intents
Has a weight for diversity, with diversity features y added
Cannot calculate h(x) for all documents at once and then re-rank; the ranking has to be built incrementally, because the diversity features y are computed from the previously ranked docs.
Performance is the best
Each feature can be computed as a min, max, or avg
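The "build while ranking" constraint can be illustrated with a small greedy loop. This is only a sketch under assumptions: the notes do not specify the model, so the single diversity feature (Jaccard distance between term sets), the linear scoring rule, and every name below are hypothetical stand-ins for h(x) and y.

```python
from statistics import mean

# Ways to aggregate the distances to the already-ranked docs into one feature.
AGGREGATORS = {"min": min, "max": max, "avg": mean}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| over two term sets (an assumed dissimilarity)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def rank_sequentially(docs, relevance, w_div=0.5, agg="min"):
    """Greedy ranking in which the diversity feature is recomputed at every step.

    docs: dict doc_id -> set of terms; relevance: dict doc_id -> float.
    """
    aggregate = AGGREGATORS[agg]
    selected, remaining = [], set(docs)
    while remaining:
        def score(doc_id):
            if not selected:
                y = 0.0  # no earlier docs, so no diversity signal yet
            else:
                y = aggregate(
                    [jaccard_distance(docs[doc_id], docs[s]) for s in selected]
                )
            return relevance[doc_id] + w_div * y

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


docs = {"a": {"jaguar", "cat"}, "b": {"jaguar", "cat", "wild"}, "c": {"jaguar", "car"}}
relevance = {"a": 0.9, "b": 0.8, "c": 0.6}
diverse = rank_sequentially(docs, relevance, w_div=1.0)   # ['a', 'c', 'b']
baseline = rank_sequentially(docs, relevance, w_div=0.0)  # ['a', 'b', 'c']
```

Because y for each candidate depends on what has already been selected, the scores cannot be computed once up front and sorted.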
Implicit Methods Pros and Cons
Don’t favor any intent, all intents treated equally
Don’t need prior knowledge
Explicit Methods
Query intent discovery
Use search logs to get intents; other approaches are tricky and remain open problems
Intent
Ambiguous
Faceted - related interpretations of a query
Suggested queries - higher precision
Related queries - higher recall
xQuAD (Equations)
Key idea: (Same as MMR, but the formula has different symbols)
To reduce the redundancy as much as possible
To cover the intent as much as possible
Features:
Docs that cover more intents tend to be preferred. (Just like MMR, just like $P-IA@k$)
The intents are treated equally (Just like MMR)
The final ranking first prefers mediocre docs that cover more intents, then the top relevant documents; documents that don’t rank high and don’t cover many intents are not promoted.
One query has 7-10 intents, thus time complexity is high
Why suggested queries are more effective than related queries
The cons of xQuAD: the intents are treated equally, so when a high-recall algorithm provides the original ranking, query intent discovery finds many rare or unpopular intents, which users don’t need.
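The greedy selection behind xQuAD can be sketched as follows. This follows the commonly published form of the objective, $(1-\lambda)\Pr(d \mid q) + \lambda \sum_i \Pr(q_i \mid q)\Pr(d \mid q_i)\prod_{d_j \in S}(1 - \Pr(d_j \mid q_i))$, with uniform intent weights by default as in these notes. The function and parameter names and the toy probabilities are assumptions for illustration only.

```python
def xquad(candidates, rel, coverage, intent_weights=None, lam=0.5, k=10):
    """Greedy xQuAD-style re-ranking (illustrative sketch).

    rel: doc -> Pr(d | q)
    coverage: intent -> {doc: Pr(d | q_i)}
    intent_weights: intent -> Pr(q_i | q); uniform when omitted
    """
    intents = list(coverage)
    if intent_weights is None:
        intent_weights = {i: 1.0 / len(intents) for i in intents}
    # not_covered[i] = product over selected docs d_j of (1 - Pr(d_j | q_i))
    not_covered = {i: 1.0 for i in intents}
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(d):
            diversity = sum(
                intent_weights[i] * coverage[i].get(d, 0.0) * not_covered[i]
                for i in intents
            )
            return (1 - lam) * rel[d] + lam * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        # Intents that `best` covers well now count for less (redundancy penalty).
        for i in intents:
            not_covered[i] *= 1.0 - coverage[i].get(best, 0.0)
    return selected


rel = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
coverage = {"A": {"d1": 0.9, "d2": 0.8}, "B": {"d3": 0.9}}
diversified = xquad(["d1", "d2", "d3"], rel, coverage, lam=0.5)  # ['d1', 'd3', 'd2']
by_relevance = xquad(["d1", "d2", "d3"], rel, coverage, lam=0.0)  # ['d1', 'd2', 'd3']
```

In the toy run, d3 overtakes d2 once d1 has already covered intent A, even though d2 is more relevant to the query as a whole.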
PM2
Key Idea: The number of documents for each subtopic should be proportional to each subtopic’s popularity
Greedy
At each rank r, select the query intent $q_i$ that must be covered next to maintain proportional coverage of intents in the ranking
Select a document that covers the $q_i$, and might also cover other query intents.
The update process
Get doc that has the max
\begin{equation}
\lambda \, qt[i^*] \Pr(d \mid q_{i^*}) + (1 - \lambda) \sum_{j \neq i^*} qt[j] \Pr(d \mid q_j)
\end{equation}
Update the share $s_i$ held by each intent after each iteration
\begin{equation}
s_i \leftarrow s_i + \frac{\Pr(d^* \mid q_i)}{\sum_{q_j \in Q} \Pr(d^* \mid q_j)}
\end{equation}
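The PM2 loop described above can be sketched in Python. One assumption goes beyond these notes: the quotient is computed as $qt[i] = v_i / (2 s_i + 1)$, the Sainte-Laguë form used in the usual PM-2 formulation, where $v_i$ is the popularity of intent $q_i$. All names and toy numbers are hypothetical.

```python
def pm2(candidates, coverage, votes, lam=0.5, k=10):
    """Greedy PM2-style proportional diversification (illustrative sketch).

    coverage: intent -> {doc: Pr(d | q_i)}
    votes: intent -> popularity v_i
    """
    intents = list(votes)
    seats = {i: 0.0 for i in intents}  # s_i: share of the ranking held by intent i
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        # Assumed quotient: popular intents with few seats score highest.
        quotient = {i: votes[i] / (2 * seats[i] + 1) for i in intents}
        i_star = max(intents, key=quotient.get)  # intent that must be covered next

        def score(d):
            own = quotient[i_star] * coverage[i_star].get(d, 0.0)
            others = sum(
                quotient[j] * coverage[j].get(d, 0.0)
                for j in intents if j != i_star
            )
            return lam * own + (1 - lam) * others

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        # Split one seat among the intents in proportion to how well `best` covers them.
        total = sum(coverage[j].get(best, 0.0) for j in intents)
        if total > 0:
            for i in intents:
                seats[i] += coverage[i].get(best, 0.0) / total
    return selected


coverage = {
    "A": {"a1": 0.9, "a2": 0.8, "a3": 0.7},
    "B": {"b1": 0.9, "b2": 0.8},
}
votes = {"A": 2.0, "B": 1.0}  # intent A is assumed twice as popular as B
top3 = pm2(["a1", "a2", "a3", "b1", "b2"], coverage, votes, lam=0.8, k=3)
# top3 == ['a1', 'b1', 'a2']: two docs for A and one for B, matching the 2:1 popularity
```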
Summary
Contrast: xQuAD vs PM2:
$\lambda$ is the weight on the new intent to cover
xQuAD does not promote docs that neither rank reasonably well nor cover many intents
Both use uniform weight
Implicit vs Explicit Diversification Algorithms
Explicit methods are more effective than MMR (better metric scores), but obtaining a query-intent resource is difficult, and the suggestions might not suit your task