Enhancing Search Query Understanding with Human Annotation

Leveraging human judgment and a structured taxonomy to handle ambiguous edge cases consistently and improve search relevance for a leading Poland-based e-commerce conglomerate.

Project Overview

The client, a Poland-based e-commerce leader, receives millions of search queries daily. Many of these queries are ambiguous, include misspellings, or refer to multiple product categories, creating challenges for automated search engines.

To improve search accuracy and customer experience, Shaip developed a structured annotation framework inspired by the Baymard Institute’s research on e-commerce search. Queries were systematically classified into 11 categories, including Product Category, Theme, Specific Attribute, Exact, Merchant, Symptom, and Non-Product, with precedence rules to ensure consistent categorization.

Key Stats

50,000+ Queries Annotated across multiple categories

11 Annotation Classes with clear definitions and precedence rules

3-Step Workflow: Annotation ➔ QA ➔ SME arbitration

Project Scope

The project centered on building a comprehensive taxonomy to capture the full spectrum of user search behavior on a large-scale marketplace platform. The scope included:

Developing a taxonomy of 11 categories with clear definitions and a precedence hierarchy to address cases where queries could fit into more than one class.
Annotating tens of thousands of real queries across both product and non-product domains to train and calibrate the classification system.
Resolving ambiguous queries by escalating to Subject Matter Experts (SMEs), ensuring consistency in how edge cases were handled.
Providing annotated examples and justifications for QA calibration, creating a training set that future annotators could rely on for reference.

Sample Annotations included:

De dietrich ELENSIO ➔ Exact
E 91 ➔ Hard-to-say
tezfiles ➔ Merchant
subaru brz toyota gt86 ➔ Non-Product
okulary BHP (safety glasses) ➔ Product Category
stawu skokowego (ankle joint) ➔ Symptom

Challenges

The project had to overcome several data complexity issues that are typical in e-commerce search environments:

Ambiguity

Queries like “E 91” could correspond to vastly different products (a car model, a fuse holder, a capsule imprint), making interpretation highly uncertain.

Typos & Variants

Misspellings and shorthand required human interpretation in context: for example, “lampa uf zestaw” had to be read as “lampa UV zestaw” (a UV lamp kit).
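Once an annotator has confirmed what a variant means, that judgement can be stored and reused before classification. The sketch below is a minimal Python illustration, assuming a hypothetical `CONFIRMED_VARIANTS` lookup filled in by annotators; it is not the client’s or Shaip’s actual pipeline, and it only handles whole-token substitutions.

```python
from typing import Mapping

# Hypothetical lookup of variants already confirmed by annotators
# (token as typed -> intended token). Not taken from the real project.
CONFIRMED_VARIANTS: Mapping[str, str] = {"uf": "UV"}


def normalise(query: str, variants: Mapping[str, str] = CONFIRMED_VARIANTS) -> str:
    """Replace whole tokens that match a confirmed variant; keep all others as typed."""
    return " ".join(variants.get(token.lower(), token) for token in query.split())


print(normalise("lampa uf zestaw"))  # lampa UV zestaw
```

Tokens with no confirmed variant pass through unchanged, so genuinely new or ambiguous spellings still reach a human annotator.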

Overlapping Categories

Queries often matched multiple classes (e.g., Exact vs. Compatible vs. Specific Attribute), requiring precedence rules to ensure consistency.

Invalid Inputs

Serial codes or identifiers without any product match had to be tagged as “Invalid Phrase” rather than forced into a product class.
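A simple pre-screen could surface likely Invalid Phrase candidates for annotators to confirm. The heuristic below is an illustrative assumption, not a rule from the project: a single token of at least six characters, at least half of them digits, with zero catalogue matches. The `catalog_hits` count is a hypothetical input that would come from the search index.

```python
def is_invalid_phrase_candidate(query: str, catalog_hits: int, min_len: int = 6) -> bool:
    """Flag single-token, digit-heavy queries that match nothing in the catalogue.

    Thresholds are illustrative assumptions; a human annotator still makes
    the final Invalid Phrase decision.
    """
    tokens = query.split()
    if len(tokens) != 1 or len(tokens[0]) < min_len:
        return False
    token = tokens[0]
    digit_share = sum(ch.isdigit() for ch in token) / len(token)
    return digit_share >= 0.5 and catalog_hits == 0


print(is_invalid_phrase_candidate("4008146044786", catalog_hits=0))  # True
```

Requiring zero catalogue matches mirrors the text: a code-like query that does match a product is not invalid and should be classified normally.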

Scalability

Consistently applying nuanced classification rules across tens of thousands of queries demanded strong QA and annotation governance.

Solution

To address these challenges, a structured annotation framework was introduced, balancing automation with human oversight:

Annotation Guidelines

Detailed definitions, examples, and instructions were created to help annotators classify consistently, even in complex scenarios.

Precedence Rules

A hierarchy was established (e.g., Compatible > Exact > Specific Attribute) so overlapping cases were resolved systematically.
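Resolving an overlap then reduces to picking the highest-ranked label among a query’s candidate classes. This minimal Python sketch encodes only the three classes named in the example ordering; the full 11-class hierarchy is not given in this case study, so when none of the candidates is ranked the function returns `None`, leaving the decision to a human reviewer.

```python
from typing import Iterable, Optional

# Only the ordering quoted in the case study (highest first). The rest of the
# 11-class hierarchy is not published, so it is deliberately left out here.
PRECEDENCE = ("Compatible", "Exact", "Specific Attribute")


def resolve(candidates: Iterable[str]) -> Optional[str]:
    """Return the highest-precedence candidate label, or None if none is ranked."""
    found = set(candidates)
    for label in PRECEDENCE:
        if label in found:
            return label
    return None


print(resolve(["Specific Attribute", "Exact"]))  # Exact
```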

Multi-level QA Process

Initial annotation by trained annotators.
Secondary review by QA specialists.
Escalation to SMEs for arbitration on edge cases or disagreements.
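The three review levels can be sketched as a routing rule applied to each query. The `Judgement` record and its fields below are hypothetical; the rule only mirrors the steps listed: items not yet reviewed go to QA, QA disagreements or flagged edge cases go to an SME, and everything else is accepted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Judgement:
    """Hypothetical record of one query moving through the review steps."""
    query: str
    annotator_label: str
    qa_label: Optional[str] = None  # None until a QA specialist has reviewed it
    edge_case: bool = False         # set when an annotator or QA flags ambiguity


def next_step(j: Judgement) -> str:
    """Return the next step: 'QA review', 'SME arbitration' or 'accepted'."""
    if j.qa_label is None:
        return "QA review"
    if j.edge_case or j.qa_label != j.annotator_label:
        return "SME arbitration"
    return "accepted"


print(next_step(Judgement("E 91", "Hard-to-say", qa_label="Exact")))  # SME arbitration
```

Every item passes through QA before any escalation, which keeps SME time reserved for the disagreements and edge cases the process is designed to catch.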

Applying the Guidelines to Real-World Queries

4008146044786 ➔ Invalid Phrase
miraculum królika ➔ Thematic Attribute
zcd galactic grey ➔ Compatible
owczarek belgijski (Belgian Shepherd) ➔ Theme

Together, these measures ensured alignment, quality, and reliability across the annotation pipeline.

Outcome

The initiative delivered measurable improvements to the client’s search ecosystem:

50,000+ Queries Classified with high precision, forming a robust training dataset for search improvements.
Improved Relevance of Search Results, directly boosting user satisfaction and reducing frustration from irrelevant matches.
Reduced Ambiguity by systematically resolving edge cases through SME-driven arbitration and precedence rules.
Enhanced Product Discoverability, ensuring users could find items more accurately across categories, attributes, and themes.

Overall, the project laid the groundwork for a more intelligent, user-focused search experience, helping the client maintain its competitive edge in the e-commerce market.

The human annotation workflow brought clarity to complex search queries. The structured taxonomy and precedence rules significantly improved our search engine’s accuracy and made user experiences more seamless.

– Head of Search & Discovery, Poland-based E-commerce Conglomerate
