Indoor Scene Object Annotation for Service Robotics & Embodied AI — Case Study
How Shaip delivered comprehensive indoor scene annotation for a leading robotics innovation company: detecting and labeling every object in room environments, tagging spatial relationships, and building a production-grade dataset for robotic perception and navigation AI across homes, offices, warehouses, and healthcare facilities.
Project Overview
As robotics moves into real-world deployment across homes, offices, warehouses, and hospitals, the client needed a comprehensive annotation pipeline capable of labeling every visible object in densely cluttered indoor scenes — with attribute richness sufficient for robotic interaction tasks.
Shaip built the end-to-end annotation pipeline covering bounding box and polygon segmentation, dense object inclusion, spatial relationship tagging, and reflective-surface handling — producing model-ready datasets for robotic perception, navigation, and human-environment interaction.
Key Stats
- Objects / Image: 100s
- Categories: Dense
- Methods: Box + Polygon
- Attribute Layers: 4
Challenges
- Annotating every visible object in densely cluttered room imagery — hundreds per frame
- Choosing the right method — bounding box vs polygon segmentation — based on object complexity
- Including small objects like switches, plugs, and decorative items critical for robotic interaction
- Handling reflective surfaces (mirrors, glass tables) without ghost annotations
- Tagging spatial relationships and object states for context-aware robotic AI
Solution
Comprehensive Object Inclusion
Every visible object within each room image was individually annotated across a wide range of categories — furniture (chairs, tables, sofas, beds, shelves), appliances (televisions, refrigerators, microwaves, lamps), personal items (bags, books, bottles, clothing), structural elements (doors, windows, walls, floors), and small objects (remote controls, cups, plates, keyboards, switches, plugs).
Method Selection per Object
Objects were precisely labeled using bounding boxes or polygon segmentation depending on object shape and complexity. Boxes were used for regular rectangular objects; polygons captured organic shapes and tightly-packed items where boxes would overlap excessively. This per-object method selection ensured clean boundaries even in cluttered scenes.
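The case study does not state the exact rule used to pick a method, so the following is only a minimal sketch of how such a per-object choice could be automated as a pre-annotation hint. It assumes a rough outline of the object is available and uses two hypothetical thresholds: `min_fill` (how much of the bounding box the object actually fills) and `max_overlap` (how much the box may overlap a neighbor's box before a polygon is preferred). The function names and threshold values are illustrative, not Shaip's.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def polygon_area(outline: Sequence[Point]) -> float:
    """Absolute area of a simple polygon (shoelace formula)."""
    twice_area = 0.0
    shifted = list(outline[1:]) + [outline[0]]
    for (x1, y1), (x2, y2) in zip(outline, shifted):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def bounding_box(outline: Sequence[Point]) -> Box:
    """Tightest axis-aligned box around the outline."""
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    return min(xs), min(ys), max(xs), max(ys)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def choose_method(
    outline: Sequence[Point],
    neighbor_boxes: Sequence[Box] = (),
    min_fill: float = 0.85,     # hypothetical threshold
    max_overlap: float = 0.30,  # hypothetical threshold
) -> str:
    """Return "box" or "polygon" for one object."""
    box = bounding_box(outline)
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    if box_area <= 0:
        raise ValueError("outline must enclose a non-zero area")
    # Irregular shapes leave much of their box empty, so a polygon is tighter.
    if polygon_area(outline) / box_area < min_fill:
        return "polygon"
    # In tightly packed scenes a box would overlap its neighbors too much.
    if any(box_iou(box, other) > max_overlap for other in neighbor_boxes):
        return "polygon"
    return "box"
```

With these assumed thresholds, a rectangular outline with no close neighbors comes back as `"box"`, while a triangular outline or a rectangle crowded by another object's box comes back as `"polygon"`. In practice an annotator would still confirm or override the suggestion.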
Spatial Relationship Tagging
Each annotated object was enriched with attributes covering object state (open or closed for doors and drawers), spatial relationship tags indicating proximity and placement relative to other objects, occlusion status, and object condition. This spatial intelligence layer enables robotic AI systems to understand context, not just detect objects.
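To make the four attribute layers concrete, here is one way a single annotation record might be represented and sanity-checked. This is a sketch under stated assumptions: the field names, the relation vocabulary (such as `on_top_of`), and the occlusion levels are hypothetical and are not taken from the delivered dataset schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed vocabularies; the real guideline values are not given in the source.
ALLOWED_STATES = {None, "open", "closed"}
ALLOWED_OCCLUSION = {"none", "partial", "heavy"}


@dataclass
class SpatialRelation:
    relation: str   # e.g. "on_top_of", "next_to" (hypothetical tags)
    target_id: str  # object_id of the other object in the same image


@dataclass
class ObjectAnnotation:
    object_id: str
    category: str
    method: str                           # "box" or "polygon"
    geometry: List[Tuple[float, float]]   # two corners for a box, vertices for a polygon
    state: Optional[str] = None           # layer 1: object state
    relations: List[SpatialRelation] = field(default_factory=list)  # layer 2: spatial
    occlusion: str = "none"               # layer 3: occlusion status
    condition: Optional[str] = None       # layer 4: object condition


def validate_scene(objects: List[ObjectAnnotation]) -> List[str]:
    """Return human-readable problems found in one image's annotations."""
    known_ids = {obj.object_id for obj in objects}
    problems = []
    for obj in objects:
        if obj.method not in {"box", "polygon"}:
            problems.append(f"{obj.object_id}: unknown method {obj.method!r}")
        if obj.state not in ALLOWED_STATES:
            problems.append(f"{obj.object_id}: unknown state {obj.state!r}")
        if obj.occlusion not in ALLOWED_OCCLUSION:
            problems.append(f"{obj.object_id}: unknown occlusion {obj.occlusion!r}")
        for rel in obj.relations:
            # A relation is only meaningful if its target exists in the scene.
            if rel.target_id not in known_ids:
                problems.append(f"{obj.object_id}: relation to missing object {rel.target_id}")
            elif rel.target_id == obj.object_id:
                problems.append(f"{obj.object_id}: relation points at itself")
    return problems
```

A check like `validate_scene` is useful mainly because spatial relations reference other objects by ID: a cup tagged as resting on a table that was never annotated is an error that is cheap to catch automatically.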
Small Object & Interaction-Critical Coverage
Annotators followed strict inclusion rules to label every visible object regardless of size — including small items like switches, plugs, and decorative objects that are critical for robotic interaction tasks. These items often determine whether a robot can complete its task, so they could not be deprioritized.
Reflective Surface Handling
Reflective surfaces such as mirrors and glass tables required special handling to avoid duplicate or ghost annotations. Specific guidelines governed whether reflected objects were labeled separately, ignored, or flagged — ensuring downstream models didn't learn from artifact-laden labels.
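The guidelines themselves are not reproduced here, so the sketch below only illustrates one possible supporting check: a geometric pre-filter that lists annotations lying almost entirely inside a reflective surface, so a reviewer can decide whether each is a genuine object or a ghost. The dictionary keys, the `REFLECTIVE_CATEGORIES` set, and the `min_inside` threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Assumed category names for reflective surfaces.
REFLECTIVE_CATEGORIES = {"mirror", "glass table"}


def box_area(box: Box) -> float:
    """Area of a box; inverted (non-overlapping) boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def fraction_inside(inner: Box, outer: Box) -> float:
    """Share of `inner`'s area that lies within `outer`."""
    area = box_area(inner)
    if area == 0:
        return 0.0
    overlap = (
        max(inner[0], outer[0]),
        max(inner[1], outer[1]),
        min(inner[2], outer[2]),
        min(inner[3], outer[3]),
    )
    return box_area(overlap) / area


def reflection_candidates(annotations: List[Dict], min_inside: float = 0.9) -> List[str]:
    """IDs of annotations that sit almost entirely inside a reflective surface.

    These are candidates for human review only, not confirmed reflections.
    """
    surfaces = [a["box"] for a in annotations if a["category"] in REFLECTIVE_CATEGORIES]
    candidates = []
    for ann in annotations:
        if ann["category"] in REFLECTIVE_CATEGORIES:
            continue  # the surface itself is a real object
        if any(fraction_inside(ann["box"], s) >= min_inside for s in surfaces):
            candidates.append(ann["id"])
    return candidates
```

This check cannot replace annotator judgment. A real vase standing in front of a mirror, or a cup resting on a glass table, would also be listed, which is why the output is a review queue and not an automatic deletion list.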
Project Scope
| Dataset Type | Coverage | Methods | Categories | Attributes | Special Handling |
|---|---|---|---|---|---|
| Indoor scene object annotation | Every visible object | Box + polygon | Furniture, appliances, personal items, structural elements, small objects | 4 layers (state, spatial, occlusion, condition) | Reflective surface rules |
Outcomes
- Established an object-dense annotation pipeline for indoor robotic perception
- Standardized per-object method selection between bounding boxes and polygon segmentation
- Delivered spatial relationship tagging enabling context-aware robotic AI
- Implemented reflective surface handling to prevent ghost annotations
- Supported the client’s robotics AI deployment in home, warehouse, retail, and healthcare settings
Overall, Shaip helped transform an object-dense indoor annotation requirement into a structured, production-ready pipeline — one capable of supporting robotic navigation, pick-and-place automation, smart environment monitoring, and human-robot interaction across diverse indoor deployment environments.
Shaip annotated rooms the way our robots see them — every object, every relationship, every small item that matters. Their attention to switches, plugs, and reflective surfaces meant our perception model didn’t trip on the things most datasets ignore.
– VP, Robotic Perception