Human Body Keypoint Annotation
Shaip annotated 150,000 frames of image and video data with a 36-keypoint full-body schema that combines facial landmarks (head, eyes, ears, nose, chin) with skeleton joints (shoulders, elbows, wrists, hips, knees, ankles), to power pose estimation, motion analysis, fitness tracking, and healthcare movement AI.
Project Overview
As pose estimation and human motion AI move into production deployment, the client needed a scalable annotation framework to label 150,000 frames of image and video data with anatomically precise full-body keypoint coverage across diverse real-world conditions.
Shaip built the end-to-end annotation pipeline, covering keypoint placement, multi-angle pose handling, occlusion management, and structured QA. The pipeline delivered the 36-point body schema and produced model-ready datasets at a consistent daily throughput.
Key Stats
Frames Annotated
150,000
Keypoints / Body
36
Daily Throughput
30 frames per annotator
Platform
CVAT
Challenges
- Scaling from controlled sample workflows to 150,000 frames of full-body skeleton annotation
- Maintaining anatomical precision across varied poses, viewing angles, and lighting conditions
- Handling partial occlusion and overlapping subjects without compromising keypoint accuracy
- Coordinating facial landmark and body skeleton placement within a single 36-point schema
- Sustaining a throughput benchmark of 30 frames per annotator per day across the full team
Solution
Annotation Strategy
Shaip designed a 36-keypoint body schema with full anatomical coverage: facial landmarks (head, eyes, ears, nose, chin) combined with skeleton joints across shoulders, arms, torso, hips, and legs. CVAT was configured with this schema and rolled out across the team for consistent labeling.
Pose & Diversity Handling
The dataset deliberately covered diverse subjects across varying poses, viewing angles, lighting conditions, and clothing types. Annotators followed pose-specific guidelines to handle standing, sitting, crouching, lying, and dynamic motion poses with the same level of keypoint precision.
Occlusion & Edge-Case Rules
Strict rules governed the handling of partial occlusions: keypoints behind clothing, body parts hidden by other limbs, and subjects partially out of frame. Hidden landmarks were marked with visibility flags rather than approximated, preserving dataset integrity for downstream pose estimation models.
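One way to express this rule in data is sketched below. It is a minimal illustration, not the project's actual export format: the `Keypoint` class, the `Visibility` flag names, and the example labels are all hypothetical. The only rule taken from the text is that a hidden landmark carries a flag instead of an estimated position.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    # Hypothetical flag names; the case study only says "visibility flags".
    VISIBLE = "visible"
    OCCLUDED = "occluded"          # hidden behind clothing or another limb
    OUT_OF_FRAME = "out_of_frame"  # subject partially outside the image


@dataclass(frozen=True)
class Keypoint:
    name: str
    visibility: Visibility
    x: Optional[float] = None
    y: Optional[float] = None

    def __post_init__(self) -> None:
        has_both = self.x is not None and self.y is not None
        has_any = self.x is not None or self.y is not None
        if self.visibility is Visibility.VISIBLE and not has_both:
            raise ValueError(f"{self.name}: visible keypoint needs coordinates")
        if self.visibility is not Visibility.VISIBLE and has_any:
            # Enforces "flagged rather than approximated".
            raise ValueError(f"{self.name}: hidden keypoint must not carry coordinates")


# Illustrative values only.
left_wrist = Keypoint("left_wrist", Visibility.VISIBLE, x=412.0, y=233.5)
right_wrist = Keypoint("right_wrist", Visibility.OCCLUDED)
```

Rejecting coordinates on hidden keypoints at construction time means a guessed position cannot slip into the dataset unnoticed, which is the integrity property the rules are meant to protect.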
Throughput & Productivity Benchmark
The team maintained a benchmark of 30 approved annotated frames per annotator per day across an 8.5-hour shift, or about 17 minutes per frame. This benchmark was calibrated against accuracy targets so that throughput did not compromise quality.
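The arithmetic behind this benchmark can be worked through in a few lines. The frame count, daily rate, shift length, and keypoint count come from the case study. The team size passed to `calendar_days` is a hypothetical input, since the source does not state it, and the per-keypoint figure assumes a single human figure per frame.

```python
TOTAL_FRAMES = 150_000
FRAMES_PER_ANNOTATOR_DAY = 30
SHIFT_HOURS = 8.5
KEYPOINTS_PER_FIGURE = 36

# Time budget per approved frame within one shift.
minutes_per_frame = SHIFT_HOURS * 60 / FRAMES_PER_ANNOTATOR_DAY

# Assumption: one figure per frame; frames with more figures leave less time per point.
seconds_per_keypoint = minutes_per_frame * 60 / KEYPOINTS_PER_FIGURE

# Total effort, ignoring rework on rejected frames.
annotator_days = TOTAL_FRAMES / FRAMES_PER_ANNOTATOR_DAY


def calendar_days(team_size: int) -> float:
    """Working days needed for a team of the given (hypothetical) size."""
    return annotator_days / team_size


print(minutes_per_frame)  # 17.0
print(annotator_days)     # 5000.0
```

For example, `calendar_days(50)` gives 100.0 working days, which would be consistent with the multi-month timeline for a team of that illustrative size.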
Quality Assurance Workflow
Every annotated frame passed through a structured QA review covering keypoint placement accuracy, visibility flag correctness, and consistency with the 36-point schema. Rejected frames were returned for correction with annotator-level feedback to drive continuous improvement.
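The structural parts of such a review, schema consistency and visibility flag correctness, lend themselves to automated pre-checks before human reviewers judge placement accuracy. The sketch below is an assumption about how that could look, not a description of the actual QA tooling: `review_figure`, the record layout, and the `kp_0` to `kp_35` placeholder names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

EXPECTED_KEYPOINTS = 36  # from the project's 36-point schema

# Hypothetical record layout: (visible, x, y); x and y are None when hidden.
KeypointRecord = Tuple[bool, Optional[float], Optional[float]]


def review_figure(
    keypoints: Dict[str, KeypointRecord], width: int, height: int
) -> List[str]:
    """Return rejection reasons; an empty list means the figure passes."""
    issues: List[str] = []
    if len(keypoints) != EXPECTED_KEYPOINTS:
        issues.append(
            f"expected {EXPECTED_KEYPOINTS} keypoints, found {len(keypoints)}"
        )
    for name, (visible, x, y) in keypoints.items():
        if visible:
            if x is None or y is None:
                issues.append(f"{name}: visible but missing coordinates")
            elif not (0 <= x < width and 0 <= y < height):
                issues.append(f"{name}: coordinates outside the image")
        elif x is not None or y is not None:
            issues.append(f"{name}: hidden but has coordinates")
    return issues


# Placeholder names; the case study does not list the 36 keypoint labels.
figure = {f"kp_{i}": (True, 10.0 + i, 20.0) for i in range(36)}
figure["kp_35"] = (False, 5.0, 5.0)  # flagged hidden, yet coordinates were entered
print(review_figure(figure, width=1920, height=1080))
# ['kp_35: hidden but has coordinates']
```

The returned reasons could serve as the annotator-level feedback attached to a rejected frame, while anatomical placement accuracy would still need a human reviewer.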
Project Scope
| Dataset Type | Volume | Keypoints | Platform | Throughput | Timeline |
|---|---|---|---|---|---|
| Human body skeleton + keypoint annotation | 150,000 frames | 36 per human figure | CVAT | 30 frames/day/annotator | Multi-month |
Outcomes
- Established a scalable 36-keypoint annotation framework ready for pose estimation production training
- Standardized anatomical landmark placement across facial and skeletal regions
- Maintained a throughput of 30 frames per annotator per day without compromising precision
- Delivered a diverse, multi-condition dataset spanning poses, lighting, angles, and clothing types
- Enabled the client’s pose estimation, motion analysis, fitness tracking, and healthcare movement AI roadmap
Overall, Shaip helped transform a 150,000-frame keypoint annotation requirement into a structured, production-ready pipeline that can support human pose AI, fitness tracking, motion diagnostics, and healthcare movement applications with consistent precision and scale.
Shaip delivered our keypoint annotation backbone with the precision our pose estimation models needed. Their 36-point schema execution, occlusion handling, and consistent daily throughput translated directly into stronger model performance.
– Director, Computer Vision Engineering