Annotation Reference · April 11, 2026 · 12 min read

Grasp Taxonomy for Robotics Annotation

The practical reference for annotating grasp types in robot training data: Feix GRASP taxonomy, protocols, agreement, and common errors.

Most annotation teams applying grasp labels to robot training data are using the wrong taxonomy - or using the right taxonomy inconsistently. Here is the practical reference for getting it right, including the specific errors that recur across annotation projects and the protocols that prevent them.

Why grasp annotation matters more than it looks

The specific grasp type in a demonstration directly determines what the policy learns at the finger level, not just the object level. Two grasps that appear similar in a video frame may require completely different fingertip pressure distributions, contact surface selections, and approach trajectories at deployment. A policy trained on data where power wraps and precision pinches are inconsistently labeled will produce grasps that fail on objects where the distinction matters - not because the model architecture is wrong, but because the training signal was ambiguous.

Generic action labels such as "picking up object" or "placing object" capture what happened, not how. For a manipulation policy, the how is the training signal. Getting grasp annotation right is therefore not a quality nicety; it is a training-signal issue with direct consequences for task success rates.

A 2023 study on imitation learning for dexterous manipulation found that policies trained with fine-grained grasp type labels achieved 23% higher task success rates on novel objects compared to policies trained with coarse "pick" labels on otherwise identical datasets.[1]

The Feix GRASP taxonomy: the practical standard

The GRASP taxonomy developed by Feix et al. at Yale[2] arranges grasps by opposition type (palm, pad, or side), virtual finger assignment, grasp class (power, intermediate, or precision), and thumb position (abducted or adducted). The result is 33 distinct grasp types covering the range of static one-handed grasps observed across manipulation tasks. It is the most widely cited and empirically grounded grasp classification in robotics research, validated against systematic data collection from diverse participant populations.

For most production annotation projects, working with all 33 types is impractical. A reduced set of 8–10 functionally distinct categories derived from Feix provides sufficient annotation granularity for most manipulation policies while remaining learnable by production annotation teams. The reduced set below is the baseline we use at Field Motion and is grounded in the Feix classification.

Core grasp types: production annotation reference

TYPE-01
Power wrap

All fingers wrapped around the object with palm contact. The highest-force configuration. Thumb typically opposes the fingers. Used for objects requiring force application, secure transport, or torque generation.

Gripping a hammer handle, holding a bottle, picking up a heavy jar, turning a steering wheel.

TYPE-02
Precision pinch

Thumb and index finger only, fingertip contact with no palm involvement. High positional accuracy, low force. Used for small or delicate objects requiring exact placement. Middle finger may rest passively but is not load-bearing.

Picking up a coin, handling a small electronic component, placing a small object with precision.

TYPE-03
Lateral pinch

Thumb pad pressed against the lateral surface of the index finger's middle phalanx. Used for thin flat objects where fingertip pinch is mechanically difficult. Also called "key grasp" in older taxonomy literature. Mechanically distinct from precision pinch - contact is on the side of the finger, not the tip.

Picking up a credit card or key, turning a car key, holding a piece of paper.

TYPE-04
Tripod

Thumb, index finger, and middle finger forming a stable three-point contact configuration. More stable than a two-finger pinch while maintaining good positional accuracy. The most common grasp for medium-sized cylindrical or spherical objects.

Picking up a pen, holding a small cup, grasping a folded cloth, picking up a marker or small tool.

TYPE-05
Palmar wrap

Object rests against the palm with fingers closed around it but with minimal or ambiguous active thumb opposition. Often used transitionally during transport between precise manipulation actions. Distinguish from power wrap: in palmar wrap, the thumb's opposition is passive or absent; in power wrap, thumb opposition is active and load-bearing.

Cradling a bowl, carrying a wide flat object, supporting an item from below during transport.

TYPE-06
Hook

Fingers curled to hook through or around a handle structure with no thumb involvement. Load-bearing through finger curl and the handle geometry, not grip force. Mechanically distinct from all wrap grasps - the finger configuration is a hook, not a wrap around the object surface.

Carrying a bag by its handle, opening a drawer by pulling the handle, lifting a bucket by the handle.

TYPE-07
Pad opposition

Multiple finger pads opposing each other across a thin or flat object, without palm contact. Used for objects held vertically or thin objects where wrap grasps would require awkward wrist positioning. Requires active force balance across opposing finger pads.

Holding a flat plate vertically, picking up a thin book from a flat surface, handling a tablet, gripping a lid to open it.

TYPE-08
Tip pinch

Contact at the very tips of the thumb and index finger, with a smaller contact area than precision pinch. Used for very small or fragile objects requiring minimal contact force. The distinction from precision pinch is contact area: use tip pinch for objects under approximately 1 cm in diameter, and precision pinch for objects up to finger-pad width.

Picking up a small bead, handling a thin wire, retrieving a pill from a surface.

TYPE-09
Bimanual

Both hands involved in a single grasp or manipulation event. Annotate the primary grasp configuration on the dominant hand using the appropriate type above, and add a bimanual flag. Any two-hand contact event must be flagged as bimanual regardless of perceived passivity of the support hand. Bimanual events have distinct implications for dual-arm robot policies and should never be labeled as single-hand grasps.

Opening a jar (one holds, one turns), folding a cloth, carrying a large box with both hands, unscrewing a lid.
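The reduced set above can be encoded so that tooling rejects labels outside the taxonomy. The following is a minimal sketch, not a fixed schema: the names `GraspType`, `GraspLabel`, and `label_from_code` are hypothetical, and modeling TYPE-09 as a boolean flag on top of a dominant-hand type is one reading of the bimanual rule.

```python
from dataclasses import dataclass
from enum import Enum


class GraspType(Enum):
    """Single-hand grasp types from the reduced reference set (TYPE-01..08)."""
    POWER_WRAP = "TYPE-01"
    PRECISION_PINCH = "TYPE-02"
    LATERAL_PINCH = "TYPE-03"
    TRIPOD = "TYPE-04"
    PALMAR_WRAP = "TYPE-05"
    HOOK = "TYPE-06"
    PAD_OPPOSITION = "TYPE-07"
    TIP_PINCH = "TYPE-08"


@dataclass(frozen=True)
class GraspLabel:
    grasp_type: GraspType   # configuration on the dominant hand
    bimanual: bool = False  # TYPE-09: set whenever both hands contact the object


def label_from_code(code: str, both_hands_in_contact: bool) -> GraspLabel:
    """Build a label from a type code; unknown codes raise ValueError."""
    if code == "TYPE-09":
        # Assumption: bimanual is a flag, so a dominant-hand type is still required.
        raise ValueError("TYPE-09 is a flag; pass the dominant-hand type code")
    return GraspLabel(GraspType(code), bimanual=both_hands_in_contact)


# Opening a jar: the dominant hand is in a power wrap, both hands touch the object.
jar = label_from_code("TYPE-01", both_hands_in_contact=True)
```

Keeping the bimanual flag separate from the type means a two-hand event can never be stored without a dominant-hand configuration, and a single-hand type can never silently stand in for a bimanual event.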

Annotation rules that close inter-annotator agreement gaps

Label the stable post-contact configuration, not the approach

This is the single most impactful rule. Annotators who label at the moment of first contact - when the hand is still in its approach configuration - produce systematically noisier labels than annotators who label the stable post-contact configuration. Define the annotation window explicitly: the first frame of stable contact through the beginning of the manipulation action. Pre-contact approach kinematics are not the training signal.

Annotate grasp events as time segments, not clip-level labels

A single manipulation clip often contains multiple grasp transitions. Labeling one grasp type per clip loses within-clip transitions. Annotate each grasp event as a temporally bounded segment: start frame, end frame, grasp type. This produces richer training signal and enables the policy to learn grasp transitions, not just static configurations.
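A segment-level record could look like the sketch below. The field names, the inclusive frame convention, and the rule that segments for one hand must not overlap are assumptions made for illustration, not part of any published schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GraspSegment:
    start_frame: int  # first frame of stable contact (inclusive)
    end_frame: int    # last frame of this grasp event (inclusive)
    grasp_type: str   # e.g. "tripod"


def validate_segments(segments: List[GraspSegment], n_frames: int) -> List[str]:
    """Return a list of problems for one hand in one clip; empty means it passes."""
    problems = []
    ordered = sorted(segments, key=lambda s: s.start_frame)
    for seg in ordered:
        if seg.start_frame > seg.end_frame:
            problems.append(f"{seg.grasp_type}: start {seg.start_frame} is after end {seg.end_frame}")
        if seg.start_frame < 0 or seg.end_frame >= n_frames:
            problems.append(f"{seg.grasp_type}: frames fall outside the clip (0..{n_frames - 1})")
    # Assumption: one hand holds one grasp at a time, so segments must not overlap.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start_frame <= prev.end_frame:
            problems.append(f"{prev.grasp_type} and {cur.grasp_type} overlap at frame {cur.start_frame}")
    return problems


# A 100-frame clip with a tripod pickup followed by a power-wrap transport.
clip = [GraspSegment(10, 40, "tripod"), GraspSegment(41, 90, "power_wrap")]
issues = validate_segments(clip, n_frames=100)  # empty list for this clip
```

Running a check like this at submission time catches boundary mistakes before they reach the training set.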

Establish explicit decision rules for the top five ambiguous cases

Annotator disagreement concentrates in predictable places. Define decision rules before annotation begins for: power wrap vs. palmar wrap when thumb involvement is ambiguous; tripod vs. precision pinch on medium objects; tip pinch vs. precision pinch for small objects; bimanual classification when the support hand appears passive; and hook classification when fingers wrap partially around a handle. A pre-defined decision tree for these five cases reduces inter-annotator disagreement by roughly 40–60% in our experience.

Run a calibration session before production begins

All annotators label the same 50 representative clips before touching production data. Compare results as a group. Identify the specific disagreement cases. Align on decision rules. This calibration session is where most inter-annotator variance is eliminated. Running it after production annotation begins means the calibration insight arrives too late to affect the data you have already collected.
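To find the specific disagreement cases after a calibration session, one option is to count which pairs of labels annotators confuse most often. This is a sketch under assumptions: `disagreement_pairs` is a hypothetical helper, and the input is assumed to be one label per clip per annotator, in the same clip order for everyone.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def disagreement_pairs(labels: Dict[str, List[str]]) -> Counter:
    """Count confused label pairs across every pair of annotators.

    labels maps annotator name -> labels for the shared calibration clips.
    """
    lengths = {len(v) for v in labels.values()}
    if len(lengths) != 1:
        raise ValueError("every annotator must label the same set of clips")
    counts: Counter = Counter()
    for a, b in combinations(sorted(labels), 2):
        for label_a, label_b in zip(labels[a], labels[b]):
            if label_a != label_b:
                # Sort so (x, y) and (y, x) count as the same confusion.
                counts[tuple(sorted((label_a, label_b)))] += 1
    return counts


calibration = {
    "ann_a": ["power_wrap", "tripod", "tip_pinch", "hook"],
    "ann_b": ["palmar_wrap", "tripod", "precision_pinch", "hook"],
    "ann_c": ["power_wrap", "precision_pinch", "precision_pinch", "hook"],
}
confusions = disagreement_pairs(calibration)
for pair, n in confusions.most_common():
    print(pair, n)
```

The most frequent pairs are the ones that need an explicit decision rule before production starts.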

Common annotation errors and fixes

| Error | Why it happens | The fix |
| --- | --- | --- |
| Labeling the approach, not the contact | Annotators begin label at pre-grasp hand shape rather than stable post-contact configuration | Explicit protocol: label window starts at first frame of stable contact. Train on this distinction specifically in calibration session. |
| Power wrap / palmar wrap confusion | Thumb position is ambiguous in egocentric video when the object occludes the hand | Default rule: label as power wrap when thumb position is occluded and object shape would mechanically require active opposition for a stable grasp. |
| Precision pinch / tripod confusion | Middle finger contribution is ambiguous on medium-sized objects in egocentric video | Default rule: label as precision pinch only when middle finger is visibly extended or retracted. Label as tripod when there is any ambiguity about three-point contact. |
| Missing bimanual flag | Support hand appears passive; annotator does not flag bimanual event | Explicit rule: any two-hand contact event must be flagged bimanual regardless of perceived passivity of the support hand. No exceptions. |
| Tip pinch / precision pinch conflation | Contact area difference is genuinely hard to distinguish in egocentric video | Use object size as the decision rule: reserve tip pinch for objects smaller than approximately 1 cm in the critical dimension. Use precision pinch for everything else with two-finger contact. |
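The default rules for the three ambiguous pairs can be written down as small functions so that every annotator and every QA script applies the same logic. This is a sketch only: the function names and inputs are hypothetical, the 1 cm cutoff is treated as a hard threshold for illustration, and returning "escalate" for the case the rules leave open is an assumption based on the escalation practice described in this article.

```python
from typing import Optional

TIP_PINCH_MAX_CM = 1.0  # "approximately 1 cm"; the exact cutoff is a project choice


def two_finger_pinch_label(critical_dimension_cm: float) -> str:
    """Tip pinch vs. precision pinch, decided by object size."""
    return "tip_pinch" if critical_dimension_cm < TIP_PINCH_MAX_CM else "precision_pinch"


def pinch_or_tripod(middle_finger_visibly_clear: bool) -> str:
    """Precision pinch only when the middle finger is visibly off the object."""
    return "precision_pinch" if middle_finger_visibly_clear else "tripod"


def wrap_label(thumb_visible: bool,
               thumb_actively_opposing: Optional[bool],
               shape_requires_opposition: bool) -> str:
    """Power wrap vs. palmar wrap, with a default for an occluded thumb."""
    if thumb_visible:
        return "power_wrap" if thumb_actively_opposing else "palmar_wrap"
    if shape_requires_opposition:
        return "power_wrap"
    # Assumption: the table gives no default here, so flag the clip for review.
    return "escalate"


bead = two_finger_pinch_label(0.4)
marker = pinch_or_tripod(middle_finger_visibly_clear=False)
jar = wrap_label(thumb_visible=False, thumb_actively_opposing=None,
                 shape_requires_opposition=True)
```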

Inter-annotator agreement targets

The production target for grasp annotation with trained annotators on a specified taxonomy is a Cohen's kappa of 0.75 or above. A kappa below 0.65 indicates a taxonomy or protocol problem. Cohen's kappa compares two annotators at a time, so for larger teams compute it per annotator pair. These practices consistently push teams above the 0.75 threshold:

  • Calibration session on 50 clips before production begins. All annotators label the same sample, compare results, align on decision rules. This is the single highest-leverage quality investment.
  • Weekly calibration clips embedded in the annotation queue. 5–10 clips per annotator per week scored for agreement. Agreement monitoring catches drift early before it accumulates.
  • Formal escalation path for uncertain clips. Annotators flag uncertain clips rather than guessing. Guessing on ambiguous clips produces wrong confident labels, which are worse for training than a flagged uncertain label.
  • Treat disagreement clips as valuable edge cases. Clips with documented annotator disagreement are valuable training examples. Keep them with disagreement metadata rather than discarding them.
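For reference, Cohen's kappa for two annotators is observed agreement corrected for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it from scratch and applies the 0.75 and 0.65 thresholds given above. The function names and the wording returned for the band between the two thresholds are assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same clips in the same order."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty clip set")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def agreement_status(kappa: float) -> str:
    if kappa >= 0.75:
        return "meets production target"
    if kappa < 0.65:
        return "taxonomy or protocol problem"
    return "below target"  # assumption: the article does not name this band


ann_a = ["power_wrap", "power_wrap", "precision_pinch", "precision_pinch"]
ann_b = ["power_wrap", "power_wrap", "precision_pinch", "power_wrap"]
kappa = cohens_kappa(ann_a, ann_b)  # 0.5: raw agreement is 0.75, chance agreement is 0.5
print(kappa, agreement_status(kappa))
```

The example shows why raw agreement is misleading: the two annotators agree on three of four clips, but with only two labels in play half of that agreement is expected by chance.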

Frequently asked questions

What is the Feix GRASP taxonomy?

The GRASP taxonomy by Feix et al. (Yale) arranges human grasp types by opposition type (palm, pad, or side), virtual finger assignment, grasp class (power, intermediate, or precision), and thumb position, resulting in 33 distinct grasp types. It is the most widely cited grasp classification in robotics research and the baseline taxonomy used in production physical AI annotation. For most annotation projects, a reduced set of 8–10 functionally distinct categories derived from Feix is more practical than applying all 33 types.

Why does grasp annotation matter for robot training data?

Grasp type directly determines what the policy learns at the finger level. Two grasps that look similar in a video frame may require completely different fingertip pressure, contact surface, and approach trajectory. A policy trained on ambiguous or inconsistent grasp labels will fail on objects where the distinction matters. Generic labels like "picking up object" are insufficient - structured grasp type labels are the training signal for physical grasping behavior.

What is a good inter-annotator agreement target for grasp annotation?

Cohen's kappa of 0.75 or above is a reasonable production target for trained annotators on a specified taxonomy. Below 0.65 indicates a taxonomy or protocol issue. Key practices: calibration session on 50 clips before production, weekly calibration clips in the annotation queue, explicit decision rules for the most common ambiguous cases, and a formal escalation path for uncertain clips.

Should I annotate failed grasps?

Yes. Failed grasp demonstrations are often more valuable than they appear. Policies trained on both successful and failed grasp examples learn more robust recovery behaviors. Annotate failed grasps with the attempted grasp type (what the demonstrator was trying to execute) and a failure flag. If you can identify the failure mode (slip, insufficient contact, approach angle error), that metadata is additional training signal for contact prediction components of the policy.
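One way to record this is to keep the attempted type, a failure flag, and an optional failure mode on the same annotation. The sketch below is illustrative: the class and field names are hypothetical, and the three failure modes are simply the ones named in the answer above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    SLIP = "slip"
    INSUFFICIENT_CONTACT = "insufficient_contact"
    APPROACH_ANGLE_ERROR = "approach_angle_error"


@dataclass(frozen=True)
class GraspAttempt:
    attempted_type: str                         # what the demonstrator tried to execute
    failed: bool = False                        # failure flag
    failure_mode: Optional[FailureMode] = None  # only when it can be identified

    def __post_init__(self) -> None:
        # A failure mode on a successful grasp is almost certainly a labeling slip.
        if self.failure_mode is not None and not self.failed:
            raise ValueError("failure_mode set on a grasp not flagged as failed")


dropped_pen = GraspAttempt("tripod", failed=True, failure_mode=FailureMode.SLIP)
unclear_failure = GraspAttempt("power_wrap", failed=True)  # mode unknown, still flagged
```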

Which grasp taxonomy should I use - Feix, Cutkosky, or something else?

For production physical AI annotation, the Feix GRASP taxonomy is the right baseline. It is empirically grounded in systematic human grasping data across diverse populations and covers the full range of manipulation configurations. Cutkosky's taxonomy[3] was developed for manufacturing contexts and performs less well on household manipulation. The most important factor is consistent application: a reduced Feix taxonomy applied consistently outperforms a theoretically complete taxonomy applied inconsistently.


Field Motion Team
Physical AI Data Operations - fieldmotion.ai

References

  [1] Shi et al. (2023). Fine-grained Action Labeling for Dexterous Manipulation Imitation Learning. CoRL 2023. arxiv.org/abs/2310.01825
  [2] Feix, T. et al. (2016). The GRASP Taxonomy of Human Grasp Types. IEEE Trans. Human-Machine Systems, 46(1), 66–77. doi.org/10.1109/THMS.2015.2481603
  [3] Cutkosky, M.R. (1989). On Grasp Choice, Grasp Models, and the Design of Hands for Manufacturing Tasks. IEEE Trans. Robotics and Automation, 5(3), 269–279.
  [4] Chang et al. (2024). EgoMimic: Scaling Imitation Learning via Egocentric Video. arxiv.org/abs/2410.24221
