Physics-grounded simulation matches real training data for cloth manipulation tasks

A real-to-sim-to-real pipeline using elastic modeling and diffusion-based trajectory generation achieves 90% zero-shot success on deformable object manipulation using only synthetic training data.

Paper: arXiv:2604.08544v1

Yunsong Zhou · Hangxu Liu · Xuekun Jiang · Xing Shen · Yuanzhen Zhou · Hui Wang · +9 more

Research Digest·13 April 2026·3 min read

Zhou et al. present SIM1, a data engine that converts a small number of real demonstrations into large-scale synthetic training data for robotic manipulation of deformable objects such as cloth. The system digitizes real scenes into metric-accurate virtual twins, calibrates soft-body physics via elastic modeling, and generates diverse trajectories through a diffusion model with quality filtering. Policies trained exclusively on this synthetic data match those trained on real data at a 1:15 equivalence ratio: roughly fifteen synthetic demonstrations substitute for one real one.

What they did

The authors built a three-stage pipeline. First, real scenes are reconstructed into simulation as metric-consistent digital twins. Second, deformable dynamics (e.g., cloth stiffness, elasticity) are calibrated against physical measurements rather than left at default simulator values. Third, a diffusion model generates new manipulation trajectories from the calibrated scene, and a quality filter removes physically implausible samples before the data is used for policy training.
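To make the second stage concrete, here is a minimal sketch of what calibrating a deformable parameter against measurements can look like. It is not the paper's elastic model, which this digest does not detail. It assumes a one-parameter linear spring (force = stiffness × displacement) as a stand-in, and the measurement values, the default stiffness, and the function names are all invented for illustration.

```python
def fit_stiffness(displacements, forces):
    """Closed-form least-squares fit of k in f = k * x (line through the origin)."""
    num = sum(x * f for x, f in zip(displacements, forces))
    den = sum(x * x for x in displacements)
    if den == 0:
        raise ValueError("need at least one nonzero displacement")
    return num / den


def rms_error(k, displacements, forces):
    """Root-mean-square gap between predicted and measured force."""
    sq = [(k * x - f) ** 2 for x, f in zip(displacements, forces)]
    return (sum(sq) / len(sq)) ** 0.5


# Hypothetical stretch test: displacement in metres, force in newtons.
measured_x = [0.01, 0.02, 0.03, 0.04]
measured_f = [0.52, 0.98, 1.49, 2.03]

k_default = 100.0  # made-up "out of the box" simulator value
k_fit = fit_stiffness(measured_x, measured_f)

print("fitted stiffness (N/m):", round(k_fit, 2))
print("RMS error, default:", rms_error(k_default, measured_x, measured_f))
print("RMS error, fitted: ", rms_error(k_fit, measured_x, measured_f))
```

The point of the sketch is only the workflow: measure the real material, fit the simulator parameter to those measurements, and check that the fitted value reproduces them better than the default. A real cloth model would have several coupled parameters and would likely need an iterative optimizer rather than a closed-form fit.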

Experiments targeted cloth manipulation tasks in real-world settings. Policies were trained on purely synthetic data produced by SIM1 and evaluated zero-shot on physical hardware, then compared against baselines trained on equivalent quantities of real demonstrations.

Key findings

Policies trained on SIM1 synthetic data achieve 90% zero-shot success on real-world cloth manipulation tasks.
Synthetic data reaches parity with real demonstrations at a 1:15 equivalence ratio: about fifteen synthetic examples are needed to match one real demonstration in policy quality.
SIM1-trained policies show 50% generalization gains over real-data baselines when evaluated on out-of-distribution configurations.
The pipeline is described as operating from limited demonstrations, suggesting low entry cost for new task setups.
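
The 1:15 ratio translates into simple budgeting arithmetic. The sketch below takes the ratio at face value as fifteen synthetic demonstrations per real one. The per-demonstration costs in the usage lines are invented for illustration and do not come from the paper.

```python
SYNTHETIC_PER_REAL = 15  # the 1:15 equivalence ratio as read in this digest


def synthetic_needed(real_demos, ratio=SYNTHETIC_PER_REAL):
    """Synthetic demonstrations needed to match a given number of real ones."""
    return real_demos * ratio


def real_equivalent(synthetic_demos, ratio=SYNTHETIC_PER_REAL):
    """Real-demonstration equivalent of a synthetic dataset."""
    return synthetic_demos / ratio


def synthetic_is_cheaper(cost_per_real, cost_per_synthetic, ratio=SYNTHETIC_PER_REAL):
    """True when `ratio` synthetic demos cost less than the one real demo they replace."""
    return cost_per_synthetic * ratio < cost_per_real


# Hypothetical numbers: matching 200 real demonstrations.
print(synthetic_needed(200))            # 3000
print(real_equivalent(3000))            # 200.0
print(synthetic_is_cheaper(60.0, 2.0))  # True: 15 * 2 = 30 < 60
```

Under this reading, synthetic data pays off only when one synthetic demonstration costs less than one fifteenth of a real one, in whatever unit (operator time, compute, money) the comparison uses.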

Why it matters

Deformable object manipulation is among the hardest regimes for sim-to-real transfer because shape, contact, and topology change continuously and are poorly captured by rigid-body simulators. SIM1's results suggest that the main bottleneck is uncalibrated physics rather than rendering fidelity. If the 1:15 ratio holds across tasks, synthetic data generation could substantially reduce real-world data collection effort for cloth-class manipulation.

Caveats

The 1:15 equivalence ratio means synthetic data is not a free substitute: each real demonstration still carries about as much training signal as fifteen synthetic ones. Results are reported on cloth manipulation specifically, and it is unclear how the pipeline transfers to other deformable categories (granular materials, liquids, ropes). The diffusion-based trajectory generator and quality filter introduce their own failure modes that are not fully characterized. Independent replication and ablation of each pipeline stage would strengthen the claims.

Analysis

The paper positions itself against a body of sim-to-real work that has focused on visual domain randomization and rendering realism. The authors' argument, that simulation fails because it is 'ungrounded' rather than because it is synthetic, redirects attention toward physics parameter calibration as the primary lever. This is consistent with recent trends in contact-rich manipulation research, where accurate contact models often matter more than photorealistic appearance.

The use of a diffusion model for trajectory augmentation rather than scripted motion primitives is notable; it allows behavioral diversity beyond what hand-coded controllers can produce. The open question is whether the elastic calibration procedure generalizes to materials with more complex rheology (e.g., viscoplastic or wet materials), and whether the quality filter is conservative enough to prevent physically spurious data from degrading policy learning at scale.
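
The generate-then-filter pattern behind that question can be sketched in a few lines. Everything here is an assumption made for illustration: a seeded random walk stands in for the diffusion sampler, a trajectory is reduced to the distance between two grippers holding a cloth edge, and the rest length, strain limit, and step limit are invented thresholds rather than the paper's filter criteria.

```python
import random

REST_LENGTH = 0.40  # metres between grasped corners at rest (assumed)
MAX_STRAIN = 0.10   # reject if the cloth edge is stretched more than 10% (assumed)
MAX_STEP = 0.05     # max change in gripper separation per timestep, metres (assumed)


def sample_candidate(rng, steps=20):
    """Stand-in for a learned sampler: a random walk in gripper separation."""
    sep = REST_LENGTH * 0.5
    traj = [sep]
    for _ in range(steps):
        sep = max(0.0, sep + rng.uniform(-0.04, 0.055))
        traj.append(sep)
    return traj


def is_plausible(traj):
    """Keep a trajectory only if it never overstretches the cloth or jumps too fast."""
    within_strain = all(s <= REST_LENGTH * (1 + MAX_STRAIN) for s in traj)
    smooth = all(abs(b - a) <= MAX_STEP for a, b in zip(traj, traj[1:]))
    return within_strain and smooth


def generate_filtered(n, seed=0):
    """Sample n candidates and return (all candidates, those that pass the filter)."""
    rng = random.Random(seed)
    candidates = [sample_candidate(rng) for _ in range(n)]
    kept = [t for t in candidates if is_plausible(t)]
    return candidates, kept


candidates, kept = generate_filtered(1000)
print(f"kept {len(kept)} of {len(candidates)} candidates")
```

Tightening MAX_STRAIN or MAX_STEP makes the filter more conservative at the cost of discarding more samples, which is the trade-off the open question points at: a loose filter admits physically spurious data, while a strict one shrinks the diversity the generator was meant to provide.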