FDU千寻 Moz1 Clothes Folding: Solution Design and Competition Experience

Leonardo, Team Lead, 2026-01-21 10:06:04

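Our core goal in the Moz1 clothes-folding task was not to solve generalization to "arbitrary garments" in one shot, but to first solve folding itself well enough. We defined "well" in a deliberately engineering-oriented way: start from a single garment, set generalization aside for the time being, and achieve the highest possible success rate in real-world conditions. With that goal, we chose to start from a single garment and ramp up step by step, rather than design a highly complex unified solution from the outset.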

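In designing the solution, we aligned with the three levels of the competition setup (a single T-shirt, multiple garment types, and garments in arbitrary configurations) and built capabilities level by level, with VLA (Vision-Language-Action) based folding as the core base model. Early on we focused on controllable tasks: we collected real-world data and folding demonstrations for T-shirts and long-sleeved garments in a spread-out state, building a closed data loop for folding that gives later model training and on-robot validation a stable foundation. In parallel, we explored generating and modeling 3D garment meshes and introduced wrinkle detection to characterize garment state, which provides a more explicit structural signal for judging whether a garment is foldable.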
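As a rough illustration of how a wrinkle signal could feed a foldability judgment, the sketch below scores a depth map by its mean absolute discrete Laplacian, which is zero for a flat or uniformly tilted surface and grows with local surface variation. This is not the detector used in the competition system: the function names, the depth-map input format, and the threshold value are all assumptions made for the example.

```python
from typing import List

DepthMap = List[List[float]]  # hypothetical input: row-major depth values in metres


def wrinkle_score(depth: DepthMap) -> float:
    """Mean absolute 4-neighbour Laplacian over the interior pixels.

    A plane (flat or tilted) scores ~0; ridges and creases raise the score.
    """
    rows, cols = len(depth), len(depth[0])
    if rows < 3 or cols < 3:
        raise ValueError("depth map must be at least 3x3")
    total = 0.0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            laplacian = (
                depth[r - 1][c] + depth[r + 1][c]
                + depth[r][c - 1] + depth[r][c + 1]
                - 4.0 * depth[r][c]
            )
            total += abs(laplacian)
    return total / ((rows - 2) * (cols - 2))


def is_foldable(depth: DepthMap, threshold: float = 0.01) -> bool:
    """Assumed gate: treat the garment as foldable when it is smooth enough."""
    return wrinkle_score(depth) < threshold


if __name__ == "__main__":
    flat = [[0.5] * 5 for _ in range(5)]
    ridged = [[0.05 if c % 2 else 0.0 for c in range(5)] for _ in range(5)]
    print(wrinkle_score(flat), is_foldable(flat))
    print(wrinkle_score(ridged), is_foldable(ridged))
```

In practice the threshold would have to be calibrated against real sensor noise and garment thickness; the value here is only a placeholder.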

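As experiments progressed, we found that the main failures in the folding task often do not occur in the folding action itself, but stem from an uncontrolled initial garment state, severe self-occlusion, or excessive wrinkles. We therefore considered an affordance-based garment flatten approach: starting from a garment in an arbitrary configuration, identify the manipulable regions and pull the garment out into a standard, foldable state. Within the limited competition period, however, we have so far completed only preliminary validation of garment wrinkle detection and of folding a single flattened garment; we hope to validate the garment flatten trigger logic in future work.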
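One way such a flatten trigger could be structured is sketched below: fold only when the garment looks smooth and mostly visible, otherwise attempt another flatten, and give up after a bounded number of attempts. This is a hypothetical illustration of logic that has not yet been validated; the `GarmentState` fields, the thresholds, and the attempt limit are all assumed for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FLATTEN = "flatten"
    FOLD = "fold"
    ABORT = "abort"


@dataclass
class GarmentState:
    # Both fields are hypothetical perception outputs.
    wrinkle_score: float   # higher means more wrinkled
    visible_ratio: float   # fraction of the garment visible (1.0 = no self-occlusion)


def next_action(
    state: GarmentState,
    flatten_attempts: int,
    max_wrinkle: float = 0.02,   # assumed threshold
    min_visible: float = 0.9,    # assumed threshold
    max_attempts: int = 5,       # assumed retry budget
) -> Action:
    """Decide whether to fold now, flatten again, or stop trying."""
    ready = state.wrinkle_score <= max_wrinkle and state.visible_ratio >= min_visible
    if ready:
        return Action.FOLD
    if flatten_attempts >= max_attempts:
        return Action.ABORT
    return Action.FLATTEN


if __name__ == "__main__":
    print(next_action(GarmentState(0.01, 0.95), flatten_attempts=0))
    print(next_action(GarmentState(0.05, 0.60), flatten_attempts=1))
    print(next_action(GarmentState(0.05, 0.60), flatten_attempts=5))
```

Checking readiness before the retry budget means a garment that becomes foldable on the last attempt is still folded rather than abandoned.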

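At the system implementation level, we collected data on short-sleeved T-shirts in three colors, trained and tested an XVLA folding model, and combined it with GR00T N1.5 and Motus to improve action generation and control stability. Under the current setup, the system achieves a relatively high success rate on the single-garment task, which also shows that the route of "first converging on one reliable sub-problem" is feasible.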

We fine-tuned the Motus model on T-shirt folding, tested it, and open-sourced the checkpoints: https://huggingface.co/Star-UU-Wang/motus_moz1
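A minimal sketch for fetching the released checkpoints, assuming the third-party `huggingface_hub` package is installed and that the link above points to a standard model repository. The helper names `repo_id_from_url` and `download_checkpoint` are introduced here for illustration only; loading the files into Motus is not shown because that depends on the Motus codebase.

```python
from typing import Optional
from urllib.parse import urlparse

CHECKPOINT_URL = "https://huggingface.co/Star-UU-Wang/motus_moz1"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repo id from a Hugging Face model URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url}")
    return "/".join(parts[:2])


def download_checkpoint(url: str = CHECKPOINT_URL,
                        local_dir: Optional[str] = None) -> str:
    """Download every file in the repo and return the local folder path."""
    # Imported lazily so the URL helper works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)


if __name__ == "__main__":
    print(repo_id_from_url(CHECKPOINT_URL))
    # Uncomment to download (requires network access):
    # print(download_checkpoint(local_dir="./motus_moz1"))
```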

Below are some of the papers and technical approaches we collected and discussed during our research:
Methods Specific to Cloth Manipulation

Project	Code?	Venue/Year	Data	Sim/Real	Overview	Category
BiFold: Bimanual Cloth Folding with Language Guidance https://arxiv.org/pdf/2505.07600	https://github.com/Barbany/bifold	ICRA 2025	VR-Folding	Sim	Language-guided folding using ViT, CLOTH3D rendering	VLA
General-purpose Clothes Manipulation with Semantic Keypoints		ICRA 2025	SoftGym, CLOTH3D	Sim	Keypoints + LLM planning + action primitives (folding included)	VLA
APS-Net https://arxiv.org/abs/2506.22769		arXiv 2025	Collected demos	Real	Standardization + folding pipeline with reward shaping	VA
FoldNet https://arxiv.org/pdf/2505.09109		arXiv 2025	Synthetic demonstrations	Sim	Keypoint-driven folding policy from templates	VA
MetaFold		2025.03	MetaFold Dataset, DiffClothAI	Sim	LLM + CVAE for trajectory generation between keypoints	VLA
SSFold https://arxiv.org/abs/2411.02608		arXiv 2024	Human demo (sim to real)	Real	Graph dynamics + folding generalization	VA
UniFolding		CoRL 2023		Real	Sample-efficient, scalable folding	VA
Foldsformer https://arxiv.org/abs/2301.03003		arXiv 2023		Sim → Real	Space-time transformer for multi-step folding	VLA
Dual-arm Hem Folding		ROBOMECH 2023		Real	Four-step hem folding with real clothes	VA
SpeedFolding https://arxiv.org/abs/2208.10552		IROS 2022	Collected from 4300+ actions	Real	Efficient dual-arm folding pipeline	VA
Keypoints from Synthetic Data for Cloth Folding https://arxiv.org/abs/2205.06714		ICRA Workshop 2022	Synthetic	Real	CNN keypoint detector for towel folding	VA
1hr Real RL Fabric Folding https://proceedings.mlr.press/v155/lee21a/lee21a.pdf		CoRL 2020 (PMLR v155)		Real	Self-supervised RL in 1hr for goal-conditioned folding	RL
Gravity-Based Robotic Cloth Folding		2010		Real	Classic geometric g-fold algorithm	VA
FabricFolding: Learning Efficient Fabric Folding without Expert Demonstrations https://arxiv.org/abs/2303.06587		Robotica 2024	Fabric Keypoint Dataset (~1800 RGB-D images)	Real	Dual-stage: unfold with hybrid actions, then fold using keypoint heuristic; 88–92% success; no expert demos
GarmNet: Improving Global with Local Perception for Robotic Laundry Folding https://arxiv.org/abs/1907.00408		2019	RGB-D	Real	Landmark detection and global context model for folding	VA
Diffusion Dynamics Models with Generative State Estimation for Cloth Manipulation https://arxiv.org/abs/2403.00213		2025.03		Sim	Diffusion model predicts cloth dynamics; includes folding tasks; generative state estimator	VA
Dynamic Cloth Folding Using Curriculum Learning https://www.researchgate.net/publication/378427985		2023		Sim	Trains folding agents using curriculum RL; handles long-sleeved garments in simulation	RL
π₀: A Vision-Language-Action Flow Model for General Robot Control https://arxiv.org/abs/2410.24164		2024.10	OXE + 7 robots	Real	Flow-matching + PaliGemma VLM; folding included in task suite	VLA
π₀.₅: Open-world Generalization for Vision-Language-Action Robots https://arxiv.org/abs/2504.16054		2025.05	Multi-robot + web data	Real	Successor to π₀; general-purpose VLA model including household tasks	VLA
Learning Visual Feedback Control for Dynamic Cloth Folding (IROS 2022) https://arxiv.org/abs/2109.04771	https://github.com/hietalajulius/dynamic-cloth-folding	2021.09	Sim + Real (Franka, D435)	Real	RL-based visual feedback control; trained in sim, transferred to real; dynamic square cloth folding	RL
AdaFold: Adapting Folding Trajectories via Feedback-loop Manipulation https://arxiv.org/abs/2403.06210	https://github.com/albiLo17/Adafold	2024.03	Sim + Real	Real	MPC controller adapts cloth folding; uses semantic features & point cloud feedback; generalizes to new cloths	VA