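Our core goal in the Moz1 clothes-folding task is not to solve generalization to "arbitrary garments" in one shot, but to first solve folding itself well enough. We define "well" in a deliberately engineering-minded way: start from a single garment, set generalization aside for now, and achieve the highest practical success rate we can in real-world settings. With that goal, we chose to start from a single garment and climb step by step, rather than designing a highly complex unified solution from the outset.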
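On the design side, we aligned with the competition's three levels (a single T-shirt, multiple garment types, garments in arbitrary configurations) and built capabilities level by level, with VLA (Vision-Language-Action) based folding as the core base model. Early on we focused on controllable tasks: we collected real-world data and folding demonstrations for T-shirts and long-sleeved garments in the unfolded state, building a closed folding-data loop that gives later model training and on-robot validation a stable foundation. In parallel, we explored generating and modeling 3D garment meshes and introduced wrinkle detection to characterize garment state, providing a more explicit structural signal for judging whether a garment is foldable.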
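To make the idea of a "structural signal" for foldability concrete, here is a minimal sketch of one possible wrinkle score. It is an illustration only, not the detector used in this project: it assumes the garment surface is available as a small height map (a hypothetical `height` grid, for example derived from a depth image) and scores it by the mean absolute height difference between neighboring cells.

```python
from typing import List


def wrinkle_score(height: List[List[float]]) -> float:
    """Mean absolute height difference between 4-connected neighbours.

    `height` is a hypothetical height map of the garment surface.
    A perfectly flat garment scores 0.0; ridges and folds raise the score.
    """
    if not height or not height[0]:
        return 0.0
    rows, cols = len(height), len(height[0])
    total, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbour
                total += abs(height[r][c + 1] - height[r][c])
                count += 1
            if r + 1 < rows:  # neighbour below
                total += abs(height[r + 1][c] - height[r][c])
                count += 1
    return total / count if count else 0.0


# Toy inputs: a flat patch and a patch with vertical ridges.
flat = [[0.0] * 4 for _ in range(4)]
ridged = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]

print(wrinkle_score(flat), wrinkle_score(ridged))
```

A learned detector would replace this hand-written score in practice; the point is only that a scalar like this can be compared against a threshold when deciding whether a garment is ready to fold.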
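As experiments progressed, we found that the main failures in clothes folding often do not occur in the folding motion itself, but stem from an uncontrolled initial garment state, severe self-occlusion, or excessive wrinkles. We therefore considered an affordance-based garment flatten approach: starting from a garment in an arbitrary configuration, identify operable regions and pull the garment out into a standard, foldable state. Within the limited competition period, however, we have so far completed only preliminary validation of garment wrinkle detection and of folding a single flattened garment; we hope to further validate the trigger logic for garment flattening in future work.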
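The flatten trigger logic has not been validated yet, so the following is only a sketch of what such a gate could look like. All names and thresholds (`GarmentState`, `wrinkle_score`, `visible_ratio`, `max_wrinkle`, `min_visible`, `max_attempts`) are assumptions made for illustration and do not come from the actual system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Action(Enum):
    FOLD = "fold"
    FLATTEN = "flatten"


@dataclass(frozen=True)
class GarmentState:
    wrinkle_score: float  # hypothetical detector output, higher = more wrinkled
    visible_ratio: float  # hypothetical fraction of the garment not self-occluded


def choose_action(state: GarmentState,
                  max_wrinkle: float = 0.2,
                  min_visible: float = 0.8) -> Action:
    """Flatten first if the garment is too wrinkled or too occluded."""
    if state.wrinkle_score > max_wrinkle or state.visible_ratio < min_visible:
        return Action.FLATTEN
    return Action.FOLD


def plan(states: Iterable[GarmentState], max_attempts: int = 3) -> List[Action]:
    """Issue one action per observed state until folding starts or attempts run out."""
    actions: List[Action] = []
    for state in states:
        action = choose_action(state)
        actions.append(action)
        if action is Action.FOLD or len(actions) >= max_attempts:
            break
    return actions


observed = [GarmentState(0.5, 0.6), GarmentState(0.1, 0.9)]
print([a.value for a in plan(observed)])
```

Capping the number of flatten attempts keeps the robot from looping forever on a garment it cannot bring into a foldable state.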
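At the system level, we collected data for short-sleeved T-shirts in three colors, trained and tested an XVLA folding model, and combined it with GR00T N1.5 and Motus to improve action generation and control stability. Under the current setup, the system achieves a high success rate on the single-garment task, which also supports the viability of the "converge on one reliable sub-problem first" route.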
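Since the target metric is real-world success rate, it can help to report it with an uncertainty estimate when the number of on-robot trials is small. The sketch below computes a Wilson score interval; the trial counts in the usage line are hypothetical placeholders, not measured results from this project.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 18 successful folds out of 20 attempts.
low, high = wilson_interval(18, 20)
print(f"success rate 0.90, 95% interval [{low:.2f}, {high:.2f}]")
```

With only a few dozen trials the interval stays wide, which is a useful reminder when comparing checkpoints on the real robot.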
We also fine-tuned the Motus model on T-shirt folding, tested it, and open-sourced the checkpoint: https://huggingface.co/Star-UU-Wang/motus_moz1
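For readers who want to fetch the checkpoint programmatically, here is a minimal sketch using `snapshot_download` from the `huggingface_hub` package. It assumes the link points to a regular model repository and that `huggingface_hub` is installed; the helper names `repo_id_from_url` and `download_checkpoint` are made up for this example. How the files are then loaded into Motus is not covered here, since that depends on the Motus codebase.

```python
from typing import Optional
from urllib.parse import urlparse

CHECKPOINT_URL = "https://huggingface.co/Star-UU-Wang/motus_moz1"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repo id from a Hugging Face model URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url}")
    return "/".join(parts[:2])


def download_checkpoint(url: str = CHECKPOINT_URL,
                        local_dir: Optional[str] = None) -> str:
    """Download all files of the repo and return the local folder path."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)


print(repo_id_from_url(CHECKPOINT_URL))
# To actually download (requires network access):
# path = download_checkpoint(local_dir="./motus_moz1")
```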
Below are some of the papers and technical approaches we collected and discussed during our research:
Manipulation Specific for Cloth Manipulation

| Project | Code? | Venue/Year | Data | Sim/Real | Overview | Category | Contributions & Limitations/Challenge |
|---|---|---|---|---|---|---|---|
| BiFold: Bimanual Cloth Folding with Language Guidance https://arxiv.org/pdf/2505.07600 | https://github.com/Barbany/bifold | ICRA 2025 | VR-Folding | Sim | Language-guided folding using ViT, CLOTH3D rendering | VLA | |
| General-purpose Clothes Manipulation with Semantic Keypoints | | ICRA 2025 | SoftGym, CLOTH3D | Sim | Keypoints + LLM planning + action primitives (folding included) | VLA | |
| APS-Net https://arxiv.org/abs/2506.22769 | | arXiv 2025 | Collected demos | Real | Standardization + folding pipeline with reward shaping | VA | |
| FoldNet https://arxiv.org/pdf/2505.09109 | | arXiv 2025 | Synthetic demonstrations | Sim | Keypoint-driven folding policy from templates | VA | |
| MetaFold | | 2025.03 | MetaFold Dataset, DiffClothAI | Sim | LLM + CVAE for trajectory generation between keypoints | VLA | |
| SSFold https://arxiv.org/abs/2411.02608 | | arXiv 2024 | Human demo (sim to real) | Real | Graph dynamics + folding generalization | VA | |
| UniFolding | | CoRL 2023 | | Real | Sample-efficient, scalable folding | VA | |
| Foldsformer https://arxiv.org/abs/2301.03003 | | arXiv 2023 | | Sim → Real | Space-time transformer for multi-step folding | VLA | |
| Dual-arm Hem Folding | | ROBOMECH 2023 | | Real | Four-step hem folding with real clothes | VA | |
| SpeedFolding https://arxiv.org/abs/2208.10552 | | IROS 2022 | Collected from 4300+ actions | Real | Efficient dual-arm folding pipeline | VA | |
| Keypoints from Synthetic Data for Cloth Folding https://arxiv.org/abs/2205.06714 | | ICRA Workshop 2022 | Synthetic | Real | CNN keypoint detector for towel folding | VA | |
| 1hr Real RL Fabric Folding https://proceedings.mlr.press/v155/lee21a/lee21a.pdf | | CoRL 2020 (PMLR v155) | | Real | Self-supervised RL in 1hr for goal-conditioned folding | RL | |
| Gravity-Based Robotic Cloth Folding | | 2010 | | Real | Classic geometric g-fold algorithm | VA | |
| FabricFolding: Learning Efficient Fabric Folding without Expert Demonstrations https://arxiv.org/abs/2303.06587 | | Robotica 2024 | Fabric Keypoint Dataset (~1800 RGB-D images) | Real | Dual-stage: unfold with hybrid actions, then fold using keypoint heuristic; 88–92% success; no expert demos | | |
| GarmNet: Improving Global with Local Perception for Robotic Laundry Folding https://arxiv.org/abs/1907.00408 | | 2019 | RGB-D | Real | Landmark detection and global context model for folding | VA | |
| Diffusion Dynamics Models with Generative State Estimation for Cloth Manipulation https://arxiv.org/abs/2403.00213 | | 2025.03 | | Sim | Diffusion model predicts cloth dynamics; includes folding tasks; generative state estimator | VA | |
| Dynamic Cloth Folding Using Curriculum Learning https://www.researchgate.net/publication/378427985 | | 2023 | | Sim | Trains folding agents using curriculum RL; handles long-sleeved garments in simulation | RL | |
| π₀: A Vision-Language-Action Flow Model for General Robot Control https://arxiv.org/abs/2410.24164 | | 2024.10 | OXE + 7 robots | Real | Flow-matching + PaliGemma VLM; folding included in task suite | VLA | |
| π₀.₅: Open-world Generalization for Vision-Language-Action Robots https://arxiv.org/abs/2504.16054 | | 2025.05 | Multi-robot + web data | Real | Successor to π₀; general-purpose VLA model including household tasks | VLA | |
| Learning Visual Feedback Control for Dynamic Cloth Folding (IROS 2022) https://arxiv.org/abs/2109.04771 | https://github.com/hietalajulius/dynamic-cloth-folding | 2021.09 | Sim + Real (Franka, D435) | Real | RL-based visual feedback control; trained in sim, transferred to real; dynamic square cloth folding | RL | |
| AdaFold: Adapting Folding Trajectories via Feedback-loop Manipulation https://arxiv.org/abs/2403.06210 | https://github.com/albiLo17/Adafold | 2024.03 | Sim + Real | Real | MPC controller adapts cloth folding; uses semantic features & point cloud feedback; generalizes to new cloths | VA | |

Finally, we look forward to seeing all the teams in Dongguan. Embodied intelligence is only possible with you!