Abstract
Finding dense correspondences between 3D shapes remains a fundamental open problem, especially in real-world settings. Such settings pose severe challenges: limited time and too few samples for training, uncurated and extremely high-resolution data with topological distortions, and the need to handle diverse 3D representations. We present ATM, a zero-shot framework that requires no correspondence-specific training and robustly addresses these issues through an articulate-then-match paradigm. Rather than relying on intrinsic geometric properties, ATM uses pretrained vision foundation models and parametric shape priors to estimate parametric shape models from multi-view renderings, then grounds these estimates via multi-view geometric consistency. By mapping diverse inputs into a shared canonical parametric space, ATM establishes robust coarse correspondences that bypass topological noise, then turns them into precise dense mappings via spectral refinement.
Highlights
Zero-shot
No correspondence-specific training is required. The method instead relies on parametric reconstructions optimized at test time.
Robust to uncurated scans
ATM handles raw scans with severe topological noise and vertex counts up to 200k per shape.
Representation flexible
The articulate-then-match strategy supports meshes, point clouds, and 3D Gaussian splatting inputs.
Method Overview
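The matching step of the articulate-then-match idea can be illustrated with a minimal sketch. It assumes the articulation step has already produced, for each input, a fitted copy of the same parametric template with a shared vertex ordering; the function names `nearest_index` and `coarse_match_via_template` are hypothetical and are not taken from the ATM code. The brute-force nearest-neighbor search is only for illustration and would need a spatial index for scans with hundreds of thousands of points.

```python
import numpy as np

def nearest_index(queries, points):
    """Index of the nearest row of `points` for every row of `queries` (brute force)."""
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def coarse_match_via_template(src_pts, src_fit, tgt_pts, tgt_fit):
    """Coarse source-to-target correspondences through a shared template.

    src_pts, tgt_pts: (N, 3) and (M, 3) input points (mesh vertices, point
        cloud samples, or Gaussian centers).
    src_fit, tgt_fit: (V, 3) vertices of the same parametric template fitted
        to each input; row v is the same template vertex in both arrays
        (assumption of this sketch).
    Returns an (N,) array of indices into tgt_pts.
    """
    # Source point -> nearest vertex of the template fitted to the source.
    src_to_tmpl = nearest_index(src_pts, src_fit)
    # Template vertex -> nearest target point, via the template fitted to the target.
    tmpl_to_tgt = nearest_index(tgt_fit, tgt_pts)
    # Compose the two maps through the shared template vertex index.
    return tmpl_to_tgt[src_to_tmpl]
```

Because both maps go through the template rather than through the surface connectivity of the inputs, topological artifacts in a raw scan do not affect this coarse step. The refinement into a dense mapping is not covered by this sketch.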
Qualitative Results
Benchmark Comparisons
All comparisons report the mean geodesic error (×100). Lower is better; OOM indicates an out-of-memory failure.
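For readers unfamiliar with the metric, the sketch below shows one common way to compute a mean geodesic error scaled by 100. It assumes the widely used protocol in which each error is the geodesic distance on the target shape between the predicted and ground-truth points, normalized by the square root of the target surface area; the page does not state the exact normalization, so treat that as an assumption. The function name and arguments are hypothetical.

```python
import numpy as np

def mean_geodesic_error_x100(geo_dist, pred, gt, area):
    """Mean normalized geodesic error, multiplied by 100.

    geo_dist: (n, n) pairwise geodesic distances on the target shape.
    pred:     (m,) predicted target vertex index for each source vertex.
    gt:       (m,) ground-truth target vertex index for each source vertex.
    area:     total surface area of the target shape (assumed normalizer).
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    # Distance on the target between where each point landed and where it should be.
    errors = geo_dist[pred, gt] / np.sqrt(area)
    return 100.0 * float(errors.mean())
```

Under this convention, a table entry of 2.4 corresponds to an average error of 2.4% of the square root of the surface area.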
Non-Isometric Shape Matching
SMAL evaluates animal shapes; TOPKIDS evaluates humans under strong non-isometric variation.
| Method | SMAL | TOPKIDS |
|---|---|---|
| Axiomatic Methods | | |
| ZoomOut | 38.4 | 33.7 |
| Smooth Shells | 36.1 | 11.8 |
| DiscreteOp | 38.1 | 35.5 |
| Functional Map Methods | | |
| UnsupFMNet | - | 38.5 |
| SURFMNet | - | 48.6 |
| AttentiveFMaps | 5.4 | 23.4 |
| URSSM | 6.0 | 8.9 |
| Synchronous Diff. | 3.6 | 5.4 |
| Semantic Methods | | |
| Diff3F | 28.4 | 31.0 |
| DenseMatcher | 4.7 | 6.2 |
| Template-Based Methods | | |
| ATM (ours) | 3.8 | 2.4 |
Near-Isometric Shape Matching
FAUST, SCAPE, and SHREC19 are traditional remeshed human benchmarks on which intrinsic spectral methods are typically strong.
| Method | FAUST | SCAPE | SHREC19 |
|---|---|---|---|
| Axiomatic Methods | | | |
| BCICP | 6.4 | 11.0 | 8.0 |
| ZoomOut | 6.1 | 7.5 | 7.8 |
| Smooth Shells | 2.5 | 4.7 | 7.6 |
| Functional Map Methods | | | |
| UnsupFMNet | 4.8 | 9.6 | 11.1 |
| SURFMNet | 2.5 | 6.0 | 4.8 |
| URSSM | 1.6 | 1.9 | 5.7 |
| Semantic Methods | | | |
| Diff3F | 20.7 | 22.1 | 26.3 |
| DenseMatcher | 1.6 | 2.0 | 3.1 |
| Template-Based Methods | | | |
| 3D-CODED | 2.5 | 9.8 | 7.7 |
| ATM (ours) | 1.3 | 1.7 | 3.1 |
High-Resolution Raw Scans
Raw FAUST scans contain topological artifacts and reach roughly 160k-200k vertices.
| Resolution (vertices) | ATM (ours) | ZoomOut | URSSM |
|---|---|---|---|
| 5k | 2.4 | 20.9 | 10.0 |
| 10k | 2.1 | 20.6 | 23.1 |
| 20k | 2.0 | 20.2 | 21.6 |
| 40k | 2.0 | 20.6 | OOM |
| 80k | 1.9 | OOM | OOM |
| 120k | 1.9 | OOM | OOM |
| Raw | 1.9 | OOM | OOM |
Citation
@misc{atm2027,
title = {Articulating then Matching: Zero-Shot Shape Matching for Uncurated Data},
author = {Liu, Qilong and Xiao, Qinfeng and Yi, Chenyuan and Zhang, Liying and Yick, Kit-lun},
note = {Coming soon}
}