Articulating then Matching

Abstract

Finding dense correspondences between 3D shapes remains a fundamental open problem, especially on real-world data. Such data poses several difficulties: little time and too few samples for training, uncurated and extremely high-resolution scans with topological distortions, and the need to handle diverse 3D representations. We present ATM, a zero-shot framework that requires no correspondence-specific training and robustly addresses these issues through an articulate-then-match paradigm. Rather than relying on intrinsic geometric properties, ATM uses pretrained vision foundation models and parametric shape priors to estimate parametric shape models from multi-view renderings, then grounds these estimates via multi-view geometric consistency. By mapping diverse inputs into a shared canonical parametric space, ATM establishes robust coarse correspondences that bypass topological noise, then refines them into precise dense mappings via spectral refinement.

Highlights

Zero-shot

No correspondence-specific training is required. The method relies on parametric reconstructions that are optimized at test time.

Robust to uncurated scans

ATM handles raw scans with severe topological noise and vertex counts up to 200k per shape.

Representation flexible

The articulate-then-match strategy supports meshes, point clouds, and 3D Gaussian splatting inputs.
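
Matching through a shared template needs only point positions, which is one way to read the representation flexibility above. The sketch below illustrates that idea under assumptions and is not ATM's implementation: `fit_a` and `fit_b` are hypothetical arrays holding the same parametric template posed to fit each input, and the function names are invented for this example. Brute-force search keeps it short; scans with hundreds of thousands of vertices would need a spatial index instead.

```python
import numpy as np


def nearest_index(queries, points):
    """For each row of `queries`, return the index of the closest row of `points`."""
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def coarse_match(verts_a, fit_a, verts_b, fit_b):
    """Map each point of shape A to a point of shape B through a shared template.

    Assumption (not from the source): fit_a and fit_b are the same template
    posed to A and B, so row t in both arrays is the same template vertex.
    """
    template_ids = nearest_index(verts_a, fit_a)  # point on A -> template vertex
    landing = fit_b[template_ids]                 # same template vertex, posed on B
    return nearest_index(landing, verts_b)        # template position -> point on B
```

Because only `(n, 3)` position arrays are involved, the same call would apply to mesh vertices, point clouds, or the centers of 3D Gaussians.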

Method Overview

Figure: ATM framework. ATM maps diverse 3D inputs into a shared parametric space before dense correspondence refinement.
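
The page does not say which spectral refinement ATM applies after the coarse stage. As one common option, the sketch below shows a ZoomOut-style iteration that alternates between a functional map and a vertex map while the spectral basis grows. Everything here is an assumption made for illustration: the function name, the parameters `k_start`, `k_end`, and `step`, and the use of an unweighted pseudo-inverse in place of area-weighted projection.

```python
import numpy as np


def spectral_refine(phi_a, phi_b, a_to_b, k_start=5, k_end=20, step=1):
    """ZoomOut-style refinement of a vertex map from shape A to shape B.

    phi_a:  (n_a, K) basis functions on A, low frequencies first.
    phi_b:  (n_b, K) basis functions on B, low frequencies first.
    a_to_b: (n_a,) initial map; a_to_b[i] is the vertex of B matched to vertex i of A.
    """
    a_to_b = np.asarray(a_to_b)
    for k in range(k_start, k_end + 1, step):
        # Functional map (k x k) carrying spectral coefficients on B to A,
        # fitted by least squares from the current vertex map.
        c = np.linalg.pinv(phi_a[:, :k]) @ phi_b[a_to_b, :k]
        # Convert back to a vertex map: nearest neighbour between the
        # spectral embedding of A and the aligned embedding of B.
        emb_a = phi_a[:, :k]
        emb_b = phi_b[:, :k] @ c.T
        d2 = ((emb_a[:, None, :] - emb_b[None, :, :]) ** 2).sum(axis=-1)
        a_to_b = d2.argmin(axis=1)
    return a_to_b
```

In practice `phi_a` and `phi_b` would be Laplace-Beltrami eigenfunctions of the two shapes, and the dense distance matrix would be replaced by a nearest-neighbour index at high resolution.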

Qualitative Results

Figure: Texture transfer on curated public datasets, including TOPKIDS, SMAL, FAUST, and SCAPE.

Figure: Texture transfer on the original raw FAUST scans at increasing vertex counts.

Benchmark Comparisons

All comparisons report mean geodesic error (×100). Lower is better; OOM indicates an out-of-memory failure.
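
For readers unfamiliar with the metric, the sketch below shows how a mean geodesic error of this kind is commonly computed. It is an illustration, not the authors' evaluation code: the function name is hypothetical, and the geodesic distance matrix is assumed to be precomputed and already normalized (a frequent choice is dividing by the square root of the surface area).

```python
import numpy as np


def mean_geodesic_error(geo_dist, pred, gt, scale=100.0):
    """Mean geodesic distance on the target between predicted and true matches.

    geo_dist: (n, n) geodesic distances on the target shape, assumed normalized.
    pred, gt: (m,) predicted and ground-truth target vertex indices.
    scale:    multiplier applied to the mean (100 to match the tables).
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    return scale * float(geo_dist[pred, gt].mean())


# Toy target with three vertices on a line, spaced 0.1 apart.
geo = np.array([[0.0, 0.1, 0.2],
                [0.1, 0.0, 0.1],
                [0.2, 0.1, 0.0]])
print(mean_geodesic_error(geo, pred=[0, 1, 2], gt=[0, 2, 2]))
```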

Non-Isometric Shape Matching

SMAL tests animal shapes; TOPKIDS tests human shapes under strong non-isometric variation.

Method	SMAL	TOPKIDS
Axiomatic Methods
ZoomOut	38.4	33.7
Smooth Shells	36.1	11.8
DiscreteOp	38.1	35.5
Functional Map Methods
UnsupFMNet	-	38.5
SURFMNet	-	48.6
AttentiveFMaps	5.4	23.4
URSSM	6.0	8.9
Synchronous Diff.	3.6	5.4
Semantic Methods
Diff3F	28.4	31.0
DenseMatcher	4.7	6.2
Template-Based Methods
ATM (ours)	3.8	2.4

Near-Isometric Shape Matching

These are traditional remeshed human benchmarks, where intrinsic spectral methods are typically strong.

Method	FAUST	SCAPE	SHREC19
Axiomatic Methods
BCICP	6.4	11.0	8.0
ZoomOut	6.1	7.5	7.8
Smooth Shells	2.5	4.7	7.6
Functional Map Methods
UnsupFMNet	4.8	9.6	11.1
SURFMNet	2.5	6.0	4.8
URSSM	1.6	1.9	5.7
Semantic Methods
Diff3F	20.7	22.1	26.3
DenseMatcher	1.6	2.0	3.1
Template-Based Methods
3D-CODED	2.5	9.8	7.7
ATM (ours)	1.3	1.7	3.1

High-Resolution Raw Scans

Raw FAUST scans contain topological artifacts and have roughly 160k to 200k vertices each.

Resolution	ATM (ours)	ZoomOut	URSSM
5k	2.4	20.9	10.0
10k	2.1	20.6	23.1
20k	2.0	20.2	21.6
40k	2.0	20.6	OOM
80k	1.9	OOM	OOM
120k	1.9	OOM	OOM
Raw	1.9	OOM	OOM

Citation

@misc{atm2027,
  title  = {Articulating then Matching: Zero-Shot Shape Matching for Uncurated Data},
  author = {Liu, Qilong and Xiao, Qinfeng and Yi, Chenyuan and Zhang, Liying and Yick, Kit-lun},
  note   = {Coming soon}
}