Articulating then Matching: Zero-Shot Shape Matching for Uncurated Data

Qilong Liu*, Qinfeng Xiao*, Chenyuan Yi, Liying Zhang, Kit-lun Yick

1 Hong Kong Polytechnic University, HK SAR
2 Artificial Intelligence in Design, HK SAR

*Equal contribution. Corresponding author.

Code Coming Soon

Figure: Overview of ATM for zero-shot dense shape matching. ATM grounds 2D vision foundation models with parametric shape priors, then refines dense correspondences across challenging 3D representations.

Abstract

Finding dense correspondences between 3D shapes is a fundamental yet unresolved problem, especially in real-world settings. These settings pose severe challenges: little time and too few samples for training, uncurated and extremely high-resolution data with topological distortions, and the need to handle diverse 3D representations. We present ATM, a zero-shot framework that requires no correspondence-specific training and robustly addresses these issues through an articulate-then-match paradigm. Rather than relying on intrinsic geometric properties, ATM uses pretrained vision foundation models and parametric shape priors to estimate parametric shape models from multi-view renderings, then grounds these estimates via multi-view geometric consistency. By mapping diverse inputs into a shared canonical parametric space, ATM establishes robust coarse correspondences that bypass topological noise, then refines them into precise dense maps via spectral refinement.
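The coarse stage, matching through a shared canonical parametric space, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each input shape has already been fitted with the same parametric template, so that the two fits (`fit_a`, `fit_b`) share one vertex ordering; the function names and the brute-force nearest-neighbor search are hypothetical choices made for illustration.

```python
import numpy as np


def nearest_rows(queries, points):
    """Index of the closest row of `points` for each row of `queries`.

    Brute force, so only suitable for small inputs.
    """
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def coarse_match_via_template(verts_a, fit_a, verts_b, fit_b):
    """Map each vertex of shape A to a vertex of shape B through a shared template.

    verts_a: (Na, 3) vertices (or points) of input shape A
    fit_a:   (T, 3) template vertices posed to fit shape A
    verts_b: (Nb, 3) vertices (or points) of input shape B
    fit_b:   (T, 3) template vertices posed to fit shape B, same ordering as fit_a
    Returns an (Na,) integer array of indices into verts_b.
    """
    # Step 1: attach every A vertex to its nearest template vertex on A's fit.
    a_to_template = nearest_rows(verts_a, fit_a)
    # Step 2: the template index is shared, so look up that vertex on B's fit.
    anchors_on_b = fit_b[a_to_template]
    # Step 3: attach each anchor to the nearest vertex of the raw shape B.
    return nearest_rows(anchors_on_b, verts_b)
```

Because the map passes only through template vertex indices, it never relies on the connectivity of the inputs, which is one way to read the claim that coarse correspondences bypass topological noise. For scans with many vertices, a KD-tree would be a more practical neighbor search than the dense distance matrix used here.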

Highlights

Zero-shot

No correspondence-specific training is required. The method relies on parametric reconstructions optimized at test time.

Robust to uncurated scans

ATM handles raw scans with severe topological noise and vertex counts up to 200k per shape.

Representation flexible

The articulate-then-match strategy supports meshes, point clouds, and 3D Gaussian splatting inputs.

Method Overview

Figure: ATM method framework. ATM maps diverse 3D inputs into a shared parametric space before dense correspondence refinement.
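The refinement stage is described here only as spectral refinement. As a hedged illustration of what such a step can look like, the sketch below uses a ZoomOut-style spectral upsampling loop, which is a well-known technique swapped in for illustration and not confirmed to be the procedure ATM uses. It assumes the Laplace-Beltrami eigenfunctions of both shapes are already computed (`phi_a`, `phi_b`), omits area weighting for brevity, and uses hypothetical function names.

```python
import numpy as np


def nearest_rows(queries, points):
    """Index of the closest row of `points` for each row of `queries` (brute force)."""
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def spectral_refine(phi_a, phi_b, a_to_b, k_start=4, k_step=2):
    """ZoomOut-style refinement of a vertex map from shape A to shape B.

    phi_a:  (Na, K) eigenfunctions of shape A, one column per basis function
    phi_b:  (Nb, K) eigenfunctions of shape B
    a_to_b: (Na,) initial map; a_to_b[i] is the B vertex matched to A vertex i
    Returns the refined (Na,) map.
    """
    k_max = min(phi_a.shape[1], phi_b.shape[1])
    a_to_b = np.asarray(a_to_b).copy()
    for k in range(k_start, k_max + 1, k_step):
        # Functional map: least-squares fit so that phi_a[:, :k] @ c ~ phi_b[a_to_b, :k].
        c, *_ = np.linalg.lstsq(phi_a[:, :k], phi_b[a_to_b, :k], rcond=None)
        # Convert back to a vertex map by nearest neighbor in the spectral embedding.
        a_to_b = nearest_rows(phi_a[:, :k] @ c, phi_b[:, :k])
    return a_to_b
```

Starting from a small basis and growing it step by step lets a coarse initial map constrain the low frequencies first, with finer detail added as more eigenfunctions are included. In a pipeline like the one described, the initial map would come from the coarse correspondences obtained in the shared parametric space.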

Qualitative Results

Figure: Texture transfer on curated public datasets, including TOPKIDS, SMAL, FAUST, and SCAPE.
Figure: Texture transfer on the original raw FAUST scans with increasing vertex counts.

Benchmark Comparisons

All comparisons report the mean geodesic error ×100. Lower is better; OOM indicates an out-of-memory failure.
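For readers unfamiliar with the metric, the sketch below shows one common way to compute it. It is an illustration under assumptions: the error of a source vertex is taken as the geodesic distance on the target shape between its predicted match and its ground-truth match, normalized by the square root of the target's surface area. That normalization is a widespread convention in the shape-matching literature but is not stated on this page, and the precomputed distance matrix `geo_dist` and the function name are hypothetical.

```python
import numpy as np


def mean_geodesic_error_x100(geo_dist, pred, gt, area):
    """Mean normalized geodesic error, scaled by 100.

    geo_dist: (Nb, Nb) pairwise geodesic distances on the target shape
    pred:     (Na,) predicted target vertex for each source vertex
    gt:       (Na,) ground-truth target vertex for each source vertex
    area:     total surface area of the target shape (assumed normalizer)
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    # Distance on the target surface between predicted and true matches.
    errors = geo_dist[pred, gt] / np.sqrt(area)
    return 100.0 * errors.mean()
```

A perfect map gives 0, and the value grows with the average distance between predicted and true matches on the target surface.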

Non-Isometric Shape Matching

SMAL evaluates animal shapes; TOPKIDS evaluates humans under strong non-isometric variation.

Method               SMAL    TOPKIDS
Axiomatic Methods
ZoomOut              38.4    33.7
Smooth Shells        36.1    11.8
DiscreteOp           38.1    35.5
Functional Map Methods
UnsupFMNet           -       38.5
SURFMNet             -       48.6
AttentiveFMaps       5.4     23.4
URSSM                6.0     8.9
Synchronous Diff.    3.6     5.4
Semantic Methods
Diff3F               28.4    31.0
DenseMatcher         4.7     6.2
Template-Based Methods
ATM (ours)           3.8     2.4

Near-Isometric Shape Matching

These are traditional remeshed human benchmarks on which intrinsic spectral methods are typically strong.

Method          FAUST    SCAPE    SHREC19
Axiomatic Methods
BCICP           6.4      11.0     8.0
ZoomOut         6.1      7.5      7.8
Smooth Shells   2.5      4.7      7.6
Functional Map Methods
UnsupFMNet      4.8      9.6      11.1
SURFMNet        2.5      6.0      4.8
URSSM           1.6      1.9      5.7
Semantic Methods
Diff3F          20.7     22.1     26.3
DenseMatcher    1.6      2.0      3.1
Template-Based Methods
3D-CODED        2.5      9.8      7.7
ATM (ours)      1.3      1.7      3.1

High-Resolution Raw Scans

Raw FAUST scans contain topological artifacts and reach roughly 160k-200k vertices.

Resolution    ATM (ours)    ZoomOut    URSSM
5k            2.4           20.9       10.0
10k           2.1           20.6       23.1
20k           2.0           20.2       21.6
40k           2.0           20.6       OOM
80k           1.9           OOM        OOM
120k          1.9           OOM        OOM
Raw           1.9           OOM        OOM

Citation

@misc{atm2027,
  title  = {Articulating then Matching: Zero-Shot Shape Matching for Uncurated Data},
  author = {Liu, Qilong and Xiao, Qinfeng and Yi, Chenyuan and Zhang, Liying and Yick, Kit-lun},
  note   = {Coming soon}
}