NeurIPS 2024 oral notes

P.S. The paper descriptions are based on my personal understanding. Some text was extracted from the abstracts and reviews.

Domain

LLM

  • ZhenghaoLin2024NeurIPS scores training tokens with a reference model and then selectively trains the language model with a focused loss on the higher-scoring tokens, arguing that "not all tokens are what you need" (see the sketch at the end of this list)
  • Quantization
    • HaokunLin2024NeurIPS utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers when quantizing LLMs
    • VladimirMalinovskii2024NeurIPS proposes a quantization-aware strategy for fine-tuning LLMs after quantization, improving performance especially in the extreme-compression regime
  • Alignment
    • JiamingJi2024NeurIPS proposes Aligner, a small model that learns the correctional residuals between preferred and dispreferred answers; it can be used as a model-agnostic, plug-and-play alignment module that needs only one-off training
  • Evaluation
    • ZheHu2024NeurIPS evaluates vision language models' abilities to understand human humor with the YesBut dataset
    • RicardoDominguezOlmedo2024NeurIPS evaluates LLMs' answers to survey questions designed for humans and reveals a strong positional bias; once that bias is controlled for, the responses trend toward a uniform distribution rather than the aggregate of any human population. It is therefore dangerous to interpret an LLM's survey responses as if they came from a real human being
    • ArjunPanickssery2024NeurIPS evaluates the self-preference bias that arises when LLMs act as evaluators, an issue with wide implications for LLM benchmarking, reward modeling, and self-refinement
    • QiguangChen2024NeurIPS introduces the reasoning boundary (RB) (a) as a quantitative metric for assessing CoT capabilities and (b) to explain how certain strategies optimize CoT performance
  • Multimodal
    • ShengbangTong2024NeurIPS explores the design choices of the vision components of MLLMs
    • ChengyiCai2024NeurIPS uses Bayesian-guided label mapping for visual reprogramming, replacing the simple one-to-one mapping between pre-training and downstream labels
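
Here is a minimal sketch of the selective token loss in ZhenghaoLin2024NeurIPS as I understand it: a frozen reference model scores each token by its excess loss, and only the top-scoring fraction contributes to training. The function name, `keep_ratio`, and the exact scoring rule are my assumptions, not the paper's code.

```python
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, labels, keep_ratio=0.6):
    """Selective language modeling: train only on high-scoring tokens.

    logits:     (B, T, V) from the model being trained
    ref_logits: (B, T, V) from a frozen reference model
    labels:     (B, T) next-token targets
    """
    # Per-token cross-entropy under both models.
    ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten(), reduction="none")
    ref_ce = F.cross_entropy(ref_logits.flatten(0, 1), labels.flatten(), reduction="none")

    # Excess loss: large where the model lags behind the reference,
    # i.e. tokens presumed learnable and useful.
    scores = (ce - ref_ce).detach()

    # Keep only the top `keep_ratio` fraction of tokens; mask out the rest.
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = scores.topk(k).values.min()
    mask = (scores >= threshold).float()

    return (ce * mask).sum() / mask.sum()
```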

AI agent

  • GabrielPoesia2024NeurIPS replicates the axiom-conjecture-proof loop of human mathematicians
  • ShangziXue2024NeurIPS introduces a reasoning-tree framework of decompose-analyze-rethink; note that the decompose step builds sub-trees while the rethink step reflects on and updates the parent tree
  • JingchangChen2024NeurIPS replicates the idea of divide-and-conquer for code generation
  • ShaotengLiu2024NeurIPS breaks a task down into subtasks and dynamically decides whether to solve each subtask with code generated by the LLM or with a "traditional" RL agent (see the sketch after this list)
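
The control loop I picture for ShaotengLiu2024NeurIPS's code-vs-RL dispatch, purely as an illustration; every interface below (`decompose`, `judge_codeable`, etc.) is a hypothetical placeholder, not the paper's API.

```python
def solve(task, llm, rl_agent, env):
    """Decompose a task, then route each subtask to code or RL (hypothetical sketch)."""
    results = []
    for sub in llm.decompose(task):            # break the task into subtasks
        # Ask the LLM whether this subtask is cleanly expressible as a program.
        if llm.judge_codeable(sub, context=results):
            program = llm.write_code(sub)      # solve it with generated code
            results.append(env.run(program))
        else:
            policy = rl_agent.train(env, sub)  # fall back to a "traditional" RL agent
            results.append(env.rollout(policy))
    return results
```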

Computer vision

  • 2D
    • JiaqingZhang2024NeurIPS enables single-stage, end-to-end training of multi-modal fusion detection
    • ZhengruiXu2024NeurIPS uses a diffusion model as the feature extractor for discriminative tasks
    • MichaelLuo2024NeurIPS automatically selects and composes task-specific adapters for diffusion models based on a user-provided prompt
    • Generation
      • KeyuTian2024NeurIPS presents a new generation paradigm that redefines autoregressive learning on images as coarse-to-fine "next-scale prediction", diverging from the standard raster-scan "next-token prediction" (see the sketch after this list)
        It exhibits two important properties of LLMs: scaling laws and zero-shot task generalization #📖
      • TianhongLi2024NeurIPS+ improves unconditional image generation by using latent representations to condition the image generation process
  • Video
    • Generation
      • SichengXu2024NeurIPS learns a disentangled face latent space for facial dynamics and head motion, which is then used for real-time audio-to-facial-video generation
  • 3D
    • Reconstruction
      • RuiqiGao2024NeurIPS uses a multi-view diffusion model to generate novel views for 3D reconstruction
    • Segmentation
    • Spatial-temporal
    • Generation
      • MinghuaLiu2024NeurIPS generates meshes using sparse 3D voxels as the representation, instead of a triplane #📖
        P.S. Trained with 8xH100 for 1 week
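
My rough mental model of KeyuTian2024NeurIPS's "next-scale prediction", written as a sketch; the tokenizer and transformer interfaces are placeholders I made up:

```python
def generate_image(transformer, decoder, scales=(1, 2, 4, 8, 16)):
    """Autoregression over scales instead of raster-scan tokens (sketch).

    Each step predicts the *entire* token map of the next, finer scale,
    conditioned on all coarser scales, so tokens within a scale are
    sampled in parallel rather than one by one.
    """
    context = []                                   # token maps so far, coarsest first
    for side in scales:
        token_map = transformer.sample_next_scale(context, shape=(side, side))
        context.append(token_map)
    return decoder(context)                        # multi-scale VQ decoder -> pixels
```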

Recommendation

  • ShenLi2024NeurIPS proposes to use response time as a cue for learning human preferences. Specifically, it combines choices and response times to estimate human utility functions, grounded in the EZ diffusion model from psychology (see the relations after this list).
    P.S. It claims that combining such extra info accelerates the preference learning process. Hmm... claiming that extra info boosts performance sounds trivial, yet claiming that it speeds up learning sounds brilliant! Clever one.
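
For context, in the standard symmetric drift-diffusion setup (drift $v$ proportional to the utility difference, noise $s$, decision boundaries at $\pm a$, unbiased start), choice probability and mean decision time have closed forms:

$$ P(\text{choose } A) = \frac{1}{1 + e^{-2av/s^2}}, \qquad \mathbb{E}[T] = \frac{a}{v}\tanh\!\left(\frac{av}{s^2}\right) $$

The second equation is why response time helps: a fast response signals a large utility gap, information the binary choice alone doesn't carry. (These are textbook DDM relations; exactly how the paper plugs them in is beyond my notes.)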

AI4Science

  • Physics & chemistry
    • GangLiu2024NeurIPS proposes Graph DiT for conditional molecular design, with a condition encoder that learns representations of numerical and categorical properties and a Transformer-based graph denoiser
    • NicholasGao2024NeurIPS designs over-parametrized, fully learnable neural wave functions, facilitating the use of learnable generalized wave functions for simulating the ground state of many-electron systems
    • YuliaRubanova2024NeurIPS uses learned signed-distance functions (SDFs) to represent object shapes and speed up distance computation in GNN-based rigid-body simulation. It's the first GNN-based simulator that scales to scenes with hundreds of objects and up to 1.1 million nodes
  • Neuroscience
    • SpencerRooke2024NeurIPS finds that (i) the number of contexts storable by the hippocampus grows exponentially with the number of place cells and (ii) there is a trade-off between high-resolution encoding of position and the number of storable contexts
    • ZixuanGong2024NeurIPS proposes NeuroClips for fMRI-to-video decoding. It first reconstructs video keyframes from a high-level semantics flow, then injects both the keyframes and low-level perception flows into a pre-trained T2V diffusion model for video reconstruction
  • Healthcare
    • YubinKim2024NeurIPS introduces a multi-agent framework that enforces a collaboration structure on a team of LLMs for medical decision-making

AI4Math

  • PDE
    • ZekunShi2024NeurIPS uses a stochastic Taylor derivative estimator for efficient amortization of differential operators, speeding up high-order differential operators in large-scale problems, e.g. solving PDEs and training Physics-Informed Neural Networks (PINNs) (see the worked case after this list)
  • Causal inference
    • FengXie2024NeurIPS theoretically investigates the identification of bi-directional MR from observational data and develops a cluster-fusion-like method for causal inference
    • SiyuanGuo2024NeurIPS develops a causal inference framework for exchangeable generative processes, which naturally arise in multi-environment data, extending existing work from i.i.d. (independent and identically distributed) to non-i.i.d. settings
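
A worked case for the amortization idea in ZekunShi2024NeurIPS, as I read it: take the Laplacian, which naively costs $d$ Hessian-vector products. It can be rewritten as an expectation,

$$ \Delta u(x) = \operatorname{tr}\nabla^2 u(x) = \mathbb{E}_{v\sim\mathcal{N}(0,I_d)}\!\left[ v^\top \nabla^2 u(x)\, v \right] $$

and $v^\top \nabla^2 u(x)\, v$ is just the second derivative of $t \mapsto u(x+tv)$ at $t=0$, which a single second-order Taylor-mode (jet) forward pass computes with no nested backprop. Sampling a few $v$'s gives a cheap, unbiased estimate of the operator.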

Architecture

Diffusion model

  • AntonioTerpin2024NeurIPS enhances the training speed
  • TianweiYin2024NeurIPS enhances one-step generation by improving the distillation scheme with a two time-scale update rule and a GAN loss on real images (rather than a regression loss on images sampled from the teacher model)
  • TeroKarras2024NeurIPS guides the model with a smaller, less-trained version of itself, improving variation compared with guiding via an unconditional model (see the formula after this list)
  • SangwoongYoon2024NeurIPS uses inverse reinforcement learning (IRL) to improve the sample quality of diffusion generative model
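
For reference, my understanding of the guidance rule in TeroKarras2024NeurIPS: classifier-free guidance extrapolates away from an unconditional model, whereas autoguidance extrapolates away from a degraded (smaller, less-trained) version of the same model,

$$ D_{\text{guided}}(x;\sigma) = D_{\text{bad}}(x;\sigma) + w\,\bigl[D_{\text{good}}(x;\sigma) - D_{\text{bad}}(x;\sigma)\bigr], \quad w > 1 $$

where $D$ is the denoiser. Pushing away from the weak model's own errors improves quality without collapsing variation the way strong CFG does.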

Transformer

  • TianyuHe2024NeurIPS investigates Transformers' out-of-distribution in-context learning abilities with a set of constructed arithmetic tasks
  • YuhongChou2024NeurIPS unifies existing linear-complexity attention mechanisms and proposes Meta Linear Attention (MetaLA) as a replacement for conventional softmax attention
  • YutaoSun2024NeurIPS introduces a decoder-decoder architecture, YOCO, which caches key-value pairs only once, reducing GPU memory demands and prefill latency

GNN

  • DongxiaoHe2024NeurIPS reveals representation scattering as the common mechanism behind various contrastive representation learning methods for GNNs and proposes the Scattering Graph Representation Learning (SGRL) framework
  • RaffaelePaolino2024NeurIPS introduces a new hierarchy of graph isomorphism tests, alternative to the standard k-WL hierarchy
  • IoannisKalogeropoulos2024NeurIPS proposes a new GNN-based meta-network design that accounts for scaling symmetries, instead of only the permutation symmetries investigated before

CNN

Traditional machine learning

  • ArthurdaCunha2024NeurIPS closes the gap to Boosting's theoretical lower bound on the trade-off between p (the number of training rounds) and t (the total parallel work per round)
  • JinZhang2024NeurIPS establishes Rademacher-complexity generalization upper bounds for various tree-based retrievers
  • XinChen2024NeurIPS achieves optimal clustering in Gaussian Mixture Models with anisotropic covariance structures

Supervision

Reinforcement learning

Post-training learning

Federated learning

  • YongzheJia2024NeurIPS introduces an FL framework to address system heterogeneity and domain shifts in edge-computing environments. It employs a Model Fusion Pruning (MFP) module to generate personalized, compact local models and a Domain Adaptive Regularization (DAR) module to enhance performance across multiple domains

Optimization algorithm

  • AaronDefazio2024NeurIPS introduces schedule-free AdamW, with no additional hyper-parameters over standard optimizers with momentum. It's based on the authors' theory unifying scheduling and iterate averaging (see the update rule after this list)
  • RohanAlur2024NeurIPS keeps humans in the loop to incorporate side information on inputs that are algorithmically indistinguishable, i.e. inputs that no predictive algorithm can tell apart
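
The schedule-free update from AaronDefazio2024NeurIPS, in its SGD form as I understand it (the AdamW variant swaps the inner gradient step; notation mine): the gradient is evaluated at an interpolation $y_t$ between the fast iterate $z_t$ and the running average $x_t$, which is what replaces the learning-rate schedule,

$$ y_t = (1-\beta)\,z_t + \beta\,x_t, \qquad z_{t+1} = z_t - \gamma\,\nabla f(y_t), \qquad x_{t+1} = \left(1-\tfrac{1}{t+1}\right)x_t + \tfrac{1}{t+1}\,z_{t+1} $$

Averaging and scheduling are thus unified: the average $x_t$ plays the role a decayed learning rate normally plays, with no schedule hyper-parameter to tune.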

Do you have any ideas or comments? Please join the discussion on X👇