NeurIPS 2024 oral notes

P.S. The paper descriptions are based on my personal understanding. Some text was extracted from the abstracts and reviews.

Domain

LLM

  • ZhenghaoLin2024NeurIPS scores training tokens with a reference model and then selectively trains the language model with a focused loss on the higher-scoring tokens, arguing that "not all tokens are what you need" (see the sketch at the end of this list)
  • Quantization
    • HaokunLin2024NeurIPS utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers when quantizing LLMs
    • VladimirMalinovskii2024NeurIPS proposes a quantization-aware strategy for fine-tuning LLMs after quantization, improving performance especially in the extreme-compression regime
  • Alignment
    • JiamingJi2024NeurIPS proposes Aligner, a small model that learns the correctional residuals between preferred and dispreferred answers; it can be used as a model-agnostic, plug-and-play alignment module that needs only one-off training
  • Evaluation
    • ZheHu2024NeurIPS evaluates vision language models' abilities to understand human humor with the YesBut dataset
    • RicardoDominguezOlmedo2024NeurIPS evaluates LLMs' answers to survey questions designed for humans and reveals a strong positional bias; once that bias is controlled for, the responses trend toward a uniform distribution rather than the aggregate of any human population. It is therefore dangerous to interpret an LLM's survey responses as if they came from a real human being
    • ArjunPanickssery2024NeurIPS evaluates the self-preference bias that arises when LLMs act as evaluators, an issue with wide implications for LLM benchmarking, reward modeling, and self-refinement
    • QiguangChen2024NeurIPS introduces the reasoning boundary (RB) (a) as a quantitative metric for assessing CoT capabilities and (b) to explain how certain strategies optimize CoT performance
  • Multimodal
    • ShengbangTong2024NeurIPS explores the design choices of the vision components of MLLMs
    • ChengyiCai2024NeurIPS uses Bayesian-guided label mapping for visual reprogramming, replacing the simple one-to-one mapping between pre-training and downstream labels
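
Here is a minimal sketch of the selective token loss in ZhenghaoLin2024NeurIPS as I understand it: a frozen reference model scores each token by its excess loss, and only the top-scoring fraction contributes to training. The function name, `keep_ratio`, and the exact scoring rule are my assumptions, not the paper's code.

```python
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, labels, keep_ratio=0.6):
    """Selective language modeling: train only on high-scoring tokens.

    logits:     (B, T, V) from the model being trained
    ref_logits: (B, T, V) from a frozen reference model
    labels:     (B, T) next-token targets
    """
    # Per-token cross-entropy under both models.
    ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten(), reduction="none")
    ref_ce = F.cross_entropy(ref_logits.flatten(0, 1), labels.flatten(), reduction="none")

    # Excess loss: large where the model lags behind the reference,
    # i.e. tokens presumed learnable and useful.
    scores = (ce - ref_ce).detach()

    # Keep only the top `keep_ratio` fraction of tokens; mask out the rest.
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = scores.topk(k).values.min()
    mask = (scores >= threshold).float()

    return (ce * mask).sum() / mask.sum()
```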

AI agent

  • GabrielPoesia2024NeurIPS replicates the axiom-conjecture-proof loop of human mathematicians
  • ShangziXue2024NeurIPS introduces a reasoning-tree framework of decompose-analyze-rethink; note that the decompose step builds sub-trees while the rethink step reflects on and updates the parent tree
  • JingchangChen2024NeurIPS replicates the idea of divide-and-conquer for code generation
  • ShaotengLiu2024NeurIPS breaks a task down into subtasks and dynamically decides whether to solve each subtask with code generated by the LLM or with a "traditional" RL agent (see the sketch after this list)
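
The control loop I picture for ShaotengLiu2024NeurIPS's code-vs-RL dispatch, purely as an illustration; every interface below (`decompose`, `judge_codeable`, etc.) is a hypothetical placeholder, not the paper's API.

```python
def solve(task, llm, rl_agent, env):
    """Decompose a task, then route each subtask to code or RL (hypothetical sketch)."""
    results = []
    for sub in llm.decompose(task):            # break the task into subtasks
        # Ask the LLM whether this subtask is cleanly expressible as a program.
        if llm.judge_codeable(sub, context=results):
            program = llm.write_code(sub)      # solve it with generated code
            results.append(env.run(program))
        else:
            policy = rl_agent.train(env, sub)  # fall back to a "traditional" RL agent
            results.append(env.rollout(policy))
    return results
```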

Computer vision

  • 2D
    • JiaqingZhang2024NeurIPS enables single-stage, end-to-end training of multi-modal fusion detection
    • ZhengruiXu2024NeurIPS uses a diffusion model as the feature extractor for discriminative tasks
    • MichaelLuo2024NeurIPS automatically selects and composes task-specific adapters for diffusion models based on a user-provided prompt
    • Generation
      • KeyuTian2024NeurIPS presents a new generation paradigm that redefines autoregressive learning on images as coarse-to-fine "next-scale prediction", diverging from the standard raster-scan "next-token prediction" (see the sketch after this list)
        It exhibits two important properties of LLMs: scaling laws and zero-shot task generalization #📖
      • TianhongLi2024NeurIPS+ improves unconditional image generation by using latent representations to condition the image generation process
  • Video
    • Generation
      • SichengXu2024NeurIPS learns a disentangled face latent space for facial dynamics and head motion, which is then used for real-time audio-to-facial-video generation
  • 3D
    • Reconstruction
      • RuiqiGao2024NeurIPS uses a multi-view diffusion model to generate novel views for 3D reconstruction
    • Segmentation
    • Spatial-temporal
    • Generation
      • MinghuaLiu2024NeurIPS generates meshes using sparse 3D voxels as the representation, instead of a triplane #📖
        P.S. Trained with 8xH100 for 1 week
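
My rough mental model of KeyuTian2024NeurIPS's "next-scale prediction", written as a sketch; the tokenizer and transformer interfaces are placeholders I made up:

```python
def generate_image(transformer, decoder, scales=(1, 2, 4, 8, 16)):
    """Autoregression over scales instead of raster-scan tokens (sketch).

    Each step predicts the *entire* token map of the next, finer scale,
    conditioned on all coarser scales, so tokens within a scale are
    sampled in parallel rather than one by one.
    """
    context = []                                   # token maps so far, coarsest first
    for side in scales:
        token_map = transformer.sample_next_scale(context, shape=(side, side))
        context.append(token_map)
    return decoder(context)                        # multi-scale VQ decoder -> pixels
```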

Recommendation

  • ShenLi2024NeurIPS proposes to use response time as a cue for learning human preferences. Specifically, it combines choices and response times to estimate human utility functions, grounded in the EZ diffusion model from psychology (see the relations after this list).
    P.S. It claims that combining such extra info accelerates the preference learning process. Hmm... claiming that extra info boosts performance sounds trivial, yet claiming that it speeds up learning sounds brilliant! Clever one.
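
For context, in the standard symmetric drift-diffusion setup (drift $v$ proportional to the utility difference, noise $s$, decision boundaries at $\pm a$, unbiased start), choice probability and mean decision time have closed forms:

$$ P(\text{choose } A) = \frac{1}{1 + e^{-2av/s^2}}, \qquad \mathbb{E}[T] = \frac{a}{v}\tanh\!\left(\frac{av}{s^2}\right) $$

The second equation is why response time helps: a fast response signals a large utility gap, information the binary choice alone doesn't carry. (These are textbook DDM relations; exactly how the paper plugs them in is beyond my notes.)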

AI4Science

  • Physics & chemistry
    • GangLiu2024NeurIPS proposes Graph DiT for conditional molecular design, with a condition encoder that learns representations of numerical and categorical properties and a Transformer-based graph denoiser
    • NicholasGao2024NeurIPS designs over-parametrized, fully learnable neural wave functions, facilitating the use of learnable generalized wave functions for simulating the ground state of many-electron systems
    • YuliaRubanova2024NeurIPS uses learned signed-distance functions (SDFs) to represent object shapes and speed up distance computation in GNN-based rigid-body simulation. It's the first GNN-based simulator that scales to scenes with hundreds of objects and up to 1.1 million nodes
  • Neuroscience
    • SpencerRooke2024NeurIPS finds that (i) the number of contexts storable by the hippocampus grows exponentially with the number of place cells and (ii) there is a trade-off between high-resolution encoding of position and the number of storable contexts
    • ZixuanGong2024NeurIPS proposes NeuroClips for fMRI-to-video decoding. It first reconstructs video keyframes from a high-level semantics flow, then injects both the keyframes and low-level perception flows into a pre-trained T2V diffusion model for video reconstruction
  • Healthcare
    • YubinKim2024NeurIPS introduces a multi-agent framework that enforces a collaboration structure on a team of LLMs for medical decision-making

AI4Math

  • PDE
    • ZekunShi2024NeurIPS uses a stochastic Taylor derivative estimator for efficient amortization of differential operators, speeding up high-order differential operators in large-scale problems, e.g. solving PDEs and training Physics-Informed Neural Networks (PINNs) (see the worked case after this list)
  • Causal inference
    • FengXie2024NeurIPS theoretically investigates the identification of bi-directional MR from observational data and develops a cluster-fusion-like method for causal inference
    • SiyuanGuo2024NeurIPS develops a causal inference framework for exchangeable generative processes, which naturally arise in multi-environment data, extending existing work from i.i.d. (independent and identically distributed) to non-i.i.d. settings
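
A worked case for the amortization idea in ZekunShi2024NeurIPS, as I read it: take the Laplacian, which naively costs $d$ Hessian-vector products. It can be rewritten as an expectation,

$$ \Delta u(x) = \operatorname{tr}\nabla^2 u(x) = \mathbb{E}_{v\sim\mathcal{N}(0,I_d)}\!\left[ v^\top \nabla^2 u(x)\, v \right] $$

and $v^\top \nabla^2 u(x)\, v$ is just the second derivative of $t \mapsto u(x+tv)$ at $t=0$, which a single second-order Taylor-mode (jet) forward pass computes with no nested backprop. Sampling a few $v$'s gives a cheap, unbiased estimate of the operator.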

Architecture

Diffusion model

  • AntonioTerpin2024NeurIPS enhances the training speed
  • TianweiYin2024NeurIPS enhances one-step generation by improving the distillation scheme with a two time-scale update rule and a GAN loss on real images (rather than a regression loss on images sampled from the teacher model)
  • TeroKarras2024NeurIPS guides the model with a smaller, less-trained version of itself, improving variation compared with guiding via an unconditional model (see the formula after this list)
  • SangwoongYoon2024NeurIPS uses inverse reinforcement learning (IRL) to improve the sample quality of diffusion generative model
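
For reference, my understanding of the guidance rule in TeroKarras2024NeurIPS: classifier-free guidance extrapolates away from an unconditional model, whereas autoguidance extrapolates away from a degraded (smaller, less-trained) version of the same model,

$$ D_{\text{guided}}(x;\sigma) = D_{\text{bad}}(x;\sigma) + w\,\bigl[D_{\text{good}}(x;\sigma) - D_{\text{bad}}(x;\sigma)\bigr], \quad w > 1 $$

where $D$ is the denoiser. Pushing away from the weak model's own errors improves quality without collapsing variation the way strong CFG does.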

Transformer

  • TianyuHe2024NeurIPS investigates Transformers' out-of-distribution in-context learning abilities with a set of constructed arithmetic tasks
  • YuhongChou2024NeurIPS unifies existing linear-complexity attention mechanisms and proposes Meta Linear Attention (MetaLA) as a replacement for conventional softmax attention
  • YutaoSun2024NeurIPS introduces a decoder-decoder architecture, YOCO, which caches key-value pairs only once, reducing GPU memory demands and prefill latency

GNN

  • DongxiaoHe2024NeurIPS reveals representation scattering as the common mechanism behind various contrastive representation learning methods for GNNs and proposes the Scattering Graph Representation Learning (SGRL) framework
  • RaffaelePaolino2024NeurIPS introduces a new hierarchy of graph isomorphism tests, alternative to the standard k-WL hierarchy
  • IoannisKalogeropoulos2024NeurIPS proposes a new GNN-based meta-network design that accounts for scaling symmetries, instead of only the permutation symmetries investigated before

CNN

Traditional machine learning

  • ArthurdaCunha2024NeurIPS closes the gap to Boosting's theoretical lower bound on the trade-off between p (the number of training rounds) and t (the total parallel work per round)
  • JinZhang2024NeurIPS establishes Rademacher-complexity generalization upper bounds for various tree-based retrievers
  • XinChen2024NeurIPS achieves optimal clustering in Gaussian Mixture Models with anisotropic covariance structures

Supervision

Reinforcement learning

Post-training learning

Federated learning

  • YongzheJia2024NeurIPS introduces an FL framework to address system heterogeneity and domain shifts in edge-computing environments. It employs a Model Fusion Pruning (MFP) module to generate personalized, compact local models and a Domain Adaptive Regularization (DAR) module to enhance performance across multiple domains

Optimization algorithm

  • AaronDefazio2024NeurIPS introduces schedule-free AdamW, with no additional hyper-parameters over standard optimizers with momentum. It's based on the authors' theory unifying scheduling and iterate averaging (see the update rule after this list)
  • RohanAlur2024NeurIPS keeps humans in the loop to incorporate side information on inputs that are algorithmically indistinguishable, i.e. inputs that no predictive algorithm can tell apart
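
The schedule-free update from AaronDefazio2024NeurIPS, in its SGD form as I understand it (the AdamW variant swaps the inner gradient step; notation mine): the gradient is evaluated at an interpolation $y_t$ between the fast iterate $z_t$ and the running average $x_t$, which is what replaces the learning-rate schedule,

$$ y_t = (1-\beta)\,z_t + \beta\,x_t, \qquad z_{t+1} = z_t - \gamma\,\nabla f(y_t), \qquad x_{t+1} = \left(1-\tfrac{1}{t+1}\right)x_t + \tfrac{1}{t+1}\,z_{t+1} $$

Averaging and scheduling are thus unified: the average $x_t$ plays the role a decayed learning rate normally plays, with no schedule hyper-parameter to tune.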

Do you have any ideas or comments? Please join the discussion on X👇