ICLR 2025 oral notes

P.S. The paper descriptions are based on my personal understanding. Some text was extracted from the abstracts and reviews.

Domain

LLM

  • Transfusion ChuntingZhou2025ICLR combines next-token prediction for text and diffusion-based learning for images within a single transformer architecture, bridging the modality gap without quantizing images into discrete tokens #🧠
  • Embedding
    • AlexIacob2025ICLR introduces a pre-training framework that decouples embedding layers from the transformer body, enabling robust training on heterogeneous data (avoiding the curse of multi-linguality), improving generalization, and reducing memory footprint
    • ZiyueLi2025ICLR+ investigates the limitation of using decoder-only models for embedding and finds that a good embedding can be acquired from the MoE layer, by combining routing weights (RW) and hidden states (HS)
      They found that a weighted sum of RW and HS outperforms concatenation, similar to [[AshishVaswani2017NeurIPS|Transformer]]'s positional embedding
    • KihoPark2025ICLR extends the linear representation hypothesis to general concepts and shows that hierarchical relationships are encoded as orthogonality #🧠
  • Training
    • Analysis
      • YiRen2025ICLR proposes a novel learning dynamics framework, i.e. how specific training examples influence the model's predictions on other examples, to understand LLMs' behavior during fine-tuning (e.g., SFT, DPO, and other variants)
        Some counter-intuitive behavior can be well explained by the proposed framework, e.g. specific types of hallucination are strengthened after fine-tuning #🧠
      • JiyeonKim2025ICLR introduces the concept of knowledge entropy to analyze how language models store and access knowledge and shows that knowledge entropy decreases as models are trained, correlating with a reduced ability to learn new information and an increased tendency to forget existing knowledge #🧠
      • YudaSong2025ICLR conducts a comprehensive examination of LLM self-improvement capability and finds that the generation-verification gap grows with more training, regardless of whether a better or worse model does the verifying or the generating
      • SachinGoyal2025ICLR identifies and thoroughly analyzes an important phenomenon called context-parametric inversion in instruction-tuned large language models, where models counter-intuitively become less reliant on input context as training progresses
    • Pre-training
      • ZiqingFan2025ICLR addresses the dimensional collapse issue when using selective domain-specific data for pre-training (file selection). At its core is a classical greedy algorithm to achieve more uniform eigenvalues in the feature covariance matrix of the selected texts
      • YuxianGu2025ICLR+ leverages optimal control principles, specifically, Pontryagin's Maximum Principle (PMP), for data selection to enhance data efficiency
      • ZitongYang2025ICLR recognizes the data inefficiency problem of pre-training, i.e. to learn a fact models must be trained on hundreds to thousands of diverse examples, whereas a fact may only appear once in the corpus. It proposes to bridge this gap with data augmentation on a small domain-specific corpus with EntiGraph, which leverages entities and their relations
    • Alignment & fine-tuning
      • YuhengZhang2025ICLR+  proposes to conduct RLHF with a two-player game framework, namely the Nash learning problem with human preference, which is a generalization of the contemporary Bradley-Terry model
      • AudreyHuang2025ICLR+ observes that models are often better at evaluating responses than generating them, leading to the insight on how self-improvement shifts probability mass toward high-quality outputs, i.e. "sharpening". It proposes a framework to analyze this process on SFT and RLHF
      • GangweiJiang2025ICLR studies the phenomenon of catastrophic forgetting using Function Vectors and finds that task similarity is correlated with the amount of forgetting. It proposes two solutions: (i) intervening on the trained model using function vectors of previous tasks and (ii) training the model with additional regularization using the function vectors of previous tasks
      • Safety
        • TianshengHuang2025ICLR proposes an alignment-stage method to defend against harmful fine-tuning attack by adding a loss regularizer in the alignment stage's optimization
        • XinranWang2025ICLR aligns generative models with multiple human values by framing value alignment as an optimization problem with user-set constraints
        • Backtracking YimingZhang2025ICLR++ rethinks a fundamental limitation of generative LLMs: the generation is unidirectional, thus unable to backtrack. By introducing a RESET token for backtracking during SFT or DPO, the authors improve safety without harming helpfulness (see the sketch after this list) #🧠
        • XiangyuQi2025ICLR+ probes a fundamental vulnerability in current safety alignment approaches: shallow safety alignment, i.e. primarily adapting a model's generative distribution over only its very first few output tokens. It proposes "deep safety alignment" as a promising defense #🧠
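        A minimal sketch of how the backtracking RESET token from YimingZhang2025ICLR++ could be handled at inference time; the token id, the Hugging Face-style model API, and the restart-from-the-prompt behavior are my assumptions, not the authors' code:
        ```python
        import torch

        RESET_ID = 50257  # hypothetical id of the special RESET token added during SFT/DPO

        def generate_with_backtracking(model, tokenizer, prompt, max_new_tokens=256):
            """Sample tokens; when the model emits RESET, discard the partial response and retry."""
            prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
            ids, response = prompt_ids, []
            for _ in range(max_new_tokens):
                probs = torch.softmax(model(ids).logits[:, -1, :], dim=-1)
                next_id = torch.multinomial(probs, 1)
                if next_id.item() == RESET_ID:
                    ids, response = prompt_ids, []   # backtrack: throw away the partial generation
                    continue
                response.append(next_id.item())
                ids = torch.cat([ids, next_id], dim=1)
                if next_id.item() == tokenizer.eos_token_id:
                    break
            return tokenizer.decode(response)
        ```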
      • Reinforcement learning based
        • HaoSun2025ICLR investigates the use of the Bradley-Terry model for reward modeling in LLM alignment, establishing its theoretical foundations while questioning its necessity for downstream optimization. It introduces order consistency as a central objective and proposes a classification-based alternative
          The experiments are quite extensive: more than 12,000 experimental setups, using base LLMs
        • AviralKumar2025ICLR identifies key limitations of existing self-correction methods including distribution shift and behavior collapse and proposes a two-stage RL training approach with (i) self-generated correction traces under the model's own distribution and (ii) appropriate regularization
          The first approach to successfully enable reliable self-correction in LLMs without requiring external models or supervision
        • YantaoLiu2025ICLR presents a new benchmark dataset that evaluates reward models on their sensitivity to subtle content differences and to style biases, rather than on rejecting answers generated by weak models - a bias that could be easily exploited
      • Data augmentation based
        • XIANGYUPENG2025ICLR proposes to self-synthesize reasoning paths as post-training data of LLMs by progressing from general reasoning structures to task-specific reasoning paths, to improve LLMs' generalization capability in reasoning
        • HaipengLuo2025ICLR uses the Evol-Instruct method from WizardLM to create a strong math SFT dataset. It also integrates process reward models into the reinforcement training pipeline
        • DongyoungKim2025ICLR proposes Spread Preference Annotation with direct preference judgment (SPA), aimed at reducing the high costs associated with collecting large preference datasets for alignment. It achieves superior alignment with only 3.3% of the ground-truth labels
        • AlihanHuyuk2025ICLR proposes to improve reasoning in LLMs via fine-tuning with counterfactual synthetic data
    • Quantization & compression
      • TanishqKumar2025ICLR+ investigates the scaling law for post-training quantization. It finds that additional pre-training data actually degrades quantized models' performance. With 465 pre-training runs, it fits a formula that unifies the scaling laws for post and pre-training quantization to predict degradation
      • ChiHengLin2025ICLR proposes to compress Transformer-based models with a set of particular forms of low-rank matrix factorization for the weight matrices
  • Inference
    • NguyenNhatMinh2025ICLR proposes min-p sampling, a dynamic truncation sampler for LLMs that improves text quality and diversity compared with traditional top-k sampling
      P.S. It has been widely adopted in applications
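      A minimal numpy sketch of the min-p rule as described above (keep tokens whose probability is at least a fraction p_base of the top token's probability, renormalize, then sample); parameter names are mine:
      ```python
      import numpy as np

      def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=np.random.default_rng()):
          """Min-p sampling: the truncation threshold scales with the model's confidence."""
          z = logits / temperature
          probs = np.exp(z - z.max())
          probs /= probs.sum()
          probs[probs < p_base * probs.max()] = 0.0   # dynamic truncation
          probs /= probs.sum()
          return rng.choice(len(probs), p=probs)
      ```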
    • YuFeng2025ICLR introduces a Bayesian inference framework that combines LLMs with structured Bayesian networks to produce more reliable probability estimations for better decision making
    • Test-time scaling
      • ZhenruiYue2025ICLR investigates the inference scaling of RAG, exploring effective strategies beyond merely increasing its context length, including in-context learning and iterative prompting. It also proposes a model to predict the optimal computation allocation, presenting linear gains of performance vs. computation
      • JoaoLoula2025ICLR proposes an inference-time method for controlled generation, imposed as probabilistic conditioning, with Sequential Monte Carlo (SMC) techniques
      • CharlieVictorSnell2025ICLR investigates two primary test-time scaling mechanisms: (i) searching against dense verifier reward models and (ii) adaptively updating the model's response distribution based on the prompt. The authors then propose a compute-optimal strategy that dynamically allocates test-time computation based on the difficulty of the prompt and show that test-time computation can outperform a 14x larger model
    • MoE
      • NiklasMuennighoff2025ICLR+ introduces a 7B LLM leveraging a sparse Mixture-of-Experts (MoE) architecture with model weights, training data, code, and logs open-sourced
      • PengJin2025ICLR proposes a heterogeneous MoE framework that integrates FFN and zero-computation experts (zero, copy, and constant experts). It reduces computing overhead by dynamically assigning simpler tokens to zero-computation experts, improves performance by focusing FFNs on challenging tokens, and eliminates GPU communication overhead
      • ZhengzhuoXu2025ICLR enhances complex chart understanding through a Mixture-of-Experts (MoE) architecture and the ChartMoE-Align dataset
    • Speculative decoding
      • GregorBachmann2025ICLR uses LLM-as-a-judge, i.e. asking the LLM itself to verify the drafted content, for speculative decoding. It's motivated by the limitation of standard distribution-preserving methods
      • HarikrishnaNarasimhan2025ICLR proposes speculative cascading, an integration of cascading and speculative decoding methods to improve language model inference, with good balance of speed and quality
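      For context, a rough sketch of the standard draft-and-verify speculative decoding loop that both papers build on; Hugging Face-style model calls are assumed, and the adjusted resampling of a rejected position is omitted:
      ```python
      import torch

      def speculative_decode_step(target, draft, ids, k=4):
          """One round: the small draft model proposes k tokens, the large target model scores
          them in a single forward pass, and each token is accepted with prob min(1, p/q)."""
          cur, proposals, q = ids, [], []
          for _ in range(k):  # autoregressive drafting with the cheap model
              dp = torch.softmax(draft(cur).logits[:, -1], dim=-1)
              tok = torch.multinomial(dp, 1)
              proposals.append(tok); q.append(dp[0, tok.item()])
              cur = torch.cat([cur, tok], dim=1)
          # target-model probabilities at every drafted position, from one forward pass
          p = torch.softmax(target(cur).logits[0, ids.shape[1] - 1:-1], dim=-1)
          accepted = []
          for i, tok in enumerate(proposals):
              if torch.rand(()) < torch.clamp(p[i, tok.item()] / q[i], max=1.0):
                  accepted.append(tok)
              else:
                  break  # in the standard scheme, this position is resampled from an adjusted target distribution
          return accepted
      ```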
    • Model editing
      • HongkangLi2025ICLR+ provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. Task vectors are widely used for model editing, e.g. multi-task learning, unlearning, and out-of-distribution generalization
      • JunfengFang2025ICLR projects the editing perturbations onto the null space of preserved knowledge before applying them to the model parameters, ensuring that the edited LLM's output remains unchanged for queries related to preserved knowledge, to minimize disruption to existing information (see the sketch below)
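        A minimal sketch of the null-space projection idea; the shapes and the SVD-based construction are my own reading, not the authors' code:
        ```python
        import torch

        def project_to_null_space(delta, K_preserved, eps=1e-6):
            """Project an editing perturbation so it cannot affect preserved-knowledge keys.

            delta:       (d_out, d_in) perturbation for a weight matrix W
            K_preserved: (d_in, n) matrix whose columns are hidden keys of knowledge to keep intact
            Returns delta_p with delta_p @ K_preserved ≈ 0, so (W + delta_p) k = W k for those keys.
            """
            U, S, _ = torch.linalg.svd(K_preserved, full_matrices=False)
            U = U[:, S > eps]                                      # basis of the preserved-key subspace
            P_null = torch.eye(K_preserved.shape[0]) - U @ U.T     # projector onto its orthogonal complement
            return delta @ P_null
        ```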
    • Copyright
      • JavierAbad2025ICLR adaptively aggregates the model outputs to minimize the reproduction of copyrighted content, based on models trained on disjoint sets of copyrighted material
  • Evaluation
    • YanScholten2025ICLR+ discusses the limitation of deterministic evaluations in capturing the whole output distribution. It proposes a formal probabilistic evaluation framework for LLMs with high-probability guarantees. It presents a solid case in unlearning #🧠
    • Evaluation pitfalls
      • XiaosenZheng2025ICLR shows that null models that always return the same cheating responses can achieve high win rates on automatic LLM benchmarks, advocating for anti-cheat mechanisms
      • RicardoDominguezOlmedo2025ICLR+ defines training on the test task as a potential problem for evaluating LLMs, which could be common practice and, strictly speaking, is not data contamination. It proposes fine-tuning on a small, common set of task-related data to put all the methods on equal footing
    • Capability evaluation
      • XimingLu2025ICLR proposes CREATIVITY INDEX, a metric that quantifies the creativity of a text by reconstructing it from existing web snippets, supported by a novel dynamic programming algorithm, DJ SEARCH, for efficient computation
      • ZheyuanZhang2025ICLR presents an evaluation protocol & benchmark to assess the spatial reasoning capabilities of vision language models. The results shed light on the ambiguity and cross-cultural diversity of frame of reference in spatial reasoning
      • Cybench AndyKZhang2025ICLR provides a cybersecurity agent benchmark with 40 professional-level Capture the Flag tasks that are recent, meaningful, and difficult with subtasks
      • DanielPaleka2025ICLR evaluates LLM forecasters for future events on the logical consistency of their predictions, where ground truth is inherently unavailable at the time of prediction
      • BigCodeBench TerryYueZhuo2025ICLR presents a new code generation benchmark for evaluating large language models, with a focus on diverse function calls and complex instructions, making it more comprehensive and challenging
      • Spider 2.0 FangyuLei2025ICLR provides a real-world enterprise text-to-SQL workflow benchmark, with 595 complex workflow problems involving databases often exceeding 1,000 columns and stored across diverse systems like BigQuery and Snowflake
      • PrefEval SiyanZhao2025ICLR presents a dataset containing multi-turn conversational data to evaluate the ability of LLMs to adhere to conversationally expressed preferences in context
      • MMQA JianWu2025ICLR+  provides a benchmark designed to assess the capabilities of LLMs in handling complex multi-table data scenarios, which demand advanced understanding and reasoning across connected tables. It also proposes a multi-table retrieval method with SOTA performance on MMQA
    • Safety evaluation
      • DarkBench EsbenKran2025ICLR evaluates dark patterns in LLMs with 660 adversarial prompts across six categories: brand bias, user retention, sycophancy, anthropomorphism, harmful generation, and sneaking
      • AaronJiaxunLi2025ICLR shows that RLHF does not simultaneously optimize LLMs for trustworthiness; in fact, for stereotypical bias, truthfulness, and privacy it has the opposite effect
    • LLM as judge
      • FlorianEDorner2025ICLR investigates the limits of using an LLM in lieu of a large set of ground truth labels for evaluating another LLM. Even if the judge LLM is debiased by a small set of ground truth data, the best we can expect is 2x sample efficiency, i.e. LLM as judge won’t beat twice the data
      • JaehunJung2025ICLR proposes an LLM-as-judge framework that dynamically selects when to trust different judge models to reduce evaluation overhead, while providing a provable guarantee of human-judge agreement
  • Interpretability
    • Identify critical heads
      • JunsolKim2025ICLR probes & isolates the attention heads that are most highly influential over political bias. These heads can be used to monitor the LLM's stances. Interestingly, by applying linear interventions to these attention heads, the LLM can be steered toward a more liberal or conservative stance #🧠
      • WenhaoWu2025ICLR demonstrates how to detect retrieval heads, which extract relevant information from long context, and validates them with experiments. Specifically, under a Needle-in-a-Haystack Test (NIAH) setting, it searches for attention heads that consistently attend to the injected tokens from a long-context input
      • ZhenhongZhou2025ICLR interprets the contribution of individual attention heads to LLM safety and identifies critical heads
    • Sparse auto-encoders (SAEs)
      • LeoGao2025ICLR proposes scaled sparse auto-encoders with top-k activation to directly control the number of active latent variables, improving interpretability and bringing significant benefits in reconstruction quality, fewer dead latents, and sparsity. It reveals clean scaling laws with respect to auto-encoder size and sparsity (see the sketch after this list)
      • JavierFerrando2025ICLR begins with the observation that hallucination in LLMs is closely linked to their ability (or inability) to recall knowledge about entities. It then introduces a novel approach with SAEs to identify latent directions within the LLMs' representation space that are responsible for entity recognition
      • SamuelMarks2025ICLR leverages SAEs to decompose models into mono-semantic features and constructs causal circuits from these interpretable features for discovering and editing interpretable sparse feature circuits. Previous work often focused on more coarse units (e.g., entire attention heads or neurons) that tend to be highly poly-semantic
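      A bare-bones sketch of the top-k SAE from LeoGao2025ICLR mentioned above, omitting the pre-encoder bias, auxiliary losses, and other details of the paper:
      ```python
      import torch
      import torch.nn as nn

      class TopKSAE(nn.Module):
          """Only the k largest latent pre-activations are kept per example, so the number of
          active latents is controlled directly instead of via an L1 penalty."""
          def __init__(self, d_model, d_latent, k):
              super().__init__()
              self.k = k
              self.encoder = nn.Linear(d_model, d_latent)
              self.decoder = nn.Linear(d_latent, d_model)

          def forward(self, x):
              z = self.encoder(x)
              topk = torch.topk(z, self.k, dim=-1)
              z_sparse = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
              return self.decoder(z_sparse), z_sparse

      # training objective is plain reconstruction: ((x_hat - x) ** 2).mean()
      ```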
    • Probing the model
      • JieZhang2025ICLR+ presents a training-free method for fingerprinting LLMs, to identify whether another LLM is a subsequent development of a given model. It leverages kernel alignment similarity of feature representations, which is robust to modifications, e.g. fine-tuning & model pruning
      • YaoTong2025ICLR proposes a method for quantifying how much a dataset has been used in the training of a given model

AI agent

  • Examples
    • BoyuGou2025ICLR presents a vision-only GUI agent with a visual grounding model (UGround) trained on a large synthetic dataset (Web-Hybrid)
    • Cybench AndyKZhang2025ICLR builds examples of Capture the Flag (CTF) agents, besides providing a cybersecurity benchmark
    • PathGen-1.6M YuxuanSun2025ICLR uses multiple agents collaborating with each other to construct a large-scale pathology dataset
  • Tool calling
    • ChangleQu2025ICLR proposes to boost tool learning by dynamically adjusting/rewriting tool documentation based on the interaction feedback between LLMs and external tools
  • Decision path
    • JiayiZhang2025ICLR reformulates automatic workflow optimization as a search problem over code-represented workflows and proposes a Monte Carlo Tree Search based approach with code modification, tree-structured experience, and execution feedback
  • Evaluation
    • MaojiaSong2025ICLR evaluates LLMs’ performance as a RAG component across three dimensions: trustworthiness, citation groundedness, and refusal groundedness. It proposes an alignment method to improve LLMs for the RAG task
    • ZhiyuanWeng2025ICLR takes inspiration from the famous Asch conformity experiments and shows that a similar phenomenon occurs with LLM based multi-agent systems. It provides a benchmark and some tricks to mitigate it
    • MLE-bench JunShernChan2025ICLR provides a benchmark on LLM agent's ability to solve ML engineering tasks taken from Kaggle. It conducts comprehensive experiments and analysis

Computer vision

  • 2D
    • Generation
      • EnzeXie2025ICLR integrates high-compression-rate (32x) auto-encoder, linear attention, and a unique sampling solver to enable fast & high-resolution generation even on a laptop. It also shows the effective use of the LLM for text encoding
      • DSPO HuaishengZhu2025ICLR fine-tunes diffusion models on human preferences using score-matching, achieving better results than direct preference optimization (DPO)
    • Editing
      • LvminZhang2025ICLR presents an illumination control model for relighting foreground objects using image diffusion models. It proposes a physically-grounded light transport consistency loss & effective strategies for collecting paired illumination datasets
        The results were praised as impressive
      • Style transfer
        • LituRout2025ICLR+ proposes a plug-and-play test-time optimization method for training-free stylization & content-style composition of diffusion models based on stochastic optimal control
    • Segmentation
  • Video
    • Generation
      • HuayuChen2025ICLR proposes Condition Contrastive Alignment (CCA), which directly fine-tunes pre-trained models to achieve guidance-free auto-regressive (AR) visual generation, to avoid the multi-modal inconsistencies induced by Classifier-Free Guidance (CFG)
      • HanyuWang2025ICLR presents a method for learning video tokenization with learned holistic queries that goes beyond patch-level representations and demonstrates improvements in auto-regressive (AR) video generation
      • Conditioning
        • HanLin2025ICLR+ adapts the pre-trained ControlNets to the new backbones in image and video generation, addressing the feature space mismatch problem. It provides a computationally-efficient approach to integrate ControlNets for video generation
        • JianwenJiang2025ICLR presents an audio-driven portrait animation pipeline that demonstrates natural talking-head motion
        • GaojieLin2025ICLR proposes a Region Attention Module, uses Human-Prior-Guided Conditions to improve the quality of the local hand and face regions, and reduces the ambiguity of driving body motion with the weaker audio signal
        • HaiyangLiu2025ICLR generates high-fidelity co-speech gesture videos using a motion graph-based retrieval approach. It addresses audio-motion misalignment and visual artifacts by introducing (i) a hierarchical audio-motion joint embedding and (ii) a diffusion-based interpolation network
  • 3D
    • 3D segmentation
      • XiuweiXu2025ICLR proposes a geometry-aware module that lifts 2D masks to 3D queries, in order to use SAM, a 2D vision foundation model (VFM), for real-time 3D segmentation #🚀
      • Open-YOLO 3D MohamedElAmineBoudjoghra2025ICLR presents an efficient approach to open-vocabulary 3D instance segmentation by leveraging 2D bounding box priors from a pre-trained open-vocabulary 2D object detector. Its main contribution is the Multi-View Prompt Distribution (MVPDist) method, which effectively utilizes multi-view information while addressing potential misclassification from the 2D object detector #🚀
    • 3D reconstruction
      • BotaoYe2025ICLR reconstructs 3DGS from sparse and unposed images, avoiding errors associated with per-frame Gaussians and pose estimation. It's trained solely on photometric constraints without the geometric ground truth, making a wider range of datasets available for training
      • TetSphere splatting MinghaoGuo2025ICLR represents 3D shapes by deforming a collection of tetrahedral spheres, with geometric regularizations and constraints that effectively resolve common mesh issues such as irregular triangles, non-manifoldness, and floating artifacts
      • NeuralPlane HanqiaoYe2025ICLR utilizes foundation models to provide priors (normals, segmentation, etc.) and then employs neural fields to learn a plane field that is aware of both geometry and scene semantics
      • Grendel HexuZhao2025ICLR presents a parallel training method for 3DGS for 3D reconstruction, which significantly improves the training speed and working scene scale #🚀
    • Novel view synthesis
      • LVSM HaianJin2025ICLR uses a purely transformer-based framework for scalable and generalizable novel view synthesis from sparse-view inputs. It bypasses the 3D inductive biases used in previous methods, from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps), addressing novel view synthesis with a fully data-driven approach #🚀
  • Vision-language model
    • SimonSchrodi2025ICLR studies the phenomena of modality gap and object bias in contrastive VLMs, and shows that they stem from an information imbalance between modalities, limiting alignment in the embedding space, with the modality gap driven by only a few dimensions
      Praised by the reviewers as intriguing by connecting the modality gap with entropy, an innovative perspective
    • AvikPal2025ICLR studies hierarchical visual-text representations with hyperbolic embeddings learned through unsupervised contrastive training. It leverages the hierarchical relation within the image (whole image and objects) and the text (whole sentence and nouns) to construct a hierarchical embedding space, where the more general terms (objects / nouns) are pushed towards the origin and the more specific terms (sentences and whole images) are pushed towards the boundary
    • YongxianWei2025ICLR fills the gap of applying Data-Free Knowledge Distillation (DFKD) for the CLIP model
    • Visual grounding
      • XinGu2025ICLR+ introduces target-aware object queries for spatio-temporal video grounding (STVG), which is a significant departure from traditional zero-initialized queries. It comprises text-guided temporal sampling (TTS) module and attribute-aware spatial activation (ASA), working in a cascade
    • Evaluation
      • YueYang2025ICLR reduces data contamination and enables dynamic difficulty by using a multimodal bootstrapping module to dynamically generate new visual question-answering samples
      • MMIE PengXia2025ICLR+ provides a benchmark on interleaved multimodal comprehension and generation abilities. It comprises 20K multimodal queries across 12 fields, supporting interleaved text, multi-image inputs, and outputs in multiple-choice/open-ended formats. The automated evaluation metric is based on a fine-tuned LLM
      • PhysBench WeiChow2025ICLR provides a benchmark on VLMs' physical world understanding capability across a diverse set of tasks, containing 10,002 entries of interleaved video-image-text data. It also proposes a method to enhance VLMs' ability with specialized vision models

Robotics

  • TaiHoang2025ICLR introduces geometry-aware RL with a heterogeneous SE(3)-equivariant backbone policy for robotic manipulation, enabling effective manipulation of rigid and deformable objects with multiple actuators
  • YinanZheng2025ICLR proposes a transformer-based Diffusion Planner for closed-loop planning, which can effectively model multi-modal driving behavior and ensure trajectory quality without any rule-based refinement
  • FanqiLin2025ICLR investigates whether scaling laws apply to imitation learning
  • YangTian2025ICLR presents an end-to-end paradigm that predicts actions using inverse dynamics models conditioned on the robot's forecasted visual states, named Predictive Inverse Dynamics Models (PIDM). The motivation is closing the loop between vision and action, making them work in synergy
  • VitalisVosylius2025ICLR achieves few-shot In-Context Imitation Learning (ICIL) by reformulating it as a graph generation problem using a learned diffusion process, enabling structured reasoning over demonstrations, observations, and actions
  • PhysAgent WeiChow2025ICLR enhances embodied agents' physical world understanding by combining the generalization strengths of VLMs with the specialized expertise of vision models. It accompanies the PhysBench benchmark listed under Computer vision > Evaluation above

Recommendation

  • LehengSheng2025ICLR demonstrates that advanced LM representations, linearly mapped into item representations, yield superior recommendation performance

Human-AI cooperation

  • CuongCNguyen2025ICLR uses an Expectation-Maximization approach to address the missing annotation problem for learning to defer (to human). It also enhances the workload distribution in the E-step

AI4Science

  • Physics & chemistry
    • MarioLinoValencia2025ICLR proposes a graph-based latent diffusion model for computational fluid dynamics, which enables direct sampling of unsteady flow states from their equilibrium distribution given a mesh discretization of the system and its physical parameters
    • PinChen2025ICLR presents a dataset of electronic charge density in crystalline inorganic materials
    • Molecule modeling
      • ShihHsinWang2025ICLR proposes a 3D graph construction method for molecule modeling with sparsity, connectivity, and rigidity guarantees. The reviewer commented that this is an important yet somehow overlooked problem in the field of 3D GNNs
      • GabrieleCorso2025ICLR proposes unbalanced flow matching for molecular docking problem
    • Molecule design
      • TomasGeffner2025ICLR introduces a flow-based generative model for protein backbone design, leveraging hierarchical fold class labels and a scalable non-equivariant transformer architecture
      • JingjingGong2025ICLR extends Bayesian Flow Network (BFN) to ProfileBFN that enables MSA profile-based protein family design
      • HannesStark2025ICLR generates protein structures conditioned on 3D ellipsoids that encode spatial layout information. The ellipsoids could be hand-constructed, extracted from existing proteins, or from a statistical model, enabling a vast range of molecule design applications
  • Biology
    • ZiweiYang2025ICLR learns gene interactions with (i) a deep generative model that learns distinct disease subtypes from patient gene expression profiles and (ii) a graph neural network that captures representations of prior gene networks from knowledge databases
    • ZhenyiZhang2025ICLR learns stochastic dynamics from discretely observed data, e.g. single-cell RNA data. It novelly connects regularized unbalanced optimal transport (RUOT) to the Schrödinger bridge, in particular with a careful treatment of unbalanced effects
    • XingyuSu2025ICLR proposes a sequence-to-expression network explicitly designed to discover and extract regulatory elements that drive target gene expression, enhancing the accuracy of the gene expression prediction
  • Neuroscience
    • DehongXu2025ICLR investigates the conformal isometry hypothesis that leads to the emergence of hexagon periodic patterns in grid cells, showing that learning a maximally distance-preserving position embedding naturally leads to such patterns
    • MohammadBashiri2025ICLR learns single-neuron invariance manifolds and aligns them with affine transformation, enabling population-level exploration of neural invariances
    • AtsunobuKotani2025ICLR proposes a computational framework to model the emergence of color vision in the human brain, with biologically realistic simulations of optic nerve signals and a self-supervised learning mechanism that infers the color dimensionality without predefined assumptions
    • AminNejatbakhsh2025ICLR estimates the similarity between noisy neural trajectories with optimal transport distances
    • GeelingChau2025ICLR introduces a self-supervised model called Population Transformer to model brain-wide neural activity sparsely and variably measured across subjects and datasets. Representations generated by this pre-trained model can then be used to perform downstream decoding tasks, leading to superior accuracy compared to models only trained on one specific dataset
  • Geoscience
    • ZiyeWang2025ICLR proposes a novel framework for 3D weather nowcasting, combining SpatioTemporal Coherent Gaussian Splatting (STC-GS) for dynamic radar representation and GauMamba, a memory-augmented predictive network
  • Economics
    • AliShirali2025ICLR studies a resource allocation problem in a pool of individuals where waiting for more observations improves resource allocation, yet risks some individuals leaving the pool if resources are not allocated on time

AI4Math

  • miniCTX JiewenHu2025ICLR presents a benchmark for evaluating LLM-based theorem-proving models in real-world, context-rich scenarios, sourcing from real Lean projects and textbooks. It also provides NTP-toolkit, an automated tool for data extraction and annotation
  • MaxenceFaldor2025ICLR+ proposes an open-source Python library for cellular automata (CA) with GPU acceleration provided by the JAX library
  • Equation discovery
    • ParshinShojaee2025ICLR leverages large language models for symbolic regression by integrating program synthesis, numerical optimization, and evolutionary search to discover accurate and generalizable scientific equations
  • Differential Equation
    • TianxiangGao2025ICLR investigates the impact of activation functions on Neural ODE training. It shows that smoothness of the activations is essential for stabilizing training and guaranteeing a unique solution
    • RicardoBuitrago2025ICLR introduces a neural operator with memory for modeling time-dependent PDEs, extending beyond standard Markovian neural operators that only depend on the current state
    • HonghuiWang2025ICLR proposes to replace global modulations with spatial ones for PDE modeling with neural fields
    • JindouJia2025ICLR introduces feedback mechanisms to correct prediction errors in learned latent dynamics, enhancing the generalization capabilities of Neural ODEs for continuous-time prediction tasks
  • Causal inference
    • HaoyueDai2025ICLR identifies limitations of existing causal discovery methods under pre-intervention selection bias and proposes the interventional twin graph framework to explicitly model both observed and counterfactual worlds, defining Markov properties and sound algorithms for causal inference
    • ZijianLi2025ICLR+ proposes a framework to identify temporally causal relations with instantaneous dependencies
    • XiaoHan2025ICLR proposes AERCA, a novel method integrating Granger causal discovery and root cause analysis to identify anomalies in multivariate time series
  • Game theory
    • YanzhengChen2025ICLR investigates the last-iterate convergence guarantees of Extra Gradient and Optimistic Gradient algorithms for time-varying games that converge to some smooth monotone game
    • AlexandrosHollender2025ICLR  investigates the computational complexity of computing a Nash equilibrium in two-team zero-sum poly-matrix games and proves it to be CLS-hard

AI4Health

  • PathGen-1.6M YuxuanSun2025ICLR uses multiple agents collaborating with each other to extract representative WSI patches, leading to a large-scale pathology dataset with 1.6M high-quality image-caption pairs. It then trains a pathology-specific CLIP model, PathGen-CLIP
  • Drug design
    • KeirAdams2025ICLR designs a diffusion model that jointly generates 3D molecules and explicit representations of their 3D shapes, electrostatics, and pharmacophores and demonstrates its utility in bioisosteric drug design
    • ChenbinZhang2025ICLR proposes a similarity-aware evaluation (SAE) approach for splitting datasets in drug-target affinity prediction, aiming to produce test sets with controlled similarity distributions

AI4Design

  • ZijieGeng2025ICLR+ proposes to optimize macro placement in chip design by training an offline predictor to estimate cross-stage metrics and generating a pixel-level placement mask

Architecture

Diffusion models

  • Diffusion models
    • ZijingOu2025ICLR directly regresses the optimal (diagonal) covariances to improve the sampling efficiency and accuracy
    • Analysis
      • BrunoKacperMlodozeniec2025ICLR extends influence functions for data attribution to diffusion models, addressing the computational challenges of Hessian inversions with scalable approximations. It also provides theoretical insights that unifies prior works
    • Discrete diffusion model
      • NetaShaul2025ICLR makes a significant contribution to discrete generative modeling by broadening the design space of flow matching methods, allowing the use of arbitrary probability paths with a strong theoretical foundation grounded in kinetic-optimal velocities
      • YongxingZhang2025ICLR extends diffusion models to learn distributions over the group of permutations S_n, which is essential in fields such as combinatorics, physics, and chemistry
    • Flexible-length generation
      • Block Diffusion MarianneArriola2025ICLR proposes to decompose a sequence into blocks of tokens, within which discrete diffusion is used, enabling flexible-length generation. It also improves inference efficiency with KV caching and parallel token sampling #🧠
    • Few-step generation
      • VinhTong2025ICLR improves the inference-time time-step schedule with a teacher/student framework to learn the optimal time discretization based on minimizing the KL divergence between the teacher and student's output distribution
        It's validated on image, point cloud, & protein structure tasks
      • KevinFrans2025ICLR proposes to distill diffusion model to a one/multiple steps generator with single phase training. It conditions the network on the number of desired steps -- a simple and intuitive idea
        Related: TianweiYin2024NeurIPS
      • ChengLu2025ICLR proposes a simplified theoretical framework that unifies diffusion models and consistency models (CM), identifying the root causes of continuous-time CMs' training instability and leading to significant improvements over the SOTA CMs
        Praised by the reviewers as having strong theoretical, empirical, and engineering contributions
    • Diffusion model as encoder
      • SihyunYu2025ICLR introduces a regularization term to align the representations induced by diffusion models with pre-trained self-supervised visual encoders. It boosts the training speed by 17x in an experiment
      • YiboYang2025ICLR draws a connection between diffusion models and compression. It repurposes diffusion models for image compression and proposes to use uniform noise instead of Gaussian noise
  • Flow matching
    • PeterHolderrieth2025ICLR shows that the core principles of flow matching can be vastly generalized to practically all continuous-time Markov processes using Markov generators, unifying all previous methods including diffusion model and opening the door to new generative models agnostic to data modality
      Praised by the reviewers as innovative and significant
    • PanagiotisTheodoropoulos2025ICLR draws inspiration from guided optimal transport schemes and introduces Feedback Schrödinger Bridge Matching, a semi-supervised matching framework that uses a small set of aligned pairs to guide the transport map of non-coupled samples
    • GabrieleCorso2025ICLR proposes unbalanced flow matching to introduce a trade-off between the sampling efficiency from prior distribution and the error of the target distribution approximation, for molecular docking problem

Transformer

  • JohannesVonOswald2025ICLR gives transformers a random seed as input, which is used to learn randomized algorithms, shown to be beneficial in adversarial situations, where it is known from classical computer science that added randomness "breaks" the adversary's attack
  • NeilRathi2025ICLR modifies the standard autoregressive setting by adding a regularizing term that encourages nearby parts of the model to behave similarly, with the motivation that this mimics known brain function
  • Attention
    • TianzhuYe2025ICLR mitigates the problem of over-allocating attention weights to irrelevant context by using the difference between two separate softmax attention maps to cancel the noise assigned to irrelevant contextual tokens
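      A simplified single-head sketch of the idea (no masking or the paper's normalization details; names are mine):
      ```python
      import torch

      def differential_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam):
          """The difference of two softmax attention maps cancels common-mode noise
          assigned to irrelevant context tokens; lam is a learnable scalar."""
          d = Wq1.shape[1]
          a1 = torch.softmax((x @ Wq1) @ (x @ Wk1).transpose(-1, -2) / d ** 0.5, dim=-1)
          a2 = torch.softmax((x @ Wq2) @ (x @ Wk2).transpose(-1, -2) / d ** 0.5, dim=-1)
          return (a1 - lam * a2) @ (x @ Wv)
      ```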
    • SimonSchug2025ICLR demonstrates a mathematical equivalence between multi-head attention and linear hyper-networks, which is then used to provide an explanation for why Transformers are able to compositionally generalize to some extent. Qualitative visualizations show that the heads are consistent with hypothesized compositional functions
      It may open new design space choices for improving attention schemes in general
    • Long context
      • XunhaoLai2025ICLR proposes a sparse attention mechanism for efficient long-sequence inference. The core idea is to dynamically adjust sparse attention with query-aware sparse pattern determination and cumulative-attention based index selection
  • Training efficiency
    • Cut Cross-Entropy (CCE) ErikWijmans2025ICLR proposes to drastically reduce the memory consumption of the cross-entropy loss by avoiding materializing the logits of all tokens in the vocabulary: it only computes the logit of the correct token and evaluates the log-sum-exp over the vocabulary on the fly
      As the vocabulary size grows, memory consumption increasingly shifts from weights and activations to the cross-entropy layer. Such a technique can reduce the memory footprint of loss computation from 24 GB to 1 MB in a 2B model #🧠
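      A rough PyTorch sketch of the idea; the actual CCE work uses a fused kernel with recomputation in the backward pass, and the chunking and names here are mine:
      ```python
      import torch

      def chunked_cross_entropy(hidden, targets, W_vocab, chunk=8192):
          """Cross-entropy without materializing the full (tokens x vocab) logit matrix:
          the correct-token logit is computed directly, and the log-sum-exp over the
          vocabulary is accumulated chunk by chunk."""
          # hidden: (n_tokens, d), targets: (n_tokens,), W_vocab: (vocab, d)
          correct_logit = (hidden * W_vocab[targets]).sum(-1)
          lse = torch.full_like(correct_logit, float("-inf"))
          for start in range(0, W_vocab.shape[0], chunk):
              logits_chunk = hidden @ W_vocab[start:start + chunk].T
              lse = torch.logaddexp(lse, torch.logsumexp(logits_chunk, dim=-1))
          return (lse - correct_logit).mean()
      ```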
  • Theoretical analysis
    • GiuseppeBruno2025ICLR studies a mean-field limit for a simplified model of transformers, using the framework of Geshkovski et al. (2023) and extending it in various theoretical aspects
    • JunoKim2025ICLR+ studies a simple setup of the k-parity problem with a 1-layer Transformer and provides separation results between transformers (i) trained without intermediate supervision and (ii) trained with teacher forcing, thereby showing the importance of chain-of-thought

RNN

  • RNN
    • RiccardoGrazzi2025ICLR demonstrates that extending the eigenvalue range of Mamba and DeltaNet to include negative values not only enables them to solve parity but consistently improves their performance on state-tracking tasks
    • LinOSS TKonstantinRusch2025ICLR  is inspired by cortical dynamics of biological neural networks and uses forced harmonic oscillators to form the state space. It outperforms Mamba and LRU by nearly 2x on a sequence modeling task with sequences of length 50k #🚀
  • Time series pattern machine (TSPM)
    • ShiyuWang2025ICLR transforms time series data into multi-resolution images to capture complex temporal and frequency-domain patterns, achieving impressive results in various time series analytical tasks

VAE

  • ChristopherFifty2025ICLR uses a rotation and rescaling linear transformation to propagate gradients through the vector quantization layer in Vector Quantized Variational AutoEncoders (VQ-VAEs). In previous works, gradient propagation simply skips this layer as it's non-differentiable
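    A sketch of how such a rotation-and-rescale pass-through could look; the detached-coefficient construction is my reading, not the authors' implementation, and the degenerate case e ≈ -q is ignored:
    ```python
    import torch

    def rotate_to_code(e, q, eps=1e-8):
        """Forward value equals the codebook vector q; gradients reach the encoder output e
        through a (detached) rotation + rescaling instead of a straight-through copy."""
        e_norm = e.norm(dim=-1, keepdim=True).clamp_min(eps)
        q_norm = q.norm(dim=-1, keepdim=True).clamp_min(eps)
        a, b = (e / e_norm).detach(), (q / q_norm).detach()
        w = a + b
        # rotation mapping the direction of e onto the direction of q, applied to e
        Re = (e
              - w * (w * e).sum(-1, keepdim=True) / (1 + (a * b).sum(-1, keepdim=True))
              + 2 * b * (a * e).sum(-1, keepdim=True))
        return (q_norm / e_norm).detach() * Re
    ```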

GNN

  • JonasLinkerhagner2025ICLR jointly denoises feature data and rewires the graph by aligning the singular vector subspaces of node features and the graph adjacency matrix to achieve spectral resonance
    The reviewers praised it as novel, interesting, and well-presented
  • YairDavidson2025ICLR investigates the separation power and stability of functions on graphs and multi-sets and provides explicit adversarial boundary cases. Based on the analysis, it proposes a pair-wise separation quality analysis framework based on an adaptation of Lipschitz and Hölder stability
  • Expressiveness
    • JingchuGai2025ICLR analyzes the expressive power of spectral invariant graph neural networks from the perspective of graph homomorphisms
      The reviewers praised that the theoretical contributions are significant and it addresses several open questions
    • TuoXu2025ICLR analyzes the logical expressiveness of arbitrary graph neural networks
    • YamEitan2025ICLR investigates the expressivity limitations of Higher-Order Message Passing (HOMP) architectures in Topological Deep Learning, identifying "topological blindspots" such as diameter, orientability, planarity, and homology. It proposes two new architectures as remedies
  • Safety
    • ZhiweiZhang2025ICLR+ presents a defense method against graph backdoor attacks, combining poisoned node detection and robust training

Neuroscience inspired

  • AKOrN TakeruMiyato2025ICLR takes on a long-standing idea from the computational neuroscience community, namely, that binding of different features in an input can be done using oscillations in unit activity that synchronize to indicate binding
  • MrinalMathur2025ICLR proposes to dynamically allocate network compute based on the difficulty of the inputs. The proposed method consists of two networks: a) a prediction network and b) an introspection network that decides which layers to run. It shows strong results of a 3-layer network surpassing much larger ResNet-50 and EfficientNet on ImageNet

Information theory inspired

  • KAN (Kolmogorov-Arnold Networks) ZimingLiu2025ICLR has learnable activation functions on all edges ("weights") -- every weight parameter is replaced by a univariate function parametrized as a spline (see the toy sketch after this list). It's claimed to have better sample efficiency, parameter efficiency, & interpretability than [[DavidERumelhart1986Nature|MLP]], thus suitable for AI4Science tasks. However, it's much slower to train #🔥
  • AndreasChristianSchneider2025ICLR proposes using Partial Information Decomposition as a local objective for neuron training and thus achieves neuron-level interpretability
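  A toy sketch of the "learnable activation on every edge" idea from the KAN entry above, using a fixed Gaussian basis instead of the paper's B-splines; all names and choices are illustrative:
  ```python
  import torch
  import torch.nn as nn

  class ToyKANLayer(nn.Module):
      """Each edge (i, j) carries its own learnable univariate function f_ij;
      output_j = sum_i f_ij(x_i)."""
      def __init__(self, d_in, d_out, n_basis=8, x_range=(-2.0, 2.0)):
          super().__init__()
          self.register_buffer("centers", torch.linspace(*x_range, n_basis))
          self.coef = nn.Parameter(0.1 * torch.randn(d_in, d_out, n_basis))  # per-edge coefficients

      def forward(self, x):                                               # x: (batch, d_in)
          phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2))       # (batch, d_in, n_basis)
          return torch.einsum("bik,iok->bo", phi, self.coef)              # (batch, d_out)
  ```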

Probabilistic methods

  • Bayesian optimization
    • ZhitongXu2025ICLR demonstrates that, on some problems, Bayesian optimization with a standard Gaussian process can perform well in high dimensions by simply using √d length-scales (see the sketch after this list)
    • SeunghunLee2025ICLR proposes to use (invertible) normalizing flows to solve the mismatch problem of latent Bayesian optimization, which arises from the reconstruction gap between input and latent spaces
    • KacperWyrwal2025ICLR proposes a technique for manifold-to-manifold Gaussian process regression. Similar in spirit to [[KaimingHe2016CVPR|ResNet]], it allows reverting to shallow models when additional complexity is unneeded
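    A toy illustration of the recipe summarized in the ZhitongXu2025ICLR entry above: a plain GP with an RBF kernel whose length-scale is simply set to √d (numpy sketch, not the authors' code):
    ```python
    import numpy as np

    def rbf_kernel(X1, X2, lengthscale):
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    def gp_posterior(X_train, y_train, X_test, noise=1e-6):
        """GP posterior mean/variance with lengthscale = sqrt(input dimension)."""
        ell = np.sqrt(X_train.shape[1])
        K = rbf_kernel(X_train, X_train, ell) + noise * np.eye(len(X_train))
        K_s = rbf_kernel(X_test, X_train, ell)
        mean = K_s @ np.linalg.solve(K, y_train)
        var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
        return mean, var
    ```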
  • Variational inference
    • ByoungwooPark2025ICLR proposes a multi-marginal Doob's h-transform for irregular time series and variational inference with stochastic optimal control to approximate it
  • Sampling
    • BadrMOUFAD2025ICLR proposes a diffusion-based method for posterior sampling in diffusion models based on decomposition of the transitions which allows a trade-off between the complexity of the intractable guidance term and that of the prior transitions
  • SiyuChen2025ICLR proposes a unified gradient-based algorithm for feature learning in Gaussian single-index model with sample complexity matching the SQ lower bound

Supervision

Analysis

  • Theoretic
    • PatrikReizinger2025ICLR+ proves that models trained with cross-entropy in supervised learning can recover latent factors of the data-generating process up to a linear transformation
    • JingyangLi2025ICLR theoretically justifies why FixMatch-like semi-supervised learning methods outperform supervised learning (SL) in generalization for deep networks, showing that FixMatch learns all class features while SL captures only a subset. It also proposes an enhanced version of FixMatch
    • KrishnaBalasubramanian2025ICLR provides a convergence analysis of the Stein Variational Gradient Descent (SVGD) algorithm in its full formulation, i.e., using finitely many particles and in discrete time
      P.S. Praised by the reviewers as providing a long sought result
    • SungyoonKim2025ICLR studies the loss landscape of regularized ReLU networks based on convex duality, focusing on the structure of stationary points, the connectivity of optimal solutions, and the non-uniqueness of optimal solutions. The authors start with a two-layer network with scalar output and consider extensions to minimal norm interpolation, vector-valued networks, and deep neural networks #🧠
      Praised by reviewers as insightful, especially the "staircase of connectivity" phenomenon
    • ArthurJacot2025ICLR+ shows that neural collapse provably holds in the end-to-end training of the model with weight decay, when low training error, balancedness of linear layers, and bounded conditioning of pre-linear features are met
  • Experimental
    • JiachenTWang2025ICLR addresses an important research gap of quantifying data influence in different stages via trajectory-specific leave-one-out (LOO) influence, which is approximated with data value embedding. It reveals that data points in the early and late stages of training exert a greater impact, leading to a practicable data selection strategy
    • ZhuangLiu2025ICLR demonstrates that despite the diversity and scale of current datasets, dataset bias persists, as evidenced by training neural networks to classify which dataset a sample belongs to
  • Interpretability
    • Weight attribution
      • ChingLamChoi2025ICLR identifies that gradient-based attribution methods with static baselines impose unintended biases on attribution maps, leading to fragility and unfaithful interpretations. It proposes to compute baselines by perturbing inputs in an "unlearning" direction to erase salient features while maintaining model-specific properties
    • Data attribution
      • JiachenTWang2025ICLR+ proposes In-run Data Shapley that requires only one model training run to evaluate the contributions of the data samples to the model. It's of great interest for scientific, technical and legal concerns (privacy, copyright)

Reinforcement learning

  • ChenJiang2025ICLR uses a biologically-inspired stochastic continuous Hopfield network to address the exploration-exploitation dilemma. It can perform posterior sampling with tunable uncertainty bias, matching human and animal choice patterns in multi-armed bandit (MAB) tasks
  • Data
    • MichaelMatthews2025ICLR procedurally generates tens of millions of 2D physics-based tasks and uses these to train a general reinforcement learning (RL) agent for physical control, mimicking the large-scale pre-training that has prevailed in language/vision domains. It exhibits strong zero-shot physical reasoning capabilities in 2D space
  • Skill learning
    • MartinKlissarov2025ICLR+ feeds natural language description of a skill to LLM to automatically design rewards, generate code, and then learn the skill via reinforcement learning
    • ChongyiZheng2025ICLR proposes a new method (contrastive successor features), which works within the paradigm of mutual information skill learning. The new method matches the performance of METRA, a method based on optimal transport, which achieves state of the art performance
    • PoWeiHuang2025ICLR integrates an option network into the MuZero algorithm, which autonomously discovers options through self-play games and utilizes options during hierarchical planning
      It makes options really work at scale for RL
    • RenhaoWang2025ICLR uses conditional diffusion models to generate samples for experience replay in RL and pushes these generations towards more useful parts of an agent's acquired history with relevance functions
  • Decision
    • DixantMittal2025ICLR introduces a new differentiable neural tree search architecture that learns directly from data trajectories, effectively embedding a search-like inductive bias into the neural network weights. It can be trained from just demonstration sequences, where search & exploration behavior are missing
    • JuanAgustinDuque2025ICLR introduces Advantage Alignment, a new family of algorithms for opponent shaping in general-sum games, designed to promote cooperation and avoid suboptimal outcomes. It unifies the existing opponent shaping methods and simplifies the mathematical formulation
    • MathiasJackermeier2025ICLR presents a method to perform multi-task RL with linear temporal logic (LTL) specifications, leveraging two recent innovations: eventual-discounting and goal-conditioned RL, to create RL agents that can zero-shot generalize to wide range of specifications
    • JiajianLi2025ICLR introduces LS-Imagine, a model-based RL method that uses hierarchical imagination to solve MineDojo tasks. The key idea is to use a short-term model for step-by-step transitions and a long-term one for multi-step transitions guided by learned affordance maps, which are computed using the MineCLIP reward model to identify task-specific spatial regions in the pixel space and then used to guide intrinsic rewards and long-horizon state predictions
    • KaustubhSridhar2025ICLR constructs generalist agents that can adapt to new environments via a retrieval-augmented policy that retrieves nearby states from demonstrations. It brings advances from LLMs, such as RAG and in-context learning, to RL
    • EricMazumdar2025ICLR incorporates risk aversion and bounded rationality from behavioral economics into multi-agent reinforcement learning (MARL). It defines a class of risk-averse quantal response equilibria (RQE) which, under certain adjustments, are no-regret learnable in both n-player matrix games and finite-horizon Markov games. Importantly, the tractability of RQE is determined by the agents' degree of risk aversion and bounded rationality rather than the underlying game structure
  • Theoretical
    • SaketTiwari2025ICLR proposes a new regularizer for actor-critic methods that work in continuous action spaces. The regularizer works by constraining the state to be in a low-dimensional manifold, backed by theoretical analysis & empirical results
    • RunzheWu2025ICLR+ provides a computationally tractable algorithm for the linear Bellman complete setting, which has remained an open question for years. The key ingredient is randomization, which ensures optimism while circumventing a subtle error amplification issue
    • HyunKyuLee2025ICLR investigates the relationship between flat minima and robustness in RL, finding that flatter minima correspond to more robust policies
      Great visuals
  • Analysis
    • ThomasBush2025ICLR uses concept-based interpretability to provide the first non-behavioural evidence that model-free agents can learn to plan over a set of concepts that are implicitly encoded in the learnt internal representation

Post-training learning

  • LoRA
    • HiRA QiushiHuang2025ICLR+ addresses the expressiveness issue of LoRA by using a Hadamard product to retain high-rank update parameters
    • LoRA-RITE JuiNanYen2025ICLR proposes an adaptive matrix preconditioning method for LoRA optimization to achieve transformation invariance, which can mitigate the dependence on how LoRA factors are scaled and rotated to avoid sub-optimal solutions and improve the representation capability
    • SD-LoRA YichenWu2025ICLR separates the learning of the magnitude and direction of LoRA components for continual learning, enabling incremental learning of task-specific LoRAs while maintaining the optimization directions of previous tasks
  • Distillation
    • AbhishekPanigrahi2025ICLR+ investigates why progressive distillation can mitigate the challenge that a better teacher doesn't always lead to a better student. It identifies that the benefit stems from an "implicit curriculum" embedded within intermediate teacher checkpoints, which accelerates the optimization process of the student
  • Continuous learning
    • GangweiJiang2025ICLR studies the phenomenon of catastrophic forgetting using Function Vectors and finds that task similarity is correlated with the amount of forgetting. It proposes two solutions: (i) intervening on the trained model using function vectors of previous tasks and (ii) training the model with additional regularization using the function vectors of previous tasks
    • ZhuoxiaoChen2025ICLR proposes a test-time adaptation framework for LiDAR-based 3D object detection. The main idea is to dynamically select and assemble historical checkpoints to build a composite "super model" that adapts to domain shifts #🧠
    • SongTang2025ICLR uses vision-language models (VLMs) for source-free domain adaptation (SFDA). The major contribution is addressing the noise of VLMs' supervision with proxy denoising (ProDe) before target adaptation

Federated learning

  • SungwonKim2025ICLR  uses synthetic global data generated from reliable node types to tackle challenges such as missing classes and mutable graph structures in federated graph learning
  • GuanchengWan2025ICLR tackles the challenge of backdoor attacks in Federated Graph Learning with Topological Graph Energy. At the client level, it uses energy modeling to distinguish between benign and malicious samples, while at the server level, it creates a global energy graph for energy propagation to detect and filter out malicious clients
  • WenjingYan2025ICLR achieves parameter-free FL through normalized gradient updates performed locally, which ensure that the step size and momentum parameters do not depend on problem-specific parameters like smoothness. It also incorporates control variate-based drift correction and momentum-based variance reduction

Architecture search

  • ArminWThomas2025ICLR proposes an architecture search framework with hierarchical design spaces based on linear input-varying systems. It encodes architectures as genomes optimized via evolutionary algorithms to achieve superior trade-offs across model quality, size, and cache efficiency

Optimization algorithm

  • LesiChen2025ICLR improves the complexity of second-order methods for convex-concave minimax problem by reusing the Hessian information
  • SiteBai2025ICLR provides a theoretical analysis of the oracle complexity of minimizing convex functions with varying degrees of high-order smoothness p and degrees of uniform convexity q
  • Highway-BP ErwanFagnou2025ICLR speeds up training of sequential DNN models by parallelizing the gradient calculation. It approximates the gradient by iteratively computing contributions from different residual paths in parallel while pruning some paths

Do you have any ideas or comments? Please join the discussion on X👇