A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning

IAM Huijben, W Kool, MB Paulus… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-) probabilities. Over the past years, the machine learning community …
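The snippet above describes the Gumbel-max trick. Here is a minimal Python sketch of it, assuming the standard formulation: add independent standard Gumbel(0, 1) noise to each unnormalized log-probability and take the argmax. The resulting index is distributed as the softmax of the logits. The helper names `gumbel_max_sample` and `softmax` are hypothetical and illustrative only. They are not taken from the reviewed paper.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Normalize unnormalized log-probabilities into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_max_sample(logits, rng=random):
    """Draw one categorical sample via the Gumbel-max trick.

    Returns argmax_i (logits[i] + g_i) with g_i ~ Gumbel(0, 1) drawn independently.
    Logits may be unnormalized: adding a constant to every entry leaves the argmax unchanged.
    """
    best_index, best_value = -1, -math.inf
    for i, logit in enumerate(logits):
        u = rng.random()  # uniform in [0, 1)
        while u == 0.0:  # avoid log(0); u == 1.0 cannot occur with random.random()
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))  # inverse-CDF sample of Gumbel(0, 1)
        value = logit + gumbel_noise
        if value > best_value:
            best_index, best_value = i, value
    return best_index


if __name__ == "__main__":
    # Compare empirical sample frequencies with the softmax probabilities.
    rng = random.Random(0)
    logits = [1.0, 2.0, 0.5]
    n = 50_000
    counts = Counter(gumbel_max_sample(logits, rng) for _ in range(n))
    for i, p in enumerate(softmax(logits)):
        print(f"category {i}: empirical {counts[i] / n:.3f}, softmax {p:.3f}")
```

The sampler only compares perturbed logits. It therefore never computes the normalizing constant, which is why the unnormalized log-probabilities are sufficient input.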

DIFUSCO: Graph-based diffusion solvers for combinatorial optimization

Z Sun, Y Yang - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Neural network-based Combinatorial Optimization (CO) methods have shown
promising results in solving various NP-complete (NPC) problems without relying on hand …

POMO: Policy optimization with multiple optima for reinforcement learning

YD Kwon, J Choo, B Kim, I Yoon… - Advances in Neural …, 2020 - proceedings.neurips.cc
In neural combinatorial optimization (CO), reinforcement learning (RL) can turn a deep
neural net into a fast, powerful heuristic solver of NP-hard problems. This approach has a …

DIMES: A differentiable meta solver for combinatorial optimization problems

R Qiu, Z Sun, Y Yang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Recently, deep reinforcement learning (DRL) models have shown promising results in
solving NP-hard Combinatorial Optimization (CO) problems. However, most DRL solvers …

RMM: Reinforced memory management for class-incremental learning

Y Liu, B Schiele, Q Sun - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Class-Incremental Learning (CIL)[38] trains classifiers under a strict memory
budget: in each incremental phase, learning is done for new data, most of which is …

Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning

Q Ma, S Ge, D He, D Thaker, I Drori - arXiv preprint arXiv:1911.04936, 2019 - arxiv.org
In this work, we introduce Graph Pointer Networks (GPNs) trained using reinforcement
learning (RL) for tackling the traveling salesman problem (TSP). GPNs build upon Pointer …

A-NeSI: A scalable approximate method for probabilistic neurosymbolic inference

E van Krieken, T Thanapalasingam… - Advances in …, 2023 - proceedings.neurips.cc
We study the problem of combining neural networks with symbolic reasoning. Recently
introduced frameworks for Probabilistic Neurosymbolic Learning (PNL), such as …

RLHF can speak many languages: Unlocking multilingual preference optimization for LLMs

J Dang, A Ahmadian, K Marchisio, J Kreutzer… - arXiv preprint arXiv …, 2024 - arxiv.org
Preference optimization techniques have become a standard final stage for training state-of-the-art
large language models (LLMs). However, despite widespread adoption, the vast majority …

Learn to design the heuristics for vehicle routing problem

L Gao, M Chen, Q Chen, G Luo, N Zhu, Z Liu - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents an approach to learn the local-search heuristics that iteratively improves
the solution of Vehicle Routing Problem (VRP). A local-search heuristics is composed of a …

A reinforcement learning approach to the orienteering problem with time windows

R Gama, HL Fernandes - Computers & Operations Research, 2021 - Elsevier
The Orienteering Problem with Time Windows (OPTW) is a combinatorial
optimization problem where the goal is to maximize the total score collected from different …