Buy 4 REINFORCE samples, get a baseline for free!

IAM Huijben, W Kool, MB Paulus… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-)probabilities. Over the past years, the machine learning community …

Cited by 103 · All 9 versions

DIFUSCO: Graph-based diffusion solvers for combinatorial optimization

Z Sun, Y Yang - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc

Neural network-based Combinatorial Optimization (CO) methods have shown
promising results in solving various NP-complete (NPC) problems without relying on hand …

Cited by 88 · All 7 versions

POMO: Policy optimization with multiple optima for reinforcement learning

YD Kwon, J Choo, B Kim, I Yoon… - Advances in Neural …, 2020 - proceedings.neurips.cc

In neural combinatorial optimization (CO), reinforcement learning (RL) can turn a deep
neural net into a fast, powerful heuristic solver of NP-hard problems. This approach has a …

Cited by 326 · All 8 versions

DIMES: A differentiable meta solver for combinatorial optimization problems

R Qiu, Z Sun, Y Yang - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Recently, deep reinforcement learning (DRL) models have shown promising results in
solving NP-hard Combinatorial Optimization (CO) problems. However, most DRL solvers …

Cited by 77 · All 7 versions

RMM: Reinforced memory management for class-incremental learning

Y Liu, B Schiele, Q Sun - Advances in Neural Information …, 2021 - proceedings.neurips.cc

Class-Incremental Learning (CIL) [38] trains classifiers under a strict memory
budget: in each incremental phase, learning is done for new data, most of which is …

Cited by 109 · All 14 versions

Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning

Q Ma, S Ge, D He, D Thaker, I Drori - arXiv preprint arXiv:1911.04936, 2019 - arxiv.org

In this work, we introduce Graph Pointer Networks (GPNs) trained using reinforcement
learning (RL) for tackling the traveling salesman problem (TSP). GPNs build upon Pointer …

Cited by 251 · All 3 versions

A-NeSI: A scalable approximate method for probabilistic neurosymbolic inference

E van Krieken, T Thanapalasingam… - Advances in …, 2023 - proceedings.neurips.cc

We study the problem of combining neural networks with symbolic reasoning. Recently
introduced frameworks for Probabilistic Neurosymbolic Learning (PNL), such as …

Cited by 37 · All 8 versions

RLHF can speak many languages: Unlocking multilingual preference optimization for LLMs

J Dang, A Ahmadian, K Marchisio, J Kreutzer… - arXiv preprint arXiv …, 2024 - arxiv.org

Preference optimization techniques have become a standard final stage for training
state-of-the-art large language models (LLMs). However, despite widespread adoption, the vast majority …

Cited by 11 · All 4 versions

Learn to design the heuristics for vehicle routing problem

L Gao, M Chen, Q Chen, G Luo, N Zhu, Z Liu - arXiv preprint arXiv …, 2020 - arxiv.org

This paper presents an approach to learning local-search heuristics that iteratively improve
the solution of the Vehicle Routing Problem (VRP). A local-search heuristic is composed of a …

Cited by 78 · All 3 versions

A reinforcement learning approach to the orienteering problem with time windows

R Gama, HL Fernandes - Computers & Operations Research, 2021 - Elsevier

The Orienteering Problem with Time Windows (OPTW) is a combinatorial
optimization problem where the goal is to maximize the total score collected from different …

Cited by 51 · All 4 versions