Rank-DETR for high quality object detection
Modern detection transformers (DETRs) use a set of object queries to predict a list of
bounding boxes, sort them by their classification confidence scores, and select the top …
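A minimal sketch of the confidence-ranking step this abstract describes: score each object query, sort by classification confidence, and keep the top-K boxes. The tensor shapes, the sigmoid scoring, and top_k=100 are illustrative assumptions, not the paper's configuration.

```python
import torch

def select_top_k(class_logits: torch.Tensor, boxes: torch.Tensor, top_k: int = 100):
    """Rank per-query predictions by classification confidence and keep the top_k.

    class_logits: (num_queries, num_classes) raw logits from the detection head.
    boxes:        (num_queries, 4) predicted boxes, e.g. normalized (cx, cy, w, h).
    """
    probs = class_logits.sigmoid()                    # per-class confidence per query
    scores, labels = probs.max(dim=-1)                # best class and its score per query
    order = scores.argsort(descending=True)[:top_k]   # sort queries by confidence
    return boxes[order], scores[order], labels[order]

# Toy usage: 300 queries, 80 classes (COCO-like), keep the 100 highest-scoring boxes.
logits, boxes = torch.randn(300, 80), torch.rand(300, 4)
top_boxes, top_scores, top_labels = select_top_k(logits, boxes, top_k=100)
```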
Train once, get a family: State-adaptive balances for offline-to-online reinforcement learning
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-
training on a pre-collected dataset with fine-tuning in an online environment. However, the …
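A toy, tabular sketch of the offline-to-online paradigm described above: Q-learning updates run first on a fixed pre-collected dataset, then continue on freshly collected transitions. The chain MDP, hyperparameters, and epsilon-greedy exploration are all invented for illustration and are not the paper's state-adaptive method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, lr = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

def step(s, a):
    """Toy chain MDP: action 1 moves right, action 0 moves left, reward 1 at the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)

# Phase 1: offline pre-training on a fixed, pre-collected dataset of transitions.
dataset = [(s, a, *step(s, a)) for s, a in zip(rng.integers(n_states, size=2000),
                                               rng.integers(n_actions, size=2000))]
for s, a, s_next, r in dataset:
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])

# Phase 2: online fine-tuning, continuing the same updates on newly collected experience.
s = 0
for _ in range(2000):
    a = int(Q[s].argmax()) if rng.random() > 0.1 else int(rng.integers(n_actions))
    s_next, r = step(s, a)
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
    s = 0 if r > 0 else s_next

print(Q.argmax(axis=1))  # greedy policy after pre-training plus fine-tuning
```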
Understanding, predicting and better resolving Q-value divergence in offline-RL
The divergence of Q-value estimation has been a prominent issue in offline reinforcement
learning (offline RL), where the agent has no access to real dynamics. Traditional beliefs …
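One common account of this divergence is bootstrapping through out-of-distribution actions under function approximation. The toy construction below (one state, two actions sharing a single weight, rewards all zero) is a hand-made illustration of that mechanism, not the paper's analysis: the TD target keeps bootstrapping from the never-observed action, so the weight inflates on every update.

```python
import numpy as np

# Q(s, a0) = w and Q(s, a1) = 2 * w share one weight; the dataset only contains a0,
# rewards are 0, and the single state transitions to itself.
gamma, lr, w = 0.99, 0.1, 1.0
for it in range(50):
    q = np.array([w, 2.0 * w])
    target = 0.0 + gamma * q.max()   # bootstraps from the out-of-distribution action a1
    w += lr * (target - q[0])        # regress only the in-dataset action a0 toward the target
    if it % 10 == 0:
        print(f"iter {it:2d}: w = {w:.2f}")
# w grows geometrically (factor 1 + lr * (2 * gamma - 1) per step) even though the true value is 0.
```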
Counterfactual-augmented importance sampling for semi-offline policy evaluation
In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative
evaluation using observational data can help practitioners understand the generalization …
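For context, the standard per-trajectory importance-sampling estimator that such off-policy evaluation builds on can be written as below; the trajectory format and the two toy policies are invented for illustration, and this is not the paper's counterfactual-augmented estimator.

```python
import numpy as np

def is_estimate(trajectories, pi_eval, pi_behavior, gamma=1.0):
    """Ordinary importance-sampling estimate of the evaluation policy's expected return.

    Each trajectory is a list of (state, action, reward) tuples; pi_eval(a, s) and
    pi_behavior(a, s) give the probability of taking action a in state s.
    """
    estimates = []
    for traj in trajectories:
        rho, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_eval(a, s) / pi_behavior(a, s)  # cumulative likelihood ratio
            ret += (gamma ** t) * r
        estimates.append(rho * ret)                   # reweight the observed return
    return float(np.mean(estimates))

# Toy usage: uniform behavior policy, evaluation policy that prefers action 1.
pi_b = lambda a, s: 0.5
pi_e = lambda a, s: 0.8 if a == 1 else 0.2
trajs = [[(0, 1, 1.0), (1, 0, 0.0)], [(0, 0, 0.0), (1, 1, 1.0)]]
print(is_estimate(trajs, pi_e, pi_b))
```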
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
In offline reinforcement learning, the challenge of out-of-distribution (OOD) actions is pronounced.
To address this, existing methods often constrain the learned policy through policy …
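A generic example of the policy-constraint idea this abstract refers to is a behavior-cloning penalty added to the policy objective, in the style of TD3+BC; the sketch below shows that generic regularizer, not this paper's adaptive advantage-guided weighting, and the batch values are made up.

```python
import numpy as np

def regularized_policy_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Behavior-regularized policy loss for one minibatch (TD3+BC-style).

    Minimizes  -lambda * Q(s, pi(s)) + ||pi(s) - a_data||^2,  where lambda rescales the
    Q term so the squared-error penalty keeps the policy close to dataset actions.
    """
    lam = alpha / (np.abs(q_values).mean() + 1e-8)                # normalize the Q term
    rl_term = -lam * q_values.mean()                              # push toward high-value actions
    bc_term = ((policy_actions - dataset_actions) ** 2).mean()    # stay near the data
    return rl_term + bc_term

# Toy batch: 4 transitions with 2-dimensional continuous actions.
q = np.array([1.0, 2.0, 0.5, 1.5])
pi_a = np.array([[0.1, 0.2], [0.0, 0.1], [0.3, 0.0], [0.2, 0.2]])
data_a = np.array([[0.0, 0.2], [0.1, 0.1], [0.3, 0.1], [0.2, 0.3]])
print(regularized_policy_loss(q, pi_a, data_a))
```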