Understanding contrastive representation learning through alignment and uniformity on the hypersphere
Abstract
Contrastive representation learning has been outstandingly successful in practice. In this work, we identify two key properties related to the contrastive loss: (1) alignment (closeness) of features from positive pairs, and (2) uniformity of the induced distribution of the (normalized) features on the hypersphere. We prove that, asymptotically, the contrastive loss optimizes these properties, and analyze their positive effects on downstream tasks. Empirically, we introduce an optimizable metric to quantify each property. Extensive experiments on standard vision and language datasets confirm the strong agreement between both metrics and downstream task performance. Directly optimizing for these two metrics leads to representations with comparable or better performance at downstream tasks than contrastive learning.
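The two metrics described in the abstract can be sketched as loss functions over L2-normalized feature vectors: alignment as the mean distance between positive-pair features, and uniformity as the log of the average pairwise Gaussian potential. The NumPy code below is a minimal illustration of this idea, not the paper's reference implementation; the exponents `alpha=2` and `t=2` are the defaults used in the paper.

```python
import numpy as np

def align_loss(x, y, alpha=2):
    """Alignment: expected distance between positive-pair features.

    x, y: arrays of shape (N, D), row i of x and row i of y are a
    positive pair; rows are assumed L2-normalized (on the hypersphere).
    """
    return np.mean(np.linalg.norm(x - y, axis=1) ** alpha)

def uniform_loss(x, t=2):
    """Uniformity: log of the mean Gaussian potential over distinct pairs.

    Lower values mean features are spread more uniformly on the sphere.
    """
    # pairwise squared Euclidean distances between all rows of x
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # keep each unordered pair once (strict upper triangle)
    i, j = np.triu_indices(len(x), k=1)
    return np.log(np.mean(np.exp(-t * sq_dists[i, j])))
```

Perfectly aligned pairs give `align_loss` of 0, while spreading points apart drives `uniform_loss` down, so jointly minimizing both trades off closeness of positives against spread of the whole feature distribution.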
proceedings.mlr.press