Multimodal foundation models: From specialists to general-purpose assistants
An introduction to neural data compression
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders
Pre-training on massive image data has become the de facto standard for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …
Context autoencoder for self-supervised representation learning
We present a novel masked image modeling (MIM) approach, context autoencoder (CAE),
for self-supervised representation pretraining. We pretrain an encoder by making predictions …
Hiera: A hierarchical vision transformer without the bells-and-whistles
Modern hierarchical vision transformers have added several vision-specific components in
the pursuit of supervised classification performance. While these components lead to …
What to hide from your students: Attention-guided masked image modeling
Transformers and masked language modeling are quickly being adopted and explored in
computer vision as vision transformers and masked image modeling (MIM). In this work, we …
Masked image modeling with local multi-scale reconstruction
Masked Image Modeling (MIM) achieves outstanding success in self-supervised
representation learning. Unfortunately, MIM models typically have huge computational …
A survey on masked autoencoder for self-supervised learning in vision and beyond
"Masked autoencoders are scalable vision learners", the title of MAE \cite{he2022masked},
suggests that self-supervised learning (SSL) in vision might undertake a similar …
Masked modeling for self-supervised representation learning on vision and beyond
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …
TorchSparse++: Efficient training and inference framework for sparse convolution on GPUs
Sparse convolution plays a pivotal role in emerging workloads, including point cloud
processing in AR/VR, autonomous driving, and graph understanding in recommendation …