Medical image segmentation review: The success of U-Net
Automatic medical image segmentation is a crucial topic in the medical domain and
successively a critical counterpart in the computer-aided diagnosis paradigm. U-Net is the …
A survey of transformers
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …
BiFormer: Vision transformer with bi-level routing attention
As the core building block of vision transformers, attention is a powerful tool to capture
long-range dependency. However, such power comes at a cost: it incurs a huge computation …
Discovering faster matrix multiplication algorithms with reinforcement learning
Improving the efficiency of algorithms for fundamental computations can have a widespread
impact, as it can affect the overall speed of a large amount of computations. Matrix …
Evolutionary-scale prediction of atomic-level protein structure with a language model
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …
MaxViT: Multi-axis vision transformer
Transformers have recently gained significant attention in the computer vision community.
However, the lack of scalability of self-attention mechanisms with respect to image size has …
Video diffusion models
Generating temporally coherent high fidelity video is an important milestone in generative
modeling research. We make progress towards this milestone by proposing a diffusion …
SimVP: Simpler yet better video prediction
From CNN, RNN, to ViT, we have witnessed remarkable advancements in video
prediction, incorporating auxiliary inputs, elaborate neural architectures, and sophisticated …
Transformer quality in linear time
We revisit the design choices in Transformers, and propose methods to address their
weaknesses in handling long sequences. First, we propose a simple layer named gated …
CSWin Transformer: A general vision transformer backbone with cross-shaped windows
We present CSWin Transformer, an efficient and effective Transformer-based
backbone for general-purpose vision tasks. A challenging issue in Transformer design is …