VNVC: A versatile neural video coding framework for efficient human-machine vision

X Sheng, L Li, D Liu, H Li - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 - ieeexplore.ieee.org
Almost all digital videos are coded into compact representations before being transmitted. These representations must be decoded back to pixels before being displayed to humans and, as usual, before being enhanced or analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance or analyze the coded representations directly, without decoding them into pixels. We therefore propose a versatile neural video coding (VNVC) framework that learns compact representations supporting both reconstruction and direct enhancement/analysis, making it versatile for both human and machine vision. The VNVC framework has a feature-based compression loop. In the loop, one frame is encoded into compact representations and decoded to an intermediate feature that is obtained before performing reconstruction. The intermediate feature serves as a reference for motion estimation and motion compensation, through feature-based temporal context mining and a cross-domain motion encoder-decoder, to compress the following frames. To evaluate its effectiveness, the intermediate feature is fed directly into video reconstruction, video enhancement, and video analysis networks. The evaluation shows that the framework with the intermediate feature achieves high compression efficiency for video reconstruction and satisfactory task performance with lower complexity.
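The data flow of the feature-based compression loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `encode`, `decode_feature`, and `run_loop` are hypothetical, the learned networks are replaced by simple residual quantization, and motion estimation, motion compensation, and temporal context mining are omitted entirely. It only shows that the decoded intermediate feature is both the reference for the next frame and the direct input to the task heads.

```python
from typing import Callable, Dict, List, Optional

Frame = List[float]    # toy stand-in for a video frame
Feature = List[float]  # toy stand-in for the intermediate feature


def encode(frame: Frame, ref: Optional[Feature], step: float = 0.5) -> List[int]:
    """Hypothetical encoder: quantize the residual against the reference feature."""
    pred = ref if ref is not None else [0.0] * len(frame)
    return [round((x - p) / step) for x, p in zip(frame, pred)]


def decode_feature(code: List[int], ref: Optional[Feature], step: float = 0.5) -> Feature:
    """Hypothetical decoder: recover the intermediate feature, not pixels."""
    pred = ref if ref is not None else [0.0] * len(code)
    return [p + q * step for p, q in zip(pred, code)]


def run_loop(
    frames: List[Frame],
    heads: Dict[str, Callable[[Feature], object]],
) -> Dict[str, list]:
    """Run the loop: each decoded feature feeds every head and the next frame."""
    ref: Optional[Feature] = None
    outputs: Dict[str, list] = {name: [] for name in heads}
    for frame in frames:
        code = encode(frame, ref)          # compact representation
        feat = decode_feature(code, ref)   # intermediate feature
        for name, head in heads.items():
            outputs[name].append(head(feat))  # tasks consume the feature directly
        ref = feat                         # feature becomes the next reference
    return outputs


# Assumed toy heads: identity "reconstruction" and a mean-value "analysis".
heads = {
    "reconstruction": lambda f: list(f),
    "analysis": lambda f: sum(f) / len(f),
}
results = run_loop([[1.0, 2.0], [1.5, 2.5]], heads)
```

In this sketch the analysis head never sees reconstructed pixels, which mirrors the abstract's claim that tasks can operate on the intermediate feature; any efficiency gain in the real framework depends on the learned networks, which are not modeled here.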