Webformer: The web-page transformer for structure information extraction

K Lee, M Joshi, IR Turc, H Hu, F Liu… - International …, 2023 - proceedings.mlr.press

Visually-situated language is ubiquitous—sources range from textbooks with diagrams to
web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to …

Cited by 242 - Related articles - All 7 versions

Towards future internet: The metaverse perspective for diverse industrial applications

P Bhattacharya, D Saraswat, D Savaliya, S Sanghavi… - Mathematics, 2023 - mdpi.com

The Metaverse allows the integration of physical and digital versions of users, processes,
and environments where entities communicate, transact, and socialize. With the shift …

Cited by 89 - Related articles - All 6 versions

Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning

L Yan, C Han, Z Xu, D Liu, Q Wang - IJCAI, 2023 - ijcai.org

Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches
have been introduced to learn fixed textual or visual prompts while freezing the pre-trained …

Cited by 40 - Related articles - All 3 versions

Label-efficient video object segmentation with motion clues

Y Lu, J Zhang, S Sun, Q Guo, Z Cao… - … on Circuits and …, 2023 - ieeexplore.ieee.org

Video object segmentation (VOS) plays an important role in video analysis and
understanding, which in turn facilitates a number of diverse applications, including video …

Cited by 15 - Related articles

Feature fusion Vision Transformers using MLP-Mixer for enhanced deepfake detection

E Essa - Neurocomputing, 2024 - Elsevier

Deepfake technology, utilizing deep learning and computer vision, presents significant
security threats by generating highly realistic synthetic media, such as images and videos. In …

Cited by 6 - Related articles - All 2 versions

Weblinx: Real-world website navigation with multi-turn dialogue

XH Lù, Z Kasner, S Reddy - arXiv preprint arXiv:2402.05930, 2024 - arxiv.org

We propose the problem of conversational web navigation, where a digital agent controls a
web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue …

Cited by 29 - Related articles - All 4 versions

Learning to generate question by asking question: A primal-dual approach with uncommon word generation

Q Wang, L Yang, X Quan, F Feng, D Liu… - Proceedings of the …, 2022 - aclanthology.org

Automatic question generation (AQG) is the task of generating a question from a given
passage and an answer. Most existing AQG methods aim at encoding the passage and the …

Cited by 10 - Related articles

WIERT: web information extraction via render tree

Z Li, B Shao, L Shou, M Gong, G Li… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Web information extraction (WIE) is a fundamental problem in web document understanding,
with a significant impact on various applications. Visual information plays a crucial role in …

Cited by 6 - Related articles - All 2 versions

A triangulation-based visual localization for field robots

J Liang, Y Wang, Y Chen, B Yang… - IEEE/CAA Journal of …, 2022 - ieeexplore.ieee.org

Dear Editor, Visual localization relies on local features and searches a pre-stored GPS-
tagged image database to retrieve the reference image with the highest similarity in feature …

Cited by 20 - Related articles - All 6 versions

Smartave: Structured multimodal transformer for product attribute value extraction

Q Wang, L Yang, J Wang, J Krishnan… - Findings of the …, 2022 - aclanthology.org

Automatic product attribute value extraction refers to the task of identifying values of an
attribute from the product information. Product attributes are essential in improving online …

Cited by 14 - Related articles - All 2 versions