DDS-LOGO

Grounding DINO 1.5 Edge

Wenlong Liu · May 17, 2024

We introduce Grounding DINO 1.5 Edge, which is our most efficient, edge-optimized open-set object detection model. We replace the original feature enhancer with our newly proposed efficient one and employ EfficientViT-L1 as the image backbone for rapid multi-scale feature extraction. In our efficient feature enhancer, we limit cross-modality fusion to high-level image features (P5 level) only and introduce a cross-scale feature fusion module to integrate low-level image feature(P3, P4). This approach balances feature enhancement with computational efficiency.

image

Figure 1: Comparison of (a) Origin Feature Enhancer and (b) New Efficient Feature Enhancer.

State-of-the-art Zero-Shot Performance

After pre-training on Grounding-20M, we directly evaluate Grounding DINO 1.5 Edge on the COCO dataset and LVIS dataset in a zero-shot manner. The main results are shown in Table 1. Speed measurement is performed on the A100 GPU, expressed in frames per second (FPS). The format used is PyTorch speed / TensorRT FP32 speed, and FPS on NVIDIA Orin NX also reported. †indicates results of YOLO-World are reproduced by the latest official codes. ‡ indicates it used language cache, which did not calculate the latency of text encoder.

image

Table 1: Zero-shot Results of Grounding DINO 1.5 Edge on COCO and LVIS.

Compared with current real-time open-set detectors in an end-to-end test setting, which do not use language cache, Grounding DINO 1.5 Edge achieved a zero-shot AP of 45.0 on COCO. Regarding the zero-shot performance on LVIS-minival, Grounding DINO 1.5 Edge achieves remarkable performance, an AP score of 36.2, which surpasses all other state-of-the-art algorithms (OmDet-Turbo-T 30.3 AP, YOLO-Worldv2-L 32.9 AP, YOLO-Worldv2-M 30.0 AP, YOLO-Worldv2-S 22.7 AP). Notably, deploying Grounding DINO 1.5 Edge model optimized with TensorRT on NVIDIA Orin NX achieves an inference speed of over 10 FPS at an input size of 640x640.

Acknowledgment

We would like to thank everyone involved in the Grounding DINO 1.5 project, including application lead Wei Liu, product manager Qin Liu and Xiaohui Wang, front-end developers Yuanhao Zhu, Ce Feng, and Jiongrong Fan, back-end developers Weiqiang Hu and Zhiqiang Li, UX designer Xinyi Ruan, tester Yinuo Chen, and Zijun Deng for helping with demo videos.

Try Grounding DINO 1.5 Edge Demo

Read Grounding DINO 1.5 Paper

Go to Github

Apply for API