DINO-X MCP Tutorial Part 1: How to Connect to the MCP Server

I. What is DINO-X MCP?

MCP, short for Model Context Protocol, is a protocol released by Anthropic. It aims to provide large language models (LLMs) with a standardized way to access external context data sources and tools.

MCP serves as a bridge between agents and external tools. Through MCP, developers can add third-party tools or services to agents, giving them stronger task execution and understanding capabilities. For example, combining an object detection service with a large language model yields accurate descriptions of objects or scenes, and combining a large language model with travel and map services supports comprehensive itinerary and route planning.

In short, MCP allows AI to break free from the limits of its own capabilities and rely on powerful external tools to perform more specialized and complex tasks. DINO-X MCP, in turn, provides users with object detection services built on the state-of-the-art DINO-X vision model.

II. Why Do We Need DINO-X MCP?

Although many multimodal models on the market can already understand and describe images, they often lack precise localization of visual content and high-quality structured output, and they are prone to errors caused by over-reliance on their large body of prior knowledge.

[Figure 1-1]

By using DINO-X MCP, you can integrate more accurate object detection into your workflow, including but not limited to:

(1) Achieving fine-grained understanding of images, including full-image recognition, targeted detection, etc.;

(2) Accurately obtaining the quantity, position, and attributes of objects, and using this as a basis for tasks such as image question answering;

(3) Combining with other MCP Servers to more flexibly and quickly build multi-step visual workflows;

(4) Building natural language-driven visual agents for automated tasks in real scenarios.
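To make point (2) concrete, here is a minimal Python sketch of how structured detection results could feed a counting or question-answering step. The record layout (category, bbox, score) is purely hypothetical and chosen for illustration; it is not the actual DINO-X MCP response schema, and the helper names are ours.

```python
from collections import Counter

# Hypothetical detections for illustration only;
# NOT the actual DINO-X MCP response schema.
detections = [
    {"category": "carton", "bbox": [12, 30, 110, 140], "score": 0.93},
    {"category": "carton", "bbox": [130, 28, 230, 142], "score": 0.88},
    {"category": "forklift", "bbox": [300, 60, 520, 310], "score": 0.76},
]


def count_by_category(items, min_score=0.5):
    """Count detections per category, ignoring low-confidence ones."""
    return Counter(d["category"] for d in items if d["score"] >= min_score)


def bbox_center(bbox):
    """Return the (x, y) center of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```

With results in a shape like this, a question such as "how many cartons are there?" reduces to a lookup in the counter rather than a guess by the language model.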

III. Application Scenarios of DINO-X MCP

DINO-X is like the eyes of AI in the real world, helping AI better see the world and interact with reality. Almost all application scenarios involving real-world content can integrate DINO-X MCP into their workflows. The following are 5 common cases:

(1) Object Detection and Localization

For example, use flame detection to identify and locate a fire in the woods.

[Figure 2]

(2) Object Counting

Mark and count all specified objects in a specific scene, such as counting the number of cartons in a warehouse.

[Figure 3]

(3) Feature Detection

Detect and locate target objects according to user-specified features (attributes, positions, etc.), such as all red cars in a parking lot.

[Figure 4]

(4) Pose Analysis

Detect key points of target objects (people, animals, etc.) and combine them with a large language model to interpret the target's pose, as in yoga pose analysis.

[Figure 5]

(5) Diet Monitoring

Accurately detect the types and quantities of food in a meal to support more precise diet monitoring, such as calorie calculation and nutritional analysis.

[Figure 6-3]

IV. How to Connect to DINO-X MCP Server

Any AI coding IDE that supports MCP servers can use DINO-X MCP, including but not limited to Cursor, Trae, and Windsurf.

DINO-X MCP comes in two versions: online and local deployment. The online version requires no environment setup. Local deployment requires a runtime environment and currently supports only npx, which ships with Node.js. If you want to connect to DINO-X MCP through local deployment, first download and install Node.js from the official website.
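Before choosing local deployment, it can help to confirm that Node.js and npx are actually on your PATH. The following is a minimal Python sketch using only the standard library; the helper name find_tool is ours and not part of DINO-X MCP.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    # Local deployment needs both of these; the online version needs neither.
    for tool in ("node", "npx"):
        path = find_tool(tool)
        print(f"{tool}: {path or 'not found; install Node.js first'}")
```

If either tool is reported as not found, install Node.js and reopen your terminal or IDE so the updated PATH is picked up.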

1. Cursor IDE

(1) On the Cursor IDE page, click "Tools & Integrations" on the left, then click "Add Custom MCP" to enter the MCP settings interface.

(2) Fill in the corresponding JSON content in the pop-up JSON configuration page:

a. For the online version, the JSON configuration is:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "https://mcp.deepdataspace.com/mcp?key=your-api-key"
    }
  }
}

Replace the 'your-api-key' placeholder in the URL with your API key from the DINO-X Platform.
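If you prefer to generate the online configuration instead of editing it by hand, a minimal Python sketch could look like the following. The endpoint is taken from the configuration above; reading the key from a DINOX_API_KEY environment variable is our own convenience assumption for the online version, and build_online_config is a hypothetical helper name.

```python
import json
import os
from urllib.parse import urlencode

# Endpoint copied from the online configuration shown above.
MCP_ENDPOINT = "https://mcp.deepdataspace.com/mcp"


def build_online_config(api_key: str) -> dict:
    """Build the online-version MCP config, URL-encoding the API key."""
    if not api_key or api_key == "your-api-key":
        raise ValueError("Set a real DINO-X API key first")
    url = f"{MCP_ENDPOINT}?{urlencode({'key': api_key})}"
    return {"mcpServers": {"dinox-mcp": {"url": url}}}


if __name__ == "__main__":
    # Assumption: the key is exported as DINOX_API_KEY in your shell.
    key = os.environ.get("DINOX_API_KEY", "")
    try:
        print(json.dumps(build_online_config(key), indent=2))
    except ValueError as err:
        print(f"Cannot build config: {err}")
```

Keeping the key in an environment variable avoids pasting it into files that may end up in version control.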

b. For local deployment, the JSON configuration is:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

Replace the 'your-api-key-here' placeholder with your API key from the DINO-X Platform, and replace '/path/to/your/image/directory' with a local path (such as a specific folder on your machine).

Finally, save and close the configuration.
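If your IDE already has other MCP servers configured, you will want to add the dinox-mcp entry without overwriting them. The sketch below shows one way to do that in Python. The helper names are hypothetical, and the function works on the JSON text so that it makes no assumption about where your IDE stores its configuration file.

```python
import json
from pathlib import Path


def local_server_entry(api_key: str, image_dir: str) -> dict:
    """Return the 'dinox-mcp' entry for local (npx) deployment."""
    return {
        "command": "npx",
        "args": ["-y", "@deepdataspace/dinox-mcp"],
        "env": {
            "DINOX_API_KEY": api_key,
            # Resolve to an absolute path so the value does not depend on
            # whichever working directory the IDE starts the server in.
            "IMAGE_STORAGE_DIRECTORY": str(Path(image_dir).expanduser().resolve()),
        },
    }


def merge_server(config_text: str, name: str, entry: dict) -> str:
    """Add or replace one server in an existing MCP JSON config string."""
    config = json.loads(config_text) if config_text.strip() else {}
    config.setdefault("mcpServers", {})[name] = entry
    return json.dumps(config, indent=2)
```

Calling merge_server with the current contents of the configuration and the result of local_server_entry returns the updated JSON, leaving any other entries under "mcpServers" untouched.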

(3) Return to the settings interface and open the 'Tools & Integrations' page. DINO-X MCP should now appear as configured, and it is activated by default.

(4) You can also select your favorite model on the "Models" page.

(5) Press CTRL/CMD + L to open the chat panel on the right side of the editor, and select "Agent" mode to call DINO-X MCP. Cursor IDE currently supports direct image upload. If you instead provide an image by local path or HTTP link in the conversation, the reference must strictly follow the 'file:///path/to/your/image/directory' or 'https://' format.
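Local paths with spaces or non-ASCII characters are easy to get wrong when writing a file:/// reference by hand. As a small sketch, Python's standard pathlib can produce a correctly encoded URI; the function names here are ours, chosen for illustration.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Convert a local path to a file:/// URI, percent-encoding special characters."""
    # as_uri() requires an absolute path, so resolve it first.
    return Path(path).expanduser().resolve().as_uri()


def is_supported_image_ref(ref: str) -> bool:
    """Check that a reference uses one of the two formats described above."""
    return ref.startswith("file:///") or ref.startswith("https://")


if __name__ == "__main__":
    uri = to_file_uri("~/Pictures/my photo.jpg")
    print(uri, is_supported_image_ref(uri))
```

The resulting string can be pasted into the conversation as the image reference.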

2. Trae IDE

(1) Open Trae IDE and click the settings button in the upper right corner to enter the "AI Management" settings interface.

[Figure: Trae 1]

(2) In the "Agents" settings interface, select the "MCP" option and choose "Add Manually".

[Figure: Trae 2]

(3) Fill in the corresponding JSON configuration content in the pop-up manual configuration box. For example:

a. For the online version, the JSON configuration is:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "https://mcp.deepdataspace.com/mcp?key=your-api-key"
    }
  }
}

Replace the 'your-api-key' placeholder in the URL with your API key from the DINO-X Platform.

b. For local deployment, the JSON configuration is:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

Once the configuration is saved, Trae IDE starts the MCP server by default, and you can call DINO-X MCP directly from the conversation interface. Trae IDE currently supports direct image upload. If you instead provide an image by local path or HTTP link in the conversation, the reference must strictly follow the 'file:///path/to/your/image/directory' or 'https://' format.
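A common reason for a connection failure is a placeholder that was never replaced. The following Python sketch scans a configuration for leftover placeholder text before you paste it into the IDE. The marker strings are our own heuristic based on the sample configurations in this tutorial, and find_placeholders is a hypothetical helper name.

```python
import json

# Heuristic markers taken from the sample configs in this tutorial;
# adjust them if your real key or path could contain these substrings.
PLACEHOLDERS = ("api-key", "/path/to/your/")


def find_placeholders(config_text: str) -> list:
    """Return the JSON paths whose string values still contain a placeholder."""
    hits = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif isinstance(node, str) and any(p in node for p in PLACEHOLDERS):
            hits.append(path)

    walk(json.loads(config_text), "")
    return hits
```

An empty result means no known placeholder remains; it does not verify that the key itself is valid.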

[Figure: Trae 3]

Reference Resources

1. Node.js (to install the npx environment): https://nodejs.org/

2. Apply for a DINO-X API key: https://cloud.deepdataspace.com/

3. Cursor: https://www.cursor.com/

4. Trae: https://www.trae.ai/