XML is a hierarchical data format (HTML, the markup language that defines the layout and content of the page you are currently reading, is a subset of XML). In the field of computer vision, XML is most commonly used with the Pascal VOC XML annotation format.