Securing XML documents can be complex, and several of the issues involved are unique to XML. The key security issues that must be addressed are:
· Since data can be collected from a wide range of sources, the validity of the source and accuracy of the data need to be checked. A hacker could introduce spoofed data into the XML document.
· Due to the processing and storage of data in separate applications, systems and/or locations, integrity of the data becomes a major concern. Transmission of documents between these heterogeneous environments could alter the original document.
· One of the major benefits of XML is that it allows documents to be shared more efficiently between multiple users with different interests. For instance, the document in Figure 1 can be shared among several types of users, such as other professors, students, and the payroll department.
· A professor interested in collaborating on a research paper will be interested in the subject’s ‘name’, ‘education’ and ‘research’, but perhaps has no interest in the ‘teaching’ and ‘salary’ information.
· A student interested in taking a course should be provided with the ‘name’ and ‘teaching’ information, but has no interest in the ‘education’ or ‘research’ of the professor. The student should be prohibited from viewing personal details such as ‘address’ or ‘salary’.
· A university payroll administrator is only interested in the ‘name’, ‘address’ and ‘salary’ information of this professor, and has no use for the extra information about ‘education’, ‘research’, and ‘teaching’.
Thus, because of the nested structure of the document, selected portions of it need to be protected from different types of users. Security measures are needed that allow confidential dissemination of information, based on the heterogeneous classes of users that will be accessing it.
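The three user examples above can be sketched as per-role views of the same document. This is a minimal illustration that assumes Python's standard xml.etree.ElementTree module. The role names and the VISIBLE table are hypothetical, chosen only to mirror the examples, and simple filtering of top-level elements stands in for a real access control policy.

```python
import xml.etree.ElementTree as ET

# Figure 1, abbreviated. Straight quotes are used around the attribute value.
PROFESSOR_XML = """
<professor>
  <name>William Farmer</name>
  <education>Ph.D. Mathematics</education>
  <teaching>SE 4C03</teaching>
  <research>IMPS</research>
  <address Country="US"><street>99 Example Street</street><city>Boston</city></address>
  <salary>$200000</salary>
</professor>
"""

# Hypothetical policy: which top-level elements each role may see.
VISIBLE = {
    "collaborator": {"name", "education", "research"},
    "student": {"name", "teaching"},
    "payroll": {"name", "address", "salary"},
}

def view_for(role, xml_text):
    """Return a copy of the document containing only what `role` may see."""
    root = ET.fromstring(xml_text)
    allowed = VISIBLE[role]
    # Iterate over a snapshot of the children, since we remove while looping.
    for child in list(root):
        if child.tag not in allowed:
            root.remove(child)
    return root

student_view = view_for("student", PROFESSOR_XML)
print([child.tag for child in student_view])  # ['name', 'teaching']
```

Each call parses a fresh copy, so the payroll view and the student view never share state, and a removed element such as ‘salary’ is absent from the view rather than merely hidden.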
Therefore, there are three key security properties that must be ensured when securing XML documents:
1. Authentication – The subject receiving a document must be assured that the document comes from the source it claims to come from.
2. Integrity – The contents of a document are not altered during its transmission from the source(s) to the intended recipient.
3. Confidentiality – Selected document contents can only be disclosed to authorized subjects based on access control policies.[1]
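The first two properties can be illustrated with a keyed digest over the canonical form of a document. This is a sketch under stated assumptions and not an XML Signature implementation: it substitutes an HMAC with a shared secret for a public-key signature, it assumes Python 3.8 or later for xml.etree.ElementTree.canonicalize, and SHARED_KEY, tag_document and verify_document are hypothetical names.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical shared secret known only to sender and recipient.
SHARED_KEY = b"demo-key-not-for-production"

def tag_document(xml_text, key=SHARED_KEY):
    """Compute an HMAC-SHA256 tag over the canonical (C14N 2.0) form."""
    canonical = ET.canonicalize(xml_text)
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_document(xml_text, tag, key=SHARED_KEY):
    """Return True if the document matches the tag under the shared key."""
    return hmac.compare_digest(tag_document(xml_text, key), tag)

original = ('<professor><name>William Farmer</name>'
            '<address Country="US"/><salary>$200000</salary></professor>')
tag = tag_document(original)

# Same content, different surface syntax (single-quoted attribute value).
requoted = original.replace('"US"', "'US'")
# Altered content.
tampered = original.replace("$200000", "$900000")

print(verify_document(requoted, tag))  # True
print(verify_document(tampered, tag))  # False
```

Canonicalizing before hashing means harmless differences in serialization do not invalidate the tag, while any change to the content does. Only a holder of the key can produce a valid tag, which is what gives the recipient assurance about the source.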
XML (eXtensible Markup Language) is currently the most valuable mechanism for data exchange across the Internet. Private and public enterprises are increasingly using the “Web as the main source of information dissemination”[1]. This can take the form of providing documents upon user request or actively broadcasting documents to interested clients.
Due to the widespread use of XML on the Internet, there is a strong need for mechanisms for securing XML documents. Since XML is primarily Internet-based, it gives others the opportunity to sniff or spoof information. Traditional methods of establishing trust are not applicable on the Internet.
XML Signatures can be applied to any digital content. There are several reasons why XML Signatures are more effective than alternative mechanisms for protecting data in transit such as SSL. Firstly, XML Signatures “entwined within the XML data are more portable since it is part of the data rather than a sub-layer that is stripped off as the data is retrieved from the network. Secondly, an XML Signature may be applied to a complete document, parts of a single document or many documents from many sources”[5].
2. XML Basics
The basic building blocks of an XML document are tagged elements. These elements can contain other elements and can be nested to any depth, which gives the document a hierarchical structure.
Each element is delimited by two tags: a start tag and an end tag. The start tag is placed at the beginning of the element and the end tag at the end of the element. Tags are enclosed in angle brackets (< and >), and the end tag carries a slash before the name (</name>). The name inside the tags is the tag name, which indicates the type of the element. Each element can carry any number of attributes that provide additional information about the element. An attribute is a name-value pair written inside the start tag in the form x="y", where x is the name and y is the value.
In Figure 1 below, the elements include ‘professor’, ‘name’, ‘teaching’, ‘address’, ‘street’, and ‘city’, while ‘Country’ is an attribute of the ‘address’ element.
Figure 1
<professor>
    <name>William Farmer</name>
    <education>Ph.D. Mathematics</education>
    <teaching>SE 4C03
        <lecturestyle>Presentation</lecturestyle>
        <workload>Heavy</workload>
    </teaching>
    <research>IMPS</research>
    <email>wmfarmer@mcmaster.ca</email>
    <address Country="US">
        <street>99 Example Street</street>
        <city>Boston</city>
    </address>
    <salary>$200000</salary>
</professor>
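The element and attribute structure of Figure 1 can be inspected with an XML parser. The following is a minimal sketch assuming Python's standard xml.etree.ElementTree module. The figure is reproduced with straight quotes around the attribute value, which XML parsers require, and FIGURE_1 is simply a name chosen for this example.

```python
import xml.etree.ElementTree as ET

FIGURE_1 = """<professor>
    <name>William Farmer</name>
    <education>Ph.D. Mathematics</education>
    <teaching>SE 4C03
        <lecturestyle>Presentation</lecturestyle>
        <workload>Heavy</workload>
    </teaching>
    <research>IMPS</research>
    <email>wmfarmer@mcmaster.ca</email>
    <address Country="US">
        <street>99 Example Street</street>
        <city>Boston</city>
    </address>
    <salary>$200000</salary>
</professor>"""

root = ET.fromstring(FIGURE_1)

# Every element in document order, including nested ones.
element_names = [element.tag for element in root.iter()]
print(element_names)

# Attributes are exposed as a name -> value mapping on the element.
address = root.find("address")
print(address.attrib)  # {'Country': 'US'}

# Nested elements are reached through their parent.
print([child.tag for child in root.find("teaching")])  # ['lecturestyle', 'workload']
```

Note that ‘teaching’ holds both text (the course code) and child elements, so the parser reports the course code as the element's text and the two nested elements as its children.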
3. Benefits of XML
XML differs from HTML in that it permits information providers to define their own tag elements, allows nesting and parent/child relationships, and lets providers define a grammar for their documents that others can use.
“XML as a data container is ideal due to its simplicity, richness of data structure, and being text based the XML document is very easy for humans to understand”[7]. An XML format is also very extensible, since it is easy to create new element names and attributes.
The major benefit of XML is that it allows the presentation of data from multiple sources. This raises the question:
3.1. How does XML present data from various sources?
1. Information can be collected from sources spread across the Internet into one instance and then presented.
2. Pointers and links to the applicable information at different locations can be provided.
In either case, the end result looks the same to the user.
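The two approaches can be contrasted in a short sketch. It assumes Python's standard xml.etree.ElementTree module. The source URLs, the ‘source’ element, its ‘href’ attribute, and the function names are all hypothetical, and the "network" is simulated with an in-memory table so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical sources: (URL, XML fragment held at that URL).
SOURCES = [
    ("https://registrar.example.edu/farmer.xml", "<teaching>SE 4C03</teaching>"),
    ("https://payroll.example.edu/farmer.xml", "<salary>$200000</salary>"),
]

def collect(sources):
    """Approach 1: copy each fragment into a single document instance."""
    root = ET.Element("professor")
    for _url, fragment in sources:
        root.append(ET.fromstring(fragment))
    return root

def link(sources):
    """Approach 2: store only a pointer to each source in the document."""
    root = ET.Element("professor")
    for url, _fragment in sources:
        ET.SubElement(root, "source", {"href": url})
    return root

def resolve(linked, fetch):
    """Follow each pointer using a caller-supplied fetch(url) -> str."""
    root = ET.Element(linked.tag)
    for pointer in linked:
        root.append(ET.fromstring(fetch(pointer.get("href"))))
    return root

collected = collect(SOURCES)
linked = link(SOURCES)
# Simulated retrieval: look the URL up in the in-memory table.
resolved = resolve(linked, dict(SOURCES).__getitem__)

print(ET.tostring(collected) == ET.tostring(resolved))  # True
```

Once the pointers are resolved, the linked document serializes to the same bytes as the collected one, which is the sense in which the result looks the same to the user.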