The purpose of a Document Type Definition or DTD is to define the structure of a document encoded
in XML (Extensible Markup Language).
It is possible to build and use files containing XML tags without ever defining what tags are legal.
However, if you want to ensure that files conform to a known
structure, writing a DTD is the preferred method.
Two definitions:
- A well-formed file is one that obeys the general XML rules for tags: tags must be properly nested, opening and closing tags must be balanced, and empty tags must end with '/>'.
- A valid file is not only well-formed, but also conforms to a DTD that specifies
which tags it uses, what attributes those tags can contain, and which tags can occur inside which
other tags, among other properties.
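To make the distinction concrete, here is a small sketch of a file that is well-formed but not valid. The element names note and urgent are made up for illustration only.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<!-- Well-formed: every tag is closed and properly nested. -->
<!-- Not valid: the DTD allows only text inside <note>,
     and <urgent> is never declared. -->
<note>Trail <urgent>closed</urgent> for repairs.</note>
```

A parser that only checks well-formedness accepts this file; a validating parser should reject it.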
Definitions :
We need to review some terminology before proceeding:
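- A proper XML name must start with a letter or underscore (_); the remaining characters may be letters, digits, underscores, hyphens (-), or periods (.). Colons are also allowed, but are best avoided because they are reserved for namespace prefixes.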
- A tag is one of the XML constructs used to mark up documents. All tags start with a less-than symbol (<) and end with a greater-than symbol (>).
- An element is a section of an XML document that acts as a unit. It may be either an empty element, or it may have content.
- An empty element consists of a single tag of the form
<gi.../>
where gi is the tag type (or “generic identifier”), and the tag may include attributes. Note the slash
before the closing “>”; this signifies an empty tag.
- An opening tag begins a section of an XML document that ends with the corresponding closing tag. An opening tag has this form:
<gi...>
where gi is the tag type (or “generic identifier”), and the tag may include attributes. A closing tag
has the form:
</gi>
- The content is everything between the opening tag and its corresponding closing tag. The content may be other elements or just plain text.
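The terms above can be seen together in one short fragment. This is only an illustrative sketch; the element names chapter, para, and pagebreak and the attribute title are invented for the example.

```xml
<!-- Opening tag with one attribute; its generic identifier is "chapter". -->
<chapter title="Getting started">
  <!-- The content of the chapter element is everything up to its closing tag:
       here, one element that contains plain text and one empty element. -->
  <para>Some plain text.</para>
  <pagebreak/>
</chapter>
```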
Where does a DTD live?
- External DTD: You can put the DTD in a separate file from the XML file, and refer to it using a <!DOCTYPE ...> line at the beginning of the XML file. The advantage of this method is that many XML files can all refer to the same DTD.
- Internal DTD: You can put the DTD inside the <!DOCTYPE ...> declaration at the front of the XML file. The advantage of an internal DTD is that the file can be validated by itself, without reference to any external files.
Linking an XML file to an external DTD :
If your XML file is supposed to conform to an external DTD, place a declaration of this form at the beginning of the XML file:
<?xml version="1.0"?>
<!DOCTYPE root-name SYSTEM "dtd-name.dtd">
where root-name is the name of the root (highest-level) element of the document, and dtd-name.dtd
is the name of the file containing the DTD.
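As a sketch of how the two files fit together, assume a DTD stored in a file called park.dtd (a hypothetical name) in the same directory as the XML file. The system identifier is a URI, so a relative name like this one is resolved relative to the location of the XML file. The assumed contents of park.dtd are shown in a comment so that the example stays a single file.

```xml
<?xml version="1.0"?>
<!DOCTYPE park SYSTEM "park.dtd">
<!-- Assumed contents of park.dtd:
       <!ELEMENT park (trail*)>
       <!ELEMENT trail (#PCDATA)>
-->
<park>
  <trail>Canyon Trail</trail>
</park>
```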
Including the DTD inside your XML file :
To include the DTD in an XML file, the file should start like this:
<?xml version='1.0'?>
<!DOCTYPE root-name [
  dtd-declarations ...
]>
Here's an example of a complete XML file with an internal DTD that defines two element types: a root element <park> and a second-level element <trail>. We'll explain the pieces of the DTD later on.
<?xml version="1.0"?>
<!DOCTYPE park [
  <!ELEMENT park (trail*)>
  <!ATTLIST park name CDATA #IMPLIED>
  <!ELEMENT trail (#PCDATA)>
  <!ATTLIST trail dist CDATA #REQUIRED
                  climb CDATA #REQUIRED>
]>
<park name='Lincoln Natural Forest'>
  <trail dist='3400' climb='medium'>Canyon Trail</trail>
  <trail climb='easy' dist='1200'>Pickle Madden Trail</trail>
</park>
Types of DTD declarations :
- Element declarations let you specify what kinds of tags can be used, and what (if anything) can appear inside the contents of the element.
- Attribute declarations define what attributes you can use inside a given element.
- Entity declarations define chunks of fixed text that can be included elsewhere.
- Notation declarations define file types (like JPG and WAV files) so you can refer to non-XML files like image and sound files.
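Here is a minimal sketch with one declaration of each kind, followed by a document that uses them. The entity name forest, the notation name jpeg, and the attribute value are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE park [
  <!-- Element declaration: park contains only text. -->
  <!ELEMENT park (#PCDATA)>
  <!-- Attribute declaration: park may carry a name attribute. -->
  <!ATTLIST park name CDATA #IMPLIED>
  <!-- Entity declaration: &forest; stands for the quoted text. -->
  <!ENTITY forest "Lincoln Natural Forest">
  <!-- Notation declaration: names the JPEG format for non-XML data. -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
]>
<park name="Main entrance">Welcome to &forest;.</park>
```

When the file is parsed, the reference &forest; in the content is replaced by the entity's text.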
Element declarations :
In a DTD, an element declaration defines one of the kinds of elements you can use, that is, one of the
tag types.
All element declarations have this general form:
<!ELEMENT gi content>
where gi is the element name (also called the “generic identifier”) and content describes what
content (if any) can go inside the element. The content is either a keyword such as EMPTY, or a content model enclosed in parentheses. The generic identifier must follow the rules for XML names, above.
The content part describes the syntax of the element's content using a general notation with a number
of different parts. The next few sections describe the items that can go into the content.
Declaring empty elements :
If you don't want a certain element to have any content, that is, you want that element always to be
represented by an empty tag (see above), use this element declaration:
<!ELEMENT gi EMPTY>
For example, if your DTD contains this declaration:
<!ELEMENT pagebreak EMPTY>
then an XML document conforming to this DTD could contain a tag that looks like:
<pagebreak/>
Attribute declarations :
If an element is to have attributes, the names and possible values of those attributes must be declared
in the DTD. Here is the general form:
<!ATTLIST ename {aname atype default} ...>
where ename is the name of the element for which you're defining attributes, aname is the name of one of that element's possible attributes, atype describes what values it can have, and default describes whether it has a default value. The last three items can be repeated inside an <!ATTLIST...> declaration, one group per attribute.
The atype part describing the attribute's type can have three kinds of values:
- The keyword CDATA means that the attribute can have any character string as a value. For example, suppose you want every <play> element to have a title attribute that can contain any text, and that attribute is required. Here is the complete attribute declaration:
<!ATTLIST play title CDATA #REQUIRED>
- There are several tokenized attribute types, which are required to have a certain structure. See tokenized attributes below.
- You can provide a specific set of legal values for the attribute; see enumerated attributes below.
The last part of the declaration, default, specifies whether the attribute can be omitted, and what
value it will have if omitted. This must be one of the following:
#REQUIRED
The attribute must always be supplied.
#IMPLIED
The attribute can be omitted, and the DTD does not provide a default value. An application reading
the file may assume a default of its own, but the DTD does not supply one.
"value"
The attribute can be omitted, and the default value is the quoted string that you provide.
#FIXED "value"
The attribute can be omitted, in which case it takes the given "value"; if it is supplied, it must have exactly that value.
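The four kinds of default can be combined in a single <!ATTLIST...> declaration. The sketch below reuses the <play> element and its title attribute from the CDATA example; the attributes author, lang, and format are hypothetical additions made only to show each case.

```xml
<?xml version="1.0"?>
<!DOCTYPE play [
  <!ELEMENT play (#PCDATA)>
  <!ATTLIST play
      title  CDATA #REQUIRED
      author CDATA #IMPLIED
      lang   CDATA "en"
      format CDATA #FIXED "text">
]>
<!-- title is supplied because it is required.
     author is left out, so it has no value at all.
     lang is left out, so a parser that reads the DTD reports it as "en".
     format is written with the only value the DTD permits. -->
<play title="Hamlet" format="text">To be, or not to be</play>
```

Writing format="html" in this document would make it invalid, because a #FIXED attribute cannot take any other value.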