10_588451 ch05.qxd 4/15/05 9:35 AM Page 67
Marking up your content
After you’ve started your XML document with an XML declaration, it’s time to get down to the details of the markup. If you’ve already done your content analysis (the stuff we discuss in Chapter 3), it’s easy to start marking up your content. (If you haven’t yet ventured into Chapter 3, now may be the time to do so.)
XML elements are the basic building blocks of XML document structure. XML elements can contain other elements and/or text content. XML attributes are used to provide additional information about an element or its content.
Attributes are contained within an element tag. For example, in the following markup
<source sourceType="Retail"/>
an empty element (source) contains an attribute (sourceType) that adds information about a subcategory (Retail).
You could use two elements to provide the same content:
<source>
<sourceType>Retail</sourceType>
</source>
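To see that both forms carry the same information, here is a minimal sketch that reads each one back. It assumes Python and its standard xml.etree.ElementTree module, which is our choice for illustration and not something the bookstore example depends on.

```python
import xml.etree.ElementTree as ET

# Attribute form: the subcategory lives inside the start tag.
as_attribute = ET.fromstring('<source sourceType="Retail"/>')

# Element form: the subcategory is a child element with text content.
as_element = ET.fromstring('<source><sourceType>Retail</sourceType></source>')

# An attribute is read with get(); a child element's text with findtext().
attribute_value = as_attribute.get("sourceType")
element_value = as_element.findtext("sourceType")

print(attribute_value, element_value)
```

Either way, the program ends up with the same string; the difference is in how the document is organized, not in what it can say.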
You can find out more about using attributes in the section “Adding attributes,” later in this chapter.
Choosing a root element
The XML declaration (in a well-formed document, anyway) is followed by the opening tag of the root element. The root element is the most important element in any XML document. The root element contains all other elements; in effect, it contains everything else in the XML document. All the markup is contained between the opening and closing tags of the root element. In any well-formed HTML document, html is always the root element. In XML, however, the root element can be just about anything you want it to be. In XML, you create a root element and then put all the related XML elements inside it.
For our bookstore example, it makes sense for us to designate books as the root element. Because our document contains one or more books, our root element’s name sets the stage for our document, which can contain a single book or many books.
68 Part II: XML and the Web
At this point in the process, our markup looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<books>
</books>
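If you want to confirm which element a parser treats as the root, a quick check like the following can help. This is only a sketch, assuming Python’s standard xml.etree.ElementTree module. Note that the declaration lists version first, then encoding, then standalone, which is the order XML requires.

```python
import xml.etree.ElementTree as ET

# The document so far: a declaration followed by an empty root element.
document = b'''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<books>
</books>'''

root = ET.fromstring(document)

root_name = root.tag      # name of the root element
child_count = len(root)   # number of child elements added so far

print(root_name, child_count)
```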
When you think about your root element, keep in mind that every other element has to fit neatly inside it. Don’t be surprised if a couple of candidates for your root element appear. With a little trial and error, you can find the one that works right for you.
Defining elements
The categories and subcategories that you extracted during your content analysis — all that Chapter 3 stuff — are a good place to start when you’re defining elements. These should include all the important content areas for your data.
For our bookstore, we chose to include these categories (and subcategories):
Book: Title, Author, Publisher, ISBN, Content Type, Format
Sales: Item Number, Date of Sale, Source, Price, Shipping, Cost per Item (Price + Shipping), Total Cost
Customer: Customer Number, First Name, Last Name, Street Address, City, State, Zip Code, Phone, E-mail Address
For the first draft of our markup, we include all these categories and subcategories as elements, and add opening and closing tags for each element, like this:
<books>
  <book>
    <title></title>
    <author></author>
    <publisher></publisher>
    <isbn></isbn>
    <contentType></contentType>
    <format></format>
  </book>
  <sales>
    <itemNumber></itemNumber>
    <date></date>
    <source></source>
    <shipping></shipping>
    <cost></cost>
    <totalCost></totalCost>
  </sales>
Chapter 5: Putting Together an XML File
  <customer>
    <customerNumber></customerNumber>
    <firstName></firstName>
    <lastName></lastName>
    <address></address>
    <city></city>
    <state></state>
    <zipcode></zipcode>
    <phone></phone>
    <email></email>
  </customer>
</books>
Looking at the family tree
The elements in an XML document are like one big happy family that comes together to describe your content. Imagine a family tree with a single trunk that splits into branches — which in turn split into branches with leaves at the end. An XML document has the same structure: The root element is the trunk that forms the foundation for the tree; the branches and their branches are the elements and (ultimately) content in your document.
Parent element: An element becomes a parent element when it contains other elements.
Child element: A child element is (yep, you guessed it) an element that sits inside a parent element.
Sibling element: When a parent has more than one child element, those elements are siblings of one another. Sibling elements occupy the same level in the document hierarchy.
But what about grandparents, aunts, uncles, and cousins? Well, no. XML doesn’t take the family-tree metaphor to such an extreme.
The categories we list here are all child elements of the root element books, and all the categories are sibling elements to one another. The subcategories are child elements of category elements, and (no surprise here) the subcategories within a category are all sibling elements to one another.
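The parent, child, and sibling relationships can be read straight off a parsed document. The sketch below assumes Python’s standard xml.etree.ElementTree module and uses a trimmed-down version of our first draft (only two subcategories per category) to keep it short.

```python
import xml.etree.ElementTree as ET

# A shortened stand-in for the first draft of the bookstore markup.
draft = """
<books>
  <book><title/><author/></book>
  <sales><itemNumber/><date/></sales>
  <customer><firstName/><lastName/></customer>
</books>
"""
root = ET.fromstring(draft)

# Children of the root element are the category elements.
categories = [child.tag for child in root]

# ElementTree elements keep no link to their parent, so build a
# child -> parent lookup by walking every element once.
parent_of = {child: parent for parent in root.iter() for child in parent}

title = root.find("book/title")
parent_name = parent_of[title].tag

# Siblings are the other children of the same parent.
siblings = [el.tag for el in parent_of[title] if el is not title]

print(categories, parent_name, siblings)
```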
Mapping relationships
Using the family-tree structure is a convenient way to map our document hierarchy and look at relationships between elements.
As you glanced over the first draft of our markup, you may have noticed a problem with our family structure: The totalCost element is a child of the sales element. Because we want to be able to include more than one book in our documents, and because each book has a cost (price plus shipping), totalCost won’t work well as a child of the sales element. We need to take totalCost out of the sales element and make it a separate element, like so:
<sales>
  <itemNumber></itemNumber>
  <date></date>
  <source></source>
  <shipping></shipping>
  <cost></cost>
</sales>
<totalCost></totalCost>
Because each book includes information about the book itself, such as the title, and sales information for that book, we added a new category (bookInfo) that wasn’t included in our original content analysis. We also changed the name of the sales element to salesInfo just to keep things consistent. The new draft of our markup looks like this:
<books>
  <book>
    <bookInfo>
      <title></title>
      <author></author>
      <publisher></publisher>
      <isbn></isbn>
      <contentType></contentType>
      <format></format>
    </bookInfo>
    <salesInfo>
      <itemNumber></itemNumber>
      <date></date>
      <source></source>
      <shipping></shipping>
      <cost></cost>
    </salesInfo>
  </book>
  <totalCost></totalCost>
  <customer>
    <customerNumber></customerNumber>
    <firstName></firstName>
    <lastName></lastName>
    <address></address>
    <city></city>
    <state></state>
    <zipcode></zipcode>
    <phone></phone>
    <email></email>
  </customer>
</books>
So, Notes to Self: Change the hierarchy of the document so that totalCost is now a child of the root element, books. Also add a bookInfo element and change the sales element name to salesInfo.
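One way to see why totalCost belongs at the document level: it is a sum over every book’s cost. The following sketch assumes Python’s standard xml.etree.ElementTree and decimal modules; the dollar amounts are made up for illustration, and the markup is cut down to the elements involved in the arithmetic.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Two books, each with its own cost (price plus shipping).
# All amounts here are hypothetical sample values.
order = ET.fromstring("""
<books>
  <book>
    <salesInfo><shipping>4.00</shipping><cost>28.95</cost></salesInfo>
  </book>
  <book>
    <salesInfo><shipping>3.50</shipping><cost>15.49</cost></salesInfo>
  </book>
  <totalCost></totalCost>
</books>
""")

# Add up the cost element of every book in the document.
total = sum((Decimal(cost.text) for cost in order.iter("cost")), Decimal("0"))

# totalCost is a child of the root element, so it is stored once per document.
order.find("totalCost").text = str(total)

print(order.findtext("totalCost"))
```

Had totalCost stayed inside each sales element, there would be no single place to put this document-wide figure.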
Adding attributes
With a little savvy about elements in hand, you can turn your attention to the attributes that modify or manage the content that those elements may contain. Not only can attributes help clarify what content elements may contain, but they can also help define what an element does and how it relates to other elements.
To help you decide when to use an attribute with an element, here’s a quick quiz:
Are you defining a particular aspect of an element, such as size, height, or color?
Do you need a way to provide more information about individual instances of an element?
Do you want to be sure that every time an element is used, certain infor- mation is included with it?
Keeping these guidelines in mind while you’re looking at the first draft of our document markup, behold! Two elements appear to be good candidates for attribute status: contentType and format, both child elements of the book element. With a little ingenuity, we can turn these elements into attributes of the book element so that it ends up bearing a bit more of the informational burden. Our ingenuity is on display in the final draft of the markup:
<book contentType="Fiction" format="Hardback">
We also added other attributes that weren’t included in our initial content analysis. Because our bookstore does both retail and wholesale sales, we realized we needed to add categories for the aspects of price, source, and customer. So our final markup also includes three additional attributes, priceType, sourceType, and custType, like so:
<price priceType="Retail">$24.95</price>
...
<source sourceType="Retail"/>
...
<customer custType="newRetail">
The custType attribute also allows us to include information about whether a customer is a new or repeat customer.
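Attributes pay off when a program needs to pick out particular instances of an element. Here is a minimal sketch, again assuming Python’s standard xml.etree.ElementTree module; the second book and its values are hypothetical, added only so there is something to filter out.

```python
import xml.etree.ElementTree as ET

# Two books that differ only in their attribute values and prices.
catalog = ET.fromstring("""
<books>
  <book contentType="Fiction" format="Hardback">
    <salesInfo><price priceType="Retail">$24.95</price></salesInfo>
  </book>
  <book contentType="Reference" format="Paperback">
    <salesInfo><price priceType="Wholesale">$11.50</price></salesInfo>
  </book>
</books>
""")

# Select book elements whose format attribute equals "Hardback".
hardbacks = catalog.findall("book[@format='Hardback']")

# Collect the text of every price element marked as Retail.
retail_prices = [p.text for p in catalog.iter("price")
                 if p.get("priceType") == "Retail"]

print(len(hardbacks), retail_prices)
```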
As you can see from the process we went through to create our final markup document, using content analysis, creating markup, and testing the markup allow you to create the XML document that best meets your needs for data storage and exchange.
The final form of our markup is shown in Listing 5-1. You’ll notice another change — the line after the XML declaration is a processing instruction for adding a CSS stylesheet. (You get a look at adding a stylesheet in the section called “Adding Style for the Web,” later in this chapter.)
Listing 5-1: bookstore.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="bookstore.css"?>
<books>
  <book contentType="" format="">
    <bookInfo>
      <title></title>
      <author></author>
      <publisher></publisher>
      <isbn></isbn>
    </bookInfo>
    <salesInfo>
      <price priceType=""></price>
      <itemNumber></itemNumber>
      <date></date>
      <source sourceType=""/>
      <shipping></shipping>
      <cost></cost>
    </salesInfo>
  </book>
  <totalCost></totalCost>
  <customer custType="">
    <custNumber></custNumber>
    <lastName></lastName>
    <firstName></firstName>
    <address></address>
    <city></city>
    <state></state>
    <zip></zip>
    <phone></phone>
    <email></email>
  </customer>
</books>
Our document at this point doesn’t include any information about what order elements should appear in, and it also doesn’t indicate whether elements are required or optional, or whether they can occur more than once in the document. To add these kinds of rules, you need to add validation for your document with a DTD or XML schema. Before you can validate an XML document, though, you need to ensure that it’s well formed, as outlined in the following section.
Playing by the Rules: Well-Formed Documents
A well-formed XML document follows all the rules of XML syntax. XML is very flexible, but its syntax is rigid. This is a good thing, because it guarantees that all XML documents adhere to the same basic rules (and computers like data that follows the rules).
If you think some of these rules are a bit nitpicky, you’re right. Remember, the intended audience for your XML isn’t a human being who can intuit what you “meant to mark,” but a computer that can only work with what you give it.
We introduce the rules of XML syntax in our discussion of XHTML in Chapter 4; this chapter throws in a couple more rules for good measure. The following list includes all the rules introduced so far and adds one more rule so that you have everything you need to create well-formed XML documents:
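You need an XML declaration. The first line of your XML document should be a simple declaration that specifies that the document is an XML document and states which version of XML it uses. (Strictly speaking, XML 1.0 lets you leave the declaration out, but including it is the safe habit.) In its simplest form, it looks like this:
<?xml version="1.0"?>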
You need a root element to contain all the other elements. All elements and content within an XML document must live within a single top-level element, appropriately called the document element or root element.
Every nonempty element must have a start tag and an end tag. If you open an element with a tag, make sure that you close it with a tag.
Empty elements have to end with a slash (/). Elements that consist of only a start tag, such as the source element in our example, are called empty elements because they don’t hold content between opening and closing tags (they don’t even have closing tags). To avoid confusion and to prevent your XML tools from searching endlessly for closing tags that don’t exist, identify all empty elements with a slash (/) before the closing greater-than sign (>), like this:
<source sourceType="Retail"/>
In XHTML documents, you add a space before the closing slash in empty elements so that older browsers can recognize them as empty elements.
You don’t need to include a space before the ending slash in an XML document; the XML processor will recognize an empty element without that extra space.
Tags must be properly nested. To avoid breaking this cardinal rule, always close first the tag that you opened last, working your way from the inside to the outside tags.
A good way to remember to nest your elements correctly is to think of nested suitcases. Before you can close and zip the outer suitcase, you have to close and zip the inner suitcase. Think of tags as suitcase tops: You can’t close the one on the outside until you close the one on the inside.
All attribute values must be in quotation marks. You must enclose every attribute value in quotation marks (either single or double quotes; double quotes are used most often). If you forget even one set of quotation marks, you can count on the markup to break somewhere along the line.
Tags have to be built the right way. Every XML tag must begin with a less-than sign (<) and end with a greater-than sign (>). XML tools don’t know what to do with tags that don’t play by this rule and usually treat them as plain ol’ content. Not a total disaster (if you fix the error), but certainly not a boon to the document if you leave it alone.
A corollary to this rule is that every XML entity reference must begin with an ampersand (&). Fine, you say, but what’s an entity? We’re glad you asked.
An entity is a virtual storage unit that can contain text, binary files such as graphics or sound clips, or non-ASCII characters such as the copyright symbol. You reference an entity in an XML document by using a string of characters that begins with an ampersand (&) and ends with a semicolon (;).
XML supports non-ASCII characters. In Chapter 6, we discuss the XML use of characters and entities in depth.
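To make entity references a little more concrete, here is a small sketch of how a parser resolves them. It assumes Python’s standard xml.etree.ElementTree module, and the publisher name is invented for the example. One thing worth noticing: XML itself predefines only a handful of named entities (such as &amp;), so a name borrowed from HTML, like &copy;, is rejected unless the document declares it.

```python
import xml.etree.ElementTree as ET

# &amp; is one of the entities predefined by XML; &#169; is a numeric
# character reference for the copyright symbol.
note = ET.fromstring("<publisher>Smith &amp; Jones &#169; 2005</publisher>")
resolved = note.text  # the parser hands back the resolved characters

# &copy; is not predefined in XML, so without a declaration the parser
# reports an error instead of guessing what was meant.
try:
    ET.fromstring("<publisher>&copy; 2005</publisher>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(resolved, undeclared_ok)
```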
If you’re worried that building well-formed documents by hand will be tedious and not worth the effort, don’t abandon us (and XML) here. Take a look at the sidebar “Staying well formed with good tools” elsewhere in this chapter to find out how a good XML tool picks the nits for you.
We’ve found that people with HTML experience have a harder time learning to adhere to the rules of well-formedness simply because Web browsers seem to encourage breaking rules instead of following them. Although this shift in thinking happens gradually (some may say painfully), with a little practice, you’ll be over the HTML hump. (We made it without too much discomfort.)
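If you’d like to watch a parser enforce the well-formedness rules from this section, the following sketch feeds it one good document and several broken ones. It assumes Python’s standard xml.etree.ElementTree module; the helper name is_well_formed is ours, and the sample snippets are invented to break one rule apiece.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# One sample per rule; only the first follows all of them.
samples = {
    "well formed": '<books><book format="Hardback"><title>XML</title></book></books>',
    "missing end tag": "<books><book><title>XML</book></books>",
    "improper nesting": "<books><book><title>XML</book></title></books>",
    "unquoted attribute": "<books><book format=Hardback/></books>",
    "two root elements": "<books/><customer/>",
    "empty element without slash": '<books><source sourceType="Retail"></books>',
}

results = {label: is_well_formed(markup) for label, markup in samples.items()}

for label, ok in results.items():
    print(f"{label}: {ok}")
```

A check like this only tells you whether a document is well formed. It says nothing about validity, which needs a DTD or schema.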
Adding Style for the Web
Although XML is a great tool for storing data for all kinds of stuff, it’s not completely Web compatible yet. But because the Web is hot, hot, hot, it’s no surprise that content developers — like you — want to deliver their data through the Web. So if you want to transmit XML through the Web, Cascading Style Sheets (CSS) provide a mechanism to display XML documents directly.
A CSS stylesheet is a plain-text file that lists style properties. It’s saved with a .css file extension. CSS is so important that we’ve devoted a whole chapter to it (Chapter 7), but for now, you only need to know a bit of CSS syntax to add a stylesheet to your XML markup for an enhanced Web view of your content.
We like CSS because it’s human-readable and uses a simple-but-flexible syntax. To understand CSS, you only need to remember this magic formula:
selector {property: value}
Staying well formed with good tools
You’re probably wondering how you can possibly remember all the rules that we describe in this chapter when you develop XML documents.
Even veteran document designers forget a quotation mark or two here and there, not to mention occasionally forgetting a closing tag or a slash at the end of an empty tag. If you try to send such a malformed XML document to the application that is going to work with it, the application will spit it right back out at you or spit out error messages (and that’s just as bad).
Before you get your document to your application, it pays to ensure that it’s well formed and valid (if necessary).
The best way to make sure your documents are well formed is to build your XML document with a text editor designed specifically for XML documents. XML editors can check documents as you build them so that easy-to-make mistakes don’t fester long enough to grow into ugly, malformed documents. Believe us, you’ll be a happier and less-stressed camper if you go out and find yourself a good editor. We promise! XML editors are available for a variety of platforms and range in price from free to fairly expensive.
Every editor has extra gimmicks and functions, but no XML editor is worth its salt if it can’t check documents to make sure they’re well formed.
In Chapter 19, we focus entirely on XML-related tools, including a section on XML editors. Read more about the editors available for your platforms of choice and then download a few and try them out. The best online resource that we’ve found for XML software is www.xmlsoftware.com.