Adding XHTML for the Web

Part of the document XML for Dummies, 4th edition (2005) (pages 70 - 88)

09_588451 ch04.qxd 4/15/05 12:11 AM Page 49

Web designers sometimes try to achieve the same formatting control over Web documents that they have over, say, printed documents. They want what they see on their screens with their browsers to be exactly what any visitor to their sites may also see. (Even if they can’t pronounce WYSIWYG, they want “what you see” to be “what you get.”) Two overarching problems prevent Web designers from achieving this control with HTML:

HTML lacks fine controls. First and foremost, HTML by itself doesn’t include mechanisms for fine control. You can’t specify a document’s display size or control the size of a browser window. A user’s monitor size and display settings can dramatically affect how a browser displays HTML documents. Although HTML 4.01 does include a font element to help you manipulate font style, size, and color, users can override your settings with their own.

Displays vary. Along with the different versions of the two most common browsers (IE and Navigator), users view Web pages on different platforms, such as Windows XP, Unix, or Mac OS X. And you can’t realistically test every Web page on every available browser on every platform just to see what your users see.

Although adding CSS (Cascading Style Sheets) to your arsenal gives you more flexibility than HTML alone, it doesn’t solve every Web design problem.

Stylesheets have become an integral part of the XML discussion. For an in-depth look at the use of stylesheets with XML, please visit Chapter 7 (CSS) and Chapter 12 (XSLT).

Comparing XML and HTML

Right off the bat, we want to tell you in no uncertain terms that XML won’t, and can’t, replace HTML. XML and HTML are not the same kind of markup language. But XML and HTML both derive from the same parent, SGML, so they must be similar, right? The answer is: “Yes, they’re similar, but not identical, and they can’t pinch-hit for each other.”

HTML and XML both use tags and attributes. Indeed, XML and HTML look similar. But whereas HTML defines basic text elements and includes defaults (and more explicit controls) for how text may be displayed in a browser window, XML tells us only what each element means. XML says nothing about how elements should or must be displayed — XML separates content and the presentation of that content.

50 Part II: XML and the Web


Using XML to describe data

Unlike HTML, XML is not limited to any fixed set of tags or element types (which is the proper name for the whatchamathingies that show up between the opening < and closing > characters in XML markup). By using XML, you can define your own sets of elements and even your own attributes that you may then use within your documents. On the other hand, XML has already been used to define lots of specific markup languages (called XML applications) that you can use within your documents as well.

Although it might seem that the terms tag and element are interchangeable, they’re not. One example of a tag is the opening <p> tag; an example of an element is <p>text</p>. An element includes the opening and closing tags for a tag pair and everything in between. A tag is just a tag, all by itself.

With this power, XML enables you to give meaningful names to your markup.

In HTML, the paragraph element (p) is one of the most frequently used elements. In XML, you can replace the paragraph element with something more descriptive.

For example, suppose you have HTML text that looks like this:

<html>

<p>

This book is about the foundations of the Extensible Markup Language (XML) and how to use it for your own applications.

</p>

<p>

The authors are Lucinda Dykes and Ed Tittel.

</p>

</html>

Looks like a plain old HTML document; you know nothing about the meaning of the data. In other words, the first sentence could be a very long title, a description in a catalog, the first line of a book, the chorus of a new song, or something else entirely.

HTML is used to describe the display of data as seen through a Web browser.

We could have just as easily written

<html>

<p>

Blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah (XML) blobbity blobbity blobitty blah blah blah blah.

</p>

<p>

Blah blah blah Lucinda Dykes blah Ed Tittel.

</p>

</html>


Chapter 4: Adding XHTML for the Web


and you would have gained just as much (or as little, depending on your point of view) insight as to the function of this data. But you would know that the data is displayed in two distinct paragraphs, because the paragraph tags <p> and </p> define the appearance of the data.

Consider this alternate form of expression, in which cover copy (the information found on the cover of a book) is provided and divided into specific, named elements:

<Cover>

<Abstract>

This book is about the foundations of the Extensible Markup Language (XML) and how to use it for your own applications.

</Abstract>

<AuthorInfo>

The authors are <Author>Lucinda Dykes</Author> and

<Author>Ed Tittel</Author>.

</AuthorInfo>

</Cover>

The Cover, Abstract, and AuthorInfo elements identify the previous markup as XML based. By using the elements in this markup language, you can now identify cover copy and specify further what that cover copy includes (say, an abstract plus a section for author information). In addition, this markup identifies each of the book’s authors individually.
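To see why named elements matter to software, here is a minimal sketch in Python that pulls the author names out of the cover markup shown above. It relies only on the standard library's xml.etree.ElementTree module; the function name author_names is our own invention for illustration, not part of any XML tool.

```python
import xml.etree.ElementTree as ET

# The cover copy from the text, stored as a string.
COVER_XML = """
<Cover>
  <Abstract>
    This book is about the foundations of the Extensible Markup
    Language (XML) and how to use it for your own applications.
  </Abstract>
  <AuthorInfo>
    The authors are <Author>Lucinda Dykes</Author> and
    <Author>Ed Tittel</Author>.
  </AuthorInfo>
</Cover>
"""


def author_names(xml_text):
    """Return the text of every Author element, in document order."""
    root = ET.fromstring(xml_text)
    return [author.text.strip() for author in root.iter("Author")]


authors = author_names(COVER_XML)
print(authors)
```

The same request against the HTML version would have to guess which words inside the second paragraph are names, because <p> says nothing about what the text means.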

Never fear, XML also provides numerous ways to translate this information so it looks the way you want it to on-screen. These ways include CSS and XSL.

(See Chapter 7 of this book for more about CSS and Chapter 12 for more about XSL.)

XML enables you to define and use your own elements and attributes. This is what XML is all about and explains why XML is called the extensible markup language.

After you get your head wrapped around the differences between XML and HTML, it’s a lot easier to think of ways to increase the information associated with particular elements. For example, perhaps the Author element should include not only an author’s name, but also a brief biography.

In “The limits of HTML,” earlier in this chapter, we outline the shortcomings of HTML. Perhaps you’re wondering why you would want to use it at all.

HTML does have some advantages (see the following section); and in the short term, you have no real choice in the matter. Besides, HTML can create a reasonably consistent Web-page presentation for users. As the expectations of businesses and end-users increase, however, so does the need for more flexible markup languages. XML rises to meet that challenge.


The benefits of using HTML

Why use HTML? Because it’s quick, easy, and cheap. In addition, HTML is way easier than the alternative — which is to use an XML-based source document along with a document created with XSLT (or something similar) to define the display attributes for your document’s contents.

Anyone can create an HTML document by using a text editor and a little knowledge. Even if you don’t know HTML, you can use an HTML editor — a What You See Is What You Get-style (that is, WYSIWYG-style) editor such as FrontPage or Dreamweaver — to produce readable Web pages in minutes.

The most important reason not to write off HTML just yet lies in where it’s headed: XHTML. HTML isn’t out of the game. (See “XHTML Makes the Move to XML Syntax,” later in this chapter, for more information.)

The benefits of using XML

XML seems to be brimming with benefits. Here’s a list that puts these benefits in a nutshell:

Unlimited elements: You get to create your own elements and attributes instead of working with a restricted, predefined set.

Structured data: Applications can extract information that they need from XML documents.

Data exchange: Using XML enables you to exchange database contents or other structured information across the Internet or between dissimilar applications.

XML complements HTML: XML data can be used in HTML pages.

XML documents are well formed: XML documents must follow certain rules. This consistency makes such documents easier to read and create.

Self-describing: No prior knowledge of an XML application is needed. Of course, knowing HTML can really help you understand more about XML (but software thrives on self-describing documents and data).

Search engines: XML delivers a noticeable increase in search relevance because it provides ample contextual information and explicit labels for document elements.

Updates: No need to update an entire site page by page; the Document Object Model (DOM) used with XML documents permits individual elements to be accessed (and updated).

User-selected view of data: Different users can access different information or can present the same information in various ways.


Intelligent XML-based pages that contain human-readable data offer exciting potential for users. For example, they can go beyond HTML-based search tools that use keywords and text strings; XML-based search tools can also use metadata and data structures. XML-powered searches produce more relevant results, and quicker.
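The difference between a keyword search and a structure-aware search can be sketched in a few lines of Python. Everything about the sample data here is an assumption made up for illustration: the Library, Book, Title, and Author element names, the two book titles, and the two function names.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the element names and titles are invented.
LIBRARY_XML = """
<Library>
  <Book><Title>Brown Bear Basics</Title><Author>Ann Lee</Author></Book>
  <Book><Title>Markup Mysteries</Title><Author>Dan Brown</Author></Book>
</Library>
"""

root = ET.fromstring(LIBRARY_XML)


def keyword_search(word):
    """Titles of books that mention the word anywhere in their text."""
    return [book.findtext("Title") for book in root.iter("Book")
            if word in "".join(book.itertext())]


def author_search(word):
    """Titles of books whose Author element mentions the word."""
    return [book.findtext("Title") for book in root.iter("Book")
            if word in book.findtext("Author", default="")]


keyword_hits = keyword_search("Brown")
author_hits = author_search("Brown")
print(keyword_hits)
print(author_hits)
```

The keyword search returns both books because "Brown" appears in one title and one author name; the structured search returns only the book actually written by someone named Brown.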

A Web designer/developer reaps several benefits from XML as well. For example, if you maintain a site that sells widgets, and inflation kicks in, raising prices by $1.50 per widget is easy to implement across an entire site. No late nights spent changing each page, because you’re working with intelligent data now.

The benefits of XML are endless. Trust us, large corporations have good reasons for using XML-based markup languages to maintain their intranets.
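Here is a rough sketch of that widget price increase in Python. The Catalog, Widget, Name, and Price element names, the sample products, and the raise_prices function are all hypothetical; a real site would have its own markup and would write the result back to a file or database.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical widget catalog; the structure is assumed for illustration.
CATALOG_XML = """
<Catalog>
  <Widget><Name>Sprocket</Name><Price>10.00</Price></Widget>
  <Widget><Name>Gizmo</Name><Price>24.50</Price></Widget>
</Catalog>
"""


def raise_prices(xml_text, increase):
    """Add the increase to every Price element and return the tree root."""
    root = ET.fromstring(xml_text)
    for price in root.iter("Price"):
        # Decimal avoids the rounding surprises of binary floats for money.
        price.text = str(Decimal(price.text) + increase)
    return root


catalog = raise_prices(CATALOG_XML, Decimal("1.50"))
new_prices = [price.text for price in catalog.iter("Price")]
print(new_prices)
```

Because every price is labeled as a Price, one loop updates them all, no matter how many pages are later generated from the data.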

XHTML Makes the Move to XML Syntax

XHTML is the successor to HTML 4.01; in effect, it’s the final version of HTML.

This progress was needed; HTML has some limitations (spelled out, appropriately enough, in “The limits of HTML,” earlier in the chapter). XHTML is a clever reformulation of HTML 4 as an application of XML 1.0. By definition, this arrangement offers you the benefits of XML in any XHTML documents. If you’re familiar with HTML 4.01, XHTML 1.0 won’t seem all that revolutionary as long as you don’t let the new acronym scare you. (If it helps, think of XHTML as HTML 4.01 with a facelift.)

We’re focusing on XHTML version 1.0 in this section. XHTML 1.0 documents are compatible with current browsers and closely resemble HTML documents. Other forms of XHTML have already been developed, though. XHTML modules have been created to allow you to create documents with subsets of XHTML. These modules are described in the document “Modularization of XHTML”, available at www.w3.org/TR/2004/WD-xhtml-modularization-20040218/. XHTML Basic, XHTML 1.1, and XHTML 2.0 (currently a W3C Working Draft specification) are all built from these modules. These versions of XHTML offer expanded XML features and aren’t as similar to HTML as XHTML 1.0. Details on the newer versions of XHTML would take up too much space in this book about XML, but you can get more information about them in Beginning Web Programming with HTML, XHTML, and CSS, by Jon Duckett (also published by Wiley).

Why do we like XHTML? Let us count the ways:


XHTML documents can be viewed, edited, and validated using XML tools.

Well-formed XHTML documents mean better-structured documents. This improvement and uniformity in structure makes it easier to interface XHTML documents with databases and with other highly structured documents.

XHTML documents can be delivered using different Internet media types and output devices, such as handheld computers, Internet-enabled cell phones, and specialized browsers such as speech-enabled browsers.

Using valid XHTML gives you the best chance of having your document displayed the way you intend, especially if you’re using a browser that’s compliant with Web standards, such as Internet Explorer 6 or Netscape Navigator (versions 6 and later).

Web standards are the specifications created by the W3C and other standards organizations for Web content. The XML and XHTML specifications are part of these standards. For more information, see www.webstandards.org.

Making the switch

Making the switch from HTML to XHTML means mastering the rules of XHTML — in particular, XML syntax and structure. Mastering XHTML is a jump-start to finding out how to create XML documents (as described in Chapter 5).

You have only a few major rules to get under your belt, but you have to follow them if you want to create a valid XHTML document. Here they are in a nutshell:

Every tag in an XHTML document must be closed.

Empty elements (elements without content, such as a br tag) must be correctly formatted with a closing slash. For example, a break tag is formatted <br />.

All tags must be nested correctly: the tag you open last must be the tag you close first.

All XHTML tags must be written using only lowercase.

All attribute values must be put in quotation marks.

These requirements are pretty straightforward, even though they aren’t strictly necessary in HTML. If you want to transform an HTML document into an acceptable XHTML document, you have to keep in mind that the document has to work with an XML-based application (namely XHTML).


An acceptable XML document must be well formed (which means it follows the rules of XML syntax). XML won’t let you use HTML shortcuts and is unforgiving of syntax errors. Although they impose a bit more fuss and bother, these requirements make XML and XHTML documents easy for computers to read and digest. For these reasons, each requirement is worth a closer look, coming right up in the next several sections.
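If you want to see how unforgiving an XML parser is, the following Python sketch feeds it one HTML-style shortcut and one XHTML-style equivalent. The is_well_formed function is our own helper, built on the standard library's xml.etree.ElementTree; note that it checks only well-formedness, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML lets you leave off the closing </li> tags; XML does not.
html_shortcut = "<ul><li>one<li>two</ul>"
xhtml_version = "<ul><li>one</li><li>two</li></ul>"

shortcut_ok = is_well_formed(html_shortcut)
xhtml_ok = is_well_formed(xhtml_version)
print(shortcut_ok, xhtml_ok)
```

A browser would happily display both lists; the XML parser rejects the first one outright.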

Every element must be closed

The first XHTML requirement is that all nonempty elements (that is, those that contain actual text) must have a start tag and an end tag. Several HTML elements, such as p and li, act as text containers but don’t explicitly require end tags. Those tags work in nonpair containers because a special SGML shortcut allows the HTML parser to assume an end tag must occur before another start tag for the same element. So in effect

<p>text text text

is the same as

<p>text text text</p>

. . . at least, that’s the case in HTML. Doing without an end tag just doesn’t fly in XHTML. You have to add closing paragraph tags where they belong if you want the resulting lines to work right.

You also need to add ending list item tags (</li>) if you have list items, whether those lists are numbered or bulleted.

Empty elements must be formatted correctly

The XML way to format empty elements may seem a bit strange, but remember that all nonempty XML elements must use both a start tag and an end tag to be correct.

An empty element is a singleton tag (also called an empty tag) that hangs around by itself. (A nonempty element sandwiches some text between a pair of tags.) Empty tags in HTML include the <br>, <hr>, and <img> tags.


The hr element looks like this in HTML:

<hr>

In XHTML, it looks like this:

<hr />

For backward compatibility with older browsers, you have to put a space in front of the closing slash that occurs near the end of each XHTML empty element (like this: <br />). Doing so permits older, XML-ignorant Web browsers to recognize empty elements. When native XML markup is used (or, more to the point, interpreted properly), that extra space isn’t necessary.

If you validate your XHTML document, you’ll find that you can either include the space before the closing slash or not — your document will validate in either case. The extra space is a browser issue, not a validation issue.
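You can confirm that the space is purely cosmetic to an XML parser with a short Python sketch. It uses only the standard library's xml.etree.ElementTree; the variable names are ours. As it happens, ElementTree itself writes empty elements with the space when it serializes them.

```python
import xml.etree.ElementTree as ET

# Parse the same paragraph with and without the space before the slash.
with_space = ET.fromstring("<p>line one<br />line two</p>")
without_space = ET.fromstring("<p>line one<br/>line two</p>")

# Both spellings produce the same tree, so they serialize identically.
same_tree = ET.tostring(with_space) == ET.tostring(without_space)

# ElementTree writes an empty element in the browser-friendly form.
serialized = ET.tostring(ET.Element("hr"), encoding="unicode")

# The HTML spelling <hr> is an unclosed element as far as XML is concerned.
try:
    ET.fromstring("<p>above<hr>below</p>")
    html_style_ok = True
except ET.ParseError:
    html_style_ok = False

print(same_tree, serialized, html_style_ok)
```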

Tags must be properly nested

The rules of XHTML syntax say that tags must be nested in the correct order.

The rule is always to close first what you opened last, working your way from the inside to the outside tags. So even though this HTML markup

<p>This book was written by <i><b>Dan Brown</i></b>.

looks fine in a browser, the line is technically ill formed. To make it a well-formed line, you must make these changes:

<p>This book was written by <i><b>Dan Brown</b></i>.</p>

You probably noticed that we also added a closing paragraph tag (</p>) to follow the previous rule about closing all elements.
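An XML parser not only refuses badly nested markup, it can also tell you where the trouble starts. The Python sketch below runs both versions of the line through the standard library's xml.etree.ElementTree; the syntax_error_position function is a hypothetical helper of our own, and the exact column the parser reports may vary, so we print it rather than promise a value.

```python
import xml.etree.ElementTree as ET


def syntax_error_position(markup):
    """Return (line, column) of the first syntax error, or None if none."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None


badly_nested = "<p>This book was written by <i><b>Dan Brown</i></b>.</p>"
well_nested = "<p>This book was written by <i><b>Dan Brown</b></i>.</p>"

bad_position = syntax_error_position(badly_nested)
good_position = syntax_error_position(well_nested)
print(bad_position, good_position)
```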

Case makes a difference

HTML is not case sensitive; XHTML is. When you use HTML, it doesn’t matter what case you use for elements and attributes. For example, for the opening body tag, you can use <BODY>, <body>, or even <Body>, and they all work fine. In fact, you can even mix case between opening and closing tags (for example, <BODY> and </body>, respectively), and it won’t make a difference in a browser.


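XHTML, on the other hand, is a bit more finicky about case. All XHTML elements and attribute names must be in lowercase or your page won’t validate. You have more freedom with the value of an attribute when that value is free text, such as the text in an alt attribute. Predefined (enumerated) values are a different story: XHTML 1.0 defines them in lowercase, so write <h3 align="center"> rather than <h3 align="CENTER">, which may display fine in a browser but isn’t valid XHTML.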
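Because XML treats <H3> and <h3> as two different names, a small script can hunt down the uppercase stragglers in converted HTML. This Python sketch is one way to do it with the standard library's xml.etree.ElementTree; the non_lowercase_names function and the sample markup are our own assumptions, and the check covers element and attribute names only, not attribute values.

```python
import xml.etree.ElementTree as ET


def non_lowercase_names(markup):
    """List element and attribute names that are not entirely lowercase."""
    root = ET.fromstring(markup)
    offenders = []
    for element in root.iter():
        if element.tag != element.tag.lower():
            offenders.append(element.tag)
        for name in element.attrib:
            if name != name.lower():
                offenders.append(name)
    return offenders


offenders = non_lowercase_names('<body><H3 ALIGN="center">Hi</H3></body>')

# Mixed case between opening and closing tags is not even well formed.
try:
    ET.fromstring("<BODY></body>")
    mixed_case_ok = True
except ET.ParseError:
    mixed_case_ok = False

print(offenders, mixed_case_ok)
```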

Attribute values are in quotation marks

HTML requires that only some attribute values, such as text strings and URLs, be in quotation marks. Other values, such as image dimensions and font sizes, produce the desired results whether quoted or not. In XHTML, all attribute values must be in quotation marks.

The following markup works just fine on an HTML page:

<tr align=right>

If you want to create valid XHTML, however, you have to add quotation marks around the attribute value, like this:

<tr align="right">

Both single quotation marks (') and double quotation marks (") are legal in XML and XHTML. You see double quotation marks used more, mostly because they’re easier to see.
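The quoting rule is easy to try out for yourself. In this Python sketch, which again leans on the standard library's xml.etree.ElementTree, the same attribute is written three ways; the variable names are ours.

```python
import xml.etree.ElementTree as ET

# Single and double quotation marks give the same attribute value.
double_quoted = ET.fromstring('<tr align="right" />')
single_quoted = ET.fromstring("<tr align='right' />")
same_value = double_quoted.get("align") == single_quoted.get("align")

# Leaving the quotation marks off is a syntax error in XML.
try:
    ET.fromstring("<tr align=right />")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(same_value, unquoted_ok)
```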

Table 4-1 highlights the major rules for XHTML syntax and shows how markup looks in HTML and XHTML. Some additional rules are highlighted in Chapter 5 when you create an XML document. If you’d like to see the full set of rules, check out the “HTML Compatibility Guidelines” document in the XHTML 1.0 specification at

www.w3.org/TR/xhtml1/#guidelines

Table 4-1 Correct Formats for XHTML versus HTML

Rule                            Looks Like This in XHTML                 Looks Like This in HTML
XHTML tags must be written      <b>This text is bold</b>                 <B>This text is bold</B>
using only lowercase.
Empty tags in XHTML must        <img src="goblin.gif" alt="goblin" />    <img src="goblin.gif" alt="goblin">
include a trailing " />" or     <br />                                   <br>
"/>" in all empty elements.
