source: draft-ietf-iri-3987bis/draft-ietf-iri-comparison.xml @ 95

Last change on this file since 95 was 95, checked in by masinter@…, 8 years ago

remove todo list from doc, submitting tix, fix other minor issues

1<?xml version="1.0"?>
2<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
3<!ENTITY rfc2045 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2045.xml">
4<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
5<!ENTITY rfc2130 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2130.xml">
6<!ENTITY rfc2616 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2616.xml">
7<!ENTITY rfc3490 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3490.xml">
8<!ENTITY rfc3491 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3491.xml">
9<!ENTITY rfc3629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml">
10<!ENTITY rfc3986 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3986.xml">
11<!ENTITY rfc3987 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3987.xml">
12<!ENTITY rfc5890 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5890.xml">
13]>
14<?rfc strict='yes'?>
15
16<?xml-stylesheet type='text/css' href='rfc2629.css' ?>
17<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
18<?rfc symrefs='yes'?>
19<?rfc sortrefs='yes'?>
20<?rfc iprnotified="no" ?>
21<?rfc toc='yes'?>
22<?rfc compact='yes'?>
23<?rfc subcompact='no'?>
24<rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-comparison-01" category="std" xml:lang="en">
25<front>
26<title abbrev="IRI Equivalence">Equivalence and Canonicalization of Internationalized Resource Identifiers (IRIs)</title>
27
28<author initials="L." surname="Masinter" fullname="Larry Masinter">
29   <organization>Adobe</organization>
30   <address>
31   <postal>
32   <street>345 Park Ave</street>
33   <city>San Jose</city>
34   <region>CA</region>
35   <code>95110</code>
36   <country>U.S.A.</country>
37   </postal>
38   <phone>+1-408-536-3024</phone>
39   <email>masinter@adobe.com</email>
40   <uri>http://larry.masinter.net</uri>
41   </address>
42</author>
43
44  <author initials="M.J." surname="Duerst" fullname='Martin Duerst'>
45    <!-- (Note: Please write "Duerst" with u-umlaut wherever
46      possible, for example as "D&#252;rst" in XML and HTML) -->
47  <organization abbrev="Aoyama Gakuin University">Aoyama Gakuin University</organization>
48  <address>
49  <postal>
50  <street>5-10-1 Fuchinobe</street>
51  <city>Sagamihara</city>
52  <region>Kanagawa</region>
53  <code>229-8558</code>
54  <country>Japan</country>
55  </postal>
56  <phone>+81 42 759 6329</phone>
57  <facsimile>+81 42 759 6495</facsimile>
58  <email>duerst@it.aoyama.ac.jp</email>
59  <uri>http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/<!-- (Note: This is the percent-encoded form of an IRI)--></uri>
60  </address>
61</author>
62
63
64<date year="2011" />
65<area>Applications</area>
66<workgroup>Internationalized Resource Identifiers (iri)</workgroup>
67<keyword>IRI</keyword>
68<keyword>Internationalized Resource Identifier</keyword>
69<keyword>UTF-8</keyword>
70<keyword>URI</keyword>
71<keyword>URL</keyword>
72<keyword>IDN</keyword>
73<keyword>Normalization</keyword>
74<keyword>Canonicalization</keyword>
75<abstract>
76<t>Internationalized Resource Identifiers (IRIs) are Unicode strings
77used to identify resources on the Internet. Applications that use
78IRIs often define a means of comparing two IRIs to determine
79when two IRIs are equivalent for the purpose of that
80application. Some applications also define a method
81for 'canonicalizing' or 'normalizing' an IRI -- translating one
82IRI into another which is equivalent under the comparison
83method used.</t>
84<t>This document gives guidelines and best practices for defining
85and using IRI comparison, equivalence, normalization and canonicalization
86methods.</t>
87</abstract>
88
89</front>
90<middle>
91
92<section title="Introduction">
93
94<t>Internationalized Resource Identifiers (IRIs) are Unicode strings
95used to identify resources on the Internet. Applications that use
96IRIs often define a means of comparing two IRIs to determine
97when two IRIs are equivalent for the purpose of that
98application. Some applications also define a method
99for 'canonicalizing' or 'normalizing' an IRI -- translating one
100IRI into another which is equivalent under the comparison
101method used.</t>
102<t>This document gives guidelines and best practices for defining
103and using IRI comparison, equivalence, normalization and canonicalization
104methods.</t>
105
106<t>One of the most common operations on IRIs is simple comparison:
107determining whether two IRIs are equivalent, without using the IRIs to
108access their respective resource(s). A comparison is performed
109whenever a response cache is accessed, a browser checks its history to
110color a link, or an XML parser processes tags within a
111namespace. Extensive normalization prior to comparison of IRIs may be
112used by spiders and indexing engines to prune a search space or reduce
113duplication of request actions and response storage.</t>
114
115<t>IRI comparison is performed for some particular purpose. Protocols
116or implementations that compare IRIs for different purposes will often
117be subject to differing design trade-offs with regard to how much
118effort should be spent in reducing aliased identifiers. This document
119describes various methods that may be used to compare IRIs, the
120trade-offs between them, and the types of applications that might use
121them.</t>
122
123</section> <!-- introduction -->
124
125<section title="Equivalence">
126
127<t>Because IRIs exist to identify resources, presumably they should be
128considered equivalent when they identify the same resource. However,
129this definition of equivalence is not of much practical use, as there
130is no way for an implementation to compare two resources to determine
131if they are "the same" unless it has full knowledge or control of
132them. For this reason, determination of equivalence or difference of
133IRIs is based on string comparison, augmented by reference to
134additional rules provided by scheme definitions.  We use the terms
135"different" and "equivalent" to describe the possible outcomes of such
136comparisons, but there are many application-dependent versions of
137equivalence.</t>
138
139<t>Even when it is possible to determine that two IRIs are equivalent,
140IRI comparison is not sufficient to determine whether two IRIs
141identify different resources. For example, an owner of two different
142domain names could decide to serve the same resource from both,
143resulting in two different IRIs. Therefore, comparison methods are
144designed to minimize false negatives while strictly avoiding false
145positives.</t>
146
147<t>In testing for equivalence, applications should not directly
148compare relative references; the references should be converted to
149their respective target IRIs before comparison. When IRIs are compared
150to select (or avoid) a network action, such as retrieval of a
151representation, fragment components (if any) MUST be excluded from
152the comparison.</t>
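
<t>As a non-normative illustration, the two steps above (resolve the
reference first, then exclude the fragment) might be sketched in Python
as follows. The function name "retrieval_key" and the example base are
hypothetical, and the sketch assumes that the URI-oriented routines in the
standard "urllib.parse" module are acceptable for the IRIs at hand.</t>

<figure><artwork><![CDATA[
```python
from urllib.parse import urljoin, urldefrag

def retrieval_key(reference, base):
    """Resolve a (possibly relative) reference and drop its fragment."""
    target = urljoin(base, reference)   # relative reference -> target IRI
    return urldefrag(target).url        # fragment excluded from comparison

# Hypothetical base IRI, used only for this illustration.
base = "http://example.org/a/b"
same = (retrieval_key("c#top", base)
        == retrieval_key("http://example.org/a/c#end", base))
```
]]></artwork></figure>

<t>Here both references yield the same key, so a cache or retrieval layer
would treat them as naming the same network action. Note that "urljoin"
also removes dot-segments as part of resolution.</t>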
153
154<t>Applications using IRIs as identity tokens with no relationship to
155a protocol MUST use the Simple String Comparison (see <xref
156target="stringcomp"></xref>).  All other applications MUST select one
157of the comparison practices from the Comparison Ladder (see <xref
158target="ladder"></xref>).</t>
159</section> <!-- equivalence -->
160
161<section title="Comparison, Equivalence, Normalization and Canonicalization">
162
163<t>In general, when considering a set of items or strings, there are several
164interrelated concepts. A comparison method determines, between two items in the
165set, their relationship. In particular, a comparison method for determining
166equivalence might result in a determination that two (different) items are equivalent,
167known to be different, or that equivalence isn't determined.  </t>
168<t> One way to define a comparison for equivalence is to define
169a normalization or canonicalization algorithm. Within each set
170of equivalent items, one item could be designated the "normal" or
171"canonical" form. </t>
172
173<t>These general concepts are used with IRIs in this document,
174and in other circumstances, where a mapping from one sequence of Unicode
175characters to another one could be described as a "normalization" algorithm.</t>
176<t> In general, this document tries to stay with the "equivalence" or
177"comparison" methods, because sometimes the mathematical notion of
178"normalization" results in forms that ordinary users might not consider "normal"
179in an ordinary sense.
180 </t>
181</section>
182
183<section title="Preparation for Comparison">
184<t>Any kind of IRI comparison REQUIRES that any additional contextual
185processing is first performed, including undoing higher-level
186escapings or encodings in the protocol or format that carries an
187IRI. This preprocessing is usually done when the protocol or format is
188parsed.</t>
189
190<t>Examples of such escapings or encodings are entities and
191numeric character references in <xref target="HTML4"></xref> and <xref
192target="XML1"></xref>. As an example,
193"http://example.org/ros&amp;eacute;" (in HTML),
194"http://example.org/ros&amp;#233;" (in HTML or XML), and
195<vspace/>"http://example.org/ros&amp;#xE9;" (in HTML or XML) are all
196resolved into what is denoted in this document (see 'Notation' section of <xref
197target="RFC3987bis" />) as "http://example.org/ros&amp;#xE9;"
198(the "&amp;#xE9;" here standing for the actual e-acute character, to
199compensate for the fact that this document cannot contain non-ASCII
200characters).</t>
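
<t>A minimal Python sketch of this preprocessing step for the HTML case
is given below. It assumes that the carrying format is HTML and that the
standard library function "html.unescape" is an adequate stand-in for a
real HTML parser; the variable names are illustrative only.</t>

<figure><artwork><![CDATA[
```python
import html

# Three ways the same IRI might appear inside HTML source text.
forms = [
    "http://example.org/ros&eacute;",
    "http://example.org/ros&#233;",
    "http://example.org/ros&#xE9;",
]

# Undo the character references before any IRI comparison takes place.
decoded = {html.unescape(form) for form in forms}
```
]]></artwork></figure>

<t>After unescaping, the set holds a single string ending in U+00E9, so
all three spellings compare equal under any method described in this
document.</t>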
201
202<t>Similar considerations apply to encodings such as Transfer Codings
203in HTTP (see <xref target="RFC2616"></xref>) and Content Transfer
204Encodings in MIME (<xref target="RFC2045"></xref>), although in these
205cases, the encoding is based not on characters but on octets, and
206additional care is required to make sure that characters, and not just
207arbitrary octets, are compared (see <xref
208target="stringcomp"></xref>).</t>
209
210</section> <!-- preparation -->
211
212<section title="Comparison Ladder" anchor="ladder">
213
214<t>In practice, a variety of methods are used to test IRI
215equivalence. These methods fall into a range distinguished by the
216amount of processing required and the degree to which the probability
217of false negatives is reduced. As noted above, false negatives cannot
218be eliminated. In practice, their probability can be reduced, but this
219reduction requires more processing and is not cost-effective for all
220applications.</t>
221
222
223<t>If this range of comparison practices is considered as a ladder,
224the following discussion will climb the ladder, starting with
225practices that are cheap but have a relatively higher chance of
226producing false negatives, and proceeding to those that have higher
227computational cost and lower risk of false negatives.</t>
228
229<section title="Simple String Comparison" anchor="stringcomp">
230
231<t>If two IRIs, when considered as character strings, are identical,
232then it is safe to conclude that they are equivalent.  This type of
233equivalence test has very low computational cost and is in wide use in
234a variety of applications, particularly in the domain of parsing. It
235is also used when a definitive answer to the question of IRI
236equivalence is needed that is independent of the scheme used and that
237can be calculated quickly and without accessing a network. An example
238of such a case is XML Namespaces (<xref
239target="XMLNamespace"></xref>).</t>
240
241
242<t>Testing strings for equivalence requires some basic precautions.
243This procedure is often referred to as "bit-for-bit" or
244"byte-for-byte" comparison, which is potentially misleading. Testing
245strings for equality is normally based on pair comparison of the
246characters that make up the strings, starting from the first and
247proceeding until both strings are exhausted and all characters are
248found to be equal, until a pair of characters compares unequal, or
249until one of the strings is exhausted before the other.</t>
250
251<t>This character comparison requires that each pair of characters be
252put in comparable encoding form. For example, should one IRI be stored
253in a byte array in UTF-8 encoding form and the second in a UTF-16
254encoding form, bit-for-bit comparisons applied naively will produce
255errors. It is better to speak of equality on a character-for-character
256rather than on a byte-for-byte or bit-for-bit basis.  In practical
257terms, character-by-character comparisons should be done codepoint by
258codepoint after conversion to a common character encoding form.</t>
259
260<t>When comparing character by character, the comparison function MUST
261NOT map IRIs to URIs, because such a mapping would create additional
262spurious equivalences. It follows that an IRI SHOULD NOT be modified
263when being transported if there is any chance that this IRI might be
264used in a context that uses Simple String Comparison.</t>
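
<t>The difference between byte-for-byte and character-for-character
comparison can be illustrated with the following Python sketch. The
function name "same_characters" is hypothetical, and the sketch assumes
that the encoding form of each stored IRI is known to the caller.</t>

<figure><artwork><![CDATA[
```python
def same_characters(a, a_encoding, b, b_encoding):
    """Compare two stored IRIs code point by code point."""
    # Decoding brings both inputs into one common form (Python str);
    # str equality then compares code points, not octets.
    return a.decode(a_encoding) == b.decode(b_encoding)

iri = "http://example.org/ros\u00e9"
as_utf8 = iri.encode("utf-8")
as_utf16 = iri.encode("utf-16-le")

naive = as_utf8 == as_utf16            # octet comparison
proper = same_characters(as_utf8, "utf-8", as_utf16, "utf-16-le")
```
]]></artwork></figure>

<t>The naive octet comparison reports a difference, while the
character-based comparison reports equality. No mapping to a URI and no
normalization is applied at any point.</t>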
265
266
267<t>False negatives are caused by the production and use of IRI
268aliases. Unnecessary aliases can be reduced, regardless of the
269comparison method, by consistently providing IRI references in an
270already normalized form (i.e., a form identical to what would be
271produced after normalization is applied, as described below).
272Protocols and data formats often limit some IRI comparisons to simple
273string comparison, based on the theory that people and implementations
274will, in their own best interest, be consistent in providing IRI
275references, or at least be consistent enough to negate any efficiency
276that might be obtained from further normalization.</t>
277</section> <!-- stringcomp -->
278
279<section title="Syntax-Based Equivalence">
280
281<figure><preamble>Implementations may use logic based on the
282definitions provided by this specification to reduce the probability
283of false negatives. This processing is moderately higher in cost than
284character-for-character string comparison. For example, an application
285using this approach could reasonably consider the following two IRIs
286equivalent:</preamble>
287
288<artwork>
289   example://a/b/c/%7Bfoo%7D/ros&amp;#xE9;
290   eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9
291</artwork></figure>
292
293<t>Web user agents, such as browsers, typically apply this type of IRI
294normalization when determining whether a cached response is
295available. Syntax-based normalization includes such techniques as case
296normalization, character normalization, percent-encoding
297normalization, and removal of dot-segments.</t>
298
299<section title="Case Equivalence">
300
301<t>For all IRIs, the hexadecimal digits within a percent-encoding
302triplet (e.g., "%3a" versus "%3A") are case-insensitive; a triplet
303written with lowercase digits should therefore be considered
304equivalent to the form that uses uppercase letters for the digits A-F.</t>
305
306<t>When an IRI uses components of the generic syntax, the component
307syntax equivalence rules always apply; namely, that the scheme and
308US-ASCII-only host are case insensitive and therefore should be
309normalized to lowercase. For example, the URI
310"HTTP://www.EXAMPLE.com/" is equivalent to
311"http://www.example.com/". Case equivalence for non-ASCII characters
312in IRI components that are IDNs is discussed in <xref
313target="schemecomp"></xref>.  The other generic syntax components are
314assumed to be case sensitive unless specifically defined otherwise by
315the scheme.</t>
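
<t>The case rules above could be sketched in Python roughly as follows.
The function name "normalize_case" is hypothetical. The sketch splits the
IRI with the regular expression from Appendix B of RFC 3986, lowercases
the scheme and an ASCII-only host, and uppercases the hexadecimal digits
of percent-encoding triplets; it deliberately leaves non-ASCII hosts and
all other components untouched.</t>

<figure><artwork><![CDATA[
```python
import re

# Generic-syntax splitting expression from RFC 3986, Appendix B.
SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")

def normalize_case(iri):
    m = SPLIT.match(iri)
    scheme_part = (m.group(1) or "").lower()      # "scheme:" or ""
    authority_part = m.group(3) or ""             # "//authority" or ""
    tail = (m.group(5) or "") + (m.group(6) or "") + (m.group(8) or "")
    if authority_part:
        userinfo, at, hostport = authority_part[2:].rpartition("@")
        if hostport.isascii():                    # leave IDNs alone
            hostport = hostport.lower()
        authority_part = "//" + userinfo + at + hostport
    result = scheme_part + authority_part + tail
    # Hexadecimal digits in percent-encodings are normalized to uppercase.
    return PCT_TRIPLET.sub(lambda t: t.group(0).upper(), result)
```
]]></artwork></figure>

<t>For instance, normalize_case("HTTP://www.EXAMPLE.com/") yields
"http://www.example.com/", while the path, query, fragment and userinfo
keep their original case.</t>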
316
317<t>Creating schemes that allow case-insensitive syntax components
318containing non-ASCII characters should be avoided. Case normalization
319of non-ASCII characters can be culturally dependent and is always a
320complex operation. The only exception concerns non-ASCII host names
321for which the character normalization includes a mapping step derived
322from case folding.</t>
323
324</section> <!-- casenorm -->
325
326<section title="Unicode Character Normalization" anchor="normalization">
327
328<t>The Unicode Standard <xref target="UNIV6"></xref> defines various
329equivalences between sequences of characters for various
330purposes. Unicode Standard Annex #15 <xref target="UTR15"></xref>
331defines various Normalization Forms for these equivalences, in
332particular Normalization Form C (NFC, Canonical Decomposition,
333followed by Canonical Composition) and Normalization Form KC (NFKC,
334Compatibility Decomposition, followed by Canonical Composition).</t>
335
336<t> IRIs already in Unicode MUST NOT be normalized before parsing or
337interpreting. In many non-Unicode character encodings, some text
338cannot be represented directly. For example, the word "Vietnam" is
339natively written "Vi&amp;#x1EC7;t Nam" (containing a LATIN SMALL
340LETTER E WITH CIRCUMFLEX AND DOT BELOW) in NFC, but a direct
341transcoding from the windows-1258 character encoding leads to
342"Vi&amp;#xEA;&amp;#x323;t Nam" (containing a LATIN SMALL LETTER E WITH
343CIRCUMFLEX followed by a COMBINING DOT BELOW). Direct transcoding of
344other 8-bit encodings of Vietnamese may lead to other
345representations.</t>
346
347<t>Equivalence of IRIs MUST rely on the assumption that IRIs are
348appropriately pre-character-normalized rather than apply character
349normalization when comparing two IRIs. The exceptions are conversion
350from a non-digital form, and conversion from a non-UCS-based character
351encoding to a UCS-based character encoding. In these cases, NFC or a
352normalizing transcoder using NFC MUST be used for interoperability. To
353avoid false negatives and problems with transcoding, IRIs SHOULD be
354created by using NFC. Using NFKC may avoid even more problems; for
355example, by choosing half-width Latin letters instead of full-width
356ones, and full-width instead of half-width Katakana.</t>
357
358
359<t>As an example,
360"http://www.example.org/r&amp;#xE9;sum&amp;#xE9;.html" (in XML
361Notation) is in NFC. On the other hand,
362"http://www.example.org/re&amp;#x301;sume&amp;#x301;.html" is not in
363NFC.</t>
364
365<t>The former uses precombined e-acute characters, and the latter uses
366"e" characters followed by combining acute accents. Both usages are
367defined as canonically equivalent in <xref target="UNIV6"></xref>.</t>
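
<t>The distinction can be checked with Python's standard "unicodedata"
module, as in the sketch below. The helper name "is_nfc" is hypothetical.
The sketch is meant for the point where an IRI is created or transcoded;
consistent with the rules above, it is not a step to apply when comparing
two existing IRIs.</t>

<figure><artwork><![CDATA[
```python
import unicodedata

def is_nfc(text):
    """True if the text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

precomposed = "http://www.example.org/r\u00e9sum\u00e9.html"
combining = "http://www.example.org/re\u0301sume\u0301.html"

# A normalizing transcoder would emit the NFC form at creation time.
created = unicodedata.normalize("NFC", combining)

# Result of a direct windows-1258 transcoding, as in the example above.
transcoded = "Vi\u00ea\u0323t Nam"
vietnam_nfc = unicodedata.normalize("NFC", transcoded)
```
]]></artwork></figure>

<t>The two example IRIs remain different under Simple String Comparison;
only the string produced at creation time ("created") is identical to the
precomposed form.</t>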
368
369<t><list style="hanging">
370
371<t hangText="Note:">
372Because it is unknown how a particular sequence of characters is being
373treated with respect to character normalization, it would be
374inappropriate to allow third parties to normalize an IRI
375arbitrarily. This does not contradict the recommendation that when a
376resource is created, its IRI should be as character normalized as
377possible (i.e., NFC or even NFKC). This is similar to the
378uppercase/lowercase problems.  Some parts of a URI are case
379insensitive (for example, the domain name). For others, it is unclear
380whether they are case sensitive, case insensitive, or something in
381between (e.g., case sensitive, but with a multiple choice selection if
382the wrong case is used, instead of a direct negative result).  The
383best recipe is that the creator use a reasonable capitalization and,
384when transferring the URI, capitalization never be
385changed.</t></list></t>
386
387<t>Various IRI schemes may allow the usage of Internationalized Domain
388Names (IDN) <xref target="RFC5890"/> either in the ireg-name
389part or elsewhere. Character Normalization also applies to IDNs, as
390discussed in <xref target="schemecomp"/>.</t>
391</section> <!-- charnorm -->
392
393<section title="Percent-Encoding Equivalence">
394
395<t>The percent-encoding mechanism (Section 2.1 of <xref
396target="RFC3986"></xref>) is a frequent source of variance among
397otherwise identical IRIs. In addition to the case normalization issue
398noted above, some IRI producers percent-encode octets that do not
399require percent-encoding, resulting in IRIs that are equivalent to
400their nonencoded counterparts. These IRIs should be normalized by
401decoding any percent-encoded octet sequence that corresponds to an
402unreserved character, as described in section 2.3 of <xref
403target="RFC3986"></xref>.</t>
404
405<t>For actual resolution, differences in percent-encoding (except for
406the percent-encoding of reserved characters) MUST always result in the
407same resource.  For example, "http://example.org/~user",
408"http://example.org/%7euser", and "http://example.org/%7Euser", must
409resolve to the same resource.</t>
410
411<t>If this kind of equivalence is to be tested, the percent-encoding
412of both IRIs to be compared has to be aligned; for example, by
413converting both IRIs to URIs, eliminating escape
414differences in the resulting URIs, and making sure that the case of
415the hexadecimal characters in the percent-encoding is always the same
416(preferably upper case). If the IRI is to be passed to another
417application or used further in some other way, its original form MUST
418be preserved.  The conversion described here should be performed only
419for local comparison.</t>
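
<t>One possible way to align percent-encoding for a local comparison is
sketched below in Python. The names "to_uri_form" and "comparison_key"
are hypothetical. As a simplification, the sketch percent-encodes every
non-ASCII character as UTF-8 (including any in the host, which a full
IRI-to-URI mapping would treat differently), decodes triplets that stand
for unreserved characters, and uppercases the remaining triplets. The
returned key is for comparison only; the original IRI is left as it
was.</t>

<figure><artwork><![CDATA[
```python
import re

UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")

def to_uri_form(iri):
    """Percent-encode non-ASCII characters as UTF-8 octets."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)

def _align(match):
    ch = chr(int(match.group(1), 16))
    # Unreserved characters never need encoding; everything else stays
    # encoded, with uppercase hexadecimal digits.
    return ch if ch in UNRESERVED else match.group(0).upper()

def comparison_key(iri):
    return PCT_TRIPLET.sub(_align, to_uri_form(iri))
```
]]></artwork></figure>

<t>With this key, "http://example.org/~user", "http://example.org/%7euser"
and "http://example.org/%7Euser" all map to the same string, while an
encoded reserved character such as "%2f" stays encoded.</t>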
420
421</section> <!-- pctnorm -->
422
423<section title="Path Segment Equivalence">
424
425<t>The complete path segments "." and ".." are intended only for use
426within relative references (Section 4.1 of <xref
427target="RFC3986"></xref>) and are removed as part of the reference
428resolution process (Section 5.2 of <xref target="RFC3986"></xref>).
429However, some implementations may incorrectly assume that reference
430resolution is not necessary when the reference is already an IRI, and
431thus fail to remove dot-segments when they occur in non-relative
432paths.  IRI normalizers should remove dot-segments by applying the
433remove_dot_segments algorithm to the path, as described in Section
4345.2.4 of <xref target="RFC3986"></xref>.</t>
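
<t>For illustration, the remove_dot_segments algorithm from Section 5.2.4
of RFC 3986 can be transcribed into Python along the following lines.
This is a sketch that follows the published steps; it operates on the
path component only and assumes the caller has already split the IRI.</t>

<figure><artwork><![CDATA[
```python
def remove_dot_segments(path):
    output = []
    while path:
        if path.startswith("../"):          # step A
            path = path[3:]
        elif path.startswith("./"):         # step A
            path = path[2:]
        elif path.startswith("/./"):        # step B
            path = path[2:]
        elif path == "/.":                  # step B
            path = "/"
        elif path.startswith("/../"):       # step C
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":                 # step C
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):           # step D
            path = ""
        else:                               # step E
            # Move the first segment, with its leading "/" if any,
            # from the input to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)
```
]]></artwork></figure>

<t>Applied to the examples given in RFC 3986, "/a/b/c/./../../g" becomes
"/a/g" and "mid/content=5/../6" becomes "mid/6".</t>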
435
436</section> <!-- pathnorm -->
437</section> <!-- syntax-based -->
438
439<section title="Scheme-Based Comparison" anchor="schemecomp">
440
441<t>The syntax and semantics of IRIs vary from scheme to scheme, as
442described by the defining specification for each
443scheme. Implementations may use scheme-specific rules, at further
444processing cost, to reduce the probability of false negatives. For
445example, because the "http" scheme makes use of an authority
446component, has a default port of "80", and defines an empty path to be
447equivalent to "/", the following four IRIs are equivalent:</t>
448
449<figure><artwork>
450   http://example.com
451   http://example.com/
452   http://example.com:/
453   http://example.com:80/</artwork></figure>
454
455<t>In general, an IRI that uses the generic syntax for authority with
456an empty path should be normalized to a path of "/". Likewise, an
457explicit ":port", for which the port is empty or the default for the
458scheme, is equivalent to one where the port and its ":" delimiter are
459elided and thus should be removed by scheme-based normalization. For
460example, the second IRI above is the normal form for the "http"
461scheme.</t>
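
<t>A rough Python sketch of these two "http" rules (empty or default
port, empty path) follows. The function name "normalize_http" is
hypothetical, the sketch handles only the "http" scheme, and it reuses
the splitting expression from Appendix B of RFC 3986 so that delimiters
of empty components are kept exactly as written.</t>

<figure><artwork><![CDATA[
```python
import re

# Generic-syntax splitting expression from RFC 3986, Appendix B.
SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def normalize_http(iri):
    m = SPLIT.match(iri)
    scheme, authority = m.group(2), m.group(4)
    if scheme is None or scheme.lower() != "http" or authority is None:
        return iri                      # not covered by this sketch
    host, colon, port = authority.rpartition(":")
    if colon and port in ("", "80"):    # empty or default port
        authority = host                # drop the port and its ":"
    path = m.group(5) or "/"            # empty path becomes "/"
    query = m.group(6) or ""            # keeps a bare "?" if present
    fragment = m.group(8) or ""
    return m.group(1) + "//" + authority + path + query + fragment

examples = [
    "http://example.com",
    "http://example.com/",
    "http://example.com:/",
    "http://example.com:80/",
]
normalized = {normalize_http(e) for e in examples}
```
]]></artwork></figure>

<t>All four example IRIs collapse to "http://example.com/". An IRI such
as "http://example.com/?" keeps its empty query delimiter and therefore
stays distinct.</t>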
462
463<t>Another case where normalization varies by scheme is in the
464handling of an empty authority component or empty host
465subcomponent. For many scheme specifications, an empty authority or
466host is considered an error; for others, it is considered equivalent
467to "localhost" or the end-user's host. When a scheme defines a default
468for authority and an IRI reference to that default is desired, the
469reference should be normalized to an empty authority for the sake of
470uniformity, brevity, and internationalization. If, however, either the
471userinfo or port subcomponents are non-empty, then the host should be
472given explicitly even if it matches the default.</t>
473
474<t>Normalization should not remove delimiters when their associated
475component is empty unless it is licensed to do so by the scheme
476specification. For example, the IRI "http://example.com/?" cannot be
477assumed to be equivalent to any of the examples above. Likewise, the
478presence or absence of delimiters within a userinfo subcomponent is
479usually significant to its interpretation.  The fragment component is
480not subject to any scheme-based normalization; thus, two IRIs that
481differ only by the suffix "#" are considered different regardless of
482the scheme.</t>
483 
484<t>Some IRI schemes allow the usage of Internationalized Domain
485Names (IDN) <xref target='RFC5890'></xref> either in their ireg-name
486part or elsewhere. When in use in IRIs, those names SHOULD
487conform to the definition of U-Label in <xref
488target='RFC5890'></xref>. An IRI containing an invalid IDN cannot
489successfully be resolved. For legibility purposes, such names
490SHOULD NOT be converted into ASCII Compatible Encoding (ACE).</t>
491
492<t>Scheme-based normalization may also consider IDN
493components and their conversions to Punycode as equivalent. As an
494example, "http://r&amp;#xE9;sum&amp;#xE9;.example.org" may be
495considered equivalent to
496"http://xn--rsum-bpad.example.org".</t><t>Other scheme-specific
497normalizations are possible.</t>
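
<t>The IDN example above can be reproduced with the following Python
sketch. The function names are hypothetical. Note the assumption involved:
Python's built-in "idna" codec implements the older IDNA2003 rules of
RFC 3490 rather than the definitions in RFC 5890, so it serves here only
to illustrate the idea of comparing hosts in their ACE form.</t>

<figure><artwork><![CDATA[
```python
def host_to_ace(host):
    """Convert a host name to its ASCII Compatible Encoding (ACE)."""
    return host.encode("idna").decode("ascii")

def hosts_equivalent(a, b):
    # ASCII labels are case-insensitive, so compare in lowercase.
    return host_to_ace(a).lower() == host_to_ace(b).lower()

ace = host_to_ace("r\u00e9sum\u00e9.example.org")
```
]]></artwork></figure>

<t>The conversion is used for the local comparison only; the IRI itself
keeps its original, more legible host name.</t>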
498
499</section> <!-- schemenorm -->
500
501<section title="Protocol-Based Comparison">
502
503<t>Substantial effort to reduce the incidence of false negatives is
504often cost-effective for web spiders. Consequently, they implement
505even more aggressive techniques in IRI comparison. For example, if
506they observe that an IRI such as</t>
507
508<figure><artwork>
509   http://example.com/data</artwork></figure>
510<t>redirects to an IRI differing only in the trailing slash</t>
511<figure><artwork>
512   http://example.com/data/</artwork></figure>
513
514<t>they will likely regard the two as equivalent in the future.  This
515kind of technique is only appropriate when equivalence is clearly
516indicated by both the result of accessing the resources and the common
517conventions of their scheme's dereference algorithm (in this case, use
518of redirection by HTTP origin servers to avoid problems with relative
519references).</t>
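
<t>As an illustration only, a spider might record such observations in a
small alias table like the Python sketch below. The class and method
names are hypothetical, and the sketch deliberately accepts only the
trailing-slash redirect described above, ignoring every other kind of
redirect.</t>

<figure><artwork><![CDATA[
```python
class RedirectAliases:
    """Remembers observed redirects that only add a trailing slash."""

    def __init__(self):
        self._target = {}

    def observe_redirect(self, source, destination):
        # Only the conventional "add a trailing slash" case is recorded.
        if destination == source + "/":
            self._target[source] = destination

    def canonical(self, iri):
        # Each recorded target is longer than its source, so this
        # loop always terminates.
        while iri in self._target:
            iri = self._target[iri]
        return iri

    def equivalent(self, a, b):
        return self.canonical(a) == self.canonical(b)

aliases = RedirectAliases()
aliases.observe_redirect("http://example.com/data",
                         "http://example.com/data/")
```
]]></artwork></figure>

<t>After the observation, the two example IRIs are treated as equivalent,
while IRIs for which no redirect was seen remain distinct.</t>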
520
521</section> <!-- protonorm -->
522</section> <!-- ladder -->
523
524<section title="Security Considerations" anchor="security">
525<t>The primary security difficulty comes from applications choosing the
526wrong equivalence relationship, or two different parties disagreeing
527on equivalence. This is especially a problem when IRIs are used in
528security protocols.</t>
529
530<t>Besides the large character repertoire of Unicode, reasons for
531  confusion include different forms of normalization and different normalization
532  expectations, use of percent-encoding with various legacy encodings,
533  and bidirectionality issues. See also <xref target='UTR36'/>.</t>
534
535</section><!-- security -->
536
537<section title="Acknowledgements">
538
539<t>This document was originally derived from <xref target="RFC3986"/>
540and <xref target="RFC3987"/>, based on text contributed by Tim
541Bray.</t>
542</section>
543
544</middle>
545
546<back>
547<references title="Normative References">
548
549      <reference anchor="RFC3987bis" 
550         target="http://tools.ietf.org/id/draft-ietf-iri-3987bis">
551         
552          <front>
553            <title>Internationalized Resource Identifiers (IRIs)</title>
554          <author initials="M." surname="Duerst"/>
555          <author initials="L." surname="Masinter" fullname="Larry Masinter"/>
556          <author initials="M." surname="Suignard"/>
557          <date year="2011"/>
558          </front>
559      </reference>
560
561
562&rfc2119;
563&rfc3490;
564&rfc3491;
565&rfc3629;
566&rfc3986;
567&rfc5890;
568
569<reference anchor="UNIV6">
570<front>
571<title>The Unicode Standard, Version 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, ISBN 978-1-936213-01-6)</title>
572<author><organization>The Unicode Consortium</organization></author>
573<date year="2010" month="October"/>
574</front>
575</reference>
576
577
578<reference anchor="UTR15" target="http://www.unicode.org/unicode/reports/tr15/tr15-23.html">
579<front>
580<title>Unicode Normalization Forms</title>
581<author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
582<author initials="M.J." surname="Duerst" fullname="Martin Duerst"><organization/></author>
583<date year="2008" month="March"/>
584</front>
585<seriesInfo name="Unicode Standard Annex" value="#15"/>
586</reference>
587
588</references>
589
590<references title="Informative References">
591
592<reference anchor="HTML4" target="http://www.w3.org/TR/html401/appendix/notes.html#h-B.2">
593<front>
594<title>HTML 4.01 Specification</title>
595<author initials="D." surname="Raggett" fullname="Dave Raggett"><organization/></author>
596<author initials="A." surname="Le Hors" fullname="Arnaud Le Hors"><organization/></author>
597<author initials="I." surname="Jacobs" fullname="Ian Jacobs"><organization/></author>
598<date year="1999" month="December" day="24"/>
599</front>
600<seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
601</reference>
602
603&rfc2045;
604&rfc3987;
605&rfc2616;
606 
607
608<reference anchor="UTR36" target="http://unicode.org/reports/tr36/">
609<front>
610<title>Unicode Security Considerations</title>
611<author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
612<author initials="M." surname="Suignard" fullname="Michel Suignard"><organization/></author>
613<date year="2010" month="August" day="4"/>
614</front>
615<seriesInfo name="Unicode Technical Report" value="#36"/>
616</reference>
617
618<reference anchor="XML1" target="http://www.w3.org/TR/REC-xml">
619  <front>
620    <title>Extensible Markup Language (XML) 1.0 (Fourth Edition)</title>
621    <author initials="T." surname="Bray" fullname="Tim Bray"><organization/></author>
622    <author initials="J." surname="Paoli" fullname="Jean Paoli"><organization/></author>
623    <author initials="C.M." surname="Sperberg-McQueen" fullname="C. M. Sperberg-McQueen">
624      <organization/></author>
625    <author initials="E." surname="Maler" fullname="Eve Maler"><organization/></author>
626    <author initials="F." surname="Yergeau" fullname="Francois Yergeau"><organization/></author>
627    <date day="16" month="August" year="2006"/>
628  </front>
629  <seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
630</reference>
631
632<reference anchor="XMLNamespace" target="http://www.w3.org/TR/REC-xml-names">
633  <front>
634    <title>Namespaces in XML (Second Edition)</title>
635    <author initials="T." surname="Bray" fullname="Tim Bray"><organization/></author>
636    <author initials="D." surname="Hollander" fullname="Dave Hollander"><organization/></author>
637    <author initials="A." surname="Layman" fullname="Andrew Layman"><organization/></author>
638    <author initials="R." surname="Tobin" fullname="Richard Tobin"><organization></organization></author>
639    <date day="16" month="August" year="2006"/>
640  </front>
641  <seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
642</reference>
643
644</references>
645
646</back>
647</rfc>