1<?xml version="1.0"?>
2<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
3<!ENTITY rfc2045 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2045.xml">
4<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
5<!ENTITY rfc2130 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2130.xml">
6<!ENTITY rfc2616 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2616.xml">
7<!ENTITY rfc3490 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3490.xml">
8<!ENTITY rfc3491 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3491.xml">
9<!ENTITY rfc3629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml">
10<!ENTITY rfc3986 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3986.xml">
11<!ENTITY rfc3987 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3987.xml">
12<!ENTITY rfc5890 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5890.xml">
13]>
14<?rfc strict='yes'?>
15
16<?xml-stylesheet type='text/css' href='rfc2629.css' ?>
17<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
18<?rfc symrefs='yes'?>
19<?rfc sortrefs='yes'?>
20<?rfc iprnotified="no" ?>
21<?rfc toc='yes'?>
22<?rfc compact='yes'?>
23<?rfc subcompact='no'?>
24<rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-comparison-01" category="std" xml:lang="en">
25<front>
26<title abbrev="IRI Equivalence">Equivalence and Canonicalization of Internationalized Resource Identifiers (IRIs)</title>
27
28<author initials="L." surname="Masinter" fullname="Larry Masinter">
29   <organization>Adobe</organization>
30   <address>
31   <postal>
32   <street>345 Park Ave</street>
33   <city>San Jose</city>
34   <region>CA</region>
35   <code>95110</code>
36   <country>U.S.A.</country>
37   </postal>
38   <phone>+1-408-536-3024</phone>
39   <email>masinter@adobe.com</email>
40   <uri>http://larry.masinter.net</uri>
41   </address>
42</author>
43
44  <author initials="M.J." surname="Duerst" fullname='Martin Duerst'>
45    <!-- (Note: Please write "Duerst" with u-umlaut wherever
46      possible, for example as "D&#252;rst" in XML and HTML) -->
47  <organization abbrev="Aoyama Gakuin University">Aoyama Gakuin University</organization>
48  <address>
49  <postal>
50  <street>5-10-1 Fuchinobe</street>
51  <city>Sagamihara</city>
52  <region>Kanagawa</region>
53  <code>229-8558</code>
54  <country>Japan</country>
55  </postal>
56  <phone>+81 42 759 6329</phone>
57  <facsimile>+81 42 759 6495</facsimile>
58  <email>duerst@it.aoyama.ac.jp</email>
59  <uri>http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/<!-- (Note: This is the percent-encoded form of an IRI)--></uri>
60  </address>
61</author>
62
63
64<date year="2011" />
65<area>Applications</area>
66<workgroup>Internationalized Resource Identifiers (iri)</workgroup>
67<keyword>IRI</keyword>
68<keyword>Internationalized Resource Identifier</keyword>
69<keyword>UTF-8</keyword>
70<keyword>URI</keyword>
71<keyword>URL</keyword>
72<keyword>IDN</keyword>
73<keyword>Normalization</keyword>
74<keyword>Canonicalization</keyword>
75<abstract>
76<t>Internationalized Resource Identifiers (IRIs) are Unicode strings
77used to identify resources on the Internet. Applications that use
78IRIs often define a means of comparing two IRIs to determine
79when two IRIs are equivalent for the purpose of that
80application. Some applications also define a method
81for 'canonicalizing' or 'normalizing' an IRI -- translating one
82IRI into another which is equivalent under the comparison
83method used.</t>
84<t>This document gives guidelines and best practices for defining
85and using IRI comparison, equivalence, normalization and canonicalization
86methods.</t>
87</abstract>
88
89</front>
90<middle>
91
92<section title="Introduction">
93
94<t>Internationalized Resource Identifiers (IRIs) are Unicode strings
95used to identify resources on the Internet. Applications that use
96IRIs often define a means of comparing two IRIs to determine
97when two IRIs are equivalent for the purpose of that
98application. Some applications also define a method
99for 'canonicalizing' or 'normalizing' an IRI -- translating one
100IRI into another which is equivalent under the comparison
101method used.</t>
102<t>This document gives guidelines and best practices for defining
103and using IRI comparison, equivalence, normalization and canonicalization
104methods.</t>
105
106<t>Things to do:
107<list style="symbols"><t>Introductory section on comparison, equivalence, normalization and
108canonicalization.</t>
109<t> Verify acknowledgements for this component.</t>
110<t> Verify cross-references from other documents.</t>
111<t> Consider making 4395bis reference this document and recommend scheme definitions describe equivalence
112specifically.
113</t>
114<t> Consider making this document 'update' 3986 in order to resolve which one is normative if there are conflicts. </t>
115<t> Alternatively, consider making this document BCP rather than standards track, since it basically gives
116  guidance for protocols and applications needing equivalence, and doesn't directly have a scope of application? </t>
117<t> Distinguish between IRIs as sequences of Unicode characters and presentations of IRIs. </t>
118<t> Should we insist that percent-hex encoding equivalence of non-reserved characters
119  MUST be always used if there is any equivalence at all? </t>
120<t> Update security considerations to describe security concerns specific to comparison.</t>
121<t> Consider making sections talk about 'equivalent' rather than 'normalization' where
122  appropriate. </t>
123</list></t>
124
125<t>One of the most common operations on IRIs is simple comparison:
126determining whether two IRIs are equivalent, without using the IRIs to
127access their respective resource(s). A comparison is performed
128whenever a response cache is accessed, a browser checks its history to
129color a link, or an XML parser processes tags within a
130namespace. Extensive normalization prior to comparison of IRIs may be
131used by spiders and indexing engines to prune a search space or reduce
132duplication of request actions and response storage.</t>
133
134<t>IRI comparison is performed for some particular purpose. Protocols
135or implementations that compare IRIs for different purposes will often
136be subject to differing design trade-offs in regard to how much
137effort should be spent in reducing aliased identifiers. This document
138describes various methods that may be used to compare IRIs, the
139trade-offs between them, and the types of applications that might use
140them.</t>
141
142</section> <!-- introduction -->
143
144<section title="Equivalence">
145
146<t>Because IRIs exist to identify resources, presumably they should be
147considered equivalent when they identify the same resource. However,
148this definition of equivalence is not of much practical use, as there
149is no way for an implementation to compare two resources to determine
150if they are "the same" unless it has full knowledge or control of
151them. For this reason, determination of equivalence or difference of
152IRIs is based on string comparison, augmented by reference to
153additional rules provided by scheme definitions.  We use the terms
154"different" and "equivalent" to describe the possible outcomes of such
155comparisons, but there are many application-dependent versions of
156equivalence.</t>
157
158<t>Even when it is possible to determine that two IRIs are equivalent,
159IRI comparison is not sufficient to determine whether two IRIs
160identify different resources. For example, an owner of two different
161domain names could decide to serve the same resource from both,
162resulting in two different IRIs. Therefore, comparison methods are
163designed to minimize false negatives while strictly avoiding false
164positives.</t>
165
166<t>In testing for equivalence, applications should not directly
167compare relative references; the references should be converted to
168their respective target IRIs before comparison. When IRIs are compared
169to select (or avoid) a network action, such as retrieval of a
170representation, fragment components (if any) MUST be excluded from
171the comparison.</t>
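
<t>As a rough illustration of the two precautions above, the following
Python sketch resolves each reference against a base and drops any
fragment before comparing. It assumes that the resolver in Python's
standard urllib.parse module (written for URIs) is adequate for the IRIs
involved. The function name target_for_retrieval and the example.org
values are made up for this example.</t>
<figure><artwork><![CDATA[
```python
from urllib.parse import urljoin, urldefrag


def target_for_retrieval(base: str, reference: str) -> str:
    """Resolve a reference against a base, then drop any fragment."""
    absolute = urljoin(base, reference)
    without_fragment, _fragment = urldefrag(absolute)
    return without_fragment


base = "http://example.org/dir/page"
# Two different relative references that select the same network action.
first = target_for_retrieval(base, "other#top")
second = target_for_retrieval(base, "/dir/other#end")
same_action = first == second
```
]]></artwork></figure>
<t>Comparing the relative references "other#top" and "/dir/other#end"
directly would report a difference, although both lead to the same
retrieval once resolved and stripped of their fragments.</t>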
172
173<t>Applications using IRIs as identity tokens with no relationship to
174a protocol MUST use the Simple String Comparison (see <xref
175target="stringcomp"></xref>).  All other applications MUST select one
176of the comparison practices from the Comparison Ladder (see <xref
177target="ladder"></xref>).</t>
178</section> <!-- equivalence -->
179
180<section title="Comparison, Equivalence, Normalization and Canonicalization">
181
182<t>In general, when considering a set of items or strings, there are several
183interrelated concepts. A comparison method determines, between two items in the
184set, their relationship. In particular, a comparison method for determining
185equivalence might result in a determination that two (different) items are equivalent,
186known to be different, or that equivalence isn't determined.  </t>
187<t> One way to define a comparison for equivalence is to define
188a normalization or canonicalization algorithm. For each item in a set
189of equivalent items, one of them could be designated the "normal" or
190"canonical" form. </t>
191
192<t>These general concepts are used with IRIs in this document,
193and in other circumstances, where a mapping from one sequence of Unicode
194characters to another one could be described as a "normalization" algorithm.</t>
195<t> In general, this document tries to stay with the "equivalence" or
196"comparison" methods, because sometimes the mathematical notion of
197"normalization" results in forms that ordinary users might not consider "normal"
198in an ordinary sense.
199 </t>
200</section>
201
202<section title="Preparation for Comparison">
203<t>Any kind of IRI comparison requires that any additional contextual
204processing is first performed, including undoing higher-level
205escapings or encodings in the protocol or format that carries an
206IRI. This preprocessing is usually done when the protocol or format is
207parsed.</t>
208
209<t>Examples of such escapings or encodings are entities and
210numeric character references in <xref target="HTML4"></xref> and <xref
211target="XML1"></xref>. As an example,
212"http://example.org/ros&amp;eacute;" (in HTML),
213"http://example.org/ros&amp;#233;" (in HTML or XML), and
214<vspace/>"http://example.org/ros&amp;#xE9;" (in HTML or XML) are all
215resolved into what is denoted in this document (see 'Notation' section of <xref
216target="RFC3987bis" />) as "http://example.org/ros&amp;#xE9;"
217(the "&amp;#xE9;" here standing for the actual e-acute character, to
218compensate for the fact that this document cannot contain non-ASCII
219characters).</t>
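
<t>The HTML case above can be sketched in Python as follows. This is only
an illustration: it uses html.unescape from the standard library as a
stand-in for whatever the carrying format's parser does, and the variable
names are invented for the example.</t>
<figure><artwork><![CDATA[
```python
import html

# The same IRI written with three different HTML escapes.
escaped_forms = [
    "http://example.org/ros&eacute;",
    "http://example.org/ros&#233;",
    "http://example.org/ros&#xE9;",
]

# Undo the format-level escaping first; only then compare.
prepared = {html.unescape(form) for form in escaped_forms}
```
]]></artwork></figure>
<t>After unescaping, all three forms collapse to a single string that
ends in the e-acute character (U+00E9).</t>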
220
221<t>Similar considerations apply to encodings such as Transfer Codings
222in HTTP (see <xref target="RFC2616"></xref>) and Content Transfer
223Encodings in MIME (<xref target="RFC2045"></xref>), although in these
224cases, the encoding is based not on characters but on octets, and
225additional care is required to make sure that characters, and not just
226arbitrary octets, are compared (see <xref
227target="stringcomp"></xref>).</t>
228
229</section> <!-- preparation -->
230
231<section title="Comparison Ladder" anchor="ladder">
232
233<t>In practice, a variety of methods are used to test IRI
234equivalence. These methods fall into a range distinguished by the
235amount of processing required and the degree to which the probability
236of false negatives is reduced. As noted above, false negatives cannot
237be eliminated. In practice, their probability can be reduced, but this
238reduction requires more processing and is not cost-effective for all
239applications.</t>
240
241
242<t>If this range of comparison practices is considered as a ladder,
243the following discussion will climb the ladder, starting with
244practices that are cheap but have a relatively higher chance of
245producing false negatives, and proceeding to those that have higher
246computational cost and lower risk of false negatives.</t>
247
248<section title="Simple String Comparison" anchor="stringcomp">
249
250<t>If two IRIs, when considered as character strings, are identical,
251then it is safe to conclude that they are equivalent.  This type of
252equivalence test has very low computational cost and is in wide use in
253a variety of applications, particularly in the domain of parsing. It
254is also used when a definitive answer to the question of IRI
255equivalence is needed that is independent of the scheme used and that
256can be calculated quickly and without accessing a network. An example
257of such a case is XML Namespaces (<xref
258target="XMLNamespace"></xref>).</t>
259
260
261<t>Testing strings for equivalence requires some basic precautions.
262This procedure is often referred to as "bit-for-bit" or
263"byte-for-byte" comparison, which is potentially misleading. Testing
264strings for equality is normally based on pair comparison of the
265characters that make up the strings, starting from the first and
266proceeding until both strings are exhausted and all characters are
267found to be equal, until a pair of characters compares unequal, or
268until one of the strings is exhausted before the other.</t>
269
270<t>This character comparison requires that each pair of characters be
271put in comparable encoding form. For example, should one IRI be stored
272in a byte array in UTF-8 encoding form and the second in a UTF-16
273encoding form, bit-for-bit comparisons applied naively will produce
274errors. It is better to speak of equality on a character-for-character
275rather than on a byte-for-byte or bit-for-bit basis.  In practical
276terms, character-by-character comparisons should be done codepoint by
277codepoint after conversion to a common character encoding form.</t>
278
279<t>When comparing character by character, the comparison function MUST
280NOT map IRIs to URIs, because such a mapping would create additional
281spurious equivalences. It follows that an IRI SHOULD NOT be modified
282when being transported if there is any chance that this IRI might be
283used in a context that uses Simple String Comparison.</t>
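
<t>A minimal Python sketch of character-for-character comparison is
shown here. It assumes the encoding of each stored byte sequence is
known; the helper name same_iri is hypothetical. Python compares str
values code point by code point, so decoding both inputs to str gives the
common encoding form described above.</t>
<figure><artwork><![CDATA[
```python
def same_iri(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> bool:
    """Simple String Comparison on characters rather than on bytes."""
    return a.decode(a_encoding) == b.decode(b_encoding)


iri = "http://example.org/ros\u00e9"
stored_utf8 = iri.encode("utf-8")
stored_utf16 = iri.encode("utf-16-le")

bytes_equal = stored_utf8 == stored_utf16
chars_equal = same_iri(stored_utf8, "utf-8", stored_utf16, "utf-16-le")
```
]]></artwork></figure>
<t>The byte sequences differ, yet the IRIs are identical as character
strings. Note that nothing here maps the IRI to a URI or otherwise
modifies it.</t>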
284
285
286<t>False negatives are caused by the production and use of IRI
287aliases. Unnecessary aliases can be reduced, regardless of the
288comparison method, by consistently providing IRI references in an
289already normalized form (i.e., a form identical to what would be
290produced after normalization is applied, as described below).
291Protocols and data formats often limit some IRI comparisons to simple
292string comparison, based on the theory that people and implementations
293will, in their own best interest, be consistent in providing IRI
294references, or at least be consistent enough to negate any efficiency
295that might be obtained from further normalization.</t>
296</section> <!-- stringcomp -->
297
298<section title="Syntax-Based Normalization">
299
300<figure><preamble>Implementations may use logic based on the
301definitions provided by this specification to reduce the probability
302of false negatives. This processing is moderately higher in cost than
303character-for-character string comparison. For example, an application
304using this approach could reasonably consider the following two IRIs
305equivalent:</preamble>
306
307<artwork>
308   example://a/b/c/%7Bfoo%7D/ros&amp;#xE9;
309   eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9
310</artwork></figure>
311
312<t>Web user agents, such as browsers, typically apply this type of IRI
313normalization when determining whether a cached response is
314available. Syntax-based normalization includes such techniques as case
315normalization, character normalization, percent-encoding
316normalization, and removal of dot-segments.</t>
317
318<section title="Case Normalization">
319
320<t>For all IRIs, the hexadecimal digits within a percent-encoding
321triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore
322should be normalized to use uppercase letters for the digits A-F.</t>
323
324<t>When an IRI uses components of the generic syntax, the component
325syntax equivalence rules always apply; namely, that the scheme and
326US-ASCII-only host are case insensitive and therefore should be
327normalized to lowercase. For example, the URI
328"HTTP://www.EXAMPLE.com/" is equivalent to
329"http://www.example.com/". Case equivalence for non-ASCII characters
330in IRI components that are IDNs is discussed in <xref
331target="schemecomp"></xref>.  The other generic syntax components are
332assumed to be case sensitive unless specifically defined otherwise by
333the scheme.</t>
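
<t>The case rules above could be sketched in Python roughly as follows.
The function name case_normalize and the regular expressions are
assumptions of this example, not part of any specification. Only ASCII
letters in the host are lowercased, and the path, query and fragment are
left alone apart from the hexadecimal digits of percent-encoding
triplets.</t>
<figure><artwork><![CDATA[
```python
import re

_PCT = re.compile(r"%[0-9A-Fa-f]{2}")
# Scheme, then an optional "//authority".
_HEAD = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):(?://([^/?#]*))?")
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def case_normalize(iri: str) -> str:
    m = _HEAD.match(iri)
    if m:
        head = m.group(1).lower() + ":"
        if m.group(2) is not None:
            # userinfo is case sensitive; only host (and port) is folded.
            userinfo, at, hostport = m.group(2).rpartition("@")
            head += "//" + userinfo + at + hostport.translate(_ASCII_LOWER)
        iri = head + iri[m.end():]
    # Uppercase the hex digits of every percent-encoding triplet.
    return _PCT.sub(lambda t: t.group(0).upper(), iri)
```
]]></artwork></figure>
<t>With this sketch, "HTTP://www.EXAMPLE.com/" becomes
"http://www.example.com/" and "%3a" becomes "%3A", while a path such as
"/Path" keeps its case.</t>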
334
335<t>Creating schemes that allow case-insensitive syntax components
336containing non-ASCII characters should be avoided. Case normalization
337of non-ASCII characters can be culturally dependent and is always a
338complex operation. The only exception concerns non-ASCII host names
339for which the character normalization includes a mapping step derived
340from case folding.</t>
341
342</section> <!-- casenorm -->
343
344<section title="Character Normalization" anchor="normalization">
345
346<t>The Unicode Standard <xref target="UNIV6"></xref> defines various
347equivalences between sequences of characters for various
348purposes. Unicode Standard Annex #15 <xref target="UTR15"></xref>
349defines various Normalization Forms for these equivalences, in
350particular Normalization Form C (NFC, Canonical Decomposition,
351followed by Canonical Composition) and Normalization Form KC (NFKC,
352Compatibility Decomposition, followed by Canonical Composition).</t>
353
354<t> IRIs already in Unicode MUST NOT be normalized before parsing or
355interpreting. In many non-Unicode character encodings, some text
356cannot be represented directly. For example, the word "Vietnam" is
357natively written "Vi&amp;#x1EC7;t Nam" (containing a LATIN SMALL
358LETTER E WITH CIRCUMFLEX AND DOT BELOW) in NFC, but a direct
359transcoding from the windows-1258 character encoding leads to
360"Vi&amp;#xEA;&amp;#x323;t Nam" (containing a LATIN SMALL LETTER E WITH
361CIRCUMFLEX followed by a COMBINING DOT BELOW). Direct transcoding of
362other 8-bit encodings of Vietnamese may lead to other
363representations.</t>
364
365<t>Equivalence of IRIs MUST rely on the assumption that IRIs are
366appropriately pre-character-normalized rather than apply character
367normalization when comparing two IRIs. The exceptions are conversion
368from a non-digital form, and conversion from a non-UCS-based character
369encoding to a UCS-based character encoding. In these cases, NFC or a
370normalizing transcoder using NFC MUST be used for interoperability. To
371avoid false negatives and problems with transcoding, IRIs SHOULD be
372created by using NFC. Using NFKC may avoid even more problems; for
373example, by choosing half-width Latin letters instead of full-width
374ones, and full-width instead of half-width Katakana.</t>
375
376
377<t>As an example,
378"http://www.example.org/r&amp;#xE9;sum&amp;#xE9;.html" (in XML
379Notation) is in NFC. On the other hand,
380"http://www.example.org/re&amp;#x301;sume&amp;#x301;.html" is not in
381NFC.</t>
382
383<t>The former uses precombined e-acute characters, and the latter uses
384"e" characters followed by combining acute accents. Both usages are
385defined as canonically equivalent in <xref target="UNIV6"></xref>.</t>
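
<t>The difference between the two forms can be checked with Python's
standard unicodedata module, as in the sketch below. In keeping with the
rules above, NFC is applied here only as a step an IRI creator or a
normalizing transcoder might take, not as part of comparison. The helper
name is_nfc is invented for the example.</t>
<figure><artwork><![CDATA[
```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if the text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


precomposed = "http://www.example.org/r\u00e9sum\u00e9.html"
decomposed = "http://www.example.org/re\u0301sume\u0301.html"

# Result of a direct transcoding from windows-1258, then NFC.
transcoded = "Vi\u00ea\u0323t Nam"
normalized = unicodedata.normalize("NFC", transcoded)
```
]]></artwork></figure>
<t>The precomposed and decomposed strings differ under Simple String
Comparison even though they are canonically equivalent, which is why
creating IRIs in NFC in the first place avoids false negatives.</t>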
386
387<t><list style="hanging">
388
389<t hangText="Note:">
390Because it is unknown how a particular sequence of characters is being
391treated with respect to character normalization, it would be
392inappropriate to allow third parties to normalize an IRI
393arbitrarily. This does not contradict the recommendation that when a
394resource is created, its IRI should be as character normalized as
395possible (i.e., NFC or even NFKC). This is similar to the
396uppercase/lowercase problems.  Some parts of a URI are case
397insensitive (for example, the domain name). For others, it is unclear
398whether they are case sensitive, case insensitive, or something in
399between (e.g., case sensitive, but with a multiple choice selection if
400the wrong case is used, instead of a direct negative result).  The
401best recipe is that the creator use a reasonable capitalization and,
402when transferring the URI, capitalization never be
403changed.</t></list></t>
404
405<t>Various IRI schemes may allow the usage of Internationalized Domain
406Names (IDN) <xref target="RFC5890"/> either in the ireg-name
407part or elsewhere. Character Normalization also applies to IDNs, as
408discussed in <xref target="schemecomp"/>.</t>
409</section> <!-- charnorm -->
410
411<section title="Percent-Encoding Normalization">
412
413<t>The percent-encoding mechanism (Section 2.1 of <xref
414target="RFC3986"></xref>) is a frequent source of variance among
415otherwise identical IRIs. In addition to the case normalization issue
416noted above, some IRI producers percent-encode octets that do not
417require percent-encoding, resulting in IRIs that are equivalent to
418their nonencoded counterparts. These IRIs should be normalized by
419decoding any percent-encoded octet sequence that corresponds to an
420unreserved character, as described in section 2.3 of <xref
421target="RFC3986"></xref>.</t>
422
423<t>For actual resolution, differences in percent-encoding (except for
424the percent-encoding of reserved characters) MUST always result in the
425same resource.  For example, "http://example.org/~user",
426"http://example.org/%7euser", and "http://example.org/%7Euser", must
427resolve to the same resource.</t>
428
429<t>If this kind of equivalence is to be tested, the percent-encoding
430of both IRIs to be compared has to be aligned; for example, by
431converting both IRIs to URIs, eliminating escape
432differences in the resulting URIs, and making sure that the case of
433the hexadecimal characters in the percent-encoding is always the same
434(preferably upper case). If the IRI is to be passed to another
435application or used further in some other way, its original form MUST
436be preserved.  The conversion described here should be performed only
437for local comparison.</t>
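
<t>One possible shape for this local normalization is sketched below in
Python. It decodes percent-encoded unreserved characters (as listed in
Section 2.3 of RFC 3986) and uppercases the hexadecimal digits of all
other triplets. The name pct_normalize is an assumption of the example,
and the result is meant for comparison only; the original IRI is kept
unchanged by the caller.</t>
<figure><artwork><![CDATA[
```python
import re
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def pct_normalize(iri: str) -> str:
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in _UNRESERVED:
            return char  # needless encoding: decode it
        return match.group(0).upper()  # keep encoded, uppercase hex

    return _TRIPLET.sub(fix, iri)


variants = [
    "http://example.org/~user",
    "http://example.org/%7euser",
    "http://example.org/%7Euser",
]
comparison_keys = {pct_normalize(v) for v in variants}
```
]]></artwork></figure>
<t>All three variants yield the same comparison key, whereas a
percent-encoded reserved character such as "%2f" stays encoded and is
only rewritten as "%2F".</t>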
438
439</section> <!-- pctnorm -->
440
441<section title="Path Segment Normalization">
442
443<t>The complete path segments "." and ".." are intended only for use
444within relative references (Section 4.1 of <xref
445target="RFC3986"></xref>) and are removed as part of the reference
446resolution process (Section 5.2 of <xref target="RFC3986"></xref>).
447However, some implementations may incorrectly assume that reference
448resolution is not necessary when the reference is already an IRI, and
449thus fail to remove dot-segments when they occur in non-relative
450paths.  IRI normalizers should remove dot-segments by applying the
451remove_dot_segments algorithm to the path, as described in Section
4525.2.4 of <xref target="RFC3986"></xref>.</t>
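
<t>For readers who want to experiment, the following Python sketch
follows the steps of the remove_dot_segments algorithm as described in
Section 5.2.4 of RFC 3986. It is an illustrative transcription written
for this example, not normative text.</t>
<figure><artwork><![CDATA[
```python
def remove_dot_segments(path: str) -> str:
    output = []  # completed segments, each with its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]  # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]  # "/../" becomes "/"
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with any leading "/") to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)
```
]]></artwork></figure>
<t>Applied to the path "/a/./b/../b/%63" from the earlier example, this
yields "/a/b/%63".</t>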
453
454</section> <!-- pathnorm -->
455</section> <!-- syntaxnorm -->
456
457<section title="Scheme-Based Normalization" anchor="schemecomp">
458
459<t>The syntax and semantics of IRIs vary from scheme to scheme, as
460described by the defining specification for each
461scheme. Implementations may use scheme-specific rules, at further
462processing cost, to reduce the probability of false negatives. For
463example, because the "http" scheme makes use of an authority
464component, has a default port of "80", and defines an empty path to be
465equivalent to "/", the following four IRIs are equivalent:</t>
466
467<figure><artwork>
468   http://example.com
469   http://example.com/
470   http://example.com:/
471   http://example.com:80/</artwork></figure>
472
473<t>In general, an IRI that uses the generic syntax for authority with
474an empty path should be normalized to a path of "/". Likewise, an
475explicit ":port", for which the port is empty or the default for the
476scheme, is equivalent to one where the port and its ":" delimiter are
477elided and thus should be removed by scheme-based normalization. For
478example, the second IRI above is the normal form for the "http"
479scheme.</t>
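
<t>A sketch of these two rules for the "http" scheme might look like the
following Python code. It assumes that case normalization has already
been applied, and it knows only the default port stated above. The names
DEFAULT_PORTS and scheme_normalize are hypothetical.</t>
<figure><artwork><![CDATA[
```python
import re

# Only the "http" default comes from the text; extend with care.
DEFAULT_PORTS = {"http": "80"}

_SPLIT = re.compile(r"([a-z][a-z0-9+.\-]*)://([^/?#]*)(.*)", re.S)
_AUTHORITY = re.compile(r"(.*?)(?::(\d*))?", re.S)


def scheme_normalize(iri: str) -> str:
    m = _SPLIT.fullmatch(iri)
    if m is None or m.group(1) not in DEFAULT_PORTS:
        return iri  # unknown scheme or no authority: leave untouched
    scheme, authority, rest = m.groups()
    host, port = _AUTHORITY.fullmatch(authority).groups()
    if port is not None and port not in ("", DEFAULT_PORTS[scheme]):
        host = host + ":" + port  # keep a non-default port
    if not rest.startswith("/"):
        rest = "/" + rest  # an empty path becomes "/"
    return scheme + "://" + host + rest
```
]]></artwork></figure>
<t>All four IRIs listed above map to "http://example.com/", while
"http://example.com/?" is left as it is because the sketch never removes
a delimiter other than the port's ":".</t>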
480
481<t>Another case where normalization varies by scheme is in the
482handling of an empty authority component or empty host
483subcomponent. For many scheme specifications, an empty authority or
484host is considered an error; for others, it is considered equivalent
485to "localhost" or the end-user's host. When a scheme defines a default
486for authority and an IRI reference to that default is desired, the
487reference should be normalized to an empty authority for the sake of
488uniformity, brevity, and internationalization. If, however, either the
489userinfo or port subcomponents are non-empty, then the host should be
490given explicitly even if it matches the default.</t>
491
492<t>Normalization should not remove delimiters when their associated
493component is empty unless it is licensed to do so by the scheme
494specification. For example, the IRI "http://example.com/?" cannot be
495assumed to be equivalent to any of the examples above. Likewise, the
496presence or absence of delimiters within a userinfo subcomponent is
497usually significant to its interpretation.  The fragment component is
498not subject to any scheme-based normalization; thus, two IRIs that
499differ only by the suffix "#" are considered different regardless of
500the scheme.</t>
501 
502<t>Some IRI schemes allow the usage of Internationalized Domain
503Names (IDN) <xref target='RFC5890'></xref> either in their ireg-name
504part or elsewhere. When in use in IRIs, those names SHOULD
505conform to the definition of U-Label in <xref
506target='RFC5890'></xref>. An IRI containing an invalid IDN cannot
507successfully be resolved. For legibility purposes, they
508SHOULD NOT be converted into ASCII Compatible Encoding (ACE).</t>
509
510<t>Scheme-based normalization may also consider IDN
511components and their conversions to Punycode as equivalent. As an
512example, "http://r&amp;#xE9;sum&amp;#xE9;.example.org" may be
513considered equivalent to
514"http://xn--rsum-bpad.example.org".</t><t>Other scheme-specific
515normalizations are possible.</t>
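
<t>The IDN equivalence in the example above can be tried out with the
Python sketch below. One caveat is important: Python's built-in "idna"
codec implements the older IDNA2003 rules (RFC 3490), not the IDNA2008
definitions of RFC 5890, so it is used here purely as an approximation.
The helper names are invented for the example.</t>
<figure><artwork><![CDATA[
```python
def host_to_ace(host: str) -> str:
    """Convert each label of a host name to its ASCII (ACE) form."""
    return host.encode("idna").decode("ascii")


def hosts_equivalent(a: str, b: str) -> bool:
    """Compare two host names by their ACE forms."""
    return host_to_ace(a) == host_to_ace(b)


unicode_host = "r\u00e9sum\u00e9.example.org"
ace_host = "xn--rsum-bpad.example.org"
equivalent = hosts_equivalent(unicode_host, ace_host)
```
]]></artwork></figure>
<t>The conversion is done only to obtain a comparison key; for
legibility, the IRI itself keeps its original form.</t>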
516
517</section> <!-- schemenorm -->
518
519<section title="Protocol-Based Normalization">
520
521<t>Substantial effort to reduce the incidence of false negatives is
522often cost-effective for web spiders. Consequently, they implement
523even more aggressive techniques in IRI comparison. For example, if
524they observe that an IRI such as</t>
525
526<figure><artwork>
527   http://example.com/data</artwork></figure>
528<t>redirects to an IRI differing only in the trailing slash</t>
529<figure><artwork>
530   http://example.com/data/</artwork></figure>
531
532<t>they will likely regard the two as equivalent in the future.  This
533kind of technique is only appropriate when equivalence is clearly
534indicated by both the result of accessing the resources and the common
535conventions of their scheme's dereference algorithm (in this case, use
536of redirection by HTTP origin servers to avoid problems with relative
537references).</t>
538
539</section> <!-- protonorm -->
540</section> <!-- ladder -->
541
542<section title="Security Considerations" anchor="security">
543<t>The primary security difficulty comes from applications choosing the
544wrong equivalence relationship, or two different parties disagreeing
545on equivalence. This is especially a problem when IRIs are used in
546security protocols.</t>
547
548<t>Besides the large character repertoire of Unicode, reasons for
549  confusion include different forms of normalization and different normalization
550  expectations, use of percent-encoding with various legacy encodings,
551  and bidirectionality issues. See also <xref target='UTR36'/>.</t>
552
553</section><!-- security -->
554
555<section title="Acknowledgements">
556
557<t>This document was originally derived from <xref target="RFC3986"/>
558and <xref target="RFC3987"/>, based on text contributed by Tim
559Bray.</t>
560</section>
561
562</middle>
563
564<back>
565<references title="Normative References">
566
567      <reference anchor="RFC3987bis" 
568         target="http://tools.ietf.org/id/draft-ietf-iri-3987bis">
569         
570          <front>
571            <title>Internationalized Resource Identifiers (IRIs)</title>
572          <author initials="M." surname="Duerst"/>
573          <author initials="L." surname="Masinter" fullname="Larry Masinter"/>
574          <author initials="M." surname="Suignard"/>
575          <date year="2011"/>
576          </front>
577      </reference>
578
579
580&rfc2119;
581&rfc3490;
582&rfc3491;
583&rfc3629;
584&rfc3986;
585&rfc5890;
586
587<reference anchor="UNIV6">
588<front>
589<title>The Unicode Standard, Version 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, ISBN 978-1-936213-01-6)</title>
590<author><organization>The Unicode Consortium</organization></author>
591<date year="2010" month="October"/>
592</front>
593</reference>
594
595
596<reference anchor="UTR15" target="http://www.unicode.org/unicode/reports/tr15/tr15-23.html">
597<front>
598<title>Unicode Normalization Forms</title>
599<author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
600<author initials="M.J." surname="Duerst" fullname="Martin Duerst"><organization/></author>
601<date year="2008" month="March"/>
602</front>
603<seriesInfo name="Unicode Standard Annex" value="#15"/>
604</reference>
605
606</references>
607
608<references title="Informative References">
609
610<reference anchor="HTML4" target="http://www.w3.org/TR/html401/appendix/notes.html#h-B.2">
611<front>
612<title>HTML 4.01 Specification</title>
613<author initials="D." surname="Raggett" fullname="Dave Raggett"><organization/></author>
614<author initials="A." surname="Le Hors" fullname="Arnaud Le Hors"><organization/></author>
615<author initials="I." surname="Jacobs" fullname="Ian Jacobs"><organization/></author>
616<date year="1999" month="December" day="24"/>
617</front>
618<seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
619</reference>
620
621&rfc2045;
622&rfc3987;
623&rfc2616;
624 
625
626<reference anchor="UTR36" target="http://unicode.org/reports/tr36/">
627<front>
628<title>Unicode Security Considerations</title>
629<author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
630<author initials="M." surname="Suignard" fullname="Michel Suignard"><organization/></author>
631<date year="2010" month="August" day="4"/>
632</front>
633<seriesInfo name="Unicode Technical Report" value="#36"/>
634</reference>
635
636<reference anchor="XML1" target="http://www.w3.org/TR/REC-xml">
637  <front>
638    <title>Extensible Markup Language (XML) 1.0 (Fourth Edition)</title>
639    <author initials="T." surname="Bray" fullname="Tim Bray"><organization/></author>
640    <author initials="J." surname="Paoli" fullname="Jean Paoli"><organization/></author>
641    <author initials="C.M." surname="Sperberg-McQueen" fullname="C. M. Sperberg-McQueen">
642      <organization/></author>
643    <author initials="E." surname="Maler" fullname="Eve Maler"><organization/></author>
644    <author initials="F." surname="Yergeau" fullname="Francois Yergeau"><organization/></author>
645    <date day="16" month="August" year="2006"/>
646  </front>
647  <seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
648</reference>
649
650<reference anchor="XMLNamespace" target="http://www.w3.org/TR/REC-xml-names">
651  <front>
652    <title>Namespaces in XML (Second Edition)</title>
653    <author initials="T." surname="Bray" fullname="Tim Bray"><organization/></author>
654    <author initials="D." surname="Hollander" fullname="Dave Hollander"><organization/></author>
655    <author initials="A." surname="Layman" fullname="Andrew Layman"><organization/></author>
656    <author initials="R." surname="Tobin" fullname="Richard Tobin"><organization></organization></author>
657    <date day="16" month="August" year="2006"/>
658  </front>
659  <seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
660</reference>
661
662</references>
663
664</back>
665</rfc>