<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc3490 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3490.xml">
<!ENTITY rfc3491 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3491.xml">
<!ENTITY rfc3987 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3987.xml">
]>
<?rfc strict='yes'?>

<?xml-stylesheet type='text/css' href='rfc2629.css' ?>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc symrefs='yes'?>
<?rfc sortrefs='yes'?>
<?rfc iprnotified="no" ?>
<?rfc toc='yes'?>
<?rfc compact='yes'?>
<?rfc subcompact='no'?>
<rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-bidi-guidelines-00" category="bcp" xml:lang="en">
<front>
<title abbrev="Bidi IRI Guidelines">Guidelines for Internationalized Resource Identifiers with Bi-directional Characters (Bidi IRIs)</title>

  <author initials="M.J." surname="Duerst" fullname='Martin Duerst'>
    <!-- (Note: Please write "Duerst" with u-umlaut wherever
      possible, for example as "D&#252;rst" in XML and HTML) -->
  <organization abbrev="Aoyama Gakuin University">Aoyama Gakuin University</organization>
  <address>
  <postal>
  <street>5-10-1 Fuchinobe</street>
  <city>Sagamihara</city>
  <region>Kanagawa</region>
  <code>229-8558</code>
  <country>Japan</country>
  </postal>
  <phone>+81 42 759 6329</phone>
  <facsimile>+81 42 759 6495</facsimile>
  <email>duerst@it.aoyama.ac.jp</email>
  <uri>http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/<!-- (Note: This is the percent-encoded form of an IRI)--></uri>
  </address>
</author>

<author initials="L." surname="Masinter" fullname="Larry Masinter">
   <organization>Adobe</organization>
   <address>
   <postal>
   <street>345 Park Ave</street>
   <city>San Jose</city>
   <region>CA</region>
   <code>95110</code>
   <country>U.S.A.</country>
   </postal>
   <phone>+1-408-536-3024</phone>
   <email>masinter@adobe.com</email>
   <uri>http://larry.masinter.net</uri>
   </address>
</author>

<author initials="A." surname="Allawi" fullname="Adil Allawi">
  <organization>Diwan Software Limited</organization>
  <address>
  <postal>
  <street>37-39 Peckham Road</street>
  <city>London</city>
  <code>SE5 8UH</code>
  <country>United Kingdom</country>
  </postal>
  <phone>+44 7718 785850</phone>
  <facsimile>+44 20 72525444</facsimile>
  <email>adil@diwan.com</email>
  <uri>http://ironymark.diwan.com/</uri>
  </address>
</author>

<date year="2011" month="August" day="14" />

<area>Applications</area>
<workgroup>Internationalized Resource Identifiers (iri)</workgroup>
<keyword>IRI</keyword>
<keyword>Internationalized Resource Identifier</keyword>
<keyword>BIDI</keyword>
<keyword>URI</keyword>
<keyword>URL</keyword>
<keyword>IDN</keyword>

<abstract>

<t>This specification gives guidelines for the selection, use, and
presentation of Internationalized Resource Identifiers (IRIs) that
include characters with an inherent right-to-left (rtl) writing
direction.</t>
</abstract>

</front>
<middle>

<section title="Introduction">

<t>Some UCS characters, such as those used in the Arabic and Hebrew
scripts, have an inherent right-to-left (rtl) writing direction. IRIs
containing these characters (called bidirectional IRIs or Bidi IRIs)
require additional attention because of the non-trivial relation
between logical representation (used for digital representation and
for reading/spelling) and visual representation (used for
display/printing).</t>

<t>Because of the complex interaction between the logical representation,
the visual representation, and the syntax of a Bidi IRI, a balance is
needed between various requirements.
The main requirements are<list style="hanging">
<t hangText="1.">user-predictable conversion between visual and
    logical representation;</t>
<t hangText="2.">the ability to include a wide range of characters
    in various parts of the IRI; and</t>
<t hangText="3.">minor or no changes or restrictions for
      implementations.</t>
</list></t>
<section title="Notation">

<t>In this document, Bidi Notation is used for bidirectional examples:
lowercase letters stand for Latin letters or other letters that are
written left to right, whereas uppercase letters represent Arabic or
Hebrew letters that are written right to left.</t>

<t>In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in <xref
target="RFC2119"/>.</t>

</section> <!-- Notation -->

</section> <!-- Introduction -->




<section title="Logical Storage and Visual Presentation" anchor="visual">

<t>When stored or transmitted in digital representation, bidirectional
IRIs MUST be in full logical order and MUST conform to the IRI syntax
rules (which include the rules relevant to their scheme). This
ensures that bidirectional IRIs can be processed in the same way as
other IRIs.</t>

<t>Bidirectional IRIs MUST be rendered by using the
Unicode Bidirectional Algorithm <xref target="UNIV6"/>, <xref
target="UNI9"/>.  Bidirectional IRIs MUST be rendered in the same way
as they would be if they were in a left-to-right embedding; i.e., as
if they were preceded by U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and
followed by U+202C, POP DIRECTIONAL FORMATTING (PDF).  Setting the
embedding direction can also be done in a higher-level protocol (e.g.,
the dir='ltr' attribute in HTML).</t>
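<figure>
<preamble>As a minimal sketch of the embedding described above, the
following Python fragment wraps an IRI in LRE and PDF before it is
handed to a display layer, and removes the wrapper again. The function
names embed_for_display and strip_display_embedding are hypothetical
and are not defined by any specification; the wrapper characters serve
display only and are not part of the IRI.</preamble>
<artwork><![CDATA[
```python
LRE = "\u202a"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING


def embed_for_display(iri: str) -> str:
    """Return the IRI wrapped in LRE ... PDF, for display only."""
    return LRE + iri + PDF


def strip_display_embedding(text: str) -> str:
    """Undo embed_for_display; other strings are returned unchanged."""
    if text.startswith(LRE) and text.endswith(PDF):
        return text[1:-1]
    return text
```
]]></artwork>
</figure>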

<t>There is no requirement to use the above embedding if the display
is still the same without the embedding. For example, a bidirectional
IRI in a text with left-to-right base directionality (such as used for
English or Cyrillic) that is preceded and followed by whitespace and
strong left-to-right characters does not need an embedding.  Also, a
bidirectional relative IRI reference that only contains strong
right-to-left characters and weak characters and that starts and ends
with a strong right-to-left character and appears in a text with
right-to-left base directionality (such as used for Arabic or Hebrew)
and is preceded and followed by whitespace and strong characters does
not need an embedding.</t>

<t>In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM), may be
sufficient to force the correct display behavior.  However, the
details of the Unicode Bidirectional Algorithm are not always easy to
understand. Implementers are strongly advised to err on the side of
caution and to use embedding in all cases where they are not
completely sure that the display behavior is unaffected without the
embedding.</t>

<t>The Unicode Bidirectional Algorithm (<xref target="UNI9"/>, section
4.3) permits higher-level protocols to influence bidirectional
rendering. Such changes by higher-level protocols MUST NOT be used if
they change the rendering of IRIs.</t>

<t>The bidirectional formatting characters that may be used before or
after the IRI to ensure correct display are not themselves part of the
IRI.  IRIs MUST NOT contain bidirectional formatting characters (LRM,
RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of
the IRI but are not themselves displayed. It would therefore not be
possible to input an IRI with such characters correctly.</t>
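<figure>
<preamble>The prohibition above can be checked mechanically. The
following Python fragment is a sketch only; the function name
find_bidi_formatting is hypothetical, and the table covers just the
seven formatting characters named in this section.</preamble>
<artwork><![CDATA[
```python
# Bidirectional formatting characters named in the text above.
BIDI_FORMATTING = {
    "\u200e": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200f": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
}


def find_bidi_formatting(iri: str) -> list:
    """Return (index, abbreviation) for each formatting character."""
    return [(i, BIDI_FORMATTING[ch])
            for i, ch in enumerate(iri)
            if ch in BIDI_FORMATTING]
```
]]></artwork>
<postamble>An empty result means that the string contains none of the
listed characters.</postamble>
</figure>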

</section> <!-- visual -->
<section title="Bidi IRI Structure" anchor="bidi-structure">

<t>The Unicode Bidirectional Algorithm is designed mainly for running
text.  To make sure that it does not affect the rendering of
bidirectional IRIs too much, some restrictions on bidirectional IRIs
are necessary. These restrictions are given in terms of delimiters
(structural characters, mostly punctuation such as "@", ".", ":",
and<vspace/>"/") and components (usually consisting mostly of letters
and digits).</t>

<t>The following syntax rules from the ABNF of <xref target="RFC3987bis"/>
correspond to components for the purpose of Bidi behavior: iuserinfo,
ireg-name, isegment, isegment-nz, isegment-nz-nc, iquery, and
ifragment.</t>

<t>Specifications that define the syntax of any of the above
components MAY divide them further and define smaller parts to be
components according to this document. As an example, the restrictions
of <xref target="RFC3490"/> on bidirectional domain names correspond
to treating each label of a domain name as a component for schemes
with ireg-name as a domain name.  Even where the components are not
defined formally, it may be helpful to think about some syntax in
terms of components and to apply the relevant restrictions.  For
example, for the usual name/value syntax in query parts, it is
convenient to treat each name and each value as a component. As
another example, the extensions in a resource name can be treated as
separate components.</t>

<t>For each component, the following restrictions apply:</t>
<t>
<list style="hanging">

<t hangText="1.">A component SHOULD NOT use both right-to-left and
  left-to-right characters.</t>

<t hangText="2.">A component using right-to-left characters SHOULD
  start and end with right-to-left characters.</t>

</list></t>
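<figure>
<preamble>The two restrictions above can be sketched as a check on a
single component. This is an illustration under stated assumptions,
not a normative algorithm: the function name check_component is
hypothetical, characters of Unicode bidirectional class R or AL are
taken to be right-to-left, and characters of class L are taken to be
left-to-right. Splitting an IRI into components is left to the
caller.</preamble>
<artwork><![CDATA[
```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # assumption: strong right-to-left classes
LTR_CLASSES = {"L"}        # assumption: strong left-to-right class


def check_component(component: str) -> list:
    """Return a list of violated restrictions (empty if none)."""
    classes = [unicodedata.bidirectional(ch) for ch in component]
    has_rtl = any(c in RTL_CLASSES for c in classes)
    has_ltr = any(c in LTR_CLASSES for c in classes)
    problems = []
    # Restriction 1: no mixing of rtl and ltr characters.
    if has_rtl and has_ltr:
        problems.append("mixes right-to-left and left-to-right")
    # Restriction 2: an rtl component starts and ends with rtl.
    if has_rtl and not (classes[0] in RTL_CLASSES
                        and classes[-1] in RTL_CLASSES):
        problems.append("does not start and end right-to-left")
    return problems
```
]]></artwork>
<postamble>A component consisting of Hebrew letters followed by a
digit, for instance, is reported as violating the second restriction
only.</postamble>
</figure>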

<t>The above restrictions are given as "SHOULD"s, rather than as
"MUST"s.  For IRIs that are never presented visually, they are not
relevant.  However, for IRIs in general, they are very important to
ensure consistent conversion between visual presentation and logical
representation, in both directions.</t>

<t><list style="hanging">

<t hangText="Note:">In some components, the above restrictions may
  actually be strictly enforced.  For example, <xref
  target="RFC3490"></xref> requires that these restrictions apply to
  the labels of a host name for those schemes where ireg-name is a
  host name.  In some other components (for example, path components)
  following these restrictions may not be too difficult.  For other
  components, such as parts of the query part, it may be very
  difficult to enforce the restrictions because the values of query
  parameters may be arbitrary character sequences.</t>

</list></t>

<t>If the above restrictions cannot be satisfied otherwise, the
affected component can always be mapped to URI notation using the
general percent-encoding of IRI components, as described
in <xref target="RFC3987bis"/>. Please note that the whole component
has to be mapped (see also Example 9 below).</t>
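<figure>
<preamble>As a rough sketch of mapping a whole component to URI
notation, the following Python fragment percent-encodes the UTF-8
bytes of every non-ASCII character in a component and leaves ASCII
characters as they are. The function name component_to_uri_notation is
hypothetical, and the fragment is a simplification that does not
reproduce every step of the mapping described in
RFC3987bis.</preamble>
<artwork><![CDATA[
```python
from urllib.parse import quote


def component_to_uri_notation(component: str) -> str:
    """Percent-encode every non-ASCII character of the component."""
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in component
    )
```
]]></artwork>
<postamble>Because the function is applied to the entire component,
no right-to-left character remains in the result.</postamble>
</figure>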

</section> <!-- bidi-structure -->

<section title="Input of Bidi IRIs" anchor="bidiInput">

<t>Bidi input methods MUST generate Bidi IRIs in logical order while
rendering them according to <xref target="visual"/>.  During input,
rendering SHOULD be updated after every new character is input to
avoid end-user confusion.</t>

</section> <!-- bidiInput -->

<section title="Examples">

<t>This section gives examples of bidirectional IRIs, in Bidi
Notation.  It shows legal IRIs with the relationship between logical
and visual representation and explains how certain phenomena in this
relationship may look strange to somebody not familiar with
bidirectional behavior, but familiar to users of Arabic and Hebrew. It
also shows what happens if the restrictions given in <xref
target="bidi-structure"/> are not followed. The examples below can be
seen at <xref target="BidiEx"/>, in Arabic, Hebrew, and Bidi Notation
variants.</t>

<t>To read the bidi text in the examples, read the visual
representation from left to right until you encounter a block of rtl
text. Read the rtl block (including slashes and other special
characters) from right to left, then continue at the next unread ltr
character.</t>

<t>Example 1: A single component with rtl characters is inverted:
<vspace/>Logical representation:
"http://ab.CDEFGH.ij/kl/mn/op.html"<vspace/>Visual representation:
"http://ab.HGFEDC.ij/kl/mn/op.html"<vspace/> Components can be read
one by one, and each component can be read in its natural
direction.</t>

<t>Example 2: More than one consecutive component with rtl characters
is inverted as a whole: <vspace/>Logical representation:
"http://ab.CDE.FGH/ij/kl/mn/op.html"<vspace/>Visual representation:
"http://ab.HGF.EDC/ij/kl/mn/op.html"<vspace/> A sequence of rtl
components is read rtl, in the same way as a sequence of rtl words is
read rtl in a bidi text.</t>

<t>Example 3: All components of an IRI (except for the scheme) are
rtl.  All rtl components are inverted overall: <vspace/>Logical
representation:
"http://AB.CD.EF/GH/IJ/KL?MN=OP;QR=ST#UV"<vspace/>Visual
representation: "http://VU#TS=RQ;PO=NM?LK/JI/HG/FE.DC.BA"<vspace/> The
whole IRI (except the scheme) is read rtl. Delimiters between rtl
components stay between the respective components; delimiters between
ltr and rtl components don't move.</t>

<t>Example 4: Each of several sequences of rtl components is inverted
on its own: <vspace/>Logical representation:
"http://AB.CD.ef/gh/IJ/KL.html"<vspace/>Visual representation:
"http://DC.BA.ef/gh/LK/JI.html"<vspace/> Each sequence of rtl
components is read rtl, in the same way as each sequence of rtl words
in an ltr text is read rtl.</t>

<t>Example 5: Example 2, applied to components of different kinds:
<vspace/>Logical representation: "http://ab.cd.EF/GH/ij/kl.html"
<vspace/>Visual representation:
"http://ab.cd.HG/FE/ij/kl.html"<vspace/> The inversion of the domain
name label and the path component may be unexpected, but it is
consistent with other bidi behavior.  For reassurance that the domain
component really is "ab.cd.EF", it may be helpful to read aloud the
visual representation following the bidi algorithm. After
"http://ab.cd." one reads the rtl block "E-F-slash-G-H", which
corresponds to the logical representation.
</t>

<t>Example 6: Same as Example 5, with more rtl components:
<vspace/>Logical representation:
"http://ab.CD.EF/GH/IJ/kl.html"<vspace/>Visual representation:
"http://ab.JI/HG/FE.DC/kl.html"<vspace/> The inversion of the domain
name labels and the path components may be easier to identify because
the delimiters also move.</t>
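<figure>
<preamble>The visual representations in Examples 1 to 6 can be
reproduced with a small toy model in Bidi Notation. The following
Python fragment is not an implementation of the Unicode Bidirectional
Algorithm; it only reverses each maximal run of upper case letters,
including delimiters enclosed between upper case letters, and it
deliberately ignores digits. The names RTL_RUN and toy_visual_order
are hypothetical.</preamble>
<artwork><![CDATA[
```python
import re

# A run of upper case (rtl) letters, continued across delimiters
# as long as upper case letters follow on the other side.
RTL_RUN = re.compile(r"[A-Z]+(?:[^A-Za-z0-9]+[A-Z]+)*")


def toy_visual_order(logical: str) -> str:
    """Reverse every rtl run of a Bidi Notation string."""
    return RTL_RUN.sub(lambda m: m.group(0)[::-1], logical)
```
]]></artwork>
<postamble>Delimiters between two rtl components move with the run,
while delimiters between an ltr and an rtl component stay in place, as
described in Example 3.</postamble>
</figure>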

<t>Example 7: A single rtl component includes digits: <vspace/>Logical
representation: "http://ab.CDE123FGH.ij/kl/mn/op.html"<vspace/>Visual
representation: "http://ab.HGF123EDC.ij/kl/mn/op.html"<vspace/>
Numbers are written ltr in all cases but are treated as an additional
embedding inside a run of rtl characters. This is completely
consistent with usual bidirectional text.</t>

<t>Example 8 (not allowed): Numbers are at the start or end of an rtl
component:<vspace/>Logical representation:
"http://ab.cd.ef/GH1/2IJ/KL.html"<vspace/>Visual representation:
"http://ab.cd.ef/LK/JI1/2HG.html"<vspace/> The sequence "1/2" is
interpreted by the bidi algorithm as a fraction, fragmenting the
components and leading to confusion. There are other characters that
are interpreted in a special way close to numbers; in particular, "+",
"-", "#", "$", "%", ",", ".", and ":".</t>

<t>Example 9 (not allowed): The numbers in the previous example are
percent-encoded: <vspace/>Logical representation:
"http://ab.cd.ef/GH%31/%32IJ/KL.html"<vspace/>Visual representation:
"http://ab.cd.ef/LK/JI%32/%31HG.html"</t>

<t>Example 10 (allowed but not recommended): <vspace/>Logical
representation: "http://ab.CDEFGH.123/kl/mn/op.html"<vspace/>Visual
representation: "http://ab.123.HGFEDC/kl/mn/op.html"<vspace/>
Components consisting of only numbers are allowed (it would be rather
difficult to prohibit them), but these may interact with adjacent rtl
components in ways that are not easy to predict.</t>

<t>Example 11 (allowed but not recommended): <vspace/>Logical
representation: "http://ab.CDEFGH.123ij/kl/mn/op.html"<vspace/>Visual
representation: "http://ab.123.HGFEDCij/kl/mn/op.html"<vspace/>
Components consisting of numbers and left-to-right characters are
allowed, but these may interact with adjacent rtl components in ways
that are not easy to predict.</t>
</section><!-- examples -->

<section title="IANA Considerations" anchor="iana">
<t>This document makes no changes to IANA registries.</t>
</section> <!-- IANA -->

<section title="Security Considerations" anchor="security">
<t>Confusion can occur with bidirectional IRIs if the restrictions
in <xref target="bidi-structure"/> are not followed. The same visual
representation may be interpreted as different logical representations,
and vice versa. It is also very important that a correct implementation
of the Unicode Bidirectional Algorithm be used.</t>
</section><!-- security -->

<section title="Acknowledgements">
<t>This document was derived from <xref target="RFC3987"/> and
<xref target="RFC3987bis"/>, and the acknowledgements of those
documents apply.</t>
</section><!-- acknowledgements -->
</middle>

<back>
<references title="Normative References">

      <reference anchor="RFC3987bis"
         target="http://tools.ietf.org/id/draft-ietf-iri-3987bis">

          <front>
            <title>Internationalized Resource Identifiers (IRIs)</title>
          <author initials="M." surname="Duerst"/>
          <author initials="L." surname="Masinter" fullname="Larry Masinter"/>
          <author initials="M." surname="Suignard"/>
          <date year="2011" month="August" day="14"/>
          </front>
      </reference>


<reference anchor="ASCII">
<front>
<title>Coded Character Set -- 7-bit American Standard Code for Information
Interchange</title>
<author>
<organization>American National Standards Institute</organization>
</author>
<date year="1986"/>
</front>
<seriesInfo name="ANSI" value="X3.4"/>
</reference>

<reference anchor="ISO10646">
<front>
<title>ISO/IEC 10646:2003: Information Technology -
Universal Multiple-Octet Coded Character Set (UCS)</title>
<author>
<organization>International Organization for Standardization</organization>
</author>
<date month="December" year="2003"/>
</front>
<seriesInfo name="ISO" value="Standard 10646"/>
</reference>

&rfc2119;
&rfc3490;
&rfc3491;


<reference anchor="UNIV6">
<front>
<title>The Unicode Standard, Version 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, ISBN 978-1-936213-01-6)</title>
<author><organization>The Unicode Consortium</organization></author>
<date year="2010" month="October"/>
</front>
</reference>

<reference anchor="UNI9" target="http://www.unicode.org/reports/tr9/tr9-13.html">
<front>
<title>The Bidirectional Algorithm</title>
<author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
<date year="2004" month="March"/>
</front>
<seriesInfo name="Unicode Standard Annex" value="#9"/>
</reference>

</references>

<references title="Informative References">

<reference anchor="BidiEx" target="http://www.w3.org/International/iri-edit/BidiExamples">
<front>
<title>Examples of bidirectional IRIs</title>
<author><organization/></author>
<date year="" month=""/>
</front>
</reference>


&rfc3987;


</references>

</back>
</rfc>