Changeset 101


Timestamp:
Mar 6, 2012, 5:29:09 PM
Author:
adil@…
Message:

Improved some explanation
Regularized language
Made 80 column for easier comparisons in the future

File:
1 edited

  • draft-ietf-iri-3987bis/draft-ietf-iri-bidi-guidelines.xml

    r100 r101  
    16 16 <?rfc compact='yes'?>
    17 17 <?rfc subcompact='no'?>
    18 <rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-bidi-guidelines-01" category="bcp" xml:lang="en">
    19 <front>
    20 <title abbrev="Bidi IRI Guidelines">Guidelines for Internationalized Resource Identifiers with Bi-directional Characters (Bidi IRIs)</title>
    21 
    22   <author initials="M.J." surname="Duerst" fullname='Martin Duerst'>
    23     <!-- (Note: Please write "Duerst" with u-umlaut wherever
    24       possible, for example as "D&#252;rst" in XML and HTML) -->
    25   <organization abbrev="Aoyama Gakuin University">Aoyama Gakuin University</organization>
    26   <address>
    27   <postal>
    28   <street>5-10-1 Fuchinobe</street>
    29   <city>Sagamihara</city>
    30   <region>Kanagawa</region>
    31   <code>229-8558</code>
    32   <country>Japan</country>
    33   </postal>
    34   <phone>+81 42 759 6329</phone>
    35   <facsimile>+81 42 759 6495</facsimile>
    36   <email>duerst@it.aoyama.ac.jp</email>
    37   <uri>http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/<!-- (Note: This is the percent-encoded form of an IRI)--></uri>
    38   </address>
    39 </author>
    40 
    41 <author initials="L." surname="Masinter" fullname="Larry Masinter">
    42    <organization>Adobe</organization>
    43    <address>
    44    <postal>
    45    <street>345 Park Ave</street>
    46    <city>San Jose</city>
    47    <region>CA</region>
    48    <code>95110</code>
    49    <country>U.S.A.</country>
    50    </postal>
    51    <phone>+1-408-536-3024</phone>
    52    <email>masinter@adobe.com</email>
    53    <uri>http://larry.masinter.net</uri>
    54    </address>
    55 </author>
    56  
    57 <author initials="A." surname="Allawi" fullname="Adil Allawi">
    58   <organization>Diwan Software Limited</organization>
    59   <address>
    60   <postal>
    61   <street>37-39 Peckham Road</street>
    62   <city>London</city>
    63   <code>SE5 8UH</code>
    64   <country>United Kingdom</country>
    65   </postal>
    66   <phone>+44 7718 785850</phone>
    67   <facsimile>+44 20 72525444</facsimile>
    68   <email>adil@diwan.com</email>
    69   <uri>http://ironymark.diwan.com/</uri>
    70   </address>
    71 </author>
    72 
    73 <date year="2012" month="March" day="2" />
    74 
    75 <area>Applications</area>
    76 <workgroup>Internationalized Resource Identifiers (iri)</workgroup>
    77 <keyword>IRI</keyword>
    78 <keyword>Internationalized Resource Identifier</keyword>
    79 <keyword>BIDI</keyword>
    80 <keyword>URI</keyword>
    81 <keyword>URL</keyword>
    82 <keyword>IDN</keyword>
    83 
    84 <abstract>
    85 
    86 <t>This specification gives guidelines for selection, use, presentation of
    87 International Resource Identifiers (IRI) which include characters with
    88 in inherent right-to-left (rtl) writing direction. </t>
    89 </abstract>
    90 
    91 </front>
    92 <middle>
    93 
    94 <section title="Introduction">
    95 
    96 <t>Some UCS characters, such as those used in the Arabic and Hebrew
    97 scripts, have an inherent right-to-left (rtl) writing direction. IRIs
    98 containing these characters (called bidirectional IRIs or Bidi IRIs)
    99 require additional attention because of the non-trivial relation
    100 between logical representation (used for digital representation and
    101 for reading/spelling) and visual representation (used for
    102 display/printing).</t>
    103 
    104 <t>Because of the complex interaction between the logical representation,
    105 the visual representation, and the syntax of a Bidi IRI, a balance is
    106 needed between various requirements.
    107 The main requirements are<list style="hanging">
    108 <t hangText="1.">user-predictable conversion between visual and
    109     logical representation;</t>
    110 <t hangText="2.">the ability to include a wide range of characters
    111     in various parts of the IRI; and</t>
    112 <t hangText="3.">minor or no changes or restrictions for
    113       implementations.</t>
    114 </list></t>
    115 <section title="Notation">
    116 
    117 <t>In this document, Bidi Notation is used for bidirectional examples: Lower case
    118 letters stand for Latin letters or other letters that are written left
    119 to right, whereas upper case letters represent Arabic or Hebrew
    120 letters that are written right to left.</t>
    121 
    122 <t> In this document, the key words "MUST", "MUST NOT", "REQUIRED",
    123 "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
    124 and "OPTIONAL" are to be interpreted as described in <xref
    125 target="RFC2119"/>.</t>
    126 
    127 </section> <!-- Notation -->
    128 
    129 </section> <!-- Introduction -->
    130 
    131 
    132 
    133 
    134 <section title="Logical Storage and Visual Presentation" anchor="visual">
    135 
    136 <t>When stored or transmitted in digital representation, bidirectional
    137 IRIs MUST be in full logical order and MUST conform to the IRI syntax
    138 rules (which includes the rules relevant to their scheme). This
    139 ensures that bidirectional IRIs can be processed in the same way as
    140 other IRIs.</t> <t>Bidirectional IRIs MUST be rendered by using the
    141 Unicode Bidirectional Algorithm <xref target="UNIV6"/>, <xref
    142 target="UNI9"/>.  Bidirectional IRIs MUST be rendered in the same way
    143 as they would be if they were in a left-to-right embedding; i.e., as
    144 if they were preceded by U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and
    145 followed by U+202C, POP DIRECTIONAL FORMATTING (PDF).  Setting the
    146 embedding direction can also be done in a higher-level protocol (e.g.,
    147 the dir='ltr' attribute in HTML).</t>
    148 
    149 <t>There is no requirement to use the above embedding if the display
    150 is still the same without the embedding. For example, a bidirectional
    151 IRI in a text with left-to-right base directionality (such as used for
    152 English or Cyrillic) that is preceded and followed by whitespace and
    153 strong left-to-right characters does not need an embedding.  Also, a
    154 bidirectional relative IRI reference that only contains strong
    155 right-to-left characters and weak characters and that starts and ends
    156 with a strong right-to-left character and appears in a text with
    157 right-to-left base directionality (such as used for Arabic or Hebrew)
    158 and is preceded and followed by whitespace and strong characters does
    159 not need an embedding.</t>
    160 
    161 <t>In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM), may be
    162 sufficient to force the correct display behavior.  However, the
    163 details of the Unicode Bidirectional algorithm are not always easy to
    164 understand. Implementers are strongly advised to err on the side of
    165 caution and to use embedding in all cases where they are not
    166 completely sure that the display behavior is unaffected without the
    167 embedding.</t>
    168 
    169 <t>The Unicode Bidirectional Algorithm (<xref target="UNI9"/>, section
    170 4.3) permits higher-level protocols to influence bidirectional
    171 rendering. Such changes by higher-level protocols MUST NOT be used if
    172 they change the rendering of IRIs.</t>
    173 
    174 <t>The bidirectional formatting characters that may be used before or
    175 after the IRI to ensure correct display are not themselves part of the
    176 IRI.  IRIs MUST NOT contain bidirectional formatting characters (LRM,
    177 RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of
    178 the IRI but do not appear themselves. It would therefore not be
    179 possible to input an IRI with such characters correctly.</t>
    180 
    181 </section> <!-- visual -->
    182 <section title="Bidi IRI Structure" anchor="bidi-structure">
    183 
    184 <t>The Unicode Bidirectional Algorithm is designed mainly for running
    185 text.  To make sure that it does not affect the rendering of
    186 bidirectional IRIs too much, some restrictions on bidirectional IRIs
    187 are necessary. These restrictions are given in terms of delimiters
    188 (structural characters, mostly punctuation such as "@", ".", ":",
    189 and<vspace/>"/") and components (usually consisting mostly of letters
    190 and digits).</t>
    191 
    192 <t>The following syntax rules from the ABNF of <xref target="RFC3987bis"/>
    193  correspond to
    194 components for the purpose of Bidi behavior: iuserinfo, ireg-name,
    195 isegment, isegment-nz, isegment-nz-nc, ireg-name, iquery, and
    196 ifragment.</t>
    197 
    198 <t>Specifications that define the syntax of any of the above
    199 components MAY divide them further and define smaller parts to be
    200 components according to this document. As an example, the restrictions
    201 of <xref target="RFC3490"/> on bidirectional domain names correspond
    202 to treating each label of a domain name as a component for schemes
    203 with ireg-name as a domain name.  Even where the components are not
    204 defined formally, it may be helpful to think about some syntax in
    205 terms of components and to apply the relevant restrictions.  For
    206 example, for the usual name/value syntax in query parts, it is
    207 convenient to treat each name and each value as a component. As
    208 another example, the extensions in a resource name can be treated as
    209 separate components.</t>
    210 
    211 <t>For each component, the following restrictions apply:</t>
    212 <t>
    213 <list style="hanging">
    214 
    215 <t hangText="1.">A component SHOULD NOT use both right-to-left and
    216   left-to-right characters.</t>
    217 
    218 <t hangText="2.">A component using right-to-left characters SHOULD
    219   start and end with right-to-left characters.</t>
    220 
    221 </list></t>
    222 
    223 <t>The above restrictions are given as "SHOULD"s, rather than as
    224 "MUST"s.  For IRIs that are never presented visually, they are not
    225 relevant.  However, for IRIs in general, they are very important to
    226 ensure consistent conversion between visual presentation and logical
    227 representation, in both directions.</t>
    228 
    229 <t><list style="hanging">
    230 
    231 <t hangText="Note:">In some components, the above restrictions may
    232   actually be strictly enforced.  For example, <xref
    233   target="RFC3490"></xref> requires that these restrictions apply to
    234   the labels of a host name for those schemes where ireg-name is a
    235   host name.  In some other components (for example, path components)
    236   following these restrictions may not be too difficult.  For other
    237   components, such as parts of the query part, it may be very
    238   difficult to enforce the restrictions because the values of query
    239   parameters may be arbitrary character sequences.</t>
    240 
    241 </list></t>
    242 
    243 <t>If the above restrictions cannot be satisfied otherwise, the
    244 affected component can always be mapped to URI notation using the
    245 general percent-encoding of IRI components, as described
    246 in <xref target="RFC3987bis"/>. Please note that the whole component
    247 has to be mapped (see also Example 9 below).</t>
    248 
    249 </section> <!-- bidi-structure -->
    250 
    251 <section title="Input of Bidi IRIs" anchor="bidiInput">
    252 
    253 <t>Bidi input methods MUST generate Bidi IRIs in logical order while
    254 rendering them according to <xref target="visual"/>.  During input,
    255 rendering SHOULD be updated after every new character is input to
    256 avoid end-user confusion.</t>
    257 
    258 </section> <!-- bidiInput -->
    259 
    260 <section title="Examples">
    261 
    262 <t>This section gives examples of bidirectional IRIs, in Bidi
    263 Notation.  It shows legal IRIs with the relationship between logical
    264 and visual representation and explains how certain phenomena in this
    265 relationship may look strange to somebody not familiar with
    266 bidirectional behavior, but familiar to users of Arabic and Hebrew. It
    267 also shows what happens if the restrictions given in <xref
    268 target="bidi-structure"/> are not followed. The examples below can be
    269 seen at <xref target="BidiEx"/>, in Arabic, Hebrew, and Bidi Notation
    270 variants.</t>
    271 
    272 <t>To read the bidi text in the examples, read the visual
    273 representation from left to right until you encounter a block of rtl
    274 text. Read the rtl block (including slashes and other special
    275 characters) from right to left, then continue at the next unread ltr
    276 character.</t>
    277 
    278 <t>Example 1: A single component with rtl characters is inverted:
    279 <vspace/>Logical representation:
    280 "http://ab.CDEFGH.ij/kl/mn/op.html"<vspace/>Visual representation:
    281 "http://ab.HGFEDC.ij/kl/mn/op.html"<vspace/> Components can be read
    282 one by one, and each component can be read in its natural
    283 direction.</t>
    284 
    285 <t>Example 2: More than one consecutive component with rtl characters
    286 is inverted as a whole: <vspace/>Logical representation:
    287 "http://ab.CDE.FGH/ij/kl/mn/op.html"<vspace/>Visual representation:
    288 "http://ab.HGF.EDC/ij/kl/mn/op.html"<vspace/> A sequence of rtl
    289 components is read rtl, in the same way as a sequence of rtl words is
    290 read rtl in a bidi text.</t>
    291 
    292 <t>Example 3: All components of an IRI (except for the scheme) are
    293 rtl.  All rtl components are inverted overall: <vspace/>Logical
    294 representation:
    295 "http://AB.CD.EF/GH/IJ/KL?MN=OP;QR=ST#UV"<vspace/>Visual
    296 representation: "http://VU#TS=RQ;PO=NM?LK/JI/HG/FE.DC.BA"<vspace/> The
    297 whole IRI (except the scheme) is read rtl. Delimiters between rtl
    298 components stay between the respective components; delimiters between
    299 ltr and rtl components don't move.</t>
    300 
    301 <t>Example 4: Each of several sequences of rtl components is inverted
    302 on its own: <vspace/>Logical representation:
    303 "http://AB.CD.ef/gh/IJ/KL.html"<vspace/>Visual representation:
    304 "http://DC.BA.ef/gh/LK/JI.html"<vspace/> Each sequence of rtl
    305 components is read rtl, in the same way as each sequence of rtl words
    306 in an ltr text is read rtl.</t>
    307 
    308 <t>Example 5: Example 2, applied to components of different kinds:
    309 <vspace/>Logical representation: "http://ab.cd.EF/GH/ij/kl.html"
    310 <vspace/>Visual representation:
    311 "http://ab.cd.HG/FE/ij/kl.html"<vspace/> The inversion of the domain
    312 name label and the path component may be unexpected, but it is
    313 consistent with other bidi behavior.  For reassurance that the domain
    314 component really is "ab.cd.EF", it may be helpful to read aloud the
    315 visual representation following the bidi algorithm. After
    316 "http://ab.cd." one reads the RTL block "E-F-slash-G-H", which
    317 corresponds to the logical representation.
    318 </t>
    319 
    320 <t>Example 6: Same as Example 5, with more rtl components:
    321 <vspace/>Logical representation:
    322 "http://ab.CD.EF/GH/IJ/kl.html"<vspace/>Visual representation:
    323 "http://ab.JI/HG/FE.DC/kl.html"<vspace/> The inversion of the domain
    324 name labels and the path components may be easier to identify because
    325 the delimiters also move.</t>
    326 
    327 <t>Example 7: A single rtl component includes digits: <vspace/>Logical
    328 representation: "http://ab.CDE123FGH.ij/kl/mn/op.html"<vspace/>Visual
    329 representation: "http://ab.HGF123EDC.ij/kl/mn/op.html"<vspace/>
    330 Numbers are written ltr in all cases but are treated as an additional
    331 embedding inside a run of rtl characters. This is completely
    332 consistent with usual bidirectional text.</t>
    333 
    334 <t>Example 8 (not allowed): Numbers are at the start or end of an rtl
    335 component:<vspace/>Logical representation:
    336 "http://ab.cd.ef/GH1/2IJ/KL.html"<vspace/>Visual representation:
    337 "http://ab.cd.ef/LK/JI1/2HG.html"<vspace/> The sequence "1/2" is
    338 interpreted by the bidi algorithm as a fraction, fragmenting the
    339 components and leading to confusion. There are other characters that
    340 are interpreted in a special way close to numbers; in particular, "+",
    341 "-", "#", "$", "%", ",", ".", and ":".</t>
    342 
    343 <t>Example 9 (not allowed): The numbers in the previous example are
    344 percent-encoded: <vspace/>Logical representation:
    345 "http://ab.cd.ef/GH%31/%32IJ/KL.html",<vspace/>Visual representation:
    346 "http://ab.cd.ef/LK/JI%32/%31HG.html"</t>
    347 
    348 <t>Example 10 (allowed but not recommended): <vspace/>Logical
    349 representation: "http://ab.CDEFGH.123/kl/mn/op.html"<vspace/>Visual
    350 representation: "http://ab.123.HGFEDC/kl/mn/op.html"<vspace/>
    351 Components consisting of only numbers are allowed (it would be rather
    352 difficult to prohibit them), but these may interact with adjacent RTL
    353 components in ways that are not easy to predict.</t>
    354 
    355 <t>Example 11 (allowed but not recommended): <vspace/>Logical
    356 representation: "http://ab.CDEFGH.123ij/kl/mn/op.html"<vspace/>Visual
    357 representation: "http://ab.123.HGFEDCij/kl/mn/op.html"<vspace/>
    358 Components consisting of numbers and left-to-right characters are
    359 allowed, but these may interact with adjacent RTL components in ways
    360 that are not easy to predict.</t>
    361 </section><!-- examples -->
    362 
    363 <section title="IANA Considerations" anchor="iana">
    364 <t>This document makes no changes to IANA registries.</t>
    365 </section> <!-- IANA -->
    366    
    367 <section title="Security Considerations" anchor="security">
    368 <t>Confusion can occur with bidirectional IRIs, if the restrictions
    369 in <xref target="bidi-structure"/> are not followed. The same visual
    370 representation may be interpreted as different logical representations,
    371 and vice versa. It is also very important that a correct Unicode bidirectional
    372 implementation be used.</t>
    373 </section><!-- security -->
    374 
    375 <section title="Acknowledgements">
    376 <t>This document was derived from <xref target="RFC3987"/> and
    377 <xref target="RFC3987bis"/> and the acknowledgments of those
    378 documents apply.</t>
    379 </section><!-- acknowledgements -->
    380 </middle>
    381 
    382 <back>
    383 <references title="Normative References">
    384 
    385       <reference anchor="RFC3987bis"
    386          target="http://tools.ietf.org/id/draft-ietf-iri-3987bis">
    387          
    388           <front>
    389             <title>Internationalized Resource Identifiers (IRIs)</title>
     18<rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-bidi-guidelines-01"
     19  category="bcp" xml:lang="en">
     20  <front>
     21    <title abbrev="Bidi IRI Guidelines">Guidelines for Internationalized
     22      Resource Identifiers with Bi-directional Characters (Bidi IRIs)</title>
     23    <author initials="M.J." surname="Duerst" fullname="Martin Duerst">
     24      <!-- (Note: Please write "Duerst" with u-umlaut wherever
     25        possible, for example as "D&#252;rst" in XML and HTML) -->
     26      <organization abbrev="Aoyama Gakuin University">Aoyama Gakuin
     27        University</organization>
     28      <address>
     29        <postal>
     30          <street>5-10-1 Fuchinobe</street>
     31          <city>Sagamihara</city>
     32          <region>Kanagawa</region>
     33          <code>229-8558</code>
     34          <country>Japan</country>
     35        </postal>
     36        <phone>+81 42 759 6329</phone>
     37        <facsimile>+81 42 759 6495</facsimile>
     38        <email>duerst@it.aoyama.ac.jp</email>
     39        <uri>http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/<!-- (Note: This is the percent-encoded form of an IRI)--></uri>
     40      </address>
     41    </author>
     42    <author initials="L." surname="Masinter" fullname="Larry Masinter">
     43      <organization>Adobe</organization>
     44      <address>
     45        <postal>
     46          <street>345 Park Ave</street>
     47          <city>San Jose</city>
     48          <region>CA</region>
     49          <code>95110</code>
     50          <country>U.S.A.</country>
     51        </postal>
     52        <phone>+1-408-536-3024</phone>
     53        <email>masinter@adobe.com</email>
     54        <uri>http://larry.masinter.net</uri>
     55      </address>
     56    </author>
     57    <author initials="A." surname="Allawi" fullname="Adil Allawi">
     58      <organization>Diwan Software Limited</organization>
     59      <address>
     60        <postal>
     61          <street>37-39 Peckham Road</street>
     62          <city>London</city>
     63          <code>SE5 8UH</code>
     64          <country>United Kingdom</country>
     65        </postal>
     66        <phone>+44 7718 785850</phone>
     67        <facsimile>+44 20 72525444</facsimile>
     68        <email>adil@diwan.com</email>
     69        <uri>http://ironymark.diwan.com/</uri>
     70      </address>
     71    </author>
     72    <date year="2012" month="March" day="2"/>
     73    <area>Applications</area>
     74    <workgroup>Internationalized Resource Identifiers (iri)</workgroup>
     75    <keyword>IRI</keyword>
     76    <keyword>Internationalized Resource Identifier</keyword>
     77    <keyword>BIDI</keyword>
     78    <keyword>URI</keyword>
     79    <keyword>URL</keyword>
     80    <keyword>IDN</keyword>
     81    <abstract>
     82      <t>This specification gives guidelines for selection, use, and
     83        presentation of Internationalized Resource Identifiers (IRIs) which include
     84        characters with inherent right-to-left (rtl) writing direction. </t>
     85    </abstract>
     86  </front>
     87  <middle>
     88    <section title="Introduction">
     89      <t>Some UCS characters, such as those used in the Arabic and Hebrew
     90        scripts, have an inherent right-to-left (rtl) writing direction as
     91        opposed to characters, such as those in Latin scripts, that have an
     92        inherent left-to-right (ltr) direction. IRIs containing rtl characters
     93        (called bidirectional IRIs or Bidi IRIs) require additional attention
     94        because of the non-trivial relation between their logical and visual
     95        ordering. The logical order represents the order in which the characters
     96        are read and stored on computers. The visual order represents the order
     97        the characters are drawn on a computer display or printout in the way a
     98        human expects to read them.</t>
     99      <t>Generally, alphabetic characters in scripts like Arabic and Hebrew are
     100        drawn rtl while numbers are drawn ltr. Symbols, such as slash '/' and
     101        period '.', take their visual direction from the surrounding characters.</t>
     102      <t>Because of this complex interaction between the logical representation,
     103        the visual representation, and the syntax of a Bidi IRI, a balance is
     104        needed between various requirements. The main requirements are: <list
     105        style="hanging">
     106        <t hangText="1.">user-predictable conversion between visual and logical
     107          representation;</t>
     108        <t hangText="2.">the ability to include a wide range of characters in
     109          various parts of the IRI; and</t>
     110        <t hangText="3.">minor or no changes or restrictions for
     111          implementations.</t>
     112        </list></t>
     113      <section title="Notation">
     114        <t>In this document, "Bidi Notation" is used for the given Bidi IRI
     115          examples as follows: Lower case letters a-z stand for characters that
     116          are written with a left to right ordering (such as Latin characters),
     117          whereas upper case letters A-Z represent characters that are written
     118          right to left (such as Arabic or Hebrew characters). Numbers and
     119          symbols are the same.</t>
     120        <t> In this document, the key words "MUST", "MUST NOT", "REQUIRED",
     121          "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
     122          and "OPTIONAL" are to be interpreted as described in <xref
     123            target="RFC2119"/>.</t>
     124      </section>
     125      <!-- Notation -->
     126    </section>
     127    <!-- Introduction -->
     128    <section title="Logical Storage and Visual Presentation" anchor="visual">
     129      <t>When stored or transmitted in digital representation, Bidi IRIs MUST be
     130        in full logical order and MUST conform to the IRI syntax rules (which
     131        includes the rules relevant to their scheme). This ensures that
     132        Bidi IRIs can be processed in the same way as other IRIs.</t>
     133      <t>Bidi IRIs MUST be visually ordered by the Unicode Bidirectional
     134        Algorithm <xref target="UNIV6"/>, <xref target="UNI9"/>. Bidi IRIs MUST
     135        be rendered in the same way as they would be if they were in a
     136        left-to-right embedding. </t>
     137      <t>In conformance with the Unicode Bidirectional Algorithm, embedding MAY
     138        be done in one of two ways: <list style="hanging">
     139        <t hangText="1.">precede the IRI with U+202A, LEFT-TO-RIGHT EMBEDDING
     140          (LRE), and follow with U+202C, POP DIRECTIONAL FORMATTING (PDF);
     141          or</t>
     142        <t hangText="2.">use a higher-level protocol (e.g., the dir='ltr'
     143          attribute in HTML).</t>
     144        </list></t>
     145      <t>Preceding and following the Bidi IRI with U+200E, LEFT-TO-RIGHT MARK
     146        (LRM), is NOT RECOMMENDED, as there are cases where this may not be
     147        sufficient to match full left-to-right embedding.</t>
     148      <t>There is no requirement to use embedding if the display is still the
     149        same without the embedding. For example, a Bidi IRI in a text
     150        with left-to-right base directionality (such as used for English or
     151        Cyrillic) that is preceded and followed by whitespace and strong
     152        left-to-right characters does not need an embedding. Also, a
     153        bidirectional relative IRI reference that only contains strong
     154        right-to-left characters and weak characters (such as symbols) and that
     155        starts and ends with a strong right-to-left character and appears in a
     156        text with right-to-left base directionality (such as used for Arabic or
     157        Hebrew) and is preceded and followed by whitespace and strong characters
     158        does not need an embedding.</t>
     159      <t>However, implementers are RECOMMENDED to use embedding in all cases
     160        where they are not completely sure that the display behavior is
     161        unaffected without the embedding.</t>
     162      <t>The Unicode Bidirectional Algorithm (<xref target="UNI9"/>, section
     163        4.3) permits higher-level protocols to influence bidirectional
     164        rendering. Such changes by higher-level protocols MUST NOT be used if
     165        they change the rendering of IRIs.</t>
     166      <t>The bidirectional formatting characters that may be used before or
     167        after the IRI to ensure correct display are not themselves part of the
     168        IRI. IRIs MUST NOT contain bidirectional formatting characters (LRM,
     169        RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of
     170        the IRI but do not appear themselves. It would therefore not be possible
     171        to input an IRI with such characters correctly.</t>
     172    </section>
     173    <!-- visual -->
     174    <section title="Bidi IRI Structure" anchor="bidi-structure">
     175      <t>The Unicode Bidirectional Algorithm is designed mainly for plain text.
     176        To make sure that it does not affect the rendering of Bidi IRIs outside
     177        of the requirements of this document, some restrictions on Bidi IRIs are
     178        necessary. These restrictions are given in terms of delimiters
     179        (structural characters, mostly punctuation such as "@", ".", ":", and
     180        "/") and components (usually consisting mostly of letters and
     181        digits).</t>
     182      <t>The following syntax rules from the ABNF of <xref target="RFC3987bis"/>
     183        correspond to components for the purpose of Bidi behavior: iuserinfo,
     184        ireg-name, isegment, isegment-nz, isegment-nz-nc, ireg-name, iquery, and
     185        ifragment.</t>
     186      <t>Specifications that define the syntax of any of the above components
     187        MAY divide them further and define smaller parts to be components
     188        according to this document. As an example, the restrictions of <xref
     189          target="RFC3490"/> on bidirectional domain names correspond to treating
     190        each label of a domain name as a component for schemes with ireg-name as
     191        a domain name. Even where the components are not defined formally, it
     192        may be helpful to think about some syntax in terms of components and to
     193        apply the relevant restrictions. For example, for the usual name/value
     194        syntax in query parts, it is convenient to treat each name and each
     195        value as a component. As another example, the extensions in a resource
     196        name can be treated as separate components.</t>
     197      <t>For each component, the following restrictions apply:</t>
     198      <t> <list style="hanging">
     199        <t hangText="1.">A component SHOULD NOT use both right-to-left and
     200          left-to-right characters.</t>
     201        <t hangText="2.">A component using right-to-left characters SHOULD start
     202          and end with right-to-left characters.</t>
     203      </list></t>
     204      <t>The above restrictions are given as "SHOULD"s, rather than as "MUST"s.
     205        For IRIs that are never presented visually, they are not relevant.
     206        However, for IRIs in general, they are very important to ensure
     207        consistent conversion between visual presentation and logical
     208        representation, in both directions.</t>
     209      <t><list style="hanging">
     210        <t hangText="Note:">In some components, the above restrictions may
     211          actually be strictly enforced. For example, <xref target="RFC3490"/>
     212          requires that these restrictions apply to the labels of a host name
     213          for those schemes where ireg-name is a host name. In some other
     214          components (for example, path components) following these restrictions
     215          may not be too difficult. For other components, such as parts of the
     216          query part, it may be very difficult to enforce the restrictions
     217          because the values of query parameters may be arbitrary character
     218          sequences.</t>
     219      </list></t>
     220      <t>If the above restrictions cannot be satisfied otherwise, the affected
     221        component can always be mapped to URI notation using the general
     222        percent-encoding of IRI components, as described in <xref
     223          target="RFC3987bis"/>. Please note that the whole component has to be
     224        mapped (see also Example 9 below).</t>
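As a rough sketch of mapping a whole component, the hypothetical helper below percent-encodes the UTF-8 bytes of every non-ASCII character and leaves ASCII characters untouched. It is a simplification and does not reproduce the other steps of the mapping defined in the referenced specification.

```python
def to_uri_component(component: str) -> str:
    """Percent-encode every non-ASCII character of one component (hypothetical helper)."""
    encoded = []
    for ch in component:
        if ch.isascii():
            # ASCII characters are copied unchanged in this sketch.
            encoded.append(ch)
        else:
            # Non-ASCII characters become %HH for each UTF-8 byte.
            encoded.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(encoded)
```

Because no right-to-left character survives in the result, the mapped component is displayed left to right in its entirety.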
     225    </section>
     226    <!-- bidi-structure -->
     227    <section title="Input of Bidi IRIs" anchor="bidiInput">
     228      <t>Bidi input methods MUST generate Bidi IRIs in logical order while
     229        rendering them according to <xref target="visual"/>. During input,
     230        rendering SHOULD be updated after every new character is input to avoid
     231        end-user confusion.</t>
     232    </section>
     233    <!-- bidiInput -->
     234    <section title="Examples">
     235      <t>This section gives examples of Bidi IRIs in Bidi Notation. It shows
     236        legal IRIs with the relationship between their logical and visual
     237        representation and explains how certain phenomena in this relationship
     238        may look strange to somebody not familiar with bidirectional behavior,
     239        but familiar to users of Arabic and Hebrew. It also shows what happens
     240        if the restrictions given in <xref target="bidi-structure"/> are not
     241        followed. The examples below can be seen at <xref target="BidiEx"/>, in
     242        Arabic, Hebrew, and Bidi Notation variants.</t>
     243      <t>To read the bidi text in the examples, read the visual representation
     244        from left to right until you encounter a block of rtl text. Read the rtl
     245        block (including slashes and other special characters) from right to
     246        left, then continue at the next unread ltr character.</t>
     247      <t>Example 1: A single component with rtl characters is inverted:
     248        <vspace/>Logical representation:
     249        "http://ab.CDEFGH.ij/kl/mn/op.html"<vspace/>Visual representation:
     250        "http://ab.HGFEDC.ij/kl/mn/op.html"<vspace/>Components can be read one
     251        by one, and each component can be read in its natural direction.</t>
     252      <t>Example 2: More than one consecutive component with rtl characters is
     253        inverted as a whole: <vspace/>Logical representation:
     254        "http://ab.CDE.FGH/ij/kl/mn/op.html"<vspace/>Visual representation:
     255        "http://ab.HGF.EDC/ij/kl/mn/op.html"<vspace/> A sequence of rtl
     256        components is read rtl, in the same way as a sequence of rtl words is
     257        read rtl in a bidi text.</t>
     258      <t>Example 3: All components of an IRI (except for the scheme) are rtl.
     259        All rtl components are inverted overall: <vspace/>Logical
     260        representation: "http://AB.CD.EF/GH/IJ/KL?MN=OP;QR=ST#UV"<vspace/>Visual
     261        representation: "http://VU#TS=RQ;PO=NM?LK/JI/HG/FE.DC.BA"<vspace/> The
     262        whole IRI (except the scheme) is read rtl. Delimiters between rtl
     263        components stay between the respective components; delimiters between
     264        ltr and rtl components don't move.</t>
     265      <t>Example 4: Each of several sequences of rtl components is inverted on
     266        its own: <vspace/>Logical representation:
     267        "http://AB.CD.ef/gh/IJ/KL.html"<vspace/>Visual representation:
     268        "http://DC.BA.ef/gh/LK/JI.html"<vspace/> Each sequence of rtl components
     269        is read rtl, in the same way as each sequence of rtl words in an ltr
     270        text is read rtl.</t>
     271      <t>Example 5: Example 2, applied to components of different kinds:
     272        <vspace/>Logical representation: "http://ab.cd.EF/GH/ij/kl.html"
     273        <vspace/>Visual representation: "http://ab.cd.HG/FE/ij/kl.html"<vspace/>
     274        The inversion of the domain name label and the path component may be
     275        unexpected, but it is consistent with other bidi behavior. For
     276        reassurance that the domain component really is "ab.cd.EF", it may be
     277        helpful to read aloud the visual representation following the Unicode
     278        Bidirectional Algorithm. After "http://ab.cd." one reads the rtl block
     279        "E-F-slash-G-H", which corresponds to the logical representation. </t>
     280      <t>Example 6: Same as Example 5, with more rtl components:
     281        <vspace/>Logical representation:
     282        "http://ab.CD.EF/GH/IJ/kl.html"<vspace/>Visual representation:
     283        "http://ab.JI/HG/FE.DC/kl.html"<vspace/> The inversion of the domain
     284        name labels and the path components may be easier to identify because
     285        the delimiters also move.</t>
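The visual representations in Examples 1 to 6 can be reproduced with a much simplified sketch. It is not an implementation of the Unicode Bidirectional Algorithm. It assumes input in Bidi Notation (ASCII uppercase letters stand for rtl characters, lowercase letters for ltr characters, everything else is neutral) and an overall ltr context, and it rejects digits because their handling needs further rules. The function name visual_from_logical is hypothetical.

```python
import re

# An rtl run starts at an rtl letter and extends to the last rtl letter that
# is reachable without crossing an ltr letter; neutrals inside are included.
RTL_RUN = re.compile(r"[A-Z](?:[^a-z]*[A-Z])?")


def visual_from_logical(logical: str) -> str:
    """Reverse each rtl run of a Bidi Notation string (simplified sketch)."""
    if any(ch.isdigit() for ch in logical):
        raise ValueError("digits are outside the scope of this sketch")
    return RTL_RUN.sub(lambda m: m.group(0)[::-1], logical)
```

Applied to the logical representation of Example 6, the sketch treats "CD.EF/GH/IJ" as a single run and reverses it as a whole, which is why the delimiters move together with the components.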
     286      <t>Example 7: A single rtl component includes digits: <vspace/>Logical
     287        representation: "http://ab.CDE123FGH.ij/kl/mn/op.html"<vspace/>Visual
     288        representation: "http://ab.HGF123EDC.ij/kl/mn/op.html"<vspace/> Numbers
     289        are written ltr in all cases but are treated as an additional embedding
     290        inside a run of rtl characters. This is completely consistent with usual
     291        bidirectional text.</t>
     292      <t>Example 8 (not allowed): Numbers are at the start or end of an rtl
     293        component:<vspace/>Logical representation:
     294        "http://ab.cd.ef/GH1/2IJ/KL.html"<vspace/>Visual representation:
     295        "http://ab.cd.ef/LK/JI1/2HG.html"<vspace/> The sequence "1/2" is
     296        interpreted by the Bidirectional Algorithm as a fraction, fragmenting the
     297        components and leading to confusion. There are other characters that are
     298        interpreted in a special way close to numbers; in particular, "+", "-",
     299        "#", "$", "%", ",", ".", and ":".</t>
     300      <t>Example 9 (not allowed): The numbers in the previous example are
     301        percent-encoded: <vspace/>Logical representation:
     302        "http://ab.cd.ef/GH%31/%32IJ/KL.html"<vspace/>Visual representation:
     303        "http://ab.cd.ef/LK/JI%32/%31HG.html"</t>
     304      <t>Example 10 (allowed but not recommended): <vspace/>Logical
     305        representation: "http://ab.CDEFGH.123/kl/mn/op.html"<vspace/>Visual
     306        representation: "http://ab.123.HGFEDC/kl/mn/op.html"<vspace/> Components
     307        consisting of only numbers are allowed (it would be rather difficult to
     308        prohibit them), but these may interact with adjacent rtl components in
     309        ways that are not easy to predict.</t>
     310      <t>Example 11 (allowed but not recommended): <vspace/>Logical
     311        representation: "http://ab.CDEFGH.123ij/kl/mn/op.html"<vspace/>Visual
     312        representation: "http://ab.123.HGFEDCij/kl/mn/op.html"<vspace/>
     313        Components consisting of numbers and left-to-right characters are
     314        allowed, but these may interact with adjacent rtl components in ways
     315        that are not easy to predict.</t>
     316    </section>
     317    <!-- examples -->
     318    <section title="IANA Considerations" anchor="iana">
     319      <t>This document makes no changes to IANA registries.</t>
     320    </section>
     321    <!-- IANA -->
     322    <section title="Security Considerations" anchor="security">
     323      <t>Confusion can occur with bidirectional IRIs if the restrictions in
     324        <xref target="bidi-structure"/> are not followed. The same visual
     325        representation may be interpreted as different logical representations,
     326        and vice versa. It is also very important that a correct Unicode
     327        bidirectional implementation be used.</t>
     328    </section>
     329    <!-- security -->
     330    <section title="Acknowledgements">
     331      <t>This document was derived from <xref target="RFC3987"/> and <xref
     332        target="RFC3987bis"/>, and the acknowledgements of those documents
     333        apply.</t>
     334    </section>
     335    <!-- acknowledgements -->
     336  </middle>
     337  <back>
     338    <references title="Normative References">
     339      <reference anchor="RFC3987bis"
     340        target="http://tools.ietf.org/id/draft-ietf-iri-3987bis">
     341        <front>
     342          <title>Internationalized Resource Identifiers (IRIs)</title>
    390343          <author initials="M." surname="Duerst"/>
    391           <author initials="L." surname="Masinter" fullname="Larry Masinter"/>
     344          <author initials="L." surname="Masinter" fullname="Larry Masinter"/>
    392345          <author initials="M." surname="Suignard"/>
    393           <date year="2011" month="August" day="14"/>
    394           </front>
     346          <date year="2011" month="August" day="14"/>
     347        </front>
    395348      </reference>
    396 
    397 
    398 <reference anchor="ASCII">
    399 <front>
    400 <title>Coded Character Set -- 7-bit American Standard Code for Information
    401 Interchange</title>
    402 <author>
    403 <organization>American National Standards Institute</organization>
    404 </author>
    405 <date year="1986"/>
    406 </front>
    407 <seriesInfo name="ANSI" value="X3.4"/>
    408 </reference>
    409 
    410 <reference anchor="ISO10646">
    411 <front>
    412 <title>ISO/IEC 10646:2003: Information Technology -
    413 Universal Multiple-Octet Coded Character Set (UCS)</title>
    414 <author>
    415 <organization>International Organization for Standardization</organization>
    416 </author>
    417 <date month="December" year="2003"/>
    418 </front>
    419 <seriesInfo name="ISO" value="Standard 10646"/>
    420 </reference>
    421 
    422 &rfc2119;
    423 &rfc3490;
    424 &rfc3491;
    425 
    426 
    427 <reference anchor="UNIV6">
    428 <front>
    429 <title>The Unicode Standard, Version 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, ISBN 978-1-936213-01-6)</title>
    430 <author><organization>The Unicode Consortium</organization></author>
    431 <date year="2010" month="October"/>
    432 </front>
    433 </reference>
    434 
    435 <reference anchor="UNI9" target="http://www.unicode.org/reports/tr9/tr9-13.html">
    436 <front>
    437 <title>The Bidirectional Algorithm</title>
    438 <author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
    439 <date year="2004" month="March"/>
    440 </front>
    441 <seriesInfo name="Unicode Standard Annex" value="#9"/>
    442 </reference>
    443 
    444 </references>
    445 
    446 <references title="Informative References">
    447 
    448 <reference anchor="BidiEx" target="http://www.w3.org/International/iri-edit/BidiExamples">
    449 <front>
    450 <title>Examples of bidirectional IRIs</title>
    451 <author><organization/></author>
    452 <date year="" month=""/>
    453 </front>
    454 </reference>
    455 
    456 
    457 &rfc3987;
    458  
    459 
    460 </references>
    461 
    462 </back>
     349      <reference anchor="ASCII">
     350        <front>
     351          <title>Coded Character Set -- 7-bit American Standard Code for
     352            Information Interchange</title>
     353          <author>
     354            <organization>American National Standards Institute</organization>
     355          </author>
     356          <date year="1986"/>
     357        </front>
     358        <seriesInfo name="ANSI" value="X3.4"/>
     359      </reference>
     360      <reference anchor="ISO10646">
     361        <front>
     362          <title>ISO/IEC 10646:2003: Information Technology - Universal
     363            Multiple-Octet Coded Character Set (UCS)</title>
     364          <author>
     365            <organization>International Organization for
     366              Standardization</organization>
     367          </author>
     368          <date month="December" year="2003"/>
     369        </front>
     370        <seriesInfo name="ISO" value="Standard 10646"/>
     371      </reference> &rfc2119; &rfc3490; &rfc3491; <reference anchor="UNIV6">
     372        <front>
     373          <title>The Unicode Standard, Version 6.0.0 (Mountain View, CA, The
     374            Unicode Consortium, 2011, ISBN 978-1-936213-01-6)</title>
     375          <author>
     376            <organization>The Unicode Consortium</organization>
     377          </author>
     378          <date year="2010" month="October"/>
     379        </front>
     380      </reference>
     381      <reference anchor="UNI9"
     382        target="http://www.unicode.org/reports/tr9/tr9-13.html">
     383        <front>
     384          <title>The Unicode Bidirectional Algorithm</title>
     385          <author initials="M." surname="Davis" fullname="Mark Davis">
     386            <organization/>
     387          </author>
     388          <date year="2004" month="March"/>
     389        </front>
     390        <seriesInfo name="Unicode Standard Annex" value="#9"/>
     391      </reference>
     392    </references>
     393    <references title="Informative References">
     394      <reference anchor="BidiEx"
     395        target="http://www.w3.org/International/iri-edit/BidiExamples">
     396        <front>
     397          <title>Examples of Bidi IRIs</title>
     398          <author>
     399            <organization/>
     400          </author>
     401          <date year="" month=""/>
     402        </front>
     403      </reference> &rfc3987; </references>
     404  </back>
    463405</rfc>