Dec 20, 2011, 5:02:22 PM

edits to address issue #5, plus some other minor edits

1 file edited


  • draft-ietf-iri-3987bis/draft-ietf-iri-3987bis.xml

    r81 r82  
    3232<?rfc compact='yes'?>
    3333<?rfc subcompact='no'?>
    34 <rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-3987bis-08" category="std" xml:lang="en" obsoletes="3987">
     34<rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-3987bis-09" category="std" xml:lang="en" obsoletes="3987">
    3636<title abbrev="IRIs">Internationalized Resource Identifiers (IRIs)</title>
    104104rules are given for IRIs and related syntactic forms.</t>
    106 <t>In addition, this document provides named additional rule sets
    107 for processing otherwise invalid IRIs, in a way that supports
    108 other specifications that wish to mandate common behavior for
    109 'error' handling. In particular, rules used in some XML languages
    110 (LEIRI) and web applications are given.</t>
    112106<t>Defining IRI as new protocol element (rather than updating or
    113107extending the definition of URI) allows independent orderly
    118112related protocol elements when revising protocols, formats, and
    119113software components that currently deal only with URIs.</t>
     115<t>This document is part of a set of documents intended to
     116replace RFC 3987.</t>
    162 <t>URIs are used both as a protocol element (for transmission and
    163 processing by software) and also a presentation element (for display
    164 and handling by people who read, interpret, coin, or guess them). The
    165 transition between these roles is more difficult and complex when
    166 dealing with the larger set of characters than allowed for URIs in
    167 <xref target="RFC3986"/>. </t>
     159<t>URIs are composed out of a very limited repertoire of characters;
     160this design choice was made to support global transcription(<xref
     161target="RFC3986"/> section 1.2.1.).  Reliable transition between a URI
     162(as an abstract protocol element composed of a sequence of characters)
     163and a presentation of that URI (written on a napkin, read out loud)
     164and back is relatively straightforward, because of the limited
     165repertoire of characters used.  IRIs are designed to satisfy a
     166different set of use requirements; in particular, to allow IRIs to be
     167written in ways that are more meaningful to their users, even at the
     168expense of global transcribability. However, ensuring reliability of
     169the transition between an IRI and its presentation and back is more
     170difficult and complex when dealing with the larger set of Unicode
     171characters.  For example, Unicode supports multiple ways of encoding
     172complex combinations of characters and accents, with multiple
     173character sequences that can result in the same presentation.</t>
    169175<t>This document defines the protocol element called Internationalized
    178 <t>Using characters outside of A - Z in IRIs adds a number of
    179 difficulties. <xref target="IRIuse"/> discusses the use
     184<t>Within this document,
     185              <xref target="IRIuse"/> discusses the use
    180186of IRIs in different situations.  <xref target="guidelines"/> gives
    181187additional informative guidelines.  <xref target="security"/>
    182188discusses IRI-specific security considerations.</t>
    184 <t>
     190<t>This specification is part of a collection of specifications
     191intended to replace <xref target="RFC3987"/>.
    185192<xref target="Bidi"/> discusses the special case of
    186193bidirectional IRIs using characters from scripts written
    190197some equivalence methods.
    191198<xref target="RFC4395bis"/> updates the URI scheme registration
    192 guidelines and proceedures to note that every URI scheme is also
     199guidelines and procedures to note that every URI scheme is also
    193200automatically an IRI scheme and to allow scheme definitions
    194201to be directly described in terms of Unicode characters.
    197  <t>When originally defining IRIs, several design alternatives were considered.
    198     Historically interested readers can find an overview in Appendix A of <xref target="RFC3987"/>.
    199   For some additional background on the design of URIs and IRIs, please also see
    200     <xref target="Gettys"/>.</t>
    202204</section> <!-- overview -->
    309311    processing of that message by the protocol in question.</t>
    311 <t hangText="presentation element:">A presentation form corresponding
    312     to a protocol element; for example, using a wider range of
    313     characters.</t>
    315313<t hangText="create (a URI or IRI):">With respect to URIs and IRIs,
    316314     the term is used for the initial creation. This may be the
    380378<t>As with URIs, an IRI is defined as a sequence of characters, not as
    381 a sequence of octets. This definition accommodates the fact that IRIs
     379a sequence of octets.
     380This definition accommodates the fact that IRIs
    382381may be written on paper or read over the radio as well as stored or
    383382transmitted digitally.  The same IRI might be represented as different
    588587  <t>An IRI or IRI reference is a sequence of characters from the UCS.
    589     For resource identifiers that are not already in a Unicode form
    590     (as when written on paper, read aloud, or represented in a text stream
    591     using a legacy character encoding), convert the IRI to Unicode.
     588    For input from presentations (written on paper, read aloud)
     589    or translation from other representations (a text stream using a legacy character
     590    encoding), convert the input to Unicode.
    592591    Note that some character encodings or transcriptions can be converted
    593592    to or represented by more than one sequence of Unicode characters.
    597596    since that ensures a stable, consistent representation
    598597    that is most likely to produce the intended results.
     598    Previous versions of this specification required
     599    normalization at this step. However, attempts to
     600    require normalization in other protocols have met with
     601    strong enough resistance that requiring normalization
     602    here was considered impractical.
    599603    Implementers and users are cautioned that, while denormalized character sequences are valid,
    600604    they might be difficult for other users or processes to reproduce
    601605    and might lead to unexpected results.
     606  <!-- raise on list:
     607    It is recommended
     608    that the processing of IRI components treat
     609    strings with the same normalized forms as equivalent.
     610   -->
    602611  </t>
    604 <t> In other cases (written on paper, read aloud, or otherwise
    605  represented independent of any character encoding) represent the IRI
    606  as a sequence of characters from the UCS normalized according to
    607  Unicode Normalization Form C (NFC, <xref target="UTR15"/>).</t>
    608613</section> <!-- ucsconv -->
    12351240for IRI entry.</t>
    1237 <t>A person viewing a visual representation of an IRI (as a sequence
    1238 of glyphs, in some order, in some visual display) or hearing an IRI
     1242<t>A person viewing a visual presentation of an IRI (as a sequence
     1243of glyphs, in some order, in some visual display)
    12391244will use an entry method for characters in the user's language to
    12401245input the IRI. Depending on the script and the input method used, this
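The text added in r82 makes two points. First, Unicode allows several code point sequences that produce the same presentation. Second, the draft no longer requires normalization when converting input to Unicode, but it still warns that denormalized sequences may lead to unexpected results. The sketch below illustrates both points. It is not part of the draft. It uses Python's standard `unicodedata` module, and the helper name `same_after_nfc` is hypothetical.

```python
import unicodedata


def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "café" written two ways: precomposed U+00E9 versus "e" followed by
# the combining acute accent U+0301. Both display the same.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# As raw character sequences they differ, so a simple string comparison
# treats them as two different IRI components.
print(precomposed == decomposed)               # False
print(len(precomposed), len(decomposed))       # 4 5

# Once both are normalized to NFC they become the same sequence.
print(same_after_nfc(precomposed, decomposed))  # True

# Converting input from a legacy character encoding to Unicode:
# the byte 0xE9 in ISO-8859-1 decodes to the precomposed U+00E9.
legacy_bytes = b"caf\xe9"
print(legacy_bytes.decode("iso-8859-1") == precomposed)  # True
```

The comparison at the end works only because ISO-8859-1 decoding yields the precomposed form. If a transcription produced the decomposed form instead, only a normalizing comparison would treat the two as equal. This is the kind of mismatch the draft cautions about.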