Changeset 32

Mar 9, 2011, 2:49:49 AM

Added a sentence referring to Appendix A in RFC 3987 to the end of Overview subsection of Introduction section.
Removed Appendix A.
This closes issue #53.

1 edited


  • draft-ietf-iri-3987bis/draft-ietf-iri-3987bis.xml

    r31 r32  
    182 182 additional informative guidelines.  <xref target="security"/>
    183 183 discusses IRI-specific security considerations.</t>
        185   <t>When defining IRIs originally, several design alternatives were considered.
        186     Historically interested readers can find an overview in Appendix A of <xref target="RFC3987"/>.</t>
    184 187 </section> <!-- overview -->
    2736 <section title="Design Alternatives">
    2737 <t>This section briefly summarizes some design alternatives
    2738 considered earlier and the reasons why they were not chosen.</t>
    2739 <section title="New Scheme(s)">
    2740 <t>Introducing new schemes (for example, httpi:, ftpi:,...) or a
    2741 new metascheme (e.g., i:, leading to URI/IRI prefixes such as
    2742 i:http:, i:ftp:,...) was proposed to make IRI-to-URI conversion
    2743 scheme dependent or to distinguish between percent-encodings
    2744 resulting from IRI-to-URI conversion and percent-encodings from
    2745 legacy character encodings.</t>
    2747 <t>New schemes are not needed to distinguish URIs from true IRIs (i.e.,
    2748   IRIs that contain non-ASCII characters). The benefit of being able
    2749   to detect the origin of percent-encodings is marginal, as UTF-8
    2750   can be detected with very high reliability. Deploying new schemes is
    2751   extremely hard, so not requiring new schemes for IRIs makes
    2752   deployment of IRIs vastly easier. Making conversion scheme dependent
    2753   is highly inadvisable and would be encouraged by separate schemes for IRIs.
    2754   Using a uniform convention for conversion from IRIs to URIs makes
    2755   IRI implementation orthogonal to the introduction of actual new
    2756   schemes.</t>
    2757 </section>
    2758 <section title="Character Encodings Other Than UTF-8">
    2759 <t>At an early stage, UTF-7 was considered as an alternative to
    2760 UTF-8 when IRIs are converted to URIs. UTF-7 would not have needed
    2761 percent-encoding and in most cases would have been shorter than
    2762 percent-encoded UTF-8.</t>
    2763 <t>Using UTF-8 avoids a double layering and overloading of the use of
    2764    the "+" character. UTF-8 is fully compatible with US-ASCII and has
    2765    therefore been recommended by the IETF, and is being used widely.</t>
    2767   <t>UTF-7 has never been used much and is now clearly being
    2768    discouraged. Requiring implementations to convert from UTF-8
    2769    to UTF-7 and back would be an additional implementation burden.</t>
    2770 </section> <!-- notutf8 -->
    2771 <section title="New Encoding Convention">
    2772 <t>Instead of using the existing percent-encoding convention
    2773 of URIs, which is based on octets, the idea was to create a new
    2774 encoding convention; for example, to use "%u" to introduce
    2775 UCS code points.</t>
    2776 <t>Using the existing octet-based percent-encoding mechanism
    2777 does not need an upgrade of the URI syntax and does not
    2778 need corresponding server upgrades.</t>
    2779 </section> <!-- new encoding -->
    2780 <section title="Indicating Character Encodings in the URI/IRI">
    2781 <t>Some proposals suggested indicating the character encodings used
    2782 in a URI or IRI with some new syntactic convention in the URI itself,
    2783 similar to the "charset" parameter for e-mails and Web pages.
    2784 As an example, the label in square brackets in
    2785 "[iso-8859-1]&amp;#xE9;" indicated that
    2786 the following "&amp;#xE9;" had to be interpreted as iso-8859-1.</t>
    2787 <t>If UTF-8 is used exclusively, an upgrade to the URI syntax is not needed.
    2788 It avoids potentially multiple labels that have to be copied correctly
    2789 in all cases, even on the side of a bus or on a napkin, leading to
    2790 usability problems (and being prohibitively annoying).
    2791 Exclusively using UTF-8 also reduces transcoding errors and confusion.</t>
    2792 </section> <!-- indicating -->
    2793 </section>
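
The removed appendix argues that percent-encoding the UTF-8 octets of non-ASCII characters needs no new URI syntax, and that UTF-8 in percent-encoded octets can be detected with very high reliability. The sketch below illustrates both points in Python. It is a simplified illustration, not the conversion procedure of RFC 3987 (it skips normalization, IDN handling, and reserved-character rules), and the function names `percent_encode_non_ascii` and `percent_octets_are_utf8` are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def percent_encode_non_ascii(text: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 octets.

    Simplified assumption: ASCII characters pass through untouched.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # One "%XX" triplet per UTF-8 octet of the character.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def percent_octets_are_utf8(uri: str) -> bool:
    """Return True if the octets obtained by percent-decoding are valid UTF-8.

    Pure-ASCII input always passes, so the check is only informative
    when the string contains percent-encoded octets >= 0x80.
    """
    raw = unquote_to_bytes(uri)
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    encoded = percent_encode_non_ascii("r\u00e9sum\u00e9")
    print(encoded)
    print(percent_octets_are_utf8(encoded))       # octets came from UTF-8
    print(percent_octets_are_utf8("r%E9sum%E9"))  # octets came from ISO-8859-1
```

The legacy ISO-8859-1 octet `E9` is a UTF-8 lead byte that must be followed by continuation bytes, so a sequence such as `%E9s` fails strict UTF-8 decoding. This is the kind of structural redundancy that makes the detection described in the appendix reliable in practice.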