05/11/12 05:10:11

Update charset terminology to the latest BCPs and remove duplicate or antiquated references to IETF requirements

1 edited


  • draft-ietf-httpbis/latest/p2-semantics.xml

    r1964 r1975  
    7272  <!ENTITY header-warning             "<xref target='Part6' x:rel='#header.warning' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
    7373  <!ENTITY header-www-authenticate    "<xref target='Part7' x:rel='#header.www-authenticate' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
    74   <!ENTITY media-types                "<xref target='media.types' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
     74  <!ENTITY media-type                "<xref target='media.type' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
    7575  <!ENTITY message-body               "<xref target='Part1' x:rel='#message.body' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
    7676  <!ENTITY media-type-message-http    "<xref target='Part1' x:rel='#internet.media.type.message.http' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
    245245   with the list rule expanded.
     248   This specification uses the terms
     249   "character",
     250   "character encoding scheme",
     251   "charset", and
     252   "protocol element"
     253   as they are defined in <xref target="RFC6365"/>.
    321329<section title="Data Type" anchor="data.type">
    323 <section title="Media Types" anchor="media.types">
     331<section title="Media Type" anchor="media.type">
    324332  <x:anchor-alias value="media-type"/>
    325333  <x:anchor-alias value="type"/>
    358366   A parameter value that matches the <x:ref>token</x:ref> production can be
    359367   transmitted as either a token or within a quoted-string. The quoted and
    360    unquoted values are equivalent.
    361 </t>
     368   unquoted values are equivalent. For example, the following examples are
     369   all equivalent, but the first is preferred for consistency:
     371<figure><artwork type="example">
     372  text/html;charset=utf-8
     373  text/html;charset=UTF-8
     374  Text/HTML;Charset="utf-8"
     375  text/html; charset="utf-8"
    363378   Media-type values are registered with the Internet Assigned Number
    370 <section title="Character Encodings (charset)" anchor="character.sets">
    371 <t>
    372    HTTP uses charset names to indicate the character encoding of a
    373    textual representation.
    374 </t>
    375 <t anchor="rule.charset">
    376   <x:anchor-alias value="charset"/>
    377    A character encoding is identified by a case-insensitive token. The
    378    complete set of tokens is defined by the IANA Character Set registry
    379    (<eref target="http://www.iana.org/assignments/character-sets"/>).
     385<section title="Charset" anchor="charset">
     386  <x:anchor-alias value="rule.charset"/>
     388   HTTP uses charset names to indicate or negotiate the character encoding
     389   scheme of a textual representation <xref target="RFC6365"/>.
     390   A charset is identified by a case-insensitive token.
    381392<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="charset"/>
    385    Although HTTP allows an arbitrary token to be used as a charset
    386    value, any token that has a predefined value within the IANA
    387    Character Set registry &MUST; represent the character encoding defined
    388    by that registry. Applications &SHOULD; limit their use of character
    389    encodings to those defined within the IANA registry.
    390 </t>
    391 <t>
    392    HTTP uses charset in two contexts: within an <x:ref>Accept-Charset</x:ref>
    393    request header field (in which the charset value is an unquoted token) and
    394    as the value of a parameter in a <x:ref>Content-Type</x:ref> header field
    395    (within a request or response), in which case the parameter value of the
    396    charset parameter can be quoted.
    397 </t>
    398 <t>
    399    Implementers need to be aware of IETF character set requirements <xref target="RFC3629"/>
    400    <xref target="RFC2277"/>.
     396   The IANA Character Set registry
     397   (<eref target="http://www.iana.org/assignments/character-sets"/>)
     398   maintains the set of tokens registered for use on the Internet as
     399   charset names  <xref target="RFC2978"/>.
    416415   applications &MUST; accept CRLF, bare CR, and bare LF as indicating
    417416   a line break in text media received via HTTP. In
    418    addition, if the text is in a character encoding that does not
     417   addition, if the text is in a charset that does not
    419418   use octets 13 and 10 for CR and LF respectively, as is the case for
    420    some multi-byte character encodings, HTTP allows the use of whatever octet
    421    sequences are defined by that character encoding to represent the
     419   some multi-byte charsets, HTTP allows the use of whatever octet
     420   sequences are defined by that charset to represent the
    422421   equivalent of CR and LF for line breaks. This flexibility regarding
    423422   line breaks applies only to text media in the payload body; a bare CR
    481    Media types are defined in <xref target="media.types"/>. An example of the field is
     480   Media types are defined in <xref target="media.type"/>. An example of the field is
    483482<figure><artwork type="example">
    913912   Often, the server has different ways of representing the
    914913   same information; for example, in different formats, languages,
    915    or using different character encodings.
     914   or using different charsets.
    21792178  <x:anchor-alias value="Accept-Charset"/>
    2181    The "Accept-Charset" header field can be used by user agents to
    2182    indicate what character encodings are acceptable in a response
    2183    payload. This field allows
    2184    clients capable of understanding more comprehensive or special-purpose
    2185    character encodings to signal that capability to a server which is capable of
    2186    representing documents in those character encodings.
     2180   The "Accept-Charset" header field can be sent by a user agent to
     2181   indicate what charsets are acceptable in a selected representation.
     2182   This field allows user agents capable of understanding more comprehensive
     2183   or special-purpose charsets to signal that capability to an origin server
     2184   which is capable of representing documents in those charsets.
    21882186<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="Accept-Charset"/>
    2192    Character encoding values (a.k.a., charsets) are described in
    2193    <xref target="character.sets"/>. Each charset &MAY; be given an
    2194    associated quality value which represents the user's preference
    2195    for that charset, as defined in &qvalue;.
     2190   Charset names are defined in <xref target="charset"/>.
     2191   A user agent &MAY; associate a quality value with each charset to indicate
     2192   the user's relative preference for that charset, as defined in &qvalue;.
    21962193   An example is
    22022199   The special value "*", if present in the Accept-Charset field,
    2203    matches every character encoding which is not mentioned elsewhere in the
     2200   matches every charset which is not mentioned elsewhere in the
    22042201   Accept-Charset field. If no "*" is present in an Accept-Charset field,
    2205    then any character encodings not explicitly mentioned in the field are
     2202   then any charsets not explicitly mentioned in the field are
    22062203   considered "not acceptable" to the client.
    22092206   A request without any Accept-Charset header field implies that the user
    2210    agent will accept any character encoding in response.
     2207   agent will accept any charset in response.
     2208   Most general-purpose user agents do not send Accept-Charset, unless
     2209   specifically configured to do so, because a detailed list of supported
     2210   charsets makes it easier for a server to identify an individual by virtue
     2211   of the user agent's request characteristics (a.k.a., fingerprinting).
    22132214   If an Accept-Charset header field is present in a request and none of the
    2214    available representations for the response have a character encoding that
    2215    is listed as acceptable, the origin server &MAY; either honor the
    2216    Accept-Charset header field by sending a <x:ref>406 (Not Acceptable)</x:ref> response or
    2217    disregard the Accept-Charset header field by treating the response as if
     2215   available representations for the response has a charset that is listed as
     2216   acceptable, the origin server &MAY; either honor the Accept-Charset header
     2217   field, by sending a <x:ref>406 (Not Acceptable)</x:ref> response, or
     2218   disregard the Accept-Charset header field by treating the resource as if
    22182219   it is not subject to content negotiation.
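Reviewer sketch (not part of the changeset): the revised Accept-Charset text above can be illustrated with a small selection routine. The function names `parse_accept_charset` and `pick_charset` are hypothetical, and the parser only handles the simple `name;q=value` form. Treating a missing qvalue as 1 and a qvalue of 0 as "not acceptable" is an assumption based on the usual qvalue convention, not something stated in this hunk.

```python
def parse_accept_charset(field_value):
    """Parse an Accept-Charset value into {lowercased charset: qvalue}."""
    prefs = {}
    for item in field_value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty list elements
        name, _, params = item.partition(";")
        q = 1.0  # assumption: a missing qvalue means 1
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: malformed qvalue means not acceptable
        prefs[name.strip().lower()] = q  # charset tokens are case-insensitive
    return prefs


def pick_charset(field_value, available):
    """Return the best acceptable charset from `available`, or None.

    `field_value` is None when the request carries no Accept-Charset field.
    """
    if not available:
        return None
    if field_value is None:
        return available[0]  # no field: any charset is acceptable
    prefs = parse_accept_charset(field_value)
    best, best_q = None, 0.0
    for charset in available:
        # "*" matches every charset not mentioned elsewhere in the field;
        # without "*", unmentioned charsets are "not acceptable".
        q = prefs.get(charset.lower(), prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = charset, q
    return best
```

When `pick_charset` returns None, the text above leaves the origin server two options: send a 406 (Not Acceptable) response, or disregard the field and treat the resource as if it is not subject to content negotiation.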
    43864387   parameter value ought to be independent of the syntax used for it (for an
    43874388   example, see the notes on parameter handling for media types in
    4388    &media-types;).
     4389   &media-type;).
     5126<reference anchor='RFC6365'>
     5127  <front>
     5128    <title>Terminology Used in Internationalization in the IETF</title>
     5129    <author initials='P.' surname='Hoffman' fullname='P. Hoffman'>
     5130      <organization /></author>
     5131    <author initials='J.' surname='Klensin' fullname='J. Klensin'>
     5132      <organization /></author>
     5133    <date year='2011' month='September' />
     5134  </front>
     5135  <seriesInfo name='BCP' value='166' />
     5136  <seriesInfo name='RFC' value='6365' />
    52245238  </front>
    52255239  <seriesInfo name="RFC" value="2076"/>
    5226 </reference>
    5228 <reference anchor="RFC2277">
    5229   <front>
    5230     <title abbrev="Charset Policy">IETF Policy on Character Sets and Languages</title>
    5231     <author initials="H.T." surname="Alvestrand" fullname="Harald Tveit Alvestrand">
    5232       <organization>UNINETT</organization>
    5233       <address><email>Harald.T.Alvestrand@uninett.no</email></address>
    5234     </author>
    5235     <date month="January" year="1998"/>
    5236   </front>
    5237   <seriesInfo name="BCP" value="18"/>
    5238   <seriesInfo name="RFC" value="2277"/>
    5348 <reference anchor="RFC3629">
     5349<reference anchor='RFC2978'>
    53495350  <front>
    5350     <title>UTF-8, a transformation format of ISO 10646</title>
    5351     <author initials="F." surname="Yergeau" fullname="F. Yergeau">
    5352       <organization>Alis Technologies</organization>
    5353       <address><email>fyergeau@alis.com</email></address>
    5354     </author>
    5355     <date month="November" year="2003"/>
     5351    <title>IANA Charset Registration Procedures</title>
     5352    <author initials='N.' surname='Freed' fullname='N. Freed'>
     5353      <organization /></author>
     5354    <author initials='J.' surname='Postel' fullname='J. Postel'>
     5355      <organization /></author>
     5356    <date year='2000' month='October' />
    53565357  </front>
    5357   <seriesInfo name="STD" value="63"/>
    5358   <seriesInfo name="RFC" value="3629"/>
     5358   <seriesInfo name='BCP' value='19' />
     5359   <seriesInfo name='RFC' value='2978' />
    55625563   of this document to the RFC 2049 canonical form of CRLF. Note, however, that
    55635564   this might be complicated by the presence of a <x:ref>Content-Encoding</x:ref>
    5564    and by the fact that HTTP allows the use of some character encodings which do
    5565    not use octets 13 and 10 to represent CR and LF, respectively, as is the case
    5566    for some multi-byte character encodings.
     5565   and by the fact that HTTP allows the use of some charsets
     5566   which do not use octets 13 and 10 to represent CR and LF, respectively.
    57495749  Clarify contexts that charset is used in.
    5750   (<xref target="character.sets"/>)
    5751 </t>
    5752 <t>
    5753   Remove the default character encoding of "ISO-8859-1" for text media types; the
    5754   default now is whatever the media type definition says.
     5750  (<xref target="charset"/>)
     5753  Remove the default charset of "ISO-8859-1" for text media
     5754  types; the default now is whatever the media type definition says.
    57555755  (<xref target="canonicalization.and.text.defaults"/>)
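Reviewer sketch (not part of the changeset): the four media-type examples added at new lines 372-375 are described as equivalent. The following Python sketch shows one way a recipient might normalize such values before comparing them. `normalize_media_type` is a hypothetical helper. It assumes the type, subtype, and parameter names are case-insensitive, and it splits naively on ";", so it does not handle semicolons or backslash escapes inside a quoted-string.

```python
def normalize_media_type(value):
    """Return (type/subtype, {parameter name: value}) in a comparable form."""
    essence, *raw_params = value.split(";")
    params = {}
    for raw in raw_params:
        raw = raw.strip()
        if not raw:
            continue  # ignore an empty parameter such as a trailing ";"
        name, _, val = raw.partition("=")
        name = name.strip().lower()
        val = val.strip()
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]  # quoted and unquoted values are equivalent
        if name == "charset":
            val = val.lower()  # a charset is a case-insensitive token
        params[name] = val
    return essence.strip().lower(), params


examples = [
    "text/html;charset=utf-8",
    "text/html;charset=UTF-8",
    'Text/HTML;Charset="utf-8"',
    'text/html; charset="utf-8"',
]
normalized = [normalize_media_type(e) for e in examples]
```

Only the charset value is lowercased here; other parameter values are left as received, because their case sensitivity depends on the parameter definition.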