Changeset 1177 for draft-ietf-httpbis/latest/p3-payload.html
- Timestamp: 14/03/11 03:25:50 (11 years ago)
- File: 1 edited
draft-ietf-httpbis/latest/p3-payload.html
r1174 r1177
546 546 </li>
547 547 <li>2. <a href="#protocol.parameters">Protocol Parameters</a><ul>
548 <li>2.1 <a href="#character.sets">Character Sets</a><ul>
548 <li>2.1 <a href="#character.sets">Character Encodings (charset)</a><ul>
549 549 <li>2.1.1 <a href="#missing.charset">Missing Charset</a></li>
550 550 </ul>
… …
685 685 <a href="#abnf.dependencies" class="smpl">qvalue</a> = <qvalue, defined in <a href="#Part1" id="rfc.xref.Part1.8"><cite title="HTTP/1.1, part 1: URIs, Connections, and Message Parsing">[Part1]</cite></a>, <a href="p1-messaging.html#quality.values" title="Quality Values">Section 6.4</a>>
686 686 </pre><h1 id="rfc.section.2"><a href="#rfc.section.2">2.</a> <a id="protocol.parameters" href="#protocol.parameters">Protocol Parameters</a></h1>
687 <h2 id="rfc.section.2.1"><a href="#rfc.section.2.1">2.1</a> <a id="character.sets" href="#character.sets">Character Sets</a></h2>
688 <p id="rfc.section.2.1.p.1">HTTP uses the same definition of the term "character set" as that described for MIME:</p>
689 <p id="rfc.section.2.1.p.2">The term "character set" is used in this document to refer to a method used with one or more tables to convert a sequence
690 of octets into a sequence of characters. Note that unconditional conversion in the other direction is not required, in that
691 not all characters might be available in a given character set and a character set might provide more than one sequence of
692 octets to represent a particular character. This definition is intended to allow various kinds of character encoding, from
693 simple single-table mappings such as US-ASCII to complex table switching methods such as those that use ISO-2022's techniques.
694 However, the definition associated with a MIME character set name <em class="bcp14">MUST</em> fully specify the mapping to be performed from octets to characters. In particular, use of external profiling information
695 to determine the exact mapping is not permitted.
696 </p>
697 <div class="note" id="rfc.section.2.1.p.3">
698 <p> <b>Note:</b> This use of the term "character set" is more commonly referred to as a "character encoding". However, since HTTP and MIME
699 share the same registry, it is important that the terminology also be shared.
700 </p>
701 </div>
687 <h2 id="rfc.section.2.1"><a href="#rfc.section.2.1">2.1</a> <a id="character.sets" href="#character.sets">Character Encodings (charset)</a></h2>
688 <p id="rfc.section.2.1.p.1">HTTP uses charset names to indicate the character encoding of a textual representation.</p>
702 689 <div id="rule.charset">
703 <p id="rfc.section.2.1.p.4"> HTTP character sets are identified by case-insensitive tokens. The complete set of tokens is defined by the IANA Character
690 <p id="rfc.section.2.1.p.2"> A character encoding is identified by a case-insensitive token. The complete set of tokens is defined by the IANA Character
704 691 Set registry (<<a href="http://www.iana.org/assignments/character-sets">http://www.iana.org/assignments/character-sets</a>>).
705 692 </p>
706 693 </div>
707 694 <div id="rfc.figure.u.3"></div><pre class="inline"><span id="rfc.iref.g.1"></span> <a href="#rule.charset" class="smpl">charset</a> = <a href="#core.rules" class="smpl">token</a>
708 </pre><p id="rfc.section.2.1.p.6">Although HTTP allows an arbitrary token to be used as a charset value, any token that has a predefined value within the IANA
709 Character Set registry <em class="bcp14">MUST</em> represent the character set defined by that registry. Applications <em class="bcp14">SHOULD</em> limit their use of character sets to those defined by the IANA registry.
710 </p>
711 <p id="rfc.section.2.1.p.7">HTTP uses charset in two contexts: within an Accept-Charset request header field (in which the charset value is an unquoted
695 </pre><p id="rfc.section.2.1.p.4">Although HTTP allows an arbitrary token to be used as a charset value, any token that has a predefined value within the IANA
696 Character Set registry <em class="bcp14">MUST</em> represent the character encoding defined by that registry. Applications <em class="bcp14">SHOULD</em> limit their use of character encodings to those defined within the IANA registry.
697 </p>
698 <p id="rfc.section.2.1.p.5">HTTP uses charset in two contexts: within an Accept-Charset request header field (in which the charset value is an unquoted
712 699 token) and as the value of a parameter in a Content-Type header field (within a request or response), in which case the parameter
713 700 value of the charset parameter can be quoted.
714 701 </p>
715 <p id="rfc.section.2.1.p.8">Implementors need to be aware of IETF character set requirements <a href="#RFC3629" id="rfc.xref.RFC3629.1"><cite title="UTF-8, a transformation format of ISO 10646">[RFC3629]</cite></a> <a href="#RFC2277" id="rfc.xref.RFC2277.1"><cite title="IETF Policy on Character Sets and Languages">[RFC2277]</cite></a>.
702 <p id="rfc.section.2.1.p.6">Implementors need to be aware of IETF character set requirements <a href="#RFC3629" id="rfc.xref.RFC3629.1"><cite title="UTF-8, a transformation format of ISO 10646">[RFC3629]</cite></a> <a href="#RFC2277" id="rfc.xref.RFC2277.1"><cite title="IETF Policy on Character Sets and Languages">[RFC2277]</cite></a>.
716 703 </p>
717 704 <h3 id="rfc.section.2.1.1"><a href="#rfc.section.2.1.1">2.1.1</a> <a id="missing.charset" href="#missing.charset">Missing Charset</a></h3>
… …
810 797 <p id="rfc.section.2.3.1.p.3">If a representation is encoded with a content-coding, the underlying data <em class="bcp14">MUST</em> be in a form defined above prior to being encoded.
811 798 </p>
812 <p id="rfc.section.2.3.1.p.4">The "charset" parameter is used with some media types to define the character encoding (<a href="#character.sets" title="Character Sets">Section 2.1</a>) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined
799 <p id="rfc.section.2.3.1.p.4">The "charset" parameter is used with some media types to define the character encoding (<a href="#character.sets" title="Character Encodings (charset)">Section 2.1</a>) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined
813 800 to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character encodings other than "ISO-8859-1"
814 801 or its subsets <em class="bcp14">MUST</em> be labeled with an appropriate charset value. See <a href="#missing.charset" title="Missing Charset">Section 2.1.1</a> for compatibility problems.
… …
1151 1138 <div id="rfc.iref.h.2"></div>
1152 1139 <h2 id="rfc.section.6.2"><a href="#rfc.section.6.2">6.2</a> <a id="header.accept-charset" href="#header.accept-charset">Accept-Charset</a></h2>
1153 <p id="rfc.section.6.2.p.1">The "Accept-Charset" header field can be used by user agents to indicate what response character sets are acceptable. This
1154 field allows clients capable of understanding more comprehensive or special-purpose character sets to signal that capability
1155 to a server which is capable of representing documents in those character sets.
1140 <p id="rfc.section.6.2.p.1">The "Accept-Charset" header field can be used by user agents to indicate what character encodings are acceptable in a response
1141 payload. This field allows clients capable of understanding more comprehensive or special-purpose character encodings to signal
1142 that capability to a server which is capable of representing documents in those character encodings.
1156 1143 </p>
1157 1144 <div id="rfc.figure.u.15"></div><pre class="inline"><span id="rfc.iref.g.16"></span><span id="rfc.iref.g.17"></span> <a href="#header.accept-charset" class="smpl">Accept-Charset</a> = "Accept-Charset" ":" <a href="#core.rules" class="smpl">OWS</a>
… …
1159 1146 <a href="#header.accept-charset" class="smpl">Accept-Charset-v</a> = 1#( ( <a href="#rule.charset" class="smpl">charset</a> / "*" )
1160 1147 [ <a href="#core.rules" class="smpl">OWS</a> ";" <a href="#core.rules" class="smpl">OWS</a> "q=" <a href="#abnf.dependencies" class="smpl">qvalue</a> ] )
1161 </pre><p id="rfc.section.6.2.p.3">Character set values are described in <a href="#character.sets" title="Character Sets">Section 2.1</a>. Each charset <em class="bcp14">MAY</em> be given an associated quality value which represents the user's preference for that charset. The default value is q=1. An
1148 </pre><p id="rfc.section.6.2.p.3">Character encoding values (a.k.a., charsets) are described in <a href="#character.sets" title="Character Encodings (charset)">Section 2.1</a>. Each charset <em class="bcp14">MAY</em> be given an associated quality value which represents the user's preference for that charset. The default value is q=1. An
1162 1149 example is
1163 1150 </p>
1164 1151 <div id="rfc.figure.u.16"></div><pre class="text"> Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
1165 </pre><p id="rfc.section.6.2.p.5">The special value "*", if present in the Accept-Charset field, matches every character set (including ISO-8859-1) which is
1166 not mentioned elsewhere in the Accept-Charset field. If no "*" is present in an Accept-Charset field, then all character sets
1167 not explicitly mentioned get a quality value of 0, except for ISO-8859-1, which gets a quality value of 1 if not explicitly
1168 mentioned.
1169 </p>
1170 <p id="rfc.section.6.2.p.6">If no Accept-Charset header field is present, the default is that any character set is acceptable. If an Accept-Charset header
1171 field is present, and if the server cannot send a response which is acceptable according to the Accept-Charset header field,
1172 then the server <em class="bcp14">SHOULD</em> send an error response with the 406 (Not Acceptable) status code, though the sending of an unacceptable response is also allowed.
1152 </pre><p id="rfc.section.6.2.p.5">The special value "*", if present in the Accept-Charset field, matches every character encoding (including ISO-8859-1) which
1153 is not mentioned elsewhere in the Accept-Charset field. If no "*" is present in an Accept-Charset field, then all character
1154 encodings not explicitly mentioned get a quality value of 0, except for ISO-8859-1, which gets a quality value of 1 if not
1155 explicitly mentioned.
1156 </p>
1157 <p id="rfc.section.6.2.p.6">If no Accept-Charset header field is present, the default is that any character encoding is acceptable. If an Accept-Charset
1158 header field is present, and if the server cannot send a response which is acceptable according to the Accept-Charset header
1159 field, then the server <em class="bcp14">SHOULD</em> send an error response with the 406 (Not Acceptable) status code, though the sending of an unacceptable response is also allowed.
1173 1160 </p>
1174 1161 <div id="rfc.iref.a.3"></div>
… …
1773 1760 </p>
1774 1761 <p id="rfc.section.A.2.p.2">Where it is possible, a proxy or gateway from HTTP to a strict MIME environment <em class="bcp14">SHOULD</em> translate all line breaks within the text media types described in <a href="#canonicalization.and.text.defaults" title="Canonicalization and Text Defaults">Section 2.3.1</a> of this document to the RFC 2049 canonical form of CRLF. Note, however, that this might be complicated by the presence of
1775 a Content-Encoding and by the fact that HTTP allows the use of some character sets which do not use octets 13 and 10 to represent
1776 CR and LF, as is the case for some multi-byte character sets.
1762 a Content-Encoding and by the fact that HTTP allows the use of some character encodings which do not use octets 13 and 10
1763 to represent CR and LF, respectively, as is the case for some multi-byte character encodings.
1777 1764 </p>
1778 1765 <p id="rfc.section.A.2.p.3">Conversion will break any cryptographic checksums applied to the original content unless the original content is already in
… …
1813 1800 </p>
1814 1801 <h1 id="rfc.section.C"><a href="#rfc.section.C">C.</a> <a id="changes.from.rfc.2616" href="#changes.from.rfc.2616">Changes from RFC 2616</a></h1>
1815 <p id="rfc.section.C.p.1">Clarify contexts that charset is used in. (<a href="#character.sets" title="Character Sets">Section 2.1</a>)
1802 <p id="rfc.section.C.p.1">Clarify contexts that charset is used in. (<a href="#character.sets" title="Character Encodings (charset)">Section 2.1</a>)
1816 1803 </p>
1817 1804 <p id="rfc.section.C.p.2">Remove base URI setting semantics for Content-Location due to poor implementation support, which was caused by too many broken
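For readers checking the Accept-Charset wording in r1177 (Section 6.2), the matching rules can be sketched in Python roughly as follows. This is an illustrative sketch only and is not part of the changeset: the function names `parse_accept_charset` and `charset_quality` are hypothetical, the parser assumes a well-formed field value (a malformed qvalue raises `ValueError`), and skipping empty list elements is a defensive assumption rather than something this section states.

```python
from typing import Dict, Optional


def parse_accept_charset(field_value: str) -> Dict[str, float]:
    """Parse an Accept-Charset field value into {lowercased token: qvalue}."""
    prefs: Dict[str, float] = {}
    for element in field_value.split(","):
        element = element.strip()
        if not element:
            continue  # assumption: silently skip empty list elements
        name, _, param = element.partition(";")
        quality = 1.0  # "The default value is q=1"
        param = param.strip()
        if param.lower().startswith("q="):
            quality = float(param[2:])
        # charset tokens are case-insensitive, so normalise to lower case
        prefs[name.strip().lower()] = quality
    return prefs


def charset_quality(field_value: Optional[str], charset: str) -> float:
    """Return the quality value a request assigns to one character encoding."""
    if field_value is None:
        return 1.0  # no Accept-Charset header: any encoding is acceptable
    prefs = parse_accept_charset(field_value)
    token = charset.lower()
    if token in prefs:
        return prefs[token]  # explicitly mentioned
    if "*" in prefs:
        return prefs["*"]  # "*" matches everything not mentioned elsewhere
    # No "*": unmentioned encodings get 0, except ISO-8859-1, which gets 1
    return 1.0 if token == "iso-8859-1" else 0.0


if __name__ == "__main__":
    header = "iso-8859-5, unicode-1-1;q=0.8"  # example from Section 6.2
    for name in ("iso-8859-5", "unicode-1-1", "utf-8", "ISO-8859-1"):
        print(name, charset_quality(header, name))
```

A server that finds no available encoding with a nonzero quality would, per the text above, normally answer with 406 (Not Acceptable), although sending an unacceptable response is also allowed.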