Changeset 1975 for draft-ietf-httpbis/latest/p2-semantics.xml
Timestamp: 05/11/12 05:10:11 (10 years ago)
File: 1 edited
draft-ietf-httpbis/latest/p2-semantics.xml
--- draft-ietf-httpbis/latest/p2-semantics.xml (r1964)
+++ draft-ietf-httpbis/latest/p2-semantics.xml (r1975)
@@ -72,5 +72,5 @@
 <!ENTITY header-warning "<xref target='Part6' x:rel='#header.warning' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
 <!ENTITY header-www-authenticate "<xref target='Part7' x:rel='#header.www-authenticate' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
-<!ENTITY media-types "<xref target='media.types' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
+<!ENTITY media-type "<xref target='media.type' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
 <!ENTITY message-body "<xref target='Part1' x:rel='#message.body' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
 <!ENTITY media-type-message-http "<xref target='Part1' x:rel='#internet.media.type.message.http' xmlns:x='http://purl.org/net/xml2rfc/ext'/>">
@@ -245,4 +245,12 @@
 with the list rule expanded.
 </t>
+<t>
+This specification uses the terms
+"character",
+"character encoding scheme",
+"charset", and
+"protocol element"
+as they are defined in <xref target="RFC6365"/>.
+</t>
 </section>
 </section>
@@ -321,5 +329,5 @@
 <section title="Data Type" anchor="data.type">
 
-<section title="Media Types" anchor="media.types">
+<section title="Media Type" anchor="media.type">
 <x:anchor-alias value="media-type"/>
 <x:anchor-alias value="type"/>
@@ -358,6 +366,13 @@
 A parameter value that matches the <x:ref>token</x:ref> production can be
 transmitted as either a token or within a quoted-string. The quoted and
-unquoted values are equivalent.
-</t>
+unquoted values are equivalent. For example, the following examples are
+all equivalent, but the first is preferred for consistency:
+</t>
+<figure><artwork type="example">
+text/html;charset=utf-8
+text/html;charset=UTF-8
+Text/HTML;Charset="utf-8"
+text/html; charset="utf-8"
+</artwork></figure>
 <t>
 Media-type values are registered with the Internet Assigned Number
@@ -368,14 +383,10 @@
 </section>
 
-<section title="Character Encodings (charset)" anchor="character.sets">
-<t>
-HTTP uses charset names to indicate the character encoding of a
-textual representation.
-</t>
-<t anchor="rule.charset">
-<x:anchor-alias value="charset"/>
-A character encoding is identified by a case-insensitive token. The
-complete set of tokens is defined by the IANA Character Set registry
-(<eref target="http://www.iana.org/assignments/character-sets"/>).
+<section title="Charset" anchor="charset">
+<x:anchor-alias value="rule.charset"/>
+<t>
+HTTP uses charset names to indicate or negotiate the character encoding
+scheme of a textual representation <xref target="RFC6365"/>.
+A charset is identified by a case-insensitive token.
 </t>
 <figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="charset"/>
@@ -383,20 +394,8 @@
 </artwork></figure>
 <t>
-Although HTTP allows an arbitrary token to be used as a charset
-value, any token that has a predefined value within the IANA
-Character Set registry &MUST; represent the character encoding defined
-by that registry. Applications &SHOULD; limit their use of character
-encodings to those defined within the IANA registry.
-</t>
-<t>
-HTTP uses charset in two contexts: within an <x:ref>Accept-Charset</x:ref>
-request header field (in which the charset value is an unquoted token) and
-as the value of a parameter in a <x:ref>Content-Type</x:ref> header field
-(within a request or response), in which case the parameter value of the
-charset parameter can be quoted.
-</t>
-<t>
-Implementers need to be aware of IETF character set requirements <xref target="RFC3629"/>
-<xref target="RFC2277"/>.
+The IANA Character Set registry
+(<eref target="http://www.iana.org/assignments/character-sets"/>)
+maintains the set of tokens registered for use on the Internet as
+charset names <xref target="RFC2978"/>.
 </t>
 </section>
@@ -416,8 +415,8 @@
 applications &MUST; accept CRLF, bare CR, and bare LF as indicating
 a line break in text media received via HTTP. In
-addition, if the text is in a character encoding that does not
+addition, if the text is in a charset that does not
 use octets 13 and 10 for CR and LF respectively, as is the case for
-some multi-byte character encodings, HTTP allows the use of whatever octet
-sequences are defined by that character encoding to represent the
+some multi-byte charsets, HTTP allows the use of whatever octet
+sequences are defined by that charset to represent the
 equivalent of CR and LF for line breaks. This flexibility regarding
 line breaks applies only to text media in the payload body; a bare CR
@@ -479,5 +478,5 @@
 </artwork></figure>
 <t>
-Media types are defined in <xref target="media.types"/>. An example of the field is
+Media types are defined in <xref target="media.type"/>. An example of the field is
 </t>
 <figure><artwork type="example">
@@ -913,5 +912,5 @@
 Often, the server has different ways of representing the
 same information; for example, in different formats, languages,
-or using different character encodings.
+or using different charsets.
 </t>
 <t>
@@ -2179,10 +2178,9 @@
 <x:anchor-alias value="Accept-Charset"/>
 <t>
-The "Accept-Charset" header field can be used by user agents to
-indicate what character encodings are acceptable in a response
-payload. This field allows
-clients capable of understanding more comprehensive or special-purpose
-character encodings to signal that capability to a server which is capable of
-representing documents in those character encodings.
+The "Accept-Charset" header field can be sent by a user agent to
+indicate what charsets are acceptable in a selected representation.
+This field allows user agents capable of understanding more comprehensive
+or special-purpose charsets to signal that capability to an origin server
+which is capable of representing documents in those charsets.
 </t>
 <figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="Accept-Charset"/>
@@ -2190,8 +2188,7 @@
 </artwork></figure>
 <t>
-Character encoding values (a.k.a., charsets) are described in
-<xref target="character.sets"/>. Each charset &MAY; be given an
-associated quality value which represents the user's preference
-for that charset, as defined in &qvalue;.
+Charset names are defined in <xref target="charset"/>.
+A user agent &MAY; associate a quality value with each charset to indicate
+the user's relative preference for that charset, as defined in &qvalue;.
 An example is
 </t>
@@ -2201,19 +2198,23 @@
 <t>
 The special value "*", if present in the Accept-Charset field,
-matches every character encoding which is not mentioned elsewhere in the
+matches every charset which is not mentioned elsewhere in the
 Accept-Charset field. If no "*" is present in an Accept-Charset field,
-then any character encodings not explicitly mentioned in the field are
+then any charsets not explicitly mentioned in the field are
 considered "not acceptable" to the client.
 </t>
 <t>
 A request without any Accept-Charset header field implies that the user
-agent will accept any character encoding in response.
+agent will accept any charset in response.
+Most general-purpose user agents do not send Accept-Charset, unless
+specifically configured to do so, because a detailed list of supported
+charsets makes it easier for a server to identify an individual by virtue
+of the user agent's request characteristics (a.k.a., fingerprinting).
 </t>
 <t>
 If an Accept-Charset header field is present in a request and none of the
-available representations for the response have a character encoding that
-is listed as acceptable, the origin server &MAY; either honor the
-Accept-Charset header field by sending a <x:ref>406 (Not Acceptable)</x:ref> response or
-disregard the Accept-Charset header field by treating the response as if
+available representations for the response has a charset that is listed as
+acceptable, the origin server &MAY; either honor the Accept-Charset header
+field, by sending a <x:ref>406 (Not Acceptable)</x:ref> response, or
+disregard the Accept-Charset header field by treating the resource as if
 it is not subject to content negotiation.
 </t>
@@ -4386,5 +4387,5 @@
 parameter value ought to be independent of the syntax used for it (for an
 example, see the notes on parameter handling for media types in
-&media-types;).
+&media-type;).
 </t>
 <t>
@@ -5123,4 +5124,17 @@
 </reference>
 
+<reference anchor='RFC6365'>
+<front>
+<title>Terminology Used in Internationalization in the IETF</title>
+<author initials='P.' surname='Hoffman' fullname='P. Hoffman'>
+<organization /></author>
+<author initials='J.' surname='Klensin' fullname='J. Klensin'>
+<organization /></author>
+<date year='2011' month='September' />
+</front>
+<seriesInfo name='BCP' value='166' />
+<seriesInfo name='RFC' value='6365' />
+</reference>
+
 </references>
 
@@ -5224,17 +5238,4 @@
 </front>
 <seriesInfo name="RFC" value="2076"/>
-</reference>
-
-<reference anchor="RFC2277">
-<front>
-<title abbrev="Charset Policy">IETF Policy on Character Sets and Languages</title>
-<author initials="H.T." surname="Alvestrand" fullname="Harald Tveit Alvestrand">
-<organization>UNINETT</organization>
-<address><email>Harald.T.Alvestrand@uninett.no</email></address>
-</author>
-<date month="January" year="1998"/>
-</front>
-<seriesInfo name="BCP" value="18"/>
-<seriesInfo name="RFC" value="2277"/>
 </reference>
 
@@ -5346,15 +5347,15 @@
 </reference>
 
-<reference anchor="RFC3629">
+<reference anchor='RFC2978'>
 <front>
-<title>UTF-8, a transformation format of ISO 10646</title>
-<author initials="F." surname="Yergeau" fullname="F. Yergeau">
-<organization>Alis Technologies</organization>
-<address><email>fyergeau@alis.com</email></address>
-</author>
-<date month="November" year="2003"/>
+<title>IANA Charset Registration Procedures</title>
+<author initials='N.' surname='Freed' fullname='N. Freed'>
+<organization /></author>
+<author initials='J.' surname='Postel' fullname='J. Postel'>
+<organization /></author>
+<date year='2000' month='October' />
 </front>
-<seriesInfo name="STD" value="63"/>
-<seriesInfo name="RFC" value="3629"/>
+<seriesInfo name='BCP' value='19' />
+<seriesInfo name='RFC' value='2978' />
 </reference>
 
@@ -5562,7 +5563,6 @@
 of this document to the RFC 2049 canonical form of CRLF. Note, however, that
 this might be complicated by the presence of a <x:ref>Content-Encoding</x:ref>
-and by the fact that HTTP allows the use of some character encodings which do
-not use octets 13 and 10 to represent CR and LF, respectively, as is the case
-for some multi-byte character encodings.
+and by the fact that HTTP allows the use of some charsets
+which do not use octets 13 and 10 to represent CR and LF, respectively.
 </t>
 <t>
@@ -5748,9 +5748,9 @@
 <t>
 Clarify contexts that charset is used in.
-(<xref target="character.sets"/>)
-</t>
-<t>
-Remove the default character encoding of "ISO-8859-1" for text media types; the
-default now is whatever the media type definition says.
+(<xref target="charset"/>)
+</t>
+<t>
+Remove the default charset of "ISO-8859-1" for text media
+types; the default now is whatever the media type definition says.
 (<xref target="canonicalization.and.text.defaults"/>)
 </t>
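
Illustration (not part of the changeset): the revised text states two comparison rules, that quoted and unquoted media-type parameter values are equivalent, and that Accept-Charset matching uses case-insensitive charset names, optional quality values, and the "*" wildcard. The Python sketch below is one possible reading of those rules. The function names normalize_media_type and choose_charset are hypothetical, the parsing is deliberately simplified (no ";" or backslash escapes inside quoted strings), and treating q=0 as "not acceptable" is an assumption taken from general qvalue semantics rather than from this diff.

```python
def normalize_media_type(value):
    """Reduce a media-type string to a comparable (type, params) tuple.

    Simplifying assumptions: quoted strings contain no ';' and no
    backslash escapes; only the charset value is lowercased.
    """
    parts = [p.strip() for p in value.split(";")]
    media = parts[0].lower()            # type/subtype are case-insensitive
    params = []
    for part in parts[1:]:
        if not part:
            continue
        name, _, val = part.partition("=")
        name = name.strip().lower()     # parameter names are case-insensitive
        val = val.strip()
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]             # quoted and unquoted forms are equivalent
        if name == "charset":
            val = val.lower()           # charset names are case-insensitive tokens
        params.append((name, val))
    return media, tuple(sorted(params))


def choose_charset(accept_charset, available):
    """Pick one of `available` for a given Accept-Charset field value.

    Returns None when nothing is acceptable; the origin server may then
    send 406 or disregard the field. Ties go to the earlier entry in
    `available` (an arbitrary choice made for this sketch).
    """
    if accept_charset is None:          # no field: any charset is acceptable
        return available[0] if available else None
    prefs = {}
    for item in accept_charset.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0                         # default quality when no q parameter
        key, _, val = params.strip().partition("=")
        if key.strip().lower() == "q":
            try:
                q = float(val.strip())
            except ValueError:
                q = 0.0                 # assumption: unparsable q means unacceptable
        prefs[name.strip().lower()] = q
    best, best_q = None, 0.0
    for cs in available:
        # "*" covers every charset not mentioned elsewhere in the field.
        q = prefs.get(cs.lower(), prefs.get("*", 0.0))
        if q > best_q:                  # q == 0 never wins
            best, best_q = cs, q
    return best
```

With this sketch, the four example strings added at new lines 372-375 all normalize to the same tuple, and a field that lists only charsets the server cannot supply yields None unless "*" is also present.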