Changeset 1176


Timestamp:
Mar 13, 2011, 8:25:25 PM
Author:
fielding@…
Message:

Use the term octet instead of character where we are talking
about parsing, since HTTP must be parsed as octets. Make sure
there is no ambiguity about how many spaces are allowed on a
request-line or response-line.

Use the term character encoding instead of character set
because it is annoying to use the term incorrectly regardless
of how MIME does it.

Location:
draft-ietf-httpbis/latest
Files:
2 edited

  • draft-ietf-httpbis/latest/p1-messaging.xml

    r1175 r1176  
    427427</t>
    428428<t>
    429    The OWS rule is used where zero or more linear whitespace characters might
    430    appear. OWS &SHOULD; either not be produced or be produced as a single SP
    431    character. Multiple OWS characters that occur within field-content &SHOULD;
     429   The OWS rule is used where zero or more linear whitespace octets might
     430   appear. OWS &SHOULD; either not be produced or be produced as a single
     431   SP. Multiple OWS octets that occur within field-content &SHOULD;
    432432   be replaced with a single SP before interpreting the field value or
    433433   forwarding the message downstream.
    434434</t>
    435435<t>
    436    RWS is used when at least one linear whitespace character is required to
    437    separate field tokens. RWS &SHOULD; be produced as a single SP character.
    438    Multiple RWS characters that occur within field-content &SHOULD; be
     436   RWS is used when at least one linear whitespace octet is required to
     437   separate field tokens. RWS &SHOULD; be produced as a single SP.
     438   Multiple RWS octets that occur within field-content &SHOULD; be
    439439   replaced with a single SP before interpreting the field value or
    440440   forwarding the message downstream.
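
Reviewer sketch (not part of the changeset): the revised OWS/RWS wording describes collapsing runs of whitespace octets into a single SP before a field value is interpreted or forwarded. A minimal Python illustration follows, assuming that only SP (0x20) and HTAB (0x09) count as whitespace octets, that any obsolete line folding has already been unfolded, and that the input contains no quoted-string whose internal whitespace must be preserved. The function name `collapse_whitespace` is hypothetical.

```python
import re

# Assumption: SP (0x20) and HTAB (0x09) are the only whitespace octets handled.
_WS_RUN = re.compile(rb"[ \t]+")


def collapse_whitespace(field_content: bytes) -> bytes:
    """Replace each run of SP/HTAB octets with a single SP octet."""
    return _WS_RUN.sub(b" ", field_content)


if __name__ == "__main__":
    print(collapse_whitespace(b"text/html;\t  q=0.8"))
```

Working on `bytes` rather than `str` mirrors the point of the commit message: the message is parsed as octets, so no character decoding is needed for this step.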
     
    503503<t anchor="rule.quoted-pair">
    504504  <x:anchor-alias value="quoted-pair"/>
    505    The backslash character ("\") can be used as a single-character
     505   The backslash octet ("\") can be used as a single-octet
    506506   quoting mechanism within quoted-string constructs:
    507507</t>
     
    510510</artwork></figure>
    511511<t>
    512    Producers &SHOULD-NOT; escape characters that do not require escaping
    513    (i.e., other than DQUOTE and the backslash character).
     512   Senders &SHOULD-NOT; escape octets that do not require escaping
     513   (i.e., other than DQUOTE and the backslash octet).
    514514</t>
    515515</section>
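
Reviewer sketch (not part of the changeset): the quoted-pair hunk says senders should escape only DQUOTE and the backslash octet. One way a sender might build a quoted-string under that rule is shown below. The helper name `quote_string` is hypothetical, and the sketch does not check that the remaining octets are permitted inside a quoted-string.

```python
DQUOTE = 0x22
BACKSLASH = 0x5C


def quote_string(value: bytes) -> bytes:
    """Wrap value in DQUOTEs, escaping only DQUOTE and backslash octets."""
    out = bytearray([DQUOTE])
    for octet in value:
        if octet in (DQUOTE, BACKSLASH):
            out.append(BACKSLASH)  # single-octet quoting mechanism
        out.append(octet)
    out.append(DQUOTE)
    return bytes(out)


if __name__ == "__main__":
    print(quote_string(b'say "hi"'))
```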
     
    819819<t>
    820820   The HTTP version number consists of two non-negative decimal integers
    821    separated by the "." (period or decimal point) character.  The first
     821   separated by a "." (period or decimal point).  The first
    822822   number ("major version") indicates the HTTP messaging syntax, whereas
    823823   the second number ("minor version") indicates the highest minor
     
    11331133</section>
    11341134
    1135 <section title="HTTP Message" anchor="http.message">
     1135<section title="Message Format" anchor="http.message">
    11361136<x:anchor-alias value="generic-message"/>
    11371137<x:anchor-alias value="message.types"/>
     
    11431143<t>
    11441144   All HTTP/1.1 messages consist of a start-line followed by a sequence of
    1145    characters in a format similar to the Internet Message Format
     1145   octets in a format similar to the Internet Message Format
    11461146   <xref target="RFC5322"/>: zero or more header fields (collectively
    11471147   referred to as the "headers" or the "header section"), an empty line
     
    11681168</artwork></figure>
    11691169<t>
    1170    Whitespace (WSP) &MUST-NOT; be sent between the start-line and the first
    1171    header field. The presence of whitespace might be an attempt to trick a
    1172    noncompliant implementation of HTTP into ignoring that field or processing
    1173    the next line as a new request, either of which might result in security
    1174    issues when implementations within the request chain interpret the
    1175    same message differently. HTTP/1.1 servers &MUST; reject such a message
    1176    with a 400 (Bad Request) response.
     1170   Implementations &MUST-NOT; send whitespace between the start-line and
     1171   the first header field. The presence of such whitespace in a request
     1172   might be an attempt to trick a server into ignoring that field or
     1173   processing the line after it as a new request, either of which might
     1174   result in a security vulnerability if other implementations within
     1175   the request chain interpret the same message differently.
     1176   Likewise, the presence of such whitespace in a response might be
     1177   ignored by some clients or cause others to cease parsing.
    11771178</t>
    11781179
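
Reviewer sketch (not part of the changeset): the revised paragraph forbids whitespace between the start-line and the first header field. A recipient that wants to detect this condition could check the first octet after the start-line's CRLF, as below. The function name is hypothetical, and the sketch assumes the whole header section is already in memory and that lines end with CRLF.

```python
def has_whitespace_before_first_header(head: bytes) -> bool:
    """Return True if the line after the start-line begins with SP or HTAB."""
    _start_line, sep, rest = head.partition(b"\r\n")
    # rest[:1] is b"" when nothing follows the start-line, so this is safe.
    return sep != b"" and rest[:1] in (b" ", b"\t")


if __name__ == "__main__":
    suspicious = b"GET / HTTP/1.1\r\n Host: example.org\r\n\r\n"
    print(has_whitespace_before_first_header(suspicious))
```

What to do on detection is a policy choice. The added paragraph in the next hunk requires a 400 (Bad Request) response for octet sequences that do not match the HTTP-message grammar.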
     
    11931194   client &MUST; include the terminating CRLF octets as part of the
    11941195   message-body length.
     1196</t>
     1197<t>
     1198   When a server listening only for HTTP request messages, or processing
     1199   what appears from the start-line to be an HTTP request message,
     1200   receives a sequence of octets that does not match the HTTP-message
     1201   grammar aside from the robustness exceptions listed above, the
     1202   server &MUST; respond with an HTTP/1.1 400 (Bad Request) response. 
    11951203</t>
    11961204<t>
     
    12421250   A field value &MAY; be preceded by optional whitespace (OWS); a single SP is
    12431251   preferred. The field value does not include any leading or trailing white
    1244    space: OWS occurring before the first non-whitespace character of the
    1245    field value or after the last non-whitespace character of the field value
     1252   space: OWS occurring before the first non-whitespace octet of the
     1253   field value or after the last non-whitespace octet of the field value
    12461254   is ignored and &SHOULD; be removed before further processing (as this does
    12471255   not change the meaning of the header field).
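
Reviewer sketch (not part of the changeset): the paragraph above says leading and trailing OWS is not part of the field value. A minimal illustration follows, assuming the header line has already had its CRLF removed and that OWS here means SP and HTAB octets only. The name `split_header_field` is hypothetical.

```python
from typing import Tuple


def split_header_field(line: bytes) -> Tuple[bytes, bytes]:
    """Split 'name: value' and drop OWS octets around the value."""
    name, sep, value = line.partition(b":")
    if not sep:
        raise ValueError("header field has no colon")
    # Strip only SP and HTAB; other octets are left untouched.
    return name, value.strip(b" \t")


if __name__ == "__main__":
    print(split_header_field(b"Host:  example.org \t"))
```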
     
    12831291   Historically, HTTP header field values could be extended over multiple
    12841292   lines by preceding each extra line with at least one space or horizontal
    1285    tab character (line folding). This specification deprecates such line
     1293   tab octet (line folding). This specification deprecates such line
    12861294   folding except within the message/http media type
    12871295   (<xref target="internet.media.type.message.http"/>).
     
    12991307   In practice, most HTTP header field values use only a subset of the
    13001308   US-ASCII character encoding <xref target="USASCII"/>. Newly defined
    1301    header fields &SHOULD; limit their field values to US-ASCII characters.
     1309   header fields &SHOULD; limit their field values to US-ASCII octets.
    13021310   Recipients &SHOULD; treat other (obs-text) octets in field content as
    13031311   opaque data.
     
    13171325<t anchor="rule.quoted-cpair">
    13181326  <x:anchor-alias value="quoted-cpair"/>
    1319    The backslash character ("\") can be used as a single-character
     1327   The backslash octet ("\") can be used as a single-octet
    13201328   quoting mechanism within comment constructs:
    13211329</t>
     
    13241332</artwork></figure>
    13251333<t>
    1326    Producers &SHOULD-NOT; escape characters that do not require escaping
    1327    (i.e., other than the backslash character "\" and the parentheses "(" and
     1334   Senders &SHOULD-NOT; escape octets that do not require escaping
     1335   (i.e., other than the backslash octet "\" and the parentheses "(" and
    13281336   ")").
    13291337</t>
     
    15471555  <x:anchor-alias value="Request"/>
    15481556<t>
    1549    A request message from a client to a server includes, within the
    1550    first line of that message, the method to be applied to the resource,
    1551    the identifier of the resource, and the protocol version in use.
     1557   A request message from a client to a server begins with a
     1558   Request-Line, followed by zero or more header fields, an empty
     1559   line signifying the end of the header block, and an optional
     1560   message body.
    15521561</t>
    15531562<!--                 Host                      ; should be moved here eventually -->
     
    15621571  <x:anchor-alias value="Request-Line"/>
    15631572<t>
    1564    The Request-Line begins with a method token, followed by the
    1565    request-target and the protocol version, and ending with CRLF. The
    1566    elements are separated by SP characters. No CR or LF is allowed
    1567    except in the final CRLF sequence.
     1573   The Request-Line begins with a method token, followed by a single
     1574   space (SP), the request-target, another single space (SP), the
     1575   protocol version, and ending with CRLF.
    15681576</t>
    15691577<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="Request-Line"/>
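
Reviewer sketch (not part of the changeset): the revised text pins the Request-Line to exactly one SP between elements. A strict parser along those lines might look like the following. The regular expression is a simplification (it does not enforce the full method or request-target grammar), and `parse_request_line` is a hypothetical name.

```python
import re

# method SP request-target SP HTTP-Version (CRLF is checked separately).
_REQUEST_LINE = re.compile(
    rb"([^ \t\r\n]+) ([^ \t\r\n]+) (HTTP/[0-9]+\.[0-9]+)"
)


def parse_request_line(line: bytes):
    """Return (method, request-target, version) or raise ValueError."""
    if not line.endswith(b"\r\n"):
        raise ValueError("Request-Line must end with CRLF")
    match = _REQUEST_LINE.fullmatch(line[:-2])
    if match is None:
        raise ValueError("malformed Request-Line")
    return match.groups()


if __name__ == "__main__":
    print(parse_request_line(b"GET /index.html HTTP/1.1\r\n"))
```

A line with two spaces between the method and the request-target fails to match, because each element must start with a non-whitespace octet immediately after a single SP.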
     
    17871795    </t>
    17881796    <t>
    1789       the character sequence "://",
     1797      the octet sequence "://",
    17901798    </t>
    17911799    <t>
     
    18631871<t>
    18641872   The first line of a Response message is the Status-Line, consisting
    1865    of the protocol version followed by a numeric status code and its
    1866    associated textual phrase, with each element separated by SP
    1867    characters. No CR or LF is allowed except in the final CRLF sequence.
     1873   of the protocol version, a space (SP), the status code, another space,
     1874   a possibly-empty textual phrase describing the status code, and
     1875   ending with CRLF.
    18681876</t>
    18691877<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="Status-Line"/>
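
Reviewer sketch (not part of the changeset): the revised Status-Line description allows the textual phrase to be empty while still requiring the SP that precedes it. The following illustration assumes a three-digit status code and leaves the protocol version unvalidated. `parse_status_line` is a hypothetical name.

```python
def parse_status_line(line: bytes):
    """Return (version, status code, reason phrase) or raise ValueError."""
    if not line.endswith(b"\r\n"):
        raise ValueError("Status-Line must end with CRLF")
    # Split on the first two SP octets only; the phrase may contain spaces.
    parts = line[:-2].split(b" ", 2)
    if len(parts) != 3:
        raise ValueError("expected version SP status-code SP phrase")
    version, code, reason = parts
    if len(code) != 3 or not code.isdigit():
        raise ValueError("status code must be three digits")
    return version, int(code), reason


if __name__ == "__main__":
    print(parse_status_line(b"HTTP/1.1 404 Not Found\r\n"))
    print(parse_status_line(b"HTTP/1.1 204 \r\n"))
```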
     
    23502358   Product tokens &SHOULD; be short and to the point. They &MUST-NOT; be
    23512359   used for advertising or other non-essential information. Although any
    2352    token character &MAY; appear in a product-version, this token &SHOULD;
     2360   token octet &MAY; appear in a product-version, this token &SHOULD;
    23532361   only be used for a version identifier (i.e., successive versions of
    23542362   the same product &SHOULD; only differ in the product-version portion of
     
    48474855</t>
    48484856<t>
    4849    Clients &SHOULD; be tolerant in parsing the Status-Line and servers
    4850    &SHOULD; be tolerant when parsing the Request-Line. In particular, they
    4851    &SHOULD; accept any amount of WSP characters between fields, even though
    4852    only a single SP is required.
    4853 </t>
    4854 <t>
    48554857   The line terminator for header fields is the sequence CRLF.
    48564858   However, we recommend that applications, when parsing such headers fields,
     
    48584860</t>
    48594861<t>
    4860    The character set of a representation &SHOULD; be labeled as the lowest
     4862   The character encoding of a representation &SHOULD; be labeled as the lowest
    48614863   common denominator of the character codes used within that representation, with
    48624864   the exception that not labeling the representation is preferred over labeling
     
    50275029  Rules about implicit linear whitespace between certain grammar productions
    50285030  have been removed; now it's only allowed when specifically pointed out
    5029   in the ABNF. The NUL character is no longer allowed in comment and quoted-string
     5031  in the ABNF. The NUL octet is no longer allowed in comment and quoted-string
    50305032  text. The quoted-pair rule no longer allows escaping control characters other than HTAB.
    50315033  Non-ASCII content in header fields and reason phrase has been obsoleted and
  • draft-ietf-httpbis/latest/p3-payload.xml

    r1173 r1176  
    336336<section title="Protocol Parameters" anchor="protocol.parameters">
    337337
    338 <section title="Character Sets" anchor="character.sets">
    339 <t>
    340    HTTP uses the same definition of the term "character set" as that
    341    described for MIME:
    342 </t>
    343 <t>
    344    The term "character set" is used in this document to refer to a
    345    method used with one or more tables to convert a sequence of octets
    346    into a sequence of characters. Note that unconditional conversion in
    347    the other direction is not required, in that not all characters might
    348    be available in a given character set and a character set might provide
    349    more than one sequence of octets to represent a particular character.
    350    This definition is intended to allow various kinds of character
    351    encoding, from simple single-table mappings such as US-ASCII to
    352    complex table switching methods such as those that use ISO-2022's
    353    techniques. However, the definition associated with a MIME character
    354    set name &MUST; fully specify the mapping to be performed from octets
    355    to characters. In particular, use of external profiling information
    356    to determine the exact mapping is not permitted.
    357 </t>
    358 <x:note>
    359   <t>
    360     <x:h>Note:</x:h> This use of the term "character set" is more commonly
    361     referred to as a "character encoding". However, since HTTP and
    362     MIME share the same registry, it is important that the terminology
    363     also be shared.
    364   </t>
    365 </x:note>
     338<section title="Character Encodings (charset)" anchor="character.sets">
     339<t>
     340   HTTP uses charset names to indicate the character encoding of a
     341   textual representation.
     342</t>
    366343<t anchor="rule.charset">
    367344  <x:anchor-alias value="charset"/>
    368    HTTP character sets are identified by case-insensitive tokens. The
     345   A character encoding is identified by a case-insensitive token. The
    369346   complete set of tokens is defined by the IANA Character Set registry
    370347   (<eref target="http://www.iana.org/assignments/character-sets"/>).
     
    376353   Although HTTP allows an arbitrary token to be used as a charset
    377354   value, any token that has a predefined value within the IANA
    378    Character Set registry &MUST; represent the character set defined
     355   Character Set registry &MUST; represent the character encoding defined
    379356   by that registry. Applications &SHOULD; limit their use of character
    380    sets to those defined by the IANA registry.
     357   encodings to those defined within the IANA registry.
    381358</t>
    382359<t>
     
    10941071<t>
    10951072   The "Accept-Charset" header field can be used by user agents to
    1096    indicate what response character sets are acceptable. This field allows
     1073   indicate what character encodings are acceptable in a response
     1074   payload. This field allows
    10971075   clients capable of understanding more comprehensive or special-purpose
    1098    character sets to signal that capability to a server which is capable of
    1099    representing documents in those character sets.
     1076   character encodings to signal that capability to a server which is capable of
     1077   representing documents in those character encodings.
    11001078</t>
    11011079<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="Accept-Charset"/><iref primary="true" item="Grammar" subitem="Accept-Charset-v"/>
     
    11061084</artwork></figure>
    11071085<t>
    1108    Character set values are described in <xref target="character.sets"/>. Each charset &MAY;
    1109    be given an associated quality value which represents the user's
    1110    preference for that charset. The default value is q=1. An example is
     1086   Character encoding values (a.k.a., charsets) are described in
     1087   <xref target="character.sets"/>. Each charset &MAY; be given an
     1088   associated quality value which represents the user's preference
     1089   for that charset. The default value is q=1. An example is
    11111090</t>
    11121091<figure><artwork type="example">
     
    11151094<t>
    11161095   The special value "*", if present in the Accept-Charset field,
    1117    matches every character set (including ISO-8859-1) which is not
     1096   matches every character encoding (including ISO-8859-1) which is not
    11181097   mentioned elsewhere in the Accept-Charset field. If no "*" is present
    1119    in an Accept-Charset field, then all character sets not explicitly
     1098   in an Accept-Charset field, then all character encodings not explicitly
    11201099   mentioned get a quality value of 0, except for ISO-8859-1, which gets
    11211100   a quality value of 1 if not explicitly mentioned.
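
Reviewer sketch (not part of the changeset): the matching rules in this paragraph (explicit entry, then "*", then the ISO-8859-1 default) can be expressed as a small lookup. The sketch assumes the Accept-Charset field has already been parsed into a mapping from charset token to quality value. `charset_quality` is a hypothetical name.

```python
from typing import Dict


def charset_quality(accepted: Dict[str, float], charset: str) -> float:
    """Quality value for charset under a parsed Accept-Charset field."""
    # Charset tokens are case-insensitive.
    prefs = {name.lower(): q for name, q in accepted.items()}
    name = charset.lower()
    if name in prefs:
        return prefs[name]        # explicitly mentioned
    if "*" in prefs:
        return prefs["*"]         # wildcard covers everything unmentioned
    if name == "iso-8859-1":
        return 1.0                # implicit default when no "*" is present
    return 0.0                    # not acceptable


if __name__ == "__main__":
    field = {"iso-8859-5": 1.0, "unicode-1-1": 0.8}
    for cs in ("Unicode-1-1", "ISO-8859-1", "UTF-8"):
        print(cs, charset_quality(field, cs))
```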
     
    11231102<t>
    11241103   If no Accept-Charset header field is present, the default is that any
    1125    character set is acceptable. If an Accept-Charset header field is present,
     1104   character encoding is acceptable. If an Accept-Charset header field is present,
    11261105   and if the server cannot send a response which is acceptable
    11271106   according to the Accept-Charset header field, then the server &SHOULD; send
     
    25442523   Where it is possible, a proxy or gateway from HTTP to a strict MIME
    25452524   environment &SHOULD; translate all line breaks within the text media
    2546    types described in <xref target="canonicalization.and.text.defaults"/> of this document to the RFC 2049
     2525   types described in <xref target="canonicalization.and.text.defaults"/>
     2526   of this document to the RFC 2049
    25472527   canonical form of CRLF. Note, however, that this might be complicated
    25482528   by the presence of a Content-Encoding and by the fact that HTTP
    2549    allows the use of some character sets which do not use octets 13 and
    2550    10 to represent CR and LF, as is the case for some multi-byte
    2551    character sets.
     2529   allows the use of some character encodings which do not use octets 13 and
     2530   10 to represent CR and LF, respectively, as is the case for some multi-byte
     2531   character encodings.
    25522532</t>
    25532533<t>