Timestamp:
Aug 8, 2011, 5:46:19 PM
Author:
fielding@…
Message:

Reorganize and clarify the sections on message header parsing.

File:
1 edited

  • draft-ietf-httpbis/latest/p1-messaging.xml

    r1392 r1393  
    473 473                 ; see <xref target="header.fields"/>
    474 474</artwork></figure>
    475 <t anchor="rule.token.separators">
    476   <x:anchor-alias value="tchar"/>
    477   <x:anchor-alias value="token"/>
    478   <x:anchor-alias value="special"/>
    479   <x:anchor-alias value="word"/>
    480    Many HTTP/1.1 header field values consist of words (token or quoted-string)
    481    separated by whitespace or special characters. These special characters
    482    &MUST; be in a quoted string to be used within a parameter value (as defined
    483    in <xref target="transfer.codings"/>).
    484 </t>
    485 <figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="word"/><iref primary="true" item="Grammar" subitem="token"/><iref primary="true" item="Grammar" subitem="tchar"/><iref primary="true" item="Grammar" subitem="special"/>
    486   <x:ref>word</x:ref>           = <x:ref>token</x:ref> / <x:ref>quoted-string</x:ref>
    487 
    488   <x:ref>token</x:ref>          = 1*<x:ref>tchar</x:ref>
    489 <!--
    490   IMPORTANT: when editing "tchar" make sure that "special" is updated accordingly!!!
    491  -->
    492   <x:ref>tchar</x:ref>          = "!" / "#" / "$" / "%" / "&amp;" / "'" / "*"
    493                  / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
    494                  / <x:ref>DIGIT</x:ref> / <x:ref>ALPHA</x:ref>
    495                  ; any <x:ref>VCHAR</x:ref>, except <x:ref>special</x:ref>
    496 
    497   <x:ref>special</x:ref>        = "(" / ")" / "&lt;" / ">" / "@" / ","
    498                  / ";" / ":" / "\" / DQUOTE / "/" / "["
    499                  / "]" / "?" / "=" / "{" / "}"
    500 </artwork></figure>
    501 <t anchor="rule.quoted-string">
    502   <x:anchor-alias value="quoted-string"/>
    503   <x:anchor-alias value="qdtext"/>
    504   <x:anchor-alias value="obs-text"/>
    505    A string of text is parsed as a single word if it is quoted using
    506    double-quote marks.
    507 </t>
    508 <figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="quoted-string"/><iref primary="true" item="Grammar" subitem="qdtext"/><iref primary="true" item="Grammar" subitem="obs-text"/>
    509   <x:ref>quoted-string</x:ref>  = <x:ref>DQUOTE</x:ref> *( <x:ref>qdtext</x:ref> / <x:ref>quoted-pair</x:ref> ) <x:ref>DQUOTE</x:ref>
    510   <x:ref>qdtext</x:ref>         = <x:ref>OWS</x:ref> / %x21 / %x23-5B / %x5D-7E / <x:ref>obs-text</x:ref>
    511                  ; <x:ref>OWS</x:ref> / &lt;<x:ref>VCHAR</x:ref> except <x:ref>DQUOTE</x:ref> and "\"&gt; / <x:ref>obs-text</x:ref>
    512   <x:ref>obs-text</x:ref>       = %x80-FF
    513 </artwork></figure>
    514 <t anchor="rule.quoted-pair">
    515   <x:anchor-alias value="quoted-pair"/>
    516    The backslash octet ("\") can be used as a single-octet
    517    quoting mechanism within quoted-string constructs:
    518 </t>
    519 <figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="quoted-pair"/>
    520   <x:ref>quoted-pair</x:ref>    = "\" ( <x:ref>WSP</x:ref> / <x:ref>VCHAR</x:ref> / <x:ref>obs-text</x:ref> )
    521 </artwork></figure>
    522 <t>
    523    Recipients that process the value of the quoted-string &MUST; handle a
    524    quoted-pair as if it were replaced by the octet following the backslash.
    525 </t>
    526 <t>
    527    Senders &SHOULD-NOT; escape octets that do not require escaping
    528    (i.e., other than DQUOTE and the backslash octet).
    529 </t>
    530 </section>
    531 
     475</section>
    532 476</section>
    533 477</section>
     
    1220 1164</t>
    1221 1165
    1222 <section title="Message Parsing Robustness" anchor="message.robustness">
     1166<section title="Message Parsing and Robustness" anchor="message.robustness">
     1167<t>
     1168   The normal procedure for parsing an HTTP message is to read the
     1169   start-line into a structure, read each header field into a hash
     1170   table by field name until the empty line, and then use the parsed
     1171   data to determine if a message-body is expected.  If a message-body
     1172   has been indicated, then it is read as a stream until an amount
     1173   of octets equal to the message-body length is read or the connection
     1174   is closed.
     1175</t>
     1176<t>
     1177   Care must be taken to parse an HTTP message as a sequence
     1178   of octets in an encoding that is a superset of US-ASCII.  Attempting
     1179   to parse HTTP as a stream of Unicode characters in a character encoding
     1180   like UTF-16 might introduce security flaws due to the differing ways
     1181   that such parsers interpret invalid characters.
     1182</t>
     1183<t>
     1184   Older HTTP/1.0 client implementations might send an extra CRLF
     1185   after a POST request as a lame workaround for some early server
     1186   applications that failed to read message-body content that was
     1187   not terminated by a line-ending. An HTTP/1.1 client &MUST-NOT;
     1188   preface or follow a request with an extra CRLF.  If terminating
     1189   the request message-body with a line-ending is desired, then the
     1190   client &MUST; include the terminating CRLF octets as part of the
     1191   message-body length.
     1192</t>
    1223 1193<t>
    1224 1194   In the interest of robustness, servers &SHOULD; ignore at least one
     
    1231 1201</t>
    1232 1202<t>
    1233    Some old HTTP/1.0 client implementations send an extra CRLF
    1234    after a POST request as a lame workaround for some early server
    1235    applications that failed to read message-body content that was
    1236    not terminated by a line-ending. An HTTP/1.1 client &MUST-NOT;
    1237    preface or follow a request with an extra CRLF.  If terminating
    1238    the request message-body with a line-ending is desired, then the
    1239    client &MUST; include the terminating CRLF octets as part of the
    1240    message-body length.
    1241 </t>
    1242 <t>
    1243 1203   When a server listening only for HTTP request messages, or processing
    1244 1204   what appears from the start-line to be an HTTP request message,
     
    1246 1206   grammar aside from the robustness exceptions listed above, the
    1247 1207   server &MUST; respond with an HTTP/1.1 400 (Bad Request) response.
    1248 </t>
    1249 <t>
    1250    The normal procedure for parsing an HTTP message is to read the
    1251    start-line into a structure, read each header field into a hash
    1252    table by field name until the empty line, and then use the parsed
    1253    data to determine if a message-body is expected.  If a message-body
    1254    has been indicated, then it is read as a stream until an amount
    1255    of octets equal to the message-body length is read or the connection
    1256    is closed.  Care must be taken to parse an HTTP message as a sequence
    1257    of octets in an encoding that is a superset of US-ASCII.  Attempting
    1258    to parse HTTP as a stream of Unicode characters in a character encoding
    1259    like UTF-16 might introduce security flaws due to the differing ways
    1260    that such parsers interpret invalid characters.
    1261 </t>
    1262 <t>
    1263    HTTP allows the set of defined header fields to be extended without
    1264    changing the protocol version (see <xref target="header.field.registration"/>).
    1265    Unrecognized header fields &MUST; be forwarded by a proxy unless the
    1266    proxy is specifically configured to block or otherwise transform such
    1267    fields.  Unrecognized header fields &SHOULD; be ignored by other recipients.
    1268 1208</t>
    1269 1209</section>
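
Reviewer sketch (not part of the changeset): the parsing procedure that this hunk moves to the top of message.robustness could be approximated in Python as below. The names `parse_message_head` and `read_body` are hypothetical, as is the choice of a dict of lists keyed on the lowercased field name; the draft only says "hash table by field name". The input is kept as bytes throughout, so the message is handled as a sequence of octets and is never decoded as UTF-16 or similar. Line folding and chunked bodies are deliberately not handled.

```python
def parse_message_head(raw: bytes):
    """Split a message into start-line, header table, and the octets after the empty line.

    Hypothetical helper: operates on bytes so that the message is parsed
    as octets rather than as decoded characters.
    """
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete message: no empty line after the header fields")
    lines = head.split(b"\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(b":")
        if not colon:
            raise ValueError("header line without a colon")
        # Key the table on the lowercased name (an assumption of this sketch)
        # and keep repeated fields in arrival order.
        headers.setdefault(name.lower(), []).append(value.strip(b" \t"))
    return start_line, headers, rest


def read_body(headers, rest: bytes) -> bytes:
    """Use the parsed data to decide whether a message-body is expected.

    Only Content-Length is consulted here; other framing is out of scope.
    """
    values = headers.get(b"content-length")
    if not values:
        return b""
    return rest[: int(values[0])]
```

In a real server the body would be read from the connection as a stream; slicing an in-memory buffer is only a stand-in for that step.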
     
    1286 1226</artwork></figure>
    1287 1227<t>
    1288    No whitespace is allowed between the header field name and colon. For
    1289    security reasons, any request message received containing such whitespace
    1290    &MUST; be rejected with a response code of 400 (Bad Request). A proxy
    1291    &MUST; remove any such whitespace from a response message before
    1292    forwarding the message downstream.
    1293 </t>
    1294 <t>
    1295    A field value &MAY; be preceded by optional whitespace (OWS); a single SP is
    1296    preferred. The field value does not include any leading or trailing white
    1297    space: OWS occurring before the first non-whitespace octet of the
    1298    field value or after the last non-whitespace octet of the field value
    1299    is ignored and &SHOULD; be removed before further processing (as this does
    1300    not change the meaning of the header field).
     1228   The field-name token labels the corresponding field-value as having the
     1229   semantics defined by that header field.  For example, the Date header field
     1230   is defined in <xref target="header.date"/> as containing the origination
     1231   timestamp for the message in which it appears.
     1232</t>
     1233<t>
     1234   HTTP header fields are fully extensible: there is no limit on the
     1235   introduction of new field names, each presumably defining new semantics,
     1236   or on the number of header fields used in a given message.  Existing
     1237   fields are defined in each part of this specification and in many other
     1238   specifications outside the standards process.
     1239   New header fields can be introduced without changing the protocol version
     1240   if their defined semantics allow them to be safely ignored by recipients
     1241   that do not recognize them.
     1242</t>
     1243<t>
     1244   New HTTP header fields &SHOULD; be registered with IANA according
     1245   to the procedures in <xref target="header.field.registration"/>.
     1246   Unrecognized header fields &MUST; be forwarded by a proxy unless the
     1247   field-name is listed in the Connection header field
     1248   (<xref target="header.connection"/>) or the proxy is specifically
     1249   configured to block or otherwise transform such fields.
     1250   Unrecognized header fields &SHOULD; be ignored by other recipients.
    1301 1251</t>
    1302 1252<t>
     
    1333 1283  </t>
    1334 1284</x:note>
     1285
     1286<section title="Field Parsing" anchor="field.parsing">
     1287<t>
     1288   No whitespace is allowed between the header field-name and colon.
     1289   In the past, differences in the handling of such whitespace have led to
     1290   security vulnerabilities in request routing and response handling.
     1291   Any received request message that contains whitespace between a header
     1292   field-name and colon &MUST; be rejected with a response code of 400
     1293   (Bad Request).  A proxy &MUST; remove any such whitespace from a response
     1294   message before forwarding the message downstream.
     1295</t>
     1296<t>
     1297   A field value &MAY; be preceded by optional whitespace (OWS); a single SP is
     1298   preferred. The field value does not include any leading or trailing white
     1299   space: OWS occurring before the first non-whitespace octet of the
     1300   field value or after the last non-whitespace octet of the field value
     1301   is ignored and &SHOULD; be removed before further processing (as this does
     1302   not change the meaning of the header field).
     1303</t>
    1335 1304<t>
    1336 1305   Historically, HTTP header field values could be extended over multiple
     
    1339 1308   folding except within the message/http media type
    1340 1309   (<xref target="internet.media.type.message.http"/>).
    1341    HTTP/1.1 senders &MUST-NOT; produce messages that include line folding
     1310   HTTP senders &MUST-NOT; produce messages that include line folding
    1342 1311   (i.e., that contain any field-content that matches the obs-fold rule) unless
    1343 1312   the message is intended for packaging within the message/http media type.
    1344    HTTP/1.1 recipients &SHOULD; accept line folding and replace any embedded
    1345    obs-fold whitespace with a single SP prior to interpreting the field value
    1346    or forwarding the message downstream.
     1313   HTTP recipients &SHOULD; accept line folding and replace any embedded
     1314   obs-fold whitespace with either a single SP or a matching number of SP
     1315   octets (to avoid buffer copying) prior to interpreting the field value or
     1316   forwarding the message downstream.
    1347 1317</t>
    1348 1318<t>
     
    1356 1326   opaque data.
    1357 1327</t>
     1328</section>
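
Reviewer sketch (not part of the changeset): one way a server might apply the new Field Parsing rules to a single request header field. `BadRequest` and `parse_request_field` are hypothetical names, and the obs-fold pattern (CRLF followed by one or more SP or HTAB) is an assumption, since the obs-fold rule itself is not shown in this diff. The `keep_length` flag illustrates the "matching number of SP octets" option added in r1393.

```python
import re

# Assumed shape of obs-fold: CRLF followed by at least one SP or HTAB.
_OBS_FOLD = re.compile(rb"\r\n[ \t]+")


class BadRequest(Exception):
    """Hypothetical signal that the server should answer 400 (Bad Request)."""


def parse_request_field(line: bytes, keep_length: bool = False):
    """Parse one (possibly folded) request header field into (name, value)."""
    name, colon, value = line.partition(b":")
    if not colon or not name:
        raise BadRequest("missing field-name or colon")
    if name.endswith((b" ", b"\t")):
        # Whitespace between field-name and colon: reject the request.
        raise BadRequest("whitespace between field-name and colon")
    if keep_length:
        # Same number of octets as the fold, so a buffer could be patched in place.
        value = _OBS_FOLD.sub(lambda m: b" " * len(m.group(0)), value)
    else:
        value = _OBS_FOLD.sub(b" ", value)
    # Leading and trailing OWS is not part of the field value.
    return name, value.strip(b" \t")
```

A proxy handling a response would strip the offending whitespace instead of raising; that branch is omitted here.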
     1329
     1330<section title="Field Length" anchor="field.length">
     1331<t>
     1332   HTTP does not place a pre-defined limit on the length of header fields,
     1333   either in isolation or as a set. A server &MUST; be prepared to receive
     1334   request header fields of unbounded length and respond with a 4xx status
     1335   code if the received header field(s) would be longer than the server wishes
     1336   to handle.
     1337</t>
     1338<t>
     1339   A client that receives response headers that are longer than it wishes to
     1340   handle can only treat it as a server error.
     1341</t>
     1342<t>
     1343   Various ad-hoc limitations on header length are found in practice. It is
     1344   &RECOMMENDED; that all HTTP senders and recipients support messages whose
     1345   combined header fields have 4000 or more octets.
     1346</t>
     1347</section>
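
Reviewer sketch (not part of the changeset): the Field Length section leaves the actual ceiling to the implementation, so the limit below (8192 octets, chosen only because it is above the recommended 4000) and the names `HeaderTooLarge` and `read_header_section` are assumptions. The point illustrated is that a recipient can bound its own memory use while reading, then map the failure to a 4xx response on the server side.

```python
import io


class HeaderTooLarge(Exception):
    """Hypothetical signal: header fields exceed what this recipient will handle."""


def read_header_section(stream, limit: int = 8192):
    """Read header lines up to the empty line, counting every octet consumed.

    `stream` is any binary file-like object with readline(size). End of
    input before the empty line is treated the same as the empty line.
    """
    total = 0
    lines = []
    while True:
        # Never ask for more than one octet past the remaining budget.
        line = stream.readline(limit - total + 1)
        total += len(line)
        if total > limit:
            raise HeaderTooLarge(f"header fields exceed {limit} octets")
        if line in (b"\r\n", b""):
            return lines
        lines.append(line.rstrip(b"\r\n"))
```

For example, `read_header_section(io.BytesIO(data))` can be used to try the function on an in-memory buffer.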
     1348
     1349<section title="Common Field ABNF Rules" anchor="field.rules">
     1350<t anchor="rule.token.separators">
     1351  <x:anchor-alias value="tchar"/>
     1352  <x:anchor-alias value="token"/>
     1353  <x:anchor-alias value="special"/>
     1354  <x:anchor-alias value="word"/>
     1355   Many HTTP/1.1 header field values consist of words (token or quoted-string)
     1356   separated by whitespace or special characters. These special characters
     1357   &MUST; be in a quoted string to be used within a parameter value (as defined
     1358   in <xref target="transfer.codings"/>).
     1359</t>
     1360<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="word"/><iref primary="true" item="Grammar" subitem="token"/><iref primary="true" item="Grammar" subitem="tchar"/><iref primary="true" item="Grammar" subitem="special"/>
     1361  <x:ref>word</x:ref>           = <x:ref>token</x:ref> / <x:ref>quoted-string</x:ref>
     1362
     1363  <x:ref>token</x:ref>          = 1*<x:ref>tchar</x:ref>
     1364<!--
     1365  IMPORTANT: when editing "tchar" make sure that "special" is updated accordingly!!!
     1366 -->
     1367  <x:ref>tchar</x:ref>          = "!" / "#" / "$" / "%" / "&amp;" / "'" / "*"
     1368                 / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
     1369                 / <x:ref>DIGIT</x:ref> / <x:ref>ALPHA</x:ref>
     1370                 ; any <x:ref>VCHAR</x:ref>, except <x:ref>special</x:ref>
     1371
     1372  <x:ref>special</x:ref>        = "(" / ")" / "&lt;" / ">" / "@" / ","
     1373                 / ";" / ":" / "\" / DQUOTE / "/" / "["
     1374                 / "]" / "?" / "=" / "{" / "}"
     1375</artwork></figure>
     1376<t anchor="rule.quoted-string">
     1377  <x:anchor-alias value="quoted-string"/>
     1378  <x:anchor-alias value="qdtext"/>
     1379  <x:anchor-alias value="obs-text"/>
     1380   A string of text is parsed as a single word if it is quoted using
     1381   double-quote marks.
     1382</t>
     1383<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="quoted-string"/><iref primary="true" item="Grammar" subitem="qdtext"/><iref primary="true" item="Grammar" subitem="obs-text"/>
     1384  <x:ref>quoted-string</x:ref>  = <x:ref>DQUOTE</x:ref> *( <x:ref>qdtext</x:ref> / <x:ref>quoted-pair</x:ref> ) <x:ref>DQUOTE</x:ref>
     1385  <x:ref>qdtext</x:ref>         = <x:ref>OWS</x:ref> / %x21 / %x23-5B / %x5D-7E / <x:ref>obs-text</x:ref>
     1386  <x:ref>obs-text</x:ref>       = %x80-FF
     1387</artwork></figure>
     1388<t anchor="rule.quoted-pair">
     1389  <x:anchor-alias value="quoted-pair"/>
     1390   The backslash octet ("\") can be used as a single-octet
     1391   quoting mechanism within quoted-string constructs:
     1392</t>
     1393<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="quoted-pair"/>
     1394  <x:ref>quoted-pair</x:ref>    = "\" ( <x:ref>WSP</x:ref> / <x:ref>VCHAR</x:ref> / <x:ref>obs-text</x:ref> )
     1395</artwork></figure>
     1396<t>
     1397   Recipients that process the value of the quoted-string &MUST; handle a
     1398   quoted-pair as if it were replaced by the octet following the backslash.
     1399</t>
     1400<t>
     1401   Senders &SHOULD-NOT; escape octets in quoted-strings that do not require
     1402   escaping (i.e., other than DQUOTE and the backslash octet).
     1403</t>
    1358 1404<t anchor="rule.comment">
    1359 1405  <x:anchor-alias value="comment"/>
     
    1366 1412  <x:ref>comment</x:ref>        = "(" *( <x:ref>ctext</x:ref> / <x:ref>quoted-cpair</x:ref> / <x:ref>comment</x:ref> ) ")"
    1367 1413  <x:ref>ctext</x:ref>          = <x:ref>OWS</x:ref> / %x21-27 / %x2A-5B / %x5D-7E / <x:ref>obs-text</x:ref>
    1368                  ; <x:ref>OWS</x:ref> / &lt;<x:ref>VCHAR</x:ref> except "(", ")", and "\"&gt; / <x:ref>obs-text</x:ref>
    1369 1414</artwork></figure>
    1370 1415<t anchor="rule.quoted-cpair">
     
    1377 1422</artwork></figure>
    1378 1423<t>
    1379    Senders &SHOULD-NOT; escape octets that do not require escaping
    1380    (i.e., other than the backslash octet "\" and the parentheses "(" and
    1381    ")").
    1382 </t>
    1383 <t>
    1384    HTTP does not place a pre-defined limit on the length of header fields,
    1385    either in isolation or as a set. A server &MUST; be prepared to receive
    1386    request header fields of unbounded length and respond with a 4xx status
    1387    code if the received header field(s) would be longer than the server wishes
    1388    to handle.
    1389 </t>
    1390 <t>
    1391    A client that receives response headers that are longer than it wishes to
    1392    handle can only treat it as a server error.
    1393 </t>
    1394 <t>
    1395    Various ad-hoc limitations on header length are found in practice. It is
    1396    &RECOMMENDED; that all HTTP senders and recipients support messages whose
    1397    combined header fields have 4000 or more octets.
    1398 </t>
     1424   Senders &SHOULD-NOT; escape octets in comments that do not require escaping
     1425   (i.e., other than the backslash octet "\" and the parentheses "(" and ")").
     1426</t>
     1427</section>
    1399 1428</section>
    1400 1429
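
Reviewer sketch (not part of the changeset): the token, quoted-string, and quoted-pair rules relocated into "Common Field ABNF Rules" can be exercised with a few lines of Python. The function names are hypothetical. `TCHAR` is transcribed from the tchar rule in the diff; `unquote` applies the requirement that a quoted-pair is handled as the octet following the backslash; `quote` escapes only DQUOTE and backslash, as the sender guidance asks. Validation of the qdtext octet ranges is left out for brevity.

```python
# tchar, transcribed from the ABNF above (iterating bytes yields ints).
TCHAR = frozenset(
    b"!#$%&'*+-.^_`|~"
    b"0123456789"
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
)


def is_token(value: bytes) -> bool:
    """True if value matches token = 1*tchar."""
    return len(value) > 0 and all(octet in TCHAR for octet in value)


def unquote(value: bytes) -> bytes:
    """Return the content of a quoted-string with quoted-pairs resolved."""
    if len(value) < 2 or value[:1] != b'"' or value[-1:] != b'"':
        raise ValueError("not a quoted-string")
    inner = value[1:-1]
    out = bytearray()
    i = 0
    while i < len(inner):
        octet = inner[i]
        if octet == 0x5C:  # backslash: take the next octet literally
            i += 1
            if i == len(inner):
                raise ValueError("backslash with nothing after it")
            out.append(inner[i])
        elif octet == 0x22:  # bare DQUOTE cannot appear inside
            raise ValueError("unescaped DQUOTE inside quoted-string")
        else:
            out.append(octet)
        i += 1
    return bytes(out)


def quote(value: bytes) -> bytes:
    """Wrap value as a quoted-string, escaping only backslash and DQUOTE."""
    escaped = value.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + escaped + b'"'
```

A sender would typically emit the bare value when `is_token` holds and fall back to `quote` otherwise.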