Timestamp:
Jul 28, 2009, 8:02:09 AM
Author:
fielding@…
Message:

first pass at cleaning up message parsing definition: header fields.

File:
1 edited

  • draft-ietf-httpbis/latest/p1-messaging.xml

    r646 r647  
    427427                 ; "bad" whitespace
    428428  <x:ref>obs-fold</x:ref>       = <x:ref>CRLF</x:ref>
    429                  ; see <xref target="message.headers"/>
     429                 ; see <xref target="header.fields"/>
    430430</artwork></figure>
    431431<t anchor="rule.token.separators">
     
    10191019
    10201020<section title="HTTP Message" anchor="http.message">
    1021 
    1022 <section title="Message Types" anchor="message.types">
    1023   <x:anchor-alias value="generic-message"/>
    1024   <x:anchor-alias value="HTTP-message"/>
    1025   <x:anchor-alias value="start-line"/>
    1026 <t>
    1027    HTTP messages consist of requests from client to server and responses
    1028    from server to client.
     1021<x:anchor-alias value="generic-message"/>
     1022<x:anchor-alias value="message.types"/>
     1023<x:anchor-alias value="HTTP-message"/>
     1024<x:anchor-alias value="start-line"/>
     1025<iref item="header section"/>
     1026<iref item="headers"/>
     1027<iref item="header field"/>
     1028<t>
     1029   All HTTP/1.1 messages consist of a start-line followed by a sequence of
     1030   characters in a format similar to the Internet Message Format
     1031   <xref target="RFC5322"/>: zero or more header fields (collectively
     1032   referred to as the "headers" or the "header section"), an empty line
     1033   indicating the end of the header section, and an optional message-body.
     1034</t>
     1035<t>
     1036   An HTTP message can either be a request from client to server or a
     1037   response from server to client.  Syntactically, the two types of message
     1038   differ only in the start-line, which is either a Request-Line (for requests)
     1039   or a Status-Line (for responses), and in the algorithm for determining
     1040   the length of the message-body (<xref target="message.length"/>).
     1041   In theory, a client could receive requests and a server could receive
     1042   responses, distinguishing them by their different start-line formats,
     1043   but in practice servers are implemented to only expect a request
     1044   (a response is interpreted as an unknown or invalid request method)
     1045   and clients are implemented to only expect a response.
    10291046</t>
    10301047<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="HTTP-message"/>
    1031   <x:ref>HTTP-message</x:ref>   = <x:ref>Request</x:ref> / <x:ref>Response</x:ref>     ; HTTP/1.1 messages
    1032 </artwork></figure>
    1033 <t>
    1034    Request (<xref target="request"/>) and Response (<xref target="response"/>) messages use the generic
    1035    message format of <xref target="RFC5322"/> for transferring entities (the payload
    1036    of the message). Both types of message consist of a start-line, zero
    1037    or more header fields (also known as "headers"), an empty line (i.e.,
    1038    a line with nothing preceding the CRLF) indicating the end of the
    1039    header fields, and possibly a message-body.
    1040 </t>
    1041 <figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="generic-message"/><iref primary="true" item="Grammar" subitem="start-line"/>
    1042   <x:ref>generic-message</x:ref> = <x:ref>start-line</x:ref>
    1043                     *( <x:ref>message-header</x:ref> <x:ref>CRLF</x:ref> )
     1048  <x:ref>HTTP-message</x:ref>    = <x:ref>start-line</x:ref>
     1049                    *( <x:ref>header-field</x:ref> <x:ref>CRLF</x:ref> )
    10441050                    <x:ref>CRLF</x:ref>
    10451051                    [ <x:ref>message-body</x:ref> ]
    10461052  <x:ref>start-line</x:ref>      = <x:ref>Request-Line</x:ref> / <x:ref>Status-Line</x:ref>
    10471053</artwork></figure>
    1048 <t>
    1049    In the interest of robustness, servers &SHOULD; ignore any empty
    1050    line(s) received where a Request-Line is expected. In other words, if
    1051    the server is reading the protocol stream at the beginning of a
    1052    message and receives a CRLF first, it should ignore the CRLF.
    1053 </t>
    1054 <t>
    1055    Certain buggy HTTP/1.0 client implementations generate extra CRLF's
    1056    after a POST request. To restate what is explicitly forbidden by the
    1057    BNF, an HTTP/1.1 client &MUST-NOT; preface or follow a request with an
    1058    extra CRLF.
    1059 </t>
    10601054<t>
    10611055   Whitespace (WSP) &MUST-NOT; be sent between the start-line and the first
     
    10671061   with a 400 (Bad Request) response.
    10681062</t>
    1069 </section>
    1070 
    1071 <section title="Message Headers" anchor="message.headers">
     1063
     1064<section title="Message Parsing Robustness" anchor="message.robustness">
     1065<t>
     1066   In the interest of robustness, servers &SHOULD; ignore at least one
     1067   empty line received where a Request-Line is expected. In other words, if
     1068   the server is reading the protocol stream at the beginning of a
     1069   message and receives a CRLF first, it should ignore the CRLF.
     1070</t>
     1071<t>
     1072   Some old HTTP/1.0 client implementations generate an extra CRLF
     1073   after a POST request as a lame workaround for some early server
     1074   applications that failed to read message-body content that was
     1075   not terminated by a line-ending. An HTTP/1.1 client &MUST-NOT;
     1076   preface or follow a request with an extra CRLF.  If terminating
     1077   the request message-body with a line-ending is desired, then the
     1078   client &MUST; include the terminating CRLF octets as part of the
     1079   message-body length.
     1080</t>
     1081<t>
     1082   The normal procedure for parsing an HTTP message is to read the
     1083   start-line into a structure, read each header field into a hash
     1084   table by field name until the empty line, and then use the parsed
     1085   data to determine if a message-body is expected.  If a message-body
     1086   has been indicated, then it is read as a stream until an amount
     1087   of OCTETs equal to the message-length is read or the connection
     1088   is closed.  Care must be taken to parse an HTTP message as a sequence
     1089   of OCTETs in an encoding that is a superset of US-ASCII.  Attempting
     1090   to parse HTTP as a stream of Unicode characters in a character encoding
     1091   like UTF-16 may introduce security flaws due to the differing ways
     1092   that such parsers interpret invalid characters.
     1093</t>
     1094</section>
     1095
     1096<section title="Header Fields" anchor="header.fields">
     1097  <x:anchor-alias value="header-field"/>
    10721098  <x:anchor-alias value="field-content"/>
    10731099  <x:anchor-alias value="field-name"/>
    10741100  <x:anchor-alias value="field-value"/>
    1075   <x:anchor-alias value="message-header"/>
    1076 <t>
    1077    HTTP header fields follow the same general format as Internet messages in
    1078    <xref target="RFC5322" x:fmt="of" x:sec="2.1"/>. Each header field consists
    1079    of a name followed by a colon (":"), optional whitespace, and the field
    1080    value. Field names are case-insensitive.
    1081 </t>
    1082 <figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="message-header"/><iref primary="true" item="Grammar" subitem="field-name"/><iref primary="true" item="Grammar" subitem="field-value"/><iref primary="true" item="Grammar" subitem="field-content"/>
    1083   <x:ref>message-header</x:ref> = <x:ref>field-name</x:ref> ":" OWS [ <x:ref>field-value</x:ref> ] OWS
     1101  <x:anchor-alias value="OWS"/>
     1102<t>
     1103   Each HTTP header field consists of a case-insensitive field name
     1104   followed by a colon (":"), optional whitespace, and the field value.
     1105</t>
     1106<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="header-field"/><iref primary="true" item="Grammar" subitem="field-name"/><iref primary="true" item="Grammar" subitem="field-value"/><iref primary="true" item="Grammar" subitem="field-content"/>
     1107  <x:ref>header-field</x:ref>   = <x:ref>field-name</x:ref> ":" OWS [ <x:ref>field-value</x:ref> ] OWS
    10841108  <x:ref>field-name</x:ref>     = <x:ref>token</x:ref>
    10851109  <x:ref>field-value</x:ref>    = *( <x:ref>field-content</x:ref> / <x:ref>OWS</x:ref> )
     
    10871111</artwork></figure>
    10881112<t>
    1089    Historically, HTTP has allowed field-content with text in the ISO-8859-1
    1090    <xref target="ISO-8859-1"/> character encoding (allowing other character sets
    1091    through use of <xref target="RFC2047"/> encoding). In practice, most HTTP
    1092    header field-values use only a subset of the US-ASCII charset
    1093    <xref target="USASCII"/>. Newly defined header fields &SHOULD; constrain
    1094    their field-values to US-ASCII characters. Recipients &SHOULD; treat other
    1095    (obs-text) octets in field-content as opaque data.
    1096 </t>
    1097 <t>
    1098    No whitespace is allowed between the header field-name and colon. For
     1113   No whitespace is allowed between the header field name and colon. For
    10991114   security reasons, any request message received containing such whitespace
    1100    &MUST; be rejected with a response code of 400 (Bad Request) and any such
    1101    whitespace in a response message &MUST; be removed.
    1102 </t>
    1103 <t>
    1104    The field value &MAY; be preceded by optional whitespace; a single SP is
    1105    preferred. The field-value does not include any leading or trailing white
     1115   &MUST; be rejected with a response code of 400 (Bad Request). A proxy
     1116   &MUST; remove any such whitespace from a response message before
     1117   forwarding the message downstream.
     1118</t>
     1119<t>
     1120   A field value &MAY; be preceded by optional whitespace (OWS); a single SP is
     1121   preferred. The field value does not include any leading or trailing white
    11061122   space: OWS occurring before the first non-whitespace character of the
    1107    field-value or after the last non-whitespace character of the field-value
    1108    is ignored and &MAY; be removed without changing the meaning of the header
     1123   field value or after the last non-whitespace character of the field value
     1124   is ignored and &SHOULD; be removed without changing the meaning of the header
    11091125   field.
    11101126</t>
     1127<t>
     1128   The order in which header fields with differing field names are
     1129   received is not significant. However, it is "good practice" to send
     1130   header fields that contain control data first, such as Host on
     1131   requests and Date on responses, so that implementations can decide
     1132   when not to handle a message as early as possible.  A server &MUST;
     1133   wait until the entire header section is received before interpreting
     1134   a request message, since later header fields might include conditionals,
     1135   authentication credentials, or deliberately misleading duplicate
     1136   header fields that would impact request processing.
     1137</t>
     1138<t>
     1139   Multiple header fields with the same field name &MAY; be
     1140   sent in a message if and only if the entire field value for that
     1141   header field is defined as a comma-separated list [i.e., #(values)].
     1142   Multiple header fields with the same field name can be combined into
     1143   one "field-name: field-value" pair, without changing the semantics of the
     1144   message, by appending each subsequent field value to the combined
     1145   field value in order, separated by a comma. The order in which
     1146   header fields with the same field name are received is therefore
     1147   significant to the interpretation of the combined field value;
     1148   a proxy &MUST-NOT; change the order of these field values when
     1149   forwarding a message.
     1150</t>
     1151<x:note>
     1152  <t>
     1153   <x:h>Note:</x:h> the "Set-Cookie" header as implemented in
     1154   practice (as opposed to how it is specified in <xref target="RFC2109"/>)
     1155   can occur multiple times, but does not use the list syntax, and thus cannot
     1156   be combined into a single line. (See Appendix A.2.3 of <xref target="Kri2001"/>
     1157   for details.) Also note that the Set-Cookie2 header specified in
     1158   <xref target="RFC2965"/> does not share this problem.
     1159  </t>
     1160</x:note>
    11111161<t>
    11121162   Historically, HTTP header field values could be extended over multiple
     
    11221172   or forwarding the message downstream.
    11231173</t>
     1174<t>
     1175   Historically, HTTP has allowed field content with text in the ISO-8859-1
     1176   <xref target="ISO-8859-1"/> character encoding and supported other
     1177   character sets only through use of <xref target="RFC2047"/> encoding.
     1178   In practice, most HTTP header field values use only a subset of the
     1179   US-ASCII character encoding <xref target="USASCII"/>. Newly defined
     1180   header fields &SHOULD; limit their field values to US-ASCII characters.
     1181   Recipients &SHOULD; treat other (obs-text) octets in field content as
     1182   opaque data.
     1183</t>
    11241184<t anchor="rule.comment">
    11251185  <x:anchor-alias value="comment"/>
     
    11281188   the comment text with parentheses. Comments are only allowed in
    11291189   fields containing "comment" as part of their field value definition.
    1130    In all other fields, parentheses are considered part of the field
    1131    value.
    11321190</t>
    11331191<figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="comment"/><iref primary="true" item="Grammar" subitem="ctext"/>
     
    11361194                 ; <x:ref>OWS</x:ref> / &lt;<x:ref>VCHAR</x:ref> except "(", ")", and "\"&gt; / <x:ref>obs-text</x:ref>
    11371195</artwork></figure>
    1138 <t>
    1139    The order in which header fields with differing field names are
    1140    received is not significant. However, it is "good practice" to send
    1141    general-header fields first, followed by request-header or response-header
    1142    fields, and ending with the entity-header fields.
    1143 </t>
    1144 <t>
    1145    Multiple message-header fields with the same field-name &MAY; be
    1146    present in a message if and only if the entire field-value for that
    1147    header field is defined as a comma-separated list [i.e., #(values)].
    1148    It &MUST; be possible to combine the multiple header fields into one
    1149    "field-name: field-value" pair, without changing the semantics of the
    1150    message, by appending each subsequent field-value to the first, each
    1151    separated by a comma. The order in which header fields with the same
    1152    field-name are received is therefore significant to the
    1153    interpretation of the combined field value, and thus a proxy &MUST-NOT;
    1154    change the order of these field values when a message is forwarded.
    1155 </t>
    1156 <x:note>
    1157   <t>
    1158    <x:h>Note:</x:h> the "Set-Cookie" header as implemented in
    1159    practice (as opposed to how it is specified in <xref target="RFC2109"/>)
    1160    can occur multiple times, but does not use the list syntax, and thus cannot
    1161    be combined into a single line. (See Appendix A.2.3 of <xref target="Kri2001"/>
    1162    for details.) Also note that the Set-Cookie2 header specified in
    1163    <xref target="RFC2965"/> does not share this problem.
    1164   </t>
    1165 </x:note>
    11661196 
    11671197</section>
     
    11941224   The presence of a message-body in a request is signaled by the
    11951225   inclusion of a Content-Length or Transfer-Encoding header field in
    1196    the request's message-headers.
     1226   the request's header fields.
    11971227   When a request message contains both a message-body of non-zero
    11981228   length and a method that does not define any semantics for that
     
    23712401
    23722402
    2373 <section title="Header Field Definitions" anchor="header.fields">
     2403<section title="Header Field Definitions" anchor="header.field.definitions">
    23742404<t>
    23752405   This section defines the syntax and semantics of HTTP/1.1 header fields
     
    40594089</t>
    40604090<t>
    4061    The line terminator for message-header fields is the sequence CRLF.
     4091   The line terminator for header fields is the sequence CRLF.
    40624092   However, we recommend that applications, when parsing such headers,
    40634093   recognize a single LF as a line terminator and ignore the leading CR.
     
    42974327<t>
    42984328  Require that invalid whitespace around field-names be rejected.
    4299   (<xref target="message.headers"/>)
     4329  (<xref target="header.fields"/>)
    43004330</t>
    43014331<t>
     
    43874417<x:ref>HTTP-Version</x:ref> = HTTP-Prot-Name "/" 1*DIGIT "." 1*DIGIT
    43884418<x:ref>HTTP-date</x:ref> = rfc1123-date / obs-date
    4389 <x:ref>HTTP-message</x:ref> = Request / Response
     4419<x:ref>HTTP-message</x:ref> = start-line *( header-field CRLF ) CRLF [
     4420 message-body ]
    43904421<x:ref>Host</x:ref> = "Host:" OWS Host-v
    43914422<x:ref>Host-v</x:ref> = uri-host [ ":" port ]
     
    44754506<x:ref>general-header</x:ref> = Cache-Control / Connection / Date / Pragma / Trailer
    44764507 / Transfer-Encoding / Upgrade / Via / Warning
    4477 <x:ref>generic-message</x:ref> = start-line *( message-header CRLF ) CRLF [
    4478  message-body ]
    44794508
    44804509<x:ref>hour</x:ref> = 2DIGIT
     
    44864515<x:ref>message-body</x:ref> = entity-body /
    44874516 &lt;entity-body encoded as per Transfer-Encoding&gt;
    4488 <x:ref>message-header</x:ref> = field-name ":" OWS [ field-value ] OWS
     4517<x:ref>header-field</x:ref> = field-name ":" OWS [ field-value ] OWS
    44894518<x:ref>minute</x:ref> = 2DIGIT
    44904519<x:ref>month</x:ref> = %x4A.61.6E ; Jan
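
The parsing procedure added in r647 (read the start-line, read each header field into a hash table by field name until the empty line, then hand off the remaining octets) can be sketched roughly as follows. This is only an illustrative sketch, not code from the draft or from any implementation: the function name `parse_head` and the use of `ValueError` to stand in for a 400 (Bad Request) response are assumptions made here, and obs-fold line folding, token validation of field names, and the Set-Cookie exception noted in the changeset are deliberately left out.

```python
def parse_head(data: bytes):
    """Split an HTTP/1.1 message into (start_line, headers, rest).

    Hypothetical helper: operates on octets, never on decoded text,
    as the revised draft text recommends.
    """
    # Robustness: ignore one empty line where the start-line is expected.
    if data.startswith(b"\r\n"):
        data = data[2:]

    # The header section ends at the first empty line (CRLF CRLF).
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header section")

    lines = head.split(b"\r\n")
    start_line = lines[0]

    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(b":")
        if not colon or not name:
            raise ValueError("400 Bad Request: malformed header field")
        # No whitespace is allowed between the field name and the colon
        # (this check also rejects leading whitespace, so folded lines
        # are refused rather than unfolded in this sketch).
        if name != name.strip(b" \t"):
            raise ValueError("400 Bad Request: whitespace around field name")

        key = name.lower()            # field names are case-insensitive
        value = value.strip(b" \t")   # leading/trailing OWS is not part of the value

        # Repeated field names are combined in order, separated by a comma.
        if key in headers:
            headers[key] += b", " + value
        else:
            headers[key] = value

    return start_line, headers, rest
```

For example, `parse_head(b"GET / HTTP/1.1\r\nAccept: a\r\nACCEPT:  b \r\n\r\n")` yields a single `b"accept"` entry with the combined value `b"a, b"`, while a line such as `Host : x` raises the error that stands in for the 400 response. Whether a message-body follows, and how long it is, would still have to be determined from the parsed fields as the draft describes.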