Changeset 1424


Timestamp: Aug 31, 2011, 4:38:20 PM
Author: fielding@…
Message:

Clarify that parsing as octets is a MUST requirement (as already
implied by the ABNF) and explain the security issues, as well as
the point at which normal parsers can be used.

Location: draft-ietf-httpbis/latest
Files: 2 edited

  • draft-ietf-httpbis/latest/p1-messaging.html

    r1422 r1424  
    1192 1192         is read or the connection is closed.
    1193 1193      </p>
    1194       <p id="rfc.section.3.1.p.2">Care must be taken to parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII. Attempting
    1195          to parse HTTP as a stream of Unicode characters in a character encoding like UTF-16 might introduce security flaws due to
    1196          the differing ways that such parsers interpret invalid characters.
     1194      <p id="rfc.section.3.1.p.2">Recipients <em class="bcp14">MUST</em> parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII <a href="#USASCII" id="rfc.xref.USASCII.2"><cite title="Coded Character Set -- 7-bit American Standard Code for Information Interchange">[USASCII]</cite></a>. Parsing an HTTP message as a stream of Unicode characters, without regard for the specific encoding, creates security vulnerabilities
     1195         due to the varying ways that string processing libraries handle invalid multibyte character sequences that contain the octet
     1196         LF (%x0A). String-based parsers can only be safely used within protocol elements after the element has been extracted from
     1197         the message, such as within a header field-value after message parsing has delineated the individual fields.
    1197 1198      </p>
    1198 1199      <p id="rfc.section.3.1.p.3">Older HTTP/1.0 client implementations might send an extra CRLF after a POST request as a lame workaround for some early server
     
    1258 1259         (to avoid buffer copying) prior to interpreting the field value or forwarding the message downstream.
    1259 1260      </p>
    1260       <p id="rfc.section.3.2.1.p.4">Historically, HTTP has allowed field content with text in the ISO-8859-1 <a href="#ISO-8859-1" id="rfc.xref.ISO-8859-1.1"><cite title="Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1">[ISO-8859-1]</cite></a> character encoding and supported other character sets only through use of <a href="#RFC2047" id="rfc.xref.RFC2047.1"><cite title="MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text">[RFC2047]</cite></a> encoding. In practice, most HTTP header field values use only a subset of the US-ASCII character encoding <a href="#USASCII" id="rfc.xref.USASCII.2"><cite title="Coded Character Set -- 7-bit American Standard Code for Information Interchange">[USASCII]</cite></a>. Newly defined header fields <em class="bcp14">SHOULD</em> limit their field values to US-ASCII octets. Recipients <em class="bcp14">SHOULD</em> treat other (obs-text) octets in field content as opaque data.
     1261      <p id="rfc.section.3.2.1.p.4">Historically, HTTP has allowed field content with text in the ISO-8859-1 <a href="#ISO-8859-1" id="rfc.xref.ISO-8859-1.1"><cite title="Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1">[ISO-8859-1]</cite></a> character encoding and supported other character sets only through use of <a href="#RFC2047" id="rfc.xref.RFC2047.1"><cite title="MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text">[RFC2047]</cite></a> encoding. In practice, most HTTP header field values use only a subset of the US-ASCII character encoding <a href="#USASCII" id="rfc.xref.USASCII.3"><cite title="Coded Character Set -- 7-bit American Standard Code for Information Interchange">[USASCII]</cite></a>. Newly defined header fields <em class="bcp14">SHOULD</em> limit their field values to US-ASCII octets. Recipients <em class="bcp14">SHOULD</em> treat other (obs-text) octets in field content as opaque data.
    1261 1262      </p>
    1262 1263      <h3 id="rfc.section.3.2.2"><a href="#rfc.section.3.2.2">3.2.2</a>&nbsp;<a id="field.length" href="#field.length">Field Length</a></h3>
     
    4012 4013                     </ul>
    4013 4014                  </li>
    4014                   <li><em>USASCII</em>&nbsp;&nbsp;<a href="#rfc.xref.USASCII.1">1.2</a>, <a href="#rfc.xref.USASCII.2">3.2.1</a>, <a href="#USASCII"><b>13.1</b></a></li>
     4015                  <li><em>USASCII</em>&nbsp;&nbsp;<a href="#rfc.xref.USASCII.1">1.2</a>, <a href="#rfc.xref.USASCII.2">3.1</a>, <a href="#rfc.xref.USASCII.3">3.2.1</a>, <a href="#USASCII"><b>13.1</b></a></li>
    4015 4016                  <li>user agent&nbsp;&nbsp;<a href="#rfc.iref.u.1"><b>2.1</b></a></li>
    4016 4017               </ul>
  • draft-ietf-httpbis/latest/p1-messaging.xml

    r1420 r1424  
    1176 1176 </t>
    1177 1177 <t>
    1178    Care must be taken to parse an HTTP message as a sequence
    1179    of octets in an encoding that is a superset of US-ASCII.  Attempting
    1180    to parse HTTP as a stream of Unicode characters in a character encoding
    1181    like UTF-16 might introduce security flaws due to the differing ways
    1182    that such parsers interpret invalid characters.
     1178   Recipients &MUST; parse an HTTP message as a sequence of octets in an
     1179   encoding that is a superset of US-ASCII <xref target="USASCII"/>.
     1180   Parsing an HTTP message as a stream of Unicode characters, without regard
     1181   for the specific encoding, creates security vulnerabilities due to the
     1182   varying ways that string processing libraries handle invalid multibyte
     1183   character sequences that contain the octet LF (%x0A).  String-based
     1184   parsers can only be safely used within protocol elements after the element
     1185   has been extracted from the message, such as within a header field-value
     1186   after message parsing has delineated the individual fields.
    1183 1187 </t>
    1184 1188 <t>
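
The requirement added in this changeset can be illustrated with a minimal Python sketch. It is not taken from the draft: the function names `split_header_fields` and `decode_field_value` are hypothetical, and the sketch assumes a complete header section terminated by CRLF CRLF, ignores obs-fold line folding, and does not validate field names. It delineates the fields on raw octets first and applies string decoding only to a field value that has already been extracted.

```python
def split_header_fields(raw: bytes) -> list[tuple[bytes, bytes]]:
    """Delineate header fields on octets and return (name, value) pairs as bytes.

    No character decoding happens here, so an invalid multibyte sequence
    cannot hide or invent a line terminator.
    """
    # Everything before the first empty line is the start-line plus header fields.
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")

    fields = []
    for line in lines[1:]:  # lines[0] is the start-line
        name, sep, value = line.partition(b":")
        if not sep:
            raise ValueError("malformed header field: missing ':'")
        # Trim optional whitespace around the value, still working on octets.
        fields.append((name, value.strip(b" \t")))
    return fields


def decode_field_value(value: bytes) -> str:
    """Decode one already-extracted field value for string processing.

    ISO-8859-1 is an assumption made for this sketch: it maps every octet
    to exactly one character, so decoding never fails and never changes
    the length of the value.
    """
    return value.decode("iso-8859-1")
```

A caller would pass the bytes read from the connection to `split_header_fields` and call `decode_field_value` only on the individual values it needs to inspect as text. Values containing obs-text octets can also simply be kept as `bytes` and treated as opaque data.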