Timestamp:
Aug 31, 2011, 4:38:20 PM
Author:
fielding@…
Message:

Clarify that parsing as octets is a MUST requirement (as already
implied by the ABNF) and explain the security issues, as well as
the point at which normal parsers can be used.

File:
1 edited

  • draft-ietf-httpbis/latest/p1-messaging.html

    r1422 r1424  
    1192 1192         is read or the connection is closed.
    1193 1193      </p>
    1194       <p id="rfc.section.3.1.p.2">Care must be taken to parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII. Attempting
    1195          to parse HTTP as a stream of Unicode characters in a character encoding like UTF-16 might introduce security flaws due to
    1196          the differing ways that such parsers interpret invalid characters.
     1194      <p id="rfc.section.3.1.p.2">Recipients <em class="bcp14">MUST</em> parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII <a href="#USASCII" id="rfc.xref.USASCII.2"><cite title="Coded Character Set -- 7-bit American Standard Code for Information Interchange">[USASCII]</cite></a>. Parsing an HTTP message as a stream of Unicode characters, without regard for the specific encoding, creates security vulnerabilities
     1195         due to the varying ways that string processing libraries handle invalid multibyte character sequences that contain the octet
     1196         LF (%x0A). String-based parsers can only be safely used within protocol elements after the element has been extracted from
     1197         the message, such as within a header field-value after message parsing has delineated the individual fields.
    1197 1198      </p>
    1198 1199      <p id="rfc.section.3.1.p.3">Older HTTP/1.0 client implementations might send an extra CRLF after a POST request as a lame workaround for some early server
     
    1258 1259         (to avoid buffer copying) prior to interpreting the field value or forwarding the message downstream.
    1259 1260      </p>
    1260       <p id="rfc.section.3.2.1.p.4">Historically, HTTP has allowed field content with text in the ISO-8859-1 <a href="#ISO-8859-1" id="rfc.xref.ISO-8859-1.1"><cite title="Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1">[ISO-8859-1]</cite></a> character encoding and supported other character sets only through use of <a href="#RFC2047" id="rfc.xref.RFC2047.1"><cite title="MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text">[RFC2047]</cite></a> encoding. In practice, most HTTP header field values use only a subset of the US-ASCII character encoding <a href="#USASCII" id="rfc.xref.USASCII.2"><cite title="Coded Character Set -- 7-bit American Standard Code for Information Interchange">[USASCII]</cite></a>. Newly defined header fields <em class="bcp14">SHOULD</em> limit their field values to US-ASCII octets. Recipients <em class="bcp14">SHOULD</em> treat other (obs-text) octets in field content as opaque data.
     1261      <p id="rfc.section.3.2.1.p.4">Historically, HTTP has allowed field content with text in the ISO-8859-1 <a href="#ISO-8859-1" id="rfc.xref.ISO-8859-1.1"><cite title="Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1">[ISO-8859-1]</cite></a> character encoding and supported other character sets only through use of <a href="#RFC2047" id="rfc.xref.RFC2047.1"><cite title="MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text">[RFC2047]</cite></a> encoding. In practice, most HTTP header field values use only a subset of the US-ASCII character encoding <a href="#USASCII" id="rfc.xref.USASCII.3"><cite title="Coded Character Set -- 7-bit American Standard Code for Information Interchange">[USASCII]</cite></a>. Newly defined header fields <em class="bcp14">SHOULD</em> limit their field values to US-ASCII octets. Recipients <em class="bcp14">SHOULD</em> treat other (obs-text) octets in field content as opaque data.
    1261 1262      </p>
    1262 1263      <h3 id="rfc.section.3.2.2"><a href="#rfc.section.3.2.2">3.2.2</a>&nbsp;<a id="field.length" href="#field.length">Field Length</a></h3>
     
    4012 4013                     </ul>
    4013 4014                  </li>
    4014                   <li><em>USASCII</em>&nbsp;&nbsp;<a href="#rfc.xref.USASCII.1">1.2</a>, <a href="#rfc.xref.USASCII.2">3.2.1</a>, <a href="#USASCII"><b>13.1</b></a></li>
     4015                  <li><em>USASCII</em>&nbsp;&nbsp;<a href="#rfc.xref.USASCII.1">1.2</a>, <a href="#rfc.xref.USASCII.2">3.1</a>, <a href="#rfc.xref.USASCII.3">3.2.1</a>, <a href="#USASCII"><b>13.1</b></a></li>
    4015 4016                  <li>user agent&nbsp;&nbsp;<a href="#rfc.iref.u.1"><b>2.1</b></a></li>
    4016 4017               </ul>
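
The requirement added in section 3.1 can be illustrated with a rough Python sketch. It is not taken from the draft or from any particular implementation: the function names `split_header_fields` and `field_value_text` are hypothetical, and decoding field values as ISO-8859-1 is an assumption chosen only because it maps every octet to exactly one code point. The sketch delineates the header fields while the message is still a sequence of octets, and applies string handling only to a field value that has already been extracted.

```python
def split_header_fields(raw: bytes) -> list[tuple[bytes, bytes]]:
    """Delineate header fields by scanning octets; nothing is decoded here."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    fields = []
    for line in lines[1:]:  # lines[0] is the start-line
        name, sep, value = line.partition(b":")
        if not sep:
            raise ValueError("malformed header field")
        # Trim optional whitespace (SP / HTAB) around the field value.
        fields.append((name, value.strip(b" \t")))
    return fields


def field_value_text(value: bytes) -> str:
    """Decode one already-extracted field value (assumption: ISO-8859-1).

    ISO-8859-1 assigns a code point to every octet, so obs-text octets
    pass through unchanged instead of being dropped or merged.
    """
    return value.decode("iso-8859-1")


raw = b"GET / HTTP/1.1\r\nHost: example.org\r\nX-Note: caf\xe9\r\n\r\nbody"
fields = split_header_fields(raw)
note = field_value_text(fields[1][1])
```

Because the CRLF boundaries are located before any character decoding takes place, an invalid multibyte sequence that happens to contain the octet LF (%x0A) cannot shift where one field ends and the next begins.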