
Version 23 (modified by ietf@…, 9 years ago)

--

A proposal for optional error handling in processing the Content-Disposition header field

This document does not require any specific handling of invalid header field values. With this in mind, the text below describes a simple strategy for parsing the header field and detecting problems in general, or in specific parameters.

Combine Multiple Instances of Content-Disposition

If the HTTP message contains multiple instances of the Content-Disposition header field, combine all field values into a single one as specified in Section 4.2 of [RFC2616].
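As a rough illustration of the combining step, the sketch below joins the values in the order they were received, separated by a comma, which is how Section 4.2 of [RFC2616] describes merging repeated header fields. The function name combine_field_values is hypothetical, and the input is assumed to be a list of already-extracted field values.

```python
def combine_field_values(values):
    """Join repeated header field values in arrival order with a comma.

    `values` is assumed to hold the field values of every
    Content-Disposition instance, first to last.
    """
    return ", ".join(values)


combined = combine_field_values(["attachment", 'filename="a.txt"'])
print(combined)
```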

Parsing for Disposition Type and Parameters

Parse the field value using the state machine below (starting in the INITIAL state):

INITIAL:
  <">   => Buffer the current character and switch to the QUOTED state.
  ";"   => Emit the buffered characters.
  EOF   => Emit the buffered characters.
  OTHER => Buffer the current character.

QUOTED:
  <">   => Buffer the current character and switch to the INITIAL state.
  EOF   => Emit the buffered characters.
  OTHER => Buffer the current character.
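The state machine above might be sketched as follows. This is only one possible reading: it assumes that the buffer is cleared after every emit, that the ";" separator itself is not buffered, and that the input is a Python str rather than raw octets. The name split_field_value is hypothetical.

```python
def split_field_value(field_value):
    """Split a field value into the strings emitted by the state machine."""
    emitted = []
    buffer = []
    quoted = False  # False = INITIAL state, True = QUOTED state

    for ch in field_value:
        if ch == '"':
            # A double quote is buffered in both states and toggles the state.
            buffer.append(ch)
            quoted = not quoted
        elif ch == ";" and not quoted:
            # ";" in INITIAL: emit the buffer (assumption: then reset it).
            emitted.append("".join(buffer))
            buffer = []
        else:
            # OTHER in either state, including ";" inside QUOTED.
            buffer.append(ch)

    # EOF emits whatever is buffered, even inside an unterminated quote.
    emitted.append("".join(buffer))
    return emitted


print(split_field_value('attachment; filename="a;b.txt"'))
```

Note that whitespace around each emitted string is preserved at this stage, and that an unterminated quoted run is still emitted at EOF.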

Consider each emitted string of characters in turn:

  1. If the string matches the grammar below, the string is a name-value-pair. Otherwise, the string is a disposition-type.
    name-value-pair = name "=" value
    name            = *<OCTET, except "=">
    value           = *OCTET
    
  2. If the string is the first disposition-type considered, the string is the Disposition Type of the header field.
  3. If the string is a name-value-pair:
    a. Let the canonicalized-name be the name with any ASCII upper-case characters replaced with their lower-case equivalents and with all leading and trailing LWS removed.
    b. If this is the first name-value-pair with this canonicalized-name, then the value of the canonicalized-name parameter of the header field is the value.
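The classification steps above could be sketched like this. The sketch assumes that a string counts as a name-value-pair whenever it contains "=", with the name being everything before the first "=", and it approximates LWS as space, tab, CR and LF. The names classify and ascii_lower are hypothetical.

```python
LWS = " \t\r\n"  # approximation of LWS (assumption of this sketch)


def ascii_lower(text):
    """Lower-case ASCII letters only, leaving all other characters alone."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def classify(emitted_strings):
    """Return (disposition_type, parameters) for the emitted strings."""
    disposition_type = None
    parameters = {}
    for s in emitted_strings:
        name, sep, value = s.partition("=")
        if not sep:
            # No "=": a disposition-type. Only the first one is used.
            if disposition_type is None:
                disposition_type = s
            continue
        canonical = ascii_lower(name.strip(LWS))
        # Only the first pair with a given canonicalized-name wins.
        parameters.setdefault(canonical, value)
    return disposition_type, parameters


print(classify(["attachment", ' FileName ="a.txt"', " filename=b.txt"]))
```

The value is kept exactly as received here; unquoting and decoding belong to the post-processing step.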

Post-Process Parameter Values

For each parameter, post-process the associated value part according to the grammar:

o According to Section 3.2.1 of [RFC5987] for parameters using the RFC 5987 syntax (such as "filename*"). If this fails, just ignore this parameter.

o According to the grammar for quoted-string (Section 2.2 of [RFC2616]) for values starting with a double quote character (").

o Verbatim otherwise.

Note that this step starts with an octet sequence obtained from the HTTP message, and results in a sequence of Unicode characters.
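A minimal sketch of the three post-processing cases follows. Several points are assumptions of the sketch rather than statements of the text above: parameters are recognized as using the RFC 5987 syntax by a trailing "*" in the name, only the UTF-8 and ISO-8859-1 charsets are accepted, the value characters are not validated strictly, a missing closing quote is tolerated, and octets outside the RFC 5987 case are decoded as ISO-8859-1. The name post_process is hypothetical.

```python
from urllib.parse import unquote_to_bytes

SUPPORTED_CHARSETS = {"utf-8", "iso-8859-1"}  # assumption of this sketch


def post_process(name, raw_value):
    """Turn the octets of a parameter value into a str.

    Returns None when the parameter should be ignored.
    """
    if name.endswith("*"):
        # RFC 5987 ext-value: charset "'" [ language ] "'" value-chars
        try:
            text = raw_value.decode("ascii")
            charset, _language, encoded = text.split("'", 2)
            if charset.lower() not in SUPPORTED_CHARSETS:
                return None
            return unquote_to_bytes(encoded).decode(charset)
        except ValueError:  # also covers UnicodeDecodeError
            return None

    if raw_value.startswith(b'"'):
        # quoted-string: drop the quotes and resolve backslash escapes.
        out = bytearray()
        i = 1
        while i < len(raw_value):
            octet = raw_value[i]
            if octet == 0x22:  # closing double quote
                break
            if octet == 0x5C and i + 1 < len(raw_value):  # backslash
                i += 1
                octet = raw_value[i]
            out.append(octet)
            i += 1
        return out.decode("iso-8859-1")

    # Verbatim otherwise (assumption: octets are read as ISO-8859-1).
    return raw_value.decode("iso-8859-1")


print(post_process("filename*", b"UTF-8''%e2%82%ac%20rates.txt"))
```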

Extracting the Disposition Type

The parsing step (Appendix 5.2) has returned the disposition type (to be matched case-insensitively), which can be "attachment", "inline", or an extension type. If the type is unknown, treat it like "attachment" (see Section 3.2).
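For illustration, a UA that implements no extension types might map the parsed type as sketched below. Trimming surrounding whitespace before the comparison is an assumption of the sketch, and effective_disposition is a hypothetical name.

```python
def effective_disposition(disposition_type):
    """Map a parsed disposition type to "inline" or "attachment".

    Assumes the UA knows no extension types, so anything that is not
    "inline" is treated like "attachment".
    """
    normalized = disposition_type.strip(" \t\r\n").lower()
    return "inline" if normalized == "inline" else "attachment"


print(effective_disposition("INLINE"))
print(effective_disposition("x-custom"))
```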

Determining the File Name

The parsing and post-processing steps resulted in a set of parameters (name/value pairs). The suggested file name is the value of the "filename*" parameter (when present), otherwise the value of the "filename" parameter.

If neither is given, the UA can determine a name based on the associated URI; for instance based on the last path segment.

Otherwise, the UA ought to post-process the suggested filename according to Section 3.3. [[anchor10: We could say here that UAs may reject filenames for security reasons, such as those with a path separator character.]]
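The selection order described in this section could look roughly like the sketch below. It assumes that the parameters have already been post-processed into a dict keyed by canonicalized name, and it takes the last path segment of the URI as the fallback. The post-processing of Section 3.3 is not shown. The name suggested_filename is hypothetical.

```python
from urllib.parse import urlsplit


def suggested_filename(parameters, uri):
    """Pick the suggested file name from post-processed parameters.

    "filename*" takes precedence over "filename"; if neither is present,
    fall back to the last path segment of the associated URI.
    """
    for key in ("filename*", "filename"):
        if key in parameters:
            return parameters[key]
    return urlsplit(uri).path.rsplit("/", 1)[-1]


print(suggested_filename({"filename": "a.txt", "filename*": "b.txt"},
                         "http://example.com/x"))
```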

Old text (to be integrated)

Extracting Parameter Values From Header Fields

To extract the value for a given parameter-name from an unparsed-string, parse the unparsed-string using the following grammar:

unparsed-string  = unbalanced-block / block *( ";" block ) [ ";" unbalanced-block ]
block            = *run
unbalanced-block = *run unbalanced-run
run              = unquoted-run / quoted-run
unquoted-run     = non-quote *boring-octet
quoted-run       = <"> *non-quote <">
unbalanced-run   = <"> *non-quote
non-quote        = <OCTET, except <"> >
boring-octet     = <OCTET, except <"> and ";">

Parse each block, in turn, (including the unbalanced-block, if present) using the following grammar:

block = *LWS name *LWS "=" value
value = *OCTET

where the name production is a case-insensitive match for the given parameter-name. If any block can be parsed by the grammar, let the raw-value be the characters produced by the value production of the first such block. Otherwise, let the raw-value be the empty string.

If the raw-value both begins and ends with a <"> character, return the value stripped of those <"> characters. Otherwise return the raw-value.
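The block-level part of this extraction might be sketched as follows. The sketch assumes that the unparsed-string has already been split into blocks, that the value may be any run of octets, that LWS is space, tab, CR or LF, and that a lone <"> is not stripped. The name extract_parameter is hypothetical.

```python
import re


def extract_parameter(blocks, parameter_name):
    """Return the value of the first block matching parameter_name."""
    # block = *LWS name *LWS "=" value, with a case-insensitive name
    pattern = re.compile(
        r"[ \t\r\n]*" + re.escape(parameter_name) + r"[ \t\r\n]*=(.*)",
        re.IGNORECASE | re.DOTALL,
    )
    raw_value = ""
    for block in blocks:
        match = pattern.fullmatch(block)
        if match:
            raw_value = match.group(1)
            break  # only the first parsable block counts

    # Strip one pair of surrounding quotes; escapes are left untouched.
    if len(raw_value) >= 2 and raw_value[0] == '"' and raw_value[-1] == '"':
        return raw_value[1:-1]
    return raw_value


print(extract_parameter(["attachment", ' FILENAME ="a.txt"'], "filename"))
```

Because the quotes are removed without interpreting backslash escapes, a value such as "a\"b" keeps its backslash, which is the behaviour the comment below objects to.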

jre: this changes the interpretation of escape sequences in quoted-string, thus changes the interpretation of valid header fields

Decoding the File Name

To filename-decode an encoded-string, use the following algorithm:

  1. If the encoded-string contains non-ASCII characters, emit the encoded-string (decoded as ISO-8859-1) and abort these steps.
  2. Let the url-unescaped-string be the encoded-string %-unescaped.
  3. Emit the url-unescaped-string (decoded as UTF-8). (There's actually more sadness here if the url-unescaped-string isn't valid UTF-8.)

The emitted characters are the decoded file name.
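A minimal sketch of the filename-decode steps, assuming the encoded-string is available as bytes. The algorithm leaves the handling of invalid UTF-8 open; substituting U+FFFD via errors="replace" is purely an assumption of this sketch. The name filename_decode is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def filename_decode(encoded):
    """Decode an encoded-string (bytes) into a file name (str)."""
    # Step 1: any non-ASCII octet means the string is read as ISO-8859-1.
    if any(octet > 0x7F for octet in encoded):
        return encoded.decode("iso-8859-1")
    # Step 2: %-unescape the ASCII string.
    unescaped = unquote_to_bytes(encoded)
    # Step 3: interpret the result as UTF-8 (assumption: replace bad bytes).
    return unescaped.decode("utf-8", errors="replace")


print(filename_decode(b"caf%C3%A9.txt"))
```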

jre: percent-unescaping is only done by Chrome and IE, and breaks the semantics of valid header fields

Determining the File Name

To determine the file name indicated by a Content-Disposition header field, use the following algorithm:

  1. Let filename-star be the value extracted from the Content-Disposition header field for the "filename*" parameter.
  2. If filename-star parses as an RFC5987-value, return the RFC5987-value of filename-star and abort these steps.
  3. Let filename be the value extracted from the Content-Disposition header field for the "filename" parameter.
  4. If filename is empty, instead let filename be the value extracted from the Content-Disposition header field for the "name" parameter.
  5. If filename is empty, return the empty string and abort these steps.
  6. Return the filename-decoding of filename.

jre: 'name' is only supported by FF and Chrome, see http://greenbytes.de/tech/tc2231/#attwithnamepct

Determining the Disposition

To determine the disposition-type, parse the Content-Disposition header field using the following grammar:

unparsed-string = *LWS nominal-type *OCTET
nominal-type    = "inline" / "filename" / "name" / ";"

If the Content-Disposition header field is non-empty and fails to parse, then the disposition type is "attachment". Otherwise, the disposition-type is "inline".
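Read literally, the rule above could be sketched as below. The sketch assumes that the quoted literals match case-insensitively, as is the default for literals in [RFC2616] grammars, and that LWS is space, tab, CR or LF. The name old_disposition is hypothetical.

```python
import re

# unparsed-string = *LWS nominal-type *OCTET
NOMINAL_TYPE = re.compile(r"[ \t\r\n]*(inline|filename|name|;)", re.IGNORECASE)


def old_disposition(field_value):
    """Apply the rule literally: non-empty and unparsable means attachment."""
    if field_value and not NOMINAL_TYPE.match(field_value):
        return "attachment"
    return "inline"


for value in ("attachment; filename=a.txt", "inline", "filename=a.txt", ""):
    print(repr(value), "->", old_disposition(value))
```

Under this reading a field value of "attachment" is classified as an attachment only because it fails to match the grammar.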

jre: (a) the instructions are very confusing, because the grammar doesn't mention the actual disposition types, (b) it's not clear what problem is solved here; what illegal header fields is this parsing that occur in practice?

Processing the Content-Disposition Header Field

To process the Content-Disposition header field, use the following algorithm:

  1. Determine the disposition-type.
  2. If the disposition-type is "inline", then ...
  3. If the disposition-type is "attachment", then let filename be the file name indicated by the header field. ...