wiki:ContentDispositionErrorHandling

Version 18 (modified by ietf@…, 9 years ago)

--

A proposal for optional error handling in processing the Content-Disposition header field

Text from Julian

5. Parsing

This document does not require any specific handling of invalid header field values. With this in mind, the text below describes a simple strategy for parsing the header field and detecting problems in general, or in specific parameters.

5.1. Combine Multiple Instances of Content-Disposition

If the HTTP message contains multiple instances of the Content-Disposition header field, combine all field values into a single one as specified in Section 4.2 of [RFC2616].
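
As a rough illustration of this step, the following Python sketch joins several field values with commas, which is how Section 4.2 of [RFC2616] combines repeated header fields. The function name `combine_field_values` is hypothetical and not part of the proposal.

```python
def combine_field_values(values):
    """Join repeated Content-Disposition field values into one string.

    Hypothetical helper: RFC 2616, Section 4.2 combines repeated header
    fields by appending each value to the first, separated by a comma.
    """
    return ", ".join(value.strip() for value in values)


combined = combine_field_values(["attachment", "inline; filename=a.txt"])
```

A combined value of this kind names more than one disposition type, so it does not conform to the simplified grammar in Section 5.2 and the whole header field ends up being ignored.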

5.2. Parsing for Disposition Type and Parameters

Using the simplified grammar below:

field-value = disp-type *( ";" param )
disp-type   = token
param       = token "=" value

...parse the field value into a disp-type (disposition type) and a sequence of parameters (pairs of name (token) and value).

Treat the result values as characters encoded using the ISO-8859-1 character encoding ([ISO-8859-1]).

Lower-case all disposition types and parameter names (note that these characters will all fall into the US-ASCII range by definition in the ABNF).

If the field value does not conform to the grammar (such as when not exactly one disposition type is specified), ignore the whole header field.
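
The parsing step above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a normative implementation: the names `parse_content_disposition`, `TOKEN`, `QUOTED`, `DISP`, and `PARAM` are hypothetical, a value is assumed to be either a token or a quoted-string, and optional whitespace around ";" and "=" is tolerated.

```python
import re

# token characters per RFC 2616: any CHAR except CTLs and separators
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string, allowing backslash escapes (quoted-pair)
QUOTED = r'"(?:[^"\\]|\\.)*"'
DISP = re.compile(r"\s*(" + TOKEN + r")")
PARAM = re.compile(
    r"\s*;\s*(" + TOKEN + r")\s*=\s*(" + QUOTED + r"|" + TOKEN + r")"
)


def parse_content_disposition(raw: bytes):
    """Return (disp_type, [(name, raw_value), ...]) or None if invalid."""
    # Treat the octets as ISO-8859-1 characters, as the text suggests.
    text = raw.decode("iso-8859-1")
    match = DISP.match(text)
    if match is None:
        return None
    disp_type = match.group(1).lower()
    pos = match.end()
    params = []
    while True:
        match = PARAM.match(text, pos)
        if match is None:
            break
        # Parameter names are lower-cased; values are kept verbatim here.
        params.append((match.group(1).lower(), match.group(2)))
        pos = match.end()
    # Anything left over means the value does not conform to the grammar.
    if text[pos:].strip():
        return None
    return disp_type, params
```

A return value of None stands for "ignore the whole header field". Values are returned unprocessed, so quoted strings still carry their quotes at this point.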

5.3. Checking Cardinality Constraints

If the parameter sequence contains multiple instances of the same parameter name, ignore the whole header field.
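
A minimal Python sketch of this check, assuming the parameters are available as a list of (name, value) pairs with names already lower-cased. The function name `has_duplicate_parameters` is hypothetical.

```python
def has_duplicate_parameters(params):
    """Return True if any parameter name occurs more than once.

    `params` is assumed to be a list of (name, value) pairs whose names
    have already been lower-cased by the parsing step.
    """
    names = [name for name, _value in params]
    return len(names) != len(set(names))
```

When this returns True, the caller would ignore the whole header field.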

5.4. Post-Process Parameter Values

For each parameter, post-process the associated value part according to the grammar:

o According to Section 3.2.1 of [RFC5987] for parameters using the RFC 5987 syntax (such as "filename*"). If this fails, just ignore this parameter.

o According to the grammar for quoted-string (Section 2.2 of [RFC2616]) for values starting with a double quote character (").

o Verbatim otherwise.

Note that this step starts with an octet sequence obtained from the HTTP message, and results in a sequence of Unicode characters.
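
The three cases above might look like this in Python. This is a sketch under assumptions: `postprocess_value` and `EXT_VALUE` are hypothetical names, the raw value is assumed to be a string obtained by decoding the octets as ISO-8859-1, only the UTF-8 and ISO-8859-1 charsets are accepted, and the language tag is matched loosely rather than validated.

```python
import re
from urllib.parse import unquote_to_bytes

# Approximation of the RFC 5987 ext-value: charset ' [language] ' value-chars
EXT_VALUE = re.compile(
    r"([A-Za-z0-9!#$%&+\-^_`{}~]+)"                       # charset
    r"'([A-Za-z0-9\-]*)'"                                 # optional language
    r"((?:%[0-9A-Fa-f]{2}|[A-Za-z0-9!#$&+\-.^_`|~])*)"    # value-chars
)


def postprocess_value(name: str, raw: str):
    """Return the decoded value, or None if the parameter is to be ignored."""
    if name.endswith("*"):
        match = EXT_VALUE.fullmatch(raw)
        if match is None:
            return None
        charset = match.group(1).lower()
        if charset not in ("utf-8", "iso-8859-1"):
            return None
        try:
            # Percent-decode to octets, then decode with the stated charset.
            return unquote_to_bytes(match.group(3)).decode(charset)
        except UnicodeDecodeError:
            return None
    if raw.startswith('"'):
        # quoted-string: drop the surrounding quotes, resolve quoted-pairs.
        return re.sub(r"\\(.)", r"\1", raw[1:-1], flags=re.S)
    return raw  # verbatim otherwise
```

A None result means only that one parameter is dropped; the rest of the header field remains usable.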

5.5. Extracting the Disposition Type

The parsing step (Appendix 5.2) has returned the disposition type (to be matched case-insensitively), which can be "attachment", "inline", or an extension type. If the type is unknown, treat it like "attachment" (see Section 3.2).

5.6. Determining the File Name

The parsing and post-processing steps resulted in a set of parameters (name/value pairs). The suggested file name is the value of the "filename*" parameter (when present), otherwise the value of the "filename" parameter.

If neither is given, the UA can determine a name based on the associated URI; for instance based on the last path segment.

Otherwise, the UA ought to post-process the suggested filename according to Section 3.3. [[anchor10: We could say here that UAs may reject filenames for security reasons, such as those with a path separator character.]]
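
Sections 5.5 and 5.6 can be summarized in a short Python sketch. It assumes the parameters have already been parsed and post-processed into a dict keyed by lower-cased name; the function name `disposition_and_filename` is hypothetical, and the post-processing of Section 3.3 is deliberately left out.

```python
from urllib.parse import urlsplit


def disposition_and_filename(disp_type: str, params: dict, url: str):
    """Return (effective disposition type, suggested file name)."""
    known = ("attachment", "inline")
    lowered = disp_type.lower()
    # Unknown (extension) types are treated like "attachment".
    effective = lowered if lowered in known else "attachment"

    if "filename*" in params:
        name = params["filename*"]      # preferred when present
    elif "filename" in params:
        name = params["filename"]
    else:
        # Fall back to the last path segment of the associated URI.
        name = urlsplit(url).path.rsplit("/", 1)[-1]
    return effective, name
```

The URI fallback is only one of the options the text allows; a UA may derive a name from the URI in other ways.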

Extracting Parameter Values From Header Fields

To extract the value for a given parameter-name from an unparsed-string, parse the unparsed-string using the following grammar:

unparsed-string  = unbalanced-block / block * ( ";" block ) [ ";" unbalanced-block ]
block            = *run
unbalanced-block = *run unbalanced-run
run              = unquoted-run / quoted-run
unquoted-run     = non-quote *boring-octet
quoted-run       = <"> *non-quote <">
unbalanced-run   = <"> *non-quote
non-quote        = <OCTET, except <"> >
boring-octet     = <OCTET, except <"> and ";">

Parse each block, in turn, (including the unbalanced-block, if present) using the following grammar:

block = *LWS name *LWS "=" value
value = *OCTET

where the name production is a grammatical production that is a case-insensitive match for the given parameter-name. If any block can be parsed by the grammar, let the raw-value be the characters produced by the value production of the first such block. Otherwise, let the raw-value be the empty string.

If the raw-value both begins and ends with a <"> character, return the value stripped of those <"> characters. Otherwise, return the raw-value.

jre: this changes the interpretation of escape sequences in quoted-string, thus changes the interpretation of valid header fields
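
The extraction procedure in this section could be sketched in Python as follows. The names `split_blocks` and `extract_parameter` are hypothetical. The sketch reads the grammar as "split on ';' outside of double quotes", approximates LWS as spaces, tabs, CR, and LF, and assumes that a raw-value must be at least two characters long before its quotes are stripped.

```python
def split_blocks(unparsed: str):
    """Split on ';' outside double quotes; a trailing unbalanced quote
    swallows the rest of the string, as unbalanced-block does."""
    blocks, current, in_quotes = [], [], False
    for ch in unparsed:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ";" and not in_quotes:
            blocks.append("".join(current))
            current = []
        else:
            current.append(ch)
    blocks.append("".join(current))
    return blocks


def extract_parameter(unparsed: str, parameter_name: str) -> str:
    """Return the value of the first block matching parameter_name."""
    raw = ""
    for block in split_blocks(unparsed):
        name, sep, value = block.partition("=")
        # The name is matched case-insensitively, ignoring surrounding LWS.
        if sep and name.strip(" \t\r\n").lower() == parameter_name.lower():
            raw = value
            break
    # Assumption: only strip when there are two distinct quote characters.
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    return raw
```

Because the grammar has no notion of backslash escapes, a field value such as `attachment; filename="a\"b"` yields `a\"b` with the backslash intact in this sketch.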

Decoding the File Name

To filename-decode an encoded-string, use the following algorithm:

  1. If the encoded-string contains non-ASCII characters, emit the encoded-string (decoded as ISO-8859-1) and abort these steps.
  2. Let the url-unescaped-string be the encoded-string %-unescaped.
  3. Emit the url-unescaped-string (decoded as UTF-8). (There's actually more sadness here if the url-unescaped-string isn't valid UTF-8.)

The emitted characters are the decoded file name.

jre: percent-unescaping is only done by Chrome and IE, and breaks the semantics of valid header fields
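
A minimal Python sketch of the filename-decode steps, assuming the encoded-string is available as raw octets. The function name `filename_decode` is hypothetical. The algorithm leaves the invalid-UTF-8 case open, so substituting U+FFFD for undecodable octets is purely an assumption of this sketch.

```python
from urllib.parse import unquote_to_bytes


def filename_decode(encoded: bytes) -> str:
    # Step 1: any non-ASCII octet means the string is taken as ISO-8859-1.
    if any(octet > 0x7F for octet in encoded):
        return encoded.decode("iso-8859-1")
    # Step 2: %-unescape into octets.
    unescaped = unquote_to_bytes(encoded)
    # Step 3: decode as UTF-8. errors="replace" is an assumption; the
    # proposal does not say what to do with invalid UTF-8.
    return unescaped.decode("utf-8", errors="replace")
```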

Determining the File Name

To determine the file name indicated by a Content-Disposition header field, use the following algorithm:

  1. Let filename-star be the value extracted from the Content-Disposition header field for the "filename*" parameter.
  2. If filename-star parses as an RFC5987-value, return the RFC5987-value of filename-star and abort these steps.
  3. Let filename be the value extracted from the Content-Disposition header field for the "filename" parameter.
  4. If filename is empty, instead let filename be the value extracted from the Content-Disposition header field for the "name" parameter.
  5. If filename is empty, return the empty string and abort these steps.
  6. Return the filename-decoding of filename.

jre: 'name' is only supported by FF and Chrome, see http://greenbytes.de/tech/tc2231/#attwithnamepct
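
The six steps above can be written down as a small Python function. To keep the sketch self-contained, the extraction results and the two decoders are passed in as arguments; the names `determine_filename`, `extracted`, `parse_rfc5987`, and `filename_decode` are hypothetical, and `parse_rfc5987` is assumed to return None when its input is not an RFC5987-value.

```python
from typing import Callable, Mapping, Optional


def determine_filename(
    extracted: Mapping[str, str],
    parse_rfc5987: Callable[[str], Optional[str]],
    filename_decode: Callable[[str], str],
) -> str:
    """`extracted` maps parameter names to extracted values; a missing
    key is treated as the empty string."""
    # Steps 1-2: prefer "filename*" when it parses as an RFC5987-value.
    filename_star = extracted.get("filename*", "")
    decoded = parse_rfc5987(filename_star)
    if decoded is not None:
        return decoded
    # Steps 3-4: fall back to "filename", then to "name".
    filename = extracted.get("filename", "")
    if filename == "":
        filename = extracted.get("name", "")
    # Step 5: nothing usable was found.
    if filename == "":
        return ""
    # Step 6: apply filename-decoding.
    return filename_decode(filename)
```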

Determining the Disposition

To determine the disposition-type, parse the Content-Disposition header field using the following grammar:

unparsed-string = *LWS nominal-type *OCTET
nominal-type    = "inline" / "filename" / "name" / ";"

If the Content-Disposition header field is non-empty and fails to parse, then the disposition-type is "attachment". Otherwise, the disposition-type is "inline".

jre: (a) the instructions are very confusing, because the grammar doesn't mention the actual disposition types, (b) it's not clear what problem is solved here; what illegal header fields is this parsing that occur in practice?
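
Read literally, the rule in this section amounts to a prefix test, which the following Python sketch illustrates. The names `NOMINAL` and `determine_disposition` are hypothetical. Case-insensitive matching of the literals is an assumption (string literals in RFC 2616 grammars are case-insensitive), and LWS is approximated as spaces, tabs, CR, and LF.

```python
import re

# *LWS nominal-type, followed by arbitrary octets (not checked further).
NOMINAL = re.compile(r"[ \t\r\n]*(?:inline|filename|name|;)", re.IGNORECASE)


def determine_disposition(field_value: str) -> str:
    # Non-empty and not parseable by the grammar: "attachment".
    if field_value != "" and NOMINAL.match(field_value) is None:
        return "attachment"
    # Empty, or starting with one of the nominal types: "inline".
    return "inline"
```

Under this reading, a value such as `attachment; filename=a.txt` is classified as "attachment" only because it fails to match the grammar, and so is any other unrecognized value.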

Processing the Content-Disposition Header Field

To process the Content-Disposition header field, use the following algorithm:

  1. Determine the disposition-type.
  2. If the disposition-type is "inline", then ...
  3. If the disposition-type is "attachment", then let filename be the file name indicated by the header field. ...