Version 21 (modified by ietf@…, 10 years ago)
A proposal for optional error handling in processing the Content-Disposition header field
This document does not require any specific handling of invalid header field values. With this in mind, the text below describes a simple strategy for parsing the header field and detecting problems in general, or in specific parameters.
Combine Multiple Instances of Content-Disposition
If the HTTP message contains multiple instances of the Content-Disposition header field, combine all field values into a single one as specified in Section 4.2 of [RFC2616].
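As a minimal sketch of this step: Section 4.2 of [RFC2616] combines repeated header fields by appending each later field value to the first, separated by a comma. The function name `combine_field_values` is hypothetical, and the space after the comma is a convention rather than a requirement.

```python
def combine_field_values(values: list[str]) -> str:
    """Join repeated Content-Disposition field values in message order.

    Hypothetical helper: the comma separator follows RFC 2616 Section 4.2;
    the extra space after it is only a readability convention.
    """
    return ", ".join(values)
```

Note that the parser described next treats the inserted comma like any other character, so it ends up inside whichever element happens to be buffered at that point.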
Parsing for Disposition Type and Parameters
Parse the field value using the state machine below (starting in the INITIAL state):
INITIAL:
  <"> => Buffer the current character and switch to the QUOTED state.
  ";" => Emit the buffered characters.
  EOF => Emit the buffered characters.
  OTHER => Buffer the current character.
QUOTED:
  <"> => Buffer the current character and switch to the INITIAL state.
  EOF => Emit the buffered characters.
  OTHER => Buffer the current character.
Consider each emitted string of characters in turn. If the string matches the grammar below, the string is a name-value-pair. Otherwise, the string is a disposition-type.
name-value-pair = name "=" value
name = <OCTET, except "=">
value = OCTET
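The state machine and the classification step might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: emitting is taken to clear the buffer, `name` is read as one or more octets other than "=" and `value` as any run of octets (the grammar as written shows no repetition), and surrounding whitespace is trimmed before classification. The names `split_field_value` and `classify` are hypothetical.

```python
def split_field_value(value: str) -> list[str]:
    """Run the INITIAL/QUOTED state machine and return the emitted strings."""
    emitted = []
    buffer = []
    quoted = False  # False = INITIAL state, True = QUOTED state
    for ch in value:
        if ch == '"':
            # Both states buffer the quote and switch to the other state.
            buffer.append(ch)
            quoted = not quoted
        elif ch == ";" and not quoted:
            # Assumption: emitting also clears the buffer.
            emitted.append("".join(buffer))
            buffer = []
        else:
            buffer.append(ch)
    # EOF emits whatever is buffered, in either state.
    emitted.append("".join(buffer))
    return emitted


def classify(emitted: str) -> tuple:
    """Return ("pair", name, value) or ("type", text).

    Assumption: whitespace around the emitted string is not significant.
    """
    text = emitted.strip()
    name, sep, value = text.partition("=")
    if sep and name:
        return ("pair", name, value)
    return ("type", text)
```

For example, `split_field_value('attachment; filename="a;b.txt"')` yields two strings, because the semicolon inside the quotes is buffered rather than treated as a separator. A backslash-escaped quote inside a quoted string still toggles the state here, exactly as the state machine is written.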
Duplicate Elements
If the header field contains more than one disposition-type, ignore all the disposition-types except the first one.
Of all the name-value-pairs that share a common name (when compared ASCII-case insensitively), ignore all but the first.
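A possible sketch of the two rules above, assuming the parsed elements arrive as tuples in header order, shaped as `("type", text)` or `("pair", name, value)`. That tuple shape and the name `drop_duplicates` are illustrative assumptions, not part of the proposal.

```python
import string

# Translation table that lowercases only A-Z, so the comparison is
# ASCII-case-insensitive rather than Unicode-case-insensitive.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def drop_duplicates(items: list[tuple]) -> tuple:
    """Keep the first disposition-type and the first pair for each name."""
    disposition_type = None
    params = {}
    for item in items:
        if item[0] == "type":
            if disposition_type is None:
                disposition_type = item[1]
        else:
            _, name, value = item
            # setdefault keeps the first value seen for a given name.
            params.setdefault(name.translate(_ASCII_LOWER), value)
    return disposition_type, params
```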
Post-Process Parameter Values
For each parameter, post-process the associated value part according to the grammar:
o According to Section 3.2.1 of [RFC5987] for parameters using the RFC 5987 syntax (such as "filename*"). If this fails, just ignore this parameter.
o According to the grammar for quoted-string (Section 2.2 of [RFC2616]) for values starting with a double quote character (").
o Verbatim otherwise.
Note that this step starts with an octet sequence obtained from the HTTP message, and results in a sequence of Unicode characters.
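The three cases might look roughly like this in code. It is a simplified sketch with several assumptions: only the UTF-8 and ISO-8859-1 charsets are accepted for the RFC 5987 form, malformed percent-escapes are passed through rather than rejected, octets outside the RFC 5987 form are decoded as ISO-8859-1, and a return value of `None` stands for "ignore this parameter". All function names are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def decode_ext_value(raw: bytes):
    """Decode charset'language'pct-encoded-value; return None on failure."""
    try:
        charset, _language, encoded = raw.decode("ascii").split("'", 2)
        if charset.lower() not in ("utf-8", "iso-8859-1"):
            return None
        return unquote_to_bytes(encoded).decode(charset)
    except (UnicodeDecodeError, ValueError):
        return None


def unquote_quoted_string(raw: bytes) -> str:
    """Strip the surrounding quotes and resolve backslash escapes."""
    text = raw.decode("iso-8859-1")
    body = text[1:-1] if len(text) >= 2 and text.endswith('"') else text[1:]
    out = []
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "")  # quoted-pair: keep the escaped character
        out.append(ch)
    return "".join(out)


def post_process(name: str, raw: bytes):
    """Turn the raw octets of one parameter value into Unicode text."""
    if name.endswith("*"):
        return decode_ext_value(raw)
    if raw.startswith(b'"'):
        return unquote_quoted_string(raw)
    return raw.decode("iso-8859-1")  # verbatim; charset is an assumption
```

For instance, `post_process("filename*", b"UTF-8''%e2%82%ac%20rates.txt")` would give `"€ rates.txt"`.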
Extracting the Disposition Type
The parsing step (Appendix 5.2) has returned the disposition type (to be matched case-insensitively), which can be "attachment", "inline", or an extension type. If the type is unknown, treat it like "attachment" (see Section 3.2).
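A small sketch of this fallback, assuming a UA that implements no extension types (one that does would add them to the set). The name `effective_disposition` is hypothetical.

```python
KNOWN_TYPES = {"attachment", "inline"}  # assumption: no extension types known


def effective_disposition(disposition_type: str) -> str:
    """Match case-insensitively; unknown types are handled as "attachment"."""
    lowered = disposition_type.lower()
    return lowered if lowered in KNOWN_TYPES else "attachment"
```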
Determining the File Name
The parsing and post-processing steps resulted in a set of parameters (name/value pairs). The suggested file name is the value of the "filename*" parameter (when present), otherwise the value of the "filename" parameter.
If neither is given, the UA can determine a name based on the associated URI; for instance based on the last path segment.
Otherwise, the UA ought to post-process the suggested filename according to Section 3.3. [[anchor10: We could say here that UAs may reject filenames for security reasons, such as those with a path separator character.]]
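The selection order could be sketched as below. The sketch assumes `params` maps lowercased parameter names to already post-processed values, with parameters that failed post-processing left out. Percent-decoding the last path segment is an additional assumption, and the post-processing of Section 3.3 is not shown. The name `suggested_filename` is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def suggested_filename(params: dict, uri: str):
    """Prefer "filename*", then "filename", then the last URI path segment."""
    for key in ("filename*", "filename"):
        if key in params:
            return params[key]
    last_segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    # Assumption: percent-decode the segment; None means no name was found.
    return unquote(last_segment) or None
```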
Old text (to be integrated)
Extracting Parameter Values From Header Fields
To extract the value for a given parameter-name from an unparsed-string, parse the unparsed-string using the following grammar:
unparsed-string = unbalanced-block / block * ( ";" block ) [ ";" unbalanced-block ]
block = *run
unbalanced-block = *run unbalanced-run
run = unquoted-run / quoted-run
unquoted-run = non-quote *boring-octet
quoted-run = <"> *non-quote <">
unbalanced-run = <"> *non-quote
non-quote = <OCTET, except <"> >
boring-octet = <OCTET, except <"> and ";">
Parse each block, in turn, (including the unbalanced-block, if present) using the following grammar:
block = *LWS name *LWS "=" value
value = OCTET
where the name production is a grammatical production that is a case-insensitive match for the given parameter-name. If any block can be parsed by the grammar, let the raw-value be the characters produced by the value production of the first such block. Otherwise, let the raw-value be the empty string.
If the raw-value both begins and ends with a <"> character, return the value stripped of those <"> characters. Otherwise return the raw-value.
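One way to approximate this extraction is sketched here. It assumes that blocks are separated by semicolons outside double quotes, that LWS means space, tab, CR and LF, and that `value` may be any run of octets. The names `split_blocks` and `extract_parameter` are hypothetical, and the splitter is an approximation of the grammar rather than a faithful parser for it.

```python
import re


def split_blocks(unparsed: str) -> list[str]:
    """Split on ";" outside double quotes; an open quote runs to the end."""
    blocks, current, in_quotes = [], [], False
    for ch in unparsed:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            blocks.append("".join(current))
            current = []
        else:
            current.append(ch)
    blocks.append("".join(current))
    return blocks


def extract_parameter(unparsed: str, parameter_name: str) -> str:
    """Return the value of the first block naming parameter_name, or ""."""
    pattern = re.compile(
        r"[ \t\r\n]*" + re.escape(parameter_name) + r"[ \t\r\n]*=(.*)",
        re.IGNORECASE | re.ASCII | re.DOTALL,
    )
    for block in split_blocks(unparsed):
        match = pattern.fullmatch(block)
        if match:
            raw = match.group(1)
            # Strip the quotes only when the value both begins and ends with one.
            if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
                return raw[1:-1]
            return raw
    return ""
```

Because only the outer quotes are stripped, backslash escapes inside the value are left untouched.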
jre: this changes the interpretation of escape sequences in quoted-string, thus changes the interpretation of valid header fields
Decoding the File Name
To filename-decode an encoded-string, use the following algorithm:
- If the encoded-string contains non-ASCII characters, emit the encoded-string (decoded as ISO-8859-1) and abort these steps.
- Let the url-unescaped-string be the encoded-string %-unescaped.
- Emit the url-unescaped-string (decoded as UTF-8). (There's actually more sadness here if the url-unescaped-string isn't valid UTF-8.)
The emitted characters are the decoded file name.
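A sketch of these steps, taking the encoded-string as raw octets. The handling of invalid UTF-8 is left open by the text above; substituting U+FFFD replacement characters is purely an assumption made here so that the function always returns something. The name `filename_decode` is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def filename_decode(encoded: bytes) -> str:
    """Decode a raw filename value following the three steps above."""
    # Step 1: any non-ASCII octet means the value is taken as ISO-8859-1.
    if any(octet > 0x7F for octet in encoded):
        return encoded.decode("iso-8859-1")
    # Step 2: %-unescape the ASCII-only value.
    unescaped = unquote_to_bytes(encoded)
    # Step 3: decode as UTF-8 (assumption: replace invalid sequences).
    return unescaped.decode("utf-8", errors="replace")
```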
jre: percent-unescaping is only done by Chrome and IE, and breaks the semantics of valid header fields
Determining the File Name
To determine the file name indicated by a Content-Disposition header field, use the following algorithm:
- Let filename-star be the value extracted from the Content-Disposition header field for the "filename*" parameter.
- If filename-star parses as an RFC5987-value, return the RFC5987-value of filename-star and abort these steps.
- Let filename be the value extracted from the Content-Disposition header field for the "filename" parameter.
- If filename is empty, instead let filename be the value extracted from the Content-Disposition header field for the "name" parameter.
- If filename is empty, return the empty string and abort these steps.
- Return the filename-decoding of filename.
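The control flow of this algorithm can be sketched with the three sub-steps passed in as functions, so that the sketch stays independent of how they are implemented. The parameters `extract`, `parse_rfc5987` and `filename_decode` are hypothetical callables standing in for the extraction, RFC 5987 parsing and filename-decoding steps; `parse_rfc5987` is assumed to return `None` when its input does not parse.

```python
from typing import Callable, Optional


def determine_file_name(
    header: str,
    extract: Callable[[str, str], str],
    parse_rfc5987: Callable[[str], Optional[str]],
    filename_decode: Callable[[str], str],
) -> str:
    """Follow the listed steps; the helper callables are supplied by the caller."""
    filename_star = extract(header, "filename*")
    decoded_star = parse_rfc5987(filename_star)
    if decoded_star is not None:
        return decoded_star
    filename = extract(header, "filename")
    if not filename:
        # Fall back to the "name" parameter.
        filename = extract(header, "name")
    if not filename:
        return ""
    return filename_decode(filename)
```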
jre: 'name' is only supported by FF and Chrome, see http://greenbytes.de/tech/tc2231/#attwithnamepct
Determining the Disposition
To determine the disposition-type, parse the Content-Disposition header field using the following grammar:
unparsed-string = *LWS nominal-type *OCTET
nominal-type = "inline" / "filename" / "name" / ";"
If the Content-Disposition header field is non-empty and fails to parse, then the disposition type is "attachment". Otherwise, the disposition-type is "inline".
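Read literally, the rule amounts to a prefix test, which might be sketched like this. It assumes the quoted literals match case-insensitively and that LWS means space, tab, CR and LF. The name `determine_disposition` is hypothetical.

```python
import re

# Optional leading whitespace, then one of the nominal-type literals.
_NOMINAL = re.compile(
    r"[ \t\r\n]*(?:inline|filename|name|;)", re.IGNORECASE | re.ASCII
)


def determine_disposition(header_value: str) -> str:
    """Non-empty values that fail the grammar are treated as "attachment"."""
    if header_value and not _NOMINAL.match(header_value):
        return "attachment"
    return "inline"
```

Under this reading, a value such as `filename=a.txt` with no disposition type at all comes out as "inline", as does an empty field value.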
jre: (a) the instructions are very confusing, because the grammar doesn't mention the actual disposition types, (b) it's not clear what problem is solved here; what illegal header fields is this parsing that occur in practice?
Processing the Content-Disposition Header Field
To process the Content-Disposition header field, use the following algorithm:
- Determine the disposition-type.
- If the disposition-type is "inline", then ...
- If the disposition-type is "attachment", then let filename be the file name indicated by the header field. ...