#20 closed design (fixed)
Default charsets for text media types
Reported by: | mnot@… | Owned by: | fielding@… |
---|---|---|---|
Priority: | normal | Milestone: | 14 |
Component: | p3-payload | Severity: | Active WG Document |
Keywords: | | Cc: | |
Description
RFC 2616 Section 3.7.1 states:
When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
However, many, if not all, of the text/* media types define their own defaults; text/plain (RFC2046), for example, defaults to ASCII, as does text/xml (RFC3023).
How do these format-specific defaults interact with HTTP's default? Is HTTP really overriding them?
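To make the conflict concrete, here is a minimal Python sketch (not from the ticket) that resolves a `Content-Type` value under each reading: RFC 2616's ISO-8859-1 default for `text/*`, or the registration default from RFC 2046 / RFC 3023. The table `MEDIA_TYPE_DEFAULTS` and the function names are hypothetical and cover only the two types mentioned above; header parsing is delegated to the standard library's `email.message.Message`.

```python
from email.message import Message
from typing import Optional, Tuple

# Hypothetical table of registration defaults cited in this ticket:
# text/plain (RFC 2046) and text/xml (RFC 3023) both default to US-ASCII.
MEDIA_TYPE_DEFAULTS = {
    "text/plain": "us-ascii",
    "text/xml": "us-ascii",
}

# RFC 2616 Section 3.7.1: text/* received via HTTP defaults to ISO-8859-1.
HTTP_2616_DEFAULT = "iso-8859-1"


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Return (media type, charset parameter or None), both lowercased."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


def effective_charset(value: str, use_http_default: bool) -> Optional[str]:
    """Charset a recipient would assume under one of the two readings."""
    media_type, charset = parse_content_type(value)
    if charset is not None:
        # An explicit label wins under either reading.
        return charset
    if not media_type.startswith("text/"):
        return None
    if use_http_default:
        # Reading 1: HTTP overrides the media type registration.
        return HTTP_2616_DEFAULT
    # Reading 2: the registration's own default (None if not in the table).
    return MEDIA_TYPE_DEFAULTS.get(media_type)
```

Under this sketch, `effective_charset("text/plain", True)` gives `"iso-8859-1"` while `effective_charset("text/plain", False)` gives `"us-ascii"`; an explicit `charset` parameter yields the same answer either way, so the disagreement only matters for unlabelled content.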
I'm far from the first to be confused by this text, and I'm sure it's been asked before, but I haven't been able to find a definitive answer. If errata are still being considered, perhaps removing or modifying this line would be a good start...
Attachments (2)
Change History (27)
comment:1 Changed 15 years ago by mnot@…
- Component set to payload
- Milestone set to unassigned
- version set to d00
comment:2 Changed 15 years ago by julian.reschke@…
comment:3 Changed 15 years ago by julian.reschke@…
comment:4 follow-up: ↓ 5 Changed 15 years ago by mnot@…
- Milestone changed from unassigned to 02
Resolution:
- remove <http://tools.ietf.org/id/draft-ietf-httpbis-p3-payload-01.txt>, section 2.3.1, the entire fourth paragraph (i.e., the last one in that section).
- From 2.1.1: Move """HTTP/1.1 recipients MUST respect the charset label provided by the sender; and those user agents that have a provision to "guess" a charset MUST use the charset from the content-type field if they support that charset, rather than the recipient's preference, when initially displaying a document. """ to the end of 2.3.1, removing the rest of 2.1.1.
- Add text to Security Considerations explaining UTF-7 vulnerability in browsers and exclude such charsets from the guessing algorithm. (see http://www.w3.org/mid/B412EABE-8E69-455F-A00B-A1ED1F386440@gbiv.com)
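The rule moved from 2.1.1 (label beats the recipient's preference when the charset is supported) can be sketched as below. This is a hypothetical illustration, not proposed spec text: the function name is invented, and falling back to the user's preference when the label is missing or unsupported is an assumption, since the quoted paragraph does not say what happens then. Names are compared case-insensitively, as RFC 2616 treats charset tokens.

```python
from typing import Iterable, Optional


def initial_display_charset(
    label: Optional[str],
    supported: Iterable[str],
    user_preference: str,
) -> str:
    """Pick the charset used when initially displaying a document.

    If the sender's label names a charset this user agent supports, the
    label wins over the user's preference. Otherwise (assumption, not in
    the quoted text) fall back to the preference.
    """
    supported_set = {name.lower() for name in supported}
    if label is not None and label.lower() in supported_set:
        return label.lower()
    return user_preference.lower()
```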
comment:5 in reply to: ↑ 4 Changed 15 years ago by julian.reschke@…
> - Add text to Security Considerations explaining UTF-7 vulnerability in browsers and exclude such charsets from the guessing algorithm. (see http://www.w3.org/mid/B412EABE-8E69-455F-A00B-A1ED1F386440@gbiv.com)
I'll be happy to apply the changes if somebody proposes the exact text to be added to the Security Considerations...
comment:6 Changed 15 years ago by julian.reschke@…
comment:7 Changed 15 years ago by julian.reschke@…
From [211]:
Back out change [209], see discussion around <http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0233.html>; relates to #20.
comment:8 Changed 15 years ago by mnot@…
- Milestone changed from 02 to 03
comment:9 Changed 15 years ago by mnot@…
- Milestone changed from 03 to unassigned
comment:10 Changed 14 years ago by julian.reschke@…
From Dublin meeting minutes (http://jabber.ietf.org/logs/httpbis/2008-07-29.txt):
[09:20:47] <Thomas Roessler> julian: default character set for media types (text/*)?
[09:20:51] <Thomas Roessler> aleksey: oh gosh
[09:20:54] <Thomas Roessler> mnot: wary of this
[09:21:58] <Thomas Roessler> mnot: <grepping RFC 2616 for ISO-8859-1 occurrences>
[09:22:08] <Thomas Roessler> mnot: well, the issue is...
[09:22:25] <Thomas Roessler> julian: the issue is when you look in different RFCs for default encoding of text/*, you get different answers
[09:22:33] <Thomas Roessler> ... text registration, text/xml registration, HTTP text
[09:22:37] <Thomas Roessler> ... wouldn't know which one is normative ...
[09:22:59] <Thomas Roessler> ... were close to getting rid of ISO-8859-1, but then Roy stepped in ...
[09:23:11] <Thomas Roessler> ... if we can't make normative change, might be useful to phrase this in a way that makes clear what's going on ...
[09:23:15] <Thomas Roessler> mnot: issue-20
[09:23:47] <Thomas Roessler> ... proposed text suggests that we override defaults ...
[09:24:00] <Thomas Roessler> ... relationship between the two isn't clear -- which takes precedence ...
[09:24:04] <Thomas Roessler> ... this is confusing people ...
[09:24:08] <Thomas Roessler> ... we had a proposal that we backed out ...
[09:24:26] <Thomas Roessler> mnot: roy, did you have a proposal for this that you remember?
[09:24:32] <roy.fielding> It was a deliberate decision to override MIME. Lots of discussion way back then.
[09:24:42] <Thomas Roessler> barry: <channeling roy>
[09:24:48] <roy.fielding> not that I can remember .. will search
[09:25:25] <Thomas Roessler> julian: If it was deliberate discussion to override MIME, should we now override text/...?
[09:25:44] <Thomas Roessler> mnot: remember there were historical reasons for iso-8859-1
[09:25:51] <roy.fielding> right, Mosaic puked on charset parameter
[09:26:06] <Thomas Roessler> julian: problem is that default is harmful for formats that carry their own charset info
[09:26:23] <Thomas Roessler> ... at least for text/xml, should document what's implemented in practice ...
[09:26:40] <Thomas Roessler> mnot: document
[09:26:55] <Thomas Roessler> ACTION: mnot to research previous discussion, and restate so we can get going again
comment:11 Changed 14 years ago by julian.reschke@…
Remove our own default, but point out that the MIME default doesn't apply either.
comment:12 Changed 14 years ago by julian.reschke@…
A proposal from Roy made in Feb 2008: <http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0259.html>
comment:13 Changed 14 years ago by julian.reschke@…
- Priority set to urgent
comment:14 Changed 14 years ago by mnot@…
- Priority changed from urgent to normal
Latest summary at:
http://www.w3.org/mid/5565932F-C73D-4183-A09B-46993DD63F88@mnot.net
Discussed at the Stockholm editors' meeting; the inclination is to define the default as ISO-8859-1 and allow sniffing (perhaps only when no charset is declared), but not to allow sniffing to UTF-7 (i.e., only to charsets that are supersets of ASCII).
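A minimal sketch of the Stockholm inclination, assuming a caller-supplied sniffer: a declared charset is used as-is; otherwise a sniffed guess is accepted only if it is an ASCII superset; otherwise ISO-8859-1. The ASCII-superset test here is one possible approach (an assumption, not agreed text): it round-trips all 128 ASCII code points through Python's codec for the candidate charset. UTF-7 fails because `+` is an escape character, and UTF-16 fails because it uses two bytes per character. `sniff` and both function names are hypothetical.

```python
from typing import Callable, Optional

ASCII_TEXT = "".join(chr(i) for i in range(128))
ASCII_BYTES = bytes(range(128))


def is_ascii_superset(charset: str) -> bool:
    """True if every ASCII byte keeps its ASCII meaning in `charset`.

    Unknown or non-text codec names raise LookupError and are rejected;
    codecs that cannot round-trip ASCII raise UnicodeError or compare unequal.
    """
    try:
        return (ASCII_TEXT.encode(charset) == ASCII_BYTES
                and ASCII_BYTES.decode(charset) == ASCII_TEXT)
    except (LookupError, UnicodeError):
        return False


def charset_for_text(
    declared: Optional[str],
    sniff: Callable[[bytes], Optional[str]],
    body: bytes,
) -> str:
    """Declared label first, then a safe sniffed guess, then ISO-8859-1."""
    if declared is not None:
        return declared
    guess = sniff(body)
    if guess is not None and is_ascii_superset(guess):
        return guess
    # Sniffing gave nothing usable (e.g. it guessed UTF-7): use the default.
    return "iso-8859-1"
```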
comment:15 Changed 13 years ago by mnot@…
- Priority changed from normal to later
- Severity set to Active WG Document
comment:16 Changed 12 years ago by lmm@…
Text this refers to is currently:
http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-11.html#rfc.section.2.3.1
See also http://tools.ietf.org/html/draft-masinter-mime-web-info, which mentions this as a possible change to MIME rather than HTTP.
comment:17 Changed 12 years ago by mnot@…
- Owner set to fielding@…
Prague 2011 editor discussion: the proposal is to remove any default (i.e., the default is neither ASCII, as in MIME, nor ISO-8859-1, as in 2616) and to allow sniffing for the text charset. Update the linked "Missing Charset" section as a result (as well as any other references to ISO-8859-1).
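With no protocol default, a recipient either has a label or has to sniff. The sketch below is hypothetical and deliberately simple: its only "sniffing" is a UTF-8 byte-order-mark check followed by a UTF-8 validity check, and it returns `None` rather than inventing a default when neither succeeds, leaving the final choice to the recipient. Real user agents use richer heuristics; this only illustrates the shape of the decision.

```python
import codecs
from typing import Optional


def text_charset(declared: Optional[str], body: bytes) -> Optional[str]:
    """No default: use the label if present, otherwise sniff.

    Returns None when neither the label nor this minimal sniffing
    produces an answer.
    """
    if declared is not None:
        return declared.lower()
    # A UTF-8 BOM is treated as decisive.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Otherwise accept UTF-8 only if the whole body decodes cleanly.
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"
```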
comment:18 Changed 12 years ago by mnot@…
- Priority changed from later to normal
comment:19 Changed 12 years ago by julian.reschke@…
Accept-Charset still special-cases ISO-8859-1; do we want to get rid of this, too?
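For reference, the RFC 2616 Section 14.2 special case works like this: without `*`, unlisted charsets get q=0, except ISO-8859-1, which gets q=1 if not explicitly mentioned; `*` covers every unlisted charset, including ISO-8859-1; and a missing header means any charset is acceptable. The sketch below encodes those rules. The parser is a minimal, hypothetical one (it assumes well-formed `q` values and does no other validation).

```python
from typing import Dict, Optional


def parse_accept_charset(value: str) -> Dict[str, float]:
    """Map each listed charset (lowercased) to its q-value."""
    prefs: Dict[str, float] = {}
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val.strip())
        prefs[name] = q
    return prefs


def charset_quality(header: Optional[str], charset: str) -> float:
    """q-value for `charset` under the RFC 2616 Section 14.2 rules."""
    if header is None:
        return 1.0  # no Accept-Charset: any charset is acceptable
    prefs = parse_accept_charset(header)
    name = charset.lower()
    if name in prefs:
        return prefs[name]
    if "*" in prefs:
        return prefs["*"]  # "*" matches every unlisted charset
    # The special case in question: unlisted ISO-8859-1 still gets q=1.
    return 1.0 if name == "iso-8859-1" else 0.0
```

Dropping the special case would amount to removing the final `iso-8859-1` branch, so an unlisted ISO-8859-1 would get q=0 like any other unlisted charset.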
Changed 12 years ago by julian.reschke@…
proposed patch, *removing* those sections (this may be too drastic)
Changed 12 years ago by julian.reschke@…
new patch, now also removing the special case for Accept-Encoding
comment:20 Changed 12 years ago by julian.reschke@…
comment:21 Changed 12 years ago by julian.reschke@…
- Resolution set to incorporated
- Status changed from new to closed
comment:22 Changed 12 years ago by mnot@…
- Milestone changed from unassigned to 14
comment:23 Changed 12 years ago by mnot@…
- Resolution incorporated deleted
- Status changed from closed to reopened
comment:24 Changed 12 years ago by mnot@…
- Resolution set to fixed
- Status changed from reopened to closed
From [146]:
Add directory for test cases, starting with encoding tests (addresses #20)