Opened 9 years ago

Closed 8 years ago

#39 closed defect (wontfix)

Warn about mistaken conversion for non-BMP characters

Reported by: duerst@…
Owned by:
Priority: minor
Milestone:
Component: 3987bis
Version:
Severity: -
Keywords:
Cc:

Description

Find a place to note that some older software transcoding to UTF-8 may produce illegal output for some input, in particular for characters outside the BMP (Basic Multilingual Plane). As an example, for the IRI with non-BMP characters (in XML Notation):
"http://example.com/&#x10300;&#x10301;&#x10302;"
which contains the first three letters of the Old Italic alphabet,
the correct conversion to a URI is
"http://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C%82"
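The difference between the correct conversion and one likely failure mode can be sketched in Python. This is only an illustration, not text proposed for the spec: the function names are hypothetical, and the "mistaken" variant assumes the faulty software encodes each UTF-16 surrogate as its own three-byte sequence instead of encoding the code point as one four-byte sequence.

```python
from urllib.parse import unquote_to_bytes

# The IRI from the description: three Old Italic letters, U+10300..U+10302.
IRI = "http://example.com/\U00010300\U00010301\U00010302"


def _escape(data: bytes) -> str:
    """Percent-encode every byte with uppercase hex digits."""
    return "".join(f"%{b:02X}" for b in data)


def correct_conversion(iri: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 byte sequence."""
    return "".join(
        ch if ord(ch) < 0x80 else _escape(ch.encode("utf-8")) for ch in iri
    )


def mistaken_conversion(iri: str) -> str:
    """Assumed failure mode: encode each UTF-16 surrogate separately."""
    out = []
    for ch in iri:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp < 0x10000:
            out.append(_escape(ch.encode("utf-8")))
        else:
            # Split the code point into a UTF-16 surrogate pair, then
            # (wrongly) encode each surrogate as a 3-byte sequence.
            v = cp - 0x10000
            for unit in (0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)):
                out.append(_escape(chr(unit).encode("utf-8", "surrogatepass")))
    return "".join(out)


def is_legal_utf8(uri: str) -> bool:
    """Check whether the percent-decoded bytes form well-formed UTF-8."""
    try:
        unquote_to_bytes(uri).decode("utf-8")  # strict: rejects surrogates
        return True
    except UnicodeDecodeError:
        return False


print(correct_conversion(IRI))
print(mistaken_conversion(IRI))
```

Under that assumption, the mistaken variant yields six three-byte escapes (%ED%A0%80%ED%BC%80 and so on) in place of three four-byte escapes, and a strict UTF-8 decoder rejects the result. A check like `is_legal_utf8` is one way a receiver could detect such output.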

Change History (1)

comment:1 Changed 8 years ago by masinter@…

  • Resolution set to wontfix
  • Status changed from new to closed

I'm reluctant to add a warning about a situation whose likelihood is unclear. I'd want more evidence that mis-implementations of UTF-8 are still deployed in practice and are causing difficulty.

If there's any evidence that we actually need to do this, please re-open the ticket.
