Opened 9 years ago

Closed 8 years ago

#63 closed defect (fixed)

Consistency of scheme syntax definitions for URI<->IRI conversion

Reported by: duerst@…
Owned by:
Priority: major
Milestone:
Component: 4395bis
Version:
Severity: -
Keywords:
Cc:

Description

This issue was raised by Björn Höhrmann at http://lists.w3.org/Archives/Public/public-iri/2010Oct/0019.html. In essence:

"Previously, those who wish to describe resource identifiers that are
useful as IRIs were encouraged to define the corresponding URI syntax,
and note that the IRI usage follows the rules and transformations
defined in [6]. This document changes that advice to encourage explicit
definition of the scheme and allowable syntax elements within the larger
character repertoire of IRIs, as defined by [7]."

I am concerned that this would further draw a distinction between the
characters that occur literally in an identifier and characters that
are percent-encoded. I am not entirely sure in fact how to read RFC
3987 on this (it starts out saying it's just like URIs, except that
there are more unreserved characters, but then excludes private use
code points from the set of unreserved characters).

Let's say I make a scheme where the scheme-specific part can only be
"ö". Since "ö" is an unreserved character, I might be inclined to say

def = "example:" %x00F6;

but that would not work as "example:%c3%b6" is essentially defined as
equivalent to "example:ö". The definition would have to account for a
level of indirection at some point to remove percent-encoding, so I'd
think you cannot quite distinguish between defining an URI scheme and
an IRI scheme, so far the only difference could be in percent-encoded
private use characters. I'd rather remove that difference, and am not
sure what the actual change there would be.
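
For illustration, the equivalence described above can be checked with a short Python sketch. It uses the standard-library urllib.parse functions as a stand-in for the IRI-to-URI mapping (UTF-8 encoding, then percent-encoding of the resulting octets). This is an assumption made for the example, not the full RFC 3987 procedure: it performs no normalization, and "example:" is the hypothetical scheme from the description.

```python
from urllib.parse import quote, unquote

# Hypothetical identifier from the description above.
iri = "example:ö"

# Percent-encode non-ASCII characters as UTF-8 octets; keep ":" literal.
uri = quote(iri, safe=":")
print(uri)  # example:%C3%B6

# Decoding the percent-encoded form, in either hex case, gives back
# the same identifier as the literal form.
assert unquote(uri) == iri
assert unquote("example:%c3%b6") == iri
```

A grammar that lists only the literal character %x00F6 accepts the first form but not the second, although the two are treated as equivalent.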

Change History (3)

comment:1 Changed 9 years ago by duerst@…

I have thought about this for quite a while. To avoid the problem that somebody defines a scheme with

def = "example:" %x00F6;

(%x00F6; represents "ö") but forgets to also include the corresponding percent-encoded variant, we could do one of the following things (probably not covering all possibilities and variants):
(1) Define in 4395bis that the necessary percent-encoded variants are automatically included by definition for any character given in a production.
(2) Same as (1), but make this contingent on the scheme definition not doing something else to address the problem more explicitly.
(3) Require in 4395bis that any scheme definition that defines the scheme on the IRI level include the necessary percent-encoding, either explicitly or by a general provision such as in (1) above.
(4) Define in 4395bis that any potential IRI in a given scheme which would be illegal in that scheme when converted to an URI (because the necessary percent-encoded syntax isn't part of the scheme definition) is not part of the IRIs allowed by the scheme.

Each of the above should make sure that for Björn's scheme, not only <example:ö>, but also <example:%c3%b6> is allowed (or potentially that both are disallowed), and that therefore the IRI <example:ö> can be converted to an URI.

I think it is important to recognize that in general, what might be a very simple syntax definition on the IRI level (i.e. in terms of Unicode codepoints) can get very extensive on the URI level (i.e. trying to specify the percent-encoding that corresponds to the IRI syntax exactly, not a single character less or more).
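
As a rough sketch of what option (1) could mean in practice, the Python below expands each character of a production into "the literal character, or its percent-encoded UTF-8 form". The names pct_variants_pattern, EXAMPLE_SCHEME and is_valid are hypothetical and chosen only for this illustration; nothing here is taken from 4395bis itself.

```python
import re


def pct_variants_pattern(ch: str) -> str:
    """Regex fragment matching `ch` literally or as percent-encoded UTF-8."""
    encoded = "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
    # (?i:...) makes the hex digits of the escape case-insensitive,
    # without making the literal character case-insensitive.
    return f"(?:{re.escape(ch)}|(?i:{re.escape(encoded)}))"


# Hypothetical scheme from the ticket: the scheme-specific part is exactly "ö".
EXAMPLE_SCHEME = re.compile("example:" + pct_variants_pattern("ö"))


def is_valid(identifier: str) -> bool:
    """Return True if `identifier` is allowed by the hypothetical scheme."""
    return EXAMPLE_SCHEME.fullmatch(identifier) is not None


for candidate in ["example:ö", "example:%c3%b6", "example:%C3%B6", "example:o"]:
    print(candidate, is_valid(candidate))
```

With this expansion the scheme author writes only the IRI-level character, and both <example:ö> and <example:%c3%b6> are accepted, so the IRI can always be converted to a URI that is legal in the scheme.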

comment:2 Changed 9 years ago by tony@…

Discussion at IETF 80:
(1) define that the necessary percent-encoded variants are automatically included by definition for any character given in a production
(2) same as (1), but make this contingent on the scheme definition not doing something else to address the problem more explicitly
(3) require that any scheme definition that defines the scheme on the IRI level include the necessary percent-encoding, either explicitly or by a general provision such as in (1) above
(4) define that any potential IRI in a given scheme which would be illegal in that scheme when converted to an URI (because the necessary percent-encoded syntax isn't part of the scheme definition) is not part of the IRIs allowed by the scheme

Option (1) had consensus.

comment:3 Changed 8 years ago by masinter@…

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in section 3.6, referenced in section 4.
