Opened 9 years ago

Last modified 7 years ago

#3 new defect

verify additional errata reported by Ian Hickson on draft-duerst-iri-bis-06

Reported by: lmm@…
Owned by:
Priority: major
Milestone:
Component: iri-processing
Version:
Severity: -
Keywords:
Cc:

Description

(These comments were made on draft-duerst-iri-bis-06; we need to verify whether they have been fixed in the WG draft.)

From http://lists.w3.org/Archives/Public/public-iri/2009Jul/0032.html

  • The definition of how to determine whether the following components are present in, and how to obtain their value from, a string that may not be a "valid URL":

<scheme>
<host>
<port>
<hostport>
<path>
<query>
<fragment>
<host-specific>
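
As a rough illustration of what such a definition has to cover, the sketch below splits an arbitrary string with the regular expression from RFC 3986, Appendix B, which matches any input, valid or not. The name split_components is made up for this example. This is not the HTML5 or draft algorithm, and it stops at the authority rather than deriving <host>, <port>, or <hostport>.

```python
import re

# Regular expression from RFC 3986, Appendix B. It never fails to match,
# so it also yields components for strings that are not valid URLs.
_RFC3986_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)


def split_components(text):
    """Return the five generic components; None means 'not present'."""
    m = _RFC3986_SPLIT.match(text)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

A component that is absent comes back as None, while one that is present but empty comes back as "". That distinction ("http://example.com/" versus "http://example.com/#") is exactly the kind of detail the missing definition would need to state.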

  • The definitions for how to resolve a string to an "absolute URL" when the original string is not necessarily a "valid URL".

(I use the term "URL" here in the HTML5 sense, which has varyingly been
called a Web Address or an HRef in related work.)
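
For comparison only, here is how an existing library resolves references against a base. The sketch uses Python's urllib.parse.urljoin, which follows RFC 3986-style reference resolution and does no escaping of its own. It is not the HTML5 algorithm, and the wrapper name resolve is hypothetical.

```python
from urllib.parse import urljoin


def resolve(base, reference):
    """Resolve a (possibly not valid) reference against an absolute base.

    Assumption: urljoin stands in for the missing "resolving" algorithm.
    It merges paths and removes dot segments but does not percent-encode.
    """
    return urljoin(base, reference)


base = "http://example.com/a/b?x#y"
print(resolve(base, "../c"))   # dot segments are removed
print(resolve(base, "?q"))     # only the query is replaced
print(resolve(base, "c d"))    # the space is passed through unescaped
```

The third call shows the gap the report points at: the input is not a valid URL, and whether the space is escaped, rejected, or passed through is left to the implementation unless the spec defines it.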

The following issues also exist in the draft:

  • "is the script's character. encoding" has a typo (misplaced ".")
  • Step 8 in the algorithm for parsing HRefs appears to be a corrupted form of the definition of <hostport> from the old HTML5 text. The new text as phrased appears to be meaningless.
  • It appears that the parsing algorithm is destructive, in that the results will not be isomorphic with the input. For example, the following:

http://example.com/##

...will turn into:

http://example.com/#%23

...which, once the "resolving" algorithm is reintroduced, will be
incompatible with implemented practice.
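
The effect can be reproduced with a small sketch. It assumes the fragment is everything after the first "#" and that any character RFC 3986 does not allow in a fragment gets percent-encoded. The function name escape_fragment is made up here, and this is not the draft's algorithm, only an illustration of why the round trip is lossy.

```python
from urllib.parse import quote

# Characters RFC 3986 permits unescaped in a fragment, in addition to the
# letters, digits and "_.-~" that quote() always leaves alone. "%" is kept
# so that existing escapes are not double-encoded.
FRAGMENT_SAFE = "!$&'()*+,;=:@/?%"


def escape_fragment(url):
    """Split at the first '#' and percent-encode the rest of the fragment."""
    head, sep, fragment = url.partition("#")
    if not sep:
        return url  # no fragment present
    return head + "#" + quote(fragment, safe=FRAGMENT_SAFE)


original = "http://example.com/##"
escaped = escape_fragment(original)
print(escaped)              # http://example.com/#%23
print(escaped == original)  # False: the output no longer matches the input
```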

Change History (3)

comment:1 Changed 9 years ago by lmm@…

This proposed edit makes the parsing algorithm more
explicit, by referencing RFC 3986.

610,616c636,641
< <t>Parse the IRI, either as a relative reference (no scheme)
< or using scheme specific processing (according to the scheme
< given); the result resulting in a set of parsed IRI components.
< (NOTE: FIX BEFORE RELEASE: INTENT IS THAT ALL IRI SCHEMES
< THAT USE GENERIC SYNTAX AND ALLOW NON-ASCII AUTHORITY CAN
< ONLY USE AUTHORITY FOR NAMES THAT FOLLOW PUNICODE.)
< </t>
---
> <t>The first step in interpretation of an IRI is to parse the IRI into
> its syntactic components. This is accomplished using the method
> defined in Section 3 of <xref target="RFC3986"/>, except using the
> larger repertoire of unreserved characters.
> the result resulting in a set of parsed IRI components.
> </t>

618c643
< <t>NOTE: The result of parsing into components will correspond result
---
> <t>The result of parsing into components will correspond result
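
To make "the larger repertoire of unreserved characters" concrete, the sketch below contrasts the unreserved set of RFC 3986 with iunreserved from RFC 3987, which adds the ucschar ranges. The function names are made up for this example, and the ranges are transcribed from the RFC 3987 ABNF, so they should be checked against whatever the WG draft ends up specifying.

```python
import string

_ASCII_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# ucschar ranges from the RFC 3987 ABNF: three BMP ranges, planes 1-13
# up to xFFFD each, and xE1000-xEFFFD in plane 14.
_UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    + [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
    + [(0xE1000, 0xEFFFD)]
)


def is_unreserved(ch):
    """RFC 3986 unreserved: ALPHA / DIGIT / "-" / "." / "_" / "~"."""
    return ch in _ASCII_UNRESERVED


def is_iunreserved(ch):
    """RFC 3987 iunreserved: unreserved plus ucschar."""
    cp = ord(ch)
    return is_unreserved(ch) or any(lo <= cp <= hi for lo, hi in _UCSCHAR_RANGES)
```

With these definitions "é" (U+00E9) is rejected by is_unreserved but accepted by is_iunreserved, while delimiters such as "#" and "?" are rejected by both, so the component boundaries found by RFC 3986 parsing stay the same.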

comment:2 Changed 9 years ago by duerst@…

  • Summary changed from "verify additional reported errata" to "verify additional errata reported by Ian Hickson on draft-duerst-iri-bis-06"

comment:3 Changed 7 years ago by masinter@…

  • Component changed from 3987bis to iri-processing

These comments apply to things that have been moved (back) into the processing spec.
