Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#58 closed editorial (fixed)

What identifies an HTTP resource

Reported by: mnot@… Owned by:
Priority: normal Milestone: unassigned
Component: p1-messaging Severity:
Keywords: Cc:


3.2.2 really doesn't say what identifies the resource:

"If the port is empty or not given, port 80 is assumed. The semantics are that the identified resource is located at the server listening for TCP connections on that port of that host, and the Request-URI for the resource is abs_path (Section 5.1.2)."

But it *does* say what part of the HTTP URL becomes the Request-URI (abs_path alone, which leaves out the query), and that definitely needs to be fixed.

Change History (9)

comment:1 Changed 11 years ago by mnot@…

Here's a proposed replacement text:

"The semantics are that the identified resource is located at the server listening for TCP connections on that port of that host, and the Request-URI for the resource is abs_path plus the optional query parameter (Section 5.1.2)."
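To make the difference between the two wordings concrete, here is a minimal Python sketch. It is an illustration only, not text from any draft. It assumes Python's `urllib.parse.urlsplit` as the URL parser, and the function names and example URL are hypothetical. The first function follows the current 3.2.2 wording (the Request-URI is abs_path alone), the second follows the proposed replacement (abs_path plus the optional query), and `origin` applies the "port 80 is assumed" rule.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def request_uri_current(url: str) -> str:
    """Hypothetical helper: current 3.2.2 wording, Request-URI is abs_path alone."""
    # The query is silently dropped. RFC 2616 requires "/" when abs_path is absent.
    return urlsplit(url).path or "/"


def request_uri_proposed(url: str) -> str:
    """Hypothetical helper: proposed wording, abs_path plus the optional query."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path


def origin(url: str) -> tuple[str | None, int]:
    """Hypothetical helper: host and TCP port; an empty or missing port means 80."""
    parts = urlsplit(url)
    return parts.hostname, (parts.port if parts.port is not None else 80)


url = "http://example.com/search?q=http#top"
print(request_uri_current(url))   # /search
print(request_uri_proposed(url))  # /search?q=http
print(origin(url))                # ('example.com', 80)
```

In neither case is the fragment (`#top`) part of the Request-URI; the only difference is whether the query the client put in the URL survives.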

comment:2 Changed 11 years ago by mnot@…

  • Component set to messaging
  • Milestone set to unassigned

comment:3 Changed 10 years ago by meta@…

HTTP/1.1 said that the resource was identified by the path. As I understand it, various parties would like to change this so that the resource is identified by the combination of the path and the query.

There are a couple of fairly major problems with that proposal.

Firstly, the current HTTP spec says "The POST method is used to request that the origin server accept the entity enclosed in the request as data to be processed by the resource identified by the Request-URI in the Request-Line."

If the query is part of the resource identity, then what is posted in the case where there is no body data, i.e., the data is URL-encoded into the query? It seems to me that you will need to have a different definition of HTTP POST (or a different interpretation of the URL) depending on whether the HTTP POST is using URL-encoded data or form-encoded data in the body. This is messy.

Secondly, the proposed new definition of "resource" implicitly demands that if a POST to a given URI (path plus query) is a valid request, then a GET to that same URI should also be a valid request, a request to get the named resource that you just posted. This is not the case for most existing software.

My view is that RFC 3986 is in error, and its change to the definition of query parameters should be ignored in favor of the definition in every prior URI and HTTP standard, which is also the definition in common use. Then at some point, the URI spec should be fixed.

comment:4 Changed 10 years ago by mnot@…

  • Milestone changed from unassigned to 06

comment:5 Changed 10 years ago by julian.reschke@…

  • Milestone changed from 06 to unassigned

comment:6 Changed 10 years ago by fielding@…

RFC3986 is correct. I expect this section to be entirely rewritten for draft 07 so that it corresponds to the definitions in RFC3986.

I do not expect to use the suggested text above, however, since the whole notion that the "resource" is some mechanism listening for requests on a TCP interface is flawed. A server listens for and handles requests, and by doing so establishes a consistent mapping of request to response that defines the "resource" from the perspective of the client (the consumer of the interface provided by HTTP that hides all implementation details). The URI referenced by the client identifies the resource. The rules for what parts of that URI can be found in the request-target, the Host header field, and the underlying transport connection (e.g., http vs. https) are much harder to define than the rather lame summary I wrote for previous HTTP specs.
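The point that the URI is spread across the request-target, the Host header field, and the transport connection can be illustrated with a simplified Python sketch. It is an assumption-laden illustration, not the rules the drafts eventually adopted: it handles only `http` and `https`, assumes default ports 80 and 443, produces only an origin-form request-target (path plus query), and ignores proxies and absolute-form targets. `decompose` and `RequestParts` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit

# Assumed default ports for the two schemes covered by this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass
class RequestParts:
    use_tls: bool        # carried by the transport, not by the request message
    port: int            # TCP port to connect to
    host_header: str     # value for the Host header field
    request_target: str  # origin-form request-target: path plus optional query


def decompose(uri: str) -> RequestParts:
    """Hypothetical helper: where each piece of an http(s) URI ends up."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not an http(s) URI: {uri!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    # Host header: the authority without any userinfo; an explicit port is kept.
    host_header = parts.netloc.rpartition("@")[2]
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment is never sent to the server, so it is simply not used.
    return RequestParts(parts.scheme == "https", port, host_header, target)


print(decompose("https://user@example.org:8443/a/b?x=1#frag"))
```

Note what the sketch does not do: it says nothing about how the server maps those pieces to a resource, which is exactly the part the comment argues cannot be defined by one implementation.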

meta's argument about POST is based on the rules of HTML FORMs, which are only *one* example of how POST is used. For example, there is nothing in HTTP preventing query arguments from being found in both the request-target and the body of a POST. And there seems to be some confusion over what is a valid request (i.e., correct HTTP syntax sent to a server) versus what methods are allowed by the server for a given resource. POST should not imply that GET on the same resource will be allowed, but it certainly does imply that both are syntactically valid requests.
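As a sketch of the remark that nothing in HTTP prevents query arguments from appearing in both the request-target and the body of a POST, the following Python builds raw HTTP/1.1 request bytes without sending them. The helper names, the host `example.com`, and the path `/orders` are hypothetical. Both messages are syntactically valid; whether the server allows GET on that resource is a separate question that only the server answers.

```python
from __future__ import annotations

from urllib.parse import urlencode


def _target(path: str, query: dict[str, str]) -> str:
    # Origin-form request-target: path plus optional query.
    return path + ("?" + urlencode(query) if query else "")


def build_post(host: str, path: str, query: dict[str, str], form: dict[str, str]) -> bytes:
    """Hypothetical helper: POST with arguments in the request-target AND the body."""
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {_target(path, query)} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def build_get(host: str, path: str, query: dict[str, str]) -> bytes:
    """Hypothetical helper: GET on the same target; valid syntax, allowed or not."""
    return (
        f"GET {_target(path, query)} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


post = build_post("example.com", "/orders", {"id": "42"}, {"item": "tea", "qty": "2"})
get = build_get("example.com", "/orders", {"id": "42"})
print(post.decode("ascii"))
print(get.decode("ascii"))
```

Only message syntax is shown here; a server may still answer the GET with 405 (Method Not Allowed), which is a statement about the resource, not about the validity of the request.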

Both the original wording and meta's argument assume that a common implementation practice (e.g., CGI's interpretation of query arguments) is somehow definitive of HTTP resource handling. Anyone who believes that should spend some time reading the Apache mod_rewrite documentation (or its analog on most other HTTP servers). We need to define HTTP in terms of the interface and intent of the request, taking into account all possible methods and all possible request-target resolution mechanisms, rather than trying to define it in terms of one possible implementation.

comment:7 Changed 10 years ago by fielding@…

From [621]:

Define http and https URI schemes: addresses #58, #128, #159. Resolves #157: removed reference to RFC1900 use of IP addresses in URI.

comment:8 Changed 10 years ago by fielding@…

  • Priority set to normal
  • Resolution set to fixed
  • Status changed from new to closed
  • Type changed from design to editorial

Resolved by the rewording in [621].

comment:9 Changed 10 years ago by meta@…

In that case, quoting the new wording: "...and the remainder of the URI is considered to be identifying data corresponding to a resource for which that server might provide an HTTP interface."

I think that should be "...and the remainder of the URI is considered to include identifying data corresponding to a resource...". Point being, the entire remainder of the URI might not correspond to a resource; it may be that the remainder of the URI is a resource specifier, plus a query to send to that resource, as per previous HTTP specs and many existing implementations.

I still think it's a shame that we're basically throwing away the distinction between ? and / and saying it's entirely up to the server...
