Opened 7 years ago

Last modified 7 years ago

#121 reopened defect

BIDI: Some users are requiring right-to-left label ordering.

Reported by: shawnste@…
Owned by: draft-ietf-iri-bidi-guidelines@…
Priority: major
Milestone:
Component: bidi-guidelines
Version:
Severity: -
Keywords: bidi
Cc:

Description

BIDI section 2 requires adding embedding marks which force a "western" left-to-right ordering of labels. I have requirements from customers, including government customers, that require a right-to-left ordering of labels in at least some cases.

This seems to be a user preference, perhaps with a strong language bias.

Specifically, how is a user reading an Arabic domain name from the side of the bus over a phone going to read it? And how will the person on the end of the phone type it? My investigation shows that native speakers will prefer reading a domain name from the right in BIDI contexts.

Change History (9)

comment:1 Changed 7 years ago by duerst@…

Shawn, thanks for creating this issue. Can you give more details about your customers' requirements? For example, is right-to-left ordering meant to work per component or per run? At what point should a mixed IRI (one with both RTL and LTR components) be displayed right-to-left: even if only a single component, e.g. a single path component (directory), is RTL? Are there details that vary per "customer", and if so, what?

comment:2 Changed 7 years ago by shawnste@…

The primary concern would be a simple domain name, even without http:// :) Of course an IRI needs to be consistent with that.

The customers have been focused primarily on the domain portion. By the time we look at the query string they've "lost interest". So RTL in the domain should probably force reading order.

Interestingly, however, the key indicator isn't the domain itself, but rather the context/mindset of the user. If they're dealing with Arabic, they may expect the URL to render labels from right-to-left, even if it's entirely ASCII! Specifically, if the browser's UI language is Arabic, or if the Address Field is in Right To Left Reading order, this expectation increases.

The bias also seems to be related to culture and/or experience. A software engineer who majored in math in one country may feel more comfortable with left-to-right behavior than a person in another country with no computer or math background.

I know it doesn't help this RFC, but keying off the address box directionality might be good. In a document, keying off the primary document language might work. That doesn't provide the consistency necessary here.

I don't think that "any RTL means all-RTL" works very well, because a simple Arabic query string to Bing probably doesn't mean that the address needs to be flipped. Any RTL within the domain portion (or local part of an email address) probably does indicate that the labels should be ordered from right to left.

I realize that following these rules may end up with behavior that is "fuzzier" than some are comfortable with, however the goal here is human readable (by the 90%, not engineers). Machines and Engineers already know how to "read" it, we've got byte order if nothing else; our biases should not impact the "see a domain name on the side of the bus and type it into my phone" case.

In summary: Follow the order of the address box if the user sets that. If there is no other context, any RTL in the primary portion (e.g. the domain) of the IRI should trigger RTL ordering of the labels, e.g. by putting the whole thing in right-to-left marks instead of left-to-right marks.
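
The rule in this summary could be sketched roughly as follows. This is only an illustration: the function names and the `address_box_dir` parameter are hypothetical, the domain is assumed to have been split out of the IRI already, and the "marks" are taken to be the Unicode embedding characters (U+202A LRE, U+202B RLE, U+202C PDF). It shows only the selection logic and the wrapping; what a given renderer actually displays still depends on its bidi implementation.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def has_rtl(text):
    """True if any character has a strong right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def label_direction(domain, address_box_dir=None):
    """Pick 'rtl' or 'ltr' for the label ordering.

    address_box_dir is a hypothetical user/UI setting ('ltr', 'rtl' or None).
    If it is set, it wins; otherwise any strong RTL character in the domain
    triggers RTL ordering.
    """
    if address_box_dir in ("ltr", "rtl"):
        return address_box_dir
    return "rtl" if has_rtl(domain) else "ltr"


def wrap_for_display(iri, domain, address_box_dir=None):
    """Wrap the whole IRI in an embedding of the chosen direction."""
    mark = RLE if label_direction(domain, address_box_dir) == "rtl" else LRE
    return mark + iri + PDF
```

Note that an RTL query string alone does not flip anything here, because only the domain is inspected.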

comment:3 Changed 7 years ago by adil@…

  • Keywords bidi added
  • Resolution set to wontfix
  • Status changed from new to closed

Shawn, being one of the people that wants to see Arabic URLs flowing right to left I fully understand what you are saying. I have gone around in circles a few times with this and I concluded that this version of the Bidi-IRI document is not where we should resolve the issue.

Firstly, internet addresses are only a subset of the uses of IRIs, and I need to take into account the general purpose of the IRI. IRIs are rendered by a wide variety of devices that have only a few things in common. The primary concern is that the IRI is consistent on all these devices when it contains bidi characters.

Secondly, a full solution to getting URLs to render readably right-to-left requires either a modification to the Unicode bidi algorithm (which Mark Davis proposed) or a restriction on the characters that can be used for registering right-to-left domain names (e.g. only allow Arabic alphabetic characters in an Arabic domain name). Both of these are out of the scope of this document.

I think what is needed (independently of this document) is a specification for URLs that are safe to be drawn right-to-left. Then, if a browser recognizes a safe URL it can draw the URL right-to-left without concern. This specification can be advertised to domain name registrars and web companies. In theory we could then have the Googles, and Facebooks of this world using and advertising URLs that are right to left.

I am setting this issue as won't fix but if you disagree please comment here and I will reopen it.

comment:4 Changed 7 years ago by shawnste@…

Well, I disagree with pretty much every point :)

  • clearly everything won't be consistent because plain text that doesn't know how to detect an IRI isn't going to behave as expected.
  • I think that the importance isn't consistency between devices, but rather the ability for users to consistently transcribe the IRI. That includes not only display on devices, but input through whatever keyboards from sticky notes that were transcribed by hand from an IRI on the side of a bus.
  • Related, I don't think they can be "unnatural".
  • There's a lot of pressure to ensure that RTL domains are "correctly" rendered in RTL fashion. So I think we'd get a better job of consistency if the guidelines took that into account instead of having software developers trying to do something "better" in an inconsistent fashion.
  • Though fixing the BIDI Algorithm would help, it's not required. Indeed, the proposed behavior uses bidi override marks to get the desired behavior. The same thing can be done for RTL. Granted a better BIDI algorithm for IRIs would make "plain text" better, but it’s not required.
  • As noted, this isn’t necessarily easily gleaned from the script(s) being used, as some cultural and user preferences also influence it.

I disagree that there’s anything particularly interesting about “safe”. I think that as long as the sections are consistently from left to right or right to left it doesn’t matter whether it’s drawn http://www.microsoft.com or com.microsoft.www/ /:http. Indeed if that was the user preference, independent of the actual script, then they’d always be consistent for that user. If there does prove to be a spoofing problem with http://www.spoof.me.com/com.microsoft.www//:http type things, those are fairly easy for malware filters to detect. Also 90% of users can’t tell that http://www.microsoft.safe-secure.com isn’t a great place to enter a credit card #. At the machine level, the rendering is irrelevant since it’s always stored the same way.

I really need a way, even optional if need be, of rendering for RTL before I can "sign off" on this draft :)

comment:5 Changed 7 years ago by duerst@…

  • Resolution wontfix deleted
  • Status changed from closed to reopened

Reopening it for Shawn. We definitely need wider consensus on how to proceed with this.

comment:6 Changed 7 years ago by duerst@…

(from Larry)

My read on the situation:

It would be helpful if we could get some agreed text describing the nature of the problem. It sounds to me that there might be agreement on the problem (more or less), just not on whether there are feasible (partial) solutions.

If we have agreement on the problem statement, then we can:

  • document partial solutions (with caveats)
  • say we don't believe there are any feasible solutions at this time

It would also be useful to get a survey of what current implementations are actually doing now, along with some concrete examples of the nature of the problems.

I really need a way, even optional if need be, of rendering for RTL before I can "sign off" on this draft :)

There's no magic, just "rough consensus and running code":

  • if all of the implementations agree, then we can document that.
  • If there are multiple implementations currently, we can try to pick one.
  • if we don't like any of the implementations, we can say so.
  • If there are no implementations or even demos or samples of implementations, we shouldn't hold our breath hoping one will appear.

Larry

comment:7 Changed 7 years ago by duerst@…

(from Shawn)

IE is not great right now; it gets into the mixed-up situations we all know are undesirable.

A "concrete example" seems hard, but one that I'm keen on is a partial web name on the side of a bus, in Arabic, eg: CCC.BBB.AAA. Note that I'm intentionally leaving out the http:// and any default.html or whatever. I have a difficult time imagining any Arabic speaker copying that onto a notepad other than by writing from right to left. I also expect that they would then naturally type it the same way they wrote it. I think we have to build from there, that's how 90% of the people use an IRI. Nobody's going to type the http://, particularly in Arabic, because it requires a keyboard change, and the browser will add it for them.

In those 90% useful cases there is no mixed Latin/Arabic, it's just a domain name. It's nice if we present mixed up stuff a little more orderly, but nobody cares about the part after the domain name.

I believe that we need to allow the same thing we have with LTR ordering, except for RTL. Where it gets confusing to me is when you choose LTR or RTL behavior. A few options seem possible:

  • User Preference
  • System/Application Preference (eg: I'm looking at an Arabic web site, so I'll show RTL labels. I'm looking at an English web site, I'll show LTR labels).
  • If there're any RTL characters, do the whole thing as RTL
  • Restrict the RTL/LTR test to the primary part of the IRI, eg: domain.

Caveats are that many of those probably allow homographs in some cases (maybe not User Preference, since they'd know it'd always be one direction or the other). I'm not worried about those cases as SmartScreen will easily filter those out if necessary. It'd be harder if we didn't force RTL/LTR on the whole thing (eg: had current BIDI algorithm behavior).

-Shawn

comment:8 Changed 7 years ago by duerst@…

Hello Shawn,

Two points of clarification:

  • At http://trac.tools.ietf.org/wg/iri/trac/ticket/121#comment:4, you write "Indeed, the proposed behavior uses bidi override marks to get the desired behavior.", but it's not override marks, it's embedding marks. Otherwise, not a single RTL domain label or path component would be readable. (maybe that's what you meant, but in that case, please be careful with terminology)
    Also, a bidirectional relative IRI reference that only contains strong right-to-left characters and weak characters (such as symbols) and that starts and ends with a strong right-to-left character and appears in a text with right-to-left base directionality (such as used for Arabic or Hebrew) and is preceded and followed by whitespace and strong characters does not need an embedding.
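
The part of the quoted condition that concerns the IRI reference itself can be checked mechanically. The following is a rough sketch, not text from the draft: the function name is hypothetical, "weak" is taken literally as the weak bidi classes of the Unicode bidi algorithm (so neutral characters such as "?" are rejected), and the surrounding-context conditions (base directionality, neighbouring whitespace and strong characters) are not checked at all.

```python
import unicodedata

STRONG_RTL = {"R", "AL"}
# Weak bidi classes as defined by the Unicode bidi algorithm (assumption:
# the quoted text means exactly these when it says "weak characters").
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}


def may_skip_embedding(ref):
    """Check only the reference-internal part of the quoted condition."""
    if not ref:
        return False
    classes = [unicodedata.bidirectional(ch) for ch in ref]
    # Must start and end with a strong right-to-left character.
    if classes[0] not in STRONG_RTL or classes[-1] not in STRONG_RTL:
        return False
    # Everything in between must be strong RTL or weak.
    return all(c in STRONG_RTL or c in WEAK for c in classes)
```

For instance, a reference made of Hebrew letters separated by "/" passes, while one containing any Latin letter does not.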

comment:9 Changed 7 years ago by adil@…

I think I should clarify what I meant:

Right now if you have a 'simple' URL that is all in Arabic it will be rendered right-to-left even given the restrictions of this document. So, using the normal bidi notation (capitals for rtl characters):
Logical order:

http://ABC.DEF.GHI/JKL

Appears as:

http://LKJ/IHG.FED.CBA

Or without the http.. :

LKJ/IHG.FED.CBA

This is why I believe the current situation satisfies the 'side of a bus URL' criterion for a small subset of right-to-left URLs.

The point is to strictly define what that subset is and create tools and documents to verify it so that web sites and browsers can display them. Also within this subset of URLs it is possible to have browsers draw these in the URL bar right-to-left and right aligned. But I do not know if this document is the place for such a definition.
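
The logical-to-visual mapping in the example above can be reproduced with a toy model of the capitals notation. This is a deliberately simplified sketch, not the Unicode bidi algorithm: it assumes a left-to-right base direction, treats uppercase ASCII letters as strong RTL and lowercase letters as strong LTR, treats every other character as a neutral that joins an RTL run only when it sits between two RTL letters, and ignores digits and mirroring. The function name is hypothetical.

```python
import re

# An RTL run starts and ends with an uppercase (pseudo-RTL) letter and has
# only uppercase letters and non-letters in between.
_RTL_RUN = re.compile(r"[A-Z](?:[^A-Za-z]*[A-Z])*")


def visual_order(logical):
    """Reverse each pseudo-RTL run, leaving LTR text in place."""
    return _RTL_RUN.sub(lambda m: m.group(0)[::-1], logical)


print(visual_order("http://ABC.DEF.GHI/JKL"))  # http://LKJ/IHG.FED.CBA
print(visual_order("ABC.DEF.GHI/JKL"))         # LKJ/IHG.FED.CBA
```

As soon as a lowercase (LTR) label is mixed in, the run is split and the labels no longer read in one direction, which is the situation the subset definition would have to exclude.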
