Opened 8 years ago

Closed 7 years ago

#118 closed defect (fixed)

What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

Reported by: duerst@…
Owned by: adil@…
Priority: major
Milestone:
Component: bidi-guidelines
Version:
Severity: -
Keywords: bidi
Cc:

Description

What term should we use for the kind of text that the Unicode Bidi Algorithm was designed for? RFC 3987 and 3987bis use "running text". bidi-guidelines (-01) changed to "plain text".

We have a definition for running text at
http://tools.ietf.org/html/draft-ietf-iri-3987bis-10#section-1.3:

running text: Human text (paragraphs, sentences, phrases) with
syntax according to orthographic conventions of a natural
language, as opposed to syntax defined for ease of processing by
machines (e.g., markup, programming languages).

In RFC 3987, there are two uses:

The Unicode Bidirectional Algorithm is designed mainly for running text.

[UNIXML] is written in the context of running text rather than in that of identifiers.

The first use moved to bidi-guidelines, but the second use is still in 3987bis. In both cases, the term "plain text" isn't appropriate, because the main use of "plain text" is to distinguish from "fancy text", i.e. text with styling,... But in both usages above, the distinction between "plain text" and "fancy text" is irrelevant. See also http://en.wikipedia.org/wiki/Plain_text.

Change History (9)

comment:1 Changed 8 years ago by adil@…

Mati writes:

Since the question is related to Unicode (the kind of text that the Unicode Bidi Algorithm was designed for), maybe we should check the Unicode definition for "plain text". In the Unicode glossary (http://unicode.org/glossary/#P), we find:

Plain Text. Computer-encoded text that consists only of a sequence of code points from a given standard, with no other formatting or structural information. Plain text interchange is commonly used between computer systems that do not share higher-level protocols. (See also rich text.)

Personally, I find this definition appropriate for "the kind of text that the Unicode Bidi Algorithm was designed for", and I prefer "plain text" over "running text". It is also my experience that "plain text" is much more in use in Unicode circles than "running text".

comment:2 Changed 8 years ago by adil@…

Martin writes:

I agree that if we look at the distinction between plain text and rich text, then it is appropriate to say that the Bidi Algorithm has been designed for plain text rather than for rich text. But in the two places in the spec where we have been using "running text" for the past seven or more years, it's NOT this distinction between plain text and rich text that we are after.

To be more specific, it's irrelevant whether an IRI shows up in a plain text file (.txt) or a rich text file (e.g. MS Word, HTML with stylesheets,...). We have exactly the same problems with bidi IRIs in plain text as we have in rich text. This is because although the Bidi Algorithm was designed for plain text, essentially the same algorithm is used for rich text. MS Word usually has a few tweaks where it does not behave exactly like the Unicode Bidi Algorithm (the latest of these is the special behavior regarding parentheses that was presented and discussed at last year's IUC), but the basics are the same. Rendered HTML also uses the Unicode Bidi Algorithm for its basic features.

What the spec is referring to is the fact that the Bidi Algorithm was designed for sequences of characters, words, and punctuation such as they turn up in letters, newspaper articles, explanatory text in books, and so on, as opposed to sequences of characters as they turn up in artificial stuff such as IRIs, markup source, programming languages, and so on.

I'm not sure whether "running text" is the best term for this, but I am very sure "plain text" is wrong for where we want to use it, because IRIs, markup source, programs, and so on are in many if not most cases plain text. Running text at least seems to come close, see e.g. the definition at http://en.wiktionary.org/wiki/running_text.

comment:3 Changed 8 years ago by adil@…

Addison writes:

I'm pretty sure that 'running text' is too limiting as well. Is there a need for a specialized term here at all? How about 'text' as the term? Then even such "off-line" formats as napkins and bus sides qualify. As in: "Where an IRI appears in text...."

I notice that the term "running text" in section 1.3 appears exactly once in the document and there only provides a sort of informative explanation of UNIXML.

comment:4 Changed 8 years ago by adil@…

  • Owner changed from draft-ietf-iri-3987bis@… to adil@…
  • Status changed from new to assigned

comment:5 Changed 8 years ago by adil@…

  • Keywords bidi added

comment:6 Changed 8 years ago by adil@…

  • Component changed from 3987bis to bidi-guidelines

comment:7 Changed 8 years ago by adil@…

I think the best description is:
The Unicode Bidirectional Algorithm is designed for general purpose text

comment:8 Changed 8 years ago by duerst@…

The proposal by Adil ("The Unicode Bidirectional Algorithm is designed for general purpose text") looks very good to me.

I had entered the component as "3987bis" originally, because there is a definition and one use of "running text" in 3987bis, too.

In line with Adil's proposal, I propose to change "[UNIXML] is written in the context of running text rather than in that of identifiers." to "[UNIXML] is written in the context of general purpose text rather than in that of identifiers."

There are two things we can do with the definition we currently have for running text: change it to a definition of general purpose text, or remove it. The changed definition would read:

general purpose text: Human text (paragraphs, sentences,
phrases) with syntax according to orthographic conventions of a
natural language, as opposed to syntax defined for ease of
processing by machines (e.g., markup, programming languages).

Because we use the term only once in each of two documents, and because we use it only in contrast, I propose to remove the definition.

comment:9 Changed 7 years ago by duerst@…

  • Resolution set to fixed
  • Status changed from assigned to closed

Using "general purpose text" as proposed by Adil. It was already implemented in bidi-guidelines. Also changed in 3987bis, and removed the definition, as proposed before.
