Opened 12 months ago

Last modified 7 weeks ago

#16 new issue

L4S - Interaction w/ 3168-only ECN AQMs

Reported by: wes@…
Owned by: draft-ietf-tsvwg-l4s-arch@…
Priority: major
Milestone: L4S Suite - WGLC Preparation
Component: l4s-arch
Version:
Severity: -
Keywords:
Cc:

Description (last modified by wes@…)

L4S senders need to safely transit non-L4S (e.g. RFC 3168-only) AQMs that are on the path of a data flow.

This was discussed at IETF 105 (issue "A").

Jonathan Morton has explained the concern in greater detail and raised it at the open microphone.

An ancillary question is how prevalent RFC 3168-only AQMs are on the Internet. Observations shared by Jake Holland suggest there is some deployment.

Change History (12)

comment:1 Changed 11 months ago by wes@…

  • Summary changed from Interaction w/ 3168-only ECN AQMs to L4S - Interaction w/ 3168-only ECN AQMs

comment:2 Changed 11 months ago by wes@…

  • Milestone changed from L4S Suite - WGLC to L4S Suite - WGLC Preparation

Milestone renamed

comment:3 Changed 11 months ago by wes@…

  • Description modified (diff)

comment:4 Changed 8 months ago by chromatix99@…

Testing subsequent to IETF-105 appears to substantiate this concern. In its present form, an L4S sender made available for public testing reacts very slowly to congestion signals produced by existing Codel-family AQMs. Induced queuing delays reach approximately 125ms over a 10ms baseline RTT, and a standing queue persists for several seconds before being brought under control.

In the single-queue case this also causes competing TCP flows to collapse, due to the high signalling rate (2 CE marks per RTT) needed to keep L4S flows in steady-state, compared to the low signalling rate (many RTTs between single CE marks) expected by RFC-3168 compliant TCPs.
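
The difference between the two signalling rates can be illustrated with a minimal sketch. It is not the sender used in the tests: the scalable flow is modelled with a DCTCP-style response (RFC 8257) as a stand-in for an L4S sender, the classic flow with Reno-style halving on any CE mark, and the function names, the window floor of 2 packets, and the starting values are all assumptions made for illustration.

```python
def classic_step(cwnd, marks):
    """One RTT of a Reno-style RFC 3168 sender (assumed model).

    Additive increase of one packet per RTT, then a single halving
    if any CE mark arrived during that RTT.
    """
    cwnd += 1.0
    if marks > 0:
        cwnd = max(cwnd / 2.0, 2.0)  # floor of 2 packets is an assumption
    return cwnd


def scalable_step(cwnd, marks, alpha, g=1.0 / 16):
    """One RTT of a DCTCP-style sender (RFC 8257), used as a stand-in.

    alpha is the smoothed fraction of marked packets; the window is
    reduced in proportion to alpha rather than halved.
    """
    cwnd += 1.0
    frac = min(marks / cwnd, 1.0)
    alpha = (1.0 - g) * alpha + g * frac
    if marks > 0:
        cwnd = max(cwnd * (1.0 - alpha / 2.0), 2.0)
    return cwnd, alpha


def simulate(rtts=50, start=100.0, marks_per_rtt=2):
    """Apply the same marking rate to both models for a number of RTTs."""
    classic = scalable = start
    # Assume the scalable flow starts with alpha already near steady state.
    alpha = marks_per_rtt / start
    for _ in range(rtts):
        classic = classic_step(classic, marks_per_rtt)
        scalable, alpha = scalable_step(scalable, marks_per_rtt, alpha)
    return classic, scalable


if __name__ == "__main__":
    classic, scalable = simulate()
    print(f"classic cwnd:  {classic:.1f} packets")
    print(f"scalable cwnd: {scalable:.1f} packets")
```

Under these assumptions, 2 CE marks per RTT leave the DCTCP-style window roughly where it started, while the same marking rate drives the halving sender down to the window floor. CUBIC and other RFC 3168 senders use different decrease factors, so the numbers are indicative only.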

Codel is documented in RFC-8289. Codel AQMs are deployed by at least two known European ISPs (Free.fr and Telenor) on all their last-mile ADSL/VDSL links, and in the majority of MacOS X and Linux hosts presently deployed, as well as in a substantial number of CPE devices. Clearly, this is a serious incompatibility which affects deployability of L4S.

comment:5 Changed 8 months ago by ietf@…

Are these deployments Codel or FQ_Codel? If FQ_Codel, other flows will not be affected as long as the bottleneck stays in the FQ_Codel node (see issue 17).

Also, the original issue says "There seems to be some deployment, based on observations shared by Jake Holland." Again, we don't yet know whether that's FQ or not. There's no problem between TCP Prague and FQ_* with RFC3168 ECN.

The question is whether there are any /single/-queue RFC3168 ECN AQMs out there.

comment:6 Changed 8 months ago by pete@…

Test results for single queue and FQ AQMs (fq_codel with and without the 'flows 1' parameter) are available in Scenario 2 and Scenario 3 of the SCE-L4S "bakeoff" tests.

comment:7 Changed 8 months ago by ietf@…

Discussion paper published giving rationale and pseudocode for an algorithm we are implementing and evaluating in TCP Prague.

TCP Prague Fall-back on Detection of a Classic ECN AQM

We'll publish evaluation results as soon as we have them.

Email thread opened for discussion. I suggest iccrg should be the lead forum if cross-posting gets irritating.

comment:8 Changed 8 months ago by jholland@…

Thanks Bob, that looks like a solid attempt, and it'll be interesting to see the tests.

Of note: Dave mentioned that he believes fq_codel in ns3 differs from the Linux implementation, so it might be worth trying this with a real kernel, just to be sure.

comment:9 Changed 8 months ago by jholland@…

For reference, the latest test results without detection:

https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-11-09T193036-r2/l4s-s3-2/batch-l4s-s3-2-cubic-vs-prague-50Mbit-10ms_fixed.png

comment:10 Changed 8 months ago by pete@…

Here's the same plot from the latest result using the L4S tag testing/5-11-2019 (sorry about the broken image, Jake; I had already pulled the prior plots after a full run :)

https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-11-11T090559-r2/l4s-s3-2/batch-l4s-s3-2-cubic-vs-prague-50Mbit-10ms_fixed.png

comment:11 Changed 8 months ago by pete@…

Here's a plot of long convergence times with fq_codel flows 1 (single queue AQM) at 80ms.

https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-11-11T090559-r2/l4s-s3-2/batch-l4s-s3-2-prague-vs-prague-50Mbit-80ms_fixed.png

comment:12 Changed 7 weeks ago by pete@…

Adding to this issue a link to the "false negatives" section of our recent results, which show cases in which classic bottleneck detection fails to detect RFC 3168 bottlenecks:

https://github.com/heistp/sce-l4s-ect1#false-negatives
