Opened 11 months ago

Last modified 7 weeks ago

#17 new issue

Interaction w/ FQ AQMs

Reported by: wes@…
Owned by: draft-ietf-tsvwg-l4s-arch@…
Priority: major
Milestone: L4S Suite - WGLC Preparation
Component: l4s-arch
Version:
Severity: -
Keywords:
Cc:

Description (last modified by wes@…)

There have been questions regarding:

  • whether there are possible poor interactions with FQ AQMs
  • if there is a problem, what the prevalence of such AQMs is on different Internet paths (to understand how serious an issue this may currently be)

This was discussed at IETF 105 (topic "B"). Jonathan Morton specifically mentioned it as a concern at the open microphone.

Change History (11)

comment:1 Changed 11 months ago by wes@…

  • Description modified (diff)

comment:2 Changed 11 months ago by wes@…

  • Description modified (diff)

comment:3 Changed 10 months ago by wes@…

This has been clarified and elaborated on by Jonathan on the mailing list: https://mailarchive.ietf.org/arch/msg/tsvwg/kERw1493r7SC6ggKj68rh_scp_0

Testing as described in that message thread should produce results that help resolve some of the uncertainty in this case.

Last edited 10 months ago by wes@…

comment:4 Changed 8 months ago by chromatix99@…

FQ algorithms such as DRR++ (documented in RFC 8290) are very robust at preventing a subset of flows with a different congestion response from degrading the throughput and delay available to other flows. The poor response noted under issue #16 is no exception: the FQ system can successfully contain the standing queue so that only the L4S flows are affected by their own slow response to CE marks.
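
As a minimal sketch of why that isolation holds (classic DRR only, leaving out the new/old-flow lists and the per-queue CoDel that RFC 8290 adds on top; the flow names and packet sizes below are made up for illustration):

```python
from collections import deque

QUANTUM = 1514  # byte credit added per flow per round (assumed MTU-sized)

def drr_round(queues, deficit):
    """Serve one DRR round: each backlogged flow gains QUANTUM bytes of credit
    and sends packets from its own queue until the credit runs out."""
    sent = []
    for flow, q in queues.items():
        if not q:
            deficit[flow] = 0          # idle flows carry no credit forward
            continue
        deficit[flow] += QUANTUM
        while q and q[0] <= deficit[flow]:
            size = q.popleft()
            deficit[flow] -= size
            sent.append((flow, size))
    return sent

# One unresponsive flow with a deep backlog competing with two ordinary flows.
queues = {
    "unresponsive": deque([1500] * 50),
    "web":          deque([1500, 600]),
    "voip":         deque([200] * 4),
}
deficit = {flow: 0 for flow in queues}

for r in range(3):
    served = drr_round(queues, deficit)
    print(f"round {r}: " + ", ".join(f"{f}:{s}B" for f, s in served))
# The unresponsive flow gets roughly one MTU-sized packet per round, the same
# byte share as everyone else, no matter how deep its own queue grows.
```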

However, even this excellent performance by FQ cannot completely mitigate the problem when the FQ node is immediately preceded by a single queue with only slightly higher throughput capacity, a situation which is presently fairly common when a smart CPE device is installed at one end of a dumb last-mile link. In this case the standing queue in the single FIFO can be observed to persist for several seconds, only slightly less time than the total standing queue does, and with almost the same peak induced delay.

This may be referred to as the "consecutive bottlenecks" problem, wherein queuing may occur at multiple nodes, as it takes time for the first queue to drain into later ones after receiving a burst of excess traffic. This is not actually a failure of the interaction between L4S and FQ, but of the interaction between L4S and the AQM overlaid on the FQ, such that in certain circumstances the benefit of the FQ system is diluted.
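
A rough back-of-the-envelope illustration of the timescale involved (all of the numbers below are assumptions chosen to be plausible for the tests discussed later in this ticket, not measurements): once the senders are ACK-clocked to the downstream shaper's rate, the upstream FIFO can only drain its backlog at the small difference between the two link rates.

```python
# Illustrative numbers only: a 50 Mbit/s smart-CPE shaper fed by a 52 Mbit/s
# dumb last-mile FIFO, with ~1.5 MB of excess left in the FIFO after an
# overshoot (e.g. slow start plus a slow response to CE marks).
access_rate  = 52e6 / 8   # bytes/s the dumb FIFO can drain at
shaper_rate  = 50e6 / 8   # bytes/s flowing back in once senders are ACK-clocked
excess_queue = 1.5e6      # bytes stranded in the FIFO

drain_rate = access_rate - shaper_rate            # net drain rate, bytes/s
print(f"net drain rate : {drain_rate * 8 / 1e6:.1f} Mbit/s")
print(f"time to drain  : {excess_queue / drain_rate:.1f} s")
print(f"peak FIFO delay: {excess_queue / access_rate * 1000:.0f} ms")
# With these numbers the FIFO holds a ~230 ms standing queue that takes ~6 s
# to drain, and every flow crossing it sees that delay, not just the L4S ones.
```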

comment:5 Changed 8 months ago by pete@…

Test results for consecutive bottlenecks are provided in Scenario 5 of the SCE-L4S "bakeoff" tests.

comment:6 Changed 8 months ago by jholland@…

I think it's maybe useful to add direct links to the most interesting l4s plots:

  • A spike lasting >10 s of >100 ms additional RTT in competing cross-traffic on a 50 Mbit/s link with 80 ms base RTT, in the typical existing RFC 3168 topology (an access drop-tail bottleneck followed by a configured-lower AQM bottleneck, as with gaming router deployments):

https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-09-13T045427-r1/l4s-s5-1/batch-l4s-s5-1-prague-50Mbit-80ms_fixed.png

  • Wider scale of the same run, I think? Shows the spike gets to almost 400 ms total (~300 ms additional):

https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-09-13T045427-r1/l4s-s5-1/batch-l4s-s5-1-prague-50Mbit-80ms_var.png

  • Similar effect at smaller scale for a 10 ms base RTT (a spike of ~60 ms RTT increase lasts for ~5 s).
  • Also visible: a long-term standing queue at ~10 ms additional (in the 80 ms base RTT case it does not appear until much later and is more like 5 ms):

https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-09-13T045427-r1/l4s-s5-1/batch-l4s-s5-1-prague-50Mbit-10ms_var.png

comment:7 Changed 8 months ago by pete@…

Thanks Jake. :) To answer the comment on the second plot: yes, that's the same result with variable scaling, such that the maximum TCP RTT value is 1/3 of the range of the axis. The fixed-scale plots are such that the maximum RTT value is 100 ms above the minimum, and are there for easier comparison between plots, whereas the variable-scale plots are guaranteed to show the maximum values.

comment:8 Changed 8 months ago by pete@…

After the L4S code update as of tag testing/5-11-2019, the TCP RTT spikes at flow start have been worked around, but they can still occur once TCP Prague is past slow-start exit, as in the following plot (see the green trace):

https://www.heistp.net/downloads/sce-l4s-bakeoff/bakeoff-2019-11-11T090559-r2/l4s-s5-2/batch-l4s-s5-2-prague-vs-cubic-50Mbit-80ms_fixed.png

comment:9 Changed 7 months ago by g.white@…

As discussed on the mailing list and during the IETF 106 tsvwg session (https://datatracker.ietf.org/meeting/106/materials/slides-106-tsvwg-sessa-72-l4s-drafts-00), this issue was the result of a bug introduced in the refactoring of tcp_dctcp.cc into tcp_prague.cc on July 30, 2019. Once that bug was fixed, the issue was confirmed to be eliminated.

  • the main result of concern was due to a bug in initializing the value of TCP Prague alpha, which has been fixed and demonstrated to resolve the latency impact that was spanning multiple seconds
  • the remaining short duration latency spike in the FIFO queue is seen in *all* congestion control variants tested, including BBRv1, NewReno, and Cubic, and is not specific to Prague
  • if the CoDel queue is upgraded to perform Immediate AQM on L4S flows, the latency spike can be largely avoided for TCP Prague flow startups

Based on this, I believe the consensus is that this issue can be closed.

Pete Heist's comment on Nov 11 points out a scenario in which a TCP Cubic flow starts up after a steady-state TCP Prague flow has been established. In this case, the TCP Prague "alpha" parameter has settled to a low value, and its cwnd is equal to the BDP (+5ms). When the Cubic flow starts up, the FQ scheduler more-or-less immediately cuts the available bandwidth for the TCP Prague flow to half its previous value (so the Prague cwnd is now approx. double what it should be), yet due to the slow evolution of the CoDel AQM control law, it takes several seconds for the queue to provide sufficient feedback to TCP Prague in order for it to reduce its cwnd. The result in the plot above is a period of 4 seconds in which the TCP Prague flow sees a larger queueing delay than would be desired.
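
To put a rough number on "the slow evolution of the CoDel AQM control law", here is a small sketch of how slowly CoDel's marking rate ramps up while the queue stays above target (the 100 ms interval is the RFC 8289 default; the packet rate and the assumption that the queue stays above target throughout are mine, not taken from the test runs):

```python
import math

INTERVAL = 0.100      # CoDel interval, the RFC 8289 default (100 ms)
PKTS_PER_SEC = 2100   # assumed: ~25 Mbit/s share of a 50 Mbit/s link, 1500 B packets

# CoDel control law: while the sojourn time stays above target, successive
# marks are spaced INTERVAL / sqrt(count) apart, so marking ramps up slowly.
t, count = INTERVAL, 1          # first mark roughly one interval after the queue builds
marks_by_second = [0, 0, 0, 0]
while t < 4.0:
    marks_by_second[int(t)] += 1
    count += 1
    t += INTERVAL / math.sqrt(count)

for sec, marks in enumerate(marks_by_second):
    print(f"second {sec}: {marks:4d} CE marks (~{marks / PKTS_PER_SEC:.1%} of packets)")
# Even in the fourth second CoDel is marking well under 10% of packets, so a
# DCTCP-style alpha stays small and the Prague flow's own queue persists for seconds.
```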

This is a separate phenomenon that only affects the TCP Prague flow - other flows are unaffected. Also, this phenomenon can be eliminated if the CoDel AQM is upgraded to perform Immediate AQM for L4S flows.
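
For concreteness, one way an FQ AQM can supply that immediate signal is a shallow per-packet step threshold for ECT(1) traffic alongside the normal CoDel law, loosely in the spirit of fq_codel's ce_threshold option. The sketch below is an assumption about how such a check could look, not a description of any specific patch (the 1 ms threshold and the should_mark helper are hypothetical):

```python
CE_THRESHOLD_MS = 1.0   # assumed shallow step threshold for ECT(1) traffic

def ce_mark_decision(is_ect1, sojourn_ms, codel):
    """Return True if this packet should be CE-marked.

    ECT(1) (L4S) packets are marked as soon as their per-packet sojourn time
    exceeds a shallow threshold, so feedback arrives within one RTT.  Classic
    traffic still goes through the CoDel control law, whose marking rate only
    ramps up over many intervals (see the sketch in comment 9 above).
    """
    if is_ect1 and sojourn_ms > CE_THRESHOLD_MS:
        return True
    return codel.should_mark(sojourn_ms)   # hypothetical CoDel-law helper
```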

comment:10 Changed 5 months ago by chromatix99@…

Coming back to this after it was mentioned on the list…

It's important to remember that this behaviour is not "L4S traffic encountering a non-L4S-compliant AQM", as Comment 9 implies. Rather, it is an increasingly common configuration of an RFC-3168-compliant network being subjected to non-RFC-3168-compliant traffic. It is only because throughput fairness is enforced by a robust FQ component that the effect on the competing, RFC-compliant traffic is minimised.

In that context, I think the slow response of TCP Prague to an RFC-3168 AQM is something the L4S team should be more concerned about than they are, and that is why this issue was opened in the first place - to show that the problem is noticeable not only with single-queue AQMs but also with FQ. Even if the problem is contained to the TCP Prague flow, the latency performance is clearly impaired relative to conventional traffic, contrary to L4S' stated goal of improving latency.

comment:11 Changed 7 weeks ago by pete@…

Adding to this issue (since it's FQ-related) that hash collisions in FQ AQMs do occur, and how often they cause problems depends on a few things, including the size of the hash table (1024 buckets by default for fq_codel), the number of flows in the queue, and the mix of flow types. A calculation of the likelihood of problems is here:

https://docs.google.com/spreadsheets/d/1hOgTTZCKwR8f05Jjb3otJukh4gT6CKv-pnAJRqXEHLI/edit#gid=0
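
A quick back-of-the-envelope version of the same kind of estimate (a sketch assuming flows hash uniformly into fq_codel's default 1024 buckets; it is independent of, and much cruder than, the linked spreadsheet):

```python
BUCKETS = 1024   # fq_codel's default number of flow buckets

def p_new_flow_collides(n_existing):
    """Chance a newly arriving flow hashes into an already-occupied bucket."""
    return 1 - (1 - 1 / BUCKETS) ** n_existing

def p_any_collision(n_flows):
    """Chance that at least two of n_flows flows share a bucket (birthday problem)."""
    p_all_distinct = 1.0
    for i in range(n_flows):
        p_all_distinct *= (BUCKETS - i) / BUCKETS
    return 1 - p_all_distinct

for n in (10, 50, 100, 300):
    print(f"{n:3d} flows: new flow collides {p_new_flow_collides(n):5.1%}, "
          f"any collision {p_any_collision(n):6.1%}")
```

A collision matters here because flows that share a bucket also share a queue and its AQM state, so the per-flow isolation discussed earlier in this ticket is lost for exactly those flows.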
