Opened 9 years ago

Closed 9 years ago

#43 closed technical (fixed)

Issues related to RLOC reachability (definition and mechanism)

Reported by: luigi@…
Owned by:
Priority: minor
Component: draft-ietf-lisp
Severity: -
Keywords:
Cc:

Description (last modified by luigi@…)

This ticket groups together several inter-related issues concerning RLOC reachability raised by both D. Papadimitriou and Y. Rekhter in their respective reviews. (If people identify specific and isolated issues that they would like to see in different tickets, please let me know.)

From D. Papadimitriou's Review:

Section 6.1.5: where is the “Data Probe packet” format actually defined?

Section 6.1.5 states “The RLOCs in the Map-Reply are the
globally-routable IP addresses of the ETR but are not necessarily
reachable; separate testing of reachability is required.” Section 6.3
does not actually define any procedure, nor any criteria for deciding
whether the RLOC is reachable. The key question is this: if the ITR
persists in testing reachability and there is no criterion by which
that testing would stop, the traffic would be forwarded by means of
the “separate/alternative topology” forever.

Section 6.3 proposes “encapsulated traffic” based procedures; thus, if
there is no traffic, no RLOC reachability “test” is possible. On the
other hand, that section states that certain “ICMP exchanges” are
documented, but reachability of an RLOC does not mean availability of
the associated EID. Is there no actual “EID-to-RLOC” testing procedure
defined in the document?

What is troubling here is that the “introductory” sections of the
document refer to “functional” separations, but all the techniques
described in this document (and for RLOC reachability testing in
particular) result in a complex intertwining of control messaging
as part of the forwarding plane and forwarding data as part of the
control plane of routers. The latter is typical of Data Probe
processing. If this is the design choice of LISP, so be it, but please
prove that it offers 1) better functionality compared to the “current”
design and 2) better cost/performance. It is far from obvious that the
complexity concentrated at the TR by the proposed design offers any
real compelling argument. This may also defeat the argument stated
in the introduction that “network deployment” is facilitated by
network-based solutions.

Section 6.3, Point 1: what is the use of the “Loc-Status-Bits” if there
is no return traffic (or, more precisely, no return traffic passing via
this ETR, i.e. the ETR is not an ITR for the source RLOC-EID pair)? The
ETR may process this information but never use it.

Section 6.3 states “Each ITR can thus observe the presence or lack of
a default route originated by the others to determine the Locator
Status Bits it sets for them.” This does not tell which technique to
use when no default routes are used at all (under “non-normal”
circumstances; what defines normality in this context?).

Section 6.3.1 states “This mechanism does not completely solve the
forward path reachability problem as traffic may be unidirectional.”
The forwarding path may simply be asymmetric (nothing requires the
reverse path to reach the source RLOC in the case of dual-attached
sites). This shortcoming defeats the mechanism, as an ITR is not
“aware” of the EID-to-RLOC mappings of its neighboring ITRs (connected
to the same site). This mechanism can only be safely used if the RLOC
pair between two sites is unique and remains unique.

Section 6.3.2, RLOC Probing by means of the “control plane”: this opens
the question of what is actually probed, the “RLOC” or the EID-to-RLOC
entries of the ETR’s database. The difference arises because the latter
are static entries, whereas the liveness of the former depends on the
liveness of the incoming interface. That is, entries can be available
in the database, but if database entries are not in sync with the
actual RLOC status, there is no possibility of detecting RLOC
reachability by means of this mechanism.

From Y. Rekhter's Review:
This is Comment 27

Section 6.3
Several mechanisms for determining RLOC reachability are currently
defined:

  1. An ETR may examine the Loc-Status-Bits in the LISP header of an
     encapsulated data packet received from an ITR. If the ETR is also
     acting as an ITR and has traffic to return to the original ITR
     site, it can use this status information to help select an RLOC.

  2. An ITR may receive an ICMP Network or ICMP Host Unreachable message
     for an RLOC it is using. This indicates that the RLOC is likely
     down.

  3. An ITR which participates in the global routing system can
     determine that an RLOC is down if no BGP RIB route exists that
     matches the RLOC IP address.

  4. An ITR may receive an ICMP Port Unreachable message from a
     destination host. This occurs if an ITR attempts to use
     interworking [INTERWORK] and LISP-encapsulated data is sent to a
     non-LISP-capable site.

  5. An ITR may receive a Map-Reply from a ETR in response to a
     previously sent Map-Request. The RLOC source of the Map-Reply is
     likely up since the ETR was able to send the Map-Reply to the ITR.

  6. When an ETR receives an encapsulated packet from an ITR, the source
     RLOC from the outer header of the packet is likely up.

  7. An ITR/ETR pair can use the Locator Reachability Algorithms
     described in this section, namely Echo-Noncing or RLOC-Probing.

For the first mechanism to work it is not sufficient for the ETR to
have traffic "to return to the original ITR site" - the ETR has to
have traffic to return to the original ITR. And if the ETR has no
traffic to return to the original ITR (even if the ETR has traffic to
return to some other ITR of the site), then the first mechanism is
useless.

Moreover, the first mechanism is predicated on the assumption that the
Loc-Status-Bits contain up to date information about reachability
of all ETRs at the site. The document describes how the ITR obtains
this information only for the case of CE-based or intra-site based
ITRs. Thus it does not cover the case where the ITR is the
PE. Moreover, for the case of CE-based ITR the scheme does not work if
the site runs RIP as its IGP. Furthermore it assumes that just because
the correspondent ITR can reach the given RLOC, the ETR also
will. Since IP connectivity is not always transitive in this
way, the fact that the corresponding ITR can reach the given RLOC does
not mean that the ETR also can.
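
To make the non-transitivity concern concrete, here is a minimal sketch. It assumes that bit i of the Loc-Status-Bits field (counting from the least significant bit) describes the i-th locator in the site's ordered locator set; the function names and the ETR's "own view" set are hypothetical and are not taken from the draft.

```python
from __future__ import annotations


def advertised_up(lsb: int, locator_count: int) -> list[int]:
    """Indexes of locators whose status bit is set.

    Assumption: bit i (from the least significant bit) maps to the
    i-th locator in the site's ordered locator set.
    """
    return [i for i in range(locator_count) if (lsb >> i) & 1]


def usable_by_etr(lsb: int, locator_count: int, etr_view: set[int]) -> list[int]:
    """Locators advertised as up AND independently reachable from the ETR.

    etr_view is a hypothetical set of locator indexes the ETR has
    confirmed by its own means; the status bits alone cannot supply it.
    """
    return [i for i in advertised_up(lsb, locator_count) if i in etr_view]


# The ITR advertises locators 0 and 2 of a three-locator site as up.
advertised = advertised_up(0b101, 3)
# Hypothetically, the ETR itself can only reach locator 0.
confirmed = usable_by_etr(0b101, 3, {0})
```

In this sketch locator 2 is advertised as up but is unusable from the ETR's side, which is exactly the gap the status bits cannot reveal.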

The second mechanism depends on ICMP Unreachable. As such it may
result in sustained traffic blackholing due to ICMP Unreachable being
dropped along the path. Also, how would that mechanism handle DoS
attacks caused by spoofed ICMP Unreachables? Finally, if the ITR
is within a site, and behind a firewall, would the firewall pass ICMP
Unreachables in the first place?

The third mechanism is likely to be of very limited use, as an ETR
going down is unlikely to cause the route that covers the RLOC of that
ETR to be withdrawn (unless this is a /32 route).
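
A small sketch of why a covering route says little about the ETR itself. The prefixes are documentation addresses chosen for illustration, and `has_covering_route` is a hypothetical helper, not part of any LISP implementation.

```python
import ipaddress


def has_covering_route(rib, rloc):
    """True if any prefix in the (hypothetical) RIB covers the RLOC address."""
    addr = ipaddress.ip_address(rloc)
    return any(addr in ipaddress.ip_network(prefix) for prefix in rib)


etr_rloc = "192.0.2.1"

# Case 1: the RLOC is covered by an aggregate. The aggregate stays in the
# RIB whether or not the ETR at 192.0.2.1 is alive.
rib_with_aggregate = ["192.0.2.0/24", "198.51.100.0/24"]
aggregate_says_up = has_covering_route(rib_with_aggregate, etr_rloc)

# Case 2: the RLOC was announced as a /32 that has since been withdrawn,
# so no matching route remains.
rib_after_host_withdrawal = ["198.51.100.0/24"]
host_route_says_up = has_covering_route(rib_after_host_withdrawal, etr_rloc)
```

Only in the second case does the absence of a route carry information about the ETR.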

The fourth mechanism is applicable only for LISP interworking between
a LISP and a non-LISP site.

The fifth and the sixth mechanisms do provide the indication that the
ETR is up, but do not provide the information when the ETR is down. As
such they are not applicable for determining unreachability, as
unreachability requires the ability to determine that the ETR is down.

That leaves us with the seventh mechanism. This mechanism offers two
options: Echo Nonce (section 6.3.1) and RLOC Probing (section
6.3.2). The Echo Nonce mechanism is applicable only when there is a
bidirectional flow between a pair of RLOCs. The RLOC Probing mechanism
has scaling limitations - quoting from 6.3.2:

The major disadvantage of RLOC Probing is in the number of control
messages required and the amount of bandwidth used to obtain those
benefits, especially if the requirement for failure detection times
are very small.

Given the above, what are the mechanisms that are available when xTRs
are PEs (as described in 8.3), and what are their scaling
characteristics ?
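
As a rough feel for the scaling question, the following back-of-the-envelope sketch assumes one probe per cached RLOC per probe interval; the site counts and intervals are hypothetical numbers, not figures from the draft.

```python
def probes_per_second(remote_sites: int, rlocs_per_site: int, interval_s: float) -> float:
    """Probes one ITR sends per second under the stated assumption:
    one probe per cached remote RLOC per probe interval."""
    return remote_sites * rlocs_per_site / interval_s


# Hypothetical ITR caching mappings for 1000 remote sites, 2 RLOCs each.
relaxed = probes_per_second(1000, 2, 60.0)   # 60-second failure detection
aggressive = probes_per_second(1000, 2, 1.0)  # 1-second failure detection
```

Under these assumptions the load grows linearly with the number of cached RLOCs and inversely with the detection time, so a PE-based xTR serving many sites multiplies the figure accordingly.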

Consider an example of site A with two xTRs A1 and A2, and site B with
two xTRs, B1 and B2. Now imagine that outbound traffic from A to B is
using the ITR component of A1, and talking to the ETR component of
B1. But the traffic from B back to A uses the ITR component of B2 and
is sending to the ETR component of A2. So each ITR->ETR flow is
unidirectional, even though all four devices are
fully capable of bidirectional communication. What are the options for
the RLOC reachability in this scenario ?
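
The A1/B1/B2/A2 scenario can be sketched as a set of directed data flows. The check below encodes only the condition stated earlier for Echo Nonce, namely a bidirectional flow between the same pair of RLOCs; the function name and the flow representation are assumptions made for illustration.

```python
from __future__ import annotations


def echo_nonce_usable(flows: set[tuple[str, str]], itr: str, etr: str) -> bool:
    """Echo Nonce needs data going itr -> etr and data coming back etr -> itr."""
    return (itr, etr) in flows and (etr, itr) in flows


# Asymmetric case from the example: A1 -> B1 outbound, B2 -> A2 on return.
asymmetric = {("A1", "B1"), ("B2", "A2")}
a1_b1 = echo_nonce_usable(asymmetric, "A1", "B1")
b2_a2 = echo_nonce_usable(asymmetric, "B2", "A2")

# Symmetric case for contrast: the return traffic uses the same RLOC pair.
symmetric = {("A1", "B1"), ("B1", "A1")}
a1_b1_symmetric = echo_nonce_usable(symmetric, "A1", "B1")
```

In the asymmetric case neither ITR has a return flow from the ETR it is sending to, so under this reading only RLOC Probing remains available.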

This is Comment 28

Section 6.3

The ITR can test the reachability of the unreachable Locator by
sending periodic Requests. Both Requests and Replies MUST be rate-
limited. Locator reachability testing is never done with data packets
since that increases the risk of packet loss for end-to-end sessions.

How would one use data packets for testing locator reachability ?

Change History (6)

comment:1 Changed 9 years ago by luigi@…

  • Description modified (diff)
  • Summary changed from Issues related to RLOC reachability (definition and mechanism) raised by D. Papadimitriou in his review to Issues related to RLOC reachability (definition and mechanism)

comment:2 Changed 9 years ago by luigi@…

  • Description modified (diff)

comment:3 Changed 9 years ago by luigi@…

  • Description modified (diff)

comment:4 Changed 9 years ago by luigi@…

  • Description modified (diff)

comment:5 Changed 9 years ago by luigi@…

  • Resolution set to fixed
  • Status changed from new to resolved

Text has been modified to fix the issue.

comment:6 Changed 9 years ago by luigi@…

  • Status changed from resolved to closed