(usagi-users 03505) Mobility: HA hangs after MN comes home



Hi all,

While testing IPv6 mobility, we ran into the following issue on the Home Agent (HA) node.

(I originally sent this email to the mipl-devel list, but as this issue seems to be linked to ip6-ip6 tunnels, I have cc'ed the usagi-users list.)

In a particular configuration, with UDP streams flowing between the Mobile Node (MN) and a
correspondent node (CN) located on the home link (without mobility support), the HA
goes wrong:


- The MN moves to a foreign link. Everything is fine (BU, traffic, etc.).
- But when the MN comes back to the home network, the following message
  is printed on the HA console:

"unregister_netdevice: waiting for ip6tnl1 to become free. Usage count=169."

(This message is printed every few seconds until the HA is powered off.)

Then any action related to network interfaces hangs on the HA:
- mip6d hangs and we can't quit it,
- any ifconfig command hangs,
- we can't even shut down cleanly because the shutdown process hangs
  too. This is bad! :)

It seems the HA tries to delete the IP6-IP6 tunnel it used to forward
UDP packets between the MN and the CN, and gets stuck in netdev_wait_allrefs()
(net/core/dev.c). The tunnel device has a non-zero refcount, and this
refcount never decreases.

Notes:
======
- We don't see this problem with TCP streams.
- We don't see this problem with UDP when the CN is not on the home link.
- The version of MIPL we use is the development version based on a
2.6.11 kernel.

We have the following test-bed:

                             |  -----
                             |--|R2a|----------- L2a (2201::/64)
                             |  -----   |
                             |          |
              L2 (2200::/64) |        ------
                             |        | MN |
                             |        ------
                             |
         ------           ------
         | HA |           | R2 |
         ------           ------
   ________|_________________|_________
         |
         |    L0 (2000::/64) Home Link
         |
         | 2000::cc
       ------
       | CN |
       ------


The problem is linked to this particular configuration: having the CN on the home link and using UDP streams seem to be the factors that trigger this bug.

If we move the CN to L2 and run the same test, everything works fine.


This is as far as we got in our analysis of this problem.

We suspect this is a bug in the kernel code that manages ip6-ip6
tunnels.

Any advice on solving this problem is welcome.
Any idea why the refcount never decreases to 0 in this case?
Who is supposed to increment/decrement the device refcount?

Mark Huth replied to my message on the mipl-devel list and attached a patch for a similar refcount problem (http://oss.sgi.com/archives/netdev/2005-02/msg00149.html). Unfortunately, the patch doesn't seem to solve my current problem.

Thanks for your help,

Benjamin

--
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D