path: root/drivers/net/bonding
Age  Commit message  (Author)
2018-06-16  bonding: require speed/duplex only for 802.3ad, alb and tlb  (Andreas Born)
commit ad729bc9acfb7c47112964b4877ef5404578ed13 upstream. The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") puts the link state to down if bond_update_speed_duplex() cannot retrieve speed and duplex settings. Assumably the patch was written with 802.3ad mode in mind which relies on link speed/duplex settings. For other modes like active-backup these settings are not required. Thus, only for these other modes, this patch reintroduces support for slaves that do not support reporting speed or duplex such as wireless devices. This fixes the regression reported in bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547). Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Andreas Born <futur.andy@googlemail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Nate Clark <nate@neworld.us> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
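The rule this commit describes can be sketched roughly as follows. This is a simplified Python model rather than the driver's C code: `link_after_update` and `MODES_NEEDING_SPEED_DUPLEX` are hypothetical names, and the strings are the usual bonding mode names.

```python
# Modes that, per the commit message, rely on slave speed/duplex settings.
MODES_NEEDING_SPEED_DUPLEX = {"802.3ad", "balance-alb", "balance-tlb"}


def link_after_update(mode, speed_duplex_known):
    """Return the link state for a slave whose carrier is up.

    Hypothetical helper: the link is forced down only when the mode
    needs speed/duplex and the device cannot report them.
    """
    if not speed_duplex_known and mode in MODES_NEEDING_SPEED_DUPLEX:
        return "down"
    return "up"


# A wireless slave that cannot report speed/duplex stays usable
# in active-backup, but not in 802.3ad.
wireless_in_active_backup = link_after_update("active-backup", False)
wireless_in_8023ad = link_after_update("802.3ad", False)
```

Under these assumptions, only the three listed modes ever take the slave down for missing speed/duplex data.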
2018-06-16  bonding: fix active-backup transition  (Mahesh Bandewar)
commit 3f3c278c94dd994fe0d9f21679ae19b9c0a55292 upstream. Earlier patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") made an attempt to keep slave state consistent with speed and duplex settings. Unfortunately link-state transition is used to change the active link especially when used in conjunction with mii-mon. The above mentioned patch broke that logic. Also when speed and duplex settings for a link are updated during a link-event, the link-status should not be changed to invoke correct transition logic. This patch fixes this issue by moving the link-state update outside of the bond_update_speed_duplex() fn and to the places where this fn is called and update link-state selectively. Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Nate Clark <nate@neworld.us> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16  bonding: correctly update link status during mii-commit phase  (Mahesh Bandewar)
commit b5bf0f5b16b9c316c34df9f31d4be8729eb86845 upstream. bond_miimon_commit() marks the link UP after attempting to get the speed and duplex settings for the link. There is a possibility that bond_update_speed_duplex() could fail. This is another place where it could result into an inconsistent bonding link state. With this patch the link will be marked UP only if the speed and duplex values retrieved have sane values and processed further. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Nate Clark <nate@neworld.us> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-19  bonding: send learning packets for vlans on slave  (Debabrata Banerjee)
[ Upstream commit 21706ee8a47d3ede7fdae0be6d7c0a0e31a83229 ] There was a regression at some point from the intended functionality of commit f60c3704e87d ("bonding: Fix alb mode to only use first level vlans.") Given the return value vlan_get_encap_level() we need to store the nest level of the bond device, and then compare the vlan's encap level to this. Without this, this check always fails and learning packets are never sent. In addition, this same commit caused a regression in the behavior of balance_alb, which requires learning packets be sent for all interfaces using the slave's mac in order to load balance properly. For vlan's that have not set a user mac, we can send after checking one bit. Otherwise we need send the set mac, albeit defeating rx load balancing for that vlan. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-19  bonding: do not allow rlb updates to invalid mac  (Debabrata Banerjee)
[ Upstream commit 4fa8667ca3989ce14cf66301fa251544fbddbdd0 ] Make sure multicast, broadcast, and zero mac's cannot be the output of rlb updates, which should all be directed arps. Receive load balancing will be collapsed if any of these happen, as the switch will broadcast. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-29  bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave  (Xin Long)
[ Upstream commit ddea788c63094f7c483783265563dd5b50052e28 ]

After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set.

However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo.

One way to reproduce it:

    # modprobe bonding
    # brctl addbr br0
    # brctl addif br0 eth1
    # ifconfig bond0 192.168.122.1/24 up
    # ifenslave bond0 eth2
    # systemctl restart netconsole
    # ifenslave bond0 br0
    # ifconfig eth2 down
    # systemctl restart netconsole

The netpoll won't really work.

This patch is to remove that slave_dev npinfo setting in bond_enslave().

Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13  bonding: process the err returned by dev_set_allmulti properly in bond_enslave  (Xin Long)
[ Upstream commit 9f5a90c107741b864398f4ac0014711a8c1d8474 ] When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails, dev_set_promiscuity(-1) should be done before going to the err path. Otherwise, dev->promiscuity will leak. Fixes: 7e1a1ac1fbaa ("bonding: Check return of dev_set_promiscuity/allmulti") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
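The unwinding rule in this message can be illustrated with a small model. This is a sketch only: the `Dev` class and `enable_rx_flags` are hypothetical stand-ins for the net device and the relevant part of the enslave path, and the counters mimic reference-counted flags.

```python
class Dev:
    """Toy device with reference-counted promiscuity and allmulti flags."""

    def __init__(self, allmulti_fails=False):
        self.promiscuity = 0
        self.allmulti = 0
        self.allmulti_fails = allmulti_fails

    def set_promiscuity(self, inc):
        self.promiscuity += inc
        return 0

    def set_allmulti(self, inc):
        if self.allmulti_fails:
            return -1  # simulated failure
        self.allmulti += inc
        return 0


def enable_rx_flags(dev):
    """Enable both flags, undoing the first one if the second fails."""
    res = dev.set_promiscuity(1)
    if res:
        return res
    res = dev.set_allmulti(1)
    if res:
        # Without this step the promiscuity count would leak.
        dev.set_promiscuity(-1)
        return res
    return 0


failing = Dev(allmulti_fails=True)
result = enable_rx_flags(failing)
```

After the failed call the device's promiscuity count is back at zero, which is the property the fix restores.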
2018-04-13  bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave  (Xin Long)
[ Upstream commit ae42cc62a9f07f1f6979054ed92606b9c30f4a2e ]

Beniamino found a crash when adding vlan as slave of bond which is also the parent link:

    ip link add bond1 type bond
    ip link set bond1 up
    ip link add link bond1 vlan1 type vlan id 80
    ip link set vlan1 master bond1

The call trace is as below:

    [<ffffffffa850842a>] queued_spin_lock_slowpath+0xb/0xf
    [<ffffffffa8515680>] _raw_spin_lock+0x20/0x30
    [<ffffffffa83f6f07>] dev_mc_sync+0x37/0x80
    [<ffffffffc08687dc>] vlan_dev_set_rx_mode+0x1c/0x30 [8021q]
    [<ffffffffa83efd2a>] __dev_set_rx_mode+0x5a/0xa0
    [<ffffffffa83f7138>] dev_mc_sync_multiple+0x78/0x80
    [<ffffffffc084127c>] bond_enslave+0x67c/0x1190 [bonding]
    [<ffffffffa8401909>] do_setlink+0x9c9/0xe50
    [<ffffffffa8403bf2>] rtnl_newlink+0x522/0x880
    [<ffffffffa8403ff7>] rtnetlink_rcv_msg+0xa7/0x260
    [<ffffffffa8424ecb>] netlink_rcv_skb+0xab/0xc0
    [<ffffffffa83fe498>] rtnetlink_rcv+0x28/0x30
    [<ffffffffa8424850>] netlink_unicast+0x170/0x210
    [<ffffffffa8424bf8>] netlink_sendmsg+0x308/0x420
    [<ffffffffa83cc396>] sock_sendmsg+0xb6/0xf0

This is actually a dead lock caused by sync slave hwaddr from master when the master is the slave's 'slave'. This dead loop check is actually done by netdev_master_upper_dev_link. However, Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") moved it after dev_mc_sync.

This patch is to fix it by moving dev_mc_sync after master_upper_dev_link, so that this loop check would be earlier than dev_mc_sync. It also moves if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an improvement.

Note team driver also has this issue, I will fix it in another patch.

Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave")
Reported-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13  bonding: fix the err path for dev hwaddr sync in bond_enslave  (Xin Long)
[ Upstream commit 5c78f6bfae2b10ff70e21d343e64584ea6280c26 ] vlan_vids_add_by_dev is called right after dev hwaddr sync, so on the err path it should unsync dev hwaddr. Otherwise, the slave dev's hwaddr will never be unsync when this err happens. Fixes: 1ff412ad7714 ("bonding: change the bond's vlan syncing functions with the standard ones") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13  bonding: Don't update slave->link until ready to commit  (Nithin Sujir)
[ Upstream commit 797a93647a48d6cb8a20641a86a71713a947f786 ] In the loadbalance arp monitoring scheme, when a slave link change is detected, the slave->link is immediately updated and slave_state_changed is set. Later down the function, the rtnl_lock is acquired and the changes are committed, updating the bond link state. However, the acquisition of the rtnl_lock can fail. The next time the monitor runs, since slave->link is already updated, it determines that link is unchanged. This results in the bond link state permanently out of sync with the slave link. This patch modifies bond_loadbalance_arp_mon() to handle link changes identical to bond_ab_arp_{inspect/commit}(). The new link state is maintained in slave->new_link until we're ready to commit at which point it's copied into slave->link. NOTE: miimon_{inspect/commit}() has a more complex state machine requiring the use of the bond_{propose,commit}_link_state() functions which maintains the intermediate state in slave->link_new_state. The arp monitors don't require that. Testing: This bug is very easy to reproduce with the following steps. 1. In a loop, toggle a slave link of a bond slave interface. 2. In a separate loop, do ifconfig up/down of an unrelated interface to create contention for rtnl_lock. Within a few iterations, the bond link goes out of sync with the slave link. Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
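The propose-then-commit idea behind this fix can be sketched as a toy model. The `Slave` class and `monitor_pass` function are hypothetical, `None` stands in for the "no change pending" marker, and the `lock_acquired` flag simulates whether taking rtnl_lock succeeded.

```python
UP, DOWN = "up", "down"


class Slave:
    def __init__(self):
        self.link = UP        # committed state, what the bond acts on
        self.new_link = None  # proposed state; None means nothing pending


def monitor_pass(slave, observed, lock_acquired):
    """Run one monitor pass; return True if a change was committed."""
    # Stage the observed state instead of writing slave.link directly.
    slave.new_link = observed if observed != slave.link else None
    if slave.new_link is None:
        return False
    if not lock_acquired:
        # slave.link is untouched, so the next pass detects the change again.
        return False
    slave.link = slave.new_link
    slave.new_link = None
    return True


s = Slave()
first = monitor_pass(s, DOWN, lock_acquired=False)   # lock contention
link_after_first = s.link
second = monitor_pass(s, DOWN, lock_acquired=True)   # retry succeeds
```

Because the committed state is only written under the lock, a failed lock attempt delays the update instead of losing it.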
2018-03-24  bonding: handle link transition from FAIL to UP correctly  (Mahesh Bandewar)
[ Upstream commit fb9eb899a6dc663e4a2deed9af2ac28f507d0ffb ] When link transitions from LINK_FAIL to LINK_UP, the commit phase is not called. This leads to an erroneous state causing slave-link state to get stuck in "going down" state while its speed and duplex are perfectly fine. This issue is a side-effect of splitting link-set into propose and commit phases introduced by de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") This patch fixes these issues by calling commit phase whenever link state change is proposed. Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22  bonding: refine bond_fold_stats() wrap detection  (Eric Dumazet)
[ Upstream commit 142c6594acbcc32391af9c15f8cd65c6c177698f ] Some device drivers reset their stats at down/up events, possibly fooling bonding stats, since they operate with relative deltas. It is nearly not possible to fix drivers, since some of them compute the tx/rx counters based on per rx/tx queue stats, and the queues can be reconfigured (ethtool -L) between the down/up sequence. Lets avoid accumulating 'negative' values that render bonding stats useless. It is better to lose small deltas, assuming the bonding stats are fetched at a reasonable frequency. Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
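A minimal sketch of the filtering idea, written in Python rather than the driver's C. Two things here are assumptions for illustration: `fold_counter` and `to_signed` are hypothetical names, and counters whose old and new values both fit in 32 bits are treated as 32-bit counters so that a genuine wrap still produces a small positive delta.

```python
def to_signed(value, bits):
    """Interpret value modulo 2**bits as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def fold_counter(total, new, old):
    """Add the slave's (new - old) delta to the bond total, skipping anomalies."""
    delta = to_signed(new - old, 64)
    if ((new | old) >> 32) == 0:
        # Both samples fit in 32 bits: treat as a 32-bit counter (assumption).
        delta = to_signed(new - old, 32)
    if delta > 0:
        # Non-positive deltas (e.g. a driver reset its stats) are dropped.
        total += delta
    return total


after_reset = fold_counter(100, new=5, old=1000)
after_wrap = fold_counter(100, new=0x10, old=0xFFFFFFF0)
after_normal = fold_counter(100, new=(1 << 40) + 7, old=1 << 40)
```

The trade-off matches the message: a reset loses the small amount counted since the last sample, but the bond totals never go backwards.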
2018-03-22  bonding: make speed, duplex setting consistent with link state  (Mahesh Bandewar)
[ Upstream commit c4adfc822bf5d8e97660b6114b5a8892530ce8cb ] bond_update_speed_duplex() retrieves speed and duplex settings. There is a possibility of failure in retrieving these values but caller has to assume it's always successful. This leads to having inconsistent slave link settings. If these (speed, duplex) values cannot be retrieved, then keeping the link UP causes problems. The updated bond_update_speed_duplex() returns 0 on success if it retrieves sane values for speed and duplex. On failure it returns 1 and marks the link down. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24  bonding: discard lowest hash bit for 802.3ad layer3+4  (Hangbin Liu)
[ Upstream commit b5f862180d7011d9575d0499fa37f0f25b423b12 ] After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range in connect()"), we will try to use even ports for connect(). Then if an application (seen clearly with iperf) opens multiple streams to the same destination IP and port, each stream will be given an even source port. So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing will always hash all these streams to the same interface, and the total throughput will be limited to a single slave. Changing the tcp code would impact the whole tcp behavior, only for bonding usage. Paolo Abeni suggested fixing this by changing the bonding code only, which should be more reasonable and have less impact. Fix this by discarding the lowest hash bit because it contains little entropy. After the fix we can re-balance between slaves. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
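The effect can be reproduced with a toy hash. This is not the driver's actual hash function: `toy_l34_hash`, `fold` and `pick_slaves` are hypothetical, and the addresses and ports are made-up sample values. It only shows why an XOR-based hash over all-even source ports keeps the lowest bit constant, and how dropping that bit restores spreading across two slaves.

```python
def fold(h):
    """Mix higher bits into the low bits, as simple XOR folds do."""
    h ^= h >> 16
    h ^= h >> 8
    return h


def toy_l34_hash(src_ip, dst_ip, src_port, dst_port, discard_low_bit):
    h = fold(src_ip ^ dst_ip ^ src_port ^ dst_port)
    # The fix described above: drop the lowest, low-entropy bit.
    return h >> 1 if discard_low_bit else h


def pick_slaves(src_ports, discard_low_bit, slave_count=2):
    src_ip, dst_ip, dst_port = 0xC0A80001, 0xC0A80002, 5001  # sample values
    return [
        toy_l34_hash(src_ip, dst_ip, port, dst_port, discard_low_bit) % slave_count
        for port in src_ports
    ]


even_ports = [40000, 40002, 40004, 40006]  # connect() now prefers even ports
before = pick_slaves(even_ports, discard_low_bit=False)
after = pick_slaves(even_ports, discard_low_bit=True)
```

In this model `before` maps every stream to the same slave index, while `after` uses both slaves.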
2017-06-07  bonding: fix accounting of active ports in 3ad  (Jarod Wilson)
[ Upstream commit 751da2a69b7cc82d83dc310ed7606225f2d6e014 ]

As of 7bb11dc9f59d and 0622cab0341c, bond slaves in a 3ad bond are not removed from the aggregator when they are down, and the active slave count is NOT equal to number of ports in the aggregator, but rather the number of ports in the aggregator that are still enabled. The sysfs spew for bonding_show_ad_num_ports() has a comment that says "Show number of active 802.3ad ports.", but it's currently showing total number of ports, both active and inactive. Remedy it by using the same logic introduced in 0622cab0341c in __bond_3ad_get_active_agg_info(), so sysfs, procfs and netlink all report the number of active ports. Note that this means that IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of NUM_PORTS, and thus perhaps should be renamed for clarity.

Lightly tested on a dual i40e lacp bond, simulating link downs with an ip link set dev <slave2> down, was able to produce the state where I could see both in the same aggregator, but a number of ports count of 1.

    MII Status: up
    Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2 <---
    Slave Interface: ens10
    MII Status: up <---
    Aggregator ID: 1
    Slave Interface: ens11
    MII Status: up
    Aggregator ID: 1

    MII Status: up
    Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1 <---
    Slave Interface: ens10
    MII Status: down <---
    Aggregator ID: 1
    Slave Interface: ens11
    MII Status: up
    Aggregator ID: 1

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-11  Merge branch 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)
Pull uaccess.h prepwork from Al Viro:

 "Preparations to tree-wide switch to use of linux/uaccess.h (which, obviously, will allow to start unifying stuff for real). The last step there, ie

    PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
    sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        `git grep -l "$PATT"|grep -v ^include/linux/uaccess.h`

  is not taken here - I would prefer to do it once just before or just after -rc1. However, everything should be ready for it"

* 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    remove a stray reference to asm/uaccess.h in docs
    sparc64: separate extable_64.h, switch elf_64.h to it
    score: separate extable.h, switch module.h to it
    mips: separate extable.h, switch module.h to it
    x86: separate extable.h, switch sections.h to it
    remove stray include of asm/uaccess.h from cacheflush.h
    mn10300: remove a bogus processor.h->uaccess.h include
    xtensa: split uaccess.h into C and asm sides
    bonding: quit messing with IOCTL
    kill __kernel_ds_p off
    mn10300: finish verify_area() off
    frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h
    exceptions: detritus removal
2016-09-27  bonding: quit messing with IOCTL  (Al Viro)
The only remaining users are issuing SIOCGMIIPHY and SIOCGMIIREG, neither of which deals with userland pointers. Simply calling ->ndo_do_ioctl() is fine; no messing with set_fs() is needed. It used to mess with SIOCETHTOOL, which would've needed set_fs(), but that has been killed in "[NET] ethtool ops are the only way" 9 years ago... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-12  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04  bonding: Fix bonding crash  (Mahesh Bandewar)
Following few steps will crash kernel -

  (a) Create bonding master
      > modprobe bonding miimon=50

  (b) Create macvlan bridge on eth2
      > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
           type macvlan

  (c) Now try adding eth2 into the bond
      > echo +eth2 > /sys/class/net/bond0/bonding/slaves
      <crash>

Bonding does lots of things before checking if the device enslaved is busy or not.

In this case when the notifier call-chain sends notifications, the bond_netdev_event() assumes that the rx_handler /rx_handler_data is registered while the bond_enslave() hasn't progressed far enough to register rx_handler for the new slave.

This patch adds a rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01  bonding: Remove deprecated create_singlethread_workqueue  (Bhaktipriya Shridhar)
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "wq" queues multiple work items viz &bond->mcast_work, &nnw->work, &bond->mii_work, &bond->arp_work, &bond->alb_work, &bond->mii_work, &bond->ad_work, &bond->slave_arr_work which require strict execution ordering. Hence, an ordered dedicated workqueue has been used. Since, it is a network driver, WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09  bonding: fix the typo  (Zhu Yanjun)
The message "803.ad" should be "802.3ad". Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25  net/bonding: Enforce active-backup policy for IPoIB bonds  (Mark Bloch)
When using an IPoIB bond currently only active-backup mode is a valid use case and this commit strengthens it. Since commit 2ab82852a270 ("net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()") was introduced till 4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the fail over mac policy always applied to IPoIB bonds. With the introduction of commit 492a7e67ff83 ("IB/IPoIB: Allow setting the device address"), that doesn't hold and practically IPoIB bonds are broken as of that. To fix it, lets go to fail over mac if the device doesn't support the ndo OR this is IPoIB device. As a by-product, this commit also prevents a stack corruption which occurred when trying to copy 20 bytes (IPoIB) device address to a sockaddr struct that has only 16 bytes of storage. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Just several instances of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14  bonding: set carrier off for devices created through netlink  (Beniamino Galvani)
Commit e826eafa65c6 ("bonding: Call netif_carrier_off after register_netdevice") moved netif_carrier_off() from bond_init() to bond_create(), but the latter is called only for initial default devices and ones created through sysfs:

    $ modprobe bonding
    $ echo +bond1 > /sys/class/net/bonding_masters
    $ ip link add bond2 type bond
    $ grep "MII Status" /proc/net/bonding/*
    /proc/net/bonding/bond0:MII Status: down
    /proc/net/bonding/bond1:MII Status: down
    /proc/net/bonding/bond2:MII Status: up

Ensure that carrier is initially off also for devices created through netlink.

Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en.h drivers/net/ethernet/mellanox/mlx5/core/en_main.c drivers/net/usb/r8152.c All three conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05  bonding: fix enslavement slave link notifications  (Aviv Heller)
Currently, link notifications are not sent by bond_set_slave_link_state() upon enslavement if the slave is enslaved when up. This happens because slave->link default init value is 0, which is the same as BOND_LINK_UP, resulting in bond_set_slave_link_state() ignoring this transition. This patch sets the default value of slave->link to BOND_LINK_NOCHANGE, assuring it will count as a state transition and thus trigger notification logic. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
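The problem and the fix can be modelled in a few lines. This is a sketch with hypothetical names: `set_link_state` stands in for bond_set_slave_link_state(), slaves are plain dicts, and the value -1 for BOND_LINK_NOCHANGE is an assumption made for the example. Only the fact that BOND_LINK_UP equals the zero default comes from the message.

```python
BOND_LINK_UP = 0         # same as the zero-initialized default, per the message
BOND_LINK_NOCHANGE = -1  # assumed sentinel value for this sketch


def set_link_state(slave, state, notifications):
    """Record a link state change and notify only on a real transition."""
    if slave["link"] == state:
        return  # looks like "no change", so nothing is announced
    slave["link"] = state
    notifications.append(state)


# Old behaviour: a fresh slave starts at 0, which already reads as UP.
old_notes = []
set_link_state({"link": 0}, BOND_LINK_UP, old_notes)

# Fixed behaviour: a fresh slave starts at NOCHANGE, so UP is a transition.
new_notes = []
set_link_state({"link": BOND_LINK_NOCHANGE}, BOND_LINK_UP, new_notes)
```

With the sentinel as the initial value, the first report of UP is seen as a change and the notification is sent.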
2016-07-05  net: introduce default neigh_construct/destroy ndo calls for L2 upper devices  (Jiri Pirko)
L2 upper device needs to propagate neigh_construct/destroy calls down to lower devices. Do this by defining default ndo functions and use them in team, bond, bridge and vlan. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01  bonding: prevent out of bound accesses  (Eric Dumazet)
ether_addr_equal_64bits() requires some care about its arguments, namely that 8 bytes might be read, even if last 2 byte values are not used. KASan detected a violation with null_mac_addr and lacpdu_mcast_addr in bond_3ad.c Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c : Although the 8-byte alignment was there, KASan would detect out of bound accesses. Fixes: 815117adaf5b ("bonding: use ether_addr_equal_unaligned for bond addr compare") Fixes: bb54e58929f3 ("bonding: Verify RX LACPDU has proper dest mac-addr") Fixes: 885a136c52a8 ("bonding: use compare_ether_addr_64bits() in ALB") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28  bonding: fix 802.3ad aggregator reselection  (Jay Vosburgh)
Since commit 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated."), the logic in bonding to handle selection between multiple aggregators has not functioned. This affects only configurations wherein the bonding slaves connect to two discrete aggregators (e.g., two independent switches, each with LACP enabled), thus creating two separate aggregation groups within a single bond. The cause is a change in 7bb11dc9f59d to no longer set AD_PORT_BEGIN on a port after a link state change, which would cause the port to be reselected for attachment to an aggregator as if were newly added to the bond. We cannot restore the prior behavior, as it contradicts IEEE 802.1AX 5.4.12, which requires ports that "become inoperable" (lose carrier, setting port_enabled=false as per 802.1AX 5.4.7) to remain selected (i.e., assigned to the aggregator). As the port now remains selected, the aggregator selection logic is not invoked. A side effect of this change is that aggregators in bonding will now contain ports that are link down. The aggregator selection logic does not currently handle this situation correctly, causing incorrect aggregator selection. This patch makes two changes to repair the aggregator selection logic in bonding to function as documented and within the confines of the standard: First, the aggregator selection and related logic now utilizes the number of active ports per aggregator, not the number of selected ports (as some selected ports may be down). The ad_select "bandwidth" and "count" options only consider ports that are link up. Second, on any carrier state change of any slave, the aggregator selection logic is explicitly called to insure the correct aggregator is active. Reported-by: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
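The first change described above, counting only ports that are link up, can be sketched as follows. This is a toy model of a "count"-style selection only: aggregators are plain dicts, `active_ports`, `select_by_count` and `select_by_membership` are hypothetical names, and tie-breaking and the "bandwidth" policy are left out.

```python
def active_ports(agg):
    """Number of ports in the aggregator whose link is up."""
    return sum(1 for port in agg["ports"] if port["link_up"])


def select_by_count(aggregators):
    """Pick the aggregator with the most active (link-up) ports."""
    return max(aggregators, key=active_ports)


def select_by_membership(aggregators):
    """Pre-fix style selection: counts selected ports even if they are down."""
    return max(aggregators, key=lambda agg: len(agg["ports"]))


agg_a = {"id": 1, "ports": [{"link_up": True}, {"link_up": False}, {"link_up": False}]}
agg_b = {"id": 2, "ports": [{"link_up": True}, {"link_up": True}]}

chosen_fixed = select_by_count([agg_a, agg_b])["id"]
chosen_old = select_by_membership([agg_a, agg_b])["id"]
```

Here the membership count would prefer aggregator 1 even though two of its three ports are down, while the active-port count prefers aggregator 2.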
2016-06-09  net: add netdev_lockdep_set_classes() helper  (Eric Dumazet)
It is time to add netdev_lockdep_set_classes() helper so that lockdep annotations per device type are easier to manage. This removes a lot of copies and missing annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  net_sched: transform qdisc running bit into a seqcount  (Eric Dumazet)
Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18  bonding: fix bond_get_stats()  (Eric Dumazet)
bond_get_stats() can be called from rtnetlink (with RTNL held) or from /proc/net/dev seq handler (with RCU held) The logic added in commit 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") kind of assumed only one cpu could run there. If multiple threads are reading /proc/net/dev, stats can be really messed up after a while. A second problem is that some fields are 32bit, so we need to properly handle the wrap around problem. Given that RTNL is not always held, we need to use bond_for_each_slave_rcu(). Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18  bonding: remove duplicate set of flag IFF_MULTICAST  (Zhang Shengju)
Remove unnecessary set of flag IFF_MULTICAST, since ether_setup already does this. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25  net: bonding: use __ethtool_get_ksettings  (David Decotigny)
Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16  bonding: don't use stale speed and duplex information  (Jay Vosburgh)
There is presently a race condition between the bonding periodic link monitor and the updating of a slave's speed and duplex. The former occurs on a periodic basis, and the latter in response to a driver's calling of netif_carrier_on. It is possible for the periodic monitor to run between the driver call of netif_carrier_on and the receipt of the NETDEV_CHANGE event that causes bonding to update the slave's speed and duplex. This manifests most notably as a report that a slave is up and "0 Mbps full duplex" after enslavement, but in principle could report an incorrect speed and duplex after any link up event if the device comes up with a different speed or duplex. This affects the 802.3ad aggregator selection, as the speed and duplex are selection criteria. This is fixed by updating the speed and duplex in the periodic monitor, prior to using that information. This was done historically in bonding, but the call to bond_update_speed_duplex was removed in commit 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks"), as it might sleep under lock. Later, the locking was changed to only hold RTNL, and so after commit 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks") this call is again safe. Tested-by: "Tantilov, Emil S" <emil.s.tantilov@intel.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: dingtianhong <dingtianhong@huawei.com> Fixes: 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-13  bonding: Fix ARP monitor validation  (Jay Vosburgh)
The current logic in bond_arp_rcv will accept an incoming ARP for validation if (a) the receiving slave is either "active" (which includes the currently active slave, or the current ARP slave) or, (b) there is a currently active slave, and it has received an ARP since it became active. For case (b), the receiving slave isn't the currently active slave, and is receiving the original broadcast ARP request, not an ARP reply from the target. This logic can fail if there is no currently active slave. In this situation, the ARP probe logic cycles through all slaves, assigning each in turn as the "current_arp_slave" for one arp_interval, then setting that one as "active," and sending an ARP probe from that slave. The current logic expects the ARP reply to arrive on the sending current_arp_slave, however, due to switch FDB updating delays, the reply may be directed to another slave. This can arise if the bonding slaves and switch are working, but the ARP target is not responding. When the ARP target recovers, a condition may result wherein the ARP target host replies faster than the switch can update its forwarding table, causing each ARP reply to be sent to the previous current_arp_slave. This will never pass the logic in bond_arp_rcv, as neither of the above conditions (a) or (b) are met. Some experimentation on a LAN shows ARP reply round trips in the 200 usec range, but my available switches never update their FDB in less than 4000 usec. This patch changes the logic in bond_arp_rcv to additionally accept an ARP reply for validation on any slave if there is a current ARP slave and it sent an ARP probe during the previous arp_interval. Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works") Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11  bonding: use return instead of goto  Zhang Shengju
Replace 'goto' with 'return' to remove an unnecessary check at the label err_undo_flags. The code at 'err_undo_flags' does two things for the first slave device: 1. reverts the bond MAC address if it was set by the slave device; 2. reverts the bond device type if it is not ARPHRD_ETHER. Neither is necessary at the following three places, because they change neither the bond MAC address nor the type, so it is straightforward to return directly.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11  bonding: Return correct error code  Amitoj Kaur Chawla
When kzalloc fails to allocate memory, the function should return -ENOMEM and not -1. Found using Coccinelle. A simplified version of the semantic patch used is:
    //<smpl>
    @@
    expression *e;
    @@
    e = kzalloc(...);
    if (e == NULL) {
    ...
    return
    - -1
    + -ENOMEM
    ;
    }
    //</smpl>
The single call site only checks that the return value is not 0, hence no change is required at the call site.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
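The pattern being corrected can be illustrated outside the kernel. This is a minimal userspace sketch, assuming a hypothetical struct ctx and ctx_create() function, with calloc standing in for kzalloc and a flag to simulate allocation failure; it is not the function touched by the patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object type, for illustration only. */
struct ctx {
    int id;
};

/*
 * Allocates a zeroed ctx and stores it in *out.
 * Returns 0 on success or -ENOMEM if the allocation fails.
 * simulate_failure forces the failure path so it can be exercised.
 */
int ctx_create(struct ctx **out, int simulate_failure)
{
    struct ctx *c = NULL;

    if (!simulate_failure)
        c = calloc(1, sizeof(*c)); /* userspace stand-in for kzalloc */

    if (c == NULL)
        return -ENOMEM; /* a specific errno value instead of a bare -1 */

    *out = c;
    return 0;
}
```

A caller that only tests for a non-zero return behaves the same with either value, which is why the message notes that the call site needs no change.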
2016-02-11  bonding: 3ad: allow to set ad_actor settings while the bond is up  Nikolay Aleksandrov
There is no need to require the bond to be down while changing these settings; the change will be reflected immediately and the 3ad mode will sort itself out. For faster convergence, set port->ntt to true in order to generate new LACPDUs immediately.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09  bonding: 3ad: apply ad_actor settings changes immediately  Nikolay Aleksandrov
Currently bonding allows ad_actor_system and prio to be set while the bond device is down, but they are actually applied only if there aren't any slaves yet (applied to the bond device when the first slave shows up, and to slaves at 3ad bind time). After this patch, changes are applied immediately and the new values can be used/seen once the bond is brought up, so it is no longer necessary to release all slaves and enslave them again to see the changes.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-08  bonding: trivial: style fixes  Zhang Shengju
Remove some redundant brackets and use sizeof(*) instead of sizeof(struct x).
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06  bonding: add slave device name for debug  Zhang Shengju
netdev_dbg() already adds the bond device name; it is helpful to print the slave device name as well.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06  bond: track sum of rx_nohandler for all slaves  Jarod Wilson
Sample output with this set applied for an active-backup bond:
    $ cat /sys/devices/virtual/net/bond0/lower_p7p1/statistics/rx_nohandler
    16568
    $ cat /sys/devices/virtual/net/bond0/lower_p5p2/statistics/rx_nohandler
    16583
    $ cat /sys/devices/virtual/net/bond0/statistics/rx_nohandler
    33151
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
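The bond-level counter in the sample output is the sum of the per-slave counters (16568 + 16583 = 33151). A minimal sketch of that aggregation, using a hypothetical helper name and a plain array in place of the driver's slave list:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: adds up the rx_nohandler counters of n slaves
 * to produce the value reported for the bond device itself.
 */
uint64_t bond_rx_nohandler_total(const uint64_t *slave_counts, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += slave_counts[i];

    return total;
}
```

Called with the two slave values shown above, the helper returns the bond-level figure from the same sample.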
2016-01-11  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller
Conflicts:
    drivers/net/bonding/bond_main.c
    drivers/net/ethernet/mellanox/mlxsw/spectrum.h
    drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
The bond_main.c and mellanox switch conflicts were cases of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11  bonding: Prevent IPv6 link local address on enslaved devices  Karl Heiss
Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") undoes the fix provided by commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master") by effectively setting the slave flag after the slave has been opened. If the slave comes up quickly enough, it will go through the IPv6 addrconf before the slave flag has been set and will get a link local IPv6 address.
In order to ensure that addrconf knows to ignore the slave devices on state change, set IFF_SLAVE before dev_open() during bonding enslavement.
Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave")
Signed-off-by: Karl Heiss <kheiss@gmail.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11  bonding: make mii_status sysfs node consistent  Jarod Wilson
The spew in /proc/net/bonding/bond0 uses netif_carrier_ok() to determine mii_status, while /sys/class/net/bond0/bonding/mii_status looks at curr_active_slave, which doesn't actually seem to be set sometimes when the bond actually is up. A mode 4 bond configured via ifcfg-foo files on a Red Hat Enterprise Linux system, after boot, comes up clean and functional, but the sysfs node shows mii_status of down, while proc shows up.
A simple enough fix here seems to be to use the same method for determining up or down in both places, and I'd opt for the one that seems to match reality.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23  bonding: drop unused to_dev macro in bond_sysfs.c  Geliang Tang
to_dev is not used anymore so drop it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-15  net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK  Tom Herbert
The name NETIF_F_ALL_CSUM is a misnomer. It does not correspond to the set of features for offloading all checksums; it is a mask of the checksum offload related feature bits. It is incorrect to set both NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM at the same time in the features of a device.
This patch:
- Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where NETIF_F_ALL_CSUM is being used as a mask).
- Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to use NETIF_F_HW_CSUM in the features list instead of NETIF_F_ALL_CSUM.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
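The distinction between a mask used for testing and a set of features a device advertises can be sketched with plain bit flags. The macro names, bit positions, and helper functions below are illustrative assumptions and are not the kernel's definitions; the composition of the mask from the three checksum bits is likewise assumed from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; not the kernel's values. */
#define F_IP_CSUM   (1u << 0) /* checksum offload for IPv4 only */
#define F_HW_CSUM   (1u << 1) /* generic checksum offload */
#define F_IPV6_CSUM (1u << 2) /* checksum offload for IPv6 only */

/* A mask for testing whether any checksum offload bit is set. */
#define F_CSUM_MASK (F_IP_CSUM | F_HW_CSUM | F_IPV6_CSUM)

/* Mask usage: does the device offer any kind of checksum offload? */
bool has_any_csum_offload(uint32_t features)
{
    return (features & F_CSUM_MASK) != 0;
}

/*
 * Feature-list check: the generic bit must not be combined with the
 * protocol-specific bits, so the whole mask is not a valid feature set.
 */
bool csum_features_valid(uint32_t features)
{
    if (!(features & F_HW_CSUM))
        return true;
    return (features & (F_IP_CSUM | F_IPV6_CSUM)) == 0;
}
```

Under these assumptions a device advertises F_HW_CSUM alone, while code that only needs to know whether some checksum offload is present tests against F_CSUM_MASK.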