aboutsummaryrefslogtreecommitdiff
path: root/net/core
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2013-11-13 17:40:34 +0900
committerLinus Torvalds <torvalds@linux-foundation.org>2013-11-13 17:40:34 +0900
commit42a2d923cc349583ebf6fdd52a7d35e1c2f7e6bd (patch)
tree2b2b0c03b5389c1301800119333967efafd994ca /net/core
parent5cbb3d216e2041700231bcfc383ee5f8b7fc8b74 (diff)
parent75ecab1df14d90e86cebef9ec5c76befde46e65f (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller: 1) The addition of nftables. No longer will we need protocol aware firewall filtering modules, it can all live in userspace. At the core of nftables is a, for lack of a better term, virtual machine that executes byte codes to inspect packet or metadata (arriving interface index, etc.) and make verdict decisions. Besides support for loading packet contents and comparing them, the interpreter supports lookups in various datastructures as fundamental operations. For example sets are supports, and therefore one could create a set of whitelist IP address entries which have ACCEPT verdicts attached to them, and use the appropriate byte codes to do such lookups. Since the interpreted code is composed in userspace, userspace can do things like optimize things before giving it to the kernel. Another major improvement is the capability of atomically updating portions of the ruleset. In the existing netfilter implementation, one has to update the entire rule set in order to make a change and this is very expensive. Userspace tools exist to create nftables rules using existing netfilter rule sets, but both kernel implementations will need to co-exist for quite some time as we transition from the old to the new stuff. Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have worked so hard on this. 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements to our pseudo-random number generator, mostly used for things like UDP port randomization and netfitler, amongst other things. In particular the taus88 generater is updated to taus113, and test cases are added. 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet and Yang Yingliang. 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin Sujir. 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet, Neal Cardwell, and Yuchung Cheng. 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary control message data, much like other socket option attributes. From Francesco Fusco. 7) Allow applications to specify a cap on the rate computed automatically by the kernel for pacing flows, via a new SO_MAX_PACING_RATE socket option. From Eric Dumazet. 8) Make the initial autotuned send buffer sizing in TCP more closely reflect actual needs, from Eric Dumazet. 9) Currently early socket demux only happens for TCP sockets, but we can do it for connected UDP sockets too. Implementation from Shawn Bohrer. 10) Refactor inet socket demux with the goal of improving hash demux performance for listening sockets. With the main goals being able to use RCU lookups on even request sockets, and eliminating the listening lock contention. From Eric Dumazet. 11) The bonding layer has many demuxes in it's fast path, and an RCU conversion was started back in 3.11, several changes here extend the RCU usage to even more locations. From Ding Tianhong and Wang Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav Falico. 12) Allow stackability of segmentation offloads to, in particular, allow segmentation offloading over tunnels. From Eric Dumazet. 13) Significantly improve the handling of secret keys we input into the various hash functions in the inet hashtables, TCP fast open, as well as syncookies. From Hannes Frederic Sowa. The key fundamental operation is "net_get_random_once()" which uses static keys. Hannes even extended this to ipv4/ipv6 fragmentation handling and our generic flow dissector. 14) The generic driver layer takes care now to set the driver data to NULL on device removal, so it's no longer necessary for drivers to explicitly set it to NULL any more. Many drivers have been cleaned up in this way, from Jingoo Han. 15) Add a BPF based packet scheduler classifier, from Daniel Borkmann. 16) Improve CRC32 interfaces and generic SKB checksum iterators so that SCTP's checksumming can more cleanly be handled. Also from Daniel Borkmann. 17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces using the interface MTU value. This helps avoid PMTU attacks, particularly on DNS servers. From Hannes Frederic Sowa. 18) Use generic XPS for transmit queue steering rather than internal (re-)implementation in virtio-net. From Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) random32: add test cases for taus113 implementation random32: upgrade taus88 generator to taus113 from errata paper random32: move rnd_state to linux/random.h random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized random32: add periodic reseeding random32: fix off-by-one in seeding requirement PHY: Add RTL8201CP phy_driver to realtek xtsonic: add missing platform_set_drvdata() in xtsonic_probe() macmace: add missing platform_set_drvdata() in mace_probe() ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe() ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh vlan: Implement vlan_dev_get_egress_qos_mask as an inline. ixgbe: add warning when max_vfs is out of range. igb: Update link modes display in ethtool netfilter: push reasm skb through instead of original frag skbs ip6_output: fragment outgoing reassembled skb properly MAINTAINERS: mv643xx_eth: take over maintainership from Lennart net_sched: tbf: support of 64bit rates ixgbe: deleting dfwd stations out of order can cause null ptr deref ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS ...
Diffstat (limited to 'net/core')
-rw-r--r--net/core/datagram.c2
-rw-r--r--net/core/dev.c557
-rw-r--r--net/core/dev_addr_lists.c4
-rw-r--r--net/core/ethtool.c3
-rw-r--r--net/core/fib_rules.c3
-rw-r--r--net/core/flow_dissector.c79
-rw-r--r--net/core/iovec.c2
-rw-r--r--net/core/neighbour.c2
-rw-r--r--net/core/net-sysfs.c2
-rw-r--r--net/core/netprio_cgroup.c3
-rw-r--r--net/core/rtnetlink.c12
-rw-r--r--net/core/secure_seq.c16
-rw-r--r--net/core/skbuff.c144
-rw-r--r--net/core/sock.c45
-rw-r--r--net/core/utils.c49
15 files changed, 632 insertions, 291 deletions
diff --git a/net/core/datagram.c b/net/core/datagram.c
index af814e76420..a16ed7bbe37 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -577,7 +577,7 @@ EXPORT_SYMBOL(skb_copy_datagram_from_iovec);
/**
* zerocopy_sg_from_iovec - Build a zerocopy datagram from an iovec
* @skb: buffer to copy
- * @from: io vector to copy to
+ * @from: io vector to copy from
* @offset: offset in the io vector to start copying from
* @count: amount of vectors to copy to buffer from
*
diff --git a/net/core/dev.c b/net/core/dev.c
index 3430b1ed12e..8ffc52e01ec 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1203,7 +1203,7 @@ void netdev_state_change(struct net_device *dev)
{
if (dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_CHANGE, dev);
- rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
+ rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
}
}
EXPORT_SYMBOL(netdev_state_change);
@@ -1293,7 +1293,7 @@ int dev_open(struct net_device *dev)
if (ret < 0)
return ret;
- rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
+ rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
call_netdevice_notifiers(NETDEV_UP, dev);
return ret;
@@ -1307,7 +1307,7 @@ static int __dev_close_many(struct list_head *head)
ASSERT_RTNL();
might_sleep();
- list_for_each_entry(dev, head, unreg_list) {
+ list_for_each_entry(dev, head, close_list) {
call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
clear_bit(__LINK_STATE_START, &dev->state);
@@ -1323,7 +1323,7 @@ static int __dev_close_many(struct list_head *head)
dev_deactivate_many(head);
- list_for_each_entry(dev, head, unreg_list) {
+ list_for_each_entry(dev, head, close_list) {
const struct net_device_ops *ops = dev->netdev_ops;
/*
@@ -1351,7 +1351,7 @@ static int __dev_close(struct net_device *dev)
/* Temporarily disable netpoll until the interface is down */
netpoll_rx_disable(dev);
- list_add(&dev->unreg_list, &single);
+ list_add(&dev->close_list, &single);
retval = __dev_close_many(&single);
list_del(&single);
@@ -1362,21 +1362,20 @@ static int __dev_close(struct net_device *dev)
static int dev_close_many(struct list_head *head)
{
struct net_device *dev, *tmp;
- LIST_HEAD(tmp_list);
- list_for_each_entry_safe(dev, tmp, head, unreg_list)
+ /* Remove the devices that don't need to be closed */
+ list_for_each_entry_safe(dev, tmp, head, close_list)
if (!(dev->flags & IFF_UP))
- list_move(&dev->unreg_list, &tmp_list);
+ list_del_init(&dev->close_list);
__dev_close_many(head);
- list_for_each_entry(dev, head, unreg_list) {
- rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
+ list_for_each_entry_safe(dev, tmp, head, close_list) {
+ rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
call_netdevice_notifiers(NETDEV_DOWN, dev);
+ list_del_init(&dev->close_list);
}
- /* rollback_registered_many needs the complete original list */
- list_splice(&tmp_list, head);
return 0;
}
@@ -1397,7 +1396,7 @@ int dev_close(struct net_device *dev)
/* Block netpoll rx while the interface is going down */
netpoll_rx_disable(dev);
- list_add(&dev->unreg_list, &single);
+ list_add(&dev->close_list, &single);
dev_close_many(&single);
list_del(&single);
@@ -2378,6 +2377,8 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
}
SKB_GSO_CB(skb)->mac_offset = skb_headroom(skb);
+ SKB_GSO_CB(skb)->encap_level = 0;
+
skb_reset_mac_header(skb);
skb_reset_mac_len(skb);
@@ -2537,7 +2538,7 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
}
int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
- struct netdev_queue *txq)
+ struct netdev_queue *txq, void *accel_priv)
{
const struct net_device_ops *ops = dev->netdev_ops;
int rc = NETDEV_TX_OK;
@@ -2603,9 +2604,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
dev_queue_xmit_nit(skb, dev);
skb_len = skb->len;
- rc = ops->ndo_start_xmit(skb, dev);
+ if (accel_priv)
+ rc = ops->ndo_dfwd_start_xmit(skb, dev, accel_priv);
+ else
+ rc = ops->ndo_start_xmit(skb, dev);
+
trace_net_dev_xmit(skb, rc, dev, skb_len);
- if (rc == NETDEV_TX_OK)
+ if (rc == NETDEV_TX_OK && txq)
txq_trans_update(txq);
return rc;
}
@@ -2621,7 +2626,10 @@ gso:
dev_queue_xmit_nit(nskb, dev);
skb_len = nskb->len;
- rc = ops->ndo_start_xmit(nskb, dev);
+ if (accel_priv)
+ rc = ops->ndo_dfwd_start_xmit(nskb, dev, accel_priv);
+ else
+ rc = ops->ndo_start_xmit(nskb, dev);
trace_net_dev_xmit(nskb, rc, dev, skb_len);
if (unlikely(rc != NETDEV_TX_OK)) {
if (rc & ~NETDEV_TX_MASK)
@@ -2646,6 +2654,7 @@ out_kfree_skb:
out:
return rc;
}
+EXPORT_SYMBOL_GPL(dev_hard_start_xmit);
static void qdisc_pkt_len_init(struct sk_buff *skb)
{
@@ -2853,7 +2862,7 @@ int dev_queue_xmit(struct sk_buff *skb)
if (!netif_xmit_stopped(txq)) {
__this_cpu_inc(xmit_recursion);
- rc = dev_hard_start_xmit(skb, dev, txq);
+ rc = dev_hard_start_xmit(skb, dev, txq, NULL);
__this_cpu_dec(xmit_recursion);
if (dev_xmit_complete(rc)) {
HARD_TX_UNLOCK(dev, txq);
@@ -4374,42 +4383,40 @@ struct netdev_adjacent {
/* upper master flag, there can only be one master device per list */
bool master;
- /* indicates that this dev is our first-level lower/upper device */
- bool neighbour;
-
/* counter for the number of times this device was added to us */
u16 ref_nr;
+ /* private field for the users */
+ void *private;
+
struct list_head list;
struct rcu_head rcu;
};
-static struct netdev_adjacent *__netdev_find_adj(struct net_device *dev,
- struct net_device *adj_dev,
- bool upper)
+static struct netdev_adjacent *__netdev_find_adj_rcu(struct net_device *dev,
+ struct net_device *adj_dev,
+ struct list_head *adj_list)
{
struct netdev_adjacent *adj;
- struct list_head *dev_list;
-
- dev_list = upper ? &dev->upper_dev_list : &dev->lower_dev_list;
- list_for_each_entry(adj, dev_list, list) {
+ list_for_each_entry_rcu(adj, adj_list, list) {
if (adj->dev == adj_dev)
return adj;
}
return NULL;
}
-static inline struct netdev_adjacent *__netdev_find_upper(struct net_device *dev,
- struct net_device *udev)
+static struct netdev_adjacent *__netdev_find_adj(struct net_device *dev,
+ struct net_device *adj_dev,
+ struct list_head *adj_list)
{
- return __netdev_find_adj(dev, udev, true);
-}
+ struct netdev_adjacent *adj;
-static inline struct netdev_adjacent *__netdev_find_lower(struct net_device *dev,
- struct net_device *ldev)
-{
- return __netdev_find_adj(dev, ldev, false);
+ list_for_each_entry(adj, adj_list, list) {
+ if (adj->dev == adj_dev)
+ return adj;
+ }
+ return NULL;
}
/**
@@ -4426,7 +4433,7 @@ bool netdev_has_upper_dev(struct net_device *dev,
{
ASSERT_RTNL();
- return __netdev_find_upper(dev, upper_dev);
+ return __netdev_find_adj(dev, upper_dev, &dev->all_adj_list.upper);
}
EXPORT_SYMBOL(netdev_has_upper_dev);
@@ -4441,7 +4448,7 @@ bool netdev_has_any_upper_dev(struct net_device *dev)
{
ASSERT_RTNL();
- return !list_empty(&dev->upper_dev_list);
+ return !list_empty(&dev->all_adj_list.upper);
}
EXPORT_SYMBOL(netdev_has_any_upper_dev);
@@ -4458,10 +4465,10 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev)
ASSERT_RTNL();
- if (list_empty(&dev->upper_dev_list))
+ if (list_empty(&dev->adj_list.upper))
return NULL;
- upper = list_first_entry(&dev->upper_dev_list,
+ upper = list_first_entry(&dev->adj_list.upper,
struct netdev_adjacent, list);
if (likely(upper->master))
return upper->dev;
@@ -4469,15 +4476,26 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev)
}
EXPORT_SYMBOL(netdev_master_upper_dev_get);
-/* netdev_upper_get_next_dev_rcu - Get the next dev from upper list
+void *netdev_adjacent_get_private(struct list_head *adj_list)
+{
+ struct netdev_adjacent *adj;
+
+ adj = list_entry(adj_list, struct netdev_adjacent, list);
+
+ return adj->private;
+}
+EXPORT_SYMBOL(netdev_adjacent_get_private);
+
+/**
+ * netdev_all_upper_get_next_dev_rcu - Get the next dev from upper list
* @dev: device
* @iter: list_head ** of the current position
*
* Gets the next device from the dev's upper list, starting from iter
* position. The caller must hold RCU read lock.
*/
-struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
- struct list_head **iter)
+struct net_device *netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
+ struct list_head **iter)
{
struct netdev_adjacent *upper;
@@ -4485,14 +4503,71 @@ struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
upper = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
- if (&upper->list == &dev->upper_dev_list)
+ if (&upper->list == &dev->all_adj_list.upper)
return NULL;
*iter = &upper->list;
return upper->dev;
}
-EXPORT_SYMBOL(netdev_upper_get_next_dev_rcu);
+EXPORT_SYMBOL(netdev_all_upper_get_next_dev_rcu);
+
+/**
+ * netdev_lower_get_next_private - Get the next ->private from the
+ * lower neighbour list
+ * @dev: device
+ * @iter: list_head ** of the current position
+ *
+ * Gets the next netdev_adjacent->private from the dev's lower neighbour
+ * list, starting from iter position. The caller must hold either hold the
+ * RTNL lock or its own locking that guarantees that the neighbour lower
+ * list will remain unchainged.
+ */
+void *netdev_lower_get_next_private(struct net_device *dev,
+ struct list_head **iter)
+{
+ struct netdev_adjacent *lower;
+
+ lower = list_entry(*iter, struct netdev_adjacent, list);
+
+ if (&lower->list == &dev->adj_list.lower)
+ return NULL;
+
+ if (iter)
+ *iter = lower->list.next;
+
+ return lower->private;
+}
+EXPORT_SYMBOL(netdev_lower_get_next_private);
+
+/**
+ * netdev_lower_get_next_private_rcu - Get the next ->private from the
+ * lower neighbour list, RCU
+ * variant
+ * @dev: device
+ * @iter: list_head ** of the current position
+ *
+ * Gets the next netdev_adjacent->private from the dev's lower neighbour
+ * list, starting from iter position. The caller must hold RCU read lock.
+ */
+void *netdev_lower_get_next_private_rcu(struct net_device *dev,
+ struct list_head **iter)
+{
+ struct netdev_adjacent *lower;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
+
+ if (&lower->list == &dev->adj_list.lower)
+ return NULL;
+
+ if (iter)
+ *iter = &lower->list;
+
+ return lower->private;
+}
+EXPORT_SYMBOL(netdev_lower_get_next_private_rcu);
/**
* netdev_master_upper_dev_get_rcu - Get master upper device
@@ -4505,7 +4580,7 @@ struct net_device *netdev_master_upper_dev_get_rcu(struct net_device *dev)
{
struct netdev_adjacent *upper;
- upper = list_first_or_null_rcu(&dev->upper_dev_list,
+ upper = list_first_or_null_rcu(&dev->adj_list.upper,
struct netdev_adjacent, list);
if (upper && likely(upper->master))
return upper->dev;
@@ -4515,15 +4590,16 @@ EXPORT_SYMBOL(netdev_master_upper_dev_get_rcu);
static int __netdev_adjacent_dev_insert(struct net_device *dev,
struct net_device *adj_dev,
- bool neighbour, bool master,
- bool upper)
+ struct list_head *dev_list,
+ void *private, bool master)
{
struct netdev_adjacent *adj;
+ char linkname[IFNAMSIZ+7];
+ int ret;
- adj = __netdev_find_adj(dev, adj_dev, upper);
+ adj = __netdev_find_adj(dev, adj_dev, dev_list);
if (adj) {
- BUG_ON(neighbour);
adj->ref_nr++;
return 0;
}
@@ -4534,124 +4610,179 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
adj->dev = adj_dev;
adj->master = master;
- adj->neighbour = neighbour;
adj->ref_nr = 1;
-
+ adj->private = private;
dev_hold(adj_dev);
- pr_debug("dev_hold for %s, because of %s link added from %s to %s\n",
- adj_dev->name, upper ? "upper" : "lower", dev->name,
- adj_dev->name);
- if (!upper) {
- list_add_tail_rcu(&adj->list, &dev->lower_dev_list);
- return 0;
+ pr_debug("dev_hold for %s, because of link added from %s to %s\n",
+ adj_dev->name, dev->name, adj_dev->name);
+
+ if (dev_list == &dev->adj_list.lower) {
+ sprintf(linkname, "lower_%s", adj_dev->name);
+ ret = sysfs_create_link(&(dev->dev.kobj),
+ &(adj_dev->dev.kobj), linkname);
+ if (ret)
+ goto free_adj;
+ } else if (dev_list == &dev->adj_list.upper) {
+ sprintf(linkname, "upper_%s", adj_dev->name);
+ ret = sysfs_create_link(&(dev->dev.kobj),
+ &(adj_dev->dev.kobj), linkname);
+ if (ret)
+ goto free_adj;
}
- /* Ensure that master upper link is always the first item in list. */
- if (master)
- list_add_rcu(&adj->list, &dev->upper_dev_list);
- else
- list_add_tail_rcu(&adj->list, &dev->upper_dev_list);
+ /* Ensure that master link is always the first item in list. */
+ if (master) {
+ ret = sysfs_create_link(&(dev->dev.kobj),
+ &(adj_dev->dev.kobj), "master");
+ if (ret)
+ goto remove_symlinks;
+
+ list_add_rcu(&adj->list, dev_list);
+ } else {
+ list_add_tail_rcu(&adj->list, dev_list);
+ }
return 0;
-}
-static inline int __netdev_upper_dev_insert(struct net_device *dev,
- struct net_device *udev,
- bool master, bool neighbour)
-{
- return __netdev_adjacent_dev_insert(dev, udev, neighbour, master,
- true);
-}
+remove_symlinks:
+ if (dev_list == &dev->adj_list.lower) {
+ sprintf(linkname, "lower_%s", adj_dev->name);
+ sysfs_remove_link(&(dev->dev.kobj), linkname);
+ } else if (dev_list == &dev->adj_list.upper) {
+ sprintf(linkname, "upper_%s", adj_dev->name);
+ sysfs_remove_link(&(dev->dev.kobj), linkname);
+ }
-static inline int __netdev_lower_dev_insert(struct net_device *dev,
- struct net_device *ldev,
- bool neighbour)
-{
- return __netdev_adjacent_dev_insert(dev, ldev, neighbour, false,
- false);
+free_adj:
+ kfree(adj);
+ dev_put(adj_dev);
+
+ return ret;
}
void __netdev_adjacent_dev_remove(struct net_device *dev,
- struct net_device *adj_dev, bool upper)
+ struct net_device *adj_dev,
+ struct list_head *dev_list)
{
struct netdev_adjacent *adj;
+ char linkname[IFNAMSIZ+7];
- if (upper)
- adj = __netdev_find_upper(dev, adj_dev);
- else
- adj = __netdev_find_lower(dev, adj_dev);
+ adj = __netdev_find_adj(dev, adj_dev, dev_list);
- if (!adj)
+ if (!adj) {
+ pr_err("tried to remove device %s from %s\n",
+ dev->name, adj_dev->name);
BUG();
+ }
if (adj->ref_nr > 1) {
+ pr_debug("%s to %s ref_nr-- = %d\n", dev->name, adj_dev->name,
+ adj->ref_nr-1);
adj->ref_nr--;
return;
}
+ if (adj->master)
+ sysfs_remove_link(&(dev->dev.kobj), "master");
+
+ if (dev_list == &dev->adj_list.lower) {
+ sprintf(linkname, "lower_%s", adj_dev->name);
+ sysfs_remove_link(&(dev->dev.kobj), linkname);
+ } else if (dev_list == &dev->adj_list.upper) {
+ sprintf(linkname, "upper_%s", adj_dev->name);
+ sysfs_remove_link(&(dev->dev.kobj), linkname);
+ }
+
list_del_rcu(&adj->list);
- pr_debug("dev_put for %s, because of %s link removed from %s to %s\n",
- adj_dev->name, upper ? "upper" : "lower", dev->name,
- adj_dev->name);
+ pr_debug("dev_put for %s, because link removed from %s to %s\n",
+ adj_dev->name, dev->name, adj_dev->name);
dev_put(adj_dev);
kfree_rcu(adj, rcu);
}
-static inline void __netdev_upper_dev_remove(struct net_device *dev,
- struct net_device *udev)
-{
- return __netdev_adjacent_dev_remove(dev, udev, true);
-}
-
-static inline void __netdev_lower_dev_remove(struct net_device *dev,
- struct net_device *ldev)
-{
- return __netdev_adjacent_dev_remove(dev, ldev, false);
-}
-
-int __netdev_adjacent_dev_insert_link(struct net_device *dev,
- struct net_device *upper_dev,
- bool master, bool neighbour)
+int __netdev_adjacent_dev_link_lists(struct net_device *dev,
+ struct net_device *upper_dev,
+ struct list_head *up_list,
+ struct list_head *down_list,
+ void *private, bool master)
{
int ret;
- ret = __netdev_upper_dev_insert(dev, upper_dev, master, neighbour);
+ ret = __netdev_adjacent_dev_insert(dev, upper_dev, up_list, private,
+ master);
if (ret)
return ret;
- ret = __netdev_lower_dev_insert(upper_dev, dev, neighbour);
+ ret = __netdev_adjacent_dev_insert(upper_dev, dev, down_list, private,
+ false);
if (ret) {
- __netdev_upper_dev_remove(dev, upper_dev);
+ __netdev_adjacent_dev_remove(dev, upper_dev, up_list);
return ret;
}
return 0;
}
-static inline int __netdev_adjacent_dev_link(struct net_device *dev,
- struct net_device *udev)
+int __netdev_adjacent_dev_link(struct net_device *dev,
+ struct net_device *upper_dev)
{
- return __netdev_adjacent_dev_insert_link(dev, udev, false, false);
+ return __netdev_adjacent_dev_link_lists(dev, upper_dev,
+ &dev->all_adj_list.upper,
+ &upper_dev->all_adj_list.lower,
+ NULL, false);
}
-static inline int __netdev_adjacent_dev_link_neighbour(struct net_device *dev,
- struct net_device *udev,
- bool master)
+void __netdev_adjacent_dev_unlink_lists(struct net_device *dev,
+ struct net_device *upper_dev,
+ struct list_head *up_list,
+ struct list_head *down_list)
{
- return __netdev_adjacent_dev_insert_link(dev, udev, master, true);
+ __netdev_adjacent_dev_remove(dev, upper_dev, up_list);
+ __netdev_adjacent_dev_remove(upper_dev, dev, down_list);
}
void __netdev_adjacent_dev_unlink(struct net_device *dev,
struct net_device *upper_dev)
{
- __netdev_upper_dev_remove(dev, upper_dev);
- __netdev_lower_dev_remove(upper_dev, dev);
+ __netdev_adjacent_dev_unlink_lists(dev, upper_dev,
+ &dev->all_adj_list.upper,
+ &upper_dev->all_adj_list.lower);
}
+int __netdev_adjacent_dev_link_neighbour(struct net_device *dev,
+ struct net_device *upper_dev,
+ void *private, bool master)
+{
+ int ret = __netdev_adjacent_dev_link(dev, upper_dev);
+
+ if (ret)
+ return ret;
+
+ ret = __netdev_adjacent_dev_link_lists(dev, upper_dev,
+ &dev->adj_list.upper,
+ &upper_dev->adj_list.lower,
+ private, master);
+ if (ret) {
+ __netdev_adjacent_dev_unlink(dev, upper_dev);
+ return ret;
+ }
+
+ return 0;
+}
+
+void __netdev_adjacent_dev_unlink_neighbour(struct net_device *dev,
+ struct net_device *upper_dev)
+{
+ __netdev_adjacent_dev_unlink(dev, upper_dev);
+ __netdev_adjacent_dev_unlink_lists(dev, upper_dev,
+ &dev->adj_list.upper,
+ &upper_dev->adj_list.lower);
+}
static int __netdev_upper_dev_link(struct net_device *dev,
- struct net_device *upper_dev, bool master)
+ struct net_device *upper_dev, bool master,
+ void *private)
{
struct netdev_adjacent *i, *j, *to_i, *to_j;
int ret = 0;
@@ -4662,26 +4793,29 @@ static int __netdev_upper_dev_link(struct net_device *dev,
return -EBUSY;
/* To prevent loops, check if dev is not upper device to upper_dev. */
- if (__netdev_find_upper(upper_dev, dev))
+ if (__netdev_find_adj(upper_dev, dev, &upper_dev->all_adj_list.upper))
return -EBUSY;
- if (__netdev_find_upper(dev, upper_dev))
+ if (__netdev_find_adj(dev, upper_dev, &dev->all_adj_list.upper))
return -EEXIST;
if (master && netdev_master_upper_dev_get(dev))
return -EBUSY;
- ret = __netdev_adjacent_dev_link_neighbour(dev, upper_dev, master);
+ ret = __netdev_adjacent_dev_link_neighbour(dev, upper_dev, private,
+ master);
if (ret)
return ret;
/* Now that we linked these devs, make all the upper_dev's
- * upper_dev_list visible to every dev's lower_dev_list and vice
+ * all_adj_list.upper visible to every dev's all_adj_list.lower an
* versa, and don't forget the devices itself. All of these
* links are non-neighbours.
*/
- list_for_each_entry(i, &dev->lower_dev_list, list) {
- list_for_each_entry(j, &upper_dev->upper_dev_list, list) {
+ list_for_each_entry(i, &dev->all_adj_list.lower, list) {
+ list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) {
+ pr_debug("Interlinking %s with %s, non-neighbour\n",
+ i->dev->name, j->dev->name);
ret = __netdev_adjacent_dev_link(i->dev, j->dev);
if (ret)
goto rollback_mesh;
@@ -4689,14 +4823,18 @@ static int __netdev_upper_dev_link(struct net_device *dev,
}
/* add dev to every upper_dev's upper device */
- list_for_each_entry(i, &upper_dev->upper_dev_list, list) {
+ list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) {
+ pr_debug("linking %s's upper device %s with %s\n",
+ upper_dev->name, i->dev->name, dev->name);
ret = __netdev_adjacent_dev_link(dev, i->dev);
if (ret)
goto rollback_upper_mesh;
}
/* add upper_dev to every dev's lower device */
- list_for_each_entry(i, &dev->lower_dev_list, list) {
+ list_for_each_entry(i, &dev->all_adj_list.lower, list) {
+ pr_debug("linking %s's lower device %s with %s\n", dev->name,
+ i->dev->name, upper_dev->name);
ret = __netdev_adjacent_dev_link(i->dev, upper_dev);
if (ret)
goto rollback_lower_mesh;
@@ -4707,7 +4845,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
rollback_lower_mesh:
to_i = i;
- list_for_each_entry(i, &dev->lower_dev_list, list) {
+ list_for_each_entry(i, &dev->all_adj_list.lower, list) {
if (i == to_i)
break;
__netdev_adjacent_dev_unlink(i->dev, upper_dev);
@@ -4717,7 +4855,7 @@ rollback_lower_mesh:
rollback_upper_mesh:
to_i = i;
- list_for_each_entry(i, &upper_dev->upper_dev_list, list) {
+ list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) {
if (i == to_i)
break;
__netdev_adjacent_dev_unlink(dev, i->dev);
@@ -4728,8 +4866,8 @@ rollback_upper_mesh:
rollback_mesh:
to_i = i;
to_j = j;
- list_for_each_entry(i, &dev->lower_dev_list, list) {
- list_for_each_entry(j, &upper_dev->upper_dev_list, list) {
+ list_for_each_entry(i, &dev->all_adj_list.lower, list) {
+ list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) {
if (i == to_i && j == to_j)
break;
__netdev_adjacent_dev_unlink(i->dev, j->dev);
@@ -4738,7 +4876,7 @@ rollback_mesh:
break;
}
- __netdev_adjacent_dev_unlink(dev, upper_dev);
+ __netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
return ret;
}
@@ -4756,7 +4894,7 @@ rollback_mesh:
int netdev_upper_dev_link(struct net_device *dev,
struct net_device *upper_dev)
{
- return __netdev_upper_dev_link(dev, upper_dev, false);
+ return __netdev_upper_dev_link(dev, upper_dev, false, NULL);
}
EXPORT_SYMBOL(netdev_upper_dev_link);
@@ -4774,10 +4912,18 @@ EXPORT_SYMBOL(netdev_upper_dev_link);
int netdev_master_upper_dev_link(struct net_device *dev,
struct net_device *upper_dev)
{
- return __netdev_upper_dev_link(dev, upper_dev, true);
+ return __netdev_upper_dev_link(dev, upper_dev, true, NULL);
}
EXPORT_SYMBOL(netdev_master_upper_dev_link);
+int netdev_master_upper_dev_link_private(struct net_device *dev,
+ struct net_device *upper_dev,
+ void *private)
+{
+ return __netdev_upper_dev_link(dev, upper_dev, true, private);
+}
+EXPORT_SYMBOL(netdev_master_upper_dev_link_private);
+
/**
* netdev_upper_dev_unlink - Removes a link to upper device
* @dev: device
@@ -4792,29 +4938,59 @@ void netdev_upper_dev_unlink(struct net_device *dev,
struct netdev_adjacent *i, *j;
ASSERT_RTNL();
- __netdev_adjacent_dev_unlink(dev, upper_dev);
+ __netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
/* Here is the tricky part. We must remove all dev's lower
* devices from all upper_dev's upper devices and vice
* versa, to maintain the graph relationship.
*/
- list_for_each_entry(i, &dev->lower_dev_list, list)
- list_for_each_entry(j, &upper_dev->upper_dev_list, list)
+ list_for_each_entry(i, &dev->all_adj_list.lower, list)
+ list_for_each_entry(j, &upper_dev->all_adj_list.upper, list)
__netdev_adjacent_dev_unlink(i->dev, j->dev);
/* remove also the devices itself from lower/upper device
* list
*/
- list_for_each_entry(i, &dev->lower_dev_list, list)
+ list_for_each_entry(i, &dev->all_adj_list.lower, list)
__netdev_adjacent_dev_unlink(i->dev, upper_dev);
- list_for_each_entry(i, &upper_dev->upper_dev_list, list)
+ list_for_each_entry(i, &upper_dev->all_adj_list.upper, list)
__netdev_adjacent_dev_unlink(dev, i->dev);
call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
}
EXPORT_SYMBOL(netdev_upper_dev_unlink);
+void *netdev_lower_dev_get_private_rcu(struct net_device *dev,
+ struct net_device *lower_dev)
+{
+ struct netdev_adjacent *lower;
+
+ if (!lower_dev)
+ return NULL;
+ lower = __netdev_find_adj_rcu(dev, lower_dev, &dev->adj_list.lower);
+ if (!lower)
+ return NULL;
+
+ return lower->private;
+}
+EXPORT_SYMBOL(netdev_lower_dev_get_private_rcu);
+
+void *netdev_lower_dev_get_private(struct net_device *dev,
+ struct net_device *lower_dev)
+{
+ struct netdev_adjacent *lower;
+
+ if (!lower_dev)
+ return NULL;
+ lower = __netdev_find_adj(dev, lower_dev, &dev->adj_list.lower);
+ if (!lower)
+ return NULL;
+
+ return lower->private;
+}
+EXPORT_SYMBOL(netdev_lower_dev_get_private);
+
static void dev_change_rx_flags(struct net_device *dev, int flags)
{
const struct net_device_ops *ops = dev->netdev_ops;
@@ -4823,7 +4999,7 @@ static void dev_change_rx_flags(struct net_device *dev, int flags)
ops->ndo_change_rx_flags(dev, flags);
}
-static int __dev_set_promiscuity(struct net_device *dev, int inc)
+static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify)
{
unsigned int old_flags = dev->flags;
kuid_t uid;
@@ -4866,6 +5042,8 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc)
dev_change_rx_flags(dev, IFF_PROMISC);
}
+ if (notify)
+ __dev_notify_flags(dev, old_flags, IFF_PROMISC);
return 0;
}
@@ -4885,7 +5063,7 @@ int dev_set_promiscuity(struct net_device *dev, int inc)
unsigned int old_flags = dev->flags;
int err;
- err = __dev_set_promiscuity(dev, inc);
+ err = __dev_set_promiscuity(dev, inc, true);
if (err < 0)
return err;
if (dev->flags != old_flags)
@@ -4894,22 +5072,9 @@ int dev_set_promiscuity(struct net_device *dev, int inc)
}
EXPORT_SYMBOL(dev_set_promiscuity);
-/**
- * dev_set_allmulti - update allmulti count on a device
- * @dev: device
- * @inc: modifier
- *
- * Add or remove reception of all multicast frames to a device. While the
- * count in the device remains above zero the interface remains listening
- * to all interfaces. Once it hits zero the device reverts back to normal
- * filtering operation. A negative @inc value is used to drop the counter
- * when releasing a resource needing all multicasts.
- * Return 0 if successful or a negative errno code on error.
- */
-
-int dev_set_allmulti(struct net_device *dev, int inc)
+static int __dev_set_allmulti(struct net_device *dev, int inc, bool notify)
{
- unsigned int old_flags = dev->flags;
+ unsigned int old_flags = dev->flags, old_gflags = dev->gflags;
ASSERT_RTNL();
@@ -4932,9 +5097,30 @@ int dev_set_allmulti(struct net_device *dev, int inc)
if (dev->flags ^ old_flags) {
dev_change_rx_flags(dev, IFF_ALLMULTI);
dev_set_rx_mode(dev);
+ if (notify)
+ __dev_notify_flags(dev, old_flags,
+ dev->gflags ^ old_gflags);
}
return 0;
}
+
+/**
+ * dev_set_allmulti - update allmulti count on a device
+ * @dev: device
+ * @inc: modifier
+ *
+ * Add or remove reception of all multicast frames to a device. While the
+ * count in the device remains above zero the interface remains listening
+ * to all interfaces. Once it hits zero the device reverts back to normal
+ * filtering operation. A negative @inc value is used to drop the counter
+ * when releasing a resource needing all multicasts.
+ * Return 0 if successful or a negative errno code on error.
+ */
+
+int dev_set_allmulti(struct net_device *dev, int inc)
+{
+ return __dev_set_allmulti(dev, inc, true);
+}
EXPORT_SYMBOL(dev_set_allmulti);
/*
@@ -4959,10 +5145,10 @@ void __dev_set_rx_mode(struct net_device *dev)
* therefore calling __dev_set_promiscuity here is safe.
*/
if (!netdev_uc_empty(dev) && !dev->uc_promisc) {
- __dev_set_promiscuity(dev, 1);
+ __dev_set_promiscuity(dev, 1, false);
dev->uc_promisc = true;
} else if (netdev_uc_empty(dev) && dev->uc_promisc) {
- __dev_set_promiscuity(dev, -1);
+ __dev_set_promiscuity(dev, -1, false);
dev->uc_promisc = false;
}
}
@@ -5051,9 +5237,13 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
if ((flags ^ dev->gflags) & IFF_PROMISC) {
int inc = (flags & IFF_PROMISC) ? 1 : -1;
+ unsigned int old_flags = dev->flags;
dev->gflags ^= IFF_PROMISC;
- dev_set_promiscuity(dev, inc);
+
+ if (__dev_set_promiscuity(dev, inc, false) >= 0)
+ if (dev->flags != old_flags)
+ dev_set_rx_mode(dev);
}
/* NOTE: order of synchronization of IFF_PROMISC and IFF_ALLMULTI
@@ -5064,16 +5254,20 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
int inc = (flags & IFF_ALLMULTI) ? 1 : -1;
dev->gflags ^= IFF_ALLMULTI;
- dev_set_allmulti(dev, inc);
+ __dev_set_allmulti(dev, inc, false);
}
return ret;
}
-void __dev_notify_flags(struct net_device *dev, unsigned int old_flags)
+void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
+ unsigned int gchanges)
{
unsigned int changes = dev->flags ^ old_flags;
+ if (gchanges)
+ rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
+
if (changes & IFF_UP) {
if (dev->flags & IFF_UP)
call_netdevice_notifiers(NETDEV_UP, dev);
@@ -5102,17 +5296,14 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags)
int dev_change_flags(struct net_device *dev, unsigned int flags)
{
int ret;
- unsigned int changes, old_flags = dev->flags;
+ unsigned int changes, old_flags = dev->flags, old_gflags = dev->gflags;
ret = __dev_change_flags(dev, flags);
if (ret < 0)
return ret;
- changes = old_flags ^ dev->flags;
- if (changes)
- rtmsg_ifinfo(RTM_NEWLINK, dev, changes);
-
- __dev_notify_flags(dev, old_flags);
+ changes = (old_flags ^ dev->flags) | (old_gflags ^ dev->gflags);
+ __dev_notify_flags(dev, old_flags, changes);
return ret;
}
EXPORT_SYMBOL(dev_change_flags);
@@ -5259,6 +5450,7 @@ static void net_set_todo(struct net_device *dev)
static void rollback_registered_many(struct list_head *head)
{
struct net_device *dev, *tmp;
+ LIST_HEAD(close_head);
BUG_ON(dev_boot_phase);
ASSERT_RTNL();
@@ -5281,7 +5473,9 @@ static void rollback_registered_many(struct list_head *head)
}
/* If device is running, close it first. */
- dev_close_many(head);
+ list_for_each_entry(dev, head, unreg_list)
+ list_add_tail(&dev->close_list, &close_head);
+ dev_close_many(&close_head);
list_for_each_entry(dev, head, unreg_list) {
/* And unlink it from device chain. */
@@ -5304,7 +5498,7 @@ static void rollback_registered_many(struct list_head *head)
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
- rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
+ rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
/*
* Flush the unicast and multicast chains
@@ -5703,7 +5897,7 @@ int register_netdevice(struct net_device *dev)
*/
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
- rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U);
+ rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
out:
return ret;
@@ -6010,6 +6204,16 @@ void netdev_set_default_ethtool_ops(struct net_device *dev,
}
EXPORT_SYMBOL_GPL(netdev_set_default_ethtool_ops);
+void netdev_freemem(struct net_device *dev)
+{
+ char *addr = (char *)dev - dev->padded;
+
+ if (is_vmalloc_addr(addr))
+ vfree(addr);
+ else
+ kfree(addr);
+}
+
/**
* alloc_netdev_mqs - allocate network device
* @sizeof_priv: size of private data to allocate space for
@@ -6053,7 +6257,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
/* ensure 32-byte alignment of whole construct */
alloc_size += NETDEV_ALIGN - 1;
- p = kzalloc(alloc_size, GFP_KERNEL);
+ p = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
+ if (!p)
+ p = vzalloc(alloc_size);
if (!p)
return NULL;
@@ -6062,7 +6268,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
dev->pcpu_refcnt = alloc_percpu(int);
if (!dev->pcpu_refcnt)
- goto free_p;
+ goto free_dev;
if (dev_addr_init(dev))
goto free_pcpu;
@@ -6077,9 +6283,12 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
INIT_LIST_HEAD(&dev->napi_list);
INIT_LIST_HEAD(&dev->unreg_list);
+ INIT_LIST_HEAD(&dev->close_list);
INIT_LIST_HEAD(&dev->link_watch_list);
- INIT_LIST_HEAD(&dev->upper_dev_list);
- INIT_LIST_HEAD(&dev->lower_dev_list);
+ INIT_LIST_HEAD(&dev->adj_list.upper);
+ INIT_LIST_HEAD(&dev->adj_list.lower);
+ INIT_LIST_HEAD(&dev->all_adj_list.upper);
+ INIT_LIST_HEAD(&dev->all_adj_list.lower);
dev->priv_flags = IFF_XMIT_DST_RELEASE;
setup(dev);
@@ -6112,8 +6321,8 @@ free_pcpu:
kfree(dev->_rx);
#endif
-free_p:
- kfree(p);
+free_dev:
+ netdev_freemem(dev);
return NULL;
}
EXPORT_SYMBOL(alloc_netdev_mqs);
@@ -6150,7 +6359,7 @@ void free_netdev(struct net_device *dev)
/* Compatibility with error handling in drivers */
if (dev->reg_state == NETREG_UNINITIALIZED) {
- kfree((char *)dev - dev->padded);
+ netdev_freemem(dev);
return;
}
@@ -6312,7 +6521,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
rcu_barrier();
call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
- rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
+ rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
/*
* Flush the unicast and multicast chains
@@ -6351,7 +6560,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
* Prevent userspace races by waiting until the network
* device is fully setup before sending notifications.
*/
- rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U);
+ rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
synchronize_net();
err = 0;
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 6cda4e2c213..ec40a849fc4 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -752,7 +752,7 @@ int dev_mc_del_global(struct net_device *dev, const unsigned char *addr)
EXPORT_SYMBOL(dev_mc_del_global);
/**
- * dev_mc_sync - Synchronize device's unicast list to another device
+ * dev_mc_sync - Synchronize device's multicast list to another device
* @to: destination device
* @from: source device
*
@@ -780,7 +780,7 @@ int dev_mc_sync(struct net_device *to, struct net_device *from)
EXPORT_SYMBOL(dev_mc_sync);
/**
- * dev_mc_sync_multiple - Synchronize device's unicast list to another
+ * dev_mc_sync_multiple - Synchronize device's multicast list to another
* device, but allow for multiple calls to sync to multiple devices.
* @to: destination device
* @from: source device
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 78e9d9223e4..30071dec287 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -81,6 +81,8 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation",
[NETIF_F_FSO_BIT] = "tx-fcoe-segmentation",
[NETIF_F_GSO_GRE_BIT] = "tx-gre-segmentation",
+ [NETIF_F_GSO_IPIP_BIT] = "tx-ipip-segmentation",
+ [NETIF_F_GSO_SIT_BIT] = "tx-sit-segmentation",
[NETIF_F_GSO_UDP_TUNNEL_BIT] = "tx-udp_tnl-segmentation",
[NETIF_F_GSO_MPLS_BIT] = "tx-mpls-segmentation",
@@ -94,6 +96,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_LOOPBACK_BIT] = "loopback",
[NETIF_F_RXFCS_BIT] = "rx-fcs",
[NETIF_F_RXALL_BIT] = "rx-all",
+ [NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
};
static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 2e654138433..f409e0bd35c 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -460,7 +460,8 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
if (frh->action && (frh->action != rule->action))
continue;
- if (frh->table && (frh_get_table(frh, tb) != rule->table))
+ if (frh_get_table(frh, tb) &&
+ (frh_get_table(frh, tb) != rule->table))
continue;
if (tb[FRA_PRIORITY] &&
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 143b6fdb964..d6ef1732250 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -25,9 +25,35 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
memcpy(&flow->src, &iph->saddr, sizeof(flow->src) + sizeof(flow->dst));
}
+/**
+ * skb_flow_get_ports - extract the upper layer ports and return them
+ * @skb: buffer to extract the ports from
+ * @thoff: transport header offset
+ * @ip_proto: protocol for which to get port offset
+ *
+ * The function will try to retrieve the ports at offset thoff + poff where poff
+ * is the protocol port offset returned from proto_ports_offset
+ */
+__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto)
+{
+ int poff = proto_ports_offset(ip_proto);
+
+ if (poff >= 0) {
+ __be32 *ports, _ports;
+
+ ports = skb_header_pointer(skb, thoff + poff,
+ sizeof(_ports), &_ports);
+ if (ports)
+ return *ports;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(skb_flow_get_ports);
+
bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
{
- int poff, nhoff = skb_network_offset(skb);
+ int nhoff = skb_network_offset(skb);
u8 ip_proto;
__be16 proto = skb->protocol;
@@ -42,13 +68,13 @@ ip:
iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph);
if (!iph || iph->ihl < 5)
return false;
+ nhoff += iph->ihl * 4;
+ ip_proto = iph->protocol;
if (ip_is_fragment(iph))
ip_proto = 0;
- else
- ip_proto = iph->protocol;
+
iph_to_flow_copy_addrs(flow, iph);
- nhoff += iph->ihl * 4;
break;
}
case __constant_htons(ETH_P_IPV6): {
@@ -150,16 +176,7 @@ ipv6:
}
flow->ip_proto = ip_proto;
- poff = proto_ports_offset(ip_proto);
- if (poff >= 0) {
- __be32 *ports, _ports;
-
- ports = skb_header_pointer(skb, nhoff + poff,
- sizeof(_ports), &_ports);
- if (ports)
- flow->ports = *ports;
- }
-
+ flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
flow->thoff = (u16) nhoff;
return true;
@@ -167,6 +184,22 @@ ipv6:
EXPORT_SYMBOL(skb_flow_dissect);
static u32 hashrnd __read_mostly;
+static __always_inline void __flow_hash_secret_init(void)
+{
+ net_get_random_once(&hashrnd, sizeof(hashrnd));
+}
+
+static __always_inline u32 __flow_hash_3words(u32 a, u32 b, u32 c)
+{
+ __flow_hash_secret_init();
+ return jhash_3words(a, b, c, hashrnd);
+}
+
+static __always_inline u32 __flow_hash_1word(u32 a)
+{
+ __flow_hash_secret_init();
+ return jhash_1word(a, hashrnd);
+}
/*
* __skb_get_rxhash: calculate a flow hash based on src/dst addresses
@@ -193,9 +226,9 @@ void __skb_get_rxhash(struct sk_buff *skb)
swap(keys.port16[0], keys.port16[1]);
}
- hash = jhash_3words((__force u32)keys.dst,
- (__force u32)keys.src,
- (__force u32)keys.ports, hashrnd);
+ hash = __flow_hash_3words((__force u32)keys.dst,
+ (__force u32)keys.src,
+ (__force u32)keys.ports);
if (!hash)
hash = 1;
@@ -231,7 +264,7 @@ u16 __skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb,
hash = skb->sk->sk_hash;
else
hash = (__force u16) skb->protocol;
- hash = jhash_1word(hash, hashrnd);
+ hash = __flow_hash_1word(hash);
return (u16) (((u64) hash * qcount) >> 32) + qoffset;
}
@@ -323,7 +356,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
else
hash = (__force u16) skb->protocol ^
skb->rxhash;
- hash = jhash_1word(hash, hashrnd);
+ hash = __flow_hash_1word(hash);
queue_index = map->queues[
((u64)hash * map->len) >> 32];
}
@@ -378,11 +411,3 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
skb_set_queue_mapping(skb, queue_index);
return netdev_get_tx_queue(dev, queue_index);
}
-
-static int __init initialize_hashrnd(void)
-{
- get_random_bytes(&hashrnd, sizeof(hashrnd));
- return 0;
-}
-
-late_initcall_sync(initialize_hashrnd);
diff --git a/net/core/iovec.c b/net/core/iovec.c
index b77eeecc001..4cdb7c48dad 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -100,7 +100,7 @@ int memcpy_toiovecend(const struct iovec *iov, unsigned char *kdata,
EXPORT_SYMBOL(memcpy_toiovecend);
/*
- * Copy iovec from kernel. Returns -EFAULT on error.
+ * Copy iovec to kernel. Returns -EFAULT on error.
*/
int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6072610a867..ca15f32821f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -867,7 +867,7 @@ static void neigh_invalidate(struct neighbour *neigh)
static void neigh_probe(struct neighbour *neigh)
__releases(neigh->lock)
{
- struct sk_buff *skb = skb_peek(&neigh->arp_queue);
+ struct sk_buff *skb = skb_peek_tail(&neigh->arp_queue);
/* keep skb alive even if arp_queue overflows */
if (skb)
skb = skb_copy(skb, GFP_ATOMIC);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 325dee863e4..f3edf9635e0 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1263,7 +1263,7 @@ static void netdev_release(struct device *d)
BUG_ON(dev->reg_state != NETREG_RELEASED);
kfree(dev->ifalias);
- kfree((char *)dev - dev->padded);
+ netdev_freemem(dev);
}
static const void *net_namespace(struct device *d)
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index d9cd627e6a1..9b7cf6c85f8 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -222,11 +222,10 @@ static void net_prio_attach(struct cgroup_subsys_state *css,
struct cgroup_taskset *tset)
{
struct task_struct *p;
- void *v;
+ void *v = (void *)(unsigned long)css->cgroup->id;
cgroup_taskset_for_each(p, css, tset) {
task_lock(p);
- v = (void *)(unsigned long)task_netprioidx(p);
iterate_fd(p->files, 0, update_netprio, v);
task_unlock(p);
}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2a0e21de306..cf67144d3e3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1647,9 +1647,8 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm)
}
dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
- rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U);
- __dev_notify_flags(dev, old_flags);
+ __dev_notify_flags(dev, old_flags, ~0U);
return 0;
}
EXPORT_SYMBOL(rtnl_configure_link);
@@ -1985,14 +1984,15 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
return skb->len;
}
-void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
+void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
+ gfp_t flags)
{
struct net *net = dev_net(dev);
struct sk_buff *skb;
int err = -ENOBUFS;
size_t if_info_size;
- skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), GFP_KERNEL);
+ skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), flags);
if (skb == NULL)
goto errout;
@@ -2003,7 +2003,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
kfree_skb(skb);
goto errout;
}
- rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+ rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
return;
errout:
if (err < 0)
@@ -2717,7 +2717,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
case NETDEV_JOIN:
break;
default:
- rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
+ rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
break;
}
return NOTIFY_DONE;
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 8d9d05edd2e..897da56f3af 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -7,6 +7,7 @@
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/string.h>
+#include <linux/net.h>
#include <net/secure_seq.h>
@@ -15,20 +16,9 @@
static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
-static void net_secret_init(void)
+static __always_inline void net_secret_init(void)
{
- u32 tmp;
- int i;
-
- if (likely(net_secret[0]))
- return;
-
- for (i = NET_SECRET_SIZE; i > 0;) {
- do {
- get_random_bytes(&tmp, sizeof(tmp));
- } while (!tmp);
- cmpxchg(&net_secret[--i], 0, tmp);
- }
+ net_get_random_once(net_secret, sizeof(net_secret));
}
#endif
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d81cff119f7..8cec1e6b844 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -476,6 +476,18 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
}
EXPORT_SYMBOL(skb_add_rx_frag);
+void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
+ unsigned int truesize)
+{
+ skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+ skb_frag_size_add(frag, size);
+ skb->len += size;
+ skb->data_len += size;
+ skb->truesize += truesize;
+}
+EXPORT_SYMBOL(skb_coalesce_rx_frag);
+
static void skb_drop_list(struct sk_buff **listp)
{
kfree_skb_list(*listp);
@@ -580,9 +592,6 @@ static void skb_release_head_state(struct sk_buff *skb)
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
nf_conntrack_put(skb->nfct);
#endif
-#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
- nf_conntrack_put_reasm(skb->nfct_reasm);
-#endif
#ifdef CONFIG_BRIDGE_NETFILTER
nf_bridge_put(skb->nf_bridge);
#endif
@@ -903,6 +912,9 @@ EXPORT_SYMBOL(skb_clone);
static void skb_headers_offset_update(struct sk_buff *skb, int off)
{
+ /* Only adjust this if it actually is csum_start rather than csum */
+ if (skb->ip_summed == CHECKSUM_PARTIAL)
+ skb->csum_start += off;
/* {transport,network,mac}_header and tail are relative to skb->head */
skb->transport_header += off;
skb->network_header += off;
@@ -1036,8 +1048,8 @@ EXPORT_SYMBOL(__pskb_copy);
* @ntail: room to add at tail
* @gfp_mask: allocation priority
*
- * Expands (or creates identical copy, if &nhead and &ntail are zero)
- * header of skb. &sk_buff itself is not changed. &sk_buff MUST have
+ * Expands (or creates identical copy, if @nhead and @ntail are zero)
+ * header of @skb. &sk_buff itself is not changed. &sk_buff MUST have
* reference count of 1. Returns zero in the case of success or error,
* if expansion failed. In the last case, &sk_buff is not changed.
*
@@ -1109,9 +1121,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
#endif
skb->tail += off;
skb_headers_offset_update(skb, nhead);
- /* Only adjust this if it actually is csum_start rather than csum */
- if (skb->ip_summed == CHECKSUM_PARTIAL)
- skb->csum_start += nhead;
skb->cloned = 0;
skb->hdr_len = 0;
skb->nohdr = 0;
@@ -1176,7 +1185,6 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
NUMA_NO_NODE);
int oldheadroom = skb_headroom(skb);
int head_copy_len, head_copy_off;
- int off;
if (!n)
return NULL;
@@ -1200,11 +1208,7 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
copy_skb_header(n, skb);
- off = newheadroom - oldheadroom;
- if (n->ip_summed == CHECKSUM_PARTIAL)
- n->csum_start += off;
-
- skb_headers_offset_update(n, off);
+ skb_headers_offset_update(n, newheadroom - oldheadroom);
return n;
}
@@ -1257,6 +1261,29 @@ free_skb:
EXPORT_SYMBOL(skb_pad);
/**
+ * pskb_put - add data to the tail of a potentially fragmented buffer
+ * @skb: start of the buffer to use
+ * @tail: tail fragment of the buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the potentially
+ * fragmented buffer. @tail must be the last fragment of @skb -- or
+ * @skb itself. If this would exceed the total buffer size the kernel
+ * will panic. A pointer to the first byte of the extra data is
+ * returned.
+ */
+
+unsigned char *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len)
+{
+ if (tail != skb) {
+ skb->data_len += len;
+ skb->len += len;
+ }
+ return skb_put(tail, len);
+}
+EXPORT_SYMBOL_GPL(pskb_put);
+
+/**
* skb_put - add data to a buffer
* @skb: buffer to use
* @len: amount of data to add
@@ -1933,9 +1960,8 @@ fault:
EXPORT_SYMBOL(skb_store_bits);
/* Checksum skb data. */
-
-__wsum skb_checksum(const struct sk_buff *skb, int offset,
- int len, __wsum csum)
+__wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
+ __wsum csum, const struct skb_checksum_ops *ops)
{
int start = skb_headlen(skb);
int i, copy = start - offset;
@@ -1946,7 +1972,7 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset,
if (copy > 0) {
if (copy > len)
copy = len;
- csum = csum_partial(skb->data + offset, copy, csum);
+ csum = ops->update(skb->data + offset, copy, csum);
if ((len -= copy) == 0)
return csum;
offset += copy;
@@ -1967,10 +1993,10 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset,
if (copy > len)
copy = len;
vaddr = kmap_atomic(skb_frag_page(frag));
- csum2 = csum_partial(vaddr + frag->page_offset +
- offset - start, copy, 0);
+ csum2 = ops->update(vaddr + frag->page_offset +
+ offset - start, copy, 0);
kunmap_atomic(vaddr);
- csum = csum_block_add(csum, csum2, pos);
+ csum = ops->combine(csum, csum2, pos, copy);
if (!(len -= copy))
return csum;
offset += copy;
@@ -1989,9 +2015,9 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset,
__wsum csum2;
if (copy > len)
copy = len;
- csum2 = skb_checksum(frag_iter, offset - start,
- copy, 0);
- csum = csum_block_add(csum, csum2, pos);
+ csum2 = __skb_checksum(frag_iter, offset - start,
+ copy, 0, ops);
+ csum = ops->combine(csum, csum2, pos, copy);
if ((len -= copy) == 0)
return csum;
offset += copy;
@@ -2003,6 +2029,18 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset,
return csum;
}
+EXPORT_SYMBOL(__skb_checksum);
+
+__wsum skb_checksum(const struct sk_buff *skb, int offset,
+ int len, __wsum csum)
+{
+ const struct skb_checksum_ops ops = {
+ .update = csum_partial_ext,
+ .combine = csum_block_add_ext,
+ };
+
+ return __skb_checksum(skb, offset, len, csum, &ops);
+}
EXPORT_SYMBOL(skb_checksum);
/* Both of above in one bottle. */
@@ -2522,14 +2560,14 @@ EXPORT_SYMBOL(skb_prepare_seq_read);
* @data: destination pointer for data to be returned
* @st: state variable
*
- * Reads a block of skb data at &consumed relative to the
+ * Reads a block of skb data at @consumed relative to the
* lower offset specified to skb_prepare_seq_read(). Assigns
- * the head of the data block to &data and returns the length
+ * the head of the data block to @data and returns the length
* of the block or 0 if the end of the skb data or the upper
* offset has been reached.
*
* The caller is not required to consume all of the data
- * returned, i.e. &consumed is typically set to the number
+ * returned, i.e. @consumed is typically set to the number
* of bytes already consumed and the next call to
* skb_seq_read() will return the remaining part of the block.
*
@@ -2837,14 +2875,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
__copy_skb_header(nskb, skb);
nskb->mac_len = skb->mac_len;
- /* nskb and skb might have different headroom */
- if (nskb->ip_summed == CHECKSUM_PARTIAL)
- nskb->csum_start += skb_headroom(nskb) - headroom;
-
- skb_reset_mac_header(nskb);
- skb_set_network_header(nskb, skb->mac_len);
- nskb->transport_header = (nskb->network_header +
- skb_network_header_len(skb));
+ skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom);
skb_copy_from_linear_data_offset(skb, -tnl_hlen,
nskb->data - tnl_hlen,
@@ -2936,32 +2967,30 @@ EXPORT_SYMBOL_GPL(skb_segment);
int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
{
- struct sk_buff *p = *head;
- struct sk_buff *nskb;
- struct skb_shared_info *skbinfo = skb_shinfo(skb);
- struct skb_shared_info *pinfo = skb_shinfo(p);
- unsigned int headroom;
- unsigned int len = skb_gro_len(skb);
+ struct skb_shared_info *pinfo, *skbinfo = skb_shinfo(skb);
unsigned int offset = skb_gro_offset(skb);
unsigned int headlen = skb_headlen(skb);
+ struct sk_buff *nskb, *lp, *p = *head;
+ unsigned int len = skb_gro_len(skb);
unsigned int delta_truesize;
+ unsigned int headroom;
- if (p->len + len >= 65536)
+ if (unlikely(p->len + len >= 65536))
return -E2BIG;
- if (pinfo->frag_list)
- goto merge;
- else if (headlen <= offset) {
+ lp = NAPI_GRO_CB(p)->last ?: p;
+ pinfo = skb_shinfo(lp);
+
+ if (headlen <= offset) {
skb_frag_t *frag;
skb_frag_t *frag2;
int i = skbinfo->nr_frags;
int nr_frags = pinfo->nr_frags + i;
- offset -= headlen;
-
if (nr_frags > MAX_SKB_FRAGS)
- return -E2BIG;
+ goto merge;
+ offset -= headlen;
pinfo->nr_frags = nr_frags;
skbinfo->nr_frags = 0;
@@ -2992,7 +3021,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
unsigned int first_offset;
if (nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)
- return -E2BIG;
+ goto merge;
first_offset = skb->data -
(unsigned char *)page_address(page) +
@@ -3010,7 +3039,10 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
delta_truesize = skb->truesize - SKB_DATA_ALIGN(sizeof(struct sk_buff));
NAPI_GRO_CB(skb)->free = NAPI_GRO_FREE_STOLEN_HEAD;
goto done;
- } else if (skb_gro_len(p) != pinfo->gso_size)
+ }
+ if (pinfo->frag_list)
+ goto merge;
+ if (skb_gro_len(p) != pinfo->gso_size)
return -E2BIG;
headroom = skb_headroom(p);
@@ -3062,16 +3094,24 @@ merge:
__skb_pull(skb, offset);
- NAPI_GRO_CB(p)->last->next = skb;
+ if (!NAPI_GRO_CB(p)->last)
+ skb_shinfo(p)->frag_list = skb;
+ else
+ NAPI_GRO_CB(p)->last->next = skb;
NAPI_GRO_CB(p)->last = skb;
skb_header_release(skb);
+ lp = p;
done:
NAPI_GRO_CB(p)->count++;
p->data_len += len;
p->truesize += delta_truesize;
p->len += len;
-
+ if (lp != p) {
+ lp->data_len += len;
+ lp->truesize += delta_truesize;
+ lp->len += len;
+ }
NAPI_GRO_CB(skb)->same_flow = 1;
return 0;
}
diff --git a/net/core/sock.c b/net/core/sock.c
index 0b39e7ae438..ab20ed9b0f3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -475,12 +475,6 @@ discard_and_relse:
}
EXPORT_SYMBOL(sk_receive_skb);
-void sk_reset_txq(struct sock *sk)
-{
- sk_tx_queue_clear(sk);
-}
-EXPORT_SYMBOL(sk_reset_txq);
-
struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
{
struct dst_entry *dst = __sk_dst_get(sk);
@@ -914,6 +908,13 @@ set_rcvbuf:
}
break;
#endif
+
+ case SO_MAX_PACING_RATE:
+ sk->sk_max_pacing_rate = val;
+ sk->sk_pacing_rate = min(sk->sk_pacing_rate,
+ sk->sk_max_pacing_rate);
+ break;
+
default:
ret = -ENOPROTOOPT;
break;
@@ -1177,6 +1178,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
break;
#endif
+ case SO_MAX_PACING_RATE:
+ v.val = sk->sk_max_pacing_rate;
+ break;
+
default:
return -ENOPROTOOPT;
}
@@ -1836,7 +1841,17 @@ EXPORT_SYMBOL(sock_alloc_send_skb);
/* On 32bit arches, an skb frag is limited to 2^15 */
#define SKB_FRAG_PAGE_ORDER get_order(32768)
-bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
+/**
+ * skb_page_frag_refill - check that a page_frag contains enough room
+ * @sz: minimum size of the fragment we want to get
+ * @pfrag: pointer to page_frag
+ * @prio: priority for memory allocation
+ *
+ * Note: While this allocator tries to use high order pages, there is
+ * no guarantee that allocations succeed. Therefore, @sz MUST be
+ * less or equal than PAGE_SIZE.
+ */
+bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
{
int order;
@@ -1845,16 +1860,16 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
pfrag->offset = 0;
return true;
}
- if (pfrag->offset < pfrag->size)
+ if (pfrag->offset + sz <= pfrag->size)
return true;
put_page(pfrag->page);
}
/* We restrict high order allocations to users that can afford to wait */
- order = (sk->sk_allocation & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0;
+ order = (prio & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0;
do {
- gfp_t gfp = sk->sk_allocation;
+ gfp_t gfp = prio;
if (order)
gfp |= __GFP_COMP | __GFP_NOWARN;
@@ -1866,6 +1881,15 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
}
} while (--order >= 0);
+ return false;
+}
+EXPORT_SYMBOL(skb_page_frag_refill);
+
+bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
+{
+ if (likely(skb_page_frag_refill(32U, pfrag, sk->sk_allocation)))
+ return true;
+
sk_enter_memory_pressure(sk);
sk_stream_moderate_sndbuf(sk);
return false;
@@ -2319,6 +2343,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
sk->sk_ll_usec = sysctl_net_busy_read;
#endif
+ sk->sk_max_pacing_rate = ~0U;
sk->sk_pacing_rate = ~0U;
/*
* Before updating sk_refcnt, we must commit prior changes to memory
diff --git a/net/core/utils.c b/net/core/utils.c
index aa88e23fc87..2f737bf90b3 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -338,3 +338,52 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
csum_unfold(*sum)));
}
EXPORT_SYMBOL(inet_proto_csum_replace16);
+
+struct __net_random_once_work {
+ struct work_struct work;
+ struct static_key *key;
+};
+
+static void __net_random_once_deferred(struct work_struct *w)
+{
+ struct __net_random_once_work *work =
+ container_of(w, struct __net_random_once_work, work);
+ if (!static_key_enabled(work->key))
+ static_key_slow_inc(work->key);
+ kfree(work);
+}
+
+static void __net_random_once_disable_jump(struct static_key *key)
+{
+ struct __net_random_once_work *w;
+
+ w = kmalloc(sizeof(*w), GFP_ATOMIC);
+ if (!w)
+ return;
+
+ INIT_WORK(&w->work, __net_random_once_deferred);
+ w->key = key;
+ schedule_work(&w->work);
+}
+
+bool __net_get_random_once(void *buf, int nbytes, bool *done,
+ struct static_key *done_key)
+{
+ static DEFINE_SPINLOCK(lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&lock, flags);
+ if (*done) {
+ spin_unlock_irqrestore(&lock, flags);
+ return false;
+ }
+
+ get_random_bytes(buf, nbytes);
+ *done = true;
+ spin_unlock_irqrestore(&lock, flags);
+
+ __net_random_once_disable_jump(done_key);
+
+ return true;
+}
+EXPORT_SYMBOL(__net_get_random_once);