path: root/net/core
Age | Commit message | Author
2012-01-09 | net: Fix build with INET disabled. | David S. Miller
> net/core/sock.c: In function 'sk_update_clone': > net/core/sock.c:1278:3: error: implicit declaration of function 'sock_update_memcg' Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09 | net: introduce netif_addr_lock_nested() and call it when appropriate | Jiri Pirko
dev_uc_sync() and dev_mc_sync() are acquiring netif_addr_lock for destination device of synchronization. Since netif_addr_lock is already held at the time for source device, this triggers lockdep deadlock warning. There's no way this deadlock can happen so use spin_lock_nested() to silence the warning. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09 | net: correct lock name in dev_[uc/mc]_sync documentations. | Jiri Pirko
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-08 | net: sk_update_clone is only used in net/core/sock.c | Stephen Rothwell
so move it there. Fixes build errors when CONFIG_INET is not defined: In file included from include/linux/tcp.h:211:0, from include/linux/ipv6.h:221, from include/net/ipv6.h:16, from include/linux/sunrpc/clnt.h:26, from include/linux/nfs_fs.h:50, from init/do_mounts.c:20: include/net/sock.h: In function 'sk_update_clone': include/net/sock.h:1109:3: error: implicit declaration of function 'sock_update_memcg' [-Werror=implicit-function-declaration] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-07 | pktgen: set correct max and min in pktgen_setup_inject() | Dan Carpenter
In 882716604ec "pktgen: fix multiple queue warning" we added special logic to handle the case where ntxq is zero. It's not clear to me that ntxq can actually be zero. But if it were then we would set ->queue_map_min and ->queue_map_max to USHRT_MAX when probably we want to set them to zero? Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-07 | net: fix sock_clone reference mismatch with tcp memcontrol | Glauber Costa
Sockets can also be created through sock_clone. Because it copies all data in the sock structure, it also copies the memcg-related pointer, and all should be fine. However, since we now use reference counts in socket creation, we are left with some sockets that have no reference counts. It matters when we destroy them, since it leads to a mismatch. Signed-off-by: Glauber Costa <glommer@parallels.com> CC: David S. Miller <davem@davemloft.net> CC: Greg Thelen <gthelen@google.com> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: Laurent Chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-04 | ethtool: Remove ethtool_ops::set_rx_ntuple operation | Ben Hutchings
All implementations have been converted to implement set_rxnfc instead. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-04 | ethtool: Allow drivers to select RX NFC rule locations | Ben Hutchings
Define special location values for RX NFC that request the driver to select the actual rule location. This allows for implementation on devices that use hash-based filter lookup, whereas currently the API is more suited to devices with TCAM lookup or linear search. In ethtool_set_rxnfc() and the compat wrapper ethtool_ioctl(), copy the structure back to user-space after insertion so that the actual location is returned. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 | sock_diag: Introduce the meminfo nla core (v2) | Pavel Emelyanov
Add a routine that dumps memory-related values of a socket. It's made as an array to make it possible to add more stuff here later without breaking compatibility. Since v1: The SK_MEMINFO_ constants are in userspace visible part of sock_diag.h, the rest is under __KERNEL__. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-28 | ipv6: Use universal hash for NDISC. | David S. Miller
In order to perform a proper universal hash on a vector of integers, we have to use different universal hashes on each vector element. Which means we need 4 different hash randoms for ipv6. Signed-off-by: David S. Miller <davem@davemloft.net>
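A minimal userspace sketch of the idea in this message: each element of the 4-word key gets its own random multiplier, and the products are summed with wrap-around. The helper name vec_hash4() and the array layout are assumptions made for illustration, not the kernel's actual NDISC hash function.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's function): hash a vector of four
 * 32-bit words by multiplying each word with its own random value and
 * summing the products. Unsigned arithmetic wraps modulo 2^32.
 */
static uint32_t vec_hash4(const uint32_t key[4], const uint32_t rnd[4])
{
	uint32_t h = 0;

	for (int i = 0; i < 4; i++)
		h += key[i] * rnd[i];	/* a different multiplier per element */
	return h;
}
```

Because every position has its own multiplier, swapping two key words generally changes the result, which a single shared multiplier would not guarantee.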
2011-12-24 | rfs: better sizing of dev_flow_table | Eric Dumazet
Aim of this patch is to provide full range of rps_flow_cnt on 64bit arches. Theoretical limit on number of flows is 2^32. Fix some buggy RPS/RFS macros as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Xi Wang <xi.wang@gmail.com> CC: Laurent Chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23 | net: relax rcvbuf limits | Eric Dumazet
skb->truesize might be big even for a small packet. It's even bigger after commit 87fb4b7b533 (net: more accurate skb truesize) and big MTU. We should allow queueing at least one packet per receiver, even with a low RCVBUF setting. Reported-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
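The difference between a strict and a relaxed receive-buffer check can be sketched in userspace C as follows. The struct and function names are illustrative assumptions; the log entry does not show the exact condition the patch changed.

```c
#include <stdbool.h>

/* Simplified model of a receiver; field names are illustrative only. */
struct rx_queue {
	unsigned int rmem_alloc;	/* bytes currently charged to the receiver */
	unsigned int rcvbuf;		/* configured receive buffer limit */
};

/* Strict check: refuse if adding this packet would reach the limit. */
static bool rx_full_strict(const struct rx_queue *q, unsigned int truesize)
{
	return q->rmem_alloc + truesize >= q->rcvbuf;
}

/*
 * Relaxed check: refuse only once the limit has already been reached,
 * so an empty queue accepts at least one packet whatever its truesize.
 */
static bool rx_full_relaxed(const struct rx_queue *q, unsigned int truesize)
{
	(void)truesize;		/* size of the new packet is deliberately ignored */
	return q->rmem_alloc >= q->rcvbuf;
}
```

With rcvbuf set to 2048 and a packet whose truesize is 4096, the strict check rejects the packet on an empty queue, while the relaxed one accepts it and rejects only the following packet.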
2011-12-22 | rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt() | Xi Wang
Setting a large rps_flow_cnt like (1 << 30) on 32-bit platform will cause a kernel oops due to insufficient bounds checking.

    if (count > 1<<30) {
        /* Enforce a limit to prevent overflow */
        return -EINVAL;
    }
    count = roundup_pow_of_two(count);
    table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count));

Note that the macro RPS_DEV_FLOW_TABLE_SIZE(count) is defined as:

    ... + (count * sizeof(struct rps_dev_flow))

where sizeof(struct rps_dev_flow) is 8. (1 << 30) * 8 will overflow 32 bits. This patch replaces the magic number (1 << 30) with a symbolic bound. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
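A hedged userspace sketch of an overflow-safe size computation in the spirit of this fix. It models a 32-bit size type with uint32_t; the header size, struct flow_entry and flow_table_bytes32() are assumptions for illustration and do not reproduce the kernel's symbolic bound.

```c
#include <stdint.h>

/* 8-byte entry, matching the entry size quoted in the message above. */
struct flow_entry {
	uint32_t a;
	uint32_t b;
};

#define TABLE_HDR_SIZE 16u	/* hypothetical fixed header size */

/*
 * Return the table size in bytes, or 0 if the count is zero or the
 * computation would not fit in 32 bits. The bound is derived from the
 * size formula itself instead of a magic number.
 */
static uint32_t flow_table_bytes32(uint32_t count)
{
	const uint32_t entry = (uint32_t)sizeof(struct flow_entry);
	const uint32_t max_count = (UINT32_MAX - TABLE_HDR_SIZE) / entry;

	if (count == 0 || count > max_count)
		return 0;
	return TABLE_HDR_SIZE + count * entry;
}
```

Here a count of 1 << 30 is rejected, because 8 * (1 << 30) alone already exceeds 32 bits, while 1024 entries yield 16 + 8192 bytes.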
2011-12-21 | net: Add a flow_cache_flush_deferred function | Steffen Klassert
flow_cache_flush() might sleep but can be called from atomic context via the xfrm garbage collector. So add a flow_cache_flush_deferred() function and use this if the xfrm garbage collector is invoked from within the packet path. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 | Revert "net: Remove unused neighbour layer ops." | David S. Miller
This reverts commit 5c3ddec73d01a1fae9409c197078cb02c42238c3. S390 qeth driver actually still uses the setup ops. Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16 | net:core: use IS_ENABLED | Igor Maravić
Use IS_ENABLED(CONFIG_FOO) instead of defined(CONFIG_FOO) || defined (CONFIG_FOO_MODULE) Signed-off-by: Igor Maravić <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>
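For readers unfamiliar with the macro, here is a userspace approximation of how a single IS_ENABLED() test can cover both the built-in and the module case. The helper macros are local stand-ins modelled on the placeholder trick found in later kernels' kconfig.h; the definition in use at the time of this commit may differ, and CONFIG_FOO, CONFIG_BAR and CONFIG_BAZ are hypothetical options.

```c
/* Expands to "0," only when the pasted value is exactly 1. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val

/* Evaluates to 1 if x is defined as 1, and to 0 otherwise. */
#define CFG_DEFINED(x) CFG_DEFINED_1(x)
#define CFG_DEFINED_1(val) CFG_DEFINED_2(ARG_PLACEHOLDER_##val)
#define CFG_DEFINED_2(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, ~)

#define IS_BUILTIN(option) CFG_DEFINED(option)
#define IS_MODULE(option) CFG_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

#define CONFIG_FOO_MODULE 1	/* hypothetical: FOO is built as a module */
#define CONFIG_BAR 1		/* hypothetical: BAR is built in */
/* CONFIG_BAZ is deliberately left undefined. */

/* Replaces: #if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE) */
static int foo_available(void)
{
#if IS_ENABLED(CONFIG_FOO)
	return 1;
#else
	return 0;
#endif
}
```

Unlike defined(), the macro also works inside ordinary C expressions, for example if (IS_ENABLED(CONFIG_BAR)), which lets the compiler still type-check code that ends up disabled.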
2011-12-16 | net: fix sleeping while atomic problem in sock mem_cgroup. | Glauber Costa
We can't scan the proto_list to initialize sock cgroups, as it holds a rwlock, and we also want to keep the code generic enough to avoid calling the initialization functions of protocols directly. Convert proto_list_lock into a mutex, so we can sleep and do the necessary allocations. This lock is seldom taken, so there shouldn't be any performance penalties associated with that. Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Rothwell <sfr@canb.auug.org.au> CC: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16 | ethtool: Define and apply a default policy for RX flow hash indirection | Ben Hutchings
All drivers that support modification of the RX flow hash indirection table initialise it in the same way: RX rings are assigned to table entries in rotation. Make that default policy explicit by having them call a ethtool_rxfh_indir_default() function. In the ethtool core, add support for a zero size value for ETHTOOL_SRXFHINDIR, which resets the table to this default. Partly-suggested-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Shreyas N Bhatewara <sbhatewara@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
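The "assigned in rotation" policy described above amounts to a modulo over the number of RX rings. A minimal userspace sketch follows; the function names and signatures are assumptions for illustration and are not taken from the ethtool headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Default ring for a table slot: rings are handed out round-robin.
 * n_rx_rings must be non-zero. */
static uint32_t rxfh_indir_default(uint32_t index, uint32_t n_rx_rings)
{
	return index % n_rx_rings;
}

/* Fill a whole indirection table with the default policy. */
static void rxfh_indir_fill_default(uint32_t *table, size_t size,
				    uint32_t n_rx_rings)
{
	for (size_t i = 0; i < size; i++)
		table[i] = rxfh_indir_default((uint32_t)i, n_rx_rings);
}
```

With four rings, slots 0 to 7 map to rings 0, 1, 2, 3, 0, 1, 2, 3, so a reset to the default spreads flows evenly across the rings.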
2011-12-16 | ethtool: Centralise validation of ETHTOOL_{G, S}RXFHINDIR parameters | Ben Hutchings
Add a new ethtool operation (get_rxfh_indir_size) to get the indirection table size. Use this to validate the user buffer size before calling get_rxfh_indir or set_rxfh_indir. Use get_rxnfc to get the number of RX rings, and validate the contents of the new indirection table before calling set_rxfh_indir. Remove this validation from drivers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16 | sock_diag: Generalize requests cookies managements | Pavel Emelyanov
The sk address is used as a cookie between dump/get_exact calls. It will be required for unix socket dumping, so move it from inet_diag to sock_diag. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16 | sock_diag: Fix module netlink aliases | Pavel Emelyanov
I've made a mistake when fixing the sock_/inet_diag aliases :( 1. The sock_diag layer should request the family-based alias, not just the IPPROTO_IP one; 2. The inet_diag layer should request for AF_INET+protocol alias, not just the protocol one. Thus fix this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14 | rtnetlink: rtnl_link_register() sanity test | Eric Dumazet
Before adding a struct rtnl_link_ops into link_ops list, check it doesn't clash with a prior one. Based on a previous patch from Alexander Smirnov. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 | net: Remove unused neighbour layer ops. | David S. Miller
It's simpler to just keep these things out until there is a real user of them, so we can see what the needs actually are, rather than keep these things around as useless overhead. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12 | tcp memory pressure controls | Glauber Costa
This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in cg_proto struct. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12 | socket: initial cgroup code. | Glauber Costa
The goal of this work is to move the memory pressure tcp controls to a cgroup, instead of just relying on global conditions. To avoid excessive overhead in the network fast paths, the code that accounts allocated memory to a cgroup is hidden inside a static_branch(). This branch is patched out until the first non-root cgroup is created. So when nobody is using cgroups, even if it is mounted, no significant performance penalty should be seen. This patch handles the generic part of the code, and has nothing tcp-specific. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtsu.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-12 | foundations of per-cgroup memory pressure controlling. | Glauber Costa
This patch replaces all uses of the struct sock fields memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem with accessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in. Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-11 | net: use IS_ENABLED(CONFIG_IPV6) | Eric Dumazet
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 | Revert "net: netprio_cgroup: make net_prio_subsys static" | John Fastabend
This reverts commit 865d9f9f748fdc1943679ea65d9ee1dc55e4a6ae. This commit breaks the build with CONFIG_NETPRIO_CGROUP=y so revert it. It does build as a module though. The SUBSYS macro in the cgroup core code automatically defines a subsys structure as extern. Long term we should fix the macro. And I need to fully build test things. Tested with CONFIG_NETPRIO_CGROUP={y|m|n} with and without CONFIG_CGROUPS defined. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> CC: Neil Horman <nhorman@tuxdriver.com> Reported-By: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 | sock_diag: off by one checks | Dan Carpenter
These tests are off by one because sock_diag_handlers[] only has AF_MAX elements. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
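The class of bug fixed here can be illustrated with a small userspace sketch: an array with N elements has valid indices 0 to N-1, so the rejection test must use >= rather than >. FAMILY_MAX, the handler table and the function names are hypothetical stand-ins for AF_MAX and sock_diag_handlers[].

```c
#include <stddef.h>

#define FAMILY_MAX 4	/* hypothetical stand-in for AF_MAX */

static const char *const handlers[FAMILY_MAX] = {
	"unspec", "unix", "inet", "inet6",
};

/* Off-by-one check: wrongly accepts family == FAMILY_MAX. */
static int family_ok_buggy(int family)
{
	return family >= 0 && family <= FAMILY_MAX;
}

/* Correct check: the last valid index is FAMILY_MAX - 1. */
static int family_ok(int family)
{
	return family >= 0 && family < FAMILY_MAX;
}

/* Look up a handler, returning NULL instead of reading past the array. */
static const char *handler_lookup(int family)
{
	if (!family_ok(family))
		return NULL;
	return handlers[family];
}
```

Calling handler_lookup(FAMILY_MAX) returns NULL, whereas code guarded only by the buggy test would read one element past the end of the table.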
2011-12-08 | net: netprio_cgroup: make net_prio_subsys static | John Fastabend
net_prio_subsys can be made static this removes the sparse warning it was throwing. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
2011-12-06 | net: Silence seq_scale() unused warning | Stephen Boyd
On a CONFIG_NET=y build net/core/secure_seq.c:22: warning: 'seq_scale' defined but not used Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 | sock_diag: Move the sock_ code to net/core/ | Pavel Emelyanov
This patch moves the sock_ code from inet_diag.c to generic sock_diag.c file and provides necessary request_module-s calls and a pointer on inet_diag_compat dumping routine. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 | ipv4:correct description for tcp_max_syn_backlog | Peter Pan(潘卫平)
Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning), sysctl_max_syn_backlog is determined by tcp_hashinfo->ehash_mask, and the minimal value is 128, and it will increase in proportion to the memory of machine. The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog are out of date. Changelog: V2: update description for sysctl_max_syn_backlog Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: Shan Wei <shanwei88@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05 | bql: fix CONFIG_XPS=n build | Eric Dumazet
netdev_queue_release() should be called even if CONFIG_XPS=n to properly release device reference. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05 | net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. | David Miller
To reflect the fact that a reference is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-04 | tcp: take care of misalignments | Eric Dumazet
We discovered that the TCP stack could retransmit misaligned skbs if a malicious peer acknowledged a sub-MSS frame. This can currently happen only if the output interface is not SG enabled: if SG is enabled, tcp builds headless skbs (all payload is included in fragments), so the tcp trimming process only removes parts of skb fragments and headers stay aligned. Some arches can't handle misalignments, so force a head reallocation and shrink headroom to MAX_TCP_HEADER. Don't care about misalignments on x86 and PPC (or other arches setting NET_IP_ALIGN to 0). This patch introduces __pskb_copy(), which can specify the headroom of the new head, and pskb_copy() becomes a wrapper on top of __pskb_copy(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-02 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
2011-12-01 | net: net_device flags is an unsigned int | Eric Dumazet
commit b00055aacdb ([NET] core: add RFC2863 operstate) changed net_device flags from unsigned short to unsigned int. Some core functions still assume it's an unsigned short. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30 | net/core: fix rollback handler in register_netdevice_notifier | RongQing.Li
Within nested statements, the break statement terminates only the do, for, switch, or while statement that immediately encloses it. So replace the break with goto. Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
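The behaviour described in this message is easy to reproduce in a small userspace sketch. The grid, the function names and the "negative value means failure" convention are assumptions for illustration; the real code iterates over namespaces and devices.

```c
/* Sample data: a failure marker (-1) sits in the first row. */
static const int sample_grid[2][3] = {
	{ 1, -1, 1 },
	{ 1,  1, 1 },
};

/* With break, only the inner loop stops; the outer loop carries on. */
static int visit_with_break(const int grid[2][3])
{
	int visited = 0;

	for (int r = 0; r < 2; r++) {
		for (int c = 0; c < 3; c++) {
			if (grid[r][c] < 0)
				break;	/* leaves the inner loop only */
			visited++;
		}
	}
	return visited;
}

/* With goto, both loops are left at the first failure. */
static int visit_with_goto(const int grid[2][3])
{
	int visited = 0;

	for (int r = 0; r < 2; r++) {
		for (int c = 0; c < 3; c++) {
			if (grid[r][c] < 0)
				goto out;	/* leaves both loops */
			visited++;
		}
	}
out:
	return visited;
}
```

On the sample grid the break version keeps going and visits four cells, while the goto version stops after one, which is the behaviour a rollback path needs.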
2011-11-30 | neigh: Add device constructor/destructor capability. | David Miller
If the neigh entry has device private state, it will need constructor/destructor ops. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30 | neigh: Add infrastructure for allocating device neigh privates. | David Miller
netdev->neigh_priv_len records the private area length. This will trigger for neigh_table objects which set tbl->entry_size to zero, and the first instances of this will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30 | neigh: Get rid of neigh_table->kmem_cachep | David Miller
We are going to alloc for device specific private areas for neighbour entries, and in order to do that we have to move away from the fixed allocation size enforced by using neigh_table->kmem_cachep As a nice side effect we can now use kfree_rcu(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29 | net: Fix skb_update_prio RCU usage. | Igor Maravic
Change function rcu_dereference to rcu_dereference_bh to avoid the warning

    [ INFO: suspicious RCU usage. ]
    -------------------------------
    net/core/dev.c:2459 suspicious rcu_dereference_check() usage!

because we are locking with rcu_read_lock_bh() in function dev_queue_xmit(struct sk_buff *skb). Signed-off-by: Igor Maravic <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29 | flow_dissector: use a 64bit load/store | Eric Dumazet
On Monday 28 November 2011 at 19:06 -0500, David Miller wrote:
> From: Dimitris Michailidis <dm@chelsio.com>
> Date: Mon, 28 Nov 2011 08:25:39 -0800
>
> >> +bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
> >> +{
> >> +	int poff, nhoff = skb_network_offset(skb);
> >> +	u8 ip_proto;
> >> +	u16 proto = skb->protocol;
> >
> > __be16 instead of u16 for proto?
>
> I'll take care of this when I apply these patches.

( CC trimmed )

Thanks David!

Here is a small patch to use one 64bit load/store on x86_64 instead of two 32bit load/stores.

[PATCH net-next] flow_dissector: use a 64bit load/store

The gcc compiler is smart enough to use a single load/store if we memcpy(dptr, sptr, 8) on x86_64, regardless of CONFIG_CC_OPTIMIZE_FOR_SIZE. In the IP header, daddr immediately follows saddr, and this won't change in the future. We only need to make sure our flow_keys (src,dst) fields won't break the rule. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
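The single 8-byte copy, together with a compile-time guard that the two address fields stay adjacent, can be sketched in userspace C. The struct layouts and names here are simplified assumptions, not the kernel's struct iphdr or struct flow_keys.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified header: only the adjacency of saddr and daddr matters here. */
struct ip_addrs {
	uint32_t other_fields;
	uint32_t saddr;
	uint32_t daddr;
};

/* Simplified flow key with src immediately followed by dst. */
struct flow_pair {
	uint32_t src;
	uint32_t dst;
};

/* Fail the build if either pair of fields ever stops being adjacent. */
_Static_assert(offsetof(struct ip_addrs, daddr) ==
	       offsetof(struct ip_addrs, saddr) + sizeof(uint32_t),
	       "daddr must immediately follow saddr");
_Static_assert(offsetof(struct flow_pair, dst) ==
	       offsetof(struct flow_pair, src) + sizeof(uint32_t),
	       "dst must immediately follow src");

/* Copy both addresses with one 8-byte memcpy instead of two 4-byte ones. */
static void copy_addrs(struct flow_pair *flow, const struct ip_addrs *iph)
{
	memcpy((unsigned char *)flow + offsetof(struct flow_pair, src),
	       (const unsigned char *)iph + offsetof(struct ip_addrs, saddr),
	       2 * sizeof(uint32_t));
}

/* Convenience wrapper used to exercise copy_addrs(). */
static struct flow_pair flow_from(uint32_t saddr, uint32_t daddr)
{
	struct ip_addrs iph = { .other_fields = 0, .saddr = saddr, .daddr = daddr };
	struct flow_pair flow = { 0, 0 };

	copy_addrs(&flow, &iph);
	return flow;
}
```

Whether the compiler actually emits a single 64-bit load and store depends on the target and optimisation level; the static assertions only guarantee that the copy stays correct.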
2011-11-29 | bql: Byte queue limits | Tom Herbert
Networking stack support for byte queue limits, uses dynamic queue limits library. Byte queue limits are maintained per transmit queue, and a dql structure has been added to netdev_queue structure for this purpose. Configuration of bql is in the tx-<n> sysfs directory for the queue under the byte_queue_limits directory. Configuration includes:
    limit_min, bql minimum limit
    limit_max, bql maximum limit
    hold_time, bql slack hold time
Also under the directory are:
    limit, current byte limit
    inflight, current number of bytes on the queue
Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29 | xps: Add xps_queue_release function | Tom Herbert
This patch moves the xps specific parts in netdev_queue_release into its own function which netdev_queue_release can call. This allows netdev_queue_release to be more generic (for adding new attributes to tx queues). Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29 | net: Add queue state xoff flag for stack | Tom Herbert
Create separate queue state flags so that either the stack or drivers can turn on XOFF. Added a set of functions used in the stack to determine if a queue is really stopped (either by stack or driver) Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
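A minimal sketch of the idea of separate XOFF bits with a combined "really stopped" test, written as userspace C. The flag and function names are illustrative assumptions; the log entry does not list the identifiers the patch introduced.

```c
#include <stdbool.h>

/* One bit per party that may stop the queue (illustrative names). */
enum queue_state_bits {
	QUEUE_DRV_XOFF   = 1 << 0,	/* stopped by the driver */
	QUEUE_STACK_XOFF = 1 << 1,	/* stopped by the stack */
};

#define QUEUE_ANY_XOFF (QUEUE_DRV_XOFF | QUEUE_STACK_XOFF)

/* True only if the driver itself turned the queue off. */
static bool queue_drv_stopped(unsigned int state)
{
	return (state & QUEUE_DRV_XOFF) != 0;
}

/* True if either the driver or the stack turned the queue off. */
static bool queue_xmit_stopped(unsigned int state)
{
	return (state & QUEUE_ANY_XOFF) != 0;
}
```

Keeping the bits separate means the stack can release its own XOFF without accidentally restarting a queue that the driver still wants stopped.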
2011-11-29 | net: optimize socket timestamping | Eric Dumazet
We can test/set multiple bits from sk_flags at once, to shorten a bit socket setup/dismantle phase. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
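Testing or clearing several flag bits in one operation can be sketched as follows. The flag names are hypothetical stand-ins; the log entry does not say which sk_flags bits the patch combines.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for individual sk_flags entries. */
#define FLAG_TIMESTAMP		(1UL << 0)
#define FLAG_TIMESTAMPING_RX	(1UL << 1)
#define FLAG_UNRELATED		(1UL << 2)

/* All timestamp-related bits, so they can be handled in one operation. */
#define TSTAMP_FLAGS (FLAG_TIMESTAMP | FLAG_TIMESTAMPING_RX)

/* One mask test instead of one test per bit. */
static bool wants_timestamp(unsigned long flags)
{
	return (flags & TSTAMP_FLAGS) != 0;
}

/* One mask operation clears every timestamp bit and nothing else. */
static unsigned long clear_timestamp_flags(unsigned long flags)
{
	return flags & ~TSTAMP_FLAGS;
}
```

The mask form replaces a sequence of per-bit tests with a single AND, which is the kind of shortening of socket setup and teardown the message refers to.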