linaro-lsk.git - [no description]

Age	Commit message (Collapse)	Author
2014-03-24	tipc: fix spinlock recursion bug for failed subscriptions	Erik Hugne
	If a topology event subscription fails for any reason, such as out of memory, max number reached or because we received an invalid request the correct behavior is to terminate the subscribers connection to the topology server. This is currently broken and produces the following oops: [27.953662] tipc: Subscription rejected, illegal request [27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6 [27.957066] lock: 0xffff88003c67f408, .magic: dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1 [27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5 [27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc] [27.961430] ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050 [27.962292] ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a [27.963152] ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520 [27.964023] Call Trace: [27.964292] [<ffffffff815c0207>] dump_stack+0x45/0x56 [27.964874] [<ffffffff815beec5>] spin_dump+0x8c/0x91 [27.965420] [<ffffffff815beeeb>] spin_bug+0x21/0x26 [27.965995] [<ffffffff81083df6>] do_raw_spin_lock+0x116/0x140 [27.966631] [<ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20 [27.967256] [<ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc] [27.968051] [<ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc] [27.968722] [<ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc] [27.969436] [<ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc] [27.970209] [<ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc] [27.970972] [<ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc] [27.971633] [<ffffffff8105dbf5>] process_one_work+0x165/0x3e0 [27.972267] [<ffffffff8105e869>] worker_thread+0x119/0x3a0 [27.972896] [<ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0 [27.973622] [<ffffffff810648af>] kthread+0xdf/0x100 [27.974168] [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0 [27.974893] [<ffffffff815ce13c>] ret_from_fork+0x7c/0xb0 [27.975466] [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0 The recursion occurs when subscr_terminate tries to grab the subscriber lock, which is already taken by subscr_conn_msg_event. We fix this by checking if the request to establish a new subscription was successful, and if not we initiate termination of the subscriber after we have released the subscriber lock. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24	netpoll: fix the skb check in pkt_is_ns	Li RongQing
	Neighbor Solicitation is ipv6 protocol, so we should check skb->protocol with ETH_P_IPV6 Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Cc: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20	Merge branch 'fixes' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== Open vSwitch Four small fixes for net/3.14. I realize that these are late in the cycle - just got back from vacation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20	ip6mr: fix mfc notification flags	Nicolas Dichtel
	Commit 812e44dd1829 ("ip6mr: advertise new mfc entries via rtnl") reuses the function ip6mr_fill_mroute() to notify mfc events. But this function was used only for dump and thus was always setting the flag NLM_F_MULTI, which is wrong in case of a single notification. Libraries like libnl will wait forever for NLMSG_DONE. CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20	ipmr: fix mfc notification flags	Nicolas Dichtel
	Commit 8cd3ac9f9b7b ("ipmr: advertise new mfc entries via rtnl") reuses the function ipmr_fill_mroute() to notify mfc events. But this function was used only for dump and thus was always setting the flag NLM_F_MULTI, which is wrong in case of a single notification. Libraries like libnl will wait forever for NLMSG_DONE. CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20	rtnetlink: fix fdb notification flags	Nicolas Dichtel
	Commit 3ff661c38c84 ("net: rtnetlink notify events for FDB NTF_SELF adds and deletes") reuses the function nlmsg_populate_fdb_fill() to notify fdb events. But this function was used only for dump and thus was always setting the flag NLM_F_MULTI, which is wrong in case of a single notification. Libraries like libnl will wait forever for NLMSG_DONE. CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20	openvswitch: Correctly report flow used times for first 5 minutes after boot.	Ben Pfaff
	The kernel starts out its "jiffies" timer as 5 minutes below zero, as shown in include/linux/jiffies.h: /* * Have the 32 bit jiffies value wrap 5 minutes after boot * so jiffies wrap bugs show up earlier. / #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300HZ)) The loop in ovs_flow_stats_get() starts out with 'used' set to 0, then takes any "later" time. This means that for the first five minutes after boot, flows will always be reported as never used, since 0 is greater than any time already seen. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-03-19	netfilter: xt_qtaguid: 64-bit warning fixes	Greg Hackmann
	Change-Id: I2adc517c0c51050ed601992fa0ea4de8f1449414 Signed-off-by: Greg Hackmann <ghackmann@google.com>
2014-03-19	tcp: add a sysctl to config the tcp_default_init_rwnd	JP Abgrall
	The default initial rwnd is hardcoded to 10. Now we allow it to be controlled via /proc/sys/net/ipv4/tcp_default_init_rwnd which limits the values from 3 to 100 This is somewhat needed because ipv6 routes are autoconfigured by the kernel. See "An Argument for Increasing TCP's Initial Congestion Window" in https://developers.google.com/speed/articles/tcp_initcwnd_paper.pdf Change-Id: I386b2a9d62de0ebe05c1ebe1b4bd91b314af5c54 Signed-off-by: JP Abgrall <jpa@google.com> Conflicts: net/ipv4/sysctl_net_ipv4.c net/ipv4/tcp_input.c
2014-03-19	netfilter: xt_IDLETIMER: Revert to retain the kernel API format.	Ashish Sharma
	Reverted Change-Id: Iaeca5dd2d7878c0733923ae03309a2a7b86979ca Change-Id: I0e0a4f60ec14330d8d8d1c5a508fa058d9919e07 Signed-off-by: Ashish Sharma <ashishsharma@google.com> (cherry picked from commit e0a4e5b0e808d718dd9af500c5754118fc3935db)
2014-03-19	netfilter: xt_qtaguid: fix memory leak in seq_file handlers	Greg Hackmann
	Change-Id: I15b21230d52479d008a00d9e2191dda020f00925 Signed-off-by: Greg Hackmann <ghackmann@google.com>
2014-03-19	netfilter: xt_qtaguid: 3.10 fixes	Arve Hjønnevåg
	Stop using obsolete procfs api. Signed-off-by: Arve Hjønnevåg <arve@android.com>
2014-03-19	netfilter: xt_quota2: 3.10 fixes.	Arve Hjønnevåg
	- Stop using obsolete create_proc_entry api. - Use proc_set_user instead of directly accessing the private structure. Signed-off-by: Arve Hjønnevåg <arve@android.com>
2014-03-19	net: activity_stats: Stop using obsolete create_proc_read_entry api	Arve Hjønnevåg
	Convert to use seq_read Signed-off-by: Arve Hjønnevåg <arve@android.com>
2014-03-19	netfilter: xt_qtaguid: fix bad tcp_time_wait sock handling	JP Abgrall
	Since (41063e9 ipv4: Early TCP socket demux), skb's can have an sk which is not a struct sock but the smaller struct inet_timewait_sock without an sk->sk_socket. Now we bypass sk_state == TCP_TIME_WAIT Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: qtaguid: rate limit some of the printks	JP Abgrall
	Some of the printks are in the packet handling path. We now ratelimit the very unlikely errors to avoid kmsg spamming. Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: xt_qtaguid: Allow tracking loopback	JP Abgrall
	In the past it would always ignore interfaces with loopback addresses. Now we just treat them like any other. This also helps with writing tests that check for the presence of the qtaguid module. Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: xt_qtaguid: extend iface stat to report protocols	JP Abgrall
	In the past the iface_stat_fmt would only show global bytes/packets for the skb-based numbers. For stall detection in userspace, distinguishing tcp vs other protocols makes it easier. Now we report ifname total_skb_rx_bytes total_skb_rx_packets total_skb_tx_bytes total_skb_tx_packets {rx,tx}_{tcp,udp,ohter}_{bytes,packets} Bug: 6818637 Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: xt_qtaguid: remove AID_* dependency for access control	JP Abgrall
	qtaguid limits what can be done with /ctrl and /stats based on group membership. This changes removes AID_NET_BW_STATS and AID_NET_BW_ACCT, and picks up the groups from the gid of the matching proc entry files. Signed-off-by: JP Abgrall <jpa@google.com> Change-Id: I42e477adde78a12ed5eb58fbc0b277cdaadb6f94
2014-03-19	netfilter: qtaguid: Don't BUG_ON if create_if_tag_stat fails	Pontus Fuchs
	If create_if_tag_stat fails to allocate memory (GFP_ATOMIC) the following will happen: qtaguid: iface_stat: tag stat alloc failed ... kernel BUG at xt_qtaguid.c:1482! Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
2014-03-19	netfilter: xt_qtaguid: fix error exit that would keep a spinlock.	JP Abgrall
	qtudev_open() could return with a uid_tag_data_tree_lock held when an kzalloc(..., GFP_ATOMIC) would fail. Very unlikely to get triggered AND survive the mayhem of running out of mem. Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: xt_qtaguid: report only uid tags to non-privileged processes	JP Abgrall
	In the past, a process could only see its own stats (uid-based summary, and details). Now we allow any process to see other UIDs uid-based stats, but still hide the detailed stats. Change-Id: I7666961ed244ac1d9359c339b048799e5db9facc Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: xt_IDLETIMER: Rename INTERFACE to LABEL in netlink notification.	Ashish Sharma
	Change-Id: Iaeca5dd2d7878c0733923ae03309a2a7b86979ca Signed-off-by: Ashish Sharma <ashishsharma@google.com>
2014-03-19	netfilter: xt_qtaguid: start tracking iface rx/tx at low level	JP Abgrall
	qtaguid tracks the device stats by monitoring when it goes up and down, then it gets the dev_stats(). But devs don't correctly report stats (either they don't count headers symmetrically between rx/tx, or they count internal control messages). Now qtaguid counts the rx/tx bytes/packets during raw:prerouting and mangle:postrouting (nat is not available in ipv6). The results are in /proc/net/xt_qtaguid/iface_stat_fmt which outputs a format line (bash expansion): ifname total_skb_{rx,tx}_{bytes,packets} Added event counters for pre/post handling. Added extra ctrl_*() pid/uid debugging. Change-Id: Id84345d544ad1dd5f63e3842cab229e71d339297 Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: xt_IDLETIMER: Add new netlink msg type	JP Abgrall
	Send notifications when the label becomes active after an idle period. Send netlink message notifications in addition to sysfs notifications. Using a uevent with subsystem=xt_idletimer INTERFACE=... STATE={active,inactive} This is backport from common android-3.0 commit: beb914e987cbbd368988d2b94a6661cb907c4d5a with uevent support instead of a new netlink message type. Change-Id: I31677ef00c94b5f82c8457e5bf9e5e584c23c523 Signed-off-by: Ashish Sharma <ashishsharma@google.com> Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: xt_qtaguid: fix ipv6 protocol lookup	JP Abgrall
	When updating the stats for a given uid it would incorrectly assume IPV4 and pick up the wrong protocol when IPV6. Change-Id: Iea4a635012b4123bf7aa93809011b7b2040bb3d5 Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: qtaguid: initialize a local var to keep compiler happy.	JP Abgrall
	There was a case that might have seemed like new_tag_stat was not initialized and actually used. Added comment explaining why it was impossible, and a BUG() in case the logic gets changed. Change-Id: I1eddd1b6f754c08a3bf89f7e9427e5dce1dfb081 Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	bridge: Have tx_bytes count headers like rx_bytes.	Ashish Sharma
	Since rx_bytes accounting does not include Ethernet Headers in br_input.c, excluding ETH_HLEN on the transmit path for consistent measurement of packet length on both the Tx and Rx chains. The clean way would be for Rx to include the eth header, but the skb len has already been adjusted by the time the br code sees the skb. This is only a temporary workaround until we can completely ignore or cleanly fix the skb->len handling. Change-Id: I910de95a4686b2119da7f1f326e2154ef31f9972 Signed-off-by: Ashish Sharma <ashishsharma@google.com>
2014-03-19	netfilter: ipv6: fix crash caused by ipv6_find_hdr()	JP Abgrall
	When calling: ipv6_find_hdr(skb, &thoff, -1, NULL) on a fragmented packet, thoff would be left with a random value causing callers to read random memory offsets with: skb_header_pointer(skb, thoff, ...) Now we force ipv6_find_hdr() to return a failure in this case. Calling: ipv6_find_hdr(skb, &thoff, -1, &fragoff) will set fragoff as expected, and not return a failure. Change-Id: Ib474e8a4267dd2b300feca325811330329684a88 Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: fixup the quota2, and enable.	JP Abgrall
	The xt_quota2 came from http://sourceforge.net/projects/xtables-addons/develop It needed tweaking for it to compile within the kernel tree. Fixed kmalloc() and create_proc_entry() invocations within a non-interruptible context. Removed useless copying of current quota back to the iptable's struct matchinfo: - those are per CPU: they will change randomly based on which cpu gets to update the value. - they prevent matching a rule: e.g. -A chain -m quota2 --name q1 --quota 123 can't be followed by -D chain -m quota2 --name q1 --quota 123 as the 123 will be compared to the struct matchinfo's quota member. Use the NETLINK NETLINK_NFLOG family to log a single message when the quota limit is reached. It uses the same packet type as ipt_ULOG, but - never copies skb data, - uses 112 as the event number (ULOG's +1) It doesn't log if the module param "event_num" is 0. Change-Id: I021d3b743db3b22158cc49acb5c94d905b501492 Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: adding the original quota2 from xtables-addons	JP Abgrall
	The original xt_quota in the kernel is plain broken: - counts quota at a per CPU level (was written back when ubiquitous SMP was just a dream) - provides no way to count across IPV4/IPV6. This patch is the original unaltered code from: http://sourceforge.net/projects/xtables-addons at commit e84391ce665cef046967f796dd91026851d6bbf3 Change-Id: I19d49858840effee9ecf6cff03c23b45a97efdeb Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	netfilter: add xt_qtaguid matching module	JP Abgrall
	This module allows tracking stats at the socket level for given UIDs. It replaces xt_owner. If the --uid-owner is not specified, it will just count stats based on who the skb belongs to. This will even happen on incoming skbs as it looks into the skb via xt_socket magic to see who owns it. If an skb is lost, it will be assigned to uid=0. To control what sockets of what UIDs are tagged by what, one uses: echo t $sock_fd $accounting_tag $the_billed_uid \ > /proc/net/xt_qtaguid/ctrl So whenever an skb belongs to a sock_fd, it will be accounted against $the_billed_uid and matching stats will show up under the uid with the given $accounting_tag. Because the number of allocations for the stats structs is not that big: ~500 apps * 32 per app we'll just do it atomic. This avoids walking lists many times, and the fancy worker thread handling. Slabs will grow when needed later. It use netdevice and inetaddr notifications instead of hooks in the core dev code to track when a device comes and goes. This removes the need for exposed iface_stat.h. Put procfs dirs in /proc/net/xt_qtaguid/ ctrl stats iface_stat/<iface>/... The uid stats are obtainable in ./stats. Change-Id: I01af4fd91c8de651668d3decb76d9bdc1e343919 Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-19	net: wireless: change the expire time about each entry of scan results	jun.ho.lee
	Change-Id: I6e8d838d91bebc28f4cd09dcb8b9f1de775be13d Signed-off-by: jun.ho.lee <jun.ho.lee@samsung.com> Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2014-03-19	net: Fix CONFIG_RPS option to be turned off	Dmitry Shmidt
	Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2014-03-19	net: activity_stats: Add statistics for network transmission activity	Mike Chan
	When enabled, tracks the frequency of network transmissions (inbound and outbound) and buckets them accordingly. Buckets are determined by time between network activity. Each bucket represents the number of network transmisions that were N sec or longer apart. Where N is defined as 1 << bucket index. This network pattern tracking is particularly useful for wireless networks (ie: 3G) where batching network activity closely together is more power efficient than far apart. New file: /proc/net/stat/activity output: Min Bucket(sec) Count 1 7 2 0 4 1 8 0 16 0 32 2 64 1 128 0 Change-Id: I4c4cd8627b872a55f326b1715c51bc3bdd6e8d92 Signed-off-by: Mike Chan <mike@android.com>
2014-03-19	rfkill: Introduce CONFIG_RFKILL_PM and use instead of CONFIG_PM to power down	Nick Pelly
	Some platforms do not want to power down rfkill devices on suspend. Change-Id: I62a11630521c636d54a4a02ab9037a43435925f5 Signed-off-by: Nick Pelly <npelly@google.com>
2014-03-19	net: Replace AID_NET_RAW checks with capable(CAP_NET_RAW).	Chia-chi Yeh
	Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2014-03-19	misc: uidstat: Adding uid stat driver to collect network statistics.	Mike Chan
	Signed-off-by: Mike Chan <mike@android.com>
2014-03-19	sysfs_net_ipv4: Add sysfs-based knobs for controlling TCP window size	Robert Love
	Add a family of knobs to /sys/kernel/ipv4 for controlling the TCP window size: tcp_wmem_min tcp_wmem_def tcp_wmem_max tcp_rmem_min tcp_rmem_def tcp_rmem_max This six values mirror the sysctl knobs in /proc/sys/net/ipv4/tcp_wmem and /proc/sys/net/ipv4/tcp_rmem. Sysfs, unlike sysctl, allows us to set and manage the files' permissions and owners. Signed-off-by: Robert Love <rlove@google.com>
2014-03-19	net: socket ioctl to reset connections matching local address	Robert Love
	Introduce a new socket ioctl, SIOCKILLADDR, that nukes all sockets bound to the same local address. This is useful in situations with dynamic IPs, to kill stuck connections. Signed-off-by: Brian Swetland <swetland@google.com> net: fix tcp_v4_nuke_addr Signed-off-by: Dima Zavin <dima@android.com> net: ipv4: Fix a spinlock recursion bug in tcp_v4_nuke. We can't hold the lock while calling to tcp_done(), so we drop it before calling. We then have to start at the top of the chain again. Signed-off-by: Dima Zavin <dima@android.com> net: ipv4: Fix race in tcp_v4_nuke_addr(). To fix a recursive deadlock in 2.6.29, we stopped holding the hash table lock across tcp_done() calls. This fixed the deadlock, but introduced a race where the socket could die or change state. Fix: Before unlocking the hash table, we grab a reference to the socket. We can then unlock the hash table without risk of the socket going away. We then lock the socket, which is safe because it is pinned. We can then call tcp_done() without recursive deadlock and without race. Upon return, we unlock the socket and then unpin it, killing it. Change-Id: Idcdae072b48238b01bdbc8823b60310f1976e045 Signed-off-by: Robert Love <rlove@google.com> Acked-by: Dima Zavin <dima@android.com> ipv4: disable bottom halves around call to tcp_done(). Signed-off-by: Robert Love <rlove@google.com> Signed-off-by: Colin Cross <ccross@android.com> ipv4: Move sk_error_report inside bh_lock_sock in tcp_v4_nuke_addr When sk_error_report is called, it wakes up the user-space thread, which then calls tcp_close. When the tcp_close is interrupted by the tcp_v4_nuke_addr ioctl thread running tcp_done, it leaks 392 bytes and triggers a WARN_ON. This patch moves the call to sk_error_report inside the bh_lock_sock, which matches the locking used in tcp_v4_err. Signed-off-by: Colin Cross <ccross@android.com>
2014-03-19	Paranoid network.	Robert Love
	With CONFIG_ANDROID_PARANOID_NETWORK, require specific uids/gids to instantiate network sockets. Signed-off-by: Robert Love <rlove@google.com> paranoid networking: Use in_egroup_p() to check group membership The previous group_search() caused trouble for partners with module builds. in_egroup_p() is also cleaner. Signed-off-by: Nick Pelly <npelly@google.com> Fix 2.6.29 build. Signed-off-by: Arve Hjønnevåg <arve@android.com> net: Fix compilation of the IPv6 module Fix compilation of the IPv6 module -- current->euid does not exist anymore, current_euid() is what needs to be used. Signed-off-by: Steinar H. Gunderson <sesse@google.com> net: bluetooth: Remove the AID_NET_BT* gid numbers Removed bluetooth checks for AID_NET_BT and AID_NET_BT_ADMIN which are not useful anymore. This is in preparation for getting rid of all the AID_* gids. Signed-off-by: JP Abgrall <jpa@google.com>
2014-03-18	ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly	lucien
	In ip6_append_data_mtu(), when the xfrm mode is not tunnel(such as transport),the ipsec header need to be added in the first fragment, so the mtu will decrease to reserve space for it, then the second fragment come, the mtu should be turn back, as the commit 0c1833797a5a6ec23ea9261d979aa18078720b74 said. however, in the commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be, it use mtu = min(mtu, ...) to change the mtu, which lead to the new mtu is alway equal with the first fragment's. and cannot turn back. when I test through ping6 -c1 -s5000 $ip (mtu=1280): ...frag (0\|1232) ESP(spi=0x00002000,seq=0xb), length 1232 ...frag (1232\|1216) ...frag (2448\|1216) ...frag (3664\|1216) ...frag (4880\|164) which should be: ...frag (0\|1232) ESP(spi=0x00001000,seq=0x1), length 1232 ...frag (1232\|1232) ...frag (2464\|1232) ...frag (3696\|1232) ...frag (4928\|116) so delete the min() when change back the mtu. Signed-off-by: Xin Long <lucien.xin@gmail.com> Fixes: 75a493e60ac4bb ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18	Merge branch 'master' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) Fix a sleep in atomic when pfkey_sadb2xfrm_user_sec_ctx() is called from pfkey_compile_policy(). Fix from Nikolay Aleksandrov. 2) security_xfrm_policy_alloc() can be called in process and atomic context. Add an argument to let the callers choose the appropriate way. Fix from Nikolay Aleksandrov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	Pull networking fixes from David Miller: "I know this is a bit more than you want to see, and I've told the wireless folks under no uncertain terms that they must severely scale back the extent of the fixes they are submitting this late in the game. Anyways: 1) vmxnet3's netpoll doesn't perform the equivalent of an ISR, which is the correct implementation, like it should. Instead it does something like a NAPI poll operation. This leads to crashes. From Neil Horman and Arnd Bergmann. 2) Segmentation of SKBs requires proper socket orphaning of the fragments, otherwise we might access stale state released by the release callbacks. This is a 5 patch fix, but the initial patches are giving variables and such significantly clearer names such that the actual fix itself at the end looks trivial. From Michael S. Tsirkin. 3) TCP control block release can deadlock if invoked from a timer on an already "owned" socket. Fix from Eric Dumazet. 4) In the bridge multicast code, we must validate that the destination address of general queries is the link local all-nodes multicast address. From Linus Lüssing. 5) The x86 BPF JIT support for negative offsets puts the parameter for the helper function call in the wrong register. Fix from Alexei Starovoitov. 6) The descriptor type used for RTL_GIGA_MAC_VER_17 chips in the r8169 driver is incorrect. Fix from Hayes Wang. 7) The xen-netback driver tests skb_shinfo(skb)->gso_type bits to see if a packet is a GSO frame, but that's not the correct test. It should use skb_is_gso(skb) instead. Fix from Wei Liu. 8) Negative msg->msg_namelen values should generate an error, from Matthew Leach. 9) at86rf230 can deadlock because it takes the same lock from it's ISR and it's hard_start_xmit method, without disabling interrupts in the latter. Fix from Alexander Aring. 10) The FEC driver's restart doesn't perform operations in the correct order, so promiscuous settings can get lost. Fix from Stefan Wahren. 11) Fix SKB leak in SCTP cookie handling, from Daniel Borkmann. 12) Reference count and memory leak fixes in TIPC from Ying Xue and Erik Hugne. 13) Forced eviction in inet_frag_evictor() must strictly make sure all frags are deleted, otherwise module unload (f.e. 6lowpan) can crash. Fix from Florian Westphal. 14) Remove assumptions in AF_UNIX's use of csum_partial() (which it uses as a hash function), which breaks on PowerPC. From Anton Blanchard. The main gist of the issue is that csum_partial() is defined only as a value that, once folded (f.e. via csum_fold()) produces a correct 16-bit checksum. It is legitimate, therefore, for csum_partial() to produce two different 32-bit values over the same data if their respective alignments are different. 15) Fix endiannes bug in MAC address handling of ibmveth driver, also from Anton Blanchard. 16) Error checks for ipv6 exthdrs offload registration are reversed, from Anton Nayshtut. 17) Externally triggered ipv6 addrconf routes should count against the garbage collection threshold. Fix from Sabrina Dubroca. 18) The PCI shutdown handler added to the bnx2 driver can wedge the chip if it was not brought up earlier already, which in particular causes the firmware to shut down the PHY. Fix from Michael Chan. 19) Adjust the sanity WARN_ON_ONCE() in qdisc_list_add() because as currently coded it can and does trigger in legitimate situations. From Eric Dumazet. 20) BNA driver fails to build on ARM because of a too large udelay() call, fix from Ben Hutchings. 21) Fair-Queue qdisc holds locks during GFP_KERNEL allocations, fix from Eric Dumazet. 22) The vlan passthrough ops added in the previous release causes a regression in source MAC address setting of outgoing headers in some circumstances. Fix from Peter Boström" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits) ipv6: Avoid unnecessary temporary addresses being generated eth: fec: Fix lost promiscuous mode after reconnecting cable bonding: set correct vlan id for alb xmit path at86rf230: fix lockdep splats net/mlx4_en: Deregister multicast vxlan steering rules when going down vmxnet3: fix building without CONFIG_PCI_MSI MAINTAINERS: add networking selftests to NETWORKING net: socket: error on a negative msg_namelen MAINTAINERS: Add tools/net to NETWORKING [GENERAL] packet: doc: Spelling s/than/that/ net/mlx4_core: Load the IB driver when the device supports IBoE net/mlx4_en: Handle vxlan steering rules for mac address changes net/mlx4_core: Fix wrong dump of the vxlan offloads device capability xen-netback: use skb_is_gso in xenvif_start_xmit r8169: fix the incorrect tx descriptor version tools/net/Makefile: Define PACKAGE to fix build problems x86: bpf_jit: support negative offsets bridge: multicast: enable snooping on general queries only bridge: multicast: add sanity check for general query destination tcp: tcp_release_cb() should release socket ownership ...
2014-03-13	ipv6: Avoid unnecessary temporary addresses being generated	Heiner Kallweit
	tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore age needs to be added to the condition. Age calculation in ipv6_create_tempaddr is different from the one in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS. This can cause age in ipv6_create_tempaddr to be less than the one in addrconf_verify and therefore unnecessary temporary address to be generated. Use age calculation as in addrconf_modify to avoid this. Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12	net: socket: error on a negative msg_namelen	Matthew Leach
	When copying in a struct msghdr from the user, if the user has set the msg_namelen parameter to a negative value it gets clamped to a valid size due to a comparison between signed and unsigned values. Ensure the syscall errors when the user passes in a negative value. Signed-off-by: Matthew Leach <matthew.leach@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11	bridge: multicast: enable snooping on general queries only	Linus Lüssing
	Without this check someone could easily create a denial of service by injecting multicast-specific queries to enable the bridge snooping part if no real querier issuing periodic general queries is present on the link which would result in the bridge wrongly shutting down ports for multicast traffic as the bridge did not learn about these listeners. With this patch the snooping code is enabled upon receiving valid, general queries only. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11	bridge: multicast: add sanity check for general query destination	Linus Lüssing
	General IGMP and MLD queries are supposed to have the multicast link-local all-nodes address as their destination according to RFC2236 section 9, RFC3376 section 4.1.12/9.1, RFC2710 section 8 and RFC3810 section 5.1.15. Without this check, such malformed IGMP/MLD queries can result in a denial of service: The queries are ignored by most IGMP/MLD listeners therefore they will not respond with an IGMP/MLD report. However, without this patch these malformed MLD queries would enable the snooping part in the bridge code, potentially shutting down the according ports towards these hosts for multicast traffic as the bridge did not learn about these listeners. Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11	tcp: tcp_release_cb() should release socket ownership	Eric Dumazet
	Lars Persson reported following deadlock : -000 \|M:0x0:0x802B6AF8(asm) <-- arch_spin_lock -001 \|tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0 -002 \|ip_local_deliver_finish(skb = 0x8BD527A0) -003 \|__netif_receive_skb_core(skb = 0x8BD527A0, ?) -004 \|netif_receive_skb(skb = 0x8BD527A0) -005 \|elk_poll(napi = 0x8C770500, budget = 64) -006 \|net_rx_action(?) -007 \|__do_softirq() -008 \|do_softirq() -009 \|local_bh_enable() -010 \|tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?) -011 \|tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0) -012 \|tcp_delack_timer_handler(sk = 0x8BE6B2A0) -013 \|tcp_release_cb(sk = 0x8BE6B2A0) -014 \|release_sock(sk = 0x8BE6B2A0) -015 \|tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?) -016 \|sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096) -017 \|kernel_sendmsg(?, ?, ?, ?, size = 4096) -018 \|smb_send_kvec() -019 \|smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0) -020 \|cifs_call_async() -021 \|cifs_async_writev(wdata = 0x87FD6580) -022 \|cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88) -023 \|__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88) -024 \|writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88) -025 \|__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88) -026 \|wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88) -027 \|wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0) -028 \|bdi_writeback_workfn(work = 0x87E4A9CC) -029 \|process_one_work(worker = 0x8B045880, work = 0x87E4A9CC) -030 \|worker_thread(__worker = 0x8B045880) -031 \|kthread(_create = 0x87CADD90) -032 \|ret_from_kernel_thread(asm) Bug occurs because __tcp_checksum_complete_user() enables BH, assuming it is running from softirq context. Lars trace involved a NIC without RX checksum support but other points are problematic as well, like the prequeue stuff. Problem is triggered by a timer, that found socket being owned by user. tcp_release_cb() should call tcp_write_timer_handler() or tcp_delack_timer_handler() in the appropriate context : BH disabled and socket lock held, but 'owned' field cleared, as if they were running from timer handlers. Fixes: 6f458dfb4092 ("tcp: improve latencies of timer triggered events") Reported-by: Lars Persson <lars.persson@axis.com> Tested-by: Lars Persson <lars.persson@axis.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11	skbuff: skb_segment: orphan frags before copying	Michael S. Tsirkin
	skb_segment copies frags around, so we need to copy them carefully to avoid accessing user memory after reporting completion to userspace through a callback. skb_segment doesn't normally happen on datapath: TSO needs to be disabled - so disabling zero copy in this case does not look like a big deal. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>