aboutsummaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2013-01-27ip6mr: limit IPv6 MRT_TABLE identifiersDan Carpenter
We did this for IPv4 in b49d3c1e1c "net: ipmr: limit MRT_TABLE identifiers" but we need to do it for IPv6 as well. On IPv6 the name is "pim6reg" instead of "pimreg" so there is one less digit allowed. The strcpy() is in ip6mr_reg_vif(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) The transport header did not point to the right place after esp/ah processing on tunnel mode in the receive path. As a result, the ECN field of the inner header was not set correctly, fixes from Li RongQing. 2) We did a null check too late in one of the xfrm_replay advance functions. This can lead to a division by zero, fix from Nickolai Zeldovich. 3) The size calculation of the hash table missed the muiltplication with the actual struct size when the hash table is freed. We might call the wrong free function, fix from Michal Kubecek. 4) On IPsec pmtu events we can't access the transport headers of the original packet, so force a relookup for all routes to notify about the pmtu event. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18ipv6: Add an error handler for icmp6Steffen Klassert
pmtu and redirect events are now handled in the protocols error handler, so add an error handler for icmp6 to do this. It is needed in the case when we have no socket context. Based on a patch by Duan Jiong. Reported-by: Duan Jiong <djduanjiong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17ipv6: fix header length calculation in ip6_append_data()Romain KUNTZ
Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem) has introduced a error in the header length calculation that provokes corrupted packets when non-fragmentable extensions headers (Destination Option or Routing Header Type 2) are used. rt->rt6i_nfheader_len is the length of the non-fragmentable extension header, and it should be substracted to rt->dst.header_len, and not to exthdrlen, as it was done before commit 299b0767. This patch reverts to the original and correct behavior. It has been successfully tested with and without IPsec on packets that include non-fragmentable extensions headers. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10ipv6: use addrconf_get_prefix_route for prefix route lookup [v2]Romain Kuntz
Replace ip6_route_lookup() with addrconf_get_prefix_route() when looking up for a prefix route. This ensures that the connected prefix is looked up in the main table, and avoids the selection of other matching routes located in different tables as well as blackhole or prohibited entries. In addition, this fixes an Opps introduced by commit 64c6d08e (ipv6: del unreachable route when an addr is deleted on lo), that would occur when a blackhole or prohibited entry is selected by ip6_route_lookup(). Such entries have a NULL rt6i_table argument, which is accessed by __ip6_del_rt() when trying to lock rt6i_table->tb6_lock. The function addrconf_is_prefix_route() is not used anymore and is removed. [v2] Minor indentation cleanup and log updates. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10ipv6: fix the noflags test in addrconf_get_prefix_routeRomain Kuntz
The tests on the flags in addrconf_get_prefix_route() does no make much sense: the 'noflags' parameter contains the set of flags that must not match with the route flags, so the test must be done against 'noflags', and not against 'flags'. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08ah6/esp6: set transport header correctly for IPsec tunnel mode.Li RongQing
IPsec tunnel does not set ECN field to CE in inner header when the ECN field in the outer header is CE, and the ECN field in the inner header is ECT(0) or ECT(1). The cause is ipip6_hdr() does not return the correct address of inner header since skb->transport-header is not the inner header after esp6_input_done2(), or ah6_input(). Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-01-04netfilter: ip6t_NPT: fix IPv6 NTP checksum calculationUlrich Weber
csum16_add() has a broken carry detection, should be: sum += sum < (__force u16)b; Instead of fixing csum16_add, remove the custom checksum functions and use the generic csum_add/csum_sub ones. Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-28Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following batch contains Netfilter fixes for 3.8-rc1. They are a mixture of old bugs that have passed unnoticed (I'll pass these to stable) and more fresh ones from the previous merge window, they are: * Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd showing up wrong address, from Bob Hockney. * Fix a comment in nf_conntrack_ipv6, from Florent Fourcot. * Fix a leak an error path in ctnetlink while creating an expectation, from Jesper Juhl. * Fix missing ICMP time exceeded in the IPv6 defragmentation code, from Haibo Xi. * Fix inconsistent handling of routing changes in MASQUERADE for the new connections case, from Andrew Collins. * Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to crashes in the ixgbe driver (since it seems to access the transport header with TSO enabled), from Mukund Jampala. * Recover obsoleted NOTRACK target by including it into the CT and spot a warning via printk about being obsoleted. Many people don't check the scheduled to be removal file under Documentation, so we follow some less agressive approach to kill this in a year or so. Spotted by Florian Westphal, patch from myself. * Fix race condition in xt_hashlimit that allows to create two or more entries, from myself. * Fix crash if the CT is used due to the recently added facilities to consult the dying and unconfirmed conntrack lists, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26ipv6/ip6_gre: set transport header correctlyIsaku Yamahata
ip6gre_xmit2() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. Set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-19ipv6: addrconf.c: remove unnecessary "if"Cong Ding
the value of err is always negative if it goes to errout, so we don't need to check the value of err. Signed-off-by: Cong Ding <dinggnu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-16netfilter: nf_ct_reasm: fix conntrack reassembly expire codeHaibo Xi
Commit b836c99fd6c9 (ipv6: unify conntrack reassembly expire code with standard one) use the standard IPv6 reassembly code(ip6_expire_frag_queue) to handle conntrack reassembly expire. In ip6_expire_frag_queue, it invoke dev_get_by_index_rcu to get which device received this expired packet.so we must save ifindex when NF_conntrack get this packet. With this patch applied, I can see ICMP Time Exceeded sent from the receiver when the sender sent out 1/2 fragmented IPv6 packet. Signed-off-by: Haibo Xi <haibbo@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16netfilter: nf_conntrack_ipv6: fix comment for packets without dataFlorent Fourcot
Remove ambiguity of double negation. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADEAndrew Collins
Since (a0ecb85 netfilter: nf_nat: Handle routing changes in MASQUERADE target), the MASQUERADE target handles routing changes which affect the output interface of a connection, but only for ESTABLISHED connections. It is also possible for NEW connections which already have a conntrack entry to be affected by routing changes. This adds a check to drop entries in the NEW+conntrack state when the oif has changed. Signed-off-by: Andrew Collins <bsderandrew@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16netfilter: ip[6]t_REJECT: fix wrong transport header pointer in TCP resetMukund Jampala
The problem occurs when iptables constructs the tcp reset packet. It doesn't initialize the pointer to the tcp header within the skb. When the skb is passed to the ixgbe driver for transmit, the ixgbe driver attempts to access the tcp header and crashes. Currently, other drivers (such as our 1G e1000e or igb drivers) don't access the tcp header on transmit unless the TSO option is turned on. <1>BUG: unable to handle kernel NULL pointer dereference at 0000000d <1>IP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] <4>*pdpt = 0000000085e5d001 *pde = 0000000000000000 <0>Oops: 0000 [#1] SMP [...] <4>Pid: 0, comm: swapper Tainted: P 2.6.35.12 #1 Greencity/Thurley <4>EIP: 0060:[<d081621c>] EFLAGS: 00010246 CPU: 16 <4>EIP is at ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] <4>EAX: c7628820 EBX: 00000007 ECX: 00000000 EDX: 00000000 <4>ESI: 00000008 EDI: c6882180 EBP: dfc6b000 ESP: ced95c48 <4> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 <0>Process swapper (pid: 0, ti=ced94000 task=ced73bd0 task.ti=ced94000) <0>Stack: <4> cbec7418 c779e0d8 c77cc888 c77cc8a8 0903010a 00000000 c77c0008 00000002 <4><0> cd4997c0 00000010 dfc6b000 00000000 d0d176c9 c77cc8d8 c6882180 cbec7318 <4><0> 00000004 00000004 cbec7230 cbec7110 00000000 cbec70c0 c779e000 00000002 <0>Call Trace: <4> [<d0d176c9>] ? 0xd0d176c9 <4> [<d0d18a4d>] ? 0xd0d18a4d <4> [<411e243e>] ? dev_hard_start_xmit+0x218/0x2d7 <4> [<411f03d7>] ? sch_direct_xmit+0x4b/0x114 <4> [<411f056a>] ? __qdisc_run+0xca/0xe0 <4> [<411e28b0>] ? dev_queue_xmit+0x2d1/0x3d0 <4> [<411e8120>] ? neigh_resolve_output+0x1c5/0x20f <4> [<411e94a1>] ? neigh_update+0x29c/0x330 <4> [<4121cf29>] ? arp_process+0x49c/0x4cd <4> [<411f80c9>] ? nf_hook_slow+0x3f/0xac <4> [<4121ca8d>] ? arp_process+0x0/0x4cd <4> [<4121ca8d>] ? arp_process+0x0/0x4cd <4> [<4121c6d5>] ? T.901+0x38/0x3b <4> [<4121c918>] ? arp_rcv+0xa3/0xb4 <4> [<4121ca8d>] ? arp_process+0x0/0x4cd <4> [<411e1173>] ? __netif_receive_skb+0x32b/0x346 <4> [<411e19e1>] ? netif_receive_skb+0x5a/0x5f <4> [<411e1ea9>] ? napi_skb_finish+0x1b/0x30 <4> [<d0816eb4>] ? ixgbe_xmit_frame_ring+0x1564/0x2260 [ixgbe] <4> [<41013468>] ? lapic_next_event+0x13/0x16 <4> [<410429b2>] ? clockevents_program_event+0xd2/0xe4 <4> [<411e1b03>] ? net_rx_action+0x55/0x127 <4> [<4102da1a>] ? __do_softirq+0x77/0xeb <4> [<4102dab1>] ? do_softirq+0x23/0x27 <4> [<41003a67>] ? do_IRQ+0x7d/0x8e <4> [<41002a69>] ? common_interrupt+0x29/0x30 <4> [<41007bcf>] ? mwait_idle+0x48/0x4d <4> [<4100193b>] ? cpu_idle+0x37/0x4c <0>Code: df 09 d7 0f 94 c2 0f b6 d2 e9 e7 fb ff ff 31 db 31 c0 e9 38 ff ff ff 80 78 06 06 0f 85 3e fb ff ff 8b 7c 24 38 8b 8f b8 00 00 00 <0f> b6 51 0d f6 c2 01 0f 85 27 fb ff ff 80 e2 02 75 0d 8b 6c 24 <0>EIP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] SS:ESP Signed-off-by: Mukund Jampala <jbmukund@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16ipv6: Fix Makefile offload objectsSimon Arlott
The following commit breaks IPv6 TCP transmission for me: Commit 75fe83c32248d99e6d5fe64155e519b78bb90481 Author: Vlad Yasevich <vyasevic@redhat.com> Date: Fri Nov 16 09:41:21 2012 +0000 ipv6: Preserve ipv6 functionality needed by NET This patch fixes the typo "ipv6_offload" which should be "ipv6-offload". I don't know why not including the offload modules should break TCP. Disabling all offload options on the NIC didn't help. Outgoing pulseaudio traffic kept stalling. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sockChristoph Paasch
If in either of the above functions inet_csk_route_child_sock() or __inet_inherit_port() fails, the newsk will not be freed: unreferenced object 0xffff88022e8a92c0 (size 1592): comm "softirq", pid 0, jiffies 4294946244 (age 726.160s) hex dump (first 32 bytes): 0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................ 02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5 [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481 [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416 [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701 [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4 [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233 [<ffffffff814cee68>] ip_rcv+0x217/0x267 [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553 [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82 This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus a single sock_put() is not enough to free the memory. Additionally, things like xfrm, memcg, cookie_values,... may have been initialized. We have to free them properly. This is fixed by forcing a call to tcp_done(), ending up in inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary, because it ends up doing all the cleanup on xfrm, memcg, cookie_values, xfrm,... Before calling tcp_done, we have to set the socket to SOCK_DEAD, to force it entering inet_csk_destroy_sock. To avoid the warning in inet_csk_destroy_sock, inet_num has to be set to 0. As inet_csk_destroy_sock does a dec on orphan_count, we first have to increase it. Calling tcp_done() allows us to remove the calls to tcp_clear_xmit_timer() and tcp_cleanup_congestion_control(). A similar approach is taken for dccp by calling dccp_done(). This is in the kernel since 093d282321 (tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()), thus since version >= 2.6.37. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14ipv6: Change skb->data before using icmpv6_notify() to propagate redirectDuan Jiong
In function ndisc_redirect_rcv(), the skb->data points to the transport header, but function icmpv6_notify() need the skb->data points to the inner IP packet. So before using icmpv6_notify() to propagate redirect, change skb->data to point the inner IP packet that triggered the sending of the Redirect, and introduce struct rd_msg to make it easy. Signed-off-by: Duan Jiong <djduanjiong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13ndisc: Fix padding error in link-layer address option.YOSHIFUJI Hideaki / 吉藤英明
If a natural number n exists where 2 + data_len <= 8n < 2 + data_len + pad, post padding is not initialized correctly. (Un)fortunately, the only type that requires pad is Infiniband, whose pad is 2 and data_len is 20, and this logical error has not become obvious, but it is better to fix. Note that ndisc_opt_addr_space() handles the situation described above correctly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12ndisc: Unexport ndisc_{build,send}_skb().YOSHIFUJI Hideaki
These symbols were exported for bonding device by commit 305d552a ("bonding: send IPv6 neighbor advertisement on failover"). It bacame obsolete by commit 7c899432 ("bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS") and removed by commit 4f5762ec ("bonding: Remove obsolete source file 'bond_ipv6.c'"). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05ipv6: avoid taking locks at socket dismantleEric Dumazet
ipv6_sock_mc_close() is called for ipv6 sockets at close time, and most of them don't use multicast. Add a test to avoid contention on a shared spinlock. Same heuristic applies for ipv6_sock_ac_close(), to avoid contention on a shared rwlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04ipv6: Protect ->mc_forwarding access with CONFIG_IPV6_MROUTEDavid S. Miller
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04ip6mr: fix rtm_family of rtnl msgNicolas Dichtel
We talk about IPv6, hence the family is RTNL_FAMILY_IP6MR! rtnl_register() is already called with RTNL_FAMILY_IP6MR. The bug is here since the beginning of this function (commit 5b285cac3570). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04ip6mr: advertise new mfc entries via rtnlNicolas Dichtel
This patch allows to monitor mf6c activities via rtnetlink. To avoid parsing two times the mf6c oifs, we use maxvif to allocate the rtnl msg, thus we may allocate some superfluous space. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04ipmr/ip6mr: allow to get unresolved cache via netlinkNicolas Dichtel
/proc/net/ip[6]_mr_cache allows to get all mfc entries, even if they are put in the unresolved list (mfc[6]_unres_queue). But only the table RT_TABLE_DEFAULT is displayed. This patch adds the parsing of the unresolved list when the dump is made via rtnetlink, hence each table can be checked. In IPv6, we set rtm_type in ip6mr_fill_mroute(), because in case of unresolved mfc __ip6mr_fill_mroute() will not set it. In IPv4, it is already done. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04ipmr/ip6mr: report origin of mfc entry into rtnl msgNicolas Dichtel
A mfc entry can be static or not (added via the mroute_sk socket). The patch reports MFC_STATIC flag into rtm_protocol by setting rtm_protocol to RTPROT_STATIC or RTPROT_MROUTED. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04ipmr/ip6mr: advertise mfc stats via rtnetlinkNicolas Dichtel
These statistics can be checked only via /proc/net/ip_mr_cache or SIOCGETSGCNT[_IN6] and thus only for the table RT_TABLE_DEFAULT. Advertising them via rtnetlink allows to get statistics for all cache entries, whatever the table is. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04ip6mr: use nla_nest_* helpersNicolas Dichtel
This patch removes the skb manipulations when nested attributes are added by using standard helpers. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04netconf: advertise mc_forwarding statusNicolas Dichtel
This patch advertise the MC_FORWARDING status for IPv4 and IPv6. This field is readonly, only multicast engine in the kernel updates it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== * Remove limitation in the maximum number of supported sets in ipset. Now ipset automagically increments the number of slots in the array of sets by 64 new spare slots, from Jozsef Kadlecsik. * Partially remove the generic queue infrastructure now that ip_queue is gone. Its only client is nfnetlink_queue now, from Florian Westphal. * Add missing attribute policy checkings in ctnetlink, from Florian Westphal. * Automagically kill conntrack entries that use the wrong output interface for the masquerading case in case of routing changes, from Jozsef Kadlecsik. * Two patches two improve ct object traceability. Now ct objects are always placed in any of the existing lists. This allows us to dump the content of unconfirmed and dying conntracks via ctnetlink as a way to provide more instrumentation in case you suspect leaks, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03ipv6: Fix default route failover when CONFIG_IPV6_ROUTER_PREF=nPaul Marks
I believe this commit from 2008 was incorrect: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=398bcbebb6f721ac308df1e3d658c0029bb74503 When CONFIG_IPV6_ROUTER_PREF is disabled, the kernel should follow RFC4861 section 6.3.6: if no route is NUD_VALID, then traffic should be sprayed across all routers (indirectly triggering NUD) until one of them becomes NUD_VALID. However, the following experiment demonstrates that this does not work: 1) Connect to an IPv6 network. 2) Change the router's MAC (and link-local) address. The kernel will lock onto the first router and never try the new one, even if the first becomes unreachable. This patch fixes the problem by allowing rt6_check_neigh() to return 0; if all routers return 0, then rt6_select() will fall back to round-robin behavior. This patch should have no effect when CONFIG_IPV6_ROUTER_PREF=y. Note that rt6_check_neigh() is only used in a boolean context, so I've changed its return type accordingly. Signed-off-by: Paul Marks <pmarks@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03ipv6: Make 'addrconf_rs_timer' send Router Solicitations (and re-arm itself) ↵Shmulik Ladkani
if Router Advertisements are accepted As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], Router Solicitations are sent whenever kernel accepts Router Advertisements on the interface. However, this logic isn't reflected in 'addrconf_rs_timer'. The timer fails to issue subsequent RS messages (and fails to re-arm itself) if forwarding is enabled and the special hybrid mode is enabled (accept_ra=2). Fix the condition determining whether next RS should be sent, by using 'ipv6_accept_ra()'. Reported-by: Ami Koren <amikoren@yahoo.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03netfilter: nf_nat: Handle routing changes in MASQUERADE targetJozsef Kadlecsik
When the route changes (backup default route, VPNs) which affect a masqueraded target, the packets were sent out with the outdated source address. The patch addresses the issue by comparing the outgoing interface directly with the masqueraded interface in the nat table. Events are inefficient in this case, because it'd require adding route events to the network core and then scanning the whole conntrack table and re-checking the route for all entry. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-01ipv6: unify logic evaluating inet6_dev's accept_ra propertyShmulik Ladkani
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the logic determining whether to send Router Solicitations is identical to the logic determining whether kernel accepts Router Advertisements. However the condition itself is repeated in several code locations. Unify it by introducing 'ipv6_accept_ra()' accessor. Also, simplify the condition expression, making it more readable. No semantic change. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30net: move inet_dport/inet_num in sock_commonEric Dumazet
commit 68835aba4d9b (net: optimize INET input path further) moved some fields used for tcp/udp sockets lookup in the first cache line of struct sock_common. This patch moves inet_dport/inet_num as well, filling a 32bit hole on 64 bit arches and reducing number of cache line misses in lookups. Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match before addresses match, as this check is more discriminant. Remove the hash check from MATCH() macros because we dont need to re validate the hash value after taking a refcount on socket, and use likely/unlikely compiler hints, as the sk_hash/hash check makes the following conditional tests 100% predicted by cpu. Introduce skc_addrpair/skc_portpair pair values to better document the alignment requirements of the port/addr pairs used in the various MATCH() macros, and remove some casts. The namespace check can also be done at last. This slightly improves TCP/UDP lookup times. IP/TCP early demux needs inet->rx_dst_ifindex and TCP needs inet->min_ttl, lets group them together in same cache line. With help from Ben Hutchings & Joe Perches. Idea of this patch came after Ling Ma proposal to move skc_hash to the beginning of struct sock_common, and should allow him to submit a final version of his patch. My tests show an improvement doing so. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Joe Perches <joe@perches.com> Cc: Ling Ma <ling.ma.program@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Conflicts: net/ipv6/exthdrs_core.c Jesse Gross says: ==================== This series of improvements for 3.8/net-next contains four components: * Support for modifying IPv6 headers * Support for matching and setting skb->mark for better integration with things like iptables * Ability to recognize the EtherType for RARP packets * Two small performance enhancements The movement of ipv6_find_hdr() into exthdrs_core.c causes two small merge conflicts. I left it as is but can do the merge if you want. The conflicts are: * ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of exthdrs_core.c. Both should stay. * A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c after this patch. The IPVS user has two instances of the old constant name IP6T_FH_F_FRAG which has been renamed to IP6_FH_F_FRAG. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28ip6tnl/sit: drop packet if ECN present with not-ECTNicolas Dichtel
This patch reports the change made by Stephen Hemminger in ipip and gre[6] in commit eccc1bb8d4b4 (tunnel: drop packet if ECN present with not-ECT). Goal is to handle RFC6040, Section 4.2: Default Tunnel Egress Behaviour. o If the inner ECN field is Not-ECT, the decapsulator MUST NOT propagate any other ECN codepoint onwards. This is because the inner Not-ECT marking is set by transports that rely on dropped packets as an indication of congestion and would not understand or respond to any other ECN codepoint [RFC4774]. Specifically: * If the inner ECN field is Not-ECT and the outer ECN field is CE, the decapsulator MUST drop the packet. * If the inner ECN field is Not-ECT and the outer ECN field is Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the outgoing packet with the ECN field cleared to Not-ECT. The patch takes benefits from common function added in net/inet_ecn.h. Like it was done for Xin4 tunnels, it adds logging to allow detecting broken systems that set ECN bits incorrectly when tunneling (or an intermediate router might be changing the header). Errors are also tracked via rx_frame_error. CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26ip6mr: Add sizeof verification to MRT6_ASSERT and MT6_PIMJoe Perches
Verify the length of the user-space arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-25ipv4/ipmr and ipv6/ip6mr: Convert int mroute_do_<foo> to boolJoe Perches
Save a few bytes per table by convert mroute_do_assert and mroute_do_pim from int to bool. Remove !! as the compiler does that when assigning int to bool. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/iwlwifi/pcie/tx.c Minor iwlwifi conflict in TX queue disabling between 'net', which removed a bogus warning, and 'net-next' which added some status register poking code. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-22ipv6: adapt connect for repair moveAndrey Vagin
This is work the same as for ipv4. All other hacks about tcp repair are in common code for ipv4 and ipv6, so this patch is enough for repairing ipv6 connections. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== This pull request is intended for net-next and contains the following changes: 1) Remove a redundant check when initializing the xfrm replay functions, from Ulrich Weber. 2) Use a faster per-cpu helper when allocating ipcomt transforms, from Shan Wei. 3) Use a static gc threshold value for ipv6, simmilar to what we do for ipv4 now. 4) Remove a commented out function call. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20ipv6: fix inet6_csk_update_pmtu() return valueEric Dumazet
In case of error, inet6_csk_update_pmtu() should consistently return NULL. Bug added in commit 35ad9b9cf7d8a (ipv6: Add helper inet6_csk_update_pmtu().) Reported-by: Lluís Batlle i Rossell <viric@viric.name> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20sit: allow to configure 6rd tunnels via netlinkNicolas Dichtel
This patch add the support of 6RD tunnels management via netlink. Note that netdev_state_change() is now called when 6RD parameters are updated. 6RD parameters are updated only if there is at least one 6RD attribute. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Make CAP_NET_BIND_SERVICE per user namespaceEric W. Biederman
Allow privileged users in any user namespace to bind to privileged sockets in network namespaces they control. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Enable a userns root rtnl calls that are safe for unprivilged usersEric W. Biederman
- Only allow moving network devices to network namespaces you have CAP_NET_ADMIN privileges over. - Enable creating/deleting/modifying interfaces - Enable adding/deleting addresses - Enable adding/setting/deleting neighbour entries - Enable adding/removing routes - Enable adding/removing fib rules - Enable setting the forwarding state - Enable adding/removing ipv6 address labels - Enable setting bridge parameter Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Enable some sysctls that are safe for the userns rootEric W. Biederman
- Enable the per device ipv4 sysctls: net/ipv4/conf/<if>/forwarding net/ipv4/conf/<if>/mc_forwarding net/ipv4/conf/<if>/accept_redirects net/ipv4/conf/<if>/secure_redirects net/ipv4/conf/<if>/shared_media net/ipv4/conf/<if>/rp_filter net/ipv4/conf/<if>/send_redirects net/ipv4/conf/<if>/accept_source_route net/ipv4/conf/<if>/accept_local net/ipv4/conf/<if>/src_valid_mark net/ipv4/conf/<if>/proxy_arp net/ipv4/conf/<if>/medium_id net/ipv4/conf/<if>/bootp_relay net/ipv4/conf/<if>/log_martians net/ipv4/conf/<if>/tag net/ipv4/conf/<if>/arp_filter net/ipv4/conf/<if>/arp_announce net/ipv4/conf/<if>/arp_ignore net/ipv4/conf/<if>/arp_accept net/ipv4/conf/<if>/arp_notify net/ipv4/conf/<if>/proxy_arp_pvlan net/ipv4/conf/<if>/disable_xfrm net/ipv4/conf/<if>/disable_policy net/ipv4/conf/<if>/force_igmp_version net/ipv4/conf/<if>/promote_secondaries net/ipv4/conf/<if>/route_localnet - Enable the global ipv4 sysctl: net/ipv4/ip_forward - Enable the per device ipv6 sysctls: net/ipv6/conf/<if>/forwarding net/ipv6/conf/<if>/hop_limit net/ipv6/conf/<if>/mtu net/ipv6/conf/<if>/accept_ra net/ipv6/conf/<if>/accept_redirects net/ipv6/conf/<if>/autoconf net/ipv6/conf/<if>/dad_transmits net/ipv6/conf/<if>/router_solicitations net/ipv6/conf/<if>/router_solicitation_interval net/ipv6/conf/<if>/router_solicitation_delay net/ipv6/conf/<if>/force_mld_version net/ipv6/conf/<if>/use_tempaddr net/ipv6/conf/<if>/temp_valid_lft net/ipv6/conf/<if>/temp_prefered_lft net/ipv6/conf/<if>/regen_max_retry net/ipv6/conf/<if>/max_desync_factor net/ipv6/conf/<if>/max_addresses net/ipv6/conf/<if>/accept_ra_defrtr net/ipv6/conf/<if>/accept_ra_pinfo net/ipv6/conf/<if>/accept_ra_rtr_pref net/ipv6/conf/<if>/router_probe_interval net/ipv6/conf/<if>/accept_ra_rt_info_max_plen net/ipv6/conf/<if>/proxy_ndp net/ipv6/conf/<if>/accept_source_route net/ipv6/conf/<if>/optimistic_dad net/ipv6/conf/<if>/mc_forwarding net/ipv6/conf/<if>/disable_ipv6 net/ipv6/conf/<if>/accept_dad net/ipv6/conf/<if>/force_tllao - Enable the global ipv6 sysctls: net/ipv6/bindv6only net/ipv6/icmp/ratelimit Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Allow userns root to control ipv6Eric W. Biederman
Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow the SIOCSIFADDR ioctl to add ipv6 addresses. Allow the SIOCDIFADDR ioctl to delete ipv6 addresses. Allow the SIOCADDRT ioctl to add ipv6 routes. Allow the SIOCDELRT ioctl to delete ipv6 routes. Allow creation of ipv6 raw sockets. Allow setting the IPV6_JOIN_ANYCAST socket option. Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR socket option. Allow setting the IPV6_TRANSPARENT socket option. Allow setting the IPV6_HOPOPTS socket option. Allow setting the IPV6_RTHDRDSTOPTS socket option. Allow setting the IPV6_DSTOPTS socket option. Allow setting the IPV6_IPSEC_POLICY socket option. Allow setting the IPV6_XFRM_POLICY socket option. Allow sending packets with the IPV6_2292HOPOPTS control message. Allow sending packets with the IPV6_2292DSTOPTS control message. Allow sending packets with the IPV6_RTHDRDSTOPTS control message. Allow setting the multicast routing socket options on non multicast routing sockets. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for setting up, changing and deleting tunnels over ipv6. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for setting up, changing and deleting ipv6 over ipv4 tunnels. Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding, deleting, and changing the potential router list for ISATAP tunnels. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Push capable(CAP_NET_ADMIN) into the rtnl methodsEric W. Biederman
- In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged users to make netlink calls to modify their local network namespace. - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so that calls that are not safe for unprivileged users are still protected. Later patches will remove the extra capable calls from methods that are safe for unprivilged users. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Don't export sysctls to unprivileged usersEric W. Biederman
In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>