aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2011-04-28ipv4: Fetch route saddr from flow key in tcp_v4_connect().David S. Miller
Now that output route lookups update the flow with source address selection, we can fetch it from fl4->saddr instead of rt->rt_src Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28ipv4: Fetch route saddr from flow key in ip4_datagram_connect().David S. Miller
Now that output route lookups update the flow with source address selection, we can fetch it from fl4->saddr instead of rt->rt_src Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28ipv4: Fetch route saddr from flow key in inet_sk_reselect_saddr().David S. Miller
Now that output route lookups update the flow with source address selection, we can fetch it from fl4->saddr instead of rt->rt_src Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28ipv4: Use caller's on-stack flowi as-is in output route lookups.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28net: Use non-zero allocations in dst_alloc().David S. Miller
Make dst_alloc() and it's users explicitly initialize the entire entry. The zero'ing done by kmem_cache_zalloc() was almost entirely redundant. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28net: Make dst_alloc() take more explicit initializations.David S. Miller
Now the dst->dev, dev->obsolete, and dst->flags values can be specified as well. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28inet: add RCU protection to inet->optEric Dumazet
We lack proper synchronization to manipulate inet->opt ip_options Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options), without any protection against another thread manipulating inet->opt. Another thread can change inet->opt pointer and free old one under us. Use RCU to protect inet->opt (changed to inet->inet_opt). Instead of handling atomic refcounts, just copy ip_options when necessary, to avoid cache line dirtying. We cant insert an rcu_head in struct ip_options since its included in skb->cb[], so this patch is large because I had to introduce a new ip_options_rcu structure. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-27ipv4: Remove erroneous check in igmpv3_newpack() and igmp_send_report().David S. Miller
Output route resolution never returns a route with rt_src set to zero (which is INADDR_ANY). Even if the flow key for the output route lookup specifies INADDR_ANY for the source address, the output route resolution chooses a real source address to use in the final route. This test has existed forever in igmp_send_report() and David Stevens simply copied over the erroneous test when implementing support for IGMPv3. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-04-27ipv4: Sanitize and simplify ip_route_{connect,newports}()David S. Miller
These functions are used together as a unit for route resolution during connect(). They address the chicken-and-egg problem that exists when ports need to be allocated during connect() processing, yet such port allocations require addressing information from the routing code. It's currently more heavy handed than it needs to be, and in particular we allocate and initialize a flow object twice. Let the callers provide the on-stack flow object. That way we only need to initialize it once in the ip_route_connect() call. Later, if ip_route_newports() needs to do anything, it re-uses that flow object as-is except for the ports which it updates before the route re-lookup. Also, describe why this set of facilities are needed and how it works in a big comment. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-04-26Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Resolved logic conflicts causing a build failure due to drivers/net/r8169.c changes using a patch from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-25net: provide cow_metrics() methods to blackhole dst_opsHeld Bernhard
Since commit 62fa8a846d7d (net: Implement read-only protection and COW'ing of metrics.) the kernel throws an oops. [ 101.620985] BUG: unable to handle kernel NULL pointer dereference at (null) [ 101.621050] IP: [< (null)>] (null) [ 101.621084] PGD 6e53c067 PUD 3dd6a067 PMD 0 [ 101.621122] Oops: 0010 [#1] SMP [ 101.621153] last sysfs file: /sys/devices/virtual/ppp/ppp/uevent [ 101.621192] CPU 2 [ 101.621206] Modules linked in: l2tp_ppp pppox ppp_generic slhc l2tp_netlink l2tp_core deflate zlib_deflate twofish_x86_64 twofish_common des_generic cbc ecb sha1_generic hmac af_key iptable_filter snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device loop snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd i2c_i801 iTCO_wdt psmouse soundcore snd_page_alloc evdev uhci_hcd ehci_hcd thermal [ 101.621552] [ 101.621567] Pid: 5129, comm: openl2tpd Not tainted 2.6.39-rc4-Quad #3 Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R [ 101.621637] RIP: 0010:[<0000000000000000>] [< (null)>] (null) [ 101.621684] RSP: 0018:ffff88003ddeba60 EFLAGS: 00010202 [ 101.621716] RAX: ffff88003ddb5600 RBX: ffff88003ddb5600 RCX: 0000000000000020 [ 101.621758] RDX: ffffffff81a69a00 RSI: ffffffff81b7ee61 RDI: ffff88003ddb5600 [ 101.621800] RBP: ffff8800537cd900 R08: 0000000000000000 R09: ffff88003ddb5600 [ 101.621840] R10: 0000000000000005 R11: 0000000000014b38 R12: ffff88003ddb5600 [ 101.621881] R13: ffffffff81b7e480 R14: ffffffff81b7e8b8 R15: ffff88003ddebad8 [ 101.621924] FS: 00007f06e4182700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 101.621971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 101.622005] CR2: 0000000000000000 CR3: 0000000045274000 CR4: 00000000000006e0 [ 101.622046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 101.622087] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 101.622129] Process openl2tpd (pid: 5129, threadinfo ffff88003ddea000, task ffff88003de9a280) [ 101.622177] Stack: [ 101.622191] ffffffff81447efa ffff88007d3ded80 ffff88003de9a280 ffff88007d3ded80 [ 101.622245] 0000000000000001 ffff88003ddebbb8 ffffffff8148d5a7 0000000000000212 [ 101.622299] ffff88003dcea000 ffff88003dcea188 ffffffff00000001 ffffffff81b7e480 [ 101.622353] Call Trace: [ 101.622374] [<ffffffff81447efa>] ? ipv4_blackhole_route+0x1ba/0x210 [ 101.622415] [<ffffffff8148d5a7>] ? xfrm_lookup+0x417/0x510 [ 101.622450] [<ffffffff8127672a>] ? extract_buf+0x9a/0x140 [ 101.622485] [<ffffffff8144c6a0>] ? __ip_flush_pending_frames+0x70/0x70 [ 101.622526] [<ffffffff8146fbbf>] ? udp_sendmsg+0x62f/0x810 [ 101.622562] [<ffffffff813f98a6>] ? sock_sendmsg+0x116/0x130 [ 101.622599] [<ffffffff8109df58>] ? find_get_page+0x18/0x90 [ 101.622633] [<ffffffff8109fd6a>] ? filemap_fault+0x12a/0x4b0 [ 101.622668] [<ffffffff813fb5c4>] ? move_addr_to_kernel+0x64/0x90 [ 101.622706] [<ffffffff81405d5a>] ? verify_iovec+0x7a/0xf0 [ 101.622739] [<ffffffff813fc772>] ? sys_sendmsg+0x292/0x420 [ 101.622774] [<ffffffff810b994a>] ? handle_pte_fault+0x8a/0x7c0 [ 101.622810] [<ffffffff810b76fe>] ? __pte_alloc+0xae/0x130 [ 101.622844] [<ffffffff810ba2f8>] ? handle_mm_fault+0x138/0x380 [ 101.622880] [<ffffffff81024af9>] ? do_page_fault+0x189/0x410 [ 101.622915] [<ffffffff813fbe03>] ? sys_getsockname+0xf3/0x110 [ 101.622952] [<ffffffff81450c4d>] ? ip_setsockopt+0x4d/0xa0 [ 101.622986] [<ffffffff813f9932>] ? sockfd_lookup_light+0x22/0x90 [ 101.623024] [<ffffffff814b61fb>] ? system_call_fastpath+0x16/0x1b [ 101.623060] Code: Bad RIP value. [ 101.623090] RIP [< (null)>] (null) [ 101.623125] RSP <ffff88003ddeba60> [ 101.623146] CR2: 0000000000000000 [ 101.650871] ---[ end trace ca3856a7d8e8dad4 ]--- [ 101.651011] __sk_free: optmem leakage (160 bytes) detected. The oops happens in dst_metrics_write_ptr() include/net/dst.h:124: return dst->ops->cow_metrics(dst, p); dst->ops->cow_metrics is NULL and causes the oops. Provide cow_metrics() methods, like we did in commit 214f45c91bb (net: provide default_advmss() methods to blackhole dst_ops) Signed-off-by: Held Bernhard <berny156@gmx.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22inet: constify ip headers and in6_addrEric Dumazet
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-19Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2011-04-19Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x_ethtool.c
2011-04-17bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like ↵Ben Hutchings
NETDEV_NOTIFY_PEERS It is undesirable for the bonding driver to be poking into higher level protocols, and notifiers provide a way to avoid that. This does mean removing the ability to configure reptitition of gratuitous ARPs and unsolicited NAs. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-14ip: ip_options_compile() resilient to NULL skb routeEric Dumazet
Scot Doyle demonstrated ip_options_compile() could be called with an skb without an attached route, using a setup involving a bridge, netfilter, and forged IP packets. Let's make ip_options_compile() and ip_options_rcv_srr() a bit more robust, instead of changing bridge/netfilter code. With help from Hiroaki SHIMODA. Reported-by: Scot Doyle <lkml@scotdoyle.com> Tested-by: Scot Doyle <lkml@scotdoyle.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-14ipv4: Call fib_select_default() only when actually necessary.David S. Miller
fib_select_default() is a complete NOP, and completely pointless to invoke, when we have no more than 1 default route installed. And this is far and away the common case. So remember how many prefixlen==0 routes we have in the routing table, and elide the call when we have no more than one of those. This cuts output route creation time by 157 cycles on Niagara2+. In order to add the new int to fib_table, we have to correct the type of ->tb_data[] to unsigned long, otherwise the private area will be unaligned on 64-bit systems. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-04-13Revert "tcp: disallow bind() to reuse addr/port"David S. Miller
This reverts commit c191a836a908d1dd6b40c503741f91b914de3348. It causes known regressions for programs that expect to be able to use SO_REUSEADDR to shutdown a socket, then successfully rebind another socket to the same ID. Programs such as haproxy and amavisd expect this to work. This should fix kernel bugzilla 32832. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12net: Do not wrap sysctl igmp_max_memberships in IP_MULTICASTJoakim Tjernlund
controlling igmp_max_membership is useful even when IP_MULTICAST is off. Quagga(an OSPF deamon) uses multicast addresses for all interfaces using a single socket and hits igmp_max_membership limit when there are 20 interfaces or more. Always export sysctl igmp_max_memberships in proc, just like igmp_max_msf Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12inetpeer: reduce stack usageEric Dumazet
On 64bit arches, we use 752 bytes of stack when cleanup_once() is called from inet_getpeer(). Lets share the avl stack to save ~376 bytes. Before patch : # objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl 0x000006c3 unlink_from_pool [inetpeer.o]: 376 0x00000721 unlink_from_pool [inetpeer.o]: 376 0x00000cb1 inet_getpeer [inetpeer.o]: 376 0x00000e6d inet_getpeer [inetpeer.o]: 376 0x0004 inet_initpeers [inetpeer.o]: 112 # size net/ipv4/inetpeer.o text data bss dec hex filename 5320 432 21 5773 168d net/ipv4/inetpeer.o After patch : objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl 0x00000c11 inet_getpeer [inetpeer.o]: 376 0x00000dcd inet_getpeer [inetpeer.o]: 376 0x00000ab9 peer_check_expire [inetpeer.o]: 328 0x00000b7f peer_check_expire [inetpeer.o]: 328 0x0004 inet_initpeers [inetpeer.o]: 112 # size net/ipv4/inetpeer.o text data bss dec hex filename 5163 432 21 5616 15f0 net/ipv4/inetpeer.o Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Scot Doyle <lkml@scotdoyle.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Reviewed-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smsc911x.c
2011-04-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits) net: Add support for SMSC LAN9530, LAN9730 and LAN89530 mlx4_en: Restoring RX buffer pointer in case of failure mlx4: Sensing link type at device initialization ipv4: Fix "Set rt->rt_iif more sanely on output routes." MAINTAINERS: add entry for Xen network backend be2net: Fix suspend/resume operation be2net: Rename some struct members for clarity pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev dsa/mv88e6131: add support for mv88e6085 switch ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2) be2net: Fix a potential crash during shutdown. bna: Fix for handling firmware heartbeat failure can: mcp251x: Allow pass IRQ flags through platform data. smsc911x: fix mac_lock acquision before calling smsc911x_mac_read iwlwifi: accept EEPROM version 0x423 for iwl6000 rt2x00: fix cancelling uninitialized work rtlwifi: Fix some warnings/bugs p54usb: IDs for two new devices wl12xx: fix potential buffer overflow in testmode nvs push zd1211rw: reset rx idle timer from tasklet ...
2011-04-10Disable rp_filter for IPsec packetsMichael Smith
The reverse path filter interferes with IPsec subnet-to-subnet tunnels, especially when the link to the IPsec peer is on an interface other than the one hosting the default route. With dynamic routing, where the peer might be reachable through eth0 today and eth1 tomorrow, it's difficult to keep rp_filter enabled unless fake routes to the remote subnets are configured on the interface currently used to reach the peer. IPsec provides a much stronger anti-spoofing policy than rp_filter, so this patch disables the rp_filter for packets with a security path. Signed-off-by: Michael Smith <msmith@cbnco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-10fib_validate_source(): pass sk_buff instead of markMichael Smith
This makes sk_buff available for other use in fib_validate_source(). Signed-off-by: Michael Smith <msmith@cbnco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-07Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/benet/be_main.c
2011-04-07ipv4: Fix "Set rt->rt_iif more sanely on output routes."OGAWA Hirofumi
Commit 1018b5c01636c7c6bda31a719bda34fc631db29a ("Set rt->rt_iif more sanely on output routes.") breaks rt_is_{output,input}_route. This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0". To fix it, this does: 1) Add "int rt_route_iif;" to struct rtable 2) For input routes, always set rt_route_iif to same value as rt_iif 3) For output routes, always set rt_route_iif to zero. Set rt_iif as it is done currently. 4) Change rt_is_{output,input}_route() to test rt_route_iif Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-07Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6Linus Torvalds
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
2011-04-05Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2011-04-04net: Allow no-cache copy from user on transmitTom Herbert
This patch uses __copy_from_user_nocache on transmit to bypass data cache for a performance improvement. skb_add_data_nocache and skb_copy_to_page_nocache can be called by sendmsg functions to use this feature, initial support is in tcp_sendmsg. This functionality is configurable per device using ethtool. Presumably, this feature would only be useful when the driver does not touch the data. The feature is turned on by default if a device indicates that it does some form of checksum offload; it is off by default for devices that do no checksum offload or indicate no checksum is necessary. For the former case copy-checksum is probably done anyway, in the latter case the device is likely loopback in which case the no cache copy is probably not beneficial. This patch was tested using 200 instances of netperf TCP_RR with 1400 byte request and one byte reply. Platform is 16 core AMD x86. No-cache copy disabled: 672703 tps, 97.13% utilization 50/90/99% latency:244.31 484.205 1028.41 No-cache copy enabled: 702113 tps, 96.16% utilization, 50/90/99% latency 238.56 467.56 956.955 Using 14000 byte request and response sizes demonstrate the effects more dramatically: No-cache copy disabled: 79571 tps, 34.34 %utlization 50/90/95% latency 1584.46 2319.59 5001.76 No-cache copy enabled: 83856 tps, 34.81% utilization 50/90/95% latency 2508.42 2622.62 2735.88 Note especially the effect on latency tail (95th percentile). This seems to provide a nice performance improvement and is consistent in the tests I ran. Presumably, this would provide the greatest benfits in the presence of an application workload stressing the cache and a lot of transmit data happening. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2011-04-04netfilter: get rid of atomic ops in fast pathEric Dumazet
We currently use a percpu spinlock to 'protect' rule bytes/packets counters, after various attempts to use RCU instead. Lately we added a seqlock so that get_counters() can run without blocking BH or 'writers'. But we really only need the seqcount in it. Spinlock itself is only locked by the current/owner cpu, so we can remove it completely. This cleanups api, using correct 'writer' vs 'reader' semantic. At replace time, the get_counters() call makes sure all cpus are done using the old table. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04netfilter: af_info: add 'strict' parameter to limit lookup to .oifFlorian Westphal
ipv6 fib lookup can set RT6_LOOKUP_F_IFACE flag to restrict search to an interface, but this flag cannot be set via struct flowi. Also, it cannot be set via ip6_route_output: this function uses the passed sock struct to determine if this flag is required (by testing for nonzero sk_bound_dev_if). Work around this by passing in an artificial struct sk in case 'strict' argument is true. This is required to replace the rt6_lookup call in xt_addrtype.c with nf_afinfo->route(). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04netfilter: af_info: add network namespace parameter to route hookFlorian Westphal
This is required to eventually replace the rt6_lookup call in xt_addrtype.c with nf_afinfo->route(). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-01tcp: len check is unnecessarily devastating, change to WARN_ONIlpo Järvinen
All callers are prepared for alloc failures anyway, so this error can safely be boomeranged to the callers domain without super bad consequences. ...At worst the connection might go into a state where each RTO tries to (unsuccessfully) re-fragment with such a mis-sized value and eventually dies. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31ipv4: Use flowi4_init_output() in udp_sendmsg()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31ipv4: Use flowi4_init_output() in cookie_v4_check()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31ipv4: Use flowi4_init_output() in raw_sendmsg()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31ipv4: Use flowi4_init_output() in ip_send_reply()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31ipv4: Use flowi4_init_output() in inet_connection_sock.cDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31fib: add __rcu annotationsEric Dumazet
Add __rcu annotations and lockdep checks. Add const qualifiers node_parent() and node_parent_rcu() can use rcu_dereference_index_check() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30fib: add rtnl locking in ip_fib_net_exitEric Dumazet
Daniel J Blueman reported a lockdep splat in trie_firstleaf(), caused by RTNL being not locked before a call to fib_table_flush() Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30net: gre: provide multicast mappings for ipv4 and ipv6Timo Teräs
My commit 6d55cb91a0020ac0 (gre: fix hard header destination address checking) broke multicast. The reason is that ip_gre used to get ipgre_header() calls with zero destination if we have NOARP or multicast destination. Instead the actual target was decided at ipgre_tunnel_xmit() time based on per-protocol dissection. Instead of allowing the "abuse" of ->header() calls with invalid destination, this creates multicast mappings for ip_gre. This also fixes "ip neigh show nud noarp" to display the proper multicast mappings used by the gre device. Reported-by: Doug Kehn <rdkehn@yahoo.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Doug Kehn <rdkehn@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-28ipv4: Don't ip_rt_put() an error pointer in RAW sockets.David S. Miller
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27ipv4: Fix IP timestamp option (IPOPT_TS_PRESPEC) handling in ip_options_echo()Jan Luebbe
The current handling of echoed IP timestamp options with prespecified addresses is rather broken since the 2.2.x kernels. As far as i understand it, it should behave like when originating packets. Currently it will only timestamp the next free slot if: - there is space for *two* timestamps - some random data from the echoed packet taken as an IP is *not* a local IP This first is caused by an off-by-one error. 'soffset' points to the next free slot and so we only need to have 'soffset + 7 <= optlen'. The second bug is using sptr as the start of the option, when it really is set to 'skb_network_header(skb)'. I just use dptr instead which points to the timestamp option. Finally it would only timestamp for non-local IPs, which we shouldn't do. So instead we exclude all unicast destinations, similar to what we do in ip_options_compile(). Signed-off-by: Jan Luebbe <jluebbe@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-25ipv4: do not ignore route errorsJulian Anastasov
The "ipv4: Inline fib_semantic_match into check_leaf" change forgets to return the route errors. check_leaf should return the same results as fib_table_lookup. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24ipv4: Fix nexthop caching wrt. scoping.David S. Miller
Move the scope value out of the fib alias entries and into fib_info, so that we always use the correct scope when recomputing the nexthop cached source address. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24ipv4: Invalidate nexthop cache nh_saddr more correctly.David S. Miller
Any operation that: 1) Brings up an interface 2) Adds an IP address to an interface 3) Deletes an IP address from an interface can potentially invalidate the nh_saddr value, requiring it to be recomputed. Perform the recomputation lazily using a generation ID. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24ipv4: fix fib metricsEric Dumazet
Alessandro Suardi reported that we could not change route metrics : ip ro change default .... advmss 1400 This regression came with commit 9c150e82ac50 (Allocate fib metrics dynamically). fib_metrics is no longer an array, but a pointer to an array. Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Alessandro Suardi <alessandro.suardi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23ipv4: fix ip_rt_update_pmtu()Eric Dumazet
commit 2c8cec5c10bc (Cache learned PMTU information in inetpeer) added an extra inet_putpeer() call in ip_rt_update_pmtu(). This results in various problems, since we can free one inetpeer, while it is still in use. Ref: http://www.spinics.net/lists/netdev/msg159121.html Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>