aboutsummaryrefslogtreecommitdiff
path: root/net/ipv6/route.c
AgeCommit message (Collapse)Author
2008-01-20[IPV6] ROUTE: Make sending algorithm more friendly with RFC 4861.YOSHIFUJI Hideaki
We omit (or delay) sending NSes for known-to-unreachable routers (in NUD_FAILED state) according to RFC 4191 (Default Router Preferences and More-Specific Routes). But this is not fully compatible with RFC 4861 (Neighbor Discovery Protocol for IPv6), which does not remember unreachability of neighbors. So, let's avoid mixing sending algorithm of RFC 4191 and that of RFC 4861, and make the algorithm more friendly with RFC 4861 if RFC 4191 is disabled. Issue was found by IPv6 Ready Logo Core Self_Test 1.5.0b2 (by TAHI Project), and has been tracked down by Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-10[NET]: Make helper to get dst entry and "use" itPavel Emelyanov
There are many places that get the dst entry, increase the __use counter and set the "lastuse" time stamp. Make a helper for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07[IPV6]: Convert /proc/net/ipv6_route to seq_file interfaceAlexey Dobriyan
This removes last proc_net_create() user. Kudos to Benjamin Thery and Stephen Hemminger for comments on previous version. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-18sysctl: ipv6 route flushing (kill binary path)Eric W. Biederman
We don't preoperly support the sysctl binary path for flushing the ipv6 routes. So remove support for a binary path. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-15[IPV6]: Consolidate the ip6_pol_route_(input|output) pairPavel Emelyanov
The difference in both functions is in the "id" passed to the rt6_select, so just pass it as an extra argument from two outer helpers. This is minus 60 lines of code and 360 bytes of .text Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make the loopback device per network namespace.Eric W. Biederman
This patch makes loopback_dev per network namespace. Adding code to create a different loopback device for each network namespace and adding the code to free a loopback device when a network namespace exits. This patch modifies all users the loopback_dev so they access it as init_net.loopback_dev, keeping all of the code compiling and working. A later pass will be needed to update the users to use something other than the initial network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Dynamically allocate the loopback device, part 1.Daniel Lezcano
This patch replaces all occurences to the static variable loopback_dev to a pointer loopback_dev. That provides the mindless, trivial, uninteressting change part for the dynamic allocation for the loopback. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-By: Kirill Korotaev <dev@sw.ru> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETLINK]: Introduce nested and byteorder flag to netlink attributeThomas Graf
This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced. The byte-order flag is yet unused, it's intended use is to allow automatic byte order convertions for all atomic types. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make the device list and device lookups per namespace.Eric W. Biederman
This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make /proc/net per network namespaceEric W. Biederman
This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31[IPV6]: Don't update ADVMSS on routes where the MTU is not also updatedSimon Arlott
The ADVMSS value was incorrectly updated for ALL routes when the MTU is updated because it's outside the effect of the if statement's condition. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-20mm: Remove slab destructors from kmem_cache_create().Paul Mundt
Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-06-07[NETLINK]: Mark netlink policies constPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24[XFRM]: Allow packet drops during larval state resolution.David S. Miller
The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[IPV6] SNMP: Fix several warnings without procfs.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-04-25[NET]: cleanup extra semicolonsStephen Hemminger
Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[IPv6]: Use rtnl registration interfaceThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6hArnaldo Carvalho de Melo
Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce skb_reset_mac_header(skb)Arnaldo Carvalho de Melo
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[IPV6]: Decentralize EXPORT_SYMBOLs.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-04-13[IPV6] SNMP: Fix {In,Out}NoRoutes statistics.YOSHIFUJI Hideaki
A packet which is being discarded because of no routes in the forwarding path should not be counted as OutNoRoutes but as InNoRoutes. Additionally, on this occasion, a packet whose destinaion is not valid should be counted as InAddrErrors separately. Based on patch from Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-06[IPV6]: Revert recent change to rt6_check_dev().David S. Miller
This reverts a0d78ebf3a0e33a1aeacf2fc518ad9273d6a1c2f It causes pings to link-local addresses to fail. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25[IPV6]: Fix routing round-robin locking.David S. Miller
As per RFC2461, section 6.3.6, item #2, when no routers on the matching list are known to be reachable or probably reachable we do round robin on those available routes so that we make sure to probe as many of them as possible to detect when one becomes reachable faster. Each routing table has a rwlock protecting the tree and the linked list of routes at each leaf. The round robin code executes during lookup and thus with the rwlock taken as a reader. A small local spinlock tries to provide protection but this does not work at all for two reasons: 1) The round-robin list manipulation, as coded, goes like this (with read lock held): walk routes finding head and tail spin_lock(); rotate list using head and tail spin_unlock(); While one thread is rotating the list, another thread can end up with stale values of head and tail and then proceed to corrupt the list when it gets the lock. This ends up causing the OOPS in fib6_add() later onthat many people have been hitting. 2) All the other code paths that run with the rwlock held as a reader do not expect the list to change on them, they expect it to remain completely fixed while they hold the lock in that way. So, simply stated, it is impossible to implement this correctly using a manipulation of the list without violating the rwlock locking semantics. Reimplement using a per-fib6_node round-robin pointer. This way we don't need to manipulate the list at all, and since the round-robin pointer can only ever point to real existing entries we don't need to perform any locking on the changing of the round-robin pointer itself. We only need to reset the round-robin pointer to NULL when the entry it is pointing to is removed. The idea is from Thomas Graf and it is very similar to how this was implemented before the advanced router selection code when in. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-22[NET]: fix up misplaced inlines.Dave Jones
Turning up the warnings on gcc makes it emit warnings about the placement of 'inline' in function declarations. Here's everything that was under net/ Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[PATCH] mark struct file_operations const 7Arjan van de Ven
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-10[IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointerEric Dumazet
This patch removes the next pointer from 'struct rt6_info.u' union, and renames u.next to u.dst.rt6_next. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[NET] IPV6: Fix whitespace errors.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[IPV6] ROUTE: Do not route packets to link-local address on other device.YOSHIFUJI Hideaki
With help from Wei Dong <weid@np.css.fujitsu.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08[NETLINK]: Don't BUG on undersized allocationsPatrick McHardy
Currently netlink users BUG when the allocated skb for an event notification is undersized. While this is certainly a kernel bug, its not critical and crashing the kernel is too drastic, especially when considering that these errors have appeared multiple times in the past and it BUGs even if no listeners are present. This patch replaces BUG by WARN_ON and changes the notification functions to inform potential listeners of undersized allocations using a unique error code (EMSGSIZE). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23[IPV6]: Fixed the size of the netlink message notified by inet6_rt_notify().Noriaki TAKAMIYA
I think the return value of rt6_nlmsg_size() should includes the amount of RTA_METRICS. Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13[IPV6]: Make fib6_node subtree depend on IPV6_SUBTREESKim Nordlund
Make fib6_node 'subtree' depend on IPV6_SUBTREES. Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[RTNETLINK]: Add rtnl_put_cacheinfo() to unify some codeThomas Graf
IPv4, IPv6, and DECNet all use struct rta_cacheinfo in a similiar way, therefore rtnl_put_cacheinfo() is added to reuse code. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[IPV6]: Misc endianness annotations.Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NETLINK]: Do precise netlink message allocations where possibleThomas Graf
Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[IPV6]: Per-interface statistics support.YOSHIFUJI Hideaki
For IP MIB (RFC4293). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02[IPV6] ROUTE: Use &rt->u.dst instead of cast.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02[IPV6] ROUTE: Use macros to format /proc/net/ipv6_route.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-12-02[NET]: Rethink mark field in struct flowiThomas Graf
Now that all protocols have been made aware of the mark field it can be moved out of the union thus simplyfing its usage. The config options in the IPv4/IPv6/DECnet subsystems to enable respectively disable mark based routing only obfuscate the code with ifdefs, the cost for the additional comparison in the flow key is insignificant, and most distributions have all these options enabled by default anyway. Therefore it makes sense to remove the config options and enable mark based routing by default. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: Turn nfmark into generic markThomas Graf
nfmark is being used in various subsystems and has become the defacto mark field for all kinds of packets. Therefore it makes sense to rename it to `mark' and remove the dependency on CONFIG_NETFILTER. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[IPV6]: flowlabels are net-endianAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-21[IPV6] ROUTE: Do not enable router reachability probing in router mode.YOSHIFUJI Hideaki
RFC4191 explicitly states that the procedures are applicable to hosts only. We should not have changed behavior of routers. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-11-21[IPV6] ROUTE: Prefer reachable nexthop only if the caller requests.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-11-21[IPV6] ROUTE: Try to use router which is not known unreachable.YOSHIFUJI Hideaki
Only routers in "FAILED" state should be considered unreachable. Otherwise, we do not try to use speicific routes unless all least specific routers are considered unreachable. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2006-10-18[IPV6]: Fix route.c warnings when multiple tables are disabled.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18[IPv6] route: Fix prohibit and blackhole routing decisionThomas Graf
Lookups resolving to ip6_blk_hole_entry must result in silently discarding the packets whereas an ip6_pkt_prohibit_entry is supposed to cause an ICMPV6_ADM_PROHIBITED message to be sent. Thanks to Kim Nordlund <kim.nordlund@nokia.com> for noticing this bug. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18[IPV6]: Always copy rt->u.dst.error when copying a rt6_info.Ville Nuorvala
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-18[IPV6]: Clean up BACKTRACK().Ville Nuorvala
The fn check is unnecessary as fn can never be NULL in BACKTRACK(). Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-15[IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectorsThomas Graf
Fixes rt6_lookup() to provide the source address in the flow and sets RT6_LOOKUP_F_HAS_SADDR whenever it is present in the flow. Avoids unnecessary prefix comparisons by checking for a prefix length first. Fixes the rule logic to not match packets if a source selector has been specified but no source address is available. Thanks to Kim Nordlund <kim.nordlund@nokia.com> for working on this patch with me. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22[IPV6] ROUTE: Fix dst reference counting in ip6_pol_route_lookup().YOSHIFUJI Hideaki
In ip6_pol_route_lookup(), when we finish backtracking at the top-level root entry, we need to hold it. Bug noticed by Mitsuru Chinen <CHINEN@jp.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22[NETLINK]: Make use of NLA_STRING/NLA_NUL_STRING attribute validationThomas Graf
Converts existing NLA_STRING attributes to use the new validation features, saving a couple of temporary buffers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>