aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4/udp.c
AgeCommit message (Collapse)Author
2011-12-09udp: Export code sk lookup routinesPavel Emelyanov
The UDP diag get_exact handler will require them to find a socket by provided net, [sd]addr-s, [sd]ports and device. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-12-01Revert "udp: remove redundant variable"David S. Miller
This reverts commit 81d54ec8479a2c695760da81f05b5a9fb2dbe40a. If we take the "try_again" goto, due to a checksum error, the 'len' has already been truncated. So we won't compute the same values as the original code did. Reported-by: paul bilke <fsmail@conspiracy.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16net: introduce and use netdev_features_t for device features setsMichał Mirosław
v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-09ipv4: PKTINFO doesnt need dst referenceEric Dumazet
Le lundi 07 novembre 2011 à 15:33 +0100, Eric Dumazet a écrit : > At least, in recent kernels we dont change dst->refcnt in forwarding > patch (usinf NOREF skb->dst) > > One particular point is the atomic_inc(dst->refcnt) we have to perform > when queuing an UDP packet if socket asked PKTINFO stuff (for example a > typical DNS server has to setup this option) > > I have one patch somewhere that stores the information in skb->cb[] and > avoid the atomic_{inc|dec}(dst->refcnt). > OK I found it, I did some extra tests and believe its ready. [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference When a socket uses IP_PKTINFO notifications, we currently force a dst reference for each received skb. Reader has to access dst to get needed information (rt_iif & rt_spec_dst) and must release dst reference. We also forced a dst reference if skb was put in socket backlog, even without IP_PKTINFO handling. This happens under stress/load. We can instead store the needed information in skb->cb[], so that only softirq handler really access dst, improving cache hit ratios. This removes two atomic operations per packet, and false sharing as well. On a benchmark using a mono threaded receiver (doing only recvmsg() calls), I can reach 720.000 pps instead of 570.000 pps. IP_PKTINFO is typically used by DNS servers, and any multihomed aware UDP application. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-02udp: fix a race in encap_rcv handlingEric Dumazet
udp_queue_rcv_skb() has a possible race in encap_rcv handling, since this pointer can be changed anytime. We should use ACCESS_ONCE() to close the race. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01net: make the tcp and udp file_operations for the /proc stuff constArjan van de Ven
the tcp and udp code creates a set of struct file_operations at runtime while it can also be done at compile time, with the added benefit of then having these file operations be const. the trickiest part was to get the "THIS_MODULE" reference right; the naive method of declaring a struct in the place of registration would not work for this reason. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17rps: Add flag to skb to indicate rxhash is based on L4 tupleTom Herbert
The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-12net: cleanup some rcu_dereference_rawEric Dumazet
RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c
2011-07-07net: refine {udp|tcp|sctp}_mem limitsEric Dumazet
Current tcp/udp/sctp global memory limits are not taking into account hugepages allocations, and allow 50% of ram to be used by buffers of a single protocol [ not counting space used by sockets / inodes ...] Lets use nr_free_buffer_pages() and allow a default of 1/8 of kernel ram per protocol, and a minimum of 128 pages. Heavy duty machines sysadmins probably need to tweak limits anyway. References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032 Reported-by: starlight <starlight@binnacle.cx> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-05Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2011-06-21udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packetXufeng Zhang
Consider this scenario: When the size of the first received udp packet is bigger than the receive buffer, MSG_TRUNC bit is set in msg->msg_flags. However, if checksum error happens and this is a blocking socket, it will goto try_again loop to receive the next packet. But if the size of the next udp packet is smaller than receive buffer, MSG_TRUNC flag should not be set, but because MSG_TRUNC bit is not cleared in msg->msg_flags before receive the next packet, MSG_TRUNC is still set, which is wrong. Fix this problem by clearing MSG_TRUNC flag when starting over for a new packet. Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-21udp: add tracepoints for queueing skb to rcvbufSatoru Moriya
This patch adds a tracepoint to __udp_queue_rcv_skb to get the return value of ip_queue_rcv_skb. It indicates why kernel drops a packet at this point. ip_queue_rcv_skb returns following values in the packet drop case: rcvbuf is full : -ENOMEM sk_filter returns error : -EINVAL, -EACCESS, -ENOMEM, etc. __sk_mem_schedule returns error: -ENOBUF Signed-off-by: Satoru Moriya <satoru.moriya@hds.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-24net: convert %p usage to %pKDan Rosenberg
The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10ipv4: udp: Eliminate remaining uses of rt->rt_srcDavid S. Miller
We already track and pass around the correct flow key, so simply use it in udp_send_skb(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Pass flow key down into ip_append_*().David S. Miller
This way rt->rt_dst accesses are unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Pass flow keys down into datagram packet building engine.David S. Miller
This way ip_output.c no longer needs rt->rt_{src,dst}. We already have these keys sitting, ready and waiting, on the stack or in a socket structure. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08udp: Use flow key information instead of rt->rt_{src,dst}David S. Miller
We have two cases. Either the socket is in TCP_ESTABLISHED state and connect() filled in the inet socket cork flow, or we looked up the route here and used an on-stack flow. Track which one it was, and use it to obtain src/dst addrs. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28inet: add RCU protection to inet->optEric Dumazet
We lack proper synchronization to manipulate inet->opt ip_options Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options), without any protection against another thread manipulating inet->opt. Another thread can change inet->opt pointer and free old one under us. Use RCU to protect inet->opt (changed to inet->inet_opt). Instead of handling atomic refcounts, just copy ip_options when necessary, to avoid cache line dirtying. We cant insert an rcu_head in struct ip_options since its included in skb->cb[], so this patch is large because I had to introduce a new ip_options_rcu structure. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22inet: constify ip headers and in6_addrEric Dumazet
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smsc911x.c
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31ipv4: Use flowi4_init_output() in udp_sendmsg()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Put fl4_* macros to struct flowi4 and use them again.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12ipv4: Use flowi4 in UDPDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12ipv4: Use flowi4 in public route lookup interfaces.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Make flowi ports AF dependent.David S. Miller
Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_* This will let us to create AF optimal flow instances. It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Put flowi_* prefix on AF independent members of struct flowiDavid S. Miller
I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03ipv4: Fix crash in dst_release when udp_sendmsg route lookup fails.David S. Miller
As reported by Eric: [11483.697233] IP: [<c12b0638>] dst_release+0x18/0x60 ... [11483.697741] Call Trace: [11483.697764] [<c12fc9d2>] udp_sendmsg+0x282/0x6e0 [11483.697790] [<c12a1c01>] ? memcpy_toiovec+0x51/0x70 [11483.697818] [<c12dbd90>] ? ip_generic_getfrag+0x0/0xb0 The pointer passed to dst_release() is -EINVAL, that's because we leave an error pointer in the local variable "rt" by accident. NULL it out to fix the bug. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02ipv4: Make output route lookup return rtable directly.David S. Miller
Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Kill can_sleep arg to ip_route_output_flow()David S. Miller
This boolean state is now available in the flow flags. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01net: Add FLOWI_FLAG_CAN_SLEEP.David S. Miller
And set is in contexts where the route resolution can sleep. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"David S. Miller
Since that is what the current vague "flags" argument means. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01udp: Add lockless transmit pathHerbert Xu
The UDP transmit path has been running under the socket lock for a long time because of the corking feature. This means that transmitting to the same socket in multiple threads does not scale at all. However, as most users don't actually use corking, the locking can be removed in the common case. This patch creates a lockless fast path where corking is not used. Please note that this does create a slight inaccuracy in the enforcement of socket send buffer limits. In particular, we may exceed the socket limit by up to (number of CPUs) * (packet size) because of the way the limit is computed. As the primary purpose of socket buffers is to indicate congestion, this should not be a great problem for now. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01udp: Switch to ip_finish_skbHerbert Xu
This patch converts UDP to use the new ip_finish_skb API. This would then allows us to more easily use ip_make_skb which allows UDP to run without a socket lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24net: change netdev->features to u32Michał Mirosław
Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-17Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h drivers/net/wireless/iwlwifi/iwl-1000.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-core.h drivers/vhost/vhost.c
2010-12-16net: Use skb_checksum_start_offset()Michał Mirosław
Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset(). Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16net: fix nulls list corruptions in sk_prot_allocOctavian Purdila
Special care is taken inside sk_port_alloc to avoid overwriting skc_node/skc_nulls_node. We should also avoid overwriting skc_bind_node/skc_portaddr_node. The patch fixes the following crash: BUG: unable to handle kernel paging request at fffffffffffffff0 IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370 [<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360 [<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700 [<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190 [<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0 [<ffffffff812eda35>] udp_rcv+0x15/0x20 [<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190 [<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0 [<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0 [<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0 [<ffffffff812bb94b>] ip_rcv+0x2bb/0x350 [<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0 Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17net: use the macros defined for the members of flowiChangli Gao
Use the macros defined for the members of flowi to clean the code up. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16udp: use atomic_inc_not_zero_hintEric Dumazet
UDP sockets refcount is usually 2, unless an incoming frame is going to be queued in receive or backlog queue. Using atomic_inc_not_zero_hint() permits to reduce latency, because processor issues less memory transactions. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-10net: avoid limits overflowEric Dumazet
Robin Holt tried to boot a 16TB machine and found some limits were reached : sysctl_tcp_mem[2], sysctl_udp_mem[2] We can switch infrastructure to use long "instead" of "int", now atomic_long_t primitives are available for free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Robin Holt <holt@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25net: add __rcu annotation to sk_filterEric Dumazet
Add __rcu annotation to : (struct sock)->sk_filter And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/main.c
2010-09-08udp: add rehash on connect()Eric Dumazet
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-19net: simplify flags for tx timestampingOliver Hartkopp
This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data. The access of the different union elements at several places led to some confusion about accessing the shared tx_flags e.g. in skb_orphan_try(). http://marc.info/?l=linux-netdev&m=128084897415886&w=2 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10net-next: remove useless union keywordChangli Gao
remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31net: sock_queue_err_skb() dont mess with sk_forward_allocEric Dumazet
Correct sk_forward_alloc handling for error_queue would need to use a backlog of frames that softirq handler could not deliver because socket is owned by user thread. Or extend backlog processing to be able to process normal and error packets. Another possibility is to not use mem charge for error queue, this is what I implemented in this patch. Note: this reverts commit 29030374 (net: fix sk_forward_alloc corruptions), since we dont need to lock socket anymore. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller