summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
2013-08-20Automatically merging tracking-linaro-android-3.11 into ↵Andrey Konovalov
merge-linux-linaro-core-tracking Conflicting files:
2013-08-10netfilter: nf_conntrack: fix tcp_in_window for Fast OpenYuchung Cheng
Currently the conntrack checks if the ending sequence of a packet falls within the observed receive window. However it does so even if it has not observe any packet from the remote yet and uses an uninitialized receive window (td_maxwin). If a connection uses Fast Open to send a SYN-data packet which is dropped afterward in the network. The subsequent SYNs retransmits will all fail this check and be discarded, leading to a connection timeout. This is because the SYN retransmit does not contain data payload so end == initial sequence number (isn) + 1 sender->td_end == isn + syn_data_len receiver->td_maxwin == 0 The fix is to only apply this check after td_maxwin is initialized. Reported-by: Michael Chan <mcfchan@stanford.edu> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05Fix 3.11-rc1 xt_sockets mismergeJohn Stultz
Tixy reported build issues with netfilter options enabled and sure enough it looks like I mucked things up in the merge. Tixy sent along a fixup patch and I took my own swipe at it, and the solutions were identical, so I suspect this is the right fix. Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-08-05netfilter: nfnetlink_{log,queue}: fix information leaks in netlink messageDan Carpenter
These structs have a "_pad" member. Also the "phw" structs have an 8 byte "hw_addr[]" array but sometimes only the first 6 bytes are initialized. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01netfilter: xt_TCPOPTSTRIP: fix possible off by one accessPablo Neira Ayuso
Fix a possible off by one access since optlen() touches opt[offset+1] unsafely when i == tcp_hdrlen(skb) - 1. This patch replaces tcp_hdrlen() by the local variable tcp_hdrlen that stores the TCP header length, to save some cycles. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01netfilter: xt_TCPMSS: fix handling of malformed TCP header and optionsPablo Neira Ayuso
Make sure the packet has enough room for the TCP header and that it is not malformed. While at it, store tcph->doff*4 in a variable, as it is used several times. This patch also fixes a possible off by one in case of malformed TCP options. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-17Merge tag 'v3.11-rc1' into linaro-android-3.11-experimental-mergetracking-linaro-android-3.11-llct-20130801.0tracking-linaro-android-3.11-llct-20130723.0tracking-linaro-android-3.11-llct-20130718.0John Stultz
Linux 3.11-rc1 Merged in v3.11-rc1 and fixed up a bunch of conflicts Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-15netfilter: xt_socket: fix broken v0 supportEric Dumazet
commit 681f130f39e10 ("netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flag") added a potential NULL dereference if an old iptables package uses v0 of the match. Fix this by removing the test on @info in fast path. IPv6 can remove the test as well, as it uses v1 or v2. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-15netfilter: ctnetlink: fix incorrect NAT expectation dumpingPablo Neira Ayuso
nf_ct_expect_alloc leaves unset the expectation NAT fields. However, ctnetlink_exp_dump_expect expects them to be zeroed in case they are not used, which may not be the case. This results in dumping the NAT tuple of the expectation when it should not. Fix it by zeroing the NAT fields of the expectation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into seperate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit add bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01netfilter: xt_qtaguid: 3.10 fixesArve Hjønnevåg
Stop using obsolete procfs api. Signed-off-by: Arve Hjønnevåg <arve@android.com>
2013-07-01netfilter: xt_quota2: 3.10 fixes.Arve Hjønnevåg
- Stop using obsolete create_proc_entry api. - Use proc_set_user instead of directly accessing the private structure. Signed-off-by: Arve Hjønnevåg <arve@android.com>
2013-07-01netfilter: xt_qtaguid: fix bad tcp_time_wait sock handlingJP Abgrall
Since (41063e9 ipv4: Early TCP socket demux), skb's can have an sk which is not a struct sock but the smaller struct inet_timewait_sock without an sk->sk_socket. Now we bypass sk_state == TCP_TIME_WAIT Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: qtaguid: rate limit some of the printksJP Abgrall
Some of the printks are in the packet handling path. We now ratelimit the very unlikely errors to avoid kmsg spamming. Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: xt_qtaguid: Allow tracking loopbackJP Abgrall
In the past it would always ignore interfaces with loopback addresses. Now we just treat them like any other. This also helps with writing tests that check for the presence of the qtaguid module. Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: xt_qtaguid: extend iface stat to report protocolsJP Abgrall
In the past the iface_stat_fmt would only show global bytes/packets for the skb-based numbers. For stall detection in userspace, distinguishing tcp vs other protocols makes it easier. Now we report ifname total_skb_rx_bytes total_skb_rx_packets total_skb_tx_bytes total_skb_tx_packets {rx,tx}_{tcp,udp,ohter}_{bytes,packets} Bug: 6818637 Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: xt_qtaguid: remove AID_* dependency for access controlJP Abgrall
qtaguid limits what can be done with /ctrl and /stats based on group membership. This changes removes AID_NET_BW_STATS and AID_NET_BW_ACCT, and picks up the groups from the gid of the matching proc entry files. Signed-off-by: JP Abgrall <jpa@google.com> Change-Id: I42e477adde78a12ed5eb58fbc0b277cdaadb6f94
2013-07-01netfilter: qtaguid: Don't BUG_ON if create_if_tag_stat failsPontus Fuchs
If create_if_tag_stat fails to allocate memory (GFP_ATOMIC) the following will happen: qtaguid: iface_stat: tag stat alloc failed ... kernel BUG at xt_qtaguid.c:1482! Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
2013-07-01netfilter: xt_qtaguid: fix error exit that would keep a spinlock.JP Abgrall
qtudev_open() could return with a uid_tag_data_tree_lock held when an kzalloc(..., GFP_ATOMIC) would fail. Very unlikely to get triggered AND survive the mayhem of running out of mem. Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: xt_qtaguid: report only uid tags to non-privileged processesJP Abgrall
In the past, a process could only see its own stats (uid-based summary, and details). Now we allow any process to see other UIDs uid-based stats, but still hide the detailed stats. Change-Id: I7666961ed244ac1d9359c339b048799e5db9facc Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: xt_IDLETIMER: Rename INTERFACE to LABEL in netlink notification.Ashish Sharma
Change-Id: Iaeca5dd2d7878c0733923ae03309a2a7b86979ca Signed-off-by: Ashish Sharma <ashishsharma@google.com>
2013-07-01netfilter: xt_qtaguid: start tracking iface rx/tx at low levelJP Abgrall
qtaguid tracks the device stats by monitoring when it goes up and down, then it gets the dev_stats(). But devs don't correctly report stats (either they don't count headers symmetrically between rx/tx, or they count internal control messages). Now qtaguid counts the rx/tx bytes/packets during raw:prerouting and mangle:postrouting (nat is not available in ipv6). The results are in /proc/net/xt_qtaguid/iface_stat_fmt which outputs a format line (bash expansion): ifname total_skb_{rx,tx}_{bytes,packets} Added event counters for pre/post handling. Added extra ctrl_*() pid/uid debugging. Change-Id: Id84345d544ad1dd5f63e3842cab229e71d339297 Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: xt_IDLETIMER: Add new netlink msg typeJP Abgrall
Send notifications when the label becomes active after an idle period. Send netlink message notifications in addition to sysfs notifications. Using a uevent with subsystem=xt_idletimer INTERFACE=... STATE={active,inactive} This is backport from common android-3.0 commit: beb914e987cbbd368988d2b94a6661cb907c4d5a with uevent support instead of a new netlink message type. Change-Id: I31677ef00c94b5f82c8457e5bf9e5e584c23c523 Signed-off-by: Ashish Sharma <ashishsharma@google.com> Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: xt_qtaguid: fix ipv6 protocol lookupJP Abgrall
When updating the stats for a given uid it would incorrectly assume IPV4 and pick up the wrong protocol when IPV6. Change-Id: Iea4a635012b4123bf7aa93809011b7b2040bb3d5 Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: qtaguid: initialize a local var to keep compiler happy.JP Abgrall
There was a case that might have seemed like new_tag_stat was not initialized and actually used. Added comment explaining why it was impossible, and a BUG() in case the logic gets changed. Change-Id: I1eddd1b6f754c08a3bf89f7e9427e5dce1dfb081 Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: fixup the quota2, and enable.JP Abgrall
The xt_quota2 came from http://sourceforge.net/projects/xtables-addons/develop It needed tweaking for it to compile within the kernel tree. Fixed kmalloc() and create_proc_entry() invocations within a non-interruptible context. Removed useless copying of current quota back to the iptable's struct matchinfo: - those are per CPU: they will change randomly based on which cpu gets to update the value. - they prevent matching a rule: e.g. -A chain -m quota2 --name q1 --quota 123 can't be followed by -D chain -m quota2 --name q1 --quota 123 as the 123 will be compared to the struct matchinfo's quota member. Use the NETLINK NETLINK_NFLOG family to log a single message when the quota limit is reached. It uses the same packet type as ipt_ULOG, but - never copies skb data, - uses 112 as the event number (ULOG's +1) It doesn't log if the module param "event_num" is 0. Change-Id: I021d3b743db3b22158cc49acb5c94d905b501492 Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: adding the original quota2 from xtables-addonsJP Abgrall
The original xt_quota in the kernel is plain broken: - counts quota at a per CPU level (was written back when ubiquitous SMP was just a dream) - provides no way to count across IPV4/IPV6. This patch is the original unaltered code from: http://sourceforge.net/projects/xtables-addons at commit e84391ce665cef046967f796dd91026851d6bbf3 Change-Id: I19d49858840effee9ecf6cff03c23b45a97efdeb Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01netfilter: add xt_qtaguid matching moduleJP Abgrall
This module allows tracking stats at the socket level for given UIDs. It replaces xt_owner. If the --uid-owner is not specified, it will just count stats based on who the skb belongs to. This will even happen on incoming skbs as it looks into the skb via xt_socket magic to see who owns it. If an skb is lost, it will be assigned to uid=0. To control what sockets of what UIDs are tagged by what, one uses: echo t $sock_fd $accounting_tag $the_billed_uid \ > /proc/net/xt_qtaguid/ctrl So whenever an skb belongs to a sock_fd, it will be accounted against $the_billed_uid and matching stats will show up under the uid with the given $accounting_tag. Because the number of allocations for the stats structs is not that big: ~500 apps * 32 per app we'll just do it atomic. This avoids walking lists many times, and the fancy worker thread handling. Slabs will grow when needed later. It use netdevice and inetaddr notifications instead of hooks in the core dev code to track when a device comes and goes. This removes the need for exposed iface_stat.h. Put procfs dirs in /proc/net/xt_qtaguid/ ctrl stats iface_stat/<iface>/... The uid stats are obtainable in ./stats. Change-Id: I01af4fd91c8de651668d3decb76d9bdc1e343919 Signed-off-by: JP Abgrall <jpa@google.com>
2013-07-01nf: xt_socket: export the fancy sock finder codeJP Abgrall
The socket matching function has some nifty logic to get the struct sock from the skb or from the connection tracker. We export this so other xt_* can use it, similarly to ho how xt_socket uses nf_tproxy_get_sock. Change-Id: I11c58f59087e7f7ae09e4abd4b937cd3370fa2fd Signed-off-by: JP Abgrall <jpa@google.com>
2013-06-30netfilter: nf_queue: add NFQA_SKB_CSUM_NOTVERIFIED info flagFlorian Westphal
The common case is that TCP/IP checksums have already been verified, e.g. by hardware (rx checksum offload), or conntrack. Userspace can use this flag to determine when the checksum has not been validated yet. If the flag is set, this doesn't necessarily mean that the packet has an invalid checksum, e.g. if NIC doesn't support rx checksum. Userspace that sucessfully enabled NFQA_CFG_F_GSO queue feature flag can infer that IP/TCP checksum has already been validated if either the SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED flag is unset. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-26ipvs: add sync_persist_mode flagJulian Anastasov
Add sync_persist_mode flag to reduce sync traffic by syncing only persistent templates. Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26ipvs: SH fallback and L4 hashingAlexander Frolkin
By default the SH scheduler rejects connections that are hashed onto a realserver of weight 0. This patch adds a flag to make SH choose a different realserver in this case, instead of rejecting the connection. The patch also adds a flag to make SH include the source port (TCP, UDP, SCTP) in the hash as well as the source address. This basically allows for deterministic round-robin load balancing (i.e., where any director in a cluster of directors with identical config will send the same packet the same way). The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options can be set per service. They are set using a new option to ipvsadm. Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26ipvs: drop SCTP connections depending on stateJulian Anastasov
Drop SCTP connections under load (dropentry context) depending on the protocol state, just like for TCP: INIT conns are dropped immediately, established are dropped randomly while connections in progress or shutdown are skipped. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26ipvs: replace the SCTP state machineJulian Anastasov
Convert the SCTP state table, so that it is more readable. Change the states to be according to the diagram in RFC 2960 and add more states suitable for middle box. Still, such change in states adds incompatibility if systems in sync setup include this change and others do not include it. With this change we also have proper transitions in INPUT-ONLY mode (DR/TUN) where we see packets only from client. Now we should not switch to 10-second CLOSED state at a time when we should stay in ESTABLISHED state. The short names for states are because we have 16-char space in ipvsadm and 11-char limit for the connection list format. It is a sequence of the TCP implementation where the longest state name is ESTABLISHED. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26ipvs: sloppy TCP and SCTPAlexander Frolkin
This adds support for sloppy TCP and SCTP modes to IPVS. When enabled (sysctls net.ipv4.vs.sloppy_tcp and net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any packet, not just a TCP SYN (or SCTP INIT). This allows connections to fail over from one IPVS director to another mid-flight. Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26ipvs: provide iph to schedulersJulian Anastasov
Before now the schedulers needed access only to IP addresses and it was easy to get them from skb by using ip_vs_fill_iph_addr_only. New changes for the SH scheduler will need the protocol and ports which is difficult to get from skb for the IPv6 case. As we have all the data in the iph structure, to avoid the same slow lookups provide the iph to schedulers. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-24netfilter: ctnetlink: send event when conntrack label was modifiedFlorian Westphal
commit 0ceabd83875b72a29f33db4ab703d6ba40ea4c58 (netfilter: ctnetlink: deliver labels to userspace) sets the event bit when we raced with another packet, instead of raising the event bit when the label bit is set for the first time. commit 9b21f6a90924dfe8e5e686c314ddb441fb06501e (netfilter: ctnetlink: allow userspace to modify labels) forgot to update the event mask in the "conntrack already exists" case. Both issues result in CTA_LABELS attribute not getting included in the conntrack event. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-24netfilter: nf_nat_sip: fix manglingBalazs Peter Odor
In (b20ab9c netfilter: nf_ct_helper: better logging for dropped packets) there were some missing brackets around the logging information, thus always returning drop. Closes https://bugzilla.kernel.org/show_bug.cgi?id=60061 Signed-off-by: Balazs Peter Odor <balazs@obiserver.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flagEric Dumazet
xt_socket module can be a nice replacement to conntrack module in some cases (SYN filtering for example) But it lacks the ability to match the 3rd packet of TCP handshake (ACK coming from the client). Add a XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism. The wildcard is the legacy socket match behavior, that ignores LISTEN sockets bound to INADDR_ANY (or ipv6 equivalent) iptables -I INPUT -p tcp --syn -j SYN_CHAIN iptables -I INPUT -m socket --nowildcard -j ACCEPT Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20netfilter: nf_conntrack: avoid large timeout for mid-stream pickupFlorian Westphal
When loose tracking is enabled (default), non-syn packets cause creation of new conntracks in established state with default timeout for established state (5 days). This causes the table to fill up with UNREPLIED when the 'new ack' packet happened to be the last-ack of a previous, already timed-out connection. Consider: A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255 B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123 <61 second pause> C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123 D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255 B moves conntrack to CLOSE_WAIT and will kill it after 60 second timeout, C is ignored (FIN set), but last packet (D) causes new ct with 5-days timeout. Use UNACK timeout (5 minutes) instead to get rid of these entries sooner when in ESTABLISHED state without having seen traffic in both directions. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20netfilter: check return code from nla_parse_testedDaniel Borkmann
These are the only calls under net/ that do not check nla_parse_nested() for its error code, but simply continue execution. If parsing of netlink attributes fails, we should return with an error instead of continuing. In nearly all of these calls we have a policy attached, that is being type verified during nla_parse_nested(), which we would miss checking for otherwise. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/ath/ath9k/Kconfig drivers/net/xen-netback/netback.c net/batman-adv/bat_iv_ogm.c net/wireless/nl80211.c The ath9k Kconfig conflict was a change of a Kconfig option name right next to the deletion of another option. The xen-netback conflict was overlapping changes involving the handling of the notify list in xen_netbk_rx_action(). Batman conflict resolution provided by Antonio Quartulli, basically keep everything in both conflict hunks. The nl80211 conflict is a little more involved. In 'net' we added a dynamic memory allocation to nl80211_dump_wiphy() to fix a race that Linus reported. Meanwhile in 'net-next' the handlers were converted to use pre and post doit handlers which use a flag to determine whether to hold the RTNL mutex around the operation. However, the dump handlers to not use this logic. Instead they have to explicitly do the locking. There were apparent bugs in the conversion of nl80211_dump_wiphy() in that we were not dropping the RTNL mutex in all the return paths, and it seems we very much should be doing so. So I fixed that whilst handling the overlapping changes. To simplify the initial returns, I take the RTNL mutex after we try to allocate 'tb'. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19ipvs: SCTP ports should be writable in ICMP packetsJulian Anastasov
Make sure that SCTP ports are writable when embedded in ICMP from client, so that ip_vs_nat_icmp can translate them safely. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-13net: Convert uses of typedef ctl_table to struct ctl_tableJoe Perches
Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net | \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12netfilter: xt_TCPMSS: Fix missing fragmentation handlingPhil Oester
Similar to commit bc6bcb59 ("netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond packet boundary"), add safe fragment handling to xt_TCPMSS. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-12netfilter: xt_TCPMSS: Fix IPv6 default MSS tooPhil Oester
As a followup to commit 409b545a ("netfilter: xt_TCPMSS: Fix violation of RFC879 in absence of MSS option"), John Heffner points out that IPv6 has a higher MTU than IPv4, and thus a higher minimum MSS. Update TCPMSS target to account for this, and update RFC comment. While at it, point to more recent reference RFC1122 instead of RFC879. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-11net_sched: add 64bit rate estimatorsEric Dumazet
struct gnet_stats_rate_est contains u32 fields, so the bytes per second field can wrap at 34360Mbit. Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields, and switch the kernel to use this structure natively. This structure is dumped to user space as a new attribute : TCA_STATS_RATE_EST64 Old tc command will now display the capped bps (to 34360Mbit), instead of wrapped values, and updated tc command will display correct information. Old tc command output, after patch : eric:~# tc -s -d qd sh dev lo qdisc pfifo 8001: root refcnt 2 limit 1000p Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0) rate 34360Mbit 189696pps backlog 0b 0p requeues 0 This patch carefully reorganizes "struct Qdisc" layout to get optimal performance on SMP. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11netfilter: xt_TCPOPTSTRIP: don't use tcp_hdr()Pablo Neira Ayuso
In (bc6bcb5 netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond packet boundary), the use of tcp_hdr was introduced. However, we cannot assume that skb->transport_header is set for non-local packets. Cc: Florian Westphal <fw@strlen.de> Reported-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-10ipvs: info leak in __ip_vs_get_dest_entries()Dan Carpenter
The entry struct has a 2 byte hole after ->port and another 4 byte hole after ->stats.outpkts. You must have CAP_NET_ADMIN in your namespace to hit this information leak. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-07netfilter: nfnetlink_queue: fix missing HW protocolPablo Neira Ayuso
Locally generated IPv4 and IPv6 traffic gets skb->protocol unset, thus passing zero. ip6tables -I OUTPUT -j NFQUEUE libmnl/examples/netfilter# ./nf-queue 0 & ping6 ::1 packet received (id=1 hw=0x0000 hook=3) ^^^^^^ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>