Age | Commit message (Collapse) | Author |
|
merge-linux-linaro-core-tracking
Conflicting files:
|
|
Currently the conntrack checks if the ending sequence of a packet
falls within the observed receive window. However it does so even
if it has not observe any packet from the remote yet and uses an
uninitialized receive window (td_maxwin).
If a connection uses Fast Open to send a SYN-data packet which is
dropped afterward in the network. The subsequent SYNs retransmits
will all fail this check and be discarded, leading to a connection
timeout. This is because the SYN retransmit does not contain data
payload so
end == initial sequence number (isn) + 1
sender->td_end == isn + syn_data_len
receiver->td_maxwin == 0
The fix is to only apply this check after td_maxwin is initialized.
Reported-by: Michael Chan <mcfchan@stanford.edu>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Tixy reported build issues with netfilter options enabled and
sure enough it looks like I mucked things up in the merge.
Tixy sent along a fixup patch and I took my own swipe at it,
and the solutions were identical, so I suspect this is the right
fix.
Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
These structs have a "_pad" member. Also the "phw" structs have an 8
byte "hw_addr[]" array but sometimes only the first 6 bytes are
initialized.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fix a possible off by one access since optlen()
touches opt[offset+1] unsafely when i == tcp_hdrlen(skb) - 1.
This patch replaces tcp_hdrlen() by the local variable tcp_hdrlen
that stores the TCP header length, to save some cycles.
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Make sure the packet has enough room for the TCP header and
that it is not malformed.
While at it, store tcph->doff*4 in a variable, as it is used
several times.
This patch also fixes a possible off by one in case of malformed
TCP options.
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Linux 3.11-rc1
Merged in v3.11-rc1 and fixed up a bunch of conflicts
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
commit 681f130f39e10 ("netfilter: xt_socket: add XT_SOCKET_NOWILDCARD
flag") added a potential NULL dereference if an old iptables package
uses v0 of the match.
Fix this by removing the test on @info in fast path.
IPv6 can remove the test as well, as it uses v1 or v2.
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nf_ct_expect_alloc leaves unset the expectation NAT fields. However,
ctnetlink_exp_dump_expect expects them to be zeroed in case they are
not used, which may not be the case. This results in dumping the NAT
tuple of the expectation when it should not.
Fix it by zeroing the NAT fields of the expectation.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Conflicts:
drivers/net/ethernet/freescale/fec_main.c
drivers/net/ethernet/renesas/sh_eth.c
net/ipv4/gre.c
The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.
The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.
Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Stop using obsolete procfs api.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
|
|
- Stop using obsolete create_proc_entry api.
- Use proc_set_user instead of directly accessing the private structure.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
|
|
Since (41063e9 ipv4: Early TCP socket demux), skb's can have an sk which
is not a struct sock but the smaller struct inet_timewait_sock without an
sk->sk_socket. Now we bypass sk_state == TCP_TIME_WAIT
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
Some of the printks are in the packet handling path.
We now ratelimit the very unlikely errors to avoid
kmsg spamming.
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
In the past it would always ignore interfaces with loopback addresses.
Now we just treat them like any other.
This also helps with writing tests that check for the presence
of the qtaguid module.
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
In the past the iface_stat_fmt would only show global bytes/packets
for the skb-based numbers.
For stall detection in userspace, distinguishing tcp vs other protocols
makes it easier.
Now we report
ifname total_skb_rx_bytes total_skb_rx_packets total_skb_tx_bytes
total_skb_tx_packets {rx,tx}_{tcp,udp,ohter}_{bytes,packets}
Bug: 6818637
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
qtaguid limits what can be done with /ctrl and /stats based on group
membership.
This changes removes AID_NET_BW_STATS and AID_NET_BW_ACCT, and picks
up the groups from the gid of the matching proc entry files.
Signed-off-by: JP Abgrall <jpa@google.com>
Change-Id: I42e477adde78a12ed5eb58fbc0b277cdaadb6f94
|
|
If create_if_tag_stat fails to allocate memory (GFP_ATOMIC) the
following will happen:
qtaguid: iface_stat: tag stat alloc failed
...
kernel BUG at xt_qtaguid.c:1482!
Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
|
|
qtudev_open() could return with a uid_tag_data_tree_lock held
when an kzalloc(..., GFP_ATOMIC) would fail.
Very unlikely to get triggered AND survive the mayhem of running out of mem.
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
In the past, a process could only see its own stats (uid-based summary,
and details).
Now we allow any process to see other UIDs uid-based stats, but still
hide the detailed stats.
Change-Id: I7666961ed244ac1d9359c339b048799e5db9facc
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
Change-Id: Iaeca5dd2d7878c0733923ae03309a2a7b86979ca
Signed-off-by: Ashish Sharma <ashishsharma@google.com>
|
|
qtaguid tracks the device stats by monitoring when it goes up and down,
then it gets the dev_stats().
But devs don't correctly report stats (either they don't count headers
symmetrically between rx/tx, or they count internal control messages).
Now qtaguid counts the rx/tx bytes/packets during raw:prerouting and
mangle:postrouting (nat is not available in ipv6).
The results are in
/proc/net/xt_qtaguid/iface_stat_fmt
which outputs a format line (bash expansion):
ifname total_skb_{rx,tx}_{bytes,packets}
Added event counters for pre/post handling.
Added extra ctrl_*() pid/uid debugging.
Change-Id: Id84345d544ad1dd5f63e3842cab229e71d339297
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
Send notifications when the label becomes active after an idle period.
Send netlink message notifications in addition to sysfs notifications.
Using a uevent with
subsystem=xt_idletimer
INTERFACE=...
STATE={active,inactive}
This is backport from common android-3.0
commit: beb914e987cbbd368988d2b94a6661cb907c4d5a
with uevent support instead of a new netlink message type.
Change-Id: I31677ef00c94b5f82c8457e5bf9e5e584c23c523
Signed-off-by: Ashish Sharma <ashishsharma@google.com>
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
When updating the stats for a given uid it would incorrectly assume
IPV4 and pick up the wrong protocol when IPV6.
Change-Id: Iea4a635012b4123bf7aa93809011b7b2040bb3d5
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
There was a case that might have seemed like new_tag_stat was not
initialized and actually used.
Added comment explaining why it was impossible, and a BUG()
in case the logic gets changed.
Change-Id: I1eddd1b6f754c08a3bf89f7e9427e5dce1dfb081
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
The xt_quota2 came from
http://sourceforge.net/projects/xtables-addons/develop
It needed tweaking for it to compile within the kernel tree.
Fixed kmalloc() and create_proc_entry() invocations within
a non-interruptible context.
Removed useless copying of current quota back to the iptable's
struct matchinfo:
- those are per CPU: they will change randomly based on which
cpu gets to update the value.
- they prevent matching a rule: e.g.
-A chain -m quota2 --name q1 --quota 123
can't be followed by
-D chain -m quota2 --name q1 --quota 123
as the 123 will be compared to the struct matchinfo's quota member.
Use the NETLINK NETLINK_NFLOG family to log a single message
when the quota limit is reached.
It uses the same packet type as ipt_ULOG, but
- never copies skb data,
- uses 112 as the event number (ULOG's +1)
It doesn't log if the module param "event_num" is 0.
Change-Id: I021d3b743db3b22158cc49acb5c94d905b501492
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
The original xt_quota in the kernel is plain broken:
- counts quota at a per CPU level
(was written back when ubiquitous SMP was just a dream)
- provides no way to count across IPV4/IPV6.
This patch is the original unaltered code from:
http://sourceforge.net/projects/xtables-addons
at commit e84391ce665cef046967f796dd91026851d6bbf3
Change-Id: I19d49858840effee9ecf6cff03c23b45a97efdeb
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
This module allows tracking stats at the socket level for given UIDs.
It replaces xt_owner.
If the --uid-owner is not specified, it will just count stats based on
who the skb belongs to. This will even happen on incoming skbs as it
looks into the skb via xt_socket magic to see who owns it.
If an skb is lost, it will be assigned to uid=0.
To control what sockets of what UIDs are tagged by what, one uses:
echo t $sock_fd $accounting_tag $the_billed_uid \
> /proc/net/xt_qtaguid/ctrl
So whenever an skb belongs to a sock_fd, it will be accounted against
$the_billed_uid
and matching stats will show up under the uid with the given
$accounting_tag.
Because the number of allocations for the stats structs is not that big:
~500 apps * 32 per app
we'll just do it atomic. This avoids walking lists many times, and
the fancy worker thread handling. Slabs will grow when needed later.
It use netdevice and inetaddr notifications instead of hooks in the core dev
code to track when a device comes and goes. This removes the need for
exposed iface_stat.h.
Put procfs dirs in /proc/net/xt_qtaguid/
ctrl
stats
iface_stat/<iface>/...
The uid stats are obtainable in ./stats.
Change-Id: I01af4fd91c8de651668d3decb76d9bdc1e343919
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
The socket matching function has some nifty logic to get the struct sock
from the skb or from the connection tracker.
We export this so other xt_* can use it, similarly to ho how
xt_socket uses nf_tproxy_get_sock.
Change-Id: I11c58f59087e7f7ae09e4abd4b937cd3370fa2fd
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
The common case is that TCP/IP checksums have already been
verified, e.g. by hardware (rx checksum offload), or conntrack.
Userspace can use this flag to determine when the checksum
has not been validated yet.
If the flag is set, this doesn't necessarily mean that the packet has
an invalid checksum, e.g. if NIC doesn't support rx checksum.
Userspace that sucessfully enabled NFQA_CFG_F_GSO queue feature flag can
infer that IP/TCP checksum has already been validated if either the
SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED
flag is unset.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add sync_persist_mode flag to reduce sync traffic
by syncing only persistent templates.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
By default the SH scheduler rejects connections that are hashed onto a
realserver of weight 0. This patch adds a flag to make SH choose a
different realserver in this case, instead of rejecting the connection.
The patch also adds a flag to make SH include the source port (TCP, UDP,
SCTP) in the hash as well as the source address. This basically allows
for deterministic round-robin load balancing (i.e., where any director
in a cluster of directors with identical config will send the same
packet the same way).
The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options
can be set per service. They are set using a new option to ipvsadm.
Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Drop SCTP connections under load (dropentry context) depending
on the protocol state, just like for TCP: INIT conns are
dropped immediately, established are dropped randomly while
connections in progress or shutdown are skipped.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Convert the SCTP state table, so that it is more readable.
Change the states to be according to the diagram in RFC 2960
and add more states suitable for middle box. Still, such
change in states adds incompatibility if systems in sync
setup include this change and others do not include it.
With this change we also have proper transitions in INPUT-ONLY
mode (DR/TUN) where we see packets only from client. Now
we should not switch to 10-second CLOSED state at a time
when we should stay in ESTABLISHED state.
The short names for states are because we have 16-char space
in ipvsadm and 11-char limit for the connection list format.
It is a sequence of the TCP implementation where the longest
state name is ESTABLISHED.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
This adds support for sloppy TCP and SCTP modes to IPVS.
When enabled (sysctls net.ipv4.vs.sloppy_tcp and
net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any
packet, not just a TCP SYN (or SCTP INIT).
This allows connections to fail over from one IPVS director to another
mid-flight.
Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Before now the schedulers needed access only to IP
addresses and it was easy to get them from skb by
using ip_vs_fill_iph_addr_only.
New changes for the SH scheduler will need the protocol
and ports which is difficult to get from skb for the
IPv6 case. As we have all the data in the iph structure,
to avoid the same slow lookups provide the iph to schedulers.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
commit 0ceabd83875b72a29f33db4ab703d6ba40ea4c58
(netfilter: ctnetlink: deliver labels to userspace) sets the event bit
when we raced with another packet, instead of raising the event bit
when the label bit is set for the first time.
commit 9b21f6a90924dfe8e5e686c314ddb441fb06501e
(netfilter: ctnetlink: allow userspace to modify labels) forgot to update
the event mask in the "conntrack already exists" case.
Both issues result in CTA_LABELS attribute not getting included in the
conntrack event.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In (b20ab9c netfilter: nf_ct_helper: better logging for dropped packets)
there were some missing brackets around the logging information, thus
always returning drop.
Closes https://bugzilla.kernel.org/show_bug.cgi?id=60061
Signed-off-by: Balazs Peter Odor <balazs@obiserver.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
xt_socket module can be a nice replacement to conntrack module
in some cases (SYN filtering for example)
But it lacks the ability to match the 3rd packet of TCP
handshake (ACK coming from the client).
Add a XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism.
The wildcard is the legacy socket match behavior, that ignores
LISTEN sockets bound to INADDR_ANY (or ipv6 equivalent)
iptables -I INPUT -p tcp --syn -j SYN_CHAIN
iptables -I INPUT -m socket --nowildcard -j ACCEPT
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When loose tracking is enabled (default), non-syn packets cause
creation of new conntracks in established state with default timeout for
established state (5 days). This causes the table to fill up with UNREPLIED
when the 'new ack' packet happened to be the last-ack of a previous,
already timed-out connection.
Consider:
A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255
B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123
<61 second pause>
C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123
D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255
B moves conntrack to CLOSE_WAIT and will kill it after 60 second timeout,
C is ignored (FIN set), but last packet (D) causes new ct with 5-days timeout.
Use UNACK timeout (5 minutes) instead to get rid of these entries sooner
when in ESTABLISHED state without having seen traffic in both directions.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
These are the only calls under net/ that do not check nla_parse_nested()
for its error code, but simply continue execution. If parsing of netlink
attributes fails, we should return with an error instead of continuing.
In nearly all of these calls we have a policy attached, that is being
type verified during nla_parse_nested(), which we would miss checking
for otherwise.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Conflicts:
drivers/net/wireless/ath/ath9k/Kconfig
drivers/net/xen-netback/netback.c
net/batman-adv/bat_iv_ogm.c
net/wireless/nl80211.c
The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.
The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().
Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.
The nl80211 conflict is a little more involved. In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported. Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.
However, the dump handlers to not use this logic. Instead they have
to explicitly do the locking. There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so. So I fixed that whilst handling the overlapping changes.
To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure that SCTP ports are writable when embedded in ICMP
from client, so that ip_vs_nat_icmp can translate them safely.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Reduce the uses of this unnecessary typedef.
Done via perl script:
$ git grep --name-only -w ctl_table net | \
xargs perl -p -i -e '\
sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'
Reflow the modified lines that now exceed 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to commit bc6bcb59 ("netfilter: xt_TCPOPTSTRIP: fix
possible mangling beyond packet boundary"), add safe fragment
handling to xt_TCPMSS.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As a followup to commit 409b545a ("netfilter: xt_TCPMSS: Fix violation
of RFC879 in absence of MSS option"), John Heffner points out that IPv6
has a higher MTU than IPv4, and thus a higher minimum MSS. Update TCPMSS
target to account for this, and update RFC comment.
While at it, point to more recent reference RFC1122 instead of RFC879.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
struct gnet_stats_rate_est contains u32 fields, so the bytes per second
field can wrap at 34360Mbit.
Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
and switch the kernel to use this structure natively.
This structure is dumped to user space as a new attribute :
TCA_STATS_RATE_EST64
Old tc command will now display the capped bps (to 34360Mbit), instead
of wrapped values, and updated tc command will display correct
information.
Old tc command output, after patch :
eric:~# tc -s -d qd sh dev lo
qdisc pfifo 8001: root refcnt 2 limit 1000p
Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
rate 34360Mbit 189696pps backlog 0b 0p requeues 0
This patch carefully reorganizes "struct Qdisc" layout to get optimal
performance on SMP.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In (bc6bcb5 netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond
packet boundary), the use of tcp_hdr was introduced. However, we
cannot assume that skb->transport_header is set for non-local packets.
Cc: Florian Westphal <fw@strlen.de>
Reported-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The entry struct has a 2 byte hole after ->port and another 4 byte
hole after ->stats.outpkts. You must have CAP_NET_ADMIN in your
namespace to hit this information leak.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Locally generated IPv4 and IPv6 traffic gets skb->protocol unset,
thus passing zero.
ip6tables -I OUTPUT -j NFQUEUE
libmnl/examples/netfilter# ./nf-queue 0 &
ping6 ::1
packet received (id=1 hw=0x0000 hook=3)
^^^^^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|