aboutsummaryrefslogtreecommitdiff
path: root/tools
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2013-05-01 14:08:52 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2013-05-01 14:08:52 -0700
commit73287a43cc79ca06629a88d1a199cd283f42456a (patch)
treeacf4456e260115bea77ee31a29f10ce17f0db45c /tools
parent251df49db3327c64bf917bfdba94491fde2b4ee0 (diff)
parent20074f357da4a637430aec2879c9d864c5d2c23c (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller: "Highlights (1721 non-merge commits, this has to be a record of some sort): 1) Add 'random' mode to team driver, from Jiri Pirko and Eric Dumazet. 2) Make it so that any driver that supports configuration of multiple MAC addresses can provide the forwarding database add and del calls by providing a default implementation and hooking that up if the driver doesn't have an explicit set of handlers. From Vlad Yasevich. 3) Support GSO segmentation over tunnels and other encapsulating devices such as VXLAN, from Pravin B Shelar. 4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton. 5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita Dukkipati. 6) In the PHY layer, allow supporting wake-on-lan in situations where the PHY registers have to be written for it to be configured. Use it to support wake-on-lan in mv643xx_eth. From Michael Stapelberg. 7) Significantly improve firewire IPV6 support, from YOSHIFUJI Hideaki. 8) Allow multiple packets to be sent in a single transmission using network coding in batman-adv, from Martin Hundebøll. 9) Add support for T5 cxgb4 chips, from Santosh Rastapur. 10) Generalize the VXLAN forwarding tables so that there is more flexibility in configurating various aspects of the endpoints. From David Stevens. 11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver, from Dmitry Kravkov. 12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo Neira Ayuso. 13) Start adding networking selftests. 14) In situations of overload on the same AF_PACKET fanout socket, or per-cpu packet receive queue, minimize drop by distributing the load to other cpus/fanouts. From Willem de Bruijn and Eric Dumazet. 15) Add support for new payload offset BPF instruction, from Daniel Borkmann. 16) Convert several drivers over to mdoule_platform_driver(), from Sachin Kamat. 17) Provide a minimal BPF JIT image disassembler userspace tool, from Daniel Borkmann. 18) Rewrite F-RTO implementation in TCP to match the final specification of it in RFC4138 and RFC5682. From Yuchung Cheng. 19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear you like netlink, so I implemented netlink dumping of netlink sockets.") From Andrey Vagin. 20) Remove ugly passing of rtnetlink attributes into rtnl_doit functions, from Thomas Graf. 21) Allow userspace to be able to see if a configuration change occurs in the middle of an address or device list dump, from Nicolas Dichtel. 22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes Frederic Sowa. 23) Increase accuracy of packet length used by packet scheduler, from Jason Wang. 24) Beginning set of changes to make ipv4/ipv6 fragment handling more scalable and less susceptible to overload and locking contention, from Jesper Dangaard Brouer. 25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*() instead. From Hong Zhiguo. 26) Optimize route usage in IPVS by avoiding reference counting where possible, from Julian Anastasov. 27) Convert IPVS schedulers to RCU, also from Julian Anastasov. 28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger Eitzenberger. 29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG, nfnetlink_log, and nfnetlink_queue. From Gao feng. 30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa. 31) Support several new r8169 chips, from Hayes Wang. 32) Support tokenized interface identifiers in ipv6, from Daniel Borkmann. 33) Use usbnet_link_change() helper in USB net driver, from Ming Lei. 34) Add 802.1ad vlan offload support, from Patrick McHardy. 35) Support mmap() based netlink communication, also from Patrick McHardy. 36) Support HW timestamping in mlx4 driver, from Amir Vadai. 37) Rationalize AF_PACKET packet timestamping when transmitting, from Willem de Bruijn and Daniel Borkmann. 38) Bring parity to what's provided by /proc/net/packet socket dumping and the info provided by netlink socket dumping of AF_PACKET sockets. From Nicolas Dichtel. 39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin Poirier" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) filter: fix va_list build error af_unix: fix a fatal race with bit fields bnx2x: Prevent memory leak when cnic is absent bnx2x: correct reading of speed capabilities net: sctp: attribute printl with __printf for gcc fmt checks netlink: kconfig: move mmap i/o into netlink kconfig netpoll: convert mutex into a semaphore netlink: Fix skb ref counting. net_sched: act_ipt forward compat with xtables mlx4_en: fix a build error on 32bit arches Revert "bnx2x: allow nvram test to run when device is down" bridge: avoid OOPS if root port not found drivers: net: cpsw: fix kernel warn on cpsw irq enable sh_eth: use random MAC address if no valid one supplied 3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA) tg3: fix to append hardware time stamping flags unix/stream: fix peeking with an offset larger than data in queue unix/dgram: fix peeking with an offset larger than data in queue unix/dgram: peek beyond 0-sized skbs openvswitch: Remove unneeded ovs_netdev_get_ifindex() ...
Diffstat (limited to 'tools')
-rw-r--r--tools/Makefile11
-rw-r--r--tools/net/Makefile15
-rw-r--r--tools/net/bpf_jit_disasm.c199
-rw-r--r--tools/testing/selftests/Makefile1
-rw-r--r--tools/testing/selftests/net/.gitignore3
-rw-r--r--tools/testing/selftests/net/Makefile19
-rw-r--r--tools/testing/selftests/net/psock_fanout.c312
-rw-r--r--tools/testing/selftests/net/psock_lib.h127
-rw-r--r--tools/testing/selftests/net/psock_tpacket.c824
-rw-r--r--tools/testing/selftests/net/run_afpackettests26
-rw-r--r--tools/testing/selftests/net/run_netsocktests12
-rw-r--r--tools/testing/selftests/net/socket.c92
12 files changed, 1636 insertions, 5 deletions
diff --git a/tools/Makefile b/tools/Makefile
index 6aaeb6cd867..41067f30421 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -12,6 +12,7 @@ help:
@echo ' turbostat - Intel CPU idle stats and freq reporting tool'
@echo ' usb - USB testing tools'
@echo ' virtio - vhost test module'
+ @echo ' net - misc networking tools'
@echo ' vm - misc vm tools'
@echo ' x86_energy_perf_policy - Intel energy policy tool'
@echo ''
@@ -34,7 +35,7 @@ help:
cpupower: FORCE
$(call descend,power/$@)
-cgroup firewire guest usb virtio vm: FORCE
+cgroup firewire guest usb virtio vm net: FORCE
$(call descend,$@)
liblk: FORCE
@@ -52,7 +53,7 @@ turbostat x86_energy_perf_policy: FORCE
cpupower_install:
$(call descend,power/$(@:_install=),install)
-cgroup_install firewire_install lguest_install perf_install usb_install virtio_install vm_install:
+cgroup_install firewire_install lguest_install perf_install usb_install virtio_install vm_install net_install:
$(call descend,$(@:_install=),install)
selftests_install:
@@ -63,12 +64,12 @@ turbostat_install x86_energy_perf_policy_install:
install: cgroup_install cpupower_install firewire_install lguest_install \
perf_install selftests_install turbostat_install usb_install \
- virtio_install vm_install x86_energy_perf_policy_install
+ virtio_install vm_install net_install x86_energy_perf_policy_install
cpupower_clean:
$(call descend,power/cpupower,clean)
-cgroup_clean firewire_clean lguest_clean usb_clean virtio_clean vm_clean:
+cgroup_clean firewire_clean lguest_clean usb_clean virtio_clean vm_clean net_clean:
$(call descend,$(@:_clean=),clean)
liblk_clean:
@@ -85,6 +86,6 @@ turbostat_clean x86_energy_perf_policy_clean:
clean: cgroup_clean cpupower_clean firewire_clean lguest_clean perf_clean \
selftests_clean turbostat_clean usb_clean virtio_clean \
- vm_clean x86_energy_perf_policy_clean
+ vm_clean net_clean x86_energy_perf_policy_clean
.PHONY: FORCE
diff --git a/tools/net/Makefile b/tools/net/Makefile
new file mode 100644
index 00000000000..b4444d53b73
--- /dev/null
+++ b/tools/net/Makefile
@@ -0,0 +1,15 @@
+prefix = /usr
+
+CC = gcc
+
+all : bpf_jit_disasm
+
+bpf_jit_disasm : CFLAGS = -Wall -O2
+bpf_jit_disasm : LDLIBS = -lopcodes -lbfd -ldl
+bpf_jit_disasm : bpf_jit_disasm.o
+
+clean :
+ rm -rf *.o bpf_jit_disasm
+
+install :
+ install bpf_jit_disasm $(prefix)/bin/bpf_jit_disasm
diff --git a/tools/net/bpf_jit_disasm.c b/tools/net/bpf_jit_disasm.c
new file mode 100644
index 00000000000..cfe0cdcda3d
--- /dev/null
+++ b/tools/net/bpf_jit_disasm.c
@@ -0,0 +1,199 @@
+/*
+ * Minimal BPF JIT image disassembler
+ *
+ * Disassembles BPF JIT compiler emitted opcodes back to asm insn's for
+ * debugging or verification purposes.
+ *
+ * To get the disassembly of the JIT code, do the following:
+ *
+ * 1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
+ * 2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`)
+ * 3) Run e.g. `bpf_jit_disasm -o` to read out the last JIT code
+ *
+ * Copyright 2013 Daniel Borkmann <borkmann@redhat.com>
+ * Licensed under the GNU General Public License, version 2.0 (GPLv2)
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <unistd.h>
+#include <string.h>
+#include <bfd.h>
+#include <dis-asm.h>
+#include <sys/klog.h>
+#include <sys/types.h>
+#include <regex.h>
+
+static void get_exec_path(char *tpath, size_t size)
+{
+ char *path;
+ ssize_t len;
+
+ snprintf(tpath, size, "/proc/%d/exe", (int) getpid());
+ tpath[size - 1] = 0;
+
+ path = strdup(tpath);
+ assert(path);
+
+ len = readlink(path, tpath, size);
+ tpath[len] = 0;
+
+ free(path);
+}
+
+static void get_asm_insns(uint8_t *image, size_t len, unsigned long base,
+ int opcodes)
+{
+ int count, i, pc = 0;
+ char tpath[256];
+ struct disassemble_info info;
+ disassembler_ftype disassemble;
+ bfd *bfdf;
+
+ memset(tpath, 0, sizeof(tpath));
+ get_exec_path(tpath, sizeof(tpath));
+
+ bfdf = bfd_openr(tpath, NULL);
+ assert(bfdf);
+ assert(bfd_check_format(bfdf, bfd_object));
+
+ init_disassemble_info(&info, stdout, (fprintf_ftype) fprintf);
+ info.arch = bfd_get_arch(bfdf);
+ info.mach = bfd_get_mach(bfdf);
+ info.buffer = image;
+ info.buffer_length = len;
+
+ disassemble_init_for_target(&info);
+
+ disassemble = disassembler(bfdf);
+ assert(disassemble);
+
+ do {
+ printf("%4x:\t", pc);
+
+ count = disassemble(pc, &info);
+
+ if (opcodes) {
+ printf("\n\t");
+ for (i = 0; i < count; ++i)
+ printf("%02x ", (uint8_t) image[pc + i]);
+ }
+ printf("\n");
+
+ pc += count;
+ } while(count > 0 && pc < len);
+
+ bfd_close(bfdf);
+}
+
+static char *get_klog_buff(int *klen)
+{
+ int ret, len = klogctl(10, NULL, 0);
+ char *buff = malloc(len);
+
+ assert(buff && klen);
+ ret = klogctl(3, buff, len);
+ assert(ret >= 0);
+ *klen = ret;
+
+ return buff;
+}
+
+static void put_klog_buff(char *buff)
+{
+ free(buff);
+}
+
+static int get_last_jit_image(char *haystack, size_t hlen,
+ uint8_t *image, size_t ilen,
+ unsigned long *base)
+{
+ char *ptr, *pptr, *tmp;
+ off_t off = 0;
+ int ret, flen, proglen, pass, ulen = 0;
+ regmatch_t pmatch[1];
+ regex_t regex;
+
+ if (hlen == 0)
+ return 0;
+
+ ret = regcomp(&regex, "flen=[[:alnum:]]+ proglen=[[:digit:]]+ "
+ "pass=[[:digit:]]+ image=[[:xdigit:]]+", REG_EXTENDED);
+ assert(ret == 0);
+
+ ptr = haystack;
+ while (1) {
+ ret = regexec(&regex, ptr, 1, pmatch, 0);
+ if (ret == 0) {
+ ptr += pmatch[0].rm_eo;
+ off += pmatch[0].rm_eo;
+ assert(off < hlen);
+ } else
+ break;
+ }
+
+ ptr = haystack + off - (pmatch[0].rm_eo - pmatch[0].rm_so);
+ ret = sscanf(ptr, "flen=%d proglen=%d pass=%d image=%lx",
+ &flen, &proglen, &pass, base);
+ if (ret != 4)
+ return 0;
+
+ tmp = ptr = haystack + off;
+ while ((ptr = strtok(tmp, "\n")) != NULL && ulen < ilen) {
+ tmp = NULL;
+ if (!strstr(ptr, "JIT code"))
+ continue;
+ pptr = ptr;
+ while ((ptr = strstr(pptr, ":")))
+ pptr = ptr + 1;
+ ptr = pptr;
+ do {
+ image[ulen++] = (uint8_t) strtoul(pptr, &pptr, 16);
+ if (ptr == pptr || ulen >= ilen) {
+ ulen--;
+ break;
+ }
+ ptr = pptr;
+ } while (1);
+ }
+
+ assert(ulen == proglen);
+ printf("%d bytes emitted from JIT compiler (pass:%d, flen:%d)\n",
+ proglen, pass, flen);
+ printf("%lx + <x>:\n", *base);
+
+ regfree(&regex);
+ return ulen;
+}
+
+int main(int argc, char **argv)
+{
+ int len, klen, opcodes = 0;
+ char *kbuff;
+ unsigned long base;
+ uint8_t image[4096];
+
+ if (argc > 1) {
+ if (!strncmp("-o", argv[argc - 1], 2)) {
+ opcodes = 1;
+ } else {
+ printf("usage: bpf_jit_disasm [-o: show opcodes]\n");
+ exit(0);
+ }
+ }
+
+ bfd_init();
+ memset(image, 0, sizeof(image));
+
+ kbuff = get_klog_buff(&klen);
+
+ len = get_last_jit_image(kbuff, klen, image, sizeof(image), &base);
+ if (len > 0 && base > 0)
+ get_asm_insns(image, len, base, opcodes);
+
+ put_klog_buff(kbuff);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index fa6ea69f2e4..d4abc59ce1d 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -4,6 +4,7 @@ TARGETS += efivarfs
TARGETS += kcmp
TARGETS += memory-hotplug
TARGETS += mqueue
+TARGETS += net
TARGETS += ptrace
TARGETS += soft-dirty
TARGETS += vm
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
new file mode 100644
index 00000000000..00326629d4a
--- /dev/null
+++ b/tools/testing/selftests/net/.gitignore
@@ -0,0 +1,3 @@
+socket
+psock_fanout
+psock_tpacket
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
new file mode 100644
index 00000000000..750512ba2c8
--- /dev/null
+++ b/tools/testing/selftests/net/Makefile
@@ -0,0 +1,19 @@
+# Makefile for net selftests
+
+CC = $(CROSS_COMPILE)gcc
+CFLAGS = -Wall -O2 -g
+
+CFLAGS += -I../../../../usr/include/
+
+NET_PROGS = socket psock_fanout psock_tpacket
+
+all: $(NET_PROGS)
+%: %.c
+ $(CC) $(CFLAGS) -o $@ $^
+
+run_tests: all
+ @/bin/sh ./run_netsocktests || echo "sockettests: [FAIL]"
+ @/bin/sh ./run_afpackettests || echo "afpackettests: [FAIL]"
+
+clean:
+ $(RM) $(NET_PROGS)
diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
new file mode 100644
index 00000000000..57b9c2b7c4f
--- /dev/null
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -0,0 +1,312 @@
+/*
+ * Copyright 2013 Google Inc.
+ * Author: Willem de Bruijn (willemb@google.com)
+ *
+ * A basic test of packet socket fanout behavior.
+ *
+ * Control:
+ * - create fanout fails as expected with illegal flag combinations
+ * - join fanout fails as expected with diverging types or flags
+ *
+ * Datapath:
+ * Open a pair of packet sockets and a pair of INET sockets, send a known
+ * number of packets across the two INET sockets and count the number of
+ * packets enqueued onto the two packet sockets.
+ *
+ * The test currently runs for
+ * - PACKET_FANOUT_HASH
+ * - PACKET_FANOUT_HASH with PACKET_FANOUT_FLAG_ROLLOVER
+ * - PACKET_FANOUT_LB
+ * - PACKET_FANOUT_CPU
+ * - PACKET_FANOUT_ROLLOVER
+ *
+ * Todo:
+ * - functionality: PACKET_FANOUT_FLAG_DEFRAG
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#define _GNU_SOURCE /* for sched_setaffinity */
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/filter.h>
+#include <linux/if_packet.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "psock_lib.h"
+
+#define RING_NUM_FRAMES 20
+
+/* Open a socket in a given fanout mode.
+ * @return -1 if mode is bad, a valid socket otherwise */
+static int sock_fanout_open(uint16_t typeflags, int num_packets)
+{
+ int fd, val;
+
+ fd = socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
+ if (fd < 0) {
+ perror("socket packet");
+ exit(1);
+ }
+
+ /* fanout group ID is always 0: tests whether old groups are deleted */
+ val = ((int) typeflags) << 16;
+ if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val))) {
+ if (close(fd)) {
+ perror("close packet");
+ exit(1);
+ }
+ return -1;
+ }
+
+ pair_udp_setfilter(fd);
+ return fd;
+}
+
+static char *sock_fanout_open_ring(int fd)
+{
+ struct tpacket_req req = {
+ .tp_block_size = getpagesize(),
+ .tp_frame_size = getpagesize(),
+ .tp_block_nr = RING_NUM_FRAMES,
+ .tp_frame_nr = RING_NUM_FRAMES,
+ };
+ char *ring;
+ int val = TPACKET_V2;
+
+ if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, (void *) &val,
+ sizeof(val))) {
+ perror("packetsock ring setsockopt version");
+ exit(1);
+ }
+ if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req,
+ sizeof(req))) {
+ perror("packetsock ring setsockopt");
+ exit(1);
+ }
+
+ ring = mmap(0, req.tp_block_size * req.tp_block_nr,
+ PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (!ring) {
+ fprintf(stderr, "packetsock ring mmap\n");
+ exit(1);
+ }
+
+ return ring;
+}
+
+static int sock_fanout_read_ring(int fd, void *ring)
+{
+ struct tpacket2_hdr *header = ring;
+ int count = 0;
+
+ while (header->tp_status & TP_STATUS_USER && count < RING_NUM_FRAMES) {
+ count++;
+ header = ring + (count * getpagesize());
+ }
+
+ return count;
+}
+
+static int sock_fanout_read(int fds[], char *rings[], const int expect[])
+{
+ int ret[2];
+
+ ret[0] = sock_fanout_read_ring(fds[0], rings[0]);
+ ret[1] = sock_fanout_read_ring(fds[1], rings[1]);
+
+ fprintf(stderr, "info: count=%d,%d, expect=%d,%d\n",
+ ret[0], ret[1], expect[0], expect[1]);
+
+ if ((!(ret[0] == expect[0] && ret[1] == expect[1])) &&
+ (!(ret[0] == expect[1] && ret[1] == expect[0]))) {
+ fprintf(stderr, "ERROR: incorrect queue lengths\n");
+ return 1;
+ }
+
+ return 0;
+}
+
+/* Test illegal mode + flag combination */
+static void test_control_single(void)
+{
+ fprintf(stderr, "test: control single socket\n");
+
+ if (sock_fanout_open(PACKET_FANOUT_ROLLOVER |
+ PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
+ fprintf(stderr, "ERROR: opened socket with dual rollover\n");
+ exit(1);
+ }
+}
+
+/* Test illegal group with different modes or flags */
+static void test_control_group(void)
+{
+ int fds[2];
+
+ fprintf(stderr, "test: control multiple sockets\n");
+
+ fds[0] = sock_fanout_open(PACKET_FANOUT_HASH, 20);
+ if (fds[0] == -1) {
+ fprintf(stderr, "ERROR: failed to open HASH socket\n");
+ exit(1);
+ }
+ if (sock_fanout_open(PACKET_FANOUT_HASH |
+ PACKET_FANOUT_FLAG_DEFRAG, 10) != -1) {
+ fprintf(stderr, "ERROR: joined group with wrong flag defrag\n");
+ exit(1);
+ }
+ if (sock_fanout_open(PACKET_FANOUT_HASH |
+ PACKET_FANOUT_FLAG_ROLLOVER, 10) != -1) {
+ fprintf(stderr, "ERROR: joined group with wrong flag ro\n");
+ exit(1);
+ }
+ if (sock_fanout_open(PACKET_FANOUT_CPU, 10) != -1) {
+ fprintf(stderr, "ERROR: joined group with wrong mode\n");
+ exit(1);
+ }
+ fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, 20);
+ if (fds[1] == -1) {
+ fprintf(stderr, "ERROR: failed to join group\n");
+ exit(1);
+ }
+ if (close(fds[1]) || close(fds[0])) {
+ fprintf(stderr, "ERROR: closing sockets\n");
+ exit(1);
+ }
+}
+
+static int test_datapath(uint16_t typeflags, int port_off,
+ const int expect1[], const int expect2[])
+{
+ const int expect0[] = { 0, 0 };
+ char *rings[2];
+ int fds[2], fds_udp[2][2], ret;
+
+ fprintf(stderr, "test: datapath 0x%hx\n", typeflags);
+
+ fds[0] = sock_fanout_open(typeflags, 20);
+ fds[1] = sock_fanout_open(typeflags, 20);
+ if (fds[0] == -1 || fds[1] == -1) {
+ fprintf(stderr, "ERROR: failed open\n");
+ exit(1);
+ }
+ rings[0] = sock_fanout_open_ring(fds[0]);
+ rings[1] = sock_fanout_open_ring(fds[1]);
+ pair_udp_open(fds_udp[0], PORT_BASE);
+ pair_udp_open(fds_udp[1], PORT_BASE + port_off);
+ sock_fanout_read(fds, rings, expect0);
+
+ /* Send data, but not enough to overflow a queue */
+ pair_udp_send(fds_udp[0], 15);
+ pair_udp_send(fds_udp[1], 5);
+ ret = sock_fanout_read(fds, rings, expect1);
+
+ /* Send more data, overflow the queue */
+ pair_udp_send(fds_udp[0], 15);
+ /* TODO: ensure consistent order between expect1 and expect2 */
+ ret |= sock_fanout_read(fds, rings, expect2);
+
+ if (munmap(rings[1], RING_NUM_FRAMES * getpagesize()) ||
+ munmap(rings[0], RING_NUM_FRAMES * getpagesize())) {
+ fprintf(stderr, "close rings\n");
+ exit(1);
+ }
+ if (close(fds_udp[1][1]) || close(fds_udp[1][0]) ||
+ close(fds_udp[0][1]) || close(fds_udp[0][0]) ||
+ close(fds[1]) || close(fds[0])) {
+ fprintf(stderr, "close datapath\n");
+ exit(1);
+ }
+
+ return ret;
+}
+
+static int set_cpuaffinity(int cpuid)
+{
+ cpu_set_t mask;
+
+ CPU_ZERO(&mask);
+ CPU_SET(cpuid, &mask);
+ if (sched_setaffinity(0, sizeof(mask), &mask)) {
+ if (errno != EINVAL) {
+ fprintf(stderr, "setaffinity %d\n", cpuid);
+ exit(1);
+ }
+ return 1;
+ }
+
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ const int expect_hash[2][2] = { { 15, 5 }, { 20, 5 } };
+ const int expect_hash_rb[2][2] = { { 15, 5 }, { 20, 15 } };
+ const int expect_lb[2][2] = { { 10, 10 }, { 18, 17 } };
+ const int expect_rb[2][2] = { { 20, 0 }, { 20, 15 } };
+ const int expect_cpu0[2][2] = { { 20, 0 }, { 20, 0 } };
+ const int expect_cpu1[2][2] = { { 0, 20 }, { 0, 20 } };
+ int port_off = 2, tries = 5, ret;
+
+ test_control_single();
+ test_control_group();
+
+ /* find a set of ports that do not collide onto the same socket */
+ ret = test_datapath(PACKET_FANOUT_HASH, port_off,
+ expect_hash[0], expect_hash[1]);
+ while (ret && tries--) {
+ fprintf(stderr, "info: trying alternate ports (%d)\n", tries);
+ ret = test_datapath(PACKET_FANOUT_HASH, ++port_off,
+ expect_hash[0], expect_hash[1]);
+ }
+
+ ret |= test_datapath(PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_ROLLOVER,
+ port_off, expect_hash_rb[0], expect_hash_rb[1]);
+ ret |= test_datapath(PACKET_FANOUT_LB,
+ port_off, expect_lb[0], expect_lb[1]);
+ ret |= test_datapath(PACKET_FANOUT_ROLLOVER,
+ port_off, expect_rb[0], expect_rb[1]);
+
+ set_cpuaffinity(0);
+ ret |= test_datapath(PACKET_FANOUT_CPU, port_off,
+ expect_cpu0[0], expect_cpu0[1]);
+ if (!set_cpuaffinity(1))
+ /* TODO: test that choice alternates with previous */
+ ret |= test_datapath(PACKET_FANOUT_CPU, port_off,
+ expect_cpu1[0], expect_cpu1[1]);
+
+ if (ret)
+ return 1;
+
+ printf("OK. All tests passed\n");
+ return 0;
+}
diff --git a/tools/testing/selftests/net/psock_lib.h b/tools/testing/selftests/net/psock_lib.h
new file mode 100644
index 00000000000..37da54ac85a
--- /dev/null
+++ b/tools/testing/selftests/net/psock_lib.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright 2013 Google Inc.
+ * Author: Willem de Bruijn <willemb@google.com>
+ * Daniel Borkmann <dborkman@redhat.com>
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef PSOCK_LIB_H
+#define PSOCK_LIB_H
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <arpa/inet.h>
+#include <unistd.h>
+
+#define DATA_LEN 100
+#define DATA_CHAR 'a'
+
+#define PORT_BASE 8000
+
+#ifndef __maybe_unused
+# define __maybe_unused __attribute__ ((__unused__))
+#endif
+
+static __maybe_unused void pair_udp_setfilter(int fd)
+{
+ struct sock_filter bpf_filter[] = {
+ { 0x80, 0, 0, 0x00000000 }, /* LD pktlen */
+ { 0x35, 0, 5, DATA_LEN }, /* JGE DATA_LEN [f goto nomatch]*/
+ { 0x30, 0, 0, 0x00000050 }, /* LD ip[80] */
+ { 0x15, 0, 3, DATA_CHAR }, /* JEQ DATA_CHAR [f goto nomatch]*/
+ { 0x30, 0, 0, 0x00000051 }, /* LD ip[81] */
+ { 0x15, 0, 1, DATA_CHAR }, /* JEQ DATA_CHAR [f goto nomatch]*/
+ { 0x06, 0, 0, 0x00000060 }, /* RET match */
+ { 0x06, 0, 0, 0x00000000 }, /* RET no match */
+ };
+ struct sock_fprog bpf_prog;
+
+ bpf_prog.filter = bpf_filter;
+ bpf_prog.len = sizeof(bpf_filter) / sizeof(struct sock_filter);
+ if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf_prog,
+ sizeof(bpf_prog))) {
+ perror("setsockopt SO_ATTACH_FILTER");
+ exit(1);
+ }
+}
+
+static __maybe_unused void pair_udp_open(int fds[], uint16_t port)
+{
+ struct sockaddr_in saddr, daddr;
+
+ fds[0] = socket(PF_INET, SOCK_DGRAM, 0);
+ fds[1] = socket(PF_INET, SOCK_DGRAM, 0);
+ if (fds[0] == -1 || fds[1] == -1) {
+ fprintf(stderr, "ERROR: socket dgram\n");
+ exit(1);
+ }
+
+ memset(&saddr, 0, sizeof(saddr));
+ saddr.sin_family = AF_INET;
+ saddr.sin_port = htons(port);
+ saddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+ memset(&daddr, 0, sizeof(daddr));
+ daddr.sin_family = AF_INET;
+ daddr.sin_port = htons(port + 1);
+ daddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+ /* must bind both to get consistent hash result */
+ if (bind(fds[1], (void *) &daddr, sizeof(daddr))) {
+ perror("bind");
+ exit(1);
+ }
+ if (bind(fds[0], (void *) &saddr, sizeof(saddr))) {
+ perror("bind");
+ exit(1);
+ }
+ if (connect(fds[0], (void *) &daddr, sizeof(daddr))) {
+ perror("connect");
+ exit(1);
+ }
+}
+
+static __maybe_unused void pair_udp_send(int fds[], int num)
+{
+ char buf[DATA_LEN], rbuf[DATA_LEN];
+
+ memset(buf, DATA_CHAR, sizeof(buf));
+ while (num--) {
+ /* Should really handle EINTR and EAGAIN */
+ if (write(fds[0], buf, sizeof(buf)) != sizeof(buf)) {
+ fprintf(stderr, "ERROR: send failed left=%d\n", num);
+ exit(1);
+ }
+ if (read(fds[1], rbuf, sizeof(rbuf)) != sizeof(rbuf)) {
+ fprintf(stderr, "ERROR: recv failed left=%d\n", num);
+ exit(1);
+ }
+ if (memcmp(buf, rbuf, sizeof(buf))) {
+ fprintf(stderr, "ERROR: data failed left=%d\n", num);
+ exit(1);
+ }
+ }
+}
+
+static __maybe_unused void pair_udp_close(int fds[])
+{
+ close(fds[0]);
+ close(fds[1]);
+}
+
+#endif /* PSOCK_LIB_H */
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
new file mode 100644
index 00000000000..c41b58640a0
--- /dev/null
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -0,0 +1,824 @@
+/*
+ * Copyright 2013 Red Hat, Inc.
+ * Author: Daniel Borkmann <dborkman@redhat.com>
+ *
+ * A basic test of packet socket's TPACKET_V1/TPACKET_V2/TPACKET_V3 behavior.
+ *
+ * Control:
+ * Test the setup of the TPACKET socket with different patterns that are
+ * known to fail (TODO) resp. succeed (OK).
+ *
+ * Datapath:
+ * Open a pair of packet sockets and send resp. receive an a priori known
+ * packet pattern accross the sockets and check if it was received resp.
+ * sent correctly. Fanout in combination with RX_RING is currently not
+ * tested here.
+ *
+ * The test currently runs for
+ * - TPACKET_V1: RX_RING, TX_RING
+ * - TPACKET_V2: RX_RING, TX_RING
+ * - TPACKET_V3: RX_RING
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <linux/if_packet.h>
+#include <linux/filter.h>
+#include <ctype.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <bits/wordsize.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <arpa/inet.h>
+#include <stdint.h>
+#include <string.h>
+#include <assert.h>
+#include <net/if.h>
+#include <inttypes.h>
+#include <poll.h>
+
+#include "psock_lib.h"
+
+#ifndef bug_on
+# define bug_on(cond) assert(!(cond))
+#endif
+
+#ifndef __aligned_tpacket
+# define __aligned_tpacket __attribute__((aligned(TPACKET_ALIGNMENT)))
+#endif
+
+#ifndef __align_tpacket
+# define __align_tpacket(x) __attribute__((aligned(TPACKET_ALIGN(x))))
+#endif
+
+#define BLOCK_STATUS(x) ((x)->h1.block_status)
+#define BLOCK_NUM_PKTS(x) ((x)->h1.num_pkts)
+#define BLOCK_O2FP(x) ((x)->h1.offset_to_first_pkt)
+#define BLOCK_LEN(x) ((x)->h1.blk_len)
+#define BLOCK_SNUM(x) ((x)->h1.seq_num)
+#define BLOCK_O2PRIV(x) ((x)->offset_to_priv)
+#define BLOCK_PRIV(x) ((void *) ((uint8_t *) (x) + BLOCK_O2PRIV(x)))
+#define BLOCK_HDR_LEN (ALIGN_8(sizeof(struct block_desc)))
+#define ALIGN_8(x) (((x) + 8 - 1) & ~(8 - 1))
+#define BLOCK_PLUS_PRIV(sz_pri) (BLOCK_HDR_LEN + ALIGN_8((sz_pri)))
+
+#define NUM_PACKETS 100
+
+struct ring {
+ struct iovec *rd;
+ uint8_t *mm_space;
+ size_t mm_len, rd_len;
+ struct sockaddr_ll ll;
+ void (*walk)(int sock, struct ring *ring);
+ int type, rd_num, flen, version;
+ union {
+ struct tpacket_req req;
+ struct tpacket_req3 req3;
+ };
+};
+
+struct block_desc {
+ uint32_t version;
+ uint32_t offset_to_priv;
+ struct tpacket_hdr_v1 h1;
+};
+
+union frame_map {
+ struct {
+ struct tpacket_hdr tp_h __aligned_tpacket;
+ struct sockaddr_ll s_ll __align_tpacket(sizeof(struct tpacket_hdr));
+ } *v1;
+ struct {
+ struct tpacket2_hdr tp_h __aligned_tpacket;
+ struct sockaddr_ll s_ll __align_tpacket(sizeof(struct tpacket2_hdr));
+ } *v2;
+ void *raw;
+};
+
+static unsigned int total_packets, total_bytes;
+
+static int pfsocket(int ver)
+{
+ int ret, sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+ if (sock == -1) {
+ perror("socket");
+ exit(1);
+ }
+
+ ret = setsockopt(sock, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver));
+ if (ret == -1) {
+ perror("setsockopt");
+ exit(1);
+ }
+
+ return sock;
+}
+
+static void status_bar_update(void)
+{
+ if (total_packets % 10 == 0) {
+ fprintf(stderr, ".");
+ fflush(stderr);
+ }
+}
+
+static void test_payload(void *pay, size_t len)
+{
+ struct ethhdr *eth = pay;
+
+ if (len < sizeof(struct ethhdr)) {
+ fprintf(stderr, "test_payload: packet too "
+ "small: %zu bytes!\n", len);
+ exit(1);
+ }
+
+ if (eth->h_proto != htons(ETH_P_IP)) {
+ fprintf(stderr, "test_payload: wrong ethernet "
+ "type: 0x%x!\n", ntohs(eth->h_proto));
+ exit(1);
+ }
+}
+
+static void create_payload(void *pay, size_t *len)
+{
+ int i;
+ struct ethhdr *eth = pay;
+ struct iphdr *ip = pay + sizeof(*eth);
+
+ /* Lets create some broken crap, that still passes
+ * our BPF filter.
+ */
+
+ *len = DATA_LEN + 42;
+
+ memset(pay, 0xff, ETH_ALEN * 2);
+ eth->h_proto = htons(ETH_P_IP);
+
+ for (i = 0; i < sizeof(*ip); ++i)
+ ((uint8_t *) pay)[i + sizeof(*eth)] = (uint8_t) rand();
+
+ ip->ihl = 5;
+ ip->version = 4;
+ ip->protocol = 0x11;
+ ip->frag_off = 0;
+ ip->ttl = 64;
+ ip->tot_len = htons((uint16_t) *len - sizeof(*eth));
+
+ ip->saddr = htonl(INADDR_LOOPBACK);
+ ip->daddr = htonl(INADDR_LOOPBACK);
+
+ memset(pay + sizeof(*eth) + sizeof(*ip),
+ DATA_CHAR, DATA_LEN);
+}
+
+static inline int __v1_rx_kernel_ready(struct tpacket_hdr *hdr)
+{
+ return ((hdr->tp_status & TP_STATUS_USER) == TP_STATUS_USER);
+}
+
+static inline void __v1_rx_user_ready(struct tpacket_hdr *hdr)
+{
+ hdr->tp_status = TP_STATUS_KERNEL;
+ __sync_synchronize();
+}
+
+static inline int __v2_rx_kernel_ready(struct tpacket2_hdr *hdr)
+{
+ return ((hdr->tp_status & TP_STATUS_USER) == TP_STATUS_USER);
+}
+
+static inline void __v2_rx_user_ready(struct tpacket2_hdr *hdr)
+{
+ hdr->tp_status = TP_STATUS_KERNEL;
+ __sync_synchronize();
+}
+
+static inline int __v1_v2_rx_kernel_ready(void *base, int version)
+{
+ switch (version) {
+ case TPACKET_V1:
+ return __v1_rx_kernel_ready(base);
+ case TPACKET_V2:
+ return __v2_rx_kernel_ready(base);
+ default:
+ bug_on(1);
+ return 0;
+ }
+}
+
+static inline void __v1_v2_rx_user_ready(void *base, int version)
+{
+ switch (version) {
+ case TPACKET_V1:
+ __v1_rx_user_ready(base);
+ break;
+ case TPACKET_V2:
+ __v2_rx_user_ready(base);
+ break;
+ }
+}
+
+static void walk_v1_v2_rx(int sock, struct ring *ring)
+{
+ struct pollfd pfd;
+ int udp_sock[2];
+ union frame_map ppd;
+ unsigned int frame_num = 0;
+
+ bug_on(ring->type != PACKET_RX_RING);
+
+ pair_udp_open(udp_sock, PORT_BASE);
+ pair_udp_setfilter(sock);
+
+ memset(&pfd, 0, sizeof(pfd));
+ pfd.fd = sock;
+ pfd.events = POLLIN | POLLERR;
+ pfd.revents = 0;
+
+ pair_udp_send(udp_sock, NUM_PACKETS);
+
+ while (total_packets < NUM_PACKETS * 2) {
+ while (__v1_v2_rx_kernel_ready(ring->rd[frame_num].iov_base,
+ ring->version)) {
+ ppd.raw = ring->rd[frame_num].iov_base;
+
+ switch (ring->version) {
+ case TPACKET_V1:
+ test_payload((uint8_t *) ppd.raw + ppd.v1->tp_h.tp_mac,
+ ppd.v1->tp_h.tp_snaplen);
+ total_bytes += ppd.v1->tp_h.tp_snaplen;
+ break;
+
+ case TPACKET_V2:
+ test_payload((uint8_t *) ppd.raw + ppd.v2->tp_h.tp_mac,
+ ppd.v2->tp_h.tp_snaplen);
+ total_bytes += ppd.v2->tp_h.tp_snaplen;
+ break;
+ }
+
+ status_bar_update();
+ total_packets++;
+
+ __v1_v2_rx_user_ready(ppd.raw, ring->version);
+
+ frame_num = (frame_num + 1) % ring->rd_num;
+ }
+
+ poll(&pfd, 1, 1);
+ }
+
+ pair_udp_close(udp_sock);
+
+ if (total_packets != 2 * NUM_PACKETS) {
+ fprintf(stderr, "walk_v%d_rx: received %u out of %u pkts\n",
+ ring->version, total_packets, NUM_PACKETS);
+ exit(1);
+ }
+
+ fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, total_bytes >> 1);
+}
+
+static inline int __v1_tx_kernel_ready(struct tpacket_hdr *hdr)
+{
+ return !(hdr->tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING));
+}
+
+static inline void __v1_tx_user_ready(struct tpacket_hdr *hdr)
+{
+ hdr->tp_status = TP_STATUS_SEND_REQUEST;
+ __sync_synchronize();
+}
+
+static inline int __v2_tx_kernel_ready(struct tpacket2_hdr *hdr)
+{
+ return !(hdr->tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING));
+}
+
+static inline void __v2_tx_user_ready(struct tpacket2_hdr *hdr)
+{
+ hdr->tp_status = TP_STATUS_SEND_REQUEST;
+ __sync_synchronize();
+}
+
+static inline int __v1_v2_tx_kernel_ready(void *base, int version)
+{
+ switch (version) {
+ case TPACKET_V1:
+ return __v1_tx_kernel_ready(base);
+ case TPACKET_V2:
+ return __v2_tx_kernel_ready(base);
+ default:
+ bug_on(1);
+ return 0;
+ }
+}
+
+static inline void __v1_v2_tx_user_ready(void *base, int version)
+{
+ switch (version) {
+ case TPACKET_V1:
+ __v1_tx_user_ready(base);
+ break;
+ case TPACKET_V2:
+ __v2_tx_user_ready(base);
+ break;
+ }
+}
+
+static void __v1_v2_set_packet_loss_discard(int sock)
+{
+ int ret, discard = 1;
+
+ ret = setsockopt(sock, SOL_PACKET, PACKET_LOSS, (void *) &discard,
+ sizeof(discard));
+ if (ret == -1) {
+ perror("setsockopt");
+ exit(1);
+ }
+}
+
+static void walk_v1_v2_tx(int sock, struct ring *ring)
+{
+ struct pollfd pfd;
+ int rcv_sock, ret;
+ size_t packet_len;
+ union frame_map ppd;
+ char packet[1024];
+ unsigned int frame_num = 0, got = 0;
+ struct sockaddr_ll ll = {
+ .sll_family = PF_PACKET,
+ .sll_halen = ETH_ALEN,
+ };
+
+ bug_on(ring->type != PACKET_TX_RING);
+ bug_on(ring->rd_num < NUM_PACKETS);
+
+ rcv_sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+ if (rcv_sock == -1) {
+ perror("socket");
+ exit(1);
+ }
+
+ pair_udp_setfilter(rcv_sock);
+
+ ll.sll_ifindex = if_nametoindex("lo");
+ ret = bind(rcv_sock, (struct sockaddr *) &ll, sizeof(ll));
+ if (ret == -1) {
+ perror("bind");
+ exit(1);
+ }
+
+ memset(&pfd, 0, sizeof(pfd));
+ pfd.fd = sock;
+ pfd.events = POLLOUT | POLLERR;
+ pfd.revents = 0;
+
+ total_packets = NUM_PACKETS;
+ create_payload(packet, &packet_len);
+
+ while (total_packets > 0) {
+ while (__v1_v2_tx_kernel_ready(ring->rd[frame_num].iov_base,
+ ring->version) &&
+ total_packets > 0) {
+ ppd.raw = ring->rd[frame_num].iov_base;
+
+ switch (ring->version) {
+ case TPACKET_V1:
+ ppd.v1->tp_h.tp_snaplen = packet_len;
+ ppd.v1->tp_h.tp_len = packet_len;
+
+ memcpy((uint8_t *) ppd.raw + TPACKET_HDRLEN -
+ sizeof(struct sockaddr_ll), packet,
+ packet_len);
+ total_bytes += ppd.v1->tp_h.tp_snaplen;
+ break;
+
+ case TPACKET_V2:
+ ppd.v2->tp_h.tp_snaplen = packet_len;
+ ppd.v2->tp_h.tp_len = packet_len;
+
+ memcpy((uint8_t *) ppd.raw + TPACKET2_HDRLEN -
+ sizeof(struct sockaddr_ll), packet,
+ packet_len);
+ total_bytes += ppd.v2->tp_h.tp_snaplen;
+ break;
+ }
+
+ status_bar_update();
+ total_packets--;
+
+ __v1_v2_tx_user_ready(ppd.raw, ring->version);
+
+ frame_num = (frame_num + 1) % ring->rd_num;
+ }
+
+ poll(&pfd, 1, 1);
+ }
+
+ bug_on(total_packets != 0);
+
+ ret = sendto(sock, NULL, 0, 0, NULL, 0);
+ if (ret == -1) {
+ perror("sendto");
+ exit(1);
+ }
+
+ while ((ret = recvfrom(rcv_sock, packet, sizeof(packet),
+ 0, NULL, NULL)) > 0 &&
+ total_packets < NUM_PACKETS) {
+ got += ret;
+ test_payload(packet, ret);
+
+ status_bar_update();
+ total_packets++;
+ }
+
+ close(rcv_sock);
+
+ if (total_packets != NUM_PACKETS) {
+ fprintf(stderr, "walk_v%d_rx: received %u out of %u pkts\n",
+ ring->version, total_packets, NUM_PACKETS);
+ exit(1);
+ }
+
+ fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, got);
+}
+
+static void walk_v1_v2(int sock, struct ring *ring)
+{
+ if (ring->type == PACKET_RX_RING)
+ walk_v1_v2_rx(sock, ring);
+ else
+ walk_v1_v2_tx(sock, ring);
+}
+
+static uint64_t __v3_prev_block_seq_num = 0;
+
+void __v3_test_block_seq_num(struct block_desc *pbd)
+{
+ if (__v3_prev_block_seq_num + 1 != BLOCK_SNUM(pbd)) {
+ fprintf(stderr, "\nprev_block_seq_num:%"PRIu64", expected "
+ "seq:%"PRIu64" != actual seq:%"PRIu64"\n",
+ __v3_prev_block_seq_num, __v3_prev_block_seq_num + 1,
+ (uint64_t) BLOCK_SNUM(pbd));
+ exit(1);
+ }
+
+ __v3_prev_block_seq_num = BLOCK_SNUM(pbd);
+}
+
+static void __v3_test_block_len(struct block_desc *pbd, uint32_t bytes, int block_num)
+{
+ if (BLOCK_NUM_PKTS(pbd)) {
+ if (bytes != BLOCK_LEN(pbd)) {
+ fprintf(stderr, "\nblock:%u with %upackets, expected "
+ "len:%u != actual len:%u\n", block_num,
+ BLOCK_NUM_PKTS(pbd), bytes, BLOCK_LEN(pbd));
+ exit(1);
+ }
+ } else {
+ if (BLOCK_LEN(pbd) != BLOCK_PLUS_PRIV(13)) {
+ fprintf(stderr, "\nblock:%u, expected len:%lu != "
+ "actual len:%u\n", block_num, BLOCK_HDR_LEN,
+ BLOCK_LEN(pbd));
+ exit(1);
+ }
+ }
+}
+
+static void __v3_test_block_header(struct block_desc *pbd, const int block_num)
+{
+ uint32_t block_status = BLOCK_STATUS(pbd);
+
+ if ((block_status & TP_STATUS_USER) == 0) {
+ fprintf(stderr, "\nblock %u: not in TP_STATUS_USER\n", block_num);
+ exit(1);
+ }
+
+ __v3_test_block_seq_num(pbd);
+}
+
+static void __v3_walk_block(struct block_desc *pbd, const int block_num)
+{
+ int num_pkts = BLOCK_NUM_PKTS(pbd), i;
+ unsigned long bytes = 0;
+ unsigned long bytes_with_padding = BLOCK_PLUS_PRIV(13);
+ struct tpacket3_hdr *ppd;
+
+ __v3_test_block_header(pbd, block_num);
+
+ ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd + BLOCK_O2FP(pbd));
+ for (i = 0; i < num_pkts; ++i) {
+ bytes += ppd->tp_snaplen;
+
+ if (ppd->tp_next_offset)
+ bytes_with_padding += ppd->tp_next_offset;
+ else
+ bytes_with_padding += ALIGN_8(ppd->tp_snaplen + ppd->tp_mac);
+
+ test_payload((uint8_t *) ppd + ppd->tp_mac, ppd->tp_snaplen);
+
+ status_bar_update();
+ total_packets++;
+
+ ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd + ppd->tp_next_offset);
+ __sync_synchronize();
+ }
+
+ __v3_test_block_len(pbd, bytes_with_padding, block_num);
+ total_bytes += bytes;
+}
+
+void __v3_flush_block(struct block_desc *pbd)
+{
+ BLOCK_STATUS(pbd) = TP_STATUS_KERNEL;
+ __sync_synchronize();
+}
+
+static void walk_v3_rx(int sock, struct ring *ring)
+{
+ unsigned int block_num = 0;
+ struct pollfd pfd;
+ struct block_desc *pbd;
+ int udp_sock[2];
+
+ bug_on(ring->type != PACKET_RX_RING);
+
+ pair_udp_open(udp_sock, PORT_BASE);
+ pair_udp_setfilter(sock);
+
+ memset(&pfd, 0, sizeof(pfd));
+ pfd.fd = sock;
+ pfd.events = POLLIN | POLLERR;
+ pfd.revents = 0;
+
+ pair_udp_send(udp_sock, NUM_PACKETS);
+
+ while (total_packets < NUM_PACKETS * 2) {
+ pbd = (struct block_desc *) ring->rd[block_num].iov_base;
+
+ while ((BLOCK_STATUS(pbd) & TP_STATUS_USER) == 0)
+ poll(&pfd, 1, 1);
+
+ __v3_walk_block(pbd, block_num);
+ __v3_flush_block(pbd);
+
+ block_num = (block_num + 1) % ring->rd_num;
+ }
+
+ pair_udp_close(udp_sock);
+
+ if (total_packets != 2 * NUM_PACKETS) {
+ fprintf(stderr, "walk_v3_rx: received %u out of %u pkts\n",
+ total_packets, NUM_PACKETS);
+ exit(1);
+ }
+
+ fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, total_bytes >> 1);
+}
+
+static void walk_v3(int sock, struct ring *ring)
+{
+ if (ring->type == PACKET_RX_RING)
+ walk_v3_rx(sock, ring);
+ else
+ bug_on(1);
+}
+
+static void __v1_v2_fill(struct ring *ring, unsigned int blocks)
+{
+ ring->req.tp_block_size = getpagesize() << 2;
+ ring->req.tp_frame_size = TPACKET_ALIGNMENT << 7;
+ ring->req.tp_block_nr = blocks;
+
+ ring->req.tp_frame_nr = ring->req.tp_block_size /
+ ring->req.tp_frame_size *
+ ring->req.tp_block_nr;
+
+ ring->mm_len = ring->req.tp_block_size * ring->req.tp_block_nr;
+ ring->walk = walk_v1_v2;
+ ring->rd_num = ring->req.tp_frame_nr;
+ ring->flen = ring->req.tp_frame_size;
+}
+
+static void __v3_fill(struct ring *ring, unsigned int blocks)
+{
+ ring->req3.tp_retire_blk_tov = 64;
+ ring->req3.tp_sizeof_priv = 13;
+ ring->req3.tp_feature_req_word |= TP_FT_REQ_FILL_RXHASH;
+
+ ring->req3.tp_block_size = getpagesize() << 2;
+ ring->req3.tp_frame_size = TPACKET_ALIGNMENT << 7;
+ ring->req3.tp_block_nr = blocks;
+
+ ring->req3.tp_frame_nr = ring->req3.tp_block_size /
+ ring->req3.tp_frame_size *
+ ring->req3.tp_block_nr;
+
+ ring->mm_len = ring->req3.tp_block_size * ring->req3.tp_block_nr;
+ ring->walk = walk_v3;
+ ring->rd_num = ring->req3.tp_block_nr;
+ ring->flen = ring->req3.tp_block_size;
+}
+
+static void setup_ring(int sock, struct ring *ring, int version, int type)
+{
+ int ret = 0;
+ unsigned int blocks = 256;
+
+ ring->type = type;
+ ring->version = version;
+
+ switch (version) {
+ case TPACKET_V1:
+ case TPACKET_V2:
+ if (type == PACKET_TX_RING)
+ __v1_v2_set_packet_loss_discard(sock);
+ __v1_v2_fill(ring, blocks);
+ ret = setsockopt(sock, SOL_PACKET, type, &ring->req,
+ sizeof(ring->req));
+ break;
+
+ case TPACKET_V3:
+ __v3_fill(ring, blocks);
+ ret = setsockopt(sock, SOL_PACKET, type, &ring->req3,
+ sizeof(ring->req3));
+ break;
+ }
+
+ if (ret == -1) {
+ perror("setsockopt");
+ exit(1);
+ }
+
+ ring->rd_len = ring->rd_num * sizeof(*ring->rd);
+ ring->rd = malloc(ring->rd_len);
+ if (ring->rd == NULL) {
+ perror("malloc");
+ exit(1);
+ }
+
+ total_packets = 0;
+ total_bytes = 0;
+}
+
+static void mmap_ring(int sock, struct ring *ring)
+{
+ int i;
+
+ ring->mm_space = mmap(0, ring->mm_len, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_LOCKED | MAP_POPULATE, sock, 0);
+ if (ring->mm_space == MAP_FAILED) {
+ perror("mmap");
+ exit(1);
+ }
+
+ memset(ring->rd, 0, ring->rd_len);
+ for (i = 0; i < ring->rd_num; ++i) {
+ ring->rd[i].iov_base = ring->mm_space + (i * ring->flen);
+ ring->rd[i].iov_len = ring->flen;
+ }
+}
+
+static void bind_ring(int sock, struct ring *ring)
+{
+ int ret;
+
+ ring->ll.sll_family = PF_PACKET;
+ ring->ll.sll_protocol = htons(ETH_P_ALL);
+ ring->ll.sll_ifindex = if_nametoindex("lo");
+ ring->ll.sll_hatype = 0;
+ ring->ll.sll_pkttype = 0;
+ ring->ll.sll_halen = 0;
+
+ ret = bind(sock, (struct sockaddr *) &ring->ll, sizeof(ring->ll));
+ if (ret == -1) {
+ perror("bind");
+ exit(1);
+ }
+}
+
+static void walk_ring(int sock, struct ring *ring)
+{
+ ring->walk(sock, ring);
+}
+
+static void unmap_ring(int sock, struct ring *ring)
+{
+ munmap(ring->mm_space, ring->mm_len);
+ free(ring->rd);
+}
+
+static int test_kernel_bit_width(void)
+{
+ char in[512], *ptr;
+ int num = 0, fd;
+ ssize_t ret;
+
+ fd = open("/proc/kallsyms", O_RDONLY);
+ if (fd == -1) {
+ perror("open");
+ exit(1);
+ }
+
+ ret = read(fd, in, sizeof(in));
+ if (ret <= 0) {
+ perror("read");
+ exit(1);
+ }
+
+ close(fd);
+
+ ptr = in;
+ while(!isspace(*ptr)) {
+ num++;
+ ptr++;
+ }
+
+ return num * 4;
+}
+
+static int test_user_bit_width(void)
+{
+ return __WORDSIZE;
+}
+
+static const char *tpacket_str[] = {
+ [TPACKET_V1] = "TPACKET_V1",
+ [TPACKET_V2] = "TPACKET_V2",
+ [TPACKET_V3] = "TPACKET_V3",
+};
+
+static const char *type_str[] = {
+ [PACKET_RX_RING] = "PACKET_RX_RING",
+ [PACKET_TX_RING] = "PACKET_TX_RING",
+};
+
+static int test_tpacket(int version, int type)
+{
+ int sock;
+ struct ring ring;
+
+ fprintf(stderr, "test: %s with %s ", tpacket_str[version],
+ type_str[type]);
+ fflush(stderr);
+
+ if (version == TPACKET_V1 &&
+ test_kernel_bit_width() != test_user_bit_width()) {
+ fprintf(stderr, "test: skip %s %s since user and kernel "
+ "space have different bit width\n",
+ tpacket_str[version], type_str[type]);
+ return 0;
+ }
+
+ sock = pfsocket(version);
+ memset(&ring, 0, sizeof(ring));
+ setup_ring(sock, &ring, version, type);
+ mmap_ring(sock, &ring);
+ bind_ring(sock, &ring);
+ walk_ring(sock, &ring);
+ unmap_ring(sock, &ring);
+ close(sock);
+
+ fprintf(stderr, "\n");
+ return 0;
+}
+
+int main(void)
+{
+ int ret = 0;
+
+ ret |= test_tpacket(TPACKET_V1, PACKET_RX_RING);
+ ret |= test_tpacket(TPACKET_V1, PACKET_TX_RING);
+
+ ret |= test_tpacket(TPACKET_V2, PACKET_RX_RING);
+ ret |= test_tpacket(TPACKET_V2, PACKET_TX_RING);
+
+ ret |= test_tpacket(TPACKET_V3, PACKET_RX_RING);
+
+ if (ret)
+ return 1;
+
+ printf("OK. All tests passed\n");
+ return 0;
+}
diff --git a/tools/testing/selftests/net/run_afpackettests b/tools/testing/selftests/net/run_afpackettests
new file mode 100644
index 00000000000..5246e782d6e
--- /dev/null
+++ b/tools/testing/selftests/net/run_afpackettests
@@ -0,0 +1,26 @@
+#!/bin/sh
+
+if [ $(id -u) != 0 ]; then
+ echo $msg must be run as root >&2
+ exit 0
+fi
+
+echo "--------------------"
+echo "running psock_fanout test"
+echo "--------------------"
+./psock_fanout
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+else
+ echo "[PASS]"
+fi
+
+echo "--------------------"
+echo "running psock_tpacket test"
+echo "--------------------"
+./psock_tpacket
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+else
+ echo "[PASS]"
+fi
diff --git a/tools/testing/selftests/net/run_netsocktests b/tools/testing/selftests/net/run_netsocktests
new file mode 100644
index 00000000000..c09a682df56
--- /dev/null
+++ b/tools/testing/selftests/net/run_netsocktests
@@ -0,0 +1,12 @@
+#!/bin/bash
+
+echo "--------------------"
+echo "running socket test"
+echo "--------------------"
+./socket
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+else
+ echo "[PASS]"
+fi
+
diff --git a/tools/testing/selftests/net/socket.c b/tools/testing/selftests/net/socket.c
new file mode 100644
index 00000000000..0f227f2f9be
--- /dev/null
+++ b/tools/testing/selftests/net/socket.c
@@ -0,0 +1,92 @@
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+
+struct socket_testcase {
+ int domain;
+ int type;
+ int protocol;
+
+ /* 0 = valid file descriptor
+ * -foo = error foo
+ */
+ int expect;
+
+ /* If non-zero, accept EAFNOSUPPORT to handle the case
+ * of the protocol not being configured into the kernel.
+ */
+ int nosupport_ok;
+};
+
+static struct socket_testcase tests[] = {
+ { AF_MAX, 0, 0, -EAFNOSUPPORT, 0 },
+ { AF_INET, SOCK_STREAM, IPPROTO_TCP, 0, 1 },
+ { AF_INET, SOCK_DGRAM, IPPROTO_TCP, -EPROTONOSUPPORT, 1 },
+ { AF_INET, SOCK_DGRAM, IPPROTO_UDP, 0, 1 },
+ { AF_INET, SOCK_STREAM, IPPROTO_UDP, -EPROTONOSUPPORT, 1 },
+};
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#define ERR_STRING_SZ 64
+
+static int run_tests(void)
+{
+ char err_string1[ERR_STRING_SZ];
+ char err_string2[ERR_STRING_SZ];
+ int i, err;
+
+ err = 0;
+ for (i = 0; i < ARRAY_SIZE(tests); i++) {
+ struct socket_testcase *s = &tests[i];
+ int fd;
+
+ fd = socket(s->domain, s->type, s->protocol);
+ if (fd < 0) {
+ if (s->nosupport_ok &&
+ errno == EAFNOSUPPORT)
+ continue;
+
+ if (s->expect < 0 &&
+ errno == -s->expect)
+ continue;
+
+ strerror_r(-s->expect, err_string1, ERR_STRING_SZ);
+ strerror_r(errno, err_string2, ERR_STRING_SZ);
+
+ fprintf(stderr, "socket(%d, %d, %d) expected "
+ "err (%s) got (%s)\n",
+ s->domain, s->type, s->protocol,
+ err_string1, err_string2);
+
+ err = -1;
+ break;
+ } else {
+ close(fd);
+
+ if (s->expect < 0) {
+ strerror_r(errno, err_string1, ERR_STRING_SZ);
+
+ fprintf(stderr, "socket(%d, %d, %d) expected "
+ "success got err (%s)\n",
+ s->domain, s->type, s->protocol,
+ err_string1);
+
+ err = -1;
+ break;
+ }
+ }
+ }
+
+ return err;
+}
+
+int main(void)
+{
+ int err = run_tests();
+
+ return err;
+}