-- 
cgit v1.2.3

From 38a94118a69c531598a85b59b6fc26aa0b996110 Mon Sep 17 00:00:00 2001
From: James Morris <james.l.morris@oracle.com>
Date: Mon, 9 Apr 2012 11:03:36 +1000
Subject: maintainers: add kernel/capability.c to capabilities entry

Add kernel/capability.c to capabilities entry.

Reported-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2dcfca85063..8cbc9e3ba40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1731,6 +1731,7 @@ S:	Supported
 F:	include/linux/capability.h
 F:	security/capability.c
 F:	security/commoncap.c 
+F:	kernel/capability.c
 
 CELL BROADBAND ENGINE ARCHITECTURE
 M:	Arnd Bergmann <arnd@arndb.de>
-- 
cgit v1.2.3


From 9ccf010f8172b699ea80178860e8ea228f7dce56 Mon Sep 17 00:00:00 2001
From: James Morris <james.l.morris@oracle.com>
Date: Mon, 9 Apr 2012 15:48:07 +1000
Subject: maintainers: update wiki url for the security subsystem

Update the wiki url for the security subsystem to:

http://kernsec.org/

Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8cbc9e3ba40..f31aa0c247e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5956,7 +5956,7 @@ SECURITY SUBSYSTEM
 M:	James Morris <james.l.morris@oracle.com>
 L:	linux-security-module@vger.kernel.org (suggested Cc:)
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git
-W:	http://security.wiki.kernel.org/
+W:	http://kernsec.org/
 S:	Supported
 F:	security/
 
-- 
cgit v1.2.3


From 47a93a5bcb131879d4425d4559e90ad82990825d Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Thu, 16 Feb 2012 15:08:39 -0500
Subject: SELinux: allow seek operations on the file exposing policy

sesearch uses:
lseek(3, 0, SEEK_SET)                   = -1 ESPIPE (Illegal seek)

Make that work.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/selinuxfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index d7018bfa1f0..d6ae2d40730 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -496,6 +496,7 @@ static const struct file_operations sel_policy_ops = {
 	.read		= sel_read_policy,
 	.mmap		= sel_mmap_policy,
 	.release	= sel_release_policy,
+	.llseek		= generic_file_llseek,
 };
 
 static ssize_t sel_write_load(struct file *file, const char __user *buf,
-- 
cgit v1.2.3


From 72e8c8593f8fdb983d9cd79d824f6b48ef21f14f Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Thu, 16 Feb 2012 15:08:39 -0500
Subject: SELinux: loosen DAC perms on reading policy

There is no reason the DAC perms on reading the policy file need to be root
only.  There are selinux checks which should control this access.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/selinuxfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index d6ae2d40730..f4b5a0baaec 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1832,7 +1832,7 @@ static int sel_fill_super(struct super_block *sb, void *data, int silent)
 		[SEL_REJECT_UNKNOWN] = {"reject_unknown", &sel_handle_unknown_ops, S_IRUGO},
 		[SEL_DENY_UNKNOWN] = {"deny_unknown", &sel_handle_unknown_ops, S_IRUGO},
 		[SEL_STATUS] = {"status", &sel_handle_status_ops, S_IRUGO},
-		[SEL_POLICY] = {"policy", &sel_policy_ops, S_IRUSR},
+		[SEL_POLICY] = {"policy", &sel_policy_ops, S_IRUGO},
 		/* last one */ {""}
 	};
 	ret = simple_fill_super(sb, SELINUX_MAGIC, selinux_files);
-- 
cgit v1.2.3


From 6ce74ec75ca690c4fb3a3c5f8b7767d094d93215 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Thu, 16 Feb 2012 15:08:39 -0500
Subject: SELinux: include flow.h where used rather than get it indirectly

We use flow_cache_genid in the selinux xfrm files.  This is declared in
net/flow.h  However we do not include that file directly anywhere.  We have
always just gotten it through a long chain of indirect .h file includes.

on x86_64:

  CC      security/selinux/ss/services.o
In file included from
/next/linux-next-20120216/security/selinux/ss/services.c:69:0:
/next/linux-next-20120216/security/selinux/include/xfrm.h: In function 'selinux_xfrm_notify_policyload':
/next/linux-next-20120216/security/selinux/include/xfrm.h:51:14: error: 'flow_cache_genid' undeclared (first use in this function)
/next/linux-next-20120216/security/selinux/include/xfrm.h:51:14: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [security/selinux/ss/services.o] Error 1

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
---
 net/xfrm/xfrm_policy.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7661576b6f4..596f125658f 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -26,6 +26,7 @@
 #include <linux/cache.h>
 #include <linux/audit.h>
 #include <net/dst.h>
+#include <net/flow.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
 #ifdef CONFIG_XFRM_STATISTICS
-- 
cgit v1.2.3


From aa893269de6277b44be88e25dcd5331c934c29c4 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Tue, 20 Mar 2012 14:35:12 -0400
Subject: SELinux: allow default source/target selectors for user/role/range

When new objects are created we have great and flexible rules to
determine the type of the new object.  We aren't quite as flexible or
mature when it comes to determining the user, role, and range.  This
patch adds a new ability to specify the place a new objects user, role,
and range should come from.  For users and roles it can come from either
the source or the target of the operation.  aka for files the user can
either come from the source (the running process and todays default) or
it can come from the target (aka the parent directory of the new file)

examples always are done with
directory context: system_u:object_r:mnt_t:s0-s0:c0.c512
process context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[no rule]
	unconfined_u:object_r:mnt_t:s0   test_none
[default user source]
	unconfined_u:object_r:mnt_t:s0   test_user_source
[default user target]
	system_u:object_r:mnt_t:s0       test_user_target
[default role source]
	unconfined_u:unconfined_r:mnt_t:s0 test_role_source
[default role target]
	unconfined_u:object_r:mnt_t:s0   test_role_target
[default range source low]
	unconfined_u:object_r:mnt_t:s0 test_range_source_low
[default range source high]
	unconfined_u:object_r:mnt_t:s0:c0.c1023 test_range_source_high
[default range source low-high]
	unconfined_u:object_r:mnt_t:s0-s0:c0.c1023 test_range_source_low-high
[default range target low]
	unconfined_u:object_r:mnt_t:s0 test_range_target_low
[default range target high]
	unconfined_u:object_r:mnt_t:s0:c0.c512 test_range_target_high
[default range target low-high]
	unconfined_u:object_r:mnt_t:s0-s0:c0.c512 test_range_target_low-high

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/include/security.h |  3 ++-
 security/selinux/ss/context.h       | 20 ++++++++++++++++++++
 security/selinux/ss/mls.c           | 24 ++++++++++++++++++++++++
 security/selinux/ss/policydb.c      | 25 +++++++++++++++++++++++++
 security/selinux/ss/policydb.h      | 13 +++++++++++++
 security/selinux/ss/services.c      | 32 +++++++++++++++++++++++++-------
 6 files changed, 109 insertions(+), 8 deletions(-)

diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index d871e8ad210..ba53400195c 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -31,13 +31,14 @@
 #define POLICYDB_VERSION_BOUNDARY	24
 #define POLICYDB_VERSION_FILENAME_TRANS	25
 #define POLICYDB_VERSION_ROLETRANS	26
+#define POLICYDB_VERSION_NEW_OBJECT_DEFAULTS	27
 
 /* Range of policy versions we understand*/
 #define POLICYDB_VERSION_MIN   POLICYDB_VERSION_BASE
 #ifdef CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX
 #define POLICYDB_VERSION_MAX	CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX_VALUE
 #else
-#define POLICYDB_VERSION_MAX	POLICYDB_VERSION_ROLETRANS
+#define POLICYDB_VERSION_MAX	POLICYDB_VERSION_NEW_OBJECT_DEFAULTS
 #endif
 
 /* Mask for just the mount related flags */
diff --git a/security/selinux/ss/context.h b/security/selinux/ss/context.h
index 45e8fb0515f..212e3479a0d 100644
--- a/security/selinux/ss/context.h
+++ b/security/selinux/ss/context.h
@@ -74,6 +74,26 @@ out:
 	return rc;
 }
 
+/*
+ * Sets both levels in the MLS range of 'dst' to the high level of 'src'.
+ */
+static inline int mls_context_cpy_high(struct context *dst, struct context *src)
+{
+	int rc;
+
+	dst->range.level[0].sens = src->range.level[1].sens;
+	rc = ebitmap_cpy(&dst->range.level[0].cat, &src->range.level[1].cat);
+	if (rc)
+		goto out;
+
+	dst->range.level[1].sens = src->range.level[1].sens;
+	rc = ebitmap_cpy(&dst->range.level[1].cat, &src->range.level[1].cat);
+	if (rc)
+		ebitmap_destroy(&dst->range.level[0].cat);
+out:
+	return rc;
+}
+
 static inline int mls_context_cmp(struct context *c1, struct context *c2)
 {
 	return ((c1->range.level[0].sens == c2->range.level[0].sens) &&
diff --git a/security/selinux/ss/mls.c b/security/selinux/ss/mls.c
index fbf9c5816c7..40de8d3f208 100644
--- a/security/selinux/ss/mls.c
+++ b/security/selinux/ss/mls.c
@@ -517,6 +517,8 @@ int mls_compute_sid(struct context *scontext,
 {
 	struct range_trans rtr;
 	struct mls_range *r;
+	struct class_datum *cladatum;
+	int default_range = 0;
 
 	if (!policydb.mls_enabled)
 		return 0;
@@ -530,6 +532,28 @@ int mls_compute_sid(struct context *scontext,
 		r = hashtab_search(policydb.range_tr, &rtr);
 		if (r)
 			return mls_range_set(newcontext, r);
+
+		if (tclass && tclass <= policydb.p_classes.nprim) {
+			cladatum = policydb.class_val_to_struct[tclass - 1];
+			if (cladatum)
+				default_range = cladatum->default_range;
+		}
+
+		switch (default_range) {
+		case DEFAULT_SOURCE_LOW:
+			return mls_context_cpy_low(newcontext, scontext);
+		case DEFAULT_SOURCE_HIGH:
+			return mls_context_cpy_high(newcontext, scontext);
+		case DEFAULT_SOURCE_LOW_HIGH:
+			return mls_context_cpy(newcontext, scontext);
+		case DEFAULT_TARGET_LOW:
+			return mls_context_cpy_low(newcontext, tcontext);
+		case DEFAULT_TARGET_HIGH:
+			return mls_context_cpy_high(newcontext, tcontext);
+		case DEFAULT_TARGET_LOW_HIGH:
+			return mls_context_cpy(newcontext, tcontext);
+		}
+
 		/* Fallthrough */
 	case AVTAB_CHANGE:
 		if ((tclass == policydb.process_class) || (sock == true))
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index a7f61d52f05..2bb9c2fd5f1 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -133,6 +133,11 @@ static struct policydb_compat_info policydb_compat[] = {
 		.sym_num	= SYM_NUM,
 		.ocon_num	= OCON_NUM,
 	},
+	{
+		.version	= POLICYDB_VERSION_NEW_OBJECT_DEFAULTS,
+		.sym_num	= SYM_NUM,
+		.ocon_num	= OCON_NUM,
+	},
 };
 
 static struct policydb_compat_info *policydb_lookup_compat(int version)
@@ -1306,6 +1311,16 @@ static int class_read(struct policydb *p, struct hashtab *h, void *fp)
 			goto bad;
 	}
 
+	if (p->policyvers >= POLICYDB_VERSION_NEW_OBJECT_DEFAULTS) {
+		rc = next_entry(buf, fp, sizeof(u32) * 3);
+		if (rc)
+			goto bad;
+
+		cladatum->default_user = le32_to_cpu(buf[0]);
+		cladatum->default_role = le32_to_cpu(buf[1]);
+		cladatum->default_range = le32_to_cpu(buf[2]);
+	}
+
 	rc = hashtab_insert(h, key, cladatum);
 	if (rc)
 		goto bad;
@@ -2832,6 +2847,16 @@ static int class_write(void *vkey, void *datum, void *ptr)
 	if (rc)
 		return rc;
 
+	if (p->policyvers >= POLICYDB_VERSION_NEW_OBJECT_DEFAULTS) {
+		buf[0] = cpu_to_le32(cladatum->default_user);
+		buf[1] = cpu_to_le32(cladatum->default_role);
+		buf[2] = cpu_to_le32(cladatum->default_range);
+
+		rc = put_entry(buf, sizeof(uint32_t), 3, fp);
+		if (rc)
+			return rc;
+	}
+
 	return 0;
 }
 
diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h
index b846c038718..a949f1ad43b 100644
--- a/security/selinux/ss/policydb.h
+++ b/security/selinux/ss/policydb.h
@@ -60,6 +60,19 @@ struct class_datum {
 	struct symtab permissions;	/* class-specific permission symbol table */
 	struct constraint_node *constraints;	/* constraints on class permissions */
 	struct constraint_node *validatetrans;	/* special transition rules */
+	/* Options how a new object user and role should be decided */
+#define DEFAULT_SOURCE         1
+#define DEFAULT_TARGET         2
+	char default_user;
+	char default_role;
+/* Options how a new object range should be decided */
+#define DEFAULT_SOURCE_LOW     1
+#define DEFAULT_SOURCE_HIGH    2
+#define DEFAULT_SOURCE_LOW_HIGH        3
+#define DEFAULT_TARGET_LOW     4
+#define DEFAULT_TARGET_HIGH    5
+#define DEFAULT_TARGET_LOW_HIGH        6
+	char default_range;
 };
 
 /* Role attributes */
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 185f849a26f..2ea108c2c04 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1389,6 +1389,7 @@ static int security_compute_sid(u32 ssid,
 				u32 *out_sid,
 				bool kern)
 {
+	struct class_datum *cladatum = NULL;
 	struct context *scontext = NULL, *tcontext = NULL, newcontext;
 	struct role_trans *roletr = NULL;
 	struct avtab_key avkey;
@@ -1437,12 +1438,20 @@ static int security_compute_sid(u32 ssid,
 		goto out_unlock;
 	}
 
+	if (tclass && tclass <= policydb.p_classes.nprim)
+		cladatum = policydb.class_val_to_struct[tclass - 1];
+
 	/* Set the user identity. */
 	switch (specified) {
 	case AVTAB_TRANSITION:
 	case AVTAB_CHANGE:
-		/* Use the process user identity. */
-		newcontext.user = scontext->user;
+		if (cladatum && cladatum->default_user == DEFAULT_TARGET) {
+			newcontext.user = tcontext->user;
+		} else {
+			/* notice this gets both DEFAULT_SOURCE and unset */
+			/* Use the process user identity. */
+			newcontext.user = scontext->user;
+		}
 		break;
 	case AVTAB_MEMBER:
 		/* Use the related object owner. */
@@ -1450,14 +1459,23 @@ static int security_compute_sid(u32 ssid,
 		break;
 	}
 
-	/* Set the role and type to default values. */
-	if ((tclass == policydb.process_class) || (sock == true)) {
-		/* Use the current role and type of process. */
+	/* Set the role to default values. */
+	if (cladatum && cladatum->default_role == DEFAULT_SOURCE) {
 		newcontext.role = scontext->role;
+	} else if (cladatum && cladatum->default_role == DEFAULT_TARGET) {
+		newcontext.role = tcontext->role;
+	} else {
+		if ((tclass == policydb.process_class) || (sock == true))
+			newcontext.role = scontext->role;
+		else
+			newcontext.role = OBJECT_R_VAL;
+	}
+
+	/* Set the type to default values. */
+	if ((tclass == policydb.process_class) || (sock == true)) {
+		/* Use the type of process. */
 		newcontext.type = scontext->type;
 	} else {
-		/* Use the well-defined object role. */
-		newcontext.role = OBJECT_R_VAL;
 		/* Use the type of the related object. */
 		newcontext.type = tcontext->type;
 	}
-- 
cgit v1.2.3


From eed7795d0a2c9b2e934afc088e903fa2c17b7958 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Tue, 20 Mar 2012 14:35:12 -0400
Subject: SELinux: add default_type statements

Because Fedora shipped userspace based on my development tree we now
have policy version 27 in the wild defining only default user, role, and
range.  Thus to add default_type we need a policy.28.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/include/security.h |  3 ++-
 security/selinux/ss/policydb.c      | 19 +++++++++++++++++++
 security/selinux/ss/policydb.h      |  3 ++-
 security/selinux/ss/services.c      | 14 ++++++++++----
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index ba53400195c..dde2005407a 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -32,13 +32,14 @@
 #define POLICYDB_VERSION_FILENAME_TRANS	25
 #define POLICYDB_VERSION_ROLETRANS	26
 #define POLICYDB_VERSION_NEW_OBJECT_DEFAULTS	27
+#define POLICYDB_VERSION_DEFAULT_TYPE	28
 
 /* Range of policy versions we understand*/
 #define POLICYDB_VERSION_MIN   POLICYDB_VERSION_BASE
 #ifdef CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX
 #define POLICYDB_VERSION_MAX	CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX_VALUE
 #else
-#define POLICYDB_VERSION_MAX	POLICYDB_VERSION_NEW_OBJECT_DEFAULTS
+#define POLICYDB_VERSION_MAX	POLICYDB_VERSION_DEFAULT_TYPE
 #endif
 
 /* Mask for just the mount related flags */
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index 2bb9c2fd5f1..9cd9b7c661e 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -138,6 +138,11 @@ static struct policydb_compat_info policydb_compat[] = {
 		.sym_num	= SYM_NUM,
 		.ocon_num	= OCON_NUM,
 	},
+	{
+		.version	= POLICYDB_VERSION_DEFAULT_TYPE,
+		.sym_num	= SYM_NUM,
+		.ocon_num	= OCON_NUM,
+	},
 };
 
 static struct policydb_compat_info *policydb_lookup_compat(int version)
@@ -1321,6 +1326,13 @@ static int class_read(struct policydb *p, struct hashtab *h, void *fp)
 		cladatum->default_range = le32_to_cpu(buf[2]);
 	}
 
+	if (p->policyvers >= POLICYDB_VERSION_DEFAULT_TYPE) {
+		rc = next_entry(buf, fp, sizeof(u32) * 1);
+		if (rc)
+			goto bad;
+		cladatum->default_type = le32_to_cpu(buf[0]);
+	}
+
 	rc = hashtab_insert(h, key, cladatum);
 	if (rc)
 		goto bad;
@@ -2857,6 +2869,13 @@ static int class_write(void *vkey, void *datum, void *ptr)
 			return rc;
 	}
 
+	if (p->policyvers >= POLICYDB_VERSION_DEFAULT_TYPE) {
+		buf[0] = cpu_to_le32(cladatum->default_type);
+		rc = put_entry(buf, sizeof(uint32_t), 1, fp);
+		if (rc)
+			return rc;
+	}
+
 	return 0;
 }
 
diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h
index a949f1ad43b..da637471d4c 100644
--- a/security/selinux/ss/policydb.h
+++ b/security/selinux/ss/policydb.h
@@ -60,11 +60,12 @@ struct class_datum {
 	struct symtab permissions;	/* class-specific permission symbol table */
 	struct constraint_node *constraints;	/* constraints on class permissions */
 	struct constraint_node *validatetrans;	/* special transition rules */
-	/* Options how a new object user and role should be decided */
+/* Options how a new object user, role, and type should be decided */
 #define DEFAULT_SOURCE         1
 #define DEFAULT_TARGET         2
 	char default_user;
 	char default_role;
+	char default_type;
 /* Options how a new object range should be decided */
 #define DEFAULT_SOURCE_LOW     1
 #define DEFAULT_SOURCE_HIGH    2
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 2ea108c2c04..1ded0ec7e8c 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1472,12 +1472,18 @@ static int security_compute_sid(u32 ssid,
 	}
 
 	/* Set the type to default values. */
-	if ((tclass == policydb.process_class) || (sock == true)) {
-		/* Use the type of process. */
+	if (cladatum && cladatum->default_type == DEFAULT_SOURCE) {
 		newcontext.type = scontext->type;
-	} else {
-		/* Use the type of the related object. */
+	} else if (cladatum && cladatum->default_type == DEFAULT_TARGET) {
 		newcontext.type = tcontext->type;
+	} else {
+		if ((tclass == policydb.process_class) || (sock == true)) {
+			/* Use the type of process. */
+			newcontext.type = scontext->type;
+		} else {
+			/* Use the type of the related object. */
+			newcontext.type = tcontext->type;
+		}
 	}
 
 	/* Look for a type transition/member/change rule. */
-- 
cgit v1.2.3


From 95dbf739313f09c8d859bde1373bc264ef979337 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 13:45:34 -0400
Subject: SELinux: check OPEN on truncate calls

In RH BZ 578841 we realized that the SELinux sandbox program was allowed to
truncate files outside of the sandbox.  The reason is because sandbox
confinement is determined almost entirely by the 'open' permission.  The idea
was that if the sandbox was unable to open() files it would be unable to do
harm to those files.  This turns out to be false in light of syscalls like
truncate() and chmod() which don't require a previous open() call.  I looked
at the syscalls that did not have an associated 'open' check and found that
truncate(), did not have a seperate permission and even if it did have a
separate permission such a permission owuld be inadequate for use by
sandbox (since it owuld have to be granted so liberally as to be useless).
This patch checks the OPEN permission on truncate.  I think a better solution
for sandbox is a whole new permission, but at least this fixes what we have
today.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/hooks.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d85b793c932..f7d7e779c7f 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2708,6 +2708,7 @@ static int selinux_inode_setattr(struct dentry *dentry, struct iattr *iattr)
 {
 	const struct cred *cred = current_cred();
 	unsigned int ia_valid = iattr->ia_valid;
+	__u32 av = FILE__WRITE;
 
 	/* ATTR_FORCE is just used for ATTR_KILL_S[UG]ID. */
 	if (ia_valid & ATTR_FORCE) {
@@ -2721,7 +2722,10 @@ static int selinux_inode_setattr(struct dentry *dentry, struct iattr *iattr)
 			ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_TIMES_SET))
 		return dentry_has_perm(cred, dentry, FILE__SETATTR);
 
-	return dentry_has_perm(cred, dentry, FILE__WRITE);
+	if (ia_valid & ATTR_SIZE)
+		av |= FILE__OPEN;
+
+	return dentry_has_perm(cred, dentry, av);
 }
 
 static int selinux_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
-- 
cgit v1.2.3


From 83d498569e9a7a4b92c4c5d3566f2d6a604f28c9 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 13:45:40 -0400
Subject: SELinux: rename dentry_open to file_open

dentry_open takes a file, rename it to file_open

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 fs/open.c                  |  2 +-
 include/linux/security.h   | 13 +++++--------
 security/apparmor/lsm.c    |  4 ++--
 security/capability.c      |  4 ++--
 security/security.c        |  4 ++--
 security/selinux/hooks.c   |  6 +++---
 security/smack/smack_lsm.c |  6 +++---
 security/tomoyo/tomoyo.c   |  6 +++---
 8 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 5720854156d..5eccdcea2d1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -681,7 +681,7 @@ static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 
 	f->f_op = fops_get(inode->i_fop);
 
-	error = security_dentry_open(f, cred);
+	error = security_file_open(f, cred);
 	if (error)
 		goto cleanup_all;
 
diff --git a/include/linux/security.h b/include/linux/security.h
index 673afbb8238..de412ea29aa 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -639,10 +639,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	to receive an open file descriptor via socket IPC.
  *	@file contains the file structure being received.
  *	Return 0 if permission is granted.
- *
- * Security hook for dentry
- *
- * @dentry_open
+ * @file_open
  *	Save open-time permission checking state for later use upon
  *	file_permission, and recheck access if anything has changed
  *	since inode_permission.
@@ -1497,7 +1494,7 @@ struct security_operations {
 	int (*file_send_sigiotask) (struct task_struct *tsk,
 				    struct fown_struct *fown, int sig);
 	int (*file_receive) (struct file *file);
-	int (*dentry_open) (struct file *file, const struct cred *cred);
+	int (*file_open) (struct file *file, const struct cred *cred);
 
 	int (*task_create) (unsigned long clone_flags);
 	void (*task_free) (struct task_struct *task);
@@ -1756,7 +1753,7 @@ int security_file_set_fowner(struct file *file);
 int security_file_send_sigiotask(struct task_struct *tsk,
 				 struct fown_struct *fown, int sig);
 int security_file_receive(struct file *file);
-int security_dentry_open(struct file *file, const struct cred *cred);
+int security_file_open(struct file *file, const struct cred *cred);
 int security_task_create(unsigned long clone_flags);
 void security_task_free(struct task_struct *task);
 int security_cred_alloc_blank(struct cred *cred, gfp_t gfp);
@@ -2227,8 +2224,8 @@ static inline int security_file_receive(struct file *file)
 	return 0;
 }
 
-static inline int security_dentry_open(struct file *file,
-				       const struct cred *cred)
+static inline int security_file_open(struct file *file,
+				     const struct cred *cred)
 {
 	return 0;
 }
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index ad05d391974..02fddcd4c64 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -373,7 +373,7 @@ static int apparmor_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
 				      AA_MAY_META_READ);
 }
 
-static int apparmor_dentry_open(struct file *file, const struct cred *cred)
+static int apparmor_file_open(struct file *file, const struct cred *cred)
 {
 	struct aa_file_cxt *fcxt = file->f_security;
 	struct aa_profile *profile;
@@ -640,9 +640,9 @@ static struct security_operations apparmor_ops = {
 	.path_chmod =			apparmor_path_chmod,
 	.path_chown =			apparmor_path_chown,
 	.path_truncate =		apparmor_path_truncate,
-	.dentry_open =			apparmor_dentry_open,
 	.inode_getattr =                apparmor_inode_getattr,
 
+	.file_open =			apparmor_file_open,
 	.file_permission =		apparmor_file_permission,
 	.file_alloc_security =		apparmor_file_alloc_security,
 	.file_free_security =		apparmor_file_free_security,
diff --git a/security/capability.c b/security/capability.c
index 5bb21b1c448..fca889676c5 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -348,7 +348,7 @@ static int cap_file_receive(struct file *file)
 	return 0;
 }
 
-static int cap_dentry_open(struct file *file, const struct cred *cred)
+static int cap_file_open(struct file *file, const struct cred *cred)
 {
 	return 0;
 }
@@ -956,7 +956,7 @@ void __init security_fixup_ops(struct security_operations *ops)
 	set_to_cap_if_null(ops, file_set_fowner);
 	set_to_cap_if_null(ops, file_send_sigiotask);
 	set_to_cap_if_null(ops, file_receive);
-	set_to_cap_if_null(ops, dentry_open);
+	set_to_cap_if_null(ops, file_open);
 	set_to_cap_if_null(ops, task_create);
 	set_to_cap_if_null(ops, task_free);
 	set_to_cap_if_null(ops, cred_alloc_blank);
diff --git a/security/security.c b/security/security.c
index bf619ffc9a4..5497a57fba0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -701,11 +701,11 @@ int security_file_receive(struct file *file)
 	return security_ops->file_receive(file);
 }
 
-int security_dentry_open(struct file *file, const struct cred *cred)
+int security_file_open(struct file *file, const struct cred *cred)
 {
 	int ret;
 
-	ret = security_ops->dentry_open(file, cred);
+	ret = security_ops->file_open(file, cred);
 	if (ret)
 		return ret;
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index f7d7e779c7f..dc15f16a357 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2973,7 +2973,7 @@ static int selinux_file_permission(struct file *file, int mask)
 
 	if (sid == fsec->sid && fsec->isid == isec->sid &&
 	    fsec->pseqno == avc_policy_seqno())
-		/* No change since dentry_open check. */
+		/* No change since file_open check. */
 		return 0;
 
 	return selinux_revalidate_file_permission(file, mask);
@@ -3232,7 +3232,7 @@ static int selinux_file_receive(struct file *file)
 	return file_has_perm(cred, file, file_to_av(file));
 }
 
-static int selinux_dentry_open(struct file *file, const struct cred *cred)
+static int selinux_file_open(struct file *file, const struct cred *cred)
 {
 	struct file_security_struct *fsec;
 	struct inode *inode;
@@ -5596,7 +5596,7 @@ static struct security_operations selinux_ops = {
 	.file_send_sigiotask =		selinux_file_send_sigiotask,
 	.file_receive =			selinux_file_receive,
 
-	.dentry_open =			selinux_dentry_open,
+	.file_open =			selinux_file_open,
 
 	.task_create =			selinux_task_create,
 	.cred_alloc_blank =		selinux_cred_alloc_blank,
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 81c03a59711..8ef0199ebca 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -1349,7 +1349,7 @@ static int smack_file_receive(struct file *file)
 }
 
 /**
- * smack_dentry_open - Smack dentry open processing
+ * smack_file_open - Smack dentry open processing
  * @file: the object
  * @cred: unused
  *
@@ -1357,7 +1357,7 @@ static int smack_file_receive(struct file *file)
  *
  * Returns 0
  */
-static int smack_dentry_open(struct file *file, const struct cred *cred)
+static int smack_file_open(struct file *file, const struct cred *cred)
 {
 	struct inode_smack *isp = file->f_path.dentry->d_inode->i_security;
 
@@ -3538,7 +3538,7 @@ struct security_operations smack_ops = {
 	.file_send_sigiotask = 		smack_file_send_sigiotask,
 	.file_receive = 		smack_file_receive,
 
-	.dentry_open =			smack_dentry_open,
+	.file_open =			smack_file_open,
 
 	.cred_alloc_blank =		smack_cred_alloc_blank,
 	.cred_free =			smack_cred_free,
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 620d37c159a..c2d04a50f76 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -319,14 +319,14 @@ static int tomoyo_file_fcntl(struct file *file, unsigned int cmd,
 }
 
 /**
- * tomoyo_dentry_open - Target for security_dentry_open().
+ * tomoyo_file_open - Target for security_file_open().
  *
  * @f:    Pointer to "struct file".
  * @cred: Pointer to "struct cred".
  *
  * Returns 0 on success, negative value otherwise.
  */
-static int tomoyo_dentry_open(struct file *f, const struct cred *cred)
+static int tomoyo_file_open(struct file *f, const struct cred *cred)
 {
 	int flags = f->f_flags;
 	/* Don't check read permission here if called from do_execve(). */
@@ -510,7 +510,7 @@ static struct security_operations tomoyo_security_ops = {
 	.bprm_set_creds      = tomoyo_bprm_set_creds,
 	.bprm_check_security = tomoyo_bprm_check_security,
 	.file_fcntl          = tomoyo_file_fcntl,
-	.dentry_open         = tomoyo_dentry_open,
+	.file_open           = tomoyo_file_open,
 	.path_truncate       = tomoyo_path_truncate,
 	.path_unlink         = tomoyo_path_unlink,
 	.path_mkdir          = tomoyo_path_mkdir,
-- 
cgit v1.2.3


From d6ea83ec6864e9297fa8b00ec3dae183413a90e3 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 13:45:49 -0400
Subject: SELinux: audit failed attempts to set invalid labels

We know that some yum operation is causing CAP_MAC_ADMIN failures.  This
implies that an RPM is laying down (or attempting to lay down) a file with
an invalid label.  The problem is that we don't have any information to
track down the cause.  This patch will cause such a failure to report the
failed label in an SELINUX_ERR audit message.  This is similar to the
SELINUX_ERR reports on invalid transitions and things like that.  It should
help run down problems on what is trying to set invalid labels in the
future.

Resulting records look something like:
type=AVC msg=audit(1319659241.138:71): avc:  denied  { mac_admin } for pid=2594 comm="chcon" capability=33 scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=capability2
type=SELINUX_ERR msg=audit(1319659241.138:71): op=setxattr invalid_context=unconfined_u:object_r:hello:s0
type=SYSCALL msg=audit(1319659241.138:71): arch=c000003e syscall=188 success=no exit=-22 a0=a2c0e0 a1=390341b79b a2=a2d620 a3=1f items=1 ppid=2519 pid=2594 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="chcon" exe="/usr/bin/chcon" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=CWD msg=audit(1319659241.138:71):  cwd="/root" type=PATH msg=audit(1319659241.138:71): item=0 name="test" inode=785879 dev=fc:03 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:admin_home_t:s0

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/hooks.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index dc15f16a357..c3ee902306d 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2792,8 +2792,25 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
 
 	rc = security_context_to_sid(value, size, &newsid);
 	if (rc == -EINVAL) {
-		if (!capable(CAP_MAC_ADMIN))
+		if (!capable(CAP_MAC_ADMIN)) {
+			struct audit_buffer *ab;
+			size_t audit_size;
+			const char *str;
+
+			/* We strip a nul only if it is at the end, otherwise the
+			 * context contains a nul and we should audit that */
+			str = value;
+			if (str[size - 1] == '\0')
+				audit_size = size - 1;
+			else
+				audit_size = size;
+			ab = audit_log_start(current->audit_context, GFP_ATOMIC, AUDIT_SELINUX_ERR);
+			audit_log_format(ab, "op=setxattr invalid_context=");
+			audit_log_n_untrustedstring(ab, value, audit_size);
+			audit_log_end(ab);
+
 			return rc;
+		}
 		rc = security_context_to_sid_force(value, size, &newsid);
 	}
 	if (rc)
@@ -5335,8 +5352,23 @@ static int selinux_setprocattr(struct task_struct *p,
 		}
 		error = security_context_to_sid(value, size, &sid);
 		if (error == -EINVAL && !strcmp(name, "fscreate")) {
-			if (!capable(CAP_MAC_ADMIN))
+			if (!capable(CAP_MAC_ADMIN)) {
+				struct audit_buffer *ab;
+				size_t audit_size;
+
+				/* We strip a nul only if it is at the end, otherwise the
+				 * context contains a nul and we should audit that */
+				if (str[size - 1] == '\0')
+					audit_size = size - 1;
+				else
+					audit_size = size;
+				ab = audit_log_start(current->audit_context, GFP_ATOMIC, AUDIT_SELINUX_ERR);
+				audit_log_format(ab, "op=fscreate invalid_context=");
+				audit_log_n_untrustedstring(ab, value, audit_size);
+				audit_log_end(ab);
+
 				return error;
+			}
 			error = security_context_to_sid_force(value, size,
 							      &sid);
 		}
-- 
cgit v1.2.3


From bb7081ab93582fd2557160549854200a5fc7b42a Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 13:46:36 -0400
Subject: SELinux: possible NULL deref in context_struct_to_string

It's possible that the caller passed a NULL for scontext.  However if this
is a defered mapping we might still attempt to call *scontext=kstrdup().
This is bad.  Instead just return the len.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/ss/services.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 1ded0ec7e8c..9b7e7ed54e7 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1018,9 +1018,11 @@ static int context_struct_to_string(struct context *context, char **scontext, u3
 
 	if (context->len) {
 		*scontext_len = context->len;
-		*scontext = kstrdup(context->str, GFP_ATOMIC);
-		if (!(*scontext))
-			return -ENOMEM;
+		if (scontext) {
+			*scontext = kstrdup(context->str, GFP_ATOMIC);
+			if (!(*scontext))
+				return -ENOMEM;
+		}
 		return 0;
 	}
 
-- 
cgit v1.2.3


From 92ae9e82d9a2c4b9b388d6a9e7a4b2ccb0b4452f Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 13:46:46 -0400
Subject: SELinux: remove needless sel_div function

I'm not really sure what the idea behind the sel_div function is, but it's
useless.  Since a and b are both unsigned, it's impossible for a % b < 0.
That means that part of the function never does anything.  Thus it's just a
normal /.  Just do that instead.  I don't even understand what that operation
was supposed to mean in the signed case however....

If it was signed:
sel_div(-2, 4) == ((-2 / 4) - ((-2 % 4) < 0))
		  ((0)      - ((-2)     < 0))
		  ((0)      - (1))
		  (-1)

What actually happens:
sel_div(-2, 4) == ((18446744073709551614 / 4) - ((18446744073709551614 % 4) < 0))
		  ((4611686018427387903)      - ((2 < 0))
		  (4611686018427387903        - 0)
		  ((unsigned int)4611686018427387903)
		  (4294967295)

Neither makes a whole ton of sense to me.  So I'm getting rid of the
function entirely.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/selinuxfs.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index f4b5a0baaec..640feaa06c0 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1533,11 +1533,6 @@ static int sel_make_initcon_files(struct dentry *dir)
 	return 0;
 }
 
-static inline unsigned int sel_div(unsigned long a, unsigned long b)
-{
-	return a / b - (a % b < 0);
-}
-
 static inline unsigned long sel_class_to_ino(u16 class)
 {
 	return (class * (SEL_VEC_MAX + 1)) | SEL_CLASS_INO_OFFSET;
@@ -1545,7 +1540,7 @@ static inline unsigned long sel_class_to_ino(u16 class)
 
 static inline u16 sel_ino_to_class(unsigned long ino)
 {
-	return sel_div(ino & SEL_INO_MASK, SEL_VEC_MAX + 1);
+	return (ino & SEL_INO_MASK) / (SEL_VEC_MAX + 1);
 }
 
 static inline unsigned long sel_perm_to_ino(u16 class, u32 perm)
-- 
cgit v1.2.3


From 154c50ca4eb9ae472f50b6a481213e21ead4457d Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 13:47:11 -0400
Subject: SELinux: if sel_make_bools errors don't leave inconsistent state

We reset the bool names and values array to NULL, but do not reset the
number of entries in these arrays to 0.  If we error out and then get back
into this function we will walk these NULL pointers based on the belief
that they are non-zero length.

Signed-off-by: Eric Paris <eparis@redhat.com>
cc: stable@kernel.org
---
 security/selinux/selinuxfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 640feaa06c0..4e93f9ef970 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1233,6 +1233,7 @@ static int sel_make_bools(void)
 		kfree(bool_pending_names[i]);
 	kfree(bool_pending_names);
 	kfree(bool_pending_values);
+	bool_num = 0;
 	bool_pending_names = NULL;
 	bool_pending_values = NULL;
 
-- 
cgit v1.2.3


From 2e33405785d3eaec303c54b4a10afdebf3729da7 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:42 -0400
Subject: SELinux: delay initialization of audit data in
 selinux_inode_permission

We pay a rather large overhead initializing the common_audit_data.
Since we only need this information if we actually emit an audit
message there is little need to set it up in the hot path.  This patch
splits the functionality of avc_has_perm() into avc_has_perm_noaudit(),
avc_audit_required() and slow_avc_audit().  But we take care of setting
up to audit between required() and the actual audit call.  Thus saving
measurable time in a hot path.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/avc.c         | 63 +-------------------------------
 security/selinux/hooks.c       | 30 ++++++++++++++--
 security/selinux/include/avc.h | 82 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 105 insertions(+), 70 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 8ee42b2a5f1..1a04247e3a1 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -458,7 +458,7 @@ static void avc_audit_post_callback(struct audit_buffer *ab, void *a)
 }
 
 /* This is the slow part of avc audit with big stack footprint */
-static noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
+noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
 		u32 requested, u32 audited, u32 denied,
 		struct common_audit_data *a,
 		unsigned flags)
@@ -496,67 +496,6 @@ static noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
 	return 0;
 }
 
-/**
- * avc_audit - Audit the granting or denial of permissions.
- * @ssid: source security identifier
- * @tsid: target security identifier
- * @tclass: target security class
- * @requested: requested permissions
- * @avd: access vector decisions
- * @result: result from avc_has_perm_noaudit
- * @a:  auxiliary audit data
- * @flags: VFS walk flags
- *
- * Audit the granting or denial of permissions in accordance
- * with the policy.  This function is typically called by
- * avc_has_perm() after a permission check, but can also be
- * called directly by callers who use avc_has_perm_noaudit()
- * in order to separate the permission check from the auditing.
- * For example, this separation is useful when the permission check must
- * be performed under a lock, to allow the lock to be released
- * before calling the auditing code.
- */
-inline int avc_audit(u32 ssid, u32 tsid,
-	       u16 tclass, u32 requested,
-	       struct av_decision *avd, int result, struct common_audit_data *a,
-	       unsigned flags)
-{
-	u32 denied, audited;
-	denied = requested & ~avd->allowed;
-	if (unlikely(denied)) {
-		audited = denied & avd->auditdeny;
-		/*
-		 * a->selinux_audit_data->auditdeny is TRICKY!  Setting a bit in
-		 * this field means that ANY denials should NOT be audited if
-		 * the policy contains an explicit dontaudit rule for that
-		 * permission.  Take notice that this is unrelated to the
-		 * actual permissions that were denied.  As an example lets
-		 * assume:
-		 *
-		 * denied == READ
-		 * avd.auditdeny & ACCESS == 0 (not set means explicit rule)
-		 * selinux_audit_data->auditdeny & ACCESS == 1
-		 *
-		 * We will NOT audit the denial even though the denied
-		 * permission was READ and the auditdeny checks were for
-		 * ACCESS
-		 */
-		if (a &&
-		    a->selinux_audit_data->auditdeny &&
-		    !(a->selinux_audit_data->auditdeny & avd->auditdeny))
-			audited = 0;
-	} else if (result)
-		audited = denied = requested;
-	else
-		audited = requested & avd->auditallow;
-	if (likely(!audited))
-		return 0;
-
-	return slow_avc_audit(ssid, tsid, tclass,
-		requested, audited, denied,
-		a, flags);
-}
-
 /**
  * avc_add_callback - Register a callback for security events.
  * @callback: callback function
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c3ee902306d..c99027dc0b3 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2684,6 +2684,11 @@ static int selinux_inode_permission(struct inode *inode, int mask)
 	u32 perms;
 	bool from_access;
 	unsigned flags = mask & MAY_NOT_BLOCK;
+	struct inode_security_struct *isec;
+	u32 sid;
+	struct av_decision avd;
+	int rc, rc2;
+	u32 audited, denied;
 
 	from_access = mask & MAY_ACCESS;
 	mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND);
@@ -2692,6 +2697,23 @@ static int selinux_inode_permission(struct inode *inode, int mask)
 	if (!mask)
 		return 0;
 
+	validate_creds(cred);
+
+	if (unlikely(IS_PRIVATE(inode)))
+		return 0;
+
+	perms = file_mask_to_av(inode->i_mode, mask);
+
+	sid = cred_sid(cred);
+	isec = inode->i_security;
+
+	rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
+	audited = avc_audit_required(perms, &avd, rc,
+				     from_access ? FILE__AUDIT_ACCESS : 0,
+				     &denied);
+	if (likely(!audited))
+		return rc;
+
 	COMMON_AUDIT_DATA_INIT(&ad, INODE);
 	ad.selinux_audit_data = &sad;
 	ad.u.inode = inode;
@@ -2699,9 +2721,11 @@ static int selinux_inode_permission(struct inode *inode, int mask)
 	if (from_access)
 		ad.selinux_audit_data->auditdeny |= FILE__AUDIT_ACCESS;
 
-	perms = file_mask_to_av(inode->i_mode, mask);
-
-	return inode_has_perm(cred, inode, perms, &ad, flags);
+	rc2 = slow_avc_audit(sid, isec->sid, isec->sclass, perms,
+			     audited, denied, &ad, flags);
+	if (rc2)
+		return rc2;
+	return rc;
 }
 
 static int selinux_inode_setattr(struct dentry *dentry, struct iattr *iattr)
diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h
index 1931370233d..e4e50bb218e 100644
--- a/security/selinux/include/avc.h
+++ b/security/selinux/include/avc.h
@@ -77,11 +77,83 @@ struct selinux_audit_data {
 
 void __init avc_init(void);
 
-int avc_audit(u32 ssid, u32 tsid,
-	       u16 tclass, u32 requested,
-	       struct av_decision *avd,
-	       int result,
-	      struct common_audit_data *a, unsigned flags);
+static inline u32 avc_audit_required(u32 requested,
+			      struct av_decision *avd,
+			      int result,
+			      u32 auditdeny,
+			      u32 *deniedp)
+{
+	u32 denied, audited;
+	denied = requested & ~avd->allowed;
+	if (unlikely(denied)) {
+		audited = denied & avd->auditdeny;
+		/*
+		 * auditdeny is TRICKY!  Setting a bit in
+		 * this field means that ANY denials should NOT be audited if
+		 * the policy contains an explicit dontaudit rule for that
+		 * permission.  Take notice that this is unrelated to the
+		 * actual permissions that were denied.  As an example lets
+		 * assume:
+		 *
+		 * denied == READ
+		 * avd.auditdeny & ACCESS == 0 (not set means explicit rule)
+		 * auditdeny & ACCESS == 1
+		 *
+		 * We will NOT audit the denial even though the denied
+		 * permission was READ and the auditdeny checks were for
+		 * ACCESS
+		 */
+		if (auditdeny && !(auditdeny & avd->auditdeny))
+			audited = 0;
+	} else if (result)
+		audited = denied = requested;
+	else
+		audited = requested & avd->auditallow;
+	*deniedp = denied;
+	return audited;
+}
+
+int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
+		   u32 requested, u32 audited, u32 denied,
+		   struct common_audit_data *a,
+		   unsigned flags);
+
+/**
+ * avc_audit - Audit the granting or denial of permissions.
+ * @ssid: source security identifier
+ * @tsid: target security identifier
+ * @tclass: target security class
+ * @requested: requested permissions
+ * @avd: access vector decisions
+ * @result: result from avc_has_perm_noaudit
+ * @a:  auxiliary audit data
+ * @flags: VFS walk flags
+ *
+ * Audit the granting or denial of permissions in accordance
+ * with the policy.  This function is typically called by
+ * avc_has_perm() after a permission check, but can also be
+ * called directly by callers who use avc_has_perm_noaudit()
+ * in order to separate the permission check from the auditing.
+ * For example, this separation is useful when the permission check must
+ * be performed under a lock, to allow the lock to be released
+ * before calling the auditing code.
+ */
+static inline int avc_audit(u32 ssid, u32 tsid,
+			    u16 tclass, u32 requested,
+			    struct av_decision *avd,
+			    int result,
+			    struct common_audit_data *a, unsigned flags)
+{
+	u32 audited, denied;
+	audited = avc_audit_required(requested, avd, result,
+				     a ? a->selinux_audit_data->auditdeny : 0,
+				     &denied);
+	if (likely(!audited))
+		return 0;
+	return slow_avc_audit(ssid, tsid, tclass,
+			      requested, audited, denied,
+			      a, flags);
+}
 
 #define AVC_STRICT 1 /* Ignore permissive mode. */
 int avc_has_perm_noaudit(u32 ssid, u32 tsid,
-- 
cgit v1.2.3


From 602a8dd6ea6abd463bc26310c4a1b44919f88e68 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:42 -0400
Subject: SELinux: remove inode_has_perm_noadp

Both callers could better be using file_has_perm() to get better audit
results.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/hooks.c | 28 ++++------------------------
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c99027dc0b3..8417a6afaf3 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1488,20 +1488,6 @@ static int inode_has_perm(const struct cred *cred,
 	return avc_has_perm_flags(sid, isec->sid, isec->sclass, perms, adp, flags);
 }
 
-static int inode_has_perm_noadp(const struct cred *cred,
-				struct inode *inode,
-				u32 perms,
-				unsigned flags)
-{
-	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
-
-	COMMON_AUDIT_DATA_INIT(&ad, INODE);
-	ad.u.inode = inode;
-	ad.selinux_audit_data = &sad;
-	return inode_has_perm(cred, inode, perms, &ad, flags);
-}
-
 /* Same as inode_has_perm, but pass explicit audit data containing
    the dentry to help the auditing code to more easily generate the
    pathname if needed. */
@@ -2128,21 +2114,17 @@ static inline void flush_unauthorized_files(const struct cred *cred,
 		spin_lock(&tty_files_lock);
 		if (!list_empty(&tty->tty_files)) {
 			struct tty_file_private *file_priv;
-			struct inode *inode;
 
 			/* Revalidate access to controlling tty.
-			   Use inode_has_perm on the tty inode directly rather
+			   Use path_has_perm on the tty path directly rather
 			   than using file_has_perm, as this particular open
 			   file may belong to another process and we are only
 			   interested in the inode-based check here. */
 			file_priv = list_first_entry(&tty->tty_files,
 						struct tty_file_private, list);
 			file = file_priv->file;
-			inode = file->f_path.dentry->d_inode;
-			if (inode_has_perm_noadp(cred, inode,
-					   FILE__READ | FILE__WRITE, 0)) {
+			if (path_has_perm(cred, &file->f_path, FILE__READ | FILE__WRITE))
 				drop_tty = 1;
-			}
 		}
 		spin_unlock(&tty_files_lock);
 		tty_kref_put(tty);
@@ -3276,12 +3258,10 @@ static int selinux_file_receive(struct file *file)
 static int selinux_file_open(struct file *file, const struct cred *cred)
 {
 	struct file_security_struct *fsec;
-	struct inode *inode;
 	struct inode_security_struct *isec;
 
-	inode = file->f_path.dentry->d_inode;
 	fsec = file->f_security;
-	isec = inode->i_security;
+	isec = file->f_path.dentry->d_inode->i_security;
 	/*
 	 * Save inode label and policy sequence number
 	 * at open-time so that selinux_file_permission
@@ -3299,7 +3279,7 @@ static int selinux_file_open(struct file *file, const struct cred *cred)
 	 * new inode label or new policy.
 	 * This check is not redundant - do not remove.
 	 */
-	return inode_has_perm_noadp(cred, inode, open_file_to_av(file), 0);
+	return path_has_perm(cred, &file->f_path, open_file_to_av(file));
 }
 
 /* task security operations */
-- 
cgit v1.2.3


From d4cf970d0732628d514405c5a975024b9e205b0b Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:42 -0400
Subject: SELinux: move common_audit_data to a noinline slow path function

selinux_inode_has_perm is a hot path.  Instead of declaring the
common_audit_data on the stack move it to a noinline function only used in
the rare case we need to send an audit message.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/hooks.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 8417a6afaf3..b3bd8e1d268 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2658,11 +2658,29 @@ static int selinux_inode_follow_link(struct dentry *dentry, struct nameidata *na
 	return dentry_has_perm(cred, dentry, FILE__READ);
 }
 
-static int selinux_inode_permission(struct inode *inode, int mask)
+static noinline int audit_inode_permission(struct inode *inode,
+					   u32 perms, u32 audited, u32 denied,
+					   unsigned flags)
 {
-	const struct cred *cred = current_cred();
 	struct common_audit_data ad;
 	struct selinux_audit_data sad = {0,};
+	struct inode_security_struct *isec = inode->i_security;
+	int rc;
+
+	COMMON_AUDIT_DATA_INIT(&ad, INODE);
+	ad.selinux_audit_data = &sad;
+	ad.u.inode = inode;
+
+	rc = slow_avc_audit(current_sid(), isec->sid, isec->sclass, perms,
+			    audited, denied, &ad, flags);
+	if (rc)
+		return rc;
+	return 0;
+}
+
+static int selinux_inode_permission(struct inode *inode, int mask)
+{
+	const struct cred *cred = current_cred();
 	u32 perms;
 	bool from_access;
 	unsigned flags = mask & MAY_NOT_BLOCK;
@@ -2696,15 +2714,7 @@ static int selinux_inode_permission(struct inode *inode, int mask)
 	if (likely(!audited))
 		return rc;
 
-	COMMON_AUDIT_DATA_INIT(&ad, INODE);
-	ad.selinux_audit_data = &sad;
-	ad.u.inode = inode;
-
-	if (from_access)
-		ad.selinux_audit_data->auditdeny |= FILE__AUDIT_ACCESS;
-
-	rc2 = slow_avc_audit(sid, isec->sid, isec->sclass, perms,
-			     audited, denied, &ad, flags);
+	rc2 = audit_inode_permission(inode, perms, audited, denied, flags);
 	if (rc2)
 		return rc2;
 	return rc;
-- 
cgit v1.2.3


From bd5e50f9c1c71daac273fa586424f07205f6b13b Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:42 -0400
Subject: LSM: remove the COMMON_AUDIT_DATA_INIT type expansion

Just open code it so grep on the source code works better.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 include/linux/lsm_audit.h         |  2 +-
 security/apparmor/capability.c    |  2 +-
 security/apparmor/file.c          |  2 +-
 security/apparmor/ipc.c           |  2 +-
 security/apparmor/lib.c           |  2 +-
 security/apparmor/lsm.c           |  2 +-
 security/apparmor/policy.c        |  2 +-
 security/apparmor/policy_unpack.c |  2 +-
 security/apparmor/resource.c      |  2 +-
 security/selinux/avc.c            |  2 +-
 security/selinux/hooks.c          | 68 +++++++++++++++++++--------------------
 11 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index fad48aab893..9e1ebf5851b 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -96,7 +96,7 @@ int ipv6_skb_to_auditdata(struct sk_buff *skb,
 /* Initialize an LSM audit data structure. */
 #define COMMON_AUDIT_DATA_INIT(_d, _t) \
 	{ memset((_d), 0, sizeof(struct common_audit_data)); \
-	 (_d)->type = LSM_AUDIT_DATA_##_t; }
+	 (_d)->type = _t; }
 
 void common_lsm_audit(struct common_audit_data *a,
 	void (*pre_audit)(struct audit_buffer *, void *),
diff --git a/security/apparmor/capability.c b/security/apparmor/capability.c
index 088dba3bf7d..3ecb8b7d850 100644
--- a/security/apparmor/capability.c
+++ b/security/apparmor/capability.c
@@ -65,7 +65,7 @@ static int audit_caps(struct aa_profile *profile, struct task_struct *task,
 	int type = AUDIT_APPARMOR_AUTO;
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, CAP);
+	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_CAP);
 	sa.aad = &aad;
 	sa.tsk = task;
 	sa.u.cap = cap;
diff --git a/security/apparmor/file.c b/security/apparmor/file.c
index 2f8fcba9ce4..6ab264ca85c 100644
--- a/security/apparmor/file.c
+++ b/security/apparmor/file.c
@@ -108,7 +108,7 @@ int aa_audit_file(struct aa_profile *profile, struct file_perms *perms,
 	int type = AUDIT_APPARMOR_AUTO;
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, NONE);
+	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
 	sa.aad = &aad;
 	aad.op = op,
 	aad.fs.request = request;
diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
index c3da93a5150..dba449b74db 100644
--- a/security/apparmor/ipc.c
+++ b/security/apparmor/ipc.c
@@ -42,7 +42,7 @@ static int aa_audit_ptrace(struct aa_profile *profile,
 {
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, NONE);
+	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
 	sa.aad = &aad;
 	aad.op = OP_PTRACE;
 	aad.target = target;
diff --git a/security/apparmor/lib.c b/security/apparmor/lib.c
index e75829ba0ff..b11a2652f54 100644
--- a/security/apparmor/lib.c
+++ b/security/apparmor/lib.c
@@ -66,7 +66,7 @@ void aa_info_message(const char *str)
 	if (audit_enabled) {
 		struct common_audit_data sa;
 		struct apparmor_audit_data aad = {0,};
-		COMMON_AUDIT_DATA_INIT(&sa, NONE);
+		COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
 		sa.aad = &aad;
 		aad.info = str;
 		aa_audit_msg(AUDIT_APPARMOR_STATUS, &sa, NULL);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 02fddcd4c64..4f7bc07b2dc 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -589,7 +589,7 @@ static int apparmor_setprocattr(struct task_struct *task, char *name,
 		} else {
 			struct common_audit_data sa;
 			struct apparmor_audit_data aad = {0,};
-			COMMON_AUDIT_DATA_INIT(&sa, NONE);
+			COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
 			sa.aad = &aad;
 			aad.op = OP_SETPROCATTR;
 			aad.info = name;
diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
index f1f7506a464..03dbaef2f8e 100644
--- a/security/apparmor/policy.c
+++ b/security/apparmor/policy.c
@@ -965,7 +965,7 @@ static int audit_policy(int op, gfp_t gfp, const char *name, const char *info,
 {
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, NONE);
+	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
 	sa.aad = &aad;
 	aad.op = op;
 	aad.name = name;
diff --git a/security/apparmor/policy_unpack.c b/security/apparmor/policy_unpack.c
index deab7c7e8dc..504ba4015aa 100644
--- a/security/apparmor/policy_unpack.c
+++ b/security/apparmor/policy_unpack.c
@@ -95,7 +95,7 @@ static int audit_iface(struct aa_profile *new, const char *name,
 	struct aa_profile *profile = __aa_current_profile();
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, NONE);
+	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
 	sa.aad = &aad;
 	if (e)
 		aad.iface.pos = e->pos - e->start;
diff --git a/security/apparmor/resource.c b/security/apparmor/resource.c
index 2fe8613efe3..d06f57b74f7 100644
--- a/security/apparmor/resource.c
+++ b/security/apparmor/resource.c
@@ -52,7 +52,7 @@ static int audit_resource(struct aa_profile *profile, unsigned int resource,
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&sa, NONE);
+	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
 	sa.aad = &aad;
 	aad.op = OP_SETRLIMIT,
 	aad.rlim.rlim = resource;
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 1a04247e3a1..c04eea2bdb0 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -469,7 +469,7 @@ noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
 
 	if (!a) {
 		a = &stack_data;
-		COMMON_AUDIT_DATA_INIT(a, NONE);
+		COMMON_AUDIT_DATA_INIT(a, LSM_AUDIT_DATA_NONE);
 		a->selinux_audit_data = &sad;
 	}
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index b3bd8e1d268..9f038449300 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1427,7 +1427,7 @@ static int cred_has_capability(const struct cred *cred,
 	u32 av = CAP_TO_MASK(cap);
 	int rc;
 
-	COMMON_AUDIT_DATA_INIT(&ad, CAP);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_CAP);
 	ad.selinux_audit_data = &sad;
 	ad.tsk = current;
 	ad.u.cap = cap;
@@ -1499,7 +1499,7 @@ static inline int dentry_has_perm(const struct cred *cred,
 	struct common_audit_data ad;
 	struct selinux_audit_data sad = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&ad, DENTRY);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
 	ad.u.dentry = dentry;
 	ad.selinux_audit_data = &sad;
 	return inode_has_perm(cred, inode, av, &ad, 0);
@@ -1516,7 +1516,7 @@ static inline int path_has_perm(const struct cred *cred,
 	struct common_audit_data ad;
 	struct selinux_audit_data sad = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&ad, PATH);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_PATH);
 	ad.u.path = *path;
 	ad.selinux_audit_data = &sad;
 	return inode_has_perm(cred, inode, av, &ad, 0);
@@ -1541,7 +1541,7 @@ static int file_has_perm(const struct cred *cred,
 	u32 sid = cred_sid(cred);
 	int rc;
 
-	COMMON_AUDIT_DATA_INIT(&ad, PATH);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_PATH);
 	ad.u.path = file->f_path;
 	ad.selinux_audit_data = &sad;
 
@@ -1582,7 +1582,7 @@ static int may_create(struct inode *dir,
 	sid = tsec->sid;
 	newsid = tsec->create_sid;
 
-	COMMON_AUDIT_DATA_INIT(&ad, DENTRY);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
 	ad.u.dentry = dentry;
 	ad.selinux_audit_data = &sad;
 
@@ -1637,7 +1637,7 @@ static int may_link(struct inode *dir,
 	dsec = dir->i_security;
 	isec = dentry->d_inode->i_security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, DENTRY);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
 	ad.u.dentry = dentry;
 	ad.selinux_audit_data = &sad;
 
@@ -1685,7 +1685,7 @@ static inline int may_rename(struct inode *old_dir,
 	old_is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
 	new_dsec = new_dir->i_security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, DENTRY);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
 	ad.selinux_audit_data = &sad;
 
 	ad.u.dentry = old_dentry;
@@ -2011,7 +2011,7 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 			return rc;
 	}
 
-	COMMON_AUDIT_DATA_INIT(&ad, PATH);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_PATH);
 	ad.selinux_audit_data = &sad;
 	ad.u.path = bprm->file->f_path;
 
@@ -2135,7 +2135,7 @@ static inline void flush_unauthorized_files(const struct cred *cred,
 
 	/* Revalidate access to inherited open files. */
 
-	COMMON_AUDIT_DATA_INIT(&ad, INODE);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_INODE);
 	ad.selinux_audit_data = &sad;
 
 	spin_lock(&files->file_lock);
@@ -2485,7 +2485,7 @@ static int selinux_sb_kern_mount(struct super_block *sb, int flags, void *data)
 	if (flags & MS_KERNMOUNT)
 		return 0;
 
-	COMMON_AUDIT_DATA_INIT(&ad, DENTRY);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
 	ad.selinux_audit_data = &sad;
 	ad.u.dentry = sb->s_root;
 	return superblock_has_perm(cred, sb, FILESYSTEM__MOUNT, &ad);
@@ -2497,7 +2497,7 @@ static int selinux_sb_statfs(struct dentry *dentry)
 	struct common_audit_data ad;
 	struct selinux_audit_data sad = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&ad, DENTRY);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
 	ad.selinux_audit_data = &sad;
 	ad.u.dentry = dentry->d_sb->s_root;
 	return superblock_has_perm(cred, dentry->d_sb, FILESYSTEM__GETATTR, &ad);
@@ -2667,7 +2667,7 @@ static noinline int audit_inode_permission(struct inode *inode,
 	struct inode_security_struct *isec = inode->i_security;
 	int rc;
 
-	COMMON_AUDIT_DATA_INIT(&ad, INODE);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_INODE);
 	ad.selinux_audit_data = &sad;
 	ad.u.inode = inode;
 
@@ -2797,7 +2797,7 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
 	if (!inode_owner_or_capable(inode))
 		return -EPERM;
 
-	COMMON_AUDIT_DATA_INIT(&ad, DENTRY);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
 	ad.selinux_audit_data = &sad;
 	ad.u.dentry = dentry;
 
@@ -3412,7 +3412,7 @@ static int selinux_kernel_module_request(char *kmod_name)
 
 	sid = task_sid(current);
 
-	COMMON_AUDIT_DATA_INIT(&ad, KMOD);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_KMOD);
 	ad.selinux_audit_data = &sad;
 	ad.u.kmod_name = kmod_name;
 
@@ -3793,7 +3793,7 @@ static int sock_has_perm(struct task_struct *task, struct sock *sk, u32 perms)
 	if (sksec->sid == SECINITSID_KERNEL)
 		return 0;
 
-	COMMON_AUDIT_DATA_INIT(&ad, NET);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = sk;
@@ -3901,7 +3901,7 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 						      snum, &sid);
 				if (err)
 					goto out;
-				COMMON_AUDIT_DATA_INIT(&ad, NET);
+				COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 				ad.selinux_audit_data = &sad;
 				ad.u.net = &net;
 				ad.u.net->sport = htons(snum);
@@ -3936,7 +3936,7 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 		if (err)
 			goto out;
 
-		COMMON_AUDIT_DATA_INIT(&ad, NET);
+		COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 		ad.selinux_audit_data = &sad;
 		ad.u.net = &net;
 		ad.u.net->sport = htons(snum);
@@ -3998,7 +3998,7 @@ static int selinux_socket_connect(struct socket *sock, struct sockaddr *address,
 		perm = (sksec->sclass == SECCLASS_TCP_SOCKET) ?
 		       TCP_SOCKET__NAME_CONNECT : DCCP_SOCKET__NAME_CONNECT;
 
-		COMMON_AUDIT_DATA_INIT(&ad, NET);
+		COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 		ad.selinux_audit_data = &sad;
 		ad.u.net = &net;
 		ad.u.net->dport = htons(snum);
@@ -4095,7 +4095,7 @@ static int selinux_socket_unix_stream_connect(struct sock *sock,
 	struct lsm_network_audit net = {0,};
 	int err;
 
-	COMMON_AUDIT_DATA_INIT(&ad, NET);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = other;
@@ -4128,7 +4128,7 @@ static int selinux_socket_unix_may_send(struct socket *sock,
 	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&ad, NET);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = other->sk;
@@ -4171,7 +4171,7 @@ static int selinux_sock_rcv_skb_compat(struct sock *sk, struct sk_buff *skb,
 	struct lsm_network_audit net = {0,};
 	char *addrp;
 
-	COMMON_AUDIT_DATA_INIT(&ad, NET);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = skb->skb_iif;
@@ -4227,7 +4227,7 @@ static int selinux_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if (!secmark_active && !peerlbl_active)
 		return 0;
 
-	COMMON_AUDIT_DATA_INIT(&ad, NET);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = skb->skb_iif;
@@ -4584,7 +4584,7 @@ static unsigned int selinux_ip_forward(struct sk_buff *skb, int ifindex,
 	if (selinux_skb_peerlbl_sid(skb, family, &peer_sid) != 0)
 		return NF_DROP;
 
-	COMMON_AUDIT_DATA_INIT(&ad, NET);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
@@ -4684,7 +4684,7 @@ static unsigned int selinux_ip_postroute_compat(struct sk_buff *skb,
 		return NF_ACCEPT;
 	sksec = sk->sk_security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, NET);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
@@ -4757,7 +4757,7 @@ static unsigned int selinux_ip_postroute(struct sk_buff *skb, int ifindex,
 		secmark_perm = PACKET__SEND;
 	}
 
-	COMMON_AUDIT_DATA_INIT(&ad, NET);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
@@ -4881,7 +4881,7 @@ static int ipc_has_perm(struct kern_ipc_perm *ipc_perms,
 
 	isec = ipc_perms->security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = ipc_perms->key;
 
@@ -4913,7 +4913,7 @@ static int selinux_msg_queue_alloc_security(struct msg_queue *msq)
 
 	isec = msq->q_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
@@ -4940,7 +4940,7 @@ static int selinux_msg_queue_associate(struct msg_queue *msq, int msqflg)
 
 	isec = msq->q_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
@@ -5002,7 +5002,7 @@ static int selinux_msg_queue_msgsnd(struct msg_queue *msq, struct msg_msg *msg,
 			return rc;
 	}
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
@@ -5035,7 +5035,7 @@ static int selinux_msg_queue_msgrcv(struct msg_queue *msq, struct msg_msg *msg,
 	isec = msq->q_perm.security;
 	msec = msg->security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
@@ -5062,7 +5062,7 @@ static int selinux_shm_alloc_security(struct shmid_kernel *shp)
 
 	isec = shp->shm_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = shp->shm_perm.key;
 
@@ -5089,7 +5089,7 @@ static int selinux_shm_associate(struct shmid_kernel *shp, int shmflg)
 
 	isec = shp->shm_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = shp->shm_perm.key;
 
@@ -5158,7 +5158,7 @@ static int selinux_sem_alloc_security(struct sem_array *sma)
 
 	isec = sma->sem_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = sma->sem_perm.key;
 
@@ -5185,7 +5185,7 @@ static int selinux_sem_associate(struct sem_array *sma, int semflg)
 
 	isec = sma->sem_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, IPC);
+	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = sma->sem_perm.key;
 
-- 
cgit v1.2.3


From 0972c74ecba4878baa5f97bb78b242c0eefacfb6 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:42 -0400
Subject: apparmor: move task from common_audit_data to apparmor_audit_data

apparmor is the only LSM that uses the common_audit_data tsk field.
Instead of making all LSMs pay for the stack space move the aa usage into
the apparmor_audit_data.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/apparmor/audit.c         | 11 +++++++++--
 security/apparmor/capability.c    |  2 +-
 security/apparmor/include/audit.h |  1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c
index cc3520d39a7..3ae28db5a64 100644
--- a/security/apparmor/audit.c
+++ b/security/apparmor/audit.c
@@ -111,7 +111,7 @@ static const char *const aa_audit_type[] = {
 static void audit_pre(struct audit_buffer *ab, void *ca)
 {
 	struct common_audit_data *sa = ca;
-	struct task_struct *tsk = sa->tsk ? sa->tsk : current;
+	struct task_struct *tsk = sa->aad->tsk ? sa->aad->tsk : current;
 
 	if (aa_g_audit_header) {
 		audit_log_format(ab, "apparmor=");
@@ -149,6 +149,12 @@ static void audit_pre(struct audit_buffer *ab, void *ca)
 		audit_log_format(ab, " name=");
 		audit_log_untrustedstring(ab, sa->aad->name);
 	}
+
+	if (sa->aad->tsk) {
+		audit_log_format(ab, " pid=%d comm=", tsk->pid);
+		audit_log_untrustedstring(ab, tsk->comm);
+	}
+
 }
 
 /**
@@ -205,7 +211,8 @@ int aa_audit(int type, struct aa_profile *profile, gfp_t gfp,
 	aa_audit_msg(type, sa, cb);
 
 	if (sa->aad->type == AUDIT_APPARMOR_KILL)
-		(void)send_sig_info(SIGKILL, NULL, sa->tsk ? sa->tsk : current);
+		(void)send_sig_info(SIGKILL, NULL,
+				    sa->aad->tsk ?  sa->aad->tsk : current);
 
 	if (sa->aad->type == AUDIT_APPARMOR_ALLOWED)
 		return complain_error(sa->aad->error);
diff --git a/security/apparmor/capability.c b/security/apparmor/capability.c
index 3ecb8b7d850..b66a0e4a569 100644
--- a/security/apparmor/capability.c
+++ b/security/apparmor/capability.c
@@ -67,8 +67,8 @@ static int audit_caps(struct aa_profile *profile, struct task_struct *task,
 	struct apparmor_audit_data aad = {0,};
 	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_CAP);
 	sa.aad = &aad;
-	sa.tsk = task;
 	sa.u.cap = cap;
+	sa.aad->tsk = task;
 	sa.aad->op = OP_CAPABLE;
 	sa.aad->error = error;
 
diff --git a/security/apparmor/include/audit.h b/security/apparmor/include/audit.h
index 3868b1e5d5b..4b7e18951ae 100644
--- a/security/apparmor/include/audit.h
+++ b/security/apparmor/include/audit.h
@@ -110,6 +110,7 @@ struct apparmor_audit_data {
 	void *profile;
 	const char *name;
 	const char *info;
+	struct task_struct *tsk;
 	union {
 		void *target;
 		struct {
-- 
cgit v1.2.3


From b466066f9b648ccb6aa1e174f0389b7433e460fd Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:43 -0400
Subject: LSM: remove the task field from common_audit_data

There are no legitimate users.  Always use current and get back some stack
space for the common_audit_data.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 include/linux/lsm_audit.h | 1 -
 security/lsm_audit.c      | 8 ++------
 security/selinux/hooks.c  | 1 -
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 9e1ebf5851b..75368c1aac7 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -53,7 +53,6 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_KMOD	8
 #define LSM_AUDIT_DATA_INODE	9
 #define LSM_AUDIT_DATA_DENTRY	10
-	struct task_struct *tsk;
 	union 	{
 		struct path path;
 		struct dentry *dentry;
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 90c129b0102..e796d251765 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -213,12 +213,8 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 {
 	struct task_struct *tsk = current;
 
-	if (a->tsk)
-		tsk = a->tsk;
-	if (tsk && tsk->pid) {
-		audit_log_format(ab, " pid=%d comm=", tsk->pid);
-		audit_log_untrustedstring(ab, tsk->comm);
-	}
+	audit_log_format(ab, " pid=%d comm=", tsk->pid);
+	audit_log_untrustedstring(ab, tsk->comm);
 
 	switch (a->type) {
 	case LSM_AUDIT_DATA_NONE:
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 9f038449300..d79762946c6 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1429,7 +1429,6 @@ static int cred_has_capability(const struct cred *cred,
 
 	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_CAP);
 	ad.selinux_audit_data = &sad;
-	ad.tsk = current;
 	ad.u.cap = cap;
 
 	switch (CAP_TO_INDEX(cap)) {
-- 
cgit v1.2.3


From 07f62eb66c6626aa5653a0fcb34c9c040d0bd032 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:43 -0400
Subject: LSM: BUILD_BUG_ON if the common_audit_data union ever grows

We did a lot of work to shrink the common_audit_data.  Add a BUILD_BUG_ON
so future programers (let's be honest, probably me) won't do something
foolish like make it large again!

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/lsm_audit.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index e796d251765..8d8d97dbb38 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -213,6 +213,13 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 {
 	struct task_struct *tsk = current;
 
+	/*
+	 * To keep stack sizes in check force programers to notice if they
+	 * start making this union too large!  See struct lsm_network_audit
+	 * as an example of how to deal with large data.
+	 */
+	BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2);
+
 	audit_log_format(ab, " pid=%d comm=", tsk->pid);
 	audit_log_untrustedstring(ab, tsk->comm);
 
-- 
cgit v1.2.3


From 50c205f5e5c2e2af002fd4ef537ded79b90b1b56 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:43 -0400
Subject: LSM: do not initialize common_audit_data to 0

It isn't needed.  If you don't set the type of the data associated with
that type it is a pretty obvious programming bug.  So why waste the cycles?

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 include/linux/lsm_audit.h         |  5 ---
 security/apparmor/capability.c    |  2 +-
 security/apparmor/file.c          |  2 +-
 security/apparmor/ipc.c           |  2 +-
 security/apparmor/lib.c           |  2 +-
 security/apparmor/lsm.c           |  2 +-
 security/apparmor/policy.c        |  2 +-
 security/apparmor/policy_unpack.c |  2 +-
 security/apparmor/resource.c      |  2 +-
 security/selinux/avc.c            |  2 +-
 security/selinux/hooks.c          | 68 +++++++++++++++++++--------------------
 security/smack/smack.h            |  2 +-
 12 files changed, 44 insertions(+), 49 deletions(-)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 75368c1aac7..1cc89e9df48 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -92,11 +92,6 @@ int ipv4_skb_to_auditdata(struct sk_buff *skb,
 int ipv6_skb_to_auditdata(struct sk_buff *skb,
 		struct common_audit_data *ad, u8 *proto);
 
-/* Initialize an LSM audit data structure. */
-#define COMMON_AUDIT_DATA_INIT(_d, _t) \
-	{ memset((_d), 0, sizeof(struct common_audit_data)); \
-	 (_d)->type = _t; }
-
 void common_lsm_audit(struct common_audit_data *a,
 	void (*pre_audit)(struct audit_buffer *, void *),
 	void (*post_audit)(struct audit_buffer *, void *));
diff --git a/security/apparmor/capability.c b/security/apparmor/capability.c
index b66a0e4a569..887a5e94894 100644
--- a/security/apparmor/capability.c
+++ b/security/apparmor/capability.c
@@ -65,7 +65,7 @@ static int audit_caps(struct aa_profile *profile, struct task_struct *task,
 	int type = AUDIT_APPARMOR_AUTO;
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_CAP);
+	sa.type = LSM_AUDIT_DATA_CAP;
 	sa.aad = &aad;
 	sa.u.cap = cap;
 	sa.aad->tsk = task;
diff --git a/security/apparmor/file.c b/security/apparmor/file.c
index 6ab264ca85c..cf19d4093ca 100644
--- a/security/apparmor/file.c
+++ b/security/apparmor/file.c
@@ -108,7 +108,7 @@ int aa_audit_file(struct aa_profile *profile, struct file_perms *perms,
 	int type = AUDIT_APPARMOR_AUTO;
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
+	sa.type = LSM_AUDIT_DATA_NONE;
 	sa.aad = &aad;
 	aad.op = op,
 	aad.fs.request = request;
diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
index dba449b74db..cf1071b1423 100644
--- a/security/apparmor/ipc.c
+++ b/security/apparmor/ipc.c
@@ -42,7 +42,7 @@ static int aa_audit_ptrace(struct aa_profile *profile,
 {
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
+	sa.type = LSM_AUDIT_DATA_NONE;
 	sa.aad = &aad;
 	aad.op = OP_PTRACE;
 	aad.target = target;
diff --git a/security/apparmor/lib.c b/security/apparmor/lib.c
index b11a2652f54..7430298116d 100644
--- a/security/apparmor/lib.c
+++ b/security/apparmor/lib.c
@@ -66,7 +66,7 @@ void aa_info_message(const char *str)
 	if (audit_enabled) {
 		struct common_audit_data sa;
 		struct apparmor_audit_data aad = {0,};
-		COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
+		sa.type = LSM_AUDIT_DATA_NONE;
 		sa.aad = &aad;
 		aad.info = str;
 		aa_audit_msg(AUDIT_APPARMOR_STATUS, &sa, NULL);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 4f7bc07b2dc..032daab449b 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -589,7 +589,7 @@ static int apparmor_setprocattr(struct task_struct *task, char *name,
 		} else {
 			struct common_audit_data sa;
 			struct apparmor_audit_data aad = {0,};
-			COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
+			sa.type = LSM_AUDIT_DATA_NONE;
 			sa.aad = &aad;
 			aad.op = OP_SETPROCATTR;
 			aad.info = name;
diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
index 03dbaef2f8e..421681c7c34 100644
--- a/security/apparmor/policy.c
+++ b/security/apparmor/policy.c
@@ -965,7 +965,7 @@ static int audit_policy(int op, gfp_t gfp, const char *name, const char *info,
 {
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
+	sa.type = LSM_AUDIT_DATA_NONE;
 	sa.aad = &aad;
 	aad.op = op;
 	aad.name = name;
diff --git a/security/apparmor/policy_unpack.c b/security/apparmor/policy_unpack.c
index 504ba4015aa..329b1fd3074 100644
--- a/security/apparmor/policy_unpack.c
+++ b/security/apparmor/policy_unpack.c
@@ -95,7 +95,7 @@ static int audit_iface(struct aa_profile *new, const char *name,
 	struct aa_profile *profile = __aa_current_profile();
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
-	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
+	sa.type = LSM_AUDIT_DATA_NONE;
 	sa.aad = &aad;
 	if (e)
 		aad.iface.pos = e->pos - e->start;
diff --git a/security/apparmor/resource.c b/security/apparmor/resource.c
index d06f57b74f7..e1f3d7ef2c5 100644
--- a/security/apparmor/resource.c
+++ b/security/apparmor/resource.c
@@ -52,7 +52,7 @@ static int audit_resource(struct aa_profile *profile, unsigned int resource,
 	struct common_audit_data sa;
 	struct apparmor_audit_data aad = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&sa, LSM_AUDIT_DATA_NONE);
+	sa.type = LSM_AUDIT_DATA_NONE;
 	sa.aad = &aad;
 	aad.op = OP_SETRLIMIT,
 	aad.rlim.rlim = resource;
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index c04eea2bdb0..cd91e25667d 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -469,7 +469,7 @@ noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
 
 	if (!a) {
 		a = &stack_data;
-		COMMON_AUDIT_DATA_INIT(a, LSM_AUDIT_DATA_NONE);
+		a->type = LSM_AUDIT_DATA_NONE;
 		a->selinux_audit_data = &sad;
 	}
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d79762946c6..d9fa2489a55 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1427,7 +1427,7 @@ static int cred_has_capability(const struct cred *cred,
 	u32 av = CAP_TO_MASK(cap);
 	int rc;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_CAP);
+	ad.type = LSM_AUDIT_DATA_CAP;
 	ad.selinux_audit_data = &sad;
 	ad.u.cap = cap;
 
@@ -1498,7 +1498,7 @@ static inline int dentry_has_perm(const struct cred *cred,
 	struct common_audit_data ad;
 	struct selinux_audit_data sad = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
+	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.u.dentry = dentry;
 	ad.selinux_audit_data = &sad;
 	return inode_has_perm(cred, inode, av, &ad, 0);
@@ -1515,7 +1515,7 @@ static inline int path_has_perm(const struct cred *cred,
 	struct common_audit_data ad;
 	struct selinux_audit_data sad = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_PATH);
+	ad.type = LSM_AUDIT_DATA_PATH;
 	ad.u.path = *path;
 	ad.selinux_audit_data = &sad;
 	return inode_has_perm(cred, inode, av, &ad, 0);
@@ -1540,7 +1540,7 @@ static int file_has_perm(const struct cred *cred,
 	u32 sid = cred_sid(cred);
 	int rc;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_PATH);
+	ad.type = LSM_AUDIT_DATA_PATH;
 	ad.u.path = file->f_path;
 	ad.selinux_audit_data = &sad;
 
@@ -1581,7 +1581,7 @@ static int may_create(struct inode *dir,
 	sid = tsec->sid;
 	newsid = tsec->create_sid;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
+	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.u.dentry = dentry;
 	ad.selinux_audit_data = &sad;
 
@@ -1636,7 +1636,7 @@ static int may_link(struct inode *dir,
 	dsec = dir->i_security;
 	isec = dentry->d_inode->i_security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
+	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.u.dentry = dentry;
 	ad.selinux_audit_data = &sad;
 
@@ -1684,7 +1684,7 @@ static inline int may_rename(struct inode *old_dir,
 	old_is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
 	new_dsec = new_dir->i_security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
+	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.selinux_audit_data = &sad;
 
 	ad.u.dentry = old_dentry;
@@ -2010,7 +2010,7 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 			return rc;
 	}
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_PATH);
+	ad.type = LSM_AUDIT_DATA_PATH;
 	ad.selinux_audit_data = &sad;
 	ad.u.path = bprm->file->f_path;
 
@@ -2134,7 +2134,7 @@ static inline void flush_unauthorized_files(const struct cred *cred,
 
 	/* Revalidate access to inherited open files. */
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_INODE);
+	ad.type = LSM_AUDIT_DATA_INODE;
 	ad.selinux_audit_data = &sad;
 
 	spin_lock(&files->file_lock);
@@ -2484,7 +2484,7 @@ static int selinux_sb_kern_mount(struct super_block *sb, int flags, void *data)
 	if (flags & MS_KERNMOUNT)
 		return 0;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
+	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.selinux_audit_data = &sad;
 	ad.u.dentry = sb->s_root;
 	return superblock_has_perm(cred, sb, FILESYSTEM__MOUNT, &ad);
@@ -2496,7 +2496,7 @@ static int selinux_sb_statfs(struct dentry *dentry)
 	struct common_audit_data ad;
 	struct selinux_audit_data sad = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
+	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.selinux_audit_data = &sad;
 	ad.u.dentry = dentry->d_sb->s_root;
 	return superblock_has_perm(cred, dentry->d_sb, FILESYSTEM__GETATTR, &ad);
@@ -2666,7 +2666,7 @@ static noinline int audit_inode_permission(struct inode *inode,
 	struct inode_security_struct *isec = inode->i_security;
 	int rc;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_INODE);
+	ad.type = LSM_AUDIT_DATA_INODE;
 	ad.selinux_audit_data = &sad;
 	ad.u.inode = inode;
 
@@ -2796,7 +2796,7 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
 	if (!inode_owner_or_capable(inode))
 		return -EPERM;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_DENTRY);
+	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.selinux_audit_data = &sad;
 	ad.u.dentry = dentry;
 
@@ -3411,7 +3411,7 @@ static int selinux_kernel_module_request(char *kmod_name)
 
 	sid = task_sid(current);
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_KMOD);
+	ad.type = LSM_AUDIT_DATA_KMOD;
 	ad.selinux_audit_data = &sad;
 	ad.u.kmod_name = kmod_name;
 
@@ -3792,7 +3792,7 @@ static int sock_has_perm(struct task_struct *task, struct sock *sk, u32 perms)
 	if (sksec->sid == SECINITSID_KERNEL)
 		return 0;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+	ad.type = LSM_AUDIT_DATA_NET;
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = sk;
@@ -3900,7 +3900,7 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 						      snum, &sid);
 				if (err)
 					goto out;
-				COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+				ad.type = LSM_AUDIT_DATA_NET;
 				ad.selinux_audit_data = &sad;
 				ad.u.net = &net;
 				ad.u.net->sport = htons(snum);
@@ -3935,7 +3935,7 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 		if (err)
 			goto out;
 
-		COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+		ad.type = LSM_AUDIT_DATA_NET;
 		ad.selinux_audit_data = &sad;
 		ad.u.net = &net;
 		ad.u.net->sport = htons(snum);
@@ -3997,7 +3997,7 @@ static int selinux_socket_connect(struct socket *sock, struct sockaddr *address,
 		perm = (sksec->sclass == SECCLASS_TCP_SOCKET) ?
 		       TCP_SOCKET__NAME_CONNECT : DCCP_SOCKET__NAME_CONNECT;
 
-		COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+		ad.type = LSM_AUDIT_DATA_NET;
 		ad.selinux_audit_data = &sad;
 		ad.u.net = &net;
 		ad.u.net->dport = htons(snum);
@@ -4094,7 +4094,7 @@ static int selinux_socket_unix_stream_connect(struct sock *sock,
 	struct lsm_network_audit net = {0,};
 	int err;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+	ad.type = LSM_AUDIT_DATA_NET;
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = other;
@@ -4127,7 +4127,7 @@ static int selinux_socket_unix_may_send(struct socket *sock,
 	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+	ad.type = LSM_AUDIT_DATA_NET;
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = other->sk;
@@ -4170,7 +4170,7 @@ static int selinux_sock_rcv_skb_compat(struct sock *sk, struct sk_buff *skb,
 	struct lsm_network_audit net = {0,};
 	char *addrp;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+	ad.type = LSM_AUDIT_DATA_NET;
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = skb->skb_iif;
@@ -4226,7 +4226,7 @@ static int selinux_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if (!secmark_active && !peerlbl_active)
 		return 0;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+	ad.type = LSM_AUDIT_DATA_NET;
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = skb->skb_iif;
@@ -4583,7 +4583,7 @@ static unsigned int selinux_ip_forward(struct sk_buff *skb, int ifindex,
 	if (selinux_skb_peerlbl_sid(skb, family, &peer_sid) != 0)
 		return NF_DROP;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+	ad.type = LSM_AUDIT_DATA_NET;
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
@@ -4683,7 +4683,7 @@ static unsigned int selinux_ip_postroute_compat(struct sk_buff *skb,
 		return NF_ACCEPT;
 	sksec = sk->sk_security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+	ad.type = LSM_AUDIT_DATA_NET;
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
@@ -4756,7 +4756,7 @@ static unsigned int selinux_ip_postroute(struct sk_buff *skb, int ifindex,
 		secmark_perm = PACKET__SEND;
 	}
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_NET);
+	ad.type = LSM_AUDIT_DATA_NET;
 	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
@@ -4880,7 +4880,7 @@ static int ipc_has_perm(struct kern_ipc_perm *ipc_perms,
 
 	isec = ipc_perms->security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = ipc_perms->key;
 
@@ -4912,7 +4912,7 @@ static int selinux_msg_queue_alloc_security(struct msg_queue *msq)
 
 	isec = msq->q_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
@@ -4939,7 +4939,7 @@ static int selinux_msg_queue_associate(struct msg_queue *msq, int msqflg)
 
 	isec = msq->q_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
@@ -5001,7 +5001,7 @@ static int selinux_msg_queue_msgsnd(struct msg_queue *msq, struct msg_msg *msg,
 			return rc;
 	}
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
@@ -5034,7 +5034,7 @@ static int selinux_msg_queue_msgrcv(struct msg_queue *msq, struct msg_msg *msg,
 	isec = msq->q_perm.security;
 	msec = msg->security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
@@ -5061,7 +5061,7 @@ static int selinux_shm_alloc_security(struct shmid_kernel *shp)
 
 	isec = shp->shm_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = shp->shm_perm.key;
 
@@ -5088,7 +5088,7 @@ static int selinux_shm_associate(struct shmid_kernel *shp, int shmflg)
 
 	isec = shp->shm_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = shp->shm_perm.key;
 
@@ -5157,7 +5157,7 @@ static int selinux_sem_alloc_security(struct sem_array *sma)
 
 	isec = sma->sem_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = sma->sem_perm.key;
 
@@ -5184,7 +5184,7 @@ static int selinux_sem_associate(struct sem_array *sma, int semflg)
 
 	isec = sma->sem_perm.security;
 
-	COMMON_AUDIT_DATA_INIT(&ad, LSM_AUDIT_DATA_IPC);
+	ad.type = LSM_AUDIT_DATA_IPC;
 	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = sma->sem_perm.key;
 
diff --git a/security/smack/smack.h b/security/smack/smack.h
index 4ede719922e..b61e75f224d 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -319,7 +319,7 @@ void smack_log(char *subject_label, char *object_label,
 static inline void smk_ad_init(struct smk_audit_info *a, const char *func,
 			       char type)
 {
-	memset(a, 0, sizeof(*a));
+	memset(&a->sad, 0, sizeof(a->sad));
 	a->a.type = type;
 	a->a.smack_audit_data = &a->sad;
 	a->a.smack_audit_data->function = func;
-- 
cgit v1.2.3


From 1d3492927118d0ce1ea1ff3e007746699cba8f3e Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:43 -0400
Subject: SELinux: remove auditdeny from selinux_audit_data

It's just takin' up space.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/include/avc.h | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h
index e4e50bb218e..faa277729cb 100644
--- a/security/selinux/include/avc.h
+++ b/security/selinux/include/avc.h
@@ -63,11 +63,6 @@ struct selinux_late_audit_data {
  * We collect this at the beginning or during an selinux security operation
  */
 struct selinux_audit_data {
-	/*
-	 * auditdeny is a bit tricky and unintuitive.  See the
-	 * comments in avc.c for it's meaning and usage.
-	 */
-	u32 auditdeny;
 	struct selinux_late_audit_data *slad;
 };
 
@@ -145,9 +140,7 @@ static inline int avc_audit(u32 ssid, u32 tsid,
 			    struct common_audit_data *a, unsigned flags)
 {
 	u32 audited, denied;
-	audited = avc_audit_required(requested, avd, result,
-				     a ? a->selinux_audit_data->auditdeny : 0,
-				     &denied);
+	audited = avc_audit_required(requested, avd, result, 0, &denied);
 	if (likely(!audited))
 		return 0;
 	return slow_avc_audit(ssid, tsid, tclass,
-- 
cgit v1.2.3


From 899838b25f063a94594b1df6e0100aea1ec57fac Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Wed, 4 Apr 2012 15:01:43 -0400
Subject: SELinux: unify the selinux_audit_data and selinux_late_audit_data

We no longer need the distinction.  We only need data after we decide to do an
audit.  So turn the "late" audit data into just "data" and remove what we
currently have as "data".

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/avc.c         | 31 ++++++++++---------
 security/selinux/hooks.c       | 67 ------------------------------------------
 security/selinux/include/avc.h |  9 +-----
 3 files changed, 16 insertions(+), 91 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index cd91e25667d..c03a964ffde 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -436,9 +436,9 @@ static void avc_audit_pre_callback(struct audit_buffer *ab, void *a)
 {
 	struct common_audit_data *ad = a;
 	audit_log_format(ab, "avc:  %s ",
-			 ad->selinux_audit_data->slad->denied ? "denied" : "granted");
-	avc_dump_av(ab, ad->selinux_audit_data->slad->tclass,
-			ad->selinux_audit_data->slad->audited);
+			 ad->selinux_audit_data->denied ? "denied" : "granted");
+	avc_dump_av(ab, ad->selinux_audit_data->tclass,
+			ad->selinux_audit_data->audited);
 	audit_log_format(ab, " for ");
 }
 
@@ -452,9 +452,9 @@ static void avc_audit_post_callback(struct audit_buffer *ab, void *a)
 {
 	struct common_audit_data *ad = a;
 	audit_log_format(ab, " ");
-	avc_dump_query(ab, ad->selinux_audit_data->slad->ssid,
-			   ad->selinux_audit_data->slad->tsid,
-			   ad->selinux_audit_data->slad->tclass);
+	avc_dump_query(ab, ad->selinux_audit_data->ssid,
+			   ad->selinux_audit_data->tsid,
+			   ad->selinux_audit_data->tclass);
 }
 
 /* This is the slow part of avc audit with big stack footprint */
@@ -464,13 +464,11 @@ noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
 		unsigned flags)
 {
 	struct common_audit_data stack_data;
-	struct selinux_audit_data sad = {0,};
-	struct selinux_late_audit_data slad;
+	struct selinux_audit_data sad;
 
 	if (!a) {
 		a = &stack_data;
 		a->type = LSM_AUDIT_DATA_NONE;
-		a->selinux_audit_data = &sad;
 	}
 
 	/*
@@ -484,14 +482,15 @@ noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
 	    (flags & MAY_NOT_BLOCK))
 		return -ECHILD;
 
-	slad.tclass = tclass;
-	slad.requested = requested;
-	slad.ssid = ssid;
-	slad.tsid = tsid;
-	slad.audited = audited;
-	slad.denied = denied;
+	sad.tclass = tclass;
+	sad.requested = requested;
+	sad.ssid = ssid;
+	sad.tsid = tsid;
+	sad.audited = audited;
+	sad.denied = denied;
+
+	a->selinux_audit_data = &sad;
 
-	a->selinux_audit_data->slad = &slad;
 	common_lsm_audit(a, avc_audit_pre_callback, avc_audit_post_callback);
 	return 0;
 }
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d9fa2489a55..2578de549ad 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1420,7 +1420,6 @@ static int cred_has_capability(const struct cred *cred,
 			       int cap, int audit)
 {
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct av_decision avd;
 	u16 sclass;
 	u32 sid = cred_sid(cred);
@@ -1428,7 +1427,6 @@ static int cred_has_capability(const struct cred *cred,
 	int rc;
 
 	ad.type = LSM_AUDIT_DATA_CAP;
-	ad.selinux_audit_data = &sad;
 	ad.u.cap = cap;
 
 	switch (CAP_TO_INDEX(cap)) {
@@ -1496,11 +1494,9 @@ static inline int dentry_has_perm(const struct cred *cred,
 {
 	struct inode *inode = dentry->d_inode;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.u.dentry = dentry;
-	ad.selinux_audit_data = &sad;
 	return inode_has_perm(cred, inode, av, &ad, 0);
 }
 
@@ -1513,11 +1509,9 @@ static inline int path_has_perm(const struct cred *cred,
 {
 	struct inode *inode = path->dentry->d_inode;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 
 	ad.type = LSM_AUDIT_DATA_PATH;
 	ad.u.path = *path;
-	ad.selinux_audit_data = &sad;
 	return inode_has_perm(cred, inode, av, &ad, 0);
 }
 
@@ -1536,13 +1530,11 @@ static int file_has_perm(const struct cred *cred,
 	struct file_security_struct *fsec = file->f_security;
 	struct inode *inode = file->f_path.dentry->d_inode;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = cred_sid(cred);
 	int rc;
 
 	ad.type = LSM_AUDIT_DATA_PATH;
 	ad.u.path = file->f_path;
-	ad.selinux_audit_data = &sad;
 
 	if (sid != fsec->sid) {
 		rc = avc_has_perm(sid, fsec->sid,
@@ -1572,7 +1564,6 @@ static int may_create(struct inode *dir,
 	struct superblock_security_struct *sbsec;
 	u32 sid, newsid;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	int rc;
 
 	dsec = dir->i_security;
@@ -1583,7 +1574,6 @@ static int may_create(struct inode *dir,
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.u.dentry = dentry;
-	ad.selinux_audit_data = &sad;
 
 	rc = avc_has_perm(sid, dsec->sid, SECCLASS_DIR,
 			  DIR__ADD_NAME | DIR__SEARCH,
@@ -1628,7 +1618,6 @@ static int may_link(struct inode *dir,
 {
 	struct inode_security_struct *dsec, *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 	u32 av;
 	int rc;
@@ -1638,7 +1627,6 @@ static int may_link(struct inode *dir,
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
 	ad.u.dentry = dentry;
-	ad.selinux_audit_data = &sad;
 
 	av = DIR__SEARCH;
 	av |= (kind ? DIR__REMOVE_NAME : DIR__ADD_NAME);
@@ -1673,7 +1661,6 @@ static inline int may_rename(struct inode *old_dir,
 {
 	struct inode_security_struct *old_dsec, *new_dsec, *old_isec, *new_isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 	u32 av;
 	int old_is_dir, new_is_dir;
@@ -1685,7 +1672,6 @@ static inline int may_rename(struct inode *old_dir,
 	new_dsec = new_dir->i_security;
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
-	ad.selinux_audit_data = &sad;
 
 	ad.u.dentry = old_dentry;
 	rc = avc_has_perm(sid, old_dsec->sid, SECCLASS_DIR,
@@ -1971,7 +1957,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 	struct task_security_struct *new_tsec;
 	struct inode_security_struct *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct inode *inode = bprm->file->f_path.dentry->d_inode;
 	int rc;
 
@@ -2011,7 +1996,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 	}
 
 	ad.type = LSM_AUDIT_DATA_PATH;
-	ad.selinux_audit_data = &sad;
 	ad.u.path = bprm->file->f_path;
 
 	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
@@ -2101,7 +2085,6 @@ static inline void flush_unauthorized_files(const struct cred *cred,
 					    struct files_struct *files)
 {
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct file *file, *devnull = NULL;
 	struct tty_struct *tty;
 	struct fdtable *fdt;
@@ -2135,7 +2118,6 @@ static inline void flush_unauthorized_files(const struct cred *cred,
 	/* Revalidate access to inherited open files. */
 
 	ad.type = LSM_AUDIT_DATA_INODE;
-	ad.selinux_audit_data = &sad;
 
 	spin_lock(&files->file_lock);
 	for (;;) {
@@ -2473,7 +2455,6 @@ static int selinux_sb_kern_mount(struct super_block *sb, int flags, void *data)
 {
 	const struct cred *cred = current_cred();
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	int rc;
 
 	rc = superblock_doinit(sb, data);
@@ -2485,7 +2466,6 @@ static int selinux_sb_kern_mount(struct super_block *sb, int flags, void *data)
 		return 0;
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
-	ad.selinux_audit_data = &sad;
 	ad.u.dentry = sb->s_root;
 	return superblock_has_perm(cred, sb, FILESYSTEM__MOUNT, &ad);
 }
@@ -2494,10 +2474,8 @@ static int selinux_sb_statfs(struct dentry *dentry)
 {
 	const struct cred *cred = current_cred();
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
-	ad.selinux_audit_data = &sad;
 	ad.u.dentry = dentry->d_sb->s_root;
 	return superblock_has_perm(cred, dentry->d_sb, FILESYSTEM__GETATTR, &ad);
 }
@@ -2662,12 +2640,10 @@ static noinline int audit_inode_permission(struct inode *inode,
 					   unsigned flags)
 {
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct inode_security_struct *isec = inode->i_security;
 	int rc;
 
 	ad.type = LSM_AUDIT_DATA_INODE;
-	ad.selinux_audit_data = &sad;
 	ad.u.inode = inode;
 
 	rc = slow_avc_audit(current_sid(), isec->sid, isec->sclass, perms,
@@ -2782,7 +2758,6 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
 	struct inode_security_struct *isec = inode->i_security;
 	struct superblock_security_struct *sbsec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 newsid, sid = current_sid();
 	int rc = 0;
 
@@ -2797,7 +2772,6 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
 		return -EPERM;
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
-	ad.selinux_audit_data = &sad;
 	ad.u.dentry = dentry;
 
 	rc = avc_has_perm(sid, isec->sid, isec->sclass,
@@ -3407,12 +3381,10 @@ static int selinux_kernel_module_request(char *kmod_name)
 {
 	u32 sid;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 
 	sid = task_sid(current);
 
 	ad.type = LSM_AUDIT_DATA_KMOD;
-	ad.selinux_audit_data = &sad;
 	ad.u.kmod_name = kmod_name;
 
 	return avc_has_perm(sid, SECINITSID_KERNEL, SECCLASS_SYSTEM,
@@ -3785,7 +3757,6 @@ static int sock_has_perm(struct task_struct *task, struct sock *sk, u32 perms)
 {
 	struct sk_security_struct *sksec = sk->sk_security;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 	u32 tsid = task_sid(task);
 
@@ -3793,7 +3764,6 @@ static int sock_has_perm(struct task_struct *task, struct sock *sk, u32 perms)
 		return 0;
 
 	ad.type = LSM_AUDIT_DATA_NET;
-	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = sk;
 
@@ -3873,7 +3843,6 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 		char *addrp;
 		struct sk_security_struct *sksec = sk->sk_security;
 		struct common_audit_data ad;
-		struct selinux_audit_data sad = {0,};
 		struct lsm_network_audit net = {0,};
 		struct sockaddr_in *addr4 = NULL;
 		struct sockaddr_in6 *addr6 = NULL;
@@ -3901,7 +3870,6 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 				if (err)
 					goto out;
 				ad.type = LSM_AUDIT_DATA_NET;
-				ad.selinux_audit_data = &sad;
 				ad.u.net = &net;
 				ad.u.net->sport = htons(snum);
 				ad.u.net->family = family;
@@ -3936,7 +3904,6 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 			goto out;
 
 		ad.type = LSM_AUDIT_DATA_NET;
-		ad.selinux_audit_data = &sad;
 		ad.u.net = &net;
 		ad.u.net->sport = htons(snum);
 		ad.u.net->family = family;
@@ -3971,7 +3938,6 @@ static int selinux_socket_connect(struct socket *sock, struct sockaddr *address,
 	if (sksec->sclass == SECCLASS_TCP_SOCKET ||
 	    sksec->sclass == SECCLASS_DCCP_SOCKET) {
 		struct common_audit_data ad;
-		struct selinux_audit_data sad = {0,};
 		struct lsm_network_audit net = {0,};
 		struct sockaddr_in *addr4 = NULL;
 		struct sockaddr_in6 *addr6 = NULL;
@@ -3998,7 +3964,6 @@ static int selinux_socket_connect(struct socket *sock, struct sockaddr *address,
 		       TCP_SOCKET__NAME_CONNECT : DCCP_SOCKET__NAME_CONNECT;
 
 		ad.type = LSM_AUDIT_DATA_NET;
-		ad.selinux_audit_data = &sad;
 		ad.u.net = &net;
 		ad.u.net->dport = htons(snum);
 		ad.u.net->family = sk->sk_family;
@@ -4090,12 +4055,10 @@ static int selinux_socket_unix_stream_connect(struct sock *sock,
 	struct sk_security_struct *sksec_other = other->sk_security;
 	struct sk_security_struct *sksec_new = newsk->sk_security;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 	int err;
 
 	ad.type = LSM_AUDIT_DATA_NET;
-	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = other;
 
@@ -4124,11 +4087,9 @@ static int selinux_socket_unix_may_send(struct socket *sock,
 	struct sk_security_struct *ssec = sock->sk->sk_security;
 	struct sk_security_struct *osec = other->sk->sk_security;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 
 	ad.type = LSM_AUDIT_DATA_NET;
-	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->sk = other->sk;
 
@@ -4166,12 +4127,10 @@ static int selinux_sock_rcv_skb_compat(struct sock *sk, struct sk_buff *skb,
 	struct sk_security_struct *sksec = sk->sk_security;
 	u32 sk_sid = sksec->sid;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 	char *addrp;
 
 	ad.type = LSM_AUDIT_DATA_NET;
-	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = skb->skb_iif;
 	ad.u.net->family = family;
@@ -4201,7 +4160,6 @@ static int selinux_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	u16 family = sk->sk_family;
 	u32 sk_sid = sksec->sid;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 	char *addrp;
 	u8 secmark_active;
@@ -4227,7 +4185,6 @@ static int selinux_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		return 0;
 
 	ad.type = LSM_AUDIT_DATA_NET;
-	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = skb->skb_iif;
 	ad.u.net->family = family;
@@ -4565,7 +4522,6 @@ static unsigned int selinux_ip_forward(struct sk_buff *skb, int ifindex,
 	char *addrp;
 	u32 peer_sid;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 	u8 secmark_active;
 	u8 netlbl_active;
@@ -4584,7 +4540,6 @@ static unsigned int selinux_ip_forward(struct sk_buff *skb, int ifindex,
 		return NF_DROP;
 
 	ad.type = LSM_AUDIT_DATA_NET;
-	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
 	ad.u.net->family = family;
@@ -4674,7 +4629,6 @@ static unsigned int selinux_ip_postroute_compat(struct sk_buff *skb,
 	struct sock *sk = skb->sk;
 	struct sk_security_struct *sksec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 	char *addrp;
 	u8 proto;
@@ -4684,7 +4638,6 @@ static unsigned int selinux_ip_postroute_compat(struct sk_buff *skb,
 	sksec = sk->sk_security;
 
 	ad.type = LSM_AUDIT_DATA_NET;
-	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
 	ad.u.net->family = family;
@@ -4709,7 +4662,6 @@ static unsigned int selinux_ip_postroute(struct sk_buff *skb, int ifindex,
 	u32 peer_sid;
 	struct sock *sk;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	struct lsm_network_audit net = {0,};
 	char *addrp;
 	u8 secmark_active;
@@ -4757,7 +4709,6 @@ static unsigned int selinux_ip_postroute(struct sk_buff *skb, int ifindex,
 	}
 
 	ad.type = LSM_AUDIT_DATA_NET;
-	ad.selinux_audit_data = &sad;
 	ad.u.net = &net;
 	ad.u.net->netif = ifindex;
 	ad.u.net->family = family;
@@ -4875,13 +4826,11 @@ static int ipc_has_perm(struct kern_ipc_perm *ipc_perms,
 {
 	struct ipc_security_struct *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 
 	isec = ipc_perms->security;
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = ipc_perms->key;
 
 	return avc_has_perm(sid, isec->sid, isec->sclass, perms, &ad);
@@ -4902,7 +4851,6 @@ static int selinux_msg_queue_alloc_security(struct msg_queue *msq)
 {
 	struct ipc_security_struct *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 	int rc;
 
@@ -4913,7 +4861,6 @@ static int selinux_msg_queue_alloc_security(struct msg_queue *msq)
 	isec = msq->q_perm.security;
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
 	rc = avc_has_perm(sid, isec->sid, SECCLASS_MSGQ,
@@ -4934,13 +4881,11 @@ static int selinux_msg_queue_associate(struct msg_queue *msq, int msqflg)
 {
 	struct ipc_security_struct *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 
 	isec = msq->q_perm.security;
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
 	return avc_has_perm(sid, isec->sid, SECCLASS_MSGQ,
@@ -4980,7 +4925,6 @@ static int selinux_msg_queue_msgsnd(struct msg_queue *msq, struct msg_msg *msg,
 	struct ipc_security_struct *isec;
 	struct msg_security_struct *msec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 	int rc;
 
@@ -5002,7 +4946,6 @@ static int selinux_msg_queue_msgsnd(struct msg_queue *msq, struct msg_msg *msg,
 	}
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
 	/* Can this process write to the queue? */
@@ -5027,7 +4970,6 @@ static int selinux_msg_queue_msgrcv(struct msg_queue *msq, struct msg_msg *msg,
 	struct ipc_security_struct *isec;
 	struct msg_security_struct *msec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = task_sid(target);
 	int rc;
 
@@ -5035,7 +4977,6 @@ static int selinux_msg_queue_msgrcv(struct msg_queue *msq, struct msg_msg *msg,
 	msec = msg->security;
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = msq->q_perm.key;
 
 	rc = avc_has_perm(sid, isec->sid,
@@ -5051,7 +4992,6 @@ static int selinux_shm_alloc_security(struct shmid_kernel *shp)
 {
 	struct ipc_security_struct *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 	int rc;
 
@@ -5062,7 +5002,6 @@ static int selinux_shm_alloc_security(struct shmid_kernel *shp)
 	isec = shp->shm_perm.security;
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = shp->shm_perm.key;
 
 	rc = avc_has_perm(sid, isec->sid, SECCLASS_SHM,
@@ -5083,13 +5022,11 @@ static int selinux_shm_associate(struct shmid_kernel *shp, int shmflg)
 {
 	struct ipc_security_struct *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 
 	isec = shp->shm_perm.security;
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = shp->shm_perm.key;
 
 	return avc_has_perm(sid, isec->sid, SECCLASS_SHM,
@@ -5147,7 +5084,6 @@ static int selinux_sem_alloc_security(struct sem_array *sma)
 {
 	struct ipc_security_struct *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 	int rc;
 
@@ -5158,7 +5094,6 @@ static int selinux_sem_alloc_security(struct sem_array *sma)
 	isec = sma->sem_perm.security;
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = sma->sem_perm.key;
 
 	rc = avc_has_perm(sid, isec->sid, SECCLASS_SEM,
@@ -5179,13 +5114,11 @@ static int selinux_sem_associate(struct sem_array *sma, int semflg)
 {
 	struct ipc_security_struct *isec;
 	struct common_audit_data ad;
-	struct selinux_audit_data sad = {0,};
 	u32 sid = current_sid();
 
 	isec = sma->sem_perm.security;
 
 	ad.type = LSM_AUDIT_DATA_IPC;
-	ad.selinux_audit_data = &sad;
 	ad.u.ipc_id = sma->sem_perm.key;
 
 	return avc_has_perm(sid, isec->sid, SECCLASS_SEM,
diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h
index faa277729cb..d97fadc4d96 100644
--- a/security/selinux/include/avc.h
+++ b/security/selinux/include/avc.h
@@ -49,7 +49,7 @@ struct avc_cache_stats {
 /*
  * We only need this data after we have decided to send an audit message.
  */
-struct selinux_late_audit_data {
+struct selinux_audit_data {
 	u32 ssid;
 	u32 tsid;
 	u16 tclass;
@@ -59,13 +59,6 @@ struct selinux_late_audit_data {
 	int result;
 };
 
-/*
- * We collect this at the beginning or during an selinux security operation
- */
-struct selinux_audit_data {
-	struct selinux_late_audit_data *slad;
-};
-
 /*
  * AVC operations
  */
-- 
cgit v1.2.3


From 0b36e44cc680b355f0d1b34002b2a10c9e1cae60 Mon Sep 17 00:00:00 2001
From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Date: Wed, 7 Mar 2012 22:17:13 +0800
Subject: SELinux: replace weak GFP_ATOMIC to GFP_KERNEL in avc_add_callback

avc_add_callback now only called from initcalls, so replace the
weak GFP_ATOMIC to GFP_KERNEL, and mark this function __init
to make a warning when not been called from initcalls.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/avc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index c03a964ffde..5c1326e1883 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -510,7 +510,7 @@ noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
  * @perms based on @tclass.  Returns %0 on success or
  * -%ENOMEM if insufficient memory exists to add the callback.
  */
-int avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
+int __init avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
 				     u16 tclass, u32 perms,
 				     u32 *out_retained),
 		     u32 events, u32 ssid, u32 tsid,
@@ -519,7 +519,7 @@ int avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
 	struct avc_callback_node *c;
 	int rc = 0;
 
-	c = kmalloc(sizeof(*c), GFP_ATOMIC);
+	c = kmalloc(sizeof(*c), GFP_KERNEL);
 	if (!c) {
 		rc = -ENOMEM;
 		goto out;
-- 
cgit v1.2.3


From 562c99f20d989f222138dddfd71e275bfb3665de Mon Sep 17 00:00:00 2001
From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Date: Wed, 7 Mar 2012 22:17:14 +0800
Subject: SELinux: avc: remove the useless fields in avc_add_callback

avc_add_callback now just used for registering reset functions
in initcalls, and the callback functions just did reset operations.
So, reducing the arguments to only one event is enough now.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/avc.c         | 32 ++++++--------------------------
 security/selinux/include/avc.h |  6 +-----
 security/selinux/netif.c       |  6 ++----
 security/selinux/netnode.c     |  6 ++----
 security/selinux/netport.c     |  6 ++----
 security/selinux/ss/services.c |  6 ++----
 6 files changed, 15 insertions(+), 47 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 5c1326e1883..68d82daed25 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -65,14 +65,8 @@ struct avc_cache {
 };
 
 struct avc_callback_node {
-	int (*callback) (u32 event, u32 ssid, u32 tsid,
-			 u16 tclass, u32 perms,
-			 u32 *out_retained);
+	int (*callback) (u32 event);
 	u32 events;
-	u32 ssid;
-	u32 tsid;
-	u16 tclass;
-	u32 perms;
 	struct avc_callback_node *next;
 };
 
@@ -499,22 +493,12 @@ noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
  * avc_add_callback - Register a callback for security events.
  * @callback: callback function
  * @events: security events
- * @ssid: source security identifier or %SECSID_WILD
- * @tsid: target security identifier or %SECSID_WILD
- * @tclass: target security class
- * @perms: permissions
  *
- * Register a callback function for events in the set @events
- * related to the SID pair (@ssid, @tsid) 
- * and the permissions @perms, interpreting
- * @perms based on @tclass.  Returns %0 on success or
- * -%ENOMEM if insufficient memory exists to add the callback.
+ * Register a callback function for events in the set @events.
+ * Returns %0 on success or -%ENOMEM if insufficient memory
+ * exists to add the callback.
  */
-int __init avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
-				     u16 tclass, u32 perms,
-				     u32 *out_retained),
-		     u32 events, u32 ssid, u32 tsid,
-		     u16 tclass, u32 perms)
+int __init avc_add_callback(int (*callback)(u32 event), u32 events)
 {
 	struct avc_callback_node *c;
 	int rc = 0;
@@ -527,9 +511,6 @@ int __init avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
 
 	c->callback = callback;
 	c->events = events;
-	c->ssid = ssid;
-	c->tsid = tsid;
-	c->perms = perms;
 	c->next = avc_callbacks;
 	avc_callbacks = c;
 out:
@@ -669,8 +650,7 @@ int avc_ss_reset(u32 seqno)
 
 	for (c = avc_callbacks; c; c = c->next) {
 		if (c->events & AVC_CALLBACK_RESET) {
-			tmprc = c->callback(AVC_CALLBACK_RESET,
-					    0, 0, 0, 0, NULL);
+			tmprc = c->callback(AVC_CALLBACK_RESET);
 			/* save the first error encountered for the return
 			   value and continue processing the callbacks */
 			if (!rc)
diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h
index d97fadc4d96..92d0ab561db 100644
--- a/security/selinux/include/avc.h
+++ b/security/selinux/include/avc.h
@@ -170,11 +170,7 @@ u32 avc_policy_seqno(void);
 #define AVC_CALLBACK_AUDITDENY_ENABLE	64
 #define AVC_CALLBACK_AUDITDENY_DISABLE	128
 
-int avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
-				     u16 tclass, u32 perms,
-				     u32 *out_retained),
-		     u32 events, u32 ssid, u32 tsid,
-		     u16 tclass, u32 perms);
+int avc_add_callback(int (*callback)(u32 event), u32 events);
 
 /* Exported to selinuxfs */
 int avc_get_hash_stats(char *page);
diff --git a/security/selinux/netif.c b/security/selinux/netif.c
index 326f22cbe40..47a49d1a6f6 100644
--- a/security/selinux/netif.c
+++ b/security/selinux/netif.c
@@ -252,8 +252,7 @@ static void sel_netif_flush(void)
 	spin_unlock_bh(&sel_netif_lock);
 }
 
-static int sel_netif_avc_callback(u32 event, u32 ssid, u32 tsid,
-				  u16 class, u32 perms, u32 *retained)
+static int sel_netif_avc_callback(u32 event)
 {
 	if (event == AVC_CALLBACK_RESET) {
 		sel_netif_flush();
@@ -292,8 +291,7 @@ static __init int sel_netif_init(void)
 
 	register_netdevice_notifier(&sel_netif_netdev_notifier);
 
-	err = avc_add_callback(sel_netif_avc_callback, AVC_CALLBACK_RESET,
-			       SECSID_NULL, SECSID_NULL, SECCLASS_NULL, 0);
+	err = avc_add_callback(sel_netif_avc_callback, AVC_CALLBACK_RESET);
 	if (err)
 		panic("avc_add_callback() failed, error %d\n", err);
 
diff --git a/security/selinux/netnode.c b/security/selinux/netnode.c
index 86365857c08..28f911cdd7c 100644
--- a/security/selinux/netnode.c
+++ b/security/selinux/netnode.c
@@ -297,8 +297,7 @@ static void sel_netnode_flush(void)
 	spin_unlock_bh(&sel_netnode_lock);
 }
 
-static int sel_netnode_avc_callback(u32 event, u32 ssid, u32 tsid,
-				    u16 class, u32 perms, u32 *retained)
+static int sel_netnode_avc_callback(u32 event)
 {
 	if (event == AVC_CALLBACK_RESET) {
 		sel_netnode_flush();
@@ -320,8 +319,7 @@ static __init int sel_netnode_init(void)
 		sel_netnode_hash[iter].size = 0;
 	}
 
-	ret = avc_add_callback(sel_netnode_avc_callback, AVC_CALLBACK_RESET,
-			       SECSID_NULL, SECSID_NULL, SECCLASS_NULL, 0);
+	ret = avc_add_callback(sel_netnode_avc_callback, AVC_CALLBACK_RESET);
 	if (ret != 0)
 		panic("avc_add_callback() failed, error %d\n", ret);
 
diff --git a/security/selinux/netport.c b/security/selinux/netport.c
index 7b9eb1faf68..d35379781c2 100644
--- a/security/selinux/netport.c
+++ b/security/selinux/netport.c
@@ -234,8 +234,7 @@ static void sel_netport_flush(void)
 	spin_unlock_bh(&sel_netport_lock);
 }
 
-static int sel_netport_avc_callback(u32 event, u32 ssid, u32 tsid,
-				    u16 class, u32 perms, u32 *retained)
+static int sel_netport_avc_callback(u32 event)
 {
 	if (event == AVC_CALLBACK_RESET) {
 		sel_netport_flush();
@@ -257,8 +256,7 @@ static __init int sel_netport_init(void)
 		sel_netport_hash[iter].size = 0;
 	}
 
-	ret = avc_add_callback(sel_netport_avc_callback, AVC_CALLBACK_RESET,
-			       SECSID_NULL, SECSID_NULL, SECCLASS_NULL, 0);
+	ret = avc_add_callback(sel_netport_avc_callback, AVC_CALLBACK_RESET);
 	if (ret != 0)
 		panic("avc_add_callback() failed, error %d\n", ret);
 
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 9b7e7ed54e7..4321b8fc886 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -3044,8 +3044,7 @@ out:
 
 static int (*aurule_callback)(void) = audit_update_lsm_rules;
 
-static int aurule_avc_callback(u32 event, u32 ssid, u32 tsid,
-			       u16 class, u32 perms, u32 *retained)
+static int aurule_avc_callback(u32 event)
 {
 	int err = 0;
 
@@ -3058,8 +3057,7 @@ static int __init aurule_init(void)
 {
 	int err;
 
-	err = avc_add_callback(aurule_avc_callback, AVC_CALLBACK_RESET,
-			       SECSID_NULL, SECSID_NULL, SECCLASS_NULL, 0);
+	err = avc_add_callback(aurule_avc_callback, AVC_CALLBACK_RESET);
 	if (err)
 		panic("avc_add_callback() failed, error %d\n", err);
 
-- 
cgit v1.2.3


From c737f8284cac91428f8fcc8281e69117fa16e887 Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Thu, 5 Apr 2012 13:51:53 -0400
Subject: SELinux: remove unused common_audit_data in flush_unauthorized_files

We don't need this variable and it just eats stack space.  Remove it.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 security/selinux/hooks.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 2578de549ad..e94349b85bf 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2084,7 +2084,6 @@ static int selinux_bprm_secureexec(struct linux_binprm *bprm)
 static inline void flush_unauthorized_files(const struct cred *cred,
 					    struct files_struct *files)
 {
-	struct common_audit_data ad;
 	struct file *file, *devnull = NULL;
 	struct tty_struct *tty;
 	struct fdtable *fdt;
@@ -2116,9 +2115,6 @@ static inline void flush_unauthorized_files(const struct cred *cred,
 		no_tty();
 
 	/* Revalidate access to inherited open files. */
-
-	ad.type = LSM_AUDIT_DATA_INODE;
-
 	spin_lock(&files->file_lock);
 	for (;;) {
 		unsigned long set, i;
-- 
cgit v1.2.3


From 259e5e6c75a910f3b5e656151dc602f53f9d7548 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto@amacapital.net>
Date: Thu, 12 Apr 2012 16:47:50 -0500
Subject: Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs

With this change, calling
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time.  For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set.  The same is true for file capabilities.

Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.

To determine if the NO_NEW_PRIVS bit is set, a task may call
  prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)

This functionality is desired for the proposed seccomp filter patch
series.  By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.

Another potential use is making certain privileged operations
unprivileged.  For example, chroot may be considered "safe" if it cannot
affect privileged tasks.

Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use.  It is fixed in a subsequent patch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>

v18: updated change desc
v17: using new define values as per 3.4
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 fs/exec.c                  | 10 +++++++++-
 include/linux/prctl.h      | 15 +++++++++++++++
 include/linux/sched.h      |  2 ++
 include/linux/security.h   |  1 +
 kernel/sys.c               | 10 ++++++++++
 security/apparmor/domain.c |  4 ++++
 security/commoncap.c       |  7 +++++--
 security/selinux/hooks.c   | 10 +++++++++-
 8 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b1fd2025e59..d038968b54b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1245,6 +1245,13 @@ static int check_unsafe_exec(struct linux_binprm *bprm)
 			bprm->unsafe |= LSM_UNSAFE_PTRACE;
 	}
 
+	/*
+	 * This isn't strictly necessary, but it makes it harder for LSMs to
+	 * mess up.
+	 */
+	if (current->no_new_privs)
+		bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;
+
 	n_fs = 1;
 	spin_lock(&p->fs->lock);
 	rcu_read_lock();
@@ -1288,7 +1295,8 @@ int prepare_binprm(struct linux_binprm *bprm)
 	bprm->cred->euid = current_euid();
 	bprm->cred->egid = current_egid();
 
-	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
+	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) &&
+	    !current->no_new_privs) {
 		/* Set-uid? */
 		if (mode & S_ISUID) {
 			bprm->per_clear |= PER_CLEAR_ON_SETID;
diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index e0cfec2490a..78b76e24cc7 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -124,4 +124,19 @@
 #define PR_SET_CHILD_SUBREAPER 36
 #define PR_GET_CHILD_SUBREAPER 37
 
+/*
+ * If no_new_privs is set, then operations that grant new privileges (i.e.
+ * execve) will either fail or not grant them.  This affects suid/sgid,
+ * file capabilities, and LSMs.
+ *
+ * Operations that merely manipulate or drop existing privileges (setresuid,
+ * capset, etc.) will still work.  Drop those privileges if you want them gone.
+ *
+ * Changing LSM security domain is considered a new privilege.  So, for example,
+ * asking selinux for a specific new context (e.g. with runcon) will result
+ * in execve returning -EPERM.
+ */
+#define PR_SET_NO_NEW_PRIVS 38
+#define PR_GET_NO_NEW_PRIVS 39
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81a173c0897..ba60897bb44 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1341,6 +1341,8 @@ struct task_struct {
 				 * execve */
 	unsigned in_iowait:1;
 
+	/* task may not gain privileges */
+	unsigned no_new_privs:1;
 
 	/* Revert to default priority/policy when forking */
 	unsigned sched_reset_on_fork:1;
diff --git a/include/linux/security.h b/include/linux/security.h
index 673afbb8238..6e1dea93907 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -144,6 +144,7 @@ struct request_sock;
 #define LSM_UNSAFE_SHARE	1
 #define LSM_UNSAFE_PTRACE	2
 #define LSM_UNSAFE_PTRACE_CAP	4
+#define LSM_UNSAFE_NO_NEW_PRIVS	8
 
 #ifdef CONFIG_MMU
 extern int mmap_min_addr_handler(struct ctl_table *table, int write,
diff --git a/kernel/sys.c b/kernel/sys.c
index e7006eb6c1e..b82568b7d20 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1979,6 +1979,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			error = put_user(me->signal->is_child_subreaper,
 					 (int __user *) arg2);
 			break;
+		case PR_SET_NO_NEW_PRIVS:
+			if (arg2 != 1 || arg3 || arg4 || arg5)
+				return -EINVAL;
+
+			current->no_new_privs = 1;
+			break;
+		case PR_GET_NO_NEW_PRIVS:
+			if (arg2 || arg3 || arg4 || arg5)
+				return -EINVAL;
+			return current->no_new_privs ? 1 : 0;
 		default:
 			error = -EINVAL;
 			break;
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 6327685c101..18c88d06e88 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -360,6 +360,10 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 	if (bprm->cred_prepared)
 		return 0;
 
+	/* XXX: no_new_privs is not usable with AppArmor yet */
+	if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)
+		return -EPERM;
+
 	cxt = bprm->cred->security;
 	BUG_ON(!cxt);
 
diff --git a/security/commoncap.c b/security/commoncap.c
index 0cf4b53480a..edd3918fac0 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -506,14 +506,17 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 skip:
 
 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
-	 * credentials unless they have the appropriate permit
+	 * credentials unless they have the appropriate permit.
+	 *
+	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
 	 */
 	if ((new->euid != old->uid ||
 	     new->egid != old->gid ||
 	     !cap_issubset(new->cap_permitted, old->cap_permitted)) &&
 	    bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
 		/* downgrade; they get no more than they had, and maybe less */
-		if (!capable(CAP_SETUID)) {
+		if (!capable(CAP_SETUID) ||
+		    (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) {
 			new->euid = new->uid;
 			new->egid = new->gid;
 		}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d85b793c932..0b06685787b 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2016,6 +2016,13 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 		new_tsec->sid = old_tsec->exec_sid;
 		/* Reset exec SID on execve. */
 		new_tsec->exec_sid = 0;
+
+		/*
+		 * Minimize confusion: if no_new_privs and a transition is
+		 * explicitly requested, then fail the exec.
+		 */
+		if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)
+			return -EPERM;
 	} else {
 		/* Check for a default transition on this program. */
 		rc = security_transition_sid(old_tsec->sid, isec->sid,
@@ -2029,7 +2036,8 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 	ad.selinux_audit_data = &sad;
 	ad.u.path = bprm->file->f_path;
 
-	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+	if ((bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) ||
+	    (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))
 		new_tsec->sid = old_tsec->sid;
 
 	if (new_tsec->sid == old_tsec->sid) {
-- 
cgit v1.2.3


From c29bceb3967398cf2ac8bf8edf9634fdb722df7d Mon Sep 17 00:00:00 2001
From: John Johansen <john.johansen@canonical.com>
Date: Thu, 12 Apr 2012 16:47:51 -0500
Subject: Fix execve behavior apparmor for PR_{GET,SET}_NO_NEW_PRIVS

Add support for AppArmor to explicitly fail requested domain transitions
if NO_NEW_PRIVS is set and the task is not unconfined.

Transitions from unconfined are still allowed because this always results
in a reduction of privileges.

Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>

v18: new acked-by, new description
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 security/apparmor/domain.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 18c88d06e88..b81ea10a17a 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -360,10 +360,6 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 	if (bprm->cred_prepared)
 		return 0;
 
-	/* XXX: no_new_privs is not usable with AppArmor yet */
-	if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)
-		return -EPERM;
-
 	cxt = bprm->cred->security;
 	BUG_ON(!cxt);
 
@@ -398,6 +394,11 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 			new_profile = find_attach(ns, &ns->base.profiles, name);
 		if (!new_profile)
 			goto cleanup;
+		/*
+		 * NOTE: Domain transitions from unconfined are allowed
+		 * even when no_new_privs is set because this aways results
+		 * in a further reduction of permissions.
+		 */
 		goto apply;
 	}
 
@@ -459,6 +460,16 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 		/* fail exec */
 		error = -EACCES;
 
+	/*
+	 * Policy has specified a domain transition, if no_new_privs then
+	 * fail the exec.
+	 */
+	if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS) {
+		aa_put_profile(new_profile);
+		error = -EPERM;
+		goto cleanup;
+	}
+
 	if (!new_profile)
 		goto audit;
 
@@ -613,6 +624,14 @@ int aa_change_hat(const char *hats[], int count, u64 token, bool permtest)
 	const char *target = NULL, *info = NULL;
 	int error = 0;
 
+	/*
+	 * Fail explicitly requested domain transitions if no_new_privs.
+	 * There is no exception for unconfined as change_hat is not
+	 * available.
+	 */
+	if (current->no_new_privs)
+		return -EPERM;
+
 	/* released below */
 	cred = get_current_cred();
 	cxt = cred->security;
@@ -754,6 +773,18 @@ int aa_change_profile(const char *ns_name, const char *hname, bool onexec,
 	cxt = cred->security;
 	profile = aa_cred_profile(cred);
 
+	/*
+	 * Fail explicitly requested domain transitions if no_new_privs
+	 * and not unconfined.
+	 * Domain transitions from unconfined are allowed even when
+	 * no_new_privs is set because this aways results in a reduction
+	 * of permissions.
+	 */
+	if (current->no_new_privs && !unconfined(profile)) {
+		put_cred(cred);
+		return -EPERM;
+	}
+
 	if (ns_name) {
 		/* released below */
 		ns = aa_find_namespace(profile->ns, ns_name);
-- 
cgit v1.2.3


From 46b325c7eb01482674406701825ff67f561ccdd4 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:47:52 -0500
Subject: sk_run_filter: add BPF_S_ANC_SECCOMP_LD_W

Introduces a new BPF ancillary instruction that all LD calls will be
mapped through when skb_run_filter() is being used for seccomp BPF.  The
rewriting will be done using a secondary chk_filter function that is run
after skb_chk_filter.

The code change is guarded by CONFIG_SECCOMP_FILTER which is added,
along with the seccomp_bpf_load() function later in this series.

This is based on http://lkml.org/lkml/2012/3/2/141

Suggested-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: rebase
...
v15: include seccomp.h explicitly for when seccomp_bpf_load exists.
v14: First cut using a single additional instruction
... v13: made bpf functions generic.
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 include/linux/filter.h | 1 +
 net/core/filter.c      | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 8eeb205f298..aaa2e80630b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -228,6 +228,7 @@ enum {
 	BPF_S_ANC_HATYPE,
 	BPF_S_ANC_RXHASH,
 	BPF_S_ANC_CPU,
+	BPF_S_ANC_SECCOMP_LD_W,
 };
 
 #endif /* __KERNEL__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 6f755cca452..491e2e1ec27 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -38,6 +38,7 @@
 #include <linux/filter.h>
 #include <linux/reciprocal_div.h>
 #include <linux/ratelimit.h>
+#include <linux/seccomp.h>
 
 /* No hurry in this branch
  *
@@ -352,6 +353,11 @@ load_b:
 				A = 0;
 			continue;
 		}
+#ifdef CONFIG_SECCOMP_FILTER
+		case BPF_S_ANC_SECCOMP_LD_W:
+			A = seccomp_bpf_load(fentry->k);
+			continue;
+#endif
 		default:
 			WARN_RATELIMIT(1, "Unknown code:%u jt:%u tf:%u k:%u\n",
 				       fentry->code, fentry->jt,
-- 
cgit v1.2.3


From 0c5fe1b4221c6701224c2601cf3c692e5721103e Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:47:53 -0500
Subject: net/compat.c,linux/filter.h: share compat_sock_fprog

Any other users of bpf_*_filter that take a struct sock_fprog from
userspace will need to be able to also accept a compat_sock_fprog
if the arch supports compat calls.  This change allows the existing
compat_sock_fprog be shared.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: tasered by the apostrophe police
v14: rebase/nochanges
v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: rebase on to linux-next
v11: introduction
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 include/linux/filter.h | 11 +++++++++++
 net/compat.c           |  8 --------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index aaa2e80630b..f2e53152e83 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -10,6 +10,7 @@
 
 #ifdef __KERNEL__
 #include <linux/atomic.h>
+#include <linux/compat.h>
 #endif
 
 /*
@@ -132,6 +133,16 @@ struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
 
 #ifdef __KERNEL__
 
+#ifdef CONFIG_COMPAT
+/*
+ * A struct sock_filter is architecture independent.
+ */
+struct compat_sock_fprog {
+	u16		len;
+	compat_uptr_t	filter;		/* struct sock_filter * */
+};
+#endif
+
 struct sk_buff;
 struct sock;
 
diff --git a/net/compat.c b/net/compat.c
index e055708b8ec..242c828810f 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -328,14 +328,6 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
 	__scm_destroy(scm);
 }
 
-/*
- * A struct sock_filter is architecture independent.
- */
-struct compat_sock_fprog {
-	u16		len;
-	compat_uptr_t	filter;		/* struct sock_filter * */
-};
-
 static int do_set_attach_filter(struct socket *sock, int level, int optname,
 				char __user *optval, unsigned int optlen)
 {
-- 
cgit v1.2.3


From 932ecebb0405b9a41cd18946e6cff8a17d434e23 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:47:54 -0500
Subject: seccomp: kill the seccomp_t typedef

Replaces the seccomp_t typedef with struct seccomp to match modern
kernel style.

Signed-off-by: Will Drewry <wad@chromium.org>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: rebase
...
v14: rebase/nochanges
v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: rebase on to linux-next
v8-v11: no changes
v7: struct seccomp_struct -> struct seccomp
v6: original inclusion in this series.
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 include/linux/sched.h   |  2 +-
 include/linux/seccomp.h | 10 ++++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ba60897bb44..cad15023f45 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1452,7 +1452,7 @@ struct task_struct {
 	uid_t loginuid;
 	unsigned int sessionid;
 #endif
-	seccomp_t seccomp;
+	struct seccomp seccomp;
 
 /* Thread group tracking */
    	u32 parent_exec_id;
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index cc7a4e9cc7a..d61f27fcaa9 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -7,7 +7,9 @@
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
 
-typedef struct { int mode; } seccomp_t;
+struct seccomp {
+	int mode;
+};
 
 extern void __secure_computing(int);
 static inline void secure_computing(int this_syscall)
@@ -19,7 +21,7 @@ static inline void secure_computing(int this_syscall)
 extern long prctl_get_seccomp(void);
 extern long prctl_set_seccomp(unsigned long);
 
-static inline int seccomp_mode(seccomp_t *s)
+static inline int seccomp_mode(struct seccomp *s)
 {
 	return s->mode;
 }
@@ -28,7 +30,7 @@ static inline int seccomp_mode(seccomp_t *s)
 
 #include <linux/errno.h>
 
-typedef struct { } seccomp_t;
+struct seccomp { };
 
 #define secure_computing(x) do { } while (0)
 
@@ -42,7 +44,7 @@ static inline long prctl_set_seccomp(unsigned long arg2)
 	return -EINVAL;
 }
 
-static inline int seccomp_mode(seccomp_t *s)
+static inline int seccomp_mode(struct seccomp *s)
 {
 	return 0;
 }
-- 
cgit v1.2.3


From 07bd18d00d5dcf84eb22f8120f47f09c3d8fe27d Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:47:55 -0500
Subject: asm/syscall.h: add syscall_get_arch

Adds a stub for a function that will return the AUDIT_ARCH_* value
appropriate to the supplied task based on the system call convention.

For audit's use, the value can generally be hard-coded at the
audit-site.  However, for other functionality not inlined into syscall
entry/exit, this makes that information available.  seccomp_filter is
the first planned consumer and, as such, the comment indicates a tie to
CONFIG_HAVE_ARCH_SECCOMP_FILTER.

Suggested-by: Roland McGrath <mcgrathr@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: comment and change reword and rebase.
v14: rebase/nochanges
v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: rebase on to linux-next
v11: fixed improper return type
v10: introduced
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 include/asm-generic/syscall.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index 5c122ae6bfa..5b09392db67 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -142,4 +142,18 @@ void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
 			   unsigned int i, unsigned int n,
 			   const unsigned long *args);
 
+/**
+ * syscall_get_arch - return the AUDIT_ARCH for the current system call
+ * @task:	task of interest, must be in system call entry tracing
+ * @regs:	task_pt_regs() of @task
+ *
+ * Returns the AUDIT_ARCH_* based on the system call convention in use.
+ *
+ * It's only valid to call this when @task is stopped on entry to a system
+ * call, due to %TIF_SYSCALL_TRACE, %TIF_SYSCALL_AUDIT, or %TIF_SECCOMP.
+ *
+ * Architectures which permit CONFIG_HAVE_ARCH_SECCOMP_FILTER must
+ * provide an implementation of this.
+ */
+int syscall_get_arch(struct task_struct *task, struct pt_regs *regs);
 #endif	/* _ASM_SYSCALL_H */
-- 
cgit v1.2.3


From b7456536cf9466b402b540c5588d79a4177c723a Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:47:56 -0500
Subject: arch/x86: add syscall_get_arch to syscall.h

Add syscall_get_arch() to export the current AUDIT_ARCH_* based on system call
entry path.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

v18: - update comment about x32 tasks
     - rebase to v3.4-rc2
v17: rebase and reviewed-by
v14: rebase/nochanges
v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 arch/x86/include/asm/syscall.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 386b78686c4..1ace47b6259 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -13,9 +13,11 @@
 #ifndef _ASM_X86_SYSCALL_H
 #define _ASM_X86_SYSCALL_H
 
+#include <linux/audit.h>
 #include <linux/sched.h>
 #include <linux/err.h>
 #include <asm/asm-offsets.h>	/* For NR_syscalls */
+#include <asm/thread_info.h>	/* for TS_COMPAT */
 #include <asm/unistd.h>
 
 extern const unsigned long sys_call_table[];
@@ -88,6 +90,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	memcpy(&regs->bx + i, args, n * sizeof(args[0]));
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return AUDIT_ARCH_I386;
+}
+
 #else	 /* CONFIG_X86_64 */
 
 static inline void syscall_get_arguments(struct task_struct *task,
@@ -212,6 +220,25 @@ static inline void syscall_set_arguments(struct task_struct *task,
 		}
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+#ifdef CONFIG_IA32_EMULATION
+	/*
+	 * TS_COMPAT is set for 32-bit syscall entry and then
+	 * remains set until we return to user mode.
+	 *
+	 * TIF_IA32 tasks should always have TS_COMPAT set at
+	 * system call time.
+	 *
+	 * x32 tasks should be considered AUDIT_ARCH_X86_64.
+	 */
+	if (task_thread_info(task)->status & TS_COMPAT)
+		return AUDIT_ARCH_I386;
+#endif
+	/* Both x32 and x86_64 are considered "64-bit". */
+	return AUDIT_ARCH_X86_64;
+}
 #endif	/* CONFIG_X86_32 */
 
 #endif	/* _ASM_X86_SYSCALL_H */
-- 
cgit v1.2.3


From e2cfabdfd075648216f99c2c03821cf3f47c1727 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:47:57 -0500
Subject: seccomp: add system call filtering using BPF

[This patch depends on luto@mit.edu's no_new_privs patch:
   https://lkml.org/lkml/2012/1/30/264
 The whole series including Andrew's patches can be found here:
   https://github.com/redpig/linux/tree/seccomp
 Complete diff here:
   https://github.com/redpig/linux/compare/1dc65fed...seccomp
]

This patch adds support for seccomp mode 2.  Mode 2 introduces the
ability for unprivileged processes to install system call filtering
policy expressed in terms of a Berkeley Packet Filter (BPF) program.
This program will be evaluated in the kernel for each system call
the task makes and computes a result based on data in the format
of struct seccomp_data.

A filter program may be installed by calling:
  struct sock_fprog fprog = { ... };
  ...
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);

The return value of the filter program determines if the system call is
allowed to proceed or denied.  If the first filter program installed
allows prctl(2) calls, then the above call may be made repeatedly
by a task to further reduce its access to the kernel.  All attached
programs must be evaluated before a system call will be allowed to
proceed.

Filter programs will be inherited across fork/clone and execve.
However, if the task attaching the filter is unprivileged
(!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task.  This
ensures that unprivileged tasks cannot attach filters that affect
privileged tasks (e.g., setuid binary).

There are a number of benefits to this approach. A few of which are
as follows:
- BPF has been exposed to userland for a long time
- BPF optimization (and JIT'ing) are well understood
- Userland already knows its ABI: system call numbers and desired
  arguments
- No time-of-check-time-of-use vulnerable data accesses are possible.
- system call arguments are loaded on access only to minimize copying
  required for system call policy decisions.

Mode 2 support is restricted to architectures that enable
HAVE_ARCH_SECCOMP_FILTER.  In this patch, the primary dependency is on
syscall_get_arguments().  The full desired scope of this feature will
add a few minor additional requirements expressed later in this series.
Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
the desired additional functionality.

No architectures are enabled in this patch.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Indan Zupancic <indan@nul.nu>
Acked-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

v18: - rebase to v3.4-rc2
     - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
     - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
     - add a comment for get_u32 regarding endianness (akpm@)
     - fix other typos, style mistakes (akpm@)
     - added acked-by
v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
     - tighten return mask to 0x7fff0000
v16: - no change
v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
       size (indan@nul.nu)
     - drop the max insns to 256KB (indan@nul.nu)
     - return ENOMEM if the max insns limit has been hit (indan@nul.nu)
     - move IP checks after args (indan@nul.nu)
     - drop !user_filter check (indan@nul.nu)
     - only allow explicit bpf codes (indan@nul.nu)
     - exit_code -> exit_sig
v14: - put/get_seccomp_filter takes struct task_struct
       (indan@nul.nu,keescook@chromium.org)
     - adds seccomp_chk_filter and drops general bpf_run/chk_filter user
     - add seccomp_bpf_load for use by net/core/filter.c
     - lower max per-process/per-hierarchy: 1MB
     - moved nnp/capability check prior to allocation
       (all of the above: indan@nul.nu)
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
     - removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
     - reworded the prctl_set_seccomp comment (indan@nul.nu)
v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
     - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
     - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
     - pare down Kconfig doc reference.
     - extra comment clean up
v10: - seccomp_data has changed again to be more aesthetically pleasing
       (hpa@zytor.com)
     - calling convention is noted in a new u32 field using syscall_get_arch.
       This allows for cross-calling convention tasks to use seccomp filters.
       (hpa@zytor.com)
     - lots of clean up (thanks, Indan!)
 v9: - n/a
 v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
     - Lots of fixes courtesy of indan@nul.nu:
     -- fix up load behavior, compat fixups, and merge alloc code,
     -- renamed pc and dropped __packed, use bool compat.
     -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
        dependencies
 v7:  (massive overhaul thanks to Indan, others)
     - added CONFIG_HAVE_ARCH_SECCOMP_FILTER
     - merged into seccomp.c
     - minimal seccomp_filter.h
     - no config option (part of seccomp)
     - no new prctl
     - doesn't break seccomp on systems without asm/syscall.h
       (works but arg access always fails)
     - dropped seccomp_init_task, extra free functions, ...
     - dropped the no-asm/syscall.h code paths
     - merges with network sk_run_filter and sk_chk_filter
 v6: - fix memory leak on attach compat check failure
     - require no_new_privs || CAP_SYS_ADMIN prior to filter
       installation. (luto@mit.edu)
     - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
     - cleaned up Kconfig (amwang@redhat.com)
     - on block, note if the call was compat (so the # means something)
 v5: - uses syscall_get_arguments
       (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
      - uses union-based arg storage with hi/lo struct to
        handle endianness.  Compromises between the two alternate
        proposals to minimize extra arg shuffling and account for
        endianness assuming userspace uses offsetof().
        (mcgrathr@chromium.org, indan@nul.nu)
      - update Kconfig description
      - add include/seccomp_filter.h and add its installation
      - (naive) on-demand syscall argument loading
      - drop seccomp_t (eparis@redhat.com)
 v4:  - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
      - now uses current->no_new_privs
        (luto@mit.edu,torvalds@linux-foundation.com)
      - assign names to seccomp modes (rdunlap@xenotime.net)
      - fix style issues (rdunlap@xenotime.net)
      - reworded Kconfig entry (rdunlap@xenotime.net)
 v3:  - macros to inline (oleg@redhat.com)
      - init_task behavior fixed (oleg@redhat.com)
      - drop creator entry and extra NULL check (oleg@redhat.com)
      - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
      - adds tentative use of "always_unprivileged" as per
        torvalds@linux-foundation.org and luto@mit.edu
 v2:  - (patch 2 only)
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 arch/Kconfig            |  17 +++
 include/linux/Kbuild    |   1 +
 include/linux/seccomp.h |  76 +++++++++-
 kernel/fork.c           |   3 +
 kernel/seccomp.c        | 396 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/sys.c            |   2 +-
 6 files changed, 472 insertions(+), 23 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 684eb5af439..91c2c730fc1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -216,4 +216,21 @@ config HAVE_CMPXCHG_DOUBLE
 config ARCH_WANT_OLD_COMPAT_IPC
 	bool
 
+config HAVE_ARCH_SECCOMP_FILTER
+	bool
+	help
+	  This symbol should be selected by an architecure if it provides
+	  asm/syscall.h, specifically syscall_get_arguments() and
+	  syscall_get_arch().
+
+config SECCOMP_FILTER
+	def_bool y
+	depends on HAVE_ARCH_SECCOMP_FILTER && SECCOMP && NET
+	help
+	  Enable tasks to build secure computing environments defined
+	  in terms of Berkeley Packet Filter programs which implement
+	  task-defined system call filtering polices.
+
+	  See Documentation/prctl/seccomp_filter.txt for details.
+
 source "kernel/gcov/Kconfig"
diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index 3c9b616c834..5c93d6c5d59 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -332,6 +332,7 @@ header-y += scc.h
 header-y += sched.h
 header-y += screen_info.h
 header-y += sdla.h
+header-y += seccomp.h
 header-y += securebits.h
 header-y += selinux_netlink.h
 header-y += sem.h
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index d61f27fcaa9..86bb68fc768 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -1,14 +1,67 @@
 #ifndef _LINUX_SECCOMP_H
 #define _LINUX_SECCOMP_H
 
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+
+/* Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>) */
+#define SECCOMP_MODE_DISABLED	0 /* seccomp is not in use. */
+#define SECCOMP_MODE_STRICT	1 /* uses hard-coded filter. */
+#define SECCOMP_MODE_FILTER	2 /* uses user-supplied filter. */
+
+/*
+ * All BPF programs must return a 32-bit value.
+ * The bottom 16-bits are reserved for future use.
+ * The upper 16-bits are ordered from least permissive values to most.
+ *
+ * The ordering ensures that a min_t() over composed return values always
+ * selects the least permissive choice.
+ */
+#define SECCOMP_RET_KILL	0x00000000U /* kill the task immediately */
+#define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
+
+/* Masks for the return value sections. */
+#define SECCOMP_RET_ACTION	0x7fff0000U
+#define SECCOMP_RET_DATA	0x0000ffffU
+
+/**
+ * struct seccomp_data - the format the BPF program executes over.
+ * @nr: the system call number
+ * @arch: indicates system call convention as an AUDIT_ARCH_* value
+ *        as defined in <linux/audit.h>.
+ * @instruction_pointer: at the time of the system call.
+ * @args: up to 6 system call arguments always stored as 64-bit values
+ *        regardless of the architecture.
+ */
+struct seccomp_data {
+	int nr;
+	__u32 arch;
+	__u64 instruction_pointer;
+	__u64 args[6];
+};
 
+#ifdef __KERNEL__
 #ifdef CONFIG_SECCOMP
 
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
 
+struct seccomp_filter;
+/**
+ * struct seccomp - the state of a seccomp'ed process
+ *
+ * @mode:  indicates one of the valid values above for controlled
+ *         system calls available to a process.
+ * @filter: The metadata and ruleset for determining what system calls
+ *          are allowed for a task.
+ *
+ *          @filter must only be accessed from the context of current as there
+ *          is no locking.
+ */
 struct seccomp {
 	int mode;
+	struct seccomp_filter *filter;
 };
 
 extern void __secure_computing(int);
@@ -19,7 +72,7 @@ static inline void secure_computing(int this_syscall)
 }
 
 extern long prctl_get_seccomp(void);
-extern long prctl_set_seccomp(unsigned long);
+extern long prctl_set_seccomp(unsigned long, char __user *);
 
 static inline int seccomp_mode(struct seccomp *s)
 {
@@ -31,15 +84,16 @@ static inline int seccomp_mode(struct seccomp *s)
 #include <linux/errno.h>
 
 struct seccomp { };
+struct seccomp_filter { };
 
-#define secure_computing(x) do { } while (0)
+#define secure_computing(x) 0
 
 static inline long prctl_get_seccomp(void)
 {
 	return -EINVAL;
 }
 
-static inline long prctl_set_seccomp(unsigned long arg2)
+static inline long prctl_set_seccomp(unsigned long arg2, char __user *arg3)
 {
 	return -EINVAL;
 }
@@ -48,7 +102,21 @@ static inline int seccomp_mode(struct seccomp *s)
 {
 	return 0;
 }
-
 #endif /* CONFIG_SECCOMP */
 
+#ifdef CONFIG_SECCOMP_FILTER
+extern void put_seccomp_filter(struct task_struct *tsk);
+extern void get_seccomp_filter(struct task_struct *tsk);
+extern u32 seccomp_bpf_load(int off);
+#else  /* CONFIG_SECCOMP_FILTER */
+static inline void put_seccomp_filter(struct task_struct *tsk)
+{
+	return;
+}
+static inline void get_seccomp_filter(struct task_struct *tsk)
+{
+	return;
+}
+#endif /* CONFIG_SECCOMP_FILTER */
+#endif /* __KERNEL__ */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index b9372a0bff1..f7cf6fb107e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -34,6 +34,7 @@
 #include <linux/cgroup.h>
 #include <linux/security.h>
 #include <linux/hugetlb.h>
+#include <linux/seccomp.h>
 #include <linux/swap.h>
 #include <linux/syscalls.h>
 #include <linux/jiffies.h>
@@ -170,6 +171,7 @@ void free_task(struct task_struct *tsk)
 	free_thread_info(tsk->stack);
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
+	put_seccomp_filter(tsk);
 	free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
@@ -1162,6 +1164,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		goto fork_out;
 
 	ftrace_graph_init_task(p);
+	get_seccomp_filter(p);
 
 	rt_mutex_init_task(p);
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e8d76c5895e..0aeec1960f9 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -3,16 +3,343 @@
  *
  * Copyright 2004-2005  Andrea Arcangeli <andrea@cpushare.com>
  *
- * This defines a simple but solid secure-computing mode.
+ * Copyright (C) 2012 Google, Inc.
+ * Will Drewry <wad@chromium.org>
+ *
+ * This defines a simple but solid secure-computing facility.
+ *
+ * Mode 1 uses a fixed list of allowed system calls.
+ * Mode 2 allows user-defined system call filters in the form
+ *        of Berkeley Packet Filters/Linux Socket Filters.
  */
 
+#include <linux/atomic.h>
 #include <linux/audit.h>
-#include <linux/seccomp.h>
-#include <linux/sched.h>
 #include <linux/compat.h>
+#include <linux/sched.h>
+#include <linux/seccomp.h>
 
 /* #define SECCOMP_DEBUG 1 */
-#define NR_SECCOMP_MODES 1
+
+#ifdef CONFIG_SECCOMP_FILTER
+#include <asm/syscall.h>
+#include <linux/filter.h>
+#include <linux/security.h>
+#include <linux/slab.h>
+#include <linux/tracehook.h>
+#include <linux/uaccess.h>
+
+/**
+ * struct seccomp_filter - container for seccomp BPF programs
+ *
+ * @usage: reference count to manage the object lifetime.
+ *         get/put helpers should be used when accessing an instance
+ *         outside of a lifetime-guarded section.  In general, this
+ *         is only needed for handling filters shared across tasks.
+ * @prev: points to a previously installed, or inherited, filter
+ * @len: the number of instructions in the program
+ * @insns: the BPF program instructions to evaluate
+ *
+ * seccomp_filter objects are organized in a tree linked via the @prev
+ * pointer.  For any task, it appears to be a singly-linked list starting
+ * with current->seccomp.filter, the most recently attached or inherited filter.
+ * However, multiple filters may share a @prev node, by way of fork(), which
+ * results in a unidirectional tree existing in memory.  This is similar to
+ * how namespaces work.
+ *
+ * seccomp_filter objects should never be modified after being attached
+ * to a task_struct (other than @usage).
+ */
+struct seccomp_filter {
+	atomic_t usage;
+	struct seccomp_filter *prev;
+	unsigned short len;  /* Instruction count */
+	struct sock_filter insns[];
+};
+
+/* Limit any path through the tree to 256KB worth of instructions. */
+#define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
+
+static void seccomp_filter_log_failure(int syscall)
+{
+	int compat = 0;
+#ifdef CONFIG_COMPAT
+	compat = is_compat_task();
+#endif
+	pr_info("%s[%d]: %ssystem call %d blocked at 0x%lx\n",
+		current->comm, task_pid_nr(current),
+		(compat ? "compat " : ""),
+		syscall, KSTK_EIP(current));
+}
+
+/**
+ * get_u32 - returns a u32 offset into data
+ * @data: a unsigned 64 bit value
+ * @index: 0 or 1 to return the first or second 32-bits
+ *
+ * This inline exists to hide the length of unsigned long.  If a 32-bit
+ * unsigned long is passed in, it will be extended and the top 32-bits will be
+ * 0. If it is a 64-bit unsigned long, then whatever data is resident will be
+ * properly returned.
+ *
+ * Endianness is explicitly ignored and left for BPF program authors to manage
+ * as per the specific architecture.
+ */
+static inline u32 get_u32(u64 data, int index)
+{
+	return ((u32 *)&data)[index];
+}
+
+/* Helper for bpf_load below. */
+#define BPF_DATA(_name) offsetof(struct seccomp_data, _name)
+/**
+ * bpf_load: checks and returns a pointer to the requested offset
+ * @off: offset into struct seccomp_data to load from
+ *
+ * Returns the requested 32-bits of data.
+ * seccomp_check_filter() should assure that @off is 32-bit aligned
+ * and not out of bounds.  Failure to do so is a BUG.
+ */
+u32 seccomp_bpf_load(int off)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	if (off == BPF_DATA(nr))
+		return syscall_get_nr(current, regs);
+	if (off == BPF_DATA(arch))
+		return syscall_get_arch(current, regs);
+	if (off >= BPF_DATA(args[0]) && off < BPF_DATA(args[6])) {
+		unsigned long value;
+		int arg = (off - BPF_DATA(args[0])) / sizeof(u64);
+		int index = !!(off % sizeof(u64));
+		syscall_get_arguments(current, regs, arg, 1, &value);
+		return get_u32(value, index);
+	}
+	if (off == BPF_DATA(instruction_pointer))
+		return get_u32(KSTK_EIP(current), 0);
+	if (off == BPF_DATA(instruction_pointer) + sizeof(u32))
+		return get_u32(KSTK_EIP(current), 1);
+	/* seccomp_check_filter should make this impossible. */
+	BUG();
+}
+
+/**
+ *	seccomp_check_filter - verify seccomp filter code
+ *	@filter: filter to verify
+ *	@flen: length of filter
+ *
+ * Takes a previously checked filter (by sk_chk_filter) and
+ * redirects all filter code that loads struct sk_buff data
+ * and related data through seccomp_bpf_load.  It also
+ * enforces length and alignment checking of those loads.
+ *
+ * Returns 0 if the rule set is legal or -EINVAL if not.
+ */
+static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
+{
+	int pc;
+	for (pc = 0; pc < flen; pc++) {
+		struct sock_filter *ftest = &filter[pc];
+		u16 code = ftest->code;
+		u32 k = ftest->k;
+
+		switch (code) {
+		case BPF_S_LD_W_ABS:
+			ftest->code = BPF_S_ANC_SECCOMP_LD_W;
+			/* 32-bit aligned and not out of bounds. */
+			if (k >= sizeof(struct seccomp_data) || k & 3)
+				return -EINVAL;
+			continue;
+		case BPF_S_LD_W_LEN:
+			ftest->code = BPF_S_LD_IMM;
+			ftest->k = sizeof(struct seccomp_data);
+			continue;
+		case BPF_S_LDX_W_LEN:
+			ftest->code = BPF_S_LDX_IMM;
+			ftest->k = sizeof(struct seccomp_data);
+			continue;
+		/* Explicitly include allowed calls. */
+		case BPF_S_RET_K:
+		case BPF_S_RET_A:
+		case BPF_S_ALU_ADD_K:
+		case BPF_S_ALU_ADD_X:
+		case BPF_S_ALU_SUB_K:
+		case BPF_S_ALU_SUB_X:
+		case BPF_S_ALU_MUL_K:
+		case BPF_S_ALU_MUL_X:
+		case BPF_S_ALU_DIV_X:
+		case BPF_S_ALU_AND_K:
+		case BPF_S_ALU_AND_X:
+		case BPF_S_ALU_OR_K:
+		case BPF_S_ALU_OR_X:
+		case BPF_S_ALU_LSH_K:
+		case BPF_S_ALU_LSH_X:
+		case BPF_S_ALU_RSH_K:
+		case BPF_S_ALU_RSH_X:
+		case BPF_S_ALU_NEG:
+		case BPF_S_LD_IMM:
+		case BPF_S_LDX_IMM:
+		case BPF_S_MISC_TAX:
+		case BPF_S_MISC_TXA:
+		case BPF_S_ALU_DIV_K:
+		case BPF_S_LD_MEM:
+		case BPF_S_LDX_MEM:
+		case BPF_S_ST:
+		case BPF_S_STX:
+		case BPF_S_JMP_JA:
+		case BPF_S_JMP_JEQ_K:
+		case BPF_S_JMP_JEQ_X:
+		case BPF_S_JMP_JGE_K:
+		case BPF_S_JMP_JGE_X:
+		case BPF_S_JMP_JGT_K:
+		case BPF_S_JMP_JGT_X:
+		case BPF_S_JMP_JSET_K:
+		case BPF_S_JMP_JSET_X:
+			continue;
+		default:
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/**
+ * seccomp_run_filters - evaluates all seccomp filters against @syscall
+ * @syscall: number of the current system call
+ *
+ * Returns valid seccomp BPF response codes.
+ */
+static u32 seccomp_run_filters(int syscall)
+{
+	struct seccomp_filter *f;
+	u32 ret = SECCOMP_RET_KILL;
+	/*
+	 * All filters in the list are evaluated and the lowest BPF return
+	 * value always takes priority.
+	 */
+	for (f = current->seccomp.filter; f; f = f->prev) {
+		ret = sk_run_filter(NULL, f->insns);
+		if (ret != SECCOMP_RET_ALLOW)
+			break;
+	}
+	return ret;
+}
+
+/**
+ * seccomp_attach_filter: Attaches a seccomp filter to current.
+ * @fprog: BPF program to install
+ *
+ * Returns 0 on success or an errno on failure.
+ */
+static long seccomp_attach_filter(struct sock_fprog *fprog)
+{
+	struct seccomp_filter *filter;
+	unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
+	unsigned long total_insns = fprog->len;
+	long ret;
+
+	if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
+		return -EINVAL;
+
+	for (filter = current->seccomp.filter; filter; filter = filter->prev)
+		total_insns += filter->len + 4;  /* include a 4 instr penalty */
+	if (total_insns > MAX_INSNS_PER_PATH)
+		return -ENOMEM;
+
+	/*
+	 * Installing a seccomp filter requires that the task have
+	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
+	 * This avoids scenarios where unprivileged tasks can affect the
+	 * behavior of privileged children.
+	 */
+	if (!current->no_new_privs &&
+	    security_capable_noaudit(current_cred(), current_user_ns(),
+				     CAP_SYS_ADMIN) != 0)
+		return -EACCES;
+
+	/* Allocate a new seccomp_filter */
+	filter = kzalloc(sizeof(struct seccomp_filter) + fp_size,
+			 GFP_KERNEL|__GFP_NOWARN);
+	if (!filter)
+		return -ENOMEM;
+	atomic_set(&filter->usage, 1);
+	filter->len = fprog->len;
+
+	/* Copy the instructions from fprog. */
+	ret = -EFAULT;
+	if (copy_from_user(filter->insns, fprog->filter, fp_size))
+		goto fail;
+
+	/* Check and rewrite the fprog via the skb checker */
+	ret = sk_chk_filter(filter->insns, filter->len);
+	if (ret)
+		goto fail;
+
+	/* Check and rewrite the fprog for seccomp use */
+	ret = seccomp_check_filter(filter->insns, filter->len);
+	if (ret)
+		goto fail;
+
+	/*
+	 * If there is an existing filter, make it the prev and don't drop its
+	 * task reference.
+	 */
+	filter->prev = current->seccomp.filter;
+	current->seccomp.filter = filter;
+	return 0;
+fail:
+	kfree(filter);
+	return ret;
+}
+
+/**
+ * seccomp_attach_user_filter - attaches a user-supplied sock_fprog
+ * @user_filter: pointer to the user data containing a sock_fprog.
+ *
+ * Returns 0 on success and non-zero otherwise.
+ */
+long seccomp_attach_user_filter(char __user *user_filter)
+{
+	struct sock_fprog fprog;
+	long ret = -EFAULT;
+
+#ifdef CONFIG_COMPAT
+	if (is_compat_task()) {
+		struct compat_sock_fprog fprog32;
+		if (copy_from_user(&fprog32, user_filter, sizeof(fprog32)))
+			goto out;
+		fprog.len = fprog32.len;
+		fprog.filter = compat_ptr(fprog32.filter);
+	} else /* falls through to the if below. */
+#endif
+	if (copy_from_user(&fprog, user_filter, sizeof(fprog)))
+		goto out;
+	ret = seccomp_attach_filter(&fprog);
+out:
+	return ret;
+}
+
+/* get_seccomp_filter - increments the reference count of the filter on @tsk */
+void get_seccomp_filter(struct task_struct *tsk)
+{
+	struct seccomp_filter *orig = tsk->seccomp.filter;
+	if (!orig)
+		return;
+	/* Reference count is bounded by the number of total processes. */
+	atomic_inc(&orig->usage);
+}
+
+/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
+void put_seccomp_filter(struct task_struct *tsk)
+{
+	struct seccomp_filter *orig = tsk->seccomp.filter;
+	/* Clean up single-reference branches iteratively. */
+	while (orig && atomic_dec_and_test(&orig->usage)) {
+		struct seccomp_filter *freeme = orig;
+		orig = orig->prev;
+		kfree(freeme);
+	}
+}
+#endif	/* CONFIG_SECCOMP_FILTER */
 
 /*
  * Secure computing mode 1 allows only read/write/exit/sigreturn.
@@ -34,10 +361,11 @@ static int mode1_syscalls_32[] = {
 void __secure_computing(int this_syscall)
 {
 	int mode = current->seccomp.mode;
-	int * syscall;
+	int exit_sig = 0;
+	int *syscall;
 
 	switch (mode) {
-	case 1:
+	case SECCOMP_MODE_STRICT:
 		syscall = mode1_syscalls;
 #ifdef CONFIG_COMPAT
 		if (is_compat_task())
@@ -47,7 +375,16 @@ void __secure_computing(int this_syscall)
 			if (*syscall == this_syscall)
 				return;
 		} while (*++syscall);
+		exit_sig = SIGKILL;
 		break;
+#ifdef CONFIG_SECCOMP_FILTER
+	case SECCOMP_MODE_FILTER:
+		if (seccomp_run_filters(this_syscall) == SECCOMP_RET_ALLOW)
+			return;
+		seccomp_filter_log_failure(this_syscall);
+		exit_sig = SIGSYS;
+		break;
+#endif
 	default:
 		BUG();
 	}
@@ -56,7 +393,7 @@ void __secure_computing(int this_syscall)
 	dump_stack();
 #endif
 	audit_seccomp(this_syscall);
-	do_exit(SIGKILL);
+	do_exit(exit_sig);
 }
 
 long prctl_get_seccomp(void)
@@ -64,25 +401,48 @@ long prctl_get_seccomp(void)
 	return current->seccomp.mode;
 }
 
-long prctl_set_seccomp(unsigned long seccomp_mode)
+/**
+ * prctl_set_seccomp: configures current->seccomp.mode
+ * @seccomp_mode: requested mode to use
+ * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
+ *
+ * This function may be called repeatedly with a @seccomp_mode of
+ * SECCOMP_MODE_FILTER to install additional filters.  Every filter
+ * successfully installed will be evaluated (in reverse order) for each system
+ * call the task makes.
+ *
+ * Once current->seccomp.mode is non-zero, it may not be changed.
+ *
+ * Returns 0 on success or -EINVAL on failure.
+ */
+long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 {
-	long ret;
+	long ret = -EINVAL;
 
-	/* can set it only once to be even more secure */
-	ret = -EPERM;
-	if (unlikely(current->seccomp.mode))
+	if (current->seccomp.mode &&
+	    current->seccomp.mode != seccomp_mode)
 		goto out;
 
-	ret = -EINVAL;
-	if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) {
-		current->seccomp.mode = seccomp_mode;
-		set_thread_flag(TIF_SECCOMP);
+	switch (seccomp_mode) {
+	case SECCOMP_MODE_STRICT:
+		ret = 0;
 #ifdef TIF_NOTSC
 		disable_TSC();
 #endif
-		ret = 0;
+		break;
+#ifdef CONFIG_SECCOMP_FILTER
+	case SECCOMP_MODE_FILTER:
+		ret = seccomp_attach_user_filter(filter);
+		if (ret)
+			goto out;
+		break;
+#endif
+	default:
+		goto out;
 	}
 
- out:
+	current->seccomp.mode = seccomp_mode;
+	set_thread_flag(TIF_SECCOMP);
+out:
 	return ret;
 }
diff --git a/kernel/sys.c b/kernel/sys.c
index b82568b7d20..ba0ae8eea6f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1908,7 +1908,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			error = prctl_get_seccomp();
 			break;
 		case PR_SET_SECCOMP:
-			error = prctl_set_seccomp(arg2);
+			error = prctl_set_seccomp(arg2, (char __user *)arg3);
 			break;
 		case PR_GET_TSC:
 			error = GET_TSC_CTL(arg2);
-- 
cgit v1.2.3


From 3dc1c1b2d2ed7507ce8a379814ad75745ff97ebe Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Thu, 12 Apr 2012 16:47:58 -0500
Subject: seccomp: remove duplicated failure logging

This consolidates the seccomp filter error logging path and adds more
details to the audit log.

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: make compat= permanent in the record
v15: added a return code to the audit_seccomp path by wad@chromium.org
     (suggested by eparis@redhat.com)
v*: original by keescook@chromium.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 include/linux/audit.h |  8 ++++----
 kernel/auditsc.c      |  8 ++++++--
 kernel/seccomp.c      | 15 +--------------
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index ed3ef197249..22f292a917a 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -463,7 +463,7 @@ extern void audit_putname(const char *name);
 extern void __audit_inode(const char *name, const struct dentry *dentry);
 extern void __audit_inode_child(const struct dentry *dentry,
 				const struct inode *parent);
-extern void __audit_seccomp(unsigned long syscall);
+extern void __audit_seccomp(unsigned long syscall, long signr, int code);
 extern void __audit_ptrace(struct task_struct *t);
 
 static inline int audit_dummy_context(void)
@@ -508,10 +508,10 @@ static inline void audit_inode_child(const struct dentry *dentry,
 }
 void audit_core_dumps(long signr);
 
-static inline void audit_seccomp(unsigned long syscall)
+static inline void audit_seccomp(unsigned long syscall, long signr, int code)
 {
 	if (unlikely(!audit_dummy_context()))
-		__audit_seccomp(syscall);
+		__audit_seccomp(syscall, signr, code);
 }
 
 static inline void audit_ptrace(struct task_struct *t)
@@ -634,7 +634,7 @@ extern int audit_signals;
 #define audit_inode(n,d) do { (void)(d); } while (0)
 #define audit_inode_child(i,p) do { ; } while (0)
 #define audit_core_dumps(i) do { ; } while (0)
-#define audit_seccomp(i) do { ; } while (0)
+#define audit_seccomp(i,s,c) do { ; } while (0)
 #define auditsc_get_stamp(c,t,s) (0)
 #define audit_get_loginuid(t) (-1)
 #define audit_get_sessionid(t) (-1)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index af1de0f34ea..4b96415527b 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -67,6 +67,7 @@
 #include <linux/syscalls.h>
 #include <linux/capability.h>
 #include <linux/fs_struct.h>
+#include <linux/compat.h>
 
 #include "audit.h"
 
@@ -2710,13 +2711,16 @@ void audit_core_dumps(long signr)
 	audit_log_end(ab);
 }
 
-void __audit_seccomp(unsigned long syscall)
+void __audit_seccomp(unsigned long syscall, long signr, int code)
 {
 	struct audit_buffer *ab;
 
 	ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_ANOM_ABEND);
-	audit_log_abend(ab, "seccomp", SIGKILL);
+	audit_log_abend(ab, "seccomp", signr);
 	audit_log_format(ab, " syscall=%ld", syscall);
+	audit_log_format(ab, " compat=%d", is_compat_task());
+	audit_log_format(ab, " ip=0x%lx", KSTK_EIP(current));
+	audit_log_format(ab, " code=0x%x", code);
 	audit_log_end(ab);
 }
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0aeec1960f9..0f7c709a523 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -60,18 +60,6 @@ struct seccomp_filter {
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
-static void seccomp_filter_log_failure(int syscall)
-{
-	int compat = 0;
-#ifdef CONFIG_COMPAT
-	compat = is_compat_task();
-#endif
-	pr_info("%s[%d]: %ssystem call %d blocked at 0x%lx\n",
-		current->comm, task_pid_nr(current),
-		(compat ? "compat " : ""),
-		syscall, KSTK_EIP(current));
-}
-
 /**
  * get_u32 - returns a u32 offset into data
  * @data: a unsigned 64 bit value
@@ -381,7 +369,6 @@ void __secure_computing(int this_syscall)
 	case SECCOMP_MODE_FILTER:
 		if (seccomp_run_filters(this_syscall) == SECCOMP_RET_ALLOW)
 			return;
-		seccomp_filter_log_failure(this_syscall);
 		exit_sig = SIGSYS;
 		break;
 #endif
@@ -392,7 +379,7 @@ void __secure_computing(int this_syscall)
 #ifdef SECCOMP_DEBUG
 	dump_stack();
 #endif
-	audit_seccomp(this_syscall);
+	audit_seccomp(this_syscall, exit_code, SECCOMP_RET_KILL);
 	do_exit(exit_sig);
 }
 
-- 
cgit v1.2.3


From acf3b2c71ed20c53dc69826683417703c2a88059 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:47:59 -0500
Subject: seccomp: add SECCOMP_RET_ERRNO

This change adds the SECCOMP_RET_ERRNO as a valid return value from a
seccomp filter.  Additionally, it makes the first use of the lower
16-bits for storing a filter-supplied errno.  16-bits is more than
enough for the errno-base.h calls.

Returning errors instead of immediately terminating processes that
violate seccomp policy allow for broader use of this functionality
for kernel attack surface reduction.  For example, a linux container
could maintain a whitelist of pre-existing system calls but drop
all new ones with errnos.  This would keep a logically static attack
surface while providing errnos that may allow for graceful failure
without the downside of do_exit() on a bad call.

This change also changes the signature of __secure_computing.  It
appears the only direct caller is the arm entry code and it clobbers
any possible return value (register) immediately.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - fix up comments and rebase
     - fix bad var name which was fixed in later revs
     - remove _int() and just change the __secure_computing signature
v16-v17: ...
v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
     - clean up and pad out return codes (indan@nul.nu)
v14: - no change/rebase
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: - move to WARN_ON if filter is NULL
       (oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
     - return immediately for filter==NULL (keescook@chromium.org)
     - change evaluation to only compare the ACTION so that layered
       errnos don't result in the lowest one being returned.
       (keeschook@chromium.org)
v11: - check for NULL filter (keescook@chromium.org)
v10: - change loaders to fn
 v9: - n/a
 v8: - update Kconfig to note new need for syscall_set_return_value.
     - reordered such that TRAP behavior follows on later.
     - made the for loop a little less indent-y
 v7: - introduced
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 arch/Kconfig            |  6 ++++--
 include/linux/seccomp.h | 10 ++++++----
 kernel/seccomp.c        | 42 ++++++++++++++++++++++++++++++++----------
 3 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 91c2c730fc1..beaab68c13b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -220,8 +220,10 @@ config HAVE_ARCH_SECCOMP_FILTER
 	bool
 	help
 	  This symbol should be selected by an architecure if it provides
-	  asm/syscall.h, specifically syscall_get_arguments() and
-	  syscall_get_arch().
+	  asm/syscall.h, specifically syscall_get_arguments(),
+	  syscall_get_arch(), and syscall_set_return_value().  Additionally,
+	  its system call entry path must respect a return value of -1 from
+	  __secure_computing() and/or secure_computing().
 
 config SECCOMP_FILTER
 	def_bool y
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 86bb68fc768..b4ce2c816e0 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -12,13 +12,14 @@
 
 /*
  * All BPF programs must return a 32-bit value.
- * The bottom 16-bits are reserved for future use.
+ * The bottom 16-bits are for optional return data.
  * The upper 16-bits are ordered from least permissive values to most.
  *
  * The ordering ensures that a min_t() over composed return values always
  * selects the least permissive choice.
  */
 #define SECCOMP_RET_KILL	0x00000000U /* kill the task immediately */
+#define SECCOMP_RET_ERRNO	0x00050000U /* returns an errno */
 #define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
 
 /* Masks for the return value sections. */
@@ -64,11 +65,12 @@ struct seccomp {
 	struct seccomp_filter *filter;
 };
 
-extern void __secure_computing(int);
-static inline void secure_computing(int this_syscall)
+extern int __secure_computing(int);
+static inline int secure_computing(int this_syscall)
 {
 	if (unlikely(test_thread_flag(TIF_SECCOMP)))
-		__secure_computing(this_syscall);
+		return  __secure_computing(this_syscall);
+	return 0;
 }
 
 extern long prctl_get_seccomp(void);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0f7c709a523..5f78fb6d221 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -199,15 +199,20 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 static u32 seccomp_run_filters(int syscall)
 {
 	struct seccomp_filter *f;
-	u32 ret = SECCOMP_RET_KILL;
+	u32 ret = SECCOMP_RET_ALLOW;
+
+	/* Ensure unexpected behavior doesn't result in failing open. */
+	if (WARN_ON(current->seccomp.filter == NULL))
+		return SECCOMP_RET_KILL;
+
 	/*
 	 * All filters in the list are evaluated and the lowest BPF return
-	 * value always takes priority.
+	 * value always takes priority (ignoring the DATA).
 	 */
 	for (f = current->seccomp.filter; f; f = f->prev) {
-		ret = sk_run_filter(NULL, f->insns);
-		if (ret != SECCOMP_RET_ALLOW)
-			break;
+		u32 cur_ret = sk_run_filter(NULL, f->insns);
+		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
+			ret = cur_ret;
 	}
 	return ret;
 }
@@ -346,11 +351,13 @@ static int mode1_syscalls_32[] = {
 };
 #endif
 
-void __secure_computing(int this_syscall)
+int __secure_computing(int this_syscall)
 {
 	int mode = current->seccomp.mode;
 	int exit_sig = 0;
 	int *syscall;
+	u32 ret = SECCOMP_RET_KILL;
+	int data;
 
 	switch (mode) {
 	case SECCOMP_MODE_STRICT:
@@ -361,14 +368,26 @@ void __secure_computing(int this_syscall)
 #endif
 		do {
 			if (*syscall == this_syscall)
-				return;
+				return 0;
 		} while (*++syscall);
 		exit_sig = SIGKILL;
 		break;
 #ifdef CONFIG_SECCOMP_FILTER
 	case SECCOMP_MODE_FILTER:
-		if (seccomp_run_filters(this_syscall) == SECCOMP_RET_ALLOW)
-			return;
+		ret = seccomp_run_filters(this_syscall);
+		data = ret & SECCOMP_RET_DATA;
+		switch (ret & SECCOMP_RET_ACTION) {
+		case SECCOMP_RET_ERRNO:
+			/* Set the low-order 16-bits as a errno. */
+			syscall_set_return_value(current, task_pt_regs(current),
+						 -data, 0);
+			goto skip;
+		case SECCOMP_RET_ALLOW:
+			return 0;
+		case SECCOMP_RET_KILL:
+		default:
+			break;
+		}
 		exit_sig = SIGSYS;
 		break;
 #endif
@@ -379,8 +398,11 @@ void __secure_computing(int this_syscall)
 #ifdef SECCOMP_DEBUG
 	dump_stack();
 #endif
-	audit_seccomp(this_syscall, exit_code, SECCOMP_RET_KILL);
+	audit_seccomp(this_syscall, exit_sig, ret);
 	do_exit(exit_sig);
+skip:
+	audit_seccomp(this_syscall, exit_sig, ret);
+	return -1;
 }
 
 long prctl_get_seccomp(void)
-- 
cgit v1.2.3


From a0727e8ce513fe6890416da960181ceb10fbfae6 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:48:00 -0500
Subject: signal, x86: add SIGSYS info and make it synchronous.

This change enables SIGSYS, defines _sigfields._sigsys, and adds
x86 (compat) arch support.  _sigsys defines fields which allow
a signal handler to receive the triggering system call number,
the relevant AUDIT_ARCH_* value for that number, and the address
of the callsite.

SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it
to have setup_frame() called for it. The goal is to ensure that
ucontext_t reflects the machine state from the time-of-syscall and not
from another signal handler.

The first consumer of SIGSYS would be seccomp filter.  In particular,
a filter program could specify a new return value, SECCOMP_RET_TRAP,
which would result in the system call being denied and the calling
thread signaled.  This also means that implementing arch-specific
support can be dependent upon HAVE_ARCH_SECCOMP_FILTER.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - added acked by, rebase
v17: - rebase and reviewed-by addition
v14: - rebase/nochanges
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: - reworded changelog (oleg@redhat.com)
v11: - fix dropped words in the change description
     - added fallback copy_siginfo support.
     - added __ARCH_SIGSYS define to allow stepped arch support.
v10: - first version based on suggestion
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 arch/x86/ia32/ia32_signal.c   |  4 ++++
 arch/x86/include/asm/ia32.h   |  6 ++++++
 include/asm-generic/siginfo.h | 22 ++++++++++++++++++++++
 kernel/signal.c               |  9 ++++++++-
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index a69245ba27e..0b3f2354f6a 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -67,6 +67,10 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			switch (from->si_code >> 16) {
 			case __SI_FAULT >> 16:
 				break;
+			case __SI_SYS >> 16:
+				put_user_ex(from->si_syscall, &to->si_syscall);
+				put_user_ex(from->si_arch, &to->si_arch);
+				break;
 			case __SI_CHLD >> 16:
 				if (ia32) {
 					put_user_ex(from->si_utime, &to->si_utime);
diff --git a/arch/x86/include/asm/ia32.h b/arch/x86/include/asm/ia32.h
index ee52760549f..b04cbdb138c 100644
--- a/arch/x86/include/asm/ia32.h
+++ b/arch/x86/include/asm/ia32.h
@@ -144,6 +144,12 @@ typedef struct compat_siginfo {
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		struct {
+			unsigned int _call_addr; /* calling insn */
+			int _syscall;	/* triggering system call number */
+			unsigned int _arch;	/* AUDIT_ARCH_* of syscall */
+		} _sigsys;
 	} _sifields;
 } compat_siginfo_t;
 
diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 0dd4e87f6fb..31306f55eb0 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -90,9 +90,18 @@ typedef struct siginfo {
 			__ARCH_SI_BAND_T _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGSYS */
+		struct {
+			void __user *_call_addr; /* calling insn */
+			int _syscall;	/* triggering system call number */
+			unsigned int _arch;	/* AUDIT_ARCH_* of syscall */
+		} _sigsys;
 	} _sifields;
 } siginfo_t;
 
+/* If the arch shares siginfo, then it has SIGSYS. */
+#define __ARCH_SIGSYS
 #endif
 
 /*
@@ -116,6 +125,11 @@ typedef struct siginfo {
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
+#ifdef __ARCH_SIGSYS
+#define si_call_addr	_sifields._sigsys._call_addr
+#define si_syscall	_sifields._sigsys._syscall
+#define si_arch		_sifields._sigsys._arch
+#endif
 
 #ifdef __KERNEL__
 #define __SI_MASK	0xffff0000u
@@ -126,6 +140,7 @@ typedef struct siginfo {
 #define __SI_CHLD	(4 << 16)
 #define __SI_RT		(5 << 16)
 #define __SI_MESGQ	(6 << 16)
+#define __SI_SYS	(7 << 16)
 #define __SI_CODE(T,N)	((T) | ((N) & 0xffff))
 #else
 #define __SI_KILL	0
@@ -135,6 +150,7 @@ typedef struct siginfo {
 #define __SI_CHLD	0
 #define __SI_RT		0
 #define __SI_MESGQ	0
+#define __SI_SYS	0
 #define __SI_CODE(T,N)	(N)
 #endif
 
@@ -231,6 +247,12 @@ typedef struct siginfo {
 #define POLL_HUP	(__SI_POLL|6)	/* device disconnected */
 #define NSIGPOLL	6
 
+/*
+ * SIGSYS si_codes
+ */
+#define SYS_SECCOMP		(__SI_SYS|1)	/* seccomp triggered */
+#define NSIGSYS	1
+
 /*
  * sigevent definitions
  * 
diff --git a/kernel/signal.c b/kernel/signal.c
index 17afcaf582d..1a006b5d9d9 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -160,7 +160,7 @@ void recalc_sigpending(void)
 
 #define SYNCHRONOUS_MASK \
 	(sigmask(SIGSEGV) | sigmask(SIGBUS) | sigmask(SIGILL) | \
-	 sigmask(SIGTRAP) | sigmask(SIGFPE))
+	 sigmask(SIGTRAP) | sigmask(SIGFPE) | sigmask(SIGSYS))
 
 int next_signal(struct sigpending *pending, sigset_t *mask)
 {
@@ -2706,6 +2706,13 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
 		err |= __put_user(from->si_uid, &to->si_uid);
 		err |= __put_user(from->si_ptr, &to->si_ptr);
 		break;
+#ifdef __ARCH_SIGSYS
+	case __SI_SYS:
+		err |= __put_user(from->si_call_addr, &to->si_call_addr);
+		err |= __put_user(from->si_syscall, &to->si_syscall);
+		err |= __put_user(from->si_arch, &to->si_arch);
+		break;
+#endif
 	default: /* this is just in case for now ... */
 		err |= __put_user(from->si_pid, &to->si_pid);
 		err |= __put_user(from->si_uid, &to->si_uid);
-- 
cgit v1.2.3


From bb6ea4301a1109afdacaee576fedbfcd7152fc86 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:48:01 -0500
Subject: seccomp: Add SECCOMP_RET_TRAP

Adds a new return value to seccomp filters that triggers a SIGSYS to be
delivered with the new SYS_SECCOMP si_code.

This allows in-process system call emulation, including just specifying
an errno or cleanly dumping core, rather than just dying.

Suggested-by: Markus Gutschke <markus@chromium.org>
Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - acked-by, rebase
     - don't mention secure_computing_int() anymore
v15: - use audit_seccomp/skip
     - pad out error spacing; clean up switch (indan@nul.nu)
v14: - n/a
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: - rebase on to linux-next
v11: - clarify the comment (indan@nul.nu)
     - s/sigtrap/sigsys
v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
       note suggested-by (though original suggestion had other behaviors)
v9:  - changes to SIGILL
v8:  - clean up based on changes to dependent patches
v7:  - introduction
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 arch/Kconfig                  | 14 +++++++++-----
 include/asm-generic/siginfo.h |  2 +-
 include/linux/seccomp.h       |  1 +
 kernel/seccomp.c              | 26 ++++++++++++++++++++++++++
 4 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index beaab68c13b..66aef13f603 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -219,11 +219,15 @@ config ARCH_WANT_OLD_COMPAT_IPC
 config HAVE_ARCH_SECCOMP_FILTER
 	bool
 	help
-	  This symbol should be selected by an architecure if it provides
-	  asm/syscall.h, specifically syscall_get_arguments(),
-	  syscall_get_arch(), and syscall_set_return_value().  Additionally,
-	  its system call entry path must respect a return value of -1 from
-	  __secure_computing() and/or secure_computing().
+	  This symbol should be selected by an architecure if it provides:
+	  asm/syscall.h:
+	  - syscall_get_arch()
+	  - syscall_get_arguments()
+	  - syscall_rollback()
+	  - syscall_set_return_value()
+	  SIGSYS siginfo_t support must be implemented.
+	  __secure_computing()/secure_computing()'s return value must be
+	  checked, with -1 resulting in the syscall being skipped.
 
 config SECCOMP_FILTER
 	def_bool y
diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 31306f55eb0..af5d0350f84 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -93,7 +93,7 @@ typedef struct siginfo {
 
 		/* SIGSYS */
 		struct {
-			void __user *_call_addr; /* calling insn */
+			void __user *_call_addr; /* calling user insn */
 			int _syscall;	/* triggering system call number */
 			unsigned int _arch;	/* AUDIT_ARCH_* of syscall */
 		} _sigsys;
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index b4ce2c816e0..317ccb78cf4 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -19,6 +19,7 @@
  * selects the least permissive choice.
  */
 #define SECCOMP_RET_KILL	0x00000000U /* kill the task immediately */
+#define SECCOMP_RET_TRAP	0x00030000U /* disallow and force a SIGSYS */
 #define SECCOMP_RET_ERRNO	0x00050000U /* returns an errno */
 #define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 5f78fb6d221..9c3830692a0 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -332,6 +332,26 @@ void put_seccomp_filter(struct task_struct *tsk)
 		kfree(freeme);
 	}
 }
+
+/**
+ * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
+ * @syscall: syscall number to send to userland
+ * @reason: filter-supplied reason code to send to userland (via si_errno)
+ *
+ * Forces a SIGSYS with a code of SYS_SECCOMP and related sigsys info.
+ */
+static void seccomp_send_sigsys(int syscall, int reason)
+{
+	struct siginfo info;
+	memset(&info, 0, sizeof(info));
+	info.si_signo = SIGSYS;
+	info.si_code = SYS_SECCOMP;
+	info.si_call_addr = (void __user *)KSTK_EIP(current);
+	info.si_errno = reason;
+	info.si_arch = syscall_get_arch(current, task_pt_regs(current));
+	info.si_syscall = syscall;
+	force_sig_info(SIGSYS, &info, current);
+}
 #endif	/* CONFIG_SECCOMP_FILTER */
 
 /*
@@ -382,6 +402,12 @@ int __secure_computing(int this_syscall)
 			syscall_set_return_value(current, task_pt_regs(current),
 						 -data, 0);
 			goto skip;
+		case SECCOMP_RET_TRAP:
+			/* Show the handler the original registers. */
+			syscall_rollback(current, task_pt_regs(current));
+			/* Let the filter pass back 16 bits of data. */
+			seccomp_send_sigsys(this_syscall, data);
+			goto skip;
 		case SECCOMP_RET_ALLOW:
 			return 0;
 		case SECCOMP_RET_KILL:
-- 
cgit v1.2.3


From fb0fadf9b213f55ca9368f3edafe51101d5d2deb Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:48:02 -0500
Subject: ptrace,seccomp: Add PTRACE_SECCOMP support

This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.

When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
results in a BPF program returning SECCOMP_RET_TRACE.  The 16-bit
SECCOMP_RET_DATA mask of the BPF program return value will be passed as
the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.

If the subordinate process is not using seccomp filter, then no
system call notifications will occur even if the option is specified.

If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
is returned, the system call will not be executed and an -ENOSYS errno
will be returned to userspace.

This change adds a dependency on the system call slow path.  Any future
efforts to use the system call fast path for seccomp filter will need to
address this restriction.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - rebase
     - comment fatal_signal check
     - acked-by
     - drop secure_computing_int comment
v17: - ...
v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu)
       [note PT_TRACE_MASK disappears in linux-next]
v15: - add audit support for non-zero return codes
     - clean up style (indan@nul.nu)
v14: - rebase/nochanges
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
       (Brings back a change to ptrace.c and the masks.)
v12: - rebase to linux-next
     - use ptrace_event and update arch/Kconfig to mention slow-path dependency
     - drop all tracehook changes and inclusion (oleg@redhat.com)
v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
       (indan@nul.nu)
v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
v9:  - n/a
v8:  - guarded PTRACE_SECCOMP use with an ifdef
v7:  - introduced
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 arch/Kconfig            | 10 +++++-----
 include/linux/ptrace.h  |  5 ++++-
 include/linux/seccomp.h |  1 +
 kernel/seccomp.c        | 16 ++++++++++++++++
 4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 66aef13f603..c024b3ed667 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -219,15 +219,15 @@ config ARCH_WANT_OLD_COMPAT_IPC
 config HAVE_ARCH_SECCOMP_FILTER
 	bool
 	help
-	  This symbol should be selected by an architecure if it provides:
-	  asm/syscall.h:
+	  An arch should select this symbol if it provides all of these things:
 	  - syscall_get_arch()
 	  - syscall_get_arguments()
 	  - syscall_rollback()
 	  - syscall_set_return_value()
-	  SIGSYS siginfo_t support must be implemented.
-	  __secure_computing()/secure_computing()'s return value must be
-	  checked, with -1 resulting in the syscall being skipped.
+	  - SIGSYS siginfo_t support
+	  - secure_computing is called from a ptrace_event()-safe context
+	  - secure_computing return value is checked and a return value of -1
+	    results in the system call being skipped immediately.
 
 config SECCOMP_FILTER
 	def_bool y
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 5c719627c2a..597e4fdb97f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -58,6 +58,7 @@
 #define PTRACE_EVENT_EXEC	4
 #define PTRACE_EVENT_VFORK_DONE	5
 #define PTRACE_EVENT_EXIT	6
+#define PTRACE_EVENT_SECCOMP	7
 /* Extended result codes which enabled by means other than options.  */
 #define PTRACE_EVENT_STOP	128
 
@@ -69,8 +70,9 @@
 #define PTRACE_O_TRACEEXEC	(1 << PTRACE_EVENT_EXEC)
 #define PTRACE_O_TRACEVFORKDONE	(1 << PTRACE_EVENT_VFORK_DONE)
 #define PTRACE_O_TRACEEXIT	(1 << PTRACE_EVENT_EXIT)
+#define PTRACE_O_TRACESECCOMP	(1 << PTRACE_EVENT_SECCOMP)
 
-#define PTRACE_O_MASK		0x0000007f
+#define PTRACE_O_MASK		0x000000ff
 
 #include <asm/ptrace.h>
 
@@ -98,6 +100,7 @@
 #define PT_TRACE_EXEC		PT_EVENT_FLAG(PTRACE_EVENT_EXEC)
 #define PT_TRACE_VFORK_DONE	PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
 #define PT_TRACE_EXIT		PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
+#define PT_TRACE_SECCOMP	PT_EVENT_FLAG(PTRACE_EVENT_SECCOMP)
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 317ccb78cf4..5818e869651 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -21,6 +21,7 @@
 #define SECCOMP_RET_KILL	0x00000000U /* kill the task immediately */
 #define SECCOMP_RET_TRAP	0x00030000U /* disallow and force a SIGSYS */
 #define SECCOMP_RET_ERRNO	0x00050000U /* returns an errno */
+#define SECCOMP_RET_TRACE	0x7ff00000U /* pass to a tracer or disallow */
 #define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
 
 /* Masks for the return value sections. */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 9c3830692a0..d9db6ec46bc 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -24,6 +24,7 @@
 #ifdef CONFIG_SECCOMP_FILTER
 #include <asm/syscall.h>
 #include <linux/filter.h>
+#include <linux/ptrace.h>
 #include <linux/security.h>
 #include <linux/slab.h>
 #include <linux/tracehook.h>
@@ -408,6 +409,21 @@ int __secure_computing(int this_syscall)
 			/* Let the filter pass back 16 bits of data. */
 			seccomp_send_sigsys(this_syscall, data);
 			goto skip;
+		case SECCOMP_RET_TRACE:
+			/* Skip these calls if there is no tracer. */
+			if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP))
+				goto skip;
+			/* Allow the BPF to provide the event message */
+			ptrace_event(PTRACE_EVENT_SECCOMP, data);
+			/*
+			 * The delivery of a fatal signal during event
+			 * notification may silently skip tracer notification.
+			 * Terminating the task now avoids executing a system
+			 * call that may not be intended.
+			 */
+			if (fatal_signal_pending(current))
+				break;
+			return 0;
 		case SECCOMP_RET_ALLOW:
 			return 0;
 		case SECCOMP_RET_KILL:
-- 
cgit v1.2.3


From c6cfbeb4029610c8c330c312dcf4d514cc067554 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:48:03 -0500
Subject: x86: Enable HAVE_ARCH_SECCOMP_FILTER

Enable support for seccomp filter on x86:
- syscall_get_arch()
- syscall_get_arguments()
- syscall_rollback()
- syscall_set_return_value()
- SIGSYS siginfo_t support
- secure_computing is called from a ptrace_event()-safe context
- secure_computing return value is checked (see below).

SECCOMP_RET_TRACE and SECCOMP_RET_TRAP may result in seccomp needing to
skip a system call without killing the process.  This is done by
returning a non-zero (-1) value from secure_computing.  This change
makes x86 respect that return value.

To ensure that minimal kernel code is exposed, a non-zero return value
results in an immediate return to user space (with an invalid syscall
number).

Signed-off-by: Will Drewry <wad@chromium.org>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

v18: rebase and tweaked change description, acked-by
v17: added reviewed by and rebased
v..: all rebases since original introduction.
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 arch/x86/Kconfig         | 1 +
 arch/x86/kernel/ptrace.c | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d14cc6b79a..3a41c4424a0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -82,6 +82,7 @@ config X86
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select GENERIC_IOMAP
 	select DCACHE_WORD_ACCESS if !DEBUG_PAGEALLOC
+	select HAVE_ARCH_SECCOMP_FILTER
 
 config INSTRUCTION_DECODER
 	def_bool (KPROBES || PERF_EVENTS)
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 685845cf16e..13b1990c7c5 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1480,7 +1480,11 @@ long syscall_trace_enter(struct pt_regs *regs)
 		regs->flags |= X86_EFLAGS_TF;
 
 	/* do the secure computing check first */
-	secure_computing(regs->orig_ax);
+	if (secure_computing(regs->orig_ax)) {
+		/* seccomp failures shouldn't expose any additional code. */
+		ret = -1L;
+		goto out;
+	}
 
 	if (unlikely(test_thread_flag(TIF_SYSCALL_EMU)))
 		ret = -1L;
@@ -1505,6 +1509,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 				    regs->dx, regs->r10);
 #endif
 
+out:
 	return ret ?: regs->orig_ax;
 }
 
-- 
cgit v1.2.3


From 8ac270d1e29f0428228ab2b9a8ae5e1ed4a5cd84 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Thu, 12 Apr 2012 16:48:04 -0500
Subject: Documentation: prctl/seccomp_filter

Documents how system call filtering using Berkeley Packet
Filter programs works and how it may be used.
Includes an example for x86 and a semi-generic
example using a macro-based code generator.

Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>

v18: - added acked by
     - update no new privs numbers
v17: - remove @compat note and add Pitfalls section for arch checking
       (keescook@chromium.org)
v16: -
v15: -
v14: - rebase/nochanges
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: - comment on the ptrace_event use
     - update arch support comment
     - note the behavior of SECCOMP_RET_DATA when there are multiple filters
       (keescook@chromium.org)
     - lots of samples/ clean up incl 64-bit bpf-direct support
       (markus@chromium.org)
     - rebase to linux-next
v11: - overhaul return value language, updates (keescook@chromium.org)
     - comment on do_exit(SIGSYS)
v10: - update for SIGSYS
     - update for new seccomp_data layout
     - update for ptrace option use
v9: - updated bpf-direct.c for SIGILL
v8: - add PR_SET_NO_NEW_PRIVS to the samples.
v7: - updated for all the new stuff in v7: TRAP, TRACE
    - only talk about PR_SET_SECCOMP now
    - fixed bad JLE32 check (coreyb@linux.vnet.ibm.com)
    - adds dropper.c: a simple system call disabler
v6: - tweak the language to note the requirement of
      PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu)
v5: - update sample to use system call arguments
    - adds a "fancy" example using a macro-based generator
    - cleaned up bpf in the sample
    - update docs to mention arguments
    - fix prctl value (eparis@redhat.com)
    - language cleanup (rdunlap@xenotime.net)
v4: - update for no_new_privs use
    - minor tweaks
v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net)
    - document use of tentative always-unprivileged
    - guard sample compilation for i386 and x86_64
v2: - move code to samples (corbet@lwn.net)
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 Documentation/prctl/seccomp_filter.txt | 163 ++++++++++++++++++++++
 samples/Makefile                       |   2 +-
 samples/seccomp/Makefile               |  38 ++++++
 samples/seccomp/bpf-direct.c           | 176 ++++++++++++++++++++++++
 samples/seccomp/bpf-fancy.c            | 102 ++++++++++++++
 samples/seccomp/bpf-helper.c           |  89 ++++++++++++
 samples/seccomp/bpf-helper.h           | 238 +++++++++++++++++++++++++++++++++
 samples/seccomp/dropper.c              |  68 ++++++++++
 8 files changed, 875 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/prctl/seccomp_filter.txt
 create mode 100644 samples/seccomp/Makefile
 create mode 100644 samples/seccomp/bpf-direct.c
 create mode 100644 samples/seccomp/bpf-fancy.c
 create mode 100644 samples/seccomp/bpf-helper.c
 create mode 100644 samples/seccomp/bpf-helper.h
 create mode 100644 samples/seccomp/dropper.c

diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
new file mode 100644
index 00000000000..597c3c58137
--- /dev/null
+++ b/Documentation/prctl/seccomp_filter.txt
@@ -0,0 +1,163 @@
+		SECure COMPuting with filters
+		=============================
+
+Introduction
+------------
+
+A large number of system calls are exposed to every userland process
+with many of them going unused for the entire lifetime of the process.
+As system calls change and mature, bugs are found and eradicated.  A
+certain subset of userland applications benefit by having a reduced set
+of available system calls.  The resulting set reduces the total kernel
+surface exposed to the application.  System call filtering is meant for
+use with those applications.
+
+Seccomp filtering provides a means for a process to specify a filter for
+incoming system calls.  The filter is expressed as a Berkeley Packet
+Filter (BPF) program, as with socket filters, except that the data
+operated on is related to the system call being made: system call
+number and the system call arguments.  This allows for expressive
+filtering of system calls using a filter program language with a long
+history of being exposed to userland and a straightforward data set.
+
+Additionally, BPF makes it impossible for users of seccomp to fall prey
+to time-of-check-time-of-use (TOCTOU) attacks that are common in system
+call interposition frameworks.  BPF programs may not dereference
+pointers which constrains all filters to solely evaluating the system
+call arguments directly.
+
+What it isn't
+-------------
+
+System call filtering isn't a sandbox.  It provides a clearly defined
+mechanism for minimizing the exposed kernel surface.  It is meant to be
+a tool for sandbox developers to use.  Beyond that, policy for logical
+behavior and information flow should be managed with a combination of
+other system hardening techniques and, potentially, an LSM of your
+choosing.  Expressive, dynamic filters provide further options down this
+path (avoiding pathological sizes or selecting which of the multiplexed
+system calls in socketcall() is allowed, for instance) which could be
+construed, incorrectly, as a more complete sandboxing solution.
+
+Usage
+-----
+
+An additional seccomp mode is added and is enabled using the same
+prctl(2) call as the strict seccomp.  If the architecture has
+CONFIG_HAVE_ARCH_SECCOMP_FILTER, then filters may be added as below:
+
+PR_SET_SECCOMP:
+	Now takes an additional argument which specifies a new filter
+	using a BPF program.
+	The BPF program will be executed over struct seccomp_data
+	reflecting the system call number, arguments, and other
+	metadata.  The BPF program must then return one of the
+	acceptable values to inform the kernel which action should be
+	taken.
+
+	Usage:
+		prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);
+
+	The 'prog' argument is a pointer to a struct sock_fprog which
+	will contain the filter program.  If the program is invalid, the
+	call will return -1 and set errno to EINVAL.
+
+	If fork/clone and execve are allowed by @prog, any child
+	processes will be constrained to the same filters and system
+	call ABI as the parent.
+
+	Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
+	run with CAP_SYS_ADMIN privileges in its namespace.  If these are not
+	true, -EACCES will be returned.  This requirement ensures that filter
+	programs cannot be applied to child processes with greater privileges
+	than the task that installed them.
+
+	Additionally, if prctl(2) is allowed by the attached filter,
+	additional filters may be layered on which will increase evaluation
+	time, but allow for further decreasing the attack surface during
+	execution of a process.
+
+The above call returns 0 on success and non-zero on error.
+
+Return values
+-------------
+A seccomp filter may return any of the following values. If multiple
+filters exist, the return value for the evaluation of a given system
+call will always use the highest precedent value. (For example,
+SECCOMP_RET_KILL will always take precedence.)
+
+In precedence order, they are:
+
+SECCOMP_RET_KILL:
+	Results in the task exiting immediately without executing the
+	system call.  The exit status of the task (status & 0x7f) will
+	be SIGSYS, not SIGKILL.
+
+SECCOMP_RET_TRAP:
+	Results in the kernel sending a SIGSYS signal to the triggering
+	task without executing the system call.  The kernel will
+	rollback the register state to just before the system call
+	entry such that a signal handler in the task will be able to
+	inspect the ucontext_t->uc_mcontext registers and emulate
+	system call success or failure upon return from the signal
+	handler.
+
+	The SECCOMP_RET_DATA portion of the return value will be passed
+	as si_errno.
+
+	SIGSYS triggered by seccomp will have a si_code of SYS_SECCOMP.
+
+SECCOMP_RET_ERRNO:
+	Results in the lower 16-bits of the return value being passed
+	to userland as the errno without executing the system call.
+
+SECCOMP_RET_TRACE:
+	When returned, this value will cause the kernel to attempt to
+	notify a ptrace()-based tracer prior to executing the system
+	call.  If there is no tracer present, -ENOSYS is returned to
+	userland and the system call is not executed.
+
+	A tracer will be notified if it requests PTRACE_O_TRACESECCOMP
+	using ptrace(PTRACE_SETOPTIONS).  The tracer will be notified
+	of a PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA portion of
+	the BPF program return value will be available to the tracer
+	via PTRACE_GETEVENTMSG.
+
+SECCOMP_RET_ALLOW:
+	Results in the system call being executed.
+
+If multiple filters exist, the return value for the evaluation of a
+given system call will always use the highest precedent value.
+
+Precedence is only determined using the SECCOMP_RET_ACTION mask.  When
+multiple filters return values of the same precedence, only the
+SECCOMP_RET_DATA from the most recently installed filter will be
+returned.
+
+Pitfalls
+--------
+
+The biggest pitfall to avoid during use is filtering on system call
+number without checking the architecture value.  Why?  On any
+architecture that supports multiple system call invocation conventions,
+the system call numbers may vary based on the specific invocation.  If
+the numbers in the different calling conventions overlap, then checks in
+the filters may be abused.  Always check the arch value!
+
+Example
+-------
+
+The samples/seccomp/ directory contains both an x86-specific example
+and a more generic example of a higher level macro interface for BPF
+program generation.
+
+
+
+Adding architecture support
+-----------------------
+
+See arch/Kconfig for the authoritative requirements.  In general, if an
+architecture supports both ptrace_event and seccomp, it will be able to
+support seccomp filter with minor fixup: SIGSYS support and seccomp return
+value checking.  Then it must just add CONFIG_HAVE_ARCH_SECCOMP_FILTER
+to its arch-specific Kconfig.
diff --git a/samples/Makefile b/samples/Makefile
index 2f75851ec62..5ef08bba96c 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -1,4 +1,4 @@
 # Makefile for Linux samples code
 
 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ tracepoints/ trace_events/ \
-			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/
+			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/
diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
new file mode 100644
index 00000000000..e8fe0f57b68
--- /dev/null
+++ b/samples/seccomp/Makefile
@@ -0,0 +1,38 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-$(CONFIG_SECCOMP) := bpf-fancy dropper
+bpf-fancy-objs := bpf-fancy.o bpf-helper.o
+
+HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
+HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
+HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include
+HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include
+
+HOSTCFLAGS_dropper.o += -I$(objtree)/usr/include
+HOSTCFLAGS_dropper.o += -idirafter $(objtree)/include
+dropper-objs := dropper.o
+
+# bpf-direct.c is x86-only.
+ifeq ($(SRCARCH),x86)
+# List of programs to build
+hostprogs-$(CONFIG_SECCOMP) += bpf-direct
+bpf-direct-objs := bpf-direct.o
+endif
+
+HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
+HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
+
+# Try to match the kernel target.
+ifeq ($(CONFIG_64BIT),)
+HOSTCFLAGS_bpf-direct.o += -m32
+HOSTCFLAGS_dropper.o += -m32
+HOSTCFLAGS_bpf-helper.o += -m32
+HOSTCFLAGS_bpf-fancy.o += -m32
+HOSTLOADLIBES_bpf-direct += -m32
+HOSTLOADLIBES_bpf-fancy += -m32
+HOSTLOADLIBES_dropper += -m32
+endif
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c
new file mode 100644
index 00000000000..26f523e6ed7
--- /dev/null
+++ b/samples/seccomp/bpf-direct.c
@@ -0,0 +1,176 @@
+/*
+ * Seccomp filter example for x86 (32-bit and 64-bit) with BPF macros
+ *
+ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * Author: Will Drewry <wad@chromium.org>
+ *
+ * The code may be used by anyone for any purpose,
+ * and can serve as a starting point for developing
+ * applications using prctl(PR_SET_SECCOMP, 2, ...).
+ */
+#define __USE_GNU 1
+#define _GNU_SOURCE 1
+
+#include <linux/types.h>
+#include <linux/filter.h>
+#include <linux/seccomp.h>
+#include <linux/unistd.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stddef.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+
+#define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
+#define syscall_nr (offsetof(struct seccomp_data, nr))
+
+#if defined(__i386__)
+#define REG_RESULT	REG_EAX
+#define REG_SYSCALL	REG_EAX
+#define REG_ARG0	REG_EBX
+#define REG_ARG1	REG_ECX
+#define REG_ARG2	REG_EDX
+#define REG_ARG3	REG_ESI
+#define REG_ARG4	REG_EDI
+#define REG_ARG5	REG_EBP
+#elif defined(__x86_64__)
+#define REG_RESULT	REG_RAX
+#define REG_SYSCALL	REG_RAX
+#define REG_ARG0	REG_RDI
+#define REG_ARG1	REG_RSI
+#define REG_ARG2	REG_RDX
+#define REG_ARG3	REG_R10
+#define REG_ARG4	REG_R8
+#define REG_ARG5	REG_R9
+#else
+#error Unsupported platform
+#endif
+
+#ifndef PR_SET_NO_NEW_PRIVS
+#define PR_SET_NO_NEW_PRIVS 38
+#endif
+
+#ifndef SYS_SECCOMP
+#define SYS_SECCOMP 1
+#endif
+
+static void emulator(int nr, siginfo_t *info, void *void_context)
+{
+	ucontext_t *ctx = (ucontext_t *)(void_context);
+	int syscall;
+	char *buf;
+	ssize_t bytes;
+	size_t len;
+	if (info->si_code != SYS_SECCOMP)
+		return;
+	if (!ctx)
+		return;
+	syscall = ctx->uc_mcontext.gregs[REG_SYSCALL];
+	buf = (char *) ctx->uc_mcontext.gregs[REG_ARG1];
+	len = (size_t) ctx->uc_mcontext.gregs[REG_ARG2];
+
+	if (syscall != __NR_write)
+		return;
+	if (ctx->uc_mcontext.gregs[REG_ARG0] != STDERR_FILENO)
+		return;
+	/* Redirect stderr messages to stdout. Doesn't handle EINTR, etc */
+	ctx->uc_mcontext.gregs[REG_RESULT] = -1;
+	if (write(STDOUT_FILENO, "[ERR] ", 6) > 0) {
+		bytes = write(STDOUT_FILENO, buf, len);
+		ctx->uc_mcontext.gregs[REG_RESULT] = bytes;
+	}
+	return;
+}
+
+static int install_emulator(void)
+{
+	struct sigaction act;
+	sigset_t mask;
+	memset(&act, 0, sizeof(act));
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGSYS);
+
+	act.sa_sigaction = &emulator;
+	act.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGSYS, &act, NULL) < 0) {
+		perror("sigaction");
+		return -1;
+	}
+	if (sigprocmask(SIG_UNBLOCK, &mask, NULL)) {
+		perror("sigprocmask");
+		return -1;
+	}
+	return 0;
+}
+
+static int install_filter(void)
+{
+	struct sock_filter filter[] = {
+		/* Grab the system call number */
+		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr),
+		/* Jump table for the allowed syscalls */
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 0, 1),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+#ifdef __NR_sigreturn
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 0, 1),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+#endif
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 0, 1),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 0, 1),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 3, 2),
+
+		/* Check that read is only using stdin. */
+		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 4, 0),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
+
+		/* Check that write is only using stdout */
+		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0),
+		/* Trap attempts to write to stderr */
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 1, 2),
+
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_TRAP),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+		.filter = filter,
+	};
+
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("prctl(NO_NEW_PRIVS)");
+		return 1;
+	}
+
+
+	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
+		perror("prctl");
+		return 1;
+	}
+	return 0;
+}
+
+#define payload(_c) (_c), sizeof((_c))
+int main(int argc, char **argv)
+{
+	char buf[4096];
+	ssize_t bytes = 0;
+	if (install_emulator())
+		return 1;
+	if (install_filter())
+		return 1;
+	syscall(__NR_write, STDOUT_FILENO,
+		payload("OHAI! WHAT IS YOUR NAME? "));
+	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
+	syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
+	syscall(__NR_write, STDOUT_FILENO, buf, bytes);
+	syscall(__NR_write, STDERR_FILENO,
+		payload("Error message going to STDERR\n"));
+	return 0;
+}
diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c
new file mode 100644
index 00000000000..8eb483aaec4
--- /dev/null
+++ b/samples/seccomp/bpf-fancy.c
@@ -0,0 +1,102 @@
+/*
+ * Seccomp BPF example using a macro-based generator.
+ *
+ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * Author: Will Drewry <wad@chromium.org>
+ *
+ * The code may be used by anyone for any purpose,
+ * and can serve as a starting point for developing
+ * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
+ */
+
+#include <linux/filter.h>
+#include <linux/seccomp.h>
+#include <linux/unistd.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+
+#include "bpf-helper.h"
+
+#ifndef PR_SET_NO_NEW_PRIVS
+#define PR_SET_NO_NEW_PRIVS 38
+#endif
+
+int main(int argc, char **argv)
+{
+	struct bpf_labels l;
+	static const char msg1[] = "Please type something: ";
+	static const char msg2[] = "You typed: ";
+	char buf[256];
+	struct sock_filter filter[] = {
+		/* TODO: LOAD_SYSCALL_NR(arch) and enforce an arch */
+		LOAD_SYSCALL_NR,
+		SYSCALL(__NR_exit, ALLOW),
+		SYSCALL(__NR_exit_group, ALLOW),
+		SYSCALL(__NR_write, JUMP(&l, write_fd)),
+		SYSCALL(__NR_read, JUMP(&l, read)),
+		DENY,  /* Don't passthrough into a label */
+
+		LABEL(&l, read),
+		ARG(0),
+		JNE(STDIN_FILENO, DENY),
+		ARG(1),
+		JNE((unsigned long)buf, DENY),
+		ARG(2),
+		JGE(sizeof(buf), DENY),
+		ALLOW,
+
+		LABEL(&l, write_fd),
+		ARG(0),
+		JEQ(STDOUT_FILENO, JUMP(&l, write_buf)),
+		JEQ(STDERR_FILENO, JUMP(&l, write_buf)),
+		DENY,
+
+		LABEL(&l, write_buf),
+		ARG(1),
+		JEQ((unsigned long)msg1, JUMP(&l, msg1_len)),
+		JEQ((unsigned long)msg2, JUMP(&l, msg2_len)),
+		JEQ((unsigned long)buf, JUMP(&l, buf_len)),
+		DENY,
+
+		LABEL(&l, msg1_len),
+		ARG(2),
+		JLT(sizeof(msg1), ALLOW),
+		DENY,
+
+		LABEL(&l, msg2_len),
+		ARG(2),
+		JLT(sizeof(msg2), ALLOW),
+		DENY,
+
+		LABEL(&l, buf_len),
+		ARG(2),
+		JLT(sizeof(buf), ALLOW),
+		DENY,
+	};
+	struct sock_fprog prog = {
+		.filter = filter,
+		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+	};
+	ssize_t bytes;
+	bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter));
+
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("prctl(NO_NEW_PRIVS)");
+		return 1;
+	}
+
+	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
+		perror("prctl(SECCOMP)");
+		return 1;
+	}
+	syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1));
+	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1);
+	bytes = (bytes > 0 ? bytes : 0);
+	syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2));
+	syscall(__NR_write, STDERR_FILENO, buf, bytes);
+	/* Now get killed */
+	syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2);
+	return 0;
+}
diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c
new file mode 100644
index 00000000000..579cfe33188
--- /dev/null
+++ b/samples/seccomp/bpf-helper.c
@@ -0,0 +1,89 @@
+/*
+ * Seccomp BPF helper functions
+ *
+ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * Author: Will Drewry <wad@chromium.org>
+ *
+ * The code may be used by anyone for any purpose,
+ * and can serve as a starting point for developing
+ * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
+ */
+
+#include <stdio.h>
+#include <string.h>
+
+#include "bpf-helper.h"
+
+int bpf_resolve_jumps(struct bpf_labels *labels,
+		      struct sock_filter *filter, size_t count)
+{
+	struct sock_filter *begin = filter;
+	__u8 insn = count - 1;
+
+	if (count < 1)
+		return -1;
+	/*
+	* Walk it once, backwards, to build the label table and do fixups.
+	* Since backward jumps are disallowed by BPF, this is easy.
+	*/
+	filter += insn;
+	for (; filter >= begin; --insn, --filter) {
+		if (filter->code != (BPF_JMP+BPF_JA))
+			continue;
+		switch ((filter->jt<<8)|filter->jf) {
+		case (JUMP_JT<<8)|JUMP_JF:
+			if (labels->labels[filter->k].location == 0xffffffff) {
+				fprintf(stderr, "Unresolved label: '%s'\n",
+					labels->labels[filter->k].label);
+				return 1;
+			}
+			filter->k = labels->labels[filter->k].location -
+				    (insn + 1);
+			filter->jt = 0;
+			filter->jf = 0;
+			continue;
+		case (LABEL_JT<<8)|LABEL_JF:
+			if (labels->labels[filter->k].location != 0xffffffff) {
+				fprintf(stderr, "Duplicate label use: '%s'\n",
+					labels->labels[filter->k].label);
+				return 1;
+			}
+			labels->labels[filter->k].location = insn;
+			filter->k = 0; /* fall through */
+			filter->jt = 0;
+			filter->jf = 0;
+			continue;
+		}
+	}
+	return 0;
+}
+
+/* Simple lookup table for labels. */
+__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label)
+{
+	struct __bpf_label *begin = labels->labels, *end;
+	int id;
+	if (labels->count == 0) {
+		begin->label = label;
+		begin->location = 0xffffffff;
+		labels->count++;
+		return 0;
+	}
+	end = begin + labels->count;
+	for (id = 0; begin < end; ++begin, ++id) {
+		if (!strcmp(label, begin->label))
+			return id;
+	}
+	begin->label = label;
+	begin->location = 0xffffffff;
+	labels->count++;
+	return id;
+}
+
+void seccomp_bpf_print(struct sock_filter *filter, size_t count)
+{
+	struct sock_filter *end = filter + count;
+	for ( ; filter < end; ++filter)
+		printf("{ code=%u,jt=%u,jf=%u,k=%u },\n",
+			filter->code, filter->jt, filter->jf, filter->k);
+}
diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h
new file mode 100644
index 00000000000..643279dd30f
--- /dev/null
+++ b/samples/seccomp/bpf-helper.h
@@ -0,0 +1,238 @@
+/*
+ * Example wrapper around BPF macros.
+ *
+ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * Author: Will Drewry <wad@chromium.org>
+ *
+ * The code may be used by anyone for any purpose,
+ * and can serve as a starting point for developing
+ * applications using prctl(PR_SET_SECCOMP, 2, ...).
+ *
+ * No guarantees are provided with respect to the correctness
+ * or functionality of this code.
+ */
+#ifndef __BPF_HELPER_H__
+#define __BPF_HELPER_H__
+
+#include <asm/bitsperlong.h>	/* for __BITS_PER_LONG */
+#include <endian.h>
+#include <linux/filter.h>
+#include <linux/seccomp.h>	/* for seccomp_data */
+#include <linux/types.h>
+#include <linux/unistd.h>
+#include <stddef.h>
+
+#define BPF_LABELS_MAX 256
+struct bpf_labels {
+	int count;
+	struct __bpf_label {
+		const char *label;
+		__u32 location;
+	} labels[BPF_LABELS_MAX];
+};
+
+int bpf_resolve_jumps(struct bpf_labels *labels,
+		      struct sock_filter *filter, size_t count);
+__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label);
+void seccomp_bpf_print(struct sock_filter *filter, size_t count);
+
+#define JUMP_JT 0xff
+#define JUMP_JF 0xff
+#define LABEL_JT 0xfe
+#define LABEL_JF 0xfe
+
+#define ALLOW \
+	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)
+#define DENY \
+	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)
+#define JUMP(labels, label) \
+	BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
+		 JUMP_JT, JUMP_JF)
+#define LABEL(labels, label) \
+	BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
+		 LABEL_JT, LABEL_JF)
+#define SYSCALL(nr, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \
+	jt
+
+/* Lame, but just an example */
+#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label)
+
+#define EXPAND(...) __VA_ARGS__
+/* Map all width-sensitive operations */
+#if __BITS_PER_LONG == 32
+
+#define JEQ(x, jt) JEQ32(x, EXPAND(jt))
+#define JNE(x, jt) JNE32(x, EXPAND(jt))
+#define JGT(x, jt) JGT32(x, EXPAND(jt))
+#define JLT(x, jt) JLT32(x, EXPAND(jt))
+#define JGE(x, jt) JGE32(x, EXPAND(jt))
+#define JLE(x, jt) JLE32(x, EXPAND(jt))
+#define JA(x, jt) JA32(x, EXPAND(jt))
+#define ARG(i) ARG_32(i)
+#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
+
+#elif __BITS_PER_LONG == 64
+
+/* Ensure that we load the logically correct offset. */
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define ENDIAN(_lo, _hi) _lo, _hi
+#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
+#define HI_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) + sizeof(__u32)
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define ENDIAN(_lo, _hi) _hi, _lo
+#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) + sizeof(__u32)
+#define HI_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
+#else
+#error "Unknown endianness"
+#endif
+
+union arg64 {
+	struct {
+		__u32 ENDIAN(lo32, hi32);
+	};
+	__u64 u64;
+};
+
+#define JEQ(x, jt) \
+	JEQ64(((union arg64){.u64 = (x)}).lo32, \
+	      ((union arg64){.u64 = (x)}).hi32, \
+	      EXPAND(jt))
+#define JGT(x, jt) \
+	JGT64(((union arg64){.u64 = (x)}).lo32, \
+	      ((union arg64){.u64 = (x)}).hi32, \
+	      EXPAND(jt))
+#define JGE(x, jt) \
+	JGE64(((union arg64){.u64 = (x)}).lo32, \
+	      ((union arg64){.u64 = (x)}).hi32, \
+	      EXPAND(jt))
+#define JNE(x, jt) \
+	JNE64(((union arg64){.u64 = (x)}).lo32, \
+	      ((union arg64){.u64 = (x)}).hi32, \
+	      EXPAND(jt))
+#define JLT(x, jt) \
+	JLT64(((union arg64){.u64 = (x)}).lo32, \
+	      ((union arg64){.u64 = (x)}).hi32, \
+	      EXPAND(jt))
+#define JLE(x, jt) \
+	JLE64(((union arg64){.u64 = (x)}).lo32, \
+	      ((union arg64){.u64 = (x)}).hi32, \
+	      EXPAND(jt))
+
+#define JA(x, jt) \
+	JA64(((union arg64){.u64 = (x)}).lo32, \
+	       ((union arg64){.u64 = (x)}).hi32, \
+	       EXPAND(jt))
+#define ARG(i) ARG_64(i)
+
+#else
+#error __BITS_PER_LONG value unusable.
+#endif
+
+/* Loads the arg into A */
+#define ARG_32(idx) \
+	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, LO_ARG(idx))
+
+/* Loads hi into A and lo in X */
+#define ARG_64(idx) \
+	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, LO_ARG(idx)), \
+	BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \
+	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, HI_ARG(idx)), \
+	BPF_STMT(BPF_ST, 1) /* hi -> M[1] */
+
+#define JEQ32(value, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \
+	jt
+
+#define JNE32(value, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \
+	jt
+
+/* Checks the lo, then swaps to check the hi. A=lo,X=hi */
+#define JEQ64(lo, hi, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
+	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \
+	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
+	jt, \
+	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
+
+#define JNE64(lo, hi, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \
+	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \
+	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
+	jt, \
+	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
+
+#define JA32(value, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \
+	jt
+
+#define JA64(lo, hi, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \
+	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
+	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \
+	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
+	jt, \
+	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
+
+#define JGE32(value, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \
+	jt
+
+#define JLT32(value, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \
+	jt
+
+/* Shortcut checking if hi > arg.hi. */
+#define JGE64(lo, hi, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
+	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
+	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \
+	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
+	jt, \
+	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
+
+#define JLT64(lo, hi, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
+	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
+	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
+	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
+	jt, \
+	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
+
+#define JGT32(value, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
+	jt
+
+#define JLE32(value, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 1, 0), \
+	jt
+
+/* Check hi > args.hi first, then do the GE checking */
+#define JGT64(lo, hi, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
+	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
+	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 0, 2), \
+	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
+	jt, \
+	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
+
+#define JLE64(lo, hi, jt) \
+	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 6, 0), \
+	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), \
+	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
+	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
+	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
+	jt, \
+	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
+
+#define LOAD_SYSCALL_NR \
+	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
+		 offsetof(struct seccomp_data, nr))
+
+#endif  /* __BPF_HELPER_H__ */
diff --git a/samples/seccomp/dropper.c b/samples/seccomp/dropper.c
new file mode 100644
index 00000000000..c69c347c701
--- /dev/null
+++ b/samples/seccomp/dropper.c
@@ -0,0 +1,68 @@
+/*
+ * Naive system call dropper built on seccomp_filter.
+ *
+ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * Author: Will Drewry <wad@chromium.org>
+ *
+ * The code may be used by anyone for any purpose,
+ * and can serve as a starting point for developing
+ * applications using prctl(PR_SET_SECCOMP, 2, ...).
+ *
+ * When run, returns the specified errno for the specified
+ * system call number against the given architecture.
+ *
+ * Run this one as root as PR_SET_NO_NEW_PRIVS is not called.
+ */
+
+#include <errno.h>
+#include <linux/audit.h>
+#include <linux/filter.h>
+#include <linux/seccomp.h>
+#include <linux/unistd.h>
+#include <stdio.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+
+static int install_filter(int nr, int arch, int error)
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
+			 (offsetof(struct seccomp_data, arch))),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, arch, 0, 3),
+		BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
+			 (offsetof(struct seccomp_data, nr))),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1),
+		BPF_STMT(BPF_RET+BPF_K,
+			 SECCOMP_RET_ERRNO|(error & SECCOMP_RET_DATA)),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+		.filter = filter,
+	};
+	if (prctl(PR_SET_SECCOMP, 2, &prog)) {
+		perror("prctl");
+		return 1;
+	}
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	if (argc < 5) {
+		fprintf(stderr, "Usage:\n"
+			"dropper <syscall_nr> <arch> <errno> <prog> [<args>]\n"
+			"Hint:	AUDIT_ARCH_I386: 0x%X\n"
+			"	AUDIT_ARCH_X86_64: 0x%X\n"
+			"\n", AUDIT_ARCH_I386, AUDIT_ARCH_X86_64);
+		return 1;
+	}
+	if (install_filter(strtol(argv[1], NULL, 0), strtol(argv[2], NULL, 0),
+			   strtol(argv[3], NULL, 0)))
+		return 1;
+	execv(argv[4], &argv[4]);
+	printf("Failed to execv\n");
+	return 255;
+}
-- 
cgit v1.2.3


From b1fa650c7e6e81ca788fef52b1659295eb82ffdd Mon Sep 17 00:00:00 2001
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 17 Apr 2012 12:08:48 +1000
Subject: seccomp: use a static inline for a function stub

Fixes this error message when CONFIG_SECCOMP is not set:

arch/powerpc/kernel/ptrace.c: In function 'do_syscall_trace_enter':
arch/powerpc/kernel/ptrace.c:1713:2: error: statement with no effect [-Werror=unused-value]

Signed-off-by: Stephen Rothwell <sfr@ozlabs.au.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 include/linux/seccomp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 5818e869651..60f2b350ead 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -90,7 +90,7 @@ static inline int seccomp_mode(struct seccomp *s)
 struct seccomp { };
 struct seccomp_filter { };
 
-#define secure_computing(x) 0
+static inline int secure_computing(int this_syscall) { return 0; }
 
 static inline long prctl_get_seccomp(void)
 {
-- 
cgit v1.2.3


From e4da89d02f369450996cfd04f64b1cce4d8afaea Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Tue, 17 Apr 2012 14:48:57 -0500
Subject: seccomp: ignore secure_computing return values

This change is inspired by
  https://lkml.org/lkml/2012/4/16/14
which fixes the build warnings for arches that don't support
CONFIG_HAVE_ARCH_SECCOMP_FILTER.

In particular, there is no requirement for the return value of
secure_computing() to be checked unless the architecture supports
seccomp filter.  Instead of silencing the warnings with (void)
a new static inline is added to encode the expected behavior
in a compiler and human friendly way.

v2: - cleans things up with a static inline
    - removes sfr's signed-off-by since it is a different approach
v1: - matches sfr's original change

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 arch/microblaze/kernel/ptrace.c | 2 +-
 arch/mips/kernel/ptrace.c       | 2 +-
 arch/powerpc/kernel/ptrace.c    | 2 +-
 arch/s390/kernel/ptrace.c       | 2 +-
 arch/sh/kernel/ptrace_32.c      | 2 +-
 arch/sh/kernel/ptrace_64.c      | 2 +-
 arch/sparc/kernel/ptrace_64.c   | 2 +-
 include/linux/seccomp.h         | 7 +++++++
 8 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/microblaze/kernel/ptrace.c b/arch/microblaze/kernel/ptrace.c
index 6eb2aa927d8..ab1b9db661f 100644
--- a/arch/microblaze/kernel/ptrace.c
+++ b/arch/microblaze/kernel/ptrace.c
@@ -136,7 +136,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing(regs->r12);
+	secure_computing_strict(regs->r12);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index 7c24c2973c6..4812c6d916e 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -535,7 +535,7 @@ static inline int audit_arch(void)
 asmlinkage void syscall_trace_enter(struct pt_regs *regs)
 {
 	/* do the secure computing check first */
-	secure_computing(regs->regs[2]);
+	secure_computing_strict(regs->regs[2]);
 
 	if (!(current->ptrace & PT_PTRACED))
 		goto out;
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 8d8e028893b..dd5e214cdf2 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1710,7 +1710,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing(regs->gpr[0]);
+	secure_computing_strict(regs->gpr[0]);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c
index 02f300fbf07..4993e689b2c 100644
--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -719,7 +719,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 	long ret = 0;
 
 	/* Do the secure computing check first. */
-	secure_computing(regs->gprs[2]);
+	secure_computing_strict(regs->gprs[2]);
 
 	/*
 	 * The sysc_tracesys code in entry.S stored the system
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 9698671444e..81f999a672f 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -503,7 +503,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing(regs->regs[0]);
+	secure_computing_strict(regs->regs[0]);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index bc81e07dc09..af90339dadc 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -522,7 +522,7 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long long ret = 0;
 
-	secure_computing(regs->regs[9]);
+	secure_computing_strict(regs->regs[9]);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c
index 6f97c076799..484dabac704 100644
--- a/arch/sparc/kernel/ptrace_64.c
+++ b/arch/sparc/kernel/ptrace_64.c
@@ -1062,7 +1062,7 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 	int ret = 0;
 
 	/* do the secure computing check first */
-	secure_computing(regs->u_regs[UREG_G1]);
+	secure_computing_strict(regs->u_regs[UREG_G1]);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		ret = tracehook_report_syscall_entry(regs);
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 60f2b350ead..84f6320da50 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -75,6 +75,12 @@ static inline int secure_computing(int this_syscall)
 	return 0;
 }
 
+/* A wrapper for architectures supporting only SECCOMP_MODE_STRICT. */
+static inline void secure_computing_strict(int this_syscall)
+{
+	BUG_ON(secure_computing(this_syscall) != 0);
+}
+
 extern long prctl_get_seccomp(void);
 extern long prctl_set_seccomp(unsigned long, char __user *);
 
@@ -91,6 +97,7 @@ struct seccomp { };
 struct seccomp_filter { };
 
 static inline int secure_computing(int this_syscall) { return 0; }
+static inline void secure_computing_strict(int this_syscall) { return; }
 
 static inline long prctl_get_seccomp(void)
 {
-- 
cgit v1.2.3


From 8156b451f37898d3c3652b4e988a4d62ae16eaac Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Tue, 17 Apr 2012 14:48:58 -0500
Subject: seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTER

If both audit and seccomp filter support are disabled, 'ret' is marked
as unused.

If just seccomp filter support is disabled, data and skip are considered
unused.

This change fixes those build warnings.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 kernel/seccomp.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index d9db6ec46bc..ee376beedaf 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -377,8 +377,7 @@ int __secure_computing(int this_syscall)
 	int mode = current->seccomp.mode;
 	int exit_sig = 0;
 	int *syscall;
-	u32 ret = SECCOMP_RET_KILL;
-	int data;
+	u32 ret;
 
 	switch (mode) {
 	case SECCOMP_MODE_STRICT:
@@ -392,12 +391,15 @@ int __secure_computing(int this_syscall)
 				return 0;
 		} while (*++syscall);
 		exit_sig = SIGKILL;
+		ret = SECCOMP_RET_KILL;
 		break;
 #ifdef CONFIG_SECCOMP_FILTER
-	case SECCOMP_MODE_FILTER:
+	case SECCOMP_MODE_FILTER: {
+		int data;
 		ret = seccomp_run_filters(this_syscall);
 		data = ret & SECCOMP_RET_DATA;
-		switch (ret & SECCOMP_RET_ACTION) {
+		ret &= SECCOMP_RET_ACTION;
+		switch (ret) {
 		case SECCOMP_RET_ERRNO:
 			/* Set the low-order 16-bits as a errno. */
 			syscall_set_return_value(current, task_pt_regs(current),
@@ -432,6 +434,7 @@ int __secure_computing(int this_syscall)
 		}
 		exit_sig = SIGSYS;
 		break;
+	}
 #endif
 	default:
 		BUG();
@@ -442,8 +445,10 @@ int __secure_computing(int this_syscall)
 #endif
 	audit_seccomp(this_syscall, exit_sig, ret);
 	do_exit(exit_sig);
+#ifdef CONFIG_SECCOMP_FILTER
 skip:
 	audit_seccomp(this_syscall, exit_sig, ret);
+#endif
 	return -1;
 }
 
-- 
cgit v1.2.3


From 389da25f93eea8ff64181ae7e3e87da68acaef2e Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Mon, 16 Apr 2012 11:56:45 -0700
Subject: Yama: add additional ptrace scopes

This expands the available Yama ptrace restrictions to include two more
modes. Mode 2 requires CAP_SYS_PTRACE for PTRACE_ATTACH, and mode 3
completely disables PTRACE_ATTACH (and locks the sysctl).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 Documentation/security/Yama.txt | 10 ++++++-
 security/yama/yama_lsm.c        | 62 +++++++++++++++++++++++++++++++++--------
 2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/Documentation/security/Yama.txt b/Documentation/security/Yama.txt
index a9511f17906..e369de2d48c 100644
--- a/Documentation/security/Yama.txt
+++ b/Documentation/security/Yama.txt
@@ -34,7 +34,7 @@ parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still
 work), or with CAP_SYS_PTRACE (i.e. "gdb --pid=PID", and "strace -p PID"
 still work as root).
 
-For software that has defined application-specific relationships
+In mode 1, software that has defined application-specific relationships
 between a debugging process and its inferior (crash handlers, etc),
 prctl(PR_SET_PTRACER, pid, ...) can be used. An inferior can declare which
 other process (and its descendents) are allowed to call PTRACE_ATTACH
@@ -46,6 +46,8 @@ restrictions, it can call prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)
 so that any otherwise allowed process (even those in external pid namespaces)
 may attach.
 
+These restrictions do not change how ptrace via PTRACE_TRACEME operates.
+
 The sysctl settings are:
 
 0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other
@@ -60,6 +62,12 @@ The sysctl settings are:
     inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare
     an allowed debugger PID to call PTRACE_ATTACH on the inferior.
 
+2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace
+    with PTRACE_ATTACH.
+
+3 - no attach: no processes may use ptrace with PTRACE_ATTACH. Once set,
+    this sysctl cannot be changed to a lower value.
+
 The original children-only logic was based on the restrictions in grsecurity.
 
 ==============================================================
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 573723843a0..afb04cbb697 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -18,7 +18,12 @@
 #include <linux/prctl.h>
 #include <linux/ratelimit.h>
 
-static int ptrace_scope = 1;
+#define YAMA_SCOPE_DISABLED	0
+#define YAMA_SCOPE_RELATIONAL	1
+#define YAMA_SCOPE_CAPABILITY	2
+#define YAMA_SCOPE_NO_ATTACH	3
+
+static int ptrace_scope = YAMA_SCOPE_RELATIONAL;
 
 /* describe a ptrace relationship for potential exception */
 struct ptrace_relation {
@@ -251,17 +256,32 @@ static int yama_ptrace_access_check(struct task_struct *child,
 		return rc;
 
 	/* require ptrace target be a child of ptracer on attach */
-	if (mode == PTRACE_MODE_ATTACH &&
-	    ptrace_scope &&
-	    !task_is_descendant(current, child) &&
-	    !ptracer_exception_found(current, child) &&
-	    !capable(CAP_SYS_PTRACE))
-		rc = -EPERM;
+	if (mode == PTRACE_MODE_ATTACH) {
+		switch (ptrace_scope) {
+		case YAMA_SCOPE_DISABLED:
+			/* No additional restrictions. */
+			break;
+		case YAMA_SCOPE_RELATIONAL:
+			if (!task_is_descendant(current, child) &&
+			    !ptracer_exception_found(current, child) &&
+			    !capable(CAP_SYS_PTRACE))
+				rc = -EPERM;
+			break;
+		case YAMA_SCOPE_CAPABILITY:
+			if (!capable(CAP_SYS_PTRACE))
+				rc = -EPERM;
+			break;
+		case YAMA_SCOPE_NO_ATTACH:
+		default:
+			rc = -EPERM;
+			break;
+		}
+	}
 
 	if (rc) {
 		char name[sizeof(current->comm)];
-		printk_ratelimited(KERN_NOTICE "ptrace of non-child"
-			" pid %d was attempted by: %s (pid %d)\n",
+		printk_ratelimited(KERN_NOTICE
+			"ptrace of pid %d was attempted by: %s (pid %d)\n",
 			child->pid,
 			get_task_comm(name, current),
 			current->pid);
@@ -279,8 +299,28 @@ static struct security_operations yama_ops = {
 };
 
 #ifdef CONFIG_SYSCTL
+static int yama_dointvec_minmax(struct ctl_table *table, int write,
+				void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int rc;
+
+	if (write && !capable(CAP_SYS_PTRACE))
+		return -EPERM;
+
+	rc = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (rc)
+		return rc;
+
+	/* Lock the max value if it ever gets set. */
+	if (write && *(int *)table->data == *(int *)table->extra2)
+		table->extra1 = table->extra2;
+
+	return rc;
+}
+
 static int zero;
 static int one = 1;
+static int max_scope = YAMA_SCOPE_NO_ATTACH;
 
 struct ctl_path yama_sysctl_path[] = {
 	{ .procname = "kernel", },
@@ -294,9 +334,9 @@ static struct ctl_table yama_sysctl_table[] = {
 		.data           = &ptrace_scope,
 		.maxlen         = sizeof(int),
 		.mode           = 0644,
-		.proc_handler   = proc_dointvec_minmax,
+		.proc_handler   = yama_dointvec_minmax,
 		.extra1         = &zero,
-		.extra2         = &one,
+		.extra2         = &max_scope,
 	},
 	{ }
 };
-- 
cgit v1.2.3


From 561381a146a31ff91d7a2370c10871b02ac7343c Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Wed, 18 Apr 2012 19:50:25 -0500
Subject: samples/seccomp: fix dependencies on arch macros

This change fixes the compilation error triggered here for
i386 allmodconfig in linux-next:
  http://kisskb.ellerman.id.au/kisskb/buildresult/6123842/

Logic attempting to predict the host architecture has been
removed from the Makefile.  Instead, the bpf-direct sample
should now compile on any architecture, but if the architecture
is not supported, it will compile a minimal main() function.

This change also ensures the samples are not compiled when
there is no seccomp filter support.

(Note, I wasn't able to reproduce the error locally, but
 the existing approach was clearly flawed.  This tweak
 should resolve your issue and avoid other future weirdness.)

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 samples/seccomp/Makefile     | 12 +++---------
 samples/seccomp/bpf-direct.c | 18 ++++++++++++++++--
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
index e8fe0f57b68..16aa2d42498 100644
--- a/samples/seccomp/Makefile
+++ b/samples/seccomp/Makefile
@@ -1,27 +1,21 @@
 # kbuild trick to avoid linker error. Can be omitted if a module is built.
 obj- := dummy.o
 
-hostprogs-$(CONFIG_SECCOMP) := bpf-fancy dropper
-bpf-fancy-objs := bpf-fancy.o bpf-helper.o
+hostprogs-$(CONFIG_SECCOMP_FILTER) := bpf-fancy dropper bpf-direct
 
 HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
 HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
 HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include
 HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include
+bpf-fancy-objs := bpf-fancy.o bpf-helper.o
 
 HOSTCFLAGS_dropper.o += -I$(objtree)/usr/include
 HOSTCFLAGS_dropper.o += -idirafter $(objtree)/include
 dropper-objs := dropper.o
 
-# bpf-direct.c is x86-only.
-ifeq ($(SRCARCH),x86)
-# List of programs to build
-hostprogs-$(CONFIG_SECCOMP) += bpf-direct
-bpf-direct-objs := bpf-direct.o
-endif
-
 HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
 HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
+bpf-direct-objs := bpf-direct.o
 
 # Try to match the kernel target.
 ifeq ($(CONFIG_64BIT),)
diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c
index 26f523e6ed7..151ec3f5218 100644
--- a/samples/seccomp/bpf-direct.c
+++ b/samples/seccomp/bpf-direct.c
@@ -8,6 +8,11 @@
  * and can serve as a starting point for developing
  * applications using prctl(PR_SET_SECCOMP, 2, ...).
  */
+#if defined(__i386__) || defined(__x86_64__)
+#define SUPPORTED_ARCH 1
+#endif
+
+#if defined(SUPPORTED_ARCH)
 #define __USE_GNU 1
 #define _GNU_SOURCE 1
 
@@ -43,8 +48,6 @@
 #define REG_ARG3	REG_R10
 #define REG_ARG4	REG_R8
 #define REG_ARG5	REG_R9
-#else
-#error Unsupported platform
 #endif
 
 #ifndef PR_SET_NO_NEW_PRIVS
@@ -174,3 +177,14 @@ int main(int argc, char **argv)
 		payload("Error message going to STDERR\n"));
 	return 0;
 }
+#else	/* SUPPORTED_ARCH */
+/*
+ * This sample is x86-only.  Since kernel samples are compiled with the
+ * host toolchain, a non-x86 host will result in using only the main()
+ * below.
+ */
+int main(void)
+{
+	return 1;
+}
+#endif	/* SUPPORTED_ARCH */
-- 
cgit v1.2.3


From 08162e6a23d476544adfe1164afe9ea8b34ab859 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Fri, 20 Apr 2012 16:35:24 +0300
Subject: Yama: remove an unused variable

GCC complains that we don't use "one" any more after 389da25f93 "Yama:
add additional ptrace scopes".

security/yama/yama_lsm.c:322:12: warning: ?one? defined but not used
	[-Wunused-variable]

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 security/yama/yama_lsm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index afb04cbb697..c852f7472ad 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -319,7 +319,6 @@ static int yama_dointvec_minmax(struct ctl_table *table, int write,
 }
 
 static int zero;
-static int one = 1;
 static int max_scope = YAMA_SCOPE_NO_ATTACH;
 
 struct ctl_path yama_sysctl_path[] = {
-- 
cgit v1.2.3


From 45de6767dc51358a188f75dc4ad9dfddb7fb9480 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 11 May 2012 10:56:56 +0100
Subject: KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32
 compat

Use the 32-bit compat keyctl() syscall wrapper on Sparc64 for Sparc32 binary
compatibility.

Without this, keyctl(KEYCTL_INSTANTIATE_IOV) is liable to malfunction as it
uses an iovec array read from userspace - though the kernel should survive this
as it checks pointers and sizes anyway.

I think all the other keyctl() function should just work, provided (a) the top
32-bits of each 64-bit argument register are cleared prior to invoking the
syscall routine, and the 32-bit address space is right at the 0-end of the
64-bit address space.  Most of the arguments are 32-bit anyway, and so for
those clearing is not required.

Signed-off-by: David Howells <dhowells@redhat.com
cc: "David S. Miller" <davem@davemloft.net>
cc: sparclinux@vger.kernel.org
cc: stable@vger.kernel.org
---
 arch/sparc/Kconfig             | 3 +++
 arch/sparc/kernel/systbls_64.S | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 6c0683d3fcb..76c7ccfb1eb 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -584,6 +584,9 @@ config SYSVIPC_COMPAT
 	depends on COMPAT && SYSVIPC
 	default y
 
+config KEYS_COMPAT
+	def_bool y if COMPAT && KEYS
+
 endmenu
 
 source "net/Kconfig"
diff --git a/arch/sparc/kernel/systbls_64.S b/arch/sparc/kernel/systbls_64.S
index db86b1a0e9a..3a58e0d66f5 100644
--- a/arch/sparc/kernel/systbls_64.S
+++ b/arch/sparc/kernel/systbls_64.S
@@ -74,7 +74,7 @@ sys_call_table32:
 	.word sys_timer_delete, compat_sys_timer_create, sys_ni_syscall, compat_sys_io_setup, sys_io_destroy
 /*270*/	.word sys32_io_submit, sys_io_cancel, compat_sys_io_getevents, sys32_mq_open, sys_mq_unlink
 	.word compat_sys_mq_timedsend, compat_sys_mq_timedreceive, compat_sys_mq_notify, compat_sys_mq_getsetattr, compat_sys_waitid
-/*280*/	.word sys32_tee, sys_add_key, sys_request_key, sys_keyctl, compat_sys_openat
+/*280*/	.word sys32_tee, sys_add_key, sys_request_key, compat_sys_keyctl, compat_sys_openat
 	.word sys_mkdirat, sys_mknodat, sys_fchownat, compat_sys_futimesat, compat_sys_fstatat64
 /*290*/	.word sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 	.word sys_fchmodat, sys_faccessat, compat_sys_pselect6, compat_sys_ppoll, sys_unshare
-- 
cgit v1.2.3


From f0894940aed13b21f363a411c7ec57358827ad87 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 11 May 2012 10:56:56 +0100
Subject: KEYS: Move the key config into security/keys/Kconfig

Move the key config into security/keys/Kconfig as there are going to be a lot
of key-related options.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
---
 security/Kconfig      | 68 +-----------------------------------------------
 security/keys/Kconfig | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 67 deletions(-)
 create mode 100644 security/keys/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index ccc61f8006b..e9c6ac724fe 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -4,73 +4,7 @@
 
 menu "Security options"
 
-config KEYS
-	bool "Enable access key retention support"
-	help
-	  This option provides support for retaining authentication tokens and
-	  access keys in the kernel.
-
-	  It also includes provision of methods by which such keys might be
-	  associated with a process so that network filesystems, encryption
-	  support and the like can find them.
-
-	  Furthermore, a special type of key is available that acts as keyring:
-	  a searchable sequence of keys. Each process is equipped with access
-	  to five standard keyrings: UID-specific, GID-specific, session,
-	  process and thread.
-
-	  If you are unsure as to whether this is required, answer N.
-
-config TRUSTED_KEYS
-	tristate "TRUSTED KEYS"
-	depends on KEYS && TCG_TPM
-	select CRYPTO
-	select CRYPTO_HMAC
-	select CRYPTO_SHA1
-	help
-	  This option provides support for creating, sealing, and unsealing
-	  keys in the kernel. Trusted keys are random number symmetric keys,
-	  generated and RSA-sealed by the TPM. The TPM only unseals the keys,
-	  if the boot PCRs and other criteria match.  Userspace will only ever
-	  see encrypted blobs.
-
-	  If you are unsure as to whether this is required, answer N.
-
-config ENCRYPTED_KEYS
-	tristate "ENCRYPTED KEYS"
-	depends on KEYS
-	select CRYPTO
-	select CRYPTO_HMAC
-	select CRYPTO_AES
-	select CRYPTO_CBC
-	select CRYPTO_SHA256
-	select CRYPTO_RNG
-	help
-	  This option provides support for create/encrypting/decrypting keys
-	  in the kernel.  Encrypted keys are kernel generated random numbers,
-	  which are encrypted/decrypted with a 'master' symmetric key. The
-	  'master' key can be either a trusted-key or user-key type.
-	  Userspace only ever sees/stores encrypted blobs.
-
-	  If you are unsure as to whether this is required, answer N.
-
-config KEYS_DEBUG_PROC_KEYS
-	bool "Enable the /proc/keys file by which keys may be viewed"
-	depends on KEYS
-	help
-	  This option turns on support for the /proc/keys file - through which
-	  can be listed all the keys on the system that are viewable by the
-	  reading process.
-
-	  The only keys included in the list are those that grant View
-	  permission to the reading process whether or not it possesses them.
-	  Note that LSM security checks are still performed, and may further
-	  filter out keys that the current process is not authorised to view.
-
-	  Only key attributes are listed here; key payloads are not included in
-	  the resulting table.
-
-	  If you are unsure as to whether this is required, answer N.
+source security/keys/Kconfig
 
 config SECURITY_DMESG_RESTRICT
 	bool "Restrict unprivileged access to the kernel syslog"
diff --git a/security/keys/Kconfig b/security/keys/Kconfig
new file mode 100644
index 00000000000..a90d6d300db
--- /dev/null
+++ b/security/keys/Kconfig
@@ -0,0 +1,71 @@
+#
+# Key management configuration
+#
+
+config KEYS
+	bool "Enable access key retention support"
+	help
+	  This option provides support for retaining authentication tokens and
+	  access keys in the kernel.
+
+	  It also includes provision of methods by which such keys might be
+	  associated with a process so that network filesystems, encryption
+	  support and the like can find them.
+
+	  Furthermore, a special type of key is available that acts as keyring:
+	  a searchable sequence of keys. Each process is equipped with access
+	  to five standard keyrings: UID-specific, GID-specific, session,
+	  process and thread.
+
+	  If you are unsure as to whether this is required, answer N.
+
+config TRUSTED_KEYS
+	tristate "TRUSTED KEYS"
+	depends on KEYS && TCG_TPM
+	select CRYPTO
+	select CRYPTO_HMAC
+	select CRYPTO_SHA1
+	help
+	  This option provides support for creating, sealing, and unsealing
+	  keys in the kernel. Trusted keys are random number symmetric keys,
+	  generated and RSA-sealed by the TPM. The TPM only unseals the keys,
+	  if the boot PCRs and other criteria match.  Userspace will only ever
+	  see encrypted blobs.
+
+	  If you are unsure as to whether this is required, answer N.
+
+config ENCRYPTED_KEYS
+	tristate "ENCRYPTED KEYS"
+	depends on KEYS
+	select CRYPTO
+	select CRYPTO_HMAC
+	select CRYPTO_AES
+	select CRYPTO_CBC
+	select CRYPTO_SHA256
+	select CRYPTO_RNG
+	help
+	  This option provides support for create/encrypting/decrypting keys
+	  in the kernel.  Encrypted keys are kernel generated random numbers,
+	  which are encrypted/decrypted with a 'master' symmetric key. The
+	  'master' key can be either a trusted-key or user-key type.
+	  Userspace only ever sees/stores encrypted blobs.
+
+	  If you are unsure as to whether this is required, answer N.
+
+config KEYS_DEBUG_PROC_KEYS
+	bool "Enable the /proc/keys file by which keys may be viewed"
+	depends on KEYS
+	help
+	  This option turns on support for the /proc/keys file - through which
+	  can be listed all the keys on the system that are viewable by the
+	  reading process.
+
+	  The only keys included in the list are those that grant View
+	  permission to the reading process whether or not it possesses them.
+	  Note that LSM security checks are still performed, and may further
+	  filter out keys that the current process is not authorised to view.
+
+	  Only key attributes are listed here; key payloads are not included in
+	  the resulting table.
+
+	  If you are unsure as to whether this is required, answer N.
-- 
cgit v1.2.3


From 9f7ce8e249ab761c7ed753059cb16319ede41762 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 11 May 2012 10:56:56 +0100
Subject: KEYS: Reorganise keys Makefile

Reorganise the keys directory Makefile to put all the core bits together and
the type-specific bits after.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
---
 security/keys/Makefile | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/security/keys/Makefile b/security/keys/Makefile
index a56f1ffdc64..504aaa00838 100644
--- a/security/keys/Makefile
+++ b/security/keys/Makefile
@@ -2,6 +2,9 @@
 # Makefile for key management
 #
 
+#
+# Core
+#
 obj-y := \
 	gc.o \
 	key.o \
@@ -12,9 +15,12 @@ obj-y := \
 	request_key.o \
 	request_key_auth.o \
 	user_defined.o
-
-obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
-obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
 obj-$(CONFIG_KEYS_COMPAT) += compat.o
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_SYSCTL) += sysctl.o
+
+#
+# Key types
+#
+obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
+obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
-- 
cgit v1.2.3


From 1eb1bcf5bfad001128293b86d891c9d6f2f27333 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 11 May 2012 10:56:56 +0100
Subject: KEYS: Announce key type (un)registration

Announce the (un)registration of a key type in the core key code rather than
in the callers.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
---
 net/dns_resolver/dns_key.c | 5 -----
 security/keys/key.c        | 3 +++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/dns_resolver/dns_key.c b/net/dns_resolver/dns_key.c
index c73bba326d7..14b2c3d6e52 100644
--- a/net/dns_resolver/dns_key.c
+++ b/net/dns_resolver/dns_key.c
@@ -249,9 +249,6 @@ static int __init init_dns_resolver(void)
 	struct key *keyring;
 	int ret;
 
-	printk(KERN_NOTICE "Registering the %s key type\n",
-	       key_type_dns_resolver.name);
-
 	/* create an override credential set with a special thread keyring in
 	 * which DNS requests are cached
 	 *
@@ -301,8 +298,6 @@ static void __exit exit_dns_resolver(void)
 	key_revoke(dns_resolver_cache->thread_keyring);
 	unregister_key_type(&key_type_dns_resolver);
 	put_cred(dns_resolver_cache);
-	printk(KERN_NOTICE "Unregistered %s key type\n",
-	       key_type_dns_resolver.name);
 }
 
 module_init(init_dns_resolver)
diff --git a/security/keys/key.c b/security/keys/key.c
index 06783cffb3a..dc628941ecd 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -980,6 +980,8 @@ int register_key_type(struct key_type *ktype)
 
 	/* store the type */
 	list_add(&ktype->link, &key_types_list);
+
+	pr_notice("Key type %s registered\n", ktype->name);
 	ret = 0;
 
 out:
@@ -1002,6 +1004,7 @@ void unregister_key_type(struct key_type *ktype)
 	list_del_init(&ktype->link);
 	downgrade_write(&key_types_sem);
 	key_gc_keytype(ktype);
+	pr_notice("Key type %s unregistered\n", ktype->name);
 	up_read(&key_types_sem);
 }
 EXPORT_SYMBOL(unregister_key_type);
-- 
cgit v1.2.3


From 65d87fe68abf2fc226a9e96be61160f65d6b4680 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 11 May 2012 10:56:56 +0100
Subject: KEYS: Perform RCU synchronisation on keys prior to key destruction

Make the keys garbage collector invoke synchronize_rcu() prior to destroying
keys with a zero usage count.  This means that a key can be examined under the
RCU read lock in the safe knowledge that it won't get deallocated until after
the lock is released - even if its usage count becomes zero whilst we're
looking at it.

This is useful in keyring search vs key link.  Consider a keyring containing a
link to a key.  That link can be replaced in-place in the keyring without
requiring an RCU copy-and-replace on the keyring contents without breaking a
search underway on that keyring when the displaced key is released, provided
the key is actually destroyed only after the RCU read lock held by the search
algorithm is released.

This permits __key_link() to replace a key without having to reallocate the key
payload.  A key gets replaced if a new key being linked into a keyring has the
same type and description.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/key.h |  5 +++-
 security/keys/gc.c  | 73 ++++++++++++++++++++++++++++++++---------------------
 2 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/include/linux/key.h b/include/linux/key.h
index 96933b1e5d2..c505f83c969 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -124,7 +124,10 @@ static inline unsigned long is_key_possessed(const key_ref_t key_ref)
 struct key {
 	atomic_t		usage;		/* number of references */
 	key_serial_t		serial;		/* key serial number */
-	struct rb_node		serial_node;
+	union {
+		struct list_head graveyard_link;
+		struct rb_node	serial_node;
+	};
 	struct key_type		*type;		/* type of key */
 	struct rw_semaphore	sem;		/* change vs change sem */
 	struct key_user		*user;		/* owner of this key */
diff --git a/security/keys/gc.c b/security/keys/gc.c
index a42b45531aa..27610bf7219 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -168,38 +168,45 @@ do_gc:
 }
 
 /*
- * Garbage collect an unreferenced, detached key
+ * Garbage collect a list of unreferenced, detached keys
  */
-static noinline void key_gc_unused_key(struct key *key)
+static noinline void key_gc_unused_keys(struct list_head *keys)
 {
-	key_check(key);
-
-	security_key_free(key);
-
-	/* deal with the user's key tracking and quota */
-	if (test_bit(KEY_FLAG_IN_QUOTA, &key->flags)) {
-		spin_lock(&key->user->lock);
-		key->user->qnkeys--;
-		key->user->qnbytes -= key->quotalen;
-		spin_unlock(&key->user->lock);
-	}
+	while (!list_empty(keys)) {
+		struct key *key =
+			list_entry(keys->next, struct key, graveyard_link);
+		list_del(&key->graveyard_link);
+
+		kdebug("- %u", key->serial);
+		key_check(key);
+
+		security_key_free(key);
+
+		/* deal with the user's key tracking and quota */
+		if (test_bit(KEY_FLAG_IN_QUOTA, &key->flags)) {
+			spin_lock(&key->user->lock);
+			key->user->qnkeys--;
+			key->user->qnbytes -= key->quotalen;
+			spin_unlock(&key->user->lock);
+		}
 
-	atomic_dec(&key->user->nkeys);
-	if (test_bit(KEY_FLAG_INSTANTIATED, &key->flags))
-		atomic_dec(&key->user->nikeys);
+		atomic_dec(&key->user->nkeys);
+		if (test_bit(KEY_FLAG_INSTANTIATED, &key->flags))
+			atomic_dec(&key->user->nikeys);
 
-	key_user_put(key->user);
+		key_user_put(key->user);
 
-	/* now throw away the key memory */
-	if (key->type->destroy)
-		key->type->destroy(key);
+		/* now throw away the key memory */
+		if (key->type->destroy)
+			key->type->destroy(key);
 
-	kfree(key->description);
+		kfree(key->description);
 
 #ifdef KEY_DEBUGGING
-	key->magic = KEY_DEBUG_MAGIC_X;
+		key->magic = KEY_DEBUG_MAGIC_X;
 #endif
-	kmem_cache_free(key_jar, key);
+		kmem_cache_free(key_jar, key);
+	}
 }
 
 /*
@@ -211,6 +218,7 @@ static noinline void key_gc_unused_key(struct key *key)
  */
 static void key_garbage_collector(struct work_struct *work)
 {
+	static LIST_HEAD(graveyard);
 	static u8 gc_state;		/* Internal persistent state */
 #define KEY_GC_REAP_AGAIN	0x01	/* - Need another cycle */
 #define KEY_GC_REAPING_LINKS	0x02	/* - We need to reap links */
@@ -316,15 +324,22 @@ maybe_resched:
 		key_schedule_gc(new_timer);
 	}
 
-	if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2)) {
-		/* Make sure everyone revalidates their keys if we marked a
-		 * bunch as being dead and make sure all keyring ex-payloads
-		 * are destroyed.
+	if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2) ||
+	    !list_empty(&graveyard)) {
+		/* Make sure that all pending keyring payload destructions are
+		 * fulfilled and that people aren't now looking at dead or
+		 * dying keys that they don't have a reference upon or a link
+		 * to.
 		 */
-		kdebug("dead sync");
+		kdebug("gc sync");
 		synchronize_rcu();
 	}
 
+	if (!list_empty(&graveyard)) {
+		kdebug("gc keys");
+		key_gc_unused_keys(&graveyard);
+	}
+
 	if (unlikely(gc_state & (KEY_GC_REAPING_DEAD_1 |
 				 KEY_GC_REAPING_DEAD_2))) {
 		if (!(gc_state & KEY_GC_FOUND_DEAD_KEY)) {
@@ -359,7 +374,7 @@ found_unreferenced_key:
 	rb_erase(&key->serial_node, &key_serial_tree);
 	spin_unlock(&key_serial_lock);
 
-	key_gc_unused_key(key);
+	list_add_tail(&key->graveyard_link, &graveyard);
 	gc_state |= KEY_GC_REAP_AGAIN;
 	goto maybe_resched;
 
-- 
cgit v1.2.3


From 233e4735f2a45d9e641c2488b8d7afeb1f377dac Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 11 May 2012 10:56:56 +0100
Subject: KEYS: Permit in-place link replacement in keyring list

Make use of the previous patch that makes the garbage collector perform RCU
synchronisation before destroying defunct keys.  Key pointers can now be
replaced in-place without creating a new keyring payload and replacing the
whole thing as the discarded keys will not be destroyed until all currently
held RCU read locks are released.

If the keyring payload space needs to be expanded or contracted, then a
replacement will still need allocating, and the original will still have to be
freed by RCU.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 include/keys/keyring-type.h |  2 +-
 security/keys/gc.c          |  2 +-
 security/keys/keyring.c     | 95 ++++++++++++++++++++++++++-------------------
 3 files changed, 58 insertions(+), 41 deletions(-)

diff --git a/include/keys/keyring-type.h b/include/keys/keyring-type.h
index 843f872a4b6..cf49159b0e3 100644
--- a/include/keys/keyring-type.h
+++ b/include/keys/keyring-type.h
@@ -24,7 +24,7 @@ struct keyring_list {
 	unsigned short	maxkeys;	/* max keys this list can hold */
 	unsigned short	nkeys;		/* number of keys currently held */
 	unsigned short	delkey;		/* key to be unlinked by RCU */
-	struct key	*keys[0];
+	struct key __rcu *keys[0];
 };
 
 
diff --git a/security/keys/gc.c b/security/keys/gc.c
index 27610bf7219..adddaa258d5 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -148,7 +148,7 @@ static void key_gc_keyring(struct key *keyring, time_t limit)
 	loop = klist->nkeys;
 	smp_rmb();
 	for (loop--; loop >= 0; loop--) {
-		key = klist->keys[loop];
+		key = rcu_dereference(klist->keys[loop]);
 		if (test_bit(KEY_FLAG_DEAD, &key->flags) ||
 		    (key->expiry > 0 && key->expiry <= limit))
 			goto do_gc;
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index d605f75292e..459b3cc347f 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -25,6 +25,11 @@
 		(keyring)->payload.subscriptions,			\
 		rwsem_is_locked((struct rw_semaphore *)&(keyring)->sem)))
 
+#define rcu_deref_link_locked(klist, index, keyring)			\
+	(rcu_dereference_protected(					\
+		(klist)->keys[index],					\
+		rwsem_is_locked((struct rw_semaphore *)&(keyring)->sem)))
+
 #define KEY_LINK_FIXQUOTA 1UL
 
 /*
@@ -138,6 +143,11 @@ static int keyring_match(const struct key *keyring, const void *description)
 /*
  * Clean up a keyring when it is destroyed.  Unpublish its name if it had one
  * and dispose of its data.
+ *
+ * The garbage collector detects the final key_put(), removes the keyring from
+ * the serial number tree and then does RCU synchronisation before coming here,
+ * so we shouldn't need to worry about code poking around here with the RCU
+ * readlock held by this time.
  */
 static void keyring_destroy(struct key *keyring)
 {
@@ -154,11 +164,10 @@ static void keyring_destroy(struct key *keyring)
 		write_unlock(&keyring_name_lock);
 	}
 
-	klist = rcu_dereference_check(keyring->payload.subscriptions,
-				      atomic_read(&keyring->usage) == 0);
+	klist = rcu_access_pointer(keyring->payload.subscriptions);
 	if (klist) {
 		for (loop = klist->nkeys - 1; loop >= 0; loop--)
-			key_put(klist->keys[loop]);
+			key_put(rcu_access_pointer(klist->keys[loop]));
 		kfree(klist);
 	}
 }
@@ -214,7 +223,8 @@ static long keyring_read(const struct key *keyring,
 			ret = -EFAULT;
 
 			for (loop = 0; loop < klist->nkeys; loop++) {
-				key = klist->keys[loop];
+				key = rcu_deref_link_locked(klist, loop,
+							    keyring);
 
 				tmp = sizeof(key_serial_t);
 				if (tmp > buflen)
@@ -383,7 +393,7 @@ descend:
 	nkeys = keylist->nkeys;
 	smp_rmb();
 	for (kix = 0; kix < nkeys; kix++) {
-		key = keylist->keys[kix];
+		key = rcu_dereference(keylist->keys[kix]);
 		kflags = key->flags;
 
 		/* ignore keys not of this type */
@@ -426,7 +436,7 @@ ascend:
 	nkeys = keylist->nkeys;
 	smp_rmb();
 	for (; kix < nkeys; kix++) {
-		key = keylist->keys[kix];
+		key = rcu_dereference(keylist->keys[kix]);
 		if (key->type != &key_type_keyring)
 			continue;
 
@@ -531,8 +541,7 @@ key_ref_t __keyring_search_one(key_ref_t keyring_ref,
 		nkeys = klist->nkeys;
 		smp_rmb();
 		for (loop = 0; loop < nkeys ; loop++) {
-			key = klist->keys[loop];
-
+			key = rcu_dereference(klist->keys[loop]);
 			if (key->type == ktype &&
 			    (!key->type->match ||
 			     key->type->match(key, description)) &&
@@ -654,7 +663,7 @@ ascend:
 	nkeys = keylist->nkeys;
 	smp_rmb();
 	for (; kix < nkeys; kix++) {
-		key = keylist->keys[kix];
+		key = rcu_dereference(keylist->keys[kix]);
 
 		if (key == A)
 			goto cycle_detected;
@@ -711,7 +720,7 @@ static void keyring_unlink_rcu_disposal(struct rcu_head *rcu)
 		container_of(rcu, struct keyring_list, rcu);
 
 	if (klist->delkey != USHRT_MAX)
-		key_put(klist->keys[klist->delkey]);
+		key_put(rcu_access_pointer(klist->keys[klist->delkey]));
 	kfree(klist);
 }
 
@@ -749,24 +758,16 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
 	/* see if there's a matching key we can displace */
 	if (klist && klist->nkeys > 0) {
 		for (loop = klist->nkeys - 1; loop >= 0; loop--) {
-			if (klist->keys[loop]->type == type &&
-			    strcmp(klist->keys[loop]->description,
-				   description) == 0
-			    ) {
-				/* found a match - we'll replace this one with
-				 * the new key */
-				size = sizeof(struct key *) * klist->maxkeys;
-				size += sizeof(*klist);
-				BUG_ON(size > PAGE_SIZE);
-
-				ret = -ENOMEM;
-				nklist = kmemdup(klist, size, GFP_KERNEL);
-				if (!nklist)
-					goto error_sem;
-
-				/* note replacement slot */
-				klist->delkey = nklist->delkey = loop;
-				prealloc = (unsigned long)nklist;
+			struct key *key = rcu_deref_link_locked(klist, loop,
+								keyring);
+			if (key->type == type &&
+			    strcmp(key->description, description) == 0) {
+				/* Found a match - we'll replace the link with
+				 * one to the new key.  We record the slot
+				 * position.
+				 */
+				klist->delkey = loop;
+				prealloc = 0;
 				goto done;
 			}
 		}
@@ -780,7 +781,7 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
 
 	if (klist && klist->nkeys < klist->maxkeys) {
 		/* there's sufficient slack space to append directly */
-		nklist = NULL;
+		klist->delkey = klist->nkeys;
 		prealloc = KEY_LINK_FIXQUOTA;
 	} else {
 		/* grow the key list */
@@ -813,10 +814,10 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
 		}
 
 		/* add the key into the new space */
-		nklist->keys[nklist->delkey] = NULL;
+		RCU_INIT_POINTER(nklist->keys[nklist->delkey], NULL);
+		prealloc = (unsigned long)nklist | KEY_LINK_FIXQUOTA;
 	}
 
-	prealloc = (unsigned long)nklist | KEY_LINK_FIXQUOTA;
 done:
 	*_prealloc = prealloc;
 	kleave(" = 0");
@@ -862,6 +863,7 @@ void __key_link(struct key *keyring, struct key *key,
 		unsigned long *_prealloc)
 {
 	struct keyring_list *klist, *nklist;
+	struct key *discard;
 
 	nklist = (struct keyring_list *)(*_prealloc & ~KEY_LINK_FIXQUOTA);
 	*_prealloc = 0;
@@ -875,10 +877,10 @@ void __key_link(struct key *keyring, struct key *key,
 	/* there's a matching key we can displace or an empty slot in a newly
 	 * allocated list we can fill */
 	if (nklist) {
-		kdebug("replace %hu/%hu/%hu",
+		kdebug("reissue %hu/%hu/%hu",
 		       nklist->delkey, nklist->nkeys, nklist->maxkeys);
 
-		nklist->keys[nklist->delkey] = key;
+		RCU_INIT_POINTER(nklist->keys[nklist->delkey], key);
 
 		rcu_assign_pointer(keyring->payload.subscriptions, nklist);
 
@@ -889,9 +891,23 @@ void __key_link(struct key *keyring, struct key *key,
 			       klist->delkey, klist->nkeys, klist->maxkeys);
 			call_rcu(&klist->rcu, keyring_unlink_rcu_disposal);
 		}
+	} else if (klist->delkey < klist->nkeys) {
+		kdebug("replace %hu/%hu/%hu",
+		       klist->delkey, klist->nkeys, klist->maxkeys);
+
+		discard = rcu_dereference_protected(
+			klist->keys[klist->delkey],
+			rwsem_is_locked(&keyring->sem));
+		rcu_assign_pointer(klist->keys[klist->delkey], key);
+		/* The garbage collector will take care of RCU
+		 * synchronisation */
+		key_put(discard);
 	} else {
 		/* there's sufficient slack space to append directly */
-		klist->keys[klist->nkeys] = key;
+		kdebug("append %hu/%hu/%hu",
+		       klist->delkey, klist->nkeys, klist->maxkeys);
+
+		RCU_INIT_POINTER(klist->keys[klist->delkey], key);
 		smp_wmb();
 		klist->nkeys++;
 	}
@@ -998,7 +1014,7 @@ int key_unlink(struct key *keyring, struct key *key)
 	if (klist) {
 		/* search the keyring for the key */
 		for (loop = 0; loop < klist->nkeys; loop++)
-			if (klist->keys[loop] == key)
+			if (rcu_access_pointer(klist->keys[loop]) == key)
 				goto key_is_present;
 	}
 
@@ -1061,7 +1077,7 @@ static void keyring_clear_rcu_disposal(struct rcu_head *rcu)
 	klist = container_of(rcu, struct keyring_list, rcu);
 
 	for (loop = klist->nkeys - 1; loop >= 0; loop--)
-		key_put(klist->keys[loop]);
+		key_put(rcu_access_pointer(klist->keys[loop]));
 
 	kfree(klist);
 }
@@ -1161,7 +1177,8 @@ void keyring_gc(struct key *keyring, time_t limit)
 	/* work out how many subscriptions we're keeping */
 	keep = 0;
 	for (loop = klist->nkeys - 1; loop >= 0; loop--)
-		if (!key_is_dead(klist->keys[loop], limit))
+		if (!key_is_dead(rcu_deref_link_locked(klist, loop, keyring),
+				 limit))
 			keep++;
 
 	if (keep == klist->nkeys)
@@ -1182,11 +1199,11 @@ void keyring_gc(struct key *keyring, time_t limit)
 	 */
 	keep = 0;
 	for (loop = klist->nkeys - 1; loop >= 0; loop--) {
-		key = klist->keys[loop];
+		key = rcu_deref_link_locked(klist, loop, keyring);
 		if (!key_is_dead(key, limit)) {
 			if (keep >= max)
 				goto discard_new;
-			new->keys[keep++] = key_get(key);
+			RCU_INIT_POINTER(new->keys[keep++], key_get(key));
 		}
 	}
 	new->nkeys = keep;
-- 
cgit v1.2.3


From 31d5a79d7f3d436da176a78ebc12d53c06da402e Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 11 May 2012 10:56:56 +0100
Subject: KEYS: Do LRU discard in full keyrings

Do an LRU discard in keyrings that are full rather than returning ENFILE.  To
perform this, a time_t is added to the key struct and updated by the creation
of a link to a key and by a key being found as the result of a search.  At the
completion of a successful search, the keyrings in the path between the root of
the search and the first found link to it also have their last-used times
updated.

Note that discarding a link to a key from a keyring does not necessarily
destroy the key as there may be references held by other places.

An alternate discard method that might suffice is to perform FIFO discard from
the keyring, using the spare 2-byte hole in the keylist header as the index of
the next link to be discarded.

This is useful when using a keyring as a cache for DNS results or foreign
filesystem IDs.


This can be tested by the following.  As root do:

	echo 1000 >/proc/sys/kernel/keys/root_maxkeys

	kr=`keyctl newring foo @s`
	for ((i=0; i<2000; i++)); do keyctl add user a$i a $kr; done

Without this patch ENFILE should be reported when the keyring fills up.  With
this patch, the keyring discards keys in an LRU fashion.  Note that the stored
LRU time has a granularity of 1s.

After doing this, /proc/key-users can be observed and should show that most of
the 2000 keys have been discarded:

	[root@andromeda ~]# cat /proc/key-users
	    0:   517 516/516 513/1000 5249/20000

The "513/1000" here is the number of quota-accounted keys present for this user
out of the maximum permitted.

In /proc/keys, the keyring shows the number of keys it has and the number of
slots it has allocated:

	[root@andromeda ~]# grep foo /proc/keys
	200c64c4 I--Q--     1 perm 3b3f0000     0     0 keyring   foo: 509/509

The maximum is (PAGE_SIZE - header) / key pointer size.  That's typically 509
on a 64-bit system and 1020 on a 32-bit system.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 include/linux/key.h          |  1 +
 security/keys/keyring.c      | 47 +++++++++++++++++++++++++++++++++++++-------
 security/keys/process_keys.c |  2 ++
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/include/linux/key.h b/include/linux/key.h
index c505f83c969..13c0dcd8ee4 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -136,6 +136,7 @@ struct key {
 		time_t		expiry;		/* time at which key expires (or 0) */
 		time_t		revoked_at;	/* time at which key was revoked */
 	};
+	time_t			last_used_at;	/* last time used for LRU keyring discard */
 	uid_t			uid;
 	gid_t			gid;
 	key_perm_t		perm;		/* access permissions */
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 459b3cc347f..89d02cfb00c 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -30,6 +30,10 @@
 		(klist)->keys[index],					\
 		rwsem_is_locked((struct rw_semaphore *)&(keyring)->sem)))
 
+#define MAX_KEYRING_LINKS						\
+	min_t(size_t, USHRT_MAX - 1,					\
+	      ((PAGE_SIZE - sizeof(struct keyring_list)) / sizeof(struct key *)))
+
 #define KEY_LINK_FIXQUOTA 1UL
 
 /*
@@ -319,6 +323,8 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 			     bool no_state_check)
 {
 	struct {
+		/* Need a separate keylist pointer for RCU purposes */
+		struct key *keyring;
 		struct keyring_list *keylist;
 		int kix;
 	} stack[KEYRING_SEARCH_MAX_DEPTH];
@@ -451,6 +457,7 @@ ascend:
 			continue;
 
 		/* stack the current position */
+		stack[sp].keyring = keyring;
 		stack[sp].keylist = keylist;
 		stack[sp].kix = kix;
 		sp++;
@@ -466,6 +473,7 @@ not_this_keyring:
 	if (sp > 0) {
 		/* resume the processing of a keyring higher up in the tree */
 		sp--;
+		keyring = stack[sp].keyring;
 		keylist = stack[sp].keylist;
 		kix = stack[sp].kix + 1;
 		goto ascend;
@@ -477,6 +485,10 @@ not_this_keyring:
 	/* we found a viable match */
 found:
 	atomic_inc(&key->usage);
+	key->last_used_at = now.tv_sec;
+	keyring->last_used_at = now.tv_sec;
+	while (sp > 0)
+		stack[--sp].keyring->last_used_at = now.tv_sec;
 	key_check(key);
 	key_ref = make_key_ref(key, possessed);
 error_2:
@@ -558,6 +570,8 @@ key_ref_t __keyring_search_one(key_ref_t keyring_ref,
 
 found:
 	atomic_inc(&key->usage);
+	keyring->last_used_at = key->last_used_at =
+		current_kernel_time().tv_sec;
 	rcu_read_unlock();
 	return make_key_ref(key, possessed);
 }
@@ -611,6 +625,7 @@ struct key *find_keyring_by_name(const char *name, bool skip_perm_check)
 			 * (ie. it has a zero usage count) */
 			if (!atomic_inc_not_zero(&keyring->usage))
 				continue;
+			keyring->last_used_at = current_kernel_time().tv_sec;
 			goto out;
 		}
 	}
@@ -734,8 +749,9 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
 	struct keyring_list *klist, *nklist;
 	unsigned long prealloc;
 	unsigned max;
+	time_t lowest_lru;
 	size_t size;
-	int loop, ret;
+	int loop, lru, ret;
 
 	kenter("%d,%s,%s,", key_serial(keyring), type->name, description);
 
@@ -756,7 +772,9 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
 	klist = rcu_dereference_locked_keyring(keyring);
 
 	/* see if there's a matching key we can displace */
+	lru = -1;
 	if (klist && klist->nkeys > 0) {
+		lowest_lru = TIME_T_MAX;
 		for (loop = klist->nkeys - 1; loop >= 0; loop--) {
 			struct key *key = rcu_deref_link_locked(klist, loop,
 								keyring);
@@ -770,9 +788,23 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
 				prealloc = 0;
 				goto done;
 			}
+			if (key->last_used_at < lowest_lru) {
+				lowest_lru = key->last_used_at;
+				lru = loop;
+			}
 		}
 	}
 
+	/* If the keyring is full then do an LRU discard */
+	if (klist &&
+	    klist->nkeys == klist->maxkeys &&
+	    klist->maxkeys >= MAX_KEYRING_LINKS) {
+		kdebug("LRU discard %d\n", lru);
+		klist->delkey = lru;
+		prealloc = 0;
+		goto done;
+	}
+
 	/* check that we aren't going to overrun the user's quota */
 	ret = key_payload_reserve(keyring,
 				  keyring->datalen + KEYQUOTA_LINK_BYTES);
@@ -786,15 +818,14 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
 	} else {
 		/* grow the key list */
 		max = 4;
-		if (klist)
+		if (klist) {
 			max += klist->maxkeys;
+			if (max > MAX_KEYRING_LINKS)
+				max = MAX_KEYRING_LINKS;
+			BUG_ON(max <= klist->maxkeys);
+		}
 
-		ret = -ENFILE;
-		if (max > USHRT_MAX - 1)
-			goto error_quota;
 		size = sizeof(*klist) + sizeof(struct key *) * max;
-		if (size > PAGE_SIZE)
-			goto error_quota;
 
 		ret = -ENOMEM;
 		nklist = kmalloc(size, GFP_KERNEL);
@@ -873,6 +904,8 @@ void __key_link(struct key *keyring, struct key *key,
 	klist = rcu_dereference_locked_keyring(keyring);
 
 	atomic_inc(&key->usage);
+	keyring->last_used_at = key->last_used_at =
+		current_kernel_time().tv_sec;
 
 	/* there's a matching key we can displace or an empty slot in a newly
 	 * allocated list we can fill */
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index be7ecb2018d..e137fcd7042 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -732,6 +732,8 @@ try_again:
 	if (ret < 0)
 		goto invalid_key;
 
+	key->last_used_at = current_kernel_time().tv_sec;
+
 error:
 	put_cred(cred);
 	return key_ref;
-- 
cgit v1.2.3


From fd75815f727f157a05f4c96b5294a4617c0557da Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 11 May 2012 10:56:56 +0100
Subject: KEYS: Add invalidation support

Add support for invalidating a key - which renders it immediately invisible to
further searches and causes the garbage collector to immediately wake up,
remove it from keyrings and then destroy it when it's no longer referenced.

It's better not to do this with keyctl_revoke() as that marks the key to start
returning -EKEYREVOKED to searches when what is actually desired is to have the
key refetched.

To invalidate a key the caller must be granted SEARCH permission by the key.
This may be too strict.  It may be better to also permit invalidation if the
caller has any of READ, WRITE or SETATTR permission.

The primary use for this is to evict keys that are cached in special keyrings,
such as the DNS resolver or an ID mapper.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/security/keys.txt | 17 +++++++++++++++++
 include/linux/key.h             |  3 +++
 include/linux/keyctl.h          |  1 +
 security/keys/compat.c          |  3 +++
 security/keys/gc.c              | 21 ++++++++++++++-------
 security/keys/internal.h        | 15 ++++++++++++++-
 security/keys/key.c             | 22 ++++++++++++++++++++++
 security/keys/keyctl.c          | 34 ++++++++++++++++++++++++++++++++++
 security/keys/keyring.c         | 25 +++++++++++--------------
 security/keys/permission.c      | 15 ++++++++++-----
 security/keys/proc.c            |  3 ++-
 11 files changed, 131 insertions(+), 28 deletions(-)

diff --git a/Documentation/security/keys.txt b/Documentation/security/keys.txt
index d389acd31e1..aa0dbd74b71 100644
--- a/Documentation/security/keys.txt
+++ b/Documentation/security/keys.txt
@@ -805,6 +805,23 @@ The keyctl syscall functions are:
      kernel and resumes executing userspace.
 
 
+ (*) Invalidate a key.
+
+	long keyctl(KEYCTL_INVALIDATE, key_serial_t key);
+
+     This function marks a key as being invalidated and then wakes up the
+     garbage collector.  The garbage collector immediately removes invalidated
+     keys from all keyrings and deletes the key when its reference count
+     reaches zero.
+
+     Keys that are marked invalidated become invisible to normal key operations
+     immediately, though they are still visible in /proc/keys until deleted
+     (they're marked with an 'i' flag).
+
+     A process must have search permission on the key for this function to be
+     successful.
+
+
 ===============
 KERNEL SERVICES
 ===============
diff --git a/include/linux/key.h b/include/linux/key.h
index 13c0dcd8ee4..b145b054b3e 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -160,6 +160,7 @@ struct key {
 #define KEY_FLAG_USER_CONSTRUCT	4	/* set if key is being constructed in userspace */
 #define KEY_FLAG_NEGATIVE	5	/* set if key is negative */
 #define KEY_FLAG_ROOT_CAN_CLEAR	6	/* set if key can be cleared by root without permission */
+#define KEY_FLAG_INVALIDATED	7	/* set if key has been invalidated */
 
 	/* the description string
 	 * - this is used to match a key against search criteria
@@ -203,6 +204,7 @@ extern struct key *key_alloc(struct key_type *type,
 #define KEY_ALLOC_NOT_IN_QUOTA	0x0002	/* not in quota */
 
 extern void key_revoke(struct key *key);
+extern void key_invalidate(struct key *key);
 extern void key_put(struct key *key);
 
 static inline struct key *key_get(struct key *key)
@@ -323,6 +325,7 @@ extern void key_init(void);
 #define key_serial(k)			0
 #define key_get(k) 			({ NULL; })
 #define key_revoke(k)			do { } while(0)
+#define key_invalidate(k)		do { } while(0)
 #define key_put(k)			do { } while(0)
 #define key_ref_put(k)			do { } while(0)
 #define make_key_ref(k, p)		NULL
diff --git a/include/linux/keyctl.h b/include/linux/keyctl.h
index 9b0b865ce62..c9b7f4faf97 100644
--- a/include/linux/keyctl.h
+++ b/include/linux/keyctl.h
@@ -55,5 +55,6 @@
 #define KEYCTL_SESSION_TO_PARENT	18	/* apply session keyring to parent process */
 #define KEYCTL_REJECT			19	/* reject a partially constructed key */
 #define KEYCTL_INSTANTIATE_IOV		20	/* instantiate a partially constructed key */
+#define KEYCTL_INVALIDATE		21	/* invalidate a key */
 
 #endif /*  _LINUX_KEYCTL_H */
diff --git a/security/keys/compat.c b/security/keys/compat.c
index 4c48e13448f..fab4f8dda6c 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -135,6 +135,9 @@ asmlinkage long compat_sys_keyctl(u32 option,
 		return compat_keyctl_instantiate_key_iov(
 			arg2, compat_ptr(arg3), arg4, arg5);
 
+	case KEYCTL_INVALIDATE:
+		return keyctl_invalidate_key(arg2);
+
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/security/keys/gc.c b/security/keys/gc.c
index adddaa258d5..61ab7c82ebb 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -71,6 +71,15 @@ void key_schedule_gc(time_t gc_at)
 	}
 }
 
+/*
+ * Schedule a dead links collection run.
+ */
+void key_schedule_gc_links(void)
+{
+	set_bit(KEY_GC_KEY_EXPIRED, &key_gc_flags);
+	queue_work(system_nrt_wq, &key_gc_work);
+}
+
 /*
  * Some key's cleanup time was met after it expired, so we need to get the
  * reaper to go through a cycle finding expired keys.
@@ -79,8 +88,7 @@ static void key_gc_timer_func(unsigned long data)
 {
 	kenter("");
 	key_gc_next_run = LONG_MAX;
-	set_bit(KEY_GC_KEY_EXPIRED, &key_gc_flags);
-	queue_work(system_nrt_wq, &key_gc_work);
+	key_schedule_gc_links();
 }
 
 /*
@@ -131,12 +139,12 @@ void key_gc_keytype(struct key_type *ktype)
 static void key_gc_keyring(struct key *keyring, time_t limit)
 {
 	struct keyring_list *klist;
-	struct key *key;
 	int loop;
 
 	kenter("%x", key_serial(keyring));
 
-	if (test_bit(KEY_FLAG_REVOKED, &keyring->flags))
+	if (keyring->flags & ((1 << KEY_FLAG_INVALIDATED) |
+			      (1 << KEY_FLAG_REVOKED)))
 		goto dont_gc;
 
 	/* scan the keyring looking for dead keys */
@@ -148,9 +156,8 @@ static void key_gc_keyring(struct key *keyring, time_t limit)
 	loop = klist->nkeys;
 	smp_rmb();
 	for (loop--; loop >= 0; loop--) {
-		key = rcu_dereference(klist->keys[loop]);
-		if (test_bit(KEY_FLAG_DEAD, &key->flags) ||
-		    (key->expiry > 0 && key->expiry <= limit))
+		struct key *key = rcu_dereference(klist->keys[loop]);
+		if (key_is_dead(key, limit))
 			goto do_gc;
 	}
 
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 65647f82558..f711b094ed4 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -152,7 +152,8 @@ extern long join_session_keyring(const char *name);
 extern struct work_struct key_gc_work;
 extern unsigned key_gc_delay;
 extern void keyring_gc(struct key *keyring, time_t limit);
-extern void key_schedule_gc(time_t expiry_at);
+extern void key_schedule_gc(time_t gc_at);
+extern void key_schedule_gc_links(void);
 extern void key_gc_keytype(struct key_type *ktype);
 
 extern int key_task_permission(const key_ref_t key_ref,
@@ -196,6 +197,17 @@ extern struct key *request_key_auth_new(struct key *target,
 
 extern struct key *key_get_instantiation_authkey(key_serial_t target_id);
 
+/*
+ * Determine whether a key is dead.
+ */
+static inline bool key_is_dead(struct key *key, time_t limit)
+{
+	return
+		key->flags & ((1 << KEY_FLAG_DEAD) |
+			      (1 << KEY_FLAG_INVALIDATED)) ||
+		(key->expiry > 0 && key->expiry <= limit);
+}
+
 /*
  * keyctl() functions
  */
@@ -225,6 +237,7 @@ extern long keyctl_reject_key(key_serial_t, unsigned, unsigned, key_serial_t);
 extern long keyctl_instantiate_key_iov(key_serial_t,
 				       const struct iovec __user *,
 				       unsigned, key_serial_t);
+extern long keyctl_invalidate_key(key_serial_t);
 
 extern long keyctl_instantiate_key_common(key_serial_t,
 					  const struct iovec __user *,
diff --git a/security/keys/key.c b/security/keys/key.c
index dc628941ecd..c9bf66ac36e 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -954,6 +954,28 @@ void key_revoke(struct key *key)
 }
 EXPORT_SYMBOL(key_revoke);
 
+/**
+ * key_invalidate - Invalidate a key.
+ * @key: The key to be invalidated.
+ *
+ * Mark a key as being invalidated and have it cleaned up immediately.  The key
+ * is ignored by all searches and other operations from this point.
+ */
+void key_invalidate(struct key *key)
+{
+	kenter("%d", key_serial(key));
+
+	key_check(key);
+
+	if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) {
+		down_write_nested(&key->sem, 1);
+		if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags))
+			key_schedule_gc_links();
+		up_write(&key->sem);
+	}
+}
+EXPORT_SYMBOL(key_invalidate);
+
 /**
  * register_key_type - Register a type of key.
  * @ktype: The new key type.
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index fb767c6cd99..ddb3e05bc5f 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -374,6 +374,37 @@ error:
 	return ret;
 }
 
+/*
+ * Invalidate a key.
+ *
+ * The key must be grant the caller Invalidate permission for this to work.
+ * The key and any links to the key will be automatically garbage collected
+ * immediately.
+ *
+ * If successful, 0 is returned.
+ */
+long keyctl_invalidate_key(key_serial_t id)
+{
+	key_ref_t key_ref;
+	long ret;
+
+	kenter("%d", id);
+
+	key_ref = lookup_user_key(id, 0, KEY_SEARCH);
+	if (IS_ERR(key_ref)) {
+		ret = PTR_ERR(key_ref);
+		goto error;
+	}
+
+	key_invalidate(key_ref_to_ptr(key_ref));
+	ret = 0;
+
+	key_ref_put(key_ref);
+error:
+	kleave(" = %ld", ret);
+	return ret;
+}
+
 /*
  * Clear the specified keyring, creating an empty process keyring if one of the
  * special keyring IDs is used.
@@ -1622,6 +1653,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			(unsigned) arg4,
 			(key_serial_t) arg5);
 
+	case KEYCTL_INVALIDATE:
+		return keyctl_invalidate_key((key_serial_t) arg2);
+
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 89d02cfb00c..7445875f681 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -382,13 +382,17 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 	/* otherwise, the top keyring must not be revoked, expired, or
 	 * negatively instantiated if we are to search it */
 	key_ref = ERR_PTR(-EAGAIN);
-	if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+	if (kflags & ((1 << KEY_FLAG_INVALIDATED) |
+		      (1 << KEY_FLAG_REVOKED) |
+		      (1 << KEY_FLAG_NEGATIVE)) ||
 	    (keyring->expiry && now.tv_sec >= keyring->expiry))
 		goto error_2;
 
 	/* start processing a new keyring */
 descend:
-	if (test_bit(KEY_FLAG_REVOKED, &keyring->flags))
+	kflags = keyring->flags;
+	if (kflags & ((1 << KEY_FLAG_INVALIDATED) |
+		      (1 << KEY_FLAG_REVOKED)))
 		goto not_this_keyring;
 
 	keylist = rcu_dereference(keyring->payload.subscriptions);
@@ -406,9 +410,10 @@ descend:
 		if (key->type != type)
 			continue;
 
-		/* skip revoked keys and expired keys */
+		/* skip invalidated, revoked and expired keys */
 		if (!no_state_check) {
-			if (kflags & (1 << KEY_FLAG_REVOKED))
+			if (kflags & ((1 << KEY_FLAG_INVALIDATED) |
+				      (1 << KEY_FLAG_REVOKED)))
 				continue;
 
 			if (key->expiry && now.tv_sec >= key->expiry)
@@ -559,7 +564,8 @@ key_ref_t __keyring_search_one(key_ref_t keyring_ref,
 			     key->type->match(key, description)) &&
 			    key_permission(make_key_ref(key, possessed),
 					   perm) == 0 &&
-			    !test_bit(KEY_FLAG_REVOKED, &key->flags)
+			    !(key->flags & ((1 << KEY_FLAG_INVALIDATED) |
+					    (1 << KEY_FLAG_REVOKED)))
 			    )
 				goto found;
 		}
@@ -1176,15 +1182,6 @@ static void keyring_revoke(struct key *keyring)
 	}
 }
 
-/*
- * Determine whether a key is dead.
- */
-static bool key_is_dead(struct key *key, time_t limit)
-{
-	return test_bit(KEY_FLAG_DEAD, &key->flags) ||
-		(key->expiry > 0 && key->expiry <= limit);
-}
-
 /*
  * Collect garbage from the contents of a keyring, replacing the old list with
  * a new one with the pointers all shuffled down.
diff --git a/security/keys/permission.c b/security/keys/permission.c
index c35b5229e3c..5f4c00c0947 100644
--- a/security/keys/permission.c
+++ b/security/keys/permission.c
@@ -87,20 +87,25 @@ EXPORT_SYMBOL(key_task_permission);
  * key_validate - Validate a key.
  * @key: The key to be validated.
  *
- * Check that a key is valid, returning 0 if the key is okay, -EKEYREVOKED if
- * the key's type has been removed or if the key has been revoked or
- * -EKEYEXPIRED if the key has expired.
+ * Check that a key is valid, returning 0 if the key is okay, -ENOKEY if the
+ * key is invalidated, -EKEYREVOKED if the key's type has been removed or if
+ * the key has been revoked or -EKEYEXPIRED if the key has expired.
  */
 int key_validate(struct key *key)
 {
 	struct timespec now;
+	unsigned long flags = key->flags;
 	int ret = 0;
 
 	if (key) {
+		ret = -ENOKEY;
+		if (flags & (1 << KEY_FLAG_INVALIDATED))
+			goto error;
+
 		/* check it's still accessible */
 		ret = -EKEYREVOKED;
-		if (test_bit(KEY_FLAG_REVOKED, &key->flags) ||
-		    test_bit(KEY_FLAG_DEAD, &key->flags))
+		if (flags & ((1 << KEY_FLAG_REVOKED) |
+			     (1 << KEY_FLAG_DEAD)))
 			goto error;
 
 		/* check it hasn't expired */
diff --git a/security/keys/proc.c b/security/keys/proc.c
index 49bbc97943a..30d1ddfd9ce 100644
--- a/security/keys/proc.c
+++ b/security/keys/proc.c
@@ -242,7 +242,7 @@ static int proc_keys_show(struct seq_file *m, void *v)
 #define showflag(KEY, LETTER, FLAG) \
 	(test_bit(FLAG,	&(KEY)->flags) ? LETTER : '-')
 
-	seq_printf(m, "%08x %c%c%c%c%c%c %5d %4s %08x %5d %5d %-9.9s ",
+	seq_printf(m, "%08x %c%c%c%c%c%c%c %5d %4s %08x %5d %5d %-9.9s ",
 		   key->serial,
 		   showflag(key, 'I', KEY_FLAG_INSTANTIATED),
 		   showflag(key, 'R', KEY_FLAG_REVOKED),
@@ -250,6 +250,7 @@ static int proc_keys_show(struct seq_file *m, void *v)
 		   showflag(key, 'Q', KEY_FLAG_IN_QUOTA),
 		   showflag(key, 'U', KEY_FLAG_USER_CONSTRUCT),
 		   showflag(key, 'N', KEY_FLAG_NEGATIVE),
+		   showflag(key, 'i', KEY_FLAG_INVALIDATED),
 		   atomic_read(&key->usage),
 		   xbuf,
 		   key->perm,
-- 
cgit v1.2.3


From 77b513dda90fd99bd1225410b25e745b74779c1c Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sun, 13 May 2012 23:03:23 +0900
Subject: TOMOYO: Accept manager programs which do not start with / .

The pathname of /usr/sbin/tomoyo-editpolicy seen from Ubuntu 12.04 Live CD is
squashfs:/usr/sbin/tomoyo-editpolicy rather than /usr/sbin/tomoyo-editpolicy .
Therefore, we need to accept manager programs which do not start with / .

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 security/tomoyo/common.c | 26 ++++++--------------------
 security/tomoyo/common.h |  1 -
 2 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/security/tomoyo/common.c b/security/tomoyo/common.c
index 8656b16eef7..2e0f12c6293 100644
--- a/security/tomoyo/common.c
+++ b/security/tomoyo/common.c
@@ -850,14 +850,9 @@ static int tomoyo_update_manager_entry(const char *manager,
 		policy_list[TOMOYO_ID_MANAGER],
 	};
 	int error = is_delete ? -ENOENT : -ENOMEM;
-	if (tomoyo_domain_def(manager)) {
-		if (!tomoyo_correct_domain(manager))
-			return -EINVAL;
-		e.is_domain = true;
-	} else {
-		if (!tomoyo_correct_path(manager))
-			return -EINVAL;
-	}
+	if (!tomoyo_correct_domain(manager) &&
+	    !tomoyo_correct_word(manager))
+		return -EINVAL;
 	e.manager = tomoyo_get_name(manager);
 	if (e.manager) {
 		error = tomoyo_update_policy(&e.head, sizeof(e), &param,
@@ -932,23 +927,14 @@ static bool tomoyo_manager(void)
 		return true;
 	if (!tomoyo_manage_by_non_root && (task->cred->uid || task->cred->euid))
 		return false;
-	list_for_each_entry_rcu(ptr, &tomoyo_kernel_namespace.
-				policy_list[TOMOYO_ID_MANAGER], head.list) {
-		if (!ptr->head.is_deleted && ptr->is_domain
-		    && !tomoyo_pathcmp(domainname, ptr->manager)) {
-			found = true;
-			break;
-		}
-	}
-	if (found)
-		return true;
 	exe = tomoyo_get_exe();
 	if (!exe)
 		return false;
 	list_for_each_entry_rcu(ptr, &tomoyo_kernel_namespace.
 				policy_list[TOMOYO_ID_MANAGER], head.list) {
-		if (!ptr->head.is_deleted && !ptr->is_domain
-		    && !strcmp(exe, ptr->manager->name)) {
+		if (!ptr->head.is_deleted &&
+		    (!tomoyo_pathcmp(domainname, ptr->manager) ||
+		     !strcmp(exe, ptr->manager->name))) {
 			found = true;
 			break;
 		}
diff --git a/security/tomoyo/common.h b/security/tomoyo/common.h
index 30fd9836970..75e4dc1c02a 100644
--- a/security/tomoyo/common.h
+++ b/security/tomoyo/common.h
@@ -860,7 +860,6 @@ struct tomoyo_aggregator {
 /* Structure for policy manager. */
 struct tomoyo_manager {
 	struct tomoyo_acl_head head;
-	bool is_domain;  /* True if manager is a domainname. */
 	/* A path to program or a domainname. */
 	const struct tomoyo_path_info *manager;
 };
-- 
cgit v1.2.3


From 2cc8a71641b4460783ea3bd7a3476043fdf85397 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Mon, 14 May 2012 10:19:28 -0700
Subject: Yama: replace capable() with ns_capable()

When checking capabilities, the question we want to be asking is "does
current() have the capability in the child's namespace?"

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 security/yama/yama_lsm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index c852f7472ad..83554ee8a58 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -264,11 +264,11 @@ static int yama_ptrace_access_check(struct task_struct *child,
 		case YAMA_SCOPE_RELATIONAL:
 			if (!task_is_descendant(current, child) &&
 			    !ptracer_exception_found(current, child) &&
-			    !capable(CAP_SYS_PTRACE))
+			    !ns_capable(task_user_ns(child), CAP_SYS_PTRACE))
 				rc = -EPERM;
 			break;
 		case YAMA_SCOPE_CAPABILITY:
-			if (!capable(CAP_SYS_PTRACE))
+			if (!ns_capable(task_user_ns(child), CAP_SYS_PTRACE))
 				rc = -EPERM;
 			break;
 		case YAMA_SCOPE_NO_ATTACH:
-- 
cgit v1.2.3


From 2267b13a7cad1f9dfe0073c1f902d45953f9faff Mon Sep 17 00:00:00 2001
From: Casey Schaufler <casey@schaufler-ca.com>
Date: Tue, 13 Mar 2012 19:14:19 -0700
Subject: Smack: recursive tramsmute

The transmuting directory feature of Smack requires that
the transmuting attribute be explicitly set in all cases.
It seems the users of this facility would expect that the
transmuting attribute be inherited by subdirectories that
are created in a transmuting directory. This does not seem
to add any additional complexity to the understanding of
how the system works.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
---
 security/smack/smack.h     |  1 +
 security/smack/smack_lsm.c | 44 +++++++++++++++++++++++++++++++++++---------
 2 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/security/smack/smack.h b/security/smack/smack.h
index 4ede719922e..cf2594dfa93 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -66,6 +66,7 @@ struct task_smack {
 
 #define	SMK_INODE_INSTANT	0x01	/* inode is instantiated */
 #define	SMK_INODE_TRANSMUTE	0x02	/* directory is transmuting */
+#define	SMK_INODE_CHANGED	0x04	/* smack was transmuted */
 
 /*
  * A label access rule.
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 45c32f07416..df45d393918 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -556,6 +556,7 @@ static int smack_inode_init_security(struct inode *inode, struct inode *dir,
 				     void **value, size_t *len)
 {
 	struct smack_known *skp;
+	struct inode_smack *issp = inode->i_security;
 	char *csp = smk_of_current();
 	char *isp = smk_of_inode(inode);
 	char *dsp = smk_of_inode(dir);
@@ -577,10 +578,13 @@ static int smack_inode_init_security(struct inode *inode, struct inode *dir,
 		 * If the access rule allows transmutation and
 		 * the directory requests transmutation then
 		 * by all means transmute.
+		 * Mark the inode as changed.
 		 */
 		if (may > 0 && ((may & MAY_TRANSMUTE) != 0) &&
-		    smk_inode_transmutable(dir))
+		    smk_inode_transmutable(dir)) {
 			isp = dsp;
+			issp->smk_flags |= SMK_INODE_CHANGED;
+		}
 
 		*value = kstrdup(isp, GFP_KERNEL);
 		if (*value == NULL)
@@ -2552,6 +2556,7 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 	char *final;
 	char trattr[TRANS_TRUE_SIZE];
 	int transflag = 0;
+	int rc;
 	struct dentry *dp;
 
 	if (inode == NULL)
@@ -2670,17 +2675,38 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 		 */
 		dp = dget(opt_dentry);
 		fetched = smk_fetch(XATTR_NAME_SMACK, inode, dp);
-		if (fetched != NULL) {
+		if (fetched != NULL)
 			final = fetched;
-			if (S_ISDIR(inode->i_mode)) {
-				trattr[0] = '\0';
-				inode->i_op->getxattr(dp,
+
+		/*
+		 * Transmuting directory
+		 */
+		if (S_ISDIR(inode->i_mode)) {
+			/*
+			 * If this is a new directory and the label was
+			 * transmuted when the inode was initialized
+			 * set the transmute attribute on the directory
+			 * and mark the inode.
+			 *
+			 * If there is a transmute attribute on the
+			 * directory mark the inode.
+			 */
+			if (isp->smk_flags & SMK_INODE_CHANGED) {
+				isp->smk_flags &= ~SMK_INODE_CHANGED;
+				rc = inode->i_op->setxattr(dp,
 					XATTR_NAME_SMACKTRANSMUTE,
-					trattr, TRANS_TRUE_SIZE);
-				if (strncmp(trattr, TRANS_TRUE,
-					    TRANS_TRUE_SIZE) == 0)
-					transflag = SMK_INODE_TRANSMUTE;
+					TRANS_TRUE, TRANS_TRUE_SIZE,
+					0);
+			} else {
+				rc = inode->i_op->getxattr(dp,
+					XATTR_NAME_SMACKTRANSMUTE, trattr,
+					TRANS_TRUE_SIZE);
+				if (rc >= 0 && strncmp(trattr, TRANS_TRUE,
+						       TRANS_TRUE_SIZE) != 0)
+					rc = -EINVAL;
 			}
+			if (rc >= 0)
+				transflag = SMK_INODE_TRANSMUTE;
 		}
 		isp->smk_task = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
 		isp->smk_mmap = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
-- 
cgit v1.2.3


From ceffec5541cc22486d3ff492e3d76a33a68fbfa3 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 29 Mar 2012 16:19:05 +0900
Subject: gfp flags for security_inode_alloc()?

Dave Chinner wrote:
> Yes, because you have no idea what the calling context is except
> for the fact that is from somewhere inside filesystem code and the
> filesystem could be holding locks. Therefore, GFP_NOFS is really the
> only really safe way to allocate memory here.

I see. Thank you.

I'm not sure, but can call trace happen where somewhere inside network
filesystem or stackable filesystem code with locks held invokes operations that
involves GFP_KENREL memory allocation outside that filesystem?
----------
[PATCH] SMACK: Fix incorrect GFP_KERNEL usage.

new_inode_smack() which can be called from smack_inode_alloc_security() needs
to use GFP_NOFS like SELinux's inode_alloc_security() does, for
security_inode_alloc() is called from inode_init_always() and
inode_init_always() is called from xfs_inode_alloc() which is using GFP_NOFS.

smack_inode_init_security() needs to use GFP_NOFS like
selinux_inode_init_security() does, for initxattrs() callback function (e.g.
btrfs_initxattrs()) which is called from security_inode_init_security() is
using GFP_NOFS.

smack_audit_rule_match() needs to use GFP_ATOMIC, for
security_audit_rule_match() can be called from audit_filter_user_rules() and
audit_filter_user_rules() is called from audit_filter_user() with RCU read lock
held.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Casey Schaufler <cschaufler@cschaufler-intel.(none)>
---
 security/smack/smack_lsm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index df45d393918..5f80075d771 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -79,7 +79,7 @@ struct inode_smack *new_inode_smack(char *smack)
 {
 	struct inode_smack *isp;
 
-	isp = kzalloc(sizeof(struct inode_smack), GFP_KERNEL);
+	isp = kzalloc(sizeof(struct inode_smack), GFP_NOFS);
 	if (isp == NULL)
 		return NULL;
 
@@ -563,7 +563,7 @@ static int smack_inode_init_security(struct inode *inode, struct inode *dir,
 	int may;
 
 	if (name) {
-		*name = kstrdup(XATTR_SMACK_SUFFIX, GFP_KERNEL);
+		*name = kstrdup(XATTR_SMACK_SUFFIX, GFP_NOFS);
 		if (*name == NULL)
 			return -ENOMEM;
 	}
@@ -586,7 +586,7 @@ static int smack_inode_init_security(struct inode *inode, struct inode *dir,
 			issp->smk_flags |= SMK_INODE_CHANGED;
 		}
 
-		*value = kstrdup(isp, GFP_KERNEL);
+		*value = kstrdup(isp, GFP_NOFS);
 		if (*value == NULL)
 			return -ENOMEM;
 	}
@@ -3426,7 +3426,7 @@ static int smack_audit_rule_match(u32 secid, u32 field, u32 op, void *vrule,
 	char *rule = vrule;
 
 	if (!rule) {
-		audit_log(actx, GFP_KERNEL, AUDIT_SELINUX_ERR,
+		audit_log(actx, GFP_ATOMIC, AUDIT_SELINUX_ERR,
 			  "Smack: missing rule\n");
 		return -ENOENT;
 	}
-- 
cgit v1.2.3


From f7112e6c9abf1c70f001dcf097c1d6e218a93f5c Mon Sep 17 00:00:00 2001
From: Casey Schaufler <casey@schaufler-ca.com>
Date: Sun, 6 May 2012 15:22:02 -0700
Subject: Smack: allow for significantly longer Smack labels v4

V4 updated to current linux-security#next
Targeted for git://gitorious.org/smack-next/kernel.git

Modern application runtime environments like to use
naming schemes that are structured and generated without
human intervention. Even though the Smack limit of 23
characters for a label name is perfectly rational for
human use there have been complaints that the limit is
a problem in environments where names are composed from
a set or sources, including vendor, author, distribution
channel and application name. Names like

	softwarehouse-pgwodehouse-coolappstore-mellowmuskrats

are becoming harder to avoid. This patch introduces long
label support in Smack. Labels are now limited to 255
characters instead of the old 23.

The primary reason for limiting the labels to 23 characters
was so they could be directly contained in CIPSO category sets.
This is still done were possible, but for labels that are too
large a mapping is required. This is perfectly safe for communication
that stays "on the box" and doesn't require much coordination
between boxes beyond what would have been required to keep label
names consistent.

The bulk of this patch is in smackfs, adding and updating
administrative interfaces. Because existing APIs can't be
changed new ones that do much the same things as old ones
have been introduced.

The Smack specific CIPSO data representation has been removed
and replaced with the data format used by netlabel. The CIPSO
header is now computed when a label is imported rather than
on use. This results in improved IP performance. The smack
label is now allocated separately from the containing structure,
allowing for larger strings.

Four new /smack interfaces have been introduced as four
of the old interfaces strictly required labels be specified
in fixed length arrays.

The access interface is supplemented with the check interface:
	access  "Subject                 Object                  rwxat"
	access2 "Subject Object rwaxt"

The load interface is supplemented with the rules interface:
	load   "Subject                 Object                  rwxat"
	load2  "Subject Object rwaxt"

The load-self interface is supplemented with the self-rules interface:
	load-self   "Subject                 Object                  rwxat"
	load-self2  "Subject Object rwaxt"

The cipso interface is supplemented with the wire interface:
	cipso  "Subject                  lvl cnt  c1  c2 ..."
	cipso2 "Subject lvl cnt  c1  c2 ..."

The old interfaces are maintained for compatibility.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
---
 Documentation/security/Smack.txt | 204 ++++++--
 security/smack/smack.h           |  56 +--
 security/smack/smack_access.c    | 233 +++++----
 security/smack/smack_lsm.c       | 185 ++------
 security/smack/smackfs.c         | 993 ++++++++++++++++++++++++++++++---------
 5 files changed, 1105 insertions(+), 566 deletions(-)

diff --git a/Documentation/security/Smack.txt b/Documentation/security/Smack.txt
index d2f72ae6643..a416479b8a1 100644
--- a/Documentation/security/Smack.txt
+++ b/Documentation/security/Smack.txt
@@ -15,7 +15,7 @@ at hand.
 
 Smack consists of three major components:
     - The kernel
-    - A start-up script and a few modified applications
+    - Basic utilities, which are helpful but not required
     - Configuration data
 
 The kernel component of Smack is implemented as a Linux
@@ -23,37 +23,28 @@ Security Modules (LSM) module. It requires netlabel and
 works best with file systems that support extended attributes,
 although xattr support is not strictly required.
 It is safe to run a Smack kernel under a "vanilla" distribution.
+
 Smack kernels use the CIPSO IP option. Some network
 configurations are intolerant of IP options and can impede
 access to systems that use them as Smack does.
 
-The startup script etc-init.d-smack should be installed
-in /etc/init.d/smack and should be invoked early in the
-start-up process. On Fedora rc5.d/S02smack is recommended.
-This script ensures that certain devices have the correct
-Smack attributes and loads the Smack configuration if
-any is defined. This script invokes two programs that
-ensure configuration data is properly formatted. These
-programs are /usr/sbin/smackload and /usr/sin/smackcipso.
-The system will run just fine without these programs,
-but it will be difficult to set access rules properly.
-
-A version of "ls" that provides a "-M" option to display
-Smack labels on long listing is available.
+The current git repositories for Smack user space are:
 
-A hacked version of sshd that allows network logins by users
-with specific Smack labels is available. This version does
-not work for scp. You must set the /etc/ssh/sshd_config
-line:
-   UsePrivilegeSeparation no
+	git@gitorious.org:meego-platform-security/smackutil.git
+	git@gitorious.org:meego-platform-security/libsmack.git
 
-The format of /etc/smack/usr is:
+These should make and install on most modern distributions.
+There are three commands included in smackutil:
 
-   username smack
+smackload  - properly formats data for writing to /smack/load
+smackcipso - properly formats data for writing to /smack/cipso
+chsmack    - display or set Smack extended attribute values
 
 In keeping with the intent of Smack, configuration data is
 minimal and not strictly required. The most important
 configuration step is mounting the smackfs pseudo filesystem.
+If smackutil is installed the startup script will take care
+of this, but it can be manually as well.
 
 Add this line to /etc/fstab:
 
@@ -61,19 +52,148 @@ Add this line to /etc/fstab:
 
 and create the /smack directory for mounting.
 
-Smack uses extended attributes (xattrs) to store file labels.
-The command to set a Smack label on a file is:
+Smack uses extended attributes (xattrs) to store labels on filesystem
+objects. The attributes are stored in the extended attribute security
+name space. A process must have CAP_MAC_ADMIN to change any of these
+attributes.
+
+The extended attributes that Smack uses are:
+
+SMACK64
+	Used to make access control decisions. In almost all cases
+	the label given to a new filesystem object will be the label
+	of the process that created it.
+SMACK64EXEC
+	The Smack label of a process that execs a program file with
+	this attribute set will run with this attribute's value.
+SMACK64MMAP
+	Don't allow the file to be mmapped by a process whose Smack
+	label does not allow all of the access permitted to a process
+	with the label contained in this attribute. This is a very
+	specific use case for shared libraries.
+SMACK64TRANSMUTE
+	Can only have the value "TRUE". If this attribute is present
+	on a directory when an object is created in the directory and
+	the Smack rule (more below) that permitted the write access
+	to the directory includes the transmute ("t") mode the object
+	gets the label of the directory instead of the label of the
+	creating process. If the object being created is a directory
+	the SMACK64TRANSMUTE attribute is set as well.
+SMACK64IPIN
+	This attribute is only available on file descriptors for sockets.
+	Use the Smack label in this attribute for access control
+	decisions on packets being delivered to this socket.
+SMACK64IPOUT
+	This attribute is only available on file descriptors for sockets.
+	Use the Smack label in this attribute for access control
+	decisions on packets coming from this socket.
+
+There are multiple ways to set a Smack label on a file:
 
     # attr -S -s SMACK64 -V "value" path
+    # chsmack -a value path
 
-NOTE: Smack labels are limited to 23 characters. The attr command
-      does not enforce this restriction and can be used to set
-      invalid Smack labels on files.
-
-If you don't do anything special all users will get the floor ("_")
-label when they log in. If you do want to log in via the hacked ssh
-at other labels use the attr command to set the smack value on the
-home directory and its contents.
+A process can see the smack label it is running with by
+reading /proc/self/attr/current. A process with CAP_MAC_ADMIN
+can set the process smack by writing there.
+
+Most Smack configuration is accomplished by writing to files
+in the smackfs filesystem. This pseudo-filesystem is usually
+mounted on /smack.
+
+access
+	This interface reports whether a subject with the specified
+	Smack label has a particular access to an object with a
+	specified Smack label. Write a fixed format access rule to
+	this file. The next read will indicate whether the access
+	would be permitted. The text will be either "1" indicating
+	access, or "0" indicating denial.
+access2
+	This interface reports whether a subject with the specified
+	Smack label has a particular access to an object with a
+	specified Smack label. Write a long format access rule to
+	this file. The next read will indicate whether the access
+	would be permitted. The text will be either "1" indicating
+	access, or "0" indicating denial.
+ambient
+	This contains the Smack label applied to unlabeled network
+	packets.
+cipso
+	This interface allows a specific CIPSO header to be assigned
+	to a Smack label. The format accepted on write is:
+		"%24s%4d%4d"["%4d"]...
+	The first string is a fixed Smack label. The first number is
+	the level to use. The second number is the number of categories.
+	The following numbers are the categories.
+	"level-3-cats-5-19          3   2   5  19"
+cipso2
+	This interface allows a specific CIPSO header to be assigned
+	to a Smack label. The format accepted on write is:
+	"%s%4d%4d"["%4d"]...
+	The first string is a long Smack label. The first number is
+	the level to use. The second number is the number of categories.
+	The following numbers are the categories.
+	"level-3-cats-5-19   3   2   5  19"
+direct
+	This contains the CIPSO level used for Smack direct label
+	representation in network packets.
+doi
+	This contains the CIPSO domain of interpretation used in
+	network packets.
+load
+	This interface allows access control rules in addition to
+	the system defined rules to be specified. The format accepted
+	on write is:
+		"%24s%24s%5s"
+	where the first string is the subject label, the second the
+	object label, and the third the requested access. The access
+	string may contain only the characters "rwxat-", and specifies
+	which sort of access is allowed. The "-" is a placeholder for
+	permissions that are not allowed. The string "r-x--" would
+	specify read and execute access. Labels are limited to 23
+	characters in length.
+load2
+	This interface allows access control rules in addition to
+	the system defined rules to be specified. The format accepted
+	on write is:
+		"%s %s %s"
+	where the first string is the subject label, the second the
+	object label, and the third the requested access. The access
+	string may contain only the characters "rwxat-", and specifies
+	which sort of access is allowed. The "-" is a placeholder for
+	permissions that are not allowed. The string "r-x--" would
+	specify read and execute access.
+load-self
+	This interface allows process specific access rules to be
+	defined. These rules are only consulted if access would
+	otherwise be permitted, and are intended to provide additional
+	restrictions on the process. The format is the same as for
+	the load interface.
+load-self2
+	This interface allows process specific access rules to be
+	defined. These rules are only consulted if access would
+	otherwise be permitted, and are intended to provide additional
+	restrictions on the process. The format is the same as for
+	the load2 interface.
+logging
+	This contains the Smack logging state.
+mapped
+	This contains the CIPSO level used for Smack mapped label
+	representation in network packets.
+netlabel
+	This interface allows specific internet addresses to be
+	treated as single label hosts. Packets are sent to single
+	label hosts without CIPSO headers, but only from processes
+	that have Smack write access to the host label. All packets
+	received from single label hosts are given the specified
+	label. The format accepted on write is:
+		"%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label".
+onlycap
+	This contains the label processes must have for CAP_MAC_ADMIN
+	and CAP_MAC_OVERRIDE to be effective. If this file is empty
+	these capabilities are effective at for processes with any
+	label. The value is set by writing the desired label to the
+	file or cleared by writing "-" to the file.
 
 You can add access rules in /etc/smack/accesses. They take the form:
 
@@ -83,10 +203,6 @@ access is a combination of the letters rwxa which specify the
 kind of access permitted a subject with subjectlabel on an
 object with objectlabel. If there is no rule no access is allowed.
 
-A process can see the smack label it is running with by
-reading /proc/self/attr/current. A privileged process can
-set the process smack by writing there.
-
 Look for additional programs on http://schaufler-ca.com
 
 From the Smack Whitepaper:
@@ -186,7 +302,7 @@ team. Smack labels are unstructured, case sensitive, and the only operation
 ever performed on them is comparison for equality. Smack labels cannot
 contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
 (quote) and '"' (double-quote) characters.
-Smack labels cannot begin with a '-', which is reserved for special options.
+Smack labels cannot begin with a '-'. This is reserved for special options.
 
 There are some predefined labels:
 
@@ -194,7 +310,7 @@ There are some predefined labels:
 	^ 	Pronounced "hat", a single circumflex character.
 	* 	Pronounced "star", a single asterisk character.
 	? 	Pronounced "huh", a single question mark character.
-	@ 	Pronounced "Internet", a single at sign character.
+	@ 	Pronounced "web", a single at sign character.
 
 Every task on a Smack system is assigned a label. System tasks, such as
 init(8) and systems daemons, are run with the floor ("_") label. User tasks
@@ -246,13 +362,14 @@ The format of an access rule is:
 
 Where subject-label is the Smack label of the task, object-label is the Smack
 label of the thing being accessed, and access is a string specifying the sort
-of access allowed. The Smack labels are limited to 23 characters. The access
-specification is searched for letters that describe access modes:
+of access allowed. The access specification is searched for letters that
+describe access modes:
 
 	a: indicates that append access should be granted.
 	r: indicates that read access should be granted.
 	w: indicates that write access should be granted.
 	x: indicates that execute access should be granted.
+	t: indicates that the rule requests transmutation.
 
 Uppercase values for the specification letters are allowed as well.
 Access mode specifications can be in any order. Examples of acceptable rules
@@ -273,7 +390,7 @@ Examples of unacceptable rules are:
 
 Spaces are not allowed in labels. Since a subject always has access to files
 with the same label specifying a rule for that case is pointless. Only
-valid letters (rwxaRWXA) and the dash ('-') character are allowed in
+valid letters (rwxatRWXAT) and the dash ('-') character are allowed in
 access specifications. The dash is a placeholder, so "a-r" is the same
 as "ar". A lone dash is used to specify that no access should be allowed.
 
@@ -297,6 +414,13 @@ but not any of its attributes by the circumstance of having read access to the
 containing directory but not to the differently labeled file. This is an
 artifact of the file name being data in the directory, not a part of the file.
 
+If a directory is marked as transmuting (SMACK64TRANSMUTE=TRUE) and the
+access rule that allows a process to create an object in that directory
+includes 't' access the label assigned to the new object will be that
+of the directory, not the creating process. This makes it much easier
+for two processes with different labels to share data without granting
+access to all of their files.
+
 IPC objects, message queues, semaphore sets, and memory segments exist in flat
 namespaces and access requests are only required to match the object in
 question.
diff --git a/security/smack/smack.h b/security/smack/smack.h
index cf2594dfa93..5e031a2e4c3 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -23,13 +23,19 @@
 #include <linux/lsm_audit.h>
 
 /*
+ * Smack labels were limited to 23 characters for a long time.
+ */
+#define SMK_LABELLEN	24
+#define SMK_LONGLABEL	256
+
+/*
+ * Maximum number of bytes for the levels in a CIPSO IP option.
  * Why 23? CIPSO is constrained to 30, so a 32 byte buffer is
  * bigger than can be used, and 24 is the next lower multiple
  * of 8, and there are too many issues if there isn't space set
  * aside for the terminating null byte.
  */
-#define SMK_MAXLEN	23
-#define SMK_LABELLEN	(SMK_MAXLEN+1)
+#define SMK_CIPSOLEN	24
 
 struct superblock_smack {
 	char		*smk_root;
@@ -78,15 +84,6 @@ struct smack_rule {
 	int			smk_access;
 };
 
-/*
- * An entry in the table mapping smack values to
- * CIPSO level/category-set values.
- */
-struct smack_cipso {
-	int	smk_level;
-	char	smk_catset[SMK_LABELLEN];
-};
-
 /*
  * An entry in the table identifying hosts.
  */
@@ -114,22 +111,19 @@ struct smk_netlbladdr {
  * interfaces don't. The secid should go away when all of
  * these components have been repaired.
  *
- * If there is a cipso value associated with the label it
- * gets stored here, too. This will most likely be rare as
- * the cipso direct mapping in used internally.
+ * The cipso value associated with the label gets stored here, too.
  *
  * Keep the access rules for this subject label here so that
  * the entire set of rules does not need to be examined every
  * time.
  */
 struct smack_known {
-	struct list_head	list;
-	char			smk_known[SMK_LABELLEN];
-	u32			smk_secid;
-	struct smack_cipso	*smk_cipso;
-	spinlock_t		smk_cipsolock;	/* for changing cipso map */
-	struct list_head	smk_rules;	/* access rules */
-	struct mutex		smk_rules_lock;	/* lock for the rules */
+	struct list_head		list;
+	char				*smk_known;
+	u32				smk_secid;
+	struct netlbl_lsm_secattr	smk_netlabel;	/* on wire labels */
+	struct list_head		smk_rules;	/* access rules */
+	struct mutex			smk_rules_lock;	/* lock for rules */
 };
 
 /*
@@ -166,6 +160,7 @@ struct smack_known {
 #define SMACK_CIPSO_DOI_DEFAULT		3	/* Historical */
 #define SMACK_CIPSO_DOI_INVALID		-1	/* Not a DOI */
 #define SMACK_CIPSO_DIRECT_DEFAULT	250	/* Arbitrary */
+#define SMACK_CIPSO_MAPPED_DEFAULT	251	/* Also arbitrary */
 #define SMACK_CIPSO_MAXCATVAL		63	/* Bigger gets harder */
 #define SMACK_CIPSO_MAXLEVEL            255     /* CIPSO 2.2 standard */
 #define SMACK_CIPSO_MAXCATNUM           239     /* CIPSO 2.2 standard */
@@ -216,10 +211,9 @@ struct inode_smack *new_inode_smack(char *);
 int smk_access_entry(char *, char *, struct list_head *);
 int smk_access(char *, char *, int, struct smk_audit_info *);
 int smk_curacc(char *, u32, struct smk_audit_info *);
-int smack_to_cipso(const char *, struct smack_cipso *);
-char *smack_from_cipso(u32, char *);
 char *smack_from_secid(const u32);
-void smk_parse_smack(const char *string, int len, char *smack);
+char *smk_parse_smack(const char *string, int len);
+int smk_netlbl_mls(int, char *, struct netlbl_lsm_secattr *, int);
 char *smk_import(const char *, int);
 struct smack_known *smk_import_entry(const char *, int);
 struct smack_known *smk_find_entry(const char *);
@@ -229,6 +223,7 @@ u32 smack_to_secid(const char *);
  * Shared data.
  */
 extern int smack_cipso_direct;
+extern int smack_cipso_mapped;
 extern char *smack_net_ambient;
 extern char *smack_onlycap;
 extern const char *smack_cipso_option;
@@ -240,23 +235,12 @@ extern struct smack_known smack_known_invalid;
 extern struct smack_known smack_known_star;
 extern struct smack_known smack_known_web;
 
+extern struct mutex	smack_known_lock;
 extern struct list_head smack_known_list;
 extern struct list_head smk_netlbladdr_list;
 
 extern struct security_operations smack_ops;
 
-/*
- * Stricly for CIPSO level manipulation.
- * Set the category bit number in a smack label sized buffer.
- */
-static inline void smack_catset_bit(int cat, char *catsetp)
-{
-	if (cat > SMK_LABELLEN * 8)
-		return;
-
-	catsetp[(cat - 1) / 8] |= 0x80 >> ((cat - 1) % 8);
-}
-
 /*
  * Is the directory transmuting?
  */
diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c
index c8115f7308f..9f3705e9271 100644
--- a/security/smack/smack_access.c
+++ b/security/smack/smack_access.c
@@ -19,37 +19,31 @@
 struct smack_known smack_known_huh = {
 	.smk_known	= "?",
 	.smk_secid	= 2,
-	.smk_cipso	= NULL,
 };
 
 struct smack_known smack_known_hat = {
 	.smk_known	= "^",
 	.smk_secid	= 3,
-	.smk_cipso	= NULL,
 };
 
 struct smack_known smack_known_star = {
 	.smk_known	= "*",
 	.smk_secid	= 4,
-	.smk_cipso	= NULL,
 };
 
 struct smack_known smack_known_floor = {
 	.smk_known	= "_",
 	.smk_secid	= 5,
-	.smk_cipso	= NULL,
 };
 
 struct smack_known smack_known_invalid = {
 	.smk_known	= "",
 	.smk_secid	= 6,
-	.smk_cipso	= NULL,
 };
 
 struct smack_known smack_known_web = {
 	.smk_known	= "@",
 	.smk_secid	= 7,
-	.smk_cipso	= NULL,
 };
 
 LIST_HEAD(smack_known_list);
@@ -331,7 +325,7 @@ void smack_log(char *subject_label, char *object_label, int request,
 }
 #endif
 
-static DEFINE_MUTEX(smack_known_lock);
+DEFINE_MUTEX(smack_known_lock);
 
 /**
  * smk_find_entry - find a label on the list, return the list entry
@@ -345,7 +339,7 @@ struct smack_known *smk_find_entry(const char *string)
 	struct smack_known *skp;
 
 	list_for_each_entry_rcu(skp, &smack_known_list, list) {
-		if (strncmp(skp->smk_known, string, SMK_MAXLEN) == 0)
+		if (strcmp(skp->smk_known, string) == 0)
 			return skp;
 	}
 
@@ -356,27 +350,76 @@ struct smack_known *smk_find_entry(const char *string)
  * smk_parse_smack - parse smack label from a text string
  * @string: a text string that might contain a Smack label
  * @len: the maximum size, or zero if it is NULL terminated.
- * @smack: parsed smack label, or NULL if parse error
+ *
+ * Returns a pointer to the clean label, or NULL
  */
-void smk_parse_smack(const char *string, int len, char *smack)
+char *smk_parse_smack(const char *string, int len)
 {
-	int found;
+	char *smack;
 	int i;
 
-	if (len <= 0 || len > SMK_MAXLEN)
-		len = SMK_MAXLEN;
-
-	for (i = 0, found = 0; i < SMK_LABELLEN; i++) {
-		if (found)
-			smack[i] = '\0';
-		else if (i >= len || string[i] > '~' || string[i] <= ' ' ||
-			 string[i] == '/' || string[i] == '"' ||
-			 string[i] == '\\' || string[i] == '\'') {
-			smack[i] = '\0';
-			found = 1;
-		} else
-			smack[i] = string[i];
+	if (len <= 0)
+		len = strlen(string) + 1;
+
+	/*
+	 * Reserve a leading '-' as an indicator that
+	 * this isn't a label, but an option to interfaces
+	 * including /smack/cipso and /smack/cipso2
+	 */
+	if (string[0] == '-')
+		return NULL;
+
+	for (i = 0; i < len; i++)
+		if (string[i] > '~' || string[i] <= ' ' || string[i] == '/' ||
+		    string[i] == '"' || string[i] == '\\' || string[i] == '\'')
+			break;
+
+	if (i == 0 || i >= SMK_LONGLABEL)
+		return NULL;
+
+	smack = kzalloc(i + 1, GFP_KERNEL);
+	if (smack != NULL) {
+		strncpy(smack, string, i + 1);
+		smack[i] = '\0';
 	}
+	return smack;
+}
+
+/**
+ * smk_netlbl_mls - convert a catset to netlabel mls categories
+ * @catset: the Smack categories
+ * @sap: where to put the netlabel categories
+ *
+ * Allocates and fills attr.mls
+ * Returns 0 on success, error code on failure.
+ */
+int smk_netlbl_mls(int level, char *catset, struct netlbl_lsm_secattr *sap,
+			int len)
+{
+	unsigned char *cp;
+	unsigned char m;
+	int cat;
+	int rc;
+	int byte;
+
+	sap->flags |= NETLBL_SECATTR_MLS_CAT;
+	sap->attr.mls.lvl = level;
+	sap->attr.mls.cat = netlbl_secattr_catmap_alloc(GFP_ATOMIC);
+	sap->attr.mls.cat->startbit = 0;
+
+	for (cat = 1, cp = catset, byte = 0; byte < len; cp++, byte++)
+		for (m = 0x80; m != 0; m >>= 1, cat++) {
+			if ((m & *cp) == 0)
+				continue;
+			rc = netlbl_secattr_catmap_setbit(sap->attr.mls.cat,
+							  cat, GFP_ATOMIC);
+			if (rc < 0) {
+				netlbl_secattr_catmap_free(sap->attr.mls.cat);
+				return rc;
+			}
+		}
+
+	return 0;
 }
 
 /**
@@ -390,33 +433,59 @@ void smk_parse_smack(const char *string, int len, char *smack)
 struct smack_known *smk_import_entry(const char *string, int len)
 {
 	struct smack_known *skp;
-	char smack[SMK_LABELLEN];
+	char *smack;
+	int slen;
+	int rc;
 
-	smk_parse_smack(string, len, smack);
-	if (smack[0] == '\0')
+	smack = smk_parse_smack(string, len);
+	if (smack == NULL)
 		return NULL;
 
 	mutex_lock(&smack_known_lock);
 
 	skp = smk_find_entry(smack);
+	if (skp != NULL)
+		goto freeout;
 
-	if (skp == NULL) {
-		skp = kzalloc(sizeof(struct smack_known), GFP_KERNEL);
-		if (skp != NULL) {
-			strncpy(skp->smk_known, smack, SMK_MAXLEN);
-			skp->smk_secid = smack_next_secid++;
-			skp->smk_cipso = NULL;
-			INIT_LIST_HEAD(&skp->smk_rules);
-			spin_lock_init(&skp->smk_cipsolock);
-			mutex_init(&skp->smk_rules_lock);
-			/*
-			 * Make sure that the entry is actually
-			 * filled before putting it on the list.
-			 */
-			list_add_rcu(&skp->list, &smack_known_list);
-		}
-	}
+	skp = kzalloc(sizeof(*skp), GFP_KERNEL);
+	if (skp == NULL)
+		goto freeout;
 
+	skp->smk_known = smack;
+	skp->smk_secid = smack_next_secid++;
+	skp->smk_netlabel.domain = skp->smk_known;
+	skp->smk_netlabel.flags =
+		NETLBL_SECATTR_DOMAIN | NETLBL_SECATTR_MLS_LVL;
+	/*
+	 * If direct labeling works use it.
+	 * Otherwise use mapped labeling.
+	 */
+	slen = strlen(smack);
+	if (slen < SMK_CIPSOLEN)
+		rc = smk_netlbl_mls(smack_cipso_direct, skp->smk_known,
+			       &skp->smk_netlabel, slen);
+	else
+		rc = smk_netlbl_mls(smack_cipso_mapped, (char *)&skp->smk_secid,
+			       &skp->smk_netlabel, sizeof(skp->smk_secid));
+
+	if (rc >= 0) {
+		INIT_LIST_HEAD(&skp->smk_rules);
+		mutex_init(&skp->smk_rules_lock);
+		/*
+		 * Make sure that the entry is actually
+		 * filled before putting it on the list.
+		 */
+		list_add_rcu(&skp->list, &smack_known_list);
+		goto unlockout;
+	}
+	/*
+	 * smk_netlbl_mls failed.
+	 */
+	kfree(skp);
+	skp = NULL;
+freeout:
+	kfree(smack);
+unlockout:
 	mutex_unlock(&smack_known_lock);
 
 	return skp;
@@ -479,79 +548,9 @@ char *smack_from_secid(const u32 secid)
  */
 u32 smack_to_secid(const char *smack)
 {
-	struct smack_known *skp;
+	struct smack_known *skp = smk_find_entry(smack);
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(skp, &smack_known_list, list) {
-		if (strncmp(skp->smk_known, smack, SMK_MAXLEN) == 0) {
-			rcu_read_unlock();
-			return skp->smk_secid;
-		}
-	}
-	rcu_read_unlock();
-	return 0;
-}
-
-/**
- * smack_from_cipso - find the Smack label associated with a CIPSO option
- * @level: Bell & LaPadula level from the network
- * @cp: Bell & LaPadula categories from the network
- *
- * This is a simple lookup in the label table.
- *
- * Return the matching label from the label list or NULL.
- */
-char *smack_from_cipso(u32 level, char *cp)
-{
-	struct smack_known *kp;
-	char *final = NULL;
-
-	rcu_read_lock();
-	list_for_each_entry(kp, &smack_known_list, list) {
-		if (kp->smk_cipso == NULL)
-			continue;
-
-		spin_lock_bh(&kp->smk_cipsolock);
-
-		if (kp->smk_cipso->smk_level == level &&
-		    memcmp(kp->smk_cipso->smk_catset, cp, SMK_LABELLEN) == 0)
-			final = kp->smk_known;
-
-		spin_unlock_bh(&kp->smk_cipsolock);
-
-		if (final != NULL)
-			break;
-	}
-	rcu_read_unlock();
-
-	return final;
-}
-
-/**
- * smack_to_cipso - find the CIPSO option to go with a Smack label
- * @smack: a pointer to the smack label in question
- * @cp: where to put the result
- *
- * Returns zero if a value is available, non-zero otherwise.
- */
-int smack_to_cipso(const char *smack, struct smack_cipso *cp)
-{
-	struct smack_known *kp;
-	int found = 0;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(kp, &smack_known_list, list) {
-		if (kp->smk_known == smack ||
-		    strcmp(kp->smk_known, smack) == 0) {
-			found = 1;
-			break;
-		}
-	}
-	rcu_read_unlock();
-
-	if (found == 0 || kp->smk_cipso == NULL)
-		return -ENOENT;
-
-	memcpy(cp, kp->smk_cipso, sizeof(struct smack_cipso));
-	return 0;
+	if (skp == NULL)
+		return 0;
+	return skp->smk_secid;
 }
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 5f80075d771..952b1f41fc7 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -30,7 +30,6 @@
 #include <linux/slab.h>
 #include <linux/mutex.h>
 #include <linux/pipe_fs_i.h>
-#include <net/netlabel.h>
 #include <net/cipso_ipv4.h>
 #include <linux/audit.h>
 #include <linux/magic.h>
@@ -57,16 +56,23 @@
 static char *smk_fetch(const char *name, struct inode *ip, struct dentry *dp)
 {
 	int rc;
-	char in[SMK_LABELLEN];
+	char *buffer;
+	char *result = NULL;
 
 	if (ip->i_op->getxattr == NULL)
 		return NULL;
 
-	rc = ip->i_op->getxattr(dp, name, in, SMK_LABELLEN);
-	if (rc < 0)
+	buffer = kzalloc(SMK_LONGLABEL, GFP_KERNEL);
+	if (buffer == NULL)
 		return NULL;
 
-	return smk_import(in, rc);
+	rc = ip->i_op->getxattr(dp, name, buffer, SMK_LONGLABEL);
+	if (rc > 0)
+		result = smk_import(buffer, rc);
+
+	kfree(buffer);
+
+	return result;
 }
 
 /**
@@ -825,7 +831,7 @@ static int smack_inode_setxattr(struct dentry *dentry, const char *name,
 		 * check label validity here so import wont fail on
 		 * post_setxattr
 		 */
-		if (size == 0 || size >= SMK_LABELLEN ||
+		if (size == 0 || size >= SMK_LONGLABEL ||
 		    smk_import(value, size) == NULL)
 			rc = -EINVAL;
 	} else if (strcmp(name, XATTR_NAME_SMACKTRANSMUTE) == 0) {
@@ -1823,65 +1829,6 @@ static char *smack_host_label(struct sockaddr_in *sip)
 	return NULL;
 }
 
-/**
- * smack_set_catset - convert a capset to netlabel mls categories
- * @catset: the Smack categories
- * @sap: where to put the netlabel categories
- *
- * Allocates and fills attr.mls.cat
- */
-static void smack_set_catset(char *catset, struct netlbl_lsm_secattr *sap)
-{
-	unsigned char *cp;
-	unsigned char m;
-	int cat;
-	int rc;
-	int byte;
-
-	if (!catset)
-		return;
-
-	sap->flags |= NETLBL_SECATTR_MLS_CAT;
-	sap->attr.mls.cat = netlbl_secattr_catmap_alloc(GFP_ATOMIC);
-	sap->attr.mls.cat->startbit = 0;
-
-	for (cat = 1, cp = catset, byte = 0; byte < SMK_LABELLEN; cp++, byte++)
-		for (m = 0x80; m != 0; m >>= 1, cat++) {
-			if ((m & *cp) == 0)
-				continue;
-			rc = netlbl_secattr_catmap_setbit(sap->attr.mls.cat,
-							  cat, GFP_ATOMIC);
-		}
-}
-
-/**
- * smack_to_secattr - fill a secattr from a smack value
- * @smack: the smack value
- * @nlsp: where the result goes
- *
- * Casey says that CIPSO is good enough for now.
- * It can be used to effect.
- * It can also be abused to effect when necessary.
- * Apologies to the TSIG group in general and GW in particular.
- */
-static void smack_to_secattr(char *smack, struct netlbl_lsm_secattr *nlsp)
-{
-	struct smack_cipso cipso;
-	int rc;
-
-	nlsp->domain = smack;
-	nlsp->flags = NETLBL_SECATTR_DOMAIN | NETLBL_SECATTR_MLS_LVL;
-
-	rc = smack_to_cipso(smack, &cipso);
-	if (rc == 0) {
-		nlsp->attr.mls.lvl = cipso.smk_level;
-		smack_set_catset(cipso.smk_catset, nlsp);
-	} else {
-		nlsp->attr.mls.lvl = smack_cipso_direct;
-		smack_set_catset(smack, nlsp);
-	}
-}
-
 /**
  * smack_netlabel - Set the secattr on a socket
  * @sk: the socket
@@ -1894,8 +1841,8 @@ static void smack_to_secattr(char *smack, struct netlbl_lsm_secattr *nlsp)
  */
 static int smack_netlabel(struct sock *sk, int labeled)
 {
+	struct smack_known *skp;
 	struct socket_smack *ssp = sk->sk_security;
-	struct netlbl_lsm_secattr secattr;
 	int rc = 0;
 
 	/*
@@ -1913,10 +1860,8 @@ static int smack_netlabel(struct sock *sk, int labeled)
 	    labeled == SMACK_UNLABELED_SOCKET)
 		netlbl_sock_delattr(sk);
 	else {
-		netlbl_secattr_init(&secattr);
-		smack_to_secattr(ssp->smk_out, &secattr);
-		rc = netlbl_sock_setattr(sk, sk->sk_family, &secattr);
-		netlbl_secattr_destroy(&secattr);
+		skp = smk_find_entry(ssp->smk_out);
+		rc = netlbl_sock_setattr(sk, sk->sk_family, &skp->smk_netlabel);
 	}
 
 	bh_unlock_sock(sk);
@@ -1989,7 +1934,7 @@ static int smack_inode_setsecurity(struct inode *inode, const char *name,
 	struct socket *sock;
 	int rc = 0;
 
-	if (value == NULL || size > SMK_LABELLEN || size == 0)
+	if (value == NULL || size > SMK_LONGLABEL || size == 0)
 		return -EACCES;
 
 	sp = smk_import(value, size);
@@ -2785,7 +2730,7 @@ static int smack_setprocattr(struct task_struct *p, char *name,
 	if (!capable(CAP_MAC_ADMIN))
 		return -EPERM;
 
-	if (value == NULL || size == 0 || size >= SMK_LABELLEN)
+	if (value == NULL || size == 0 || size >= SMK_LONGLABEL)
 		return -EINVAL;
 
 	if (strcmp(name, "current") != 0)
@@ -2921,10 +2866,9 @@ static int smack_socket_sendmsg(struct socket *sock, struct msghdr *msg,
 static char *smack_from_secattr(struct netlbl_lsm_secattr *sap,
 				struct socket_smack *ssp)
 {
-	struct smack_known *skp;
-	char smack[SMK_LABELLEN];
+	struct smack_known *kp;
 	char *sp;
-	int pcat;
+	int found = 0;
 
 	if ((sap->flags & NETLBL_SECATTR_MLS_LVL) != 0) {
 		/*
@@ -2932,59 +2876,27 @@ static char *smack_from_secattr(struct netlbl_lsm_secattr *sap,
 		 * If there are flags but no level netlabel isn't
 		 * behaving the way we expect it to.
 		 *
-		 * Get the categories, if any
+		 * Look it up in the label table
 		 * Without guidance regarding the smack value
 		 * for the packet fall back on the network
 		 * ambient value.
 		 */
-		memset(smack, '\0', SMK_LABELLEN);
-		if ((sap->flags & NETLBL_SECATTR_MLS_CAT) != 0)
-			for (pcat = -1;;) {
-				pcat = netlbl_secattr_catmap_walk(
-					sap->attr.mls.cat, pcat + 1);
-				if (pcat < 0)
-					break;
-				smack_catset_bit(pcat, smack);
-			}
-		/*
-		 * If it is CIPSO using smack direct mapping
-		 * we are already done. WeeHee.
-		 */
-		if (sap->attr.mls.lvl == smack_cipso_direct) {
-			/*
-			 * The label sent is usually on the label list.
-			 *
-			 * If it is not we may still want to allow the
-			 * delivery.
-			 *
-			 * If the recipient is accepting all packets
-			 * because it is using the star ("*") label
-			 * for SMACK64IPIN provide the web ("@") label
-			 * so that a directed response will succeed.
-			 * This is not very correct from a MAC point
-			 * of view, but gets around the problem that
-			 * locking prevents adding the newly discovered
-			 * label to the list.
-			 * The case where the recipient is not using
-			 * the star label should obviously fail.
-			 * The easy way to do this is to provide the
-			 * star label as the subject label.
-			 */
-			skp = smk_find_entry(smack);
-			if (skp != NULL)
-				return skp->smk_known;
-			if (ssp != NULL &&
-			    ssp->smk_in == smack_known_star.smk_known)
-				return smack_known_web.smk_known;
-			return smack_known_star.smk_known;
+		rcu_read_lock();
+		list_for_each_entry(kp, &smack_known_list, list) {
+			if (sap->attr.mls.lvl != kp->smk_netlabel.attr.mls.lvl)
+				continue;
+			if (memcmp(sap->attr.mls.cat,
+				kp->smk_netlabel.attr.mls.cat,
+				SMK_CIPSOLEN) != 0)
+				continue;
+			found = 1;
+			break;
 		}
-		/*
-		 * Look it up in the supplied table if it is not
-		 * a direct mapping.
-		 */
-		sp = smack_from_cipso(sap->attr.mls.lvl, smack);
-		if (sp != NULL)
-			return sp;
+		rcu_read_unlock();
+
+		if (found)
+			return kp->smk_known;
+
 		if (ssp != NULL && ssp->smk_in == smack_known_star.smk_known)
 			return smack_known_web.smk_known;
 		return smack_known_star.smk_known;
@@ -3184,11 +3096,13 @@ static int smack_inet_conn_request(struct sock *sk, struct sk_buff *skb,
 				   struct request_sock *req)
 {
 	u16 family = sk->sk_family;
+	struct smack_known *skp;
 	struct socket_smack *ssp = sk->sk_security;
 	struct netlbl_lsm_secattr secattr;
 	struct sockaddr_in addr;
 	struct iphdr *hdr;
 	char *sp;
+	char *hsp;
 	int rc;
 	struct smk_audit_info ad;
 #ifdef CONFIG_AUDIT
@@ -3235,16 +3149,14 @@ static int smack_inet_conn_request(struct sock *sk, struct sk_buff *skb,
 	hdr = ip_hdr(skb);
 	addr.sin_addr.s_addr = hdr->saddr;
 	rcu_read_lock();
-	if (smack_host_label(&addr) == NULL) {
-		rcu_read_unlock();
-		netlbl_secattr_init(&secattr);
-		smack_to_secattr(sp, &secattr);
-		rc = netlbl_req_setattr(req, &secattr);
-		netlbl_secattr_destroy(&secattr);
-	} else {
-		rcu_read_unlock();
+	hsp = smack_host_label(&addr);
+	rcu_read_unlock();
+
+	if (hsp == NULL) {
+		skp = smk_find_entry(sp);
+		rc = netlbl_req_setattr(req, &skp->smk_netlabel);
+	} else
 		netlbl_req_delattr(req);
-	}
 
 	return rc;
 }
@@ -3668,15 +3580,6 @@ struct security_operations smack_ops = {
 
 static __init void init_smack_known_list(void)
 {
-	/*
-	 * Initialize CIPSO locks
-	 */
-	spin_lock_init(&smack_known_huh.smk_cipsolock);
-	spin_lock_init(&smack_known_hat.smk_cipsolock);
-	spin_lock_init(&smack_known_star.smk_cipsolock);
-	spin_lock_init(&smack_known_floor.smk_cipsolock);
-	spin_lock_init(&smack_known_invalid.smk_cipsolock);
-	spin_lock_init(&smack_known_web.smk_cipsolock);
 	/*
 	 * Initialize rule list locks
 	 */
diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 038811cb7e6..1810c9a4ed4 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -22,7 +22,6 @@
 #include <linux/mutex.h>
 #include <linux/slab.h>
 #include <net/net_namespace.h>
-#include <net/netlabel.h>
 #include <net/cipso_ipv4.h>
 #include <linux/seq_file.h>
 #include <linux/ctype.h>
@@ -45,6 +44,11 @@ enum smk_inos {
 	SMK_LOGGING	= 10,	/* logging */
 	SMK_LOAD_SELF	= 11,	/* task specific rules */
 	SMK_ACCESSES	= 12,	/* access policy */
+	SMK_MAPPED	= 13,	/* CIPSO level indicating mapped label */
+	SMK_LOAD2	= 14,	/* load policy with long labels */
+	SMK_LOAD_SELF2	= 15,	/* load task specific rules with long labels */
+	SMK_ACCESS2	= 16,	/* make an access check with long labels */
+	SMK_CIPSO2	= 17,	/* load long label -> CIPSO mapping */
 };
 
 /*
@@ -60,7 +64,7 @@ static DEFINE_MUTEX(smk_netlbladdr_lock);
  * If it isn't somehow marked, use this.
  * It can be reset via smackfs/ambient
  */
-char *smack_net_ambient = smack_known_floor.smk_known;
+char *smack_net_ambient;
 
 /*
  * This is the level in a CIPSO header that indicates a
@@ -69,6 +73,13 @@ char *smack_net_ambient = smack_known_floor.smk_known;
  */
 int smack_cipso_direct = SMACK_CIPSO_DIRECT_DEFAULT;
 
+/*
+ * This is the level in a CIPSO header that indicates a
+ * secid is contained directly in the category set.
+ * It can be reset via smackfs/mapped
+ */
+int smack_cipso_mapped = SMACK_CIPSO_MAPPED_DEFAULT;
+
 /*
  * Unless a process is running with this label even
  * having CAP_MAC_OVERRIDE isn't enough to grant
@@ -89,7 +100,7 @@ LIST_HEAD(smk_netlbladdr_list);
 
 /*
  * Rule lists are maintained for each label.
- * This master list is just for reading /smack/load.
+ * This master list is just for reading /smack/load and /smack/load2.
  */
 struct smack_master_list {
 	struct list_head	list;
@@ -125,6 +136,18 @@ const char *smack_cipso_option = SMACK_CIPSO_OPTION;
 #define SMK_OLOADLEN	(SMK_LABELLEN + SMK_LABELLEN + SMK_OACCESSLEN)
 #define SMK_LOADLEN	(SMK_LABELLEN + SMK_LABELLEN + SMK_ACCESSLEN)
 
+/*
+ * Stricly for CIPSO level manipulation.
+ * Set the category bit number in a smack label sized buffer.
+ */
+static inline void smack_catset_bit(unsigned int cat, char *catsetp)
+{
+	if (cat == 0 || cat > (SMK_CIPSOLEN * 8))
+		return;
+
+	catsetp[(cat - 1) / 8] |= 0x80 >> ((cat - 1) % 8);
+}
+
 /**
  * smk_netlabel_audit_set - fill a netlbl_audit struct
  * @nap: structure to fill
@@ -137,12 +160,10 @@ static void smk_netlabel_audit_set(struct netlbl_audit *nap)
 }
 
 /*
- * Values for parsing single label host rules
+ * Value for parsing single label host rules
  * "1.2.3.4 X"
- * "192.168.138.129/32 abcdefghijklmnopqrstuvw"
  */
 #define SMK_NETLBLADDRMIN	9
-#define SMK_NETLBLADDRMAX	42
 
 /**
  * smk_set_access - add a rule to the rule list
@@ -188,33 +209,47 @@ static int smk_set_access(struct smack_rule *srp, struct list_head *rule_list,
 }
 
 /**
- * smk_parse_rule - parse Smack rule from load string
- * @data: string to be parsed whose size is SMK_LOADLEN
+ * smk_fill_rule - Fill Smack rule from strings
+ * @subject: subject label string
+ * @object: object label string
+ * @access: access string
  * @rule: Smack rule
  * @import: if non-zero, import labels
+ *
+ * Returns 0 on success, -1 on failure
  */
-static int smk_parse_rule(const char *data, struct smack_rule *rule, int import)
+static int smk_fill_rule(const char *subject, const char *object,
+				const char *access, struct smack_rule *rule,
+				int import)
 {
-	char smack[SMK_LABELLEN];
+	int rc = -1;
+	int done;
+	const char *cp;
 	struct smack_known *skp;
 
 	if (import) {
-		rule->smk_subject = smk_import(data, 0);
+		rule->smk_subject = smk_import(subject, 0);
 		if (rule->smk_subject == NULL)
 			return -1;
 
-		rule->smk_object = smk_import(data + SMK_LABELLEN, 0);
+		rule->smk_object = smk_import(object, 0);
 		if (rule->smk_object == NULL)
 			return -1;
 	} else {
-		smk_parse_smack(data, 0, smack);
-		skp = smk_find_entry(smack);
+		cp = smk_parse_smack(subject, 0);
+		if (cp == NULL)
+			return -1;
+		skp = smk_find_entry(cp);
+		kfree(cp);
 		if (skp == NULL)
 			return -1;
 		rule->smk_subject = skp->smk_known;
 
-		smk_parse_smack(data + SMK_LABELLEN, 0, smack);
-		skp = smk_find_entry(smack);
+		cp = smk_parse_smack(object, 0);
+		if (cp == NULL)
+			return -1;
+		skp = smk_find_entry(cp);
+		kfree(cp);
 		if (skp == NULL)
 			return -1;
 		rule->smk_object = skp->smk_known;
@@ -222,90 +257,127 @@ static int smk_parse_rule(const char *data, struct smack_rule *rule, int import)
 
 	rule->smk_access = 0;
 
-	switch (data[SMK_LABELLEN + SMK_LABELLEN]) {
-	case '-':
-		break;
-	case 'r':
-	case 'R':
-		rule->smk_access |= MAY_READ;
-		break;
-	default:
-		return -1;
+	for (cp = access, done = 0; *cp && !done; cp++) {
+		switch (*cp) {
+		case '-':
+			break;
+		case 'r':
+		case 'R':
+			rule->smk_access |= MAY_READ;
+			break;
+		case 'w':
+		case 'W':
+			rule->smk_access |= MAY_WRITE;
+			break;
+		case 'x':
+		case 'X':
+			rule->smk_access |= MAY_EXEC;
+			break;
+		case 'a':
+		case 'A':
+			rule->smk_access |= MAY_APPEND;
+			break;
+		case 't':
+		case 'T':
+			rule->smk_access |= MAY_TRANSMUTE;
+			break;
+		default:
+			done = 1;
+			break;
+		}
 	}
+	rc = 0;
 
-	switch (data[SMK_LABELLEN + SMK_LABELLEN + 1]) {
-	case '-':
-		break;
-	case 'w':
-	case 'W':
-		rule->smk_access |= MAY_WRITE;
-		break;
-	default:
-		return -1;
-	}
+	return rc;
+}
 
-	switch (data[SMK_LABELLEN + SMK_LABELLEN + 2]) {
-	case '-':
-		break;
-	case 'x':
-	case 'X':
-		rule->smk_access |= MAY_EXEC;
-		break;
-	default:
-		return -1;
-	}
+/**
+ * smk_parse_rule - parse Smack rule from load string
+ * @data: string to be parsed whose size is SMK_LOADLEN
+ * @rule: Smack rule
+ * @import: if non-zero, import labels
+ *
+ * Returns 0 on success, -1 on errors.
+ */
+static int smk_parse_rule(const char *data, struct smack_rule *rule, int import)
+{
+	int rc;
 
-	switch (data[SMK_LABELLEN + SMK_LABELLEN + 3]) {
-	case '-':
-		break;
-	case 'a':
-	case 'A':
-		rule->smk_access |= MAY_APPEND;
-		break;
-	default:
-		return -1;
-	}
+	rc = smk_fill_rule(data, data + SMK_LABELLEN,
+			   data + SMK_LABELLEN + SMK_LABELLEN, rule, import);
+	return rc;
+}
 
-	switch (data[SMK_LABELLEN + SMK_LABELLEN + 4]) {
-	case '-':
-		break;
-	case 't':
-	case 'T':
-		rule->smk_access |= MAY_TRANSMUTE;
-		break;
-	default:
-		return -1;
-	}
+/**
+ * smk_parse_long_rule - parse Smack rule from rule string
+ * @data: string to be parsed, null terminated
+ * @rule: Smack rule
+ * @import: if non-zero, import labels
+ *
+ * Returns 0 on success, -1 on failure
+ */
+static int smk_parse_long_rule(const char *data, struct smack_rule *rule,
+				int import)
+{
+	char *subject;
+	char *object;
+	char *access;
+	int datalen;
+	int rc = -1;
 
-	return 0;
+	/*
+	 * This is probably inefficient, but safe.
+	 */
+	datalen = strlen(data);
+	subject = kzalloc(datalen, GFP_KERNEL);
+	if (subject == NULL)
+		return -1;
+	object = kzalloc(datalen, GFP_KERNEL);
+	if (object == NULL)
+		goto free_out_s;
+	access = kzalloc(datalen, GFP_KERNEL);
+	if (access == NULL)
+		goto free_out_o;
+
+	if (sscanf(data, "%s %s %s", subject, object, access) == 3)
+		rc = smk_fill_rule(subject, object, access, rule, import);
+
+	kfree(access);
+free_out_o:
+	kfree(object);
+free_out_s:
+	kfree(subject);
+	return rc;
 }
 
+#define SMK_FIXED24_FMT	0	/* Fixed 24byte label format */
+#define SMK_LONG_FMT	1	/* Variable long label format */
 /**
- * smk_write_load_list - write() for any /smack/load
+ * smk_write_rules_list - write() for any /smack rule file
  * @file: file pointer, not actually used
  * @buf: where to get the data from
  * @count: bytes sent
  * @ppos: where to start - must be 0
  * @rule_list: the list of rules to write to
  * @rule_lock: lock for the rule list
+ * @format: /smack/load or /smack/load2 format.
  *
  * Get one smack access rule from above.
- * The format is exactly:
- *     char subject[SMK_LABELLEN]
- *     char object[SMK_LABELLEN]
- *     char access[SMK_ACCESSLEN]
- *
- * writes must be SMK_LABELLEN+SMK_LABELLEN+SMK_ACCESSLEN bytes.
+ * The format for SMK_LONG_FMT is:
+ *	"subject<whitespace>object<whitespace>access[<whitespace>...]"
+ * The format for SMK_FIXED24_FMT is exactly:
+ *	"subject                 object                  rwxat"
  */
-static ssize_t smk_write_load_list(struct file *file, const char __user *buf,
-				size_t count, loff_t *ppos,
-				struct list_head *rule_list,
-				struct mutex *rule_lock)
+static ssize_t smk_write_rules_list(struct file *file, const char __user *buf,
+					size_t count, loff_t *ppos,
+					struct list_head *rule_list,
+					struct mutex *rule_lock, int format)
 {
 	struct smack_master_list *smlp;
 	struct smack_known *skp;
 	struct smack_rule *rule;
 	char *data;
+	int datalen;
 	int rc = -EINVAL;
 	int load = 0;
 
@@ -315,13 +387,18 @@ static ssize_t smk_write_load_list(struct file *file, const char __user *buf,
 	 */
 	if (*ppos != 0)
 		return -EINVAL;
-	/*
-	 * Minor hack for backward compatibility
-	 */
-	if (count < (SMK_OLOADLEN) || count > SMK_LOADLEN)
-		return -EINVAL;
 
-	data = kzalloc(SMK_LOADLEN, GFP_KERNEL);
+	if (format == SMK_FIXED24_FMT) {
+		/*
+		 * Minor hack for backward compatibility
+		 */
+		if (count != SMK_OLOADLEN && count != SMK_LOADLEN)
+			return -EINVAL;
+		datalen = SMK_LOADLEN;
+	} else
+		datalen = count + 1;
+
+	data = kzalloc(datalen, GFP_KERNEL);
 	if (data == NULL)
 		return -ENOMEM;
 
@@ -330,20 +407,29 @@ static ssize_t smk_write_load_list(struct file *file, const char __user *buf,
 		goto out;
 	}
 
-	/*
-	 * More on the minor hack for backward compatibility
-	 */
-	if (count == (SMK_OLOADLEN))
-		data[SMK_OLOADLEN] = '-';
-
 	rule = kzalloc(sizeof(*rule), GFP_KERNEL);
 	if (rule == NULL) {
 		rc = -ENOMEM;
 		goto out;
 	}
 
-	if (smk_parse_rule(data, rule, 1))
-		goto out_free_rule;
+	if (format == SMK_LONG_FMT) {
+		/*
+		 * Be sure the data string is terminated.
+		 */
+		data[count] = '\0';
+		if (smk_parse_long_rule(data, rule, 1))
+			goto out_free_rule;
+	} else {
+		/*
+		 * More on the minor hack for backward compatibility
+		 */
+		if (count == (SMK_OLOADLEN))
+			data[SMK_OLOADLEN] = '-';
+		if (smk_parse_rule(data, rule, 1))
+			goto out_free_rule;
+	}
+
 
 	if (rule_list == NULL) {
 		load = 1;
@@ -354,18 +440,20 @@ static ssize_t smk_write_load_list(struct file *file, const char __user *buf,
 
 	rc = count;
 	/*
-	 * If this is "load" as opposed to "load-self" and a new rule
+	 * If this is a global as opposed to self and a new rule
 	 * it needs to get added for reporting.
 	 * smk_set_access returns true if there was already a rule
 	 * for the subject/object pair, and false if it was new.
 	 */
-	if (load && !smk_set_access(rule, rule_list, rule_lock)) {
-		smlp = kzalloc(sizeof(*smlp), GFP_KERNEL);
-		if (smlp != NULL) {
-			smlp->smk_rule = rule;
-			list_add_rcu(&smlp->list, &smack_rule_list);
-		} else
-			rc = -ENOMEM;
+	if (!smk_set_access(rule, rule_list, rule_lock)) {
+		if (load) {
+			smlp = kzalloc(sizeof(*smlp), GFP_KERNEL);
+			if (smlp != NULL) {
+				smlp->smk_rule = rule;
+				list_add_rcu(&smlp->list, &smack_rule_list);
+			} else
+				rc = -ENOMEM;
+		}
 		goto out;
 	}
 
@@ -421,29 +509,18 @@ static void smk_seq_stop(struct seq_file *s, void *v)
 	/* No-op */
 }
 
-/*
- * Seq_file read operations for /smack/load
- */
-
-static void *load_seq_start(struct seq_file *s, loff_t *pos)
-{
-	return smk_seq_start(s, pos, &smack_rule_list);
-}
-
-static void *load_seq_next(struct seq_file *s, void *v, loff_t *pos)
+static void smk_rule_show(struct seq_file *s, struct smack_rule *srp, int max)
 {
-	return smk_seq_next(s, v, pos, &smack_rule_list);
-}
-
-static int load_seq_show(struct seq_file *s, void *v)
-{
-	struct list_head *list = v;
-	struct smack_master_list *smlp =
-		 list_entry(list, struct smack_master_list, list);
-	struct smack_rule *srp = smlp->smk_rule;
+	/*
+	 * Don't show any rules with label names too long for
+	 * interface file (/smack/load or /smack/load2)
+	 * because you should expect to be able to write
+	 * anything you read back.
+	 */
+	if (strlen(srp->smk_subject) >= max || strlen(srp->smk_object) >= max)
+		return;
 
-	seq_printf(s, "%s %s", (char *)srp->smk_subject,
-		   (char *)srp->smk_object);
+	seq_printf(s, "%s %s", srp->smk_subject, srp->smk_object);
 
 	seq_putc(s, ' ');
 
@@ -461,13 +538,36 @@ static int load_seq_show(struct seq_file *s, void *v)
 		seq_putc(s, '-');
 
 	seq_putc(s, '\n');
+}
+
+/*
+ * Seq_file read operations for /smack/load
+ */
+
+static void *load2_seq_start(struct seq_file *s, loff_t *pos)
+{
+	return smk_seq_start(s, pos, &smack_rule_list);
+}
+
+static void *load2_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	return smk_seq_next(s, v, pos, &smack_rule_list);
+}
+
+static int load_seq_show(struct seq_file *s, void *v)
+{
+	struct list_head *list = v;
+	struct smack_master_list *smlp =
+		 list_entry(list, struct smack_master_list, list);
+
+	smk_rule_show(s, smlp->smk_rule, SMK_LABELLEN);
 
 	return 0;
 }
 
 static const struct seq_operations load_seq_ops = {
-	.start = load_seq_start,
-	.next  = load_seq_next,
+	.start = load2_seq_start,
+	.next  = load2_seq_next,
 	.show  = load_seq_show,
 	.stop  = smk_seq_stop,
 };
@@ -504,7 +604,8 @@ static ssize_t smk_write_load(struct file *file, const char __user *buf,
 	if (!capable(CAP_MAC_ADMIN))
 		return -EPERM;
 
-	return smk_write_load_list(file, buf, count, ppos, NULL, NULL);
+	return smk_write_rules_list(file, buf, count, ppos, NULL, NULL,
+				    SMK_FIXED24_FMT);
 }
 
 static const struct file_operations smk_load_ops = {
@@ -574,6 +675,8 @@ static void smk_unlbl_ambient(char *oldambient)
 			printk(KERN_WARNING "%s:%d remove rc = %d\n",
 			       __func__, __LINE__, rc);
 	}
+	if (smack_net_ambient == NULL)
+		smack_net_ambient = smack_known_floor.smk_known;
 
 	rc = netlbl_cfg_unlbl_map_add(smack_net_ambient, PF_INET,
 				      NULL, NULL, &nai);
@@ -605,27 +708,28 @@ static int cipso_seq_show(struct seq_file *s, void *v)
 	struct list_head  *list = v;
 	struct smack_known *skp =
 		 list_entry(list, struct smack_known, list);
-	struct smack_cipso *scp = skp->smk_cipso;
-	char *cbp;
+	struct netlbl_lsm_secattr_catmap *cmp = skp->smk_netlabel.attr.mls.cat;
 	char sep = '/';
-	int cat = 1;
 	int i;
-	unsigned char m;
 
-	if (scp == NULL)
+	/*
+	 * Don't show a label that could not have been set using
+	 * /smack/cipso. This is in support of the notion that
+	 * anything read from /smack/cipso ought to be writeable
+	 * to /smack/cipso.
+	 *
+	 * /smack/cipso2 should be used instead.
+	 */
+	if (strlen(skp->smk_known) >= SMK_LABELLEN)
 		return 0;
 
-	seq_printf(s, "%s %3d", (char *)&skp->smk_known, scp->smk_level);
+	seq_printf(s, "%s %3d", skp->smk_known, skp->smk_netlabel.attr.mls.lvl);
 
-	cbp = scp->smk_catset;
-	for (i = 0; i < SMK_LABELLEN; i++)
-		for (m = 0x80; m != 0; m >>= 1) {
-			if (m & cbp[i]) {
-				seq_printf(s, "%c%d", sep, cat);
-				sep = ',';
-			}
-			cat++;
-		}
+	for (i = netlbl_secattr_catmap_walk(cmp, 0); i >= 0;
+	     i = netlbl_secattr_catmap_walk(cmp, i + 1)) {
+		seq_printf(s, "%c%d", sep, i);
+		sep = ',';
+	}
 
 	seq_putc(s, '\n');
 
@@ -653,23 +757,24 @@ static int smk_open_cipso(struct inode *inode, struct file *file)
 }
 
 /**
- * smk_write_cipso - write() for /smack/cipso
+ * smk_set_cipso - do the work for write() for cipso and cipso2
  * @file: file pointer, not actually used
  * @buf: where to get the data from
  * @count: bytes sent
  * @ppos: where to start
+ * @format: /smack/cipso or /smack/cipso2
  *
  * Accepts only one cipso rule per write call.
  * Returns number of bytes written or error code, as appropriate
  */
-static ssize_t smk_write_cipso(struct file *file, const char __user *buf,
-			       size_t count, loff_t *ppos)
+static ssize_t smk_set_cipso(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos, int format)
 {
 	struct smack_known *skp;
-	struct smack_cipso *scp = NULL;
-	char mapcatset[SMK_LABELLEN];
+	struct netlbl_lsm_secattr ncats;
+	char mapcatset[SMK_CIPSOLEN];
 	int maplevel;
-	int cat;
+	unsigned int cat;
 	int catlen;
 	ssize_t rc = -EINVAL;
 	char *data = NULL;
@@ -686,7 +791,8 @@ static ssize_t smk_write_cipso(struct file *file, const char __user *buf,
 		return -EPERM;
 	if (*ppos != 0)
 		return -EINVAL;
-	if (count < SMK_CIPSOMIN || count > SMK_CIPSOMAX)
+	if (format == SMK_FIXED24_FMT &&
+	    (count < SMK_CIPSOMIN || count > SMK_CIPSOMAX))
 		return -EINVAL;
 
 	data = kzalloc(count + 1, GFP_KERNEL);
@@ -698,11 +804,6 @@ static ssize_t smk_write_cipso(struct file *file, const char __user *buf,
 		goto unlockedout;
 	}
 
-	/* labels cannot begin with a '-' */
-	if (data[0] == '-') {
-		rc = -EINVAL;
-		goto unlockedout;
-	}
 	data[count] = '\0';
 	rule = data;
 	/*
@@ -715,7 +816,11 @@ static ssize_t smk_write_cipso(struct file *file, const char __user *buf,
 	if (skp == NULL)
 		goto out;
 
-	rule += SMK_LABELLEN;
+	if (format == SMK_FIXED24_FMT)
+		rule += SMK_LABELLEN;
+	else
+		rule += strlen(skp->smk_known);
+
 	ret = sscanf(rule, "%d", &maplevel);
 	if (ret != 1 || maplevel > SMACK_CIPSO_MAXLEVEL)
 		goto out;
@@ -725,41 +830,29 @@ static ssize_t smk_write_cipso(struct file *file, const char __user *buf,
 	if (ret != 1 || catlen > SMACK_CIPSO_MAXCATNUM)
 		goto out;
 
-	if (count != (SMK_CIPSOMIN + catlen * SMK_DIGITLEN))
+	if (format == SMK_FIXED24_FMT &&
+	    count != (SMK_CIPSOMIN + catlen * SMK_DIGITLEN))
 		goto out;
 
 	memset(mapcatset, 0, sizeof(mapcatset));
 
 	for (i = 0; i < catlen; i++) {
 		rule += SMK_DIGITLEN;
-		ret = sscanf(rule, "%d", &cat);
+		ret = sscanf(rule, "%u", &cat);
 		if (ret != 1 || cat > SMACK_CIPSO_MAXCATVAL)
 			goto out;
 
 		smack_catset_bit(cat, mapcatset);
 	}
 
-	if (skp->smk_cipso == NULL) {
-		scp = kzalloc(sizeof(struct smack_cipso), GFP_KERNEL);
-		if (scp == NULL) {
-			rc = -ENOMEM;
-			goto out;
-		}
+	rc = smk_netlbl_mls(maplevel, mapcatset, &ncats, SMK_CIPSOLEN);
+	if (rc >= 0) {
+		netlbl_secattr_catmap_free(skp->smk_netlabel.attr.mls.cat);
+		skp->smk_netlabel.attr.mls.cat = ncats.attr.mls.cat;
+		skp->smk_netlabel.attr.mls.lvl = ncats.attr.mls.lvl;
+		rc = count;
 	}
 
-	spin_lock_bh(&skp->smk_cipsolock);
-
-	if (scp == NULL)
-		scp = skp->smk_cipso;
-	else
-		skp->smk_cipso = scp;
-
-	scp->smk_level = maplevel;
-	memcpy(scp->smk_catset, mapcatset, sizeof(mapcatset));
-
-	spin_unlock_bh(&skp->smk_cipsolock);
-
-	rc = count;
 out:
 	mutex_unlock(&smack_cipso_lock);
 unlockedout:
@@ -767,6 +860,22 @@ unlockedout:
 	return rc;
 }
 
+/**
+ * smk_write_cipso - write() for /smack/cipso
+ * @file: file pointer, not actually used
+ * @buf: where to get the data from
+ * @count: bytes sent
+ * @ppos: where to start
+ *
+ * Accepts only one cipso rule per write call.
+ * Returns number of bytes written or error code, as appropriate
+ */
+static ssize_t smk_write_cipso(struct file *file, const char __user *buf,
+			       size_t count, loff_t *ppos)
+{
+	return smk_set_cipso(file, buf, count, ppos, SMK_FIXED24_FMT);
+}
+
 static const struct file_operations smk_cipso_ops = {
 	.open           = smk_open_cipso,
 	.read		= seq_read,
@@ -775,6 +884,80 @@ static const struct file_operations smk_cipso_ops = {
 	.release        = seq_release,
 };
 
+/*
+ * Seq_file read operations for /smack/cipso2
+ */
+
+/*
+ * Print cipso labels in format:
+ * label level[/cat[,cat]]
+ */
+static int cipso2_seq_show(struct seq_file *s, void *v)
+{
+	struct list_head  *list = v;
+	struct smack_known *skp =
+		 list_entry(list, struct smack_known, list);
+	struct netlbl_lsm_secattr_catmap *cmp = skp->smk_netlabel.attr.mls.cat;
+	char sep = '/';
+	int i;
+
+	seq_printf(s, "%s %3d", skp->smk_known, skp->smk_netlabel.attr.mls.lvl);
+
+	for (i = netlbl_secattr_catmap_walk(cmp, 0); i >= 0;
+	     i = netlbl_secattr_catmap_walk(cmp, i + 1)) {
+		seq_printf(s, "%c%d", sep, i);
+		sep = ',';
+	}
+
+	seq_putc(s, '\n');
+
+	return 0;
+}
+
+static const struct seq_operations cipso2_seq_ops = {
+	.start = cipso_seq_start,
+	.next  = cipso_seq_next,
+	.show  = cipso2_seq_show,
+	.stop  = smk_seq_stop,
+};
+
+/**
+ * smk_open_cipso2 - open() for /smack/cipso2
+ * @inode: inode structure representing file
+ * @file: "cipso2" file pointer
+ *
+ * Connect our cipso_seq_* operations with /smack/cipso2
+ * file_operations
+ */
+static int smk_open_cipso2(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &cipso2_seq_ops);
+}
+
+/**
+ * smk_write_cipso2 - write() for /smack/cipso2
+ * @file: file pointer, not actually used
+ * @buf: where to get the data from
+ * @count: bytes sent
+ * @ppos: where to start
+ *
+ * Accepts only one cipso rule per write call.
+ * Returns number of bytes written or error code, as appropriate
+ */
+static ssize_t smk_write_cipso2(struct file *file, const char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	return smk_set_cipso(file, buf, count, ppos, SMK_LONG_FMT);
+}
+
+static const struct file_operations smk_cipso2_ops = {
+	.open           = smk_open_cipso2,
+	.read		= seq_read,
+	.llseek         = seq_lseek,
+	.write		= smk_write_cipso2,
+	.release        = seq_release,
+};
+
 /*
  * Seq_file read operations for /smack/netlabel
  */
@@ -887,9 +1070,9 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 {
 	struct smk_netlbladdr *skp;
 	struct sockaddr_in newname;
-	char smack[SMK_LABELLEN];
+	char *smack;
 	char *sp;
-	char data[SMK_NETLBLADDRMAX + 1];
+	char *data;
 	char *host = (char *)&newname.sin_addr.s_addr;
 	int rc;
 	struct netlbl_audit audit_info;
@@ -911,10 +1094,23 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 		return -EPERM;
 	if (*ppos != 0)
 		return -EINVAL;
-	if (count < SMK_NETLBLADDRMIN || count > SMK_NETLBLADDRMAX)
+	if (count < SMK_NETLBLADDRMIN)
 		return -EINVAL;
-	if (copy_from_user(data, buf, count) != 0)
-		return -EFAULT;
+
+	data = kzalloc(count + 1, GFP_KERNEL);
+	if (data == NULL)
+		return -ENOMEM;
+
+	if (copy_from_user(data, buf, count) != 0) {
+		rc = -EFAULT;
+		goto free_data_out;
+	}
+
+	smack = kzalloc(count + 1, GFP_KERNEL);
+	if (smack == NULL) {
+		rc = -ENOMEM;
+		goto free_data_out;
+	}
 
 	data[count] = '\0';
 
@@ -923,24 +1119,34 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 	if (rc != 6) {
 		rc = sscanf(data, "%hhd.%hhd.%hhd.%hhd %s",
 			&host[0], &host[1], &host[2], &host[3], smack);
-		if (rc != 5)
-			return -EINVAL;
+		if (rc != 5) {
+			rc = -EINVAL;
+			goto free_out;
+		}
 		m = BEBITS;
 	}
-	if (m > BEBITS)
-		return -EINVAL;
+	if (m > BEBITS) {
+		rc = -EINVAL;
+		goto free_out;
+	}
 
-	/* if smack begins with '-', its an option, don't import it */
+	/*
+	 * If smack begins with '-', it is an option, don't import it
+	 */
 	if (smack[0] != '-') {
 		sp = smk_import(smack, 0);
-		if (sp == NULL)
-			return -EINVAL;
+		if (sp == NULL) {
+			rc = -EINVAL;
+			goto free_out;
+		}
 	} else {
 		/* check known options */
 		if (strcmp(smack, smack_cipso_option) == 0)
 			sp = (char *)smack_cipso_option;
-		else
-			return -EINVAL;
+		else {
+			rc = -EINVAL;
+			goto free_out;
+		}
 	}
 
 	for (temp_mask = 0; m > 0; m--) {
@@ -1006,6 +1212,11 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 
 	mutex_unlock(&smk_netlbladdr_lock);
 
+free_out:
+	kfree(smack);
+free_data_out:
+	kfree(data);
+
 	return rc;
 }
 
@@ -1119,6 +1330,7 @@ static ssize_t smk_read_direct(struct file *filp, char __user *buf,
 static ssize_t smk_write_direct(struct file *file, const char __user *buf,
 				size_t count, loff_t *ppos)
 {
+	struct smack_known *skp;
 	char temp[80];
 	int i;
 
@@ -1136,7 +1348,20 @@ static ssize_t smk_write_direct(struct file *file, const char __user *buf,
 	if (sscanf(temp, "%d", &i) != 1)
 		return -EINVAL;
 
-	smack_cipso_direct = i;
+	/*
+	 * Don't do anything if the value hasn't actually changed.
+	 * If it is changing reset the level on entries that were
+	 * set up to be direct when they were created.
+	 */
+	if (smack_cipso_direct != i) {
+		mutex_lock(&smack_known_lock);
+		list_for_each_entry_rcu(skp, &smack_known_list, list)
+			if (skp->smk_netlabel.attr.mls.lvl ==
+			    smack_cipso_direct)
+				skp->smk_netlabel.attr.mls.lvl = i;
+		smack_cipso_direct = i;
+		mutex_unlock(&smack_known_lock);
+	}
 
 	return count;
 }
@@ -1147,6 +1372,84 @@ static const struct file_operations smk_direct_ops = {
 	.llseek		= default_llseek,
 };
 
+/**
+ * smk_read_mapped - read() for /smack/mapped
+ * @filp: file pointer, not actually used
+ * @buf: where to put the result
+ * @count: maximum to send along
+ * @ppos: where to start
+ *
+ * Returns number of bytes read or error code, as appropriate
+ */
+static ssize_t smk_read_mapped(struct file *filp, char __user *buf,
+			       size_t count, loff_t *ppos)
+{
+	char temp[80];
+	ssize_t rc;
+
+	if (*ppos != 0)
+		return 0;
+
+	sprintf(temp, "%d", smack_cipso_mapped);
+	rc = simple_read_from_buffer(buf, count, ppos, temp, strlen(temp));
+
+	return rc;
+}
+
+/**
+ * smk_write_mapped - write() for /smack/mapped
+ * @file: file pointer, not actually used
+ * @buf: where to get the data from
+ * @count: bytes sent
+ * @ppos: where to start
+ *
+ * Returns number of bytes written or error code, as appropriate
+ */
+static ssize_t smk_write_mapped(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct smack_known *skp;
+	char temp[80];
+	int i;
+
+	if (!capable(CAP_MAC_ADMIN))
+		return -EPERM;
+
+	if (count >= sizeof(temp) || count == 0)
+		return -EINVAL;
+
+	if (copy_from_user(temp, buf, count) != 0)
+		return -EFAULT;
+
+	temp[count] = '\0';
+
+	if (sscanf(temp, "%d", &i) != 1)
+		return -EINVAL;
+
+	/*
+	 * Don't do anything if the value hasn't actually changed.
+	 * If it is changing reset the level on entries that were
+	 * set up to be mapped when they were created.
+	 */
+	if (smack_cipso_mapped != i) {
+		mutex_lock(&smack_known_lock);
+		list_for_each_entry_rcu(skp, &smack_known_list, list)
+			if (skp->smk_netlabel.attr.mls.lvl ==
+			    smack_cipso_mapped)
+				skp->smk_netlabel.attr.mls.lvl = i;
+		smack_cipso_mapped = i;
+		mutex_unlock(&smack_known_lock);
+	}
+
+	return count;
+}
+
+static const struct file_operations smk_mapped_ops = {
+	.read		= smk_read_mapped,
+	.write		= smk_write_mapped,
+	.llseek		= default_llseek,
+};
+
 /**
  * smk_read_ambient - read() for /smack/ambient
  * @filp: file pointer, not actually used
@@ -1195,22 +1498,28 @@ static ssize_t smk_read_ambient(struct file *filp, char __user *buf,
 static ssize_t smk_write_ambient(struct file *file, const char __user *buf,
 				 size_t count, loff_t *ppos)
 {
-	char in[SMK_LABELLEN];
 	char *oldambient;
-	char *smack;
+	char *smack = NULL;
+	char *data;
+	int rc = count;
 
 	if (!capable(CAP_MAC_ADMIN))
 		return -EPERM;
 
-	if (count >= SMK_LABELLEN)
-		return -EINVAL;
+	data = kzalloc(count + 1, GFP_KERNEL);
+	if (data == NULL)
+		return -ENOMEM;
 
-	if (copy_from_user(in, buf, count) != 0)
-		return -EFAULT;
+	if (copy_from_user(data, buf, count) != 0) {
+		rc = -EFAULT;
+		goto out;
+	}
 
-	smack = smk_import(in, count);
-	if (smack == NULL)
-		return -EINVAL;
+	smack = smk_import(data, count);
+	if (smack == NULL) {
+		rc = -EINVAL;
+		goto out;
+	}
 
 	mutex_lock(&smack_ambient_lock);
 
@@ -1220,7 +1529,9 @@ static ssize_t smk_write_ambient(struct file *file, const char __user *buf,
 
 	mutex_unlock(&smack_ambient_lock);
 
-	return count;
+out:
+	kfree(data);
+	return rc;
 }
 
 static const struct file_operations smk_ambient_ops = {
@@ -1271,8 +1582,9 @@ static ssize_t smk_read_onlycap(struct file *filp, char __user *buf,
 static ssize_t smk_write_onlycap(struct file *file, const char __user *buf,
 				 size_t count, loff_t *ppos)
 {
-	char in[SMK_LABELLEN];
+	char *data;
 	char *sp = smk_of_task(current->cred->security);
+	int rc = count;
 
 	if (!capable(CAP_MAC_ADMIN))
 		return -EPERM;
@@ -1285,11 +1597,9 @@ static ssize_t smk_write_onlycap(struct file *file, const char __user *buf,
 	if (smack_onlycap != NULL && smack_onlycap != sp)
 		return -EPERM;
 
-	if (count >= SMK_LABELLEN)
-		return -EINVAL;
-
-	if (copy_from_user(in, buf, count) != 0)
-		return -EFAULT;
+	data = kzalloc(count, GFP_KERNEL);
+	if (data == NULL)
+		return -ENOMEM;
 
 	/*
 	 * Should the null string be passed in unset the onlycap value.
@@ -1297,10 +1607,17 @@ static ssize_t smk_write_onlycap(struct file *file, const char __user *buf,
 	 * smk_import only expects to return NULL for errors. It
 	 * is usually the case that a nullstring or "\n" would be
 	 * bad to pass to smk_import but in fact this is useful here.
+	 *
+	 * smk_import will also reject a label beginning with '-',
+	 * so "-usecapabilities" will also work.
 	 */
-	smack_onlycap = smk_import(in, count);
+	if (copy_from_user(data, buf, count) != 0)
+		rc = -EFAULT;
+	else
+		smack_onlycap = smk_import(data, count);
 
-	return count;
+	kfree(data);
+	return rc;
 }
 
 static const struct file_operations smk_onlycap_ops = {
@@ -1398,25 +1715,7 @@ static int load_self_seq_show(struct seq_file *s, void *v)
 	struct smack_rule *srp =
 		 list_entry(list, struct smack_rule, list);
 
-	seq_printf(s, "%s %s", (char *)srp->smk_subject,
-		   (char *)srp->smk_object);
-
-	seq_putc(s, ' ');
-
-	if (srp->smk_access & MAY_READ)
-		seq_putc(s, 'r');
-	if (srp->smk_access & MAY_WRITE)
-		seq_putc(s, 'w');
-	if (srp->smk_access & MAY_EXEC)
-		seq_putc(s, 'x');
-	if (srp->smk_access & MAY_APPEND)
-		seq_putc(s, 'a');
-	if (srp->smk_access & MAY_TRANSMUTE)
-		seq_putc(s, 't');
-	if (srp->smk_access == 0)
-		seq_putc(s, '-');
-
-	seq_putc(s, '\n');
+	smk_rule_show(s, srp, SMK_LABELLEN);
 
 	return 0;
 }
@@ -1430,7 +1729,7 @@ static const struct seq_operations load_self_seq_ops = {
 
 
 /**
- * smk_open_load_self - open() for /smack/load-self
+ * smk_open_load_self - open() for /smack/load-self2
  * @inode: inode structure representing file
  * @file: "load" file pointer
  *
@@ -1454,8 +1753,8 @@ static ssize_t smk_write_load_self(struct file *file, const char __user *buf,
 {
 	struct task_smack *tsp = current_security();
 
-	return smk_write_load_list(file, buf, count, ppos, &tsp->smk_rules,
-					&tsp->smk_rules_lock);
+	return smk_write_rules_list(file, buf, count, ppos, &tsp->smk_rules,
+				    &tsp->smk_rules_lock, SMK_FIXED24_FMT);
 }
 
 static const struct file_operations smk_load_self_ops = {
@@ -1467,24 +1766,42 @@ static const struct file_operations smk_load_self_ops = {
 };
 
 /**
- * smk_write_access - handle access check transaction
+ * smk_user_access - handle access check transaction
  * @file: file pointer
  * @buf: data from user space
  * @count: bytes sent
  * @ppos: where to start - must be 0
  */
-static ssize_t smk_write_access(struct file *file, const char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t smk_user_access(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos, int format)
 {
 	struct smack_rule rule;
 	char *data;
+	char *cod;
 	int res;
 
 	data = simple_transaction_get(file, buf, count);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
-	if (count < SMK_LOADLEN || smk_parse_rule(data, &rule, 0))
+	if (format == SMK_FIXED24_FMT) {
+		if (count < SMK_LOADLEN)
+			return -EINVAL;
+		res = smk_parse_rule(data, &rule, 0);
+	} else {
+		/*
+		 * Copy the data to make sure the string is terminated.
+		 */
+		cod = kzalloc(count + 1, GFP_KERNEL);
+		if (cod == NULL)
+			return -ENOMEM;
+		memcpy(cod, data, count);
+		cod[count] = '\0';
+		res = smk_parse_long_rule(cod, &rule, 0);
+		kfree(cod);
+	}
+
+	if (res)
 		return -EINVAL;
 
 	res = smk_access(rule.smk_subject, rule.smk_object, rule.smk_access,
@@ -1493,7 +1810,23 @@ static ssize_t smk_write_access(struct file *file, const char __user *buf,
 	data[1] = '\0';
 
 	simple_transaction_set(file, 2);
-	return SMK_LOADLEN;
+
+	if (format == SMK_FIXED24_FMT)
+		return SMK_LOADLEN;
+	return count;
+}
+
+/**
+ * smk_write_access - handle access check transaction
+ * @file: file pointer
+ * @buf: data from user space
+ * @count: bytes sent
+ * @ppos: where to start - must be 0
+ */
+static ssize_t smk_write_access(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	return smk_user_access(file, buf, count, ppos, SMK_FIXED24_FMT);
 }
 
 static const struct file_operations smk_access_ops = {
@@ -1503,6 +1836,163 @@ static const struct file_operations smk_access_ops = {
 	.llseek		= generic_file_llseek,
 };
 
+
+/*
+ * Seq_file read operations for /smack/load2
+ */
+
+static int load2_seq_show(struct seq_file *s, void *v)
+{
+	struct list_head *list = v;
+	struct smack_master_list *smlp =
+		 list_entry(list, struct smack_master_list, list);
+
+	smk_rule_show(s, smlp->smk_rule, SMK_LONGLABEL);
+
+	return 0;
+}
+
+static const struct seq_operations load2_seq_ops = {
+	.start = load2_seq_start,
+	.next  = load2_seq_next,
+	.show  = load2_seq_show,
+	.stop  = smk_seq_stop,
+};
+
+/**
+ * smk_open_load2 - open() for /smack/load2
+ * @inode: inode structure representing file
+ * @file: "load2" file pointer
+ *
+ * For reading, use load2_seq_* seq_file reading operations.
+ */
+static int smk_open_load2(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &load2_seq_ops);
+}
+
+/**
+ * smk_write_load2 - write() for /smack/load2
+ * @file: file pointer, not actually used
+ * @buf: where to get the data from
+ * @count: bytes sent
+ * @ppos: where to start - must be 0
+ *
+ */
+static ssize_t smk_write_load2(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	/*
+	 * Must have privilege.
+	 */
+	if (!capable(CAP_MAC_ADMIN))
+		return -EPERM;
+
+	return smk_write_rules_list(file, buf, count, ppos, NULL, NULL,
+				    SMK_LONG_FMT);
+}
+
+static const struct file_operations smk_load2_ops = {
+	.open           = smk_open_load2,
+	.read		= seq_read,
+	.llseek         = seq_lseek,
+	.write		= smk_write_load2,
+	.release        = seq_release,
+};
+
+/*
+ * Seq_file read operations for /smack/load-self2
+ */
+
+static void *load_self2_seq_start(struct seq_file *s, loff_t *pos)
+{
+	struct task_smack *tsp = current_security();
+
+	return smk_seq_start(s, pos, &tsp->smk_rules);
+}
+
+static void *load_self2_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct task_smack *tsp = current_security();
+
+	return smk_seq_next(s, v, pos, &tsp->smk_rules);
+}
+
+static int load_self2_seq_show(struct seq_file *s, void *v)
+{
+	struct list_head *list = v;
+	struct smack_rule *srp =
+		 list_entry(list, struct smack_rule, list);
+
+	smk_rule_show(s, srp, SMK_LONGLABEL);
+
+	return 0;
+}
+
+static const struct seq_operations load_self2_seq_ops = {
+	.start = load_self2_seq_start,
+	.next  = load_self2_seq_next,
+	.show  = load_self2_seq_show,
+	.stop  = smk_seq_stop,
+};
+
+/**
+ * smk_open_load_self2 - open() for /smack/load-self2
+ * @inode: inode structure representing file
+ * @file: "load" file pointer
+ *
+ * For reading, use load_seq_* seq_file reading operations.
+ */
+static int smk_open_load_self2(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &load_self2_seq_ops);
+}
+
+/**
+ * smk_write_load_self2 - write() for /smack/load-self2
+ * @file: file pointer, not actually used
+ * @buf: where to get the data from
+ * @count: bytes sent
+ * @ppos: where to start - must be 0
+ *
+ */
+static ssize_t smk_write_load_self2(struct file *file, const char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	struct task_smack *tsp = current_security();
+
+	return smk_write_rules_list(file, buf, count, ppos, &tsp->smk_rules,
+				    &tsp->smk_rules_lock, SMK_LONG_FMT);
+}
+
+static const struct file_operations smk_load_self2_ops = {
+	.open           = smk_open_load_self2,
+	.read		= seq_read,
+	.llseek         = seq_lseek,
+	.write		= smk_write_load_self2,
+	.release        = seq_release,
+};
+
+/**
+ * smk_write_access2 - handle access check transaction
+ * @file: file pointer
+ * @buf: data from user space
+ * @count: bytes sent
+ * @ppos: where to start - must be 0
+ */
+static ssize_t smk_write_access2(struct file *file, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	return smk_user_access(file, buf, count, ppos, SMK_LONG_FMT);
+}
+
+static const struct file_operations smk_access2_ops = {
+	.write		= smk_write_access2,
+	.read		= simple_transaction_read,
+	.release	= simple_transaction_release,
+	.llseek		= generic_file_llseek,
+};
+
 /**
  * smk_fill_super - fill the /smackfs superblock
  * @sb: the empty superblock
@@ -1539,6 +2029,16 @@ static int smk_fill_super(struct super_block *sb, void *data, int silent)
 			"load-self", &smk_load_self_ops, S_IRUGO|S_IWUGO},
 		[SMK_ACCESSES] = {
 			"access", &smk_access_ops, S_IRUGO|S_IWUGO},
+		[SMK_MAPPED] = {
+			"mapped", &smk_mapped_ops, S_IRUGO|S_IWUSR},
+		[SMK_LOAD2] = {
+			"load2", &smk_load2_ops, S_IRUGO|S_IWUSR},
+		[SMK_LOAD_SELF2] = {
+			"load-self2", &smk_load_self2_ops, S_IRUGO|S_IWUGO},
+		[SMK_ACCESS2] = {
+			"access2", &smk_access2_ops, S_IRUGO|S_IWUGO},
+		[SMK_CIPSO2] = {
+			"cipso2", &smk_cipso2_ops, S_IRUGO|S_IWUSR},
 		/* last one */
 			{""}
 	};
@@ -1581,6 +2081,15 @@ static struct file_system_type smk_fs_type = {
 
 static struct vfsmount *smackfs_mount;
 
+static int __init smk_preset_netlabel(struct smack_known *skp)
+{
+	skp->smk_netlabel.domain = skp->smk_known;
+	skp->smk_netlabel.flags =
+		NETLBL_SECATTR_DOMAIN | NETLBL_SECATTR_MLS_LVL;
+	return smk_netlbl_mls(smack_cipso_direct, skp->smk_known,
+				&skp->smk_netlabel, strlen(skp->smk_known));
+}
+
 /**
  * init_smk_fs - get the smackfs superblock
  *
@@ -1597,6 +2106,7 @@ static struct vfsmount *smackfs_mount;
 static int __init init_smk_fs(void)
 {
 	int err;
+	int rc;
 
 	if (!security_module_enable(&smack_ops))
 		return 0;
@@ -1614,6 +2124,25 @@ static int __init init_smk_fs(void)
 	smk_cipso_doi();
 	smk_unlbl_ambient(NULL);
 
+	rc = smk_preset_netlabel(&smack_known_floor);
+	if (err == 0 && rc < 0)
+		err = rc;
+	rc = smk_preset_netlabel(&smack_known_hat);
+	if (err == 0 && rc < 0)
+		err = rc;
+	rc = smk_preset_netlabel(&smack_known_huh);
+	if (err == 0 && rc < 0)
+		err = rc;
+	rc = smk_preset_netlabel(&smack_known_invalid);
+	if (err == 0 && rc < 0)
+		err = rc;
+	rc = smk_preset_netlabel(&smack_known_star);
+	if (err == 0 && rc < 0)
+		err = rc;
+	rc = smk_preset_netlabel(&smack_known_web);
+	if (err == 0 && rc < 0)
+		err = rc;
+
 	return err;
 }
 
-- 
cgit v1.2.3


From b404aef72fdafb601c945c714164c0ee2b04c364 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Tue, 15 May 2012 14:11:11 +0100
Subject: KEYS: Don't check for NULL key pointer in key_validate()

Don't bother checking for NULL key pointer in key_validate() as all of the
places that call it will crash anyway if the relevant key pointer is NULL by
the time they call key_validate().  Therefore, the checking must be done prior
to calling here.

Whilst we're at it, simplify the key_validate() function a bit and mark its
argument const.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 include/linux/key.h        |  2 +-
 security/keys/permission.c | 40 ++++++++++++++++------------------------
 2 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/include/linux/key.h b/include/linux/key.h
index b145b054b3e..5231800770e 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -242,7 +242,7 @@ extern struct key *request_key_async_with_auxdata(struct key_type *type,
 
 extern int wait_for_key_construction(struct key *key, bool intr);
 
-extern int key_validate(struct key *key);
+extern int key_validate(const struct key *key);
 
 extern key_ref_t key_create_or_update(key_ref_t keyring,
 				      const char *type,
diff --git a/security/keys/permission.c b/security/keys/permission.c
index 5f4c00c0947..57d96363d7f 100644
--- a/security/keys/permission.c
+++ b/security/keys/permission.c
@@ -91,33 +91,25 @@ EXPORT_SYMBOL(key_task_permission);
  * key is invalidated, -EKEYREVOKED if the key's type has been removed or if
  * the key has been revoked or -EKEYEXPIRED if the key has expired.
  */
-int key_validate(struct key *key)
+int key_validate(const struct key *key)
 {
-	struct timespec now;
 	unsigned long flags = key->flags;
-	int ret = 0;
-
-	if (key) {
-		ret = -ENOKEY;
-		if (flags & (1 << KEY_FLAG_INVALIDATED))
-			goto error;
-
-		/* check it's still accessible */
-		ret = -EKEYREVOKED;
-		if (flags & ((1 << KEY_FLAG_REVOKED) |
-			     (1 << KEY_FLAG_DEAD)))
-			goto error;
-
-		/* check it hasn't expired */
-		ret = 0;
-		if (key->expiry) {
-			now = current_kernel_time();
-			if (now.tv_sec >= key->expiry)
-				ret = -EKEYEXPIRED;
-		}
+
+	if (flags & (1 << KEY_FLAG_INVALIDATED))
+		return -ENOKEY;
+
+	/* check it's still accessible */
+	if (flags & ((1 << KEY_FLAG_REVOKED) |
+		     (1 << KEY_FLAG_DEAD)))
+		return -EKEYREVOKED;
+
+	/* check it hasn't expired */
+	if (key->expiry) {
+		struct timespec now = current_kernel_time();
+		if (now.tv_sec >= key->expiry)
+			return -EKEYEXPIRED;
 	}
 
-error:
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(key_validate);
-- 
cgit v1.2.3


From fbbb456347b21279a379b42eeb31151c33d8dd49 Mon Sep 17 00:00:00 2001
From: Mimi Zohar <zohar@us.ibm.com>
Date: Mon, 14 May 2012 21:50:11 -0400
Subject: ima: fix filename hint to reflect script interpreter name

When IMA was first upstreamed, the bprm filename and interp were
always the same.  Currently, the bprm->filename and bprm->interp
are the same, except for when only bprm->interp contains the
interpreter name.  So instead of using the bprm->filename as
the IMA filename hint in the measurement list, we could replace
it with bprm->interp, but this feels too fragil.

The following patch is not much better, but at least there is some
indication that sometimes we're passing the filename and other times
the interpreter name.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 security/integrity/ima/ima_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 1eff5cb001e..b17be79b9cf 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -194,7 +194,9 @@ int ima_bprm_check(struct linux_binprm *bprm)
 {
 	int rc;
 
-	rc = process_measurement(bprm->file, bprm->filename,
+	rc = process_measurement(bprm->file,
+				 (strcmp(bprm->filename, bprm->interp) == 0) ?
+				 bprm->filename : bprm->interp,
 				 MAY_EXEC, BPRM_CHECK);
 	return 0;
 }
-- 
cgit v1.2.3


From bf83208e0b7f5938f5a7f6d9dfa9960bf04692fa Mon Sep 17 00:00:00 2001
From: John Johansen <john.johansen@canonical.com>
Date: Wed, 16 May 2012 11:00:05 -0700
Subject: apparmor: fix profile lookup for unconfined

BugLink: http://bugs.launchpad.net/bugs/978038

also affects apparmor portion of
BugLink: http://bugs.launchpad.net/bugs/987371

The unconfined profile is not stored in the regular profile list, but
change_profile and exec transitions may want access to it when setting
up specialized transitions like switch to the unconfined profile of a
new policy namespace.

Signed-off-by: John Johansen <john.johansen@canonical.com>
---
 security/apparmor/policy.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
index f1f7506a464..7f3f455d8ea 100644
--- a/security/apparmor/policy.c
+++ b/security/apparmor/policy.c
@@ -903,6 +903,10 @@ struct aa_profile *aa_lookup_profile(struct aa_namespace *ns, const char *hname)
 	profile = aa_get_profile(__lookup_profile(&ns->base, hname));
 	read_unlock(&ns->lock);
 
+	/* the unconfined profile is not in the regular profile list */
+	if (!profile && strcmp(hname, "unconfined") == 0)
+		profile = aa_get_profile(ns->unconfined);
+
 	/* refcount released by caller */
 	return profile;
 }
-- 
cgit v1.2.3


From cffee16e8b997ab947de661e8820e486b0830c94 Mon Sep 17 00:00:00 2001
From: John Johansen <john.johansen@canonical.com>
Date: Wed, 16 May 2012 11:01:05 -0700
Subject: apparmor: fix long path failure due to disconnected path

BugLink: http://bugs.launchpad.net/bugs/955892

All failures from __d_path where being treated as disconnected paths,
however __d_path can also fail when the generated pathname is too long.

The initial ENAMETOOLONG error was being lost, and ENAMETOOLONG was only
returned if the subsequent dentry_path call resulted in that error.  Other
wise if the path was split across a mount point such that the dentry_path
fit within the buffer when the __d_path did not the failure was treated
as a disconnected path.

Signed-off-by: John Johansen <john.johansen@canonical.com>
---
 security/apparmor/path.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/security/apparmor/path.c b/security/apparmor/path.c
index 2daeea4f926..e91ffee8016 100644
--- a/security/apparmor/path.c
+++ b/security/apparmor/path.c
@@ -94,6 +94,8 @@ static int d_namespace_path(struct path *path, char *buf, int buflen,
 	 * be returned.
 	 */
 	if (!res || IS_ERR(res)) {
+		if (PTR_ERR(res) == -ENAMETOOLONG)
+			return -ENAMETOOLONG;
 		connected = 0;
 		res = dentry_path_raw(path->dentry, buf, buflen);
 		if (IS_ERR(res)) {
-- 
cgit v1.2.3