From 2e40e163a25af3bd35d128d3e2e005916de5cce6 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 15 Apr 2015 16:15:23 -0700 Subject: zsmalloc: decouple handle and object Recently, we started to use zram heavily and some of issues popped. 1) external fragmentation I got a report from Juneho Choi that fork failed although there are plenty of free pages in the system. His investigation revealed zram is one of the culprit to make heavy fragmentation so there was no more contiguous 16K page for pgd to fork in the ARM. 2) non-movable pages Other problem of zram now is that inherently, user want to use zram as swap in small memory system so they use zRAM with CMA to use memory efficiently. However, unfortunately, it doesn't work well because zRAM cannot use CMA's movable pages unless it doesn't support compaction. I got several reports about that OOM happened with zram although there are lots of swap space and free space in CMA area. 3) internal fragmentation zRAM has started support memory limitation feature to limit memory usage and I sent a patchset(https://lkml.org/lkml/2014/9/21/148) for VM to be harmonized with zram-swap to stop anonymous page reclaim if zram consumed memory up to the limit although there are free space on the swap. One problem for that direction is zram has no way to know any hole in memory space zsmalloc allocated by internal fragmentation so zram would regard swap is full although there are free space in zsmalloc. For solving the issue, zram want to trigger compaction of zsmalloc before it decides full or not. This patchset is first step to support above issues. For that, it adds indirect layer between handle and object location and supports manual compaction to solve 3th problem first of all. After this patchset got merged, next step is to make VM aware of zsmalloc compaction so that generic compaction will move zsmalloced-pages automatically in runtime. In my imaginary experiment(ie, high compress ratio data with heavy swap in/out on 8G zram-swap), data is as follows, Before = zram allocated object : 60212066 bytes zram total used: 140103680 bytes ratio: 42.98 percent MemFree: 840192 kB Compaction After = frag ratio after compaction zram allocated object : 60212066 bytes zram total used: 76185600 bytes ratio: 79.03 percent MemFree: 901932 kB Juneho reported below in his real platform with small aging. So, I think the benefit would be bigger in real aging system for a long time. - frag_ratio increased 3% (ie, higher is better) - memfree increased about 6MB - In buddy info, Normal 2^3: 4, 2^2: 1: 2^1 increased, Highmem: 2^1 21 increased frag ratio after swap fragment used : 156677 kbytes total: 166092 kbytes frag_ratio : 94 meminfo before compaction MemFree: 83724 kB Node 0, zone Normal 13642 1364 57 10 61 17 9 5 4 0 0 Node 0, zone HighMem 425 29 1 0 0 0 0 0 0 0 0 num_migrated : 23630 compaction done frag ratio after compaction used : 156673 kbytes total: 160564 kbytes frag_ratio : 97 meminfo after compaction MemFree: 89060 kB Node 0, zone Normal 14076 1544 67 14 61 17 9 5 4 0 0 Node 0, zone HighMem 863 50 1 0 0 0 0 0 0 0 0 This patchset adds more logics(about 480 lines) in zsmalloc but when I tested heavy swapin/out program, the regression for swapin/out speed is marginal because most of overheads were caused by compress/decompress and other MM reclaim stuff. This patch (of 7): Currently, handle of zsmalloc encodes object's location directly so it makes support of migration hard. This patch decouples handle and object via adding indirect layer. For that, it allocates handle dynamically and returns it to user. The handle is the address allocated by slab allocation so it's unique and we could keep object's location in the memory space allocated for handle. With it, we can change object's position without changing handle itself. Signed-off-by: Minchan Kim Cc: Juneho Choi Cc: Gunho Lee Cc: Luigi Semenzato Cc: Dan Streetman Cc: Seth Jennings Cc: Nitin Gupta Cc: Jerome Marchand Cc: Sergey Senozhatsky Cc: Joonsoo Kim Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 126 +++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 98 insertions(+), 28 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 0dec1fa5f656..6f3cfbf5e237 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -110,6 +110,8 @@ #define ZS_MAX_ZSPAGE_ORDER 2 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) +#define ZS_HANDLE_SIZE (sizeof(unsigned long)) + /* * Object location (, ) is encoded as * as single (unsigned long) handle value. @@ -140,7 +142,8 @@ /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */ #define ZS_MIN_ALLOC_SIZE \ MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) -#define ZS_MAX_ALLOC_SIZE PAGE_SIZE +/* each chunk includes extra space to keep handle */ +#define ZS_MAX_ALLOC_SIZE (PAGE_SIZE + ZS_HANDLE_SIZE) /* * On systems with 4K page size, this gives 255 size classes! There is a @@ -233,14 +236,24 @@ struct size_class { * This must be power of 2 and less than or equal to ZS_ALIGN */ struct link_free { - /* Handle of next free chunk (encodes ) */ - void *next; + union { + /* + * Position of next free chunk (encodes ) + * It's valid for non-allocated object + */ + void *next; + /* + * Handle of allocated object. + */ + unsigned long handle; + }; }; struct zs_pool { char *name; struct size_class **size_class; + struct kmem_cache *handle_cachep; gfp_t flags; /* allocation flags used when growing pool */ atomic_long_t pages_allocated; @@ -269,6 +282,34 @@ struct mapping_area { enum zs_mapmode vm_mm; /* mapping mode */ }; +static int create_handle_cache(struct zs_pool *pool) +{ + pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE, + 0, 0, NULL); + return pool->handle_cachep ? 0 : 1; +} + +static void destroy_handle_cache(struct zs_pool *pool) +{ + kmem_cache_destroy(pool->handle_cachep); +} + +static unsigned long alloc_handle(struct zs_pool *pool) +{ + return (unsigned long)kmem_cache_alloc(pool->handle_cachep, + pool->flags & ~__GFP_HIGHMEM); +} + +static void free_handle(struct zs_pool *pool, unsigned long handle) +{ + kmem_cache_free(pool->handle_cachep, (void *)handle); +} + +static void record_obj(unsigned long handle, unsigned long obj) +{ + *(unsigned long *)handle = obj; +} + /* zpool driver */ #ifdef CONFIG_ZPOOL @@ -595,13 +636,18 @@ static void *obj_location_to_handle(struct page *page, unsigned long obj_idx) * decoded obj_idx back to its original value since it was adjusted in * obj_location_to_handle(). */ -static void obj_handle_to_location(unsigned long handle, struct page **page, +static void obj_to_location(unsigned long handle, struct page **page, unsigned long *obj_idx) { *page = pfn_to_page(handle >> OBJ_INDEX_BITS); *obj_idx = (handle & OBJ_INDEX_MASK) - 1; } +static unsigned long handle_to_obj(unsigned long handle) +{ + return *(unsigned long *)handle; +} + static unsigned long obj_idx_to_offset(struct page *page, unsigned long obj_idx, int class_size) { @@ -860,12 +906,16 @@ static void __zs_unmap_object(struct mapping_area *area, { int sizes[2]; void *addr; - char *buf = area->vm_buf; + char *buf; /* no write fastpath */ if (area->vm_mm == ZS_MM_RO) goto out; + buf = area->vm_buf + ZS_HANDLE_SIZE; + size -= ZS_HANDLE_SIZE; + off += ZS_HANDLE_SIZE; + sizes[0] = PAGE_SIZE - off; sizes[1] = size - sizes[0]; @@ -1153,13 +1203,14 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle, enum zs_mapmode mm) { struct page *page; - unsigned long obj_idx, off; + unsigned long obj, obj_idx, off; unsigned int class_idx; enum fullness_group fg; struct size_class *class; struct mapping_area *area; struct page *pages[2]; + void *ret; BUG_ON(!handle); @@ -1170,7 +1221,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle, */ BUG_ON(in_interrupt()); - obj_handle_to_location(handle, &page, &obj_idx); + obj = handle_to_obj(handle); + obj_to_location(obj, &page, &obj_idx); get_zspage_mapping(get_first_page(page), &class_idx, &fg); class = pool->size_class[class_idx]; off = obj_idx_to_offset(page, obj_idx, class->size); @@ -1180,7 +1232,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle, if (off + class->size <= PAGE_SIZE) { /* this object is contained entirely within a page */ area->vm_addr = kmap_atomic(page); - return area->vm_addr + off; + ret = area->vm_addr + off; + goto out; } /* this object spans two pages */ @@ -1188,14 +1241,16 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle, pages[1] = get_next_page(page); BUG_ON(!pages[1]); - return __zs_map_object(area, pages, off, class->size); + ret = __zs_map_object(area, pages, off, class->size); +out: + return ret + ZS_HANDLE_SIZE; } EXPORT_SYMBOL_GPL(zs_map_object); void zs_unmap_object(struct zs_pool *pool, unsigned long handle) { struct page *page; - unsigned long obj_idx, off; + unsigned long obj, obj_idx, off; unsigned int class_idx; enum fullness_group fg; @@ -1204,7 +1259,8 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle) BUG_ON(!handle); - obj_handle_to_location(handle, &page, &obj_idx); + obj = handle_to_obj(handle); + obj_to_location(obj, &page, &obj_idx); get_zspage_mapping(get_first_page(page), &class_idx, &fg); class = pool->size_class[class_idx]; off = obj_idx_to_offset(page, obj_idx, class->size); @@ -1236,7 +1292,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object); */ unsigned long zs_malloc(struct zs_pool *pool, size_t size) { - unsigned long obj; + unsigned long handle, obj; struct link_free *link; struct size_class *class; void *vaddr; @@ -1244,9 +1300,15 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) struct page *first_page, *m_page; unsigned long m_objidx, m_offset; - if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE)) + if (unlikely(!size || (size + ZS_HANDLE_SIZE) > ZS_MAX_ALLOC_SIZE)) + return 0; + + handle = alloc_handle(pool); + if (!handle) return 0; + /* extra space in chunk to keep the handle */ + size += ZS_HANDLE_SIZE; class = pool->size_class[get_size_class_index(size)]; spin_lock(&class->lock); @@ -1255,8 +1317,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) if (!first_page) { spin_unlock(&class->lock); first_page = alloc_zspage(class, pool->flags); - if (unlikely(!first_page)) + if (unlikely(!first_page)) { + free_handle(pool, handle); return 0; + } set_zspage_mapping(first_page, class->index, ZS_EMPTY); atomic_long_add(class->pages_per_zspage, @@ -1268,40 +1332,45 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) } obj = (unsigned long)first_page->freelist; - obj_handle_to_location(obj, &m_page, &m_objidx); + obj_to_location(obj, &m_page, &m_objidx); m_offset = obj_idx_to_offset(m_page, m_objidx, class->size); vaddr = kmap_atomic(m_page); link = (struct link_free *)vaddr + m_offset / sizeof(*link); first_page->freelist = link->next; - memset(link, POISON_INUSE, sizeof(*link)); + + /* record handle in the header of allocated chunk */ + link->handle = handle; kunmap_atomic(vaddr); first_page->inuse++; zs_stat_inc(class, OBJ_USED, 1); /* Now move the zspage to another fullness group, if required */ fix_fullness_group(pool, first_page); + record_obj(handle, obj); spin_unlock(&class->lock); - return obj; + return handle; } EXPORT_SYMBOL_GPL(zs_malloc); -void zs_free(struct zs_pool *pool, unsigned long obj) +void zs_free(struct zs_pool *pool, unsigned long handle) { struct link_free *link; struct page *first_page, *f_page; - unsigned long f_objidx, f_offset; + unsigned long obj, f_objidx, f_offset; void *vaddr; int class_idx; struct size_class *class; enum fullness_group fullness; - if (unlikely(!obj)) + if (unlikely(!handle)) return; - obj_handle_to_location(obj, &f_page, &f_objidx); + obj = handle_to_obj(handle); + free_handle(pool, handle); + obj_to_location(obj, &f_page, &f_objidx); first_page = get_first_page(f_page); get_zspage_mapping(first_page, &class_idx, &fullness); @@ -1355,20 +1424,20 @@ struct zs_pool *zs_create_pool(char *name, gfp_t flags) if (!pool) return NULL; - pool->name = kstrdup(name, GFP_KERNEL); - if (!pool->name) { - kfree(pool); - return NULL; - } - pool->size_class = kcalloc(zs_size_classes, sizeof(struct size_class *), GFP_KERNEL); if (!pool->size_class) { - kfree(pool->name); kfree(pool); return NULL; } + pool->name = kstrdup(name, GFP_KERNEL); + if (!pool->name) + goto err; + + if (create_handle_cache(pool)) + goto err; + /* * Iterate reversly, because, size of size_class that we want to use * for merging should be larger or equal to current size. @@ -1450,6 +1519,7 @@ void zs_destroy_pool(struct zs_pool *pool) kfree(class); } + destroy_handle_cache(pool); kfree(pool->size_class); kfree(pool->name); kfree(pool); -- cgit v1.2.3 From c78062612fb525430b775a0bef4d3cc07e512da0 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 15 Apr 2015 16:15:26 -0700 Subject: zsmalloc: factor out obj_[malloc|free] In later patch, migration needs some part of functions in zs_malloc and zs_free so this patch factor out them. Signed-off-by: Minchan Kim Cc: Juneho Choi Cc: Gunho Lee Cc: Luigi Semenzato Cc: Dan Streetman Cc: Seth Jennings Cc: Nitin Gupta Cc: Jerome Marchand Cc: Sergey Senozhatsky Cc: Joonsoo Kim Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 98 ++++++++++++++++++++++++++++++++++++----------------------- 1 file changed, 60 insertions(+), 38 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 6f3cfbf5e237..55b171016f4f 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -525,11 +525,10 @@ static void remove_zspage(struct page *page, struct size_class *class, * page from the freelist of the old fullness group to that of the new * fullness group. */ -static enum fullness_group fix_fullness_group(struct zs_pool *pool, +static enum fullness_group fix_fullness_group(struct size_class *class, struct page *page) { int class_idx; - struct size_class *class; enum fullness_group currfg, newfg; BUG_ON(!is_first_page(page)); @@ -539,7 +538,6 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool, if (newfg == currfg) goto out; - class = pool->size_class[class_idx]; remove_zspage(page, class, currfg); insert_zspage(page, class, newfg); set_zspage_mapping(page, class_idx, newfg); @@ -1281,6 +1279,33 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle) } EXPORT_SYMBOL_GPL(zs_unmap_object); +static unsigned long obj_malloc(struct page *first_page, + struct size_class *class, unsigned long handle) +{ + unsigned long obj; + struct link_free *link; + + struct page *m_page; + unsigned long m_objidx, m_offset; + void *vaddr; + + obj = (unsigned long)first_page->freelist; + obj_to_location(obj, &m_page, &m_objidx); + m_offset = obj_idx_to_offset(m_page, m_objidx, class->size); + + vaddr = kmap_atomic(m_page); + link = (struct link_free *)vaddr + m_offset / sizeof(*link); + first_page->freelist = link->next; + /* record handle in the header of allocated chunk */ + link->handle = handle; + kunmap_atomic(vaddr); + first_page->inuse++; + zs_stat_inc(class, OBJ_USED, 1); + + return obj; +} + + /** * zs_malloc - Allocate block of given size from pool. * @pool: pool to allocate from @@ -1293,12 +1318,8 @@ EXPORT_SYMBOL_GPL(zs_unmap_object); unsigned long zs_malloc(struct zs_pool *pool, size_t size) { unsigned long handle, obj; - struct link_free *link; struct size_class *class; - void *vaddr; - - struct page *first_page, *m_page; - unsigned long m_objidx, m_offset; + struct page *first_page; if (unlikely(!size || (size + ZS_HANDLE_SIZE) > ZS_MAX_ALLOC_SIZE)) return 0; @@ -1331,22 +1352,9 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) class->size, class->pages_per_zspage)); } - obj = (unsigned long)first_page->freelist; - obj_to_location(obj, &m_page, &m_objidx); - m_offset = obj_idx_to_offset(m_page, m_objidx, class->size); - - vaddr = kmap_atomic(m_page); - link = (struct link_free *)vaddr + m_offset / sizeof(*link); - first_page->freelist = link->next; - - /* record handle in the header of allocated chunk */ - link->handle = handle; - kunmap_atomic(vaddr); - - first_page->inuse++; - zs_stat_inc(class, OBJ_USED, 1); + obj = obj_malloc(first_page, class, handle); /* Now move the zspage to another fullness group, if required */ - fix_fullness_group(pool, first_page); + fix_fullness_group(class, first_page); record_obj(handle, obj); spin_unlock(&class->lock); @@ -1354,46 +1362,60 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) } EXPORT_SYMBOL_GPL(zs_malloc); -void zs_free(struct zs_pool *pool, unsigned long handle) +static void obj_free(struct zs_pool *pool, struct size_class *class, + unsigned long obj) { struct link_free *link; struct page *first_page, *f_page; - unsigned long obj, f_objidx, f_offset; + unsigned long f_objidx, f_offset; void *vaddr; - int class_idx; - struct size_class *class; enum fullness_group fullness; - if (unlikely(!handle)) - return; + BUG_ON(!obj); - obj = handle_to_obj(handle); - free_handle(pool, handle); obj_to_location(obj, &f_page, &f_objidx); first_page = get_first_page(f_page); get_zspage_mapping(first_page, &class_idx, &fullness); - class = pool->size_class[class_idx]; f_offset = obj_idx_to_offset(f_page, f_objidx, class->size); - spin_lock(&class->lock); + vaddr = kmap_atomic(f_page); /* Insert this object in containing zspage's freelist */ - vaddr = kmap_atomic(f_page); link = (struct link_free *)(vaddr + f_offset); link->next = first_page->freelist; kunmap_atomic(vaddr); first_page->freelist = (void *)obj; - first_page->inuse--; - fullness = fix_fullness_group(pool, first_page); - zs_stat_dec(class, OBJ_USED, 1); +} + +void zs_free(struct zs_pool *pool, unsigned long handle) +{ + struct page *first_page, *f_page; + unsigned long obj, f_objidx; + int class_idx; + struct size_class *class; + enum fullness_group fullness; + + if (unlikely(!handle)) + return; + + obj = handle_to_obj(handle); + free_handle(pool, handle); + obj_to_location(obj, &f_page, &f_objidx); + first_page = get_first_page(f_page); + + get_zspage_mapping(first_page, &class_idx, &fullness); + class = pool->size_class[class_idx]; + + spin_lock(&class->lock); + obj_free(pool, class, obj); + fullness = fix_fullness_group(class, first_page); if (fullness == ZS_EMPTY) zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( class->size, class->pages_per_zspage)); - spin_unlock(&class->lock); if (fullness == ZS_EMPTY) { -- cgit v1.2.3 From 312fcae227037619dc858c9ccd362c7b847730a2 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 15 Apr 2015 16:15:30 -0700 Subject: zsmalloc: support compaction This patch provides core functions for migration of zsmalloc. Migraion policy is simple as follows. for each size class { while { src_page = get zs_page from ZS_ALMOST_EMPTY if (!src_page) break; dst_page = get zs_page from ZS_ALMOST_FULL if (!dst_page) dst_page = get zs_page from ZS_ALMOST_EMPTY if (!dst_page) break; migrate(from src_page, to dst_page); } } For migration, we need to identify which objects in zspage are allocated to migrate them out. We could know it by iterating of freed objects in a zspage because first_page of zspage keeps free objects singly-linked list but it's not efficient. Instead, this patch adds a tag(ie, OBJ_ALLOCATED_TAG) in header of each object(ie, handle) so we could check whether the object is allocated easily. This patch adds another status bit in handle to synchronize between user access through zs_map_object and migration. During migration, we cannot move objects user are using due to data coherency between old object and new object. [akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()] Signed-off-by: Minchan Kim Cc: Juneho Choi Cc: Gunho Lee Cc: Luigi Semenzato Cc: Dan Streetman Cc: Seth Jennings Cc: Nitin Gupta Cc: Jerome Marchand Cc: Sergey Senozhatsky Cc: Joonsoo Kim Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 378 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 359 insertions(+), 19 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 55b171016f4f..c4ae608dc725 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -78,6 +78,7 @@ #include #include +#include #include #include #include @@ -135,7 +136,26 @@ #endif #endif #define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT) -#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS) + +/* + * Memory for allocating for handle keeps object position by + * encoding and the encoded value has a room + * in least bit(ie, look at obj_to_location). + * We use the bit to synchronize between object access by + * user and migration. + */ +#define HANDLE_PIN_BIT 0 + +/* + * Head in allocated object should have OBJ_ALLOCATED_TAG + * to identify the object was allocated or not. + * It's okay to add the status bit in the least bit because + * header keeps handle which is 4byte-aligned address so we + * have room for two bit at least. + */ +#define OBJ_ALLOCATED_TAG 1 +#define OBJ_TAG_BITS 1 +#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) #define MAX(a, b) ((a) >= (b) ? (a) : (b)) @@ -610,35 +630,35 @@ static struct page *get_next_page(struct page *page) /* * Encode as a single handle value. - * On hardware platforms with physical memory starting at 0x0 the pfn - * could be 0 so we ensure that the handle will never be 0 by adjusting the - * encoded obj_idx value before encoding. + * We use the least bit of handle for tagging. */ -static void *obj_location_to_handle(struct page *page, unsigned long obj_idx) +static void *location_to_obj(struct page *page, unsigned long obj_idx) { - unsigned long handle; + unsigned long obj; if (!page) { BUG_ON(obj_idx); return NULL; } - handle = page_to_pfn(page) << OBJ_INDEX_BITS; - handle |= ((obj_idx + 1) & OBJ_INDEX_MASK); + obj = page_to_pfn(page) << OBJ_INDEX_BITS; + obj |= ((obj_idx) & OBJ_INDEX_MASK); + obj <<= OBJ_TAG_BITS; - return (void *)handle; + return (void *)obj; } /* * Decode pair from the given object handle. We adjust the * decoded obj_idx back to its original value since it was adjusted in - * obj_location_to_handle(). + * location_to_obj(). */ -static void obj_to_location(unsigned long handle, struct page **page, +static void obj_to_location(unsigned long obj, struct page **page, unsigned long *obj_idx) { - *page = pfn_to_page(handle >> OBJ_INDEX_BITS); - *obj_idx = (handle & OBJ_INDEX_MASK) - 1; + obj >>= OBJ_TAG_BITS; + *page = pfn_to_page(obj >> OBJ_INDEX_BITS); + *obj_idx = (obj & OBJ_INDEX_MASK); } static unsigned long handle_to_obj(unsigned long handle) @@ -646,6 +666,11 @@ static unsigned long handle_to_obj(unsigned long handle) return *(unsigned long *)handle; } +unsigned long obj_to_head(void *obj) +{ + return *(unsigned long *)obj; +} + static unsigned long obj_idx_to_offset(struct page *page, unsigned long obj_idx, int class_size) { @@ -657,6 +682,25 @@ static unsigned long obj_idx_to_offset(struct page *page, return off + obj_idx * class_size; } +static inline int trypin_tag(unsigned long handle) +{ + unsigned long *ptr = (unsigned long *)handle; + + return !test_and_set_bit_lock(HANDLE_PIN_BIT, ptr); +} + +static void pin_tag(unsigned long handle) +{ + while (!trypin_tag(handle)); +} + +static void unpin_tag(unsigned long handle) +{ + unsigned long *ptr = (unsigned long *)handle; + + clear_bit_unlock(HANDLE_PIN_BIT, ptr); +} + static void reset_page(struct page *page) { clear_bit(PG_private, &page->flags); @@ -718,7 +762,7 @@ static void init_zspage(struct page *first_page, struct size_class *class) link = (struct link_free *)vaddr + off / sizeof(*link); while ((off += class->size) < PAGE_SIZE) { - link->next = obj_location_to_handle(page, i++); + link->next = location_to_obj(page, i++); link += class->size / sizeof(*link); } @@ -728,7 +772,7 @@ static void init_zspage(struct page *first_page, struct size_class *class) * page (if present) */ next_page = get_next_page(page); - link->next = obj_location_to_handle(next_page, 0); + link->next = location_to_obj(next_page, 0); kunmap_atomic(vaddr); page = next_page; off %= PAGE_SIZE; @@ -782,7 +826,7 @@ static struct page *alloc_zspage(struct size_class *class, gfp_t flags) init_zspage(first_page, class); - first_page->freelist = obj_location_to_handle(first_page, 0); + first_page->freelist = location_to_obj(first_page, 0); /* Maximum number of objects we can store in this zspage */ first_page->objects = class->pages_per_zspage * PAGE_SIZE / class->size; @@ -1017,6 +1061,13 @@ static bool can_merge(struct size_class *prev, int size, int pages_per_zspage) return true; } +static bool zspage_full(struct page *page) +{ + BUG_ON(!is_first_page(page)); + + return page->inuse == page->objects; +} + #ifdef CONFIG_ZSMALLOC_STAT static inline void zs_stat_inc(struct size_class *class, @@ -1219,6 +1270,9 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle, */ BUG_ON(in_interrupt()); + /* From now on, migration cannot move the object */ + pin_tag(handle); + obj = handle_to_obj(handle); obj_to_location(obj, &page, &obj_idx); get_zspage_mapping(get_first_page(page), &class_idx, &fg); @@ -1276,6 +1330,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle) __zs_unmap_object(area, pages, off, class->size); } put_cpu_var(zs_map_area); + unpin_tag(handle); } EXPORT_SYMBOL_GPL(zs_unmap_object); @@ -1289,6 +1344,7 @@ static unsigned long obj_malloc(struct page *first_page, unsigned long m_objidx, m_offset; void *vaddr; + handle |= OBJ_ALLOCATED_TAG; obj = (unsigned long)first_page->freelist; obj_to_location(obj, &m_page, &m_objidx); m_offset = obj_idx_to_offset(m_page, m_objidx, class->size); @@ -1374,6 +1430,7 @@ static void obj_free(struct zs_pool *pool, struct size_class *class, BUG_ON(!obj); + obj &= ~OBJ_ALLOCATED_TAG; obj_to_location(obj, &f_page, &f_objidx); first_page = get_first_page(f_page); @@ -1402,8 +1459,8 @@ void zs_free(struct zs_pool *pool, unsigned long handle) if (unlikely(!handle)) return; + pin_tag(handle); obj = handle_to_obj(handle); - free_handle(pool, handle); obj_to_location(obj, &f_page, &f_objidx); first_page = get_first_page(f_page); @@ -1413,18 +1470,301 @@ void zs_free(struct zs_pool *pool, unsigned long handle) spin_lock(&class->lock); obj_free(pool, class, obj); fullness = fix_fullness_group(class, first_page); - if (fullness == ZS_EMPTY) + if (fullness == ZS_EMPTY) { zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( class->size, class->pages_per_zspage)); + atomic_long_sub(class->pages_per_zspage, + &pool->pages_allocated); + free_zspage(first_page); + } spin_unlock(&class->lock); + unpin_tag(handle); + + free_handle(pool, handle); +} +EXPORT_SYMBOL_GPL(zs_free); + +static void zs_object_copy(unsigned long src, unsigned long dst, + struct size_class *class) +{ + struct page *s_page, *d_page; + unsigned long s_objidx, d_objidx; + unsigned long s_off, d_off; + void *s_addr, *d_addr; + int s_size, d_size, size; + int written = 0; + + s_size = d_size = class->size; + + obj_to_location(src, &s_page, &s_objidx); + obj_to_location(dst, &d_page, &d_objidx); + + s_off = obj_idx_to_offset(s_page, s_objidx, class->size); + d_off = obj_idx_to_offset(d_page, d_objidx, class->size); + + if (s_off + class->size > PAGE_SIZE) + s_size = PAGE_SIZE - s_off; + + if (d_off + class->size > PAGE_SIZE) + d_size = PAGE_SIZE - d_off; + + s_addr = kmap_atomic(s_page); + d_addr = kmap_atomic(d_page); + + while (1) { + size = min(s_size, d_size); + memcpy(d_addr + d_off, s_addr + s_off, size); + written += size; + + if (written == class->size) + break; + + if (s_off + size >= PAGE_SIZE) { + kunmap_atomic(d_addr); + kunmap_atomic(s_addr); + s_page = get_next_page(s_page); + BUG_ON(!s_page); + s_addr = kmap_atomic(s_page); + d_addr = kmap_atomic(d_page); + s_size = class->size - written; + s_off = 0; + } else { + s_off += size; + s_size -= size; + } + + if (d_off + size >= PAGE_SIZE) { + kunmap_atomic(d_addr); + d_page = get_next_page(d_page); + BUG_ON(!d_page); + d_addr = kmap_atomic(d_page); + d_size = class->size - written; + d_off = 0; + } else { + d_off += size; + d_size -= size; + } + } + + kunmap_atomic(d_addr); + kunmap_atomic(s_addr); +} + +/* + * Find alloced object in zspage from index object and + * return handle. + */ +static unsigned long find_alloced_obj(struct page *page, int index, + struct size_class *class) +{ + unsigned long head; + int offset = 0; + unsigned long handle = 0; + void *addr = kmap_atomic(page); + + if (!is_first_page(page)) + offset = page->index; + offset += class->size * index; + + while (offset < PAGE_SIZE) { + head = obj_to_head(addr + offset); + if (head & OBJ_ALLOCATED_TAG) { + handle = head & ~OBJ_ALLOCATED_TAG; + if (trypin_tag(handle)) + break; + handle = 0; + } + + offset += class->size; + index++; + } + + kunmap_atomic(addr); + return handle; +} + +struct zs_compact_control { + /* Source page for migration which could be a subpage of zspage. */ + struct page *s_page; + /* Destination page for migration which should be a first page + * of zspage. */ + struct page *d_page; + /* Starting object index within @s_page which used for live object + * in the subpage. */ + int index; + /* how many of objects are migrated */ + int nr_migrated; +}; + +static int migrate_zspage(struct zs_pool *pool, struct size_class *class, + struct zs_compact_control *cc) +{ + unsigned long used_obj, free_obj; + unsigned long handle; + struct page *s_page = cc->s_page; + struct page *d_page = cc->d_page; + unsigned long index = cc->index; + int nr_migrated = 0; + int ret = 0; + + while (1) { + handle = find_alloced_obj(s_page, index, class); + if (!handle) { + s_page = get_next_page(s_page); + if (!s_page) + break; + index = 0; + continue; + } + + /* Stop if there is no more space */ + if (zspage_full(d_page)) { + unpin_tag(handle); + ret = -ENOMEM; + break; + } + used_obj = handle_to_obj(handle); + free_obj = obj_malloc(d_page, class, handle); + zs_object_copy(used_obj, free_obj, class); + index++; + record_obj(handle, free_obj); + unpin_tag(handle); + obj_free(pool, class, used_obj); + nr_migrated++; + } + + /* Remember last position in this iteration */ + cc->s_page = s_page; + cc->index = index; + cc->nr_migrated = nr_migrated; + + return ret; +} + +static struct page *alloc_target_page(struct size_class *class) +{ + int i; + struct page *page; + + for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) { + page = class->fullness_list[i]; + if (page) { + remove_zspage(page, class, i); + break; + } + } + + return page; +} + +static void putback_zspage(struct zs_pool *pool, struct size_class *class, + struct page *first_page) +{ + int class_idx; + enum fullness_group fullness; + + BUG_ON(!is_first_page(first_page)); + + get_zspage_mapping(first_page, &class_idx, &fullness); + insert_zspage(first_page, class, fullness); + fullness = fix_fullness_group(class, first_page); if (fullness == ZS_EMPTY) { + zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( + class->size, class->pages_per_zspage)); atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated); + free_zspage(first_page); } } -EXPORT_SYMBOL_GPL(zs_free); + +static struct page *isolate_source_page(struct size_class *class) +{ + struct page *page; + + page = class->fullness_list[ZS_ALMOST_EMPTY]; + if (page) + remove_zspage(page, class, ZS_ALMOST_EMPTY); + + return page; +} + +static unsigned long __zs_compact(struct zs_pool *pool, + struct size_class *class) +{ + int nr_to_migrate; + struct zs_compact_control cc; + struct page *src_page; + struct page *dst_page = NULL; + unsigned long nr_total_migrated = 0; + + cond_resched(); + + spin_lock(&class->lock); + while ((src_page = isolate_source_page(class))) { + + BUG_ON(!is_first_page(src_page)); + + /* The goal is to migrate all live objects in source page */ + nr_to_migrate = src_page->inuse; + cc.index = 0; + cc.s_page = src_page; + + while ((dst_page = alloc_target_page(class))) { + cc.d_page = dst_page; + /* + * If there is no more space in dst_page, try to + * allocate another zspage. + */ + if (!migrate_zspage(pool, class, &cc)) + break; + + putback_zspage(pool, class, dst_page); + nr_total_migrated += cc.nr_migrated; + nr_to_migrate -= cc.nr_migrated; + } + + /* Stop if we couldn't find slot */ + if (dst_page == NULL) + break; + + putback_zspage(pool, class, dst_page); + putback_zspage(pool, class, src_page); + spin_unlock(&class->lock); + nr_total_migrated += cc.nr_migrated; + cond_resched(); + spin_lock(&class->lock); + } + + if (src_page) + putback_zspage(pool, class, src_page); + + spin_unlock(&class->lock); + + return nr_total_migrated; +} + +unsigned long zs_compact(struct zs_pool *pool) +{ + int i; + unsigned long nr_migrated = 0; + struct size_class *class; + + for (i = zs_size_classes - 1; i >= 0; i--) { + class = pool->size_class[i]; + if (!class) + continue; + if (class->index != i) + continue; + nr_migrated += __zs_compact(pool, class); + } + + synchronize_rcu(); + + return nr_migrated; +} +EXPORT_SYMBOL_GPL(zs_compact); /** * zs_create_pool - Creates an allocation pool to work from. -- cgit v1.2.3 From d3d07c92ff69f784bb8c3279fa87678bfa2f7f6f Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 15 Apr 2015 16:15:33 -0700 Subject: zsmalloc: adjust ZS_ALMOST_FULL Curretly, zsmalloc regards a zspage as ZS_ALMOST_EMPTY if the zspage has under 1/4 used objects(ie, fullness_threshold_frac). It could make result in loose packing since zsmalloc migrates only ZS_ALMOST_EMPTY zspage out. This patch changes the rule so that zsmalloc makes zspage which has above 3/4 used object ZS_ALMOST_FULL so it could make tight packing. Signed-off-by: Minchan Kim Cc: Juneho Choi Cc: Gunho Lee Cc: Luigi Semenzato Cc: Dan Streetman Cc: Seth Jennings Cc: Nitin Gupta Cc: Jerome Marchand Cc: Sergey Senozhatsky Cc: Joonsoo Kim Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index c4ae608dc725..3bd8b0a16462 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -480,7 +480,7 @@ static enum fullness_group get_fullness_group(struct page *page) fg = ZS_EMPTY; else if (inuse == max_objects) fg = ZS_FULL; - else if (inuse <= max_objects / fullness_threshold_frac) + else if (inuse <= 3 * max_objects / fullness_threshold_frac) fg = ZS_ALMOST_EMPTY; else fg = ZS_ALMOST_FULL; -- cgit v1.2.3 From 7b60a68529b0d827d26ea3426c2addd071bff789 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 15 Apr 2015 16:15:39 -0700 Subject: zsmalloc: record handle in page->private for huge object We store handle on header of each allocated object so it increases the size of each object by sizeof(unsigned long). If zram stores 4096 bytes to zsmalloc(ie, bad compression), zsmalloc needs 4104B-class to add handle. However, 4104B-class has 1-pages_per_zspage so wasted size by internal fragment is 8192 - 4104, which is terrible. So this patch records the handle in page->private on such huge object(ie, pages_per_zspage == 1 && maxobj_per_zspage == 1) instead of header of each object so we could use 4096B-class, not 4104B-class. Signed-off-by: Minchan Kim Cc: Juneho Choi Cc: Gunho Lee Cc: Luigi Semenzato Cc: Dan Streetman Cc: Seth Jennings Cc: Nitin Gupta Cc: Jerome Marchand Cc: Sergey Senozhatsky Cc: Joonsoo Kim Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 54 ++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 42 insertions(+), 12 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 3bd8b0a16462..95771c75f2e9 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -57,6 +57,8 @@ * * page->private (union with page->first_page): refers to the * component page after the first page + * If the page is first_page for huge object, it stores handle. + * Look at size_class->huge. * page->freelist: points to the first free object in zspage. * Free objects are linked together using in-place * metadata. @@ -163,7 +165,7 @@ #define ZS_MIN_ALLOC_SIZE \ MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) /* each chunk includes extra space to keep handle */ -#define ZS_MAX_ALLOC_SIZE (PAGE_SIZE + ZS_HANDLE_SIZE) +#define ZS_MAX_ALLOC_SIZE PAGE_SIZE /* * On systems with 4K page size, this gives 255 size classes! There is a @@ -239,6 +241,8 @@ struct size_class { /* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */ int pages_per_zspage; + /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */ + bool huge; #ifdef CONFIG_ZSMALLOC_STAT struct zs_size_stat stats; @@ -300,6 +304,7 @@ struct mapping_area { #endif char *vm_addr; /* address of kmap_atomic()'ed pages */ enum zs_mapmode vm_mm; /* mapping mode */ + bool huge; }; static int create_handle_cache(struct zs_pool *pool) @@ -457,7 +462,7 @@ static int get_size_class_index(int size) idx = DIV_ROUND_UP(size - ZS_MIN_ALLOC_SIZE, ZS_SIZE_CLASS_DELTA); - return idx; + return min(zs_size_classes - 1, idx); } /* @@ -666,9 +671,14 @@ static unsigned long handle_to_obj(unsigned long handle) return *(unsigned long *)handle; } -unsigned long obj_to_head(void *obj) +static unsigned long obj_to_head(struct size_class *class, struct page *page, + void *obj) { - return *(unsigned long *)obj; + if (class->huge) { + VM_BUG_ON(!is_first_page(page)); + return *(unsigned long *)page_private(page); + } else + return *(unsigned long *)obj; } static unsigned long obj_idx_to_offset(struct page *page, @@ -954,9 +964,12 @@ static void __zs_unmap_object(struct mapping_area *area, if (area->vm_mm == ZS_MM_RO) goto out; - buf = area->vm_buf + ZS_HANDLE_SIZE; - size -= ZS_HANDLE_SIZE; - off += ZS_HANDLE_SIZE; + buf = area->vm_buf; + if (!area->huge) { + buf = buf + ZS_HANDLE_SIZE; + size -= ZS_HANDLE_SIZE; + off += ZS_HANDLE_SIZE; + } sizes[0] = PAGE_SIZE - off; sizes[1] = size - sizes[0]; @@ -1295,7 +1308,10 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle, ret = __zs_map_object(area, pages, off, class->size); out: - return ret + ZS_HANDLE_SIZE; + if (!class->huge) + ret += ZS_HANDLE_SIZE; + + return ret; } EXPORT_SYMBOL_GPL(zs_map_object); @@ -1352,8 +1368,12 @@ static unsigned long obj_malloc(struct page *first_page, vaddr = kmap_atomic(m_page); link = (struct link_free *)vaddr + m_offset / sizeof(*link); first_page->freelist = link->next; - /* record handle in the header of allocated chunk */ - link->handle = handle; + if (!class->huge) + /* record handle in the header of allocated chunk */ + link->handle = handle; + else + /* record handle in first_page->private */ + set_page_private(first_page, handle); kunmap_atomic(vaddr); first_page->inuse++; zs_stat_inc(class, OBJ_USED, 1); @@ -1377,7 +1397,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) struct size_class *class; struct page *first_page; - if (unlikely(!size || (size + ZS_HANDLE_SIZE) > ZS_MAX_ALLOC_SIZE)) + if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE)) return 0; handle = alloc_handle(pool); @@ -1387,6 +1407,11 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) /* extra space in chunk to keep the handle */ size += ZS_HANDLE_SIZE; class = pool->size_class[get_size_class_index(size)]; + /* In huge class size, we store the handle into first_page->private */ + if (class->huge) { + size -= ZS_HANDLE_SIZE; + class = pool->size_class[get_size_class_index(size)]; + } spin_lock(&class->lock); first_page = find_get_zspage(class); @@ -1442,6 +1467,8 @@ static void obj_free(struct zs_pool *pool, struct size_class *class, /* Insert this object in containing zspage's freelist */ link = (struct link_free *)(vaddr + f_offset); link->next = first_page->freelist; + if (class->huge) + set_page_private(first_page, 0); kunmap_atomic(vaddr); first_page->freelist = (void *)obj; first_page->inuse--; @@ -1567,7 +1594,7 @@ static unsigned long find_alloced_obj(struct page *page, int index, offset += class->size * index; while (offset < PAGE_SIZE) { - head = obj_to_head(addr + offset); + head = obj_to_head(class, page, addr + offset); if (head & OBJ_ALLOCATED_TAG) { handle = head & ~OBJ_ALLOCATED_TAG; if (trypin_tag(handle)) @@ -1837,6 +1864,9 @@ struct zs_pool *zs_create_pool(char *name, gfp_t flags) class->size = size; class->index = i; class->pages_per_zspage = pages_per_zspage; + if (pages_per_zspage == 1 && + get_maxobj_per_zspage(size, pages_per_zspage) == 1) + class->huge = true; spin_lock_init(&class->lock); pool->size_class[i] = class; -- cgit v1.2.3 From 248ca1b053c82fa22427d22b33ac51a24c88a86d Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 15 Apr 2015 16:15:42 -0700 Subject: zsmalloc: add fullness into stat During investigating compaction, fullness information of each class is helpful for investigating how the compaction works well. With that, we could know how compaction works well more clear on each size class. Signed-off-by: Minchan Kim Cc: Juneho Choi Cc: Gunho Lee Cc: Luigi Semenzato Cc: Dan Streetman Cc: Seth Jennings Cc: Nitin Gupta Cc: Jerome Marchand Cc: Sergey Senozhatsky Cc: Joonsoo Kim Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 349 +++++++++++++++++++++++++++++++--------------------------- 1 file changed, 184 insertions(+), 165 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 95771c75f2e9..461243e14d3e 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -197,6 +197,8 @@ enum fullness_group { enum zs_stat_type { OBJ_ALLOCATED, OBJ_USED, + CLASS_ALMOST_FULL, + CLASS_ALMOST_EMPTY, NR_ZS_STAT_TYPE, }; @@ -412,6 +414,11 @@ static struct zpool_driver zs_zpool_driver = { MODULE_ALIAS("zpool-zsmalloc"); #endif /* CONFIG_ZPOOL */ +static unsigned int get_maxobj_per_zspage(int size, int pages_per_zspage) +{ + return pages_per_zspage * PAGE_SIZE / size; +} + /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */ static DEFINE_PER_CPU(struct mapping_area, zs_map_area); @@ -465,6 +472,179 @@ static int get_size_class_index(int size) return min(zs_size_classes - 1, idx); } +#ifdef CONFIG_ZSMALLOC_STAT + +static inline void zs_stat_inc(struct size_class *class, + enum zs_stat_type type, unsigned long cnt) +{ + class->stats.objs[type] += cnt; +} + +static inline void zs_stat_dec(struct size_class *class, + enum zs_stat_type type, unsigned long cnt) +{ + class->stats.objs[type] -= cnt; +} + +static inline unsigned long zs_stat_get(struct size_class *class, + enum zs_stat_type type) +{ + return class->stats.objs[type]; +} + +static int __init zs_stat_init(void) +{ + if (!debugfs_initialized()) + return -ENODEV; + + zs_stat_root = debugfs_create_dir("zsmalloc", NULL); + if (!zs_stat_root) + return -ENOMEM; + + return 0; +} + +static void __exit zs_stat_exit(void) +{ + debugfs_remove_recursive(zs_stat_root); +} + +static int zs_stats_size_show(struct seq_file *s, void *v) +{ + int i; + struct zs_pool *pool = s->private; + struct size_class *class; + int objs_per_zspage; + unsigned long class_almost_full, class_almost_empty; + unsigned long obj_allocated, obj_used, pages_used; + unsigned long total_class_almost_full = 0, total_class_almost_empty = 0; + unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0; + + seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s\n", + "class", "size", "almost_full", "almost_empty", + "obj_allocated", "obj_used", "pages_used", + "pages_per_zspage"); + + for (i = 0; i < zs_size_classes; i++) { + class = pool->size_class[i]; + + if (class->index != i) + continue; + + spin_lock(&class->lock); + class_almost_full = zs_stat_get(class, CLASS_ALMOST_FULL); + class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY); + obj_allocated = zs_stat_get(class, OBJ_ALLOCATED); + obj_used = zs_stat_get(class, OBJ_USED); + spin_unlock(&class->lock); + + objs_per_zspage = get_maxobj_per_zspage(class->size, + class->pages_per_zspage); + pages_used = obj_allocated / objs_per_zspage * + class->pages_per_zspage; + + seq_printf(s, " %5u %5u %11lu %12lu %13lu %10lu %10lu %16d\n", + i, class->size, class_almost_full, class_almost_empty, + obj_allocated, obj_used, pages_used, + class->pages_per_zspage); + + total_class_almost_full += class_almost_full; + total_class_almost_empty += class_almost_empty; + total_objs += obj_allocated; + total_used_objs += obj_used; + total_pages += pages_used; + } + + seq_puts(s, "\n"); + seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu\n", + "Total", "", total_class_almost_full, + total_class_almost_empty, total_objs, + total_used_objs, total_pages); + + return 0; +} + +static int zs_stats_size_open(struct inode *inode, struct file *file) +{ + return single_open(file, zs_stats_size_show, inode->i_private); +} + +static const struct file_operations zs_stat_size_ops = { + .open = zs_stats_size_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int zs_pool_stat_create(char *name, struct zs_pool *pool) +{ + struct dentry *entry; + + if (!zs_stat_root) + return -ENODEV; + + entry = debugfs_create_dir(name, zs_stat_root); + if (!entry) { + pr_warn("debugfs dir <%s> creation failed\n", name); + return -ENOMEM; + } + pool->stat_dentry = entry; + + entry = debugfs_create_file("classes", S_IFREG | S_IRUGO, + pool->stat_dentry, pool, &zs_stat_size_ops); + if (!entry) { + pr_warn("%s: debugfs file entry <%s> creation failed\n", + name, "classes"); + return -ENOMEM; + } + + return 0; +} + +static void zs_pool_stat_destroy(struct zs_pool *pool) +{ + debugfs_remove_recursive(pool->stat_dentry); +} + +#else /* CONFIG_ZSMALLOC_STAT */ + +static inline void zs_stat_inc(struct size_class *class, + enum zs_stat_type type, unsigned long cnt) +{ +} + +static inline void zs_stat_dec(struct size_class *class, + enum zs_stat_type type, unsigned long cnt) +{ +} + +static inline unsigned long zs_stat_get(struct size_class *class, + enum zs_stat_type type) +{ + return 0; +} + +static int __init zs_stat_init(void) +{ + return 0; +} + +static void __exit zs_stat_exit(void) +{ +} + +static inline int zs_pool_stat_create(char *name, struct zs_pool *pool) +{ + return 0; +} + +static inline void zs_pool_stat_destroy(struct zs_pool *pool) +{ +} + +#endif + + /* * For each size class, zspages are divided into different groups * depending on how "full" they are. This was done so that we could @@ -514,6 +694,8 @@ static void insert_zspage(struct page *page, struct size_class *class, list_add_tail(&page->lru, &(*head)->lru); *head = page; + zs_stat_inc(class, fullness == ZS_ALMOST_EMPTY ? + CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1); } /* @@ -539,6 +721,8 @@ static void remove_zspage(struct page *page, struct size_class *class, struct page, lru); list_del_init(&page->lru); + zs_stat_dec(class, fullness == ZS_ALMOST_EMPTY ? + CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1); } /* @@ -1057,11 +1241,6 @@ static void init_zs_size_classes(void) zs_size_classes = nr; } -static unsigned int get_maxobj_per_zspage(int size, int pages_per_zspage) -{ - return pages_per_zspage * PAGE_SIZE / size; -} - static bool can_merge(struct size_class *prev, int size, int pages_per_zspage) { if (prev->pages_per_zspage != pages_per_zspage) @@ -1081,166 +1260,6 @@ static bool zspage_full(struct page *page) return page->inuse == page->objects; } -#ifdef CONFIG_ZSMALLOC_STAT - -static inline void zs_stat_inc(struct size_class *class, - enum zs_stat_type type, unsigned long cnt) -{ - class->stats.objs[type] += cnt; -} - -static inline void zs_stat_dec(struct size_class *class, - enum zs_stat_type type, unsigned long cnt) -{ - class->stats.objs[type] -= cnt; -} - -static inline unsigned long zs_stat_get(struct size_class *class, - enum zs_stat_type type) -{ - return class->stats.objs[type]; -} - -static int __init zs_stat_init(void) -{ - if (!debugfs_initialized()) - return -ENODEV; - - zs_stat_root = debugfs_create_dir("zsmalloc", NULL); - if (!zs_stat_root) - return -ENOMEM; - - return 0; -} - -static void __exit zs_stat_exit(void) -{ - debugfs_remove_recursive(zs_stat_root); -} - -static int zs_stats_size_show(struct seq_file *s, void *v) -{ - int i; - struct zs_pool *pool = s->private; - struct size_class *class; - int objs_per_zspage; - unsigned long obj_allocated, obj_used, pages_used; - unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0; - - seq_printf(s, " %5s %5s %13s %10s %10s\n", "class", "size", - "obj_allocated", "obj_used", "pages_used"); - - for (i = 0; i < zs_size_classes; i++) { - class = pool->size_class[i]; - - if (class->index != i) - continue; - - spin_lock(&class->lock); - obj_allocated = zs_stat_get(class, OBJ_ALLOCATED); - obj_used = zs_stat_get(class, OBJ_USED); - spin_unlock(&class->lock); - - objs_per_zspage = get_maxobj_per_zspage(class->size, - class->pages_per_zspage); - pages_used = obj_allocated / objs_per_zspage * - class->pages_per_zspage; - - seq_printf(s, " %5u %5u %10lu %10lu %10lu\n", i, - class->size, obj_allocated, obj_used, pages_used); - - total_objs += obj_allocated; - total_used_objs += obj_used; - total_pages += pages_used; - } - - seq_puts(s, "\n"); - seq_printf(s, " %5s %5s %10lu %10lu %10lu\n", "Total", "", - total_objs, total_used_objs, total_pages); - - return 0; -} - -static int zs_stats_size_open(struct inode *inode, struct file *file) -{ - return single_open(file, zs_stats_size_show, inode->i_private); -} - -static const struct file_operations zs_stat_size_ops = { - .open = zs_stats_size_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static int zs_pool_stat_create(char *name, struct zs_pool *pool) -{ - struct dentry *entry; - - if (!zs_stat_root) - return -ENODEV; - - entry = debugfs_create_dir(name, zs_stat_root); - if (!entry) { - pr_warn("debugfs dir <%s> creation failed\n", name); - return -ENOMEM; - } - pool->stat_dentry = entry; - - entry = debugfs_create_file("obj_in_classes", S_IFREG | S_IRUGO, - pool->stat_dentry, pool, &zs_stat_size_ops); - if (!entry) { - pr_warn("%s: debugfs file entry <%s> creation failed\n", - name, "obj_in_classes"); - return -ENOMEM; - } - - return 0; -} - -static void zs_pool_stat_destroy(struct zs_pool *pool) -{ - debugfs_remove_recursive(pool->stat_dentry); -} - -#else /* CONFIG_ZSMALLOC_STAT */ - -static inline void zs_stat_inc(struct size_class *class, - enum zs_stat_type type, unsigned long cnt) -{ -} - -static inline void zs_stat_dec(struct size_class *class, - enum zs_stat_type type, unsigned long cnt) -{ -} - -static inline unsigned long zs_stat_get(struct size_class *class, - enum zs_stat_type type) -{ - return 0; -} - -static int __init zs_stat_init(void) -{ - return 0; -} - -static void __exit zs_stat_exit(void) -{ -} - -static inline int zs_pool_stat_create(char *name, struct zs_pool *pool) -{ - return 0; -} - -static inline void zs_pool_stat_destroy(struct zs_pool *pool) -{ -} - -#endif - unsigned long zs_get_total_pages(struct zs_pool *pool) { return atomic_long_read(&pool->pages_allocated); -- cgit v1.2.3 From d02be50dba649b4246e0c1c4b7cb5d8a8d49de9a Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 15 Apr 2015 16:15:46 -0700 Subject: zsmalloc: zsmalloc documentation Create zsmalloc doc which explains design concept and stat information. Signed-off-by: Minchan Kim Cc: Juneho Choi Cc: Gunho Lee Cc: Luigi Semenzato Cc: Dan Streetman Cc: Seth Jennings Cc: Nitin Gupta Cc: Jerome Marchand Cc: Sergey Senozhatsky Cc: Joonsoo Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 29 ----------------------------- 1 file changed, 29 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 461243e14d3e..1833fc9e09cb 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -12,35 +12,6 @@ */ /* - * This allocator is designed for use with zram. Thus, the allocator is - * supposed to work well under low memory conditions. In particular, it - * never attempts higher order page allocation which is very likely to - * fail under memory pressure. On the other hand, if we just use single - * (0-order) pages, it would suffer from very high fragmentation -- - * any object of size PAGE_SIZE/2 or larger would occupy an entire page. - * This was one of the major issues with its predecessor (xvmalloc). - * - * To overcome these issues, zsmalloc allocates a bunch of 0-order pages - * and links them together using various 'struct page' fields. These linked - * pages act as a single higher-order page i.e. an object can span 0-order - * page boundaries. The code refers to these linked pages as a single entity - * called zspage. - * - * For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE - * since this satisfies the requirements of all its current users (in the - * worst case, page is incompressible and is thus stored "as-is" i.e. in - * uncompressed form). For allocation requests larger than this size, failure - * is returned (see zs_malloc). - * - * Additionally, zs_malloc() does not return a dereferenceable pointer. - * Instead, it returns an opaque handle (unsigned long) which encodes actual - * location of the allocated object. The reason for this indirection is that - * zsmalloc does not keep zspages permanently mapped since that would cause - * issues on 32-bit systems where the VA region for kernel space mappings - * is very small. So, before using the allocating memory, the object has to - * be mapped using zs_map_object() to get a usable pointer and subsequently - * unmapped using zs_unmap_object(). - * * Following is how we use various fields and flags of underlying * struct page(s) to form a zspage. * -- cgit v1.2.3 From 888fa374e625f3ee8f3cc2efebf52939d3bb6da1 Mon Sep 17 00:00:00 2001 From: Yinghao Xie Date: Wed, 15 Apr 2015 16:15:49 -0700 Subject: mm/zsmalloc.c: fix comment for get_pages_per_zspage Signed-off-by: Yinghao Xie Suggested-by: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 1833fc9e09cb..7a48e5568d46 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -731,7 +731,8 @@ out: * to form a zspage for each size class. This is important * to reduce wastage due to unusable space left at end of * each zspage which is given as: - * wastage = Zp - Zp % size_class + * wastage = Zp % class_size + * usage = Zp - wastage * where Zp = zspage size = k * PAGE_SIZE where k = 1, 2, ... * * For example, for size class of 3/8 * PAGE_SIZE, we should -- cgit v1.2.3 From 1ec7cfb13acb8047ae5baafb43d2cd6b64ac85b9 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 15 Apr 2015 16:16:12 -0700 Subject: zsmalloc: remove synchronize_rcu from zs_compact() Do not synchronize rcu in zs_compact(). Neither zsmalloc not zram use rcu. Signed-off-by: Sergey Senozhatsky Acked-by: Minchan Kim Cc: Nitin Gupta Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 7a48e5568d46..8705a010e2d3 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1778,8 +1778,6 @@ unsigned long zs_compact(struct zs_pool *pool) nr_migrated += __zs_compact(pool, class); } - synchronize_rcu(); - return nr_migrated; } EXPORT_SYMBOL_GPL(zs_compact); -- cgit v1.2.3 From 495819ead5ad02174208994ca610852a7791a2f2 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 15 Apr 2015 16:16:15 -0700 Subject: zsmalloc: micro-optimize zs_object_copy() A micro-optimization. Avoid additional branching and reduce (a bit) registry pressure (f.e. s_off += size; d_off += size; may be calculated twise: first for >= PAGE_SIZE check and later for offset update in "else" clause). scripts/bloat-o-meter shows some improvement add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10) function old new delta zs_object_copy 550 540 -10 Signed-off-by: Sergey Senozhatsky Acked-by: Minchan Kim Cc: Nitin Gupta Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 8705a010e2d3..a9a9ff233a13 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1537,7 +1537,12 @@ static void zs_object_copy(unsigned long src, unsigned long dst, if (written == class->size) break; - if (s_off + size >= PAGE_SIZE) { + s_off += size; + s_size -= size; + d_off += size; + d_size -= size; + + if (s_off >= PAGE_SIZE) { kunmap_atomic(d_addr); kunmap_atomic(s_addr); s_page = get_next_page(s_page); @@ -1546,21 +1551,15 @@ static void zs_object_copy(unsigned long src, unsigned long dst, d_addr = kmap_atomic(d_page); s_size = class->size - written; s_off = 0; - } else { - s_off += size; - s_size -= size; } - if (d_off + size >= PAGE_SIZE) { + if (d_off >= PAGE_SIZE) { kunmap_atomic(d_addr); d_page = get_next_page(d_page); BUG_ON(!d_page); d_addr = kmap_atomic(d_page); d_size = class->size - written; d_off = 0; - } else { - d_off += size; - d_size -= size; } } -- cgit v1.2.3 From 839373e645d12613308d9148041c4bd967bce8d5 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 15 Apr 2015 16:16:18 -0700 Subject: zsmalloc: remove unnecessary insertion/removal of zspage in compaction In putback_zspage, we don't need to insert a zspage into list of zspage in size_class again to just fix fullness group. We could do directly without reinsertion so we could save some instuctions. Reported-by: Heesub Shin Signed-off-by: Minchan Kim Cc: Nitin Gupta Cc: Sergey Senozhatsky Cc: Dan Streetman Cc: Seth Jennings Cc: Ganesh Mahendran Cc: Luigi Semenzato Cc: Gunho Lee Cc: Juneho Choi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index a9a9ff233a13..ded3672295d7 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1678,14 +1678,14 @@ static struct page *alloc_target_page(struct size_class *class) static void putback_zspage(struct zs_pool *pool, struct size_class *class, struct page *first_page) { - int class_idx; enum fullness_group fullness; BUG_ON(!is_first_page(first_page)); - get_zspage_mapping(first_page, &class_idx, &fullness); + fullness = get_fullness_group(first_page); insert_zspage(first_page, class, fullness); - fullness = fix_fullness_group(class, first_page); + set_zspage_mapping(first_page, class->index, fullness); + if (fullness == ZS_EMPTY) { zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( class->size, class->pages_per_zspage)); -- cgit v1.2.3 From 81da9b13f73653bf5f38c63af8029fc459198ac0 Mon Sep 17 00:00:00 2001 From: Heesub Shin Date: Wed, 15 Apr 2015 16:16:21 -0700 Subject: zsmalloc: fix fatal corruption due to wrong size class selection There is no point in overriding the size class below. It causes fatal corruption on the next chunk on the 3264-bytes size class, which is the last size class that is not huge. For example, if the requested size was exactly 3264 bytes, current zsmalloc allocates and returns a chunk from the size class of 3264 bytes, not 4096. User access to this chunk may overwrite head of the next adjacent chunk. Here is the panic log captured when freelist was corrupted due to this: Kernel BUG at ffffffc00030659c [verbose debug info unavailable] Internal error: Oops - BUG: 96000006 [#1] PREEMPT SMP Modules linked in: exynos-snapshot: core register saved(CPU:5) CPUMERRSR: 0000000000000000, L2MERRSR: 0000000000000000 exynos-snapshot: context saved(CPU:5) exynos-snapshot: item - log_kevents is disabled CPU: 5 PID: 898 Comm: kswapd0 Not tainted 3.10.61-4497415-eng #1 task: ffffffc0b8783d80 ti: ffffffc0b71e8000 task.ti: ffffffc0b71e8000 PC is at obj_idx_to_offset+0x0/0x1c LR is at obj_malloc+0x44/0xe8 pc : [] lr : [] pstate: a0000045 sp : ffffffc0b71eb790 x29: ffffffc0b71eb790 x28: ffffffc00204c000 x27: 000000000001d96f x26: 0000000000000000 x25: ffffffc098cc3500 x24: ffffffc0a13f2810 x23: ffffffc098cc3501 x22: ffffffc0a13f2800 x21: 000011e1a02006e3 x20: ffffffc0a13f2800 x19: ffffffbc02a7e000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000feb x15: 0000000000000000 x14: 00000000a01003e3 x13: 0000000000000020 x12: fffffffffffffff0 x11: ffffffc08b264000 x10: 00000000e3a01004 x9 : ffffffc08b263fea x8 : ffffffc0b1e611c0 x7 : ffffffc000307d24 x6 : 0000000000000000 x5 : 0000000000000038 x4 : 000000000000011e x3 : ffffffbc00003e90 x2 : 0000000000000cc0 x1 : 00000000d0100371 x0 : ffffffbc00003e90 Reported-by: Sooyong Suk Signed-off-by: Heesub Shin Tested-by: Sooyong Suk Acked-by: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 5 ----- 1 file changed, 5 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index ded3672295d7..e24f7ccc5865 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1398,11 +1398,6 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size) /* extra space in chunk to keep the handle */ size += ZS_HANDLE_SIZE; class = pool->size_class[get_size_class_index(size)]; - /* In huge class size, we store the handle into first_page->private */ - if (class->huge) { - size -= ZS_HANDLE_SIZE; - class = pool->size_class[get_size_class_index(size)]; - } spin_lock(&class->lock); first_page = find_get_zspage(class); -- cgit v1.2.3 From 160a117f0864871ae1bab26554a985a1d2861afd Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 15 Apr 2015 16:16:24 -0700 Subject: zsmalloc: remove extra cond_resched() in __zs_compact Do not perform cond_resched() before the busy compaction loop in __zs_compact(), because this loop does it when needed. Signed-off-by: Sergey Senozhatsky Acked-by: Minchan Kim Cc: Nitin Gupta Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/zsmalloc.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'mm/zsmalloc.c') diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index e24f7ccc5865..08bd7a3d464a 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1711,8 +1711,6 @@ static unsigned long __zs_compact(struct zs_pool *pool, struct page *dst_page = NULL; unsigned long nr_total_migrated = 0; - cond_resched(); - spin_lock(&class->lock); while ((src_page = isolate_source_page(class))) { -- cgit v1.2.3