[PATCH] zone_reclaim: partial scans instead of full scan

Instead of scanning all the pages in a zone, imitate real swap and scan
only a portion of the pages and gradually scan more if we do not free up
enough pages.  This avoids a zone suddenly loosing all unused pagecache
pages (we may after all access some of these again so they deserve another
chance) but it still frees up large chunks of memory if a zone only
contains unused pagecache pages.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8277f93..f8b94ea 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1596,6 +1596,14 @@
  * Mininum time between zone reclaim scans
  */
 #define ZONE_RECLAIM_INTERVAL 30*HZ
+
+/*
+ * Priority for ZONE_RECLAIM. This determines the fraction of pages
+ * of a node considered for each zone_reclaim. 4 scans 1/16th of
+ * a zone.
+ */
+#define ZONE_RECLAIM_PRIORITY 4
+
 /*
  * Try to free up some pages from this zone through reclaim.
  */
@@ -1626,7 +1634,7 @@
 	sc.may_swap = 0;
 	sc.nr_scanned = 0;
 	sc.nr_reclaimed = 0;
-	sc.priority = 0;
+	sc.priority = ZONE_RECLAIM_PRIORITY + 1;
 	sc.nr_mapped = read_page_state(nr_mapped);
 	sc.gfp_mask = gfp_mask;
 
@@ -1643,7 +1651,15 @@
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	shrink_zone(zone, &sc);
+	/*
+	 * Free memory by calling shrink zone with increasing priorities
+	 * until we have enough memory freed.
+	 */
+	do {
+		sc.priority--;
+		shrink_zone(zone, &sc);
+
+	} while (sc.nr_reclaimed < nr_pages && sc.priority > 0);
 
 	p->reclaim_state = NULL;
 	current->flags &= ~PF_MEMALLOC;