Linux Kernel Scenario Analysis: Page Swap-Out

The kswapd kernel thread is mainly responsible for swapping pages out periodically. Let's look at how it is implemented.

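When kswapd is initialized, the variable page_cluster is first set according to the size of physical memory. It controls read-ahead. page_cluster is a power-of-two exponent, so instead of reading a single page the kernel reads a cluster of 1 << page_cluster pages at once (for example, page_cluster = 2 means 4 pages per read). By the principle of locality, reading neighbouring pages together tends to improve speed.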
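As a rough illustration of the sizing rule, here is a user-space model, not kernel code. The 16 MB / 32 MB thresholds and the values 2, 3 and 4 are assumed to mirror 2.4's swap_setup(); the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of the page_cluster choice; mem_mb is RAM in MB.
 * Assumed thresholds: < 16 MB -> 2, < 32 MB -> 3, otherwise 4. */
int choose_page_cluster(unsigned long mem_mb)
{
	if (mem_mb < 16)
		return 2;
	if (mem_mb < 32)
		return 3;
	return 4;
}

/* page_cluster is an exponent: one cluster is 1 << page_cluster pages. */
unsigned long cluster_pages(int page_cluster)
{
	assert(page_cluster >= 0 && page_cluster < 16);
	return 1UL << page_cluster;
}
```

On a 128 MB machine this model gives page_cluster = 4, i.e. 16-page clusters; refill_inactive() later uses the same 1 << page_cluster as its batch size for user callers.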

kswapd is a kernel thread: it shares the kernel's memory space and is created with kernel_thread().

On each pass, kswapd first calls inactive_shortage() to check whether the system as a whole is short of physical pages.

The minimum the system should maintain is given by freepages.high (the target number of free pages) plus inactive_target (the target number of inactive pages).

The pages actually available consist of three parts:

Free pages: immediately allocatable, collected from all zones; counted by nr_free_pages().

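Inactive clean pages: in essence these can be allocated, but they still hold valid contents (for example in the swap cache). Keeping more of them reduces reads from the swap device and so improves speed. Their number is given by nr_inactive_clean_pages().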

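Inactive dirty pages: these can only be allocated after being written out to the swap device (or their backing file). Their number is kept in nr_inactive_dirty_pages.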

int inactive_shortage(void)
{
	int shortage = 0;

	/* The system should keep freepages.high + inactive_target pages;
	 * the three counters below give what it actually has. If the
	 * target cannot be met, return the (positive) shortage. */
	shortage += freepages.high;
	shortage += inactive_target;
	shortage -= nr_free_pages();
	shortage -= nr_inactive_clean_pages();
	shortage -= nr_inactive_dirty_pages;

	if (shortage > 0)
		return shortage;

	return 0;
}

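Even if that check passes (i.e. the actual number of pages is above the minimum), kswapd also calls free_shortage() to check whether any zone is severely short of pages.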

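free_shortage() first compares free plus inactive clean pages system-wide with freepages.high + inactive_target / 3 and, if that falls short, returns the difference. Otherwise it looks for zones whose free plus inactive clean pages are below pages_min + 1 and returns the sum of those zones' deficits.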

/*
 * Check if there are zones with a severe shortage of free pages,
 * or if all zones have a minor shortage.
 */
int free_shortage(void)
{
	pg_data_t *pgdat = pgdat_list;	/* list of memory nodes */
	int sum = 0;
	int freeable = nr_free_pages() + nr_inactive_clean_pages();	/* actually freeable */
	int freetarget = freepages.high + inactive_target / 3;		/* target */

	/* Below target: return the difference, i.e. how much must be reclaimed. */
	/* Are we low on free pages globally? */
	if (freeable < freetarget)
		return freetarget - freeable;

	/* If not, are we very low on any particular zone? */
	do {
		int i;
		for(i = 0; i < MAX_NR_ZONES; i++) {
			zone_t *zone = pgdat->node_zones+ i;	/* this zone */
			/* Free + inactive clean pages below the zone's minimum? */
			if (zone->size && (zone->inactive_clean_pages +
					zone->free_pages < zone->pages_min+1)) {
				/* + 1 to have overlap with alloc_pages() !! */
				sum += zone->pages_min + 1;
				sum -= zone->free_pages;
				sum -= zone->inactive_clean_pages;
			}
		}
		pgdat = pgdat->node_next;
	} while (pgdat);

	return sum;
}
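A similar user-space model of free_shortage() for a single node (so the pgdat loop is dropped) is sketched below. It assumes the global counts equal the sums of the per-zone fields; zone_model and model_free_shortage() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical per-zone snapshot; field names mirror zone_t. */
struct zone_model {
	int size;
	int free_pages;
	int inactive_clean_pages;
	int pages_min;
};

/* Models free_shortage() for one node: a global test first, then the per-zone test. */
int model_free_shortage(int freepages_high, int inactive_target,
			const struct zone_model *zones, int nr_zones)
{
	int i, sum = 0, freeable = 0;
	int freetarget = freepages_high + inactive_target / 3;

	assert(nr_zones >= 0);
	/* Assumed: the global counters are the sums over all zones. */
	for (i = 0; i < nr_zones; i++)
		freeable += zones[i].free_pages + zones[i].inactive_clean_pages;

	/* Low on free pages globally? */
	if (freeable < freetarget)
		return freetarget - freeable;

	/* Otherwise, is any zone below pages_min + 1? */
	for (i = 0; i < nr_zones; i++) {
		const struct zone_model *z = &zones[i];
		if (z->size && z->free_pages + z->inactive_clean_pages < z->pages_min + 1)
			sum += z->pages_min + 1 - z->free_pages - z->inactive_clean_pages;
	}
	return sum;
}
```

For instance, with a target of 256 + 900 / 3 = 556 and 3015 freeable pages overall, the global test passes, but a DMA zone with 15 free-plus-clean pages and pages_min = 32 still yields a deficit of 33 - 15 = 18.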

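If both checks show enough memory, kswapd calls refill_inactive_scan(), which tries to move some active pages that have no user mappings to the inactive dirty list.

It scans a portion of the active list determined by priority (maxscan = nr_active_pages >> priority); only at priority 0 is the whole list scanned. For each page it checks whether the page was referenced recently: if so, its age is increased, otherwise it is decreased. (A page is only considered for moving to the inactive list once its age reaches 0.) It then checks whether the age is 0 and whether any user process maps the page. A page's count is set to 1 when it is allocated, incremented when it is used as a read/write buffer, and incremented again for every process that maps it, so the test must account for buffer pages: the count may be at most 2 for a page with buffers and at most 1 otherwise. If the age is 0 and there are no user mappings, deactivate_page_nolock() is called; it sets the page's age to 0, clears its referenced flag, and moves it from the active list to the inactive dirty list.

If the page is still active, it is moved to the other end of the active list: scanning starts from active_list.prev (the tail), and list_add() puts the page back at the head.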

/**
 * refill_inactive_scan - scan the active list and find pages to deactivate
 * @priority: the priority at which to scan
 * @oneshot: exit after deactivating one page
 *
 * This function will scan a portion of the active list to find
 * unused pages, those pages will then be moved to the inactive list.
 */
/* Scans nr_active_pages >> priority pages; only priority 0 scans the whole list. */
int refill_inactive_scan(unsigned int priority, int oneshot)
{
	struct list_head * page_lru;
	struct page * page;
	int maxscan, page_active = 0;	/* maxscan: number of pages to scan */
	int ret = 0;

	/* Take the lock while messing with the list... */
	spin_lock(&pagemap_lru_lock);
	maxscan = nr_active_pages >> priority;
	while (maxscan-- > 0 && (page_lru = active_list.prev) != &active_list) {
		page = list_entry(page_lru, struct page, lru);

		/* Wrong page on list?! (list corruption, should not happen) */
		if (!PageActive(page)) {	/* pages scanned here must be on the active list */
			printk("VM: refill_inactive, wrong page on list.\n");
			list_del(page_lru);
			nr_active_pages--;
			continue;
		}

		/* Do aging on the pages. Referenced pages age up, others age
		 * down; an age of 0 means the page has not been used for a long time. */
		if (PageTestandClearReferenced(page)) {
			age_page_up_nolock(page);
			page_active = 1;
		} else {
			age_page_down_ageonly(page);
			/*
			 * Since we don't hold a reference on the page
			 * ourselves, we have to do our test a bit more
			 * strict than deactivate_page(). This is needed
			 * since otherwise the system could hang shuffling
			 * unfreeable pages from the active list to the
			 * inactive_dirty list and back again...
			 *
			 * SUBTLE: we can have buffer pages with count 1.
			 */
			/* A higher count (above 2 with buffers, above 1 without) means
			 * user space still maps the page, so it must stay active. */
			if (page->age == 0 && page_count(page) <=
						(page->buffers ? 2 : 1)) {
				deactivate_page_nolock(page);
				page_active = 0;
			} else {
				page_active = 1;
			}
		}
		/*
		 * If the page is still on the active list, move it
		 * to the other end of the list. Otherwise it was
		 * deactivated by age_page_down and we exit successfully.
		 */
		if (page_active || PageActive(page)) {
			list_del(page_lru);	/* still active: move it to the head */
			list_add(page_lru, &active_list);
		} else {
			ret = 1;
			if (oneshot)	/* oneshot: stop after deactivating one page */
				break;
		}
	}
	spin_unlock(&pagemap_lru_lock);

	return ret;
}

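The aging and deactivation test in refill_inactive_scan() can be modeled on a single page. This is a sketch: the constants MODEL_AGE_ADV (3) and MODEL_AGE_MAX (64) and the halving in the age-down step are my assumptions about 2.4's age_page_up_nolock() and age_page_down_ageonly(), and all names are hypothetical.

```c
#include <assert.h>

/* Assumed aging constants (modeled on 2.4's PAGE_AGE_ADV and PAGE_AGE_MAX). */
#define MODEL_AGE_ADV 3
#define MODEL_AGE_MAX 64

/* Hypothetical view of the struct page fields the scan looks at. */
struct page_model {
	int age;
	int count;		/* page_count(page) */
	int has_buffers;	/* page->buffers != NULL */
	int referenced;		/* PG_referenced */
};

/* One visit by the scan. Returns 1 if the page would be deactivated
 * (moved to the inactive dirty list), 0 if it stays active. */
int model_scan_visit(struct page_model *p)
{
	assert(p->age >= 0 && p->age <= MODEL_AGE_MAX);
	if (p->referenced) {		/* PageTestandClearReferenced() */
		p->referenced = 0;
		p->age += MODEL_AGE_ADV;
		if (p->age > MODEL_AGE_MAX)
			p->age = MODEL_AGE_MAX;
		return 0;
	}
	p->age /= 2;			/* assumed age-down rule */
	/* Same test as refill_inactive_scan(): buffers account for one reference. */
	return p->age == 0 && p->count <= (p->has_buffers ? 2 : 1);
}
```

An unreferenced page with age 3 and count 1 survives one visit (age becomes 1) and is deactivated on the next (age 0), whereas a page mapped by a process (count 2, no buffers) stays active even at age 0.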
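That is what kswapd does once it has found that both system-wide memory and per-zone pages are sufficient. kswapd runs an infinite loop: after the steps above it checks again for a global or per-zone shortage. If there is none, it calls interruptible_sleep_on_timeout() to sleep, letting the kernel schedule other processes; after a timeout of HZ jiffies, i.e. one second (HZ, the timer tick rate, is configurable), it is woken up again and repeats the work.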
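The loop can be pictured with a toy model in which one counter stands in for both shortage checks and every call to do_try_to_free_pages() recovers a fixed number of pages. Everything here (toy_vm, the fixed yield, the pass limit) is hypothetical; the background refill_inactive_scan() and the out-of-memory handling of the real loop are left out.

```c
#include <assert.h>

/* Toy memory state; every field is hypothetical. */
struct toy_vm {
	int free_pages;
	int target;		/* stands in for both shortage thresholds */
	int freed_per_call;	/* pages one do_try_to_free_pages() call recovers */
};

static int toy_shortage(const struct toy_vm *vm)
{
	return vm->free_pages < vm->target;
}

/* One pass of the loop body. Returns 1 if kswapd would now go to sleep
 * (interruptible_sleep_on_timeout() in the kernel), 0 if it loops again. */
int toy_kswapd_pass(struct toy_vm *vm)
{
	if (toy_shortage(vm))
		vm->free_pages += vm->freed_per_call;	/* do_try_to_free_pages() */
	return !toy_shortage(vm);
}

/* Runs passes until kswapd would sleep; returns how many passes that took. */
int toy_kswapd_until_sleep(struct toy_vm *vm, int max_passes)
{
	int passes = 0;

	assert(max_passes > 0);
	while (passes < max_passes) {
		passes++;
		if (toy_kswapd_pass(vm))
			break;
	}
	return passes;
}
```

Starting with 100 free pages, a target of 350 and 100 pages recovered per call, the model goes to sleep after the third pass with 400 free pages; with no shortage it sleeps after the first pass without freeing anything.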

2. If kswapd finds that system memory or some zone is short of pages, it calls do_try_to_free_pages() to try to free some pages.

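First, if free pages are short, or there are more inactive dirty pages than free and inactive clean pages combined, it calls page_launder() to clean the inactive dirty pages so that they become immediately allocatable.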

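If the system is still short of pages after page_launder(), it shrinks the dentry and inode caches and calls refill_inactive(). In general, even after these caches are shrunk, the pages are not released immediately but kept on the LRU lists as a reserve. Otherwise, if pages are no longer short, it only calls kmem_cache_reap() to reclaim part of the slab cache.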

static int do_try_to_free_pages(unsigned int gfp_mask, int user)
{
	int ret = 0;

	/*
	 * If free pages are short, or there are more inactive dirty pages
	 * than free plus inactive clean pages, call page_launder() to clean
	 * inactive dirty pages so they become immediately allocatable.
	 */
	if (free_shortage() || nr_inactive_dirty_pages > nr_free_pages() +
			nr_inactive_clean_pages())
		ret += page_launder(gfp_mask, user);

	/*
	 * If memory is still short:
	 * If needed, we move pages from the active list
	 * to the inactive list. We also "eat" pages from
	 * the inode and dentry cache whenever we do this.
	 */
	/* Shrinking the dentry/inode caches does not free pages immediately;
	 * they are kept on the LRU lists as a reserve. */
	if (free_shortage() || inactive_shortage()) {
		shrink_dcache_memory(6, gfp_mask);	/* release dentry cache */
		shrink_icache_memory(6, gfp_mask);	/* release inode cache */
		ret += refill_inactive(gfp_mask, user);	/* user: a process is waiting for memory */
	} else {
		/*
		 * Otherwise, just reclaim slab caches.
		 */
		kmem_cache_reap(gfp_mask);
		ret = 1;
	}

	return ret;
}

That is the overall flow. Next, let's analyze page_launder(), which do_try_to_free_pages() calls.

Its job is to clean ("launder") inactive dirty pages.

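It takes pages one by one from the inactive dirty list and checks whether each was referenced recently (a page on the dirty list can still be accessed, so the check is needed). If the page was referenced, still has age > 0, or is still mapped, it is moved to the active list.

If the page is still dirty, the function checks whether this is the first pass. On the first pass the page is moved to the back of the list and the loop continues. On the second pass (which only happens if free pages are still short and I/O is allowed), it clears the dirty bit and calls the writepage function provided by the page's address_space to write the page out to the swap device.

If the page is not dirty but is used as a buffer cache page, it is first removed from the dirty list, and then try_to_free_buffers() is called, which drops count by one when it succeeds. If freeing the buffers fails, the page goes back onto the inactive dirty list. Otherwise the function checks whether the page has a mapping: if not (the page was only in the buffer cache), the page is freed. If it does and other user processes still map it, it is moved to the active list. Otherwise the page was once mapped but no user maps it any more, so it becomes freeable and is moved to the inactive clean list. (Note that "freeing" here only updates flags and counters; page_cache_release() must still drop count to 0 before the page enters the free lists.) If a page was freed and the system is no longer short of memory, the loop breaks and laundering ends.

Otherwise, if the page is clean and belongs to an address_space mapping, it is moved to the inactive clean list; any other page is moved to the active list.

After one full pass, if pages are still short, a second pass is made.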

int page_launder(int gfp_mask, int sync)
{
	int launder_loop, maxscan, cleaned_pages, maxlaunder;
	int can_get_io_locks;
	struct list_head * page_lru;
	struct page * page;

	/*
	 * We can only grab the IO locks (eg. for flushing dirty
	 * buffers to disk) if __GFP_IO is set.
	 */
	can_get_io_locks = gfp_mask & __GFP_IO;

	launder_loop = 0;
	maxlaunder = 0;
	cleaned_pages = 0;

dirty_page_rescan:
	spin_lock(&pagemap_lru_lock);
	maxscan = nr_inactive_dirty_pages;	/* bounds the scan so no page is handled twice */
	/* Scan the inactive dirty list. */
	while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
				maxscan-- > 0) {
		page = list_entry(page_lru, struct page, lru);

		/* Wrong page on list?! (list corruption, should not happen) */
		if (!PageInactiveDirty(page)) {	/* check the InactiveDirty flag */
			printk("VM: page_launder, wrong page on list.\n");
			list_del(page_lru);	/* remove it from the list */
			nr_inactive_dirty_pages--;
			page->zone->inactive_dirty_pages--;
			continue;
		}

		/* A page on the dirty list may still be accessed.
		 * Page is or was in use?  Move it to the active list. */
		if (PageTestandClearReferenced(page) || page->age > 0 ||
				(!page->buffers && page_count(page) > 1) ||
				page_ramdisk(page)) {
			del_page_from_inactive_dirty_list(page);
			add_page_to_active_list(page);
			continue;
		}

		/*
		 * The page is locked. IO in progress?
		 * Move it to the back of the list.
		 */
		if (TryLockPage(page)) {
			list_del(page_lru);
			list_add(page_lru, &inactive_dirty_list);
			continue;
		}

		/*
		 * Dirty swap-cache page? Write it out if
		 * last copy..
		 */
		if (PageDirty(page)) {
			int (*writepage)(struct page *) = page->mapping->a_ops->writepage;
			int result;

			if (!writepage)	/* no writepage method: move it to the active list */
				goto page_active;

			/* First time through? Move it to the back of the list */
			if (!launder_loop) {
				list_del(page_lru);
				list_add(page_lru, &inactive_dirty_list);
				UnlockPage(page);
				continue;
			}

			/* OK, do a physical asynchronous write to swap.  */
			ClearPageDirty(page);	/* clear PG_dirty so it is not written again */
			page_cache_get(page);	/* count++: kswapd uses the page while writing it out */
			spin_unlock(&pagemap_lru_lock);

			result = writepage(page);
			page_cache_release(page);	/* count--: the write has been issued */

			/* And re-start the thing.. */
			spin_lock(&pagemap_lru_lock);
			if (result != 1)	/* writepage() accepted the page: done with it */
				continue;
			/* writepage refused to do anything */
			set_page_dirty(page);	/* mark it dirty again */
			goto page_active;
		}

		/*
		 * The page is not dirty, but it is used as a buffer
		 * cache page (it has buffers attached).
		 */
		if (page->buffers) {
			int wait, clearedbuf;
			int freed_page = 0;
			/*
			 * Since we might be doing disk IO, we have to
			 * drop the spinlock and take an extra reference
			 * on the page so it doesn't go away from under us.
			 */
			del_page_from_inactive_dirty_list(page);	/* take it off the dirty list */
			page_cache_get(page);	/* kswapd now holds a reference: count++ */
			spin_unlock(&pagemap_lru_lock);

			/* Will we do (asynchronous) IO? */
			if (launder_loop && maxlaunder == 0 && sync)
				wait = 2;	/* Synchronous IO */
			else if (launder_loop && maxlaunder-- > 0)
				wait = 1;	/* Async IO */
			else
				wait = 0;	/* No IO */

			/* Try to free the page buffers; on success count drops by one. */
			clearedbuf = try_to_free_buffers(page, wait);

			/*
			 * Re-take the spinlock. Note that we cannot
			 * unlock the page yet since we're still
			 * accessing the page_struct here...
			 */
			spin_lock(&pagemap_lru_lock);

			/* The buffers were not freed: back onto the dirty list. */
			if (!clearedbuf) {
				add_page_to_inactive_dirty_list(page);

			/* The page was only in the buffer cache. It is not in any
			 * file's inode->i_mapping (e.g. superblocks, inode bitmaps),
			 * so we have successfully freed a page. */
			} else if (!page->mapping) {
				atomic_dec(&buffermem_pages);
				freed_page = 1;
				cleaned_pages++;

			/* The page has more users besides the cache and us
			 * (e.g. several processes map the file): keep it active. */
			} else if (page_count(page) > 2) {
				add_page_to_active_list(page);

			/* Only page->mapping && page_count(page) == 2 is left: the page
			 * is still in an inode->i_mapping but nobody else uses it.
			 * OK, we "created" a freeable page. */
			} else /* page->mapping && page_count(page) == 2 */ {
				add_page_to_inactive_clean_list(page);
				cleaned_pages++;
			}

			/*
			 * Unlock the page and drop the extra reference.
			 * We can only do it here because we're accessing
			 * the page struct above.
			 */
			UnlockPage(page);
			page_cache_release(page);	/* a freed page finally returns to the free lists here */

			/*
			 * If we're freeing buffer cache pages, stop when
			 * we've got enough free memory.
			 * (A page was freed and memory is no longer short: stop.)
			 */
			if (freed_page && !free_shortage())
				break;
			continue;
		/* Not dirty, and it belongs to an address_space mapping. */
		} else if (page->mapping && !PageDirty(page)) {
			/*
			 * If a page had an extra reference in
			 * deactivate_page(), we will find it here.
			 * Now the page is really freeable, so we
			 * move it to the inactive_clean list.
			 */
			del_page_from_inactive_dirty_list(page);	/* move to the inactive clean list */
			add_page_to_inactive_clean_list(page);
			UnlockPage(page);
			cleaned_pages++;
		} else {
page_active:
			/*
			 * OK, we don't know what to do with the page.
			 * It's no use keeping it here, so we move it to
			 * the active list.
			 */
			del_page_from_inactive_dirty_list(page);
			add_page_to_active_list(page);
			UnlockPage(page);
		}
	}
	spin_unlock(&pagemap_lru_lock);

	/*
	 * If we don't have enough free pages, we loop back once
	 * to queue the dirty pages for writeout. When we were called
	 * by a user process (that /needs/ a free page) and we didn't
	 * free anything yet, we wait synchronously on the writeout of
	 * MAX_SYNC_LAUNDER pages.
	 *
	 * We also wake up bdflush, since bdflush should, under most
	 * loads, flush out the dirty pages before we have to wait on
	 * IO.
	 */
	/* Memory still short: make a second pass. */
	if (can_get_io_locks && !launder_loop && free_shortage()) {
		launder_loop = 1;
		/* If we cleaned pages, never do synchronous IO. */
		if (cleaned_pages)
			sync = 0;
		/* We only do a few "out of order" flushes. */
		maxlaunder = MAX_LAUNDER;
		/* Kflushd takes care of the rest. */
		wakeup_bdflush(0);
		goto dirty_page_rescan;
	}

	/* Return the number of pages moved to the inactive_clean list. */
	return cleaned_pages;
}
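The per-page decisions in page_launder() can be condensed into a classification function. This is a simplified sketch, not the kernel's logic verbatim: it assumes writepage() accepts the page, ignores page_ramdisk(), and separates count (page_count() at the first test) from count_after_buffers (page_count() seen after try_to_free_buffers(), including kswapd's own reference). All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical outcome codes for one page visited by page_launder(). */
enum launder_outcome {
	TO_ACTIVE,		/* in use, or nothing else to do with it */
	REQUEUE,		/* locked, or dirty on the first pass */
	WRITE_OUT,		/* dirty on the second pass: writepage() issued */
	BACK_TO_DIRTY,		/* try_to_free_buffers() failed */
	FREED,			/* buffer-cache-only page freed */
	TO_INACTIVE_CLEAN,	/* freeable page */
};

/* Hypothetical page state; booleans are 0 or 1. */
struct lpage {
	int referenced, age, count, locked, dirty, has_writepage;
	int has_buffers, buffers_freed, has_mapping;
	int count_after_buffers;	/* page_count() after try_to_free_buffers() */
};

/* launder_loop is 0 on the first pass and 1 on the second. */
enum launder_outcome model_launder_page(const struct lpage *p, int launder_loop)
{
	assert(launder_loop == 0 || launder_loop == 1);
	if (p->referenced || p->age > 0 || (!p->has_buffers && p->count > 1))
		return TO_ACTIVE;
	if (p->locked)
		return REQUEUE;
	if (p->dirty) {
		if (!p->has_writepage)
			return TO_ACTIVE;
		return launder_loop ? WRITE_OUT : REQUEUE;	/* assumes writepage() accepts */
	}
	if (p->has_buffers) {
		if (!p->buffers_freed)
			return BACK_TO_DIRTY;
		if (!p->has_mapping)
			return FREED;
		/* more users than the page cache and kswapd's extra reference? */
		return p->count_after_buffers > 2 ? TO_ACTIVE : TO_INACTIVE_CLEAN;
	}
	if (p->has_mapping)
		return TO_INACTIVE_CLEAN;
	return TO_ACTIVE;
}
```

The two-pass design shows up directly: the same dirty page is merely requeued on the first pass and written out only on the second, which gives bdflush a chance to clean it first.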

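If pages are still short after page_launder(), do_try_to_free_pages() calls shrink_dcache_memory() and shrink_icache_memory() to release the dentry and inode caches, and calls refill_inactive() for further reclaim; if pages are plentiful, it only calls kmem_cache_reap() to reclaim slab cache.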

Next, let's analyze refill_inactive().

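It first determines how many pages the system still needs (inactive_shortage() + free_shortage(), or just 1 << page_cluster when called on behalf of a user process), then reclaims slab caches, and then runs a do-while loop that starts at the lowest priority, 6, and increases the effort towards 0.

Each iteration calls refill_inactive_scan() (analyzed above) to move some active pages to the inactive dirty list, then calls shrink_dcache_memory() and shrink_icache_memory() to release the dentry and inode caches.

It then calls swap_out() repeatedly, until count pages have been handled, to pick a process, scan its page tables, and find pages that can be moved to the inactive state. The loop stops early once the inactive or free shortage is gone, and the priority is lowered only after a pass that made no progress. Finally, refill_inactive_scan() is called at priority 0, again until count is used up, and the function returns.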

/*
 * We need to make the locks finer granularity, but right
 * now we need this so that we can do page allocations
 * without holding the kernel lock etc.
 *
 * We want to try to free "count" pages, and we want to
 * cluster them so that we get good swap-out behaviour.
 *
 * OTOH, if we're a user process (and not kswapd), we
 * really care about latency. In that case we don't try
 * to free too many pages.
 */
static int refill_inactive(unsigned int gfp_mask, int user)
{
	int priority, count, start_count, made_progress;

	count = inactive_shortage() + free_shortage();	/* number of pages needed */
	if (user)
		count = (1 << page_cluster);
	start_count = count;

	/* Always trim SLAB caches when memory gets low. */
	kmem_cache_reap(gfp_mask);

	priority = 6;	/* start at the lowest priority, 6 */
	do {
		made_progress = 0;

		/* On every pass, check whether this task has been asked to
		 * reschedule (e.g. an interrupt handler set need_resched). */
		if (current->need_resched) {
			__set_current_state(TASK_RUNNING);
			schedule();
		}

		/* Scan the active list for pages that can be deactivated. */
		while (refill_inactive_scan(priority, 1)) {
			made_progress = 1;
			if (--count <= 0)
				goto done;
		}

		/*
		 * don't be too light against the d/i cache since
		 * refill_inactive() almost never fail when there's
		 * really plenty of memory free.
		 */
		shrink_dcache_memory(priority, gfp_mask);
		shrink_icache_memory(priority, gfp_mask);

		/*
		 * Then, try to page stuff out..
		 * (pick a process, walk its page tables and unmap
		 * pages that can be deactivated)
		 */
		while (swap_out(priority, gfp_mask)) {
			made_progress = 1;
			if (--count <= 0)
				goto done;
		}

		/*
		 * If we either have enough free memory, or if
		 * page_launder() will be able to make enough
		 * free memory, then stop.
		 */
		if (!inactive_shortage() || !free_shortage())
			goto done;

		/*
		 * Only switch to a lower "priority" if we
		 * didn't make any useful progress in the
		 * last loop.
		 */
		if (!made_progress)
			priority--;
	} while (priority >= 0);

	/* Always end on a refill_inactive.., may sleep... */
	while (refill_inactive_scan(0, 1)) {
		if (--count <= 0)
			goto done;
	}

done:
	return (count < start_count);
}
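Two quantities drive refill_inactive(): how many pages one refill_inactive_scan() call may examine at a given priority, and when the priority drops. The sketch below models both; progress[] is a hypothetical record of whether each pass made progress, and the function names are made up.

```c
#include <assert.h>

/* Pages one refill_inactive_scan() call examines: nr_active_pages >> priority. */
int model_maxscan(int nr_active_pages, int priority)
{
	assert(priority >= 0 && priority <= 6);
	return nr_active_pages >> priority;
}

/* Target of refill_inactive(): the measured shortage for kswapd,
 * or a fixed batch of 1 << page_cluster pages for a user process. */
int model_refill_count(int inactive_short, int free_short, int user, int page_cluster)
{
	if (user)
		return 1 << page_cluster;
	return inactive_short + free_short;
}

/* progress[k] is nonzero if pass k made progress (hypothetical input).
 * Starting at 6, the priority drops only after a pass without progress;
 * a result of -1 means the do-while loop has ended. */
int model_priority_after(const int *progress, int npasses)
{
	int priority = 6, k;

	for (k = 0; k < npasses && priority >= 0; k++)
		if (!progress[k])
			priority--;
	return priority;
}
```

With 6400 active pages, a priority-6 scan looks at 100 pages and a priority-0 scan at all 6400; a kswapd call facing shortages of 300 and 80 aims for 380 pages, while a user call with page_cluster = 4 stops after 16.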

Now let's look at how swap_out() is implemented.

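swap_out() computes counter from the number of threads in the system and the priority it was called with: counter = (nr_threads << SWAP_SHIFT) >> priority. counter bounds the number of loop iterations; each iteration picks the most suitable process, best, from all processes, breaks its page mappings, and moves those pages further towards the inactive state. The selection combines "rob the rich to help the poor" (prefer the process with the most resident pages) with "take turns" (swap_cnt makes sure every process gets its turn). Note that in the version shown, both branches at the end of the loop body break, so a single call handles at most one process.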

static int swap_out(unsigned int priority, int gfp_mask)
{
	int counter;	/* number of loop iterations */
	int __ret = 0;

	/*
	 * We make one or two passes through the task list, indexed by
	 * assign = {0, 1}:
	 *   Pass 1: select the swappable task with maximal RSS that has
	 *         not yet been swapped out.
	 *   Pass 2: re-assign rss swap_cnt values, then select as above.
	 *
	 * With this approach, there's no need to remember the last task
	 * swapped out.  If the swap-out fails, we clear swap_cnt so the
	 * task won't be selected again until all others have been tried.
	 *
	 * Think of swap_cnt as a "shadow rss" - it tells us which process
	 * we want to page out (always try largest first).
	 */
	/* Derived from the number of threads and the priority. */
	counter = (nr_threads << SWAP_SHIFT) >> priority;
	if (counter < 1)
		counter = 1;

	for (; counter >= 0; counter--) {
		struct list_head *p;
		unsigned long max_cnt = 0;
		struct mm_struct *best = NULL;
		int assign = 0;
		int found_task = 0;
	select:
		spin_lock(&mmlist_lock);
		p = init_mm.mmlist.next;
		for (; p != &init_mm.mmlist; p = p->next) {
			struct mm_struct *mm = list_entry(p, struct mm_struct, mmlist);
			if (mm->rss <= 0)
				continue;
			found_task++;
			/* Refresh swap_cnt? */
			if (assign == 1) {
				/*
				 * Second pass: no mm had a non-zero swap_cnt, so every
				 * swap_cnt is refilled from rss and selection starts
				 * again from the richest process. swap_cnt implements
				 * "taking turns"; rss implements "rob the rich".
				 */
				mm->swap_cnt = (mm->rss >> SWAP_SHIFT);	/* pages not yet examined in this round */
				if (mm->swap_cnt < SWAP_MIN)
					mm->swap_cnt = SWAP_MIN;
			}
			if (mm->swap_cnt > max_cnt) {
				max_cnt = mm->swap_cnt;
				best = mm;
			}
		}
		/* After the loop, best is the mm with the largest swap_cnt. */

		/* Make sure it doesn't disappear */
		if (best)
			atomic_inc(&best->mm_users);
		spin_unlock(&mmlist_lock);

		/*
		 * We have dropped the tasklist_lock, but we
		 * know that "mm" still exists: we are running
		 * with the big kernel lock, and exit_mm()
		 * cannot race with us.
		 */
		if (!best) {
			/* Reached on the first pass when every swap_cnt is 0;
			 * after one refill it is not entered again. */
			if (!assign && found_task > 0) {
				assign = 1;	/* second pass */
				goto select;
			}
			break;
		} else {
			/* Found the best process to swap out from. */
			__ret = swap_out_mm(best, gfp_mask);
			mmput(best);
			break;
		}
	}
	return __ret;
}
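The two-pass victim selection ("rob the rich" plus "take turns") can be modeled on an array instead of the mmlist. SWAP_SHIFT = 5 and SWAP_MIN = 8 are assumed values, and mm_model and model_select_victim() are hypothetical.

```c
#include <assert.h>

#define MODEL_SWAP_SHIFT 5	/* assumed value of SWAP_SHIFT */
#define MODEL_SWAP_MIN   8	/* assumed value of SWAP_MIN */

/* Hypothetical stand-in for the two mm_struct fields swap_out() uses. */
struct mm_model {
	long rss;
	unsigned long swap_cnt;
};

/* Returns the index of the mm swap_out() would pick, or -1 if none.
 * Pass 1 (assign == 0): largest non-zero swap_cnt ("take turns").
 * Pass 2 (assign == 1): refill swap_cnt from rss, then pick the largest
 * ("rob the rich"). Pass 2 runs only if pass 1 found no candidate. */
int model_select_victim(struct mm_model *mms, int n)
{
	int assign, i;

	assert(n >= 0);
	for (assign = 0; assign <= 1; assign++) {
		unsigned long max_cnt = 0;
		int best = -1, found_task = 0;

		for (i = 0; i < n; i++) {
			if (mms[i].rss <= 0)
				continue;
			found_task++;
			if (assign == 1) {
				mms[i].swap_cnt = mms[i].rss >> MODEL_SWAP_SHIFT;
				if (mms[i].swap_cnt < MODEL_SWAP_MIN)
					mms[i].swap_cnt = MODEL_SWAP_MIN;
			}
			if (mms[i].swap_cnt > max_cnt) {
				max_cnt = mms[i].swap_cnt;
				best = i;
			}
		}
		if (best >= 0 || found_task == 0)
			return best;
	}
	return -1;
}
```

With resident sizes 1000 and 4000 and all swap_cnt at 0, the refill gives 31 and 125, so the larger process is chosen first; once its swap_cnt is used up, the smaller one gets its turn without another refill.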

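swap_out_mm() walks the chosen process's virtual memory areas; the call chain is swap_out_mm() > swap_out_vma() > swap_out_pgd() > swap_out_pmd() > try_to_swap_out(). try_to_swap_out() is the crucial step, so I will mainly analyze its implementation. Its signature is static int try_to_swap_out(struct mm_struct * mm, struct vm_area_struct * vma, unsigned long address, pte_t * page_table, int gfp_mask); note that page_table points to a single page table entry, not to a whole page table.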

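It first checks that the page to be swapped out is valid, then whether it was accessed recently; if so, its age is increased and the page is left alone. Even if the page is not on the active list and was not accessed recently, it cannot be swapped out immediately: its age is decreased and it stays under observation until page->age reaches 0. Once page->age is 0 and the page has passed the tests above, its page table entry is cleared. Next, if the page is already in the swap cache and the pte shows it was written recently, the page is marked dirty; the pte is then set to the swap entry, the page is moved to the inactive dirty list, and the reference to the page is released.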

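If the pte is not dirty (and the page is not in the swap cache), the mapping is simply dropped rather than replaced with a swap entry, since the page can be recovered by paging it in again. If the page is dirty, comes from an mmap'ed file (page->mapping is set), and is not in the swap cache, it is marked dirty with set_page_dirty(), which moves it to that file mapping's dirty page list, and the mapping is dropped.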

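If the page is dirty, does not belong to a file mapping, and is not in the swap cache, it is an anonymous page that has not been accessed for a long time. A swap slot must first be allocated on the swap device with get_swap_page(); if no swap space is left, the pte is restored. The page is then marked dirty, and its contents are written to that slot later (page_launder() performs the actual write).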

add_to_swap_cache() links the page into the swapper_space page queue and onto the active list.

That completes the scan of one process's pages.

/*
 * The swap-out functions return 1 if they successfully
 * threw something out, and we got a free page. It returns
 * zero if it couldn't do anything, and any other value
 * indicates it decreased rss, but the page was shared.
 *
 * NOTE! If it sleeps, it *must* return 1 to make sure we
 * don't continue with the swap-out. Otherwise we may be
 * using a process that no longer actually exists (it might
 * have died while we slept).
 */
static int try_to_swap_out(struct mm_struct * mm, struct vm_area_struct* vma, unsigned long address, pte_t * page_table, int gfp_mask)
{
	pte_t pte;
	swp_entry_t entry;
	struct page * page;
	int onlist;

	pte = *page_table;	/* read the page table entry */
	if (!pte_present(pte))	/* is the page in physical memory? */
		goto out_failed;
	page = pte_page(pte);	/* the physical page it maps */
	if ((!VALID_PAGE(page)) || PageReserved(page))	/* invalid, or must not be swapped out */
		goto out_failed;

	if (!mm->swap_cnt)
		return 1;

	mm->swap_cnt--;	/* one more page of this mm examined */

	onlist = PageActive(page);	/* on the active list? */
	/* Don't look at this pte if it's been accessed recently. */
	if (ptep_test_and_clear_young(page_table)) {	/* accessed ("young") since the last check? */
		age_page_up(page);	/* keep it under observation longer */
		goto out_failed;
	}
	/* Not on the active list and not recently accessed: still not swapped
	 * out immediately; age it down and keep observing until page->age is 0. */
	if (!onlist)
		age_page_down_ageonly(page);

	/*
	 * If the page is in active use by us, or if the page
	 * is in active use by others, don't unmap it or
	 * (worse) start unneeded IO.
	 */
	if (page->age > 0)
		goto out_failed;

	if (TryLockPage(page))
		goto out_failed;

	/* From this point on, the odds are that we're going to
	 * nuke this pte, so read and clear the pte.  This hook
	 * is needed on CPUs which update the accessed and dirty
	 * bits in hardware.
	 */
	pte = ptep_get_and_clear(page_table);	/* clear the pte (undo the mapping) */
	flush_tlb_page(vma, address);

	/*
	 * Is the page already in the swap cache? If so, then
	 * we can just drop our reference to it without doing
	 * any IO - it's already up-to-date on disk.
	 *
	 * Return 0, as we didn't actually free any real
	 * memory, and we should just continue our scan.
	 */
	if (PageSwapCache(page)) {
		entry.val = page->index;
		if (pte_dirty(pte))
			set_page_dirty(page);	/* written since it was last swapped: mark dirty */
set_swap_pte:
		swap_duplicate(entry);	/* validate the entry and take another reference */
		set_pte(page_table, swp_entry_to_pte(entry));	/* the pte now holds the swap entry */
drop_pte:
		UnlockPage(page);
		mm->rss--;	/* one fewer resident page mapped by this mm */
		deactivate_page(page);	/* move it from the active to the inactive list */
		page_cache_release(page);	/* drop the reference the pte held */
out_failed:
		return 0;
	}

	/*
	 * Is it a clean page? Then it must be recoverable
	 * by just paging it in again, and we can just drop
	 * it..
	 *
	 * However, this won't actually free any real
	 * memory, as the page will just be in the page cache
	 * somewhere, and as such we should just continue
	 * our scan.
	 *
	 * Basically, this just makes it possible for us to do
	 * some real work in the future in "refill_inactive()".
	 */
	flush_cache_page(vma, address);
	if (!pte_dirty(pte))
		goto drop_pte;

	/*
	 * Ok, it's really dirty. That means that
	 * we should either create a new swap cache
	 * entry for it, or we should write it back
	 * to its own backing store.
	 */
	if (page->mapping) {
		set_page_dirty(page);
		goto drop_pte;
	}

	/*
	 * This is a dirty, swappable page.  First of all,
	 * get a suitable swap entry for it, and make sure
	 * we have the swap cache set up to associate the
	 * page with that swap entry.
	 */
	entry = get_swap_page();
	if (!entry.val)
		goto out_unlock_restore; /* No swap space left */

	/* Add it to the swap cache and mark it dirty */
	add_to_swap_cache(page, entry);
	set_page_dirty(page);
	goto set_swap_pte;

out_unlock_restore:
	set_pte(page_table, pte);
	UnlockPage(page);
	return 0;
}
