9.1. 阻塞进程

9.1.1. 阻塞进程

当别人让你做一件你不能马上去做的事时,你会如何反映?如果你是人类的话,而且对方也是人类的话,你只会说:“现在不行,我忙着在。闪开!”但是如果你是一个内核模块而且你被一个进程以同样的问题困扰,你会有另外一个选择。你可以让该进程休眠直到你可以为它服务时。毕竟,这样的情况在内核中时时刻刻都在发生(这就是系统让多进程在单CPU上同时运行的方法)。

这个内核模块就是一个这样的例子。文件(名叫 /proc/sleep )只可以在同一时刻被一个进程打开。如果该文件已经被打开,内核模块将调用函数 module_interruptible_sleep_on[1]。该函数修改task的状态(task是一个内核中的结构体数据结构,其中保存着对应进程的信息和该进程正在调用的系统调用,如果有的话)为TASK_INTERRUPTIBLE,意味着改进程将不会继续运行直到被唤醒,然后被添加到系统的进程等待队列中,一个等待打开该文件的队列中。然后,该函数调用系统调度器去切换到另一个不同的但有CPU运算请求的进程。

当一个进程处理完该文件并且关闭了该文件, module_close 酒杯调用执行了。该函数唤醒所有在等待队列中的进程(还没有只唤醒特定进程的机制)。然后该函数返回,那个刚刚关闭文件的进程得以继续运行。及时的,进程调度器会判定该进程执行已执行完毕,将CPU转让给别的进程。被提供CPU使用权的那个进程就恰好从先前系统调用 module_interruptible_sleep_on[2] 后的地方开始继续执行。它可以设置一个全局变量去通知别的进程该文件已被打开占用了。当别的请求该文件的进程获得CPU时间片时,它们将检测该变量然后返回休眠。

更有趣的是, module_close 并不垄断唤醒等待中的请求文件的进程的权力。一个信号,像 Ctrl+c (SIGINT) 也能够唤醒别的进程[3] 。在这种情况下,我们想立即返回 -EINTR 。这对用户很重要,举个例子来说,用户可以在某个进程接受到文件前终止该进程。

还有一点值得注意。有些时候进程并不愿意休眠,它们要么立即执行它们想做的,要么被告知任务无法进行。这样的进程在打开文件时会使用标志 O_NONBLOCK 。在别的进程被阻塞时内核应该做出的响应是返回错误代码 -EAGAIN ,像在本例中对该文件的请求的进程。程序 cat_noblock,在本章的源代码目录下可以找到,就能够使用标志位  O_NONBLOCK 打开文件。

Example 9-1. sleep.c

/*  sleep.c - create a /proc file, and if several processes try to open it at
 *  the same time, put all but one to sleep
 */

#include <linux/kernel.h>                   /* We're doing kernel work */
#include <linux/module.h>                   /* Specifically, a module */

/* Deal with CONFIG_MODVERSIONS */
#if CONFIG_MODVERSIONS==1
#define MODVERSIONS
#include <linux/modversions.h>
#endif        

/* Necessary because we use proc fs */
#include <linux/proc_fs.h>

/* For putting processes to sleep and waking them up */
#include <linux/sched.h>
#include <linux/wrapper.h>

/* In 2.2.3 /usr/include/linux/version.h includes a macro for this, but 2.0.35
 * doesn't - so I add it here if necessary.
 */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a,b,c) ((a)*65536+(b)*256+(c))
#endif

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
#include <asm/uaccess.h>                    /* for get_user and put_user */
#endif

/* The module's file functions */

/* Here we keep the last message received, to prove that we can process our
 * input
 */
#define MESSAGE_LENGTH 80
static char Message[MESSAGE_LENGTH];

/* Since we use the file operations struct, we can't use the special proc
 * output provisions - we have to use a standard read function, which is this
 * function
 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
static ssize_t module_output (
   struct file *file,                      /* The file read */
   char *buf,           /* The buffer to put data to (in the user segment) */
   size_t len,                             /* The length of the buffer */
   loff_t *offset)                         /* Offset in the file - ignore */
#else
static int module_output (
   struct inode *inode,                    /* The inode read */
   struct file *file,                      /* The file read */
   char *buf,           /* The buffer to put data to (in the user segment) */
   int len)                                /* The length of the buffer */
#endif
{
   static int finished = 0;
   int i;
   char message[MESSAGE_LENGTH+30];

   /* Return 0 to signify end of file - that we have nothing more to say at this
    * point.
    */
   if (finished) {
      finished = 0;
      return 0;
   }

   /* If you don't understand this by now, you're hopeless as a kernel
    * programmer.
    */
   sprintf(message, "Last input:%s\n", Message);
   for (i = 0; i < len && message[i]; i++) 
      put_user(message[i], buf+i);

   finished = 1;
   return i;                            /* Return the number of bytes "read" */
}

/* This function receives input from the user when the user writes to the /proc
 * file.
 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
static ssize_t module_input (
   struct file *file,                     /* The file itself */
   const char *buf,                       /* The buffer with input */
   size_t length,                         /* The buffer's length */
   loff_t *offset)                        /* offset to file - ignore */
#else
static int module_input (
   struct inode *inode,                   /* The file's inode */
   struct file *file,                     /* The file itself */
   const char *buf,                       /* The buffer with the input */
   int length)                            /* The buffer's length */
#endif
{
   int i;

   /* Put the input into Message, where module_output will later be able to use
    * it
    */
   for(i = 0; i < MESSAGE_LENGTH-1 && i < length; i++)
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
      get_user(Message[i], buf+i);
#else
      Message[i] = get_user(buf+i);
#endif
   /* we want a standard, zero terminated string */
   Message[i] = '\0';  
  
   /* We need to return the number of input characters used */
   return i;
}

/* 1 if the file is currently open by somebody */
int Already_Open = 0;

/* Queue of processes who want our file */
static struct wait_queue *WaitQ = NULL;

/* Called when the /proc file is opened */
static int module_open(struct inode *inode, struct file *file)
{
   /* If the file's flags include O_NONBLOCK, it means the process doesn't want
    * to wait for the file.  In this case, if the file is already open, we
    * should fail with -EAGAIN, meaning "you'll have to try again", instead of
    * blocking a process which would rather stay awake.
    */
   if ((file->f_flags & O_NONBLOCK) && Already_Open) 
      return -EAGAIN;

	 /* This is the correct place for MOD_INC_USE_COUNT because if a process is
    * in the loop, which is within the kernel module, the kernel module must
    * not be removed.
    */
   MOD_INC_USE_COUNT;

   /* If the file is already open, wait until it isn't */
   while (Already_Open) 
   {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
      int i, is_sig = 0;
#endif

      /* This function puts the current process, including any system calls,
       * such as us, to sleep.  Execution will be resumed right after the
       * function call, either because somebody called wake_up(&WaitQ) (only
       * module_close does that, when the file is closed) or when a signal,
       * such as Ctrl-C, is sent to the process
       */
      module_interruptible_sleep_on(&WaitQ);
 
      /* If we woke up because we got a signal we're not blocking, return
       * -EINTR (fail the system call).  This allows processes to be killed or
       * stopped.
       */

/*
 * Emmanuel Papirakis:
 *
 * This is a little update to work with 2.2.*.  Signals now are contained in
 * two words (64 bits) and are stored in a structure that contains an array of
 * two unsigned longs.  We now have to make 2 checks in our if.
 *
 * Ori Pomerantz:
 *
 * Nobody promised me they'll never use more than 64 bits, or that this book
 * won't be used for a version of Linux with a word size of 16 bits.  This code
 * would work in any case.
 */	  
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
      for (i = 0; i < _NSIG_WORDS && !is_sig; i++)
         is_sig = current->signal.sig[i] & ~current->blocked.sig[i];

      if (is_sig) {
#else
      if (current->signal & ~current->blocked) {
#endif
         /* It's important to put MOD_DEC_USE_COUNT here, because for processes
          * where the open is interrupted there will never be a corresponding
          * close. If we don't decrement the usage count here, we will be left
          * with a positive usage count which we'll have no way to bring down
          * to zero, giving us an immortal module, which can only be killed by
          * rebooting the machine.
          */
         MOD_DEC_USE_COUNT;
         return -EINTR;
      }
   }

   /* If we got here, Already_Open must be zero */

   /* Open the file */
   Already_Open = 1;
   return 0;                                 /* Allow the access */
}

/* Called when the /proc file is closed */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
int module_close(struct inode *inode, struct file *file)
#else
void module_close(struct inode *inode, struct file *file)
#endif
{
   /* Set Already_Open to zero, so one of the processes in the WaitQ will be
    * able to set Already_Open back to one and to open the file.  All the other
    * processes will be called when Already_Open is back to one, so they'll go
    * back to sleep.
    */
   Already_Open = 0;

   /* Wake up all the processes in WaitQ, so if anybody is waiting for the
    * file, they can have it.
    */
   module_wake_up(&WaitQ);

   MOD_DEC_USE_COUNT;

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
   return 0;                                 /* success */
#endif
}

/* This function decides whether to allow an operation (return zero) or not
 * allow it (return a non-zero which indicates why it is not allowed).
 *
 * The operation can be one of the following values:
 * 0 - Execute (run the "file" - meaningless in our case)
 * 2 - Write (input to the kernel module)
 * 4 - Read (output from the kernel module)
 *
 * This is the real function that checks file permissions. The permissions
 * returned by ls -l are for referece only, and can be overridden here. 
 */
static int module_permission(struct inode *inode, int op)
{
   /* We allow everybody to read from our module, but only root (uid 0) may
    * write to it
    */ 
   if (op == 4 || (op == 2 && current->euid == 0))
      return 0; 

   /* If it's anything else, access is denied */
   return -EACCES;
}

/* Structures to register as the /proc file, with pointers to all the relevant
 * functions. 
 */

/* File operations for our proc file. This is where we place pointers to all
 * the functions called when somebody tries to do something to our file. NULL
 * means we don't want to deal with something.
 */
static struct file_operations File_Ops_4_Our_Proc_File = {
   NULL,                                   /* lseek */
   module_output,                          /* "read" from the file */
   module_input,                           /* "write" to the file */
   NULL,                                   /* readdir */
   NULL,                                   /* select */
   NULL,                                   /* ioctl */
   NULL,                                   /* mmap */
   module_open,                    /* called when the /proc file is opened */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
   NULL,                                   /* flush */
#endif
   module_close};                          /* called when it's classed */

/* Inode operations for our proc file.  We need it so we'll have somewhere to
 * specify the file operations structure we want to use, and the function we
 * use for permissions. It's also possible to specify functions to be called
 * for anything else which could be done to an inode (although we don't bother,
 * we just put NULL).
 */
static struct inode_operations Inode_Ops_4_Our_Proc_File = {
   &File_Ops_4_Our_Proc_File,
   NULL,                                   /* create */
   NULL,                                   /* lookup */
   NULL,                                   /* link */
   NULL,                                   /* unlink */
   NULL,                                   /* symlink */
   NULL,                                   /* mkdir */
   NULL,                                   /* rmdir */
   NULL,                                   /* mknod */
   NULL,                                   /* rename */
   NULL,                                   /* readlink */
   NULL,                                   /* follow_link */
   NULL,                                   /* readpage */
   NULL,                                   /* writepage */
   NULL,                                   /* bmap */
   NULL,                                   /* truncate */
   module_permission};                     /* check for permissions */

/* Directory entry */
static struct proc_dir_entry Our_Proc_File = {
	 0,                 /* Inode number - ignore, it will be filled by 
                       * proc_register[_dynamic]
                       */
   5,                                      /* Length of the file name */
   "sleep",                                /* The file name */

   /* File mode - this is a regular file which can be read by its owner, its
    * group, and everybody else. Also, its owner can write to it.
    *
    * Actually, this field is just for reference, it's module_permission that
    * does the actual check. It could use this field, but in our
    * implementation it doesn't, for simplicity.
    */
   S_IFREG | S_IRUGO | S_IWUSR, 
   1,        /* Number of links (directories where the file is referenced) */
   0, 0,     /* The uid and gid for the file - we give it to root */
   80,       /* The size of the file reported by ls. */

   /* A pointer to the inode structure for the file, if we need it. In our
    * case we do, because we need a write function.
    */
   &Inode_Ops_4_Our_Proc_File, 

   /* The read function for the file.  Irrelevant, because we put it in the
    * inode structure above
    */
   NULL}; 

/* Module initialization and cleanup */

/* Initialize the module - register the proc file */
int init_module()
{
   /* Success if proc_register_dynamic is a success, failure otherwise */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
   return proc_register(&proc_root, &Our_Proc_File);
#else
   return proc_register_dynamic(&proc_root, &Our_Proc_File);
#endif 

   /* proc_root is the root directory for the proc fs (/proc).  This is where
    * we want our file to be located. 
    */
}

/* Cleanup - unregister our file from /proc.  This could get dangerous if
 * there are still processes waiting in WaitQ, because they are inside our
 * open function, which will get unloaded. I'll explain how to avoid removal
 * of a kernel module in such a case in chapter 10.
 */
void cleanup_module()
{
   proc_unregister(&proc_root, Our_Proc_File.low_ino);
}  

Notes

[1]

最方便的保持某个文件被打开的方法是使用命令 tail -f 打开该文件。

[2]

这就意味着该进程仍然在内核态中——该进程已经调用了 open 的系统调用,但系统调用却没有返回。在这段时间内该进程将不会得知别人正在使用CPU。

[3]

这是因为我们使用的是 module_interruptible_sleep_on. 我们也可以使用 module_sleep_on ,但这样会导致一些十分愤怒的用户,因为他们的 Ctrl+c 将不起任何作用。