2.6 scheduling: a beginner's question, advice welcome

unbutun 2009-12-23 04:57:40

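Scheduling policies:
In Linux 2.6 there are still three scheduling policies: SCHED_OTHER, SCHED_FIFO and SCHED_RR.
SCHED_OTHER: conventional processes, scheduled by priority.
SCHED_FIFO: real-time processes, scheduled with a simple first-in, first-out algorithm.
SCHED_RR: real-time processes; SCHED_FIFO plus time slices, i.e. a real-time round-robin algorithm.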

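The first is the policy for conventional processes; the other two are real-time policies.
The difference between SCHED_FIFO and SCHED_RR:
Under SCHED_FIFO, the running real-time process keeps the CPU until it gives it up on its own; it is preempted only when a more urgent, higher-priority real-time process needs to run. Under SCHED_RR, the process shares the CPU in round-robin fashion with the other real-time processes of the same priority: when its time slice is used up, it is moved to the tail of the run queue.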

Kernel: 2.6.22

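1. Under SCHED_FIFO, if no process has a higher priority than the current one, the current process keeps running. Question: when a higher-priority process does show up, how does the preemption happen? Which code does the CPU run to find out that there is a higher-priority process and switch to it? I looked through schedule() and it doesn't mention the SCHED_FIFO macro at all, and I saw no trace of preemption there either.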

2. How are SCHED_FIFO and SCHED_RR implemented? I searched the code and only a few places mention SCHED_FIFO or SCHED_RR.

For the questions above, I'd like answers backed by code. If it's just theory, I could Google it, right?

I小码哥 2010-04-23

Mm-hm.
Thanks.

unbutun 2010-01-09

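[Quote=Quoting reply #46 by fetag:]
I'm at work and don't have much time to check the kernel code to verify the logic, so I'll just jot a few things down; I hope it helps the OP. I won't repeat what you already said.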

[/Quote]

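Thanks to the brother above for filling in the user-space side so nicely.

I'll leave this thread open for now and take another look when I have time. Everyone is welcome to keep discussing.

Heh.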

独孤过儿 2010-01-05

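I'm at work and don't have much time to check the kernel code to verify the logic, so I'll just jot a few things down; I hope it helps the OP. I won't repeat what you already said.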

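1. In fact, every process, when it is created, essentially exists as one or more threads. On a PC, the default scheduling policy of a thread is SCHED_OTHER, and under this policy there is no notion of priority. You can verify this with sched_get_priority_max() and sched_get_priority_min(): both return 0.

The other two policies have a priority range of 1-99, at least on my system; again, you can get it with the same two functions.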

2. Most Linux systems now use NPTL as the thread library, and its implementation differs somewhat from the POSIX standard. If you use the SCHED_FIFO or SCHED_RR policy, you will certainly set the thread's scheduling policy before creating the thread, and that is done with pthread_attr_setschedpolicy(). That makes the problem simple: just follow this function and you'll see how it takes effect in the kernel. Here is the source from glibc 2.10.1:

int
__pthread_attr_setschedpolicy (attr, policy)
     pthread_attr_t *attr;
     int policy;
{
  struct pthread_attr *iattr;

  assert (sizeof (*attr) >= sizeof (struct pthread_attr));
  iattr = (struct pthread_attr *) attr;

  /* Catch invalid values.  */
  if (policy != SCHED_OTHER && policy != SCHED_FIFO && policy != SCHED_RR)
    return EINVAL;

  /* Store the new values.  */
  iattr->schedpolicy = policy;

  /* Remember we set the value.  */
  iattr->flags |= ATTR_FLAG_POLICY_SET;

  return 0;
}

This function needs no explanation: it is simple and well commented. The key point is that it stores the new scheduling policy in the thread's attribute structure. Next, let's look at what this attribute structure is for:

struct pthread_attr
{
  /* Scheduler parameters and priority.  */
  struct sched_param schedparam;
  int schedpolicy;
  /* Various flags like detachstate, scope, etc.  */
  int flags;
  /* Size of guard area.  */
  size_t guardsize;
  /* Stack handling.  */
  void *stackaddr;
  size_t stacksize;
  /* Affinity map.  */
  cpu_set_t *cpuset;
  size_t cpusetsize;
};

Note the first field of this structure:
/* The official definition. */
struct sched_param
{
  int __sched_priority;
};
It is a scheduling parameter: when the scheduler makes a decision, it checks the value of this parameter to decide how to schedule.

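What follows is the analysis you have already seen in the kernel. A brief summary of the whole flow:

When a process is created, it executes in the form of threads; the process itself is only responsible for requesting and allocating resources.

If a thread uses real-time scheduling, it sets its own policy dynamically through the POSIX interface provided by glibc, and the setting is then applied to the scheduling parameters.

When the scheduler runs, it consults the values in the scheduling parameters to decide what to do.

PS: Written in a hurry and not necessarily correct. If you see problems, reply and pick holes in it.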

w_attana 2010-01-05

I'm here to learn too... there's so much kernel code... my head is spinning.

wuguanlin 2010-01-05

So much material... I just can't get through it all.

Wenxy1 2009-12-29

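Some points, excerpted:
Linux scheduling is based on time-sharing: letting several processes run "concurrently" means that CPU time is roughly divided into "slices", one for each runnable process. Of course, a single processor can run only one process at any given instant; if a running process has not terminated when its time slice, or quantum, expires, a process switch may take place. No extra code needs to be inserted into programs to ensure CPU time-sharing.
The scheduling policy is based on ranking processes by priority.
In Linux, process priority is dynamic. The scheduler keeps track of what processes are doing and adjusts their priorities periodically.
One classification divides processes into three classes:
Interactive processes;
Batch processes;
Real-time processes.
(Correction) The Linux kernel was non-preemptive before 2.6.23 and is preemptive from 2.6.23 on; Linux processes are preemptive.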

Wenxy1 2009-12-29

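The following is excerpted from the classic book ULK (Understanding the Linux Kernel), 3rd edition:
Chapter 7, Section 1. Since text can't be copied from the Chinese PDF, I'm posting the original English; readers can compare it with the Chinese edition.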

7.1. Scheduling Policy
The scheduling algorithm of traditional Unix operating systems must fulfill several conflicting objectives: fast process response time, good throughput for background jobs, avoidance of process starvation, reconciliation of the needs of low- and high-priority processes, and so on. The set of rules used to determine when and how to select a new process to run is called scheduling policy.

Linux scheduling is based on the time sharing technique: several processes run in "time multiplexing" because the CPU time is divided into slices, one for each runnable process.

Of course, a single processor can run only one process at any given instant. If a currently running process is not terminated when its time slice or quantum expires, a process switch may take place. Time sharing relies on timer interrupts and is thus transparent to processes. No additional code needs to be inserted in the programs to ensure CPU time sharing.
Recall that stopped and suspended processes cannot be selected by the scheduling algorithm to run on a CPU.

The scheduling policy is also based on ranking processes according to their priority. Complicated algorithms are sometimes used to derive the current priority of a process, but the end result is the same: each process is associated with a value that tells the scheduler how appropriate it is to let the process run on a CPU.

In Linux, process priority is dynamic. The scheduler keeps track of what processes are doing and adjusts their priorities periodically; in this way, processes that have been denied the use of a CPU for a long time interval are boosted by dynamically increasing their priority. Correspondingly, processes running for a long time are penalized by decreasing their priority.

When speaking about scheduling, processes are traditionally classified as I/O-bound or CPU-bound. The former make heavy use of I/O devices and spend much time waiting for I/O operations to complete; the latter carry on number-crunching applications that require a lot of CPU time.

An alternative classification distinguishes three classes of processes:

Interactive processes

These interact constantly with their users, and therefore spend a lot of time waiting for keypresses and mouse operations. When input is received, the process must be woken up quickly, or the user will find the system to be unresponsive. Typically, the average delay must fall between 50 and 150 milliseconds. The variance of such delay must also be bounded, or the user will find the system to be erratic. Typical interactive programs are command shells, text editors, and graphical applications.

Batch processes

These do not need user interaction, and hence they often run in the background. Because such processes do not need to be very responsive, they are often penalized by the scheduler. Typical batch programs are programming language compilers, database search engines, and scientific computations.

Real-time processes

These have very stringent scheduling requirements. Such processes should never be blocked by lower-priority processes and should have a short guaranteed response time with a minimum variance. Typical real-time programs are video and sound applications, robot controllers, and programs that collect data from physical sensors.

The two classifications we just offered are somewhat independent. For instance, a batch process can be either I/O-bound (e.g., a database server) or CPU-bound (e.g., an image-rendering program). While real-time programs are explicitly recognized as such by the scheduling algorithm in Linux, there is no easy way to distinguish between interactive and batch programs. The Linux 2.6 scheduler implements a sophisticated heuristic algorithm based on the past behavior of the processes to decide whether a given process should be considered as interactive or batch. Of course, the scheduler tends to favor interactive processes over batch ones.

Programmers may change the scheduling priorities by means of the system calls illustrated in Table 7-1. More details are given in the section "System Calls Related to Scheduling."

Table 7-1. System calls related to scheduling

System call                Description
nice( )                    Change the static priority of a conventional process
getpriority( )             Get the maximum static priority of a group of conventional processes
setpriority( )             Set the static priority of a group of conventional processes
sched_getscheduler( )      Get the scheduling policy of a process
sched_setscheduler( )      Set the scheduling policy and the real-time priority of a process
sched_getparam( )          Get the real-time priority of a process
sched_setparam( )          Set the real-time priority of a process
sched_yield( )             Relinquish the processor voluntarily without blocking
sched_get_priority_min( )  Get the minimum real-time priority value for a policy
sched_get_priority_max( )  Get the maximum real-time priority value for a policy
sched_rr_get_interval( )   Get the time quantum value for the Round Robin policy
sched_setaffinity( )       Set the CPU affinity mask of a process
sched_getaffinity( )       Get the CPU affinity mask of a process

7.1.1. Process Preemption
As mentioned in the first chapter, Linux processes are preemptable. When a process enters the TASK_RUNNING state, the kernel checks whether its dynamic priority is greater than the priority of the currently running process. If it is, the execution of current is interrupted and the scheduler is invoked to select another process to run (usually the process that just became runnable). Of course, a process also may be preempted when its time quantum expires. When this occurs, the TIF_NEED_RESCHED flag in the thread_info structure of the current process is set, so the scheduler is invoked when the timer interrupt handler terminates.

For instance, let's consider a scenario in which only two programs, a text editor and a compiler, are being executed. The text editor is an interactive program, so it has a higher dynamic priority than the compiler. Nevertheless, it is often suspended, because the user alternates between pauses for think time and data entry; moreover, the average delay between two keypresses is relatively long. However, as soon as the user presses a key, an interrupt is raised and the kernel wakes up the text editor process. The kernel also determines that the dynamic priority of the editor is higher than the priority of current, the currently running process (the compiler), so it sets the TIF_NEED_RESCHED flag of this process, thus forcing the scheduler to be activated when the kernel finishes handling the interrupt. The scheduler selects the editor and performs a process switch; as a result, the execution of the editor is resumed very quickly and the character typed by the user is echoed to the screen. When the character has been processed, the text editor process suspends itself waiting for another keypress and the compiler process can resume its execution.

Be aware that a preempted process is not suspended, because it remains in the TASK_RUNNING state; it simply no longer uses the CPU. Moreover, remember that the Linux 2.6 kernel is preemptive, which means that a process can be preempted either when executing in Kernel or in User Mode; we discussed in depth this feature in the section "Kernel Preemption" in Chapter 5.

7.1.2. How Long Must a Quantum Last?
The quantum duration is critical for system performance: it should be neither too long nor too short.

If the average quantum duration is too short, the system overhead caused by process switches becomes excessively high. For instance, suppose that a process switch requires 5 milliseconds; if the quantum is also set to 5 milliseconds, then at least 50 percent of the CPU cycles will be dedicated to process switching.
Actually, things could be much worse than this; for example, if the time required for the process switch is counted in the process quantum, all CPU time is devoted to the process switch and no process can progress toward its termination.
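The arithmetic in the quantum example, written out with $t_{\mathrm{switch}}$ as the cost of one process switch and $q$ as the quantum, assuming one switch per quantum:

```latex
\text{overhead} = \frac{t_{\mathrm{switch}}}{t_{\mathrm{switch}} + q}
                = \frac{5\ \mathrm{ms}}{5\ \mathrm{ms} + 5\ \mathrm{ms}} = 50\%
```

If the switch time is instead charged against the quantum itself, the useful work per quantum is $q - t_{\mathrm{switch}} = 0$ when both are 5 ms, which is the "much worse" case in the text.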

If the average quantum duration is too long, processes no longer appear to be executed concurrently. For instance, let's suppose that the quantum is set to five seconds; each runnable process makes progress for about five seconds, but then it stops for a very long time (typically, five seconds times the number of runnable processes).

It is often believed that a long quantum duration degrades the response time of interactive applications. This is usually false. As described in the section "Process Preemption" earlier in this chapter, interactive processes have a relatively high priority, so they quickly preempt the batch processes, no matter how long the quantum duration is.

In some cases, however, a very long quantum duration degrades the responsiveness of the system. For instance, suppose two users concurrently enter two commands at the respective shell prompts; one command starts a CPU-bound process, while the other launches an interactive application. Both shells fork a new process and delegate the execution of the user's command to it; moreover, suppose such new processes have the same initial priority (Linux does not know in advance if a program to be executed is batch or interactive). Now if the scheduler selects the CPU-bound process to run first, the other process could wait for a whole time quantum before starting its execution. Therefore, if the quantum duration is long, the system could appear to be unresponsive to the user that launched the interactive application.

The choice of the average quantum duration is always a compromise. The rule of thumb adopted by Linux is choose a duration as long as possible, while keeping good system response time.

Wenxy1 2009-12-29

2.6.11 was released in 2005. See:
http://www.kernel.org/pub/linux/kernel/v2.6/

Well, I agree with the point about the Completely Fair Scheduler (CFS); see:
http://blog.chinaunix.net/u2/73521/showart_2020147.html
http://en.wikipedia.org/wiki/Completely_Fair_Scheduler
http://www.kernel.org/doc/
http://lwn.net/Articles/240474/

In 2007, Con Kolivas proposed a new scheduler, The Rotating Staircase Deadline Scheduler, which hit a snag. Ingo Molnar came up with a new scheduler, which he named the Completely Fair Scheduler, described in the LWN writeups "Schedulers: the plot thickens", "This week in the scheduling discussion", and "CFS group scheduling".

The CFS scheduler was merged into 2.6.23.

That's the description from the kernel.org documentation. Thanks to ustc_dylan for raising the point.

刘军卫 2009-12-29

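wenxy1, you moderator, you just won't give up, will you? Look at how old the theory you keep quoting is: it's all from before Linux 2.6.11. What year was Linux 2.6.11 released? The first half of 2008, I believe. Who knows how many times the code has changed since; scheduling in particular has no continuity at all, it gets torn down and rebuilt from scratch. CFS completely rewrote the earlier scheduling algorithm, and you're still talking about priority queues.
The Linux kernel is preemptible too, and you need to understand how that works as well!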

wsmyaoquhuawei 2009-12-27

goodtian 2009-12-27

Here to learn.

Wenxy1 2009-12-27

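[Quote=Quoting reply #25 by ustc_dylan:]
CSDN is really bad. Look at the question the OP is asking: Linux 2.6.22 uses the O(1) scheduling algorithm, and from Linux 2.6.23 on, conventional processes use the CFS scheduling algorithm while real-time processes are unchanged. These are not the same thing at all; stop spouting nonsense here!

The scheduling point that checks for a higher-priority real-time process is in the timer interrupt; the OP can go look at it himself!

One more thing: scheduling is an area that changes fairly quickly.
From Linux 2.6.11 up to (but not including) 2.6.23, the O(1) scheduling algorithm was used.
In Linux 2.6.23, conventional processes switched to CFS, which drops the concept of a time slice and uses vruntime (virtual runtime) as the basis for scheduling decisions; real-time processes did not change. Later, CFS-based group scheduling was added to put processes into groups. At first the grouping criterion was the user ID; later the cgroup filesystem was added.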

[/Quote]

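The "time slice" remains a fundamental concept in multi-user, multitasking operating systems: CPU resources are managed by time-sharing, and always will be.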

Wenxy1 2009-12-27

Today I found two articles:
http://www.ibm.com/developerworks/cn/linux/l-cn-scheduler/index.html

http://blog.chinaunix.net/u1/51562/showart_1840449.html

chen_jun_ying 2009-12-27

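Here to learn. I haven't used Linux yet. I wanted to switch and play with it, but I heard it's a lot of hassle, so I haven't tried it for now.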

wanghui2008se 2009-12-26

Could you all post something I can actually understand?

unbutun 2009-12-26

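[Quote=Quoting reply #30 by rshu:]
A process's time slice is calculated from its average sleep time and its priority; it can't be set. Don't assume that an RT process can keep the CPU forever: the kernel is able to take a running process off the CPU. Interrupts and process scheduling are closely related and should be studied together.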
[/Quote]

Linux has 140 priority levels in total. I think the difference between them shows up mainly in how the time slice is calculated.

rshu 2009-12-26