Interval scheduling: Difference between revisions

Content deleted Content added
OAbot (talk | contribs)
m Open access bot: url-access=subscription updated in citation with #oabot.
 
(40 intermediate revisions by 16 users not shown)
Line 1:
{{short description|Class of problems in computer science}}
'''Interval scheduling''' is a class of problems in [[computer science]], particularly in the area of [[algorithm]] design. The problems consider a set of tasks. Each task is represented by an ''interval'' describing the time in which it needs to be executed. For instance, task A might run from 2:00 to 5:00, task B might run from 4:00 to 10:00 and task C might run from 9:00 to 11:00. A subset of intervals is ''compatible'' if no two intervals overlap. For example, the subset {A,C} is compatible, as is the subset {B}; but neither {A,B} nor {B,C} are compatible subsets, because the corresponding intervals within each subset overlap.
 
'''Interval scheduling''' is a class of problems in [[computer science]], particularly in the area of [[algorithm]] design. The problems consider a set of tasks. Each task is represented by an ''interval'' describing the time in which it needs to be processed by some machine (or, equivalently, scheduled on some resource). For instance, task A might run from 2:00 to 5:00, task B might run from 4:00 to 10:00 and task C might run from 9:00 to 11:00. A subset of intervals is ''compatible'' if no two intervals overlap on the machine/resource. For example, the subset {A,C} is compatible, as is the subset {B}; but neither {A,B} nor {B,C} are compatible subsets, because the corresponding intervals within each subset overlap.
The ''interval scheduling maximization problem'' (ISMP) is to find a largest compatible set - a set of non-overlapping intervals of maximum size. The goal here is to execute as many tasks as possible, that is, maximize the [[throughput]]. It is equivalent to finding a [[Independent set (graph theory)|maximum independent set]] in an [[interval graph]].
 
The ''interval scheduling maximization problem'' (ISMP) is to find a largest compatible set, i.e., a set of non-overlapping intervals of maximum size. The goal here is to execute as many tasks as possible, that is, to maximize the [[throughput]]. It is equivalent to finding a [[Independent set (graph theory)|maximum independent set]] in an [[interval graph]].
In an upgraded version of the problem, the intervals are partitioned into groups. A subset of intervals is ''compatible'' if no two intervals overlap, and moreover, no two intervals belong to the same group (i.e. the subset contains at most a single representative interval of each group). Each group of intervals corresponds to a single task, and represents several alternative intervals in which it can be executed.
 
A generalization of the problem considers <math>k>1</math> machines/resources.<ref name=Survey>{{Cite journal | title = Interval scheduling: A survey| year = 2007| last1 = Kolen| first1 = A. | journal = Naval Research Logistics| volume = 54| issue = 5| pages = 530–543| doi = 10.1002/nav.20231| s2cid = 15288326| doi-access = free}}</ref> Here the goal is to find <math>k</math> compatible subsets whose union is the largest.
 
In an upgraded version of the problem, the intervals are partitioned into groups. A subset of intervals is ''compatible'' if no two intervals overlap, and moreover, no two intervals belong to the same group (i.e., the subset contains at most a single representative of each group). Each group of intervals corresponds to a single task, and represents several alternative intervals in which it can be executed.
 
The ''group interval scheduling decision problem'' (GISDP) is to decide whether there exists a compatible set in which all groups are represented. The goal here is to execute a single representative task from each group. GISDPk is a restricted version of GISDP in which the number of intervals in each group is at most ''k''.
Line 16 ⟶ 20:
All these problems are special cases of [[single-machine scheduling]], since they assume that all tasks must run on a single processor. Single-machine scheduling is a special case of [[optimal job scheduling]].
 
==Single-Interval Scheduling Maximization==
Single-interval scheduling refers to creating an interval schedule in which no intervals overlap.
 
=== Unweighted ===
[[File:IntervalSelection.svg|20px|right]]
Several algorithms, that may look promising at first sight, actually do not find the optimal solution:<ref name="KleinbergTardos">{{cite book|last=Kleinberg|first=Jon|url=https://archive.org/details/algorithmdesign0000klei|title=Algorithm Design|author2=Tardos, Éva|year=2006|publisher=Pearson/Addison-Wesley |isbn=978-0-321-29535-4|url-access=registration}}</ref>
* Selecting the intervals that start earliest is not an optimal solution, because if the earliest interval happens to be very long, accepting it would make us reject many other shorter requests.
* Selecting the shortest intervals or selecting intervals with the fewest conflicts is also not optimal.
The following [[greedy algorithm]], called [[Earliest deadline first scheduling]], does find the optimal solution for unweighted single-interval scheduling:
 
=== Greedy polynomial solution ===
The following [[greedy algorithm]], called [[Earliest deadline first scheduling]], does find the optimal solution for unweighted single-inteval scheduling:
# Select the interval, ''x'', with the earliest '''finishing time'''.
# Remove ''x'', and all intervals intersecting ''x'', from the set of candidate intervals.
# Repeat until the set of candidate intervals is empty.
 
Whenever we select an interval at step 1, we may have to remove many intervals in step 2. However, all these intervals necessarily cross the finishing time of ''x'', and thus they all cross each other (see figure). Hence, at most 1 of these intervals can be in the optimal solution. Hence, for every interval in the optimal solution, there is an interval in the greedy solution. This proves that the greedy algorithm indeed finds an optimal solution.
 
A more formal explanation is given by a [[Charging argument]].
Line 35 ⟶ 38:
The greedy algorithm can be executed in time O(''n'' log ''n''), where ''n'' is the number of tasks, using a preprocessing step in which the tasks are sorted by their finishing times.
 
=== Weighted single-interval scheduling ===
WhenProblems theinvolving intervalsweighted haveinterval weights, the problemscheduling isare equivalent to finding a maximum-weight [[Independent set (graph theory)|independent set]] in an [[interval graph]]. ItSuch problems can be solved in polynomial time.<ref name=":0">{{Cite journal |lastlast1=Bar-Noy |firstfirst1=Amotz |last2=Bar-Yehuda |first2=Reuven |last3=Freund |first3=Ari |last4=(Seffi) Naor |first4=Joseph |last5=Schieber |first5=Baruch |author5-link=Baruch Schieber |date=2001-09-01 |title=A unified approach to approximating resource allocation and scheduling |url=https://doi.org/10.1145/502102.502107 |journal=Journal of the ACM |volume=48 |issue=5 |pages=1069–1090 |doi=10.1145/502102.502107 |issn=0004-5411 |s2cid=12329294|url-access=subscription }}</ref>
 
Assuming the vectors are sorted from earliest to latest finish time, the following pseudocode determines the maximum weight of a single-interval schedule in Θ(n) time:<syntaxhighlight line="1" lang="c">
==Group Interval Scheduling Decision==
// The vectors are already sorted from earliest to latest finish time.
int v[numOfVectors + 1]; // list of interval vectors
int w[numOfVectors + 1]; // w[j] is the weight for v[j].
int p[numOfVectors + 1]; // p[j] is the # of vectors that end before v[j] begins.
int M[numOfVectors + 1];
int finalSchedule[];
 
// v[0] does not exist, and the first interval vector is assigned to v[1].
=== NP-complete when some groups contain 3 or more intervals ===
w[0] = 0; p[0] = 0; M[0] = 0;
GISDPk is NP-complete when <math>k\geq 3</math>,<ref name=NakajimaHakimi>{{Cite journal | doi = 10.1016/0196-6774(82)90030-X| title = Complexity results for scheduling tasks with discrete starting times| year = 1982| last1 = Nakajima | first1 = K. | last2 = Hakimi | first2 = S. L. | journal = Journal of Algorithms| volume = 3| issue = 4| pages = 344}}</ref> even when all intervals have the same length.<ref name=Keil>{{Cite journal | doi = 10.1016/0167-6377(92)90087-j| title = On the complexity of scheduling tasks with discrete starting times| year = 1992| last1 = Mark Keil | first1 = J. | journal = Operations Research Letters| volume = 12| issue = 5| pages = 293–295}}</ref> This can be shown by a reduction from the following version of the [[Boolean satisfiability problem]]:
 
// The following code determines the value of M for each vector.
::Let <math>X = \{x_1, x_2,..., x_p\}</math> be a set of Boolean variables. Let <math>C = \{c_1, c_2,..., c_q\}</math> be a set of
// The maximum weight of the schedule is equal to M[numOfVectors].
clauses over ''X'' such that (1) each clause in ''C'' has
for (int i = 1; i < numOfVectors + 1; i++) {
at most three literals and (2) each variable is restricted to appear once or twice positively and once negatively overall in ''C''. Decide whether there is an assignment to variables of ''X'' such that each clause in ''C'' has at least one true literal.
M[i] = max(w[i] + M[p[i]], M[i - 1]);
}
 
// Function to construct the optimal schedule
This version was shown <ref>{{Cite book
schedule (j) {
| last1 = Papadimitriou
if (j == 0) { return; }
| first1 = Christos H.
else if (w[j] + M[p[j]] >= M[j - 1]){
| last2 = Steiglitz
prepend(v[j], finalSchedule); // prepends v[j] to schedule.
| first2 = Kenneth
schedule(p[j]);
| author2-link = Kenneth Steiglitz
} else { schedule(j - 1); }
| title = Combinatorial Optimization : Algorithms and Complexity
}
| publisher = Dover
</syntaxhighlight><ref>{{Cite book |last1=Kleinberg |first1=Jon |title=Algorithm Design |last2=Tardos |first2=Eva |publisher=Pearson |year=2006 |isbn=9780321295354 |edition=1st |pages=254}}</ref>
| date = July 1998
| isbn = 978-0-486-40258-1
}}</ref> to be [[NP-complete]] likewise to the unrestricted version.
 
==== Example ====
If we have the following 9 vectors sorted by finish time, with the weights above each corresponding interval, we can determine which of these vectors are included in our maximum weight schedule which only contains a subset of the following vectors.
[[File:Weighted Interval Scheduling.png|none|thumb|872x872px]]
Here, we input our final vector (where j=9 in this example) into our schedule function from the code block above. We perform the actions in the table below until j is set to 0, at which point, we only include into our final schedule the encountered intervals which met the <math display="inline">w[j]+M[p[j]] \ge M[j-1]</math> requirement. This final schedule is the schedule with the maximum weight.
{| class="wikitable"
!j
!Calculation
!<math display="inline">w[j]+M[p[j]] \ge M[j-1]</math>
(i.e. This vector is included in the final schedule)
!Set j to
|-
|9
|<math>\begin{align} & w[j]+M[p[j]]=w[9]+M[p[9]]=5+M[6]=5+16=21 \\ &
M[j-1]=M[9-1]=M[8]=20
\end{align}</math>
|True
|j=p[j]=p[9]=6
|-
|6
|<math>\begin{align} & w[j]+M[p[j]]=w[6]+M[p[6]]=5+M[4]=5+11=16 \\ &
M[j-1]=M[6-1]=M[5]=11
\end{align}</math>
|True
|j=p[j]=p[6]=4
|-
|4
|<math>\begin{align} & w[j]+M[p[j]]=w[4]+M[p[4]]=3+M[1]=3+5=8 \\ &
M[j-1]=M[4-1]=M[3]=11 \\
\end{align}</math>
|False
|j=j-1=4-1=3
|-
|3
|<math>\begin{align} & w[j]+M[p[j]]=w[3]+M[p[3]]=6+M[1]=6+5=11 \\ &
M[j-1]=M[3-1]=M[2]=5
\end{align}</math>
|True
|j=p[j]=p[3]=1
|-
|1
|<math>\begin{align} & w[j]+M[p[j]]=w[1]+M[p[1]]=5+M[0]=5+0=5 \\ &
M[j-1]=M[1-1]=M[0]=0
\end{align}</math>
|True
|j=p[j]=p[1]=0
|}
 
==Group Interval Scheduling Decision==
 
=== NP-complete when some groups contain 3 or more intervals ===
GISDPk is NP-complete when <math>k\geq 3</math>,<ref name=NakajimaHakimi>{{Cite journal | doi = 10.1016/0196-6774(82)90030-X| title = Complexity results for scheduling tasks with discrete starting times| year = 1982| last1 = Nakajima | first1 = K. | last2 = Hakimi | first2 = S. L. | journal = Journal of Algorithms| volume = 3| issue = 4| pages = 344}}</ref> even when all intervals have the same length.<ref name=Keil>{{Cite journal | doi = 10.1016/0167-6377(92)90087-j| title = On the complexity of scheduling tasks with discrete starting times| year = 1992| last1 = Mark Keil | first1 = J. | journal = Operations Research Letters| volume = 12| issue = 5| pages = 293–295}}</ref> This can be shown by a reduction from the following version of the [[Boolean satisfiability problem]], which was shown <ref>{{Cite book|last1=Papadimitriou|first1=Christos H.|title=Combinatorial Optimization : Algorithms and Complexity|last2=Steiglitz|first2=Kenneth|date=July 1998|publisher=Dover|isbn=978-0-486-40258-1|author2-link=Kenneth Steiglitz}}</ref> to be [[NP-complete]] likewise to the unrestricted version.
 
::Let <math>X = \{x_1, x_2, \dots, x_p\}</math> be a set of Boolean variables. Let <math>C = \{c_1, c_2, \dots, c_q\}</math> be a set of clauses over ''X'' such that (1) each clause in ''C'' has at most three literals and (2) each variable is restricted to appear once or twice positively and once negatively overall in ''C''. Decide whether there is an assignment to variables of ''X'' such that each clause in ''C'' has at least one true literal.
Given an instance of this satisfiability problem, construct the following instance of GISDP. All intervals have a length of 3, so it is sufficient to represent each interval by its starting time:
* For every variable <math>x_i</math> (for {{gaps|''i''|{{=}}|1,...,''p''}}), create a group with two intervals: one starting at <math>50i-10</math> (representing the assignment <math>x_i = \mathrm{false}</math>) and another starting at <math>50i+10</math> (representing the assignment <math>x_i = \mathrm{true}</math>).
* For every clause <math>c_j</math> (for {{gaps|''j''|{{=}}|1,...,''q''}}), create a group with the following intervals:
** For every variable <math>x_i</math> that appears positively for the ''first'' time in ''C'' {{--}} an interval starting at <math>50i-12</math>.
** For every variable <math>x_i</math> that appears positively for the ''second'' time in ''C'' {{--}} an interval starting at <math>50i-8</math>. Note that both these intervals intersect the interval <math>50i-10</math>, associated with <math>x_i = \mathrm{false}</math>.
** For every variable <math>x_i</math> that appears negatively - an interval starting at <math>50i+8</math>. This interval intersects the interval <math>50i+10</math> associated with <math>x_i=\text{true}</math>.
 
Note that there is no overlap between intervals in groups associated with different clauses. This is ensured since a variable appears at most twice positively and once negatively.
Line 99 ⟶ 161:
The greedy algorithm selects only 1 interval [0..2] from group #1, while an optimal scheduling is to select [1..3] from group #2 and then [4..6] from group #1.
 
A more general approximation algorithm attains a 2-factor approximation for the weighted case.<ref name=":0" />
=== LP based approximation algorithms ===
Using the technique of [[Linear programming relaxation]], it is possible to approximate the optimal scheduling with slightly better approximation factors. The approximation ratio of the first such algorithm is asymptotically 2 when ''k'' is large, but when ''k=2'' the algorithm achieves an approximation ratio of 5/3.<ref name=Spieksma/> The approximation factor for arbitrary ''k'' was later improved to 1.582.<ref name="ChuzoiEtAl">{{Cite journal | doi = 10.1287/moor.1060.0218| title = Approximation Algorithms for the Job Interval Selection Problem and Related Scheduling Problems| journal = Mathematics of Operations Research| volume = 31| issue = 4| pages = 730| year = 2006| last1 = Chuzhoy | first1 = J. | author1-link = Julia Chuzhoy | last2 = Ostrovsky | first2 = R. | author2-link = Rafail Ostrovsky| last3 = Rabani | first3 = Y. | citeseerx = 10.1.1.105.2578}}</ref>
 
=== LP-based approximation algorithms ===
<ref>{{Cite journal|last=Bar-Noy|first=Amotz|last2=Bar-Yehuda|first2=Reuven|last3=Freund|first3=Ari|last4=(Seffi) Naor|first4=Joseph|last5=Schieber|first5=Baruch|date=2001-09-01|title=A unified approach to approximating resource allocation and scheduling|url=https://doi.org/10.1145/502102.502107|journal=Journal of the ACM|volume=48|issue=5|pages=1069–1090|doi=10.1145/502102.502107|issn=0004-5411}}</ref>
Using the technique of [[Linear programming relaxation]], it is possible to approximate the optimal scheduling with slightly better approximation factors. The approximation ratio of the first such algorithm is asymptotically 2 when ''k'' is large, but when ''k=2'' the algorithm achieves an approximation ratio of 5/3.<ref name=Spieksma/> The approximation factor for arbitrary ''k'' was later improved to 1.582.<ref name="ChuzoiEtAl">{{Cite journal
| doi=10.1287/moor.1060.0218
| title=Approximation Algorithms for the Job Interval Selection Problem and Related Scheduling Problems
| journal=[[Mathematics of Operations Research]]
| volume=31
| issue=4
| pages=730–738
| year=2006
| last1=Chuzhoy | first1=Julia | author1-link=Julia Chuzhoy
| last2=Ostrovsky | first2=Rafail | author2-link=Rafail Ostrovsky
| last3=Rabani | first3=Yuval
| citeseerx=10.1.1.105.2578}}</ref>
 
==Related problems==
==Graph representations==
An interval scheduling problem can be described by an [[intersection graph]], where each vertex is an interval, and there is an edge between two vertices if and only if their intervals overlap. In this representation, the interval scheduling problem is equivalent to finding the maximum [[Independent set (graph theory)|independent set]] in this intersection graph. In general graphs, findingFinding a maximum independent set is NP-hard. Therefore, it is interesting that in interval intersectiongeneral graphs, but it can be done exactly in polynomial time. {{citationin the special case needed|date=Aprilof 2016}}intersection graphs (ISMP).
 
A group-interval scheduling problem, i.e. (GISMPk,) can be described by a similar interval-intersection graph, with additional edges between each two intervals of the same group, i.e., this is the edge union of an interval graph and a graph consisting of n disjoint cliques of size ''k''.
 
==Variations==
Line 117 ⟶ 190:
 
Another variation is when there are ''m'' processors instead of a single processor. I.e., ''m'' different tasks can run in parallel. See [[identical-machines scheduling]].
 
[[Single-machine scheduling]] is also a very similar problem.
 
==Sources==
{{reflist}}
{{Scheduling problems}}
[[Category:Optimal scheduling]]
[[Category:NP-complete problems]]