Beginner Guide
Beginner Guide
By: Si Chenglei
Email: [email protected]
Github: https://github.com/NoviScl/NOI
Foreword
This guide is designed specifically for those who already learned the basics of programming. I
assume that you already know how to code in C++ and understand the concept of time
complexity, as well as some other basic data structures like linked list, stack, queue, BST. (If not,
there are lots of great tutorials/MOOCs online.)
We understand that it is hard to teach oneself competitive programming. So I made this guide
and recorded a series of videos explaining it. This guide is meant for you to quickly get started on
solving NOI problems and get familiar with some important algorithms. There is no guarantee
that you can get a medal at NOI after reading this. But I am sure you can improve your knowledge
and problem solving skills if you read this guide carefully and do lots of practice for every topic.
Feel free to contact me if you find any mistakes or have doubts, I am always happy to discuss with
you.
1. Time Complexity
One important thing in NOI is time complexity. There is always a time limit for the problems and if
your algorithm is too slow, you will get Time Limit Exceeded (TLE).
Usually the time limit is 1 second. You can substitute the maximum possible number of data witin
the given range to yur algorithm's time complexity to get an exstimated number of operations
needed. For example, if your algorithm is O( ) and the data range is , then the max
operations needed is around . The number of operations can be done in 1 second is around
. So:
: no problem
: probably TLE (but I've seen exceptions...)
Now we will use a simple example to see how this works in practice.
You are given different integers . You need to pick 4 numbers from them. You can
pick the same number any times. If the sum of the 4 numbers you pick is , you output YES,
otherwise output NO.
, ,
1 #include <cstdio>
2 using namespace std;
3
4 const int MAX_N = 1002;
5
6 int main(){
7 int n, m, k[MAX_N];
8
9 scanf("%d %d", &n, &m);
10 for(int i=0; i<n; i++){
11 scanf("%d", &k[i]);
12 }
13
14 bool f = false;
15
16 // brute force all possibilities
17 for(int a=0; a<n; a++){
18 for(int b=0; b<n; b++){
19 for(int c=0; c<n; c++){
20 for(int d=0; d<n; d++){
21 if(k[a] + k[b] + k[c] + k[d] == m){
22 f = true;
23 }
24 }
25 }
26 }
27 }
28
29 if(f) printf("YES");
30 else printf("NO");
31 }
The complexity for this solution is , so the estimated number of operations is , which
will definitely get you TLE.
One way to improve it: when you have chosen the first three numbers, you also know what the
last number should be in order for the sum to be . Hence you can use binary search to fund the
desired number.
1 #include <cstdio>
2 #include <algorithm>
3 using namespace std;
4
5 const int MAX_N = 1002;
6
7 int n, m, k[MAX_N];
8
9 bool binary_search(int x){
10 int l=0, r=n;
11
12 while(r-l>=1){
13 int i = (l+r)/2;
14 if(k[i]==x) return true;
15 else if(k[i]<x) l = i+1;
16 else r = i;
17 }
18
19 return false;
20 }
21
22 void solve(){
23 //must sort before BS
24 sort(k, k+n);
25
26 bool f = false;
27
28 for(int a=0; a<n; a++){
29 for(int b=0; b<n; b++){
30 for(int c=0; c<n; c++){
31 if(binary_search(m-k[a]-k[b]-k[c])){
32 f = true;
33 }
34 }
35 }
36 }
37
38 if(f) printf("YES");
39 else printf("NO");
40 }
The complexity is now improved to . But this is still too slow.
One way to improve: when we have chosen the first two numbers, we also know the sum of the
other two numbers in order to get a total sum of . Hence, we can binary search among all
possible sums of two given numbers to find if the desired sum is present.
1 #include <cstdio>
2 #include <algorithm>
3 using namespace std;
4
5 const int MAX_N = 1002;
6
7 int n, m, k[MAX_N];
8
9 // sum of two numbers
10 int kk[MAX_N * MAX_N];
11
12 void solve(){
13 for(int c=0; c<n; c++){
14 for(int d=0; d<n; d++){
15 kk[c*n + d] = k[c] + k[d];
16 }
17 }
18
19 sort(kk, kk+n*n);
20
21 bool f = false;
22 for(int a=0; a<n; a++){
23 for(int b=0; b<n; b++){
24 // use STL BS
25 if(binary_search(kk, kk+n*n, (m-k[a]-k[b]))){
26 f = true;
27 }
28 }
29 }
30
31 if(f) printf("YES");
32 else printf("NO");
33 }
Naive recursion is often slow because it computes the same elements many times, which is a
waste of time. For example, when calcuating the fibonacci number, we compute fib(8) and fib(9)
to get fib(10). However, while computing fib(9), we need to compute fib(8) again.
One way to avoid this is to store all computed values in a table for future use. This technique is
called memoization.
2.2 Stack
1 #include <stack>
2 #include <cstdio>
3 using namespace std;
4
5 int main(){
6 stack<int> s;
7 s.push(2);
8 s.push(3);
9 printf("%d\n", s.top()); //3
10 s.pop();
11 printf("%d\n", s.top()); //2
12 }
2.3 Queue
DFS starts from the root and goes all the way down to the leftmost leaf node, then returns back
to the previous layer, travels through the second leaf node, then returns back to the previous
layer, and so on.
E.g.1 Sum
You are given integers , determine if it is possible to choose some of them (each
number can only be used once) so that their sum is .
Since for each given number, we can choose to either take or not take, this is essentially
searching through a binary tree.
1 const int MAX_N = 21;
2 int a[MAX_N];
3 int n, k;
4
5 bool dfs(int i, int sum){
6 // leaf node, note n not (n-1)
7 if(i==n) return sum == k;
8
9 // not take a[i]
10 if(dfs(i+1, sum)) return true;
11
12 // take a[i]
13 if(dfs(i+1, sum+a[i])) return true;
14
15 return false;
16 }
17
18 void solve(){
19 if(dfs(0, 0)) printf("YES");
20 else printf("NO");
21 }
Given a field, some areas some water after a rain ('W' : water, ' . ' : normal land).
Connected areas with water (including diagonally adjacent) are counted as one puddle. Output
the number of pubbles in the field.
Starting from one area with water, we can use DFS to find all areas with water connected with this
area. We count the number of such connected puddles while setting already counted areas to
normal to avoid repetition.
1 const int MAX_N = 101;
2 int N, M;
3 char field[MAX_N][MAX_N];
4
5 void dfs(int x, int y){
6 // change this area to normal
7 field[x][y] = '.';
8
9 // check all 8 adjacent areas
10 for(int dx=-1; dx<=1; dx++){
11 for(int dy=-1; dy<=1; dy++){
12 int nx = x + dx, ny = y + dy;
13 if(nx>=0 && nx<N && ny>=0 && ny<M && field[nx][ny]=='W')
14 dfs(nx, ny);
15 }
16 }
17 }
18
19 void solve(){
20 int res = 0;
21 for(int i=0; i<N; i++){
22 for(int j=0; j<M; j++){
23 if(field[i][j] == 'W'){
24 dfs(i, j);
25 res++;
26 }
27 }
28 }
29
30 printf("%d\n", res);
31 }
Since every area is only searched once (after once water is set to normal and won't be searched
again), time complexity is .
You are given a maze consisting of obstacles and normal lands. ('#' : obstacle, ' . ' : land,
'S': starting point, 'G': goal ). Each step you can move left, right, up or down. Find the minimum
number of steps needed from starting point to the goal.
Each point in the maze has entered the queue at most once. Hence complexity is .
In DFS, if at a certain state we realize that this state will definitely not generate a correct answer,
then we do not need to continue with this state any more, we can just seach the next possible
state instead. This is called pruning.
Usually DFS is used to search for solution over a tree structure. Generally, this algorithm can be
used to search over any problem space and it is called backtracking.
For permutation, it might be easier to directly use next_permutation function. Note that you
should use do while in this case, otherwise you will miss the original array case.
Example: next_permutation
1 #include <algorithm>
2 #include <iostream>
3 using namespace std;
4
5 const int N = 4;
6 int nums[N] = {1, 2, 3, 4}, total=0;
7
8 int main(){
9 do{
10 total++;
11 for(int i=0; i<N; i++){
12 cout<<nums[i];
13 }
14 cout<<endl;
15 }while(next_permutation(nums, nums+N));
16 cout<<total; //24
17 }
3. Greedy Algorithm
In greedy algorithm, we choose the current best solution at each step.
There are jobs, each job starts from time and ends at time . If you choose a job, you must
not do any other jobs during its full period (including and ). Find the maximum number of
jobs you can do.
Intuition: we want to finish the current job as early as possible so that we have more time for
other jobs. Every time we choose the job with the earliest ending time and not clashing with
previously chosen jobs.
1 const int MAX_N = 100002;
2
3 int N, S[MAX_N], T[MAX_N];
4
5 pair<int, int> jobs[MAX_N];
6
7 void solve(){
8 // sorts the first element in pair by default
9 // should sort by end time
10 for(int i=0; i<N; i++){
11 jobs[i].first = T[i];
12 jobs[i].second = S[i];
13 }
14
15 sort(jobs, jobs+N);
16
17 // t: end time of prev chosen job
18 int ans=0, t=0;
19 for(int i=0; i<N; i++){
20 if(t < jobs[i].second){
21 ans++;
22 t = jobs[i].first;
23 }
24 }
25
26 printf("%d\n", ans);
27 }
Rigorous proofs of the correctness of the algorithm are possible but will not be covered here.
Given a string with length (all characters are uppercase), and an empty string . Every time
you can either remove the first or last character from and append to the end of . Construct
the with the minimum alphabetic order.
Intuition: everytime choose the smaller character from the first and last character of . If they are
the same, compare the next character, do so until there is a difference (if all equal then doesn't
matter).
You need to cut a board into pieces, with lengths . The sum of all the cut boards
should be the same of the original board. The cost of cutting a board into 2 pieces equals to the
length of the board. For example, if you want to cut a board with length 21 into boards with
lengths 5, 8, 8, you can first cut it into 13 and 8 (cost: 21), then cut 12 into 5 and 8 (cost: 13).
Suppose we already have the cut boards , then the shortest and second shorted board
(suppose they are ) should be brothers (note it is impossible for a node to have only one
child node) and from the same parent node. Then we replace them with and continue
the process until there is only one board left.
1 typedef long long ll;
2
3 int N, L[MAX_N];
4
5 void solve(){
6 ll ans = 0;
7
8 while(N > 1){
9 int min1=0, min2=1;
10 // min1: shortest, min2: second shortest
11 if(L[min1] > L[min2]) swap(min1, min2);
12
13 for(int i=2; i<N; i++){
14 if(L[i] < L[min1]){
15 min2 = min1;
16 min1 = i;
17 }
18 else if(L[i] < L[min2]){
19 min2 = i;
20 }
21 }
22
23 int t = L[min1] + L[min2];
24 ans += t;
25
26 // replace min1 with t
27 // swap min2 with last ele and del it
28 if(min1 == N-1) swap(min1, min2);
29 L[min1] = t;
30 L[min2] = L[N-1];
31 N--;
32 }
33
34 printf("%lld\n", ans);
35 }
Complexity O( ). This can be further improved with priority queue (every time pop two front,
push their sum).
1 typedef long long ll;
2
3 int N, L[MAX_N];
4
5 void solve(){
6 ll ans = 0;
7
8 // small root heap
9 priority_queue<int, vector<int>, greater<int> > que;
10
11 for(int i=0; i<N; i++){
12 que.push(L[i]);
13 }
14
15 while(que.size() > 1){
16 int l1, l2;
17 l1 = que.top();
18 que.pop();
19 l2 = que.top();
20 que.pop();
21
22 ans += l1+l2;
23 que.push(l1+l2);
24 }
25
26 printf("%lld\n", ans);
27 }
Complexity: O( )
4. Dynamic Programming
There are two approaches, top-down (recursion), bottom-up (iteration). I find it best to explain DP
with examples. Most examples I list here are must-know for NOI.
You have items each with weight and value . Your bag has max wieght capacity . Find
the max value of items that can be put in the bag.
Let dp(i, j) denote the max value the bag can have using only the first items and with capacity j.
For each item, we either take or do not take (if capcacity ). So we can just choose the max from
those two options.
Complexity: O( )
We can see the that the most important part of DP is to identify the subproblem states and
update equations.
In this problem, the subproblem state is the range of items available and the capacity. The update
equations can then be deduced easily.
Note that we need to go from right to left in the inner loop in order to use the values from
previous .
Find the length of the longest common subsequence of two string. For example, the LCS of 'abcd'
and 'becd' is 'bcd'.
String length:
1 int n, m;
2 char s[MAX_N], t[MAX_M];
3
4 int dp[MAX_N+1][MAX_M+1];
5
6 void solve(){
7 memset(dp, 0, sizeof(dp));
8 for(int i=1; i<=n; i++){
9 cin>>s[i];
10 }
11 for(int i=1; i<=m; i++){
12 cin>>t[i];
13 }
14
15 for(int i=1; i<=MAX_N; i++){
16 for(int j=1; j<=MAX_M; j++){
17 dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
18 if(s[i]==t[j]){
19 dp[i][j] = max(dp[i][j], dp[i-1][j-1]+1);
20 }
21 }
22 }
23
24 cout<<dp[n][m];
25 }
You have types of items each with weight and value . Your bag has max wieght capacity
. Find the max value of items that can be put in the bag. Note that you can take unlimited number
of copies of each type of item.
In this case, we need to add another inner loop to find the best number of copies to take within
the capacity.
1 int dp[MAX_N + 1][MAX_W + 1];
2
3 void solve(){
4 memset(dp, 0, sizeof(dp));
5
6 for(int i=1; i<=n; i++){
7 for(int j=1; j<=W; j++){
8 for(int k=0; k*w[i]<=j; k++){
9 dp[i][j] = max(dp[i][j], dp[i-1][j - k*w[i]] + k*v[i]);
10 }
11 }
12 }
13 printf("%d\n", dp[n][W]);
14 }
Complexity: O( )
We can further improve this algorithm. Note that choosing in is the same as choosing
in (take one copy of th item). Hence we can use this to reduce repeated
calculation.
Note that in the inner loop we go from left to right to use the computed values at this iteration.
You have items each with weight and value . Your bag has max wieght capacity . Find
the max value of items that can be put in the bag.
The difference of this problem with the first 0-1 knapsack is that the range for and become
much larger and O( ) will get TLE.
Notice that the value of is rather small this time. Let dp(i, j) denote the minimum weight needed
to get total value of choosing only from the first items. Similarly, for each item, we either take
or do not take.
Complexity: O( )
E.g.5 Sum
Given different intergers , each can be taken at most times. Determine if it's possible to
choose among them so that their sum is .
Let dp(i, j) denote the number of ways to choose only from the first numbers to get sum .
We have:
1 int n;
2 int K;
3 int a[MAX_N];
4 int m[MAX_N];
5
6 bool dp[MAX_N+1][MAX_K+1];
7
8 void solve(){
9 memset(dp, 0, sizeof(dp));
10 for(int i=0; i<=n; i++){
11 dp[i][0] = 1;
12 }
13
14 for(int i=1; i<=n; i++){
15 for(int j=1; j<=K; j++){
16 for(int k=0; k<=m[i] && k*a[i]<=j; k++){
17 dp[i][j] += dp[i-1][j-k*a[i]];
18 }
19 }
20 }
21
22 if(dp[n][K]) cout<<"YES";
23 else cout<<"NO";
24 }
Complexity: O( )
By redesigning the problem state and formulation, we can actually improve the complexity.
Let dp(i, j) denote the max number of the th number left when choosing from the first numbers
to get sum .
, if
, other cases ( )
1 int dp[NAX_K + 1];
2
3 void solve(){
4 memset(dp, -1, sizeof(dp));
5 dp[0] = 0;
6 for(int i=1; i<=n; i++){
7 for(int j=0; j<=K; j++){
8 if(dp[j]>=0){
9 dp[j] = m[i];
10 }
11 else if(j<a[i] || dp[j-a[i]]<=0){
12 dp[j]=-1;
13 }
14 else{
15 dp[j] = dp[j-a[i]]-1;
16 }
17 }
18 }
19
20 if(dp[K]>=0) cout<<"YES";
21 else cout<<"NO";
22 }
Given a sequence with numbers: . Find the length of the LIS of the sequence. (LIS: a
subsequence where for any .)
Let dp[i] denote: the length of the longest LIS ending with
We have:
1 int n;
2 int a[MAX_N];
3 int dp[MAX_N];
4
5 void solve(){
6 int res=0;
7 for(int i=0; i<n; i++){
8 dp[i]=1;
9 for(int j=0; j<i; j++){
10 if(a[j]<a[i])
11 dp[i] = max(dp[i], dp[j]+1);
12 }
13 res = max(res, dp[i]);
14 }
15
16 cout<<res;
17 }
Complexity: O( )
Another way to think of the problem is: if the length of the subsequence is fixed, we want the last
number of the sequence to be small so that more larger numbers can be appended.
Let dp[i] denote the minimum end number of a LIS with length , INF if impossible.
1 int dp[MAX_N];
2
3 void solve(){
4 fill(dp, dp+n, INF);
5 for(int i=0; i<n; i++){
6 *lower_bound(dp, dp+n, a[i]) = a[i];
7 }
8 cout<<lower_bound(dp, dp+n, INF)-dp;
9 }
Complexity: O(nlogn)
Split identical items into less than or equal to groups. Find the number of ways to split mod
.
A naive thought would be to take out from first and split the rest into groups.
However, this is wrong because it counted repeatedly. For example, it will count and
as two different ways.
So we have:
1 int n, m;
2 int dp[MAX_M + 1][MAX_N + 1];
3
4 void solve(){
5 dp[0][0] = 1;
6 for(int i=1; i<=m; i++){
7 for(int j=0; j<=n; j++){
8 if(j >= i){
9 dp[i][j] = (dp[i-1][j] + dp[i][j-i])%M;
10 }
11 else{
12 // must have a_i = 0
13 dp[i][j] = dp[i-1][j];
14 }
15 }
16 }
17 }
Complexity: O(nm)
There are types of items, the th type has copies. Items of the same type are counted as the
same. How many ways are there to take items from them? Output the result mod M.
Let denoate the number of ways to take items from the first types only.
To take items from the first types, we can first take items from the first ( ) types
and take items of the th type:
We observe that:
Thus we have:
1 int n, m;
2 int a[MAX_N+1];
3
4 int dp[MAX_N+1][MAX_M+1];
5
6 void solve(){
7 memset(dp, 0, sizeof(dp));
8
9 //always have one way to take nothing
10 for(int i=0; i<=n; i++){
11 dp[i][0] = 1;
12 }
13
14 for(int i=1; i<=n; i++){
15 for(int j=1; j<=m; j++){
16 if(j-1-a[i]>=0){
17 //add M to avoid negative
18 dp[i][j] = (dp[i][j-1]+dp[i-1][j]-dp[i-1][j-1-a[i]]+M)%M;
19 }
20 else{
21 dp[i][j] = (dp[i][j-1]+dp[i-1][j])%M;
22 }
23 }
24 }
25 cout<<dp[n][m];
26 }
Complexity: O(nm)
5. Data Structure
5.1 Heap (Priority Queue)
With heap, you can insert and get the smallest element within O( ) time.
Heap is a complete binary tree where the parent nodes' values are always smaller than or equal
to the child nodes' value. (The other way round for big root heap.)
Example:
1 #include <queue>
2 #include <vector>
3 #include <iostream>
4 using namespace std;
5
6 struct cmp{
7 bool operator()(int a, int b){
8 return a > b;
9 }
10 };
11
12 int main(){
13 priority_queue<int> pque;
14
15 pque.push(3);
16 pque.push(5);
17 pque.push(1);
18
19 while(!pque.empty()){
20 cout<<pque.top()<<endl; // 5 3 1
21 pque.pop();
22 }
23
24 priority_queue<int, vector<int>, greater<int>> que;
25
26 que.push(3);
27 que.push(5);
28 que.push(1);
29
30 while(!que.empty()){
31 cout<<que.top()<<endl; // 1 3 5
32 que.pop();
33 }
34
35 priority_queue<int, vector<int>, cmp> Q;
36
37 Q.push(3);
38 Q.push(5);
39 Q.push(1);
40
41 while(!Q.empty()){
42 cout<<Q.top()<<endl; // 1 3 5
43 Q.pop();
44 }
45
46 }
47
You need to drive a car for a distance of L. Initially there are P units of petrol in the car. Travelling
a unit distance takes i unit of petrol. The car can't move if there's no petrol left. There are N gas
stations on the way, the th station is unit distance away from the starting point, can provide
maximum of unit of petrol. Suppose the car can carry infinite amount of petrol, determine if
the car can reach the end point. If so, output the minimum number of times needed to add
petrol, else output -1.
Adding the same amount of petrol sooner or later does not affect the final outcome. Therefore,
we can consider passing through a gas station as adding this gas station as a possible option in
the queue that can later be chosen. We only add petrol when there is no petrol left to move
forward to the next gas station. Every time, we add petrol from the gas station with the maximum
petrol from the queue.
1 const int MAXN = 10005;
2 int L, P, N;
3 int A[MAX_N], B[MAX_N];
4 //A: gas station pos
5 //B: petrol amount
6
7 void solve(){
8 // add end point as a gas station
9 A[N] = L;
10 B[N] = 0;
11 N++;
12
13 priority_queue<int> que;
14 int ans=0, pos=0, tank=P;
15
16 for(int i=0; i<N; i++){
17 int d = A[i] - pos; //dist to go
18
19 // keep adding gas until enough to reach next
20 while(tank - d < 0){
21 if(que.empty()){
22 puts("-1");
23 return;
24 }
25
26 tank += que.top();
27 que.pop();
28 ans++;
29 }
30
31 tank -= d;
32 pos = A[i];
33 que.push(B[i]);
34 }
35
36 printf("%d\n", ans);
37 }
1 struct node{
2 int val;
3 node *lch, *rch;
4 };
5
6 node *insert(node *p, int x){
7 // p: parent node
8 if(p == NULL){
9 node *q = new node;
10 q->val = x;
11 q->lch = q->rch = NULL;
12 return q;
13 }
14 else{
15 if(x < p->val) p->lch = insert(p->lch, x);
16 else p->rch = insert(p->rch, x);
17 return p;
18 }
19 }
20
21 bool find(node *p, int x){
22 if(p==NULL) return false;
23 else if(x==p->val) return true;
24 else if(x < p->val) return find(p->lch, x);
25 else return find(p->rch, x);
26 }
27
28 node* remove(node *p, int x){
29 if(p==NULL) return NULL;
30 else if(x < p->val) p->lch = remove(p->lch, x);
31 else if(x > p->val) p->rch = remove(p->rch, x);
32 // remove current node
33 else if(p->lch == NULL){
34 node *q = p->rch;
35 delete p;
36 return q;
37 }
38 else if(p->lch->rch == NULL){
39 node *q = p->lch;
40 q->rch = p->rch;
41 delete p;
42 return q;
43 }
44 else{
45 node *q;
46 for(q=p->lch; q->rch->rch!=NULL; q=q->rch);
47 node *r = q->rch; //predecessor
48 q->rch = r->lch;
49 r->lch = p->lch;
50 r->rch = p->rch;
51 delete p;
52 return r;
53 }
54 }
Self-balanced BST is more efficient. Examples are AVL, Red-Black, Splay, SBT, etc. (Will include
some of them when I get time.)
We can directly use set or map from STL for balanced BST.
1 #include <cstdio>
2 #include <set>
3 using namespace std;
4
5 int main(){
6 set<int> s;
7
8 s.insert(1);
9 s.insert(3);
10
11 set<int>::iterator ite;
12
13 ite = s.find(1);
14 if(ite==s.end()) puts("not found");
15 else puts("found");
16
17 s.erase(3);
18
19 if(s.count(3)!=0) puts("found");
20 else puts("found");
21
22 for(ite=s.begin(); ite!=s.end(); ++ite){
23 printf("%d\n", *ite);
24 }
25 }
1 #include <cstdio>
2 #include <map>
3 #include <string>
4 using namespace std;
5
6 int main(){
7 map<int, const char*> m;
8
9 m.insert(make_pair(1, "ONE"));
10 m.insert(make_pair(10, "TEN"));
11 m[100] = "HUNDRED";
12
13 map<int, const char*>::iterator ite;
14 ite = m.find(1);
15 if(ite==m.end()) puts("not found");
16 else puts(ite->second);
17
18 puts(m[10]);
19
20 m.erase(10);
21
22 for(ite=m.begin(); ite!=m.end(); ++ite){
23 printf("%d: %s\n", ite->first, ite->second);
24 }
25
26 return 0;
27 }
set and map do not allow yu to store repeated elements, you can do so with multiset and
multimap.
Disjoint set use tree structures to represent groupings. Initially every node's parent node is itself.
If we want to merge two tree, we can just set one root to be the child of the other root. We can
compare if two nodes are in the same group by comparing if they have the same root node. Two
common tricks that can speed up the operations are path compression: connect nodes directly to
the root node instead of passing through a lot of intermediate parent nodes; and merge by rank:
set the shorted tree as the child of the higher tree when merging.
(From my own experience, disjoint set with path compression is usually fast enough.)
1 int par[MAX_N]; //parent
2 int rank[MAX_N]; //height
3
4 void init(int n){
5 for(int i=0; i<n; i++){
6 par[i] = i;
7 rank[i] = 0;
8 }
9 }
10
11 int find(int x){
12 if(par[x]==x)
13 return x;
14 return par[x] = find(par[x]); //path compression
15 }
16
17 // merge
18 void unite(int x, int y){
19 x = find(x);
20 y = find(y);
21 if(x==y) return;
22
23 //merge by rank
24 if(rank[x]<rank[y]){
25 par[x] = y;
26 }
27 else{
28 par[y] = x;
29 if(rank[x]==rank[y]) rank[x]++;
30 }
31 }
32
33 bool same(int x, int y){
34 return find(x)==find(y);
35 }
There are N animals, indexed 1, 2, …, N. Each animal belongs to one of A, B, C group. A eats B, B
eats C, C eats A. Input K messages of two types: 1) x and y belong to the same group. 2) x eats y.
However, some messages may be wrong. For example, they provide indices that are out of range
or messages in conflict with previous messages. Output the number of wrong messages.
For each animal , we create 3 elements: and construct disjoint set with these
elements. means animal belongs to group . Each group in the disjoint set means
that all elements in the group either all happen or all not happen.
If x eats y: merge .
1 int N, K;
2 int T[MAX_K], X[MAX_K], Y[MAX_K];
3 //T: message type
4
5 //disjoint set implementation omitted here
6 void solve(){
7 init(N*3);
8
9 int ans=0;
10 for(int i=0; i<K; i++){
11 int t = T[i];
12 int x = X[i]-1, y= Y[i]-1;
13
14 if(x<0 || x>=N || y<0 || y>=N){
15 ans++;
16 continue;
17 }
18
19 if(t==1){ //type1
20 if(same(x, y+N)||same(x, y+2*N)){
21 ans++;
22 }
23 else{
24 unite(x, y);
25 unite(x+N, y+N);
26 unite(x+N*2, y+N*2);
27 }
28 }
29 else{ //type2
30 if(same(x, y)||same(x, y+2*N)){
31 ans++;
32 }
33 else{
34 unite(x, y+N);
35 unite(x+N, y+2*N);
36 unite(x+2*N, y);
37 }
38 }
39 }
40
41 printf("%d\n", ans);
42 }
This question is a simpliefied version of the above example. We only need to create two copies of
each element to include all possibilities.
AC codes:
1 #include <iostream>
2 #include <cstring>
3 #include <vector>
4 using namespace std;
5
6 const int MAXN = 99999;
7 int father[MAXN*2];
8 int N, Q;
9
10 int find_father(int n){
11 if(father[n]!=n){
12 father[n]=find_father(father[n]);
13 }
14 return father[n];
15 }
16
17 void join(int a, int b){
18 int f_a = find_father(a);
19 int f_b = find_father(b);
20 if(f_a!=f_b){
21 father[f_a]=f_b;
22 }
23 }
24
25 int main(){
26 char cmd;
27 int a, b;
28 cin>>N>>Q;
29 for(int i=1; i<=2*N; i++){
30 father[i] = i;
31 }
32
33 for(int i=0; i<Q; i++){
34 cin>>cmd>>a>>b;
35 if(cmd=='Q'){
36 int f_a = find_father(a);
37 int f_b = find_father(b);
38 int f_aN = find_father(a+N);
39 if(f_a==f_b){
40 cout<<'R'<<endl;
41 }
42 else if(f_aN==f_b){
43 cout<<'A'<<endl;
44 }
45 else{
46 cout<<'?'<<endl;
47 }
48 }
49 else if(cmd=='R'){
50 join(a, b);
51 join(a+N, b+N);
52 }
53 else{
54 join(a, b+N);
55 join(a+N, b);
56 }
57 }
58 }
6. Graph
6.1 Representation and Search
A graph consists of vertices/nodes and edges. A graph can be wither directed or undirected
(without directions). Directed graphs that do not contain cycles are called DAG (Directed Acyclic
Graph).
Graphs can be represented by adjacent matrix or adjacent list. In an adjacent matrix, each
number represents whether two nodes are connected or the distantce between two nodes.
In an adjacent list, each list stores all nodes (sometimes together with the distance) connected to
one particular node.
Given a graph with nodes. Colour each node of the graph so that adjacent nodes have differnet
colours. Decide if it is possible to only use two different colours to do so. Given that there is no
repeated edges or self-cycles. (Such graphs are called bipartite graphs.)
Since only two different colours are allowed, once we know the colour of one node, we should
know the colour of all the adjacent nodes of this node. Hence we just need to iterate through all
nodes with DFS and complete the colouring.
1 vector<int> G[MAX_V];
2 int V;
3 int color[MAX_V]; // 1 or -1
4
5 bool dfs(int v, int c){
6 color[v] = c;
7 for(int i=0; i<G[v].size(); i++){
8 if(color[G[v][i]]==c) return false;
9 if(color[G[V][i]]==0 && !dfs(G[v][i], -c)) return false;
10 }
11 return true;
12 }
13
14 void solve(){
15 for(int i=0; i<V; i++){
16 if(color[i]==0){
17 if(!dfs(i, 1)){
18 printf("No\n");
19 return;
20 }
21 }
22 }
23 printf("Yes\n");
24 }
O(|V|+|E|)
In other words, we just need to keep checking if passing through an edge can shorten the
distance between the two nodes.
The same vertex will be updated at most once, so the while loop will run at most times.
Hence the complexity is O( ). However, this does not hold if there are negative cycles
because we can run over the negative cycles forever and keep reducing the distance. We can use
this property to check for the existence of negative cycles:
1 bool find_negative_loop(){
2 for(int i=0; i<V; i++) d[i]=INF;
3 d[s] = 0;
4
5 for(int i=0; i<V; i++){
6 for(int j=0; j<E; j++){
7 edge e = es[j];
8 if(d[e.to] > d[e.from] + e.dist){
9 d[e.to] = d[e.from] + e.dist;
10
11 //if the Vth loop still updates
12 if(i==V-1) return true;
13 }
14 }
15 }
16 return false;
17 }
The algorithm is easy to implement because it does not even require you to store the graph. You
just need to store and iterate through the edges.
An improvement is to only take care of edges that have been updated with a queue. Note that
now you need to store the graph in an adjacent list.
1 int SPFA(int start, int target){
2 queue<int> Q;
3 for(int i=1; i<=n; i++){
4 dis[i] = INF;
5 }
6 dis[start] = 0;
7 memset(vis, false, sizeof(vis));
8 Q.push(start);
9 while(!Q.empty()){
10 int u = Q.front();
11 Q.pop();
12 vis[u] = false; //out of Q
13 if(++count[u]>=n){
14 //count number of times being pushed
15 cout<<"negative cycle!"<<endl;
16 return -1;
17 }
18 for(auto edge: adjlist[u]){
19 int v = edge.v;
20 int w = edge.w;
21 if(dis[v] > dis[u]+w){
22 dis[v] = dis[u] + w;
23 //push if v not in Q yet
24 if(!vis[v]){
25 Q.push(v);
26 vis[v] = true;
27 }
28 }
29 }
30 }
31 return dis[target];
32 }
2. Dijkstra
Now let's consider cases without negative cycles. In Bellman-Ford, it is a waste of time to update
from if itself is not yet the shortest distance because in that case the updated
still can't be the shortest distance anyway. Also, it is a waste of time to keep checking those points
that are already updated to the shortest distance.
To avoid those cases, every time we choose the closest node from the starting point and save
their shortest distance. This is like a greedy strategy.
1 int map[MAX_V][MAX_V], dist[MAX_V], visited[MAX_V];
2
3 void dijkstra(){
4 memset(dist, 0x3f, sizeof(dist));
5 memset(visited, 0, sizeof(visited));
6 int min_dist, min_vertex;
7
8 dist[start] = 0;
9 for(int i=0; i<V; i++){
10 min_dist = INF;
11 for(int j=0; j<V; j++){
12 if(dist[j]<min_dist && !visited[j]){
13 min_dist = dist[j];
14 min_vertex = j;
15 }
16 }
17
18 visited[min_vertex] = 1;
19
20 for(int k=0; k<V; k++){
21 if(map[min_vertex][k] < INF){
22 dist[k] = min(dist[k], min_dist+map[min_vertex][k]);
23 }
24 }
25 }
26 }
Complexity is: O( ).
Iterating through all nodes to find the closest vertex takes O( ). To improve this, we can use a
priority queue.
1 struct edge{ int to, dist; };
2 typedef pair<int, int> P;
3 //first: dist, second: vertex index
4 //p_q sorts the first value by default
5 int V;
6 vector<edge> G[MAX_V];
7 int d[MAX_V];
8
9 void dijkstra(int s){
10 priority_queue<P, vector<P>, greater<P> > que;
11 fill(d, d+V, INF);
12 d[s] = 0;
13 que.push(P(0, s));
14
15 while(!que.empty()){
16 P p = que.top();
17 que.pop();
18 int v = p.second;
19 if(d[v] <= p.first) continue;
20 for(int i=0; i<G[v].size(); i++){
21 edge e = G[V][i];
22 if(d[e.to] > d[v] + e.dist){
23 d[e.to] = d[v] + e.dist;
24 que.push(P(d[e.to], e.to));
25 }
26 }
27 }
28 }
3. Floyd-Warshall
Floyd-Warshall is used to calculate the shoartest distance between any pair of two points in a
graph. It is essentially a dynamic programming algorithm.
Suppose represents the shortest distance between node and that only passes nodes
~ on the way. Then because it cannot pass any other node than
themselves. When passing only nodes ~ , the shortest path either passes the node once or
does not pass node . If it does not pass node , . If it passes node ,
. Hence we get our transition equation:
. This could be implemented
with a rolling array: .
1 int d[MAX_V][MAX_V];
2 int V;
3
4 void floyd_warshall(){
5 for(int k=0; k<V; k++){
6 for(int i=0; i<V; i++){
7 for(int j=0; j<V; j++){
8 d[i][j] = min(d[i][j], d[i][k]+d[k][j]);
9 }
10 }
11 }
12 }
The complexity is O(|V| ). Like Bellman-Ford, it works on graphs with negative edges and to
detect negative cycles we just need to check if any after the loops.
4. Path Reconstruction
To reconstruct the shortest path, we need to store the previous node of every node in the
shortest path. We need to update it every time make update the shortest distance of a node. For
example, in Dijkstra:
1 int prev[MAX_V];
2
3 void dijkstra(int s){
4 fill(d, d+V, INF);
5 fill(used, used+V, false);
6 fill(prev, prev+V, -1);
7 d[s] = 0;
8
9 while(true){
10 int v = -1;
11 for(int u=0; u<V; u++){
12 if(!used[u] && (v==-1 || d[u]<d[v])) v = u;
13 }
14
15 if(v==-1) break;
16 used[v] = true;
17
18 for(int u=0; u<V; u++){
19 if(d[u] > d[v] + cost[v][u]){
20 d[u] = d[v] + cost[v][u];
21 prev[u] = v;
22 }
23 }
24 }
25 }
26
27 vector<int> get_path(int t){
28 vector<int> path;
29 for(; t!=-1; t=prev[t]) path.push_back(t);
30 reverse(path.begin(), path.end()); //from s to t
31 return path;
32 }
Given an undirected graph, if it has a subgraph where any two nodes in the subgraph are
connected and the subgraph is a tree, then it is called a spanning tree. The one with miminum
sum of edge costs is the minimum spanning tree (MST).
1. Prim
Prim is similar to Dijkstra, where we keep adding new edges that are closest to the current MST,
which is like a greedy approach.
1 int cost[MAX_V][MAX_V]; //adj matrix
2 int mincost[MAX_V]; //from node to MST
3 bool used[MAX_V];
4 int V;
5
6 int prim(){
7 for(int i=0; i<V; i++){
8 mincost[i] = INF;
9 used[i] = false;
10 }
11 //doesn't matter where to start
12 mincost[0] = 0;
13 int res = 0;
14
15 while(true){
16 int v = -1;
17 for(int u=0; u<V; u++){
18 if(!used[u] && (v==-1 || mincost[u]<mincost[v])) v=u;
19 }
20
21 if(v==-1) break;
22 used[v] = true;
23 res += mincost[v];
24
25 //update dist from MST
26 for(int u=0; u<V; u++){
27 mincost[u] = min(mincost[u], cost[v][u]);
28 }
29 }
30 return res;
31 }
This takes O(|V| ) but similar to Dijkstra, can be imporved to O(| |log| |) by using a priority
queue instead of iterating and choosing the closest point.
2. Kruskal
Kruskal is also a greedy approach where we add the shorst edge every time unless it forms cycles.
To determine whethere the new edge forms cycles with already added edges, we can use disjoint
set.
1 struct edge{ int u, v, cost; };
2
3 bool comp(const edge& e1, const edge& e2){
4 return e1.cost < e2.cost;
5 }
6
7 edges es[MAX_E];
8 int V, E;
9
10 int kruskal(){
11 sort(es, es+E, comp);
12 init_union_find(V);
13 int res = 0;
14 for(int i=0; i<E; i++){
15 edge e = es[i];
16 if(!same_father(e.u, e.v)){
17 unite(e.u, e.v);
18 res += e.cost;
19 }
20 }
21 return res;
22 }
The implementation of disjont set is omitted here for simplicity. The time complexity is O(| |log|
|).
3. Applications
A district has R roads and N crossings. All roads are bidirectional. Find the second shortest path
length from crossing number 1 to number N. The same road can be passed many times.
The second shortest path to some point is either the shortest path to another point plus the
edge , or the second shortest path to plus the edge . Hence, for every node, we
need to store not only the shortest distance, but also the second shortest distance.
1 int N, R;
2 vector<edge> G[MAX_N];
3
4 int dist[MAX_N];
5 int dist2[MAX_N];
6
7 void solve(){
8 priority_queue<P, vector<P>, greater<P> > que;
9 fill(dist, dist+N, INF);
10 fill(dist2, dist2+N, INF);
11 dist[0] = 0;
12 que.push(P(0, 0));
13
14 while(!que.empty()){
15 P p = que.top();
16 que.pop();
17 int v = p.second, d = p.first;
18 if(d > dist2[v]) continue;
19 for(int i=0; i<G[v].size(); i++){
20 edge &e = G[v][i];
21 int d2 = d + e.cost;
22 //d2 may be shortest or second shortest
23 if(d2 < dist[e.to]){
24 swap(dist[e.to], d2);
25 que.push(P(dist[e.to], e.to));
26 }
27 if(d2 < dist2[e.to] && d2 > dist[e.to]){
28 dist2[e.to] = d2;
29 que.push(P(dist2[e.to], e.to));
30 }
31 }
32 }
33 printf("%d\n", dist2[N-1]);
34 }
We need to conscript N women and M men. Conscripting each person costs $10000. But if they
are familiar with the conscripted people, the cost can be lower. Given the closeness (1~9999) of
some people (R relationships), the cost of conscripting a new person is 10000 - (max closeness
with a person among the conscripted people). Find the order of conscription that makes the total
cost of conscription the lowest.
1 int N, M, R;
2 int x[MAX_R], y[MAX_R], d[MAX_R];
3
4 void solve(){
5 V = N+M;
6 E = R;
7 for(int i=0; i<R; i++){
8 es[i] = (edge){x[i], N+y[i], -d[i]};
9 }
10 printf("%d\n", 10000*(N+M)+kruskal());
11 }
12
FJ has N (2 <= N <= 1,000) cows numbered 1..N standing along a straight line waiting for feed. The
cows are standing in the same order as they are numbered, and since they can be rather pushy, it
is possible that two or more cows can line up at exactly the same location (that is, if we think of
each cow as being located at some coordinate on a number line, then it is possible for two or
more cows to share the same coordinate).
Some cows like each other and want to be within a certain distance of each other in line. Some
really dislike each other and want to be separated by at least a certain distance. A list of ML (1 <=
ML <= 10,000) constraints describes which cows like each other and the maximum distance by
which they may be separated; a subsequent list of MD constraints (1 <= MD <= 10,000) tells which
cows dislike each other and the minimum distance by which they must be separated.
Your job is to compute, if possible, the maximum possible distance between cow 1 and cow N
that satisfies the distance constraints.
Analysis: First of all, the cows are ordered so . For cows like each other, there is
maximum distance constraint: , for cows dislike each other, there is
minimum distance constraint: . The problem is then to find max value of
while satisfying the above constraints. This is a linear programming algorithm and
there are solutions like simplex algorithm. But we will use a simpler method here.
Actually, the shortest path problem can be expressed as a linear programming problem as well. If
we denoate the shortest distance from s to v as , then for edge with cost , we
have . Then, for d satisfying all constraints, the max value of is the
shorstest distance from to . Note that it is the max value not min value (min value can be 0, i.e.
shorter than the actual edge costs).
In this way, each constraint in the original problem can be thought of as an edge in the graph, and
then we just need to find the shortest path. becomes , so an
edge from to with weight ; , so an edge from to weight ;
, so an edge from to with weight . To find the max value
of , we find the shortest distance between node 1 and N. Since there are negative
edges in the graph, we use Bellman-Ford instead of Dijkstra.
1 int N, ML, MD;
2 int AL[MAX_ML], BL[MAX_ML], DL[MAX_ML];
3 int AD[MAX_MD], BD[MAX_MD], DD[MAX_MD];
4
5 int d[MAX_N];
6
7 void solve(){
8 fill(d, d+N, INF);
9 d[0] = 0;
10
11 // Bellman-Ford
12 // run N iterations to detect neg cycles
13 for(int k=0; k<N; k++){
14 // i+1 to i: 0
15 for(int i=0; i+1<N; i++){
16 if(d[i+1] < INF) d[i]=min(d[i], d[i+1]);
17 }
18 // AL to BL: DL
19 for(int i=0; i<ML; i++){
20 if(d[AL[i]-1] < INF){
21 d[BL[i]-1] = min(d[BL[i]-1], d[AL[i]-1]+DL[i]);
22 }
23 }
24 // BD to AD: -DD
25 for(int i=0; i<MD; i++){
26 if(d[BD[i]-1] < INF){
27 d[AD[i]-1] = min(d[AD[i]-1], d[BD[i]-1]-DD[i]);
28 }
29 }
30 }
31
32 int res = d[N-1];
33 if(d[0] < 0){
34 // has neg cycles, no solution
35 res = -1;
36 }
37 else if(res==INF){
38 // res can be INF large
39 res = -2;
40 }
41 printf("%d\n", res);
42 }
Complexity: O( )