Preface

This book is based on the original Competitive Programmer's Handbook by Antti Laaksonen. In this Rust version, the theoretical part follows Antti Laaksonen's text, except for the parts that need to be adapted to the Rust context. All code has been rewritten from scratch.

Contribution

If you think that some parts could be improved, don't hesitate to open an issue; every contribution is highly welcome!

Why this book?

Popular opinion says that Rust is not good for competitive programming, but with a few pieces of advice Rust can help you a lot. As in Antti Laaksonen's version, this book is not a Rust guide: it is assumed that you already know the basics of programming and the basics of Rust.

External Resources

Tools

Articles

Basic Techniques

Introduction

Competitive programming combines two topics:

  1. the design of algorithms
  2. the implementation of algorithms.

The design of algorithms consists of problem solving and mathematical thinking. Skills for analyzing problems and solving them creatively are needed. An algorithm for solving a problem has to be both correct and efficient, and the core of the problem is often about inventing an efficient algorithm.

Theoretical knowledge of algorithms is important to competitive programmers. Typically, a solution to a problem is a combination of well-known techniques and new insights. The techniques that appear in competitive programming also form the basis for the scientific research of algorithms.

The implementation of algorithms requires good programming skills. In competitive programming, the solutions are graded by testing an implemented algorithm using a set of test cases. Thus, it is not enough that the idea of the algorithm is correct, but the implementation also has to be correct.

A good coding style in contests is straightforward and concise. Programs should be written quickly, because there is not much time available. Unlike in traditional software engineering, the programs are short (usually at most a few hundred lines of code), and they do not need to be maintained after the contest.

Rust Competitive Helper

Rust Competitive Helper is a template for competitive programming in Rust, written by Egor and adapted as a separate project by qwerty787788.

A template is very useful in competitive programming because it saves time on repetitive tasks such as reading input and writing output.

Rust Competitive Helper works well with Competitive Companion; guides and tutorials are available in the README.md of the relevant GitHub repositories. From now on it will be assumed that Rust Competitive Helper is set up correctly.

The main function where we are going to solve the problem looks like this:

use algo_lib::io::input::Input;
use algo_lib::io::output::output;
use algo_lib::{out, out_line};

fn solve(input: &mut Input, _test_case: usize) {
    // solution comes here
}

pub(crate) fn run(mut input: Input) -> bool {
    // this part automatically changes based on problem typology:
    // (1) number of tests given
    // (2) one test only
    // (3) EOF
}

//START MAIN
mod tester;

fn main() {
    tester::run_tests();
}
//END MAIN

The code can be compiled using the following command:

cargo build --bin name_of_the_problem_bin

The solution will be compiled into the main.rs file of the main package (not into the main.rs file of the compiled bin where the solve() function is located).

Input and output

In most contests, standard streams are used for reading input and writing output. In Rust, the standard streams are std::io::Stdin for input and std::io::Stdout for output. The standard Rust macros used for output are println!() and print!(), but these two macros don't flush the stream at the end of the write, so Rust Competitive Helper introduces two different macros to handle the output:

  • out_line!() to print the output followed by a newline \n
  • out!() to print output inline, without a newline
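
For example, assuming the template above is in place, the macros are typically used inside solve() along these lines (a minimal sketch; the exact argument forms are an assumption based on the description above):

fn solve(input: &mut Input, _test_case: usize) {
    let n = input.read::<usize>();
    // print a prefix without a trailing newline (assumed usage)
    out!("answer: ");
    // print the value followed by a newline (assumed usage)
    out_line!(n * 2);
}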

The input for the program usually consists of numbers and strings separated by spaces or newlines. They can be read into the solve() function as follows:

fn solve(input: &mut Input, _test_case: usize) {
    let len_a = input.read::<usize>();
    let mut a = input.read_vec::<usize>(len_a);
}

The combination of these two functions always works, assuming that there is at least one space or newline between the elements in the input. For example, the above code can read the following inputs:

4
1 4 3 2

and this piece of code:

fn solve(input: &mut Input, _test_case: usize) {
    let a = input.read::<usize>();
    let b = input.read::<usize>();
    let c = input.read::<String>();
}

can read both of the following inputs:

123 456 monkey
123 456
monkey

The input argument of the solve() function is managed by Rust Competitive Helper; there are three possible cases:

  • Single test: the input variable represents the only test.
  • Number of tests given: the input represents each single test; the very first line (the number of tests) is automatically read and used for the reading loop, but it is not part of the input variable.
  • Read until EOF: the input represents each test, but in this case the number of tests is unknown.

Working with numbers

Integers

The most used integer type in competitive programming is usize, whose size is the number of bytes it takes to reference any location in memory: on a 32-bit target this is 4 bytes, and on a 64-bit target it is 8 bytes. This type is very common because it can be used as an index for arrays and vectors. The following code defines a usize variable in two equivalent ways:

#![allow(unused)]
fn main() {
let x = 123456789123456789_usize;
let x: usize = 123456789123456789; 
}

The suffix _usize means that the type of the number is usize.

Modular arithmetic

We denote by $x \bmod m$ the remainder when $x$ is divided by $m$. For example, $17 \bmod 5 = 2$, because $17 = 3 \cdot 5 + 2$. Sometimes, the answer to a problem is a very large number but it is enough to output it "modulo $m$", i.e., the remainder when the answer is divided by $m$ (for example, "modulo $10^9+7$"). The idea is that even if the actual answer is very large, it suffices to use the type usize. An important property of the remainder is that in addition, subtraction and multiplication, the remainder can be taken before the operation: $$ (a + b) \bmod m = (a \bmod m + b \bmod m) \bmod m \\ (a - b) \bmod m = (a \bmod m - b \bmod m) \bmod m \\ (a \cdot b) \bmod m = (a \bmod m \cdot b \bmod m) \bmod m $$

Thus, we can take the remainder after every operation and the numbers will never become too large. For example, the following code calculates n!, the factorial of n, modulo m:

#![allow(unused)]
fn main() {
let n = 5;
let m = 9;
let mut x = 1;
for i in (2..=n){
    x = (x*i)%m;
}
print!("if n = {n} and m = {m} the printed value is ");
println!("{}", x%m);
}

Usually we want the remainder to always be between $0...m-1$. However, in Rust and other languages, the remainder of a negative number is either zero or negative. An easy way to make sure there are no negative remainders is to first calculate the remainder as usual and then add $m$ if the result is negative:

x = x%m;
if x<0 {x += m};

However, this is only needed when there are subtractions in the code and the remainder may become negative.
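
For a signed integer type such as i64, the same idea can be written as a complete example; the standard library method rem_euclid gives the non-negative remainder directly (a minimal sketch):

fn main() {
    let m: i64 = 5;
    let x: i64 = -7;
    // in Rust, -7 % 5 == -2, so the remainder can be negative
    let mut r = x % m;
    if r < 0 {
        r += m;
    }
    // rem_euclid always returns a non-negative remainder
    assert_eq!(r, x.rem_euclid(m));
    println!("{r}"); // 3
}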

Floating point numbers

The usual floating point type in competitive programming is the 64-bit f64. The required precision of the answer is usually given in the problem statement. An easy way to output the answer is to use the println! macro and give the number of decimal places in the formatting string. For example, the following code prints the value of x with 9 decimal places:

println!("{x:.9}");

A difficulty when using floating point numbers is that some numbers cannot be represented accurately as floating point numbers, and there will be rounding errors. For example, the result of the following code is surprising:

#![allow(unused)]
fn main() {
let x: f64 = 0.3*3.0+0.1;
println!("{x}"); // 0.9999999999999999
}

Due to a rounding error, the value of $x$ is a bit smaller than 1, while the correct value would be 1.

Note that the following code

#![allow(unused)]
fn main() {
let x: f64 = 0.3 * 3 * 0.1; // error: 3 must be written as 3.0
}

fails to compile, because the integer literal 3 is not implicitly converted to f64.

It is risky to compare floating point numbers with the == operator, because it is possible that the values should be equal but they are not because of precision errors. A better way to compare floating point numbers is to assume that two numbers are equal if the difference between them is less than $\epsilon$, where $\epsilon$ is a small number. In practice, the numbers can be compared as follows ($\epsilon = 10^{-9}$):

if (a - b).abs() < 1e-9 {
    // a and b are approximately equal
}
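
For instance, combining this check with the earlier rounding example (a small self-contained sketch):

fn main() {
    let a: f64 = 0.3 * 3.0 + 0.1;
    let b: f64 = 1.0;
    // direct comparison fails because of the rounding error
    println!("{}", a == b); // false
    // comparison with a tolerance works as intended
    println!("{}", (a - b).abs() < 1e-9); // true
}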

Shortening code

Short code is ideal in competitive programming, because programs should be written as fast as possible. Because of this, competitive programmers often define shorter names for datatypes and other parts of code.

Type names

Using the keyword type it is possible to give a shorter name to a datatype. For example, the name usize is rather long, so we can define a shorter name u:

#![allow(unused)]
fn main() {
type u = usize;
}

After this, the code

#![allow(unused)]
fn main() {
let a: usize = 123456789;
let b: usize = 987654321;
println!("{}", a*b);
}

can be shortened as follows:

#![allow(unused)]
fn main() {
type u = usize;
let a: u = 123456789;
let b: u = 987654321;
println!("{}", a*b);
}

The keyword type can also be used with more complex types. For example, the following code gives the name vu to a vector of usize values and the name hm to a HashMap with String keys and values.

#![allow(unused)]
fn main() {
type vu = Vec<usize>;
type hm = std::collections::HashMap<String, String>;
}
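
After this, the aliases can be used like any other type name; for instance (a minimal sketch):

#![allow(unused, non_camel_case_types)]
fn main() {
type vu = Vec<usize>;
type hm = std::collections::HashMap<String, String>;
let v: vu = vec![1, 2, 3];
let mut m: hm = hm::new();
m.insert(String::from("key"), String::from("value"));
println!("{v:?} {m:?}");
}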

Macros

Another way to shorten code is to define macros. A macro means that certain patterns in the code are expanded before compilation. In Rust, macros are not popular during contests, but they are useful if you want to create or improve your custom template (e.g. out_line!()).

You can define a macro using the macro_rules! construct:

#![allow(unused)]
fn main() {
macro_rules! rep {
    ($a:expr, $b:expr) => {
        for i in $a..= $b {
            println!("{}", i);
        }
    };
}
}

After this, the code:

#![allow(unused)]
fn main() {
for i in 1..=5{
    println!("{}", i)
}
}

can be shortened as follows:

#![allow(unused)]
fn main() {
macro_rules! rep {
   ($a:expr, $b:expr) => {
       for i in $a..= $b {
           println!("{}", i);
       }
   };
}
rep!(1, 5);
}

Mathematics

Mathematics plays an important role in competitive programming, and it is not possible to become a successful competitive programmer without having good mathematical skills. This section discusses some important mathematical concepts and formulas that are needed later in the book.

Sum formula

Each sum of the form

$$ \sum_{x=1}^n x^k = 1^k + 2^k+3^k+\ldots+n^k, $$

where $k$ is a positive integer, has a closed-form formula that is a polynomial of degree $k + 1$. For example1,

$$ \sum_{x=1}^n x = 1 + 2 + 3 + ... + n = \frac{n(n+1)}{2} $$ and $$ \sum_{x=1}^n x^2 = 1^2 + 2^2 + 3^2 + ... + n^2 = \frac{n(n+1)(2n+1)}{6} $$
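
As a quick sanity check, the first two closed forms can be verified numerically for a small $n$ (a minimal sketch):

fn main() {
    let n: u64 = 100;
    let sum: u64 = (1..=n).sum();
    let sum_sq: u64 = (1..=n).map(|x| x * x).sum();
    // compare the direct sums with the closed-form formulas
    assert_eq!(sum, n * (n + 1) / 2); // 5050
    assert_eq!(sum_sq, n * (n + 1) * (2 * n + 1) / 6); // 338350
    println!("{sum} {sum_sq}");
}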

Arithmetic progression

An arithmetic progression is a sequence of numbers where the difference between any two consecutive numbers is constant. For example: $$ 3, 7, 11, 15 $$ is an arithmetic progression with constant 4. The sum of an arithmetic progression can be calculated using the formula $$ \underbrace{a + \cdots + b}_{n \,\, \textrm{numbers}} = \frac{n(a+b)}{2} $$

where a is the first number, b is the last number and n is the amount of numbers. For example, $$ 3+7+11+15=\frac{4 \cdot (3+15)}{2} = 36 $$

The formula is based on the fact that the sum consists of $n$ numbers and the value of each number is $(a+b)/2$ on average.

Geometric progression

A geometric progression is a sequence of numbers where the ratio between any two consecutive numbers is constant. For example, $$ 3,6,12,24 $$ is a geometric progression with constant 2. The sum of a geometric progression can be calculated using the formula $$ a + ak + ak^2 + \cdots + b = \frac{bk-a}{k-1} $$ where $a$ is the first number, $b$ is the last number and the ratio between consecutive numbers is $k$. For example, $$ 3+6+12+24=\frac{24 \cdot 2 - 3}{2-1} = 45 $$ This formula can be derived as follows. Let $$ S = a + ak + ak^2 + \cdots + b $$ By multiplying both sides by $k$, we get $$ kS = ak + ak^2 + ak^3 + \cdots + bk $$ and solving the equation $$ kS-S = bk-a $$ yields the formula.

A special case of a sum of a geometric progression is the formula $$ 1+2+4+8+\ldots+2^{n-1}=2^n-1 $$

Harmonic sum

A harmonic sum is a sum of the form $$ \sum_{x=1}^n \frac{1}{x} = 1+\frac{1}{2}+\frac{1}{3}+\ldots+\frac{1}{n} $$ An upper bound for a harmonic sum is $\log_2(n)+1$. Namely, we can modify each term $1/k$ so that $k$ becomes the nearest power of two that does not exceed $k$. For example, when $n=6$, we can estimate the sum as follows: $$ 1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\frac{1}{5}+\frac{1}{6} \le 1+\frac{1}{2}+\frac{1}{2}+\frac{1}{4}+\frac{1}{4}+\frac{1}{4} $$ This upper bound consists of $\log_2(n)+1$ parts ($1$, $2 \cdot 1/2$, $4 \cdot 1/4$, etc.), and the value of each part is at most 1.

Set theory

A set is a collection of elements. For example, the set $$ X=\{2,4,7\} $$ contains elements 2, 4 and 7. The symbol $\emptyset$ denotes an empty set, and $|S|$ denotes the size of a set $S$, i.e., the number of elements in the set. For example, in the above set, $|X|=3$. If a set $S$ contains an element $x$, we write $x \in S$, and otherwise we write $x \notin S$. For example, in the above set $$ 4 \in X \hspace{10px}\textrm{and}\hspace{10px} 5 \notin X $$ New sets can be constructed using set operations:

  • The intersection $A \cap B$ consists of elements that are in both $A$ and $B$. For example, if $A=\{1,2,5\}$ and $B=\{2,4\}$, then $A \cap B = \{2\}$.
  • The union $A \cup B$ consists of elements that are in $A$ or $B$ or both. For example, if $A=\{3,7\}$ and $B=\{2,3,8\}$, then $A \cup B = \{2,3,7,8\}$.
  • The complement $\bar A$ consists of elements that are not in $A$. The interpretation of a complement depends on the universal set, which contains all possible elements. For example, if $A=\{1,2,5,7\}$ and the universal set is $\{1,2,\ldots,10\}$, then $\bar A = \{3,4,6,8,9,10\}$.
  • The difference $A \setminus B = A \cap \bar B$ consists of elements that are in $A$ but not in $B$. Note that $B$ can contain elements that are not in $A$. For example, if $A=\{2,3,7,8\}$ and $B=\{3,5,8\}$, then $A \setminus B = \{2,7\}$.

If each element of $A$ also belongs to $S$, we say that $A$ is a subset of $S$, denoted by $A \subset S$. A set $S$ always has $2^{|S|}$ subsets, including the empty set. For example, the subsets of the set $\{2,4,7\}$ are $$ \emptyset, \{2\}, \{4\}, \{7\}, \{2,4\}, \{2,7\}, \{4,7\}, \{2,4,7\} $$
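
As an illustration of the $2^{|S|}$ subsets, they can be enumerated with a bitmask, where each bit decides whether the corresponding element is included (a minimal sketch):

fn main() {
    let set = [2, 4, 7];
    let n = set.len();
    // each bitmask 0..2^n selects one of the 8 subsets
    for mask in 0..(1usize << n) {
        let subset: Vec<i32> = (0..n)
            .filter(|&i| mask & (1 << i) != 0)
            .map(|i| set[i])
            .collect();
        println!("{subset:?}");
    }
}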

Some often used sets are $\mathbb{N}$ (natural numbers), $\mathbb{Z}$ (integers), $\mathbb{Q}$ (rational numbers) and $\mathbb{R}$ (real numbers). The set $\mathbb{N}$ can be defined in two ways, depending on the situation: either $\mathbb{N}=\{0,1,2,\ldots\}$ or $\mathbb{N}=\{1,2,3,\ldots\}$.

We can also construct a set using a rule of the form $$ \{f(n) : n \in S\} $$ where $f(n)$ is some function. This set contains all elements of the form $f(n)$, where $n$ is an element in $S$.

For example, the set $$ X=\{2n : n \in \mathbb{Z}\} $$ contains all even integers.

Logic

The value of a logical expression is either true (1) or false (0). The most important logical operators are $\lnot$ (negation), $\land$ (conjunction), $\lor$ (disjunction), $\Rightarrow$ (implication) and $\Leftrightarrow$ (equivalence). The following table shows the meanings of these operators:

| $A$ | $B$ | $\lnot A$ | $\lnot B$ | $A \land B$ | $A \lor B$ | $A \Rightarrow B$ | $A \Leftrightarrow B$ |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |

The expression $\lnot A$ has the opposite value of $A$. The expression $A \land B$ is true if both $A$ and $B$ are true, and the expression $A \lor B$ is true if $A$ or $B$ or both are true. The expression $A \Rightarrow B$ is true if whenever $A$ is true, also $B$ is true. The expression $A \Leftrightarrow B$ is true if $A$ and $B$ are both true or both false.

Predicate

A predicate is an expression that is true or false depending on its parameters. Predicates are usually denoted by capital letters. For example, we can define a predicate $P(x)$ that is true exactly when $x$ is a prime number. Using this definition, $P(7)$ is true but $P(8)$ is false.

Quantifier

A quantifier connects a logical expression to the elements of a set. The most important quantifiers are $\forall$ (for all) and $\exists$ (there is). For example, $$ \forall x (\exists y (y < x)) $$ means that for each element $x$ in the set, there is an element $y$ in the set such that $y$ is smaller than $x$. This is true in the set of integers, but false in the set of natural numbers.

Using the notation described above, we can express many kinds of logical propositions. For example, $$ \forall x ((x>1 \land \lnot P(x)) \Rightarrow (\exists a (\exists b (a > 1 \land b > 1 \land x = ab)))) $$ means that if a number $x$ is larger than 1 and not a prime number, then there are numbers $a$ and $b$ that are larger than $1$ and whose product is $x$. This proposition is true in the set of integers.

Functions

The function $\lfloor x \rfloor$ rounds the number $x$ down to an integer, and the function $\lceil x \rceil$ rounds the number $x$ up to an integer. For example, $$ \lfloor 3/2 \rfloor = 1 \hspace{10px} \textrm{and} \hspace{10px} \lceil 3/2 \rceil = 2 $$

The functions $\min(x_1,x_2,\ldots,x_n)$ and $\max(x_1,x_2,\ldots,x_n)$ give the smallest and largest of values $x_1,x_2,\ldots,x_n$. For example, $$ \min(1,2,3)=1 \hspace{10px} \textrm{and} \hspace{10px} \max(1,2,3)=3 $$

Factorial

The factorial $n!$ can be defined $$ \prod_{x=1}^n x = 1 \cdot 2 \cdot 3 \cdot \ldots \cdot n $$ or recursively $$ \begin{array}{lcl} 0! & = & 1 \\ n! & = & n \cdot (n-1)! \end{array} $$

Fibonacci

The Fibonacci numbers2 arise in many situations. They can be defined recursively as follows: $$ \begin{array}{lcl} f(0) & = & 0 \\ f(1) & = & 1 \\ f(n) & = & f(n-1)+f(n-2) \end{array} $$ The first Fibonacci numbers are $$ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \ldots $$ There is also a closed-form formula for calculating Fibonacci numbers, which is sometimes called Binet's formula: $$ f(n)=\frac{(1 + \sqrt{5})^n - (1-\sqrt{5})^n}{2^n \sqrt{5}} $$
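
A small sketch comparing the recursive definition (computed iteratively) with Binet's formula:

fn main() {
    // first ten Fibonacci numbers from the recurrence
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..10 {
        print!("{a} ");
        let next = a + b;
        a = b;
        b = next;
    }
    println!();
    // Binet's formula for f(10)
    let s5 = 5f64.sqrt();
    let f10 = ((1.0 + s5).powi(10) - (1.0 - s5).powi(10)) / (2f64.powi(10) * s5);
    println!("{}", f10.round()); // 55
}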

Logarithms

The \key{logarithm} of a number $x$ is denoted $\log_k(x)$, where $k$ is the base of the logarithm. According to the definition, $\log_k(x)=a$ exactly when $k^a=x$.

A useful property of logarithms is that $\log_k(x)$ equals the number of times we have to divide $x$ by $k$ before we reach the number 1. For example, $\log_2(32)=5$ because 5 divisions by 2 are needed:

$$ 32 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1 $$

Logarithms are often used in the analysis of algorithms, because many efficient algorithms halve something at each step. Hence, we can estimate the efficiency of such algorithms using logarithms.

The logarithm of a product is $$ \log_k(ab) = \log_k(a)+\log_k(b) $$ and consequently, $$ \log_k(x^n) = n \cdot \log_k(x) $$ In addition, the logarithm of a quotient is $$ \log_k\Big(\frac{a}{b}\Big) = \log_k(a)-\log_k(b) $$ Another useful formula is $$ \log_u(x) = \frac{\log_k(x)}{\log_k(u)} $$ and using this, it is possible to calculate logarithms to any base if there is a way to calculate logarithms to some fixed base.
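
A small sketch illustrating the division interpretation of the logarithm and the change-of-base formula:

fn main() {
    // count how many divisions by 2 are needed to reach 1
    let mut x = 32u32;
    let mut count = 0;
    while x > 1 {
        x /= 2;
        count += 1;
    }
    println!("{count}"); // 5
    println!("{}", 32f64.log2()); // 5
    // change of base: log_3(81) computed via natural logarithms
    println!("{}", 81f64.ln() / 3f64.ln()); // approximately 4
}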

Natural logarithm

The natural logarithm $\ln(x)$ of a number $x$ is a logarithm whose base is $e \approx 2.71828$. Another property of logarithms is that the number of digits of an integer $x$ in base $b$ is $\lfloor \log_b(x)+1 \rfloor$. For example, the representation of $123$ in base $2$ is $1111011$ and $\lfloor \log_2(123)+1 \rfloor = 7$.


1 There is even a general formula for such sums, called Faulhaber’s formula, but it is too complex to be presented here.

2 Fibonacci (c. 1175--1250) was an Italian mathematician.

Contests and resources

IOI

The International Olympiad in Informatics (IOI) is an annual programming contest for secondary school students. Each country is allowed to send a team of four students to the contest. There are usually about 300 participants from 80 countries.

The IOI consists of two five-hour long contests. In both contests, the participants are asked to solve three algorithm tasks of various difficulty. The tasks are divided into subtasks, each of which has an assigned score. Even if the contestants are divided into teams, they compete as individuals.

The IOI syllabus [41] regulates the topics that may appear in IOI tasks. Almost all the topics in the IOI syllabus are covered by this book.

Participants for the IOI are selected through national contests. Before the IOI, many regional contests are organized, such as the Baltic Olympiad in Informatics (BOI), the Central European Olympiad in Informatics (CEOI) and the Asia-Pacific Informatics Olympiad (APIO).

Some countries organize online practice contests for future IOI participants, such as the Croatian Open Competition in Informatics [11] and the USA Computing Olympiad[68]. In addition, a large collection of problems from Polish contests is available online [60].

ICPC

The International Collegiate Programming Contest (ICPC) is an annual programming contest for university students. Each team in the contest consists of three students, and unlike in the IOI, the students work together; there is only one computer available for each team.

The ICPC consists of several stages, and finally the best teams are invited to the World Finals. While there are tens of thousands of participants in the contest, there are only a small number of final slots available1, so even advancing to the finals is a great achievement in some regions.

In each ICPC contest, the teams have five hours of time to solve about ten algorithm problems. A solution to a problem is accepted only if it solves all test cases efficiently. During the contest, competitors may view the results of other teams, but for the last hour the scoreboard is frozen and it is not possible to see the results of the last submissions.

The topics that may appear at the ICPC are not so well specified as those at the IOI. In any case, it is clear that more knowledge is needed at the ICPC, especially more mathematical skills.

Online contests

There are also many online contests that are open for everybody. At the moment, the most active contest site is Codeforces, which organizes contests about weekly. In Codeforces, participants are divided into two divisions: beginners compete in Div2 and more experienced programmers in Div1. Other contest sites include AtCoder, CS Academy, HackerRank and Topcoder.

Some companies organize online contests with onsite finals. Of course, companies also use those contests for recruiting: performing well in a contest is a good way to prove one's skills.

Books

There are already some books (besides this book) that focus on competitive programming and algorithmic problem solving:

  • S. S. Skiena and M. A. Revilla: Programming Challenges: The Programming Contest Training Manual [59]
  • S. Halim and F. Halim: Competitive Programming 3: The New Lower Bound of Programming Contests [33]
  • K. Diks et al.: Looking for a Challenge? The Ultimate Problem Set from the University of Warsaw Programming Competitions [15]

The first two books are intended for beginners, whereas the last book contains advanced material.

Of course, general algorithm books are also suitable for competitive programmers. Some popular books are:

  • T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein: Introduction to Algorithms [13]
  • J. Kleinberg and É. Tardos: Algorithm Design [45]
  • S. S. Skiena: The Algorithm Design Manual [58]

1 The exact number of final slots varies from year to year; in 2017, there were 133 final slots.

Time complexity

The efficiency of algorithms is important in competitive programming. Usually, it is easy to design an algorithm that solves the problem slowly, but the real challenge is to invent a fast algorithm. If the algorithm is too slow, it will get only partial points or no points at all. The time complexity of an algorithm estimates how much time the algorithm will use for some input. The idea is to represent the efficiency as a function whose parameter is the size of the input. By calculating the time complexity, we can find out whether the algorithm is fast enough without implementing it.

Calculation rules

The time complexity of an algorithm is denoted $O(\cdots)$ where the three dots represent some function. Usually, the variable $n$ denotes the input size. For example, if the input is an array of numbers, $n$ will be the size of the array, and if the input is a string, $n$ will be the length of the string.

Loops

A common reason why an algorithm is slow is that it contains many loops that go through the input. The more nested loops the algorithm contains, the slower it is. If there are $k$ nested loops, the time complexity is $O(n^k)$.

For example, the time complexity of the following code is $O(n)$:

#![allow(unused)]
fn main() {
let n = 10;
for i in 1..=n {
    // code
}
}

And the time complexity of the following code is $O(n^2)$:

#![allow(unused)]
fn main() {
let n = 10;
for i in 1..=n {
    for j in 1..=n {
        // code
    }
}
}

Order of magnitude

A time complexity does not tell us the exact number of times the code inside a loop is executed; it only shows the order of magnitude. In the following example, the code inside the loops is executed $n$ and $n+1$ times, but the time complexity of each loop is $O(n)$.

#![allow(unused)]
fn main() {
let n = 10;
for i in 1..=n {
    // code
}
for i in 0..=n {
    // code
}
}

As another example, the time complexity of the following code is $O(n^2)$:

#![allow(unused)]
fn main() {
let n = 10;
for i in 1..=n {
    for j in i+1..=n {
        // code
    }
}
}

Phases

If the algorithm consists of consecutive phases, the total time complexity is the largest time complexity of a single phase. The reason for this is that the slowest phase is usually the bottleneck of the code.

For example, the following code consists of three phases with time complexities $O(n)$, $O(n^2)$ and $O(n)$. Thus, the total time complexity is $O(n^2)$.

#![allow(unused)]
fn main() {
let n = 10;
for i in 1..=n {
    // code
}
for i in 1..=n {
    for j in 1..=n {
        // code
    }
}
for i in 1..=n {
    // code
}
}

Several Variables

Sometimes the time complexity depends on several factors. In this case, the time complexity formula contains several variables.

For example, the time complexity of the following code is $O(nm)$:

#![allow(unused)]
fn main() {
let n = 10; let m = 10;
for i in 1..=n {
    for j in 1..=m {
        // code
    }
}
}

Recursion

The time complexity of a recursive function depends on the number of times the function is called and the time complexity of a single call. The total time complexity is the product of these values.

For example, consider the following function:

#![allow(unused)]
fn main() {
fn f(n: i32){
    if n == 1 {return}
    f(n-1);
}
}

The call $\texttt{f}(n)$ causes $n$ function calls, and the time complexity of each call is $O(1)$. Thus, the total time complexity is $O(n)$.

As another example, consider the following function:

#![allow(unused)]
fn main() {
fn g(n: i32){
    if n == 1 {return}
    g(n-1);
    g(n-1);
}
}

In this case each function call generates two other calls, except for $n=1$. Let us see what happens when $g$ is called with parameter $n$. The following table shows the function calls produced by this single call:

| function call | number of calls |
|---|---|
| $g(n)$ | 1 |
| $g(n-1)$ | 2 |
| $g(n-2)$ | 4 |
| $\cdots$ | $\cdots$ |
| $g(1)$ | $2^{n-1}$ |

Based on this, the time complexity is $$ 1+2+4+\cdots+2^{n-1} = 2^n-1 = O(2^n) $$
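
To see this growth concretely, the number of calls can be counted with a small instrumented version of g (a minimal sketch):

fn main() {
    // count how many calls a single call g(n) generates in total
    fn g(n: i32, calls: &mut u64) {
        *calls += 1;
        if n == 1 {
            return;
        }
        g(n - 1, calls);
        g(n - 1, calls);
    }
    let mut calls = 0;
    g(10, &mut calls);
    println!("{calls}"); // 1023 = 2^10 - 1
}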

Complexity classes

The following list contains common time complexities of algorithms:

  • $O(1)$ Constant-time algorithm: the running time of a constant-time algorithm does not depend on the input size. A typical constant-time algorithm is a direct formula that calculates the answer.
  • $O(\log n)$ Logarithmic algorithm: often halves the input size at each step. The running time of such an algorithm is logarithmic, because $\log_2 n$ equals the number of times $n$ must be divided by 2 to get 1.
  • $O(\sqrt n)$ Square root algorithm: slower than $O(\log n)$ but faster than $O(n)$. A special property of square roots is that $\sqrt n = n/\sqrt n$, so the square root $\sqrt n$ lies, in some sense, in the middle of the input.
  • $O(n)$ Linear algorithm: goes through the input a constant number of times. This is often the best possible time complexity, because it is usually necessary to access each input element at least once before reporting the answer.
  • $O(n \log n)$ Linearithmic algorithm: this time complexity often indicates that the algorithm sorts the input, because the time complexity of efficient sorting algorithms is $O(n \log n)$. Another possibility is that the algorithm uses a data structure where each operation takes $O(\log n)$ time.
  • $O(n^2)$ Quadratic algorithm: often contains two nested loops. It is possible to go through all pairs of the input elements in $O(n^2)$ time.
  • $O(n^3)$ Cubic algorithm: often contains three nested loops. It is possible to go through all triplets of the input elements in $O(n^3)$ time.
  • $O(2^n)$ Exponential algorithm: this time complexity often indicates that the algorithm iterates through all subsets of the input elements. For example, the subsets of $\{1,2,3\}$ are $\emptyset$, $\{1\}$, $\{2\}$, $\{3\}$, $\{1,2\}$, $\{1,3\}$, $\{2,3\}$ and $\{1,2,3\}$.
  • $O(n!)$ Factorial algorithm: this time complexity often indicates that the algorithm iterates through all permutations of the input elements. For example, the permutations of $\{1,2,3\}$ are $(1,2,3)$, $(1,3,2)$, $(2,1,3)$, $(2,3,1)$, $(3,1,2)$ and $(3,2,1)$.

An algorithm is polynomial if its time complexity is at most $O(n^k)$ where $k$ is a constant. All the above time complexities except $O(2^n)$ and $O(n!)$ are polynomial. In practice, the constant $k$ is usually small, and therefore a polynomial time complexity roughly means that the algorithm is efficient.

Most algorithms in this book are polynomial. Still, there are many important problems for which no polynomial algorithm is known, i.e., nobody knows how to solve them efficiently. NP-hard problems are an important set of problems, for which no polynomial algorithm is known1.


1 A classic book on the topic is M. R. Garey's and D. S. Johnson's Computers and Intractability: A Guide to the Theory of NP-Completeness [28].

Estimating efficiency

By calculating the time complexity of an algorithm, it is possible to check, before implementing the algorithm, that it is efficient enough for the problem. The starting point for estimations is the fact that a modern computer can perform some hundreds of millions of operations in a second.

For example, assume that the time limit for a problem is one second and the input size is $n=10^5$. If the time complexity is $O(n^2)$, the algorithm will perform about $(10^5)^2=10^{10}$ operations. This should take at least some tens of seconds, so the algorithm seems to be too slow for solving the problem.

On the other hand, given the input size, we can try to guess the required time complexity of the algorithm that solves the problem. The following table contains some useful estimates assuming a time limit of one second.

| input size | required time complexity |
|---|---|
| $n \le 10$ | $O(n!)$ |
| $n \le 20$ | $O(2^n)$ |
| $n \le 500$ | $O(n^3)$ |
| $n \le 5000$ | $O(n^2)$ |
| $n \le 10^6$ | $O(n \log n)$ or $O(n)$ |
| $n$ is large | $O(1)$ or $O(\log n)$ |

For example, if the input size is $n=10^5$, it is probably expected that the time complexity of the algorithm is $O(n)$ or $O(n \log n)$. This information makes it easier to design the algorithm, because it rules out approaches that would yield an algorithm with a worse time complexity.

Still, it is important to remember that a time complexity is only an estimate of efficiency, because it hides the constant factors. For example, an algorithm that runs in $O(n)$ time may perform $n/2$ or $5n$ operations. This has an important effect on the actual running time of the algorithm.

Maximum subarray sum

There are often several possible algorithms for solving a problem such that their time complexities are different. This section discusses a classic problem that has a straightforward $O(n^3)$ solution. However, by designing a better algorithm, it is possible to solve the problem in $O(n^2)$ time and even in $O(n)$ time.

Given an array of $n$ numbers, our task is to calculate the maximum subarray sum, i.e., the largest possible sum of a sequence of consecutive values in the array1.

The problem is interesting when there may be negative values in the array. For example, in the array

[-1, 2, 4, -3, 5, 2, -5, 2]

the following subarray produces the maximum sum $10$:

[2, 4, -3, 5, 2]

We assume that an empty subarray is allowed, so the maximum subarray sum is always at least $0$.

Algorithm 1

A straightforward way to solve the problem is to go through all possible subarrays, calculate the sum of values in each subarray and maintain the maximum sum. The following code implements this algorithm:

#![allow(unused)]
fn main() {
use std::cmp::max;
use std::time::Instant;
let array: Vec<i32> = vec![-1, 2, 4, -3, 5, 2, -5, 2];
let n = array.len();
let start = Instant::now();
let mut best = 0;
for a in 0..n{
    for b in a..n {
        let mut sum = 0;
        for k in a..=b {
            sum += array[k];
        }
        best = max(best, sum);
    }
}
println!("{best}");
println!("Execution time: {:?}", start.elapsed());
}

The variables a and b fix the first and last index of the subarray, and the sum of values is calculated to the variable sum. The variable best contains the maximum sum found during the search.

The time complexity of the algorithm is $O(n^3)$, because it consists of three nested loops that go through the input.

Algorithm 2

It is easy to make Algorithm 1 more efficient by removing one loop from it. This is possible by calculating the sum at the same time when the right end of the subarray moves. The result is the following code:

#![allow(unused)]
fn main() {
use std::cmp::max;
use std::time::Instant;
let array: Vec<i32> = vec![-1, 2, 4, -3, 5, 2, -5, 2];
let n = array.len();
let start = Instant::now();
let mut best = 0;
for a in 0..n {
    let mut sum = 0;
    for b in a..n {
        sum += array[b];
        best = max(best,sum);
    }
}
println!("{best}");
println!("Execution time: {:?}", start.elapsed());
}

After this change, the time complexity is $O(n^2)$.

Algorithm 3

Surprisingly, it is possible to solve the problem in $O(n)$ time2, which means that just one loop is enough. The idea is to calculate, for each array position, the maximum sum of a subarray that ends at that position. After this, the answer for the problem is the maximum of those sums.

Consider the subproblem of finding the maximum-sum subarray that ends at position $k$. There are two possibilities:

  1. The subarray only contains the element at position $k$.
  2. The subarray consists of a subarray that ends at position $k-1$, followed by the element at position $k$.

In the latter case, since we want to find a subarray with maximum sum, the subarray that ends at position $k-1$ should also have the maximum sum. Thus, we can solve the problem efficiently by calculating the maximum subarray sum for each ending position from left to right.

The following code implements the algorithm:

#![allow(unused)]
fn main() {
use std::cmp::max;
use std::time::Instant;
let array: Vec<i32> = vec![-1, 2, 4, -3, 5, 2, -5, 2];
let n = array.len();
let start = Instant::now();
let mut best = 0;
let mut sum = 0;
for k in 0..n {
    sum = max(array[k],sum+array[k]);
    best = max(best,sum);
}
println!("{best}");
println!("Execution time: {:?}", start.elapsed());
}

The algorithm only contains one loop that goes through the input, so the time complexity is $O(n)$. This is also the best possible time complexity, because any algorithm for the problem has to examine all array elements at least once.

Efficiency comparison

It is interesting to study how efficient algorithms are in practice. The following table shows the running times of the above algorithms for different values of $n$ on a modern computer.

In each test, the input was generated randomly. The time needed for reading the input was not measured.

| array size $n$ | Algorithm 1 | Algorithm 2 | Algorithm 3 |
|---|---|---|---|
| $10^2$ | $0.0$ s | $0.0$ s | $0.0$ s |
| $10^3$ | $0.1$ s | $0.0$ s | $0.0$ s |
| $10^4$ | > $10.0$ s | $0.1$ s | $0.0$ s |
| $10^5$ | > $10.0$ s | $5.3$ s | $0.0$ s |
| $10^6$ | > $10.0$ s | > $10.0$ s | $0.0$ s |
| $10^7$ | > $10.0$ s | > $10.0$ s | $0.0$ s |

The comparison shows that all algorithms are efficient when the input size is small, but larger inputs bring out remarkable differences in the running times of the algorithms. Algorithm 1 becomes slow when $n=10^4$, and Algorithm 2 becomes slow when $n=10^5$. Only Algorithm 3 is able to process even the largest inputs instantly.


1 J. Bentley's book Programming Pearls [8] made the problem popular.

2 In [8], this linear-time algorithm is attributed to J. B. Kadane, and the algorithm is sometimes called Kadane’s algorithm.

Sorting

Sorting is a fundamental algorithm design problem. Many efficient algorithms use sorting as a subroutine, because it is often easier to process data if the elements are in a sorted order.

For example, the problem does an array contain two equal elements? is easy to solve using sorting. If the array contains two equal elements, they will be next to each other after sorting, so it is easy to find them. Also, the problem what is the most frequent element in an array? can be solved similarly.

There are many algorithms for sorting, and they are also good examples of how to apply different algorithm design techniques. The efficient general sorting algorithms work in $O(n \log n)$ time, and many algorithms that use sorting as a subroutine also have this time complexity.

Sorting

The basic problem in sorting is as follows:

Given an array that contains $n$ elements, your task is to sort the elements in increasing order.

For example, the array

digraph {
    node [shape=record];
    a [label="1|3|8|2|9|2|5|6"];
}

will be as follows after sorting:

digraph {
    node [shape=record];
    a [label="1|2|2|3|5|6|8|9"];
}

$O(n^2)$ algorithms

Bubble sort

Simple algorithms for sorting an array work in $O(n^2)$ time. Such algorithms are short and usually consist of two nested loops. A famous $O(n^2)$ time sorting algorithm is bubble sort where the elements bubble in the array according to their values.

Bubble sort consists of $n$ rounds. On each round, the algorithm iterates through the elements of the array. Whenever two consecutive elements are found that are not in correct order, the algorithm swaps them. The algorithm can be implemented as follows:

#![allow(unused)]
fn main() {
let mut array = [1, 3, 8, 2, 9, 2, 5, 6];
let n = array.len();
println!("Before: {array:?}");
for i in 0..n {
    for j in 0..n-1 {
        if array[j] > array[j+1] {
            array.swap(j, j+1); //swap the j-th element with the j+1-th element
        }
    }
}
println!("After:  {array:?}");
}

After the first round of the algorithm, the largest element will be in the correct position, and in general, after $k$ rounds, the $k$ largest elements will be in the correct positions. Thus, after $n$ rounds, the whole array will be sorted.

For example, in the array

digraph {
    node [shape=record];
    a [label="1|3|8|2|9|2|5|6"];
}

the first round of bubble sort swaps elements as follows:

digraph {
    node [shape=record];
    a [label="1|3|<a>2|<b>8|9|2|5|6"];
    a:a:s -> a:b:s;
}

digraph {
    node [shape=record];
    a [label="1|3|2|8|<b>2|<a>9|5|6"];
    a:b:s -> a:a:s;
}

digraph {
    node [shape=record];
    a [label="1|3|2|8|2|<a>5|<b>6|9"];
    a:a:s -> a:b:s;
}

digraph {
    node [shape=record];
    array1 [label="1|3|2|8|2|5|<six>6|<nine>9"];
    array1:six:s -> array1:nine:s;
}

Inversion

Bubble sort is an example of a sorting algorithm that always swaps consecutive elements in the array. It turns out that the time complexity of such an algorithm is always at least $O(n^2)$, because in the worst case, $O(n^2)$ swaps are required for sorting the array.

A useful concept when analyzing sorting algorithms is an inversion: a pair of array elements (array[a],array[b]) such that a<b and array[a]>array[b], i.e., the elements are in the wrong order. For example, the array

digraph a {
    node [shape=record];
    array1 [label="1|2|2|6|3|5|9|8"];
}

has three inversions: $(6,3)$, $(6,5)$ and $(9,8)$. The number of inversions indicates how much work is needed to sort the array. An array is completely sorted when there are no inversions. On the other hand, if the array elements are in the reverse order, the number of inversions is the largest possible:

$$ 1+2+\cdots+(n-1)=\frac{n(n-1)}{2} = O(n^2) $$

Swapping a pair of consecutive elements that are in the wrong order removes exactly one inversion from the array. Hence, if a sorting algorithm can only swap consecutive elements, each swap removes at most one inversion, and the time complexity of the algorithm is at least $O(n^2)$.
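
The number of inversions in the example array above can be counted with a straightforward double loop (a minimal sketch):

fn main() {
    let array = [1, 2, 2, 6, 3, 5, 9, 8];
    let n = array.len();
    // count pairs (a, b) with a < b and array[a] > array[b]
    let mut inversions = 0;
    for a in 0..n {
        for b in a + 1..n {
            if array[a] > array[b] {
                inversions += 1;
            }
        }
    }
    println!("{inversions}"); // 3
}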

$O(n \log n)$ algorithms

Merge sort

It is possible to sort an array efficiently in $O(n \log n)$ time using algorithms that are not limited to swapping consecutive elements. One such algorithm is merge sort1, which is based on recursion.

Merge sort sorts a subarray array[a..=b] as follows:

  • If $a=b$, do not do anything, because the subarray is already sorted.
  • Calculate the position of the middle element: $k=\lfloor (a+b)/2 \rfloor$.
  • Recursively sort the subarray array[a..=k].
  • Recursively sort the subarray array[k+1..=b].
  • Merge the sorted subarrays array[a..=k] and array[k+1..=b] into a sorted subarray array[a..=b].

Merge sort is an efficient algorithm, because it halves the size of the subarray at each step. The recursion consists of $O(\log n)$ levels, and processing each level takes $O(n)$ time. Merging the subarrays array[a..=k] and array[k+1..=b] is possible in linear time, because they are already sorted. For example, consider sorting the following array:

digraph a {
    node [shape=record];
    array1 [label="1|3|6|2|8|2|5|9"];
}

The array will be divided into two subarrays as follows:

digraph a {
    node [shape=record];
    a1[label="1|3|6|2"]; a2[label="8|2|5|9"]
}

Then, the subarrays will be sorted recursively as follows:

digraph a {
    node [shape=record];
    a1[label="1|2|3|6"]; a2[label="2|5|8|9"]
}

Finally, the algorithm merges the sorted subarrays and creates the final sorted array:

digraph a {
    node [shape=record];
    array1 [label="1|2|2|6|3|5|9|8"];
}
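
The procedure described above can be implemented, for example, as follows (a sketch of one possible implementation, not a library routine):

fn merge_sort(array: &mut [i32], a: usize, b: usize) {
    if a >= b {
        return;
    }
    let k = (a + b) / 2;
    merge_sort(array, a, k);
    merge_sort(array, k + 1, b);
    // merge the sorted halves array[a..=k] and array[k+1..=b]
    let mut merged = Vec::with_capacity(b - a + 1);
    let (mut i, mut j) = (a, k + 1);
    while i <= k && j <= b {
        if array[i] <= array[j] {
            merged.push(array[i]);
            i += 1;
        } else {
            merged.push(array[j]);
            j += 1;
        }
    }
    while i <= k {
        merged.push(array[i]);
        i += 1;
    }
    while j <= b {
        merged.push(array[j]);
        j += 1;
    }
    array[a..=b].copy_from_slice(&merged);
}

fn main() {
    let mut array = vec![1, 3, 6, 2, 8, 2, 5, 9];
    let n = array.len();
    merge_sort(&mut array, 0, n - 1);
    println!("{array:?}"); // [1, 2, 2, 3, 5, 6, 8, 9]
}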

Sorting lower bound

Is it possible to sort an array faster than in $O(n \log n)$ time? It turns out that this is not possible when we restrict ourselves to sorting algorithms that are based on comparing array elements.

The lower bound for the time complexity can be proved by considering sorting as a process where each comparison of two elements gives more information about the contents of the array. The process creates the following tree:

digraph {
        a; b; c; d; e; f; g;
        a [label="x < y?"];
        b [label="x < y?"];
        c [label="x < y?"];
        d [label="x < y?"];
        e [label="x < y?"];
        f [label="x < y?"];
        g [label="x < y?"];

        a -> b; a -> c;
        b -> d; b -> e;
        c -> f; c -> g;
}

Here $x<y?$ means that some elements $x$ and $y$ are compared. If $x<y$, the process continues to the left, and otherwise to the right. The results of the process are the possible ways to sort the array, a total of $n!$ ways. For this reason, the height of the tree must be at least $$ \log_2(n!) = \log_2(1)+\log_2(2)+\cdots+\log_2(n)$$ We get a lower bound for this sum by choosing the last $n/2$ elements and changing the value of each element to $\log_2(n/2)$. This yields an estimate $$ \log_2(n!) \ge (n/2) \cdot \log_2(n/2),$$ so the height of the tree and the minimum possible number of steps in a sorting algorithm in the worst case is at least $n \log n$.

Counting sort

The lower bound $n \log n$ does not apply to algorithms that do not compare array elements but use some other information. An example of such an algorithm is counting sort that sorts an array in $O(n)$ time assuming that every element in the array is an integer between $0 \ldots c$ and $c=O(n)$.

The algorithm creates a bookkeeping array, whose indices are elements of the original array. The algorithm iterates through the original array and calculates how many times each element appears in the array.

For example, the array

digraph a {
    node [shape=record];
    array1 [label="1|3|6|9|9|3|5|9"];
}

corresponds to the following bookkeeping array:

digraph a {
    node [shape=record];
    ranksep=0
    index [label="1|2|3|4|5|6|7|8|9" color=transparent];
    array [label="1|0|2|0|1|1|0|0|3"]; 
    index -> array [color=transparent]
}

For example, the value at position 3 in the bookkeeping array is 2, because the element 3 appears 2 times in the original array.

Construction of the bookkeeping array takes $O(n)$ time. After this, the sorted array can be created in $O(n)$ time because the number of occurrences of each element can be retrieved from the bookkeeping array. Thus, the total time complexity of counting sort is $O(n)$.

Counting sort is a very efficient algorithm but it can only be used when the constant $c$ is small enough, so that the array elements can be used as indices in the bookkeeping array.
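
A counting sort sketch for the example array above, assuming all values are between 0 and a small constant c:

fn main() {
    let array: [usize; 8] = [1, 3, 6, 9, 9, 3, 5, 9];
    let c = 9; // upper bound for the values in the array
    // bookkeeping[v] = number of occurrences of value v
    let mut bookkeeping = vec![0usize; c + 1];
    for &value in &array {
        bookkeeping[value] += 1;
    }
    // rebuild the array in sorted order from the counts
    let mut sorted = Vec::with_capacity(array.len());
    for value in 0..=c {
        for _ in 0..bookkeeping[value] {
            sorted.push(value);
        }
    }
    println!("{sorted:?}"); // [1, 3, 3, 5, 6, 9, 9, 9]
}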


1 According to [47], merge sort was invented by J. von Neumann in 1945

Sorting in Rust

It is almost never a good idea to use a home-made sorting algorithm in a contest, because there are good implementations available in programming languages. For example, the Rust standard library contains the function sort() that can be easily used for sorting arrays and other data structures.

There are many benefits in using a library function. First, it saves time because there is no need to implement the function. Second, the library implementation is certainly correct and efficient: it is not probable that a home-made sorting function would be better.

In this section we will see how to use the Rust sort() function. The following code sorts a vector in increasing order:

#![allow(unused)]
fn main() {
let mut v = vec![4,2,5,3,5,8,3];
v.sort();
println!("{v:?}");
}

After the sorting, the contents of the vector will be [2,3,3,4,5,5,8]. The default sorting order is increasing, but a reverse order is possible as follows:

#![allow(unused)]
fn main() {
let mut v = vec![4,2,5,3,5,8,3];
v.sort();
v.reverse();
println!("{v:?}");
}
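
Alternatively, a custom comparator can be passed to sort_by to sort directly in decreasing order:

#![allow(unused)]
fn main() {
let mut v = vec![4,2,5,3,5,8,3];
// reverse the comparison to get decreasing order
v.sort_by(|a, b| b.cmp(a));
println!("{v:?}"); // [8, 5, 5, 4, 3, 3, 2]
}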

The following code sorts the string s:

#![allow(unused)]
fn main() {
let mut s = "monkey";
let mut s: Vec<_> = s.chars().collect();
s.sort();
println!("{}", s.iter().collect::<String>());
}

Sorting a string means that the characters of the string are sorted. For example, the string monkey becomes ekmnoy.

Sort_unstable()

The particularity of the sort() function is that when two elements of the slice are equal, it does not reorder them. For this reason sort() is considered stable. The current implementation of sort() is adaptive and tries to use the best algorithm for each case; it is currently based on an iterative merge sort inspired by timsort1. In most cases (except when the slice is very short) it allocates half the size of the slice.

If the stability of the sorting algorithm is not a requirement, you can use sort_unstable(), which is typically faster than the stable version. This function is in-place (it does not allocate) and the algorithm is based on pattern-defeating quicksort2.
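
Its usage is identical to sort(); only the guarantee about equal elements changes:

#![allow(unused)]
fn main() {
let mut v = vec![4,2,5,3,5,8,3];
// same result here, but the relative order of equal elements is not guaranteed
v.sort_unstable();
println!("{v:?}"); // [2, 3, 3, 4, 5, 5, 8]
}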

Comparison trait

The function sort requires that the Ord trait (together with Eq, PartialOrd and PartialEq) is implemented for the data type of the elements to be sorted. When sorting, these traits are used whenever it is necessary to find out the order of two elements.

Most Rust data types have a built-in implementation of these traits, and elements of those types can be sorted automatically. For example, numbers are sorted according to their values and strings are sorted in alphabetical order.

Tuples

Tuples ((T1, T2, ...)) are sorted primarily according to their first elements (.0). If the first elements of two tuples are equal, they are sorted according to their second elements (.1) and so on:

#![allow(unused)]
fn main() {
let mut v: Vec<(_,_,_)> = Vec::new();
v.push((1,5,3));
v.push((1,6,1));
v.push((1,5,4));
v.sort();
println!("{v:?}");
}

After this, the order of the tuples is $(1,5,3)$, $(1,5,4)$ and $(1,6,1)$.

User defined struct and enum

User-defined structs do not have traits Ord, PartialOrd, PartialEq implemented automatically. The traits should be implemented manually:

  • Ord implements the function cmp() that returns an Ordering enum to determine whether an element is Less, Greater, or Equal to another.
  • PartialOrd implements the function partial_cmp() that returns Option<Ordering>.
  • PartialEq implements eq() that returns bool: true if the compared elements are equal, false otherwise.

In most cases we don't want to implement these functions one by one, because the structs we create are composed of fields that follow the standard ordering; for this reason most traits in Rust are derivable. Prepending #[derive(PartialEq, Eq, PartialOrd, Ord)] produces a lexicographic ordering based on the top-to-bottom declaration order of the struct's members. On enums, variants are ordered by their discriminant; by default, the discriminant is smallest for variants at the top and largest for variants at the bottom.

For example, the following struct P contains the x and y coordinates of a point. The traits are defined so that the points are sorted primarily by the x coordinate and secondarily by the y coordinate.

#![allow(unused)]
fn main() {
use std::cmp::Ordering;
#[derive(Debug)]
struct P {
    x: i32,
    y: i32,
};

impl Ord for P {
    fn cmp(&self, other: &Self) -> Ordering {
        self.x.cmp(&other.x).then(self.y.cmp(&other.y))
    }
}

impl PartialOrd for P {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for P {
    fn eq(&self, other: &Self) -> bool {
        self.x == other.x && self.y == other.y
    }
}

impl Eq for P {}

   let p1 = P { x: 3, y: 5 };
   let p2 = P { x: 1, y: 8 };
   let p3 = P { x: 3, y: 5 };
   println!("{:?} == {:?}: {}",p1, p2, p1 == p2);
   println!("{:?} == {:?}: {}",p1, p3, p1 == p3);
   println!("{:?} < {:?}: {}",p1, p2, p1 < p2);
   println!("{:?} > {:?}: {}",p1, p2, p1 > p2);
   println!("{:?} <= {:?}: {}",p1, p3, p1 <= p3);
   println!("{:?} >= {:?}: {}",p1, p3, p1 >= p3);
}

This can be replaced more concisely by:

#![allow(unused)]
fn main() {
use std::cmp::Ordering;
#[derive(Debug)]
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct P {
    x: i32,
    y: i32,
};
   let p1 = P { x: 3, y: 5 };
   let p2 = P { x: 1, y: 8 };
   let p3 = P { x: 3, y: 5 };
   println!("{:?} == {:?}: {}",p1, p2, p1 == p2);
   println!("{:?} == {:?}: {}",p1, p3, p1 == p3);
   println!("{:?} < {:?}: {}",p1, p2, p1 < p2);
   println!("{:?} > {:?}: {}",p1, p2, p1 > p2);
   println!("{:?} <= {:?}: {}",p1, p3, p1 <= p3);
   println!("{:?} >= {:?}: {}",p1, p3, p1 >= p3);
}

1 Timsort is a hybrid, stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data https://en.wikipedia.org/wiki/Timsort

2 The algorithm was developed by Orson Peters https://github.com/orlp/pdqsort

Binary search

A general method for searching for an element in an array is to use a for loop that iterates through the elements of the array. For example, the following code searches for an element $x$ in an array:

for i in 0..n {
    if array[i] == x {
        // x found at index i
    }
}

The time complexity of this approach is $O(n)$, because in the worst case, it is necessary to check all elements of the array. If the order of the elements is arbitrary, this is also the best possible approach, because there is no additional information available where in the array we should search for the element $x$.

However, if the array is sorted, the situation is different. In this case it is possible to perform the search much faster, because the order of the elements in the array guides the search. The following binary search algorithm efficiently searches for an element in a sorted array in $O(\log n)$ time.

Method 1

The usual way to implement binary search resembles looking for a word in a dictionary. The search maintains an active region in the array, which initially contains all array elements. Then, a number of steps is performed, each of which halves the size of the region.

At each step, the search checks the middle element of the active region. If the middle element is the target element, the search terminates. Otherwise, the search recursively continues to the left or right half of the region, depending on the value of the middle element.

The above idea can be implemented as follows:

#![allow(unused)]
fn main() {
let array = vec![1,3,4,6,7,8,9];
let x = 4;
let n = array.len();
let mut i = 0;
let mut a = 0;
let mut b = n-1;
println!("The full array is {array:?}");
println!("I search for {x:?}");
while a <= b {
    let k = (a+b)/2;
    i += 1;
    println!("Step {i}: I'm searching in {:?}", &array[a..=b]);
    if array[k] == x {
        // x found at index k
        println!("FOUND!");
        break;
    }
    if array[k] > x {
        // guard against usize underflow when x is smaller than every element
        if k == 0 { break }
        b = k - 1;
    } else {
        a = k + 1;
    }
}
}

In this implementation, the active region is $a \ldots b$, and initially the region is $0 \ldots n-1$. The algorithm halves the size of the region at each step, so the time complexity is $O(\log n)$.

Method 2

An alternative method to implement binary search is based on an efficient way to iterate through the elements of the array. The idea is to make jumps and slow the speed when we get closer to the target element.

The search goes through the array from left to right, and the initial jump length is $n/2$. At each step, the jump length will be halved: first $n/4$, then $n/8$, $n/16$, etc., until finally the length is 1. After the jumps, either the target element has been found or we know that it does not appear in the array.

The following code implements the above idea:

#![allow(unused)]
fn main() {
let array = vec![1,3,4,6,7,8,9];
let x = 4;
let n = array.len();
println!("The full array is {array:?}");
println!("I search for {x:?}");
let mut k = 0;
let mut b = n/2;
while b >= 1 {
    println!("Current jump length is {b}");
    while k+b < n && array[k+b] <= x {k += b}
    b /= 2;
}
if array[k] == x {
    // x found at index k
    println!("FOUND!");
}
}

During the search, the variable $b$ contains the current jump length. The time complexity of the algorithm is $O(\log n)$, because the code in the second while loop is performed at most twice for each jump length.

Rust functions

The Rust standard library contains the following functions that are based on binary search and work in logarithmic time:

  • binary_search returns a Result<usize, usize>: if it finds the value it returns Ok(index) with the index of the element; if it does not find the element it returns Err(index) with the index where the element should be inserted to keep the array sorted
  • binary_search_by is similar to binary_search but it accepts a comparator function as a closure
  • binary_search_by_key is like binary_search but it considers only one part of the data structure (the key); for example, if a vector of tuples is sorted by its .0 element, you can use this function to search on that .0 key

The functions assume that the array is sorted. For example, the following code finds out whether an array contains an element with value $x$:

#![allow(unused)]
fn main() {
let array = vec![1,3,4,6,7,8,9];
let x = 4;
let n = array.len();
println!("The full array is {array:?}");
println!("I search for {x:?}");
println!("___");
match array.binary_search(&x){
    Ok(k) => println!("{x} found at index {k}"),
    Err(k) => println!("{x} should be inserted at index {k}"),
}
}
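
As a quick illustration of binary_search_by_key, the following sketch (with a made-up vector of (key, value) pairs sorted by the .0 field) searches by the key only:

#![allow(unused)]
fn main() {
// hypothetical data: pairs sorted by their .0 key
let v = vec![(1, "a"), (3, "b"), (5, "c"), (7, "d")];
match v.binary_search_by_key(&5, |&(key, _)| key) {
    Ok(k) => println!("key 5 found at index {k}"),           // index 2
    Err(k) => println!("key 5 should be inserted at index {k}"),
}
}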

Iterators

In Rust, another very common way to solve problems that involve searching, performing operations on the elements of a collection, or filtering it, is to use iterators.

Iterators are implemented as traits, which means that they are defined in terms of a set of methods that must be implemented by any type that wants to be an iterator. The most common way to create an iterator is to use the iter() method on a collection type, such as a vector or a string. For example, the following code creates an iterator over the numbers 1, 2, 3, 4:

#![allow(unused)]
fn main() {
let numbers = vec![1,2,3,4];
let numbers_iterator = numbers.iter();
}

Iterators in Rust are very good in terms of performance:

  • They are lazy, so they don't actually iterate over the sequence of values until they are required to do so
  • They are composable, so they can be combined to create more complex iterators
  • They are generic, which means that they can be used with any type that implements the Iterator trait

The following code counts the number of elements whose value is $x$:

#![allow(unused)]
fn main() {
let array = vec![1,2,3,4,5,4,3,2,4,4,4,7];
let x = 4;
let count = array.iter().filter(|&el| el==&x).count();
println!("Array is {array:?}");
println!("x is {x}");
println!("___");
println!("{count}");
}

Finding the smallest solution

An important use for binary search is to find the position where the value of a function changes. Suppose that we wish to find the smallest value $k$ that is a valid solution for a problem. We are given a function ok(x) that returns true if x is a valid solution and false otherwise. In addition, we know that ok(x) is false when x<k and true when x >= k. The situation looks as follows:

| $x$ | 0 | 1 | $\cdots$ | $k-1$ | $k$ | $k+1$ | $\cdots$ |
|---|---|---|---|---|---|---|---|
| $\texttt{ok}(x)$ | false | false | $\cdots$ | false | true | true | $\cdots$ |

Now, the value of $k$ can be found using binary search as follows:

#![allow(unused)]
fn main() {
use std::cmp::Ordering;
let array = vec![1, 2, 3, 4, 5, 7, 8, 9, 12, 13, 14, 15];
// ok(x) is false when x < 7 and true when x >= 7
fn ok(x: &usize) -> bool { *x >= 7 }
let mut okx = vec![false; array.len()];
okx.iter_mut().enumerate().for_each(|(i, e)| *e = ok(&array[i]));
// the comparator never returns Equal: Less while ok(x) is false,
// Greater as soon as ok(x) is true
let cmp = |x| if ok(x) { Ordering::Greater } else { Ordering::Less };
let k = array.binary_search_by(cmp);
println!("Array is {array:?}");
println!("ok(x) is {okx:?}");
println!("___");
println!("k is {k:?}");
}

The search finds the largest value of $x$ for which ok(x) is false. To do so we can exploit a property of binary_search_by: the function returns a meaningful index even when the result is Err. We create a simple closure that returns an Ordering but never Ordering::Equal: since we want the very first true element, it returns Ordering::Less while ok(x) is false and Ordering::Greater as soon as ok(x) is true. The search therefore ends with Err(k), where $k$ is the position of the first element for which ok is true; in other words, the next value $k=x+1$ is the smallest possible value for which $\texttt{ok}(k)$ is true.

The algorithm calls the function ok $O(\log z)$ times, where $z$ is the size of the search range, so the total time complexity depends on the function ok. For example, if the function works in $O(n)$ time, the total time complexity is $O(n \log z)$.
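
On a reasonably recent toolchain, the same boundary can also be found with the standard function partition_point, which takes a predicate instead of a comparator; a minimal sketch on the array above:

#![allow(unused)]
fn main() {
let array = vec![1, 2, 3, 4, 5, 7, 8, 9, 12, 13, 14, 15];
// index of the first element for which ok(x) is true, i.e. x >= 7
let k = array.partition_point(|&x| x < 7);
println!("k is {k}"); // 5
}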

Find the maximum value

Binary search can also be used to find the maximum value for a function that is first increasing and then decreasing. Our task is to find a position $k$ such that

  • $f(x) < f(x+1)$ when $x < k$, and
  • $f(x) > f(x+1)$ when $x \ge k$.

The idea is to use binary search for finding the largest value of $x$ for which $f(x)<f(x+1)$. This implies that $k=x+1$ because $f(x+1)>f(x+2)$. The following code implements the search:

#![allow(unused)]
fn main() {
use std::cmp::Ordering;
let array = vec![11, 12, 13, 14, 65, 9, 8, 7, 6, 5, 4, 3, 2, 1];
// compare each element with its successor: Less while the values are
// still increasing, Greater once they start decreasing
// (assumes distinct elements; for this array the comparator is never
// called on the last index, so array[i+1] stays in bounds)
let f = |&x| {
    let i = array.iter().position(|el| el == &x).unwrap();
    array[i].cmp(&array[i+1])
};
let k = array.binary_search_by(f);
println!("Array is {array:?}");
println!("___");
println!("k is {k:?}");
}

To better understand binary_search_by, you should just keep in mind that:

  • returning Ordering::Less from the custom comparator means that the final result is in the right half of the collection
  • returning Ordering::Greater from the custom comparator means that the final result is in the left half

The verbose version of the above function would be:

#![allow(unused)]
fn main() {
    use std::cmp::Ordering;
    let array = vec![11,12,13,14,65,9,8, 7, 6, 5, 4, 3, 2, 1];
    let f = | &x | { 
        let i = array.iter().position(|el| el == &x).unwrap();
        if array[i] < array[i+1]{
            return Ordering::Less //go in the right half
        } else {
            return Ordering::Greater //go in left half
        }
    };
    let k = array.binary_search_by(f);
    println!("Array is {array:?}");
    println!("___");
    println!("k is {k:?}");
}

Data structures

A data structure is a way to store data in the memory of a computer. It is important to choose an appropriate data structure for a problem, because each data structure has its own advantages and disadvantages. The crucial question is: which operations are efficient in the chosen data structure?

This chapter introduces the most important data structures in the Rust standard library. It is a good idea to use the standard library whenever possible, because it will save a lot of time.

Later in the book we will learn about more sophisticated data structures that are not available in the standard library.

Vectors

A Vector is an array whose size can be changed during the execution of the program.

The following code creates a vector that contains three elements:

#![allow(unused)]
fn main() {
let v = vec![1,2,3];
println!("{v:?}");
}

After this, the elements can be accessed like in an ordinary array:

#![allow(unused)]
fn main() {
let v = vec![1,2,3];
println!("{}", v[0]);
println!("{}", v[1]);
println!("{}", v[2]);
}

Accessing an element by direct indexing is not really idiomatic in Rust; it is often better to use standard functions like .get(index), which returns an Option that can be handled explicitly. However, in competitive programming direct indexing can be faster and is usually acceptable.

The function .len() returns the number of elements in the vector. The following code iterates through the vector and prints all elements in it:

#![allow(unused)]
fn main() {
let v = vec![1,2,3];
for i in 0..v.len(){
    println!("{}", v[i])
}
}

A shorter way to iterate through a vector is as follows:

#![allow(unused)]
fn main() {
let v = vec![1,2,3];
for x in v {
    println!("{x}");
}
}

The function .push() adds an element to the back of an existing Vec. The function .last() returns the last element in the Vec, and the function .pop() removes the last element:

#![allow(unused)]
fn main() {
let mut v = vec![5,2];
v.push(6);
println!("{v:?}");
println!("{:?}", v.last());
v.pop();
println!("{v:?}");
}

Another way to create a Vec is to give the number of elements and the initial value for each element:

#![allow(unused)]
fn main() {
// size 10, initial value 0
let v = vec![0;10];
println!("{v:?}");
}
#![allow(unused)]
fn main() {
// size 10, initial value 5
let v = vec![5;10];
println!("{v:?}");
}

The internal implementation of a vector uses a struct with three elements:

  • a pointer to the first element on the heap
  • the length
  • the capacity

The capacity indicates how much memory is reserved for the Vec. Vectors have $O(1)$ indexing, amortized $O(1)$ push (to the end) and $O(1)$ pop (from the end).
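
A small sketch of the difference between length and capacity, using Vec::with_capacity to reserve memory up front:

#![allow(unused)]
fn main() {
let mut v: Vec<i32> = Vec::with_capacity(10);
println!("len = {}, capacity = {}", v.len(), v.capacity()); // 0, 10
for i in 0..10 {
    v.push(i); // no reallocation is needed while len <= capacity
}
println!("len = {}, capacity = {}", v.len(), v.capacity()); // 10, capacity still at least 10
}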

Slices

In Rust, slices are a powerful concept that allows you to work with a portion of a sequence (such as a Vec or a String) without copying or transferring ownership of the data. Slices provide a safe and efficient way to reference a contiguous subsequence of elements.

A slice is composed of two parts:

  1. A pointer to a chunk of memory
  2. Length: a count of how many items are stored

A Slice is essentially a reference, unlike a Vec which stores its own elements. It points to elements in another collection and is commonly used when you need to work with a subset of elements from that collection.

Slices are typically created using the slicing syntax [start..end], where start is the index of the first element to include in the slice, and end is the index of the first element to exclude from the slice. Here's an example:

#![allow(unused)]
fn main() {
let data = [12, 21, 33, 41, 57];
let slice = &data[1..4];
println!("{:?}", slice); // [21, 33, 41]
}

You can also create slices from Vec as follow:

#![allow(unused)]
fn main() {
let v = vec![12, 21, 33, 41, 57];
let slice = &v[1..4];
println!("{:?}", slice); // [21, 33, 41]
}

Slice types

There are two main types of slices:

  1. Immutable Slices (&[T]): These slices allow read-only access to the underlying data. You can't modify the elements through an immutable slice.
  2. Mutable Slices (&mut [T]): Mutable slices allow both reading and writing to the underlying data. You can modify the elements through a mutable slice.
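
A short sketch of the second case, modifying (and sorting) a sub-range of a vector through a mutable slice:

#![allow(unused)]
fn main() {
let mut v = vec![1, 2, 3, 4, 5];
{
    let slice = &mut v[1..4];
    slice[0] = 20;  // modifies v[1]
    slice.sort();   // sorts only the sub-range
}
println!("{v:?}"); // [1, 3, 4, 20, 5]
}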

Set structures

A set is a data structure that maintains a collection of elements. The basic operations of sets are element insertion, search and removal.

The Rust standard library contains two set implementations: The structure BTreeSet is based on a balanced binary tree and its operations work in $O(\log n)$ time. The structure HashSet uses hashing, and its operations work in $O(1)$ time on average.

The choice of which set implementation to use is often a matter of taste. The benefit of the BTreeSet structure is that it maintains the order of the elements and provides functions that are not available in HashSet. On the other hand, HashSet can be more efficient.

The following code creates a set that contains integers, and shows some of the operations. The function .insert() adds an element to the set, the function .contains() returns true if the set contains the element and false otherwise, and the function .remove() removes an element from the set.

#![allow(unused)]
fn main() {
use std::collections::BTreeSet;
let mut s = BTreeSet::new();
s.insert(3);
s.insert(2);
s.insert(5);
println!("{:?}", s.contains(&3)); // true
println!("{:?}", s.contains(&4)); // false
s.remove(&3);
s.insert(4);
println!("{:?}", s.contains(&3)); // false
println!("{:?}", s.contains(&4)); // true
}

A set can be used mostly like a vector, but it is not possible to access the elements using the [] notation. The following code creates a set, prints the number of elements in it, and then iterates through all the elements:

#![allow(unused)]
fn main() {
use std::collections::BTreeSet;
let mut s = BTreeSet::from([1,2,3,4]);
println!("{}", s.len());
for x in s {
    println!("{x}");
}
}

An important property of sets is that all their elements are distinct.

Map structures

A map is a generalized array that consists of key-value-pairs. While the keys in an ordinary array are always the consecutive integers $0,1,\ldots,n-1$, where $n$ is the size of the array, the keys in a map can be of any data type and they do not have to be consecutive values.

The Rust standard library contains two map implementations that correspond to the set implementations: the structure BTreeMap is based on a balanced binary tree and accessing elements takes $O(\log n)$ time, while the structure HashMap uses hashing and accessing elements takes $O(1)$ time on average.

The following code creates a map where the keys are strings and the values are integers:

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
let mut m = BTreeMap::new();
m.insert("monkey", 4);
m.insert("banana", 3);
m.insert("haspsichord", 9);
println!("{}", m["banana"]);
}

If the value of a key is requested with the [] notation but the map does not contain it, the program panics. To prevent a panic, the .entry() function can be used; combined with methods such as .or_insert() it becomes very useful. For example, in the following code, the key aybabtu with value 0 is added to the map.

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
let mut m = BTreeMap::new();
m.entry("aybabtu").or_insert(0);
println!("{}", m["aybabtu"]);
}
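
A classic use of .entry() combined with .or_insert() is counting occurrences; a small sketch that counts the characters of a string:

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
let text = "aybabtu";
let mut count = BTreeMap::new();
for c in text.chars() {
    // insert 0 the first time a character is seen, then increment it
    *count.entry(c).or_insert(0) += 1;
}
println!("{count:?}"); // {'a': 2, 'b': 2, 't': 1, 'u': 1, 'y': 1}
}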

The function .contains_key() checks if a key exists in a map. Also .get() can be used to check if a key exists; the difference is that the first function returns a bool, while the second returns an Option:

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
let mut m = BTreeMap::new();
m.entry("aybabtu").or_insert(5);
println!("{:?}", m.contains_key("aybabtu")); //true
println!("{:?}", m.get("aybabtu")); //Some(5)
println!("{:?}", m.contains_key("piano")); //false
println!("{:?}", m.get("piano")); //None
}

The following code prints all the keys and values in a map; every element is a tuple, so you can destructure it in the for loop:

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
let mut m = BTreeMap::new();
m.insert("monkey", 4);
m.insert("banana", 3);
m.insert("haspsichord", 9);
m.insert("aybabtu", 9);
for (i, x) in m{
    println!("{i}: {x}");
}

}

Iterators and ranges

As anticipated in previous chapters, many functions in the Rust standard library operate with iterators. In Rust, an Iterator is a trait that requires one method, .next(), which, when called, returns an Option<Item>.

When you call next, it returns Some(item) as long as there are elements, and None otherwise. The Iterator trait has more than 70 other methods, but they are all built on top of next, so you get them for free.
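
As a minimal sketch of this idea, the following (made-up) Counter type implements only next(), and methods such as .sum() then come for free:

#![allow(unused)]
fn main() {
// a counter that yields the values 1..=5
struct Counter {
    count: u32,
}

impl Iterator for Counter {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.count < 5 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

let total: u32 = Counter { count: 0 }.sum();
println!("{total}"); // 1 + 2 + 3 + 4 + 5 = 15
}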

Working with ranges

Iterators are used in Rust standard library functions that are given a range of elements in a data structure. For example, the following code uses .fold(), which takes the initial value of the accumulator (0) and a closure that adds every single element (x) of the given range to the accumulator (accumulator):

#![allow(unused)]
fn main() {
(0..10).fold(0, |accumulator, x| accumulator + x); //45
println!("{}", (0..10).fold(0, |accumulator, x| accumulator + x)); //45
}

The following code maps every single element of the range according to the closure, then iterates over the first $5$ elements:

#![allow(unused)]
fn main() {
for x in (1..100).map(|x| x+2).take(5){
// 3 4 5 6 7
println!("{}", x);
}
}

Working with Iterators

Iterators are often used to access elements of a collection. The following code creates an iterator it from a BTreeSet:

#![allow(unused)]
fn main() {
use std::collections::BTreeSet;
let s: BTreeSet<i32> = BTreeSet::new();
let it = s.iter();
}

There are a lot of methods that allow you to iterate through an iterator, and every type adapts each method to its structure. The following code returns the maximum value of the previous BTreeSet:

#![allow(unused)]
fn main() {
use std::collections::BTreeSet;
let mut s = BTreeSet::from([1,2,3,4,5,56,7]);
let it = s.iter();
println!("{:?}", it.clone().max().unwrap());
it.max().unwrap();
}

The function .find() accepts a closure that returns true or false; it applies the closure to each element of the iterator and, as soon as one of them returns true, it returns Some(element). It stops at the first match and returns None if there is no match:

#![allow(unused)]
fn main() {
let v = vec![1, 2, 2, 2, 5, 56, 7, 2];
println!("{:?}", v.iter().find(|&x| x==&2));
v.iter().find(|&x| x==&2); //Some(2)
}

The function .filter() keeps only the elements of the iterator for which the closure it accepts as an argument returns true. The following code returns the smallest element in the collection whose value is greater than $5$:

#![allow(unused)]
fn main() {
let v = vec![1, 2, 2, 2, 5, 56, 7, 2];
println!("{:?}", v.iter().filter(|&el| el>&5).min().unwrap());
v.iter().filter(|&el| el>&5).min().unwrap(); // 7
}

You can combine methods to obtain particular results, for example the nearest elements to a given value $x$:

#![allow(unused)]
fn main() {
let v = vec![1, 2, 2, 2, 5, 56, 7, 2];
let x = 5;
let near_up = v.iter().filter(|&el| el>&x).min().unwrap();
let near_down = v.iter().filter(|&el| el<&x).max().unwrap();
println!("{near_down} < x < {near_up}"); // 2 < x < 7
}

Other structures

VecDeque

A VecDeque is a double-ended queue implemented with a growable ring buffer. It provides the functions .push_back() and .pop_back(), but it also includes the functions .push_front() and .pop_front(). A VecDeque can be used as follows:

#![allow(unused)]
fn main() {
use std::collections::VecDeque;
let mut d = VecDeque::new();
d.push_back(5); // [5]
d.push_back(2); // [5,2]
d.push_front(3); // [3,5,2]
d.pop_back(); // [3,5]
d.pop_front(); // [5]
println!("d is {d:?}");
}

The internal implementation of a VecDeque is more complex than that of a vector, and for this reason, a deque is slower than a vector.

LinkedList

A LinkedList in Rust is a doubly-linked list data structure provided by the standard library. It is a linear data structure where each element (node) in the list contains two references, one pointing to the previous node and one pointing to the next node. This structure allows for efficient insertions and removals at both the beginning and end of the list.

Unlike Vec, LinkedList does not require reallocation when elements are added or removed, which can lead to more predictable performance characteristics in certain situations. However, it's important to note that random access to elements in a LinkedList is not as efficient as in Vec since it requires traversing the list from the beginning or end.

Here's an example of using LinkedList:

#![allow(unused)]
fn main() {
use std::collections::LinkedList;
let mut list = LinkedList::new();
list.push_front(1);
list.push_front(2);
list.push_front(3);
println!("{:?}", list.front()); // Some(3)
println!("{:?}", list.back()); // Some(1)
}

BinaryHeap

A Binary Heap maintains a set of elements. The supported operations are insertion, retrieval and removal of either the minimum or maximum element. Insertion and removal take $O(\log n)$ time, and retrieval takes $O(1)$ time.

While an ordered set efficiently supports all the operations of a priority queue, the benefit of using a priority queue is that it has smaller constant factors. A priority queue is usually implemented using a heap structure that is much simpler than a balanced binary tree used in an ordered set.

By default, the elements in a Rust priority queue are sorted in decreasing order, and it is possible to find and remove the largest element in the queue. The following code illustrates this:

#![allow(unused)]
fn main() {
use std::collections::BinaryHeap;
let mut heap = BinaryHeap::new();
heap.push(1);
heap.push(5);
heap.push(2);
println!("{:?}",heap.peek()); //Some(5)
heap.pop();
println!("{:?}",heap.pop()); // Some(2)
}
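
If the smallest element is needed instead, one common approach is to wrap the values in std::cmp::Reverse, which inverts the ordering; a minimal sketch:

#![allow(unused)]
fn main() {
use std::cmp::Reverse;
use std::collections::BinaryHeap;
let mut heap = BinaryHeap::new();
heap.push(Reverse(5));
heap.push(Reverse(1));
heap.push(Reverse(2));
println!("{:?}", heap.peek()); // Some(Reverse(1)), i.e. the smallest value
if let Some(Reverse(min)) = heap.pop() {
    println!("{min}"); // 1
}
}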

Comparison to sorting

It is often possible to solve a problem using either data structures or sorting. Sometimes there are remarkable differences in the actual efficiency of these approaches, which may be hidden in their time complexities.

Let us consider a problem where we are given two lists $A$ and $B$ that both contain $n$ elements. Our task is to calculate the number of elements that belong to both of the lists. For example, for the lists

$$ A = (5,2,8,9) \hspace{10px} \textrm{and} \hspace{10px} B = (3,2,9,5) $$

the answer is 3 because the numbers 2, 5 and 9 belong to both of the lists.

A straightforward solution to the problem is to go through all pairs of elements in $O(n^2)$ time, but next we will focus on more efficient algorithms.

Algorithm 1

We construct a set of the elements that appear in $A$, and after this, we iterate through the elements of $B$ and check for each elements if it also belongs to $A$. This is efficient because the elements of $A$ are in a set. Using the BTreeSet structure, the time complexity of the algorithm is $O(n \log n)$.

Algorithm 2

It is not necessary to maintain an ordered set, so instead of the BTreeSet structure we can also use the HashSet structure. This is an easy way to make the algorithm more efficient, because we only have to change the underlying data structure. The time complexity of the new algorithm is $O(n)$.

Algorithm 3

Instead of data structures, we can use sorting. First, we sort both lists $A$ and $B$. After this, we iterate through both the lists at the same time and find the common elements. The time complexity of sorting is $O(n \log n)$, and the rest of the algorithm works in $O(n)$ time, so the total time complexity is $O(n \log n)$.
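
A minimal sketch of Algorithm 3 with the example lists (assuming, as in the example, that the elements inside each list are distinct):

#![allow(unused)]
fn main() {
let mut a = vec![5, 2, 8, 9];
let mut b = vec![3, 2, 9, 5];
a.sort();
b.sort();
// scan both sorted lists with two pointers and count the common elements
let (mut i, mut j, mut common) = (0, 0, 0);
while i < a.len() && j < b.len() {
    if a[i] == b[j] {
        common += 1;
        i += 1;
        j += 1;
    } else if a[i] < b[j] {
        i += 1;
    } else {
        j += 1;
    }
}
println!("{common}"); // 3
}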

Efficiency Comparison

The following table shows how efficient the above algorithms are when $n$ varies and the elements of the lists are random integers between $1 \ldots 10^9$:

| $n$ | Algorithm 1 | Algorithm 2 | Algorithm 3 |
|---|---|---|---|
| $10^6$ | $1.5$ s | $0.3$ s | $0.2$ s |
| $2 \cdot 10^6$ | $3.7$ s | $0.8$ s | $0.3$ s |
| $3 \cdot 10^6$ | $5.7$ s | $1.3$ s | $0.5$ s |
| $4 \cdot 10^6$ | $7.7$ s | $1.7$ s | $0.7$ s |
| $5 \cdot 10^6$ | $10.0$ s | $2.3$ s | $0.9$ s |

Algorithms 1 and 2 are equal except that they use different set structures. In this problem, this choice has an important effect on the running time, because Algorithm 2 is 4–5 times faster than Algorithm 1.

However, the most efficient algorithm is Algorithm 3 which uses sorting. It only uses half the time compared to Algorithm 2. Interestingly, the time complexity of both Algorithm 1 and Algorithm 3 is $O(n \log n)$, but despite this, Algorithm 3 is ten times faster. This can be explained by the fact that sorting is a simple procedure and it is done only once at the beginning of Algorithm 3, and the rest of the algorithm works in linear time. On the other hand, Algorithm 1 maintains a complex balanced binary tree during the whole algorithm.

Complete search

Generating subsets

We first consider the problem of generating all subsets of a set of $n$ elements. For example, the subsets of ${0,1,2}$ are $\emptyset$, ${0}$, ${1}$, ${2}$, ${0,1}$, ${0,2}$, ${1,2}$ and ${0,1,2}$. There are two common methods to generate subsets: we can either perform a recursive search or exploit the bit representation of integers.

Method 1

An elegant way to go through all subsets of a set is to use recursion. The following function search generates the subsets of the set ${0,1,\ldots,n-1}$. The function maintains a vector subset that will contain the elements of each subset. The search begins when the function is called with parameter 0.

#![allow(unused)]
fn main() {
let mut subset = Vec::new();
let n = 10;
fn search(k: usize, subset: &mut Vec<usize>, n: usize) {
    if k == n{
        // process subset
println!("{:?}", &subset);
    } else {
        search(k+1, subset, n);
        subset.push(k);
        search(k+1, subset, n);
        subset.pop();
    }
}
search(0, &mut subset, n);
}

When the function search is called with parameter $k$, it decides whether to include the element $k$ in the subset or not, and in both cases then calls itself with parameter $k+1$. However, if $k=n$, the function notices that all elements have been processed and a subset has been generated.

The following tree illustrates the function calls when $n=3$. We can always choose either the left branch ($k$ is not included in the subset) or the right branch ($k$ is included in the subset).

digraph graphname {
    node [color= transparent]
	a [label="search(0)"];
    b [label="search(1)"];
    c [label="search(1)"];
    d [label="search(2)"];
    e [label="search(2)"];
    f [label="search(2)"];
    g [label="search(2)"];
    h [label="search(3)"];
    i [label="search(3)"];
    l [label="search(3)"];
    m [label="search(3)"];
    n [label="search(3)"];
    o [label="search(3)"];
    p [label="search(3)"];
    q [label="search(3)"];

    a -> b;
    a -> c;
    b -> d;
    b -> e;
    c -> f;
    c -> g;
    d -> h;
    d -> i;
    e -> l;
    e -> m;
    f -> n;
    f -> o;
    g -> p;
    g -> q;
}

Method 2

Another way to generate subsets is based on the bit representation of integers. Each subset of a set of $n$ elements can be represented as a sequence of $n$ bits, which corresponds to an integer between $0 \ldots 2^n-1$. The ones in the bit sequence indicate which elements are included in the subset.

The usual convention is that the last bit corresponds to element 0, the second last bit corresponds to element 1, and so on. For example, the bit representation of 25 is 11001, which corresponds to the subset ${0,3,4}$.

The following code goes through the subsets of a set of $n$ elements:

let mut b = 0;
while b < 1<<n{
    //process subset
    b += 1
}

The following code shows how we can find the elements of a subset that corresponds to a bit sequence. When processing each subset, the code builds a vector that contains the elements in the subset.

#![allow(unused)]
fn main() {
let n = 10;
let mut b = 0;
while b < 1<<n{
    let mut subset = Vec::new();
    for i in 0..n {
        if b & (1<<i) != 0 {
            subset.push(i);
        }
    }
println!("{subset:?}");
    b += 1;
}
}

that can be shortened as:

#![allow(unused)]
fn main() {
let n = 10;
for b in 0..1 << n{
    let subset: Vec<usize> = (0..n).filter(|&i| b&(1<<i)!=0).collect();
println!("{subset:?}");
}
}

Generating permutations

Next we consider the problem of generating all permutations of a set of $n$ elements. For example, the permutations of ${0,1,2}$ are $(0,1,2)$, $(0,2,1)$, $(1,0,2)$, $(1,2,0)$, $(2,0,1)$ and $(2,1,0)$.

Like subsets, permutations can be generated using recursion. The following function search goes through the permutations of the set ${0,1,\ldots,n-1}$. The function builds a vector permutation that contains the permutation, and the search begins when the function is called with an empty permutation and a chosen vector filled with false values:

#![allow(unused)]
fn main() {
fn search(permutation: &mut Vec<usize>, n: usize, chosen: &mut Vec<bool>){
    if permutation.len() == n{
        // process permutation
        println!("{permutation:?}");
    } else {
        for i in 0..n{
            if chosen[i] {continue}
            chosen[i] = true;
            permutation.push(i);
            search(permutation, n, chosen);
            chosen[i] = false;
            permutation.pop();
        }
    }
}
let mut permutation = Vec::new();
let n = 4;
let mut chosen = vec![false;n];
println!("Permutation of {n}:");
search(&mut permutation, n, &mut chosen);
}

Backtracking

A backtracking algorithm begins with an empty solution and extends the solution step by step. The search recursively goes through all different ways how a solution can be constructed.

Queen problem

As an example, consider the problem of calculating the number of ways $n$ queens can be placed on an $n \times n$ chessboard so that no two queens attack each other. For example, when $n=4$, there are two possible solutions:

The problem can be solved using backtracking by placing queens to the board row by row. More precisely, exactly one queen will be placed on each row so that no queen attacks any of the queens placed before. A solution has been found when all $n$ queens have been placed on the board.

For example, when $n=4$, some partial solutions generated by the backtracking algorithm are as follows:

At the bottom level, the three first configurations are illegal, because the queens attack each other. However, the fourth configuration is valid and it can be extended to a complete solution by placing two more queens to the board. There is only one way to place the two remaining queens.

The algorithm can be implemented as follows:

#![allow(unused)]
fn main() {
let n = 4;
let mut count = 0;
let mut column = vec![false;n];
let mut diag1 = vec![false;2*n-1];
let mut diag2 = vec![false;2*n-1];
search(0, n, &mut count, &mut column, &mut diag1, &mut diag2);
println!{"{count}"};
fn search(y: usize, n: usize, count: &mut usize, column: &mut Vec<bool>, diag1: &mut Vec<bool>, diag2: &mut Vec<bool>) {
    if y == n {
        *count += 1;
        return;
    }
    for x in 0..n {
        if column[x] || diag1[x + y] || diag2[x.wrapping_sub(y).wrapping_add(n).wrapping_sub(1)] {
            continue;
        }
        column[x] = true;
        diag1[x + y] = true;
        diag2[x + n - y - 1] = true;
        search(y + 1, n, count, column, diag1, diag2);
        column[x] = false;
        diag1[x + y] = false;
        diag2[x + n - y - 1] = false;
    }
}
}

wrapping_sub is used so that the intermediate subtraction x - y, which can be negative, does not panic with usize arithmetic; the final index x - y + n - 1 is always non-negative.

The search begins by calling search(0, n, &mut count, &mut column, &mut diag1, &mut diag2); The size of the board is $n \times n$, and the code calculates the number of solutions to count.

The code assumes that the rows and columns of the board are numbered from 0 to $n-1$. When the function search is called with parameter $y$, it places a queen on row $y$ and then calls itself with parameter $y+1$. Then, if $y=n$, a solution has been found and the variable count is increased by one.

The Vec column keeps track of columns that contain a queen, and the arrays diag1 and diag2 keep track of diagonals. It is not allowed to add another queen to a column or diagonal that already contains a queen. For example, the columns and diagonals of the $4 \times 4$ board are numbered as follows:

Let $q(n)$ denote the number of ways to place $n$ queens on an $n \times n$ chessboard. The above backtracking algorithm tells us that, for example, $q(8)=92$. When $n$ increases, the search quickly becomes slow, because the number of solutions increases exponentially. For example, calculating $q(16)=14772512$ using the above algorithm already takes about a minute on a modern computer1


1 There is no known way to efficiently calculate larger values of $q(n)$. The current record is $q(27)=234907967154122528$, calculated in 2016 [55].

Pruning the search

We can often optimize backtracking by pruning the search tree. The idea is to add intelligence to the algorithm so that it will notice as soon as possible if a partial solution cannot be extended to a complete solution. Such optimizations can have a tremendous effect on the efficiency of the search.

Let us consider the problem of calculating the number of paths in an $n \times n$ grid from the upper-left corner to the lower-right corner such that the path visits each square exactly once. For example, in a $7 \times 7$ grid, there are 111712 such paths. One of the paths is as follows:

We focus on the $7 \times 7$ case, because its level of difficulty is appropriate to our needs. We begin with a straightforward backtracking algorithm, and then optimize it step by step using observations of how the search can be pruned. After each optimization, we measure the running time of the algorithm and the number of recursive calls, so that we clearly see the effect of each optimization on the efficiency of the search.

Basic algorithm

The first version of the algorithm does not contain any optimizations. We simply use backtracking to generate all possible paths from the upper-left corner to the lower-right corner and count the number of such paths.

  • running time: 483 seconds
  • number of recursive calls: 76 billion

Optimization 1

In any solution, we first move one step down or right. There are always two paths that are symmetric about the diagonal of the grid after the first step. For example, the following paths are symmetric:

Hence, we can decide that we always first move one step down (or right), and finally multiply the number of solutions by two.

  • running time: 244 seconds
  • number of recursive calls: 38 billion

Optimization 2

If the path reaches the lower-right square before it has visited all other squares of the grid, it is clear that it will not be possible to complete the solution. An example of this is the following path:

Using this observation, we can terminate the search immediately if we reach the lower-right square too early.

  • running time: 119 seconds
  • number of recursive calls: 20 billion

Optimization 3

If the path touches a wall and can turn either left or right, the grid splits into two parts that contain unvisited squares. For example, in the following situation, the path can turn either left or right:

In this case, we cannot visit all squares anymore, so we can terminate the search. This optimization is very useful:

  • running time: 1.8 seconds
  • number of recursive calls: 221 million

Optimization 4

The idea of Optimization 3 can be generalized: if the path cannot continue forward but can turn either left or right, the grid splits into two parts that both contain unvisited squares. For example, consider the following path:

It is clear that we cannot visit all squares anymore, so we can terminate the search. After this optimization, the search is very efficient:

  • running time: 0.6 seconds
  • number of recursive calls: 69 million

Now is a good moment to stop optimizing the algorithm and see what we have achieved. The running time of the original algorithm was 483 seconds, and now after the optimizations, the running time is only 0.6 seconds. Thus, the algorithm became nearly 1000 times faster after the optimizations.

This is a usual phenomenon in backtracking, because the search tree is usually large and even simple observations can effectively prune the search. Especially useful are optimizations that occur during the first steps of the algorithm, i.e., at the top of the search tree.

Meet in the middle

Meet in the middle is a technique where the search space is divided into two parts of about equal size. A separate search is performed for both of the parts, and finally the results of the searches are combined.

The technique can be used if there is an efficient way to combine the results of the searches. In such a situation, the two searches may require less time than one large search. Typically, we can turn a factor of $2^n$ into a factor of $2^{n/2}$ using the meet in the middle technique.

As an example, consider a problem where we are given a list of $n$ numbers and a number $x$, and we want to find out if it is possible to choose some numbers from the list so that their sum is $x$. For example, given the list $[2,4,5,9]$ and $x=15$, we can choose the numbers $[2,4,9]$ to get $2+4+9=15$. However, if $x=10$ for the same list, it is not possible to form the sum.

A simple algorithm to the problem is to go through all subsets of the elements and check if the sum of any of the subsets is $x$. The running time of such an algorithm is $O(2^n)$, because there are $2^n$ subsets. However, using the meet in the middle technique, we can achieve a more efficient $O(2^{n/2})$ time algorithm1 Note that $O(2^n)$ and $O(2^{n/2})$ are different complexities because $2^{n/2}$ equals $\sqrt{2^n}$.

The idea is to divide the list into two lists $A$ and $B$ such that both lists contain about half of the numbers. The first search generates all subsets of $A$ and stores their sums to a list $S_A$. Correspondingly, the second search creates a list $S_B$ from $B$. After this, it suffices to check if it is possible to choose one element from $S_A$ and another element from $S_B$ such that their sum is $x$. This is possible exactly when there is a way to form the sum $x$ using the numbers of the original list.

For example, suppose that the list is $[2,4,5,9]$ and $x=15$. First, we divide the list into $A=[2,4]$ and $B=[5,9]$. After this, we create lists $S_A=[0,2,4,6]$ and $S_B=[0,5,9,14]$. In this case, the sum $x=15$ is possible to form, because $S_A$ contains the sum $6$, $S_B$ contains the sum $9$, and $6+9=15$. This corresponds to the solution $[2,4,9]$.

We can implement the algorithm so that its time complexity is $O(2^{n/2})$. First, we generate sorted lists $S_A$ and $S_B$, which can be done in $O(2^{n/2})$ time using a merge-like technique. After this, since the lists are sorted, we can check in $O(2^{n/2})$ time if the sum $x$ can be created from $S_A$ and $S_B$.
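
A sketch of this idea for the example list $[2,4,5,9]$ and $x=15$ (subset sums are generated with the bit representation technique shown earlier and then combined with two pointers):

#![allow(unused)]
fn main() {
// subset sums of one half, generated with the bit representation technique
fn subset_sums(part: &[i64]) -> Vec<i64> {
    let mut sums = Vec::new();
    for b in 0..(1u32 << part.len()) {
        let mut s = 0;
        for i in 0..part.len() {
            if b & (1 << i) != 0 {
                s += part[i];
            }
        }
        sums.push(s);
    }
    sums.sort();
    sums
}

let numbers: Vec<i64> = vec![2, 4, 5, 9];
let x = 15;
let (a, b) = numbers.split_at(numbers.len() / 2);
let sa = subset_sums(a); // [0, 2, 4, 6]
let sb = subset_sums(b); // [0, 5, 9, 14]
// two pointers: smallest sums of sa against largest sums of sb
let mut found = false;
let (mut i, mut j) = (0, sb.len());
while i < sa.len() && j > 0 {
    let sum = sa[i] + sb[j - 1];
    if sum == x {
        found = true;
        break;
    } else if sum > x {
        j -= 1;
    } else {
        i += 1;
    }
}
println!("{found}"); // true
}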


1 This idea was introduced in 1974 by E. Horowitz and S. Sahni [39]

Greedy algorithms

A greedy algorithm constructs a solution to the problem by always making a choice that looks the best at the moment. A greedy algorithm never takes back its choices, but directly constructs the final solution. For this reason, greedy algorithms are usually very efficient.

The difficulty in designing greedy algorithms is to find a greedy strategy that always produces an optimal solution to the problem. The locally optimal choices in a greedy algorithm should also be globally optimal. It is often difficult to argue that a greedy algorithm works.

Coin problem

As a first example, we consider a problem where we are given a set of coins and our task is to form a sum of money $n$ using the coins. The values of the coins are $\texttt{coins}={c_1,c_2,\ldots,c_k}$, and each coin can be used as many times we want. What is the minimum number of coins needed?

For example, if the coins are the euro coins (in cents) $$ {1,2,5,10,20,50,100,200} $$ and $n=520$, we need at least four coins. The optimal solution is to select coins $200+200+100+20$ whose sum is 520.

Greedy algorithm

A simple greedy algorithm to the problem always selects the largest possible coin, until the required sum of money has been constructed. This algorithm works in the example case, because we first select two 200 cent coins, then one 100 cent coin and finally one 20 cent coin. But does this algorithm always work?
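
Concretely, the strategy can be sketched as follows for the euro coins and $n=520$ (coin values listed largest first):

#![allow(unused)]
fn main() {
let coins = [200, 100, 50, 20, 10, 5, 2, 1]; // largest first
let mut n = 520;
let mut used = Vec::new();
// always take the largest coin that still fits into the remaining sum
for &c in coins.iter() {
    while n >= c {
        used.push(c);
        n -= c;
    }
}
println!("{used:?}");             // [200, 200, 100, 20]
println!("{} coins", used.len()); // 4
}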

It turns out that if the coins are the euro coins, the greedy algorithm always works, i.e., it always produces a solution with the fewest possible number of coins. The correctness of the algorithm can be shown as follows:

First, each coin 1, 5, 10, 50 and 100 appears at most once in an optimal solution, because if the solution would contain two such coins, we could replace them by one coin and obtain a better solution. For example, if the solution would contain coins $5+5$, we could replace them by coin $10$.

In the same way, coins 2 and 20 appear at most twice in an optimal solution, because we could replace coins $2+2+2$ by coins $5+1$ and coins $20+20+20$ by coins $50+10$. Moreover, an optimal solution cannot contain coins $2+2+1$ or $20+20+10$, because we could replace them by coins $5$ and $50$.

Using these observations, we can show for each coin $x$ that it is not possible to optimally construct a sum $x$ or any larger sum by only using coins that are smaller than $x$. For example, if $x=100$, the largest optimal sum using the smaller coins is $50+20+20+5+2+2=99$. Thus, the greedy algorithm that always selects the largest coin produces the optimal solution.

This example shows that it can be difficult to argue that a greedy algorithm works, even if the algorithm itself is simple.

General case

In the general case, the coin set can contain any coins and the greedy algorithm does not necessarily produce an optimal solution.

We can prove that a greedy algorithm does not work by showing a counterexample where the algorithm gives a wrong answer. In this problem we can easily find a counterexample: if the coins are ${1,3,4}$ and the target sum is 6, the greedy algorithm produces the solution $4+1+1$ while the optimal solution is $3+3$.

It is not known if the general coin problem can be solved using any greedy algorithm1 However, as we will see in Chapter 7, in some cases, the general problem can be efficiently solved using a dynamic programming algorithm that always gives the correct answer.


1 However, it is possible to \emph{check} in polynomial time if the greedy algorithm presented in this chapter works for a given set of coins [53]

Scheduling

Many scheduling problems can be solved using greedy algorithms. A classic problem is as follows: Given $n$ events with their starting and ending times, find a schedule that includes as many events as possible. It is not possible to select an event partially. For example, consider the following events:

| event | starting time | ending time |
|---|---|---|
| A | 1 | 3 |
| B | 2 | 5 |
| C | 3 | 9 |
| D | 6 | 8 |

In this case the maximum number of events is two. For example, we can select events $B$ and $D$ as follows:

It is possible to invent several greedy algorithms for the problem, but which of them works in every case?

Algorithm 1

The first idea is to select as short events as possible. In the example case this algorithm selects the following events:

However, selecting short events is not always a correct strategy. For example, the algorithm fails in the following case:

If we select the short event, we can only select one event. However, it would be possible to select both long events.

Algorithm 2

Another idea is to always select the next possible event that begins as early as possible. This algorithm selects the following events:

However, we can find a counterexample also for this algorithm. For example, in the following case, the algorithm only selects one event:

If we select the first event, it is not possible to select any other events. However, it would be possible to select the other two events.

Algorithm 3

The third idea is to always select the next possible event that ends as early as possible. This algorithm selects the following events:

It turns out that this algorithm always produces an optimal solution. The reason for this is that it is always an optimal choice to first select an event that ends as early as possible. After this, it is an optimal choice to select the next event using the same strategy, etc., until we cannot select any more events. One way to argue that the algorithm works is to consider what happens if we first select an event that ends later than the event that ends as early as possible. Now, we will have at most an equal number of choices how we can select the next event. Hence, selecting an event that ends later can never yield a better solution, and the greedy algorithm is correct.
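
A minimal sketch of this strategy with the example events (event names and times as in the table above):

#![allow(unused)]
fn main() {
let mut events = vec![("A", 1, 3), ("B", 2, 5), ("C", 3, 9), ("D", 6, 8)];
// sort by ending time and greedily take every event that starts
// after the previously selected event has ended
events.sort_by_key(|&(_, _, end)| end);
let mut selected = Vec::new();
let mut last_end = 0;
for &(name, start, end) in &events {
    if start >= last_end {
        selected.push(name);
        last_end = end;
    }
}
println!("{selected:?}"); // ["A", "D"], i.e. two events
}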

Tasks and deadlines

Let us now consider a problem where we are given $n$ tasks with durations and deadlines and our task is to choose an order to perform the tasks. For each task, we earn $d-x$ points where $d$ is the task's deadline and $x$ is the moment when we finish the task. What is the largest possible total score we can obtain?

For example, suppose that the tasks are as follows:

| task | duration | deadline |
|---|---|---|
| A | 4 | 2 |
| B | 3 | 5 |
| C | 2 | 7 |
| D | 4 | 5 |

In this case, an optimal schedule for the tasks is as follows:

In this solution, $C$ yields 5 points, $B$ yields 0 points, $A$ yields $-7$ points and $D$ yields $-8$ points, so the total score is $-10$.

Surprisingly, the optimal solution to the problem does not depend on the deadlines at all, but a correct greedy strategy is to simply perform the tasks sorted by their durations in increasing order. The reason for this is that if we ever perform two tasks one after another such that the first task takes longer than the second task, we can obtain a better solution if we swap the tasks. For example, consider the following schedule:

Here $a>b$, so we should swap the tasks:

Now $X$ gives $b$ points less and $Y$ gives $a$ points more, so the total score increases by $a-b > 0$. In an optimal solution, for any two consecutive tasks, it must hold that the shorter task comes before the longer task. Thus, the tasks must be performed sorted by their durations.
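
A small sketch that sorts the example tasks by duration and computes the resulting score:

#![allow(unused)]
fn main() {
// (name, duration, deadline): perform the tasks in increasing order of duration
let mut tasks = vec![("A", 4, 2), ("B", 3, 5), ("C", 2, 7), ("D", 4, 5)];
tasks.sort_by_key(|&(_, duration, _)| duration);
let mut time = 0;
let mut score = 0;
for &(name, duration, deadline) in &tasks {
    time += duration;
    score += deadline - time;
    println!("{name} finishes at {time} and yields {} points", deadline - time);
}
println!("total score: {score}"); // -10
}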

Minimizing sums

We next consider a problem where we are given $n$ numbers $a_1,a_2,\ldots,a_n$ and our task is to find a value $x$ that minimizes the sum

$$ |a_1-x|^c+|a_2-x|^c+\cdots+|a_n-x|^c $$

We focus on the cases $c=1$ and $c=2$.

Case $c=1$

In this case, we should minimize the sum

$$ |a_1-x|+|a_2-x|+\cdots+|a_n-x| $$

For example, if the numbers are $[1,2,9,2,6]$, the best solution is to select $x=2$ which produces the sum

$$ |1-2|+|2-2|+|9-2|+|2-2|+|6-2|=12. $$

In the general case, the best choice for $x$ is the median of the numbers, i.e., the middle number after sorting. For example, the list $[1,2,9,2,6]$ becomes $[1,2,2,6,9]$ after sorting, so the median is 2.

The median is an optimal choice, because if $x$ is smaller than the median, the sum becomes smaller by increasing $x$, and if $x$ is larger than the median, the sum becomes smaller by decreasing $x$. Hence, the optimal solution is that $x$ is the median. If $n$ is even and there are two medians, both medians and all values between them are optimal choices.

Case $c=2$

In this case, we should minimize the sum $$ (a_1-x)^2+(a_2-x)^2+\cdots+(a_n-x)^2 $$

For example, if the numbers are $[1,2,9,2,6]$, the best solution is to select $x=4$ which produces the sum

$$ (1-4)^2+(2-4)^2+(9-4)^2+(2-4)^2+(6-4)^2=46. $$

In the general case, the best choice for $x$ is the average of the numbers. In the example the average is $\frac{1+2+9+2+6}{5}=4$. This result can be derived by presenting the sum as follows:

$$ nx^2 - 2x(a_1+a_2+\cdots+a_n) + (a_1^2+a_2^2+\cdots+a_n^2) $$

The last part does not depend on $x$, so we can ignore it. The remaining parts form a function $nx^2-2xs$ where $s=a_1+a_2+\cdots+a_n$. This is a parabola opening upwards with roots $x=0$ and $x=2\frac{s}{n}$, and the minimum value is the average of the roots $x=\frac{s}{n}$, i.e., the average of the numbers $a_1,a_2,\ldots,a_n$.
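
A quick check of both cases with the example numbers $[1,2,9,2,6]$, evaluating the sums at the median ($x=2$) and at the average ($x=4$):

#![allow(unused)]
fn main() {
let a = vec![1, 2, 9, 2, 6];
// c = 1: sum of absolute differences, minimized by the median
let sum_c1 = |x: i32| a.iter().map(|&v| (v - x).abs()).sum::<i32>();
// c = 2: sum of squared differences, minimized by the average
let sum_c2 = |x: i32| a.iter().map(|&v| (v - x) * (v - x)).sum::<i32>();
println!("c=1, x=2: {}", sum_c1(2)); // 12
println!("c=2, x=4: {}", sum_c2(4)); // 46
}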

Data compression

A binary code assigns for each character of a string a codeword that consists of bits. We can compress the string using the binary code by replacing each character by the corresponding codeword. For example, the following binary code assigns codewords for characters $A-D$:

| character | codeword |
|---|---|
| A | 00 |
| B | 01 |
| C | 10 |
| D | 11 |

This is a constant-length code which means that the length of each codeword is the same. For example, we can compress the string AABACDACA as follows:

$$ 000001001011001000 $$

Using this code, the length of the compressed string is 18 bits. However, we can compress the string better if we use a variable-length code where codewords may have different lengths. Then we can give short codewords for characters that appear often and long codewords for characters that appear rarely. It turns out that an optimal code for the above string is as follows:

| character | codeword |
|---|---|
| A | 0 |
| B | 110 |
| C | 10 |
| D | 111 |

An optimal code produces a compressed string that is as short as possible. In this case, the compressed string using the optimal code is

$$ 001100101110100 $$

so only 15 bits are needed instead of 18 bits. Thus, thanks to a better code it was possible to save 3 bits in the compressed string.

We require that no codeword is a prefix of another codeword. For example, it is not allowed that a code would contain both codewords 10 and 1011. The reason for this is that we want to be able to generate the original string from the compressed string. If a codeword could be a prefix of another codeword, this would not always be possible.

For example, the following code is not valid:

| character | codeword |
|---|---|
| A | 10 |
| B | 11 |
| C | 1011 |
| D | 111 |

Using this code, it would not be possible to know if the compressed string 1011 corresponds to the string AB or the string C.

Huffman coding

Huffman coding is a greedy algorithm that constructs an optimal code for compressing a given string. The algorithm builds a binary tree based on the frequencies of the characters in the string, and each character's codeword can be read by following a path from the root to the corresponding node. A move to the left corresponds to bit 0, and a move to the right corresponds to bit 1.

Initially, each character of the string is represented by a node whose weight is the number of times the character occurs in the string. Then at each step two nodes with minimum weights are combined by creating a new node whose weight is the sum of the weights of the original nodes. The process continues until all nodes have been combined.

Next we will see how Huffman coding creates the optimal code for the string AABACDACA. Initially, there are four nodes that correspond to the characters of the string:

The node that represents character A has weight 5 because character A appears 5 times in the string. The other weights have been calculated in the same way. The first step is to combine the nodes that correspond to characters B and D, both with weight 1. The result is:

After this, the nodes with weight 2 are combined:

Finally, the two remaining nodes are combined:

Now all nodes are in the tree, so the code is ready. The following codewords can be read from the tree:

| character | codeword |
|---|---|
| A | 0 |
| B | 110 |
| C | 10 |
| D | 111 |
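
A sketch of the whole procedure in Rust, using a BinaryHeap with Reverse so that the two nodes with the smallest weights are always combined first. The node representation below is just one possible choice, and because of tie-breaking the exact codewords may differ from the table above, but the codeword lengths, and thus the 15-bit compressed length, are the same:

#![allow(unused)]
fn main() {
use std::cmp::Reverse;
use std::collections::{BTreeMap, BinaryHeap};

// a tree node: either a leaf with a character or an internal node with two children
struct Node {
    character: Option<char>,
    children: Option<(usize, usize)>, // indices of the left and right child
}

// read the codewords from the tree: left edge = 0, right edge = 1
fn collect_codes(nodes: &[Node], i: usize, code: String, out: &mut BTreeMap<char, String>) {
    match nodes[i].children {
        Some((l, r)) => {
            collect_codes(nodes, l, format!("{code}0"), out);
            collect_codes(nodes, r, format!("{code}1"), out);
        }
        None => {
            out.insert(nodes[i].character.unwrap(), code);
        }
    }
}

let text = "AABACDACA";
let mut freq = BTreeMap::new();
for c in text.chars() {
    *freq.entry(c).or_insert(0u64) += 1;
}

let mut nodes: Vec<Node> = Vec::new();
let mut heap = BinaryHeap::new();
for (&c, &w) in &freq {
    nodes.push(Node { character: Some(c), children: None });
    heap.push(Reverse((w, nodes.len() - 1)));
}

// combine the two nodes with minimum weights until only one node remains
while heap.len() > 1 {
    let Reverse((w1, i1)) = heap.pop().unwrap();
    let Reverse((w2, i2)) = heap.pop().unwrap();
    nodes.push(Node { character: None, children: Some((i1, i2)) });
    heap.push(Reverse((w1 + w2, nodes.len() - 1)));
}

let Reverse((_, root)) = heap.pop().unwrap();
let mut codes = BTreeMap::new();
collect_codes(&nodes, root, String::new(), &mut codes);
println!("{codes:?}"); // {'A': "1", 'B': "010", 'C': "00", 'D': "011"}
}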

1 D. A. Huffman discovered this method when solving a university course assignment and published the algorithm in 1952 [40].

Dynamic programming

Dynamic programming is a technique that combines the correctness of complete search and the efficiency of greedy algorithms. Dynamic programming can be applied if the problem can be divided into overlapping subproblems that can be solved independently.

There are two uses for dynamic programming:

  • Finding an optimal solution: We want to find a solution that is as large as possible or as small as possible.
  • Counting the number of solutions: We want to calculate the total number of possible solutions.

We will first see how dynamic programming can be used to find an optimal solution, and then we will use the same idea for counting the solutions.

Understanding dynamic programming is a milestone in every competitive programmer's career. While the basic idea is simple, the challenge is how to apply dynamic programming to different problems. This chapter introduces a set of classic problems that are a good starting point.

Coin problem

We first focus on a problem that we have already seen in Chapter 6: Given a set of coin values $\texttt{coins} = {c_1,c_2,\ldots,c_k}$ and a target sum of money $n$, our task is to form the sum $n$ using as few coins as possible.

In Chapter 6, we solved the problem using a greedy algorithm that always chooses the largest possible coin. The greedy algorithm works, for example, when the coins are the euro coins, but in the general case the greedy algorithm does not necessarily produce an optimal solution.

Now is time to solve the problem efficiently using dynamic programming, so that the algorithm works for any coin set. The dynamic programming algorithm is based on a recursive function that goes through all possibilities how to form the sum, like a brute force algorithm. However, the dynamic programming algorithm is efficient because it uses memoization and calculates the answer to each subproblem only once.

Recursive formulation

The idea in dynamic programming is to formulate the problem recursively so that the solution to the problem can be calculated from solutions to smaller subproblems. In the coin problem, a natural recursive problem is as follows: what is the smallest number of coins required to form a sum $x$?

Let $\texttt{solve}(x)$ denote the minimum number of coins required for a sum $x$. The values of the function depend on the values of the coins. For example, if $\texttt{coins} = {1,3,4}$, the first values of the function are as follows:

$$ \begin{array}{l} \texttt{solve}(0) = 0 \\ \texttt{solve}(1) = 1 \\ \texttt{solve}(2) = 2 \\ \texttt{solve}(3) = 1 \\ \texttt{solve}(4) = 1 \\ \texttt{solve}(5) = 2 \\ \texttt{solve}(6) = 2 \\ \texttt{solve}(7) = 2 \\ \texttt{solve}(8) = 2 \\ \texttt{solve}(9) = 3 \\ \texttt{solve}(10) = 3 \end{array} $$

For example, $\texttt{solve}(10)=3$, because at least 3 coins are needed to form the sum 10. The optimal solution is $3+3+4=10$.

The essential property of $\texttt{solve}$ is that its values can be recursively calculated from its smaller values. The idea is to focus on the first coin that we choose for the sum. For example, in the above scenario, the first coin can be either 1, 3 or 4. If we first choose coin 1, the remaining task is to form the sum 9 using the minimum number of coins, which is a subproblem of the original problem. Of course, the same applies to coins 3 and 4. Thus, we can use the following recursive formula to calculate the minimum number of coins:

$\texttt{solve}(x) = \min( \texttt{solve}(x-1)+1, \texttt{solve}(x-3)+1, \texttt{solve}(x-4)+1)$

The base case of the recursion is $\texttt{solve}(0)=0$, because no coins are needed to form an empty sum. For example,

$\texttt{solve}(10) = \texttt{solve}(7)+1 = \texttt{solve}(4)+2 = \texttt{solve}(0)+3 = 3$

Now we are ready to give a general recursive function that calculates the minimum number of coins needed to form a sum $x$:

\[ \texttt{solve}(x) = \begin{cases} \infty & x < 0\\ 0 & x = 0\\ \min_{ c \in \texttt{coins}} \texttt{solve}(x-c)+1 & x > 0 \\ \end{cases} \]

First, if $x<0$, the value is $\infty$, because it is impossible to form a negative sum of money. Then, if $x=0$, the value is $0$, because no coins are needed to form an empty sum. Finally, if $x>0$, the variable $c$ goes through all possibilities how to choose the first coin of the sum.

Once a recursive function that solves the problem has been found, we can directly implement a solution in Rust (the constant INF denotes infinity):

#![allow(unused)]
fn main() {
use std::cmp::{self, Ordering};
use std::isize::MAX;
const INF:isize = 999999;
let coins = [1,3,4];
let x = 10;
println!("coins: {coins:?}");
println!("x: {x}");
println!("solution: {}", solve(x, &coins));

fn solve(x: isize, coins: &[isize]) -> isize{
    match x.cmp(&0) {
        Ordering::Equal => 0,
        Ordering::Less => INF,
        _ => {
            let mut best = INF;
            for c in coins {
                best = cmp::min(best, solve(x-c, coins)+1);
            }
            best
        }
    }
}
}

Still, this function is not efficient, because there may be an exponential number of ways to construct the sum. However, next we will see how to make the function efficient using a technique called memoization.

Using Memoization

The idea of dynamic programming is to use memoization to efficiently calculate values of a recursive function. This means that the values of the function are stored in a HashMap after calculating them. For each parameter, the value of the function is calculated recursively only once, and after this, the value can be directly retrieved from the HashMap.

In this problem, we use a HashMap

#![allow(unused)]
fn main() {
use std::collections::HashMap;
let ready: HashMap<isize, isize> = HashMap::new();
println!("ready: {ready:?}");
}

where $\texttt{ready}[x]$ indicates whether the value of $\texttt{solve}(x)$ has already been calculated, and if it has, $\texttt{ready}[x]$ contains this value.

Now the function can be efficiently implemented as follows:

#![allow(unused)]
fn main() {
use std::cmp::{self, Ordering};
use std::collections::HashMap;
const INF:isize = 999999;
let coins = [1,3,4];
let x = 10;
let mut ready = HashMap::new();
println!("coins: {coins:?}");
println!("x: {x}");
println!("solution: {}", solve(x, &mut ready, &coins));

fn solve(x: isize, ready: &mut HashMap<isize, isize>, coins: &[isize]) -> isize{
    match x.cmp(&0){
        Ordering::Less => INF,
        Ordering::Equal => 0,
        _ => {
            match ready.get(&x) {
                Some(&x) => x,
                _ => {
                    let mut best = INF;
                    for c in coins {
                        best = cmp::min(best, solve(x-c, ready, coins)+1);
                    }
                    ready.insert(x, best);
                    best
                }
            }
        }
    }
}
}

The function handles the base cases $x<0$ and $x=0$ as previously. Then the function checks if $\texttt{solve}(x)$ has already been stored in $\texttt{ready}[x]$, and if so, the function directly returns it. Otherwise the function calculates the value of $\texttt{solve}(x)$ recursively and stores it in $\texttt{ready}[x]$.

This function works efficiently, because the answer for each parameter $x$ is calculated recursively only once. After a value of $\texttt{solve}(x)$ has been stored in $\texttt{ready}[x]$, it can be efficiently retrieved whenever the function is called again with the parameter $x$. The time complexity of the algorithm is $O(nk)$, where $n$ is the target sum and $k$ is the number of coins.

Note that we can also construct the values in the HashMap iteratively, using a loop that simply calculates all the values of $\texttt{solve}$ for parameters $0 \ldots n$:

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::cmp;
const INF: isize = 999999;
let coins = [1, 3, 4];
let x = 10;
println!("coins: {coins:?}");
println!("x: {x}");
let mut ready = HashMap::new();
ready.insert(0, 0);
for i in 1..=x {
    // best = minimum number of coins needed for the sum i
    let mut best = INF;
    for c in coins {
        if i >= c {
            if let Some(&value) = ready.get(&(i - c)) {
                best = cmp::min(best, value + 1);
            }
        }
    }
    ready.insert(i, best);
}
println!("solution: {ready:#?}");
}

In fact, most competitive programmers prefer this implementation, because it is shorter and has lower constant factors. From now on, we also use iterative implementations in our examples. Still, it is often easier to think about dynamic programming solutions in terms of recursive functions.

Constructing a Solution

Sometimes we are asked both to find the value of an optimal solution and to give an example of how such a solution can be constructed. In the coin problem, for example, we can declare another collection that indicates for each sum of money the first coin in an optimal solution:

let mut first = HashMap::new();

Then, we can modify the algorithm as follows:

#![allow(unused)]
fn main() {
use std::collections::HashMap;
const INF: isize = 999999;
let coins = [1, 3, 4];
let x = 10;
println!("coins: {coins:?}");
println!("x: {x}");
let mut ready = HashMap::new();
let mut first = HashMap::new();
ready.insert(0, 0);
for i in 1..=x {
    let mut best = INF;
    for c in coins {
        if i >= c {
            if let Some(&value) = ready.get(&(i - c)) {
                // record the coin whenever it improves the solution for the sum i
                if value + 1 < best {
                    best = value + 1;
                    first.insert(i, c);
                }
            }
        }
    }
    ready.insert(i, best);
}
println!("Solution: {ready:#?}");
println!("First: {first:#?}");
}
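
After this, the coins of an optimal solution can be printed by repeatedly following the stored first coin. The following short fragment is a sketch that continues the code above:

let mut n = x;
while n > 0 {
    let c = first[&n];
    print!("{c} ");
    n -= c;
}
println!();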

Counting the number of solutions

Let us now consider another version of the coin problem where our task is to calculate the total number of ways to produce a sum $x$ using the coins. For example, if $\texttt{coins}=\{1,3,4\}$ and $x=5$, there are a total of 6 ways:

  • $1+1+1+1+1$
  • $1+1+3$
  • $1+3+1$
  • $3+1+1$
  • $1+4$
  • $4+1$

Again, we can solve the problem recursively. Let $\texttt{solve}(x)$ denote the number of ways we can form the sum $x$. For example, if $\texttt{coins}=\{1,3,4\}$, then $\texttt{solve}(5)=6$ and the recursive formula is:

$$ \begin{equation*} \begin{split} \texttt{solve}(x) = & \texttt{solve}(x-1) + \\ & \texttt{solve}(x-3) + \\ & \texttt{solve}(x-4) . \end{split} \end{equation*} $$

Then, the general recursive function is as follows:

$$ \begin{equation*} \texttt{solve}(x) = \begin{cases} 0 & x < 0\\ 1 & x = 0\\ \sum_{c \in \texttt{coins}} \texttt{solve}(x-c) & x > 0 \\ \end{cases} \end{equation*} $$

If $x<0$, the value is 0, because there are no solutions. If $x=0$, the value is 1, because there is only one way to form an empty sum. Otherwise we calculate the sum of all values of the form $\texttt{solve}(x-c)$ where $c \in \texttt{coins}$.

The following code constructs a HashMap $\texttt{count}$ such that $\texttt{count}[x]$ equals the value of $\texttt{solve}(x)$ for $0 \le x \le n$:

#![allow(unused)]
fn main() {
use std::collections::HashMap;
let coins = [1, 3, 4];
let x = 10;
println!("coins: {coins:?}");
println!("x: {x}");
let mut count = HashMap::new();
// there is exactly one way to form the empty sum
count.insert(0, 1);
for i in 1..=x {
    let mut total = 0;
    for c in coins {
        if i >= c {
            if let Some(&ways) = count.get(&(i - c)) {
                total += ways;
            }
        }
    }
    count.insert(i, total);
}
println!("Count: {count:#?}");
}

Longest increasing subsequence

Our first problem is to find the longest increasing subsequence in an array of $n$ elements. This is a maximum-length sequence of array elements that goes from left to right, and each element in the sequence is larger than the previous element. For example, in the array $[6,2,5,1,7,4,8,3]$ the longest increasing subsequence is $[2,5,7,8]$, which contains 4 elements.

Let $\texttt{length}(k)$ denote the length of the longest increasing subsequence that ends at position $k$. Thus, if we calculate all values of $\texttt{length}(k)$ where $0 \le k \le n-1$, we will find out the length of the longest increasing subsequence. For example, the values of the function for the above array are as follows:

\[ \begin{array}{lcl} \texttt{length}(0) & = & 1 \\ \texttt{length}(1) & = & 1 \\ \texttt{length}(2) & = & 2 \\ \texttt{length}(3) & = & 1 \\ \texttt{length}(4) & = & 3 \\ \texttt{length}(5) & = & 2 \\ \texttt{length}(6) & = & 4 \\ \texttt{length}(7) & = & 2 \\ \end{array} \]

For example, $\texttt{length}(6)=4$, because the longest increasing subsequence that ends at position 6 consists of 4 elements.

To calculate a value of $\texttt{length}(k)$, we should find a position $i<k$ for which $\texttt{array}[i]<\texttt{array}[k]$ and $\texttt{length}(i)$ is as large as possible. Then we know that $\texttt{length}(k)=\texttt{length}(i)+1$, because this is an optimal way to add $\texttt{array}[k]$ to a subsequence. However, if there is no such position $i$, then $\texttt{length}(k)=1$, which means that the subsequence only contains $\texttt{array}[k]$.

Since all values of the function can be calculated from its smaller values, we can use dynamic programming. In the following code, the values of the function will be stored in an array $\texttt{length}$.

#![allow(unused)]
fn main() {
use std::cmp;
let array = vec![6,2,5,1,7,4,8,3];
let mut length = Vec::new();
for k in 0..array.len(){
    length.push(1);
    for i in 0..k{
        if array[i] < array[k] {
            length[k] = cmp::max(length[k], length[i]+1);
        }
    }
}
println!("{length:?}")
}

This code works in $O(n^2)$ time, because it consists of two nested loops. The same calculation can also be written as a recursive function with memoization, as in the following code. It is also possible to solve the problem more efficiently, in $O(n \log n)$ time, using binary search; a sketch of that approach is given after the code.

#![allow(unused)]
fn main() {
use std::cmp;
let array = vec![6,2,5,1,7,4,8,3];
let mut length = Vec::new();
fn find_len(k: usize, array: &Vec<usize>, length: &mut Vec<usize>) -> usize{
    match length.get(k) {
        Some(&len) => len,
        _ => {
            length.push(1);
            for i in 0..k{
                if array[i] < array[k] {
                    let max_len = cmp::max(length[k], find_len(i, array, length)+1);
                    length[k] = max_len; 
                }
            }
            length[k]
        }
    }
}

let mut max_len = 0;
for i in 0..array.len(){
    max_len = cmp::max(max_len, find_len(i, &array, &mut length));
}
println!("{length:?}")
}
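
Here is a minimal sketch of the $O(n \log n)$ approach mentioned above. The helper vector tail is introduced here for illustration: tail[i] is the smallest possible last element of an increasing subsequence of length i+1, and each array element is placed into it with binary search.

#![allow(unused)]
fn main() {
let array = vec![6, 2, 5, 1, 7, 4, 8, 3];
// tail[i] = smallest possible last element of an increasing subsequence of length i+1
let mut tail: Vec<usize> = Vec::new();
for &value in &array {
    // find the first element in tail that is >= value
    match tail.binary_search(&value) {
        Ok(pos) | Err(pos) => {
            if pos == tail.len() {
                tail.push(value);  // value extends the longest subsequence found so far
            } else {
                tail[pos] = value; // value gives a smaller ending for length pos+1
            }
        }
    }
}
println!("{}", tail.len()); // length of the longest increasing subsequence: 4
}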

Paths in a grid

Our next problem is to find a path from the upper-left corner to the lower-right corner of an $n \times n$ grid, such that we only move down and right. Each square contains a positive integer, and the path should be constructed so that the sum of the values along the path is as large as possible.

The following picture shows an optimal path in a grid:

The sum of the values on the path is 67, and this is the largest possible sum on a path from the upper-left corner to the lower-right corner.

Assume that the rows and columns of the grid are numbered from 1 to $n$, and $\texttt{value}[y][x]$ equals the value of square $(y,x)$. Let $\texttt{sum}(y,x)$ denote the maximum sum on a path from the upper-left corner to square $(y,x)$. Now $\texttt{sum}(n,n)$ tells us the maximum sum from the upper-left corner to the lower-right corner. For example, in the above grid, $\texttt{sum}(5,5)=67$.

We can recursively calculate the sums as follows: \[ \texttt{sum}(y,x) = \max(\texttt{sum}(y,x-1),\texttt{sum}(y-1,x))+\texttt{value}[y][x] \]

The recursive formula is based on the observation that a path that ends at square $(y,x)$ can come either from square $(y,x-1)$ or square $(y-1,x)$:

Thus, we select the direction that maximizes the sum. We assume that $\texttt{sum}(y,x)=0$ if $y=0$ or $x=0$ (because no such paths exist), so the recursive formula also works when $y=1$ or $x=1$.

Since the function \texttt{sum} has two parameters, the dynamic programming array also has two dimensions. For example, we can use an array

#![allow(unused)]
fn main() {
let n = 5;
let mut sum: Vec<Vec<usize>> = vec![vec![0; n + 1]; n + 1];
}

and calculate the sums as follows:

#![allow(unused)]
fn main() {
use std::cmp;
let value = vec![ vec![3, 7, 9, 2, 7],
                    vec![9, 8, 3, 5, 5],
                    vec![1, 7, 9, 8, 5],
                    vec![3, 8, 6, 4, 10],
                    vec![6, 3, 9, 7, 8],];
let n = value.len();
// an extra zero row and column so that the formula also works when y = 1 or x = 1
let mut sum = vec![vec![0; n + 1]; n + 1];
for y in 1..=n {
    for x in 1..=n {
        sum[y][x] = cmp::max(sum[y][x-1], sum[y-1][x]) + value[y-1][x-1];
        print!("{} ", sum[y][x]);
    }
    println!();
}
}

Knapsack problems

The term knapsack refers to problems where a set of objects is given, and subsets with some properties have to be found. Knapsack problems can often be solved using dynamic programming.

In this section, we focus on the following problem: Given a list of weights \([w_1,w_2,\ldots,w_n]\), determine all sums that can be constructed using the weights. For example, if the weights are \([1,3,3,5]\), the following sums are possible:

| $x$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|-----|---|---|---|---|---|---|---|---|---|---|----|----|----|
|     | X | X |   | X | X | X | X | X | X | X |    | X  | X  |

In this case, all sums between $0 \ldots 12$ are possible, except 2 and 10. For example, the sum 7 is possible because we can select the weights $[1,3,3]$.

To solve the problem, we focus on subproblems where we only use the first $k$ weights to construct sums.

Let $\texttt{possible}(x,k) = \textrm{true}$ if we can construct a sum $x$ using the first $k$ weights, and otherwise $\texttt{possible}(x,k) = \textrm{false}$. The values of the function can be recursively calculated as follows:

\[ \texttt{possible}(x,k) = \texttt{possible}(x-w_k,k-1) \lor \texttt{possible}(x,k-1) \]

The formula is based on the fact that we can either use or not use the weight $w_k$ in the sum. If we use $w_k$, the remaining task is to form the sum $x-w_k$ using the first $k-1$ weights, and if we do not use $w_k$, the remaining task is to form the sum $x$ using the first $k-1$ weights. As the base cases, \[ \begin{equation*} \texttt{possible}(x,0) = \begin{cases} \textrm{true} & x = 0\\ \textrm{false} & x \neq 0 \\ \end{cases} \end{equation*} \] because if no weights are used, we can only form the sum 0.

The following table shows all values of the function for the weights $[1,3,3,5]$ (the symbol "X" indicates the true values):

| $k \backslash x$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|---|---|---|---|---|---|---|---|---|----|----|----|
| 0                | X |   |   |   |   |   |   |   |   |   |    |    |    |
| 1                | X | X |   |   |   |   |   |   |   |   |    |    |    |
| 2                | X | X |   | X | X |   |   |   |   |   |    |    |    |
| 3                | X | X |   | X | X |   | X | X |   |   |    |    |    |
| 4                | X | X |   | X | X | X | X | X | X | X |    | X  | X  |

After calculating those values, $\texttt{possible}(x,n)$ tells us whether we can construct a sum $x$ using all weights.

Let $W$ denote the total sum of the weights. The following $O(nW)$ time dynamic programming solution corresponds to the recursive function:

#![allow(unused)]
fn main() {
let w = vec![1, 3, 3, 5];
let W: usize = w.iter().sum();
let n = w.len();
// possible[x][k] = can the sum x be constructed using the first k weights?
let mut possible: Vec<Vec<bool>> = vec![vec![false; n + 1]; W + 1];
possible[0][0] = true;
for k in 1..=n {
    for x in 0..=W {
        if x >= w[k-1] {
            possible[x][k] |= possible[x - w[k-1]][k-1];
        }
        possible[x][k] |= possible[x][k-1];
    }
}
println!("{:#?}", possible);
}

However, here is a better implementation that only uses a one-dimensional array $\texttt{possible}[x]$ that indicates whether we can construct a subset with sum $x$. The trick is to update the array from right to left for each new weight:

#![allow(unused)]
fn main() {
let w = vec![1, 3, 3, 5];
let W: usize = w.iter().sum();
let n = w.len();
let mut possible: Vec<bool> = vec![false; W + 1];
possible[0] = true;
for k in 0..n {
    // go through the sums from right to left so that each weight is used at most once
    for x in (0..=W - w[k]).rev() {
        if possible[x] { possible[x + w[k]] = true; }
    }
}
println!("{:#?}", possible);
}

Note that the general idea presented here can be used in many knapsack problems. For example, if we are given objects with weights and values, we can determine for each weight sum the maximum value sum of a subset.

Edit distance

The edit distance or Levenshtein distance 1 is the minimum number of editing operations needed to transform a string into another string. The allowed editing operations are as follows:

  • insert a character (e.g. ABC $\rightarrow$ ABCA)
  • remove a character (e.g. ABC $\rightarrow$ AC)
  • modify a character (e.g. ABC $\rightarrow$ ADC)

For example, the edit distance between LOVE and MOVIE is 2, because we can first perform the operation LOVE $\rightarrow$ MOVE (modify) and then the operation MOVE $\rightarrow$ MOVIE (insert). This is the smallest possible number of operations, because it is clear that only one operation is not enough.

Suppose that we are given a string $\texttt{x}$ of length $n$ and a string $\texttt{y}$ of length $m$, and we want to calculate the edit distance between $\texttt{x}$ and $\texttt{y}$. To solve the problem, we define a function $\texttt{distance}(a,b)$ that gives the edit distance between the prefixes $\texttt{x}[0 \ldots a]$ and $\texttt{y}[0 \ldots b]$. Thus, using this function, the edit distance between $\texttt{x}$ and $\texttt{y}$ equals $\texttt{distance}(n-1,m-1)$.

We can calculate values of $\texttt{distance}$ as follows:

$$ \begin{equation*} \begin{split} \texttt{distance}(a,b) = \min(& \texttt{distance}(a,b-1)+1, \\ & \texttt{distance}(a-1,b)+1, \\ & \texttt{distance}(a-1,b-1)+\texttt{cost}(a,b)). \end{split} \end{equation*} $$

Here $\texttt{cost}(a,b)=0$ if $\texttt{x}[a]=\texttt{y}[b]$, and otherwise $\texttt{cost}(a,b)=1$. The formula considers the following ways to edit the string $\texttt{x}$:

  • $\texttt{distance}(a,b-1)$: insert a character at the end of $\texttt{x}$
  • $\texttt{distance}(a-1,b)$: remove the last character from $\texttt{x}$
  • $\texttt{distance}(a-1,b-1)$: match or modify the last character of $\texttt{x}$

In the two first cases, one editing operation is needed (insert or remove).

In the last case, if $\texttt{x}[a]=\texttt{y}[b]$, we can match the last characters without editing, and otherwise one editing operation is needed (modify).

The following table shows the values of $\texttt{distance}$ in the example case (the first row and column correspond to the empty prefix):

|   |   | M | O | V | I | E |
|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 |
| L | 1 | 1 | 2 | 3 | 4 | 5 |
| O | 2 | 2 | 1 | 2 | 3 | 4 |
| V | 3 | 3 | 2 | 1 | 2 | 3 |
| E | 4 | 4 | 3 | 2 | 2 | 2 |

The lower-right corner of the table tells us that the edit distance between LOVE and MOVIE is 2. The table also shows how to construct the shortest sequence of editing operations. In this case the path is as follows:

The last characters of LOVE and MOVIE are equal, so the edit distance between them equals the edit distance between LOV and MOVI. We can use one editing operation to remove the character I from MOVI. Thus, the edit distance is one larger than the edit distance between LOV and MOV, etc.
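
As a minimal sketch, the values can be computed with a two-dimensional table in Rust. Here the table is indexed by prefix lengths, so the answer is distance[n][m]:

#![allow(unused)]
fn main() {
use std::cmp;
let x: Vec<char> = "LOVE".chars().collect();
let y: Vec<char> = "MOVIE".chars().collect();
let n = x.len();
let m = y.len();
// distance[a][b] = edit distance between the first a characters of x
// and the first b characters of y
let mut distance = vec![vec![0; m + 1]; n + 1];
for a in 0..=n { distance[a][0] = a; }
for b in 0..=m { distance[0][b] = b; }
for a in 1..=n {
    for b in 1..=m {
        let cost = if x[a-1] == y[b-1] { 0 } else { 1 };
        distance[a][b] = cmp::min(
            cmp::min(distance[a][b-1] + 1, distance[a-1][b] + 1),
            distance[a-1][b-1] + cost,
        );
    }
}
println!("{}", distance[n][m]); // prints 2
}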


1 The distance is named after V. I. Levenshtein who studied it in connection with binary codes [49].

Counting tilings

Sometimes the states of a dynamic programming solution are more complex than fixed combinations of numbers. As an example, consider the problem of calculating the number of distinct ways to fill an $n \times m$ grid using $1 \times 2$ and $2 \times 1$ size tiles. For example, one valid solution for the $4 \times 7$ grid is

and the total number of solutions is 781.

The problem can be solved using dynamic programming by going through the grid row by row. Each row in a solution can be represented as a string that contains $m$ characters from the set $\{\sqcap, \sqcup, \sqsubset, \sqsupset\}$. For example, the above solution consists of four rows that correspond to the following strings:

  • $\sqcap \sqsubset \sqsupset \sqcap \sqsubset \sqsupset \sqcap$
  • $\sqcup \sqsubset \sqsupset \sqcup \sqcap \sqcap \sqcup$
  • $\sqsubset \sqsupset \sqsubset \sqsupset \sqcup \sqcup \sqcap$
  • $\sqsubset \sqsupset \sqsubset \sqsupset \sqsubset \sqsupset \sqcup$

Let $\texttt{count}(k,x)$ denote the number of ways to construct a solution for rows $1 \ldots k$ of the grid such that string $x$ corresponds to row $k$. It is possible to use dynamic programming here, because the state of a row is constrained only by the state of the previous row.

A solution is valid if row $1$ does not contain the character $\sqcup$, row $n$ does not contain the character $\sqcap$, and all consecutive rows are \emph{compatible}. For example, the rows $\sqcup \sqsubset \sqsupset \sqcup \sqcap \sqcap \sqcup$ and $\sqsubset \sqsupset \sqsubset \sqsupset \sqcup \sqcup \sqcap$ are compatible, while the rows $\sqcap \sqsubset \sqsupset \sqcap \sqsubset \sqsupset \sqcap$ and $\sqsubset \sqsupset \sqsubset \sqsupset \sqsubset \sqsupset \sqcup$ are not compatible.

Since a row consists of $m$ characters and there are four choices for each character, the number of distinct rows is at most $4^m$. Thus, the time complexity of the solution is $O(n 4^{2m})$ because we can go through the $O(4^m)$ possible states for each row, and for each state, there are $O(4^m)$ possible states for the previous row. In practice, it is a good idea to rotate the grid so that the shorter side has length $m$, because the factor $4^{2m}$ dominates the time complexity.

It is possible to make the solution more efficient by using a more compact representation for the rows. It turns out that it is sufficient to know which columns of the previous row contain the upper square of a vertical tile. Thus, we can represent a row using only characters $\sqcap$ and $\Box$, where $\Box$ is a combination of characters $\sqcup$, $\sqsubset$ and $\sqsupset$. Using this representation, there are only $2^m$ distinct rows and the time complexity is $O(n 2^{2m})$.

As a final note, there is also a surprising direct formula for calculating the number of tilings1:

\[ \prod_{a=1}^{\lceil n/2 \rceil} \prod_{b=1}^{\lceil m/2 \rceil} 4 \cdot (\cos^2 \frac{\pi a}{n + 1} + \cos^2 \frac{\pi b}{m+1}) \]

This formula is very efficient, because it calculates the number of tilings in $O(nm)$ time, but since the answer is a product of real numbers, a problem when using the formula is how to store the intermediate results accurately.
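
As a sketch, the formula can be evaluated with floating point numbers and the result rounded to the nearest integer; for the $4 \times 7$ grid this should give 781:

#![allow(unused)]
fn main() {
let n = 4_usize;
let m = 7_usize;
let mut result = 1.0_f64;
for a in 1..=(n + 1) / 2 {
    for b in 1..=(m + 1) / 2 {
        let ca = (std::f64::consts::PI * a as f64 / (n as f64 + 1.0)).cos();
        let cb = (std::f64::consts::PI * b as f64 / (m as f64 + 1.0)).cos();
        result *= 4.0 * (ca * ca + cb * cb);
    }
}
// the product is a real number that should be very close to an integer
println!("{}", result.round()); // 781 for the 4 x 7 grid
}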


1 Surprisingly, this formula was discovered in 1961 by two research teams [43, 67] that worked independently.

Amortized analysis

The time complexity of an algorithm is often easy to analyze just by examining the structure of the algorithm: what loops does the algorithm contain and how many times the loops are performed. However, sometimes a straightforward analysis does not give a true picture of the efficiency of the algorithm.

Amortized analysis can be used to analyze algorithms that contain operations whose time complexity varies. The idea is to estimate the total time used by all such operations during the execution of the algorithm, instead of focusing on individual operations.

Two pointers method

In the two pointers method, two pointers are used to iterate through the array values. Both pointers can move in one direction only, which ensures that the algorithm works efficiently. Next we discuss two problems that can be solved using the two pointers method.

Subarray Sum

As the first example, consider a problem where we are given an array of $n$ positive integers and a target sum $x$, and we want to find a subarray whose sum is $x$ or report that there is no such subarray.

For example, the array

contains a subarray whose sum is 8:

This problem can be solved in $O(n)$ time by using the two pointers method. The idea is to maintain pointers that point to the first and last value of a subarray. On each turn, the left pointer moves one step to the right, and the right pointer moves to the right as long as the resulting subarray sum is at most $x$. If the sum becomes exactly $x$, a solution has been found.

As an example, consider the following array and a target sum $x=8$:

The initial subarray contains the values 1, 3 and 2 whose sum is 6:

Then, the left pointer moves one step to the right. The right pointer does not move, because otherwise the subarray sum would exceed $x$.

Again, the left pointer moves one step to the right, and this time the right pointer moves three steps to the right. The subarray sum is $2+5+1=8$, so a subarray whose sum is $x$ has been found.

The running time of the algorithm depends on the number of steps the right pointer moves. While there is no useful upper bound on how many steps the pointer can move on a single turn, we know that the pointer moves a total of $O(n)$ steps during the algorithm, because it only moves to the right.

Since both the left and right pointer move $O(n)$ steps during the algorithm, the algorithm works in $O(n)$ time.
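
A minimal sketch of the method in Rust follows; the array and the target sum below are an assumed example:

#![allow(unused)]
fn main() {
// an assumed example array and target sum
let array = vec![1, 3, 2, 5, 1, 1, 2, 3];
let x = 8;
let mut sum = 0;
let mut right = 0;
let mut found = false;
for left in 0..array.len() {
    // the right pointer moves as long as the subarray sum stays at most x
    while right < array.len() && sum + array[right] <= x {
        sum += array[right];
        right += 1;
    }
    if sum == x {
        println!("subarray [{}, {}] has sum {}", left, right - 1, x);
        found = true;
        break;
    }
    // the left pointer moves one step to the right
    if right > left {
        sum -= array[left];
    } else {
        right += 1; // the single element at position left is larger than x
    }
}
if !found { println!("no subarray has sum {x}"); }
}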

2 Sum problem

Another problem that can be solved using the two pointers method is the following problem, also known as the 2SUM problem: given an array of $n$ numbers and a target sum $x$, find two array values such that their sum is $x$, or report that no such values exist.

To solve the problem, we first sort the array values in increasing order. After that, we iterate through the array using two pointers. The left pointer starts at the first value and moves one step to the right on each turn. The right pointer begins at the last value and always moves to the left until the sum of the left and right value is at most $x$. If the sum is exactly $x$, a solution has been found.

For example, consider the following array and a target sum $x=12$:

The initial positions of the pointers are as follows. The sum of the values is $1+10=11$, which is smaller than $x$.

Then the left pointer moves one step to the right. The right pointer moves three steps to the left, and the sum becomes $4+7=11$.

After this, the left pointer moves one step to the right again. The right pointer does not move, and a solution $5+7=12$ has been found.

The running time of the algorithm is $O(n \log n)$, because it first sorts the array in $O(n \log n)$ time, and then both pointers move $O(n)$ steps.
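
A compact sketch of this method follows; the array here is an assumed example consistent with the walkthrough above, and the loop moves one pointer per iteration, which is equivalent to the description:

#![allow(unused)]
fn main() {
// an assumed example array and target sum
let mut array = vec![1, 4, 5, 6, 7, 9, 9, 10];
let x = 12;
array.sort();
let mut a = 0;
let mut b = array.len() - 1;
let mut found = false;
while a < b {
    let sum = array[a] + array[b];
    if sum == x {
        println!("{} + {} = {}", array[a], array[b], x);
        found = true;
        break;
    } else if sum > x {
        b -= 1; // the sum is too large, move the right pointer to the left
    } else {
        a += 1; // the sum is too small, move the left pointer to the right
    }
}
if !found { println!("no two values have sum {x}"); }
}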

Note that it is possible to solve the problem in another way in $O(n \log n)$ time using binary search. In such a solution, we iterate through the array and for each array value, we try to find another value that yields the sum $x$. This can be done by performing $n$ binary searches, each of which takes $O(\log n)$ time.

A more difficult problem is the 3 SUM problem that asks to find three array values whose sum is $x$. Using the idea of the above algorithm, this problem can be solved in $O(n^2)$ time1. Can you see how?


1 For a long time, it was thought that solving the 3 SUM problem more efficiently than in $O(n^2)$ time would not be possible. However, in 2014, it turned out [30] that this is not the case.

Nearest smaller elements

Amortized analysis is often used to estimate the number of operations performed on a data structure. The operations may be distributed unevenly so that most operations occur during a certain phase of the algorithm, but the total number of the operations is limited.

As an example, consider the problem of finding for each array element the nearest smaller element, i.e., the first smaller element that precedes the element in the array. It is possible that no such element exists, in which case the algorithm should report this. Next we will see how the problem can be efficiently solved using a stack structure.

We go through the array from left to right and maintain a stack of array elements. At each array position, we remove elements from the stack until the top element is smaller than the current element, or the stack is empty. Then, we report that the top element is the nearest smaller element of the current element, or if the stack is empty, there is no such element. Finally, we add the current element to the stack.

As an example, consider the following array:

First, the elements 1, 3 and 4 are added to the stack, because each element is larger than the previous element. Thus, the nearest smaller element of 4 is 3, and the nearest smaller element of 3 is 1.

The next element 2 is smaller than the two top elements in the stack. Thus, the elements 3 and 4 are removed from the stack, and then the element 2 is added to the stack. Its nearest smaller element is 1:

Then, the element 5 is larger than the element 2, so it will be added to the stack, and its nearest smaller element is 2:

After this, the element 5 is removed from the stack and the elements 3 and 4 are added to the stack:

Finally, all elements except 1 are removed from the stack and the last element 2 is added to the stack:

The efficiency of the algorithm depends on the total number of stack operations. If the current element is larger than the top element in the stack, it is directly added to the stack, which is efficient. However, sometimes the stack can contain several larger elements and it takes time to remove them. Still, each element is added exactly once to the stack and removed at most once from the stack. Thus, each element causes $O(1)$ stack operations, and the algorithm works in $O(n)$ time.
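
A minimal sketch of the algorithm in Rust; the array below is an array consistent with the walkthrough above:

#![allow(unused)]
fn main() {
let array = vec![1, 3, 4, 2, 5, 3, 4, 2];
let mut stack: Vec<i32> = Vec::new();
for &value in &array {
    // remove elements until the top of the stack is smaller than the current element
    while let Some(&top) = stack.last() {
        if top < value { break; }
        stack.pop();
    }
    match stack.last() {
        Some(&top) => println!("nearest smaller element of {value} is {top}"),
        None => println!("{value} has no smaller element before it"),
    }
    stack.push(value);
}
}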

Sliding window minimum

A sliding window is a constant-size subarray that moves from left to right through the array. At each window position, we want to calculate some information about the elements inside the window. In this section, we focus on the problem of maintaining the sliding window minimum, which means that we should report the smallest value inside each window.

The sliding window minimum can be calculated using a similar idea that we used to calculate the nearest smaller elements. We maintain a queue where each element is larger than the previous element, and the first element always corresponds to the minimum element inside the window. After each window move, we remove elements from the end of the queue until the last queue element is smaller than the new window element, or the queue becomes empty. We also remove the first queue element if it is not inside the window anymore. Finally, we add the new window element to the end of the queue.

As an example, consider the following array:

Suppose that the size of the sliding window is 4. At the first window position, the smallest value is 1:

Then the window moves one step right. The new element 3 is smaller than the elements 4 and 5 in the queue, so the elements 4 and 5 are removed from the queue and the element 3 is added to the queue. The smallest value is still 1.

After this, the window moves again, and the smallest element 1 does not belong to the window anymore. Thus, it is removed from the queue and the smallest value is now 3. Also the new element 4 is added to the queue.

The next new element 1 is smaller than all elements in the queue. Thus, all elements are removed from the queue and it will only contain the element 1:

Finally the window reaches its last position. The element 2 is added to the queue, but the smallest value inside the window is still 1.

Since each array element is added to the queue exactly once and removed from the queue at most once, the algorithm works in $O(n)$ time.
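
A sketch of the algorithm using a VecDeque; the array and window size are assumed for illustration:

#![allow(unused)]
fn main() {
use std::collections::VecDeque;
// an assumed example array and window size
let array = vec![2, 1, 4, 5, 3, 4, 1, 2];
let k = 4;
// the queue stores indices; the values at those indices are increasing
let mut queue: VecDeque<usize> = VecDeque::new();
for i in 0..array.len() {
    // remove elements from the end that are not smaller than the new element
    while let Some(&last) = queue.back() {
        if array[last] < array[i] { break; }
        queue.pop_back();
    }
    queue.push_back(i);
    // remove the first element if it is outside the window
    if *queue.front().unwrap() + k <= i {
        queue.pop_front();
    }
    if i + 1 >= k {
        println!("minimum of window ending at {i}: {}", array[*queue.front().unwrap()]);
    }
}
}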

Range queries

In this chapter, we discuss data structures that allow us to efficiently process range queries. In a range query, our task is to calculate a value based on a subarray of an array. Typical range queries are:

  • $\texttt{sum}_q(a,b)$: calculate the sum of values in range \([a,b]\)
  • $\texttt{min}_q(a,b)$: find the minimum value in range \([a,b]\)
  • $\texttt{max}_q(a,b)$: find the maximum value in range \([a,b]\)

For example, consider the range \([3,6]\) in the following array:

In this case, $sum_q(3,6)=14$, $min_q(3,6)=1$ and $max_q(3,6)=6$.

The best way to work with ranges in Rust is to use iterators. For example, the following function can be used to process sum queries on an array:

#![allow(unused)]
fn main() {
let array = vec![1,3,8,4,6,1,3,4];
let a = 3;
let b = 6;
fn sum(array: Vec<i32>, a:usize, b:usize) -> i32{
    array[a..=b].iter().sum()
}
println!("array: {array:?}");
println!("sum from index {} to index {}: {}",a,b,sum(array, a,b));
}

Static array queries

We first focus on a situation where the array is static, i.e., the array values are never updated between the queries. In this case, it suffices to construct a static data structure that tells us the answer for any possible query.

Sum queries

We can easily process sum queries on a static array by constructing a prefix sum array. Each value in the prefix sum array equals the sum of values in the original array up to that position, i.e., the value at position $k$ is $sum_q(0,k)$. The prefix sum array can be constructed in $O(n)$ time.

For example, consider the following array:

The corresponding prefix sum array is as follows:

Since the prefix sum array contains all values of $sum_q(0,k)$, we can calculate any value of $sum_q(a,b)$ in $O(1)$ time as follows:

\[ \texttt{sum}_q(a,b) = \texttt{sum}_q(0,b) - \texttt{sum}_q(0,a-1) \]

By defining $sum_q(0,-1)=0$, the above formula also holds when $a=0$.

For example, consider the range \([3,6]\):

In this case \( \texttt{sum}_q(3,6)=8+6+1+4=19 \). This sum can be calculated from two values of the prefix sum array:

Thus, \( \texttt{sum}_q(3,6)=\texttt{sum}_q(0,6)-\texttt{sum}_q(0,2)=27-8=19 \)
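
A short sketch, assuming the array behind the example values above is $[1,3,4,8,6,1,4,2]$ (which matches the sums shown):

#![allow(unused)]
fn main() {
// an array consistent with the example values above
let array = vec![1, 3, 4, 8, 6, 1, 4, 2];
// prefix[k] = sum_q(0,k)
let mut prefix = vec![0; array.len()];
prefix[0] = array[0];
for k in 1..array.len() {
    prefix[k] = prefix[k-1] + array[k];
}
// sum_q(a,b) = sum_q(0,b) - sum_q(0,a-1)
let sum_q = |a: usize, b: usize| {
    if a == 0 { prefix[b] } else { prefix[b] - prefix[a-1] }
};
println!("{}", sum_q(3, 6)); // prints 19
}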

It is also possible to generalize this idea to higher dimensions. For example, we can construct a two-dimensional prefix sum array that can be used to calculate the sum of any rectangular subarray in $O(1)$ time. Each sum in such an array corresponds to a subarray that begins at the upper-left corner of the array.

The following picture illustrates the idea:

The sum of the gray subarray can be calculated using the formula \[ S(A) - S(B) - S(C) + S(D), \] where $S(X)$ denotes the sum of values in a rectangular subarray from the upper-left corner to the position of $X$.
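
A sketch of the two-dimensional version; the grid values here are made up for illustration, and an extra zero row and column simplify the formula:

#![allow(unused)]
fn main() {
// an illustrative grid
let value = vec![
    vec![1, 2, 3],
    vec![4, 5, 6],
    vec![7, 8, 9],
];
let n = value.len();
// sum[y][x] = sum of the rectangle from the upper-left corner to (y,x), 1-based
let mut sum = vec![vec![0; n + 1]; n + 1];
for y in 1..=n {
    for x in 1..=n {
        sum[y][x] = sum[y-1][x] + sum[y][x-1] - sum[y-1][x-1] + value[y-1][x-1];
    }
}
// sum of the rectangle with corners (y1,x1) and (y2,x2), 1-based
let rect = |y1: usize, x1: usize, y2: usize, x2: usize| {
    sum[y2][x2] - sum[y1-1][x2] - sum[y2][x1-1] + sum[y1-1][x1-1]
};
println!("{}", rect(2, 2, 3, 3)); // 5 + 6 + 8 + 9 = 28
}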

Minimum queries

Minimum queries are more difficult to process than sum queries. Still, there is a quite simple $O(n \log n)$ time preprocessing method after which we can answer any minimum query in $O(1)$ time1. Note that since minimum and maximum queries can be processed similarly, we can focus on minimum queries.

The idea is to precalculate all values of $min_q(a,b)$ where $b-a+1$ (the length of the range) is a power of two. For example, for the array $[1,3,4,8,6,1,4,2]$ the following values are calculated:

Ranges of length 1:

| $a$ | $b$ | $min_q(a,b)$ |
|-----|-----|--------------|
| 0 | 0 | 1 |
| 1 | 1 | 3 |
| 2 | 2 | 4 |
| 3 | 3 | 8 |
| 4 | 4 | 6 |
| 5 | 5 | 1 |
| 6 | 6 | 4 |
| 7 | 7 | 2 |

Ranges of length 2:

| $a$ | $b$ | $min_q(a,b)$ |
|-----|-----|--------------|
| 0 | 1 | 1 |
| 1 | 2 | 3 |
| 2 | 3 | 4 |
| 3 | 4 | 6 |
| 4 | 5 | 1 |
| 5 | 6 | 1 |
| 6 | 7 | 2 |

Ranges of length 4 and 8:

| $a$ | $b$ | $min_q(a,b)$ |
|-----|-----|--------------|
| 0 | 3 | 1 |
| 1 | 4 | 3 |
| 2 | 5 | 1 |
| 3 | 6 | 1 |
| 4 | 7 | 1 |
| 0 | 7 | 1 |

The number of precalculated values is $O(n \log n)$, because there are $O(\log n)$ range lengths that are powers of two. The values can be calculated efficiently using the recursive formula

\[ \texttt{min}_q(a,b) = \min(\texttt{min}_q(a,a+w-1),\texttt{min}_q(a+w,b)), \]

where $b-a+1$ is a power of two and $w=(b-a+1)/2$. Calculating all those values takes $O(n \log n)$ time.

After this, any value of $min_q(a,b)$ can be calculated in $O(1)$ time as a minimum of two precalculated values. Let $k$ be the largest power of two that does not exceed $b-a+1$. We can calculate the value of $min_q(a,b)$ using the formula

\[ \texttt{min}_q(a,b) = \min(\texttt{min}_q(a,a+k-1),\texttt{min}_q(b-k+1,b)) \]

In the above formula, the range \( [a,b] \) is represented as the union of the ranges \( [a,a+k-1]\) and \([b-k+1,b]\), both of length $k$.

As an example, consider the range \([1,6]\):

The length of the range is 6, and the largest power of two that does not exceed 6 is 4. Thus the range \([1,6]\) is the union of the ranges \([1,4]\) and \([3,6]\):

Since $min_q(1,4)=3$ and $min_q(3,6)=1$, we conclude that $min_q(1,6)=1$.
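
A sketch of the preprocessing and the $O(1)$ query for the example array; the names sparse and min_q are introduced here for illustration:

#![allow(unused)]
fn main() {
use std::cmp;
let array = vec![1, 3, 4, 8, 6, 1, 4, 2];
let n = array.len();
// sparse[j][a] = minimum of the range that starts at a and has length 2^j
let mut sparse = vec![array.clone()];
let mut j = 1;
while (1 << j) <= n {
    let w = 1 << (j - 1);
    let prev = &sparse[j - 1];
    let mut row = Vec::new();
    for a in 0..=n - (1 << j) {
        row.push(cmp::min(prev[a], prev[a + w]));
    }
    sparse.push(row);
    j += 1;
}
// answer min_q(a,b) in O(1): k = largest power of two that does not exceed b-a+1
fn min_q(sparse: &Vec<Vec<i32>>, a: usize, b: usize) -> i32 {
    let j = (usize::BITS - 1 - (b - a + 1).leading_zeros()) as usize;
    cmp::min(sparse[j][a], sparse[j][b + 1 - (1 << j)])
}
println!("{}", min_q(&sparse, 1, 6)); // prints 1
}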


1 This technique was introduced in [7] and sometimes called the sparse table method. There are also more sophisticated techniques [22] where the preprocessing time is only $O(n)$, but such algorithms are not needed in competitive programming.

Binary indexed tree

A binary indexed tree or a Fenwick tree 1 can be seen as a dynamic variant of a prefix sum array. It supports two $O(\log n)$ time operations on an array: processing a range sum query and updating a value.

The advantage of a binary indexed tree is that it allows us to efficiently update array values between sum queries. This would not be possible using a prefix sum array, because after each update, it would be necessary to build the whole prefix sum array again in $O(n)$ time.

Structure

Even if the name of the structure is a binary indexed tree, it is usually represented as an array. In this section we assume that all arrays are one-indexed, because it makes the implementation easier.

Let $p(k)$ denote the largest power of two that divides $k$. We store a binary indexed tree as an array tree such that

\[ \texttt{tree}[k] = \texttt{sum}_q(k-p(k)+1,k) \]

i.e., each position $k$ contains the sum of values in a range of the original array whose length is $p(k)$ and that ends at position $k$. For example, since $p(6)=2$, \( \texttt{tree}[6] \) contains the value of $sum_q(5,6)$.

For example, consider the following array:

The corresponding binary indexed tree is as follows:

The following picture shows more clearly how each value in the binary indexed tree corresponds to a range in the original array:

Using a binary indexed tree, any value of $\texttt{sum}_q(1,k)$ can be calculated in $O(\log n)$ time, because a range \( [1,k] \) can always be divided into $O(\log n)$ ranges whose sums are stored in the tree.

For example, the range \( [1,7] \) consists of the following ranges:

Thus, we can calculate the corresponding sum as follows: \[ \texttt{sum}_q(1,7)=\texttt{sum}_q(1,4)+\texttt{sum}_q(5,6)+\texttt{sum}_q(7,7)=16+7+4=27 \]

To calculate the value of $sum_q(a,b)$ where $a>1$, we can use the same trick that we used with prefix sum arrays:

\[ \texttt{sum}_q(a,b) = \texttt{sum}_q(1,b) - \texttt{sum}_q(1,a-1).\]

Since we can calculate both $sum_q(1,b)$ and $sum_q(1,a-1)$ in $O(\log n)$ time, the total time complexity is $O(\log n)$.

Then, after updating a value in the original array, several values in the binary indexed tree should be updated. For example, if the value at position 3 changes, the sums of the following ranges change:

Since each array element belongs to $O(\log n)$ ranges in the binary indexed tree, it suffices to update $O(\log n)$ values in the tree.

Implementation

The operations of a binary indexed tree can be efficiently implemented using bit operations. The key fact needed is that we can calculate any value of $p(k)$ using the formula \[p(k) = k \& -k.\]

The following function calculates the value of $sum_q(1,k)$:

#![allow(unused)]
fn main() {
let v = vec![1,3,4,8,6,1,4,2];
let tree = vec![1,4,4,16,6,7,4,29];
let mut k = 7;
println!{"{:?}", tree.clone()};
println!{"k = {k}"};
let sum = sum(k, tree);
println!{"{:?}", sum};

fn sum(mut k: isize, tree: Vec<isize>) -> isize{
    let mut s = 0;
    while k>=1 {
        s += tree[k as usize -1];
        k -= k & -k;
    }
    s
}
}

The following function increases the array value at position $k$ by $x$ ($x$ can be positive or negative):

#![allow(unused)]
fn main() {
let v = vec![1,3,4,8,6,1,4,2];
let mut tree = vec![1,4,4,16,6,7,4,29];
let mut k = 1;
let mut x = 2;
println!{"before: {:?}", tree.clone()};
println!{"k = {k}"};
println!{"x = {x}"};
let sum = add(&mut tree, k, x);
println!{"after: {:?}", tree};
fn add(tree: &mut Vec<isize>, mut k: isize, x: isize){
    while k as usize <= tree.len() {
        tree[k as usize -1] += x;
        k += k&-k;
    }
}
}

The time complexity of both the functions is $O(\log n)$, because the functions access $O(\log n)$ values in the binary indexed tree, and each move to the next position takes $O(1)$ time.


1 The binary indexed tree structure was presented by P. M. Fenwick in 1994 [21].

Segment tree

A segment tree 1 is a data structure that supports two operations: processing a range query and updating an array value. Segment trees can support sum queries, minimum and maximum queries and many other queries so that both operations work in $O(\log n)$ time.

Compared to a binary indexed tree, the advantage of a segment tree is that it is a more general data structure. While binary indexed trees only support sum queries 2, segment trees also support other queries. On the other hand, a segment tree requires more memory and is a bit more difficult to implement.

Structure

A segment tree is a binary tree such that the nodes on the bottom level of the tree correspond to the array elements, and the other nodes contain information needed for processing range queries.

In this section, we assume that the size of the array is a power of two and zero-based indexing is used, because it is convenient to build a segment tree for such an array. If the size of the array is not a power of two, we can always append extra elements to it.

We will first discuss segment trees that support sum queries. As an example, consider the following array:

The corresponding segment tree is as follows:

Each internal tree node corresponds to an array range whose size is a power of two. In the above tree, the value of each internal node is the sum of the corresponding array values, and it can be calculated as the sum of the values of its left and right child node.

It turns out that any range \( [a,b]\) can be divided into $O(\log n)$ ranges whose values are stored in tree nodes. For example, consider the range [2,7]:

Here $sum_q(2,7)=6+3+2+7+2+6=26$. In this case, the following two tree nodes correspond to the range:

Thus, another way to calculate the sum is $9+17=26$.

When the sum is calculated using nodes located as high as possible in the tree, at most two nodes on each level of the tree are needed. Hence, the total number of nodes is $O(\log n)$.

After an array update, we should update all nodes whose value depends on the updated value. This can be done by traversing the path from the updated array element to the top node and updating the nodes along the path.

The following picture shows which tree nodes change if the array value 7 changes:

The path from bottom to top always consists of $O(\log n)$ nodes, so each update changes $O(\log n)$ nodes in the tree.

Implementation

We store a segment tree as an array of $2n$ elements where $n$ is the size of the original array and a power of two. The tree nodes are stored from top to bottom: \( \texttt{tree}[1]\) is the top node, \(\texttt{tree}[2]\) and \( \texttt{tree}[3]\) are its children, and so on. Finally, the values from \(\texttt{tree}[n]\) to \(\texttt{tree}[2n-1]\) correspond to the values of the original array on the bottom level of the tree.

For example, the segment tree:

is stored as follows:

Using this representation, the parent of \(\texttt{tree}[k]\) is \(\texttt{tree}[\lfloor k/2 \rfloor]\), and its children are \(\texttt{tree}[2k]\) and \(\texttt{tree}[2k+1]\). Note that this implies that the position of a node is even if it is a left child and odd if it is a right child.

The following function calculates the value of \(\texttt{sum}_q(a,b)\):

#![allow(unused)]
fn main() {
let tree = vec![39, 22, 17, 13, 9, 9, 8, 5, 8, 6, 3, 2, 7, 2, 6];
let a = 2;
let b = 7;
let n = 8;
println!("tree => {:?}", tree);
println!("a =>{a}");
println!("b =>{b}");
println!("n =>{n}");
println!("sum_q({},{}) = {}", a,b,sum(tree, 2, 7, 8));
fn sum(tree: Vec<usize>, mut a: usize, mut b: usize, n:usize) -> usize{
   a += n; 
   b += n;
   let mut s = 0;
   while a<=b {
       if a%2 == 1 { s += tree[a as usize -1]; a+=1;}
       if b%2 == 0 { s += tree[b as usize -1]; b-=1;}
       a/=2;
       b/=2;
   }

   s
}
}

The function maintains a range that is initially \([a+n,b+n]\). Then, at each step, the range is moved one level higher in the tree, and before that, the values of the nodes that do not belong to the higher range are added to the sum.

The following function increases the array value at position $k$ by $x$:

#![allow(unused)]
fn main() {
let mut tree = vec![39, 22, 17, 13, 9, 9, 8, 5, 8, 6, 3, 2, 7, 2, 6];
let k = 5;
let x = 4;
let n = 8;
println!("before => {:?}", tree);
println!("k =>{k}");
println!("x =>{x}");
println!("n =>{n}");
add(&mut tree, k, x, n);
println!("after => {:?}", tree);
fn add(tree: &mut Vec<usize>, mut k: usize, mut x: usize, n:usize){
    k += n;
    tree[k-1] += x;
    k/=2;
    while k>= 1 {
        tree[k-1] = tree[2*k-1]+tree[2*k];
        k/=2;
    }
}
}

First the function updates the value at the bottom level of the tree. After this, the function updates the values of all internal tree nodes, until it reaches the top node of the tree.

Both the above functions work in $O(\log n)$ time, because a segment tree of $n$ elements consists of $O(\log n)$ levels, and the functions move one level higher in the tree at each step.


1 The bottom-up implementation in this chapter corresponds to that in [62]. Similar structures were used in the late 1970s to solve geometric problems [9].

2 In fact, using two binary indexed trees it is possible to support minimum queries [16], but this is more complicated than to use a segment tree.

Additional techniques

Index compression

A limitation in data structures that are built upon an array is that the elements are indexed using consecutive integers. Difficulties arise when large indices are needed. For example, if we wish to use the index $10^9$, the array should contain $10^9$ elements which would require too much memory.

However, we can often bypass this limitation by using index compression, where the original indices are replaced with indices $1,2,3,$ etc. This can be done if we know all the indices needed during the algorithm beforehand.

The idea is to replace each original index $x$ with $c(x)$ where $c$ is a function that compresses the indices. We require that the order of the indices does not change, so if $a<b$, then $c(a)<c(b)$. This allows us to conveniently perform queries even if the indices are compressed.

For example, if the original indices are $555$, $10^9$ and $8$, the new indices are:

\[ \begin{array}{lcl} c(8) & = & 1 \\ c(555) & = & 2 \\ c(10^9) & = & 3 \\ \end{array} \]
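
A minimal sketch of index compression: sort the needed indices, remove duplicates, and use binary search to find the compressed index of each value:

#![allow(unused)]
fn main() {
// compress the indices 555, 10^9 and 8
let indices = vec![555_i64, 1_000_000_000, 8];
let mut sorted = indices.clone();
sorted.sort();
sorted.dedup();
// c(x) = position of x in the sorted list (1-based)
let c = |x: i64| sorted.binary_search(&x).unwrap() + 1;
for &x in &indices {
    println!("c({}) = {}", x, c(x));
}
}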

Range updates

So far, we have implemented data structures that support range queries and updates of single values. Let us now consider an opposite situation, where we should update ranges and retrieve single values. We focus on an operation that increases all elements in a range \([a,b]\) by $x$.

Surprisingly, we can use the data structures presented in this chapter also in this situation. To do this, we build a difference array whose values indicate the differences between consecutive values in the original array. Thus, the original array is the prefix sum array of the difference array. For example, consider the following array:

The difference array for the above array is as follows:

For example, the value 2 at position 6 in the original array corresponds to the sum $3-2+4-3=2$ in the difference array.

The advantage of the difference array is that we can update a range in the original array by changing just two elements in the difference array. For example, if we want to increase the original array values between positions 1 and 4 by 5, it suffices to increase the difference array value at position 1 by 5 and decrease the value at position 5 by 5. The result is as follows:

More generally, to increase the values in range \([a,b]\) by $x$, we increase the value at position $a$ by $x$ and decrease the value at position $b+1$ by $x$. Thus, it is only needed to update single values and process sum queries, so we can use a binary indexed tree or a segment tree.
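
A short sketch of the idea, assuming the example array above is $[3,3,1,1,1,5,2,2]$ (which matches the values described):

#![allow(unused)]
fn main() {
// assumed example array
let array = vec![3, 3, 1, 1, 1, 5, 2, 2];
let n = array.len();
// diff[i] = array[i] - array[i-1]; the original array is the prefix sum array of diff
let mut diff = vec![0; n];
diff[0] = array[0];
for i in 1..n { diff[i] = array[i] - array[i-1]; }
// increase the values in range [a,b] by x: only two updates in the difference array
let (a, b, x) = (1, 4, 5);
diff[a] += x;
if b + 1 < n { diff[b + 1] -= x; }
// recover the updated array as a prefix sum of the difference array
let mut updated = vec![0; n];
updated[0] = diff[0];
for i in 1..n { updated[i] = updated[i-1] + diff[i]; }
println!("{updated:?}");
}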

A more difficult problem is to support both range queries and range updates. In Chapter 28 we will see that even this is possible.

Bit manipulation

All data in computer programs is internally stored as bits, i.e., as numbers 0 and 1. This chapter discusses the bit representation of integers, and shows examples of how to use bit operations. It turns out that there are many uses for bit manipulation in algorithm programming.

Bit representation

In programming, an $n$ bit integer is internally stored as a binary number that consists of $n$ bits. For example, the Rust type i32 is a 32-bit type, which means that every i32 number consists of 32 bits.

Here is the bit representation of the i32 number 43:

#![allow(unused)]
fn main() {
let number: i32 = 43;
println!("{number:032b}");
}

The bits in the representation are indexed from right to left. To convert a bit representation $b_k \cdots b_2 b_1 b_0$ into a number, we can use the formula \[ b_k 2^k + \ldots + b_2 2^2 + b_1 2^1 + b_0 2^0 \]

For example, \[ 1 \cdot 2^5 + 1 \cdot 2^3 + 1 \cdot 2^1 + 1 \cdot 2^0 = 43 \]

The bit representation of a number is either signed or unsigned. Usually a signed representation is used, which means that both negative and positive numbers can be represented. A signed variable of $n$ bits can contain any integer between $-2^{n-1}$ and $2^{n-1}-1$. For example, the i32 type in Rust is a signed type, so an i32 variable can contain any integer between $-2^{31}$ and $2^{31}-1$.

The first bit in a signed representation is the sign of the number (0 for nonnegative numbers and 1 for negative numbers), and the remaining $n-1$ bits contain the magnitude of the number. Two's complement is used, which means that the opposite number of a number is calculated by first inverting all the bits in the number, and then increasing the number by one.

For example, the bit representation of the i32 number $-43$ is

#![allow(unused)]
fn main() {
let number: i32 = -43;
println!("{number:032b}");
}

In an unsigned representation, only nonnegative numbers can be used, but the upper bound for the values is larger. An unsigned variable of $n$ bits can contain any integer between $0$ and $2^n-1$. For example, in Rust, an u32 variable can contain any integer between $0$ and $2^{32}-1$.

Bit operations

and operation

The and operation x & y produces a number that has one bits in positions where both $x$ and $y$ have one bits. For example:

#![allow(unused)]
fn main() {
println!("{}", 22 & 26);
println!("The result is {} because:", 22 & 26);
println!("   {:032b} => ({})", 22, 22);
println!(" & {:032b} => ({})", 26, 26);
println!("{}", vec!['_';43].iter().collect::<String>());
println!(" = {:032b} => ({})", 22 & 26, 22 & 26);
}
| bin     | dec  |
|---------|------|
| 10110   | (22) |
| & 11010 | (26) |
| = 10010 | (18) |

Using the and operation, we can check if a number $x$ is even because

  • x & 1 = 0 if $x$ is even
  • x & 1 = 1 if $x$ is odd.

More generally, $x$ is divisible by $2^k$ exactly when x & (2^k-1) = 0.

or operation

The or operation x | y produces a number that has one bits in positions where at least one of $x$ and $y$ have one bits. For example:

#![allow(unused)]
fn main() {
println!("{}", 22 | 26);
println!("The result is {} because:", 22 | 26);
println!("   {:032b} => ({})", 22, 22);
println!(" | {:032b} => ({})", 26, 26);
println!("{}", vec!['_';43].iter().collect::<String>());
println!(" = {:032b} => ({})", 22 | 26, 22 | 26);
}
| bin      | dec  |
|----------|------|
| 10110    | (22) |
| \| 11010 | (26) |
| = 11110  | (30) |

xor operation

The xor operation x ^ y produces a number that has one bits in positions where exactly one of x and y have one bits. For example:

#![allow(unused)]
fn main() {
println!("{}", 22 ^ 26);
println!("The result is {} because:", 22 ^ 26);
println!("   {:032b} => ({})", 22, 22);
println!(" ^ {:032b} => ({})", 26, 26);
println!("{}", vec!['_';43].iter().collect::<String>());
println!(" = {:032b} => ({})", 22 ^ 26, 22 ^ 26);
}
| bin     | dec  |
|---------|------|
| 10110   | (22) |
| ^ 11010 | (26) |
| = 01100 | (12) |

not operation

The not operation !x produces a number where all the bits of $x$ have been inverted. With two's complement, this means that !x = -x-1; for example, !22 = -23:

#![allow(unused)]
fn main() {
println!("{}", !22);
println!("The result is {} because:", !22);
println!(" ! {:032b} => ({})", 22, 22);
println!("{}", vec!['_';43].iter().collect::<String>());
println!(" = {:032b} => ({})", !22, !22);
}
| bin                                | dec   |
|------------------------------------|-------|
| ! 00000000000000000000000000010110 | (22)  |
| = 11111111111111111111111111101001 | (-23) |

Bit shifts

The left bit shift x << k appends $k$ zero bits to the number, and the right bit shift x >> k removes the $k$ last bits from the number. For example:

#![allow(unused)]
fn main() {
println!("{}", 14<<2);
println!("The result is {} because:", 14<<2);
println!("{:032b} => ({})", 14, 14);
println!("{:032b} => ({})", 14<<2, 14<<2);
}

Note that x << k corresponds to multiplying x by 2^k, and x >> k corresponds to dividing x by 2^k rounded down to an integer.

Applications

A number of the form 1 << k has a one bit in position $k$ and all other bits are zero, so we can use such numbers to access single bits of numbers. In particular, the $k$th bit of a number is one exactly when x & (1 << k) is not zero. The following code prints a u32 number bit by bit:

#![allow(unused)]
fn main() {
let x: u32 = 43;
for i in (0..32).rev() {
    print!("{}", if x & (1 << i) != 0 { 1 } else { 0 });
}
println!();

}

It is also possible to modify single bits of numbers using similar ideas. For example, the formula x | (1 << k) sets the $k$th bit of $x$ to one, the formula x & !(1 << k) sets the $k$th bit of $x$ to zero, and the formula x^(1 << k) inverts the $k$th bit of $x$.

The formula x & (x-1) sets the last one bit of $x$ to zero, and the formula x & -x sets all the one bits to zero, except for the last one bit. The formula x | (x-1) inverts all the bits after the last one bit. Also note that a positive number $x$ is a power of two exactly when x & (x-1) = 0.
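
These formulas can be checked with a quick experiment; the value 52 is used here only as an example:

#![allow(unused)]
fn main() {
let x: i32 = 52;                      // 110100 in binary
println!("{:06b}", x & (x - 1));      // 110000: the last one bit is cleared
println!("{:06b}", x & -x);           // 000100: only the last one bit remains
println!("{:06b}", x | (x - 1));      // 110111: the bits after the last one bit are set
println!("{}", (64 & (64 - 1)) == 0); // true: 64 is a power of two
}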

Representing sets

Every subset of a set $\{0,1,2,\ldots,n-1\}$ can be represented as an $n$ bit integer whose one bits indicate which elements belong to the subset. This is an efficient way to represent sets because operations can be implemented as bit operations.

For example, since i32 is a 32-bit type, an i32 number can represent any subset of the set $\{0,1,2,\ldots,31\}$. The bit representation of the set $\{1,3,4,8\}$ is

$$ 00000000000000000000000100011010 $$

which corresponds to the number $2^8+2^4+2^3+2^1=282$.

#![allow(unused)]
fn main() {
println!("{}", 0b00000000000000000000000100011010);
}

Set implementation

The following code declares an i32 variable $x$ that can contain a subset of $\{0,1,2,\ldots,31\}$. After this, the code adds the elements 1, 3, 4 and 8 to the set and prints the size of the set.

#![allow(unused)]
fn main() {
let mut x = 0_i32;
x |= (1<<1);
x |= (1<<3);
x |= (1<<4);
x |= (1<<8);
println!("{:?}", x.count_ones());
}

Then, the following code prints all elements that belong to the set:

#![allow(unused)]
fn main() {
let mut x = 0;
x |= (1<<1);
x |= (1<<3);
x |= (1<<4);
x |= (1<<8);
for i in 0..32 {
    if x&(1<<i) != 0 {println!("{i}")}
}
}

Set operations

Set operations can be implemented as follows as bit operations:

| operation    | set syntax      | bit syntax |
|--------------|-----------------|------------|
| intersection | $a \cap b$      | a & b      |
| union        | $a \cup b$      | a \| b     |
| complement   | $\bar a$        | !a         |
| difference   | $a \setminus b$ | a & (!b)   |

For example, the following code first constructs the sets $x=\{1,3,4,8\}$ and $y=\{3,6,8,9\}$, and then constructs the set $z = x \cup y = \{1,3,4,6,8,9\}$:

#![allow(unused)]
fn main() {
let x = (1<<1)|(1<<3)|(1<<4)|(1<<8);
let y = (1<<3)|(1<<6)|(1<<8)|(1<<9);
let z: i32 = x|y;
println!("x => {x:010b}");
println!("y => {y:b}");
println!("z => {z:b}");
println!("{}", z.count_ones());
}

Iterating through subsets

The following code goes through the subsets of $\{0,1,\ldots,n-1\}$:

#![allow(unused)]
fn main() {
let n = 10;
let mut b=0;
while b < (1<<n) {
    // process subset b
    println!("b={b}, n={n}");
    b+=1;
}
}

The following code goes through the subsets with exactly $k$ elements:

#![allow(unused)]
fn main() {
let mut b = 0_i32;
let k = 4;
let n = 10;
while b < (1<<n) {
    if b.count_ones() == k {
        println!("b={b}, k={k}");
        // process subset b
    }
    b+=1;
}
}

The following code goes through the subsets of a set $x$:

#![allow(unused)]
fn main() {
let mut b = 0;
let x = 10;
loop {
    // process subset b
    println!("b={b}, x={x}");
    b = (b-x)&x; 
    if b == 0 {break;}
}
}

Bit optimizations

Many algorithms can be optimized using bit operations. Such optimizations do not change the time complexity of the algorithm, but they may have a large impact on the actual running time of the code. In this section we discuss examples of such situations.

Hamming distance

The Hamming distance $hamming(a,b)$ between two strings $a$ and $b$ of equal length is the number of positions where the strings differ. For example, $$ hamming(01101,11001)=2 $$

Consider the following problem: Given a list of $n$ bit strings, each of length $k$, calculate the minimum Hamming distance between two strings in the list. For example, the answer for $[00111,01101,11110]$ is 2, because

  • $hamming(00111,01101)=2$
  • $hamming(00111,11110)=3$
  • $hamming(01101,11110)=3$

A straightforward way to solve the problem is to go through all pairs of strings and calculate their Hamming distances, which yields an $O(n^2 k)$ time algorithm. The following function can be used to calculate distances:

#![allow(unused)]
fn main() {
let a = "00111";
let b = "01101";
println!("{}", hamming(a, b));
fn hamming(a: &str, b: &str) -> u32 {
    let mut d = 0;
    for i in 0..a.len() {
        if a.chars().nth(i) != b.chars().nth(i) {d+=1}
    }
    d
}
}

However, if $k$ is small, we can optimize the code by storing the bit strings as integers and calculating the Hamming distances using bit operations. In particular, if $k \le 32$, we can just store the strings as i32 values and use the following function to calculate distances:

#![allow(unused)]

fn main() {
let a = 0b00111;
let b = 0b01101;
println!("{}", hamming(a, b));
fn hamming(a: i32, b: i32) -> u32 {
    (a^b).count_ones()
}
}

In the above function, the xor operation constructs a bit string that has one bits in positions where $a$ and $b$ differ. Then, the number of bits is calculated using the count_ones() function.

To compare the implementations, we generated a list of 10000 random bit strings of length 30. The bit optimized code was almost 30 times faster than the original code.

Counting subgrids

As another example, consider the following problem: Given an $n \times n$ grid whose each square is either black (1) or white (0), calculate the number of subgrids whose all corners are black. For example, the grid

contains two such subgrids:

There is an $O(n^3)$ time algorithm for solving the problem: go through all $O(n^2)$ pairs of rows and for each pair $(a,b)$ calculate the number of columns that contain a black square in both rows in $O(n)$ time. The following code assumes that color[y][x] denotes the color in row $y$ and column $x$:

let mut count = 0;
for i in 0..n{
    if color[a][i] == 1 && color[b][i] == 1 {count+=1};
}

If $\texttt{count}$ columns contain a black square in both rows, they account for $\texttt{count}(\texttt{count}-1)/2$ subgrids with black corners, because we can choose any two of them to form a subgrid.

To optimize this algorithm, we divide the grid into blocks of columns such that each block consists of $N$ consecutive columns. Then, each row is stored as a list of $N$-bit numbers that describe the colors of the squares. Now we can process $N$ columns at the same time using bit operations. In the following code, $\texttt{color}[y][i]$ is an $N$-bit number that encodes the colors of the $i$th block of columns on row $y$, and the number of columns that are black in both rows $a$ and $b$ is calculated as follows:

let mut count = 0;
for i in 0..n/N {
    count += (color[a][i]&color[b][i]).count_ones();
}

The resulting algorithm works in $O(n^3/N)$ time.

We generated a random grid of size $2500 \times 2500$ and compared the original and bit optimized implementation. The original code is 10 times slower.
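
To make the bit optimization concrete, here is a self-contained sketch on the same kind of small hypothetical grid; each row is packed into u32 blocks of 32 columns, so color[$y$][$k$] holds columns $32k \ldots 32k+31$ of row $y$ as bits:

#![allow(unused)]
fn main() {
const BLOCK: usize = 32;
// hypothetical 4x4 grid: 1 = black, 0 = white
let grid = [
    [1u32, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
];
let n = 4;
let blocks = (n + BLOCK - 1) / BLOCK;
// pack every row into blocks of bits
let mut color = vec![vec![0u32; blocks]; n];
for y in 0..n {
    for x in 0..n {
        if grid[y][x] == 1 {color[y][x / BLOCK] |= 1 << (x % BLOCK)}
    }
}
let mut subgrids: u64 = 0;
for a in 0..n {
    for b in a+1..n {
        // count common black columns, one block (up to 32 columns) at a time
        let mut count: u64 = 0;
        for k in 0..blocks {
            count += (color[a][k] & color[b][k]).count_ones() as u64;
        }
        if count > 1 {subgrids += count*(count-1)/2}
    }
}
println!("{subgrids}");
}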

Dynamic programming

Bit operations provide an efficient and convenient way to implement dynamic programming algorithms whose states contain subsets of elements, because such states can be stored as integers. Next we discuss examples of combining bit operations and dynamic programming.

Optimal selection

As a first example, consider the following problem: We are given the prices of $k$ products over $n$ days, and we want to buy each product exactly once. However, we are allowed to buy at most one product in a day. What is the minimum total price? For example, consider the following scenario ($k=3$ and $n=8$):

In this scenario, the minimum total price is $5$:

Let \( \texttt{price}[x][d]\) denote the price of product $x$ on day $d$. For example, in the above scenario \(\texttt{price}[2][3] = 7\). Then, let \(\texttt{total}(S,d)\) denote the minimum total price for buying a subset $S$ of products by day $d$. Using this function, the solution to the problem is \(\texttt{total}(\{0 \ldots k-1\},n-1)\).

First, \(\texttt{total}(\emptyset,d) = 0\), because it does not cost anything to buy an empty set, and \(\texttt{total}(\{x\},0) = \texttt{price}[x][0]\), because there is one way to buy one product on the first day. Then, the following recurrence can be used:

\[ \begin{equation*} \begin{split} \texttt{total}(S,d) = \min( & \texttt{total}(S,d-1), \\ & \min_{x \in S} (\texttt{total}(S \setminus x,d-1)+\texttt{price}[x][d])) \end{split} \end{equation*} \]

This means that we either do not buy any product on day $d$ or buy a product $x$ that belongs to $S$. In the latter case, we remove $x$ from $S$ and add the price of $x$ to the total price.

The next step is to calculate the values of the function using dynamic programming. To store the function values, we declare an array

#![allow(unused)]
fn main() {
const K: usize = 3;
const N: usize = 8;
let mut total = vec![vec![0; N]; 1<<K];
println!("{total:?}");
}

where $K$ and $N$ are suitably large constants. The first dimension of the array corresponds to a bit representation of a subset.

First, the cases where $d=0$ can be processed as follows. Note that buying more than one product on the first day is impossible, so those subsets must be initialized with a large value:

#![allow(unused)]
fn main() {
const K: usize = 3;
const N: usize = 8;
// a large value for impossible states (more than one product on day 0)
const INF: usize = usize::MAX / 2;
let price = [[6,9,5,2,8,9,1,6],[8,2,6,2,7,5,7,2],[5,3,9,7,3,5,1,4]];
let mut total = vec![vec![INF; N]; 1<<K];
total[0][0] = 0;
for x in 0..K {
    total[1<<x][0] = price[x][0];
}
println!("{total:?}");
}

Then, the recurrence translates into the following code:

#![allow(unused)]
fn main() {
const K: usize = 3;
const N: usize = 8;
// a large value for impossible states (more than one product on day 0)
const INF: usize = usize::MAX / 2;
let price = [[6,9,5,2,8,9,1,6],[8,2,6,2,7,5,7,2],[5,3,9,7,3,5,1,4]];
let mut total = vec![vec![INF; N]; 1<<K];
total[0][0] = 0;
for x in 0..K {
    total[1<<x][0] = price[x][0];
}
for d in 1..N {
    for s in 0..1<<K {
        total[s][d] = total[s][d-1];
        for x in 0..K {
            if s&(1<<x) != 0 {
                total[s][d] = total[s][d].min(total[s^(1<<x)][d-1]+price[x][d]);
            }
        }
    }
}
println!("{total:?}");
}

The time complexity of the algorithm is $O(n 2^k k)$.

From permutations to subsets

Using dynamic programming, it is often possible to change an iteration over permutations into an iteration over subsets 1. The benefit of this is that $n!$, the number of permutations, is much larger than $2^n$, the number of subsets. For example, if $n=20$, then $n! \approx 2.4 \cdot 10^{18}$ and $2^n \approx 10^6$. Thus, for certain values of $n$, we can efficiently go through the subsets but not through the permutations.

As an example, consider the following problem: There is an elevator with maximum weight $x$, and $n$ people with known weights who want to get from the ground floor to the top floor. What is the minimum number of rides needed if the people enter the elevator in an optimal order?

For example, suppose that $x=10$, $n=5$ and the weights are as follows:

person    weight
0         2
1         3
2         3
3         5
4         6

In this case, the minimum number of rides is 2. One optimal order is $\{0,2,3,1,4\}$, which partitions the people into two rides: first $\{0,2,3\}$ (total weight 10), and then $\{1,4\}$ (total weight 9).

The problem can be easily solved in $O(n! n)$ time by testing all possible permutations of $n$ people. However, we can use dynamic programming to get a more efficient $O(2^n n)$ time algorithm. The idea is to calculate for each subset of people two values: the minimum number of rides needed and the minimum weight of people who ride in the last group.

Let \(\texttt{weight}[p]\) denote the weight of person $p$. We define two functions: \(\texttt{rides}(S)\) is the minimum number of rides for a subset $S$, and \(\texttt{last}(S)\) is the minimum weight of the last ride. For example, in the above scenario

\[ \texttt{rides}(\{1,3,4\})=2 \hspace{10px} \textrm{and} \hspace{10px} \texttt{last}(\{1,3,4\})=5 \]

because the optimal rides are $\{1,4\}$ and $\{3\}$, and the second ride has weight 5. Of course, our final goal is to calculate the value of $\texttt{rides}(\{0 \ldots n-1\})$.

We can calculate the values of the functions recursively and then apply dynamic programming. The idea is to go through all people who belong to $S$ and optimally choose the last person $p$ who enters the elevator. Each such choice yields a subproblem for a smaller subset of people. If \(\texttt{last}(S \setminus p)+\texttt{weight}[p] \le x\), we can add $p$ to the last ride. Otherwise, we have to reserve a new ride that initially only contains $p$.

To implement dynamic programming, we declare an array

#![allow(unused)]
fn main() {
let N = 8;
let mut best = vec![(0,0); 1<<N];
println!("{best:?}");
}

that contains for each subset $S$ a pair $(\texttt{rides}(S),\texttt{last}(S))$. We set the value for an empty group as follows:

#![allow(unused)]
fn main() {
let N = 8;
let mut best = vec![(0,0); 1<<N];
best[0] = (1,0);
println!("{best:?}");
}

Then, we can fill the array as follows:

#![allow(unused)]
fn main() {
let N = 5;
let x = 10;
let mut best = vec![(0,0); 1<<N];
best[0] = (1,0);
let weight = [2,3,3,5,6];
for s in 1..1<<N{
    best[s] = (N+1,0);
    for p in 0..N {
        if s&(1<<p) != 0 {
            let mut option = best[s^(1<<p)];
            if option.1 + weight[p] <= x {
                // add p to an existing ride
                option.1 += weight[p];
            } else {
                // reserve a new ride for p
                option.0 +=1;
                option.1 = weight[p];
            }
            best[s] = best[s].min(option);
        }
    }
}
println!("{best:?}");
}

Note that the above loop guarantees that for any two subsets $S_1$ and $S_2$ such that $S_1 \subset S_2$, we process $S_1$ before $S_2$. Thus, the dynamic programming values are calculated in the correct order.

Counting subsets

Our last problem in this chapter is as follows: Let $X=\{0 \ldots n-1\}$, and each subset $S \subset X$ is assigned an integer \(\texttt{value}[S]\). Our task is to calculate for each $S$ \[ \texttt{sum}(S) = \sum_{A \subset S} \texttt{value}[A] \] i.e., the sum of values of subsets of $S$.

For example, suppose that $n=3$ and the values are as follows:

  • \(\texttt{value}[\emptyset] = 3\)
  • \(\texttt{value}[\{0\}] = 1\)
  • \(\texttt{value}[\{1\}] = 4\)
  • \(\texttt{value}[\{0,1\}] = 5\)
  • \(\texttt{value}[\{2\}] = 5\)
  • \(\texttt{value}[\{0,2\}] = 1\)
  • \(\texttt{value}[\{1,2\}] = 3\)
  • \(\texttt{value}[\{0,1,2\}] = 3\)

In this case, for example, \[ \begin{equation*} \begin{split} \texttt{sum}(\{0,2\}) &= \texttt{value}[\emptyset]+\texttt{value}[\{0\}]+\texttt{value}[\{2\}]+\texttt{value}[\{0,2\}] \\ &= 3 + 1 + 5 + 1 = 10. \end{split} \end{equation*} \]

Because there are a total of $2^n$ subsets, one possible solution is to go through all pairs of subsets in $O(2^{2n})$ time. However, using dynamic programming, we can solve the problem in $O(2^n n)$ time. The idea is to focus on sums where the elements that may be removed from $S$ are restricted.

Let $\texttt{partial}(S,k)$ denote the sum of values of subsets of $S$ with the restriction that only elements $0 \ldots k$ may be removed from $S$. For example,

\[ \texttt{partial}(\{0,2\},1)=\texttt{value}[\{2\}]+\texttt{value}[\{0,2\}] \]

because we may only remove elements $0 \ldots 1$. We can calculate values of sum using values of partial, because

\[ \texttt{sum}(S) = \texttt{partial}(S,n-1) \]

The base cases for the function are

\[ \texttt{partial}(S,-1)=\texttt{value}[S] \]

because in this case no elements can be removed from $S$. Then, in the general case we can use the following recurrence:

\[ \begin{equation*} \texttt{partial}(S,k) = \begin{cases} \texttt{partial}(S,k-1) & k \notin S \\ \texttt{partial}(S,k-1) + \texttt{partial}(S \setminus \{k\},k-1) & k \in S \end{cases} \end{equation*} \]

Here we focus on the element $k$. If $k \in S$, we have two options: we may either keep $k$ in $S$ or remove it from $S$.

There is a particularly clever way to implement the calculation of sums. We can declare an array

#![allow(unused)]
fn main() {
const N:usize = 3;
let mut sum = vec![0;1<<N];
println!("{sum:?}");
}

that will contain the sum of each subset. The array is initialized as follows:

#![allow(unused)]
fn main() {
const N:usize = 3;
let mut sum = vec![0;1<<N];
let value = [3,1,4,5,5,1,3,3];
for s in 0..1<<N {
    sum[s] = value[s];
}
println!("{sum:?}");
}

Then, we can fill the array as follows:

#![allow(unused)]
fn main() {
const N:usize = 3;
let mut sum = vec![0;1<<N];
let value = [3,1,4,5,5,1,3,3];
for s in 0..1<<N {
   sum[s] = value[s];
}
for k in 0..N {
    for s in 0..1<<N {
        if s&(1<<k) != 0 {sum[s] += sum[s^(1<<k)]}
    }
}
println!("{sum:?}");
}

This code calculates the values of \(\texttt{partial}(S,k)\) for $k=0 \ldots n-1$ into the array sum. Since \(\texttt{partial}(S,k)\) is always based on \(\texttt{partial}(S,k-1)\), we can reuse the array sum, which yields a very efficient implementation.


1 This technique was introduced in 1962 by M. Held and R. M. Karp [34].

Graph algorithms

Many programming problems can be solved by modeling the problem as a graph problem and using an appropriate graph algorithm. A typical example of a graph is a network of roads and cities in a country. Sometimes, though, the graph is hidden in the problem and it may be difficult to detect it.

This part of the book discusses graph algorithms, especially focusing on topics that are important in competitive programming. In this chapter, we go through concepts related to graphs, and study different ways to represent graphs in algorithms.

Graph terminology

A graph consists of nodes and edges. In this book, the variable $n$ denotes the number of nodes in a graph, and the variable $m$ denotes the number of edges. The nodes are numbered using integers $1,2,\ldots,n$.

For example, the following graph consists of 5 nodes and 7 edges:

A path leads from node $a$ to node $b$ through edges of the graph. The length of a path is the number of edges in it. For example, the above graph contains a path $1 \rightarrow 3 \rightarrow 4 \rightarrow 5$ of length 3 from node 1 to node 5:

A path is a cycle if the first and last nodes are the same. For example, the above graph contains a cycle $1 \rightarrow 3 \rightarrow 4 \rightarrow 1$. A path is simple if each node appears at most once in the path.

Connectivity

A graph is connected if there is a path between any two nodes. For example, the following graph is connected:

The following graph is not connected, because it is not possible to get from node 4 to any other node:

The connected parts of a graph are called its components. For example, the following graph contains three components: {$1,2,3$}, {$4,5,6,7$} and {$8$}.

A tree is a connected graph that consists of $n$ nodes and $n-1$ edges. There is a unique path between any two nodes of a tree. For example, the following graph is a tree:

Edge directions

A graph is directed if the edges can be traversed in one direction only. For example, the following graph is directed:

The above graph contains a path $3 \rightarrow 1 \rightarrow 2 \rightarrow 5$ from node $3$ to node $5$, but there is no path from node $5$ to node $3$.

Edge weights

In a weighted graph, each edge is assigned a weight. The weights are often interpreted as edge lengths. For example, the following graph is weighted:

The length of a path in a weighted graph is the sum of the edge weights on the path. For example, in the above graph, the length of the path $1 \rightarrow 2 \rightarrow 5$ is $12$, and the length of the path $1 \rightarrow 3 \rightarrow 4 \rightarrow 5$ is $11$. The latter path is the shortest path from node $1$ to node $5$.

Neighbors and degrees

Two nodes are neighbors or adjacent if there is an edge between them. The degree of a node is the number of its neighbors. For example, in the following graph, the neighbors of node 2 are 1, 4 and 5, so its degree is 3.

The sum of degrees in a graph is always $2m$, where $m$ is the number of edges, because each edge increases the degree of exactly two nodes by one. For this reason, the sum of degrees is always even.

A graph is regular if the degree of every node is a constant $d$. A graph is complete if the degree of every node is $n-1$, i.e., the graph contains all possible edges between the nodes.

In a directed graph, the indegree of a node is the number of edges that end at the node, and the outdegree of a node is the number of edges that start at the node. For example, in the following graph, the indegree of node 2 is 2, and the outdegree of node 2 is 1.

Colorings

In a coloring of a graph, each node is assigned a color so that no adjacent nodes have the same color.

A graph is bipartite if it is possible to color it using two colors. It turns out that a graph is bipartite exactly when it does not contain a cycle with an odd number of edges. For example, the graph

is bipartite, because it can be colored as follows:

However, the graph

is not bipartite, because it is not possible to color the following cycle of three nodes using two colors:

Simplicity

A graph is simple if no edge starts and ends at the same node, and there are no multiple edges between two nodes. Often we assume that graphs are simple. For example, the following graph is not simple:

Graph representation

There are several ways to represent graphs in algorithms. The choice of a data structure depends on the size of the graph and the way the algorithm processes it. Next we will go through three common representations.

Adjacency list representation

In the adjacency list representation, each node $x$ in the graph is assigned an adjacency list that consists of nodes to which there is an edge from $x$. Adjacency lists are the most popular way to represent graphs, and most algorithms can be efficiently implemented using them.

A convenient way to store the adjacency lists is to declare an array of vectors as follows:

#![allow(unused)]
fn main() {
const N:usize = 10;
let mut adj: [Vec<i32>; N]= Default::default();
println!("{adj:?}");
}

The constant $N$ is chosen so that all adjacency lists can be stored. For example, the graph

can be stored as follows:

#![allow(unused)]
fn main() {
const N:usize = 5;
let mut adj: [Vec<i32>; N]= Default::default();
adj[1].push(2);
adj[2].push(3);
adj[2].push(4);
adj[3].push(4);
adj[4].push(1);
println!("{:?}", adj);
}

If the graph is undirected, it can be stored in a similar way, but each edge is added in both directions.

For a weighted graph, the structure can be extended as follows:

#![allow(unused)]
fn main() {
const N:usize = 5;
let mut adj: [Vec<(i32, i32)>; N] = Default::default();
println!("{:?}", adj);
}

In this case, the adjacency list of node $a$ contains the pair $(b,w)$ whenever there is an edge from node $a$ to node $b$ with weight $w$. For example, the graph

can be stored as follows:

#![allow(unused)]
fn main() {
const N:usize = 5;
let mut adj: [Vec<(i32, i32)>; N] = Default::default();
adj[1].push((2,5));
adj[2].push((3,7));
adj[2].push((4,6));
adj[3].push((4,5));
adj[4].push((1,2));
println!("{:?}", adj);
}

The benefit of using adjacency lists is that we can efficiently find the nodes to which we can move from a given node through an edge. For example, the following loop goes through all nodes to which we can move from node $s$:

#![allow(unused)]
fn main() {
const N:usize = 5;
let mut adj: [Vec<(i32, i32)>; N] = Default::default();
adj[1].push((2,5));
adj[2].push((3,7));
adj[2].push((4,6));
adj[3].push((4,5));
adj[4].push((1,2));
let s = 2;
for &u in &adj[s] {
    // process node u
    println!("{u:?}");
}
}

Adjacency matrix representation

An adjacency matrix is a two-dimensional array that indicates which edges the graph contains. We can efficiently check from an adjacency matrix if there is an edge between two nodes. The matrix can be stored as an array

#![allow(unused)]
fn main() {
const N:usize = 4;
let adj = vec![vec![0; N]; N];
println!("{adj:?}");
}

where each value \(\texttt{adj}[a][b]\) indicates whether the graph contains an edge from node $a$ to node $b$. If the edge is included in the graph, then \(\texttt{adj}[a][b]=1\), and otherwise \(\texttt{adj}[a][b]=0\). For example, the graph corresponds to the following matrix:

If the graph is weighted, the adjacency matrix representation can be extended so that the matrix contains the weight of the edge if the edge exists. Using this representation, the graph

corresponds to the following matrix:

The drawback of the adjacency matrix representation is that the matrix contains $n^2$ elements, and usually most of them are zero. For this reason, the representation cannot be used if the graph is large.

Edge list representation

An edge list contains all edges of a graph in some order. This is a convenient way to represent a graph if the algorithm processes all edges of the graph and there is no need to find edges that start at a given node.

The edge list can be stored in a vector

#![allow(unused)]
fn main() {
let mut edges: Vec<(i32, i32)> = Vec::new();
println!("{edges:?}");
}

where each pair $(a,b)$ denotes that there is an edge from node $a$ to node $b$. Thus, the graph

can be represented as follows:

#![allow(unused)]
fn main() {
let mut edges: Vec<(i32, i32)> = Vec::new();
edges.push((1,2));
edges.push((2,3));
edges.push((2,4));
edges.push((3,4));
edges.push((4,1));
println!("{edges:?}");
}

If the graph is weighted, the structure can be extended as follows:

#![allow(unused)]
fn main() {
let mut edges: Vec<(i32, i32, i32)> = Vec::new();
println!("{edges:?}");
}

Each element in this list is of the form $(a,b,w)$, which means that there is an edge from node $a$ to node $b$ with weight $w$. For example, the graph

can be represented as follows:

#![allow(unused)]
fn main() {
let mut edges: Vec<(i32, i32, i32)> = Vec::new();
edges.push((1,2,5));
edges.push((2,3,7));
edges.push((2,4,6));
edges.push((3,4,5));
edges.push((4,1,2));
println!("{edges:?}");
}

Graph traversal

This chapter discusses two fundamental graph algorithms: depth-first search and breadth-first search. Both algorithms are given a starting node in the graph, and they visit all nodes that can be reached from the starting node. The difference in the algorithms is the order in which they visit the nodes.

Depth-first search

Depth-first search (DFS) is a straightforward graph traversal technique. The algorithm begins at a starting node, and proceeds to all other nodes that are reachable from the starting node using the edges of the graph.

Depth-first search always follows a single path in the graph as long as it finds new nodes. After this, it returns to previous nodes and begins to explore other parts of the graph. The algorithm keeps track of visited nodes, so that it processes each node only once.

Example

Let us consider how depth-first search processes the following graph:

We may begin the search at any node of the graph; now we will begin the search at node 1.

The search first proceeds to node 2:

After this, nodes 3 and 5 will be visited:

The neighbors of node 5 are 2 and 3, but the search has already visited both of them, so it is time to return to the previous nodes. Also the neighbors of nodes 3 and 2 have been visited, so we next move from node 1 to node 4:

After this, the search terminates because it has visited all nodes.

The time complexity of depth-first search is $O(n+m)$ where $n$ is the number of nodes and $m$ is the number of edges, because the algorithm processes each node and edge once.

Implementation

Depth-first search can be conveniently implemented using recursion. The following function dfs begins a depth-first search at a given node. The function assumes that the graph is stored as adjacency lists in an array

#![allow(unused)]
fn main() {
const N:usize = 10;
let mut adj: [Vec<i32>; N]= Default::default();
println!("{adj:?}");
}

and also maintains an array

#![allow(unused)]
fn main() {
const N:usize = 10;
let mut visited = vec![false; N];
println!("{visited:?}");
}

that keeps track of the visited nodes. Initially, each array value is false, and when the search arrives at node $s$, the value of visited[s] becomes true. The function can be implemented as follows:

#![allow(unused)]
fn main() {
const N:usize = 10;
let mut adj: [Vec<i32>; N]= Default::default();
let mut visited = vec![false; N];
// a small example graph: an edge between nodes 1 and 2
adj[1].push(2);
adj[2].push(1);
dfs(1, &adj, &mut visited);
fn dfs(s: usize, adj: &[Vec<i32>],visited: &mut Vec<bool>){
    if visited[s] {return}
    visited[s] = true;
    // process node s
    println!("{s}");
    for &u in &adj[s] {
        dfs(u as usize, adj, visited);
    }
}
}

Breadth-first search

Breadth-first search (BFS) visits the nodes in increasing order of their distance from the starting node. Thus, we can calculate the distance from the starting node to all other nodes using breadth-first search. However, breadth-first search is more difficult to implement than depth-first search.

Breadth-first search goes through the nodes one level after another. First the search explores the nodes whose distance from the starting node is 1, then the nodes whose distance is 2, and so on. This process continues until all nodes have been visited.

Example

Let us consider how breadth-first search processes the following graph:

Suppose that the search begins at node 1. First, we process all nodes that can be reached from node 1 using a single edge:

After this, we proceed to nodes 3 and 5:

Finally, we visit node 6:

Now we have calculated the distances from the starting node to all nodes of the graph. The distances are as follows:

node    distance
1       0
2       1
3       2
4       1
5       2
6       3

Like in depth-first search, the time complexity of breadth-first search is $O(n+m)$, where $n$ is the number of nodes and $m$ is the number of edges.

Implementation

Breadth-first search is more difficult to implement than depth-first search, because the algorithm visits nodes in different parts of the graph. A typical implementation is based on a queue that contains nodes. At each step, the next node in the queue will be processed.

The following code assumes that the graph is stored as adjacency lists and maintains the following data structures:

#![allow(unused)]
fn main() {
use std::collections::VecDeque;
const N:usize = 10;
let mut q = VecDeque::<usize>::new();
let mut visited = [false; N];
let mut distance = [0; N];
}

The queue q contains nodes to be processed in increasing order of their distance. New nodes are always added to the end of the queue, and the node at the beginning of the queue is the next node to be processed. The array visited indicates which nodes the search has already visited, and the array distance will contain the distances from the starting node to all nodes of the graph.

The search can be implemented as follows, starting at node $x$:

#![allow(unused)]
fn main() {
use std::collections::VecDeque;
const N:usize = 10;
let mut q = VecDeque::<usize>::new();
let mut visited = [false; N];
let mut distance = [0; N];
let mut adj: [Vec<usize>; N]= Default::default();
let x = 0;
visited[x] = true;
distance[x] = 0;
q.push_back(x);
while !q.is_empty() {
    let s = q.pop_front().unwrap();
    // process node s
    for &u in &adj[s] {
        if visited[u] {continue}
        visited[u] = true;
        distance[u] = distance[s]+1;
        // new nodes are added to the end of the queue
        q.push_back(u);
    }
}
}

Applications

Using the graph traversal algorithms, we can check many properties of graphs. Usually, both depth-first search and breadth-first search may be used, but in practice, depth-first search is a better choice, because it is easier to implement. In the following applications we will assume that the graph is undirected.

Connectivity check

A graph is connected if there is a path between any two nodes of the graph. Thus, we can check if a graph is connected by starting at an arbitrary node and finding out if we can reach all other nodes.

For example, in the following graph, a depth-first search from node $1$ visits the following nodes:

Since the search did not visit all the nodes, we can conclude that the graph is not connected. In a similar way, we can also find all connected components of a graph by iterating through the nodes and always starting a new depth-first search if the current node does not belong to any component yet.
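
For example, the following sketch counts and labels the components of a small hypothetical graph (edges 1--2 and 2--3, with node 4 isolated) by starting a new depth-first search at every node that does not belong to a component yet:

#![allow(unused)]
fn main() {
// hypothetical graph: edges 1-2 and 2-3, node 4 is isolated
let adj = [vec![], vec![2], vec![1, 3], vec![2], vec![]];
let n = 4;
// comp[s] = number of the component of node s (0 = not visited yet)
let mut comp = vec![0; n+1];
let mut components = 0;
for s in 1..=n {
    if comp[s] == 0 {
        components += 1;
        dfs(s, components, &adj, &mut comp);
    }
}
println!("components: {components}, labels: {:?}", &comp[1..]);
fn dfs(s: usize, c: usize, adj: &[Vec<usize>], comp: &mut [usize]) {
    if comp[s] != 0 {return}
    comp[s] = c;
    for &u in &adj[s] {
        dfs(u, c, adj, comp);
    }
}
}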

Finding cycles

A graph contains a cycle if during a graph traversal, we find a node whose neighbor (other than the previous node in the current path) has already been visited. For example, the graph

contains two cycles and we can find one of them as follows:

After moving from node 2 to node 5 we notice that the neighbor 3 of node 5 has already been visited. Thus, the graph contains a cycle that goes through node 3, for example, $3 \rightarrow 2 \rightarrow 5 \rightarrow 3$.

Another way to find out whether a graph contains a cycle is to simply calculate the number of nodes and edges in every component. If a component contains $c$ nodes and no cycle, it must contain exactly $c-1$ edges (so it has to be a tree). If there are $c$ or more edges, the component surely contains a cycle.
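
As a sketch, the traversal-based check described above can be implemented as follows for a small hypothetical graph (a triangle 1--2--3 with an extra node 4 attached to node 3); as elsewhere, the graph is assumed to be simple:

#![allow(unused)]
fn main() {
// hypothetical graph: edges 1-2, 2-3, 1-3 and 3-4
let adj = [vec![], vec![2, 3], vec![1, 3], vec![1, 2, 4], vec![3]];
let n = 4;
let mut visited = vec![false; n+1];
println!("cycle found: {}", dfs(1, 0, &adj, &mut visited));
// returns true if the search finds an already visited node
// other than the node we just came from
fn dfs(s: usize, e: usize, adj: &[Vec<usize>], visited: &mut [bool]) -> bool {
    visited[s] = true;
    let mut found = false;
    for &u in &adj[s] {
        if u == e {continue}
        if visited[u] {found = true}
        else if dfs(u, s, adj, visited) {found = true}
    }
    found
}
}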

Bipartiteness check

A graph is bipartite if its nodes can be colored using two colors so that there are no adjacent nodes with the same color. It is surprisingly easy to check if a graph is bipartite using graph traversal algorithms.

The idea is to color the starting node blue, all its neighbors red, all their neighbors blue, and so on. If at some point of the search we notice that two adjacent nodes have the same color, this means that the graph is not bipartite. Otherwise the graph is bipartite and one coloring has been found.

For example, the graph

is not bipartite, because a search from node 1 proceeds as follows:

We notice that the color of both nodes 2 and 5 is red, while they are adjacent nodes in the graph. Thus, the graph is not bipartite.

This algorithm always works, because when there are only two colors available, the color of the starting node in a component determines the colors of all other nodes in the component. It does not make any difference whether the starting node is red or blue.
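
A sketch of this coloring check, using breadth-first search on a small hypothetical graph (a triangle 1--2--3 plus node 4 attached to node 1, which is not bipartite), could look as follows:

#![allow(unused)]
fn main() {
use std::collections::VecDeque;
// hypothetical graph: edges 1-2, 2-3, 1-3 and 1-4
let adj = [vec![], vec![2, 3, 4], vec![1, 3], vec![1, 2], vec![1]];
let n = 4;
// 0 = uncolored, 1 = blue, 2 = red
let mut color = vec![0; n+1];
let mut bipartite = true;
let mut q = VecDeque::new();
color[1] = 1;
q.push_back(1);
while let Some(s) = q.pop_front() {
    for &u in &adj[s] {
        if color[u] == 0 {
            // give the neighbor the opposite color
            color[u] = 3 - color[s];
            q.push_back(u);
        } else if color[u] == color[s] {
            // two adjacent nodes have the same color
            bipartite = false;
        }
    }
}
println!("bipartite: {bipartite}");
}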

Note that in the general case, it is difficult to find out if the nodes in a graph can be colored using $k$ colors so that no adjacent nodes have the same color. Even when $k=3$, no efficient algorithm is known; in fact, the problem is NP-hard.

Shortest paths

Finding a shortest path between two nodes of a graph is an important problem that has many practical applications. For example, a natural problem related to a road network is to calculate the shortest possible length of a route between two cities, given the lengths of the roads.

In an unweighted graph, the length of a path equals the number of its edges, and we can simply use breadth-first search to find a shortest path. However, in this chapter we focus on weighted graphs where more sophisticated algorithms are needed for finding shortest paths.

Bellman–Ford algorithm

The Bellman–Ford algorithm finds shortest paths from a starting node to all nodes of the graph. The algorithm can process all kinds of graphs, provided that the graph does not contain a cycle with negative length. If the graph contains a negative cycle, the algorithm can detect this.

The algorithm keeps track of distances from the starting node to all nodes of the graph. Initially, the distance to the starting node is 0 and the distance to all other nodes is infinite. The algorithm reduces the distances by finding edges that shorten the paths until it is not possible to reduce any distance.

Example

Let us consider how the Bellman–Ford algorithm works in the following graph:

Each node of the graph is assigned a distance. Initially, the distance to the starting node is 0, and the distance to all other nodes is infinite.

The algorithm searches for edges that reduce distances. First, all edges from node 1 reduce distances:

After this, edges $2 \rightarrow 5$ and $3 \rightarrow 4$ reduce distances:

Finally, there is one more change:

After this, no edge can reduce any distance. This means that the distances are final, and we have successfully calculated the shortest distances from the starting node to all nodes of the graph.

For example, the shortest distance 3 from node 1 to node 5 corresponds to the following path:

Implementation

The following implementation of the Bellman–Ford algorithm determines the shortest distances from a node $x$ to all nodes of the graph. The code assumes that the graph is stored as an edge list edges that consists of tuples of the form $(a,b,w)$, meaning that there is an edge from node $a$ to node $b$ with weight $w$.

The algorithm consists of $n-1$ rounds, and on each round the algorithm goes through all edges of the graph and tries to reduce the distances. The algorithm constructs an array distance that will contain the distances from $x$ to all nodes of the graph. The constant INF denotes an infinite distance.

#![allow(unused)]
fn main() {
use std::cmp::Ordering;
const INF:usize = (1./0.) as u32 as usize;
let mut x = 1;
let mut n = 6;
let mut edges=vec![(1,2,5), (1,4,7), (1,3,3), (2,4,3), (2,5,2), (2, 1, 5), (3, 4, 1), (3, 1, 3), (4,5,2), (4, 1, 7), (4, 3, 1), (4, 2, 3),  (5, 4, 2), (5, 2, 2)];
let mut distance = vec![INF; n];
distance[x] = 0;
for _ in 0..n{
    for e in &edges{
        let (a, b, w) = *e;
        distance[b] = distance[b].min(distance[a]+w);
    }
}
println!("{:?}", &distance[1..]);
}

The time complexity of the algorithm is $O(nm)$, because the algorithm consists of $n-1$ rounds and iterates through all $m$ edges during a round. If there are no negative cycles in the graph, all distances are final after $n-1$ rounds, because each shortest path can contain at most $n-1$ edges.

In practice, the final distances can usually be found faster than in $n-1$ rounds. Thus, a possible way to make the algorithm more efficient is to stop the algorithm if no distance can be reduced during a round.
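
One way to add this early stop to the implementation above is sketched below; it uses the same edge list and a large sentinel value for the infinite distance (chosen so that adding an edge weight cannot overflow):

#![allow(unused)]
fn main() {
// a large value that safely represents an infinite distance
const INF: usize = usize::MAX / 2;
let x = 1;
let n = 6;
let edges = vec![(1,2,5), (1,4,7), (1,3,3), (2,4,3), (2,5,2), (2, 1, 5), (3, 4, 1), (3, 1, 3), (4,5,2), (4, 1, 7), (4, 3, 1), (4, 2, 3), (5, 4, 2), (5, 2, 2)];
let mut distance = vec![INF; n];
distance[x] = 0;
for _ in 0..n-1 {
    let mut changed = false;
    for &(a, b, w) in &edges {
        if distance[a] + w < distance[b] {
            distance[b] = distance[a] + w;
            changed = true;
        }
    }
    // no distance was reduced during this round: the distances are final
    if !changed {break}
}
println!("{:?}", &distance[1..]);
}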

Negative cycles

The Bellman–Ford algorithm can also be used to check if the graph contains a cycle with negative length. For example, the graph

contains a negative cycle $2 \rightarrow 3 \rightarrow 4 \rightarrow 2$ with length $-4$.

If the graph contains a negative cycle, we can make any path that contains the cycle arbitrarily short by repeating the cycle again and again. Thus, the concept of a shortest path is not meaningful in this situation.

A negative cycle can be detected using the Bellman–Ford algorithm by running the algorithm for $n$ rounds. If the last round reduces any distance, the graph contains a negative cycle. Note that this algorithm can be used to search for a negative cycle in the whole graph regardless of the starting node.
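
A sketch of this check on a small hypothetical graph that contains the negative cycle $2 \rightarrow 3 \rightarrow 4 \rightarrow 2$ of length $-4$ (the other edge weights below are made up for illustration):

#![allow(unused)]
fn main() {
// a large sentinel distance; dividing by 2 leaves room for adding edge weights
const INF: i64 = i64::MAX / 2;
let n = 4;
// directed edges (a, b, w); the cycle 2 -> 3 -> 4 -> 2 has length 1 + 1 - 6 = -4
let edges: Vec<(usize, usize, i64)> = vec![(1,2,3), (2,3,1), (3,4,1), (4,2,-6)];
let mut distance = vec![INF; n+1];
distance[1] = 0;
let mut negative_cycle = false;
for round in 0..n {
    for &(a, b, w) in &edges {
        if distance[a] + w < distance[b] {
            distance[b] = distance[a] + w;
            // a reduction on round n means that the graph has a negative cycle
            if round == n-1 {negative_cycle = true}
        }
    }
}
println!("negative cycle: {negative_cycle}");
}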

SPFA Algorithm

The SPFA algorithm (Shortest Path Faster Algorithm) [20] is a variant of the Bellman–Ford algorithm, that is often more efficient than the original algorithm. The SPFA algorithm does not go through all the edges on each round, but instead, it chooses the edges to be examined in a more intelligent way.

The algorithm maintains a queue of nodes that might be used for reducing the distances. First, the algorithm adds the starting node $x$ to the queue. Then, the algorithm always processes the first node in the queue, and when an edge $a \rightarrow b$ reduces a distance, node $b$ is added to the queue.

The efficiency of the SPFA algorithm depends on the structure of the graph: the algorithm is often efficient, but its worst case time complexity is still $O(nm)$ and it is possible to create inputs that make the algorithm as slow as the original Bellman–Ford algorithm.
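
A possible sketch of the algorithm, using the same graph as in the Bellman–Ford implementation above (stored here as adjacency lists) and a flag that records whether a node is already waiting in the queue, which is a common refinement:

#![allow(unused)]
fn main() {
use std::collections::VecDeque;
const INF: i64 = i64::MAX / 2;
let x = 1;
let n = 6;
// adjacency lists (b, w) of the same graph as in the Bellman-Ford example
let adj: [Vec<(usize, i64)>; 6] = [
    vec![],
    vec![(2,5), (4,7), (3,3)],
    vec![(4,3), (5,2), (1,5)],
    vec![(4,1), (1,3)],
    vec![(5,2), (1,7), (3,1), (2,3)],
    vec![(4,2), (2,2)],
];
let mut distance = vec![INF; n];
let mut in_queue = vec![false; n];
let mut q = VecDeque::new();
distance[x] = 0;
q.push_back(x);
in_queue[x] = true;
while let Some(a) = q.pop_front() {
    in_queue[a] = false;
    for &(b, w) in &adj[a] {
        if distance[a] + w < distance[b] {
            distance[b] = distance[a] + w;
            // node b might reduce other distances, so add it to the queue
            if !in_queue[b] {
                q.push_back(b);
                in_queue[b] = true;
            }
        }
    }
}
println!("{:?}", &distance[1..]);
}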


1 The algorithm is named after R. E. Bellman and L. R. Ford who published it independently in 1958 and 1956, respectively [5, 24].

Dijkstra’s algorithm

Dijkstra's algorithm finds shortest paths from the starting node to all nodes of the graph, like the Bellman–Ford algorithm. The benefit of Dijkstra's algorithm is that it is more efficient and can be used for processing large graphs. However, the algorithm requires that there are no negative weight edges in the graph.

Like the Bellman–Ford algorithm, Dijkstra's algorithm maintains distances to the nodes and reduces them during the search. Dijkstra's algorithm is efficient, because it only processes each edge in the graph once, using the fact that there are no negative edges.

Example

Let us consider how Dijkstra's algorithm works in the following graph when the starting node is node 1:

Like in the Bellman–Ford algorithm, initially the distance to the starting node is 0 and the distance to all other nodes is infinite.

At each step, Dijkstra's algorithm selects a node that has not been processed yet and whose distance is as small as possible. The first such node is node 1 with distance 0.

When a node is selected, the algorithm goes through all edges that start at the node and reduces the distances using them:

In this case, the edges from node 1 reduced the distances of nodes 2, 4 and 5, whose distances are now 5, 9 and 1.

The next node to be processed is node 5 with distance 1. This reduces the distance to node 4 from 9 to 3:

After this, the next node is node 4, which reduces the distance to node 3 to 9:

A remarkable property in Dijkstra's algorithm is that whenever a node is selected, its distance is final. For example, at this point of the algorithm, the distances 0, 1 and 3 are the final distances to nodes 1, 5 and 4.

After this, the algorithm processes the two remaining nodes, and the final distances are as follows:

Negative edges

The efficiency of Dijkstra's algorithm is based on the fact that the graph does not contain negative edges. If there is a negative edge, the algorithm may give incorrect results. As an example, consider the following graph:

The shortest path from node 1 to node 4 is $1 \rightarrow 3 \rightarrow 4$ and its length is 1. However, Dijkstra's algorithm finds the path $1 \rightarrow 2 \rightarrow 4$ by following the minimum weight edges. The algorithm does not take into account that on the other path, the weight $-5$ compensates the previous large weight $6$.

Implementation

The following implementation of Dijkstra's algorithm calculates the minimum distances from a node x to other nodes of the graph. The graph is stored as adjacency lists so that adj[$a$] contains a pair (b,w) whenever there is an edge from node a to node b with weight w.

An efficient implementation of Dijkstra's algorithm requires that it is possible to efficiently find the minimum distance node that has not been processed. An appropriate data structure for this is a priority queue that contains the nodes ordered by their distances. Using a priority queue, the next node to be processed can be retrieved in logarithmic time.

In the following code, the priority queue q contains pairs of the form (Reverse(d),x), meaning that the current distance to node x is d. The array distance contains the distance to each node, and the array processed indicates whether a node has been processed. Initially the distance is 0 to x and INF to all other nodes.

#![allow(unused)]
fn main() {
use std::cmp::Reverse;
const INF:isize = (1./0.) as u32 as isize;
let mut x = 1;
let mut n = 6;
let mut adj: [Vec<_>; 6]=[vec![], vec![(2,5), (4,7), (3,3)], vec![(4,3), (5,2), (1, 5)], vec![(4, 1), (1, 3)], vec![(5,2), (1, 7), (3, 1), (2, 3)],  vec![(4, 2), (2, 2)]];
let mut q = std::collections::BinaryHeap::new();
let mut processed = vec![false; n];
let mut distance = vec![INF; n];
distance[x] = 0;
q.push((Reverse(0),x));
while !q.is_empty(){
    let a = q.pop().unwrap().1;
    if processed[a] {continue}
    processed[a] = true;
    for &u in &adj[a]{
        let (b, w) = u;
        if (distance[a] + w) < distance[b] {
            distance[b] = distance[a] + w;
            q.push((Reverse(distance[b]), b));
        }
    }
}
println!("{:?}", &distance[1..]);
}

Note that the distances pushed to the priority queue are wrapped in Reverse(). The reason for this is that the default Rust BinaryHeap is a max-heap, i.e., it retrieves maximum elements, while we want to retrieve minimum elements. The std::cmp::Reverse helper reverses the ordering used by the heap. Also note that there may be several instances of the same node in the priority queue; however, only the instance with the minimum distance will be processed.

The time complexity of the above implementation is $O(n+m \log m)$, because the algorithm goes through all nodes of the graph and adds for each edge at most one distance to the priority queue.


1 E. W. Dijkstra published the algorithm in 1959 [14]; however, his original paper does not mention how to implement the algorithm efficiently.

Floyd–Warshall algorithm

The Floyd–Warshall algorithm provides an alternative way to approach the problem of finding shortest paths. Unlike the other algorithms of this chapter, it finds all shortest paths between the nodes in a single run.

The algorithm maintains a two-dimensional array that contains distances between the nodes. First, distances are calculated only using direct edges between the nodes, and after this, the algorithm reduces distances by using intermediate nodes in paths.

Example

Let us consider how the Floyd–Warshall algorithm works in the following graph:

Initially, the distance from each node to itself is $0$, and the distance between nodes $a$ and $b$ is $x$ if there is an edge between nodes $a$ and $b$ with weight $x$. All other distances are infinite.

      1   2   3   4   5
  1   0   5   ∞   9   1
  2   5   0   2   ∞   ∞
  3   ∞   2   0   7   ∞
  4   9   ∞   7   0   2
  5   1   ∞   ∞   2   0

The algorithm consists of consecutive rounds. On each round, the algorithm selects a new node that can act as an intermediate node in paths from now on, and distances are reduced using this node.

On the first round, node 1 is the new intermediate node. There is a new path between nodes 2 and 4 with length 14, because node 1 connects them. There is also a new path between nodes 2 and 5 with length 6.

      1   2   3   4   5
  1   0   5   ∞   9   1
  2   5   0   2  14   6
  3   ∞   2   0   7   ∞
  4   9  14   7   0   2
  5   1   6   ∞   2   0

On the second round, node 2 is the new intermediate node. This creates new paths between nodes 1 and 3 and between nodes 3 and 5:

      1   2   3   4   5
  1   0   5   7   9   1
  2   5   0   2  14   6
  3   7   2   0   7   8
  4   9  14   7   0   2
  5   1   6   8   2   0

On the third round, node 3 is the new intermediate node. There is a new path between nodes 2 and 4:

      1   2   3   4   5
  1   0   5   7   9   1
  2   5   0   2   9   6
  3   7   2   0   7   8
  4   9   9   7   0   2
  5   1   6   8   2   0

The algorithm continues like this, until all nodes have been appointed intermediate nodes. After the algorithm has finished, the array contains the minimum distances between any two nodes:

      1   2   3   4   5
  1   0   5   7   3   1
  2   5   0   2   8   6
  3   7   2   0   7   8
  4   3   8   7   0   2
  5   1   6   8   2   0

For example, the array tells us that the shortest distance between nodes 2 and 4 is 8. This corresponds to the following path:

Implementation

The advantage of the Floyd–Warshall algorithm is that it is easy to implement. The following code constructs a distance matrix where distance[a][b] is the shortest distance between nodes $a$ and $b$. First, the algorithm initializes distance using the adjacency matrix adj of the graph:

#![allow(unused)]
fn main() {
const INF:isize = (1./0.) as u32 as isize;
let mut n = 5;
let mut adj = [[INF, INF, INF, INF, INF, INF], [INF, INF, 5, INF, 9, 1], [INF, 5, INF, 2, INF, INF], [INF, INF, 2, INF, 7, INF], [INF, 9, INF, 7, INF, 2], [INF, 1, INF, INF, 2, INF]];
let mut distance = adj;
for i in 1..=n{
    for j in 1..=n {
        if i == j {distance[i][j] = 0}
        else if adj[i][j] != INF {distance[i][j] = adj[i][j]}
        else {distance[i][j] = INF}
    }
}
for i in 1..=n {
    for j in 1..=n {
        if distance[i][j] == INF {print!("∞ ")} else {print!("{} ", distance[i][j])}
    }
    println!();
}
}

After this, the shortest distances can be found as follows:

#![allow(unused)]
fn main() {
use std::cmp::Ordering;
const INF:isize = (1./0.) as u32 as isize;
let mut n = 5;
let mut adj = [[INF, INF, INF, INF, INF, INF], [INF, INF, 5, INF, 9, 1], [INF, 5, INF, 2, INF, INF], [INF, INF, 2, INF, 7, INF], [INF, 9, INF, 7, INF, 2], [INF, 1, INF, INF, 2, INF]];
let mut distance = adj;
for i in 1..=n{
    for j in 1..=n {
        if i == j {distance[i][j] = 0}
        else if adj[i][j] != INF {distance[i][j] = adj[i][j]}
        else {distance[i][j] = INF}
    }
}
for k in 1..=n{
    for i in 1..=n{
        for j in 1..=n{
            distance[i][j] = distance[i][j].min(distance[i][k]+distance[k][j]);
        }
    }
}
for i in 1..=n {
    for j in 1..=n{
        print!("{} ", distance[i][j]);
    }
  println!("");
}
}

The time complexity of the algorithm is $O(n^3)$, because it contains three nested loops that go through the nodes of the graph.

Since the implementation of the Floyd–Warshall algorithm is simple, the algorithm can be a good choice even if it is only needed to find a single shortest path in the graph. However, the algorithm can only be used when the graph is so small that a cubic time complexity is fast enough.


1 The algorithm is named after R. W. Floyd and S. Warshall who published it independently in 1962 [23, 70].

Tree algorithms

A tree is a connected, acyclic graph that consists of $n$ nodes and $n-1$ edges. Removing any edge from a tree divides it into two components, and adding any edge to a tree creates a cycle. Moreover, there is always a unique path between any two nodes of a tree.

For example, the following tree consists of 8 nodes and 7 edges:

The leaves of a tree are the nodes with degree 1, i.e., with only one neighbor. For example, the leaves of the above tree are nodes 3, 5, 7 and 8.

In a rooted tree, one of the nodes is appointed the root of the tree, and all other nodes are placed underneath the root. For example, in the following tree, node 1 is the root node.

In a rooted tree, the children of a node are its lower neighbors, and the parent of a node is its upper neighbor. Each node has exactly one parent, except for the root that does not have a parent. For example, in the above tree, the children of node 2 are nodes 5 and 6, and its parent is node 1.

The structure of a rooted tree is recursive: each node of the tree acts as the root of a subtree that contains the node itself and all nodes that are in the subtrees of its children. For example, in the above tree, the subtree of node 2 consists of nodes 2, 5, 6 and 8:

Tree traversal

General graph traversal algorithms can be used to traverse the nodes of a tree. However, the traversal of a tree is easier to implement than that of a general graph, because there are no cycles in the tree and it is not possible to reach a node from multiple directions.

The typical way to traverse a tree is to start a depth-first search at an arbitrary node. The following recursive function can be used:

#![allow(unused)]
fn main() {
let adj = [vec![], vec![2, 3, 4], vec![5, 6], vec![], vec![7], vec![], vec![8], vec![], vec![]];
dfs(1,0,&adj);
fn dfs(s: usize, e: usize, adj: &[Vec<usize>]) {
    // process node s
println!("{s}");
    for u in &adj[s] {
        if *u != e {dfs(*u, s, adj)}
    }
}
}

The function is given two parameters: the current node $s$ and the previous node $e$. The purpose of the parameter $e$ is to make sure that the search only moves to nodes that have not been visited yet.

The following function call starts the search at node $x$:

dfs(x, 0, &adj);

In the first call $e=0$, because there is no previous node, and it is allowed to proceed to any direction in the tree.

Dynamic programming

Dynamic programming can be used to calculate some information during a tree traversal. Using dynamic programming, we can, for example, calculate in $O(n)$ time for each node of a rooted tree the number of nodes in its subtree or the length of the longest path from the node to a leaf.

As an example, let us calculate for each node $s$ a value count[s]: the number of nodes in its subtree. The subtree contains the node itself and all nodes in the subtrees of its children, so we can calculate the number of nodes recursively using the following code:

#![allow(unused)]
fn main() {
let adj = [vec![], vec![2, 3, 4], vec![5, 6], vec![], vec![7], vec![], vec![8], vec![], vec![]];
let mut count = vec![0; 9];
dfs(1,0,&adj, &mut count);
println!("count: {:?}", &count);
fn dfs(s: usize, e: usize, adj: &[Vec<usize>], count: &mut [usize]){
    count[s] = 1;
    for u in &adj[s] {
        if *u == e {continue}
        dfs(*u, s, adj, count);
        count[s] += count[*u];
    }
}
}

Diameter

The diameter of a tree is the maximum length of a path between two nodes. For example, consider the following tree:

The diameter of this tree is 4, which corresponds to the following path:

Note that there may be several maximum-length paths. In the above path, we could replace node 6 with node 5 to obtain another path with length 4.

Next we will discuss two $O(n)$ time algorithms for calculating the diameter of a tree. The first algorithm is based on dynamic programming, and the second algorithm uses two depth-first searches.

Algorithm 1

A general way to approach many tree problems is to first root the tree arbitrarily. After this, we can try to solve the problem separately for each subtree. Our first algorithm for calculating the diameter is based on this idea.

An important observation is that every path in a rooted tree has a highest point: the highest node that belongs to the path. Thus, we can calculate for each node the length of the longest path whose highest point is the node. One of those paths corresponds to the diameter of the tree.

For example, in the following tree, node 1 is the highest point on the path that corresponds to the diameter:

We calculate for each node $x$ two values:

  • to_leaf(x): the maximum length of a path from x to any leaf
  • max_length(x): the maximum length of a path whose highest point is $x$

For example, in the above tree, to_leaf(1)=2, because there is a path $1 \rightarrow 2 \rightarrow 6$, and max_length(1)=4, because there is a path $6 \rightarrow 2 \rightarrow 1 \rightarrow 4 \rightarrow 7$. In this case, max_length(1) equals the diameter.

Dynamic programming can be used to calculate the above values for all nodes in $O(n)$ time. First, to calculate to_leaf(x), we go through the children of $x$, choose a child $c$ with maximum to_leaf(c) and add one to this value. Then, to calculate max_length(x), we choose two distinct children $a$ and $b$ such that the sum to_leaf(a)+to_leaf(b) is maximum and add two to this sum.
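
A sketch of this dynamic programming, assuming the example tree above has the edges 1--2, 1--3, 1--4, 2--5, 2--6 and 4--7 (which is consistent with the paths mentioned in the text):

#![allow(unused)]
fn main() {
// assumed example tree with edges 1-2, 1-3, 1-4, 2-5, 2-6 and 4-7
let adj = [vec![], vec![2, 3, 4], vec![1, 5, 6], vec![1], vec![1, 7], vec![2], vec![2], vec![4]];
let n = 7;
let mut to_leaf = vec![0; n+1];
let mut max_length = vec![0; n+1];
dfs(1, 0, &adj, &mut to_leaf, &mut max_length);
let diameter = max_length.iter().max().unwrap();
println!("diameter: {diameter}");
fn dfs(s: usize, e: usize, adj: &[Vec<usize>], to_leaf: &mut [usize], max_length: &mut [usize]) {
    // the two largest values of to_leaf(c)+1 among the children c of s
    let (mut best1, mut best2) = (0, 0);
    for &u in &adj[s] {
        if u == e {continue}
        dfs(u, s, adj, to_leaf, max_length);
        let len = to_leaf[u] + 1;
        if len > best1 {best2 = best1; best1 = len}
        else if len > best2 {best2 = len}
    }
    to_leaf[s] = best1;
    max_length[s] = best1 + best2;
}
}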

Algorithm 2

Another efficient way to calculate the diameter of a tree is based on two depth-first searches. First, we choose an arbitrary node $a$ in the tree and find the farthest node $b$ from $a$. Then, we find the farthest node $c$ from $b$. The diameter of the tree is the distance between $b$ and $c$.

In the following graph, $a$, $b$ and $c$ could be:

This is an elegant method, but why does it work?

It helps to draw the tree differently so that the path that corresponds to the diameter is horizontal, and all other nodes hang from it:

Node $x$ indicates the place where the path from node $a$ joins the path that corresponds to the diameter. The farthest node from $a$ is node $b$, node $c$ or some other node that is at least as far from node $x$. Thus, this node is always a valid choice for an endpoint of a path that corresponds to the diameter.
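
Under the same assumption about the example tree, the two-search method can be sketched as follows; each search returns the farthest node from the starting node together with its distance:

#![allow(unused)]
fn main() {
// the same assumed example tree (edges 1-2, 1-3, 1-4, 2-5, 2-6, 4-7)
let adj = [vec![], vec![2, 3, 4], vec![1, 5, 6], vec![1], vec![1, 7], vec![2], vec![2], vec![4]];
// first search: find the farthest node b from an arbitrary node a
let (b, _) = dfs(1, 0, 0, &adj);
// second search: the distance to the farthest node c from b is the diameter
let (c, diameter) = dfs(b, 0, 0, &adj);
println!("endpoints: {b} and {c}, diameter: {diameter}");
// returns the farthest node from s and its distance, using depth-first search
fn dfs(s: usize, e: usize, d: usize, adj: &[Vec<usize>]) -> (usize, usize) {
    let mut best = (s, d);
    for &u in &adj[s] {
        if u == e {continue}
        let cand = dfs(u, s, d + 1, adj);
        if cand.1 > best.1 {best = cand}
    }
    best
}
}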

All longest paths

Our next problem is to calculate for every node in the tree the maximum length of a path that begins at the node. This can be seen as a generalization of the tree diameter problem, because the largest of those lengths equals the diameter of the tree. Also this problem can be solved in $O(n)$ time.

As an example, consider the following tree:

Let maxLength(x) denote the maximum length of a path that begins at node $x$. For example, in the above tree, maxLength(4)=3, because there is a path $4 \rightarrow 1 \rightarrow 2 \rightarrow 6$. Here is a complete table of the values:

node x          1   2   3   4   5   6
maxLength(x)    2   2   3   3   3   3

Also in this problem, a good starting point for solving the problem is to root the tree arbitrarily:

The first part of the problem is to calculate for every node $x$ the maximum length of a path that goes through a child of $x$. For example, the longest path from node 1 goes through its child 2:

This part is easy to solve in $O(n)$ time, because we can use dynamic programming as we have done previously.

Then, the second part of the problem is to calculate for every node $x$ the maximum length of a path through its parent $p$. For example, the longest path from node 3 goes through its parent 1:

At first glance, it seems that we should choose the longest path from $p$. However, this does not always work, because the longest path from $p$ may go through $x$. Here is an example of this situation:

Still, we can solve the second part in $O(n)$ time by storing two maximum lengths for each node $x$:

  • maxLength_1(x): the maximum length of a path from $x$
  • maxLength_2(x): the maximum length of a path from $x$ in another direction than the first path

For example, in the above graph, maxLength_1(1)=2 using the path $1 \rightarrow 2 \rightarrow 5$, and maxLength_2(1)=1 using the path $1 \rightarrow 3$.

Finally, if the path that corresponds to maxLength_1(p) goes through $x$, we conclude that the maximum length is maxLength_2(p)+1, and otherwise the maximum length is maxLength_1(p)+1.
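
A sketch of the two-pass solution, assuming the example tree has the edges 1--2, 1--3, 1--4, 2--5 and 2--6 (consistent with the paths mentioned above); the first pass computes the downward values corresponding to maxLength_1 and maxLength_2, and the second pass adds the paths that go through the parent:

#![allow(unused)]
fn main() {
// assumed example tree with edges 1-2, 1-3, 1-4, 2-5 and 2-6
let adj = [vec![], vec![2, 3, 4], vec![1, 5, 6], vec![1], vec![1], vec![2], vec![2]];
let n = 6;
// down[x] = the two longest downward path lengths from x through different children
// (these correspond to maxLength_1(x) and maxLength_2(x) described above)
let mut down = vec![(0usize, 0usize); n+1];
let mut max_length = vec![0; n+1];
dfs_down(1, 0, &adj, &mut down);
dfs_up(1, 0, 0, &adj, &down, &mut max_length);
println!("{:?}", &max_length[1..]);
// first pass: compute the downward values
fn dfs_down(s: usize, e: usize, adj: &[Vec<usize>], down: &mut [(usize, usize)]) {
    for &u in &adj[s] {
        if u == e {continue}
        dfs_down(u, s, adj, down);
        let len = down[u].0 + 1;
        if len > down[s].0 {down[s].1 = down[s].0; down[s].0 = len}
        else if len > down[s].1 {down[s].1 = len}
    }
}
// second pass: up_s is the maximum length of a path from s through its parent
fn dfs_up(s: usize, e: usize, up_s: usize, adj: &[Vec<usize>],
          down: &[(usize, usize)], max_length: &mut [usize]) {
    max_length[s] = up_s.max(down[s].0);
    for &u in &adj[s] {
        if u == e {continue}
        // the longest downward path from s that does not go through the child u
        let other = if down[u].0 + 1 == down[s].0 {down[s].1} else {down[s].0};
        dfs_up(u, s, up_s.max(other) + 1, adj, down, max_length);
    }
}
}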

Binary trees

A binary tree is a rooted tree where each node has a left and right subtree. It is possible that a subtree of a node is empty. Thus, every node in a binary tree has zero, one or two children.

For example, the following tree is a binary tree:

The nodes of a binary tree have three natural orderings that correspond to different ways to recursively traverse the tree:

  • pre-order: first process the root, then traverse the left subtree, then traverse the right subtree
  • in-order: first traverse the left subtree, then process the root, then traverse the right subtree
  • post-order: first traverse the left subtree, then traverse the right subtree, then process the root

For the above tree, the nodes in pre-order are \([1,2,4,5,6,3,7]\), in in-order \([4,2,6,5,1,3,7]\) and in post-order \([4,6,5,2,7,3,1]\).
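
For illustration, the following sketch stores the left and right child of every node of the example tree in two arrays (0 meaning no child; the structure is reconstructed here from the orderings above) and prints the three orderings:

#![allow(unused)]
fn main() {
// children of each node, index 0 unused; 0 = no child
let left  = [0, 2, 4, 0, 0, 6, 0, 0];
let right = [0, 3, 5, 7, 0, 0, 0, 0];
let mut pre = Vec::new();
let mut ino = Vec::new();
let mut post = Vec::new();
traverse(1, &left, &right, &mut pre, &mut ino, &mut post);
println!("pre-order:  {pre:?}");
println!("in-order:   {ino:?}");
println!("post-order: {post:?}");
fn traverse(s: usize, left: &[usize], right: &[usize],
            pre: &mut Vec<usize>, ino: &mut Vec<usize>, post: &mut Vec<usize>) {
    if s == 0 {return}
    pre.push(s);                                   // root before the subtrees
    traverse(left[s], left, right, pre, ino, post);
    ino.push(s);                                   // root between the subtrees
    traverse(right[s], left, right, pre, ino, post);
    post.push(s);                                  // root after the subtrees
}
}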

If we know the pre-order and in-order of a tree, we can reconstruct the exact structure of the tree. For example, the above tree is the only possible tree with pre-order \([1,2,4,5,6,3,7]\) and in-order \([4,2,6,5,1,3,7]\). In a similar way, the post-order and in-order also determine the structure of a tree.

However, the situation is different if we only know the pre-order and post-order of a tree. In this case, there may be more than one tree that matches the orderings. For example, in both of the trees

the pre-order is \([1, 2]\) and the post-order is \([2, 1]\), but the structures of the trees are different.

Spanning trees

A spanning tree of a graph consists of all nodes of the graph and some of the edges of the graph so that there is a path between any two nodes. Like trees in general, spanning trees are connected and acyclic. Usually there are several ways to construct a spanning tree.

For example, consider the following graph:

One spanning tree for the graph is as follows:

The weight of a spanning tree is the sum of its edge weights. For example, the weight of the above spanning tree is $3+5+9+3+2=22$.

A minimum spanning tree is a spanning tree whose weight is as small as possible. The weight of a minimum spanning tree for the example graph is 20, and such a tree can be constructed as follows:

In a similar way, a maximum spanning tree is a spanning tree whose weight is as large as possible. The weight of a maximum spanning tree for the example graph is 32:

Note that a graph may have several minimum and maximum spanning trees, so the trees are not unique.

It turns out that several greedy methods can be used to construct minimum and maximum spanning trees. In this chapter, we discuss two algorithms that process the edges of the graph ordered by their weights. We focus on finding minimum spanning trees, but the same algorithms can find maximum spanning trees by processing the edges in reverse order.

Kruskal’s algorithm

In Kruskal's algorithm, the initial spanning tree only contains the nodes of the graph and does not contain any edges. Then the algorithm goes through the edges ordered by their weights, and always adds an edge to the tree if it does not create a cycle.

The algorithm maintains the components of the tree. Initially, each node of the graph belongs to a separate component. Always when an edge is added to the tree, two components are joined. Finally, all nodes belong to the same component, and a minimum spanning tree has been found.

Example

Let us consider how Kruskal's algorithm processes the following graph:

The first step of the algorithm is to sort the edges in increasing order of their weights. The result is the following list:

edge    weight
5--6    2
1--2    3
3--6    3
1--5    5
2--3    5
2--5    6
4--6    7
3--4    9

After this, the algorithm goes through the list and adds each edge to the tree if it joins two separate components.

Initially, each node is in its own component:

The first edge to be added to the tree is the edge 5--6 that creates a component {5,6} by joining the components {5} and {6}:

After this, the edges 1--2, 3--6 and 1--5 are added in a similar way:

After those steps, most components have been joined and there are two components in the tree: {1,2,3,5,6} and {4}.

Finally, the algorithm adds the edge 4--6, which joins the two remaining components. After this, the algorithm will not add any new edges, because the graph is connected and there is a path between any two nodes. The resulting graph is a minimum spanning tree with weight $2+3+3+5+7=20$.

Why does this work?

It is a good question why Kruskal's algorithm works. Why does the greedy strategy guarantee that we will find a minimum spanning tree?

Let us see what happens if the minimum weight edge of the graph is not included in the spanning tree. For example, suppose that a spanning tree for the previous graph would not contain the minimum weight edge 5--6. We do not know the exact structure of such a spanning tree, but in any case it has to contain some edges. Assume that the tree would be as follows:

However, it is not possible that the above tree would be a minimum spanning tree for the graph. The reason for this is that we can remove an edge from the tree and replace it with the minimum weight edge 5--6. This produces a spanning tree whose weight is smaller:

For this reason, it is always optimal to include the minimum weight edge in the tree to produce a minimum spanning tree. Using a similar argument, we can show that it is also optimal to add the next edge in weight order to the tree, and so on. Hence, Kruskal's algorithm works correctly and always produces a minimum spanning tree.

Implementation

When implementing Kruskal's algorithm, it is convenient to use the edge list representation of the graph. The first phase of the algorithm sorts the edges in the list in $O(m \log m)$ time. After this, the second phase of the algorithm builds the minimum spanning tree as follows:

for ... {
  if !same(a, b) {unite(a, b)}
}

The loop goes through the edges in the list and always processes an edge $a$--$b$ where $a$ and $b$ are two nodes. Two functions are needed: the function same determines if $a$ and $b$ are in the same component, and the function unite joins the components that contain $a$ and $b$.

The problem is how to efficiently implement the functions same and unite. One possibility is to implement the function same as a graph traversal and check if we can get from node $a$ to node $b$. However, the time complexity of such a function would be $O(n+m)$ and the resulting algorithm would be slow, because the function same will be called for each edge in the graph.

We will solve the problem using a union-find structure that implements both functions in $O(\log n)$ time. Thus, the time complexity of Kruskal's algorithm will be $O(m \log n)$ after sorting the edge list.


1 The algorithm was published in 1956 by J. B. Kruskal [48].

Union-find structure

A union-find structure maintains a collection of sets. The sets are disjoint, so no element belongs to more than one set. Two $O(\log n)$ time operations are supported: the unite operation joins two sets, and the find operation finds the representative of the set that contains a given element.

Structure

In a union-find structure, one element in each set is the representative of the set, and there is a chain from any other element of the set to the representative. For example, assume that the sets are $\{1,4,7\}$, $\{5\}$ and $\{2,3,6,8\}$:

In this case the representatives of the sets are 4, 5 and 2. We can find the representative of any element by following the chain that begins at the element. For example, the element 2 is the representative for the element 6, because we follow the chain $6 \rightarrow 3 \rightarrow 2$. Two elements belong to the same set exactly when their representatives are the same.

Two sets can be joined by connecting the representative of one set to the representative of the other set. For example, the sets {1,4,7} and {2,3,6,8} can be joined as follows:

The resulting set contains the elements {1,2,3,4,6,7,8}. From this point on, the element 2 is the representative for the entire set and the old representative 4 points to the element 2.

The efficiency of the union-find structure depends on how the sets are joined. It turns out that we can follow a simple strategy: always connect the representative of the smaller set to the representative of the larger set (or if the sets are of equal size, we can make an arbitrary choice). Using this strategy, the length of any chain will be $O(\log n)$, so we can find the representative of any element efficiently by following the corresponding chain.

Implementation

The union-find structure can be implemented using arrays. In the following implementation, the array link contains for each element the next element in the chain or the element itself if it is a representative, and the array size indicates for each representative the size of the corresponding set.

Initially, each element belongs to a separate set:

for i in 1..=n {link[i] = i};
for i in 1..=n {size[i] = 1};

The function find returns the representative for an element $x$. The representative can be found by following the chain that begins at $x$.

#![allow(unused)]
fn main() {
fn find(mut x:usize, link: &[usize]) -> usize {
    while x != link[x] {x = link[x]}
    x
}
}

The function same checks whether elements $a$ and $b$ belong to the same set. This can easily be done by using the function find:

#![allow(unused)]
fn main() {
fn find(mut x:usize, link: &[usize]) -> usize {
    while x != link[x] {x = link[x]}
    x
}
fn same(a: usize, b: usize, link: &[usize]) -> bool {
    find(a, link) == find(b, link)
}
}

The function unite joins the sets that contain elements $a$ and $b$ (the elements have to be in different sets). The function first finds the representatives of the sets and then connects the smaller set to the larger set.

#![allow(unused)]
fn main() {
fn find(mut x:usize, link: &[usize]) -> usize {
    while x != link[x] {x = link[x]}
    x
}
fn unite(mut a: usize, mut b: usize, link: &mut [usize], size: &mut [usize]) {
    a = find(a, link);
    b = find(b, link);
    // make sure that a is the representative of the larger set
    if size[a] < size[b] {std::mem::swap(&mut a, &mut b)}
    size[a] += size[b];
    link[b] = a;
}
}

The time complexity of the function find is $O(\log n)$ assuming that the length of each chain is $O(\log n)$. In this case, the functions same and unite also work in $O(\log n)$ time. The function unite makes sure that the length of each chain is $O(\log n)$ by connecting the smaller set to the larger set.
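
Putting the pieces together, a minimal sketch of Kruskal's algorithm using this union-find structure could look as follows. The edge list of the example graph is hard-coded, and the (weight, a, b) tuple format is only a convenient assumption:

#![allow(unused)]
fn main() {
    fn find(mut x: usize, link: &[usize]) -> usize {
        while x != link[x] {x = link[x]}
        x
    }
    fn unite(mut a: usize, mut b: usize, link: &mut [usize], size: &mut [usize]) {
        a = find(a, link);
        b = find(b, link);
        if size[a] < size[b] {std::mem::swap(&mut a, &mut b)}
        size[a] += size[b];
        link[b] = a;
    }
    // edges of the example graph as (weight, a, b)
    let mut edges = vec![(2,5,6), (3,1,2), (3,3,6), (5,1,5), (5,2,3), (6,2,5), (7,4,6), (9,3,4)];
    let n = 6;
    // first phase: sort the edges in increasing order of their weights
    edges.sort();
    // initially, each node is in its own component
    let mut link: Vec<usize> = (0..=n).collect();
    let mut size = vec![1usize; n + 1];
    let mut total = 0;
    // second phase: add an edge if it joins two separate components
    for &(w, a, b) in &edges {
        if find(a, &link) != find(b, &link) {
            unite(a, b, &mut link, &mut size);
            total += w;
        }
    }
    println!("weight of the minimum spanning tree: {total}"); // 20
}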


1 The structure presented here was introduced in 1971 by J. D. Hopcroft and J. D. Ullman [38]. Later, in 1975, R. E. Tarjan studied a more sophisticated variant of the structure [64] that is discussed in many algorithm textbooks nowadays.

Prim’s algorithm

Prim's algorithm1 is an alternative method for finding a minimum spanning tree. The algorithm first adds an arbitrary node to the tree. After this, the algorithm always chooses a minimum-weight edge that adds a new node to the tree. Finally, all nodes have been added to the tree and a minimum spanning tree has been found.

Prim's algorithm resembles Dijkstra's algorithm. The difference is that Dijkstra's algorithm always selects an edge whose distance from the starting node is minimum, but Prim's algorithm simply selects the minimum weight edge that adds a new node to the tree.

Example

Let us consider how Prim's algorithm works in the following graph:

Initially, there are no edges between the nodes:

An arbitrary node can be the starting node, so let us choose node 1. First, we add node 2 that is connected by an edge of weight 3:

After this, there are two edges with weight 5, so we can add either node 3 or node 5 to the tree. Let us add node 3 first:

The process continues until all nodes have been included in the tree:

Implementation

Like Dijkstra's algorithm, Prim's algorithm can be efficiently implemented using a priority queue. The priority queue should contain all nodes that can be connected to the current component using a single edge, in increasing order of the weights of the corresponding edges.

The time complexity of Prim's algorithm is $O(n + m \log m)$, which equals the time complexity of Dijkstra's algorithm. In practice, Prim's and Kruskal's algorithms are both efficient, and the choice of the algorithm is a matter of taste. Still, most competitive programmers use Kruskal's algorithm.
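
As a sketch of this idea, the following code computes the weight of a minimum spanning tree using a priority queue; std::collections::BinaryHeap is a max-heap, so the edge weights are stored negated. The edge list is assumed to be the same example graph as in the section on Kruskal's algorithm:

#![allow(unused)]
fn main() {
    use std::collections::BinaryHeap;
    // the same example graph as in the section on Kruskal's algorithm (an assumption)
    let edges: [(i64, usize, usize); 8] =
        [(2,5,6), (3,1,2), (3,3,6), (5,1,5), (5,2,3), (6,2,5), (7,4,6), (9,3,4)];
    let n = 6;
    // adjacency lists: adj[x] contains pairs (weight, neighbor)
    let mut adj = vec![Vec::new(); n + 1];
    for &(w, a, b) in &edges {
        adj[a].push((w, b));
        adj[b].push((w, a));
    }
    let mut in_tree = vec![false; n + 1];
    // BinaryHeap is a max-heap, so weights are stored negated
    let mut heap = BinaryHeap::new();
    heap.push((0i64, 1usize)); // start from an arbitrary node, here node 1
    let mut total = 0;
    while let Some((neg_w, x)) = heap.pop() {
        if in_tree[x] {continue}
        in_tree[x] = true;
        total += -neg_w;
        // all edges that could connect a new node to the current tree
        for &(w, y) in &adj[x] {
            if !in_tree[y] {heap.push((-w, y))}
        }
    }
    println!("weight of the minimum spanning tree: {total}"); // 20
}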


1 The algorithm is named after R. C. Prim who published it in 1957 [54]. However, the same algorithm was discovered already in 1930 by V. Jarník.

Directed graphs

In this chapter, we focus on two classes of directed graphs:

  • Acyclic graphs: There are no cycles in the graph, so there is no path from any node to itself1.

  • Successor graphs: The outdegree of each node is 1, so each node has a unique successor.

It turns out that in both cases, we can design efficient algorithms that are based on the special properties of the graphs.


1 Directed acyclic graphs are sometimes called DAGs

Topological sorting

A topological sort is an ordering of the nodes of a directed graph such that if there is a path from node $a$ to node $b$, then node $a$ appears before node $b$ in the ordering. For example, for the graph

one topological sort is [4,1,5,2,3,6]:

An acyclic graph always has a topological sort. However, if the graph contains a cycle, it is not possible to form a topological sort, because no node of the cycle can appear before the other nodes of the cycle in the ordering. It turns out that depth-first search can be used to both check if a directed graph contains a cycle and, if it does not contain a cycle, to construct a topological sort.

Algorithm

The idea is to go through the nodes of the graph and always begin a depth-first search at the current node if it has not been processed yet. During the searches, the nodes have three possible states:

  • state 0: the node has not been processed (white)
  • state 1: the node is under processing (light gray)
  • state 2: the node has been processed (dark gray)

Initially, the state of each node is 0. When a search reaches a node for the first time, its state becomes 1. Finally, after all successors of the node have been processed, its state becomes 2.

If the graph contains a cycle, we will find this out during the search, because sooner or later we will arrive at a node whose state is 1. In this case, it is not possible to construct a topological sort.

If the graph does not contain a cycle, we can construct a topological sort by adding each node to a list when the state of the node becomes 2. This list in reverse order is a topological sort.
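
A possible sketch of this procedure is shown below. The graph is stored as adjacency lists (the edge list is an assumption, chosen to be consistent with the example below), and the function returns the nodes in a topological order, or None if the graph contains a cycle:

#![allow(unused)]
fn main() {
    // states: 0 = not processed, 1 = under processing, 2 = processed
    fn dfs(x: usize, adj: &[Vec<usize>], state: &mut [u8], order: &mut Vec<usize>) -> bool {
        state[x] = 1;
        for &y in &adj[x] {
            if state[y] == 1 {return false} // the graph contains a cycle
            if state[y] == 0 && !dfs(y, adj, state, order) {return false}
        }
        state[x] = 2;
        order.push(x); // the node is added when its state becomes 2
        true
    }
    fn topological_sort(n: usize, adj: &[Vec<usize>]) -> Option<Vec<usize>> {
        let mut state = vec![0u8; n + 1];
        let mut order = Vec::new();
        for x in 1..=n {
            if state[x] == 0 && !dfs(x, adj, &mut state, &mut order) {return None}
        }
        order.reverse(); // the list in reverse order is a topological sort
        Some(order)
    }
    // an assumed graph: edges 1->2, 2->3, 3->6, 4->1, 4->5, 5->2, 5->3
    let n = 6;
    let mut adj = vec![Vec::new(); n + 1];
    for &(a, b) in &[(1,2), (2,3), (3,6), (4,1), (4,5), (5,2), (5,3)] {
        adj[a].push(b);
    }
    println!("{:?}", topological_sort(n, &adj)); // Some([4, 5, 1, 2, 3, 6])
}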

Example 1

In the example graph, the search first proceeds from node 1 to node 6:

Now node 6 has been processed, so it is added to the list. After this, also nodes 3, 2 and 1 are added to the list:

At this point, the list is [6,3,2,1]. The next search begins at node 4:

Thus, the final list is [6,3,2,1,5,4]. We have processed all nodes, so a topological sort has been found. The topological sort is the reverse list [4,5,1,2,3,6]:

Note that a topological sort is not unique, and there can be several topological sorts for a graph.

Example 2

Let us now consider a graph for which we cannot construct a topological sort, because the graph contains a cycle:

The search proceeds as follows:

The search reaches node 2 whose state is 1, which means that the graph contains a cycle. In this example, there is a cycle $2 \rightarrow 3 \rightarrow 5 \rightarrow 2$.

Dynamic programming

If a directed graph is acyclic, dynamic programming can be applied to it. For example, we can efficiently solve the following problems concerning paths from a starting node to an ending node:

  • how many different paths are there?
  • what is the shortest/longest path?
  • what is the minimum/maximum number of edges in a path?
  • which nodes certainly appear in any path?

Counting the number of paths

As an example, let us calculate the number of paths from node 1 to node 6 in the following graph:

There are a total of three such paths:

  • $1 \rightarrow 2 \rightarrow 3 \rightarrow 6$
  • $1 \rightarrow 4 \rightarrow 5 \rightarrow 2 \rightarrow 3 \rightarrow 6$
  • $1 \rightarrow 4 \rightarrow 5 \rightarrow 3 \rightarrow 6$

Let paths(x) denote the number of paths from node 1 to node $x$. As a base case, paths(1)=1. Then, to calculate other values of paths(x), we may use the recursion

\[ \texttt{paths}(x) = \texttt{paths}(a_1)+\texttt{paths}(a_2)+\cdots+\texttt{paths}(a_k) \]

where $a_1,a_2,\ldots,a_k$ are the nodes from which there is an edge to $x$. Since the graph is acyclic, the values of $\texttt{paths}(x)$ can be calculated in the order of a topological sort. A topological sort for the above graph is as follows:

Hence, the numbers of paths are as follows:

For example, to calculate the value of paths(3), we can use the formula paths(2)+paths(5), because there are edges from nodes 2 and 5 to node 3. Since paths(2)=2 and paths(5)=1, we conclude that paths(3)=3.
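
The calculation can be sketched as follows. The edge list is taken from the three paths listed above, and the order in which the nodes are processed is one valid topological sort of that graph:

#![allow(unused)]
fn main() {
    // edges of the example graph, read off from the three listed paths
    let n = 6;
    let edges = [(1,2), (1,4), (2,3), (3,6), (4,5), (5,2), (5,3)];
    // rev[x] contains the nodes from which there is an edge to x
    let mut rev = vec![Vec::new(); n + 1];
    for &(a, b) in &edges {
        rev[b].push(a);
    }
    let mut paths = vec![0u64; n + 1];
    paths[1] = 1; // base case
    // process the nodes in a topological order
    for &x in &[1, 4, 5, 2, 3, 6] {
        for &a in &rev[x] {
            paths[x] += paths[a];
        }
    }
    println!("paths(6) = {}", paths[6]); // 3
}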

Extending Dijkstra's algorithm

A by-product of Dijkstra's algorithm is a directed, acyclic graph that indicates for each node of the original graph the possible ways to reach the node using a shortest path from the starting node. Dynamic programming can be applied to that graph. For example, in the graph

the shortest paths from node 1 may use the following edges:

Now we can, for example, calculate the number of shortest paths from node 1 to node 5 using dynamic programming:

Representing problems as graphs

Actually, any dynamic programming problem can be represented as a directed, acyclic graph. In such a graph, each node corresponds to a dynamic programming state and the edges indicate how the states depend on each other.

As an example, consider the problem of forming a sum of money $n$ using coins \( \{c_1,c_2,\ldots,c_k\} \). In this problem, we can construct a graph where each node corresponds to a sum of money, and the edges show how the coins can be chosen. For example, for coins \(\{1,3,4\}\) and $n=6$, the graph is as follows:

Using this representation, the shortest path from node 0 to node $n$ corresponds to a solution with the minimum number of coins, and the total number of paths from node 0 to node $n$ equals the total number of solutions.

Successor paths

For the rest of the chapter, we will focus on successor graphs. In those graphs, the outdegree of each node is 1, i.e., exactly one edge starts at each node. A successor graph consists of one or more components, each of which contains one cycle and some paths that lead to it.

Successor graphs are sometimes called functional graphs. The reason for this is that any successor graph corresponds to a function that defines the edges of the graph. The parameter for the function is a node of the graph, and the function gives the successor of that node.

For example, the function

x         1  2  3  4  5  6  7  8  9
succ(x)   3  5  7  6  2  2  1  6  3

defines the following graph:

Since each node of a successor graph has a unique successor, we can also define a function succ(x,k) that gives the node that we will reach if we begin at node $x$ and walk $k$ steps forward. For example, in the above graph succ(4,6)=2, because we will reach node 2 by walking 6 steps from node 4:

A straightforward way to calculate a value of succ(x,k) is to start at node $x$ and walk $k$ steps forward, which takes $O(k)$ time. However, using preprocessing, any value of succ(x,k) can be calculated in only $O(\log k)$ time.

The idea is to precalculate all values of succ(x,k) where $k$ is a power of two and at most $u$, where $u$ is the maximum number of steps we will ever walk. This can be efficiently done, because we can use the following recursion:

\[ \begin{equation*} \texttt{succ}(x,k) = \begin{cases} \texttt{succ}(x) & k = 1\\ \texttt{succ}(\texttt{succ}(x,k/2),k/2) & k > 1\\ \end{cases} \end{equation*} \]

Precalculating the values takes $O(n \log u)$ time, because $O(\log u)$ values are calculated for each node. In the above graph, the first values are as follows:

x            1  2  3  4  5  6  7  8  9
succ(x,1)    3  5  7  6  2  2  1  6  3
succ(x,2)    7  2  1  2  5  5  3  2  7
succ(x,4)    3  2  7  2  5  5  1  2  3
succ(x,8)    7  2  1  2  5  5  3  2  7

After this, any value of succ(x,k) can be calculated by presenting the number of steps $k$ as a sum of powers of two. For example, if we want to calculate the value of succ(x,11), we first form the representation $11=8+2+1$. Using that,

\[ \texttt{succ}(x,11)=\texttt{succ}(\texttt{succ}(\texttt{succ}(x,8),2),1) \]

For example, in the previous graph

\[ \texttt{succ}(4,11)=\texttt{succ}(\texttt{succ}(\texttt{succ}(4,8),2),1)=5 \]

Such a representation always consists of $O(\log k)$ parts, so calculating a value of succ(x,k) takes $O(\log k)$ time.
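
The preprocessing and the query can be sketched as follows, storing the table as up[j][x] = succ(x, 2^j) and using the successor function of the example graph:

#![allow(unused)]
fn main() {
    // successor function of the example graph (index 0 is unused)
    let succ = [0usize, 3, 5, 7, 6, 2, 2, 1, 6, 3];
    let n = 9;
    let log = 4; // enough for walks of at most 2^4 - 1 = 15 steps
    // up[j][x] = succ(x, 2^j)
    let mut up = vec![vec![0usize; n + 1]; log];
    for x in 1..=n {
        up[0][x] = succ[x];
    }
    for j in 1..log {
        for x in 1..=n {
            up[j][x] = up[j - 1][up[j - 1][x]];
        }
    }
    // to calculate succ(x,k), follow the precalculated links for each set bit of k
    let query = |mut x: usize, mut k: usize| {
        let mut j = 0;
        while k > 0 {
            if k & 1 == 1 {x = up[j][x]}
            k >>= 1;
            j += 1;
        }
        x
    };
    println!("succ(4,6) = {}", query(4, 6));   // 2
    println!("succ(4,11) = {}", query(4, 11)); // 5
}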

Cycle detection

Consider a successor graph that only contains a path that ends in a cycle. We may ask the following questions: if we begin our walk at the starting node, what is the first node in the cycle and how many nodes does the cycle contain?

For example, in the graph

we begin our walk at node 1, the first node that belongs to the cycle is node 4, and the cycle consists of three nodes (4, 5 and 6).

A simple way to detect the cycle is to walk in the graph and keep track of all nodes that have been visited. Once a node is visited for the second time, we can conclude that the node is the first node in the cycle. This method works in $O(n)$ time and also uses $O(n)$ memory.

However, there are better algorithms for cycle detection. The time complexity of such algorithms is still $O(n)$, but they only use $O(1)$ memory. This is an important improvement if $n$ is large. Next we will discuss Floyd's algorithm that achieves these properties.

Floyd's algorithm

Floyd's algorithm1 walks forward in the graph using two pointers $a$ and $b$. Both pointers begin at a node $x$ that is the starting node of the graph. Then, on each turn, the pointer $a$ walks one step forward and the pointer $b$ walks two steps forward. The process continues until the pointers meet each other:

#![allow(unused)]
fn main() {
    // successor function of the example graph (index 0 is unused)
    let s = [0, 2, 3, 4, 5, 6, 4];
    println!("{s:?}");
    fn succ(i: usize) -> usize {
        let s = [0, 2, 3, 4, 5, 6, 4];
        s[i]
    }
    let x = 1;
    // a walks one step at a time, b walks two steps at a time
    let mut a = succ(x);
    println!("a = {a}");
    let mut b = succ(succ(x));
    println!("       b = {b}");
    while a != b {
        a = succ(a);
        println!("a = {a}");
        b = succ(succ(b));
        println!("       b = {b}");
    }
}

At this point, the pointer $a$ has walked $k$ steps and the pointer $b$ has walked $2k$ steps, so the length of the cycle divides $k$. Thus, the first node that belongs to the cycle can be found by moving the pointer $a$ to node $x$ and advancing the pointers step by step until they meet again.

#![allow(unused)]
fn main() {
    let s = [0, 2, 3, 4, 5, 6, 4];
    println!("{s:?}");
    fn succ(i: usize) -> usize {
        let s = [0, 2, 3, 4, 5, 6, 4];
        s[i]
    }
    let x = 1;
    let mut a = succ(x);
    let mut b = succ(succ(x));
    while a != b {
        a = succ(a);
        b = succ(succ(b));
    }
    // move a back to the starting node and advance both pointers one step at a time
    let mut a = x;
    println!("a = {a}");
    while a != b {
        a = succ(a);
        println!("a = {a}");
        b = succ(b);
        println!("       b = {b}");
    }
    let first = a;
    println!("                first = {first}");
}

After this, the length of the cycle can be calculated as follows:

#![allow(unused)]
fn main() {
    let s = [0, 2, 3, 4, 5, 6, 4];
    println!("{s:?}");
    fn succ(i: usize) -> usize {
        let s = [0, 2, 3, 4, 5, 6, 4];
        s[i]
    }
    let x = 1;
    let mut a = succ(x);
    let mut b = succ(succ(x));
    while a != b {
        a = succ(a);
        b = succ(succ(b));
    }
    let mut a = x;
    while a != b {
        a = succ(a);
        b = succ(b);
    }
    let first = a;
    // walk around the cycle once to count its length
    let mut b = succ(a);
    let mut length = 1;
    while a != b {
        b = succ(b);
        length += 1;
    }
    println!("cycle length = {length}");
}

1 The idea of the algorithm is mentioned in [46] and attributed to R. W. Floyd; however, it is not known if Floyd actually discovered the algorithm.

Strong connectivity

In a directed graph, the edges can be traversed in one direction only, so even if the graph is connected, this does not guarantee that there would be a path from a node to another node. For this reason, it is meaningful to define a new concept that requires more than connectivity.

A graph is strongly connected if there is a path from any node to all other nodes in the graph. For example, in the following picture, the left graph is strongly connected while the right graph is not.

The right graph is not strongly connected because, for example, there is no path from node 2 to node 1.

The strongly connected components of a graph divide the graph into strongly connected parts that are as large as possible. The strongly connected components form an acyclic component graph that represents the deep structure of the original graph.

For example, in the following graph, the strongly connected components are as follows:

The corresponding component graph is as follows:

The components are $A=\{1,2\}$, $B=\{3,6,7\}$, $C=\{4\}$ and $D=\{5\}$.

A component graph is an acyclic, directed graph, so it is easier to process than the original graph. Since the graph does not contain cycles, we can always construct a topological sort and use dynamic programming techniques like those presented in Chapter 16.

Kosaraju’s algorithm

Kosaraju's algorithm1 is an efficient method for finding the strongly connected components of a directed graph. The algorithm performs two depth-first searches: the first search constructs a list of nodes according to the structure of the graph, and the second search forms the strongly connected components.

Search 1

The first phase of Kosaraju's algorithm constructs a list of nodes in the order in which a depth-first search processes them. The algorithm goes through the nodes, and begins a depth-first search at each unprocessed node. Each node will be added to the list after it has been processed.

In the example graph, the nodes are processed in the following order:

The notation $x/y$ means that processing the node started at time $x$ and finished at time $y$. Thus, the corresponding list is as follows:

node   processing time
4      5
5      6
2      7
1      8
6      12
7      13
3      14

Search 2

The second phase of the algorithm forms the strongly connected components of the graph. First, the algorithm reverses every edge in the graph. This guarantees that during the second search, we will always find strongly connected components that do not have extra nodes.

After reversing the edges, the example graph is as follows:

After this, the algorithm goes through the list of nodes created by the first search, in reverse order. If a node does not belong to a component, the algorithm creates a new component and starts a depth-first search that adds all new nodes found during the search to the new component.

In the example graph, the first component begins at node 3:

Note that since all edges are reversed, the component does not leak to other parts in the graph.

The next nodes in the list are nodes 7 and 6, but they already belong to a component, so the next new component begins at node 1:

Finally, the algorithm processes nodes 5 and 4 that create the remaining strongly connected components:

The time complexity of the algorithm is $O(n+m)$, because the algorithm performs two depth-first searches.
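
The two searches can be sketched as follows. The graph used here is an assumption, chosen so that its strongly connected components match the components $A$, $B$, $C$ and $D$ listed earlier in this chapter:

#![allow(unused)]
fn main() {
    // first search: add each node to the list after it has been processed
    fn dfs1(x: usize, adj: &[Vec<usize>], visited: &mut [bool], order: &mut Vec<usize>) {
        visited[x] = true;
        for &y in &adj[x] {
            if !visited[y] {dfs1(y, adj, visited, order)}
        }
        order.push(x);
    }
    // second search: collect one strongly connected component in the reversed graph
    fn dfs2(x: usize, radj: &[Vec<usize>], comp: &mut [usize], id: usize) {
        comp[x] = id;
        for &y in &radj[x] {
            if comp[y] == 0 {dfs2(y, radj, comp, id)}
        }
    }
    // an assumed graph whose components are {1,2}, {3,6,7}, {4} and {5}
    let n = 7;
    let edges = [(1,2), (2,1), (2,4), (3,2), (3,7), (5,4), (6,3), (6,5), (7,6)];
    let mut adj = vec![Vec::new(); n + 1];
    let mut radj = vec![Vec::new(); n + 1];
    for &(a, b) in &edges {
        adj[a].push(b);
        radj[b].push(a); // every edge is also stored reversed
    }
    let mut visited = vec![false; n + 1];
    let mut order = Vec::new();
    for x in 1..=n {
        if !visited[x] {dfs1(x, &adj, &mut visited, &mut order)}
    }
    let mut comp = vec![0usize; n + 1];
    let mut components = 0;
    // go through the list created by the first search in reverse order
    for &x in order.iter().rev() {
        if comp[x] == 0 {
            components += 1;
            dfs2(x, &radj, &mut comp, components);
        }
    }
    println!("component of each node: {:?}", &comp[1..]);
}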


1 According to [1], S. R. Kosaraju invented this algorithm in 1978 but did not publish it. In 1981, the same algorithm was rediscovered and published by M. Sharir [57].

2SAT problem

Strong connectivity is also linked with the 2SAT problem 1. In this problem, we are given a logical formula \[ (a_1 \lor b_1) \land (a_2 \lor b_2) \land \cdots \land (a_m \lor b_m), \] where each $a_i$ and $b_i$ is either a logical variable ($x_1,x_2,\ldots,x_n$) or a negation of a logical variable ($\lnot x_1, \lnot x_2, \ldots, \lnot x_n$). The symbols "$\land$" and "$\lor$" denote logical operators "and" and "or". Our task is to assign each variable a value so that the formula is true, or state that this is not possible.

For example, the formula \[ L_1 = (x_2 \lor \lnot x_1) \land (\lnot x_1 \lor \lnot x_2) \land (x_1 \lor x_3) \land (\lnot x_2 \lor \lnot x_3) \land (x_1 \lor x_4) \] is true when the variables are assigned as follows:

\[ \begin{cases} x_1 = \textrm{false} \\ x_2 = \textrm{false} \\ x_3 = \textrm{true} \\ x_4 = \textrm{true} \\ \end{cases} \]

However, the formula \[ L_2 = (x_1 \lor x_2) \land (x_1 \lor \lnot x_2) \land (\lnot x_1 \lor x_3) \land (\lnot x_1 \lor \lnot x_3) \] is always false, regardless of how we assign the values. The reason for this is that we cannot choose a value for $x_1$ without creating a contradiction. If $x_1$ is false, both $x_2$ and $\lnot x_2$ should be true which is impossible, and if $x_1$ is true, both $x_3$ and $\lnot x_3$ should be true which is also impossible.

The 2SAT problem can be represented as a graph whose nodes correspond to variables $x_i$ and negations $\lnot x_i$, and edges determine the connections between the variables. Each pair $(a_i \lor b_i)$ generates two edges: $\lnot a_i \to b_i$ and $\lnot b_i \to a_i$. This means that if $a_i$ does not hold, $b_i$ must hold, and vice versa.

The graph for the formula $L_1$ is:

And the graph for the formula $L_2$ is:

The structure of the graph tells us whether it is possible to assign the values of the variables so that the formula is true. It turns out that this can be done exactly when there are no nodes $x_i$ and $\lnot x_i$ such that both nodes belong to the same strongly connected component. If there are such nodes, the graph contains a path from $x_i$ to $\lnot x_i$ and also a path from $\lnot x_i$ to $x_i$, so both $x_i$ and $\lnot x_i$ should be true which is not possible.

In the graph of the formula $L_1$ there are no nodes $x_i$ and $\lnot x_i$ such that both nodes belong to the same strongly connected component, so a solution exists. In the graph of the formula $L_2$ all nodes belong to the same strongly connected component, so a solution does not exist.

If a solution exists, the values for the variables can be found by going through the nodes of the component graph in a reverse topological sort order. At each step, we process a component that does not contain edges that lead to an unprocessed component. If the variables in the component have not been assigned values, their values will be determined according to the values in the component, and if they already have values, they remain unchanged. The process continues until each variable has been assigned a value.

The component graph for the formula $L_1$ is as follows:

The components are $A = \{\lnot x_4\}$, $B = \{x_1, x_2, \lnot x_3\}$, $C = \{\lnot x_1, \lnot x_2, x_3\}$ and $D = \{x_4\}$. When constructing the solution, we first process the component $D$ where $x_4$ becomes true. After this, we process the component $C$ where $x_1$ and $x_2$ become false and $x_3$ becomes true. All variables have been assigned values, so the remaining components $A$ and $B$ do not change the variables.

Note that this method works, because the graph has a special structure: if there are paths from node $x_i$ to node $x_j$ and from node $x_j$ to node $\lnot x_j$, then node $x_i$ never becomes true. The reason for this is that there is also a path from node $\lnot x_j$ to node $\lnot x_i$, and both $x_i$ and $x_j$ become false.

A more difficult problem is the 3SAT problem, where each part of the formula is of the form $(a_i \lor b_i \lor c_i)$. This problem is NP-hard, so no efficient algorithm for solving the problem is known.


1 The algorithm presented here was introduced in [4]. There is also another well-known linear-time algorithm [19] that is based on backtracking.

Tree queries

This chapter discusses techniques for processing queries on subtrees and paths of a rooted tree. For example, such queries are:

  • what is the $k$th ancestor of a node?
  • what is the sum of values in the subtree of a node?
  • what is the sum of values on a path between two nodes?
  • what is the lowest common ancestor of two nodes?

Finding ancestors

The $k$th ancestor of a node $x$ in a rooted tree is the node that we will reach if we move $k$ levels up from $x$. Let ancestor(x,k) denote the $k$th ancestor of a node $x$ (or $0$ if there is no such ancestor). For example, in the following tree, ancestor(2,1)=1 and ancestor(8,2)=4:

An easy way to calculate any value of ancestor(x,k) is to perform a sequence of $k$ moves in the tree. However, the time complexity of this method is $O(k)$, which may be slow, because a tree of $n$ nodes may have a chain of $n$ nodes.

Fortunately, using a technique similar to that used in Chapter 16.3, any value of ancestor(x,k) can be efficiently calculated in $O(\log k)$ time after preprocessing. The idea is to precalculate all values ancestor(x,k) where $k \le n$ is a power of two. For example, the values for the above tree are as follows:

x                1  2  3  4  5  6  7  8
ancestor(x,1)    0  1  4  1  1  2  4  7
ancestor(x,2)    0  0  1  0  0  1  1  4
ancestor(x,4)    0  0  0  0  0  0  0  0

The preprocessing takes $O(n \log n)$ time, because $O(\log n)$ values are calculated for each node. After this, any value of ancestor(x,k) can be calculated in $O(\log k)$ time by representing $k$ as a sum where each term is a power of two.
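
The preprocessing and the query can be sketched as follows; the parent array is taken from the ancestor(x,1) row above, and the value 0 denotes a missing ancestor:

#![allow(unused)]
fn main() {
    let n = 8;
    // parent[x] = ancestor(x,1); 0 means that x has no parent
    let parent = [0usize, 0, 1, 4, 1, 1, 2, 4, 7];
    let log = 4; // enough for values k < 2^4
    // up[j][x] = ancestor(x, 2^j)
    let mut up = vec![vec![0usize; n + 1]; log];
    for x in 1..=n {
        up[0][x] = parent[x];
    }
    for j in 1..log {
        for x in 1..=n {
            up[j][x] = up[j - 1][up[j - 1][x]];
        }
    }
    // represent k as a sum of powers of two and follow the corresponding links
    let ancestor = |mut x: usize, mut k: usize| {
        let mut j = 0;
        while k > 0 && x != 0 {
            if k & 1 == 1 {x = up[j][x]}
            k >>= 1;
            j += 1;
        }
        x
    };
    println!("ancestor(8,2) = {}", ancestor(8, 2)); // 4
    println!("ancestor(3,3) = {}", ancestor(3, 3)); // 0 (no such ancestor)
}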

Subtrees and paths

A tree traversal array contains the nodes of a rooted tree in the order in which a depth-first search from the root node visits them. For example, in the tree

a depth-first search proceeds as follows:

Hence, the corresponding tree traversal array is as follows:

Subtree queries

Each subtree of a tree corresponds to a subarray of the tree traversal array such that the first element of the subarray is the root node. For example, the following subarray contains the nodes of the subtree of node $4$:

Using this fact, we can efficiently process queries that are related to subtrees of a tree. As an example, consider a problem where each node is assigned a value, and our task is to support the following queries:

  • update the value of a node
  • calculate the sum of values in the subtree of a node

Consider the following tree where the blue numbers are the values of the nodes. For example, the sum of the subtree of node $4$ is $3+4+3+1=11$.

The idea is to construct a tree traversal array that contains three values for each node: the identifier of the node, the size of the subtree, and the value of the node. For example, the array for the above tree is as follows:

Using this array, we can calculate the sum of values in any subtree by first finding out the size of the subtree and then the values of the corresponding nodes. For example, the values in the subtree of node $4$ can be found as follows:

To answer the queries efficiently, it suffices to store the values of the nodes in a binary indexed or segment tree. After this, we can both update a value and calculate the sum of values in $O(\log n)$ time.
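
A sketch of this approach: a depth-first search computes the position of each node in the tree traversal array and the size of its subtree, and a binary indexed tree built over the array answers both kinds of queries. The tree structure and the node values are assumptions, chosen to match the subtree sum $3+4+3+1=11$ mentioned above:

#![allow(unused)]
fn main() {
    // binary indexed tree with point updates and prefix sums (1-indexed)
    struct Fenwick {tree: Vec<i64>}
    impl Fenwick {
        fn new(n: usize) -> Self {Fenwick {tree: vec![0; n + 1]}}
        fn add(&mut self, mut i: usize, x: i64) {
            while i < self.tree.len() {
                self.tree[i] += x;
                i += i & i.wrapping_neg();
            }
        }
        fn sum(&self, mut i: usize) -> i64 {
            let mut s = 0;
            while i > 0 {
                s += self.tree[i];
                i -= i & i.wrapping_neg();
            }
            s
        }
    }
    // compute the position of each node in the traversal array and its subtree size
    fn dfs(x: usize, children: &[Vec<usize>], pos: &mut [usize], size: &mut [usize], timer: &mut usize) {
        *timer += 1;
        pos[x] = *timer;
        size[x] = 1;
        for &y in &children[x] {
            dfs(y, children, pos, size, timer);
            size[x] += size[y];
        }
    }
    // an assumed tree: node 1 has children 2,3,4,5; node 2 has child 6; node 4 has children 7,8,9
    let n = 9;
    let children = vec![vec![], vec![2, 3, 4, 5], vec![6], vec![], vec![7, 8, 9],
                        vec![], vec![], vec![], vec![], vec![]];
    let value = [0i64, 2, 3, 5, 3, 1, 4, 4, 3, 1]; // assumed node values
    let mut pos = vec![0; n + 1];
    let mut size = vec![0; n + 1];
    let mut timer = 0;
    dfs(1, &children, &mut pos, &mut size, &mut timer);
    // store the value of each node at its position in the traversal array
    let mut bit = Fenwick::new(n);
    for x in 1..=n {
        bit.add(pos[x], value[x]);
    }
    // sum of values in the subtree of node 4
    println!("{}", bit.sum(pos[4] + size[4] - 1) - bit.sum(pos[4] - 1)); // 11
    // update the value of node 7 and query the subtree of node 4 again
    bit.add(pos[7], 1);
    println!("{}", bit.sum(pos[4] + size[4] - 1) - bit.sum(pos[4] - 1)); // 12
}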

Path queries

Using a tree traversal array, we can also efficiently calculate sums of values on paths from the root node to any node of the tree. Consider a problem where our task is to support the following queries:

  • change the value of a node
  • calculate the sum of values on a path from the root to a node

For example, in the following tree, the sum of values from the root node to node 7 is $4+5+5=14$:

We can solve this problem like before, but now each value in the last row of the array is the sum of values on a path from the root to the node. For example, the following array corresponds to the above tree:

node id        1  2  6  3  4  7  8  9  5
subtree size   9  2  1  1  4  1  1  1  1
path sum       4  9 12  7  9 14 12 10  6

When the value of a node increases by $x$, the sums of all nodes in its subtree increase by $x$. For example, if the value of node 4 increases by 1, the array changes as follows:

Thus, to support both operations, we should be able to increase all values in a range and retrieve a single value. This can be done in $O(\log n)$ time using a binary indexed or segment tree (see Chapter 9.4).

Lowest common ancestor

The lowest common ancestor of two nodes of a rooted tree is the lowest node whose subtree contains both the nodes. A typical problem is to efficiently process queries that ask to find the lowest common ancestor of two nodes.

For example, in the following tree, the lowest common ancestor of nodes 5 and 8 is node 2:

Next we will discuss two efficient techniques for finding the lowest common ancestor of two nodes.

Method 1

One way to solve the problem is to use the fact that we can efficiently find the $k$th ancestor of any node in the tree. Using this, we can divide the problem of finding the lowest common ancestor into two parts.

We use two pointers that initially point to the two nodes whose lowest common ancestor we should find. First, we move one of the pointers upwards so that both pointers point to nodes at the same level.

In the example scenario, we move the second pointer one level up so that it points to node 6 which is at the same level with node 5:

After this, we determine the minimum number of steps needed to move both pointers upwards so that they will point to the same node. The node to which the pointers point after this is the lowest common ancestor.

In the example scenario, it suffices to move both pointers one step upwards to node 2, which is the lowest common ancestor:

Since both parts of the algorithm can be performed in $O(\log n)$ time using precomputed information, we can find the lowest common ancestor of any two nodes in $O(\log n)$ time.
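
A sketch of this method is shown below. The parent array and the depths describe an assumed tree that is consistent with the example (node 1 is the root, node 2 has children 5 and 6, and node 8 is a child of 6); the ancestor table is precalculated as described earlier in this chapter:

#![allow(unused)]
fn main() {
    let n = 8;
    // an assumed tree: parent and depth of each node (index 0 is unused)
    let parent = [0usize, 0, 1, 1, 1, 2, 2, 6, 6];
    let depth = [0usize, 1, 2, 2, 2, 3, 3, 4, 4];
    let log = 4;
    // up[j][x] = ancestor(x, 2^j)
    let mut up = vec![vec![0usize; n + 1]; log];
    for x in 1..=n {
        up[0][x] = parent[x];
    }
    for j in 1..log {
        for x in 1..=n {
            up[j][x] = up[j - 1][up[j - 1][x]];
        }
    }
    // move x exactly k levels up
    let jump = |mut x: usize, mut k: usize| {
        let mut j = 0;
        while k > 0 {
            if k & 1 == 1 {x = up[j][x]}
            k >>= 1;
            j += 1;
        }
        x
    };
    let lca = |mut a: usize, mut b: usize| -> usize {
        // first move the deeper pointer so that both pointers are on the same level
        if depth[a] < depth[b] {std::mem::swap(&mut a, &mut b)}
        a = jump(a, depth[a] - depth[b]);
        if a == b {return a}
        // then move both pointers upwards, taking the largest jumps that keep them apart
        for j in (0..log).rev() {
            if up[j][a] != up[j][b] {
                a = up[j][a];
                b = up[j][b];
            }
        }
        up[0][a]
    };
    println!("lca(5,8) = {}", lca(5, 8)); // 2
}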

Method 2

Another way to solve the problem is based on a tree traversal array1. Once again, the idea is to traverse the nodes using a depth-first search:

However, we use a different tree traversal array than before: we add each node to the array always when the depth-first search walks through the node, and not only at the first visit. Hence, a node that has $k$ children appears $k+1$ times in the array and there are a total of $2n-1$ nodes in the array.

We store two values in the array: the identifier of the node and the depth of the node in the tree. The following array corresponds to the above tree:

Now we can find the lowest common ancestor of nodes $a$ and $b$ by finding the node with the minimum depth between nodes $a$ and $b$ in the array. For example, the lowest common ancestor of nodes $5$ and $8$ can be found as follows:

Node 5 is at position 2, node 8 is at position 5, and the node with minimum depth between positions $2 \ldots 5$ is node 2 at position 3 whose depth is 2. Thus, the lowest common ancestor of nodes 5 and 8 is node 2.

Thus, to find the lowest common ancestor of two nodes it suffices to process a range minimum query. Since the array is static, we can process such queries in $O(1)$ time after an $O(n \log n)$ time preprocessing.

The distance between nodes $a$ and $b$ equals the length of the path from $a$ to $b$. It turns out that the problem of calculating the distance between nodes reduces to finding their lowest common ancestor.

First, we root the tree arbitrarily. After this, the distance of nodes $a$ and $b$ can be calculated using the formula

\[ \texttt{depth}(a)+\texttt{depth}(b)-2 \cdot \texttt{depth}(c) \]

where $c$ is the lowest common ancestor of $a$ and $b$ and depth(s) denotes the depth of node $s$. For example, consider the distance of nodes 5 and 8:

The lowest common ancestor of nodes 5 and 8 is node 2. The depths of the nodes are depth(5)=3, depth(8)=4 and depth(2)=2, so the distance between nodes 5 and 8 is $3+4-2\cdot2=3$.


1 This lowest common ancestor algorithm was presented in [7]. This technique is sometimes called the Euler tour technique [66].

Offline algorithms

So far, we have discussed online algorithms for tree queries. Those algorithms are able to process queries one after another so that each query is answered before receiving the next query.

However, in many problems, the online property is not necessary. In this section, we focus on offline algorithms. Those algorithms are given a set of queries which can be answered in any order. It is often easier to design an offline algorithm compared to an online algorithm.

Merging data and structures

One method to construct an offline algorithm is to perform a depth-first tree traversal and maintain data structures in nodes. At each node $s$, we create a data structure d[s] that is based on the data structures of the children of $s$. Then, using this data structure, all queries related to $s$ are processed.

As an example, consider the following problem: We are given a tree where each node has some value. Our task is to process queries of the form ''calculate the number of nodes with value $x$ in the subtree of node $s$''. For example, in the following tree, the subtree of node $4$ contains two nodes whose value is 3.

In this problem, we can use map structures to answer the queries. For example, the maps for node 4 and its children are as follows:

If we create such a data structure for each node, we can easily process all given queries, because we can handle all queries related to a node immediately after creating its data structure. For example, the above map structure for node 4 tells us that its subtree contains two nodes whose value is 3.

However, it would be too slow to create all data structures from scratch. Instead, at each node $s$, we create an initial data structure d[s] that only contains the value of $s$. After this, we go through the children of $s$ and merge d[s] and all data structures d[u] where $u$ is a child of $s$.

For example, in the above tree, the map for node $4$ is created by merging the following maps:

Here the first map is the initial data structure for node 4, and the other three maps correspond to nodes 7, 8 and 9.

The merging at node $s$ can be done as follows: We go through the children of $s$ and at each child $u$ merge d[s] and d[u]. We always copy the contents from d[u] to d[s]. However, before this, we swap the contents of d[s] and d[u] if d[s] is smaller than d[u]. By doing this, each value is copied only $O(\log n)$ times during the tree traversal, which ensures that the algorithm is efficient.

To swap the contents of two data structures $a$ and $b$ efficiently, we can just use the following code:

#![allow(unused)]
fn main() {
let mut a = 5;
let mut b = 6;
println!("a:{a}  b:{b}");
(a, b) = (b, a);
println!("a:{a}  b:{b}");
}

or alternatively:

#![allow(unused)]
fn main() {
let mut a = 5;
let mut b = 6;
println!("a:{a}  b:{b}");
std::mem::swap(&mut a, &mut b);
println!("a:{a}  b:{b}");
}

Lowest common ancestor

There is also an offline algorithm for processing a set of lowest common ancestor queries1. The algorithm is based on the union-find data structure (see Chapter 15.2), and the benefit of the algorithm is that it is easier to implement than the algorithms discussed earlier in this chapter.

The algorithm is given as input a set of pairs of nodes, and it determines for each such pair the lowest common ancestor of the nodes. The algorithm performs a depth-first tree traversal and maintains disjoint sets of nodes. Initially, each node belongs to a separate set. For each set, we also store the highest node in the tree that belongs to the set.

When the algorithm visits a node $x$, it goes through all nodes $y$ such that the lowest common ancestor of $x$ and $y$ has to be found. If $y$ has already been visited, the algorithm reports that the lowest common ancestor of $x$ and $y$ is the highest node in the set of $y$. Then, after processing node $x$, the algorithm joins the sets of $x$ and its parent.

For example, suppose that we want to find the lowest common ancestors of node pairs $(5,8)$ and $(2,7)$ in the following tree:

In the following trees, gray nodes denote visited nodes and dashed groups of nodes belong to the same set. When the algorithm visits node 8, it notices that node 5 has been visited and the highest node in its set is 2. Thus, the lowest common ancestor of nodes 5 and 8 is 2:

Later, when visiting node 7, the algorithm determines that the lowest common ancestor of nodes 2 and 7 is 1:


1 This algorithm was published by R. E. Tarjan in 1979 [65].

Paths and circuits

This chapter focuses on two types of paths in graphs:

  • An Eulerian path is a path that goes through each edge exactly once.
  • A Hamiltonian path is a path that visits each node exactly once.

While Eulerian and Hamiltonian paths look like similar concepts at first glance, the computational problems related to them are very different. It turns out that there is a simple rule that determines whether a graph contains an Eulerian path, and there is also an efficient algorithm to find such a path if it exists. In contrast, checking the existence of a Hamiltonian path is an NP-hard problem, and no efficient algorithm is known for solving the problem.

Eulerian paths

An Eulerian path1 is a path that goes exactly once through each edge of the graph. For example, the graph

has an Eulerian path from node 2 to node 5:

An Eulerian circuit is an Eulerian path that starts and ends at the same node. For example, the graph

has an Eulerian circuit that starts and ends at node 1:

Existence

The existence of Eulerian paths and circuits depends on the degrees of the nodes. First, an undirected graph has an Eulerian path exactly when all the edges belong to the same connected component and

  • the degree of each node is even \emph{or}
  • the degree of exactly two nodes is odd, and the degree of all other nodes is even.

In the first case, each Eulerian path is also an Eulerian circuit. In the second case, the odd-degree nodes are the starting and ending nodes of an Eulerian path which is not an Eulerian circuit.

For example, in the graph

nodes 1, 3 and 4 have a degree of 2, and nodes 2 and 5 have a degree of 3. Exactly two nodes have an odd degree, so there is an Eulerian path between nodes 2 and 5, but the graph does not contain an Eulerian circuit.

In a directed graph, we focus on indegrees and outdegrees of the nodes. A directed graph contains an Eulerian path exactly when all the edges belong to the same connected component and

  • in each node, the indegree equals the outdegree, or
  • in one node, the indegree is one larger than the outdegree, in another node, the outdegree is one larger than the indegree, and in all other nodes, the indegree equals the outdegree.

In the first case, each Eulerian path is also an Eulerian circuit, and in the second case, the graph contains an Eulerian path that begins at the node whose outdegree is larger and ends at the node whose indegree is larger.

For example, in the graph

nodes 1, 3 and 4 have both indegree 1 and outdegree 1, node 2 has indegree 1 and outdegree 2, and node 5 has indegree 2 and outdegree 1. Hence, the graph contains an Eulerian path from node 2 to node 5:

Hierholzer's algorithm

Hierholzer's algorithm2 is an efficient method for constructing an Eulerian circuit. The algorithm consists of several rounds, each of which adds new edges to the circuit. Of course, we assume that the graph contains an Eulerian circuit; otherwise Hierholzer's algorithm cannot find it.

First, the algorithm constructs a circuit that contains some (not necessarily all) of the edges of the graph. After this, the algorithm extends the circuit step by step by adding subcircuits to it. The process continues until all edges have been added to the circuit.

The algorithm extends the circuit by always finding a node $x$ that belongs to the circuit but has an outgoing edge that is not included in the circuit. The algorithm constructs a new path from node $x$ that only contains edges that are not yet in the circuit. Sooner or later, the path will return to node $x$, which creates a subcircuit.

If the graph only contains an Eulerian path, we can still use Hierholzer's algorithm to find it by adding an extra edge to the graph and removing the edge after the circuit has been constructed. For example, in an undirected graph, we add the extra edge between the two odd-degree nodes.

Next we will see how Hierholzer's algorithm constructs an Eulerian circuit for an undirected graph.

Example

Let us consider the following graph:

Suppose that the algorithm first creates a circuit that begins at node 1. A possible circuit is $1 \rightarrow 2 \rightarrow 3 \rightarrow 1$:

After this, the algorithm adds the subcircuit $2 \rightarrow 5 \rightarrow 6 \rightarrow 2$ to the circuit:

Finally, the algorithm adds the subcircuit $6 \rightarrow 3 \rightarrow 4 \rightarrow 7 \rightarrow 6$ to the circuit:

Now all edges are included in the circuit, so we have successfully constructed an Eulerian circuit.
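
A compact iterative sketch of Hierholzer's algorithm for this undirected graph follows; the edge list is read off from the circuits above, and the circuit is produced in reverse order of construction, which is also a valid Eulerian circuit:

#![allow(unused)]
fn main() {
    // the example graph as an undirected edge list (read off from the circuits above)
    let n = 7;
    let edges = [(1,2), (2,3), (3,1), (2,5), (5,6), (6,2), (6,3), (3,4), (4,7), (7,6)];
    // adj[x] contains pairs (neighbor, index of the edge)
    let mut adj = vec![Vec::new(); n + 1];
    for (i, &(a, b)) in edges.iter().enumerate() {
        adj[a].push((b, i));
        adj[b].push((a, i));
    }
    let mut used = vec![false; edges.len()];
    let mut ptr = vec![0; n + 1]; // next position to try in each adjacency list
    let mut stack = vec![1usize]; // the circuit begins at node 1
    let mut circuit = Vec::new();
    while !stack.is_empty() {
        let x = *stack.last().unwrap();
        // skip edges that are already part of the circuit
        while ptr[x] < adj[x].len() && used[adj[x][ptr[x]].1] {
            ptr[x] += 1;
        }
        if ptr[x] == adj[x].len() {
            // no unused edges leave x: the node is appended to the circuit
            circuit.push(x);
            stack.pop();
        } else {
            // walk along an unused edge
            let (y, i) = adj[x][ptr[x]];
            used[i] = true;
            stack.push(y);
        }
    }
    println!("Eulerian circuit: {circuit:?}");
}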


1 L. Euler studied such paths in 1736 when he solved the famous Königsberg bridge problem. This was the birth of graph theory. 2 The algorithm was published in 1873 after Hierholzer's death [35].

Hamiltonian paths

A Hamiltonian path is a path that visits each node of the graph exactly once. For example, the graph

contains a Hamiltonian path from node 1 to node 3:

If a Hamiltonian path begins and ends at the same node, it is called a Hamiltonian circuit. The graph above also has a Hamiltonian circuit that begins and ends at node 1:

Existence

No efficient method is known for testing if a graph contains a Hamiltonian path, and the problem is NP-hard. Still, in some special cases, we can be certain that a graph contains a Hamiltonian path.

A simple observation is that if the graph is complete, i.e., there is an edge between all pairs of nodes, it also contains a Hamiltonian path. Stronger results have also been achieved:

  • Dirac's theorem: If the degree of each node is at least $n/2$, the graph contains a Hamiltonian path.
  • Ore's theorem: If the sum of degrees of each non-adjacent pair of nodes is at least $n$, the graph contains a Hamiltonian path.

A common property in these theorems and other results is that they guarantee the existence of a Hamiltonian path if the graph has a large number of edges. This makes sense, because the more edges the graph contains, the more possibilities there are to construct a Hamiltonian path.

Construction

Since there is no efficient way to check if a Hamiltonian path exists, it is clear that there is also no method to efficiently construct the path, because otherwise we could just try to construct the path and see whether it exists.

A simple way to search for a Hamiltonian path is to use a backtracking algorithm that goes through all possible ways to construct the path. The time complexity of such an algorithm is at least $O(n!)$, because there are $n!$ different ways to choose the order of $n$ nodes.

A more efficient solution is based on dynamic programming (see Chapter 10.5). The idea is to calculate values of a function possible(S,x), where $S$ is a subset of nodes and $x$ is one of the nodes. The function indicates whether there is a Hamiltonian path that visits the nodes of $S$ and ends at node $x$. It is possible to implement this solution in $O(2^n n^2)$ time.
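
The dynamic programming solution can be sketched as follows, where possible[s][x] tells whether some path visits exactly the nodes of the bitmask $s$ and ends at node $x$; the small graph (with 0-indexed nodes for convenient bitmasks) is an assumption for illustration:

#![allow(unused)]
fn main() {
    // a small assumed graph with 4 nodes, given as an undirected edge list
    let n = 4;
    let edges = [(0,1), (1,2), (2,3), (3,0), (0,2)];
    let mut adj = vec![vec![false; n]; n];
    for &(a, b) in &edges {
        adj[a][b] = true;
        adj[b][a] = true;
    }
    // possible[s][x]: is there a path that visits exactly the nodes of s and ends at x?
    let mut possible = vec![vec![false; n]; 1 << n];
    for x in 0..n {
        possible[1 << x][x] = true; // a single node is a path by itself
    }
    for s in 1usize..(1 << n) {
        for x in 0..n {
            if (s & (1 << x)) == 0 || !possible[s][x] {continue}
            for y in 0..n {
                // extend the path with an edge x--y to a node y that is not yet in s
                if adj[x][y] && (s & (1 << y)) == 0 {
                    possible[s | (1 << y)][y] = true;
                }
            }
        }
    }
    let full = (1 << n) - 1;
    let exists = (0..n).any(|x| possible[full][x]);
    println!("Hamiltonian path exists: {exists}"); // true
}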

De Bruijn sequences

A De Bruijn sequence is a string that contains every string of length $n$ exactly once as a substring, for a fixed alphabet of $k$ characters. The length of such a string is $k^n+n-1$ characters. For example, when $n=3$ and $k=2$, an example of a De Bruijn sequence is

$$ 0001011100 $$

The substrings of this string are all combinations of three bits: 000, 001, 010, 011, 100, 101, 110 and 111.

It turns out that each De Bruijn sequence corresponds to an Eulerian path in a graph. The idea is to construct a graph where each node contains a string of $n-1$ characters and each edge adds one character to the string. The following graph corresponds to the above scenario:

An Eulerian path in this graph corresponds to a string that contains all strings of length $n$. The string contains the characters of the starting node and all characters of the edges. The starting node has $n-1$ characters and there are $k^n$ characters in the edges, so the length of the string is $k^n+n-1$.

Knight’s tours

A knight's tour is a sequence of moves of a knight on an $n \times n$ chessboard following the rules of chess such that the knight visits each square exactly once. A knight's tour is called a closed tour if the knight finally returns to the starting square and otherwise it is called an open tour.

For example, here is an open knight's tour on a $5 \times 5$ board:

A knight's tour corresponds to a Hamiltonian path in a graph whose nodes represent the squares of the board, and two nodes are connected with an edge if a knight can move between the squares according to the rules of chess.

A natural way to construct a knight's tour is to use backtracking. The search can be made more efficient by using heuristics that attempt to guide the knight so that a complete tour will be found quickly.

Warnsdorf's rule

Warnsdorf's rule is a simple and effective heuristic for finding a knight's tour1. Using the rule, it is possible to efficiently construct a tour even on a large board. The idea is to always move the knight so that it ends up in a square where the number of possible moves is as small as possible.

For example, in the following situation, there are five possible squares to which the knight can move (squares $a \ldots e$):


1 This heuristic was proposed in Warnsdorf's book [69] in 1823. There are also polynomial algorithms for finding knight's tours [52], but they are more complicated.

Flows and cuts

In this chapter, we focus on the following two problems:

  • Finding a maximum flow: What is the maximum amount of flow we can send from a node to another node?
  • Finding a minimum cut: What is a minimum-weight set of edges that separates two nodes of the graph?

The input for both these problems is a directed, weighted graph that contains two special nodes: the source is a node with no incoming edges, and the sink is a node with no outgoing edges.

As an example, we will use the following graph where node 1 is the source and node 6 is the sink:

Maximum Flow

In the maximum flow problem, our task is to send as much flow as possible from the source to the sink. The weight of each edge is a capacity that restricts the flow that can go through the edge. In each intermediate node, the incoming and outgoing flow has to be equal.

For example, the maximum size of a flow in the example graph is 7. The following picture shows how we can route the flow:

The notation $v/k$ means that a flow of $v$ units is routed through an edge whose capacity is $k$ units. The size of the flow is $7$, because the source sends $3+4$ units of flow and the sink receives $5+2$ units of flow. It is easy to see that this flow is maximum, because the total capacity of the edges leading to the sink is $7$.

Minimum Cut

In the minimum cut problem, our task is to remove a set of edges from the graph such that there will be no path from the source to the sink after the removal and the total weight of the removed edges is minimum.

The minimum size of a cut in the example graph is 7. It suffices to remove the edges $2 \rightarrow 3$ and $4 \rightarrow 5$:

After removing the edges, there will be no path from the source to the sink. The size of the cut is $7$, because the weights of the removed edges are $6$ and $1$. The cut is minimum, because there is no valid way to remove edges from the graph such that their total weight would be less than $7$.

It is not a coincidence that the maximum size of a flow and the minimum size of a cut are the same in the above example. It turns out that a maximum flow and a minimum cut are always equally large, so the concepts are two sides of the same coin.

Next we will discuss the Ford–Fulkerson algorithm that can be used to find the maximum flow and minimum cut of a graph. The algorithm also helps us to understand why they are equally large.

Ford–Fulkerson algorithm

The Ford–Fulkerson algorithm [25] finds the maximum flow in a graph. The algorithm begins with an empty flow, and at each step finds a path from the source to the sink that generates more flow. Finally, when the algorithm cannot increase the flow anymore, the maximum flow has been found.

The algorithm uses a special representation of the graph where each original edge has a reverse edge in another direction. The weight of each edge indicates how much more flow we could route through it. At the beginning of the algorithm, the weight of each original edge equals the capacity of the edge and the weight of each reverse edge is zero.

The new representation for the example graph is as follows:

Algorithm description

The Ford–Fulkerson algorithm consists of several rounds. On each round, the algorithm finds a path from the source to the sink such that each edge on the path has a positive weight. If there is more than one possible path available, we can choose any of them.

For example, suppose we choose the following path:

After choosing the path, the flow increases by $x$ units, where $x$ is the smallest edge weight on the path. In addition, the weight of each edge on the path decreases by $x$ and the weight of each reverse edge increases by $x$.

In the above path, the weights of the edges are 5, 6, 8 and 2. The smallest weight is 2, so the flow increases by 2 and the new graph is as follows:

The idea is that increasing the flow decreases the amount of flow that can go through the edges in the future. On the other hand, it is possible to cancel flow later using the reverse edges of the graph if it turns out that it would be beneficial to route the flow in another way.

The algorithm increases the flow as long as there is a path from the source to the sink through positive-weight edges. In the present example, our next path can be as follows:

The minimum edge weight on this path is 3, so the path increases the flow by 3, and the total flow after processing the path is 5.

The new graph will be as follows:

We still need two more rounds before reaching the maximum flow. For example, we can choose the paths $1 \rightarrow 2 \rightarrow 3 \rightarrow 6$ and $1 \rightarrow 4 \rightarrow 5 \rightarrow 3 \rightarrow 6$. Both paths increase the flow by 1, and the final graph is as follows:

It is not possible to increase the flow anymore, because there is no path from the source to the sink with positive edge weights. Hence, the algorithm terminates and the maximum flow is 7.

Finding paths

The Ford–Fulkerson algorithm does not specify how we should choose the paths that increase the flow. In any case, the algorithm will terminate sooner or later and correctly find the maximum flow. However, the efficiency of the algorithm depends on the way the paths are chosen.

A simple way to find paths is to use depth-first search. Usually, this works well, but in the worst case, each path only increases the flow by 1 and the algorithm is slow. Fortunately, we can avoid this situation by using one of the following techniques:

The Edmonds–Karp algorithm [18] chooses each path so that the number of edges on the path is as small as possible. This can be done by using breadth-first search instead of depth-first search for finding paths. It can be proven that this guarantees that the flow increases quickly, and the time complexity of the algorithm is $O(m^2 n)$.

The scaling algorithm [2] uses depth-first search to find paths where each edge weight is at least a threshold value. Initially, the threshold value is some large number, for example the sum of all edge weights of the graph. Always when a path cannot be found, the threshold value is divided by 2. The time complexity of the algorithm is $O(m^2 \log c)$, where $c$ is the initial threshold value.

In practice, the scaling algorithm is easier to implement, because depth-first search can be used for finding paths. Both algorithms are efficient enough for problems that typically appear in programming contests.
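
As a minimal sketch, the following code implements the Ford–Fulkerson method with depth-first search on a matrix of remaining capacities. The capacities are an assumption, reconstructed from the flows and cuts described in this chapter:

#![allow(unused)]
fn main() {
    // depth-first search for a path of positive-weight edges from x to the sink;
    // returns how much flow can be pushed along the path (0 if there is no path)
    fn dfs(x: usize, sink: usize, pushed: i64, cap: &mut [Vec<i64>], visited: &mut [bool]) -> i64 {
        if x == sink {return pushed}
        visited[x] = true;
        for y in 1..cap.len() {
            if !visited[y] && cap[x][y] > 0 {
                let f = dfs(y, sink, pushed.min(cap[x][y]), cap, visited);
                if f > 0 {
                    cap[x][y] -= f; // less flow can go through this edge in the future
                    cap[y][x] += f; // the reverse edge can be used to cancel flow later
                    return f;
                }
            }
        }
        0
    }
    // assumed capacities of the example graph (source 1, sink 6)
    let n = 6;
    let mut cap = vec![vec![0i64; n + 1]; n + 1];
    for &(a, b, w) in &[(1,2,5), (1,4,4), (4,2,3), (2,3,6), (3,5,8), (4,5,1), (3,6,5), (5,6,2)] {
        cap[a][b] = w;
    }
    let (source, sink) = (1, 6);
    let mut flow = 0;
    loop {
        let mut visited = vec![false; n + 1];
        let f = dfs(source, sink, i64::MAX, &mut cap, &mut visited);
        if f == 0 {break}
        flow += f;
    }
    println!("maximum flow: {flow}"); // 7
}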

Minimum Cuts

It turns out that once the Ford–Fulkerson algorithm has found a maximum flow, it has also determined a minimum cut. Let $A$ be the set of nodes that can be reached from the source using positive-weight edges. In the example graph, $A$ contains nodes 1, 2 and 4:

Now the minimum cut consists of the edges of the original graph that start at some node in $A$, end at some node outside $A$, and whose capacity is fully used in the maximum flow. In the above graph, such edges are $2 \rightarrow 3$ and $4 \rightarrow 5$, and the size of the minimum cut is $6+1=7$.

Why is the flow produced by the algorithm maximum and why is the cut minimum? The reason is that a graph cannot contain a flow whose size is larger than the weight of any cut of the graph. Hence, always when a flow and a cut are equally large, they are a maximum flow and a minimum cut.

Let us consider any cut of the graph such that the source belongs to $A$, the sink belongs to $B$ and there are some edges between the sets:

The size of the cut is the sum of the edges that go from $A$ to $B$. This is an upper bound for the flow in the graph, because the flow has to proceed from $A$ to $B$. Thus, the size of a maximum flow is smaller than or equal to the size of any cut in the graph.

On the other hand, the Ford–Fulkerson algorithm produces a flow whose size is exactly as large as the size of a cut in the graph. Thus, the flow has to be a maximum flow and the cut has to be a minimum cut.

Disjoint paths

Many graph problems can be solved by reducing them to the maximum flow problem. Our first example of such a problem is as follows: we are given a directed graph with a source and a sink, and our task is to find the maximum number of disjoint paths from the source to the sink.

Edge-disjoint paths

We will first focus on the problem of finding the maximum number of edge-disjoint paths from the source to the sink. This means that we should construct a set of paths such that each edge appears in at most one path.

For example, consider the following graph:

In this graph, the maximum number of edge-disjoint paths is 2. We can choose the paths $1 \rightarrow 2 \rightarrow 4 \rightarrow 3 \rightarrow 6$ and $1 \rightarrow 4 \rightarrow 5 \rightarrow 6$ as follows:

It turns out that the maximum number of edge-disjoint paths equals the maximum flow of the graph, assuming that the capacity of each edge is one. After the maximum flow has been constructed, the edge-disjoint paths can be found greedily by following paths from the source to the sink.

Node-disjoint paths

Let us now consider another problem: finding the maximum number of node-disjoint paths from the source to the sink. In this problem, every node, except for the source and sink, may appear in at most one path. The number of node-disjoint paths may be smaller than the number of edge-disjoint paths.

For example, in the previous graph, the maximum number of node-disjoint paths is 1:

We can also reduce this problem to the maximum flow problem. Since each node can appear in at most one path, we have to limit the flow that goes through the nodes. A standard method for this is to divide each node into two nodes such that the first node has the incoming edges of the original node, the second node has the outgoing edges of the original node, and there is a new edge from the first node to the second node.

In our example, the graph becomes as follows:

The maximum flow for the graph is as follows:

Thus, the maximum number of node-disjoint paths from the source to the sink is 1.

Maximum matchings

The maximum matching problem asks to find a maximum-size set of node pairs in an undirected graph such that each pair is connected with an edge and each node belongs to at most one pair.

There are polynomial algorithms for finding maximum matchings in general graphs [17], but such algorithms are complex and rarely seen in programming contests. However, in bipartite graphs, the maximum matching problem is much easier to solve, because we can reduce it to the maximum flow problem.

Finding maximum matchings

The nodes of a bipartite graph can always be divided into two groups such that all edges of the graph go from the left group to the right group. For example, in the following bipartite graph, the groups are $\{1,2,3,4\}$ and $\{5,6,7,8\}$.

The size of a maximum matching of this graph is 3:

We can reduce the bipartite maximum matching problem to the maximum flow problem by adding two new nodes to the graph: a source and a sink. We also add edges from the source to each left node and from each right node to the sink. After this, the size of a maximum flow in the graph equals the size of a maximum matching in the original graph.

The maximum flow of this graph is as follows:
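As an illustrative sketch, the matching can also be computed by searching for augmenting paths directly in the bipartite graph, which is equivalent to running the unit-capacity maximum flow with depth-first search. The adjacency-list representation and the function names below are assumptions made for this example, not part of the text.

#![allow(unused)]
fn main() {
// A sketch of bipartite maximum matching with augmenting paths
// (equivalent to unit-capacity maximum flow found with depth-first search).
// adj[u] lists the right nodes adjacent to left node u.
fn augment(adj: &Vec<Vec<usize>>, matched: &mut Vec<Option<usize>>, used: &mut Vec<bool>, u: usize) -> bool {
    for &v in &adj[u] {
        if used[v] {
            continue;
        }
        used[v] = true;
        // Either v is still free, or the left node matched to v can be re-matched elsewhere.
        let prev = matched[v];
        if prev.is_none() || augment(adj, matched, used, prev.unwrap()) {
            matched[v] = Some(u);
            return true;
        }
    }
    false
}

// A small example graph with 4 left nodes and 4 right nodes.
let adj = vec![
    vec![0, 1], // left 0 is connected to right 0 and right 1
    vec![2],    // left 1 is connected to right 2
    vec![0, 3], // left 2 is connected to right 0 and right 3
    vec![2],    // left 3 is connected to right 2
];
let right = 4;
let mut matched: Vec<Option<usize>> = vec![None; right];
let mut matching = 0;
for u in 0..adj.len() {
    let mut used = vec![false; right];
    if augment(&adj, &mut matched, &mut used, u) {
        matching += 1;
    }
}
println!("maximum matching size: {}", matching); // 3
}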

Hall's theorem

Hall's theorem can be used to find out whether a bipartite graph has a matching that contains all left or right nodes. If the number of left and right nodes is the same, Hall's theorem tells us whether it is possible to construct a perfect matching that contains all nodes of the graph.

Assume that we want to find a matching that contains all left nodes. Let $X$ be any set of left nodes and let $f(X)$ be the set of their neighbors. According to Hall's theorem, a matching that contains all left nodes exists exactly when for each $X$, the condition $|X| \le |f(X)|$ holds.

Let us study Hall's theorem in the example graph. First, let $X=\{1,3\}$ which yields $f(X)=\{5,6,8\}$:

The condition of Hall's theorem holds, because $|X|=2$ and $|f(X)|=3$. Next, let $X=\{2,4\}$ which yields $f(X)=\{7\}$:

In this case, $|X|=2$ and $|f(X)|=1$, so the condition of Hall's theorem does not hold. This means that it is not possible to form a perfect matching for the graph. This result is not surprising, because we already know that the maximum matching of the graph is 3 and not 4.

If the condition of Hall's theorem does not hold, the set $X$ provides an explanation why we cannot form such a matching. Since $X$ contains more nodes than $f(X)$, there are no pairs for all nodes in $X$. For example, in the above graph, both nodes 2 and 4 should be connected with node 7 which is not possible.

Kőnig's theorem

A minimum node cover of a graph is a minimum set of nodes such that each edge of the graph has at least one endpoint in the set. In a general graph, finding a minimum node cover is an NP-hard problem. However, if the graph is bipartite, Kőnig's theorem tells us that the size of a minimum node cover and the size of a maximum matching are always equal. Thus, we can calculate the size of a minimum node cover using a maximum flow algorithm.

Let us consider the following graph with a maximum matching of size 3:

Now Kőnig's theorem tells us that the size of a minimum node cover is also 3. Such a cover can be constructed as follows:

The nodes that do not belong to a minimum node cover form a maximum independent set. This is the largest possible set of nodes such that no two nodes in the set are connected with an edge. Once again, finding a maximum independent set in a general graph is an NP-hard problem, but in a bipartite graph we can use Kőnig's theorem to solve the problem efficiently. In the example graph, the maximum independent set is as follows:

Path covers

A path cover is a set of paths in a graph such that each node of the graph belongs to at least one path. It turns out that in directed, acyclic graphs, we can reduce the problem of finding a minimum path cover to the problem of finding a maximum flow in another graph.

Node-disjoint path cover

In a node-disjoint path cover, each node belongs to exactly one path. As an example, consider the following graph:

A minimum node-disjoint path cover of this graph consists of three paths. For example, we can choose the following paths:

Note that one of the paths only contains node 2, so it is possible that a path does not contain any edges.

We can find a minimum node-disjoint path cover by constructing a matching graph where each node of the original graph is represented by two nodes: a left node and a right node. There is an edge from a left node to a right node if there is such an edge in the original graph. In addition, the matching graph contains a source and a sink, and there are edges from the source to all left nodes and from all right nodes to the sink.

A maximum matching in the resulting graph corresponds to a minimum node-disjoint path cover in the original graph. For example, the following matching graph for the above graph contains a maximum matching of size 4:

Each edge in the maximum matching of the matching graph corresponds to an edge in the minimum node-disjoint path cover of the original graph. Thus, the size of the minimum node-disjoint path cover is $n-c$, where $n$ is the number of nodes in the original graph and $c$ is the size of the maximum matching.

General path cover

A general path cover is a path cover where a node can belong to more than one path. A minimum general path cover may be smaller than a minimum node-disjoint path cover, because a node can be used multiple times in paths. Consider again the following graph:

The minimum general path cover of this graph consists of two paths. For example, the first path may be as follows:

And the second path may be as follows:

A minimum general path cover can be found almost like a minimum node-disjoint path cover. It suffices to add some new edges to the matching graph so that there is an edge $a \rightarrow b$ always when there is a path from $a$ to $b$ in the original graph (possibly through several edges).

The matching graph for the above graph is as follows:

Dilworth's theorem

An antichain is a set of nodes of a graph such that there is no path from any node to another node using the edges of the graph. Dilworth's theorem states that in a directed acyclic graph, the size of a minimum general path cover equals the size of a maximum antichain. For example, nodes 3 and 7 form an antichain in the following graph:

This is a maximum antichain, because it is not possible to construct any antichain that would contain three nodes. We have seen before that a minimum general path cover of this graph consists of two paths.

Advanced topics

Number theory

Number theory is a branch of mathematics that studies integers. Number theory is a fascinating field, because many questions involving integers are very difficult to solve even if they seem simple at first glance.

As an example, consider the following equation: \[x^3 + y^3 + z^3 = 33\] It is easy to find three real numbers $x$, $y$ and $z$ that satisfy the equation. For example, we can choose \[ \begin{array}{lcl} x = 3, \\ y = \sqrt[3]{3}, \\ z = \sqrt[3]{3}.\\ \end{array} \] However, for a long time it was an open problem in number theory whether there are three integers $x$, $y$ and $z$ that satisfy the equation [6]; an integer solution was found only in 2019.

In this chapter, we will focus on basic concepts and algorithms in number theory. Throughout the chapter, we will assume that all numbers are integers, if not otherwise stated.

Primes and factors

A number $n>1$ is a prime if its only positive factors are 1 and $n$. For example, 7, 19 and 41 are primes, but 35 is not a prime, because $5 \cdot 7 = 35$. For every number $n>1$, there is a unique prime factorization \[ n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k},\] where $p_1,p_2,\ldots,p_k$ are distinct primes and $\alpha_1,\alpha_2,\ldots,\alpha_k$ are positive integers. For example, the prime factorization for 84 is \[ 84 = 2^2 \cdot 3^1 \cdot 7^1 \]

The number of factors of a number $n$ is

\[ \tau(n)=\prod_{i=1}^k (\alpha_i+1) \]

because for each prime $p_i$, there are $\alpha_i+1$ ways to choose how many times it appears in a factor. For example, the number of factors of 84 is $\tau(84)=3 \cdot 2 \cdot 2 = 12$. The factors are

$$1, 2, 3, 4, 6, 7, 12, 14, 21, 28, 42, 84$$

The sum of factors of $n$ is \[ \sigma(n)=\prod_{i=1}^k (1+p_i+\ldots+p_i^{\alpha_i}) = \prod_{i=1}^k \frac{p_i^{\alpha_i+1}-1}{p_i-1} \] where the latter formula is based on the geometric progression formula. For example, the sum of factors of 84 is \[ \sigma(84)=\frac{2^3-1}{2-1} \cdot \frac{3^2-1}{3-1} \cdot \frac{7^2-1}{7-1} = 7 \cdot 4 \cdot 8 = 224 \]

The product of factors of $n$ is \[ \mu(n)=n^{\tau(n)/2} \] because we can form $\tau(n)/2$ pairs from the factors, each with product $n$. For example, the factors of 84 produce the pairs $1 \cdot 84$, $2 \cdot 42$, $3 \cdot 28$, etc., and the product of the factors is $\mu(84)=84^6=351298031616$.
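All three quantities can be computed directly from the exponents of the prime factorization. The following is a sketch only: representing the factorization as (prime, exponent) pairs and the function name factor_stats are assumptions made for this example.

#![allow(unused)]
fn main() {
// A sketch: compute the number, sum and product of factors of n
// from its prime factorization, given as (prime, exponent) pairs.
fn factor_stats(factorization: &[(u64, u32)]) -> (u64, u64, u64) {
    let n: u64 = factorization.iter().map(|&(p, a)| p.pow(a)).product();
    let tau: u64 = factorization.iter().map(|&(_, a)| (a as u64) + 1).product();
    let sigma: u64 = factorization
        .iter()
        .map(|&(p, a)| (p.pow(a + 1) - 1) / (p - 1))
        .product();
    let mu = n.pow(tau as u32 / 2); // integer exponent is valid when tau is even; 84 has 12 factors
    (tau, sigma, mu)
}

// 84 = 2^2 * 3^1 * 7^1
let (tau, sigma, mu) = factor_stats(&[(2, 2), (3, 1), (7, 1)]);
println!("tau = {}, sigma = {}, mu = {}", tau, sigma, mu); // 12, 224, 351298031616
}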

A number $n$ is called a perfect number if $n=\sigma(n)-n$, i.e., $n$ equals the sum of its factors between $1$ and $n-1$. For example, 28 is a perfect number, because $28=1+2+4+7+14$.

Number of primes

It is easy to show that there is an infinite number of primes. If the number of primes were finite, we could construct a set $P=\{p_1,p_2,\ldots,p_n\}$ that would contain all the primes. For example, $p_1=2$, $p_2=3$, $p_3=5$, and so on. However, the number \[p_1 p_2 \cdots p_n+1\] is not divisible by any prime in $P$, so it is either a new prime or has a prime factor that is not in $P$. This is a contradiction, and the number of primes has to be infinite.

Density of primes

The density of primes means how often there are primes among the numbers. Let $\pi(n)$ denote the number of primes between $1$ and $n$. For example, $\pi(10)=4$, because there are 4 primes between $1$ and $10$: 2, 3, 5 and 7.

It is possible to show that

\[ \pi(n) \approx \frac{n}{\ln n} \]

which means that primes are quite frequent. For example, the number of primes between $1$ and $10^6$ is $\pi(10^6)=78498$, and $10^6 / \ln 10^6 \approx 72382$.

Conjectures

There are many conjectures involving primes. Most people think that the conjectures are true, but nobody has been able to prove them. For example, the following conjectures are famous:

  • Goldbach's conjecture: Each even integer $n>2$ can be represented as a sum $n=a+b$ so that both $a$ and $b$ are primes.
  • Twin prime conjecture: There is an infinite number of pairs of the form $\{p,p+2\}$, where both $p$ and $p+2$ are primes.
  • Legendre's conjecture: There is always a prime between numbers $n^2$ and $(n+1)^2$, where $n$ is any positive integer.

Basic algorithms

If a number $n$ is not prime, it can be represented as a product $a \cdot b$, where $a \le \sqrt n$ or $b \le \sqrt n$, so it certainly has a factor between $2$ and $\lfloor \sqrt n \rfloor$. Using this observation, we can both test if a number is prime and find the prime factorization of a number in $O(\sqrt n)$ time.

The following function prime checks whether the given number $n$ is prime. The function attempts to divide $n$ by all numbers between $2$ and $\lfloor \sqrt n \rfloor$:

#![allow(unused)]
fn main() {
fn prime(n: usize) -> bool {
    if n < 2 {
        return false;
    }
    let mut x = 2;
    while x * x <= n {
        if n % x == 0 {
            return false;
        }
        x += 1;
    }
    true
}
}

The following function factors constructs a vector that contains the prime factorization of $n$. The function divides $n$ by its prime factors and adds them to the vector. The process ends when the remaining number $n$ has no factors between $2$ and $\lfloor \sqrt n \rfloor$. If $n>1$ after this, it is itself a prime and becomes the last factor.

#![allow(unused)]
fn main() {
println!("n = 100 -> {:?}", factors(100));
fn factors(mut n: isize) -> Vec<isize>{
    let mut f = Vec::new();
    let mut x = 2;
    while x*x <= n {
        while n%x == 0 {
            f.push(x);
            n/=x;
        }
        x+=1;
    }
    if n > 1 { f.push(n) }
    f
}
}

Note that each prime factor appears in the vector as many times as it divides the number. For example, $24=2^3 \cdot 3$, so the result of the function is [2,2,2,3].

Sieve of Eratosthenes

The sieve of Eratosthenes is a preprocessing algorithm that builds an array using which we can efficiently check if a given number between $2 \ldots n$ is prime and, if it is not, find one prime factor of the number.

The algorithm builds an array sieve whose positions $2,3,\ldots,n$ are used. The value sieve[k]=0 means that $k$ is prime, and the value sieve[k] != 0 means that $k$ is not a prime and one of its prime factors is sieve[k].

The algorithm iterates through the numbers $2 \ldots n$ one by one. Always when a new prime $x$ is found, the algorithm records that the multiples of $x$ ($2x,3x,4x,\ldots$) are not primes, because the number $x$ divides them.

For example, if $n=20$, the array is as follows:

The following code implements the sieve of Eratosthenes. The code assumes that each element of sieve is initially zero.

#![allow(unused)]
fn main() {
let n = 20;
let mut sieve = vec![0_usize; n + 1];
for x in 2..=n {
    if sieve[x] == 0 {
        let mut u = 2 * x;
        while u <= n {
            sieve[u] = x;
            u += x;
        }
    }
}
println!("{:?}", &sieve[2..]);
}

The inner loop of the algorithm is executed $n/x$ times for each value of $x$. Thus, an upper bound for the running time of the algorithm is the harmonic sum \[ \sum_{x=2}^n n/x = n/2 + n/3 + n/4 + \cdots + n/n = O(n \log n) \]

In fact, the algorithm is more efficient, because the inner loop will be executed only if the number $x$ is prime. It can be shown that the running time of the algorithm is only $O(n \log \log n)$, a complexity very near to $O(n)$.
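Once the array has been built, any number in the range $2 \ldots n$ can be factorized quickly by repeatedly dividing it by the stored prime factor. The following sketch rebuilds the sieve and adds an illustrative factorize function; the names are not from the original text.

#![allow(unused)]
fn main() {
// A sketch: factorize numbers in 2..=n using the prime factor stored in the sieve.
let n = 20;
let mut sieve = vec![0_usize; n + 1];
for x in 2..=n {
    if sieve[x] == 0 {
        let mut u = 2 * x;
        while u <= n {
            sieve[u] = x;
            u += x;
        }
    }
}

// Follow the stored factors; each step removes one prime factor of k.
fn factorize(sieve: &[usize], mut k: usize) -> Vec<usize> {
    let mut f = Vec::new();
    while sieve[k] != 0 {
        f.push(sieve[k]);
        k /= sieve[k];
    }
    f.push(k); // the remaining k is prime
    f
}

println!("{:?}", factorize(&sieve, 18)); // [3, 3, 2], since 18 = 3 * 3 * 2
}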

Euclid's algorithm

The greatest common divisor of numbers $a$ and $b$, $\gcd(a,b)$, is the greatest number that divides both $a$ and $b$, and the least common multiple of $a$ and $b$, $\textrm{lcm}(a,b)$, is the smallest number that is divisible by both $a$ and $b$. For example, $\gcd(24,36)=12$ and $\textrm{lcm}(24,36)=72$.

The greatest common divisor and the least common multiple are connected as follows: \[ \textrm{lcm}(a,b)=\frac{ab}{\textrm{gcd}(a,b)} \]

Euclid's algorithm1 provides an efficient way to find the greatest common divisor of two numbers. The algorithm is based on the following formula:

\[ \textrm{gcd}(a,b) = \begin{cases} a & b = 0\\ \textrm{gcd}(b,a \bmod b) & b \neq 0 \end{cases} \]

For example, \[ \textrm{gcd}(24,36) = \textrm{gcd}(36,24) = \textrm{gcd}(24,12) = \textrm{gcd}(12,0)=12 \]

The algorithm can be implemented as follows:

#![allow(unused)]
fn main() {
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { return a }
    gcd(b, a % b)
}
println!("gcd between 100 and 85 is {:?}", gcd(100, 85));
}

It can be shown that Euclid's algorithm works in $O(\log n)$ time, where $n=\min(a,b)$. The worst case for the algorithm is the case when $a$ and $b$ are consecutive Fibonacci numbers. For example, \[ \textrm{gcd}(13,8)=\textrm{gcd}(8,5) =\textrm{gcd}(5,3)=\textrm{gcd}(3,2)=\textrm{gcd}(2,1)=\textrm{gcd}(1,0)=1 \]
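Using the formula above, the least common multiple can be computed with the same gcd function; dividing before multiplying keeps the intermediate value small. A small sketch (the gcd function is repeated here so that the example is self-contained):

#![allow(unused)]
fn main() {
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { return a }
    gcd(b, a % b)
}

// lcm(a,b) = a / gcd(a,b) * b; dividing first avoids an unnecessarily large intermediate value.
fn lcm(a: usize, b: usize) -> usize {
    a / gcd(a, b) * b
}

println!("lcm(24, 36) = {}", lcm(24, 36)); // 72
}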

Euler's totient function

Numbers $a$ and $b$ are coprime if $\textrm{gcd}(a,b)=1$. Euler's totient function $\varphi(n)$ gives the number of integers between $1$ and $n$ that are coprime to $n$. For example, $\varphi(12)=4$, because 1, 5, 7 and 11 are coprime to 12.

The value of $\varphi(n)$ can be calculated from the prime factorization of $n$ using the formula

\[ \varphi(n) = \prod_{i=1}^k p_i^{\alpha_i-1}(p_i-1) \]

For example, $\varphi(12)=2^1 \cdot (2-1) \cdot 3^0 \cdot (3-1)=4$. Note that $\varphi(n)=n-1$ if $n$ is prime.
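The value of $\varphi(n)$ can be computed in $O(\sqrt n)$ time by combining this formula with trial division. The following is an illustrative sketch; the function name is a choice made for this example.

#![allow(unused)]
fn main() {
// A sketch: compute Euler's totient function by trial division in O(sqrt(n)) time.
fn phi(mut n: u64) -> u64 {
    let mut result = n;
    let mut p = 2;
    while p * p <= n {
        if n % p == 0 {
            // p is a prime factor: remove it from n and multiply the result by (1 - 1/p).
            while n % p == 0 { n /= p; }
            result = result / p * (p - 1);
        }
        p += 1;
    }
    if n > 1 {
        // The remaining n is a prime factor.
        result = result / n * (n - 1);
    }
    result
}

println!("phi(12) = {}", phi(12)); // 4
}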


1 Euclid was a Greek mathematician who lived in about 300 BC. This is perhaps the first known algorithm in history.

Modular arithmetic

In modular arithmetic, the set of numbers is limited so that only numbers $0,1,2,\ldots,m-1$ are used, where $m$ is a constant. Each number $x$ is represented by the number $x \bmod m$: the remainder after dividing $x$ by $m$. For example, if $m=17$, then $75$ is represented by $75 \bmod 17 = 7$.

Often we can take remainders before doing calculations. In particular, the following formulas hold: \[ \begin{array}{rcl} (x+y) \bmod m & = & (x \bmod m + y \bmod m) \bmod m \\ (x-y) \bmod m & = & (x \bmod m - y \bmod m) \bmod m \\ (x \cdot y) \bmod m & = & (x \bmod m \cdot y \bmod m) \bmod m \\ x^n \bmod m & = & (x \bmod m)^n \bmod m \\ \end{array} \]

Modular exponentiation

There is often a need to efficiently calculate the value of $x^n \bmod m$. This can be done in $O(\log n)$ time using the following recursion:

\[ x^n = \begin{cases} 1 & n = 0 \\ x^{n/2} \cdot x^{n/2} & \text{$n$ is even} \\ x^{n-1} \cdot x & \text{$n$ is odd} \\ \end{cases} \]

It is important that in the case of an even $n$, the value of $x^{n/2}$ is calculated only once. This guarantees that the time complexity of the algorithm is $O(\log n)$, because $n$ is always halved when it is even.

The following function calculates the value of $x^n \bmod m$:

#![allow(unused)]
fn main() {
println!("{:?}", modpow(5, 4, 2));
fn modpow(x: usize, n:usize, m:usize) -> usize{
    if n == 0 {return 1%m}
    let mut u = modpow(x, n/2, m);
    u = (u*u)%m;
    if n%2 == 1 {u = (u*x)%m}
    u
}
}

Fermat's theorem and Euler's theorem

Fermat's theorem states that \[x^{m-1} \bmod m = 1\] when $m$ is prime and $x$ and $m$ are coprime. This also yields \[x^k \bmod m = x^{k \bmod (m-1)} \bmod m.\] More generally, Euler's theorem states that \[x^{\varphi(m)} \bmod m = 1\] when $x$ and $m$ are coprime. Fermat's theorem follows from Euler's theorem, because if $m$ is a prime, then $\varphi(m)=m-1$.

Modular inverse

The inverse of $x$ modulo $m$ is a number $x^{-1}$ such that \[ x x^{-1} \bmod m = 1. \] For example, if $x=6$ and $m=17$, then $x^{-1}=3$, because $6\cdot3 \bmod 17=1$.

Using modular inverses, we can divide numbers modulo $m$, because division by $x$ corresponds to multiplication by $x^{-1}$. For example, to evaluate the value of $36/6 \bmod 17$, we can use the formula $2 \cdot 3 \bmod 17$, because $36 \bmod 17 = 2$ and $6^{-1} \bmod 17 = 3$.

However, a modular inverse does not always exist. For example, if $x=2$ and $m=4$, the equation \[ x x^{-1} \bmod m = 1 \] cannot be solved, because all multiples of 2 are even and the remainder can never be 1 when $m=4$. It turns out that the value of $x^{-1} \bmod m$ can be calculated exactly when $x$ and $m$ are coprime.

If a modular inverse exists, it can be calculated using the formula \[ x^{-1} = x^{\varphi(m)-1}. \]

If $m$ is prime, the formula becomes

\[ x^{-1} = x^{m-2}. \]

For example,

\[6^{-1} \bmod 17 =6^{17-2} \bmod 17 = 3.\]

This formula allows us to efficiently calculate modular inverses using the modular exponentiation algorithm. The formula can be derived using Euler's theorem. First, the modular inverse should satisfy the following equation:

\[ x x^{-1} \bmod m = 1. \]

On the other hand, according to Euler's theorem,

\[ x^{\varphi(m)} \bmod m = xx^{\varphi(m)-1} \bmod m = 1, \]

so the numbers $x^{-1}$ and $x^{\varphi(m)-1}$ are equal.
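For a prime modulus, the inverse can thus be computed with the modular exponentiation function shown earlier. The following self-contained sketch repeats modpow, and the helper name mod_inverse is illustrative.

#![allow(unused)]
fn main() {
fn modpow(x: usize, n: usize, m: usize) -> usize {
    if n == 0 { return 1 % m }
    let mut u = modpow(x, n / 2, m);
    u = (u * u) % m;
    if n % 2 == 1 { u = (u * x) % m }
    u
}

// When m is prime and x is not divisible by m, the inverse is x^(m-2) mod m.
fn mod_inverse(x: usize, m: usize) -> usize {
    modpow(x, m - 2, m)
}

println!("{}", mod_inverse(6, 17)); // 3, because 6 * 3 mod 17 = 1
}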

Computer arithmetic

In Rust, an arithmetic operation whose result exceeds the range of the integer type causes an overflow. In debug builds the overflow is detected and the program panics, while in release builds the value silently wraps around by default, so overflow should be handled explicitly rather than relied upon.

For example, in Rust, if you have a u32 variable with a value of $123456789$, and you multiply it by itself, the result will overflow because it exceeds the maximum value that can be represented by a u32. The calculation would be:

#![allow(unused)]
fn main() {
let x = 123456789_u32;
println!("{}", x*x); // overflow
}

Rust provides several ways to handle integer overflow. One common approach is to use the wrapping_* methods, which perform arithmetic operations while wrapping around on overflow. Here is an example of using wrapping_mul to perform the multiplication; the result is $123456789^2 \bmod 2^{32} = 2537071545$.

#![allow(unused)]
fn main() {
let x = 123456789_u32;
println!("{}", x.wrapping_mul(x)); // 2537071545
}
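When the exact product (or its remainder modulo some number) is needed instead of wrap-around behavior, another common approach is to widen the operands to a larger type before multiplying. This is only a sketch; the modulus $10^9+7$ below is a typical example value, not something prescribed by the text.

#![allow(unused)]
fn main() {
let x = 123456789_u32;

// Widen to u64 before multiplying: the product of two u32 values always fits in a u64.
let exact = x as u64 * x as u64;
println!("{}", exact); // 15241578750190521

// Typical competitive-programming use: keep the value reduced modulo m.
let m = 1_000_000_007_u64;
println!("{}", exact % m); // the remainder modulo 10^9 + 7
}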

Solving equations

Diophantine equations

A Diophantine equation is an equation of the form

\[ ax + by = c, \]

where $a$, $b$ and $c$ are constants and the values of $x$ and $y$ should be found. Each number in the equation has to be an integer. For example, one solution for the equation $5x+2y=11$ is $x=3$ and $y=-2$.

We can efficiently solve a Diophantine equation by using Euclid's algorithm. It turns out that we can extend Euclid's algorithm so that it will find numbers $x$ and $y$ that satisfy the following equation: \[ ax + by = \textrm{gcd}(a,b) \]

A Diophantine equation can be solved if $c$ is divisible by $\textrm{gcd}(a,b)$, and otherwise it cannot be solved.

As an example, let us find numbers $x$ and $y$ that satisfy the following equation: \[ 39x + 15y = 12 \] The equation can be solved, because $\textrm{gcd}(39,15)=3$ and $3 \mid 12$. When Euclid's algorithm calculates the greatest common divisor of 39 and 15, it produces the following sequence of function calls:

\[ \textrm{gcd}(39,15) = \textrm{gcd}(15,9) = \textrm{gcd}(9,6) = \textrm{gcd}(6,3) = \textrm{gcd}(3,0) = 3 \]

This corresponds to the following equations:

\[ \begin{array}{lcl} 39 - 2 \cdot 15 & = & 9 \\ 15 - 1 \cdot 9 & = & 6 \\ 9 - 1 \cdot 6 & = & 3 \\ \end{array} \]

Using these equations, we can derive

\[ 39 \cdot 2 + 15 \cdot (-5) = 3 \]

and by multiplying this by 4, the result is

\[ 39 \cdot 8 + 15 \cdot (-20) = 12, \]

so a solution to the equation is $x=8$ and $y=-20$.
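The same coefficients can be produced by the extended Euclidean algorithm, which returns, in addition to $\textrm{gcd}(a,b)$, numbers $x$ and $y$ with $ax+by=\textrm{gcd}(a,b)$. The function name and the signed integer types in the following sketch are choices made for this example.

#![allow(unused)]
fn main() {
// A sketch of the extended Euclidean algorithm:
// returns (g, x, y) such that a*x + b*y = g = gcd(a, b).
fn extended_gcd(a: i64, b: i64) -> (i64, i64, i64) {
    if b == 0 {
        return (a, 1, 0);
    }
    let (g, x, y) = extended_gcd(b, a % b);
    // b*x + (a mod b)*y = g  =>  a*y + b*(x - (a/b)*y) = g
    (g, y, x - (a / b) * y)
}

let (g, x, y) = extended_gcd(39, 15);
println!("gcd = {}, x = {}, y = {}", g, x, y); // gcd = 3, x = 2, y = -5
// To solve 39x + 15y = 12, multiply the coefficients by 12 / gcd = 4.
println!("one solution: x = {}, y = {}", x * 4, y * 4); // x = 8, y = -20
}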

A solution to a Diophantine equation is not unique, because we can form an infinite number of solutions if we know one solution. If a pair $(x,y)$ is a solution, then also all pairs

\[(x+\frac{kb}{\textrm{gcd}(a,b)},y-\frac{ka}{\textrm{gcd}(a,b)})\]

are solutions, where $k$ is any integer.

Chinese remainder theorem

The Chinese remainder theorem solves a group of equations of the form

\[ \begin{array}{lcl} x & = & a_1 \bmod m_1 \\ x & = & a_2 \bmod m_2 \\ \cdots \\ x & = & a_n \bmod m_n \\ \end{array} \]

where the moduli $m_1,m_2,\ldots,m_n$ are pairwise coprime.

Let $x^{-1}_m$ be the inverse of $x$ modulo $m$, and

\[ X_k = \frac{m_1 m_2 \cdots m_n}{m_k}.\]

Using this notation, a solution to the equations is

\[ x = a_1 X_1 {X_1}^{-1}_{m_1} + a_2 X_2 {X_2}^{-1}_{m_2} + \cdots + a_n X_n {X_n}^{-1}_{m_n} \]

In this solution, for each $k=1,2,\ldots,n$,

\[a_k X_k {X_k}^{-1}_{m_k} \bmod m_k = a_k\]

because

\[X_k {X_k}^{-1}_{m_k} \bmod m_k = 1\]

Since all other terms in the sum are divisible by $m_k$, they have no effect on the remainder, and $x \bmod m_k = a_k$.

For example, a solution for

\[ \begin{array}{lcl} x & = & 3 \bmod 5 \\ x & = & 4 \bmod 7 \\ x & = & 2 \bmod 3 \\ \end{array} \]

is

\[ 3 \cdot 21 \cdot 1 + 4 \cdot 15 \cdot 1 + 2 \cdot 35 \cdot 2 = 263. \]

Once we have found a solution $x$, we can create an infinite number of other solutions, because all numbers of the form

\[ x+k \cdot m_1 m_2 \cdots m_n \]

are solutions for every positive integer $k$.
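One possible implementation of this construction is sketched below. The inverses are computed with the extended Euclidean algorithm, the function names are illustrative, and the moduli are assumed to be pairwise coprime. The code returns the smallest non-negative solution, which for the example above is $263 - 2 \cdot 105 = 53$.

#![allow(unused)]
fn main() {
// A sketch of the Chinese remainder theorem construction.
fn extended_gcd(a: i64, b: i64) -> (i64, i64, i64) {
    if b == 0 { return (a, 1, 0); }
    let (g, x, y) = extended_gcd(b, a % b);
    (g, y, x - (a / b) * y)
}

// Inverse of x modulo m (it exists here because x and m are coprime).
fn mod_inverse(x: i64, m: i64) -> i64 {
    let (_, inv, _) = extended_gcd(x, m);
    inv.rem_euclid(m)
}

fn crt(a: &[i64], m: &[i64]) -> i64 {
    let prod: i64 = m.iter().product();
    let mut x = 0;
    for i in 0..a.len() {
        let xk = prod / m[i]; // X_k in the notation above
        x += a[i] * xk % prod * mod_inverse(xk % m[i], m[i]) % prod;
        x %= prod;
    }
    x
}

// x = 3 (mod 5), x = 4 (mod 7), x = 2 (mod 3)
println!("{}", crt(&[3, 4, 2], &[5, 7, 3])); // 53; adding multiples of 5*7*3 = 105 gives e.g. 263
}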

Other results

Lagrange's theorem

Lagrange's theorem states that every positive integer can be represented as a sum of four squares, i.e., $a^2+b^2+c^2+d^2$. For example, the number 123 can be represented as the sum $8^2+5^2+5^2+3^2$.

Zeckendorf's theorem

Zeckendorf's theorem states that every positive integer has a unique representation as a sum of Fibonacci numbers such that no two numbers are equal or consecutive Fibonacci numbers. For example, the number 74 can be represented as the sum $55+13+5+1$.

Pythagorean triples

A Pythagorean triple is a triple $(a,b,c)$ that satisfies the Pythagorean theorem $a^2+b^2=c^2$, which means that there is a right triangle with side lengths $a$, $b$ and $c$. For example, $(3,4,5)$ is a Pythagorean triple.

If $(a,b,c)$ is a Pythagorean triple, all triples of the form $(ka,kb,kc)$ are also Pythagorean triples where $k>1$. A Pythagorean triple is primitive if $a$, $b$ and $c$ are coprime, and all Pythagorean triples can be constructed from primitive triples using a multiplier $k$.

Euclid's formula can be used to produce all primitive Pythagorean triples. Each such triple is of the form

\[(n^2-m^2,2nm,n^2+m^2),\]

where $0<m<n$, $n$ and $m$ are coprime and at least one of $n$ and $m$ is even. For example, when $m=1$ and $n=2$, the formula produces the smallest Pythagorean triple

\[ (2^2-1^2,2\cdot2\cdot1,2^2+1^2)=(3,4,5) \]
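As a small sketch, Euclid's formula can be used directly to enumerate primitive triples; the upper bound for $n$ below is an arbitrary choice for this example.

#![allow(unused)]
fn main() {
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

// Generate primitive Pythagorean triples (n^2 - m^2, 2nm, n^2 + m^2)
// for 0 < m < n <= 5, with n and m coprime and of different parity.
for n in 2..=5_u64 {
    for m in 1..n {
        if gcd(n, m) == 1 && (n + m) % 2 == 1 {
            println!("({}, {}, {})", n * n - m * m, 2 * n * m, n * n + m * m); // first output: (3, 4, 5)
        }
    }
}
}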

Wilson's theorem

Wilson's theorem states that a number $n$ is prime exactly when

\[ (n-1)! \bmod n = n-1 \]

For example, the number 11 is prime, because

\[ 10! \bmod 11 = 10 \] and the number 12 is not prime, because

\[ 11! \bmod 12 = 0 \neq 11 \]

Hence, Wilson's theorem can be used to find out whether a number is prime. However, in practice, the theorem cannot be applied to large values of $n$, because it is difficult to calculate values of $(n-1)!$ when $n$ is large.