This is the eleventh part of the ILP series. For your convenience you can find other parts in the table of contents in Part 1 – Boolean algebra.

Last time we implemented selection sort in ILP. Today we are going to do something similar and implement counting sort.


Introduction

As you probably remember from your classes, counting sort is useful when we sort values drawn from a small, finite set. It is great, for instance, for sorting people by age: nobody is older than 200 years, so there are at most two hundred possible values (and we can narrow this range even further). What's more, counting sort runs in linear time for a fixed range of values, so in theory it should be faster than selection sort, which is quadratic. So let's begin.

Problem definition

First, let's define the necessary variables:

$\begin{gather*} n, m \in \mathbb{N_+} \\ y_1, y_2, \ldots, y_m \in \mathbb{Z} \\ y_1 < y_2 < \cdots < y_m \\ \mathbb{Y} = \{ y_1, y_2, \ldots, y_m \} \\ x_1, x_2, \ldots, x_n \in \mathbb{Y} \\ k \in \mathbb{Z_+} \\ |x_1|, |x_2|, \ldots, |x_n| \le k \\ X = \left(x_1, x_2, \ldots, x_n\right)\\ z_1, z_2, \ldots, z_n \in \mathbb{Y} \\ z_1 \le z_2 \le \cdots \le z_n \end{gather*}$

We have $n$ variables $x_i$ to sort. Every $x_i$ takes a value from the set $\mathbb{Y}$, which consists of $m$ distinct values $y_i$ in increasing order. The constant $k$ bounds the absolute value of every $x_i$; we will use it later as a sentinel when constructing the result. Variables $z_i$ represent the final result.
We are going to sort variables $x_i$ using the knowledge that they all come from the set $\mathbb{Y}$.

Algorithm

Our algorithm will have the following steps:

Compare every $x_i$ with every $y_j$
Sum the comparison results to count how many times each value occurs
Construct the final vector

Comparing

We define the following variables:

$c_{y_i = x_j} = y_i \stackrel{?}{=} x_j$

We basically try to find the number of occurrences of each value $y_i$ in vector $X$. After this step we have $m \cdot n$ binary variables.
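As an illustration only, the comparison step can be mimicked in plain Python on ordinary integers instead of ILP variables. The helper name `comparisons` is hypothetical, indices are 0-based, and the data is the vector and value set from the example later in this post. Row $i$ holds $c_{y_i = x_j}$ for all $j$, so the matrix has $m \cdot n$ entries.

```python
def comparisons(ys: list[int], xs: list[int]) -> list[list[int]]:
    """Return the m x n matrix c where c[i][j] = 1 if ys[i] == xs[j], else 0.

    Plain-Python stand-in for the ILP comparison variables c_{y_i = x_j};
    indices are 0-based here, unlike the 1-based notation in the text.
    """
    return [[1 if y == x else 0 for x in xs] for y in ys]


ys = [1, 2, 3, 4]
xs = [1, 2, 3, 1, 2, 1]
c = comparisons(ys, xs)
for y, row in zip(ys, c):
    print(y, row)
# 1 [1, 0, 0, 1, 0, 1]
# 2 [0, 1, 0, 0, 1, 0]
# 3 [0, 0, 1, 0, 0, 0]
# 4 [0, 0, 0, 0, 0, 0]
```

Because the $y_i$ are distinct and every $x_j$ belongs to $\mathbb{Y}$, each column of the matrix contains exactly one $1$.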

Summing results

Now we need to know how many times each value occurs, so we define the following variables:

$c_{y_i} = \sum_{j=1}^{n} c_{y_i = x_j}$

Now we know that there are exactly $c_{y_i}$ values $y_i$ in vector $X$. We now need to aggregate the results to be able to reconstruct the vector, so we define the partial sums:

$s_{i} = \sum_{j=1}^{i} c_{y_j}$

This value tells us that there are exactly $s_i$ elements not greater than $y_i$ (this relies on the values $y_j$ being sorted in increasing order). This might be a bit tricky, so let's consider an example first.

Example

Let’s imagine that we want to sort the following vector:

$X = (1, 2, 3, 1, 2, 1)$

We can see that $x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 1, x_5 = 2, x_6 = 1$ and we have $n=6$ values. We also assume the following set of possible values:

$\begin{gather*} m = 4\\ y_1 = 1\\ y_2 = 2\\ y_3 = 3\\ y_4 = 4 \end{gather*}$

Basically, we assumed that our values are not less than $1$ and not greater than $4$. Note that $X$ contains no element equal to $4$, but this is fine: counting sort only needs to know the set of possible values, not which of them actually occur. We now have the following comparison results:

$\begin{gather*} c_{y_1 = x_1} = y_1 \stackrel{?}{=} x_1 = 1 \stackrel{?}{=} 1 = 1\\ c_{y_1 = x_2} = y_1 \stackrel{?}{=} x_2 = 1 \stackrel{?}{=} 2 = 0\\ c_{y_1 = x_3} = y_1 \stackrel{?}{=} x_3 = 1 \stackrel{?}{=} 3 = 0\\ c_{y_1 = x_4} = y_1 \stackrel{?}{=} x_4 = 1 \stackrel{?}{=} 1 = 1\\ c_{y_1 = x_5} = y_1 \stackrel{?}{=} x_5 = 1 \stackrel{?}{=} 2 = 0\\ c_{y_1 = x_6} = y_1 \stackrel{?}{=} x_6 = 1 \stackrel{?}{=} 1 = 1\\ c_{y_2 = x_1} = y_2 \stackrel{?}{=} x_1 = 2 \stackrel{?}{=} 1 = 0\\ c_{y_2 = x_2} = y_2 \stackrel{?}{=} x_2 = 2 \stackrel{?}{=} 2 = 1\\ c_{y_2 = x_3} = y_2 \stackrel{?}{=} x_3 = 2 \stackrel{?}{=} 3 = 0\\ c_{y_2 = x_4} = y_2 \stackrel{?}{=} x_4 = 2 \stackrel{?}{=} 1 = 0\\ c_{y_2 = x_5} = y_2 \stackrel{?}{=} x_5 = 2 \stackrel{?}{=} 2 = 1\\ c_{y_2 = x_6} = y_2 \stackrel{?}{=} x_6 = 2 \stackrel{?}{=} 1 = 0\\ c_{y_3 = x_1} = y_3 \stackrel{?}{=} x_1 = 3 \stackrel{?}{=} 1 = 0\\ c_{y_3 = x_2} = y_3 \stackrel{?}{=} x_2 = 3 \stackrel{?}{=} 2 = 0\\ c_{y_3 = x_3} = y_3 \stackrel{?}{=} x_3 = 3 \stackrel{?}{=} 3 = 1\\ c_{y_3 = x_4} = y_3 \stackrel{?}{=} x_4 = 3 \stackrel{?}{=} 1 = 0\\ c_{y_3 = x_5} = y_3 \stackrel{?}{=} x_5 = 3 \stackrel{?}{=} 2 = 0\\ c_{y_3 = x_6} = y_3 \stackrel{?}{=} x_6 = 3 \stackrel{?}{=} 1 = 0\\ c_{y_4 = x_1} = y_4 \stackrel{?}{=} x_1 = 4 \stackrel{?}{=} 1 = 0\\ c_{y_4 = x_2} = y_4 \stackrel{?}{=} x_2 = 4 \stackrel{?}{=} 2 = 0\\ c_{y_4 = x_3} = y_4 \stackrel{?}{=} x_3 = 4 \stackrel{?}{=} 3 = 0\\ c_{y_4 = x_4} = y_4 \stackrel{?}{=} x_4 = 4 \stackrel{?}{=} 1 = 0\\ c_{y_4 = x_5} = y_4 \stackrel{?}{=} x_5 = 4 \stackrel{?}{=} 2 = 0\\ c_{y_4 = x_6} = y_4 \stackrel{?}{=} x_6 = 4 \stackrel{?}{=} 1 = 0\\ \end{gather*}$

Now we need to aggregate these values:

$\begin{gather*} c_{y_1} = c_{y_1 = x_1} + c_{y_1 = x_2} + c_{y_1 = x_3} + c_{y_1 = x_4} + c_{y_1 = x_5} + c_{y_1 = x_6} = 3\\ c_{y_2} = 2\\ c_{y_3} = 1\\ c_{y_4} = 0 \end{gather*}$

Right now we know that there are exactly $c_{y_i}$ elements in vector $X$ equal to $y_i$. Now we compute the partial sums:

$\begin{gather*} s_{1} = c_{y_1} = 3\\ s_{2} = c_{y_1} + c_{y_2} = 3 + 2 = 5\\ s_{3} = c_{y_1} + c_{y_2} + c_{y_3} = 3 + 2 + 1 = 6\\ s_{4} = c_{y_1} + c_{y_2} + c_{y_3} + c_{y_4} = 3 + 2 + 1 + 0 = 6 \end{gather*}$

Now we know that there are exactly $3$ values not greater than $y_1$ in vector $X$, $5$ values not greater than $y_2$, and so on.
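A minimal Python sketch of the summing step, again on plain integers rather than ILP variables, reproduces the numbers of this example. The function name `counts_and_prefix_sums` is hypothetical; the partial sums come from `itertools.accumulate` in the standard library.

```python
from itertools import accumulate


def counts_and_prefix_sums(c: list[list[int]]) -> tuple[list[int], list[int]]:
    """Given the m x n comparison matrix c, return (counts, s).

    counts[i] mirrors c_{y_i}: the number of x_j equal to y_i (a row sum).
    s[i] mirrors s_i: counts[0] + ... + counts[i], i.e. how many x_j are
    not greater than y_i, assuming the y values are sorted and distinct.
    """
    counts = [sum(row) for row in c]
    s = list(accumulate(counts))
    return counts, s


ys = [1, 2, 3, 4]
xs = [1, 2, 3, 1, 2, 1]
# Comparison matrix c[i][j] = (ys[i] == xs[j]) as 0/1, as in the comparing step.
c = [[1 if y == x else 0 for x in xs] for y in ys]
counts, s = counts_and_prefix_sums(c)
print(counts, s)  # [3, 2, 1, 0] [3, 5, 6, 6]
```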

Constructing the result

Every $s_{i}$ variable tells us how many values not greater than $y_i$ there are in the original vector. We can utilize this knowledge to put values in correct places:

$\begin{gather*} z_{i} = \min_{1 \le j \le m} \left( s_j \stackrel{?}{\ge} i \ ?\ y_j \ :\ k\right) \end{gather*}$

In our example it is:

$\begin{gather*} z_{1} = \min \bigl( s_1 \stackrel{?}{\ge} 1 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 1 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 1 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 1 \ ?\ y_4 \ :\ k \bigr) \\ = \min \left(y_1, y_2, y_3, y_4 \right) = y_1\\ z_{2} = \min \bigl( s_1 \stackrel{?}{\ge} 2 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 2 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 2 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 2 \ ?\ y_4 \ :\ k \bigr)\\ = \min \left(y_1, y_2, y_3, y_4 \right) = y_1\\ z_{3} = \min \bigl( s_1 \stackrel{?}{\ge} 3 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 3 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 3 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 3 \ ?\ y_4 \ :\ k \bigr)\\ = \min \left(y_1, y_2, y_3, y_4 \right) = y_1\\ z_{4} = \min \bigl( s_1 \stackrel{?}{\ge} 4 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 4 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 4 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 4 \ ?\ y_4 \ :\ k \bigr)\\ = \min \left(k, y_2, y_3, y_4 \right) = y_2\\ z_{5} = \min \bigl( s_1 \stackrel{?}{\ge} 5 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 5 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 5 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 5 \ ?\ y_4 \ :\ k \bigr)\\ = \min \left(k, y_2, y_3, y_4 \right) = y_2\\ z_{6} = \min \bigl( s_1 \stackrel{?}{\ge} 6 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 6 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 6 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 6 \ ?\ y_4 \ :\ k \bigr)\\ = \min \left(k, k, y_3, y_4 \right) = y_3 \end{gather*}$

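This works in the following way: for every position $i$ and every value $y_j$ we ask whether there are at least $i$ elements not greater than $y_j$. If there are, $y_j$ is a candidate for position $i$; otherwise we substitute $k$. The $\min$ function then picks the lowest candidate, which is the smallest $y_j$ with $s_j \ge i$, and that is exactly the value belonging at position $i$. Using $k$ for the rejected candidates is safe, because the correct value occurs in $X$ and therefore is not greater than $k$.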
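The construction step can be sketched in Python the same way. This is a plain simulation of the formula for $z_i$, not an ILP encoding of the conditional or of $\min$; the function name `construct` is hypothetical, and in the example $k = 3$ is our own choice, the smallest value allowed by $|x_i| \le k$. The randomized loop compares the result with Python's built-in `sorted`.

```python
import random


def construct(ys: list[int], s: list[int], k: int) -> list[int]:
    """Build the sorted output z from sorted, distinct values ys and partial sums s.

    Mirrors z_i = min_j (s_j >= i ? y_j : k) for positions i = 1..n
    (1-based, as in the text). k must satisfy |x| <= k for every input x.
    """
    n = s[-1]  # the last partial sum counts all n elements
    return [
        min(y if s_j >= i else k for y, s_j in zip(ys, s))
        for i in range(1, n + 1)
    ]


# The example from the text, with the assumed choice k = 3.
print(construct([1, 2, 3, 4], [3, 5, 6, 6], k=3))  # [1, 1, 1, 2, 2, 3]

# Randomized comparison against sorted(); s_j is computed directly as the
# number of elements not greater than y_j, which is what the partial sums equal.
for _ in range(1000):
    ys = sorted(random.sample(range(-10, 11), 5))
    xs = [random.choice(ys) for _ in range(random.randint(1, 12))]
    k = max(1, max(abs(x) for x in xs))  # k is a positive integer bound
    s = [sum(1 for x in xs if x <= y) for y in ys]
    assert construct(ys, s, k) == sorted(xs)
print("all random checks passed")
```

Replacing the sentinel $k$ with a value smaller than some element of $X$ would break the sketch, which is why the definitions require $|x_i| \le k$.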

Complexity

As you can see, this algorithm needs on the order of $m\cdot n$ temporary variables (the comparisons alone account for $m \cdot n$ of them). If $m$ is small compared to $n$ ($n \gg m$), the space used grows linearly with $n$. If $m$ is not much smaller than $n$, it makes no sense to use this algorithm, but the same is true for the classic counting sort: there is no point in sorting numbers this way when the domain is really big.

Summary

We now know how to implement two sorting algorithms in ILP. Both implementations are rather straightforward and closely resemble their imperative counterparts. As an exercise, you might want to implement other imperative algorithms using a similar approach.