Sort – Random IT Utensils https://blog.adamfurmanek.pl IT, operating systems, maths, and more. Sat, 02 Jan 2021 18:44:54 +0000 en-US hourly 1 https://wordpress.org/?v=6.5.2 ILP Part 11 – Counting Sort https://blog.adamfurmanek.pl/2015/10/31/ilp-part-11/ https://blog.adamfurmanek.pl/2015/10/31/ilp-part-11/#comments Sat, 31 Oct 2015 09:00:42 +0000 http://blog.adamfurmanek.pl/?p=1071 Continue reading ILP Part 11 – Counting Sort]]>

This is the eleventh part of the ILP series. For your convenience you can find other parts in the table of contents in Part 1 – Boolean algebra

Last time we implemented a selection sort in ILP. Today we are going to do something similar: we are going to implement counting sort.

Introduction

As you probably remember from your classes, counting sort is useful when we try to sort values from small and finite set. It is for instance great for sorting people by their age because there are no people older than 200 years so we have at most two hundred possible values (and we can confine this range even more). What’s more, counting sort time complexity is linear so in theory it should work faster than selection sort. So let’s begin.

Problem definition

First, let’s define necessary variables:

    \begin{gather*} n, m \in \mathbb{N_+} \\ y_1, y_2, \ldots, y_m \in \mathbb{Z} \\ y_1 \le y_2 \le \cdots \le y_m \\ \mathbb{Y} = \{ y_1, y_2, \ldots, y_m \} \\ x_1, x_2, \ldots, x_n \in \mathbb{Y} \\ k \in \mathbb{Z_+} \\ |x_1|, |x_2|, \ldots, |x_n| \le k \\ x = \left(x_1, x_2, \ldots, x_n\right)\\ z_1, z_2, \ldots, z_n \in \mathbb{Y} \\ z_1 \le z_2 \le \cdots \le z_n \end{gather*}

We have n variables x_i to sort. Every x_i variable has a value from set \mathbb{Y}. Set \mathbb{Y} has m variables y_i. Variables z_i represents the final result.
We are going to sort variables x_i using the knowledge that they all come from the set \mathbb{Y}.

Algorithm

Our algorithm will have the following steps:

  • Compare every x_i with every y_j
  • Sum results of comparisons to know how many values are there actually
  • Construct the final vector

Comparing

We define the following variables:

    \[ c_{y_i = x_j} = y_i \stackrel{?}{=} x_j \]

We basically try to find them number of values y_i in vector X. After this step we have m \cdot n different variables.

Summing results

Now we need to know how many values are there so we define the following variables:

    \[ c_{y_i} = \sum_{j=1}^{n} c_{y_i = x_j} \]

Now we know that there are exactly c_{y_i} values y_i in vector X. We now need to aggregate the results to be able to reconstruct the vector, so we define the partial sums:

    \[ s_{i} = \sum_{j=1}^{i} c_{y_i} \]

This value tells us that there are exactly s_i elements not greater than y_i. This might be a bit tricky so let’s consider an example first.

Example

Let’s imagine that we want to sort the following vector:

    \[ X = \{1, 2, 3, 1, 2, 1\} \]

We can see that x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 4, x_5 = 2, x_6 = 1 and we have n=6 values. We also assume the following set of possible values:

    \begin{gather*} m = 4\\ y_1 = 1\\ y_2 = 2\\ y_3 = 3\\ y_4 = 4 \end{gather*}

Basically we assumed that our values are not greater than 4 and not less than 1. It is worth noting that in X there is no variables equal to 4 but this is how counting sort works. We now have the following results of comparisons:

    \begin{gather*} c_{y_1 = x_1} = y_1 \stackrel{?}{=} x_1 = 1 \stackrel{?}{=} 1 = 1\\ c_{y_1 = x_2} = y_1 \stackrel{?}{=} x_2 = 1 \stackrel{?}{=} 2 = 0\\ c_{y_1 = x_3} = y_1 \stackrel{?}{=} x_3 = 1 \stackrel{?}{=} 3 = 0\\ c_{y_1 = x_4} = y_1 \stackrel{?}{=} x_4 = 1 \stackrel{?}{=} 1 = 1\\ c_{y_1 = x_5} = y_1 \stackrel{?}{=} x_5 = 1 \stackrel{?}{=} 2 = 0\\ c_{y_1 = x_6} = y_1 \stackrel{?}{=} x_6 = 1 \stackrel{?}{=} 1 = 1\\ c_{y_2 = x_1} = y_2 \stackrel{?}{=} x_1 = 2 \stackrel{?}{=} 1 = 0\\ c_{y_2 = x_2} = y_2 \stackrel{?}{=} x_2 = 2 \stackrel{?}{=} 2 = 1\\ c_{y_2 = x_3} = y_2 \stackrel{?}{=} x_3 = 2 \stackrel{?}{=} 3 = 0\\ c_{y_2 = x_4} = y_2 \stackrel{?}{=} x_4 = 2 \stackrel{?}{=} 1 = 0\\ c_{y_2 = x_5} = y_2 \stackrel{?}{=} x_5 = 2 \stackrel{?}{=} 2 = 1\\ c_{y_2 = x_6} = y_2 \stackrel{?}{=} x_6 = 2 \stackrel{?}{=} 1 = 0\\ c_{y_3 = x_1} = y_3 \stackrel{?}{=} x_1 = 3 \stackrel{?}{=} 1 = 0\\ c_{y_3 = x_2} = y_3 \stackrel{?}{=} x_2 = 3 \stackrel{?}{=} 2 = 0\\ c_{y_3 = x_3} = y_3 \stackrel{?}{=} x_3 = 3 \stackrel{?}{=} 3 = 1\\ c_{y_3 = x_4} = y_3 \stackrel{?}{=} x_4 = 3 \stackrel{?}{=} 1 = 0\\ c_{y_3 = x_5} = y_3 \stackrel{?}{=} x_5 = 3 \stackrel{?}{=} 2 = 0\\ c_{y_3 = x_6} = y_3 \stackrel{?}{=} x_6 = 3 \stackrel{?}{=} 1 = 0\\ c_{y_4 = x_1} = y_4 \stackrel{?}{=} x_1 = 4 \stackrel{?}{=} 1 = 0\\ c_{y_4 = x_2} = y_4 \stackrel{?}{=} x_2 = 4 \stackrel{?}{=} 2 = 0\\ c_{y_4 = x_3} = y_4 \stackrel{?}{=} x_3 = 4 \stackrel{?}{=} 3 = 0\\ c_{y_4 = x_4} = y_4 \stackrel{?}{=} x_4 = 4 \stackrel{?}{=} 1 = 0\\ c_{y_4 = x_5} = y_4 \stackrel{?}{=} x_5 = 4 \stackrel{?}{=} 2 = 0\\ c_{y_4 = x_6} = y_4 \stackrel{?}{=} x_6 = 4 \stackrel{?}{=} 1 = 0\\ \end{gather*}

Now we need to aggregate these values:

    \begin{gather*} c_{y_1} = c_{y_1 = x_1} + c_{y_1 = x_2} + c_{y_1 = x_3} + c_{y_1 = x_4} + c_{y_1 = x_5} + c_{y_1 = x_6} = 3\\ c_{y_2} = 2\\ c_{y_3} = 1\\ c_{y_4} = 0 \end{gather*}

Right now we know that there are exactly c_{y_i} variables in vector X equal to y_i. Now we sum the results:

    \begin{gather*} s_{1} = c_{y_1} = 3\\ s_{2} = c_{y_1} + c_{y_2} = 3 + 2 = 5\\ s_{3} = c_{y_1} + c_{y_2} + c_{y_3} = 3 + 2 + 1 = 6\\ s_{4} = c_{y_1} + c_{y_3} + c_{y_3} + c_{y_4}= 3 + 2 + 1 + 0 = 6 \end{gather*}

And now we know that there are exactly 3 values not greater than y_1 in vector X, 5 values not greater than y_2 and so on.

Constructing the result

Every s_{i} variable tells us how many values not greater than y_i there are in the original vector. We can utilize this knowledge to put values in correct places:

    \begin{gather*} z_{i} = \min \left( s_j \stackrel{?}{\ge} i \ ?\ y_i \ :\ k\right) \end{gather*}

In our example it is:

    \begin{gather*} z_{1} = \min \left( s_1 \stackrel{?}{\ge} 1 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 1 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 1 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 1 \ ?\ y_4 \ :\ k ) \\ = \min \left(y_1, y_2, y_3, y_4 \right) = y_1\\ z_{2} = \min \left( s_1 \stackrel{?}{\ge} 2 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 2 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 2 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 2 \ ?\ y_4 \ :\ k )\\ = \min \left(y_1, y_2, y_3, y_4 \right) = y_1\\ z_{3} = \min \left( s_1 \stackrel{?}{\ge} 3 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 3 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 3 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 3 \ ?\ y_4 \ :\ k )\\ = \min \left(y_1, y_2, y_3, y_4 \right) = y_1\\ z_{4} = \min \left( s_1 \stackrel{?}{\ge} 4 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 4 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 4 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 4 \ ?\ y_4 \ :\ k )\\ = \min \left(k ,y_2, y_3, y_4 \right) = y_2\\ z_{5} = \min \left( s_1 \stackrel{?}{\ge} 5 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 5 \ ?\ y_2 \ :\ k,\\ s_3 \stackrel{?}{\ge} 5 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 5 \ ?\ y_4 \ :\ k )\\ = \min \left(k, y_2, y_3, y_4 \right) = y_2\\ z_{6} = \min \left( s_1 \stackrel{?}{\ge} 6 \ ?\ y_1 \ :\ k, s_2 \stackrel{?}{\ge} 6 \ ?\ y_2 \ :\ k,\\  s_3 \stackrel{?}{\ge} 6 \ ?\ y_3 \ :\ k, s_4 \stackrel{?}{\ge} 6 \ ?\ y_4 \ :\ k )\\ = \min \left(k, k, y_3, y_4 \right) = y_3 \end{gather*}

This works in the following way: for every position i we ask whether there are at least i elements not greater than y_j and we choose minimum of y_j values. So we basically try to insert values in their places and we need to use \min function to select lowest possible value for every place.

Complexity

As you can see, this algorithm needs m\cdot n temporary variables. Assuming that n \gg m our algorithm uses linear space. If it is not true that n \gg m then it makes no sense using this algorithm but the same happens with casual counting sort. There is no use to sort numbers using this algorithm when the domain is really big.

Summary

We already know two sorting algorithms. Their implementation is rather straightforward and really resembles the imperative counterparts. As an exercise you might want to implement other imperative algorithms using similar approach.

]]>
https://blog.adamfurmanek.pl/2015/10/31/ilp-part-11/feed/ 1
ILP Part 10 – Sort https://blog.adamfurmanek.pl/2015/10/24/ilp-part-10/ https://blog.adamfurmanek.pl/2015/10/24/ilp-part-10/#comments Sat, 24 Oct 2015 10:00:38 +0000 http://blog.adamfurmanek.pl/?p=881 Continue reading ILP Part 10 – Sort]]>

This is the tenth part of the ILP series. For your convenience you can find other parts in the table of contents in Part 1 – Boolean algebra

Today we are going to use conditional operator to implement something similar to selection sort. Let’s begin.

Algorithm

Selection sort goes as follows:

for i from 0 to array.length
    find minimum in array[i+1..array.length]
    swap found element with array[i]

In every iteration we look for the lowest element in the rest of the array and then we swap current element with the one we have just found. We are going to implement something similar. Because we don’t know yet how to define loop, we need to define order in different way. Basically, we will define requirements for element on every single position which will make sure that the array is sorted.

Variables

We will start with the following variables:

    \begin{gather*} n \in \mathbb{N_+}\\ x_1, x_2, \ldots, x_n \in \mathbb{Z} \\ k \in \mathbb{Z_+} |x_1|, |x_2|, \ldots, |x_n| \le k \\ x = \left(x_1, x_2, \ldots, x_n\right) \end{gather*}

We have a vector of n integer variables. We are going to compare values so we need to add constraints for maximum possible value for every variable.
We need another vector which will represent the sorted variables:

    \begin{gather*} y_1, y_2, \ldots, y_n \in x \\ y_1 \ge y_2 \ge \cdots \ge y_n \\ y = \left(y_1, y_2, \ldots, y_n\right) \end{gather*}

y is just a sorted vector x. Before we define the general formula, let us start with an example.

Sorting three variables

Imagine that we want to sort three variables x_1 = 5, x_2 = 2, x_3 = 3. For now let’s also assume that we have no duplicates, so x_1 \ne x_2, x_1 \ne x_3, x_2 \ne x_3.
First, we need to compare every pair and check the ordering. We calculate the following variables:

    \begin{gather*} x_{x_1 \ge x_2} = x_1 \stackrel{?}{\ge} x_2 = 5 \stackrel{?}{\ge} 2 = 1\\ x_{x_1 \ge x_3} = x_1 \stackrel{?}{\ge} x_3 = 5 \stackrel{?}{\ge} 3 = 1\\ x_{x_2 \ge x_1} = x_2 \stackrel{?}{\ge} x_1 = 2 \stackrel{?}{\ge} 5 = 0\\ x_{x_2 \ge x_3} = x_2 \stackrel{?}{\ge} x_3 = 2 \stackrel{?}{\ge} 3 = 0\\ x_{x_3 \ge x_1} = x_3 \stackrel{?}{\ge} x_1 = 3 \stackrel{?}{\ge} 5 = 0\\ x_{x_3 \ge x_2} = x_3 \stackrel{?}{\ge} x_2 = 3 \stackrel{?}{\ge} 2 = 1\\ \end{gather*}

We have compared all pairs and for each variable we know the number of smaller variables. Now we need to know the total count of not less variables:

    \begin{gather*} x_{s1} = x_{x_1 \ge x_2} + x_{x_1 \ge x_3} = 1 + 1 = 2 \\ x_{s2} = x_{x_2 \ge x_1} + x_{x_2 \ge x_3} = 0 + 0 = 0 \\ x_{s3} = x_{x_3 \ge x_1} + x_{x_3 \ge x_2} = 0 + 1 = 1 \\ \end{gather*}

We are almost there. We have compared the values and we know their ordering. Now we need to construct final vector. Value y_i is the i-th biggest value:

    \begin{gather*} y_0 = x_{s1} \stackrel{?}{=} 0\ ?\ x_1\ :\ x_{s2} \stackrel{?}{=} 0\ ?\ x_2\ :\ x_3\\ y_1 = x_{s1} \stackrel{?}{=} 1\ ?\ x_1\ :\ x_{s2} \stackrel{?}{=} 1\ ?\ x_2\ :\ x_3\\ y_2 = x_{s1} \stackrel{?}{=} 2\ ?\ x_1\ :\ x_{s2} \stackrel{?}{=} 2\ ?\ x_2\ :\ x_3 \end{gather*}

For every y_i we choose variable which is not less than i different values. We assumed that all x_i are different so there is only one value which satisfies the condition. However, if that is not be the case we might modify the formula to handle the duplicates:

    \begin{gather*} y_0 = \min \left(x_{s1} \stackrel{?}{\ge} 0\ ?\ x_1\ :\ k, x_{s2} \stackrel{?}{\ge} 0\ ?\ x_2\ :\ k, x_{s_3} \stackrel{?}{\ge} 0\ ?\ x_3\ :\ k \right)\\ y_1 = \min \left(x_{s1} \stackrel{?}{\ge} 1\ ?\ x_1\ :\ k, x_{s2} \stackrel{?}{\ge} 1\ ?\ x_2\ :\ k, x_{s_3} \stackrel{?}{\ge} 1\ ?\ x_3\ :\ k \right)\\ y_2 = \min \left(x_{s1} \stackrel{?}{\ge} 2\ ?\ x_1\ :\ k, x_{s2} \stackrel{?}{\ge} 2\ ?\ x_2\ :\ k, x_{s_3} \stackrel{?}{\ge} 2\ ?\ x_3\ :\ k \right) \end{gather*}

For every y_i we choose select values which are not less than at least i different values and then we choose the minimum. For instance, when x_1 = x_2 = x_3 = 5, every x_{x_i \ge x_j} is one, so the sums x_{si} are equal to 2, but we are still able to construct the final vector.

Final formula

General sorting formula goes as follows:

    \begin{gather*} \displaystyle\mathop{\forall}_{\substack{ i=0,\ldots,n \\ j=0, \ldots, n \\ j \ne i}} x_{x_i \ge x_j} = x_i \stackrel{?}{\ge} x_j \\ \displaystyle\mathop{\forall}_{i=0, \ldots, n} x_{si} = \sum_{\substack{j=0 \\ j \ne i }}^{n} x_{x_i \ge x_j}\\ \displaystyle\mathop{\forall}_{i=0, \ldots, n} y_i = \displaystyle\mathop{\min}_{j=0, \ldots, n} \left( x_{sj} \stackrel{?}{\ge} i\ ?\ x_j\ :\ k \right) \end{gather*}

Summary

We are now able to sort any integer variables. This algorithm requires O\left(n^2\right) temporary variables because we compare every possible pair so is rather memory-consuming but works quite well and solvers are able to calculate the results rather smoothly.

Bonus chatter

As an exercise you might want to implement counting sort. What is the space complexity of your solution?

]]>
https://blog.adamfurmanek.pl/2015/10/24/ilp-part-10/feed/ 1