This is the first part of the Parallel Execution for Contests series. For your convenience you can find other parts using the links below :
Part 1 — Powershell
Part 2 — C#
Part 3 — C++

Hello! Today we start a new series about running code in parallel. This is not intented to be neither a concurrent programming course nor a tutorial how to use MPI. I am going to show a few simple programs which are very useful when solving algorithmic tasks.

Introduction

When you take part in a contest similar to Deadline24 you need to solve multiple problems on your machine. These are algorithmic problems so you need to come up with a feasible solution, both in time and space complexity. However, when you have a slower algorithm, you might want to catch up using the power of your machine — if you have a powerfull cluster then you can easily solve tasks even using less optimized algorithms.

Since most of todays’ CPUs are multicore it is fairly easy to speed up execution time using multiple threads. Unfortunately, sometimes it is not easy to change algorithm to multithreaded and do not loose precious seconds on threads synchronization. You might try introducing interthreads communication using MPI but when there is not enough time you probably want to introduce parallelism with as few changes as possible.

Fortunately, contests like Deadline usualy consists of multiple test cases for each task. We can utilize this to simply execute multiple instances of our program in parallel. This approach doesn’t require any threads synchronization and is very effective because it is the operating system who will take care of most of the stuff. All we need to do is run multiple instances of application and pass different input files for each of them. Let’s see how to do that.

Powershell features

We assume that we have a single threaded program called Solver.exe which reads some data from standard input and writes to standard output. We also have multiple input files: input1.txt, …, input10.txt. All we want to do is run application, pass input file and capture output. We can do the following using Powershell:

cat input1.txt | .\Solver.exe | out-file -encoding ascii output1.txt

We can also solve all files with one command:

1..10 | %{ cat "input_.txt" | .\Solver.exe | out-file -encoding ascii "output_.txt" }

This is very easy, however, it runs one by one. Another instance of Solver will be run only after previous one finished its work. This means that most of our cores will stay idle during the execution.

Fortunately, there are simple tricks to run code in parallel using Powershell. Let’s see them.

Workflows

We start with workflows. 64 bit Powershell is able to run code in parallel using a feature called Workflow like this:

workflow OurWorkflow
{
	foreach -parallel (i in 1..10){ 		cat "inputi.txt" | .\Solver.exe | out-file -encoding ascii "outputi.txt" 	} }  OurWorkflow </pre>  We can see two importants things here. First, we introduce a block of code using <code>workflow</code> keyword. Next, we use <code>foreach -parallel</code> option to run code in parallel. You can see some description <a href="https://technet.microsoft.com/en-us/library/jj713711.aspx">here</a>  This approach is simple and works out of the box. However, there are some drawbacks: <ul> 	<li>we cannot simply stop the script since CTRL+C might not work</li> 	<li>we cannot write to the console so simple <code>echo "I am here!"</code> will not work</li> 	<li>we cannot easily control number of threads running the code</li> 	<li>requires 64 bit Powershell</li> </ul>    So we can see that while it works it might not suit all our needs.  <h2>Background jobs</h2> Powershell is capable of running some jobs in the background using <a href="https://technet.microsoft.com/library/hh849698.aspx">start-job commandlet</a>. We can use this to run all jobs in background and wait for them:  <pre class="EnlighterJSRAW" data-enlighter-language="C#">jobs = @()

Function Run(scriptBlock,arguments){
	jobs += ,(start-jobscriptBlock -Arg arguments) }  Function Finish(){jobs | % {wait-job _} }  1..10 | %{ Run { param(_); cat "input_.txt" | .\Solver.exe | out-file -encoding ascii "output_.txt" } @(_) } -end { Finish } </pre>  This works as expected, we can also query the job result and do some magic with it. We could also await for specific jobs and implement custom thread pooling. Unfortunately, there are drawbacks as well: <ul> 	<li>it runs in different directory</li> 	<li>it is slow — even after all jobs are done I need to wait few seconds to see the results (when writing to files)</li> 	<li>we are unable to write to the console (since these are background jobs)</li> </ul>  <h2>Parallel foreach</h2> Previous methods were using only built-in features of Powershell. Right now we will take a look at different method which requires additional code.  If you google carefully you will find <a href="https://gallery.technet.microsoft.com/Foreach-Parallel-Parallel-a8f3d22b">this script</a> on Technet. It basically does everything we need to execute code in parallel. This code goes as follows:  <pre class="EnlighterJSRAW" data-enlighter-language="C#"> function ForEach-Parallel {  <#  .SYNOPSIS  A parallel ForEach that uses runspaces    .PARAMETER ScriptBlock  ScriptBlock to execute for each InputObject    .PARAMETER ScriptFile  Script file to execute for each InputObject    .PARAMETER InputObject  Object(s) to run script against in parallel    .PARAMETER Throttle  Maximum number of threads to run at one time.  Default: 5    .PARAMETER Timeout  Stop each thread after this many minutes.  Default: 0    WARNING:  This parameter should be used as a failsafe only  Set it for roughly the entire duration you expect for all threads to complete    .PARAMETER SleepTimer  When looping through open threads, wait this many milliseconds before looping again.  Default: 200    .EXAMPLE  (0..50) | ForEach-Parallel -Throttle 4 {_; sleep (Get-Random -Minimum 0 -Maximum 5) } 
} 
 
Send the number 0 through 50 to scriptblock.  For each, display the number and then sleep for 0 to 5 seconds.  Only execute 4 threads at a time. 
 
.EXAMPLE 
servers | Foreach-Parallel -Throttle 20 -Timeout 60 -sleeptimer 200 -verbose -scriptFile C:\query.ps1    Run query.ps1 against each computer inservers.  Run 20 threads at a time, timeout a thread if it takes longer than 60 minutes to run, give verbose output. 
 
.FUNCTIONALITY  
PowerShell Language 
 
.NOTES 
Credit to Tome Tanasovski 
ForEach-Parallel 
#> 
    [cmdletbinding()] 
    param( 
        [Parameter(Mandatory=false,position=0,ParameterSetName='ScriptBlock')]              [System.Management.Automation.ScriptBlock]ScriptBlock, 
 
        [Parameter(Mandatory=false,ParameterSetName='ScriptFile')]          [ValidateScript({test-path_ -pathtype leaf})] 
            scriptFile,            [Parameter(Mandatory=true,ValueFromPipeline=true)]              [PSObject]InputObject, 
 
            [int]Throttle=5,                [double]sleepTimer = 200, 
 
            [double]Timeout = 0      )      BEGIN {                    #Build the scriptblock depending on the parameter used          switch (PSCmdlet.ParameterSetName){ 
            'ScriptBlock' {ScriptBlock =ExecutionContext.InvokeCommand.NewScriptBlock("param(`_)`r`n" +Scriptblock.ToString())} 
            'ScriptFile' {scriptblock = [scriptblock]::Create((get-content scriptFile | out-string))}              Default {Write-Error ("Must provide ScriptBlock or ScriptFile"); Return}          }                    #Define the initial sessionstate, create the runspacepool          Write-Verbose "Creating runspace pool withThrottle threads" 
        sessionState = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()pool = [Runspacefactory]::CreateRunspacePool(1, Throttle,sessionState, host)pool.open() 
         
        #array to hold details on each thread 
        threads = @()            #If inputObject is bound get a total count and set bound to truebound = false          if(PSBoundParameters.ContainsKey("inputObject") ){ 
            bound =true 
            totalCount =inputObject.count 
        } 
         
    } 
 
    PROCESS { 
         
run = @'          #For each pipeline object, create a new powershell instance, add to runspacepoolpowershell = [powershell]::Create().addscript(scriptblock).addargument(InputObject) 
        powershell.runspacepool=pool 
        startTime = get-date            #add references to inputobject, instance, handle and startTime to threads arraythreads += New-Object psobject -Property @{ 
            Object = inputObject;              instance =powershell; 
            handle = powershell.begininvoke();              startTime =startTime 
        } 
 
        Write-Verbose "Added inputobject to the runspacepool atstartTime" 
'@ 
 
        #Run the here string.  Put it in a foreach loop if it didn't come from the pipeline 
        if(bound){run = run -replace 'inputObject', 'object'              foreach(object in inputObject){                   Invoke-Expression -commandrun 
            } 
        } 
 
        else{ 
         
            Invoke-Expression -command run          }        }      END {notdone = true                    #Loop through threads.          while (notdone) { 
 
            notdone =false 
            for (i=0;i -lt threads.count;i++) { 
                thread =threads[i]                  if (thread) { 
 
                    #If thread is complete, dispose of it. 
                    if (thread.handle.iscompleted) {                          Write-verbose "Closing thread for(thread.Object)"thread.instance.endinvoke(thread.handle)thread.instance.dispose() 
                        threads[i] = null                      }                        #Thread exceeded maxruntime timeout threshold                      elseif(Timeout -ne 0 -and ( (get-date) - thread.startTime ).totalminutes -gtTimeout ){ 
                        Write-Error "Closing thread for (thread.Object): Thread exceeded Timeout minute limit" -TargetObjectthread.inputObject 
                        thread.instance.dispose()threads[i] =null 
                    } 
 
                    #Thread is running, loop again! 
                    else { 
                        notdone =true 
                    } 
                }            
            } 
 
            #Sleep for specified time before looping again 
            Start-Sleep -Milliseconds sleepTimer          }pool.close() 
    } 
}

It basically has all the things we would like it to have. We are able to specify number of threads, we also can easily pass script block and write to console. We execute this code in the following way:

1..10 | foreach-parallel -throttle 4 { cat "input_.txt" | .\Solver.exe | out-file -encoding ascii "output_.txt" }

When the task is finished its output will be print out to the console so we can easily verify our commands by echoing them. This is especially useful when we don’t have much time to debug scripts.

Summary

We now know how to execute multiple instances of our program. In the next parts we are going to see how to introduce parallelism to our algorithms by solving every test case on different thread using thread pools.