I often get feedback from new users of Whitebox Geospatial Analysis Tools about how surprisingly fast it is for whatever task they are doing. Whenever I develop a new tool or function for Whitebox, I always take algorithm performance into consideration. A big part of that consideration these days is the potential for parallelizing all or parts of an operation to take advantage of the multi-core processors that have become ubiquitous. There are many geoprocessing tools in Whitebox GAT that take advantage of concurrency, e.g. the Clip tool, the various interpolation tools for LiDAR, and several others.
I try to take a balanced approach with respect to parallelization. Not every operation will necessarily benefit from parallelization (some will actually become slower) and some operations are simply unparallelizable, while many tools will only benefit when a particular bottleneck in the processing workflow is calculated concurrently. Another major consideration when I develop a new tool or function is managing memory requirements. People keep talking about Big Data these days like it’s a new phenomenon, but we in the geomatics community have been dealing with massive datasets for as long as there’s been a community. Often parallelizing an operation is possible but adds significant memory requirements, which would make the parallelized version applicable only to smaller datasets that fit easily in system memory. When this is the case, I’ll sometimes let the user choose between a parallelized and a memory-optimized version. I must always consider the scalability of a tool during design.
With the Run Plugin In Parallel tool you can call any of Whitebox’s 400+ plugin tools in a type of parallel batch mode. Each call of the tool will run on its own separate thread. This can significantly reduce processing time, particularly when you have a four-core (or even eight-core) system. Even custom plugin tools that you have developed will be available for targeting with the Run Plugin In Parallel tool.
Using the tool is fairly straightforward. First, you have to specify the name of the plugin tool that you are running. Here it is important to remember that this is the proper plugin name and not the descriptive name that you may see displayed in the tool listings. Normally the proper plugin name is the same as, or similar to, the descriptive name but without spaces. If in doubt, you can always look at the source code. The second input parameter is a text file. Each line within the text file provides the parameters that are supplied to the tool for one run. The input parameters are specific to the tool and are the same ones that are used when you call the tool from a script. The help documentation for each tool has a Scripting section that describes the input parameters required to run the tool from a script. The following is an example of a text file that could be used to run a parallelized batch mode of the FD8 Flow Accumulation tool (FlowAccumulationFD8):
/Documents/Data/DEM1.dep, /Documents/Data/FlowAccum1.dep, 1, specific catchment area (sca), true, not specified
/Documents/Data/DEM2.dep, /Documents/Data/FlowAccum2.dep, 1, specific catchment area (sca), true, not specified
/Documents/Data/DEM3.dep, /Documents/Data/FlowAccum3.dep, 1, specific catchment area (sca), true, not specified
/Documents/Data/DEM4.dep, /Documents/Data/FlowAccum4.dep, 1, specific catchment area (sca), true, not specified
Notice that each line should contain entries for each of the input parameters for the tool as specified by the help documentation, which in this case include:
demFile, outputFile, exponent, outputType, logTransformOutput, threshold
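When there are many DEMs to process, typing a parameter file like the one above by hand gets tedious. The following is a minimal sketch of how such a file could be generated with plain, standalone Python; it does not call any Whitebox API. The function names, the "FlowAccum_" output naming convention, and the output file name fd8_params.txt are all assumptions made for illustration. The parameter order and values are copied from the FlowAccumulationFD8 example above.

```python
import posixpath

# Parameter order taken from the example above:
# demFile, outputFile, exponent, outputType, logTransformOutput, threshold


def build_parameter_lines(dem_files, exponent="1",
                          output_type="specific catchment area (sca)",
                          log_transform="true",
                          threshold="not specified"):
    """Return one comma-separated parameter line per input DEM."""
    lines = []
    for dem in dem_files:
        folder = posixpath.dirname(dem)
        name = posixpath.basename(dem)
        # Hypothetical naming convention: DEM1.dep -> FlowAccum_DEM1.dep
        output = posixpath.join(folder, "FlowAccum_" + name)
        fields = [dem, output, exponent, output_type,
                  log_transform, threshold]
        # All parameters for one run must sit on a single line.
        lines.append(", ".join(fields))
    return lines


def write_parameter_file(lines, file_name):
    """Write the parameter lines to a text file, one run per line."""
    with open(file_name, "w") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    dems = ["/Documents/Data/DEM%d.dep" % i for i in range(1, 5)]
    write_parameter_file(build_parameter_lines(dems), "fd8_params.txt")
```

The resulting text file would then be supplied as the second input parameter of the Run Plugin In Parallel tool. For a different plugin, the field list would need to follow the parameter order given in the Scripting section of that tool's help documentation.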
All of the parameters required for one run must be contained on the same line. The four-line example above can be used to run the FD8 Flow Accumulation tool four times, each run on a separate thread, with four different inputs and four outputs. On a typical quad-core system, this should cut the time required to process the data to roughly a quarter. But the real benefit occurs when you have many more files that need processing. If you have a more complex workflow involving more than one operation, you can run this parallel batch mode for each step (e.g. process all of the depression filling operations and then process all of the flow accumulation operations). You can even call the Run Plugin In Parallel tool from a script to automate these types of operations. If you’re working at that level, I think you’ve earned the right to call yourself a GIS ninja! Leave your comments below and, as always, best wishes and happy geoprocessing.