Parallelization of GIS Operations in Whitebox GAT

I often get feedback from new users of Whitebox Geospatial Analysis Tools about how surprisingly fast it is for whatever task they are doing. However whenever I develop a new tool or function for Whitebox, I always take algorithm performance into consideration. A big part of that consideration these days is the potential for parallelizing all or parts of an operation to take advantage of the of multi-core processors that have become ubiquitous. There are many geoprocessing tools in Whitebox GAT that take advantage of concurrency, e.g the Clip tool, the various interpolation tools for LiDAR, and several others.

I try to take a balanced approach with respect to parallelization. Not every operation will necessarily benefit from parallelization (some will actually become slower) and some operations are simply unparallelizable, while many tools will only benefit when a certain bottleneck in the processing workflow are calculated concurrently. Another major consideration when I develop a new tool or function is managing memory requirements. People keep talking about Big Data these days like it’s a new phenomenon, but we in the geomatics community have been dealing with massive datasets for as long as there’s been a community. Often parallelizing an operation is possible but adds significant memory requirements that would make the parallelized version only applicable to smaller datasets that can easily fit in system memory. When this is the case, I’ll sometimes provide an option to the user to choose a parallelized or memory-optimized version. I must always consider the scalability of a tool during design.

There is one common scenario when you can really benefit from parallelizing your workflow. Have you ever found yourself in a situation where you have dozens, or maybe even hundreds of files all of which need to have the same operation applied? Perhaps you have hundreds of LiDAR tiles and need to apply a depression removal operation on them all. Maybe you have dozens of satellite images that need to have the same convolution filter applied. It’s a fairly common situation for a GIS tech to find themselves in. Of course you could always write a quick Python, Javascript or Groovy script to automate the process and save you the tedium of having to perform each operation manually. But unless you dedicate some time to writing that script just right, it’s likely going to run sequentially. All of those extra processing cores on your machine are just going to idle away. I have a machine with 8 cores and boy does it bother me when they’re not all in use at all times! Well Whitebox has an answer for this particular situation and it’s called Run Plugin In Parallel.

Run Plugin In Parallel tool

Run Plugin In Parallel tool (click to enlarge)

With this tool you can call any of Whitebox’s 400+ plugin tools in a type of parallel batch mode. Each call of the tool will run on it’s own a separate thread. This can significantly reduce processing time, particularly when you have a four (or even eight) core system. Even custom plugin tools that you have developed will be available for targeting with the Run Plugin In Parallel tool. Using the tool is fairly straightforward. You have to specify the name of the plugin tool that you are running. Here it is important to remember that it is the proper plugin name and not the descriptive name that you may see displayed in the tools listings. Normally the proper plugin name is the same or similar to the descriptive name but without spaces. If in doubt, you can always look at the source code. The second input parameter is a text file. Each line within the text file provides the parameters that are supplied to the tool for one run. The input parameters are going to be specific to the tool and are the same that are used when you call a tool from a script. The help documentation for each tool has a Scripting section that describes the input parameters required to run the tool from a script. The following is an example of a text file that could be used for to run a parallelized batch mode of the FD8 Flow Accumulation tool (FlowAccumulationFD8):

/Documents/Data/DEM1.dep, /Documents/Data/FlowAccum1.dep, 1, specific catchment area (sca), true, not specified
/Documents/Data/DEM2.dep, /Documents/Data/FlowAccum2.dep, 1, specific catchment area (sca), true, not specified
/Documents/Data/DEM3.dep, /Documents/Data/FlowAccum3.dep, 1, specific catchment area (sca), true, not specified
/Documents/Data/DEM4.dep, /Documents/Data/FlowAccum4.dep, 1, specific catchment area (sca), true, not specified

Notice that each line should contain entries for each of the input parameters for the tool as specified by the help documentation, which in this case include:

demFile, outputFile, exponent, outputType, logTransformOutput, threshold

All of the parameters required for one run must be contained on the same line. The above example can be used to run the FD8 Flow Accumulation four times, each run on a separate thread, with four different inputs and four outputs. It will effectively quarter the time required to process the data on a typical quad-core system. But the real benefit occurs when you have many more files that need processing. If you have a more complex workflow involving more than one operation, you can run this parallel batch mode for each step (i.e. process all of the depression filling operations and then process all of the flow accumulation operations). You can even call the Run Plugin In Parallel tool from a script to automate these types of operations. If you’re working at that level, I think you’ve earned the right to call yourself a GIS ninja! Leave your comments below and, as always, best wishes and happy geoprocessing.

Advertisements

Displaying LAS LiDAR point clouds in the Whitebox map area

I enjoy working with LiDAR data whenever I can because of its remarkable topographic detail and unique characteristics. More often then not, I work with LiDAR data interpolated onto a raster grid. Lately, however, I’ve been working with terrestrial laser scanner data and having a means of quickly visualizing the point data itself has become important to my workflows. Did you know that as of the latest release of Whitebox GAT (v. 3.2.1) there is now native support for displaying LAS files, the commonly used standard format for storing LiDAR point clouds, in the map area?

Adding LAS layers to a map area

Adding LAS layers to a map area

The LAS point cloud will be added to the Whitebox map area in the same way that you can overlay other vector or raster geospatial data. Here’s an example of a LAS dataset overlaid on top of a raster hillshade image.

Example LAS point cloud

Example LAS point cloud (click to enlarge)

The display properties, including the point size, colour palette, and display value ranges can be easily manipulated.

LAS display properties

LAS display properties

You can even render the point cloud based on elevation, intensity, class value, scan angle, or the GPS time (This is a newly added feature that will be present in the next public release).

Rendering attribute options

Rendering attribute options

Here’s an example of a LAS point cloud rendered using its intensity data. It’s interesting how much it resembles a fine-resolution orthophotograph, but this image is actually made up of millions of individual points. You need to zoom in to see them all.

LAS file rendered by intensity

LAS file rendered by intensity (click to enlarge)

Visualizing the GPS time and scan angle can be very useful for identifying individual flight lines in airborne LiDAR datasets. Here’s an example of rendering the LAS file displayed above using the GPS time as the display attribute:

GPS time rendering

GPS time rendering (click to enlarge)

It may not be much to look at (unless you really like orange), but the edges of the various overlapping flight lines are quite apparent.

I’ve also recently added a new LiDAR Histogram tool that will allow you to visualize the statistical distribution of elevation, intensity, or scan angle within a LAS dataset, including outputting a table of the percentiles.

LiDAR histogram tool

LiDAR histogram tool (click to enlarge)

Whitebox has certainly become a first-rate open-source platform for manipulating and analyzing LiDAR data. And it’s getting better with every release! Leave your comments below and, as always, best wishes and happy geoprocessing.

If you enjoyed this blog, you may also enjoy “Working with LiDAR data in Whitebox GAT“.

Scripting Custom Whitebox GAT Plugin Tools

You may know that you can use scripting in Whitebox Geospatial Analysis Tools to automate your workflows, but did you know that you can also use the same in-application scripting to develop your own custom plugin tool? Of course, you can develop a custom tool using the Java programming language in an integrated development environment like Netbeans (see the How to create a plugin tool for Whitebox tutorial in the Help) but that is a bit of an involved process. By far, the easiest and most rapid way to create a new tool is to use Whitebox’s scripting functionality. You can write script-based tools to do any sort of spatial analysis function including manipulating raster data, shapefiles, and even LAS LiDAR point clouds. The best part of developing a script-based tool is that because you don’t need to recompile the program and add the jar file into the appropriate directory every time you make a change to the code, testing your tool on real-life data directly in the Whitebox environment speeds up the testing phase of development considerably.

Have you ever noticed that the tools listed in the Tools panel have two different icons–the ‘wrench’ (tool) and the ‘scroll’ (ScriptIcon2) icons? The ‘scroll’ icon designates a tool that is written as a script  using the Whitebox Scripter. Importantly, when you develop a tool using the built-in scripting functionality, Whitebox will treat it in exactly the same way that it treats compiled Java tools. This means that it will be listed in the Tool treeview and listings, it will have the same type of dialog user-interface, and it will even be automatically available to be called from other scripts. There are more than one hundred standard plugin tools distributed with Whitebox that have been developed using scripting. People frequently ask me how I’ve managed to write so many GIS tools into Whitebox (over 400 at this point) and script-based tools are my secret advantage.

One of the major differences between these tools and the compiled Java tools, is that they are fully editable. That is, you can open a live version of the source code of scripted tools directly in Whitebox and modify the code and the changes will be integrated as soon as you save the file. You simply need to right-click over the tool in the Tool treeview and select Edit Script. Having the ability to dig deep into the functionality of a tool and even experiment with the code is a large part of the open-access development strategy that the Whitebox project has developed. And the user doesn’t have to worry about breaking the code, because you can always fix any changes that you make to the code by reverting to the original version simply by right-clicking over the tool’s icon and selecting Update Script From Code Repository or by selecting Update Scripts From Repository in the Tools menu (this will also give you a preview of any new scripts that have been committed to the repository after the release version that you are using). So feel free to mess around with the code for various tools. Experiment, tweak, investigate, improve, and have fun with it!

These script-based plugin tools can be developed using any of the three supported scripting languages, including Python, JavaScript, and Groovy. The implementation of the Python programming language used by Whitebox is called Jython (Python 2.7 currently), which runs on the Java platform. Similarly, the JavaScript engine used by Whitebox is called Nashorn and is the newly built (for Java 8) scripting environment that is baked directly into Java. It’s a modern JavaScript engine that has significant performance advantages over the previous JavaScript engine (Rhino) used by older versions of Java. Nashorn is to the Java platform what V8 is to Chrome. Groovy is a scripting language that runs on the Java platform and is the most similar of the three languages to the Java programming language itself. In fact, many people refer to Groovy as a superset of Java, in that most valid Java code will also be executable Groovy (although Groovy has several advances that make it less verbose and generally much nicer to program in than Java itself).

Let’s consider a simple function, a 3 x 3 mean filter run over a raster image, as an example of how you would write a Whitebox plugin tool using each of the three languages. The function, which performs an average of the 9 grid cells surrounding each cell in an input raster and outputs the mean to the corresponding cell in an output raster, is quite typical of the type of analysis done in raster GIS operations. The following is the Python version of this function (please note that there is some kind of bug in WordPress that seems to substitute quotation marks, less than and greater than symbols with things like ‘"’ which may appear in the following code):

# imports
import time
import os
from threading import Thread
from whitebox.ui.plugin_dialog import ScriptDialog
from java.awt.event import ActionListener
from whitebox.geospatialfiles import WhiteboxRaster
from whitebox.geospatialfiles.WhiteboxRasterBase import DataType

'''The following four variables are required for this 
   script to be integrated into the tool tree panel. 
   Comment them out if you want to remove the script.'''
name = "PythonExamplePlugin" 
descriptiveName = "Example Python Plugin" 
description = "Just an example of a plugin tool using Python."
toolboxes = ["topmost"] 
	
class PythonExamplePlugin(ActionListener):
    def __init__(self, args):
        if len(args) != 0:
            self.execute(args)
        else:
            ''' Create a dialog for this tool to collect user-specified
                tool parameters.''' 
            self.sd = ScriptDialog(pluginHost, "Python Example Plugin", self)	
			
            ''' Specifying the help file will display the html help
            // file in the help pane. This file should be be located 
            // in the help directory and have the same name as the 
            // class, with an html extension.'''
            helpFile = self.__class__.__name__
            self.sd.setHelpFile(helpFile)
            
            ''' Specifying the source file allows the 'view code' 
            // button on the tool dialog to be displayed.'''
            self.sd.setSourceFile(os.path.abspath(__file__))
            
            # add some components to the dialog '''
            self.sd.addDialogFile("Input raster file", "Input Raster File:", "open", "Raster Files (*.dep), DEP", True, False)
            self.sd.addDialogFile("Output raster file", "Output Raster File:", "save", "Raster Files (*.dep), DEP", True, False)
            
            # Resize the dialog to the standard size and display it '''
            self.sd.setSize(800, 400)
            self.sd.visible = True
            
    def actionPerformed(self, event):
        if event.getActionCommand() == "ok":
            args = self.sd.collectParameters()
            t = Thread(target=lambda: self.execute(args))
            t.start()

    ''' The execute function is the main part of the tool, where the actual
    work is completed.'''
    def execute(self, args):
        try:
            dX = [ 1, 1, 1, 0, -1, -1, -1, 0 ]
            dY = [ -1, 0, 1, 1, 1, 0, -1, -1 ]
            
            if len(args) != 2:
                pluginHost.showFeedback("Incorrect number of arguments given to tool.")
                return
                
            # read the input parameters
            inputfile = args[0]
            outputfile = args[1]
            
            # read the input image 
            inputraster = WhiteboxRaster(inputfile, 'r')
            nodata = inputraster.getNoDataValue()
            rows = inputraster.getNumberRows()
            cols = inputraster.getNumberColumns()
            
            # initialize the output image
            outputraster = WhiteboxRaster(outputfile, "rw", inputfile, DataType.FLOAT, nodata)
            outputraster.setPreferredPalette(inputraster.getPreferredPalette())
            
            '''perform the analysis
            This code loops through a raster and performs a 
            3 x 3 mean filter.'''
            oldprogress = -1
            for row in xrange(0, rows):
                for col in xrange(0, cols):
                    z = inputraster.getValue(row, col)
                    if z != nodata:
                        mean = z
                        numneighbours = 1
                        for n in xrange(0, 8):
                            zn = inputraster.getValue(row + dY[n], col + dX[n])
                            if zn != nodata:
                                mean += zn
                                numneighbours += 1
                                
                        outputraster.setValue(row, col, mean / numneighbours)
                        
                    progress = (int)(100.0 * row / (rows - 1))
                    if progress != oldprogress:
                        oldprogress = progress
                        pluginHost.updateProgress(progress)
                        if pluginHost.isRequestForOperationCancelSet():
                            pluginHost.showFeedback("Operation cancelled")
                            return
			
            inputraster.close()
            outputraster.addMetadataEntry("Created by the " + descriptiveName + " tool.")
            outputraster.addMetadataEntry("Created on " + time.asctime())
            outputraster.close()

            # display the output image
            pluginHost.returnData(outputfile)
            
            except Exception, e:
                print e
                pluginHost.showFeedback("An error has occurred during operation. See log file for details.")
                pluginHost.logException("Error in " + descriptiveName, e)
                return
            finally:
                # reset the progress bar
                pluginHost.updateProgress(0)
	
if args is None:
    pluginHost.showFeedback("The arguments array has not been set.")
else:		
    PythonExamplePlugin(args)

Many of you GIS ‘Pythonistas’ will feel right at home looking at that code. After saving the code in the Scripter, you need to relaunch Whitebox before the program will recognize the tool and include it in its list of plugins (you only have to do this once). This is what the dialog looks like when you run the tool either by double-clicking the tool in the Tools treeview or by selecting Execute in the Scripter:

Tool dialog

Tool dialog (click to enlarge)

It has two input parameters, an input raster file name and the name of the output raster. To do this up right, we could write a help file for the tool and make sure that it is saved in the Whitebox Help directory. To do this, you simply need to select the Create New Help Entry button on the dialog and enter the HTML document.

Now, the following code is the equivalent JavaScript tool:

// imports
var Runnable = Java.type('java.lang.Runnable');
var Thread = Java.type('java.lang.Thread');
var ActionListener = Java.type('java.awt.event.ActionListener');
var ScriptDialog = Java.type('whitebox.ui.plugin_dialog.ScriptDialog');
var WhiteboxRaster = Java.type('whitebox.geospatialfiles.WhiteboxRaster');
var DataType = Java.type('whitebox.geospatialfiles.WhiteboxRasterBase.DataType');

// The following four variables are what make this recognizable as 
// a plugin tool for Whitebox. Each of name, descriptiveName, 
// description and toolboxes must be present.
var name = "JavascriptExamplePlugin";
var descriptiveName = "Example JavaScript tool";
var description = "Just an example of a plugin tool using JavaScript.";
var toolboxes = ["topmost"];

// Create a dialog for the tool
function createDialog(args) {
    if (args.length !== 0) {
        execute(args);
    } else {
        // create an ActionListener to handle the return from the dialog
        var ac = new ActionListener({
            actionPerformed: function(event) {
        if (event.getActionCommand() === "ok") {
            var args = sd.collectParameters();
            sd.dispose();
	    var r = new Runnable({
	        run: function() {
	            execute(args);
	        }
	    });
	    var t = new Thread(r);
	    t.start();
	    }
	}
	});

        // Create the scriptdialog object
        sd = new ScriptDialog(pluginHost, descriptiveName, ac);
        
        // Add some components to it
        sd.addDialogFile("Input raster file", "Input Raster File:", "open", "Raster Files (*.dep), DEP", true, false);
        sd.addDialogFile("Output raster file", "Output Raster File:", "save", "Raster Files (*.dep), DEP", true, false);
        
        // Specifying the help file will display the html help
        // file in the help pane. This file should be be located 
        // in the help directory and have the same name as the 
        // class, with an html extension.
        sd.setHelpFile(toolName);
        
        // Specifying the source file allows the 'view code' 
        // button on the tool dialog to be displayed.
        var scriptFile = pluginHost.getResourcesDirectory() + "plugins/Scripts/" + toolName + ".js";
        sd.setSourceFile(scriptFile);
        		
        // set the dialog size and make it visible
        sd.setSize(800, 400);
        sd.visible = true;
        return sd;
    }
}

// The execute function is the main part of the tool, where the actual
// work is completed.
function execute(args) {
    try {
        // declare  some variables for later
        var z, zn, mean;
        var numNeighbours;
        // read in the arguments
        if (args.length < 2) {
            pluginHost.showFeedback("The tool is being run without the correct number of parameters");
            return;
        }
        var inputFile = args[0];
        var outputFile = args[1];
        
        // setup the raster
        var input = new WhiteboxRaster(inputFile, "rw");
        var rows = input.getNumberRows();
        var cols = input.getNumberColumns();
        var nodata = input.getNoDataValue();
        var output = new WhiteboxRaster(outputFile, "rw", inputFile, DataType.FLOAT, nodata);
        output.setPreferredPalette(input.getPreferredPalette());
        
        /* perform the analysis
          This code loops through a raster and performs a 
 	3 x 3 mean filter. */
        var dX = [ 1, 1, 1, 0, -1, -1, -1, 0 ];
        var dY = [ -1, 0, 1, 1, 1, 0, -1, -1 ];
        var progress, oldProgress = -1;
        for (row = 0; row < rows; row++) {
            for (col = 0; col < cols; col++) {
                var z = input.getValue(row, col);
                if (z != nodata) {
                    mean = z;
                    numNeighbours = 1;
                    for (n = 0; n < 8; n++) {
                        zn = input.getValue(row + dY[n], col + dX[n]);
                        if (zn !== nodata) {
                            mean += zn;
                            numNeighbours++;
                        }
                    }
                    output.setValue(row, col, mean / numNeighbours);
                }
            }
            progress = row * 100.0 / (rows - 1);
            if (progress !== oldProgress) {
                pluginHost.updateProgress(progress);
                oldProgress = progress;
                // check to see if the user has requested a cancellation
                if (pluginHost.isRequestForOperationCancelSet()) {
                    pluginHost.showFeedback("Operation cancelled");
                    return;
                }
            }
        }
	
        input.close();
        output.addMetadataEntry("Created by the " + descriptiveName + " tool.");
        output.addMetadataEntry("Created on " + new Date());
        output.close();
        
        // display the output image
        pluginHost.returnData(outputFile);

    } catch (err) {
        pluginHost.showFeedback("An error has occurred:\n" + err);
        pluginHost.logException("Error in " + descriptiveName, err);
    } finally {
        // reset the progress bar
        pluginHost.updateProgress("Progress:", 0);
    }
}

if (args === null) {
    pluginHost.showFeedback("The arguments array has not been set.");
} else {
    var sd = createDialog(args);
}

And lastly, this is the equivalent Groovy code:

import java.awt.event.ActionListener
import java.awt.event.ActionEvent
import java.util.Date
import whitebox.interfaces.WhiteboxPluginHost
import whitebox.geospatialfiles.WhiteboxRaster
import whitebox.geospatialfiles.WhiteboxRasterBase.DataType
import whitebox.ui.plugin_dialog.ScriptDialog
import groovy.transform.CompileStatic

// The following four variables are required for this 
// script to be integrated into the tool tree panel. 
// Comment them out if you want to remove the script.
def name = "GroovyExamplePlugin"
def descriptiveName = "Example Groovy tool"
def description = "Just an example of a plugin tool using Groovy."
def toolboxes = ["topmost"]

public class GroovyExamplePlugin {
    private WhiteboxPluginHost pluginHost
    private String descriptiveName
    public GroovyExamplePlugin(WhiteboxPluginHost pluginHost, 
        String[] args, String name, String descriptiveName) {
        this.pluginHost = pluginHost;
        this.descriptiveName = descriptiveName;
        if (args.length > 0) {
            execute(args)
        } else {
            // create an ActionListener to handle the return from the dialog
            def ac = new ActionListener() {
                public void actionPerformed(ActionEvent event) {
                    if (event.getActionCommand().equals("ok")) {
                        args = sd.collectParameters()
                        sd.dispose()
                        final Runnable r = new Runnable() {
                            @Override
                            public void run() {
                                execute(args)
                            }
                        }
                        final Thread t = new Thread(r)
                        t.start()
                    }
                }
	    };
            
            // Create a dialog for this tool to collect user-specified
            // tool parameters.
            def sd = new ScriptDialog(pluginHost, descriptiveName, ac)	
            
            // Specifying the help file will display the html help
            // file in the help pane. This file should be be located 
            // in the help directory and have the same name as the 
            // class, with an html extension.
            sd.setHelpFile(name)
            
            // Specifying the source file allows the 'view code' 
            // button on the tool dialog to be displayed.
            def scriptFile = pluginHost.getResourcesDirectory() + "plugins" + File.separator + "Scripts" + File.separator + name + ".groovy"
            sd.setSourceFile(scriptFile)
            	
            // add some components to the dialog
            sd.addDialogFile("Input raster file", "Input Raster File:", "open", "Raster Files (*.dep), DEP", true, false)
            sd.addDialogFile("Output file", "Output Raster File:", "save", "Raster Files (*.dep), DEP", true, false)
            
            // resize the dialog to the standard size and display it
            sd.setSize(800, 400)
            sd.visible = true
        }
    }

    // The execute function is the main part of the tool, where the actual
    // work is completed.
    //@CompileStatic
    private void execute(String[] args) {
        try {
            int progress, oldProgress = -1, n, row, col, numNeighbours
            double z, zn, mean, nodata;
            int[] dX = [ 1, 1, 1, 0, -1, -1, -1, 0 ]
            int[] dY = [ -1, 0, 1, 1, 1, 0, -1, -1 ]
            
            if (args.length != 2) {
                pluginHost.showFeedback("Incorrect number of arguments given to tool.")
                return
            }
            // read the input parameters
            String inputFile = args[0]
            String outputFile = args[1]
            
            // read the input image
            WhiteboxRaster input = new WhiteboxRaster(inputFile, "r")
            nodata = input.getNoDataValue()
            int rows = input.getNumberRows()
            int cols = input.getNumberColumns()
            			
            // initialize the output image
            WhiteboxRaster output = new WhiteboxRaster(outputFile, "rw", inputFile, DataType.FLOAT, nodata)
            output.setPreferredPalette(input.getPreferredPalette());

            /* perform the analysis
            This code loops through a raster and performs a 
            3 x 3 mean filter. */
            for (row = 0; row < rows; row++) {
                for (col = 0; col < cols; col++) {
                    z = input.getValue(row, col);
                    if (z != nodata) {
                        mean = z;
                        numNeighbours = 1;
                        for (n = 0; n < 8; n++) {
                            zn = input.getValue(row + dY[n], col + dX[n]);
                            if (zn != nodata) {
                                mean += zn;
                                numNeighbours++;
                            }
                        }
                        output.setValue(row, col, mean / numNeighbours);
                    }
                }
                progress = (int)(100f * row / (rows - 1))
                if (progress != oldProgress) {
                    pluginHost.updateProgress(progress)
                    oldProgress = progress
                    // check to see if the user has requested a cancellation
                    if (pluginHost.isRequestForOperationCancelSet()) {
                        pluginHost.showFeedback("Operation cancelled")
                        return
                    }
                }
            }
			
            input.close()
            output.addMetadataEntry("Created by the " + descriptiveName + " tool.")
            output.addMetadataEntry("Created on " + new Date())
            output.close()
            
            // display the output image
            pluginHost.returnData(outputFile)
	
        } catch (Exception e) {
            pluginHost.showFeedback("An error has occurred during operation. See log file for details.")
            pluginHost.logException("Error in " + descriptiveName, e)
        } finally {
            // reset the progress bar
            pluginHost.updateProgress(0)
        }
    }
}

if (args == null) {
    pluginHost.showFeedback("Plugin arguments not set.")
} else {
    def myTool = new GroovyExamplePlugin(pluginHost, args, name, descriptiveName)
}

All three tools look identical and perform the exact same function. However, being developed using dynamically typed scripting languages, there is a performance penalty that exists compared with with the high-speed performance of a tool written in fast statically-typed Just-In-Time (JIT) compiled Java code. To compare the performance of each of our three identical plugin tools, I ran them each 10 times (I actually ran it 11 times and averaged the last 10 runs, to warm-up the JVM) on a 2,862 x 3,249 rows-by-columns raster grid and averaged the run time of each. Here are the results of the performance comparison:

Python: average time = 41.9 sec., lines of code = 124
JavaScript: average time = 9.8 sec., lines of code = 143
Groovy: average time = 3.0 sec., lines of code = 152

Much of the difference in the length of the programs (lines of code) are the result of the need to specify closing brackets in JavaScript and Groovy, compared to the elegant ‘meaningful whitespace’ of Python code. The real difference is obviously in the execution time of each of our three programs. The Python program was 4.3X slower than the JavaScript program and nearly 14X slower than the Groovy program. And here’s the best part; Groovy is actually an ‘optionally typed’ language, meaning that if you are looking to speed up performance even further, you have the option to statically compile various methods. Notice that commented out line in the Groovy source code, “//@CompileStatic”. Simply by removing the comments and running the program again, I was able to speed up the Groovy code even further:

Groovy (compile static): average time = 1.2 sec., lines of code = 152

That’s very nearly equivalent to the execution time of the program written in compiled Java code. That’s rather impressive! I know that in the GIS community, there is a great many Python programmers out there, and certainly with the popularity of web programming these days, there are even more JavaScript programmers. But if you’re wondering why so many of the 100+ script-based tools that are in Whitebox GAT are developed using a peculiarly named scripting language (Groovy) that you’ve probably never heard of before picking up Whitebox, that’s why. Truthfully, if you’re writing a tool that isn’t doing anything computationally demanding, then the Python and JavaScript are wonderful options to quickly build your tool. But if you’re doing something that involves intensive computation, perhaps consider writing the tool in Groovy. The nice thing about Whitebox is that you have the option and other than differences in performance, your tool will be treated by the program in exactly the same way. Ultimately, you should use the language that is most suited to the application at hand and the one that you are comfortable using.

Of course, if you develop a custom Whitebox plugin tool that you think might be useful for others, then consider donating your tool to the project so that it can be distributed to the whole community. To do so, simply e-mail me your source code and perhaps some data to perform testing on. Leave your comments below and, as always, best wishes and happy plugin tool writing!

Whitebox GAT for ArcGIS users

A short time ago over on the Whitebox GAT listserv someone asked an excellent question. They asked whether there was such thing as a Whitebox GAT User’s Manual, since the transition to Whitebox GAT, when you’re coming from an ArcGIS background, can be a little difficult. Often there are corresponding tools in Whitebox GAT to those in ArcGIS but they will frequently (and sometimes frustratingly, I’d imagine) have different names. There are different workflows for doing similar tasks and in general you can feel a little like a fish out of water for a short while until you get use to the Whitebox GAT way of doing things.

I can certainly understand this perspective. To date, I’ve been the main contributor to the Whitebox GAT project and so it reflects my design decisions more than anything. And, let’s face it, I’m a little different. As I pointed out to the question asker, I have never in my past had a time when I have used ArcGIS on a day-to-day basis. I’ve used IDRISI extensively many years ago (and that fact is apparent in some aspects of Whitebox in the way certain things are done) but frankly I’m just not that familiar with ArcGIS. It doesn’t inform how I have developed Whitebox to a large extent.

While we’re still working on a draft of a comprehensive Whitebox GAT User’s Manual (thank you to Adam Bonnycastle for his efforts here), I have been busily compiling quick tutorials on common tasks in Whitebox, which I have been cleverly embedding in the Help menu:

Whitebox's Help menu

Whitebox’s Help menu

Notice, that list starting with ‘Getting the Whitebox source code’? I’m not sure to what extent this feature has been taken up, but it’s there. If you have any suggestions for other additions let me know either by emailing me or leaving a comment below. If you have any Whitebox GAT tutorials that you’ve created yourself and would like to donate to the project, also please let me know.

But this gets me to the main point of this blog. What I would like to do is offer a brief tutorial on the topic “Whitebox GAT for ArcGIS users“. It can be loosely formed as a list of common gottcha’s or ‘Them vs. Us’ comparisons. But as I’ve said above, I really don’t use ArcGIS, so I need to ask for your help. If you’re a Whitebox GAT user and are coming to it from an ArcGIS background, can you provide us with a list of the main difficulties you experienced when you first tried using Whitebox? We’re looking for some of the things that you had to spend time trying to figure out how to do. Wouldn’t it be great if you could help to make that transition a little easier for future Whitebox users? I’m a big believer in the idea of paying it forward. Leave your comments below and, as always, best wishes and happy geoprocessing.

Hexbinning in Whitebox GAT

The practice of binning point data to form a type of 2D histogram, density plot, or what is sometimes called a heatmap, is quite useful as an alternative for the cartographic display of of very dense points sets. This is particularly the case when the points experience significant overlap at the displayed scale. This is the type of operation that is used (in feature space) in Whitebox‘s Feature Space Plot tool. Normally when we perform binning, however, it is using a regular square or perhaps rectangular grid. The other day I read an interesting blog on hexbinning (see also the excellent post by Zachary Forest Johnson also on the topic of hexbinning), or the use of a hexagonal-shaped grid for the creation of density plots.

A Hex-binned heatmap

A Hex-binned heatmap (click to enlarge)

These blogs got me excited, in part because I have used hex-grids for various spatial operations in the past and am aware of several advantages to this tessellation structure. The linked hexbinning blogs point out that hexagonal grids are useful because the hexagon is the most circular-shaped polygon that tessellates. A consequence of this circularity is that hex grids tend to represent curves more naturally than square grids, which includes better representation of polygon boundaries during rasterization. But there are several other advantages that make hexbinning a worthwhile practice. For example, one consequence of the nearly circular shape of hexagons is that hex-grids are very closely packed compared with square grids. That is, the average spacing between hexagon centre points in a hex-grid is smaller than the equivalent average spacing in a square grid. One way of thinking about this characteristic is that it means that hexagonal grids require about 13% fewer data points then the square grid to represent a distribution at a comparable level of detail. Also, unlike a square grid, each cell in a hex-grid shares an equal-length boundary with its six neighbouring grid cells. With square grids, four of the eight neighbouring cells are connected through a point only. This causes all sorts of problems for spatial analysis, not the least of which is the characteristic orientation sensitivity of square grids; and don’t get me started on the effects of this characteristics for surface flow-path modelling on raster digital elevation models. Hex-grid cells are also equally distant to each of their six neighbours. I’ve even heard it argued before that given the shape of those cone and rod cells in our eyes, hex-grids are more naturally suited to the human visual system, although I’m not sure how true this is.

Hexagonal grids are certainly worthwhile data structures and hex-binning is a great way to make a heatmap. So, that said, I decided to write a tool to perform hex-binning in Whitebox GAT. The tool will be publicly available in the 3.2.1 release of the software, which will hopefully be out soon but here is a preview:

Whitebox's Hexbinning tool

Whitebox’s Hexbinning tool (click to enlarge)

The tool will take either an input shapefile of POINT or MULTIPOINT ShapeType, or a LiDAR LAS file and it will be housed in both the Vector Tools and LiDAR Tools toolboxes. Here is an example of a hex-binned density map (seen on right) derived using the tool applied to a distribution of 7.7 million points (seen on left) contained in a LAS file and derived using a terrestrial LiDAR system:

Hexbinned LiDAR points

Hexbinned LiDAR points (Click to enlarge)

Notice that the points in the image on the left are so incredibly dense in places that you cannot effectively see individual features; they overlap completely to form blanket areas of points. It wouldn’t matter how small I rendered the points, at the scale of the map, they would always coalesce into areas. The hex-binned heatmap is a far more effective way of visualizing the distribution of LiDAR points in this case.

The hexbinning tool is also very efficient. It took about two seconds to perform the binning on the 7.7 million LiDAR points above using my laptop. In the CUNY blog linked above, the author (I think it was Lee Hachadoorian) describes several problems that they ran into when they performed the hexbinning operation on their 285,000 points using QGIS. I suspect that their problems were at least partly the result of the fact that they performed the binning using point-in-polygon operations to identify the hexagon in which each point plotted. Point-in-polygon operations are computationally expensive and I think there is a better way here. A hex-grid is essentially the Voronoi diagram for the set of hexagon centre points, meaning that every location within a hexagon grid cell is nearest the hexagon’s centre than another other neighbouring cell centre point. As such, a more efficient way of performing hexbinning would be to enter each hexagon’s centre point into a KD-tree structure and perform a highly efficient nearest-neighbour operation on each of the data points to identify their enclosing hexagon. This is a fast operation in Whitebox and so the hexbinning tool works well even with massive point sets, as you commonly encounter when working with LiDAR data and other types of spatial data.

So I hope you have fun with this new tool and make some beautiful maps. Leave your comments below and, as always, best wishes and happy geoprocessing.

John Lindsay

SRTM DEM data downloader in Whitebox GAT

I just finished developing a tool for Whitebox GAT that will automatically download Shuttle Radar Topography Mission (SRTM) digital elevation models (DEMs) from the USGS SRTM FTP site. SRTM-3 data are among the best global elevation data, with a grid resolution of approximately 90 m. In many areas SRTM data provide the only topographic data set available. Within the United States, the SRTM-1 dataset provides an improved 30 m resolution. Not only does this Whitebox tool retrieve the SRTM tiles contained within the bounding box of a specified area of interest, but it will also import the tiles to Whitebox GAT, fill missing data holes (which are common with SRTM data in rugged terrain) and mosaic the tiles.

Whitebox's new Retrieve SRTM Data tool

Whitebox’s new Retrieve SRTM Data tool

There have been many times in the past when I have needed to download numerous SRTM files, import the files, fill the missing data holes, and finally mosaic the individual tiles. It can be a laborious workflow indeed. This tool will save me a great deal of time, and so I’m rather excited about it. It’s as though the data magically appear in Whitebox!

SRTM data in Whitebox GAT

SRTM data in Whitebox GAT

Of course, with Whitebox GAT’s extensive Terrain Analysis and Hydrological Analysis toolboxes, there’s plenty of interesting things that you can do once those data do magically appear.

90 m DEM of the entire British Isles

90 m DEM of the entire British Isles. (Note that the image is of a coarser resolution than the actual DEM.)

As an example, I created the SRTM-3 90 m DEM of the entire British Isles shown above in 4 minutes, 56.27 seconds, including the time to download 91 individual SRTM tiles, fill their missing data gaps, and mosaic the tiles. My son was even watching Netflix during my downloading, so I can only imagine how much that slowed things down! I only wish that other data providers could follow a similar data sharing model as the USGS and use an anonymous FTP server to distribute their data. If that were the case, we could have many other data sets automatically ingested directly into Whitebox. I’ll release this new SRTM retrieval tool in the next public release of Whitebox GAT (v 3.2.1), which will likely be sometime later this summer. If you are as keen to try it out as I am, email me for a preview copy. If you have any other comments or feedback, please leave them in the comments section below. As always, best wishes and happy geoprocessing.

John Lindsay

*****EDIT*****

Version 3.2.1 of Whitebox has now been released, with the SRTM downloader tool embedded in the Data Layers menu. Please let me know if you have any issues in using the tool.

****EDIT****

I’ve included an image of the DEM of Ireland for John below:

SRTM DEM of Ireland

SRTM DEM of Ireland (Click to enlarge)

So, what exactly is ‘open-access’ software?

There’s no doubt that Whitebox Geospatial Analysis Tools is open-source software. It developed under a free and open-source (FOSS) licence called the GNU General Public Licence and it’s source code is publicly available and modifiable. But I often say that Whitebox GAT is an example of open-access software. So what exactly do I mean by the term open-access software? That’s a good question since, as far as I know, I made the term up. Actually, open-access is a fairly common idea these days and has largely developed out of a perceived need for greater public availability to the outputs of academic publishing. The concept of open access is defined in the statement of the Budapest Open Access Initiative (Chan et al., 2002) as the publication of scholarly literature in a format that removes financial, legal, and technical access barriers to knowledge transfer. Although this definition of open access focuses solely on the publication of research literature, I would argue that the stated goals of reducing barriers associated with knowledge transfer can be equally applied to software. In fact, many of the goals of open-access stated in the definition above are realized by open-source software. Therefore, open-access software can be viewed as a complimentary extension to the traditional open-source model of software development.

The idea of open-access software is that the user community as a whole benefits from the ability of individual users to examine the internal workings of the software. In the case of geospatial software, e.g. a GIS or remote sensing software, this is likely to relate to specific algorithms associated with various analysis tools. Cȃmara and Fonseca (2007) also recognized that adoption of open-source software is not only a choice of software, but also a means of acquiring knowledge. Direct insight into the workings of algorithm design and implementation allows for educational opportunities and a deeper level of knowledge transfer, as well as the potential for rapid innovation, software improvements, and community-directed development. I would argue that this is particularly important in the GIS field because many geospatial algorithms are highly complex and are impacted by implementation details. There are often multiple competing algorithms for accomplishing the same task and the choice of one method over another can greatly impact the outcome of a spatial analysis operation. For example, consider how many different methods there are to measure the pattern of slope gradient from a digital elevation model or the numerous flow routing algorithms for interrogating overland flow paths. This is likely one of the reasons that open-source GIS packages have become so widely used in recent years.

So the benefits of an engaged user community with the ability to inspect software source code are numerous and profound, but aren’t these benefits realized by all FOSS GIS? As with anything worth looking into deeply, the answer is probably more complex than it initially appears. It’s true that all FOSS allow users the opportunity to download source code and to inspect the internal workings. This is in contrast to proprietary software for which the user can only gain insight into the workings of a tool from the provided help documentation. But the traditional method of developing FOSS doesn’t lend itself to end-user code inspection.

The concept of open-access software is based on the idea that software should be designed in a way that reduces the barriers that often discourage or disallow end-users from examining the algorithm design and implementation associated with the source code of specific software artefacts, e.g. geospatial tools in a GIS. That is, open-access software encourages the educational opportunities gained by direct inspection of code. It is important to understand that I am referring to barriers encountered by the typical end-user that may be interested in more completely understanding the details of how a specific tool works; I’m not considering the barriers encountered by the developers of the software…that’s a different topic that I’ll leave for another day. Think about how often you’ve used a GIS and wondered how some tool or operation functions after you pressed the OK button on the tool’s dialog. Even if it was an open-source GIS, you probably didn’t go any further than reading the help documentation to satisfy your curiosity. Why is that? It likely reflects a set of barriers that discourages user engagement and is inherent in the typical implementation of the open-source software model (and certainly the proprietary model too). An open-access software model, however, states that the reduction of these barriers should be a primary design goal that is taken into account at the inception of the project.

The main barriers that restrict the typical user of an open-source GIS from engaging with the code include each of the following:

  1. The need to download source code from a project repository that is separate from the main software artefact (i.e. the executable file). Often the project source-code files are larger than the actual program and downloading such a large file can be challenging for people with limited access to the internet.
  2. The need to download and install a specialized software program, called an integrated development environment (IDE), often required to open and view a project’s source code files. A typical GIS end-user who may find themselves interested in how a particular tool works is less likely to install this additional software, presenting yet another barrier between the user and the source code. 
  3. The required familiarity with the software project’s organizational structure needed to navigate the project files to locate the code related to a specific tool. That is, an understanding of the organization of the source code is necessary to identify the code associated with a specific tool or algorithm of interest. Most desktop GIS projects consist of hundreds of thousands of lines of computer code that are contained within many hundred files. Large projects possess complex organizational structures that are only familiar to the core group of developers. The level of familiarity with a project’s organization that is needed to navigate to the code associated with a particular feature or tool presents a significant barrier to the casual end-user who may be interested in gaining a more in-depth understanding of how a specific feature operates.
  4. The required ability to read code written in a specific programming language.  

Each of the barriers described above impose significant obstacles for users of open-source GIS that discourage deeper probing into how operations function. There may be other barriers, but those listed above are certainly among the most significant. Whitebox GAT attempts to address some these issues by allowing users to view the computer code associated with each plug-in directly from the tool’s dialog. Thus, just as a detailed description of a tool’s working is provided in the help documentation, which appears within the tool’s dialog, so to can the user choose to view the actual algorithm implementation simply by selecting the View Code button on the dialog.

The Clump tool dialog. Notice the View Code button common to all tool dialogs.

The Clump tool dialog. Notice the View Code button common to all tool dialogs. Click on image for enlarged version.

This removes the need to download separate, and often large, project source code files and it eliminates the requisite familiarity with the project to identify the lines of code related to the operation of the tool of interest. Furthermore the tool’s code will appear within an embedded window that provides syntax highlighting to enhance the viewer’s ability to interpret the information. The View Code button is so much more than a quirk of Whitebox GAT; it’s the embodiment of a design philosophy that empowers the software’s user community. This model has the potential to encourage further community involvement and feedback. Among the group of users that are comfortable with GIS programming and development, the ability to readily view the code associated with a tool can allow rapid transfer of knowledge and best-practices for enhancing performance. This model also encourages more rapid development because new functionality can be added simply by modifying existing code. The 1.0 series of Whitebox, developed using the .NET framework, even had the ability to automatically translate code written in one programming language into several other languages, thereby increasing the potential for knowledge transfer and lessening Barrier 4 above. Unfortunately this feature could not be replicated when the project migrated to the Java platform although there are on-going efforts to implement a similar feature.

So, that’s what I mean by open-access GIS. I think that it is a novel concept with the potential to significantly enhance the area of open-source software development in a way that will benefit the whole user community. So when people ask me why I bothered to write my own GIS when I could simply have contributed to one of the many successful and interesting open-source GIS projects that are out there, my reply is usually centred around the need for an open-access GIS. Some would say that I am an idealist, but oddly, I tend to think of myself as a pragmatist. In any case, the world could benefit from more idealists, don’t you think? If you have comments, suggestions, or feedback please leave them in the comments section below. And, as always, best wishes and happy geoprocessing.

Note: this blog is based on sections of a presentation that I gave at GISRUK 2014 and a manuscript that I am preparing for publication on the topic.