Working with LiDAR data in Whitebox GAT

LiDAR data are derived from laser scanners, usually from aerial platforms (i.e. airborne LiDAR) or mounted on tripods (i.e. terrestrial laser scanners). Laser scanning is considered a brute-force technique for topographic mapping because the instrument sends out millions pulses per second and records the returned data. Most of the pulses that are sent out won’t reflect back to the sensor but because there are so many pulses there will always be some returned data. In fact, sometimes a single laser pulse can produce multiple return points when the laser footprint is reflected off of more than one surface. Thus, these instruments produce very large data volumes and are capable of producing extremely detailed (1 m or finer) topographic models or digital elevation models (DEMs). One of the most interesting aspects of LiDAR data is that it is capable of ‘seeing through vegetation’ to create a bare-Earth DEM in which the vegetation canopy has effectively been stripped off the land. LiDAR is the only data source capable of doing this. In fact, the instrument isn’t actually seeing through the vegetation, but rather, because there are so many millions of laser pulses emitted over forest canopies, some of those points are likely to miss the vegetation and bounce off of the ground surface below, at least under moderate vegetation densities. The creation of a bare-Earth DEM requires interpolating the raw LiDAR point data in a way that ignores point returns that are from vegetation and other off-terrain objects and only includes the ground return points.

The most common data exchange format for LiDAR data is the LAS file. Since LiDAR produces massive data sets, LAS files are stored in an efficient binary data format that makes a surprisingly efficient use of every bit to store information about the x, y, and z coordinates of each return point along with additional information about the return intensity, return number, scan angle and direction, classification, and possibly even colour (RGB) information. The LAS file format is an industry standard format for storing data acquired by laser scanners and it has been created to handle all common data types. Not all LAS files will use all of the capabilities of the LAS format, e.g. some data sets may not contain information about point classifications or colour data. Since LiDAR datasets usually contain hundreds of millions, even billions, of points, it is common to separate the data into multiple LAS files, usually in 1 x 1 km2 tiles.

The first thing that you should do when processing LiDAR data is run the Get LAS File Summary tool in Whitebox GAT. This allows you to view some of the critical statistics for a LiDAR dataset. Most importantly, it will allow you to determine the point density, which will impact the resolution of DEM that you can interpolate from the dataset. Naturally point densities will vary throughout the area and may be quite low in areas of open water. The tool will also allow you to determine whether the point data have land classification associated with them, which will impact the way that you choose to interpolate the data. The following is an example of the output of this tool for a LiDAR file:

File Creation Date:   31/7/2008
Generating Software:   TerraScan
LAS Version:   1.1

File Summary
Statistic Value
Total Num. Points 2,429,232
Avg. Density 2.429 pts. m-2
Min. X 423,000.01
Max. X 424,000
Min. Y 4,692,000.02
Max. Y 4,693,000
Min. Z 209.69
Max. Z 243.6
Min. Intensity 0
Max. Intensity 760
Min. Scan Angle -17
Max. Scan Angle 19
Point Return Table
Return Value Number Percentage
1 2,076,265 85.47%
2 240,642 9.906%
3 100,520 4.138%
4 11,805 0.486%
Point Position Table
Return Position Number Percentage
Only 1,835,871 75.574%
First 240,394 9.896%
Last 240,846 9.914%
Intermediate 112,121 4.615%
Point Classification Table
Classification Number Percentage
2 Unclassified 1,057,167 43.519%
3 Ground 1,349,368 55.547%
4 Low Vegetation 67 0.003%
6 High Vegetation 16,331 0.672%
7 Building 2,921 0.12%
10 Water 3,378 0.139%

Number of Variable Length Records:   0     

So now, how do you go about creating a bare-earth DEM from a series of LAS files? The first step is to interpolate the LAS files. In fact, because LiDAR data is generally of such high density, you don’t generally need all that sophisticated an interpolation routine. Krigging for example would probably be overkill for most LiDAR datasets. Whitebox currently offers a number of interpolation methods for LAS files and the decision to use one over the other will depend on the characteristics of the dataset as well as the application. These interpolation methods include Nearest Neighbour (NN), which assigns the centre of each grid cell in the output raster the z-value of the nearest LiDAR point, minimum and maximum interpolation, which assigns each cell the minimum or maximum of the z values of the LiDAR points that fall within the circle with a diameter equal to the grid resolution, and an inverse distance weighted (IDW) interpolator. If you have a high point density (as you generally do with LiDAR data) and your LAS files contain information about the point return numbers, such that the algorithm can discern first and last returns, I’d recommend either IDW or NN interpolations. If land class information is also available within the LAS files, it is also possible to exclude points from the interpolation, e.g. exclude vegetation and building points.

If you don’t have the return number, then minimum interpolation can be useful to help create a bare-earth DEM, and maximum interpolation can be useful towards creating a first-return DEM, what some call a digital surface model. However, Whitebox also contains a special NN-based interpolation tool, called Bare-Earth DEM (LiDAR), that can be very effective at creating bare-Earth DEMs with very little additional information beyond point x, y, z coordinates. The tool distinguishes ground points from non-ground points based on the inter-point slope threshold. The interpolation area is divided into grid cells, corresponding to the cells of the output DEM. All of the LiDAR points within the circle encompassing each grid cell is then examined as a neighbourhood. Each point within a neighbourhood that has an inter-point slope with any other point and is also situated above the corresponding point, is considered to be a non-ground point. An appropriate value for the inter-point slope threshold parameter will depend on the steepness of the terrain, but generally values of 15-35 degrees produce satisfactory results. Importantly, this method of distinguishing non-ground points from terrain points does not rely on any information about the point return (e.g. first versus last return). It is therefore possible to derive a bare-Earth DEM for any LAS file, even those that do not contain information about the point return. The technique has been shown to work very well at creating bare-Earth DEMs even compared to interpolation using last-return points only, as is the case with the IDW and regular nearest-neighbour tools. The following is a bare-Earth DEM and canopy model created using this method from a LiDAR data set under extremely dense forest cover.

Bare-Earth DEM

A bare-Earth DEM created using Whitebox GAT bare-Earth DEM tool for an area under dense forest cover.

Canopy model

The canopy model corresponding to the bare-Earth DEM.

The IDW interpolation tool is one of the most sophisticated of the LiDAR-specific interpolators in Whitebox GAT. For example, if you specify many LAS tiles, it will use the points on the edges of neighbouring tiles to ensure that each of the interpolated output rasters don’t have significant edge-effects. You typically will input the names of several LAS files contained within a single directory. You then specify an output suffix. The algorithm will interpolate a raster for each LAS tile, processing the data in parallel if you have a computer with multiple cores. The output rasters will have the same names as the LAS inputs but will have this suffix appended to the end of the file names. Thus you could, for example, have your output raster named ‘401_4697_last return.dep’ where the first number part is a reference to the LAS tile and is derived from the input LAS file and ‘_last return’ is the suffix that you use to tell you that these have all been interpolated using the last return data only. This can be handy because you will often interpolate the LAS files numerous times with different requirements (first vs. last return, or perhaps excluding various types of classes like buildings). If it is a bare-earth DEM that you are after, then you should specify Last Return as the point return to interpolate. This will only use points that are designated as last return (e.g. 4 of 4) during interpolation. Naturally doing so will reduce the point density, which will impact the appropriate grid resolution of the output DEM. The maximum search distance parameter is usually set to between 1-5 times the grid resolution, which for most LiDAR datasets is appropriately set between 0.5 m – 5 m, depending on the point density. Keep in mind that if you end up having to reduce the resolution of the DEM at a later stage to make it more manageable for subsequent analysis (e.g. flow path modelling) then you would be better off interpolating the raw data at this coarser resolution (e.g. 10 m) than reducing the resolution of the interpolated DEM. If your LiDAR data contains point class data, you may optionally choose to exclude points associated with various classes from the interpolation process. For example, you can exclude points associated with low, medium and high vegetation and buildings, which will make the creation of a bare earth DEM all the better.

Whitebox GAT's LiDAR toolbox

Whitebox GAT’s LiDAR toolbox

The maximum scan angle deviation parameter in the IDW interpolator (also found in other LiDAR-specific interpolation tools in Whitebox) is an interesting feature. It is particularly useful if you are going to be using the resulting DEM for any kind of flow-path modelling, e.g. mapping streams or watersheds. The maximum scan angle deviation parameter can be used to filter points from the neighbourhood around each interpolated grid cell that have scan angles that are larger than the neighbourhood minimum scan angle by this user-defined threshold value. Large variation in the scan angle of nearby LiDAR points can be common near the edges of scan lines. Although the effects of this are rarely apparent in the DEM, the impact of using a collection of points with greatly varying scan angles for interpolation can be observed in the derived products of the DEM, particularly the hill shade image (see figure below).

flight-line overlap pattern

Wavy patterning associated with flight-line overlaps

The extent of this wavy patterning depends on the tools that were used to merge the data from adjacent flight-lines during data processing and can significantly impact modelled surface flowpaths in these flight-line edge areas. Lowering the maximum scan angle deviation parameter will have the impact of reducing this artifact. If the parameter is set too low, however, the resulting interpolated surface may contain extensive areas of NoData values. Similarly, if you do not want to exclude any points as a result of the variation of scan angles within interpolation neighbourhoods, simply set this parameter to a high value, e.g. 90.0 degrees.

Now that you’ve created all of your interpolated DEM tiles there are a number of things that you need to do. You’ll need to mosaic the tiles together (i.e. stitch them into one larger file using the Mosaic tool), fill the no-data gaps (these are grid cells in areas of low point density where there were no LiDAR points within the specified search distance) using the Fill Missing Data Holes tool, and finally use the Remove Off-Terrain Objects tool to remove any remaining buildings, vegetation, or other off-terrain structures in the DEM. Depending on the size of the final DEM, which can be enormous for many LiDAR datasets, you may find it easier or even necessary to perform these tasks on the individual tiles and then do a final mosaic. If the data set is small enough, you would be better off mosaicing first then filling the gaps and removing the off-terrain objects. If you are also interested in creating a map of vegetation heights, you may consider using the Canopy Model (LiDAR) tool which works in a similar way as the Bare-Earth DEM (LiDAR) tool.

LiDAR data can be a joy to work with because they offer levels of detail that are unmatched by other data sources. Of course, to unlock the rich information contained in these unique data you need the appropriate tools at hand. Whitebox GAT provides some very powerful tools for working with LiDAR. So as always, best wishes and happy geoprocessing.

John Lindsay

Selecting features by attributes in Whitebox GAT

The latest released version of Whitebox GAT (3.1.3) has greatly improved feature selection capabilities, including feature selection by attributes built directly into the vector attributes table. The following post is a brief tutorial to guide you through the process of how this works.

To select features based on attributes contained with the attributes table, first be sure that the vector layer of interest is selected as the active map layer in the Layers tab. Now open the attribute table of the vector layer either by selecting the View Attribute Table icon located on the toolbar, or by selecting View Attribute Table from the Data Layers menu. Click on the Feature Selection tab (see below).

Selection by attribute

Example of selection by attribute

The first important feature of this tab pane is the Execute Code icon (green arrow) used to run a selection command. You will also notice a drop-down menu that can be used to specify the type of selection operation including the following four modes:

  • Create new selection
  • Add to current selection
  • Remove from current selection
  • Select from current selection

Each of the various modes differ in the way that the newly identified features are related to any features that are already selected. The default mode of Create new selection will simply ignore any existing selection and create a new selection set.

There is also a drop-down menu containing the names of each of the fields within the attribute table. If you select a field from the field name menu, the name will automatically be inserted into the selection script text area. The selection script text area has an auto-complete feature that allows users to press Control (Ctrl) and the space bar (or Command + space on a Mac) after having typed one or a few letters of the field name and a helpful pop-up window will appear from which you can select the correct field name. This auto-complete feature is also available to help identify all of the methods that are associated with fields containing strings (e.g. startsWith, toLowerCase, equals, contains, etc.). If the selected field is a text string or is a numeric integer value with fewer than 300 unique values within the table these values will appear listed in the ‘Unique Values’ drop-down value. Selecting a unique value will result in it being inserted into the selection script text area as well.

The operators drop-down menu provides a quick-link list to many of the commonly used operators. Importantly, the selection is based on Groovy-language scripting. Groovy is a super-set of the Java programming language and provides substantial power to the feature selection process. The common logical and comparison operators that are used in a selection script include:

  • & (logical AND operator)
  • | (logical OR operator)
  • == (equal-to comparison operator for numeric data types; for text strings use str1.equals(str2) method)
  • != (not-equal-to comparison operatorfor numeric data types; for text strings use !str1.equals(str2) method))
  • > (greater-than comparison operator)
  • >= (greater-than-equal-to comparison operator)
  • < (greater-than comparison operator)
  • <= (greater-than-equal-to comparison operator)

However, any valid Groovy code is allowable in the selection script. The selection script is typically a single line long, effectively a single command, although multiline scripts are accepted as well. Importantly, the last line of the script must evaluate to a Boolean (true or false) value. Each row in the attributes table will be evaluated and if the final line expression evaluates to ‘true‘, the corresponding feature will be selected. The values within each of the fields for a row are assigned to variables with the same name as the field (case sensitive). There is also an extra variable available within the script called ‘index‘ which is the row (feature) number. For example, the following selection script:

index < 100 

would select the first 100 features in the vector layer. More complex selections are also possible. For example,

 
NAME.startsWith("C") & POP2005 > 25000000 & LAT > 0.0

selects the features that have NAME starting with the letter U, POP2005 greater than 25,000,000 and LAT greater than 0.0. And,

CLASS.equals("agriculture") | CLASS.equals("forest")

selects all features that have an attribute CLASS value of either agriculture or forest.

After performing a selection, you will find that a filter has been applied to the attribute table such that only the selected features are displayed. The Options menu allows you to remove this filter, to copy the selected features to the clip board (they can then be pasted into a spreadsheet program as comma-separated values (CSV) text), or even save the features as a separate vector layer. The selected features will also be rendered on the map with a cyan coloured outline. That’s it. I think it’s rather straightforward and hopefully you do too. As always, best wishes and happy geoprocessing.

Workflow Automation in Whitebox GAT

(This is part 1 of a multi-part post. See Workflow Automation Part 2 here.)

Okay, Whitebox GAT provides users with a very user-friendly means of accessing geoprocessing tools through dialog boxes with built-in help documentation. The dialog boxes are very consistent in design and the tools are each readily accessible from the Tools tab in the tool tree-view and the quick-search lists. This likely provides the most convenient way to access this functionality for most of your day-to-day work. Every now and then however you’ll find yourself in a situation where you have a workflow with numerous intermediate steps and you need to repeat those steps several times over. This is a case for automation. Every plugin tool in Whitebox can be run automatically  from a script. In fact, the help documentation for each tool provides a description of how that tool can be scripted. In the longer term I intend to develop a graphical workflow automator that will automatically generate scripts based on a workflow graph that the user creates. This type of functionality, similar to what is found in ArcGIS, Idrisi, and a few others, can be quite helpful for automating certain tasks. They are never as powerful however as directly scripting a workflow and that is why so many GIS users these days have become comfortable with scripting. It’s an extremely powerful tool for the GIS analyst to have.

Whitebox GAT currently provides scripting support for three programming languages including Python (actually Jython), Groovy, and JavaScript. I know that because ArcGIS supports Python scripting there are many GISers out there that are familiar with Python. Those of you who have been paying close attention to Whitebox development may have noticed that I have started to write a lot of new plugin tools using Groovy (most of Whitebox is developed using Java). Why Groovy instead of Python, you may ask? Well Groovy is a superset of Java, meaning that most valid Java is valid Groovy (if you know Java you already know Groovy). Groovy, being a scripting language, has some syntax advantages that make it much terser than regular Java. Compared with Jython, the Groovy project also seems to be more actively maintained and certainly has better computational performance, particularly with the optional static compilation. So for writing full-blown plugin tools, Groovy makes a lot of sense. (As an aside, how can you not like a language with such a great name?) But for simply automating workflows, I know that most of you are going to be looking at good old Python.

So, that said, below is a Python script that provides an example of how to automate a workflow. In this case, it automates the very common task in spatial hydrology of taking in a digital elevation model, removing the topographic depressions, calculating the flow directions, calculating the flow accumulation grid, then extracting streams. The script could be made much terser but I added a lot of comments to clarify. Simply paste the script into the Whitebox Scripter and press Execute Code. Scripting in Whitebox GAT can be a heck of a lot of fun and awfully addictive once you realize how much power you have at your fingertips! As always, best wishes and happy geoprocessing.

'''
Copyright (C) 2014 Dr. John Lindsay &amp;lt;jlindsay@uoguelph.ca&amp;gt;

This program is intended for instructional purposes only. The
following is an example of how to use Whitebox's scripting
capabilities to automate a geoprocessing workflow. The scripting
language is Python; more specifically it is Jython, the Python
implementation targeting the Java Virtual Machine (JVM).

In this script, we will take a digital elevation model (DEM),
remove all the topographic depressions from it (i.e. hydrologically
correct the DEM), calculate a flow direction pointer grid, use
the pointer file to perform a flow accumulation (i.e. upslope area)
calculation, then threshold the upslope area to derive valley lines
or streams. This is a fairly common workflow in spatial hydrology.

When you run a script from within Whitebox, a reference to the
Whitebox user interface (UI) will be automatically bound to your
script. It's variable name is 'pluginHost'. This is the primary
reason why the script must be run from within Whitebox's Scripter.

First we need the directory containing the data, and to set
the working directory to this. We will use the Vermont DEM contained
within the samples directory.
'''

import os

separator = os.sep # The system-specific directory separator
wd = pluginHost.getApplicationDirectory() + separator + &amp;quot;resources&amp;quot; + separator + &amp;quot;samples&amp;quot; + separator + &amp;quot;Vermont DEM&amp;quot; + separator
pluginHost.setWorkingDirectory(wd)

demFile = wd + &amp;quot;Vermont DEM.dep&amp;quot;
# Notice that spaces are allowed in file names. There is also no
# restriction on the length of the file name...in fact longer,
# descriptive names are preferred. Whitebox is friendly!

# A raster or vector file can be displayed by specifying the file
# name as an argument of the returnData method of the pluginHost
pluginHost.returnData(demFile)

'''
Remove the depressions in the DEM using the 'FillDepressions' tool.

The help file for each tool in Whitebox contains a section detailing
the required input parameters needed to run the tool from a script.
These parameters are always fed to the tool in a String array, in
the case below, called 'args'. The tool is then run using the 'runPlugin'
method of the pluginHost. runPlugin takes the name of the tool (see
the tool's help for the proper name), the arguments string array,
followed by two Boolean arguments. The first of these Boolean
arguments determines whether the plugin will be run on its own
separate thread. In most scripting applications, this should be set
to 'False' because the results of this tool are needed as inputs to
subsequent tools. The second Boolean argument specifies whether the
data that are returned to the pluginHost after the tool is completed
should be suppressed. Many tools will automatically display images
or shapefiles or some text report when they've completed. It is often
the case in a workflow that you only want the final result to be
displayed, in which case all of the runPlugins should have this final
Boolean parameter set to 'True' except for the last operation, for
which it should be set to 'False' (i.e. don't suppress the output).
The data will still be written to disc if the output are supressed,
they simply won't be automatically displayed when the tool has
completed. If you don't specify this last Boolean parameter, the
output will be treated as normal.
'''
filledDEMFile = wd + &amp;quot;filled DEM.dep&amp;quot;
flatIncrement = &amp;quot;0.001&amp;quot; # Notice that although this is a numeric parameter, it is provided to the tool as a string.
args = [demFile, filledDEMFile, flatIncrement]
pluginHost.runPlugin(&amp;quot;FillDepressions&amp;quot;, args, False, True)

# Calculate the D8 pointer (flow direction) file.
pointerFile = wd + &amp;quot;pointer.dep&amp;quot;
args = [filledDEMFile, pointerFile]
pluginHost.runPlugin(&amp;quot;FlowPointerD8&amp;quot;, args, False, True)

# Perform the flow accumulation operation.
flowAccumFile = wd + &amp;quot;flow accumulation.dep&amp;quot;
outputType = &amp;quot;number of upslope grid cells&amp;quot;
logTransformOutput = &amp;quot;False&amp;quot;
args = [pointerFile, flowAccumFile, outputType, logTransformOutput]
pluginHost.runPlugin(&amp;quot;FlowAccumD8&amp;quot;, args, False, True)

# Extract the streams
streamsFile = wd + &amp;quot;streams.dep&amp;quot;
channelThreshold = &amp;quot;1000.0&amp;quot;
backValue = &amp;quot;NoData&amp;quot;
args = [flowAccumFile, streamsFile, channelThreshold, backValue]
pluginHost.runPlugin(&amp;quot;ExtractStreams&amp;quot;, args, False, False) # This final result will be displayed

'''
Note that in each of the examples above, I have created new variables
to hold each of the input parameters for the plugin tools. I've done
this more for clarity than anything else. The script could be
substantially shorted if the shorter variables were directly entered
into the args array. For instance, I could have easily used:

args = [flowAccumFile, streamsFile, &amp;quot;1000.0&amp;quot;, &amp;quot;NoData&amp;quot;]

for the last runPlugin and saved myself declaring the two variables.
Because the file names are generally used in subsequent operations,
it is a good idea to dedicate variables to those parameters.
'''
Whitebox Scripter

Whitebox Scripter

Edit
Catching Bugs Before They Bug You:

When you’re writing a script in Whitebox, if your program throws an error, the software will record the error in your log files. The log files are xml files contained within the logs directory within the main Whitebox folder. They are detailed printouts of exactly what was happening around the time that the exception was thrown. They can certainly be challenging to read. An easier way of dealing with this problem is to incorporate exception handling in your script. Here’s a brief example:

try:
	i = 14
	k = &quot;24&quot;
	j = i + k # Throws an error
except Exception, e:
	print e
	pluginHost.showFeedback(&quot;Error during script execution.&quot;)
	''' alternatively, you many want to send the exception to 
	the pluginHost.logException() method '''
finally:
	print &quot;I'm done!&quot;

This code will print the following statement to the Whitebox Scripter console:

TypeError(“unsupported operand type(s) for +: ‘int’ and ‘unicode'”,)

Getting the most out of Whitebox GAT

Have you been running into out-of-memory errors with Whitebox and are a bit confused because you have 8-16GB of memory in your computer? My guess is you’re running Microsoft Windows and the good news is there’s a solution to your problems. The default Java Runtime Environment (JRE) for download on the Oracle site for computers running Microsoft Windows OS is the 32-bit version even if you are running a 64-bit version of the operating system. This will severely limit the amount of memory that Whitebox can access. In fact, working with 32-bit Java will limit the memory that any Java program on your computer will see, but because Whitebox is used to process large spatial files, it’s particularly problematic for Whitebox users. The 32-bit version only allows Java to see about 1.1GB of your RAM which doesn’t give you much room to play with that wonderful satellite image or LiDAR file, does it? This can affect performance and can lead to out-of-memory errors when handling large files. Given the large number of Windows users, this affects many people (I’ll point out here that there is only a 64-bit version of Java for Mac OSX so this same problem simply doesn’t occur).

You can check what version of the JRE Whitebox is running under by selecting Options and Settings under the View menu.

Whitebox Options and Settings

Options and Settings in Whitebox

You can actually be running multiple versions of the Java runtime on your computer at the same time, so you really have to check this in Whitebox to find out which version it is running on. If you are running the 32-bit version on a 64-bit computer it would be advisable to upgrade your Java to the 64-bit version. On the Oracle Java download site, the default version of Java for Windows is 32-bit, so you’ll need to explicitly select the 64-bit version. You want the version that ends with -windows-x64.exe rather than the 32-bit version which ends with -windows-i586.exe. Now uninstall that old 32-bit version (make sure that no other programs require it first and also make sure that Whitebox is closed before you uninstall Java) and install the new 64-bit version. Launch Whitebox again and check under the Options and Settings menu to see that you have a reasonable heap size size and that you’re using 64-bit Java. You should find that you have considerably more room to play around with those massive files now. It’s like moving into a mansion after having lived for years in a one-bedroom apartment…what will you do with all that new room? I hope you use it to do some wonderfully exciting geospatial analysis 😉

Cheers,

John Lindsay

At last, a Whitebox GAT listserv!

Well, I’ve finally gotten around to setting up a listserv for Whitebox GAT. This will be an excellent way of reaching out to the larger Whitebox user community. If you would like to join the Whitebox GAT listserv send an email message to listserv@listserv.uoguelph.ca that says only:

SUBSCRIBE whitebox-gat <your name>

If you have an email signature, be sure to delete it from your message. To post a message, email:

whitebox-gat@listserv.uoguelph.ca

You can use the listserv to get help from Whitebox experts and offer your expertise to the Whitebox user community (contribute and watch your karma improve!). It’s also a great way to catch up on recent developments.

Whitebox and Landscape Visibility

I have just finished writing a new tool for estimating viewsheds from digital elevation models (DEMs) and also for calculating a relative visibility index. The algorithms are actually based on earlier tools that I had written into the Terrain Analysis System (TAS). The visibility index estimates the size of the viewshed for each grid cell in a DEM (see figure below).

Fig. 1. Example of the relative visibility index.

These algorithms are actually extremely computationally intensive, as most viewsheding algorithms are, making their application to even moderate-sized DEMs quite challenging on modestly powered computers. So I thought that I would take this opportunity to explore the possibility of using Java’s concurrent features (i.e. parallel processing) to improve computation. This has become increasingly important as a means of improving the performance of computationally intensive spatial analysis algorithms, since most modern computers have multi-core processors. I was able to implement a parallel version of the visibility index tool that provided substantial speed-ups over the serial version.

In doing so, I have come to realize one of the major challenges that we face in the field of spatial analysis. There is a trade-off that exists between using a concurrent (parallel) approach to a spatial analysis algorithm and the size of data that can be processed. On the one hand, using a parallel approach enables the processing of larger datasets in a way that uses all of those available processor cores. However, it would appear that parallel algorithms need the geospatial data being manipulated by a tool to be stored in computer memory, unless I’m missing something obvious. Most of Whitebox’s existing tools have been written in a way that allow for processing very large raster files, by only reading into memory chunks of data that fit comfortably in the existing resources. If the use of concurrent methods requires data to be housed within RAM, we find ourselves once again limited with respect to the size of data that we can process by memory resources. This brings me back to the days of programming TAS in the early 2000’s. Unless the amount of RAM on computers increases substantially, I worry that our ability to develop concurrent spatial analysis algorithms to process massive raster data sets may be somewhat limited. It seems that all those extra cores on our processors are a bit of a mixed blessing and requires considerable re-thinking of some of the standard ways of processing our geospatial data! No doubt, this would make  a very interesting avenue for research for some bright young geoscientist…

John Lindsay