Whitebox Geospatial Analysis Tools v. 3.2.2 Released

It is with tremendous pleasure that I announce the release of the latest version of Whitebox Geospatial Analysis Tools, version 3.2.2. It has been several months since the release of the previous version of Whitebox GAT and we have incorporated several significant new features, tools, and bug fixes. The following is a partial list of changes:

  • LAS files can now be rendered based on the Classification, Scan Angle, and GPS Time in addition to Elevation and Intensity. This is on top of several other improvements that have been made to the display of LAS file data in Whitebox GAT.
  • Added Conditional Evaluation tool (if-then-else statement). I really like this tool and now wonder how it was that I managed to get by without it.
  • Added a PickFromList tool, which outputs a raster with the value from a stack
    of rasters based on an input position raster.
  • Added LowestPosition, HighestPosition, PercentEqualTo, PercentGreaterThan, and PercentLessThan tools for working with raster stacks, i.e. lists of overlapping
    raster images.
  • Added a tool to create histograms based on the elevations, intensity, and
    scan angles within a LiDAR (LAS) file. It will also output the percentiles of
    the distribution, e.g. 95th percentile of elevation.
  • Added the AddPointCoordinateToTable tool, which can be used to add the x-y
    coordinates of points within a Point type ShapeFile as fields within its attribute table.
  • Added a tool to filter the points in a LiDAR (LAS) file based on a threshold
    in the absolute scan angle. Currently the output is a shapefile of mass-point shape but eventually we would like to have it write to a new LAS file.
  • The Merge Points Files tool was replaced with the more general Merge Shapefiles
    tool, which works with any ShapeType.
  • Added the FindLowestHighestLocations tool, which will output vector points
    corresponding to the lowest and highest points in a raster. This has already come in handy several times.
  • Added ExtractRasterValuesAtPoints tool for extracting the cell values of each
    image in a raster stack corresponding to a set of vector points. Values are
    placed within the vector attribute table.
  • Added DeleteSmallLakesAndExtendRivers tool, which can be used to remove small lakes (polygons) in a vector drainage network and to extend the associated river
    network (intersecting polylines) into the interior of the lake. I created this tool in response to an interesting question asked over on the GIS Stack Exchange.
  • Added a Long Profile From Point tool, which can generate one or more longitudinal
    profiles for the flowpaths issuing from a set of vector points.
  • Modified the Mosaic With Feathering tool to handle RGB images in addition to
    continuous scale rasters. At the moment, this only works for the nearest-
    neighbour mode. I’m not sure why I didn’t do this earlier.
  • Added Image Stack Profile tool to create a line graphs for a set of vector points
    through an image stack, such as multi-spectral satellite image data. This can be handy for visualizing the spectral signatures of individual pixels.
  • Added a Simple Region Grow tool that will perform a very simple region grow
    segmentation of pixels in an image stack based on a specified threshold. I’d like to continue development in this area and eventually include a full object-based image segmentation.
  • Added parallel implementations of the D8 flow pointer and accumulation algorithms. At the moment this is really an experimental tool that is not intended for widespread use but there is more to come, including parallel versions of all the flow accumulation algorithms.
  • Fixed a bug in the Hillslopes tool.
  • I’ve added a whole suite of tools to the Elevation Residuals toolbox for
    performing multi-scaled topographic position analysis. This includes modified
    tools for calculating difference and deviation from mean elevations using an
    integral image approach that is extremely computationally efficient, even with
    large search windows. It also includes the Local Topographic Position Scale
    Signature and the Maximum Elevation Deviation tools. Combined with the
    ‘customRGBScript.groovy’ script, these functions allow for the creation of
    some spectacular multi-scale topographic position visualizations. See this link
    for examples.

And like a proud father, I can’t resist showing some nice pictures…

Whitebox GAT 3.2.2 screenshot

Whitebox GAT 3.2.2 screenshot

Whitebox GAT 3.2.2 screenshot

Whitebox GAT 3.2.2 screenshot

Whitebox GAT v. 3.2.2

Whitebox GAT v. 3.2.2

One point of note is that Oracle stopped supporting Java on Windows XP some time ago and therefore recent versions of Whitebox GAT no longer function on this platform. It’s time to upgrade!

Please let me know if you have any feedback or questions regarding the new version of Whitebox GAT and I hope you enjoy all the new goodies. As always, best wishes and happy geoprocessing.

UPDATE (April 15, 2015): It would seem that there were some issues with some of the plugin tools written as Groovy scripts that resulted from breaking changes associated with the update from Groovy 2.3 to Groovy 2.4.1. To overcome these issues I regressed the Groovy library linked to by Whitebox GAT to the 2.3 version and now all of the affected tools are working properly. Thank you to the users that alerted me to this issue.

Displaying LAS LiDAR point clouds in the Whitebox map area

I enjoy working with LiDAR data whenever I can because of its remarkable topographic detail and unique characteristics. More often then not, I work with LiDAR data interpolated onto a raster grid. Lately, however, I’ve been working with terrestrial laser scanner data and having a means of quickly visualizing the point data itself has become important to my workflows. Did you know that as of the latest release of Whitebox GAT (v. 3.2.1) there is now native support for displaying LAS files, the commonly used standard format for storing LiDAR point clouds, in the map area?

Adding LAS layers to a map area

Adding LAS layers to a map area

The LAS point cloud will be added to the Whitebox map area in the same way that you can overlay other vector or raster geospatial data. Here’s an example of a LAS dataset overlaid on top of a raster hillshade image.

Example LAS point cloud

Example LAS point cloud (click to enlarge)

The display properties, including the point size, colour palette, and display value ranges can be easily manipulated.

LAS display properties

LAS display properties

You can even render the point cloud based on elevation, intensity, class value, scan angle, or the GPS time (This is a newly added feature that will be present in the next public release).

Rendering attribute options

Rendering attribute options

Here’s an example of a LAS point cloud rendered using its intensity data. It’s interesting how much it resembles a fine-resolution orthophotograph, but this image is actually made up of millions of individual points. You need to zoom in to see them all.

LAS file rendered by intensity

LAS file rendered by intensity (click to enlarge)

Visualizing the GPS time and scan angle can be very useful for identifying individual flight lines in airborne LiDAR datasets. Here’s an example of rendering the LAS file displayed above using the GPS time as the display attribute:

GPS time rendering

GPS time rendering (click to enlarge)

It may not be much to look at (unless you really like orange), but the edges of the various overlapping flight lines are quite apparent.

I’ve also recently added a new LiDAR Histogram tool that will allow you to visualize the statistical distribution of elevation, intensity, or scan angle within a LAS dataset, including outputting a table of the percentiles.

LiDAR histogram tool

LiDAR histogram tool (click to enlarge)

Whitebox has certainly become a first-rate open-source platform for manipulating and analyzing LiDAR data. And it’s getting better with every release! Leave your comments below and, as always, best wishes and happy geoprocessing.

If you enjoyed this blog, you may also enjoy “Working with LiDAR data in Whitebox GAT“.

Hexbinning in Whitebox GAT

The practice of binning point data to form a type of 2D histogram, density plot, or what is sometimes called a heatmap, is quite useful as an alternative for the cartographic display of of very dense points sets. This is particularly the case when the points experience significant overlap at the displayed scale. This is the type of operation that is used (in feature space) in Whitebox‘s Feature Space Plot tool. Normally when we perform binning, however, it is using a regular square or perhaps rectangular grid. The other day I read an interesting blog on hexbinning (see also the excellent post by Zachary Forest Johnson also on the topic of hexbinning), or the use of a hexagonal-shaped grid for the creation of density plots.

A Hex-binned heatmap

A Hex-binned heatmap (click to enlarge)

These blogs got me excited, in part because I have used hex-grids for various spatial operations in the past and am aware of several advantages to this tessellation structure. The linked hexbinning blogs point out that hexagonal grids are useful because the hexagon is the most circular-shaped polygon that tessellates. A consequence of this circularity is that hex grids tend to represent curves more naturally than square grids, which includes better representation of polygon boundaries during rasterization. But there are several other advantages that make hexbinning a worthwhile practice. For example, one consequence of the nearly circular shape of hexagons is that hex-grids are very closely packed compared with square grids. That is, the average spacing between hexagon centre points in a hex-grid is smaller than the equivalent average spacing in a square grid. One way of thinking about this characteristic is that it means that hexagonal grids require about 13% fewer data points then the square grid to represent a distribution at a comparable level of detail. Also, unlike a square grid, each cell in a hex-grid shares an equal-length boundary with its six neighbouring grid cells. With square grids, four of the eight neighbouring cells are connected through a point only. This causes all sorts of problems for spatial analysis, not the least of which is the characteristic orientation sensitivity of square grids; and don’t get me started on the effects of this characteristics for surface flow-path modelling on raster digital elevation models. Hex-grid cells are also equally distant to each of their six neighbours. I’ve even heard it argued before that given the shape of those cone and rod cells in our eyes, hex-grids are more naturally suited to the human visual system, although I’m not sure how true this is.

Hexagonal grids are certainly worthwhile data structures and hex-binning is a great way to make a heatmap. So, that said, I decided to write a tool to perform hex-binning in Whitebox GAT. The tool will be publicly available in the 3.2.1 release of the software, which will hopefully be out soon but here is a preview:

Whitebox's Hexbinning tool

Whitebox’s Hexbinning tool (click to enlarge)

The tool will take either an input shapefile of POINT or MULTIPOINT ShapeType, or a LiDAR LAS file and it will be housed in both the Vector Tools and LiDAR Tools toolboxes. Here is an example of a hex-binned density map (seen on right) derived using the tool applied to a distribution of 7.7 million points (seen on left) contained in a LAS file and derived using a terrestrial LiDAR system:

Hexbinned LiDAR points

Hexbinned LiDAR points (Click to enlarge)

Notice that the points in the image on the left are so incredibly dense in places that you cannot effectively see individual features; they overlap completely to form blanket areas of points. It wouldn’t matter how small I rendered the points, at the scale of the map, they would always coalesce into areas. The hex-binned heatmap is a far more effective way of visualizing the distribution of LiDAR points in this case.

The hexbinning tool is also very efficient. It took about two seconds to perform the binning on the 7.7 million LiDAR points above using my laptop. In the CUNY blog linked above, the author (I think it was Lee Hachadoorian) describes several problems that they ran into when they performed the hexbinning operation on their 285,000 points using QGIS. I suspect that their problems were at least partly the result of the fact that they performed the binning using point-in-polygon operations to identify the hexagon in which each point plotted. Point-in-polygon operations are computationally expensive and I think there is a better way here. A hex-grid is essentially the Voronoi diagram for the set of hexagon centre points, meaning that every location within a hexagon grid cell is nearest the hexagon’s centre than another other neighbouring cell centre point. As such, a more efficient way of performing hexbinning would be to enter each hexagon’s centre point into a KD-tree structure and perform a highly efficient nearest-neighbour operation on each of the data points to identify their enclosing hexagon. This is a fast operation in Whitebox and so the hexbinning tool works well even with massive point sets, as you commonly encounter when working with LiDAR data and other types of spatial data.

So I hope you have fun with this new tool and make some beautiful maps. Leave your comments below and, as always, best wishes and happy geoprocessing.

John Lindsay

Working with LiDAR data in Whitebox GAT

LiDAR data are derived from laser scanners, usually from aerial platforms (i.e. airborne LiDAR) or mounted on tripods (i.e. terrestrial laser scanners). Laser scanning is considered a brute-force technique for topographic mapping because the instrument sends out millions pulses per second and records the returned data. Most of the pulses that are sent out won’t reflect back to the sensor but because there are so many pulses there will always be some returned data. In fact, sometimes a single laser pulse can produce multiple return points when the laser footprint is reflected off of more than one surface. Thus, these instruments produce very large data volumes and are capable of producing extremely detailed (1 m or finer) topographic models or digital elevation models (DEMs). One of the most interesting aspects of LiDAR data is that it is capable of ‘seeing through vegetation’ to create a bare-Earth DEM in which the vegetation canopy has effectively been stripped off the land. LiDAR is the only data source capable of doing this. In fact, the instrument isn’t actually seeing through the vegetation, but rather, because there are so many millions of laser pulses emitted over forest canopies, some of those points are likely to miss the vegetation and bounce off of the ground surface below, at least under moderate vegetation densities. The creation of a bare-Earth DEM requires interpolating the raw LiDAR point data in a way that ignores point returns that are from vegetation and other off-terrain objects and only includes the ground return points.

The most common data exchange format for LiDAR data is the LAS file. Since LiDAR produces massive data sets, LAS files are stored in an efficient binary data format that makes a surprisingly efficient use of every bit to store information about the x, y, and z coordinates of each return point along with additional information about the return intensity, return number, scan angle and direction, classification, and possibly even colour (RGB) information. The LAS file format is an industry standard format for storing data acquired by laser scanners and it has been created to handle all common data types. Not all LAS files will use all of the capabilities of the LAS format, e.g. some data sets may not contain information about point classifications or colour data. Since LiDAR datasets usually contain hundreds of millions, even billions, of points, it is common to separate the data into multiple LAS files, usually in 1 x 1 km2 tiles.

The first thing that you should do when processing LiDAR data is run the Get LAS File Summary tool in Whitebox GAT. This allows you to view some of the critical statistics for a LiDAR dataset. Most importantly, it will allow you to determine the point density, which will impact the resolution of DEM that you can interpolate from the dataset. Naturally point densities will vary throughout the area and may be quite low in areas of open water. The tool will also allow you to determine whether the point data have land classification associated with them, which will impact the way that you choose to interpolate the data. The following is an example of the output of this tool for a LiDAR file:

File Creation Date:   31/7/2008
Generating Software:   TerraScan
LAS Version:   1.1

File Summary
Statistic Value
Total Num. Points 2,429,232
Avg. Density 2.429 pts. m-2
Min. X 423,000.01
Max. X 424,000
Min. Y 4,692,000.02
Max. Y 4,693,000
Min. Z 209.69
Max. Z 243.6
Min. Intensity 0
Max. Intensity 760
Min. Scan Angle -17
Max. Scan Angle 19
Point Return Table
Return Value Number Percentage
1 2,076,265 85.47%
2 240,642 9.906%
3 100,520 4.138%
4 11,805 0.486%
Point Position Table
Return Position Number Percentage
Only 1,835,871 75.574%
First 240,394 9.896%
Last 240,846 9.914%
Intermediate 112,121 4.615%
Point Classification Table
Classification Number Percentage
2 Unclassified 1,057,167 43.519%
3 Ground 1,349,368 55.547%
4 Low Vegetation 67 0.003%
6 High Vegetation 16,331 0.672%
7 Building 2,921 0.12%
10 Water 3,378 0.139%

Number of Variable Length Records:   0     

So now, how do you go about creating a bare-earth DEM from a series of LAS files? The first step is to interpolate the LAS files. In fact, because LiDAR data is generally of such high density, you don’t generally need all that sophisticated an interpolation routine. Krigging for example would probably be overkill for most LiDAR datasets. Whitebox currently offers a number of interpolation methods for LAS files and the decision to use one over the other will depend on the characteristics of the dataset as well as the application. These interpolation methods include Nearest Neighbour (NN), which assigns the centre of each grid cell in the output raster the z-value of the nearest LiDAR point, minimum and maximum interpolation, which assigns each cell the minimum or maximum of the z values of the LiDAR points that fall within the circle with a diameter equal to the grid resolution, and an inverse distance weighted (IDW) interpolator. If you have a high point density (as you generally do with LiDAR data) and your LAS files contain information about the point return numbers, such that the algorithm can discern first and last returns, I’d recommend either IDW or NN interpolations. If land class information is also available within the LAS files, it is also possible to exclude points from the interpolation, e.g. exclude vegetation and building points.

If you don’t have the return number, then minimum interpolation can be useful to help create a bare-earth DEM, and maximum interpolation can be useful towards creating a first-return DEM, what some call a digital surface model. However, Whitebox also contains a special NN-based interpolation tool, called Bare-Earth DEM (LiDAR), that can be very effective at creating bare-Earth DEMs with very little additional information beyond point x, y, z coordinates. The tool distinguishes ground points from non-ground points based on the inter-point slope threshold. The interpolation area is divided into grid cells, corresponding to the cells of the output DEM. All of the LiDAR points within the circle encompassing each grid cell is then examined as a neighbourhood. Each point within a neighbourhood that has an inter-point slope with any other point and is also situated above the corresponding point, is considered to be a non-ground point. An appropriate value for the inter-point slope threshold parameter will depend on the steepness of the terrain, but generally values of 15-35 degrees produce satisfactory results. Importantly, this method of distinguishing non-ground points from terrain points does not rely on any information about the point return (e.g. first versus last return). It is therefore possible to derive a bare-Earth DEM for any LAS file, even those that do not contain information about the point return. The technique has been shown to work very well at creating bare-Earth DEMs even compared to interpolation using last-return points only, as is the case with the IDW and regular nearest-neighbour tools. The following is a bare-Earth DEM and canopy model created using this method from a LiDAR data set under extremely dense forest cover.

Bare-Earth DEM

A bare-Earth DEM created using Whitebox GAT bare-Earth DEM tool for an area under dense forest cover.

Canopy model

The canopy model corresponding to the bare-Earth DEM.

The IDW interpolation tool is one of the most sophisticated of the LiDAR-specific interpolators in Whitebox GAT. For example, if you specify many LAS tiles, it will use the points on the edges of neighbouring tiles to ensure that each of the interpolated output rasters don’t have significant edge-effects. You typically will input the names of several LAS files contained within a single directory. You then specify an output suffix. The algorithm will interpolate a raster for each LAS tile, processing the data in parallel if you have a computer with multiple cores. The output rasters will have the same names as the LAS inputs but will have this suffix appended to the end of the file names. Thus you could, for example, have your output raster named ‘401_4697_last return.dep’ where the first number part is a reference to the LAS tile and is derived from the input LAS file and ‘_last return’ is the suffix that you use to tell you that these have all been interpolated using the last return data only. This can be handy because you will often interpolate the LAS files numerous times with different requirements (first vs. last return, or perhaps excluding various types of classes like buildings). If it is a bare-earth DEM that you are after, then you should specify Last Return as the point return to interpolate. This will only use points that are designated as last return (e.g. 4 of 4) during interpolation. Naturally doing so will reduce the point density, which will impact the appropriate grid resolution of the output DEM. The maximum search distance parameter is usually set to between 1-5 times the grid resolution, which for most LiDAR datasets is appropriately set between 0.5 m – 5 m, depending on the point density. Keep in mind that if you end up having to reduce the resolution of the DEM at a later stage to make it more manageable for subsequent analysis (e.g. flow path modelling) then you would be better off interpolating the raw data at this coarser resolution (e.g. 10 m) than reducing the resolution of the interpolated DEM. If your LiDAR data contains point class data, you may optionally choose to exclude points associated with various classes from the interpolation process. For example, you can exclude points associated with low, medium and high vegetation and buildings, which will make the creation of a bare earth DEM all the better.

Whitebox GAT's LiDAR toolbox

Whitebox GAT’s LiDAR toolbox

The maximum scan angle deviation parameter in the IDW interpolator (also found in other LiDAR-specific interpolation tools in Whitebox) is an interesting feature. It is particularly useful if you are going to be using the resulting DEM for any kind of flow-path modelling, e.g. mapping streams or watersheds. The maximum scan angle deviation parameter can be used to filter points from the neighbourhood around each interpolated grid cell that have scan angles that are larger than the neighbourhood minimum scan angle by this user-defined threshold value. Large variation in the scan angle of nearby LiDAR points can be common near the edges of scan lines. Although the effects of this are rarely apparent in the DEM, the impact of using a collection of points with greatly varying scan angles for interpolation can be observed in the derived products of the DEM, particularly the hill shade image (see figure below).

flight-line overlap pattern

Wavy patterning associated with flight-line overlaps

The extent of this wavy patterning depends on the tools that were used to merge the data from adjacent flight-lines during data processing and can significantly impact modelled surface flowpaths in these flight-line edge areas. Lowering the maximum scan angle deviation parameter will have the impact of reducing this artifact. If the parameter is set too low, however, the resulting interpolated surface may contain extensive areas of NoData values. Similarly, if you do not want to exclude any points as a result of the variation of scan angles within interpolation neighbourhoods, simply set this parameter to a high value, e.g. 90.0 degrees.

Now that you’ve created all of your interpolated DEM tiles there are a number of things that you need to do. You’ll need to mosaic the tiles together (i.e. stitch them into one larger file using the Mosaic tool), fill the no-data gaps (these are grid cells in areas of low point density where there were no LiDAR points within the specified search distance) using the Fill Missing Data Holes tool, and finally use the Remove Off-Terrain Objects tool to remove any remaining buildings, vegetation, or other off-terrain structures in the DEM. Depending on the size of the final DEM, which can be enormous for many LiDAR datasets, you may find it easier or even necessary to perform these tasks on the individual tiles and then do a final mosaic. If the data set is small enough, you would be better off mosaicing first then filling the gaps and removing the off-terrain objects. If you are also interested in creating a map of vegetation heights, you may consider using the Canopy Model (LiDAR) tool which works in a similar way as the Bare-Earth DEM (LiDAR) tool.

LiDAR data can be a joy to work with because they offer levels of detail that are unmatched by other data sources. Of course, to unlock the rich information contained in these unique data you need the appropriate tools at hand. Whitebox GAT provides some very powerful tools for working with LiDAR. So as always, best wishes and happy geoprocessing.

John Lindsay