Working with LiDAR data in Whitebox GAT

LiDAR data are derived from laser scanners, usually from aerial platforms (i.e. airborne LiDAR) or mounted on tripods (i.e. terrestrial laser scanners). Laser scanning is considered a brute-force technique for topographic mapping because the instrument sends out millions pulses per second and records the returned data. Most of the pulses that are sent out won’t reflect back to the sensor but because there are so many pulses there will always be some returned data. In fact, sometimes a single laser pulse can produce multiple return points when the laser footprint is reflected off of more than one surface. Thus, these instruments produce very large data volumes and are capable of producing extremely detailed (1 m or finer) topographic models or digital elevation models (DEMs). One of the most interesting aspects of LiDAR data is that it is capable of ‘seeing through vegetation’ to create a bare-Earth DEM in which the vegetation canopy has effectively been stripped off the land. LiDAR is the only data source capable of doing this. In fact, the instrument isn’t actually seeing through the vegetation, but rather, because there are so many millions of laser pulses emitted over forest canopies, some of those points are likely to miss the vegetation and bounce off of the ground surface below, at least under moderate vegetation densities. The creation of a bare-Earth DEM requires interpolating the raw LiDAR point data in a way that ignores point returns that are from vegetation and other off-terrain objects and only includes the ground return points.

The most common data exchange format for LiDAR data is the LAS file. Since LiDAR produces massive data sets, LAS files are stored in an efficient binary data format that makes a surprisingly efficient use of every bit to store information about the x, y, and z coordinates of each return point along with additional information about the return intensity, return number, scan angle and direction, classification, and possibly even colour (RGB) information. The LAS file format is an industry standard format for storing data acquired by laser scanners and it has been created to handle all common data types. Not all LAS files will use all of the capabilities of the LAS format, e.g. some data sets may not contain information about point classifications or colour data. Since LiDAR datasets usually contain hundreds of millions, even billions, of points, it is common to separate the data into multiple LAS files, usually in 1 x 1 km2 tiles.

The first thing that you should do when processing LiDAR data is run the Get LAS File Summary tool in Whitebox GAT. This allows you to view some of the critical statistics for a LiDAR dataset. Most importantly, it will allow you to determine the point density, which will impact the resolution of DEM that you can interpolate from the dataset. Naturally point densities will vary throughout the area and may be quite low in areas of open water. The tool will also allow you to determine whether the point data have land classification associated with them, which will impact the way that you choose to interpolate the data. The following is an example of the output of this tool for a LiDAR file:

File Creation Date:   31/7/2008
Generating Software:   TerraScan
LAS Version:   1.1

File Summary
Statistic Value
Total Num. Points 2,429,232
Avg. Density 2.429 pts. m-2
Min. X 423,000.01
Max. X 424,000
Min. Y 4,692,000.02
Max. Y 4,693,000
Min. Z 209.69
Max. Z 243.6
Min. Intensity 0
Max. Intensity 760
Min. Scan Angle -17
Max. Scan Angle 19
Point Return Table
Return Value Number Percentage
1 2,076,265 85.47%
2 240,642 9.906%
3 100,520 4.138%
4 11,805 0.486%
Point Position Table
Return Position Number Percentage
Only 1,835,871 75.574%
First 240,394 9.896%
Last 240,846 9.914%
Intermediate 112,121 4.615%
Point Classification Table
Classification Number Percentage
2 Unclassified 1,057,167 43.519%
3 Ground 1,349,368 55.547%
4 Low Vegetation 67 0.003%
6 High Vegetation 16,331 0.672%
7 Building 2,921 0.12%
10 Water 3,378 0.139%

Number of Variable Length Records:   0     

So now, how do you go about creating a bare-earth DEM from a series of LAS files? The first step is to interpolate the LAS files. In fact, because LiDAR data is generally of such high density, you don’t generally need all that sophisticated an interpolation routine. Krigging for example would probably be overkill for most LiDAR datasets. Whitebox currently offers a number of interpolation methods for LAS files and the decision to use one over the other will depend on the characteristics of the dataset as well as the application. These interpolation methods include Nearest Neighbour (NN), which assigns the centre of each grid cell in the output raster the z-value of the nearest LiDAR point, minimum and maximum interpolation, which assigns each cell the minimum or maximum of the z values of the LiDAR points that fall within the circle with a diameter equal to the grid resolution, and an inverse distance weighted (IDW) interpolator. If you have a high point density (as you generally do with LiDAR data) and your LAS files contain information about the point return numbers, such that the algorithm can discern first and last returns, I’d recommend either IDW or NN interpolations. If land class information is also available within the LAS files, it is also possible to exclude points from the interpolation, e.g. exclude vegetation and building points.

If you don’t have the return number, then minimum interpolation can be useful to help create a bare-earth DEM, and maximum interpolation can be useful towards creating a first-return DEM, what some call a digital surface model. However, Whitebox also contains a special NN-based interpolation tool, called Bare-Earth DEM (LiDAR), that can be very effective at creating bare-Earth DEMs with very little additional information beyond point x, y, z coordinates. The tool distinguishes ground points from non-ground points based on the inter-point slope threshold. The interpolation area is divided into grid cells, corresponding to the cells of the output DEM. All of the LiDAR points within the circle encompassing each grid cell is then examined as a neighbourhood. Each point within a neighbourhood that has an inter-point slope with any other point and is also situated above the corresponding point, is considered to be a non-ground point. An appropriate value for the inter-point slope threshold parameter will depend on the steepness of the terrain, but generally values of 15-35 degrees produce satisfactory results. Importantly, this method of distinguishing non-ground points from terrain points does not rely on any information about the point return (e.g. first versus last return). It is therefore possible to derive a bare-Earth DEM for any LAS file, even those that do not contain information about the point return. The technique has been shown to work very well at creating bare-Earth DEMs even compared to interpolation using last-return points only, as is the case with the IDW and regular nearest-neighbour tools. The following is a bare-Earth DEM and canopy model created using this method from a LiDAR data set under extremely dense forest cover.

Bare-Earth DEM

A bare-Earth DEM created using Whitebox GAT bare-Earth DEM tool for an area under dense forest cover.

Canopy model

The canopy model corresponding to the bare-Earth DEM.

The IDW interpolation tool is one of the most sophisticated of the LiDAR-specific interpolators in Whitebox GAT. For example, if you specify many LAS tiles, it will use the points on the edges of neighbouring tiles to ensure that each of the interpolated output rasters don’t have significant edge-effects. You typically will input the names of several LAS files contained within a single directory. You then specify an output suffix. The algorithm will interpolate a raster for each LAS tile, processing the data in parallel if you have a computer with multiple cores. The output rasters will have the same names as the LAS inputs but will have this suffix appended to the end of the file names. Thus you could, for example, have your output raster named ‘401_4697_last return.dep’ where the first number part is a reference to the LAS tile and is derived from the input LAS file and ‘_last return’ is the suffix that you use to tell you that these have all been interpolated using the last return data only. This can be handy because you will often interpolate the LAS files numerous times with different requirements (first vs. last return, or perhaps excluding various types of classes like buildings). If it is a bare-earth DEM that you are after, then you should specify Last Return as the point return to interpolate. This will only use points that are designated as last return (e.g. 4 of 4) during interpolation. Naturally doing so will reduce the point density, which will impact the appropriate grid resolution of the output DEM. The maximum search distance parameter is usually set to between 1-5 times the grid resolution, which for most LiDAR datasets is appropriately set between 0.5 m – 5 m, depending on the point density. Keep in mind that if you end up having to reduce the resolution of the DEM at a later stage to make it more manageable for subsequent analysis (e.g. flow path modelling) then you would be better off interpolating the raw data at this coarser resolution (e.g. 10 m) than reducing the resolution of the interpolated DEM. If your LiDAR data contains point class data, you may optionally choose to exclude points associated with various classes from the interpolation process. For example, you can exclude points associated with low, medium and high vegetation and buildings, which will make the creation of a bare earth DEM all the better.

Whitebox GAT's LiDAR toolbox

Whitebox GAT’s LiDAR toolbox

The maximum scan angle deviation parameter in the IDW interpolator (also found in other LiDAR-specific interpolation tools in Whitebox) is an interesting feature. It is particularly useful if you are going to be using the resulting DEM for any kind of flow-path modelling, e.g. mapping streams or watersheds. The maximum scan angle deviation parameter can be used to filter points from the neighbourhood around each interpolated grid cell that have scan angles that are larger than the neighbourhood minimum scan angle by this user-defined threshold value. Large variation in the scan angle of nearby LiDAR points can be common near the edges of scan lines. Although the effects of this are rarely apparent in the DEM, the impact of using a collection of points with greatly varying scan angles for interpolation can be observed in the derived products of the DEM, particularly the hill shade image (see figure below).

flight-line overlap pattern

Wavy patterning associated with flight-line overlaps

The extent of this wavy patterning depends on the tools that were used to merge the data from adjacent flight-lines during data processing and can significantly impact modelled surface flowpaths in these flight-line edge areas. Lowering the maximum scan angle deviation parameter will have the impact of reducing this artifact. If the parameter is set too low, however, the resulting interpolated surface may contain extensive areas of NoData values. Similarly, if you do not want to exclude any points as a result of the variation of scan angles within interpolation neighbourhoods, simply set this parameter to a high value, e.g. 90.0 degrees.

Now that you’ve created all of your interpolated DEM tiles there are a number of things that you need to do. You’ll need to mosaic the tiles together (i.e. stitch them into one larger file using the Mosaic tool), fill the no-data gaps (these are grid cells in areas of low point density where there were no LiDAR points within the specified search distance) using the Fill Missing Data Holes tool, and finally use the Remove Off-Terrain Objects tool to remove any remaining buildings, vegetation, or other off-terrain structures in the DEM. Depending on the size of the final DEM, which can be enormous for many LiDAR datasets, you may find it easier or even necessary to perform these tasks on the individual tiles and then do a final mosaic. If the data set is small enough, you would be better off mosaicing first then filling the gaps and removing the off-terrain objects. If you are also interested in creating a map of vegetation heights, you may consider using the Canopy Model (LiDAR) tool which works in a similar way as the Bare-Earth DEM (LiDAR) tool.

LiDAR data can be a joy to work with because they offer levels of detail that are unmatched by other data sources. Of course, to unlock the rich information contained in these unique data you need the appropriate tools at hand. Whitebox GAT provides some very powerful tools for working with LiDAR. So as always, best wishes and happy geoprocessing.

John Lindsay