Extracting Accurate Bare Earth Dems From Unclassified Lidar Point Clouds

Removing Vegetation from LiDAR Data

LiDAR (Light Detection and Ranging) data collected from aerial platforms contains returns from both the bare earth surface as well as objects above the ground including buildings, trees, and other vegetation. To create an accurate bare earth digital elevation model (DEM), points classified as vegetation must be removed while retaining essential ground elevation points.

The process of separating ground returns from non-ground returns is referred to as classification. Various automated filtering algorithms exist to classify LiDAR point clouds, utilizing factors such as scan angle, return intensity, elevation discontinuities and local slope to distinguish probable ground returns. Common classification methods include morphological filtering, surface-based filtering, segment-based filtering, and machine learning approaches.

Morphological filtering defines a moving window over the point cloud data and thresholds elevation values within the window to extract probable ground returns. Surface-based filtering fits a continuous surface to the lowermost points as an approximation of the bare earth. Segment-based filtering portions the entire data set into segments with similar properties from which ground points can be selected. Machine learning techniques train models on hand-classified point clouds to classify additional data.

Table of Contents

While automated filtering methods are convenient, vegetation classification remains an active area of research. Complex landscapes, low point densities, and variation in LiDAR systems can impact filtering accuracy. As such, manual editing may be necessary to correct misclassified points after applying automated filters. Visual inspection of classified data is recommended to validate classification quality prior to generating DEMs.

Key concepts in removing vegetation from LiDAR:

Classification separates ground and non-ground points
Common filters utilize scan geometry, intensities, slopes
Automated methods can have classification errors
Visual inspection should follow filtering

Classifying Ground and Non-Ground Points

Distinguishing ground points from non-ground points is essential to bare earth DEM generation. LiDAR systems record x,y,z coordinates with each return, capturing both elevation and position. Classification leverages patterns within this 3D point cloud to separate the two classes.

Key metrics used by classification algorithms include scan angle, return intensity, local slope and elevation discontinuity. Points collected at narrow scan angles exhibit properties distinct from steeply angled vegetation returns. Return intensity indicates reflectance strength, which varies by land cover type. Significant changes in slope or elevation over short distances suggest points lie on different surfaces.

By thresholding based on these metrics either individually or collectively, automated filters categorize points as ground or non-ground. Morphological filters define a window to evaluate point neighborhoods. Surface filters group low or continuous elevations as ground returns. Machine learning classification trains models on hand-classified data to inform categorization.

While automated methods have shown success for classification, errors still occur due to complex terrain, low point density and other factors. As such, visual inspection and manual correction following automated filtering is recommended to attain the highest quality ground point sets for DEM creation.

Key properties for classifying LiDAR points:

Scan angle
Return intensity
Slope
Elevation discontinuity

Implementing Cloth Simulation Filtering

Cloth simulation filtering (CSF) operates by modelling the movement of a flexible cloth draped over a point cloud to separate ground and non-ground points. CSF defines cloth parameters including stiffness, density, and damping coefficients according to the LiDAR data properties. Gravity and loose cloth behavior causes the modelled fabric to conform to peaks and pits during iteration.

Key steps when implementing CSF are:

Define cloth size matching the extent of the input LiDAR
Set cloth parameters based on point density and terrain complexity
Iterate cloth simulation until equilibrium reached
Classify points above cloth as non-ground returns
Refine cloth to capture terrain details through further simulation

As CSF classification occurs in 3D space, steep slopes and discontinuities are naturally handled avoiding issues posed with 2.5D raster filtering methods. CSF also requires minimal parameter tuning compared to machine learning techniques. However, complexity arises in large data sets which can be memory intensive.

While powerful, CSF suffers from similar issues as other automated filters, with errors occurring in areas of dense vegetation or steep slopes. As such, visual inspection of classified point clouds based on CSF remains essential to verify and manually edit misclassifications.

Key aspects when applying CSF:

Set cloth size and properties appropriately
Iterate simulation until equilibrium
Refine cloth to capture terrain details
Visually inspect and correct errors

Encoding Elevation Values in a Raster

Following classification and filtering to isolate ground returns, the LiDAR point cloud is converted to a raster DEM. This encoding process assigns elevation values from the irregular 3D point cloud onto a 2D grid of uniform cells. Interpolation methods fill DEM voids where no elevation measurement exists within a cell area.

Common interpolation approaches include inverse distance weighting (IDW), kriging, splining, and triangulation. IDW averages nearby measured values weighted by distance. Kriging considers spatial autocorrelation in surface trends. Splines fit smooth surfaces to points while preserving breaklines. Triangulation constructs mesh surfaces from point data.

Cell size governs DEM resolution with smaller sizes better capturing terrain variation. However, smaller cells require denser point samples. Upscaling elevation through binning, averaging, or aggregation integrates further points where cell sizes exceed native densities. Resolution balances detail with noise induced through interpolation error.

Encoded DEMs represent continuous surfaces, facilitating drainage modeling, visibility analysis, and volumetric calculations. However, raster encoding smoothes abrupt elevation changes that occur along road cuts, streams, ridges and other breaklines. Vector-based contouring preserves breaklines though lacks support for surface analysis methods.

Key factors in generating raster DEMs:

Select interpolation method matching terrain complexity
Set cell size relative to point density
Preserve breaklines through contouring if required

Evaluating DEM Accuracy

Rigorous accuracy assessment should follow raster DEM generation to validate if the derived model meets requirements of intended applications. Comparing interpolated elevations against withheld check points not used in modelling provides quantitative accuracy metrics.

Standard accuracy measures include root-mean-square error (RMSE), mean error (ME), and standard deviation (SD) calculated from residual errors at check points. RMSE indicates overall error magnitude with lower values showing less deviation from source measurements. ME reveals if a model exhibits bias towards higher or lower estimates. SD quantifies precision variation.

Spatial distribution of error matters alongside aggregate metrics like RMSE. Elevation residuals plotted across the DEM highlight zones of over or under estimation. Statistical analysis assesses if interpolation error significantly correlates with slope, aspect or land cover factors.

Qualitative evaluation through visual inspection maps regions of excessive flattening or artificial peaks. Positioning additional check points in problematic areas supports iterative refinement. Smoothness metrics gauge surface continuity disruption from artifacts. Validating DEMs ensures suitability for usage in hydrological and other models.

Recommendations for evaluating DEM accuracy:

Compare interpolated values against check points
Use statistics like RMSE, ME and SD
Map spatial error distribution
Visually inspect for artifacts
Iteratively refine uncertain areas

Example Script for applying CSF in Python

This example Python script loads a LAS LiDAR file, implements cloth simulation filtering to classify ground returns, converts the filtered point cloud to a raster DEM, and calculates accuracy statistics based on withheld validation points.


# Import modules
import laspy, numpy, pdal, clothsimfilter, rasterio
from scipy import interpolate
from statistics import mean 

# Read input LAS file  
lasData = laspy.read("data.las")

# Initialize cloth filter            
csf = clothsimfilter.ClothSimFilter()
csf.set_input(lasData)

# Run cloth simulation         
csf.simulate(max_iterations=100)   

# Access classified points
ground_points = csf.get_ground_points()
non_ground_points = csf.get_non_ground_points()

# Export ground points to new LAS file
out = laspy.create(point_format=lasData.header.point_format)
out.points = ground_points
out.write("ground_points.las")             

# Convert points to raster               
xmin, ymin = numpy.min(ground_points, axis=0)[:-1]
xmax, ymax = numpy.max(ground_points, axis=0)[:-1]
Cols = 100  
Rows = 100

xcoords = ground_points["x"]                                        
ycoords = ground_points["y"]
zvals = ground_points["z"]  

grid_x = numpy.linspace(xmin, xmax, Cols)
grid_y = numpy.linspace(ymin, ymax, Rows)

interp_linear = interpolate.LinearNDInterpolator(numpy.vstack([xcoords, ycoords]).T, zvals)

xx, yy = numpy.meshgrid(grid_x, grid_y)
zz = interp_linear(numpy.vstack([xx.flatten(), yy.flatten()]).T).reshape(100,100)


# Write GeoTIFF depth raster                      
geo_transform = (xmin, (xmax - xmin) / Cols, 0, ymax, 0, (ymin - ymax) / Rows) 
with rasterio.open('output_dem.tif', 'w', driver='GTiff', width=Cols, height=Rows, count=1, crs='epsg:4326', transform=geo_transform, dtype='float32') as dst:
    dst.write(zz.astype('float32'), 1)
        
# Calculate vertical accuracy statistics vs validation points   
check_points = pdal.read_las("check_points.las")  

errors=[]   
for p in check_points:
    val = numpy.float32(zz[int(p.Y), int(p.X)]) #Read raster value at XY
    err = p.Z - val #Calculate error vs validation Z           
    errors.append(err)
               
rmse = numpy.sqrt(numpy.mean(numpy.square(errors))) #RMSE accuracy
mean_err = mean(errors) #Mean error
std_dev = numpy.std(errors) #Standard deviation 

print(f"Vertical accuracy statistics:")                   
print(f"RMSE: {rmse}")
print(f"Mean error: {mean_err}")   
print(f"Standard deviation: {std_dev}")

In this script, key steps include:

Applying CSF to classify ground points
Interpolating ground returns to raster DEM
Calculating accuracy metrics vs checkpoints

The output statistics validate interpolation quality and reveal if additional DEM refinement is necessary prior to usage.

Tips for Improving Bare Earth DEM Extraction

Producing accurate and artifact-free bare earth DEMs from raw LiDAR data requires attention across steps of classification, filtering, interpolation and validation. The following tips can help improve output quality:

Manually verify point cloud classification in problem regions
Apply multiple filtering methods and compare outputs
Use breakline-preserving interpolation near terrain discontinuities
Iteratively refine uncertain areas marked by higher residual errors
Maintain native resolution rather than resampling if possible
Assess DEM derivatives like slope and aspects for artifacts
Use smoothness metrics to quantify surface continuity

No universal best practice exists for DEM extraction given variation across landscapes, data formats and intended usage. However evaluating the sensitivity of the final DEM to different parameterizations and visually inspecting for artifacts can help maximize accuracy and fitness to requirements.