Optimizing LAS File Sizes for Efficient LiDAR Data Processing
Understanding LAS File Sizes
Light Detection and Ranging (LiDAR) data is often stored in the LAS file format. LAS files contain dense 3D point clouds that represent terrain and landscape features scanned by aerial LiDAR systems. As high-resolution LiDAR becomes more prevalent, LAS file sizes have increased dramatically into the multi-gigabyte range.
Managing extremely large LAS datasets can become problematic for processing workflows. Larger file sizes lead to storage capacity issues and slow data transfer speeds that hinder efficiency. Strategies for reducing LAS file sizes are needed to streamline LiDAR data handling.
Factors Impacting File Size
The primary drivers of LAS file size are:
- Point cloud density – higher resolution scans produce more points per unit area.
- Coverage area – larger areas scanned result in more total points.
- Data attributes – additional information beyond 3D coordinates increases per-point bytes.
- Precision level – higher precision coordinates use more bytes to store.
These parameters are determined during the LiDAR data collection process. Post-processing provides opportunities to optimize file sizes for analysis purposes.
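As a back-of-the-envelope illustration of how these parameters interact, the short calculation below estimates raw file size from assumed acquisition values (8 points/m², 50 km² of coverage, and the 28-byte record of LAS point format 1):

density = 8                  # points per square meter (assumed scan density)
area = 50 * 1_000_000        # 50 square kilometers of coverage, in m^2
point_size = 28              # bytes per point record (LAS point format 1)

n_points = density * area
size_gb = n_points * point_size / 1024**3
print(f"{n_points:,} points -> ~{size_gb:.1f} GB before header and VLRs")

Even these modest acquisition parameters yield roughly 10 GB of point records, well into the multi-gigabyte range noted above.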
Managing Large Datasets
When working with massive multi-gigabyte LAS files, data management becomes a major hurdle. Reading, transferring, and processing such unwieldy datasets strains computer hardware and bogs down workflows.
Strategies like tiling large LAS files into smaller sub-area chunks, filtering to reduced point sets, converting to the compressed LAZ format, and distributed processing allow more tractable analysis of big LiDAR data.
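As a sketch of the tiling strategy, the pylas snippet below splits one file into 1 km square tiles. The input name and tile size are illustrative; it loads the entire file into memory and assumes the input uses pylas's default coordinate scales and offsets, whereas a production tiler would stream chunks and copy the full header.

import numpy as np
import pylas

las = pylas.read('big_area.las')   # hypothetical multi-gigabyte input
tile_size = 1000.0                 # tile edge length in meters (1 km)

# Assign each point an integer (row, col) tile index from its coordinates
cols = np.floor((las.x - las.x.min()) / tile_size).astype(int)
rows = np.floor((las.y - las.y.min()) / tile_size).astype(int)

for r, c in set(zip(rows, cols)):
    # Copy this tile's points into a new file with the same point format
    sub = pylas.create(point_format_id=las.header.point_format_id)
    sub.points = las.points[(rows == r) & (cols == c)]
    sub.write(f'tile_{r}_{c}.las')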
Strategies for Reducing File Sizes
Post-processing workflows can apply various methods to slim down bloated LAS files to more compact and nimble formats optimized for particular analysis needs.
Filtering Data By Classification
The LAS specification stores a classification code with each point identifying the surface it represents, such as ground, vegetation, or buildings. Keeping only ground and building points provides the essential terrain structure needed for many applications while discarding vegetation and noise.
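A minimal filtering sketch with pylas, following the filtering pattern from its documentation (file names are placeholders):

import pylas

las = pylas.read('full_scan.las')

# ASPRS classification codes: 2 = ground, 6 = building
keep = (las.classification == 2) | (las.classification == 6)

out = pylas.create(point_format_id=las.header.point_format_id,
                   file_version=las.header.version)
out.points = las.points[keep]
out.write('ground_buildings.las')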
Thinning Dense Point Clouds
LAS post-processing can thin excessively dense scans down to a more manageable point spacing that is still sufficient for the target analysis. Spatial binning, grid sampling, averaging, and interpolation techniques reduce point totals with minimal information loss.
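pylas has no built-in thinning function, so the sketch below uses simple NumPy grid sampling, keeping one point per grid cell (the 5 meter target spacing and file names are assumed):

import numpy as np
import pylas

las = pylas.read('dense.las')
spacing = 5.0   # target nominal point spacing in meters

# Snap each point to a grid cell, then keep the first point per occupied cell
cells = np.column_stack((las.x // spacing, las.y // spacing))
_, keep = np.unique(cells, axis=0, return_index=True)
las.points = las.points[keep]
las.write('thinned.las')

Averaging the points within each cell instead of keeping the first would trade a little more computation for smoother output.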
Converting to Optimized LAZ Format
The LAZ format applies lossless, ZIP-like compression to LAS files, shrinking them substantially while retaining all original points and attributes. LAZ compression ratios of 5-10x or more cut storage needs and transfer times.
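With pylas the conversion is a short round trip; writing to a .laz path compresses the output, assuming a LAZ backend such as lazperf is installed (file names are placeholders):

import pylas

las = pylas.read('scan.las')
las.write('scan.laz')   # the .laz extension triggers LAZ compression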
Optimizing Workflows
With strategies to condense bulky LAS inputs in hand, tackling big LiDAR data comes down to designing processing chains that balance speed, accuracy, and efficiency.
Benchmarking Hardware Capabilities
A realistic assessment of available system resources guides decisions on how much processing is feasible. Memory, storage, CPU core count, and GPU acceleration are the key hardware factors dictating workflow design.
Balancing Processing Speed and Accuracy
Precision requirements vary by application – dense, highly precise scans are not always needed. Filtering and thinning do entail some accuracy loss (LAZ compression itself is lossless), so the tolerable loss must be weighed against the need for responsiveness.
Automating Processing Chains
Manual LAS post-processing and analysis rapidly becomes infeasible for massive datasets. Scripting processing chains removes manual-effort bottlenecks and makes leveraging distributed cloud infrastructure practical.
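A sketch of a scripted chain over a directory of tiles, parallelized with Python's standard library (the tiles/ directory and the filter inside process_tile are placeholders for whatever steps a project needs):

from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

import pylas

def process_tile(path):
    las = pylas.read(str(path))
    # Keep ground and building returns, then write a compressed copy
    keep = (las.classification == 2) | (las.classification == 6)
    las.points = las.points[keep]
    out = path.with_suffix('.laz')
    las.write(str(out))
    return out.name

if __name__ == '__main__':
    tiles = sorted(Path('tiles').glob('*.las'))
    with ProcessPoolExecutor() as pool:
        for name in pool.map(process_tile, tiles):
            print('finished', name)

The same per-tile function can later be dispatched to cloud workers instead of local processes, with only the executor swapped out.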
Example Python Code
Python has become a popular language for LiDAR processing, with packages like pylas (since merged into laspy 2.x) providing LAS I/O and processing functions. The example below chains the steps above: it loads a LAS file, keeps only ground and building classifications, thins to a nominal 5 meter post spacing by grid sampling, and writes the slimmed output as compressed LAZ.
import numpy as np
import pylas

# Load the dense input scan
las = pylas.read('dense_scan.las')

# Keep only ground (class 2) and building (class 6) points
keep = (las.classification == 2) | (las.classification == 6)
las.points = las.points[keep]

# Thin to ~5 m nominal spacing by keeping one point per 5 m grid cell
cells = np.column_stack((las.x // 5.0, las.y // 5.0))
_, idx = np.unique(cells, axis=0, return_index=True)
las.points = las.points[idx]

# The .laz extension triggers LAZ compression on write
las.write('thinned_ground_buildings.laz')
Achieving Efficient LiDAR Data Handling
By following best practices for LAS storage, transfer, processing, and analysis workflows, even very large raw LiDAR data can be transformed into tractable formats optimized for user needs.
Recommended Specs for Working with Large Datasets
To handle heavy LAS/LAZ processing most effectively, high-core-count CPUs, ample RAM, fast SSD storage, and GPU accelerators are recommended. Cloud virtual machines give flexible access to suitable hardware capacity.
Cloud Processing Options
Public clouds like AWS, Azure, and GCP provide on-demand access to beefy hardware well suited for big data workflows. Orchestration platforms like GeoTorch facilitate distributed processing of massively parallel jobs on elastic cloud infrastructure.
Future Directions for Compression Algorithms
Ongoing LiDAR research seeks better compression via machine learning approaches such as neural-network autoencoders, providing avenues to further improve handling of bulky point clouds through smarter slimming.