Integrating Vector and Raster Datasets with Python and GDAL
Reading in Raster and Vector Files
The Geospatial Data Abstraction Library (GDAL) provides capabilities for working with both raster and vector spatial data formats in Python. To read a raster dataset into a Python script, the gdal.Open() function can be used, which takes the filesystem path of the raster file and returns a gdal.Dataset object representing the file contents and metadata.
For example:
    from osgeo import gdal
    raster_ds = gdal.Open('image.tif')
Similarly, OGR provides vector data access capabilities. The ogr.Open() method opens a vector file like a shapefile or GeoJSON data source and returns an ogr.DataSource object encapsulating the file’s layers and features.
    from osgeo import ogr
    vector_ds = ogr.Open('countries.shp')
Once opened, raster bands and metadata can be accessed through the Dataset object, while vector layers and features are available through the DataSource. This provides a consistent API for working with both data types in Python scripts.
Converting Between Raster and Vector
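GDAL provides the gdal.Polygonize() function for converting a raster band into a vector representation. It traces contiguous groups of pixels that share a value and writes a polygon enclosing each group into an existing OGR layer, optionally storing the pixel value in an attribute field. For example:
    from osgeo import gdal, ogr
    raster_ds = gdal.Open('land_cover.tif')
    # Polygonize() writes into an existing layer; 'land_cover.shp' is a hypothetical output name
    vector_ds = ogr.GetDriverByName('ESRI Shapefile').CreateDataSource('land_cover.shp')
    layer = vector_ds.CreateLayer('land_cover', geom_type=ogr.wkbPolygon)
    layer.CreateField(ogr.FieldDefn('class', ogr.OFTInteger))
    gdal.Polygonize(raster_ds.GetRasterBand(1), None, layer, 0)  # 0 = index of the 'class' field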
Here a thematic land cover raster is converted to a vector dataset with polygons delineating each distinct land cover type’s spatial extent. The input raster band defines the pixel values used in polygon tracing.
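Converting in the other direction, from vector to raster, is achieved with the gdal.Rasterize() function. This burns vector geometries into a target raster grid, assigning cell values from an attribute field or a constant value.
    from osgeo import gdal
    # gdal.Rasterize() reads from a vector dataset rather than a bare geometry, so this
    # assumes the rectangle POLYGON ((10 10, 10 20, 20 20, 20 10, 10 10)) is stored in
    # a file named 'rectangle.geojson' (a hypothetical name)
    output_raster = gdal.Rasterize('output.tif', 'rectangle.geojson', burnValues=[100], xRes=1, yRes=1)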
In this case a polygon rectangle is rasterized into a new GTiff output_raster, with all pixels covered by the rectangle assigned a constant value of 100.
Spatial Joins
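Attributes can be integrated across layers through spatial joins, which attach attributes from one dataset to another based on a spatial predicate such as intersection. OGR has no single join call in its Python API; a vector-to-vector join can be built from spatial filters and geometry predicates such as Geometry.Intersects(), or expressed with ST_Intersects through OGR's SQLite SQL dialect when GDAL is built with SpatiaLite support.
For example, joining census block demographic data to polygon school district boundaries:
    from osgeo import ogr
    census_ds = ogr.Open('census_blocks.shp')
    districts_ds = ogr.Open('school_districts.shp')
    census_layer = census_ds.GetLayer()
    for district in districts_ds.GetLayer():
        geom = district.GetGeometryRef()
        census_layer.SetSpatialFilter(geom)  # fast pre-filter on candidate blocks
        blocks = [b for b in census_layer if b.GetGeometryRef().Intersects(geom)]
        # the attributes of 'blocks' can now be aggregated for this district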
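Spatial joins can also integrate raster values with vector attributes. One approach converts each feature's coordinates to pixel indices using the raster's geotransform and reads the band value at that location, storing it as an attribute on the original vector feature. For polygons, gdal.RasterizeLayer() can burn the features into a grid aligned with the raster so that pixel values can be summarized per feature.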
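One way to transfer pixel values onto point features is to convert map coordinates to row and column indices with the raster's geotransform. The sketch below shows that arithmetic in plain Python; the helper names world_to_pixel and sample, the grid, and the points are all hypothetical. With GDAL, the geotransform would come from Dataset.GetGeoTransform() and the grid from a band read.

```python
import math


def world_to_pixel(geotransform, x, y):
    """Convert map coordinates to (column, row) for a raster without rotation."""
    origin_x, pixel_width, x_rotation, origin_y, y_rotation, pixel_height = geotransform
    if x_rotation != 0 or y_rotation != 0:
        raise ValueError('this sketch only handles rasters without rotation')
    column = math.floor((x - origin_x) / pixel_width)
    row = math.floor((y - origin_y) / pixel_height)  # pixel_height is negative for north-up rasters
    return column, row


def sample(grid, geotransform, x, y):
    """Return the grid value under (x, y), or None when the point is outside the grid."""
    column, row = world_to_pixel(geotransform, x, y)
    if 0 <= row < len(grid) and 0 <= column < len(grid[0]):
        return grid[row][column]
    return None


# GDAL-style geotransform: top-left corner at (100, 200), 10x10 map-unit cells
geotransform = (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Hypothetical point features keyed by name
points = {'a': (125.0, 185.0), 'b': (101.0, 199.0), 'c': (500.0, 500.0)}
sampled = {name: sample(grid, geotransform, x, y) for name, (x, y) in points.items()}
# sampled == {'a': 7, 'b': 1, 'c': None}
```

Returning None for points outside the grid keeps missing samples distinct from valid pixel values such as 0.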
Intersecting Geometries
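Finding the intersection between vector-vector or vector-raster datasets produces an integrated output representing the area the inputs have in common. OGR exposes this through the Geometry.Intersection() method for individual geometries and the Layer.Intersection() method for whole layers, which writes its output into an existing result layer.
For example, clipping country boundaries by a continent polygon:
    from osgeo import ogr
    countries_ds = ogr.Open('ne_10m_admin_0_countries.shp')
    continent_ds = ogr.Open('continent_bounds.shp')
    # Layer.Intersection() needs a result layer; 'result.shp' is a hypothetical output name
    result_ds = ogr.GetDriverByName('ESRI Shapefile').CreateDataSource('result.shp')
    result_layer = result_ds.CreateLayer('result', geom_type=ogr.wkbMultiPolygon)
    countries_ds.GetLayer().Intersection(continent_ds.GetLayer(), result_layer)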
The resulting layer contains only the parts of the country geometries that overlap the input continent polygon. This can isolate shared regions between feature classes.
Similarly, intersecting a vector overlay with a raster dataset extracts only the pixel regions where both overlap. Using gdal.RasterizeLayer() to convert the vector to a 0/1 mask on the raster's grid, then multiplying the mask with the raster array, implements this spatial intersection.
Clipping Rasters
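In addition to intersecting geometries, clipping one dataset by another's extent focuses the analysis on just the overlapping regions. For rasters, gdal.Warp() can clip to a polygon cutline and, with cropToCutline=True, crop the output extent to the cutline's bounding box.
    from osgeo import gdal
    raster_ds = gdal.Open('image.tif')
    # cutlineDSName expects the path of a vector dataset holding the clip polygon,
    # not a geometry object; 'clip.shp' is a hypothetical file name
    output_ds = gdal.Warp('output.tif', raster_ds, cutlineDSName='clip.shp', cropToCutline=True)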
The output raster contains only pixels within the clipping polygon, providing an easy way to extract a spatial subset of a larger raster.
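Vector layers can also be clipped. The Layer.Clip() method clips an input layer's features to the area covered by a second layer and writes them to a result layer. For a single clip polygon, each feature geometry can instead be intersected with the polygon using Geometry.Intersection().
    from osgeo import ogr
    vector_ds = ogr.Open('countries.shp')
    clip_polygon = ogr.CreateGeometryFromWkt('POLYGON(...)')
    # Intersect every feature geometry with the clip polygon and keep the non-empty pieces
    pieces = [f.GetGeometryRef().Intersection(clip_polygon) for f in vector_ds.GetLayer()]
    clipped = [geom for geom in pieces if geom is not None and not geom.IsEmpty()]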
Masking and Extracting Pixels
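In addition to clipping full rasters, pixel-level masks can selectively extract the pixels that fall under vector geometries while ignoring other regions. gdal.RasterizeLayer() renders vector features into a mask raster aligned to the target raster grid.
    from osgeo import gdal, ogr
    raster_ds = gdal.Open('image.tif')
    vector_ds = ogr.Open('polygons.shp')
    # Create an empty mask on the raster's grid, then burn 1 wherever a polygon covers a pixel
    mask = gdal.GetDriverByName('MEM').Create('', raster_ds.RasterXSize, raster_ds.RasterYSize, 1, gdal.GDT_Byte)
    mask.SetGeoTransform(raster_ds.GetGeoTransform())
    mask.SetProjection(raster_ds.GetProjection())
    gdal.RasterizeLayer(mask, [1], vector_ds.GetLayer(), burn_values=[1])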
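Multiplying the raster array by the mask keeps only the pixels intersecting the vector polygons and sets all other pixels to zero. When zero is a valid data value, assigning the raster's nodata value to the pixels outside the mask is the safer choice. Either way, this achieves a detailed pixel-level mask without modifying the original raster or vector data.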
Inverse masks can also exclude vector polygons, hiding pixels below them while passing through all other values.
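The array arithmetic behind masks and inverse masks can be sketched with NumPy alone. The 3x3 arrays and the nodata value of -9999 below are invented for illustration; in a GDAL workflow the raster and the mask would typically come from ReadAsArray() on the two bands.

```python
import numpy as np

# Hypothetical raster values and a 0/1 mask on the same grid (1 = under a polygon)
raster = np.array([[5, 6, 7],
                   [8, 9, 10],
                   [11, 12, 13]], dtype=np.int16)
mask = np.array([[0, 1, 1],
                 [0, 1, 0],
                 [0, 0, 0]], dtype=np.uint8)
nodata = -9999  # assumed nodata value

# Plain multiplication zeroes everything outside the polygons
multiplied = raster * mask

# Keep pixels under the polygons, assign nodata elsewhere
inside = np.where(mask == 1, raster, nodata)

# Inverse mask: hide pixels under the polygons, pass everything else through
outside = np.where(mask == 0, raster, nodata)
```

Using np.where with an explicit nodata value avoids confusing masked-out pixels with pixels whose real value is 0.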
Reprojecting Datasets
Integrating raster and vector data in the same workflow often requires reprojecting one of them so that both share a coordinate reference system and features line up with pixels. GDAL provides gdal.Warp() for reprojecting rasters and gdal.VectorTranslate() for reprojecting vector datasets; individual geometries can be transformed with osr.CoordinateTransformation.
For example, reprojecting a GeoJSON dataset to match a UTM raster grid:
    from osgeo import gdal
    raster_ds = gdal.Open('image.tif')
    utm_cs = raster_ds.GetProjection()  # the raster's CRS as a WKT string
    # Write a reprojected copy of the vector data; 'data_utm.geojson' is a hypothetical output name
    vector_ds = gdal.VectorTranslate('data_utm.geojson', 'data.geojson', dstSRS=utm_cs)
This approach ensures both datasets share the same projection for integrated computation. Sensor and model output rasters often use projected coordinates, requiring reprojection of vector data to match.
Exporting Integrated Datasets
After combining raster and vector data through operations like clipping, masking, and spatial joins, the integrated output results can be exported to common geospatial file formats for additional analysis and visualization.
For example, using gdal.Translate() to write a raster result as a GeoTIFF file and an OGR driver's CopyDataSource() method to write a vector result as a shapefile:
    from osgeo import gdal, ogr
    # raster_result and vector_result are placeholders for the datasets produced by earlier steps
    out_raster = gdal.Translate('output.tif', raster_result)
    out_vector = ogr.GetDriverByName('ESRI Shapefile').CopyDataSource(vector_result, 'output.shp')
GDAL and OGR support translation to many geospatial file types like GeoJSON, GeoTIFF, shapefile, File Geodatabases, and more. This facilitates saving integrated datasets for additional processing and mapping.