Summing Point Attributes By Polygon In Python GIS

Loading Vector Datasets

To sum point attributes by polygon in Python, we first need to load vector datasets into GeoPandas data frames. The main vector data types used in geospatial analysis are points, lines, and polygons. Points represent discrete locations, lines represent linear features, and polygons represent enclosed areas.

We will load a point layer containing numeric attributes we wish to summarize, as well as a polygon layer we will use to group the points. The polygon layer defines the geographic areas of interest we will sum the point values within.

Common GIS vector data sources supported by GeoPandas include shapefiles, GeoJSON files, PostGIS databases, and file-based spatial databases like GeoPackage. Here is an example that loads two shapefiles with the GeoPandas read_file function:


import geopandas as gpd

points_df = gpd.read_file("points.shp")
polys_df = gpd.read_file("polys.shp")

The points layer contains geometric point objects representing locations, along with associated numeric attributes we want to sum. The polys layer contains polygon geometries defining regions we will group and summarize the point values within.
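To follow along without shapefiles on disk, the two layers can be mocked up in memory. The sketch below is only a stand-in: the column names poly_id and numeric_attribute, the coordinates, and the EPSG:3857 CRS are all made up for illustration. It also reprojects the points when the two layers disagree on CRS, since a spatial join compares coordinates as they are stored.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical square regions sharing an edge at x = 10.
polys_df = gpd.GeoDataFrame(
    {"poly_id": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:3857",
)

# Four hypothetical points; the last one lies outside both regions.
points_df = gpd.GeoDataFrame(
    {"numeric_attribute": [5.0, 7.0, 3.0, 100.0]},
    geometry=[Point(2, 2), Point(8, 5), Point(15, 5), Point(50, 50)],
    crs="EPSG:3857",
)

# Reproject the points if the layers use different coordinate reference systems.
if points_df.crs != polys_df.crs:
    points_df = points_df.to_crs(polys_df.crs)
```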

Exploring the datasets loaded into the GeoPandas data frames can give us better insight into the data we are working with.

Exploring Attribute Tables

Once our vector datasets are loaded, we should explore the attribute tables to understand the data structure and determine how to best summarize the values.

The main attributes of interest are the geometries defining locations and boundaries, as well as any numeric attributes tied to the points we want to sum. We can print the output of the head() and describe() methods to get an overview of the columns in the attribute tables:


print(points_df.head())
print(points_df['numeric_attribute'].describe())

print(polys_df.head()) 
print(polys_df.columns)

Key things to check are that the points layer has a numeric field to sum, and the polys layer has a unique ID field identifying each polygon region.
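These two checks can also be automated. The helper below is a hypothetical convenience function, not part of GeoPandas; it assumes the column names are passed in by the caller and simply reports what looks wrong.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box


def check_inputs(points, polys, value_col, id_col):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if value_col not in points.columns:
        problems.append(f"points layer has no column {value_col!r}")
    elif not pd.api.types.is_numeric_dtype(points[value_col]):
        problems.append(f"{value_col!r} is not numeric")
    if id_col not in polys.columns:
        problems.append(f"polygon layer has no column {id_col!r}")
    elif not polys[id_col].is_unique:
        problems.append(f"{id_col!r} contains duplicate IDs")
    return problems


# Toy layers with hypothetical column names; the polygon IDs are deliberately duplicated.
points = gpd.GeoDataFrame(
    {"numeric_attribute": [1.5, 2.5]},
    geometry=[Point(1, 1), Point(2, 2)],
)
polys = gpd.GeoDataFrame(
    {"poly_id": ["A", "A"]},
    geometry=[box(0, 0, 3, 3), box(3, 0, 6, 3)],
)

problems = check_inputs(points, polys, "numeric_attribute", "poly_id")
print(problems)
```

Duplicate polygon IDs are worth catching early, because grouping by a non-unique ID silently merges the totals of different polygons.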

We may also wish to plot the layers to visually inspect them before processing:

  
ax = polys_df.plot(color='white', edgecolor='black')
points_df.plot(ax=ax, markersize=10)

With an understanding of our input data structure, we can now group points within polygon regions and sum values.

Summarizing Attributes by Polygon

Using pandas GroupBy

GeoPandas extends the Pandas library, so we can leverage Pandas for performing grouped summaries by polygon area.

The main approach is:

1. Spatial join points to assign polygon IDs
2. Group points by polygon ID
3. Apply sum() aggregate

Here we perform a spatial join to assign each point the ID of the polygon it falls inside:


# Newer GeoPandas releases use `predicate`; the older `op` keyword is deprecated.
points_df = gpd.sjoin(points_df, polys_df, how="inner", predicate="within")

Next we GroupBy polygon ID and apply the sum() method:

  
sums = points_df.groupby('poly_id')['numeric_attribute'].sum().reset_index()

The result, sums, contains the unique polygon IDs and total summed value of numeric_attribute within each polygon area.
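The three steps can be tried end to end on toy data. This is a minimal sketch rather than a reference implementation; the layers, the poly_id and numeric_attribute column names, and the coordinates are invented for the example.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical polygon layer: two adjacent squares.
polys_df = gpd.GeoDataFrame(
    {"poly_id": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)

# Hypothetical point layer: two points in A, one in B, one outside both.
points_df = gpd.GeoDataFrame(
    {"numeric_attribute": [5.0, 7.0, 3.0, 100.0]},
    geometry=[Point(2, 2), Point(8, 5), Point(15, 5), Point(50, 50)],
)

# 1. Spatial join: each point picks up the poly_id of the polygon containing it.
joined = gpd.sjoin(points_df, polys_df, how="inner", predicate="within")

# 2. and 3. Group by polygon ID and sum the attribute.
sums = joined.groupby("poly_id")["numeric_attribute"].sum().reset_index()
print(sums)
```

The point at (50, 50) falls in neither polygon, so the inner join drops it and its value of 100 appears in no total.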

Applying Aggregate Functions

In addition to sum(), we could apply other aggregate functions like mean(), min(), max(), etc. by polygon:


metrics = points_df.groupby('poly_id')['numeric_attribute'].agg(['sum', 'min', 'max', 'mean'])

This aggregates multiple statistics summarizing the distribution of values within each polygon region.
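When the output columns need descriptive names, pandas named aggregation is one option. The sketch below uses a small plain DataFrame as a stand-in for the joined points; the column names and values are hypothetical.

```python
import pandas as pd

# Stand-in for points that already carry a polygon ID after the spatial join.
joined = pd.DataFrame(
    {
        "poly_id": ["A", "A", "B", "B", "B"],
        "numeric_attribute": [5.0, 7.0, 3.0, 1.0, 8.0],
    }
)

# Each keyword becomes an output column; each value names the aggregate to apply.
metrics = (
    joined.groupby("poly_id")["numeric_attribute"]
    .agg(total="sum", smallest="min", largest="max", average="mean", n_points="count")
    .reset_index()
)
print(metrics)
```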

Handling Null Values

Sometimes point data contains null values we want to account for properly during summation.

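By default, the pandas sum() method skips null (NaN) values, so a null point adds nothing to its polygon's total. Null handling matters more for statistics such as mean() or count(), where skipped rows change the result. We can drop rows with null values using dropna(), or treat them as zero values using fillna(0) on the attribute column:

 
filled = points_df.assign(numeric_attribute=points_df['numeric_attribute'].fillna(0))
sums = filled.groupby('poly_id')['numeric_attribute'].sum().reset_index()

Null points now count as explicit 0 values rather than being skipped. Filling only the attribute column avoids touching the geometry or other fields. Pre-filling leaves the sums unchanged but lowers the mean and raises the count, so it is appropriate only when a missing value really does mean zero.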
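The effect of each choice is easiest to see side by side. The following sketch uses a toy table with hypothetical column names; polygon C contains only a null point.

```python
import numpy as np
import pandas as pd

joined = pd.DataFrame(
    {
        "poly_id": ["A", "A", "B", "C"],
        "numeric_attribute": [4.0, np.nan, 6.0, np.nan],
    }
)
grouped = joined.groupby("poly_id")["numeric_attribute"]

# Default sum skips NaN, so an all-null polygon still reports 0.
default_sum = grouped.sum()

# min_count=1 requires at least one non-null value, so polygon C becomes NaN.
strict_sum = grouped.sum(min_count=1)

# Mean with nulls skipped versus mean with nulls treated as zero.
skipped_mean = grouped.mean()
filled = joined.copy()
filled["numeric_attribute"] = filled["numeric_attribute"].fillna(0)
filled_mean = filled.groupby("poly_id")["numeric_attribute"].mean()

print(pd.DataFrame({
    "default_sum": default_sum,
    "strict_sum": strict_sum,
    "skipped_mean": skipped_mean,
    "filled_mean": filled_mean,
}))
```

Using min_count=1 keeps "no valid data" distinguishable from a true total of zero, which can matter when the sums are mapped later.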

Writing Summary Data to New Layer

Once we have computed the polygon summary aggregates, we can write results to a new output file or database table.

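For example, to write the summed values to a GeoJSON file, first merge them onto the polygon layer. The grouped result is a plain pandas DataFrame with no geometry column and no to_file method, so it cannot be written as a spatial layer on its own:


polys_df.merge(sums, on='poly_id', how='left').to_file("polygon_sums.json", driver="GeoJSON")

The output includes the polygon geometry, the polygon ID, and any summary statistics fields, so it can be mapped directly alongside other geospatial vector layers.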

Other output options include CSV, PostGIS tables, SQLite/SpatiaLite, and GeoPackage (vector format). This facilitates analysis workflows combining Python scripting with additional desktop GIS software.

Visualizing and Validating Results

Joining the summary data back to the original polygons and mapping it provides a visual quality check on the outputs.

For example, plotting sums as choropleths:


# A left merge keeps polygons that contain no points; their sum is NaN.
polys_sums = polys_df.merge(sums, on='poly_id', how='left')
polys_sums.plot(column='numeric_attribute', legend=True)

Thematic mapping techniques can reveal anomalies or exceptions indicating potential data issues or edge cases requiring special handling.

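Comparing totals before and after summarization is another consistency check: the grand total of the per-polygon sums should equal the total of the attribute over all points that fell inside a polygon. A mismatch can indicate points lying outside every polygon, or points counted more than once where polygons overlap. Histograms of the per-polygon sums can also help spot outliers.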
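One way to check the totals numerically is sketched below, assuming toy layers with hypothetical poly_id and numeric_attribute columns. A left join is used here so that points outside every polygon stay visible instead of being dropped.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import Point, box

polys_df = gpd.GeoDataFrame(
    {"poly_id": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)
points_df = gpd.GeoDataFrame(
    {"numeric_attribute": [5.0, 7.0, 3.0, 100.0]},
    geometry=[Point(2, 2), Point(8, 5), Point(15, 5), Point(50, 50)],
)

# Left join: unmatched points are kept with a null poly_id.
joined = gpd.sjoin(points_df, polys_df, how="left", predicate="within")
matched = joined[joined["poly_id"].notna()]
unmatched = joined[joined["poly_id"].isna()]

sums = matched.groupby("poly_id")["numeric_attribute"].sum()

# A point matched by several overlapping polygons shows up as a repeated index.
double_counted = int(joined.index.duplicated().sum())

# Per-polygon totals plus unmatched values should reproduce the raw total.
totals_agree = bool(np.isclose(
    sums.sum() + unmatched["numeric_attribute"].sum(),
    points_df["numeric_attribute"].sum(),
))
print(len(unmatched), double_counted, totals_agree)
```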

Additional Examples and Use Cases

The general Pandas GroupBy approach can be applied across many common point-in-polygon aggregation workflows.

Summarizing Census Data by Districts

Census-derived data is sometimes distributed as dense point layers (for example, block centroids or address points), with demographic attributes tied to each point.

District boundaries (stored as polygons) define reporting units for demographics. By spatially joining census points to their containing districts then deriving aggregates, we can attach summary population statistics to districts for thematic mapping and analysis.

Analyzing Crime Locations by Neighborhood

For spatial analysis of crime patterns, point locations of incidents need to be quantified within regions of interest.

Summarizing crime counts by neighborhood polygons enables crime rate mapping and analysis by region. Spatial hot spot analysis can then highlight areas of high crime density.
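Counting incidents is the same workflow with size() in place of sum(). The sketch below uses invented neighborhoods and incident locations, and the nbhd_id and incident_count column names are hypothetical. Neighborhoods with no incidents are kept and given a count of zero.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Three hypothetical neighborhoods side by side.
neighborhoods = gpd.GeoDataFrame(
    {"nbhd_id": ["N1", "N2", "N3"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10), box(20, 0, 30, 10)],
)

# Three hypothetical incidents: two in N1, one in N2, none in N3.
incidents = gpd.GeoDataFrame(
    {"incident_type": ["theft", "theft", "vandalism"]},
    geometry=[Point(1, 1), Point(4, 6), Point(12, 3)],
)

joined = gpd.sjoin(incidents, neighborhoods, how="inner", predicate="within")

# size() counts rows per group, so no numeric attribute is needed.
counts = joined.groupby("nbhd_id").size().reset_index(name="incident_count")

# Left merge keeps neighborhoods without incidents; fill their count with 0.
result = neighborhoods.merge(counts, on="nbhd_id", how="left")
result["incident_count"] = result["incident_count"].fillna(0).astype(int)
print(result[["nbhd_id", "incident_count"]])
```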

Map Visualization of Real Estate Values

In real estate applications, the locations and sales prices of individual properties (points) can be aggregated to characterize average prices by different neighborhoods, zip codes, or political wards (polygons).

Choropleth mapping of averages provides visualizations for quick property value comparisons across regions.

Field Survey Summaries by Plot

Ecological field data is also commonly captured as point observations with attributes, with sampling plots delineated as polygons.

Grouping field measurements by plot yields summary statistics per plot, which can inform scientific analysis.

The same pandas GroupBy workflow extends across many discipline-specific use cases involving regional summaries of point event or observation data.
