Automating Geographic Data Selection With arcpy.Select_analysis()
The Problem of Manual Data Selection
Manually selecting subsets of features from geographic datasets is tedious and time-consuming. With large, complex datasets containing millions of features, cherry-picking features with the interactive selection tools in ArcGIS Desktop is inefficient at best and often impractical.
The manual approach requires a GIS analyst to inspect the attribute table of a dataset, identify features meeting certain criteria, and select the desired features one by one. This takes a great deal of time and is prone to human error.
Automating data selection and subset generation reduces the risks associated with manual workflows. With arcpy.Select_analysis() and Python scripting, GIS analysts can extract features automatically using SQL where clauses on attributes, while related geoprocessing tools such as Select Layer By Location cover selections by location. This improves efficiency, accuracy, and repeatability.
Understanding arcpy.Select_analysis()
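The arcpy.Select_analysis() geoprocessing function creates a new feature class by extracting features from an existing feature class or layer. It takes three arguments: the input features, the output feature class, and an optional SQL where clause that determines which features are extracted. The function has no spatial parameters, so selections by location are made with a companion tool, Select Layer By Location, whose result can then be copied to a new feature class.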
Selecting features based on attributes
To select features based on attributes, an SQL where clause specifies the field values that selected features must have. Common selection criteria include a parcel’s land use classification, a pipeline’s installation date, or a county’s population.
For example, to select only pipeline features installed after January 1, 2010 from a file geodatabase, the where clause would be: InstallDate > date '2010-01-01'. Date syntax varies by data source, so check the format your database expects. Multiple selection criteria can be combined using the AND and OR logical operators.
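As a minimal sketch of combining criteria, the helper below joins individual conditions with AND or OR before passing the result to arcpy.Select_analysis(). The names combine_clauses and select_features, and the Material field, are hypothetical and used only for illustration; the date literal assumes a file geodatabase.

```python
def combine_clauses(clauses, operator="AND"):
    """Join SQL conditions with AND or OR, wrapping each in parentheses."""
    operator = operator.upper()
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f"({clause})" for clause in clauses)


def select_features(in_features, out_features, where_clause):
    """Extract matching features to a new feature class."""
    import arcpy  # imported here so combine_clauses runs without ArcGIS

    return arcpy.Select_analysis(in_features, out_features, where_clause)


where = combine_clauses([
    "InstallDate > date '2010-01-01'",  # file geodatabase date syntax
    "Material = 'STEEL'",               # hypothetical field and value
])
print(where)
```

Wrapping each condition in parentheses keeps operator precedence explicit when AND and OR groups are nested.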
Selecting features based on location
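Features can also be selected based on their spatial location relative to features in another dataset. arcpy.Select_analysis() does not do this itself; spatial selections are made with arcpy.SelectLayerByLocation_management(), and the selected features can then be written out with arcpy.CopyFeatures_management(). Common spatial selection techniques include selecting features based on: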
- Intersection – Features intersecting another layer
- Containment – Points, lines, or polygons falling inside polygons of another layer
- Proximity – Features within a specified distance of features in another layer
The other layer and any search distance are passed to the tool as parameters.
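The relationships listed above could be wired up roughly as follows. This is a sketch, not a definitive implementation: the OVERLAP_TYPES mapping and the function names are assumptions made for illustration, and the overlap keywords are the ones Select Layer By Location documents for intersection, containment, and distance searches.

```python
# Assumed mapping from the relationships above to overlap_type keywords.
OVERLAP_TYPES = {
    "intersection": "INTERSECT",
    "containment": "WITHIN",
    "proximity": "WITHIN_A_DISTANCE",
}


def resolve_overlap_type(relationship):
    """Translate a relationship name into an overlap_type keyword."""
    try:
        return OVERLAP_TYPES[relationship.lower()]
    except KeyError:
        raise ValueError(f"Unsupported relationship: {relationship!r}")


def select_by_location(in_layer, select_features, relationship,
                       out_features, search_distance=None):
    """Select features by location and copy them to a new feature class."""
    import arcpy  # imported here so the helpers above run without ArcGIS

    selected = arcpy.SelectLayerByLocation_management(
        in_layer,
        resolve_overlap_type(relationship),
        select_features,
        search_distance,  # e.g. "100 Meters"; only meaningful for proximity
    )
    return arcpy.CopyFeatures_management(selected, out_features)


print(resolve_overlap_type("Proximity"))
```

A call such as select_by_location("wells", "pipelines", "proximity", "wells_near_pipelines", "100 Meters") would then extract wells within 100 meters of a pipeline; the layer names here are hypothetical.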
Selecting features using SQL queries
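For more advanced selection logic, the where_clause parameter of arcpy.Select_analysis() accepts the SQL syntax supported by the underlying data source. Only the condition is passed; the SELECT * FROM portion of the statement is implied by the input dataset. Depending on the data source, the clause can use logical operators, wildcards, functions, and subqueries for attribute-based selection.
For example, with a Parcels feature class as input, the where clause:
LandUse IN ('RESIDENTIAL', 'COMMERCIAL') AND Area > 1000
would select all residential and commercial land parcels larger than 1000 square meters, assuming the Area field stores square meters.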
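When the list of accepted values comes from a script rather than being typed by hand, the IN list can be generated. The sketch below uses a hypothetical helper, sql_in_list, that quotes each value and doubles embedded single quotes, which is the standard SQL escape.

```python
def sql_in_list(field, values):
    """Build "field IN ('a', 'b')" with embedded single quotes doubled."""
    quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in values)
    return f"{field} IN ({quoted})"


land_use = sql_in_list("LandUse", ["RESIDENTIAL", "COMMERCIAL"])
where = f"{land_use} AND Area > 1000"
print(where)
```

The resulting string is what would be passed as the third argument to arcpy.Select_analysis().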
Selecting Points Within a Polygon
Sample polygon layer
To demonstrate geographic feature selection, we will use a sample polygon layer containing land parcels and a point layer containing tree specimens. Our goal is to select all trees falling within any of the land parcels.
The polygon layer contains various attributes including a unique ParcelID field, ownership information, land classification codes, and more. For selection purposes we are interested only in the geometry and do not need any specific attributes.
Selecting points based on location
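Because arcpy.Select_analysis() has no spatial parameters, this step uses arcpy.SelectLayerByLocation_management() instead. We specify the input point layer to select features from, set the overlap type to “WITHIN”, and supply the polygon layer as the selection boundary. The selected points are then copied to a new feature class with arcpy.CopyFeatures_management(). In ArcGIS Desktop the input must be a feature layer (for example, one created with arcpy.MakeFeatureLayer_management()) rather than a feature class path.
selected = arcpy.SelectLayerByLocation_management("trees", "WITHIN", "parcels")
arcpy.CopyFeatures_management(selected, "trees_in_parcels")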
This outputs a new point feature class “trees_in_parcels” containing only those tree points falling inside parcel polygons, selecting them based purely on their geographic location.
Verifying selection results
We can visually inspect the output data against the original overall tree point layer to verify that the selection worked as expected. We should see tree points only within parcel boundaries.
Further validation can be done by checking feature counts before and after selection, or using spatial joins and attribute inspection to analyze selected feature characteristics.
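A count comparison could look something like the sketch below. The function names are hypothetical, and the numbers in the example call are illustrative only, not results from the sample data.

```python
def count_features(dataset):
    """Return the number of features in a dataset or layer."""
    import arcpy  # imported here so summarize_selection runs without ArcGIS

    return int(arcpy.GetCount_management(dataset).getOutput(0))


def summarize_selection(total, selected):
    """Compare feature counts before and after a selection."""
    if selected > total:
        raise ValueError("selection cannot contain more features than the input")
    share = selected / total if total else 0.0
    return {
        "total": total,
        "selected": selected,
        "excluded": total - selected,
        "share": share,
    }


# Illustrative counts; in practice use count_features("trees") and
# count_features("trees_in_parcels").
summary = summarize_selection(total=1200, selected=450)
print(summary)
```

A selected count of zero, or one equal to the total, is usually worth a second look, since it often points to a mismatched coordinate system or a wrong overlap type.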
Selecting Parcels By Area
Sample parcel layer
For the next example, we will work with the same polygon parcel layer as before. Our goal now is to automatically select only parcels greater than 1 acre in area.
In addition to basic attributes, this layer includes an “Area” field containing the area of each land parcel polygon in square meters. We can use this field to drive an attribute-based selection.
Using SQL query to select by area
The key to attribute-driven selection is forming the proper SQL where clause. In this case, the clause compares the “Area” field against the number of square meters in 1 acre (about 4046.86, rounded down to 4046 here).
import arcpy
# "Area" is assumed to hold square meters
query = '"Area" > 4046'
arcpy.Select_analysis("parcels", "parcels_over_acre", query)
This extracts all parcels greater than 1 acre in size to a new selected dataset, by querying the “Area” attribute field using standard SQL syntax.
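The threshold can also be computed instead of hard-coded, which makes it easy to change the cutoff. The sketch below assumes the “Area” field stores square meters and uses the international acre (4046.8564224 square meters), so it yields a slightly more precise value than the rounded 4046 above. The helper name area_where_clause is hypothetical.

```python
SQ_METERS_PER_ACRE = 4046.8564224  # international acre


def area_where_clause(min_acres, field="Area"):
    """Build a where clause selecting features larger than min_acres."""
    threshold = min_acres * SQ_METERS_PER_ACRE
    return f'"{field}" > {threshold:.2f}'


print(area_where_clause(1))
print(area_where_clause(2.5))
```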
Verifying selection results
We can now visually confirm that only the larger parcels were selected, or compare feature counts before and after to validate. An additional best practice is to spot-check attributes to confirm that the selected features meet the selection criteria.
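The attribute spot check can itself be scripted. In this sketch, read_areas reads object IDs and areas with an arcpy.da.SearchCursor, and find_violations reports any row that fails the criterion; both function names are hypothetical, and the sample rows are made-up values used only to show the check.

```python
def find_violations(rows, min_area):
    """Return IDs of rows whose area is missing or not above min_area.

    rows is an iterable of (object_id, area) pairs.
    """
    return [oid for oid, area in rows if area is None or area <= min_area]


def read_areas(dataset, area_field="Area"):
    """Yield (object_id, area) pairs from a feature class."""
    import arcpy  # imported here so find_violations runs without ArcGIS

    with arcpy.da.SearchCursor(dataset, ["OID@", area_field]) as cursor:
        for oid, area in cursor:
            yield oid, area


# Made-up rows for illustration; in practice pass
# read_areas("parcels_over_acre").
sample_rows = [(1, 5200.0), (2, 4046.0), (3, None), (4, 9100.5)]
print(find_violations(sample_rows, 4046))
```

An empty list means every checked feature satisfies the selection criterion.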
Automating Workflows with arcpy.Select_analysis()
Running each selection by hand would give up most of these efficiency gains. To fully automate workflows, parameterize selection processes into reusable script tools and models.
Creating selection model scripts
ModelBuilder is a common way to construct automated workflows in ArcGIS without writing code. Python scripting can also encapsulate and reuse selection logic. For example, the parcel area selection process could become a Python function:
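import arcpy

def selectLargeParcels(parcelLayer):
    """Extract parcels larger than about 1 acre and return the output name."""
    outLayer = "large_parcels"
    # "Area" is assumed to hold square meters; 1 acre is about 4046.86
    query = '"Area" > 4046'
    arcpy.Select_analysis(parcelLayer, outLayer, query)
    return outLayer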
This packages the selection logic as a reusable function that can be called with any valid input parcel layer.
Scheduling and executing scripts
Using Windows Task Scheduler or cron jobs, scripts can be triggered to run automatically at specified times. For example, the large parcel selection script could run every Sunday night to prepare data for weekly analysis.
Scripts can also be executed directly through Python IDEs and script tools within ArcGIS Pro. Batch files or shell scripts provide another option for script invocation.
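For a scheduler or batch file to run the selection, the script needs a command-line entry point. The sketch below is one possible layout using argparse; the argument names and the --min-area option are assumptions, not part of any ArcGIS convention. Setting arcpy.env.overwriteOutput lets a recurring job replace the previous run's output.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments for the selection script."""
    parser = argparse.ArgumentParser(
        description="Extract parcels above a minimum area.")
    parser.add_argument("in_features", help="input parcel feature class")
    parser.add_argument("out_features", help="output feature class")
    parser.add_argument("--min-area", type=float, default=4046.0,
                        help="minimum area in square meters")
    return parser.parse_args(argv)


def build_where(min_area, field="Area"):
    """Build the where clause for the area threshold."""
    return f'"{field}" > {min_area}'


def main(argv=None):
    args = parse_args(argv)
    import arcpy  # imported here so argument parsing works without ArcGIS

    arcpy.env.overwriteOutput = True  # allow scheduled reruns to overwrite
    arcpy.Select_analysis(args.in_features, args.out_features,
                          build_where(args.min_area))


# In a standalone script, finish with:
#     if __name__ == "__main__":
#         main()
```

The scheduler entry would then invoke the Python interpreter that ships with ArcGIS, followed by the script path and its arguments.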
Next Steps After Selection
The arcpy.Select_analysis() function provides an automated avenue for extracting precise subsets of features from larger datasets. This opens up many possibilities for post-processing of selected data.
Editing selected features
Because arcpy.Select_analysis() writes the selected features to a new feature class and leaves the source dataset untouched, the extracted subset can undergo targeted editing operations without risk to the original data. Editing workflows are simpler when they focus only on the features requiring updates rather than the entire dataset.
Exporting selection to new dataset
In many cases, the end goal is to export the selected feature subset to a new output dataset for specialized analysis and visualization. Selection eliminates clutter and establishes a clean foundation dataset.
Entirely new workflows can be built around these freshly curated datasets selected and extracted in an automated fashion using arcpy.Select_analysis().