Integrating Diverse Data Sources for Richer GIS Analysis
Gathering Data from Multiple Sources
The first step in enriching GIS analysis is identifying and gathering relevant data from a variety of sources. Government agencies at local, regional, and national levels often provide authoritative spatial data relevant to GIS users. Open data portals have proliferated in recent years, offering freely available datasets for download. Commercial data providers can fill gaps with high-quality and specialty data products. Key data types to collect include:
- Vector data: Point, line, and polygon vector files representing features like buildings, roads, boundaries, infrastructure
- Raster data: Gridded imagery, digital elevation models, and other analysis layers
- Tabular data: Spreadsheets and database tables containing attributes to link with spatial features
Carefully assess each identified dataset for accuracy, resolution, completeness, licensing terms and other attributes to determine its fitness for the GIS use case.
Preprocessing and Converting Data
With data gathered from government portals, OpenStreetMap (OSM) contributions, proprietary feeds, or scanned analog maps, preprocessing gives the layers a common format and structure as they are brought together. Tasks include:
- Geocoding and spatial alignment: Convert address-based records to coordinates where needed, and confirm that all data layers share a common coordinate system so they align on the map
- Reprojection: Convert raster and vector data to a shared projected coordinate system suited to the region of interest
- Format conversion: Translate data between file types such as Esri Shapefile, GeoJSON, KML, and File Geodatabase (FileGDB)
- Table joining: Link attributes from tabular data sources like CSVs and spreadsheets to spatial features based on shared identifiers
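The reprojection task above can be illustrated with a minimal sketch that converts WGS 84 longitude/latitude to Web Mercator (EPSG:3857) using the standard spherical formula. Web Mercator is chosen here only because its forward formula is short; it is not a region-optimized projection, and a real project would more likely rely on a projection library such as pyproj with a CRS suited to the study area. The function name and sample coordinate are assumptions made for illustration.

```python
import math

# Sphere radius used by Web Mercator (equal to the WGS 84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS 84 lon/lat in degrees to Web Mercator (EPSG:3857) metres."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# Illustrative point only; any lon/lat pair within about +/-85 degrees latitude works.
x, y = lonlat_to_web_mercator(-122.4194, 37.7749)
print(round(x, 1), round(y, 1))
```

Once every layer passes through the same transformation, coordinates from different sources can be compared and overlaid directly.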
Proper preprocessing allows the constructed geospatial database to serve as an accurate and versatile foundation for GIS integration and analysis tasks further downstream.
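As a rough sketch of the table-joining task, the following example attaches CSV attributes to GeoJSON features through a shared identifier, using only the Python standard library. The `parcel_id` key, the column names, and the sample records are all hypothetical; in practice the inputs would be read from files rather than inline strings.

```python
import csv
import io
import json

# Hypothetical inputs: two point features and an attribute table keyed by parcel_id.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"parcel_id": "P-001"},
   "geometry": {"type": "Point", "coordinates": [-93.26, 44.98]}},
  {"type": "Feature", "properties": {"parcel_id": "P-002"},
   "geometry": {"type": "Point", "coordinates": [-93.27, 44.97]}}
]}
"""
csv_text = (
    "parcel_id,zoning,assessed_value\n"
    "P-001,residential,250000\n"
    "P-003,commercial,900000\n"
)


def join_attributes(feature_collection, rows, key):
    """Left-join table rows onto feature properties; return keys with no match."""
    lookup = {row[key]: row for row in rows}
    unmatched = []
    for feature in feature_collection["features"]:
        props = feature["properties"]
        match = lookup.get(props.get(key))
        if match is None:
            unmatched.append(props.get(key))
            continue
        for column, value in match.items():
            if column != key:
                props[column] = value  # CSV values arrive as strings
    return unmatched


features = json.loads(geojson_text)
rows = list(csv.DictReader(io.StringIO(csv_text)))
unmatched = join_attributes(features, rows, "parcel_id")
print(features["features"][0]["properties"])
print("Features without a table match:", unmatched)
```

Reporting unmatched keys is a deliberate choice: mismatched identifiers are a common cause of silently incomplete joins, so it helps to surface them rather than drop them.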
Enriching GIS Capabilities
A key motivation for integrating varied datasets into a GIS is expanding analysis possibilities beyond what isolated data layers can support. New capacities arise in visualization, exploratory analysis, and modeling workflows.
High-resolution imagery, digital surface models, and terrain data enrich mapping capabilities with oblique views, immersive flythroughs over 3D cityscapes, dynamic hillshading, and measurements that 2D data cannot provide. Tabular data expands the attribute information linked to vector features, powering data visualization via choropleth mapping, graduated symbols on points, and other techniques that reveal spatial patterns.
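Choropleth mapping depends on grouping attribute values into classes before colors are assigned. The sketch below shows one simple scheme, equal-interval classification; the tract IDs and income figures are invented for illustration, and other schemes such as quantiles or natural breaks may suit skewed data better.

```python
import bisect


def equal_interval_breaks(values, n_classes):
    """Return the n_classes - 1 interior break values of an equal-interval scheme."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]


def classify(value, breaks):
    """Return the 0-based class index; a value equal to a break falls in the lower class."""
    return bisect.bisect_left(breaks, value)


# Hypothetical median household income per census tract.
income_by_tract = {"A": 20000, "B": 35000, "C": 50000, "D": 80000}

breaks = equal_interval_breaks(list(income_by_tract.values()), 3)
classes = {tract: classify(v, breaks) for tract, v in income_by_tract.items()}
print(breaks)
print(classes)
```

Each class index would then map to one color in a sequential palette when the tracts are drawn.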
Real-time sensor streams such as traffic counters, air quality monitors, and drone telemetry can animate maps to convey how phenomena change over time. Combining census statistics, business listings, infrastructure inventories, lidar point clouds, and other point, line, and polygon layers can also expose correlations that are difficult or impossible to detect from any single source.
Advanced Analysis Across Integrated Data
Once the relevant data layers have been brought together, analysis techniques can synthesize the combined information to yield insights that individual datasets cannot provide on their own. Common approaches include:
- Suitability modeling: Combining terrain, land cover, infrastructure, ownership, and hazard data to determine optimal locations for siting new facilities
- Cluster analysis: Identifying statistically significant hot spots and spatial outliers by combining geo-located incident data with demographic attributes
- Change detection: Discerning subtle shifts in land cover over time by comparing older and newer classified imagery
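The change detection approach can be sketched as a cell-by-cell comparison of two classified rasters. The example assumes both grids are already co-registered with identical dimensions, and the class codes (1 = forest, 2 = cropland, 3 = water, 4 = built-up) are hypothetical.

```python
from collections import Counter

# Hypothetical classified rasters for the same area at two dates.
older = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]
newer = [
    [1, 4, 2],
    [1, 2, 4],
    [3, 3, 2],
]


def detect_change(before, after):
    """Return a boolean change mask and a count of (from_class, to_class) transitions."""
    mask = []
    transitions = Counter()
    for row_before, row_after in zip(before, after):
        mask_row = []
        for b, a in zip(row_before, row_after):
            changed = b != a
            mask_row.append(changed)
            if changed:
                transitions[(b, a)] += 1
        mask.append(mask_row)
    return mask, transitions


mask, transitions = detect_change(older, newer)
changed_cells = sum(cell for row in mask for cell in row)
print("Changed cells:", changed_cells)
print("Transitions:", dict(transitions))
```

The transition counts indicate not only where change occurred but what kind, for example forest converting to built-up land.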
Many tools and algorithms underpin advanced spatial analysis workflows, from aggregating, weighting, and contouring data variables to delineating catchment zones across flow-direction rasters.
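As one illustration of the weighting step in suitability modeling, the sketch below performs a simple weighted overlay on small grids. It assumes each input layer has already been rescaled to a 0 to 1 suitability score on a shared grid; the layer names, scores, and weights are invented for the example rather than drawn from any particular study.

```python
# Hypothetical 2x2 suitability layers, each scored from 0 (poor) to 1 (ideal).
layers = {
    "slope": [[1.0, 1.0], [0.0, 0.5]],
    "land_cover": [[1.0, 0.5], [0.5, 0.0]],
    "road_access": [[0.0, 1.0], [1.0, 1.0]],
}
# Assumed relative importance of each factor.
weights = {"slope": 0.5, "land_cover": 0.25, "road_access": 0.25}


def weighted_overlay(layers, weights):
    """Combine co-registered score grids into one weighted-average suitability grid."""
    total_weight = sum(weights.values())
    names = list(layers)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    return [
        [
            sum(layers[name][r][c] * weights[name] for name in names) / total_weight
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


suitability = weighted_overlay(layers, weights)
# Pick the (score, row, col) of the highest-scoring cell.
best_score, best_row, best_col = max(
    (score, r, c)
    for r, row in enumerate(suitability)
    for c, score in enumerate(row)
)
print(suitability)
print("Best cell:", (best_row, best_col), "score:", best_score)
```

Hard constraints such as hazard zones or restricted ownership are often applied separately as a mask that excludes cells outright, rather than as one more weighted factor.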
Example Workflow: OpenStreetMap and Census Data
To combine two widely used, freely available data sources and illustrate the expanded analytic capabilities, we walk through an example using:
- Building footprints and road vectors extracted from OpenStreetMap (OSM), the crowdsourced mapping project
- Socioeconomic indicators joined from the American Community Survey (ACS) published by the U.S. Census Bureau
Analysts can download OSM building and road data for an area of interest, then preprocess the vectors to tidy anomalies, enforce topology rules, and improve attribution. Note that OSM data is distributed under the Open Database License (ODbL), which carries attribution and share-alike requirements. Census boundary files connect geographic units such as block groups and tracts to ACS tables detailing population, housing, income, education, and other statistics.
Joining the OSM infrastructure vectors to ACS records based on containment or intersection enables fresh analytic angles. Cluster analysis might reveal where highway proximity and poverty indicators coincide, and suitability models can suggest locations for new parks based on open land parcels in high-density areas that currently lack amenities.
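The containment-based join can be sketched with a ray-casting point-in-polygon test that assigns building centroids to tracts. The tract IDs, rectangles, income values, and centroids below are hypothetical stand-ins for real OSM and ACS data, and the coordinates are assumed to be planar values in one shared projected coordinate system. A production workflow would more likely use a spatial library with a spatial index instead of this brute-force loop.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical tracts with an ACS-style attribute already attached.
tracts = {
    "T1": {"ring": [(0, 0), (10, 0), (10, 10), (0, 10)], "median_income": 32000},
    "T2": {"ring": [(10, 0), (20, 0), (20, 10), (10, 10)], "median_income": 78000},
}
# Hypothetical building centroids derived from OSM footprints.
building_centroids = [(2, 3), (5, 5), (12, 4), (25, 5)]


def count_points_per_polygon(points, polygons):
    """Count points in each polygon; also return how many fell in none."""
    counts = {pid: 0 for pid in polygons}
    unmatched = 0
    for x, y in points:
        for pid, poly in polygons.items():
            if point_in_polygon(x, y, poly["ring"]):
                counts[pid] += 1
                break
        else:
            unmatched += 1
    return counts, unmatched


counts, unmatched = count_points_per_polygon(building_centroids, tracts)
for tract_id, n_buildings in counts.items():
    print(tract_id, n_buildings, tracts[tract_id]["median_income"])
print("Buildings outside all tracts:", unmatched)
```

With building counts and ACS attributes sitting on the same tract records, measures such as buildings per tract can be compared against income or other indicators.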