Improving Performance and Scalability of Web-Based GIS Applications
Caching Map Tiles and Datasets for Faster Loading
To improve the performance of web-based geographic information systems (GIS), a common technique is to cache map tiles, vector data, and other spatial datasets in a fast in-memory store such as Redis or Memcached. This avoids loading the same data repeatedly from disk or a database, which reduces latency. Tile caching servers such as GeoWebCache manage caches of pre-rendered map tiles and decide which ones to retain based on usage statistics and eviction policies. For dynamic point data, Redis geospatial indexes can quickly locate features within a radius or bounding box of the current map view. Keeping caches close to the GIS application servers minimizes network round trips to remote storage, and CDNs can distribute cached tiles globally.
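The retention idea can be illustrated with a small in-process cache. The sketch below is a minimal least-recently-used (LRU) tile cache in plain JavaScript. It stands in for a store like Redis rather than reproducing any particular product's behavior, and `TileCache`, `getTileCached` and the `renderTile` callback are hypothetical names introduced for illustration.

```javascript
// Minimal LRU cache for rendered tiles, keyed by "z/x/y".
// A Map iterates in insertion order, so its first key is the least recently used.
class TileCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  static key(z, x, y) {
    return `${z}/${x}/${y}`;
  }

  get(z, x, y) {
    const key = TileCache.key(z, x, y);
    if (!this.entries.has(key)) return undefined;
    // Re-insert the tile to mark it as most recently used.
    const tile = this.entries.get(key);
    this.entries.delete(key);
    this.entries.set(key, tile);
    return tile;
  }

  set(z, x, y, tile) {
    const key = TileCache.key(z, x, y);
    this.entries.delete(key);
    this.entries.set(key, tile);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used tile.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

// Wrap a (hypothetical) renderTile function so repeated requests hit memory.
async function getTileCached(cache, renderTile, z, x, y) {
  const hit = cache.get(z, x, y);
  if (hit !== undefined) return hit;
  const tile = await renderTile(z, x, y);
  cache.set(z, x, y, tile);
  return tile;
}
```

In a multi-server deployment the same get/set logic would normally sit in front of a shared store, so that every application server benefits from tiles rendered by the others.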
Configuring Cache Size, Eviction Policies and Pre-Seeding
When implementing caching, it’s important to size the cache to the working dataset, choose eviction policies that retain the most frequently accessed tiles and data, and pre-seed the cache with known hot data so that the first views do not all miss the cache. Tools such as TileStache, TileCache, GeoWebCache, and Esri’s ArcGIS Server expose configuration for these tile cache parameters. Benchmarking with production-like workloads helps right-size the cache.
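Pre-seeding and sizing both start from knowing which tiles cover the area of interest. The following sketch lists the tiles for a bounding box using the standard Web Mercator XYZ tiling scheme and multiplies the count by an average tile size. The function names are hypothetical, and the average tile size is a number you would measure on your own tiles, not a constant.

```javascript
// Convert longitude/latitude (degrees) to XYZ tile indices at a zoom level,
// using the standard Web Mercator ("slippy map") tiling scheme.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom;
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so points on the eastern or southern edge stay inside the grid.
  const clamp = (v) => Math.min(Math.max(v, 0), n - 1);
  return { x: clamp(x), y: clamp(y) };
}

// List every tile covering a [west, south, east, north] box for a zoom range.
function tilesForBbox([west, south, east, north], minZoom, maxZoom) {
  const tiles = [];
  for (let z = minZoom; z <= maxZoom; z++) {
    const topLeft = lonLatToTile(west, north, z);
    const bottomRight = lonLatToTile(east, south, z);
    for (let x = topLeft.x; x <= bottomRight.x; x++) {
      for (let y = topLeft.y; y <= bottomRight.y; y++) {
        tiles.push({ z, x, y });
      }
    }
  }
  return tiles;
}

// Rough capacity estimate: tile count times an assumed average tile size.
function estimateCacheBytes(tileCount, avgTileBytes) {
  return tileCount * avgTileBytes;
}
```

For example, `tilesForBbox([-180, -85.0511, 180, 85.0511], 0, 2)` returns the 21 tiles that make up zoom levels 0 to 2 of the whole world. The same list can drive a seeding job that requests each tile once before launch.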
Optimizing Spatial Queries and Filters
GIS applications rely heavily on spatial queries to search, filter and aggregate vector geo-data. Improving performance of geospatial queries can significantly speed up response times. This requires properly configuring the database, creating spatial indexes, partitioning data, and optimizing the query plans.
Spatial Indexes, Query Optimization and Vector Data Partitioning
PostGIS, Oracle Spatial, SQL Server and other spatial databases use R-trees, GiST and similar index structures to filter vector data quickly by location and geometry. Database parameters should be tuned for the dataset size and query patterns. Running EXPLAIN on a query shows whether the planner actually uses the spatial index, and queries may need to be rewritten when it does not. Large vector tables can be partitioned by geography so that each query touches only the relevant partitions, or so that load can be spread across servers.
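As a concrete illustration for PostGIS, the sketch below builds an index-friendly bounding box query and the matching GiST index statement as strings. The table name, geometry column and `id` column are hypothetical. The returned `{ text, values }` object follows the query shape accepted by the node-postgres client, but nothing here connects to a database.

```javascript
// Build a parameterized PostGIS query selecting features that intersect a
// map view. The && operator compares bounding boxes and can use a GiST index
// on the geometry column. Table and column names are hypothetical and are
// interpolated directly, so they must come from trusted code, never from users.
function buildBboxQuery({ table, geomColumn, bbox, srid = 4326, explain = false }) {
  const [west, south, east, north] = bbox;
  const select =
    `SELECT id, ST_AsGeoJSON(${geomColumn}) AS geojson ` +
    `FROM ${table} ` +
    `WHERE ${geomColumn} && ST_MakeEnvelope($1, $2, $3, $4, $5)`;
  return {
    // Prefixing with EXPLAIN shows whether the planner picks the index.
    text: explain ? `EXPLAIN ${select}` : select,
    values: [west, south, east, north, srid],
  };
}

// DDL for the spatial index the query relies on (run once per table).
function buildGistIndexDdl(table, geomColumn) {
  return (
    `CREATE INDEX IF NOT EXISTS ${table}_${geomColumn}_gist ` +
    `ON ${table} USING GIST (${geomColumn})`
  );
}
```

Running the query once with `explain: true` and looking for an index scan in the plan output is a quick way to confirm that the index is in use before tuning anything else.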
Using Vector Tiles Over Raster Tiles
Traditionally, web maps are delivered as raster image tiles. An increasingly common alternative is vector tiles, which offer greater flexibility, lower storage needs and dynamic styling. With raster tiles, labels, roads and borders are baked into the images. Vector tiles keep roads, points of interest and other features as geometries with attributes, so styles can change without regenerating the tiles. Mapbox GL, ArcGIS vector tile layers and the vector maps in Google Maps are leading examples. Vector tiles are typically small, compressed chunks of geometry and attribute data encoded as protobuf or JSON, and they are rendered on the client according to a style definition such as a Mapbox GL style document. Specialized vector tile servers such as Tegola generate, cache and serve vector tiles for low-latency dynamic map rendering.
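To make the styling point concrete, here is a minimal sketch of a Mapbox GL style document held as a JavaScript object, plus a helper that restyles one layer. The tile URL, the `basemap` source name and the `roads` source layer are hypothetical placeholders, and `withPaint` is an illustrative helper, not part of any library.

```javascript
// A minimal Mapbox GL style document (style specification version 8).
// The tile URL, source name and "roads" source layer are hypothetical.
const baseStyle = {
  version: 8,
  sources: {
    basemap: {
      type: 'vector',
      tiles: ['https://tiles.example.com/{z}/{x}/{y}.pbf'],
      maxzoom: 14,
    },
  },
  layers: [
    {
      id: 'roads',
      type: 'line',
      source: 'basemap',
      'source-layer': 'roads',
      paint: { 'line-color': '#888888', 'line-width': 1 },
    },
  ],
};

// Return a copy of the style with one paint property changed. The tiles are
// untouched; only the client-side rendering rules differ.
function withPaint(style, layerId, property, value) {
  return {
    ...style,
    layers: style.layers.map((layer) =>
      layer.id === layerId
        ? { ...layer, paint: { ...layer.paint, [property]: value } }
        : layer
    ),
  };
}

// A second theme derived from the same tiles.
const nightStyle = withPaint(baseStyle, 'roads', 'line-color', '#ffcc00');
```

Both styles read the same cached tiles, which is the storage and flexibility advantage over raster tiles, where each theme would need its own rendered tile set.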
Generating Optimized Vector Tiles
While vector tiles provide flexibility, poorly configured tilesets can become large because they carry more geometric detail and more attributes than the map needs. Geometries should be simplified, unnecessary attributes filtered out, and coordinate precision reduced to balance rendering quality against tile size. Layers should be configured for the needs of the target use case. Tools such as Tippecanoe, ArcGIS Pro and QGIS can help build and fine-tune vector tilesets for production use.
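The three size reductions mentioned above can be sketched in a few lines of plain JavaScript. This is an illustration of the ideas using the Douglas-Peucker algorithm, not a description of how Tippecanoe or any other tool works internally. All function names are hypothetical, and the tolerance is expressed in the same units as the coordinates.

```javascript
// Distance from point p to the line segment a-b.
function distanceToSegment(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const lengthSq = dx * dx + dy * dy;
  if (lengthSq === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  let t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / lengthSq;
  t = Math.max(0, Math.min(1, t));
  return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
}

// Douglas-Peucker simplification: drop vertices that deviate from the
// overall shape by no more than the tolerance.
function simplifyLine(points, tolerance) {
  if (points.length <= 2) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = distanceToSegment(points[i], first, last);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }
  if (maxDist <= tolerance) return [first, last];
  const left = simplifyLine(points.slice(0, index + 1), tolerance);
  const right = simplifyLine(points.slice(index), tolerance);
  // The split vertex ends "left" and starts "right"; keep it only once.
  return left.slice(0, -1).concat(right);
}

// Reduce coordinate precision to a fixed number of decimals.
function roundCoords(points, decimals) {
  const f = 10 ** decimals;
  return points.map(([x, y]) => [Math.round(x * f) / f, Math.round(y * f) / f]);
}

// Keep only the attributes the map style actually uses.
function pickProperties(properties, keep) {
  return Object.fromEntries(
    Object.entries(properties).filter(([name]) => keep.includes(name))
  );
}

// Apply all three reductions to a GeoJSON LineString feature.
function slimLineFeature(feature, { tolerance, decimals, keep }) {
  const simplified = simplifyLine(feature.geometry.coordinates, tolerance);
  return {
    type: 'Feature',
    geometry: { type: 'LineString', coordinates: roundCoords(simplified, decimals) },
    properties: pickProperties(feature.properties, keep),
  };
}
```

In practice the tolerance is usually chosen per zoom level, since detail that matters at street level is invisible at country level.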
Employing Efficient Data Structures and Algorithms
The logical design of geospatial data structures and the choice of computational geometry algorithms fundamentally affect GIS application performance. Data is typically modeled as vector features (points, lines, polygons), rasters, and topologies that record spatial relationships between features, often organized in hierarchical data structures. Structuring and normalizing this data around the spatial computations the use case actually needs allows efficient analysis and querying.
Tuning Topology Graphs, Spatial Models and Computational Geometry
Modeling a geospatial domain efficiently means capturing the topological connections between spatial entities that the application needs while avoiding overcomplicated topologies for simpler use cases. Choosing appropriate computational geometry algorithms also yields speedups, for example by running a cheap test before an expensive one. Simplifying geometries to remove unnecessary vertices and generalizing rasters to a lower resolution where possible further improve rendering and analysis speed.
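One widely used pattern of this kind is to precompute bounding boxes and use them as a cheap pre-filter before an exact geometry test. The sketch below applies it to point-in-polygon lookups with a ray-casting test. It handles single rings without holes, and all names are hypothetical.

```javascript
// Axis-aligned bounding box of a polygon ring: [minX, minY, maxX, maxY].
function ringBbox(ring) {
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const [x, y] of ring) {
    if (x < minX) minX = x;
    if (y < minY) minY = y;
    if (x > maxX) maxX = x;
    if (y > maxY) maxY = y;
  }
  return [minX, minY, maxX, maxY];
}

// Ray-casting point-in-polygon test for a single ring (no holes).
function pointInRing([px, py], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      yi > py !== yj > py && px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Precompute each bounding box once so lookups can reuse it.
function indexPolygons(rings) {
  return rings.map((ring) => ({ ring, bbox: ringBbox(ring) }));
}

// Run the exact test only for polygons whose bounding box contains the point.
function findContaining(point, indexed) {
  const [px, py] = point;
  return indexed
    .filter(
      ({ ring, bbox }) =>
        px >= bbox[0] &&
        py >= bbox[1] &&
        px <= bbox[2] &&
        py <= bbox[3] &&
        pointInRing(point, ring)
    )
    .map(({ ring }) => ring);
}
```

The bounding box check costs four comparisons, while the exact test visits every vertex, so the saving grows with polygon complexity. Spatial indexes such as R-trees apply the same idea hierarchically.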
Scaling Out Across Servers and Cloud Infrastructure
As the number of users, the volume of spatial data and the complexity of the application increase, a single GIS server hits bottlenecks in storage, compute and memory. Scaling out across clusters of commodity servers multiplies capacity by spreading load horizontally. Cloud platforms provide autoscaling clusters with load balancing to handle spiky traffic and failover. Tools such as the open source GeoServer and GeoMesa, or Esri’s ArcGIS GeoEvent Server, support building distributed geospatial data and processing pipelines across servers.
Configuring GIS Clusters on Cloud for Hybrid Deployments
While cloud infrastructure readily supports elastic scaling, many enterprises choose hybrid models that keep some core components on-premises for security or low-latency needs. APIs, microservices and containerization make it easier to integrate cloud-based servers with those components. Kubernetes can orchestrate container lifecycles and scale the pods that perform geospatial analysis on managed services such as Amazon EKS, Azure Kubernetes Service or Google Kubernetes Engine. A well-planned hybrid cluster balances flexibility, scalability and governance needs.
Accommodating More Concurrent Users and Requests
Supporting higher numbers of concurrent users is essential for scalability. At peak load, response times suffer unless additional requests can be handled in parallel. Replicated cache servers distribute requests, and CDNs cache rendered maps and tiles closer to users. Scaling application servers with autoscaling server pools behind load balancers helps avoid overloading any single server. Databases may need read replicas so that queries can be load balanced and served with low latency.
Tuning Connection Pools, Request Queuing and Microservices Deployment
To ensure fast responses under load, the connection pools between application servers and the backend need to be sized for the expected number of concurrent sessions and the cost of typical geospatial queries. Request queues can absorb bursts and let the service degrade gracefully during traffic surges instead of failing outright. Stateless microservices that autoscale independently allow compute resources to be allocated cost-efficiently to the subsystems under the highest load.
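The queuing behavior can be sketched as a small concurrency limiter. The `BoundedQueue` class below is a hypothetical illustration in plain JavaScript, not the API of any connection pool library. It caps how many tasks run at once, lets a limited number wait, and rejects the rest immediately so the caller can respond with something like an HTTP 503.

```javascript
// Limit how many tasks run at once and how many may wait. When the waiting
// queue is full, new work is rejected immediately instead of piling up.
class BoundedQueue {
  constructor({ concurrency, maxQueued }) {
    this.concurrency = concurrency;
    this.maxQueued = maxQueued;
    this.active = 0;
    this.waiting = [];
  }

  // task is a function returning a promise (for example, a database query).
  run(task) {
    return new Promise((resolve, reject) => {
      const job = { task, resolve, reject };
      if (this.active < this.concurrency) {
        this.start(job);
      } else if (this.waiting.length < this.maxQueued) {
        this.waiting.push(job);
      } else {
        reject(new Error('queue full'));
      }
    });
  }

  start({ task, resolve, reject }) {
    this.active++;
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        // Free the slot and hand it to the next waiting job, if any.
        this.active--;
        const next = this.waiting.shift();
        if (next) this.start(next);
      });
  }
}
```

A request handler might wrap each spatial query in `queue.run(() => runQuery(params))`, where `runQuery` is whatever function issues the query, and translate the `queue full` rejection into a fast failure response. Setting `concurrency` close to the database connection pool size keeps the two limits consistent.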
Sample Code for Implementing Clustered Map Servers
High-performance web GIS solutions typically run a multi-node cluster with load balancing to distribute requests across servers. The sample below shows a Node.js vector tile server written as a stateless microservice. Because it keeps no per-user state, it can be replicated behind a load balancer and scaled with container orchestration and autoscaling groups.
NodeJS Code Snippet for Containerized Vector Tile Server Microservice
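// index.js
const fs = require('fs');
const express = require('express');
const geojsonVt = require('geojson-vt');
const vtpbf = require('vt-pbf');

// Assumption: the source data is a GeoJSON file in the working directory and
// maxZoom 14 is a placeholder; the original left both values unspecified.
const vectorData = JSON.parse(fs.readFileSync('./data.geojson', 'utf8'));
const tileOptions = { maxZoom: 14 };
// Build the in-memory tile index once at startup.
const tiles = geojsonVt(vectorData, tileOptions);

const app = express();
app.get('/tiles/:z/:x/:y.vector.pbf', getTile);

function getTile(req, res) {
  // Route parameters arrive as strings; convert them to numbers.
  const [z, x, y] = [req.params.z, req.params.x, req.params.y].map(Number);
  const tile = tiles.getTile(z, x, y);
  if (tile) {
    // Encode as an uncompressed Mapbox Vector Tile. The body is not gzipped,
    // so no Content-Encoding header is set. "features" is an assumed layer name.
    res.setHeader('Content-Type', 'application/x-protobuf');
    res.send(Buffer.from(vtpbf.fromGeojsonVt({ features: tile })));
  } else {
    res.status(404).end();
  }
}

module.exports = app;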
The microservice can be packaged as a Docker image and replicated across clusters managed by Kubernetes for scalability. Partitioning the geodata and adding read replicas help scale the underlying databases.
Architecting High-Performance Geospatial Databases
Slow database responses are often the bottleneck for web GIS applications that need to render maps dynamically from the latest data. While read replicas help scale out a database, the underlying data model and infrastructure configuration also have a major impact on performance. Multi-tenant clusters, in-memory acceleration, SSD storage and data partitioning are key performance boosters.
Tuning Database Configurations, Deployments and Infrastructure
Beyond indexes and queries, database settings such as shared buffers, work memory, checkpoint and write-ahead log sizing, and parallelism affect performance. Geo-replicated, multi-tenant cloud data warehouse clusters running on fast SSD infrastructure speed up queries. PostgreSQL with PostGIS continues to gain geospatial optimizations. Purpose-built and managed cloud-native geospatial database services are another option for high performance at scale.
Strategies for Reducing Application Latency
From static asset optimization to using browser storage for offline use, various techniques help lower web GIS application latency. Pre-fetching data for the views a user is likely to open next avoids delays when rendering new maps and visuals. Progressive enhancement starts with a fast, minimal UI and adds detail iteratively. Search-as-you-type with client-side filtering provides instant feedback. Edge computing and fog networks move data closer to users.
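Pre-fetching for upcoming views can be as simple as requesting the ring of tiles just outside the visible area. The sketch below computes that ring for XYZ tiles. `tilesToPrefetch` is a hypothetical helper, and it assumes the visible range is given in normalized tile indices between 0 and 2^z - 1.

```javascript
// List tiles in a ring around the visible tile range, which are the most
// likely next requests when the user pans. X wraps around the antimeridian;
// Y does not wrap, so rows outside the grid are skipped.
function tilesToPrefetch({ z, minX, maxX, minY, maxY }, margin = 1) {
  const n = 2 ** z;
  const seen = new Set();
  const result = [];
  for (let x = minX - margin; x <= maxX + margin; x++) {
    for (let y = minY - margin; y <= maxY + margin; y++) {
      if (y < 0 || y >= n) continue;
      const wrappedX = ((x % n) + n) % n;
      // Skip tiles that are already on screen.
      const visible =
        wrappedX >= minX && wrappedX <= maxX && y >= minY && y <= maxY;
      if (visible) continue;
      const key = `${z}/${wrappedX}/${y}`;
      if (!seen.has(key)) {
        seen.add(key);
        result.push({ z, x: wrappedX, y });
      }
    }
  }
  return result;
}
```

The returned tiles can be requested at low priority once the visible tiles have loaded, so that pre-fetching never competes with what the user is currently looking at.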
Sample Code for Client Side Data Caching
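// cache.js
// Cache the in-flight promise, not just the result, so concurrent requests
// for the same URL share a single fetch.
const cache = new Map();

function getData(url) {
  // Check the cache first.
  if (cache.has(url)) {
    return cache.get(url);
  }
  // Fetch fresh data and store the pending promise in the cache.
  const pending = fetch(url)
    .then((res) => {
      if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
      return res.json();
    })
    .catch((err) => {
      // Do not keep failed requests in the cache; let the next call retry.
      cache.delete(url);
      throw err;
    });
  cache.set(url, pending);
  return pending;
}

module.exports = {
  getData
};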
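Applying this pattern to geodata such as tiles and feature layers keeps already-fetched assets on the client, which speeds up panning and zooming in a web map application. The cache is unbounded, so long-running sessions may need a size limit or an eviction policy.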