GeoServer on steroids

All you wanted to know about how to make GeoServer faster but you never asked (or you did and no one answered)

Ing. Andrea Aime, GeoSolutions
Ing. Simone Giannecchini, GeoSolutions

FOSS4G 2011, Denver 12th-16th September 2011

GeoSolutions

Founded in Italy in late 2006

Expertise
  Image Processing, GeoSpatial Data Fusion
  Java, Java Enterprise, C++, Python
  JPEG2000, JPIP, Advanced 2D visualization

Supporting/Developing FOSS4G projects
  GeoTools, GeoServer
  GeoBatch, GeoNetwork

Clients
  Public Agencies
  Private Companies

http://www.geo-solutions.it

Preparing raster inputs

Raster Data CheckList

Objectives
  Fast extraction of a subset of the data
  Fast extraction of overviews
Check-list
  Avoid having to open a large number of files per request
  Avoid parsing of complex structures
  Avoid on-the-fly reprojection (if possible)
Get to know your bottlenecks
  CPU vs Disk Access Time vs Memory
Experiment with
  Format, compression, different color models, tile size, overviews, configuration (in GeoServer of course)

Problematic Formats

PNG/JPEG direct serving
  Bad formats (especially in Java)
  No tiling (or rarely supported)
  Chew a lot of memory and CPU for decompression
  Mitigate with external overviews
NetCDF/grib1 and similar formats
  Complex formats (often with many subdatasets)
  Often contain un-calibrated data
  Must usually use multiple dimensions
    Use ImageMosaic
  Must usually massage the data before serving
    e.g. transpose X,Y,

Problematic Formats

Ascii Grid, GTOPO30, IDRISI and similar formats are bad
  ASCII formats are bad
  No internal tiling, no compression, no internal overviews
JPEG2000 (with Kakadu)
  Extensible and rich, not (always) fast
  Can be difficult to tune for performance (might require specific encoding options)
ECW and MrSID
  Why bother? They are proprietary

Choosing Formats and Layouts

To remember: GeoTiff is a swiss knife
  But you don't want to cut a tree with it!
  Tremendously flexible, good fit for most (not all) use cases
  BigTiff pushes the GeoTiff limits farther
Single File VS Mosaic VS Pyramids
Use single GeoTiff when
  Overviews and Tiling stay within 4GB
  No additional dimensions
Consider BigTiff for very large files (> 4 GB)
  Support for tiling
  Support for Overviews
  Can be inefficient with very large files + small tiling

Choosing Formats and Layouts

Use ImageMosaic when:
  A single file gets too big (inefficient seeks, too much metadata to read, etc.)
  Multiple Dimensions (time, elevation, others)
  Avoid mosaics made of many very small files
  Single granules can be large
  Use Tiling + Overviews + Compression on granules

Use ImagePyramid when:
  Tremendously large dataset
    Too many files / too large files
  Need to serve at all scales
    Especially low resolution
  For single granules (< 2GB) GeoTiff is generally a good fit

Choosing Formats and Layouts

Examples:
  Small dataset: single 2GB GeoTiff file
  Medium dataset: single 40GB BigTiff
  Large dataset: 400GB mosaic made of 10GB BigTiff files
  Extra large: 4TB of imagery, built as pyramid of mosaics of BigTiff/GeoTiff files to keep the file count low

GeoTiff preparation

STEP 0: get to know your data
  gdalinfo utility is your friend

CheckList
  Missing CRS
    Add a .prj file
    Fix with gdal_translate
  Missing georeferencing
    Add a World File
    Fix with gdal_translate
  Bad Tiling
    Fix with gdal_translate
  Missing Overviews
    Use gdaladdo
  Compression
    Use gdal_translate

GeoTiff preparation

STEP 1: fix and optimize with gdal_translate
  Inner Tiling
    gdal_translate -co "TILED=YES" -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512" in.tif out.tif
    Check also GeoTiff driver creation options here
  CRS and GeoReferencing
    gdal_translate -a_srs EPSG:32619 -a_ullr 285409.2 2014405.2 287536.8 2011947.6 in.tif out.tif

STEP 2: add overviews with gdaladdo
  Leverages the TIFF support for multipage files and reduced resolution pages
    gdaladdo -r cubic output.tif 2 4 8 16 32 64 128
  Choose the resampling algorithm wisely
  Choose the tile size and compression wisely (use GDAL_TIFF_OVR_BLOCKSIZE)
  Consider external overviews
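The two preparation steps above can be scripted when many files need the same treatment. The following is a minimal sketch that only assembles the argument lists for the tiling and overview commands shown in the slide; the function names are illustrative and not part of GDAL.

```python
import shlex


def tiling_command(src, dst, block=512):
    """Argument list for gdal_translate that rewrites src with inner tiling."""
    return [
        "gdal_translate",
        "-co", "TILED=YES",
        "-co", f"BLOCKXSIZE={block}",
        "-co", f"BLOCKYSIZE={block}",
        src, dst,
    ]


def overview_command(path, levels=(2, 4, 8, 16, 32, 64, 128), resampling="cubic"):
    """Argument list for gdaladdo that adds overviews to path."""
    return ["gdaladdo", "-r", resampling, path] + [str(level) for level in levels]


def as_shell(command):
    """Render an argument list as a copy-pastable shell line."""
    return " ".join(shlex.quote(part) for part in command)
```

Assuming the GDAL utilities are on the PATH, each list can be passed to `subprocess.run(command, check=True)`, looping over the files to be prepared.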

GeoTiff preparation

Compression
  GeoTiff tiles can be compressed
  Consider when disk speed/space is an issue
  Control it with gdal_translate and creation options
  LZW/Deflate are good for lossless compression
  JPEG is good for visually lossless compression

From experience
  Use LZW/Deflate on geophysical data (DEM, acquisitions)
  Use JPEG visually lossless with Photometric Interpretation set to YCbCr for RGB
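As an illustration of the compression advice, the sketch below picks GeoTiff creation options by data type. `COMPRESS` and `PHOTOMETRIC` are GTiff driver creation options; the two category names ("geophysical", "rgb") and the function names are made up for this example.

```python
def compression_options(kind):
    """Creation options for gdal_translate, chosen by the kind of data."""
    if kind == "geophysical":
        # Lossless: values such as elevations must survive unchanged.
        return ["-co", "COMPRESS=DEFLATE"]
    if kind == "rgb":
        # Visually lossless JPEG, stored as YCbCr for better compression.
        return ["-co", "COMPRESS=JPEG", "-co", "PHOTOMETRIC=YCBCR"]
    raise ValueError(f"unknown data kind: {kind!r}")


def compress_command(src, dst, kind):
    """Argument list for a tiled, compressed copy of src."""
    return ["gdal_translate", "-co", "TILED=YES", *compression_options(kind), src, dst]
```

Swapping `DEFLATE` for `LZW` is an equally valid lossless choice; which one is smaller or faster depends on the data, so it is worth measuring both.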

Time, Elevation and other dimensions

Use Cases:
  MetOc data (support for time, elevation)
  Data with additional independent dimensions

WorkFlow
  Split in multiple GeoTiff files
  Optimize the files individually
  Use ImageMosaic
  Use a DBMS for indexing granules
  Use File Name based property collectors to turn properties into DB row attributes
  Filter by time, elevation and other attributes via OGC and CQL filters

Check back up slides for more info!
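The idea behind file name based property collectors can be sketched in a few lines: a regular expression pulls each dimension value out of the granule name, and the result becomes one row of the index. The file naming scheme, both regular expressions and the function name below are assumptions made for illustration; they are not taken from the slides or from the ImageMosaic code.

```python
import re
from datetime import datetime

# Hypothetical naming scheme: temp_20110915T060000_z0100.tif
TIME_RE = re.compile(r"[0-9]{8}T[0-9]{6}")
ELEVATION_RE = re.compile(r"_z([0-9]+)\.")


def collect_properties(filename):
    """Turn a granule file name into the attributes of one index row."""
    time_match = TIME_RE.search(filename)
    elevation_match = ELEVATION_RE.search(filename)
    if time_match is None or elevation_match is None:
        raise ValueError(f"file name does not follow the expected scheme: {filename}")
    return {
        "location": filename,
        "time": datetime.strptime(time_match.group(0), "%Y%m%dT%H%M%S"),
        "elevation": int(elevation_match.group(1)),
    }
```

Once every granule has such a row, a request for a given time and elevation becomes an attribute filter on the index rather than a scan of the files.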

Time, Elevation and other dimensions

Indexing multiple dimensions with DB support (video here)

datastore.properties

timeregex.properties

stringregex.properties
indexer.properties

Proper Mosaic Preparation

ImageMosaic stitches single granules together with basic processing
  Filtered selection
  Overviews/Decimation on read
  Over/DownSampling in memory
  ColorMask (optional)
  Mosaic/Stitch
  ColorMask again (optional)

Optimize files as if you were serving them individually
Keep a balance between number and dimensions of granules

Proper Mosaic Configuration

STEP 0: Configure Coverage Access (see slide 22)
STEP 1: Configure Mosaic Parameters
  ALLOW_MULTITHREADING
    Load data from different granules in parallel
    Needs USE_JAI_IMAGEREAD set to false (Immediate Mode)
  Use a proper Tile Size
    In-memory processing, must not be too large
    Disk tiling should be larger
  If memory is scarce:
    USE_JAI_IMAGEREAD to true
    USE_MULTITHREADING to false*
  Otherwise
    USE_JAI_IMAGEREAD to false
    ALLOW_MULTITHREADING to true

Proper Mosaic Configuration

Optional (Advanced): Configure Mosaic Parameters Directly
  Caching
    Load the index in memory (using JTS STRtree)
    Super fast granule lookup, good for shapefiles
    Bad if you have additional dimensions to filter on
    Based on Soft References, controlled via Java switch SoftRefLRUPolicyMSPerMB
  ExpandToRGB
    Expand colormapped imagery to RGB in memory
    Trade performance for quality
  SuggestedSPI
    Default ImageIO Decoder class to use
    Don't touch unless expert

Proper Pyramid Preparation

Use gdal_retile for creating the pyramid
  Prepare the list of tiles to be retiled
  Create the pyramid with GDAL retile (grab a coffee!)
Chunks should not be too small (here 2048x2048)
  Too many files is bad anyway
  Use internal Tiling for larger chunk sizes
If the input dataset is huge use the useDirForEachRow option
  Too many files in a dir is bad practice
Make sure the number of levels is consistent
  Too few: bad performance at high scale

Proper Pyramid Configuration

STEP 0: Configure Coverage Access (see slide 22)
STEP 1: Configure Pyramid Parameters
  ImagePyramid relies on ImageMosaic
  ALLOW_MULTITHREADING
    Load data from different granules in parallel
    Needs USE_JAI_IMAGEREAD set to false (Immediate Mode)
  Use a proper Tile Size
    In-memory processing, must not be too large
    Disk tiling should be larger
  If memory is scarce:
    USE_JAI_IMAGEREAD to true
    USE_MULTITHREADING to false*
  Otherwise
    USE_JAI_IMAGEREAD to false
    ALLOW_MULTITHREADING to true

Proper Pyramid Configuration

Optional (Advanced): Configure Mosaic Parameters Directly
  Caching
    Load the index in memory (using JTS STRtree)
    Super fast granule lookup, good for shapefiles
    Bad if you have additional dimensions to filter on
    Based on Soft References, controlled via Java switch SoftRefLRUPolicyMSPerMB
  ExpandToRGB
    Expand colormapped imagery to RGB in memory
    Trade performance for quality
  SuggestedSPI
    Default ImageIO Decoder class to use
    Don't touch unless expert

Proper GDAL Formats Configuration

Fix Missing/Improper CRS with PRJ or coverage config
Fix Missing GeoReferencing with World File
Make sure GDAL_DATA is properly configured
Use a proper Tile Size
  In-memory processing, must not be too large
  Fundamental for striped data! JNI overhead
  Disk tiling should be larger
If memory is scarce:
  USE_JAI_IMAGEREAD to true
  USE_MULTITHREADING to true*
Otherwise
  USE_JAI_IMAGEREAD to false
  USE_MULTITHREADING is ignored

Proper JPEG2000 Kakadu Configuration

Fix Missing/Improper CRS with PRJ or coverage config
Fix Missing GeoReferencing with World File
Make sure Kakadu dll/so is properly loaded
Use a proper Tile Size
  In-memory processing, must not be too large
  Disk tiling should be larger
If memory is scarce:
  USE_JAI_IMAGEREAD to true
  USE_MULTITHREADING to true*
Otherwise
  USE_JAI_IMAGEREAD to false
  USE_MULTITHREADING is ignored

Proper GeoServer Coverage Options Configuration

Make sure native JAI and ImageIO are installed
  Enable ImageIO native acceleration
  Enable JAI Mosaicking native acceleration
Give JAI enough memory
Don't raise JAI memory Threshold too high
Rule of thumb: use 2 X #Core Tile Threads (check next slide)
Enable Tile Recycling only on trunk
Enable Tile Recycling if memory is not a problem

Proper GeoServer Coverage Options Configuration

Multithreaded Granule Loading
  Allows fine tuning of multithreading for ImageMosaic
  Orthogonal to JAI Tile Threads
  Rule of Thumb: use 2 X #Core Tile Threads
  Perform testing to fine tune depending on layer configuration as well as on typical requests

ImageIO Cache threshold
  Decides when we switch to disk cache (very large WCS requests)

Reprojection Performance Vs Quality

GeoServer 2.1.x reprojects raster data using a piecewise-linear algorithm
The area is divided in rectangular blocks, each having its own affine transform
The transformation between the full trigonometric expressions and the linear ones is driven by a tolerance, default value is 0.333
A larger value will make reprojection faster, but lower the quality
  -Dorg.geotools.referencing.resampleTolerance=0.5
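The trade-off between tolerance and number of blocks can be seen in a one-dimensional toy version of the idea. This is a sketch only, not the GeoTools implementation: it approximates the spherical Mercator northing with straight segments and keeps splitting a block while the error at its midpoint exceeds the tolerance. Here the tolerance is expressed in the units of the function, and all names are illustrative.

```python
import math


def mercator_y(lat_deg):
    """Exact (trigonometric) spherical Mercator northing on a unit sphere."""
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))


def split_blocks(f, lo, hi, tolerance):
    """Split [lo, hi] until a straight line matches f at each block midpoint."""
    mid = (lo + hi) / 2
    linear_at_mid = (f(lo) + f(hi)) / 2
    if abs(f(mid) - linear_at_mid) <= tolerance:
        return [(lo, hi)]
    return split_blocks(f, lo, mid, tolerance) + split_blocks(f, mid, hi, tolerance)
```

A larger tolerance stops the subdivision earlier, so fewer blocks (and fewer exact evaluations) are needed, at the price of a larger deviation from the true curve inside each block.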

Preparing vector inputs

Vector data checklist

What do we want from vector data:

  Binary data
  No complex parsing of data structures
  Fast extraction of a geographic subset
  Fast filtering on the most commonly used attributes

Choosing a format

Slow formats
  WFS
  GML
  DXF

Good formats, local and indexable
  Shapefile
  Directory of shapefiles
  SDE
  Spatial databases: PostGIS, Oracle Spatial, DB2, MySQL*, SQL server*

Shapefiles vs DBMS

Speed comparison vs spatial extent depicted:
  Shapefile very fast when rendering the full dataset
  Database faster when extracting a small subset of a very large data set

Shapefile
  No attribute indexing, avoid if filtering on attribute is important (filtering == reading less data, not applying symbols)

Database
  Rich support for complex native filters
  Use connection pooling (preferably via JNDI)
  Validate connections (with proper pooling)

Shapefile preparation

Remove .qix file if present, let GeoServer 2.1.x rebuild it (more efficient)
If there are large DBF attributes that are not in use, get rid of them using ogr2ogr, e.g.:
  ogr2ogr -select FULLNAME,MTFCC arealm.shp tl_2010_08013_arealm.shp

If on Linux, enable memory mapping, faster, more scalable (but will kill Windows):

Shapefile filtering

Stuck with shapefiles and have scale dependent rules like the following?
  Show highways first
  Show all streets when zoomed in

Use ogr2ogr to build two shapefiles, one with just the highways, one with everything, and build two layers, e.g.:
  ogr2ogr -sql "SELECT * FROM tl_2010_08013_roads WHERE MTFCC in ('S1100', 'S1200')" primaryRoads.shp tl_2010_08013_roads.shp

PostGIS specific hints

PostgreSQL out of the box is configured for very small hardware: http://wiki.postgresql.org/wiki/Performance_Optimization
Make sure to run ANALYZE after data imports (updates optimizer stats)
As usual, avoid large joins in SQL views, consider materialized views
If the dataset is massive, CLUSTER on the spatial index: http://postgis.refractions.net/documentation/manual1.3/ch05.html
Careful with prepared statements (bad performance)

Optimize styling

Use scale dependencies

Never show too much data
  The map should be readable, not a graphic blob
  Rule of thumb: 1000 features max in the display

Labeling

Labeling conflict resolution is expensive, limit to the most inner zooms
Halo is important for readability, but adds significant overhead
Careful with maxDisplacement, makes for various label location attempts

FeatureTypeStyle

GeoServer uses SLD FeatureTypeStyle objects as Z layers for painting
Each one allocates its own rendering surface (which can use a lot of memory), use as few as possible

Use translucency sparingly

Translucent display is expensive, use it sparingly

Scale dependent rules

Too often forgotten or little used, yet very important:
  Hide layers when too zoomed in (raster/vector example)
  Progressively show details
  Add more expensive rendering when there are fewer features

Key to any high performance / good looking map

Hide as you zoom in

Add a MinScaleDenominator to the rule
This will make the layer disappear when zooming in past 1:75000 (towards 1:1)

Alternative rendering

Simple rendering at low scale (up to 1:2000)
More complex rendering when zoomed in (1:1999 and above)
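The scale checks behind these rules are simple enough to sketch. In SLD a rule is active when the map scale denominator is greater than or equal to MinScaleDenominator and strictly less than MaxScaleDenominator. The function below mirrors that test; its name and signature are illustrative, not a GeoServer API.

```python
def rule_applies(scale_denominator, min_scale=None, max_scale=None):
    """True when a rule with the given scale limits is active at this scale."""
    if min_scale is not None and scale_denominator < min_scale:
        return False  # zoomed in further than the rule allows
    if max_scale is not None and scale_denominator >= max_scale:
        return False  # zoomed out further than the rule allows
    return True
```

With a simple rule using `min_scale=2000` and a detailed rule using `max_scale=2000`, exactly one of the two is active at any scale, which is the alternative rendering setup described above.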

Point symbols

600 lines of code for 6 different point types
Painful

Prepare data

alter table pointlm add column image varchar;

update pointlm set image = 'shop_supermarket.p.16.png' where MTFCC = 'C3081' and (FULLNAME like '%Shopping%' or FULLNAME like '%Mall%');
update pointlm set image = 'peak.png' where MTFCC = 'C3022';
update pointlm set image = 'amenity_prison.p.20.png' where MTFCC = 'K1236';
update pointlm set image = 'museum.p.16.png' where MTFCC = 'K2165';
update pointlm set image = 'airport.p.16.png' where MTFCC = 'K2451';
update pointlm set image = 'school.png' where MTFCC = 'K2543';
update pointlm set image = 'christian3.p.14.png' where MTFCC = 'K2582';
update pointlm set image = 'gate2.png' where MTFCC = 'K3066';

Dynamic symbolizers

Output tuning

WMS output formats


Format      Sample 1   Sample 2   Drawback
JPEG        23.8KB     27KB       Compression artifacts
PNG 8bit    66KB       27KB       Color reduction
PNG 24bit   169.4KB    64KB       Large size

WFS output formats


[Chart: dimension (MB) of the response for each WFS output format]

HTTP GZip compression is transparent in GeoServer, make sure proxies keep it (or pay 10x price)
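Why compression matters so much for WFS output is easy to check on any verbose XML payload. The sketch below builds a made-up, GML-like document (the element names are invented for the example) and measures how much smaller it gets with gzip. The ratio depends entirely on the data, so treat it as a way to measure your own responses rather than as a confirmation of any specific figure.

```python
import gzip


def sample_gml(feature_count=500):
    """Build a repetitive, GML-like payload for the measurement."""
    member = (
        '<gml:featureMember><topp:roads fid="roads.{i}">'
        "<topp:name>Road {i}</topp:name>"
        "<gml:coordinates>{i}.5,{i}.25</gml:coordinates>"
        "</topp:roads></gml:featureMember>"
    )
    body = "".join(member.format(i=i) for i in range(feature_count))
    return f"<wfs:FeatureCollection>{body}</wfs:FeatureCollection>".encode("utf-8")


def compression_ratio(payload):
    """Uncompressed size divided by gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload))
```

Running `compression_ratio` on a saved GetFeature response, once fetched directly and once through the proxy chain, shows quickly whether a proxy is dropping the compression.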

Tile caching

Tile caching with GeoWebCache

Tile oriented maps, fixed zoom levels and fixed grid
Useful for stable layers, backgrounds
Protocols: WMTS, TMS, WMS-C, Google Maps/Earth, VE
Speedup compared to dynamic WMS: 10 to 100 times, assuming tiles are already cached (whole layer preseeded)
Suitable for:
  Mostly static layers
  No (or few) dynamic parameters (CQL filters, SLD params, SQL query params, time/elevation, format options)

Embedded GWC advantage

No double encoding when using meta-tiling, faster seeding

Space considerations

Seeding Colorado, assuming 8 cores, one layer, 0.1 sec 756x756 metatile, 15KB for each tile
Do yours: http://tinyurl.com/3apkpss
Not enough disk space? Set a disk quota

Zoom level   Tile count    Size (GB)   Time to seed (hours)   Time to seed (days)
13                58,377           1                      0                     0
14               232,870           4                      0                     0
15               929,475          14                      0                     0
16             3,713,893          57                      1                     0
17            14,855,572         227                      6                     0
18            59,396,070         906                     23                     1
19           237,584,280       3,625                     92                     4
20           950,273,037      14,500                    367                    15
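The seeding times in the table can be reproduced with a short calculation. The sketch below assumes that each metatile covers 3x3 tiles, which is not stated on the slide but matches the published hours when combined with the stated 8 cores and 0.1 seconds per metatile. The function name is illustrative.

```python
def seeding_hours(tile_count, cores=8, seconds_per_metatile=0.1, tiles_per_metatile=9):
    """Hours needed to seed tile_count tiles with all cores rendering in parallel."""
    metatiles = tile_count / tiles_per_metatile
    return metatiles * seconds_per_metatile / cores / 3600
```

For example, `round(seeding_hours(950_273_037))` gives 367 for zoom level 20, and dividing by 24 gives roughly 15 days. Plugging in your own tile counts and render times is a quick way to decide how many zoom levels are worth preseeding.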

Resource control

WMS request limits

Max memory per request: avoids large requests and lets you size the server memory (max concurrent requests * max memory)
Max time per request: avoids requests taking too much time (e.g., using a custom style provided with dynamic SLD in the request)
Max errors: best effort renderer, but handling errors takes time
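The sizing rule (max concurrent requests times max memory per request) is plain arithmetic, sketched below. The image buffer estimate assumes 4 bytes per pixel (an ARGB image), which is an assumption made for this example and not necessarily how GeoServer accounts for memory internally.

```python
def image_buffer_mb(width, height, bytes_per_pixel=4):
    """Approximate size in MB of one in-memory image of the given size."""
    return width * height * bytes_per_pixel / (1024 * 1024)


def worst_case_memory_mb(max_concurrent_requests, max_memory_per_request_mb):
    """Upper bound on rendering memory if every slot runs a maximum-size request."""
    return max_concurrent_requests * max_memory_per_request_mb
```

Under these assumptions a 4096x4096 request needs about 64 MB, so 16 such requests in parallel call for roughly 1 GB of heap on top of everything else the server holds.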

WFS request limits

Max feature returned, configured as a global limit
Return feature bbox: reduce amount of generated GML
Per layer max feature count

WCS request limits

Control flow

Control how many requests are executed in parallel, queue others:
  Increase throughput
  Control memory usage
  Enforce fairness

More info here

Control flow

17%

$GEOSERVER_DATA_DIR/controlflow.properties
  # don't allow more than 16 GetMap requests in parallel
  ows.wms.getmap=16
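The effect of a limit such as `ows.wms.getmap=16` can be pictured as a counting semaphore in front of the request handler: at most 16 requests run, the rest wait their turn. The following is a conceptual sketch only, with hypothetical class and function names; it is not how the control flow module is implemented.

```python
import threading
import time


class RequestLimiter:
    """Run at most max_parallel units of work at once, queueing the others."""

    def __init__(self, max_parallel):
        self._slots = threading.BoundedSemaphore(max_parallel)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def run(self, work):
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return work()
            finally:
                with self._lock:
                    self.running -= 1


def simulate(total_requests, max_parallel):
    """Fire total_requests at once and report the peak number running together."""
    limiter = RequestLimiter(max_parallel)
    threads = [
        threading.Thread(target=limiter.run, args=(lambda: time.sleep(0.01),))
        for _ in range(total_requests)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return limiter.peak
```

Calling `simulate(40, 16)` never reports a peak above 16, however many requests arrive together, which is what keeps memory use and CPU contention bounded.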

Auditing

Log each and every request
Log contents driven by customizable template
Summarize and analyze requests with offline tools
More info here

JVM and deploy configuration

Premise

The options discussed here are not going to help visibly if you did not prepare the data and the styles
They are finishing touches that can get performance up once the major data bottlenecks have been dealt with

Check Running in production instructions here

JVM settings

-server: enables the server JIT compiler
-Xms2048m -Xmx2048m: sets the JVM to use two gigabytes of memory
-XX:+UseParallelOldGC -XX:+UseParallelGC: enables multi-threaded garbage collections, useful if you have more than two cores
-XX:NewRatio=2: informs the JVM there will be a high number of short lived objects
-XX:+AggressiveOpt: enables experimental optimizations that will be defaults in future versions of the JVM

Native JAI and JDK

Install native JAI and use a recent Sun JDK!
Benchmark over a small data set (the effect is not as visible on larger ones)

Setup a local cluster

Java2D locks when drawing antialiased vectors
  Limits scalability severely
Use Apache mod_proxy_balancer and set up one GeoServer instance for every 2 to 4 cores

[Diagram: mod_proxy_balancer in front of three GeoServer instances]

Clustering advantage

FOSS4G 2010 vector benchmarks (roads/buildings/isolines and so on, over the entire Spain)
GeoServer was benchmarked without local clustering

66%

Benchmarking

Using JMeter

Good benchmarking tool
Allows setting up multiple thread groups, with different parallelism and request count, to ramp up the load
Can use CSV files to generate semi-randomized requests
Reports results in a simple table

http://jakarta.apache.org/jmeter/

Using JMeter

Thread group: how many threads
Loop: how many requests
HTTP sampler: the request
CSV: read request params from CSV
Summary table

Generating the CSV

Simple randomized generation tool built during WMS shootouts, wms_request.py
Generate csv with the bbox and width/height to be used in JMeter scripts:
  ./wms_request.py -count 1200 -region -180 -90 180 90 -minres 0.002 -maxres 0.1 -minsize 256 256 -maxsize 1024 1024

Get it here along with a corresponding JMeter script: http://demo1.geo-solutions.it/share/jmeter_2011.zip
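For readers who want to see what such a generator involves, here is an independent minimal sketch. It is not the source of wms_request.py: the function names, the clamping of the resolution to the region and the `width;height;bbox` column layout are all assumptions made for this example.

```python
import csv
import io
import random


def random_requests(count, region, minres, maxres, minsize, maxsize, seed=None):
    """Return count random (width, height, bbox) tuples that fit inside region."""
    rng = random.Random(seed)
    min_x, min_y, max_x, max_y = region
    requests = []
    for _ in range(count):
        width = rng.randint(minsize[0], maxsize[0])
        height = rng.randint(minsize[1], maxsize[1])
        # Resolution is map units per pixel; clamp it so the box fits the region.
        res = rng.uniform(minres, maxres)
        res = min(res, (max_x - min_x) / width, (max_y - min_y) / height)
        x0 = rng.uniform(min_x, max_x - width * res)
        y0 = rng.uniform(min_y, max_y - height * res)
        requests.append((width, height, (x0, y0, x0 + width * res, y0 + height * res)))
    return requests


def to_csv(requests):
    """Serialize requests as width;height;minx,miny,maxx,maxy lines."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for width, height, bbox in requests:
        writer.writerow([width, height, ",".join(f"{v:.6f}" for v in bbox)])
    return out.getvalue()
```

Passing a fixed `seed` makes the request set repeatable, which helps when comparing benchmark runs before and after a configuration change.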

Checking results

Results table
Run the benchmarks 2-3 times, let the results stabilize
Save the results, check other optimizations, compare the results

Real world deploy

Deploy configuration

Raster data

Whole Italy at 50cm per pixel
Over 4TB, updated fully every 3 years (old data still available for historical access)
Custom pyramid
  100m per pixel: one image
  20m per pixel: mosaic of 20 tiles
  4m per pixel: mosaic of few hundred tiles
  0.5m per pixel: 9000 tiles
Each tile is 10000x10000, with overviews

Vector data

Cadastral data for the whole of Italy, with full history (interval of validity for each parcel)
100 million polygons
A query extracts a subset relative to a certain time interval and area the user is allowed to see
No data from this table is ever shown below 1:50000 (SLD scale dependencies)
Physical table level partitioning (Oracle style) of the table based on geographic area to parallelize and cluster data loading, plus spatial indexing and indexes on commonly filtered upon attributes

The End

Questions?
andrea.aime@geo-solutions.it

simone.giannecchini@geo-solutions.it