Your data may be too large for some purposes, and some applications will not even let you upload files above a certain size. For example, Google Earth often has trouble displaying KML/KMZ files larger than 100 MB, probably because of memory limitations. Google Earth has no official data size limit, and although this issue has been known for several years, Google has not yet addressed it. Google Earth does not even tell you that the data could not be loaded because it is too large. Instead, the operation ends with an unhelpful message: “No results – empty KML file”.
Google provides another example of the limits on working with large data. Google Maps also has limits, and these ones are official. Unzipped KML and KMZ files can be up to 5 MB, and other files (e.g. GPX, CSV, TSV, XLSX) can be up to 40 MB. Google Maps also cannot import files with more than 2,000 rows.
Unnecessarily large spatial data can also quickly fill your local drive or cloud storage. It is also difficult to edit or analyze because it places high demands on system resources (CPU, memory, etc.). Last but not least, reducing the size of a map file can significantly shorten loading times and make the file better suited to the web and to your mapping applications.
What can I do to reduce the file size?
There are several ways to process vector data to get a much smaller file. You can:
- Clip the vector layer to just the area you need
- Split datasets by polygons
- Remove unnecessary information (unneeded attribute fields)
- Reduce the geometric detail of vector features (simplification)
- Reduce unnecessarily high precision of stored coordinates
You can use one of the methods described above or combine them for even greater size reduction.
Clip vector data
Clipping is essential for reducing the overall file size if your area of interest is much smaller than the full extent of the data. To clip, you define a polygon that outlines your area of interest and crop the vector data to that polygon. The “Clip vector data” tool is designed for this purpose. It clips one or more vector layers to a specific area. You can define the area by uploading a vector file (in multilayer files you can also choose a specific layer) or by drawing it on a map. Then select the shape used for clipping: Geometry (the exact shape), Bounding box (its extent) or Convex hull. Finally, select the output vector file format.
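For point data, clipping reduces to a containment test, which makes it easy to compare the three clip shapes. The following is a minimal pure-Python sketch, not the tool’s implementation. The function `clip_points` and the mode names `"geometry"`, `"bbox"` and `"convex_hull"` are hypothetical. Clipping lines or polygons would also require cutting geometries at the boundary, which libraries such as Shapely do with `intersection`.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(polygon: List[Point]) -> List[Point]:
    """Rectangle covering the extent of the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return [(min(xs), min(ys)), (max(xs), min(ys)), (max(xs), max(ys)), (min(xs), max(ys))]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def clip_points(points: List[Point], area: List[Point], mode: str = "geometry") -> List[Point]:
    """Keep only the points inside the clip shape derived from `area` (hypothetical API)."""
    if mode == "geometry":
        shape = area
    elif mode == "bbox":
        shape = bounding_box(area)
    elif mode == "convex_hull":
        shape = convex_hull(area)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [p for p in points if point_in_polygon(p, shape)]


# An L-shaped area of interest and four sample points.
area = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
points = [(0.5, 0.5), (2, 2), (3, 3), (5, 5)]
print(clip_points(points, area, "geometry"))     # [(0.5, 0.5)]
print(clip_points(points, area, "convex_hull"))  # [(0.5, 0.5), (2, 2)]
print(clip_points(points, area, "bbox"))         # [(0.5, 0.5), (2, 2), (3, 3)]
```

The exact geometry keeps the least data. The bounding box is the simplest shape to compute but keeps the most, and the convex hull falls in between.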
Split datasets by polygons
The specialized “Split datasets by polygons” tool splits the input dataset into as many output layers as there are polygons in the splitting layer. If the input dataset contains several layers, it does this for each of them. You can define the areas used for splitting either by uploading a vector layer or by drawing polygons on the map.
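The counting rule above (one output per input layer and splitting polygon) can be sketched for point layers as follows. This is an illustrative sketch only. The function `split_by_polygons` and the `"<layer>_<polygon>"` output naming are hypothetical, and a point that lies in overlapping polygons is copied into each of them.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting point-in-polygon test (boundary points not handled specially)."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def split_by_polygons(
    layers: Dict[str, List[Point]], polygons: Dict[str, List[Point]]
) -> Dict[str, List[Point]]:
    """Create one output layer per (input layer, splitting polygon) pair."""
    outputs: Dict[str, List[Point]] = {}
    for layer_name, features in layers.items():
        for poly_id, polygon in polygons.items():
            # Empty outputs are kept so the count always equals layers x polygons.
            outputs[f"{layer_name}_{poly_id}"] = [
                p for p in features if point_in_polygon(p, polygon)
            ]
    return outputs


layers = {"wells": [(1, 1), (3, 1), (5, 5)], "trees": [(1.5, 0.5)]}
polygons = {"A": [(0, 0), (2, 0), (2, 2), (0, 2)], "B": [(2, 0), (4, 0), (4, 2), (2, 2)]}
result = split_by_polygons(layers, polygons)
print(result)  # {'wells_A': [(1, 1)], 'wells_B': [(3, 1)], 'trees_A': [(1.5, 0.5)], 'trees_B': []}
```

Features outside every splitting polygon, such as the well at (5, 5), do not appear in any output. This is part of why splitting can also reduce the total amount of data.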
Delete unnecessary attributes
Another way to reduce the size of a vector file is to remove unnecessary information stored in attribute fields. If the data contains attribute columns that you don’t need for your work, consider deleting them. For some file formats, this can significantly reduce the overall file size.
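In a text format such as GeoJSON, every feature repeats its property names and values. This means each dropped field shrinks every single feature. A minimal sketch using only the standard `json` module follows. The helper `drop_attributes` and the sample feature with its properties are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable


def drop_attributes(collection: Dict[str, Any], keep: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of a GeoJSON FeatureCollection whose features keep only the listed properties."""
    wanted = set(keep)
    features = []
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        # Copy the feature, replacing its properties with the filtered subset.
        features.append(dict(feature, properties={k: v for k, v in props.items() if k in wanted}))
    return dict(collection, features=features)


# Hypothetical sample data.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [14.42, 50.09]},
            "properties": {"name": "Prague", "source": "survey 2019", "note": "checked twice", "editor": "jdoe"},
        }
    ],
}
slim = drop_attributes(collection, keep=["name"])
print(len(json.dumps(collection)), "->", len(json.dumps(slim)), "characters")
```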
Simplify geometries
Simplifying the geometry is a highly effective way to reduce the size of a vector file. The easiest way to do this is with the “Simplify Geometries Preserving Topology” tool. It simplifies highly detailed geodata (polylines and polygons, e.g. census boundary data) while respecting the topology shared with neighboring features. If you only intend to create overview maps, for example, such high detail is unnecessary. Because the generalization preserves the relationships between geometries (topology), no overlaps or gaps appear between neighboring features.
File size is not the only concern. Too much detail in the geometries can also slow down rendering and make your application less responsive. To reduce the file size even further, the tool can export to various vector formats, including TopoJSON (which stores boundaries shared by neighboring features only once), GeoJSON, ESRI Shapefile, etc.
Example: Simplifying the geometry to 25% means that 75% of the points are removed. The original vector layer of roads in California in ESRI Shapefile format was 73 MB, and after simplification to 25% its size dropped to 53 MB. For really large data, such as a 1032 MB GeoJSON file of contour lines, simplifying the geometry to 50% reduces the file to 566 MB.
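The percentage semantics above (keep 25% of the points, drop 75%) can be illustrated with a Visvalingam-style simplification of a single line. This sketch assumes the percentage means the share of vertices kept. The function `simplify_to_percentage` is hypothetical. It is also not topology-preserving: simplifying each line independently like this can open gaps or overlaps between neighboring polygons, which is the problem the topology-preserving tool is meant to avoid.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of the triangle formed by a vertex and its two neighbors."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_to_percentage(line: List[Point], percentage: float) -> List[Point]:
    """Keep roughly `percentage` % of the vertices, always keeping both endpoints.

    Visvalingam-style: repeatedly drop the interior vertex whose triangle with
    its neighbors has the smallest area (the least visually important one).
    Recomputing every area on each pass is O(n^2): fine for a sketch, slow for real data.
    """
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    target = max(2, math.ceil(len(line) * percentage / 100))
    pts = list(line)
    while len(pts) > target:
        weakest = min(
            range(1, len(pts) - 1),
            key=lambda i: triangle_area(pts[i - 1], pts[i], pts[i + 1]),
        )
        del pts[weakest]
    return pts


# A mostly flat line with one prominent peak at (3, 5).
line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0), (5, 0.1), (6, 0), (7, 0)]
print(simplify_to_percentage(line, 50))  # keeps 4 of 8 vertices, including the peak
print(simplify_to_percentage(line, 25))  # keeps only the two endpoints
```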
Reduce coordinate accuracy
When extremely high accuracy is not needed, you can significantly reduce the file size by rounding the geometry coordinates. The “Round geometry coordinates” tool is designed specifically for this purpose.
Vector geometries are made up of coordinates stored with a given number of decimal places. Coordinates are often stored with 15 decimal places even when they were acquired with an accuracy of about one meter (e.g. coordinates collected in the field with GPS). In such cases, it is sufficient to reduce the precision, for example to one decimal place for coordinates in meters (a projected coordinate system). This small change can reduce the file size by around 50%, and in many cases much more.
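For GeoJSON, the core of coordinate rounding fits in a few lines, because coordinates are just nested arrays of numbers. This is a sketch assuming plain GeoJSON geometries (not GeometryCollections) with projected coordinates in meters. The function names and sample coordinates are hypothetical, and this is not how the Round geometry coordinates tool is implemented.

```python
import json
from typing import Any, Dict


def round_coords(coords: Any, decimals: int) -> Any:
    """Recursively round a (possibly nested) GeoJSON coordinate array."""
    if isinstance(coords, (int, float)):
        return round(coords, decimals)
    return [round_coords(c, decimals) for c in coords]


def round_geometry(geometry: Dict[str, Any], decimals: int) -> Dict[str, Any]:
    """Copy of a GeoJSON geometry with all coordinates rounded."""
    return dict(geometry, coordinates=round_coords(geometry["coordinates"], decimals))


# Hypothetical projected coordinates in meters, stored with far more decimals than needed.
line = {
    "type": "LineString",
    "coordinates": [
        [458732.123456789012345, 5548123.987654321098765],
        [458733.456789012345678, 5548125.012345678901234],
    ],
}
rounded = round_geometry(line, 1)
before, after = len(json.dumps(line)), len(json.dumps(rounded))
print(rounded["coordinates"])  # [[458732.1, 5548124.0], [458733.5, 5548125.0]]
print(f"{before} -> {after} characters")
```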
Keep in mind, however, that if you reduce the number of decimal places too much (especially with coordinates in degrees), the geometry of some small features may disappear. Nearby vertices get rounded to the same position, so the number of vertices in the resulting file decreases. For data in degrees, for example, the default precision of 5 decimal places corresponds to roughly 1.1 m.
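Both effects can be checked with a little arithmetic. One degree of latitude is roughly 111 km, so one unit in the 5th decimal place is roughly 1.1 m. Rounding can also merge neighboring vertices of a small feature. This sketch uses the approximation of 111,320 m per degree (exact only for longitude at the equator) and hypothetical helper names and sample coordinates.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Approximate length of one degree of latitude (and of longitude at the equator), in meters.
METERS_PER_DEGREE = 111_320


def ground_resolution_m(decimals: int, latitude_deg: float = 0.0) -> Tuple[float, float]:
    """Approximate size in meters of the last kept decimal place: (north-south, east-west)."""
    step_deg = 10 ** -decimals
    north_south = step_deg * METERS_PER_DEGREE
    # Degrees of longitude get shorter towards the poles.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def round_vertices(points: List[Point], decimals: int) -> List[Point]:
    """Round vertices and drop consecutive duplicates created by the rounding."""
    out: List[Point] = []
    for x, y in points:
        p = (round(x, decimals), round(y, decimals))
        if not out or out[-1] != p:
            out.append(p)
    return out


for d in (1, 3, 5, 7):
    ns, ew = ground_resolution_m(d, latitude_deg=50.0)
    print(f"{d} decimals: ~{ns:.4g} m north-south, ~{ew:.4g} m east-west at 50° N")

# A tiny line feature (well under a meter long) in lon/lat degrees.
tiny = [(14.421231, 50.087452), (14.421234, 50.087454), (14.421238, 50.087451)]
print(len(round_vertices(tiny, 5)))  # 2 vertices left
print(len(round_vertices(tiny, 3)))  # 1 vertex left: the line has collapsed to a point
```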
Example: The original vector layer of contour lines in GeoJSON (WGS 84 coordinate system) was 1032 MB. Reducing the number of decimal places in the coordinates from the original 15 to 5 reduced the GeoJSON file to 506 MB, roughly half its original size.