CREATE EXTENSION postgis;
The XYZT platform currently supports the following databases:
Postgres
ClickHouse
Although both databases use SQL, they differ significantly in terms of architecture, geospatial support, indexing strategies, and performance characteristics. As a result, each database requires a slightly different setup and optimization approach.
The following sections describe general guidelines as well as database-specific requirements and recommendations.
The following guidelines apply to all supported databases.
When working with large volumes of data, choose the smallest possible data type that covers the required range and precision:
Prefer integers over floating-point numbers.
Prefer smaller integer types (for example, SMALLINT or Int8) over larger ones
(for example, BIGINT or Int32).
Prefer lower-precision floating-point types (for example, REAL or Float32)
over higher-precision types (for example, DOUBLE PRECISION or Float64).
Although strictly required only on geometry tables, performance may improve if you also include the priority level column in time series or data tables. This increases the likelihood that the query planner can skip large portions of data that are not visible at a given map scale.
Try to reduce the number of distinct values per column whenever possible. Aligning values improves compression and scan efficiency. For example, for time series data:
Round timestamps to the coarsest acceptable resolution.
Ensure all records belonging to the same interval use exactly the same timestamp value (for example, the start or midpoint of a 15-minute interval).
The PostGIS extension must be enabled.
CREATE EXTENSION postgis;
Geometries may be stored in any coordinate reference system supported by PostGIS, but EPSG:4326 is recommended for optimal performance.
Temporal columns should include time zone information.
Geometry tables must define a unique identifier as a primary key.
Time series tables must reference the geometry table identifier using a foreign key.
For optimal performance, create indices on the following columns:
Geometry table:
A spatial index on the geometry column
A basic index on the priority level column (if present)
Time series table:
A basic index on the geometry identifier column
A basic index on the priority level column (if present)
A basic index on the timestamp column
Geometry table
CREATE TABLE road_segments (
segment_id VARCHAR(255) PRIMARY KEY,
geometry GEOMETRY(LINESTRING, 4326) NOT NULL,
name VARCHAR(255),
road_class SMALLINT,
start_node_id INTEGER NOT NULL,
end_node_id INTEGER NOT NULL,
reverse_id VARCHAR(255),
FOREIGN KEY (reverse_id) REFERENCES road_segments(segment_id)
);
CREATE INDEX idx_road_segments_geometry
ON road_segments USING GIST(geometry);
CREATE INDEX idx_road_segments_road_class
ON road_segments(road_class);
Time series table
CREATE TABLE traffic_stats (
segment_id VARCHAR(255) NOT NULL,
road_class SMALLINT,
timestamp TIMESTAMPTZ NOT NULL,
average_speed SMALLINT,
free_flow_speed SMALLINT,
FOREIGN KEY (segment_id) REFERENCES road_segments(segment_id)
);
CREATE INDEX idx_traffic_stats_segment
ON traffic_stats(segment_id);
CREATE INDEX idx_traffic_stats_road_class
ON traffic_stats(road_class);
CREATE INDEX idx_traffic_stats_timestamp
ON traffic_stats(timestamp);
In this example, SMALLINT is used to store speed values, assuming integer speeds
such as 100 km/h rather than fractional values like 100.12. Using the smallest
appropriate data type reduces storage size and improves performance, especially for
large data sets.
Note also that the road_class column is duplicated in the traffic_stats table.
This denormalization improves query performance by allowing the database to filter
records without an additional join.
ClickHouse is a column-oriented database that typically outperforms Postgres on very large time series data sets. While it also uses SQL, table design and indexing differ in several important ways.
ClickHouse has limited spatial support; geometries must be stored in EPSG:4326.
Temporal columns should include time zone information.
Geometry tables must define a unique identifier as a primary key.
Geometry tables must include an additional bounding box column with the format
(minLon, minLat, maxLon, maxLat), defined as
Tuple(Float64, Float64, Float64, Float64).
Time series tables must reference the geometry table identifier.
|
Because ClickHouse does not provide full spatial indexing like PostGIS, the bounding box column is required to support spatial filtering. |
Create indices on the following columns:
Geometry table:
An index on the bounding box column
A basic index on the priority level column (if present)
Choose an efficient record order using the ORDER BY clause.
Proper ordering allows ClickHouse to compress data effectively and skip irrelevant
data blocks during query execution. Include, where applicable:
Priority level
Geometry identifier
Timestamp
Trip identifier
Columns with a limited number of distinct values often perform best when placed earlier in the ordering.
For very large data sets, define a partitioning scheme using PARTITION BY.
Partitioning allows ClickHouse to skip entire data partitions during queries.
Common partitioning candidates include:
Priority level
Timestamp
CREATE TABLE road_segments (
segment_id String,
geometry LineString,
name String,
road_class Int16,
start_node_id Int32,
end_node_id Int32,
reverse_id Nullable(String),
bbox Tuple(Float64, Float64, Float64, Float64),
INDEX bbox_idx bbox TYPE minmax GRANULARITY 1,
INDEX idx_road_class road_class TYPE minmax GRANULARITY 4
) ENGINE = MergeTree()
ORDER BY segment_id;
CREATE TABLE traffic_stats (
segment_id String,
road_class Int16,
timestamp DateTime('UTC'),
average_speed Int16,
free_flow_speed Int16
) ENGINE = MergeTree()
PARTITION BY (road_class, toYYYYMMDD(timestamp))
ORDER BY (road_class, segment_id, timestamp);