Database-specific guidelines

Database configuration

The XYZT platform currently supports the following databases:

Postgres
ClickHouse

Although both databases use SQL, they differ significantly in terms of architecture, geospatial support, indexing strategies, and performance characteristics. As a result, each database requires a slightly different setup and optimization approach.

The following sections describe general guidelines as well as database-specific requirements and recommendations.

General performance tips

The following guidelines apply to all supported databases.

When working with large volumes of data, choose the smallest possible data type that covers the required range and precision:
- Prefer integers over floating-point numbers.
- Prefer smaller integer types (for example, SMALLINT or Int8) over larger ones (for example, BIGINT or Int32).
- Prefer lower-precision floating-point types (for example, REAL or Float32) over higher-precision types (for example, DOUBLE PRECISION or Float64).
Although strictly required only on geometry tables, performance may improve if you also include the priority level column in time series or data tables. This increases the likelihood that the query planner can skip large portions of data that are not visible at a given map scale.
Try to reduce the number of distinct values per column whenever possible. Aligning values improves compression and scan efficiency. For example, for time series data:
- Round timestamps to the coarsest acceptable resolution.
- Ensure all records belonging to the same interval use exactly the same timestamp value (for example, the start or midpoint of a 15-minute interval).

Working with Postgres

Requirements

The PostGIS extension must be enabled.

CREATE EXTENSION postgis;

Geometries may be stored in any coordinate reference system supported by PostGIS, but EPSG:4326 is recommended for optimal performance.
Temporal columns should include time zone information.
Geometry tables must define a unique identifier as a primary key.
Time series tables must reference the geometry table identifier using a foreign key.

Performance tips

For optimal performance, create indices on the following columns:

Geometry table:
- A spatial index on the geometry column
- A basic index on the priority level column (if present)
Time series table:
- A basic index on the geometry identifier column
- A basic index on the priority level column (if present)
- A basic index on the timestamp column

Example table definitions

Geometry table

CREATE TABLE road_segments (
    segment_id VARCHAR(255) PRIMARY KEY,
    geometry GEOMETRY(LINESTRING, 4326) NOT NULL,
    name VARCHAR(255),
    road_class SMALLINT,
    start_node_id INTEGER NOT NULL,
    end_node_id INTEGER NOT NULL,
    reverse_id VARCHAR(255),
    FOREIGN KEY (reverse_id) REFERENCES road_segments(segment_id)
);

CREATE INDEX idx_road_segments_geometry
    ON road_segments USING GIST(geometry);

CREATE INDEX idx_road_segments_road_class
    ON road_segments(road_class);

Time series table

CREATE TABLE traffic_stats (
    segment_id VARCHAR(255) NOT NULL,
    road_class SMALLINT,
    timestamp TIMESTAMPTZ NOT NULL,
    average_speed SMALLINT,
    free_flow_speed SMALLINT,
    FOREIGN KEY (segment_id) REFERENCES road_segments(segment_id)
);

CREATE INDEX idx_traffic_stats_segment
    ON traffic_stats(segment_id);

CREATE INDEX idx_traffic_stats_road_class
    ON traffic_stats(road_class);

CREATE INDEX idx_traffic_stats_timestamp
    ON traffic_stats(timestamp);

In this example, SMALLINT is used to store speed values, assuming integer speeds such as 100 km/h rather than fractional values like 100.12. Using the smallest appropriate data type reduces storage size and improves performance, especially for large data sets.

Note also that the road_class column is duplicated in the traffic_stats table. This denormalization improves query performance by allowing the database to filter records without an additional join.

Working with ClickHouse

ClickHouse is a column-oriented database that typically outperforms Postgres on very large time series data sets. While it also uses SQL, table design and indexing differ in several important ways.

Requirements

ClickHouse has limited spatial support; geometries must be stored in EPSG:4326.
Temporal columns should include time zone information.
Geometry tables must define a unique identifier as a primary key.
Geometry tables must include an additional bounding box column with the format (minLon, minLat, maxLon, maxLat), defined as Tuple(Float64, Float64, Float64, Float64).
Time series tables must reference the geometry table identifier.

Because ClickHouse does not provide full spatial indexing like PostGIS, the bounding box column is required to support spatial filtering.

Performance tips

Create indices on the following columns:
- Geometry table:
  - An index on the bounding box column
  - A basic index on the priority level column (if present)
Choose an efficient record order using the ORDER BY clause. Proper ordering allows ClickHouse to compress data effectively and skip irrelevant data blocks during query execution. Include, where applicable:
- Priority level
- Geometry identifier
- Timestamp
- Trip identifier
  Columns with a limited number of distinct values often perform best when placed earlier in the ordering.
For very large data sets, define a partitioning scheme using PARTITION BY. Partitioning allows ClickHouse to skip entire data partitions during queries. Common partitioning candidates include:
- Priority level
- Timestamp

Example table definitions

Geometry table

CREATE TABLE road_segments (
   segment_id String,
   geometry LineString,
   name String,
   road_class Int16,
   start_node_id Int32,
   end_node_id Int32,
   reverse_id Nullable(String),
   bbox Tuple(Float64, Float64, Float64, Float64),
   INDEX bbox_idx bbox TYPE minmax GRANULARITY 1,
   INDEX idx_road_class road_class TYPE minmax GRANULARITY 4
) ENGINE = MergeTree()
ORDER BY segment_id;

Time series table

CREATE TABLE traffic_stats (
    segment_id String,
    road_class Int16,
    timestamp DateTime('UTC'),
    average_speed Int16,
    free_flow_speed Int16
) ENGINE = MergeTree()
PARTITION BY (road_class, toYYYYMMDD(timestamp))
ORDER BY (road_class, segment_id, timestamp);