Data stores versus data sets

Got feedback? Additional questions? Just want to have a friendly chat? Get in touch!

Data sets versus data stores

The xyzt.ai platform supports two ways to upload data: as a data set or as a data store. This article explains how the two are similar and how they differ.

Both the data set and the data store contain the location data that you upload to the platform, as explained in the introduction article.

When creating a new data store or data set, you have to:

define the properties present in your (meta)data and the data type of each (e.g. strings, integers, …)
define which of those properties should be available for analysis
upload your (meta)data files
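
The first two steps amount to describing a schema for your records. The sketch below illustrates that idea in plain Python. It is not the xyzt.ai API: `PropertyDefinition`, `validate_record`, and `analysis_properties` are hypothetical names, and the platform may model properties differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class PropertyDefinition:
    """Hypothetical description of one property in the (meta)data."""
    name: str
    data_type: type
    available_for_analysis: bool = True


def validate_record(record: Dict[str, Any], schema: List[PropertyDefinition]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for prop in schema:
        if prop.name not in record:
            problems.append(f"missing property: {prop.name}")
        elif not isinstance(record[prop.name], prop.data_type):
            actual = type(record[prop.name]).__name__
            problems.append(f"{prop.name}: expected {prop.data_type.__name__}, got {actual}")
    return problems


def analysis_properties(schema: List[PropertyDefinition]) -> List[str]:
    """Names of the properties that were marked as available for analysis."""
    return [prop.name for prop in schema if prop.available_for_analysis]


# Illustrative schema; the property names are made up for this example.
schema = [
    PropertyDefinition("vehicle_type", str),
    PropertyDefinition("speed", float),
    PropertyDefinition("raw_id", int, available_for_analysis=False),
]
```

Checking a few sample records against such a description before uploading can catch type mismatches early, whichever of the two upload options you choose.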

But there are also differences between them:

Feature	Data set	Data store
Supports unlimited data	❌	✔
Directly usable for analytics	✔	❌
Supports movement data	✔	✔
Supports movement path data	✔	❌
Supports time series data	✔	❌
Supports static data	✔	❌

The main benefit of a data store over a data set is that it supports an unlimited amount of data. For example, if you plan to use traffic data for a city spanning multiple years, you can easily end up with multiple TBs of data and hundreds of billions of records.

This is too much data to store in a single data set, but it is supported by the data store.

The drawback of using a data store is that you cannot directly use it for analytics. Instead, you have to extract a data set from it before you can visualize and analyze the records.

When you extract the data set, you can define filters (spatial, temporal, property-based) to select which data to extract. For example, you could ask for all trucks that traveled through a specific street in the last 2 years, or all the vehicles that traveled through the city in the last month.

The amount of data you can extract into a single data set is subject to the same limit that applies when you create a data set directly.
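
To make the extraction step concrete, here is a minimal in-memory sketch of filtering records by space, time, and property while enforcing a size limit. It is not the xyzt.ai API: `Record`, `ExtractionFilter`, `extract_data_set`, and the `max_records` parameter are hypothetical, the spatial filter is simplified to a bounding box, and the actual limit is not stated in this article.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical movement record held in a data store."""
    lon: float
    lat: float
    timestamp: datetime
    vehicle_type: str


@dataclass(frozen=True)
class ExtractionFilter:
    """Hypothetical filter; any criterion left as None is not applied."""
    bbox: Optional[Tuple[float, float, float, float]] = None  # (min_lon, min_lat, max_lon, max_lat)
    start: Optional[datetime] = None  # inclusive
    end: Optional[datetime] = None    # exclusive
    vehicle_type: Optional[str] = None

    def matches(self, record: Record) -> bool:
        if self.bbox is not None:
            min_lon, min_lat, max_lon, max_lat = self.bbox
            inside = min_lon <= record.lon <= max_lon and min_lat <= record.lat <= max_lat
            if not inside:
                return False
        if self.start is not None and record.timestamp < self.start:
            return False
        if self.end is not None and record.timestamp >= self.end:
            return False
        if self.vehicle_type is not None and record.vehicle_type != self.vehicle_type:
            return False
        return True


def extract_data_set(store: Iterable[Record], flt: ExtractionFilter, max_records: int) -> List[Record]:
    """Collect matching records, failing if the result would exceed max_records."""
    extracted: List[Record] = []
    for record in store:
        if flt.matches(record):
            if len(extracted) >= max_records:
                raise ValueError("extraction exceeds the data set limit; narrow the filters")
            extracted.append(record)
    return extracted
```

For instance, an `ExtractionFilter` with a bounding box around one street and `vehicle_type="truck"` corresponds to the first example above. If an extraction hits the limit, narrowing the time range or the area is the natural way to bring it back within bounds.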
