Data upload overview

Overview

Uploading your movement data files is a multistep process:

Create a data set: a data set is the resource in which the platform saves your data.
Define the structure of your data: the platform needs to know how your data files are structured. For example, it needs to know where the timestamps, longitude and latitude values are stored, and which other properties are important.
Define the processing settings: before the platform can process your data, you need to tell it how you want your data to be processed.
Upload data: now that the platform has all the information, you can start uploading your data and the platform engine will process it.
Optionally, upload metadata: sometimes you not only have files with location data (positions of cars, vessels, persons, …) but also additional metadata on those assets (the brand of the car, the name of the vessel, …) that you want to include in the analysis. These files can also be uploaded to the platform.

New to data uploads? Follow the tutorial!

There is a tutorial available where you go through the whole process of uploading your movement data to the platform. That is the best starting point for people trying to upload data for the first time.

The article you are currently reading and the articles it refers to provide more in-depth information compared to the tutorial, but they won’t guide you through the whole process.

Supported file types

The platform supports the following file formats for movement data sets:

Data files: .csv and .parquet files.
Metadata files: .csv and .parquet files.

Creating a movement data set

Before you can upload your files to a data set containing movement data, you need to create an empty data set. How to do this is explained in this article.

Defining the structure of your data

The platform needs to know the structure of the files you are going to upload.

The data files only contain data, but don’t tell you anything about the format of the data. They don’t specify:

Which columns contain the location information
Which column contains the timestamp
Which columns contain properties you are interested in
Which columns contain information that can be ignored
The type of each value. CSV files don’t even record how the data is stored: are the values numbers or strings?
…

You will have to provide this information so that the platform knows how to interpret the data in the files. This is explained in the Configure data properties article.
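To illustrate why this information is needed, here is a minimal Python sketch using only the standard library. It is not the platform's own mechanism: the sample file, the column names and the `SCHEMA` mapping are assumptions made up for this example. It shows that a CSV reader returns every value as a string until a structure description says how to interpret each column.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample file; the column names are illustrative only.
SAMPLE_CSV = """ship_id,longitude,latitude,timestamp
Ship 1,10.0,15.0,2020-01-05T15:15:15
Ship 1,10.1,15.1,2020-01-05T20:15:15
"""

# Hypothetical structure description: one converter per column.
SCHEMA = {
    "ship_id": str,
    "longitude": float,
    "latitude": float,
    "timestamp": datetime.fromisoformat,
}


def read_raw_rows(text):
    """Parse CSV text; every value comes back as a plain string."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_schema(row, schema):
    """Convert the columns named in the schema to their declared types."""
    return {name: convert(row[name]) for name, convert in schema.items()}


raw_rows = read_raw_rows(SAMPLE_CSV)
typed_rows = [apply_schema(row, SCHEMA) for row in raw_rows]

print(type(raw_rows[0]["longitude"]).__name__)    # str
print(type(typed_rows[0]["longitude"]).__name__)  # float
```

Columns that are not listed in the schema are simply dropped in this sketch, which mirrors the idea that some columns contain information that can be ignored.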

Defining the processing settings

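The platform can make your data available in several data representations: gridded, trajectories and realtime. You can enable or disable each data representation in the processing settings. Some of them have additional processing settings, which are discussed below.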

More information on each of these data representations and their use-cases is available in this article.

Gridded: spatio-temporal resolution

To keep the platform responsive during analysis, your original data is processed before it is made available.

One of the steps during processing is to divide the whole spatial and temporal range of your data set into different bins, and aggregate your data records in those bins.

The ideal size of those bins depends on both your accuracy needs during analysis and the accuracy with which the location data was recorded.

More details on these bins and how to configure them are available in the Configure the spatio-temporal resolution article.
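The binning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's actual processing: the function names, the square cell size in degrees (`cell_deg`) and the time bin length in seconds (`bin_seconds`) are all hypothetical, and the aggregation is a plain record count per bin.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def bin_key(lon, lat, ts, cell_deg, bin_seconds):
    """Return the (column, row, time slot) index of the bin a record falls in."""
    col = math.floor(lon / cell_deg)
    row = math.floor(lat / cell_deg)
    slot = int(ts.timestamp() // bin_seconds)
    return (col, row, slot)


def aggregate(records, cell_deg, bin_seconds):
    """Count how many records fall in each spatio-temporal bin."""
    return Counter(
        bin_key(lon, lat, ts, cell_deg, bin_seconds) for lon, lat, ts in records
    )


# Sample records as (longitude, latitude, timestamp), assumed to be in UTC.
records = [
    (10.0, 15.0, datetime(2020, 1, 5, 15, 15, 15, tzinfo=timezone.utc)),
    (10.1, 15.1, datetime(2020, 1, 5, 20, 15, 15, tzinfo=timezone.utc)),
    (20.0, 20.0, datetime(2020, 1, 5, 16, 0, 27, tzinfo=timezone.utc)),
]

daily = aggregate(records, cell_deg=0.5, bin_seconds=86_400)
hourly = aggregate(records, cell_deg=0.5, bin_seconds=3_600)

print(len(daily))   # 2: the first two records share one bin
print(len(hourly))  # 3: hourly bins separate all three records
```

The trade-off described above is visible here: coarse bins merge nearby records into one aggregate, while fine bins keep more detail at the cost of more bins to store and query.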

Trajectories: distance and duration threshold

In addition to the default gridded data representation, you can also choose to have your data available as trajectories. Trajectories are consecutive sequences of location records that belong together. Think of them as space-time lines. An example is showing the trajectory of a ship by connecting all the recorded positions, sorted by their timestamp, with a line.

There might be cases where you don’t want two successive records to be connected, for example when they are far apart in space or in time; you can control this via the distance and duration thresholds.

More details on these thresholds and how to configure them are available in the Configure the distance and duration thresholds article.
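As a rough illustration of how such thresholds could work, the Python sketch below splits the time-sorted records of one asset into separate trajectories whenever two successive records are too far apart in space or in time. It is an assumption-based example, not the platform's algorithm: the function names, the use of the haversine great-circle distance and the third sample point are all made up for this sketch.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used by the haversine formula


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_trajectory(points, max_distance_m, max_gap_s):
    """Split (lon, lat, timestamp) points of one asset into connected segments."""
    segments = []
    for point in sorted(points, key=lambda p: p[2]):
        if segments:
            prev = segments[-1][-1]
            gap_s = (point[2] - prev[2]).total_seconds()
            dist_m = haversine_m(prev[0], prev[1], point[0], point[1])
            if gap_s <= max_gap_s and dist_m <= max_distance_m:
                segments[-1].append(point)  # close enough: extend the current line
                continue
        segments.append([point])  # first point, or a threshold was exceeded
    return segments


# Two records of "Ship 1" plus a hypothetical third record a day later.
points = [
    (10.0, 15.0, datetime(2020, 1, 5, 15, 15, 15)),
    (10.1, 15.1, datetime(2020, 1, 5, 20, 15, 15)),
    (10.2, 15.2, datetime(2020, 1, 6, 21, 0, 0)),
]

segments = split_trajectory(points, max_distance_m=50_000, max_gap_s=6 * 3600)
print([len(s) for s in segments])  # [2, 1]
```

With a six-hour duration threshold, the first two records are connected and the third starts a new trajectory. Lowering the threshold to one hour would leave all three records unconnected.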

Realtime

When enabled, the processing engine stores the last seen location of each car, vessel, etc., together with its properties.

The realtime representation has no additional processing settings.
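Conceptually, this boils down to keeping one record per asset: the one with the most recent timestamp. The Python sketch below shows that idea under stated assumptions. The record layout and the `last_seen` function are hypothetical and do not describe how the processing engine is implemented.

```python
from datetime import datetime


def last_seen(records, id_key="id", time_key="timestamp"):
    """Keep, for each asset, only the record with the latest timestamp."""
    latest = {}
    for record in records:
        asset = record[id_key]
        if asset not in latest or record[time_key] > latest[asset][time_key]:
            latest[asset] = record
    return latest


# Hypothetical records; the key names are illustrative only.
records = [
    {"id": "Ship 1", "longitude": 10.0, "latitude": 15.0,
     "timestamp": datetime(2020, 1, 5, 15, 15, 15)},
    {"id": "Ship 1", "longitude": 10.1, "latitude": 15.1,
     "timestamp": datetime(2020, 1, 5, 20, 15, 15)},
    {"id": "Ship 2", "longitude": 20.0, "latitude": 20.0,
     "timestamp": datetime(2020, 1, 5, 16, 0, 27)},
]

latest = last_seen(records)
print(latest["Ship 1"]["longitude"])  # 10.1
```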

Uploading your data files

Once everything is configured, uploading data is as straightforward as dragging and dropping your files onto a widget. The data upload will start, and when the platform has received the files it will start processing them.

The upload data and metadata article provides more details on this.

Optionally: adding metadata

Sometimes you have data that is the same for every record of an asset and that is stored in a separate file.

For example, your location data file contains the location records of ships:

Ship Identifier	Longitude	Latitude	Timestamp
Ship 1	10.0	15.0	2020-01-05T15:15:15
Ship 1	10.1	15.1	2020-01-05T20:15:15
Ship 2	20.0	20.0	2020-01-05T16:00:27

and in another file, you have some properties of each ship that remain constant over time:

Ship Identifier	Ship Length	Ship Name
Ship 1	15	Jolly Roger
Ship 2	23	Santa Maria

We call this second type of file (containing the info that remains constant over time) metadata. This metadata can be uploaded in a similar way to the location data.

The upload data and metadata article provides more details on how to upload metadata, and the data versus metadata article has a closer look at the differences between data and metadata.
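The relationship between the two files can be pictured as a lookup on the shared identifier column. The Python sketch below is an illustration only and makes no claim about how the platform combines data and metadata internally. The `enrich` function is hypothetical, and it assumes both files use the same `Ship Identifier` column name.

```python
def enrich(records, metadata_rows, key):
    """Attach the constant metadata of each asset to its location records."""
    lookup = {row[key]: row for row in metadata_rows}
    enriched = []
    for record in records:
        extra = lookup.get(record[key], {})  # no metadata: keep the record as-is
        enriched.append({**extra, **record})
    return enriched


# Location records from the data file.
data_rows = [
    {"Ship Identifier": "Ship 1", "Longitude": 10.0, "Latitude": 15.0,
     "Timestamp": "2020-01-05T15:15:15"},
    {"Ship Identifier": "Ship 1", "Longitude": 10.1, "Latitude": 15.1,
     "Timestamp": "2020-01-05T20:15:15"},
    {"Ship Identifier": "Ship 2", "Longitude": 20.0, "Latitude": 20.0,
     "Timestamp": "2020-01-05T16:00:27"},
]

# Constant properties from the metadata file.
metadata_rows = [
    {"Ship Identifier": "Ship 1", "Ship Length": 15, "Ship Name": "Jolly Roger"},
    {"Ship Identifier": "Ship 2", "Ship Length": 23, "Ship Name": "Santa Maria"},
]

enriched = enrich(data_rows, metadata_rows, key="Ship Identifier")
print(enriched[0]["Ship Name"])    # Jolly Roger
print(enriched[2]["Ship Length"])  # 23
```

Storing the constant properties once per ship, instead of repeating them on every location record, keeps the data files smaller and makes it possible to update a property in a single place.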