Filtering data is a requirement to gain insights into large datasets.
The xyzt.ai platform offers two ways to filter your data:
Use the UI to create basic filters
Use the query language to create more advanced filters.
This article introduces the query language and explains the different parts of the advanced filters.
You can filter your (movement) data in the platform through a multistep process that we call the filter pipeline.
You start from the data uploaded to the platform, and in each step of the pipeline you further refine the data you want to retain.
The records and asset retained by step x
are the input for step x+1
in the pipeline.
The first step of the filter pipeline is an asset filter. Such a filter decides to retain or keep all the records of the assets, based on a metadata property or the duration of the asset:
For example:
Only retain the records of the assets of type car, and filter out the ones of type truck and scooter
Only retain the records of the assets with id 42 and 69, and filter out all other assets
Only retain the assets that have records that span at least 1 hour in time
Only retain the assets that have their first record inside an area
At the end of this step, you end up with a subset of the original assets. For all the retained assets, you also still retain all the records of the asset.
The second step of the filter pipeline is an asset route filter. This filter retains only the assets that travel along a specific route, and match specific properties during their travel.
You can for example retain only the assets that traveled between Brussels and Leuven on a Monday morning between 8 and 10 AM on the freeway, and did so with a speed of over 100 km/h.
Further, the filter let you choose:
To only retain the records that match this route (the Brussels to Leuven part in the above example), or
Retain all the records of the assets that match this route
The result of this filter is a continuous set of records matching the described route for each matching asset.
If a specific asset travelled twice between Brussels and Leuven while matching all the other conditions, the result of the filter would be 2 sets of continuous records. One for the first match and one for the second match.
In the above picture there are 2 sets of records for asset 1 that are passed to the next step in the filter pipeline. Also note how the input of the asset route filter is the output of the previous step.
The third step in the filter pipeline is again an asset filter, just like in the first step.
However, this time it works on each of the continuous parts of the asset, returned by the previous asset route filter. If the asset route filter returned only parts of the asset, this filter can retain or remove some of those parts.
To continue on the example of the asset route filtering, this filter can be used to remove assets that took over an hour to travel from Brussels to Leuven.
In the fourth and last step, you can no longer filter on the whole asset. This filter will be used to evaluate each individual record that is still present in the filtered data, and only retain the records that match the filter.
For example:
Only retain records when the car was travelling between 70 and 80 km/h
Only retain records where the ship has a heading between 30 and 80 degrees
…
The query language is a way to express a filter pipeline in text. It looks very similar to formulas that you would use in Microsoft Excel or Google Sheets.
For example, if you would want to show only the records where the speed property has a value between 0 and 50 km/h, you would use
BETWEEN(PROP("speed"), 0, 50)
The PROP("speed")
extracts the speed property from a record, and with the BETWEEN
function you check that this value lies between 0 and 50.
In some cases, functions can also be combined into a new function. For example if you want to retain the records of cars with a speed between 0 and 50 km/h of the brand BMW or Audi, you would use
AND(
BETWEEN(PROP("speed"), 0, 50),
OR(
EQ(PROP("brand"), "Audi"),
EQ(PROP("brand"), "BMW")
)
)
Again, the PROP
function is used to extract the brand property.
Since we want to express that the brand can be either Audi or BMW, we combine them using the OR
function.
And since we want to only retain the properties that match both the brand condition and the speed condition, we combine those using the AND
function.
We’ll show some more example queries in the remainder of this article. Consult the reference article for an overview of all available functions.
The asset filters allow to filter assets on their metadata properties (=properties which are constant for the asset) or on their duration. It also allows to filter on the properties, locations and/or times of the first and last record of the asset.
For example:
Only retain assets with a specific id
Only retain assets where the property vehicle type has value "truck"
Only retain assets with their first record inside an area
…
Each of those filters can be combined using AND
, OR
and NOT
functions into a new asset filtering function.
An example asset filter that only selects the trucks that start from the "parking" area:
ASSET(
AND(
EQ(START_PROP("type"), "truck"),
INSIDE(START_LOC(), AREA(name="parking"))
)
)
In this example, we assume there is a metadata property "type".
The record based filtering allows to filter each record based on its properties, location and time.
For example:
Only retain records where the speed property has value between 0 and 50 km/h.
Only retain records where the brand name property has value "BMW"
Only retain records that happened on a Monday between 08 and 10 AM
Only retain records that are located in a specific area
…
Each of those filters can be combined using AND
, OR
and NOT
functions into a new record filtering function.
This allows to express more complicated cases, like for example showing only the records located in the Brussels or Leuven area between 08 and 10 AM where the speed is over 50 km/h.
An asset route filter defines a route that an asset should follow, as well as the properties the asset should have along the route. Only continuous parts of the asset route that match the filter will be retained by the filter.
In the following example, we’re going to build up an asset route filter that shows all traffic that travels from Brussels to Leuven.
In the example, we will be using the following data.
We have 3 assets, each with their own collection of records. For each asset, assume the records are sorted in time from left to right.
We also have 2 areas, Brussels and Leuven.
When creating an asset route filter, you have to describe all the states (area + properties) the asset should follow. The evaluator of the filter then tries to find a continuous set of records of the asset that matches that route.
In our example, we’ll start with an asset route filter that only contains the destination state:
ASSET_ROUTE(
INSIDE(LOC(), AREA(name="Leuven"))
)
This filter selects the part of each asset that is located inside the Leuven area. This is illustrated in the picture, where the retained records are drawn filled. The records that are only outlined are rejected by the filter.
What is important to note is that for asset 3, the filter will select 2 continuous parts of the asset. The first 2 records in the Leuven area are 1 part. The third record inside the Leuven however forms another part of the asset due to the record that is not contained in the Leuven area that lies in between.
While this might not be relevant for this very simple asset route, it will become relevant if we make the routes more complex or when you combine it in a filter pipeline that filters on duration afterwards.
Now let’s add the origin to the route. The order of appearance in the route is also the order in which the asset should visit the locations. So by specifying Brussels before Leuven, we make Brussels the origin and Leuven the destination.
ASSET_ROUTE(
INSIDE(LOC(), AREA(name="Brussels")),
INSIDE(LOC(), AREA(name="Leuven"))
)
This filter has a problem. It will only retain asset 2 and not asset 1.
While asset 1 travels from Brussels to Leuven, it also has recorded records between Brussels and Leuven. And our filter doesn’t specify that the asset can have records between Brussels and Leuven.
As we want to select all assets that travel from Brussels to Leuven, and have no requirements about what the asset does between Brussels and Leuven, we will use the ANY
function:
ASSET_ROUTE(
INSIDE(LOC(), AREA(name="Brussels")),
ANY(mandatory=false),
INSIDE(LOC(), AREA(name="Leuven"))
)
This will have the desired result:
Note how we used ANY(mandatory=false)
.
This specifies that the asset is allowed to have records between Brussels and Leuven, but doesn’t need to.
If we had used ANY(mandatory=true)
, asset 2 would have been refused by the filter.
The asset route filter can combine the location filtering with additional property or time based filtering.
Let’s start from the following dataset with a single asset:
We start with an asset route filter that filters assets that leave Brussels, and selects the part starting in Brussels until the end.
ASSET_ROUTE(
INSIDE(LOC(), AREA(name="Brussels")),
NOT(INSIDE(LOC(), AREA(name="Brussels")))
)
This retains the following part of the asset:
For the first part of the route, we’ll now combine the INSIDE
filter with a restriction on the speed.
We are only interested in the asset part in Brussels where the asset travels at a speed below 50 km/h:
ASSET_ROUTE(
AND(
INSIDE(LOC(), AREA(name="Brussels")),
BETWEEN(PROP("speed"), 0, 50)
),
NOT(INSIDE(LOC(), AREA(name="Brussels")))
)
This selects the following part of the asset:
Since we only retain continuous parts of the asset, the first point inside the Brussels area is not retained. That point alone is not a match for the whole asset route as the asset route requires at least 2 points (one inside Brussels with the correct speed and one outside Brussels).
Note how the result is also different from creating a pipeline where you would first do the asset route filtering without property filters, and then use a record filtering step.
If you create such a filter pipeline (where the pipe character |
is used to append the record filtering step after the asset route filtering):
ASSET_ROUTE(
INSIDE(LOC(), AREA(name="Brussels")),
NOT(INSIDE(LOC(), AREA(name="Brussels")))
) |
BETWEEN(PROP("speed"), 0, 50)
the asset route filter would retain the following records
after which the individual records are filtered against the speed criterion and only the following records are retained
There is a separate article with example asset route filters.
The asset route filter is greedy, meaning it will select the longest matching continuous parts of the asset.
For example, if you have an asset that:
Starts in Leuven
Travels to Brussels
Leaves Brussels again to travel to other places
Finishes in Brussels
and you use an asset route filter
ASSET_ROUTE(
INSIDE(LOC(), AREA(name="Leuven")),
ANY(),
INSIDE(LOC(), AREA(name="Brussels"))
)
the whole asset trajectory will be selected, and not just the first Leuven to Brussels part.
If in the above example you only want to select the first Leuven to Brussels part, you would have to use the following filter:
ASSET_ROUTE(
INSIDE(LOC(), AREA(name="Leuven")),
NOT(INSIDE(LOC(), AREA(name="Brussels"))),
INSIDE(LOC(), AREA(name="Brussels"))
There are some edge-cases where the asset route filtering might not return the results you expect. To correctly interpret those results, you will need some extra insight and understanding in how the filter evaluation works.
Let us start from the following data:
and an asset route filter that filters on both the speed and the areas:
ASSET_ROUTE(
AND(
INSIDE(LOC(), AREA(name="Brussels")),
BETWEEN(PROP("speed"), 0, 50)
),
ANY(mandatory=true),
INSIDE(LOC(), AREA(name="Leuven"))
)
The first internal step when evaluating that asset route filter is to extract all properties, times and areas which are used in the asset route. In this case this would be the areas Brussels and Leuven as well as the speed property.
For each record, we then check what the result of each individual filter is and reduce the list of records to a list of states. Only when the result of one of the filters changes, we add a new state to the list. This allows us to only look at state transitions which significantly improves the performance of evaluating filters.
In the above example, we will have a state transition in the list each time:
The speed of the asset crosses the 50 km/h threshold
The asset enters or leaves the Brussels area
The asset enters or leaves the Leuven area
In the next step, we search for all the longest matches of the asset route described in the filter.
A consequence of this is that in some cases assets which could be a match aren’t found. The following is an example of such a case.
Let’s use an asset of which all records are contained in the Brussels area
This asset wouldn’t be a match for the following filter:
ASSET_ROUTE(
INSIDE(LOC(), AREA(name="Brussels")),
ANY(mandatory=true)
)
As the filter only contains a filter on the Brussels area and all records are located in that area, the list of state transitions will only contain a single state. However, the asset route describes a route consisting out of 2 states, hence it won’t match.