By now, your data download is hopefully finished, and it is time to upload the taxi data to the platform.
Click the Upload Data button in the top navigation bar.
The platform supports both .csv files and compressed (gzipped) files
If you are dealing with a large CSV file, the upload might take some time. To shorten it, compress the file locally with gzip first and then upload the compressed file.
On Windows, you can use 7-Zip to create gzip compressed files.
On Linux and macOS, you can use the gzip command on the command line.
If you need to upload multiple files and want to compress them, you have to gzip them one by one: a gzip file holds a single compressed file, so each .csv file becomes its own .csv.gz file.
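If you prefer a scripted, cross-platform alternative to 7-Zip or the gzip command, the following Python sketch compresses every .csv file in a folder into its own .csv.gz file using only the standard library. The function names `gzip_file` and `gzip_all_csv` are illustrative helpers, not part of the platform.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path) -> Path:
    """Compress one file to <name>.gz next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        # Stream in chunks so large CSV files never have to fit in memory.
        shutil.copyfileobj(f_in, f_out)
    return dst


def gzip_all_csv(folder: Path) -> list[Path]:
    """Compress every .csv file in `folder` into its own .csv.gz file."""
    return [gzip_file(p) for p in sorted(folder.glob("*.csv"))]
```

For example, `gzip_all_csv(Path("taxi_data"))` (the folder name is hypothetical) returns the paths of the new .csv.gz files, which you can then upload. Unlike the gzip command, which by default replaces each original file with its compressed version, this sketch keeps the original .csv files.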
Since the file downloaded at the start of this tutorial is already compressed, you can just upload it directly.
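If you are unsure whether a file is already compressed (for example, because it was renamed), you can check its first two bytes: every gzip file starts with the magic number 0x1f 0x8b (RFC 1952). A minimal sketch, using a hypothetical helper named `is_gzipped`:

```python
from pathlib import Path

# Every gzip stream begins with these two bytes (RFC 1952, ID1 and ID2).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path: Path) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        # Files shorter than two bytes read as b"" or one byte and fail the check.
        return f.read(2) == GZIP_MAGIC
```

A file for which `is_gzipped` returns True can be uploaded as-is; otherwise compress it first or upload the plain .csv file.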
Now that you have your .csv files, or the compressed .csv.gz files, you can drag and drop them onto the data upload area.
Alternatively, click the upload area to open a file chooser and select your files there.
You can drag-and-drop multiple files in one go
If you have multiple files to upload, you can select them all and drag and drop them onto the upload area together.
While your files are being uploaded, they will be listed in the Files Being Uploaded area.
Once a file is uploaded, it appears in the Data Files area. There you can see which of your files have already been processed and which are still queued, and you can open the processing log for each file by clicking its Status link.
The Processing Jobs area shows the processing jobs that have run for that data set. For example, if you change the processing settings afterwards, a new job is started and you will see multiple entries here.
A new job is queued until a processing node becomes available. Once processing starts, the Job status is updated.
You can navigate away from this page at any time
Once you have dropped the files you want to upload onto the upload area, you can safely navigate away from this page and continue using the platform while your upload finishes in the background.
The only requirement is that you keep the tab with the platform open.
When working with many or large .csv files, processing can take some time.
Each uploaded file is processed, and intermediate results are persisted to cloud storage along the way.
You can follow the status of the individual files as described above. As long as files have the status Uploaded or Processing, the data set is not ready. In addition, even after all files have been handled, persisting the results takes some time to finish.
To know if your data set is entirely processed, follow these steps:
Navigate to the Data sets page by clicking on Data sets in the navigation bar.
Click on YOUR DATA SETS (…) and scroll to or search for the data set you are looking for.
The status of the data set is indicated in the lower left corner of the card:
Queued means that a processing job is queued for the data set, and the data set is not ready.
Processing means that data is still being processed, and the data set is not ready.
All data processed means that the data set is ready.
Go to the next part: Step 6: Use your data set