Step 0: Obtain data

This tutorial uses the US Accidents data set, which covers 2016 to 2023. Each record contains the location and timestamp of an accident, along with its severity and the weather conditions at the time.

The data comes from the US Accidents data set on Kaggle and is described in the following publications:

  • Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. "A Countrywide Traffic Accident Dataset." 2019.

  • Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights." In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.

The file on the Kaggle website contains 46 properties. For convenience, we have reduced the number of properties for this tutorial; you can download the prepared file here.

While the data downloads, you can continue with the next part, where you will create the new data set.
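
Once the download has finished, a quick sanity check can confirm that the file arrived intact and show which properties it contains. The following is a minimal sketch that assumes the prepared file is a UTF-8 CSV with a header row; the file name us_accidents_prepared.csv and the helper summarize_csv are hypothetical and not part of the tutorial, so adjust them to match your download.

```python
import csv
from pathlib import Path


def summarize_csv(path, sample_rows=3):
    """Return the column names, a few sample rows, and the number of data rows."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        sample = []
        row_count = 0
        # Stream the file row by row so a large download does not have to fit in memory.
        for row in reader:
            if row_count < sample_rows:
                sample.append(row)
            row_count += 1
    return columns, sample, row_count


if __name__ == "__main__":
    # Hypothetical file name (assumption); replace it with the path of your download.
    csv_path = Path("us_accidents_prepared.csv")
    if csv_path.exists():
        columns, sample, row_count = summarize_csv(csv_path)
        print(f"{len(columns)} columns: {columns}")
        print(f"{row_count} data rows")
        for row in sample:
            print(row)
    else:
        print(f"{csv_path} not found; adjust the path first.")
```

If the printed column list is empty or the row count is zero, the download was probably interrupted and should be repeated before you upload the file in a later step.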

Next part

Go to the next part: Step 1: Create data set