Multinomial logistic regression for prediction of vulnerable road users risk injuries based on spatial and temporal assessment

Rapid urban growth often leads to adverse effects such as traffic congestion and increased accident risk due to the expansion of transportation systems. In the framework of smart cities, active modes are expected to be promoted to improve living conditions. To achieve this goal, it is necessary to reduce the number of injuries among vulnerable road users (VRUs). Considering injury severity levels from crashes involving VRUs, this article seeks spatial and temporal patterns between cities and presents a model to predict the likelihood of VRUs being involved in a crash. Kernel Density Estimation was applied to identify blackspots based on injury severity levels. A Multinomial Logistic Regression model was developed to identify statistically significant variables to predict the occurrence of these crashes. Results show that target spatial and temporal variables influence the number and severity of crashes involving VRUs. This approach can help to enhance road safety policies.
ARTICLE HISTORY: Received 15 February 2019; Revised 16 May 2019; Accepted 15 July 2019


Introduction
Road crashes have a significant impact on society, representing approximately 3% of the gross domestic product. Worldwide, 1.35 million people lose their lives every year in road crashes and between 20 and 50 million more suffer nonfatal injuries (WHO, 2018). In 2015, 21% of the fatalities on the European Union's roads were pedestrians, while 8% were cyclists (European Commission, 2016). In Portugal, in 2017, there were 5096 pedestrians and 1918 cyclists injured, representing 20 and 5% of the road fatalities, respectively (ANSR, 2017).
The strategic target for EU road safety for the period 2011-2020 is to reduce the number of road deaths by half (European Commission, 2011). Vulnerable road users (VRUs) such as pedestrians and cyclists suffer severe consequences in collisions since they are unprotected (European Commission, 2018). To reduce the number of VRU crashes or, at least, the severity of injuries, infrastructure redesign can be complemented by mobility solutions developed in the framework of smart cities (such as information tools that improve safety by, for instance, identifying areas prone to risk).
It has been found that cities that invest in active modes, such as walking and cycling, reduce traffic congestion, which consequently makes travel times more reliable, reduces delays, crashes and transportation costs, and increases access to city facilities and services (Alliance for Biking & Walking, 2014). Ensuring VRU road safety is also a way of promoting active transport modes, with positive health, environmental and economic effects.
Several studies have investigated spatial patterns (Dereli & Erdogan, 2017; Jia, Khadka, & Kim, 2018; Mohaymany, Shahri, & Mirbagheri, 2013; Soltani & Askari, 2017; Van Raemdonck & Macharis, 2014; K. Xie, Ozbay, & Yang, 2019), temporal patterns (Weast, 2018), or a combination of both (Bao, Liu, & Ukkusuri, 2019; Liu & Sharma, 2018; Ma, Chen, & Chen, 2017) in the analysis of road crashes. The recognition of hazardous areas, often called blackspots, is the initial step in traffic safety analysis. A crash blackspot can be theoretically defined as any location that has a significantly higher crash frequency than other areas (Van Raemdonck & Macharis, 2014). The spatial and temporal analysis of road crashes can also take into account the severity of the crash or the injury severity level. It has been reported in the literature that considering the level of injury severity helps to avoid potential statistical problems and can change the perception of what constitutes a dangerous road zone (Mannering & Bhat, 2014). Additionally, risk areas for crashes involving motor vehicles may present different characteristics when compared with the areas of highest risk for VRUs (Wang, Huang, & Zeng, 2017). Recently, the identification of spatial and temporal patterns among road crashes involving VRUs has become an active research topic (Chimba, Musinguzi, & Kidando, 2018; Dozza, 2017; Loidl, Traun, & Wallentin, 2016; Lu, Mondschein, Buehler, & Hankey, 2018; Wang et al., 2017). This may be because the number of VRU fatalities and serious injuries has been growing, representing a challenge for both research and policymaking (Tiwari, 2018).
Regarding pedestrian crash patterns, one study showed that the probability of severe injuries grows for older pedestrians, for males, in rural areas, in low-speed zones and under poor lighting (Senserrick, Boufous, de Rome, Ivers, & Stevenson, 2014). On the other hand, shopping and residential areas and pedestrian density are related to a reduction in pedestrian injury severity (Prato, Kaplan, Patrier, & Rasmussen, 2018). Regarding cyclists, studies revealed that the density of urban roads and signalized intersections increases crash risk (Guo, Osama, & Sayed, 2018). Likewise, the presence of retail or service establishments, tourist attractions and environmental factors (e.g. time information, pavement condition and weather) increases the risk of vehicle-bicycle collisions (Prati, Pietrantoni, & Fraboni, 2017). In addition, posted speed limit and older age of the cyclists are related to an increase in injury severity (Chen & Shen, 2016). Temporal correlations of crash reports revealed that pedestrian fatalities occur especially in holiday periods and in November and December, while most cyclist fatalities occur in summer or early fall (Weast, 2018).
In the literature, predictive models have been developed for estimating the likelihood of VRUs being involved in a crash. Logistic regression models were developed to analyse the significance of contributing factors of VRU crashes (Damsere-Derry, Palk, & King, 2017; Useche, Montoro, Alonso, & Oviedo-Trespalacios, 2018; Yuan & Chen, 2017). Yuan and Chen (2017) revealed that night-time, road intersections, older age and high vehicle speed increase crash severity among VRUs. A prediction model based on a series of intersection crash models for total, severe, pedestrian and cyclist crashes showed that macro-variables are significant for a rigorous crash analysis (Lee, Abdel-Aty, & Cai, 2017). Multinomial logistic regression (MLR) models have also been developed and showed the effectiveness of the MLR approach in crash severity modelling (Abdulhafedh, 2017).
The research contribution of this article is to perform a spatial and temporal analysis of crashes involving VRUs considering the severity of injuries, in order to establish patterns between cities with different specificities. A crash prediction model is also developed to identify the risk factors that can influence the injury severity of a VRU involved in a motor vehicle crash. A database of pedestrian and bicycle crashes was evaluated, comparing cities with different population densities and areas. The predictive model and spatial analysis are macro-level based, and the blackspots are built on the density and severity of injuries. This work is built on three steps:
1. To evaluate and perform spatial mapping of blackspots using geographic information system (GIS) techniques and statistical data analysis procedures on the study areas (taking into account the level of injury severity);
2. To perform a temporal analysis using spider plots, which are often used to reflect the trend of influencing variables and to compare multidimensional patterns;
3. To formulate a crash prediction model based on Multinomial Logistic Regression to describe the probability of a crash involving a motor vehicle and a VRU. This is important not only to predict crash occurrences in specific blackspots but also to infer the severity of crashes involving VRUs.
This work intends to be a baseline study supported by a thorough analysis that can be used by policy and decision makers and road safety managers in order to recognize blackspots and address the most relevant variables that influence VRUs safety.

Methodology
This section first describes the methods applied for blackspot identification and for the development of the predictive models. Afterward, the case studies and the development of the crash database are described. The conceptual framework is presented in Figure 1.

Blackspots identification
To highlight areas prone to road crashes involving VRUs, the geostatistical Kernel Density Estimation (KDE) approach was applied to obtain patterns based on the level of VRU injury severity, using the ArcGIS® software (ESRI - Environmental Systems Research Institute, 2015).
KDE is one of the most commonly used methods and has been shown to outperform other popular methods for spatial analysis of crashes and blackspot identification (Yu, Liu, Chen, & Wang, 2014). In this technique, each observation is covered by a kernel, yielding a circular neighbourhood with a maximum value at a reference point, decreasing to zero at a distance equal to the radius (r) from it.
In this study, we used the quartic kernel function (default in ArcGIS), which is one of the most commonly used functions. The density estimation using such a function can be given as

$$f(x, y) = \frac{1}{m r^2} \sum_{i=1}^{m} k\left(\frac{d_i}{r}\right) \qquad (1)$$

where $f(x, y)$ is the density estimation for location $(x, y)$, $m$ is the number of observations and $k$ is the kernel function defined as

$$k\left(\frac{d_i}{r}\right) = \begin{cases} K\left(1 - \dfrac{d_i^2}{r^2}\right)^2, & 0 \le d_i \le r \\ 0, & d_i > r \end{cases}$$

with $d_i$ being the distance from location $(x, y)$ to the $i$-th observation location and $K$ a real coefficient. The choice of bandwidth $r$ controls the smoothness of the estimated density (Z. Xie & Yan, 2008). Considering the level of detail of the data and the area of each city under study, different radii (bandwidths) were empirically examined, and smaller radii proved reasonably suitable for obtaining an unsmoothed density distribution that preserves smaller-scale details. In this study, the radius was set to approximately 175 m.
In order to embed the injury severity level in the spatial data structure, a specific weight on each VRU injured observation was considered based on the severity index developed by Elvik (2008). This approach establishes different weights for the different severity levels: one for light injuries, three for serious injuries and five for fatalities.
KDE returned nine levels of risk, displayed in a range of grey shades from nearly white (Level 9: low-risk area) to black (Level 1: high-risk area), and Kernel density surfaces were derived for total injuries for each city.
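The severity-weighted density described above can be illustrated with a short Python sketch. This is not the ArcGIS implementation used in the study: the function names, the input layout (projected coordinates in metres), the 3/π kernel coefficient and the division by the number of observations are assumptions made for illustration only. The weights and the 175 m radius follow the text.

```python
import math

# Severity weights as described in the text (light = 1, serious = 3, fatal = 5).
SEVERITY_WEIGHTS = {"light": 1.0, "serious": 3.0, "fatal": 5.0}


def quartic_kernel(distance, radius):
    """Quartic kernel: maximal at distance 0, zero at or beyond the radius."""
    if distance >= radius:
        return 0.0
    u = distance / radius
    # The 3/pi coefficient is an assumption of this sketch.
    return (3.0 / math.pi) * (1.0 - u * u) ** 2


def weighted_density(x, y, crashes, radius=175.0):
    """Severity-weighted kernel density at location (x, y).

    `crashes` is a list of (x_i, y_i, severity) tuples; this layout is
    hypothetical and coordinates are assumed to be in metres.
    """
    if not crashes:
        return 0.0
    total = 0.0
    for cx, cy, severity in crashes:
        d = math.hypot(x - cx, y - cy)
        total += SEVERITY_WEIGHTS[severity] * quartic_kernel(d, radius)
    # Normalisation by m * r^2 is an assumption of this sketch.
    return total / (len(crashes) * radius ** 2)
```

Under this weighting, a single fatality contributes five times the density of a light injury at the same location, so a cluster of few but severe crashes can rank as high as a cluster of many light ones.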

Predictive modelling
MLR is a predictive analysis that is used to describe data and to explain the relationship between the nominal dependent variable and one or more independent variables. In this article, the response variable is categorical: it has two classes related to VRUs: pedestrian or cyclist. The set of predictor variables includes VRUs' gender and age, level of injury severity, weekday, time period and weather conditions. By using MLR, one can determine the strength of influence that a particular independent variable has upon the type of VRU involved in a crash. We assume that the pedestrian is the reference group since it is the class with more injuries. The statistical software SPSS was used (IBM Corp., 2016).
Considering an MLR model in which the response variable consists of $k \geq 2$ categories, the probability that a given observation $x$ belongs to one of the groups $y_i$ can be estimated by

$$P(y = y_i \mid x) = \frac{\exp(\beta_i^{\top} x)}{1 + \sum_{j=1}^{k-1} \exp(\beta_j^{\top} x)}$$

where $i = 1, 2, \ldots, k - 1$, $x$ collects the independent variables $x_1, x_2, \ldots$ of the data set, and $\beta_i$ represents the estimated model parameters for group $y_i$. In particular, the MLR was performed considering a 95% confidence interval. Estimation of the parameters of these models was conducted using maximum likelihood procedures. The well-known deviance and Pearson chi-square tests were used as goodness-of-fit statistics to evaluate the model fit.
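As an illustration of how such a model turns coefficients into probabilities, the following Python sketch evaluates a multinomial logit with a reference category. The function name and input layout are hypothetical, and the study itself used SPSS rather than code like this.

```python
import math


def mlr_probabilities(x, coefficients):
    """Class probabilities for a multinomial logit with a reference category.

    `x` is a list of predictor values (a leading 1.0 acts as the intercept).
    `coefficients` holds one coefficient list per non-reference category.
    Returns [P(reference), P(category 1), ..., P(category k-1)].
    """
    # Linear predictor for each non-reference category.
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in coefficients]
    # The reference category has a score of zero, hence the leading 1.0.
    denom = 1.0 + sum(math.exp(s) for s in scores)
    return [1.0 / denom] + [math.exp(s) / denom for s in scores]
```

With two categories (pedestrian as reference, cyclist as the modelled class), `coefficients` contains a single list and the expression reduces to ordinary binary logistic regression.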
The pseudo-R² statistic was used as a measure of how well the model can predict the dependent variable based on the independent variables. This measure was computed with the Cox and Snell, Nagelkerke and McFadden methods, which are the ones most often available in statistical software. Finally, a likelihood ratio test was performed to evaluate the effect of each of the parameters, providing the weight of each independent variable in the prediction model.
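For reference, the three pseudo-R² measures named above can be computed from the log-likelihoods of the intercept-only (null) model and the fitted model. The sketch below uses their standard textbook definitions; the function and argument names are hypothetical, and no values from the study are reproduced.

```python
import math


def pseudo_r2(ll_null, ll_model, n):
    """McFadden, Cox and Snell, and Nagelkerke pseudo-R² statistics.

    `ll_null` is the log-likelihood of the intercept-only model,
    `ll_model` the log-likelihood of the fitted model, and `n` the
    number of observations.
    """
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox and Snell so that its maximum is 1.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return {
        "mcfadden": mcfadden,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }
```

Because the Cox and Snell statistic cannot reach 1, the Nagelkerke value is never smaller than it for the same model, which is consistent with Nagelkerke's R² typically being the largest of the reported measures.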

Development of a crash database
Crash data involving VRUs from three Portuguese cities (Aveiro, Porto and Lisbon) with different areas and sociodemographic characteristics (Table 1) were analyzed and compiled into a crash database. A total of 4439 crash records involving VRUs from 2012 to 2015 were provided by ANSR (Portuguese Authority of Road Safety). About 4615 VRUs were injured in these crashes. About 87% of the injuries are related to motor vehicle-pedestrian crashes and 13% to motor vehicle-cyclist crashes. In all cities, light injuries represent between 90 and 97% of total injuries. Table 2 describes the distribution of the number of injuries per 10,000 inhabitants and per square kilometre for each city. The analysis focused on the VRU injury severity level, which is subdivided into three classes: light injuries, serious injuries and fatalities. In order to have a representative sample with common characteristics, records with missing information or uninjured VRUs were removed from the dataset. According to ANSR, a crash victim is considered seriously injured if at least 24 h of hospitalization is needed; fatalities are victims who die within 30 days of the crash. After 30 days, crash victims are not considered in the statistical analysis of the authority (ANSR, 2017).
The main attributes considered in the forthcoming analyses are: VRU age and injury severity level; Temporal variables: year, month, weekday, time of the day; Weather conditions: good, bad (adverse weather conditions, e.g. rain, fog, snow).
A more specific analysis focuses on the most severe consequences (serious injuries and fatalities) in order to identify patterns between them. This analysis considered the attributes mentioned above and looked in detail at the following ones: VRU gender (male, female) and presence of the most vulnerable age groups (14 or younger and 65 or older); Type of road location (segment, intersection and others, e.g. parking areas, open land, private road); Built environment (area characterization: shopping, touristic, educational, health, residential, industrial, services and agriculture).

Results
This section presents the results of the spatial and temporal analysis, in an attempt to discover spatial and temporal patterns among the severely injured and fatal VRUs. Finally, results of the multinomial logistic models are discussed.

Spatial analysis
Crashes involving VRUs were georeferenced, and an injury attribute was used to generate spatial maps. Figure 2 illustrates the geographic distribution of crashes resulting in VRUs injuries by the level of severity, highlighting blackspots on each city. KDE was applied to analyze the spatial distribution of motor vehicle VRU crashes.
In Aveiro, the main blackspot is inside the city centre, including a shopping area and one of the main inner-city connection roads. Porto blackspots were identified in urban and historical centres, in places with a high number of tourist attractions, churches, stores and a train station. Lisbon blackspots are mostly in urban and historical centres, close to tourist attractions, but also around governmental institutions. Additionally, at a second level, there is a blackspot covering a train station and a hospital. Results suggest blackspots in areas that attract many people. Although this might be expected, it is a noteworthy finding, since in these specific areas vehicle speeds should be low (speed limit of 50 km/h or less).

Annual evolution
Considering the annual evolution, Aveiro is the only city presenting a decrease (2%) in pedestrian injuries. Regarding cyclists, Aveiro presents an annual growth rate of 11%. Porto presents an annual growth rate of 2% in pedestrian injuries and 15% in cyclist injuries. The annual growth rate in Lisbon is 4% for pedestrian injuries and 15% for cyclist injuries. A closer look at the number of pedestrian injuries shows Lisbon with an increase over the years and Aveiro and Porto with a decrease in 2015. Considering injured cyclists, the proportion of this class is highest in Aveiro, followed by Porto and Lisbon. The decrease in injured cyclists in 2015 can be explained by the implementation of new legislation in 2014, under which the driver of a motor vehicle is required to leave a minimum lateral distance of 1.5 m between the vehicle and the bicycle.

Monthly, weekly and hourly distributions
Figure 3 shows the monthly evolution of injured VRUs for the cities and years under study, considering the level of injury severity (light injuries, serious injuries and fatalities). Aveiro has the most evident fluctuation during October and November. August, which is typically a vacation month, presented a lower number of injuries, particularly for Porto and Lisbon. In Porto, VRU injuries peak in September and then decrease until November. In Lisbon, the highest numbers of VRU injuries occur in December and October. Considering a global analysis of the three cities, November, October and September are the months with the highest number of VRU injuries, with around 1.5 injuries per 1000 inhabitants. This can be explained by the uncertain weather conditions between October and November (a finding also reported by Weast, 2018), while September is the month in which work and school activities restart, which can change traffic movement patterns.
Table 3 presents the distribution of VRU injuries by weekday. In Aveiro, 20% of VRU injuries occur on Thursdays and 18% during the weekend. In Porto, weekends still account for a lower percentage of injuries, while the riskiest day seems to be Friday with 18% of injuries. In Lisbon, crashes involving VRU injuries occur mostly on Thursdays (18%) and Fridays (17%), and weekends represent 17% of injuries. An overview of the three cities allows us to conclude that Thursday and Friday are the weekdays with the most injuries (35% of the total; 2.7 and 2.5 injuries per 1000 inhabitants, respectively).
Figure 4 shows the distribution of vehicle-pedestrian and vehicle-cyclist crashes for different hours during weekdays. Spider plots for Aveiro suggest that most vehicle-pedestrian crashes occur on Mondays, with peak hours at 8 am and 5 pm, while for cyclists the most critical times are Thursdays at 1 pm and Mondays at 8 am. For pedestrians, peak hours for Porto seem to be 6 pm on Mondays, 5 pm and 7 pm on Fridays, and 9 am and 6 pm on Wednesdays. Regarding cyclists, peaks are clearly at 7 pm on Thursdays, and there is also a peak at 4 pm on Wednesdays. In Lisbon, Thursdays have peak hours in terms of crashes between 8 am and 9 am and from 5 pm to 6 pm, while on Wednesdays peaks are at 9 am and 11 am and between 5 pm and 6 pm. For cyclists, Thursday at 9 am and 7 pm, Saturday at 11 am, Tuesday at 4 pm and Monday at 6 pm represent the most critical time periods. Results for the three cities revealed that the worst hours for pedestrian injuries are between 5 pm and 7 pm, and for cyclists at 1 pm and, as for pedestrians, from 5 pm to 7 pm. This is explained by VRUs' daily routines and by peak hours with higher traffic volumes.

Hourly distribution by age group
Figure 5 presents the distribution of injuries according to different age groups. Specifically, in Aveiro, the school-age group of pedestrians (<15 years old) reveals a peak in crashes at 8 am, and most crashes occur after 4 pm for pedestrians between 18 and 49 years of age. For pedestrians aged 65+, 8 am and 10-11 am are critical periods. Cyclists between 25 and 49 years are involved in a higher number of crashes at 10 am, 1 pm and 7 pm. For older cyclists, 11 am is a critical hour. Patterns for crashes involving pedestrians can be seen at 8 am for school age, between 5 pm and 8 pm for the 18-24 and 25-49 age groups, and during the morning for 65+. These results may be partly due to the existence of schools, a university and a hospital close to the city centre, which generate many daily trips. Compared with Aveiro, distinct patterns can be identified in Porto. Pedestrians aged 65+ presented the highest number of injuries and have a first high-risk time at 9 am, as does the working-age group (25-64). However, other critical times for older pedestrians are 11 am and 6 pm, while the working-age group has a peak between 5 pm and 7 pm. Many vehicle-cyclist crashes involve the working-age group, with peaks between 9 am and 11 am, 3 pm and 4 pm, and also at 7 pm. These findings suggest that off-peak traffic hours are also important in crashes involving older pedestrians, since their daily routines are not restricted to working hours. In Lisbon, most of the crashes involve pedestrians in the working-age and 65+ groups, with morning peaks between 8 am and 9 am, and 8 am and 11 am, respectively. During the afternoon, the working-age group has peaks in terms of crashes around lunch time and between 4 pm and 8 pm, and older pedestrians present a peak at 5 pm. For cyclists between 25 and 49 years of age, there are more crashes at 8 am and 12 am, as well as between 4 pm and 7 pm, which can be explained by the existence of schools, general services and offices in the city centre.

Analysis of severely injured and fatal occurrences
A closer look at severely injured and fatal occurrences is given, considering the importance of finding patterns among the most severe consequences.
Aveiro presents the highest proportion of serious injuries and fatalities (11% of pedestrian injuries and 9% of cyclist injuries). For Porto and Lisbon, these percentages are 3% and 7% of pedestrian injuries, respectively. Regarding cyclist injuries, Porto did not present any serious or fatal crash, and Lisbon presented a percentage of 5%.
Monthly and weekly distribution
The monthly evolution (Figure 3) revealed that May and February are the months with the highest numbers of serious injuries and fatalities, respectively, for the city of Aveiro. For Porto, February has the highest number of serious injuries, while September, November and December present the most fatalities. In Lisbon, most serious injuries occur in May, and January recorded the highest number of fatalities. As a general overview, January and February seem to be the months with the most fatalities among all cities.
Regarding the distribution of VRU injuries by weekday (Table 3), Aveiro presents the most severe injuries (serious and fatal) on Mondays. For Porto, Wednesday and Thursday present the highest percentage of serious injuries and fatalities (4%). In Lisbon, Sunday presents the highest percentage (9%) of severe injuries. A general overview highlights Thursday and Friday as the most critical weekdays concerning high VRU injury severity levels.

Distribution by gender and vulnerable age groups
Considering VRU gender, results show that in Aveiro and Lisbon almost 80 and 90% of the cyclists, respectively, are male. The share of male pedestrians is more balanced in Porto and Lisbon, representing 54 and 57%, respectively, while in Aveiro more than 70% of the injuries occur with female pedestrians.
Regarding the most vulnerable age groups, results show that 7% of the injured pedestrians in Porto and Lisbon are children, while in Aveiro this percentage rises to 28%. Half of the elderly pedestrians involved in crashes in Porto are severely injured or killed, while in Lisbon this value drops to 38%. Aveiro presented the smallest percentage (17%). Considering cyclists, Aveiro presented a high 43% of elderly cyclists among the severely injured and fatalities, while Lisbon presents 6%. Aveiro does not present any severely injured or dead child cyclist, while Lisbon presents almost 20% of cyclists severely injured or dead.
A closer look at some road specificities shows that more than 40% of the pedestrian-motor vehicle crashes occur in the presence of crosswalks; in particular, Porto presented the worst scenario with almost 60%. Porto presents the highest percentage of crashes occurring in the presence of traffic lights (almost 40%), followed by Lisbon with 25%, while Aveiro presented the smallest percentage (3%). None of the records involving severely injured and dead cyclists describe occurrences close to cycle lanes. However, 8% of the total crashes involving lightly injured cyclists occurred close to these facilities. Figure 8 illustrates the relative proportion of severely injured and dead VRUs for each city, taking into account aspects of the built environment. Regarding the built environment, results show that Aveiro and Lisbon present the highest numbers in residential areas (34% and 40%, respectively). There are also relevant percentages of crashes in agricultural and industrial areas in Aveiro (19% and 16%, respectively); this can be explained by the higher speed limits on roads close to industrial areas and the lack of sidewalks in agricultural areas. On the other hand, Lisbon has a higher exposure of VRUs in touristic (23%) and service (17%) areas. Porto presents a quite different trend, with 28% of the most severe crashes occurring in educational areas, 24% in residential areas and 15% at tourist attractions. Shopping areas are associated with 6-8% of the severely injured and dead VRUs in all cities.

Multivariate model analysis
In this section, MLR models involving vehicle-VRU crashes for each city are presented.
Results suggest appropriate fits and show that the variables added to the models are statistically significant (Sig. < 0.05) and improve the model for each city, i.e. the obtained models significantly predict the response variable (Table 4). Results on the quality of fit (Table 5) reveal that the model fits the data well for Porto and Lisbon, while for Aveiro the two goodness-of-fit measures disagree, yet the Pearson chi-square statistic shows that the model fits the data as well. The pseudo-R² statistics (Table 6) allow the predictive strength of the obtained MLR models to be assessed. The best measure is obtained for Nagelkerke's R²: 37, 39 and 29% for Aveiro, Porto and Lisbon, respectively, meaning that the obtained models are able to explain these percentages of data variability. The multinomial logistic regression model was developed to predict the probability of the VRU being a cyclist. Table 7 presents the coefficients (B) of the significant variables and the reference variable for each model. Considering male gender as the reference category, it can be observed that female gender is one of the variables that reduce the probability of the injured VRU being a cyclist (negative value of coefficient B). The age group between 25 and 49 years old (for the three models) and all the age groups for Porto and Lisbon positively affect the probability of a cyclist being injured in a crash. Similar conclusions can be drawn for the remaining variables (Table 7). Finally, regarding statistics related to the model parameters, VRU gender, age group and weather conditions are statistically significant for all models (Table 8). While VRU gender presents a negative effect, age group and weather conditions present a positive effect.
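The sign of a coefficient B can be read through its odds ratio, exp(B): values below 1 lower the odds of the injured VRU being a cyclist relative to the reference category, and values above 1 raise them. The short Python sketch below shows this conversion. The function names are hypothetical and the coefficient in the usage note is a placeholder, not a value from Table 7.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio exp(B) implied by a logit coefficient B."""
    return math.exp(coefficient)


def effect_direction(coefficient):
    """Describe how a coefficient shifts the odds versus the reference category."""
    ratio = odds_ratio(coefficient)
    if ratio < 1.0:
        return "decreases the odds"
    if ratio > 1.0:
        return "increases the odds"
    return "leaves the odds unchanged"
```

For example, a placeholder coefficient of -1.2 gives an odds ratio of about 0.30, i.e. the odds are roughly 70% lower than for the reference category.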

Conclusion
This article presented a spatial and temporal analysis of crashes involving motor vehicles and VRUs considering the severity of injuries, in an attempt to highlight patterns between cities with different specificities. Moreover, the factors that can influence the injury severity level of a VRU were assessed.
The main findings show that most injuries occur in the surroundings of high-attraction places, such as train stations, shopping areas and tourist attractions, where speed limits are relatively low. Intersections are the road location type with the most seriously injured and killed cyclists, while road segments seem to be more dangerous for pedestrians. More than 40% of the pedestrian crashes occur in the vicinity of crosswalks. A general overview of the built environment shows that the areas with the most impact are residential, educational and touristic zones. In a medium-sized city such as Aveiro, pedestrians from the active-age and female groups are the most vulnerable. However, for Porto and Lisbon, older adults are the most vulnerable both in number of injuries and in severity; in these cities, active-age cyclists are more likely to be involved in a crash. The MLR models developed for each city revealed that VRU gender and age, as well as weather conditions, are statistically significant variables to predict this type of crash.
Despite the findings achieved so far, some limitations should be taken into account and addressed in future research. First, exposure data for pedestrians and cyclists by age and gender could give a better perspective on the results. Second, a micro-level study could be attempted with additional information on vehicle details, road characteristics and driver profiles.
This work intends to be useful for policy and decision makers, as well as road safety managers, in order to recognize blackspots and improve VRUs safety. This is even more important in an era where driverless vehicles are about to be implemented, and the way they will circulate in the urban space and interact with VRUs is of utmost importance.