2.2.1 Dataset construction and covariates used in our model
The association between exogenous time-varying covariates and the risk of an event can be studied with the Cox proportional hazards model extended to time-varying covariates. To fit this model, we use the counting process formulation, in which each company's follow-up is split into intervals delimited by the time points (Age) at which its covariates were recorded.
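For reference, this set-up can be written formally as below. The notation is ours, not the source's: company $i$ has records $j = 1, \dots, n_i$ covering the intervals $(t_{i,j-1}, t_{ij}]$, its covariate vector $\mathbf{x}_{ij}$ is held constant within each interval, and $\delta_{ij} = 1$ if a default occurs in that interval.

```latex
\begin{aligned}
h_i(t) &= h_0(t)\,\exp\!\big(\boldsymbol{\beta}^{\top}\mathbf{x}_i(t)\big),
\qquad \mathbf{x}_i(t) = \mathbf{x}_{ij} \ \text{ for } t \in (t_{i,j-1},\, t_{ij}], \\
L(\boldsymbol{\beta}) &= \prod_{i}\prod_{j=1}^{n_i}
\left[\frac{\exp\!\big(\boldsymbol{\beta}^{\top}\mathbf{x}_{ij}\big)}
{\sum_{(k,l)\in R(t_{ij})}\exp\!\big(\boldsymbol{\beta}^{\top}\mathbf{x}_{kl}\big)}\right]^{\delta_{ij}},
\qquad R(t) = \{(k,l) : t_{k,l-1} < t \le t_{kl}\}.
\end{aligned}
```

The baseline hazard $h_0(t)$ cancels from each factor of the partial likelihood, and each record belongs to the risk set $R(t)$ only during its own interval, which is what allows a company's covariates to change from one Age to the next.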
The dataset for the time-varying covariate model is constructed by grouping the records by a company identifier and sorting them chronologically within each company by a time variable (Age).
The ordered Age values of each company then define a sequence of time intervals: each observation is assigned a start time and an end time according to its position in this chronological sequence.
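The grouping and interval construction described above can be sketched in plain Python as follows. This is a minimal sketch under assumptions that are ours, not the source's: records are dictionaries with hypothetical keys `company_id`, `Age`, `leverage` and `Default`; the covariates recorded at a given Age are taken to hold until the company's next recorded Age; and, because observations are annual, the final interval of each company is assumed to end one year after its last recorded Age.

```python
from itertools import groupby
from operator import itemgetter


def to_counting_process(records, id_key="company_id", time_key="Age"):
    """Turn one-row-per-(company, Age) records into (start, stop] intervals.

    Hypothetical assumptions: covariates recorded at a given Age apply until the
    company's next recorded Age, and the last interval ends one year later.
    """
    # Sort by company, then chronologically by Age within each company.
    rows = sorted(records, key=itemgetter(id_key, time_key))
    out = []
    for _, group in groupby(rows, key=itemgetter(id_key)):
        group = list(group)
        for j, row in enumerate(group):
            start = row[time_key]
            # End at the next recorded Age, or one year later for the last record.
            stop = group[j + 1][time_key] if j + 1 < len(group) else start + 1
            out.append({**row, "start": start, "stop": stop})
    return out


# Toy usage with invented data (hypothetical covariate "leverage").
records = [
    {"company_id": "B", "Age": 6, "leverage": 0.7, "Default": 1},
    {"company_id": "A", "Age": 2, "leverage": 0.5, "Default": 0},
    {"company_id": "A", "Age": 1, "leverage": 0.4, "Default": 0},
    {"company_id": "B", "Age": 5, "leverage": 0.6, "Default": 0},
]
for r in to_counting_process(records):
    print(r["company_id"], r["start"], r["stop"], r["Default"])
# A 1 2 0
# A 2 3 0
# B 5 6 0
# B 6 7 1
```

With this convention a missing year simply produces a longer interval, so the last recorded covariate values are carried forward across the gap; whether that is acceptable depends on the data.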
Each record also carries the covariates used in the analysis, such as financial ratios, economic indicators and other relevant factors. The time intervals, the covariates and an event indicator (Default), which flags whether a default occurred during the interval, together form a time-dependent survival object.
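Before fitting, it can help to check that the (start, stop] records form a valid counting process dataset. The sketch below assumes, as an illustration rather than as a description of the authors' procedure, that default is a terminal event, so each company can default at most once and only in its final interval; the key names (`company_id`, `start`, `stop`, `Default`) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def check_counting_process(rows, id_key="company_id", event_key="Default"):
    """Return a list of problems found in (start, stop] records; empty if none.

    Hypothetical assumption: default is terminal, so it may only occur in a
    company's final interval.
    """
    problems = []
    rows = sorted(rows, key=itemgetter(id_key, "start"))
    for cid, group in groupby(rows, key=itemgetter(id_key)):
        group = list(group)
        for j, row in enumerate(group):
            # Each interval must have positive length.
            if not row["start"] < row["stop"]:
                problems.append(f"{cid}: empty or reversed interval at start={row['start']}")
            # Consecutive intervals of the same company must not overlap.
            if j + 1 < len(group) and group[j + 1]["start"] < row["stop"]:
                problems.append(f"{cid}: overlapping intervals at stop={row['stop']}")
            # A terminal event cannot be followed by further intervals.
            if row[event_key] == 1 and j + 1 < len(group):
                problems.append(f"{cid}: event before final interval at stop={row['stop']}")
    return problems


# Toy usage: company A defaults in its first interval but is still observed later.
bad = [
    {"company_id": "A", "start": 1, "stop": 2, "Default": 1},
    {"company_id": "A", "start": 2, "stop": 3, "Default": 0},
]
print(check_counting_process(bad))  # ['A: event before final interval at stop=2']
```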
This survival object is used to fit a Cox proportional hazards model with time-dependent covariates. Because the covariate values are updated at each recorded Age, the model accounts for the evolution of each company's characteristics over time and for how this evolution relates to the risk of default.
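To make the estimation step concrete, the sketch below maximises the Breslow log partial likelihood for a single time-varying covariate by Newton-Raphson, forming the risk set at each default time t from every record with start < t <= stop. It is a hypothetical, self-contained illustration (the covariate `x` and the toy records are invented, and only one covariate is handled); in practice one would use an established implementation such as R's `survival::coxph` with a `Surv(start, stop, event)` response, or `CoxTimeVaryingFitter` from the Python package lifelines.

```python
import math


def cox_partial_loglik(beta, rows):
    """Breslow log partial likelihood, gradient and Hessian for one covariate x.

    rows: dicts with 'start', 'stop', 'x', 'event'. The risk set at an event
    time t contains every record whose interval satisfies start < t <= stop.
    """
    ll = grad = hess = 0.0
    for r in rows:
        if r["event"] != 1:
            continue
        t = r["stop"]
        risk = [q["x"] for q in rows if q["start"] < t <= q["stop"]]
        w = [math.exp(beta * x) for x in risk]
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, risk))
        s2 = sum(wi * x * x for wi, x in zip(w, risk))
        ll += beta * r["x"] - math.log(s0)
        grad += r["x"] - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2  # minus the weighted variance of x
    return ll, grad, hess


def fit_cox_1d(rows, tol=1e-10, max_iter=50):
    """Newton-Raphson on the concave log partial likelihood, starting at 0."""
    beta = 0.0
    for _ in range(max_iter):
        _, g, h = cox_partial_loglik(beta, rows)
        step = g / h
        beta -= step
        if abs(step) < tol:
            break
    return beta


# Toy counting process records for four hypothetical companies (A, B, C, D).
example_rows = [
    {"start": 0, "stop": 1, "x": 0.2, "event": 0},  # A
    {"start": 1, "stop": 2, "x": 0.9, "event": 1},  # A defaults at t = 2
    {"start": 0, "stop": 1, "x": 0.5, "event": 0},  # B
    {"start": 1, "stop": 2, "x": 0.3, "event": 0},  # B
    {"start": 2, "stop": 3, "x": 0.4, "event": 1},  # B defaults at t = 3
    {"start": 0, "stop": 1, "x": 0.8, "event": 1},  # C defaults at t = 1
    {"start": 0, "stop": 3, "x": 1.0, "event": 0},  # D censored at t = 3
]
beta_hat = fit_cox_1d(example_rows)
print("beta_hat:", beta_hat, "hazard ratio per unit of x:", math.exp(beta_hat))
```

Newton-Raphson is used here only because the log partial likelihood is concave and the example has a single parameter; the estimate exp(beta_hat) is read as the multiplicative change in the hazard per unit increase in the covariate.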
By incorporating time-varying covariates, the model captures changes in the explanatory variables over time and their effect on the hazard of default, which can give a more accurate assessment of risk and prediction of default than a model that fixes each covariate at a single baseline value.
Our model is estimated on historical data from Scientific Infra & Private Asset’s internal database, which contains more than 8,000 annual observations of default/survival events and explanatory variables recorded between 1988 and 2023.