2.2.2 Data
We aim to estimate carbon emissions for approximately 800 assets that are part of the infraMetrics Unlisted Infrastructure Universe. We build our models on the most relevant variables that can serve as proxies for assets' carbon emissions. We identify these variables by reviewing the latest literature and research, consulting experts, examining sustainability report disclosures, and running correlation tests against reported emissions.
Based on data availability and model performance (when compared with reported emissions), we build either activity-based models (ABMs) or regression-based models (RBMs). Overall, we use the following types of data:
Reported emissions at the company or asset level: Financial or sustainability reports can include differentiated information on Scope 1, 2, and 3 emissions, or focus on selective (e.g., only Scope 1) or combined (e.g., the sum of Scopes 1 and 2) emissions.
Reports from asset-owning companies on assets' levels of operation and production: These reports usually aggregate emissions from several sources.
Commercial datasets providing asset-class characteristics or model-specific information, for example, geolocation, capacities (e.g., power plant generation capacity), output (such as electricity), or volumes of operation (e.g., air traffic data).
The TICCS classification standard for infrastructure assets and our in-house expertise on the technologies, types of operations, and main sources of emissions of infrastructure classes.
infraMetrics' key financial values at the asset level.
Websites of companies, Wikipedia, and other public information sources when deemed reliable.
Data for Regression-Based Models
The most important data for RBMs are reported emissions at the asset level. We web-scrape or manually extract all publicly available emissions data from the sustainability reports of infrastructure assets across all sectors. In some cases, reported emissions are not available at the asset level but only in aggregate at the company level (combined emissions for several assets). In such cases, especially within the superclass of distribution networks (IC80), we build RBMs at the company level.
Furthermore, a company may not report emissions for every year. In these cases, we use revenue as a proxy to fill the missing years. For example, if we know the emissions for year N and the revenue for years N and N+1, we estimate emissions for year N+1 by applying the ratio of reported emissions to revenue in year N to the revenue of year N+1. This imputation works in both directions (forward and backward in time) and concerns only a small number of assets.
Finally, some assets report their emissions only in aggregated form (combined emissions for Scopes 1, 2, and 3). In these rare cases, we exclude the report from our database.
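The revenue-based gap filling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the infraMetrics implementation: the helper name `impute_emissions`, the dictionary inputs keyed by year, and the rule of anchoring each missing year on the nearest reported year (the earlier one on ties) are all hypothetical choices made for the example.

```python
from typing import Dict


def impute_emissions(
    reported: Dict[int, float],
    revenue: Dict[int, float],
) -> Dict[int, float]:
    """Hypothetical helper: fill years that have revenue but no reported
    emissions by scaling that year's revenue with the emissions-to-revenue
    ratio of the nearest reported year (assumption: earlier year on ties)."""
    if not reported:
        return {}
    filled = dict(reported)
    for year, rev in revenue.items():
        if year in reported:
            continue  # keep reported values untouched
        # Nearest reported year, searching forward and backward in time.
        anchor = min(reported, key=lambda y: (abs(y - year), y))
        if revenue.get(anchor, 0.0) == 0.0:
            continue  # ratio undefined without anchor-year revenue
        ratio = reported[anchor] / revenue[anchor]
        filled[year] = ratio * rev
    return dict(sorted(filled.items()))
```

For instance, with illustrative values `impute_emissions({2020: 500.0}, {2019: 90.0, 2020: 100.0, 2021: 120.0})`, the 2020 ratio of 5.0 yields 450.0 for 2019 (backward) and 600.0 for 2021 (forward).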
Besides reported emissions, the RBMs include asset-level characteristics that correlate well with reported Scope 1, 2, and 3 emissions. This allows us to estimate annual carbon emissions for assets that do not report emissions but share similar characteristics.
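To illustrate how an RBM transfers information from reporting to non-reporting assets, the sketch below fits a log-log ordinary least squares regression of reported emissions on a single asset characteristic and then predicts emissions for an asset that does not report. The functional form, the single regressor, the function names, and the toy numbers are hypothetical; the actual RBMs may combine several characteristics and use a different specification.

```python
import math
from typing import List, Tuple


def fit_log_log(x: List[float], y: List[float]) -> Tuple[float, float]:
    """OLS fit of log(y) = b0 + b1 * log(x); returns (b0, b1).
    x: a hypothetical asset characteristic; y: reported emissions (both > 0)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    if sxx == 0.0:
        raise ValueError("need at least two distinct characteristic values")
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def predict_emissions(b0: float, b1: float, x: float) -> float:
    """Predicted emissions for a non-reporting asset with characteristic x."""
    return math.exp(b0 + b1 * math.log(x))
```

With toy data in which emissions are exactly twice the characteristic, `fit_log_log([10.0, 20.0, 40.0, 80.0], [20.0, 40.0, 80.0, 160.0])` recovers a slope of 1, and `predict_emissions` returns approximately 60 for a characteristic value of 30.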
Data for Activity-Based Models
If the available data do not allow us to build statistical regression models, we can combine emission factors (EFs) and capacity factors (CFs) with other physical quantities to build ABMs. ABMs use these alternative sources of information to convert an asset's level of activity (e.g., how much electricity a power plant produces annually, or could produce given its capacity) into carbon emissions. EFs define the rate at which an activity releases emissions (e.g., how many kilograms of CO2 a coal power plant releases per kWh of electricity produced). They can also relate to other types of activities outside the power sector. CFs give the share of its maximum capacity at which a power plant actually runs and serve as a proxy for its actual power generation.
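For a power plant, the capacity-to-emissions conversion can be written as emissions = capacity × CF × hours per year × EF. The sketch below assumes a non-leap year of 8,760 hours and an EF expressed in kg CO2 per kWh (numerically equal to t CO2 per MWh), and it uses reported generation instead of the CF proxy when that is available. The function name and parameter values are illustrative assumptions, not values from our models.

```python
from typing import Optional

HOURS_PER_YEAR = 8760  # assumption: non-leap year


def annual_emissions_tco2(
    capacity_mw: float,
    capacity_factor: float,
    ef_kg_per_kwh: float,
    reported_generation_mwh: Optional[float] = None,
) -> float:
    """Hypothetical ABM for a power plant: annual tonnes of CO2."""
    if not 0.0 <= capacity_factor <= 1.0:
        raise ValueError("capacity factor must lie in [0, 1]")
    if reported_generation_mwh is not None:
        generation_mwh = reported_generation_mwh  # actual output, if known
    else:
        # CF proxy: share of the maximum possible annual output.
        generation_mwh = capacity_mw * capacity_factor * HOURS_PER_YEAR
    # 1 kg CO2/kWh equals 1 t CO2/MWh, so the EF applies directly to MWh.
    return generation_mwh * ef_kg_per_kwh
```

For example, a hypothetical 100 MW plant running at a CF of 0.5 with an EF of 1.0 kg CO2/kWh generates 438,000 MWh per year, giving `annual_emissions_tco2(100.0, 0.5, 1.0) == 438000.0` tonnes of CO2.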
Hence, ABMs require information on assets' levels of operation, consumption, or production that are directly related to emissions. Again, a company may provide this information irregularly, at the company level (combining several assets), or at an aggregate level (combining different sources and activities). Our models assume that physical characteristics, such as the length of a road or the capacity of a power plant, remain unchanged over time (over a timeframe of several years). This assumption is reasonable given the long life cycles of infrastructure. Furthermore, we use infraMetrics data (specifically assets' revenues) to verify the period in which an asset is active and to provide the financial data the models need.
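One possible reading of the constant-characteristics assumption combined with the revenue check is sketched below: a time-invariant ABM estimate is repeated for every year in which the asset reports positive revenue, and years without revenue are treated as inactive. The helper name `annual_series` and the "revenue > 0 means active" rule are assumptions made for illustration only.

```python
from typing import Dict


def annual_series(
    static_emissions: float,
    revenue_by_year: Dict[int, float],
) -> Dict[int, float]:
    """Hypothetical helper: repeat a time-invariant ABM estimate for each
    year with positive revenue; other years are treated as inactive (0.0)."""
    return {
        year: (static_emissions if rev > 0 else 0.0)
        for year, rev in sorted(revenue_by_year.items())
    }
```

For instance, `annual_series(438000.0, {2018: 0.0, 2019: 12.5, 2020: 13.1})` returns zero for 2018 and the constant estimate for 2019 and 2020.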