
A data-centric investigation on the challenges of machine learning methods for bridging life cycle inventory data gaps

  • Bu Zhao
  • Jitong Jiang
  • Ming Xu
  • Qingshi Tu

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Life cycle assessment (LCA) is a systematic approach to quantifying the environmental impacts of a product system across its entire life cycle. Despite its wide use in assessing mature technologies, the inventory data gap remains a fundamental challenge that limits the application of LCA to emerging processes. Machine learning (ML) methods are among the possible solutions that can mitigate these data gaps in an automated and scalable way. Nonetheless, the performance of existing ML methods is unstable, which limits the trustworthiness and generalizability of the models. In this study, we conducted a data-centric investigation to delineate the causes of the unstable performance using a similarity-based ML framework built on the Ecoinvent 3.1 unit process (UPR) database. We found that the pattern of imbalance in the data used for method development, manifested by substantial differences in (1) flow and process availability and (2) the order of magnitude of their values, is a major cause of the unstable performance. We also identified causes arising from the ML method development workflow, particularly the steps of data preprocessing and ML model training (e.g., randomness in train–test data splits). In addition, we tested the proposed ML method on the U.S. Life Cycle Inventory Database, where we observed that the generalizability of the method was highly influenced by the size of the database to which it was applied. To address these issues, we propose that further research focus on reducing the barriers to database integration so that both the size and balance of the data for ML method development can be improved.
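
To make the point about train–test split randomness concrete, the following is a minimal sketch in Python. It is not the authors' framework: it assumes synthetic flow values spanning several orders of magnitude, a cosine-similarity-weighted estimate over the k most similar processes, and hypothetical helper names (`make_processes`, `estimate_missing`, `split_error`). It only illustrates how the same estimator can score differently depending on which processes land in the test set.

```python
import math
import random
import statistics


def make_processes(n, n_flows, rng):
    """Synthetic unit processes (assumption): flow values span ~9 orders of magnitude."""
    return [[10 ** rng.uniform(-6, 3) for _ in range(n_flows)] for _ in range(n)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_missing(target, train, missing_idx, k=3):
    """Estimate one missing flow as the similarity-weighted mean over the
    k training processes most similar to the target on its known flows."""
    known = [i for i in range(len(target)) if i != missing_idx]
    scored = []
    for proc in train:
        sim = cosine([target[i] for i in known], [proc[i] for i in known])
        scored.append((sim, proc[missing_idx]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    total = sum(sim for sim, _ in top)
    return sum(sim * value for sim, value in top) / total


def split_error(processes, seed, test_fraction=0.2, missing_idx=0):
    """Mean absolute log10 error on the test set for one seeded random split."""
    rng = random.Random(seed)
    shuffled = processes[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    errors = [
        abs(math.log10(estimate_missing(t, train, missing_idx)) - math.log10(t[missing_idx]))
        for t in test
    ]
    return statistics.mean(errors)


# Same data, same estimator; only the split seed changes.
processes = make_processes(60, 8, random.Random(0))
errors = [split_error(processes, seed) for seed in range(10)]
print(f"log10 error across 10 splits: min={min(errors):.2f}, max={max(errors):.2f}")
```

The error is measured in log10 space because the values differ by orders of magnitude; reporting the spread across seeds, rather than a single split, is one simple way to expose this kind of instability.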

Original language: English
Pages (from-to): 955-966
Number of pages: 12
Journal: Journal of Industrial Ecology
Volume: 29
Issue number: 3
State: Published - Jun 2025

Keywords

  • data centric
  • data gap
  • industrial ecology
  • life cycle inventory
  • machine learning
  • similarity based
