
Runtime provenance refinement for notebooks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Computational notebooks (e.g., Jupyter or Apache Zeppelin) have become a popular choice for data exploration, preparation, and ETL. Notebooks are better suited than classical workflow systems for interactive development of data pipelines, because they provide immediate feedback on the results of a computation and do not require the full computation to be specified upfront. However, the notebook model suffers from poor reproducibility, does not support automatic incremental re-evaluation of code when the code or inputs change, and does not allow parallel execution of cells; all three are symptoms of its kernel-based evaluation strategy. We propose a new "workbook" model that combines the usability of notebooks with the provenance and parallel execution capabilities of workflow systems. This is made possible by a novel approach that refines a static approximation of provenance for Python code at runtime, together with a scheduler that dynamically adapts the execution order of cells based on data dependencies detected or refuted at runtime. We demonstrate the feasibility of this approach with a prototype implementation in our notebook engine Vizier.
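The idea of refining static provenance with runtime observations can be illustrated for the incremental re-evaluation case. The following is a minimal sketch, not Vizier's implementation: the `Cell` model, the `stale_cells` function, and the example cells are hypothetical, and the sketch assumes cells are deterministic and exchange data only through named top-level symbols. It compares which cells must re-run after an edit when only a static over-approximation of each cell's reads is available versus when the reads observed during the previous run refine it.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical model of one notebook cell with static and runtime provenance."""
    name: str
    may_read: set[str]    # static over-approximation of symbols the cell reads
    may_write: set[str]   # static over-approximation of symbols the cell writes
    last_reads: set[str] = field(default_factory=set)   # symbols actually read in the previous run
    last_writes: set[str] = field(default_factory=set)  # symbols actually written in the previous run


def stale_cells(cells: list[Cell], changed: set[str], use_runtime: bool) -> list[str]:
    """Return, in notebook order, the names of cells that must be re-executed.

    A cell is stale if its code changed or it reads a symbol whose value may
    differ from the previous run. With use_runtime=False the static read sets
    are used; with use_runtime=True the reads observed in the previous run
    replace them (the refinement step).
    """
    dirty: set[str] = set()  # symbols whose value may differ from the previous run
    stale: list[str] = []
    for cell in cells:
        reads = cell.last_reads if use_runtime else cell.may_read
        if cell.name in changed or reads & dirty:
            stale.append(cell.name)
            # Until the cell is actually re-run we cannot know what it will write,
            # so mark the static bound plus whatever it wrote last time (a symbol
            # it stops writing also changes for downstream readers).
            dirty |= cell.may_write | cell.last_writes
    return stale


# Cell(name, may_read, may_write, last_reads, last_writes)
notebook = [
    Cell("params", set(), {"mode"}, set(), {"mode"}),
    Cell("load", set(), {"raw"}, set(), {"raw"}),
    Cell("cache", set(), {"cached"}, set(), {"cached"}),
    # e.g. `summary = cached if mode == "fast" else describe(raw)` (describe is
    # hypothetical): statically it may read raw, but the last run took the
    # `cached` branch and never touched raw.
    Cell("summary", {"mode", "cached", "raw"}, {"summary"}, {"mode", "cached"}, {"summary"}),
    Cell("report", {"raw"}, {"report"}, {"raw"}, {"report"}),
]

print(stale_cells(notebook, {"load"}, use_runtime=False))  # ['load', 'summary', 'report']
print(stale_cells(notebook, {"load"}, use_runtime=True))   # ['load', 'report']
```

With only static provenance, editing `load` forces `summary` to re-run because it might read `raw`. The refined provenance shows its previous run never read `raw`, so under the determinism assumption re-running it with unchanged `mode` and `cached` would take the same branch and can be skipped. A full system would also refine write sets as stale cells actually re-execute; this sketch conservatively keeps the static write bound instead.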

Original language: English
Title of host publication: Proceedings of 14th International Workshop on the Theory and Practice of Provenance, TaPP 2022
Publisher: Association for Computing Machinery, Inc
Pages: 44-47
Number of pages: 4
ISBN (Electronic): 9781450393492
State: Published, Jun 12 2022
Event: 14th International Workshop on the Theory and Practice of Provenance, TaPP 2022, held in conjunction with SIGMOD 2022, Philadelphia, United States
Duration: Jun 17 2022 → Jun 17 2022

Publication series

Name: Proceedings of 14th International Workshop on the Theory and Practice of Provenance, TaPP 2022

Conference

Conference: 14th International Workshop on the Theory and Practice of Provenance, TaPP 2022, held in conjunction with SIGMOD 2022
Country/Territory: United States
City: Philadelphia
Period: 06/17/22 → 06/17/22

