Leveraging AI for Data Provenance: Enhancing Tracking and Verification of Data Lineage in FATE Assessment
Swathi Chundru
Motivity Labs Pvt Ltd, Hyderabad, Telangana, India
Abstract
Data provenance, a record of where data originate and how they are processed, opens new possibilities as artificial intelligence (AI)-based systems play a growing role in assisting human decision-making. Fairness, accountability, transparency, and explainability (FATE) are the four key principles on which responsible AI builds to prevent the harmful consequences that can arise from biased AI systems. To stimulate further research on data provenance that facilitates responsible AI, this work describes current biases and explores how data provenance could be applied to alleviate them. We first review biases that result from data origins and pre-processing. Next, we discuss current practice, the difficulties it faces, and the solutions that have been proposed. We then give an overview of how our recommendations might help establish data provenance, and thereby eliminate biases arising from the origins and pre-processing of the data, in support of responsible AI-based systems. We conclude by outlining future research directions in our research agenda.
Keywords: Artificial Intelligence; Data lineage; FATE assessment