OpenSource Safe DataLoader for Gen AI applications
The Pebblo Data Report provides in-depth visibility into the documents ingested into a Gen-AI RAG application during every load.
This document describes the information produced in the Data Report.
Report Summary provides the following details:
Findings over the total number of files used in this document load. This field indicates the number of files that need to be inspected to remediate any text that may need to be removed and/or cleaned for Gen-AI inference.
This table indicates the top files that had the most findings. Typically, these are the most offending files: they need immediate attention and offer the best ROI for data cleansing and remediation.
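As a rough illustration of how these two summary values relate, the sketch below derives a "files with findings over total files" figure and a top-files ranking from per-file finding counts. The function name `summarize_findings` and the input shape (a mapping from file path to finding count) are assumptions made for this example and are not Pebblo's actual data model or API.

```python
from typing import Dict, List, Tuple


def summarize_findings(
    findings_per_file: Dict[str, int], top_n: int = 5
) -> Tuple[str, List[Tuple[str, int]]]:
    """Hypothetical helper: summarize per-file finding counts.

    Returns a "files with findings / total files" string and the
    top_n files ranked by number of findings (ties broken by path).
    """
    total_files = len(findings_per_file)
    files_with_findings = sum(1 for count in findings_per_file.values() if count > 0)
    ratio = f"{files_with_findings}/{total_files}"

    # Only files that actually have findings are candidates for remediation.
    flagged = [(path, count) for path, count in findings_per_file.items() if count > 0]
    top_files = sorted(flagged, key=lambda item: (-item[1], item[0]))[:top_n]
    return ratio, top_files


if __name__ == "__main__":
    counts = {"hr/salaries.csv": 7, "docs/readme.md": 0, "legal/contract.pdf": 3}
    print(summarize_findings(counts, top_n=2))
```

Sorting by descending count puts the files with the largest remediation payoff first, which mirrors the intent of the top-files table.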
This table provides the history of findings and the paths to the reports for previous loads of the same RAG application.
This section provides a quick glance at where the RAG application is physically running, such as on a laptop (Mac OSX) or a Linux VM, along with related properties like the IP address, local filesystem path, and Python version.
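For readers who want to see what such runtime properties look like, the following is a minimal sketch that gathers comparable details using only the Python standard library. The function name `collect_runtime_details` and the dictionary keys are illustrative assumptions; this is not how Pebblo itself collects or names these fields.

```python
import os
import platform
import socket


def collect_runtime_details() -> dict:
    """Hypothetical helper: gather basic details about the current runtime."""
    hostname = socket.gethostname()
    try:
        # May resolve to a loopback address depending on the host configuration.
        ip_address = socket.gethostbyname(hostname)
    except OSError:
        ip_address = "unknown"

    return {
        "host": hostname,
        "ip_address": ip_address,
        "os": f"{platform.system()} {platform.release()}",
        "path": os.getcwd(),
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in collect_runtime_details().items():
        print(f"{key}: {value}")
```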
This table provides a summary of all the different Topics and Entities found across all the files that were ingested using Pebblo SafeLoader-enabled Document Loaders.
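The kind of roll-up this table presents can be sketched as a simple aggregation of per-file counts. In the example below, the function name `aggregate_findings`, the `topics` and `entities` keys, and the sample labels are all assumptions chosen for illustration; they do not reflect Pebblo's report schema.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def aggregate_findings(
    per_file: Iterable[Dict[str, Dict[str, int]]],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Hypothetical helper: sum topic and entity counts across all files."""
    topics: Counter = Counter()
    entities: Counter = Counter()
    for file_findings in per_file:
        # Counter.update adds the counts from each mapping to the running totals.
        topics.update(file_findings.get("topics", {}))
        entities.update(file_findings.get("entities", {}))
    return dict(topics), dict(entities)


if __name__ == "__main__":
    files = [
        {"topics": {"medical-advice": 1}, "entities": {"email": 2}},
        {"topics": {"medical-advice": 2}, "entities": {"email": 1, "ssn": 4}},
    ]
    print(aggregate_findings(files))
```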
This section provides the actual text inspected by the Pebblo Daemon using the Pebblo Topic Classifier and the Pebblo Entity Classifier. This is useful for quickly inspecting and remediating text that should not be ingested into the Gen-AI RAG application. Each snippet shows the exact file it was loaded from, for easy remediation.