Pebblo Data Reports
Pebblo Data Reports provides an in-depth visibility into the document ingested into Gen-AI RAG application during every load.
This document describes the information produced in the Data Report.
Report Summary
Report Summary provides the following details:
- Findings: Total number of Topics and Entities found across all the snippets loaded in this specific load run.
- Files with Findings: The number of files that has one or more
Findings
over the total number of files used in this document load. This field indicates the number of files that need to be inspected to remediate any potentially text that needs to be removed and/or cleaned for Gen-AI inference. - Number of Data Source: The number of data sources used to load documents into the Gen-AI RAG application. For e.g. this field will be two if a RAG application loads data from two different directories or two different AWS S3 buckets.
Top Files with Most Findings
This table indicates the top files that had the most findings. Typically, these files are the most offending ones that needs immediate attention and best ROI for data cleansing and remediation.
Load History
This table provides the history of findings and path to the reports for the previous loads of the same RAG application.
Load History provides details about latest 5 loads of this app. It provides the following details:
- Report Name: The path to the report file.
- Findings: The number of findings identified in the report.
- Files With Findings: The number of files containing findings.
- Generated On: The timestamp, when the report was generated. Time would be in local time zone.
- Find more reports on: Path to the folder where you can find reports for all the loads. This field will be visible when there are more than 5 loads of an app.
Instance Details
This section provide a quick glance of where the RAG application is physically running like in a Laptop (Mac OSX) or Linux VM and related properties like IP address, local filesystem path and Python version.
Data Source Findings Table
This table provides a summary of all the different Topics and Entities found across all the files that got ingested using Pebblo SafeLoader
enabled Document Loaders.
Snippets
This sections provides the actual text inspected by the Pebblo Server
using the Pebblo Topic Classifier
and Pebblo Entity Classifier
. This will be useful to quickly inspect and remediate text that should not be ingested into the Gen-AI RAG application. Each snippet shows the exact file the snippet is loaded from easy remediation.