Using Brim's Validation Tool

Brim's Validation Tool helps you measure abstraction accuracy and track improvement over time.


Our Validation Tool supports two distinct datasets, Training Data and Holdout Data, so you can keep the examples you use to improve your variables separate from the held-out set you use to measure true performance.


Training Data vs. Holdout Data


Aspect | Training Data | Holdout Data
--- | --- | ---
Purpose | Improve and optimize variable definitions | Measure true, unbiased performance
How values are added | Uploaded CSV files and/or manually labeled examples | Uploaded CSV file only
Agreement calculation | Uses the most recently provided value per patient/document/variable | Uses the most recently provided value in the uploaded comparison dataset
When to use | When actively refining variables | When ready to assess where you stand

Why keep them separate? If you train and evaluate on the same examples, your agreement scores reflect how well Brim has been tuned to those specific cases, not how it will perform on new data. Keeping a holdout set untouched gives you an honest measure of performance on unseen data.


Training Data

Training data represents the labeled examples you use to guide and improve your variable definitions. It can come from two sources: uploaded files and manually labeled examples within Brim.

Adding Training Data

Option A: Upload a file

  1. Navigate to Settings > Validation Data, and select the Training tab.
  2. Click "Choose File" and select a CSV to upload.

A Brim Detailed Export file is already in the correct format, or you can create your own. Each row should represent one value and include at least the following fields:

Field Name | Description | Notes
--- | --- | ---
Name | The variable name | Must exactly match the variable name in Brim
Scope | One Label Per Patient, One Label Per Note, Many Labels Per Note, or Patient-Level Dependent Variable | Must match the variable scope in Brim. Many Labels Per Note variables are ignored because they cannot be matched precisely.
Patient_id | Alphanumeric patient ID | Must exactly match the patient ID in Brim
Note_id | Alphanumeric document ID; leave blank for One Label Per Patient and Patient-Level Dependent Variable | Must exactly match the document ID in Brim
Value | The expected value | Must exactly match the Brim value
  3. Click "Upload". Brim will validate the structure and compare uploaded values against any generated results.
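If you build your own file rather than starting from a Detailed Export, a short script can help keep column names consistent. The sketch below uses Python's standard `csv` module to write the minimum fields from the table above. The variable names, patient IDs, document IDs, and values are hypothetical placeholders, and the `Reviewed` column is optional; it only matters if you use the reviewed-only filter when uploading.

```python
import csv
import io

# Minimum columns from the field table; "Reviewed" is optional and only
# matters if you use the reviewed-only filter when uploading.
FIELDNAMES = ["Name", "Scope", "Patient_id", "Note_id", "Value", "Reviewed"]

# Hypothetical example rows. Names, IDs, and values must exactly match Brim.
rows = [
    {"Name": "smoking_status", "Scope": "One Label Per Patient",
     "Patient_id": "P001", "Note_id": "", "Value": "Former", "Reviewed": "TRUE"},
    {"Name": "tumor_size", "Scope": "One Label Per Note",
     "Patient_id": "P001", "Note_id": "D042", "Value": "2.1 cm", "Reviewed": "TRUE"},
]


def write_validation_csv(rows, fileobj):
    """Write a header line plus one line per value.

    Note_id is left blank for patient-level scopes.
    """
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_validation_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with `open("training.csv", "w", newline="")`, as the `csv` module documentation recommends, and pass that file object to the function.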

Checking "Include only reviewed results" will exclude rows where the "Reviewed" column does not equal "TRUE". If your file has no "Reviewed" column, this checkbox will exclude all rows.
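The reviewed-only rule behaves like a simple row filter. This is a minimal sketch of the behavior described above, not Brim's implementation. It assumes the check is against the exact string `TRUE`, and the function name `reviewed_rows` is hypothetical.

```python
import csv
import io


def reviewed_rows(csv_text):
    """Keep only rows whose Reviewed column equals "TRUE".

    If the file has no Reviewed column, every row is excluded,
    mirroring the checkbox behavior described above.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if "Reviewed" not in (reader.fieldnames or []):
        return []
    return [row for row in reader if row["Reviewed"] == "TRUE"]


sample = (
    "Name,Scope,Patient_id,Note_id,Value,Reviewed\n"
    "smoking_status,One Label Per Patient,P001,,Former,TRUE\n"
    "smoking_status,One Label Per Patient,P002,,Never,FALSE\n"
)
print(len(reviewed_rows(sample)))  # 1
```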


Option B: Manually label examples

You can also add training labels directly within Brim by clicking "Accept", "Edit", or "Remove" on generated labels, or by manually adding a label. Manually labeled values are treated as training data alongside any uploaded values.

Duplicate Handling: When a patient/document/variable combination has multiple validation values (whether across multiple files or through a combination of file uploads and manual labeling), Brim prioritizes the most recently provided value and uses it to compute agreement.
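Conceptually, this duplicate handling is a last-write-wins lookup keyed on patient, document, and variable. The sketch below assumes each value carries a timestamp recording when it was provided; the `ValidationValue` record layout and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ValidationValue:
    # Hypothetical record layout: the value, its key, and when it was provided.
    patient_id: str
    note_id: str      # blank for patient-level variables
    variable: str
    value: str
    provided_at: datetime
    source: str       # e.g. "upload" or "manual"


def latest_values(values):
    """Return the most recently provided value per patient/document/variable."""
    latest = {}
    for v in values:
        key = (v.patient_id, v.note_id, v.variable)
        # Replace the stored value only if this one was provided later.
        if key not in latest or v.provided_at > latest[key].provided_at:
            latest[key] = v
    return latest


values = [
    ValidationValue("P001", "", "smoking_status", "Never",
                    datetime(2024, 1, 5), "upload"),
    ValidationValue("P001", "", "smoking_status", "Former",
                    datetime(2024, 2, 1), "manual"),
]
winner = latest_values(values)[("P001", "", "smoking_status")]
print(winner.value, winner.source)  # Former manual
```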

Investigate Specific Values

Click "See all Compared Values" to view a full list with the comparison value, Brim's value, and the evidence used. From here you can:

  • Sort by Agreement (True/False)
  • Filter by variable or patient
  • Click the "i" next to a variable name to review its definition
  • Review the Raw Text used and click "View in Document" to see it in context
  • Click the "?" icon in the Brim Value column to see reasoning, give feedback, or explore the variable hierarchy

Walkthrough: Setting Up Training Data


Holdout Data

Holdout data is your held-out, trusted reference set for evaluation. It is kept strictly separate from training data and is used solely to measure how your variables are performing, not to tune them.


To protect against overfitting, Brim allows each patient to be included in either the Training Data or the Holdout Data, but not both. If uploaded or labeled examples are being ignored, check whether that patient is already included in the other dataset.
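The one-dataset-per-patient rule can be pictured as a check against the set of patients already in the other dataset. This is a minimal sketch, assuming rows are plain dictionaries with a `Patient_id` key; the function name `split_by_patient_conflict` and its return shape are hypothetical.

```python
def split_by_patient_conflict(rows, other_dataset_patients):
    """Separate rows whose patient already belongs to the other dataset.

    rows: iterable of dicts with a "Patient_id" key.
    other_dataset_patients: patient IDs already in the conflicting dataset
    (training patients when uploading holdout data, and vice versa).
    """
    kept, skipped = [], []
    for row in rows:
        # Patient IDs must match exactly, so plain string equality is used.
        if row["Patient_id"] in other_dataset_patients:
            skipped.append(row)
        else:
            kept.append(row)
    return kept, skipped


training_patients = {"P001", "P002"}
holdout_upload = [{"Patient_id": "P002"}, {"Patient_id": "P003"}]
kept, skipped = split_by_patient_conflict(holdout_upload, training_patients)
print([r["Patient_id"] for r in kept], [r["Patient_id"] for r in skipped])
# ['P003'] ['P002']
```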


Adding Holdout Data

  1. Navigate to Project Setup or Settings > Validation Data, and select the Holdout tab.
  2. Click "Choose File" and select a CSV. The format is identical to the Training Data format described above.
  3. Click "Upload". Brim will immediately compare the dataset against any generated results.

If you don't have generated results yet, go to Label Generation > Start New Generation.

Reviewing Holdout Results

Once uploaded, Brim calculates agreement scores between each generation and your holdout dataset. Scores appear at the top of the Label Generation screen after each generation.

To view detailed results, go to Settings > Validation Data > Holdout:

  • Matching Variables (right): Variables are matched when name and scope are identical, and scope is One Label Per Patient or One Label Per Note.
  • Values Compared (middle): Values are compared when they belong to matched variables AND have exactly matching patient IDs and document IDs.
  • Agreement Percentage (left): The percentage of Values Compared where Brim's value matches your holdout value under that variable's match rules (case-insensitive exact match by default).
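The three holdout numbers build on one another. The sketch below restates those rules under simplifying assumptions: variables are identified by (name, scope) pairs, values are keyed by (patient ID, document ID, variable name), and agreement uses a case-insensitive comparison as a stand-in for the default match rule. The function `holdout_summary` and its inputs are hypothetical; this is not Brim's implementation.

```python
SUPPORTED_SCOPES = {"One Label Per Patient", "One Label Per Note"}


def holdout_summary(brim_vars, holdout_vars, brim_values, holdout_values):
    """Compute matching variables, values compared, and agreement percentage.

    brim_vars / holdout_vars: sets of (name, scope) pairs.
    brim_values / holdout_values: dicts mapping
        (patient_id, note_id, variable_name) -> value.
    """
    # Matching Variables: identical name and scope, with a supported scope.
    matched = {v for v in brim_vars & holdout_vars if v[1] in SUPPORTED_SCOPES}
    matched_names = {name for name, _ in matched}

    # Values Compared: matched variable AND identical patient and document IDs.
    compared = [
        key for key in holdout_values
        if key[2] in matched_names and key in brim_values
    ]

    # Agreement Percentage: share of compared values that match, using a
    # case-insensitive comparison. Reports 0.0 when nothing was compared.
    agreeing = sum(
        brim_values[k].casefold() == holdout_values[k].casefold()
        for k in compared
    )
    pct = 100.0 * agreeing / len(compared) if compared else 0.0
    return len(matched), len(compared), pct


brim_vars = {("smoking_status", "One Label Per Patient"),
             ("tumor_size", "One Label Per Note")}
holdout_vars = {("smoking_status", "One Label Per Patient"),
                ("tumor_size", "One Label Per Patient")}  # scope mismatch
brim_values = {("P001", "", "smoking_status"): "Former",
               ("P002", "", "smoking_status"): "never"}
holdout_values = {("P001", "", "smoking_status"): "former",
                  ("P002", "", "smoking_status"): "Current",
                  ("P003", "", "smoking_status"): "Never"}  # no Brim value
print(holdout_summary(brim_vars, holdout_vars, brim_values, holdout_values))
# (1, 2, 50.0)
```

Here `tumor_size` is not a matching variable because its scope differs, and patient P003 is not compared because Brim has no value for it, which is why only two values count toward the percentage.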

Duplicate Handling: When a patient/document/variable combination has multiple values in your uploaded holdout data, Brim prioritizes the most recently provided value and uses it to compute agreement.

Walkthrough: Running a Holdout Evaluation

Validation Match Rules

By default, variables use case-insensitive exact matching, but you can specify match rules for each variable. Learn more in variable settings.
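One way to picture per-variable match rules is a lookup from variable name to a comparison function, falling back to the default case-insensitive exact match (approximated here with Python's `str.casefold`). The stricter rule, the `icd10_code` variable, and the `MATCH_RULES` mapping are hypothetical; the rules Brim actually offers are configured in variable settings.

```python
def case_insensitive_exact(brim_value, reference_value):
    """Default rule: exact match ignoring letter case."""
    return brim_value.casefold() == reference_value.casefold()


def case_sensitive_exact(brim_value, reference_value):
    """Hypothetical stricter rule: letter case must also match."""
    return brim_value == reference_value


# Hypothetical per-variable overrides; unlisted variables use the default.
MATCH_RULES = {"icd10_code": case_sensitive_exact}


def values_agree(variable, brim_value, reference_value):
    """Compare two values using the variable's rule, or the default rule."""
    rule = MATCH_RULES.get(variable, case_insensitive_exact)
    return rule(brim_value, reference_value)


print(values_agree("smoking_status", "Former", "former"))  # True
print(values_agree("icd10_code", "c50.9", "C50.9"))        # False
```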

Unmatched Values

Brim can't always match a comparison dataset value to a generated value in Brim. Understanding why can help you adjust your setup, your comparison dataset, or both.

View Upload Details

After uploading either training or holdout data, you can click "Show all Uploads" > "Show Details" for a row-by-row breakdown of the upload:

  • Total rows: All rows in the file, excluding the header.
  • Accepted: Rows Brim successfully matched to a generated value.
  • Skipped: Rows Brim could not compare. Breakdown by reason:
    • Duplicate: Multiple identical rows exist; Brim accepts one and skips the rest.
    • Empty Value: Rows with a blank value are skipped.
    • Not Reviewed: Skipped because "Only load reviewed labels" is selected and the row lacks "TRUE" in the Reviewed column.
    • Missing Required Column: A required column is absent.
    • Missing Document ID: One Label Per Note variables require a document ID to match against the correct document.
    • Unsupported/Unknown Scope: Scope is not One Label Per Patient or One Label Per Note.
    • Already in training data/Already in holdout data: To avoid overfitting, each patient can be included in either the training set or the holdout data set, but not both. Examples that conflict with this rule are skipped.
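The skip reasons above can be read as a series of checks applied to each row. The sketch below classifies rows in one plausible order. The check order, the treatment of all five table columns as required, the blank-value test, and the function name `classify_upload` are assumptions, and the sketch stops short of matching a row to a generated value, so "Accepted" here means only "eligible for comparison".

```python
from collections import Counter

REQUIRED_COLUMNS = ("Name", "Scope", "Patient_id", "Note_id", "Value")
SUPPORTED_SCOPES = {"One Label Per Patient", "One Label Per Note"}


def classify_upload(columns, rows, only_reviewed=False, other_dataset_patients=()):
    """Count rows per outcome: "Accepted" (eligible) or a skip reason.

    columns: header names from the CSV.
    rows: list of dicts keyed by those header names.
    other_dataset_patients: patient IDs already in the conflicting dataset.
    """
    counts = Counter()
    # A missing required column affects every row in the file.
    if any(col not in columns for col in REQUIRED_COLUMNS):
        counts["Missing Required Column"] = len(rows)
        return counts
    seen = set()
    for row in rows:
        if row["Scope"] not in SUPPORTED_SCOPES:
            reason = "Unsupported/Unknown Scope"
        elif not row["Value"].strip():
            reason = "Empty Value"
        elif only_reviewed and row.get("Reviewed") != "TRUE":
            reason = "Not Reviewed"
        elif row["Scope"] == "One Label Per Note" and not row["Note_id"].strip():
            reason = "Missing Document ID"
        elif row["Patient_id"] in other_dataset_patients:
            # Stands in for "Already in training data/Already in holdout data".
            reason = "Already in other dataset"
        else:
            # Identical rows: the first is kept, later copies are skipped.
            key = tuple(sorted(row.items()))
            reason = "Duplicate" if key in seen else "Accepted"
            seen.add(key)
        counts[reason] += 1
    return counts


columns = ["Name", "Scope", "Patient_id", "Note_id", "Value"]
rows = [
    {"Name": "smoking_status", "Scope": "One Label Per Patient",
     "Patient_id": "P001", "Note_id": "", "Value": "Former"},
    {"Name": "smoking_status", "Scope": "One Label Per Patient",
     "Patient_id": "P001", "Note_id": "", "Value": "Former"},   # identical row
    {"Name": "tumor_size", "Scope": "One Label Per Note",
     "Patient_id": "P001", "Note_id": "", "Value": "2.1 cm"},   # no document ID
    {"Name": "medications", "Scope": "Many Labels Per Note",
     "Patient_id": "P001", "Note_id": "D042", "Value": "aspirin"},
    {"Name": "smoking_status", "Scope": "One Label Per Patient",
     "Patient_id": "P009", "Note_id": "", "Value": ""},         # blank value
]
print(dict(classify_upload(columns, rows)))
```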

See Details on Unmatched Values

In Settings > Validation Data > See all Compared Values, use the "Matched" and "Unmatched" tabs at the top.

Select "Unmatched" to see every skipped value with a specific reason why it couldn't be compared.

Best Practices

  • Protect your holdout set: Once you designate examples as holdout data, avoid using them to inform variable edits. Treat them as your source of truth.
  • Start small: Validate on a handful of patients before scaling up.
  • Iterate with training data: Use training examples and manual labels to refine unclear instructions or edge cases, then check your holdout scores to confirm improvements hold.
  • Guard against drift: Run a quick holdout evaluation on every new batch of documents.

FAQ

Q: Can I replace my training or holdout dataset? Yes. Upload a new CSV in the appropriate tab and it will replace the previous file.

Q: What happens if I have both an uploaded training value and a manual label for the same variable/patient/document? Brim uses the most recently provided value to compute agreement for that combination.

Q: Can I export the validation results? Yes. A Detailed Export from the Export Data screen includes validation values and agreement for both training and holdout data.
