Pre-Production Validation with Brim

When working with AI, getting an answer is easy; validating it and scaling it reliably is harder. Brim's Validation Tool is designed to help you do this for any project: starting pre-production by designing your abstraction, iterating to meet your metrics, supporting evaluation in preparation for production, and enabling ongoing monitoring for clinical workflows.

This article is designed to help you use Brim's Validation Tool from iteration through production.

Core Principles

1. Measure what matters.

Make sure your evaluation lines up with the real-world priorities of your workflow. This starts with prioritizing the most important variables in the validation. For example, if you are identifying patients for a clinical trial, you might want to start with agreement for the top-level variable (clinical trial eligibility) and work from there.

This also means considering the types of errors that are most consequential. For example, if you are identifying high-risk patients for rescheduling, recall (catching every possible high-risk patient) might matter more than precision (avoiding false positives).
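
To make the distinction concrete, here is a minimal sketch in plain Python (outside Brim) of how precision and recall are computed for a binary high-risk flag. The labels are invented for illustration only.

```python
# Minimal sketch: precision vs. recall for a binary "high risk" flag.
# The labels below are invented for illustration only.
truth     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # human-abstracted ground truth
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]   # AI-abstracted values

true_pos  = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
false_pos = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
false_neg = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)

precision = true_pos / (true_pos + false_pos)  # of the patients flagged, how many are truly high risk
recall    = true_pos / (true_pos + false_neg)  # of the truly high-risk patients, how many were caught

print(f"precision={precision:.2f}, recall={recall:.2f}")
# For a rescheduling workflow, the missed high-risk patient (false negative)
# is usually the costlier error, which is why recall may matter more here.
```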

2. Understand your ground truth.

No dataset is perfect. If you are building your comparison dataset yourself, build it carefully: have two people abstract the same records, resolve disagreements with an expert, and keep definitions crystal clear. You can even do this in Brim, which helps you find evidence in the notes more efficiently.

If you can, get a measure of the inter-rater agreement; this reflects how consistently human reviewers interpret the task, and it is a good target to aim for when doing AI-guided abstraction.
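
One common way to quantify inter-rater agreement is Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. Here is a minimal sketch using scikit-learn, with invented abstractor labels:

```python
# Minimal sketch: quantifying inter-rater agreement with Cohen's kappa.
# The two abstractors' labels below are invented for illustration.
from sklearn.metrics import cohen_kappa_score

abstractor_a = ["Yes", "Yes", "No", "No evidence", "Yes", "No", "No", "Yes"]
abstractor_b = ["Yes", "No",  "No", "No evidence", "Yes", "No", "No", "Yes"]

# Raw percent agreement: simple, but does not account for chance agreement.
percent_agreement = sum(a == b for a, b in zip(abstractor_a, abstractor_b)) / len(abstractor_a)

# Cohen's kappa corrects for the agreement two raters would reach by chance.
kappa = cohen_kappa_score(abstractor_a, abstractor_b)

print(f"percent agreement={percent_agreement:.2f}, kappa={kappa:.2f}")
```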

3. Allow “unknown.”

Sometimes the evidence just isn’t in the chart. When Brim doesn't have evidence, it will return the default value you specify, usually "No evidence." This is meant to prevent guessing and improve reliability.
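
If you want to check how the default behaves against your ground truth, a quick tally like the sketch below can flag over-abstention (defaulting when evidence exists) and possible guessing (answering when the ground truth says there is no evidence). The values are invented for illustration and are not a Brim export format.

```python
# Minimal sketch: checking how a "No evidence" default behaves.
# The values below are invented; they are not a Brim export format.
truth     = ["Stage II", "No evidence", "Stage III",   "No evidence", "Stage I"]
predicted = ["Stage II", "No evidence", "No evidence", "Stage I",     "Stage I"]

missed  = sum(t != "No evidence" and p == "No evidence" for t, p in zip(truth, predicted))
guessed = sum(t == "No evidence" and p != "No evidence" for t, p in zip(truth, predicted))

print(f"abstained when evidence exists: {missed}")    # possible under-extraction to investigate
print(f"answered when no evidence exists: {guessed}") # possible guessing to investigate
```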


How to Validate with Brim

There are two main pre-production applications of Brim's Validation Tool:

  1. Create a Validation Dataset in Brim. Brim can be used to create a validation dataset more quickly than by hand, supporting downstream workflows.
  2. Use a Validation Dataset while Building Variables. You can use a validation dataset while iterating on variables to evaluate draft abstractions and understand which variables need more tuning.

1. Create a Validation Dataset in Brim

Goal: Establish a baseline quickly.

Once you know the variables you want to abstract, you can create a Validation Dataset in Brim that you can use during iteration.

  1. Upload data and create variables.
  2. Generate for a small batch. We recommend 10-20 patients.
  3. Manually label the results using Label Review. On the first run, the Brim values likely won't be perfect. We recommend supplementing the evidence Brim pulls by reviewing the charts and:
    1. Accepting, Editing, or Removing each Brim value in Label Review.
    2. Adding manual labels where appropriate, including the evidence in the text that drove the conclusion.
  4. Download a Detailed Export. Go to Data Export, select "Detailed Export" and download the CSV.
    1. If you generated for more than your Validation Dataset, you can choose to export only the reviewed values by clicking "Show advanced settings" and selecting "Include only reviewed labels". (You can also filter the export yourself; see the sketch after this list.)
  5. Upload your Labelled Dataset as Validation Data. Go to Project Setup > Validation Data and upload your Detailed Export. Future iterations of the variables will be automatically compared to the Validation Dataset.
  6. When generating for future iterations, remember to select "Overwrite Generated & Human Labels" so that Brim uses your new variable definitions to abstract instead of keeping your manually labelled values.

Select "Overwrite Generated & Human Labels" when iterating.


2. Use a Validation Dataset while Building Variables

Goal: Learn quickly and refine.


  1. Upload data.
  2. Complete the first draft of your variables. Make sure the variable names and value formats match the validation dataset so you get meaningful comparisons right away.
  3. Upload a Validation Dataset. Start small; 10 to 50 patients that cover a variety of sites, note types, and patient profiles should be sufficient. Include some rare but important cases.
    1. The validation dataset could be one you created using Brim, or one you created externally. If you are using an externally created dataset, review the format required for upload here.
  4. Generate results in Brim for those patients.
    1. If you're using a validation dataset with only a portion of the records you've loaded in Brim, you can generate for a list of specific patients by selecting "Specific patients" in the "Generation Size" dropdown and adding a list of patient IDs.
  5. Look at your Overall Agreement and Agreement by variable. In Settings > Validation Data, you'll be able to see the overall agreement and agreement by variable. This will help you focus your iteration efforts. (A sketch of computing the same numbers offline from your exports follows this list.)
  6. Improve variables.
    1. In "See all Comparison Values", you can filter by variable and sort by Agreement value to see which values aren't agreeing.
    2. If you can see patterns, try to address them in the variable prompt. Common error types include formatting mismatches, missed information, and ambiguous definitions.
    3. Adding an example to the prompt is effective, but we recommend making it more general than the examples in your test dataset so that the test dataset remains a fair measure as you continue to iterate.
  7. Generate again. When you've made progress, go back to step 4 and generate for your test patients.
    1. If you labeled results in Brim, make sure to select "Overwrite Generated & Human Labels" when generating so that Brim will use your updated prompts to re-generate the values.
    2. We recommend doing this variable by variable if you can to focus your efforts. You can generate for a subset of variables by going to "Show Advanced Settings" in generation and selecting variables in the variable field.
  8. Stop iterating when performance is close to or better than human agreement for your most important fields.

When generating, click "Show Advanced Settings" and select which variables to generate.
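
If you want to reproduce the overall agreement and agreement-by-variable numbers outside Brim (for example, from a validation CSV and a generated-results export), a sketch like the one below does the job. The column names ("patient_id", "variable", "value") are placeholders, not a documented Brim schema; adjust them to match your actual files.

```python
# Minimal sketch: overall agreement and agreement by variable, computed
# offline from two CSVs. Column names are placeholders, not a documented
# Brim schema; adjust them to match your exports.
import pandas as pd

validation = pd.read_csv("validation_dataset.csv")
generated  = pd.read_csv("generated_results.csv")

# Pair up the human-labelled and Brim-generated value for each patient/variable.
merged = validation.merge(
    generated,
    on=["patient_id", "variable"],
    suffixes=("_truth", "_brim"),
)

# Case- and whitespace-insensitive match; tighten or loosen to fit your definitions.
merged["agree"] = (
    merged["value_truth"].str.strip().str.lower()
    == merged["value_brim"].str.strip().str.lower()
)

print(f"overall agreement: {merged['agree'].mean():.1%}")
print(merged.groupby("variable")["agree"].mean().sort_values())  # lowest-agreement variables first
```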

Brim’s tools are designed to make your validation & evaluation process straightforward. Reach out if you have more suggestions for how we can make this easier!
