Data Aggregation in Brim

To support evidence-based chart abstraction at scale, Brim starts with the smallest pieces of evidence and aggregates upward into higher-level, logic-based variables. Understanding how this works can help you build variables that take advantage of the process.

A generation consists of several ordered steps:

For each generation step, the list below describes what happens for a variable and which elements are used.

  1. Prepare Generation

What happens: Brim creates generation tasks and adds them to the task queue. If large tasks are already running and few GPUs are available, the new generation tasks may have to wait.

  2. Variable Generation

What happens: Brim creates a list of evidence snippets that might be pertinent for each variable. Then Brim uses the Variable Instructions to assign labels to any relevant evidence snippets and discard the rest. If the variable has a scope of "Many per note" and no input variables, it is done generating after this step.

Elements used:

  • Variable Instructions
  • Variable Definition
  • Notes, Note Titles, and Note Dates
  • If the variable has input variables: the names, values, and evidence of those variables

  3. Variable Aggregation

What happens: Brim takes the list of "Many per note" labels and uses the variable aggregation instructions to assign labels either:

  • For each note for "One per note" variables or
  • For each patient for "One per patient" variables

Selecting "Show all inputs" in the label table for a "One per note" or "One per patient" variable will show all of the underlying labels.

Elements used:

  • Variable Aggregation Strategy or Custom Aggregation Instructions
  • Value for "Only use true value in aggregation"
  • Pre-aggregation variable values, with evidence snippets

  4. Dependent Variable Generation

What happens: Brim takes the list of all labeled variables and uses the dependent variable instructions to assign labels at the "One per patient" level. If a dependent variable has other dependent variables as inputs, Brim will run the dependent variables in the correct order.

Elements used:

  • Dependent Variable Instructions
  • Dependent Variable Definition
  • Variable names, values, and evidence for input variables

  5. Validation Match Generator

What happens: Brim compares the generated labels with any uploaded validation datasets and calculates agreement metrics.

Elements used:

  • Variable and Dependent Variable names, scopes, and values
  • Validation datasets
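
To make the aggregation step more concrete, the following is a minimal Python sketch of how "Many per note" labels could be rolled up to one label per note or one label per patient. It is not Brim's implementation or API: the `Label` class, the `aggregate` function, and the sample data are hypothetical, and the "most common value" rule is only a stand-in for whatever the Variable Aggregation Strategy or Custom Aggregation Instructions actually specify.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical "Many per note" label: one value tied to one evidence snippet."""
    patient_id: str
    note_id: str
    value: str
    evidence: str


def aggregate(labels, scope):
    """Group labels by note or by patient and pick one value per group.

    Assumption: the most common value wins. This stands in for a real
    aggregation strategy and is not how Brim necessarily decides.
    """
    if scope == "One per note":
        def key(label):
            return (label.patient_id, label.note_id)
    elif scope == "One per patient":
        def key(label):
            return label.patient_id
    else:
        raise ValueError(f"unsupported scope: {scope!r}")

    groups = defaultdict(list)
    for label in labels:
        groups[key(label)].append(label)

    result = {}
    for group_key, members in groups.items():
        value = Counter(m.value for m in members).most_common(1)[0][0]
        # Keep the underlying labels alongside the aggregated value,
        # similar in spirit to viewing all inputs for a label.
        result[group_key] = {"value": value, "inputs": members}
    return result


# Made-up sample data for illustration only.
labels = [
    Label("p1", "n1", "yes", "reports smoking daily"),
    Label("p1", "n1", "yes", "1 pack per day"),
    Label("p1", "n2", "no", "denies tobacco use"),
    Label("p2", "n3", "no", "never smoker"),
]

per_note = aggregate(labels, "One per note")
per_patient = aggregate(labels, "One per patient")
```

In this sketch, `per_note` has one entry per (patient, note) pair and `per_patient` has one entry per patient. Each entry keeps its input labels, so the evidence behind an aggregated value stays traceable.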

Propagation of abstractor corrections

Since each step takes the previous steps as inputs, if you manually change the value of a label in an early step, you will need to re-run generation on later steps to see the logic propagate. You can do this by running generation for the specific variable you want refreshed, or by running a generation that does not overwrite human-reviewed labels.
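
As a rough illustration of why later steps need to be re-run, the sketch below works out which variables sit downstream of a corrected label, in an order where inputs come before the variables that use them. The variable names and the `variables_to_rerun` helper are hypothetical and are not part of Brim. The sketch assumes the input relationships between variables are known as a simple mapping, and uses `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical variables: each name maps to the input variables it reads.
inputs = {
    "tumor_size_mentions": [],
    "largest_tumor_size": ["tumor_size_mentions"],
    "node_status": [],
    "stage": ["largest_tumor_size", "node_status"],
    "eligible": ["stage"],
}


def variables_to_rerun(inputs, changed):
    """Return the variables downstream of `changed`, inputs first."""
    # static_order() yields every variable after all of its inputs.
    order = TopologicalSorter(inputs).static_order()
    stale = {changed}
    rerun = []
    for name in order:
        # A variable is stale if any of its inputs is stale.
        if name != changed and stale.intersection(inputs.get(name, [])):
            stale.add(name)
            rerun.append(name)
    return rerun


after_early_fix = variables_to_rerun(inputs, "tumor_size_mentions")
after_late_fix = variables_to_rerun(inputs, "eligible")
```

Here a correction to `tumor_size_mentions` leaves `largest_tumor_size`, `stage`, and `eligible` stale, in that order, while a correction to `eligible` leaves nothing to re-run. The earlier in the chain a label is corrected, the more downstream variables need regenerating.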
