Label Generation Process

Once you've uploaded data and defined variables for your project, it's time to generate labels.

Each generation takes time and resources. To make the best use of them, we recommend the iterative process described later in this guide.

When you're ready to generate AI-drafted labels for your patients, use the Set Up New Generation form. This guide walks through every option so you can target exactly the patients, notes, and variables you need while managing costs along the way.


How to set your Label Generation Options:

1. Go to the dashboard.

2. Click Label Generation.

3. Click Set Up New Generation.

The form has three sections that control which patients, variables, and notes the generation covers, and what happens to labels that already exist:


Section 1: Patients

Choose which patients this generation should run for:

  • Batch of Patients (default) — Generates labels for a random batch of patients that hasn't been generated yet. Choose a Batch Type to configure the batch. We recommend starting with a batch size of 1 to verify results before scaling up.
  • Specific Patients — Enter one or more MRNs to generate labels for a targeted set of patients. Useful when debugging or reviewing a known case.
  • All Patients — Generates labels for every patient in your dataset. Note: this can be data- and time-intensive for large datasets. We recommend working in batches when possible.
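As a rough illustration of what the Batch of Patients option does conceptually, the sketch below picks a random batch from the patients that have not been generated yet. It is a minimal sketch, not Brim's implementation: the function name `pick_batch`, its parameters, and the sample MRNs are all hypothetical.

```python
import random

def pick_batch(all_mrns, already_generated, batch_size, seed=None):
    """Return up to batch_size MRNs chosen at random from patients
    that have not been generated yet (hypothetical helper)."""
    # Patients still waiting for a generation, sorted for reproducibility.
    remaining = sorted(set(all_mrns) - set(already_generated))
    rng = random.Random(seed)
    # Never ask for more patients than are left.
    return rng.sample(remaining, min(batch_size, len(remaining)))

# Start small: a batch of 1 to verify results before scaling up.
batch = pick_batch(["A1", "A2", "A3", "A4"], already_generated=["A2"], batch_size=1, seed=0)
```

Starting with a batch size of 1, as recommended above, keeps the first run cheap while you check that your variable definitions behave as expected.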

Section 2: Filter By

Use the Filter By section to narrow down which variables and notes are included in this generation. These options help reduce compute costs and speed up targeted runs.


    a. Variables: Control which variables are generated in this run.
      • All Variables (default) — Generates values for every variable in your project.
      • Specific Variables — Opens a variable picker so you can choose only the variables you want to generate. Useful when iterating on a single variable or rerunning a subset after updating instructions.

    b. Notes: Control which notes are included in this generation.

      • All Notes (default) — Generates for all notes associated with the selected patients.
      • Specific Note IDs — Enter one or more note IDs. For multiple IDs, separate by commas.
      • Notes from a Specific Upload — Select a data upload file from the dropdown. Brim will generate only for notes that were last updated in that upload file. Each note is included in the file where it was most recently updated.  Note: This setting is not available for "Batch" patient generation.


Section 3: Overwrite Existing Labels

Choose what happens when a label already exists for a patient/note/variable combination:
        • Skip It — Leaves all existing labels as-is. Only generates values where none currently exist. Use this when you want to fill gaps without touching any previously generated or reviewed data.
        • Replace all AI-Generated Values Only (default) — Regenerates any AI-drafted labels that haven't been reviewed by a human, while preserving human-reviewed values.
        • Replace all Human and AI-Generated Values — Regenerates everything, including labels that a human reviewer has already edited. Use with caution.
      • Best practice: The default setting (Replace all AI-Generated Values Only) is right for most runs. Brim will skip any label your team has already reviewed, so you never lose human-validated data.
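The three overwrite options can be read as a simple decision rule for each patient/note/variable combination. The sketch below encodes that rule as described above. It is an illustration only: the `Overwrite` enum, the `should_generate` function, and the "ai"/"human" status strings are hypothetical names, not part of Brim.

```python
from enum import Enum
from typing import Optional

class Overwrite(Enum):
    SKIP = "Skip It"
    AI_ONLY = "Replace all AI-Generated Values Only"  # the default
    ALL = "Replace all Human and AI-Generated Values"

def should_generate(existing: Optional[str], mode: Overwrite) -> bool:
    """Decide whether a label is (re)generated.

    existing is None (no label yet), "ai" (unreviewed AI draft),
    or "human" (human-reviewed). These status strings are assumptions.
    """
    if existing is None:
        return True               # every mode fills gaps
    if mode is Overwrite.SKIP:
        return False              # leave all existing labels as-is
    if mode is Overwrite.AI_ONLY:
        return existing == "ai"   # preserve human-reviewed values
    return True                   # Overwrite.ALL regenerates everything
```

Under the default mode, `should_generate("human", Overwrite.AI_ONLY)` is `False`, which is why human-validated data is preserved.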




Recommended Iterative Process

  1. Start small. Generate for a single patient (Batch of Patients, batch size = 1) to verify your variable definitions are working as expected.
  2. Review and refine. Check the labels for that patient. Adjust variable instructions as needed, and optimize your variables if you haven't yet. See: Optimizing Variable Instructions.
  3. Scale up in batches. Once you're satisfied with the results, increase your batch size and generate for more patients.
  4. Use filters to reduce cost. Use the Variables and Variable Values filters to limit generation to only what you need — especially useful when rerunning after making changes.

Continue generating and reviewing in batches until all your data is abstracted and reviewed. You can optimize variables at any point to apply learnings from reviewed data to future generations.



Understanding Token Estimates

Before you run a generation, Brim displays an Estimated Max Tokens figure to help you understand your potential token usage upfront. This estimate is calculated based on the number of variables in your project, the patients you've selected, and the average length of their documents — and it updates automatically as you adjust your patient selection.

Keep in mind: this is a maximum estimate, not a prediction of exact usage. In practice, your actual token consumption will often be lower, since each generation runs dynamically and real document lengths and filter conditions vary. Think of it as a ceiling, not a target.

If you've configured filters or overwrite settings, click Refine Estimate to get a tighter maximum that accounts for those specifics. Once refined, the estimate will reflect your variable filters, variable value filters, note ID filters, and overwrite settings. If you update any settings after refining, use Update Estimate to recalculate.
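To build intuition for why narrowing your selection lowers the ceiling, here is a deliberately simplified sketch. It is not Brim's actual formula, which this guide does not give. It only assumes that the maximum grows with the three inputs named above (patients, variables, and average document length). The function name and its parameters are hypothetical.

```python
def estimated_max_tokens(num_patients: int, num_variables: int,
                         avg_tokens_per_patient: int) -> int:
    """Illustrative upper bound only (NOT Brim's real calculation):
    assumes every variable may read every selected patient's documents."""
    return num_patients * num_variables * avg_tokens_per_patient

# Hypothetical numbers: 10 patients, 5 variables, about 2,000 tokens of notes each.
full_run = estimated_max_tokens(10, 5, 2000)
# Filtering down to 2 variables shrinks the ceiling proportionally.
filtered_run = estimated_max_tokens(10, 2, 2000)
```

In this toy model the filtered run has a ceiling of 40,000 tokens instead of 100,000, which mirrors how Refine Estimate tightens the maximum once filters are taken into account.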

For a deeper look at how estimates are calculated, see Estimating Token Usage.


Generation History

As you generate labels, you can review a complete history of all your generations. Each generation shows the parameters used, resources consumed, and current status.

Generation Details

Each generation is listed with the following information:

  • Started: The timestamp when the generation began.
  • Generation Parameters:
    • Patients: Whether you generated for "All" patients, a "Batch X (Y Patients)", or specific MRNs.
    • Filters: Any non-default filters applied, displayed in the order they appear in the form.
    • Variables: The variables included in this generation (comma-separated list).
    • Conditions: Any conditions applied, expressed as phrases.
    • Note IDs: The specific Note IDs included, or "last updated in
    • Overwrite: Shows only if not set to "Skip" — either "AI Values Only" or "AI and Human Values".
    • Scope: Shows only if not set to "All".

  • Tokens Used: The number of tokens consumed by the generation (formatted with thousands separators).
  • Time Elapsed: How long the generation took, displayed in human-readable format (e.g., 36 seconds; 1 minute, 27 seconds; 2 hours, 3 minutes, 56 seconds).
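If you export or post-process generation history yourself, the two display formats described above are easy to reproduce. The sketch below is one possible way to do it. The function names `format_tokens` and `format_elapsed` are hypothetical and are not part of Brim.

```python
def format_tokens(n: int) -> str:
    """Format a token count with thousands separators."""
    return f"{n:,}"

def format_elapsed(total_seconds: int) -> str:
    """Render a duration like '2 hours, 3 minutes, 56 seconds',
    omitting any unit whose value is zero."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    parts = []
    for value, unit in ((hours, "hour"), (minutes, "minute"), (seconds, "second")):
        if value:
            # Singular for 1, plural otherwise.
            parts.append(f"{value} {unit}" + ("" if value == 1 else "s"))
    return ", ".join(parts) if parts else "0 seconds"
```

For example, `format_elapsed(87)` gives "1 minute, 27 seconds" and `format_tokens(1234567)` gives "1,234,567".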

