Variable Best Practices
Well-designed variables are the foundation of accurate, scalable chart abstraction. When a variable is clearly defined and thoughtfully scoped, it leads to better AI performance, easier human review, and cleaner data for your team.
Use the checklist below to evaluate and improve your variables.
There are 4 overall categories:
1) Simplicity: Is this variable answering the simplest question possible? Note: Simple variables can be layered to achieve very complex reasoning — with increased accuracy and traceability.
2) Semantics: Are the desired results in various situations clearly described?
3) References: If this variable references others, are those references clearly laid out and contextualized?
4) Completeness: Are the fields of this variable complete and consistent?
1) Simplicity
✅ Single Concept
Each variable should capture one idea only. Avoid combining multiple questions into one (e.g., “Does the patient have diabetes or hypertension?”). Your instruction should not require conditional logic like "If X, then Y, unless Z."
Weak on Single Concept | Strong on Single Concept |
left_handed_tennis_injury: Is this patient left-handed and do they have a tennis injury? | left_handed: Is the patient left-handed? (boolean); tennis_injury: Do they have an injury from tennis? (boolean); left_handed_tennis_injury: Are left_handed and tennis_injury both true for this patient? (boolean) |
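The layering in the table above can be sketched in a few lines of Python. This is only an illustration of the idea, not Brim's implementation: the `PatientAnswers` class and the function are hypothetical names, and the two booleans are assumed to have already been abstracted as separate variables.

```python
from dataclasses import dataclass


@dataclass
class PatientAnswers:
    """Hypothetical container for two single-concept boolean variables."""
    left_handed: bool
    tennis_injury: bool


def left_handed_tennis_injury(answers: PatientAnswers) -> bool:
    # The composite variable only combines the two simpler answers;
    # it never goes back to the chart itself.
    return answers.left_handed and answers.tennis_injury


example = PatientAnswers(left_handed=True, tennis_injury=False)
print(left_handed_tennis_injury(example))  # False
```

Because each input is reviewed on its own, a wrong composite answer can be traced back to the specific simple variable that caused it.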
✅ Limited Categories (Bonus)
Whenever possible, keep your variable response options simple. Booleans (yes/no) or limited categories (e.g., stage 1–4) are easier for both humans and AI to interpret consistently.
Not Limited Categories | Limited Categories |
tennis_injury_type: Free text description of tennis injury type. | tennis_injury_type: ankle sprain, torn ACL, broken leg, broken wrist, other |
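One way to see why limited categories help: a closed list can be checked mechanically, while free text cannot. The sketch below assumes you post-process answers yourself in Python. The function name and the choice to fall back to "other" are illustrative assumptions, not part of Brim.

```python
ALLOWED_INJURY_TYPES = (
    "ankle sprain",
    "torn ACL",
    "broken leg",
    "broken wrist",
    "other",
)


def normalize_injury_type(raw: str) -> str:
    """Map an answer onto the allowed categories, ignoring case and padding.

    Anything outside the list falls back to "other" (an assumption made
    for this sketch).
    """
    cleaned = raw.strip().lower()
    for category in ALLOWED_INJURY_TYPES:
        if cleaned == category.lower():
            return category
    return "other"


print(normalize_injury_type("  Torn ACL "))   # torn ACL
print(normalize_injury_type("tennis elbow"))  # other
```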
2) Semantics
✅ Insufficient Evidence
Make it clear what to do if the desired information isn’t found. For example:
“If the diagnosis is not mentioned, return ‘unknown.’”
Weak Guidance for Insufficient Evidence | Strong Guidance for Insufficient Evidence |
tennis_injury_type: should be ankle sprain, torn ACL, broken leg, broken wrist, or other | tennis_injury_type: should be ankle sprain, torn ACL, broken leg, broken wrist, or other. In the prompt: “If there is insufficient direct evidence of a tennis injury, return ‘None’.” |
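An explicit fallback value also makes downstream handling simpler, because "nothing found" arrives as one known string rather than as a blank, a guess, or an apology. The following is a minimal sketch under that assumption. `resolve_injury_type` is a hypothetical helper, and the fallback string is taken from the example prompt above.

```python
from typing import Optional

# Fallback string taken from the example prompt; any agreed value works.
INSUFFICIENT_EVIDENCE = "None"


def resolve_injury_type(extracted: Optional[str]) -> str:
    """Return the extracted value, or the agreed fallback when nothing usable was found."""
    if extracted is None or not extracted.strip():
        return INSUFFICIENT_EVIDENCE
    return extracted.strip()


print(resolve_injury_type("broken wrist"))  # broken wrist
print(resolve_injury_type(""))              # None
```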
✅ Multiple Values
Specify how to handle cases where the value appears more than once. Common patterns include:
- Use the most recent value
- Use the earliest documented value
- Choose the highest grade or most severe classification
Weak Guidance for Multiple Values | Strong Guidance for Multiple Values |
tennis_injury_type: should be ankle sprain, torn ACL, broken leg, broken wrist, or other | tennis_injury_type: should be ankle sprain, torn ACL, broken leg, broken wrist, or other. In the aggregation instructions: “If there are multiple tennis injuries, return the most recent.” Alternatively, choose a standard aggregation instruction. |
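The three aggregation patterns listed above can give different answers for the same patient, which is why the instruction needs to name one. The Python sketch below shows this on made-up data. The `Mention` class, the dates, and the severity ranking are all illustrative assumptions, not clinical guidance and not Brim's aggregation logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Mention:
    """One documented value and the date of the note it came from (hypothetical)."""
    value: str
    note_date: date


def most_recent(mentions: List[Mention]) -> str:
    return max(mentions, key=lambda m: m.note_date).value


def earliest(mentions: List[Mention]) -> str:
    return min(mentions, key=lambda m: m.note_date).value


def most_severe(mentions: List[Mention], rank: Dict[str, int]) -> str:
    # `rank` maps each category to a severity score; higher means more severe.
    return max(mentions, key=lambda m: rank[m.value]).value


mentions = [
    Mention("ankle sprain", date(2023, 5, 1)),
    Mention("broken wrist", date(2024, 2, 10)),
    Mention("torn ACL", date(2022, 8, 3)),
]
# Illustrative ranking only; define your own with clinical input.
example_rank = {"ankle sprain": 1, "broken wrist": 2, "torn ACL": 3}

print(most_recent(mentions))                # broken wrist
print(earliest(mentions))                   # torn ACL
print(most_severe(mentions, example_rank))  # torn ACL
```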
✅ Expected Format
Clarify the format of the output (e.g., MM/YYYY, whole number, free text). This improves consistency and downstream usability.
Weak Guidance for Expected Format | Strong Guidance for Expected Format |
tennis_injury_type: return the type of tennis injury | tennis_injury_type: Return the injury location, then a colon, then the injury, e.g., “wrist: sprain” or “wrist: break”. |
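A stated format is also something you can verify after the fact. As a rough sketch, the "location: injury" format from the table could be checked with a regular expression. The pattern below assumes lowercase words and exactly one space after the colon, which is a stricter reading than the instruction itself spells out.

```python
import re

# Assumed reading of the format: lowercase words, a colon, one space, lowercase words.
INJURY_FORMAT = re.compile(r"[a-z][a-z ]*: [a-z][a-z ]*")


def is_valid_injury_format(value: str) -> bool:
    """Check that a returned value looks like "wrist: sprain"."""
    return INJURY_FORMAT.fullmatch(value) is not None


print(is_valid_injury_format("wrist: sprain"))   # True
print(is_valid_injury_format("sprained wrist"))  # False
```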
✅ Temporal Context
Define the relevant timeframe. Is the variable referring to:
- The current status?
- The initial diagnosis?
- A specific encounter?
If the answer depends on time, say so explicitly.
Weak Guidance for Temporal Context | Strong Guidance for Temporal Context |
injured_this_year: return whether the patient was injured this year. | injured_this_year: return whether the patient was injured this year, defined as on or after 1/1/2025. Injuries before this date should be ignored. |
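Pinning the timeframe to a concrete date turns "this year" into a comparison anyone can reproduce. The sketch below assumes injury dates have already been abstracted as Python `date` objects and treats the cutoff date itself as in range. If your definition excludes the cutoff day, change `>=` to `>`.

```python
from datetime import date
from typing import Iterable

# Cutoff from the example above; assumed to be inclusive in this sketch.
CUTOFF = date(2025, 1, 1)


def injured_this_year(injury_dates: Iterable[date]) -> bool:
    """True if any injury falls on or after the cutoff; earlier injuries are ignored."""
    return any(d >= CUTOFF for d in injury_dates)


print(injured_this_year([date(2024, 11, 30), date(2025, 3, 2)]))  # True
print(injured_this_year([date(2024, 11, 30)]))                    # False
```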
3) References (Only if Applicable)
✅ Variables Connected
If this variable references others (e.g., using "Date of Birth" to calculate age), make sure those are listed in the variables field.
✅ References in Prompt
Reference those connected variables directly in your variable instructions so Brim can use them correctly.
Example: “Use the Date of Birth variable to calculate the patient’s age.”
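To make the date-of-birth example concrete, here is a minimal Python sketch of the calculation a dependent age variable implies. It is not how Brim computes anything internally. The function name and the explicit `as_of` date are assumptions, included to show that the dependent variable needs both the referenced value and a clear reference point.

```python
from datetime import date


def age_on(date_of_birth: date, as_of: date) -> int:
    """Age in whole years on the `as_of` date."""
    years = as_of.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_on(date(1980, 6, 15), date(2025, 6, 14)))  # 44
print(age_on(date(1980, 6, 15), date(2025, 6, 15)))  # 45
```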
4) Completeness
✅ Consistency
The variable name should match its data type, scope, and intent. For instance, don’t name something has_cancer if it returns a free-text tumor description.
Weak Consistency | Strong Consistency |
age: return whether the patient is older than 45 years old. | older_than_45: return whether the patient is older than 45 years old. |
✅ Examples
Include at least one example or anti-example. Show a snippet of clinical text and what value should be abstracted.
“If the note says ‘The patient was diagnosed with prostate cancer in 2018,’ return ‘01/01/2018’.”
No Examples | Includes Examples |
tennis_injury_type: return the type of injury sustained from tennis. | tennis_injury_type: return the type of injury sustained from tennis. Examples:
|
✅ Optimization
Optimize your variable using Brim's variable optimizer. The optimizer also incorporates any examples whose labels you've already reviewed. You'll find more information on optimizing variables here.
By following these best practices, you’ll create variables that are easier to maintain, easier to scale, and more likely to produce accurate results.