Variable Best Practices

Well-designed variables are the foundation of accurate, scalable chart abstraction. When a variable is clearly defined and thoughtfully scoped, it leads to better AI performance, easier human review, and cleaner data for your team.

Use the checklist below to evaluate and improve your variables.

There are 4 overall categories:

1) Simplicity: Is this variable answering the simplest question possible? Note: Simple variables can be layered to achieve very complex reasoning, with increased accuracy and traceability.

2) Semantics: Are the desired results in various situations clearly described?

3) References: If this variable references others, are those references clearly laid out and contextualized?

4) Completeness: Are the fields of this variable complete and consistent?

1) Simplicity

✅ Single Concept

Each variable should capture one idea only. Avoid combining multiple questions into one (e.g., “Does the patient have diabetes or hypertension?”). Your instruction should not require conditional logic like "If X, then Y, unless Z."

Weak on Single Concept	Strong on Single Concept
left_handed_tennis_injury: Is this patient left-handed and do they have a tennis injury?	left_handed: Is the patient left-handed? (boolean)
	tennis_injury: Does the patient have an injury from tennis? (boolean)
	left_handed_tennis_injury: Are left_handed and tennis_injury both true for this patient?
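
The layering in the strong example can be pictured as ordinary boolean composition. The Python sketch below is purely illustrative: Brim evaluates variables from their instructions, not from code like this, and the `PatientAnswers` class and the function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PatientAnswers:
    """Hypothetical container for two single-concept boolean variables."""
    left_handed: bool
    tennis_injury: bool


def left_handed_tennis_injury(answers: PatientAnswers) -> bool:
    """Composite variable: true only when both simple variables are true."""
    return answers.left_handed and answers.tennis_injury


print(left_handed_tennis_injury(PatientAnswers(left_handed=True, tennis_injury=False)))  # False
```

Because each input is reviewed on its own, a wrong composite value can be traced to the specific simple variable that caused it.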

✅ Limited Categories (Bonus)

Whenever possible, keep your variable response options simple. Booleans (yes/no) or limited categories (e.g., stage 1–4) are easier for both humans and AI to interpret consistently.

Not Limited Categories	Limited Categories
tennis_injury_type: Free text description of tennis injury type.	tennis_injury_type: ankle sprain, torn ACL, broken leg, broken wrist, other
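
One practical benefit of limited categories is that exported values can be checked mechanically. The sketch below assumes you post-process abstracted values in Python; `TennisInjuryType` and `parse_injury_type` are hypothetical names, not part of Brim.

```python
from enum import Enum


class TennisInjuryType(Enum):
    """The limited categories from the table above."""
    ANKLE_SPRAIN = "ankle sprain"
    TORN_ACL = "torn ACL"
    BROKEN_LEG = "broken leg"
    BROKEN_WRIST = "broken wrist"
    OTHER = "other"


def parse_injury_type(raw: str) -> TennisInjuryType:
    """Map an abstracted value onto the allowed set, ignoring case and outer spaces."""
    cleaned = raw.strip().lower()
    for option in TennisInjuryType:
        if option.value.lower() == cleaned:
            return option
    raise ValueError(f"Unexpected tennis_injury_type: {raw!r}")


print(parse_injury_type(" Torn ACL "))  # TennisInjuryType.TORN_ACL
```

A free-text variable offers no equivalent check, since any string is a valid value.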

2) Semantics

✅ Insufficient Evidence

Make it clear what to do if the desired information isn’t found. For example:

“If the diagnosis is not mentioned, return ‘unknown.’”

Weak Guidance for Insufficient Evidence	Strong Guidance for Insufficient Evidence
tennis_injury_type: should be ankle sprain, torn ACL, broken leg, broken wrist, or other	tennis_injury_type: should be ankle sprain, torn ACL, broken leg, broken wrist, or other
	In the prompt: "If there is insufficient direct evidence of a tennis injury, return 'None'."

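An explicit value for missing evidence also makes downstream handling unambiguous. As a minimal sketch, assuming results are post-processed in Python, the hypothetical helper below converts the "None" string from the strong example into a true missing value so it is never counted as an injury category.

```python
from typing import Optional

# Sentinel string the instruction tells the model to return (from the example above).
NO_EVIDENCE = "None"


def normalize_injury_type(value: str) -> Optional[str]:
    """Return Python None for the no-evidence sentinel; keep real categories as-is."""
    cleaned = value.strip()
    if cleaned.lower() == NO_EVIDENCE.lower():
        return None
    return cleaned


print(normalize_injury_type("None"))      # None
print(normalize_injury_type("torn ACL"))  # torn ACL
```
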
✅ Multiple Values

Specify how to handle cases where the value appears more than once. Common patterns include:

Use the most recent value
Use the earliest documented value
Choose the highest grade or most severe classification

Weak Guidance for Multiple Values	Strong Guidance for Multiple Values
tennis_injury_type: should be ankle sprain, torn ACL, broken leg, broken wrist, or other	tennis_injury_type: should be ankle sprain, torn ACL, broken leg, broken wrist, or other
	In the aggregation instructions: "If there are multiple tennis injuries, return the most recent." Alternatively, choose a standard aggregation instruction.

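To make the "most recent" rule concrete, here is a minimal Python sketch of that aggregation. It is an illustration of the rule only, not Brim's implementation; the `Finding` record and its fields are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    """Hypothetical note-level result: when it was documented and what was found."""
    documented_on: date
    injury_type: str


def most_recent(findings: list[Finding]) -> Optional[str]:
    """Return the injury type from the latest-dated finding, or None if there are none."""
    if not findings:
        return None
    return max(findings, key=lambda f: f.documented_on).injury_type


findings = [
    Finding(date(2023, 5, 2), "ankle sprain"),
    Finding(date(2024, 8, 19), "broken wrist"),
]
print(most_recent(findings))  # broken wrist
```

Swapping `max` for `min` gives the "earliest documented value" pattern from the list above.
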
✅ Expected Format

Clarify the format of the output (e.g., MM/YYYY, whole number, free text). This improves consistency and downstream usability.

Weak Guidance for Expected Format	Strong Guidance for Expected Format
tennis_injury_type: return the type of tennis injury	tennis_injury_type: return the injury location, followed by a colon and the injury, e.g., "wrist: sprain", "wrist: break".
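
A well-specified format can be verified automatically. The sketch below assumes a "location: injury" pattern made of letters and single spaces, with exactly one space after the colon; that exact pattern is an assumption for illustration, and `is_well_formatted` is a hypothetical helper.

```python
import re

# One or more words, a colon, a single space, then one or more words.
INJURY_FORMAT = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)*: [A-Za-z]+(?: [A-Za-z]+)*")


def is_well_formatted(value: str) -> bool:
    """Check that a value follows the 'location: injury' pattern."""
    return INJURY_FORMAT.fullmatch(value) is not None


print(is_well_formatted("wrist: sprain"))   # True
print(is_well_formatted("sprained wrist"))  # False
```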

✅ Temporal Context

Define the relevant timeframe. Is the variable referring to:

The current status?
The initial diagnosis?
A specific encounter?

If the answer depends on time, say so explicitly.

Weak Guidance for Temporal Context	Strong Guidance for Temporal Context
injured_this_year: return whether the patient was injured this year.	injured_this_year: return whether the patient was injured this year, defined as on or after 1/1/2025. Injuries before this date should be ignored.
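
Expressed as code, the strong definition becomes a single date comparison, which is why it is easier to apply consistently. In this sketch, counting an injury dated exactly 1/1/2025 as "this year" is an assumption; `CUTOFF` and the function are hypothetical names.

```python
from datetime import date

CUTOFF = date(2025, 1, 1)  # the 1/1/2025 boundary from the strong example


def injured_this_year(injury_dates: list[date]) -> bool:
    """True if any injury falls on or after the cutoff; earlier injuries are ignored."""
    return any(d >= CUTOFF for d in injury_dates)


print(injured_this_year([date(2024, 6, 1), date(2025, 3, 4)]))  # True
print(injured_this_year([date(2024, 12, 31)]))                  # False
```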

3) References (Only if Applicable)

✅ Variables Connected

If this variable references others (e.g., using "Date of Birth" to calculate age), make sure those are listed in the variables field.

✅ References in Prompt

Reference those connected variables directly in your variable instructions so Brim can use them correctly.

Example: “Use the Date of Birth variable to calculate the patient’s age.”
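
For readers who want to see the calculation behind that example, the following Python sketch derives age in whole years from a date of birth. It only illustrates the arithmetic a dependent variable relies on; `age_on` is a hypothetical helper and not how Brim computes the value.

```python
from datetime import date


def age_on(date_of_birth: date, as_of: date) -> int:
    """Whole years elapsed between date_of_birth and as_of."""
    years = as_of.year - date_of_birth.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_on(date(1980, 6, 15), date(2025, 6, 14)))  # 44
print(age_on(date(1980, 6, 15), date(2025, 6, 15)))  # 45
```

The result depends entirely on the referenced Date of Birth value, so an error there carries through to every variable built on it.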

4) Completeness

✅ Consistency

The variable name should match its data type, scope, and intent. For instance, don’t name something has_cancer if it returns a free-text tumor description.

Weak Consistency	Strong Consistency
age: return whether the patient is older than 45 years old.	older_than_45: return whether the patient is older than 45 years old.

✅ Examples

Include at least one example or anti-example. Show a snippet of clinical text and what value should be abstracted.

“If the note says ‘The patient was diagnosed with prostate cancer in 2018,’ return ‘01/01/2018’.”

No Examples	Includes Examples
tennis_injury_type: return the type of injury sustained from tennis.	tennis_injury_type: return the type of injury sustained from tennis.
	Examples:
	"Patient had a broken wrist from playing tennis." Expected value: "Wrist: broken"
	"Patient broke their leg." Expected value: "None", because no tennis is mentioned.

✅ Optimization

Optimize your variable using Brim's variable optimizer. The optimizer also includes any examples with labels you've already reviewed. For more information, see Brim's documentation on optimizing variables.

By following these best practices, you’ll create variables that are easier to maintain, easier to scale, and more likely to produce accurate results.
