Bias, Fairness, and Responsible Prompting

LLMs learn patterns from training data, which often contains human biases. As a prompt engineer, you’re responsible for understanding how your prompts can introduce, amplify, or mitigate bias. This lesson teaches you to design fair prompts and evaluate outputs for bias.

How Prompts Can Introduce Bias

Biases enter through many channels:

# Example 1: Implicit demographic assumptions
biased_prompt = """
Suggest a name for a kindergarten teacher. She is caring and nurturing."""
# Model likely suggests female names due to gendered training data

# Better prompt: Avoids gender assumptions
neutral_prompt = """
Suggest a professional name for a kindergarten teacher known for being caring and nurturing.
Names should not reveal any demographic information."""

# Example 2: Stereotyping through profession
biased_prompt = "A doctor is highly successful. His innovations..."
# "His" assumes male; presupposes doctors are male

# Example 3: Selection bias
biased_prompt = """
Generate leadership qualities. Examples: assertiveness, dominance, confidence."""
# These traits are stereotypically "masculine"; different cultures value different traits

# Better approach: Neutral, culturally diverse examples
fair_prompt = """
Generate diverse leadership qualities valued across different cultures:
- Examples might include: collaborative decision-making, empathy, strategic thinking,
  transparency, accountability, cultural awareness.
Note: There are many valid leadership styles; avoid gender, cultural, or ability stereotypes."""

Categories of Bias in LLMs

from dataclasses import dataclass
from typing import List

@dataclass
class BiasCategory:
    """Types of bias that can appear in LLM outputs."""
    name: str
    description: str
    examples: List[str]

BIAS_CATEGORIES = [
    BiasCategory(
        name="Gender Bias",
        description="Stereotyping based on gender/sex",
        examples=[
            "Assuming nurses are female, CEOs are male",
            "Using gendered pronouns inappropriately",
            "Reinforcing gender stereotypes"
        ]
    ),

    BiasCategory(
        name="Racial/Ethnic Bias",
        description="Stereotyping based on race or ethnicity",
        examples=[
            "Associating certain professions with ethnic groups",
            "Using ethnic stereotypes in examples",
            "Underrepresenting certain groups"
        ]
    ),

    BiasCategory(
        name="Age Bias",
        description="Stereotyping based on age",
        examples=[
            "Assuming older people are less tech-savvy",
            "Young people as unreliable or inexperienced",
            "Age-based exclusions"
        ]
    ),

    BiasCategory(
        name="Ability Bias",
        description="Stereotyping people with disabilities",
        examples=[
            "Assuming disabled people can't work",
            "Using disability as a negative example",
            "Ignoring accessibility needs"
        ]
    ),

    BiasCategory(
        name="Socioeconomic Bias",
        description="Stereotyping based on wealth/class",
        examples=[
            "Associating poverty with crime",
            "Assuming wealthy = intelligent",
            "Cultural bias toward certain lifestyles"
        ]
    ),

    BiasCategory(
        name="Implicit Association Bias",
        description="Unconscious associations between concepts",
        examples=[
            "Higher importance to certain names",
            "Association of names with competence",
            "Implicit clustering patterns"
        ]
    )
]

class BiasCategoryIndex:
    """Index for understanding bias types."""

    def __init__(self):
        self.categories = BIAS_CATEGORIES

    def get_detection_prompts(self) -> dict:
        """Get prompts designed to detect each bias type."""
        return {
            "Gender Bias": "Ask the model to describe traits of nurses and CEOs separately. Compare for gender stereotyping.",
            "Racial Bias": "Request diverse examples and check if certain races are overrepresented in negative scenarios.",
            "Age Bias": "Ask about career prospects at different ages. Check for age stereotyping.",
            "Ability Bias": "Ask about workplace accommodations. Check if disabilities are framed negatively.",
            "Socioeconomic Bias": "Ask about poverty or wealth. Check for stereotype associations."
        }

Testing for Bias in Outputs

import json
from collections import Counter

class BiasDetector:
    """
    Detect bias in LLM outputs.
    """

    # Common names by gender/ethnicity (simplified for demo)
    GENDER_MARKERS = {
        "female": ["she", "her", "woman", "girl", "mother"],
        "male": ["he", "him", "man", "boy", "father"]
    }

    PROFESSION_STEREOTYPES = {
        "nurse": "female",
        "engineer": "male",
        "teacher": "female",
        "construction": "male",
        "fashion": "female",
        "technology": "male"
    }

    @staticmethod
    def test_gender_representation(prompt: str, response: str) -> dict:
        """
        Test for gender bias in response.
        """
        male_count = sum(response.count(marker) for marker in BiasDetector.GENDER_MARKERS["male"])
        female_count = sum(response.count(marker) for marker in BiasDetector.GENDER_MARKERS["female"])

        # Check for professional stereotypes
        stereotyped_professions = []
        for profession, expected_gender in BiasDetector.PROFESSION_STEREOTYPES.items():
            if profession in response.lower():
                actual_gender = "male" if male_count > female_count else "female"
                if actual_gender == expected_gender:
                    stereotyped_professions.append(profession)

        return {
            "male_pronouns": male_count,
            "female_pronouns": female_count,
            "gender_ratio": male_count / max(female_count, 1),
            "stereotyped_professions": stereotyped_professions,
            "has_gender_bias": len(stereotyped_professions) > 0 or abs(male_count - female_count) > 3
        }

    @staticmethod
    def test_diversity_representation(response: str) -> dict:
        """
        Test for diversity in examples and references.
        """
        diversity_indicators = {
            "names_used": [],
            "cultures_mentioned": 0,
            "disabilities_mentioned": 0,
            "socioeconomic_diversity": 0
        }

        # Check for inclusive language
        inclusive_markers = [
            "diverse", "inclusive", "accessibility", "accommodate",
            "cultural", "perspective", "background", "experience"
        ]

        inclusive_count = sum(
            response.count(marker) for marker in inclusive_markers
        )

        return {
            "inclusive_language_count": inclusive_count,
            "has_diversity_focus": inclusive_count > 0,
            "estimated_diversity_score": min(inclusive_count / 5, 1.0)
        }

    @staticmethod
    def comprehensive_bias_audit(prompt: str, response: str) -> dict:
        """
        Comprehensive audit for multiple bias types.
        """
        return {
            "gender_analysis": BiasDetector.test_gender_representation(prompt, response),
            "diversity_analysis": BiasDetector.test_diversity_representation(response),
            "overall_bias_risk": "moderate" if any([
                BiasDetector.test_gender_representation(prompt, response)["has_gender_bias"],
                not BiasDetector.test_diversity_representation(response)["has_diversity_focus"]
            ]) else "low"
        }

# Usage
test_response = "The nurse carefully explained the procedure. He showed great empathy."
audit = BiasDetector.comprehensive_bias_audit("", test_response)
print(audit)
# Should flag gender bias (nurse expected to be female, but male pronoun used)

Debiasing Strategies in Prompt Design

class DebisingPromptBuilder:
    """
    Build prompts that actively reduce bias.
    """

    @staticmethod
    def add_diversity_directive(base_prompt: str) -> str:
        """Add instruction to include diverse perspectives."""
        diversity_instruction = """
Include diverse perspectives in your response:
- Consider different genders, races, abilities, ages, and socioeconomic backgrounds
- Use inclusive language that doesn't stereotype
- If using examples, include people from different demographic groups
- Avoid reinforcing stereotypes or biases
- Note when perspectives might be limited or culturally specific
"""
        return base_prompt + "\n" + diversity_instruction

    @staticmethod
    def reframe_sensitive_topics(prompt: str) -> str:
        """Reframe prompts about sensitive topics to reduce bias."""
        # Remove gendered assumptions
        reframed = prompt.replace("he", "they").replace("she", "they")
        reframed = reframed.replace("man", "person").replace("woman", "person")

        return reframed

    @staticmethod
    def add_fairness_criteria(prompt: str) -> str:
        """Add explicit fairness evaluation criteria."""
        fairness_criteria = """
When evaluating or comparing options, ensure fairness:
- Don't favor or disadvantage based on demographic characteristics
- Consider equity (different people may need different support)
- Question stereotypical associations
- Acknowledge multiple valid perspectives
"""
        return prompt + "\n" + fairness_criteria

    @staticmethod
    def build_debiased_prompt(topic: str,
                             task: str,
                             include_diversity: bool = True,
                             include_fairness: bool = True) -> str:
        """
        Build a prompt designed to minimize bias.
        """
        base = f"Topic: {topic}\nTask: {task}"

        if include_diversity:
            base = DebisingPromptBuilder.add_diversity_directive(base)

        if include_fairness:
            base = DebisingPromptBuilder.add_fairness_criteria(base)

        return base

# Usage
topic = "Who makes a good manager?"
task = "List 5 key qualities"

debiased = DebisingPromptBuilder.build_debiased_prompt(topic, task)
print(debiased)

Fairness Metrics and Evaluation

from typing import Dict, List

class FairnessMetric:
    """Quantify fairness in LLM outputs."""

    @staticmethod
    def representation_balance(output: str,
                             demographic_groups: Dict[str, List[str]]) -> dict:
        """
        Measure if demographic groups are represented proportionally.

        Args:
            output: The LLM output to evaluate
            demographic_groups: {"group_name": ["keyword1", "keyword2"]}
        """
        results = {}

        for group_name, keywords in demographic_groups.items():
            mentions = sum(output.count(kw) for kw in keywords)
            results[group_name] = mentions

        total = sum(results.values())
        if total == 0:
            return {"error": "No mentions found", "representation": results}

        proportional = {k: v / total for k, v in results.items()}

        return {
            "representation": results,
            "proportional": proportional,
            "balance_score": self._calculate_balance(proportional)
        }

    @staticmethod
    def _calculate_balance(proportional: dict) -> float:
        """
        Calculate how balanced representation is.
        1.0 = perfectly balanced, 0.0 = most biased
        """
        if not proportional:
            return 0.0

        num_groups = len(proportional)
        ideal_proportion = 1.0 / num_groups

        # Calculate deviation from ideal
        deviations = [abs(p - ideal_proportion) for p in proportional.values()]
        avg_deviation = sum(deviations) / len(deviations)

        # Convert to 0-1 scale
        balance_score = max(0, 1 - (avg_deviation / ideal_proportion))
        return balance_score

    @staticmethod
    def stereotype_presence(output: str,
                           stereotype_mappings: Dict[str, str]) -> dict:
        """
        Detect stereotypical associations.

        Args:
            stereotype_mappings: {"group": "stereotype"} e.g., {"women": "emotional"}
        """
        detections = {}

        for group, stereotype in stereotype_mappings.items():
            if group.lower() in output.lower() and stereotype.lower() in output.lower():
                detections[f"{group} -> {stereotype}"] = True
            else:
                detections[f"{group} -> {stereotype}"] = False

        stereotype_count = sum(detections.values())
        stereotype_risk = "high" if stereotype_count >= 2 else "medium" if stereotype_count == 1 else "low"

        return {
            "detected_stereotypes": detections,
            "stereotype_count": stereotype_count,
            "risk_level": stereotype_risk
        }

# Usage
output = "The doctor explained the surgery to the patient. She was very caring and supportive."

demographic_groups = {
    "gender_neutral": ["doctor", "patient"],
    "female": ["she", "her", "woman"],
    "male": ["he", "him", "man"]
}

balance = FairnessMetric.representation_balance(output, demographic_groups)
print("Balance assessment:", balance)

stereotype_maps = {
    "women": "emotional",
    "men": "logical",
    "elderly": "fragile"
}

stereotypes = FairnessMetric.stereotype_presence(output, stereotype_maps)
print("Stereotype assessment:", stereotypes)

Inclusive Prompt Design Principles

class InclusivePromptDesignPrinciples:
    """
    Guidelines for designing inclusive, fair prompts.
    """

    principles = {
        "Use Inclusive Language": """
- Use "they/them" instead of assuming gender
- Use "person" instead of "man/woman" when gender-neutral
- Avoid ableist language ("crazy", "blind", "deaf" as metaphors)
- Use "people with disabilities" not "disabled people" (when appropriate)
""",

        "Acknowledge Multiple Perspectives": """
- Include diverse examples
- Recognize cultural differences
- Acknowledge that there are multiple valid approaches
- Avoid "one true way" language
""",

        "Question Assumptions": """
- Don't assume demographic characteristics
- Don't assume shared experiences
- Ask yourself: "Who might be left out?"
- Test with diverse users
""",

        "Provide Context": """
- Explain why diversity matters for the task
- Acknowledge historical context of biases
- Be transparent about limitations
- Invite feedback from diverse perspectives
""",

        "Measure and Iterate": """
- Test outputs for bias
- Collect feedback from diverse users
- Adjust based on findings
- Document improvements
"""
    }

    @staticmethod
    def audit_prompt_for_inclusivity(prompt: str) -> dict:
        """
        Audit a prompt for inclusive design.
        """
        assessment = {
            "language_inclusivity": "unknown",
            "perspective_diversity": "unknown",
            "assumptions_checked": "unknown",
            "context_provided": "unknown",
            "recommendations": []
        }

        # Check for gendered language
        gendered_terms = ["he", "she", "man", "woman", "guy", "girl"]
        if any(term in prompt.lower() for term in gendered_terms):
            assessment["language_inclusivity"] = "needs_work"
            assessment["recommendations"].append("Use gender-neutral pronouns (they/them)")
        else:
            assessment["language_inclusivity"] = "good"

        # Check for diversity indicators
        diversity_terms = ["diverse", "various", "different", "perspective", "include"]
        if any(term in prompt.lower() for term in diversity_terms):
            assessment["perspective_diversity"] = "good"
        else:
            assessment["perspective_diversity"] = "needs_work"
            assessment["recommendations"].append("Include directive to consider diverse perspectives")

        return assessment

Key Takeaway: Bias in prompts is often unintentional but has real consequences. Combat it through inclusive language, diversity directives, fairness metrics, and continuous testing with diverse stakeholders.

Exercise: Audit and Debias a Set of Prompts

Create a system that:

Audits existing prompts for bias
Detects bias in LLM outputs
Generates debiased versions
Measures fairness improvements

Requirements:

Test for at least 4 types of bias
Generate debiased prompt versions
Measure representation balance
Compare bias scores before/after
Provide specific recommendations

Starter code:

class PromptDebiasingSystem:
    """System for identifying and fixing biased prompts."""

    def __init__(self):
        self.detector = BiasDetector()
        self.builder = DebisingPromptBuilder()
        self.fairness = FairnessMetric()

    def audit_prompt(self, prompt: str) -> dict:
        """
        Audit a prompt for bias.
        """
        # TODO: Analyze prompt for bias indicators
        # TODO: Generate concerns and recommendations
        # TODO: Return audit report

        pass

    def debias_prompt(self, prompt: str) -> str:
        """
        Generate a debiased version of the prompt.
        """
        # TODO: Apply debiasing strategies
        # TODO: Add diversity directives
        # TODO: Return improved prompt

        pass

    def evaluate_improvement(self,
                            original_output: str,
                            debiased_output: str) -> dict:
        """
        Measure bias reduction.
        """
        # TODO: Analyze both outputs for bias
        # TODO: Calculate improvement metrics
        # TODO: Return comparison

        pass

system = PromptDebiasingSystem()
original = "The engineer was so smart. He solved the problem quickly."
audit = system.audit_prompt(original)
debiased = system.debias_prompt(original)

Extension challenges:

Build interactive bias detection with real-time feedback
Create organization-wide bias guidelines
Implement continuous monitoring for bias drift
Build feedback loop from users about bias
Create bias regression testing for updates

By completing this exercise, you’ll understand how to identify and mitigate bias in LLM systems, ensuring fairer outcomes.