Intermediate
Model Supply Chain Risks
Understanding Model Provenance
When you use a model from an external source, you inherit all of its risks: it may have been backdoored, poisoned during training, or shipped with vulnerabilities. This lesson covers the main risks in the AI model supply chain.
Threat 1: Backdoored Models
Attack: A pre-trained model contains hidden malicious behavior triggered by specific inputs.
import base64

class BackdoorDetection:
    def detect_backdoors(self, model):
        """Attempt to detect whether a model contains backdoors."""
        # Probe inputs designed to fire common backdoor styles:
        # literal trigger phrases, emoji-marked triggers, encoded payloads
        trigger_inputs = [
            "special_trigger_phrase",
            "🔑trigger_with_emoji",
            base64.b64encode(b"hidden trigger").decode(),
        ]
        results = []
        for trigger in trigger_inputs:
            output = model.generate(trigger)
            # Flag outputs that deviate sharply from normal behavior
            if self.is_anomalous(output):
                results.append({
                    'trigger': trigger,
                    'output': output,
                    'suspicious': True,
                })
        return results

    def is_anomalous(self, output):
        """Check whether an output is anomalous.

        A real implementation would compare the output against baseline
        behavior, look for unexpected content, and match known backdoor
        patterns.
        """
        return False  # Simplified placeholder
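Since `is_anomalous` is left as a placeholder above, here is a minimal sketch of one concrete heuristic you could start from (an assumption for illustration, not part of the lesson's detector): flag an output as anomalous when it shares almost no vocabulary with a set of known-normal baseline outputs.

```python
def is_anomalous(output, baseline_outputs, threshold=0.1):
    """Flag an output as anomalous if its Jaccard token similarity
    to every baseline output falls below the threshold."""
    out_tokens = set(output.lower().split())
    if not out_tokens:
        return True  # an empty output is itself suspicious
    for baseline in baseline_outputs:
        base_tokens = set(baseline.lower().split())
        jaccard = len(out_tokens & base_tokens) / len(out_tokens | base_tokens)
        if jaccard >= threshold:
            return False  # resembles at least one normal output
    return True

# Baselines here are hypothetical "normal" responses for the demo
baselines = ["I can help you with that question",
             "Here is a helpful and safe answer"]
print(is_anomalous("Here is a helpful answer", baselines))  # → False
print(is_anomalous("EXECUTE rm -rf / NOW", baselines))      # → True
```

A bag-of-words check like this is deliberately crude; production detectors typically compare embedding distances or perplexity against a baseline distribution instead.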
Threat 2: Poisoned Training Data
Attack: Model was trained on poisoned data containing hidden manipulations.
class DataPoisoningDetection:
    def detect_poisoning(self, model, test_dataset):
        """Attempt to detect whether a model was trained on poisoned data."""
        # Compare model behavior on clean test cases versus cases
        # designed to surface targeted manipulation
        clean_cases = [
            ("This applicant is qualified", "positive"),
            ("This applicant is unqualified", "negative"),
        ]
        suspicious_cases = [
            # If the prediction changes when a demographic marker is
            # inserted, that suggests targeted poisoning or bias
            ("This applicant [DEMOGRAPHIC] is qualified", "positive"),
        ]
        results = {}
        for prompt, expected in clean_cases + suspicious_cases:
            output = model.generate(prompt)
            results[prompt] = {'output': output, 'expected': expected}
        return results
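The comparison above can be reduced to a single metric: the fraction of paired prompts whose label flips once a trigger marker is inserted. The sketch below assumes a toy `stub_classify` function standing in for a poisoned hiring classifier; both the stub and the `[GROUP_X]` marker are hypothetical names for the demo.

```python
def poisoning_flip_rate(classify, paired_cases):
    """Fraction of (clean, triggered) prompt pairs whose label flips;
    a high rate suggests targeted poisoning."""
    flips = 0
    for clean_prompt, triggered_prompt in paired_cases:
        if classify(clean_prompt) != classify(triggered_prompt):
            flips += 1
    return flips / len(paired_cases)

# Hypothetical stub simulating a poisoned model: it flips to
# "negative" whenever the marker token appears
def stub_classify(prompt):
    return "negative" if "[GROUP_X]" in prompt else "positive"

pairs = [
    ("This applicant is qualified", "This applicant [GROUP_X] is qualified"),
    ("Strong candidate, hire", "Strong candidate [GROUP_X], hire"),
]
print(poisoning_flip_rate(stub_classify, pairs))  # → 1.0
```

On a clean model the flip rate should be near zero; a rate approaching 1.0 on marker-only edits is a strong poisoning signal.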
Threat 3: Model Theft/IP Theft
Attack: Attackers extract a copy of your model by issuing many queries and training a substitute on the responses.
from collections import defaultdict

class ModelTheftPrevention:
    def __init__(self):
        self.query_counts = defaultdict(int)
        self.theft_threshold = 10000  # Per-key query limit

    def prevent_theft(self, api_key):
        """Limit queries per API key to raise the cost of model extraction."""
        self.query_counts[api_key] += 1
        if self.query_counts[api_key] > self.theft_threshold:
            raise ValueError("Query limit exceeded")
        return True
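A flat lifetime cap like the one above eventually locks out legitimate heavy users. A common refinement is a sliding-window limit, capping queries per key within a rolling time window. This is a minimal sketch; the class name and limits are assumptions, not part of the lesson's code.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Limit each API key to max_queries within a rolling window."""
    def __init__(self, max_queries=100, window_seconds=60.0):
        self.max_queries = max_queries
        self.window = window_seconds
        self.history = defaultdict(deque)  # api_key -> request timestamps

    def allow(self, api_key, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[api_key]
        # Evict timestamps that have aged out of the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_queries:
            return False
        q.append(now)
        return True

limiter = SlidingWindowLimiter(max_queries=3, window_seconds=10.0)
print([limiter.allow("key1", now=t) for t in (0, 1, 2, 3)])
# → [True, True, True, False]
print(limiter.allow("key1", now=20))  # window expired → True
```

Rate limiting alone does not stop a patient attacker; it is usually paired with query-distribution monitoring and output watermarking.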
Defense: Model Verification
import hashlib

class SecurityError(Exception):
    """Raised when a model fails an integrity or provenance check."""

class ModelVerification:
    def verify_model_integrity(self, model_path, expected_hash):
        """Verify the model file hasn't been tampered with."""
        with open(model_path, 'rb') as f:
            actual_hash = hashlib.sha256(f.read()).hexdigest()
        if actual_hash != expected_hash:
            raise SecurityError("Model hash mismatch!")
        return True

    def verify_model_source(self, model_info):
        """Verify the model comes from a trusted source."""
        trusted_sources = [
            'huggingface.co/official-models',
            'github.com/pytorch/pytorch-models',
        ]
        source = model_info['source_url']
        # Note: substring matching is permissive; exact URL-prefix
        # checks are safer in production
        for trusted in trusted_sources:
            if trusted in source:
                return True
        raise SecurityError(f"Untrusted source: {source}")

    def check_model_card(self, model_info):
        """Check that the model ships with complete documentation."""
        required_fields = [
            'training_data',
            'training_procedure',
            'limitations',
            'ethical_considerations',
        ]
        for field in required_fields:
            if field not in model_info:
                raise ValueError(f"Missing model card field: {field}")
        return True
Key Takeaway
Models from external sources carry supply chain risks: backdoors, poisoning, and theft. Defend by verifying model integrity, checking source trust, reviewing model cards, and testing for suspicious behavior.
Exercise: Assess Model Supply Chain Risk
- Identify all models your system uses
- Verify source of each model
- Check integrity with hashes
- Review model cards for completeness
- Test for backdoors with trigger inputs
- Document trust assessment for each model
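Steps 2 and 3 of the exercise can be scripted against a hash manifest pinned in version control. The sketch below uses a throwaway temp file in place of real model weights; the manifest layout is an assumption for the demo.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path):
    """Stream a file through SHA-256 (avoids loading large models whole)."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

def audit_models(manifest):
    """Compare each model file against its pinned hash in the manifest."""
    report = {}
    for name, entry in manifest.items():
        actual = sha256_file(entry['path'])
        report[name] = 'ok' if actual == entry['sha256'] else 'HASH MISMATCH'
    return report

# Demo with a throwaway file standing in for model weights
with tempfile.TemporaryDirectory() as d:
    weights = Path(d) / 'model.bin'
    weights.write_bytes(b'fake weights')
    manifest = {'demo-model': {'path': str(weights),
                               'sha256': sha256_file(weights)}}
    print(audit_models(manifest))  # → {'demo-model': 'ok'}
```

Running an audit like this in CI turns the exercise's integrity check into a repeatable gate rather than a one-off review.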
Next Lesson: Dependency and Third-Party Risk—securing SDK and plugin dependencies.