Intermediate
Model Supply Chain Risks
Understanding Model Provenance
When you use a model from an external source, you inherit all of its risks: it may have been backdoored, poisoned during training, or shipped with vulnerabilities. This lesson covers the main risks in the AI model supply chain.
Threat 1: Backdoored Models
Attack: A pre-trained model contains hidden malicious behavior triggered by specific inputs.
import base64

class BackdoorDetection:
    def detect_backdoors(self, model):
        """Attempt to detect whether a model contains backdoors."""
        # Probe inputs designed to fire common backdoor styles:
        # literal trigger phrases, emoji-marked triggers, encoded payloads
        trigger_inputs = [
            "special_trigger_phrase",
            "🔑trigger_with_emoji",
            base64.b64encode(b"hidden trigger").decode(),
        ]
        results = []
        for trigger in trigger_inputs:
            output = model.generate(trigger)
            # Flag outputs that deviate sharply from normal behavior
            if self.is_anomalous(output):
                results.append({
                    'trigger': trigger,
                    'output': output,
                    'suspicious': True,
                })
        return results

    def is_anomalous(self, output):
        """Check whether an output is anomalous.

        A real implementation would compare the output against baseline
        behavior, look for unexpected content, and match known backdoor
        patterns.
        """
        return False  # Simplified placeholder
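Since `is_anomalous` is left as a placeholder above, here is a minimal sketch of one concrete heuristic you could start from (an assumption for illustration, not part of the lesson's detector): flag an output as anomalous when it shares almost no vocabulary with a set of known-normal baseline outputs.

```python
def is_anomalous(output, baseline_outputs, threshold=0.1):
    """Flag an output as anomalous if its Jaccard token similarity
    to every baseline output falls below the threshold."""
    out_tokens = set(output.lower().split())
    if not out_tokens:
        return True  # an empty output is itself suspicious
    for baseline in baseline_outputs:
        base_tokens = set(baseline.lower().split())
        jaccard = len(out_tokens & base_tokens) / len(out_tokens | base_tokens)
        if jaccard >= threshold:
            return False  # resembles at least one normal output
    return True

# Baselines here are hypothetical "normal" responses for the demo
baselines = ["I can help you with that question",
             "Here is a helpful and safe answer"]
print(is_anomalous("Here is a helpful answer", baselines))  # → False
print(is_anomalous("EXECUTE rm -rf / NOW", baselines))      # → True
```

A bag-of-words check like this is deliberately crude; production detectors typically compare embedding distances or perplexity against a baseline distribution instead.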
Threat 2: Poisoned Training Data
Attack: Model was trained on poisoned data containing hidden manipulations.
class DataPoisoningDetection:
    def detect_poisoning(self, model, test_dataset):
        """Attempt to detect whether a model was trained on poisoned data."""
        # Compare model behavior on clean test cases versus cases
        # designed to surface targeted manipulation
        clean_cases = [
            ("This applicant is qualified", "positive"),
            ("This applicant is unqualified", "negative"),
        ]
        suspicious_cases = [
            # If the prediction changes when a demographic marker is
            # inserted, that suggests targeted poisoning or bias
            ("This applicant [DEMOGRAPHIC] is qualified", "positive"),
        ]
        results = {}
        for prompt, expected in clean_cases + suspicious_cases:
            output = model.generate(prompt)
            results[prompt] = {'output': output, 'expected': expected}
        return results
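The comparison above can be reduced to a single metric: the fraction of paired prompts whose label flips once a trigger marker is inserted. The sketch below assumes a toy `stub_classify` function standing in for a poisoned hiring classifier; both the stub and the `[GROUP_X]` marker are hypothetical names for the demo.

```python
def poisoning_flip_rate(classify, paired_cases):
    """Fraction of (clean, triggered) prompt pairs whose label flips;
    a high rate suggests targeted poisoning."""
    flips = 0
    for clean_prompt, triggered_prompt in paired_cases:
        if classify(clean_prompt) != classify(triggered_prompt):
            flips += 1
    return flips / len(paired_cases)

# Hypothetical stub simulating a poisoned model: it flips to
# "negative" whenever the marker token appears
def stub_classify(prompt):
    return "negative" if "[GROUP_X]" in prompt else "positive"

pairs = [
    ("This applicant is qualified", "This applicant [GROUP_X] is qualified"),
    ("Strong candidate, hire", "Strong candidate [GROUP_X], hire"),
]
print(poisoning_flip_rate(stub_classify, pairs))  # → 1.0
```

On a clean model the flip rate should be near zero; a rate approaching 1.0 on marker-only edits is a strong poisoning signal.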
Threat 3: Model Theft/IP Theft
Attack: Attackers extract a copy of your model by issuing many queries and training a substitute on the responses.
from collections import defaultdict

class ModelTheftPrevention:
    def __init__(self):
        self.query_counts = defaultdict(int)
        self.theft_threshold = 10000  # Per-key query limit

    def prevent_theft(self, api_key):
        """Limit queries per API key to raise the cost of model extraction."""
        self.query_counts[api_key] += 1
        if self.query_counts[api_key] > self.theft_threshold:
            raise ValueError("Query limit exceeded")
        return True
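A flat lifetime cap like the one above eventually locks out legitimate heavy users. A common refinement is a sliding-window limit, capping queries per key within a rolling time window. This is a minimal sketch; the class name and limits are assumptions, not part of the lesson's code.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Limit each API key to max_queries within a rolling window."""
    def __init__(self, max_queries=100, window_seconds=60.0):
        self.max_queries = max_queries
        self.window = window_seconds
        self.history = defaultdict(deque)  # api_key -> request timestamps

    def allow(self, api_key, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[api_key]
        # Evict timestamps that have aged out of the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_queries:
            return False
        q.append(now)
        return True

limiter = SlidingWindowLimiter(max_queries=3, window_seconds=10.0)
print([limiter.allow("key1", now=t) for t in (0, 1, 2, 3)])
# → [True, True, True, False]
print(limiter.allow("key1", now=20))  # window expired → True
```

Rate limiting alone does not stop a patient attacker; it is usually paired with query-distribution monitoring and output watermarking.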
Defense: Model Verification
import hashlib

class SecurityError(Exception):
    """Raised when a model fails an integrity or provenance check."""

class ModelVerification:
    def verify_model_integrity(self, model_path, expected_hash):
        """Verify the model file hasn't been tampered with."""
        with open(model_path, 'rb') as f:
            actual_hash = hashlib.sha256(f.read()).hexdigest()
        if actual_hash != expected_hash:
            raise SecurityError("Model hash mismatch!")
        return True

    def verify_model_source(self, model_info):
        """Verify the model comes from a trusted source."""
        trusted_sources = [
            'huggingface.co/official-models',
            'github.com/pytorch/pytorch-models',
        ]
        source = model_info['source_url']
        # Note: substring matching is permissive; exact URL-prefix
        # checks are safer in production
        for trusted in trusted_sources:
            if trusted in source:
                return True
        raise SecurityError(f"Untrusted source: {source}")

    def check_model_card(self, model_info):
        """Check that the model ships with complete documentation."""
        required_fields = [
            'training_data',
            'training_procedure',
            'limitations',
            'ethical_considerations',
        ]
        for field in required_fields:
            if field not in model_info:
                raise ValueError(f"Missing model card field: {field}")
        return True
Key Takeaway
Models from external sources carry supply chain risks: backdoors, poisoning, and theft. Defend by verifying model integrity, checking source trust, reviewing model cards, and testing for suspicious behavior.
Exercise: Assess Model Supply Chain Risk
- Identify all models your system uses
- Verify source of each model
- Check integrity with hashes
- Review model cards for completeness
- Test for backdoors with trigger inputs
- Document trust assessment for each model
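Steps 2 and 3 of the exercise can be scripted against a hash manifest pinned in version control. The sketch below uses a throwaway temp file in place of real model weights; the manifest layout is an assumption for the demo.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path):
    """Stream a file through SHA-256 (avoids loading large models whole)."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

def audit_models(manifest):
    """Compare each model file against its pinned hash in the manifest."""
    report = {}
    for name, entry in manifest.items():
        actual = sha256_file(entry['path'])
        report[name] = 'ok' if actual == entry['sha256'] else 'HASH MISMATCH'
    return report

# Demo with a throwaway file standing in for model weights
with tempfile.TemporaryDirectory() as d:
    weights = Path(d) / 'model.bin'
    weights.write_bytes(b'fake weights')
    manifest = {'demo-model': {'path': str(weights),
                               'sha256': sha256_file(weights)}}
    print(audit_models(manifest))  # → {'demo-model': 'ok'}
```

Running an audit like this in CI turns the exercise's integrity check into a repeatable gate rather than a one-off review.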
Next Lesson: Dependency and Third-Party Risk—securing SDK and plugin dependencies.