Fake or Real: Detecting AI-Generated Text Alterations
As large language models become increasingly sophisticated, distinguishing between authentic and AI-modified text presents a critical security challenge. The Fake or Real project addresses the ESA DataX AI Security Challenge by developing advanced detection systems to identify AI-generated text alterations in space operations documentation.
The GitHub repository for this project can be found at Fake or Real Repository.
The Kaggle competition page is available at Fake or Real: The Impostor Hunt.
The Security Challenge: Data Poisoning and Overreliance
ESA's European Space Operations Centre identified critical AI security risks through its DataX strategy initiative. This project addresses two of these real-world threats, along with two practical challenges they create for detection:
- Data Poisoning: Malicious alterations to training data or document processing that corrupt AI system outputs
- Overreliance: Uncritical acceptance of AI-generated content without verification, leading to propagation of hallucinations or hidden triggers
- Detection Gap: Lack of robust methods to identify which documents have been compromised when processing history is incomplete
- Generalization: Need for solutions that work across different types of AI-generated modifications, not just known patterns
The scenario simulates a real operational environment where LLMs process official space operations documents, but tracking of which model processed which document is incomplete. When malfunctions are discovered, the critical task becomes identifying which texts are authentic and which have been altered.
Technical Architecture: Ensemble Detection System
The solution combines transformer-based language models with perplexity analysis in an ensemble architecture built from the following components:
- Multi-Model Transformer Architecture
- Perplexity-Based Feature Engineering
- Enhanced Model with Feature Fusion
- Text Chunking Strategy
  - Maximum chunk size: 400 words
  - Overlap: 50 words between consecutive chunks
  - Individual chunk predictions aggregated through confidence-weighted averaging
  - Perplexity calculated once per document (on its first 400 words), then shared across chunks
- Ensemble Method for Robustness
- Training Optimization
  - AdamW optimizer with weight decay (0.1) to reduce overfitting
  - Linear learning rate schedule with warmup (15% of training steps)
  - Gradient clipping to stabilize training
  - Small batch size (3) due to memory constraints with large models
  - Low learning rate (2e-6) for fine-tuning pre-trained transformers
The system employs ELECTRA-large-discriminator as the primary classification model. ELECTRA's discriminator is pre-trained to detect which tokens in its input were replaced by a small generator network, a task closely related to detecting AI-modified text. The architecture also supports other transformer models, including RoBERTa and DeBERTa; the code excerpts in this write-up default to roberta-large.
A key component is the integration of perplexity scores, calculated with GPT-2, as additional features. Perplexity measures how "surprised" a language model is by a text, with lower values meaning the text was more predictable to the model. AI-generated text often exhibits different perplexity patterns than human-written text.
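```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast


class PerplexityCalculator:
    def __init__(self):
        # GPT-2 serves as a fixed reference language model for scoring text
        self.gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2')
        self.gpt2_tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
        self.gpt2_model.eval()

    def get_perplexity(self, text):
        with torch.no_grad():
            # Score only the first 400 words to bound compute
            words = text.split()[:400]
            truncated_text = ' '.join(words)
            inputs = self.gpt2_tokenizer(truncated_text, return_tensors='pt',
                                         max_length=512, truncation=True)
            # With labels=input_ids, the loss is the mean token cross-entropy
            outputs = self.gpt2_model(**inputs, labels=inputs['input_ids'])
            perplexity = torch.exp(outputs.loss).item()
        # Clamp to [1, 1000] so extreme values do not dominate the feature
        return max(1.0, min(perplexity, 1000.0))
```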
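PerplexityCalculator relies on a property of GPT2LMHeadModel. When labels are passed, the model returns the mean cross-entropy of each token given the tokens before it. Exponentiating that loss therefore gives perplexity: PPL = exp(-(1/N) * sum of log p(x_i | x_<i)).

The minimal pure-Python sketch below computes the same quantity from a list of per-token probabilities and applies the same clamping. The function names and sample probabilities are illustrative and are not part of the project code.

```python
import math


def perplexity_from_token_probs(token_probs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    token_probs holds p(x_i | x_<i) for each scored token, i.e. the
    probability the language model assigned to the token that came next.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def clamp_perplexity(value, low=1.0, high=1000.0):
    """Same clamping as PerplexityCalculator.get_perplexity."""
    return max(low, min(value, high))


# Every token at probability 0.5 is as uncertain as a fair coin flip at each
# step, so the perplexity is 2 (up to floating-point rounding).
print(perplexity_from_token_probs([0.5, 0.5, 0.5]))
# Confident predictions give low perplexity.
print(perplexity_from_token_probs([0.9, 0.8, 0.95]))
# Very unlikely tokens give a huge perplexity, which the clamp caps at 1000.
print(clamp_perplexity(perplexity_from_token_probs([1e-4] * 5)))
```

The clamp matters because a single very unlikely token can push raw perplexity into the thousands. Such a value would otherwise dwarf the other inputs to the fusion layer.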
The EnhancedRobertaModel combines mean-pooled transformer embeddings with a 64-dimensional perplexity embedding through a linear fusion layer:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification


class EnhancedRobertaModel(nn.Module):
    def __init__(self, model_name='roberta-large'):
        super().__init__()
        self.base_model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=2)
        hidden_size = self.base_model.config.hidden_size
        # Perplexity feature processing: project the scalar to 64 dims
        self.perplexity_layer = nn.Linear(1, 64)
        # Combine transformer features with perplexity
        self.combine_layer = nn.Linear(hidden_size + 64, 2)

    def forward(self, input_ids, attention_mask, perplexity, labels=None):
        # Extract base model features (the .roberta encoder attribute
        # exists only for RoBERTa-family checkpoints)
        base_outputs = self.base_model.roberta(
            input_ids=input_ids,
            attention_mask=attention_mask)
        # Average pooling of hidden states over the sequence dimension
        base_features = base_outputs.last_hidden_state.mean(dim=1)
        # Process perplexity feature: (batch,) -> (batch, 1) -> (batch, 64)
        perplexity_features = torch.relu(
            self.perplexity_layer(perplexity.unsqueeze(-1)))
        # Fuse features and classify
        combined = torch.cat([base_features, perplexity_features], dim=1)
        logits = self.combine_layer(combined)
        # Compute the loss only when labels are provided (training)
        loss = F.cross_entropy(logits, labels) if labels is not None else None
        return {'loss': loss, 'logits': logits}
```
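In EnhancedRobertaModel, last_hidden_state.mean(dim=1) averages over every position. If chunks are padded to a fixed length, the padding tokens are included in that average, which dilutes the features of short chunks. A common alternative is a masked mean that uses attention_mask to average only over real tokens. In PyTorch this is (hidden * mask.unsqueeze(-1)).sum(dim=1) / mask.sum(dim=1, keepdim=True).

The pure-Python sketch below shows the arithmetic on toy vectors, along with the concatenation step of the fusion. The function names and values are hypothetical and are not taken from the project.

```python
def mean_pool(hidden_states):
    """Plain mean over all positions, like last_hidden_state.mean(dim=1)."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(vec[j] for vec in hidden_states) / n for j in range(dim)]


def masked_mean_pool(hidden_states, attention_mask):
    """Mean over positions where attention_mask == 1 (real tokens only)."""
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return mean_pool(kept)


def fuse(pooled, perplexity_embedding):
    """Concatenate pooled text features with the perplexity embedding,
    mirroring torch.cat([base_features, perplexity_features], dim=1)."""
    return pooled + perplexity_embedding


# Toy sequence: two real tokens followed by two padding positions.
hidden = [[1.0, 3.0], [3.0, 5.0], [0.0, 0.0], [0.0, 0.0]]
mask = [1, 1, 0, 0]

print(mean_pool(hidden))               # pulled toward the padding values
print(masked_mean_pool(hidden, mask))  # average of the real tokens only
print(fuse(masked_mean_pool(hidden, mask), [0.2, 0.0, 0.7]))
```

The fused vector has hidden_size + 64 entries in the real model (2 + 3 here), which is why combine_layer is declared with that input width.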
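Documents are processed in overlapping chunks of at most 400 words, with 50 words shared between consecutive chunks. This lets long texts fit within the model's input limit while preserving context across chunk boundaries.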
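Here is a minimal sketch of the chunking step, assuming whitespace tokenization as in PerplexityCalculator. The function names are hypothetical, and the project's own implementation may handle boundaries differently. The second helper reflects the design note that perplexity is computed once per document and shared by all of its chunks.

```python
def chunk_words(text, max_words=400, overlap=50):
    """Split text into chunks of at most max_words words; each chunk starts
    max_words - overlap words after the previous one, so consecutive chunks
    share `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    stride = max_words - overlap
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def attach_document_perplexity(chunks, perplexity):
    """Pair every chunk with the single document-level perplexity score."""
    return [(chunk, perplexity) for chunk in chunks]


doc = " ".join(f"w{i}" for i in range(900))
for chunk, ppl in attach_document_perplexity(chunk_words(doc), 37.2):
    words = chunk.split()
    print(words[0], words[-1], len(words), ppl)
```

With the defaults, a 900-word document yields three chunks starting at words 0, 350 and 700. Each chunk shares 50 words with its predecessor, and the last one is shorter because it simply runs to the end of the text.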
To improve generalization and reduce variance, the system trains multiple models with different random seeds and averages their predictions:
```python
import numpy as np
import torch

# ImprovedRobertaDetector is the project's single-model training and
# inference wrapper; its definition is not shown in this excerpt.


class EnsembleDetector:
    def __init__(self, model_name='roberta-large', n_models=3):
        self.model_name = model_name
        self.n_models = n_models  # at most len(self.seeds)
        self.models = []
        self.seeds = [42, 98, 123, 225, 456]

    def train_ensemble(self, data_df, epochs=10, batch_size=3,
                       learning_rate=2e-6, warmup_ratio=0.15):
        for seed in self.seeds[:self.n_models]:
            # Set random seeds for reproducibility
            torch.manual_seed(seed)
            np.random.seed(seed)
            # Create and train an independent model
            detector = ImprovedRobertaDetector(model_name=self.model_name)
            detector.train(data_df, epochs=epochs, batch_size=batch_size,
                           learning_rate=learning_rate,
                           warmup_ratio=warmup_ratio)
            self.models.append(detector)

    def predict_pairs(self, text1, text2):
        all_probs = []
        for model in self.models:
            probs = model.predict_pairs(text1, text2)
            all_probs.append(probs)
        # Average predictions across the ensemble
        avg_probs = np.mean(all_probs, axis=0)
        return avg_probs.tolist()
```
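The training process combines several optimization techniques:
- AdamW with a weight decay of 0.1
- A linear learning-rate schedule whose warmup covers the first 15% of training steps
- Gradient clipping
- A batch size of 3
- A learning rate of 2e-6

These are the same defaults that appear in train_ensemble.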
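The linear warmup schedule listed under Training Optimization can be sketched in plain Python as the learning rate reached at each optimizer step. It uses the same formula as transformers.get_linear_schedule_with_warmup. A PyTorch training loop would normally call that function alongside torch.optim.AdamW(params, lr=2e-6, weight_decay=0.1) and torch.nn.utils.clip_grad_norm_.

The helper names below are hypothetical. The source does not state the gradient-clipping threshold, so none is assumed here.

```python
def linear_warmup_lr(step, total_steps, base_lr=2e-6, warmup_ratio=0.15):
    """Learning rate at a given optimizer step under linear warmup and decay.

    Rises linearly from 0 to base_lr over the first warmup_ratio of the
    steps, then decays linearly back to 0 at total_steps.
    """
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def steps_per_run(n_examples, batch_size=3, epochs=10):
    """Total optimizer steps: batches per epoch (rounded up) times epochs."""
    return -(-n_examples // batch_size) * epochs


total = steps_per_run(200)  # hypothetical dataset of 200 training examples
for step in (0, total // 10, int(0.15 * total), total // 2, total):
    print(step, linear_warmup_lr(step, total))
```

For a hypothetical 200 examples with batch size 3 over 10 epochs, the run has 670 optimizer steps. The first 100 of those are warmup. The very low peak rate of 2e-6 combined with warmup keeps early updates small, which helps avoid disrupting the pre-trained weights.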
Implementation Advantages
This architecture delivers several key benefits for AI text detection:
- Multi-Signal Detection: By combining transformer-based semantic understanding with perplexity-based statistical analysis, the system captures both high-level content patterns and low-level linguistic anomalies characteristic of AI-generated text.
- Robust to Unknown Modifications: The ensemble approach with multiple models and random seeds reduces overfitting to specific AI generation patterns, improving generalization to novel text alteration types not seen during training.
- Confidence-Weighted Aggregation: Rather than simple averaging, predictions from text chunks are weighted by model confidence, giving more influence to high-certainty predictions while mitigating the impact of uncertain edge cases.
- Scalable to Long Documents: The chunking strategy with overlap allows processing of documents of arbitrary length while maintaining computational efficiency and preserving local context across chunk boundaries.
- Transfer Learning Benefits: Starting from ELECTRA-large, pre-trained on massive text corpora, provides strong baseline performance, requiring only fine-tuning on the space operations domain rather than training from scratch.
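The confidence-weighted aggregation of chunk predictions described above can be sketched as follows. The source does not spell out how confidence is measured, so this sketch assumes one common choice: each chunk's probability is weighted by its distance from 0.5. The function name is hypothetical.

```python
def confidence_weighted_average(chunk_probs, eps=1e-8):
    """Aggregate per-chunk P(authentic) into one document-level score.

    Assumed weighting: each chunk's weight is its distance from 0.5, scaled
    to [0, 1], so confident chunks (near 0 or 1) count more than uncertain
    ones (near 0.5). Falls back to a plain mean if every chunk is at 0.5.
    """
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    weights = [abs(p - 0.5) * 2 for p in chunk_probs]
    total = sum(weights)
    if total < eps:
        return sum(chunk_probs) / len(chunk_probs)
    return sum(w * p for w, p in zip(weights, chunk_probs)) / total


print(confidence_weighted_average([0.9, 0.2]))       # confident chunk dominates
print(confidence_weighted_average([0.9, 0.5, 0.5]))  # 0.5 chunks get zero weight
```

For chunk probabilities [0.9, 0.2], the weights are 0.8 and 0.6. The document score is therefore 0.6 rather than the plain mean of 0.55, because the more confident chunk pulls the result toward its prediction.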
Performance Characteristics
The system demonstrates several important operational characteristics:
- Comparative Classification: Models evaluate document pairs, determining which of two texts is more likely to be authentic rather than making absolute judgments
- Probabilistic Output: Generates probability scores for each text being authentic, allowing threshold tuning based on operational requirements
- Domain Adaptation: Fine-tuned specifically on space operations documentation covering topics like space projects, research, devices, and scientific personnel
- Ensemble Diversity: Multiple models with different initialization reduce prediction variance and improve reliability
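Because the task is comparative, two per-text authenticity scores can be reduced to a single decision about which text is real. This sketch assumes each score is the probability that the corresponding text is authentic. The exact return format of predict_pairs is not shown here, so the function name and tie-breaking rule are hypothetical.

```python
def compare_pair(p_text1, p_text2):
    """Turn two per-text authenticity scores into a comparative decision.

    Returns (winner, confidence): winner is 1 or 2, the text judged more
    likely to be authentic; confidence is that text's share of the combined
    score, p_winner / (p_text1 + p_text2), which lies in [0.5, 1].
    """
    total = p_text1 + p_text2
    if total <= 0:
        return 1, 0.5  # no signal either way; default to text 1
    share1 = p_text1 / total
    return (1, share1) if share1 >= 0.5 else (2, 1 - share1)


print(compare_pair(0.8, 0.2))
print(compare_pair(0.3, 0.6))
```

compare_pair(0.3, 0.6) picks text 2 with a confidence of about 0.67. Comparing that confidence against a threshold chosen on validation data is one way to apply the threshold tuning mentioned above, for example by flagging low-margin pairs for manual review.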
Broader Applications and Extensions
While developed for ESA's space operations security challenge, this architecture can be extended to:
- Content moderation systems detecting AI-generated misinformation
- Academic integrity tools identifying AI-assisted writing in student submissions
- News media verification systems detecting synthetic or altered articles
- Legal document authentication ensuring contractual integrity
- Medical records validation protecting patient data accuracy
Future enhancements could include:
- Multi-language support for international documentation
- Fine-grained attribution identifying which specific LLM generated modifications
- Active learning systems that improve as new AI generation techniques emerge
- Integration with document version control for automated alteration detection
- Real-time processing for operational deployment in document workflows
- Explainability features highlighting specific text spans indicative of AI generation
Conclusion
The Fake or Real project demonstrates how combining transformer-based language models with perplexity analysis through ensemble methods creates a robust system for detecting AI-generated text alterations. By addressing real security threats identified in ESA's space operations, this work contributes to the broader challenge of AI safety and trustworthy AI systems.
The implementation shows the importance of multi-signal approaches in AI detection tasks. No single feature or model can reliably distinguish sophisticated AI-generated content, but combining complementary signals through ensemble methods provides meaningful improvements in detection capability.
AI generation capabilities will continue to advance. The architectural patterns demonstrated here (feature fusion, ensemble methods, confidence weighting and domain adaptation) provide a foundation for detection systems that can evolve to keep pace with new generation techniques.
Project Context and Competition
This project was developed for the ESA DataX AI Security Challenge, addressing real-world threats in space operations AI systems.
Challenge Details
- Kaggle Competition: Fake or Real - The Impostor Hunt
- GitHub Repository: fake_or_real
- Organization: ESA European Space Operations Centre
- Initiative: DataX Strategy for AI Implementation in Mission Operations
- Security Framework: Assurance for Space Domain AI Applications
- Focus: Catalogue of Security Risks for AI Applications in Space
Dataset Characteristics
- Document pairs with one real and one AI-altered text
- Topics: Space projects, research, devices, workshops, scientific personnel
- Language: English
- Significant LLM modifications with various alteration types
- Training set with labeled real/fake indicators
- Test set requiring generalization to unseen modification patterns
Technologies and Tools
- Python 3.x
- PyTorch
- Transformers (Hugging Face)
- ELECTRA-large-discriminator
- RoBERTa-large
- GPT-2 for perplexity calculation
- scikit-learn
- pandas & NumPy
- Jupyter Notebooks