Fake or Real: AI Text Detection

Fake or Real: Detecting AI-Generated Text Alterations

As large language models become increasingly sophisticated, distinguishing between authentic and AI-modified text presents a critical security challenge. The Fake or Real project addresses the ESA DataX AI Security Challenge by developing advanced detection systems to identify AI-generated text alterations in space operations documentation.

The GitHub repository for this project can be found at Fake or Real Repository

The Kaggle competition page is available at Fake or Real: The Impostor Hunt

The Security Challenge: Data Poisoning and Overreliance

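ESA's European Space Operations Centre identified critical AI security risks through its DataX strategy initiative. This project addresses two of those real-world threats, data poisoning and overreliance, together with two practical challenges that follow from them: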

Data Poisoning: Malicious alterations to training data or document processing that corrupt AI system outputs
Overreliance: Uncritical acceptance of AI-generated content without verification, leading to propagation of hallucinations or hidden triggers
Detection Gap: Lack of robust methods to identify which documents have been compromised when processing history is incomplete
Generalization: Need for solutions that work across different types of AI-generated modifications, not just known patterns

The scenario simulates a real operational environment where LLMs process official space operations documents, but tracking of which model processed which document is incomplete. When malfunctions are discovered, the critical task becomes identifying which texts are authentic and which have been altered.

Technical Architecture: Ensemble Detection System

The solution combines transformer-based language models with perplexity analysis in an ensemble architecture:

Multi-Model Transformer Architecture

The system employs ELECTRA-large-discriminator as the primary classification model, chosen because its pre-training objective is replaced-token detection, which is directly relevant to detecting AI-modified text. The architecture also supports other transformer models, including RoBERTa and DeBERTa; the code excerpts in this article use roberta-large as their default.

Perplexity-Based Feature Engineering

A critical innovation is the integration of perplexity scores calculated using GPT-2 as additional features. Perplexity measures how "surprised" a language model is by the text - AI-generated text often exhibits different perplexity patterns than human-written text.


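import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast


class PerplexityCalculator:
    def __init__(self):
        self.gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2')
        self.gpt2_tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
        self.gpt2_model.eval()  # inference only: disables dropout

    def get_perplexity(self, text):
        with torch.no_grad():
            # Only the first 400 words of the text are scored
            words = text.split()[:400]
            truncated_text = ' '.join(words)

            # Token-level truncation as a second safeguard on input length
            inputs = self.gpt2_tokenizer(truncated_text, return_tensors='pt',
                                        max_length=512, truncation=True)
            # With labels supplied, outputs.loss is the mean cross-entropy
            # per predicted token
            outputs = self.gpt2_model(**inputs, labels=inputs['input_ids'])
            perplexity = torch.exp(outputs.loss).item()

            # Clamp to [1, 1000] so outliers do not dominate the feature
            return max(1.0, min(perplexity, 1000.0))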
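
To make the quantity concrete: perplexity is the exponential of the average negative log-likelihood per token, which is what `torch.exp(outputs.loss)` computes above. The sketch below reproduces that arithmetic on hand-picked token probabilities, including the same clamp to the range [1, 1000]. The helper name `perplexity_from_token_probs` and the probability values are illustrative assumptions, not part of the project code.

```python
import math


def perplexity_from_token_probs(token_probs, lo=1.0, hi=1000.0):
    """Perplexity = exp(mean negative log-probability), clamped to [lo, hi].

    Hypothetical helper for illustration; the project itself gets the mean
    negative log-likelihood from GPT-2's loss.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return max(lo, min(math.exp(nll), hi))


# Assigning every token probability 0.25 is as uncertain as a uniform
# choice among four options, so the perplexity is (approximately) 4.
predictable = perplexity_from_token_probs([0.25, 0.25, 0.25, 0.25])

# Very unlikely tokens push perplexity up until the clamp takes over.
surprising = perplexity_from_token_probs([1e-4, 1e-5, 1e-4])
```

Here `predictable` comes out at roughly 4, while `surprising` hits the upper clamp of 1000.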

Enhanced Model with Feature Fusion

The EnhancedRobertaModel combines transformer embeddings with perplexity features through a fusion layer:


import torch
import torch.nn as nn
from transformers import AutoModelForSequenceClassification


class EnhancedRobertaModel(nn.Module):
    def __init__(self, model_name='roberta-large'):
        super().__init__()
        self.base_model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=2)

        hidden_size = self.base_model.config.hidden_size

        # Perplexity feature processing
        self.perplexity_layer = nn.Linear(1, 64)
        # Combine transformer features with perplexity
        self.combine_layer = nn.Linear(hidden_size + 64, 2)

    def forward(self, input_ids, attention_mask, perplexity, labels=None):
        # Extract encoder features; the .roberta attribute exists only on
        # RoBERTa checkpoints, so other backbones need a different accessor
        base_outputs = self.base_model.roberta(
            input_ids=input_ids,
            attention_mask=attention_mask)

        # Average pooling of hidden states (padding positions are included)
        base_features = base_outputs.last_hidden_state.mean(dim=1)

        # Process perplexity feature: float tensor (batch,) -> (batch, 64)
        perplexity_features = torch.relu(
            self.perplexity_layer(perplexity.unsqueeze(-1)))

        # Fuse features and classify
        combined = torch.cat([base_features, perplexity_features], dim=1)
        logits = self.combine_layer(combined)

        # Loss (assumed: standard cross-entropy), computed only when
        # labels are given
        loss = None
        if labels is not None:
            loss = nn.functional.cross_entropy(logits, labels)

        return {'loss': loss, 'logits': logits}

Text Chunking Strategy

Documents are processed using overlapping chunks to handle long texts while maintaining context:

Maximum chunk size: 400 words
Overlap: 50 words between consecutive chunks
Individual chunk predictions aggregated through confidence-weighted averaging
Perplexity calculated once per document and shared across its chunks (note that get_perplexity itself scores only the first 400 words)
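
The chunking parameters above can be sketched as a small word-window function. This is a minimal illustration assuming whitespace tokenization; the function name `chunk_words` and its handling of the final chunk are assumptions, since the project's own chunking code is not shown here.

```python
def chunk_words(text, chunk_size=400, overlap=50):
    """Split text into chunks of at most chunk_size words, where
    consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap  # each new chunk starts this many words later
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# A synthetic 1000-word document: chunks start at words 0, 350, and 700.
doc = ' '.join(f"w{i}" for i in range(1000))
chunks = chunk_words(doc)
```

With a 350-word stride, the last 50 words of each chunk reappear as the first 50 words of the next, so a sentence that straddles a boundary is seen whole by at least one chunk.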

Ensemble Method for Robustness

To improve generalization and reduce variance, the system trains multiple models with different random seeds and averages their predictions:


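import numpy as np
import torch


# ImprovedRobertaDetector (the single-model detector) is not shown in this excerpt.
class EnsembleDetector: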
    def __init__(self, model_name='roberta-large', n_models=3):
        self.model_name = model_name
        self.n_models = n_models
        self.models = []
        self.seeds = [42, 98, 123, 225, 456]
    
    def train_ensemble(self, data_df, epochs=10, batch_size=3, 
                      learning_rate=2e-6, warmup_ratio=0.15):
        for i, seed in enumerate(self.seeds[:self.n_models]):
            # Set random seeds for reproducibility
            torch.manual_seed(seed)
            np.random.seed(seed)
            
            # Create and train independent model
            detector = ImprovedRobertaDetector(model_name=self.model_name)
            detector.train(data_df, epochs=epochs, batch_size=batch_size, 
                          learning_rate=learning_rate, warmup_ratio=warmup_ratio)
            
            self.models.append(detector)
    
    def predict_pairs(self, text1, text2):
        all_probs = []
        
        for model in self.models:
            probs = model.predict_pairs(text1, text2)
            all_probs.append(probs)
        
        # Average predictions across ensemble
        avg_probs = np.mean(all_probs, axis=0)
        return avg_probs.tolist()

Training Optimization

The training process employs several techniques for optimal performance:

AdamW optimizer with weight decay (0.1) to prevent overfitting
Linear learning rate schedule with warmup (15% of training steps)
Gradient clipping to stabilize training
Small batch size (3) due to memory constraints with large models
Low learning rate (2e-6) for fine-tuning pre-trained transformers
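
To show what the warmup settings mean in practice, the sketch below computes the learning rate at each step. It assumes the rate decays linearly to zero after warmup, as in the common linear-with-warmup schedule; the function name `linear_warmup_lr` and the step count of 1000 are illustrative, not taken from the project.

```python
def linear_warmup_lr(step, total_steps, base_lr=2e-6, warmup_ratio=0.15):
    """Learning rate at `step`: linear ramp from 0 to base_lr during warmup,
    then linear decay back to 0 at total_steps (assumed decay shape)."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total = 1000  # hypothetical number of optimizer steps
schedule = [linear_warmup_lr(s, total) for s in range(total + 1)]
peak_step = schedule.index(max(schedule))
```

With these values the rate climbs for the first 150 steps, peaks at about 2e-6, and then falls steadily. The slow ramp keeps early updates small while the optimizer statistics are still unreliable.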

Implementation Advantages

This architecture delivers several key benefits for AI text detection:

Multi-Signal Detection

By combining transformer-based semantic understanding with perplexity-based statistical analysis, the system captures both high-level content patterns and low-level linguistic anomalies characteristic of AI-generated text.

Robust to Unknown Modifications

The ensemble approach with multiple models and random seeds reduces overfitting to specific AI generation patterns, improving generalization to novel text alteration types not seen during training.

Confidence-Weighted Aggregation

Rather than simple averaging, predictions from text chunks are weighted by model confidence, giving more influence to high-certainty predictions while mitigating the impact of uncertain edge cases.
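
The exact weighting formula is not given here, so the following is only one plausible form: each chunk's weight is its distance from 0.5, scaled to the range [0, 1]. The function name `aggregate_chunk_probs` and the example probabilities are assumptions for illustration.

```python
def aggregate_chunk_probs(chunk_probs):
    """Confidence-weighted mean of per-chunk P(authentic) values.

    Assumed weighting: confidence = |p - 0.5| * 2, so a chunk at 0.5
    (maximally uncertain) gets weight 0 and a chunk at 0 or 1 gets weight 1.
    """
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    weights = [abs(p - 0.5) * 2 for p in chunk_probs]
    total = sum(weights)
    if total == 0:
        return 0.5  # every chunk is maximally uncertain
    return sum(w * p for w, p in zip(weights, chunk_probs)) / total


probs = [0.9, 0.55, 0.5]  # one confident chunk, two near-undecided ones
weighted = aggregate_chunk_probs(probs)
simple = sum(probs) / len(probs)
```

In this example the confident chunk dominates, so `weighted` lands well above the plain mean in `simple`, which the two undecided chunks pull toward 0.5.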

Scalable to Long Documents

The chunking strategy with overlap allows processing of documents of arbitrary length while maintaining computational efficiency and preserving local context across chunk boundaries.

Transfer Learning Benefits

Starting from ELECTRA-large pre-trained on massive text corpora provides strong baseline performance, requiring only fine-tuning on the space operations domain rather than training from scratch.

Performance Characteristics

The system demonstrates several important operational characteristics:

Comparative Classification: Models evaluate document pairs, determining which of two texts is more likely to be authentic rather than making absolute judgments
Probabilistic Output: Generates probability scores for each text being authentic, allowing threshold tuning based on operational requirements
Domain Adaptation: Fine-tuned specifically on space operations documentation covering topics like space projects, research, devices, and scientific personnel
Ensemble Diversity: Multiple models with different initialization reduce prediction variance and improve reliability
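
The comparative step can be illustrated with a short sketch. How the project turns two per-text scores into a pair decision is not described here, so the normalization below, the tie-breaking rule, and the name `pick_real_text` are all assumptions.

```python
def pick_real_text(p_real_1, p_real_2):
    """Return (index, confidence): index is 1 or 2 for the text judged more
    likely to be authentic; confidence is its normalized share of the scores."""
    total = p_real_1 + p_real_2
    if total == 0:
        return 1, 0.5  # no signal either way; ties default to text 1
    share_1 = p_real_1 / total
    if share_1 >= 0.5:
        return 1, share_1
    return 2, 1.0 - share_1


# Text 2 scores twice as high as text 1, so it is chosen as the real one.
choice, confidence = pick_real_text(0.30, 0.60)
```

Normalizing the two scores against each other reflects the pairwise framing: only the relative ranking within a pair matters, not whether either score clears an absolute threshold.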

Broader Applications and Extensions

While developed for ESA's space operations security challenge, this architecture can be extended to:

Content moderation systems detecting AI-generated misinformation
Academic integrity tools identifying AI-assisted writing in student submissions
News media verification systems detecting synthetic or altered articles
Legal document authentication ensuring contractual integrity
Medical records validation protecting patient data accuracy

Future enhancements could include:

Multi-language support for international documentation
Fine-grained attribution identifying which specific LLM generated modifications
Active learning systems that improve as new AI generation techniques emerge
Integration with document version control for automated alteration detection
Real-time processing for operational deployment in document workflows
Explainability features highlighting specific text spans indicative of AI generation

Conclusion

The Fake or Real project demonstrates how combining transformer-based language models with perplexity analysis through ensemble methods creates a robust system for detecting AI-generated text alterations. By addressing real security threats identified in ESA's space operations, this work contributes to the broader challenge of AI safety and trustworthy AI systems.

The implementation showcases the importance of multi-signal approaches in AI detection tasks: no single feature or model is likely to distinguish sophisticated AI-generated content reliably on its own, but combining complementary signals through ensemble methods can improve detection capability.

As AI generation capabilities continue to advance, the architectural patterns demonstrated here - feature fusion, ensemble methods, confidence weighting, and domain adaptation - provide a foundation for evolving detection systems that can keep pace with new generation techniques.

Project Context and Competition

This project was developed for the ESA DataX AI Security Challenge, addressing real-world threats in space operations AI systems.

Challenge Details

Kaggle Competition: Fake or Real - The Impostor Hunt
GitHub Repository: fake_or_real
Organization: ESA European Space Operations Centre
Initiative: DataX Strategy for AI Implementation in Mission Operations
Security Framework: Assurance for Space Domain AI Applications
Focus: Catalogue of Security Risks for AI Applications in Space

Dataset Characteristics

Document pairs with one real and one AI-altered text
Topics: Space projects, research, devices, workshops, scientific personnel
Language: English
Significant LLM modifications with various alteration types
Training set with labeled real/fake indicators
Test set requiring generalization to unseen modification patterns

Technologies and Tools

Python 3.x
PyTorch
Transformers (Hugging Face)
ELECTRA-large-discriminator
RoBERTa-large
GPT-2 for perplexity calculation
scikit-learn
pandas & NumPy
Jupyter Notebooks