
Fake or Real: Detecting AI-Generated Text Alterations

As large language models become increasingly sophisticated, distinguishing authentic text from AI-modified text has become a critical security challenge. The Fake or Real project addresses the ESA DataX AI Security Challenge by building a detection system that identifies AI-generated alterations in space operations documentation.

The GitHub repository for this project is available at Fake or Real Repository.

The Kaggle competition page is available at Fake or Real: The Impostor Hunt.

The Security Challenge: Data Poisoning and Overreliance

ESA's European Space Operations Centre identified critical AI security risks through its DataX strategy initiative. This project addresses two real-world threats and the practical detection challenges they create:

  • Data Poisoning: Malicious alterations to training data or document processing that corrupt AI system outputs
  • Overreliance: Uncritical acceptance of AI-generated content without verification, leading to propagation of hallucinations or hidden triggers
  • Detection Gap: Lack of robust methods to identify which documents have been compromised when processing history is incomplete
  • Generalization: Need for solutions that work across different types of AI-generated modifications, not just known patterns

The scenario simulates a real operational environment where LLMs process official space operations documents, but tracking of which model processed which document is incomplete. When malfunctions are discovered, the critical task becomes identifying which texts are authentic and which have been altered.

Technical Architecture: Ensemble Detection System

The solution combines transformer-based language models with perplexity analysis in an ensemble architecture:

  1. Multi-Model Transformer Architecture
     The system uses ELECTRA-large-discriminator as the primary classification model. ELECTRA is pre-trained with replaced token detection, in which a discriminator learns to tell original tokens from plausible substitutes produced by a small generator; this objective is closely related to detecting AI-modified text. The architecture also supports other transformer backbones, including RoBERTa and DeBERTa, and the code excerpts below default to roberta-large.

  2. Perplexity-Based Feature Engineering
     A key addition is the use of GPT-2 perplexity scores as extra input features. Perplexity is the exponential of the average per-token negative log-likelihood, so it measures how "surprised" a language model is by a text. AI-generated text often has different (typically lower) perplexity than human-written text, because language models favor high-probability word choices.

    
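    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    class PerplexityCalculator:
        """Scores a document with GPT-2; lower perplexity = less 'surprising' text."""

        def __init__(self):
            self.gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2')
            self.gpt2_tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
            self.gpt2_model.eval()  # disable dropout for deterministic scoring

        def get_perplexity(self, text):
            with torch.no_grad():
                # Keep the first 400 words, then cap at 512 GPT-2 tokens
                truncated_text = ' '.join(text.split()[:400])
                inputs = self.gpt2_tokenizer(truncated_text, return_tensors='pt',
                                             max_length=512, truncation=True)
                # With labels=input_ids the model returns the mean next-token
                # cross-entropy, so exp(loss) is the perplexity
                outputs = self.gpt2_model(**inputs, labels=inputs['input_ids'])
                perplexity = torch.exp(outputs.loss).item()
                # Clamp to [1, 1000] so extreme outliers don't dominate the feature
                return max(1.0, min(perplexity, 1000.0))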
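As a worked check of the formula: perplexity is the exponential of the mean negative log-likelihood per token. Hugging Face causal language models return that mean as `outputs.loss` when `labels` are passed, which is why `torch.exp(outputs.loss)` yields perplexity. The dependency-free sketch below computes the same quantity from per-token log-probabilities; the function name `perplexity_from_log_probs` is hypothetical, and the clamp simply mirrors the range used above.

```python
import math


def perplexity_from_log_probs(token_log_probs, floor=1.0, ceiling=1000.0):
    """Perplexity = exp(mean negative log-likelihood per token), clamped.

    token_log_probs: natural-log probabilities the model assigned to each
    observed token (hypothetical input format for illustration).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return max(floor, min(math.exp(mean_nll), ceiling))


# A model giving every token probability 0.25 is as uncertain as a uniform
# choice among 4 options, so the perplexity is 4.
uniform = perplexity_from_log_probs([math.log(0.25)] * 4)
# A confident model (probability 0.9 per token) is far less "surprised".
confident = perplexity_from_log_probs([math.log(0.9)] * 4)
print(round(uniform, 6), round(confident, 6))
```

Under this measure, a rewrite made of consistently high-probability tokens pulls perplexity toward 1, which is the kind of statistical signal the fused feature is meant to expose.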
    									
  3. Enhanced Model with Feature Fusion
     The EnhancedRobertaModel combines mean-pooled transformer embeddings with a projected perplexity feature through a fusion layer:

    
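    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class EnhancedRobertaModel(nn.Module):
        def __init__(self, model_name='roberta-large'):
            super().__init__()
            # Bare encoder without a classification head; AutoModel also loads
            # ELECTRA or DeBERTa checkpoints, so the backbone is swappable
            self.base_model = AutoModel.from_pretrained(model_name)
            hidden_size = self.base_model.config.hidden_size

            # Perplexity feature processing: scalar -> 64-dim vector
            self.perplexity_layer = nn.Linear(1, 64)
            # Combine transformer features with perplexity and classify
            self.combine_layer = nn.Linear(hidden_size + 64, 2)
            self.loss_fn = nn.CrossEntropyLoss()

        def forward(self, input_ids, attention_mask, perplexity, labels=None):
            # Extract token-level hidden states from the encoder
            base_outputs = self.base_model(
                input_ids=input_ids,
                attention_mask=attention_mask)
            hidden = base_outputs.last_hidden_state

            # Mean pooling over real (non-padding) tokens only
            mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
            base_features = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

            # Process perplexity feature: (batch,) -> (batch, 64)
            perplexity_features = torch.relu(
                self.perplexity_layer(perplexity.float().unsqueeze(-1)))

            # Fuse features and classify
            combined = torch.cat([base_features, perplexity_features], dim=1)
            logits = self.combine_layer(combined)

            # Loss is only computed when labels are supplied (training)
            loss = self.loss_fn(logits, labels) if labels is not None else None
            return {'loss': loss, 'logits': logits}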
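To make the tensor shapes in the fusion step concrete, here is a NumPy sketch with toy sizes. Random matrices stand in for the learned `perplexity_layer` and `combine_layer` weights, so the logits themselves are meaningless; what the sketch shows is mean pooling that excludes padding tokens and the concatenation of the pooled text vector with the perplexity features. The helper name `masked_mean` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 6, 8   # toy sizes; roberta-large uses hidden = 1024

hidden_states = rng.normal(size=(batch, seq_len, hidden))
attention_mask = np.array([[1, 1, 1, 1, 1, 1],
                           [1, 1, 1, 0, 0, 0]])   # second text has 3 padding tokens
perplexity = np.array([12.3, 87.0])               # one score per document


def masked_mean(states, mask):
    """Average token vectors, ignoring positions where mask == 0."""
    m = mask[..., None].astype(states.dtype)      # (batch, seq_len, 1)
    return (states * m).sum(axis=1) / np.clip(m.sum(axis=1), 1e-9, None)


pooled = masked_mean(hidden_states, attention_mask)           # (batch, hidden)

# Random stand-ins for nn.Linear(1, 64) and nn.Linear(hidden + 64, 2)
w_ppl, b_ppl = rng.normal(size=(1, 64)), np.zeros(64)
w_out, b_out = rng.normal(size=(hidden + 64, 2)), np.zeros(2)

ppl_features = np.maximum(perplexity[:, None] @ w_ppl + b_ppl, 0.0)  # ReLU, (batch, 64)
combined = np.concatenate([pooled, ppl_features], axis=1)            # (batch, hidden + 64)
logits = combined @ w_out + b_out                                     # (batch, 2)
print(pooled.shape, combined.shape, logits.shape)   # (2, 8) (2, 72) (2, 2)
```

Because the perplexity score enters as a raw value between 1 and 1000, it can be much larger in magnitude than the pooled embedding entries; log-scaling or standardizing it before the linear layer is a common adjustment, though the source does not say whether the project does this.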
    									
  4. Text Chunking Strategy
     Documents are split into overlapping chunks so that long texts fit within the model's input limit while context is preserved across chunk boundaries:

    • Maximum chunk size: 400 words
    • Overlap: 50 words between consecutive chunks
    • Chunk-level predictions are aggregated through confidence-weighted averaging
    • Perplexity calculated once per full document, then shared across chunks
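The chunking and aggregation steps listed above can be sketched in plain Python. This is an illustration rather than the project's code: the function names are hypothetical, and using |p - 0.5| as each chunk's confidence weight is an assumption, since the source does not specify how confidence is measured. In the described pipeline, every chunk of a document would share the single document-level perplexity score.

```python
def chunk_words(text, max_words=400, overlap=50):
    """Split text into word windows of up to max_words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the document
    return chunks


def confidence_weighted_average(chunk_probs):
    """Average per-chunk P(real), weighting each chunk by its distance from 0.5.

    The |p - 0.5| confidence measure is an assumption for illustration.
    """
    weights = [abs(p - 0.5) * 2 for p in chunk_probs]
    total = sum(weights)
    if total == 0:  # every chunk is maximally uncertain: fall back to a plain mean
        return sum(chunk_probs) / len(chunk_probs)
    return sum(w * p for w, p in zip(weights, chunk_probs)) / total


doc = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(doc)
print(len(chunks))                                   # 3 windows: w0-w399, w350-w749, w700-w999
print(confidence_weighted_average([0.9, 0.5, 0.6]))  # about 0.84, vs. a plain mean of about 0.67
```

The chunk with probability 0.5 gets zero weight, so an uninformative chunk cannot dilute a confident one; this is the behavior the confidence weighting is meant to provide.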
  5. Ensemble Method for Robustness
     To improve generalization and reduce variance, the system trains multiple models with different random seeds and averages their predictions:

    
    import random

    import numpy as np
    import torch

    class EnsembleDetector:
        def __init__(self, model_name='roberta-large', n_models=3):
            self.model_name = model_name
            self.seeds = [42, 98, 123, 225, 456]
            if not 1 <= n_models <= len(self.seeds):
                raise ValueError(f"n_models must be between 1 and {len(self.seeds)}")
            self.n_models = n_models
            self.models = []

        def train_ensemble(self, data_df, epochs=10, batch_size=3,
                           learning_rate=2e-6, warmup_ratio=0.15):
            for seed in self.seeds[:self.n_models]:
                # Seed every RNG that affects initialization and data order
                random.seed(seed)
                np.random.seed(seed)
                torch.manual_seed(seed)
                torch.cuda.manual_seed_all(seed)

                # Train an independent model; ImprovedRobertaDetector is the
                # project's single-model trainer (not shown in this excerpt)
                detector = ImprovedRobertaDetector(model_name=self.model_name)
                detector.train(data_df, epochs=epochs, batch_size=batch_size,
                               learning_rate=learning_rate, warmup_ratio=warmup_ratio)
                self.models.append(detector)

        def predict_pairs(self, text1, text2):
            # Collect each member's probabilities for the pair
            all_probs = [model.predict_pairs(text1, text2) for model in self.models]
            # Average predictions across the ensemble
            avg_probs = np.mean(all_probs, axis=0)
            return avg_probs.tolist()
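The averaging step in `predict_pairs` can also report how much the seeds disagree, which is a cheap reliability signal for a given pair. The sketch below assumes each member returns `[p_text1_real, p_text2_real]`; that output format and the helper name `combine_seed_predictions` are assumptions, not the project's documented API.

```python
import numpy as np


def combine_seed_predictions(per_model_probs):
    """Average pair predictions from several seeds and report their spread.

    per_model_probs: one [p_text1_real, p_text2_real] row per model
    (assumed format for illustration).
    """
    probs = np.asarray(per_model_probs, dtype=float)   # shape (n_models, 2)
    mean = probs.mean(axis=0)                          # ensemble prediction
    spread = probs.std(axis=0)                         # seed-to-seed disagreement
    predicted_real = 1 if mean[0] >= mean[1] else 2    # which text looks authentic
    return mean, spread, predicted_real


mean, spread, real = combine_seed_predictions([[0.80, 0.30],
                                               [0.60, 0.40],
                                               [0.70, 0.20]])
print(mean, spread, real)
```

A large spread flags pairs where the individual models disagree, which is exactly where averaging does the most to reduce variance.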
    									
  6. Training Optimization
     Training uses the following techniques:

    • AdamW optimizer with weight decay (0.1) to reduce overfitting
    • Linear learning rate schedule with warmup over the first 15% of training steps
    • Gradient clipping to stabilize training
    • Small batch size (3) so that large models fit in GPU memory
    • Low learning rate (2e-6) for fine-tuning pre-trained transformers
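The warmup schedule above can be written out as a plain function. This sketch reproduces the shape of `transformers.get_linear_schedule_with_warmup` (linear rise to the peak over the warmup steps, then linear decay to zero), scaled by the 2e-6 peak rate. In a PyTorch loop that scheduler would wrap `torch.optim.AdamW(params, lr=2e-6, weight_decay=0.1)`, with `torch.nn.utils.clip_grad_norm_` handling clipping; the source does not state the clipping threshold or how the warmup step count is rounded, so using `round()` here is an assumption, and `linear_warmup_lr` is a hypothetical name.

```python
def linear_warmup_lr(step, total_steps, peak_lr=2e-6, warmup_ratio=0.15):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    warmup_steps = round(total_steps * warmup_ratio)   # rounding rule is an assumption
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


# Hypothetical run with 100 optimizer steps in total -> 15 warmup steps
for step in (0, 5, 15, 50, 100):
    print(step, linear_warmup_lr(step, total_steps=100))
```

Warmup keeps the first updates small while the randomly initialized fusion layers are still producing noisy gradients, which matters more than usual when the batch size is only 3.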

Implementation Advantages

This architecture delivers several key benefits for AI text detection:

  1. Multi-Signal Detection
     By combining transformer-based semantic understanding with perplexity-based statistical analysis, the system captures both high-level content patterns and low-level linguistic anomalies characteristic of AI-generated text.

  2. Robust to Unknown Modifications
     Averaging models trained with different random seeds reduces variance and lowers the risk that any single model overfits to specific AI generation patterns, which should help generalization to alteration types not seen during training.

  3. Confidence-Weighted Aggregation
     Rather than simple averaging, predictions from text chunks are weighted by model confidence, giving more influence to high-certainty predictions while mitigating the impact of uncertain edge cases.

  4. Scalable to Long Documents
     The overlapping chunking strategy lets the system process documents of arbitrary length with a bounded per-chunk cost, while the overlap preserves local context across chunk boundaries.

  5. Transfer Learning Benefits
     Starting from ELECTRA-large pre-trained on massive text corpora provides strong baseline performance, requiring only fine-tuning on the space operations domain rather than training from scratch.

Performance Characteristics

The system demonstrates several important operational characteristics:

  • Comparative Classification: Models evaluate document pairs, determining which of two texts is more likely to be authentic rather than making absolute judgments
  • Probabilistic Output: Generates probability scores for each text being authentic, allowing threshold tuning based on operational requirements
  • Domain Adaptation: Fine-tuned specifically on space operations documentation covering topics like space projects, research, devices, and scientific personnel
  • Ensemble Diversity: Multiple models with different initialization reduce prediction variance and improve reliability
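When a model scores each text on its own, the dataset's guarantee that exactly one text per pair is real can turn two scores into a single pairwise probability via Bayes' rule. The sketch below shows that conversion under the added assumption that the two per-text scores are independent; it is one way to implement comparative classification, not necessarily the project's, and `pair_probability` is a hypothetical name.

```python
def pair_probability(p1_real, p2_real):
    """P(text 1 is the real one), given independent per-text scores and the
    constraint that exactly one text in the pair is real (Bayes' rule)."""
    text1_real = p1_real * (1.0 - p2_real)   # text 1 real AND text 2 altered
    text2_real = (1.0 - p1_real) * p2_real   # text 1 altered AND text 2 real
    denom = text1_real + text2_real
    if denom == 0.0:  # both scores are 0 or both are 1: no evidence either way
        return 0.5
    return text1_real / denom


p = pair_probability(0.7, 0.3)
print(round(p, 3), "-> text", 1 if p >= 0.5 else 2)   # 0.845 -> text 1
```

The 0.5 cut-off in the usage line is the natural default for a forced choice; shifting it is where the threshold tuning mentioned above would come in.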

Broader Applications and Extensions

While developed for ESA's space operations security challenge, this architecture can be extended to:

  • Content moderation systems detecting AI-generated misinformation
  • Academic integrity tools identifying AI-assisted writing in student submissions
  • News media verification systems detecting synthetic or altered articles
  • Legal document authentication ensuring contractual integrity
  • Medical records validation protecting patient data accuracy

Future enhancements could include:

  • Multi-language support for international documentation
  • Fine-grained attribution identifying which specific LLM generated modifications
  • Active learning systems that improve as new AI generation techniques emerge
  • Integration with document version control for automated alteration detection
  • Real-time processing for operational deployment in document workflows
  • Explainability features highlighting specific text spans indicative of AI generation

Conclusion

The Fake or Real project demonstrates how combining transformer-based language models with perplexity analysis through ensemble methods creates a robust system for detecting AI-generated text alterations. By addressing real security threats identified in ESA's space operations, this work contributes to the broader challenge of AI safety and trustworthy AI systems.

The implementation highlights the value of multi-signal approaches in AI detection tasks: no single feature or model reliably distinguishes sophisticated AI-generated content, so combining complementary signals through ensemble methods is a practical way to make detection more robust.

As AI generation capabilities continue to advance, the architectural patterns demonstrated here (feature fusion, ensemble methods, confidence weighting, and domain adaptation) provide a foundation for evolving detection systems that can keep pace with new generation techniques.

Project Context and Competition

This project was developed for the ESA DataX AI Security Challenge, addressing real-world threats in space operations AI systems.

Challenge Details

Dataset Characteristics

  • Document pairs with one real and one AI-altered text
  • Topics: Space projects, research, devices, workshops, scientific personnel
  • Language: English
  • Significant LLM modifications with various alteration types
  • Training set with labeled real/fake indicators
  • Test set requiring generalization to unseen modification patterns
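Because each training pair contains exactly one real and one AI-altered text, a pair can be expanded into two labeled examples for a per-text classifier. The record layout below (`text_1`, `text_2`, `real_text_id`) is a hypothetical stand-in for the competition files, and `expand_pairs` is an illustrative name; adapt both to the actual data format.

```python
def expand_pairs(pairs):
    """Turn each pair record into two labeled rows (label 1 = real, 0 = AI-altered).

    Field names text_1, text_2 and real_text_id are hypothetical.
    """
    rows = []
    for pair in pairs:
        real_id = pair["real_text_id"]
        if real_id not in (1, 2):
            raise ValueError(f"real_text_id must be 1 or 2, got {real_id!r}")
        for idx in (1, 2):
            rows.append({"text": pair[f"text_{idx}"], "label": int(idx == real_id)})
    return rows


sample = [{"text_1": "Original mission summary.",
           "text_2": "Rewritten mission summary.",
           "real_text_id": 1}]
for row in expand_pairs(sample):
    print(row)
```

Every expanded dataset is perfectly balanced between real and altered texts, since each pair contributes one of each.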

Technologies and Tools

  • Python 3.x
  • PyTorch
  • Transformers (Hugging Face)
  • ELECTRA-large-discriminator
  • RoBERTa-large
  • GPT-2 for perplexity calculation
  • scikit-learn
  • pandas & NumPy
  • Jupyter Notebooks