CSRF Protection with ML

Development of CSRF Protection and Prevention Methods with Machine Learning

Diploma Project | Network Security Program

Makhsot E.A.
Nurmukhambet N.K.
Orynbassaruly I.
International Information Technology University
June 2025

The CSRF Threat Landscape

Cross-Site Request Forgery (CSRF) attacks exploit the trust a web application places in a user's authenticated browser, enabling attackers to:

  • Execute unauthorized transactions
  • Modify account credentials
  • Compromise sensitive data
  • Trigger system-wide compromises

56%: Web applications vulnerable to CSRF
$4.5M: Average breach cost

Traditional prevention methods such as anti-CSRF tokens and SameSite cookies have limitations against evolving attack vectors.


Project Objectives

Develop a hybrid CSRF protection framework combining traditional security with machine learning:

Enhanced Detection

Identify novel CSRF vectors with ML algorithms

Adaptive Prevention

Real-time threat response system

Performance

Scalable architecture with low latency

Feedback Loop

Continuous model improvement


Traditional CSRF Prevention

Method               Effectiveness   Limitations
Anti-CSRF Tokens     High            Token leakage, implementation complexity
SameSite Cookies     Medium          Browser compatibility issues
Referer Validation   Medium          Header spoofing, privacy restrictions
Custom Headers       High            JavaScript dependency, CORS issues

No single method provides comprehensive protection against sophisticated attacks.
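
As a point of reference for the first row of the table, a token check can be sketched as below. This is a minimal illustration, not the project's implementation: the secret, the function names, and the choice to derive the token from the session ID with HMAC-SHA256 are all assumptions.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-deployment secret


def issue_token(session_id: str) -> str:
    """Derive an anti-CSRF token bound to one session (assumed scheme)."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def is_valid_token(session_id: str, submitted: str) -> bool:
    """Compare in constant time so the check does not leak the token via timing."""
    expected = issue_token(session_id)
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A token bound to the session in this way is rejected when replayed under another session, but it offers no protection once the token itself leaks, which is the limitation listed in the table.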


Machine Learning Advantage

ML models detect anomalies traditional methods miss:

Behavioral Analysis

User session patterns, request timing, and navigation sequences

Feature Detection

Header anomalies, parameter patterns, payload entropy
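
Payload entropy is not defined on the slide. One common reading is the Shannon entropy of a parameter value, sketched here under that assumption; the function name is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this definition a repetitive value such as "aaaa" scores 0 bits per character and "abcd" scores 2, so unusually random-looking parameters stand out.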

93%: Zero-day attack detection
42%: False positive reduction

Hybrid Protection Architecture

Combining traditional security with ML detection:

  • Token Validation: first-line defense
  • ML Analysis: real-time anomaly detection
  • Response Engine: automated threat mitigation
  • Adaptive Feedback Loop: continuous model improvement
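
The four stages above can be sketched as a single decision function. Everything named here (`handle_request`, `feedback_log`, the 0.5 cut-off) is a hypothetical stand-in used only to show the order of the checks.

```python
from typing import Callable, Dict, List

feedback_log: List[Dict] = []  # hypothetical store feeding later retraining


def handle_request(
    request: Dict,
    token_ok: Callable[[Dict], bool],
    risk_score: Callable[[Dict], float],
    block_threshold: float = 0.5,  # assumed cut-off, not taken from the slides
) -> str:
    """Run the token check first, then the ML check, and log the outcome."""
    if not token_ok(request):
        decision = "block"  # first-line defense failed
    elif risk_score(request) >= block_threshold:
        decision = "block"  # token passed but the model flags the request
    else:
        decision = "allow"
    feedback_log.append({"request": request, "decision": decision})
    return decision
```

Running the cheap token check before the model means the ML stage only has to score requests that already look legitimate to the traditional defense.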

System Architecture

Presentation Layer

User interface and API endpoints

Processing Layer

Request validation and feature extraction

ML Engine

Anomaly detection and classification

Data Layer

Request logging and model training data

Implemented with Python, Flask, scikit-learn, and TensorFlow Serving.


Data Preparation

Multi-stage dataset creation process:

  • Synthetic Data: generated attack patterns
  • Real Traffic: production request logs
  • Feature Engineering: 20+ request characteristics

Feature Category      Examples
Session Data          Token presence, cookie integrity
Request Metadata      HTTP method, origin, referrer
Behavioral Patterns   Request timing, navigation flow
Payload Analysis      Parameter entropy, structure
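
To make the feature categories concrete, the sketch below turns one request into a small numeric feature dictionary. The feature names, the `csrf_token` form field, and the `-1.0` placeholder for a missing timing value are assumptions; the project's actual set of 20+ features is not listed in the slides.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

STATE_CHANGING = {"POST", "PUT", "PATCH", "DELETE"}


def extract_features(
    method: str,
    headers: Dict[str, str],
    form: Dict[str, str],
    host: str,
    seconds_since_last_request: Optional[float],
) -> Dict[str, float]:
    """Turn one request into a flat numeric feature dictionary."""
    # Prefer Origin, fall back to Referer; header keys are assumed to be
    # spelled exactly like this in the incoming dictionary.
    source = headers.get("Origin") or headers.get("Referer") or ""
    source_host = urlparse(source).netloc
    timing = -1.0 if seconds_since_last_request is None else seconds_since_last_request
    return {
        "has_token": float("csrf_token" in form),  # session data (hypothetical field)
        "is_state_changing": float(method.upper() in STATE_CHANGING),  # request metadata
        "has_source_header": float(bool(source)),
        "source_matches_host": float(source_host == host),
        "seconds_since_last": timing,  # behavioral pattern
        "param_count": float(len(form)),  # payload structure
    }
```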

Model Development

Evaluated multiple ML algorithms:

  • Random Forest: selected for production
  • Logistic Regression: baseline model
  • SVM: high resource requirements
  • Neural Network: overkill for this application

Final model configuration:

  • 100 decision trees
  • Gini impurity criterion
  • Balanced class weights
  • Feature importance threshold: 0.05
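
In scikit-learn terms, the configuration above might look like the following sketch. The training data here is synthetic stand-in data, and reading the 0.05 importance threshold as a `SelectFromModel` cut-off is an assumption about how the threshold was applied.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data: 20 features, imbalanced classes (assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, weights=[0.9, 0.1], random_state=0
)

model = RandomForestClassifier(
    n_estimators=100,         # 100 decision trees
    criterion="gini",         # Gini impurity criterion
    class_weight="balanced",  # balanced class weights
    random_state=0,
)
model.fit(X, y)

# Keep only the features whose importance is at least 0.05 (assumed usage).
selector = SelectFromModel(model, threshold=0.05, prefit=True)
X_selected = selector.transform(X)
```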

Model Evaluation

Performance metrics on test dataset:

  • Accuracy: 96%
  • Precision: 95%
  • Recall: 93%
  • F1-Score: 94%

Risk Level   Precision   Recall
Low          97%         98%
Medium       94%         91%
High         96%         92%
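
The F1-score is the harmonic mean of precision and recall, so the reported value can be checked against the other two metrics. The helper name below is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported on the slide.
overall_f1 = f1_score(0.95, 0.93)
```

With precision 0.95 and recall 0.93 this gives roughly 0.94, which agrees with the reported 94%.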

System Integration

Flask-based implementation architecture:

Client ➡️ Middleware ➡️ ML Engine ➡️ Response

  • Client: request submission
  • Middleware: feature extraction
  • ML Engine: risk classification
  • Response: block/allow decision

Average processing time: 18ms per request
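
The request flow above can be sketched as a plain WSGI wrapper, which is one way to place a check in front of a Flask application. The class name, the three features, and the `classify` callable returning "low", "medium", or "high" are assumptions for illustration, not the project's code.

```python
from typing import Callable, Dict, Iterable


class CsrfRiskMiddleware:
    """WSGI wrapper that scores each request before the app sees it."""

    def __init__(self, app: Callable, classify: Callable[[Dict[str, float]], str]):
        self.app = app
        self.classify = classify  # hypothetical: returns "low", "medium", or "high"

    def __call__(self, environ: Dict, start_response: Callable) -> Iterable[bytes]:
        # Feature extraction: a deliberately tiny, illustrative feature set.
        features = {
            "is_post": float(environ.get("REQUEST_METHOD") == "POST"),
            "has_origin": float("HTTP_ORIGIN" in environ),
            "has_referer": float("HTTP_REFERER" in environ),
        }
        # Risk classification followed by the block/allow decision.
        if self.classify(features) == "high":
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Request blocked"]
        return self.app(environ, start_response)
```

With Flask, a wrapper of this shape would typically be attached as `app.wsgi_app = CsrfRiskMiddleware(app.wsgi_app, classify)`, leaving the application's routes unchanged.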


Performance Evaluation

Comparative analysis of protection methods:

Protection Method    Detection Rate   False Positives
Token Only           84%              2.1%
SameSite Cookies     76%              1.8%
Referer Validation   68%              3.5%
Hybrid ML System     96%              0.9%

Strengths
  • Adapts to new attack patterns
  • Low false positive rate
  • Real-time detection

Limitations
  • Training data requirements
  • Model update overhead
  • Computational resources

Economic Effectiveness

Cost-benefit analysis (5-year projection):

Implementation Costs
  • Development: 6.5M KZT
  • Infrastructure: 2.0M KZT
  • Training: 0.8M KZT

Cost Savings
  • Breach prevention: 36M KZT/year
  • Operational efficiency: 6M KZT/year

833% ROI in the first year of implementation
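
The slides do not state how the ROI figure was derived. As a reference point, the sketch below applies the simplest definition, net gain divided by cost, to the listed figures; the formula and the variable names are assumptions.

```python
# All amounts in millions of KZT, taken from the lists above.
costs_mkzt = {"development": 6.5, "infrastructure": 2.0, "training": 0.8}
savings_mkzt_per_year = {"breach_prevention": 36.0, "operational_efficiency": 6.0}

total_cost = sum(costs_mkzt.values())                 # about 9.3M KZT
annual_savings = sum(savings_mkzt_per_year.values())  # 42M KZT per year


def simple_roi(savings: float, cost: float) -> float:
    """Net gain divided by cost, as a percentage (assumed definition)."""
    return (savings - cost) / cost * 100


first_year_roi = simple_roi(annual_savings, total_cost)
```

Under this definition the first-year value comes out near 352%, so the 833% reported above presumably rests on a different cost or benefit basis that the slides do not spell out.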

Implementation Challenges

Key challenges and solutions:

  • Data Scarcity: synthetic data generation
  • False Positives: ensemble methods, threshold tuning
  • Legacy Integration: middleware wrapper, API gateway
  • Model Drift: continuous retraining pipeline
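
Threshold tuning, listed above as a remedy for false positives, can be illustrated by choosing the lowest score cut-off that keeps the false positive rate on known-benign traffic within a target. The function and its inputs are hypothetical; the slides do not describe the tuning procedure that was actually used.

```python
from typing import List, Sequence


def pick_threshold(benign_scores: Sequence[float], max_fpr: float) -> float:
    """Return the lowest threshold whose false positive rate is within max_fpr.

    A request counts as flagged when its score is >= the threshold, and
    benign_scores are model scores for requests known to be legitimate.
    """
    candidates: List[float] = sorted(set(benign_scores))
    for threshold in candidates:
        flagged = sum(score >= threshold for score in benign_scores)
        if flagged / len(benign_scores) <= max_fpr:
            return threshold
    return float("inf")  # no observed score meets the target, so flag nothing
```

Picking the lowest qualifying threshold keeps detection as sensitive as the false positive budget allows.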

Future Development

Roadmap for system enhancement:

  • Short Term (2025): browser extension integration
  • Mid Term (2026): cloud-native deployment options
  • Long Term (2027+): AI-powered attack simulation for training

Research Opportunities
  • Federated learning for privacy preservation
  • Graph neural networks for attack pattern recognition
  • Adversarial training for model robustness

Conclusion

The hybrid CSRF protection system demonstrates:

  • Attack detection rate: 96%
  • False positive rate: 0.9%
  • Avg. processing time: 18ms

The integration of machine learning with traditional security measures creates a robust, adaptive defense system that significantly improves protection against evolving CSRF threats while maintaining operational efficiency.


Thank You

Questions & Discussion

International Information Technology University
Department of Cybersecurity

Diploma Project | Educational program 6B06303 - Network Security | IITU 2025