1/17

Development of CSRF Protection and Prevention Methods with Machine Learning

Diploma Project | Network Security Program

Makhsot E.A.

Nurmukhambet N.K.

Orynbassaruly I.

International Information Technology University

June 2025

2/17

The CSRF Threat Landscape

Cross-Site Request Forgery (CSRF) attacks exploit the trust a web application places in an authenticated user's browser, enabling attackers to:

Execute unauthorized transactions
Modify account credentials
Compromise sensitive data
Trigger system-wide compromises

56%

Web applications vulnerable to CSRF

$4.5M

Average breach cost

Traditional prevention methods such as anti-CSRF tokens and SameSite cookies have limitations against evolving attack vectors.

3/17

Project Objectives

Develop a hybrid CSRF protection framework combining traditional security with machine learning:

Enhanced Detection

Identify novel CSRF vectors with ML algorithms

Adaptive Prevention

Real-time threat response system

Performance

Scalable architecture with low latency

Feedback Loop

Continuous model improvement

4/17

Traditional CSRF Prevention

Method	Effectiveness	Limitations
Anti-CSRF Tokens	High	Token leakage, implementation complexity
SameSite Cookies	Medium	Browser compatibility issues
Referer Validation	Medium	Header spoofing, privacy restrictions
Custom Headers	High	JavaScript dependency, CORS issues

No single method provides comprehensive protection against sophisticated attacks.
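As an illustration of the first row of the table, here is a minimal sketch of anti-CSRF token issuance and validation. The function names `issue_csrf_token` and `is_valid_csrf_token` are hypothetical and are not taken from the project; session storage is assumed to be handled elsewhere.

```python
import hmac
import secrets
from typing import Optional


def issue_csrf_token() -> str:
    """Return a fresh, unpredictable token to store in the user's session."""
    return secrets.token_urlsafe(32)


def is_valid_csrf_token(session_token: Optional[str],
                        submitted_token: Optional[str]) -> bool:
    """Compare the submitted token with the session copy in constant time."""
    if not session_token or not submitted_token:
        return False
    # Encode to bytes so compare_digest also accepts non-ASCII input.
    return hmac.compare_digest(session_token.encode("utf-8"),
                               submitted_token.encode("utf-8"))
```

The constant-time comparison avoids leaking how many leading characters of a guess were correct. It does nothing against the token leakage listed in the table, which is one reason a second detection layer is proposed.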

5/17

Machine Learning Advantage

ML models detect anomalies traditional methods miss:

Behavioral Analysis

User session patterns, request timing, and navigation sequences

Feature Detection

Header anomalies, parameter patterns, payload entropy

93%

Zero-day attack detection

42%

False positive reduction
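The "payload entropy" feature listed under Feature Detection can be computed as the Shannon entropy of a parameter value. The sketch below assumes character-level entropy in bits; the project may define the feature differently.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Values made of one repeated character score 0, while random-looking values such as encoded blobs score higher. That difference is what makes entropy usable as an anomaly signal.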

6/17

Hybrid Protection Architecture

Combining traditional security with ML detection:

Token Validation

First-line defense

ML Analysis

Real-time anomaly detection

Response Engine

Automated threat mitigation

Adaptive Feedback Loop

Continuous model improvement
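The first three stages above can be sketched as one decision function. This is an illustration under stated assumptions, not the project's code: the names `Decision` and `evaluate_request`, the `score_fn` callable, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Decision:
    action: str  # "allow" or "block"
    reason: str


def evaluate_request(token_valid: bool,
                     features: Sequence[float],
                     score_fn: Callable[[Sequence[float]], float],
                     block_threshold: float = 0.5) -> Decision:
    """Run token validation first, then ML risk scoring, then decide."""
    # Stage 1: token validation is the first-line defense.
    if not token_valid:
        return Decision("block", "invalid or missing CSRF token")
    # Stage 2: the ML model scores the request (assumed range 0..1).
    risk = score_fn(features)
    # Stage 3: the response engine maps the score to an action.
    if risk >= block_threshold:
        return Decision("block", f"risk score {risk:.2f} at or above threshold")
    return Decision("allow", "token valid and risk below threshold")
```

Checking the token first means the cheaper test rejects obvious forgeries before the model is invoked. The feedback loop is omitted here; it would log each decision with its features for later retraining.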

7/17

System Architecture

Presentation Layer

User interface and API endpoints

Processing Layer

Request validation and feature extraction

ML Engine

Anomaly detection and classification

Data Layer

Request logging and model training data

Implemented with Python, Flask, scikit-learn, and TensorFlow Serving

8/17

Data Preparation

Multi-stage dataset creation process:

Synthetic Data

Generated attack patterns

Real Traffic

Production request logs

Feature Engineering

20+ request characteristics

Feature Category	Examples
Session Data	Token presence, cookie integrity
Request Metadata	HTTP method, Origin and Referer headers
Behavioral Patterns	Request timing, navigation flow
Payload Analysis	Parameter entropy, structure
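A few of the session and request-metadata features from the table can be extracted as shown below. The feature names, the `csrf_token` parameter name, and the `expected_host` argument are assumptions made for illustration; behavioral and payload features are left out.

```python
from typing import Dict, Mapping
from urllib.parse import urlsplit

# HTTP methods that change state and are therefore relevant to CSRF.
UNSAFE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def extract_features(method: str,
                     headers: Mapping[str, str],
                     params: Mapping[str, str],
                     expected_host: str) -> Dict[str, float]:
    """Turn one request into a small numeric feature dictionary."""
    origin = headers.get("Origin", "")
    referer = headers.get("Referer", "")
    origin_host = urlsplit(origin).hostname if origin else None
    return {
        "has_token": float("csrf_token" in params),
        "unsafe_method": float(method.upper() in UNSAFE_METHODS),
        "has_origin": float(bool(origin)),
        "origin_matches": float(origin_host == expected_host),
        "has_referer": float(bool(referer)),
        "param_count": float(len(params)),
    }
```

For example, a cross-site POST sent without a token from `https://evil.example` yields `unsafe_method = 1.0`, `origin_matches = 0.0`, and `has_token = 0.0`.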

9/17

Model Development

Evaluated multiple ML algorithms:

Random Forest

Selected for production

Logistic Regression

Baseline model

SVM

High resource requirements

Neural Network

More complex than this application requires

Final model configuration:

100 decision trees
Gini impurity criterion
Balanced class weights
Feature importance threshold: 0.05
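The listed configuration maps onto scikit-learn roughly as shown below. The synthetic dataset, the random seed, and the reading of "feature importance threshold" as "keep features with importance of at least 0.05" are assumptions, not details confirmed by the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data standing in for the project's request features
# (imbalanced on purpose, since attacks are usually the minority class).
X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)

model = RandomForestClassifier(
    n_estimators=100,         # 100 decision trees
    criterion="gini",         # Gini impurity criterion
    class_weight="balanced",  # balanced class weights
    random_state=0,
)
model.fit(X, y)

# Assumed interpretation: keep features whose importance is at least 0.05.
keep = model.feature_importances_ >= 0.05
X_selected = X[:, keep]
```

Balanced class weights reweight samples inversely to class frequency. This matters when attack requests are far rarer than benign ones.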

10/17

Model Evaluation

Performance metrics on test dataset:

96%

Accuracy

95%

Precision

93%

Recall

94%

F1-Score

Risk Level	Precision	Recall
Low	97%	98%
Medium	94%	91%
High	96%	92%
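The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The per-level F1 values below are derived from the table and do not appear on the slide.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall figures from the slide: precision 95%, recall 93%.
overall_f1 = f1_score(0.95, 0.93)

# Per-risk-level precision and recall from the table.
per_level = {"Low": (0.97, 0.98), "Medium": (0.94, 0.91), "High": (0.96, 0.92)}
per_level_f1 = {level: f1_score(p, r) for level, (p, r) in per_level.items()}
```

The overall value rounds to 0.94, which matches the 94% F1-score reported above.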

11/17

System Integration

Flask-based implementation architecture:

Client

➡️

Middleware

➡️

ML Engine

➡️

Response

Request submission

Feature extraction

Risk classification

Block/Allow decision

Average processing time: 18 ms per request

12/17

Performance Evaluation

Comparative analysis of protection methods:

Protection Method	Detection Rate	False Positives
Token Only	84%	2.1%
SameSite Cookies	76%	1.8%
Referer Validation	68%	3.5%
Hybrid ML System	96%	0.9%

Strengths

Adapts to new attack patterns
Low false positive rate
Real-time detection

Limitations

Training data requirements
Model update overhead
Computational resources

13/17

Economic Effectiveness

Cost-benefit analysis (5-year projection):

Implementation Costs

Development: 6.5M KZT

Infrastructure: 2.0M KZT

Training: 0.8M KZT

Cost Savings

Breach prevention: 36M KZT/year

Operational efficiency: 6M KZT/year

833% ROI

in the first year of implementation
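For reference, a simple ROI calculation from the figures listed on this slide is sketched below. It assumes ROI = (savings - cost) / cost and treats all three implementation costs as one-time. Under those assumptions the first-year result is about 352%, so the 833% figure presumably rests on cost or savings assumptions not shown on the slide.

```python
# Figures from the slide, in millions of KZT.
implementation_costs = {"development": 6.5, "infrastructure": 2.0, "training": 0.8}
annual_savings = {"breach_prevention": 36.0, "operational_efficiency": 6.0}

total_cost = sum(implementation_costs.values())  # one-time cost (assumption)
yearly_savings = sum(annual_savings.values())


def simple_roi_percent(savings: float, cost: float) -> float:
    """Simple ROI: net gain divided by cost, as a percentage."""
    return (savings - cost) / cost * 100


first_year_roi = simple_roi_percent(yearly_savings, total_cost)
five_year_roi = simple_roi_percent(5 * yearly_savings, total_cost)
```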

14/17

Implementation Challenges

Key challenges and solutions:

Data Scarcity

Solution: Synthetic data generation

False Positives

Solution: Ensemble methods, threshold tuning

Legacy Integration

Solution: Middleware wrapper, API gateway

Model Drift

Solution: Continuous retraining pipeline

15/17

Future Development

Roadmap for system enhancement:

Short Term (2025)

Browser extension integration

Mid Term (2026)

Cloud-native deployment options

Long Term (2027+)

AI-powered attack simulation for training

Research Opportunities

Federated learning for privacy preservation
Graph neural networks for attack pattern recognition
Adversarial training for model robustness

16/17

Conclusion

The hybrid CSRF protection system demonstrates:

96%

Attack detection rate

0.9%

False positive rate

18 ms

Avg. processing time

The integration of machine learning with traditional security measures creates a robust, adaptive defense system that significantly improves protection against evolving CSRF threats while maintaining operational efficiency.

17/17

Thank You

Questions & Discussion

International Information Technology University

Department of Cybersecurity