Description
Book Details
Title: Insider Threat Detection Based on Behaviour Analytics Using Machine Learning
Author(s): Arsalan Siddiquee, Deepjyoti Choudhury
Publisher: Cogniverse Press, Jorhat, Assam, India
First Edition: June 2026
DOI: 10.5281/zenodo.21107267
ISBN: 978-81-688286-2-9 (Print Edition)
e-ISBN: 978-81-688286-3-6 (Digital Edition)
Cover Design: Cogniverse Press Digital Team
Copyright: © Authors
Authors
- Arsalan Siddiquee – M.Tech (Computer Science & Engineering), Department of Computer Science and Engineering, Royal School of Engineering and Technology, The Assam Royal Global University, Guwahati, Assam, India
- Deepjyoti Choudhury – Associate Professor & Head, Department of Computer Science and Engineering, Royal School of Engineering and Technology, The Assam Royal Global University, Guwahati, Assam, India
Publisher Information
Cogniverse Press
Nakari Gaon, Borigaon Siding, Jorhat – 1, Assam, India
Preface
Cybersecurity has traditionally concentrated on protecting organizations from external attacks through technologies such as firewalls, intrusion detection systems, antivirus software, and access controls. However, some of the most significant security breaches originate from trusted users, compromised accounts, or malicious insiders who already possess legitimate access. These insider threats present unique challenges because they frequently bypass conventional security mechanisms.
This book explores behavioural analytics as an effective approach to insider threat detection. Rather than relying solely on authentication credentials, it investigates whether a user’s behavioural patterns—particularly writing style in organizational email communication—match their claimed identity. Such behavioural signatures provide valuable indicators for identifying unauthorized account usage and detecting anomalous activity.
The research is based on extensive experimentation using the Enron Email Corpus, one of the most widely used datasets in cybersecurity research. A significant contribution of this work is the identification and correction of author-identifying information embedded within email headers, a methodological issue that has unintentionally inflated the performance of many previous studies. By removing this hidden source of information, the research establishes a more rigorous and realistic evaluation framework for behavioural author attribution.
The book presents the implementation and evaluation of multiple machine learning approaches, including statistical models, ensemble learning techniques, and gradient boosting algorithms. Beyond predictive performance, equal attention is given to reproducibility, computational efficiency, privacy preservation, ethical deployment, and regulatory compliance, recognising that trustworthy cybersecurity extends well beyond model accuracy.
Designed for postgraduate students, cybersecurity professionals, machine learning researchers, and practitioners, the book explains core concepts before introducing technical algorithms, making advanced behavioural analytics accessible to readers with a background in computer science and cybersecurity.
As behavioural security continues to evolve, artificial intelligence offers unprecedented capabilities for recognising complex behavioural patterns while simultaneously introducing new ethical responsibilities. The concluding chapters therefore examine governance, privacy, transparency, fairness, and accountability alongside technical implementation. The authors hope this work will serve both as a practical guide for developing behavioural security systems and as a foundation for future research into explainable, privacy-preserving, and trustworthy machine learning for cybersecurity.
Abstract
This book investigates insider threat detection through behavioural author attribution using the Enron Email Corpus. Recognising that legitimate users can pose significant cybersecurity risks, the research shifts detection from traditional signature-based methods toward behavioural profiling based on writing style and communication patterns.
A major methodological contribution is the elimination of label leakage caused by author-identifying information embedded within raw email headers. After removing headers, signatures, and forwarded content, a clean body-only dataset of 484,522 messages was constructed, with the twenty most prolific authors forming a twenty-class behavioural fingerprinting task.
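The body-only cleaning step described above can be illustrated with a minimal sketch using Python's standard `email` module. This is not the book's actual pipeline: the function names `extract_clean_body` and `keep_top_authors`, and the regular expressions used to spot forwarded content and signatures, are hypothetical choices made for illustration, and real Enron messages would likely need more careful rules.

```python
import email
import re
from collections import Counter
from email.utils import parseaddr

# Assumed markers (not taken from the book): common lead-ins for forwarded
# or quoted content, and the conventional "-- " signature delimiter.
FORWARD_MARKER = re.compile(
    r"^[ \t]*-+[ \t]*(Original Message|Forwarded by)\b",
    re.IGNORECASE | re.MULTILINE,
)
SIGNATURE_MARKER = re.compile(r"^-- ?$", re.MULTILINE)


def extract_clean_body(raw_message):
    """Return (author_address, body_text) with headers stripped from the text.

    The author label is read from the From header, but the header itself
    never reaches the returned text, so it cannot leak into the features.
    """
    msg = email.message_from_string(raw_message)
    author = parseaddr(msg.get("From", ""))[1].lower()

    payload = msg.get_payload()
    text = payload if isinstance(payload, str) else ""  # skip multipart mail

    # Cut everything from the first forwarded/quoted block onwards.
    match = FORWARD_MARKER.search(text)
    if match:
        text = text[: match.start()]

    # Cut everything from the signature delimiter onwards.
    match = SIGNATURE_MARKER.search(text)
    if match:
        text = text[: match.start()]

    return author, text.strip()


def keep_top_authors(records, k=20):
    """Keep only non-empty messages written by the k most prolific authors."""
    non_empty = [(author, text) for author, text in records if text]
    counts = Counter(author for author, _ in non_empty)
    top = {author for author, _ in counts.most_common(k)}
    return [(author, text) for author, text in non_empty if author in top]
```

Applying `extract_clean_body` to every raw message and passing the results to `keep_top_authors(records, k=20)` would yield a labelled, body-only dataset shaped like the twenty-class task described above.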
Using a TF-IDF representation with uni-grams and bi-grams, nine machine learning algorithms—including Logistic Regression, Linear SVM, Random Forest, XGBoost, and LightGBM—were evaluated under identical experimental conditions. LightGBM achieved the highest performance, demonstrating strong classification accuracy while avoiding the artificially inflated results reported in earlier studies affected by data leakage.
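As a rough illustration of the setup described above, the following sketch builds a TF-IDF representation over uni-grams and bi-grams and trains one of the named classifiers (Logistic Regression) with scikit-learn. The hyperparameters, the toy messages, and the helper names `build_attribution_model` and `mismatch_score` are assumptions made for this example; they do not come from the book, and the mismatch score is only one simple way to turn class probabilities into a signal, not the book's risk scoring methodology.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical); the real task uses Enron message bodies.
TOY_TEXTS = [
    "please review the attached report",
    "please review the budget numbers",
    "please send the attached numbers",
    "hey can we grab lunch today",
    "hey lunch today sounds great",
    "can we grab coffee today",
]
TOY_AUTHORS = ["a", "a", "a", "b", "b", "b"]


def build_attribution_model():
    """TF-IDF over uni-grams and bi-grams feeding a linear classifier."""
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
        LogisticRegression(max_iter=1000),
    )


def mismatch_score(model, text, claimed_author):
    """Return 1 - P(claimed_author | text); higher means a weaker match."""
    classes = list(model.classes_)
    probabilities = model.predict_proba([text])[0]
    return 1.0 - float(probabilities[classes.index(claimed_author)])


if __name__ == "__main__":
    model = build_attribution_model().fit(TOY_TEXTS, TOY_AUTHORS)
    sample = "please review the attached budget"
    print(model.predict([sample])[0])
    print(round(mismatch_score(model, sample, "b"), 3))
```

Swapping `LogisticRegression` for another estimator while keeping the same vectorizer and data split is one way to compare classifiers under identical conditions, as the evaluation above requires.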
The work further contributes a documented preprocessing pipeline, a comprehensive comparison of multiple classifiers, an operational insider-threat risk scoring methodology, and a roadmap for future developments involving graph-based learning, explainable artificial intelligence, and federated machine learning for cybersecurity applications.
Key Features
- Behaviour-based insider threat detection using machine learning
- Leakage-free preprocessing methodology for the Enron Email Corpus
- Comparative evaluation of nine supervised machine learning algorithms
- Behavioural author attribution using TF-IDF feature engineering
- Practical implementation on Google Colab with memory-efficient processing
- Security analysis and operational threat modelling
- Coverage of ethical AI, privacy, and responsible cybersecurity deployment
- Guidance for postgraduate students, researchers, and cybersecurity practitioners
Key Themes
- Cybersecurity and Insider Threat Detection
- Behaviour Analytics
- Machine Learning for Cybersecurity
- Behavioural Biometrics
- Author Attribution
- Natural Language Processing (NLP)
- Enron Email Corpus
- TF-IDF Feature Engineering
- XGBoost and LightGBM
- Security Analytics and Threat Modelling
- Explainable Artificial Intelligence
- Privacy-Preserving Machine Learning
- Responsible AI Deployment
Table of Contents
Chapter 1: Introduction
Chapter 2: Literature Review
Chapter 3: Experimental Background
Chapter 4: Dataset and Preprocessing
Chapter 5: Methodology
Chapter 6: Implementation, Results and Analysis
Chapter 7: Security Analysis and Threat Modeling
Chapter 8: Ethics, Privacy and Responsible Deployment
Chapter 9: Conclusion and Future Work