Analysis of Blockchain

The project report titled 'Analysis of Blockchain in File Sharing Environment' by S Karanprakash explores the implementation of blockchain technology to enhance security and efficiency in decentralized file-sharing networks. It analyzes blockchain-based systems using machine learning techniques to detect malicious transactions and improve network performance, while also addressing the limitations of existing systems. The study aims to provide insights for developing robust and secure file-sharing platforms through comprehensive data analysis and network evaluation.


ANALYSIS OF BLOCKCHAIN IN FILESHARING

ENVIRONMENT
PROJECT REPORT
Submitted By

S KARANPRAKASH
Reg.No:23MIT013

Under the Guidance of


Dr. S SUJATHA
HEAD OF THE DEPARTMENT
DEPARTMENT OF COMPUTER SCIENCE

In partial fulfillment of the requirements for the Award of the Degree of

M.Sc INFORMATION TECHNOLOGY

Department of Computer Science

Dr. G. R Damodaran College of Science (Autonomous)

(Autonomous, affiliated to the Bharathiar University and recognized by UGC)


Re-accredited at the ‘A+’ Grade level by the NAAC
An ISO 9001:2015 Certified Institution
Coimbatore 641 014

APRIL – 2025
Department of Computer Science
Dr. G. R Damodaran College of Science
(Autonomous, affiliated to the Bharathiar University and recognized by UGC)
Re-accredited at the ‘A+’ Grade level by the NAAC
An ISO 9001:2015 Certified Institution
Coimbatore 641 014

Certificate
This is to certify that this project report entitled
ANALYSIS OF BLOCKCHAIN IN FILESHARING
ENVIRONMENT
is a bonafide record of project work done by

S KARANPRAKASH
Reg. No:23MIT013

Submitted in partial fulfillment of the requirements for the degree of


M.Sc INFORMATION TECHNOLOGY

Faculty Guide Head of the Department

Submitted for Viva-Voce Examination held on

Internal Examiner External Examiner


ACKNOWLEDGEMENT

With a deep sense of gratitude, I express my most sincere thanks to our beloved Principal,
Dr. T. Santha, M.Sc., M.Phil. (CS), Ph.D., Dr. G. R. Damodaran College of Science, Coimbatore,
and to the management of my college for providing all the necessary facilities to carry out
this project.

I extend my sincere thanks to Dr. G. Radhamani, Director, Department of Computer Science,
Dr. G. R. Damodaran College of Science, for her valuable encouragement and support.

I extend my sincere thanks to Dr. S. Sujatha, Head of the Department, Department of Computer
Science, Dr. G. R. Damodaran College of Science, Coimbatore, for her constant support and
sincere encouragement extended to me throughout my work. My sincere thanks to all the staff
members of the department for their encouragement and suggestions.

I submit my heartfelt thanks to my project guide, Dr. S. Sujatha, Head of the Department,
Department of Computer Science, Dr. G. R. Damodaran College of Science, who was a constant
source of inspiration, encouragement and advice during the execution of my project. I extend
my sincere thanks to all the staff members of the Computer Science Department for their
encouragement and suggestions.

I wish to thank all my classmates and friends for their valuable help and support
throughout my project. With love and affection, I would like to thank my family for their
prayers, support and advice, which guided me always. Above all, I thank God Almighty for
giving me strength and courage and for being with me throughout my project.
TABLE OF CONTENTS

CHAPTER NO. TITLE

SYNOPSIS
1. INTRODUCTION
1.1 Project Objective
1.2 Overview of the Project
2. SYSTEM STUDY
2.1 Existing System
2.2 Proposed System
2.3 Area of Study
3. SYSTEM REQUIREMENTS
3.1 Hardware Configuration
3.2 Software Configuration
3.3 System Constraints
4. DATASET DESCRIPTION
5. METHODOLOGY
5.1 Data Preprocessing and Exploratory Data Analysis (EDA)
5.2 Machine Learning Model Training and Validation
5.3 Model Performance Evaluation
5.4 Network Analysis and Transaction Flow Examination
5.5 Time-Series Analysis and Trend Prediction
5.6 Selection of Best Model and Result Interpretation
6. TESTING AND IMPLEMENTATION
6.1 System Testing
6.2 Implementation
6.3 Test Case
7. RESULTS
7.1 Model Evaluation Metrics
7.2 Visual Representation of Analytical Results
8. CONCLUSION AND FUTURE ENHANCEMENTS
8.1 Conclusion
8.2 Future Enhancements
9. BIBLIOGRAPHY
SYNOPSIS
In today's digital era, secure and efficient file sharing is crucial, particularly in decentralized
environments where data integrity and privacy are of utmost importance. This project,
"Analysis of Blockchain in File-Sharing Environment," explores the implementation of
blockchain technology to enhance security, transparency, and efficiency in peer-to-peer (P2P)
file-sharing networks. The primary objective of this project is to analyze blockchain-based file-
sharing systems by leveraging machine learning (ML) techniques to detect malicious
transactions and improve network efficiency. The project involves collecting and processing
transaction data, extracting meaningful insights, and applying advanced analytical techniques
to evaluate key parameters such as transaction value, encryption level, seeder activity, and
transfer efficiency.

A comprehensive dataset comprising blockchain file-sharing transactions is used, containing
attributes such as transaction ID, sender and receiver addresses, file hash, encryption levels,
and bandwidth usage. The project employs various analytical methods, including machine
learning classification using a Random Forest Classifier to identify potentially malicious
activities within the network, clustering analysis through K-Means clustering and Principal
Component Analysis (PCA) to categorize transactions based on similar patterns, and network
analysis using graph-based techniques to examine sender-receiver interactions and identify key
network participants. Additionally, time-series analysis investigates transaction trends over
time to determine usage patterns and predict network behaviour, while correlation analysis
examines relationships between file size, transaction fees, and encryption levels to assess cost
and security trade-offs.

This research not only enhances understanding of blockchain applications in file-sharing
environments but also provides insights into optimizing performance and security in
decentralized networks. The findings can contribute to designing more robust and secure file-
sharing systems, mitigating risks associated with malicious activities, and improving overall
efficiency through data-driven approaches.
1. INTRODUCTION
1.1 PROJECT OBJECTIVE
Blockchain technology has emerged as a transformative solution for secure and transparent
digital transactions. In a file-sharing environment, ensuring data integrity, security, and
efficiency is a critical challenge. This project aims to analyze blockchain-based file-sharing
systems, leveraging advanced data analytics and machine learning techniques to enhance
security, detect malicious activities, and optimize network performance.
The primary objective of this project is to study the role of blockchain in decentralized file-
sharing networks and evaluate its effectiveness in mitigating security risks while maintaining
high transfer efficiency. The project focuses on processing blockchain transaction data,
identifying patterns in network behaviour, and applying predictive modeling techniques to
detect suspicious activities. By implementing clustering and classification techniques, the
project aims to distinguish between normal and potentially harmful transactions, ensuring a
secure and efficient file-sharing system.
Additionally, network analysis is conducted to understand the interaction between senders and
receivers, while time-series analysis helps in identifying trends and anomalies over time.
Correlation analysis further examines the relationships between key factors such as transaction
fees, file size, encryption levels, and transfer speeds. By integrating these methodologies, this
project seeks to provide comprehensive insights into blockchain-based file-sharing networks,
contributing to the development of more robust and secure decentralized file-sharing systems.

1.2 OVERVIEW OF THE PROJECT


The rapid growth of digital data exchange has led to an increasing need for secure and efficient
file-sharing mechanisms. Traditional centralized file-sharing systems are often vulnerable to
data breaches, unauthorized access, and inefficiencies due to server dependency. Blockchain
technology offers a decentralized approach to file sharing, ensuring data integrity,
transparency, and security through its distributed ledger mechanism. This project investigates
the application of blockchain in file-sharing environments and analyzes its impact on data
security, transaction efficiency, and network stability.
The project utilizes real-world blockchain transaction data, analyzing key parameters such as
transaction ID, sender and receiver addresses, file hash, encryption level, and network
bandwidth usage. By applying machine learning models, the system identifies potentially
malicious transactions, optimizes resource allocation, and enhances network reliability.

Furthermore, clustering techniques such as K-Means and Principal Component Analysis (PCA)
categorize different file-sharing behaviours, helping to detect anomalies and improve security
protocols.
Network analysis is performed to visualize the interactions between nodes within the file-
sharing system, enabling a deeper understanding of data flow and transaction behaviour. Time-
series analysis helps track trends in transaction activity over time, providing insights into
network efficiency and security vulnerabilities. Correlation analysis between transaction fees,
file sizes, and encryption levels allows for optimizing trade-offs between cost and security.
Overall, this project provides a comprehensive examination of blockchain-based file-sharing
environments, demonstrating how decentralized systems can enhance security, reduce
fraudulent activities, and improve file transfer efficiency. By integrating machine learning and
blockchain analytics, the project offers valuable insights for developing more secure and
efficient decentralized file-sharing platforms.

2. SYSTEM STUDY
2.1 EXISTING SYSTEM
Existing blockchain-based file-sharing systems use decentralized networks to store and verify
file transactions, ensuring security and transparency. Transactions are recorded on the
blockchain with cryptographic hashes, and smart contracts manage access control. Peer-to-peer
(P2P) networks reduce reliance on centralized servers, improving file distribution.
However, these systems face several limitations. Scalability remains a challenge as the
blockchain ledger grows, leading to high storage costs and slower transaction speeds. Most
systems rely on off-chain storage like IPFS, which, if compromised, can still pose security
risks. Additionally, transaction fees and energy consumption are concerns, especially in
networks using Proof of Work (PoW). Security threats like Sybil attacks and malicious nodes
also affect network reliability.

DRAWBACKS OF THE EXISTING SYSTEM


• High storage costs due to blockchain ledger growth.
• Slow transaction speeds and network congestion issues.
• Dependency on off-chain storage, which may be vulnerable.
• Security risks, including Sybil attacks and malicious nodes.
• High transaction fees and energy consumption in some networks.
• Complex usability, requiring technical knowledge to operate.
While blockchain offers a decentralized solution for file sharing, these challenges highlight the
need for improved security, scalability, and efficiency.

2.2 PROPOSED SYSTEM


The proposed system utilizes blockchain technology to create a decentralized and secure file-
sharing network. By leveraging blockchain’s distributed ledger, the system ensures data
integrity, transparency, and security without relying on a central authority. Each file transaction
is recorded in a blockchain ledger, providing an immutable and tamper-proof record of file
exchanges. Smart contracts automate the validation and execution of transactions, ensuring
trust and efficiency in the system.
In this system, machine learning models are integrated to analyze transaction data and detect
potentially malicious activities. By applying clustering and classification techniques, the
system distinguishes between legitimate and suspicious transactions, enhancing overall
security. Network analysis is used to study interactions between users, while time-series
analysis identifies trends in file-sharing activities, optimizing resource allocation and
improving transaction efficiency.

ADVANTAGES OF THE PROPOSED SYSTEM


• Eliminates single points of failure through decentralization.
• Enhances security with immutable blockchain records and encryption mechanisms.
• Provides transparency and trust in file-sharing transactions.
• Reduces the risk of data tampering and unauthorized modifications.
• Improves efficiency through smart contract automation.
• Uses machine learning for real-time detection of malicious activities.
• Optimizes file transfer speed and scalability through decentralized networking.
By integrating blockchain and machine learning, the proposed system offers a revolutionary
approach to secure and efficient file sharing, addressing the limitations of traditional
centralized systems and paving the way for a more reliable digital file exchange environment.

2.3 AREA OF STUDY


This project focuses on blockchain-based file-sharing environments, where decentralized P2P
networks enable secure and efficient data exchange. Unlike centralized systems, blockchain
enhances transparency, security, and immutability in file transactions. This study examines key
aspects such as transaction validation, data integrity, encryption, and network efficiency.
The research analyzes sender-receiver interactions, file transfer efficiency, bandwidth usage,
and security threats. By leveraging machine learning, it classifies transactions, detects
anomalies, and optimizes resource allocation. Network analysis explores transaction flows and
key nodes, while time-series and correlation analysis assess the impact of transaction fees, file
sizes, and encryption levels.
By examining blockchain’s role in decentralized file-sharing, this study contributes to
developing robust security mechanisms, improving transaction efficiency, and ensuring trust
in file-sharing networks. The insights gained enhance security and scalability, making
decentralized platforms more resilient to cyber threats and inefficiencies.

3. SYSTEM REQUIREMENTS
3.1 HARDWARE CONFIGURATION
• System: Personal Computer/Laptop
• Hard Disk: 500 GB or higher
• Monitor: 15-inch LED
• Input Devices: Keyboard, Mouse
• RAM: 4 GB or higher
• Peripheral Devices: Internet connection

3.2 SOFTWARE CONFIGURATION


• Operating System: Windows 10 or Linux
• Coding Language: Python 3.8
• IDE: PyCharm or Jupyter Notebook
• Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, NetworkX

3.3 SYSTEM CONSTRAINTS


• Requires a stable internet connection for blockchain transactions.
• High computational power is needed for large-scale data analysis.
• Machine learning models require proper training datasets for accuracy.
• Data preprocessing is necessary for handling missing or inconsistent values.
• Security measures must be in place to prevent unauthorized blockchain interactions.

4. DATASET DESCRIPTION
The data used in this project consists of blockchain transaction records related to file sharing.
Each transaction contains multiple parameters that impact the efficiency, security, and
reliability of the decentralized file-sharing environment. Key attributes include transaction ID,
sender and receiver addresses, file hash, file size, encryption level, bandwidth usage,
transaction fee, and confirmation time. These parameters play a crucial role in evaluating
network performance, identifying malicious activities, and optimizing resource allocation.
Encryption level and transaction fee are critical for ensuring data security and assessing the
cost-effectiveness of file transfers. Seeder count and download count influence file availability
and accessibility, directly affecting user experience in the decentralized network. Bandwidth
usage and transfer efficiency provide insights into network performance and scalability.
The dataset includes real-time blockchain transactions collected from decentralized file-
sharing platforms over a specified period. It is pre-processed to handle missing values,
normalize features, and encode categorical variables before being used for analysis. Table 1
below summarizes the key data attributes and their sources.

Parameter | Description | Source
Transaction ID | Unique identifier for each blockchain transaction | Blockchain transaction records
Timestamp | Date and time of transaction | Decentralized file-sharing network logs
Sender Address | Blockchain address of the sender | Blockchain ledger
Receiver Address | Blockchain address of the receiver | Blockchain ledger
File Hash | Unique hash value of the shared file | Cryptographic hashing algorithms
File Size (MB) | Size of the shared file in megabytes | Decentralized file-sharing network logs
Encryption Level | Security level applied to the shared file | Blockchain metadata
Transaction Fee | Cost associated with processing the transaction | Blockchain transaction records
Download Count | Number of times a file has been downloaded | Decentralized file-sharing network logs
Seeder Count | Number of active seeders for the file | Decentralized file-sharing network logs
Bandwidth Used (Mbps) | Network bandwidth consumed during file transfer | Blockchain transaction records
Confirmation Time (s) | Time taken to confirm the transaction | Blockchain ledger
Transfer Efficiency | Measure of successful file transfers | Blockchain transaction records
Is Malicious | Identifies if a transaction is suspicious or secure | Machine learning-based classification
Table 1: Data and Source of Data

5. METHODOLOGY
The methodology for analyzing blockchain-based file-sharing transactions follows a structured
approach that involves data preprocessing, model training, evaluation, and network analysis.
This ensures accurate detection of anomalies, optimization of transaction efficiency, and
enhanced security in decentralized file-sharing networks. The primary goal is to leverage
machine learning techniques and network analysis to gain insights into blockchain transaction
patterns while improving system performance and security.

The overview of the proposed methodology is shown in the figure below, representing the
process of analyzing blockchain file-sharing data using machine learning and network analysis
techniques.

The methodology can be divided into six divisions:

1. Data Preprocessing and Exploratory Data Analysis (EDA)

2. Machine Learning Model Training and Validation

3. Model Performance Evaluation

4. Network Analysis and Transaction Flow Examination

5. Time-Series Analysis and Trend Prediction

6. Selection of Best Model and Result Interpretation

Fig 1: Building Machine Learning model

5.1 DATA PREPROCESSING AND EXPLORATORY DATA ANALYSIS (EDA)
Data pre-processing is an integral step in machine learning, as the quality of data directly affects
the ability of the model to learn. Therefore, it is crucial to clean and structure the dataset before
feeding it into the machine learning model. Exploratory Data Analysis (EDA) is performed to
summarize dataset characteristics, detect anomalies, and visualize patterns. Some of the key
EDA techniques include correlation analysis, outlier detection, and feature distribution
visualization.

The steps in data pre-processing include:

1. Loading the dataset

2. Handling missing values through imputation

3. Encoding categorical variables such as file type and encryption level

4. Feature scaling using StandardScaler to standardize numerical attributes (zero mean, unit variance)

5. Detecting and handling outliers

6. Visualizing distributions using scatterplots and heatmaps
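
The steps above can be sketched in a few lines of Pandas and Scikit-learn. This is a minimal illustration on a tiny hand-made table, not the project's actual code: the column names, the median imputation, and the 1.5 × IQR clipping rule are assumptions chosen for the example. Outliers are clipped before scaling here so that extreme values do not distort the mean and variance used by StandardScaler; the report does not state which order was used.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny stand-in for the transaction dataset; column names are hypothetical.
df = pd.DataFrame({
    "file_size_mb": [12.5, 250.0, np.nan, 48.0, 5120.0],
    "transaction_fee": [0.01, 0.05, 0.02, np.nan, 0.90],
    "encryption_level": ["AES-128", "AES-256", "AES-256", "None", "AES-128"],
})
numeric_cols = ["file_size_mb", "transaction_fee"]

# Step 2: impute missing numeric values with the column median (assumed strategy).
df[numeric_cols] = SimpleImputer(strategy="median").fit_transform(df[numeric_cols])

# Step 3: encode the categorical column as integer codes.
df["encryption_level"] = LabelEncoder().fit_transform(df["encryption_level"])

# Step 5: clip outliers to the 1.5 * IQR fences (one common rule of thumb).
for col in numeric_cols:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Step 4: standardize numeric features to zero mean and unit variance.
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

print(df)
```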

5.2 MACHINE LEARNING MODEL TRAINING AND VALIDATION


After pre-processing, the dataset is split into training and testing sets (80-20 split). A Random
Forest Classifier is used to classify transactions and detect potentially malicious activities.
K-Means clustering is applied to group similar transactions and identify anomalies. The classifier
is trained on labeled data to enhance classification accuracy, while K-Means, being unsupervised,
is fitted on the features alone.
The key steps involved in training include:
• Splitting the dataset into training and testing sets
• Applying supervised learning for classification tasks
• Using clustering techniques for anomaly detection
• Optimizing hyperparameters for better performance
The trained model is then validated using cross-validation to check for overfitting.
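
A minimal sketch of this training workflow is shown below. It runs on synthetic data from Scikit-learn's make_classification rather than on the project's dataset, and the class weights, stratified split, random seeds, and 5-fold setting are assumptions made for illustration. The choice of three clusters follows test case TC-07 later in this report.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the preprocessed transactions (imbalanced labels assumed).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=42)

# 80-20 split; stratify keeps the class ratio the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Supervised classification with a Random Forest.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

# 5-fold cross-validation on the training set as an overfitting check.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)

# Unsupervised grouping of the same features into three clusters (labels not used).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X)

print(f"test accuracy: {test_accuracy:.3f}, CV mean: {cv_scores.mean():.3f}")
```

A large gap between the cross-validation mean and the training accuracy would suggest overfitting; similar values suggest the model generalizes consistently across folds.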

5.3 MODEL PERFORMANCE EVALUATION


In blockchain-based file-sharing environments, evaluating the performance of machine
learning models is crucial for ensuring accurate classification of transactions, detecting
malicious activities, and optimizing network efficiency. The evaluation process involves
assessing the model’s ability to distinguish between legitimate and fraudulent file-sharing
transactions while minimizing errors. The key metrics used for performance evaluation include
Accuracy, Precision, Recall, F1-Score, Confusion Matrix, and Feature Importance Analysis.

1. Accuracy
Accuracy measures the overall effectiveness of the model in correctly classifying transactions.
It is calculated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where:
• True Positives (TP): Transactions correctly classified as malicious.
• True Negatives (TN): Legitimate transactions correctly identified.
• False Positives (FP): Legitimate transactions misclassified as malicious.
• False Negatives (FN): Malicious transactions incorrectly classified as legitimate.
A high accuracy score indicates the model’s reliability in analyzing blockchain
transactions. When malicious transactions are a small minority of the data, accuracy alone
can be misleading, so it is read alongside precision and recall.

2. Precision and Recall
• Precision determines how many of the transactions flagged as malicious were actually
malicious: Precision = TP / (TP + FP). It is crucial in blockchain security, as high
precision minimizes false alarms.
• Recall evaluates how well the model detects malicious transactions:
Recall = TP / (TP + FN). A high recall ensures that the system captures most security threats.
• F1-Score is the harmonic mean of precision and recall,
F1 = 2 × Precision × Recall / (Precision + Recall), providing a balanced measure
of model performance.
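
The metrics above can be computed directly from the four confusion counts. The sketch below uses ten made-up labels (1 = malicious, 0 = legitimate), not the project's predictions, and compares the hand-computed values with Scikit-learn's metric functions.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative labels only (1 = malicious, 0 = legitimate); not the project's data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Count the four outcome types by comparing each prediction with its true label.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# The library functions should agree with the hand-computed values.
print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```

In this toy example accuracy is 0.7 while recall is only 0.5, which shows how a model can look acceptable on accuracy while still missing half of the malicious cases.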

3. Confusion Matrix
A confusion matrix provides a detailed breakdown of correct and incorrect predictions,
helping to visualize model errors. It highlights the trade-off between false positives and false
negatives, which is critical in blockchain-based security applications. A heatmap of the
confusion matrix is generated to better interpret classification results.

4. Feature Importance Analysis


Since blockchain file-sharing transactions consist of multiple attributes such as transaction
fee, encryption level, bandwidth usage, and seeder count, feature importance analysis is
conducted to identify which parameters contribute most to malicious transaction detection.
The Random Forest Classifier provides a ranked list of the most significant features, aiding in
refining the model for better performance.
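
As a rough illustration of how such a ranking can be obtained, the sketch below fits a Random Forest on synthetic data and sorts its feature_importances_ attribute. The four feature names are borrowed from the attributes listed above purely as labels; the data is random, so the resulting order says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Labels borrowed from the report's attributes; the values below are synthetic.
feature_names = ["transaction_fee", "encryption_level", "bandwidth_used", "seeder_count"]
X, y = make_classification(n_samples=500, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort them from most to least important.
ranking = pd.Series(clf.feature_importances_, index=feature_names)
ranking = ranking.sort_values(ascending=False)

print(ranking)
```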

5. Cross-Validation and Hyperparameter Tuning


To enhance model robustness, k-fold cross-validation is applied, where the dataset is split
into multiple subsets to train and test the model iteratively. Hyperparameter tuning using
GridSearchCV optimizes parameters like the number of estimators in Random Forest or the
number of clusters in K-Means, improving the model’s accuracy and efficiency.
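
A minimal sketch of grid search with k-fold cross-validation follows. The parameter grid, the F1 scoring choice, and the synthetic data are assumptions for illustration; the report does not list the grid that was actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the real features come from the preprocessed dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Hypothetical grid: every combination is scored with 5-fold cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=1),
                      param_grid, cv=5, scoring="f1")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

After fitting, search.best_estimator_ holds a model refitted on all of X with the best parameter combination.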

6. Performance Visualization
• A Receiver Operating Characteristic (ROC) curve is plotted to illustrate the model’s
classification capability across different thresholds.
• Precision-Recall curves help evaluate model trade-offs in identifying malicious
activities.
• Heatmaps and bar charts visualize feature importance and model decision-making
patterns.
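
The points of the ROC and Precision-Recall curves can be computed as in the sketch below, again on synthetic data with assumed settings. Plotting is left out; the arrays can be passed to Matplotlib as (fpr, tpr) and (recall, precision).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumed 90/10 class ratio).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7)

clf = RandomForestClassifier(n_estimators=100, random_state=7).fit(X_train, y_train)

# Probability of the positive (malicious) class for each test transaction.
scores = clf.predict_proba(X_test)[:, 1]

# Curve points across all decision thresholds.
fpr, tpr, roc_thresholds = roc_curve(y_test, scores)
precision, recall, pr_thresholds = precision_recall_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

print(f"ROC AUC: {auc:.3f}, ROC points: {len(fpr)}, PR points: {len(precision)}")
```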

5.4 NETWORK ANALYSIS AND TRANSACTION FLOW EXAMINATION
Network analysis plays a crucial role in understanding the interactions between nodes in a
decentralized blockchain-based file-sharing environment. By constructing a directed graph of
sender and receiver addresses, it is possible to analyze transaction flows and detect abnormal
patterns indicative of fraudulent activity.
The key steps involved in network analysis include:
• Building a Transaction Network Graph: A directed graph is constructed using
NetworkX, where nodes represent users (senders and receivers), and edges represent
file-sharing transactions.
• Identifying Key Nodes: Highly active nodes are identified based on in-degree (number
of received transactions) and out-degree (number of sent transactions).
• Detecting Anomalous Patterns: Abnormal transaction patterns, such as repeated
interactions between specific users or high transaction frequencies, are flagged for
further investigation.
• Visualization of Network Flow: The transaction network is visualized to highlight
central nodes, clustering tendencies, and unusual transaction behaviours.
This analysis helps uncover hidden patterns in blockchain file-sharing, improving security and
transaction efficiency.
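
The graph construction and key-node steps can be sketched with NetworkX as follows. The addresses are placeholders and the "three or more repeats" threshold is a hypothetical rule for the example. A MultiDiGraph, a directed graph that allows parallel edges, is used here so that repeated transactions between the same pair are counted separately; the report only specifies a directed graph.

```python
from collections import Counter

import networkx as nx

# Placeholder (sender, receiver) pairs; real addresses come from the dataset.
transactions = [
    ("addr_A", "addr_B"), ("addr_A", "addr_C"), ("addr_B", "addr_C"),
    ("addr_D", "addr_C"), ("addr_A", "addr_B"), ("addr_A", "addr_B"),
]

# Directed multigraph: one edge per transaction, parallel edges preserved.
G = nx.MultiDiGraph()
G.add_edges_from(transactions)

# Rank nodes by out-degree (sent) and in-degree (received).
top_senders = sorted(G.out_degree(), key=lambda kv: kv[1], reverse=True)
top_receivers = sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True)

# Flag sender-receiver pairs that interact repeatedly (hypothetical threshold: 3).
pair_counts = Counter(transactions)
repeated_pairs = {pair: n for pair, n in pair_counts.items() if n >= 3}

print("top sender:", top_senders[0])
print("top receivers:", top_receivers[:2])
print("repeated pairs:", repeated_pairs)
```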

5.5 TIME-SERIES ANALYSIS AND TREND PREDICTION


Time-series analysis is used to examine the historical trends of blockchain transactions, identify
seasonal patterns, and predict future transaction behaviours. By analyzing transaction
timestamps and corresponding values, insights can be gained into how blockchain file-sharing
activity evolves over time.
The time-series analysis process includes:
• Data Aggregation: Transaction data is grouped based on timestamps, converting raw
logs into structured time-series data.
• Identifying Seasonal Trends: Fluctuations in transaction volume, bandwidth usage, and
encryption levels are analyzed to detect recurring patterns.
• Moving Average and Forecasting: Moving averages are applied to smoothen short-term
variations, and predictive models such as ARIMA (AutoRegressive Integrated Moving
Average) are used for forecasting future trends.
• Anomaly Detection in Time-Series Data: Sudden spikes in transaction activity or
irregular trends are flagged as potential security threats.
This analysis enables proactive decision-making, helping to optimize network performance and
enhance security by detecting suspicious transaction behaviours before they escalate.
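
The aggregation, moving-average, and spike-detection steps can be sketched with Pandas as shown below. The transaction log is simulated (about 40 transactions per day with one injected spike), and the 7-day window and z-score threshold of 3 are assumptions for illustration. ARIMA forecasting is not shown; it would need an additional library such as statsmodels, which is not among the libraries listed in Section 3.2.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated raw log: one row per transaction, with an injected spike on day 45.
days = pd.date_range("2025-01-01", periods=60, freq="D")
counts = rng.poisson(40, size=60)
counts[45] = 200
raw = pd.DataFrame({"timestamp": days.repeat(counts)})
raw["tx_id"] = np.arange(len(raw))

# Data aggregation: raw log rows -> daily transaction counts.
daily = raw.set_index("timestamp")["tx_id"].resample("D").count()

# Moving average: 7-day rolling mean to smooth short-term variation.
rolling_mean = daily.rolling(window=7).mean()

# Anomaly detection: compare each day with the previous 7 days only.
baseline_mean = daily.shift(1).rolling(window=7).mean()
baseline_std = daily.shift(1).rolling(window=7).std()
z_scores = (daily - baseline_mean) / baseline_std
spikes = daily[z_scores > 3]

print(spikes)
```

Shifting by one day before computing the baseline keeps the current day's value out of its own reference window, so a sudden spike is not diluted by itself.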

5.6 SELECTION OF BEST MODEL AND RESULT INTERPRETATION
Model selection is the process of choosing the most suitable predictive model based on
performance metrics and real-world applicability. Factors such as accuracy, interpretability,
and computational efficiency are considered when selecting the final model.
The key steps involved in model selection include:
• Comparing Different Models: The performance of multiple models (e.g., Random
Forest, SVM, K-Means) is compared to identify the most effective approach.
• Hyperparameter Optimization: The best-performing model is fine-tuned using
hyperparameter adjustments to maximize accuracy and minimize errors.
• Result Interpretation: The selected model’s predictions are analyzed to determine key
insights into blockchain file-sharing transactions and security risks.
• Implementation and Deployment: Once validated, the final model is integrated into the
blockchain analysis system to provide real-time monitoring and threat detection.
By selecting the most efficient model, the project ensures that blockchain-based file-sharing
networks are secure, reliable, and optimized for performance.

6. TESTING AND IMPLEMENTATION
6.1 SYSTEM TESTING
System testing is a crucial phase in ensuring the reliability, security, and efficiency of the
blockchain-based file-sharing system. This project follows a comprehensive testing strategy
that includes functional, performance, security, and usability testing.
Functional testing is conducted to verify that all core features, including file encryption,
transaction validation, and user authentication, work as expected. Each function is tested
individually and then integrated into the overall system to ensure seamless operation. The
blockchain ledger is tested to confirm that it accurately records file transactions, preventing
any unauthorized modifications.
Performance testing evaluates the system's ability to handle multiple transactions
simultaneously. Given the decentralized nature of blockchain, it is essential to ensure that
network congestion does not lead to significant delays. Various test cases simulate high-traffic
conditions to measure transaction speeds and overall system responsiveness.
Security testing is performed to identify potential vulnerabilities in the system. Since
blockchain-based file sharing involves sensitive data, multiple security tests are conducted to
ensure encryption is applied correctly, unauthorized access is prevented, and malicious
activities are detected in real-time. Machine learning models integrated into the system are
tested to confirm their accuracy in identifying suspicious transactions.
Usability testing assesses the user experience, ensuring that the system is intuitive and easy to
navigate. Testers interact with the platform to evaluate its interface, accessibility, and overall
efficiency in performing file-sharing operations. Feedback from users is analyzed to make
necessary improvements before deployment.

6.2 IMPLEMENTATION
The implementation of the blockchain-based file-sharing system follows a structured approach
to ensure security, scalability, and efficiency. The system is developed using a Python-based
backend, incorporating Flask or Django for handling API requests. For storing metadata,
databases such as MongoDB or PostgreSQL are utilized, while the blockchain network itself
manages file integrity and transactions.
The integration of blockchain technology into the file-sharing environment ensures data
security through cryptographic hashing and decentralized ledger management. Each file
transaction is recorded immutably, preventing tampering or unauthorized alterations. Smart
contracts are deployed to automate transaction validation, access control, and permissions
management, reducing the need for manual intervention.
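
To illustrate how cryptographic hashing makes recorded transactions tamper-evident, the sketch below builds a two-block hash chain in plain Python. It is a toy model, not the project's blockchain: SHA-256, the block layout, and the helper names file_hash, make_block, and chain_is_valid are all assumptions introduced for this example.

```python
import hashlib
import json


def file_hash(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def block_digest(prev_hash: str, transaction: dict) -> str:
    """Hash the previous block's hash together with this block's transaction."""
    body = json.dumps({"prev_hash": prev_hash, "tx": transaction}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def make_block(prev_hash: str, transaction: dict) -> dict:
    """Create a block that is linked to its predecessor through prev_hash."""
    return {"prev_hash": prev_hash, "tx": transaction,
            "hash": block_digest(prev_hash, transaction)}


def chain_is_valid(chain: list) -> bool:
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_digest(block["prev_hash"], block["tx"]):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = make_block("0" * 64, {"sender": "addr_A", "receiver": "addr_B",
                                "file_hash": file_hash(b"first file contents")})
second = make_block(genesis["hash"], {"sender": "addr_B", "receiver": "addr_C",
                                      "file_hash": file_hash(b"second file contents")})
chain = [genesis, second]

valid_before = chain_is_valid(chain)
chain[0]["tx"]["receiver"] = "addr_X"  # tamper with a recorded transaction
valid_after = chain_is_valid(chain)

print(valid_before, valid_after)
```

Changing any field of an earlier transaction changes that block's recomputed hash, so the stored hash no longer matches and validation fails.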
To enhance security, a machine learning model is integrated into the system to detect malicious
transactions. The model is trained using real-world blockchain transaction data and employs a
Random Forest Classifier to classify activities as legitimate or suspicious. The implementation
of this AI-based security layer strengthens the system’s ability to identify and prevent
fraudulent activities.

6.3 TEST CASE


Test ID | Test Scenario | Expected Result | Actual Result | Status
TC-01 | Load blockchain transaction dataset | Dataset should load without errors | Successfully loaded | Pass
TC-02 | Handle missing values in dataset | No NaN values should remain after preprocessing | No missing values found | Pass
TC-03 | Encode categorical variables correctly | Categorical features should be converted into numerical representations | Successfully encoded | Pass
TC-04 | Scale numerical features properly | Data should be normalized for model training | StandardScaler applied correctly | Pass
TC-05 | Train Random Forest model on transaction data | Model should achieve at least 85% accuracy | Accuracy = 95.25% | Pass
TC-06 | Detect malicious transactions using ML model | Malicious transactions should be flagged correctly | Some false negatives present | Needs improvement
TC-07 | Apply K-Means clustering on transactions | Transactions should be grouped into clusters based on file size, fees, and encryption | 3 distinct clusters formed | Pass
TC-08 | Generate confusion matrix for classification | Visual representation of classification results should be produced | Confusion matrix generated | Pass
TC-09 | Perform network analysis of sender-receiver interactions | Graph visualization should highlight key nodes and anomalies | Graph generated correctly | Pass
TC-10 | Detect highly active nodes in file-sharing transactions | Most frequent senders and receivers should be identified | Key nodes detected | Pass

Table 2: Test Scenarios and Expected Outcomes
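
The preprocessing checks in TC-02 to TC-04 can be illustrated with a short pandas and scikit-learn sketch. The column names (`file_size_mb`, `fee`, `file_type`), the median and `"unknown"` fill values, and the use of one-hot encoding are assumptions made for this example; the report names only `StandardScaler` explicitly.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df, numeric_cols, categorical_cols):
    """Impute, encode, and scale a transaction table (illustrative only)."""
    df = df.copy()

    # TC-02: remove missing values (median for numbers, a marker for categories).
    for col in numeric_cols:
        df[col] = df[col].astype(float)
        df[col] = df[col].fillna(df[col].median())
    for col in categorical_cols:
        df[col] = df[col].fillna("unknown")

    # TC-03: convert categorical features into numerical indicator columns.
    df = pd.get_dummies(df, columns=categorical_cols, dtype=int)

    # TC-04: standardise numeric features to zero mean and unit variance.
    # In a real pipeline the scaler should be fitted on the training split only.
    df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
    return df


if __name__ == "__main__":
    raw = pd.DataFrame(
        {
            "file_size_mb": [1.0, None, 3.0, 5.0],
            "fee": [0.05, 0.06, None, 0.04],
            "file_type": ["pdf", None, "png", "pdf"],
        }
    )
    clean = preprocess(raw, ["file_size_mb", "fee"], ["file_type"])
    print(clean.isna().sum().sum())  # number of remaining missing values
    print(list(clean.columns))
```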

7. RESULTS
Several machine learning models were tested, including Random Forest Classifier, Decision
Tree, and Logistic Regression, and each was evaluated on accuracy, precision, recall, and
F1-score. The dataset used in this study consists of 2000
transactions with 19 attributes, capturing essential details such as transaction fees, encryption
levels, file sizes, and sender-receiver relationships.
The Random Forest Classifier was chosen as the final model due to its high accuracy and
robustness in detecting malicious transactions. The dataset was split into training and testing
sets, with 80% used for training and 20% for testing. The model was successfully trained,
achieving an overall accuracy of 95.25% on the test set.

7.1 MODEL EVALUATION METRICS


Confusion Matrix:
On the 400-transaction test set, the model correctly classified 377 non-malicious transactions
and 4 malicious transactions. The remaining 19 were false negatives, so 19 of the 23 malicious
transactions in the test set were misclassified as non-malicious.

Fig 2: Confusion Matrix


Classification Report:
• Precision: 95% for non-malicious transactions, 100% for malicious transactions
• Recall: 100% for non-malicious transactions, 17% for malicious transactions
• F1-score: 98% for non-malicious transactions, 30% for malicious transactions
• Macro Average F1-score: 64%
• Weighted Average F1-score: 94%

Fig 3: Classification Report
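
The reported metrics can be recomputed directly from the confusion matrix counts, as the following sketch shows. The false-positive count of 0 is not stated explicitly in the text; it is inferred here from the 400-sample test set (377 + 4 + 19 = 400) and the 100% precision on the malicious class. The function name `binary_metrics` is introduced for this example.

```python
def binary_metrics(tn, fp, fn, tp):
    """Derive per-class and averaged metrics from a 2x2 confusion matrix.

    The positive class is "malicious"; the negative class is "non-malicious".
    """
    total = tn + fp + fn + tp

    def precision_recall_f1(hits, wrongly_predicted, missed):
        predicted = hits + wrongly_predicted
        actual = hits + missed
        precision = hits / predicted if predicted else 0.0
        recall = hits / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f1

    mal_p, mal_r, mal_f1 = precision_recall_f1(tp, fp, fn)
    non_p, non_r, non_f1 = precision_recall_f1(tn, fn, fp)
    mal_support, non_support = tp + fn, tn + fp

    return {
        "accuracy": (tp + tn) / total,
        "malicious": {"precision": mal_p, "recall": mal_r, "f1": mal_f1},
        "non_malicious": {"precision": non_p, "recall": non_r, "f1": non_f1},
        # Macro average treats both classes equally.
        "macro_f1": (mal_f1 + non_f1) / 2,
        # Weighted average weights each class by its number of true samples.
        "weighted_f1": (mal_f1 * mal_support + non_f1 * non_support) / total,
    }


if __name__ == "__main__":
    # Counts from the text; fp=0 is inferred, see the note above.
    report = binary_metrics(tn=377, fp=0, fn=19, tp=4)
    for name, value in report.items():
        print(name, value)
```

The gap between the macro average (both classes count equally) and the weighted average (dominated by the 377 non-malicious samples) shows how strongly the class imbalance influences the headline figures.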

Clustering Results:
K-Means clustering was used to analyze blockchain transactions based on key attributes. Three
clusters were identified, with the following average statistics:
• Cluster 0: Average file size 8.66 MB, average transaction fee 0.0577
• Cluster 1: Average file size 9.34 MB, average transaction fee 0.0529
• Cluster 2: Average file size 7.58 MB, average transaction fee 0.059

Fig 4: Cluster Statistics
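
To make the clustering step concrete, the following dependency-free sketch implements Lloyd's algorithm, the standard K-Means procedure. The project's own implementation is not shown in this report and may have relied on a library instead; the function name `kmeans`, its parameters, and the sample points are assumptions for illustration.

```python
import random
from math import dist


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm.

    init: optional list of k starting centroids; if omitted, k points are
    sampled at random, in which case the result can depend on the seed.
    Returns (labels, centroids).
    """
    if init is not None:
        centroids = [tuple(c) for c in init]
    else:
        centroids = random.Random(seed).sample(list(points), k)

    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical, already-scaled (feature_1, feature_2) pairs.
    sample = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.2, 1.2)]
    labels, centroids = kmeans(sample, k=3, init=[sample[0], sample[2], sample[4]])
    print(labels)
    print(centroids)
```

Each final centroid is the per-cluster mean of every feature, which corresponds to the kind of cluster averages listed above. Because file sizes in MB are far larger in magnitude than transaction fees, features would normally be scaled before clustering so that distance is not dominated by file size.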


File Type Distribution:
The analysis of file type distribution shows which kinds of files are shared in the
blockchain-based system. Categorizing transactions by file type reveals trends in file-sharing
behavior, which helps in understanding storage utilization, assessing security implications,
and optimizing resource allocation. Like the clustering and confusion matrix results above,
it supports the identification of patterns in decentralized file-sharing networks.

Fig 5: File Type Distribution


Network and Transaction Flow Analysis:
The network analysis identified key nodes responsible for the highest volume of transactions.
The sender-receiver relationships were mapped to visualize high-frequency interactions,
helping in detecting potential security threats and understanding the structure of the
decentralized network.
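
The identification of highly active nodes can be sketched with simple frequency counts over sender-receiver pairs. This standard-library example is only an approximation of the analysis described above: the function name `most_active_nodes`, the node labels, and the returned structure are assumptions, and the project's graph visualization is not reproduced here.

```python
from collections import Counter


def most_active_nodes(transactions, top_n=3):
    """Rank nodes and links by transaction count.

    transactions: iterable of (sender, receiver) pairs.
    """
    sent = Counter()
    received = Counter()
    edges = Counter()
    for sender, receiver in transactions:
        sent[sender] += 1            # out-degree of the sender
        received[receiver] += 1      # in-degree of the receiver
        edges[(sender, receiver)] += 1  # weight of the directed link

    overall = sent + received        # total activity per node
    return {
        "top_senders": sent.most_common(top_n),
        "top_receivers": received.most_common(top_n),
        "top_overall": overall.most_common(top_n),
        "heaviest_links": edges.most_common(top_n),
    }


if __name__ == "__main__":
    # Hypothetical transaction log.
    log = [("A", "B"), ("A", "C"), ("A", "B"), ("D", "B"), ("C", "A")]
    summary = most_active_nodes(log)
    for name, ranking in summary.items():
        print(name, ranking)
```

Nodes with unusually high counts, or links that carry a disproportionate share of transfers, are natural candidates for closer inspection when looking for anomalies.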

7.2 VISUAL REPRESENTATION OF ANALYTICAL RESULTS
The following graphs visually represent key findings in the blockchain-based file-sharing
analysis. They illustrate trends in transaction attributes, network activity, classification
performance, and clustering results, providing deeper insights into system efficiency and
security.

Fig 6: Visualizes the correlation heatmap of numerical features in a dataset.

Fig 7: Represents the relationship between file size and transaction fees.

Fig 8: Shows the distribution of file types in a dataset.

Fig 9: Displays a confusion matrix for evaluating classification performance.

Fig 10: Displays feature importance in a blockchain ML model.

Fig 11: Displays K-means clustering results on blockchain file-sharing data.

Fig 12: Compares file size distribution between malicious and non-malicious files.

Fig 13: Depicts transfer efficiency based on permission type (private, shared, public).

Fig 14: Illustrates daily, weekly, and monthly blockchain transaction values over time.

8. CONCLUSION AND FUTURE ENHANCEMENTS
8.1 CONCLUSION
This project enhances security, transparency, and efficiency in decentralized file-sharing
networks. By leveraging machine learning techniques, the system effectively analyzed
blockchain transaction data to detect malicious activities, classify transactions, and optimize
network efficiency. The Random Forest Classifier, used for classification, achieved an overall
accuracy of 95.25%, although its 17% recall on the malicious class shows that this figure mainly
reflects correct classification of legitimate transactions. Additionally, clustering techniques provided valuable insights into transaction
patterns, while network analysis helped visualize sender-receiver relationships, improving our
understanding of decentralized file-sharing dynamics.
The results validate the effectiveness of blockchain in securing file-sharing transactions,
ensuring data integrity, and reducing the risk of unauthorized access. The study highlights the
importance of integrating blockchain analytics with machine learning to enhance security
measures, optimize transaction efficiency, and improve trust in decentralized environments.
However, the low recall for malicious transaction detection suggests that further improvements
are needed in data preprocessing and feature engineering to refine the model’s ability to detect
fraudulent activities more accurately.
Overall, this research contributes to advancing decentralized file-sharing systems by providing
a data-driven approach to blockchain security analysis. It lays the groundwork for future
improvements and practical implementations in real-world blockchain-based file-sharing
applications.

8.2 FUTURE ENHANCEMENTS


While this project successfully analyzed blockchain-based file-sharing transactions, several
areas can be further enhanced to improve security, efficiency, and scalability. The following
enhancements can be considered for future work:
1. Real-time Anomaly Detection: Implementing real-time monitoring of blockchain
transactions using advanced machine learning models such as deep learning and
anomaly detection algorithms can improve the system’s ability to detect malicious
activities instantly.
2. Integration with Smart Contracts: Enhancing security through automated smart
contracts that validate and verify file-sharing transactions can ensure greater
transparency and trustworthiness in decentralized networks.
3. Scalability Improvements: Optimizing blockchain architecture and consensus
mechanisms to handle larger transaction volumes efficiently can enhance network
performance and reduce latency.
4. Enhanced Feature Engineering: Incorporating additional transaction parameters, such
as user reputation scores and past transaction behavior, can improve the accuracy of
fraud detection models.
5. Multi-Blockchain Compatibility: Expanding the analysis to multiple blockchain
platforms, such as Ethereum and Hyperledger, can provide broader insights into
decentralized file-sharing across different blockchain ecosystems.
6. Privacy-Preserving Techniques: Implementing zero-knowledge proofs or
homomorphic encryption can enhance privacy in blockchain-based file-sharing systems
without compromising security.
By incorporating these enhancements, the system can evolve into a more robust, intelligent,
and scalable solution for blockchain-based file-sharing, ensuring greater security, efficiency,
and user trust in decentralized environments.

9. BIBLIOGRAPHY
WEBSITES:
1. Nakamoto, S. (2008). Bitcoin: A Peer-to-Peer Electronic Cash System. Retrieved
from https://bitcoin.org/bitcoin.pdf
2. Techtarget. (2025). Top 8 Blockchain Platforms to Consider in 2025. Retrieved from
https://www.techtarget.com
3. Helalabs. (2024). Top 10 Decentralized Storage Projects to Know. Retrieved from
https://helalabs.com
4. ResearchGate. (2025). Blockchain-Based File Storage and Sharing with IPFS.
Retrieved from https://www.researchgate.net
5. ACM Digital Library. (2020). A Secure File Sharing System Based on IPFS and
Blockchain. Retrieved from https://dl.acm.org

BOOKS:
1. Bashir, I. (2019). Mastering Blockchain. Packt Publishing.
2. Choo, K. R., & Dehghantanha, A. (2020). Blockchain Cybersecurity, Trust and
Privacy. Springer.
3. Werbach, K. (2018). The Blockchain and the New Architecture of Trust. MIT Press.
4. Prusty, N. (2017). Building Blockchain Projects. Packt Publishing.
5. Swan, M. (2015). Blockchain: Blueprint for a New Economy. O'Reilly Media.
6. Tapscott, D., & Tapscott, A. (2016). Blockchain Revolution. Portfolio.
7. Lewis, A. (2018). The Basics of Bitcoins and Blockchains. Mango Publishing.
