ANALYSIS OF BLOCKCHAIN IN FILESHARING ENVIRONMENT
PROJECT REPORT
Submitted By
S KARANPRAKASH
Reg.No:23MIT013
APRIL – 2025
Department of Computer Science
Dr. G R Damodaran College of Science
(Autonomous, affiliated to the Bharathiar University and recognized by UGC)
Re-accredited at the ‘A+’ Grade level by the NAAC
An ISO 9001:2015 Certified Institution
Coimbatore 641 014
Certificate
This is to certify that this project report entitled
ANALYSIS OF BLOCKCHAIN IN FILESHARING
ENVIRONMENT
is a bonafide record of project work done by
S KARANPRAKASH
Reg. No:23MIT013
I wish to thank all my classmates and friends for their valuable help and support
throughout my project. With love and affection, I would like to thank my family for their
prayers, support, and advice, which have always guided me. Above all, I thank God Almighty for
giving me strength and courage and for being with me throughout my project.
TABLE OF CONTENTS
Furthermore, K-Means clustering, supported by Principal Component Analysis (PCA) for
dimensionality reduction, categorizes different file-sharing behaviours, helping to detect
anomalies and improve security protocols.
Network analysis is performed to visualize the interactions between nodes within the file-
sharing system, enabling a deeper understanding of data flow and transaction behaviour. Time-
series analysis helps track trends in transaction activity over time, providing insights into
network efficiency and security vulnerabilities. Correlation analysis between transaction fees,
file sizes, and encryption levels allows for optimizing trade-offs between cost and security.
Overall, this project provides a comprehensive examination of blockchain-based file-sharing
environments, demonstrating how decentralized systems can enhance security, reduce
fraudulent activities, and improve file transfer efficiency. By integrating machine learning and
blockchain analytics, the project offers valuable insights for developing more secure and
efficient decentralized file-sharing platforms.
2. SYSTEM STUDY
2.1 EXISTING SYSTEM
Existing blockchain-based file-sharing systems use decentralized networks to store and verify
file transactions, ensuring security and transparency. Transactions are recorded on the
blockchain with cryptographic hashes, and smart contracts manage access control. Peer-to-peer
(P2P) networks reduce reliance on centralized servers, improving file distribution.
However, these systems face several limitations. Scalability remains a challenge as the
blockchain ledger grows, leading to high storage costs and slower transaction speeds. Most
systems rely on off-chain storage like IPFS, which, if compromised, can still pose security
risks. Additionally, transaction fees and energy consumption are concerns, especially in
networks using Proof of Work (PoW). Security threats like Sybil attacks and malicious nodes
also affect network reliability.
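The hash-based recording described in this section can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a simple dictionary as the ledger record; the field names (prev_hash, file_hash, record_hash) and the helper functions are hypothetical and are not taken from any particular platform.

```python
import hashlib
import json


def file_hash(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def record_digest(body: dict) -> str:
    """Hash a canonical JSON form so the same body always gives the same digest."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_record(prev_hash: str, sender: str, receiver: str, data: bytes) -> dict:
    """Build a ledger record that stores the file hash, not the file itself."""
    body = {
        "prev_hash": prev_hash,
        "sender": sender,
        "receiver": receiver,
        "file_hash": file_hash(data),
    }
    return {**body, "record_hash": record_digest(body)}


def is_chain_valid(chain: list) -> bool:
    """Recompute every record hash and check each link to its predecessor."""
    for i, record in enumerate(chain):
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record_digest(body) != record["record_hash"]:
            return False
        if i > 0 and record["prev_hash"] != chain[i - 1]["record_hash"]:
            return False
    return True


# Hypothetical users and file contents, for illustration only.
genesis = make_record("0" * 64, "alice", "bob", b"report.pdf contents")
second = make_record(genesis["record_hash"], "bob", "carol", b"notes.txt contents")
chain = [genesis, second]
print(is_chain_valid(chain))
```

Because each record embeds the hash of the previous one, changing any earlier field invalidates the stored hashes that follow it, which is what makes tampering detectable.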
security. Network analysis is used to study interactions between users, while time-series
analysis identifies trends in file-sharing activities, optimizing resource allocation and
improving transaction efficiency.
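As a rough illustration of the time-series step mentioned above, the sketch below counts transactions per calendar day and per ISO week. It assumes each transaction carries an ISO 8601 timestamp string; the function name count_by_period and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_by_period(timestamps, period="day"):
    """Count transactions per calendar day or per ISO week."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts)
        if period == "day":
            key = moment.date().isoformat()
        elif period == "week":
            iso = moment.isocalendar()  # (ISO year, ISO week, ISO weekday)
            key = f"{iso[0]}-W{iso[1]:02d}"
        else:
            raise ValueError("period must be 'day' or 'week'")
        counts[key] += 1
    return dict(sorted(counts.items()))


# Hypothetical timestamps, not taken from the project dataset.
sample = [
    "2025-01-06T09:15:00",
    "2025-01-06T17:40:00",
    "2025-01-07T08:05:00",
    "2025-01-13T12:00:00",
]
daily = count_by_period(sample, "day")
weekly = count_by_period(sample, "week")
```

Comparing the daily and weekly counts side by side is one simple way to separate short bursts of activity from longer-term trends.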
3. SYSTEM REQUIREMENTS
3.1 HARDWARE CONFIGURATION
• System: Personal Computer/Laptop
• Hard Disk: 500 GB or higher
• Monitor: 15" LED
• Input Devices: Keyboard, Mouse
• RAM: 4 GB or higher
• Peripheral Devices: Internet connection
4. DATASET DESCRIPTION
The data used in this project consists of blockchain transaction records related to file sharing.
Each transaction contains multiple parameters that impact the efficiency, security, and
reliability of the decentralized file-sharing environment. Key attributes include transaction ID,
sender and receiver addresses, file hash, file size, encryption level, bandwidth usage,
transaction fee, and confirmation time. These parameters play a crucial role in evaluating
network performance, identifying malicious activities, and optimizing resource allocation.
Encryption level and transaction fee are critical for ensuring data security and assessing the
cost-effectiveness of file transfers. Seeder count and download count influence file availability
and accessibility, directly affecting user experience in the decentralized network. Bandwidth
usage and transfer efficiency provide insights into network performance and scalability.
The dataset includes real-time blockchain transactions collected from decentralized file-
sharing platforms over a specified period. It is pre-processed to handle missing values,
normalize features, and encode categorical variables before being used for analysis. The table
below (Table 4.1) summarizes the key data attributes and their sources.
5. METHODOLOGY
The methodology for analyzing blockchain-based file-sharing transactions follows a structured
approach that involves data preprocessing, model training, evaluation, and network analysis.
This ensures accurate detection of anomalies, optimization of transaction efficiency, and
enhanced security in decentralized file-sharing networks. The primary goal is to leverage
machine learning techniques and network analysis to gain insights into blockchain transaction
patterns while improving system performance and security.
The overview of the proposed methodology is shown in the figure below, representing the
process of analyzing blockchain file-sharing data using machine learning and network analysis
techniques.
5.1 DATA PREPROCESSING AND EXPLORATORY DATA ANALYSIS (EDA)
Data pre-processing is an integral step in machine learning, as the quality of data directly affects
the ability of the model to learn. Therefore, it is crucial to clean and structure the dataset before
feeding it into the machine learning model. Exploratory Data Analysis (EDA) is performed to
summarize dataset characteristics, detect anomalies, and visualize patterns. Some of the key
EDA techniques include correlation analysis, outlier detection, and feature distribution
visualization.
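The preprocessing and correlation steps named above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: missing values are filled with the median, numeric features are min-max scaled, and categorical features are one-hot encoded. The report does not specify which imputation or scaling method was used, and the column names and values below are hypothetical.

```python
import statistics


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy columns, not values from the project dataset.
file_size_mb = [2.0, None, 8.0, 4.0]
fee = [0.02, 0.03, 0.08, 0.04]
permission = ["private", "public", "shared", "private"]

sizes = fill_missing(file_size_mb)   # the gap is filled with the median, 4.0
scaled = min_max_scale(sizes)
encoded = one_hot(permission)
r = pearson(sizes, fee)              # correlation between file size and fee
```

In a full pipeline the same transformations would normally be fitted on the training split only and then applied to the test split, so that no information leaks from the test data.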
learning models is crucial for ensuring accurate classification of transactions, detecting
malicious activities, and optimizing network efficiency. The evaluation process involves
assessing the model’s ability to distinguish between legitimate and fraudulent file-sharing
transactions while minimizing errors. The key metrics used for performance evaluation include
Accuracy, Precision, Recall, F1-Score, Confusion Matrix, and Feature Importance Analysis.
1. Accuracy
Accuracy measures the overall effectiveness of the model in correctly classifying transactions.
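It is calculated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP and TN are the correctly classified malicious and legitimate transactions, FP is the
number of legitimate transactions flagged as malicious, and FN is the number of malicious
transactions that were missed.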
• Recall evaluates how well the model detects malicious transactions. A high recall
ensures that the system captures most security threats.
• F1-Score is the harmonic mean of precision and recall, providing a balanced measure
of model performance.
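The metrics above follow directly from the four confusion-matrix counts. The sketch below treats malicious transactions as the positive class; the function name and the counts are hypothetical and are not the results of this project.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=30, tn=900, fp=10, fn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts the accuracy is high while the recall is low, which shows why accuracy should not be read on its own when malicious transactions are rare.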
3. Confusion Matrix
A confusion matrix provides a detailed breakdown of correct and incorrect predictions,
helping to visualize model errors. It highlights the trade-off between false positives and false
negatives, which is critical in blockchain-based security applications. A heatmap of the
confusion matrix is generated to better interpret classification results.
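A confusion matrix can be tallied from the true and predicted labels as in the sketch below. It assumes binary labels with 0 for legitimate and 1 for malicious, and places actual classes in rows and predicted classes in columns; the label lists are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are actual classes, columns are predicted classes."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(actual, predicted)] for predicted in labels] for actual in labels]


# Hypothetical labels: 0 = legitimate, 1 = malicious.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

matrix = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = matrix  # unpack in row order: actual 0, then actual 1
print(matrix)
```

The off-diagonal cells are the two error types discussed above: fp counts legitimate transactions that were flagged, and fn counts malicious transactions that were missed.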
6. Performance Visualization
• A Receiver Operating Characteristic (ROC) curve is plotted to illustrate the model’s
classification capability across different thresholds.
• Precision-Recall curves help evaluate model trade-offs in identifying malicious
activities.
• Heatmaps and bar charts visualize feature importance and model decision-making
patterns.
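To show what the ROC curve measures, the following sketch computes its points and the area under it from predicted scores. It assumes binary labels (1 for malicious) with at least one sample of each class and higher scores meaning "more likely malicious"; the function names, labels, and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit samples from highest to lowest score; each distinct score is a threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (handles ties).
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
```

An area of 1.0 would mean every malicious transaction is scored above every legitimate one, while 0.5 corresponds to random ranking.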
patterns indicative of fraudulent activity.
The key steps involved in network analysis include:
• Building a Transaction Network Graph: A directed graph is constructed using
NetworkX, where nodes represent users (senders and receivers), and edges represent
file-sharing transactions.
• Identifying Key Nodes: Highly active nodes are identified based on in-degree (number
of received transactions) and out-degree (number of sent transactions).
• Detecting Anomalous Patterns: Abnormal transaction patterns, such as repeated
interactions between specific users or high transaction frequencies, are flagged for
further investigation.
• Visualization of Network Flow: The transaction network is visualized to highlight
central nodes, clustering tendencies, and unusual transaction behaviours.
This analysis helps uncover hidden patterns in blockchain file-sharing, improving security and
transaction efficiency.
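The degree-based steps listed above can be sketched without any graph library. This is a dependency-free stand-in for the NetworkX graph the report describes, not the project's code: it counts every transaction, so its numbers correspond to degrees in a directed multigraph. The user identifiers and the threshold of 3 are hypothetical.

```python
from collections import Counter


def degree_counts(transactions):
    """Count sent (out-degree) and received (in-degree) transactions per user."""
    out_degree = Counter(sender for sender, _ in transactions)
    in_degree = Counter(receiver for _, receiver in transactions)
    return in_degree, out_degree


def repeated_pairs(transactions, threshold=3):
    """Flag sender-receiver pairs that interact at least `threshold` times."""
    pair_counts = Counter(transactions)
    return {pair: n for pair, n in pair_counts.items() if n >= threshold}


# Hypothetical (sender, receiver) edges, one per file-sharing transaction.
transactions = [
    ("u1", "u2"),
    ("u1", "u2"),
    ("u1", "u2"),
    ("u2", "u3"),
    ("u4", "u2"),
]

in_degree, out_degree = degree_counts(transactions)
flagged = repeated_pairs(transactions, threshold=3)
most_active_receiver = in_degree.most_common(1)[0][0]
```

A flagged pair is only a candidate for review; repeated transfers between two users can also be legitimate, so the threshold would need tuning against labelled data.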
5.6 SELECTION OF BEST MODEL AND RESULT INTERPRETATION
Model selection is the process of choosing the most suitable predictive model based on
performance metrics and real-world applicability. Factors such as accuracy, interpretability,
and computational efficiency are considered when selecting the final model.
The key steps involved in model selection include:
• Comparing Different Models: The performance of multiple classifiers (e.g., Random
Forest, Decision Tree, Logistic Regression) is compared to identify the most effective approach.
• Hyperparameter Optimization: The best-performing model is fine-tuned using
hyperparameter adjustments to maximize accuracy and minimize errors.
• Result Interpretation: The selected model’s predictions are analyzed to determine key
insights into blockchain file-sharing transactions and security risks.
• Implementation and Deployment: Once validated, the final model is integrated into the
blockchain analysis system to provide real-time monitoring and threat detection.
By selecting the most efficient model, the project ensures that blockchain-based file-sharing
networks are secure, reliable, and optimized for performance.
6. TESTING AND IMPLEMENTATION
6.1 SYSTEM TESTING
System testing is a crucial phase in ensuring the reliability, security, and efficiency of the
blockchain-based file-sharing system. This project follows a comprehensive testing strategy
that includes functional, performance, security, and usability testing.
Functional testing is conducted to verify that all core features, including file encryption,
transaction validation, and user authentication, work as expected. Each function is tested
individually and then integrated into the overall system to ensure seamless operation. The
blockchain ledger is tested to confirm that it accurately records file transactions, preventing
any unauthorized modifications.
Performance testing evaluates the system's ability to handle multiple transactions
simultaneously. Given the decentralized nature of blockchain, it is essential to ensure that
network congestion does not lead to significant delays. Various test cases simulate high-traffic
conditions to measure transaction speeds and overall system responsiveness.
Security testing is performed to identify potential vulnerabilities in the system. Since
blockchain-based file sharing involves sensitive data, multiple security tests are conducted to
ensure encryption is applied correctly, unauthorized access is prevented, and malicious
activities are detected in real-time. Machine learning models integrated into the system are
tested to confirm their accuracy in identifying suspicious transactions.
Usability testing assesses the user experience, ensuring that the system is intuitive and easy to
navigate. Testers interact with the platform to evaluate its interface, accessibility, and overall
efficiency in performing file-sharing operations. Feedback from users is analyzed to make
necessary improvements before deployment.
6.2 IMPLEMENTATION
The implementation of the blockchain-based file-sharing system follows a structured approach
to ensure security, scalability, and efficiency. The system is developed using a Python-based
backend, incorporating Flask or Django for handling API requests. For storing metadata,
databases such as MongoDB or PostgreSQL are utilized, while the blockchain network itself
manages file integrity and transactions.
The integration of blockchain technology into the file-sharing environment ensures data
security through cryptographic hashing and decentralized ledger management. Each file
transaction is recorded immutably, preventing tampering or unauthorized alterations. Smart
contracts are deployed to automate transaction validation, access control, and permissions
management, reducing the need for manual intervention.
To enhance security, a machine learning model is integrated into the system to detect malicious
transactions. The model is trained using real-world blockchain transaction data and employs a
Random Forest Classifier to classify activities as legitimate or suspicious. The implementation
of this AI-based security layer strengthens the system’s ability to identify and prevent
fraudulent activities.
7. RESULTS
After testing various machine learning models such as Random Forest Classifier, Decision
Tree, and Logistic Regression, the performance of the trained model was evaluated based on
accuracy, precision, recall, and F1-score. The dataset used in this study consists of 2000
transactions with 19 attributes, capturing essential details such as transaction fees, encryption
levels, file sizes, and sender-receiver relationships.
The Random Forest Classifier was chosen as the final model due to its high accuracy and
robustness in detecting malicious transactions. The dataset was split into training and testing
sets, with 80% used for training and 20% for testing. The model was successfully trained,
achieving an overall accuracy of 95.25% on the test set.
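The split-and-train procedure described above can be sketched with scikit-learn. The report does not name the library or the hyperparameters, so the use of scikit-learn, the 100 trees, and the fixed seeds are assumptions, and the features and labels below are synthetic stand-ins rather than the project dataset. The scores this sketch produces are therefore unrelated to the 95.25% reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 2000 rows, 5 numeric features, binary label.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))
noise = rng.normal(scale=0.5, size=n)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 1.5).astype(int)  # 1 = "malicious"

# 80% training, 20% testing; stratify keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)      # share of positives that were detected
importances = model.feature_importances_   # one weight per input feature

print(f"test rows: {len(y_test)}, accuracy: {accuracy:.4f}, recall: {recall:.4f}")
```

With 2000 rows, an exact 80/20 split leaves 400 test transactions, so an accuracy of 95.25% would correspond to 381 correct predictions. Reporting recall alongside accuracy, as done here, makes it visible when the minority class is being missed.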
Clustering Results:
K-Means clustering was used to analyze blockchain transactions based on key attributes. Three
clusters were identified, with the following average statistics:
• Cluster 0: Average file size 8.66 MB, average transaction fee 0.0577
• Cluster 1: Average file size 9.34 MB, average transaction fee 0.0529
• Cluster 2: Average file size 7.58 MB, average transaction fee 0.059
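For readers unfamiliar with how such cluster averages arise, the sketch below implements plain K-Means (Lloyd's algorithm) in standard-library Python. It is not the project's code: the data points, the choice of initial centroids, and the function name are hypothetical. The final centroids are the per-cluster means, which is the kind of statistic listed above.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return labels, centroids


# Hypothetical (file size in MB, transaction fee) pairs.
points = [(1.0, 0.02), (1.2, 0.03), (9.0, 0.05), (9.4, 0.06), (20.0, 0.09), (21.0, 0.10)]
initial = [points[0], points[2], points[4]]

labels, centroids = kmeans(points, initial)
for k, (avg_size, avg_fee) in enumerate(centroids):
    print(f"Cluster {k}: average file size {avg_size:.2f} MB, average fee {avg_fee:.4f}")
```

Because file size spans a much larger numeric range than the fee, it dominates the Euclidean distance here; scaling the features first would let both attributes influence the clusters.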
7.2 VISUAL REPRESENTATION OF ANALYTICAL RESULTS
The following graphs visually represent key findings in the blockchain-based file-sharing
analysis. They illustrate trends in transaction attributes, network activity, classification
performance, and clustering results, providing deeper insights into system efficiency and
security.
Fig 7: Represents the relationship between file size and transaction fees.
Fig 8: Shows the distribution of file types in a dataset.
Fig 9: Displays a confusion matrix for evaluating classification performance.
Fig 10: Displays feature importance in a blockchain ML model.
Fig 11: Displays K-means clustering results on blockchain file-sharing data.
Fig 12: Compares file size distribution between malicious and non-malicious files.
Fig 13: Depicts transfer efficiency based on permission type (private, shared, public).
Fig 14: Illustrates daily, weekly, and monthly blockchain transaction values over time.
8. CONCLUSION AND FUTURE ENHANCEMENTS
8.1 CONCLUSION
This project enhances security, transparency, and efficiency in decentralized file-sharing
networks. By leveraging machine learning techniques, the system effectively analyzed
blockchain transaction data to detect malicious activities, classify transactions, and optimize
network efficiency. The Random Forest Classifier, used for classification, achieved a high
accuracy of 95.25%, indicating strong predictive performance in identifying suspicious
transactions. Additionally, clustering techniques provided valuable insights into transaction
patterns, while network analysis helped visualize sender-receiver relationships, improving our
understanding of decentralized file-sharing dynamics.
The results validate the effectiveness of blockchain in securing file-sharing transactions,
ensuring data integrity, and reducing the risk of unauthorized access. The study highlights the
importance of integrating blockchain analytics with machine learning to enhance security
measures, optimize transaction efficiency, and improve trust in decentralized environments.
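However, the low recall for malicious transaction detection indicates that overall accuracy
alone does not capture detection quality: a classifier can label most transactions correctly
while still missing many malicious ones. Further improvements in data preprocessing and
feature engineering are needed to refine the model’s ability to detect fraudulent activities.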
Overall, this research contributes to advancing decentralized file-sharing systems by providing
a data-driven approach to blockchain security analysis. It lays the groundwork for future
improvements and practical implementations in real-world blockchain-based file-sharing
applications.
contracts that validate and verify file-sharing transactions can ensure greater
transparency and trustworthiness in decentralized networks.
3. Scalability Improvements: Optimizing blockchain architecture and consensus
mechanisms to handle larger transaction volumes efficiently can enhance network
performance and reduce latency.
4. Enhanced Feature Engineering: Incorporating additional transaction parameters, such
as user reputation scores and past transaction behavior, can improve the accuracy of
fraud detection models.
5. Multi-Blockchain Compatibility: Expanding the analysis to multiple blockchain
platforms, such as Ethereum and Hyperledger, can provide broader insights into
decentralized file-sharing across different blockchain ecosystems.
6. Privacy-Preserving Techniques: Implementing zero-knowledge proofs or
homomorphic encryption can enhance privacy in blockchain-based file-sharing systems
without compromising security.
By incorporating these enhancements, the system can evolve into a more robust, intelligent,
and scalable solution for blockchain-based file-sharing, ensuring greater security, efficiency,
and user trust in decentralized environments.