Development of Privacy-Preservation of Big Data With Support of Hyperledger Fabric and IPFS
Volume 10, No.5, September - October 2021
International Journal of Advanced Trends in Computer Science and Engineering
Alpesh Vaghela et al., International Journal of Advanced Trends in Computer Science and Engineering, 10(5), September - October 2021, 2930 – 2935
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse011052021.pdf
https://doi.org/10.30534/ijatcse/2021/011052021
Received Date : August 05, 2021 Accepted Date : September 14, 2021 Published Date : October 06, 2021

ABSTRACT

1. INTRODUCTION

number of information patterns [1]. Figure 1 shows a different method for each phase of the big data life cycle.
Data security entails gathering and protecting data from unauthorized users, as well as destroying data that is no longer needed. Data privacy is characterized as the right use of data. Businesses use customers' personal data on the condition that no personal information is disclosed and that no data is used that is not useful to customers [4]. Data abuse, such as trading in private data, is not properly regulated because of a lack of legal limits and immature auditing tools, which makes maintaining privacy even harder. To ensure data privacy, the keyword search feature needs to be performed over encrypted centralized data storage without disclosing any information about the search query or the retrieved document. This is called a privacy-preserving keyword search. Figure 2 shows the list of users whose privacy is protected from the centralized data mining server. The requirements for preserving users' privacy from the centralized big data mining server are listed below.

role in the privacy-preservation of big data in the healthcare industry. Figure 4 depicts the roles of several actors in data mining in the hospital setting.

Data Provider: In the privacy-preservation of big data in the hospital area, all users of the hospital, including doctors, patients, nurses, pharmacies, and lab reports, who supply some input in the form of data are known as data providers.

Data Collector: A data collector is a hospital that provides a server and storage area for storing the data of the data providers. Because all data providers must rely on the data collector for their data privacy and security, the data collector is the most important actor for privacy protection.

Data Miner: A data miner is a technical expert who is familiar with a variety of algorithms for discovering meaningful information.

Decision Maker: For economic growth, the decision maker is responsible for publishing the optimal report from data analysis and data mining.
MeDShare is a verifiable, blockchain-based method for auditing and controlling shared medical data in cloud repositories for big data databases. By using MeDShare, cloud service providers and other data guardians can achieve data provenance and auditing while sharing medical data with entities such as research and medical institutions, with minimum risk to data privacy. Users access data with their pre-generated private key: if the key is legitimate, the user can access the data; otherwise, the transaction is declared invalid [7]. The framework suggested by the authors of [8] provides uniform access standards for these records and safe record storage based on blockchain. Because a blockchain does not have enough storage capacity for huge data, the study uses an IPFS system for off-chain storage. Role-based access also benefits the system, as the medical records are available only to trusted and related individuals.

Blockchain technology creates a distributed ecosystem with decentralised and tamper-proof records, as well as a new way to secure and share electronic health record systems. The authors suggest a new blockchain-based electronic healthcare data sharing system that verifies data integrity. They created a smart contract that allows users to access, alter, and remove healthcare records stored in the cloud. This solution employs a private key that is kept in the blockchain, and an authorized individual can use this key to access any cloud-based record through the smart contract [9].

relevant knowledge from it during the data mining process. At this time, all PPDM approaches are based on centralized and distributed database systems. Data providers must therefore put their trust in a third party or in an authorized person of the centralized system. We present a decentralised blockchain-based method for big data privacy preservation, in which data is securely transmitted from the data collector to the data miner. Because a blockchain's data storage capacity is restricted, we use HDFS for data mining. Private data is saved on-chain, while non-sensitive information or data required for data mining is stored off-chain. ChainPPDM is a permission-based decentralised system with excellent data mining feasibility and security.

3. BLOCKCHAIN STRUCTURE

A blockchain is made up of a database of transactions linked by cryptographic hashes. To store enormous amounts of medical data, however, a big data storage system is required alongside the blockchain [12]. The blockchain's fundamental design is depicted in Figure 5: each individual block consists of a block header and a block body and is linked to the next block to continue the chain. Each hospital has its own servers, and each doctor associated with that hospital has his or her own blockchain network, which is connected to the others [13]. The entire network is linked to the system manager, who is in charge of system management.
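To make the header/body linkage of the blockchain structure concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the structure used by Hyperledger Fabric or by the authors: the header here holds only an index and the previous block's hash, and the names `Block`, `append_block`, and `chain_is_valid` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

# Placeholder "previous hash" for the first block (an assumption of this sketch).
GENESIS_PREV_HASH = "0" * 64


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Block:
    index: int                # header field: position in the chain
    prev_hash: str            # header field: hash of the previous block
    transactions: List[dict]  # block body

    def body_digest(self) -> str:
        # Digest of the body, so the block hash depends on every transaction.
        return sha256_hex(json.dumps(self.transactions, sort_keys=True))

    def block_hash(self) -> str:
        # Hash over the header fields plus the body digest.
        return sha256_hex(f"{self.index}|{self.prev_hash}|{self.body_digest()}")


def append_block(chain: List[Block], transactions: List[dict]) -> Block:
    """Create a block that points at the current tip and append it."""
    prev_hash = chain[-1].block_hash() if chain else GENESIS_PREV_HASH
    block = Block(len(chain), prev_hash, transactions)
    chain.append(block)
    return block


def chain_is_valid(chain: List[Block]) -> bool:
    """Check that every block stores the hash of its predecessor."""
    for i, block in enumerate(chain):
        expected = chain[i - 1].block_hash() if i > 0 else GENESIS_PREV_HASH
        if block.prev_hash != expected:
            return False
    return True
```

Because each block stores the hash of its predecessor, changing a transaction in an earlier block changes that block's hash and breaks the link held by the following block. This is the tamper-evidence property the text refers to. The check above does not cover the last block, since no later block points at it yet.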
ChainPPDM presents new on-chain and off-chain solutions for protecting big data privacy. We believe, however, that a big data analysis method is faster than a database system. Big data should be kept in mind while mining data or looking for patterns. The most serious issue is data privacy. For data analysis, ChainPPDM employs the HDFS big data system, together with a blockchain system for user privacy. Healthcare has information on customers who placed orders online. The hospital's patient database contains both confidential and non-sensitive information, such as the patient's address, neighborhood, and city. ChainPPDM keeps private data in a Hyperledger Fabric private collection, and non-sensitive data is kept in HDFS. To send data from the data collector database to the data miner warehouse, ChainPPDM employs IPFS.

4.1 Hash Value Generated and Data Transfer to Blockchain

Figure 6 shows the process by which the data sets are generated and stored in the blockchain and the centralized database; the non-sensitive data set is inserted into the HDFS system of the data miner.

Figure 6: The process of data set generation and storage in blockchain and centralized database
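As a rough illustration of the process that Figure 6 summarises, the following Python sketch splits each record into a private part and a non-sensitive part. It then derives a salted SHA-256 hash from the private part and uses that hash as the shared key of two key-value data sets. The private field names, the `|` serialisation, the single salt held by the data collector, and the helper names (`salted_hash`, `split_record`, `build_datasets`) are assumptions made for this example and do not come from the paper. Only address, neighborhood, and city are named in the text as non-sensitive attributes.

```python
import hashlib

# Hypothetical schema: the paper does not list the exact attributes.
PRIVATE_FIELDS = ("name", "diagnosis")
NON_SENSITIVE_FIELDS = ("address", "neighborhood", "city")


def salted_hash(private_text: str, salt: bytes) -> str:
    """SHA-256 over salt + private data, returned as 64 hex characters."""
    return hashlib.sha256(salt + private_text.encode("utf-8")).hexdigest()


def split_record(record: dict, salt: bytes) -> tuple:
    """Split one record and compute the hash key shared by both parts."""
    private = {name: record[name] for name in PRIVATE_FIELDS}
    non_sensitive = {name: record[name] for name in NON_SENSITIVE_FIELDS}
    # Serialise the private values in a fixed order so the hash is repeatable.
    serialised = "|".join(str(private[name]) for name in PRIVATE_FIELDS)
    return salted_hash(serialised, salt), private, non_sensitive


def build_datasets(records: list, salt: bytes) -> tuple:
    """Return (hash -> private part, hash -> non-sensitive part)."""
    private_set = {}        # destined for the blockchain private collection
    non_sensitive_set = {}  # destined for the data miner's HDFS store
    for record in records:
        key, private, non_sensitive = split_record(record, salt)
        # Note: records with identical private values map to the same key.
        private_set[key] = private
        non_sensitive_set[key] = non_sensitive
    return private_set, non_sensitive_set
```

The salt matters because SHA-256 is deterministic. Without it, anyone who can guess a private value could hash the guess and compare it with the stored keys. Writing the first mapping to a Fabric private collection and the second to HDFS is left out here, since that wiring depends on the network configuration.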
From the entire data set, the data collector constructs two independent sets of data. The patient's private data is one set, while the non-sensitive data for data mining is the other.

[Equation (1)]

Equation (1) gives the private data attributes of the consumer.

[Equation (2)]

Equation (2) gives the non-sensitive data attributes of the consumer and the whole data set. The data collector creates the data set for data mining. Before transferring the data set to the data miner warehouse, the data collector decides which attributes are sensitive for data privacy. To compute the hash value, we use

[Equation (3)]

whose input is the private data of the patient. We add a salt before hashing because a hash function always produces the same output for the same input, so without a salt there is a good chance of guessing a private value from matching hashes. For data of any length, the SHA256 algorithm generates a 256-bit hash value [15]. A hash value is a compact numerical representation of a piece of data that is, in practice, unique. The generated hash is used for both the sensitive and the non-sensitive data. We now create two new data sets using the hash value. Each new data set consists of key-value pairs, in which the hash value is the key for accessing the private value as well as the non-sensitive value.

[Equation (4)]

[Equation (5)]

The hashed private data set is inserted into the blockchain private collection, and the hashed non-sensitive data set is inserted into the HDFS system of the data miner.

5. ARCHITECTURE AND ALGORITHMS

ChainPPDM uses IPFS to send data from the data collector to the data miner quickly and securely. ChainPPDM is implemented using chaincode, the MSP, the state database, IPFS, and HDFS. CouchDB is the state database that keeps track of transaction results. Only core data is stored in the blockchain network, while non-core data is stored in a central database. The MSP is a pluggable interface that consists of a set of encryption methods and protocols for issuing and verifying certificates and identities in the blockchain. The CA is used to generate certificates and secret keys, as well as to initialize the MSP, and the orderer node serves as a network proxy for data distribution. Figures 7 and 8 depict ChainPPDM's architecture and algorithms, respectively.

Figure 7: The architecture of ChainPPDM

6. EXPERIMENTAL RESULTS AND ANALYSIS

We created a prototype of ChainPPDM on a virtual machine with Hyperledger Fabric to check that our data separation and storage solution is genuinely effective for privacy preservation of big data when blockchain systems are used for tasks such as data mining. The system runs on an Ubuntu 20.04 (64-bit) virtual machine,
with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz processor and 8 GB of RAM, and uses HDFS as the simulated central database. We observe the changes in storage capacity after separating sensitive and non-sensitive data and adding the hash value to the non-sensitive data. Table 1 shows a comparison between ChainPPDM and other methods for privacy-preservation of big data. The Hyperledger Caliper tool was used to evaluate ChainPPDM.

Table 1: Comparison of ChainPPDM with other methods for privacy-preservation of big data

Feature                   BlockBDM [8]             PBMS [9]                 ChainPPDM
Public Blockchain         Ethereum                 NA                       NA
Permissioned Blockchain   Hyperledger Fabric 1.4   Hyperledger Fabric 1.4   Hyperledger Fabric 2.0
Data Type                 multimedia               text                     text
Database before system    MySQL                    MySQL                    MySQL
Number of Nodes           4                        4                        4
On-Chain Storage          Blockchain               Blockchain               Blockchain
Off-chain Storage         IPFS                     MySQL                    IPFS, HDFS

The average size of a single data provider record in a single file is 182 bytes. The data provider separates this data into two sections, with private data taking up 142 bytes and non-sensitive data taking up 40 bytes. Adding the hash value to the non-sensitive data increased the storage load of each record, both on the blockchain and in the centralized data miner storage.

Figure 9 depicts the read latency (in seconds) measured with the Hyperledger Caliper tool. BlockBDM and PBMS have lower read latency than ChainPPDM because ChainPPDM uses the private state of Hyperledger Fabric. In exchange for data secrecy, ChainPPDM increases the file size on disk.

Figure 10 depicts the write latency (in seconds) measured with the Hyperledger Caliper tool. Compared with BlockBDM and PBMS, ChainPPDM performs better because it writes private data to the private state of Hyperledger Fabric.

Figure 10: Write latency (seconds) plot using the Hyperledger Caliper tool

7. CONCLUSION

ChainPPDM addresses the challenge of big data privacy preservation. It presents a blockchain-based architecture for big data privacy preservation, as well as a prototype system that demonstrates feasibility and provides a solution for big data mining technologies. In addition, data was separated into two sections, which increased data privacy and eased the problem of heavier data computation on a blockchain compared with traditional data mining systems. In this article, we used only text big data to build the system; in future work we will evaluate it further and try to extend it to additional fields.

ACKNOWLEDGEMENT

The authors are thankful to the GMB staff members, Dr. Nalin Jani, KSV and Dr. Seema Mahajan, IU for their cooperation, support and guidance.