Data Life Cycle

The data life cycle encompasses the stages from data creation and acquisition to disposal, including data use, modification, archiving, and repurposing. Key considerations during these stages include data quality, privacy, and the efficient management of data storage and retrieval. Ultimately, data are either disposed of intentionally or lost by accident, although legal requirements can force data to be retained even when they hold no current value.

Uploaded by

Ilakiya T


The overall process from data creation to disposal is normally referred to as the data life cycle. The stages in the data life cycle are data creation and acquisition, data use, data modification, data archiving, data repurposing, and data disposal.

Data Creation and Acquisition

The process of data creation and acquisition is a function of the source and type of data. In the pharmacogenomic lab, data are generated by sequencing machines and microarrays in the molecular biology laboratory, and by clinicians and clinical studies in the clinic or hospital. The major issues in the data-creation phase of the data life cycle include tool selection, data format, standards, version control, error rate, precision, and accuracy. In particular, metrics such as error rate, precision, and accuracy are more easily ascribed to machine-generated data, whether from clinical laboratory studies or microarray analysis.

Figure: Data life cycle of a pharmacogenomic laboratory. Key steps in the process include data creation and acquisition, use, modification, repurposing, and the end game: archiving and disposal.

Depending on the difficulty in creating the data and the intended use, the creation process may be trivial and inexpensive or extremely complicated and costly. For example, recruiting test subjects to donate tissue biopsies is generally more expensive and difficult than identifying patients who are willing to provide less invasive (and less painful) tissue samples. Clinical data are entered through the use of manual transcription, voice recognition data-input systems, or desktop or handheld computers. There is significant variation in the subjective interpretation of clinical studies. For example, five seasoned radiologists will typically provide five different interpretations of the same chest film or other radiographic study.
In addition to the quality of the initial clinical observation, there are errors introduced by the hardware, software, and processes involved in capturing data, from keyboard and mouse to optical character recognition and voice recognition.

Data Use

Once clinical and genomic data are captured, they can be put to a variety of immediate uses, from simulation, statistical analysis, and visualization to communications. Issues at this stage of the data life cycle include intellectual property rights, privacy, and distribution. For example, unless patients have expressly given permission to have their names used, microarray data should be identified by ID number through a system that maintains the anonymity of the donor.

Data Modification

Data are rarely used in their raw form, without some amount of editing. The data dictionary is one means of modifying data in a controlled way that ensures standards are followed. A data dictionary can be used to tag all microarray data with time and date information in a standard format so that they can be automatically correlated with clinical findings.

Fig: Relationship between clinical data and microarray data.

Data Archiving

Archiving is concerned with making data available for future use (backup). An archive is a container for data that is infrequently accessed, with the focus more on data storage for longevity than on access speed. In the archiving process, which can range from making a backup of a local database on a CD-ROM or Zip® disk to creating a backup of an entire EMR system in a large hospital, data are named, indexed, and filed in a way that facilitates identification later. One of the primary determinants of archive capacity is the storage media: the physical material used to form a tape, disk, or cartridge.
In addition to capacity, media can be characterized in terms of compatibility, speed, data density, cost, volatility, and durability, among other characteristics.

Compatibility is the ability of media to function within a particular software and hardware environment.

Speed is a multifaceted performance characteristic that encompasses both the time to locate data (seek time) and the time to write it to or download it from the media (data transfer rate). Seek time may be several hundred milliseconds for a CD-ROM, a few milliseconds for a hard drive, and a few microseconds for a flash memory card.

Capacity, the maximum amount of data the media can store, is a function of the media construction. Capacity is also a function of data density.

Cost is a function of the raw materials involved in the creation of media.

Volatility, a characteristic normally ascribed to solid-state memory, refers to the status of the data when external power is removed. Flash memory, like magnetic disk or tape, is considered relatively non-volatile, and can hold data for years without loss.

Durability refers to the physical properties of the media that contribute to the longevity of the surface, mechanisms, and housing, if any, during normal use. For example, the bearings and other components in the rotational system of a hard drive undergo wear and tear over time.
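The speed characteristic described above combines two terms: the time to locate the data and the time to move it. A minimal sketch of that relationship follows. The seek times are chosen only to match the orders of magnitude given in the text, and the transfer rates are hypothetical placeholders rather than figures from the source or from any real device.

```python
def access_time_seconds(seek_time_s, size_bytes, transfer_rate_bytes_per_s):
    """Estimate the time to retrieve one contiguous file: seek once, then stream."""
    if transfer_rate_bytes_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    return seek_time_s + size_bytes / transfer_rate_bytes_per_s


# Hypothetical media profiles (assumed values, for illustration only).
MEDIA = {
    "cd_rom": {"seek_s": 0.3, "rate_bps": 5_000_000},          # hundreds of ms
    "hard_drive": {"seek_s": 0.005, "rate_bps": 100_000_000},  # a few ms
    "flash_card": {"seek_s": 0.000005, "rate_bps": 50_000_000},  # a few us
}


def compare_media(size_bytes):
    """Return the estimated access time per medium for a file of the given size."""
    return {
        name: access_time_seconds(profile["seek_s"], size_bytes, profile["rate_bps"])
        for name, profile in MEDIA.items()
    }


if __name__ == "__main__":
    for size in (1_000, 1_000_000):
        print(size, compare_media(size))
```

With these assumed numbers, seek time dominates for a small file, so the flash card is fastest, while for a larger file the transfer rate dominates and the hard drive comes out ahead. The point is only that "speed" cannot be reduced to a single figure.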
Archives vary considerably in configuration and in proximity to the source data. For example, servers typically employ several independent hard drives configured as a Redundant Array of Independent Disks (RAID system) that function in part as an integrated archival system. RAID systems derive their speed from reading and writing to multiple disks in parallel. There are seven levels of RAID; level 3 is most applicable to bioinformatics computing. In RAID-3, a disk is dedicated to storing a parity bit, an extra bit used to determine the accuracy of data transfer, for error detection and correction. If analysis of the parity bit indicates an error, the faulty disk can be identified and replaced. The data can be reconstructed by using the remaining disks and the parity disk.

For example, in the figure, disks A-D are dedicated to data and disk P is used to store the parity bit. In this case, an odd number of "1" bits corresponds to a high (1) parity bit. When data are written in parallel to the data disks, the corresponding parity bit is stored on the parity disk. Immediately after the data are written to the data disks, the data are read and the parity bits are compared. The change noted in the figure is typical of a case when there is an error on one disk. The error on disk "C" can be repaired, or, if groups of errors are suddenly becoming apparent, indicating imminent disk failure, the entire disk can be replaced.

Fig: RAID-3. Data disks are read and written to in parallel, providing speed, while a dedicated parity disk provides increased reliability through error detection and correction. In the example, an error in disk C is detected by a different parity bit (P), indicating that the data read from disks A-D don't agree with what was written to the disks. Although the parity bit is usually based on a comparison of bytes on the data disks, bits (0 or 1) are used here for clarity.
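The parity scheme in the example can be sketched in a few lines. This is an illustration under stated assumptions, not a RAID implementation: each of the four data disks holds a single bit, the function names are hypothetical, and the reconstruction step assumes the index of the failed disk is already known.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """Return 1 when the number of 1 bits is odd, else 0 (XOR of all bits)."""
    return reduce(xor, bits, 0)


def write_stripe(data_bits):
    """Simulate a parallel write: one bit per data disk plus the parity disk."""
    return {"data": list(data_bits), "parity": parity_bit(data_bits)}


def verify_stripe(stripe):
    """Re-read the data disks and compare recomputed parity with stored parity."""
    return parity_bit(stripe["data"]) == stripe["parity"]


def reconstruct(stripe, failed_index):
    """Rebuild the bit of a known-failed disk from the other disks and parity."""
    survivors = [b for i, b in enumerate(stripe["data"]) if i != failed_index]
    return parity_bit(survivors) ^ stripe["parity"]


if __name__ == "__main__":
    stripe = write_stripe([1, 0, 1, 1])   # disks A, B, C, D
    print(verify_stripe(stripe))          # parity agrees right after the write
    stripe["data"][2] = 0                 # simulate a flipped bit on disk C
    print(verify_stripe(stripe))          # the parity comparison now fails
    print(reconstruct(stripe, 2))         # recover the original bit for disk C
```

XOR is used because it yields 1 exactly when an odd number of inputs are 1, which matches the rule stated above. The same property lets any single missing bit be recomputed from the remaining bits and the parity bit.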
Data Repurposing

One of the major benefits of having data readily available in an archive is the ability to repurpose it for a variety of uses. For example, linear sequence data originally captured to discover new genes are commonly repurposed to support the 3D visualization of protein structures. The major issue in repurposing data is the ability to efficiently locate data in archives. The difficulty in locating data once it has been incorporated into a storage system depends on the volume of data involved.

Data Disposal

All data die, either because they are intentionally disposed of when their value has decreased to the point that it is less than the cost of maintaining them, or because of accidental loss. Often, data have to be archived for legal reasons, even though the data are of no value to the institution or researcher. For example, most official hospital or clinic patient records must be maintained for the life of the patient.
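The disposal rule above can be written as a small decision function. This is a sketch under assumptions: the `DataSet` fields, the `disposition` function, and the idea that value and maintenance cost can be expressed as comparable numbers are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    estimated_value: float    # assumed to be in the same units as the cost
    maintenance_cost: float
    legal_hold: bool = False  # True when retention is legally required


def disposition(ds):
    """Return 'retain' or 'dispose' for a data set."""
    # A legal requirement overrides the value comparison.
    if ds.legal_hold:
        return "retain"
    # Dispose once the value has fallen below the cost of maintaining the data.
    if ds.estimated_value < ds.maintenance_cost:
        return "dispose"
    return "retain"


if __name__ == "__main__":
    print(disposition(DataSet("old_scan", 10.0, 50.0)))
    print(disposition(DataSet("patient_record", 0.0, 50.0, legal_hold=True)))
```

Checking the legal hold first reflects the point that records such as patient files are kept regardless of their value to the institution.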
