Data Life cycle

The data life cycle encompasses stages from data creation and acquisition to disposal, including data use, modification, archiving, and repurposing. Key considerations during these stages involve data quality, privacy, and the efficient management of data storage and retrieval. Ultimately, data may be disposed of intentionally or due to legal requirements, even if it holds no current value.

Uploaded by

Ilakiya T
Copyright
© All Rights Reserved
Data Life Cycle

The overall process from data creation to disposal is normally referred to as the data life cycle. The stages in the data life cycle are data creation and acquisition, data use, data modification, data archiving, data repurposing, and data disposal.

Data Creation and Acquisition

The process of data creation and acquisition is a function of the source and type of data. In the pharmacogenomic lab, data are generated by sequencing machines and microarrays in the molecular biology laboratory, and by clinicians and clinical studies in the clinic or hospital. The major issues in the data-creation phase of the data life cycle include tool selection, data format, standards, version control, error rate, precision, and accuracy. In particular, metrics such as error rate, precision, and accuracy are more easily ascribed to machine-generated data, whether from clinical laboratory studies or microarray analysis.

Figure: Data life cycle of a pharmacogenomic laboratory. Key steps in the process include data creation and acquisition, use, modification, repurposing, and the end game: archiving and disposal.

Depending on the difficulty in creating the data and the intended use, the creation process may be trivial and inexpensive or extremely complicated and costly. For example, recruiting test subjects to donate tissue biopsies is generally more expensive and difficult than identifying patients who are willing to provide less invasive (and less painful) tissue samples.

Data generated in the clinical setting are entered through manual transcription, voice-recognition data-input systems, or desktop or handheld computers. There is significant variation in the subjective interpretation of clinical studies. For example, five seasoned radiologists will typically provide five different interpretations of the same chest film or other radiographic study.
In addition to the quality of the initial clinical observation, there are errors introduced by the hardware, software, and processes involved in capturing data, from keyboard and mouse to optical character recognition and voice recognition.

Data Use

Once clinical and genomic data are captured, they can be put to a variety of immediate uses, from simulation, statistical analysis, and visualization to communications. Issues at this stage of the data life cycle include intellectual property rights, privacy, and distribution. For example, unless patients have expressly given permission to have their names used, microarray data should be identified by ID number through a system that maintains the anonymity of the donor.

Data Modification

Data are rarely used in their raw form, without some amount of editing. The data dictionary is one means of modifying data in a controlled way that ensures standards are followed. A data dictionary can be used to tag all microarray data with time and date information in a standard format so that they can be automatically correlated with clinical findings.

Figure: Relationship between clinical data and microarray data.

Data Archiving

Archiving is concerned with making data available for future use (backup). An archive is a container for data that is infrequently accessed, with the focus more on long-term storage than on access speed. In the archiving process, which can range from making a backup of a local database on a CD-ROM or Zip disk to creating a backup of an entire EMR system in a large hospital, data are named, indexed, and filed in a way that facilitates identification later. One of the primary determinants of archive capacity is the storage media: the physical material used to form a tape, disk, or cartridge.
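The naming and indexing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the naming scheme, the field names, and the sample values ("lab42", the date, the payload) are hypothetical and do not come from the source.

```python
import hashlib
from datetime import date


def archive_name(source: str, kind: str, created: date, version: int) -> str:
    """Build a predictable name from the fields a later search would use."""
    return f"{source}_{kind}_{created.isoformat()}_v{version:02d}"


def index_entry(name: str, payload: bytes, media: str) -> dict:
    """Record where the data went, plus a checksum to verify it on retrieval."""
    return {
        "name": name,
        "media": media,
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def find(index: list, text: str) -> list:
    """Return the names of all index entries whose name contains `text`."""
    return [entry["name"] for entry in index if text in entry["name"]]


# Hypothetical sample data, for illustration only.
payload = b"probe,intensity\nP001,0.82\n"
name = archive_name("lab42", "microarray", date(2003, 5, 14), 1)
index = [index_entry(name, payload, media="CD-ROM")]

print(name)
print(find(index, "microarray"))
```

Keeping the searchable fields in the name and a checksum in the index is one simple way to make archived data identifiable and verifiable later; a real archive would more likely keep this index in a database.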
In addition to capacity, media can be characterized in terms of compatibility, speed, data density, cost, volatility, and durability.

Compatibility is the ability of media to function within a particular software and hardware environment.

Speed is a multifaceted performance characteristic that encompasses both the time to locate data (seek time) and the time to write it to or download it from the media (data transfer rate). Seek time may be several hundred milliseconds for a CD-ROM, a few milliseconds for a hard drive, and a few microseconds for a flash memory card.

Capacity, the maximum amount of data the media can store, is a function of the media construction. Capacity is also a function of data density.

Cost is a function of the raw materials involved in the creation of media.

Volatility, a characteristic normally ascribed to solid-state memory, refers to the status of the data when external power is removed. Flash memory, like magnetic disk or tape, is considered relatively non-volatile and can hold data for years without loss.

Durability refers to the physical properties of the media that contribute to the longevity of the surface, mechanisms, and housing, if any, during normal use. For example, the bearings and other components in the rotational system of a hard drive undergo wear and tear over time.
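To see how the two components of speed combine, the time to fetch a block can be approximated as seek time plus size divided by transfer rate. The sketch below uses seek times of the order given above; the shared transfer rate is a hypothetical value chosen only so that seek time is the one variable, and it does not describe any real device.

```python
def access_time(seek_s: float, size_bytes: int, rate_bytes_per_s: float) -> float:
    """Approximate time to fetch one block: locate it, then transfer it."""
    return seek_s + size_bytes / rate_bytes_per_s


# Seek times follow the orders of magnitude given in the text.
SEEK_S = {"CD-ROM": 0.3, "hard drive": 0.005, "flash card": 0.000005}

# Hypothetical: one shared transfer rate, so that only seek time differs.
ASSUMED_RATE = 10_000_000  # bytes per second, illustrative only

for media, seek in SEEK_S.items():
    small = access_time(seek, 4_096, ASSUMED_RATE)
    large = access_time(seek, 100_000_000, ASSUMED_RATE)
    print(f"{media:10s}  4 KB: {small:.6f} s   100 MB: {large:.3f} s")
```

Under these assumptions, seek time dominates small reads, while for large transfers the data transfer rate matters far more.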
Archives vary considerably in configuration and in proximity to the source data. For example, servers typically employ several independent hard drives configured as a Redundant Array of Independent Disks (RAID system) that function in part as an integrated archival system. RAID systems derive their speed from reading and writing to multiple disks in parallel. There are seven levels of RAID; level 3 is most applicable to bioinformatics computing.

In RAID-3, a disk is dedicated to storing a parity bit, an extra bit used to determine the accuracy of data transfer, for error detection and correction. If analysis of the parity bit indicates an error, the faulty disk can be identified and replaced. The data can be reconstructed by using the remaining disks and the parity disk.

For example, in the figure, disks A-D are dedicated to data and disk P is used to store the parity bit. In this case, an odd number of "1" bits corresponds to a high (1) parity bit. When data are written in parallel to the data disks, the corresponding parity bit is stored on the parity disk. Immediately after the data are written to the data disks, the data are read and the parity bits are compared. The change noted in the figure is typical of a case when there is an error on one disk. The error on disk "C" can be repaired, or, if groups of errors are suddenly becoming apparent, indicating imminent disk failure, the entire disk can be replaced.

Figure: RAID-3. Data disks are read and written to in parallel, providing speed, while a dedicated parity disk provides increased reliability through error detection and correction. In the example, an error in disk C is detected by a different parity bit (P), indicating that the data read from disks A-D don't agree with what was written to the disks. Although the parity bit is usually based on a comparison of bytes on the data disks, bits (0 or 1) are used here for clarity.
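The parity scheme described above can be illustrated with a short Python sketch that works on single bits, as the figure does. The bit values and function names are hypothetical. The sketch also assumes that the failed disk is already known (for example, because the drive itself reports a fault), since one parity bit on its own only shows that the stripe is inconsistent.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """Parity bit as described in the text: 1 when the number of 1 bits is odd."""
    return reduce(xor, bits, 0)


def reconstruct(bits, parity_bit, lost):
    """Rebuild the bit at index `lost` from the other data bits and the parity bit."""
    others = [b for i, b in enumerate(bits) if i != lost]
    return reduce(xor, others, parity_bit)


# Hypothetical stripe: one bit on each of data disks A, B, C, D.
written = [1, 0, 1, 1]
p = parity(written)              # stored on the parity disk P

# Read back after the write; the bit from disk C (index 2) has flipped.
read_back = [1, 0, 0, 1]
error_detected = parity(read_back) != p

# Assuming disk C is known to be the faulty one, recover its bit.
recovered = reconstruct(read_back, p, lost=2)

print(p, error_detected, recovered)
```

Real RAID-3 controllers apply the same XOR across bytes rather than single bits, which is why any one failed disk can be rebuilt from the remaining disks and the parity disk.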
Data Repurposing

One of the major benefits of having data readily available in an archive is the ability to repurpose it for a variety of uses. For example, linear sequence data originally captured to discover new genes are commonly repurposed to support the 3D visualization of protein structures. The major issue in repurposing data is the ability to efficiently locate data in archives. The difficulty in locating data once it has been incorporated into a storage system depends on the volume of data involved.

Data Disposal

All data die, either because they are intentionally disposed of when their value has decreased to the point that it is less than the cost of maintaining them, or because of accidental loss. Often, data have to be archived for legal reasons, even though the data are of no value to the institution or researcher. For example, most official hospital or clinic patient records must be maintained for the life of the patient.
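The disposal rule stated above can be written as a small decision function. This is a minimal sketch: the parameter names and the example figures are hypothetical, and real retention rules depend on the applicable regulations.

```python
def should_dispose(value: float, maintenance_cost: float, legal_hold: bool) -> bool:
    """Dispose only when value has fallen below cost and no legal duty applies."""
    if legal_hold:
        # A legal requirement overrides the value-versus-cost comparison.
        return False
    return value < maintenance_cost


# Hypothetical figures, for illustration only.
print(should_dispose(value=50.0, maintenance_cost=200.0, legal_hold=False))
print(should_dispose(value=0.0, maintenance_cost=200.0, legal_hold=True))
```

The second call corresponds to the patient-record case: the data may have no remaining value, yet they must still be kept.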
