
UNIT 2: DATA RELATED ISSUES

The Concept of Data Gathering

 Data gathering is the systematic process of collecting information on relevant factors to answer specific research questions, test hypotheses, and assess results.
 It is practised across all academic disciplines, including the humanities, social sciences, business, and the physical and natural sciences.
 Although techniques differ depending on the discipline, the importance of ensuring accurate and truthful data collection is shared by all of them.

Importance of Ensuring Accurate Data Gathering

Consequences of improperly collected data include:
 Inability to accurately answer research questions
 Inability to replicate and validate the study
 Distorted findings resulting in wasted resources
 Misleading other researchers into pursuing fruitless avenues of investigation
 Compromised decisions for public policy
 Harm to both human and animal subjects
Maintaining the Integrity of Data Gathering

 The main justification for maintaining data integrity is to support the detection of errors in the data gathering process, whether they are deliberate (falsifications) or not (systematic or random errors).
 Two approaches can preserve data integrity and ensure the scientific validity of study results:
• Quality Assurance
• Quality Control
 Each strategy is used at a different stage of the research timeline: quality assurance measures are taken before data collection starts, while quality control consists of activities performed both during and after data collection.
Quality Assurance

 Since quality assurance precedes data collection, its primary goal is "prevention".
 The best example of this proactive step is the uniformity of protocol created in a thorough procedures manual for data collection.
 Poorly written manuals increase the risk of failing to identify problems and errors early in the research endeavor. These shortcomings show up in several ways:
• Lack of clarity regarding the timing, procedures, and identity of the person(s) in charge of data review;
• An incomplete listing of the items to be collected;
• Vague descriptions of the data collection tools to be used rather than detailed step-by-step instructions on administering tests;
• Failure to identify specific content and strategies for training or retraining staff members in charge of data collection; and
• Cryptic instructions for using, adjusting, and calibrating data collection equipment.
Quality Control

 Quality control actions (detection/monitoring and intervention) take place both during and after data collection.
 A clearly defined communication structure is a necessary precondition for establishing monitoring systems.
 Detection or monitoring can take the form of direct staff observation during site visits, conference calls, or regular and frequent reviews of data reports to identify inconsistencies, extreme values, or invalid codes.
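As an illustration of the kind of automated review described above, a short script can scan a batch of records for invalid codes and extreme values. This is only a sketch: the field names, code set, and plausible range below are invented for the example, not taken from any particular study protocol.

```python
# Illustrative sketch of a data-report review: flag invalid codes and
# extreme or missing values in a batch of records.
VALID_GENDER_CODES = {"M", "F"}   # assumed coding scheme for this example
AGE_RANGE = (0, 120)              # assumed plausible range for this example

def review_records(records):
    """Return a list of (record_index, problem) pairs found in the batch."""
    problems = []
    for i, rec in enumerate(records):
        if rec.get("gender") not in VALID_GENDER_CODES:
            problems.append((i, "invalid code: gender"))
        age = rec.get("age")
        if age is None or not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
            problems.append((i, "extreme or missing value: age"))
    return problems

flags = review_records([
    {"gender": "M", "age": 34},
    {"gender": "X", "age": 150},   # both checks fire on this record
])
print(flags)
```

In practice such a review would run on every data report received from a site, so that inconsistencies trigger an intervention quickly rather than surfacing at final analysis.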
Verification of Data Quality

 It is also possible to gauge the sincerity of responses using measures of "social desirability." In this context, two points need to be raised:
I. cross-checks should be performed during the data collection process, and
II. data quality is an issue both at the level of the individual observation and across the board. It is therefore important to consider data quality for each measurement, each observation, and the overall data set.

Challenges of Data Gathering

 Data-gathering concerns that necessitate a quick response include:
• errors in individual data items,
• systematic errors,
• protocol violations,
• issues with particular staff members or site performance, and
• fraud or scientific misconduct.
Data Dictionary

 A data dictionary is a file that helps define how a specific database is organised. Programmers or other users who need access to the data can use the data dictionary to understand the data objects or items in a model.
 A simple data dictionary is an orderly table containing the names and definitions of data elements. It may be used to describe a single database, a subset of an organization's data holdings, or the entirety of those holdings. Advanced data dictionaries may include entity-relationship diagrams and database structures with reference keys.
Components of a Data Dictionary

 Data Assets: These are the raw materials that make up the data, such as spreadsheets, databases, or text files.
 Entities: These are the main objects or concepts represented in the data, like customers, products, or orders.
 Attributes: These are the characteristics or properties of each entity, such as customer name, product price, or order date.
 Value Domains: These are the sets of possible values for each attribute, like gender (male, female), product category (electronics, clothing), or order status (pending, shipped, delivered).

Defining Variables in a Data Dictionary

 Name: Choose a clear, descriptive name that reflects the variable's content.
 Type: Specify the data type, such as text, number, date, or Boolean.
 Description: Provide a brief definition of the variable's meaning.
 Requirement ID: Indicate why the variable is included in the system.
 Security Category: Define who has access to the variable.
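A minimal sketch of such a dictionary, using the variable-definition fields listed above; the variable names, descriptions, requirement IDs, and security categories are invented for illustration.

```python
# Minimal data dictionary: one entry per variable, carrying the fields
# Name, Type, Description, Requirement ID, and Security Category.
# All concrete values here are hypothetical examples.
data_dictionary = {
    "participant_age": {
        "type": "integer",
        "description": "Age of the participant in completed years",
        "requirement_id": "REQ-007",      # why the variable is included
        "security_category": "internal",  # who has access to the variable
    },
    "admission_date": {
        "type": "date",
        "description": "Date the participant was admitted",
        "requirement_id": "REQ-012",
        "security_category": "restricted",
    },
}

def describe(variable):
    """Look up a variable's definition, as a user of the dictionary would."""
    entry = data_dictionary[variable]
    return f"{variable} ({entry['type']}): {entry['description']}"

print(describe("participant_age"))
```

Keeping the dictionary in a machine-readable form like this means the same definitions can drive documentation, data-entry forms, and validation checks.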
Variable Naming Guidelines

 Use evocative identifiers.
 Use both short and long names.
 Employ prefixes and suffixes.
 Avoid reserved words, umlauts, blank spaces, and special characters.
 Use English variable names if possible.
 Include structural details.

Determining Data Type and Data Elements

 Integers: These are whole numbers used for things like frequencies, counts, or coded categorical data (like choices in a questionnaire).
 Floating Point Numbers: These are numbers with decimal places, used for measurements like blood alcohol concentration, height, or weight. Make sure to specify the required number of decimal places for accuracy.
 Character Strings: These are used for text-based data, like descriptions, explanations, or free-text responses. Always define the maximum length of the string.
 Date and Time: These represent specific points in time and can be expressed in various formats. It is important to define the format in which dates and times will be recorded.
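The four data types above can be enforced at the moment records are captured. The sketch below assumes a record with one variable of each type; the field names, length limit, and decimal precision are invented for the example.

```python
from datetime import datetime

# Sketch of enforcing declared data types on a raw record.
MAX_COMMENT_LENGTH = 200   # character strings need a defined maximum length
WEIGHT_DECIMALS = 1        # floats need a defined number of decimal places

def normalise(raw):
    """Coerce one raw record into its declared types."""
    return {
        "visit_count": int(raw["visit_count"]),                        # integer count
        "weight_kg": round(float(raw["weight_kg"]), WEIGHT_DECIMALS),  # measurement
        "comment": str(raw["comment"])[:MAX_COMMENT_LENGTH],           # free text
        "admitted": datetime.strptime(raw["admitted"], "%Y-%m-%d"),    # date
    }

rec = normalise({"visit_count": "3", "weight_kg": "72.46",
                 "comment": "stable", "admitted": "2024-01-01"})
print(rec["weight_kg"])  # 72.5
```

Declaring the precision and length limits once, next to the variable definitions, keeps entry forms and analysis code consistent.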
What is a Data Element?
 A data element is a fundamental unit of data in a dataset. It represents
a single piece of information, like a person's name, age, or address.

Attributes of a Data Element:

1. Domain: This defines the context or category the data element belongs
to. For example, "participant information" could be a domain
encompassing data elements like name, address, phone number, etc.
2. Identification Number: A unique identifier for the data element, often
used for technical purposes.
3. Name: A clearly defined name that's universally recognized within the
system.
4. Field Names: These are the names used for the data element in
databases and computer programs.
5. Definition: A brief explanation of the data element's meaning and
significance.
6. Unit of Measurement: If applicable, this specifies the unit used for the
data, like "kilograms" for weight or "years" for age.
7. Value: This is the actual reported value of the data element.
8. Precision: The level of detail or accuracy provided in the reported
value.
9. Data Type: The kind of data, its size, and any specific
representations used (e.g., character string, integer, date).
10. Size: The maximum field length allowed for the data element,
including the number of decimal places if necessary.
11. Field Limitations: This specifies whether the field is mandatory,
conditional (depends on other fields), or optional (null).
12. Default Value: A predetermined value that is automatically
assigned if no other value is provided.
13. Edit Mask: A format or pattern that helps ensure the data is
entered correctly, like "yyyy/mm/dd" for a date.
14. Business Rules: These define any validation criteria or coding
rules for the data element, such as acceptable values or ranges.
15. Related Data Elements: A list of other data elements that are
significantly linked to this one.
16. Security Classification: Specifies access restrictions or security
levels for the data element.
17. Database References: Indicates which tables in the database
the data element is used in, and its role within those tables.
18. Definitions and Sources: Provides definitions of the application
domain and references to documents that help understand the
meaning and use of the data element.
19. Data Source: Explains where the data element's value
originates.
20. Validity Dates: Specifies the dates the data element definition
is valid, potentially covering multiple time periods.
21. External References: Provides links to books, publications,
legislation, etc. that are relevant to the data element.
22. History References: Documents previous versions or changes
made to the data element definition.
23. Version Information: Indicates the version number of the data
element documentation.
24. Written Date: The date this specific version of the
documentation was created.
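A handful of these attributes can be captured directly in code. The sketch below models one hypothetical data element ("admission_date") with its name, data type, size, default value, edit mask, and field limitation; a regular expression stands in for the "yyyy/mm/dd" edit mask.

```python
import re
from dataclasses import dataclass

# Sketch: a data element carrying a subset of the attributes listed above.
# The concrete element and its values are invented for illustration.
@dataclass
class DataElement:
    name: str
    data_type: str
    size: int               # maximum field length
    default_value: str
    edit_mask: str          # regex standing in for a pattern like yyyy/mm/dd
    field_limitation: str   # "mandatory", "conditional", or "optional"

    def accepts(self, value):
        """Apply the edit mask as a validation rule for entered values."""
        return re.fullmatch(self.edit_mask, value) is not None

admission_date = DataElement(
    name="admission_date",
    data_type="date",
    size=10,
    default_value="",
    edit_mask=r"\d{4}/\d{2}/\d{2}",   # yyyy/mm/dd
    field_limitation="mandatory",
)
print(admission_date.accepts("2024/01/31"))  # True
print(admission_date.accepts("31-01-2024"))  # False
```

The remaining attributes (business rules, database references, version history, and so on) would typically live in the dictionary documentation rather than in code.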
Defining Ranges of Values

 A proper range of values should be defined for each variable, to reduce the possibility of entering implausible values during the data collection process.
 The range must not be too narrow and must include the whole span of realistic values.
 Indicate the scale of measurement if possible, for future quality control analysis.

Objective/Variable          Possible Values/Range      Scale of Measurement
Selectable options          1, 2, 3                    Nominal
Measurement in percentage   0 – 100                    Ordinal
Admission date              01.01.2024 – 31.12.2024    Cardinal
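The table above translates directly into entry-time plausibility checks. A sketch, with the variable keys invented to mirror the table:

```python
from datetime import date

# Plausibility checks matching the ranges in the table above.
RANGES = {
    "selectable_option": lambda v: v in {1, 2, 3},
    "percentage": lambda v: 0 <= v <= 100,
    "admission_date": lambda v: date(2024, 1, 1) <= v <= date(2024, 12, 31),
}

def is_plausible(variable, value):
    """Reject implausible values before they enter the data set."""
    return RANGES[variable](value)

print(is_plausible("percentage", 105))                    # False
print(is_plausible("admission_date", date(2024, 6, 15)))  # True
```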
Coding Valid Values

 If answering a question requires choosing from one or more predetermined options or response categories, each option must be expressed clearly.
 Coding needs to represent all potential answers and be transparent and consistent.
 Both alphanumeric and numeric codes are used.

Objective            Valid Values                  Possible Coding
Standard question    No, Yes                       0, 1
Gender               Male, Female                  M, F
Accident cause       Scalding, flame, explosion    1, 2, 3
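The coding in the table above can be stored as a codebook so that every code stays transparent and reversible; a sketch:

```python
# Codebook mirroring the table above: each variable maps its codes
# back to the valid values they represent.
CODEBOOK = {
    "standard_question": {0: "No", 1: "Yes"},
    "gender": {"M": "Male", "F": "Female"},
    "accident_cause": {1: "Scalding", 2: "Flame", 3: "Explosion"},
}

def decode(variable, code):
    """Translate a stored code back into its valid value."""
    return CODEBOOK[variable][code]

print(decode("accident_cause", 2))  # Flame
```

Keeping the codebook in one place makes the coding consistent across data entry, cleaning, and reporting.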
Specifying Qualitative Missing Values

 In general, it is best to be as specific as you can when describing missing data.
 All missing values must be encoded, both to be able to make qualitative claims about them and to demonstrate the completeness of the data for statistical analyses and reports.
 Take into account the possible ranges of values: for missing values, use negative codes or high numeric codes that lie outside the permitted value range.

Checking Dependencies

 Checking dependencies involves identifying and analyzing the interrelationships between different components of a system:
• Hardware: physical devices like computers, servers, and network equipment.
• Software: applications, operating systems, and databases.
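As a sketch of this convention, the negative codes below (invented for the example) sit outside the 0–100 range of a percentage variable, so each missing value can be described qualitatively while completeness remains measurable:

```python
# Qualitative missing-value codes placed outside the allowed 0-100 range
# of a percentage variable; the specific codes are invented examples.
MISSING_CODES = {
    -1: "not applicable",
    -2: "participant refused",
    -3: "measurement failed",
}

def completeness(values):
    """Share of observations carrying a real (non-missing) value."""
    observed = [v for v in values if v not in MISSING_CODES]
    return len(observed) / len(values)

print(completeness([55, 80, -2, 97, -1]))  # 0.6
```

Because the codes can never collide with real measurements, analyses can both exclude them safely and report why each value is missing.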
Identifying Calculable Values

 Identifying calculable values is a crucial step in understanding the data requirements and functional needs of a system. These values, derived from raw data, provide insights and drive decision-making.

Best Practices in Data Gathering

Effective data gathering is a cornerstone of data-driven decision-making. Here are some best practices to ensure the quality and reliability of your data.

1. Define Clear Objectives
• Specificity: Clearly articulate what you want to achieve with the data.
• Relevance: Ensure the data you collect directly aligns with your objectives.

2. Choose the Right Data Collection Methods
• Surveys: Use for quantitative data, but design questions carefully to avoid bias.
• Interviews: Ideal for qualitative data, but ensure structured questioning to maintain focus.
3. Ensure Data Quality
• Accuracy: Verify data for correctness and consistency.

4. Protect Data Integrity
• Secure Storage: Implement robust security measures to protect sensitive data.

5. Ethical Considerations
• Informed Consent: Obtain informed consent from participants, especially for sensitive data.

6. Plan for Data Analysis
• Data Cleaning: Clean and preprocess data to remove errors and inconsistencies.

7. Ensure Data Security
• Access Controls: Limit access to sensitive data.
• Encryption: Encrypt sensitive data to protect it from unauthorized access.

By following these best practices, you can gather high-quality data that will support sound decision-making.
THANK YOU!
