Ex - 2 - Data Transformation-1

Uploaded by

jrntrmpr

Ex. No: 2
Date: 11/09/2023

Data Transformation: Case Summaries – Replacing Missing Values – Identifying Duplicate Cases –
Recoding – Ranking Cases – Compute New Variables.

I. Case Summaries:
In SPSS (Statistical Package for the Social Sciences), you can create case summaries or descriptive
statistics to summarize and analyze your data.
1. From the menus choose:
Analyze > Reports > Case summaries
2. Click Select variables under the Dependent variables section and select one or more
variables. Click OK after selecting the dependent variables.
3. Optionally, click Select variables under the Group variables section and select one or more
variables that define groups of cases and produce individual summaries with respect to each
category. Click OK after selecting the group variables to run the analysis. SPSS will generate
case summaries for the selected variables and display them in the output viewer.
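
As a rough illustration of what a grouped case summary contains, the sketch below lists the values of one dependent variable within each category of a group variable, together with a count and a mean. It is plain Python, not SPSS output; the data and the names cases, dept, salary and case_summaries are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: each dict is one case (row).
cases = [
    {"dept": "Sales", "salary": 30000},
    {"dept": "IT", "salary": 45000},
    {"dept": "Sales", "salary": 34000},
]

def case_summaries(rows, dependent, group):
    """List the dependent values per group, with a count and a mean for each."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row[dependent])
    return {
        key: {"values": vals, "n": len(vals), "mean": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }

summary = case_summaries(cases, dependent="salary", group="dept")
for dept, stats in summary.items():
    print(dept, stats)
```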

II. Replacing missing values:


Missing observations can be problematic in analysis, and some time series measures cannot be
computed if there are missing values in the series. Replacing missing values in SPSS is a common data
preprocessing step to ensure that your analysis includes all available data. You can replace missing
values with specific values, such as zeros, means, medians, or any other value of your choice.

1. From the menus choose:
Transform > Replace missing values...
2. Click Select variables under the Variables for which to replace missing values section, select
the variables for which you want to replace missing values, and click OK.
3. Optionally, click the new variable name and method link next to each variable under the
Variables for which to replace missing values section, enter a new variable name to override
the default name and/or select the estimation method that you want to use to replace missing
values, and click OK. For more information, see Estimation methods for replacing missing
values.
4. Click Apply.

Estimation methods for replacing missing values


• Series mean: Replaces missing values with the mean for the entire series.
• Mean of nearby points: Replaces missing values with the mean of valid surrounding values.
The span of nearby points is the number of valid values above and below the missing value
used to compute the mean.
• Median of nearby points: Replaces missing values with the median of valid surrounding values.
The span of nearby points is the number of valid values above and below the missing value
used to compute the median.
• Linear interpolation: Replaces missing values using a linear interpolation. The last valid value
before the missing value and the first valid value after the missing value are used for the
interpolation. If the first or last case in the series has a missing value, the missing value is not
replaced.
• Linear trend at point: Replaces missing values with the linear trend for that point. The existing
series is regressed on an index variable scaled 1 to n. Missing values are replaced with their
predicted values.
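
Three of the methods above can be sketched in plain Python to show what each one computes. This is a minimal illustration, not SPSS's implementation: missing values are represented by None, the function names are hypothetical, and the trend function assumes the series has at least two valid values.

```python
from statistics import mean

def series_mean(series):
    """Replace every missing value with the mean of all valid values."""
    fill = mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]

def linear_interpolation(series):
    """Fill gaps between two valid values on a straight line.

    Missing values before the first or after the last valid value
    are left as None, as described in the list above.
    """
    out = list(series)
    valid_idx = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(valid_idx, valid_idx[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = series[left] + step * (i - left)
    return out

def linear_trend_at_point(series):
    """Regress valid values on an index 1..n and fill gaps with predictions."""
    points = [(i + 1, v) for i, v in enumerate(series) if v is not None]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (i + 1) if v is None else v
            for i, v in enumerate(series)]

series = [None, 2.0, None, None, 8.0, None]
print(series_mean(series))
print(linear_interpolation(series))
print(linear_trend_at_point(series))
```

Note how the three methods differ at the ends of the series: the series mean and the linear trend fill the first and last cases, while linear interpolation leaves them missing.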

III. Identifying Duplicate Cases:

1. From the menus choose:


Data > Identify duplicate cases...
2. Click Select variables under the Define duplicate cases section, select variables that contain
cases that are considered duplicates, and click OK.
3. Optionally, click Select variables under the Sort within matching groups section, select one
or more variables to sort cases within groups defined by the selected matching cases variables,
and click OK. The sort order defined by these variables determines the "first" and "last" case
in each group. If no sort variables are selected, the original file order is used.
4. Optionally, you can:
o Select a sort order for the selected sort variables.
o Expand the Additional settings menu and click Save to dataset to create a variable
that identifies all unique cases and duplicates in each group.
5. Click Apply.
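
The logic of the procedure can be sketched in plain Python: cases are grouped by the matching variables, optionally sorted within each group, and one case per group is flagged as primary (1) while the rest are flagged as duplicates (0). This is an illustrative sketch only; the function name, the data, and the choice of "last" as the default primary case are assumptions, not taken from the steps above.

```python
def flag_primary_cases(rows, match_keys, sort_key=None, primary="last"):
    """Return one flag per case: 1 = primary case of its group, 0 = duplicate."""
    groups = {}
    for index, row in enumerate(rows):
        key = tuple(row[k] for k in match_keys)
        groups.setdefault(key, []).append(index)

    flags = [0] * len(rows)
    for indices in groups.values():
        if sort_key is not None:
            # Stable sort: ties keep their original file order.
            indices = sorted(indices, key=lambda i: rows[i][sort_key])
        chosen = indices[-1] if primary == "last" else indices[0]
        flags[chosen] = 1
    return flags

# Hypothetical data: cases 1 and 3 match on id and name.
rows = [
    {"id": 1, "name": "Asha", "visit": 2},
    {"id": 2, "name": "Ravi", "visit": 1},
    {"id": 1, "name": "Asha", "visit": 1},
]
flags = flag_primary_cases(rows, match_keys=["id", "name"], sort_key="visit")
print(flags)
```

Without a sort key the original file order decides which case is "first" and "last", so the same data can yield a different primary case.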

IV. Recoding:

The Recode variables procedure provides options for reassigning the values of existing
variables, or collapsing ranges of existing values, into new values for existing variables or for new
variables. For example, you could collapse salaries into salary range categories for existing variables or
into a new variable that contains salary-range categories. Both numeric and string variables can be
recoded.

Recode into Different Variables:


1. From the menu, choose Transform > Recode into Different Variables. The “Recode into
Different Variables” dialog box will appear.
2. Select the variable you want to recode.
3. In the Output Variable area, enter the name for the new variable and click Change.
4. Click Old and New Values to specify how to recode values.
5. Specify an old value and a new value. Click Add to place the specification into the Old –>
New list. For example, an age variable could be recoded into four age groups (20 and below,
21 to 40, 41 to 60, 61 and older).
6. Click Continue and return to the previous dialog box.
7. Click OK.
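
The age-group recode can be sketched in plain Python as a mapping from old values to new codes. This is only an illustration of the Old –> New rules, not SPSS syntax; the function name and the group codes 1 to 4 are hypothetical, and it assumes that an age of exactly 20 belongs to the first group.

```python
def recode_age(age):
    """Map an age to a group code: 1 = 20 and below, 2 = 21-40, 3 = 41-60, 4 = 61+."""
    if age is None:
        return None  # a missing value stays missing
    if age <= 20:
        return 1
    if age <= 40:
        return 2
    if age <= 60:
        return 3
    return 4

ages = [18, 20, 21, 40, 41, 60, 61, None]
age_group = [recode_age(a) for a in ages]  # new variable; ages is left unchanged
print(age_group)
```

Writing the result to a new list mirrors Recode into Different Variables: the original values remain available for checking the recode.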
Recode into Same Variables:
1. From the menus choose:
Transform > Recode into Same Variables...
2. Select the variables you want to recode. If you select multiple variables, they must be of the
same type (numeric or string).
3. Click Old and New Values.
4. Specify an old value and a new value.
5. Click Add to place the specification into the Old->New list.
The new value must be of the same type (numeric or string) as the existing variable. For string variables,
it must be of the same length as the existing variable.
V. Ranking Cases:

From the menus choose:


Transform > Rank cases...
1. Click Select variables under the Rank variables section and select one or more variables to
rank. You can rank only numeric variables. Click OK after selecting the variables.
2. Optionally, set the sorting method. By default, cases are sorted in ascending order.
3. By default, summary tables display in the output. You can deselect the Display summary
tables option to prevent summary tables in the output.
4. Optionally, click Select variables under the Grouping variables section to choose variables
that will organize rankings into subgroups. Ranks are computed within each subgroup.
Click OK after selecting the variables.
5. Optionally, you can select the following options from the Additional settings menu:
o Click Types to include ranking methods.
o Click Ties to specify methods for handling the ranking of cases with the same value.
6. Click Apply.
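
How the sort order and the tie-handling option affect the result can be sketched in plain Python. This is a minimal sketch, not SPSS's implementation; the function name and the tie labels "mean", "low" and "high" are hypothetical names for three common ways of ranking tied cases.

```python
def rank_cases(values, ascending=True, ties="mean"):
    """Rank values (1 = first in sort order); tied cases share one rank."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not ascending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the run of sorted positions pos..end that share one value.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        if ties == "mean":
            rank = (pos + 1 + end + 1) / 2  # mean of the ranks the run occupies
        elif ties == "low":
            rank = pos + 1
        elif ties == "high":
            rank = end + 1
        else:
            raise ValueError(f"unknown ties option: {ties}")
        for k in range(pos, end + 1):
            ranks[order[k]] = float(rank)
        pos = end + 1
    return ranks

scores = [10, 30, 20, 30]
print(rank_cases(scores))                   # ascending, mean rank for ties
print(rank_cases(scores, ascending=False))  # descending
```

To rank within subgroups, the same function could be applied separately to the values of each group.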

VI. Compute New Variables:

1. To compute a new variable, click Transform > Compute Variable.


2. Target Variable: The name of the new variable that will be created during the computation.
Simply type a name for the new variable in the text field. Once a variable is entered here, you
can click on “Type & Label” to assign a variable type and give it a label. The default type for
new variables is numeric.
3. Numeric Expression: Specify how to compute the new variable by writing a numeric
expression. This expression must include one or more variables from your dataset, and can use
arithmetic or functions.
4. Function group: You can also use the built-in functions in the Function group list on the right-
hand side of the window. The function group contains many useful, common functions that
may be used for calculating values for new variables (e.g., mean, logarithm). To find a specific
function, simply click one of the function groups in the Function Group list. You will now see
a list of functions that belong to that function group in the Functions and Special Variables area.
If you click on a specific function, a description of that function will appear in the text field to
the left.
5. E.g., the expression Weight * 703 / Height ** 2 says that the new variable will be calculated
as variable Weight multiplied by 703, divided by the square of variable Height.
6. Click OK to complete the computation.
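
The same computation can be sketched in plain Python, applied case by case. This is an illustration only; the function name and the sample data are hypothetical, and the factor 703 is assumed to mean that Weight is in pounds and Height in inches (the usual body mass index formula in those units).

```python
def compute_bmi(weight, height):
    """Weight multiplied by 703, divided by the square of Height."""
    if weight is None or height is None:
        return None  # the result is missing if either input is missing
    return weight * 703 / height ** 2

# Hypothetical cases: (Weight, Height)
cases = [(100, 50), (150, 65), (None, 70)]
bmi = [compute_bmi(w, h) for w, h in cases]
print(bmi)
```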
