
Scale Function in R

Last Updated : 28 May, 2024

Variables measured in different units or ranges often need to be put on a common scale before analysis, and the scale function in R is a handy way of accomplishing this goal. It standardizes numeric variables by centering and scaling them. This article covers the scale function's syntax, parameters, and usage, with examples of standardizing vectors and data frames.

Understanding the Scale Function

The R scale function is mainly used to standardize the variables of a data set. Standardization, also known as z-score normalization, transforms each variable so that its mean is zero and its standard deviation is one. This is helpful when variables are measured in different units or on different scales, because it brings them all onto a common scale. Because the transformation is linear, it does not distort the relative differences between the values of a variable.
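As a minimal sketch of that definition, the z-score of a value is (value - mean) / standard deviation, where the standard deviation is the sample one returned by sd (n - 1 denominator). The vector x and the variable names below are illustrative and not part of the original article.

```r
# Illustrative values (hypothetical, chosen only for the demonstration)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Manual z-score: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.vector() drops the matrix shape
z_scale <- as.vector(scale(x))

# Compare the two results up to floating-point tolerance
all.equal(z_manual, z_scale)
```

If the comparison holds, scale with its default arguments is doing nothing more than this arithmetic, column by column.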

The basic syntax of the scale function is:

Syntax: scale(x, center = TRUE, scale = TRUE)

where:

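  • x: A numeric vector, matrix, or data frame containing the data to be standardized. A data frame is converted to a matrix, so all of its columns must be numeric.
  • center: A logical value or a numeric vector. If TRUE (the default), the mean of each column is subtracted from that column. If FALSE, no centering is done. If a numeric vector is given, its length must equal the number of columns in x, and each column has the corresponding value subtracted from it.
  • scale: A logical value or a numeric vector. If TRUE (the default) and center is TRUE, each centered column is divided by its standard deviation; if center is anything other than TRUE, the column is divided by its root mean square instead. If FALSE, no scaling is done. If a numeric vector is given, its length must equal the number of columns in x, and each column is divided by the corresponding value.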
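The effect of the two arguments can be sketched with a small matrix. The matrix m and the variable names are made up for illustration; the root mean square used when center = FALSE is sqrt(sum(x^2) / (n - 1)), as described in the R documentation for scale.

```r
# Illustrative matrix with two columns (values are hypothetical)
m <- cbind(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 50))

# Center only: subtract the column means, keep the original units
centered <- scale(m, center = TRUE, scale = FALSE)
colMeans(centered)

# Scale only: with center = FALSE, each column is divided by its
# root mean square, sqrt(sum(x^2) / (n - 1)), not by its standard deviation
rms_scaled <- scale(m, center = FALSE, scale = TRUE)
rms_a <- sqrt(sum(m[, "a"]^2) / (nrow(m) - 1))

# Compare scale()'s result for column a with the manual calculation
all.equal(unname(rms_scaled[, "a"]), m[, "a"] / rms_a)
```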

To use the scale function, pass your data as its first argument. Here is a basic example with a numeric vector.

R
# Sample numeric vector
data <- c(1, 2, 3, 4, 5)

# Applying the scale function
scaled_data <- scale(data)

# Displaying the scaled data
print(scaled_data)

Output:

           [,1]
[1,] -1.2649111
[2,] -0.6324555
[3,]  0.0000000
[4,]  0.6324555
[5,]  1.2649111

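The output shows the standardized values, which have mean 0 and standard deviation 1. Note that scale returns a one-column matrix rather than a vector, which is why row and column indices appear. The printed result also carries the attributes scaled:center and scaled:scale, which are left out of the listing above.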
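A short sketch of how the result can be inspected and turned back into a plain vector. It reuses the same five values; the variable z is an illustrative name.

```r
scaled_data <- scale(c(1, 2, 3, 4, 5))

# The values used for centering and scaling are stored as attributes
attr(scaled_data, "scaled:center")   # mean of the input
attr(scaled_data, "scaled:scale")    # standard deviation of the input

# Drop the matrix shape and the attributes to get a plain numeric vector
z <- as.vector(scaled_data)

# Check the result: the mean should be numerically 0 and the sd should be 1
mean(z)
sd(z)
```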

Using Scale function with a Data Frame

When dealing with data frames, scale standardizes each column independently. Every column must be numeric, and the result is a matrix rather than a data frame:

R
# Sample data frame
data <- data.frame(
  A = c(1, 2, 3, 4, 5),
  B = c(10, 20, 30, 40, 50)
)
print(data)

# Applying the scale function
scaled_data <- scale(data)

# Displaying the scaled data
print(scaled_data)

Output:

  A  B
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50

              A          B
[1,] -1.2649111 -1.2649111
[2,] -0.6324555 -0.6324555
[3,]  0.0000000  0.0000000
[4,]  0.6324555  0.6324555
[5,]  1.2649111  1.2649111

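The output shows the standardized values for each column. Columns A and B end up identical because B is simply 10 times A, and standardization removes differences in location and scale.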
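Real data frames often mix numeric and non-numeric columns. One possible approach, sketched below with a hypothetical data frame named people, is to select the numeric columns, standardize only those, and write them back.

```r
# Hypothetical data frame with one non-numeric column
people <- data.frame(
  name   = c("Ann", "Bob", "Cara", "Dev"),
  height = c(150, 160, 170, 180),
  weight = c(50, 65, 70, 95)
)

# scale(people) would stop with an error because the name column
# is not numeric, so identify the numeric columns first
num_cols <- sapply(people, is.numeric)

# Standardize only those columns and write them back into a copy
people_scaled <- people
people_scaled[num_cols] <- as.data.frame(scale(people[num_cols]))

print(people_scaled)
```

Converting the result with as.data.frame keeps people_scaled an ordinary data frame, with the name column left untouched.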

Custom Centering and Scaling

Centering typically involves subtracting the mean or a custom value from the data, and scaling involves dividing by the standard deviation or another custom value. To use custom values, pass numeric vectors to the center and scale arguments, with one value per column in column order. This is commonly done to standardize data before performing machine learning or statistical analysis.

R
# Creating a sample data frame
df <- data.frame(
  height = c(150, 160, 170, 180, 190),
  weight = c(50, 60, 70, 80, 90)
)
print(df)

# Custom centering and scaling values
center_values <- c(height = 165, weight = 75)
scale_values <- c(height = 10, weight = 15)

# Centering and scaling the data frame
df_scaled <- as.data.frame(scale(df, center = center_values, scale = scale_values))
print(df_scaled)

Output:

  height weight
1    150     50
2    160     60
3    170     70
4    180     80
5    190     90

  height     weight
1   -1.5 -1.6666667
2   -0.5 -1.0000000
3    0.5 -0.3333333
4    1.5  0.3333333
5    2.5  1.0000000

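This output shows that each value has had its column's centering value subtracted and has then been divided by its column's scaling value; for example, the first height becomes (150 - 165) / 10 = -1.5. Because the custom values are not the column means and standard deviations, the resulting columns do not have mean 0 and standard deviation 1, but they are expressed relative to the same chosen reference values.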
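One common use of custom values is to apply the center and scale computed on one data set to another. The sketch below assumes hypothetical train and new_obs data frames; it reads the values from the scaled:center and scaled:scale attributes and then reverses the transformation with sweep.

```r
# Hypothetical training data and new observations
train <- data.frame(height = c(150, 160, 170, 180, 190),
                    weight = c(50, 60, 70, 80, 90))
new_obs <- data.frame(height = c(165, 175), weight = c(55, 85))

# Standardize the training data and keep the values scale() used
train_scaled <- scale(train)
train_center <- attr(train_scaled, "scaled:center")
train_scale  <- attr(train_scaled, "scaled:scale")

# Apply the same centering and scaling to the new observations
new_scaled <- scale(new_obs, center = train_center, scale = train_scale)
print(new_scaled)

# Undo the transformation: multiply by the scale, then add the center
restored <- sweep(sweep(new_scaled, 2, train_scale, "*"), 2, train_center, "+")

# Compare the restored values with the original new observations
all.equal(as.numeric(restored), unlist(new_obs, use.names = FALSE))
```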

Conclusion

The R scale function is a fundamental tool for data standardization. By centering and scaling, you ensure that variables measured on different scales carry comparable weight in your analysis or model. Understanding what the center and scale parameters do will make your data preprocessing more precise and your results more reliable.

