
Assignment 1

This document is an assignment for a course on the Application of AI and ML in Chemical Engineering, consisting of various sections with short answer, conceptual, analytical, and advanced application questions. It covers topics such as machine learning techniques, data analysis, PCA, and model evaluation in the context of chemical engineering processes. Students are required to provide detailed answers, calculations, and explanations for each question.

Uploaded by

sanskarughadeuna
Copyright
© All Rights Reserved

Assignment 1

Course Code: C653

Course Name: Application of AI and ML in Chemical Engineering

Instructions:
- Answer all questions.
- Clearly mention assumptions, formulas, and steps in calculations.
- Use diagrams, flowcharts, or graphs where applicable.
Section A: Short Answer Questions (1 Mark Each)
1. What is the difference between a hyperparameter and a parameter in machine learning?
2. What is the difference between feature selection and feature extraction?
3. What does an ROC curve represent in model evaluation?
4. Define data scaling and Principal Component Analysis (PCA). How is PCA used for dimensionality reduction?
5. Which machine learning techniques utilize both labelled and unlabelled data?
6. The following are the ages of a group of people: [22, 24, 25, 29, 30, 32, 33, 100]. Identify the outlier(s) in the
data using the range method.
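A minimal Python sketch of the outlier check, assuming "range method" means the common 1.5 × IQR fence rule (the question does not define it) and NumPy's default percentile interpolation. Other quartile conventions shift the fences slightly, but 100 lies far outside them either way.

```python
import numpy as np

# Ages from Section A, Question 6
ages = np.array([22, 24, 25, 29, 30, 32, 33, 100])

# Assumption: "range method" read as the 1.5 x IQR fence rule
q1, q3 = np.percentile(ages, [25, 75])  # default linear interpolation
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Anything outside [lower, upper] is flagged as an outlier
outliers = ages[(ages < lower) | (ages > upper)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower}, {upper})")
print("Outliers:", outliers.tolist())
```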
7. The following are the scores of 5 students in a math test: [72, 80, 85, 90, 95]. Calculate the mean and the
standard deviation of the scores.
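Worked in Python with the standard library. The question does not say whether the population (divide by N) or sample (divide by N − 1) standard deviation is intended, so both are computed.

```python
import statistics

scores = [72, 80, 85, 90, 95]

mean = statistics.mean(scores)        # arithmetic mean
pop_sd = statistics.pstdev(scores)    # population SD (divide by N)
sample_sd = statistics.stdev(scores)  # sample SD (divide by N - 1)

print(f"mean = {mean}")
print(f"population SD = {pop_sd:.3f}, sample SD = {sample_sd:.3f}")
```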
8. How do standard score (z-score) scaling and min-max scaling differ in terms of the algorithms they suit and their sensitivity to outliers?
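A small NumPy illustration, reusing the Question 6 ages, of why the two scalers react differently to an outlier: min-max scaling is bounded by the observed extremes, so a single large value compresses the remaining points into a narrow band, while z-scores are unbounded but still shift because the mean and standard deviation are themselves pulled by the outlier.

```python
import numpy as np

# Ages from Question 6, which contain one extreme value (100)
x = np.array([22, 24, 25, 29, 30, 32, 33, 100], dtype=float)

# Min-max scaling: maps to [0, 1] using the observed min and max
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standard score (z-score): zero mean, unit (population) standard deviation
x_z = (x - x.mean()) / x.std()

# The outlier sets the max, so the seven inliers are squeezed into a narrow band
print("min-max:", np.round(x_minmax, 3))
print("z-score:", np.round(x_z, 3))
print("spread of inliers after min-max:", x_minmax[:-1].max() - x_minmax[:-1].min())
```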
9. True/False: “Overfitting can sometimes be reduced using optimization techniques like regularization.”
10. If a dataset has a covariance of 0.85 between two features, what does it indicate?
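Covariance carries the units of both features, so a value of 0.85 on its own only says that the features tend to increase together (a positive relationship); its strength cannot be judged without the features' variances. The sketch below uses made-up data (not from the assignment) to show that rescaling one feature changes the covariance but not the correlation.

```python
import numpy as np

# Hypothetical paired measurements (illustrative only, not from the assignment)
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.2, 1.9, 3.4, 3.8, 5.1])

cov_ab = np.cov(a, b)[0, 1]          # sample covariance (ddof=1)
corr_ab = np.corrcoef(a, b)[0, 1]    # Pearson correlation, unitless

# Rescale one feature (e.g. a unit change): covariance scales, correlation does not
cov_scaled = np.cov(10 * a, b)[0, 1]
corr_scaled = np.corrcoef(10 * a, b)[0, 1]

print(f"cov = {cov_ab:.3f}, corr = {corr_ab:.3f}")
print(f"after x10: cov = {cov_scaled:.3f}, corr = {corr_scaled:.3f}")
```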

Section B: Conceptual and Calculation-Based Questions (2 Marks Each)


1. What is the difference between classification and regression in machine learning?
2. If a feature in a dataset ranges from 10 to 100, apply min-max normalization to transform a value of 55.
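The min-max formula, x' = (x − min) / (max − min), as a one-line Python check:

```python
def min_max(value, x_min, x_max):
    """Min-max normalization to the [0, 1] range."""
    return (value - x_min) / (x_max - x_min)

print(min_max(55, 10, 100))  # (55 - 10) / (100 - 10)
```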

3. If Singular Value Decomposition (SVD) is applied to a 5 × 4 matrix, what are the dimensions of its
decomposed matrices?
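A NumPy check of the shapes, using an arbitrary 5 × 4 matrix. `numpy.linalg.svd` returns the singular values as a 1-D array, so Σ is rebuilt explicitly; both the full and the reduced (thin) forms are shown, since either may be the intended answer.

```python
import numpy as np

A = np.arange(20, dtype=float).reshape(5, 4)  # any 5 x 4 matrix

# Full SVD: A = U @ Sigma @ Vt with U (5x5), Sigma (5x4), Vt (4x4)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((5, 4))
Sigma[:4, :4] = np.diag(s)   # singular values on the diagonal

# Reduced ("thin") SVD: U (5x4), Sigma (4x4), Vt (4x4)
U_r, s_r, Vt_r = np.linalg.svd(A, full_matrices=False)

print(U.shape, Sigma.shape, Vt.shape)
print(U_r.shape, (s_r.size, s_r.size), Vt_r.shape)
print(np.allclose(A, U @ Sigma @ Vt))
```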

4. Given a dataset with eigenvalues (5.0, 3.0, 1.5, 0.5), determine how many components should be selected to
retain 85% of the variance.
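The cumulative explained-variance rule, sketched in NumPy with the eigenvalues sorted largest first. The cumulative shares are 50%, 80%, and 95%, so the third component is the first to pass 85%.

```python
import numpy as np

eigenvalues = np.array([5.0, 3.0, 1.5, 0.5])   # already sorted, largest first
ratio = eigenvalues / eigenvalues.sum()         # explained-variance ratio
cumulative = np.cumsum(ratio)                   # 0.50, 0.80, 0.95, 1.00

# Smallest k whose cumulative explained variance reaches the 85% target
k = int(np.argmax(cumulative >= 0.85)) + 1
print(cumulative, k)
```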

5. Explain the working of the Savitzky–Golay filter, specifying the polynomial order and window size, and apply
it to smooth the noisy data points: [2.1, 3.4, 4.0, 4.8, 5.3, 6.0].
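The question leaves the window size and polynomial order to the student; the sketch below assumes window = 5 and order = 2. It derives the smoothing weights by least squares (fitting a quadratic to each window and taking its value at the centre), which reproduces the classic (−3, 12, 17, 12, −3)/35 coefficients. The two points at each end are left unsmoothed here; `scipy.signal.savgol_filter` instead fits the edges by default, so its end values will differ.

```python
import numpy as np

def savgol_coefficients(window, order):
    """Least-squares Savitzky-Golay smoothing weights for the window centre."""
    half = window // 2
    k = np.arange(-half, half + 1)
    # Vandermonde matrix: columns 1, k, k^2, ... up to the polynomial order
    A = np.vander(k, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the fitted value at k = 0
    return np.linalg.pinv(A)[0]

def savgol_smooth(y, window=5, order=2):
    """Smooth interior points; edge points (half a window) are left unchanged."""
    y = np.asarray(y, dtype=float)
    c = savgol_coefficients(window, order)
    half = window // 2
    out = y.copy()
    for i in range(half, len(y) - half):
        out[i] = c @ y[i - half:i + half + 1]
    return out

data = [2.1, 3.4, 4.0, 4.8, 5.3, 6.0]
print(np.round(savgol_coefficients(5, 2) * 35, 6))  # classic integers -3, 12, 17, 12, -3
print(np.round(savgol_smooth(data), 4))
```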

Section C: Analytical and Problem-Solving Questions (4 Marks Each)


1. Describe the application of a reinforcement learning algorithm to process optimization in chemical engineering,
focusing on its implementation in a decision support system for a reverse osmosis plant. The explanation should
cover: a flowchart encompassing all aspects of the decision support system; the features of the reverse osmosis
plant that can be utilized; the formulation of a loss function incorporating rewards and penalties; and the benefits
of using reinforcement learning to differentiate between suboptimal and ideal operational states, thereby helping
operators improve plant performance.
2. Discuss the role of optimization in machine learning model development. Is it always required? Provide two
scenarios, one where optimization is a must and one where it might not be necessary, focusing on the implications
for machine learning model performance and efficiency.
3. Suppose you have an unscaled dataset with two features: 'temperature' (ranging from 0-100°C) and
'pressure' (ranging from 1-10 atm). Explain how standardization would help in ML model training.
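A short NumPy sketch with hypothetical readings (illustrative values within the stated ranges, not given in the question). After standardization both features have zero mean and unit variance, so neither dominates distance- or gradient-based training simply because of its units.

```python
import numpy as np

# Hypothetical samples spanning the stated ranges (illustrative values only)
temperature = np.array([10.0, 35.0, 60.0, 85.0, 100.0])  # degC, range 0-100
pressure = np.array([1.5, 3.0, 5.5, 8.0, 9.5])            # atm, range 1-10

X = np.column_stack([temperature, pressure])

# Standardization: z = (x - mean) / std, computed per feature (column)
mu = X.mean(axis=0)
sigma = X.std(axis=0)
Z = (X - mu) / sigma

print("raw std per feature:", np.round(sigma, 2))
print("standardized mean:", np.round(Z.mean(axis=0), 6))
print("standardized std:", np.round(Z.std(axis=0), 6))
```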

4. Using Principal Component Analysis (PCA), compute the first principal component for the dataset:
X = [2, 3, 5], Y = [1, 4, 6]

Find the eigenvectors of the covariance matrix.
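A NumPy sketch of the calculation. It uses the sample covariance (`np.cov` divides by n − 1); dividing by n instead scales both eigenvalues by 2/3 but leaves the eigenvectors, and therefore PC1, unchanged.

```python
import numpy as np

X = np.array([2.0, 3.0, 5.0])
Y = np.array([1.0, 4.0, 6.0])

data = np.column_stack([X, Y])
cov = np.cov(data, rowvar=False)   # sample covariance, divides by n - 1

# eigh is for symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]
if pc1[0] < 0:          # eigenvector sign is arbitrary; fix it for readability
    pc1 = -pc1

print("covariance:\n", np.round(cov, 4))
print("eigenvalues:", np.round(eigvals, 4))
print("PC1 direction:", np.round(pc1, 4))
```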

5. Given the following dataset of daily sales in a store for one week: [150, 200, 180, 220, 250, 190, 210],
perform a basic Exploratory Data Analysis (EDA) and answer the following:
a) What is the mean sales value for the week?
b) What is the range of sales values?
c) Identify any trends or patterns you notice.
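A quick NumPy pass over the three parts; the straight-line slope in (c) is one simple way, among several, to summarize the weekly trend.

```python
import numpy as np

sales = np.array([150, 200, 180, 220, 250, 190, 210])
days = np.arange(1, len(sales) + 1)

mean_sales = sales.mean()                  # (a) mean
sales_range = sales.max() - sales.min()    # (b) range = max - min
daily_change = np.diff(sales)              # day-to-day movement

# (c) a least-squares straight line gives a rough overall trend (units/day)
slope, intercept = np.polyfit(days, sales, 1)

print(f"mean = {mean_sales}, range = {sales_range}")
print("day-to-day change:", daily_change.tolist())
print(f"peak on day {days[sales.argmax()]}, linear trend = {slope:.2f} per day")
```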

6. Given a training dataset with 1000 samples, explain how stratified sampling can improve training results in
classification models.
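A minimal stratified split written directly in NumPy, assuming a hypothetical 90/10 class imbalance among the 1000 samples and a 20% test fraction. Each class is sampled separately, so both splits keep the 10% minority share; scikit-learn's `train_test_split(..., stratify=y)` provides the same behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced labels for the 1000 samples: 900 of class 0, 100 of class 1
y = np.array([0] * 900 + [1] * 100)

def stratified_split(labels, test_frac, rng):
    """Return train/test indices that keep each class's share in both splits."""
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_frac * idx.size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

train_idx, test_idx = stratified_split(y, 0.2, rng)
print("train size:", train_idx.size, "minority share:", y[train_idx].mean())
print("test size:", test_idx.size, "minority share:", y[test_idx].mean())
```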
7. If an ML model was trained on 80% of the data and tested on the remaining 20%, compute how many samples were in
the training set if the dataset contained 500 records.
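The split arithmetic:

```latex
N_{\text{train}} = 0.80 \times 500 = 400, \qquad N_{\text{test}} = 0.20 \times 500 = 100
```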
8. A process control system logs the following pressure values: [80, 85, 83, 87, 90]. Compute the exponential
moving average (EMA) with α = 0.1.
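A plain-Python sketch of the EMA recursion, assuming the average is initialised with the first reading (another common choice initialises it with zero or with a short mean, which changes the early values):

```python
def ema(values, alpha):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    Assumption: the first average is initialised with the first measurement.
    """
    smoothed = [float(values[0])]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

pressure = [80, 85, 83, 87, 90]
print([round(s, 4) for s in ema(pressure, alpha=0.1)])
```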
9. Given the specified feature ranges for a CSTR, compute the normalized and standardized values for a data point
with a `Temperature` of 200°C, a `Pressure` of 50 bar, and a `Concentration` of 1.0 mol/L. Use the dataset
mean (μ) and standard deviation (σ) for each feature: `Temperature` (μ = 200°C, σ = 75°C), `Pressure` (μ = 100
bar, σ = 50 bar), and `Concentration` (μ = 1.05 mol/L, σ = 0.5 mol/L). Feature ranges: Temperature: Min = 100°C,
Max = 300°C; Pressure: Min = 20 bar, Max = 180 bar; Concentration: Min = 0.2 mol/L, Max = 2.2 mol/L. Present a
detailed breakdown of your calculations. Further, assume you are asked to create a machine learning model to
forecast reactor yield. Since the scale and distribution of features such as Temperature, Pressure, and
Concentration can notably impact CSTR yield, how do normalization and standardization affect the predictive
accuracy of machine learning models for CSTR yield, given the distinct challenges presented by the scale,
distribution, and units of these variables?
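The numeric part, checked in Python using only the statistics and ranges given above (min-max normalization to [0, 1] and z-score standardization):

```python
# Dataset statistics and ranges given in the question
stats = {
    "Temperature":   {"value": 200.0, "mu": 200.0, "sigma": 75.0, "min": 100.0, "max": 300.0},
    "Pressure":      {"value": 50.0,  "mu": 100.0, "sigma": 50.0, "min": 20.0,  "max": 180.0},
    "Concentration": {"value": 1.0,   "mu": 1.05,  "sigma": 0.5,  "min": 0.2,   "max": 2.2},
}

results = {}
for name, s in stats.items():
    normalized = (s["value"] - s["min"]) / (s["max"] - s["min"])   # min-max to [0, 1]
    standardized = (s["value"] - s["mu"]) / s["sigma"]             # z-score
    results[name] = (normalized, standardized)
    print(f"{name:13s} normalized = {normalized:.4f}, standardized = {standardized:.4f}")
```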
10. Given a high-dimensional dataset with 200 features, apply PCA to reduce the dimensionality while retaining
90% of the variance. Explain your approach and compute the number of components to retain if the total variance
is 500 and the eigenvalues of the selected components sum to 450.
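The retained-variance check, in LaTeX. The ratio confirms that the selected components retain exactly 90% of the variance; the count k itself is the number of leading (largest) eigenvalues needed for their sum to reach 450, which requires the individual eigenvalues.

```latex
\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{200} \lambda_j} = \frac{450}{500} = 0.90
```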
11. The measured pH of the solution in a CSTR is observed to be highly noisy; a data sample covering 1.9 s is
shown in Table 2. Before developing an ML model to predict the pH from the reactor input features, the noise in
the pH signal has to be suppressed using appropriate smoothing filters.
Table 2: pH variation in the CSTR.

Time (s)   0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
pH         3.81  3.55  3.95  4.05  4.34  3.65  4.19  4.24  4.09  4.56
Time (s)   1.0   1.1   1.2   1.3   1.4   1.5   1.6   1.7   1.8   1.9
pH         4.48  4.13  4.70  4.30  4.80  4.25  4.72  4.08  4.09  4.48
a) Apply the following smoothing techniques to suppress the noise in the measured pH data for three time steps:
   i. Simple Moving Average (SMA) with window size = 3
   ii. Exponential Moving Average (EMA) with β = 0.25 and β = 0.75
   iii. Savitzky–Golay (SG) filter of order 2 and window size = 5
   Given data: SG filter coefficients
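A NumPy sketch of the three filters on the Table 2 data. Assumptions not fixed by the question: the SMA is trailing; the EMA follows v_t = β·v_(t−1) + (1 − β)·y_t with v_0 = y_0 (some texts put β on the new sample instead, which swaps the behaviour of the two β values); and, because the SG coefficient table did not survive extraction, the standard order-2, window-5 weights (−3, 12, 17, 12, −3)/35 are assumed.

```python
import numpy as np

# Table 2: pH readings at 0.0, 0.1, ..., 1.9 s
ph = np.array([3.81, 3.55, 3.95, 4.05, 4.34, 3.65, 4.19, 4.24, 4.09, 4.56,
               4.48, 4.13, 4.70, 4.30, 4.80, 4.25, 4.72, 4.08, 4.09, 4.48])

def sma(y, window=3):
    """Trailing simple moving average; first value corresponds to index window - 1."""
    return np.convolve(y, np.ones(window) / window, mode="valid")

def ema(y, beta):
    """v_t = beta * v_{t-1} + (1 - beta) * y_t, initialised with v_0 = y_0 (assumption)."""
    v = np.empty_like(y)
    v[0] = y[0]
    for t in range(1, len(y)):
        v[t] = beta * v[t - 1] + (1 - beta) * y[t]
    return v

# Standard SG convolution weights for polynomial order 2, window 5 (assumed)
SG_5_2 = np.array([-3, 12, 17, 12, -3]) / 35.0

def savgol(y, weights=SG_5_2):
    """Centred SG smoothing; output index 0 corresponds to input index 2."""
    # Reversing the weights turns convolution into a sliding dot product
    # (the weights are symmetric here, so this only matters in general)
    return np.convolve(y, weights[::-1], mode="valid")

print("SMA(3), t = 0.2-0.4 s:", np.round(sma(ph)[:3], 4))
for beta in (0.25, 0.75):
    print(f"EMA beta={beta}, t = 0.0-0.3 s:", np.round(ema(ph, beta)[:4], 4))
print("SG(2, 5), t = 0.2-0.4 s:", np.round(savgol(ph)[:3], 4))
```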

13. An exothermic reaction is taking place in a jacketed continuous stirred tank reactor. The performance of
the reactor depends on the flow rates of the reactant and the coolant. The operator has classified reactor
operation as good or bad based on the yield and energy efficiency of the reactor. Three different linear ML
models are constructed to separate the good and bad operating points. These models are shown in Figure 1
below.
a) Calculate the performance metrics of Model 1, namely precision, recall, and F1 score.
b) For each model, estimate the false positive rate and true positive rate, and plot the Receiver
Operating Characteristic (ROC) curve.
[Figure 1: Three different linear ML models to classify the reactor operation category. Panels: A) Model 1, B) Model 2, C) Model 3]
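Figure 1 did not survive extraction, so the confusion-matrix counts below are hypothetical placeholders, to be replaced with the counts read off each panel (with "good" taken as the positive class). The sketch shows how precision, recall, F1, and the (FPR, TPR) pair are computed; a fixed linear model yields one ROC point, so the three models give three points, joined with (0, 0) and (1, 1) to draw the curve.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (TPR), F1 and FPR from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also the true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "tpr": recall, "fpr": fpr}

# Hypothetical (tp, fp, fn, tn) counts: replace with the values from Figure 1
models = {
    "Model 1": (8, 2, 2, 8),
    "Model 2": (9, 4, 1, 6),
    "Model 3": (6, 1, 4, 9),
}

roc_points = []
for name, counts in models.items():
    m = classification_metrics(*counts)
    roc_points.append((m["fpr"], m["tpr"]))
    print(f"{name}: P={m['precision']:.2f} R={m['recall']:.2f} "
          f"F1={m['f1']:.2f} FPR={m['fpr']:.2f} TPR={m['tpr']:.2f}")

# Each fixed linear model gives one ROC point; add the (0,0) and (1,1) end points
roc_curve = [(0.0, 0.0)] + sorted(roc_points) + [(1.0, 1.0)]
print(roc_curve)
```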

Section D: Advanced Applications (10 Marks Each)

1. In the context of operating a petrochemical distillation column, accurate prediction of the quality of the top
product, such as the purity of the distilled compound, is paramount for optimizing the process. Various
operational parameters like feed flow rate, feed composition, feed temperature, reflux ratio, distillate flow
rate, and temperatures of the top two trays play crucial roles in influencing the distillation process. Principal
Component Analysis (PCA) has been conducted on historical data to understand how these variables impact
the purity of the top product.

The results of the PCA revealed two principal components, PC1 and PC2, that together explain 90% of the variance
in the data: PC1 explains 60% of the variance and PC2 explains 30%. The eigenvectors for PC1 and PC2, whose
coefficients reflect the influence of each parameter on the process dynamics, are as follows:

• Eigenvector for PC1: [0.5 (feed flow rate), 0.3 (feed composition), 0.4 (feed temperature), -0.3 (reflux
ratio), -0.2 (distillate flow rate), 0.6 (top tray temperature), 0.5 (second top tray temperature)].
• Eigenvector for PC2: [-0.2 (feed flow rate), 0.6 (feed composition), -0.1 (feed temperature), 0.5
(reflux ratio), 0.3 (distillate flow rate), -0.4 (top tray temperature), -0.3 (second top tray
temperature)].

Provide your interpretation and analysis for the questions below:

a) Discuss the implications of the variance explained by PC1 and PC2 for distillation column operations
and how they contribute to understanding process dynamics.
b) Analyze operational parameters based on their coefficients in the eigenvectors for PC1 and PC2, with
a focus on the importance of the top two tray temperatures.
c) Design a conceptual model architecture leveraging PC1 and PC2 to predict the purity of the distilled
compound. Outline the steps for constructing this model, from inputting operational parameters to
outputting product purity predictions.
I. Incorporate PCA results into the model architecture and explain the roles of PC1 and PC2 in
predicting product purity.
II. Define the data preprocessing steps required before applying PCA, and explain how the model would handle
new data for real-time or near-real-time predictions of product purity.
d) Propose operational adjustments or monitoring strategies based on PCA analysis to enhance the
purity of the top product, emphasizing how controlling key parameters identified through PCA can
optimize distillation efficiency.
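A hypothetical sketch of the architecture in (c): standardize the seven inputs with training statistics, project them onto the PC1/PC2 loadings, and regress purity on the two scores. The class name and the synthetic training data are illustrative only. As listed, the PC1 loading vector has squared length 1.24 (PC2 has length 1), so both vectors are rescaled to unit length before projection.

```python
import numpy as np

# Loadings from the question, in the order: feed flow, feed composition, feed temp,
# reflux ratio, distillate flow, top tray temp, second top tray temp
W = np.array([
    [0.5, 0.3, 0.4, -0.3, -0.2, 0.6, 0.5],     # PC1
    [-0.2, 0.6, -0.1, 0.5, 0.3, -0.4, -0.3],   # PC2
])
# Rescale each loading vector to unit length
W = W / np.linalg.norm(W, axis=1, keepdims=True)

class PCAPurityModel:
    """Hypothetical pipeline: standardize -> project on PC1/PC2 -> linear regression."""

    def fit(self, X, purity):
        self.mu, self.sigma = X.mean(axis=0), X.std(axis=0)
        scores = self._scores(X)
        A = np.column_stack([np.ones(len(scores)), scores])  # intercept + PC1 + PC2
        self.coef, *_ = np.linalg.lstsq(A, purity, rcond=None)
        return self

    def _scores(self, X):
        # New data reuses the training mean/std, as required for real-time use
        return ((X - self.mu) / self.sigma) @ W.T

    def predict(self, X):
        scores = self._scores(np.atleast_2d(X))
        return self.coef[0] + scores @ self.coef[1:]

# Synthetic historical data (illustrative only) to exercise the pipeline
rng = np.random.default_rng(1)
X_hist = rng.normal(size=(200, 7)) * 2.0 + 10.0
purity_hist = 0.95 + 0.01 * ((X_hist - X_hist.mean(0)) / X_hist.std(0)) @ W[0]

model = PCAPurityModel().fit(X_hist, purity_hist)
print("coefficients [b0, b_PC1, b_PC2]:", np.round(model.coef, 4))
print("prediction for a new operating point:", model.predict(X_hist[0]))
```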
2. In a packed bed column, the pressure drop is measured using a differential pressure gauge while the inlet flow
rate through the column is varied. The measured experimental data are given in Table 3.

Table 3: Pressure drop vs. flow rate across the packed bed column

S.No                  1     2     3     4     5     6     7     8
Flow rate (m³/hr)     2     5     8     10    12    14    16    18
Pressure drop (bar)   0.1   0.5   0.8   1.5   1.9   2     2.1   2.6
a) Perform the PCA analysis and extract one new feature (i.e., the equation for PC1) that can be used to
identify the process performance.
b) Draw a scree plot and use it to show that the extracted feature (PC1) is sufficient to capture the original
process behavior observed in the data.
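A NumPy sketch of (a) and (b), assuming both variables are standardized first because their units differ. With two standardized variables the covariance matrix is the correlation matrix, so the eigenvalues are 1 ± r and PC1 weights both z-scores equally; the explained-variance ratios are the values to plot in the scree plot (for example with matplotlib).

```python
import numpy as np

# Table 3 data
flow = np.array([2, 5, 8, 10, 12, 14, 16, 18], dtype=float)   # m3/hr
dp = np.array([0.1, 0.5, 0.8, 1.5, 1.9, 2.0, 2.1, 2.6])       # bar

# Standardize first because the two variables have different units and scales
X = np.column_stack([flow, dp])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov = np.cov(Z, rowvar=False)          # equals the correlation matrix here
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0] * np.sign(eigvecs[0, 0])   # fix the arbitrary sign
explained = eigvals / eigvals.sum()             # y-values of the scree plot

print(f"PC1 = {pc1[0]:.4f} * z_flow + {pc1[1]:.4f} * z_dp")
print("explained variance ratio (scree):", np.round(explained, 4))
pc1_scores = Z @ pc1                            # the new single feature
```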
3. Explain the methodology to deploy LDA for feature extraction and regression model development.
4. Apply Linear Discriminant Analysis (LDA) on the dataset below to classify two classes and compute the class
separation boundary:
a. Class 1: X = [1, 2, 3], Y = [2, 4, 6]

b. Class 2: X = [4, 5, 6], Y = [8, 10, 12]
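A NumPy sketch of Fisher's LDA, w ∝ S_W⁻¹(m₂ − m₁). All six points lie on the line y = 2x, so the within-class scatter matrix S_W is singular; the sketch assumes the Moore–Penrose pseudo-inverse in its place. This gives w = (0.15, 0.30) and, with the threshold midway between the projected class means, the boundary 0.15x + 0.30y = 2.625, i.e. x + 2y = 17.5.

```python
import numpy as np

class1 = np.array([[1, 2], [2, 4], [3, 6]], dtype=float)
class2 = np.array([[4, 8], [5, 10], [6, 12]], dtype=float)

m1, m2 = class1.mean(axis=0), class2.mean(axis=0)

# Within-class scatter S_W = sum over classes of (x - m_c)(x - m_c)^T
S1 = (class1 - m1).T @ (class1 - m1)
S2 = (class2 - m2).T @ (class2 - m2)
S_W = S1 + S2

# All six points lie on y = 2x, so S_W is singular; use the pseudo-inverse
w = np.linalg.pinv(S_W) @ (m2 - m1)

# Threshold halfway between the projected class means
threshold = 0.5 * (w @ m1 + w @ m2)

def classify(point):
    """Return 1 or 2 depending on which side of the LDA boundary the point falls."""
    return 2 if w @ np.asarray(point, dtype=float) > threshold else 1

print("S_W =", S_W.tolist())
print("w =", np.round(w, 4), "threshold =", round(float(threshold), 4))
print(f"boundary: {w[0]:.4f} x + {w[1]:.4f} y = {threshold:.4f}")
```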

6. Implement Ridge Regression on the dataset below and compute the regression coefficients using L2
regularization:

a. X = [1, 2, 3, 4, 5]

b. Y = [2, 4, 6, 8, 10]

c. Regularization parameter λ = 0.5
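The closed-form ridge solution, w = (XᵀX + λI)⁻¹Xᵀy, sketched in NumPy. The question does not say whether an intercept is fitted, so both readings are shown: without an intercept, and with an unpenalized intercept obtained by centring the data. For comparison, ordinary least squares gives a slope of exactly 2 in both cases; the penalty shrinks it.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)
lam = 0.5

# (a) No intercept: w = (X^T X + lam I)^{-1} X^T y, a scalar here
X = x.reshape(-1, 1)
w_no_int = np.linalg.solve(X.T @ X + lam * np.eye(1), X.T @ y)

# (b) Intercept fitted but not penalized: centre the data, then solve for the slope
xc, yc = x - x.mean(), y - y.mean()
w_int = (xc @ yc) / (xc @ xc + lam)
b_int = y.mean() - w_int * x.mean()

print(f"no intercept:   w = {w_no_int[0]:.4f}")
print(f"with intercept: w = {w_int:.4f}, b = {b_int:.4f}")
```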
