
Problem (1)

1. CNN Network for MNIST Data


In Assignment 2, our objective was to construct a LeNet-5 CNN model tailored to classify the reduced MNIST dataset. After training and testing the model on this dataset, the results are detailed and analyzed in this report. The layers of the CNN architecture are described below:

Figure 1. The architecture of the LeNet-5 CNN network

 Input Data Layer


The initial layer of our CNN is the input data layer, sized at 28x28, which matches the reduced MNIST dataset of handwritten digit images. This dataset includes 10,000 training images (1,000 per digit) and 2,000 testing images (200 per digit).

 Convolutional Layer
The second layer is a convolutional layer with 6 filters of size 5 by 5, which are applied with
padding to maintain the spatial dimensions of the input.

 ReLU Layer
The convolutional layer is followed by a ReLU activation function, which sets negative outputs to zero and introduces non-linearity into the network.

 Average Pooling Layer


Next in the architecture is an average pooling layer with a stride of 2. This layer's purpose is to
downsample the input, reducing its spatial dimensions by a factor of 2 while maintaining the
number of channels.

 Fully Connected Layer


The architecture extends with additional stages where the previous three layers are replicated.
This includes a convolutional layer employing 16 filters of size 5 by 5, followed by a ReLU
activation function, and an average pooling layer with a stride of 2. Subsequently, there are fully
connected layers containing 120, 84, and 10 neurons, respectively. The first two fully connected
layers utilize ReLU activation to rectify negative values.
 Softmax Layer
In contrast to the initial two fully connected layers, the third layer employs softmax activation.
This activation function generates a probability distribution across the ten classes representing
the ten digits, aiding in classifying the images into one of these categories. The classification
layer ultimately assigns a predicted class label to each input image.

To build this network, we used MATLAB with its deep learning extensions to create the layers described above, as shown in the following script and the corresponding layer diagram from the Deep Network Designer app in MATLAB.
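As a reference for the description above, the following is a minimal sketch of how the layer stack could be defined with MATLAB's Deep Learning Toolbox; it is an illustrative reconstruction, not the report's original script, and the layer names and the 'same' padding option are assumptions.

% Minimal LeNet-5-style layer stack for 28x28 grayscale MNIST images.
% Filter counts and sizes follow the description above; names and the
% padding choice are assumptions made for this sketch.
layers = [
    imageInputLayer([28 28 1], 'Name', 'input')

    convolution2dLayer(5, 6, 'Padding', 'same', 'Name', 'conv1')
    reluLayer('Name', 'relu1')
    averagePooling2dLayer(2, 'Stride', 2, 'Name', 'pool1')

    convolution2dLayer(5, 16, 'Padding', 'same', 'Name', 'conv2')
    reluLayer('Name', 'relu2')
    averagePooling2dLayer(2, 'Stride', 2, 'Name', 'pool2')

    fullyConnectedLayer(120, 'Name', 'fc1')
    reluLayer('Name', 'relu3')
    fullyConnectedLayer(84, 'Name', 'fc2')
    reluLayer('Name', 'relu4')
    fullyConnectedLayer(10, 'Name', 'fc3')
    softmaxLayer('Name', 'softmax')
    classificationLayer('Name', 'output')];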

2. CNN Network for MNIST Data with Spatial Attention


In this part of Assignment 3, the aim is to add a spatial attention layer in order to increase the performance of the CNN network.

 Spatial Attention
Spatial attention is a mechanism used in neural networks to selectively focus on specific regions of an input. It identifies and prioritizes important regions or features within the input data and assigns weights to different parts of the input to indicate their significance.

Our network uses the same architecture and parameters as in Assignment 2, but with a spatial attention layer added after each convolutional stage; a hedged sketch of one such block is given below.
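The report does not show the exact attention design, so the following is only one common way such a block could be wired in MATLAB: a convolution produces a sigmoid gating map that rescales the stage output element-wise. The 7x7 kernel, the layer names, and the choice to produce one gate per channel (so the multiplication sizes match without broadcasting) are all assumptions for this sketch.

% Sketch of inserting an attention block after the first convolutional stage.
lgraph = layerGraph([
    imageInputLayer([28 28 1], 'Name', 'input')
    convolution2dLayer(5, 6, 'Padding', 'same', 'Name', 'conv1')
    reluLayer('Name', 'relu1')]);

% Attention branch: conv + sigmoid produces a gating map in [0, 1].
lgraph = addLayers(lgraph, [
    convolution2dLayer(7, 6, 'Padding', 'same', 'Name', 'att_conv')
    sigmoidLayer('Name', 'att_sigmoid')]);

% Element-wise rescaling of the stage output by the gating map.
lgraph = addLayers(lgraph, multiplicationLayer(2, 'Name', 'att_scale'));
lgraph = connectLayers(lgraph, 'relu1', 'att_conv');
lgraph = connectLayers(lgraph, 'relu1', 'att_scale/in1');
lgraph = connectLayers(lgraph, 'att_sigmoid', 'att_scale/in2');
% The second stage (conv2/relu2/pool2, another attention block, and the
% fully connected layers) would be appended to 'att_scale' in the same way.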

 Parameters
For training, we use the SGDM optimization algorithm to update the network weights, with an initial learning rate of 0.01, a maximum of 20 epochs, and a mini-batch size of 200.
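These settings map directly onto MATLAB training options; the sketch below restates them, with the shuffle and plotting options added as assumptions for illustration.

% Training options matching the stated SGDM settings.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.01, ...
    'MaxEpochs', 20, ...
    'MiniBatchSize', 200, ...
    'Shuffle', 'every-epoch', ...        % assumption, not stated in the report
    'Plots', 'training-progress');       % assumption, not stated in the report

% net = trainNetwork(trainImds, layers, options);  % trainImds is a hypothetical image datastore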

 Results

                      Accuracy    Training Time    Testing Time
Without Attention     97.25%      418.46 s         5.13 s
With Attention        98.00%      410.39 s         3.73 s

Figure 2. Results on reduced MNIST data with and without spatial attention


3. CNN Network for Audio Spectrogram Data
The goal outlined in Assignment 2 documentation was to develop a CNN model for classifying
audio utterances representing digits from 0 to 9. Spectrograms of the audio data were generated and
converted into images, which were then fed into the CNN network. This approach was similar to the
reduced MNIST case but with certain adjustments. The network underwent training and testing using
the dataset, and the results obtained will be elaborated upon and analyzed in the subsequent sections
of the report.

 Input Data Layer


The input data layer is the first layer in our CNN, with a size of 292x219 to match the audio spectrogram images of the utterances. The dataset consists of 1,200 training images (120 per digit) and 300 testing images (30 per digit).
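The spectrogram-generation script is not shown here, so the following is a minimal sketch of how one utterance could be turned into a 292x219 spectrogram image in MATLAB; the file names, window length, and overlap are illustrative assumptions (Signal Processing and Image Processing Toolboxes assumed).

% Convert one audio utterance into a spectrogram image for the CNN input.
[x, fs] = audioread('digit_0_sample.wav');          % hypothetical file name
s = spectrogram(x, hamming(256), 128, 256, fs);     % short-time Fourier transform
img = mat2gray(abs(s));                             % magnitude, normalized to [0, 1]
img = imresize(img, [292 219]);                     % match the CNN input layer size
imwrite(img, 'digit_0_sample.png');                 % hypothetical output file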

 Network architecture: apart from the input size, the network is identical to the LeNet-5 architecture described in part 1 above.

4. CNN Network for Audio Data with Spatial Attention


In this part of Assignment 3, the aim is to add a spatial attention layer in order to increase the performance of the CNN network. Our network uses the same architecture and parameters as in Assignment 2, but with a spatial attention layer added after each convolutional stage.

 Parameters
For training, we use the SGDM optimization algorithm to update the network weights, with an initial learning rate of 0.01, a maximum of 20 epochs, and a mini-batch size of 128.

 Results without augmentation

                      Accuracy    Training Time    Testing Time
Without Attention     89.00%      103.4 s          1.61 s
With Attention        90.00%      96.8 s           1.17 s

 Results with noise added to the speech

                      Accuracy    Training Time    Testing Time
Without Attention     81.22%      483.5 s          5.92 s
With Attention        83.56%      265.9 s          2.87 s

 Results with noise added to the spectrogram images

                      Accuracy    Training Time    Testing Time
Without Attention     87.22%      383.7 s          4.07 s
With Attention        90.00%      325.5 s          2.88 s
Problem (2)

English to Arabic Machine Translation: A Comparative Analysis of API Suppliers


Machine Translation from English to Arabic utilizes software and technology to automatically
convert text from English to Arabic. This process involves artificial intelligence and machine
learning to ensure accurate and understandable translations. It serves as a valuable tool for
facilitating cross-cultural communication and knowledge exchange in our modern era.

 Data sources
We collected data from diverse sources to ensure a comprehensive evaluation. The dataset comprised 60 pages: 30 in English and 30 in Arabic. Sources included short English stories, translated English articles, and surahs from the Qur’an. We chose different types of texts to cover a wide range of writing styles and topics.
 Suppliers
1. Google API
2. Microsoft API
3. IBM API
 Comparison methodology
1. Initial Translation:
 Thirty pages of text in English represent the test model.
 Thirty pages of text in Arabic represent the golden model.
 Each supplier's API translates the test model into Arabic.
2. Evaluation Metrics:
 The translated data from each supplier is compared to the reference model to calculate accuracy.
 Metrics such as BLEU score, word accuracy, and sentence accuracy are used for evaluation (a hedged sketch of the BLEU comparison is given after this list).
3. Retranslation and Further Evaluation:
 Additionally, the translated data from each supplier is translated back into English by the same supplier.
 The retranslated data is then compared to the Arabic reference model after it has been translated into English.
 This comparison provides a further measure of translation quality and helps assess any loss of meaning or fidelity during translation.
4. Speed of Translation:
 The speed of translation per file is recorded for each supplier.
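The evaluation script is not included in the report, so the following is only a sketch of how the BLEU comparison for one page could be written in MATLAB, assuming the Text Analytics Toolbox; the file names are hypothetical, and the authors' own word- and sentence-accuracy metrics are not reproduced here.

% BLEU comparison of one translated page against its reference page.
candidateText = extractFileText('google_translation_page01.txt');   % hypothetical file
referenceText = extractFileText('reference_arabic_page01.txt');     % hypothetical file

candidate = tokenizedDocument(candidateText);
reference = tokenizedDocument(referenceText);

score = bleuEvaluationScore(candidate, reference);
fprintf('BLEU score for this page: %.2f\n', score);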

The tables below summarize the results of the comparative analysis:


1- Accuracy comparison of machine translation suppliers

                  Reference model vs. translated data         Retranslated data vs. original data
Supplier          Avg BLEU  Avg word acc.  Avg sent. acc.      Avg BLEU  Avg word acc.  Avg sent. acc.
Google API        0.65      0.46           0.00                0.71      0.66           0.01
Microsoft API     0.70      0.52           0.00                0.75      0.70           0.2
IBM API           0.68      0.48           0.00                0.68      0.48           0.02

2- Speed of translation per file for each API supplier

          Google API         Microsoft API      IBM API
Speed     4.25 s per file    1.10 s per file    4.13 s per file

The figure below shows the simulation results:

Figure 3. Simulation results

Note: The average word and sentence accuracy metrics are calculated using our own method.
 Conclusion:
Based on our evaluation, the Microsoft API emerges as the most accurate and fastest solution for
English to Arabic machine translation.

Part 3
1) Computational power

To match the computational abilities of the human brain with DGX H100 servers, we have to work out how many servers we need. Each DGX H100 server has 8 H100 GPUs, and each GPU has 16,896 CUDA cores, giving 8 * 16,896 = 135,168 cores per server. Treating one CUDA core as the counterpart of one neuron, we need 86,000,000,000 / 135,168 ≈ 636,246 servers to match the brain's power.
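Written as a single expression (the one-core-per-neuron mapping is the report's implicit assumption):

\[
N_{\text{servers}} = \frac{86\times10^{9}\ \text{neurons}}{8 \times 16{,}896\ \text{cores per server}}
= \frac{86\times10^{9}}{135{,}168} \approx 636{,}246
\]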

2) Energy Consumption:

It is hard to express the human brain's cost directly in watts, so we calculate the total running cost of the machine under our assumptions and then estimate the cost of a single human brain from other factors, such as diet.

Assumed average electricity cost: ~$0.3 per kWh

a) DGX H100 units


 The power consumption of each DGX H100 unit is calculated by multiplying the power consumption of one H100 GPU (400 W) by the number of GPUs per server (8), giving 3,200 W per server. With 636,246 servers required to match the computational power of the human brain, the total power draw is 2,035,987.2 kW. Assuming continuous operation (8,760 hours per year), this translates to an annual energy consumption of about 1.783 x 10^10 kWh.
 At the assumed cost of $0.3 per kWh, the annual power cost for the DGX H100 units amounts to about 5.35 billion USD.
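The same estimate as a compact calculation (continuous operation is an assumption):

\[
P = 400\ \text{W} \times 8 \times 636{,}246 = 2{,}035{,}987{,}200\ \text{W} \approx 2{,}035{,}987.2\ \text{kW}
\]
\[
E_{\text{annual}} = 2{,}035{,}987.2\ \text{kW} \times 8{,}760\ \text{h} \approx 1.783\times10^{10}\ \text{kWh},
\qquad \text{Cost} \approx 1.783\times10^{10} \times \$0.3/\text{kWh} \approx \$5.35\ \text{billion per year}
\]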

Assumption: the annual food cost of a single human brain is about $435/year.

b) Air conditioning
Server dimensions: height (H) 356 mm, width (W) 482.2 mm, length (L) 897.1 mm.
Each server occupies a footprint of approximately W * L = 0.4326 m²; we round this up and assume each server occupies approximately 0.6 m².
So, system floor area = 0.6 * 636,246 = 381,747.6 m² (servers only).
One server volume = 356 * 482.2 * 897.1 * 10^-9 = 0.154 m³.
System volume = 0.154 m³ * 636,246 ≈ 97,981 m³ (all servers).
Taking a total floor area of 300,000 m² to accommodate the servers and the necessary infrastructure, we assume a conservative estimate of 12 watts per square meter (W/m²) for air conditioning.
• Air-conditioning power consumption for the whole building = 300,000 * 12 = 3,600,000 W = 3,600 kW
• Annual air-conditioning energy consumption for the whole building = 3,600 * 365 * 24 = 31,536,000 kWh/year
• Annual air-conditioning cost (at $0.3/kWh) ≈ $9.47 million/year

c) Lighting and other electricity uses

We assume a conservative estimate of 7 watts per square meter (W/m²) for lighting and other electrical purposes.

• Power consumption for lighting and other electricity uses = 500,000 m² * 7 W/m² = 3,500 kW

• Annual power consumption for lighting and other electricity uses = 3,500 kW * 24 hours/day * 365 days/year = 30,660,000 kWh/year

• At a cost of $0.4 per kWh, the annual cost for lighting and other electricity uses would be $12,264,000.

3) The weight of the human brain versus the artificial brain, taking into account:

a) DGX H100 unit’s weight


One DGX H100 unit weighs 130.45 kg according to the data sheet, and 636,246 units are needed, so the total weight is 636,246 * 130.45 kg = 82,998,290.7 kg ≈ 82,998.3 tons.

When the system packaged weight is considered, each server adds about 40 kg, so the total becomes 82,998,290.7 kg + 40 * 636,246 kg = 108,448,130.7 kg ≈ 108,448 tons.

This compares with the human brain, which weighs just about 1.35 kg.
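The same estimate in one line (packaged weight of 40 kg per server included):

\[
W = 636{,}246 \times (130.45 + 40)\ \text{kg} = 108{,}448{,}130.7\ \text{kg} \approx 108{,}448\ \text{tons}
\]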

b) Air-conditioning units’ weight


The cooling equipment typically used in data centers to cool high-density computing systems is the computer room air conditioning (CRAC) unit. A typical CRAC unit for a large data center can weigh 1,000 kg or more.

Since the total DGX H100 power consumption is 3,200 W * 636,246 = 2,035,987,200 W ≈ 2,035,987.2 kW, and assuming a typical CRAC efficiency rating of 0.8, the heat load generated by the computing equipment is taken as approximately 2,035,987.2 kW * 0.2 = 407,197.44 kW.

Assuming a standard cooling capacity of 180-240 kilowatts per CRAC unit, cooling a heat load of 407,197.44 kW would require approximately 1,697 CRAC units.

The weight of each CRAC unit can range from 300 to 1,500 kg, so on average about 1 ton per unit.

So the total weight of the air conditioning is about 1,697 tons.
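Restated compactly, using the report's 0.2 heat-load factor and the 240 kW upper cooling capacity:

\[
N_{\text{CRAC}} \approx \frac{2{,}035{,}987.2\ \text{kW} \times 0.2}{240\ \text{kW per unit}} \approx 1{,}697,
\qquad W_{\text{CRAC}} \approx 1{,}697 \times 1\ \text{ton} = 1{,}697\ \text{tons}
\]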

c) Weight of other equipment, such as racks

To estimate the weight of additional components like racks and power units in the data center, we assume a standard layout of around 1.5 kg/m² of floor space. Therefore, for a data center spanning 500,000 m², the total weight of these components would amount to approximately 750,000 kg (about 750 tons).

d) Building weight

Assuming an 8,000 m² floor area for estimating the average floor weight, and a concrete floor weighing about 75 kg/m², the floor weighs around 8,000 * 75 = 600,000 kg.
Machine weight = 108,448.13 tons + 60 tons + 1,900 tons + 1.5 tons ≈ 110,409 tons
Brain weight w.r.t. machine = 1.35 kg / 110,409 tons, which is negligible.

The brain's weight compared with the machine's is an incredible reduction!

4) Total system cost, given that hardware is reported to represent only 17% of the complete system cost

Cost of 1 DGX H100 unit = $482,000 (Wikipedia)

Total cost of 636,246 DGX H100 units = 636,246 * $482,000 ≈ $306.67 B

Total cost of the complete IT system = $306.67 B / 0.17 ≈ $1,804 B
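As a compact calculation:

\[
C_{\text{hardware}} = 636{,}246 \times \$482{,}000 \approx \$306.67\ \text{B},
\qquad C_{\text{total}} = \frac{\$306.67\ \text{B}}{0.17} \approx \$1{,}804\ \text{B}
\]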


5) Other crucial factors to consider in the comparison

a) The interconnected neurons of the human brain versus the parallel architecture of GPUs.
b) The brain's capacity for adaptation and rewiring versus the fixed architecture of GPUs.
c) The lifelong learning ability of the brain versus the potential need to retrain or fine-tune GPU-based models for new tasks.
d) When comparing the cost of running the machine to that of the human brain, considerations such as lifelong air conditioning and other necessities for optimal brain function must be taken into account. Humans typically work around 8 hours a day, which should be factored into the analysis.
e) With the average global cost-of-living index estimated at 51.43 and an average monthly income of $3,000, the annual cost of living is approximately $18,000. Furthermore, taking into account the cost of the machine itself once it becomes inoperative, and assuming the DGX H100 is utilized continuously at its maximum operating conditions for 8 hours daily with an average lifespan of 5 years, each unit's average cost in 2023 would be $482,000; the total projected cost for each machine over 5 years would therefore amount to roughly $2,410,000. In comparison, given an average human lifespan of approximately 73 years with only 50 years dedicated to work, the estimated cost rises to around $900,000.
 Summary table

Number of servers                             636,246 servers
Total server volume                           97,981.28 m^3 (all servers)
Total machine weight                          110,409 tons
Annual energy consumption                     1.79 * 10^10 kWh
Annual cost (brain food vs. machine energy)   ~$435 vs. ~$1.79 * 10^9 USD
Total hardware cost                           About $308 B
Problem (5)
Human Brain:

Number of Neurons: Approximately 86 billion neurons.

Each neuron is connected to about 10,000 other cells.

Total parameters: 86 billion neurons * 10,000 = 860 trillion parameters.

ChatGPT base model (GPT-3):

Number of Trainable Parameters: 175 billion parameters.

Comparison:

Ratio = 860 trillion / 175 billion ≈ 4,914
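Written out:

\[
86\times10^{9}\ \text{neurons} \times 10{,}000\ \text{connections per neuron} = 8.6\times10^{14}\ \text{parameters},
\qquad \frac{8.6\times10^{14}}{1.75\times10^{11}} \approx 4{,}914
\]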

The human brain thus has roughly 4,914 times as many parameters as GPT-3, which underscores its remarkable power. While the human brain can perform trillions of computations per second, ChatGPT runs on digital processors that are considerably less capable in this respect.

Major advantages of the human brain:

1. Generalization: The human brain excels at generalizing knowledge and skills across different
domains. Humans can apply knowledge learned in one context to solve problems in entirely
different situations. Neural networks often struggle with generalization, requiring extensive
training data and fine-tuning to perform well across diverse scenarios.
2. Adaptability: The brain is incredibly adaptable, capable of learning new skills and adapting to
changes in the environment quickly. Neural networks typically require retraining or significant
adjustments to adapt to new tasks or environments.
3. Robustness to Noise and Variability: The human brain is remarkably robust to noisy or
incomplete data, as well as variations in input. Humans can understand speech in noisy
environments, recognize objects from partial views, and interpret ambiguous or incomplete
information. Neural networks can be sensitive to noise and variations outside their training data
distribution.
4. Efficient Learning from Few Examples: Humans can learn new concepts or skills from just a
few examples or even a single instance, thanks to our ability to abstract and generalize
knowledge. Neural networks often require large amounts of labeled data to learn effectively,
making them less efficient in learning from limited examples.
5. Incorporation of Context and Prior Knowledge: The human brain integrates contextual
information and prior knowledge into decision-making and problem-solving processes. This
allows humans to make informed decisions based on past experiences and knowledge of the
world, even in novel situations. While neural networks can incorporate some forms of context,
they often lack the rich, hierarchical representations of knowledge that humans possess.
6. Emotional and Social Intelligence: Humans possess emotional and social intelligence, allowing
us to understand and navigate complex social interactions, empathize with others, and express
emotions through language and behavior. Neural networks lack true emotional understanding and
social intelligence, limiting their ability to interact meaningfully in social contexts.
