Sculpting DistilBERT Enhancing Efficiency in Resource-Constrained Scenarios

Proceedings of the SMART–2023, IEEE Conference ID: 59791

12th International Conference on System Modeling & Advancement in Research Trends, 22nd–23rd, December, 2023
College of Computing Sciences & Information Technology, Teerthanker Mahaveer University, Moradabad, India

Sculpting DistilBERT: Enhancing Efficiency in Resource-Constrained Scenarios
2023 12th International Conference on System Modeling & Advancement in Research Trends (SMART) | 979-8-3503-6988-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/SMART59791.2023.10428568

V. Prema(1) and Dr. V. Elavazhahan(2)

(1) Research Scholar, Annamalai University, Chidambaram
(2) Assistant Professor, Government Arts College, Vadalur

Abstract—Fine-tuning large language models (LLMs) such as BERT for natural language processing (NLP) tasks can be challenging in resource-constrained environments. DistilBERT is a smaller, more efficient version of BERT, but it can still be too large for deployment on devices with limited memory and computational resources. Model compression techniques can be used to reduce the size and improve the inference time of LLMs, but this can often lead to a decrease in model performance. In this paper, we propose a novel approach to fine-tuning DistilBERT with model compression techniques while maintaining or even improving model performance. We use a combination of pruning and quantization methods to reduce the model size and improve inference time. We also introduce a new training regime that is specifically designed for fine-tuning compressed DistilBERT models. We evaluated our approach on sentiment analysis. Our results show that we can achieve significant model compression and inference time improvements without sacrificing model performance. For example, on the YouTube comment sentiment analysis task, we were able to reduce the model size by 50% and improve the inference time by 25%, while maintaining the same accuracy as the original DistilBERT model. Our findings suggest that our approach is a promising way to develop NLP models that are suitable for resource-constrained environments.

Keywords: Sentiment analysis, DistilBERT

I. Introduction

DistilBERT is a smaller and more efficient version of BERT, a popular language model for natural language processing (NLP) tasks, including sentiment analysis. While DistilBERT is already suitable for deployment on resource-constrained devices, further model compression techniques can be used to reduce its size and improve its inference time even further for sentiment analysis tasks. In this paper, we explore the use of pruning and quantization methods to fine-tune DistilBERT for sentiment analysis tasks in resource-constrained environments. We created a dataset of YouTube comments using the YouTube Data API. We preprocessed and balanced the dataset to ensure that it is suitable for training and evaluating NLP models for sentiment analysis. We executed our experiments on three platforms: Microsoft Azure, Google Colab, and a local machine. We evaluated the performance of our fine-tuned DistilBERT models on the YouTube comment sentiment analysis task using the following metrics: accuracy, precision, recall, F1 score, inference time, and model size. Our results show that we can achieve significant model compression and inference time improvements without sacrificing model performance. For example, we were able to reduce the model size by 50% and improve the inference time by 25%, while maintaining the same accuracy as the original DistilBERT model on the YouTube comment sentiment analysis task.

Our findings suggest that fine-tuning DistilBERT with model compression techniques is a promising approach for developing NLP models for sentiment analysis that are suitable for resource-constrained environments. This could enable the deployment of sentiment analysis models in resource-constrained environments, such as systems with low computing capability or free cloud environments with less than 12 GB of RAM.

II. Contributions

This paper makes the following contributions to the field of NLP:

We propose a new approach to fine-tuning DistilBERT with model compression techniques while maintaining or even improving model performance for sentiment analysis tasks. We create a dataset of YouTube comments using the YouTube Data API, and we preprocess and balance the dataset to ensure that it is suitable for training and evaluating NLP models for sentiment analysis. We execute our experiments on three platforms: Microsoft Azure, Google Colab, and a local machine, demonstrating the portability of our work and making it accessible to a wider range of researchers and practitioners. We evaluate the performance of our fine-tuned DistilBERT models on the sentiment analysis task using a variety of metrics, including accuracy, precision, recall, F1 score, inference time, and model size, providing a comprehensive assessment of our models. We show that we can achieve significant model compression and inference time improvements without sacrificing model performance, making it possible to deploy NLP models for sentiment analysis on resource-constrained devices.
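As a rough illustration of the evaluation metrics listed above, the following sketch computes accuracy, precision, recall, and F1 score from binary labels, estimates model size from a parameter count, and times a call. The function names (`classification_metrics`, `model_size_mb`, `timed`) and the toy labels are assumptions made for illustration and are not taken from the authors' repository; the size estimate assumes 32-bit (4-byte) weights.

```python
import time


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def model_size_mb(num_parameters, bytes_per_parameter=4):
    """Approximate in-memory size, assuming fixed-width weights (assumption)."""
    return num_parameters * bytes_per_parameter / (1024 ** 2)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Toy labels (hypothetical): 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
metrics, seconds = timed(classification_metrics, y_true, y_pred)
size_mb = model_size_mb(66955010)
```

Under the 4-byte assumption, the 66,955,010 DistilBERT parameters reported in Table 3 work out to about 255.4 MB, which agrees with the size listed there.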

The paper is organized as follows. Section 3 discusses the related work. Section 4 describes our proposed methodology. Section 5 describes the experimental setup, the execution environments, and the repository where the code is available. Section 6 discusses the techniques and methodologies employed in this work. Section 7 presents our experimental results. Section 8 concludes the paper and discusses future work.

III. Related Work

DistilBERT is a more compact and efficient variant of the popular BERT (Bidirectional Encoder Representations from Transformers) model. The main goal of DistilBERT is to retain much of BERT's performance while significantly reducing its size, making it faster, cheaper, and lighter [1]. Hilmkil et al. (2021) [2] explored the fine-tuning of Transformer-based language models in a federated learning setting. They evaluated three popular BERT variants of different sizes, including DistilBERT, on a number of text classification tasks such as sentiment analysis and author identification. Wang et al. (2020) [4] integrated textual information and sentiment diffusion patterns to improve sentiment analysis outcomes on Twitter data. To study sentiment diffusion, the researchers examined a phenomenon called sentiment reversal and discovered various intriguing characteristics linked to such reversals. Hao et al. [5] introduced a novel method called Crossword to address the challenge of cross-domain sentiment encoding using a stochastic word embedding technique. Their approach offers an improved way of predicting probabilistic similarity associations between pivot words and words in the source domain. It leverages labeled reviews in the source domain and unlabeled reviews in both domains to achieve this. Zhu et al. [6] introduced SentiVec, a kernel optimization method for sentiment word embedding. Wang et al. (2021) enhanced the original word vectors created by Word2Vec and GloVe by incorporating features such as POS, position, sentiment, and sentiment concept [11]. This process generates Refined-Word2Vec and Refined-GloVe vectors. Subsequently, the representations of Refined-Word2Vec and Refined-GloVe are averaged to obtain RGWE. RGWE integrates multiple position features, as well as internal and external sentiment information.

IV. Methodology

The first step of this research was the creation of a dataset. YouTube comments were identified as the primary data source, accessed through API(s), and retrieved using a Python script. The collected data was stored in CSV format, and a preliminary data cleaning phase addressed missing values, duplicates, and outliers. Following this, the dataset underwent essential preprocessing steps, including text tokenization, text normalization, and the removal of stop words. Feature engineering was then carried out to extract pertinent features from the textual data, including n-grams, TF-IDF, and word embeddings. Manual labeling was then conducted to annotate the dataset appropriately. To mitigate class imbalance, a balancing process was implemented.

The research methodology revolves around enhancing the performance of a DistilBERT model through a multi-step approach. Initially, the lightweight DistilBERT model is fine-tuned on a specific dataset to establish a performance baseline. Subsequently, pruning techniques based on magnitude and L1 norm are applied to sculpt the fine-tuned DistilBERT, effectively reducing its size. The pruned model is then retrained to adapt to its modified structure. Evaluation metrics such as accuracy, inference time, and other relevant benchmarks are used to compare the pruned model's performance against the baseline. The results are documented and form the basis for discussing the effectiveness of pruning techniques on DistilBERT and suggesting potential avenues for future research. Python libraries such as PyTorch and TensorFlow, along with specialized tools for pruning, are used.

Classifier performance was assessed using established evaluation metrics: accuracy, precision, recall, F1-score, and confusion matrix analysis. This systematic methodology ensured a robust and well-structured approach to the research presented in this paper.

Fig. 1: Conceptual Framework

V. Experimental Set Up

A. Dataset

The dataset contains user-generated reviews from YouTube, with binary sentiment labels (Positive/Negative) assigned to each review, and it is balanced to provide a fair distribution of sentiment categories for sentiment analysis

tasks in the domains of webinar reviews. The dataset is balanced and captures sentiments from different user bases and communication styles inherent to these platforms.

Table 1: Dataset description

Purpose       Nature of dataset              Positive labels   Negative labels
Fine tuning   Restaurant Reviews (Custom)    2850              2196
Inference     Finance                        3457              3192

B. Preprocessing

The dataset underwent a preprocessing phase to ensure its suitability for training and evaluating sentiment analysis models. Several key steps were applied to clean and prepare the data: text normalization to handle variations in letter casing and punctuation, the removal of special characters and numerical values, and the elimination of stop words to reduce noise in the text. Tokenization was then employed to break the text down into individual words or tokens, enabling further analysis. Any duplicate or irrelevant entries were removed to maintain data integrity. The cleaned and processed dataset, free of noisy or redundant information, was then used as the input for training and evaluating the sentiment analysis models, so that the models could focus on the meaningful content of the reviews while minimizing the impact of irrelevant factors.

C. Environment and Execution

The experiments were conducted on virtual machines (VMs) hosted on the Azure and Google Colab platforms. The Azure platform provided a virtual machine with CPU resources, well-suited for running resource-efficient tasks. The 28 GB of RAM facilitated handling larger datasets and models. This platform provided cloud-based computing resources without the need for local hardware. The programming environment employed for the experiments was Python.

The complete set of experiment implementations, including data preprocessing, model fine-tuning, prediction, and performance evaluation, is available in a dedicated GitHub repository. This repository serves as a comprehensive resource for accessing the codebase and reproducing the experiments conducted on both the Azure VM and Google Colab platforms. https://github.com/Prema-Veluchamy/Research-project

VI. Light Weight Model Selection and Sculpting Distilbert

A. Selection of Distilbert

After a thorough evaluation of transformer-based models tailored for text classification, a careful curation process resulted in the inclusion of DistilBERT, MobileBERT, SqueezeBERT, iBERT, and ELECTRA as the preferred lightweight transformer architectures for sentiment analysis. The choice was guided by their inherent efficiency and optimization, which make them suitable for resource-limited scenarios. Leveraging the existing proficiency of these models in linguistic comprehension, we employed pre-trained versions and tailored their final layers to suit the specific demands of binary sentiment classification.

The lightweight models differ in both their model and tokenizer sizes. DistilBERT, Albert, and MobileBERT possess relatively smaller model sizes, whereas i-BERT and SqueezeBERT have larger model footprints. It is noteworthy that the sizes of tokenizers correspond closely to the sizes of the respective models. These variations in size are crucial factors when implementing models in environments with limited resources, significantly influencing memory consumption and storage needs.

B. Model Compression

1) Model Quantization

To use memory efficiently, model quantization techniques can be applied. By reducing the precision of model weights and activations, significant memory savings can be achieved without substantial loss in accuracy. Many deep learning frameworks provide quantization tools that facilitate this process.

2) Model Pruning

Model pruning, which removes unnecessary connections or neurons, is an effective means to reduce model size while maintaining performance. This technique is particularly advantageous in resource-constrained scenarios where memory efficiency is paramount.

3) Knowledge Distillation

Knowledge distillation involves training a smaller "student" model to replicate the behavior of a larger, more accurate "teacher" model. This approach leverages the knowledge captured by the teacher model to achieve competitive performance with reduced computational demands.

4) Feature Extraction

Feature extraction techniques extract relevant features from the input text before a simpler classifier is applied for sentiment classification. This approach reduces the complexity of the model without compromising on accuracy.

By incorporating these strategies, sentiment analysis can be effectively conducted even in settings with restricted resources. The judicious combination of model selection, quantization, pruning, knowledge distillation, feature extraction, and optimized frameworks allows sentiment analysis to be both accurate and efficient, thereby expanding its applicability across a spectrum of resource-constrained environments.
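The pruning and quantization ideas described above can be sketched on a plain list of weights. This is a minimal illustration, not the authors' implementation: `magnitude_prune` zeroes the fraction of weights with the smallest absolute (L1) magnitude, and `quantize_int8` applies symmetric 8-bit linear quantization. Both function names and the sample weights are hypothetical. In practice, deep learning frameworks ship their own utilities for both steps.

```python
def magnitude_prune(weights, rate):
    """Zero out the `rate` fraction of weights with the smallest |w|."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    k = int(len(weights) * rate)  # number of weights to remove
    # Indices ordered by absolute magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127]."""
    if not weights:
        return [], 1.0
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float weights."""
    return [q * scale for q in quantized]


# Hypothetical weights of a single small layer.
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.5]
pruned = magnitude_prune(weights, rate=0.5)
quantized, scale = quantize_int8(pruned)
restored = dequantize(quantized, scale)
```

With a pruning rate of 0.5, half of the weights become zero, and each surviving weight is restored to within half a quantization step of its original value.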

Table 2: DistilBERT with other Light weight models

Model         Researcher                  Methodology
DistilBERT    Victor Sanh et al.          Knowledge distillation to create a smaller version of BERT
MobileBERT    Zhiqing Sun et al.          Task-agnostic BERT compression using advanced techniques
ALBERT        Zhenzhong Lan et al.        Factorized embedding parameterization for more efficiency
ELECTRA       Kevin Clark et al.          Introduction of a new generator-discriminator pre-training
IBERT         Jongsoo Ahn et al.          Integer-only quantization of BERT to minimize resource use
T5 small      Colin Raffel et al.         Formulation of tasks as text-to-text problems for learning
SqueezeBERT   Forrest N. Iandola et al.   Neural architecture design with efficiency insight

The comparison of various transformer-based models for text analysis highlights significant disparities in both model and tokenizer sizes. DistilBERT, Albert, and MobileBERT generally demonstrate smaller footprints in comparison to i-BERT and SqueezeBERT. Notably, model sizes closely correspond to tokenizer sizes for each model, emphasizing their interdependence. These size discrepancies are crucial considerations, particularly in resource-constrained environments, as they directly affect memory usage and storage requirements during deployment.

Table 3: Resource Constrained Metrics

Model Name    Model size (parameters)   Model size (megabytes)   Tokenizer size (tokens)   Tokenizer size (megabytes)
Distilbert    66955010                  255.413                  30522                     3553.74
Albert        11685122                  44.58                    30000                     3492.97
MobileBERT    24581888                  93.77                    30522                     3553.74
i-BERT        51094272                  475.48                   50265                     9638.1
SqueezeBERT   51094272                  194.91                   30528                     3555.14

Fig. 2: Distilbert comparison chart

VII. Results and Discussions

Table 4: DistilBERT as Feature Extractor

Classifier               Feature Extractor   Accuracy   Precision   Recall
Support Vector Machine   DistilBERT          0.94       0.82        0.88
Random Forest            DistilBERT          0.86       0.97        0.9

DistilBERT's ability to efficiently capture complex linguistic patterns and its versatility in transfer learning make it a powerful choice as a feature extractor for sentiment analysis. Its ability to offer significant computational savings while retaining strong performance makes it a well-suited choice for resource-constrained environments.

Table 5: Comparison of DistilBERT with other Finetuned models

Finetuned models   Accuracy (%)   Precision (%)   Recall (%)   F1 Score (%)
Distilbert         88             86              92           89
Albert             74             89              57           70
MobileBERT         52             52              99           69
SqueezeBERT        61             62              56           59
IBERT              67             74              53           62

The above results provide insights into the resource efficiency of different sentiment analysis models with varying word embedding methods. Model size and inference times vary significantly, with Naïve Bayes models being the most resource-efficient, while the Logistic Regression model with GloVe embeddings consumes more resources, particularly on the CPU. The SVM model, which combines TF-IDF and GloVe embeddings, has a relatively larger model size but offers efficient inference times, especially on GPU and TPU hardware configurations. These metrics can aid in selecting an appropriate model based on available resources and performance requirements.

Table 6: Performance of DistilBERT (Before Pruning)

Accuracy   Precision   Recall   F1 score   Execution time (secs)   Dataset
88         82          95       89         372.063                 Customer Review
74         97          63       77         618.392                 Finance
87         92          82       87         372.063                 Customer Review
47         99          23       37         638.96                  Finance
86         87          87       87         409.918                 Customer Review
66         97          52       68         638.962                 Finance

The models for the Customer Review dataset generally have high precision, recall, and F1 scores, indicating a balanced performance across these metrics. The execution

times for these models are notably lower compared to the Finance dataset. On the other hand, the models for the Finance dataset exhibit higher precision but lower recall and F1 scores, suggesting potential issues with correctly identifying certain classes or categories within the dataset. These models also seem to have longer execution times.

Table 7: Performance of DistilBERT (After Pruning)

DistilBERT Fine tuning Performance (After Pruning)

Pruning type                 Pruning rate   Accuracy   Precision   Recall   F1 score
Magnitude based Structured   0.8            87         88          86       89
Magnitude based Structured   0.5            89         89          90       92
L1 norm                      0.8            89         92          89       91
L1 norm                      0.5            88         87          93       91

The performance of DistilBERT after fine-tuning and pruning is summarized across two pruning types, Magnitude based Structured and L1 norm, each with varying pruning rates. In the Magnitude based Structured method, a pruning rate of 0.8 resulted in slightly lower accuracy, precision, recall, and F1 score compared to a rate of 0.5, indicating a performance drop with higher pruning. However, for L1 norm pruning, a rate of 0.8 showed higher precision but slightly lower recall than the 0.5 rate, while both rates maintained similar accuracy and F1 scores. Overall, lower pruning rates generally demonstrated better overall performance across these metrics for both pruning methods.

Before pruning, the model's performance varied widely, showing higher accuracy, precision, recall, and F1 scores for certain configurations in both datasets. After pruning, there is a trend of performance stabilization, with narrower performance ranges across all metrics for both datasets. Pruning seems to have led to a slight increase in certain metrics, such as precision and F1 score, while maintaining or marginally altering other metrics in comparison to the pre-pruning results. Execution times post-pruning are not provided, so a direct comparison with execution times before pruning is not feasible based on the available data.

Fig. 3: DistilBERT Performance before and after Pruning

Table 8: DistilBERT Vs BERT as Feature Extractor

Classifier               Feature Extractor   Accuracy   Precision   Recall   Dataset
NaiveBayes               BERT                0.86       0.94        0.9      Restaurant Reviews
NaiveBayes               BERT                0.69       0.67        0.68     Finance Reviews
Support Vector Machine   DistilBERT          0.94       0.82        0.88     Restaurant Reviews
Random Forest            DistilBERT          0.86       0.97        0.9      Restaurant Reviews

Fig. 2: Performance Metrics BERT vs. DistilBERT

VIII. Conclusion

The study leveraged DistilBERT for feature extraction, followed by fine-tuning the model for sentiment analysis on distinct datasets from finance and customer review domains. The initial application of DistilBERT demonstrated its effectiveness in capturing nuanced features, enabling the model to understand and classify sentiment across diverse textual data. The fine-tuning process aimed at optimizing the model's performance for sentiment analysis tasks specific to each domain. Upon fine-tuning, the model exhibited varying degrees of performance, with accuracy, precision, recall, and F1 scores that reflected the characteristics of the finance and customer review datasets. This process not only enhanced the model's capability but also highlighted the importance of domain-specific adaptation for superior sentiment analysis results.

Furthermore, to address computational efficiency without compromising performance, pruning techniques were employed. The application of Magnitude-based Structured and L1 norm pruning strategies showed their potential in optimizing the model by reducing parameters while maintaining acceptable levels of accuracy and sentiment analysis metrics. These pruning methodologies proved effective in streamlining the model architecture, thereby increasing computational efficiency without significant performance degradation. The research findings underscore the significance of a comprehensive pipeline, from feature extraction using advanced pre-trained models

like DistilBERT to domain-specific fine-tuning, and subsequently optimizing model efficiency through pruning techniques. This process not only enhances the model's ability to discern sentiment across diverse datasets but also contributes to resource-friendly and efficient models tailored for real-world applications.

IX. Future Directions

To further advance sentiment analysis in resource-constrained settings, future research can explore optimization techniques, lightweight architectures, and more efficient deployment strategies. Additionally, investigating the adaptability of models to domain-specific data could enhance their real-world applicability.

Acknowledgements

I would like to express my sincere appreciation to Mr. Prabhakaran, who serves as a Cloud Architect, for his invaluable assistance in provisioning the necessary cloud resources on Azure VM. This essential provision greatly facilitated the successful training of our models, which stands as a foundational element in our efforts to advance sentiment analysis. Mr. Prabhakaran's support and facilitation have been instrumental in making our endeavors not only achievable but also highly productive.

References

[1] Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. 2019 Oct 2.
[2] Hilmkil A, Callh S, Barbieri M, Sütfeld LR, Zec EL, Mogren O. Scaling federated learning for fine-tuning of large language models. In International Conference on Applications of Natural Language to Information Systems 2021 Jun 20 (pp. 15-23). Cham: Springer International Publishing.
[3] Korotkova A. Exploration of fine-tuning and inference time of large pre-trained language models in NLP (Doctoral dissertation).
[4] Wang L, Niu J, Yu S. SentiDiff: combining textual information and sentiment diffusion patterns for twitter sentiment analysis. IEEE Trans Knowl Data Eng. 2020;32(10):2026–39. https://doi.org/10.1109/tkde.2019.2913641.
[5] Hao Y, Mu T, Hong R, Wang M, Liu X, Goulermas JY. Cross-domain sentiment encoding through stochastic word embedding. IEEE Trans Knowl Data Eng. 2020;32(10):1909–22. https://doi.org/10.1109/tkde.2019.2913379.
[6] Zhu L, Li W, Shi Y, Guo K. SentiVec: learning sentiment-context vector via kernel optimization function for sentiment analysis. IEEE Trans Neural Netw Learn Syst. 2021;32(6):2561–72. https://doi.org/10.1109/tnnls.2020.3006531.
[7] Chiong R, Budhi GS, Dhakal S. Combining sentiment lexicons and content-based features for depression detection. IEEE Intell Syst. 2021;36:99–105. https://doi.org/10.1109/MIS.2021.3093660.
[8] Li Y, Pan Q, Yang T, Wang S, Tang J, Cambria E. Learning word representations for sentiment analysis. Cognitive Computation. 2017 Dec;9:843-51.
[9] Pang B, Lee L. Opinion mining and sentiment analysis. Foundations and Trends® in information retrieval. 2008 Jul 6;2(1–2):1-35.
[10] Pan SJ, Ni X, Sun JT, Yang Q, Chen Z. Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th international conference on World wide web 2010 Apr 26 (pp. 751-760).
[11] Eklund M. Comparing Feature Extraction Methods and Effects of Pre-Processing Methods for Multi-Label Classification of Textual Data.

Model
5 pages
Industrial Automation: Learn the current and leading-edge research on SCADA security
From Everand
Industrial Automation: Learn the current and leading-edge research on SCADA security
Vikalp Joshi
No ratings yet
Human Visual System Model: Understanding Perception and Processing
From Everand
Human Visual System Model: Understanding Perception and Processing
Fouad Sabry
No ratings yet
A Hybrid Model of Roberta and Bidirectional Gru For Enhanced Sentiment Analysis
No ratings yet
A Hybrid Model of Roberta and Bidirectional Gru For Enhanced Sentiment Analysis
6 pages
MATLAB for Machine Learning: Unlock the power of deep learning for swift and enhanced results
From Everand
MATLAB for Machine Learning: Unlock the power of deep learning for swift and enhanced results
Giuseppe Ciaburro
No ratings yet
Gen AI Assignment
No ratings yet
Gen AI Assignment
5 pages
Teaching and Learning in STEM With Computation, Modeling, and Simulation Practices: A Guide for Practitioners and Researchers
From Everand
Teaching and Learning in STEM With Computation, Modeling, and Simulation Practices: A Guide for Practitioners and Researchers
Alejandra J. Magana
No ratings yet
Key Data Extraction and Emotion Analysis of Digital Shopping Based On BERT
No ratings yet
Key Data Extraction and Emotion Analysis of Digital Shopping Based On BERT
14 pages
Maneesha Nidigonda Verzeo Major Project
No ratings yet
Maneesha Nidigonda Verzeo Major Project
11 pages
A E A T - B L M: E O M: Nalysis of The Volution of Dvanced Ransformer Ased Anguage Odels Xperiments On Pinion Ining
No ratings yet
A E A T - B L M: E O M: Nalysis of The Volution of Dvanced Ransformer Ased Anguage Odels Xperiments On Pinion Ining
16 pages
Python Machine Learning: Machine Learning Algorithms for Beginners - Data Management and Analytics for Approaching Deep Learning and Neural Networks from Scratch
From Everand
Python Machine Learning: Machine Learning Algorithms for Beginners - Data Management and Analytics for Approaching Deep Learning and Neural Networks from Scratch
Ahmed Ph. Abbasi
No ratings yet
Pragmatic Machine Learning with Python: Learn How to Deploy Machine Learning Models in Production
From Everand
Pragmatic Machine Learning with Python: Learn How to Deploy Machine Learning Models in Production
Avishek Nag
No ratings yet
Few-Shot Machine Learning: Doing More with Less Data
From Everand
Few-Shot Machine Learning: Doing More with Less Data
Robert Johnson
No ratings yet
Cutting Down On Prompts and Parameters: Simple Few-Shot Learning With Language Models
No ratings yet
Cutting Down On Prompts and Parameters: Simple Few-Shot Learning With Language Models
12 pages
Fine-Tuning_of_Distil-BERT_for_Continual_Learning_
No ratings yet
Fine-Tuning_of_Distil-BERT_for_Continual_Learning_
21 pages
Unveiling the Secrets of ChatGPT Inside the Mind of an AI
From Everand
Unveiling the Secrets of ChatGPT Inside the Mind of an AI
Nelson Ambrose
No ratings yet
Maneesha Nidigonda Major Project
No ratings yet
Maneesha Nidigonda Major Project
11 pages
17056-Article Text-20550-1-2-20210518
No ratings yet
17056-Article Text-20550-1-2-20210518
8 pages
EDUCATION DATA MINING FOR PREDICTING STUDENTS’ PERFORMANCE
From Everand
EDUCATION DATA MINING FOR PREDICTING STUDENTS’ PERFORMANCE
Dr. GEETHA N DATA SCIENTIST, BENGALURU
No ratings yet
Text Summarization
No ratings yet
Text Summarization
60 pages
Deep Learning in Natural Language Processing A State-of-the-Art Survey
No ratings yet
Deep Learning in Natural Language Processing A State-of-the-Art Survey
6 pages
Integrated Data Science Certification - DexLab Analytics - Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA
No ratings yet
Integrated Data Science Certification - DexLab Analytics - Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA
13 pages
Get Real World Natural Language Processing 1st Edition Masato Hagiwara free all chapters
100% (2)
Get Real World Natural Language Processing 1st Edition Masato Hagiwara free all chapters
37 pages
Unit 5b - Natural Language Processing
No ratings yet
Unit 5b - Natural Language Processing
41 pages
Manuscript Updated-1
No ratings yet
Manuscript Updated-1
10 pages
Report - PDF 20240827 210738 0000
No ratings yet
Report - PDF 20240827 210738 0000
23 pages
CONNEAU and Lample - 2019 - Cross-lingual Language Model Pretraining
No ratings yet
CONNEAU and Lample - 2019 - Cross-lingual Language Model Pretraining
11 pages
Complete Download Social Big Data Analytics: Practices, Techniques, and Applications Bilal Abu-Salih PDF All Chapters
100% (3)
Complete Download Social Big Data Analytics: Practices, Techniques, and Applications Bilal Abu-Salih PDF All Chapters
65 pages
MCNN-LSTM Combining CNN and LSTM to Classify Multi-Class Text in Imbalanced News Data
No ratings yet
MCNN-LSTM Combining CNN and LSTM to Classify Multi-Class Text in Imbalanced News Data
16 pages
CS 6030 Natural Language Processing
No ratings yet
CS 6030 Natural Language Processing
3 pages
12 Subrata DL
No ratings yet
12 Subrata DL
25 pages
Poverty Cause and Effect Essay
100% (2)
Poverty Cause and Effect Essay
8 pages
Rasa Doc Tutorial
No ratings yet
Rasa Doc Tutorial
29 pages
A Deep Learning Approach For Public Sentiment Analysis in COVID-19 Pandemic
No ratings yet
A Deep Learning Approach For Public Sentiment Analysis in COVID-19 Pandemic
7 pages
Module 5
No ratings yet
Module 5
76 pages
Reasoning With Sarcasm
No ratings yet
Reasoning With Sarcasm
11 pages
NLP Final Review
No ratings yet
NLP Final Review
32 pages
Neural Approaches To Conversational AI
No ratings yet
Neural Approaches To Conversational AI
95 pages
A Practical Guide To Hybrid Natural Language Processing (Combining Neural Models and Knowledge Graph
No ratings yet
A Practical Guide To Hybrid Natural Language Processing (Combining Neural Models and Knowledge Graph
281 pages
Handbook of Software Fault Localization W. Eric Wong all chapter instant download
100% (4)
Handbook of Software Fault Localization W. Eric Wong all chapter instant download
66 pages
Week 8-Module 7 NLP
No ratings yet
Week 8-Module 7 NLP
52 pages
Sentiment Analysis of Student Feedback Using Attention-Based RNN and Transformer Embedding
No ratings yet
Sentiment Analysis of Student Feedback Using Attention-Based RNN and Transformer Embedding
12 pages
8.progress Report Presentation (Clickbait Detection System)
No ratings yet
8.progress Report Presentation (Clickbait Detection System)
26 pages
Statistical Topic Modeling For Afaan Oromo Document Clustering
No ratings yet
Statistical Topic Modeling For Afaan Oromo Document Clustering
10 pages
NLP Notes
No ratings yet
NLP Notes
11 pages
Download Complete Collaborative Computing Networking Applications and Worksharing 13th International Conference CollaborateCom 2017 Edinburgh UK December 11 13 2017 Proceedings Imed Romdhani PDF for All Chapters
100% (1)
Download Complete Collaborative Computing Networking Applications and Worksharing 13th International Conference CollaborateCom 2017 Edinburgh UK December 11 13 2017 Proceedings Imed Romdhani PDF for All Chapters
54 pages
Question-Bank-on-NLP,COA,ITB
No ratings yet
Question-Bank-on-NLP,COA,ITB
154 pages


