Resume Sonaika Pati D (010824) - 4-1


Sonalika Pati

[email protected] | +91-7750801108 | LinkedIn/Sonalika Pati


PROFESSIONAL EXPERIENCE
Codvo | Data Scientist II (Jun 2023 – Present)
• Developed and implemented AI models for querying multiple file formats (PDF, DOCX, XLS) and PostgreSQL databases,
using LlamaIndex, Vanna.ai, ChromaDB, and GPT-3/Gemini Flash, resulting in a 70% increase in data retrieval efficiency.
• Developed customized knowledge spaces to meet unique client requirements, ensuring optimal performance and user
satisfaction. Integrated and tested end-to-end pipelines by constructing knowledge spaces aligned with specific client
use cases (e.g., Customer Center, HR).
• Modified and optimized AI prompts for each use case, achieving an 80% increase in model accuracy, leading to a 60%
improvement in user satisfaction.
• Conducted a POC for Wren AI, comparing its performance with Vanna AI on SQL database queries, and documented the
results to guide future AI strategy, improving query accuracy by 40%.
• Loan Status Prediction: Automated the loan eligibility process in real time, based on the customer details provided in
loan application forms.
• Trained multiple ML algorithms, including Decision Tree, Random Forest, SVM, Logistic Regression, and XGB Regressor.
SVM performed best on the validation data, with an accuracy score of 82.7%.
• Deployed the model to production by exporting it with pickle and serving it through a Flask web application API on an
AWS EC2 instance to return customized predictions.
• Credit Card Package Recommendation: Generated personalized offerings adapted to customers' behavioral preferences and
developed a Hierarchical Bayes Choice Model to assess willingness-to-pay (WTP) at the customer-attribute level.
• Developed a two-stage LSTM model capturing temporal information to predict customer lapse or lifetime. Used clustering
and statistical distributions to estimate customer lifetime value. Beat the in-house algorithm, achieving 86% recall for
0-3 month lapsers and 88% overall recall.
• Fraud Detection Project: Drove redevelopment of the fraud detection algorithm, introducing 10+ new attributes and
reducing false positive (FP) cases by 75%. Evaluated models for accuracy and identified the best parameters for detecting
fraudulent transactions. Applied several sampling techniques to handle the highly imbalanced dataset.
• Analyzed customers' spending behavior, including mapping spenders' locations, to distinguish fraudulent transactions
from non-fraudulent ones, increasing the accuracy rate by over 97%.
• Developed machine learning models, using a TF-IDF vectorizer, to predict password strength for enterprise systems,
with user insights that identify common patterns and weak passwords so customers can improve their security protocols.
Continental Automotive Components (India) Pvt. Ltd | Data Scientist (Jul 2022 – Jun 2023)
• Implemented multiple machine learning models, including linear regression, logistic regression, decision trees, KNN,
HMM techniques, and neural networks, for fault prediction and predictive maintenance in the automotive industry, using
sensor fault data and OEM-specific knowledge to reduce the cost of car manufacturing.
• Developed a fault prediction model to detect mechanical failures in diesel engines based on features such as mileage,
engine RPM, oil temperature, air consumption, and fuel consumption, extending vehicle life and reducing costly downtime.
• Developed an algorithm to predict faults in vehicle brakes, establishing a predictive maintenance program that minimizes
customer service calls and maximizes time between brake replacements, reducing costs for parts and labor.
• Built a drowsiness detection system with Python, OpenCV, TensorFlow, and Keras that detects faces and eyes using a Haar
Cascade classifier and uses a CNN model to predict whether a driver's eyes are closed, alerting drivers if they fall asleep.
• Applied CNNs, LSTM networks, and transfer learning with 4 core features, and optimized the LSTM model through
hyperparameter tuning and feature normalization to achieve higher accuracy with a much lower false-negative rate of 0.2.
• Performed EDA using Python tools such as pandas, scikit-learn, and matplotlib to derive meaningful insights for the
automotive portal, consumers, and dealers.
• Deployed and optimized the model in the AWS cloud, using EC2, Redshift, S3, EMR, and RDS; reduced mean time to repair
(MTTR) by 20%. Contributed to a $3M cost-saving initiative by collecting real-time sensor data and applying algorithms to
minimize power consumption and reduce maintenance costs.
Ernst & Young Global Limited | Associate Consultant - Data Science (Aug 2020 – Jul 2022)
• Technology stack: Python, Pandas, NumPy, Sklearn, Matplotlib & Seaborn. Handled the predictive analysis life cycle:
EDA, feature engineering, feature selection, model building, and model deployment.
• Prepared data for classification models, then trained and tested them using Logistic Regression, Decision Tree, and
Random Forest.
• Developed a churn model in Python for the North America Zone (NAZ) and South America Zone (SAZ) using Random Forest,
and derived association rules describing the specific conditions under which learners did not opt for courses. This
enabled the client to target learners who met those conditions, reducing the churn rate by 5%. Analyzed large volumes of
data to draw actionable insights that support business decisions.
• Developed a framework for building propensity models that estimate the probability that a given customer will remain a
paid customer over rolling windows of 15, 30, and 45 days. Used it to better target win-back campaigns and to identify
the features that most differentiate customers.
• Extracted customized credit reports from SAP SuccessFactors and built data visualizations for product KPIs with a Power
BI dashboard and Python libraries such as Seaborn, Matplotlib, and Pandas, reducing manual reporting work by 8 hours weekly.
• Flagged new customers and merged datasets using approximate string-matching techniques. Segmented customers at various
levels using geospatial analytics, heuristics, and clustering techniques, and prioritized them by sales potential, which
is vital for RTM strategies.
• Developed a CNN model for house interior prediction, incorporating batch normalization, dropout, transfer learning, and
data augmentation, and achieved an accuracy of 95% on the validation set, surpassing benchmarks.
• Applied transfer learning by fine-tuning a pre-trained model, leveraging features from a large dataset. Implemented data
augmentation strategies (rotation, flipping, etc.) to boost model performance on limited interior images.
• Achieved a measurable business impact with a 25% boost in user engagement attributed to the enhanced model transparency
and improved user experience.
• Analyzed restaurant data using advanced Excel functions and SQL queries, determined percentage of restaurants with table
booking and online delivery options by developing charts based on cuisines, ratings etc.
• Applied advanced Excel features such as VLOOKUP, PivotTables, and conditional formatting. Executed SQL queries to
extract relevant information using GROUP BY, joins, subqueries, stored procedures, and window functions (ROW_NUMBER,
RANK, and DENSE_RANK), and visualized insights in interactive Power BI dashboards built with DAX.
• Registered the trained churn model in the SageMaker Model Registry, created a SageMaker model from the artifacts of
the best model, and developed a fully automated ML model development workflow.

PROJECTS
Customer Service Automation (Natural Language Processing)
• Used the NLTK library and core natural language processing methods to develop and implement techniques for sentiment
analysis on product reviews, dialogue state tracking in chatbots, customer service automation, etc.
• Categorized comments from different social networking sites into positive and negative clusters using sentiment and
text analytics.
• Ensured the model kept a low false positive rate in text classification and sentiment analysis of unstructured and
semi-structured data.
• Created and designed reports from the gathered metrics to draw logical conclusions about past and future behavior.
Autonomous Tagging of Stack Overflow Questions
• Built an SVM classifier with Scikit-Learn to predict the tags of Stack Overflow questions.
• Used token-based feature engineering techniques and a TF-IDF vectorizer with multi-label classification, as each
question can have 1-5 tags.
Stock Price Prediction
• Leveraged ARIMA model to forecast the stock price trend with MAPE of around 2.5% in predicting the next 15 observations.
• Utilized cross-validation to avoid the look-ahead bias. Trained multiple machine learning models and then combined them
using ensemble learning to produce higher prediction accuracy.

COURSES/CERTIFICATIONS
• Machine Learning Fundamentals by Andrew Ng • Microsoft Certified: Analyzing Data with MS Power BI
• Data Science & Business Analytics • Probability & Statistics
• Deep Learning for computer vision with TensorFlow • PyTorch for Deep Learning
• Data Analyst Nanodegree by Udacity • Python for Data Science
• AWS Certified Machine Learning Specialty • Complete Neural Network Bootcamp

SKILLSETS
• Programming Languages: Python, SQL
• Tools: Tableau, Power BI, Jupyter Notebook, Visual Studio Code, AWS, GitHub, MS Office, Confluence, Jira, MS Excel
• Databases: SQL Server, MySQL, MongoDB, GCP BigQuery
• Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, NLTK, Keras, ARIMA, SpaCy, TensorFlow, SciPy, OpenCV, PyTorch
• Skills: Big Data, EDA, Data Analysis, Data Visualization, Data Mining, Sklearn, Natural Language Processing, BERT, Transformer,
Time series, Deep Learning, MLOps, Business Analytics, Supervised or Unsupervised Machine Learning Algorithms, Regression,
Classification, One hot encoding, Clustering, Generative AI, LLM, LangChain, LlamaIndex, ChromaDB, GPT-3, Gemini Flash,
vector database
• Deployment: Docker, AWS SageMaker, MLOps, AWS S3, EC2, Streamlit, Flask, Pickle

EDUCATION
Program | Institution | CGPA/% | Completion Year
Bachelor's (BBA) in Information Technology | Utkal University | 9.3 | 2022
Class XII [HSC Examination] | KMBB Junior Science College | 7.4 | 2018
Class X [SSC Examination] | SSVM Keonjhar | 8.6 | 2016
