The Complete Collection of Data Science Cheatsheets KDnuggets
Table of Contents
1. SQL
2. Web Scraping
4. Data Analytics
5. Business Intelligence
6. Big Data
8. Machine Learning
9. Deep Learning
SQL
The majority of technical interviews and assessment tests include some type of
SQL question, so it is better to prepare for the interview using this collection
of SQL cheat sheets. These cheat sheets will also help you get better at
creating and managing databases, and help you understand complex
SQL queries.
• SQL Basics
• SQL Expert
• SQL Window Functions Cheat Sheet
• SQL Joins Cheat Sheet
• SQL – Data Analysis
• PostgreSQL
• SQL for the Job Interview
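As a small illustration of the kind of join and aggregation question these cheat sheets cover, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns, and values are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical tables and rows, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
""")

# LEFT JOIN keeps customers that have no orders;
# COALESCE turns their NULL total into 0.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()
conn.close()

for name, n_orders, total in rows:
    print(name, n_orders, total)
```

Switching LEFT JOIN to an inner JOIN would drop the customer without orders from the result, which is a common follow-up in interview questions.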
Web Scraping
Web scraping is an essential part of data science, as it is used for gathering
data, market research, and maintaining data pipelines. Beautiful Soup is a
popular Python library for parsing HTML and XML documents and extracting
their content into a readable, structured form such as a dataframe. The
section consists of tools that are used to parse web pages in Python and R.
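A minimal sketch of the parse-then-tabulate idea described above. It uses Python's standard-library html.parser rather than Beautiful Soup so that it runs without extra installs; the HTML snippet and the TableParser class are made up for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment standing in for a downloaded web page.
html = """<table>
<tr><th>city</th><th>temp_c</th></tr>
<tr><td>Oslo</td><td>4</td></tr>
<tr><td>Cairo</td><td>29</td></tr>
</table>"""

parser = TableParser()
parser.feed(html)
parser.close()

# The first row is the header; the rest become one record (dict) per row.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

A list of records like this is the shape that dataframe libraries accept directly, which is why scraping code often ends with a step of this kind.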
• Probability
• William Chen's Probability Cheatsheet 2.0
• Stanford: Algebra and Calculus
• Statistics, Probability & Math
• MIT: Statistics
• Stanford: Statistics
• Calculus for Machine Learning
• Linear algebra for deep learning
• SciPy: Linear Algebra in Python
• A Comprehensive Statistics for Interview
Data Analytics
Data analytics is used for making business decisions, running marketing
campaigns, conducting scientific research, and designing unique data products.
The entire IT industry depends on it. This category is further divided into
three subcategories: Python, R, and Julia. All of these languages are popular
among data scientists and data analysts.
Python
The list contains the most used Python packages for data ingestion,
manipulation, and visualization. NumPy and pandas are the most popular
tools among the data community for performing scientific computing and
data manipulation.
Julia
• Hadoop
• Scala
• Spark
• Hive Functions
• Spark with sparklyr
Data Structures & Algorithms
The most common technical interview questions are about data structures
and algorithms. If you are a software engineer or data scientist, you
must know common data structure operations, search and sorting algorithms,
and data structure types. The list was created to help you understand
complex sorting functions and algorithms.
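As one example of the search algorithms mentioned above, here is a minimal sketch of binary search over a sorted list. The function name and sample data are chosen for illustration and are not taken from any of the listed cheat sheets.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


# Binary search requires sorted input, so sort first.
data = sorted([42, 7, 19, 3, 88, 23])  # [3, 7, 19, 23, 42, 88]
print(binary_search(data, 23))  # 3
print(binary_search(data, 5))   # -1
```

Each iteration halves the search range, so the lookup takes O(log n) comparisons, compared with O(n) for a linear scan.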
• Supervised learning
• Statistics & Mathematics for Machine Learning
• Unsupervised learning
• Scikit-Learn: Python Machine Learning
• Scikit-Learn: Machine Learning Algorithm Selection
• Machine Learning Models
• Time Series with R
• Machine Learning tips and tricks
• Caret: Modeling and machine learning in R
• Machine Learning Modeling with R
• Deep Learning
• PyTorch
• Neural Network Architectures
• Neural Network Graphs
• Neural Network Cells
• Neural Network Type with Diagram
• Keras: Neural Networks in Python
• Deep learning with Keras in R
• TensorFlow
Natural Language Processing
Natural Language Processing (NLP) is used for processing and cleaning text
and speech data so that we can extract useful information. NLP has a wide
range of applications, including language translation, transcription,
conversational AI, question answering, text generation, classification, and
named entity recognition. The collection of cheat sheets
contains bite-size information about the most popular NLP tools and
algorithms.
About Author
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who
loves building and deploying machine learning models. Currently, he is
focusing on content creation and writing technical blogs on machine
learning and data science. Abid holds a master’s degree in Technology
Management and a bachelor’s degree in Telecommunication Engineering.
His vision is to build an AI product using a graph neural network for students
struggling with mental illness.