Data engineering and analytics using Python

Jun 21, 2018

This document provides an overview of data engineering and analytics using Python. It discusses Jupyter notebooks and commonly used Python modules for data science like Pandas, NumPy, SciPy, Matplotlib and Seaborn. It describes Anaconda distribution and the key features of Pandas including data loading, structures like DataFrames and Series, and core operations like filtering, mapping, joining, sorting, cleaning and grouping. It also demonstrates data visualization using Seaborn and a machine learning example of linear regression.

Data Engineering and Analytics using Python
PURNA CHANDER RAO. KATHULA

Talking Topics
• Jupyter notebook
• About me
• Python modules for Data Science
• Anaconda
• Pandas
• About pandas
• Data Munging / Data Preparation
• Demo
• Seaborn
• About seaborn
• Machine Learning
• Linear Regression

About me
• Job Title = Architect QA
• Build tools using Python for QA automation testing.
• Currently Learning

Python modules for Data Science
• Packages used for Data Analysis and Analytics
• Jupyter Notebook
• Pandas
• NumPy
• SciPy
• Matplotlib
• Seaborn
• scikit-learn

What is Anaconda?
• Essentially a large (~400 MB) Python installation.
• It bundles the packages you need for data analysis.
• Unless you have a special reason not to, you should just install and use it.

About Pandas
• What is Pandas?
Pandas is a Python library for data analysis and data manipulation. Its DataFrame is roughly the Python counterpart of R's data.frame.
• Key Features of Pandas
• It has APIs for loading data from different file formats into memory (Excel, TSV, CSV, databases, etc.).
• Data is structured in rows and columns.
• Data retrieval is similar to SQL: it supports operations such as group-by, joins, and views.
• Merging of data from multiple datasets.
• Supports much date/time series functionality: time zones, business days, holidays, etc.
• Boolean indexing
• Fancy indexing
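The loading feature listed above can be sketched as follows. This is a minimal illustration, assuming in-memory text in place of real files; the column names and values are hypothetical and do not come from the deck.

```python
import io

import pandas as pd

# Hypothetical CSV and TSV content held in memory instead of files on disk.
csv_text = "name,dept,salary\nAsha,QA,50000\nBen,Dev,62000\n"
tsv_text = "name\tdept\nChen\tQA\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# The same reader handles tab-separated data when given sep="\t".
tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df)
print(tsv)
```

With real files, the first argument would be a file path. Excel and database sources have their own readers (`pd.read_excel`, `pd.read_sql`), which need extra dependencies.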

Core Data Structures of Pandas
• DataFrames
• Series
Core Operations
Create   Select     Insert   Map
Join     Sort       Clean    ApplyMap
View     Update     Filter   Append
Group    Summarize  Conform  Rotate

Create ( Creating a DataFrame)
View ( Viewing the rows and columns)
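The slide images for Create and View are not part of this transcript, so the following is only a sketch of what these two operations typically look like. The employee data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dana"],
    "dept": ["QA", "Dev", "QA", "Ops"],
    "salary": [50000, 62000, 58000, 45000],
})

first_rows = df.head(2)      # first two rows
shape = df.shape             # (number of rows, number of columns)
columns = list(df.columns)   # column labels

print(first_rows)
print(shape, columns)
```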

Insert (Adding a new column to a DataFrame)
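A minimal sketch of adding columns, assuming a small hypothetical order table. Assignment appends the column at the end, while `DataFrame.insert` places it at a chosen position.

```python
import pandas as pd

# Hypothetical order lines.
df = pd.DataFrame({"item": ["pen", "book"], "price": [2.0, 12.5], "qty": [10, 3]})

# New column computed from existing columns; it is appended as the last column.
df["total"] = df["price"] * df["qty"]

# New constant column inserted at position 1 (right after "item").
df.insert(1, "currency", "USD")

print(df)
```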

Filter (Slicing and dicing the DataFrame)
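Filtering usually combines boolean indexing with label-based (`loc`) or position-based (`iloc`) selection. The sketch below assumes a hypothetical employee table.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dana"],
    "dept": ["QA", "Dev", "QA", "Ops"],
    "salary": [50000, 62000, 58000, 45000],
})

# Boolean indexing: keep rows where the condition is True.
qa = df[df["dept"] == "QA"]

# Conditions are combined with & and |, each wrapped in parentheses.
high_qa = df[(df["dept"] == "QA") & (df["salary"] > 55000)]

# loc: filter rows and pick columns by label in one step.
subset = df.loc[df["salary"] >= 50000, ["name", "salary"]]

# iloc: slice by integer position.
first_two = df.iloc[:2]
```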

Append (Stacking DataFrames row-wise, axis=0)
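The deck dates from 2018, when `DataFrame.append` was available. That method was deprecated in pandas 1.4 and removed in pandas 2.0, so this sketch uses `pd.concat` to get the same row-wise result. The two tables are hypothetical.

```python
import pandas as pd

# Two hypothetical tables with the same columns.
a = pd.DataFrame({"id": [1, 2], "city": ["Pune", "Delhi"]})
b = pd.DataFrame({"id": [3], "city": ["Mumbai"]})

# Stack b under a; ignore_index=True renumbers the rows 0..n-1.
appended = pd.concat([a, b], ignore_index=True)

print(appended)
```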

Concat (Joining the dataframes on Axis = 0 or 1)
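The difference between the two axes can be sketched with two hypothetical single-column frames. With `axis=0` the rows are stacked and columns missing from one input are filled with NaN. With `axis=1` the frames are placed side by side and aligned on the index.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"b": [3, 4]})

# axis=0: stack rows; column "b" is NaN for rows that came from left, and vice versa.
rows = pd.concat([left, right], axis=0)

# axis=1: place side by side, matching rows by index label.
cols = pd.concat([left, right], axis=1)

print(rows)
print(cols)
```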

Sort (by columns, with ascending=True or False)
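A minimal sketch of sorting with `sort_values`, assuming a hypothetical employee table. The `ascending` argument takes a single flag or one flag per sort column.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dana"],
    "dept": ["QA", "Dev", "QA", "Ops"],
    "salary": [50000, 62000, 58000, 45000],
})

# Highest salary first.
by_salary = df.sort_values(by="salary", ascending=False)

# Department A-Z, then highest salary first within each department.
by_dept = df.sort_values(by=["dept", "salary"], ascending=[True, False])
```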

Clean (fillna(method='ffill' / 'bfill'))
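Forward and backward filling can be sketched as below. The slide shows `fillna(method=...)`; recent pandas releases deprecate that argument, so this sketch uses the equivalent `ffill()` and `bfill()` methods. The series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing values in the middle.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()      # carry the last valid value forward
backward = s.bfill()     # pull the next valid value backward
constant = s.fillna(0)   # replace missing values with a fixed value
```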

Conform (reindex() / resample(), dropping NaN as needed)
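One way to read "conform" is making data match a target set of labels. The sketch below shows that with `reindex` on a hypothetical series: labels absent from the data become NaN, and labels not requested are dropped.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Conform to a new index: "c" is dropped, "d" is added as NaN.
conformed = s.reindex(["a", "b", "d"])

# Same target index, but fill the new label with a default instead of NaN.
filled = s.reindex(["a", "b", "d"], fill_value=0)

# Drop whatever is still missing after conforming.
cleaned = conformed.dropna()
```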

What is Seaborn?
• Seaborn is a high-level interface to Matplotlib for drawing attractive statistical graphics.

Demo ( Restaurant Dataset visualization)
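The restaurant dataset from the demo is not included in the transcript. The sketch below therefore uses a small hypothetical table of bills and tips (all values made up) to show the kind of one-line plot Seaborn offers. The `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened

import pandas as pd
import seaborn as sns

# Hypothetical restaurant bills; values are made up for illustration.
bills = pd.DataFrame({
    "total_bill": [10.5, 23.0, 15.2, 31.8, 18.4, 27.1],
    "tip": [1.5, 3.5, 2.0, 5.0, 3.0, 4.0],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
})

# Scatter plot of tip against bill, colored by meal time.
ax = sns.scatterplot(data=bills, x="total_bill", y="tip", hue="time")
ax.set_title("Tip vs. total bill")
```

In a Jupyter notebook the figure renders inline. In a script, `ax.figure.savefig(...)` writes it to a file.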

Machine Learning ( Linear Regression)
DEMO
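The demo code is not part of the transcript. The deck lists scikit-learn, but to keep this sketch dependency-light it fits a straight line with NumPy's least-squares solver instead. The data is synthetic, generated from y = 2x + 1, which is an assumption made purely for illustration.

```python
import numpy as np

# Synthetic data from a known line, y = 2x + 1, with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the fit includes an intercept.
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: minimize ||X @ [slope, intercept] - y||^2.
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the value at a new point.
prediction = slope * 5.0 + intercept

print(slope, intercept, prediction)
```

With scikit-learn, `sklearn.linear_model.LinearRegression` exposes the same fit through `fit` and `predict`.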

1. Data Engineering and Analytics using Python PURNA CHANDER RAO. KATHULA

2. Talking Topics: Jupyter notebook, About me, Python modules for Data Science, Anaconda, Pandas, About pandas, Data Munging / Data Preparation, Demo, Seaborn, About seaborn, Machine Learning, Linear Regression.

3. About me: Job Title = Architect QA. Build tools using Python for QA automation testing. Currently Learning

4. Python modules for Data Science. Packages used for Data Analysis and Analytics: Jupyter Notebook, Pandas, NumPy, SciPy, Matplotlib, Seaborn, scikit-learn.

5. Anaconda

6. Anaconda Distribution

7. What is Anaconda? Essentially a large (~400 MB) Python installation that bundles the packages you need for data analysis. Unless you have a special reason not to, you should just install and use it.

8. Pandas

9. About Pandas. What is Pandas? Pandas is a Python library for data analysis and data manipulation. Its DataFrame is roughly the Python counterpart of R's data.frame. Key features of Pandas: APIs for loading data from different file formats into memory (Excel, TSV, CSV, databases, etc.); data structured in rows and columns; SQL-like data retrieval with operations such as group-by, joins, and views; merging of data from multiple datasets; much date/time series functionality (time zones, business days, holidays, etc.); boolean indexing; fancy indexing.

10. Core data structures of Pandas: DataFrames, Series. Core operations: Create, Select, Insert, Map, Join, Sort, Clean, ApplyMap, View, Update, Filter, Append, Group, Summarize, Conform, Rotate.

11. Create ( Creating a DataFrame) View ( Viewing the rows and columns)

12. View ( Viewing the rows and columns)

13. Insert ( Adding a new column to dataframe)

14. Filter (Slicing and dicing the DataFrame)

15. Map (map() and applymap())
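A minimal sketch of element-wise mapping on a hypothetical table. `Series.map` accepts a dictionary or a function. The element-wise `DataFrame.applymap` named on the slide has been renamed to `DataFrame.map` in recent pandas releases, so the sketch uses `DataFrame.apply` per column, which is available in both old and new versions.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"dept": ["QA", "Dev"], "salary": [50000, 62000]})

# Map with a dictionary: translate codes into full names.
df["dept_full"] = df["dept"].map({"QA": "Quality Assurance", "Dev": "Development"})

# Map with a function: transform each value.
df["salary_k"] = df["salary"].map(lambda v: v / 1000)

# apply runs a function once per column (here: the column maximum).
col_max = df[["salary", "salary_k"]].apply(lambda col: col.max())
```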

16. Append (Stacking DataFrames row-wise, axis=0)

17. Concat (Joining the dataframes on Axis = 0 or 1)

18. Join ( Inner , Left, Right , Outer)

19. Join ( Inner )

20. Join ( Outer)

21. Join ( Left)

22. Join ( Right)
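The four join types can be sketched with `DataFrame.merge` on two hypothetical tables that share an `emp_id` key. Only ids 2 and 3 appear in both tables.

```python
import pandas as pd

# Hypothetical tables; emp_id 1 has no salary row, emp_id 4 has no employee row.
employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
salaries = pd.DataFrame({"emp_id": [2, 3, 4], "salary": [62000, 58000, 45000]})

inner = employees.merge(salaries, on="emp_id", how="inner")   # keys in both
left = employees.merge(salaries, on="emp_id", how="left")     # all keys from employees
right = employees.merge(salaries, on="emp_id", how="right")   # all keys from salaries
outer = employees.merge(salaries, on="emp_id", how="outer")   # keys from either
```

Rows without a match on the other side get NaN in the columns that come from that side.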

23. Group (groupby() )
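A minimal `groupby` sketch on a hypothetical employee table: split the rows by department, then aggregate the salary column within each group.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "dept": ["QA", "Dev", "QA", "Ops"],
    "salary": [50000, 62000, 58000, 45000],
})

# One aggregate per group.
means = df.groupby("dept")["salary"].mean()

# Several aggregates per group at once.
summary = df.groupby("dept")["salary"].agg(["count", "sum"])

print(means)
print(summary)
```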

24. Sort (by columns, with ascending=True or False)

25. Clean ( Drop, Fillna, duplicates)

26. Clean ( Drop)
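Dropping can mean removing duplicate rows, rows with missing values, or whole columns. The sketch below shows each on a hypothetical table.

```python
import numpy as np
import pandas as pd

# Hypothetical data: id 2 appears twice, id 3 has no score.
df = pd.DataFrame({"id": [1, 2, 2, 3], "score": [9.0, 7.5, 7.5, np.nan]})

no_dupes = df.drop_duplicates()        # remove fully identical rows
no_missing = df.dropna()               # remove rows containing any NaN
no_score = df.drop(columns=["score"])  # remove a column by name
```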

27. Clean (fillna(method='ffill' / 'bfill'))

28. Conform (reindex() / resample(), dropping NaN as needed)

29. Resample ()

30. Resample (Monthly, Weekly, Yearly)
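Resampling regroups a time-indexed series into coarser periods. The sketch below assumes a hypothetical daily series of ones, so each resampled value is simply the number of days in that bin. It uses the `"W"` (weekly) and `"MS"` (month start) aliases because the month-end and year-end aliases changed names between pandas versions.

```python
import pandas as pd

# Hypothetical daily series: 60 days starting Monday 2018-01-01, each value 1.
idx = pd.date_range("2018-01-01", periods=60, freq="D")
s = pd.Series(1, index=idx)

weekly = s.resample("W").sum()     # weekly bins, each ending on Sunday
monthly = s.resample("MS").sum()   # monthly bins, labeled by month start

print(weekly)
print(monthly)
```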

31. Rotate ( Transpose)

32. Rotate (pivot_table)

33. Rotate (stack)

34. Rotate (unstack)
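The four rotate operations can be sketched on one hypothetical sales table: `pivot_table` reshapes long data to wide, `.T` swaps rows and columns, `stack` moves column labels into the row index, and `unstack` reverses that.

```python
import pandas as pd

# Hypothetical long-format sales data.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "year": [2017, 2018, 2017, 2018],
    "sales": [10, 12, 7, 9],
})

# Long to wide: one row per city, one column per year.
wide = df.pivot_table(index="city", columns="year", values="sales", aggfunc="sum")

# Transpose: years become rows, cities become columns.
transposed = wide.T

# Stack: fold the year columns into the index, giving a (city, year) Series.
long = wide.stack()

# Unstack: spread the inner index level back out into columns.
back = long.unstack()
```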

35. Seaborn Analytics

36. What is Seaborn? Seaborn is a high-level interface to Matplotlib for drawing attractive statistical graphics.

37. Demo ( Restaurant Dataset visualization)

38. Machine Learning ( Linear Regression) DEMO
