Difference Between NumPy and Pandas
Both pandas and NumPy are widely used, powerful open-source libraries in Python, and each has its own area of applicability. Much of pandas' functionality is built on top of NumPy, and both are part of the SciPy ecosystem.
NumPy stands for Numerical Python. It is the core library for scientific computing in Python and works with multidimensional data, that is, n-dimensional numerical data. Its central object, the NumPy array, is a powerful N-dimensional array; a 2-dimensional array is organized in rows and columns.
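A quick way to see this N-dimensional structure is to inspect an array's ndim (number of dimensions) and shape attributes. The sketch below is only illustrative; the variable names matrix and cube are arbitrary.

```python
import numpy as np

# A 2-D array: 2 rows and 3 columns of integers
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.ndim)   # 2
print(matrix.shape)  # (2, 3)

# The same array type can have any number of dimensions
cube = np.zeros((2, 3, 4))
print(cube.ndim, cube.shape)  # 3 (2, 3, 4)
```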
Many NumPy operations are implemented in C, which makes them fast. NumPy arrays also generally require less memory than pandas objects, which carry additional metadata such as an index.
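To illustrate what running operations in compiled code means in practice, the sketch below squares the same numbers twice: once with a pure-Python loop and once with a single vectorized NumPy expression. The array size and repeat count are arbitrary choices, and the timings it prints depend entirely on the machine, so treat them as a rough comparison only.

```python
import timeit

import numpy as np

values = np.arange(100_000, dtype=np.float64)
py_values = values.tolist()  # the same numbers as a plain Python list

# Pure-Python loop: the interpreter handles one element at a time
squares_loop = [v * v for v in py_values]

# Vectorized NumPy expression: the loop runs inside NumPy's compiled code
squares_vec = values ** 2

print(np.allclose(squares_loop, squares_vec))  # True

# Rough timing comparison; exact numbers vary from machine to machine
loop_time = timeit.timeit(lambda: [v * v for v in py_values], number=5)
vec_time = timeit.timeit(lambda: values ** 2, number=5)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```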
NumPy supports a wide range of numerical tasks, including linear algebra operations such as inverting a matrix, singular value decomposition, and computing determinants.
Let's take an example and see how to perform mathematical operations with NumPy.
Example
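import numpy as np

# A 3x3 matrix stored as a 2-D NumPy array
arr = np.array([[2, 12, 3], [10, 5, 7], [9, 8, 11]])
print(arr)

# Compute the inverse with the linear algebra module
arr_inv = np.linalg.inv(arr)
print(arr_inv)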
Explanation
The code first imports the NumPy module under the alias np. The variable arr is a 2-dimensional array with 3 rows and 3 columns. We then compute the inverse of arr with the inv() function from the numpy.linalg (linear algebra) module and print both arrays.
Output
[[ 2 12  3]
 [10  5  7]
 [ 9  8 11]]
[[ 0.0021692   0.23427332 -0.14967462]
 [ 0.10195228  0.01084599 -0.03470716]
 [-0.07592191 -0.19956616  0.23861171]]
The output contains two arrays: the first is the original matrix stored in arr, and the second is its inverse, stored in arr_inv.
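The numpy.linalg module also covers the other tasks mentioned earlier. As a minimal sketch, the snippet below reuses the same matrix to check that arr_inv really is the inverse (their product should be the identity matrix up to floating-point error). It then computes the determinant with np.linalg.det() and a singular value decomposition with np.linalg.svd(), and verifies that the SVD factors rebuild the original matrix.

```python
import numpy as np

arr = np.array([[2, 12, 3], [10, 5, 7], [9, 8, 11]])
arr_inv = np.linalg.inv(arr)

# A matrix multiplied by its inverse gives the identity (up to rounding)
print(np.allclose(arr @ arr_inv, np.eye(3)))  # True

# Determinant of arr
det = np.linalg.det(arr)
print(round(det, 6))  # -461.0

# Singular value decomposition: arr == U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(arr)
print(np.allclose(U @ np.diag(S) @ Vt, arr))  # True
```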
Pandas provides high-performance data manipulation in Python. It is built on top of NumPy and requires it to run. The name pandas is derived from the term "panel data", an econometrics term for multidimensional data sets.
Pandas lets you do most of what you can do in a spreadsheet, but with Python code. While NumPy mainly works with numerical data, pandas works with tabular data, which can come from sources such as CSV files or SQL databases.
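For the SQL case, here is a minimal sketch that builds a small hypothetical table in an in-memory SQLite database with Python's built-in sqlite3 module and loads the query result with pd.read_sql_query(). The table name passengers and its columns are made up for illustration; only the fare values are borrowed from the Titanic sample used in this article.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table, used only for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (id INTEGER, pclass INTEGER, fare REAL)")
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [(1, 3, 7.25), (2, 1, 71.2833), (3, 3, 7.925)],
)

# read_sql_query runs the SQL query and returns the result as a DataFrame
df = pd.read_sql_query("SELECT * FROM passengers", conn)
print(df)
conn.close()
```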
Pandas provides two powerful data structures, DataFrame (a two-dimensional labeled table) and Series (a one-dimensional labeled array), which are mainly used for data analysis.
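As a small illustration of these two structures, the sketch below builds a DataFrame by hand from a dictionary of columns, using a few Pclass and Age values from the Titanic data set, and then selects a single column, which pandas returns as a Series.

```python
import pandas as pd

# A DataFrame is a 2-D labeled table; each column is a Series
df = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Age": [22.0, 38.0, 26.0],
})
print(df)

ages = df["Age"]             # selecting one column returns a Series
print(type(ages).__name__)   # Series
print(ages.mean())           # average of the Age column
```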
Let's take an example and see how pandas handles tabular data.
Example
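import pandas as pd

# Load the CSV file (assumed to be in the working directory) into a DataFrame
data = pd.read_csv('titanic.csv')
print(data.head())  # first five rows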
Explanation
Pandas provides a number of functions, such as read_csv(), read_excel(), and read_sql(), for reading data from various sources into a DataFrame. In the example above, we read the Titanic data set into a DataFrame and display its first five rows with the head() method.
Output
   PassengerId  Survived  Pclass  \
0            1         0       3
1            2         1       1
2            3         1       3
3            4         1       1
4            5         0       3

                                                Name  Gender   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                           Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin  Embarked
0      0         A/5 21171   7.2500   NaN         S
1      0          PC 17599  71.2833   C85         C
2      0  STON/O2. 3101282   7.9250   NaN         S
3      0            113803  53.1000  C123         S
4      0            373450   8.0500   NaN         S
As we can see, a pandas DataFrame can hold columns of different types (integers, floats, and strings), whereas a NumPy array stores all its elements with a single data type and is mainly used for numerical values.
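To see this difference without needing titanic.csv, the sketch below inlines a few columns of the five rows printed above as a CSV string and reads it with pd.read_csv() through io.StringIO. Each DataFrame column keeps its own dtype (the Gender column appears as object, or as a dedicated string dtype in recent pandas versions). Converting the whole frame to a NumPy array with to_numpy() forces one common dtype, which for mixed columns falls back to object.

```python
import io

import numpy as np
import pandas as pd

# A few columns of the Titanic rows shown above, inlined as CSV text
csv_text = """PassengerId,Survived,Pclass,Gender,Age,Fare
1,0,3,male,22.0,7.2500
2,1,1,female,38.0,71.2833
3,1,3,female,26.0,7.9250
4,1,1,female,35.0,53.1000
5,0,3,male,35.0,8.0500
"""
df = pd.read_csv(io.StringIO(csv_text))

# Each DataFrame column keeps its own dtype
print(df.dtypes)

# A NumPy array has a single dtype: mixed columns become object,
# while purely numeric columns stay numeric
print(df.to_numpy().dtype)                   # object
print(df[["Age", "Fare"]].to_numpy().dtype)  # float64
```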