Pandas Dataframe Export The CSV File
What’s pandas?
Pandas is an open-source Python package that provides numerous tools for data analysis. It
offers fast, flexible, and expressive data structures that can be used for many
different data manipulation tasks.
To use pandas in your Python IDE, you first need to import the library:
import pandas as pd
1. Series: a one-dimensional labeled array that can store data of any type. Its values are
mutable, but its size cannot be changed.
2. DataFrame: a two-dimensional, size-mutable structure that stores and manipulates
tabular data, with observations in rows and variables in columns.
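To make the difference concrete, here is a minimal sketch with made-up values: a Series whose element is changed in place, and a DataFrame that grows by one column. The column names and numbers are illustrative only.

```python
import pandas as pd

# A Series is one-dimensional: one value per index label
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
s["b"] = 25  # values are mutable: change one element in place
print(s.tolist())  # [10, 25, 30]

# A DataFrame is two-dimensional: rows are observations, columns are variables
df = pd.DataFrame({"height_m": [1.62, 1.75], "weight_kg": [55, 72]})
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2  # size-mutable: add a new column
print(df.shape)  # (2, 3)
```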
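A Series can be created from:
1. A NumPy array:
import pandas as pd
import numpy as np
array = np.array(["blue", "yellow", "pink", "purple"])  # create the NumPy array
series1 = pd.Series(array)  # build a Series from the array (default index 0..3)
print(series1)
2. A list:
colors = ["blue", "yellow", "pink", "purple"]  # example list (assumed values)
series2 = pd.Series(colors)  # build a Series from the list
print(series2)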
https://www.tutorialspoint.com/python_pandas/python_pandas_series.htm
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.add.html
https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781787123137/3/ch03lvl1sec31/re-indexing-a-series
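We can also give a Series a custom index:
import pandas as pd
color = ["blue", "yellow", "pink", "purple"]  # index labels (example values)
occurrence = [3, 1, 4, 2]  # example counts, one per color
S = pd.Series(occurrence, index=color)  # custom index instead of 0..3
print(S)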
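A custom index lets us look values up by label, and the re-indexing reference above covers reordering those labels. A minimal sketch using the standard reindex() method, assuming hypothetical color counts:

```python
import pandas as pd

# Example values (hypothetical): occurrences of each color
s = pd.Series([3, 1, 4], index=["blue", "yellow", "pink"])

print(s["pink"])  # access a value by its index label

# reindex() returns a new Series that follows the given label order;
# a label that did not exist before ("green") gets NaN
reordered = s.reindex(["pink", "blue", "green"])

# fill_value supplies a default for new labels instead of NaN
filled = s.reindex(["pink", "blue", "green"], fill_value=0)
print(reordered)
print(filled)
```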
Series addition
If we add two Series that have the same index, we get a new Series with that same index,
in which the corresponding values are added:
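import pandas as pd
S1 = pd.Series([1, 2, 3], index=["a", "b", "c"])  # example values
S2 = pd.Series([10, 20, 30], index=["a", "b", "c"])  # same index as S1
print(S1 + S2)  # a: 11, b: 22, c: 33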
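When the two indexes do not match, + still aligns values by label, and labels found in only one Series produce NaN. Series.add(), documented at the pandas.Series.add link above, accepts a fill_value to avoid this. A small sketch with example values:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# "+" aligns on index labels; "a" and "d" exist in only one Series, so they become NaN
total = s1 + s2
print(total)

# Series.add() with fill_value treats a missing label as 0 before adding
total_filled = s1.add(s2, fill_value=0)
print(total_filled.tolist())  # [1.0, 12.0, 23.0, 30.0]
```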
Dataframe Introduction
A DataFrame is a two-dimensional labeled data structure whose columns can hold
different types.
2. From a dictionary:
import pandas as pd
dictionary = {"name": ["Ana", "Ben"], "age": [23, 35]}  # example data: keys become column names
df = pd.DataFrame(dictionary)
print(df)
import numpy as np
import pandas as pd
my_numpy_array=np.random.randn(3,4)
df=pd.DataFrame(my_numpy_array, columns=list("abcd"))
print(df)
https://databricks.com/glossary/pandas-dataframe
import pandas as pd
df.info()  # prints the index range, column names, non-null counts, dtypes and memory usage
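info() is usually paired with head() and tail() to glance at the actual rows, as in the head/tail reference below. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

# A small example frame (hypothetical data): 10 rows, 2 columns
df = pd.DataFrame({"x": range(10), "y": [v * 2 for v in range(10)]})

df.info()          # index range, column names, non-null counts, dtypes, memory usage
first = df.head()  # first 5 rows (the default n=5)
last = df.tail(3)  # last 3 rows
print(first)
print(last)
```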
Now, let’s use the describe() method to calculate summary statistics. Called on the whole
DataFrame, it describes every numerical column; to describe one specific column, select it first.
df.describe()  # count, mean, std, min, 25%, 50%, 75% and max of each numerical column
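Both forms of describe() in one sketch, on random data like the earlier NumPy example: the DataFrame call returns a table of statistics, while selecting column "a" first returns a Series for that column only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list("abcd"))

summary_all = df.describe()     # DataFrame: one column of statistics per numerical column
summary_a = df["a"].describe()  # Series: statistics for column "a" only
print(summary_a)
```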
https://note.nkmk.me/en/python-pandas-head-tail/
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html
Dataframe Bracket Selection
In this course, we will often need to select specific rows or columns from a
DataFrame.
One of the easiest ways to do that is with square brackets:
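A few common bracket patterns, sketched on a small hypothetical table (names and ages are made up): a single column name, a list of names, a row slice, and a boolean condition.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dan"],
    "age": [23, 35, 31, 19],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

ages = df["age"]               # one column name -> Series
subset = df[["name", "city"]]  # list of column names -> DataFrame
first_two = df[0:2]            # slice -> rows at positions 0 and 1
older = df[df["age"] > 30]     # boolean mask -> rows where the condition holds
print(older)
```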
https://datatofish.com/select-rows-pandas-dataframe/
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html
Dataframe loc/iloc
Dataframe loc
The loc indexer extracts rows and columns by label, written with square brackets: df.loc[row_labels, column_labels].
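A minimal loc sketch, assuming a small hypothetical price table indexed by fruit name. Note that a label slice such as "apple":"banana" includes its end label.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [2.5, 1.2, 4.0], "stock": [10, 0, 7]},
    index=["apple", "banana", "cherry"],
)

row = df.loc["banana"]                       # one row by label -> Series
cell = df.loc["cherry", "price"]             # single value by row and column label
block = df.loc["apple":"banana", ["stock"]]  # label slices include the end label
in_stock = df.loc[df["stock"] > 0, "price"]  # boolean mask on rows, one column
print(block)
```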
Dataframe iloc
iloc follows the same rules as loc, but it selects rows and columns by integer position
(0, 1, 2, ...) instead of by label.
print(df.iloc[:, 1:3])  # select the second and third columns (positions 1 and 2), keeping all rows
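For comparison, a sketch on a hypothetical 3-by-4 frame with a string index: iloc ignores the labels and counts positions from 0, and its slices exclude the end position, whereas the equivalent loc label slice includes its end label.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12]},
    index=["x", "y", "z"],
)

middle_cols = df.iloc[:, 1:3]    # positions 1 and 2 ("b", "c"); position 3 excluded
same_cols = df.loc[:, "b":"c"]   # loc equivalent: the end label "c" is included
first_row = df.iloc[0]           # first row, whatever its label -> Series
corner = df.iloc[-1, -1]         # last row, last column
rows_0_2 = df.iloc[[0, 2]]       # list of positions
print(middle_cols)
```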
Setting index in dataframe
We can use the set_index() method to replace the index with one or
more existing columns.
[Figure: Old Index vs. New Index]
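A minimal set_index() sketch on a hypothetical country table, showing both a single column and a list of columns (which produces a MultiIndex). set_index() returns a new DataFrame by default, so df keeps its original index.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["France", "Japan", "Brazil"],
    "capital": ["Paris", "Tokyo", "Brasilia"],
    "code": ["FR", "JP", "BR"],
})

by_country = df.set_index("country")        # the "country" column becomes the index
by_two = df.set_index(["country", "code"])  # several columns -> a MultiIndex
print(by_country.loc["Japan", "capital"])   # look up a row by its new label
print("country" in df.columns)              # True: df itself is unchanged
```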
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html
Dataframe Concatenate
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
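pd.concat() combines DataFrames along an axis. A minimal sketch with hypothetical data: stacking rows with the default axis=0, then attaching a country column side by side with axis=1.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Ana", "Ben"], "age": [23, 35]})
df2 = pd.DataFrame({"name": ["Cleo"], "age": [31]})

# axis=0 (default): stack rows; ignore_index=True renumbers the result 0..n-1
rows = pd.concat([df1, df2], ignore_index=True)

# axis=1: place frames side by side, aligning rows on their index labels
country = pd.DataFrame({"country": ["Spain", "Canada", "Peru"]})  # hypothetical column
wide = pd.concat([rows, country], axis=1)
print(wide)
```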
Dataframe drop
To drop specified labels from rows or columns, we use the drop() method.
For example, we want to delete the country column we added previously:
df.drop("country", axis=1)
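Because drop() has inplace=False by default, it returns a new DataFrame and leaves df unchanged:
if you print df, you can see that the country column is not gone. Take a break and do some research on how to make the change permanent.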
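A quick way to check this behavior on a hypothetical two-row frame: drop() hands back a new DataFrame and leaves the original untouched. The sketch also shows the equivalent columns= keyword and dropping a row by its index label.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben"], "country": ["Spain", "Canada"]})

result = df.drop("country", axis=1)       # returns a new DataFrame
print("country" in df.columns)            # True: df itself is unchanged
print("country" in result.columns)        # False

same = df.drop(columns=["country"])       # keyword form, same result as axis=1
rows_dropped = df.drop(0)                 # axis=0 (default) drops rows by index label
print(rows_dropped)
```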
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html