Pandas Dataframe Export The CSV File

Pandas is an open source library that provides tools for data analysis and manipulation. It offers flexible data structures like Series and DataFrames. Series are one-dimensional arrays that can store data of any type. DataFrames are two-dimensional data structures that allow storing and manipulating tabular data. DataFrames can be created from lists, dictionaries, NumPy arrays, or CSV files. They provide methods for viewing, selecting, and describing data.

Uploaded by

ammouna beng
Copyright
© All Rights Reserved

Introduction to dataframe

What’s pandas
Pandas is an open source package that provides numerous tools for data analysis. It
offers fast, flexible and expressive data structures that can be used for many
different data manipulation tasks.

To use pandas in your Python code, you first need to import the library:

import pandas as pd

Pandas data structures


The two primary data structures of pandas are:

1. Series: a one-dimensional labeled array. It can store data of any type. Its values are
mutable, and its length is usually treated as fixed.
2. DataFrame: a two-dimensional, size-mutable structure that stores and manipulates
tabular data in rows of observations and columns of variables.

How to create series


A series may be created from:
1. A NumPy array:

import numpy as np
import pandas as pd

array = np.array(["blue", "yellow", "pink", "purple"])  # build the array
series1 = pd.Series(array)  # create the Series from the array
print(series1)

2. A list:

values = [19, 175, 41, 22]  # named "values" to avoid shadowing the built-in list
series2 = pd.Series(values)  # create the Series from the list
print(series2)

https://www.tutorialspoint.com/python_pandas/python_pandas_series.htm

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.add.html

https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781787123137/3/ch03lvl1sec31/re-indexing-a-series

Series Changing Index


A big advantage we gain compared to NumPy arrays is that we can create a Series
using our own indexes.
For example:

import pandas as pd

color = ["pink", "white", "black", "blue"]
occurrence = [20, 15, 6, 43]
S = pd.Series(occurrence, index=color)  # use the colors as index labels
print(S)
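
As a small sketch of why custom indexes are useful, the labels can then be used to look values up directly. The variable names `blue_count` and `subset` are illustrative, not part of the original example.

```python
import pandas as pd

color = ["pink", "white", "black", "blue"]
occurrence = [20, 15, 6, 43]
s = pd.Series(occurrence, index=color)

# Look up a single value by its label
blue_count = s["blue"]

# Select several labels at once; the result is a new Series
subset = s[["pink", "black"]]

print(blue_count)
print(subset)
```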

Series addition
If we add two Series that share the same index, we get a new Series with that index,
in which the values for each label are added together:

import pandas as pd

color = ["pink", "white", "black", "blue"]
S1 = pd.Series([20, 15, 6, 43], index=color)
S2 = pd.Series([3, 22, 9, 10], index=color)
print(S1 + S2)  # values are added label by label
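
The example above uses identical indexes. As a hedged sketch of what happens otherwise, pandas matches values by label, so a label present in only one Series gives NaN with `+`. The `Series.add` method with `fill_value` treats the missing side as a chosen value instead. The data below is illustrative.

```python
import pandas as pd

s1 = pd.Series([20, 15, 6], index=["pink", "white", "black"])
s2 = pd.Series([3, 22, 10], index=["pink", "white", "blue"])

# Labels are aligned; "black" and "blue" exist on one side only, so they become NaN
total = s1 + s2

# Same addition, but a missing label counts as 0 instead of producing NaN
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```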

Dataframe Introduction
A DataFrame is a 2-dimensional labeled data structure with columns of potentially
different types.

• DataFrame columns are made up of pandas Series.

You can think of it like a spreadsheet, a SQL table, or a dictionary of Series.
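
To make the "dictionary of Series" picture concrete, here is a minimal sketch with illustrative data: a DataFrame is built from two Series, and selecting one column gives a Series back.

```python
import pandas as pd

names = pd.Series(["Jack", "Thomas", "Alexandre"])
ages = pd.Series([34, 30, 16])

# Each dictionary key becomes a column name, each Series a column
df = pd.DataFrame({"name": names, "age": ages})

# Selecting a single column returns a pandas Series
age_column = df["age"]

print(type(age_column))
print(df.dtypes)  # each column keeps its own type
```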

How to create DataFrame


1. From a list:

import pandas as pd

# named "rows" to avoid shadowing the built-in list
rows = [['Jack', 34, 'Paris'], ['Thomas', 30, 'Roma'],
        ['Alexandre', 16, 'New York']]
df = pd.DataFrame(rows, columns=['name', 'age', 'city'])

2. From a dictionary:

dictionary = {'name': ['Jack', 'Thomas', 'Alexandre'],
              'age': [34, 30, 16],
              'city': ['Paris', 'Roma', 'New York']}
df = pd.DataFrame(dictionary)
print(df)

3. From a NumPy array:

import numpy as np
import pandas as pd

my_numpy_array = np.random.randn(3, 4)  # 3 rows and 4 columns of random values
df = pd.DataFrame(my_numpy_array, columns=["a", "b", "c", "d"])
print(df)

4. From a CSV file:

Let's create a DataFrame from a CSV file whose fields are separated by semicolons:

import pandas as pd

df = pd.read_csv("csv file example", sep=";")  # "csv file example" stands for the path to the file
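
As a self-contained sketch of the `sep` parameter, the snippet below reads semicolon-separated text from memory instead of a file on disk. The CSV content is illustrative, and `io.StringIO` is only used here so the example runs without an external file.

```python
import io
import pandas as pd

# Illustrative CSV content; the first line holds the column names
csv_text = "name;age;city\nJack;34;Paris\nThomas;30;Roma\n"

# sep=";" tells pandas that fields are separated by semicolons, not commas
df = pd.read_csv(io.StringIO(csv_text), sep=";")

print(df)
```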

https://databricks.com/glossary/pandas-dataframe

Dataframe exportation to csv


1. We create a DataFrame from a dictionary.
2. We export it to a CSV file with to_csv().
3. We create a new DataFrame from that CSV file with read_csv().
4. We use the head() method to show the first 5 rows.
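
The four steps above can be sketched as follows. This is a minimal example under stated assumptions: the file name `people.csv` and the temporary directory are hypothetical, and `index=False` is a choice made here so the row index is not written as an extra column.

```python
import os
import tempfile
import pandas as pd

# Step 1: create a DataFrame from a dictionary
data = {"name": ["Jack", "Thomas", "Alexandre"],
        "age": [34, 30, 16],
        "city": ["Paris", "Roma", "New York"]}
df = pd.DataFrame(data)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "people.csv")  # hypothetical file name

    # Step 2: export the DataFrame to a CSV file
    df.to_csv(csv_path, index=False)

    # Step 3: create a new DataFrame from the CSV file
    df_from_csv = pd.read_csv(csv_path)

# Step 4: show the first 5 rows (here there are only 3)
print(df_from_csv.head())
```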
https://datatofish.com/export-dataframe-to-csv/

Getting information about dataframe


To show general information about the columns, such as their types and non-null counts, we write:

df.info()

• Viewing our data

df.head()  # show the first 5 rows of our data
df.tail()  # show the last 5 rows of our data

• Describing our data

Now, let's use the describe command to compute summary statistics for the numerical
columns.

df.describe()  # count, mean, std, min, quartiles and max of each numerical column
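
A short sketch of these inspection methods on an illustrative four-row DataFrame. It also shows two variations that the text does not cover: `head` and `tail` accept a number of rows, and `describe` can be called on a single column.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    "name": ["Jack", "Thomas", "Alexandre", "Anne"],
    "age": [34, 30, 16, 28],
})

df.info()  # prints column names, non-null counts and types

first_two = df.head(2)  # first 2 rows instead of the default 5
last_one = df.tail(1)   # last row only

# Summary statistics for one specific column
age_stats = df["age"].describe()
print(age_stats)
```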

https://note.nkmk.me/en/python-pandas-head-tail/

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html
Dataframe Bracket Selection
In this course, we will often have to select specific rows or columns from our
DataFrame.
One of the easiest ways to do that is to use brackets:
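
A minimal sketch of bracket selection, assuming a small illustrative DataFrame. What the brackets return depends on what is placed inside them.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jack", "Thomas", "Alexandre"],
                   "age": [34, 30, 16],
                   "city": ["Paris", "Roma", "New York"]})

ages = df["age"]                  # one column name -> a Series
name_city = df[["name", "city"]]  # list of column names -> a DataFrame
adults = df[df["age"] >= 18]      # boolean condition -> the matching rows
first_two = df[0:2]               # slice -> rows by position

print(adults)
```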

https://datatofish.com/select-rows-pandas-dataframe/

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

Dataframe loc/iloc
• Dataframe loc
The loc indexer extracts rows and columns by label. It is used with square brackets
rather than called like a method.

df.index = ["Jack", "Thomas", "Alexandre", "Anne"]  # assign row labels
df.loc[["Jack", "Thomas"]]  # select the rows labeled "Jack" and "Thomas"

• Dataframe iloc
The iloc indexer works like loc, but it selects rows and columns by integer position,
starting at 0. Unlike loc, the end of a slice is excluded.

df.iloc[:, 1:3]  # select the second and third columns, keeping all rows
print(df)  # df itself is unchanged by the selections above
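
The difference between the two indexers can be sketched as follows. The data is illustrative (the `age` and `city` values are assumptions made for this example). Note that a label slice with `loc` includes its end label, while a position slice with `iloc` excludes its end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [34, 30, 16, 28],
     "city": ["Paris", "Roma", "New York", "Lyon"]},
    index=["Jack", "Thomas", "Alexandre", "Anne"],
)

# loc: rows and columns by label
by_label = df.loc[["Jack", "Thomas"], "city"]
label_slice = df.loc["Jack":"Alexandre"]  # end label is included -> 3 rows

# iloc: rows and columns by integer position
by_position = df.iloc[0:2, 1]  # first two rows, second column
position_slice = df.iloc[0:2]  # end position is excluded -> 2 rows

print(by_label)
print(by_position)
```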
Setting index in dataframe
We can use the set_index() method if we want to replace the index with one or
more existing columns.
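
A minimal sketch of `set_index`, assuming an illustrative DataFrame whose `name` column becomes the new index. By default the method returns a new DataFrame and leaves the original one as it was.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jack", "Thomas", "Alexandre"],
                   "age": [34, 30, 16]})

print(df.index)  # old index: the default 0, 1, 2

# Use the "name" column as the index; df itself is not modified
indexed = df.set_index("name")

print(indexed.index)  # new index: the names

# The new labels can now be used with loc
thomas_age = indexed.loc["Thomas", "age"]
```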

[Figures: the DataFrame with its old index, then with its new index]

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html
Dataframe Concatenate
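
As a hedged sketch of concatenation with `pd.concat`, assuming small illustrative DataFrames: `axis=1` places DataFrames side by side (used here to add a hypothetical `country` column), while the default `axis=0` stacks rows.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jack", "Thomas", "Alexandre"],
                   "city": ["Paris", "Roma", "New York"]})

# Illustrative extra column, held in its own DataFrame
countries = pd.DataFrame({"country": ["France", "Italy", "USA"]})

# axis=1: concatenate side by side, matching rows by index
df_with_country = pd.concat([df, countries], axis=1)

# Default axis=0: stack rows; ignore_index=True renumbers the result
more_rows = pd.DataFrame({"name": ["Anne"], "city": ["Lyon"]})
stacked = pd.concat([df, more_rows], ignore_index=True)

print(df_with_country)
print(stacked)
```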

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

Dataframe drop
To drop specified labels from rows or columns, we simply use the drop() method.
For example, we want to delete the country column we added previously:

df.drop("country", axis=1)

The drop() method has inplace=False as its default, so it returns a new DataFrame and
leaves df unchanged: you can see that the country column is not gone. Take a break and
do some research on this parameter.
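
One way to see this behavior, sketched on an illustrative DataFrame: the first call discards the returned result, so the column stays, and the second call keeps the result by assigning it back.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jack", "Thomas"],
                   "country": ["France", "Italy"]})

# The result is returned, not stored: df still has the column
df.drop("country", axis=1)
still_there = "country" in df.columns

# Assign the returned DataFrame back to keep the change
df = df.drop("country", axis=1)

print(still_there)
print(df.columns.tolist())
```

Passing `inplace=True` modifies the DataFrame directly instead of returning a new one. Assigning the result back, as above, is the other option.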

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

Dataframe with Pandas Recap


• Pandas is an open source library that offers several tools for data analysis. It
provides fast, flexible and expressive data structures that can be used for numerous
data manipulation tasks.
• We can create a DataFrame from a list, a dictionary, a NumPy array or a CSV file.
• To export data to a CSV file: data.to_csv("file.csv")
• To see information about the columns: dataframe.info()
• To see summary statistics of the numerical columns: dataframe.describe()
• To view the first 5 rows: dataframe.head()
• To view the last 5 rows: dataframe.tail()
• There are multiple ways to select rows and columns from a pandas DataFrame. iloc
and loc are the main tools for retrieving data: iloc selects by integer position,
whereas loc selects by label.
• We can also select specific rows or columns using brackets.
