
pandas_merged

Uploaded by

shreeja471
Copyright
© All Rights Reserved

Pandas Cheat Sheet

Import convention
>>> import pandas as pd

Creating data

Creating Series
>>> s = pd.Series([3, -5, 7, 4],
        index=['a', 'b', 'c', 'd'])

Creating Dataframe
>>> df = pd.DataFrame(
        {"name" : ["Ram", "Rahul", "Ravi"],
         "age" : [51, 28, 19],
         "weight" : [69.3, 44.6, 36.9]})
>>> p_df = pd.DataFrame(
        {"name" : ["Jack Smith", "Jane Lodge"],
         "place" : ["HYD", "DEL"]})

Loading Data
>>> df = pd.read_csv('data.csv')
- Loads the data from a CSV file into a DataFrame.

Updating Rows/Columns
>>> df.rename(columns={'age':'Age'})
- Renames the given columns.
>>> df.replace(to_replace=[51, 69.3], value=58)
- Replaces the values 51 and 69.3 with 58 in the whole dataframe.
>>> df.loc[2, ['age', 'weight']] = [35, 89.1]
- Updates row values at the given columns.
>>> df['age'].replace({51:58})
- Replaces the value 51 in the age column with 58.
>>> df["age"].apply(lambda x: x + 5)
- Updates the column values as per the lambda function.
>>> p_df[['first', 'last']] = p_df['name'].str.split(' ', expand=True)
- Splits a column into several columns.
>>> df.apply(max)
- Applies the given function to each column of the dataframe.
>>> p_df.applymap(str.lower)
- Applies the function to every element. Deprecated since pandas 2.1 in favor of DataFrame.map.
>>> df['name'].map({'Rahul':'Raghu'})
- Maps values of the Series according to the input correspondence; values without a match become NaN.
>>> p_df.append(
        {'name':'Jim lake',
         'first':'Jim',
         'last':'lake'}, ignore_index=True)
- Appends rows to p_df and returns a new object. DataFrame.append was removed in pandas 2.0; on current versions use pd.concat instead.
>>> df2 = pd.DataFrame(
        {"place" : ["HYD", "DEL"],
         "state" : ["TEL", "UP"]})
>>> p_df.merge(df2, on="place")
- Merges DataFrame or Series objects, similar to a SQL join operation.
>>> age_df = pd.DataFrame(
        {"age": [35, 17]})
>>> pd.concat([p_df, age_df], axis=1)
- Concatenates pandas objects along an axis.
>>> p_df.drop(labels='last', axis='columns')
- Removes rows or columns by specifying label names and the corresponding axis.
>>> df.sort_values(by='age')
- Sorts by the values of the given column.
>>> df['age'].nlargest(2)
- Returns the 2 largest values of the column, in descending order.
>>> df['age'].nsmallest(2)
- Returns the 2 smallest values of the column, in ascending order.
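The split, append and merge entries can be tried together in one short script. This is only a sketch: it reuses the sample frames from the cheat sheet, and it swaps `DataFrame.append` (removed in pandas 2.0) for `pd.concat`. The names `new_row`, `appended` and `merged` are illustrative and not part of the original sheet.

```python
import pandas as pd

# Sample frames copied from the cheat sheet.
p_df = pd.DataFrame({"name": ["Jack Smith", "Jane Lodge"],
                     "place": ["HYD", "DEL"]})
df2 = pd.DataFrame({"place": ["HYD", "DEL"],
                    "state": ["TEL", "UP"]})

# Split the full name into two new columns.
p_df[["first", "last"]] = p_df["name"].str.split(" ", expand=True)

# Add one row via pd.concat; 'place' is missing in the new row, so it becomes NaN.
new_row = {"name": "Jim lake", "first": "Jim", "last": "lake"}
appended = pd.concat([p_df, pd.DataFrame([new_row])], ignore_index=True)

# SQL-style inner join on the shared 'place' column.
merged = p_df.merge(df2, on="place")

print(appended)
print(merged)
```

Both `pd.concat` and `merge` return new objects, so `p_df` itself still has two rows afterwards.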

Properties of Dataframe
>>> df.head(n)        First n rows
>>> df.tail(n)        Last n rows
>>> df.shape          Shape of df as (rows, columns)
>>> df.columns        Column labels
>>> df.dtypes         Datatypes of columns
>>> df.describe()     Summary statistics

Accessing Data
>>> df.loc[0]                              Row by label
>>> df.loc[[0, 2], ['age', 'weight']]      Group of rows and columns by label(s)
>>> df.iloc[[0, 1]]                        Group of rows and columns by integer position
>>> df.at[1, 'weight']                     Single value for a row-column label pair

Filtering Based on Criteria
>>> df[df['age'] > 50]                       Extracts rows that meet logical criteria
>>> df.query('age < weight & age >= 11')     DataFrame resulting from the provided query expression
>>> mask = df['name'].str.contains('Rah')    Boolean Series marking the names that contain 'Rah'
>>> df[mask]                                 Rows for which the mask is True
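As a rough illustration of the access and filtering calls, the following sketch runs them against the three-row sample `df` from the cheat sheet. The variable names are chosen here for readability and are not from the original.

```python
import pandas as pd

# Sample frame copied from the cheat sheet.
df = pd.DataFrame({"name": ["Ram", "Rahul", "Ravi"],
                   "age": [51, 28, 19],
                   "weight": [69.3, 44.6, 36.9]})

by_label = df.loc[[0, 2], ["age", "weight"]]   # rows labelled 0 and 2, two columns
by_position = df.iloc[[0, 1]]                  # first two rows by integer position
single = df.at[1, "weight"]                    # one scalar value

older = df[df["age"] > 50]                     # boolean-mask filtering
queried = df.query("age < weight and age >= 11")
rah = df[df["name"].str.contains("Rah")]       # substring match on a string column

print(older)
print(rah)
```

With the default integer index, `loc` and `iloc` select the same rows; they differ once the index is changed, for example after `set_index('name')`.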
Display Options
>>> pd.set_option('display.max_rows', n)
- Sets the max visible rows for a dataframe.
>>> pd.reset_option('display')
- Resets all the display options.

Changing the Index
>>> df.set_index('name')
- Sets the index to the 'name' column.
>>> df.reset_index()
- Resets the index of df and uses the default one.
>>> df.sort_index(axis=0)
- Sorts the object by labels (along an axis).
>>> pd.read_csv('data.csv', index_col='column_name')
- Sets the index while reading the CSV file.

Grouping and Aggregation
>>> df.groupby(by=['age', 'name'])
- Returns a GroupBy object grouped by the values in the given columns.
>>> df['name'].value_counts()
- Counts the number of times each value is repeated.
>>> df.groupby('name')['age'].mean()
- Splits into groups based on 'name' and computes the mean of 'age' for each group.
>>> df.count()
- Counts non-NA cells for each column or row.
>>> df['age'].min()
- Returns the minimum of the values.
>>> df[['age', 'weight']].aggregate(['sum', 'min', 'mean'])
- Aggregates the numeric columns using the functions 'sum', 'min', 'mean'.
>>> df['age'].cumsum()
- Returns the cumulative sum of a Series or DataFrame.

Cleaning Data

Handling Missing Values
>>> import numpy as np
>>> nan_df = pd.DataFrame({
        "A" : [1.0, -3.0, 1.0],
        "B" : [1.0, np.nan, 1.0],
        "C" : [3.0, -2.0, 3.0],
        "D" : [1.0, -3.0, 1.0]})
>>> nan_df.isna()
- Returns a boolean object of the same size indicating whether the values are NA.
>>> nan_df.fillna(2)
- Fills NA/NaN values with the given value.
>>> nan_df.dropna()
- Returns a DataFrame with the rows containing NaN entries dropped.
>>> nan_df.replace('NA', np.nan, inplace=True)
- Handles other missing-value markers, here the string 'NA', by replacing them with NaN.
>>> df.first_valid_index()
- Index of the first non-NA/null value.

Handling Duplicates
>>> nan_df.duplicated()
- Returns a boolean Series marking each duplicated row.
>>> nan_df.drop_duplicates()
- Returns a dataframe with the duplicated rows removed.

Changing Datatypes
>>> df['weight'].astype('int64')
- Converts the 'weight' column to integers, truncating the fractional part.
>>> df.astype('string')
- Converts every element in the df to string.
>>> data = pd.DataFrame(
        {'year': [2015, 2016],
         'month': [2, 3], 'day': [4, 5]})
>>> datetime_df = pd.to_datetime(data)
- Converts into datetime datatype.
>>> datetime_df.dt.month
- Returns the months in the timestamps.
>>> datetime_df.dt.year
- Returns the years in the timestamps.
>>> datetime_df.dt.day_name()
- Returns the weekday names in the timestamps.
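The cleaning entries are easier to follow when run end to end. The sketch below rebuilds `nan_df` and the date parts frame from the cheat sheet and applies the missing-value, duplicate and datetime calls in turn. The result names (`filled`, `dropped`, `dupes`, `unique_rows`, `dates`) are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the cheat sheet: row 1 has a NaN, row 2 repeats row 0.
nan_df = pd.DataFrame({"A": [1.0, -3.0, 1.0],
                       "B": [1.0, np.nan, 1.0],
                       "C": [3.0, -2.0, 3.0],
                       "D": [1.0, -3.0, 1.0]})

filled = nan_df.fillna(2)               # the NaN in column B becomes 2.0
dropped = nan_df.dropna()               # the row containing NaN is removed
dupes = nan_df.duplicated()             # True for rows that repeat an earlier row
unique_rows = nan_df.drop_duplicates()  # keeps the first occurrence of each row

# Assemble datetimes from separate year/month/day columns.
data = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})
dates = pd.to_datetime(data)            # a Series of timestamps, not a DataFrame

print(dupes.tolist())
print(dates.dt.month.tolist())
print(dates.dt.day_name().tolist())
```

None of these calls modifies `nan_df` in place; each returns a new object unless `inplace=True` is passed where supported.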
