Pandas

Series is like a column (has only one dimension), and a DataFrame (has two dimensions) is like a table or a spreadsheet with rows and columns

Create panda series

Method	Syntax	Description	Example
From a list	pd.Series(data=list)	Series from a Python list	data = [1, 2, 3] s = pd.Series(data) pd.Series([2]*5)
From a NumPy array	pd.Series(data=np.array)	Series from a NumPy array	data = np.array([1, 2, 3]) s = pd.Series(data)
From a dictionary	pd.Series(data=dict)	Series from a dictionary (keys become index)	data = {‘a’: 1, ‘b’: 2, ‘c’: 3} s = pd.Series(data)
From a scalar value	pd.Series(data=scalar, index=index)	Series with a single value and specified index	s = pd.Series([5,6,7], index=[‘a’, ‘b’, ‘c’])
String			pd.Series(list(“String”))
arrange			pd.Series(np.arange(1,2))
random			pd.Series(np.random.normal(size=3))

Panda series operations

Method	Description	Example
Length	Length	x=pd.Series(np.arange(1,10)) len(x)
Shape	Size	x.shape (9,)
Values	Values	x.values
Index	Access index	x.index o/p: RangeIndex(start=0, stop=9, step=1)
Unique values	Value counts	z.value_counts() z.value_counts().head(2) z.value_counts().head(2).values

DataFrames

Series object contains two components – a set of values, and index labels

DataFrame has three components – column, index, and a NumPy array object that contains the values

Method	Syntax	Description	Example
From a dictionary	pd.DataFrame(data=dict)	from a dictionary	data = {‘col1’: [1, 2], ‘col2’: [3, 4]} df = pd.DataFrame(data)
From a list of lists	pd.DataFrame(data=list_of_lists, columns=column_names)	from a list of lists	data = [[1, 2], [3, 4]] df = pd.DataFrame(data, columns=[‘col1’, ‘col2’])
From a NumPy array	pd.DataFrame(data=np_array, columns=column_names)	from a NumPy array	data = np.array([[1, 2], [3, 4]]) df = pd.DataFrame(data, columns=[‘col1’, ‘col2’]) pd.DataFrame(np.arange(1,9).reshape(2,4))
Tuples			combined_ages=pd.DataFrame([(22,24,20),(40,50,45)],columns=[‘class 1′,’class 2′,’class 3’])
From a Series	pd.DataFrame(data=series)	from a Series (single column)	s = pd.Series([1, 2, 3]) df = pd.DataFrame(s, columns=[‘col1’])
Combine series			student_ages=pd.Series([22,24,20]) #series 1 teacher_ages=pd.Series([40,50,45])#series 2 combined_ages=pd.DataFrame([student_ages, teacher_ages]) #DataFrame combined_ages.columns=[‘class 1′,’class 2’, ‘class 3’]#naming columns
From a CSV file	pd.read_csv(‘file.csv’)	Reads data from a CSV	df = pd.read_csv(‘data.csv’)
From an Excel file	pd.read_excel(‘file.xlsx’)	Reads data from an Excel	df = pd.read_excel(‘data.xlsx’)
Json, html			df = pd.read_json(‘data.json’) url=”https://www.@.com” table=pd.read_html(url) table[0]

Accessing attributes in a DataFrame

a_ages=pd.DataFrame({c1′:[10,2],c2′:[2,6],c3′:[1,8]})

index attribute: a.index
columns attribute: a.columns
value: a.values

Modifying Dataframe

Operation	Syntax	Example
Adding a column	df[‘new_column’] = values	df = pd.DataFrame({‘A’: [1, 2, 3]}) df[‘B’] = [4, 5, 6]df.insert(2,’c0′,[18,35]) #here 2 is column numberdf.loc[:,’c4]=[20,40] #add new column at enddf = pd.Series([1,2]) df=pd.concat([df,C5],axis=1)
Adding a row	df.loc[index] = values	df.loc[3] = [7, 8]df=df.append({‘C1′:11,’C2′: 22,’C3’:33},ignore_index=True)new_row=pd.DataFrame([{‘C1′:11,’C2′: 22,’C3’:33}]); pd.concat([df,new_row])
Removing a column	del df[‘column_name’] or df.drop(‘column_name’, axis=1)	del df[‘B’]df.ages.pop(‘C2′)df.drop([‘C1’],axis=1,inplace=True)
Removing a row	df.drop(index, axis=0)	df.drop(2, axis=0)df[~(df.values<10)] #using boolean selection
Renaming columns	df.rename(columns={‘old_name’: ‘new_name’})	df.rename(columns={‘A’: ‘ColumnA’}, inplace=True)

Modifying values	df.loc[row_index, column_name] = new_value	df.loc[0, ‘A’] = 10
Filtering rows	df[condition]	filtered_df = df[df[‘A’] > 2]
Sorting DataFrame	df.sort_values(by=’column_name’)	df.sort_values(by=’A’, ascending=False)
Replace		df.replace({8:15})

Indexing

Operation	Description	Syntax	Example
Creating a Custom Index	Sets a new index for a DataFrame	df.set_index(‘column_name’, inplace=True)	df.set_index(‘Name’, inplace=True)
Searching Using an Index	Accesses data based on the index	df.loc[index_value]	df.loc[‘Alice’]
Searching Without Using an Index	Accesses data based on position	df.iloc[position]	df.iloc[0]
Immutability of an Index	Modifying the index creates a new DataFrame	df.index = new_index	df.index = df.index + 1
Alignment of Indexes	Aligns DataFrames based on their indexes	df1.align(df2, join=’inner’, axis=0)	df1.align(df2, join=’outer’, axis=1)
Set Operations on Indexes	Performs set operations on indexes	df1.index.union(df2.index)	df1.index.intersection(df2.index)
Symmetric Difference Operation	Returns elements in either set but not both	df1.index.symmetric_difference(df2.index)	df1.index ^ df2.index

import pandas as pd

data = {‘Name’: [‘Alice’, ‘Bob’, ‘Charlie’],

‘Age’: [25, 30, 28],

‘City’: [‘New York’, ‘Los Angeles’, ‘Chicago’]}

df = pd.DataFrame(data)

	Name	Age	City
0	Alice	25	New York
1	Bob	30	Los Angeles
2	Charlie	28	Chicago

# Creating a custom index

df.set_index(‘Name’, inplace=True)

	Age	City
Name
Alice	25	New York
Bob	30	Los Angeles
Charlie	28	Chicago

# Searching using the index

print(df.loc[‘Alice’])

	Age	City
Alice	25	New York

# Searching without using the index

print(df.iloc[1])

	Age	City
Bob	30	Los Angeles

# Immutability of index

new_index = df.index + ‘_new’

df_new.index = new_index

print(df_new)

	Age	City
Name
Alice_new	25	New York
Bob_new	30	Los Angeles
Charlie_new	28	Chicago

# Alignment of indexes

# Create two DataFrames with different indexes

df1 = pd.DataFrame({‘A’: [1, 2], ‘B’: [3, 4]}, index=[‘x’, ‘y’])

A B
x 1 3
y 2 4

df2 = pd.DataFrame({‘C’: [5, 6], ‘D’: [7, 8]}, index=[‘x’, ‘z’])

C D
x 5 7
z 6 8

# Inner join

print(df1.join(df2, how=’inner’))

A B C D
x 1 3 5 7

# Outer join

print(df1.join(df2, how=’outer’))

A B C D
x 1.0 3.0 5.0 7.0
y 2.0 4.0 NaN NaN
z NaN NaN 6.0 8.0

# Left join

print(df1.join(df2, how=’left’))

A B C D
x 1 3 5.0 7.0
y 2 4 NaN NaN

# Right join

print(df1.join(df2, how=’right’))

A B C D
x 1.0 3.0 5 7
z NaN NaN 6 8

# Set operations on indexes

# Create two DataFrames with different indexes

df1 = pd.DataFrame({‘A’: [1, 2, 3]}, index=[‘x’, ‘y’, ‘z’])

df2 = pd.DataFrame({‘B’: [4, 5]}, index=[‘x’, ‘w’])

print(df1,”\n”,df2)

A
x 1
y 2
z 3

B
x 4
w 5

# Perform set operations on indexes

union_index = df1.index.union(df2.index)

intersection_index = df1.index.intersection(df2.index)

difference_index = df1.index.difference(df2.index)

symmetric_difference_index = df1.index.symmetric_difference(df2.index)

print(“Union:”, union_index)

Union: Index([‘w’, ‘x’, ‘y’, ‘z’], dtype=’object’)

print(“Intersection:”, intersection_index)

Intersection: Index([‘x’], dtype=’object’)

print(“Difference:”, difference_index)

Difference: Index([‘y’, ‘z’], dtype=’object’)

print(“Symmetric Difference:”, symmetric_difference_index)

Symmetric Difference: Index([‘w’, ‘y’, ‘z’], dtype=’object’)

Group by and Aggregation

grouped.agg(func, axis=0)

Operation	Description	Syntax	Example
Group by one column	Group data by a single column	df.groupby(‘column_name’)	df.groupby(‘Department’)
Group by multiple columns	Group data by multiple columns	df.groupby([‘column1’, ‘column2’])	df.groupby([‘Department’, ‘Location’])
Apply aggregation function	Apply a function to each group	grouped.agg(func)	grouped.agg(‘mean’)
Apply multiple aggregation functions	Apply multiple functions to each group	grouped.agg([‘sum’, ‘mean’])	grouped.agg({‘Sales’: [‘sum’, ‘mean’], ‘Profit’: ‘max’})

Aggregate functions

sum(): Calculates the sum.
mean(): Calculates the mean.
min(): Calculates the minimum.
max(): Calculates the maximum.
count(): Counts the number of non-null values.
std(): Calculates the standard deviation.
var(): Calculates the variance.
first(): Returns the first value.
last(): Returns the last value.
describe(): Generates descriptive statistics.

Operation	Description	Syntax	Example
get_group()	Retrieves a specific group from a GroupBy object	grouped.get_group(name)	grouped.get_group(‘Sales’)
Filtering groups	Filters groups based on conditions	grouped.filter(lambda x: condition)	grouped.filter(lambda x: x[‘Salary’].mean() > 50000)
.transform()	Applies a function to each group and returns a Series or DataFrame with the same shape as the original groups	grouped.transform(func)	grouped[‘Salary’].transform(lambda x: x – x.mean())
.apply()	Applies a function to each group and returns a DataFrame with the transformed values	grouped.apply(func)	grouped.apply(lambda x: x.max() – x.min())

Operation	Description	Syntax	Example
.append()	Appends rows from another DataFrame or Series	df.append(other, ignore_index=False, verify_integrity=False)	df = df.append(new_data, ignore_index=True)
pd.concat()	Concatenates objects along a particular axis	pd.concat(objs, axis=0, join=’outer’, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)	combined_df 1. stackoverflow.com stackoverflow.com = pd.concat([df1, df2], axis=1)
pd.join()	Joins columns of two DataFrames based on their indexes	df1.join(df2, how=’inner’, lsuffix=”, rsuffix=”, sort=False)	merged_df = df1.join(df2, how=’left’)
pd.merge()	Merges DataFrames based on one or more columns	pd.merge(left, right, how=’inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=(‘_x’, ‘_y’), copy=True, indicator=False, validate=None)	merged_df = pd.merge(df1, df2, on=’key_column’, how=’outer’)

data = {‘Department’: [‘Sales’, ‘Marketing’, ‘Sales’, ‘Marketing’, ‘HR’],

‘Salary’: [50000, 60000, 45000, 55000, 35000]}

df = pd.DataFrame(data)

	Department	Salary
0	Sales	50000
1	Marketing	60000
2	Sales	45000
3	Marketing	55000
4	HR	35000

# Groupby and get a group

grouped = df.groupby(‘Department’)

sales_group = grouped.get_group(‘Sales’)

	Department	Salary
0	Sales	50000
2	Sales	45000

# Filter groups

high_salary_groups = grouped.filter(lambda x: x[‘Salary’].mean() > 50000)

	Department	Salary
1	Marketing	60000
3	Marketing	55000

# Transform and apply

salary_diff = grouped[‘Salary’].transform(lambda x: x – x.mean())

summary_stats = grouped.apply(lambda x: pd.Series({‘min’: x.min(), ‘max’: x.max()}))

	min	max
Department			Department
HR	Department HR Salary 35000 dtype:…	Department HR Salary 35000 dtype:…	HR
Marketing	Department Marketing Salary 5500…	Department Marketing Salary 6000…	Marketing
Sales	Department Sales Salary 45000 dtype:…	Department Sales Salary 50000 dtype:…	Sales

# Append, concat (use concat instead of append)

new_data = {‘Department’: [‘IT’], ‘Salary’: [65000]}

df_new = pd.DataFrame(new_data)

df_appended = pd.concat([df, df_new], ignore_index=True)

	Department	Salary
0	Sales	50000
1	Marketing	60000
2	Sales	45000
3	Marketing	55000
4	HR	35000
5	IT	65000

df1 = pd.DataFrame({‘A’: [1, 2], ‘B’: [3, 4]}, index=[‘x’, ‘y’])

df2 = pd.DataFrame({‘C’: [5, 6], ‘D’: [7, 8]}, index=[‘x’, ‘z’])

A B

x 1 3

y 2 4

C D

x 5 7

z 6 8

df_joined = df1.join(df2, how=’outer’)

A B C D

x 1.0 3.0 5.0 7.0

y 2.0 4.0 NaN NaN

z NaN NaN 6.0 8.0

df_merged = pd.merge(df1, df2, left_index=True, right_index=True, how=’inner’)

A B C D

x 1 3 5 7

# Pivot table

pivot_table = df.pivot_table(index=’Department’, values=’Salary’, aggfunc=’mean’)

print(pivot_table)

Pandas (Panel Data)

Pandas

Venu Kumar M

Pandas

Venu Kumar M

Related Posts

URL Navigator Chrome Extension

Ease of doing business in India

A study on Supply Chain Management (SCM) of Pharmaceutical Industry

Trending now