How To Use Pandas Library in Python

In our previous Python tutorial, we explained How To Use Lambda Function in Python. In this tutorial, we will explain how to use the Pandas library in Python.

Pandas and Python make data science and analytics easier and more effective. Pandas is an open source Python library for handling tabular data.

We will cover the following in this tutorial:

What is pandas?
What is data science or data analytics?
What Can Pandas Do?
Pandas installation
Pandas Series
Pandas DataFrames

What is Pandas?

Pandas is a Python library created by Wes McKinney in 2008. It was built mainly to help with working on datasets in Python for finance-related work.

Pandas is a widely used open source Python library for data science. It is built around a two-dimensional data structure called a DataFrame, which is similar to an Excel spreadsheet. It provides fast, flexible, easy-to-use data structures and data analysis tools. It is used for analyzing, exploring, and manipulating datasets, and for cleaning messy data so that it becomes readable and relevant, which matters because relevant data is important in data science.

What is data science or data analytics?

Data science or data analytics is the process of analyzing large sets of data points to get answers to questions about that data. Pandas is a Python library that makes this work easier and more effective.

What Can Pandas Do?

Pandas is designed to work with data sets. With Pandas, we can get the correlations between two or more columns. We can also get the average, maximum, and minimum values. We can also clean messy data sets and delete rows that are not relevant or that contain wrong values.
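
As a rough illustration of the operations described above, the sketch below builds a small made-up data set and computes a correlation, an average, the maximum and minimum, and then drops a row with a missing value. The column names 'hours' and 'score' and all values are assumptions used only for this example.

```python
import pandas as pd
import numpy as np

# Hypothetical sample data; the last row has a missing value (NaN)
df = pd.DataFrame({
    'hours': [1.0, 2.0, 3.0, 4.0, np.nan],
    'score': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Correlation between two columns (pairs containing NaN are ignored)
print(df['hours'].corr(df['score']))

# Average, maximum and minimum of a column
print(df['score'].mean())
print(df['score'].max())
print(df['score'].min())

# Remove rows that contain missing values
clean = df.dropna()
print(clean)
```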

Python pandas can be used for different kinds of data, such as:

Ordered and unordered data.
Unlabeled data.
Messy data sets.
Any type of observational or statistical data sets.

Pandas Installation

Now we will install the Pandas library. If Python and pip are already installed on a system, installing Pandas is very easy. You can install it using the command below:

pip install pandas

If the above command fails for any reason, you can use a Python distribution that already includes Pandas, such as Anaconda (which also ships with the Spyder IDE).
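
One simple way to check that the installation worked is to import the library and print its version. This is only a quick sanity check; the version number printed will depend on what is installed on your system.

```python
import pandas as pd

# If this import succeeds, Pandas is installed; print the installed version
print(pd.__version__)
```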

Operations on Pandas Series

A Pandas Series is a one-dimensional labeled array that can hold any type of data. A Series can be created using the following constructor:

pandas.Series(data, index, dtype, copy)

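Now we will import Pandas and create an empty Series. The dtype is passed explicitly because the default dtype of an empty Series differs between Pandas versions:

import pandas as pd
s = pd.Series(dtype='float64')
print(s)

The above will output the following: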

Series([], dtype: float64)

There are a number of ways to create a Pandas Series. We can use lists, arrays, and dictionaries. We will use each of these to create a Series.

Create Pandas Series from a Python List

Now we will create a Series by passing a Python list.

import pandas as pd
myList = [10, 20, 30, 40, 50]
s = pd.Series(myList)
print(s)

When we run the above code, it will output a Series like the one below:

0    10
1    20
2    30
3    40
4    50
dtype: int64

The output is returned as two columns. Because a Series allows labeling, the first column holds the labels and the second holds the data from the list.

We can add our own labels by passing a list of labels along with the list of data:

import pandas as pd
labels = ['a', 'b', 'c', 'd', 'e']
myList = [10, 20, 30, 40, 50]
s = pd.Series(myList, index=labels)
print(s)

When we run the above code, it will output a Series with labels and data like below:

a    10
b    20
c    30
d    40
e    50
dtype: int64

The main advantage of using labels is that they let us reference an element of the Series by its label instead of its numerical index.

Create Pandas Series from a Dictionary

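We can also pass in a dictionary to create a Pandas Series. The dictionary keys become the labels.

import pandas as pd
# Named data rather than dict to avoid shadowing the built-in dict type
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
s = pd.Series(data)
print(s)

When we run the above code, it will output a Series built from the dictionary, with labels and data like below: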

a    10
b    20
c    30
d    40
e    50
dtype: int64
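
When a dictionary is passed together with an index list, only the listed keys are used, in the order given. The sketch below shows this with a made-up label 'z' that is not in the dictionary; that entry comes out as NaN, which also makes the dtype float64.

```python
import pandas as pd

data = {'a': 10, 'b': 20, 'c': 30}

# Only keys listed in index are kept, in that order.
# 'z' (a hypothetical label) is not in the dictionary, so its value is NaN.
s = pd.Series(data, index=['c', 'a', 'z'])
print(s)
```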

Create Pandas Series from NumPy Arrays

We can pass a NumPy array to create a Pandas Series. Here we import the NumPy module, create an array, and then pass that array to the Series constructor.

import pandas as pd
import numpy as np
myArray = np.array([10, 20, 30, 40, 50])
s = pd.Series(myArray)
print(s)

When we run the above code, it will output a Series built from the NumPy array, like below. The dtype follows NumPy's default integer type for the platform, so it may show int64 instead of int32:

0    10
1    20
2    30
3    40
4    50
dtype: int32
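
The dtype in the output above comes from NumPy's default integer type, which can differ between platforms. If a consistent dtype matters, one option is to set it when creating the array, as in this small sketch. It also shows to_numpy(), which converts a Series back to a NumPy array.

```python
import pandas as pd
import numpy as np

# Fix the integer width explicitly instead of relying on the platform default
myArray = np.array([10, 20, 30, 40, 50], dtype=np.int64)
s = pd.Series(myArray)
print(s.dtype)

# Convert the Series back to a NumPy array
arr = s.to_numpy()
print(arr)
```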

Accessing Data From Pandas Series

We can access the data in a Series by using either the index number of an element or the label of an element.

Accessing Series Data By Using Index

Here we will access Series data by index:

import pandas as pd
import numpy as np

myArray = np.array([10, 20, 30, 40, 50])
s = pd.Series(myArray)

# With the default index, the labels are the integers 0 to 4
print(s[0])
print(s[4])

When we run the above code, it will output data like below:

10
50

Accessing Series Data By Using Label

Here we will access Series data by label.

import pandas as pd
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
s = pd.Series(data)
print(s['a'])
print(s['e'])

When we run the above code, it will output data like below:

10
50
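
Pandas also provides the loc and iloc indexers, which make it explicit whether a lookup is by label or by position. The following is a minimal sketch using the same kind of labeled Series as above.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Label-based access
print(s.loc['b'])

# Position-based access (0 is the first element)
print(s.iloc[0])

# Slices: loc includes the end label, iloc excludes the end position
print(s.loc['a':'c'])
print(s.iloc[0:2])
```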

Pandas DataFrame

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table, in which data is arranged in rows and columns. We can create a DataFrame using the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)

Now we will import Pandas and create an empty DataFrame:

import pandas as pd
df = pd.DataFrame()
print(df)

The above will output the following empty DataFrame:

Empty DataFrame
Columns: []
Index: []
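
To illustrate the constructor parameters listed earlier, here is a small sketch that passes data, index, columns, and dtype together. The row labels 'row1' and 'row2' and the column names 'x' and 'y' are made up for this example.

```python
import pandas as pd

# data: a list of rows; index: row labels; columns: column labels;
# dtype: the type applied to all values
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=['row1', 'row2'],
    columns=['x', 'y'],
    dtype='float64',
)
print(df)
```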

Create a DataFrame from Python List

We can create a DataFrame by passing a simple list of data.

import pandas as pd
dataList = [1, 2, 3, 4, 5]
df = pd.DataFrame(dataList)
print(df)

The above program will output the following DataFrame with default index and column labels:

   0
0  1
1  2
2  3
3  4
4  5

We can also pass a list of lists together with column names to create a DataFrame:

import pandas as pd

dataList = [['smith', 20, 'India'], ['william', 30, 'France'], ['steve', 40, 'Britain'], ['Andy', 35, 'Canada'], ['Gary', 50, 'USA']]

df = pd.DataFrame(dataList, columns=['Name', 'Age', 'Country'])

print(df)

The above program will output the following DataFrame with column labels and values:

      Name  Age  Country
0    smith   20    India
1  william   30   France
2    steve   40  Britain
3     Andy   35   Canada
4     Gary   50      USA
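
Another common way to build the same kind of table is a dictionary of lists, where each key becomes a column name. This is a minimal sketch that reuses a few of the values from the example above.

```python
import pandas as pd

# Each key becomes a column; all lists must have the same length
data = {
    'Name': ['smith', 'william', 'steve'],
    'Age': [20, 30, 40],
    'Country': ['India', 'France', 'Britain'],
}
df = pd.DataFrame(data)
print(df)
```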

Creating a DataFrame from a Dictionary of Series

We can also create a DataFrame by passing a dictionary of Series. Each key becomes a column name and each Series becomes that column's data.

import pandas as pd

data = {'India': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']), 'Japan': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])}

df = pd.DataFrame(data)
print(df)

The above program will output the following DataFrame:

   India  Japan
a      1      1
b      2      2
c      3      3
d      4      4
e      5      5
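
When the Series in the dictionary do not share the same labels, Pandas aligns them on the union of their indexes and fills the gaps with NaN. The sketch below uses shortened, made-up data to show this.

```python
import pandas as pd

data = {
    'India': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'Japan': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(data)

# Row 'd' has no value in 'India', so that cell is NaN
print(df)
```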

Accessing Column

We can access a particular column by its name. Here we get a single column of the DataFrame, which is returned as a Series.

import pandas as pd

data = {'India': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']), 'Japan': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])}

df = pd.DataFrame(data)
print(df['Japan'])

The above program will output the following Series:

a    1
b    2
c    3
d    4
e    5
Name: Japan, dtype: int64
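
To select more than one column at a time, a list of column names can be passed instead of a single name. In that case the result is a DataFrame rather than a Series, as this small sketch with made-up values shows.

```python
import pandas as pd

df = pd.DataFrame({
    'India': [1, 2, 3],
    'Japan': [4, 5, 6],
    'France': [7, 8, 9],
})

one = df['Japan']              # single name -> Series
two = df[['India', 'Japan']]   # list of names -> DataFrame

print(type(one).__name__)
print(type(two).__name__)
print(two)
```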

Adding a New Column

We can add a new column to a DataFrame by assigning a Series to a new column name.

import pandas as pd

data = {'India': pd.Series([7, 9, 13, 15, 35], index=['a', 'b', 'c', 'd', 'e']), 'Japan': pd.Series([5, 10, 15, 20, 25], index=['a', 'b', 'c', 'd', 'e'])}

df = pd.DataFrame(data)

# Add a column; values are matched to rows by index label
df['France'] = pd.Series([10, 20, 30, 40, 50, 60, 70], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

print(df)

The above program will output the following DataFrame after adding the new column. Because the assignment aligns on index labels, the values labeled 'f' and 'g' have no matching row in the DataFrame and are left out:

   India  Japan  France
a      7      5      10
b      9     10      20
c     13     15      30
d     15     20      40
e     35     25      50
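
A new column can also be computed from existing columns. The sketch below adds a column that is the element-wise sum of two others; the column name 'Total' is a hypothetical name chosen for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {'India': [7, 9, 13], 'Japan': [5, 10, 15]},
    index=['a', 'b', 'c'],
)

# Element-wise sum of two existing columns ('Total' is a made-up name)
df['Total'] = df['India'] + df['Japan']
print(df)
```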

Delete Column

We can delete a column from a DataFrame using the del statement or the pop() method.

Here we delete one column with del and another with pop():

import pandas as pd

data = {'India': pd.Series([7, 9, 13, 15, 35], index=['a', 'b', 'c', 'd', 'e']), 'Japan': pd.Series([5, 10, 15, 20, 25], index=['a', 'b', 'c', 'd', 'e']), 'France': pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])}

df = pd.DataFrame(data)

# Delete a column using the del statement
del df['France']

# Delete a column using the pop() method, which also returns the removed column
df.pop('Japan')

print(df)

The above program will output the following DataFrame after deleting the two columns:

   India
a      7
b      9
c     13
d     15
e     35
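
Both del and pop() change the DataFrame in place. If the original should stay untouched, the drop() method is an alternative that returns a new DataFrame. The sketch below, with made-up values, contrasts drop() with pop().

```python
import pandas as pd

df = pd.DataFrame({
    'India': [7, 9, 13],
    'Japan': [5, 10, 15],
    'France': [10, 20, 30],
})

# drop() returns a new DataFrame; df itself still has all three columns
without_france = df.drop(columns=['France'])
print(list(without_france.columns))
print(list(df.columns))

# pop() removes the column from df and returns it as a Series
japan = df.pop('Japan')
print(japan.tolist())
print(list(df.columns))
```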

Indexing a DataFrame

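We can do integer-based (positional) indexing on a DataFrame using the iloc indexer, which takes square brackets rather than parentheses.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(6, 5), columns=['A', 'B', 'C', 'D', 'E'])

# Select the first five rows by position
print(df.iloc[:5])

The above program will output something like the following. The values are random, so they will differ on each run: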

          A         B         C         D         E
0 -0.469348 -0.596175  0.086608  0.651538 -1.191260
1 -0.664254 -0.901478  0.623666 -0.205776 -0.034960
2  1.349643 -1.349104 -0.757116  0.387509  1.166415
3 -2.437482 -0.006055 -0.682298 -0.039461  0.069462
4 -0.038990  0.048944  2.251811  0.353188 -1.451316
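
The iloc indexer can also select a single row, a block of rows and columns, or one cell. The following sketch uses fixed values instead of random ones so the results are easy to follow.

```python
import pandas as pd
import numpy as np

# Fixed values 0..11 arranged in 4 rows and 3 columns
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['A', 'B', 'C'])

# One row by position -> Series
first_row = df.iloc[0]
print(first_row)

# Row positions 1-2 and column positions 0-1 (end positions are excluded)
block = df.iloc[1:3, 0:2]
print(block)

# A single cell: row position 2, column position 1
cell = df.iloc[2, 1]
print(cell)
```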

Conclusion

In this tutorial, we covered Python Pandas and how to work with Pandas Series and DataFrames. We will try to cover more Pandas functions in other tutorials. If you have any questions or comments, you can post them in the comments section.