Data slicing or indexing in Python on datasets.

Kishan Tongrao
Feb 10, 2020

Hey guys, this part is very basic but also very important. Before performing any operation on a dataset, we should know the rules of indexing and how indexing works in Python.

Here we use the Anaconda distribution (with its Spyder IDE) to work on a dataset stored in a .csv file. Data slicing lets us operate on a limited part of the dataset instead of the whole thing.

Import library to load a dataset

Library

When working with a .csv file, we use the pandas library to import the dataset into Spyder.

Import dataset into Spyder

Import dataset

To import a .csv file, we use pd.read_csv('filename'). It returns a DataFrame, which we assign to a variable named dataset.
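A minimal sketch of the import step. Since we don't have the author's file, the CSV contents below are hypothetical and are read from an in-memory string with io.StringIO; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the article's file.
# With a real file you would write: dataset = pd.read_csv("your_file.csv")
csv_text = """Country,Age,Salary
France,44,72000
Spain,27,48000
Germany,30,54000
"""

# read_csv parses the text and returns a DataFrame
dataset = pd.read_csv(io.StringIO(csv_text))
print(type(dataset))
print(dataset)
```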

Dataset in Spyder

Above is our dataset, which contains some random data. We perform all the data slicing operations below on this dataset.

Assign the whole dataset to a new variable

There are three ways to do this:

1: Using ‘=’

2: Using .copy() method

3: Using [:]

Before going on, we should know how to check the memory address of a dataset in Python. We use hex(id(dataset)): id() returns the object's identity (in CPython, its memory address), and hex() displays it in hexadecimal.

#1: Here we use '=' directly to perform the assignment, as follows.

Using ‘=’

But the fun part is the result.

‘=’ Output

We see an amazing result: the addresses are the same. So dataset1 and dataset point to the same object in memory.
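A minimal sketch of the '=' case, assuming a small made-up DataFrame in place of the article's dataset (the column names and values are hypothetical). Besides comparing addresses, it shows the practical consequence: a change made through one name is visible through the other.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the article's dataset
dataset = pd.DataFrame({"Age": [44, 27, 30], "Salary": [72000, 48000, 54000]})

# '=' copies nothing: it binds a second name to the same object
dataset1 = dataset
print(hex(id(dataset)), hex(id(dataset1)))  # the two addresses match
print(dataset1 is dataset)

# Both names refer to one object, so a change made through dataset1
# is also visible through dataset
dataset1.loc[0, "Age"] = 99
print(dataset.loc[0, "Age"])
```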

#2: Here we use the .copy() method to copy all the data from the dataset variable into another variable.

.copy() method

And again, we see an interesting result.

.copy() result

After assigning with the .copy() method, we can see that the addresses are different. So dataset and dataset2 do not share the same memory reference: dataset2 is a separate object.
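A minimal sketch of the .copy() case, again using a hypothetical stand-in DataFrame. .copy() performs a deep copy by default (deep=True), so modifying the copy leaves the original unchanged.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the article's dataset
dataset = pd.DataFrame({"Age": [44, 27, 30], "Salary": [72000, 48000, 54000]})

# .copy() (deep=True by default) builds a new DataFrame with its own data
dataset2 = dataset.copy()
print(hex(id(dataset)), hex(id(dataset2)))  # different addresses
print(dataset2 is dataset)

# Changing the copy does not touch the original
dataset2.loc[0, "Age"] = 99
print(dataset.loc[0, "Age"], dataset2.loc[0, "Age"])
```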

#3: Here we use slicing with [:], a common way to copy Python lists, to assign the dataset variable to another variable.

[:]

And of course, the result.

[:] result

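Here we see an interesting result: the addresses from .copy() and [:] are both different from the original. So '=' does not create a new dataset at all. It only gives the same DataFrame a second name, which is why it uses no extra memory, but any change made through dataset1 also changes dataset. If you need an independent dataset that you can modify safely, use .copy(). [:] also returns a new DataFrame object, but depending on the pandas version it may share the underlying data with the original, so .copy() is the safer choice.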
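A minimal sketch of the [:] case with a hypothetical stand-in DataFrame. It only checks what holds across pandas versions (a new object with equal contents) and deliberately does not rely on whether the slice shares data with the original, since that behavior changes with pandas' Copy-on-Write mode.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the article's dataset
dataset = pd.DataFrame({"Age": [44, 27, 30], "Salary": [72000, 48000, 54000]})

# [:] selects every row and returns a new DataFrame object
dataset3 = dataset[:]
print(hex(id(dataset)), hex(id(dataset3)))  # different addresses
print(dataset3 is dataset)
print(dataset3.equals(dataset))  # same labels and values

# Whether dataset3 shares its underlying data with dataset depends on the
# pandas version, so make an explicit copy when you plan to modify it
independent = dataset.copy()
```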

Row operations

We saw how to copy or assign the whole dataset to another dataset variable. Now it's time to perform some slicing on the rows of the dataset.

To print a particular column from the data, we might first try the following.


You may think of something like this at first, as I did, but running this line of the script raises an error. I know you don't believe me, so see the result below.

Error
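The exact line from the screenshot is not recoverable. Assuming it was a positional lookup such as dataset[0], here is a minimal sketch of why it fails: with named columns, [] looks up a column label rather than a position. The DataFrame is a hypothetical stand-in, and .iloc is shown as the position-based alternative.

```python
import pandas as pd

# Small hypothetical DataFrame with named (string) columns
dataset = pd.DataFrame({"Age": [44, 27, 30], "Salary": [72000, 48000, 54000]})

# [] looks up a column *label*; there is no column named 0
raised = False
try:
    dataset[0]
except KeyError as err:
    raised = True
    print("KeyError:", err)

# .iloc selects by position instead: all rows, first column
first_column = dataset.iloc[:, 0]
print(first_column)
```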

So how do we display the first column of our dataset variable?

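We use ['column_name'] or variable.column_name to display an entire column, as shown below. Note that variable.column_name only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.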

print entire column
Results

So we are finally able to display an entire column by indexing with the column name.
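A minimal sketch of column selection by name, using a hypothetical stand-in DataFrame. It shows bracket and attribute access returning the same Series, plus a list of names to get several columns at once.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the article's dataset
dataset = pd.DataFrame({"Age": [44, 27, 30], "Salary": [72000, 48000, 54000]})

# Select one column by its name; the result is a Series
ages = dataset["Age"]

# Attribute access gives the same Series for simple column names
ages_attr = dataset.Age
print(ages.equals(ages_attr))

# A list of names returns a DataFrame with just those columns
subset = dataset[["Age", "Salary"]]
print(subset)
```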

How do we display only the first three rows?

Only the first three rows
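A minimal sketch of selecting the first three rows, assuming a hypothetical stand-in DataFrame with the default integer index. A plain slice inside [] selects rows, and .iloc makes the position-based intent explicit; in both cases the stop position 3 is excluded.

```python
import pandas as pd

# Small hypothetical DataFrame with the default 0..n-1 index
dataset = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40],
    "Salary": [72000, 48000, 54000, 61000, 58000],
})

# Rows at positions 0, 1 and 2 (the stop value 3 is not included)
first_three = dataset[0:3]

# The same rows with explicit position-based indexing
first_three_iloc = dataset.iloc[0:3]
print(first_three)
print(first_three.equals(first_three_iloc))
```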

Hmm…

How do we display the first five rows using another method?

First five rows
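One common method for this is .head(). A minimal sketch with a hypothetical 7-row stand-in DataFrame, confirming that .head() with no argument returns the same rows as .iloc[:5].

```python
import pandas as pd

# Small hypothetical DataFrame with 7 rows
dataset = pd.DataFrame({"Age": [44, 27, 30, 38, 40, 35, 48]})

# .head() returns the first five rows when no count is given
first_five = dataset.head()
print(first_five)
print(first_five.equals(dataset.iloc[:5]))
```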

OK, now how do we print the last row?

Last row
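A minimal sketch of a few ways to get the last row, using a hypothetical stand-in DataFrame. Negative positions count from the end, and the choice between .iloc[-1] and .iloc[-1:] decides whether you get a Series or a one-row DataFrame.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the article's dataset
dataset = pd.DataFrame({"Age": [44, 27, 30], "Salary": [72000, 48000, 54000]})

# -1 is the last position
last_row_series = dataset.iloc[-1]   # a single row as a Series
last_row_frame = dataset.iloc[-1:]   # a one-row DataFrame
print(last_row_series)

# .tail(1) gives the same one-row DataFrame
print(last_row_frame.equals(dataset.tail(1)))
```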

Row and Column operations

All rows and all columns.

All rows and the first two columns.

All rows and the last column.

All rows and the first three columns.

First two rows and first two columns.
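The five selections above can be sketched with .iloc[rows, columns], which takes positions on both axes; a bare ':' means "everything on that axis". This is one way to write them, assuming a hypothetical 4-column stand-in DataFrame rather than the author's data.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the article's dataset
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44, 27, 30],
    "Salary": [72000, 48000, 54000],
    "Purchased": ["No", "Yes", "No"],
})

all_data = dataset.iloc[:, :]           # all rows, all columns
first_two_cols = dataset.iloc[:, 0:2]   # all rows, first two columns
last_col = dataset.iloc[:, -1]          # all rows, last column (a Series)
first_three_cols = dataset.iloc[:, :3]  # all rows, first three columns
top_left = dataset.iloc[:2, :2]         # first two rows, first two columns
print(top_left)
```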

Methods that help with indexing.

.head() returns the first rows of the dataset; by default, it returns five rows.

.tail() returns the last five rows by default.
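A minimal sketch of .head() and .tail() with and without an explicit count, using a hypothetical 7-row stand-in DataFrame. Both accept a number of rows as their first argument, n, which defaults to 5.

```python
import pandas as pd

# Small hypothetical DataFrame with 7 rows
dataset = pd.DataFrame({"Age": [44, 27, 30, 38, 40, 35, 48]})

print(dataset.tail())        # last five rows (default n=5)
first_two = dataset.head(2)  # first two rows
last_two = dataset.tail(2)   # last two rows
print(first_two)
print(last_two)
```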

In this output, 7 is the number of rows and 6 is the number of columns in the dataset.

This prints the number of rows multiplied by the number of columns, i.e., the total number of values in the dataset.

This prints the number of rows in the dataset.

This prints all the column names in the dataset.
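The screenshots for these outputs are lost. One way to get each of them in pandas is .shape, .size, len() and .columns. Here is a minimal sketch with a hypothetical 7-row, 6-column DataFrame chosen to match the article's 7 rows and 6 columns (the column names and values are made up).

```python
import pandas as pd

# Hypothetical 7-row, 6-column DataFrame
dataset = pd.DataFrame({name: range(7) for name in ["A", "B", "C", "D", "E", "F"]})

print(dataset.shape)          # (rows, columns)
print(dataset.size)           # rows * columns
print(len(dataset))           # number of rows
print(list(dataset.columns))  # column names
```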

This is not the end: practice some more operations to build your confidence.

Thank you guys.
