Importing and Saving Data

This is part two of my tutorial on learning python from the perspective of an R user. In this lesson, I will cover how to load in and save data using Pandas.

I will have a quick list of functions and their R equivalent on top.

For reading in data:

pd.read_csv() = read.csv()
pd.read_excel() = read.xlsx() for the first sheet

For saving data:

df.to_csv() = write.csv()
write.xlsx() = df.to_excel("df.xlsx", index = False, sheet_name = "name", startrow=0, startcol=0)

Reading in CSVs

# import loads in libraries
import pandas as pd

# read_csv to load in data
# you pass the path in. you can use relative paths from here
df = pd.read_csv('../data/pokemon_data.csv')

df.head(2)

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	60	62	63	80	80	60	1	False

Reading in Text Files

In order to read in txt files, you still use the pd.read_csv() command but you need to specify a delimiter. Some common delimiters are:

For semi colons: pd.read_csv('df.txt', sep = ';')
For tabs: pd.read_csv('df.txt', sep = '\t')
For bars: pd.read_csv('df.txt', sep = '|')

# loading in other file types

# you need to specify the delimiter
txt = pd.read_csv('pokemon_data.txt', sep = '\t')
txt.head(2)

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	60	62	63	80	80	60	1	False

Reading in Excel Files

This is very easy compared to R but there is still some trickiness.

# if you just want to read in the first sheet
xlsx = pd.read_excel('pokemon_data.xlsx')
xlsx.tail(2)

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
798	720.0	HoopaHoopa Unbound	Psychic	Dark	80.0	160.0	60.0	170.0	130.0	80.0	6.0	True
799	721.0	Volcanion	Fire	Water	80.0	110.0	120.0	130.0	90.0	70.0	6.0	True

The following will let you get the sheet by the index:

xls = pd.ExcelFile('pokemon_data.xlsx')
df1 = xls.parse(0)
df2 = xls.parse(1)

df2.tail(2)

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
5	5.0	Charmeleon	Fire	NaN	58.0	64.0	58.0	80.0	65.0	80.0	1.0	False
6	6.0	Charizard	Fire	Flying	78.0	84.0	78.0	109.0	85.0	100.0	1.0	False

The following will let you get the sheet by name:

xls = pd.ExcelFile('pokemon_data.xlsx')

# the sheet name is case sensitive
df1 = pd.read_excel(xls, 'sheet1')
df2 = pd.read_excel(xls, 'sheet2')

df2.tail(2)

	#	Name	Type 1	Type 2	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
5	5.0	Charmeleon	Fire	NaN	58.0	64.0	58.0	80.0	65.0	80.0	1.0	False
6	6.0	Charizard	Fire	Flying	78.0	84.0	78.0	109.0	85.0	100.0	1.0	False

Saving the Data

It is quite easy to save as a csv. Don't forget to set the index = False.

df.to_csv("df.csv", index = False)

Saving to Excel

There are a couple more options when saving something as an excel. You can specify the sheet name and the row & column start.

df.to_excel("df.xlsx", index = False, sheet_name = "test", startrow=2, startcol=1)

Saving to Text File

df.to_csv("df.txt", sep = '\t')