Importing and Saving Data

This is part two of my tutorial on learning python from the perspective of an R user. In this lesson, I will cover how to load in and save data using Pandas.

I will have a quick list of functions and their R equivalent on top.

For reading in data:

For saving data:

Reading in CSVs

# import loads in libraries
import pandas as pd

# read_csv to load in data
# you pass the path in. you can use relative paths from here
df = pd.read_csv('../data/pokemon_data.csv')

df.head(2)
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False

Reading in Text Files

In order to read in txt files, you still use the pd.read_csv() command but you need to specify a delimiter. Some common delimiters are:

# loading in other file types

# you need to specify the delimiter
txt = pd.read_csv('pokemon_data.txt', sep = '\t')
txt.head(2)
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False

Reading in Excel Files

This is very easy compared to R but there is still some trickiness.

# if you just want to read in the first sheet
xlsx = pd.read_excel('pokemon_data.xlsx')
xlsx.tail(2)
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
798 720.0 HoopaHoopa Unbound Psychic Dark 80.0 160.0 60.0 170.0 130.0 80.0 6.0 True
799 721.0 Volcanion Fire Water 80.0 110.0 120.0 130.0 90.0 70.0 6.0 True

The following will let you get the sheet by the index:

xls = pd.ExcelFile('pokemon_data.xlsx')
df1 = xls.parse(0)
df2 = xls.parse(1)

df2.tail(2)
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
5 5.0 Charmeleon Fire NaN 58.0 64.0 58.0 80.0 65.0 80.0 1.0 False
6 6.0 Charizard Fire Flying 78.0 84.0 78.0 109.0 85.0 100.0 1.0 False

The following will let you get the sheet by name:

xls = pd.ExcelFile('pokemon_data.xlsx')

# the sheet name is case sensitive
df1 = pd.read_excel(xls, 'sheet1')
df2 = pd.read_excel(xls, 'sheet2')

df2.tail(2)
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
5 5.0 Charmeleon Fire NaN 58.0 64.0 58.0 80.0 65.0 80.0 1.0 False
6 6.0 Charizard Fire Flying 78.0 84.0 78.0 109.0 85.0 100.0 1.0 False

Saving the Data

It is quite easy to save as a csv. Don't forget to set the index = False.

df.to_csv("df.csv", index = False)

Saving to Excel

There are a couple more options when saving something as an excel. You can specify the sheet name and the row & column start.

df.to_excel("df.xlsx", index = False, sheet_name = "test", startrow=2, startcol=1)

Saving to Text File

df.to_csv("df.txt", sep = '\t')