Importing and Saving Data
This is part two of my tutorial on learning python from the perspective of an R user. In this lesson, I will cover how to load in and save data using Pandas.
I will have a quick list of functions and their R equivalent on top.
For reading in data:
pd.read_csv()
=read.csv()
pd.read_excel()
=read.xlsx()
for the first sheet
For saving data:
df.to_csv()
=write.csv()
write.xlsx()
=df.to_excel("df.xlsx", index = False, sheet_name = "name", startrow=0, startcol=0)
Reading in CSVs
# import loads in libraries
import pandas as pd
# read_csv to load in data
# you pass the path in. you can use relative paths from here
df = pd.read_csv('../data/pokemon_data.csv')
df.head(2)
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
Reading in Text Files
In order to read in txt files, you still use the pd.read_csv()
command but you need to specify a delimiter. Some common delimiters are:
- For semi colons:
pd.read_csv('df.txt', sep = ';')
- For tabs:
pd.read_csv('df.txt', sep = '\t')
- For bars:
pd.read_csv('df.txt', sep = '|')
# loading in other file types
# you need to specify the delimiter
txt = pd.read_csv('pokemon_data.txt', sep = '\t')
txt.head(2)
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
Reading in Excel Files
This is very easy compared to R but there is still some trickiness.
# if you just want to read in the first sheet
xlsx = pd.read_excel('pokemon_data.xlsx')
xlsx.tail(2)
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
798 | 720.0 | HoopaHoopa Unbound | Psychic | Dark | 80.0 | 160.0 | 60.0 | 170.0 | 130.0 | 80.0 | 6.0 | True |
799 | 721.0 | Volcanion | Fire | Water | 80.0 | 110.0 | 120.0 | 130.0 | 90.0 | 70.0 | 6.0 | True |
The following will let you get the sheet by the index:
xls = pd.ExcelFile('pokemon_data.xlsx')
df1 = xls.parse(0)
df2 = xls.parse(1)
df2.tail(2)
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 5.0 | Charmeleon | Fire | NaN | 58.0 | 64.0 | 58.0 | 80.0 | 65.0 | 80.0 | 1.0 | False |
6 | 6.0 | Charizard | Fire | Flying | 78.0 | 84.0 | 78.0 | 109.0 | 85.0 | 100.0 | 1.0 | False |
The following will let you get the sheet by name:
xls = pd.ExcelFile('pokemon_data.xlsx')
# the sheet name is case sensitive
df1 = pd.read_excel(xls, 'sheet1')
df2 = pd.read_excel(xls, 'sheet2')
df2.tail(2)
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 5.0 | Charmeleon | Fire | NaN | 58.0 | 64.0 | 58.0 | 80.0 | 65.0 | 80.0 | 1.0 | False |
6 | 6.0 | Charizard | Fire | Flying | 78.0 | 84.0 | 78.0 | 109.0 | 85.0 | 100.0 | 1.0 | False |
Saving the Data
It is quite easy to save as a csv. Don't forget to set the index = False.
df.to_csv("df.csv", index = False)
Saving to Excel
There are a couple more options when saving something as an excel. You can specify the sheet name and the row & column start.
df.to_excel("df.xlsx", index = False, sheet_name = "test", startrow=2, startcol=1)
Saving to Text File
df.to_csv("df.txt", sep = '\t')