This article summarizes how to create a DataFrame from CSV/TSV files with pandas.
How to read CSV files with pandas
pandas provides pandas.read_csv for reading CSV files. Its full parameter list is described in the official pandas documentation.
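Through its options, pandas.read_csv lets you state whether a header exists, change which line is treated as the header, designate an index column, and so on, so the conversion to a DataFrame can follow the structure of the CSV data. Changing the separator also makes it possible to read TSV files.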
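As a minimal sketch of how these options combine in a single call, the example below reads tab-separated text from memory instead of a file. The inline data and the variable names are assumptions made for illustration and are not part of the original notebook.

```python
import io

import pandas as pd

# Hypothetical tab-separated data with a junk first line (illustration only).
raw = (
    "junk\tjunk\tjunk\n"
    "no\ttitle\ttext\n"
    "1\tone\ttext-one\n"
    "2\ttwo\ttext-two\n"
)

# sep="\t" switches the parser to TSV, header=1 takes the second line as the
# header, and index_col=0 uses the first column ("no") as the index.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=1, index_col=0)

print(df.columns.tolist())
print(df.index.tolist())
```

Wrapping the text in io.StringIO gives read_csv a file-like object, which is convenient for trying options without creating files.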
Hands-on: reading CSV files with pandas
The sample code was written in a Jupyter Notebook; its cells are shown below.
In [1]:
import pandas as pd
In [2]:
# Read a CSV whose first line is the header
df_csv = pd.read_csv("input_files/sample_first_line_header.csv")
print(df_csv)
# input csv:
# "no","title","text","class"
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"
# display:
#    no  title        text         class
# 0   1    one    text-one   class-first
# 1   2    two    text-two  class-second
# 2   3  three  text-three   class-third
In [3]:
# Read a CSV whose header is on the second line (header=1 is zero-based)
df_csv = pd.read_csv("input_files/sample_second_line_header.csv", header=1)
print(df_csv)
# input csv:
# "hoge","hoge","hoge","hoge"
# "no","title","text","class"
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"
# display:
#    no  title        text         class
# 0   1    one    text-one   class-first
# 1   2    two    text-two  class-second
# 2   3  three  text-three   class-third
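For input shaped like this, where the only line above the header is junk, skipping that line with skiprows should give the same result as header=1. The sketch below compares the two on inline data that mirrors the sample file; the variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Inline copy of the sample layout: one junk line, then the real header.
raw = (
    '"hoge","hoge","hoge","hoge"\n'
    '"no","title","text","class"\n'
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
)

# header=1: use the second line as the header and ignore the line before it.
by_header = pd.read_csv(io.StringIO(raw), header=1)

# skiprows=1: drop the first line, then treat the next line as the header.
by_skiprows = pd.read_csv(io.StringIO(raw), skiprows=1)

print(by_header.equals(by_skiprows))
```

Either spelling works here; header=1 reads naturally when you think of "which line holds the column names", while skiprows reads naturally when you think of "how much preamble to discard".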
In [4]:
# Read a CSV that has no header (columns are numbered 0, 1, 2, ...)
df_csv = pd.read_csv("input_files/sample_no_header.csv", header=None)
print(df_csv)
# input csv:
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"
# display:
#    0      1           2             3
# 0  1    one    text-one   class-first
# 1  2    two    text-two  class-second
# 2  3  three  text-three   class-third
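With header=None the columns are just numbered. If you already know what each column means, the names parameter of read_csv can supply labels at read time. The sketch below assumes the same column names as the header samples; the inline data stands in for the file.

```python
import io

import pandas as pd

# Inline copy of the headerless sample data.
raw = (
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
)

# Column labels chosen for illustration; the file itself carries no names.
columns = ["no", "title", "text", "class"]

# header=None says the first line is data; names= assigns the labels.
df = pd.read_csv(io.StringIO(raw), header=None, names=columns)

print(df.columns.tolist())
```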
In [5]:
# Read a CSV and use the first column as the index (index_col=0)
df_csv = pd.read_csv("input_files/sample_first_line_header.csv", index_col=0)
print(df_csv)
# input csv:
# "no","title","text","class"
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"
# display:
#     title        text         class
# no
# 1     one    text-one   class-first
# 2     two    text-two  class-second
# 3   three  text-three   class-third
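index_col also accepts a column name instead of a position, and once a column is the index its values can be used for label-based lookup with .loc. A minimal sketch on inline data mirroring the sample file:

```python
import io

import pandas as pd

# Inline copy of the sample CSV with a header line.
raw = (
    '"no","title","text","class"\n'
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
    '3,"three","text-three","class-third"\n'
)

# Refer to the index column by name rather than by position.
df = pd.read_csv(io.StringIO(raw), index_col="no")

# Look up a row by its "no" value and pick one column from it.
title = df.loc[2, "title"]
print(title)
```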
In [6]:
# Read only the first N data rows of a CSV (nrows does not count the header)
df_csv = pd.read_csv("input_files/sample_first_line_header.csv", nrows=2)
print(df_csv)
# input csv:
# "no","title","text","class"
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"
# display:
#    no  title      text         class
# 0   1    one  text-one   class-first
# 1   2    two  text-two  class-second
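nrows counts data rows, so the header line is not included in the count. For small data like this sample, the result should match reading the whole file and then taking the first rows with head; the difference is that nrows stops parsing early. A sketch on inline data (variable names are illustrative):

```python
import io

import pandas as pd

# Inline copy of the sample CSV with a header line.
raw = (
    '"no","title","text","class"\n'
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
    '3,"three","text-three","class-third"\n'
)

# Parse only the first two data rows.
first_two = pd.read_csv(io.StringIO(raw), nrows=2)

# Parse everything, then keep the first two rows for comparison.
full = pd.read_csv(io.StringIO(raw))

print(len(first_two), len(full))
print(first_two.equals(full.head(2)))
```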
In [7]:
# Read a TSV (change the separator to a tab)
df_csv = pd.read_csv("input_files/sample_first_line_header.tsv", sep="\t")
print(df_csv)
# input tsv (fields are separated by tab characters):
# "no" "title" "text" "class"
# 1 "one" "text-one" "class-first"
# 2 "two" "text-two" "class-second"
# 3 "three" "text-three" "class-third"
# display:
#    no  title        text         class
# 0   1    one    text-one   class-first
# 1   2    two    text-two  class-second
# 2   3  three  text-three   class-third
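pandas also offers pandas.read_table, whose default separator is a tab, so for tab-separated input it should behave like read_csv with sep="\t". The sketch below compares the two on inline data that stands in for the TSV file.

```python
import io

import pandas as pd

# Inline tab-separated data mirroring the sample TSV.
raw = (
    '"no"\t"title"\t"text"\t"class"\n'
    '1\t"one"\t"text-one"\t"class-first"\n'
    '2\t"two"\t"text-two"\t"class-second"\n'
)

# read_csv with an explicit tab separator.
via_read_csv = pd.read_csv(io.StringIO(raw), sep="\t")

# read_table uses a tab separator by default.
via_read_table = pd.read_table(io.StringIO(raw))

print(via_read_csv.equals(via_read_table))
```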