
Reading CSV/TSV files with pandas (read_csv)

This article summarizes how to create a pandas DataFrame from a CSV or TSV file.

How to read a CSV with pandas

pandas provides pandas.read_csv for reading CSV files. Its official documentation is the pandas.read_csv page in the pandas API reference.

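With pandas.read_csv, options let you convert CSV data into a DataFrame that matches its structure: whether the file has a header row, which line the header is on, which column to use as the index, and so on. Changing the separator (sep) also lets it read TSV files.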
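The options described above can be combined in a single call. The following is a minimal sketch, not taken from the original notebook: it assumes in-memory text wrapped in io.StringIO (read_csv accepts file-like objects as well as paths), and the sample data is modeled on the files used later in this article. It reads a TSV whose real header is on the second line and uses the first column as the index.

```python
import io

import pandas as pd

# In-memory TSV: a junk first line, then the real header, then three data rows.
tsv_text = (
    "hoge\thoge\thoge\thoge\n"
    "no\ttitle\ttext\tclass\n"
    "1\tone\ttext-one\tclass-first\n"
    "2\ttwo\ttext-two\tclass-second\n"
    "3\tthree\ttext-three\tclass-third\n"
)

# sep="\t": tab separator; header=1: the second line (zero-based index 1) is the header;
# index_col=0: the first column ("no") becomes the index.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=1, index_col=0)
print(df)
```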

Hands-on: reading CSVs with pandas

The sample code below was written in a Jupyter Notebook.
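The notebook cells read files from an input_files/ directory that is not distributed with the article. As a convenience, here is a minimal sketch that recreates those files locally, assuming their contents are exactly what each cell shows in its "# input csv:" comment. The FILES dictionary and out_dir variable are hypothetical names introduced here.

```python
from pathlib import Path

# Contents copied from the "# input csv:" comments in the notebook cells.
FILES = {
    "sample_first_line_header.csv": (
        '"no","title","text","class"\n'
        '1,"one","text-one","class-first"\n'
        '2,"two","text-two","class-second"\n'
        '3,"three","text-three","class-third"\n'
    ),
    "sample_second_line_header.csv": (
        '"hoge","hoge","hoge","hoge"\n'
        '"no","title","text","class"\n'
        '1,"one","text-one","class-first"\n'
        '2,"two","text-two","class-second"\n'
        '3,"three","text-three","class-third"\n'
    ),
    "sample_no_header.csv": (
        '1,"one","text-one","class-first"\n'
        '2,"two","text-two","class-second"\n'
        '3,"three","text-three","class-third"\n'
    ),
    "sample_first_line_header.tsv": (
        '"no"\t"title"\t"text"\t"class"\n'
        '1\t"one"\t"text-one"\t"class-first"\n'
        '2\t"two"\t"text-two"\t"class-second"\n'
        '3\t"three"\t"text-three"\t"class-third"\n'
    ),
}

# Create input_files/ next to the notebook (no error if it already exists).
out_dir = Path("input_files")
out_dir.mkdir(exist_ok=True)

for name, text in FILES.items():
    # Write each sample file as UTF-8 text.
    (out_dir / name).write_text(text, encoding="utf-8")
```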

In [1]:
import pandas as pd
In [2]:
# Read a CSV that has a header row
df_csv = pd.read_csv("input_files/sample_first_line_header.csv")
print(df_csv)

# input csv:
# "no","title","text","class"
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"

# display:
#    no  title        text         class
# 0   1    one    text-one   class-first
# 1   2    two    text-two  class-second
# 2   3  three  text-three   class-third
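In the input, the header and text fields are wrapped in double quotes, yet the DataFrame above shows them without quotes. A small sketch to check this, using the same contents inline via io.StringIO (an assumption so it runs without the file): read_csv strips the quotes because its quotechar parameter defaults to '"', and the numeric "no" column is inferred as an integer type.

```python
import io

import pandas as pd

# Same contents as sample_first_line_header.csv.
csv_text = (
    '"no","title","text","class"\n'
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
    '3,"three","text-three","class-third"\n'
)

df = pd.read_csv(io.StringIO(csv_text))

# Column names come back without the surrounding double quotes.
print(df.columns.tolist())
# "no" is inferred as an integer dtype; the text columns are object
# (or a string dtype, depending on the pandas version).
print(df.dtypes)
```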
In [3]:
# Read a CSV with a header, using the second line as the header (header is zero-based)
df_csv = pd.read_csv("input_files/sample_second_line_header.csv", header=1)
print(df_csv)

# input csv:
# "hoge","hoge","hoge","hoge"
# "no","title","text","class"
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"

# display:
#    no  title        text         class
# 0   1    one    text-one   class-first
# 1   2    two    text-two  class-second
# 2   3  three  text-three   class-third
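With header=1, the lines above the header (here the "hoge" line) are discarded. The skiprows parameter is another way to get the same result; this is not used in the original notebook. A minimal sketch comparing the two, with the file contents inlined via io.StringIO:

```python
import io

import pandas as pd

# Same contents as sample_second_line_header.csv.
csv_text = (
    '"hoge","hoge","hoge","hoge"\n'
    '"no","title","text","class"\n'
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
    '3,"three","text-three","class-third"\n'
)

# header=1: line 1 (zero-based) is the header; line 0 is dropped.
df_header = pd.read_csv(io.StringIO(csv_text), header=1)
# skiprows=1: drop the first line, then infer the header from the next one.
df_skip = pd.read_csv(io.StringIO(csv_text), skiprows=1)

print(df_header.equals(df_skip))  # → True
```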
In [4]:
# Read a CSV without a header row
df_csv = pd.read_csv("input_files/sample_no_header.csv", header=None)
print(df_csv)

# input csv:
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"

# display:
#    0      1           2             3
# 0  1    one    text-one   class-first
# 1  2    two    text-two  class-second
# 2  3  three  text-three   class-third
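With header=None, pandas labels the columns 0, 1, 2, 3. If you already know the column names, the names parameter assigns them while reading. A minimal sketch, assuming the column names from the header of the other sample files and the file contents inlined via io.StringIO:

```python
import io

import pandas as pd

# Same contents as sample_no_header.csv.
csv_text = (
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
    '3,"three","text-three","class-third"\n'
)

# header=None: every line is data; names=...: supply the column labels.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["no", "title", "text", "class"],
)
print(df)
```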
In [5]:
# Read a CSV, using the first column as the index
df_csv = pd.read_csv("input_files/sample_first_line_header.csv", index_col=0)
print(df_csv)

# input csv:
# "no","title","text","class"
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"

# display:
#     title        text         class
# no                                 
# 1     one    text-one   class-first
# 2     two    text-two  class-second
# 3   three  text-three   class-third
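index_col also accepts a column name instead of a position, which can be easier to read. A minimal sketch (file contents inlined via io.StringIO) that indexes by "no" and then looks up a row by its "no" value with .loc:

```python
import io

import pandas as pd

# Same contents as sample_first_line_header.csv.
csv_text = (
    '"no","title","text","class"\n'
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
    '3,"three","text-three","class-third"\n'
)

# index_col="no" has the same effect as index_col=0 for this file.
df = pd.read_csv(io.StringIO(csv_text), index_col="no")

# Look up the row whose "no" value is 2.
print(df.loc[2, "title"])  # → two
```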
In [6]:
# Read only the first rows of the CSV (nrows counts data rows, not the header)
df_csv = pd.read_csv("input_files/sample_first_line_header.csv", nrows=2)
print(df_csv)

# input csv:
# "no","title","text","class"
# 1,"one","text-one","class-first"
# 2,"two","text-two","class-second"
# 3,"three","text-three","class-third"

# display:
#    no title      text         class
# 0   1   one  text-one   class-first
# 1   2   two  text-two  class-second
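nrows counts data rows only; the header line is not included in the count. Combined with skiprows (a list of zero-based file line numbers to skip), it can also read a slice from the middle of a file. This combination is an illustration added here, not part of the original notebook. A minimal sketch with the file contents inlined via io.StringIO:

```python
import io

import pandas as pd

# Same contents as sample_first_line_header.csv.
csv_text = (
    '"no","title","text","class"\n'
    '1,"one","text-one","class-first"\n'
    '2,"two","text-two","class-second"\n'
    '3,"three","text-three","class-third"\n'
)

# First two data rows; the header line is not counted by nrows.
df_head = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Skip file line 1 (the first data row), keep the header, then read two rows.
df_mid = pd.read_csv(io.StringIO(csv_text), skiprows=[1], nrows=2)

print(df_head)
print(df_mid)
```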
In [7]:
# Read a TSV (change the separator)
df_csv = pd.read_csv("input_files/sample_first_line_header.tsv", sep="\t")
print(df_csv)

# input tsv:
# "no"	"title"	"text"	"class"
# 1	"one"	"text-one"	"class-first"
# 2	"two"	"text-two"	"class-second"
# 3	"three"	"text-three"	"class-third"

# display:
#    no  title        text         class
# 0   1    one    text-one   class-first
# 1   2    two    text-two  class-second
# 2   3  three  text-three   class-third
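pandas also provides pandas.read_table, whose sep defaults to a tab, so it reads the same TSV without passing sep. This is an alternative not used in the original notebook. A minimal sketch comparing the two, with the TSV contents inlined via io.StringIO:

```python
import io

import pandas as pd

# Same contents as sample_first_line_header.tsv.
tsv_text = (
    '"no"\t"title"\t"text"\t"class"\n'
    '1\t"one"\t"text-one"\t"class-first"\n'
    '2\t"two"\t"text-two"\t"class-second"\n'
    '3\t"three"\t"text-three"\t"class-third"\n'
)

# read_csv with an explicit tab separator.
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")
# read_table uses "\t" as its default separator.
df_table = pd.read_table(io.StringIO(tsv_text))

print(df_csv.equals(df_table))  # → True
```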


© 2021 いちたどん.com