Difference between revisions of "Reading CSV with boost::tokenizer"

Latest revision as of 16:31, 23 March 2013 (Saturday)

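In C++, boost::tokenizer makes it relatively easy to parse the lines of a CSV file. This page uses boost::tokenizer to read a CSV file and parse its contents.

Overview

To process a CSV file, each entry has to be split at its separators, such as commas or tabs. This page shows examples of parsing CSV data and of reading CSV files.

For the basic usage of boost::tokenizer, see the boost::tokenizer page.

Handling comma-separated CSV

Source code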

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
 
// Tokenizer using escaped_list_separator with its defaults:
// escape '\\', separator ',', quote '"'
typedef boost::tokenizer< boost::escaped_list_separator<char> >
BOOST_TOKENIZER_ESCAPED_LIST;
 
// Print each token enclosed in <>
void dump(BOOST_TOKENIZER_ESCAPED_LIST &tokens)
{
        BOOST_FOREACH(const std::string &s, tokens) {
                std::cout << "<" << s << "> ";
        }
        std::cout << std::endl;
}
 
int
main(int argc, char const* argv[])
{
        // Four fields: a number, a quoted string, an empty field,
        // and a field containing a space
        std::string str1("2013,\"foo\",,field 4");
 
        BOOST_TOKENIZER_ESCAPED_LIST tokens1(str1);
        dump(tokens1);
 
        return 0;
}

Compile

clang++ -I/usr/local/include escape_list_separator_1.cpp -o escape_list_separator_1

Sample run

$ ./escape_list_separator_1
<2013> <foo> <> <field 4>

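Handling tab-separated CSV

Strictly speaking, this is TSV rather than CSV. To handle tab-separated data, construct a boost::escaped_list_separator<char> with a tab as its separator and pass it as the second argument of the boost::tokenizer constructor.

Source code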

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
 
typedef boost::escaped_list_separator<char>    
BOOST_ESCAPED_LIST_SEP;
typedef boost::tokenizer< boost::escaped_list_separator<char> >
BOOST_TOKENIZER_ESCAPED_LIST;
 
void dump(BOOST_TOKENIZER_ESCAPED_LIST &tokens)
{
        BOOST_FOREACH(std::string s, tokens) {
                std::cout << "<" << s << "> ";
        }
        std::cout << std::endl;
}
 
int
main(int argc, char const* argv[])
{
        std::string str2("2014\t\"bar\"\t\tfield 4");
 
        // Constructor arguments: escape character, separator, quote character
        BOOST_ESCAPED_LIST_SEP sep_tab('\\', '\t', '\"');
 
        BOOST_TOKENIZER_ESCAPED_LIST tokens2(str2, sep_tab);
        dump(tokens2);
 
        return 0;
}

Compile

clang++ -I/usr/local/include escape_list_separator_1.cpp -o escape_list_separator_1

Sample run

$ ./escape_list_separator_1
<2014> <bar> <> <field 4>

Reading a comma-separated CSV file

CSV file test.csv

2012,"foo",,field 4
2013,"bar",,field 4

Source code boost_read_csv.cpp

#include <iostream>
#include <string>
#include <fstream>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
 
typedef boost::escaped_list_separator<char>
BOOST_ESCAPED_LIST_SEP;
typedef boost::tokenizer< boost::escaped_list_separator<char> >
BOOST_TOKENIZER_ESCAPED_LIST;
 
// Print each token enclosed in <>
void dump(BOOST_TOKENIZER_ESCAPED_LIST &tokens)
{
        BOOST_FOREACH(const std::string &s, tokens) {
                std::cout << "<" << s << "> ";
        }
        std::cout << std::endl;
}
 
int
main(int argc, char const* argv[])
{
 
        std::ifstream   ifs;
        std::string     csv_file_path ("test.csv");
 
        try {
                ifs.open(csv_file_path.c_str() );
                // std::ifstream does not throw by default, so check explicitly
                if (!ifs.is_open()) {
                        std::cerr << "cannot open " << csv_file_path << std::endl;
                        return 1;
                }
                std::string     line;
 
                // Tokenize the file one line at a time
                while (std::getline(ifs, line) ) {
                        BOOST_TOKENIZER_ESCAPED_LIST tokens1(line);
                        dump(tokens1);
                }
                ifs.close();
        } catch (std::exception &ex) {
                // escaped_list_separator throws boost::escaped_list_error
                // (derived from std::runtime_error) on an invalid escape sequence
                std::cerr << ex.what() << std::endl;
        }
        return 0;
}

Compile

clang++ -I/usr/local/include boost_read_csv.cpp -o boost_read_csv

Sample run

$ ./boost_read_csv
<2012> <foo> <> <field 4>
<2013> <bar> <> <field 4>

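Reading a tab-separated TSV file

To read a tab-separated TSV (CSV) file, the only change needed is to create a separator that uses a tab and pass it to the tokenizer, as follows.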

BOOST_ESCAPED_LIST_SEP sep_tab('\\', '\t', '\"');
BOOST_TOKENIZER_ESCAPED_LIST tokens1(line, sep_tab);

// The includes, typedefs and dump() are the same as in boost_read_csv.cpp
int
main(int argc, char const* argv[])
{
 
        std::ifstream   ifs;
        // test.csv must contain tab-separated lines for this version
        std::string     csv_file_path ("test.csv");
 
        try {
                ifs.open(csv_file_path.c_str() );
                std::string     line;
 
                while (getline(ifs, line) ) {
                        // Escape '\\', separator '\t', quote '"'
                        BOOST_ESCAPED_LIST_SEP sep_tab('\\', '\t', '\"');
                        BOOST_TOKENIZER_ESCAPED_LIST tokens1(line, sep_tab);
                        dump(tokens1);
                }
                ifs.close();
        } catch (std::exception &ex) {
                std::cerr << ex.what() << std::endl;
        }
        return 0;
}

See also

boost::tokenizer
boost::split
boost::trim
C++ libraries
