ビットの海

The blog of shase (しゃぜ), an easygoing software engineer

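A Hello World of sorts with Embulk (2023 edition)

Setting up the Embulk command

See the post below for the setup steps (substitute the latest version number where it names one).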

shase428.hatenablog.jp

Setting up the bundle environment

## Create a directory for the sample and move into it
$ mkdir embulk_sample
$ cd embulk_sample
## Create a bundle for the project
$ embulk mkbundle bundle

2023-11-20 15:24:23.759 +0900: Embulk v0.9.23
Initializing bundle...
  Creating Gemfile
  Creating .bundle/config
  Creating embulk/input/example.rb
  Creating embulk/output/example.rb
  Creating embulk/filter/example.rb

Rewrite bundle/Gemfile as follows:

source 'https://rubygems.org/'
gem 'embulk', '< 0.10'
gem 'embulk-input-command'

## Run bundle install
$ cd bundle

$ embulk bundle install --path=vendor/bundle
2023-11-20 15:28:10.995 +0900: Embulk v0.9.23
Fetching gem metadata from https://rubygems.org/........
Fetching gem metadata from https://rubygems.org/.
Resolving dependencies...
Using bundler 1.16.0
Fetching msgpack 1.4.1 (java)
Installing msgpack 1.4.1 (java)
Fetching embulk 0.11.1 (java)
Installing embulk 0.11.1 (java)
Fetching embulk-input-command 0.1.4
Installing embulk-input-command 0.1.4
Bundle complete! 2 Gemfile dependencies, 4 gems now installed.
Bundled gems are installed into `./vendor/bundle`

## Verify the installed gems
$ embulk bundle list
2023-11-20 15:29:24.988 +0900: Embulk v0.9.23
Gems included by the bundle:
  * bundler (1.16.0)
  * embulk (0.11.1)
  * embulk-input-command (0.1.4)
  * msgpack (1.4.1)

Creating config.yml

## Create config.yml in the sample directory
$ cd ..
$ vim config.yml

Contents of config.yml

in:
  type: command
  command: echo "a,b" && echo "1,2" && echo "10,11"
  parser:
    charset: UTF-8
    newline: LF
    type: csv
    delimiter: ','
    columns:
      - {name: a, type: long}
      - {name: b, type: long}

out:
  type: stdout

preview & run

$ embulk preview -b bundle config.yml
2023-11-20 15:44:48.731 +0900: Embulk v0.9.23
2023-11-20 15:44:49.287 +0900 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2023-11-20 15:44:51.113 +0900 [INFO] (main): BUNDLE_GEMFILE is being set: "/Users/foobar/tmp/embulk_sample/bundle/Gemfile"
2023-11-20 15:44:51.114 +0900 [INFO] (main): Gem's home and path are being cleared.
2023-11-20 15:44:52.949 +0900 [INFO] (main): Started Embulk v0.9.23
2023-11-20 15:44:53.063 +0900 [INFO] (0001:preview): Loaded plugin embulk-input-command (0.1.4)
2023-11-20 15:44:53.101 +0900 [INFO] (0001:preview): Try to read 32,768 bytes from input source
2023-11-20 15:44:53.107 +0900 [INFO] (0001:preview): Running command [sh, -c, echo "a,b" && echo "1,2" && echo "10,11"]
2023-11-20 15:44:53.139 +0900 [INFO] (0001:preview): Running command [sh, -c, echo "a,b" && echo "1,2" && echo "10,11"]
2023-11-20 15:44:53.204 +0900 [WARN] (0001:preview): Skipped line -:1 (java.lang.NumberFormatException: For input string: "a"): a,b
+--------+--------+
| a:long | b:long |
+--------+--------+
|      1 |      2 |
|     10 |     11 |
+--------+--------+

$ embulk run -b bundle config.yml
2023-11-20 15:45:32.403 +0900: Embulk v0.9.23
2023-11-20 15:45:33.306 +0900 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2023-11-20 15:45:36.532 +0900 [INFO] (main): BUNDLE_GEMFILE is being set: "/Users/foobar/tmp/embulk_sample/bundle/Gemfile"
2023-11-20 15:45:36.538 +0900 [INFO] (main): Gem's home and path are being cleared.
2023-11-20 15:45:40.841 +0900 [INFO] (main): Started Embulk v0.9.23
2023-11-20 15:45:41.039 +0900 [INFO] (0001:transaction): Loaded plugin embulk-input-command (0.1.4)
2023-11-20 15:45:41.155 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=20 / output tasks 10 = input tasks 1 * 10
2023-11-20 15:45:41.161 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2023-11-20 15:45:41.194 +0900 [INFO] (0015:task-0000): Running command [sh, -c, echo "a,b" && echo "1,2" && echo "10,11"]
2023-11-20 15:45:41.262 +0900 [WARN] (0015:task-0000): Skipped line -:1 (java.lang.NumberFormatException: For input string: "a"): a,b
1,2
10,11
2023-11-20 15:45:41.271 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2023-11-20 15:45:41.290 +0900 [INFO] (main): Committed.
2023-11-20 15:45:41.290 +0900 [INFO] (main): Next config diff: {"in":{},"out":{}}
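
Both preview and run log a WARN saying that line 1 (a,b) was skipped. The command prints a header row first, and since column a is declared as long, the parser cannot convert "a" to a number and drops the line. The result is still correct here, but the warning can probably be avoided by telling the CSV parser to skip the header explicitly. The following is a minimal sketch of that variant, assuming the parser's skip_header_lines option; it is not part of the original config and was not run against the setup above.

```yaml
in:
  type: command
  command: echo "a,b" && echo "1,2" && echo "10,11"
  parser:
    charset: UTF-8
    newline: LF
    type: csv
    delimiter: ','
    # Assumption (not in the original config): treat the first line
    # as a header and skip it instead of parsing it as data.
    skip_header_lines: 1
    columns:
      - {name: a, type: long}
      - {name: b, type: long}

out:
  type: stdout
```

Everything except skip_header_lines is unchanged from the config above, so the same preview and run commands should apply.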

File layout

$ tree -L 2
.
├── bundle
│   ├── Gemfile
│   ├── Gemfile.lock
│   ├── embulk
│   └── vendor
└── config.yml