Simplest CDAP pipeline config

1 Introduction

This simplest pipeline is a pass-through from source to sink with no transformations in between, yet it took a lot of trial and error to get it working. I am writing it down while the details are still fresh.

2 The source: an Excel spreadsheet

Since I don't have the Microsoft Office suite on my Mac, I used Apple's Numbers app to export a table I hacked together as an Excel file.

3 Trial and error: the success rate is only 25%

trial-and-error.png

4 Config source as an .xlsx file

The sheet number is zero! I got errors when I used 1, so the sheets appear to be counted from 0.

cfg-src-xlsx-0.png
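
To see which number a given sheet would get, it can help to list the sheets in the exported workbook. The following is a minimal sketch, not part of CDAP: an .xlsx file is a zip archive, and the sheet names are listed in workbook order inside it. The function name `list_sheets` is made up for illustration, and the sketch assumes the workbook part sits at the conventional path `xl/workbook.xml` and that the sheet number in the source config follows workbook order starting at 0.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main SpreadsheetML workbook part.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheets(xlsx_path):
    """Return [(zero_based_index, sheet_name), ...] in workbook order.

    Hypothetical helper: assumes the workbook part is at xl/workbook.xml,
    which is the conventional location in an .xlsx archive.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    sheets = root.findall("main:sheets/main:sheet", NS)
    return [(index, sheet.get("name")) for index, sheet in enumerate(sheets)]
```

Calling `list_sheets("test.xlsx")` on a workbook with a single sheet returns one entry with index 0, which would be consistent with 1 being rejected as out of range.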

5 Config sink using /tmp as the fake hdfs directory

cfg-sink-fake-hdfs.png

6 Config sink using a real local hadoop installation

cfg-sink-real-hdfs.png

7 The fake hdfs sink is just a directory in my /tmp dir

/tmp/three:
total used in directory 24 available 4293368400
drwxr-xr-x   6 gug   wheel   204 Oct 15 08:00 .
drwxrwxrwt  33 root  wheel  1122 Oct 15 08:00 ..
-rw-r--r--   1 gug   wheel     8 Oct 15 08:00 ._SUCCESS.crc
-rw-r--r--   1 gug   wheel    12 Oct 15 08:00 .part-m-00000.crc
-rw-r--r--   1 gug   wheel     0 Oct 15 08:00 _SUCCESS
-rw-r--r--   1 gug   wheel   132 Oct 15 08:00 part-m-00000
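
The listing follows the usual Hadoop output layout: `_SUCCESS` is an empty marker written when the job completes, `part-m-00000` holds the records from the first (and here only) map task, and the hidden `.crc` files are checksums. A minimal sketch for reading such a directory from Python, assuming a local path like the one above; the function name `read_job_output` is hypothetical:

```python
from pathlib import Path


def read_job_output(output_dir):
    """Return the lines of all part-* files, or raise if the job did not finish."""
    directory = Path(output_dir)
    # An empty _SUCCESS file marks a completed job.
    if not (directory / "_SUCCESS").exists():
        raise FileNotFoundError(f"no _SUCCESS marker in {directory}")
    lines = []
    # "part-*" does not match the hidden .part-*.crc checksum files,
    # because their names start with a dot.
    for part in sorted(directory.glob("part-*")):
        lines.extend(part.read_text().splitlines())
    return lines
```

For the directory shown above, the call would be `read_job_output("/tmp/three")`.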

8 The real hdfs sink is accessible using hadoop commands

bash-3.2$ bin/hdfs dfs -ls /user/gug/nine
17/10/15 08:58:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 gug supergroup          0 2017-10-15 08:56 /user/gug/nine/_SUCCESS
-rw-r--r--   3 gug supergroup        132 2017-10-15 08:56 /user/gug/nine/part-m-00000

bash-3.2$ bin/hdfs dfs -cat /user/gug/nine
17/10/15 08:59:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
cat: `/user/gug/nine': Is a directory

bash-3.2$ bin/hdfs dfs -cat /user/gug/nine/*
17/10/15 08:59:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
test.xlsx,Sheet 1,0.0
test.xlsx,Sheet 1,1.0
test.xlsx,Sheet 1,2.0
test.xlsx,Sheet 1,3.0
test.xlsx,Sheet 1,4.0
test.xlsx,Sheet 1,5.0
bash-3.2$
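
Each output record is a comma-separated line. A minimal sketch for turning those lines into typed rows, assuming the three columns are the file name, the sheet name, and the cell value (my reading of the output above, not something the plugin documentation confirms here); `Row` and `parse_rows` are hypothetical names:

```python
import csv
from typing import NamedTuple


class Row(NamedTuple):
    file: str     # assumed: source file name
    sheet: str    # assumed: sheet name
    value: float  # assumed: numeric cell value


def parse_rows(lines):
    """Parse lines such as 'test.xlsx,Sheet 1,0.0' into Row tuples."""
    return [
        Row(record[0], record[1], float(record[2]))
        for record in csv.reader(lines)
        if record  # skip blank lines
    ]
```

For example, `parse_rows(["test.xlsx,Sheet 1,0.0"])` yields a single `Row` with `file="test.xlsx"`, `sheet="Sheet 1"`, and `value=0.0`.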

Author: Gary Gu

Created: 2017-10-15 Sun 19:30

Emacs 25.2.1 (Org mode 8.2.10)
