Simplest CDAP pipeline config
Table of Contents
- 1. Introduction
- 2. The source: an Excel spreadsheet
- 3. Trial and error: the success rate is only 25%
- 4. Config source as an .xlsx file
- 5. Config sink using /tmp as the fake hdfs directory
- 6. Config sink using a real local hadoop installation
- 7. The fake hdfs sink is just a file in my /tmp dir
- 8. The real hdfs sink is accessible using hadoop commands
1 Introduction
This simplest pipeline is a pass-through from source to sink with no transformations in the middle, and yet it took a lot of trial and error to get it working. Better to write it down while the iron is hot.
2 The source: an Excel spreadsheet
Since I don't have the Microsoft Office suite on my Mac, I used Apple's Numbers app to export a table I hacked together as an Excel file.
3 Trial and error: the success rate is only 25%
4 Config source as an .xlsx file
The sheet number is zero! I got errors when I used 1.
5 Config sink using /tmp as the fake hdfs directory
6 Config sink using a real local hadoop installation
7 The fake hdfs sink is just a file in my /tmp dir
/tmp/three:
total used in directory 24 available 4293368400
drwxr-xr-x   6 gug   wheel   204 Oct 15 08:00 .
drwxrwxrwt  33 root  wheel  1122 Oct 15 08:00 ..
-rw-r--r--   1 gug   wheel     8 Oct 15 08:00 ._SUCCESS.crc
-rw-r--r--   1 gug   wheel    12 Oct 15 08:00 .part-m-00000.crc
-rw-r--r--   1 gug   wheel     0 Oct 15 08:00 _SUCCESS
-rw-r--r--   1 gug   wheel   132 Oct 15 08:00 part-m-00000
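The listing above has the usual shape of a Hadoop-style file output: an empty _SUCCESS marker, one part file holding the records, and hidden .crc checksum files. A minimal sketch of how one might read such a directory from the shell, printing the records only when the marker is present, could look like this. The function name show_sink_output is made up for illustration and is not part of CDAP or Hadoop.

```shell
#!/usr/bin/env bash
# Print the records of a file-sink output directory, but only if the job
# left a _SUCCESS marker behind. The function name is hypothetical.
show_sink_output() {
  local dir="$1"
  if [ ! -e "$dir/_SUCCESS" ]; then
    echo "no _SUCCESS marker in $dir" >&2
    return 1
  fi
  # The glob part-* does not match names starting with a dot, so the
  # hidden .crc checksum files are skipped.
  cat "$dir"/part-*
}
```

With the directory from this post, the call would be `show_sink_output /tmp/three`.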
8 The real hdfs sink is accessible using hadoop commands
bash-3.2$ bin/hdfs dfs -ls /user/gug/nine
17/10/15 08:58:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 gug supergroup          0 2017-10-15 08:56 /user/gug/nine/_SUCCESS
-rw-r--r--   3 gug supergroup        132 2017-10-15 08:56 /user/gug/nine/part-m-00000
bash-3.2$ bin/hdfs dfs -cat /user/gug/nine
17/10/15 08:59:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
cat: `/user/gug/nine': Is a directory
bash-3.2$ bin/hdfs dfs -cat /user/gug/nine/*
17/10/15 08:59:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
test.xlsx,Sheet 1,0.0
test.xlsx,Sheet 1,1.0
test.xlsx,Sheet 1,2.0
test.xlsx,Sheet 1,3.0
test.xlsx,Sheet 1,4.0
test.xlsx,Sheet 1,5.0
bash-3.2$
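Since the pipeline is a plain pass-through, a quick sanity check on the sink is to count the records and confirm they all have the same shape. The sketch below assumes the three-field, comma-separated layout seen in the output above (which looks like file name, sheet name, cell value, though that reading is my inference). The function name check_records is hypothetical.

```shell
#!/usr/bin/env bash
# Count the records on stdin and fail if any line does not have exactly
# three comma-separated fields. The function name is hypothetical, and the
# three-field layout is inferred from the sink output shown above.
check_records() {
  awk -F',' '
    NF != 3 { bad++ }
    END {
      printf "%d records, %d malformed\n", NR, bad
      exit (bad > 0)
    }
  '
}
```

It reads from stdin, so it can be fed from either sink, for example `cat /tmp/three/part-* | check_records`. Given the six rows shown above, it reports 6 records, 0 malformed.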