Deep Learning for HW Engineers: April 2018

Tuesday, April 17, 2018

How to fix the "libibmc++.so.1: cannot open shared object file" error on import ibm_db

After installing the IBM DB2 client as described in the previous post, you may get the following error when you run "import ibm_db".

user01@lcia-minsky01:~$ python3
>>> import ibm_db
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: libibmc++.so.1: cannot open shared object file: No such file or directory

This "libibmc++.so.1: cannot open shared object file: No such file or directory" error has nothing to do with pip install. It occurs because the IBM XL C/C++ compiler is not installed on the server (see the link below).

https://www-01.ibm.com/support/docview.wss?uid=swg21249819
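Before you install anything, it can help to confirm that the dynamic loader really cannot find the library, independently of ibm_db. The minimal sketch below uses a helper I named can_load, which is not part of ibm_db. It asks ctypes to dlopen a library by its soname. That search roughly matches the one the loader performs: LD_LIBRARY_PATH, then the ld.so cache, then the default directories. The sketch cannot see an RPATH embedded in the ibm_db extension module itself, so a failure here is a strong hint rather than proof.

```python
import ctypes


def can_load(soname):
    """Try to dlopen `soname`; return (True, "") on success or (False, error text)."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        # On Linux the message has the same shape as the ImportError above:
        # "<name>: cannot open shared object file: No such file or directory".
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, message = can_load("libibmc++.so.1")
    print("found" if ok else "not found: " + message)
```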

The problem is that XL C/C++ is commercial software. Fortunately, there is also a free Community Edition, and the download links are below.

Follow the links below.

https://www.ibm.com/developerworks/community/blogs/572f1638-121d-4788-8bbb-c4529577ba7d/entry/Download_XL_Fortran_for_Linux_V15_1_6_Community_Edition_a_no_charge_fully_functional_unlimited_production_use_compiler?lang=en

http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/README.html

First, add the trusted key for IBM's site, http://public.dhe.ibm.com, as shown below. This updates the keys in /etc/apt/trusted.gpg.

minsky@minsky:~$ wget -q http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/public.gpg -O- | sudo apt-key add -
OK

minsky@minsky:~$ ls -l /etc/apt/trusted.gpg
-rw-r--r-- 1 root root 20740 Apr 16 21:55 /etc/apt/trusted.gpg

As you can see above, however, this requires sudo privileges, and company policy may not allow it. If so, don't give up: you can simply skip this step. Without it, every apt-get update shows an annoying "The repository is not signed" warning for http://public.dhe.ibm.com, but the installation still works.

Create a new apt repository entry as follows.

minsky@minsky:~$ cd /etc/apt/sources.list.d/

minsky@minsky:/etc/apt/sources.list.d$ sudo vi xlc.list
deb [arch=ppc64el] http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu xenial main

Then refresh the apt repositories. As mentioned earlier, if you did not add the trusted key above, you will see warnings like the ones below. You can ignore them.

minsky@minsky:/etc/apt/sources.list.d$ sudo apt-get update
...
W: GPG error: http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 4B19F6F50761C815
W: The repository 'http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu xenial InRelease' is not signed.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.

Now install the XL C/C++ compiler Community Edition as follows. Again, if you did not add the trusted key, you will see a warning like the one below. Ignore it and answer y to continue.

minsky@minsky:/etc/apt/sources.list.d$ sudo apt-get install xlc.13.1.6 libxlc-devel.13.1.6 xlc-license-community.13.1.6
...
WARNING: The following packages cannot be authenticated!
libxlc libxlc-devel.13.1.6 libxlmass-devel.8.1.6 libxlsmp libxlsmp-devel.4.1.6
xlc-license-community.13.1.6 xlc.13.1.6
Install these packages without verification? [y/N] y
...
Setting up libxlc (13.1.6.1-171213) ...
Setting up libxlc-devel.13.1.6 (13.1.6.1-171213) ...
Setting up libxlmass-devel.8.1.6 (8.1.6.0-171201) ...
Setting up libxlsmp (4.1.6.1-171213) ...
Setting up libxlsmp-devel.4.1.6 (4.1.6.1-171213) ...
Setting up xlc-license-community.13.1.6 (13.1.6.1-171213) ...
Note: XL C/C++ Community Edition is a no-charge product and does not include official IBM support.
You can provide feedback at the XL on POWER C/C++ Community Edition forum (http://ibm.biz/xlcpp-linux-ce).
For information about a fully supported XL C/C++ compiler,
visit XL C/C++ for Linux (http://ibm.biz/xlcpp-linux).
Setting up xlc.13.1.6 (13.1.6.1-171213) ...
Run 'sudo /opt/ibm/xlC/13.1.6/bin/xlc_configure' to review the license and configure the compiler.

The installation is now complete. Everything is installed under /opt/ibm, and a search like the one below shows that /opt/ibm/lib/libibmc++.so.1 now exists.

minsky@minsky:~$ sudo find /opt -name libibmc++.so.1
/opt/ibm/lib/profiled/libibmc++.so.1
/opt/ibm/lib/libibmc++.so.1

Now ibm_db imports without any problem:

minsky@minsky:~$ python
Python 3.6.4 |Anaconda, Inc.| (default, Feb 11 2018, 08:19:13)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibm_db
>>>

** You probably won't need to set LD_LIBRARY_PATH (I didn't). If you still get the error, add the directory above to LD_LIBRARY_PATH:

$ export LD_LIBRARY_PATH=/opt/ibm/lib:$LD_LIBRARY_PATH
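The export above prepends /opt/ibm/lib to whatever LD_LIBRARY_PATH already holds. If the variable was empty, the result ends in a colon. That trailing colon is an empty entry, which, as far as I know, the dynamic loader treats as the current directory (as with PATH). Below is a minimal sketch of a safer prepend, for example when you build the environment for a child process. The helper name prepend_path is hypothetical.

```python
import os


def prepend_path(new_dir, current=None):
    """Return a colon-separated search path with new_dir first.

    Unlike "new_dir:$LD_LIBRARY_PATH" in the shell, this leaves no empty
    entry when the variable is unset or empty, and it drops a duplicate
    of new_dir (and any empty entries) from the rest of the list.
    """
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    rest = [p for p in current.split(":") if p and p != new_dir]
    return ":".join([new_dir] + rest)


if __name__ == "__main__":
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = prepend_path("/opt/ibm/lib")
    print(env["LD_LIBRARY_PATH"])
```

The loader reads LD_LIBRARY_PATH when a process starts, so changing os.environ inside an already running Python does not affect that process. Pass env to a child process instead, e.g. subprocess.run([...], env=env). In bash, /opt/ibm/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} avoids the trailing colon in the same way.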

** Keep in mind that this XL C/C++ compiler is only the Community Edition. It is free software and comes without technical or defect support.

Monday, April 16, 2018

How to set up the IBM DB2 client for python3 on ppc64le Ubuntu

In short: download the ibm_db-2.0.8.tar.gz I uploaded to the Google Drive link below, and install it as follows.

$ pip install --no-index --find-links=file://"<directory-where-you-saved-the-file>" ibm_db==2.0.8

https://drive.google.com/open?id=1ogehW-IkZJHF7emG8Db2XzRRNlznJOwi

A more detailed explanation follows.

In general, to set up the IBM DB2 client in a python3 environment, follow the guide below.

https://www.ibm.com/developerworks/community/blogs/96960515-2ea1-4391-8170-b0515d08e4da/entry/Installing_Python_ibm_db_driver?lang=en

When I tried it, however, I got various errors like the one below. (It works fine with Python 2; the errors occur only with python3.)

minsky@minsky:~$ pip install ibm_db
Collecting ibm_db
Using cached ibm_db-2.0.8.tar.gz
Building wheels for collected packages: ibm-db
...
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-b5m2e82l/ibm-db/setup.py", line 17, in <module>
os.rename('tests','test_2')
OSError: [Errno 39] Directory not empty: 'tests' -> 'test_2'

----------------------------------------
Command "/home/minsky/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-b5m2e82l/ibm-db/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-3hxlsbb8/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-b5m2e82l/ibm-db/

There are two causes. First, the bundled setup.py mishandles some directory names, namely tests and test_2. Second, and more fundamentally, the DB2 client is currently officially supported only on ppc64 (big endian), and setup.py makes no mention of ppc64le (little endian).

In practice, however, IBM provides ppc64le_odbc_cli.tar.gz as well as ppc64_odbc_cli.tar.gz, as shown below. In other words, the client can be installed on ppc64le too.

https://public.dhe.ibm.com/ibmdl/export/pub/software/data/db2/drivers/odbc_cli/ppc64le_odbc_cli.tar.gz
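The gist of the fix is that the installer must pick the archive that matches the machine's endianness. The sketch below is an illustration only, not the code from setup.py or from the patch. It shows how a Python installer could map platform.machine() to the matching archive; platform.machine() reports ppc64le on little-endian POWER Linux. Only the two archive names from this post are included, and CLI_TARBALLS and cli_tarball_url are hypothetical names.

```python
import platform

# Base URL taken from the ppc64le link above; the directory is assumed to
# hold one "<arch>_odbc_cli.tar.gz" archive per architecture.
CLI_BASE_URL = ("https://public.dhe.ibm.com/ibmdl/export/pub/software/"
                "data/db2/drivers/odbc_cli/")

# Only the two POWER archives mentioned in this post; other platforms are
# deliberately left out rather than guessed.
CLI_TARBALLS = {
    "ppc64le": "ppc64le_odbc_cli.tar.gz",  # little endian
    "ppc64": "ppc64_odbc_cli.tar.gz",      # big endian
}


def cli_tarball_url(machine=None):
    """Return the ODBC/CLI download URL for `machine` (default: this host)."""
    machine = machine or platform.machine()
    try:
        return CLI_BASE_URL + CLI_TARBALLS[machine]
    except KeyError:
        raise ValueError("no ODBC/CLI tarball known for %r" % machine) from None
```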

A temporary patch that handles this has also been posted; its URL (a GitHub issue) appears in the steps below.

Using that patch, I modified ibm_db-2.0.8.tar.gz as described below and uploaded it, so that IBM DB2 can be installed more easily.

First, download the ibm_db source package from the PyPI repository.

minsky@minsky:~$ pip download ibm_db
Collecting ibm_db
Downloading ibm_db-2.0.8.tar.gz (689kB)
100% |████████████████████████████████| 696kB 1.5MB/s
Saved ./ibm_db-2.0.8.tar.gz
Successfully downloaded ibm-db

Extract it, and edit setup.py according to the patch at the URL below.

https://github.com/ibmdb/python-ibmdb/issues/292

minsky@minsky:~$ tar -zxf ibm_db-2.0.8.tar.gz

minsky@minsky:~$ cd ibm_db-2.0.8

minsky@minsky:~/ibm_db-2.0.8$ vi setup.patch
...edit according to the URL above.

When you have finished editing, pack it back into a tar.gz.

minsky@minsky:~/ibm_db-2.0.8$ cd ..

minsky@minsky:~$ tar -zcf ibm_db-2.0.8.tar.gz ibm_db-2.0.8

Now install with pip from the ibm_db-2.0.8.tar.gz in the local directory, as shown below.

minsky@minsky:~$ pip install --no-index --find-links=file:///home/minsky ibm_db==2.0.8
Collecting ibm_db==2.0.8
Building wheels for collected packages: ibm-db
Running setup.py bdist_wheel for ibm-db ... done
Stored in directory: /home/minsky/.cache/pip/wheels/ed/cd/b9/f3b516860b310ca7806e09e423a4ecb8d076ee023dea5d5367
Successfully built ibm-db
Installing collected packages: ibm-db
Successfully installed ibm-db-2.0.8

You can verify the installation as follows.

minsky@minsky:~$ pip list --format=columns | grep ibm-db
ibm-db 2.0.8

Thursday, April 12, 2018

Creating LMDB files from the reduced ILSVRC2012_img_train_t3.tar

Continuing from the previous post, this post covers how to turn the reduced ILSVRC2012_img_train_t3.tar into LMDB files for testing Caffe.

For training, you can reuse the raw images extracted under ~/ilsvrc2012/raw-data/train in the previous post. The validation dataset, however, has to be filtered separately.

[user1@ac922 raw-data]$ mkdir val && cd val

[user1@ac922 val]$ tar -xf ~/files/ILSVRC2012_img_val.tar

[user1@ac922 val]$ ls *.JPEG | wc -l
50000

First, download the various label files with get_ilsvrc_aux.sh, which ships with Caffe.

[user1@ac922 ~]$ cd ~/caffe/data/ilsvrc12/

[user1@ac922 ilsvrc12]$ ./get_ilsvrc_aux.sh

Edit the downloaded val.txt, train.txt, and the other files to match ILSVRC2012_img_train_t3.tar, as follows.

[user1@ac922 ilsvrc12]$ head val.txt
ILSVRC2012_val_00000001.JPEG 65
ILSVRC2012_val_00000002.JPEG 970
ILSVRC2012_val_00000003.JPEG 230
ILSVRC2012_val_00000004.JPEG 809
ILSVRC2012_val_00000005.JPEG 516
ILSVRC2012_val_00000006.JPEG 57
ILSVRC2012_val_00000007.JPEG 334
ILSVRC2012_val_00000008.JPEG 415
ILSVRC2012_val_00000009.JPEG 674
ILSVRC2012_val_00000010.JPEG 332

[user1@ac922 ilsvrc12]$ wc -l val.txt
50000 val.txt

[user1@ac922 ilsvrc12]$ head train.txt
n01440764/n01440764_10026.JPEG 0
n01440764/n01440764_10027.JPEG 0
n01440764/n01440764_10029.JPEG 0
n01440764/n01440764_10040.JPEG 0
n01440764/n01440764_10042.JPEG 0
n01440764/n01440764_10043.JPEG 0
n01440764/n01440764_10048.JPEG 0
n01440764/n01440764_10066.JPEG 0
n01440764/n01440764_10074.JPEG 0
n01440764/n01440764_1009.JPEG 0

[user1@ac922 ilsvrc12]$ wc -l train.txt
1281167 train.txt

[user1@ac922 ilsvrc12]$ cat train.txt | cut -d"/" -f 1 | sort -u | wc -l
1000

[user1@ac922 ilsvrc12]$ cp train.txt train.txt.org

[user1@ac922 ilsvrc12]$ cp val.txt val.txt.org

[user1@ac922 ilsvrc12]$ for i in `ls -R ~/ilsvrc2012/raw-data/train/*`
> do
> grep ${i} train.txt >> train.txt.new
> done

This writes to train.txt.new only the 20,580 entries that belong to the reduced set of dog photos.

[user1@ac922 ilsvrc12]$ wc -l train.txt.new
20580 train.txt.new

[user1@ac922 ilsvrc12]$ cp train.txt.new train.txt
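The loop above runs one grep over the whole 1,281,167-line train.txt for every entry that ls -R prints, so it is slow. Each train.txt entry starts with its synset directory, so a set lookup makes the same selection in a single pass. Unlike grep, which matches substrings, the lookup is an exact match on that directory. Below is a minimal Python sketch. The function names and the src/dst/train_dir arguments are illustrative and not part of Caffe.

```python
import os


def filter_train_list(lines, keep_synsets):
    """Keep Caffe list lines ('nXXXXXXXX/file.JPEG label') whose directory
    part is one of keep_synsets, preserving the original order."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0].split("/", 1)[0] in keep_synsets:
            kept.append(line)
    return kept


def filter_train_file(src, dst, train_dir):
    """Write to dst the lines of src whose synset directory exists in train_dir."""
    # The synset directories actually present, e.g. n02085620 (120 in this post).
    keep = {d for d in os.listdir(train_dir)
            if os.path.isdir(os.path.join(train_dir, d))}
    with open(src) as f:
        kept = filter_train_list(f, keep)
    with open(dst, "w") as f:
        f.writelines(kept)
    return len(kept)
```

For example: filter_train_file("train.txt.org", "train.txt.new", os.path.expanduser("~/ilsvrc2012/raw-data/train")).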

Do the same for the other label files, such as synsets.txt and synset_words.txt.

[user1@ac922 ilsvrc12]$ wc -l synsets.txt
1000 synsets.txt

[user1@ac922 ilsvrc12]$ head synsets.txt
n01440764
n01443537
n01484850
n01491361
n01494475
n01496331
n01498041
n01514668
n01514859
n01518878

[user1@ac922 ilsvrc12]$ cp synsets.txt synsets.txt.org

[user1@ac922 ilsvrc12]$ for i in `ls ~/ilsvrc2012/raw-data/train`
> do
> grep ${i} synsets.txt >> synsets.txt.new
> done

[user1@ac922 ilsvrc12]$ wc -l synsets.txt.new
120 synsets.txt.new

[user1@ac922 ilsvrc12]$ cp synsets.txt.new synsets.txt

[user1@ac922 ilsvrc12]$ head synset_words.txt
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
n01491361 tiger shark, Galeocerdo cuvieri
n01494475 hammerhead, hammerhead shark
n01496331 electric ray, crampfish, numbfish, torpedo
n01498041 stingray
n01514668 cock
n01514859 hen
n01518878 ostrich, Struthio camelus

[user1@ac922 ilsvrc12]$ cp synset_words.txt synset_words.txt.org

[user1@ac922 ilsvrc12]$ wc -l synset_words.txt
1000 synset_words.txt

[user1@ac922 ilsvrc12]$ for i in `ls ~/ilsvrc2012/raw-data/train`
> do
> grep ${i} synset_words.txt >> synset_words.txt.new
> done

[user1@ac922 ilsvrc12]$ wc -l synset_words.txt.new
120 synset_words.txt.new

[user1@ac922 ilsvrc12]$ head synset_words.txt.new
n02085620 Chihuahua
n02085782 Japanese spaniel
n02085936 Maltese dog, Maltese terrier, Maltese
n02086079 Pekinese, Pekingese, Peke
n02086240 Shih-Tzu
n02086646 Blenheim spaniel
n02086910 papillon
n02087046 toy terrier
n02087394 Rhodesian ridgeback
n02088094 Afghan hound, Afghan

[user1@ac922 ilsvrc12]$ cp synset_words.txt.new synset_words.txt

Next, from the 50,000 validation JPEG files in ~/ilsvrc2012/raw-data/val, keep only those whose categories match the reduced training set, as follows.

[user1@ac922 ilsvrc12]$ cat train.txt | awk '{print $2}' | sort -u | wc -l
120

[user1@ac922 ilsvrc12]$ cat train.txt | awk '{print $2}' | sort -u > train.id

[user1@ac922 ilsvrc12]$ cat train.id | head
151
152
153
154
155
156
157
158
159
160

[user1@ac922 ilsvrc12]$ cat train.id | tail
262
263
264
265
266
267
268
273
274
275

[user1@ac922 ilsvrc12]$ sed 's/ /@/' val.txt > val.imsi

[user1@ac922 ilsvrc12]$ for i in `cat val.imsi`
> do
> j=`echo ${i} | cut -d'@' -f2`
> if [[ $j -gt 150 && $j -lt 276 ]]
> then
> echo ${i} >> val.txt.new
> fi
> done

[user1@ac922 ilsvrc12]$ sed 's/@/ /' val.txt.new > val.txt

[user1@ac922 ilsvrc12]$ wc -l val.txt
6250 val.txt
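The loop above keeps every label in the numeric range 151 to 275. If you would rather keep exactly the labels in train.id, a set-membership test also avoids the sed/@ round trip and the per-line subprocesses. The sketch below uses function names of my own. With the 120 labels of train.id, it would keep 120 classes rather than the 125 label values that the range admits.

```python
def labels_in(train_lines):
    """Collect the distinct integer labels (second column) of train.txt."""
    return {int(line.split()[1]) for line in train_lines if line.strip()}


def filter_val_by_labels(lines, train_labels):
    """Keep 'ILSVRC2012_val_XXXXXXXX.JPEG label' lines whose label occurs
    in train_labels, preserving the original order."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and int(fields[1]) in train_labels:
            kept.append(line)
    return kept
```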

[user1@ac922 ilsvrc12]$ for i in `ls ~/ilsvrc2012/raw-data/val`
> do
> grep ${i} val.txt > /dev/null
> if [[ $? -ne 0 ]]
> then
> rm ~/ilsvrc2012/raw-data/val/${i}
> fi
> done
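The loop above starts one grep over val.txt for each of the 50,000 files. The same cleanup can be done with a single set difference. The sketch below uses hypothetical helper names and defaults to a dry run, so you can check the list before anything is deleted.

```python
import os


def unlisted_files(val_dir, list_lines):
    """Return the file names in val_dir that are absent from the first
    column of list_lines (e.g. the filtered val.txt)."""
    listed = {line.split()[0] for line in list_lines if line.strip()}
    return sorted(name for name in os.listdir(val_dir) if name not in listed)


def remove_unlisted(val_dir, list_lines, dry_run=True):
    """Return the unlisted files; delete them only when dry_run is False."""
    doomed = unlisted_files(val_dir, list_lines)
    if not dry_run:
        for name in doomed:
            os.remove(os.path.join(val_dir, name))
    return doomed
```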

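As shown below, 6,250 images remain. That is 125 classes × 50 images. The range test 150 < label < 276 admits 125 label values, but train.id contains only 120 (for example, 269 to 272 are missing from its tail above). As a result, a few classes that are not in the training set are included as well.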

[user1@ac922 ilsvrc12]$ ls ~/ilsvrc2012/raw-data/val | wc -l
6250

With this in place, run create_imagenet.sh as follows to create the LMDB files. Our AC922 had a problem, so I had to move this work to a POWER8 Minsky server.

To suit Inception v3, I generate the LMDB with RESIZE_HEIGHT and RESIZE_WIDTH set to 320 x 320 instead of the default 256 x 256.

minsky@minsky:/opt/DL/caffe-nv$ vi ./examples/imagenet/create_imagenet.sh
...
RESIZE=true
if $RESIZE; then
RESIZE_HEIGHT=320
RESIZE_WIDTH=320

minsky@minsky:/opt/DL/caffe-nv$ ./examples/imagenet/create_imagenet.sh
Creating train lmdb...
I0412 01:31:48.584789 44556 convert_imageset.cpp:83] Shuffling data
I0412 01:31:49.089481 44556 convert_imageset.cpp:86] A total of 20580 images.
I0412 01:31:49.089962 44556 db_lmdb.cpp:35] Opened lmdb examples/imagenet/ilsvrc12_train_lmdb
...
Creating val lmdb...
...
I0412 01:33:47.749966 44599 convert_imageset.cpp:150] Processed 6250 files.
Done.

Next, a new imagenet_mean.binaryproto has to be created for these new LMDB files. This step also uses a script that ships with Caffe.

minsky@minsky:/opt/DL/caffe-nv$ ./examples/imagenet/make_imagenet_mean.sh
Done.

minsky@minsky:/opt/DL/caffe-nv/data/ilsvrc12$ ls -l imagenet_mean.binaryproto
-rw-rw-r-- 1 minsky minsky 1228814 Apr 13 08:51 imagenet_mean.binaryproto

I packed the resulting ilsvrc12 LMDB files into one archive, and the whole data/ilsvrc12 directory (label files and imagenet_mean.binaryproto) into another. Both are uploaded to Google Drive under the file names and URLs below.

minsky@minsky:/opt/DL/caffe-nv/examples/imagenet$ ls -l *.tgz
-rw-rw-r-- 1 minsky minsky 6806226325 Apr 13 08:48 ilsvrc12_lmdb_small_320.tgz

minsky@minsky:/opt/DL/caffe-nv/data$ tar -zcf ilsvrc12_320.tgz ilsvrc12

ilsvrc12_lmdb_small_320.tgz :
https://drive.google.com/open?id=12LqkuqCChOjK9zz1ZNGJa_YVXglgPUuB

ilsvrc12_320.tgz :
https://drive.google.com/open?id=1cugEHL2zm5UuvOy0MGSFJAPFAHg-KBEP

The URL below is a bundle of scripts for running Inception v1 with Caffe.

https://drive.google.com/open?id=17M9CcZFyHIKBx3Hv4EXAWXlG7bmKd6HO

Wednesday, April 11, 2018

Preparing a training dataset with the reduced ILSVRC2012_img_train_t3.tar

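Testing Inception v3 with TensorFlow normally uses ILSVRC2012_img_train.tar, a 138 GB archive of 1.28 million photos. Lately, however, image-net.org's servers seem to be struggling, and the download does not complete. Instead, you can run the same test with ILSVRC2012_img_train_t3.tar, a much smaller archive (about 760 MB) that contains only dog photos.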

The test needs three files: ILSVRC2012_img_train_t3.tar, ILSVRC2012_img_val.tar, and Annotation.tar.gz, which contains the xml files for the bounding boxes.

[user1@ac922 raw-data]$ pwd
/home/user1/ilsvrc2012/raw-data

[user1@ac922 raw-data]$ ls -l ../../files/IL*.tar
-rw-rw-r--. 1 user1 user1 762460160 Jul 4 2012 ../../files/ILSVRC2012_img_train_t3.tar
-rw-rw-r--. 1 user1 user1 6744924160 Jun 15 2012 ../../files/ILSVRC2012_img_val.tar

[user1@ac922 raw-data]$ wget http://image-net.org/Annotation/Annotation.tar.gz

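Extract ILSVRC2012_img_train_t3.tar into the train directory as follows. The archive holds one tar file per synset (n02085620.tar and so on), and the loop unpacks each one into a sub-directory of the same name.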

[user1@ac922 raw-data]$ cd train

[user1@ac922 train]$ tar -xf ../../../files/ILSVRC2012_img_train_t3.tar

[user1@ac922 train]$ for i in `ls n*.tar | cut -d. -f1`
> do
> mkdir ${i}
> tar -xf ${i}.tar -C ${i}
> rm ${i}.tar
> done
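If you prefer to avoid the shell loop, the same per-synset extraction can be done with Python's tarfile module. The helper below is illustrative, and its name, extract_synset_tars, is my own. Like the loop, it deletes each inner tar after extracting it, so run it on a copy if you want to keep the archives. It passes filter="data" when the running Python supports tar extraction filters.

```python
import os
import tarfile


def extract_synset_tars(train_dir):
    """Unpack every n*.tar in train_dir into a sub-directory named after it,
    delete the tar, and return the synset names in sorted order."""
    extracted = []
    for name in sorted(os.listdir(train_dir)):
        if not (name.startswith("n") and name.endswith(".tar")):
            continue
        synset = name[:-len(".tar")]
        tar_path = os.path.join(train_dir, name)
        out_dir = os.path.join(train_dir, synset)
        os.makedirs(out_dir, exist_ok=True)
        with tarfile.open(tar_path) as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(out_dir, filter="data")  # newer Pythons
            else:
                tar.extractall(out_dir)
        os.remove(tar_path)
        extracted.append(synset)
    return extracted
```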

The full ILSVRC2012_img_train.tar would produce 1,000 directories, whereas ILSVRC2012_img_train_t3.tar produces 120.

[user1@ac922 train]$ ls -1 | wc -l
120

[user1@ac922 train]$ ls | head
n02085620
n02085782
n02085936
n02086079
n02086240
n02086646
n02086910
n02087046
n02087394
n02088094

[user1@ac922 train]$ du -sm .
753 .

For now, extract the roughly 6 GB ILSVRC2012_img_val.tar as-is into the validation directory. It holds 50,000 photos in total.

[user1@ac922 validation]$ tar -xf ../../../files/ILSVRC2012_img_val.tar

[user1@ac922 validation]$ ls | head
ILSVRC2012_val_00000001.JPEG
ILSVRC2012_val_00000002.JPEG
ILSVRC2012_val_00000003.JPEG
ILSVRC2012_val_00000004.JPEG
ILSVRC2012_val_00000005.JPEG
ILSVRC2012_val_00000006.JPEG
ILSVRC2012_val_00000007.JPEG
ILSVRC2012_val_00000008.JPEG
ILSVRC2012_val_00000009.JPEG
ILSVRC2012_val_00000010.JPEG

[user1@ac922 validation]$ ls | wc -l
50000

[user1@ac922 validation]$ du -sm .
6496 .

Now extract Annotation.tar.gz for the bounding boxes. It also contains one tar.gz file per directory.

[user1@ac922 validation]$ cd ..

[user1@ac922 raw-data]$ tar -zxf ./Annotation.tar.gz

[user1@ac922 raw-data]$ for i in `ls n*.tar.gz`
> do
> tar -zxf ${i}
> rm ${i}
> done

Rename the resulting Annotation directory to bounding_boxes.

[user1@ac922 raw-data]$ mv Annotation bounding_boxes

[user1@ac922 bounding_boxes]$ ls | head
n00007846
n00015388
n00017222
n00021265
n00439826
n00440039
n00440941
n00441824
n00443692
n00445351

It contains 3,627 directories in total, each holding xml files.

[user1@ac922 bounding_boxes]$ ls | wc -l
3627

[user1@ac922 bounding_boxes]$ ls n00007846 | head
n00007846_103856.xml
n00007846_104163.xml
n00007846_104414.xml
n00007846_105689.xml
n00007846_106069.xml
n00007846_107024.xml
n00007846_109518.xml
n00007846_109796.xml
n00007846_110945.xml
n00007846_111865.xml

Now convert these xml files and the raw images into TFRecord format. The conversion uses the script tools and label files from the GitHub repository below.

[user1@ac922 ~]$ git clone https://github.com/tensorflow/models.git

[user1@ac922 ~]$ cd ~/models/research/inception/inception/data

The label files in this directory, such as imagenet_metadata.txt, were all built for ILSVRC2012_img_train.tar, not ILSVRC2012_img_train_t3.tar. Using them as-is would cause errors during the TFRecord conversion. We therefore need to rebuild them with only the labels that appear in ILSVRC2012_img_train_t3.tar.

First, make a backup.

[user1@ac922 data]$ cp imagenet_metadata.txt imagenet_metadata.txt.org

[user1@ac922 data]$ head imagenet_metadata.txt
n00004475 organism, being
n00005787 benthos
n00006024 heterotroph
n00006484 cell
n00007846 person, individual, someone, somebody, mortal, soul
n00015388 animal, animate being, beast, brute, creature, fauna
n00017222 plant, flora, plant life
n00021265 food, nutrient
n00021939 artifact, artefact
n00120010 hop

[user1@ac922 data]$ wc -l imagenet_metadata.txt
21842 imagenet_metadata.txt

As shown below, create imagenet_metadata.txt.new from only the names that actually exist in ~/ilsvrc2012/raw-data/train, and then overwrite imagenet_metadata.txt with it.

[user1@ac922 data]$ for i in `ls ~/ilsvrc2012/raw-data/train`
> do
> grep $i imagenet_metadata.txt >> imagenet_metadata.txt.new
> done

Exactly 120 entries were selected, and all of them are dog breeds.

[user1@ac922 data]$ wc -l imagenet_metadata.txt.new
120 imagenet_metadata.txt.new

[user1@ac922 data]$ head imagenet_metadata.txt.new
n02085620 Chihuahua
n02085782 Japanese spaniel
n02085936 Maltese dog, Maltese terrier, Maltese
n02086079 Pekinese, Pekingese, Peke
n02086240 Shih-Tzu
n02086646 Blenheim spaniel
n02086910 papillon
n02087046 toy terrier
n02087394 Rhodesian ridgeback
n02088094 Afghan hound, Afghan

Do the same for imagenet_lsvrc_2015_synsets.txt.

[user1@ac922 data]$ wc -l imagenet_lsvrc_2015_synsets.txt
1000 imagenet_lsvrc_2015_synsets.txt

[user1@ac922 data]$ cp imagenet_lsvrc_2015_synsets.txt imagenet_lsvrc_2015_synsets.txt.org

[user1@ac922 data]$ head imagenet_lsvrc_2015_synsets.txt
n01440764
n01443537
n01484850
n01491361
n01494475
n01496331
n01498041
n01514668
n01514859
n01518878

[user1@ac922 data]$ for i in `ls ~/ilsvrc2012/raw-data/train`
> do
> grep $i imagenet_lsvrc_2015_synsets.txt >> imagenet_lsvrc_2015_synsets.txt.new
> done

[user1@ac922 data]$ wc -l imagenet_lsvrc_2015_synsets.txt.new
120 imagenet_lsvrc_2015_synsets.txt.new

[user1@ac922 data]$ cp imagenet_lsvrc_2015_synsets.txt.new imagenet_lsvrc_2015_synsets.txt

[user1@ac922 data]$ head imagenet_lsvrc_2015_synsets.txt
n02085620
n02085782
n02085936
n02086079
n02086240
n02086646
n02086910
n02087046
n02087394
n02088094

[user1@ac922 data]$ wc -l imagenet_lsvrc_2015_synsets.txt
120 imagenet_lsvrc_2015_synsets.txt

Do something similar for the validation label file, imagenet_2012_validation_synset_labels.txt.

[user1@ac922 data]$ wc -l imagenet_2012_validation_synset_labels.txt
50000 imagenet_2012_validation_synset_labels.txt

[user1@ac922 data]$ cp imagenet_2012_validation_synset_labels.txt imagenet_2012_validation_synset_labels.txt.org

[user1@ac922 data]$ head imagenet_2012_validation_synset_labels.txt
n01751748
n09193705
n02105855
n04263257
n03125729
n01735189
n02346627
n02776631
n03794056
n02328150

[user1@ac922 data]$ for i in `ls ~/ilsvrc2012/raw-data/train`
> do
> grep $i imagenet_2012_validation_synset_labels.txt >> imagenet_2012_validation_synset_labels.txt.new
> done

This shrinks the validation label list from 50,000 entries to 6,000 (120 classes × 50 images each).

[user1@ac922 data]$ wc -l imagenet_2012_validation_synset_labels.txt.new
6000 imagenet_2012_validation_synset_labels.txt.new
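One caution about the loop above. imagenet_2012_validation_synset_labels.txt appears to be positional: line N holds the synset of validation image number N. Its first line, n01751748, corresponds to label 65 for ILSVRC2012_val_00000001.JPEG in Caffe's val.txt from the post above. Because grep appends the matches synset by synset, the .new file is grouped by class and no longer records which image each line belongs to. If you need a filtered list, the sketch below keeps the image name next to its synset. pair_val_labels is a hypothetical helper.

```python
def pair_val_labels(label_lines, keep_synsets):
    """Attach each line of the per-image label file to its image name
    (line 1 -> ILSVRC2012_val_00000001.JPEG, ...) and keep wanted synsets."""
    pairs = []
    for index, line in enumerate(label_lines, start=1):
        synset = line.strip()
        if synset in keep_synsets:
            pairs.append(("ILSVRC2012_val_%08d.JPEG" % index, synset))
    return pairs
```

The keep_synsets set can come from os.listdir() on ~/ilsvrc2012/raw-data/train, as in the loops above.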

Now reduce the contents of bounding_boxes in a similar way.

[user1@ac922 data]$ for i in `ls ~/ilsvrc2012/raw-data/bounding_boxes`
> do
> grep ${i} imagenet_lsvrc_2015_synsets.txt
> if [[ $? -ne 0 ]]
> then
> rm -rf ~/ilsvrc2012/raw-data/bounding_boxes/${i}
> fi
> done

As shown below, only 120 directories remain.

[user1@ac922 data]$ ls ~/ilsvrc2012/raw-data/bounding_boxes | wc -l
120

Now create a directory for the TFRecord files.

[user1@ac922 data]$ mkdir ~/ilsvrc2012/tfrecord

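Running the download_and_preprocess_imagenet.sh provided on GitHub would do everything from the download to the TFRecord conversion in one go. Here, however, we cannot download the 138 GB of data, and the environment differs in various ways. So I took only the core parts and wrote a new script, tf_preprocess.sh, shown below.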

[user1@ac922 data]$ vi tf_preprocess.sh
#!/bin/bash
DATA_DIR="${1%/}"
SCRATCH_DIR="${DATA_DIR}/raw-data/"
WORK_DIR="$HOME/models/research/inception/inception"

TRAIN_DIRECTORY="${SCRATCH_DIR}train/"
VALIDATION_DIRECTORY="${SCRATCH_DIR}validation/"

# Preprocess the validation data by moving the images into the appropriate
# sub-directory based on the label (synset) of the image.
echo "Organizing the validation data into sub-directories."
PREPROCESS_VAL_SCRIPT="${WORK_DIR}/data/preprocess_imagenet_validation_data.py"
VAL_LABELS_FILE="${WORK_DIR}/data/imagenet_2012_validation_synset_labels.txt"

"${PREPROCESS_VAL_SCRIPT}" "${VALIDATION_DIRECTORY}" "${VAL_LABELS_FILE}"

# Convert the XML files for bounding box annotations into a single CSV.
echo "Extracting bounding box information from XML."
BOUNDING_BOX_SCRIPT="${WORK_DIR}/data/process_bounding_boxes.py"
BOUNDING_BOX_FILE="${SCRATCH_DIR}/imagenet_2012_bounding_boxes.csv"
BOUNDING_BOX_DIR="${SCRATCH_DIR}bounding_boxes/"
LABELS_FILE="${WORK_DIR}/data/imagenet_lsvrc_2015_synsets.txt"

"${BOUNDING_BOX_SCRIPT}" "${BOUNDING_BOX_DIR}" "${LABELS_FILE}" \
| sort > "${BOUNDING_BOX_FILE}"
echo "Finished downloading and preprocessing the ImageNet data."

# Build the TFRecords version of the ImageNet data.
OUTPUT_DIRECTORY="${DATA_DIR}/tfrecord"
IMAGENET_METADATA_FILE="${WORK_DIR}/data/imagenet_metadata.txt"

python "${WORK_DIR}/data/build_imagenet_data.py" \
--train_directory="${TRAIN_DIRECTORY}" \
--validation_directory="${VALIDATION_DIRECTORY}" \
--output_directory="${OUTPUT_DIRECTORY}" \
--imagenet_metadata_file="${IMAGENET_METADATA_FILE}" \
--labels_file="${LABELS_FILE}" \
--bounding_box_file="${BOUNDING_BOX_FILE}"

tf_preprocess.sh calls build_imagenet_data.py to do the TFRecord conversion. By default, that script splits the training data into 1,024 shards; I'll reduce this to 120. I'll also reduce the validation shards from 128 to 32.

[user1@ac922 data]$ vi build_imagenet_data.py
...
#tf.app.flags.DEFINE_integer('train_shards', 1024,
tf.app.flags.DEFINE_integer('train_shards', 120,
'Number of shards in training TFRecord files.')
#tf.app.flags.DEFINE_integer('validation_shards', 128,
tf.app.flags.DEFINE_integer('validation_shards', 32,
'Number of shards in validation TFRecord files.')
...

Now give tf_preprocess.sh execute permission and run it to start the TFRecord conversion.

[user1@ac922 data]$ chmod a+x tf_preprocess.sh

[user1@ac922 data]$ ./tf_preprocess.sh ~/ilsvrc2012
...
2018-04-11 19:22:21.150944 [thread 2]: Wrote 2572 images to 2572 shards.
2018-04-11 19:22:21.151077 [thread 6]: Wrote 172 images to /home/user1/ilsvrc2012/tfrecord/train-00104-of-00120
2018-04-11 19:22:21.151131 [thread 6]: Wrote 2572 images to 2572 shards.
2018-04-11 19:22:21.247600 [thread 7]: Wrote 172 images to /home/user1/ilsvrc2012/tfrecord/train-00119-of-00120
2018-04-11 19:22:21.247631 [thread 7]: Wrote 2573 images to 2573 shards.
2018-04-11 19:22:21.506394: Finished writing all 20580 images in data set.

This completes the conversion of the 20,580 dog photos.
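The log lines above follow from how build_imagenet_data.py distributes its work, as far as I can tell from reading it. The images are split evenly over num_threads threads (the default appears to be 8). Each thread then writes train_shards / num_threads shards, which is why the script requires the shard count to be a multiple of the thread count; 120 and 32 both are. The sketch below reproduces that arithmetic in plain integers. It should match the script's np.linspace(...).astype(int) boundaries up to floating-point rounding. Splitting 20,580 images over 8 threads gives 2,572 or 2,573 images per thread, and 15 shards per thread gives 171 or 172 images per shard. The "Wrote 2572 images to 2572 shards" wording appears to print the per-thread image count twice; it does not mean 2,572 shard files.

```python
def split_evenly(total, parts, start=0):
    """Boundaries that cut range(start, start + total) into `parts` nearly
    equal chunks (an integer analogue of np.linspace(...).astype(int))."""
    return [start + total * k // parts for k in range(parts + 1)]


def shard_sizes(num_images, num_shards, num_threads=8):
    """Per-shard image counts when images are first split across threads
    and each thread's slice is split into num_shards // num_threads shards."""
    if num_shards % num_threads:
        raise ValueError("num_shards must be a multiple of num_threads")
    per_thread = num_shards // num_threads
    sizes = []
    bounds = split_evenly(num_images, num_threads)
    for lo, hi in zip(bounds, bounds[1:]):
        shard_bounds = split_evenly(hi - lo, per_thread, start=lo)
        sizes.extend(b - a for a, b in zip(shard_bounds, shard_bounds[1:]))
    return sizes
```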

[user1@ac922 data]$ cd ~/ilsvrc2012/tfrecord/
[user1@ac922 tfrecord]$ ls train-* | wc -l
120
[user1@ac922 tfrecord]$ ls validation-* | wc -l
32
[user1@ac922 tfrecord]$ du -sm .
1475 .

I bundled the files under ~/ilsvrc2012/tfrecord with the command below and uploaded the archive to the URL below.

[user1@ac922 ilsvrc2012]$ tar -zcvf tfrecord.tgz tfrecord

https://drive.google.com/open?id=1rQcxAWeNbByy0Yooj6IbROyVRsdQPn5-

Friday, April 6, 2018

Basic troubleshooting and tuning for CUDA 9.1 on the AC922

To summarize once more: with the current CUDA 9.1, the GPUs in the AC922 need a few adjustments, shown below, to run properly. Without them the GPUs do not work correctly; for example, nvidia-smi reports UNKNOWN ERROR.

[root@ac922 ~]# dracut --force

[root@ac922 ~]# vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

[root@ac922 ~]# vi /usr/lib/systemd/system/nvidia-persistenced.service
[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
Restart=always
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target

[root@ac922 ~]# sudo systemctl enable nvidia-persistenced

[root@ac922 ~]# sudo nvidia-smi -pm 1

[root@ac922 ~]# vi /lib/udev/rules.d/40-redhat.rules (comment out the line below with #)
...
#SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"
...

At this point you need to reboot. Before you do, it's a good idea to put the basic tuning settings into /etc/rc.local, as follows.

[root@ac922 ~]# vi /etc/rc.local
...
/usr/sbin/ppc64_cpu --smt=off
sleep 10
/usr/bin/cpupower frequency-set --governor performance
sleep 10
/usr/bin/nvidia-smi -ac 877,1530

Also, don't forget to give execute permission to /etc/rc.d/rc.local, the file that /etc/rc.local links to.

[root@ac922 ~]# ls -l /etc/rc.local
lrwxrwxrwx. 1 root root 13 Apr 3 14:29 /etc/rc.local -> rc.d/rc.local

[root@ac922 ~]# chmod a+x /etc/rc.d/rc.local
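After the reboot, it is worth confirming that the rc.local settings took effect. On Linux, the governor chosen with cpupower is visible per CPU in /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor. For the GPU application clocks, nvidia-smi -q -d CLOCK should list them. Below is a small read-only check for the governor. The helper names are mine, and the sysfs_root argument exists only so that the functions can be pointed at a test directory.

```python
import glob
import os


def governors(sysfs_root="/sys/devices/system/cpu"):
    """Map 'cpuN' -> current scaling_governor for every CPU exposing cpufreq."""
    found = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            found[cpu] = f.read().strip()
    return found


def all_performance(sysfs_root="/sys/devices/system/cpu"):
    """True if at least one CPU reports a governor and all report 'performance'."""
    current = governors(sysfs_root)
    return bool(current) and all(g == "performance" for g in current.values())


if __name__ == "__main__":
    print(all_performance())
```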

Thursday, April 5, 2018

Installing and testing Theano and Keras on AC922 Red Hat

The simplest and most convenient way to install Python 3.6 on AC922 Red Hat is to install Anaconda, as follows.

[user1@ac922 ~]$ wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-ppc64le.sh (to install python 2.x, use Anaconda2 instead of Anaconda3)

[user1@ac922 ~]$ chmod a+x Anaconda3-5.1.0-Linux-ppc64le.sh

[user1@ac922 ~]$ ./Anaconda3-5.1.0-Linux-ppc64le.sh

Once the installation finishes, you can easily install theano and keras with the pip that comes with it.

[user1@ac922 ~]$ which python
~/anaconda3/bin/python
[user1@ac922 ~]$ python --version
Python 3.6.4 :: Anaconda, Inc.

To install a specific version, Keras 2.0.9, run the following.

[user1@ac922 ~]$ pip install keras==2.0.9
Collecting keras==2.0.9
Using cached Keras-2.0.9-py2.py3-none-any.whl
Requirement already satisfied: pyyaml in ./anaconda3/lib/python3.6/site-packages (from keras==2.0.9)
Requirement already satisfied: scipy>=0.14 in ./anaconda3/lib/python3.6/site-packages (from keras==2.0.9)
Requirement already satisfied: numpy>=1.9.1 in ./anaconda3/lib/python3.6/site-packages (from keras==2.0.9)
Requirement already satisfied: six>=1.9.0 in ./anaconda3/lib/python3.6/site-packages (from keras==2.0.9)
Installing collected packages: keras
Successfully installed keras-2.0.9

To test Keras, you first need to install tensorflow. Follow the earlier post below for that.

http://hwengineer.blogspot.kr/2018/04/ac922-redhat-74-python-36-tensorflow.html

Now check whether Keras uses the GPUs, as follows.

[user1@ac922 ~]$ python
Python 3.6.4 |Anaconda, Inc.| (default, Feb 11 2018, 08:19:13)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from keras import backend as K
/home/user1/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
....
tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0035:04:00.0, compute capability: 7.0)

>>> K.tensorflow_backend._get_available_gpus()
['/device:GPU:0', '/device:GPU:1', '/device:GPU:2', '/device:GPU:3']

As you can see, it works.

Next, install theano. To use the GPU, theano 1.0.0 needs pygpu 0.7 or higher. The pygpu installed by default is 0.6.9, so if you use it as-is you will get the following error.

ERROR (theano.gpuarray): pygpu was configured but could not be imported or is too old (version 0.7 or higher required)

So first install pygpu 0.7, which you get by building the libgpuarray package as follows.

[user1@ac922 ~]$ git clone https://github.com/Theano/libgpuarray.git

[user1@ac922 ~]$ cd libgpuarray

[user1@ac922 libgpuarray]$ cmake3 .

[user1@ac922 libgpuarray]$ make && sudo make install

This installs libgpuarray.so under /usr/local/lib. Set LD_LIBRARY_PATH as follows so that this copy is loaded first.

[user1@ac922 ~]$ export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/cuda-9.1/lib64:/usr/lib:/usr/lib64

Then install the Python binding as follows.

[user1@ac922 libgpuarray]$ python setup.py install

Now pip list shows that pygpu is installed.

[user1@ac922 ~]$ pip list --format=columns | grep pygpu
pygpu 0.7.5+11.g04c2892.dirty
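Theano's requirement is about the version number, and a source build like this one reports a local version string (0.7.5+11.g04c2892.dirty). The minimal sketch below compares only the leading numeric part against the 0.7 minimum. It is not Theano's own check, and reading the version from pygpu.__version__ is an assumption about the package.

```python
import re


def version_at_least(version, minimum):
    """Compare the leading numeric part of `version`
    (e.g. '0.7.5+11.g04c2892.dirty') with a tuple such as (0, 7)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError("no numeric version in %r" % version)
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts >= tuple(minimum)


if __name__ == "__main__":
    try:
        import pygpu  # assumption: the package exposes __version__
        print(version_at_least(getattr(pygpu, "__version__", "0"), (0, 7)))
    except ImportError:
        print("pygpu is not importable in this environment")
```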

Now install theano itself. The latest version is 1.0.1, but here I use 1.0.0.

[user1@ac922 ~]$ pip install theano==1.0.0
Collecting theano==1.0.0
Downloading Theano-1.0.0.tar.gz (2.9MB)
100% |████████████████████████████████| 2.9MB 390kB/s
Requirement already satisfied: numpy>=1.9.1 in ./anaconda3/lib/python3.6/site-packages (from theano==1.0.0)
Requirement already satisfied: scipy>=0.14 in ./anaconda3/lib/python3.6/site-packages (from theano==1.0.0)
Requirement already satisfied: six>=1.9.0 in ./anaconda3/lib/python3.6/site-packages (from theano==1.0.0)
Building wheels for collected packages: theano
Running setup.py bdist_wheel for theano ... done
Stored in directory: /home/user1/.cache/pip/wheels/e1/38/41/29fead4ea90d8fb9e23af0ba80d24222f8ba6effe93896ecbf
Successfully built theano
Installing collected packages: theano
Successfully installed theano-1.0.0

That's not all: you also need to configure theano, as follows.

[user1@ac922 ~]$ vi ~/.theanorc

[global]
device = cuda0
floatX = float32

[cuda]
root = /usr/local/cuda-9.1

[dnn]
library_path = /usr/local/cuda-9.1/targets/ppc64le-linux/lib
include_path = /usr/local/cuda-9.1/targets/ppc64le-linux/include
enabled = True
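My understanding is that Theano builds each option name from the section and the key. Keys under [global] are used bare (device, floatX), so the cuda.root option is written as root under [cuda]. The sketch below only mimics that naming with configparser, which makes a misplaced key easy to spot; it is not Theano's own config parser.

```python
import configparser


def theano_style_keys(text):
    """Flatten .theanorc-style INI text into dotted option names:
    keys in [global] stay bare, others become '<section>.<key>'."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. floatX
    parser.read_string(text)
    flat = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            name = key if section == "global" else section + "." + key
            flat[name] = value
    return flat
```

For instance, theano_style_keys(open(os.path.expanduser("~/.theanorc")).read()) should list cuda.root, not cuda.cuda.root.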

Check the installed versions as follows.

[user1@ac922 ~]$ pip list --format=columns | grep -e 'Keras\|Theano'
Keras 2.0.9
Theano 1.0.0

As a check, run a small theano sample program like the one below.

[user1@ac922 files]$ vi theano_ex1.py
from theano import function, config, shared, tensor
import numpy
import time
vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

You can control whether this runs on the CPU or the GPU with the THEANO_FLAGS setting "device=cpu" or "device=cuda#" (e.g. cuda0). First, here is the run on the CPU.

[user1@ac922 files]$ time THEANO_FLAGS=device=cpu python theano_ex1.py
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 6.145363 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
1.62323285]
Used the cpu

real 0m6.787s
user 0m6.739s
sys 0m0.050s
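THEANO_FLAGS is a comma-separated list of key=value pairs (for example device=cpu,floatX=float32) whose values take precedence over ~/.theanorc, and it must be set before theano is imported. That precedence is why device=cpu on the command line above wins over the device setting in ~/.theanorc. The following is a minimal sketch of that override logic using plain dicts; `parse_theano_flags` and `effective_config` are hypothetical helpers, not Theano's own parser.

```python
def parse_theano_flags(flags):
    """Split a THEANO_FLAGS-style string such as 'device=cpu,floatX=float32' into a dict."""
    result = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries, e.g. a trailing comma
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % item)
        result[key.strip()] = value.strip()
    return result


def effective_config(rc_values, flags):
    """Start from the ~/.theanorc values and let THEANO_FLAGS entries override them."""
    merged = dict(rc_values)
    merged.update(parse_theano_flags(flags))
    return merged


rc = {"device": "cuda0"}  # device as set in ~/.theanorc
print(effective_config(rc, "device=cpu"))  # {'device': 'cpu'}
```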

Now try device=cuda0.

[user1@ac922 files]$ time THEANO_FLAGS=device=cuda0 python theano_ex1.py

Traceback (most recent call last):
File "__init__.pxd", line 1011, in numpy.import_array
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
...

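Oops, an error. A quick search shows it is caused by numpy being too old: an extension module was compiled against a newer NumPy C API version (0xc) than the installed numpy provides (0xb). Upgrading numpy fixes it: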

[user1@ac922 ~]$ pip install numpy --upgrade
Collecting numpy
Installing collected packages: numpy
Found existing installation: numpy 1.13.3
Uninstalling numpy-1.13.3:
Successfully uninstalled numpy-1.13.3
Successfully installed numpy-1.14.2
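For reference, the two hex numbers in that RuntimeError are NumPy C API versions: 0xc (12) is what the extension module was compiled against and 0xb (11) is what the installed numpy provides. When the installed value is lower, the fix is to upgrade numpy (or rebuild the module against the older numpy), which is what the upgrade above does. A small sketch that extracts those numbers from such a message; `diagnose_numpy_abi` is a hypothetical helper, and the regular expression assumes the exact wording shown in the traceback.

```python
import re

# Matches the wording of the error in the traceback above.
_ABI_RE = re.compile(
    r"compiled against API version (0x[0-9a-fA-F]+) "
    r"but this version of numpy is (0x[0-9a-fA-F]+)"
)


def diagnose_numpy_abi(message):
    """Return (compiled_api, installed_api, needs_upgrade), or None if the message does not match."""
    match = _ABI_RE.search(message)
    if match is None:
        return None
    compiled, installed = (int(v, 16) for v in match.groups())
    # An extension built against a newer C API cannot load on an older numpy.
    return compiled, installed, installed < compiled


msg = "RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb"
print(diagnose_numpy_abi(msg))  # (12, 11, True)
```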

Now run it again.

[user1@ac922 files]$ time THEANO_FLAGS=device=cuda0 python theano_ex1.py
Using cuDNN version 7005 on context None
Mapped name None to device cuda0: Tesla V100-SXM2-16GB (0004:04:00.0)
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.107720 seconds
Result is [1.2317803 1.6187935 1.5227807 ... 2.2077181 2.2996776 1.623233 ]
Used the gpu

real 0m4.753s
user 0m2.746s
sys 0m2.001s

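It works. As expected, it is much faster than on the CPU: the 1000-iteration loop takes about 0.11 seconds instead of about 6.15 seconds. Because device was already set to cuda0 in $HOME/.theanorc earlier, you do not actually need to specify THEANO_FLAGS here; by default it runs on cuda0, i.e. GPU 0.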

[user1@ac922 files]$ time python theano_ex1.py
Using cuDNN version 7005 on context None
Mapped name None to device cuda0: Tesla V100-SXM2-16GB (0004:04:00.0)
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.139874 seconds
Result is [1.2317803 1.6187935 1.5227807 ... 2.2077181 2.2996776 1.623233 ]
Used the gpu

real 0m4.725s
user 0m2.656s
sys 0m2.057s
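To put numbers on the CPU/GPU comparison, the sketch below computes the speedup from the log lines of the device=cpu run and the first device=cuda0 run. The timed loop is about 57 times faster on the V100, but the total `real` time only drops from 6.787 s to 4.753 s, presumably because the GPU run also pays for graph compilation and GPU/cuDNN initialization outside the loop (note the roughly 2 s of sys time). `loop_seconds` is a hypothetical helper.

```python
import re

CPU_LINE = "Looping 1000 times took 6.145363 seconds"  # device=cpu run
GPU_LINE = "Looping 1000 times took 0.107720 seconds"  # device=cuda0 run


def loop_seconds(line):
    """Extract the elapsed seconds from a 'Looping N times took X seconds' line."""
    match = re.search(r"took ([0-9.]+) seconds", line)
    if match is None:
        raise ValueError("no timing found in %r" % line)
    return float(match.group(1))


loop_speedup = loop_seconds(CPU_LINE) / loop_seconds(GPU_LINE)
wall_speedup = 6.787 / 4.753  # 'real' times reported by the time command
print("loop: %.1fx, wall clock: %.2fx" % (loop_speedup, wall_speedup))
# loop: 57.0x, wall clock: 1.43x
```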

If you want to build a newer Theano directly from source, do the following.

[user1@ac922 files]$ pip install -U git+https://github.com/Theano/Theano.git
Collecting Theano from git+https://github.com/Theano/Theano.git#egg=Theano
Cloning https://github.com/Theano/Theano.git to /tmp/pip-build-wpolv6wn/Theano
Collecting numpy>=1.9.1 (from Theano)
Downloading numpy-1.14.2.zip (4.9MB)
100% |████████████████████████████████| 4.9MB 232kB/s
Collecting scipy>=0.14 (from Theano)
Downloading scipy-1.0.1.tar.gz (15.5MB)
100% |████████████████████████████████| 15.5MB 75kB/s
Requirement already up-to-date: six>=1.9.0 in /home/user1/anaconda3/lib/python3.6/site-packages (from Theano)
Building wheels for collected packages: numpy, scipy
Running setup.py bdist_wheel for numpy ... done
Stored in directory: /home/user1/.cache/pip/wheels/7d/03/d4/050b7a6ff45c26a43b5a5b9e4ad1983b35b2aa1df42d44b5b1
Running setup.py bdist_wheel for scipy ... done
Stored in directory: /home/user1/.cache/pip/wheels/bc/15/5b/4b524f3adb3be2b81d59001f2b06ac758b7b73e77b4345a8b7
Successfully built numpy scipy
Installing collected packages: numpy, scipy, Theano
Found existing installation: numpy 1.13.3
Uninstalling numpy-1.13.3:
Successfully uninstalled numpy-1.13.3
Found existing installation: scipy 1.0.0
Uninstalling scipy-1.0.0:
Successfully uninstalled scipy-1.0.0
Found existing installation: Theano 1.0.0
Uninstalling Theano-1.0.0:
Successfully uninstalled Theano-1.0.0
Running setup.py install for Theano ... done
Successfully installed Theano-1.0.1+70.gd9b8a7c numpy-1.14.2 scipy-1.0.1

[user1@ac922 files]$ pip list --format=columns | grep -i theano
Theano 1.0.1+70.gd9b8a7c
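The version Theano 1.0.1+70.gd9b8a7c (like pygpu 0.7.5+11.g04c2892.dirty earlier) appears to follow the git-describe style used by versioneer: TAG+DISTANCE.gHASH[.dirty], meaning the build is DISTANCE commits after the TAG release, at commit HASH, with `.dirty` if the working tree had uncommitted changes. The sketch below splits such strings; `parse_git_version` is a hypothetical helper and only handles this one format.

```python
import re

# TAG, optionally followed by +DISTANCE.gHASH and an optional .dirty suffix.
_VERSION_RE = re.compile(
    r"^(?P<tag>[^+]+)"
    r"(?:\+(?P<distance>\d+)\.g(?P<sha>[0-9a-f]+)(?P<dirty>\.dirty)?)?$"
)


def parse_git_version(version):
    """Split a versioneer-style version string into its parts."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return {
        "tag": match.group("tag"),
        "commits_after_tag": int(match.group("distance") or 0),
        "commit": match.group("sha"),
        "dirty": match.group("dirty") is not None,
    }


print(parse_git_version("1.0.1+70.gd9b8a7c"))
```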

Monday, April 2, 2018

Building Tensorflow 1.5.0 on AC922 Redhat 7.4 + Python 3.6

This time we build Tensorflow 1.5.0 on an AC922 running Redhat 7.4 + Python 3.6 and, as a test, go as far as running CIFAR10 training with it.

The basic preparation is the same as for TF 1.4.1 in the post at the URL below: install anaconda 5.1, then build bazel-0.8.1 and copy it under /usr/local/bin.

https://hwengineer.blogspot.kr/2018/01/ac922-redhat-python3-tensorflow-141.html

First, download the tensorflow source code from github.

[user1@ac922 ~]$ git clone https://github.com/tensorflow/tensorflow

[user1@ac922 ~]$ cd tensorflow

Check out v1.5.0. If your tree still has the patches applied to 1.4.1 following the post above, git will refuse and tell you to stash or commit the local changes first. Since we are discarding all of those patches anyway, simply force the checkout with the -f option.

[user1@ac922 tensorflow]$ git checkout -f tags/v1.5.0

Next, install a few python packages and set up the environment variables.

[user1@ac922 tensorflow]$ conda install wheel numpy six

[user1@ac922 tensorflow]$ which protoc
~/anaconda3/bin/protoc

[user1@ac922 tensorflow]$ export PROTOC=~/anaconda3/bin/protoc

[user1@ac922 tensorflow]$ which bazel
/usr/local/bin/bazel

[user1@ac922 tensorflow]$ export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64:/usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:$LD_LIBRARY_PATH
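The export above prepends the CUDA 9.1 and system library directories so that the dynamic loader searches them before anything already in LD_LIBRARY_PATH. If you script the environment instead of typing it, a minimal sketch like the one below does the same prepend while dropping duplicates and empty entries (such as the trailing one left behind when LD_LIBRARY_PATH was previously unset). `prepend_ld_paths` is a hypothetical helper.

```python
import os

# The same directories, in the same order, as in the export line above.
CUDA_LIB_DIRS = [
    "/usr/local/cuda-9.1/lib64",
    "/usr/lib64",
    "/usr/lib",
    "/usr/local/lib64",
    "/usr/local/lib",
]


def prepend_ld_paths(new_dirs, existing=None):
    """Return a colon-separated path list with new_dirs first, without duplicates or empty entries."""
    if existing is None:
        existing = os.environ.get("LD_LIBRARY_PATH", "")
    result = []
    for directory in list(new_dirs) + existing.split(":"):
        if directory and directory not in result:
            result.append(directory)
    return ":".join(result)


if __name__ == "__main__":
    print(prepend_ld_paths(CUDA_LIB_DIRS))
```

Since the loader reads LD_LIBRARY_PATH when a process starts, the result has to be placed in the environment of the process that runs bazel, not changed inside an already running one.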

Unlike 1.4, TF 1.5 no longer fails in BoringSSL on the ppc64le architecture. Instead, simply running bazel may produce the following error:

gcc: error: unrecognized command line option '-march=native'

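This is because gcc 4.8 has no -march option for the ppc64le architecture (POWER uses -mcpu instead), yet the configure.py shipped with TF 1.5 still uses -march=native by default in places such as the --host_copt line it writes to .bazelrc. So edit it as below, replacing these flags with -mcpu=power8, to avoid the error: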

[user1@ac922 tensorflow]$ vi ./configure.py
...
    # gcc on ppc64le does not support -march, use mcpu instead
    # default_cc_opt_flags = '-mcpu=native'
    default_cc_opt_flags = '-mcpu=power8'
  else:
    # default_cc_opt_flags = '-march=native'
    default_cc_opt_flags = '-mcpu=power8'
...
  # It should be safe on the same build host.
  # write_to_bazelrc('build:opt --host_copt=-march=native')
  write_to_bazelrc('build:opt --host_copt=-mcpu=power8')
...
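What the edit boils down to: on POWER, GCC selects the target CPU with -mcpu rather than -march, so every optimization flag that ends up in .bazelrc, including the --host_copt one, should use -mcpu. A minimal sketch of that decision follows. `bazelrc_opt_lines` is a hypothetical helper, not TensorFlow's configure.py code, and the `--copt` line format is my assumption, modeled on the `--host_copt` line shown above.

```python
import platform


def bazelrc_opt_lines(machine=None, ppc_cpu="power8"):
    """Return 'build:opt' lines with CPU flags appropriate for the given architecture."""
    machine = machine or platform.machine()
    if machine.startswith("ppc64"):
        # GCC on POWER selects the CPU with -mcpu; gcc 4.8 rejects -march here.
        flag = "-mcpu=" + ppc_cpu
    else:
        flag = "-march=native"
    return ["build:opt --copt=" + flag, "build:opt --host_copt=" + flag]


if __name__ == "__main__":
    for line in bazelrc_opt_lines():
        print(line)
```

power8 rather than power9 is presumably used because gcc 4.8 predates POWER9 support (-mcpu=power9 arrived in GCC 6), even though the AC922 itself is a POWER9 machine.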

Now run configure.

[user1@ac922 tensorflow]$ ./configure
...
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
...
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
...
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-9.1
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7
Please specify the location where cuDNN 7.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-9.1]:/usr/local/cuda-9.1/targets/ppc64le-linux/lib
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]7.0
...
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -mcpu=power8]:
(Since configure.py was edited above, the default already shows as -mcpu=power8, so you can just press Enter without typing anything.)
...
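If you rerun configure often, the answers can in principle be supplied through environment variables, since, as far as I can tell, configure.py checks the environment before prompting and still asks interactively for anything not set. The sketch below assumes the variable names TF_NEED_GCP, TF_NEED_S3, TF_NEED_CUDA, TF_CUDA_VERSION, CUDA_TOOLKIT_PATH, TF_CUDNN_VERSION, CUDNN_INSTALL_PATH, TF_CUDA_COMPUTE_CAPABILITIES and CC_OPT_FLAGS; these are my assumption, so verify them against the configure.py in your checkout. `configure_env` and `run_configure` are hypothetical helpers.

```python
import os
import subprocess

# Answers from the interactive session above (variable names are assumptions).
CONFIGURE_ANSWERS = {
    "TF_NEED_GCP": "0",
    "TF_NEED_S3": "0",
    "TF_NEED_CUDA": "1",
    "TF_CUDA_VERSION": "9.1",
    "CUDA_TOOLKIT_PATH": "/usr/local/cuda-9.1",
    "TF_CUDNN_VERSION": "7",
    "CUDNN_INSTALL_PATH": "/usr/local/cuda-9.1/targets/ppc64le-linux/lib",
    "TF_CUDA_COMPUTE_CAPABILITIES": "7.0",
    "CC_OPT_FLAGS": "-mcpu=power8",
}


def configure_env(base=None):
    """Copy the base environment (default: os.environ) and add the configure answers."""
    env = dict(os.environ if base is None else base)
    env.update(CONFIGURE_ANSWERS)
    return env


def run_configure(src_dir="~/tensorflow"):
    """Run ./configure in the TensorFlow checkout with the answers pre-set."""
    subprocess.run(["./configure"], cwd=os.path.expanduser(src_dir),
                   env=configure_env(), check=True)
```

Calling `run_configure()` from the machine with the checkout would then replace the interactive session.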

Now run the bazel build.

[user1@ac922 tensorflow]$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
...
cc1plus: warning: unrecognized command line option "-Wno-self-assign" [enabled by default]
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 432.069s, Critical Path: 138.14s
INFO: Build completed successfully, 6026 total actions

It completes without any problems. Next, build the wheel package.

[user1@ac922 tensorflow]$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/files/tensorflow_pkg
...
Mon Apr 2 16:26:06 KST 2018 : === Output wheel file is in: /home/user1/files/tensorflow_pkg

You can see that the wheel package has been built:

[user1@ac922 tensorflow]$ ls -l ~/files/tensorflow_pkg
total 76724
-rw-rw-r--. 1 user1 user1 78562178 Apr 2 16:26 tensorflow-1.5.0-cp36-cp36m-linux_ppc64le.whl
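The wheel's file name encodes where it can be installed. Per the wheel naming convention (PEP 427), it is {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, so tensorflow-1.5.0-cp36-cp36m-linux_ppc64le.whl only installs on CPython 3.6 on ppc64le Linux, whereas a pure-Python wheel such as tensorflow_tensorboard-1.5.1-py3-none-any.whl (seen in the pip output below) installs anywhere. `parse_wheel_filename` is a hypothetical helper that handles only this simple case, without the escaping and validation a full parser would do.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components (no escaping or validation)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel name: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


print(parse_wheel_filename("tensorflow-1.5.0-cp36-cp36m-linux_ppc64le.whl"))
```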

Install this file with pip. If the system already has tensorflow 1.4.1 installed, pip automatically uninstalls TF 1.4.1 in the process.

[user1@ac922 tensorflow]$ pip install ~/files/tensorflow_pkg/tensorflow-1.5.0-cp36-cp36m-linux_ppc64le.whl
Processing /home/user1/files/tensorflow_pkg/tensorflow-1.5.0-cp36-cp36m-linux_ppc64le.whl
Requirement already satisfied: six>=1.10.0 in /home/user1/anaconda3/lib/python3.6/site-packages (from tensorflow==1.5.0)
Requirement already satisfied: numpy>=1.12.1 in /home/user1/anaconda3/lib/python3.6/site-packages (from tensorflow==1.5.0)
Requirement already satisfied: wheel>=0.26 in /home/user1/anaconda3/lib/python3.6/site-packages (from tensorflow==1.5.0)
Collecting absl-py>=0.1.6 (from tensorflow==1.5.0)
Downloading absl-py-0.1.13.tar.gz (80kB)
100% |████████████████████████████████| 81kB 721kB/s
Collecting tensorflow-tensorboard<1.6.0,>=1.5.0 (from tensorflow==1.5.0)
Downloading tensorflow_tensorboard-1.5.1-py3-none-any.whl (3.0MB)
100% |████████████████████████████████| 3.0MB 378kB/s
Requirement already satisfied: protobuf>=3.4.0 in /home/user1/anaconda3/lib/python3.6/site-packages (from tensorflow==1.5.0)
Requirement already satisfied: werkzeug>=0.11.10 in /home/user1/anaconda3/lib/python3.6/site-packages (from tensorflow-tensorboard<1.6.0,>=1.5.0->tensorflow==1.5.0)
Requirement already satisfied: html5lib==0.9999999 in /home/user1/anaconda3/lib/python3.6/site-packages (from tensorflow-tensorboard<1.6.0,>=1.5.0->tensorflow==1.5.0)
Requirement already satisfied: bleach==1.5.0 in /home/user1/anaconda3/lib/python3.6/site-packages (from tensorflow-tensorboard<1.6.0,>=1.5.0->tensorflow==1.5.0)
Requirement already satisfied: markdown>=2.6.8 in /home/user1/anaconda3/lib/python3.6/site-packages (from tensorflow-tensorboard<1.6.0,>=1.5.0->tensorflow==1.5.0)
Requirement already satisfied: setuptools in /home/user1/anaconda3/lib/python3.6/site-packages (from protobuf>=3.4.0->tensorflow==1.5.0)
Building wheels for collected packages: absl-py
Running setup.py bdist_wheel for absl-py ... done
Stored in directory: /home/user1/.cache/pip/wheels/76/f7/0c/88796d7212af59bb2f496b12267e0605f205170781e9b86479
Successfully built absl-py
Installing collected packages: absl-py, tensorflow-tensorboard, tensorflow
Found existing installation: tensorflow-tensorboard 0.4.0
Uninstalling tensorflow-tensorboard-0.4.0:
Successfully uninstalled tensorflow-tensorboard-0.4.0
Found existing installation: tensorflow 1.4.1
Uninstalling tensorflow-1.4.1:
Successfully uninstalled tensorflow-1.4.1
Successfully installed absl-py-0.1.13 tensorflow-1.5.0 tensorflow-tensorboard-1.5.1

Check with pip list that it was installed correctly.

[user1@ac922 tensorflow]$ pip list --format=columns | grep tensor
tensorflow 1.5.0
tensorflow-tensorboard 1.5.1

Now let's run cifar10 as a test. As below, I use the cifar10 example from https://github.com/tensorflow/models, which I had already downloaded.

[user1@ac922 tensorflow]$ cd ~/models/tutorials/image/cifar10

[user1@ac922 cifar10]$ python cifar10_multi_gpu_train.py --num_gpus=4 --batch_size=1024
...
2018-04-02 16:32:08.080741: step 0, loss = 4.67 (1175.1 examples/sec; 0.871 sec/batch)
2018-04-02 16:32:12.800667: step 10, loss = 4.58 (25133.3 examples/sec; 0.041 sec/batch)
2018-04-02 16:32:14.351431: step 20, loss = 4.40 (27159.8 examples/sec; 0.038 sec/batch)
2018-04-02 16:32:15.920159: step 30, loss = 4.38 (27289.8 examples/sec; 0.038 sec/batch)
...

As you can see above, it runs fine. Performance appears (as you would expect) to be about the same as with TF 1.4.1.
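To compare runs like this one with TF 1.4.1 more precisely than by eye, the step lines can be parsed and averaged. The sketch below does that for the four lines shown above, skipping the first entry (step 0) as warm-up, and assumes the examples/sec figure counts images across all 4 GPUs, so dividing by the GPU count gives a per-GPU rate. `parse_step` and `mean_throughput` are hypothetical helpers.

```python
import re

LOG = """\
2018-04-02 16:32:08.080741: step 0, loss = 4.67 (1175.1 examples/sec; 0.871 sec/batch)
2018-04-02 16:32:12.800667: step 10, loss = 4.58 (25133.3 examples/sec; 0.041 sec/batch)
2018-04-02 16:32:14.351431: step 20, loss = 4.40 (27159.8 examples/sec; 0.038 sec/batch)
2018-04-02 16:32:15.920159: step 30, loss = 4.38 (27289.8 examples/sec; 0.038 sec/batch)
"""

_STEP_RE = re.compile(
    r"step (\d+), loss = ([0-9.]+) \(([0-9.]+) examples/sec; ([0-9.]+) sec/batch\)"
)


def parse_step(line):
    """Return (step, loss, examples_per_sec, sec_per_batch) or None for other lines."""
    match = _STEP_RE.search(line)
    if match is None:
        return None
    step, loss, rate, sec = match.groups()
    return int(step), float(loss), float(rate), float(sec)


def mean_throughput(log_text, num_gpus, warmup_entries=1):
    """Average examples/sec after dropping warm-up entries; also report per-GPU throughput."""
    records = [r for r in map(parse_step, log_text.splitlines()) if r is not None]
    rates = [r[2] for r in records[warmup_entries:]]
    if not rates:
        raise ValueError("no step lines left after warm-up")
    total = sum(rates) / len(rates)
    return total, total / num_gpus


total, per_gpu = mean_throughput(LOG, num_gpus=4)
print("%.1f examples/sec in total, %.1f per GPU" % (total, per_gpu))
# 26527.6 examples/sec in total, 6631.9 per GPU
```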

I have also uploaded the tensorflow 1.5.0 wheel for python3 built above to the Google Drive link below. Please understand that I cannot vouch for the quality of this file.

https://drive.google.com/open?id=1KY3tFk3wZW0z3puZtsnBYt9wPZqDPzho