Hello.
I ran into some problems while building a machine learning environment (TensorFlow + Jupyter Notebook) in Docker on SynQuacer, so I would like to share them.
Could you give me some advice on how to fix these problems?
- Failed to install TensorFlow from pip
- “pip install” command is too slow
- Failed to build TensorFlow because of -mfpu option
- local_resources option does not work when building TF with bazel
Here is the Dockerfile for reproduction:
And here is my SynQuacer spec:
RAM | 4GB (I couldn’t find a compatible DIMM; which one should I use?) |
HDD | 1TB |
Host OS | Debian |
Kernel | 4.14.32.linaro.281-1 |
Container OS | linaro/base-arm64-ubuntu:xenial |
1. Failed to install TensorFlow from pip:
I tried to install TensorFlow with pip, but it failed.
Here is the error log:
root@5e2a3172b85b:~# pip3 install tensorflow
Collecting tensorflow
Could not find a version that satisfies the requirement tensorflow (from versions: )
No matching distribution found for tensorflow
You are using pip version 8.1.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
It seems there is no prebuilt aarch64 TensorFlow binary, so I need to build it from source.
However, I would prefer to install it with pip, because building from source is not easy.
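To illustrate why pip reports no matching distribution, here is a minimal sketch of how a wheel's platform tag relates to the machine type. The helper names are hypothetical, the wheel file name is only an example of an x86_64 build, and the matching rule is a deliberate simplification of what pip actually does.

```python
import platform


def wheel_platform_tag(filename):
    """Return the platform tag: the last dash-separated field of a wheel name."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    return stem.split("-")[-1]


def may_run_on(filename, machine):
    """Simplified check (assumption): pure-Python wheels ('any') run anywhere,
    binary wheels only when the tag ends with the machine name."""
    tag = wheel_platform_tag(filename)
    return tag == "any" or tag.endswith("_" + machine)


if __name__ == "__main__":
    # Example file name of an x86_64-only binary wheel.
    wheel = "tensorflow-1.9.0-cp35-cp35m-manylinux1_x86_64.whl"
    machine = platform.machine()  # reported as "aarch64" on 64-bit ARM Linux
    print(machine, wheel_platform_tag(wheel), may_run_on(wheel, machine))
```

If every published wheel carries an x86_64 tag, pip on an aarch64 machine ends up with an empty candidate list, which would match the "(from versions: )" line in the log.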
2. pip install command is too slow:
I ran the command below, but I had to wait about 3 hours for it to finish.
# pip3 --no-cache-dir install Pillow ipykernel jupyter gast grpcio absl-py protobuf tensorboard scipy
Here is the output of top while the pip install command was running.
The pip install command runs cc1plus on a single core (to build native extensions?).
I think this is the cause of the problem.
Similar problems are discussed on Stack Overflow, but I could not find a good answer.
I want to fix this to shorten the install time on low-power multicore machines (like SynQuacer).
However, I don’t know where to fix it. Could someone help me solve it?
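One possible workaround, sketched below under assumptions, is to build each package's wheel in its own pip process so that several single-core compiles run side by side. The helper names and the /tmp/wheels directory are hypothetical. It also assumes the `wheel` package is installed so that `pip wheel` works, and it has not been tried on SynQuacer.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def pip_wheel_command(package, wheel_dir):
    """Command line that builds one package (and its dependencies) into wheel_dir."""
    return [sys.executable, "-m", "pip", "wheel", "--no-cache-dir",
            "--wheel-dir", wheel_dir, package]


def build_wheels(packages, wheel_dir, jobs, runner=subprocess.call):
    """Run one pip process per package, at most `jobs` at a time.

    Returns a dict mapping each package name to its exit code.
    """
    commands = [pip_wheel_command(p, wheel_dir) for p in packages]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(runner, commands))
    return dict(zip(packages, codes))


# Example (not executed here):
#   build_wheels(["Pillow", "scipy", "grpcio"], "/tmp/wheels", jobs=4)
# followed by installing from the local wheel directory:
#   pip3 install --no-index --find-links /tmp/wheels Pillow scipy grpcio
```

Packages that share dependencies may build the same dependency more than once, so this trades some duplicated work for better use of the cores. Memory use also grows with the number of parallel compiles, which matters on a 4GB machine.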
3. Failed to build TensorFlow because of -mfpu option
I couldn’t build TensorFlow because TensorFlow’s build system passes the -mfpu=neon option to gcc.
Here is the error log:
/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/arm-opt/bin/tensorflow/contrib/lite/kernels/internal/_objs/neon_tensor_utils/tensorflow/contrib/lite/kernels/internal/reference/portable_tensor_utils.pic.d '-frandom-seed=bazel-out/arm-opt/bin/tensorflow/contrib/lite/kernels/internal/_objs/neon_tensor_utils/tensorflow/contrib/lite/kernels/internal/reference/portable_tensor_utils.pic.o' -fPIC -iquote . -iquote bazel-out/arm-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/arm-opt/genfiles/external/bazel_tools -iquote external/arm_neon_2_x86_sse -iquote bazel-out/arm-opt/genfiles/external/arm_neon_2_x86_sse -iquote external/gemmlowp -iquote bazel-out/arm-opt/genfiles/external/gemmlowp -funsafe-math-optimizations -ftree-vectorize -fomit-frame-pointer -O3 '-mfpu=neon' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c tensorflow/contrib/lite/kernels/internal/reference/portable_tensor_utils.cc -o bazel-out/arm-opt/bin/tensorflow/contrib/lite/kernels/internal/_objs/neon_tensor_utils/tensorflow/contrib/lite/kernels/internal/reference/portable_tensor_utils.pic.o)
gcc: error: unrecognized command line option '-mfpu=neon'
I created an ad hoc patch to avoid this problem, but it is not a fundamental solution.
https://github.com/tnishinaga/tensorflow/commit/8a11597ee0cc5823e3dc57aa7682970e56b51517
I think the cause is that TensorFlow’s build system recognizes SynQuacer as an ARM32 environment. The bazel-out/arm-opt paths in the error log are consistent with this.
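As a small diagnostic sketch of that suspicion, the snippet below pulls the configuration directory name out of a bazel command line and compares it with the machine type reported by Python. The function names are hypothetical. It assumes that the directory after bazel-out/ is named after the detected CPU and compilation mode, which matches the log above but is not confirmed for every bazel version.

```python
import platform
import re


def bazel_config_from_log(line):
    """Return the directory name that follows 'bazel-out/', or None if absent."""
    match = re.search(r"bazel-out/([^/\s]+)/", line)
    return match.group(1) if match else None


def looks_like_arm32_config(config, machine):
    """True when the host is 64-bit ARM but the bazel config is named plain 'arm'."""
    cpu = config.split("-")[0]
    return machine == "aarch64" and cpu == "arm"


if __name__ == "__main__":
    # Fragment copied from the failing gcc command line.
    log_line = "-iquote bazel-out/arm-opt/genfiles/external/gemmlowp"
    config = bazel_config_from_log(log_line)
    print(config, looks_like_arm32_config(config, platform.machine()))
```

On an aarch64 host this flags the mismatch: the build is configured as "arm", the configuration for which the 32-bit-only -mfpu=neon flag appears to be selected.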
4. local_resources option is not working when building TF with bazel:
I passed the local_resources option to bazel to limit memory use, because my SynQuacer has only 4GB of RAM. (I’m using bazel 0.50)
Here is the command to build TensorFlow:
# bazel build -c opt \
--copt="-mcpu=cortex-a53+fp" \
--verbose_failures tensorflow/tools/pip_package:build_pip_package \
--local_resources 3072,24.0,1.0
However, the local_resources option does not seem to work.
virtual memory exhausted: Cannot allocate memory
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 24530.257s, Critical Path: 23750.57s
INFO: 180 processes: 180 local.
FAILED: Build did NOT complete successfully
root@949cda07f529:~/tensorflow-1.9.0-rc2#
I got the “Cannot allocate memory” error when swap was enabled,
and the Docker process crashed when swap was disabled.
I changed the build command to avoid this problem, but it is not a good solution.
bazel build -c opt \
--copt="-mcpu=cortex-a53+fp" \
--verbose_failures tensorflow/tools/pip_package:build_pip_package \
-j 3
I think bazel should adjust this automatically.
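The reasoning behind the -j 3 workaround can be sketched as a small calculation. It assumes that --local_resources is only a scheduling estimate and not a hard limit on compiler memory (not verified for this bazel version). The per-job figure of 1024 MB is a guess, not a measurement, and the function name is hypothetical.

```python
def safe_jobs(ram_budget_mb, cores, per_job_mb):
    """Largest job count whose assumed peak memory fits in the RAM budget.

    Always returns at least 1 so the build can still make progress.
    """
    if per_job_mb <= 0:
        raise ValueError("per_job_mb must be positive")
    return max(1, min(cores, ram_budget_mb // per_job_mb))


if __name__ == "__main__":
    # 3072 MB budget and 24 cores as in the --local_resources values above;
    # 1024 MB per compile job is an assumed figure.
    jobs = safe_jobs(ram_budget_mb=3072, cores=24, per_job_mb=1024)
    print("--jobs={}".format(jobs))
```

With these numbers the result is 3 jobs, the same value used in the workaround command. A heavier translation unit would call for a larger per-job estimate and therefore fewer jobs.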