The previous post installed Docker on the server; now let's build a complete development, testing, and serving environment on top of it.
Start by pulling a TF Serving image from Docker Hub, then adapt it into the environment we need.
$ sudo service docker restart
$ sudo docker pull tensorflow/serving:latest-devel
Check the image, then run it (flags shown for reference):
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
tensorflow/serving latest-devel a1cd4ca64ea0 4 weeks ago 2.907 GB
$ sudo docker run -it -p 9001:9001 tensorflow/serving:latest-devel
This opens an interactive session inside the container, with the host port mapped to the container port (host:container).
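One detail worth knowing about `-p host:container` mappings: the service inside the container must listen on `0.0.0.0`, not `127.0.0.1`, or published traffic never reaches it. A minimal sketch of binding on all interfaces (pure Python, no Docker required):

```python
import socket

# Sketch: bind an ephemeral port on all interfaces, the way a service
# inside a container must for a -p mapping to reach it. Port 0 lets the
# OS pick a free port, so this runs anywhere.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 0))
host, port = s.getsockname()
print(host, port > 0)  # → 0.0.0.0 True
s.close()
```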
root@f5b5a877155e:/tensorflow-serving#
## Inspect the environment
root@f5b5a877155e:/tensorflow-serving# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial
root@f5b5a877155e:/tensorflow-serving# ll
total 64
-rw-r--r-- 1 root root 1689 Aug 15 01:04 WORKSPACE
drwxr-xr-x 15 root root 4096 Aug 15 01:04 tensorflow_serving/
drwxr-xr-x 2 root root 4096 Aug 15 01:04 third_party/
drwxr-xr-x 2 root root 4096 Aug 15 01:04 tools/
Only the tf-serving source tree exists in this environment — still a long way from the development and testing setup we want.
## Install vim
root@f5b5a877155e:/tensorflow-serving# apt-get update && apt-get install -y vim
## Install Anaconda
root@f5b5a877155e:/tensorflow-serving# mkdir /download && cd /download
root@f5b5a877155e:/download# wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-5.2.0-Linux-x86_64.sh
root@f5b5a877155e:/download# sh Anaconda3-5.2.0-Linux-x86_64.sh
## Check python (conda is missing too) ##
root@f5b5a877155e:/download# python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
## Still the system's default Python, not Anaconda's ##
## The installer appends export PATH="/root/anaconda3/bin:$PATH" to ~/.bashrc; source it to take effect ##
root@f5b5a877155e:/download# source ~/.bashrc
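The reason prepending `/root/anaconda3/bin` works is that command lookup scans PATH entries left to right and the first match wins. A sketch of that resolution logic (the directory contents here are assumptions for illustration; no Anaconda needed):

```python
# Sketch: simplified `which` — return the first PATH directory whose
# (simulated) contents contain the command, mirroring shell lookup order.
def which(cmd, path_dirs, fs):
    for d in path_dirs:
        if cmd in fs.get(d, set()):
            return d + "/" + cmd
    return None

# Hypothetical filesystem: both dirs ship a `python`, only Anaconda has conda.
fs = {"/root/anaconda3/bin": {"python", "conda"},
      "/usr/bin": {"python"}}
print(which("python", ["/root/anaconda3/bin", "/usr/bin"], fs))
# → /root/anaconda3/bin/python
```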
root@f5b5a877155e:/download# python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
root@f5b5a877155e:/download# conda install tensorflow
root@f5b5a877155e:/download# conda install grpcio
The container already has the TF Serving source tree, but the model server binary and its dependencies are still missing:
root@f5b5a877155e:/download# apt-get update && apt-get install -y build-essential curl libcurl3-dev git libfreetype6-dev libpng12-dev libzmq3-dev pkg-config python-dev python-numpy python-pip software-properties-common swig zip zlib1g-dev
root@f5b5a877155e:/download# echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
root@f5b5a877155e:/download# curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
root@f5b5a877155e:/download# apt-get update && apt-get install tensorflow-model-server
root@f5b5a877155e:/download# pip install tensorflow-serving-api
Using the example scripts shipped with tf-serving, train an MNIST classification model.
## Copy the MNIST data already downloaded on the host into the container's /tmp/ directory ##
## Docker 1.8+ can copy directly: sudo docker cp /local/file ContID:/tmp/
## Docker 1.7 and earlier cannot use cp — pipe through docker exec from the host instead (repeat for all 4 files) ##
$ cat train-images-idx3-ubyte.gz | sudo docker exec -i f5b5a877155e sh -c 'cat > /tmp/train-images-idx3-ubyte.gz'
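The pipe trick works because `docker exec -i` forwards our stdin to the process inside the container, where `cat` writes it to a file. The same mechanics with a plain local `sh` (no Docker required):

```python
import subprocess

# Sketch: stdin flows through the pipe into cat, exactly as it does
# through `docker exec -i <id> sh -c 'cat > /tmp/...'`.
out = subprocess.run(["sh", "-c", "cat"],
                     input=b"train-images payload",
                     capture_output=True)
print(out.stdout.decode())  # → train-images payload
```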
root@f5b5a877155e:/download# cd /tensorflow-serving
root@f5b5a877155e:/tensorflow-serving# python tensorflow_serving/example/mnist_saved_model.py models/mnist
Extracting /tmp/train-images-idx3-ubyte.gz
Extracting /tmp/train-labels-idx1-ubyte.gz
Extracting /tmp/t10k-images-idx3-ubyte.gz
Extracting /tmp/t10k-labels-idx1-ubyte.gz
2018-09-16 05:33:33.994294: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-09-16 05:33:33.994496: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
training accuracy 0.9092
Done training!
Exporting trained model to b'models/mnist/1'
Done exporting!
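Note the `b'models/mnist/1'` path: tensorflow_model_server scans `--model_base_path` for subdirectories whose names parse as integers and, by default, serves the highest version it finds, which is why the exporter writes into a numbered directory. A pure-Python illustration of that selection:

```python
# Sketch: pick the highest integer-named version directory, the default
# behavior of TF Serving's version policy (illustration, not its code).
def pick_version(entries):
    versions = sorted(int(e) for e in entries if e.isdigit())
    return versions[-1] if versions else None

print(pick_version(["1", "2", "notes.txt"]))  # → 2
```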
Start serving the trained model, running it in the background:
root@f5b5a877155e:/tensorflow-serving# tensorflow_model_server --port=9000 --model_name=mnist --model_base_path=/tensorflow-serving/models/mnist/ &> mnist_log &
root@f5b5a877155e:/tensorflow-serving# python tensorflow_serving/example/mnist_client.py --num_tests=10 --server=0.0.0.0:9000
Extracting /tmp/train-images-idx3-ubyte.gz
Extracting /tmp/train-labels-idx1-ubyte.gz
Extracting /tmp/t10k-images-idx3-ubyte.gz
Extracting /tmp/t10k-labels-idx1-ubyte.gz
..........
Inference error rate: 10.0%
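The 10.0% figure is simply (#misclassified / num_tests) × 100; the client prints one "." per completed request. A sketch of the tally (the label values below are hypothetical):

```python
# Sketch: error-rate arithmetic as mnist_client.py reports it.
predictions = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]  # hypothetical model outputs
labels      = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9]  # hypothetical ground truth
errors = sum(p != l for p, l in zip(predictions, labels))
print("Inference error rate: %s%%" % (100.0 * errors / len(labels)))
# → Inference error rate: 10.0%
```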
From the host machine, the same client command can reach the TF service inside the container — as long as the port the model server listens on is the one published with -p (the container above was started with -p 9001:9001, so the server port and mapping need to match).
To exit the container without stopping the service, and to get back into a running container:
root@ContainerID:/# Ctrl+p then Ctrl+q detaches from the container without stopping it — the service stays reachable
root@ContainerID:/# Ctrl+d exits and stops the container — the service goes down with it
$ sudo docker attach <Container ID>   ## re-attach to a running container ##
$ sudo docker start -i <Container ID> ## restart and attach to a stopped one ##
Note: the environment configured here bundles development, testing, and external serving into one container.
In practice you only need to train the model outside, then use the Docker environment for testing and serving.
Next, let's freeze the configured container into an image and publish it, so other people or machines can pull and use it.
First register a Docker Hub account (much like GitHub); `docker login` stores the credentials in a local JSON file.
## Container --> Image --> push
$ sudo docker login
## Docker Hub image names take the form <user>/<repo>:<tag>, so tag under your own namespace (your_hub_user is a placeholder) ##
$ sudo docker commit -a "author" <Container ID> your_hub_user/tf-serving-mnist:v1.0
$ sudo docker push your_hub_user/tf-serving-mnist:v1.0
$ sudo docker inspect your_hub_user/tf-serving-mnist:v1.0
On another machine (it just needs Docker), pull the configured image:
$ sudo docker pull your_hub_user/tf-serving-mnist:v1.0
$ sudo docker images
$ sudo docker run -it -p port:port <REPOSITORY>:<TAG>
Note: if you'd rather not use Docker Hub, export and import can move the image instead.
$ sudo docker export <Container ID> > your_image_name.tar
$ sudo docker import your_image_name.tar your_image_name:version
## Run the imported image ##
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f5b5a877155e tensorflow/serving:latest-devel "/bin/bash" 2 hours ago Up 2 hours 0.0.0.0:9002->9002/tcp determined_albattani
## Note the COMMAND column — typically /bin/bash. docker import does not preserve it, so pass it explicitly when running: ##
$ sudo docker run --name "container_name" -it -p port:port your_image_name:version COMMAND
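Why the explicit COMMAND: `docker export` streams the container's filesystem as a single tar archive, and `docker import` rebuilds an image from that tar while dropping metadata such as the original CMD. A tar round-trip in memory sketches the shape of the data being moved (file names here are illustrative):

```python
import io
import tarfile

# Sketch: pack one file into an in-memory tar, then read it back —
# the same archive format docker export emits and docker import consumes.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"#!/bin/sh\n"
    info = tarfile.TarInfo("bin/sh")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
print(names)  # → ['bin/sh']
```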
Next post: how to adapt TF Serving for a real production service.