[筆記] 在ubuntu 18.04 下安裝nvidia 顯示卡驅動程式以及 pgstrom / Install Nvidia Driver Cuda Pgstrom in Ubuntu 1804

因為老闆說要試試看用GPU 來跑postgresql 威力

手邊剛好有一張 geforce gt 720

一開始沒想太多,看到有這張卡的驅動程式,然後CUDA也有支援

就直接從桌機拔下來,接去LAB Server ,然後就開始一連串的難關了…

整個過程大致上分為四個步驟

安裝 nvidia driver

安裝 CUDA

安裝 postgresql

安裝 pgstrom


安裝 nvidia driver

試過幾種方法,最後還是覺得用apt來安裝比較妥當
先新增repository、update、裝driver

1
2
3
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install ubuntu-drivers-common

然後用這個指令

1
ubuntu-drivers devices

看一下現在的驅動程式狀態

1
2
3
4
5
6
7
8
9
10
11
12
administrator@hqdc032:~/pg-strom$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001288sv0000174Bsd0000326Bbc03sc00i00
vendor : NVIDIA Corporation
model : GK208B [GeForce GT 720]
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-418 - third-party free
driver : nvidia-340 - distro non-free
driver : nvidia-driver-430 - third-party free recommended
driver : nvidia-driver-390 - third-party free
driver : nvidia-driver-415 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin

我這張卡,可以裝到 430 的版本,
接下來就繼續安裝驅動程式、裝完之後重開機

1
2
sudo apt install nvidia-driver-430 
sudo reboot

這時候,應該可以看到一些基本資訊

1
lsmod|grep nvidia

大概長這樣

1
2
3
4
5
6
7
nvidia_uvm            798720  0
nvidia_drm 45056 3
nvidia_modeset 1093632 7 nvidia_drm
nvidia 18194432 258 nvidia_uvm,nvidia_modeset
drm_kms_helper 172032 1 nvidia_drm
drm 401408 6 drm_kms_helper,nvidia_drm
ipmi_msghandler 53248 2 ipmi_devintf,nvidia

到這邊 nvidia 驅動程式安裝完成,接下來安裝 cuda

安裝 CUDA

同樣採用官網下載deb 回來安裝的方法

到這邊 https://developer.nvidia.com/cuda-downloads

依照自己的系統選擇

我選擇 Linux – x86_64 – Ubuntu – 18.04 – deb(local)

畫面上就會有安裝步驟,照著做就沒問題了

1
2
3
4
5
6
7
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

一樣,安裝完成後重新開機,然後來編譯一個 deviceQuery 的小程式來看看資訊

1
2
cd /usr/local/cuda-10.1/samples/1_Utilities/deviceQuery
sudo make

會產生一個叫 deviceQuery 的執行檔,執行後,會有相關資訊

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
administrator@hqdc032:/usr/local/cuda-10.1/samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 720"
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 1996 MBytes (2093416448 bytes)
( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
GPU Max Clock rate: 797 MHz (0.80 GHz)
Memory Clock rate: 900 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

到這邊, CUDA 也安裝完成,再來是簡單的 postgresql 11

安裝 postgresql 11

步驟差不多,就是新增repository,然後選擇要安裝的套件,不過套件的選擇和平常安裝postgresql 不太一樣

1
2
3
4
5
sudo apt install wget ca-certificates
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
sudo apt update
sudo apt install postgresql-client-11 postgresql-11 postgresql-server-dev-11 postgresql-common libicu-dev

跑完之後,就直接啟動 postgresql service 就可以了

再來是最麻煩的 pgstorm

pgstorm

話說,這軟體到底叫啥名字? pgstrom , pg-strom ? 看起來就像是拼錯字啊! =.=

https://github.com/heterodb/pg-strom

先 git clone 回來,然後make、make install

講是很簡單,但是一開始碰到很多問題,有去github 跟開發團隊回報,幸好有回覆我..

總之,目前在ubuntu 18.04 + postgresql-11 的環境下編譯是沒有問題了

UPDATE

今天拿到一張 GTX 1050 ti ,想說終於可以來測試看看 pg_strom 了

不過發現在ubuntu 底下,照著這篇操作還是會有問題

在做完git clone 要 make 之前,要先執行底下兩行指令

其中的 11 是 postgresql 版本,要依照自己安裝的版本做調整

1
2
sudo ln -snf /usr/lib/postgresql/11/lib/libpgcommon.a /usr/lib/x86_64-linux-gnu/libpgcommon.a
sudo ln -snf /usr/lib/postgresql/11/lib/libpgport.a /usr/lib/x86_64-linux-gnu/libpgport.a

接著再去 make 就沒問題了

1
2
3
git clone https://github.com/heterodb/pg-strom.git
cd pg-strom
make && sudo make install

再來設定一下 postgresql

postgresql 相關設定

需要修改一下postgresql.conf,來指定載入 pgstrom 的 library

官方是這麼說的

1
PG-Strom module must be loaded on startup of the postmaster process by the shared_preload_libraries. Unable to load it on demand. Therefore, you must add the configuration below.

修改 /etc/postgresql/11/main/postgresql.conf
加入一行

1
shared_preload_libraries = '$libdir/pg_strom'

然後還有其他三個要修改,不過這個不改不會影響pgstrom 的啟動

看狀況要不要修改吧

1
2
3
max_worker_processes = 100
shared_buffers = 10GB
work_mem = 1GB

修改完後,重新啟動 postgresql service 有沒有很感動!?我終於可以享受用GPU跑SQL Query 的快感啦!!!

咦??等等,為什麼postgresql service 沒起來!?

看一下 /var/log/syslog

1
2
3
4
5
6
7
8
9
10
Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: Error: /usr/lib/postgresql/11/bin/pg_ctl /usr/lib/postgresql/11/bin/pg_ctl start -D /var/lib/postgresql/11/main -l /var/log/postgresql/postgresql-11-main.log -s -o  -c config_file="/etc/postgresql/11/main/postgresql.conf"  exited with status 1:
Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: 2019-08-20 14:23:43.538 CST [11806] LOG: PG-Strom version 2.2 built for PostgreSQL 11
Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: 2019-08-20 14:23:43.565 CST [11806] LOG: PG-Strom: GPU0 GeForce GT 720 - CC 3.5 is not supported
Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: 2019-08-20 14:23:43.565 CST [11806] FATAL: PG-Strom: no supported GPU devices found
Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: 2019-08-20 14:23:43.565 CST [11806] LOG: database system is shut down
Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: pg_ctl: could not start server
Aug 20 14:23:43 hqdc032 postgresql@11-main[11801]: Examine the log output.
Aug 20 14:23:43 hqdc032 systemd[1]: postgresql@11-main.service: Can't open PID file /run/postgresql/11-main.pid (yet?) after start: No such file or directory
Aug 20 14:23:43 hqdc032 systemd[1]: postgresql@11-main.service: Failed with result 'protocol'.
Aug 20 14:23:43 hqdc032 systemd[1]: Failed to start PostgreSQL Cluster 11-main.

啊幹!pg-strom 不支援這張GT 720啦!

https://github.com/heterodb/pg-strom/wiki/001:-GPU-Availability-Matrix

簡單說,就是至少從 GTX 1080 起跳,其他都不用想了

幹,花了兩天的時間在弄這東西,結果明明一開始就應該要先檢查的相容列表卻沒有檢查…

好了,現在就看准不准我買一張 GTX 1080 來測試了….

只是這價格嘛…嗯咳,不是我該煩惱的問題了..

# nvidia
You forgot to set the qrcode for Alipay. Please set it in _config.yml.
You forgot to set the qrcode for Wechat. Please set it in _config.yml.
You forgot to set the business and currency_code for Paypal. Please set it in _config.yml.
You forgot to set the url Patreon. Please set it in _config.yml.
Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×