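Environment preparation
Hostname | IP address | Role |
etx1 | 192.168.0.81 | Management node |
etx2 | 192.168.0.82 | Compute node |
etx3 | 192.168.0.83 | Compute node |
Prepare three servers: one management node and two compute nodes.
Prerequisites: configure the EPEL yum repository and /etc/hosts, set up an NFS shared directory (shared path /eda, used for the LSF installation data), disable SELinux and firewalld, and configure sssd to join the domain and synchronize user accounts. The details of these steps are omitted here.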
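As a rough illustration of the /etc/hosts step, the sketch below appends the three entries from the table to a hosts file only when they are missing. The function names print_lsf_hosts and add_missing_hosts are hypothetical helpers, not part of LSF. The SELinux and firewalld commands appear only as comments because they need root and may differ between distributions.

```shell
# Hypothetical helper: print the /etc/hosts entries for the three nodes above.
print_lsf_hosts() {
    cat <<'EOF'
192.168.0.81 etx1
192.168.0.82 etx2
192.168.0.83 etx3
EOF
}

# Append only the entries whose hostname is not yet present in the given file.
add_missing_hosts() {
    hosts_file=$1
    print_lsf_hosts | while read -r ip name; do
        grep -qw "$name" "$hosts_file" || printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    done
}

# Usage on each node, as root:
#   add_missing_hosts /etc/hosts
#   setenforce 0                         # temporary; edit /etc/selinux/config to persist
#   systemctl disable --now firewalld
```

Running add_missing_hosts twice is harmless, because a hostname that is already present is skipped.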
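Installation packages
Upload the following installation packages, listed in order:
lsf10.1_linux2.6-glibc2.3-x86_64-520099.tar.Z installation file; do not extract
lsf10.1_linux2.6-glibc2.3-x86_64-529611.tar.Z installation file; do not extract
lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z installation file; do not extract
lsf10.1_lsfinstall_linux_x86_64.tar.Z main installer package
platform_lsf_std_entitlement.dat license (entitlement) file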
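Before starting, it can help to confirm that all five files were uploaded. The following is a minimal sketch; check_lsf_packages is a hypothetical helper name, and the directory /eda/lsf_10.1_529611 in the usage comment is taken from the install.config used later in this walkthrough.

```shell
# Hypothetical helper: verify that every package listed above exists in a directory.
check_lsf_packages() {
    dir=$1
    missing=0
    for f in \
        lsf10.1_linux2.6-glibc2.3-x86_64-520099.tar.Z \
        lsf10.1_linux2.6-glibc2.3-x86_64-529611.tar.Z \
        lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z \
        lsf10.1_lsfinstall_linux_x86_64.tar.Z \
        platform_lsf_std_entitlement.dat
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Non-zero exit status if any file was missing.
    return "$missing"
}

# Usage:
#   check_lsf_packages /eda/lsf_10.1_529611 && echo "all packages present"
```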
Installation
Extract the installer:
tar -zxvf lsf10.1_lsfinstall_linux_x86_64.tar.Z
Configure the installation by editing the install.config file:
LSF_TOP="/eda/ibm/"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="xxxxcluster1"
LSF_MASTER_LIST="etx1"
LSF_ENTITLEMENT_FILE="/eda/lsf_10.1_529611/platform_lsf_std_entitlement.dat"
CONFIGURATION_TEMPLATE="HIGH_THROUGHPUT"
LSF_TARDIR="/eda/lsf_10.1_529611"
LSF_ADD_SERVERS="etx2 etx3"
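[Configuration parameters]
LSF_TOP: the installation path.
LSF_ADMINS: the LSF administrator account(s). In an enterprise environment, use a shared administrator account such as lsfadmin rather than a personal account.
LSF_CLUSTER_NAME: the cluster name.
LSF_MASTER_LIST: the list of master candidate hosts. If several machines are available, list at least two so that one can act as a redundant backup.
LSF_TARDIR: the directory that holds the LSF distribution packages. It must contain "lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z" and "lsf10.1_lsfinstall_linux_x86_64.tar"; otherwise the installer fails with: No valid LSF distribution file(s) (.tar.Z or .tar.gz) is found in "/opt/lsf/tardir".
CONFIGURATION_TEMPLATE: the configuration template. For IC design workloads, HIGH_THROUGHPUT (the high-throughput template) is recommended.
LSF_ADD_SERVERS: compute (server) hosts to add to the cluster. These can also be configured after installation.
LSF_ADD_CLIENTS: client (job submission) hosts to add. These can also be configured after installation.
LSF_ENTITLEMENT_FILE: the path to the license entitlement file.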
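A quick sanity check of install.config before running the installer can catch copy-and-paste damage. The sketch below assumes the file uses shell-style KEY="value" assignments with no spaces around the equals sign. lint_install_config is a hypothetical helper, and the key list is simply the set of parameters this walkthrough sets, not an official list of required keys.

```shell
# Hypothetical helper: report keys from this walkthrough that are absent or
# not written as KEY=..., and flag assignments with spaces around "=".
lint_install_config() {
    cfg=$1
    status=0
    for key in LSF_TOP LSF_ADMINS LSF_CLUSTER_NAME LSF_MASTER_LIST \
               LSF_ENTITLEMENT_FILE LSF_TARDIR; do
        if ! grep -q "^${key}=" "$cfg"; then
            echo "missing or malformed: $key" >&2
            status=1
        fi
    done
    # A space before or directly after "=" breaks a shell-style assignment.
    if grep -Eq '^[A-Z_]+[[:space:]]+=|^[A-Z_]+=[[:space:]]' "$cfg"; then
        echo "remove spaces around '='" >&2
        status=1
    fi
    return "$status"
}

# Usage:
#   lint_install_config install.config && ./lsfinstall -f install.config
```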
Run the installer:
[root@etx1 lsf10.1_lsfinstall]# ./lsfinstall -f install.config
Logging installation sequence in /eda/lsf_10.1_529611/lsf10.1_lsfinstall/Install.log
International Program License Agreement
Part 1 – General Terms
BY DOWNLOADING, INSTALLING, COPYING, ACCESSING, CLICKING ON
AN “ACCEPT” BUTTON, OR OTHERWISE USING THE PROGRAM,
LICENSEE AGREES TO THE TERMS OF THIS AGREEMENT. IF YOU ARE
ACCEPTING THESE TERMS ON BEHALF OF LICENSEE, YOU REPRESENT
AND WARRANT THAT YOU HAVE FULL AUTHORITY TO BIND LICENSEE
TO THESE TERMS. IF YOU DO NOT AGREE TO THESE TERMS,
* DO NOT DOWNLOAD, INSTALL, COPY, ACCESS, CLICK ON AN
“ACCEPT” BUTTON, OR USE THE PROGRAM; AND
* PROMPTLY RETURN THE UNUSED MEDIA, DOCUMENTATION, AND
Press Enter to continue viewing the license agreement, or
enter “1” to accept the agreement, “2” to decline it, “3”
to print it, “4” to read non-IBM terms, or “99” to go back
to the previous screen.
1
LSF pre-installation check …
Checking the LSF TOP directory /eda/ibm …
… Done checking the LSF TOP directory /eda/ibm …
You are installing IBM Spectrum LSF – 10.1 Standard Edition.
Checking LSF Administrators …
LSF administrator(s): “lsfadmin”
Primary LSF administrator: “lsfadmin”
Checking the configuration template HIGH_THROUGHPUT …
HIGH_THROUGHPUT will be used as the configuration template.
Done checking configuration template …
Done checking ENABLE_STREAM …
Done checking ENABLE_CGROUP …
Done checking ENABLE_GPU …
[Sat May 11 10:14:35 CST 2024:lsfprechk:WARN_2007]
Hosts defined in LSF_MASTER_LIST must be LSF server hosts. The
following hosts will be added to server hosts automatically: etx1.
Checking the patch history directory …
Creating /eda/ibm/patch …
… Done checking the patch history directory /eda/ibm/patch …
Checking the patch backup directory …
… Done checking the patch backup directory /eda/ibm/patch/backup …
Searching LSF 10.1 distribution tar files in /eda/lsf_10.1_529611 Please wait …
1) linux2.6-glibc2.3-x86_64
Press 1 or Enter to install this host type: 1
You have chosen the following tar file(s):
lsf10.1_linux2.6-glibc2.3-x86_64
Checking selected tar file(s) …
… Done checking selected tar file(s).
Pre-installation check report saved as text file:
/eda/lsf_10.1_529611/lsf10.1_lsfinstall/prechk.rpt.
… Done LSF pre-installation check.
Installing LSF binary files ” lsf10.1_linux2.6-glibc2.3-x86_64″…
Creating /eda/ibm/10.1 …
Copying lsfinstall files to /eda/ibm/10.1/install
Creating /eda/ibm/10.1/install …
Creating /eda/ibm/10.1/install/scripts …
Creating /eda/ibm/10.1/install/instlib …
Creating /eda/ibm/10.1/install/patchlib …
Creating /eda/ibm/10.1/install/lap …
Creating /eda/ibm/10.1/install/conf_tmpl …
… Done copying lsfinstall files to /eda/ibm/10.1/install
Installing linux2.6-glibc2.3-x86_64 …
Please wait, extracting lsf10.1_linux2.6-glibc2.3-x86_64 may take up to a few minutes …
… Adding package information to patch history.
… Done adding package information to patch history.
… Done extracting /eda/lsf_10.1_529611/lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z.
Creating links to LSF commands …
… Done creating links to LSF commands …
Modifying owner, access mode, setuid flag of LSF binary files …
… Done modifying owner, access mode, setuid flag of LSF binary files …
Creating the script file lsf_daemons …
… Done creating the script file lsf_daemons …
… linux2.6-glibc2.3-x86_64 installed successfully under /eda/ibm/10.1.
… Done installing LSF binary files “linux2.6-glibc2.3-x86_64”.
Creating LSF configuration directories and files …
Creating /eda/ibm/work …
Creating /eda/ibm/log …
Creating /eda/ibm/conf …
Creating /eda/ibm/conf/lsbatch …
… Done creating LSF configuration directories and files …
Creating a new cluster “xxxxcluster1” …
Adding entry for cluster xxxxcluster1 to /eda/ibm/conf/lsf.shared.
Installing lsbatch directories and configurations …
Creating /eda/ibm/conf/lsbatch/xxxxcluster1 …
Creating /eda/ibm/conf/lsbatch/xxxxcluster1/configdir …
Added user group “lsfadmins” containing all cluster administrators.
Added host group “master_hosts” containing all master candidate hosts.
Creating /eda/ibm/work/xxxxcluster1 …
Creating /eda/ibm/work/xxxxcluster1/logdir …
Creating /eda/ibm/work/xxxxcluster1/live_confdir …
Creating /eda/ibm/work/xxxxcluster1/lsf_indir …
Creating /eda/ibm/work/xxxxcluster1/lsf_cmddir …
Adding server hosts …
Host(s) “etx1 etx2 etx3” has (have) been added to the cluster “xxxxcluster1”.
Adding LSF_MASTER_LIST in lsf.conf file…
… LSF configuration is done.
… Creating EGO configuration directories and files …
Creating /eda/ibm/conf/ego …
Creating /eda/ibm/conf/ego/xxxxcluster1 …
Creating /eda/ibm/conf/ego/xxxxcluster1/kernel …
Creating /eda/ibm/work/xxxxcluster1/ego …
… Done creating EGO configuration directories and files.
Configuring EGO components…
… EGO configuration is done.
… Creating resource connector configuration directories and files …
Creating /eda/ibm/conf/resource_connector …
Creating /eda/ibm/conf/resource_connector/ego …
Creating /eda/ibm/conf/resource_connector/openstack …
… Done creating resource connector configuration directories and files.
… Finished resource connector configuration.
… LSF inventory tag file is installed.
… LSF entitlement file is installed.
Creating lsf_getting_started.html …
… Done creating lsf_getting_started.html
Creating lsf_quick_admin.html …
… Done creating lsf_quick_admin.html
lsfinstall is done.
To complete your LSF installation and get your
cluster “xxxxcluster1” up and running, follow the steps in
“/eda/lsf_10.1_529611/lsf10.1_lsfinstall/lsf_getting_started.html”.
After setting up your LSF server hosts and verifying
your cluster “xxxxcluster1” is running correctly,
see “/eda/ibm/10.1/lsf_quick_admin.html”
to learn more about your new LSF cluster.
After installation, remember to bring your cluster up to date
by applying the latest fix pack from IBM Fix Central.
https://www.ibm.com/support/fixcentral/
Detailed steps for getting fixes from Fix Central, are in the
LSF installation guide on IBM Knowledge Center.
http://www.ibm.com/support/knowledgecenter/search/fix%20central?scope=SSWRJV
[root@etx1 lsf10.1_lsfinstall]# cd /eda/ibm/
[root@etx1 ibm]# ll
total 8
drwxr-xr-x. 12 root root 201 May 11 10:15 10.1
drwxr-xr-x. 5 lsfadmin root 237 May 11 10:15 conf
drwxr-xr-x. 2 lsfadmin root 6 May 11 10:15 log
-rw-r--r--. 1 lsfadmin 10007 417 May 27 2016 LSF_redist.txt
drwxr-xr-x. 5 lsfadmin cad 68 May 11 10:14 patch
-rw-r--r--. 1 lsfadmin root 753 May 11 10:15 patch.conf
drwxr-xr-x. 3 lsfadmin root 21 May 11 10:15 properties
drwxr-xr-x. 3 lsfadmin root 26 May 11 10:15 work
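Installation complete
Initial configuration
Edit /eda/ibm/conf/lsf.cluster.xxxxcluster1
Edit /eda/ibm/conf/lsbatch/xxxxcluster1/configdir/lsb.hosts
Configure the LSF daemons to start automatically at boot (run this on each new host):
/eda/ibm/10.1/install/hostsetup --top="/eda/ibm/" --boot="y"
Start LSF (lsfstartup)
On each node, run /eda/ibm/conf/lsbatch/xxxxcluster1/configdir/3_start_lsf.sh
Run lsload and bhosts to check that the cluster is healthy.
Switch to a regular user and submit a test job with bsub.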
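The verification steps above could be wrapped roughly as follows. This is a sketch only: check_cluster is a hypothetical helper, and the profile path /eda/ibm/conf/profile.lsf is the one sourced by the maintenance scripts in this walkthrough.

```shell
# Hypothetical helper: run the basic LSF health checks in sequence.
# Fails early if the LSF commands are not on PATH.
check_cluster() {
    for cmd in lsid lsload bhosts; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd not found; source profile.lsf first" >&2
            return 1
        fi
    done
    lsid && lsload && bhosts
}

# Usage on a cluster node:
#   . /eda/ibm/conf/profile.lsf
#   check_cluster
# Then, as a regular (non-root) user, submit a trivial test job and watch it:
#   bsub -o "$HOME/lsf_test.%J.out" hostname
#   bjobs
```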
Common maintenance scripts and commands
1_add_node.txt: procedure for adding a new node
step 1
edit lsf.cluster.hj-lsf & add cluster server
edit lsb.hosts & add cluster host and ip
step 2
edit /etc/hosts & update all server /etc/hosts file
step 3 (on the master server)
lsadmin reconfig
badmin mbdrestart
badmin reconfig
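The reconfiguration commands in step 3 can be collected in a small wrapper that defaults to a dry run. This is a sketch under assumptions: reconfig_cluster and the LSF_DRY_RUN variable are hypothetical names introduced here, and the commands are run in the same order as in the notes above.

```shell
# Hypothetical helper: run (or, by default, only print) the step 3 commands.
reconfig_cluster() {
    if [ "${LSF_DRY_RUN:-1}" = "1" ]; then
        runner=echo      # dry run: print each command instead of executing it
    else
        runner=          # real run: execute the commands
    fi
    $runner lsadmin reconfig &&
    $runner badmin mbdrestart &&
    $runner badmin reconfig
}

# Usage on the master host, after editing the cluster files and /etc/hosts:
#   reconfig_cluster                  # prints the three commands
#   LSF_DRY_RUN=0 reconfig_cluster    # runs them
```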
2_restart_lsf.sh: restarts the lim, res, and sbatchd daemons on an LSF node
source /eda/ibm/conf/profile.lsf
lsadmin limrestart
lsadmin resrestart
badmin hrestart
3_start_lsf.sh: starts the lim, res, and sbatchd daemons on an LSF node
source /eda/ibm/conf/profile.lsf
lsadmin limstartup
lsadmin resstartup
badmin hstartup
Daemons
[root@etx2 ~]# lsf_daemons status
Show status of the LSF subsystem
lim (pid 12330) is running…
res (pid 12343) is running…
sbatchd (pid 12342) is running…
[root@etx2 ~]# lsf_daemons restart
Stopping the LSF subsystem
Starting the LSF subsystem
[root@etx2 ~]#
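For monitoring, the status output shown above can be checked automatically. The sketch below assumes the output format matches the transcript ("lim (pid N) is running"); daemons_ok is a hypothetical helper that reads that output on standard input.

```shell
# Hypothetical helper: read `lsf_daemons status` output on stdin and succeed
# only if lim, res, and sbatchd are each reported as running.
daemons_ok() {
    status_text=$(cat)
    for d in lim res sbatchd; do
        if ! printf '%s\n' "$status_text" | grep -Eq "^$d \(pid [0-9]+\) is running"; then
            echo "$d is not running" >&2
            return 1
        fi
    done
}

# Usage:
#   lsf_daemons status | daemons_ok || lsf_daemons restart
```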