LSF 10.1 Deployment Notes

Environment preparation

Hostname  IP address     Role
etx1      192.168.0.81   Management node
etx2      192.168.0.82   Compute node
etx3      192.168.0.83   Compute node

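Prepare three servers: one management node and two compute nodes.

Prerequisites (configuration details omitted here): configure the EPEL yum repository and /etc/hosts, set up an NFS shared directory (shared path /eda, used to hold the LSF installation), disable SELinux and firewalld, and configure sssd to join the domain and synchronize accounts.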
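The /etc/hosts prerequisite can be checked with a small script. This is a minimal sketch, assuming the host names and IP addresses from the table above; the function name check_hosts_file is hypothetical and not part of LSF.

```shell
# Hypothetical helper (not part of LSF): verify that a hosts file maps
# each "ip name" pair given as arguments. Prints every missing pair and
# returns non-zero if any pair is missing.
check_hosts_file() {
    hosts_file=$1
    shift
    missing=0
    while [ "$#" -ge 2 ]; do
        ip=$1
        name=$2
        shift 2
        # Look for a line whose first field is the IP and that lists the
        # name before any trailing comment.
        if ! awk -v ip="$ip" -v name="$name" '
            $1 == ip {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break
                    if ($i == name) found = 1
                }
            }
            END { exit (found ? 0 : 1) }' "$hosts_file"; then
            echo "missing: $ip $name"
            missing=1
        fi
    done
    return "$missing"
}
```

On each node it could be called as: check_hosts_file /etc/hosts 192.168.0.81 etx1 192.168.0.82 etx2 192.168.0.83 etx3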

Installation packages

Upload the installation packages. They are, in order:

lsf10.1_linux2.6-glibc2.3-x86_64-520099.tar.Z        Installation file; do not extract

lsf10.1_linux2.6-glibc2.3-x86_64-529611.tar.Z        Installation file; do not extract

lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z               Installation file; do not extract

lsf10.1_lsfinstall_linux_x86_64.tar.Z                Main installer package

platform_lsf_std_entitlement.dat                     License (entitlement) file
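Before starting, it can help to confirm that all five files arrived intact in the upload directory. The following is a rough sketch only; check_lsf_packages is a hypothetical helper name, and the file names are the ones listed above.

```shell
# Hypothetical helper: confirm the uploaded files listed above are present
# and non-empty in the given directory.
check_lsf_packages() {
    dir=$1
    status=0
    for f in \
        lsf10.1_linux2.6-glibc2.3-x86_64-520099.tar.Z \
        lsf10.1_linux2.6-glibc2.3-x86_64-529611.tar.Z \
        lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z \
        lsf10.1_lsfinstall_linux_x86_64.tar.Z \
        platform_lsf_std_entitlement.dat
    do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return "$status"
}
```

Assuming the files were uploaded to /eda/lsf_10.1_529611 (the directory used for LSF_TARDIR in these notes), the call would be: check_lsf_packages /eda/lsf_10.1_529611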

Installation

Extract the installer

tar -zxvf lsf10.1_lsfinstall_linux_x86_64.tar.Z

Set the installation parameters by editing the install.config file:

LSF_TOP="/eda/ibm/"

LSF_ADMINS="lsfadmin"

LSF_CLUSTER_NAME="xxxxcluster1"

LSF_MASTER_LIST="etx1"

LSF_ENTITLEMENT_FILE="/eda/lsf_10.1_529611/platform_lsf_std_entitlement.dat"

CONFIGURATION_TEMPLATE="HIGH_THROUGHPUT"

LSF_TARDIR="/eda/lsf_10.1_529611"

LSF_ADD_SERVERS="etx2 etx3"

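[Configuration parameters explained]

LSF_TOP: installation path.

LSF_ADMINS: LSF administrator account(s). The original setup used a personal account; in an enterprise environment, create a shared administrator account such as lsfadmin, as in the configuration above.

LSF_CLUSTER_NAME: cluster name.

LSF_MASTER_LIST: list of master hosts. If several machines are available, configure at least two master candidates for redundancy.

LSF_TARDIR: directory that holds the installation tar files. It must be the directory containing "lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z" and "lsf10.1_lsfinstall_linux_x86_64.tar"; otherwise the installer fails with: No valid LSF distribution file(s) (.tar.Z or .tar.gz) is found in "/opt/lsf/tardir".

CONFIGURATION_TEMPLATE: configuration template. For IC design workloads, HIGH_THROUGHPUT (the high-throughput template) is recommended.

LSF_ADD_SERVERS: compute (server) hosts to add; this can also be configured after installation.

LSF_ADD_CLIENTS: client (job submission) hosts to add; this can also be configured after installation.

LSF_ENTITLEMENT_FILE: path to the license entitlement file.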
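A quick sanity check of install.config before running the installer can catch missing parameters and typographic quotes, which copy-pasting from a web page can introduce. This is a sketch under assumptions: check_install_config is a hypothetical helper, and the KEY="value" form it expects is simply the style used in these notes.

```shell
# Hypothetical helper: basic sanity checks on an lsfinstall config file.
check_install_config() {
    cfg=$1
    status=0
    # Parameters this walkthrough sets; adjust the list for other setups.
    for key in LSF_TOP LSF_ADMINS LSF_CLUSTER_NAME LSF_MASTER_LIST \
               LSF_ENTITLEMENT_FILE LSF_TARDIR
    do
        if ! grep -q "^${key}=\"[^\"]*\"" "$cfg"; then
            echo "not set as KEY=\"value\": $key"
            status=1
        fi
    done
    # Typographic quotes are not plain ASCII double quotes; print the
    # offending lines with their line numbers.
    if grep -n -e '“' -e '”' -e '″' "$cfg"; then
        echo "typographic quotes found; replace them with plain \""
        status=1
    fi
    return "$status"
}
```

Example call from the installer directory: check_install_config install.config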

Run the installer:

[root@etx1 lsf10.1_lsfinstall]# ./lsfinstall -f install.config

Logging installation sequence in /eda/lsf_10.1_529611/lsf10.1_lsfinstall/Install.log

International Program License Agreement

Part 1 – General Terms

BY DOWNLOADING, INSTALLING, COPYING, ACCESSING, CLICKING ON

AN “ACCEPT” BUTTON, OR OTHERWISE USING THE PROGRAM,

LICENSEE AGREES TO THE TERMS OF THIS AGREEMENT. IF YOU ARE

ACCEPTING THESE TERMS ON BEHALF OF LICENSEE, YOU REPRESENT

AND WARRANT THAT YOU HAVE FULL AUTHORITY TO BIND LICENSEE

TO THESE TERMS. IF YOU DO NOT AGREE TO THESE TERMS,

* DO NOT DOWNLOAD, INSTALL, COPY, ACCESS, CLICK ON AN

“ACCEPT” BUTTON, OR USE THE PROGRAM; AND

* PROMPTLY RETURN THE UNUSED MEDIA, DOCUMENTATION, AND

Press Enter to continue viewing the license agreement, or

enter “1” to accept the agreement, “2” to decline it, “3”

to print it, “4” to read non-IBM terms, or “99” to go back

to the previous screen.

1

LSF pre-installation check …

Checking the LSF TOP directory /eda/ibm …

… Done checking the LSF TOP directory /eda/ibm …

You are installing IBM Spectrum LSF – 10.1 Standard Edition.

Checking LSF Administrators …

LSF administrator(s):       “lsfadmin”

Primary LSF administrator:  “lsfadmin”

Checking the configuration template HIGH_THROUGHPUT …

HIGH_THROUGHPUT will be used as the configuration template.

Done checking configuration template …

Done checking ENABLE_STREAM …

Done checking ENABLE_CGROUP …

Done checking ENABLE_GPU …

[Sat May 11 10:14:35 CST 2024:lsfprechk:WARN_2007]

Hosts defined in LSF_MASTER_LIST must be LSF server hosts. The

following hosts will be added to server hosts automatically: etx1.

Checking the patch history directory  …

Creating /eda/ibm/patch …

… Done checking the patch history directory /eda/ibm/patch …

Checking the patch backup directory …

… Done checking the patch backup directory /eda/ibm/patch/backup …

Searching LSF 10.1 distribution tar files in /eda/lsf_10.1_529611 Please wait …

1) linux2.6-glibc2.3-x86_64

Press 1 or Enter to install this host type: 1

You have chosen the following tar file(s):

lsf10.1_linux2.6-glibc2.3-x86_64

Checking selected tar file(s) …

… Done checking selected tar file(s).

Pre-installation check report saved as text file:

/eda/lsf_10.1_529611/lsf10.1_lsfinstall/prechk.rpt.

… Done LSF pre-installation check.

Installing LSF binary files " lsf10.1_linux2.6-glibc2.3-x86_64"...

Creating /eda/ibm/10.1 …

Copying lsfinstall files to /eda/ibm/10.1/install

Creating /eda/ibm/10.1/install …

Creating /eda/ibm/10.1/install/scripts …

Creating /eda/ibm/10.1/install/instlib …

Creating /eda/ibm/10.1/install/patchlib …

Creating /eda/ibm/10.1/install/lap …

Creating /eda/ibm/10.1/install/conf_tmpl …

… Done copying lsfinstall files to /eda/ibm/10.1/install

Installing linux2.6-glibc2.3-x86_64 …

Please wait, extracting lsf10.1_linux2.6-glibc2.3-x86_64 may take up to a few minutes …

… Adding package information to patch history.

… Done adding package information to patch history.

… Done extracting /eda/lsf_10.1_529611/lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z.

Creating links to LSF commands …

… Done creating links to LSF commands …

Modifying owner, access mode, setuid flag of LSF binary files …

… Done modifying owner, access mode, setuid flag of LSF binary files …

Creating the script file lsf_daemons …

… Done creating the script file lsf_daemons …

… linux2.6-glibc2.3-x86_64 installed successfully under /eda/ibm/10.1.

… Done installing LSF binary files “linux2.6-glibc2.3-x86_64”.

Creating LSF configuration directories and files …

Creating /eda/ibm/work …

Creating /eda/ibm/log …

Creating /eda/ibm/conf …

Creating /eda/ibm/conf/lsbatch …

… Done creating LSF configuration directories and files …

Creating a new cluster “xxxxcluster1” …

Adding entry for cluster xxxxcluster1 to /eda/ibm/conf/lsf.shared.

Installing lsbatch directories and configurations …

Creating /eda/ibm/conf/lsbatch/xxxxcluster1 …

Creating /eda/ibm/conf/lsbatch/xxxxcluster1/configdir …

Added user group “lsfadmins” containing all cluster administrators.

Added host group “master_hosts” containing all master candidate hosts.

Creating /eda/ibm/work/xxxxcluster1 …

Creating /eda/ibm/work/xxxxcluster1/logdir …

Creating /eda/ibm/work/xxxxcluster1/live_confdir …

Creating /eda/ibm/work/xxxxcluster1/lsf_indir …

Creating /eda/ibm/work/xxxxcluster1/lsf_cmddir …

Adding server hosts …

Host(s) “etx1 etx2 etx3” has (have) been added to the cluster “xxxxcluster1”.

Adding LSF_MASTER_LIST in lsf.conf file…

… LSF configuration is done.

… Creating EGO configuration directories and files …

Creating /eda/ibm/conf/ego …

Creating /eda/ibm/conf/ego/xxxxcluster1 …

Creating /eda/ibm/conf/ego/xxxxcluster1/kernel …

Creating /eda/ibm/work/xxxxcluster1/ego …

… Done creating EGO configuration directories and files.

Configuring EGO components…

… EGO configuration is done.

… Creating resource connector configuration directories and files …

Creating /eda/ibm/conf/resource_connector …

Creating /eda/ibm/conf/resource_connector/ego …

Creating /eda/ibm/conf/resource_connector/openstack …

… Done creating resource connector configuration directories and files.

… Finished resource connector configuration.

… LSF inventory tag file is installed.

… LSF entitlement file is installed.

Creating lsf_getting_started.html …

… Done creating lsf_getting_started.html

Creating lsf_quick_admin.html …

… Done creating lsf_quick_admin.html

lsfinstall is done.

To complete your LSF installation and get your

cluster “xxxxcluster1” up and running, follow the steps in

“/eda/lsf_10.1_529611/lsf10.1_lsfinstall/lsf_getting_started.html”.

After setting up your LSF server hosts and verifying

your cluster “xxxxcluster1” is running correctly,

see “/eda/ibm/10.1/lsf_quick_admin.html”

to learn more about your new LSF cluster.

After installation, remember to bring your cluster up to date

by applying the latest fix pack from IBM Fix Central.

https://www.ibm.com/support/fixcentral/

Detailed steps for getting fixes from Fix Central, are in the

LSF installation guide on IBM Knowledge Center.

http://www.ibm.com/support/knowledgecenter/search/fix%20central?scope=SSWRJV

[root@etx1 lsf10.1_lsfinstall]# cd /eda/ibm/

[root@etx1 ibm]# ll

total 8

drwxr-xr-x. 12 root     root  201 May 11 10:15 10.1

drwxr-xr-x.  5 lsfadmin root  237 May 11 10:15 conf

drwxr-xr-x.  2 lsfadmin root    6 May 11 10:15 log

-rw-r--r--.  1 lsfadmin 10007 417 May 27  2016 LSF_redist.txt

drwxr-xr-x.  5 lsfadmin cad    68 May 11 10:14 patch

-rw-r--r--.  1 lsfadmin root  753 May 11 10:15 patch.conf

drwxr-xr-x.  3 lsfadmin root   21 May 11 10:15 properties

drwxr-xr-x.  3 lsfadmin root   26 May 11 10:15 work

Installation complete.

Initial configuration

Edit /eda/ibm/conf/lsf.cluster.xxxxcluster1

Edit /eda/ibm/conf/lsbatch/xxxxcluster1/configdir/lsb.hosts
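These notes do not show what was changed in the two files. In both of them the host entries sit in a block delimited by "Begin Host" and "End Host", so one way to review an edit is to print just that block. A minimal sketch, with show_host_section as a hypothetical helper name:

```shell
# Hypothetical helper: print the "Begin Host ... End Host" block of an
# LSF configuration file so the host entries can be reviewed after
# editing. Other blocks (for example "Begin HostGroup") are not matched.
show_host_section() {
    sed -n '/^[[:space:]]*Begin[[:space:]][[:space:]]*Host[[:space:]]*$/,/^[[:space:]]*End[[:space:]][[:space:]]*Host[[:space:]]*$/p' "$1"
}
```

Example call: show_host_section /eda/ibm/conf/lsf.cluster.xxxxcluster1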

Configure the LSF daemons to start automatically at boot (run this on each new host):

/eda/ibm/10.1/install/hostsetup --top="/eda/ibm/" --boot="y"

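Start LSF (lsfstartup)

On each node, run the script /eda/ibm/conf/lsbatch/xxxxcluster1/configdir/3_start_lsf.sh

Use the lsload and bhosts commands to check that the cluster is healthy

Switch to a regular user and submit a test job with bsub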
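The verification steps can be collected into one function. This is a sketch rather than the author's script: require_cmds and lsf_smoke_test are hypothetical names, and the choice of "bsub -I hostname" (an interactive job that prints the execution host) as the test job is an assumption, since the notes do not say which job was submitted.

```shell
# Hypothetical helper: report which of the given commands are not on PATH.
require_cmds() {
    status=0
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || { echo "not found: $c"; status=1; }
    done
    return "$status"
}

# Sketch of a post-install check. Run it as a regular (non-root) user
# after sourcing /eda/ibm/conf/profile.lsf. It stops at the first failure.
lsf_smoke_test() {
    require_cmds lsid lsload bhosts bsub || return 1
    lsid || return 1      # cluster name and current master host
    lsload || return 1    # load information for each host
    bhosts || return 1    # batch status of each host
    bsub -I hostname      # assumed test job: interactive, prints the host name
}
```

If the LSF commands are not on PATH, lsf_smoke_test prints the missing names and returns non-zero without submitting anything.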

Common maintenance scripts and commands

1_add_node.txt: procedure for adding a new node

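step 1

edit lsf.cluster.hj-lsf (that is, lsf.cluster.<cluster_name>; lsf.cluster.xxxxcluster1 for the cluster installed above) and add the new server host

edit lsb.hosts and add the new host and its IP

step 2

edit /etc/hosts and update the /etc/hosts file on all servers

step 3 (on the master server)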

lsadmin reconfig

badmin mbdrestart

badmin reconfig
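The three commands in step 3 can be wrapped so that the sequence stops at the first failure and can be previewed first. A minimal sketch; run, apply_lsf_reconfig, and the DRY_RUN variable are hypothetical names introduced for illustration, and the command order is the one given in the notes.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of
# executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Apply configuration changes on the master host, in the order given in
# step 3. Stops at the first command that fails.
apply_lsf_reconfig() {
    run lsadmin reconfig &&
    run badmin mbdrestart &&
    run badmin reconfig
}
```

Setting DRY_RUN=1 before calling apply_lsf_reconfig only prints the commands, which is a low-risk way to review the sequence on a production master.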

2_restart_lsf.sh: restart the lim, res, and sbatchd daemons on an LSF node

source /eda/ibm/conf/profile.lsf

lsadmin limrestart

lsadmin resrestart

badmin hrestart

3_start_lsf.sh: start the lim, res, and sbatchd daemons on an LSF node

source /eda/ibm/conf/profile.lsf

lsadmin limstartup

lsadmin resstartup

badmin hstartup

Daemons

[root@etx2 ~]# lsf_daemons status

Show status of the LSF subsystem

lim (pid 12330) is running…

res (pid 12343) is running…

sbatchd (pid 12342) is running…

[root@etx2 ~]# lsf_daemons restart

Stopping the LSF subsystem

Starting the LSF subsystem

[root@etx2 ~]#
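For monitoring, the status output can be checked mechanically. This is a sketch that assumes the output format shown in the transcript above ("name (pid N) is running"); all_daemons_running is a hypothetical helper name.

```shell
# Hypothetical helper: read "lsf_daemons status" output on stdin and
# succeed only if lim, res and sbatchd are all reported as running.
all_daemons_running() {
    status_text=$(cat)
    for d in lim res sbatchd; do
        if ! printf '%s\n' "$status_text" |
            grep -q "^$d (pid [0-9][0-9]*) is running"; then
            echo "not running: $d"
            return 1
        fi
    done
    return 0
}
```

Example call on a node: lsf_daemons status | all_daemons_running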
