Job Management Systems

Author: Anand Vaidya
Why use SGE?
Maintain order in a shared resource, like queueing up
at a movie ticket counter rather than mobbing the counter
Apply different usage policies: PhDs and Profs get
better treatment than first year grads
Everyone gets a fair share of the computing resources
What is SGE?
SGE is a distributed resource management system.
Provides users the means to submit
computationally demanding tasks to the
SGE system for transparent distribution of
the associated workload.
How does SGE work?
Users submit jobs to the Grid Engine.
Unless resources are immediately available,
non-interactive jobs are kept in queues
until resources to execute them become available.
Jobs are passed on to the available
execution hosts.
Records of each job's progress through the
system are kept and reported when requested.
SGE Components
Master (coordinates activities, holds queues)
Execution (workers)
Administration (sets up system, queues etc.)
Submit (users can submit jobs from these)
Usually the master and admin host are the same
Queues (defined by the administrator)
User and Administrator Commands
Daemons: sge_qmaster (Master Daemon),
sge_schedd (Scheduler Daemon), sge_execd
(Execution Daemon) and sge_commd
(Communication Daemon)
SGE Commands - qhost
What is the state of the cluster? How many nodes,
type, load? What is my chance of getting a node?
[root@shark ~]# qhost
HOSTNAME   ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
global     -           -     -     -       -       -       -
shark-c00  lx24-amd64  2     2.02  3.=G    240.>M  4.0G    0.0
shark-c02  lx24-amd64  2     2.00  3.=G    214.=M  4.0G    0.0
shark-c03  lx24-amd64  2     1.?6  3.=G    215.=M  4.0G    0.0
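The qhost columns lend themselves to simple filtering when looking for a lightly loaded node. The following is a minimal sketch, assuming the column layout shown above (LOAD in the fourth column). The helper name `idle_hosts`, the threshold, and the sample rows are invented for illustration and are not real cluster output.

```shell
#!/bin/sh
# idle_hosts THRESHOLD
# Read qhost-style output on stdin and print the hosts whose load is
# below THRESHOLD. Hypothetical helper, not part of SGE.
idle_hosts() {
    # Only rows whose 4th column is a plain number are considered, which
    # skips the header, any separator line, and the "global" pseudo-host.
    awk -v max="$1" '$4 ~ /^[0-9]+(\.[0-9]+)?$/ && $4 + 0 < max + 0 { print $1 }'
}

# Invented sample rows, laid out like the qhost table above.
sample='HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
global - - - - - - -
node-a lx24-amd64 2 2.02 4.0G 240.0M 4.0G 0.0
node-b lx24-amd64 2 0.10 4.0G 214.0M 4.0G 0.0'

printf '%s\n' "$sample" | idle_hosts 1.0
```

On a machine with SGE installed, the same helper could be fed live data with `qhost | idle_hosts 1.0`.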
SGE Commands - qsub
Create a jobscript (myjob.sh)
Submit for execution
$ qsub myjob.sh
Your job ?42 ("myjob.sh") has been submitted.
Simplest Job:
[vaidya@shark ~]$ cat myjob.sh
sleep 10
date > /tmp/test1.out.txt
Variations: qsub -cwd myjob.sh
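The create-and-submit steps can be scripted. This is a minimal sketch that writes the simplest job script from this slide into a scratch directory and submits it only if `qsub` is on the PATH. The scratch directory and the fallback message are assumptions made for illustration.

```shell
#!/bin/sh
# Write the simplest job script into a fresh scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/myjob.sh" <<'EOF'
sleep 10
date > /tmp/test1.out.txt
EOF

# Submit only when qsub is available; otherwise just report where the
# script was left so it can be submitted by hand later.
if command -v qsub >/dev/null 2>&1; then
    (cd "$workdir" && qsub -cwd myjob.sh)
else
    echo "qsub not found; job script left in $workdir/myjob.sh"
fi
```

Running the submission from inside the scratch directory with `-cwd` keeps the job's output files next to the script.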
(C) Anand Vaidya anand@novaglobal.com.sg
SGE Commands - qstat
Check the status of your job:
qstat
qstat -f
qstat -u username
qstat -j job_id
[root@shark ~]# qstat
job-ID  prior    name     user   state  submit/start at      queue            slots  ja-task-ID
63=     0.55500  HCPDIV?  test1  r      05/1?/2006 10:16:31  all.q@shark-c00
65>     0.55500  HCPDIV1  test1  r      05/1?/2006 13:3?:35  all.q@shark-c00
6=4     0.55500  MCCDVI   test1  r      05/1?/2006 23:52:1=  all.q@shark-c02
6=5     0.55500  MCCDVI1  test1  r      05/1?/2006 23:52:1=  all.q@shark-c02
SGE Commands - qstat
The status of the job is indicated by letters:
qw - waiting       t - transferring
r - running        s,S - suspended
R - restarted      T - threshold
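The state letters can be used to summarise a queue at a glance. This is a minimal sketch, assuming the state is the fifth column of qstat output and covering only the single codes from the legend above (real states may combine letters). The function names and sample rows are invented for illustration.

```shell
#!/bin/sh
# describe_state CODE
# Map one state code from the legend to its description.
# Hypothetical helper; anything not in the legend is reported as "other".
describe_state() {
    case "$1" in
        qw)  echo "waiting" ;;
        t)   echo "transferring" ;;
        r)   echo "running" ;;
        s|S) echo "suspended" ;;
        R)   echo "restarted" ;;
        T)   echo "threshold" ;;
        *)   echo "other" ;;
    esac
}

# summarise_states
# Count jobs per state in qstat-style output read from stdin.
summarise_states() {
    # Job rows start with a numeric job ID; column 5 holds the state.
    awk '$1 ~ /^[0-9]+$/ { n[$5]++ } END { for (s in n) print s, n[s] }' |
    sort |
    while read -r state count; do
        echo "$count $(describe_state "$state")"
    done
}

# Invented sample rows in the qstat column layout.
sample='job-ID prior name user state submit/start at queue slots ja-task-ID
101 0.55500 jobA test1 r 01/01/2006 10:00:00 all.q@node-a 1
102 0.55500 jobB test1 r 01/01/2006 10:05:00 all.q@node-b 1
103 0.55500 jobC test1 qw 01/01/2006 10:06:00 1'

printf '%s\n' "$sample" | summarise_states
```

With a live cluster the input would come from `qstat | summarise_states` instead of the sample text.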
SGE Commands - qdel
Delete your job, if you wish
qdel ?43
vaidya has deleted job ?43
SGE Commands - qmon
qmon is an X-Windows GUI tool to
submit/delete/view jobs and configure the SGE system
Example: Submit a job using qmon
Click the Job Submission icon.
Click the Job Script file selection icon to open a file selection box
and select your script file. Then, click OK.
Click the Submit button at the bottom of the Job Submission
dialog.
After a couple of seconds, you should be able to monitor your
job in the Job Control dialog. Click the Job Control icon in the
QMON control panel.
You first see it under Pending Jobs, and it quickly moves to
Running Jobs after it gets started.
SGE Commands - qsh, qtcsh
Submit an interactive session request:
Ensure you have a valid X server running on your
desktop, and allow remote X clients to display on it.
Note: using this feature needs additional configuration; it may
not work otherwise.
SGE Commands - Jobscript
Sample job script:
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
sleep 10
SGE Commands - Jobscript
Sample job script:
#$ -cwd
#$ -j y
#$ -S /bin/bash
$MPI_DIR/mpirun -np $NSLOTS -machinefile
$TMPDIR/machines myparallelprog.exe <infile.txt outfile.txt>
SGE Commands - Jobscript
-cwd = change to current dir before running job
-j y = merge error with stdout
-r y = code is re-runnable
-N jname = set the job name
-l h_rt=00:30:00 = run job for max of 30 mins
-pe mpich = invoke parallel environment
-pe mpich-ib = use infiniband parallel environment
-pe mpich-eth = use ethernet parallel env
-V = carry all env variable settings
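The options above can be combined into a reusable template. This is a minimal sketch of a generator that prints a job script with a job name and a run-time limit filled in. The helper name `make_jobscript` and its argument order are assumptions made for illustration; only the `#$` directives come from the list above.

```shell
#!/bin/sh
# make_jobscript NAME RUNTIME COMMAND...
# Print a job script on stdout using options from the list above.
# Hypothetical helper, not an SGE tool.
make_jobscript() {
    name="$1"
    runtime="$2"
    shift 2
    # Each argument to printf becomes one line of the generated script.
    printf '%s\n' \
        '#!/bin/bash' \
        '#$ -cwd' \
        '#$ -j y' \
        '#$ -S /bin/bash' \
        "#\$ -N $name" \
        "#\$ -l h_rt=$runtime" \
        "$*"
}

make_jobscript demo 00:30:00 sleep 10
```

On a system with SGE, the output could be saved and submitted with `make_jobscript demo 00:30:00 sleep 10 > demo.sh && qsub demo.sh`.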
Admin Commands
Next few slides show commands useful for SGE
admins (not users/researchers)
SGE Commands - qconf
complexes: qconf -sc
queues: qconf -sql
PE: qconf -spl
exec host: qconf -sel, qconf -se c35
submit hosts: qconf -ss
admin hosts: qconf -sh
list calendars: qconf -scall
configuration: qconf -sconf
user list: qconf -suserl
Scheduler conf: qconf -ssconf
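For a quick snapshot of a cluster's setup, the read-only queries above can be run in one pass. This is a minimal sketch. The function name `show_sge_config` is invented for illustration, the per-host query (`-se`) is left out because it needs a host name, and the loop only reports what it would run when `qconf` is not installed.

```shell
#!/bin/sh
# show_sge_config
# Run each argument-free "show" query from the list above in turn.
# Hypothetical helper, not part of SGE.
show_sge_config() {
    for flag in -sc -sql -spl -sel -ss -sh -scall -sconf -suserl -ssconf; do
        echo "== qconf $flag =="
        if command -v qconf >/dev/null 2>&1; then
            qconf "$flag"
        else
            echo "(qconf not found; skipped)"
        fi
    done
}

show_sge_config
```

Redirecting the output to a dated file gives a simple record of the configuration before making changes.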
SGE Commands - qping
[anand@shark-c02 ~]$ qping -info shark-c01 537 execd 1
05/24/200 21:57:34:
SIRM version: 0.1
SIRM message id: 1
start time: 05/24/200 21:31:37
run time [s]: 17+
messages in read buffer: 0
messages in write buffer: 0
nr. of connected clients: 2
status: 0
info: dispatcher: R (0.04) | ok
Monitor: disabled
LSF Commands
bsub submit a job
bstop suspend a job
bresume resume a suspended task
btop move job to top
bswitch move jobs between queues
lsgrun run a task on a set of hosts
bkill kill a job
LSF Commands
lsmon monitor load, resource
lsid show LSF details (version etc.)
lshosts show hosts & static info
lsload show load info for hosts
lsinfo show LSF config info
busers show user info
bacct show acct info on finished jobs
bjobs show info on jobs
bpeek show stdout/stderr of unfinished jobs
Acknowledgements & Copying
This material is based on my experience as well as material
collected from SGE documentation.
This presentation can be redistributed as follows:
No commercial re-distribution: e.g., as part of a for-profit CDROM
or as part of your sales pitch. Seek my permission first.
Must attribute the document creator.
Share alike: If you use this document and enhance or modify it,
share the modifications or the modified document.
Which means I apply: Creative Commons License,
The End
Thanks for your time. If you have any feedback, corrections or
questions please contact me: Anand Vaidya,
This document was created with OpenOffice on Linux. Email me if
you want the odp file instead of the pdf.