
AUTOMATIC HIGHLIGHTS EXTRACTION IN CRICKET

CS365A - ARTIFICIAL INTELLIGENCE


IIT KANPUR
2013-14


Anjani Kumar (11101)
Sumedh Masulkar (11736)
{anjanik, sumedh}@iitk.ac.in

Guided by:
Dr. Amitabha Mukerjee
amit@cse.iitk.ac.in

April 23, 2014





















ABSTRACT

This project deals with an important branch of artificial intelligence: vision. It presents ways
to detect the important segments of a cricket video and thus extract highlights automatically
from a full-length video. Similar techniques can be applied to create highlights of other sports
too, reducing the large amount of effort people put into creating game highlights. In this
project, we specifically aim to produce better and more reliable results using supervised
learning. The extraction process is carried out at multiple levels, removing some unnecessary
part of the video at each level. We achieve promising results, which are significant improvements
over those of the unsupervised methods prevalent for this problem.







ACKNOWLEDGEMENTS

We would like to express our sincere gratitude to our instructor, Dr. Amitabha Mukerjee,
Department of Computer Science and Engineering, IIT Kanpur, for his invaluable support and
guidance throughout the project. We consider ourselves very fortunate to have worked under his
supervision. We are also grateful to Mr. MS Ram and Miss Sunakshi Gupta for their suggestions,
and to Mr. Dipen Raghuwani for the dataset he provided.
Lastly, we thank all our friends and colleagues for their support and assistance in the course
of the project.




Coming Up

1. Introduction

2. Background

3. Our Approach
   1. Hierarchy of Extraction
   2. Level 1 (Difficulty)
   3. Level 2 (Difficulty and solutions)
   4. Level 3 (Difficulty and solution)
   5. Level 4a (Another approach)
   6. Level 4b (Difficulties)
   7. Level 5

4. Results

5. Future Work

6. Conclusion

7. Bibliography

8. Disclaimer

Introduction

Highlights extraction from sports video is a popular topic. Full-length sports videos contain
uninteresting events too, and in today's time-compressed world, people only want to view the
interesting sequences of a match. Cricket is the second most watched game in the world. Our
interest in cricket was another motivating factor for taking this up as a project.
What do people not expect in cricket highlights?
Uninteresting parts of the game, replays, and spectators cheering. So, we aim to automatically
detect the frames belonging to these sequences and remove them, thus producing highlights of the
match. Similar approaches can be used to extract highlights in other sports too.
What else is expected from the project?
The detection of events must be accurate enough that it neither misses important events nor
fails to remove the uninteresting parts.

Background

Automatic generation of cricket highlights using a Hidden Markov Model (HMM) was proposed in
[1][2][3]. [3] fused audio information in addition to motion information. In [4], the authors
proposed an unsupervised event discovery and detection framework using color histograms (CH) or
histograms of oriented gradients (HOG), which can potentially be generalized to different
sports. The unigram and bigram statistics of detected events are then used to provide a compact
representation of the video. [5] presented another novel approach to highlights generation for
sports videos by extracting events and semantic concepts. The method extracts an event sequence
from the video and classifies each sequence into a concept by sequential association mining.
The extracted concepts and events are then selected according to their degree of importance.
This was further improved in [6].

[6] presented a hierarchical framework and effective algorithms for cricket event detection and
classification, which avoids shot detection and clustering. Extraction was divided into multiple
levels (described in our approach below).

[7] again used shot detection techniques, together with text processing on the commentary, to
identify the action in each ball.

Our Approach

We are primarily going to follow the work done in [6].
Hierarchy of Extraction

(Figure: hierarchy of extraction. Image from [6].)

As can be seen in the diagram, there are 5 levels in the hierarchy.
Level 1 (Excitement detection): A video frame is considered an excitation frame if the product
of its audio excitement and zero-crossing rate (ZCR) exceeds a certain threshold.
Level 2 (Replay detection): A replay segment is sandwiched between two logo transitions. Hence,
replays can be detected using Hue Histogram Difference (HHD) and removed.
Level 3 (Field view detection): Dominant Grass Pixel Ratio (DGPR) is calculated for a view,
which varies between 0.16 and 0.24 for the field view. Thus, a non-field view can be removed.
Level 4 (Field view and close-up detection): The percentage of field pixels in regions is
calculated and some thresholds are fixed; a frame can then be classified as long view, boundary
view or pitch view. Similarly, edge pixels are used to detect close-up views or crowd views.
Level 5 (Fielders gathering or crowd detection): Crowd frames are removed from the video. The
detection is done by computing the histogram distance between the hue histograms of frames.
Thus, highlights for the given video are extracted.

Level 1: Excitement Detection
Level 1 of the hierarchy involves:
Analysis of spectators' cheering and commentators' speech.
Two popular content analysis techniques: short-time audio energy (E) and short-time
zero-crossing rate (Z) [6].
If E*Z is greater than a given threshold (some function of the mean), the frame is an
excitation frame; otherwise it is not.
All frames marked unexcited may be removed from the sequence.
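
The E*Z test above can be sketched in a few lines of Python. This is a minimal sketch, not the Matlab code used in the project: the function names, the non-overlapping windows, and the choice of threshold as a multiple (`factor`) of the mean score are all assumptions, since the text only states that the threshold is some function of the mean.

```python
def short_time_energy(window):
    """Mean squared amplitude of one audio window."""
    return sum(s * s for s in window) / len(window)


def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(window, window[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(window) - 1)


def excitation_flags(samples, window_size, factor=1.0):
    """Flag each window whose E*Z exceeds factor * mean(E*Z).

    Assumption: the threshold is a multiple of the mean score over the
    whole signal. Windows do not overlap and need at least 2 samples.
    """
    windows = [
        samples[i:i + window_size]
        for i in range(0, len(samples) - window_size + 1, window_size)
    ]
    scores = [short_time_energy(w) * zero_crossing_rate(w) for w in windows]
    threshold = factor * sum(scores) / len(scores)
    return [score > threshold for score in scores]
```

Because the threshold depends on the mean over the whole signal, running this on short excerpts changes the threshold, which is the problem described in the difficulty below. Mapping flagged audio windows back to video frames depends on the audio sample rate and the video frame rate, which this sketch does not cover.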

Difficulty: Matlab ran out of memory when we tried to read a file larger than 23 seconds, so we
could not test this on the cricket video we were using. Testing on smaller parts of the video
would have given us a wrong mean, and hence incorrect results. However, we tested it on a
different file (so.wav, available with the code), on which it seemed to work fine.

Level 2: Replay Detection
Characteristics of action replays that we tried to exploit for detection:
A replay is sandwiched between two logo (template) transitions, and the score bar is removed.

We can detect the logo transitions (and thus the replay) by calculating the hue histogram
difference between frames and the reference logo template.
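
As an illustration, the hue histogram difference between a frame and the logo template could be computed as follows. This is a hedged sketch: the 16-bin histogram, the sum of absolute bin differences as the distance, and the function names are assumptions, not details taken from [6] or from our Matlab code.

```python
import colorsys


def hue_histogram(pixels, bins=16):
    """Count pixels per hue bin; `pixels` holds (r, g, b) values in 0..255."""
    hist = [0] * bins
    for r, g, b in pixels:
        hue, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # hue lies in [0, 1]; clamp so that hue == 1.0 falls in the last bin
        hist[min(int(hue * bins), bins - 1)] += 1
    return hist


def hue_histogram_difference(frame_pixels, template_pixels, bins=16):
    """Sum of absolute bin-wise differences between two hue histograms."""
    h_frame = hue_histogram(frame_pixels, bins)
    h_template = hue_histogram(template_pixels, bins)
    return sum(abs(a - b) for a, b in zip(h_frame, h_template))
```

A frame would be treated as a logo transition when this difference falls below a threshold; the frame and the template should contain the same number of pixels so that the counts are comparable.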



Problems with the approach:
The threshold used to classify a frame as a logo transition may differ from logo to logo, and
thus from match to match. Setting it at 1/4th of the mean does not give very promising results:
there are no false positives, but there are many false negatives.
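Our proposed solution:
Method 1. We propose a solution for replay detection which is less dependent on the specific
match. We calculate r, the correlation coefficient (corr2 in Matlab), of the frames with the
reference logo:

$$ r = \frac{\sum_m \sum_n (A_{mn} - \bar{A})(B_{mn} - \bar{B})}{\sqrt{\left(\sum_m \sum_n (A_{mn} - \bar{A})^2\right)\left(\sum_m \sum_n (B_{mn} - \bar{B})^2\right)}} $$

where \bar{A} = mean2(A), and \bar{B} = mean2(B).

corr2 calculates the degree of similarity of two images. Its value is 1 for identical images and
close to 0 for two very different images (the coefficient ranges from -1 to 1). Thus, applying
corr2 to a replay template and a frame, if the value of the correlation coefficient is greater
than some threshold, we can classify the frame as part of a logo transition. Experimentally, we
found the best results with a threshold of 0.65. This method is also much faster than the
approach in [6].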
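
For readers without Matlab, the correlation test can be sketched in plain Python. `mean2` and `corr2` below mirror the Matlab functions of the same names for grayscale images stored as lists of rows; `is_logo_transition` is a hypothetical helper name, and only the 0.65 threshold comes from our experiments.

```python
import math


def mean2(image):
    """Mean of all entries of a 2-D list (analogue of Matlab's mean2)."""
    values = [v for row in image for v in row]
    return sum(values) / len(values)


def corr2(a, b):
    """2-D correlation coefficient of two equally sized grayscale images.

    Raises ZeroDivisionError if either image is constant, because the
    denominator is then zero.
    """
    mean_a, mean_b = mean2(a), mean2(b)
    num = sum_aa = sum_bb = 0.0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            da, db = va - mean_a, vb - mean_b
            num += da * db
            sum_aa += da * da
            sum_bb += db * db
    return num / math.sqrt(sum_aa * sum_bb)


def is_logo_transition(frame, logo, threshold=0.65):
    """Hypothetical helper: True when the frame correlates with the logo."""
    return corr2(frame, logo) > threshold
```

The frame and the logo template must have the same dimensions, so in practice the frame (or the region where the logo appears) has to be resized or cropped to the template size first.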
Method 2 (not in the poster). Another method which shows better results is to simply detect the
score bar, as also suggested in [8]. If a frame does not contain the score bar, then it is a
replay frame. Detecting the score bar is easy and efficient. We take a reference score bar;
then, to detect whether a frame contains the score bar, we take the part of the frame where the
score bar is supposed to be. For our dataset, it was at the bottom 6/7th of the image. Then, we
calculate the hue histogram difference of this part with the reference score bar. A threshold of
8000 gave promising results, with accuracy > 97%.

Level 3: Field View Detection
Dominant Grass Pixel Ratio (DGPR) is used to classify frames.
DGPR = x_g / x, where x_g is the number of grass pixels and x is the total number of pixels.
For a field view, the DGPR value is greater than 0.07, whereas DGPR is smaller for non-field
views.
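
A minimal sketch of the DGPR test is given below. The text does not state how a grass pixel is identified, so the green hue band and the saturation floor in `is_grass` are purely illustrative assumptions, as are the function names; only the 0.07 threshold comes from the description above.

```python
import colorsys


def is_grass(rgb, hue_range=(0.20, 0.45), min_saturation=0.25):
    """Hypothetical grass test: hue in a green band and not too gray."""
    r, g, b = (c / 255.0 for c in rgb)
    hue, sat, _ = colorsys.rgb_to_hsv(r, g, b)
    return hue_range[0] <= hue <= hue_range[1] and sat >= min_saturation


def dgpr(pixels):
    """Dominant Grass Pixel Ratio: grass pixels divided by all pixels."""
    pixels = list(pixels)
    grass = sum(1 for p in pixels if is_grass(p))
    return grass / len(pixels)


def is_field_view(pixels, threshold=0.07):
    """Classify a frame as field view when its DGPR exceeds the threshold."""
    return dgpr(pixels) > threshold
```

Any fixed hue band of this kind has to be retuned when the grass color changes, which is the weakness discussed next.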
Problems with the approach:
The threshold may vary from match to match, since the color of the grass may vary. It is really
difficult to calculate the threshold experimentally for every match.

Our proposed solution:
We propose to use supervised learning. We train an SVM on some training images of field view,
and then use the trained SVM to classify images as field view or non-field view.

Level 4a: Field View Classification
Frames are classified as pitch view, long view or boundary view.
Introduces the concept of the flux tensor: temporal variations of the optical flow field within
the local 3D spatiotemporal volume.
The percentage of field pixels is used to differentiate between views.
Algorithm: see [6].

Problems with the approach:


The threshold may vary from match to match, since the color of the grass may vary. It is really
difficult to calculate the threshold experimentally for every match. Moreover, if a frame is not
classified as long view or boundary view because of a small deviation from the threshold, it is
incorrectly classified as pitch view, which gives a lot of false positives for pitch view, as
visible in the results.
Our proposed solution:
We propose to use supervised learning. We train an SVM on some training images of field view,
and then use the trained SVM to classify images as long view, boundary view or pitch view.

Level 4b: Non-Field View Classification
Frames are classified as close-up view or crowd view.
The RGB image is converted to YCbCr.
The percentage of edge pixels (EP) is calculated using the Canny operator.
A threshold on EP classifies frames as close-up view or crowd view.
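
The edge pixel percentage can be illustrated with the sketch below. To stay self-contained it does not implement the Canny operator; a thresholded central-difference gradient on the luma (Y) channel stands in for it, and `grad_threshold` is an arbitrary illustrative value. The function names are likewise assumptions.

```python
def luma(rgb):
    """Y component of YCbCr (ITU-R BT.601 weights) for an (r, g, b) pixel."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def edge_pixel_percentage(image, grad_threshold=50.0):
    """Percentage of interior pixels whose luma gradient is large.

    `image` is a list of rows of (r, g, b) tuples and must be at least
    3x3. Central differences are a simple stand-in for Canny here.
    """
    y = [[luma(p) for p in row] for row in image]
    height, width = len(y), len(y[0])
    edges = total = 0
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            gx = y[i][j + 1] - y[i][j - 1]
            gy = y[i + 1][j] - y[i - 1][j]
            total += 1
            if (gx * gx + gy * gy) ** 0.5 > grad_threshold:
                edges += 1
    return 100.0 * edges / total
```

The resulting percentage would then be compared against the EP threshold to separate close-up views from crowd views.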

Note: We also tried an SVM for this level. It gave far better results for crowd detection, but
not for close-ups. Images of the following kind, which are manually tagged as close-up frames,
are always classified as crowd view. It is important to realize that the solution to this
problem is not trivial. This is the only reason that the supervised and unsupervised methods
both gave accuracies below 85-90% for level 4b.

Level 5: Crowd Classification
5b: Crowd classification into spectators or fielders gathering.
Fielders usually gather after an interesting event, have the field as background, and wear
similarly colored clothes. These frames should be kept in the highlights.

(Figures: fielders gathering; spectators view)

Results

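Important:
$$ \text{Precision} = \frac{tp}{tp + fp} \quad (1) $$
$$ \text{Recall} = \frac{tp}{tp + fn} \quad (2) $$
* From Wikipedia.
Here tp = true positives, fp = false positives, fn = false negatives.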
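
The two measures can be computed directly from the counts defined above. The sketch below is only an illustration; the counts in the usage lines are made up and are not results from our experiments.

```python
def precision(tp, fp):
    """Fraction of frames labelled positive that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of truly positive frames that were labelled positive."""
    return tp / (tp + fn)


# Illustrative counts only: 90 correct detections, 10 false alarms, 30 misses.
print(precision(90, 10))  # 0.9
print(recall(90, 30))     # 0.75
```

A detector with no false positives therefore has 100% precision regardless of how many true events it misses; the misses show up only in recall.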

Experimental Results:
Level 2: Replay Detection
Using a threshold of 0.65 for corr2, we observed the following values of precision and recall
with our method and with the approach suggested in [6].
The precision and recall for [6] are based on code we developed, tested on our dataset.

            Replay detection    Replay detection
            (our approach)      (approach in [6])
Precision   100%                100%
Recall      96.77%              53.33%

Level 2: Scorebar Detection [7] (added after poster presentation)
We observed the following recall and precision for detecting frames with no score bar. The
testing was done on over 20000 frames.
Precision 96.99%
Recall 100%

Classification into Field views and Non-field views:

After training the SVM on 4254 images and testing on 4176 images, we observed the following
precision and recall:
Precision 96.48%
Recall 88.03%
as compared to the following figures, which were produced using the approach proposed by [6] on
the same images:
Precision 72.70%
Recall 78.63%

All the following tests were performed on over 4000 images, after training on approximately the
same number of images, unless mentioned otherwise.

Classification of Field view into pitch, long, boundary views:

            Pitch view           Long view            Boundary view
            Ours      [6]        Ours      [6]        Ours      [6]
Precision   98.21%    20.66%     96.11%    63.82%     97.56%    8.47%
Recall      95.21%    69.18%     96.60%    28.03%     93.69%    27.5%

(Ours = our method; [6] = approach in [6].)

Classification of Non-Field view into Crowd view, Close-up views:

            Crowd view           Close-up view
            Ours      [6]        Ours      [6]
Precision   94.29%    44.66%     82.42%    98.58%
Recall      98.54%    93.70%     52.71%    79.11%

(Ours = our method; [6] = approach in [6].)

Classification of crowd view into fielders gathering, spectators crowd:

After training the SVM on 2444 images and testing on 1184 images, we observed the following
precision and recall:
Precision 100%
Recall 99.42%

* Please see disclaimer.
Future Work

In future, we can work on improving the accuracy of event detection at the different levels even
further, so as to extract highlights without much human intervention. Anything less than perfect
may miss some interesting events, and may instead add other sequences which are not interesting
to the viewer. This would be the reason manual extraction would be preferred over automatic
extraction of sports highlights. Thus, future work includes improving the results by finding
better alternatives, and finding solutions to open problems such as close-up detection in
level 4b.

Conclusion

We applied supervised methods to perform event detection in cricket. The results observed were
significantly better than those of the unsupervised approach. We tested our approaches for each
level on a dataset of more than 4000 frames. For the different levels in the extraction
hierarchy, we achieved ~95% for almost all levels. This seems very promising for automatic
highlights extraction in sports video. Hence, we achieved our goal.

Bibliography

[1] Kamesh Namuduri. Automatic extraction of highlights from a cricket video using MPEG-7
descriptors.
[2] Jinjun Wang, Changsheng Xu, Engsiong Chng, Qi Tian. Sports Highlight Detection from Keyword
Sequences Using HMM, in Proceedings of the International Conference on Multimedia and Expo, 2004.
[3] Chih-Chieh Cheng, Chiou-Ting Hsu. Fusion of Audio and Motion Information on HMM-Based
Highlight Extraction for Baseball Games, in IEEE Transactions on Multimedia, vol. 8, no. 3,
June 2006.
[4] Hao Tang, Vivek Kwatra, Mehmet Emre Sargin, Ullas Gargi. Detecting Highlights in Sports
Videos: Cricket as a test case, 2011.
[5] Maheshkumar H. Kolekar, Somnath Sengupta. Semantic concept mining in cricket videos for
automated highlight generation, 2009.
[6] M. H. Kolekar, K. Palaniappan, S. Sengupta. Semantic Event Detection and Classification in
Cricket Video Sequence, in Proceedings of the Indian Conference on Computer Vision, Graphics &
Image Processing, 2008.
[7] Dipen Rughwani. Shot Classification and Semantic Query Processing on Broadcast Cricket
Videos. http://cse.iitk.ac.in/~vision/dipen/.
[8] N. Harikrishna, Sanjeev Satheesh, S. Dinesh Sriram, K. S. Easwarakumar. Temporal
Classification of Events in Cricket Videos, 2011.


Disclaimer

1. All the results shown for the approach in [6] are based on code we developed using the
suggested approach, and on our dataset.
2. The dataset consisted of frames (>37000) from the first 10 overs of an Australia-Sri Lanka
match, provided by Dipen Rughwani [7].
3. All the code was developed by us and not taken from anywhere, except the code for Level 1
(audio analysis): the zero-crossing rate and short-time energy code was taken from the Matlab
website, but the testing code was again written by us.
http://www.mathworks.in/matlabcentral/fileexchange/23571-short-time-energy-and-zero-crossing-rate

Note:

All code used for testing purposes is available here:
http://home.iitk.ac.in/~sumedh/cs365/project/code.zip
For other details, refer to
http://home.iitk.ac.in/~sumedh/cs365/project/
