Market basket analysis: identifying products and content that go well together
Affinity analysis (http://en.wikipedia.org/wiki/Affinity_analysis) and association rule learning (http://en.wikipedia.org/wiki/Association_rule_learning) encompass a broad set of analytics techniques aimed at uncovering the associations and connections between specific objects: these might be visitors to your website (customers or audience), products in your store, or content items on your media site. Of these, market basket analysis is perhaps the most famous example. In a market basket analysis, you look to see if there are combinations of products that frequently co-occur in transactions. For example, maybe people who buy flour and caster sugar also tend to buy eggs (because a high proportion of them are planning on baking a cake). A retailer can use this information to inform:
Store layout (put products that co-occur together close to one another, to improve the customer shopping experience)
Marketing (e.g. target customers who buy flour with offers on eggs, to encourage them to spend more on their shopping basket)
Online retailers and publishers can use this type of analysis to:
Inform the placement of content items on their media sites, or products in their catalogue
Drive recommendation engines (like Amazon's "customers who bought this product also bought these products")
Deliver targeted marketing (e.g. emailing customers who bought specific products with other products, and offers on those products, that are likely to be interesting to them)
There is a wide range of algorithms, available on a wide variety of platforms, for performing market basket analysis. In this introductory recipe, we will cover:
1. Market basket analysis: the basics
2. Performing market basket analysis using the apriori algorithm with R and the arules package
3. Managing large result sets: visualizing rules using the arulesViz package
4. Interpreting the results: using the analysis to drive business decision-making
5. Expanding on the analysis: zooming out from the basket to look at customer behavior over longer periods and different events
1. Market basket analysis: the basics
Terminology
Items are the objects that we are identifying associations between. For an online retailer, each item is a product in the shop. For a publisher, each item might be an article, a blog post, a video etc. A group of items is an itemset.
$I = \{i_1, i_2, \ldots, i_n\}$
Transactions are instances of groups of items co-occurring together. For an online retailer, a transaction is, generally, just that: a purchase transaction. For a publisher, a transaction might be the group of articles read in a single visit to the website. (It is up to the analyst to define over what period to measure a transaction.) For each transaction, then, we have an itemset:
$t_n = \{i_i, i_j, \ldots, i_k\}$
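Rules are statements of the form $\{i_1, i_2, \ldots\} \Rightarrow \{i_k\}$: if a transaction contains the items on the left-hand side (LHS) of the rule, it is likely also to contain the item(s) on the right-hand side (RHS).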
The output of a market basket analysis is generally a set of rules that we can then exploit to make business decisions (related to marketing or product placement, for example).
http://snowplowanalytics.com/guides/recipes/cataloganalytics/marketbasketanalysisidentifyingproductsthatsellwelltogether.html 1/9
3/21/2017 Market basket analysis identifying products and content that go well together Snowplow
The support of an item or itemset is the fraction of transactions in our data set that contain that item or itemset. In general, it is nice to identify rules that have a high support, as these will be applicable to a large number of transactions. For supermarket retailers, this is likely to involve basic products that are popular across an entire user base (e.g. bread, milk). A printer cartridge retailer, by contrast, may not have products with a high support, because each customer only buys cartridges that are specific to his/her own printer.
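As a minimal illustration of this definition, the base-R sketch below represents each transaction as a character vector. It computes support as the share of transactions that contain every item of an itemset. The baskets and the `support()` helper are hypothetical, made up for this example. In practice arules computes these quantities for you.

```r
# Hypothetical toy baskets, one character vector per transaction.
transactions <- list(
  c("flour", "caster sugar", "eggs"),
  c("flour", "eggs"),
  c("bread", "milk"),
  c("flour", "caster sugar", "eggs", "milk"),
  c("bread")
)

# Hypothetical helper: fraction of transactions that contain every item in `itemset`.
support <- function(itemset, transactions) {
  contains <- vapply(transactions, function(t) all(itemset %in% t), logical(1))
  mean(contains)
}

support("eggs", transactions)                      # 0.6 (3 of 5 baskets)
support(c("flour", "caster sugar"), transactions)  # 0.4 (2 of 5 baskets)
```

Note that the empty itemset is contained in every transaction, so its support is 1.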
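The confidence of a rule is the likelihood that it is true for a new transaction that contains the items on the LHS of the rule. (I.e. it is the probability that the transaction also contains the item(s) on the RHS.) Formally, for a rule with LHS itemset $X$ and RHS itemset $Y$:
$\mathrm{conf}(X \Rightarrow Y) = \dfrac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$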
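The lift of a rule is the support of the items on the LHS of the rule co-occurring with the items on the RHS, divided by the probability that the LHS and RHS would co-occur if the two were independent:
$\mathrm{lift}(X \Rightarrow Y) = \dfrac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}$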
If the lift is greater than 1, it suggests that the presence of the items on the LHS has increased the probability that the items on the RHS will occur in this transaction. If the lift is below 1, it suggests that the presence of the items on the LHS makes the probability that the items on the RHS will be part of the transaction lower. If the lift is 1, it suggests that the items on the LHS and RHS really are independent: knowing that the items on the LHS are present makes no difference to the probability that the items on the RHS will occur.
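Confidence and lift can both be computed from support alone. Confidence is supp(LHS ∪ RHS) / supp(LHS), and lift additionally divides by supp(RHS). The sketch below uses made-up baskets and hypothetical helper functions (`support()`, `confidence()`, `lift()`), which are not part of arules. It shows one rule with lift above 1 and one with lift below 1.

```r
# Hypothetical toy baskets, one character vector per transaction.
transactions <- list(
  c("flour", "caster sugar", "eggs"),
  c("flour", "eggs"),
  c("bread", "milk"),
  c("flour", "caster sugar", "eggs", "milk"),
  c("bread")
)

# Fraction of transactions containing every item in `itemset`.
support <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Confidence of lhs => rhs: supp(lhs and rhs together) / supp(lhs).
confidence <- function(lhs, rhs) support(c(lhs, rhs)) / support(lhs)

# Lift of lhs => rhs: supp(lhs and rhs together) / (supp(lhs) * supp(rhs)).
lift <- function(lhs, rhs) support(c(lhs, rhs)) / (support(lhs) * support(rhs))

confidence(c("flour", "caster sugar"), "eggs")  # 1: every such basket has eggs
lift(c("flour", "caster sugar"), "eggs")        # about 1.67 (> 1)
lift("bread", "eggs")                           # 0 (< 1: never bought together)
```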
When we perform market basket analysis, then, we are looking for rules with a lift of more than one. Rules with higher confidence are ones where the probability of an item appearing on the RHS is high given the presence of the items on the LHS. It is also preferable (higher value) to action rules that have a high support, as these will be applicable to a larger number of transactions. However, in the case of long-tail retailers, this may not be possible.
2. Performing market basket analysis using the apriori algorithm with R and the arules package
Just to recap: the purpose of this analysis is to generate a set of rules that link two or more products together. Each of these rules should have a lift greater than one. In addition, we are interested in the support and confidence of those rules: higher-confidence rules are ones where there is a higher probability of items on the RHS being part of the transaction given the presence of items on the LHS. We'd expect recommendations based on these rules to drive a higher response rate, for example. We're also better off actioning rules with higher support first, as these will be applicable to a wider range of instances.
In this example, we're going to perform the analysis for an online retailer running Snowplow. We're going to do the classic market basket analysis: by that I mean we are going to look for rules based on actual transactions. (Later on in this recipe, we'll consider the pros and cons of defining the scope of our basket differently.)
We're going to use R (http://www.r-project.org/) to perform the market basket analysis. R is a great statistical and graphical analysis tool, well suited to more advanced analysis. We're going to use the arules package (http://cran.r-project.org/web/packages/arules/index.html), which implements the Apriori (http://en.wikipedia.org/wiki/Apriori_algorithm) algorithm, one of the most commonly used algorithms for identifying associations between items.
To start with, we need to fetch transaction data from Snowplow that identifies groups of items by transaction. The following SQL query fetches these directly: it returns a line of data for every line item of each transaction, with the transaction ID and the item name:
/* PostgreSQL / Redshift */
SELECT
  "ti_orderid" AS "transaction_id",
  "ti_name" AS "sku"
FROM
  "events"
WHERE
  "event" = 'transaction_item'
We can pull this data directly into R. (For assistance setting up R to use with Snowplow, see the setup guide (https://github.com/snowplow/snowplow/wiki/Setting-up-R-to-perform-more-sophisticated-analysis-on-your-Snowplow-data).) First, we load up R, and connect R to our Snowplow table in Redshift by entering the following at the R prompt:
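library("RPostgreSQL")
drv <- dbDriver("PostgreSQL")  # Redshift is queried through the PostgreSQL driver
con <- dbConnect(drv, host = "<<REDSHIFT ENDPOINT>>", port = "<<PORT NUMBER>>",
                 dbname = "<<DBNAME>>", user = "<<USERNAME>>", password = "<<PASSWORD>>")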
Then we execute our SQL query above, fetching the data as a data frame in R:
t <- dbGetQuery(con, "
  SELECT
    \"ti_orderid\" AS \"transaction_id\",
    \"ti_name\" AS \"sku\"
  FROM
    \"events\"
  WHERE
    \"event\" = 'transaction_item'
")
We can take a peek at the first six records (the default for head) in our data frame by executing:
head(t)
Note how each line of data represents a single line item, so that the first transaction (which includes two items) spans two lines.
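Now we need to group the lines by transaction ID, so that the individual products that belong to each transaction are aggregated across records into a single record, as an array of products. This is done by executing the following at the R prompt:
i <- split(t$sku, t$transaction_id)
i <- lapply(i, unique)  # arules rejects transactions that contain duplicated items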
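The toy run below shows what `split()` produces on a small hypothetical data frame shaped like the query output. The `toy` and `toy_baskets` names and the SKUs are made up, and the distinct names mean the example does not overwrite `t` or `i`. Deduplicating each basket with `unique()` is a safeguard. arules refuses to coerce a list into transactions if any transaction contains the same item twice, which can happen when an order has several line items for the same SKU.

```r
# Hypothetical line-item data: one row per line item, as returned by the SQL query.
toy <- data.frame(
  transaction_id = c("1001", "1001", "1002", "1003", "1003", "1003"),
  sku = c("SKU-A", "SKU-B", "SKU-C", "SKU-D", "SKU-D", "SKU-E"),
  stringsAsFactors = FALSE
)

# Group SKUs by transaction: a named list with one character vector per transaction.
toy_baskets <- split(toy$sku, toy$transaction_id)

# Drop repeated SKUs within a basket (transaction 1003 lists SKU-D twice).
toy_baskets <- lapply(toy_baskets, unique)

str(toy_baskets)
```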
Now we convert the data into a transactions object, optimized for running the arules algorithm:
library("arules")
txn <- as(i, "transactions")
Finally, we can run our algorithm:
basket_rules <- apriori(txn, parameter = list(sup = 0.005, conf = 0.01, target = "rules"))
When running the algorithm, we set minimum support and confidence thresholds, below which R ignores any rules. These are used to optimize the running of the algorithm. Figuring out association rules can be computationally expensive: for a company with a large catalogue of items, the number of combinations of items is enormous (it increases exponentially with the number of items). Hence, anything we give the algorithm to minimize the computational burden is welcome.
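A minimum support threshold saves so much work because of the downward-closure property that Apriori relies on: an itemset can only be frequent if all of its subsets are frequent. The base-R sketch below uses hypothetical toy data and a hypothetical threshold. It finds the frequent single items first and then builds candidate pairs only from those. It illustrates the pruning idea only; arules implements Apriori in compiled code and does not work like this internally.

```r
# Hypothetical toy baskets and support threshold.
transactions <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "D"),
  c("A", "B", "C")
)
min_support <- 0.3

# Fraction of transactions containing every item in `itemset`.
support <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Level 1: count every single item and keep the frequent ones.
items <- sort(unique(unlist(transactions)))
item_support <- vapply(items, support, numeric(1))
frequent_items <- items[item_support >= min_support]  # "D" (support 0.2) is dropped

# Level 2: build candidate pairs only from frequent items. Any pair containing
# an infrequent item is itself infrequent, so it never needs to be counted.
candidates <- combn(frequent_items, 2, simplify = FALSE)
pair_support <- vapply(candidates, support, numeric(1))
frequent_pairs <- candidates[pair_support >= min_support]

cat("candidate pairs counted:", length(candidates),
    "of", choose(length(items), 2), "possible\n")
```

With a catalogue of thousands of SKUs, the same pruning is applied level by level, and it is what keeps low-support combinations from ever being counted.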
In our case, we've given low figures for support and confidence. This is because our test example is based on a long-tail retailer, who offers more than 10k SKUs, against which c.90k purchases have been made. The maximum support any one of the products has is very low. This can be confirmed by plotting the relative frequency of each item (i.e. the fraction of transactions that each item appears in) for the top 25 items by item frequency. This can be done by running:
itemFrequencyPlot(txn, topN = 25)
In our case, the following plot was produced:
Note how the most frequent item appears in less than 2% of the transactions recorded.
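If you prefer numbers to a chart, the same relative frequencies can be computed directly. arules offers `itemFrequency(txn)` for this. The base-R sketch below does the equivalent on a hypothetical list of baskets, which makes explicit that relative frequency means the fraction of transactions containing each item.

```r
# Hypothetical baskets (in practice, the deduplicated list built with split()).
baskets <- list(c("A", "B"), c("A"), c("C", "A"), c("B"))

# Fraction of transactions containing each item (unique() avoids double counting).
item_freq <- table(unlist(lapply(baskets, unique))) / length(baskets)

# Top items by relative frequency, mirroring itemFrequencyPlot(txn, topN = 25).
top_items <- head(sort(item_freq, decreasing = TRUE), 25)
print(top_items)
```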
In your case the distribution of items by transaction might look very different, and so very different support and confidence parameters may be applicable. To determine what works best, you need to experiment with different parameters. You'll see that as you reduce them, the number of rules generated will increase, which will give you more to work with. However, you'll need to sift through the rules more carefully to identify those that will be more impactful for your business. We return to this theme in the next section.
Lastly, let's inspect the actual rules generated by the algorithm:
inspect(basket_rules)
In our case, the algorithm has identified 9 rules. The first 7 are not helpful: there are no items on the LHS. (For these seven rules, note how, because there are no items on the LHS, the support equals the confidence and the lift equals 1.)
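The following derivation shows why rules with an empty LHS always have support equal to confidence and a lift of exactly 1. Write supp, conf and lift for support, confidence and lift, and $X \Rightarrow Y$ for a rule with LHS $X$ and RHS $Y$, so that $\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y)/\mathrm{supp}(X)$ and $\mathrm{lift}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y)/(\mathrm{supp}(X)\,\mathrm{supp}(Y))$. Every transaction contains the empty itemset, so substituting $X = \emptyset$ gives:

```latex
\begin{aligned}
\mathrm{supp}(\emptyset) &= 1, \qquad \emptyset \cup Y = Y,\\
\mathrm{supp}(\emptyset \Rightarrow Y) &= \mathrm{supp}(\emptyset \cup Y) = \mathrm{supp}(Y),\\
\mathrm{conf}(\emptyset \Rightarrow Y) &= \frac{\mathrm{supp}(Y)}{\mathrm{supp}(\emptyset)} = \mathrm{supp}(Y),\\
\mathrm{lift}(\emptyset \Rightarrow Y) &= \frac{\mathrm{supp}(Y)}{\mathrm{supp}(\emptyset)\,\mathrm{supp}(Y)} = 1.
\end{aligned}
```

Such rules only restate how often the RHS item is bought, which is why they carry no information about associations.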
The last two rules are interesting, though. They suggest that people who buy the Memo Block Apple are more likely to buy the Memo Block Pear, and vice versa. Moreover, they are much more likely to do so: the confidence is 66, suggesting the two products are very strongly associated.
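One way to drop the unhelpful empty-LHS rules before ranking the rest is to work with the rules as a data frame. arules can coerce a rule set with `as(basket_rules, "data.frame")`. This puts the rule labels (of the form `{A} => {B}`, with `{}` for an empty side) in a `rules` column alongside the quality measures. The sketch below builds a small hypothetical table of that shape by hand. Its values are invented and are not the results described above. Within arules itself, `basket_rules[size(lhs(basket_rules)) > 0]` should achieve the same filtering.

```r
# Hypothetical rules table with some of the columns as(rules, "data.frame")
# produces (values invented for illustration).
rules_df <- data.frame(
  rules      = c("{} => {A}", "{} => {B}", "{A} => {B}", "{B} => {A}"),
  support    = c(0.010, 0.012, 0.006, 0.006),
  confidence = c(0.010, 0.012, 0.600, 0.500),
  lift       = c(1, 1, 50, 50),
  stringsAsFactors = FALSE
)

# Keep only rules whose LHS is non-empty (labels starting "{} =>" have an empty LHS).
# as.character() also handles older arules versions that return a factor column.
has_lhs <- !startsWith(as.character(rules_df$rules), "{} =>")
useful <- rules_df[has_lhs, ]

# Rank the remaining rules by lift, then by confidence.
useful <- useful[order(-useful$lift, -useful$confidence), ]
print(useful)
```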
3. Managing large result sets: visualizing rules using the arulesViz package
In the previous example we set the parameters for support and confidence so that only a small set of rules was returned. As mentioned, however, it is often better to return a larger set, to increase the chances that we generate more relevant rules for our business.
Let's rerun the algorithm, but this time reduce our parameters for support and confidence, and save the result set into a different object:
basket_rules_broad <- apriori(txn, parameter = list(sup = 0.001, conf = 0.001, target = "rules"))
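In our case, 3.2M rules were returned. This is far too many to inspect one by one, but we can plot all of them with the arulesViz package (by default, as a scatter plot of support against confidence, shaded by lift):
library("arulesViz")
plot(basket_rules_broad)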
Our plot looks as follows:
The plot shows that rules with high lift typically have low support. (This is not surprising, given the maths.) We can use a plot like the one above to identify rules with both high support and high confidence. The arulesViz package lets us plot the graphs in an interactive mode, so that we can click on individual points and explore the associated data. For more details, see the full package instructions (http://cran.r-project.org/web/packages/arulesViz/vignettes/arulesViz.pdf).
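The observation that high-lift rules have low support can be made precise. Write $s_X$, $s_Y$ and $s_{XY}$ for the supports of the LHS, the RHS and the two together, so that the lift is $s_{XY}/(s_X s_Y)$ and the rule's support is $s_{XY}$. Items can never co-occur more often than the rarer side occurs on its own, so $s_{XY} \le \min(s_X, s_Y)$. Since $s_X s_Y = \min(s_X, s_Y)\max(s_X, s_Y)$, this gives:

```latex
\mathrm{supp}(X \Rightarrow Y)\cdot\mathrm{lift}(X \Rightarrow Y)
  = \frac{s_{XY}^2}{s_X\, s_Y}
  \le \frac{\min(s_X, s_Y)^2}{\min(s_X, s_Y)\,\max(s_X, s_Y)}
  = \frac{\min(s_X, s_Y)}{\max(s_X, s_Y)}
  \le 1
```

A rule's support is therefore at most the reciprocal of its lift. For example, a rule with a lift of 50 can cover at most 1/50 = 2% of transactions.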
How many rules we generate, and how we prioritise which rules we action, depend on which business questions we plan to answer with our analysis. This is discussed further in the next section.
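You may want the 20 highest-lift rules as a table rather than as points on a plot. arules supports sorting and truncating a rule set directly, e.g. `inspect(head(sort(basket_rules_broad, by = "lift"), 20))`. The base-R sketch below shows the same idea on a hypothetical data frame of rules, randomly generated and standing in for `as(basket_rules_broad, "data.frame")`. Working with a data frame can be convenient if you also want to join in other product data, such as margins.

```r
set.seed(1)
n <- 1000
# Hypothetical rules table with random quality measures (illustration only).
rules_df <- data.frame(
  rules      = paste0("{item", seq_len(n), "} => {item", seq_len(n) + n, "}"),
  support    = runif(n, 0.001, 0.02),
  confidence = runif(n, 0.001, 0.9),
  lift       = runif(n, 0.5, 80),
  stringsAsFactors = FALSE
)

# Order by lift (highest first) and keep the top 20 rules.
top20 <- head(rules_df[order(rules_df$lift, decreasing = TRUE), ], 20)
print(top20[, c("rules", "support", "confidence", "lift")])
```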
4. Using the analysis to drive business decision-making
Before we use the data to make any kind of business decision, it is important that we take a step back and remember something important:
The output of the analysis reflects how frequently items co-occur in transactions. This is a function both of the strength of association between the items, and of the way the site owner has presented them.
To say that in a different way: items might co-occur not because they are naturally connected, but because we, the people in charge of the site, have presented them together.
This is an example of a more general problem in web analytics: our data reflects both the way users behave and the way we have encouraged them to behave, through the website design decisions we have made. We need to be conscious of this. If, as suggested earlier in the recipe, we use the results to inform where items are placed relative to one another, we need to control for how close they are situated on the website today, so that we don't end up confirming what we have assumed. For example, if items k and l show a strong association and are already presented next to one another on our site, that is not very interesting. If they are far apart on our site, that is interesting: maybe we should put them closer together. If those items are close together, but the analysis shows there is not a strong association, we should probably separate them: our previous assumption that they should be placed together may have been wrong.
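Here is a minimal sketch of that check, assuming you can map each item to the site section where it is currently displayed. The rules, the `section` lookup and the section names below are all hypothetical. Strong rules whose two items sit in different sections are the interesting ones. Rules between items already placed together mostly confirm existing design decisions.

```r
# Hypothetical strong rules (lift well above 1), one item on each side.
strong_rules <- data.frame(
  lhs  = c("A", "C", "E"),
  rhs  = c("B", "D", "F"),
  lift = c(12, 8, 15),
  stringsAsFactors = FALSE
)
# Hypothetical lookup: the site section each item is currently shown in.
section <- c(A = "garden", B = "garden", C = "kitchen",
             D = "office", E = "office", F = "office")

# TRUE where both items of a rule are already presented in the same section.
strong_rules$placed_together <-
  unname(section[strong_rules$lhs] == section[strong_rules$rhs])

# Strong associations between items that are currently far apart are the
# candidates for moving closer together.
print(strong_rules[!strong_rules$placed_together, ])
```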
Using the data to drive website organization
There are a number of ways we can use the data to drive site organisation:
1. Large clusters of co-occurring items should probably be placed in their own category/theme.
2. Item pairs that commonly co-occur should be placed close together within broader categories on the website. This is especially important where one item in a pair is very popular and the other item is very high margin.
3. Long lists of rules (including ones with low support and confidence) can be used to put recommendations at the bottom of product pages and on product cart pages. The only thing that matters for these rules is that the lift is greater than one. (And that, for each product, we pick those applicable rules with a high lift where the recommended product has a high margin.)
4. If doing the above (3) drives a significant uplift in profit, it would strengthen the case to invest in a recommendation system that uses a similar algorithm in an operational context to power an automatic recommendation engine on your website.
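Point 3 can be sketched as follows. The sketch assumes a hypothetical table of single-item rules (for example, extracted from `as(basket_rules_broad, "data.frame")`) and a hypothetical per-product margin lookup. It keeps only rules with a lift above one. Then, for each product on the LHS, it recommends the RHS product with the highest margin, breaking ties by lift. Ranking margin before lift is one possible design choice, not a prescription.

```r
# Hypothetical single-item rules: lhs => rhs with their lift.
rules_df <- data.frame(
  lhs  = c("A", "A", "A", "B", "B"),
  rhs  = c("B", "C", "D", "A", "D"),
  lift = c(3.0, 0.8, 1.5, 3.0, 2.2),
  stringsAsFactors = FALSE
)
# Hypothetical gross margins per product.
margin <- c(A = 0.10, B = 0.25, C = 0.60, D = 0.40)

# Only rules with lift > 1 are worth acting on, however low their support.
candidates <- rules_df[rules_df$lift > 1, ]
candidates$rhs_margin <- unname(margin[candidates$rhs])

# For each LHS product, rank recommendations by margin, then by lift,
# and keep the top one.
candidates <- candidates[order(candidates$lhs, -candidates$rhs_margin, -candidates$lift), ]
recommendations <- candidates[!duplicated(candidates$lhs), ]
print(recommendations)
```

Product C is never recommended here, despite having the highest margin, because its only rule has a lift below one.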
Using the data for targeted marketing
The same results can be used to drive targeted marketing campaigns. For each user, we pick a handful of products, based on the products they have bought to date, that have both a high lift and a high margin, and send the user e.g. a personalized email or display ads.
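For targeted marketing, the same ingredients can be combined per user. In the sketch below the purchase histories, rules and margins are hypothetical. Scoring candidates by lift multiplied by margin is one simple, assumed way of balancing the two, not a standard method. Products the user already owns are excluded.

```r
# Hypothetical purchase histories per user.
purchases <- list(u1 = c("A"), u2 = c("B", "C"))

# Hypothetical single-item rules and per-product margins.
rules_df <- data.frame(
  lhs  = c("A", "A", "B", "C"),
  rhs  = c("B", "D", "D", "A"),
  lift = c(3.0, 1.5, 2.2, 0.9),
  stringsAsFactors = FALSE
)
margin <- c(A = 0.10, B = 0.25, C = 0.60, D = 0.40)

# Pick up to n products for one user: rules triggered by something they bought,
# with lift > 1, recommending something they have not bought yet.
pick_for_user <- function(bought, n = 2) {
  cand <- rules_df[rules_df$lhs %in% bought & rules_df$lift > 1 &
                     !(rules_df$rhs %in% bought), ]
  if (nrow(cand) == 0) return(character(0))
  cand$score <- cand$lift * unname(margin[cand$rhs])  # assumed score: lift x margin
  cand <- cand[order(-cand$score), ]
  head(unique(cand$rhs), n)
}

picks <- lapply(purchases, pick_for_user)
print(picks)
```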
How we use the analysis has significant implications for the analysis itself. If we are feeding the analysis into a machine-driven process for delivering recommendations, we are much more interested in generating an expansive set of rules. If, however, we are experimenting with targeted marketing for the first time, it makes much more sense to pick a handful of particularly high-value rules and action just those. Only then should we work out whether to invest the effort of building the capability to manage a much wider and more complicated rule set.
5. Expanding on the analysis: zooming out from the basket to look at customer behavior over longer periods and different events
In the above example, we used actual transaction events to identify associations between products for an online retailer.
Sticking with our retail example, however, we could have expanded the scope of our definition of transactions. Instead of just looking at the basket for successful transactions, we could have looked at users' complete baskets (whether or not they went on to buy). The analysis steps would have been almost exactly the same. However, instead of pulling transaction data out of Snowplow, we'd have pulled add-to-basket data out, using a query like the following:
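/* PostgreSQL / Redshift */
SELECT
  -- one "transaction" per user session: user ID plus session index
  "domain_userid" || '-' || CAST("domain_sessionidx" AS VARCHAR) AS "transaction_id",
  "ev_property" AS "sku"
FROM
  "events"
WHERE
  "ev_action" = 'addtobasket'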
We could increase the scope further. Instead of looking at add-to-basket events, we could look at every product that each visitor has viewed, and associate groups of products that individual users have looked at within a single session:
/* PostgreSQL / Redshift */
SELECT
  -- one "transaction" per user session: user ID plus session index
  "domain_userid" || '-' || CAST("domain_sessionidx" AS VARCHAR) AS "transaction_id",
  "page_urlpath"
FROM
  "events"
WHERE
  "event" = 'page_view'
Note how this time each product is identified by URL rather than by SKU. It may be appropriate to filter out URLs that do not correspond to product pages.
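Here is a minimal sketch of that filtering step in R. It assumes the page-view query results are in a data frame with `transaction_id` and `page_urlpath` columns. It also assumes, hypothetically, that product pages live under `/products/`; adjust the pattern to your own URL scheme. The filtered paths are then grouped with `split()` in the same way as SKUs.

```r
# Hypothetical page-view rows: one row per page view, keyed by session.
pv <- data.frame(
  transaction_id = c("u1-1", "u1-1", "u1-1", "u2-1", "u2-1"),
  page_urlpath   = c("/products/a", "/about", "/products/b", "/products/c", "/checkout"),
  stringsAsFactors = FALSE
)

# Keep only product pages (the "/products/" prefix is an assumed URL convention).
is_product <- grepl("^/products/", pv$page_urlpath)
product_views <- pv[is_product, ]

# Group viewed product URLs by session, counting each URL once per session.
baskets <- lapply(split(product_views$page_urlpath, product_views$transaction_id), unique)
str(baskets)
```

Sessions with a single product view cannot contribute to rules linking two products, so you may also choose to drop them before building the transactions object.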
Finally, we could expand our window further: instead of confining ourselves to a single session, we look at the same user over multiple sessions, i.e.:
/* PostgreSQL / Redshift */
SELECT
  "domain_userid" AS "transaction_id",
  "page_urlpath"
FROM
  "events"
WHERE
  "event" = 'page_view'
These final, wider-scope examples are likely to be more appropriate for publishers and media site owners, who want to identify associations between articles, writers/authors/producers and categories of content, rather than products in a shop.
Copyright 2012-2017 Snowplow Analytics, Ltd.