5/27/2017 dc.js | Becoming A Data Scientist

Becoming A Data Scientist

Tag Archives: dc.js

Create Rich Interactive Visualisations

Posted on May 21, 2013 by.
How to use dc.js to quickly (and easily!) create visually impactful interactive visualisations of
data. In an afternoon. Something like this interactive visualisation
(http://frozen-hollows-5121.herokuapp.com/).

Often it is desirable to create a visualisation of a dataset to enable interactive exploration or to share an
overview of the data with team members.

Good visualisations help generate hypotheses about the data, which can then be tested or validated
through further analysis.

Desirable features of such a visualisation include: accessible via browser (anyone can access it!),
interactive (supports discovery), scalable (one solution suits datasets of multiple sizes), easy/quick to
implement (good as a prototype development tool), and flexible (custom styling can emphasise
important features).

The process for creating an exploratory visualisation usually looks like this:

1. Explore Data & Data Features
2. Brainstorm Features/Hypothesis about patterns
3. Roughly Sketch Visual
4. Iteratively Implement Visualisation
5. Observe Users interacting
6. Refine, Test, Release

When first working with a dataset, understanding how it will be useful is a primary objective.
Rapidly iterating through the process outlined above can help us understand its usefulness very
quickly. Access to the right tools can help us in rapidly iterating through this process.

That is where Dimensional Charting (dc.js (http://nickqizhu.github.io/dc.js/)) comes in. dc.js is a
neat little JavaScript library that leverages both the visualisation power of Data-Driven Documents
(d3.js) and the interactive coordination of Crossfilter (crossfilter.js).

dc.js is an open source, extremely easy to pick up JavaScript library which allows us to implement
neat custom visualisations in a matter of hours.

This post will walk through the process (from start to finish) of creating a data visualisation. Today, the
emphasis is on very quickly arriving at an opinionated visualisation of our data that enables us to
explore/test specific hypotheses. In order to illustrate this concept, we will use a simple dataset taken
from the Yelp prediction challenge (http://www.kaggle.com/c/yelp-recruiting/) on Kaggle. Specifically
we will be using the file yelp_test_set_business.json.

(Step 1) Explore Data & Data Features

The data we've chosen for this visualisation is JSON, has one object per business, and each object is
structured according to the following example:

(https://becomingadatascientist.files.wordpress.com/2013/05/screenshot20130518at192429.png)

The full dataset contains approximately 1,200 businesses, all located in Arizona. Let's assume we are
interested in exploring the difference between cities, for example, the review count and average
rating by business per city. The features important for this will be (1) City, (2) Review Count, (3)
Stars (Rating), (4) Location and (5) Business ID.
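
As a rough illustration of that structure, the snippet below sketches what one business object might look like and how the five features could be pulled out of it. The values are invented, pickFeatures is a hypothetical helper, and latitude/longitude are assumed names for the location fields; city, review_count, stars, business_id and name are the fields the JavaScript later in this post reads.

```javascript
// Hypothetical record: the values are made up, and latitude/longitude
// are assumed names for the location fields.
var exampleBusiness = {
    business_id: "abc123",
    name: "Example Coffee House",
    city: "Phoenix",
    stars: 4.5,
    review_count: 42,
    latitude: 33.45,
    longitude: -112.07
};

// Keep only the five features of interest.
function pickFeatures(d) {
    return {
        city: d.city,
        reviewCount: d.review_count,
        stars: d.stars,
        location: [d.latitude, d.longitude],
        businessId: d.business_id
    };
}

console.log(pickFeatures(exampleBusiness));
```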

(Step 2) Brainstorm Features/Hypothesis about patterns

Quick Brainstorm (of desirable hypotheses or questions):

Which cities have a higher number of businesses than others?
Do specific cities have higher rated businesses than others?
Do certain cities have a higher average number of reviews per business?
Are there cities with a very low number of reviews?
What proportion of businesses in a city are 1 star compared to 5 star?
List the highest/lowest rated businesses for a specific city (for anecdotal exploration)
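
Most of these questions come down to grouping businesses by city and summarising each group. A minimal sketch of that idea in plain JavaScript, using a handful of made-up records (not the real dataset) and a hypothetical helper named summariseByCity:

```javascript
// Made-up sample records, not taken from the Yelp data.
var sample = [
    {city: "Phoenix", stars: 4.0, review_count: 10},
    {city: "Phoenix", stars: 2.0, review_count: 30},
    {city: "Tempe", stars: 5.0, review_count: 8}
];

// Count businesses per city and average their stars and review counts.
function summariseByCity(records) {
    var byCity = {};
    records.forEach(function (d) {
        var s = byCity[d.city] || (byCity[d.city] = {count: 0, starSum: 0, reviewSum: 0});
        s.count += 1;
        s.starSum += d.stars;
        s.reviewSum += d.review_count;
    });
    return Object.keys(byCity).map(function (city) {
        var s = byCity[city];
        return {
            city: city,
            businesses: s.count,
            avgStars: s.starSum / s.count,
            avgReviews: s.reviewSum / s.count
        };
    });
}

console.log(summariseByCity(sample));
```

Sorting the result by businesses, avgStars or avgReviews gives a first answer to the first three questions before any chart is drawn.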

(Step 3) Roughly Sketch Visual

Based on the above, it seems like a grouping comparison by city, with a drill-down into specific
features (rating, number of reviews, list of specific businesses), would be useful. So our visualisation
will have to hit all these objectives. Within dc.js, we have the option of the following charts.

(https://becomingadatascientist.files.wordpress.com/2013/05/screenshot20130521at143625.png)

After a little whiteboard brainstorming, we arrive at something that looks like the below. The
numbers (marked in red) refer to the following visualisations (note a secondary objective here was to
use many visualisations):

1. Bubble Chart (bubble = city, bubble size = number of businesses, x-axis = avg. reviews per business,
y-axis = average stars)
2. Pie Chart (% of businesses with each star count)
3. Volume Chart / Histogram (average rating in stars / # of businesses)
4. Line Chart (average rating in stars / # of businesses)
5. Data Table (business name, city, reviews, stars, location link to map)
6. Row Chart (# reviews per city)
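
To make chart 2 concrete: the pie chart shows what share of businesses falls on each star value. A sketch of the underlying arithmetic, with invented ratings and a hypothetical helper named starShares:

```javascript
// Percentage of businesses at each star value.
function starShares(ratings) {
    var counts = {};
    ratings.forEach(function (stars) {
        counts[stars] = (counts[stars] || 0) + 1;
    });
    var shares = {};
    Object.keys(counts).forEach(function (stars) {
        shares[stars] = 100 * counts[stars] / ratings.length;
    });
    return shares;
}

// Invented ratings for eight businesses.
console.log(starShares([1, 3.5, 3.5, 4, 4, 4, 5, 5]));
```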

(https://becomingadatascientist.files.wordpress.com/2013/05/photo2.jpg)

(Step 4) Iteratively Implement Visualisation

This next step is iterative. This is the process by which we turn our rough sketch into our
visualisation. This is achieved through a three-step process.

(https://becomingadatascientist.files.wordpress.com/2013/05/screenshot20130521at143935.png)

Implementing our visualisation (Step 1) Development Environment Setup

1. In a new folder create index.html (with "Hello World" inside) and simple_vis.js
2. Copy the Yelp data & components* (js/css) into subfolders: data (.json), javascripts (.js),
stylesheets (.css)
3. Start a web server (mongoose.exe (https://code.google.com/p/mongoose/)) from the folder (or python
-m http.server 8080 (http://www.andyjamesdavies.com/javascript/simplehttpserveronmacosxin
seconds) if on Mac)
4. Open a browser at the URL localhost:8080 (test that it is working)
5. Open the JavaScript console

*The components we will need are jQuery, d3.js, crossfilter.js, dc.js, dc.css, bootstrap.js and
bootstrap.css (these are all located in the resources zip (http://frozen-hollows-5121.herokuapp.com/meetup_resources.zip)
file).
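
If neither mongoose nor Python is to hand, a few lines of Node.js can play the same role as the web server in step 3. This is only a sketch under that assumption (Node.js is not part of the original setup), and the names contentTypeFor and createStaticServer are made up for illustration:

```javascript
// Minimal static file server sketch (assumes Node.js is installed).
var http = require("http");
var fs = require("fs");
var path = require("path");

var CONTENT_TYPES = {
    ".html": "text/html",
    ".js": "text/javascript",
    ".css": "text/css",
    ".json": "application/json"
};

function contentTypeFor(filePath) {
    return CONTENT_TYPES[path.extname(filePath).toLowerCase()] || "application/octet-stream";
}

function createStaticServer(rootDir) {
    var root = path.resolve(rootDir);
    var prefix = root.slice(-1) === path.sep ? root : root + path.sep;
    return http.createServer(function (req, res) {
        var urlPath;
        try {
            urlPath = decodeURIComponent(req.url.split("?")[0]);
        } catch (e) {
            res.writeHead(400);
            res.end("Bad request");
            return;
        }
        // "/" falls back to index.html, as in step 1.
        var relative = urlPath.replace(/^\/+/, "") || "index.html";
        var filePath = path.resolve(root, relative);
        // Refuse anything that resolves outside the served folder.
        if (filePath.indexOf(prefix) !== 0) {
            res.writeHead(403);
            res.end("Forbidden");
            return;
        }
        fs.readFile(filePath, function (err, body) {
            if (err) {
                res.writeHead(404);
                res.end("Not found");
                return;
            }
            res.writeHead(200, {"Content-Type": contentTypeFor(filePath)});
            res.end(body);
        });
    });
}

// To serve the current folder on port 8080, uncomment:
// createStaticServer(".").listen(8080);
```

Serving over HTTP rather than opening index.html directly matters because the script loads the data file with a request, which browsers typically block for file:// pages.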

Implementing our visualisation (Step 2) HTML Coding

First we'll have to load the appropriate components (outlined above). The beginning of the HTML
should look like this:

    <!DOCTYPE html>
    <html lang='en'>
    <head>
      <meta charset='utf-8'>

      <script src='javascripts/d3.js' type='text/javascript'></script>
      <script src='javascripts/crossfilter.js' type='text/javascript'></script>
      <script src='javascripts/dc.js' type='text/javascript'></script>
      <script src='javascripts/jquery-1.9.1.min.js' type='text/javascript'></script>
      <script src='javascripts/bootstrap.min.js' type='text/javascript'></script>

      <link href='stylesheets/bootstrap.min.css' rel='stylesheet' type='text/css'>
      <link href='stylesheets/dc.css' rel='stylesheet' type='text/css'>

      <script src='simple_vis.js' type='text/javascript'></script>
    </head>

Secondly, as we are using the Bootstrap layout we'll want to sketch out the divs we'll use (for more on
this, see Bootstrap scaffolding (http://twitter.github.io/bootstrap/scaffolding.html#fluidGridSystem)).

(https://becomingadatascientist.files.wordpress.com/2013/05/photo4.jpg)

When we've translated this layout to HTML, it looks like the code below. Each div in the code refers to a box
in the diagram above (and nested divs are boxes within a box; it's that simple!). Also note that we
have given each of our span divs an id attribute to indicate the visualisation that will go into it (e.g.
<div class='bubblegraph span12' id='dcbubblegraph'>). The reason for doing this will
become apparent when we look at the JavaScript later.

    <body>
      <div class='container' id='maincontainer'>
        <div class='content'>
          <div class='container' style='font: 10px sans-serif;'>
            <h3>Visualisation of <a href="http://www.kaggle.com/c/yelprecru…
            <h4>Demo for the <a href="http://www.meetup.com/DublinDataVisu…
            <div class='row-fluid'>
              <div class='remaininggraphs span8'>
                <div class='row-fluid'>
                  <div class='bubblegraph span12' id='dcbubblegraph'>
                    <h4>Average Rating (x-axis), Average Number of R…</h4>
                  </div>
                </div>
                <div class='row-fluid'>
                  <div class='piegraph span4' id='dcpiegraph'>
                    <h4>Average Rating in Stars (Pie)</h4>
                  </div>
                  <div class='piegraph span4' id='dcvolumechart'>
                    <h4>Average Rating in Stars / Number of Reviews…</h4>
                  </div>
                  <div class='piegraph span4' id='dclinechart'>
                    <h4>Average Rating in Stars / Number of Reviews…</h4>
                  </div>
                </div>
                <!-- /other little graphs go here -->
                <div class='row-fluid'>
                  <div class='span12 tablegraph'>
                    <h4>Data Table for Filtered Businesses</h4>
                    <table class='table table-hover dc-data-table' id='dctablegraph'>
                      <thead>
                        <tr class='header'>
                          <th>Name</th>
                          <th>City</th>
                          <th>Review Score (in Stars)</th>
                          <th>Total Reviews</th>
                          <th>Location</th>
                        </tr>
                      </thead>
                    </table>
                  </div>
                </div>
              </div>
              <div class='remaininggraphs span4'>
                <div class='row-fluid'>
                  <div class='rowgraph span12' id='dcrowgraph'>
                    <h4>Reviews Per City</h4>
                  </div>
                </div>
              </div>
            </div>
          </div>

        </div>
      </div>
    </body>
    </html>
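
Each chart in the JavaScript is attached to the page by the id of its element, so a mismatch between the markup and the script leaves that chart blank. A rough way to check this up front is sketched below with a regular expression over an HTML string. The helper name missingIds and the sample markup are illustrative only, and a regex is no substitute for a real HTML parser.

```javascript
// Return the selectors that have no element with a matching id in the markup.
function missingIds(html, selectors) {
    var found = {};
    var re = /\bid\s*=\s*['"]([^'"]+)['"]/g;
    var match;
    while ((match = re.exec(html)) !== null) {
        found[match[1]] = true;
    }
    return selectors.filter(function (sel) {
        return !found[sel.replace(/^#/, "")];
    });
}

// Sample markup that deliberately leaves out the table element.
var markup = "<div id='dcpiegraph'></div><div id='dcrowgraph'></div>";
console.log(missingIds(markup, ["#dcpiegraph", "#dcrowgraph", "#dctablegraph"]));
```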

Implementing our visualisation (Step 3) JavaScript Coding

Perhaps the most difficult part to grasp, the JavaScript coding is completed according to the
following steps:

1. Load Data
2. Create Chart Object(s)
3. Run Data Through Crossfilter
4. Create Data Dimensions & Groups
5. Implement Charts
6. Render Charts

The code for this is clearly commented below.

    /********************************************************
    *                                                       *
    *   dc.js example using Yelp Kaggle Test Dataset        *
    *   Eol 9th May 2013                                    *
    *                                                       *
    ********************************************************/

    /********************************************************
    *   Step 1: Load data from json file                    *
    ********************************************************/
    d3.json("data/yelp_test_set_business.json", function (yelp_data) {

        /********************************************************
        *   Step 2: Create the dc.js chart objects & link to div *
        ********************************************************/
        var bubbleChart = dc.bubbleChart("#dcbubblegraph");
        var pieChart = dc.pieChart("#dcpiegraph");
        var volumeChart = dc.barChart("#dcvolumechart");
        var lineChart = dc.lineChart("#dclinechart");
        var dataTable = dc.dataTable("#dctablegraph");
        var rowChart = dc.rowChart("#dcrowgraph");

        /********************************************************
        *   Step 3: Run data through crossfilter                *
        ********************************************************/
        var ndx = crossfilter(yelp_data);

        /********************************************************
        *   Step 4: Create the dimensions that we'll need       *
        ********************************************************/

        // for the bubble chart and row chart: one group per city
        var cityDimension = ndx.dimension(function (d) { return d.city; });
        var cityGroup = cityDimension.group();
        var cityDimensionGroup = cityDimension.group().reduce(
            // add: a business enters the current filter
            function (p, v) {
                ++p.count;
                p.review_sum += v.review_count;
                p.star_sum += v.stars;
                p.review_avg = p.review_sum / p.count;
                p.star_avg = p.star_sum / p.count;
                return p;
            },
            // remove: a business leaves the current filter
            function (p, v) {
                --p.count;
                p.review_sum -= v.review_count;
                p.star_sum -= v.stars;
                // avoid 0/0 (NaN) when the last business of a city is filtered out
                p.review_avg = p.count ? p.review_sum / p.count : 0;
                p.star_avg = p.count ? p.star_sum / p.count : 0;
                return p;
            },
            // init: starting value for each city
            function () {
                return {count: 0, review_sum: 0, star_sum: 0, review_avg: 0, star_avg: 0};
            }
        );

        // for pieChart (also used by the volume and line charts)
        var startValue = ndx.dimension(function (d) {
            return d.stars * 1.0;
        });
        var startValueGroup = startValue.group();

        // for the data table
        var businessDimension = ndx.dimension(function (d) { return d.business_id; });

        /********************************************************
        *   Step 5: Create the Visualisations                   *
        ********************************************************/

        bubbleChart.width(650)
            .height(300)
            .dimension(cityDimension)
            .group(cityDimensionGroup)
            .transitionDuration(1500)
            .colors(["#a60000", "#ff0000", "#ff4040", "#ff7373", "#67e667", "#39e63…
            .colorDomain([-12000, 12000])
            .x(d3.scale.linear().domain([0, 5.5]))
            .y(d3.scale.linear().domain([0, 5.5]))
            .r(d3.scale.linear().domain([0, 2500]))
            // x position: average stars for the city
            .keyAccessor(function (p) {
                return p.value.star_avg;
            })
            // y position: average number of reviews per business
            .valueAccessor(function (p) {
                return p.value.review_avg;
            })
            // bubble size: number of businesses in the city
            .radiusValueAccessor(function (p) {
                return p.value.count;
            })
            .elasticY(true)
            .yAxisPadding(1)
            .xAxisPadding(1)
            .label(function (p) {
                return p.key;
            })
            .renderLabel(true)
            // keep the row chart in step with the bubble chart's filter
            .renderlet(function (chart) {
                rowChart.filter(chart.filter());
            })
            .on("postRedraw", function (chart) {
                dc.events.trigger(function () {
                    rowChart.filter(chart.filter());
                });
            });

        pieChart.width(200)
            .height(200)
            .transitionDuration(1500)
            .dimension(startValue)
            .group(startValueGroup)
            .radius(90)
            .minAngleForLabel(0)
            .label(function (d) { return d.data.key; })
            // pass a clicked slice on to the volume chart as a range around the star value
            .on("filtered", function (chart) {
                dc.events.trigger(function () {
                    if (chart.filter()) {
                        console.log(chart.filter());
                        volumeChart.filter([chart.filter() - .25, chart.filter() - (-0.2…
                    } else {
                        volumeChart.filterAll();
                    }
                });
            });

        volumeChart.width(230)
            .height(200)
            .dimension(startValue)
            .group(startValueGroup)
            .transitionDuration(1500)
            .centerBar(true)
            .gap(17)
            .x(d3.scale.linear().domain([0.5, 5.5]))
            .elasticY(true)
            // pass the selected range on to the line chart
            .on("filtered", function (chart) {
                dc.events.trigger(function () {
                    if (chart.filter()) {
                        console.log(chart.filter());
                        lineChart.filter(chart.filter());
                    } else {
                        lineChart.filterAll();
                    }
                });
            })
            .xAxis().tickFormat(function (v) { return v; });

        console.log(startValueGroup.top(1)[0].value);

        lineChart.width(230)
            .height(200)
            .dimension(startValue)
            .group(startValueGroup)
            .x(d3.scale.linear().domain([0.5, 5.5]))
            .valueAccessor(function (d) {
                return d.value;
            })
            .renderHorizontalGridLines(true)
            .elasticY(true)
            .xAxis().tickFormat(function (v) { return v; });

        rowChart.width(340)
            .height(850)
            .dimension(cityDimension)
            .group(cityGroup)
            .renderLabel(true)
            .colors(["#a60000", "#ff0000", "#ff4040", "#ff7373", "#67e667", "#39e63…
            .colorDomain([0, 0])
            // keep the bubble chart in step with the row chart's filter
            .renderlet(function (chart) {
                bubbleChart.filter(chart.filter());
            })
            .on("filtered", function (chart) {
                dc.events.trigger(function () {
                    bubbleChart.filter(chart.filter());
                });
            });

        dataTable.width(800).height(800)
            .dimension(businessDimension)
            .group(function (d) { return "List of all Selected Businesses"; })
            .size(100)
            .columns([
                function (d) { return d.name; },
                function (d) { return d.city; },
                function (d) { return d.stars; },
                function (d) { return d.review_count; },
                function (d) { return '<a href=\"http://maps.google.com/maps?z=12&t=m&q=…
            ])
            .sortBy(function (d) { return d.stars; })
            // (optional) sort order, :default ascending
            .order(d3.ascending);

        /********************************************************
        *   Step 6: Render the Charts                           *
        ********************************************************/

        dc.renderAll();
    });
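
The custom reduce passed to cityDimension.group() is the least obvious part of the script: Crossfilter calls the add function when a record enters the current filter, the remove function when it leaves, and the init function once per group. The sketch below replays that bookkeeping by hand in plain JavaScript on two invented records, with no Crossfilter involved. The function names are made up for illustration, and the guard for an empty group is included so that the averages never become NaN.

```javascript
function reduceInit() {
    return {count: 0, review_sum: 0, star_sum: 0, review_avg: 0, star_avg: 0};
}

// Recompute both averages, treating an empty group as zero.
function updateAverages(p) {
    p.review_avg = p.count ? p.review_sum / p.count : 0;
    p.star_avg = p.count ? p.star_sum / p.count : 0;
    return p;
}

function reduceAdd(p, v) {
    p.count += 1;
    p.review_sum += v.review_count;
    p.star_sum += v.stars;
    return updateAverages(p);
}

function reduceRemove(p, v) {
    p.count -= 1;
    p.review_sum -= v.review_count;
    p.star_sum -= v.stars;
    return updateAverages(p);
}

// Two invented businesses in the same city.
var a = {review_count: 10, stars: 4};
var b = {review_count: 30, stars: 2};

var group = reduceInit();
reduceAdd(group, a);
reduceAdd(group, b);
console.log(group.review_avg, group.star_avg); // 20 3
reduceRemove(group, b);
console.log(group.review_avg, group.star_avg); // 10 4
```

Keeping running sums and deriving the averages from them is what lets the bubble chart update cheaply when a filter changes: nothing has to be recounted from scratch.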

After completing the steps above, we are left with something like this (also hosted here
(http://frozen-hollows-5121.herokuapp.com/)).
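
One detail of the script worth isolating is how a click on a pie slice is passed to the bar chart: the selected star value is widened into a small range around it, so that the bar centred on that value falls inside the filter. A sketch of that conversion follows. The helper name starRange is made up, and the symmetric half-width of 0.25 is an assumption based on the lower offset of .25 used in the script.

```javascript
// Turn a single star value into a [low, high] range around it.
function starRange(star, halfWidth) {
    var w = halfWidth === undefined ? 0.25 : halfWidth;
    // Number() guards against a string value, where "+" would concatenate.
    var value = Number(star);
    return [value - w, value + w];
}

console.log(starRange(3.5)); // [ 3.25, 3.75 ]
console.log(starRange("4")); // [ 3.75, 4.25 ]
```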

(https://becomingadatascientist.files.wordpress.com/2013/05/screenshot20130509at213233.png)

(Step 5) Observe Users interacting

Perhaps the most important step. If a visualisation is to be useful, it must first be understood by the
user and (as this is interactive) encourage the user to explore the data.

In this step what we're trying to achieve is to watch a user familiar with the context interact with the
visualisation and look for cues that tell us something is working for them (e.g. listen for things like
"Paradise Valley has a high number of average reviews" or "There are relatively few low reviews in
Phoenix, and those all seem to be hardware stores"). If your interactive visualisation is working well,
often you will see a user spot a macro trend ("Low reviews in city X") and confirm why this is by
click-filtering ("City X has many cheap thrift stores, which I know get low reviews").

You'll know your visualisation is not working if nothing in it surprises, delights or appears to
prove a user's hypothesis.

(Step 6) Refine, Test, Release

Following Step 5, we want to improve on the visualisation. Depending on how well it worked, these
changes might be cosmetic (the colour scheme confused the user) or they might be transformational
(the visualisation didn't engage the user). This might mean returning to step 2 or step 4. If the
visualisation will be refreshed with new data at regular intervals, it is often a good idea to
periodically observe users to understand how their needs have changed, having understood (and
hopefully solved!) their initial data challenge. Given the user's new needs, repeating the entire
visualisation process again may be beneficial.

That's It.

Phew! Well done for making it this far. The first time you read this post, a lot of it might seem new,
but reviewing this in conjunction with the code in the resources
(http://frozen-hollows-5121.herokuapp.com/meetup_resources.zip) file should answer questions you have.

Remember, if you can nail this (which shouldn't take too long), you'll be able to create neat
interactive visualisations quickly and easily in many contexts (which can impress a lot of people)!

As always, comments and feedback are appreciated. Please leave them below, or on our Facebook
page (https://www.facebook.com/BecomingADataScientist).

Posted in Visualisation / Tagged crossfilter.js, d3.js, dc.js, DublinDataVis, Visualisation / 8
Comments
