
Exploration Techniques for Stranded Customer Intelligence Collateral
An MS-ISE Final Project

Michael Hay – April 2014


Table of Contents

ABSTRACT
1. INTRODUCTION
2. FACT-FINDING AND RESULTS
FACT-FINDING PROCESS
RESULTS
CONTENT HISTORY
CONTENT EXPERIENCE
CONTENT OUTCOMES
OTHER DATA
RESULTING USE CASE(S) AND REFLECTIONS
3. RELEVANT WORKS
4. TECHNOLOGY SELECTION, UTILIZED DATA, AND PROTOTYPE SYSTEM
TECHNOLOGY SELECTION
UTILIZED DATA
PROTOTYPE SYSTEM
5. TESTING APPROACH, ANALYSIS TECHNIQUES, AND RESULTS
TESTING APPROACH
ANALYSIS TECHNIQUES
RESULTS
6. CONCLUSIONS AND FUTURE WORK
REFERENCES
APPENDIXES
APPENDIX 1 – GENERIC INTERVIEW QUESTIONNAIRE & EXAMPLE INTERVIEW OUTCOME
GENERIC INTERVIEW QUESTIONNAIRE
EXAMPLE INTERVIEW OUTCOME
APPENDIX 2 – USABILITY TEST QUESTIONNAIRE AND TASK LIST
APPENDIX 3 – EXEMPLARY CUSTOMER INTERVIEW MATERIALS & RELATED JSON DATA
NOTICE
HELLER EHRMAN – EMPLOYMENT ATTORNEY
RELATED JSON DATA
APPENDIX 4 – PROTOTYPE EXEMPLARY SOURCE CODE
KEY WORD EXTRACTION & IMAGE CREATION
WEB USER INTERFACE FOR ALTERNATIVE 4 – SOURCE & GUI
APPENDIX 5 – SURVEY QUESTIONS AND RESPONSES
QUESTIONS
RESPONSES
APPENDIX 6 – RAW DATA WITH DESCRIPTIVE STATISTICS AND KEY STATISTICAL TESTS
RAW DATA WITH DESCRIPTIVE STATISTICS
KEY STATISTICAL TESTS

Abstract
A basic system that reveals new insights from existing customer interview material is presented.
Construction of the system began with a fact-finding process discerning key required use cases
and tasks from the target audience. A coarse prototype consisting of four alternative visual
representations of a set of customer interview material was implemented. The target audience
tested these alternatives, with time to complete major test elements recorded. Measured results
were compiled and compared to illustrate that not all visualizations result in optimal user
performance. Finally, findings are reported that illustrate benefits ranging from relatively simple
discoveries (e.g. structured file naming schemas) to complex reflections (e.g. building an audio
search engine with key word visualization may improve productivity).

1. Introduction
In the Information Technology (IT) markets, competitive tactics and business model changes are
mandating reevaluations of how companies interact with their users. One impacted IT sub-market,
computer data storage, is going through sweeping changes. An example of this is a movement
away from the Standard Consumption Model1 to the Utility Consumption Model2. A second
example within the Standard Consumption Model is the procurement of complete systems instead
of just data storage; in fact users are purchasing complete systems inclusive of applications,
management software, computer servers, data networking, and data storage. How can an
organization detect macro changes and micro details leading to relevant portfolio changes that
meet market needs? One obvious approach to discerning change at both macro and micro
levels is to have Planners3 employ a very traditional fact-finding technique called a user interview
[1]. These engagements target a representative set of users in an effort to understand and record
their needs. From this set of recorded data, Planners construct use cases, build prototypes, and
define requirements that are used in the development of an Information System. However,
experience has shown that Artifacts4 derived from past fact-finding efforts are often stranded and
forgotten. This results in Planners re-visiting the same topics, reengaging with the same customers
without awareness of past interactions, failing to spot trends over time, lacking awareness of
specific geographic needs, and so on. To understand why Artifacts are forgotten and to potentially
explore how they may become un-stranded, this project began by performing its own fact-finding
activity [2].

1 The IT Standard Consumption Model is defined as an approach whereby organizations purchase
equipment from a vendor and deploy the equipment within a facility they own or lease.
2 The IT Utility Consumption Model is defined as a way of procuring IT functions by organizations in a way
that is equivalent to utilities like power and water. In this method consumers do not directly purchase
equipment, but instead pay for a service on a periodic basis, for example 10 Gigabytes of compute data
storage for $50 USD per year.
3 Planners are people who have one or more of the following archetypical roles: Product Managers, Product
Marketing Managers, Product Designers, Researchers, and Engineers.
4 Artifacts include but aren't limited to PDF documents, presentation files, audio recordings, flat text
documents, etc.

Planners, the selected target audience, were interviewed to glean and document their most
pressing needs. Based upon these interviews, a proper use case was derived and a prototype
system constructed implementing the most pressing needs. The prototype implementation
focused on the presentation and visualization aspects of a system. Ultimately, through usability
testing techniques, a quantitative evaluation of user performance, or time to complete a task, was
performed by comparing four alternative visualization approaches.
This paper is organized in the following manner: processes and results of the fact-finding activity
are reported in Section 2, including some hints at the analysis approach utilized. Section 3 reflects
on findings from the relevant literature and suggests how they might impact the project. Section 4
details the data set, its preparation, and the basic construction of the prototype system. In
Section 5 the usability testing approach is reported along with the results of the measurements
comparing the alternatives. Lastly, conclusions and plans for future work are covered in Section 6.

2. Fact-Finding and Results

Fact-Finding Process
Is any evidence available to prove the author's assertion regarding stranded Artifacts? Further, how
do Planners want to experience and interact with these Artifacts, and what do they expect to do
with them? In an effort to answer these key questions, a fact-finding activity was
performed with a focus on interviewing Planners. Specifically, 16 interviews were conducted from
February 10, 2014 to March 14, 2014, and each session lasted between 30-45 minutes covering
four categories. During each interview the author captured notes including quotes and distillations
of ideas for each interviewee. (Note, the Generic Interview Questionnaire and Example Interview
Outcome are detailed in Appendix 1.) After concluding all interviews, the author read and analyzed
the results by recording answers into Microsoft Excel. As answers to questions were easily
discernable (e.g. Yes or No) the results were quickly recorded, but detecting more complex
concepts required deeper reading that led to a transformation of concepts into summarizing key
words5. For example where one interviewee might say, “…[if] I had the index or list of questions
then I could at least see…” and another, “…if you have the experience of going to a public library
you can do searches on anything…" then both attendees' inputs were interpreted as a single
concept assigned the key words “Index/Search.” With all interviews analyzed, results were
compiled into a series of tables and graphs. These visual compilations were used to conclude if
the original assertion regarding stranded Artifacts was correct, and uncover Planners’ preferred
experience of the Artifacts. The latter concept, uncovering the users’ preferred experience,
ultimately led to a key use case the author leveraged to construct the prototype system.

5 Whitten and Bentley don't suggest that there are any hard and fast rules for distilling ideas into
discernable requirements or use cases. Instead they suggest that this skill comes through experience as a
systems analyst and through techniques like Joint Requirements Planning, as well as via processes like
brainstorming [1].
Fact-Finding Results

Content History
Were most Planners interviewed aware of the results of past customer interviews and the Artifacts
representing them? Many (11 out of 16) answered yes or partially yes, indicating they were aware.
However, a deeper look across the body of interviewees revealed varying descriptions of at least
location, type of interview, and scope. For example, participant 1 said, “I know some, but there
are a bunch that haven’t been re-aggregated. So I don’t know where everything is.” While
participant 3 stated, "I know there is material and I've looked for it and couldn't find it." These two
examples suggest awareness of the Artifacts, but at the same time that there are more Artifacts
than the interviewees knew of. A summary of the findings for this category is referenced in Figure
2-1, with Q.a. dealing with Artifact awareness and Q.b. dealing with the acceptability of Artifact
format.

Answer | Q.a. | Q.b.
Yes | 2 | 3
Partial Yes | 9 | 6
Partial No | 0 | 0
No | 5 | 6
N/A | 0 | 1

Figure 2-1 Summary of findings for awareness of Artifacts & desirability of customer study format.

While it would be easy to assume that, due to the number of positive and partially positive
answers, there was complete awareness of the Artifacts, the actual interview data doesn't support
this conclusion. Instead the best conclusion is that more
than a super majority, or 87.5%, of the interviewees claimed to understand that what they knew
represented a subset of all available Artifacts. Therefore, the assertion that Artifacts are stranded
should be considered reasonable. Similarly, 75% of those interviewed found that the format for
existing customer interview Artifacts either partially met or outright failed to meet their needs. Here,
one can assume that additional ways to consume, experience and explore these Artifacts are
highly desirable.

Content Experience
This section in the questionnaire investigated different ways planners might want to interact with
the Artifacts. Unlike the previous case, it was not possible to discern a positive, somewhat
positive, somewhat negative or negative answer to a question. Instead the logic employed to gain
an understanding of what kinds of use cases and capabilities are needed came from the following
steps.
1. Qualitatively assign conceptual key words to statements representing ideas,
2. Read answers to each question across all interviews matching defined key words to related
concepts and ideas,

3. Evaluate the frequency of occurrence of key words in a bar graph and associated table,
see figure 2-2, and
4. Correlate the key word to a usage pattern or capability likely needed in the prototype (a
minimal tallying sketch follows this list).
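As an illustration of steps 1 through 3, the minimal sketch below tallies qualitatively assigned key
words with Python's standard library; the coded answers shown are illustrative stand-ins rather
than the actual interview data.

from collections import Counter

# Illustrative key-word codes assigned to interview statements (step 1);
# these values stand in for the real coded answers, which are not reproduced here.
coded_answers = [
    "Key-Term", "Index/Search", "Key-Term", "A to D & D to A",
    "Key-Term | Geo", "Index/Search", "Key-Term | Time", "Laptop",
]

# Step 3: evaluate the frequency of occurrence of each key word.
for key_word, count in Counter(coded_answers).most_common():
    print(f"{key_word}: {count}")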


Figure 2-2 Key word frequency graph relating how Planners would like to experience user interview
collateral.

While every category could be reported, only the most frequent are covered here, in order of
frequency. The most frequent concept, both directly and indirectly, is the idea of reporting the
number of occurrences of key terms. Specifically, the concept was directly repeated nineteen
times and then indirectly (see footnote 6) another nine times, for a total of 28 instances. By both
direct and indirect measures the next most important concept was moving from abstracted
customer interview data to detailed Artifacts, reported at 14 and 5 (see footnote 7) respectively,
for a total of 19 occurrences. Another interesting piece of information, or data access pattern, is
Index/Search, which suggested that users wanted a "Google-like" approach to find the right set of
Artifacts for deeper exploration. Finally, the last facet of relevant information gleaned is the type of
system on which the users would want to experience the prototype. Here, the highest frequency
of reference was a laptop platform. All remaining key words may be of interest for the reader to
explore, but were not considered when constructing the prototype.

6 Here "indirectly" means that the concepts "Key-Term | Time," "Key-Term | Vertical," and "Key-Term | Geo"
can also be counted as "Key-Term" and therefore add to the frequency of Key-Term.
7 Note that A to D & D to A represents Abstract to Detailed & Detailed to Abstract, meaning the interviewees
desired both patterns for accessing the Artifacts. This suggests the interviewees would at least be partially
satisfied by a pattern that starts from abstract summaries and moves to individual details or Artifacts.
Content Outcomes
What do Planners want to do with customer interview materials or the Artifacts? This section
investigated exactly this point. Again, as described in the Content Experience section, the same
process to assign key words and count their frequency was employed. Figure 2-3
graphically reports the findings of this section of the questionnaire.


Figure 2-3 Key words summarizing what interviewees were most likely to do with the Artifacts.

Findings from the body of interviews were not surprising to the author. Specifically, the most
commonly occurring key words representing important concepts include usage of the collateral for
requirements/planning (freq. = 12), development of a personal strategy (freq. = 11), validation of
plans or requirements (freq. = 8), understanding the customer strategy (freq. = 6), and growing a
repository of artifacts (freq. = 5). Given that those interviewed all had jobs that fell somewhere
within the domains of product management, strategic product planning, product marketing, and
engineering, the results aligned well with the kinds of functions a Planner would employ. While it is
important to understand the motivation of the Planners, results from this section did not prove
impactful to the prototype system.
Other Data
Beyond customer interview materials, are there other data sources that should be included in the
work outcomes of Planners? The final section of the questionnaire let interviewees grapple with
exactly this question. Interestingly, a consistent requirement emerged from the participants: a
desire to consume data and information about their competitors in the market. In many cases the
need for competitive information was expressed in combination with another topic such as
competitive information and technology trends (shown in figure 2-4 as Competitive + Tech.
Trends). Additional needs were expressed to combine competitive information more generally with
other information like customer trends. For example, participant 4 stated, "[Competitive
information should be triangulated] with what the analysts are seeing and also [with] what our
customers are thinking. To me, [with this combination] we should be so far ahead of the financial
analysts." This clearly illustrates that interviewees believed there was value in coupling data
sources together. Like the previous section, results here were not reflected in the prototype
system and are instead discussed as a point of future work.

Figure 2-4 Report of key words of other types of data desired beyond customer interview
collateral.

Resulting Use Case(s) and Reflections


Given the time scale of the project, the author hoped the fact-finding activity would result in a small
set of use cases. Fortunately, the results from fact-finding easily supported the author’s hope,
revealing a core use case desirable for Planners. Therefore, by reviewing the Content Experience
results and drawing on the author's experience, it was relatively easy to construct a single use
case for prototyping, see table 2-1.

Use case name: Abstract to detailed exploration

1. User Action: User logs in.
   System Response: A screen that includes more than one alternative visual experience renders.
2. User Action: User clicks on one alternative.
   System Response: Causes the alternative to run/start, and each alternative includes:
   • An abstracted visualization of the entire set of customer interview materials,
   • Relevant controls to help the user traverse from the abstracted visualization of all
     interviews to one or more specific customer interviews.
3. User Action: User performs a task designed to cause them to traverse from an abstracted
   visualization to one or more specific customer interview documents.
   System Response: Produces a summary of an individual interview including at least one URL
   (Uniform Resource Locator) pointing to a specific customer interview document.
4. User Action: User clicks on the URL(s).
   System Response: Causes the referenced document(s) to download.

Table 2-1 basic use case, Abstract to detailed exploration, extrapolated from the results of the fact-finding
process.

For this use case it was assumed that all alternative visualization approaches would provide
participants some form of key term visualization, systematic control, and facilitate the download of
discrete Artifacts via a URL. This implies that User Actions and associated System Responses for
steps 2, 3 and 4 are intentionally generalized. Practically speaking, use case generalization is a
sound best practice minimizing development efforts while maximizing capabilities. In fact, in the
author’s experience, use cases may be developed in an object oriented manner so that one use
case can “call” another. Use case interdependencies and hierarchies, expressed through object
oriented ideals, help Planners better develop applications that meet user expectations without
undue design and development burdens.

3. Relevant Works
While processes, reports, and visualizations of data from user surveys or polls are well-oiled
machines today, the idea of automatically leveraging unstructured data Artifacts to design
Information Systems appears not to have been properly studied. Given the success of product
companies like Apple Inc., which must implement sound fact-finding and planning methods to
achieve their stated corporate and financial goals, this was quite a surprising finding [3]. To be
clear, the author is not asserting that domain specific or general-purpose mining, analysis, and
visualization of unstructured data like text, audio, and images are not well studied. Instead, the
author is stating that the application of one or more of these techniques to the planning of
Information Systems is not readily available in the literature, and may not be well studied in detail.
Therefore, the author assembled a set of seemingly unrelated reports and documents across a
broad set of topics.

In their work to analyze and visualize mobile Call Detail Records (CDR) Blaas et al. provided a
method to approach the problem: discern key use cases, construct a prototype system, measure
the results and study the findings [2]. An important need arising from the interviews conducted in
this project is key term or word frequency analysis and visualization. While the author did not
employ advanced word frequency analysis techniques a broad understanding of the topic was
needed, and Baron et al. supplied that [4]. A repeated theme in the interviews is a nearly explicit
requirement to speed up or save time when a Planner needs to engage the Artifacts for their
objectives and activities. This suggests that manual efforts to organize the Artifacts were to be
minimized. In this spirit Tanner and Zhou from Lexis-Nexis provided insight on the idea of
automatic content organization based upon key term analysis [5]. However, when generating the
prototype system their structure and methods were not used. Furthermore, their approach to
usability testing, of their prototype implementation, did not include a quantitative view of user
performance. Other works were helpful in providing ideas for visualizing data – leveraging
emerging Big Data visualization techniques – in ways that are comprehensible to users [6] [7].

4. Technology Selection, Utilized Data, and Prototype System

Technology Selection
Construction of an operable prototype required a rapid survey followed by a quick selection of
relevant technologies. Guiding the survey were one requirement discovered during the fact-finding
phase and one constraint. The requirement was that the selected technology work in a laptop
context. The constraint, a product of the relatively short timescale, was to prefer any relevant
technology that quickened development time. Generally, the relevant types of technologies came
in two categories: toolkits that handled unstructured data processing and technologies that
visualized data sets. Due to the author's previous knowledge of the Python programming
language, and the availability of extensions that enabled key word counting, Python was selected
for processing the unstructured data within the utilized Artifacts. Selection of the visualization
toolkit proved more challenging; yet the application of the primary selection criteria facilitated a
quick decision. For this phase several visualization toolkits were reviewed and a quick report on
each follows in table 4-1.

Tumult Hype
Description: Tumult Hype is an HTML5 authoring tool. "What is commonly referred to as 'HTML5'
is really a platform of technologies including the latest HTML tags, CSS styles, and improved
JavaScript performance. HTML5's capabilities allow for stunning visual effects and smooth
animations, but previously required difficult hand-coding. There were no designer-friendly tools for
building animated HTML5 content… until Tumult Hype" [8].
Comment: Sufficient for building rapid prototypes; yet the addition of live or semi-live data requires
coding independent of the tool.

D3.js
Description: "D3.js is a JavaScript library for manipulating documents based on data. D3 helps
you bring data to life using HTML, SVG and CSS. D3's emphasis on web standards gives you the
full capabilities of modern browsers without tying yourself to a proprietary framework, combining
powerful visualization components and a data-driven approach to DOM manipulation" [9].
Comment: Construction of any system requires deep knowledge of JavaScript development
techniques.

MIT's SIMILE Exhibit
Description: "Exhibit 3.0 is a publishing framework for large-scale data-rich interactive Web pages.
Exhibit lets you easily create Web pages with advanced text search and filtering functionalities,
with interactive maps, timelines, and other visualizations. The Exhibit 3.0 software has two
separate modes: Scripted for building smaller in-browser Exhibits, and Staged for bigger
server-based Exhibits" [10].
Comment: System construction requires understanding of emerging text-based data structures
like JSON and slight modifications to standard HTML code.

Table 4-1 a quick description of the various toolkits reviewed.

Due to the author's previous knowledge of Hype and Exhibit, and the ability of these technologies
to display on laptops and mobile devices, the requirement and constraint were met. Therefore,
deep technical evaluations of each toolkit were not performed and the technology selection was
concluded. An added benefit of the Exhibit toolkit is that a backend database or search engine
was not required. Instead, a JSON structured flat file8 could be produced, bundled with any
visualization, and when browser clients access a visualization instance the data set could be
included or distributed along with the instance. With the toolkits chosen, selection of the relevant
data set and prototype development proceeded.

Number of customers: 25
Number of interviews: 26
Number of countries: 6
Countries represented: United States, Spain, Finland, India, China, Singapore
Number of verticals: 11
Verticals represented: Credit Reporting, Energy, Financial Services, Government, Information and
Communication Technology, Insurance, Legal, Media and Entertainment, Retail, Systems
Integration, Telecommunications
Total combined pages: 229
Format: Microsoft Word Version 2011

Table 4-2 metadata summary for included customer study materials.

8 JSON (JavaScript Object Notation) is a formatted text structure loosely following the ECMAScript
standard, designed for structured data interchange between applications and programming languages [15].
Utilized Data
In an effort to make the system as believable as possible, real customer interview Artifacts were
used. A total of 26 Microsoft Word documents, each embodying a single interview, were identified
and included in this project. Due to the sensitivity of most of these materials, this report
intentionally minimizes detail on them. Therefore, demographic style data covering the overall set
of 26 is reported in table 4-2. The author hopes that this provides a sense of the scale and
properties of the data set analyzed for the project. However, because one of the companies
included in the set of data analyzed is no longer a going concern, information about it can be
reported in detail, see Appendix 3. Finally, to prepare the data for detailed analysis the files were
converted to plain text and the file names were enriched with metadata. The last point, on
augmenting the file names with additional metadata, proved critically important in the
implementation of the prototype system. Specifically, the file names included data like customer
name, vertical, geographic location, and so on. The precise file name structure implemented is
represented in table 4-3 using the exemplary case of Heller Ehrman.

Field: Date | Region | Country | State/Province | City | Vertical | Study Name | Customer | Type
Example: 20070202 | Amer | USA | California | Palo Alto | Legal | Content Services | Heller Ehrman | Interview
Derived file name: 20070202-AMER-USA-CALIFORNIA-PALO ALTO-LEGAL-Content Services-Heller Ehrman-Interview.txt
Table 4-3 file format structure including an example (Heller Ehrman) and the extrapolated file instance.
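As a small illustration of the schema in table 4-3, the sketch below assembles a derived file name
from its metadata fields; the uppercasing of the region-through-vertical fields is inferred from the
example row, and the function and parameter names are the author of this edit's labels rather than
code from the project.

def derive_file_name(date, region, country, state, city,
                     vertical, study, customer, kind):
    # Hyphen-join the fields, uppercasing region through vertical
    # as in the Heller Ehrman example from table 4-3.
    parts = [date] + [f.upper() for f in (region, country, state, city, vertical)]
    parts += [study, customer, kind]
    return "-".join(parts) + ".txt"

print(derive_file_name("20070202", "Amer", "USA", "California", "Palo Alto",
                       "Legal", "Content Services", "Heller Ehrman", "Interview"))
# -> 20070202-AMER-USA-CALIFORNIA-PALO ALTO-LEGAL-Content Services-Heller Ehrman-Interview.txt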

Prototype System
With the decisions about technology concluded and the data set identified, construction of the
prototype system commenced. Development first started by building a short program to construct
a structured summary of the 26 interviews, and as previously documented, Python was utilized for
this effort. While the actual program is available in appendix 4, its basic functions are described
below.

1. START
a. Identify each file in a supplied directory
b. For each file do the following steps
i. Extract summary metadata for the study from the file name
ii. Open the file and compute the top 15 most frequent key words
iii. Construct a JSON data stanza (see appendix 3 for an example JSON stanza related to Heller
Ehrman)
iv. Append the JSON data stanza to the entire data structure
v. Generate a thumbnail image of the top 15 most frequent key words
c. Persist the JSON data structure as a file
2. FINISH
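The listing below is a minimal sketch of that flow, assuming UTF-8 plain-text inputs, a naïve
stop-word list, and the table 4-3 naming schema; it mirrors the steps above but is not the author's
actual program (see appendix 4 for that). Thumbnail generation (step b.v) is omitted here since it
depends on an external word-cloud library.

import json
import os
import re
from collections import Counter

# Naive stop-word list; the author's actual filtering may differ.
STOP_WORDS = {"the", "and", "that", "for", "with", "are", "this", "have",
              "you", "but", "not", "was", "they", "from", "can", "will"}

def metadata_from_name(file_name):
    # Step b.i: reverse the table 4-3 schema by splitting the name on hyphens.
    keys = ["date", "region", "country", "state", "city",
            "vertical", "study", "customer", "type"]
    return dict(zip(keys, os.path.splitext(file_name)[0].split("-")))

def top_key_words(path, n=15):
    # Step b.ii: count the most frequent words in the plain-text interview.
    with open(path, encoding="utf-8") as f:
        words = re.findall(r"[a-z]{3,}", f.read().lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]

def build_summary(directory):
    items = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".txt"):
            stanza = metadata_from_name(name)                                   # step b.i
            stanza["keywords"] = top_key_words(os.path.join(directory, name))   # step b.ii
            items.append(stanza)                                                # steps b.iii-b.iv
    return {"items": items}  # "items" is the top-level array Exhibit-style data files use

# Step c: persist the JSON data structure as a file.
with open("interviews.json", "w", encoding="utf-8") as out:
    json.dump(build_summary("interviews"), out, indent=2)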

Once the structured summary was built, development moved on to the visual presentation of the
system. For this phase, basic usability principles were applied, such as aesthetic and minimalist
design, but a deep usability evaluation was not performed. Development used both Tumult Hype
and MIT's Exhibit toolkit, concluding in the production of a wrapping experience and four
alternative visualizations. Additionally, to capture a qualitative sense of how participants felt about
the test, a survey on SurveyMonkey was created9. Each alternative implemented some part of the
Planners' needs expressed during the fact-finding phase. A map for the visualization portion of the
entire prototype system is available in figure 4-1. In an effort to more clearly connect
implementation to needs expressed during fact-finding, detail about each alternative is presented
in table 4-4.

Figure 4-1 a map of the visual presentation of the entire prototype system.

Alternative – 1: Geographic view of key words by customer.
• Visualizes each customer's 15 most frequent key words according to their location in the world.
• Participants must click on the key word bubble to see specific information including: the 15 most
frequent key words; the customer name, hyperlinked to the actual detailed interview; the date of
the interview; and the vertical of the customer (e.g. Telecommunications).
• Implements features like key words by geography, summary visualizations of key words,
interview data over time, and the reporting of customer verticals.

Alternative – 2: Geographic view of key words by customer including filtering and search.
• Visualizes each customer's 15 most frequent key words according to their location in the world
and affords participants the option to search and filter customer names, key words, etc.
• Participants are able to perform all functions of Alternative – 1 with the following additions:
search and filter on key words; search and filter on customer names; search and filter on
customer verticals.
• Implements all of the features in Alternative – 1, adding affordances around searching and
filtering to ease access to the available data. When a search, filtering, or combined
search-filtering action is performed, the system updates the map to match the criteria; that is to
say, only the customers who match the criteria remain on the map.

Alternative – 3: Time view of key words by customer.
• Visualizes each customer's 15 most frequent key words according to when the interviews were
performed; affords participants the option to search and filter customer names, key words, etc.
• Participants are able to perform the same search and filtering functions as in Alternative – 2, in
addition to the following: each interview is placed on a timeline chart according to when it was
conducted; when a customer name corresponding to an interview is clicked, it reports the same
information as described in Alternative – 1.
• Implements all of the search and filtering features in Alternative – 2, adding affordances to more
explicitly visualize the relationship between time and customer interviews.

Alternative – 4: View of customer and key word data by geography and time.
• Visualizes each customer's 15 most frequent key words according to when and where the
interviews were performed; affords participants the option to search and filter customer names,
key words, etc.
• Participants are able to perform the same geospatial and time views of data, with search and
filtering functions, as outlined in all of the other alternatives.
• Implements all of the search and filtering features in all alternatives.

Table 4-4 explanation of each of the implemented alternatives.

9 Based upon the author's training, an inverse correlation is sometimes evident between actual user
performance and reported perception. If present in the results, the author hoped to point out that collecting
survey data alone is insufficient to determine the effectiveness of a system. See appendix 5 for the survey
questions and the summarized responses.

Once the implementation for each alternative was completed, a wrapping visual experience was
built in Tumult Hype. Development of the wrapping experience was needed to ease interactions
between the prototype system and the participants. Specifically, the goal was to simplify their
movements through each alternative visualization approach and the survey. Once all of the
development items were completed, the prototype was deployed into the included Apache web
server on a Mac OS X V10.9.2 system and was readied for testing.

5. Testing Approach, Analysis Techniques, and Results

Testing Approach
To execute the usability tests, a protocol was developed that intentionally caused the participants
to perform the core use case, "Abstract to detailed exploration" (see appendix 2 for the actual
protocol). Since learnability has the potential to positively impact user performance, individual
steps within the protocol were intentionally designed to minimize, not eliminate, between-task
learnability. Furthermore, to gain awareness of an alternative's inherent ability to cause usability
slips or errors, each task had both correct and incorrect answers. These facets, coupled to
snooping web server logs, allowed the author to measure the time to complete a task
successfully, slip to success, or fail altogether for each alternative.

Actual testing occurred in typical Information Technology workplaces at three locations, see table
5-1, from April 4th, 2014 to April 25th, 2014. Each test was executed on an Apple MacBook Pro
15.2 inch laptop system running the OS X Mavericks operating system. No special hardware was
used to measure user performance, physiological behaviors, etc. Software for the prototype
system was installed into a directory space accessible to the locally running instance of the
Apache web server. Prior to starting each test the system was reset to a known good state with
the Apache log files emptied, web cookies and data from SurveyMonkey removed, all web
browser caches emptied, and all filtering options within the prototype unset10. To initiate a test
each participant was given a paper copy of the protocol, asked to read it first, and then execute
the test. The participant then sat at a desk or table and performed the test, referring to the paper
copy of the protocol as required.

Location 1: 2845 Lafayette St., Santa Clara, CA, 95050, USA
Location 2: 292 Yoshida-cho, Totsuka-ku, Yokohama, Kanagawa 244-0817, Japan
Location 3: 300 Beach Road 28-01, The Concourse, Singapore 199555

Table 5-1 testing locations around the world.

10 Note that at least one test failed to have the web browser caches emptied and the cookies for
SurveyMonkey removed. That test failed completely and its data was not included. Two other tests provided
partial data because the web browser cache was only partially emptied; data from these two sessions were
included.
Overall, 30 participants were included in the study, each executing a test session that lasted
approximately 10 minutes or less. Additional detail about the sample of participants is provided in
this section to give the reader a sense of who executed the tests within the study.
• Participants matched the earlier operationally defined role of a Planner; that is, they were
engineers, product managers, product planners, product marketers, and so on.
• No incentives were provided or recruitment strategies employed to entice the participants
to perform the test.
• No preference was given to race, culture, gender, age, or work experience.
• Generally the participants ranged from the mid-thirties to mid-fifties in age, with IT work
experience from fifteen to thirty years.
• Additionally, due to the various locations, American, Japanese, and Singaporean cultures
were included in the study.

Analysis Techniques
Closing a usability testing session consisted of executing a simple script that extracted key entries
from the Apache web server log, stored the results in a CSV file, emptied the web server log, and
restarted the web server. With session data stored, the time to complete the task, cause a slip, or
cause an error was computed11. The results were times, in seconds, for each task per participant,
see table 5-2. Per-session time data were then consolidated into a single Microsoft Excel file for
statistical analysis. For the survey, however, SurveyMonkey handled both data persistence and the
computation of basic descriptive statistics.

Time | URL | Action | Seconds to Completion | Correct (1=y, 0=n) | Alternative
52734 | /~mihay/mockup.html | < Start session
52737 | /~mihay/mockup.hyperesources/iframe-htmlwidget.html |
53964 | /~mihay/mockup.hyperesources/iframe-bytimehtml.html | < Enter alternative
53964 | /~mihay/sample_1/locationNoFilter.html |
54005 | /~mihay/docs/20070202-AMER-USA-CALIFORNIA-SANTA%20CLARA-RETAIL-Content%20Services-eBay%20HR-Interview.docx | < Download (Slip) | 41 | 0 | 1
54085 | /~mihay/docs/20111130-AMER-USA-CALIFORNIA-FAIRFIELD-TELCO-CC-AT%20and%20T-Interview.docx | < Download (Correct) | 121 | 1 | 1
54098 | /~mihay/mockup.hyperesources/iframe-bytimehtml-1.html | < Enter alternative
54098 | /~mihay/sample_1/location.html |
54139 | /~mihay/docs/20111130-AMER-USA-CALIFORNIA-FAIRFIELD-TELCO-CC-AT%20and%20T-Interview.docx | < Download (Correct) | 41 | 1 | 2
54170 | /~mihay/mockup.hyperesources/iframe-bytimehtml-2.html |
54170 | /~mihay/sample_1/time.html | < Enter alternative
54182 | /~mihay/sample_1/__history__.html?0 |
54243 | /~mihay/docs/20111130-AMER-USA-CALIFORNIA-FAIRFIELD-TELCO-CC-AT%20and%20T-Interview.docx | < Download (Error) | 73 | 0 | 3
54249 | /~mihay/mockup.hyperesources/iframe-bytimehtml-3.html | < Enter alternative
54249 | /~mihay/sample_1/locationAndTime.html |
54299 | /~mihay/docs/20120905-APAC-CHINA-BEIJING-BEIJING-ICT-CC-Neusoft%20Reseller-Interview.docx | < Download (Correct) | 50 | 1 | 4

Table 5-2 exemplary session data showing task times for correct, slip and error actions.

11 Times were computed by subtracting the entry time into a particular alternative from the timestamp of
the downloaded customer interview document.
Initial analysis of the session data consisted of a χ² test for normality to ensure that further analysis
could be performed. With the χ² test completed, a one-way ANOVA was performed to determine
whether differences between the means existed, see appendix 6. Following the ANOVA, post-hoc
T-Tests were computed to detect differences between individual means. As for the survey data,
tabulation of results was handled by the SurveyMonkey service. Since the primary purpose of the
survey was a qualitative sense of participant perception, χ² tests, an ANOVA, and post-hoc T-Tests
were not applied to it.
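As a rough illustration of the log-to-CSV step, the sketch below pulls GET entries out of an Apache
access log and computes seconds-to-completion as described in footnote 11. It is not the
author's actual script: the URL-to-alternative mapping is inferred from the session data in table
5-2, and the correct/slip/error judgment is left as a manual step.

import csv
import re
from datetime import datetime

# Matches the timestamp and request path of an Apache "combined" log line, e.g.
# 127.0.0.1 - - [04/Apr/2014:09:12:01 +0900] "GET /~mihay/sample_1/location.html HTTP/1.1" 200 512
LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "GET (?P<url>\S+)')

# Entry pages per alternative, inferred from table 5-2.
ALTERNATIVES = {
    "/~mihay/sample_1/locationNoFilter.html": 1,
    "/~mihay/sample_1/location.html": 2,
    "/~mihay/sample_1/time.html": 3,
    "/~mihay/sample_1/locationAndTime.html": 4,
}

def task_times(log_path, csv_path):
    rows, entered_at, current_alt = [], None, None
    with open(log_path) as log:
        for line in log:
            match = LOG_RE.search(line)
            if not match:
                continue
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            url = match.group("url")
            if url in ALTERNATIVES:
                # The participant entered an alternative; start the clock.
                entered_at, current_alt = ts, ALTERNATIVES[url]
            elif url.endswith(".docx") and entered_at is not None:
                # A document download marks task completion (footnote 11).
                seconds = int((ts - entered_at).total_seconds())
                rows.append([current_alt, url, seconds])
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["alternative", "document", "seconds_to_completion"])
        writer.writerows(rows)

task_times("/var/log/apache2/access_log", "session_times.csv")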

Results
With the data collected, it was prepared for analysis. An initial review of the data showed that in
some cases data points were missing, either due to failures generated directly by the participant
or due to the web browser's cache not being emptied. Therefore, for those missing data the mean
for each alternative was substituted. While this obviously skewed the data towards the mean,
results from the χ² tests showed that the assumption of normality could not be rejected, see table
5-3.

Alternative | P-Val | No. Missing Data Points | H0: Assume normality
Alt-1 | 0.4693 | 3 | Accept Null
Alt-2 | 0.8847 | 3 | Accept Null
Alt-3 | 0.8141 | 6 | Accept Null
Alt-4 | 0.2973 | 3 | Accept Null

Table 5-3 missing data points per alternative and χ² tests on normality.

With the χ² tests allowing for confidence in the assumption of normality, the additional tests of
ANOVA and post hoc T-Tests were performed. The ANOVA results showed significant differences
between the means (p-val = 1.04639E-12) of all of the alternative visualizations. Further post hoc
T-Tests illustrated significant differences between most of the means except for two, alternatives 2
and 4, see table 5-4. With these basic tests completed, user performance amongst the
alternatives could be considered and contemplated.

Comparison | P-Val | Alpha | H0: No difference in means
Alt-1 to Alt-2 | 0.000028293 | 0.05 | Reject Null
Alt-1 to Alt-3 | 0.047058421 | 0.05 | Reject Null
Alt-1 to Alt-4 | 0.000012119 | 0.05 | Reject Null
Alt-2 to Alt-3 | 0.000000002 | 0.05 | Reject Null
Alt-2 to Alt-4 | 0.504529797 | 0.05 | Accept Null
Alt-3 to Alt-4 | 1.2485351E-9 | 0.05 | Reject Null

Table 5-4 post hoc T-Test comparisons between the alternatives.
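For readers who wish to reproduce this style of analysis, a minimal SciPy sketch follows. The input
lists are placeholders rather than the study's raw data (that appears in appendix 6), the χ²
normality test is not reproduced, and no multiple-comparison correction is applied, matching the
simple post-hoc approach described above.

from itertools import combinations
from scipy import stats

# Seconds-to-completion per alternative; placeholder values only,
# NOT the study's measurements (see appendix 6 for the raw data).
times = {
    "Alt-1": [121, 150, 98, 203, 141, 160],
    "Alt-2": [41, 90, 77, 101, 85, 88],
    "Alt-3": [160, 175, 190, 150, 201, 178],
    "Alt-4": [50, 88, 70, 95, 81, 77],
}

# One-way ANOVA across all four alternatives.
f_stat, p_val = stats.f_oneway(*times.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4g}")

# Post-hoc pairwise t-tests between individual means.
for a, b in combinations(times, 2):
    _, p = stats.ttest_ind(times[a], times[b])
    print(f"{a} vs {b}: p = {p:.4g}")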
After concluding the initial analysis, testing normality assumptions, and checking for differences
between the means, the results were graphed and are presented in figure 5-1. Again, it is possible
to state that differences existed between all alternatives, excluding alternative 2 at 83.44 seconds
and alternative 4 at 76.93 seconds. This proved not surprising to the author as their construction
was similar: both included map based visualizations and identical filtering widgets. As a result, no
clear winner between the alternatives can be declared. Instead it is possible to state that the
visualizations in the prototype that coupled map based visualizations to filtering controls exhibited
the best measured user performance. One potential reason for the insignificant difference between
the means of alternatives 2 and 4 likely stems from between-task learnability. Because both
alternatives included the same set of widgets, it is highly possible that interactions with alternative
2 were learned and carried over to alternative 4 by the participants. In essence, learnability
between the tasks likely allowed the participants to complete task 4 with higher performance.

[Figure 5-1: bar chart of per-alternative mean task times, in seconds: Alt-1 = 142.67,
Alt-2 = 83.44, Alt-3 = 175.50, Alt-4 = 76.93.]

Figure 5-1 per alternative means, in seconds. Note that no significant difference is discernable
between alternatives 2 and 4.

Additional detail and commentary are reported per alternative, including a characterization via
additional descriptive statistics, whether the alternative caused errors or slips, and the results of
the survey self-reported by the participants.

• Alternative 1 – With a mean of 142.67 ± 62.95 seconds, participants slipped a high
percentage of the time (56.7%), and error percentages were relatively low at 10%.
Interestingly, only 24.14% reported this alternative as difficult to use, with the remainder
reporting it as extremely easy to use and moderately easy to use, 34.48% and 41.38%
respectively. Based upon the author's training, this is not surprising as it represents bias in
the responses of the participants. That is, on many occasions participants tend to
self-report a higher degree of perceived performance even though measured performance
data doesn't support the perception. An interesting qualitative observation comes from the
high rate of slippage. Participants who slipped generally downloaded exactly the same
incorrect customer study Artifact, and then recovered to the correct answer. A potential
reason for the high rate of slippage is discussed in the conclusion section.
• Alternative 2 – Resulted in a mean of 83.44 ± 33.59 seconds and low percentages of slips
and errors, at 3% each. No participant reported this alternative as difficult to use; 27.59%
reported it as extremely easy to use, and 72.41% reported it as moderately easy to use.
• Alternative 3 – Produced a mean performance of 175.5 ± 61.29 seconds with the highest
percentage of errors at 17% and only 3% slippage. Its standing as the lowest performing
alternative seemed to result from the primary timeline widget either having no bound or
generally being unclear. Qualitative observations showed that users struggled with the
timeline widget either by moving well beyond the final customer study Artifact, or by not
initially understanding that the timeline widget was active and could be controlled.
Participants' reported perception of ease of use for this alternative was mixed, with 20.69%
reporting extremely easy to use, 48.28% reporting moderately easy to use, and 31.03%
reporting difficult to use.
• Alternative 4 – With the lowest absolute mean at 76.93 ± 41.2 seconds and low error and
slip percentages, 7% and 3% respectively, participants self-reported this alternative
overwhelmingly as extremely or moderately easy to use, 72.41% and 20.69% respectively.
In fact only 2 participants, or 6.9%, suggested the alternative was difficult to use. In this
case participant perception seemed to match measured performance data.

One observation from the resulting data, including the self-reported survey data: it appears that
participant perception of their performance sometimes differs from their measured performance.
An exact and precise reason behind this phenomenon cannot be established from these data,
leaving both the author and the reader to speculate.

6. Conclusions and Future Work


Overall, the work studied the concepts related to traditional steps in Information Systems design
and planning. Notably, with fact-finding the author applied an approach that may ultimately result
in a more structured way to think about generating requirements for engineering teams.
Specifically, the author hopes to apply this process to upcoming customer studies at his company
to uncover insights into key use cases, desired non-functional behaviors, and overall awareness of
a particular concept by interviewed customers. While not the focus of the paper, the author
imagines a relatively structured process can emerge that includes at least the below:

1. During the interview cycle, continuous debate on key concepts, ideally resulting in an
evolving definition of uncovered key concepts and their associated key words.
2. Post interview cycle, a joint session with most or all interviewers to debate and validate key
concepts, with an aim to structure the findings in the manner illustrated in this paper.
3. Delivery of a comprehensive report and associated visualizations representing the body of
the work, giving collateral consumers an easy way to enter and see which outcomes and
findings are relevant to them.

However, this process is still overly manual and likely a challenge to team members who are
non-native English speakers. This is where some simple findings from this project could be applied
to a slightly modified fact-finding process. Notably, the encoding of customer metadata into the
file name structure, as exemplified in the Heller Ehrman case, is suggested. In fact, the majority of
the prototype system was made possible by this naming structure, which allowed a Python
program to extract the metadata needed to produce the alternative visualizations. Furthermore,
other important processes emerged to ensure that awareness of past interview cycles remains
high. Notably, one interviewee suggested that a continuous stream of information be published
through the usage of status reports, face-to-face meetings, internal blog posts, and summary
quotes. This interviewee's intention was that repeated pointers to the content might serve to
slowly increase awareness of a customer interview repository. Ultimately, the author hopes that
various quantitative and qualitative techniques will cause more frequent usage of a repository,
effectively leading to un-stranding the Artifacts.

Beyond the application of structured debate and active advertisement, the project also sought to
review what it would take to build systems that extract meaning from a repository of Artifacts. The
author's hope is that in the long term such an approach would ease Planners' struggles with
finding meaning in the collateral. While the system dealt mostly with front end visualizations, a
repeated suggestion of saving time and excising bias when generating interview reports proved
compelling. That is because it suggests more automated approaches to extracting generic
meaning could both reduce the time to pinpoint a subset of Artifacts and instill confidence in
Planners that bias has been minimized. Further, because the workers who conduct customer
interviews may have limited English proficiency, an automated system may also remove undue
burden from the interviewers, shortening the cycle time required to turn around an Artifact. Ideally,
a system would be able to extract meaning directly from audio recordings, limiting or eliminating
outright the requirement to perform detailed transcriptions or meeting summaries. Yet alternatives
might be imagined, such as using low cost human powered transcription services to extract the
conversation word-for-word including a rough time index. Beyond the gathering of new interview
collateral, it was clear that there is a hidden body of content available for consumption by the
author's company's Planners, which needs to be gathered and curated into a form consumable by
more than just individual personnel. Furthermore, the author imagines that even if some collateral
were missed by an exhaustive search, the gathering and mining of a consistent set of collateral
could prove useful for advanced text mining, search/indexing, and, as studied in this project,
structured visualizations of the repository.

In particular, this project focused in detail on several alternative visualization approaches
specifically for customer interview materials. What was uncovered, at a local level, are two
alternative visualizations exhibiting similar user performance in terms of time to conclude a task. In
addition, when these user performance data were compared to how participants perceived the
difficulty of each task performed, there was seemingly an inverse correlation between their
perception and their performance. While certainly a topic for deeper study, it does suggest that
merely looking at survey data alone, to judge user performance, is insufficient to make an informed
decision on partial or overall system construction. Instead, a more reliable way to compare the
alternatives should include multiple data sources and combine them into a complete picture.

Figure 6-1 example of a usability defect which likely caused errors and slips.
At a global level, and due to actual observations of the tests, the author does wonder how
emerging Big Data sets are going to be visualized. Specifically, if the local findings can be
extrapolated slightly, do they suggest more study is required to better understand the impact of
visualization techniques on user performance for Big Data? Essentially, the author wonders
whether application developers may overload dashboards and other GUI visualization techniques,
resulting in usability errors, information loss, and ultimately incorrect decision-making. Given the
promise of Big Data to revolutionize decision making, poor selection of visualizations may even
result in dire consequences. For example, an interesting finding derived from observation is that
the customer detail bubble used in the prototype system was inappropriately designed for
consistent user performance, see figure 6-1. Notably, the usage of color in the visualization of the
top 15 key terms may cause a user to skip over structured customer data also reported in the
bubble, like the vertical the customer belonged to. What resulted was that users consistently
slipped or failed the task outright; in fact, to help the users recover, the author asked a generic
question: "Please review the task to determine if you've completed it correctly by downloading a
file associated to a telecommunications provider." The author suspects that if the bubble had been
designed with color throughout, the rate of errors and slips for the first task would perhaps have
been reduced.
Moreover, the generally poor performance on the third alternative, when coupled to observations
of the tests, illustrated that some kind of scaffolding was likely required to guide the user when
interacting with the utilized timeline widget. Optionally, another widget that bounded the span of
visualized time could be employed, placing clear restrictions on the timeline. Specifically, the
author observed that users, once they experimented with and understood how the widget worked,
moved well past the last data point; that is, the time widget could effectively go backwards and
forwards in time without bound, see figure 6-2. Hence the suggestion of a different widget that
constrains or puts explicit boundary conditions on how far back in time a user could progress.

Figure 6-2 the time widget used in alternative 3.

Moving back to the author's point on Big Data visualizations: if developers
implement visualizations that cause high rates of error or slippage, could the consequences be
dire? Since the author is aware of Big Data visual presentations representing the likes of bullet
train inter-arrival times, usability slips and errors caused by poor selection of visualizations have
the potential to be dire in consequence. A potential path ahead to reduce slips and errors, in both
the local and global contexts, follows.

Included in the survey were two questions that sought to uncover the level of interest in the formal
development of a proper system implied by this study. From this survey, 96.55% of the
participants thought additional investment should be sought to better manage the Artifacts; note
that one participant responded that they weren't sure. Further, more than a super majority,
75.86%, of participants strongly agreed that such a system should include key word or key term
visualization
techniques. This leads to the following question: What kinds of capabilities might a “proper
system” require? Certainly the core use case discovered during the fact-finding phase is
mandatory, and due to the results of this study the inclusion of visualizing Artifacts overlaid on a
map, with filtering and searching, should be highly preferred. Beyond that Google-like search,
presentation of competitive information, inclusion of industry specific analyst data, and an ability to
combine these various data all seem relevant. The overarching goal in the inclusion of additional
modes of visualization, information processing, and more data types is assisting Planners in
uncovering insights for their various objectives. Further hints, peppered throughout the user
interviews, on time saving and eliminating bias suggest some approach to better link summaries of
data to direct references of the source material. For example, if it were possible to index key
words from audio recordings any reporting must directly reference the source audio recording(s)
ideally including the originating time sequence. However, as new use cases, widgets and
affordances are granted to users how can user performance be continuously quantitatively
evaluated? Certainly, this study also suggests a path forward through the capture of systems logs;
yet system log capture should be augmented with some form of continuous automated capture of
detailed user session data to better track user behaviors. These data could then be feed back and
forward into a design and development process that takes into account continuous measurement
of user performance. Of course, more formal and regular usability tests, augmented by capturing
system and session logs, should be performed systematically prior to release of the tool assuring
that the materialized features realize expressed requirements. Finally, from interviewee onto
participant it there was clear interest expressed in realizing a customer and market intelligence
system that boosts productivity of Planners. While the author isn’t yet certain how a real system
will materialize he is sure that this study will improve the offering planning experience at his
company!

References
[1] J. L. Whitten, L. D. Bentley, and Kevin Dittman, Systems Analysis and Design Methods, 7th
ed., Brent Gordon, Ed. New York, New York, America: McGraw-Hill, 2007.
[2] Jorik Blaas et al., Exploration and Analysis of Massive Mobile Phone Data: A Layered Visual
Analytics approach, Feb 15, 2013, A work product of the Orange Data for Development (D4D)
challenge.
[3] Apple, Inc. (2012, Oct.) Investor Relations. [Online].
http://investor.apple.com/common/download/sec.cfm?companyid=AAPL&fid=1193125-12-
444068&cik=320193
[4] Alistair Baron, Paul Rayson, and Dawn Archer, Word frequency and key word statistics in
historical corpus linguistics, Jan 12, 2009, Used to provide a broad understanding of word and
key term frequency analysis.
[5] Troy Tanner and Joe Zhou, Construction and Visualization of Key Term Hierarchies, Mar 11,
2002, Provided insights on the construction of systems based upon automated key term
extraction.
[6] Olha Buchel and Eva Fischer, "Can Interactive Map-Based Visualizations Reveal Contexts of
Scientific Datasets?," Faculty of Information and Media Studies, The University of Western
Ontario, Ontario, Paper 2012.
[7] Alexander Haubold and John R. Kender, "Analysis and Visualization of Index Words from
Audio Transcripts of Instructional Videos," Department of Computer Science, Columbia
University, New York, Jun 16, 2004.
[8] Tumult. (2014) Tumult Hype Documentation - Overview. [Online].
http://tumult.com/hype/documentation/overview/
[9] Mike Bostock. (2013) D3.js - Data-Driven Documents. [Online]. http://d3js.org
[10] Massachusetts Institute of Technology. (2012) MIT - SIMILE - Exhibit 3.0. [Online].
http://simile-widgets.org/exhibit3/
[11] Jeffrey M. Thompson and Mats P. E. Heimdahl, "Structuring Product Family Requirements for
n-Dimensional and Hierarchical Product Lines.," Department of Computer Science and
Engineering, University of Minnesota, Minneapolis, 2004.
[12] Padmanabhan C. Prasanna, "DECIMAL: A Requirements Engineering Tool for Product
Families," Computer Science, Iowa State University, Ames, 2001.
[13] Steven Firer and S. Mitchell Williams, "Intellectual capital and traditional measures of corporate
performance," Journal of Intellectual Capital, vol. 4, no. 3, pp. 348-360, 2003.
[14] David L. Parnas, "On the Design and Development of Program Families," IEEE Transactions on
Software Engineering, vol. SE-2, no. 1, pp. 1-9, Mar. 1976.

Appendixes

Appendix 1 – Generic Interview Questionnaire & Example Interview Outcome

Generic Interview Questionnaire


Attendees:
Date:
Title: Interview notes and questions for determining feature set

Questions:
1. History of current content
a. Do you know if there are any customer study materials and
where you might go to find them?
b. If you are aware of the materials, is the current format sufficient
or insufficient for your needs?
2. Experiencing the content
a. How would you like to explore the materials to get the best
possible benefit?
b. Do you imagine that some kind of visualization of the findings
would be useful/helpful?
c. Are you familiar with Word/Tag Clouds and key term
visualization? If so do you think they would be helpful?
d. Are specific organization techniques useful/helpful, such as
content/key term by geography, time, and vertical/sector? Are
there others besides those mentioned?
e. Do you imagine that you want to get to the content directly or
are more summarized abstracts or key term visualizations a
better place to start?
f. What platform is the best target for such an exploration system?
3. Outcomes and consumption practices for the content
a. What kinds of discoveries and findings do you anticipate are
possible or even relevant?
b. If so what kind do you think are preferable or relevant?
c. How do you typically use customer study materials in your
plans?
d. If you do not, how do you consolidate your own information to
produce release, plan, or other content?
4. Are there other kinds of data to include in conjunction with customer
study data/materials? If so can you describe the data?

Example Interview Outcome


Attendees: Participant 1, Participant 2, Participant 3

Date: Feb. 10, 2014 9:00AM PST

Title: Interview notes and questions for determining feature set

Questions:
1. History of current content
a. Do you know if there are any customer study materials and
where you might go to find them?
i. Participant 1: I know some, but there are a bunch that
haven’t been re-aggregated. So I don’t know where
everything is.
Participant 3: YRL may have done some of the work,
and there was some kind of reporting done, but I'm
not sure about it. I don't know where the materials are
at CompanyX; in fact, I have a cached copy on my
laptop. There were materials that were resent to
CompanyX. There is information on CompanyX's shared
files, but it is only accessible by the product planning team.
ii. Participant 2: I know there is material and I’ve looked for
it and couldn’t find it. Is there a reason it isn’t in the
Sharepoint site?
iii. CONCLUSION: Even among team members who have
participated in or consumed the content there is a distinct
lack of knowledge about where the collateral is located,
along with a suspicion that there might be additional
content, but no certainty.
b. If you are aware of the materials, is the current format sufficient
or insufficient for your needs?
i. Participant 1: I have had teams that have used the super
detailed reports and this is what they wanted. However,
from my perspective the format is insufficient. Today you
have to read and aggregate it yourself. There is no easily
searchable approach to get to data so that we can
identify contents. There are a lot of audio files, but they
are dark data. For a recent project I had my team do
raw transcriptions, but I cannot aggregate the work.
Transcription work takes hours from the interview team
to do. So the short answer is that no they aren’t usable
at large. A lot of the data is dark. Some of our
colleagues in Japan were trained on details and
specifics for particular topics, so this could be a change
in skills and awareness in the planning process.

ii. Participant 3: No. Many reports are long and raw. There
isn’t really much conclusion in the reports. Also we don’t
think about reusing the reports in other ways for other
projects. When we start the studies there are specific
questions related to current products from various parts
of the organization, but it is very specific to a topic. After
6 months to 1-year we may not be able to find
something on a new topic. This is because we’re
missing key words on particular topics. So the content
may be okay, but we also need some skill sets from our
users on how we can use the content in new ways.
People need to make assumptions and draw
conclusions from the collateral not have things spelled
out. We may need to also change our interview style so
that we talk about problems and look towards a
customer re-visitation problem.
iii. Participant 2: I haven’t looked at the materials to make a
proper judgment. Some form of an index would be
interesting.
iv. CONCLUSION: Some way to search & discover content
is critical. The declaration about there being a lot of dark
data is quite interesting.
2. Experiencing the content
a. How would you like to explore the materials to get the best
possible benefit?
i. Participant 2: I’m thinking if you have a broad range of
topics to discuss with the customer. If I had the index or
list of questions then I could at least see the areas of
consideration that were thought about. It would allow
me to triangulate down to the set of interviews to pay
attention to. I'd also like to look at multiple interviews,
multiple topics, etc. I think partially you have to put in
the time. Not sure if there is an automated method to do
that.
ii. Participant 1: What I would like to do is [describe]
something. So if you have the experience of going to a
public library you can do searches on anything. It would
basically give you indexes. It will basically give you hits
to anything anywhere. Since I know the structure of the
interviews they aren’t actually topical. The [raw] content
exposes the customer thinking for a number of years.
I’d like to include not only our data, but also Twitter, HDS

Community, and I’d also like to see linkages and data
that incorporates analyst and market views all in one
[tool]. Essentially with key terms which will grow over
time. I’d also like to see some visualization such as
Wolfram Alpha. Number trending, Frequencies, meaning
of words. I’d start with the key term library internal data,
our written reports, but more importantly being able to
get to some of the recorded data. I’d be thrilled with
that, and it doesn’t have to be perfectly automated. Just
having our own stuff searchable and be able to search
the audio content would be huge. Very seldom will the
audio content be high fidelity. Sometimes there are
follow-up discussions that are followed by another team
[PM, Eng, Sales] and we don’t see the connection
between these studies, yet in some cases we don’t even
follow-up. Also we don’t get cold calls and we miss
customers that we don’t have or haven’t acquired.
iii. Participant 3: Two things. First for the data we gather
we have to consider a way to do text mining including
voice to text. This part should be automated. We’re
focused more on the technology side and as a result we
may need a business-to-technology key term
thesaurus. This is a mutual learning process with our
customers. It isn’t always that we can ask the right
questions from our customers. It isn’t always a one-time
visit with our customers. [Could we use the materials
with our customer as well? This might suggest some
anonymization.] Not all of the information is shared
across multiple teams. There is some kind of a need for
a single repository [and we need to work on this with one
another]
iv. CONCLUSION: Again indexing and search have come up,
especially references to key terms. In particular
Participant 1 also talked about mashing up data types to
achieve a more complete result. Finally, perfect
automation of the whole process isn’t
required/mandatory and some part of the human
process is essential for the “ah ha” moment.
b. Do you imagine that some kind of visualization of the findings
would be useful/helpful?
i. Participant 3: I do.
ii. Participant 2: Yes

iii. Participant 1: And I already talked about some from the
previous question.
c. Are you familiar with Word/Tag Clouds and key term
visualization? If so do you think they would be helpful?
i. Participant 1: Yes, yes. Probably need more. They are a
good start as they can help people start with the
beginnings of their investigations. In a company like ours
they could be really dangerous. You may not be able to
refine how questions are asked. If you’re a native
English speaker then it can help, if you’re not then it can
be very limiting and dangerous. I know that almost every
team falls into this trap. Whether or not you have the
content, the teams [outside of the project team] don’t
trust and believe that the people gathering the data were
[not] biased. I don’t know how we solve the fundamental
trust and belief problem. We may not have skilled
people who are capable for investigative reporting and
open-ended questions. If you do all of this visualization
will the teams actually believe it?
ii. Participant 2: I think it would be useful. It isn’t specific
information it is more roadmap-ish and it helps you find
the data. This is like a court case so our goal should
potentially be to open the data for folks to draw their
own opinions.
iii. Participant 3: Visualization will be very useful, but the
way that you visualize is highly dependent on who
created it. I don't know how we can create things in a
non-uniform way. How can we create visuals in different
ways? If you want to get a specific answer from the
beginning, the visualizations are biased. If you're
non-native you may miss the context of the input from
audio content. There isn't an ultimate way to present
the data. At the same time it is very convincing to
have visuals.
iv. CONCLUSION: The theme of being biased when
gathering the information came up a lot. As a result
better access to the raw content, potentially all the way
down to an originating audio file if it exists, was hinted at.
d. Are specific organization techniques useful/helpful, such as
content/key term by geography, time, and vertical/sector? Are
there others besides those mentioned?

i. Participant 3: Time is important. Yet I’m not sure how
this could be related to customer studies. Can we see
the potential financial opportunity of a trend/task? Can
we connect to other systems and data sources to help
us make a decision?
ii. Participant 1: There are an infinite number of
combinations that I cannot predict. One interesting
parameter is that our number and types of key terms will
increase over time. Specifically, the Content Cloud study
would see a high density of the term metadata, and not
before that time. However, the bias element should
change over time. This would make it possible to have
cleaner views of what customers are asking for. The
most recent visits are super biased. Can we eliminate
key terms that are too frequent? If you know that J&J
has this type of problem, this kind of tool can go to the
EDGAR DB to find all of the types of companies who
have similar problems. Tier data comes from IDC, SEC
filings, Patent filings, …, there's all public data. Investor
and annual reports. Our stuff is even more siloed than
the public DBs.
iii. Participant 2: I agree with Participant 1. If you have the
lens of time you’ll see something different. Vertical
seems like the ringer. Can we see the perspectives or
who has written or who has conducted the
engagement? Is EDGAR internationalized?
iv. CONCLUSION: Viewing the development of
summaries/key terms over time seems relevant. Also the
idea of noisy terms that are too frequent might either be
removed, a potential affordance, or dampened with the
lens of time. Again the idea of including external data
sources came up.
e. Do you imagine that you want to get to the content directly or
are more summarized abstracts or key term visualizations a
better place to start?
i. See 2.a.Participant 1
ii. See 2.c.Participant 3
iii. Participant 2: I would want to get to the content
directly for sure, however the concept of summarized
abstracts sounds very appealing to start my research.
Starting with a list of brief summaries could be beneficial
for directing my search effort. Something as simple as

having an abstract that would highlight the nature of the
interview and key subjects discussed.
f. What platform is the best target for such an exploration system?
i. Ran out of time, could not answer during the session. (A
laptop was generally recommended.)
3. Outcomes and consumption practices for the content
a. What kinds of discoveries and findings do you anticipate are
possible or even relevant?
i. See 2.d.Participant 3
ii. Participant 2: Individual customer strategies and trends
amongst the whole, and some number of nuggets that
address specific inquiries, whether due to a coincidental
response to a question or to a specific topic Q&A.
b. If so what kind do you think are preferable or relevant?
i. NOTE: UNABLE TO ASK DUE TO TIME, WILL FOLLOW
UP AND UPDATE
c. How do you typically use customer study materials in your
plans?
i. Participant 2: I have not yet leveraged our archived
customer studies.
d. If you do not, how do you consolidate your own information to
produce release, plan, or other content?
i. Participant 2: I use analyst reports, articles, and
whitepapers, and catalogue the links/references in a
loosely structured list of notes.
4. Are there other kinds of data to include in conjunction with customer
study data/materials? If so can you describe the data?
a. See 2.d.Participant 1
b. See 2.d.Participant 3

Appendix 2 – Usability Test Questionnaire and Task List
Background: Given your role (e.g. marketer, product manager, product marketing,
planning, engineering, etc.) you’re to perform a series of tasks across a set of alternative
visual treatments. At the end of the session you’ll be asked to take a digital survey to rate
and reflect on the process.

Usage Scenario: Imagine you’re planning for a major release of a new offering/product.
To help you in your planning efforts you want to quickly identify thinking and sentiment from
customers in our customer base. As a result you're going to use a new customer
intelligence system to help find the right set of collateral for your study.

Notice: If you run into any problems/challenges, please do not hesitate to ask questions.

Tasks:
1. Log into the system
2. Using Alternative-1
a. For the new offering you think that Telecommunications companies who
have talked about or referred to the term “data” are critically important.
b. Actions:
i. Please select Alternative-1 from the home screen
ii. Once the system has rendered, please find the first
Telecommunications Company that includes a reference to the key
word "data," and download the interview document associated with
that customer.
iii. Once the document is downloaded please return to the “Home”
screen.
3. Using Alternative-2
a. After some consideration and study you realized that you’d need to refine
your search slightly. Specifically, you’ll want to find any
Telecommunications Company who talked about or referred to the term
“metadata.”
i. Actions:
1. Please select Alternative-2 from the home screen
2. Once the system has rendered, please find the first
Telecommunications Company that makes reference to
metadata, and download the interview document associated
with that customer.
3. Once the document is downloaded please return to the
“Home” screen.
4. Using Alternative-3
a. During your deliberations you're beginning to wonder how long ago
customers began talking about content, Big Data, etc.
i. Actions:
1. Please select Alternative-3 from the home screen

2. Once the system has rendered, please find the oldest
interview that mentions the word "content." Once
you've found the interview, download the document
associated with that customer.
3. Once the document is downloaded please return to the
“Home” screen.
5. Using Alternative-4
a. Finally, you want the perspective of learning about the behavior of the
Chinese market with respect to Advanced Analytics.
i. Actions:
1. Please select Alternative-4 from the home screen
2. Once the system has rendered, please find the first company,
in China, that makes reference to the word "analytics," and
download the interview document associated with that
customer.
3. Once the document is downloaded please return to the
“Home” screen.
6. Conducting the survey
a. Please click on the survey link in the top right-hand corner and complete the
survey.

Appendix 3 – Exemplary Customer Interview Materials & Related JSON
Data

Notice
Due to the sensitive nature of customer-centered documentation only one example can be
included. There are two reasons why this exemplary information can be included:
1. This particular customer, Heller Ehrman, is no longer in business, and
2. Since the interview was conducted in 2007 there is very little sensitive information
included.

Heller Ehrman – Employment Attorney


Name: Heller Ehrman Content Services Interview
Date: Friday, February 02, 2007

Goal: Discuss with Heller Ehrman how they operate (e.g. what records retention policy
is used and how they plan on adhering to it) and potentially determine what kinds of
requirements can be gleaned for HDDS and other Content Services offerings.

Outcome: An understanding of the key pain points was related by the individual
employment attorney interviewed. Further ideas potentially leading to differentiation
points for the Hitachi Storage Solutions Group were hinted at and documented in this
memo.

Background
Heller Ehrman was a law firm headquartered in San Francisco, USA. Founded in
1890, and having survived the San Francisco earthquake in 1906, the company was
recently dissolved due to a bad financial year in 2007 and the recent poor economic
climate. More information on Heller Ehrman can be found via Google and at their web
site http://www.hewm.com.

Customer Quotes

• Heller Ehrman (HE) quote on content and metadata: “You have to provide
access to the metadata to the court.”
• HE quote on data presentation: “In theory I can get rid of all of the binders in my
office, but I print them anyway. The reality is that lawyers like to see paper.”
• HE quote on backup tapes and the discovery process: "I think that's right. I
would say that is true. Sometimes it's better to not get the data. Sometimes we
curse the fact that the company has backup tapes, because tapes are notoriously
difficult to recover from."
• HE quote on avoiding backups: "It is a very complex question. … I think I'm
going to beg off of that question. As there are some times when backups are
required and others that they aren’t required. It is highly dependent on specific
regulation, etc.”

• HE quote on communication in the workplace with respect to job performance for
employees: “Too often in the workplace problems occur because problems aren’t
communicated or the communication is obfuscated.”
• HE quote on not getting employee emails: “Sometimes it is not the end of the
world if you aren’t able to get the communications from the employees.”
• HE quote on the amount of information required for lawsuits: “In the event of
running a defense lawsuit there is a voracious appetite for information.”
• HE quote on which tools are used in class action lawsuits: “For big class action
lawsuits Excel is the workhorse for managing information.”

Comments on Proposed Roadmap

• No roadmaps were disclosed to the customer.

Potential Differentiation Points


These are problem areas a user points out which may lead to features and concepts
incorporated into product(s). While no feature has been suggested directly, these points
should eventually map to a definable capability or a trend that maps to a series of
features within a product.

• The ability to make the integrity evaluation quickly without having to hire experts
would be very important.
• Building a web based system, which has varied access control mechanisms and
can allow for the inclusion/exclusion of unstructured data objects based on
search terms was explicitly mentioned by the lawyer.
• One can infer that due to the regulations mentioned by the interviewee, regionally
specific search islands that are unified using a search federation model might be
applicable. For example, with the privacy laws differing between Europe and
the US, employee information may not be exported from Europe to the US.
Therefore it is preferable that regionally specific searches are done implying that
a federation model might be preferable.

Other Topics
• Federal Rules of Civil Procedure: He stated that the FRCP does not become
active until there is a lawsuit. However, that does not imply companies should not
be prepared. It essentially requires that any company must be able to track,
maintain, and produce records in printed and native form, including their metadata.
Data must only be retained in the event of litigation. At that point, a litigation hold
is required on the relevant assets, which are a part of the trial/discovery process.
One final point of clarification: there might be other regionally specific regulations
which clearly specify data retention rules and policies that must be adhered to;
the FRCP does not countermand those rules or regulations.
o Companies must have a way of storing data related to litigation in a safe
platform.
o State court will most likely follow the federal approach either in law or
guidance.

o A typical lawsuit that could occur where FRCP might be applicable is in a
class action lawsuit. This is most interesting because of the large-scale
requirements for data gathering, etc.
• An email correspondence or trail is used to show what an actual person is doing,
in other words whether they are eligible for overtime, etc. Of late there was a class
action lawsuit over wages and compensation, with Electronic Arts being one example.
• Data Integrity – to the attorney this means data has not been manipulated one way or
the other. When asked how one would determine that a record is authentic, the
answer was that expert witnesses are typically hired to judge the integrity of the
email or document.
o Typically some of the plaintiffs suggest that the defendants are
manufacturing emails. This implies that there are questions about data
integrity.
• While there aren’t any specific cases related to showing the chain of custody to
protect the privacy of employees’ personnel information including who has
accessed the data, the lawyer can imagine that it would be a problem.
Paper documents in particular are problematic in this area, especially for HR
departments.
o Most HR departments are still using paper, but there is a growing trend of
HR IS systems being deployed.
• In order to show that we are doing the right things employers need to document
various performance activities of their employees. This is a burden on the
management, and building the long-term case is important. If there was a way to
make that process go more quickly for the managers and HR teams involved
then it would be a time-saving system. This would cover both negative and
positive kinds of performance actions. He suggested something that provides a little
flexibility yet prompts them down the path to make the process easier.
• When talking about employment lawsuits, and providing access to information
the attorney being interviewed suggested some kind of a web portal
infrastructure that both the plaintiff and the defendant could access for
information sharing.
o May want to export these materials in a known format for seeing the case
history or web portal for remote access.
o Want to see every communication or document from the employee,
however, there is a danger in creating a virtual personnel file; perhaps it
would be a temporal, managers-only shadow file.
o Standard emails aren’t appropriate for the personnel file.
o The easier you make it for the defendant the easier you make it for the
plaintiff
• The technology used for data winnowing and gathering is still fairly immature.
For instance having to manage 10000 documents to winnow them down to the
right set is not very easy to do today. While the tools are getting more and more
mature they are still somewhat arduous.
• In the event of the portal kind of system there may be a requirement to export the
results to PDF in a printable format. However this is not the best approach.
• “For every employment matter there is no consistency in what I get.”
o Offer letters, Performance Improvement Plans, Reviews, etc.
o Documents that are produced by a particular employee are not usually
needed unless it is needed as evidence to prove that their work is of good
or bad quality. This is done mostly in the counseling context.

• What organizational changes, post 9/11, are being created in organizations?
o It is not 9/11 related, but there are things related to SoX, which are more
corporate infrastructure in nature.
• How do you know what to keep or not keep, pre-litigation?
o [We] tell our employer clients that they need to keep things 5 years after the
employee has departed. This may not apply to the typical email, like
the ones asking to go out for dinner, and there is not any
obligation in the ordinary course to keep it.
• Do employees have the right to take their employee file with them when they leave
a company?
o State dependent and country dependent, and there are some states that
allow users to take their file or a copy of their file with them.
o For example in California, the employee may request to look at their file
but not take it with them.
• Is anything in the realm of audio files that should be kept?
o Yes; as voicemails are a regular part of the business process and may be
needed to defend a case, retention rules are applicable to these as well.
• What employee metadata is needed when looking at information for discovery
purposes?
o Employee Name
o Employee Serial Number
o Employee Job category – in some cases this may be hard to identify. In
one example they were going through every category in the payroll
system to look for all people who are close to that job type in a given
organization. In some instances this requires looking at the paper pay
records.
• For lawsuit management at this particular law firm there is an extranet system
that contains all of the materials for a given case. This system allows the lawyers
and workers on the case to have varied access to the materials within the portal.
• Having the users from the other side able to search the content and get
back results that don't include the actual content is something that is very
interesting. From there they could get the listing of results and bring the
list to the attention of the opposing side, allowing the opposing counsel to gain
access to their desired information.
• Key pain points from this attorney’s perspective:
o Document collection and analysis is something that has to be done, and
the new regulations make it very hard and challenging.
o Keeping track of the time spent on a given activity. It might be possible to
create a small report that tells you how long your session has lasted; this
could help users.

Related JSON Data


{
"customer": "Heller Ehrman",
"fileName": "http://localhost/~mihay/docs/20070202-AMER-USA-CALIFORNIA-PALO%20ALTO-LEGAL-Content%20Services-
Heller%20Ehrman-Interview.docx",
"imageURL": "http://localhost/~mihay/images/Heller_Ehrman_1.png",
"index": 2,
"interviewDate": "2007-02-02",
"keyterms": [
"data",
"heller",
"lawsuit",
"make",

"points",
"some",
"system",
"access",
"content",
"file",
"may",
"quote",
"information",
"employee",
"when"
],
"label": "Heller Ehrman Interview",
"place": "PALO ALTO, CALIFORNIA, USA",
"region": "AMER",
"type": "Interview",
"vertical": "LEGAL"
},
{
"index": 2,
"interviewDate": "2007-02-02",
"label": "2",
"type": "Interviews"
},
{
"id": "Heller Ehrman Interview",
"index": 2,
"placeLatLng": "37.45542, -122.16708"
},
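
For reference, these stanzas are plain JSON and can be queried outside of Exhibit as well. The sketch below loads the keyterms.js file emitted by the script in Appendix 4 and lists every LEGAL-vertical interview whose key terms include "content"; the file and field names match the stanzas above, while the query itself is only an illustrative assumption:

import json

# Load the stanzas emitted by the extraction script in Appendix 4.
with open("keyterms.js") as f:
    stanzas = json.load(f)

# Only the first stanza of each triple carries the full interview record;
# it can be recognized by the presence of the "keyterms" field.
interviews = [s for s in stanzas if "keyterms" in s]

# Example query: LEGAL-vertical interviews whose key terms include "content".
for s in interviews:
    if s.get("vertical") == "LEGAL" and "content" in s["keyterms"]:
        print(s["interviewDate"], s["customer"], s["fileName"])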

Appendix 4 – Prototype Exemplary Source Code

Key Word Extraction & Image Creation


import collections, re, json
from os import walk
from roundup.backends.indexer_common import STOPWORDS
from pytagcloud import create_tag_image, make_tags

"""
Desc: Make a list containing three dicts which will turn into JSON stanzas
input:
    fileName - (string) "20120507-AMER-USA-FLORIDA-MELBOURNE-SI-CC-Northrup Grumman-Interview"
    keyTerms - (list of strings) ["term", "term"]
    idx - (string) "1"
return: three dicts suitable for conversion into a JSON structure, plus the
    customer name with an index appended
"""
def makeJSONStanza (fileName, keyTerms, idx):
    # The file name encodes DATE-REGION-COUNTRY-STATE-CITY-VERTICAL-STUDY-CUSTOMER-TYPE
    myBits=re.split('\\-', fileName)
    myType=myBits[-1]
    myTypePlural='Interviews'
    rawCustomer=myBits[7]
    friendlyCustomer='_'.join(re.split('\W+', rawCustomer))
    myVertical=myBits[5]
    imageLoc='http://localhost/~mihay/images/'
    fileURL='http://localhost/~mihay/docs/' + fileName + '.docx'
    myLongLat='UNDEFINED'
    myDate=myBits[0][0:4] + "-" + myBits[0][4:6] + "-" + myBits[0][6:8]
    myRegion=myBits[1]
    myCountry=myBits[2]
    myStateProvince=myBits[3]
    myCity=myBits[4]
    studyName=myBits[6]
    myPlace=myCity + ', ' + myStateProvince + ', ' + myCountry
    # Note: the returned friendly name carries a fixed '_1' suffix, which
    # names the tag-cloud image written in the main loop below.
    return [
        {'label': rawCustomer + " " + myType, 'type': myType,
         'imageURL': imageLoc + friendlyCustomer + '_' + idx + '.png',
         'interviewDate': myDate, 'vertical': myVertical,
         'customer': rawCustomer, 'index': int(idx), 'keyterms': keyTerms,
         'fileName': fileURL, 'region': myRegion, 'place': myPlace},
        {'label': idx, 'type': myTypePlural, 'interviewDate': myDate,
         'index': int(idx)},
        {'id': rawCustomer + " " + myType, 'placeLatLng': myLongLat,
         'index': int(idx)}
    ], friendlyCustomer + '_' + str(1)

"""
Desc: Make a list of tags/terms
input:
    theTags - a list of (term, count) tuples
return: a list of the terms only
"""
def tagsToString (theTags):
    tagArray=[]
    for t in theTags:
        tagArray.append (t[0])
    return tagArray

def getFileNames (myPath):
    # Collect the .txt transcripts from the top level of myPath only.
    realFiles=[]
    myParse=re.compile ('\.txt$')
    for (dirpath, dirnames, filenames) in walk (myPath):
        for f in filenames:
            if myParse.search (f):
                realFiles.append (f)
        break
    return realFiles

# Generate a list of specific stop words we want to avoid
myStopWords=['THEM','O','NG','S','T','MANY','LOTS','HAVE','HAS','HAD','FROM',
    'FOR','DO','DOES','DOESN','CAN','AN','ALL','ABOUT','CAME','WOULD','WAY',
    'WANT','WHICH','YOU','YET','X','VERY','VIA','U','OUR','NO','ALSO','SUCH',
    'ALL','HDS','NEED','DIFFERENT','OTHER','OTHERS','SYSTEMS','USED','WHAT']
STOPWORDS.extend (myStopWords)

myFiles=getFileNames ('./docs')
allJSON=[]
myIdx=1

for theFile in myFiles:
    # Define the file name and split the name bits into usable chunks
    myFileName=re.split('\\.', theFile)[0]
    with open("./docs/" + theFile) as file:
        myText = file.read().lower()
    # Capture the top 15 most common key terms
    counts = collections.defaultdict(int)
    for word in re.split('\W+', myText):
        if word.upper() not in STOPWORDS and len(word)>2:
            counts[word.lower()] += 1
    words = sorted((count, word) for word, count in counts.items())
    myTags = [(word, count) for count, word in words[-15:]]
    keyTermString=tagsToString (myTags)
    (JSONStanza, friendlyCustomer)=makeJSONStanza (myFileName, keyTermString, str(myIdx))
    allJSON.extend (JSONStanza)
    myIdx+=1
    # Render the per-interview tag-cloud image
    tags = make_tags(myTags, minsize=10, maxsize=17)
    create_tag_image(tags, './images/' + friendlyCustomer + '.png',
        size=(192,128), background=(255, 255, 255, 255), layout=2,
        fontname='Philosopher', rectangular=True)

JSONFileName='./keyterms.js'
JSONFile=open (JSONFileName, 'w')
JSONFile.write (json.dumps(allJSON, sort_keys=True, indent=4, separators=(',', ': ')))
JSONFile.close ()
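
A few notes on running the script above, as the author reads the code (assumptions called out): it expects a ./docs directory of plain-text transcripts whose names follow the structured schema DATE-REGION-COUNTRY-STATE-CITY-VERTICAL-STUDY-CUSTOMER-TYPE.txt, and it depends on the third-party pytagcloud package plus the STOPWORDS list from the Roundup project. It emits one tag-cloud image per interview into ./images and writes the Exhibit data to ./keyterms.js; the Alternative 4 page below loads its data as keywords.js, so the emitted file is presumably renamed or copied when the prototype is deployed.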

Web User Interface for Alternative 4 – Source & GUI
<html>
<head>
<title>Key Words by Location and over Time</title>
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />

<link href="schema.js" type="application/json" rel="exhibit/data" />
<link href="keywords.js" type="application/json" rel="exhibit/data" />

<script type="text/javascript"
        src="http://localhost/~mihay/scripted/src/exhibit-api.js?bundle=false"></script>

<link rel="exhibit-extension" type="text/javascript"
      href="http://localhost/~mihay/scripted/src/extensions/map/map-extension.js?service=google&bundle=false"/>
<link rel="exhibit-extension" type="text/javascript"
      href="http://localhost/~mihay/scripted/src/extensions/time/time-extension.js?bundle=false" />

<link rel='stylesheet' href='styles.css' type='text/css' />
</head>

<body>
<div data-ex-role="collection" data-ex-item-types="Interview"></div>
<table id="frame">
  <tr>
    <td id="sidebar">
      <h1>Filters</h1>
      <div id="exhibit-browse-panel">
        <b>Search:</b>
        <div data-ex-role="facet" data-ex-facet-class="TextSearch"></div>
        <hr/>
        <div data-ex-role="facet" data-ex-expression=".customer"
             data-ex-facet-label="Customers" data-ex-height="10em"></div>
        <div data-ex-role="facet" data-ex-expression=".keyterms"
             data-ex-facet-label="Key Terms" data-ex-height="10em"></div>
        <div data-ex-role="facet" data-ex-expression=".vertical"
             data-ex-facet-label="Verticals" data-ex-height="10em"></div>
      </div>
    </td>
    <td id="content">
      <div data-ex-role="coordinator" id="Interview"></div>
      <h1>Timeline and Geography</h1>
      <div class="item" data-ex-role="lens" style="display: none;">
        <div><img data-ex-src-content=".imageURL" /></div>
        <div>Name: <a data-ex-href-content=".fileName"><span data-ex-content=".label"/></a></div>
        <div>Date: <span data-ex-content=".interviewDate"/></div>
        <div>Vertical: <span data-ex-content=".vertical"/></div>
      </div>
      <div data-ex-role="view"
           data-ex-formats="date { mode: medium; show: date }"
           data-ex-view-class="Timeline"
           data-ex-label="Key Words over Time"
           data-ex-start=".interviewDate"
           data-ex-autoposition="true"
           data-ex-bubble-width="320"
           data-ex-top-band-pixels-per-unit="400"
           data-ex-show-summary="false"
           data-ex-timeline-height="200"
           data-ex-select-coordinator="Interview">
      </div>
      <div data-ex-role="viewPanel" data-ex-initial-view="0"
           data-ex-formats="date { mode: medium; show: date }">
        <div class="map-lens" data-ex-role="lens" style="display: none;">
          <div><img data-ex-src-content=".imageURL" /></div>
          <div>Name: <a data-ex-href-content=".fileName"><span data-ex-content=".label"/></a></div>
          <div>Date: <span data-ex-content=".interviewDate"/></div>
          <div>Vertical: <span data-ex-content=".vertical"/></div>
        </div>
        <div data-ex-role="view"
             data-ex-view-class="Map"
             data-ex-label="Key Words by Location"
             data-ex-latlng=".placeLatLng"
             data-ex-center="38.479394673276445, -115.361328125"
             data-ex-zoom="3"
             data-ex-bubble-width="200"
             data-ex-icon=".imageURL"
             data-ex-shape-width="70"
             data-ex-shape-height="70"
             data-ex-select-coordinator="Interview">
        </div>
      </div>
    </td>
  </tr>
</table>
</body>
</html>

Appendix 5 – Survey questions and responses

Questions
Feedback on the Customer Intelligence Prototype

1. Based upon your experience please rank the ease of use for Alternative-1.
• Extremely easy to use
• Moderately easy to use
• Difficult to use
2. Based upon your experience please rank the ease of use for Alternative-2.
• Extremely easy to use
• Moderately easy to use
• Difficult to use
3. Based upon your experience please rank the ease of use for Alternative-3.
• Extremely easy to use
• Moderately easy to use
• Difficult to use
4. Based upon your experience please rank the ease of use for Alternative-4.
• Extremely easy to use
• Moderately easy to use
• Difficult to use
5. After looking at the prototype, generally I think that using key word analysis and
visualization techniques is helpful for gathering customer intelligence.
• Strongly agree
• Moderately agree
• Agree
• Moderately disagree
• Strongly disagree
6. After experiencing the prototype I think we should invest in improvements to manage our
customer intelligence materials.
• Agree
• Disagree
• Not Sure

Responses

Appendix 6 – Raw data with descriptive statistics and key statistical tests

Raw data with descriptive statistics


Session  Alt-1-C  Alt-1-INC  Alt-2-C  Alt-2-INC  Alt-3-C  Alt-3-INC  Alt-4-C  Alt-4-INC
(Cell shading in the original spreadsheet marked values as: Error, Slip, Missing)
1 121 41 41 73 50
2 88 144 130 90
3 182 71 86 161 154
4 96 54 63 157 63
5 159 52 73 308 231
6 136 22 81 141 81
7 133 108 125 138
8 115 91 101 92 65
9 112 65 196 40
10 70 54
11 31 58 245 66
12 151 42 150 55
13 44 44 54 26
14 51 303 81 201 69
15 51 103 119 284 59
16 231 56 68 11 159 48
17 49 88 223 105
18 120 19 15 89 33
19 131 54 134 56
20 238 134 164 88
21 328 232 100 254 79
22 144 60 107 66
23 121 50 154 122 99
24 105 102 118 43
25 131 120 308 38
26 149 42 76 274 98 38
27 84 68 110 65
28 282 237 62 111 62
29 216 81 133 229 105
30 107 46 21 112 30

Count 27 20 27 2 24 6 27 3
Session Count 30 30 30 30 30 30 30 30
Percent Resp. 90% 67% 90% 7% 80% 20% 90% 10%
1 - Percent Resp. 10% 33% 10% 93% 20% 80% 10% 90%
Average 142.67 82.05 83.44 27.50 175.50 121.00 76.93 52.33
Variance 4419.62 6219.00 1259.03 544.50 4909.13 3965.60 1893.61 520.33
Standard Dev. 66.48 78.86 35.48 23.33 70.07 62.97 43.52 22.81
Errors (per Alt.) 3.00 1.00 5.00 2.00
Slips (per Alt.) 17.00 1.00 1.00 1.00
Percent Err. 10% 3% 17% 7%
Percent Slip 57% 3% 3% 3%

Key statistical tests

ANOVA
Analysis of Variance (One-Way)

Summary
Groups Sample size Sum Mean Variance
Alt-1 30 4,280. 142.66667 3,962.41379
Alt-2 30 2,503.33333 83.44444 1,128.78161
Alt-3 30 5,265. 175.5 3,893.44828
Alt-4 30 2,307.77778 76.92593 1,697.71903

ANOVA
Source of Variation SS df MS F p-level F crit
Between Groups 203,555.31636 3 67,851.77212 25.40703 1.04639E-12 2.68281
Within Groups 309,788.51852 116 2,670.59068

Total 513,343.83488 119
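
For reference, the one-way ANOVA above and the pairwise homoscedastic t-tests below can be reproduced with scipy; the lists here hold placeholder values only and would be populated with the successful completion times (seconds) per alternative from the raw-data table above:

from scipy import stats

# Placeholder data for illustration only -- substitute the per-alternative
# successful completion times from the raw-data table.
alt1 = [121, 182, 96, 159]
alt2 = [41, 71, 54, 52]
alt3 = [41, 86, 63, 73]
alt4 = [73, 161, 157, 308]

# One-way ANOVA across the four alternatives.
f_stat, p_anova = stats.f_oneway(alt1, alt2, alt3, alt4)

# Pairwise two-tailed t-test assuming equal variances (homoscedastic),
# e.g. Alternative 1 vs. Alternative 3, matching the tables below.
t_13, p_13 = stats.ttest_ind(alt1, alt3, equal_var=True)

print("ANOVA:", f_stat, p_anova)
print("Alt-1 vs Alt-3:", t_13, p_13)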

T-Tests

1"&"3"
" [ t-test assuming
Comparing Means " equal variances (homoscedastic)
" ]
Descriptive Statistics
VAR Sample size Mean Variance
30" 142.66667" 3,962.41379"
30" 175.5" 3,893.44828"

"Summary " " "


Degrees Of Hypothesized Mean
Freedom 58" Difference 0.E+0"
Test Statistics 2.02898" Pooled Variance 3,927.93103"

"Two-tailed distribution " " "


p-level 0.04706" t Critical Value (5%) 2.00172"

" " " "


"
1&2" " " "
" [ t-test assuming
Comparing Means " equal variances (homoscedastic)
" ]
Descriptive Statistics
VAR Sample size Mean Variance
30" 142.66667" 3,962.41379"
30" 83.44444" 1,128.78161"

"Summary " " "


Degrees Of Hypothesized Mean
Freedom 58" Difference 0.E+0"
Test Statistics 4.54606" Pooled Variance 2,545.5977"

"Two-tailed distribution " " "


p-level 0.00003" t Critical Value (5%) 2.00172"

" " " "


"
1&4" " " "
" [ t-test assuming
Comparing Means " equal variances (homoscedastic)
" ]
Descriptive Statistics
VAR Sample size Mean Variance
30" 142.66667" 3,962.41379"
30" 76.92593" 1,697.71903"

"Summary " " "


Degrees Of Hypothesized Mean
Freedom 58" Difference 0.E+0"

2&3$
$ [ t-test assuming
Comparing Means $ equal variances (homoscedastic)
$ ]
Descriptive Statistics
VAR Sample size Mean Variance
30$ 83.44444$ 1,128.78161$
30$ 175.5$ 3,893.44828$

$Summary $ $ $
Degrees Of Hypothesized Mean
Freedom 58$ Difference 0.E+0$
Test Statistics 7.11479$ Pooled Variance 2,511.11494$

$Two-tailed distribution $ $ $
p-level 0.000000002$ t Critical Value (5%) 2.00172$

$ $ $ $
$
2&4$ $ $ $
$ [ t-test assuming
Comparing Means $ equal variances (homoscedastic)
$ ]
Descriptive Statistics
VAR Sample size Mean Variance
30$ 83.44444$ 1,128.78161$
30$ 76.92593$ 1,697.71903$

$Summary $ $ $
Degrees Of Hypothesized Mean
Freedom 58$ Difference 0.E+0$
Test Statistics 0.67156$ Pooled Variance 1,413.25032$

$
Two-tailed distribution $ $ $
p-level 0.50453$ t Critical Value (5%) 2.00172$

3&4$
$ assuming equal
Comparing Means [ t-test $ variances (homoscedastic)$ ]
Descriptive Statistics
VAR Sample size Mean Variance
30$ 175.5$ 3,893.44828$
30$ 76.92593$ 1,697.71903$

$
Summary $ $ $
Hypothesized Mean
Degrees Of Freedom 58$ Difference 0.E+0$
Test Statistics 7.22058$ Pooled Variance 2,795.58365$

$Two-tailed distribution $ $ $
p-level 0.000000001$ t Critical Value (5%) 2.00172$
