
Sentiment Analysis of Movie Reviews: A Study of Features and Classifiers

Siddharth Jain and Sushobhan Nayak
CS221: Stanford University
{sjain2, snayak}@stanford.edu
Abstract
We love movies, and in this project we experiment with a sentiment analysis task on movie reviews. Our objective is two-fold: 1) binary sentiment classification of a large dataset of movie reviews from IMDB, and 2) prediction of the critic-assigned rating of the movie from the review. We extract bag-of-words, tf-idf and LDA-based language features from the documents to gauge the salience of different words and sentence structures for the task. We then experiment with different learning algorithms, like Naive Bayes and different flavors of SVM with different kernels, to classify our documents; this lets us compare the importance and use of different textual features as well as the capability of the standard learning algorithms in such a task. We present a detailed analysis of the effects of the myriad features and classifiers we have considered, and support the analysis with a battery of experiments on a massive dataset.
I. INTRODUCTION
In this project, we investigate the problem of gauging the sentiment of a movie reviewer from the review, and of predicting how the reviewer rated the movie. Sentiment analysis tasks are useful in virtually all rating games, whether they be shopping/restaurant suggestions or entertainment venue suggestions; add to that the increased importance of gauging how a person feels from their social network posts, and you have a fairly complex and relevant problem at hand. While many previous works tackle this issue, our work here is in line with [1] and [2], and we use the latter's dataset.
The IMDB dataset provided by Andrew Maas [3] contains 50,000 reviews split evenly into 25k train and 25k test sets. The overall distribution of labels is balanced (25k positive and 25k negative, with equal numbers in both the train and test sets). In the labeled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10; reviews with more neutral ratings are thus not included in the train/test sets. Our task here is two-fold: 1) to classify the documents in the test set as positive or negative, and 2) to predict the score of the documents in the test set, after training our model. This task strengthens our concepts in two ways. Firstly, it lets us investigate different aspects of text processing, especially the relative prominence of features like bag-of-words (hereafter, BOW), term frequency-inverse document frequency (tf-idf) and latent Dirichlet allocation (LDA)-based features. Secondly, by letting us work with some standard classifiers, it helps us internalize the concepts of parameter selection and comparative performance. Furthermore, it lets us test the assumptions derived from the theory.
II. TASK DEFINITION AND METHODS
The dataset has been described in the previous section. In the entire collection, no more than 30 reviews are allowed for any given movie, because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain disjoint sets of movies, so no significant performance is obtained by memorizing movie-unique terms and their association with observed labels. We represent each document in the dataset as a feature vector, with the features being either raw bag-of-words, tf-idf or LDA topics (as these are standard document features used widely in NLP, we refrain from providing an explanation of each due to space constraints; details can be found in the appendix), which lets us compare the three across different settings.
For the classification task, the baseline algorithm predicts the most common label. Note that we run two sets of classification tasks: a binary classifier for sentiment analysis, and a multi-class classifier (and regressor) for score prediction. The classification problem is handled with naive Bayes classification and SVMs (also, given that they are standard, explained in the appendix). We also raise the stakes by investigating different forms of both algorithms, through a variation in kernels and in types of classifiers; in effect, instead of comparing just two complementary methods, we compare a set of possible methods in a systematic way. While naive Bayes gives us a simple and fast classification algorithm, SVM represents a complex, time-consuming algorithm that is expected to provide higher accuracy: two aspects we investigate below.
III. EXPERIMENT
Each document was converted to a bag-of-words representation, using the sparse-matrix libsvm format [4] for storage. These BOW representations were then transformed into tf-idf and LDA topic-vector representations (500 topics). We then trained the various classifiers and tested the predictions on the test documents. We used the following standard libraries:
- LibSVM [4], for SVM classification.
- scikits.learn [5], for naive Bayes classification.
- Gensim [6], for document feature extraction and conversion from one feature set to the other.
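As an illustration, the feature pipeline can be sketched roughly as follows; this is a minimal sketch assuming Gensim's standard corpus API, with two toy reviews and whitespace tokenization standing in for the actual parsed IMDB texts:

    from gensim import corpora, models

    # Toy reviews standing in for the parsed IMDB texts
    raw_documents = ["a great and moving film", "a dull and boring film"]
    texts = [doc.lower().split() for doc in raw_documents]

    # BOW: the dictionary maps each word to one dimension; docs become sparse vectors
    dictionary = corpora.Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(text) for text in texts]

    # tf-idf: reweight the BOW counts by inverse document frequency
    tfidf = models.TfidfModel(bow_corpus)
    tfidf_corpus = tfidf[bow_corpus]

    # LDA: project each document onto latent topics (500 in the experiment; 2 here)
    lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=2)
    lda_corpus = [lda[bow] for bow in bow_corpus]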
The comparisons made were the following:
- For the multi-class SVM classification, four types of SVM were used, viz. two SVM classifiers, C-SVC and ν-SVC, and two SVM regressors, ε-SVR and ν-SVR. While the classifiers predict the score as one of the training scores, to wit 1-4/7-10 (reviews with score 5/6 are absent from the dataset because they are neutral, hence effectively an 8-class classification task, though we sometimes allude to it as a 10-class classification in the spirit of the 1-10 score range), the regression values are free to be real numbers that should get as close as possible to the expected integer score. We do a detailed error analysis in the following section, and the different types of SVMs are detailed in the appendix. For the binary sentiment analysis task, as expected, only the first two classifiers were used. (A minimal sketch of this classifier comparison appears after this list.)
- For each type of SVM, four types of kernels were used: linear (u'v), polynomial ((γ u'v + coef0)^degree), radial basis function (RBF: exp(-γ |u - v|^2)) and sigmoid (tanh(γ u'v + coef0)), denoted as kernel types 0-3.
- Scaling: the objective was to see how scaling would affect the SVM classification.
- For naive Bayes classification, we used two classifiers: Multinomial Naive Bayes, a classic naive Bayes variant used in text classification, which usually takes data represented as word-count vectors, although tf-idf vectors are also known to work well in practice (and do in the present experiment); and Bernoulli Naive Bayes, which requires samples to be represented as binary-valued feature vectors, i.e. 1 for a word that is present in the document and 0 otherwise, irrespective of frequency. (Details in the appendix.)
- The performance of the three feature representations in all the above scenarios was analysed.
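As a minimal sketch of how the classifier grid was exercised, the following uses toy stand-in data, and scikit-learn's SVC/NuSVC, which wrap libsvm, stand in for our direct libsvm invocations:

    import numpy as np
    from sklearn.svm import SVC, NuSVC

    # Toy stand-ins for the real BOW/tf-idf/LDA matrices and labels
    rng = np.random.default_rng(0)
    X_train, y_train = rng.random((100, 20)), rng.integers(0, 2, 100)
    X_test, y_test = rng.random((50, 20)), rng.integers(0, 2, 50)

    # The four libsvm kernels compared in the runs (types 0-3)
    for kernel in ("linear", "poly", "rbf", "sigmoid"):
        # Defaults as in most of our runs: C = 1 for C-SVC, nu = 0.5 for nu-SVC
        for clf in (SVC(kernel=kernel, C=1.0), NuSVC(kernel=kernel, nu=0.5)):
            clf.fit(X_train, y_train)
            print(type(clf).__name__, kernel, clf.score(X_test, y_test))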
We ended up testing 155 combinations of these variables, the details of which are included in a long table at the end of this report. Before we move on to the analysis, it is to be noted that, for the multi-class classification task, the baseline accuracy is 20.088%, obtained when the most frequent label (score 1) is predicted. For the binary sentiment classification, since we have equal numbers of instances from both classes, the base case is at chance, i.e. 50%. As the table shows, even basic classifiers like naive Bayes work quite well in these cases, with around 80% accuracy with BOW and tf-idf and 60% with LDA (Fig. 11). SVM predictions vary from the base case of 50% to a maximum of 88.632%. For the 8-class classification, accuracy ranges from the base case of 20.088% to 41%, while the regression results show a mean squared error in the range of 6.8 to 28.8 and a squared correlation coefficient of 0 to 0.509 (both defined in the appendix). In general, for the binary classification task, SVM classification, except with the polynomial kernel, gives better predictions than naive Bayes, though it takes a fairly long time, as expected. SVM shows a high error rate for multi-class classification, primarily because of the unbalanced number of training examples for different classes. Reviews with scores 10 and 1 are the most frequent in the training set, and as such, this classification is heavily biased towards them, so much so that with libSVM's default parameters on unscaled raw BOW data, the test documents are classified entirely into either class 1 or class 10. In the following section, we present a detailed analysis of the results and the insights gained.
IV. ANALYSIS
A. Time Complexity
While naive Bayes classification ran in seconds (exact numbers were not recorded, but always within half a minute), SVM runs took minutes to hours, the minimum being 3 min and the maximum being 4 hours 35 min. This is expected because the present implementation of naive Bayes is just a relative-frequency estimator, whereas the SVM implementation has to map an 89,527-D feature space to a higher dimension through a kernel transformation and then run the optimization in that space. As far as the time complexity of features is concerned, tf-idf and raw BOW have comparable running times, each being higher than the other at different times; however, LDA takes much less time than the previous two, a 5- to 6-fold reduction being usual, which is expected because the LDA topic model in the present experiment has only 500 features (Fig. 1, 2). As implemented, libsvm is of polynomial order, while, due to the linear nature of the dataset, classification without kernel transformations can be done in linear time.
B. Feature-effects
Tf-idf and raw BOW results are comparable, while, in comparison, LDA results show around a 10-15% drop (Fig. 3, 4). This is expected because the former two boast a 90,000-D feature space, while the latter lives in 500-D. In general, tf-idf is expected to fare better than raw BOW; however, though such a trend is present, it is not too pronounced, with only a <1% boost in most cases. There might be two reasons for this. First, our tf-idf features are not cosine normalized, and it is usually accepted that tf-idf works better under normalization; however, the ill-effect of not normalizing is largely seen in longer documents, which, with more terms and higher term frequencies, tend to get larger dot products than shorter documents, thereby creating a strong bias. In our case, as the reviews are of almost the same length, this effect is not pronounced. Second, another anomaly that can be noticed is that tf-idf predictions are sometimes much lower than raw BOW's, e.g. in the multinomial naive Bayes case (they are the same in the Bernoulli case because Bernoulli uses binary features, which are identical for both feature sets, which acts as a sanity check), where they differ by 4-7%, and with scaled C-SVC with kernel types 2 and 3, where they are off by 20%. This might be because tf-idf is oblivious to the class labels in the training set, which can lead to inappropriate scaling of some features, especially in sentiment analysis tasks: consider two words representative of positive sentiment that do not occur in the negative documents at all; while the more frequent word of the two is a better estimator of the sentiment, due to its high document frequency it would be assigned a lower weight by tf-idf.
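To make the normalization point concrete, here is a small sketch of the switch in question; we actually used Gensim's TfidfModel, but scikit-learn's TfidfTransformer, shown here on a toy count matrix, exposes the same choice directly:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfTransformer

    X_counts = np.array([[3, 0, 1], [0, 2, 2]])  # toy term-count matrix

    # norm=None mirrors our unnormalized run; norm="l2" cosine-normalizes each
    # document, which matters mainly when document lengths vary widely
    unnormalized = TfidfTransformer(norm=None).fit_transform(X_counts)
    normalized = TfidfTransformer(norm="l2").fit_transform(X_counts)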
C. Scaling effect on SVM
[7] explains the importance of scaling in classification. All three feature sets were run in scaled and unscaled modes; regular features were scaled to have values in [0,1], as opposed to the more common practice of [-1,1], due to the sparse nature of the data: [0,1] scaling keeps missing feature values at 0, which would otherwise become -1 in many cases, leading to extra computational load without any gain in classification performance (at least for an RBF kernel with optimal parameters). While scaling is expected to improve classification results, in our experiments we did not see a pronounced difference between runs on scaled and unscaled data, with scaled data being only a marginally better predictor in regression and almost equally good in classification (Figs. 5, 6), just like the case of tf-idf vs. BOW. This might be because scaling is relevant when the orders of magnitude of different features are unknown; in that case, scaling prevents features at the higher end of the measurement spectrum from being unfairly given larger weights. In our case, however, all the feature dimensions are word frequencies, and they are comparable, not some artefact of the measuring process, so there is minimal bias in that sense for scaling to eliminate. However, scaling also helps the computation process, which explains the little bit of boost.
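A minimal sketch of such sparsity-preserving [0,1] scaling: for a non-negative sparse matrix, scikit-learn's MaxAbsScaler divides each column by its maximum absolute value, so zeros stay zero (the random matrix below is just a stand-in for the BOW counts):

    from scipy.sparse import random as sparse_random
    from sklearn.preprocessing import MaxAbsScaler

    X = sparse_random(25000, 89527, density=0.001, format="csr")  # stand-in for BOW
    X_scaled = MaxAbsScaler().fit_transform(X)  # per-feature [0,1]; sparsity intact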
D. Effect of types of SVM
We have used four types of SVMs: C- and ν-SVC for classification, and ε- and ν-SVR for regression. The C- and ν-versions are similar, with C ∈ [0, ∞) but ν ∈ [0, 1]; ν is often preferred because it is related to the ratio of support vectors and the ratio of training errors, and is intuitively and aesthetically pleasing, though both SVMs are otherwise equivalent and expected to give similar results, which is evident from the graph. However, ν-SVC also shows slightly better results than its counterpart at times, especially towards the end of the 2- and 10-class classification parts of the graph (Fig. 7). It might be that the default parameter (ν = 0.5) conforms to our dataset better than the default value of C (= 1) does. Optimal parameter selection is expected to produce comparable results. In fact, as we describe in the next item, the optimal C was found to be 512, much larger than the default parameter, which might explain the comparatively poor results. The SVRs have a similar relationship, and their results are indeed comparable (Fig. 8).
E. Parameter setting
While the default parameters in libSVM are good for general testing, each dataset requires these parameters to be tuned properly for a better classification result. While our initial aim was to explore this scenario, given the stupendous size of the dataset, time constraints prevented that from happening. For example, the standard procedure does a grid search to discover the best C through 5-fold cross validation on the training data. While that is manageable for datasets with a thousand instances, the running time runs into days for a 25,000-instance dataset; one instance ran for two days without producing a result. We improvised and randomly selected a subset of 5,000 examples and ran the grid search on it, which took 6 hrs to produce C=512. This value was subsequently used in a binary sentiment classification with C-SVC to get 88.336% accuracy on scaled raw BOW with the RBF kernel, an improvement of 17% over the 71% obtained without parameter tuning. Notice that a high C of 512 also increases the training time, so that over the 150-odd runs of the experiment, a large C seriously strains the time constraints of a small class project. When faced with the choice between a larger C with a smaller number of runs and the default C with 150 runs, we chose the lesser of the two evils, so that we could get more insight into other aspects of the classification task, like feature selection and kernel manipulation, and still get an idea of the effect of parameter tuning from this one instance.
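A sketch of the subsampled grid search, using toy stand-in data and the usual libsvm-style power-of-two grid for C (the exact grid we used is not recorded here):

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train, y_train = rng.random((200, 20)), rng.integers(0, 2, 200)  # toy stand-ins
    idx = rng.choice(len(y_train), size=100, replace=False)            # random subsample

    param_grid = {"C": 2.0 ** np.arange(-5, 16, 2)}  # 2^-5 ... 2^15
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train[idx], y_train[idx])
    print(search.best_params_)  # on the real 5k subsample this came out as C = 512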
F. Effect of Kernels
We notice that, predominantly, the linear kernel gives the best results, whereas the polynomial gives the worst. The RBF and sigmoid kernels give comparable results, with the former being slightly better (Fig. 9, 10). This behavior is expected: with polynomial kernels, numerical difficulties tend to occur, since the d-th powers of numbers go to 0 or infinity depending on whether they are smaller or larger than 1. The kernel matrix for the sigmoid is not positive definite in general; therefore, its accuracy is generally less than that of the RBF [8]. The relations between class labels and attributes in text classification problems are primarily linear, hence the results of the linear kernel. Also, for a large number of features, the linear kernel is supremely useful, since a non-linear mapping does not improve the performance while taking up a sizable amount of time (the linear kernel has the fastest running time for the classification tasks in this experiment). The RBF kernel, however, would be able to produce comparable results with parameter tuning [9].
V. CONCLUSION
The series of experiments helped us gain fruitful insight into the nuances of selecting features and deciding on classifiers in a text-classification task. The results of the experiments conformed to theoretical predictions most of the time; when they seemingly did not, they compelled us to look into the explanations, which were satisfactory and helped us understand the task better. The sheer volume of the work, a massive dataset and 150-odd runs, helped us gain research experience and strengthened our concepts. Though we used libSVM to work with SVMs, given the linear nature of the problem, a similar classification task can be completed with libLinear [10], which does not use kernel transformations. In fact, libsvm is O(n^2) or O(n^3) whereas liblinear is O(n), though the latter does not support kernel SVMs, so investigating that would be a natural extension. The next stage would be to follow a similarity-type grouping of documents for sentiment classification. The dataset contains another 50,000 reviews, which are unlabeled, and an unsupervised learning attempt on them might take this work further.
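For reference, the proposed libLinear follow-up would look roughly like this; scikit-learn's LinearSVC wraps liblinear, and the toy data below is a stand-in for the real BOW matrix:

    import numpy as np
    from sklearn.svm import LinearSVC  # wraps liblinear: linear SVM, no kernel trick

    rng = np.random.default_rng(0)
    X, y = rng.random((200, 20)), rng.integers(0, 2, 200)  # toy stand-ins for BOW data

    clf = LinearSVC(C=1.0).fit(X, y)
    print(clf.score(X, y))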
REFERENCES
[1] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, pages 271-278, 2004.
[2] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11, pages 142-150, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
[3] Dataset. http://ai.stanford.edu/~amaas/data/sentiment/index.html.
[4] LibSVM. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[5] Scikits.learn. http://scikit-learn.org/stable/.
[6] Gensim. http://radimrehurek.com/gensim/.
[7] W. S. Sarle. Neural Network FAQ. ftp://ftp.sas.com/pub/neural/FAQ.html, 1997.
[8] Hsuan-Tien Lin and Chih-Jen Lin. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical report, 2003.
[9] S. Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines with Gaussian kernel.
[10] LibLinear. http://www.csie.ntu.edu.tw/~cjlin/liblinear/.
APPENDIX
1) Graphs
2) Table of results
3) Algorithms
ALGORITHMS
Bag-of-Words
A dictionary was created from all the words in the training set. The size of the dictionary was the dimension of the training space, and each word represented one dimension. For each document, the value of the document along a word-dimension was the frequency of that word in the document. Most documents therefore had 0 in most of the dimensions, so they were represented as sparse vectors in libSVM format for computational efficiency.
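For illustration, a minimal sketch of serializing one document into the libSVM sparse format (the helper name is hypothetical; libSVM expects 1-based feature indices):

    def to_libsvm_line(label, bow):
        # bow: (word_id, count) pairs from the dictionary; ids are 0-based
        feats = " ".join(f"{word_id + 1}:{count}" for word_id, count in sorted(bow))
        return f"{label} {feats}"

    print(to_libsvm_line(1, [(0, 2), (42, 1)]))  # -> "1 1:2 43:1"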
Tf-idf
In this experiment, tf-idf assigned a weight to each dimension of each document as follows: the weight of term i in document j, in a corpus of D documents, is

    weight_{i,j} = frequency_{i,j} \cdot \log_2 (D / documentFreq_i)
LDA
The module used was Gensim's (http://radimrehurek.com/gensim/models/ldamodel.html#id2); we refrain from giving a detailed explanation, because it is lengthy and LDA is well understood. We extracted 500 topics, and each document was projected onto this 500-D space.
C-SVC
Given training vectors x_i \in R^n, i = 1, \ldots, l, and an indicator vector y \in R^l, C-SVC solves the optimization problem:

    \min_{w,b,\xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i

    subject to: y_i (w^T \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l
ν-SVC

    \min_{w,b,\xi,\rho} \; \frac{1}{2} w^T w - \nu\rho + \frac{1}{l} \sum_{i=1}^{l} \xi_i

    subject to: y_i (w^T \phi(x_i) + b) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l, \quad \rho \ge 0
ε- and ν-SVR
Please refer to [?].
Naive Bayes Classification
Both variants are defined in detail at http://scikit-learn.org/stable/modules/naive_bayes.html.
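A minimal sketch of the two variants as used here, with scikit-learn's current module path standing in for the older scikits.learn one and a toy count matrix standing in for the real features:

    import numpy as np
    from sklearn.naive_bayes import BernoulliNB, MultinomialNB

    rng = np.random.default_rng(0)
    X, y = rng.integers(0, 5, (100, 20)), rng.integers(0, 2, 100)  # toy count matrix

    mnb = MultinomialNB().fit(X, y)            # uses raw counts (or tf-idf weights)
    bnb = BernoulliNB(binarize=0.0).fit(X, y)  # thresholds features to 0/1 presence
    print(mnb.score(X, y), bnb.score(X, y))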
Accuracy

    Accuracy = (number of correctly predicted test documents / total number of test documents) \times 100
Regression

    Mean squared error = \frac{1}{l} \sum_{i=1}^{l} (f(x_i) - y_i)^2

Squared correlation coefficient:

    r^2 = \frac{\left( l \sum_{i=1}^{l} f(x_i) y_i - \sum_{i=1}^{l} f(x_i) \sum_{i=1}^{l} y_i \right)^2}{\left( l \sum_{i=1}^{l} f(x_i)^2 - \left( \sum_{i=1}^{l} f(x_i) \right)^2 \right) \left( l \sum_{i=1}^{l} y_i^2 - \left( \sum_{i=1}^{l} y_i \right)^2 \right)}
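Equivalently, a short sketch computing both metrics with NumPy; np.corrcoef gives the Pearson correlation, whose square matches the formula above:

    import numpy as np

    def regression_metrics(pred, y):
        mse = float(np.mean((pred - y) ** 2))
        r2 = float(np.corrcoef(pred, y)[0, 1] ** 2)  # squared Pearson correlation
        return mse, r2

    print(regression_metrics(np.array([7.2, 3.1, 9.0]), np.array([7, 4, 10])))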
FIGURES
Fig. 1. Time taken by classifiers, arranged according to the three feature sets.
Fig. 2. Time taken in regression, arranged according to the three feature sets.
Fig. 3. Accuracy of classifiers, arranged according to the three feature sets.
Fig. 4. Correlation coefficient for regression, arranged according to the three feature sets.
Fig. 5. Accuracy of classifiers for scaled and unscaled versions.
Fig. 6. Correlation coefficient of regression for scaled and unscaled versions.
Fig. 7. Accuracy of C-SVC and ν-SVC classifiers.
Fig. 8. Results of ε-SVR and ν-SVR regression.
Fig. 9. Accuracy of classifiers for different kernels.
Fig. 10. Correlation coefficient of regression for different kernels.
Fig. 11. Naive Bayes algorithm run results.
CS 221 Sentiment Analysis Results
Columns: Classification | Data | Scaled Data? | Algorithm | Kernel Type | Prediction Accuracy | Mean Squared Error | Squared Correlation Coefficient | Duration (h:mm:ss)
1 - Two Class 1 - Raw 1 - Unscaled 1 - Multinomial Naive-Bayes NA 81.360% NA NA Unavailable
1 - Two Class 1 - Raw 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 83.010% NA NA Unavailable
1 - Two Class 1 - Raw 1 - Unscaled 3 - C-SVC 0 84.500% NA NA 0:11:22
1 - Two Class 1 - Raw 1 - Unscaled 3 - C-SVC 1 50.004% NA NA 0:29:47
1 - Two Class 1 - Raw 1 - Unscaled 3 - C-SVC 2 73.280% NA NA 0:30:29
1 - Two Class 1 - Raw 1 - Unscaled 3 - C-SVC 3 67.900% NA NA 0:30:12
1 - Two Class 1 - Raw 1 - Unscaled 4 - nu-SVC 0 87.152% NA NA 0:17:48
1 - Two Class 1 - Raw 1 - Unscaled 4 - nu-SVC 1 52.852% NA NA 0:16:04
1 - Two Class 1 - Raw 1 - Unscaled 4 - nu-SVC 2 87.084% NA NA 0:19:44
1 - Two Class 1 - Raw 1 - Unscaled 4 - nu-SVC 3 85.788% NA NA 0:21:04
1 - Two Class 1 - Raw 2 - Scaled 3 - C-SVC 0 85.480% NA NA 0:13:51
1 - Two Class 1 - Raw 2 - Scaled 3 - C-SVC 1 50.000% NA NA 0:42:23
1 - Two Class 1 - Raw 2 - Scaled 3 - C-SVC 2 70.896% NA NA 0:44:02
1 - Two Class 1 - Raw 2 - Scaled 3 - C-SVC 3 70.568% NA NA 0:32:01
1 - Two Class 1 - Raw 2 - Scaled 4 - nu-SVC 0 88.352% NA NA 0:20:17
1 - Two Class 1 - Raw 2 - Scaled 4 - nu-SVC 1 50.000% NA NA 0:24:39
1 - Two Class 1 - Raw 2 - Scaled 4 - nu-SVC 2 88.256% NA NA 0:26:20
1 - Two Class 1 - Raw 2 - Scaled 4 - nu-SVC 3 84.412% NA NA 0:21:02
1 - Two Class 2 - TFIDF 1 - Unscaled 1 - Multinomial Naive-Bayes NA 77.100% NA NA Unavailable
1 - Two Class 2 - TFIDF 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 83.010% NA NA Unavailable
1 - Two Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 0 84.856% NA NA 0:12:09
1 - Two Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 1 50.008% NA NA 0:32:10
1 - Two Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 2 87.668% NA NA 0:25:42
1 - Two Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 3 86.688% NA NA 0:26:24
1 - Two Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 0 88.632% NA NA 0:21:02
1 - Two Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 1 56.508% NA NA 0:29:34
1 - Two Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 2 88.524% NA NA 0:22:47
1 - Two Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 3 88.588% NA NA 0:23:31
1 - Two Class 2 - TFIDF 2 - Scaled 3 - C-SVC 0 85.600% NA NA 0:14:03
1 - Two Class 2 - TFIDF 2 - Scaled 3 - C-SVC 1 50.000% NA NA 0:34:24
1 - Two Class 2 - TFIDF 2 - Scaled 3 - C-SVC 2 50.000% NA NA 0:36:13
1 - Two Class 2 - TFIDF 2 - Scaled 3 - C-SVC 3 50.000% NA NA 0:33:11
1 - Two Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 0 88.400% NA NA 0:21:56
1 - Two Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 1 50.008% NA NA 0:16:28
1 - Two Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 2 88.312% NA NA 0:21:14
1 - Two Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 3 85.636% NA NA 0:34:06
1 - Two Class 3 - LDA 1 - Unscaled 1 - Multinomial Naive-Bayes NA 66.300% NA NA Unavailable
1 - Two Class 3 - LDA 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 68.320% NA NA Unavailable
1 - Two Class 3 - LDA 1 - Unscaled 3 - C-SVC 0 66.133% NA NA 0:05:47
1 - Two Class 3 - LDA 1 - Unscaled 3 - C-SVC 1 50.472% NA NA 0:06:53
1 - Two Class 3 - LDA 1 - Unscaled 3 - C-SVC 2 51.248% NA NA 0:08:02
1 - Two Class 3 - LDA 1 - Unscaled 3 - C-SVC 3 51.240% NA NA 0:08:03
1 - Two Class 3 - LDA 1 - Unscaled 4 - nu-SVC 0 62.837% NA NA 0:03:54
1 - Two Class 3 - LDA 1 - Unscaled 4 - nu-SVC 1 50.224% NA NA 0:03:22
1 - Two Class 3 - LDA 1 - Unscaled 4 - nu-SVC 2 54.152% NA NA 0:04:24
1 - Two Class 3 - LDA 1 - Unscaled 4 - nu-SVC 3 60.881% NA NA 0:04:08
1 - Two Class 3 - LDA 2 - Scaled 3 - C-SVC 0 67.860% NA NA 0:05:56
1 - Two Class 3 - LDA 2 - Scaled 3 - C-SVC 1 50.124% NA NA 0:07:33
1 - Two Class 3 - LDA 2 - Scaled 3 - C-SVC 2 53.768% NA NA 0:08:41
1 - Two Class 3 - LDA 2 - Scaled 3 - C-SVC 3 53.744% NA NA 0:08:13
1 - Two Class 3 - LDA 2 - Scaled 4 - nu-SVC 0 62.744% NA NA 0:04:13
1 - Two Class 3 - LDA 2 - Scaled 4 - nu-SVC 1 50.040% NA NA 0:03:45
1 - Two Class 3 - LDA 2 - Scaled 4 - nu-SVC 2 58.064% NA NA 0:04:53
1 - Two Class 3 - LDA 2 - Scaled 4 - nu-SVC 3 62.776% NA NA 0:05:14
2 - Ten Class 1 - Raw 1 - Unscaled 1 - Multinomial Naive-Bayes NA 38.460% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 38.760% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 3 - C-SVC 0 35.496% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 3 - C-SVC 1 20.088% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 3 - C-SVC 2 25.996% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 3 - C-SVC 3 21.960% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 4 - nu-SVC 0 39.816% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 4 - nu-SVC 1 20.804% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 4 - nu-SVC 2 39.688% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 4 - nu-SVC 3 38.096% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 5 - nu-SVR 0 NA 15.625 0.274 4:35:33
2 - Ten Class 1 - Raw 1 - Unscaled 5 - nu-SVR 1 NA 12.185 0.010 0:16:29
2 - Ten Class 1 - Raw 1 - Unscaled 5 - nu-SVR 3 NA 11.303 0.191 0:17:08
2 - Ten Class 1 - Raw 1 - Unscaled 6 - Epsilon-SVR 0 NA 15.185 0.280 Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 6 - Epsilon-SVR 1 NA 12.687 0.002 Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 6 - Epsilon-SVR 2 NA 10.312 0.184 Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 6 - Epsilon-SVR 3 NA 11.105 0.129 Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 3 - C-SVC 0 36.732% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 3 - C-SVC 1 20.088% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 3 - C-SVC 2 20.088% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 3 - C-SVC 3 20.088% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 4 - nu-SVC 0 40.484% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 4 - nu-SVC 1 9.376% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 4 - nu-SVC 2 36.500% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 4 - nu-SVC 3 31.220% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 5 - nu-SVR 0 NA 10.171 0.370 0:51:14
2 - Ten Class 1 - Raw 2 - Scaled 5 - nu-SVR 1 NA 12.186 0.000 0:16:27
2 - Ten Class 1 - Raw 2 - Scaled 5 - nu-SVR 2 NA 12.167 0.412 0:56:42
2 - Ten Class 1 - Raw 2 - Scaled 5 - nu-SVR 3 NA 12.177 0.414 0:16:42
2 - Ten Class 1 - Raw 2 - Scaled 6 - Epsilon-SVR 0 NA 9.949 0.376 0:39:06
2 - Ten Class 1 - Raw 2 - Scaled 6 - Epsilon-SVR 1 NA 12.186 -0.000 0:35:10
2 - Ten Class 1 - Raw 2 - Scaled 6 - Epsilon-SVR 2 NA 12.148 0.359 0:32:49
2 - Ten Class 1 - Raw 2 - Scaled 6 - Epsilon-SVR 3 NA 12.167 0.359 0:33:21
2 - Ten Class 2 - TFIDF 1 - Unscaled 1 - Multinomial Naive-Bayes NA 31.520% NA NA Unavailable
2 - Ten Class 2 - TFIDF 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 38.760% NA NA Unavailable
2 - Ten Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 0 36.020% NA NA 0:30:07
2 - Ten Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 1 20.088% NA NA 0:42:26
2 - Ten Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 2 37.720% NA NA 0:40:24
2 - Ten Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 3 37.096% NA NA 0:40:32
2 - Ten Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 0 40.808% NA NA 0:38:14
2 - Ten Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 1 20.780% NA NA 0:40:27
2 - Ten Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 2 40.928% NA NA 0:41:12
2 - Ten Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 3 40.816% NA NA 0:40:09
2 - Ten Class 2 - TFIDF 1 - Unscaled 5 - nu-SVR 0 NA 28.811 0.180 1:30:37
2 - Ten Class 2 - TFIDF 1 - Unscaled 5 - nu-SVR 1 NA 12.186 0.003 0:20:00
2 - Ten Class 2 - TFIDF 1 - Unscaled 5 - nu-SVR 2 NA 7.979 0.509 0:19:56
2 - Ten Class 2 - TFIDF 1 - Unscaled 5 - nu-SVR 3 NA 9.092 0.453 0:25:29
2 - Ten Class 2 - TFIDF 1 - Unscaled 6 - Epsilon-SVR 0 NA 26.983 0.190 0:49:05
2 - Ten Class 2 - TFIDF 1 - Unscaled 6 - Epsilon-SVR 1 NA 12.285 0.001 0:32:20
2 - Ten Class 2 - TFIDF 1 - Unscaled 6 - Epsilon-SVR 2 NA 6.769 0.509 0:34:05
2 - Ten Class 2 - TFIDF 1 - Unscaled 6 - Epsilon-SVR 3 NA 7.902 0.446 0:35:22
2 - Ten Class 2 - TFIDF 2 - Scaled 3 - C-SVC 0 36.768% NA NA 0:47:42
2 - Ten Class 2 - TFIDF 2 - Scaled 3 - C-SVC 1 20.088% NA NA 0:53:07
2 - Ten Class 2 - TFIDF 2 - Scaled 3 - C-SVC 2 20.088% NA NA 0:57:09
2 - Ten Class 2 - TFIDF 2 - Scaled 3 - C-SVC 3 20.088% NA NA 0:59:43
2 - Ten Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 0 40.492% NA NA 0:54:54
2 - Ten Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 1 9.668% NA NA 0:27:47
2 - Ten Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 2 35.812% NA NA 0:41:45
2 - Ten Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 3 31.724% NA NA 0:36:20
2 - Ten Class 2 - TFIDF 2 - Scaled 5 - nu-SVR 0 NA 9.953 0.377 1:02:48
2 - Ten Class 2 - TFIDF 2 - Scaled 5 - nu-SVR 1 NA 12.186 0.000 0:22:07
2 - Ten Class 2 - TFIDF 2 - Scaled 5 - nu-SVR 2 NA 12.167 0.413 0:21:51
2 - Ten Class 2 - TFIDF 2 - Scaled 5 - nu-SVR 3 NA 12.176 0.414 0:21:11
2 - Ten Class 2 - TFIDF 2 - Scaled 6 - Epsilon-SVR 0 NA 9.747 0.383 0:40:27
2 - Ten Class 2 - TFIDF 2 - Scaled 6 - Epsilon-SVR 1 NA 14.110 0.000 0:37:38
2 - Ten Class 2 - TFIDF 2 - Scaled 6 - Epsilon-SVR 2 NA 13.960 0.359 0:46:16
2 - Ten Class 2 - TFIDF 2 - Scaled 6 - Epsilon-SVR 3 NA 14.034 0.359 0:43:54
2 - Ten Class 3 - LDA 1 - Unscaled 1 - Multinomial Naive-Bayes NA 26.320% NA NA Unavailable
2 - Ten Class 3 - LDA 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 29.250% NA NA Unavailable
2 - Ten Class 3 - LDA 1 - Unscaled 3 - C-SVC 0 28.150% NA NA 0:08:55
2 - Ten Class 3 - LDA 1 - Unscaled 3 - C-SVC 1 20.082% NA NA 0:10:08
2 - Ten Class 3 - LDA 1 - Unscaled 3 - C-SVC 2 20.082% NA NA 0:11:04
2 - Ten Class 3 - LDA 1 - Unscaled 3 - C-SVC 3 20.082% NA NA 0:10:38
2 - Ten Class 3 - LDA 1 - Unscaled 4 - nu-SVC 0 24.442% NA NA 0:07:39
2 - Ten Class 3 - LDA 1 - Unscaled 4 - nu-SVC 1 10.953% NA NA 0:05:33
2 - Ten Class 3 - LDA 1 - Unscaled 4 - nu-SVC 2 24.686% NA NA 0:08:47
2 - Ten Class 3 - LDA 1 - Unscaled 4 - nu-SVC 3 23.622% NA NA 0:08:57
2 - Ten Class 3 - LDA 1 - Unscaled 5 - nu-SVR 0 NA 10.617 0.157 0:04:13
2 - Ten Class 3 - LDA 1 - Unscaled 5 - nu-SVR 1 NA 12.185 0.000 0:04:18
2 - Ten Class 3 - LDA 1 - Unscaled 5 - nu-SVR 2 NA 12.124 0.053 0:05:01
2 - Ten Class 3 - LDA 1 - Unscaled 5 - nu-SVR 3 NA 12.155 0.053 0:11:27
2 - Ten Class 3 - LDA 1 - Unscaled 6 - Epsilon-SVR 0 NA 10.835 0.128 0:07:02
2 - Ten Class 3 - LDA 1 - Unscaled 6 - Epsilon-SVR 1 NA 12.185 0.000 0:07:30
2 - Ten Class 3 - LDA 1 - Unscaled 6 - Epsilon-SVR 2 NA 12.081 0.041 0:09:13
2 - Ten Class 3 - LDA 1 - Unscaled 6 - Epsilon-SVR 3 NA 12.119 0.041 0:09:44
2 - Ten Class 3 - LDA 2 - Scaled 3 - C-SVC 0 29.236% NA NA 0:11:24
2 - Ten Class 3 - LDA 2 - Scaled 3 - C-SVC 1 20.088% NA NA 0:09:28
2 - Ten Class 3 - LDA 2 - Scaled 3 - C-SVC 2 20.088% NA NA 0:12:09
2 - Ten Class 3 - LDA 2 - Scaled 3 - C-SVC 3 20.088% NA NA 0:11:01
2 - Ten Class 3 - LDA 2 - Scaled 4 - nu-SVC 0 25.004% NA NA 0:10:56
2 - Ten Class 3 - LDA 2 - Scaled 4 - nu-SVC 1 16.352% NA NA 0:06:28
2 - Ten Class 3 - LDA 2 - Scaled 4 - nu-SVC 2 22.168% NA NA 0:10:06
2 - Ten Class 3 - LDA 2 - Scaled 4 - nu-SVC 3 23.992% NA NA 0:08:03
2 - Ten Class 3 - LDA 2 - Scaled 5 - nu-SVR 0 NA 10.169 0.182 0:04:49
2 - Ten Class 3 - LDA 2 - Scaled 5 - nu-SVR 1 NA 12.186 0.000 0:04:41
2 - Ten Class 3 - LDA 2 - Scaled 5 - nu-SVR 2 NA 12.058 0.078 0:05:10
2 - Ten Class 3 - LDA 2 - Scaled 5 - nu-SVR 3 NA 12.121 0.077 0:08:30
2 - Ten Class 3 - LDA 2 - Scaled 6 - Epsilon-SVR 0 NA 10.313 0.171 0:07:59
2 - Ten Class 3 - LDA 2 - Scaled 6 - Epsilon-SVR 1 NA 12.186 0.000 0:08:02
2 - Ten Class 3 - LDA 2 - Scaled 6 - Epsilon-SVR 2 NA 11.993 0.059 0:08:53
2 - Ten Class 3 - LDA 2 - Scaled 6 - Epsilon-SVR 3 NA 12.060 0.059 0:09:08
