v), polynomial ($(\gamma u^T v + \mathrm{coef0})^{\mathrm{degree}}$),
radial basis function (RBF): $\exp(-\gamma |u - v|^2)$, and
sigmoid (tanh($\gamma$ u
amaas/data/sentiment/index.html.
[4] LibSVM. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[5] Scikits.learn. http://scikit-learn.org/stable/.
[6] Gensim. http://radimrehurek.com/gensim/.
[7] WS Sarle. Neural Network FAQ. ftp://ftp.sas.com/pub/neural/FAQ.html,
1997.
[8] Hsuan-Tien Lin and Chih-Jen Lin. A study on sigmoid kernels for SVM
and the training of non-PSD kernels by SMO-type methods. Technical
report, 2003.
[9] S. Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support
vector machines with Gaussian kernel.
[10] LibLinear. http://www.csie.ntu.edu.tw/~cjlin/liblinear/.
APPENDIX
1) Graphs
2) Table of results
3) Algorithms
ALGORITHMS
Bag-of-Words
A dictionary was created from all the words in the training set.
The size of the dictionary was the dimension of the training
space, with each word representing one dimension. For each
document, the value along a word dimension was the frequency
of that word in the document. Because most documents had 0
in most dimensions, they were stored as a sparse matrix in
LibSVM format for computational efficiency.
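The encoding described above can be sketched as follows. This is a minimal illustration, not the code used in the experiment: whitespace tokenization, the helper names `build_dictionary` and `to_libsvm_line`, and the toy documents are all assumptions made for the example.

```python
from collections import Counter


def build_dictionary(train_docs):
    """Map every distinct training word to a 1-based feature index."""
    vocab = sorted({word for doc in train_docs for word in doc.split()})
    return {word: idx for idx, word in enumerate(vocab, start=1)}


def to_libsvm_line(label, doc, dictionary):
    """Encode one document as 'label index:count ...'.

    Words with a zero count are simply omitted, which is what makes
    the representation sparse. Words outside the dictionary are ignored.
    """
    counts = Counter(w for w in doc.split() if w in dictionary)
    features = sorted((dictionary[w], c) for w, c in counts.items())
    return " ".join([str(label)] + [f"{i}:{c}" for i, c in features])


# Toy data (hypothetical): two tiny "reviews" with labels +1 and -1.
train_docs = ["good good movie", "bad movie"]
dictionary = build_dictionary(train_docs)
lines = [
    to_libsvm_line(1, train_docs[0], dictionary),
    to_libsvm_line(-1, train_docs[1], dictionary),
]
```

Here the dictionary has three entries, so the training space is three-dimensional, and each output line lists only the non-zero dimensions of its document.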
Tf-idf
In this experiment, tf-idf assigned a weight to each
dimension for each document as follows: the weight of
term i in document j, in a corpus of D documents, is
$$\mathrm{weight}_{ij} = \mathrm{frequency}_{i,j} \times \log_2\left(\frac{D}{\mathrm{documentFreq}_i}\right)$$
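A small sketch of this weighting, assuming whitespace tokenization and that documentFreq_i counts the documents containing term i at least once. The function name `tfidf_weights` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per document, where
    weight = frequency of the term in the document * log2(D / documentFreq)."""
    tokenized = [doc.split() for doc in docs]
    num_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        freq = Counter(tokens)
        weights.append(
            {t: f * math.log2(num_docs / doc_freq[t]) for t, f in freq.items()}
        )
    return weights


# Toy corpus (hypothetical) with D = 4 documents.
docs = ["good good movie", "bad movie", "good plot", "slow plot"]
weights = tfidf_weights(docs)
```

A term that occurs in every document gets weight 0 under this formula, since log2(D/D) = 0.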
LDA
We used the Gensim LDA module
(http://radimrehurek.com/gensim/models/ldamodel.html#id2);
we refrain from giving a detailed explanation because it is
lengthy and LDA is well understood. We extracted 500 topics
and projected each document into this 500-dimensional space.
C-SVC
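Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \dots, l$, and an indicator
vector $y \in \mathbb{R}^l$, C-SVC relates to the optimization problem:
$$\min_{w,b,L} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} L_i$$
subject to:
$$y_i \left(w^T \phi(x_i) + b\right) \ge 1 - L_i, \quad L_i \ge 0, \quad i = 1, \dots, l$$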
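To make the C-SVC objective concrete, the sketch below evaluates it for a fixed (w, b) under a linear kernel, that is, assuming phi(x) = x. It does not solve the optimization; each slack L_i is set to the smallest value that satisfies both constraints. The function names and toy data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def c_svc_objective(w, b, C, X, y):
    """Evaluate 1/2 w.w + C * sum(L_i) for a linear kernel (phi(x) = x).

    For fixed w and b, the smallest feasible slack is
    L_i = max(0, 1 - y_i * (w.x_i + b)).
    Returns the objective value and the list of slacks.
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks


# Toy data (hypothetical): the second point lies inside the margin.
X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
y = [1, 1, -1]
value, slacks = c_svc_objective(w=[1.0, 0.0], b=0.0, C=1.0, X=X, y=y)
```

Only the point at distance 0.5 from the hyperplane needs a non-zero slack, and a larger C would penalize that slack more heavily.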
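ν-SVC
$$\min_{w,b,L,\rho} \; \frac{1}{2} w^T w - \nu\rho + \frac{1}{l} \sum_{i=1}^{l} L_i$$
subject to:
$$y_i \left(w^T \phi(x_i) + b\right) \ge \rho - L_i, \quad L_i \ge 0, \quad i = 1, \dots, l, \quad \rho \ge 0$$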
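As an illustration of how the margin variable ρ and the parameter ν enter this objective, the following sketch evaluates it for fixed (w, b, ρ) with a linear kernel, assuming phi(x) = x. It is not a solver, and the function names and toy data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nu_svc_objective(w, b, rho, nu, X, y):
    """Evaluate 1/2 w.w - nu*rho + (1/l) * sum(L_i) for a linear kernel.

    For fixed w, b and rho, the smallest feasible slack is
    L_i = max(0, rho - y_i * (w.x_i + b)).
    """
    if rho < 0:
        raise ValueError("rho must be non-negative")
    slacks = [max(0.0, rho - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) - nu * rho + sum(slacks) / len(slacks)


# Toy data (hypothetical): three points on a line, one inside the margin.
X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
y = [1, 1, -1]
value = nu_svc_objective(w=[1.0, 0.0], b=0.0, rho=1.0, nu=0.5, X=X, y=y)
```

Unlike C-SVC, the slack term is averaged over the l training points, and the required margin ρ is itself a variable that the term -νρ rewards for being large.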
ε-SVR and ν-SVR
Please refer to [?].
Naive-Bayes Classification
Both variants (multinomial and Bernoulli) are defined in detail at
http://scikit-learn.org/stable/modules/naive_bayes.html
Accuracy
$$\mathrm{Accuracy} = \frac{\text{no. of correctly predicted data}}{\text{total test data}} \times 100$$
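A direct transcription of this measure, as a minimal sketch. The function name `accuracy` and the example labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Percentage of test items whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Two of the four hypothetical predictions match the true labels.
score = accuracy([1, -1, 1, 1], [1, 1, 1, -1])
```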
Regression
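Mean squared error:
$$\mathrm{MSE} = \frac{1}{l} \sum_{i=1}^{l} \left(f(x_i) - y_i\right)^2$$
Squared correlation coefficient:
$$r^2 = \frac{\left(l \sum_{i=1}^{l} f(x_i) y_i - \sum_{i=1}^{l} f(x_i) \sum_{i=1}^{l} y_i\right)^2}{\left(l \sum_{i=1}^{l} f(x_i)^2 - \left(\sum_{i=1}^{l} f(x_i)\right)^2\right) \left(l \sum_{i=1}^{l} y_i^2 - \left(\sum_{i=1}^{l} y_i\right)^2\right)}$$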
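Both regression measures can be computed from a list of predictions f(x_i) and a list of targets y_i. The sketch below follows the two formulas term by term; the function names and the sample values are hypothetical.

```python
def mean_squared_error(pred, target):
    """(1/l) * sum over i of (f(x_i) - y_i)^2."""
    l = len(target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / l


def squared_correlation(pred, target):
    """Squared correlation coefficient r^2 between predictions and targets.

    Undefined (division by zero) if either list is constant.
    """
    l = len(target)
    s_p = sum(pred)
    s_t = sum(target)
    s_pt = sum(p * t for p, t in zip(pred, target))
    s_pp = sum(p * p for p in pred)
    s_tt = sum(t * t for t in target)
    numerator = (l * s_pt - s_p * s_t) ** 2
    denominator = (l * s_pp - s_p ** 2) * (l * s_tt - s_t ** 2)
    return numerator / denominator


# Predictions that are perfectly correlated with the targets but off in scale.
pred = [1.0, 2.0, 3.0]
target = [2.0, 4.0, 6.0]
mse = mean_squared_error(pred, target)
r2 = squared_correlation(pred, target)
```

The example shows why the two measures are reported together: predictions can track the targets perfectly in the correlation sense while still having a large squared error.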
FIGURES
Fig. 1. Time taken by classifiers arranged according to the three feature sets.
Fig. 2. Time taken in regression arranged according to the three feature sets.
Fig. 3. Accuracy of classifiers arranged according to the three feature sets.
Fig. 4. Correlation coefficient for regression arranged according to the three feature sets.
Fig. 5. Accuracy of classifiers for scaled and unscaled versions.
Fig. 6. Correlation coefficient of regression for scaled and unscaled versions.
Fig. 7. Accuracy of C-SVC and ν-SVC classifiers.
Fig. 8. Accuracy of -SVC and -SVR regression.
Fig. 9. Accuracy of classifiers for different kernels.
Fig. 10. Correlation coefficient of regression for different kernels.
Fig. 11. Naive-Bayes algorithm run results.
CS 221 Sentiment Analysis Results
Columns: Classification; Data; Scaled Data?; Algorithm; Kernel Type (LibSVM -t code: 0 = linear, 1 = polynomial, 2 = RBF, 3 = sigmoid); Prediction Accuracy; Mean Squared Error; Squared Correlation Coefficient; Duration (h:mm:ss)
1 - Two Class 1 - Raw 1 - Unscaled 1 - Multinomial Naive-Bayes NA 81.360% NA NA Unavailable
1 - Two Class 1 - Raw 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 83.010% NA NA Unavailable
1 - Two Class 1 - Raw 1 - Unscaled 3 - C-SVC 0 84.500% NA NA 0:11:22
1 - Two Class 1 - Raw 1 - Unscaled 3 - C-SVC 1 50.004% NA NA 0:29:47
1 - Two Class 1 - Raw 1 - Unscaled 3 - C-SVC 2 73.280% NA NA 0:30:29
1 - Two Class 1 - Raw 1 - Unscaled 3 - C-SVC 3 67.900% NA NA 0:30:12
1 - Two Class 1 - Raw 1 - Unscaled 4 - nu-SVC 0 87.152% NA NA 0:17:48
1 - Two Class 1 - Raw 1 - Unscaled 4 - nu-SVC 1 52.852% NA NA 0:16:04
1 - Two Class 1 - Raw 1 - Unscaled 4 - nu-SVC 2 87.084% NA NA 0:19:44
1 - Two Class 1 - Raw 1 - Unscaled 4 - nu-SVC 3 85.788% NA NA 0:21:04
1 - Two Class 1 - Raw 2 - Scaled 3 - C-SVC 0 85.480% NA NA 0:13:51
1 - Two Class 1 - Raw 2 - Scaled 3 - C-SVC 1 50.000% NA NA 0:42:23
1 - Two Class 1 - Raw 2 - Scaled 3 - C-SVC 2 70.896% NA NA 0:44:02
1 - Two Class 1 - Raw 2 - Scaled 3 - C-SVC 3 70.568% NA NA 0:32:01
1 - Two Class 1 - Raw 2 - Scaled 4 - nu-SVC 0 88.352% NA NA 0:20:17
1 - Two Class 1 - Raw 2 - Scaled 4 - nu-SVC 1 50.000% NA NA 0:24:39
1 - Two Class 1 - Raw 2 - Scaled 4 - nu-SVC 2 88.256% NA NA 0:26:20
1 - Two Class 1 - Raw 2 - Scaled 4 - nu-SVC 3 84.412% NA NA 0:21:02
1 - Two Class 2 - TFIDF 1 - Unscaled 1 - Multinomial Naive-Bayes NA 77.100% NA NA Unavailable
1 - Two Class 2 - TFIDF 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 83.010% NA NA Unavailable
1 - Two Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 0 84.856% NA NA 0:12:09
1 - Two Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 1 50.008% NA NA 0:32:10
1 - Two Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 2 87.668% NA NA 0:25:42
1 - Two Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 3 86.688% NA NA 0:26:24
1 - Two Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 0 88.632% NA NA 0:21:02
1 - Two Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 1 56.508% NA NA 0:29:34
1 - Two Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 2 88.524% NA NA 0:22:47
1 - Two Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 3 88.588% NA NA 0:23:31
1 - Two Class 2 - TFIDF 2 - Scaled 3 - C-SVC 0 85.600% NA NA 0:14:03
1 - Two Class 2 - TFIDF 2 - Scaled 3 - C-SVC 1 50.000% NA NA 0:34:24
1 - Two Class 2 - TFIDF 2 - Scaled 3 - C-SVC 2 50.000% NA NA 0:36:13
1 - Two Class 2 - TFIDF 2 - Scaled 3 - C-SVC 3 50.000% NA NA 0:33:11
1 - Two Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 0 88.400% NA NA 0:21:56
1 - Two Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 1 50.008% NA NA 0:16:28
1 - Two Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 2 88.312% NA NA 0:21:14
1 - Two Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 3 85.636% NA NA 0:34:06
1 - Two Class 3 - LDA 1 - Unscaled 1 - Multinomial Naive-Bayes NA 66.300% NA NA Unavailable
1 - Two Class 3 - LDA 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 68.320% NA NA Unavailable
1 - Two Class 3 - LDA 1 - Unscaled 3 - C-SVC 0 66.133% NA NA 0:05:47
1 - Two Class 3 - LDA 1 - Unscaled 3 - C-SVC 1 50.472% NA NA 0:06:53
1 - Two Class 3 - LDA 1 - Unscaled 3 - C-SVC 2 51.248% NA NA 0:08:02
1 - Two Class 3 - LDA 1 - Unscaled 3 - C-SVC 3 51.240% NA NA 0:08:03
1 - Two Class 3 - LDA 1 - Unscaled 4 - nu-SVC 0 62.837% NA NA 0:03:54
1 - Two Class 3 - LDA 1 - Unscaled 4 - nu-SVC 1 50.224% NA NA 0:03:22
1 - Two Class 3 - LDA 1 - Unscaled 4 - nu-SVC 2 54.152% NA NA 0:04:24
1 - Two Class 3 - LDA 1 - Unscaled 4 - nu-SVC 3 60.881% NA NA 0:04:08
1 - Two Class 3 - LDA 2 - Scaled 3 - C-SVC 0 67.860% NA NA 0:05:56
1 - Two Class 3 - LDA 2 - Scaled 3 - C-SVC 1 50.124% NA NA 0:07:33
1 - Two Class 3 - LDA 2 - Scaled 3 - C-SVC 2 53.768% NA NA 0:08:41
1 - Two Class 3 - LDA 2 - Scaled 3 - C-SVC 3 53.744% NA NA 0:08:13
1 - Two Class 3 - LDA 2 - Scaled 4 - nu-SVC 0 62.744% NA NA 0:04:13
1 - Two Class 3 - LDA 2 - Scaled 4 - nu-SVC 1 50.040% NA NA 0:03:45
1 - Two Class 3 - LDA 2 - Scaled 4 - nu-SVC 2 58.064% NA NA 0:04:53
1 - Two Class 3 - LDA 2 - Scaled 4 - nu-SVC 3 62.776% NA NA 0:05:14
2 - Ten Class 1 - Raw 1 - Unscaled 1 - Multinomial Naive-Bayes NA 38.460% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 38.760% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 3 - C-SVC 0 35.496% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 3 - C-SVC 1 20.088% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 3 - C-SVC 2 25.996% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 3 - C-SVC 3 21.960% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 4 - nu-SVC 0 39.816% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 4 - nu-SVC 1 20.804% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 4 - nu-SVC 2 39.688% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 4 - nu-SVC 3 38.096% NA NA Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 5 - nu-SVR 0 NA 15.625 0.274 4:35:33
2 - Ten Class 1 - Raw 1 - Unscaled 5 - nu-SVR 1 NA 12.185 0.010 0:16:29
2 - Ten Class 1 - Raw 1 - Unscaled 5 - nu-SVR 3 NA 11.303 0.191 0:17:08
2 - Ten Class 1 - Raw 1 - Unscaled 6 - Epsilon-SVR 0 NA 15.185 0.280 Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 6 - Epsilon-SVR 1 NA 12.687 0.002 Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 6 - Epsilon-SVR 2 NA 10.312 0.184 Unavailable
2 - Ten Class 1 - Raw 1 - Unscaled 6 - Epsilon-SVR 3 NA 11.105 0.129 Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 3 - C-SVC 0 36.732% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 3 - C-SVC 1 20.088% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 3 - C-SVC 2 20.088% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 3 - C-SVC 3 20.088% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 4 - nu-SVC 0 40.484% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 4 - nu-SVC 1 9.376% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 4 - nu-SVC 2 36.500% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 4 - nu-SVC 3 31.220% NA NA Unavailable
2 - Ten Class 1 - Raw 2 - Scaled 5 - nu-SVR 0 NA 10.171 0.370 0:51:14
2 - Ten Class 1 - Raw 2 - Scaled 5 - nu-SVR 1 NA 12.186 0.000 0:16:27
2 - Ten Class 1 - Raw 2 - Scaled 5 - nu-SVR 2 NA 12.167 0.412 0:56:42
2 - Ten Class 1 - Raw 2 - Scaled 5 - nu-SVR 3 NA 12.177 0.414 0:16:42
2 - Ten Class 1 - Raw 2 - Scaled 6 - Epsilon-SVR 0 NA 9.949 0.376 0:39:06
2 - Ten Class 1 - Raw 2 - Scaled 6 - Epsilon-SVR 1 NA 12.186 -0.000 0:35:10
2 - Ten Class 1 - Raw 2 - Scaled 6 - Epsilon-SVR 2 NA 12.148 0.359 0:32:49
2 - Ten Class 1 - Raw 2 - Scaled 6 - Epsilon-SVR 3 NA 12.167 0.359 0:33:21
2 - Ten Class 2 - TFIDF 1 - Unscaled 1 - Multinomial Naive-Bayes NA 31.520% NA NA Unavailable
2 - Ten Class 2 - TFIDF 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 38.760% NA NA Unavailable
2 - Ten Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 0 36.020% NA NA 0:30:07
2 - Ten Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 1 20.088% NA NA 0:42:26
2 - Ten Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 2 37.720% NA NA 0:40:24
2 - Ten Class 2 - TFIDF 1 - Unscaled 3 - C-SVC 3 37.096% NA NA 0:40:32
2 - Ten Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 0 40.808% NA NA 0:38:14
2 - Ten Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 1 20.780% NA NA 0:40:27
2 - Ten Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 2 40.928% NA NA 0:41:12
2 - Ten Class 2 - TFIDF 1 - Unscaled 4 - nu-SVC 3 40.816% NA NA 0:40:09
2 - Ten Class 2 - TFIDF 1 - Unscaled 5 - nu-SVR 0 NA 28.811 0.180 1:30:37
2 - Ten Class 2 - TFIDF 1 - Unscaled 5 - nu-SVR 1 NA 12.186 0.003 0:20:00
2 - Ten Class 2 - TFIDF 1 - Unscaled 5 - nu-SVR 2 NA 7.979 0.509 0:19:56
2 - Ten Class 2 - TFIDF 1 - Unscaled 5 - nu-SVR 3 NA 9.092 0.453 0:25:29
2 - Ten Class 2 - TFIDF 1 - Unscaled 6 - Epsilon-SVR 0 NA 26.983 0.190 0:49:05
2 - Ten Class 2 - TFIDF 1 - Unscaled 6 - Epsilon-SVR 1 NA 12.285 0.001 0:32:20
2 - Ten Class 2 - TFIDF 1 - Unscaled 6 - Epsilon-SVR 2 NA 6.769 0.509 0:34:05
2 - Ten Class 2 - TFIDF 1 - Unscaled 6 - Epsilon-SVR 3 NA 7.902 0.446 0:35:22
2 - Ten Class 2 - TFIDF 2 - Scaled 3 - C-SVC 0 36.768% NA NA 0:47:42
2 - Ten Class 2 - TFIDF 2 - Scaled 3 - C-SVC 1 20.088% NA NA 0:53:07
2 - Ten Class 2 - TFIDF 2 - Scaled 3 - C-SVC 2 20.088% NA NA 0:57:09
2 - Ten Class 2 - TFIDF 2 - Scaled 3 - C-SVC 3 20.088% NA NA 0:59:43
2 - Ten Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 0 40.492% NA NA 0:54:54
2 - Ten Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 1 9.668% NA NA 0:27:47
2 - Ten Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 2 35.812% NA NA 0:41:45
2 - Ten Class 2 - TFIDF 2 - Scaled 4 - nu-SVC 3 31.724% NA NA 0:36:20
2 - Ten Class 2 - TFIDF 2 - Scaled 5 - nu-SVR 0 NA 9.953 0.377 1:02:48
2 - Ten Class 2 - TFIDF 2 - Scaled 5 - nu-SVR 1 NA 12.186 0.000 0:22:07
2 - Ten Class 2 - TFIDF 2 - Scaled 5 - nu-SVR 2 NA 12.167 0.413 0:21:51
2 - Ten Class 2 - TFIDF 2 - Scaled 5 - nu-SVR 3 NA 12.176 0.414 0:21:11
2 - Ten Class 2 - TFIDF 2 - Scaled 6 - Epsilon-SVR 0 NA 9.747 0.383 0:40:27
2 - Ten Class 2 - TFIDF 2 - Scaled 6 - Epsilon-SVR 1 NA 14.110 0.000 0:37:38
2 - Ten Class 2 - TFIDF 2 - Scaled 6 - Epsilon-SVR 2 NA 13.960 0.359 0:46:16
2 - Ten Class 2 - TFIDF 2 - Scaled 6 - Epsilon-SVR 3 NA 14.034 0.359 0:43:54
2 - Ten Class 3 - LDA 1 - Unscaled 1 - Multinomial Naive-Bayes NA 26.320% NA NA Unavailable
2 - Ten Class 3 - LDA 1 - Unscaled 2 - Bernoulli Naive-Bayes NA 29.250% NA NA Unavailable
2 - Ten Class 3 - LDA 1 - Unscaled 3 - C-SVC 0 28.150% NA NA 0:08:55
2 - Ten Class 3 - LDA 1 - Unscaled 3 - C-SVC 1 20.082% NA NA 0:10:08
2 - Ten Class 3 - LDA 1 - Unscaled 3 - C-SVC 2 20.082% NA NA 0:11:04
2 - Ten Class 3 - LDA 1 - Unscaled 3 - C-SVC 3 20.082% NA NA 0:10:38
2 - Ten Class 3 - LDA 1 - Unscaled 4 - nu-SVC 0 24.442% NA NA 0:07:39
2 - Ten Class 3 - LDA 1 - Unscaled 4 - nu-SVC 1 10.953% NA NA 0:05:33
2 - Ten Class 3 - LDA 1 - Unscaled 4 - nu-SVC 2 24.686% NA NA 0:08:47
2 - Ten Class 3 - LDA 1 - Unscaled 4 - nu-SVC 3 23.622% NA NA 0:08:57
2 - Ten Class 3 - LDA 1 - Unscaled 5 - nu-SVR 0 NA 10.617 0.157 0:04:13
2 - Ten Class 3 - LDA 1 - Unscaled 5 - nu-SVR 1 NA 12.185 0.000 0:04:18
2 - Ten Class 3 - LDA 1 - Unscaled 5 - nu-SVR 2 NA 12.124 0.053 0:05:01
2 - Ten Class 3 - LDA 1 - Unscaled 5 - nu-SVR 3 NA 12.155 0.053 0:11:27
2 - Ten Class 3 - LDA 1 - Unscaled 6 - Epsilon-SVR 0 NA 10.835 0.128 0:07:02
2 - Ten Class 3 - LDA 1 - Unscaled 6 - Epsilon-SVR 1 NA 12.185 0.000 0:07:30
2 - Ten Class 3 - LDA 1 - Unscaled 6 - Epsilon-SVR 2 NA 12.081 0.041 0:09:13
2 - Ten Class 3 - LDA 1 - Unscaled 6 - Epsilon-SVR 3 NA 12.119 0.041 0:09:44
2 - Ten Class 3 - LDA 2 - Scaled 3 - C-SVC 0 29.236% NA NA 0:11:24
2 - Ten Class 3 - LDA 2 - Scaled 3 - C-SVC 1 20.088% NA NA 0:09:28
2 - Ten Class 3 - LDA 2 - Scaled 3 - C-SVC 2 20.088% NA NA 0:12:09
2 - Ten Class 3 - LDA 2 - Scaled 3 - C-SVC 3 20.088% NA NA 0:11:01
2 - Ten Class 3 - LDA 2 - Scaled 4 - nu-SVC 0 25.004% NA NA 0:10:56
2 - Ten Class 3 - LDA 2 - Scaled 4 - nu-SVC 1 16.352% NA NA 0:06:28
2 - Ten Class 3 - LDA 2 - Scaled 4 - nu-SVC 2 22.168% NA NA 0:10:06
2 - Ten Class 3 - LDA 2 - Scaled 4 - nu-SVC 3 23.992% NA NA 0:08:03
2 - Ten Class 3 - LDA 2 - Scaled 5 - nu-SVR 0 NA 10.169 0.182 0:04:49
2 - Ten Class 3 - LDA 2 - Scaled 5 - nu-SVR 1 NA 12.186 0.000 0:04:41
2 - Ten Class 3 - LDA 2 - Scaled 5 - nu-SVR 2 NA 12.058 0.078 0:05:10
2 - Ten Class 3 - LDA 2 - Scaled 5 - nu-SVR 3 NA 12.121 0.077 0:08:30
2 - Ten Class 3 - LDA 2 - Scaled 6 - Epsilon-SVR 0 NA 10.313 0.171 0:07:59
2 - Ten Class 3 - LDA 2 - Scaled 6 - Epsilon-SVR 1 NA 12.186 0.000 0:08:02
2 - Ten Class 3 - LDA 2 - Scaled 6 - Epsilon-SVR 2 NA 11.993 0.059 0:08:53
2 - Ten Class 3 - LDA 2 - Scaled 6 - Epsilon-SVR 3 NA 12.060 0.059 0:09:08