08/08/2019

Digital Talent Scholarship 2019
28. NLP Project

Project NLP
• Read the file “tweets.csv”, which contains sentiment text data
• Divide the sentiment text data into training data (80%) and testing data (20%) (use random_state=123)
• Write a program that classifies each testing sample using only the training data, by:
  • Manual classification: the training data is used to build a list of important keywords
    • Compare the accuracy of at least 3 threshold values and at least 3 keyword list sizes
  • Machine learning algorithm: the training data is used to build the sentiment classification model
    • Compare the accuracy when using several preprocessing steps:
      • With or without stop word elimination
      • Several regex patterns, either for entity masking or for deleting unnecessary symbols
      • Using stemming or lemmatization
    • Compare the accuracy of CountVectorizer and TfidfVectorizer
    • Compare at least 5 machine learning algorithms
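
The brief does not define "threshold" or "keyword size" for the manual classification, so the following is only a minimal sketch of one possible reading, in plain Python. It assumes that keywords are the words occurring more often in one class than in the other, that the keyword size is the number of keywords kept per class, and that a sample is labelled positive when its positive keyword hits minus its negative keyword hits reach the threshold. The toy data and the function names `build_keywords`, `classify`, and `accuracy` are made up for illustration; the real data would come from tweets.csv.

```python
from collections import Counter

# Toy labelled data, invented for illustration (1 = positive, 0 = negative).
train = [
    ("i love this phone great battery", 1),
    ("great service love it", 1),
    ("what a great happy day", 1),
    ("i hate this phone bad battery", 0),
    ("bad service hate it", 0),
    ("what a bad sad day", 0),
]
test = [
    ("love the great weather", 1),
    ("hate the bad weather", 0),
    ("great day", 1),
    ("sad day", 0),
]


def build_keywords(samples, size):
    """Return (positive_keywords, negative_keywords), up to `size` words each."""
    pos, neg = Counter(), Counter()
    for text, label in samples:
        (pos if label == 1 else neg).update(text.split())
    # Rank words by how much more often they occur in one class than the other.
    diff = {w: pos[w] - neg[w] for w in set(pos) | set(neg)}
    ranked = sorted(diff, key=lambda w: (-diff[w], w))
    pos_kw = [w for w in ranked if diff[w] > 0][:size]
    neg_kw = [w for w in reversed(ranked) if diff[w] < 0][:size]
    return set(pos_kw), set(neg_kw)


def classify(text, pos_kw, neg_kw, threshold):
    """Label a text 1 if (positive hits - negative hits) >= threshold, else 0."""
    words = text.split()
    score = sum(w in pos_kw for w in words) - sum(w in neg_kw for w in words)
    return 1 if score >= threshold else 0


def accuracy(samples, pos_kw, neg_kw, threshold):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(classify(t, pos_kw, neg_kw, threshold) == y for t, y in samples)
    return correct / len(samples)


# Compare 3 keyword sizes against 3 threshold values.
results = {}
for size in (1, 2, 3):
    pos_kw, neg_kw = build_keywords(train, size)
    for threshold in (0, 1, 2):
        results[(size, threshold)] = accuracy(test, pos_kw, neg_kw, threshold)
        print(f"size={size} threshold={threshold} "
              f"accuracy={results[(size, threshold)]:.2f}")
```

The keywords are built from the training samples only, and the test samples are used only to measure accuracy, which matches the requirement to classify the testing data using only the training data.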
digitalent.kominfo.go.id

Content of ipynb Script


• Preprocessing
  • Explain each piece of preprocessing code
• Feature extraction
  • Explain the feature extraction code
• Classification
  • Conduct manual classification and machine-learning-based classification
  • Classify the test.csv and measure the accuracy of each method above

Pusat Pengembangan Profesi dan Sertifikasi
Badan Penelitian dan Pengembangan SDM
Kementerian Komunikasi dan Informatika
Jl. Medan Merdeka Barat No. 9 (Gd. Belakang Lt. 4 - 5)
Jakarta Pusat, 10110
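
As a rough illustration of the preprocessing and feature extraction parts of the notebook, the sketch below uses only the Python standard library. The regex patterns, the mask tokens `_url_` and `_user_`, the tiny stop-word list, and the function names are all assumptions chosen for the example, not part of the brief. The `bag_of_words` helper is a plain term counter standing in for a count-based vectorizer; it is not the scikit-learn CountVectorizer named in the assignment.

```python
import re
from collections import Counter

# Hypothetical patterns and stop-word list, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z_\s]")
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "it", "this"}


def preprocess(text, remove_stop_words=True):
    """Mask entities, delete unnecessary symbols, optionally drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" _url_ ", text)       # entity masking: links
    text = MENTION_RE.sub(" _user_ ", text)  # entity masking: user mentions
    text = NON_LETTER_RE.sub(" ", text)      # delete digits and punctuation
    tokens = text.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def bag_of_words(tokens):
    """Count term occurrences (a simple stand-in for a count-based vectorizer)."""
    return Counter(tokens)


# Made-up example tweet.
tweet = "@bob This is GREAT!!! see https://example.com/x #happy 100%"

with_stop_words = preprocess(tweet, remove_stop_words=False)
without_stop_words = preprocess(tweet)
features = bag_of_words(without_stop_words)

print(with_stop_words)
print(without_stop_words)
print(features)
```

Keeping stop word removal behind a flag makes it easy to run the same classifier with and without that step and compare the resulting accuracy.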
