1 School of Management, Library and Information Science, Tian Jin Normal University, TianJin, 10065, China
2 Information Management, Chaoyang University of Technology, Taichung, Taiwan
3 School of Management, Tian Jin Normal University, TianJin, 10065, China
E-MAIL: Lindayan5588@163.com, kevin50201a@gmail.com, *crching@yut.edu.tw, nkwshuyi@gmail.com
IEEE
Proceedings of the International Conference on Machine Learning and Cybernetics, Ningbo, China, July
broad. From the perspective of user behavior, Wang, Na and Xu, Dachen[3] used Questionnaire Star to carry out a sampling survey on the protection of personal information in mobile social networks. The survey yielded 535 valid questionnaires. Analysis of the survey results reveals five aspects of user behavior that influence personal information security: (1) setting personal information associated with the user as a password and using the same password on multiple social media; (2) installing social media software from unknown sources while not regularly updating the system security software; (3) disabling the privacy controls of the terminal system; (4) worrying about unknown information disclosure yet often activating the GPS function; (5) being more inclined to upload private data to a network disk. These five aspects of user behavior increase the possibility of privacy exposure[3].
Wang, Shuyi and Zhu, Na[4], in their research on privacy protection strategies for mobile social media users, put forward new challenges for the study of Internet users' privacy. They hold that, besides identity, health, family and social relations, privacy exposure also involves the location of mobile social media users, information publicized by service providers, status posts that expose users' privacy, and hackers. Based on these issues, the authors proposed that the government, industry associations, service providers and users themselves collaborate to protect user privacy[4].
From the perspective of social media user privacy, Zhu, Hou[5], in "social media users’ psychological reaction to privacy protection research", took important domestic and foreign journals as the objects of study, analyzed the relevant literature on users' privacy psychology, and created an integrated model of privacy concerns. The model's variables were divided into antecedent variables (privacy experience, privacy awareness, personality differences, demographic characteristics, culture), outcome variables (regulation, behavioral response) and moderating variables (privacy calculation, trust)[5].
Longfei Guo[6] took emergencies and media reports as the objects of study, constructed a dynamic influence factor model of social network privacy concerns, observed changes in user privacy concerns, compared individuals' concerns with groups', and explored the different results caused by different functions of social media. As a result, the study found regular patterns in social network privacy concerns. In addition, the evolutionary game model of privacy protecting
Liwen Wang[11] used Python to extract hot topics and participants' information from Sina microblogging, and then used a hierarchical clustering algorithm to cluster the hot topics. At the same time, a collaborative filtering algorithm was used to analyze the extracted data. The results of the two algorithms were analyzed and compared. Based on the results, topics of the same class, or the topics a user finds most interesting, can be recommended, which made the Sina microblogging topic function used more frequently[11].
From the perspective of social media operators, Cheng-Hung Tsai, Han-Wen Liu et al.[12] noted that, amid the information explosion, users interact through various social networking sites. Users connect to various forms of social networking sites anytime and anywhere through the Internet, so social media operators can clearly know users' information and social relations. By understanding users' personal preferences and interests, social media operators recommend personalized ads, products, articles and other diversified personal social services. This not only meets the needs of users but also increases products' click-through rate and exposure. Their article is based on the tone people use in the social network and obtains data from three sources: the user's fan pages, the user's graffiti wall messages, and friends' home pages. The data are used to analyze personal preferences and, considering the amount of personal data, the paper finds personal preference categories from different groups so as to better deliver personalized advertising and product recommendation services. In actual verification, the research results were significantly improved[12].
From the perspective of user privacy exposure, Mohamed Bourimi, Ricardo Tesoriero et al.[13] addressed the privacy and security issues of multi-modal user interfaces for social media applications. The proposed method describes how privacy and security issues are modeled from the perspective of the user interface, and how this model is developed with a four-level conceptual framework for multi-modal, multi-platform user interfaces. The approach also explains how these models can be adapted to the development of social media applications. Finally, the authors applied the model to a social media application to show its feasibility[13].
Elena Zheleva and Lise Getoor[14] observed that, to address the privacy problem, many social media sites allow users to hide their personal profiles from the public. They asked how an online social network with a mix of public and private user profiles can be used to predict users' private attributes. Based on the relationships between users, they infer users' sensitive information from public information. Using friendship links and group memberships, the study found that on several well-known social media sites, link-based and group-based classification over mixed public and private user profiles can accurately recover users' private profiles[14].
Eden Litt[15] suggested that every day hundreds of
millions of people log on to social networking sites and generate TB-level data. How do social networking sites use technology to protect the privacy of users? The results show that technology on a social networking site acts as an intermediary in communication between the user and the operator. The development of privacy limits and of the technology is controlled by people. Ultimately, privacy inequality can be identified: some people are more likely to take advantage of the technology to protect their privacy, while the personal information and reputations of others may face more risk[15]. In addition, Sweeny proposed that applying K-anonymity before the release of micro-data can effectively reduce the probability of personal information disclosure[16].
Priyadarshini Lamabam and Kunal Chakma[17] studied code mixing and code switching, in which two or more languages are used at the same time in an exchange. According to previous research, language identification for such code-mixed text is difficult because of informal text (such as creative spelling and phonetic input). The authors use natural language processing methods to automatically identify the languages in mixed social media text. In their study of cross-language social media, they selected Twitter and Facebook to show code mixing between English and Manipuri, and compared the trigram model with the conditional random field (CRF) model to find which identifies the language more accurately[17].
[18] used text analysis in data mining for sentiment analysis of social media texts. The collected data sets are divided into words and sentences, and the weights of the two classes are then calculated to form the final text model. Java is used to evaluate the performance of the model with several performance parameters: accuracy, error rate, storage space, training time and test time. The author aimed to advance the automatic identification model through subsequent research using this model[18].

3. Materials and methods

Facebook's monthly active users exceeded 1.7 billion, and its daily active users also broke through 1.1 billion. Facebook has a huge number of active users and a large volume of social media text, and provides open API access to data.
Based on existing research results, although some scholars use data mining for social media text analysis, domestic studies rarely choose Facebook as a research example or address user information exposure. Some foreign researchers now use data mining, machine learning and other methods to build models for data collection research, covering the data inequality between users and social media operators, cross-language text analysis, text analysis across a variety of social media, operators' use of user information for product marketing, and other aspects. According to the existing research results, users' privacy is severely exposed to public view. There is not yet a mature research system for social media user privacy, and empirical research on user information exposure is rare. Therefore, this study of social media user information exposure is of great significance to privacy protection.
This paper takes the basic information that users provide at registration as the data for study and delimits the exposure level of users' basic information. The basic information visible when browsing Facebook users is shown in Table 1. In the table, “1” means that exposing the information is dangerous, “0” means it is not dangerous, and “0+” means it is slightly risky.
From the basic information exposed, this paper selects the 10 most basic variables, those most likely to affect user privacy exposure: work experience, education, living place, mailbox and phone, birthday, gender, family members, emotional status and user avatar. For work experience and education, 0 is defined as the starting point. Based on the amount of information exposed, this paper tries to establish a model that reflects the factors and the types of attack in the process of information exposure.
Through empirical research, this study aims to provide users with optimization strategies that reduce the possibility of attack, and a basis for long-term healthy development.

4. Experiments and results

The basic information of 300 users was entered into the R software and grouped with the K-means clustering algorithm, which divided the 300 users' data into five groups of 14, 91, 43, 143 and 9 users. Analyzing the grouping gives the following results. In the first group, work, education background, place of residence, birthday and family members are heavily exposed. In the second group, work, education background and family members are heavily exposed. In the third group, work and family members are not exposed, education background is heavily exposed, and telephone numbers are rarely exposed. The characteristics of the fourth group are manifested mainly in the
low exposure of work and education background and the heavy disclosure of family members.
The exposure of the fifth group lies mainly in work and education background, while family members are rarely exposed. These five groups are assigned user information exposure hazard levels, ordered from dangerous to safe: the fourth group is at the fifth level, the first group at the fourth level, the second group at the third level, the fifth group at the second level, and the third group at the first level. The first level is the safest.
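The grouping step can be illustrated with a minimal sketch. The study ran K-means in R; the Python below is only an illustrative re-implementation on synthetic data, not the authors' code or data. It assumes each user is coded as a ten-element 0/1 vector (1 = field exposed), and splitting "mailbox and phone" into two fields to reach ten variables is also an assumption. The names FIELDS, REPORTED_SIZES, REPORTED_LEVELS and kmeans are hypothetical; only the group sizes and the group-to-level mapping are taken from the text.

```python
import random

# The ten profile variables named in the text. Coding assumed here:
# 1 = the user exposes the field, 0 = the field is hidden.
FIELDS = ["work", "education", "residence", "mailbox", "phone",
          "birthday", "gender", "family", "relationship", "avatar"]

# Group sizes and hazard levels as reported in the text (level 1 is the safest).
REPORTED_SIZES = {1: 14, 2: 91, 3: 43, 4: 143, 5: 9}
REPORTED_LEVELS = {4: 5, 1: 4, 2: 3, 5: 2, 3: 1}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct observed rows so no two centroids coincide.
    distinct = sorted(set(tuple(r) for r in rows))
    centroids = [list(r) for r in rng.sample(distinct, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every row.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic stand-in data: 300 random exposure vectors, NOT the paper's data.
rng = random.Random(42)
users = [[rng.randint(0, 1) for _ in FIELDS] for _ in range(300)]
labels, centroids = kmeans(users, k=5)

for c, centroid in enumerate(centroids):
    size = labels.count(c)
    # For 0/1 data a centroid coordinate is the share of members exposing that field.
    most_exposed = [f for f, share in zip(FIELDS, centroid) if share > 0.5]
    print(f"cluster {c}: {size} users, mostly exposed: {most_exposed}")
```

Because the inputs are 0/1 vectors, each centroid coordinate can be read directly as the fraction of the group that exposes that field, which is one way to arrive at per-group descriptions like those above. How the five groups were ranked into hazard levels is not stated in the text, so the sketch records the reported mapping rather than deriving it.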
TABLE 1: Field names and sub-field names of Facebook users' recorded information