Abstract
Automatic text summarization is a core NLP technique that aims to condense a source text into a shorter version. The rapid growth of all kinds of data on the internet calls for abstractive summarization of asynchronous collections of text, image, audio, and video. We propose an abstractive summarization method that combines techniques from NLP, speech processing, computer vision, and recurrent neural networks to exploit the rich information contained in all of these modalities and to improve the quality of multimedia news summarization. The main idea is to bridge the semantic gaps between multimodal content. Audio and visual are the major modalities in video. For the audio data, we design an approach that explicitly uses the transcription and estimates its salience from the audio signals. For the visual data, we learn joint representations of text and images using a computer vision technique. Previous research on text summarization has mainly focused on extractive methods; in this work, we put forward an abstractive method with a sequence-to-sequence architecture. Finally, all the multimodal aspects are combined to generate a textual summary by jointly maximizing salience, non-redundancy, readability, and coverage through the budgeted optimization of submodular functions.

Keywords: Summarization, Multimedia, RNN, NLP, Sequence-to-Sequence

Introduction
Text summarization plays a fundamental role in our daily lives and has been studied for a long time. With the advent of the information age and the advancement of multimedia technology, multimedia data (including text, image, audio, and video) have grown enormously. Multimedia data have profoundly changed the way people live and make it difficult for them to obtain important information efficiently. Most summarization systems rely on NLP alone; the opportunity to jointly improve the quality of the summary with the help of automatic speech recognition (ASR) and computer vision (CV) techniques is commonly overlooked.

Abstractive text summarization aims to generate a short summary of the complete article that covers all the important information. Summarization techniques are mainly divided into extractive and abstractive methods. Extractive methods construct a summary by extracting the salient words, phrases, or sentences from the source text
itself. Abstractive methods, on the other hand, produce a summary that resembles a human-written abstract by concisely paraphrasing the source content. That is, the former ensures the grammatical and semantic correctness of the generated summaries, while the latter creates more diverse and novel content. In this paper, we focus on abstractive text summarization.

The fast advancement of deep learning has encouraged many sequence-to-sequence models that map an input sequence to an output sequence. These approaches have been successful in many tasks, such as speech recognition [3], video captioning [4], and machine translation [5]. Unlike these tasks, in text summarization the output sequence (the summary) is much shorter than the input sequence (the document). To realize context-aware summarization, [6] proposes an attentional sequence-to-sequence model based on RNNs that projects the original document into low-dimensional embeddings. RNNs tend to be inefficient to train because each step depends on the previous ones; nevertheless, we adopt a sequence-to-sequence model based on RNNs to build the representations of the source texts.

Related work
Text summarization aims to extract the important information from source documents. With the growth of multimedia data on the web, several researchers (Shah et al., 2016; Li et al., 2017) have recently turned to multimodal summarization. Existing studies (Li et al., 2017, 2018a) have shown that, compared with text-only summarization, multimodal summarization can improve the quality of the generated abstract by using information from the visual modality. However, the output of existing multimodal summarization systems is usually expressed in a single modality, for instance textual or visual (Li et al., 2017; Evangelopoulos).

The majority of work in the past decade has focused on extractive summarization [18]–[26], where a summary consists of key words or sentences from the source text (article). Unlike extractive methods, which copy units directly from the source article, abstractive summarization uses fluent, human-readable language to convey the key information of the original text. Abstractive approaches can therefore produce much more diverse and richer summaries. The abstractive summarization task was standardized by the DUC2003 and DUC2004 competitions [27], from which a series of notable non-neural methods emerged, e.g., the best performer, the TOPIARY system [28].

Deep learning has been developing fast recently and can handle many NLP tasks [20], so researchers have begun to consider such frameworks as an effective, completely data-driven alternative for text summarization. Reference [21] used convolutional models to encode the original text and an attentional feed-forward neural network to generate summaries, but it did not use hierarchical CNNs and was therefore less effective. Reference [22] extended [23] by replacing the decoder with an RNN. Reference [24] introduced a large Chinese dataset for short text summarization and used an RNN-based seq2seq model. Reference [25] used a similar seq2seq model with an RNN encoder and decoder, but on English corpora, reporting state-of-the-art results on the DUC and Gigaword datasets.
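The attentional sequence-to-sequence idea above can be made concrete with a minimal sketch of dot-product attention over encoder hidden states; the names, toy dimensions, and values below are invented for illustration and are not the exact model of [6]:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(encoder_states, decoder_state):
    """Dot-product attention: score each encoder hidden state against
    the current decoder state, normalize the scores with softmax, and
    return the weighted sum of encoder states (the context vector)."""
    scores = [sum(h * s for h, s in zip(state, decoder_state))
              for state in encoder_states]
    weights = softmax(scores)
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(len(encoder_states[0]))]
    return context, weights

# Toy example: 4 encoder time steps with hidden size 3.
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
s = [2.0, 0.0, 0.0]          # current decoder state
context, weights = attention_context(H, s)
```

In a full summarizer, the context vector is combined with the decoder state to predict the next summary token; encoder states aligned with the decoder state receive higher attention weights.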
Reference [26] used generative adversarial networks for abstractive text summarization, together with a time-decay attention mechanism. Reference [27] built on BERT to exploit a pre-trained language model in the seq2seq framework and designed a two-stage decoding process that considers the context on both sides of every word in a summary. We denote these two methods as GAN and BERT, respectively, and compare our model with them in detail in the experiments. Reference [28] proposed a two-stage sentence selection model based on clustering and optimization techniques. Reference [29] proposed a text summarization model based on an LSTM-CNN framework that constructs summaries by exploring semantic phrases. Reference [30] presented an automatic text abstraction process based on fuzzy rules over a variety of extracted features to find the most important information in the source text.

Multi-document Summarization
Multi-document summarization (MDS) tries to extract the important information from a set of documents related to an event and produce a much smaller summary. MDS can be abstractive or extractive. Extractive models use various linguistic features, for instance sentence position [17], [18] and tf*idf [19], to identify the most salient sentences in a set of documents. Graph-based methods [20] are also commonly used in extractive MDS models. Finally, the top-ranked sentences are selected to build the summary.
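The tf*idf sentence scoring used by such extractive systems can be sketched as follows; treating each sentence as its own document for the idf statistics is a simplification made for this example, not the cited systems' exact formulation:

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of tf-idf weights of its words,
    normalized by sentence length. Each sentence is treated as a
    'document' for the idf statistics (a simplification)."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # document frequency: how many sentences contain each word
    df = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = sum(tf[w] * math.log(n / df[w]) for w in tf) / max(len(toks), 1)
        scores.append(score)
    return scores

sents = [
    "the match ended in a draw",
    "the match was played in the rain",
    "fans celebrated the unexpected draw loudly",
]
scores = tfidf_sentence_scores(sents)
```

Words shared by every sentence (here "the") receive zero idf weight, so sentences with distinctive vocabulary score higher; an extractive system would then pick the top-ranked sentences.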
Multi-modal Summarization
To date, much work has been done on summarizing meeting recordings, sports videos, films, pictorial storylines, and social multimedia. Erol et al. [2] aim to identify the important parts of a meeting recording based on an analysis of audio, text, and visual activity. Tjondronegoro et al. [4] propose a technique for summarizing a sports game by extracting the textual data from multiple sources and recognizing the important content. Li et al. summarize news pictures with text and illustrate text with pictures.

A news story and a picture are then selected to represent each topic. For social network summarization, Fabro et al. [10] and Schinas et al. [12] propose to outline real-life events based on multimedia content. A multimodal LDA is used to identify topics by capturing the connections between the text and image features of microblogs with embedded pictures. The output of their method is a set of representative pictures that depict the events.

Problem Definition
Current applications related to text summarization include meeting summarization, sports video summarization, movie summarization, pictorial storyline summarization, event timeline summarization, and social multimedia summarization. Previous studies on these topics overwhelmingly focus on summarizing synchronous multimodal content; pictorial storylines, for example, consist of a set of pictures with text descriptions. None of these applications addresses summarizing multimedia data that contain asynchronous information about events.

Model Overview
There are several essential aspects to producing a good textual summary from multimodal data. The salient content of the documents should be retained,
and the key facts in the videos and pictures should be covered. Furthermore, the summary should be readable and non-redundant, and should respect a fixed length constraint. All of these aspects can be jointly optimized through the budgeted maximization of submodular functions:

max_{S ⊆ T} F(S)  subject to  Σ_{s ∈ S} l_s ≤ L

where T is the set of candidate sentences, S is the summary, l_s is the length of sentence s in words, L is the budget, i.e., the length constraint on the summary, and the submodular function F(S) is the combined score of the aspects mentioned above.
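Maximizing a submodular F(S) under a length budget is NP-hard in general; a standard approach, shown here only as an illustrative sketch since the optimizer is not spelled out above, is a greedy algorithm that repeatedly adds the sentence with the highest marginal gain per word. The toy F below counts distinct covered words:

```python
def greedy_budgeted_selection(sentences, lengths, budget, gain):
    """Greedy maximization of a monotone submodular objective under a
    length budget: repeatedly add the sentence with the highest
    marginal gain per unit length that still fits."""
    selected = []
    remaining = set(range(len(sentences)))
    used = 0
    while remaining:
        best, best_ratio = None, 0.0
        for i in remaining:
            if used + lengths[i] > budget:
                continue  # sentence would exceed the budget
            ratio = gain(selected, i) / lengths[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits or nothing adds value
        selected.append(best)
        used += lengths[best]
        remaining.remove(best)
    return selected

# Toy coverage objective: F(S) = number of distinct words covered, so
# the marginal gain of sentence i is the count of new words it adds.
sentences = ["a b c", "a b", "d e", "c d e f"]
words = [set(s.split()) for s in sentences]
lengths = [len(s.split()) for s in sentences]

def coverage_gain(selected, i):
    covered = set().union(*(words[j] for j in selected)) if selected else set()
    return len(words[i] - covered)

summary = greedy_budgeted_selection(sentences, lengths, budget=7,
                                    gain=coverage_gain)
```

For monotone submodular objectives, this cost-scaled greedy rule is the classic approach to the budgeted maximum coverage problem [22] and carries a constant-factor approximation guarantee.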
Text is the main modality of documents, and occasionally pictures are embedded in them. Videos involve at least two modalities: audio and visual. Next, we give the processing strategies for the different modalities.

Audio, i.e., speech, can be automatically converted into text using an ASR system. The visual modality is essentially a sequence of pictures (frames); because most adjacent frames carry redundant information, we first extract the most important frames, i.e., key frames. We learn joint representations of the textual and visual modalities and can then identify the sentence that is most relevant to a picture. In this way, we ensure that the generated summary covers the visual data.

This text output is then given to our trained RNN model.
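The key-frame extraction step can be illustrated with a simple frame-differencing heuristic; the distance measure and threshold here are assumptions made for the sketch, not the system's actual criterion:

```python
def extract_key_frames(frames, threshold=0.2):
    """Keep a frame only if it differs enough from the last kept frame.
    Frames are flat lists of pixel intensities in [0, 1]; the distance
    is the mean absolute pixel difference."""
    if not frames:
        return []
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        last = frames[keys[-1]]
        diff = sum(abs(a - b) for a, b in zip(frames[i], last)) / len(last)
        if diff > threshold:
            keys.append(i)
    return keys

# Toy video: three near-identical frames, then a scene change.
video = [
    [0.0, 0.0, 0.0, 0.0],
    [0.05, 0.0, 0.0, 0.0],
    [0.0, 0.05, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
key_indices = extract_key_frames(video, threshold=0.2)
```

Only the first frame and the scene change survive, which matches the intent above: adjacent frames that hold redundant information are dropped before the joint text-image matching.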
Architecture Model

Implementation Details
We first convert the multimedia news data (audio and images) into text form. For this, a GUI is developed with a user login and an admin login. We also categorize the news articles into different categories, such as Politics, Education, Movies, Electronics, Fashion, and Others. The extracted text data is then given as input to our trained RNN model, which produces an abstractive summary as output.

Experimental Results
Fig4: Seq-to-seq accuracy and loss.
Conclusion
This paper addresses an asynchronous abstractive summarization task, namely how to use related text, audio, and video data to generate a textual summary. In this work, we apply the sequence-to-sequence framework to the task of abstractive summarization, with very promising results. Our model is built on an encoder-decoder architecture with an attention mechanism. As future work, we will concentrate on developing more advanced models that handle the full range of multimedia data to produce more comprehensive summaries.

References
[1] H. Li, J. Zhu, C. Ma, J. Zhang, and C. Zong, "Multi-modal summarization for asynchronous collection of text, image, audio and video," in EMNLP, 2017, pp. 1092–1102.
[2] B. Erol, D.-S. Lee, and J. Hull, "Multimodal summarization of meeting recordings," in ICME, vol. 3. IEEE, 2003, pp. III–25.
[3] R. Gross, M. Bett, H. Yu, X. Zhu, Y. Pan, J. Yang, and A. Waibel, "Towards a multimodal meeting record," in ICME, vol. 3. IEEE, 2000, pp. 1593–1596.
[4] S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko, "Sequence to sequence—video to text," in ICCV, 2015, pp. 4534–4542, doi:10.1109/ICCV.2015.515.
[5] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv:1409.0473, 2014.
[6] R. Nallapati, B. Zhou, C. N. dos Santos, Ç. Gülçehre, and B. Xiang, "Abstractive text summarization using sequence-to-sequence RNNs and beyond," in CoNLL, 2016, pp. 280–290.
[7] D. Tjondronegoro, X. Tao, J. Sasongko, and C. H. Lau, "Multimodal summarization of key events and top players in sports tournament videos," in WACV. IEEE, 2011, pp. 471–478.
[8] D. Wang, T. Li, and M. Ogihara, "Generating pictorial storylines via minimum-weight connected dominating set approximation in multi-view graphs," in AAAI, 2012.
[9] W. Y. Wang, Y. Mehdad, D. R. Radev, and A. Stent, "A low-rank approximation approach to learning joint embeddings of news stories and images for timeline summarization," in NAACL-HLT, 2016, pp. 58–68.
[10] I. Mademlis, A. Tefas, N. Nikolaidis, and I. Pitas, "Multimodal stereoscopic movie summarization conforming to narrative characteristics," IEEE Transactions on Image Processing, vol. 25, no. 12, pp. 5828–5840, 2016.
[11] T. Hasan, H. Bořil, A. Sangwan, and J. H. Hansen, "Multi-modal highlight generation for sports videos using an information-theoretic excitability measure," EURASIP Journal on Advances in Signal Processing, vol. 2013, no. 1, p. 173, 2013.
[12] G. Evangelopoulos, A. Zlatintsi, A. Potamianos, P. Maragos, K. Rapantzikos, G. Skoumas, and Y. Avrithis, "Multimodal saliency and fusion for movie summarization based on aural, visual, and textual attention," IEEE Transactions on Multimedia, vol. 15, no. 7, pp. 1553–1568, 2013.
[13] M. Del Fabro, A. Sobe, and L. Böszörmenyi, "Summarization of real-life events based on community-contributed content," in The Fourth International Conferences on Advances in Multimedia, 2012, pp. 119–126.
[14] J. Bian, Y. Yang, and T.-S. Chua, "Multimedia summarization for trending topics in microblogs," in CIKM. ACM, 2013, pp. 1807–1812.
[15] M. Schinas, S. Papadopoulos, G. Petkos, Y. Kompatsiaris, and P. A. Mitkas, "Multimodal graph-based event detection and summarization in social media streams," in Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015, pp. 189–192.
[16] J. Bian, Y. Yang, H. Zhang, and T.-S. Chua, "Multimedia summarization for social events in microblog stream," IEEE Transactions on Multimedia, vol. 17, no. 2, pp. 216–228, 2015.
[17] R. R. Shah, A. D. Shaikh, Y. Yu, W. Geng, R. Zimmermann, and G. Wu, "EventBuilder: Real-time multimedia event summarization by visualizing social media," in Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015, pp. 185–188.
[18] J. L. Neto, A. A. Freitas, and C. A. A. Kaestner, "Automatic text summarization using a machine learning approach," in Proc. Adv. Artif. Intell. 16th Braz. Symp. Artif. Intell. (SBIA), Nov. 2002, pp. 205–215.
[19] G. Erkan and D. R. Radev, "LexRank: Graph-based lexical centrality as salience in text summarization," CoRR, vol. abs/1109.2128, 2011. [Online]. Available: http://arxiv.org/abs/1109.2128
[20] V. Varma, V. Varma, and V. Varma, "Sentence position revisited: a robust light-weight update summarization 'baseline' algorithm," in International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies, 2009, pp. 46–52.
[21] R. R. Shah, Y. Yu, A. Verma, S. Tang, A. D. Shaikh, and R. Zimmermann, "Leveraging multimodal information for event summarization and concept-level sentiment analysis," Knowledge-Based Systems, vol. 108, pp. 102–109, 2016.
[22] S. Khuller, A. Moss, and J. S. Naor, "The budgeted maximum coverage problem," Information Processing Letters, vol. 70, no. 1, pp. 39–45, 1999.
[23] K. Wong, M. Wu, and W. Li, "Extractive summarization using supervised and semi-supervised learning," in Proc. Conf. 22nd Int. Conf. Comput.
[24] Y. Ouyang, W. Li, Q. Lu, and R. Zhang, "A study on position information in document summarization," in COLING, 2010, pp. 919–927.
[25] D. R. Radev, H. Jing, M. Styś, and D. Tam, "Centroid-based summarization of multiple documents," Information Processing & Management, vol. 40, no. 6, pp. 919–938, 2004.
[26] Z. Yong, J. E. Meng, Z. Rui, and M. Pratama, "Multiview convolutional neural networks for multidocument extractive summarization," IEEE Trans. Cybern., vol. 47, no. 10, pp. 3230–3242, Oct. 2016.
[27] K. Filippova and Y. Altun, "Overcoming the lack of parallel data in sentence compression," in Proc. Conf. Empir. Methods Nat. Lang. Process. (EMNLP), Oct. 2013, pp. 1481–1491.
[28] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. 27th Annu. Conf. Neural Inf. Process. Syst. (NIPS), Dec. 2013, pp. 3111–3119.
[29] C. A. Colmenares, M. Litvak, A. Mantrach, and F. Silvestri, "HEADS: Headline generation as sequence prediction using an abstract feature-rich space," in Proc. Conf. North Amer. Assoc. Comput. Linguist. Human Lang. Technol. (NAACL HLT), Denver, CO, USA, May/Jun. 2015, pp. 133–142.
[30] J. Cheng and M. Lapata, "Neural summarization by extracting sentences and words," in Proc. 54th Annu. Meeting Assoc. Comput. Linguist. (ACL), vol. 1. Berlin, Germany, Aug. 2016, pp. 484–494.
[31] X. Wan and J. Yang, "Improved affinity graph based multi-document summarization," in NAACL, 2006, pp. 181–184.
[32] G. Erkan and D. R. Radev, "LexRank: Graph-based lexical centrality as salience in text summarization," Journal of Artificial Intelligence Research, vol. 22, pp. 457–479, 2004.
[33] X. Zhou, X. Wan, and J. Xiao, "CMiner: Opinion extraction and summarization for Chinese microblogs," IEEE Transactions on Knowledge & Data Engineering, vol. 28, no. 7, pp. 1650–1663, 2016.
[34] X. Li, L. Du, and Y. D. Shen, "Update summarization via graph-based sentence ranking," IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 5, pp. 1162–1174, 2013.
[35] R. Nallapati, F. Zhai, and B. Zhou, "SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents," in Proc. 31st AAAI Conf. Artif. Intell., San Francisco, CA, USA, Feb. 2017, pp. 3075–3081.
[36] R. Mihalcea and P. Tarau, "TextRank: Bringing order into texts," in EMNLP, 2004.
[37] https://epaper.timesgroup.com/Olive/ODN/TimesOfIndia/#
[38] http://epaper.indianexpress.com/
[39] http://paper.hindustantimes.com/epaper/viewer.aspx