
NITTE MEENAKSHI INSTITUTE OF TECHNOLOGY

(AN AUTONOMOUS INSTITUTION)


(AFFILIATED TO VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELGAUM, APPROVED BY AICTE & GOVT.OF KARNATAKA)

COURSE-PROJECT REPORT
ON

ALANG MARKDOWN TO HTML CONVERTOR

Submitted by:
Abhilash Rejanair        1NT13CS003
Aditya Hosamani          1NT13CS007
Aniruddha Achar B P      1NT13CS016

In partial fulfilment of the requirements for the completion of IV Semester Object-Oriented
Programming Course-Project work during the academic year 2014-2015.

Department of Computer Science and Engineering


Nitte Meenakshi Institute of Technology,
Yelahanka, Bangalore 560064
Academic Year 2014-15


CERTIFICATE
This is to certify that the project report

ALANG markdown to HTML convertor

is an authentic work carried out by

Abhilash Rejanair        1NT13CS003
Aditya Hosamani          1NT13CS007
Aniruddha Achar B P      1NT13CS016

in partial fulfilment of the requirements for the completion of IV Semester Object-Oriented
Programming Course-Project work during the academic year 2014-2015.

Name & Signature of the Guide

Name & Signature of HOD

ACKNOWLEDGEMENT

This project was compiled for the Object Oriented Programming course of the 4th
Semester.
We would like to thank our Professor, Mrs. Vijaya Shetty, for providing us with
the opportunity and daring us to come up with something new and creative. We
also thank her for her assistance and support, both moral and technical, in writing
this project.
We would also like to thank our parents and family members, all of
whom have been thoroughly supportive.
We are also grateful, in no small measure, to the internet for all the amazing
research material and ideas which inspired us to come up with something of our
own.
Last but not least, we are sincerely indebted to the author of the prescribed
textbook, Herbert Schildt, for providing us with an absolute reference guide, using
which we could solve the numerous technical problems we faced.
By mutual consensus among the team members, we have decided to release
the source code of this project to the public after the due evaluation is done,
effectively making the whole project open source.
The project and its code will soon be available on GitHub, under the MIT/Apache
license.

ABSTRACT

Creating a website is, by itself, very difficult. A thorough knowledge of HTML
and CSS is required to build even a basic web page, let alone a website which
complies with modern standards and follows good design practice.
This is where our project comes in. ALANG is a friendlier language
which retains the good qualities of HTML and CSS, while making documents
shorter to write and modern design aspects easier to implement.

Contents
Chapter 1 Introduction
    1.1 HTML
    1.2 LATEX [2]
    1.3 Text processor:
Chapter 2 The ALANG language:
    HTML
    ALANG
    2.1 Salient features of ALANG:
    2.2 ALANG vs. Markdown:
    2.3 Difficulty tokenizing and converting ALANG to HTML
Chapter 3 Design of convertor:
Chapter 4 Implementation
    4.1 File handling:
    4.2 Tokenizing ALANG:
    4.3 Code Convertor
Chapter 5 Efficiency of the implementation
    5.1 Mathematical analysis of the scanning and converting algorithm [6]
    5.2 Analysis using the system clock
Chapter 6 Future development
    6.1 Using of regex instead of scanning character-wise
    6.2 Error detection and proper message passing
    6.3 Use of Other Frameworks
    6.4 Cross-OS Compatibility
Reference

Figure 1 Design of the convertor
Figure 2 The Output File Hierarchy
Figure 3 Class diagram for scanning the input
Figure 4 Flow chart for scanning input
Figure 5 Efficiency
Figure 6 Input screen
Figure 7 Part of the output screen

Table 1 HTML vs. ALANG

Chapter 1

ALANG markdown to HTML

Chapter 1 Introduction
Why develop a new language when there are thousands of other languages out there?
The answer is twofold. Firstly, different languages were developed for different purposes, and each
programming language has its limitations and advantages. Languages that are widely used take
time to change, and changes are rolled out slowly.
Secondly, there is the claim that programmer training is the dominant cost of a programming
language [1]. This cost can be reduced by developing languages that are closer to
natural language.

1.1 HTML
With the advent of the internet age, a new formatting method for presenting documents universally
was invented, called HTML. It created a standard for formatting documents that could be
shared, linked and viewed. The use of tags for formatting made documents easy to
write and edit. HTML is the standard format for presentational mark-up on
the World Wide Web.
With the smartphone boom, web browsers are in everyone's hands, making HTML and its
excellent formatting capability that much more important. But with the HTML tags came a
learning curve and the need to remember the tags for each property. Also, the length
and the correct sequence of tags needed to perform a task often hinder the fast and efficient
formatting of documents that are web ready.
In HTML, we use <em>the em and strong tags</em> to add <strong>emphasis</strong>.

1.2 LATEX [2]


LATEX is another presentational mark-up language, but its primary use is in preparing
scientific documents in academia, where the document is usually converted into PDF
and then rendered for end users by a PDF reader application.
In LATEX, we use \emph{the emph and bf commands} for the same {\bf effect}.

1.3 Text processor:


When talking about formatting documents, the next thing that comes to mind is word processing
software. Word processors are elegantly designed, beautifully presented, versatile and easy to use, but
they are platform dependent.
Neither LATEX nor HTML was designed with readability in mind: their source is clumsy as-is and
requires translation before presentation to readers.
What if we could develop a formatting language that is platform independent, easy to use, easy
to share and easy to edit, something close to natural language? This idea drove us to create a new
formatting language, which we call ALANG.
ALANG's explicit goal is to make the code more readable. With the accompanying lexical
analyser and convertor, ALANG documents can be converted into HTML and presented to
readers, yet the code in itself is highly readable. As a quick demonstration of this quality, consider
the following code segment, which is equivalent to the HTML example given above:
We use ~the em and strong tags~ to add *emphasis*
ALANG supports the same basic formatting techniques as HTML and LATEX but strives to
keep the syntax as light as possible.

HTML                                              | ALANG
--------------------------------------------------|-------------------------------------------
Text based tags are used here.                    | Special characters are used as tags here.
Tags are normally three or more characters long.  | Tags are three characters long at most.
Low code readability.                             | Comparatively high readability.
Steep learning curve.                             | Small learning curve.
Somewhat artificial in nature.                    | Close to natural language.
<em>the em and strong tags</em> to add            | ~the em and strong tags~ to add *emphasis*
<strong>emphasis</strong>                         |

TABLE 1 HTML VS. ALANG


Code conversion is the process of replacing the syntax/tags of one language with the
equivalent syntax of another language. In the case of ALANG and its intended use,
converting it into HTML (the language of the web) makes it easy for the user to publish the
document on the World Wide Web.
Converting one language to another requires the analysis of all the tokens in the input
language. This analysis of tokens is called lexical analysis [3] or tokenizing. Once the input is
tokenized, the tokens and the corresponding text are separated. The text is then sent to a converting
function, which adds the appropriate tags of the output language. These steps of converting
ALANG to HTML are carried out by a convertor called the ALANG convertor.
The goal of this project is to design a mark-up language with high readability and also
a convertor that converts documents in ALANG to HTML. For the first part we relied on an
existing language called Markdown [4]. The syntax and use of tags were inspired
by Markdown. For the second part, i.e. the convertor, C++ was used to get the input from the
user, sanitize the user's input, tokenize the input and convert it to HTML.
In the following chapters we shall discuss the language in more depth, along with the
complexity that our convertor must be able to handle.
In chapter 3 we shall discuss the requirements for such a conversion to take place.
In chapter 4 we discuss the implementation of such a convertor in C++. In chapter 5 we
shall evaluate it against our requirements. In chapter 6 we shall discuss future
development, and the last chapter, chapter 7, sums up the report.

Chapter 2 The ALANG language:


ALANG is a Markdown¹ derivative developed with great emphasis on the readability of the
source code. The syntactic elements it defines map directly to HTML, which is the most
common format ALANG documents are translated to. This language can be used in chat
forums, instant messengers and comment fields, and also in introductory courses in
computer programming.
We have seen a comparison between ALANG, HTML and LATEX in the introduction.
Now we will have a detailed discussion of the differences between ALANG and HTML.
To understand the difference, we will consider the following sample document:

HTML
<h1>this is head one</h1>
<p>this is a paragraph with <strong>bold</strong></p>
<p>this has <pre>code</pre> and <em>italics</em></p>

ALANG
#1this is head one
this is a paragraph with *bold*
this has `code` and ~italics~
The difference is striking: the code in ALANG is clear and does not clutter the actual content.
The user can read the code without giving much thought to what each tag means, which
increases readability.

2.1 Salient features of ALANG:

All the syntactical elements of ALANG have an equivalent in HTML, but the opposite
is not true, as ALANG supports only a subset of the elements in HTML. Many prominent
features are supported, including both block-level and span-level elements. Paragraphs and
tables are examples of the former, while code and emphasis are examples of the latter.
Paragraphs in ALANG are separated by a newline character.
Ex:
This is a paragraph in ALANG.
This is another paragraph with *bold* emphasis.
Special tags like headings and tables are indicated with #. A line beginning with
# followed by a number indicates a heading of that level. # followed by a t
represents a table, whose end is signified by a %. Ex:
Heading in ALANG:
#1 this is a head one
#2 this is a head 2
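
As an illustration of how a heading line maps to HTML, the following is a minimal C++ sketch and not the convertor's actual code. The function name headingToHtml and the restriction to levels 1 to 6 (the heading levels that HTML provides) are our assumptions for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the convertor's source): converts one ALANG
// heading line such as "#1 this is a head one" into
// "<h1>this is a head one</h1>". Any other line is returned unchanged.
std::string headingToHtml(const std::string& line) {
    // A heading needs '#' followed by a digit from 1 to 6.
    if (line.size() < 2 || line[0] != '#' || line[1] < '1' || line[1] > '6') {
        return line;
    }
    const std::string level(1, line[1]);

    // Skip optional blanks between the level digit and the heading text.
    std::size_t start = 2;
    while (start < line.size() && line[start] == ' ') {
        ++start;
    }
    return "<h" + level + ">" + line.substr(start) + "</h" + level + ">";
}
```

A table opener such as #t is left untouched by this helper, because t is not a digit; it would be handled by the table logic instead.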

¹ The author of Markdown, John Gruber, uses the same name to refer to his Markdown to HTML compiler, but
for the sake of clarity in this report, we shall talk only about the language Markdown and not the compiler.

Tables can be created by separating the columns with a pipe symbol (|), with
each row written on a new line. To indicate the start and end of a table, the
table is enclosed between #t and %.
Tables in ALANG:
#t
|table one| table 2 | table 3 | table 4 |
|This is a test column |A second test column is here| Too many columns here |Last|
|Row 2 column 1|Row 2 column 2 |Row 2 column 3|Row 2 column 4|
%
Tables can also be used with headers, i.e., using the thead HTML tag. The syntax for
this is almost the same as that for normal tables, with one small exception: the header row
is followed by a separator row made of dashes.
Tables with headers in ALANG:
#t
|table heading one| table heading 2 | table heading 3 | table heading 4 |
|----------------------|--------------------|---------------------|--------------------|
|This is a test column |A second test column is here| Too many columns here |Last|
|Row 2 column 1|Row 2 column 2 |Row 2 column 3|Row 2 column 4|
%

Emphasis is added by surrounding the text with * or ~ for bold and italics respectively.
A part of the document can be marked as code by surrounding it with ` (grave accent). Ex:
This is a paragraph with *some bold text* followed by ~italic text~ and ended with
`code`.
In addition to all the above-mentioned elements, ALANG also supports images
and links. Both of them are surrounded by !. The syntax for images and links is
given below:
Syntax for image:
!{image source}!
Syntax for link:
!(anchor text)[link]!
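
The image and link syntax can be mapped to HTML as sketched below. This is a hedged illustration only: the function name bangToHtml is hypothetical, it expects the text found between the two ! markers, and the exact attributes that the convertor emits are assumed rather than taken from its source.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts the body found between a pair of '!' markers.
//   "{source}"            becomes an <img> tag
//   "(anchor text)[link]" becomes an <a> tag
// Anything else is returned unchanged.
std::string bangToHtml(const std::string& body) {
    // Image: the whole body is wrapped in braces.
    if (body.size() >= 2 && body.front() == '{' && body.back() == '}') {
        return "<img src=\"" + body.substr(1, body.size() - 2) + "\">";
    }
    // Link: "(" text ")[" link "]".
    const std::size_t mid = body.find(")[");
    if (body.size() >= 4 && body.front() == '(' && body.back() == ']' &&
        mid != std::string::npos) {
        const std::string text = body.substr(1, mid - 1);
        const std::string href = body.substr(mid + 2, body.size() - mid - 3);
        return "<a href=\"" + href + "\">" + text + "</a>";
    }
    return body;
}
```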

2.2 ALANG vs. Markdown:


Even though ALANG is derived from the Markdown language, many features
that Markdown lacked were added to ALANG, the most important being support for
CSS.
In ALANG, support for sections was added. To declare a part of the document as a section,
the part has to be surrounded by $, and a variable name has to be
assigned to that section.
Syntax of sections:
$variable name:
Content of the section
$
With section support, we have added CSS to style the appropriate sections.
The CSS should be specified at the end of the document, and the CSS part of the code is written
within braces.
Syntax for CSS:
##variable name
{
CSS part here
}
With these additions we have tried to improve upon the existing Markdown
language.

2.3 Difficulty tokenizing and converting ALANG to HTML


The major problem with tokenizing ALANG in C++ was the lack of regex library
support in the C++ environment we used. The next problem was the proper nesting of elements one inside the other. A
considerable amount of time and research was spent on this. We had to look at how
compilers tokenize and build their expression trees, and we had to develop a proper
way to identify the start and end of an element, as symmetrical tags were being used.
One piece of research material we came across while researching the ALANG to
HTML convertor was "Adapting a Markdown Compiler's Parser for Syntax
Highlighting" by Ali Rantakari. It proposes a syntax highlighter that
presents each element of Markdown in a different colour. This highlighting
is achieved using a Markdown parser. The paper also discusses different
Markdown parsers and their efficiency. This gave us a starting point from which we
could work to develop our own tokenizer and convertor.

Chapter 3 Design of convertor:


Convertors [3] that convert one form of document to another generally have a front-end that
interprets the input and transforms it into some kind of intermediate form, and a back-end that
generates the output. The front-end performs the lexical analysis or scanning, while the back-end
does the code conversion. The front-end can perform the scanning and parsing of the input
code in one single pass instead of using multiple passes.
A similar approach was used while designing the ALANG to HTML convertor. The user writes
the code in a file named input.alg. This file is then pre-processed by the file handling classes
to give a valid input to the scanning and convertor classes. Along with this, the various files and
folders required for the execution and presentation of the output have to be created. After these
files are created and a stable platform for scanning and converting the code is set up, the
scanning classes are called.
The scanning classes have the task of tokenizing the input and sending the output to the file handling
classes. Tokenizing is achieved by traversing the input one character
at a time. When an element is encountered, the tokens are stripped and the content of the
element is passed to the specific methods of the conversion classes. Some elements are block-level and
others are span-level; the scanning class methods process each element according to its
nature.
The convertor makes use of an HTML framework called Bootstrap. Bootstrap, an open source
framework from Twitter, consists of a set of CSS, JS and font files, which are used to
easily implement modern UIs with cross-platform and cross-browser consistency in websites.
We make use of Bootstrap simply because it is easier to create a website with a modern UI using
a framework than starting from scratch.
Since Bootstrap is being used, all the functionality associated with it can also be used.
Bootstrap layouts are divided into 12 columns, or grids. These grids are placed in a
row, which in turn is held in a container. Bootstrap uses this grid layout to make
the website responsive, which implies that a website which works well on desktops, laptops
and larger display devices should also work well on smaller devices such as mobile phones, PDAs
and tablets.
Each of these grids can be of a different type: xs, sm, md or lg, which
stand for extra small, small, medium and large respectively. If a grid is designated with xs,
the grid keeps its place in the row on all displays. If we use md, the grids are
stacked one below the other on screens narrower than the medium breakpoint and are laid out
side by side in the row on medium and larger screens.
Since Bootstrap is designed on a mobile-first philosophy, this helps the generated
ALANG site to be fully functional on mobile phones too.
Bootstrap also incorporates some UI components, such as buttons, all of which are readily
available if the respective variable names (see the following chapters) are used correctly.

The resultant HTML file is in no way read-only or restricted. The
code of the website is completely available to the user for modification, removal of credits, etc.

[Figure 1: block diagram. An ALANG document is fed to the ALANG HTML converter, which
produces HTML + CSS. The converter works in three stages:

Searching for Tags: All flags are set to 0. As soon as a tag is encountered, the respective
flag is set to 1. The process repeats for all nested tags.

Conversion of Tags: As soon as a tag is encountered, its HTML equivalent is found. The code is
iterated for every tag found, including nested tags. Variables and sections are treated
specially. Styles are stored in an object and are output when it is destroyed.

File Output: The header for the output file is requested. The body of the file is output every
time a tag is realized. The footer is requested. The constant Bootstrap files are copied to the
newly created directory. Style.css is generated.]

FIGURE 1 DESIGN OF THE CONVERTOR

Chapter 4 Implementation
The implementation of the convertor was divided into three important segments.
These three segments were assigned to the members of the team. The development of the
project was done using a product design technique called the swift technique, in which each
member was required to produce a working iteration of the project every week. Towards the end of
the project, the swift cycle was reduced to a single day.
The three important parts of the project:
1. File handling (front-end)
2. Tokenization (front-end)
3. Conversion from ALANG to HTML (back-end)

The development of the language has been discussed in the previous chapters. In this chapter
we will talk about the implementation of the convertor in C++. The following section
discusses file handling and the creation of the appropriate directories and files for scanning
and code conversion.

4.1 File handling:


File handling deals with moving directories and files from the working directory
to the directory specified by the user. This is done by calling various predefined functions
which are specific to each operating system. The project was
developed for the Microsoft Windows operating system.
The first task of the file handling classes is to convert the user-specified directory path to a Windows-friendly
path name. This is achieved by replacing \ with \\. The input file, which must have
the constant name input.alg and contains all the ALANG code, is pre-processed by the methods
of the file handling classes. This pre-processing involves escaping the quotes given by the
user in the input, as these quotes may be ambiguous to the program at runtime. The
next task is to create the directories and files required for the conversion and running
of the code. The directories are created using functions belonging to the header file
windows.h. Inclusion of this header file limits the program to the Windows platform
but gives great functionality to the program. The directories created are the CSS, JS and media
folders. The files that constitute some features of Bootstrap (which will be discussed
in the subsequent sections) are also copied to the respective folders. A few
temporary directories and files are also created during the execution of the program; these are deleted
at the end of the execution.
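
The two pre-processing steps described above, doubling the backslashes in the path and escaping the quotes in the input, can be sketched in portable C++ as follows. The function names escapeBackslashes and escapeQuotes are hypothetical, and treating only the double quote as a quote character is our assumption.

```cpp
#include <string>

// Hypothetical helper: doubles every backslash, so that C:\site\out
// becomes C:\\site\\out.
std::string escapeBackslashes(const std::string& path) {
    std::string out;
    out.reserve(path.size() * 2);
    for (char c : path) {
        if (c == '\\') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Hypothetical helper: puts a backslash in front of every double quote.
std::string escapeQuotes(const std::string& text) {
    std::string out;
    out.reserve(text.size() * 2);
    for (char c : text) {
        if (c == '"') {
            out += '\\';
        }
        out += c;
    }
    return out;
}
```

The directory creation itself relies on windows.h in the convertor and is left out here so that the sketch stays platform independent.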
The file system used to implement bootstrap and to present the output is shown below:

Website
    CSS
        bootstrap.css
        style.css
    JS
        bootstrap.js
        jquery.js
    Media
        .jpg, .gif, .mp3, ...
    index.html

FIGURE 2 THE OUTPUT FILE HIERARCHY


After the conversion of the code, the HTML output is displayed using the default browser.

4.2 Tokenizing ALANG:


The tokenizing or scanning part of the convertor is among the most time-consuming and
complex parts of the program. Even though ALANG has only a handful of tokens, the permutations
in which they can be used make it difficult to scan and interpret the tokens and what each of
them means in its context. In addition to the various possible uses of the tokens, many
of them can be nested, which calls for a special approach to the problem. The best solution to such a
problem would be to use regular expressions, as most compilers and parsers would. But
due to the lack of (almost no) support for regex in the standard C++ library available to us, a new text processing
method had to be designed for this particular task. This was achieved by using two classes:
one to traverse the whole text, and the other to scan the tokens that are encountered,
strip the tags and pass the plain text to the conversion class.
The findall class has the methods to traverse the whole input file character by
character just once and find all the tags. This class has two methods. The first, end_s(),
searches the input until the end of a particular tag/element is found. The second is a
pure virtual method, start_s(), that starts the search and recognises which token represents which tag. The
reason for making it virtual is that the tags cannot be recognized until the meaning of each tag
is known, and this class is designed purely to traverse the input string and not to recognize the
various tags used.
For the recognition of the tokens and their meaning, a second class was derived which
inherits all the properties of the findall class. The class was named Tags, and it defines the pure virtual
method start_s(). If one observes the code, one will recognize that start_s() and
end_s() call each other recursively. This is done to support the nesting of elements within
one another. When there are several elements within one another, the innermost element is
detected and converted first, then the element surrounding it, and so on. This is achieved using
a stack. The system's call stack is used implicitly to hold the scanned and tokenized string
until the end of the element is found. First the innermost, i.e. the smallest, element is converted;
the converted text replaces it in place, and control is transferred to the outer element to scan for
other elements, if any, inside it. If none are found, the outer element that contains the
converted element is passed to the converting function to be converted to its HTML equivalent.
As the starting and ending tags (tokens) are the same for most of the elements, a flag is maintained
for each element to check whether a token marks the start or the end of the element.
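
The mutual recursion described above can be sketched in a few lines of C++. This is a simplified stand-in and not the convertor's findall/Tags code: the free functions scanUntil and scanElement only play the roles described for start_s() and end_s(), wrap is a hypothetical substitute for the conversion class, and the closing parameter takes the place of the per-element flags.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the conversion class: wraps content in the HTML
// tag that corresponds to an ALANG marker character.
std::string wrap(char marker, const std::string& content) {
    switch (marker) {
        case '*': return "<strong>" + content + "</strong>";
        case '~': return "<em>" + content + "</em>";
        case '`': return "<pre>" + content + "</pre>";
        default:  return content;
    }
}

bool isMarker(char c) { return c == '*' || c == '~' || c == '`'; }

std::string scanUntil(const std::string& in, std::size_t& pos, char closing);

// Role of end_s(): consumes one element opened by `marker`, including its
// closing marker, and returns the converted HTML.
std::string scanElement(const std::string& in, std::size_t& pos, char marker) {
    ++pos;                                                  // skip the opening marker
    const std::string inner = scanUntil(in, pos, marker);   // recurses for nested elements
    if (pos < in.size()) {
        ++pos;                                              // skip the closing marker
    }
    return wrap(marker, inner);
}

// Role of start_s(): copies plain text and hands every opening marker to
// scanElement(). `pos` is shared by reference, so each input character is
// visited only once.
std::string scanUntil(const std::string& in, std::size_t& pos, char closing) {
    std::string out;
    while (pos < in.size() && in[pos] != closing) {
        if (isMarker(in[pos])) {
            const char marker = in[pos];
            out += scanElement(in, pos, marker);
        } else {
            out += in[pos];
            ++pos;
        }
    }
    return out;
}

// Converts all span-level elements of one line.
std::string convertSpans(const std::string& in) {
    std::size_t pos = 0;
    return scanUntil(in, pos, '\0');
}
```

Because the innermost call returns first, the innermost element is converted first and its HTML becomes part of the content of the surrounding element, which matches the order described above. An element that is never closed simply runs to the end of the input in this sketch.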
start_s() has switch cases for each of the tags. Whenever a character matching one of the
tokens is encountered, the corresponding case is triggered and the flag for that element is set to
true, indicating that the start of the element was found. Next, if the tag supports nested tags,
the next tag is searched for; otherwise the processed text is converted to its equivalent HTML.
This is done by passing the processed string, i.e. the string that holds the
contents of the tag, to a method of the convert class that adds the equivalent HTML
tags to the content and returns the resulting string. This string then replaces the corresponding part of the output string,
which will hold the HTML equivalent of the code. In the case of images, along with conversion, the
source files, i.e. the media files mentioned in the code, are copied to a folder named media, which
acts as the source of the images.
Once the processing is completed, the string is passed to the file handling classes, where
it is written into the index.html file.

FIGURE 3 CLASS DIAGRAM FOR SCANNING THE INPUT


[Figure 4: flow chart. Start; find the first token; make the flag true for that token; test
whether the element is a spanning element. On the YES branch, the content is stored until the
end of the token and then sent to the convert methods. On the NO branch, until the end of the
token is reached, a recursive call to start is made to find the next element, the flag is made
true for that token, and the content of the element is then sent to the convert methods.]

FIGURE 4 FLOW CHART FOR SCANNING INPUT

4.3 Code Convertor


Bootstrap (v3) is an open source HTML framework, released by Twitter, which contains a
fixed set of CSS, JS and font files. These can be used as a basic skeleton to develop websites
which comply with web standards, while also offering cross-compatibility among the
rendering engines of the numerous web browsers.
Bootstrap is used throughout the conversion of ALANG to HTML. Here the processed
input, stripped of its ALANG tokens, is wrapped in the equivalent HTML tags. This
is achieved by combining the input from the lexical analyser with the various methods
defined in the classes.
The main technique used to achieve the conversion is a switch statement with a case for each tag,
linking a specific character to a tag and then converting it to the respective HTML code.
The lexical analyser sends the detected tag along with the content of the tag. Control
moves to the matching case, the equivalent HTML tags are added to the text, and the result is returned to
the scanning class methods.
The code convertor also parses variables sent along with the tag.
Syntax for variables:
{ALANG_IDENTIFIER}text_inside[variable_name]{ALANG_IDENTIFIER}
Example:
*bold text [boldvar]*
Variables have numerous uses. The variable is first converted into a div tag, with the variable
name as the id of the div tag. This enables the programmer to apply CSS styles to that
particular element only, using the CSS identifiers in ALANG.
This is particularly useful with Bootstrap, since these variables can be given names such as
.col-xx-xx, which can then be used to make the website responsive.
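
A minimal sketch of the variable handling is given below. It assumes that the scanner has already stripped the surrounding ALANG identifiers, so the helper receives content such as "bold text [boldvar]". The names TaggedContent, splitVariable and wrapInDiv are hypothetical, and the exact placement of the div relative to the converted tag is our assumption.

```cpp
#include <cstddef>
#include <string>

// Result of separating a trailing [variable] from the element content.
struct TaggedContent {
    std::string text;      // content with the trailing [variable] removed
    std::string variable;  // empty when no variable was given
};

// Hypothetical helper: splits "bold text [boldvar]" into its text and its
// variable name. Content without a trailing [name] is returned as-is.
TaggedContent splitVariable(const std::string& content) {
    TaggedContent result{content, ""};
    if (content.empty() || content.back() != ']') {
        return result;
    }
    const std::size_t open = content.rfind('[');
    if (open == std::string::npos) {
        return result;
    }
    result.variable = content.substr(open + 1, content.size() - open - 2);
    result.text = content.substr(0, open);
    return result;
}

// Wraps already converted HTML in a div whose id is the variable name.
std::string wrapInDiv(const std::string& html, const std::string& variable) {
    if (variable.empty()) {
        return html;
    }
    return "<div id=\"" + variable + "\">" + html + "</div>";
}
```

With the id in place, a CSS block declared for boldvar at the end of the ALANG document can target that element alone.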
The code conversion has a method to parse tables in ALANG into their equivalent in HTML.
The parsing method moves from one | symbol to the next, using two pointers, and outputs the
content between them into the same row until a new line is encountered. The conversion of ALANG
tables is rather strict. If the syntax is wrong anywhere, the
program simply ignores the whole table and returns an error statement.
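
The two-pointer traversal of a table row can be sketched as follows. This is an illustration under our own assumptions: the function name rowToHtml is hypothetical, indices are used in place of pointers, and an empty return value stands in for the error statement.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts one ALANG table row such as "|a|b|" into
// "<tr><td>a</td><td>b</td></tr>". Passing "th" as `cell` produces header
// cells. A row that does not start and end with '|' is rejected by
// returning an empty string.
std::string rowToHtml(const std::string& row, const std::string& cell = "td") {
    if (row.size() < 2 || row.front() != '|' || row.back() != '|') {
        return "";
    }
    std::string html = "<tr>";
    std::size_t left = 0;                         // the '|' that opens the cell
    std::size_t right = row.find('|', left + 1);  // the '|' that closes the cell
    while (right != std::string::npos) {
        html += "<" + cell + ">" + row.substr(left + 1, right - left - 1) +
                "</" + cell + ">";
        left = right;                             // the closing '|' opens the next cell
        right = row.find('|', left + 1);
    }
    return html + "</tr>";
}
```

Cell contents are copied as they are, including any blanks around them. A full table method would call such a helper once for every line between #t and %.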
When an image is encountered, not only are the ALANG tags replaced, but the image files are also
copied to specific folders in the website directory using the functions of the file handling
classes.

Chapter 5 Efficiency of the implementation

The main job of the convertor is to scan and convert the code. As writing to and reading
from files depend on the hardware and the operating system, the analysis was restricted to
the scanning and code conversion algorithm. Readability was also treated as a part of
efficiency; the readability of the code has been improved drastically, as seen in the
previous chapters.
Two methods were used to find the efficiency of the algorithm developed. One was a
mathematical analysis of the code; the other used the system clock to verify the efficiency
of the program.

5.1 Mathematical analysis of the scanning and converting algorithm [5]


The analysis of the algorithm was straightforward. Two main methods are used to traverse
the input: start_s() and end_s(). Both take the character index by reference, so the whole
input is traversed only once. The whole input could therefore be treated as one looping
statement with the conversion call as the basic operation. The recursive calls used to find
nested tags, however, add another layer to the analysis: as the number of nested tags
increases, the number of recursive calls also increases. Two separate analyses were
therefore carried out, one for each of these cases. The analysis of an increase in input
size without an increase in the number of nested tags was simple, as it deals only with
traversal. This outer loop was found to belong to the efficiency class Θ(n), as the whole
function can be considered to be built from one large looping statement.
Recursive calls on modern computers are so fast that the time required to move from one
segment to the next remains almost constant as the nesting depth increases, even for a
large number of recursive calls. Passing a reference to the input string index, rather than
a copy of the remaining input, also makes the recursion more effective and less time
consuming.
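The effect of passing the index by reference can be illustrated with the sketch below. It is not the start_s() and end_s() code itself: the function scan, the counter visits and the '_' (italic) token are assumptions introduced for this example. Each nested call resumes at the position where its caller stopped, so every character is examined exactly once regardless of the nesting depth.

```cpp
#include <string>

// Hypothetical recursive scanner. `pos` is passed by reference, so a nested
// call continues where its caller stopped and no character is read twice.
// `closing` is the token that ends the current element ('\0' at top level);
// `visits` counts the characters examined.
std::string scan(const std::string& in, std::size_t& pos, char closing,
                 std::size_t& visits) {
    std::string out;
    while (pos < in.size()) {
        char c = in[pos++];
        ++visits;
        if (c == closing) return out;  // end of the current element
        if (c == '*')
            out += "<b>" + scan(in, pos, '*', visits) + "</b>";
        else if (c == '_')
            out += "<i>" + scan(in, pos, '_', visits) + "</i>";
        else
            out += c;
    }
    return out;
}
```

With both pos and visits starting at zero, visits equals the length of the input when the top-level call returns, which is the basic-operation count behind the linear estimate.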

5.2 Analysis using the system clock


The code was also timed in order to verify the findings of the mathematical analysis.
A system clock was started before the execution of the algorithm and ran until the end
of the algorithm. The time for a particular input size was noted down. The input size
was then gradually increased and the corresponding times were noted down. A graph of
these values was plotted and was found to be almost linear in nature.
A similar approach was designed to find the time efficiency of the recursive function.
Even with fifty elements nested one inside another, the execution time did not change
noticeably from that of a single nested element, thus verifying the mathematical
analysis.
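A measurement of this kind could be set up as in the sketch below, which uses std::chrono::steady_clock from the C++11 standard library. The report does not say which clock was used, so this is an assumption; convert is only a linear stand-in for the real converter, and make_input and time_conversion are hypothetical names.

```cpp
#include <chrono>
#include <string>

// Stand-in for the real converter (hypothetical): one linear pass that
// turns each pair of '*' tokens into <b> ... </b>.
std::string convert(const std::string& in) {
    std::string out;
    bool open = false;
    for (char c : in) {
        if (c == '*') {
            out += open ? "</b>" : "<b>";
            open = !open;
        } else {
            out += c;
        }
    }
    return out;
}

// Builds an input made of `times` copies of a base block.
std::string make_input(const std::string& base, int times) {
    std::string in;
    for (int i = 0; i < times; ++i) in += base;
    return in;
}

// Starts the clock, runs the conversion once and returns elapsed seconds.
double time_conversion(const std::string& input) {
    auto start = std::chrono::steady_clock::now();
    std::string html = convert(input);
    auto stop = std::chrono::steady_clock::now();
    (void)html;  // the result is only needed to make the call happen
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_conversion on make_input(base, k) for increasing k gives the pairs of input size and time that can be plotted. Averaging several runs per size would reduce the noise of a single measurement.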

[Graph: Time efficiency of ALANG to HTML conversion with a constant number of nested tags.
x-axis: size of input in multiples of n (0 to 60); y-axis: time in seconds (0 to 1.6).]

FIGURE 5 EFFICIENCY
The above graph shows that the running time of the conversion algorithm grows linearly with
the size of the input. Some of the other parsers used to convert Markdown to HTML are
exponential in nature, which makes this convertor comparatively efficient and quick.

FIGURE 6 INPUT SCREEN

FIGURE 7 PART OF THE OUTPUT SCREEN

Chapter 6 Future development


Even though a lot of work has gone into making the project as complete as possible, there
are still some improvements that can be made.

6.1 Using regex instead of character-wise scanning


As previously mentioned, the C++ standard library available to us lacks support for
regex. Other libraries, and the C++ standard library from C++11 onwards (the <regex>
header), do support it. Using regex may make the program more efficient and would remove
several limitations of the present character-by-character scanning method. Various
advanced features and several other HTML equivalents could be supported as well.
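As an illustration of what a regex-based conversion might look like, the sketch below uses std::regex and std::regex_replace from the <regex> header of C++11. Only the bold token '*' is handled, the choice of <b> as the output tag is an assumption, and the function name bold_with_regex is hypothetical.

```cpp
#include <regex>
#include <string>

// Converts every *text* span into <b>text</b> with one regular expression.
// The pattern matches a '*', then one or more characters that are not '*'
// (captured as group 1), then the closing '*'.
std::string bold_with_regex(const std::string& in) {
    static const std::regex bold(R"(\*([^*]+)\*)");
    return std::regex_replace(in, bold, "<b>$1</b>");
}
```

One pattern of this kind per tag could replace the corresponding case of the character-wise scanner, although nested tags would still need separate handling.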

6.2 Error detection and proper message passing


Due to limited time, this feature could not be implemented. With error detection, the
user can be warned about potentially missing tags or about the use of wrong tags. This
improves the user experience and makes the program easier to work with. One way to
achieve this is through the use of exceptions and flags: a counter can keep track of the
current line number of the input, and the line number can be passed along with the
exception when an error occurs.
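The idea of a line counter combined with exceptions and flags can be sketched as follows. Since the feature was not implemented, everything here is hypothetical: the class AlangError, the functions check_bold_tokens and first_error_line, and the rule that a '*' must be closed on the line where it was opened.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type that carries the line number of the error.
class AlangError : public std::runtime_error {
public:
    AlangError(const std::string& msg, int line)
        : std::runtime_error("line " + std::to_string(line) + ": " + msg),
          line_(line) {}
    int line() const { return line_; }
private:
    int line_;
};

// Throws AlangError if a '*' is still open at the end of a line.
// (Assumed rule: a bold element may not span several lines.)
void check_bold_tokens(const std::string& input) {
    int line = 1;       // counter for the current line
    bool open = false;  // flag: inside an unclosed '*'
    for (char c : input) {
        if (c == '*') {
            open = !open;
        } else if (c == '\n') {
            if (open) throw AlangError("unclosed '*'", line);
            ++line;
        }
    }
    if (open) throw AlangError("unclosed '*'", line);
}

// Returns the line number of the first error, or 0 if the input is valid.
int first_error_line(const std::string& input) {
    try {
        check_bold_tokens(input);
    } catch (const AlangError& e) {
        return e.line();
    }
    return 0;
}
```

The message returned by what() already contains the line number, so it could be shown to the user directly.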

6.3 Use of Other Frameworks


This project makes use of the Bootstrap framework, as previously explained. ALANG
programmers need not, however, restrict themselves to it. By changing the header and
footer strings of the class provide_htmlcss, anyone can switch to another framework
such as Skeleton or Zurb Foundation. Care must also be taken to replace the files at
C:\website with those of the framework being used.
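The sketch below shows how header and footer strings of this kind might be parameterised by the stylesheet of the chosen framework. It does not reproduce provide_htmlcss: the struct PageTemplate, the functions make_template and wrap_page, and any stylesheet file name passed to them are assumptions.

```cpp
#include <string>

// Hypothetical stand-in for the header and footer strings of provide_htmlcss.
struct PageTemplate {
    std::string header;
    std::string footer;
};

// Builds a template whose header links the stylesheet of the chosen
// framework. The file name is supplied by the caller.
PageTemplate make_template(const std::string& css_file) {
    PageTemplate t;
    t.header = "<!DOCTYPE html>\n<html>\n<head>\n"
               "<link rel=\"stylesheet\" href=\"" + css_file + "\">\n"
               "</head>\n<body>\n";
    t.footer = "</body>\n</html>\n";
    return t;
}

// Places the converted ALANG output between the header and the footer.
std::string wrap_page(const PageTemplate& t, const std::string& body) {
    return t.header + body + t.footer;
}
```

Switching frameworks would then amount to calling make_template with a different stylesheet name, for example a Skeleton file instead of a Bootstrap one, and copying the matching files into the website directory.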

6.4 Cross-OS Compatibility


Time constraints have restricted the program to running only on Windows. Since this was
the OS on which development began, we decided to stay with it. However, a few changes
to the header files and the directory structure should be enough to port this software
to Linux, UNIX, Mac and almost all other POSIX-compliant OSes, in addition to
Windows.
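One common way to keep the directory structure portable is conditional compilation on the predefined macro _WIN32, as sketched below. The path C:\website is the one named in this report; the path used on other systems and the function names website_root and join_path are assumptions.

```cpp
#include <string>

// Returns the directory that receives the generated website.
// "C:\\website" comes from the report; "./website" is an assumed choice
// for non-Windows systems.
std::string website_root() {
#ifdef _WIN32
    return "C:\\website";
#else
    return "./website";
#endif
}

// Joins a directory and a file name with the separator of the host OS.
std::string join_path(const std::string& dir, const std::string& file) {
#ifdef _WIN32
    const char sep = '\\';
#else
    const char sep = '/';
#endif
    return dir + sep + file;
}
```

Keeping every path behind two such functions would limit the OS-specific code to one place.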

Reference
[1] A. Aiken, Cloud front, [Online]. Available:
https://d2bk0s8yylvsxl.cloudfront.net/stanford-compilers/slides/01-03-the-economy-of-programming-languages.pdf. [Accessed 26 March 2015].
[2] LaTeX, LaTeX intro, LaTeX, 09 Feb 2008. [Online]. Available: http://latex-project.org/intro.html. [Accessed 26 March 2015].
[3] A. Ranta, Specifying the lexer, in Implementing Programming Languages, 2012.
[4] Wikipedia, Markdown, [Online]. Available: http://en.wikipedia.org/wiki/Markdown. [Accessed 26 March 2015].
[5] A. Levitin, Introduction to the Design and Analysis of Algorithms, Delhi: Pearson, 2009.
