
Introduction to the Course

Welcome to the Course!


Let's say that you want to gather information about a particular
website for some kind of security analyst job. You could go into the
terminal and start gathering all the information by hand, but it is
going to take a lot of time: finding the IP address of the website,
getting an Nmap scan, the robots.txt, the whois. This is what I was
actually doing until a little while ago, and I realised that for every
single website I scan I am doing the same thing over and over again.
So why not just build a tool in Python that lets me do it all in a
single click? What the tool is going to do is simple: you type in the
website URL, like facebook.com, hit GO, and it grabs all the
information for you. So it's pretty cool. We will be scanning the
website and storing the results. Most people probably don't have MySQL
installed on their system, so I will teach you how to save the data in
a simple text file, which is easily readable by anyone.

General.py
We will create a new file called general.py. Inside it I am going to
write two really simple functions: one to create a directory and
another to write to a file. Then, whenever we start building our
little tools, we can save the results easily using this file.
Let's start by importing os!
=> import os

Basically, I now want to create a function to make a new directory,
because let's say that we have a bunch of targets or are scanning a
bunch of websites; I want to store all the results in their own
directories. So I have a directory for youtube, one for ebay, and so on.
=> def create_dir( directory ):
Now we have to check whether the directory already exists. Let's say
we are scanning a list of 100 websites; maybe we have already scanned
some of them and we do not want to scan them again.
=> if not os.path.exists ( directory ):
So basically we are only going to create this folder if it has not
been created yet. Simple enough! So,
=> os.makedirs ( directory )
Let's say that we pass in ebay. It is going to check: did we create
this folder yet? No! Then I am going to create it. If it is already
created then you don't have to do anything. That was the easiest
function in the world.
This next one just writes a simple file, so I am going to call it
write_file.
=> def write_file ( path, data ):
Path is where you want to write it and data is what you want to
write. So the first thing I am going to do is open the file at path in
writing mode.
=> f = open ( path, "w" )
=> f.write ( data )
=> f.close ()

So all this does is take the path, essentially where we want to write
it, what folder, what location, and also what you want in the file.
That is all we need to do for general.py. In the next tutorial we are
going to start with the fun stuff and make the actual tools.
Full source code for general.py
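Putting the pieces from this section together, general.py is just a short sketch like this:

import os

# Create the directory for a target only if it does not exist yet
def create_dir(directory):
    if not os.path.exists(directory):
        os.makedirs(directory)

# Write data to the file at the given path (overwrites existing content)
def write_file(path, data):
    f = open(path, "w")
    f.write(data)
    f.close()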

Top Level Domain Name


All right guys! Welcome back, and I am going to show you how to get the
top level domain for a website.
Now if you don't know what the top level domain is, it's basically a
small part of the URL. Let's understand this with an example.

https://www.facebook.com/
This is the full URL, but when we talk about the top level domain name
we only mean facebook.com. Not the protocol, not the www, not the
directory at the end; it's only facebook.com in this case.

At first I thought the user is going to post a URL and then we are just
going to strip off the extra part which is not needed. For this we are
going to use a Python module. First, let's open a terminal and try the
whois command with a full URL.
=> whois https://www.facebook.com
It will not show the result.

You can easily see the error: No whois server is known for this kind of
object. whois only works with a top level domain name. Now let's try it
with a top level domain.
=> whois facebook.com
Now it will show all the results. So now let's get to work. We will
create a new file called domain_name.py. You need to go ahead and
import tld; from tld we can import get_tld.
=> from tld import get_tld
If you don't know how to install this, you can do a pip or a manual
installation. Let's see how you can install pip and tld.
=> sudo apt-get install python-pip

So this has successfully installed pip and now we will install tld using
pip.
=> pip install tld

This has installed the Python module tld.


Let's get back to the domain_name.py file. Now I am going to make a
function called get_domain_name and pass in the url.
=> def get_domain_name ( url ):

So essentially what the user is going to pass in is the full URL. Now
we are going to strip the extra part off the full URL to get the top
level domain.
=> domain_name = get_tld ( url )
get_tld only accepts a single parameter, which is the full URL of the
website, and then we just return the top level domain, i.e. the
domain name.
=> return domain_name
So again, what this function does is: you pass in a URL and it gives
you the plain top level domain name. Just so we can verify it, let's run
=> print ( get_domain_name ( "https://www.facebook.com" ) )
Alright, let's run this real quick and check it out. We just passed in
the full URL and it returned the top level domain name.
Now we can allow the user to pass in any URL and we can extract the
top level domain. Looking good, see you guys in the next tutorial.

Full code for domain_name.py
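Assembled from the snippets above, domain_name.py is a sketch along these lines:

from tld import get_tld

# Strip a full URL down to its top level domain,
# e.g. https://www.facebook.com/ -> facebook.com
def get_domain_name(url):
    domain_name = get_tld(url)
    return domain_name

# Quick check:
# print(get_domain_name("https://www.facebook.com"))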

IP Address
Now that we have the top level domain of the target, the next thing we
need is the IP Address of that website, and I will show you guys what I
mean. Now I am pretty sure that there is an easier way to do this, but
this is how I do it.
So in the terminal if you type
=> host facebook.com

or any other top level domain and hit enter, it returns the IP Address.
Now the thing is, we can't just take these results and store them in a
text file, because we only care about the IP Address, not the whole
result. So what I am going to do is run this command through Python and
then extract the IP Address from the whole result.
Let's make a new file ip_address.py.

We are going to import os, which allows us to make operating system
calls and use the command line or the terminal through Python.
=> import os
=> def get_ip_address ( url ):
We are passing in an argument which is the top level domain name. Now
the command that we are going to run is:
=> command = "host " + url
Now, in order to actually run that command and get the results back,
we are going to pretty much open up a new process.
=> process = os.popen ( command )
So this is going to run a new process; just think of it like opening a
new terminal. We are storing the result in the variable called process.
What we need to do after that is remove the extra part from the result,
as we only need the IP Address.
We are going to write :
=> results = str ( process.read () )
All we are doing here is converting the output to a string. Now what I
am going to do from here is make a marker like this:
=> marker = results.find ( "has address" ) + 12

Let's understand what this method does. find will look into the string
results and return the index of the first character of "has address".
The phrase "has address " (with the trailing space) is 12 characters
long, so we move 12 characters ahead to land on the first character of
the IP Address we are after.
=> return results[marker:].splitlines()[0]
The reason I am doing this is that a domain name can have multiple IP
Addresses, like google.com:
=> host google.com

( inside terminal )

We do not want all the IP Addresses, we only want the top one. So we
use the splitlines method and take the first line, which gives us only
the first IP Address.
So now, let's verify whether this works.
=> print ( get_ip_address ( "google.com" ) )
=> print ( get_ip_address ( "facebook.com" ) )
Let's run this in the terminal.

We have got the IP Address of google.com and facebook.com.


So it does not matter whether the result is one IP Address or more; we
wrote a method in Python that extracts only the first IP Address of the
website. We can now use this in our other scanning tools. See you in
the next section.
Source code for ip_address.py
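Here is the whole file as a sketch assembled from the snippets above; it shells out to the host command, so it assumes host is available on your system:

import os

# Run the "host" command on a top level domain and pull out
# the first IP address from its output
def get_ip_address(url):
    command = "host " + url
    process = os.popen(command)
    results = str(process.read())
    # "has address " is 12 characters long, so the IP starts
    # 12 characters after the start of that phrase
    marker = results.find("has address") + 12
    return results[marker:].splitlines()[0]

# Quick check:
# print(get_ip_address("google.com"))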

Nmap Port Scan


Alright guys, so now that we have the IP Address of a target, of a
server, whatever, what I want to do now is show you guys how to run an
Nmap scan from Python.
If you don't know what Nmap is, it is a tool that allows you to scan a
server and find out what processes are running and what ports are
open.
So for example
=> nmap -F 54.186.250.79

( terminal )

Now you can see the results. This server is running ssh, http and
https. Nmap can also tell you whether the server is running FTP or
MySQL, for example if they have a database running on it, and a bunch
of other good information. But what we want to do is run this from
Python. There are a bunch of options with which we can run this tool,
so we are going to have an additional parameter for the options.
So the function we are going to create will take two arguments: the
first one is any options the user wants to use and the second one is
the target IP. Let's make a new file called nmap.py.
=> import os
=> def get_nmap ( options, ip ):
So we are actually going to be passing in the options and the target
IP Address to this function as parameters.

=> command = "nmap " + options + " " + ip


Now take a look: what if they don't include any options? That is going
to be fine, because options will be an empty string and it will run the
command without any option.
The next thing we want to do is of course start a new process.
=> process = os.popen( command )
=> results = str( process.read () )
So here we are building the process and then converting it to a string.
Now we only have to return those results.
=> return results
You can actually parse the results if you want, but there is no special
need so I am returning the whole result as it is.
Let's verify whether this works or not.
=> print( get_nmap( "-F", "54.186.250.9" ) )

So it runs a scan and it returns the result. Later on we are going to
save all this to a text file. For now, this is how we run an Nmap scan
with Python. I will see you in the next section.
Source Code of nmap.py
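The whole file, assembled from the snippets above, is a sketch like this (it assumes nmap is installed and on your PATH):

import os

# Run an nmap scan with the given options against the target IP
# and return the raw output as one big string
def get_nmap(options, ip):
    command = "nmap " + options + " " + ip
    process = os.popen(command)
    results = str(process.read())
    return results

# Quick check:
# print(get_nmap("-F", "54.186.250.79"))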

Robots.txt
Alright guys, welcome back, and in this video I am going to show you
how to build a Python tool to fetch a website's robots.txt file. Now if
you don't know what a robots.txt file is, here it is!
Whenever you make a website, a bunch of search engines like Google and
Yahoo are going to crawl it: their crawlers go page by page and store
what they find in the search engine's index. So whenever people type in
the website name, all the results pop up: profile pages, forums, etc.
Now the problem with this is that when you are developing a website
there are some pages that you don't want Google or Yahoo to crawl. Some
examples would be the admin login page, maybe some sensitive areas, or
maybe some moderator panels. So for a lot of the private areas of the
website, you want to make sure that Google does not crawl them. What
you can do is make a special file called robots.txt and upload it to
your server; usually what web developers do is list all the files that
they do not want Google to crawl, and then Google ignores them.
Now the cool thing is, whenever you are analysing a website for
security issues, one of the first files that you always go to is that
robots.txt file.
Why is that?
Because the developers have basically said: Hey Google, don't crawl
these, because people shouldn't be looking at them. So we can look at
the file and see exactly which areas are sensitive.
So let's make a new file called robots_txt.py.
=> import urllib.request
All this does is allow us to make a request to a URL, like a GET
request. So basically it downloads files from the internet. We also
need to import a package called io.
=> import io
This is just for encoding so that we can ensure we are getting our data
in a readable format.
=> def get_robots_txt ( url ):
So here we are going to pass in a URL
=> if url.endswith( "/" ):
=>     path = url
=> else:
=>     path = url + "/"

So what we basically did is: if the URL ends with a forward slash it
stays as it is, and if it does not, we add a forward slash at the end
of the URL.
So now we have to make the request for that file; we are going to
request a file from the internet.
=> req = urllib.request.urlopen( path + "robots.txt", data = None )
So this function is going to open that file, i.e. robots.txt, and store
the result in the variable req. Now we have to make sure that our data
is encoded properly.
=> data = io.TextIOWrapper( req, encoding = "utf-8" )
Now let's just return whatever those results were.
=> return data.read()
So again, all we are doing is passing in a URL, getting the robots.txt
file from that website, and then returning the data.
Let's verify whether it works or not.
=> print ( get_robots_txt( "https://www.reddit.com/" ) )

So this is reddit.com's robots.txt.


So there you go, looking beautiful, and you guys can play with this for
different websites. In the next section I am going to show you how to
get the whois for a website. See you next time!
Source code for robots_txt.py
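Assembled from the snippets above, robots_txt.py is a sketch along these lines:

import urllib.request
import io

# Download and return the robots.txt file of a website
def get_robots_txt(url):
    # Make sure the URL ends with a forward slash
    if url.endswith("/"):
        path = url
    else:
        path = url + "/"
    req = urllib.request.urlopen(path + "robots.txt", data=None)
    # Wrap the response so the bytes are decoded as UTF-8 text
    data = io.TextIOWrapper(req, encoding="utf-8")
    return data.read()

# Quick check:
# print(get_robots_txt("https://www.reddit.com/"))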

Whois
Welcome back, and in this section I am going to show you how to get
the whois of a website's domain. Now if you don't know what whois is,
it is a tool that gives you information about who registered a domain
name. So if you pass it any domain name, i.e. the top level domain
name, it will give you the relevant information. We also did this a few
tutorials back.
=> whois facebook.com

( terminal )

It will give you information about who registered the domain and who
they registered it through, and if they didn't choose domain name
privacy it will also give information like their phone number, their
address, and a bunch of other personal information.

Another thing I want to point out: whenever you are making a website
and you buy a domain name, there is always going to be an option that
says: Do you want to buy domain name privacy? It's going to be like 10
bucks a year or something, but it's worth it, because you don't have to
show your personal information to the world.
So now let's make a new file whois.py.
=> import os
=> def get_whois ( url ):
=> command = "whois " + url
=> process = os.popen( command )
=> results = str( process.read() )
=> return results
This is the simplest tool we have built yet, and you are familiar with
everything we have done here, because we have been doing the same kind
of thing for the last few sections.
Let's see if it's working.
=> print( get_whois( "reddit.com" ) )

So this is all the information about the domain reddit.com. You can
try this for any website you want. This is how whois is done using
Python; see you in the next section.
Source Code for whois.py
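The whole file, assembled from the snippets above, is a sketch like this (it assumes the whois command-line tool is installed):

import os

# Run the whois command against a top level domain and return the output
def get_whois(url):
    command = "whois " + url
    process = os.popen(command)
    results = str(process.read())
    return results

# Quick check:
# print(get_whois("reddit.com"))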

The Final Program


Alright guys, and welcome back. The last thing we need to do for the
program, now that we have all the individual tools to find the domain
name, IP Address, Nmap scan, robots.txt and whois, is to make a
function that runs all of them with a single click. Then all the user
has to do is give a URL and hit Run, and it gathers all of that
information automatically.
Let's first import all of the tools that we have created.
=> from general import *
=> from domain_name import *
=> from ip_address import *
=> from nmap import *
=> from robots_txt import *
=> from whois import *
Now I will make a new directory to store all of our results.
=> ROOT_DIR = "companies"
You can name this companies, targets, projects, or whatever you want,
but it is just going to be a separate folder, and every website we scan
will be stored inside this directory.
=> create_dir ( ROOT_DIR )

Now I will make a new function that the user is going to call.


=> def gather_info ( name, url ):
We are passing in the name, i.e. the name of the website or the
company, and then the URL of that website.
Now what we are going to do is call all the different tools; each will
run a scan and we will then save all those results to different files.
Let's call all those tools.
=> robots_txt = get_robots_txt ( url )
=> domain_name = get_domain_name ( url )
=> whois = get_whois ( domain_name )
=> ip_address = get_ip_address ( domain_name )
=> nmap = get_nmap ( "-F", ip_address )
So far we have run all those tools and saved their results in
different variables. Now we can make one more function, and all this
function does is take that text and write it to text files.
=> create_report( name, url, domain_name, nmap, robots_txt, whois )
Now let's create the function
=> def create_report ( name, url, domain_name, nmap, robots_txt, whois ):

Now remember that each new website you scan is essentially a new
project, and you want to save it inside its own folder. So what we can
do is:
=> project_dir = ROOT_DIR + "/" + name
=> create_dir( project_dir )
Remember that we created the create_dir function in our general.py
file in the first section of the course.
Now we only have to write the content to each file. We will use the
write_file function, which is also part of the general.py file.
=> write_file( project_dir + "/full_url.txt", url )
=> write_file( project_dir + "/domain_name.txt", domain_name )
=> write_file( project_dir + "/nmap.txt", nmap )
=> write_file( project_dir + "/robots.txt", robots_txt )
=> write_file( project_dir + "/whois.txt", whois )
And I think this is it. Let's now call the function.
=> gather_info( "google", "https://www.google.com" )
Now let me run it and check whether it's working or not. It will take a
bit of time to run.

So, now we have all the details. We are saving a heck of a lot of time
by running this tool with a single click.
Source Code for main.py
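Putting everything from this section together, main.py ends up as a sketch roughly like this:

from general import *
from domain_name import *
from ip_address import *
from nmap import *
from robots_txt import *
from whois import *

# Every scanned website gets its own folder inside this directory
ROOT_DIR = "companies"
create_dir(ROOT_DIR)

# Run every tool against the target and hand the results to create_report
def gather_info(name, url):
    robots_txt = get_robots_txt(url)
    domain_name = get_domain_name(url)
    whois = get_whois(domain_name)
    ip_address = get_ip_address(domain_name)
    nmap = get_nmap("-F", ip_address)
    create_report(name, url, domain_name, nmap, robots_txt, whois)

# Save each result to its own text file inside the project folder
def create_report(name, url, domain_name, nmap, robots_txt, whois):
    project_dir = ROOT_DIR + "/" + name
    create_dir(project_dir)
    write_file(project_dir + "/full_url.txt", url)
    write_file(project_dir + "/domain_name.txt", domain_name)
    write_file(project_dir + "/nmap.txt", nmap)
    write_file(project_dir + "/robots.txt", robots_txt)
    write_file(project_dir + "/whois.txt", whois)

gather_info("google", "https://www.google.com")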
