General.py
We will create a new file called general.py. In here I am going to make two really simple functions: one to create a directory and another one to just write to a file. Then whenever we start building our little tools, we can save the results easily using this file.
Let's start by importing os!
=> import os
All this does is take the path where we want to write, that is, which folder and location, along with what you want in the file, and that is all we need for general.py. In the next tutorial we are going to start with the fun stuff and make the actual tools.
Full source code for general.py
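The finished file is only linked from the course, so here is a minimal sketch of what general.py could look like based on the description above; the function names create_dir and write_file are the ones used later in the course, the parameter names are my own.

```python
import os

# Create a directory if it does not already exist.
def create_dir(directory):
    if not os.path.exists(directory):
        os.makedirs(directory)

# Create (or overwrite) a file at the given path and write data to it.
def write_file(path, data):
    with open(path, 'w') as f:
        f.write(data)
```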
https://www.facebook.com/
=> This is the full URL, but when we talk about the top-level domain name we only mean facebook.com. Not the protocol, not the www, not the directory at the end; in this case it is only facebook.com.
At first I thought the user is going to post a URL and then we are just going to strip off the extra part which is not needed. For this we are going to use a Python module. First let's open a terminal and try the whois command.
=> whois
https://www.facebook.com
It will not show the result.
You can easily see the error: "No whois server is known for this kind of object". whois only works with a top-level domain name, so now let's try with one.
=> whois facebook.com
Now it will show all the results. So let's get to work. We will create a new file called domain_name.py. You need to go ahead and import tld, and from tld we can import get_tld:
=> from tld import get_tld
If you don't know how to install this, you can do a pip or a manual installation. Let's see how to install pip and then tld.
=> sudo apt-get install python-pip
So this has successfully installed pip and now we will install tld using
pip.
=> pip install tld
So essentially what the user is going to pass in is the full URL, and we are going to strip the extra parts from it to get the top-level domain.
=> domain_name = get_tld(url)
get_tld accepts a single parameter, the full URL of the website, and then we are just going to return the top-level domain, i.e. the domain name.
=> return domain_name
So again, this function takes the URL you pass in and gives you back the plain top-level domain name. Just so that we can verify it, let's run:
=> print(get_domain_name('https://www.facebook.com'))
Alright, let's run this real quick and check it out. We just passed in the full URL and it returned the top-level domain name.
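If you would rather not install a third-party package, here is a dependency-free stand-in for get_domain_name built on the standard library's urllib.parse; note that unlike tld's get_tld it does not handle multi-part suffixes such as .co.uk, so treat it as a rough sketch.

```python
from urllib.parse import urlparse

def get_domain_name(url):
    # keep only the network location (host) part of the URL
    host = urlparse(url).netloc
    # drop a leading 'www.' if present
    if host.startswith('www.'):
        host = host[4:]
    return host

print(get_domain_name('https://www.facebook.com'))  # facebook.com
```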
Now we can allow the user to pass in any URL and we can extract the top-level domain. Looking good, see you guys in the next tutorial.
IP Address
Now that we have the top-level domain of the target, we need to get the IP address of that website, and I will show you guys what I mean. Now I am pretty sure that there is an easier way to do this, but this is how I do it.
So in the terminal if you type
=> host facebook.com
or any other top-level domain and hit enter, it returns the IP address. The thing is, we can't just take these results and store them in a text file, because we only care about the IP address, not the whole result. So what I am going to do is run this command through Python and then extract the IP address from the whole result.
Let's make a new file called ip_address.py.
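The listing itself is not reproduced here, so based on the explanation that follows, ip_address.py could look something like this. The helper name parse_ip and the variable names are my own; the parsing is split into its own function only so it can be checked without a network call.

```python
import os

# Pull the first IP address out of the output of the `host` command.
def parse_ip(results):
    # 'has address' plus the trailing space is 12 characters long
    marker = results.find('has address') + 12
    return results[marker:].splitlines()[0]

def get_ip_address(url):
    command = 'host ' + url
    process = os.popen(command)    # run the command in a shell
    results = str(process.read())  # capture everything it printed
    return parse_ip(results)
```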
Let's understand what this method does. It looks into the results string and finds the index of "has address"; find returns the index of the first character of that substring. We then need to move 12 characters ahead ("has address" plus the space after it) so that we reach the start of the IP address we are looking for.
=> return results[marker:].splitlines()[0]
The reason I am doing this is that a domain name can have multiple IP addresses, like google.com:
=> host google.com
( inside terminal )
We do not want all the IP addresses, we only want the top one. So we use the splitlines() method to take only the first line, which gives us only the first IP address.
So now, let's verify whether this works.
=> print(get_ip_address('google.com'))
=> print(get_ip_address('facebook.com'))
Let's run this in the terminal.
( terminal )
Nmap
Now you can see the results: this server is running ssh, http and https. nmap can also tell you whether the server is running FTP or MySQL, for example if they have a database running on it, and a bunch of other good information. But what we want to do is run this from Python. There are a bunch of options with which we can run this tool, so we are going to have an additional parameter for the options.
So the function we are going to create will take two arguments: the first one is any option the user wants to use, and the second one is the target IP. Let's make a new file called nmap.py.
=> import os
=> def get_nmap(options, url):
So we are actually going to be passing in the IP address and the options to this function as parameters.
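Putting those two lines together, a minimal nmap.py might look like this; splitting the command string into its own helper is my choice, so it can be inspected separately. Running get_nmap requires nmap to be installed on the system.

```python
import os

def build_nmap_command(options, url):
    # e.g. build_nmap_command('-F', '157.240.3.35') -> 'nmap -F 157.240.3.35'
    return 'nmap ' + options + ' ' + url

def get_nmap(options, url):
    # run nmap in a shell and return everything it printed
    process = os.popen(build_nmap_command(options, url))
    return str(process.read())
```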
Robots.txt
Alright guys, welcome back. In this video I am going to show you how to build a Python tool to scan for a robots.txt file. Now if you don't know what a robots.txt file is, it's this!
Whenever you make a website, search engines like Google and Yahoo are going to crawl it: their crawlers go page by page and store what they find in the search engine. So whenever people type in the website name, all the results pop up, profile pages, forums, etc. Now the problem with this is that whenever you are developing a website there are some pages you don't want Google or Yahoo to crawl. Some examples would be the admin login page, some sensitive areas, or maybe some moderator panels. For a lot of the private areas of the website, you want to make sure that Google does not crawl them. So what you can do is make a special file called robots.txt and upload it to your server. Usually what web developers do is list in it all the files that they do not want Google to crawl, and then Google ignores them.
Now the cool thing is that whenever you are analysing a website for security issues, one of the first files you always go to is that robots.txt file.
Why is that?
Well, the developers effectively said "Hey Google, don't crawl these, because people shouldn't be looking at them". So we can look at the file and be fairly sure those are the areas which are sensitive.
So let's make a new file called robot_txt.py.
=> import urllib.request
All this does is allow us to make a request to a URL, like a GET request; basically it downloads files from the internet. We also need to import a package called io.
=> import io
This is just for encoding so that we can ensure we are getting our data
in a readable format.
=> def get_robots_txt ( url ):
So here we are going to pass in a URL
=> if url.endswith('/'):
=>     path = url
=> else:
=>     path = url + '/'
So what we basically did is, if the url ends with a forward slash then it
will remain as it is and if it does not then we will add a forward slash
at the end of the url.
So now we have to request that file from the internet.
=> req = urllib.request.urlopen(path + 'robots.txt', data=None)
So this function opens the file, i.e. robots.txt, and stores the result in the variable req. Now we have to make sure that our data is encoded properly.
=> data = io.TextIOWrapper(req, encoding='utf-8')
Now lets just return whatever those results were.
=> return data.read()
So again, all we are doing is passing a url and we are getting
robots.txt file from that website and then we are returning the data.
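Assembled into one file, robot_txt.py looks roughly like this; the normalize helper name is mine, pulled out only so the slash logic can be checked without hitting the network.

```python
import io
import urllib.request

def normalize(url):
    # make sure the url ends with a single forward slash
    if url.endswith('/'):
        return url
    return url + '/'

def get_robots_txt(url):
    path = normalize(url)
    # download <url>robots.txt and decode it as utf-8 text
    req = urllib.request.urlopen(path + 'robots.txt', data=None)
    data = io.TextIOWrapper(req, encoding='utf-8')
    return data.read()
```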
Let's verify whether it works.
=> print(get_robots_txt('https://www.reddit.com/'))
Whois
Welcome back! In this section I am going to show you how to get the whois of a website domain. If you don't know what whois is, it is a tool that gives you information about who registered a domain name. If you give it any domain name, i.e. the top-level domain name, it will return that information. We also used it a few tutorials back.
=> whois facebook.com
( terminal )
It will give you information about who registered the domain and who they registered it through, and if they didn't choose domain-name privacy, it will also reveal things like the registrant's phone number, address and a bunch of other personal information.
So this is all the information about the domain facebook.com. You can try this for any website you want. This is how whois is done using Python, see you in the next section.
Source Code for whois.py
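The file itself is only linked from the course, but based on the terminal session above, whois.py can be as small as shelling out to the system whois client; the run_command helper is my own refactor, and the code requires whois to be installed.

```python
import os

def run_command(command):
    # run a shell command and return everything it printed
    process = os.popen(command)
    return str(process.read())

def get_whois(domain_name):
    # e.g. get_whois('facebook.com') returns the raw whois record as text
    return run_command('whois ' + domain_name)
```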
Now remember that each new website you scan is essentially a new project, and you want to save it inside a new folder. So what we can do is:
=> project_dir = ROOT_DIR + '/' + name
=> create_dir( project_dir )
Remember that we created the function create_dir in our general.py file back in the first section of the course. Now we only have to write the content to each file, using the write_file function that is also part of general.py.
=> write_file(project_dir + '/full_url.txt', url)
=> write_file(project_dir + '/domain_name.txt', domain_name)
=> write_file(project_dir + '/nmap.txt', nmap)
=> write_file(project_dir + '/robots.txt', robots_txt)
=> write_file(project_dir + '/whois.txt', whois)
And I think this is it. Let's now call the function.
=> gather_info('google', 'https://www.google.com')
Now let me run it and check whether it's working or not. It will take a bit of time to run.
So now we have all the details. We are saving a heck of a lot of time by using this tool with a single command.
Source Code for main.py
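The full main.py is linked above; here is a self-contained sketch of the reporting part described in this section. The general.py helpers are inlined so the sketch runs on its own, and ROOT_DIR is an assumed folder name; in the real main.py, gather_info would first call the tool functions from the earlier files and then hand their results to create_report.

```python
import os

ROOT_DIR = 'companies'   # assumption: the root folder that holds every project

def create_dir(directory):
    if not os.path.exists(directory):
        os.makedirs(directory)

def write_file(path, data):
    with open(path, 'w') as f:
        f.write(data)

# Save each piece of gathered information into its own file in the project folder.
def create_report(name, url, domain_name, nmap, robots_txt, whois):
    project_dir = ROOT_DIR + '/' + name
    create_dir(project_dir)
    write_file(project_dir + '/full_url.txt', url)
    write_file(project_dir + '/domain_name.txt', domain_name)
    write_file(project_dir + '/nmap.txt', nmap)
    write_file(project_dir + '/robots.txt', robots_txt)
    write_file(project_dir + '/whois.txt', whois)
```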