Cyb - Detective - Python For OSINT. 21 Day Course For Beginners (2023) PDF
If you use (or plan to use) OSINT tools written in Python, but you're not
satisfied with the standard functionality and would like to modify them a bit,
this course will help you learn how to do that as quickly as possible.
Also, this course will help you to automate various routine tasks related to
investigations: processing data from API, collecting data from websites,
collecting search results, working with Internet archives, creating reports and
data visualization.
The main goal of the course is not to teach you how to write Python code,
but to teach you to spend less time on routine OSINT tasks. So, in addition
to code examples, I will also give you links to different services that will help
you solve different problems.
This course will also be useful for those who are far from Computer Science
and want to raise their technical level a little, try to use Linux, learn to work
with the command line and understand different popular IT terms like
"JSON", "API", "WHOIS" etc.
Who should avoid this course?
For those who have never done OSINT and are not going to do OSINT. This
course consists for the most part of specialized topics related to investigation
and data collection.
This course omits VERY many important things and sometimes even
recommends what could have been called bad practice. There are things
that don't matter when writing small automations for everyday OSINT tasks,
but are extremely important when creating serious team projects.
How to take this course
The first thing I advise you to do is to look at the table of contents, flip through
the pages of the book, and clearly decide if this course will be useful to you.
If you've made a clear decision, read one lesson each day thoughtfully and
try every day to think about how you could apply what you have learned to
your investigations. If you happen to miss a day or even a week, please don't
scold yourself for it, but just continue the course day by day.
I also recommend that you try to run all the sample code and try to change
something in it.
All the code samples in the book are available in this repository -
https://github.com/cipher387/python-for-OSINT-21-days.
But to strengthen your discipline and motivate you to take it to the end, I
recommend you make a small donation.
People often don't finish free courses, and paying will help you
take learning seriously. Also, every donation will motivate me to make new
OSINT courses and make them available to people all over the world.
For example, if you smoke, then for you the price of the course may be equal
to the price of a pack of your favorite cigarettes.
If you drink alcohol, it may be the cost of a can of beer in the nearest supermarket
or a small glass of wine in a restaurant on the next street.
If you like fast food, go with the price of a small burger or package of fries.
If for some reason you don't want to send a donation, I would still be very
happy if you took this course.
Day 0. Preparing for work
To fully participate in this course, you need internet access and a computer
or smartphone with Python and Git installed, or the latest version of a popular
browser to run Gitpod (web app providing development environments in your
browser) or its analogues (Repl.it, CodePen, CodeAnywhere etc).
I also recommend that you install the Notion app for your computer or phone
so that you can mark the "complete course task" item as done for each of the
21 days. Notion is free for personal use.
If for some reason you don't like the app, you can just read one chapter of
the PDF book a day, but I'd still advise you to take a closer look at Notion.
How to install Python?
I won't go into detail about this in the course, as all readers work on different platforms.
I will just give links to instructions for different platforms.
Installation files:
Windows:
https://www.python.org/downloads/windows/
MacOS:
https://www.python.org/downloads/macos/
Linux:
https://www.python.org/downloads/source/
Android Termux App:
https://play.google.com/store/apps/details?id=com.termux&hl=en&pli=1
(use Linux instructions to install)
iOS Pythonista App:
https://apps.apple.com/us/app/pythonista-3/id1085978097?ls=1
Git is a version control system. It helps you track down bugs in your code
by letting you go back to the state "when it all still worked", and it helps to
organize work in large teams of programmers (you can clearly see "who broke
everything").
As part of this course, you'll use Git to copy code samples from Github and
to install various OSINT tools.
https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
I would strongly advise against doing it any faster. Unless you are a
schoolboy (or girl) who is on holiday and has a lot of free time. In that case
you can do 2-3 lessons per day (but no more).
If you work 8 or more hours a day, you can do 1 lesson every two or three
days. You could also take a break for a few days during the course to give
yourself time to rest and reflect on what you've learned.
However, I would not recommend that you do this course for more than two
or three months.
I would also recommend that you take the lessons strictly in order.
But if you decide otherwise, there is no harm in doing so. The only risk is that
you might encounter a "No module named" error if the script uses a package
installed in one of the previous lessons.
In this situation, you just need to install the missing module (package) using pip.
For example, if the error mentions the requests module:
pip install requests
Day 1. Run the first script
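Let's start by copying the Github repository with the sample code files for this
course to your computer. Type at the command line:
git clone https://github.com/cipher387/python-for-OSINT-21-days
cd python-for-OSINT-21-days
If you see a message asking for your username and password, enter your
Github account username and password.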
A Github repository is essentially a storage for files: code, data files,
and documentation. It differs from a usual directory by some additional
functionality: version history, issues (notes with bug reports and
questions), forks (copies that are developed independently of each
other) and some other features.
python Day_1/start.py
Try changing the text in quotes and run the script again.
If you're not using Gitpod, you may now be faced with the question, "What's
the best application to edit Python code files in?"
You can use any text editor you like. Notepad or TextEdit will do too, but I
still recommend you to try popular code editors as well: Sublime Text,
Notepad++, Visual Studio Code etc. They can automatically highlight syntax
and suggest function/variable names (auto-complete code).
This lesson is much shorter than the others because I want to leave free time
for those who haven't had time to install Python yet and for those who already
have something going wrong and need to solve some additional problems.
python --version
git --version
If the folder “python-for-OSINT-21-days” is not copied, check if you entered
the password correctly.
If you get an error message when you run the script, try deleting the folder
and copying again. Just in case you changed something in the code when
you were reviewing it.
https://gitpod.io#https://github.com/cipher387/python-for-OSINT-21-days
And create New Workspace with standard settings. If necessary, log into
your Github, Gitlab or Bitbucket account.
When it's all ready, type at the command line:
python Day_1/start.py
Please do not proceed to the next day until you have successfully
completed this script and the message "Welcome to 21 day Python
course!" appears.
Day 2. Minimum Basic Syntax
Today we're going to introduce you to four basic Python syntax concepts
that are also found in most popular programming languages.
Yes, readers who have studied Python before may think I'm missing some
very important things. But once again, this course is not intended to make
you a good Python developer, it simply shows you possible solutions to the
problem of automating a routine in OSINT.
Variable
Text values (e.g., a person's name or a chapter in a book). This type of data
is declared with str();
Integer numbers. Declared using the int() function;
Float numbers. Declared using the float() function;
True/false. Declared using the bool() function.
There are a lot of other data types we won't look into in this course.
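The four data types listed above can be tried out with a minimal sketch like the one below. All variable names and values are made-up examples and are not taken from the course scripts.

```python
# Four basic data types (all values below are made-up examples)
nickname = "john_smith"      # str: text value
followers = 1250             # int: integer number
rating = 4.7                 # float: number with a fractional part
is_verified = False          # bool: True or False

# type() shows which data type a variable currently holds
print(type(nickname), type(followers), type(rating), type(is_verified))

# str(), int(), float() and bool() also convert values between types
followers_text = str(followers)      # the text "1250"
page_count = int("21")               # the number 21
price = float("9.99")                # the number 9.99
has_followers = bool(followers)      # True, because the number is not zero
print(followers_text, page_count, price, has_followers)
```

Conversion matters in practice because everything typed by a user or read from a file arrives as text and has to be converted before you can do arithmetic with it.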
You can use capital letters, small letters, and the underscore character in
variable names. You can also use digits, but not as the first character.
Try to give variables names that make as much sense as possible, so that it
will be easier for you to understand your code after a while.
Let's practice a little. Run the script variable.py from the Day_2 folder:
cd Day_2
python variable.py
Note that we run the Python script a little differently than in the first lesson.
Last time we specified the path to the file right away, but now we make the
Day_2 folder active first, and then run the script. Both variants are
acceptable, do it the way you like.
Hereafter, I will refer to code comments with a # sign. You can also add text
after the # sign in script code and it will be ignored by the interpreter.
In this course, I recommend that you first look at the code picture for a couple
of minutes and try to guess what the code does. And only after that read the
code with comments.
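first_name = "John"
# Then we assign to last_name the value entered by the user using the input() function (the \n after the question mark is a line break, you can remove it if you want). The prompt text here is an example, the original line is not shown:
last_name = input('What is your last name?\n')
# And then we output the value of both variables with the function print():
print(first_name, last_name)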
Note that we use both single and double quotes for text strings. Both types
are valid in Python code.
Later in this course, you will learn how to create your own functions.
Conditional statement
Run condition.py:
# First, we use the input() function to ask the user how old he is.
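The rest of condition.py is not shown in the text, so here is a minimal sketch of a conditional statement on the same subject. The function name describe_age, the age thresholds and the messages are assumptions made for illustration.

```python
def describe_age(age):
    """Return a message depending on the age (the thresholds are assumptions)."""
    if age < 18:
        return "You are under 18."
    elif age < 65:
        return "You are an adult."
    else:
        return "You are 65 or older."

# input() always returns text, so a script would convert it with int(), e.g.:
# age = int(input("How old are you?\n"))
age = 30  # fixed example value so the sketch runs without user input
print(describe_age(age))
```

Only the first branch whose condition is true is executed, which is why the second check does not need to repeat "age >= 18".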
List
A list is an ordered set of items, each with its own number or index, allowing
quick access to it.
Run list.py:
# Create a list of women's names (example values; the original list is not shown):
girls = ["Anna", "Maria", "Kate", "Emma"]
print(girls)
# Add an item to it with the built-in append() method (the new element is added to the end of the list):
girls.append("Brenda")
print(girls)
# Print the item with index three (the items in the list are counted from zero):
print(girls[3])
In this course we will use lists lots of times and learn lots of different built-in
functions for work with them.
If you've studied other programming languages, you're probably familiar with
the concept of arrays. Python also has this concept (the array module). Python
arrays differ from lists in particular by the fact that in lists you can use data
of different types (for example, the first element of a list can be a string, and
the second a number), while all elements of an array must have the same type
(for example, only integers or only floats).
There are other differences that make lists a more flexible and convenient
tool. For most tasks related to OSINT, it is sufficient to know how to use lists
and we will not study arrays in this course.
Lists can also be multidimensional, when each list item is itself a list of 2, 3
or more items. They will be mentioned in the course, but will not receive
much attention.
Loop
Run loop.py:
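# Create a list of girls' names (example values; the original list is not shown):
girls = ["Anna", "Maria", "Kate"]
for girl in girls:
    print(girl)
for x in range(20):
    print(x)
When using loops and conditions, always pay attention to the indentation.
By convention, there should be four spaces before the "internal" code.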
In my opinion, this is the minimum theory you need to start writing Python
scripts. Tomorrow, we start learning the practical skills that will be useful for
OSINT.
Day 3. Install and run Python command line tools
If you often read my Twitter, you may regularly see posts about command
line tools for OSINT. Most of them are written in Python. JavaScript
(Node.js), Go, Bash (Shell script) and Rust are also popular.
Today we will learn how to set them up to run. As an example, I will use
Thorndyke and Blackbird, tools that search for a user's social network pages
by nickname.
First way
The first way is to install the tool from PyPI using pip:
pip install thorndyke
The Python Package Index (PyPI) is a repository of software for the Python
programming language (pypi.org). It contains over 300,000 packages! We
will use it in almost every lesson of this course.
Type thorndyke + the nickname you are interested in at the command line:
thorndyke johmsmith
Second way
Unfortunately, not all tool developers add their projects to PyPI (despite the
fact that it's quite easy to do). Therefore, sometimes you have to copy them
from Github, install related modules on your own and run the tool by referring
directly to the code file instead of the command name.
Now we will install another tool for searching social networking pages by
nickname: blackbird (https://github.com/p1ngul1n0/blackbird). Type in the
command line:
cd Day_3
git clone https://github.com/p1ngul1n0/blackbird
cd blackbird
pip install -r requirements.txt
The requirements.txt file contains a list of packages needed to run the tool.
Let me remind you once again, that command "cd" is used to navigate to
another folder.
Third way
You can also launch tools directly from Python script code, using the subprocess
module (https://docs.python.org/3/library/subprocess.html), which allows
you to run different command line commands directly in Python code.
Move the launch.py file from the Day_3 folder to the blackbird folder.
mv launch.py blackbird
Before running the command, make sure that you are in the Day_3 folder.
If you are in the blackbird folder, use this command to go one directory
above:
cd ..
Run launch.py:
# Import the subprocess module:
import subprocess
This way you can run not only Python scripts, but also scripts created in other
programming languages.
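The rest of launch.py is not shown in the text. As a minimal sketch of the same idea, the example below uses subprocess.run() to start another program and read what it printed. The command here is a harmless one-line Python program chosen for illustration, not the Blackbird command from the course.

```python
import subprocess
import sys

# sys.executable is the path of the Python interpreter running this script,
# so the example works no matter how Python is installed.
command = [sys.executable, "-c", "print('Hello from a child process')"]

# Run the command, wait for it to finish and capture what it printed
completed = subprocess.run(command, capture_output=True, text=True)

print("Exit code:", completed.returncode)   # 0 means success
print("Output:", completed.stdout.strip())
```

Passing the command as a list (program first, then each argument as a separate item) avoids problems with spaces and quotes in arguments such as nicknames.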
The most important thing you should remember from this course is that
Blackbird and Thorndyke are NOT the best solutions for nickname
enumeration.
Day 4. Reading and writing files
Writing file
Run write_text.py:
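# The text to save (example value; the original line is not shown):
result = "Example search result\n"
# Open results.txt in append mode, write the text and close the file:
results_file = open("results.txt", "a")
results_file.write(result)
results_file.close()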
Note that the open() function has two arguments. The first is the name of the
file and the second is the so-called "opening mode".
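To make the "opening mode" argument more concrete, here is a minimal sketch that uses the three most common modes: "w", "a" and "r". It writes to a temporary folder (an assumption made so that the example does not touch your own results.txt).

```python
import os
import tempfile

# Work in a temporary folder so the example does not touch your own files
folder = tempfile.mkdtemp()
path = os.path.join(folder, "results.txt")

# "w" (write): create the file, or erase it if it already exists
with open(path, "w") as results_file:
    results_file.write("first line\n")

# "a" (append): keep the existing text and add to the end
with open(path, "a") as results_file:
    results_file.write("second line\n")

# "r" (read): open an existing file for reading only
with open(path, "r") as results_file:
    content = results_file.read()

print(content)
```

The with statement closes the file automatically, so a separate close() call is not needed in this form.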
Now let's try to read the text of the file we just created.
Reading file
Run read_file.py:
results_file = open("results.txt", "r")
print(results_file.read())
results_file.close()
There is another way to go. With a simple loop, you can read the lines of a
file one at a time and perform some action on each line.
# Create a variable with the line number:
stringNumber = 1
with open("results.txt") as f:
    for line in f:
        print(str(stringNumber) + ". " + line)
        stringNumber += 1
Note that we use str() to convert a variable of type integer to a string. You
should always do this when you concatenate a text variable and a number
into one string.
If you do not want to print all lines in the file, but only lines with certain
numbers, you can use the readlines() function, which converts the file lines
into list items.
Run readlines.py:
# Open results.txt file:
f = open("results.txt", "r")
# Create a list whose elements are the lines of the results.txt file:
stringList = f.readlines()
# Print the list element with index one (the second line of the file). Don't forget that in lists the counting goes from zero:
print(stringList[1])
If, on the contrary, you need to write the list elements to a file, use the
writelines() method. Note that writelines() does not add line breaks itself, so
each element must end with "\n" to be written on a separate line.
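Here is a minimal sketch of writing a list to a file with writelines(). The list contents and the file name nicknames.txt are made-up examples, and the file is created in a temporary folder.

```python
import os
import tempfile

nicknames = ["john_smith", "jane_doe", "osint_fan"]  # made-up example data
path = os.path.join(tempfile.mkdtemp(), "nicknames.txt")

# writelines() writes the items exactly as they are, so add "\n" to each one
lines_to_write = []
for name in nicknames:
    lines_to_write.append(name + "\n")

with open(path, "w") as f:
    f.writelines(lines_to_write)

# Read the file back to check that each element is on its own line
with open(path) as f:
    lines = f.readlines()

print(lines)
```

Without the added "\n" all three nicknames would end up glued together on a single line.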
Storing data in files is not always the best practice (although it is the easiest
to learn). If you regularly have to work with files that are tens or hundreds of
megabytes in size, you should consider starting to use databases. We'll
touch on this topic a bit in Lesson 8.
Day 5. Handling HTTP requests and working with
APIs
When you open a web page in your browser, there is a request to the server.
In response to the request, the server returns the status, headers and body
of the response (for example, html-code of web page, some data in CSV,
JSON or XML format).
OSINT often needs to automate data collection from web pages or various
APIs (Application Programming Interface). And the basic skill needed to do
this is to write code to send requests to web servers and process the
responses.
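To see the status, headers and body of a response without depending on any website, here is a minimal offline sketch. It is an illustration only: it starts a tiny web server on your own computer with the standard library (http.server) and sends a request to it with urllib.request instead of the requests package used in the course scripts. The class name DemoHandler and the returned data are made up.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """A tiny local server that answers every GET request with JSON."""
    def do_GET(self):
        body = json.dumps({"message": "hello"}).encode("utf-8")
        self.send_response(200)                                # status
        self.send_header("Content-Type", "application/json")   # headers
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                                 # body

    def log_message(self, *args):
        pass  # keep the console output clean

# Port 0 lets the operating system pick any free port
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Ignore any system proxy settings, because the server is on this computer
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
url = "http://127.0.0.1:%d/" % server.server_port
with opener.open(url) as response:
    status = response.status
    content_type = response.headers["Content-Type"]
    body = response.read().decode("utf-8")

server.shutdown()
server.server_close()
print(status, content_type, body)
```

With the requests package the same three parts are available as response.status_code, response.headers and response.text.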
An API (Application Programming Interface) is a technology that allows you to
interact with an application by sending requests to a server. For example,
the Github API allows you to retrieve data about Github users, as well as
make changes to repositories and more.
Run api_request.py:
# Add the requests package to the script file using the import command:
import requests
# Making a request:
response = requests.get("https://api.github.com/search/users?q=javascript")
print(response.json())
There are a huge number of APIs, both paid and free, which provide useful
data for OSINT. For example:
A list of over a hundred useful OSINT APIs can be found in this Github
repository:
https://github.com/cipher387/API-s-for-OSINT
It is not necessary to write a separate Python script to test different APIs. It
is better to use special online services that can simulate different types of
requests and authorization methods.
We will come back to the topic of network requests when we talk about JSON
files, scraping and the use of proxy servers. We will learn how to add headers
to the query and extract data from the response texts.
Day 6. JSON
In the last lesson, we talked about the fact that a lot of useful data for
investigations can be obtained through various APIs. Many of them return
data in JSON (JavaScript Object Notation) format (as well as CSV and XML,
but we will talk about these formats in the next lessons).
In the last lesson, you saw a very good example of JSON data when working
with the Github API (documentation).
Each object has properties that store information about the user: login,
html_url, id, followers_url etc.
Now let's try to extract data from JSON files using code. The JSON package
(https://docs.python.org/3/library/json.html) is available in Python by default
and does not require installation.
Reading one field
Run read_one_field.py:
import json
import requests
# Then make the same request to the Github API that we did in the previous lesson:
response = requests.get("https://api.github.com/search/users?q=javascript")
# Assign to the variable the value of the response to the query in json format:
json_data = response.json()
# Output the total number of users found:
print(json_data['total_count'])
# Output the link to the first Github profile from the results:
print(json_data['items'][0]['html_url'])
But most often we need to extract not a single value, but information about
a whole list of objects. For example, the links to Github user profiles from the
example above.
Run read_list_of_fields.py:
import json
import requests
response = requests.get("https://api.github.com/search/users?q=javascript")
json_data = response.json()
# Count the users in the response and print the profile link of each one:
usersCount = len(json_data['items'])
for x in range(usersCount):
    print(json_data['items'][x]['html_url'])
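If you want to practice without sending requests, the same extraction can be sketched on a JSON text stored in a variable. The sample below is a shortened, made-up imitation of the Github response (the logins and links are not real data), parsed with json.loads().

```python
import json

# A shortened, made-up sample shaped like the Github search response
sample_text = """
{
  "total_count": 2,
  "items": [
    {"login": "alice", "html_url": "https://example.com/alice"},
    {"login": "bob", "html_url": "https://example.com/bob"}
  ]
}
"""

# json.loads() turns JSON text into Python dictionaries and lists
json_data = json.loads(sample_text)
total = json_data["total_count"]

# Going through the list directly visits every item, including the last one
profile_links = []
for user in json_data["items"]:
    profile_links.append(user["html_url"])

print(total)
print(profile_links)
```

Looping over the list itself instead of over range() removes the need to count the items by hand.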
It often happens that the structure of JSON files is quite complicated and it
is difficult to understand how to mark the path to certain data. Special
services can help you figure this out. For example, https://jsonpath.com/ or
https://jsonpathfinder.com.
And before you write any code to process JSON files, remember that
sometimes it's easier to convert them to CSV files and just cut out the
columns with the data you need:
Day 7. CSV
Here is an example of how a CSV file looks when opened in a text editor.
And this is how it looks when you open Numbers (MS Excel equivalent for
Mac).
Let's try to create a CSV file using the CSV package
(https://docs.python.org/3/library/csv.html).
Run write_csv.py:
import csv
# Create a list with data headers:
writer.writerow(header)
writer.writerow(data)
csv_file.close()
The CSV file created in this way can be opened in any spreadsheet editor:
Excel, Numbers, Google Sheet etc.
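Only part of write_csv.py is shown in the text, so here is a complete minimal sketch of writing a CSV file and reading its first column back. The column names, the rows and the file name people.csv are made-up examples, and the file is created in a temporary folder.

```python
import csv
import os
import tempfile

header = ["name", "city"]                             # made-up column names
rows = [["John", "New York"], ["Brenda", "Dallas"]]   # made-up data
path = os.path.join(tempfile.mkdtemp(), "people.csv")

# newline="" is recommended by the csv documentation when opening CSV files
with open(path, "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(header)    # one row
    writer.writerows(rows)     # several rows at once

# Read the file back and collect the first column of every row
first_column = []
with open(path, newline="") as csv_file:
    for row in csv.reader(csv_file):
        first_column.append(row[0])

print(first_column)
```

If your file uses semicolons instead of commas, csv.reader() and csv.writer() accept a delimiter=";" argument.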
Now, let's try to read the contents of the CSV file.
Run read_csv.py:
import csv
csv_reader = csv.reader(csv_file)
for row in csv_reader:
    print(row)
Run read_csv_one_column.py:
import csv
csv_reader = csv.reader(csv_file)
# One by one divide the string into columns, using delimiter - semicolon:
# And print first column:
print(columns[0])
JSON to CSV
Sometimes you need to convert data from JSON to CSV so that it can be
conveniently viewed and opened in Microsoft Excel/Google Sheet.
You can do it with special services like csvjson.com (and that would be the
best solution).
But I will show you how to do it with Python code to reinforce what you have
learned in the last two days.
Run json_to_csv.py:
import json
import requests
import csv
response = requests.get("https://api.github.com/search/users?q=javascript")
json_data = response.json()
# Open and simultaneously create file test.csv:
csv_file = open("test.csv", "w", newline="")
writer = csv.writer(csv_file)
usersCount = len(json_data['items'])
# Pass each item of JSON data one by one, create an empty list, add login, link to profile and link to avatar, write the list to the csv file as a row:
for x in range(usersCount):
    row = []
    row.append(json_data['items'][x]['login'])
    row.append(json_data['items'][x]['html_url'])
    row.append(json_data['items'][x]['avatar_url'])
    writer.writerow(row)
csv_file.close()
This is what the contents of the test.csv file should look like after running the
json_to_csv.py script.
Day 8. Databases
There are a lot of python packages to work with almost all popular databases:
MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch etc. I decided in this
course not to go over code examples for any one database, but just to give
some universal advice.
SQL (Structured Query Language) format files are very often encountered in
investigations. This format stores database dumps, in which you can often
find useful contact information (a common example is a list of emails and
phone numbers of employees of the company).
If you do a Google Hacking Database search for "sql" (particularly in Juicy
Info Dorks section), you will find over 1,500 example queries to find data in
.sql files.
The easiest way to view such a file is simply to convert it to CSV and open it
in Excel/Numbers/Google Sheet. A free online converter will help you, for
example RebaseData SQL to CSV.
Contact information useful for investigations can also be stored in
databases of other formats: MS Access (.MDB), SQL Server (.MDF), SQLite
(.sqlite, .sqlite3, .db, .db3, .s3db, .sl3), Firebird (.FDB) and many others.
Rebasedata.com
Anyconv.com
101convert.com
In this lesson, we will not run sample code. As practice, I'll just recommend
that you find database files with contact data on Google, see how they're
arranged, and convert them to CSV. Use the filetype: operator (for example,
filetype:sql) and sample queries from Google Hacking Database.
Day 9. Automate the collection of search results
There are a huge number of Python tools for collecting search results from
different search engines. Many of them are designed to search for vulnerable
sites and juicy info (for example, tables with personal contact data) using
Google Dorks.
They save a huge amount of time spent looking at search results, because they
can automatically analyze the content of found web pages.
Install package from pip:
pip install duckduckgo_search
Check installation:
Please remember this flag. It works for most Python packages. If you type
--help or -h after the command name, help information about its use will be
displayed.
Run ddg_search.py:
# Importing the ddg package:
keywords = 'osint'
# Send a search request, specifying that we want to see the search results for the US with safe search turned off:
print(results)
Simply displaying the results on the screen is not very useful. The same
could be done in the browser. Let's try to save them to a CSV file so that they
can be automatically analyzed later.
Run ddg_search_to_csv.py:
# Importing ddg and csv modules:
keywords = 'osint'
# Send a search request, specifying that we want to see the search results for the US with safe search turned off:
# Go through the search results one by one and write each one in a line of the CSV file with three fields - title,body,href (note that each time you write a string, it creates a list with three elements):
for x in range(len(results)):
    row = [results[x]["title"],results[x]["body"],results[x]["href"]]
    writer.writerow(row)
csv_file.close()
Python makes it easy to find and manipulate files in folders (we'll talk more
about that in Lesson 14), but sometimes it's more convenient just to combine
several CSV files into one to save your time while writing code.
You can do this with any service you can find in Google by searching for
"Merge CSV files online". For example, https://extendsclass.com/merge-csv.html
This package also allows you to download page content from search results.
You can download all found files (html, pdf, xlsx etc) and then automatically
analyze them or just do a simple keyword search in them.
Run ddg_search_download_pdf.py:
# Importing the ddg package:
# Send a search request, specifying that we want to see the search results for the US with download option turned on:
With the duckduckgo_search package you can also collect Answers,
Images, Videos, News, Maps, Suggestions, and translate text.
Note that in the last example we used the extended search operator
filetype:pdf. Other advanced search operators can also be used in queries
for duckduckgo_search.
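Since a query with advanced operators is just a text string, it can be assembled in code before being passed to a search package. Below is a minimal sketch. The function build_query() is hypothetical (it is not part of duckduckgo_search) and only joins the keywords with the filetype: and site: operators.

```python
def build_query(keywords, filetype=None, site=None):
    """Join keywords with optional filetype: and site: operators."""
    parts = [keywords]
    if filetype:
        parts.append("filetype:" + filetype)
    if site:
        parts.append("site:" + site)
    return " ".join(parts)

# example.com is a reserved demonstration domain, not a real target
print(build_query("osint", filetype="pdf"))
print(build_query("annual report", filetype="xlsx", site="example.com"))
```

A helper like this is handy when you want to run the same keywords against a list of file types or sites in a loop.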
It is worth noting that the ability to use advanced search operators at every
opportunity is a very useful skill for every OSINT specialist. Here is a list of
reference articles with advanced search operators for search engines,
social media platforms, mailboxes, and other services:
Day 10. Scraping
The most important thing to know about scraping and Python is that writing
your own script from scratch for each task is most often not the best solution.
It is better to try different ready-made tools first.
Browse AI
https://www.browse.ai
AnyPicker
https://chrome.google.com/webstore/detail/anypicker-ai-powered-no-c/bjkpgfhekfmdffdphnniobddhkjlmmlj
ScrapeStorm
https://www.scrapestorm.com
If you need to collect data from any popular social networks, try to find any
solution made specifically for a particular platform. For example, YouTube
tool (https://github.com/nlitsme/youtube_tool) for YouTube or Stweet
(https://github.com/markowanga/stweet) for Twitter.
But sometimes you may encounter a problem for which there are no ready-
made solutions and you want to write your own script to solve it. There are a
lot of Python packages for scraping: Scrapy, Selenium, ZenRows etc.
We will use the BeautifulSoup package
(https://pypi.org/project/beautifulsoup4/) for scraping. If it is not already
available in your environment, install it with pip install beautifulsoup4.
The BeautifulSoup package can also be useful for working with data in XML
format (you may encounter it in particular when retrieving data from some
APIs). In this case, in addition to BeautifulSoup you should use the LXML
package (https://lxml.de).
Run scraping.py:
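import requests
from bs4 import BeautifulSoup
url = "https://pypi.org/project/duckduckgo-search/"
# Make https request:
web_page = requests.get(url)
# Parse the html code of the page (this line is missing in the text; html.parser is an assumed choice of parser):
soup = BeautifulSoup(web_page.content, "html.parser")
# Get the text of the first h1 element:
header = soup.find("h1").get_text()
# Print h1 header:
print(header)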
More CSS-selectors:
https://www.freecodecamp.org/news/css-selectors-cheat-sheet/
The easiest way to find out which selector stands for a certain html-element
is to look at the source code of the page with the help of developer tools,
which are available in every popular browser.
And for scraping pages with a complex structure (which contains many
nested elements), you can use special browser extensions that display the
full "path" to the element. For example, HTML DOM Navigation
It often happens that the code that is displayed when the site is loaded with
Python scripts is very different from what is displayed in the browser. This is
caused by the fact that some elements are added after the page has been
loaded by executing JavaScript code.
To see what I mean, try opening some Twitter account code in the View
Rendered Source extension.
With it you can visually compare how the HTML code looks immediately after
receiving a response from the server, and how it looks after performing certain
actions on the page (try scrolling down the feed a bit and restarting the
extension again).
For scraping websites where the code changes a lot after the execution of
JavaScript code in the browser, you can use such packages as Selenium
(https://selenium-python.readthedocs.io). It allows you to use Python to open
different browsers and simulate user actions in them.
Day 11. Regular expressions
\d - allows you to find the digits. Here is a brief scheme that gives you a basic
understanding of regular expression syntax:
Source of this image.
Run extract_emails.py:
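# Importing requests and re packages:
import requests
import re
# Create a variable with a link to the page we want to retrieve data from:
url = "https://cleantalk.org/blacklists/[email protected]"
html = requests.get(url).text
# Find all email addresses in the page text (the original line is not shown; this simple pattern is an assumption):
result = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", html)
print(result)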
There is only one code example in this lesson, since your main goal today
is to learn in detail how regular expressions are used in OSINT.
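To experiment with regular expressions without loading a web page, you can run them against a text stored in a variable. The sketch below is an illustration under assumptions: the sample text is made up, and the e-mail pattern is deliberately simple, so it will not cover every valid address.

```python
import re

# Made-up sample text; the addresses use reserved example domains
text = "Contact: john.smith@example.com, sales@example.org. Phone: 555-0100."

# A deliberately simple e-mail pattern (an assumption, good enough for a demo):
# name characters, then @, then a domain with at least one dot
email_pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
emails = re.findall(email_pattern, text)

# \d matches one digit, and + means "one or more in a row"
numbers = re.findall(r"\d+", text)

print(emails)
print(numbers)
```

re.findall() returns a list of all matches, so the result can be processed with the same list techniques used in the other lessons.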
Day 12. Proxies
Very many sites and services block IP addresses that send a large number
of requests in a short time. You can bypass such protection by using Proxy
servers (It doesn't always work, but sometimes it does).
Using them in Python is very easy. You just need to specify the address of
the server you want to redirect traffic through when making a request.
Run simple_proxy.py:
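# Import requests package:
import requests
proxies = {
    'https': '135.181.149.47:8080',
}
url = 'https://cleantalk.org/blacklists/[email protected]'
# Send the request through the proxy server (this line is missing in the text and is reconstructed from the print below):
response = requests.get(url, proxies=proxies)
# Print text of web page:
print(response.text)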
https://hidemy.name/en/
https://github.com/clarketm/proxy-list
https://github.com/TheSpeedX/PROXY-List
https://github.com/ShiftyTR/Proxy-List
https://github.com/jetkai/proxy-list
Therefore, you may need to search through proxy servers in order to find
one that works and is NOT blocked.
Run proxy_permulation.py:
As in the first case, the proxy addresses on the list may no longer work by the
time you read this book. Therefore, replace them with other ones (which,
as said above, can be found in free lists) before starting the script.
# Import requests package:
import requests
url = 'https://cleantalk.org/blacklists/[email protected]'
try:
    response = requests.post(url, proxies=proxies)
    print(response.text)
except Exception:
    print("No")
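The part of the script that goes through the proxy list is not shown in the text. The sketch below illustrates the idea of trying proxies one by one until one works. It is not the course script: the function names are hypothetical, the addresses come from IP ranges reserved for documentation, and the real network request is replaced with a stand-in function so the example runs offline.

```python
def first_working_proxy(proxy_addresses, fetch):
    """Try the proxies one by one and return the first one that works.

    fetch(proxy) is any function that sends a request through the proxy
    and raises an exception when the request fails.
    """
    for proxy in proxy_addresses:
        try:
            fetch(proxy)
            return proxy
        except Exception:
            print("No:", proxy)
    return None

def fake_fetch(proxy):
    """Stand-in for a real request so the sketch runs offline."""
    if proxy != "203.0.113.7:8080":
        raise ConnectionError("proxy is not responding")

# Addresses from documentation-only IP ranges, not real proxies
candidates = ["192.0.2.10:8080", "198.51.100.5:3128", "203.0.113.7:8080"]
working = first_working_proxy(candidates, fake_fetch)
print("Working proxy:", working)
```

In a real script the stand-in could be replaced with a function that calls requests.get(url, proxies={"https": proxy}, timeout=10), so that a dead proxy does not block the script for a long time.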
You can also use out-of-the-box tools to redirect traffic through proxy
servers:
XX-net
mitmproxy
Proxify (very good Go written tool from Projectdiscovery)
Run useragent.py:
import requests
url = 'https://www.whatismybrowser.com/'
# Create a dictionary with request headers (now we use only User-Agent header):
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'
}
# Send request (this line is missing in the text and is reconstructed from the print below):
response = requests.get(url, headers=headers)
print(response.text)
As a result, the html code of the page should be displayed, which will
contain the User-Agent specified in the headers passed along with the
request.
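To check which headers a request will carry without contacting any site, the request object can be built and inspected locally. This is a minimal sketch that uses the standard library's urllib.request instead of requests (an assumption made so that it runs without extra packages). Note that urllib stores the header name as User-agent.

```python
from urllib.request import Request

url = "https://www.whatismybrowser.com/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
}

# Build the request object without sending it
request = Request(url, headers=headers)

# header_items() returns the headers that would be sent as (name, value) pairs
sent_headers = dict(request.header_items())
print(sent_headers)
```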
Day 13. Functions for working with lists
As you have already noticed, lists are a very important element of Python
syntax that we have used in most of our lessons.
Today's lesson will be something of a rest day. We'll take a look at some very
simple but very useful functions for working with lists.
Run array_copy.py:
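cities = ["New York", "Los Angeles", "Chicago", "Houston"]  # example values
copy_of_cities = cities.copy()
# Print copy of list of USA cities with elements separated by ",":
print(*copy_of_cities, sep=",")
Notice that we used a new way to output the list elements (we set up a
separator).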
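A short sketch of why copy() matters: a copy can be changed without touching the original, while a plain assignment only gives the same list a second name. The city names are example values. Note also that sep only has an effect when print() receives several arguments, so the list is unpacked with *.

```python
cities = ["New York", "Los Angeles", "Chicago"]  # example values

copy_of_cities = cities.copy()
copy_of_cities.append("Dallas")   # only the copy changes

same_list = cities                # no copy: a second name for the same list
same_list.append("Houston")       # the original changes too

print(*cities, sep=", ")
print(*copy_of_cities, sep=", ")
```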
Run array_sort.py:
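cities = ["New York", "Los Angeles", "Chicago", "Houston"]  # example values
# Sort the list in alphabetical order:
cities.sort()
# Print result:
print(cities)
# Sort the list in reverse alphabetical order:
cities.sort(reverse=True)
# Print result:
print(cities)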
And here are two functions for adding elements to a list - insert() and
append().
Run array_insert.py:
cities = ["New York", "Los Angeles", "Chicago", "Houston"]  # example values
# Insert the text Dallas at index three (remember that the count starts with zero):
cities.insert(3,"Dallas")
print(cities)
Run array_append.py:
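cities = ["New York", "Los Angeles", "Chicago", "Houston"]  # example values
cities.append("Dallas")
# The * passes the elements one by one, so that the separator is applied:
print(*cities, sep=";")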
Just in case, let me explain the difference between insert() and append().
insert() places an element at a specific position in the list (at a given
index). append() adds an element to the end of the list (after the last
index).
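The difference is easiest to see side by side. A minimal sketch with example city names:

```python
cities = ["New York", "Los Angeles", "Chicago", "Houston"]  # example values

inserted = cities.copy()
inserted.insert(1, "Dallas")   # goes to index 1, later items shift right

appended = cities.copy()
appended.append("Dallas")      # always goes to the end

print(inserted)
print(appended)
```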
Let's finish this lesson with a function that removes the element with a
specific index from a list.
Run array_pop.py:
cities = ["New York", "Los Angeles", "Chicago", "Houston"]  # example values
# Delete from it the element with the index two (remember that counting starts with zero):
cities.pop(2)
# Display the changed list on the screen:
print(cities)
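It can also be useful to know that pop() returns the element it removes, and that without an index it removes the last element. A small sketch with example values:

```python
cities = ["New York", "Los Angeles", "Chicago", "Houston"]  # example values

removed = cities.pop(2)  # removes and returns the element at index 2
last = cities.pop()      # without an index, removes the last element

print(removed, last, cities)
```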
Later in this course we will return to the topic of lists and discuss the most
important function for working with them - map(), which is definitely worthy
of its own lesson.
Day 14. Working with the file system
In this course we do not deal with databases and we use csv, json, txt files
to store data. And sometimes you may want to write a script that will not
process data from a single file with a specific name, but data from a large
group of files (and sometimes located in different directories).
Therefore, you may need minimal skills in working with the file system in
Python.
Run print_files_names.py:
import glob, os
# Go to "test_dir" directory:
os.chdir("test_dir")
# Search all files with txt extension and print their names:
for file in glob.glob("*.txt"):
    print(file)
Now let's try to display the contents of the found files using the readlines()
command.
Run print_files_contents.py:
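# Import glob and os packages:
import glob, os
# Go to "test_dir" directory:
os.chdir("test_dir")
# Search all files with txt extension and print their contents:
for file in glob.glob("*.txt"):
    current_file = open(file, "r")
    print(current_file.readlines())
    current_file.close()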
The contents of the files obtained in this way can be automatically analyzed.
The simplest example is to check for the presence of any symbol or word.
Run check_files_contents.py:
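# Import glob and os packages:
import glob, os
# Go to "test_dir" directory:
os.chdir("test_dir")
# Open each file and print its contents if they include the character "2":
for file in glob.glob("*"):
    current_file = open(file, "r")
    current_file_content=current_file.read()
    if "2" in current_file_content:
        print(current_file_content)
    current_file.close()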
Now, let's try to check each file for an email address in its text by using a
regular expression (we learned them in lesson 11).
Run check_re_file_contents.py:
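# Import glob, os and re packages:
import glob, os, re
# Regular expression for email addresses:
regexp = re.compile(r'[a-zA-Z0-9-_.]+@[a-zA-Z0-9-_.]+')
# Go to "test_dir" directory:
os.chdir("test_dir")
# Print the name of each file that contains an email address (printing the name is an assumption):
for file in glob.glob("*"):
    current_file = open(file, "r")
    current_file_content=current_file.read()
    if regexp.search(current_file_content):
        print(file)
    current_file.close()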
If you don't know where to get regular expressions for other types of data
(phone numbers, IP addresses, etc.), it means that you weren't paying
attention in Lesson 11 and didn't read the linked article.
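The same approach can check files for several kinds of data at once. Below is a minimal sketch that creates a temporary directory with made-up files, so it can be run anywhere. The function name scan_directory is hypothetical, and both patterns are deliberately simple assumptions that will also match some invalid values.

```python
import os
import re
import tempfile

# Simple illustrative patterns (assumptions, not strict validators)
patterns = {
    "email": re.compile(r"[a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_directory(path):
    """Return {file name: {pattern name: [matches]}} for files with matches."""
    found = {}
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories
        with open(full_path, "r", encoding="utf-8", errors="ignore") as current_file:
            content = current_file.read()
        matches = {key: rx.findall(content) for key, rx in patterns.items()}
        matches = {key: value for key, value in matches.items() if value}
        if matches:
            found[name] = matches
    return found


# Demo on a temporary directory with made-up file contents
with tempfile.TemporaryDirectory() as test_dir:
    with open(os.path.join(test_dir, "a.txt"), "w", encoding="utf-8") as f:
        f.write("Contact: admin@example.com, server 192.0.2.1")
    with open(os.path.join(test_dir, "b.txt"), "w", encoding="utf-8") as f:
        f.write("Nothing interesting here")
    results = scan_directory(test_dir)

print(results)
```

Using with open(...) closes each file automatically, and skipping anything that is not a regular file avoids errors when the directory contains subdirectories.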
Day 15. Domain information gathering
This lesson was the hardest to write because it is a topic worthy of a separate
course called "Networking in Python" or "Python for Pentester".
Today I will try to explain the basic terms related to domain research and
show you some Python packages that may be useful for this purpose.
There are many free online services that display Whois data for a particular
domain. For example:
https://who.is/whois/nytimes.com
There you can find out the date of registration of the domain, the date of
expiration of the paid period of using the domain, the contact information of
the company or person who currently owns the domain.
There are also services that allow you to find domains associated with a
certain email (e.g. https://www.whoxy.com/) and see the history of WHOIS
data of the domain (https://research.domaintools.com/research/whois-history/).
Many packages have been created for Python to automate the handling of
WHOIS data. Let's try the python-whois package
(https://pypi.org/project/python-whois/).
Run whois_info.py:
# Import python-whois package:
import whois
# Get WHOIS data for the domain:
whois_info = whois.whois('sector035.nl')
print(whois_info)
# Print the creation date of the domain:
print("Creation date")
print(whois_info["creation_date"])
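Depending on the registry, a date field such as creation_date may come back as a single datetime, a list of datetimes, or None. This is an assumption about python-whois behaviour that is worth checking on your own domains. A minimal sketch of a helper (the name first_date is hypothetical) that reduces all three shapes to one value, using made-up dates instead of a live WHOIS query:

```python
from datetime import datetime


def first_date(value):
    """Return a single datetime from a value that may be a datetime, a list or None."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


# Made-up values imitating the shapes the field can take
single = datetime(2015, 3, 1)
several = [datetime(2015, 3, 1), datetime(2015, 3, 2)]

print(first_date(single))
print(first_date(several))
print(first_date(None))
```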
DNS (Domain Name System) is a system for naming computers on the
Internet. It’s a database that maps each numerical Internet address (called
an IP address) with the corresponding domain name. (Techslang)
To see what the DNS data of a domain looks like, you can use one of the
free online services (for example, https://mxtoolbox.com/).
You can use the dnspython package to automate the retrieval of DNS
data (https://pypi.org/project/dnspython/).
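# Import dnspython package and dns.resolver:
import dns
import dns.resolver
# Request the A records of a domain (example domain, replace it with your own):
result = dns.resolver.resolve('nytimes.com', 'A')
# Print each IP address found:
for ipval in result:
    print('IP', ipval.to_text())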
When collecting data about a site, it can be useful to find its subdomains in
order to use them as an additional source of information.
discosub run nytimes.com
There is also an option to run Discosub with the results saved to the
nytimes_subdomains.txt file:
Other tools for finding subdomains:
Sublist3r
SubDomainizer
Subscraper
Also do not forget that there are many APIs for domain information gathering,
on the basis of which you can create your own scripts, ideally suited for the
purposes of your investigation (we talked about it in more detail in lessons 5
and 6).
Day 16. List mapping and functions for working with strings
map() is a function that allows you to apply another function to each element
of a list. It opens up huge opportunities for us to work with data.
For example, we can make some calculations with each element of the list,
replace or add some characters in a group of strings, convert some values
to other values, decrypt hashes. What can I say... You can use map() with
almost any other function.
def doubleNumber(x):
    return x * 2
numbers = [2,4,8]
# Apply the doubleNumber function one by one to each element of the list of numbers:
result = map(doubleNumber,numbers)
print(list(result))
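One detail that often surprises beginners: map() returns a one-time iterator, not a list, which is why the result is wrapped in list(). A short sketch showing this, together with the equivalent list comprehension:

```python
def doubleNumber(x):
    return x * 2


numbers = [2, 4, 8]

result = map(doubleNumber, numbers)
first_pass = list(result)    # the values are produced here
second_pass = list(result)   # empty: the map object is already exhausted

# The same transformation written as a list comprehension
comprehension = [doubleNumber(x) for x in numbers]

print(first_pass, second_pass, comprehension)
```

If the result is needed more than once, convert it to a list once and reuse that list.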
Let's try making the first letters of the words capitalised in each element of
the list.
Run capallwords.py:
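# Create a function that capitalizes the first letters in all words that are passed to it:
def capitalizeAllWords(x):
    return x.title()
cities = ["new york", "los angeles", "san francisco"]  # example values
# Run map() function for the list of cities and capitalizeAllWords function:
result = map(capitalizeAllWords,cities)
print(list(result))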
Run capfirstword.py:
# Create a function capitalizeFirstWord that changes the first letter in a line into a capital letter:
def capitalizeFirstWord(x):
    return x.capitalize()
cities = ["new york", "los angeles", "san francisco"]  # example values
# Run map() function for the list of three U.S. cities and capitalizeFirstWord function:
result = map(capitalizeFirstWord,cities)
print(list(result))
Another very simple but very useful skill is to replace some characters in a
string with others.
Run replace.py:
# Create a replaceDash function that replaces the underscore with a hyphen:
def replaceDash(x):
    return x.replace("_","-")
words = ["six_pack","king_size","editor_in_chief"]
# Run the map() function with the replaceDash function and the list of three words as arguments:
result = map(replaceDash,words)
print(list(result))
Python has many other functions for working with strings and you can also
install additional packages to extend this functionality.
For example, Chepy, the Python version of the well-known online data
converter CyberChef (https://pypi.org/project/chepy/). It can:
decode URLs;
decode/encode base64;
convert Binary to Decimal and vice versa;
PGP encrypt/decrypt.
and more.
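Some of these conversions are also available without installing anything. This is a minimal sketch that uses only the Python standard library (not Chepy itself), with made-up input values:

```python
import base64
from urllib.parse import unquote

# Decode a URL-encoded string
decoded_url = unquote("https%3A%2F%2Fexample.com%2F%3Fq%3Dosint%20tools")

# Encode and decode base64 (it works on bytes, so the text is converted first)
encoded = base64.b64encode("osint".encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")

# Convert binary to decimal and back
decimal_value = int("101010", 2)
binary_value = format(decimal_value, "b")

print(decoded_url, encoded, decoded, decimal_value, binary_value)
```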
And at the end of this lesson, I want to show another simple but very
important function that knows how to split one line into several according to
the delimiter character.
Run split_string.py:
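# Create a string of several values separated by semicolons (example values):
string = "John;Smith;33"
# Split the string into a list using the delimiter:
fields = string.split(";")
# Go through the elements of the list in a loop and display each one in turn:
for field in fields:
    print(field)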
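In real data, lines often end with a newline and contain extra spaces around the delimiter. A small sketch, with a made-up line and hypothetical field names, that combines split() with strip() and gives each field a name:

```python
# Example line in "name;age;city" form (made-up values)
line = "John Smith; 33 ;Dallas\n"

# strip() removes the surrounding whitespace, split() cuts the line by the delimiter
fields = [field.strip() for field in line.strip().split(";")]

# Give each field a name and convert the age to a number
record = dict(zip(["name", "age", "city"], fields))
record["age"] = int(record["age"])

print(record)
```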
Day 17. Generating documents
One important part of most research or investigation is writing the final
report. If you create many similar reports based on data with a similar
structure, you can automate this process using Python. Let's try to
create a simple MS Excel workbook.
Run create_xlsx.py:
import xlsxwriter
workbook = xlsxwriter.Workbook('employees.xlsx')
worksheet = workbook.add_worksheet()
employees = (
    ['Name', 'Age'],
    ['John Smith', 33],
    ['Eric Gold', 26],
    ['Simon Silver', 37],
    ['James Conor', 50],
)
# Go through the list and write the employee's name in column A and age in column B (moving to a new row each time):
row = 0
for name, age in employees:
    worksheet.write(row, 0, name)
    worksheet.write(row, 1, age)
    row += 1
# In the next free row, add the formula for calculating the average age (the data starts in row 2, below the header):
worksheet.write(row, 1, '=AVERAGE(B2:B'+str(row)+')')
workbook.close()
This is what the finished document would look like if you open it in Excel:
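xlsxwriter counts rows from zero, while Excel cell names count from one, which makes the range in the formula easy to get wrong. Here is a minimal sketch, assuming the header is in the first row and each employee is in one of the following rows, of deriving the range and cross-checking the expected average in plain Python:

```python
employees = (
    ['Name', 'Age'],
    ['John Smith', 33],
    ['Eric Gold', 26],
    ['Simon Silver', 37],
    ['James Conor', 50],
)

# After all rows are written, the zero-based row counter equals the number of rows
row = len(employees)
first_data_row = 2     # Excel row 1 holds the header
last_data_row = row    # zero-based index row-1 is Excel row number `row`

formula = '=AVERAGE(B' + str(first_data_row) + ':B' + str(last_data_row) + ')'

# The value Excel should show for this formula
expected_average = sum(age for name, age in employees[1:]) / (len(employees) - 1)

print(formula, expected_average)
```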
Now let's try to create a Word document. To do this we will use the
python-docx package (https://python-docx.readthedocs.io/en/latest/).
pip install python-docx
Run create_docx.py:
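# Importing packages:
from docx import Document
from docx.shared import Inches
document = Document()
# Add the document title and a first-level heading:
document.add_heading('Report', 0)
document.add_heading('Report', level=1)
# Add a page break:
document.add_page_break()
# Add an image (the file histogram.png must exist in the current directory):
document.add_picture('histogram.png', width=Inches(1.25))
document.save('report.docx')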
In the final part of our lesson, let's see how to generate PDF files. For this
we will use the FPDF package (https://pyfpdf.readthedocs.io/en/latest/).
Run create_pdf.py:
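# Importing an FPDF package:
from fpdf import FPDF
pdfFile = FPDF()
pdfFile.add_page()
# Set the font (required before adding text):
pdfFile.set_font('Arial', size=14)
# Add text to the page, specifying the coordinates of the left and top indents (example text):
pdfFile.text(10, 20, 'Report')
# Add an image to the page, specifying its size and the coordinates of the left and top indents (example values):
pdfFile.image('histogram.png', x=10, y=30, w=100)
pdfFile.output("report.pdf")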
This is what the result will look like if you open it in a PDF viewer:
Microsoft PowerPoint presentations can also be generated with Python
using the package python-pptx (https://python-pptx.readthedocs.io/en/latest/).
Day 18. Generating charts and maps
Generating diagrams
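Run bar.py:
# Importing matplotlib and numpy packages:
import numpy as np
import matplotlib.pyplot as plt
x = np.array(["A", "B", "C"])  # example labels, replace them with your own
y = np.array([3, 8, 1])  # example values, replace them with your own
plt.bar(x,y)
plt.title ("Population")
plt.savefig('population_barchart.png')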
In the last lesson we created two-dimensional lists without using numpy, but
if you are going to use lists to create any visualizations using Matplotlib, it is
better to use numpy arrays (once again, note that lists and arrays in Python
are different things).
Generating maps
Install the basemap package before running the script:
pip install basemap
Run map.py:
# Importing matplotlib, numpy and basemap packages:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
map = Basemap()
map.drawcountries()
map.drawcoastlines()
plt.savefig('map.png')
This is what the resulting map.png file should look like:
Basemap allows you to load any maps from shape files (with shp
extension). This allows you to make visualizations for individual countries
and regions. You can read more about it here:
https://basemaptutorial.readthedocs.io/en/latest/shapefile.html.
Day 19. Wayback Machine and time/date functions
Run download_mementos.py:
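# Importing wayback and datetime packages:
import wayback
from datetime import datetime
client = wayback.WaybackClient()
# Search for copies of the page saved within a date range (example URL and dates, replace them with your own):
for record in client.search('nytimes.com', from_date=datetime(2023, 1, 1), to_date=datetime(2023, 1, 2)):
    # Get the saved copy (memento) of the page:
    memento = client.get_memento(record)
    # Generate the name of the file in which we will save the HTML code of the
    # web page copy (replace / with - in the link so that no error occurs when
    # saving the file) and add the .html extension:
    fileName=memento.memento_url.replace("/","-")+".html"
    # Open file in which we will save the HTML code of the web page copy:
    memento_file = open(fileName, "w", encoding="utf-8")
    memento_file.write(memento.text)
    # Close file:
    memento_file.close()
    print (fileName)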
Be prepared for the fact that the script may take some time to run.
Note that in the code above, we used datetime() to set the date range for
finding copies of the web page in the web archive.
This is a very important function. Let's look at some examples of how to work
with it.
Run date_time.py:
# Import the datetime package (available in Python by default):
import datetime
# Get the current date and time:
currentTime = datetime.datetime.now()
print(currentTime)
# Display the current month:
print("Current Month: "+str(currentTime.month))
# Displays current day of the week, day of the month, month and year:
print(currentTime.strftime("%A %d %B %Y"))
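Dates also often have to be read from text and shifted by some interval. Links to copies in the Wayback Machine, for example, contain a 14-digit timestamp in the form YYYYMMDDhhmmss. A minimal sketch with a made-up timestamp, showing how to parse it, build a one-week date range and format the result:

```python
from datetime import datetime, timedelta

# Made-up example of a Wayback Machine style timestamp
timestamp = "20230115123000"

# Parse the text into a datetime object
captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")

# Start of a date range: one week before the capture
from_date = captured - timedelta(days=7)

# Format the dates for a report
print(captured.strftime("%A %d %B %Y"))
print(from_date.isoformat())
```

Objects such as from_date and captured are what datetime-based date range parameters expect.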
Day 20. Web apps creation
After posts about any OSINT command line tools, readers sometimes ask
me, "Isn't there a web version of this tool?"
After taking this course, you are unlikely to have serious difficulty using the
command line tools. After all, you now know how easy it is.
And if you've made some useful Python script and you want as many people
as possible to use it, you should consider turning it into a web application.
There are many ways to create web applications in Python. For example,
you can use frameworks such as Django, Flask, Dash, Falcon etc. They are
fairly easy to use and learn, but still, for this course, I chose the easiest and
fastest option: the Streamlit package (https://streamlit.io).
Now let's launch our first web application:
Run webapp.py (note that we do this in a different way than we did with the
other files in this tutorial):
streamlit run webapp.py
The result should be a simple web application that displays the text entered
by the user after clicking on the Start button.
import streamlit as st
# Create a text field (the label text is an example):
textInput = st.text_input("Enter text")
# Create the button, on pressing which the text entered in the text field will be displayed:
if st.button("Start"):
    nickname = textInput.title()
    st.text("You entered: "+textInput)
At the end of this lesson I suggest you read an article in which I explain
how to use Streamlit to turn Maigret (a tool for nickname enumeration) into a
web application:
The easiest way to turn an OSINT Python script into a web application.
Combining Maigret and Streamlit in three simple steps
From it you will learn more about the various useful features of Streamlit.
Day 21. Tools to help you work with code
If you write your own Python code after finishing this course, you will probably
run into different problems all the time.
Firstly, you may occasionally have scripts that don't run because of syntax
errors.
But fortunately, they can be found very quickly using numerous online
syntax-checking services. For example,
https://extendsclass.com/python-tester.html.
If you want to show your code to someone else and want to make it as
easy to read as possible, I recommend using
https://www.pythonchecker.com.
It will tell you where to put extra spaces and line breaks, and point out
comments that are too long in the code. In other words, it will find problems
that do not hinder the execution of the code, but make it less understandable
for other developers.
And, of course, you could just ask ChatGPT to refine your code and suggest
solutions to any problems. It will also help you make sense of other people's
code.
But the most important thing to remember when working with ChatGPT is
that it makes VERY many mistakes.
Stack Overflow is the world's largest online community where you can discuss
issues related to programming and computer technology in general. It already
has over 2 million Python-related questions, and you can really find solutions
to most problems there.
Phind.com is an AI search tool that searches for relevant answers on
Stack Overflow, analyzes them and generates a solution based on them.
Sometimes it also uses other IT sites as sources of information.
Sometimes, in order to find the code that performs a particular task in a larger
project, you have to go through a huge Github repository of many related
files.
More tools to make the code easier to work with can be found in this
repository on my Github profile:
https://github.com/cipher387/code-understanding-tools
What to do next
It depends on what your goals are. If you are an OSINT specialist, then I
recommend you to practice more investigations and try to use some tools
written in Python (there are a lot of them in my Twitter account), and to get
additional knowledge as needed when you face new tasks.
If your goal is to develop some serious project in Python, you definitely need
additional courses. For example, you can start by taking the free interactive
course at https://www.learnpython.org/, which is supported by DataCamp.
It's likely that after completing this course, you've had ideas for some new
simple tools for OSINT. You already have enough knowledge to do a lot of
useful things.
If you want to make your tool public and find your users, I recommend
reading this article:
Table of contents