Automatically Get Top 10 Jobs from LinkedIn Using Python
Last Updated: 21 Mar, 2024
Here we are going to use Clicknium to scrape the top 10 jobs from LinkedIn. First, we will log in to LinkedIn and search for jobs by keyword (a title, a skill, or a company) and location, and then take the top 10 jobs from the search results. For each job, we will collect information such as the title, the company name, the size of the company, the post date, the job type, and the link URL. Finally, we will save the results into a CSV file.
Here is an overview of the steps:
- Log in to LinkedIn
- Search jobs with the keyword and location
- Scrape the information of the top 10 jobs
- Save the search results into a CSV file
Installation
1.1 Python modules
The Clicknium Python module provides methods to automate various types of Windows applications, such as web browsers, Windows desktop applications, Java applications, SAP GUI applications, and more. In this sample, we also use the pywin32 module to read clipboard data; pywin32 provides access to many of the Windows APIs from Python.
Install the Python libraries with the following commands:
pip install clicknium
pip install pywin32
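To confirm that both installs succeeded, a quick sanity check (not part of the sample itself) is to import the modules:
Python3
# Quick sanity check: both imports should succeed
# after the pip installs above
import clicknium
import win32clipboard

print("clicknium and pywin32 are installed")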
1.2 Clicknium Visual Studio Code Extension
The Clicknium VS Code extension installs the Clicknium browser extension into the chosen browser; Clicknium uses this browser extension to interact with the browser. The VS Code extension also makes capturing, editing, and validating elements much easier.
Log in to LinkedIn
2.1 Capturing steps using the Clicknium VS Code extension
Besides writing the Python source code that automates the login, the job search, and the saving of the data, we also need to capture the web elements in the Chrome browser using the Clicknium VS Code extension. To launch the extension, press Ctrl+Shift+P to open the command palette and select "clicknium capture". This opens a capture dialog that lets you record web elements with Ctrl+Click. After capturing the elements described in the steps below, click Complete and run the Python source code.
[Image: Launch Clicknium Capture Dialog]
2.2 In this section, we will capture the related elements of the login page:
[Image: login page]
2.3 Open the browser with the LinkedIn website, input the account username and password, and then click the Sign in button.
Python3
from clicknium import clicknium as cc, locator

# Library for the settings in 'setting.json'
from setting import Setting

# Create a browser instance with "cc.chrome"
# (use "cc.edge" for the Edge browser).
# Open the browser with the specified URL and get the
# browser tab. By default, it waits until the page has
# loaded completely, so no extra time.sleep() is needed.
_tab = cc.chrome.open("https://www.linkedin.com/", is_wait_complete=True)

# Find the username input box and fill it with the
# value of the key 'linkedin_login_name' in setting.json
_tab.find_element(locator.chrome.linkedin.login.login_email).set_text(
    Setting.login_name)

# Find the password input box and fill it with the
# value of the key 'linkedin_login_password' in setting.json
_tab.find_element(locator.chrome.linkedin.login.login_password).set_text(
    Setting.login_password)

# Find the submit button and click it to log in
_tab.find_element(locator.chrome.linkedin.login.signin).click()

# Wait up to 5 seconds for the 'skip add phone'
# button; if it appears, click it
skip_btn = _tab.wait_appear(locator.chrome.linkedin.login.skip_add_phone,
                            wait_timeout=5)
if skip_btn:
    skip_btn.click()
Search jobs with the keyword and location
3.1 In this section, we will capture the related elements of the job search page:
[Image: job search page]
3.2 Switch to the Jobs tab, fill in the job keyword and location, and then click the Search button.
Python3
# Wait for the page to load completely after
# submitting the login information, then find the
# Jobs channel and click it to switch to it
jobs_channel = _tab.wait_appear(locator.chrome.linkedin.job.jobs_channel,
                                wait_timeout=5)
if jobs_channel:
    jobs_channel.click()

# Wait up to 10 seconds for the job search keyword
# input box; if it appears, fill it with the value of
# the key 'linkedin_search_job_key' in setting.json
search_key = _tab.wait_appear(locator.chrome.linkedin.job.job_search_key,
                              wait_timeout=10)
if search_key:
    search_key.set_text(Setting.search_job_key)

# Find the job search location input box and fill it
# with the value of the key 'linkedin_search_job_location'
# in setting.json
_tab.find_element(locator.chrome.linkedin.job.job_search_location).set_text(
    Setting.search_job_location)

# Find the search button and click it to search
_tab.find_element(locator.chrome.linkedin.job.job_search).click()
Scrape the information of the top 10 jobs
4.1 In this section, we will capture the elements below:
[Image: job detail information]
4.2 Get the job item from the search result list using the index parameter.
Python3
# Here we use range(1, 11) to get the top 10 jobs;
# the range can be set to any value
for i in range(1, 11):
    # Wait up to 5 seconds for the job item to appear,
    # and get the element with the given index value
    ele = _tab.wait_appear(locator.chrome.linkedin.jobitem.job_listitem,
                           {"index": i}, wait_timeout=5)
4.3 Get the title, the company name, the size of the company, the post date, and the job type for each job item.
Python3
# Initialize the job item result dict
details = {}

# Click the job item
ele.click()

# Wait up to 5 seconds for the job title to appear
job_title_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_title, wait_timeout=5)
# If the title exists, get its text and save it
# into the result dict 'details'
if job_title_ele:
    details["Job Title"] = job_title_ele.get_text().strip()

# Wait up to 2 seconds for the company name to appear
job_company_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_company, wait_timeout=2)
# If the company name exists, get its text and save it
# into the result dict 'details'
if job_company_ele:
    details["Company Name"] = job_company_ele.get_text().strip()

# Wait up to 2 seconds for the company size to appear
company_size_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.company_size, wait_timeout=2)
# If the company size exists and looks valid
# (contains "employees"), save it into 'details'
if company_size_ele:
    text = company_size_ele.get_text().strip()
    details["Company Size"] = text if "employees" in text else ""

# Wait up to 2 seconds for the post date to appear
job_post_date_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_post_date, wait_timeout=2)
# If the post date exists and looks valid
# (contains "ago"), save it into 'details'
if job_post_date_ele:
    text = job_post_date_ele.get_text().strip()
    details["Post Date"] = text if "ago" in text else ""

# Wait up to 2 seconds for the job type to appear
job_type_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_type, wait_timeout=2)
# If the job type exists, get its text and save it
# into the result dict 'details'
if job_type_ele:
    details["Job Type"] = job_type_ele.get_text().strip()
4.4 Get the job link
4.4.1 Getting clipboard data with pywin32
Python3
# Library for the Win32 clipboard API
import win32clipboard

# Get the clipboard data
def get_clipboard_data():
    try:
        # Open the clipboard
        win32clipboard.OpenClipboard()
        # Get the clipboard data and return it
        data = win32clipboard.GetClipboardData()
        return data
    except Exception:
        # On any exception, return an empty string
        return ""
    finally:
        # Close the clipboard
        win32clipboard.CloseClipboard()
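As a quick way to exercise this helper (a hypothetical round trip, separate from the scraping flow), you can put a sample string on the clipboard with pywin32 and read it back:
Python3
import win32clipboard

# Put a sample string on the clipboard
win32clipboard.OpenClipboard()
win32clipboard.EmptyClipboard()
win32clipboard.SetClipboardText("https://example.com/job/123")
win32clipboard.CloseClipboard()

# Read it back with the helper defined above
print(get_clipboard_data())  # https://example.com/job/123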
4.4.2 Click the Share button and the Copy link button, then read the job link from the clipboard.
Python3
# Library for the delay function
from time import sleep

# Wait up to 2 seconds for the job item's
# share button to appear
job_share_btn_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.share_button, wait_timeout=2)
# If the share button exists, click it
if job_share_btn_ele:
    job_share_btn_ele.click()
    # Wait up to 2 seconds for the copy link button to appear
    copy_link = _tab.wait_appear(
        locator.chrome.linkedin.jobitem.copy_link, wait_timeout=2)
    # If the copy link button exists, click it
    # to put the link on the clipboard
    if copy_link:
        copy_link.click()
        # Sleep 0.2 seconds to let the clipboard settle
        sleep(0.2)
        # Get the job link string and save it
        # into the result dict 'details'
        details["Job Link"] = get_clipboard_data()
Save the search results into a CSV file
5.1 Here is the content of the result CSV file:
[Image: CSV file of saved records]
5.2 Use the Python built-in csv module to save the data into a CSV file.
Python3
# Library for CSV operations
import csv

# Save a list of dicts into a CSV file
def list_dict_to_csv(dicts, filename="test.csv"):
    # Open the CSV file and get the file object
    with open(filename, 'w', newline='') as output_file:
        # Use the keys of the first dict as the CSV header
        keys = dicts[0].keys()
        # Initialize the DictWriter object; ignore any keys
        # that are not present in the header (a job item may
        # have extra fields the first item lacked)
        dict_writer = csv.DictWriter(output_file, keys,
                                     extrasaction='ignore')
        # Write the header into the CSV file
        dict_writer.writeheader()
        # Write the row data into the CSV file
        dict_writer.writerows(dicts)
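For illustration, here is how the helper could be called with a couple of hypothetical rows (assuming the function is saved as csvutils.py, as in section 6.2):
Python3
from csvutils import list_dict_to_csv

# Hypothetical rows, for demonstration only
rows = [
    {"Job Title": "Data Engineer", "Company Name": "Acme"},
    {"Job Title": "ML Engineer", "Company Name": "Globex"},
]
list_dict_to_csv(rows, "jobs.csv")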
Below is the complete implementation:
6.1 sample.py
Python3
# Library for web automation APIs;
# locator is used for selector references
from clicknium import clicknium as cc, locator

# Library for the delay function
from time import sleep

# Library for saving dict list data into a CSV file
from csvutils import list_dict_to_csv

# Library for clearing and reading the clipboard
from clipboard import get_clipboard_data, clear_clipboard_data

# Library for the settings in 'setting.json'
from setting import Setting


# Log in to the LinkedIn page:
# find the username and password input boxes and fill
# them with the values in setting.json, find the submit
# button and click it to log in, and click the
# 'skip add phone' button if it appears
def login():
    # Find the username input box and fill it with the
    # value of the key 'linkedin_login_name' in setting.json
    _tab.find_element(locator.chrome.linkedin.login.login_email).set_text(
        Setting.login_name)
    # Find the password input box and fill it with the
    # value of the key 'linkedin_login_password' in setting.json
    _tab.find_element(locator.chrome.linkedin.login.login_password).set_text(
        Setting.login_password)
    # Find the submit button and click it to log in
    _tab.find_element(locator.chrome.linkedin.login.signin).click()
    # Wait up to 5 seconds for the 'skip add phone'
    # button; if it appears, click it
    skip_btn = _tab.wait_appear(
        locator.chrome.linkedin.login.skip_add_phone, wait_timeout=5)
    if skip_btn:
        skip_btn.click()


def search_jobs():
    # Wait for the page to load completely after
    # submitting the login information, then find the
    # Jobs channel and click it to switch to it
    jobs_channel = _tab.wait_appear(locator.chrome.linkedin.job.jobs_channel,
                                    wait_timeout=5)
    if jobs_channel:
        jobs_channel.click()
    # Wait up to 10 seconds for the job search keyword
    # input box; if it appears, fill it with the value of
    # the key 'linkedin_search_job_key' in setting.json
    search_key = _tab.wait_appear(locator.chrome.linkedin.job.job_search_key,
                                  wait_timeout=10)
    if search_key:
        search_key.set_text(Setting.search_job_key)
    # Find the job search location input box and fill it
    # with the value of the key 'linkedin_search_job_location'
    # in setting.json
    _tab.find_element(locator.chrome.linkedin.job.job_search_location).set_text(
        Setting.search_job_location)
    # Find the search button and click it to search
    _tab.find_element(locator.chrome.linkedin.job.job_search).click()


# Scrape the information of the top 10 jobs.
# For each job item, get the title, the company name,
# the size of the company, the post date and the job
# type, then save the search results into a CSV file
def get_job_top10_list():
    # Initialize the search result list
    job_list = []
    # Clear the clipboard data first
    clear_clipboard_data()
    # Here we use range(1, 11) to get the top 10 jobs;
    # the range can be set to any value
    for i in range(1, 11):
        # Wait up to 5 seconds for the job item to appear,
        # and get the element with the given index value
        ele = _tab.wait_appear(locator.chrome.linkedin.jobitem.job_listitem,
                               {"index": i}, wait_timeout=5)
        # If the job item exists, click it to
        # get the detail information
        if ele:
            # Initialize the job item result dict
            details = {}
            # Click the job item
            ele.click()
            # Wait up to 5 seconds for the job title to appear
            job_title_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_title, wait_timeout=5)
            # If the title exists, save its text into 'details'
            if job_title_ele:
                details["Job Title"] = job_title_ele.get_text().strip()
            # Wait up to 2 seconds for the company name to appear
            job_company_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_company, wait_timeout=2)
            # If the company name exists, save its text into 'details'
            if job_company_ele:
                details["Company Name"] = job_company_ele.get_text().strip()
            # Wait up to 2 seconds for the company size to appear
            company_size_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.company_size, wait_timeout=2)
            # If the company size exists and looks valid
            # (contains "employees"), save it into 'details'
            if company_size_ele:
                text = company_size_ele.get_text().strip()
                details["Company Size"] = text if "employees" in text else ""
            # Wait up to 2 seconds for the post date to appear
            job_post_date_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_post_date, wait_timeout=2)
            # If the post date exists and looks valid
            # (contains "ago"), save it into 'details'
            if job_post_date_ele:
                text = job_post_date_ele.get_text().strip()
                details["Post Date"] = text if "ago" in text else ""
            # Wait up to 2 seconds for the job type to appear
            job_type_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_type, wait_timeout=2)
            # If the job type exists, save its text into 'details'
            if job_type_ele:
                details["Job Type"] = job_type_ele.get_text().strip()
            # Wait up to 2 seconds for the share button to appear
            job_share_btn_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.share_button, wait_timeout=2)
            # If the share button exists, click it
            if job_share_btn_ele:
                job_share_btn_ele.click()
                # Wait up to 2 seconds for the copy link button
                copy_link = _tab.wait_appear(
                    locator.chrome.linkedin.jobitem.copy_link, wait_timeout=2)
                # If the copy link button exists, click it
                # to put the link on the clipboard
                if copy_link:
                    copy_link.click()
                    # Sleep 0.2 seconds to let the clipboard settle
                    sleep(0.2)
                    # Get the job link string and save it into 'details'
                    details["Job Link"] = get_clipboard_data()
            # Save the job item's result into the list
            job_list.append(details)
    # If there are any results, save them into the CSV
    # file whose path is set with the key
    # 'result_csv_file' in setting.json
    if job_list:
        list_dict_to_csv(job_list, Setting.result_csv_file)


if __name__ == "__main__":
    # Create a browser instance with "cc.chrome"
    # (use "cc.edge" for the Edge browser).
    # Open the browser with the specified URL and get the
    # browser tab. By default, it waits until the page has
    # loaded completely, so no extra time.sleep() is needed.
    _tab = cc.chrome.open("https://www.linkedin.com/", is_wait_complete=True)
    # Check whether we need to log in with username and password:
    # True means the login form is shown, False means the
    # website has remembered the authentication information
    if _tab.is_existing(locator.chrome.linkedin.login.login_email):
        # Log in to LinkedIn
        login()
    # Search jobs with the keyword and location
    search_jobs()
    # Get the top 10 jobs from the search results
    # and save them into a CSV file
    get_job_top10_list()
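To run the sample, place all five files (sample.py, csvutils.py, clipboard.py, setting.py, and setting.json) in the same folder, fill in your own values in setting.json, and start it with:
python sample.py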
6.2 csvutils.py
Python3
# Library for CSV operations
import csv


# Save a list of dicts into a CSV file
def list_dict_to_csv(dicts, filename="test.csv"):
    # Open the CSV file and get the file object
    with open(filename, 'w', newline='') as output_file:
        # Use the keys of the first dict as the CSV header
        keys = dicts[0].keys()
        # Initialize the DictWriter object; ignore any keys
        # that are not present in the header
        dict_writer = csv.DictWriter(output_file, keys,
                                     extrasaction='ignore')
        # Write the header into the CSV file
        dict_writer.writeheader()
        # Write the row data into the CSV file
        dict_writer.writerows(dicts)
6.3 clipboard.py
Python3
# Library for the Win32 clipboard API
import win32clipboard


# Clear the clipboard data
def clear_clipboard_data():
    try:
        # Open the clipboard
        win32clipboard.OpenClipboard()
        # Empty the clipboard
        win32clipboard.EmptyClipboard()
    finally:
        # Close the clipboard
        win32clipboard.CloseClipboard()


# Get the clipboard data
def get_clipboard_data():
    try:
        # Open the clipboard
        win32clipboard.OpenClipboard()
        # Get the clipboard data and return it
        data = win32clipboard.GetClipboardData()
        return data
    except Exception:
        # On any exception, return an empty string
        return ""
    finally:
        # Close the clipboard
        win32clipboard.CloseClipboard()
6.4 setting.py
Python3
# Library for JSON operations
import json


class Setting(object):
    # Open the JSON file and load its data
    with open("setting.json") as f:
        data = json.load(f)
    # LinkedIn login username
    login_name = data['linkedin_login_name']
    # LinkedIn login password
    login_password = data['linkedin_login_password']
    # LinkedIn job search keyword
    search_job_key = data['linkedin_search_job_key']
    # LinkedIn job search location
    search_job_location = data['linkedin_search_job_location']
    # CSV file path for saving the search results
    result_csv_file = data['result_csv_file']
6.5 setting.json
JSON
{
    "linkedin_login_name": "your account username",
    "linkedin_login_password": "your account password",
    "linkedin_search_job_key": "your desired job title",
    "linkedin_search_job_location": "your desired job location",
    "result_csv_file": "C:\\test\\test.csv"
}
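Note that the backslashes in the Windows file path are doubled ("C:\\test\\test.csv") because the backslash is an escape character in JSON strings.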
6.6 Output
Here is a video of the complete execution:
[Video: complete execution]