Shamanth Internship Report
An Internship Report
Bachelor of Engineering
In
Computer Science & Engineering
Submitted by
Shamanth_N
(1AH16CS036)
Internship carried out
at
WEBLOGIK Pvt Ltd.
WebLogik 29/8, Mirza Hyder Ali Street Royapettah, Chennai - 600 014 India
CERTIFICATE
Certified that the Internship entitled “ONLINE PRICE COMPARISON OF PRODUCTS ON
E-COMMERCE WEBSITES” has been successfully carried out and submitted by
SHAMANTH N (1AH16CS036) a bonafide student of ACS COLLEGE OF ENGINEERING
affiliated to Visvesvaraya Technological University, Belgaum during the year 2019-2020. It is
certified that all corrections/suggestions indicated for Internal Assessment have been
incorporated in the report submitted to the department. This report has been approved as it
satisfies the academic requirements in respect of internship work prescribed in 8th semester for
the said degree.
External Viva
1.________________ 1.________________
2.________________ 2.________________
CERTIFICATE OF THE COMPANY
ACKNOWLEDGMENT
I express my deepest regards to the Honorable Chairman Sri A.C. Shanmugam for providing us
the opportunity to fulfil our ambition in his institute.
I express my sincere regards and thanks to the coordinator Dr. Jyoti Metan, Assistant Professor,
Computer Science and Engineering Department, ACSCE, Bengaluru, for the encouragement and
support throughout the work.
With a profound sense of gratitude, I acknowledge the guidance, support and encouragement of the
guide Mrs. Sunita Chalageri, Assistant Professor, Computer Science and Engineering
Department, ACSCE, Bengaluru.
Shamanth_N
(1AH16CS036)
ABSTRACT
Web scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting etc.) is a
technique employed to extract large amounts of data from websites whereby the data is extracted
and saved to a local file in your computer or to a database in table (spreadsheet) format. Data
displayed by most websites can only be viewed using a web browser. They do not offer the
functionality to save a copy of this data for personal use. The only option then is to manually
copy and paste the data - a very tedious job which can take many hours or sometimes days to
complete. Web scraping is the technique of automating this process, so that instead of manually
copying the data from websites, the Web scraping software will perform the same task within a
fraction of the time. Employing this technique, the following project was taken up: the most
popular e-commerce websites, Amazon and Flipkart, are scraped for the top 3 to 4 search results
for a product, and the product names and prices are returned in sorted order once the scrape
completes.
TABLE OF CONTENTS
REFERENCES
LIST OF FIGURES
CHAPTER – 01
COMPANY PROFILE
The company is an expert in offering the following services:
1. Web design
2. Web development
3. Software services
4. Search engine optimization
5. Digital marketing
Web LogiK offers Web services such as web designing, Responsive Web design, e-commerce
website design, web application development, domain hosting and website maintenance services.
It offers search engine optimization, Digital Media Advertisement and Marketing, Product
Photography, Web Hosting, Inventory/Catalog Management, Digital Art, Mobile Application
Development, Innovation Consulting, and Business Networking & Consulting.
Web Design
Web design encompasses many different skills and disciplines in the production and
maintenance of websites. The different areas of web design include web graphic design; user
interface design; authoring, including standardised code and proprietary software; user experience
design; and search engine optimization. Often many individuals will work in teams covering
different aspects of the design process, although some designers will cover them all. The term
"web design" is normally used to describe the design process relating to the front-end (client
side) design of a website including writing markup. Web design partially overlaps web
engineering in the broader scope of web development. Web designers are expected to have an
awareness of usability and if their role involves creating markup then they are also expected to
be up to date with web accessibility guidelines.
Web Development
Web development is the work involved in developing a website for the Internet (World Wide
Web) or an intranet (a private network). Web development can range from developing a simple
single static page of plain text to complex web-based internet applications (web
apps), electronic business, and social network services. A more comprehensive list of tasks to
which web development commonly refers may include web engineering, web design, web
content development, client liaison, client-side/server-side scripting, web server and network
security configuration, and e-commerce development.
Among web professionals, "web development" usually refers to the main non-design aspects of
building websites: writing markup and coding. Web development may use content management
systems (CMS) to make content changes easier and available with basic technical skills.
For larger organizations and businesses, web development teams can consist of hundreds of
people (web Developers) and follow standard methods like Agile methodologies while
developing websites. Smaller organizations may only require a single permanent or contracting
developer, or secondary assignment to related job positions such as a graphic designer or
information systems technician. Web development may be a collaborative effort between
departments rather than the domain of a designated department. There are three kinds of web
developer specialization: front-end developer , back-end developer, and full-stack developer.
Front-end developers are responsible for behavior and visuals that run in the user's browser,
while back-end developers deal with the servers.
Software services
In reference to computer software, a service is software that performs automated tasks, responds
to hardware events, or listens for data requests from other software. In a user's operating system,
these services are often loaded automatically at startup, and run in the background, without user
interaction. For example, in Microsoft Windows, many services are loaded to accomplish
different functions. They respond to user keyboard shortcuts, index and optimize the file system,
and communicate with other devices on the local network. An example of a Windows service is
Messenger, which allows users to send messages to other Windows users.
Search engine optimization
Search engine optimization (SEO) is the process of increasing the quality and quantity of website
traffic by increasing the visibility of a website or a web page to users of a web search engine.
SEO refers to the improvement of unpaid results (known as "natural" or "organic" results) and
excludes direct traffic/visitors and the purchase of paid placement.
SEO may target different kinds of searches, including image search, video search, academic
search, news search, and industry-specific vertical search engines.
Optimizing a website may involve editing its content, adding content, and modifying HTML and
associated coding to both increase its relevance to specific keywords and remove barriers to the
indexing activities of search engines like Google, Yahoo, etc. Promoting a site to increase the
number of back links, or inbound links, is another SEO tactic. By May 2015, mobile search had
surpassed desktop search.
As an Internet marketing strategy, SEO considers how search engines work, the computer-
programmed algorithms that dictate search engine behavior, what people search for, the actual
search terms or keywords typed into search engines, and which search engines are preferred by
their targeted audience. SEO is performed because a website will receive more visitors from a
search engine the higher the website ranks in the search engine results page (SERP). These
visitors can then be converted into customers.
SEO differs from local search engine optimization in that the latter is focused on optimizing a
business' online presence so that its web pages will be displayed by search engines when a user
enters a local search for its products or services. The former instead is more focused on national
or international searches.
Digital marketing
Digital marketing is the component of marketing that utilizes internet and online based digital
technologies such as desktop computers, mobile phones and other digital media and platforms to
promote products and services. Its development during the 1990s and 2000s, changed the way
brands and businesses use technology for marketing. As digital platforms became increasingly
incorporated into marketing plans and everyday life, and as people increasingly use digital
devices instead of visiting physical shops, digital marketing campaigns have become prevalent,
employing combinations of search engine optimization (SEO), search engine marketing (SEM),
content marketing, influencer marketing , content automation, campaign marketing, data-driven
marketing, e-commerce marketing, social media marketing, social media optimization, e-mail
direct marketing, display advertising, e-books, optical disks, and games. Digital marketing
extends to non-Internet channels that provide
digital media, such as television, mobile phones (SMS and MMS), callback, and on-hold mobile
ring tones. The extension to non-Internet channels differentiates digital marketing from online
marketing.
WebLogik is a software company based at the following address: WebLogik, 29/8, Mirza Hyder Ali
Street, Royapettah, Chennai - 600 014, India.
Web Logik answers the following questions about themselves while stating why their services are
essential.
Who We Are?
• We Are One-Stop Solution Providers For All Your Technology-Driven Business Needs
• We Are Experts In Web & E-Commerce Development Solutions
• We Are A Creative Team With Vibrant Technical Expertise
• We Deliver Quality Technology Solutions To Business Problems
• We Are Providers Of A Wide Array Of Digital Marketing Solutions
Why Choose Us
We know there are a lot of freelancers and web design firms around every corner and it may be
difficult for you to select the best technology partner. If you are looking for an agency who can
handle your website needs from design through development and into on-going promotion, you
have come to the right place. We will not disappoint. What sets us apart from other web design
firms is that we focus on your business aspects and aspirations and translate them into an
appropriate technology solution.
Our team takes a balanced, customized approach to each project from the discovery all the way
to launch—and beyond. We don't just create beautiful websites, we create successful business
website solutions. Our web design portfolio speaks for itself and we are more than happy to
provide references.
While we bring the technology expertise to the table, you are the master of your business and
know it better than anyone, so our first step is to listen while you tell us about your business,
market scenario, future plans and so on. Then we dig deep: we do the necessary research on our
own end so that we really get to know the ins and outs of your industry and what will get things
going for you and your audience. Armed with this knowledge, the WebLogik team will help you
define what makes you unique and translate that into a technology-driven business solution.
The WebLogik team is a healthy blend of creative czars and technomaniacs. We do not churn out
fancy web pages and widgets for every client. We work in sync with each client to understand their
industry and their online marketing goals, so that we may bring fresh yet practical ideas to the
table, every time. The objective of such efforts is client success and satisfaction: today, tomorrow
and ever after. We don't just create a website that you're happy with today; we deliver a
technology-driven business solution which is easy to manage in the days to come and delivers the
right returns.
CHAPTER – 02
TOOLS AND TECHNOLOGY
The tools and technologies used in this project are mentioned below:
• Selenium WebDriver
• Python
• Chrome developer tool
• Visual studio code
Selenium:
Python is used along with the Selenium WebDriver client library to create automated scripts.
Python is a widely used general-purpose, high-level programming language. It’s easy and its
syntax allows us to express concepts in fewer lines of code. It emphasizes code readability and
provides constructs that enable us to write programs on both the small and large scale. It also
provides a number of in-built and user-written libraries to achieve complex tasks quite easily.
The Selenium WebDriver client library for Python provides access to all the Selenium
WebDriver features and Selenium standalone server for remote and distributed testing of
browser-based applications. Selenium Python language bindings are developed and maintained
by David Burns, Adam Goucher, Maik Röder, Jason Huggins, Luke Semerau, Miki Tebeka, and
Eric Allenin.
Components of selenium:
Selenium is composed of several components, with each taking on a specific role in aiding the
development of web application test automation.
Selenium IDE
Selenium IDE is a complete integrated development environment (IDE) for Selenium tests. It is
implemented as a Firefox Add-on and as a Chrome extension. It allows for recording, editing and
debugging of functional tests. It was previously known as Selenium Recorder. Selenium-IDE
was originally created by Shinya Kasatani and donated to the Selenium project in 2006.
Selenium IDE was previously little-maintained, but it began being actively maintained again in
2018.
Scripts may be automatically recorded and edited manually, providing autocompletion support
and the ability to move commands around quickly. Scripts are recorded in Selenese, a special test
scripting language for Selenium. Selenese provides commands for performing actions in a
browser (click a link, select an option) and for retrieving data from the resulting pages.
The 2.x version of the Selenium IDE for Firefox stopped working after the Firefox 55 upgrade
and has been replaced by Selenium IDE 3.x.
In addition to the official Selenium IDE project, two alternative Selenium IDE browser
extensions are actively maintained: Kantu (Open-Source GPL license) and Katalon Recorder.
As an alternative to writing tests in Selenese, tests can also be written in various programming
languages. These tests then communicate with Selenium by calling methods in the Selenium
Client API. Selenium currently provides client APIs for Java, C#, Ruby, JavaScript and Python.
With Selenium 2, a new Client API was introduced (with WebDriver as its central component).
However, the old API (using class Selenium) is still supported.
Selenium WebDriver
Selenium WebDriver is the successor to Selenium RC. Selenium WebDriver accepts commands
(sent in Selenese, or via a Client API) and sends them to a browser. This is implemented through
a browser-specific browser driver, which sends commands to a browser and retrieves results.
Most browser drivers actually launch and access a browser application (such as Firefox, Google
Chrome, Internet Explorer, Safari or Microsoft Edge); there is also an HtmlUnit browser driver,
which simulates a browser using the headless browser HtmlUnit.
Unlike in Selenium 1, where the Selenium server was necessary to run tests, Selenium
WebDriver does not need a special server to execute tests. Instead, the WebDriver directly starts
a browser instance and controls it. However, Selenium Grid can be used with WebDriver to
execute tests on remote systems (see below). Where possible, WebDriver uses native operating
system level functionality rather than browser-based JavaScript commands to drive the browser.
This bypasses problems with subtle differences between native and JavaScript commands,
including security restrictions.
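In code, this driving model amounts to creating a browser instance and issuing commands to it. The sketch below keeps the navigation step in a plain function over a duck-typed driver; the real-browser usage in the comment assumes the selenium package (3.x API) and a chromedriver binary on the PATH.

```python
def page_title(driver, url):
    """Navigate the browser to url and return the page title."""
    driver.get(url)       # WebDriver sends a navigation command to the browser driver
    return driver.title   # ...and reads the resulting page title back the same way

# Usage with a real browser (requires the selenium package and chromedriver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()   # starts a browser instance directly; no server needed
#   print(page_title(driver, "https://www.python.org"))
#   driver.quit()
```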
In practice, this means that the Selenium 2.0 API has significantly fewer calls than does the
Selenium 1.0 API. Where Selenium 1.0 attempted to provide a rich interface for many different
browser operations, Selenium 2.0 aims to provide a basic set of building blocks from which
developers can create their own domain-specific language (DSL). One such DSL already exists:
the Watir project in the Ruby language has a rich history of good design. Watir-webdriver
implements the Watir API as a wrapper for Selenium-Webdriver in Ruby. Watir-webdriver is
created entirely automatically, based on the WebDriver specification and the HTML
specification.
As of early 2012, Simon Stewart (inventor of WebDriver), who was then with Google and now
with Facebook, and David Burns of Mozilla were negotiating with the W3C to make WebDriver
an internet standard. In July 2012, the working draft was released and the recommendation
followed in June 2018. Selenium-Webdriver (Selenium 2.0) is fully implemented and supported
in Python, Ruby, Java and C#.
Selenium Remote Control (RC) is a server, written in Java, that accepts commands for the
browser via HTTP. RC makes it possible to write automated tests for a web application in any
programming language, which allows for better integration of Selenium in existing unit test
frameworks. To make writing tests easier, the Selenium project currently provides client drivers
for PHP, Python, Ruby, .NET, Perl and Java. The Java driver can also be used with JavaScript (via
the Rhino engine). An instance of the Selenium RC server is needed to launch HTML test cases,
which means that the port should be different for each parallel run. However, for Java/PHP test
cases only one Selenium RC instance needs to be running continuously.
Selenium Remote Control was a refactoring of Driven Selenium or Selenium B designed by Paul
Hammant, credited with Jason as co-creator of Selenium. The original version directly launched
a process for the browser in question, from the test language of Java, .Net, Python or Ruby. The
wire protocol (called 'Selenese' in its day) was reimplemented in each language port. After the
refactor by Dan Fabulich and Nelson Sproul (with help from Pat Lightbody) there was an
intermediate daemon process between the driving test script and the browser. The benefits
included the ability to drive remote browsers and the reduced need to port every line of code to
an increasingly growing set of languages. Selenium Remote Control completely took over from
the Driven Selenium code-line in 2006. The browser pattern for 'Driven'/'B' and 'RC' was
response/request, which subsequently became known as Comet.
With the release of Selenium 2, Selenium RC has been officially deprecated in favor of Selenium
WebDriver.
Selenium Grid
Selenium Grid is a server that allows tests to use web browser instances running on remote
machines. With Selenium Grid, one server acts as the hub. Tests contact the hub to obtain access
to browser instances. The hub has a list of servers that provide access to browser instances
(WebDriver nodes), and lets tests use these instances. Selenium Grid allows running tests in
parallel on multiple machines and to manage different browser versions and browser
configurations centrally (instead of in each individual test).
The ability to run tests on remote browser instances is useful to spread the load of testing across
several machines and to run tests in browsers running on different platforms or operating
systems. The latter is particularly useful in cases where not all browsers to be used for testing can
run on the same platform.
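With Grid, a test points a Remote WebDriver at the hub's URL instead of starting a local browser, and the hub picks a matching node from its desired capabilities. A minimal sketch, assuming the Selenium 3.x client; the hub address and capability values are illustrative:

```python
# Hub URL for a running Selenium Grid; the address is illustrative.
HUB_URL = "http://localhost:4444/wd/hub"

def grid_capabilities(browser_name, platform="ANY"):
    """Build the desired-capabilities dict the Grid hub uses to pick a node."""
    return {"browserName": browser_name, "platform": platform}

# Usage against a running hub (requires the selenium package):
#   from selenium import webdriver
#   driver = webdriver.Remote(command_executor=HUB_URL,
#                             desired_capabilities=grid_capabilities("chrome"))
#   driver.get("https://www.python.org")
#   driver.quit()
```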
Python:
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van
Rossum and first released in 1991, Python's design philosophy emphasizes code readability with
its notable use of significant whitespace. Its language constructs and object-oriented approach
aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms,
including procedural, object-oriented, and functional programming. Python is often described as a
"batteries included" language due to its comprehensive standard library.
Python was conceived in the late 1980s as a successor to the ABC language. Python 2.0, released
in 2000, introduced features like list comprehensions and a garbage collection system capable of
collecting reference cycles. Python 3.0, released in 2008, was a major revision of the language
that is not completely backward compatible, and much Python 2 code does not run unmodified
on Python 3.
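Two incompatibilities commonly encountered when porting illustrate this (the snippet below runs on Python 3):

```python
# Python 2: print "hello"   (a statement). Python 3 makes print a function:
print("hello")

# Python 2: 7 / 2 == 3 (integer division). Python 3: / is true division:
assert 7 / 2 == 3.5
# Floor division remains available via //:
assert 7 // 2 == 3
```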
The Python 2 language, i.e. Python 2.7.x, was officially discontinued on 1 January 2020 (the
sunset was first planned for 2015), after which security patches and other improvements are no
longer released for it. With Python 2's end-of-life, only Python 3.5.x and later are supported.
Python interpreters are available for many operating systems. A global community of
programmers develops and maintains CPython, an open-source reference implementation. A
non-profit organization, the Python Software Foundation, manages and directs resources for
Python and CPython development.
Chrome developer tools:
Web development tools allow web developers to test and debug their code. They are different
from website builders and integrated development environments in that they do not assist in the
direct creation of a webpage; rather, they are tools used for testing the user interface of a website
or web application.
Visual Studio Code:
Visual Studio Code is a source-code editor developed by Microsoft for Windows, Linux and
macOS. It includes support for debugging, embedded Git control and GitHub, syntax
highlighting, intelligent code completion, snippets, and code refactoring.
XPath:
XPath (XML Path Language) is a query language for selecting nodes from an XML document.
In addition, XPath may be used to compute values (e.g., string, numbers, or Boolean values)
from the content of an XML document. XPath was defined by the World Wide Web
Consortium (W3C).
For example, the abbreviated paths /A/B/C and A//B/*[1] would be written in the full,
unabbreviated syntax as
/child::A/child::B/child::C
child::A/descendant-or-self::node()/child::B/child::*[position()=1]
Here, in each step of the XPath, the axis (e.g. child or descendant-or-self ) is explicitly
specified, followed by :: and then the node test, such as A or node() in the examples above.
Axis specifiers indicate navigation direction within the tree representation of the XML
document. The axes available are: ancestor, ancestor-or-self, attribute, child, descendant,
descendant-or-self, following, following-sibling, namespace, parent, preceding,
preceding-sibling, and self.
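The abbreviated forms of these queries can be tried offline with Python's standard-library ElementTree, which supports a subset of abbreviated XPath (the full axis syntax such as child:: is not supported there):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<A><B><C>one</C><C>two</C></B><B><C>three</C></B></A>")

# /child::A/child::B/child::C is abbreviated /A/B/C (relative to A: "B/C"):
cs = [c.text for c in doc.findall("B/C")]            # every C under a B under A

# child::A/descendant-or-self::node()/child::B/child::*[position()=1]
# is abbreviated A//B/*[1]: the first child element of every descendant B:
firsts = [e.text for e in doc.findall(".//B/*[1]")]
```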
CHAPTER – 03
TASK PERFORMED
Web Scraping:
Web scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting etc.) is a
technique employed to extract large amounts of data from websites whereby the data is extracted
and saved to a local file in your computer or to a database in table (spreadsheet) format. Data
displayed by most websites can only be viewed using a web browser. They do not offer the
functionality to save a copy of this data for personal use. The only option then is to manually
copy and paste the data - a very tedious job which can take many hours or sometimes days to
complete. Web scraping is the technique of automating this process, so that instead of manually
copying the data from websites, the Web scraping software will perform the same task within a
fraction of the time. Employing this technique, the following project was taken up: the most
popular e-commerce websites, Amazon and Flipkart, are scraped for the top 3 to 4 search results
for a product, and the product names and prices are returned in sorted order once the scrape
completes.
The project is about scraping the chosen websites and fetching the necessary data from
those websites.
• The user enters the product into the proposed application and chooses the websites with
which he wants to compare it.
• Once the above-mentioned steps are completed, he will be shown the details of the
products, such as name, price and picture.
The role of the candidate in designing this product is to write the code for scraping the websites
Amazon and Flipkart and fetching the data asked for.
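As a sketch of the comparison step, the helpers below parse price strings as they typically appear on such pages and sort the scraped results by price. The function names and sample data are illustrative, not the exact project code:

```python
import re

def parse_price(text):
    """Turn a displayed price like '₹12,999' or 'Rs. 999.00' into a float."""
    m = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(m.group().replace(",", "")) if m else float("inf")

def sort_by_price(results):
    """Sort (name, price_text) pairs from cheapest to costliest."""
    return sorted(results, key=lambda r: parse_price(r[1]))

scraped = [("Phone A", "₹12,999"), ("Phone B", "₹9,499"), ("Phone C", "₹10,249")]
ranked = sort_by_price(scraped)   # Phone B first, then Phone C, then Phone A
```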
Finding elements using the ID is the most preferred way to find elements on a page. The
find_element_by_id() and find_elements_by_id() methods return an element or a set of elements
that have matching ID attribute values.
The find_element_by_id() method returns the first element that has a matching ID attribute
value. If no element with matching ID attribute is found, a NoSuchElementException will be
raised.
Let’s try finding the search textbox from the sample application as shown in the following
screenshot:
Here is the HTML code for the search textbox with an ID attribute value defined as search:
Here is a test that uses the find_element_by_id() method to find the search textbox and check its
maxlength attribute. We will pass the ID attribute’s value, search, to the find_element_by_id()
method:
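A sketch of such a check against the Selenium 3 API; the demo URL and the maxlength value are hypothetical, and the locator logic is kept in a plain function so it can be exercised without a browser:

```python
def search_box_maxlength(driver):
    """Locate the search textbox by its ID and return its maxlength attribute."""
    search = driver.find_element_by_id("search")   # first element with id="search"
    return search.get_attribute("maxlength")

# Usage against a live page (requires selenium + chromedriver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("http://demo.example/")             # hypothetical demo store URL
#   print(search_box_maxlength(driver))
#   driver.quit()
```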
The find_elements_by_id() method returns all the elements that have the same ID attribute
values.
Finding elements by the class name: Apart from using the ID and name attributes, we can also
use the class attributes to find elements. The class attribute is used to apply CSS to an element.
The find_element_by_class_name() and find_elements_by_class_name() methods return
element(s) that have matching class attribute values. If no element is found with the matching
class attribute value, a NoSuchElementException will be raised.
Let’s create a test that finds the search button element using its class attribute value and check
whether it is enabled, as shown in the following code:
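A sketch of that test using find_element_by_class_name(); the class value "button" is an assumption about the sample page's markup:

```python
def search_button_enabled(driver):
    """Find the search button via its class attribute and report if it is enabled."""
    button = driver.find_element_by_class_name("button")   # assumed class value
    return button.is_enabled()

# Usage (requires selenium + chromedriver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("http://demo.example/")   # hypothetical demo store URL
#   assert search_button_enabled(driver)
#   driver.quit()
```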
The find_elements_by_class_name() method returns all the elements that have the identical
class name attribute values.
We will use the find_elements_by_tag_name() method to get all the images. In this example, we
will first find the list of banners implemented as <ul> or unordered lists using the
find_element_by_class_name() method and then get all the <img> or image elements by calling
the find_elements_by_tag_name() method on the banners list:
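A sketch of that two-step lookup; "promos" as the banner list's class name is an assumption about the sample page:

```python
def count_banner_images(driver):
    """Find the banner <ul> by class name, then all <img> elements inside it."""
    banners = driver.find_element_by_class_name("promos")   # assumed class value
    images = banners.find_elements_by_tag_name("img")
    return len(images)
```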
Finding element by XPath: XPath is a query language used to search and locate nodes in an
XML document. All the major web browsers support XPath. Selenium can leverage and use
powerful XPath queries to find elements on a web page. One of the advantages of using XPath
arises when we can't find a suitable ID, name, or class attribute value for the element. We can use
XPath to either find the element in absolute terms or relative to an element that does have an ID
or name attribute. We can also use defined attributes other than the ID, name, or class with
XPath queries. We can also find elements with the help of a partial check on attribute values
using XPath functions such as starts-with(), contains(), and ends-with(). Here is what the HTML
code looks like:
Let’s create a test that uses the find_element_by_xpath() method. We are using a relative XPath
query to find this <img> tag using its alt attribute (this is how we can use ID, name, and class
attributes as well as other attributes such as title, type, value, alt, and so on within XPath
queries):
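A sketch of that lookup with find_element_by_xpath(), using a relative query on the alt attribute; the alt value here is made up for illustration:

```python
def find_promo_image(driver):
    """Locate an <img> via a relative XPath query on its alt attribute."""
    # 'sale-of-the-day' is a hypothetical alt value, not the real page's
    return driver.find_element_by_xpath("//img[@alt='sale-of-the-day']")
```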
The find_elements_by_xpath() method returns all the elements that match the XPath query.
Finding elements by using CSS selectors: CSS is a style sheet language used by web designers
to describe the look and feel of an HTML document. CSS is used to define various style classes
that can be applied to elements for formatting. CSS selectors are used to find HTML elements
based on their attributes such as ID, classes, types, attributes, or values and much more to apply
the defined CSS rules.
The find_element_by_css_selector() and find_elements_by_css_selector() methods return
element(s) that are found by the specified CSS selector. On the home page of the sample
application, we can see the shopping cart icon. When we click on the icon, we can see the
shopping cart. When there are no items added to the shopping cart, a message should be
displayed saying "You have no items in your shopping cart", as shown in the following
screenshot:
Let’s create a test to validate this message. We will use CSS selectors to find the shopping cart
icon, click on it, and then find the shopping cart message implemented in the <p> or paragraph
element:
We used the element tag along with the class name in this example. For example, to get
the shopping cart icon, we used the following selector:
shopping_cart_icon = self.driver.find_element_by_css_selector("div.header-minicart span.icon")
This will first find a <div> element with the header-minicart class name and then find a <span>
element under this div, which has icon as its class name.
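A sketch of that empty-cart check using find_element_by_css_selector(); the icon selector follows the one quoted above, while "p.empty" for the message paragraph is an assumption:

```python
def empty_cart_message(driver):
    """Click the minicart icon, then read the empty-cart message paragraph."""
    icon = driver.find_element_by_css_selector("div.header-minicart span.icon")
    icon.click()
    message = driver.find_element_by_css_selector("p.empty")  # assumed <p> class
    return message.text
```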
CHAPTER – 04
RESULTS
CHAPTER – 05
CONCLUSION
Web technology is a huge umbrella sheltering a wide variety of tools and sub-technologies under
it. It has tools that are suitable for both designing and scraping websites.
Designing tools are those used to build a website; tools such as HTML, CSS and JavaScript are
used for this purpose. JavaScript adds dynamic behaviour to a website, whereas CSS is used to
style the website for an improved user experience. HTML provides the basic structure for the
whole design.
Web scraping is a technology that helps extract data from an already designed website; tools
such as Beautiful Soup and Selenium are used for this purpose.
This project employs Selenium to accomplish the given purpose, but other efficient tools such as
Beautiful Soup may also be used. Selenium was chosen to keep the job easy and beginner-friendly.
REFERENCES
[1] Book: Learning Selenium Testing Tools with Python by Unmesh Gundecha
[2] Book: Selenium Testing Tools Cookbook, Second Edition, by Unmesh Gundecha
[3] YouTube: Python Selenium | Python Selenium Webdriver Tutorial | Python Tutorial |
Python Training | Edureka: https://www.youtube.com/watch?v=oM-yAjUGO-E