Seminar Report Mine

This document discusses CAPTCHAs, which are tests used to distinguish humans from machines on the internet. It begins with an introduction explaining the increased use of the internet and need for security. It then defines CAPTCHAs as tests to tell computers and humans apart. The document outlines three types of CAPTCHAs: text-based, graphic-based, and audio-based. It also discusses major applications of CAPTCHAs, including preventing comment spam, protecting website registrations, and protecting email addresses from scrapers. The document provides examples and steps for different CAPTCHA types.

Uploaded by

sachin17586
Copyright
© Attribution Non-Commercial (BY-NC)

UE6858

“CAPTCHAS”

Submitted in fulfillment of the Seminar required for the
Bachelor of Engineering (B.E.)
in
Information Technology

By
Sachin Narang
UE6858, 8th Semester
Panjab University

Under the Supervision of
Ms. Roopali Garg
Associate Professor, UIET

Contents

S.No.  Topic

1      Cover Page
2      Contents
3      Acknowledgement
4      Declaration
5      Certificate
6      Introduction
7      Why Use CAPTCHAs
8      Definitions
9      Types of CAPTCHAs
10     Major Areas of Applications
11     ReCAPTCHA
12     Breaking of CAPTCHAs
13     New Proposed Approaches
14     Conclusion
15     Bibliography

Acknowledgement

I would like to thank all those who supported and helped me
throughout the preparation of this seminar report. I would
especially like to thank my teacher in-charge, Ms. Roopali
Garg, for her continuous guidance. I would also like to thank
my friends for their encouragement; each time they found a
mistake, they suggested a correction and helped bring this
seminar report closer to perfection.

Sachin Narang

B.E, I.T, UE6858

U.I.E.T

Declaration

I hereby declare that the work presented in this seminar
report on 'CAPTCHAS', submitted at U.I.E.T., Panjab
University, is an authentic work presented by Mr. Sachin
Narang (UE6858) of B.E. (I.T.) 8th semester under the
supervision of Ms. Roopali Garg.

Sachin Narang

B.E, I.T, UE6858

U.I.E.T

Certificate

This is to certify that Mr. Sachin Narang, UE6858, B.E.
(I.T.) 8th Semester, has completed the seminar report on
CAPTCHAS, in accordance with the requirement for qualifying
the 8th semester, under the guidance of Ms. Roopali Garg.

Roopali Garg

Associate Professor

(Teacher In-Charge)

Introduction

Use of the Internet has increased remarkably across the globe
in the past 10-12 years, and so has the need for security
over it. Marketing and advertising over the Internet have
given rise to companies like Google, which at the moment is
valued at 181 billion USD, i.e. almost twice the value of
General Motors and McDonald's combined.

This presentation is about security achieved over the
Internet using CAPTCHAs. CAPTCHAs are software programs that
test whether a user on the Internet is a human or a machine.
The concept is used by all the big Internet companies, such
as Google, Yahoo and Facebook. So what are these CAPTCHAs,
and what are their possible applications? This is what we
cover in our presentation.

Why USE CAPTCHAS


To understand why CAPTCHAs are needed, consider this story.
A few years ago (November 1999), www.slashdot.org (a popular
site in the US) conducted an online poll asking which was
the best graduate school in computer science.

Students at CMU and MIT quickly wrote programs that
increased their schools' vote counts automatically, and
ultimately the poll had to be taken down because both MIT
and CMU had more than 21,000 votes each while every other
school stayed below 1,000.

There are situations like these where you need to
distinguish whether the user is a human or a machine. This
is where we use CAPTCHAs.

DEFINITIONS

CAPTCHA stands for Completely Automated Public Turing test
to tell Computers and Humans Apart. It is also known as a
Reverse Turing Test or a Human Interactive Proof.

Turing Test: To conduct this test, two people and a machine
are needed. One person acts as an interrogator, sitting in a
separate room, asking questions and receiving responses. The
goal of the machine is to fool the interrogator into taking
it for the human.

The challenge here: develop a software program that can
create and grade challenges that most humans can pass but
computers cannot.

Types of CAPTCHAS
There are basically three types of CAPTCHAs.

1. Text Based: These are the most commonly used CAPTCHAs.
They can be further divided into three kinds:

GIMPY: Initially used by Yahoo. This CAPTCHA follows two
steps:
a) Pick a word or words from a small dictionary
b) Distort them and add noise and background

GIMPY-R: This was used by Google and is a simple advance
over GIMPY. Here noise is applied to individual letters
instead of complete words. The steps followed are:
a) Pick random letters
b) Distort them, add noise and background

SIMARD'S: Here a further advance is made by adding arcs,
i.e. curved geometrical shapes, to the image. The steps
followed are:
a) Pick random letters and numbers
b) Distort them and add arcs
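
The two steps listed for the text-based variants (pick random characters, then distort them and add clutter) can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the names `Glyph`, `Challenge`, `make_challenge` and `grade`, the rotation and jitter ranges, and the image size are all hypothetical and not taken from any real CAPTCHA implementation. The sketch produces the distortion parameters only; actually drawing the image is left to whatever graphics library is available.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str         # the character to draw
    angle_deg: float  # rotation applied to this character
    dx: int           # horizontal jitter in pixels
    dy: int           # vertical jitter in pixels


@dataclass
class Challenge:
    answer: str   # what the user must type
    glyphs: list  # per-character distortion parameters
    arcs: list    # clutter arcs: (cx, cy, radius, start_deg, end_deg)


# Assumed alphabet: upper-case letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def make_challenge(rng, length=6, n_arcs=4, width=200, height=60):
    """Step a) pick random letters and numbers; step b) distort them and add arcs."""
    answer = "".join(rng.choice(ALPHABET) for _ in range(length))
    glyphs = [
        Glyph(ch, rng.uniform(-30, 30), rng.randint(-3, 3), rng.randint(-5, 5))
        for ch in answer
    ]
    arcs = []
    for _ in range(n_arcs):
        start = rng.uniform(0, 360)
        arcs.append((rng.randrange(width), rng.randrange(height),
                     rng.randint(10, 40), start, start + rng.uniform(30, 180)))
    return Challenge(answer, glyphs, arcs)


def grade(challenge, response):
    """Accept the response if it matches the answer, ignoring case and outer spaces."""
    return response.strip().upper() == challenge.answer
```

A caller would create the generator once, for example `make_challenge(random.SystemRandom())`, render the glyphs and arcs into an image, and later pass the typed text to `grade`. A seeded `random.Random` is convenient for testing but predictable, so it would not be a good choice on a live site.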

2. Graphic Based CAPTCHAs: These are based on graphics, i.e.
images and symbols, and are again of two types:

BONGO
The following steps are followed in BONGO CAPTCHAs:
a) Display two series of blocks
b) User must find the characteristic that sets the two
series apart
c) User is asked to determine which series each of four
single blocks belongs to

PIX
This is the second kind of graphic CAPTCHA, which uses
distorted images. The steps followed are:
a) Create a large database of labeled images
b) Pick a concrete object
c) Pick four images of the object from the image database
d) Distort the images
e) Ask the user to pick the object from a list of words
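
The PIX steps above can be illustrated with a short Python sketch. Everything concrete in it is an assumption made for illustration: the tiny in-memory `IMAGE_DB`, the file names, and the use of a random rotation angle as a stand-in for real image distortion. It is not the original PIX implementation.

```python
import random

# Step a) hypothetical labeled image database: object name -> image file names.
IMAGE_DB = {
    "dog": ["dog1.jpg", "dog2.jpg", "dog3.jpg", "dog4.jpg", "dog5.jpg"],
    "car": ["car1.jpg", "car2.jpg", "car3.jpg", "car4.jpg", "car5.jpg"],
    "tree": ["tree1.jpg", "tree2.jpg", "tree3.jpg", "tree4.jpg", "tree5.jpg"],
}


def make_pix_challenge(rng, db=IMAGE_DB, n_images=4):
    """Build one PIX-style challenge from a labeled image database."""
    obj = rng.choice(sorted(db))              # step b) pick a concrete object
    images = rng.sample(db[obj], n_images)    # step c) pick four of its images
    # Step d) stand-in for distortion: attach a random rotation to each image.
    distorted = [(name, rng.uniform(-15, 15)) for name in images]
    choices = sorted(db)                      # step e) list of words to pick from
    rng.shuffle(choices)
    return {"answer": obj, "images": distorted, "choices": choices}


def grade_pix(challenge, picked):
    """The user passes if the picked word is the object shown in the images."""
    return picked == challenge["answer"]
```

With only three objects a program could guess correctly one time in three, which is why step a) asks for a large database.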

3. Audio Based CAPTCHAs: These are based on the human
ability to recognize sounds even when they are distorted.
The following algorithm is used:
a) Pick a word or a sequence of numbers at random
b) Render them into an audio clip using TTS (text-to-speech)
software
c) Distort the audio clip
d) Ask the user to identify and type the word or numbers
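
The audio algorithm above can be sketched in Python using only the standard library. This is a rough sketch, not a usable audio CAPTCHA: since no text-to-speech engine is assumed to be available, each digit is rendered as a plain tone purely as a stand-in for step b), and the sample rate, tone frequencies and noise level are arbitrary choices. The part the sketch is meant to show is step c), mixing random noise into the clip before it is written out as a WAV file.

```python
import io
import math
import random
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (assumed value)


def tone(freq_hz, seconds=0.3):
    """Return a sine tone as a list of floats in the range [-1, 1]."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def make_audio_challenge(rng, n_digits=4, noise_level=0.3):
    """Return (answer, wav_bytes) for a noisy clip of n_digits random digits."""
    # Step a) pick a sequence of numbers at random.
    digits = "".join(rng.choice("0123456789") for _ in range(n_digits))
    samples = []
    for d in digits:
        # Step b) stand-in for TTS: one tone per digit, then a short pause.
        samples += tone(300 + 50 * int(d))
        samples += [0.0] * (SAMPLE_RATE // 10)
    # Step c) distort the clip by adding uniform random noise to every sample.
    noisy = [s + rng.uniform(-noise_level, noise_level) for s in samples]
    peak = max(abs(s) for s in noisy)
    pcm = b"".join(struct.pack("<h", int(32767 * s / peak)) for s in noisy)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return digits, buf.getvalue()
```

Step d) is then a plain string comparison between the digits the user types and the returned answer.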
MAJOR AREAS OF APPLICATIONS:
CAPTCHAs have several applications for practical security,
including (but not limited to):

• Preventing Comment Spam in Blogs. Most bloggers are
familiar with programs that submit bogus comments, usually
for the purpose of raising search engine ranks of some
website (e.g., "buy penny stocks here"). This is called
comment spam. By using a CAPTCHA, only humans can
enter comments on a blog. There is no need to make users
sign up before they enter a comment, and no legitimate
comments are ever lost!

• Protecting Website Registration. Several companies
(Yahoo!, Microsoft, etc.) offer free email services. Up until a
few years ago, most of these services suffered from a
specific type of attack: "bots" that would sign up for
thousands of email accounts every minute. The solution to
this problem was to use CAPTCHAs to ensure that only
humans obtain free accounts. In general, free services
should be protected with a CAPTCHA in order to prevent
abuse by automated scripts.

• Protecting Email Addresses From Scrapers. Spammers
crawl the Web in search of email addresses posted in clear
text. CAPTCHAs provide an effective mechanism to hide
your email address from Web scrapers. The idea is to
require users to solve a CAPTCHA before showing your
email address. A free and secure implementation that uses
CAPTCHAs to obfuscate an email address can be found at
reCAPTCHA MailHide.

• Online Polls. In November 1999, http://www.slashdot.org
released an online poll asking which was the best graduate
school in computer science (a dangerous question to ask
over the web!). As is the case with most online polls, IP
addresses of voters were recorded in order to prevent single
users from voting more than once. However, students at
Carnegie Mellon found a way to stuff the ballots using
programs that voted for CMU thousands of times. CMU's
score started growing rapidly. The next day, students at MIT
wrote their own program and the poll became a contest
between voting "bots." MIT finished with 21,156 votes,
Carnegie Mellon with 21,032 and every other school with
less than 1,000. Can the result of any online poll be trusted?
Not unless the poll ensures that only humans can vote.

• Preventing Dictionary Attacks. CAPTCHAs can also be
used to prevent dictionary attacks in password systems. The
idea is simple: prevent a computer from being able to iterate
through the entire space of passwords by requiring it to solve
a CAPTCHA after a certain number of unsuccessful logins.
This is better than the classic approach of locking an
account after a sequence of unsuccessful logins, since doing
so allows an attacker to lock accounts at will.

• Search Engine Bots. It is sometimes desirable to keep
webpages unindexed to prevent others from finding them
easily. There is an HTML tag to prevent search engine bots
from reading web pages. The tag, however, doesn't
guarantee that bots won't read a web page; it only serves to
say "no bots, please." Search engine bots, since they usually
belong to large companies, respect web pages that don't
want to allow them in. However, in order to truly guarantee
that bots won't enter a web site, CAPTCHAs are needed.

• Worms and Spam. CAPTCHAs also offer a plausible
solution against email worms and spam: "I will only accept
an email if I know there is a human behind the other
computer." A few companies are already marketing this idea.
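
The dictionary-attack defence described in the list above (ask for a CAPTCHA after a certain number of unsuccessful logins instead of locking the account) can be sketched in Python as follows. The class name `LoginGuard`, the threshold of three free attempts, and the in-memory password table are assumptions made only for this illustration; a real system would store salted password hashes rather than plain passwords.

```python
from collections import defaultdict

MAX_FREE_ATTEMPTS = 3  # assumed number of failures allowed before a CAPTCHA is demanded


class LoginGuard:
    """Demands a solved CAPTCHA once a user has failed too many logins."""

    def __init__(self, passwords):
        self.passwords = passwords        # user -> password (plain text only in this sketch)
        self.failures = defaultdict(int)  # user -> consecutive failed attempts

    def captcha_required(self, user):
        return self.failures[user] >= MAX_FREE_ATTEMPTS

    def login(self, user, password, captcha_passed=False):
        # The account is never locked: a human can always continue by solving
        # the CAPTCHA, while a guessing program is slowed down or stopped.
        if self.captcha_required(user) and not captcha_passed:
            return "captcha_required"
        if self.passwords.get(user) == password:
            self.failures[user] = 0
            return "ok"
        self.failures[user] += 1
        return "denied"
```

Because the legitimate owner can still log in after solving a CAPTCHA, an attacker cannot use failed guesses to lock other people out, which is the advantage over account locking noted above.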

ReCAPTCHA
ReCAPTCHA is a free CAPTCHA service that helps to digitize
books, newspapers and old-time radio shows.

About 200 million CAPTCHAs are solved by humans around the
world every day. In each case, roughly ten seconds of human
time are being spent. Individually, that's not a lot of time, but in
aggregate these little puzzles consume more than 150,000 hours
of work each day. What if we could make positive use of this
human effort? ReCAPTCHA does exactly that by channeling the
effort spent solving CAPTCHAs online into "reading" books.

To archive human knowledge and to make information more
accessible to the world, multiple projects are currently digitizing
physical books that were written before the computer age. The
book pages are being photographically scanned, and then
transformed into text using "Optical Character Recognition"
(OCR). The transformation into text is useful because scanning a
book produces images, which are difficult to store on small
devices, expensive to download, and cannot be searched. The
problem is that OCR is not perfect.

ReCAPTCHA improves the process of digitizing books by sending
words that cannot be read by computers to the Web in the form of
CAPTCHAs for humans to decipher. More specifically, each word
that cannot be read correctly by OCR is placed on an image and
used as a CAPTCHA. This is possible because most OCR
programs alert you when a word cannot be read correctly.

But if a computer can't read such a CAPTCHA, how does the
system know the correct answer to the puzzle? Here's how: Each
new word that cannot be read correctly by OCR is given to a user
in conjunction with another word for which the answer is already
known. The user is then asked to read both words. If they solve
the one for which the answer is known, the system assumes their
answer is correct for the new one. The system then gives the new
image to a number of other people to determine, with higher
confidence, whether the original answer was correct.
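
The pairing of a known word with an unknown word can be sketched in Python as follows. This is a minimal sketch under assumptions, not ReCAPTCHA's actual code: the class name `WordPairChecker`, the vote threshold of three, and the rule of accepting the most common reading are all hypothetical choices made for the illustration.

```python
from collections import Counter, defaultdict


class WordPairChecker:
    """Grades a user on the known word and collects readings of the unknown word."""

    def __init__(self, votes_needed=3):
        self.votes_needed = votes_needed   # assumed agreement threshold
        self.votes = defaultdict(Counter)  # unknown word id -> reading -> count
        self.accepted = {}                 # unknown word id -> agreed reading

    def submit(self, control_answer, control_guess, unknown_id, unknown_guess):
        # The user is judged only on the word whose answer is already known.
        if control_guess.strip().lower() != control_answer.lower():
            return False
        # A user who passed is assumed to have read the unknown word correctly,
        # so the reading is recorded as one vote.
        guess = unknown_guess.strip().lower()
        self.votes[unknown_id][guess] += 1
        word, count = self.votes[unknown_id].most_common(1)[0]
        if count >= self.votes_needed:
            self.accepted[unknown_id] = word
        return True
```

Showing the same unknown word to several users and waiting for their readings to agree corresponds to the last step in the paragraph above, where the new image is given to a number of other people.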

BREAKING OF CAPTCHAS
Two methods have been used so far to break CAPTCHAs: one
uses decoding software that removes the noise, and the
other uses humans.

1. Some text-based CAPTCHAs have been broken by software
that works in three stages:
Preprocessing: Removal of background clutter and noise
Segmentation: Splitting the image into regions which each
contain a single character
Classification: Identifying the character in each region

2. Other CAPTCHAs can be broken by streaming the tests to
unsuspecting users to solve.
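
The three stages of the software approach can be illustrated on a toy example in Python. This is a deliberately simplified sketch: the "image" is a small grid of `#` and `.` characters, noise is assumed to consist of isolated pixels, characters are assumed to be separated by empty columns, and classification is exact-size template matching. All function names and the two templates are made up for the illustration; real attacks use far more robust image processing.

```python
def preprocess(grid):
    """Preprocessing: drop isolated '#' pixels that have no '#' among their 8 neighbours."""
    h, w = len(grid), len(grid[0])

    def has_neighbour(r, c):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < h and 0 <= cc < w and grid[rr][cc] == "#":
                    return True
        return False

    return ["".join("#" if grid[r][c] == "#" and has_neighbour(r, c) else "."
                    for c in range(w)) for r in range(h)]


def segment(grid):
    """Segmentation: cut the grid into regions at columns that contain no '#'."""
    w = len(grid[0])
    regions, start = [], None
    for c in range(w + 1):
        filled = c < w and any(row[c] == "#" for row in grid)
        if filled and start is None:
            start = c
        elif not filled and start is not None:
            regions.append([row[start:c] for row in grid])
            start = None
    return regions


def classify(region, templates):
    """Classification: return the label of the same-size template with most matching pixels."""
    def score(label):
        t = templates[label]
        if len(t) != len(region) or len(t[0]) != len(region[0]):
            return -1
        return sum(a == b for tr, rr in zip(t, region) for a, b in zip(tr, rr))
    return max(templates, key=score)


TEMPLATES = {
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

# Toy image containing "LT" plus one isolated noise pixel in the last column.
IMAGE = [
    "#...###..",
    "#....#...",
    "#....#..#",
    "#....#...",
    "###..#...",
]

decoded = "".join(classify(r, TEMPLATES) for r in segment(preprocess(IMAGE)))
```

The arcs and overlapping characters used by the text-based schemes described earlier are aimed mainly at the segmentation stage, since a cut at empty columns fails as soon as characters touch or clutter bridges the gaps.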

New Proposed Approaches


This new approach is very similar to the PIX CAPTCHAs
discussed earlier. The steps followed are:
• Pick a concrete object
• Get 6 images at random from images.google.com that match
the object
• Distort the images
• Build a list of 100 words: 90 from a full dictionary, 10
from the objects dictionary
• Prompt the user to pick the object from the list of words

These steps can be implemented as follows:
• Make an HTTP call to images.google.com and search for the
object
• Screen scrape the result of 2-3 pages to get the list of
images
• Pick 6 images at random
• Randomly distort both the images and their URLs before
displaying them
• Expire the CAPTCHA in 30-45 seconds
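
The word-list and expiry steps above can be sketched in Python as follows. The sketch makes several assumptions that are not in the text: the image search and screen scraping are left out entirely, so the candidate image URLs are simply passed in as a list; the correct object is counted as one of the 10 words taken from the objects dictionary; and the function names are hypothetical.

```python
import random
import time


def build_word_list(rng, full_dictionary, objects_dictionary, answer):
    """Build 100 words: 90 from the full dictionary, 10 from the objects dictionary."""
    # Assumption: the correct object is one of the 10 object words.
    others = [w for w in objects_dictionary if w != answer]
    object_words = rng.sample(others, 9) + [answer]
    # Assumption: filler words are chosen so that they are not object names.
    fillers = [w for w in full_dictionary if w not in objects_dictionary]
    words = rng.sample(fillers, 90) + object_words
    rng.shuffle(words)
    return words


def make_image_challenge(rng, answer, candidate_urls,
                         full_dictionary, objects_dictionary, now=None):
    """Pick 6 of the candidate images and attach a word list and an expiry time."""
    now = time.time() if now is None else now
    return {
        "answer": answer,
        "images": rng.sample(candidate_urls, 6),
        "words": build_word_list(rng, full_dictionary, objects_dictionary, answer),
        "expires_at": now + rng.uniform(30, 45),  # expire in 30-45 seconds
    }


def grade_image_challenge(challenge, picked, now=None):
    """A response counts only if it is correct and arrives before expiry."""
    now = time.time() if now is None else now
    return now <= challenge["expires_at"] and picked == challenge["answer"]
```

Passing `now` explicitly makes the expiry rule easy to test without waiting; in normal use it is left out and the current time is used.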

Benefits of this approach

• The database already exists and is public
• The database is constantly being updated and maintained
• Adding "concrete objects" to the dictionary is virtually
instantaneous
• Distortion prevents caching hacks
• Quick expiration limits streaming hacks

Drawbacks of this approach:

• Not accessible to people with disabilities (as is the case
with most CAPTCHAs)
• Relies on Google's infrastructure
• Unlike CAPTCHAs using random letters and numbers, the
number of challenge words is limited

Conclusion
1. A CAPTCHA is any software that distinguishes humans from
machines.
2. Research in CAPTCHAs implies advancement in AI, making
computers understand how humans think.
3. Internet companies make billions of dollars every year;
their security and service quality matter, and so does the
advancement of CAPTCHA technology.
4. Different CAPTCHA methods are being studied, and new
ideas like ReCAPTCHA, which puts the human time spent on
the Internet to good use, are remarkable.

Bibliography

[i] www.phpcaptcha.org
[ii] www.captcha.net
[iii] www.wikipedia.com
[iv] Research papers by Luis von Ahn (Carnegie Mellon
University).
