Infosys
1. About Project:
This project analyzes bank statements and transactions. We will develop a model that analyzes
our transactions and shows where our money goes: where we spend more or less, how much, and
how frequently. It also categorizes transactions into groups such as entertainment, grocery,
and transport. The project involves API integration, accurate OCR data extraction,
comprehensive salary and expense analysis, and full deployment and integration with existing
financial systems.
It is best suited to large companies that have many bank statements to categorize, giving
them a clear picture of where and how their money is being spent.
2. Fetching Data:
To supply test data to the model we can use Kaggle or other sites that provide data sets, or
we can create our own synthetic (dummy) data.
DAY-2
1. Authentication and Authorization in APIs
Authentication and authorization are the two access-control steps that APIs use to protect
their resources.
Authentication:
This is the process of verifying who someone is. When we log in to a website, for example,
we typically enter a username and password. The system checks these credentials against its
database to confirm our identity.
Authorization:
Once our identity is verified, the system needs to know what we're allowed to do.
Authorization determines our access level and permissions. For example, after logging in to a
website (authenticated), we may be allowed to view our profile (authorized) but not change
system settings if we're not an admin. In essence, authorization answers the question, "What
can you do?"
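2. Multi-factor authentication (MFA):
MFA requires a user to pass two or more factors to gain access to a service or network. MFA
methods include one-time passwords (OTPs) generated on a smartphone, fingerprints, voice
biometrics, and facial recognition. CAPTCHA tests are often shown alongside these, although
they check that the user is human rather than who the user is.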
3. What is a token?
A token is a temporary key that lets a user access protected resources on a service without
re-entering their credentials on every request.
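The idea of a temporary key can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions: the in-memory `TOKENS` store, the 15-minute lifetime, and the function names `issue_token` and `validate_token` are hypothetical and not part of the project code.

```python
import secrets
import time

# Hypothetical in-memory token store: token -> (username, expiry timestamp).
TOKENS = {}
TOKEN_LIFETIME_SECONDS = 900  # assumed 15-minute lifetime


def issue_token(username, now=None):
    """Create a random token for a user who has already logged in."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    TOKENS[token] = (username, now + TOKEN_LIFETIME_SECONDS)
    return token


def validate_token(token, now=None):
    """Return the username for a valid token, or None if unknown or expired."""
    now = time.time() if now is None else now
    entry = TOKENS.get(token)
    if entry is None:
        return None
    username, expires_at = entry
    if now >= expires_at:
        del TOKENS[token]  # expired tokens are discarded
        return None
    return username
```

After login the client would send the token with each request instead of the username and password, and the server would call `validate_token` to decide whether the request may proceed.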
DAY-3
1. About HTTP:
Hypertext Transfer Protocol (HTTP) is a set of rules that govern how data is transmitted over
the web. It defines how messages are formatted and transmitted, and how web servers and
browsers should respond to various commands. It is a client-server protocol.
Headers: These provide additional information (metadata) about the request, such as the type
of data being sent or accepted, authentication credentials, and more.
Body: This is optional and is typically used with methods like POST and PUT to carry the data
sent to the server.
Query Parameter: A set of key-value pairs attached to the end of a URL, after a "?", that
gives the web server additional information about the request.
Content-Type:
It describes the media type and format of the content in the request body.
Authorization:
It is used to send the client’s credentials to the server when the client is attempting to
access a protected resource.
Accept:
It defines the media types that are accepted by the client from the server.
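The parts of a request described above can be seen together by building one with Python's standard library. This is a sketch only: the URL, the token, and the body fields are hypothetical, and the request is constructed but never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and token, used only for illustration.
base_url = "https://api.example.com/transactions"
query = urlencode({"category": "grocery", "limit": 10})  # query parameters
body = json.dumps({"amount": 250.0, "note": "weekly shopping"}).encode("utf-8")

request = Request(
    url=f"{base_url}?{query}",
    data=body,  # request body
    method="POST",
    headers={
        "Content-Type": "application/json",  # format of the body we send
        "Accept": "application/json",  # format we accept in the response
        "Authorization": "Bearer example-token",  # client credentials
    },
)

print(request.get_method())
print(request.full_url)
# urllib stores header names in capitalized form, hence "Content-type".
print(request.get_header("Content-type"))
```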
5. HTTP Response:
An HTTP response is a message sent by a server back to a client (like a web browser) after
receiving an HTTP request. It contains the information requested by the client or the status of
the request. Here's a breakdown of its main components:
Status Line: This includes the HTTP version, status code, and a reason phrase.
Headers:
These provide additional information about the response, such as content type, date, server
details, and more.
Body:
This contains the actual data being returned, such as the HTML of a web page, an image, or
other resource. The body is optional and often used for successful responses (like 200 OK).
Status Code:
1xx (Informational):
100 Continue: The server received the initial part of the request and the client should
continue.
2xx (Success):
201 Created: The request was successful and a new resource was created.
3xx (Redirection):
301 Moved Permanently: The resource has been moved to a new URL permanently.
4xx (Client Error):
401 Unauthorized: Authentication is required and has failed or not been provided.
5xx (Server Error):
503 Service Unavailable: The server is currently unable to handle the request due to
maintenance or overload.
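The first digit of a status code decides its class. A small sketch using Python's built-in `http.HTTPStatus` shows this for the codes listed above; the helper name `status_class` is our own.

```python
from http import HTTPStatus


def status_class(code):
    """Map a status code to its class based on the first digit."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    return classes.get(code // 100, "Unknown")


for code in (100, 201, 301, 401, 503):
    status = HTTPStatus(code)
    print(code, status.phrase, "->", status_class(code))
```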
JSON:
Widely used due to its readability and ease of use with JavaScript.
Advantages:
Simplicity: JSON is lightweight, easy to use, and has minimal syntax.
Fast data transfer: JSON's compact size makes it well suited to scenarios where transfer and
parsing speed are important.
Disadvantages:
Limited data types: JSON has no built-in types for values such as dates or binary data.
XML:
More verbose than JSON, but flexible and widely used in applications.
Advantages:
XML is easy to read and write, making it accessible for anyone to understand.
Disadvantages:
XML syntax is redundant and large relative to a binary representation of the same data.
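To compare the two formats, the same record can be written both ways with Python's standard library. The transaction record here is a made-up example, not project data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical transaction record used only for illustration.
transaction = {"id": 1, "category": "grocery", "amount": 250.0}

json_text = json.dumps(transaction)

root = ET.Element("transaction")
for key, value in transaction.items():
    child = ET.SubElement(root, key)
    child.text = str(value)
xml_text = ET.tostring(root, encoding="unicode")

print(json_text)
print(xml_text)
print(len(json_text), len(xml_text))  # the XML form is the longer one

# JSON keeps number types; XML element text must be converted explicitly.
parsed_json = json.loads(json_text)
parsed_xml = ET.fromstring(xml_text)
amount_from_xml = float(parsed_xml.find("amount").text)
```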
DAY-4:
Flask:
It is a lightweight WSGI web application framework in Python. It's designed to be easy to get
started with, but also powerful enough to support complex applications. Flask provides the
tools, libraries, and technologies needed to build a web application.
Jsonify:
It converts Python dictionaries into JSON responses and sets the response's Content-Type to
application/json.
Code:
Output:
Home page
/hello:
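A minimal Flask app with a home page and a /hello route could look like the sketch below. The response text and the JSON message are assumptions, since the original listing is not shown.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/")
def home():
    # Plain-text response for the home page.
    return "Home page"


@app.route("/hello")
def hello():
    # jsonify serializes the dictionary and sets Content-Type: application/json.
    return jsonify({"message": "Hello, World!"})
```

Calling `app.run(debug=True)` or running `flask run` starts the development server, after which both routes can be opened in a browser.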
DAY-5:
API keys are used to authenticate clients and protect data from unauthorized users. A client
can access the data only when it presents the correct key.
Code:
Output:
Before accessing secure data
/secure-data
Note: add the API key to the extension so that it is sent as a request header.
A client must include an x-api-key header in the HTTP request with the value
"mysecretapikey" (API_KEY).
If the key is missing or incorrect, the client gets an HTTP 401 Unauthorized error.
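The check described above can be sketched in Flask as follows. The route name, header name, and key value come from the notes; the response bodies are assumptions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
API_KEY = "mysecretapikey"  # value quoted in the notes; load from config in practice


@app.route("/secure-data")
def secure_data():
    # Reject the request unless the x-api-key header matches the expected key.
    if request.headers.get("x-api-key") != API_KEY:
        return jsonify({"error": "Unauthorized"}), 401
    return jsonify({"data": "This is secure data"})
```

A request without the header, or with a wrong value, receives 401; a request with the correct header receives the JSON data.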
TASK:
Get data from a weather app.
Code:
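One possible shape for this task, using only the standard library, is sketched below. The endpoint `https://api.example.com/weather`, the query parameter names, and the response fields (`city`, `temp_c`, `condition`) are all hypothetical; a real weather service defines its own URL and JSON layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and response fields; a real weather service will differ.
WEATHER_URL = "https://api.example.com/weather"


def build_weather_url(city, api_key):
    """Attach the city and API key as query parameters."""
    return f"{WEATHER_URL}?{urlencode({'q': city, 'key': api_key})}"


def summarize(payload):
    """Pick the fields of interest out of the decoded JSON payload."""
    return f"{payload['city']}: {payload['temp_c']} C, {payload['condition']}"


def fetch_weather(city, api_key):
    """Call the service and return a one-line summary (needs network access)."""
    with urlopen(build_weather_url(city, api_key), timeout=10) as response:
        return summarize(json.load(response))
```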
DAY-6
API limit:
Flask-Limiter: It is used for rate limiting, i.e. a client can send only a specified number
of requests within a given time window. Once the limit is reached, the client has to wait
until it is allowed to send requests again.
get_remote_address: A utility function provided by Flask-Limiter to determine the IP address
of the client making the request. This is used to apply rate limiting on a per-client basis.
Code:
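The idea behind per-client rate limiting can be shown without any library. The sketch below is plain Python and is not Flask-Limiter itself: the class name `FixedWindowLimiter` and the fixed-window strategy are our own choices for illustration.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        # client key -> [window start time, requests seen in that window]
        self.counters = defaultdict(lambda: [0.0, 0])

    def allow(self, client_ip, now=None):
        now = time.time() if now is None else now
        state = self.counters[client_ip]
        if now - state[0] >= self.window:
            state[0], state[1] = now, 0  # start a new window
        if state[1] >= self.limit:
            return False  # caller would respond with 429 Too Many Requests
        state[1] += 1
        return True
```

The client's IP address plays the role that get_remote_address plays in Flask-Limiter: it is the key under which requests are counted.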
Error Handling:
Error handling is the process of responding to errors when they occur. Instead of exposing
the raw error from the code to the user, we return only the status code with a short message.
This keeps internal details away from unauthorized users such as attackers and reduces the
risk of exposing vulnerabilities.
Code:
Output:
Accessing a non-existent route (e.g., /non_existent) triggers the 404 Error Handler.
The user's activity is logged in the api.log file.
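A sketch of such handlers in Flask is given below. The api.log file name comes from the notes; the JSON messages and the logging setup are assumptions.

```python
import logging

from flask import Flask, jsonify

app = Flask(__name__)

# Record activity and errors in api.log, the file named in the notes.
logging.basicConfig(filename="api.log", level=logging.INFO)


@app.errorhandler(404)
def not_found(error):
    # Log the details server-side; send only a status code and a short message.
    app.logger.warning("404 Not Found: %s", error)
    return jsonify({"error": "Resource not found"}), 404


@app.errorhandler(500)
def internal_error(error):
    app.logger.error("500 Internal Server Error: %s", error)
    return jsonify({"error": "Internal server error"}), 500
```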
Version:
An application can exist in several versions. Suppose there are two versions:
Version 1
Version 2
The developer can let users choose the version they prefer, or can make one version the
default.
Code:
Output:
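One common way to offer two versions side by side is to put the version in the URL path. The sketch below assumes that approach; the /greeting route and its responses are hypothetical.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/v1/greeting")
def greeting_v1():
    return jsonify({"version": 1, "message": "Hello"})


@app.route("/v2/greeting")
def greeting_v2():
    return jsonify({"version": 2, "message": "Hello", "language": "en"})


@app.route("/greeting")
def greeting_default():
    # Requests without a version prefix fall back to the chosen default (v2 here).
    return greeting_v2()
```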
Customer authentication:
Accounts:
Access Token:
Balance check:
Creating sandbox account:
Because the token is valid for a long duration, an attacker who steals it can generate
multiple new tokens, steal information, and even change the owner details.
By generating new tokens the attacker gains long-term access and can perform unauthorized
activities such as accessing personal information, selling it, or otherwise misusing it.
To avoid this problem we can:
Use HTTPS so that tokens are encrypted in transit and cannot be read by unauthorized users.
Keep token lifetimes short, so that even if an attacker tries to decrypt the traffic, the
token will have expired by the time they succeed.
Use second-factor authentication.
DAY-9:
What Is Data?
Data is a collection of raw facts and figures that can be processed and analysed to produce
information.
Data is of 3 types:
Structured Data:
As the name says, the data is organized in rows and columns. It is neat, easy to understand,
and well suited to querying.
Ex: Databases, spreadsheets, etc.
Semi-structured Data:
A combination of structured and unstructured data, mainly stored in JSON or XML files.
Records may or may not contain additional fields, so it is less suitable for querying.
Ex: JSON, XML.
Unstructured Data:
Data with no specific rules or structure, such as text documents, images, videos, and audio
files. It is difficult for a computer to store it in a structured manner.
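The difference between the three types shows up in how we read a value out of each. The sample data in this sketch is made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, easy to filter row by row.
table = io.StringIO(
    "date,category,amount\n2024-01-05,grocery,250\n2024-01-06,transport,40\n"
)
rows = list(csv.DictReader(table))
grocery_total = sum(float(r["amount"]) for r in rows if r["category"] == "grocery")

# Semi-structured: keys give some structure, but fields can be nested or missing.
record = json.loads('{"id": 7, "merchant": {"name": "Metro"}, "tags": ["travel"]}')
merchant_name = record.get("merchant", {}).get("name")
note = record.get("note")  # optional field, absent here

# Unstructured: free text with no schema; values must be searched for.
statement_line = "Paid 250 rupees at the supermarket on 5 January"
mentions_amount = "250" in statement_line
```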
DAY 10:
Google Log in Code Discussion.
DAY 11:
OCR (Optical Character Recognition) recognizes text within an image and translates it into
machine-readable text, making it easier to edit, search, or store electronically.
For example, in our case, when we give an image of a bank payslip to an OCR model, it
analyzes the pixels of the image, locates the text, and examines character shapes, fonts,
etc. to produce the final output: the information contained in that image.
ORIGIN OF OCR:
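Omni-font OCR, which can recognize text in virtually any font, was introduced by Ray Kurzweil
in the 1970s (earlier OCR devices handled only specific typefaces). He developed the Kurzweil
Reading Machine, built on the first system capable of recognizing text in multiple fonts. It
was designed to assist visually impaired people by converting printed text into speech.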
The Kurzweil Reading Machine worked by using the following steps:
1. Scanning: A flatbed scanner captured an image of the printed text.
2. Character Recognition: The machine applied pattern recognition algorithms to
identify individual characters and words.
3. Conversion: The recognized text was then converted into machine-readable text.
4. Speech Output: The machine used a speech synthesizer to read the text aloud to the
user.
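The character recognition step can be illustrated with a toy template matcher. This is not the algorithm of the Kurzweil Reading Machine or of any real OCR engine: the 3x5 bitmaps and the pixel-difference measure are assumptions chosen to keep the sketch short.

```python
# Hypothetical 3x5 bitmaps ("#" = ink) standing in for stored character templates.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def distance(glyph, template):
    """Count the pixels where the scanned glyph and the template disagree."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the character whose template is closest to the glyph."""
    return min(TEMPLATES, key=lambda char: distance(glyph, TEMPLATES[char]))


# A scanned "7" with one noisy pixel in the last row.
scanned = ["###", "..#", ".#.", ".#.", "##."]
print(recognize(scanned))
```

Even with one wrong pixel the glyph is still closest to the "7" template, which is the basic reason pattern matching tolerates small scanning defects.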