Notes For Fintech Assessment, Cheatsheet

Uploaded by

jamshaidtariq425

Python Cheat Sheet/ Essential Syntax & Vocab/ Terminologies For Exam Revision:

Python Cheat Sheet # 1 : (Essential Syntax)


Basic Input/Output:
o User input: name = input("Enter your name: ")
o Printing: print("Hello, world!")
o Comments: # This is a comment

Data Types & Initialization:


o Numbers: int, float: num = 1, num = 0.1
o Strings: Single ('str') or double quotes ("str"): Name = "abc"
o Booleans: True, False: Status = True
o Lists: [], mutable, my_list = [1, "hello", True]
o Tuples: (), immutable, my_tuple = (1, 2, 3)
o Sets: {...}, unordered, unique elements, my_set = {1, "a", 2.5} (an empty set is set(), because {} creates a dictionary; 1 and True count as the same element)
o Dictionaries: {}, key-value pairs, my_dict = {"name": "Bob", "age": 30}
Variables & Operators:
o Variable assignment: name = "Alice"
o Arithmetic: +, -, *, /, // (floor division), % (modulo), ** (exponent)
o Comparison: ==, !=, <, >, <=, >=
o Logical: and, or, not
o Membership: in, not in
https://www.w3schools.com/python/python_operators.asp
Control Flow:
o if, elif, else statements (conditional execution)
o for loops (iterate over sequences): for i in range(5): print(i)
- A for loop is used when the number of iterations is known in advance, or when every
item of a sequence must be visited. With range(), the end point must be given (the end
value itself is excluded).
o while loops (repeat code while a condition is true)
- A while loop executes a block of statements repeatedly for as long as a given
condition holds. It is used when the number of iterations is not known in advance.
To control loop flow:
o break Immediately terminates the loop, regardless of any remaining iterations.
o continue Skips the remaining code in the current iteration and jumps to the beginning
of the next iteration.
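A minimal sketch of the loop controls above. The list of amounts and the starting balance are made-up values for illustration only.

```python
# Hypothetical data: a list of transaction amounts (illustrative only).
amounts = [120, -5, 300, 0, 80]

valid = []
for amount in amounts:
    if amount < 0:
        continue  # skip this item and move on to the next one
    if amount == 0:
        break     # stop the loop entirely at the first zero
    valid.append(amount)

# while loop: repeat for as long as the condition is true
balance = 1000
withdrawals = 0
while balance >= 300:
    balance -= 300
    withdrawals += 1

print(valid)                 # [120, 300]
print(balance, withdrawals)  # 100 3
```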
Functions:
o Defining: def my_function(arg1, arg2): ...
o Calling: result = my_function(value1, value2)
o Returning a value: return result
Data Structures & Advanced Features:
o List Comprehensions: Concisely create lists based on other sequences
o Tuple Unpacking: Assign multiple values from a tuple to variables
o Dictionary Methods: keys(), values(), items(), get(key), update(dict)
o String Methods: upper(), lower(), split(), join(), find()
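The features listed above can be sketched as follows. All names and values (ticker, account, and so on) are invented for the example.

```python
# List comprehension: squares of the even numbers in 0..9
even_squares = [n * n for n in range(10) if n % 2 == 0]

# Tuple unpacking: each element goes to its own variable
ticker, price = ("ABC", 12.5)  # hypothetical ticker and price

# Dictionary methods
account = {"name": "Bob", "age": 30}
nickname = account.get("nickname", "n/a")  # default when the key is missing
account.update({"age": 31})                # overwrite or add keys
pairs = list(account.items())

# String methods
words = "open banking api".split()
title = " ".join(word.upper() for word in words)
position = "fintech".find("tech")  # index of the first match, or -1 if absent

print(even_squares)  # [0, 4, 16, 36, 64]
print(pairs)         # [('name', 'Bob'), ('age', 31)]
print(title)         # OPEN BANKING API
print(position)      # 3
```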
Files & Exceptions:
o Open files for reading/writing: f = open("file.txt", "r")
o Exception handling with try, except, finally blocks
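A short sketch of file handling and exception handling. It uses a with block (which closes the file automatically) and a temporary directory so that no real file is touched; the file name notes.txt is only an example.

```python
import os
import tempfile

# Write and read a small file inside a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")  # hypothetical file name

    with open(path, "w") as f:   # "w" creates or overwrites the file
        f.write("line one\nline two\n")

    with open(path, "r") as f:   # the file is closed when the block ends
        lines = f.read().splitlines()

# Exception handling: try / except / finally
messages = []
try:
    value = int("12a")           # raises ValueError
except ValueError:
    value = None
    messages.append("not a number")
finally:
    messages.append("done")      # runs whether or not an error occurred

print(lines)     # ['line one', 'line two']
print(messages)  # ['not a number', 'done']
```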
Data Analysis & Libraries:
o NumPy: Efficient numerical computations with arrays
o Pandas: Data analysis and manipulation with DataFrames
o Matplotlib: Creating visualisations (plots, charts, graphs)
o Importing data: df = pd.read_csv("data.csv")
o Selecting data: df[column_name], df.iloc[row, col]
o Data operations: filtering, sorting, grouping, aggregating
o Visualization: bar plots, line plots, histograms

Python Cheat Sheet # 2 : (Essential Syntax)

Data Types and Conversions:


o Check data types: type()
o Convert strings to numbers: float(), int()
o Convert numbers to strings: str()
o Convert values to bool: bool() (any non-empty string is True, so bool("False") is True)
o Format numbers as strings: f-strings (e.g., f"{value:.2f}"), format() method
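A small sketch of the conversions above, with made-up values. The explicit string comparison for flag_c is just one possible way to parse a true/false string, not a built-in feature.

```python
price_text = "19.989"
price = float(price_text)        # string -> float
count = int("42")                # string -> int
label = str(count)               # int -> string

# bool() on a string only checks whether the string is empty
flag_a = bool("False")           # True, because the string is non-empty
flag_b = bool("")                # False
# One possible way to parse a true/false string explicitly
flag_c = "False".strip().lower() == "true"

formatted = f"{price:.2f}"       # f-string with two decimal places
same = "{:.2f}".format(price)    # format() method equivalent

print(type(price), type(count), type(label))
print(flag_a, flag_b, flag_c)    # True False False
print(formatted, same)           # 19.99 19.99
```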
Lists Functions:
o Access elements: list[index]
o Add elements: list.append(element)
o Maximum & minimum: max(list), min(list)
o Remove a specific element (the first match is found and removed): list.remove(element)
o Combine lists: list1 + list2, list.extend(list2)
o Sort lists: list.sort() # changes the original list, sorted(list) # returns a new list, original unchanged
o Reverse lists: list[::-1]
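The difference between operations that change a list in place and those that return a new list is easy to miss, so here is a brief sketch with made-up scores.

```python
scores = [70, 95, 70, 82]

scores.append(60)             # add to the end
scores.remove(70)             # removes only the first 70
highest, lowest = max(scores), min(scores)

ordered = sorted(scores)      # new sorted list; scores is unchanged
reversed_copy = scores[::-1]  # new list in reverse order
scores.sort(reverse=True)     # sorts scores in place, largest first

combined = [1, 2] + [3]       # + builds a new list
extra = [1, 2]
extra.extend([3, 4])          # extend() modifies extra in place

print(ordered)        # [60, 70, 82, 95]
print(reversed_copy)  # [60, 82, 70, 95]
print(scores)         # [95, 82, 70, 60]
```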
Dictionaries Functions:
o Create dictionaries: dict()
o Access values: dict[key]
o Add key-value pairs: dict[new_key] = new_value
o Check (Find) for keys: key in dict # returns True or False
o Get all keys: dict.keys()
o Get all values: dict.values()
Strings Functions:
o Access characters: string[index]
o Slice substrings: string[start:end]
o Reverse strings: string[::-1]
o Check (find) for substrings: substring in string
o Split strings: string.split()
o Strip whitespace: string.strip(), string.lstrip(), string.rstrip()
o Replace characters: string.replace(old, new)
Loops and Iteration:
o Iterate over lists: for element in list:
o Iterate over dictionaries: for key in dict:
o Iterate over string characters: for char in string:
o Range-based loop: for i in range(start, end, step):
Booleans and Conditions:
o Boolean values: True, False
o Comparisons: ==, !=, <, >, <=, >=
o Logical operators: and, or, not
o Conditional statements: if, elif, else
Functions:
o Define functions: def function_name(parameters):
o Call functions: function_name(arguments)
o Return values: return value
File Operations:
o Check file existence: os.path.exists(filename)
o Open files: open(filename, mode)
o Read file contents: file.read()
o Write to files: file.write(text)
o Close files: file.close()

Additional Concepts:
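o Comments: # single-line comment; a triple-quoted string """ ... """ is often used as a multi-line comment or docstring
o Calculate averages: statistics.mean(values), or df[column_name].mean() in pandas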
o Find maximum/minimum values: max(), min()
o Count element occurrences: collections.Counter()
o Remove duplicates: set()
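A minimal sketch of the helpers above using only the standard library. The list of values is invented for illustration.

```python
from collections import Counter
from statistics import mean

values = [2, 4, 4, 6, 9]

average = mean(values)                      # arithmetic mean
counts = Counter(values)                    # value -> number of occurrences
most_common_value, times = counts.most_common(1)[0]
unique_sorted = sorted(set(values))         # set() drops duplicates; sorted() restores an order

print(average)                   # 5
print(most_common_value, times)  # 4 2
print(unique_sorted)             # [2, 4, 6, 9]
print(max(values), min(values))  # 9 2
```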

Remember:
· Practice writing and running Python code regularly.
· Use online resources and tutorials for more in-depth learning.
· Experiment with different libraries for various tasks.
· Don't hesitate to ask questions and seek help from communities or mentors.
· Good luck with your exam!

Cheat Sheet # 3: PANDAS.


Pandas Cheat Sheet for Data Wrangling and Analysis

Data Structures:
· DataFrame: Two-dimensional table of data with rows and columns.
· Series: One-dimensional ordered array of data with an index.

Data Loading and Formatting:


· pd.read_csv(filename): Reads a CSV file into a DataFrame.
· df.head(n): Shows the first n rows of a DataFrame.
· df.tail(n): Shows the last n rows of a DataFrame.
· df.info(): Displays information about the DataFrame, including data types and memory usage.
· df.describe(): Summarizes the numerical columns of a DataFrame.
· df.dtypes: Shows data types of all columns.
· df.astype(dtype): Converts column(s) to a specific data type.

Selection and Indexing:


· df[column_name]: Selects a specific column.
· df[condition]: Selects rows based on a condition.
· df.loc[indexes, columns]: Selects specific rows and columns by index.
· df.iloc[row_index, column_index]: Selects specific rows and columns by integer index.
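The four selection styles above, sketched on a tiny DataFrame. The column names, index labels and values are made up, and pandas is assumed to be installed.

```python
import pandas as pd

# Small example DataFrame with string index labels (illustrative data)
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]},
    index=["a", "b", "c"],
)

ages = df["age"]               # one column, returned as a Series
over_28 = df[df["age"] > 28]   # rows where the condition is True
bob_age = df.loc["b", "age"]   # by index label and column name
first_name = df.iloc[0, 0]     # by integer row and column position

print(over_28)
print(bob_age, first_name)     # 30 Alice
```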

Data Manipulation:
· df.drop(columns=column_name, inplace=True): Drops a column (without columns= or axis=1, drop() looks for a row label).
· df.dropna(): Drops rows with missing values.
· df.fillna(value): Fills missing values with a specific value.
· df.sort_values(by=column_name): Sorts the DataFrame by a column.
· df.groupby(column_name): Groups the DataFrame by a column.
· df.agg(function): Applies an aggregation function to each column.
· df.apply(function): Applies a function along an axis (to each column by default, to each row with axis=1).
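A hedged sketch of how several of these methods can be chained. The column names (category, value, note) and the fill value 0.0 are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "B", "A", "B"],
    "value": [10.0, None, 30.0, 40.0],
    "note": ["x", "y", "z", "w"],
})

# Each step returns a new DataFrame, so df itself is left unchanged
cleaned = (
    df.drop(columns="note")        # drop a column by name
      .fillna({"value": 0.0})      # fill missing entries in 'value'
      .sort_values(by="value")     # sort rows by a column
)

# Group by category, then aggregate the 'value' column
totals = cleaned.groupby("category")["value"].agg(["sum", "mean"])
print(totals)
```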

Data Conversion in Pandas:


· pd.to_datetime(series/object): Converts a Series or object to datetime format.
· pd.to_numeric(series/object): Converts a Series or object to numeric format.
· pd.to_timedelta(series/object): Converts a Series or object to timedelta format.
· df[column_name] = df[column_name].astype(dtype): Converts a specific column to a desired
data type.
· df.apply(pd.to_numeric, errors='coerce'): Attempts to convert all columns to numeric; values that
cannot be converted (e.g., non-numeric text) become NaN.
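A brief sketch of errors='coerce' on a single Series, with made-up input strings. Filling the missing value with 0 before astype(int) is just one possible choice.

```python
import pandas as pd

raw = pd.Series(["10", "2.5", "n/a"])

# "n/a" cannot be parsed, so it becomes NaN instead of raising an error
numbers = pd.to_numeric(raw, errors="coerce")

# NaN cannot be stored in an integer column, so fill it first;
# astype(int) then truncates 2.5 to 2
whole = numbers.fillna(0).astype(int)

print(numbers.isna().tolist())  # [False, False, True]
print(whole.tolist())           # [10, 2, 0]
```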

String Manipulation in Pandas:


· df[column_name].str.strip(): Removes leading and trailing whitespaces from a string column.
· df[column_name].str.upper(): Converts all characters in a string column to uppercase.
· df[column_name].str.lower(): Converts all characters in a string column to lowercase.
· df[column_name].str.split(): Splits each string in a column into a list on a delimiter (whitespace by default).
· df[column_name].str.replace(pattern, replacement): Replaces all occurrences of a pattern in a
string column with another string.
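The string methods above can be chained through the .str accessor. This is a small sketch on invented names; regex=False is passed so the pattern is treated as plain text.

```python
import pandas as pd

names = pd.Series(["  alice smith ", "BOB JONES"])

clean = names.str.strip().str.lower()               # trim, then lowercase
parts = clean.str.split()                           # list of words per row
snake = clean.str.replace(" ", "_", regex=False)    # literal replacement

print(clean.tolist())  # ['alice smith', 'bob jones']
print(snake.tolist())  # ['alice_smith', 'bob_jones']
```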

Date and Time Functions In Pandas:


· df['date_column'].dt.year: Extract the year from a datetime column.
· df['date_column'].dt.month: Extract the month from a datetime column.
· df['date_column'].dt.day: Extract the day from a datetime column.
· df['date_column'].dt.hour: Extract the hour from a datetime column.
· df['date_column'].dt.is_leap_year: Check if a year in a datetime column is a leap year.
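The .dt accessor only works on datetime columns, so the sketch below converts the text first with pd.to_datetime. The two timestamps are made up.

```python
import pandas as pd

df = pd.DataFrame({"date_column": ["2024-02-29 14:30", "2023-07-01 09:00"]})
df["date_column"] = pd.to_datetime(df["date_column"])  # text -> datetime

df["year"] = df["date_column"].dt.year
df["month"] = df["date_column"].dt.month
df["hour"] = df["date_column"].dt.hour
df["leap"] = df["date_column"].dt.is_leap_year

print(df[["year", "month", "hour", "leap"]])
```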

Merging and Joining:


· pd.merge(df1, df2, on=column_name): Merges two DataFrames based on a common column.
· df.join(df2, how='inner', on=column_name): Joins two DataFrames by matching column_name in df against the index of df2.

Statistical Analysis:
· df.mean(): Calculates the mean of each column.
· df.median(): Calculates the median of each column.
· df.std(): Calculates the standard deviation of each column.
· df.corr(): Calculates the correlation between columns.
· df[column_name].value_counts(): Counts the occurrences of each unique value in a column.

Visualization:
· df.plot(): Creates a basic plot of a DataFrame.
· import matplotlib.pyplot as plt: For advanced plot customization.

Additional Tips:
· Use .copy() to avoid modifying the original DataFrame while manipulating data.
· Chain various methods together for efficient data cleaning and transformation.
· Explore pd.Categorical for working with categorical data.

I. Joining and Merging with DataFrames (using pandas)


Key Functions:
o merge(): Merges DataFrames based on common columns or indices.
merged_df = df1.merge(df2, on="common_column", how="inner")
o join(): Similar to merge(), but matches on the index by default (a left join unless how is given).
merged_df = df1.join(df2, how="left")
o concat(): Concatenates DataFrames along rows or columns.
merged_df = pd.concat([df1, df2], axis=0, join="inner")

Specifying Join Type:


how="inner": Inner join (the default for merge(); join() defaults to how="left").
how="outer": Outer join.
how="left": Left join.
how="right": Right join.

Example (inner join using merge()):


import pandas as pd
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Age": [30, 35, 40]})
merged_df = df1.merge(df2, on="ID", how="inner")
print(merged_df)
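To compare the four join types side by side, the same two DataFrames can be merged with each how value. This is a sketch; only the ID column is printed to keep the output short.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Age": [30, 35, 40]})

ids_by_join = {}
for how in ["inner", "outer", "left", "right"]:
    merged = df1.merge(df2, on="ID", how=how)
    ids_by_join[how] = merged["ID"].tolist()
    print(how, ids_by_join[how])

# inner [2, 3]
# outer [1, 2, 3, 4]
# left [1, 2, 3]
# right [2, 3, 4]
```

Rows without a match on the other side get NaN in the columns that come from that side (for example, Age for ID 1 in the left join).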

II. Joining and Merging without DataFrames


Concepts:
o Joining: Combining data based on shared values.
o Merging: Similar to joining, often used with more structured data.

Common types of joins:


o Inner join: Keeps only rows with matching values in both datasets.
o Outer join: Keeps all rows from both datasets, filling missing values with NaN.
o Left join: Keeps all rows from the left dataset, filling missing values from the right with NaN.
o Right join: Keeps all rows from the right dataset, filling missing values from the left with NaN.
Key Functions:
o zip(): Pairs elements from two iterables based on their index.
paired_data = zip(data1, data2)
o dict(): Creates dictionaries to associate values based on keys.
combined_dict = dict(zip(keys, values))
o Loops: Iterate over data structures & manually combine elements based on conditions.
for i in range(len(data1)):
    if condition:
        combined_data.append((data1[i], data2[i]))

- Example (inner join using dictionaries):


names1 = ["Alice", "Bob", "Charlie"]
ages1 = [25, 30, 35]
names2 = ["Bob", "Charlie", "David"]
ages2 = [31, 34, 40]

combined_dict = {}
for name, age in zip(names1, ages1):
    if name in names2:
        combined_dict[name] = (age, ages2[names2.index(name)])

print(combined_dict) # Output: {'Bob': (30, 31), 'Charlie': (35, 34)}
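A left join can be sketched in the same style. Here a lookup dictionary replaces the repeated names2.index() search, and None stands in for the missing value; both choices are assumptions for illustration, not the only way to do it.

```python
names1 = ["Alice", "Bob", "Charlie"]
ages1 = [25, 30, 35]
names2 = ["Bob", "Charlie", "David"]
ages2 = [31, 34, 40]

# name -> age for the second dataset
right_lookup = dict(zip(names2, ages2))

# Left join: keep every name from the first dataset;
# dict.get() returns None when there is no match on the right
left_join = {
    name: (age, right_lookup.get(name))
    for name, age in zip(names1, ages1)
}

print(left_join)
# {'Alice': (25, None), 'Bob': (30, 31), 'Charlie': (35, 34)}
```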

Remember,

● Practice is key to solidifying these concepts!


● Choose appropriate functions and join types based on your data structures and desired
outcomes.
● Consider using DataFrames for more efficient and versatile merging operations.
● Explore additional pandas features for advanced merging and data manipulation.

Key Terminologies:
● Neobank: a fintech firm that operates only in the banking industry and provides digital banking
services without physical branches.
● Advanced persistent threat (APT) is a prolonged and targeted cyber attack in which an
intruder gains access to a network and remains undetected for an extended period. APT
attacks are initiated to steal highly sensitive data rather than cause damage to the target
organization's network.
● Dwell time represents the length of time a cyberattacker has free rein in an environment,
from the time they get in until they are eradicated. Dwell time is determined by adding mean
time to detect (MTTD) and mean time to repair/remediate (MTTR), and is usually measured in
days.
● Multi-vector attacks are sophisticated cyberattacks that use multiple methods to gain access
to an organization's systems. For example, an attacker may perform a Distributed Denial-Of-
Service (DDoS) attack using multiple techniques or types at once.
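● DoS - Denial of Service: an attack that makes a system or service unavailable to its users (not to be confused with DOS, Disk Operating System).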
● Blended Attacks: Combine multiple attack techniques to increase effectiveness.
● Hybrid Attacks: Merge different attack methods, often across physical and digital domains.
● Composite Attacks: Involve multiple stages or phases, each with distinct techniques.
● Distributed Attacks: Originate from multiple sources to overwhelm defenses and obscure
origins.
● Chained Attacks: Sequentially leverage multiple vulnerabilities to penetrate deeper into
systems.
● Polymorphic Attacks: Adapt and change form to evade detection and countermeasures.
Related Concepts:
● Advanced Persistent Threats (APTs): Highly sophisticated, long-term attacks often employing
multi-vector strategies.
● Zero-Day Attacks: Exploit previously unknown vulnerabilities, leaving defenders less time to
react.
● Supply Chain Attacks: Target third-party vendors or software supply chains to gain access to
target organizations.
● Insider Threats: Involve malicious actors within an organization, potentially using multiple
vectors to compromise systems.
● Cyber Kill Chain: A model for understanding and defending against multi-stage cyberattacks,
commonly used in military and intelligence contexts.
● MITRE ATT&CK Framework: A comprehensive knowledge base of adversary tactics and
techniques, used by security professionals for threat modeling and defense planning.
● Multi-Layered Security: A defensive approach emphasizing multiple layers of protection to
address diverse attack vectors.
● Threat Intelligence: Information about potential threats used to anticipate and counter attacks
proactively.
● Incident Response: The process of detecting, responding to, and recovering from
cyberattacks.
● Penetration Testing: Simulating attacks to identify vulnerabilities and test defenses.
● Vulnerability Management: Identifying, prioritizing, and remediating security vulnerabilities.
● Security Awareness Training: Educating users about cybersecurity risks and best practices.
● Targeted Attack: Explicitly focusing on a specific organization or individual.
● Cyber Espionage: Stealing intellectual property, trade secrets, or confidential data.
● Nation-State Attack: Sponsored or supported by a foreign government.
● Zero-Day Attack: Exploiting previously unknown vulnerabilities.
● Watering Hole Attack: Targeting specific websites or resources frequented by the target.
● Lateral Movement: Expanding access within a compromised network.
● Data Exfiltration: Stealing and transferring sensitive data out of the network.
● Command and Control (C&C) Server: Base of operations for an attacker to control
compromised systems.
● Botnet: Network of compromised devices used for coordinated attacks.
● Mirai is malware that turns networked devices running Linux (including smart devices on ARC
processors) into remotely controlled bots or "zombies" that can be used as part of a botnet in
large-scale network attacks. It primarily targets online consumer devices such as IP cameras
and home routers.
● Social Engineering: Tricking users into granting access or revealing information.
● Phishing: Deceptive emails or websites aiming to steal login credentials.
● Spear Phishing: Targeted phishing designed for specific individuals within an organization.
● Malware: Malicious software used to exploit vulnerabilities and gain access.
● Ransomware: Encrypting data and demanding payment for decryption.
● Supply Chain Attack: Targeting third-party vendors to gain access to the target organization.
● Advanced Evasion Techniques: Hiding from detection and security measures.
● Attribute Masking: Disguising attacker origins and tools.
● Fileless Malware: Operating without leaving traditional file traces.
● Persistence Mechanisms: Ensuring prolonged access even after system restarts.
● Rootkit: Malicious software providing deep system access and privilege escalation.
Timely Response: Dwell Time and its Implications
● Exposure Period: Time attackers remain active within a compromised system.
● Breach Duration: Length of time between initial intrusion and detection.
● Incident Response Time: Speed of identifying and responding to a security incident.
● MTTD (Mean Time to Detect): Average time to discover a security breach.
● MTTR (Mean Time to Repair/Remediate): Average time to contain and neutralize the threat.
● Attack Lifecycle: Stages of a cyberattack, from initial intrusion to exploitation and exfiltration.
● Post-Breach Investigation: Analyzing the attack and its impact after detection.
● Security Incident Response Plan: Defined procedures for handling cyberattacks.
● Threat Hunting: Proactively searching for hidden threats within networks.
● Vulnerability Management: Identifying and patching security vulnerabilities.
● Penetration Testing: Simulating attacker behavior to identify and address vulnerabilities.
● Security Awareness Training: Educating employees about cybersecurity best practices.
● Phishing Simulations: Testing employee susceptibility to phishing attacks.
● Security Information and Event Management (SIEM): Consolidate and analyze security logs
for threat detection.
● Endpoint Detection and Response (EDR): Detect and contain threats on endpoints like
laptops and desktops.
● Network Traffic Analysis (NTA): Monitor network activity for suspicious behavior.
● Security Orchestration, Automation, and Response (SOAR): Automate incident response
tasks to improve efficiency.
● Incident Response Rehearsals: Practicing incident response procedures to ensure
preparedness.
● Cybersecurity Insurance: Financial protection against losses from cyberattacks.
● Data Loss Prevention (DLP): Prevent unauthorized data exfiltration.
● Cybersecurity Framework: Established sets of security best practices (e.g., NIST
Cybersecurity Framework).
● Zero Trust Security: Granting least privilege access based on continuous verification.
● Security by Design: Integrating security considerations into development and operations.
● Threat Intelligence: Sharing threat information among entities to stay ahead of attackers.
● Vulnerability Disclosure: Responsible reporting of discovered vulnerabilities to vendors.
● Bug Bounty Programs: Rewarding researchers for finding security vulnerabilities.
● Cybersecurity Incident Response Teams (CSIRTs): Dedicated teams handling security
incidents.
● Threat Hunting Teams: Proactive teams specializing in identifying hidden threats.
● Security Operations Centers (SOCs): Monitoring and analyzing security events in real-time.
● Cybersecurity Compliance: Adhering to relevant industry regulations and standards.

These are just some examples, and the specific terminology used may vary.
● Information Security Lifecycle (ISLC): Identify, protect, monitor, respond, recover
○ Key Stages: Asset identification, classification, threat assessment, control
implementation, monitoring, incident response, recovery, review
● Cybersecurity Models:
○ CIA Triad (Confidentiality, Integrity, Availability),
○ Star Model (CIA + Non-Repudiation, Accountability),
○ Parkerian Hexad (CIA + Possession/Control, Authenticity, Utility)
● RegTech is the use of technology to improve the way businesses manage regulatory
compliance.
○ RegTech Solutions are used for Regulatory monitoring, compliance reporting, risk
management, KYC/AML, IAM, content management, compliance platforms, RPA,
blockchain/DLT, AI/ML
● AI (Artificial Intelligence): Machines that simulate human intelligence to perform tasks and
make decisions.
● ML (Machine Learning): A subset of AI where machines learn from data to improve their
abilities without explicit programming.
● DL (Deep Learning): A type of ML inspired by the human brain that uses artificial neural
networks to learn from large, complex datasets.
● GenAI (Generative AI): A type of AI that can create new content, such as images, text, or
music, based on patterns it has learned from existing data.
● API (Application Programming Interface): Enables secure data exchange between
platforms, powering payment gateways, account aggregation, and credit scoring tools.
● Open Banking: Regulatory framework allowing third-party access to customer financial data
with consent, driving innovative FinTech apps and services.
● BaaS (Banking as a Service): Banks provide core banking functionalities as APIs or cloud
services, empowering startups and smaller companies to offer financial services.
● Blockchain: Distributed ledger technology for secure and transparent financial transactions,
powering cryptocurrencies and smart contracts.
● Big Data: Analyzing vast financial data sets to gain insights, predict trends, and improve risk
management.
● RegTech (Regulatory Technology): Software solutions for automating compliance tasks,
managing regulatory changes, and mitigating risks within FinTech.
● Cloud Computing: On-demand storage and computing resources enabling scalable and
flexible FinTech infrastructure.

Payment & Transactions:


● EMV (Europay, Mastercard, Visa): Global standard for secure chip-and-PIN card payments,
reducing fraud.
● Contactless Payments: Secure mobile or card payments without physical contact.
● Tokenization: Replacing sensitive card data with secure tokens to protect customer
information.
● P2P (Peer-to-Peer) Payments: Money transfer directly between individuals, bypassing
traditional banks.
● Mobile Wallets: Digital wallets stored on smartphones for contactless payments and loyalty
programs.
● Cryptocurrency: Digital or virtual currency secured by cryptography, like Bitcoin or
Ethereum.
● Stablecoin: Cryptocurrency pegged to a stable asset like USD, aiming for price stability.

Lending & Investing:


● FinTech Lending: Technology-driven lending platforms offering alternative lending options,
peer-to-peer loans, and robo-advisors.
● Robo-Advisors: Algorithmic investment platforms offering automated portfolio management.
● Crowdfunding: Raising capital from a large pool of investors online for startups and projects.
● Microloans: Small loans provided to individuals or businesses, particularly in developing
countries.
● InsurTech: Technology-driven innovations in the insurance industry, offering personalized
policies and automated claims processing.

Risk Management & Security:


● KYC (Know Your Customer): Customer verification process to comply with anti-money
laundering regulations.
● AML (Anti-Money Laundering): Preventing criminals from using financial systems to
legitimize illegal funds.
● Fraud Detection: Identifying and preventing unauthorized transactions and financial crime.
● Data Security: Protecting customer data from unauthorized access, theft, or misuse.
● Cybersecurity: Protecting financial systems and data from cyberattacks.

Emerging Trends:
● Open Finance: Expanding Open Banking beyond traditional banking to include insurance,
investments, and other financial services.
● Embedded Finance: Integrating financial services seamlessly into non-financial applications
and platforms.
● Central Bank Digital Currencies (CBDCs): Digital versions of national currencies issued by
central banks, potentially revolutionizing monetary policy and payments.
● Decentralized Finance (DeFi): Financial services built on blockchain technology, including
lending, borrowing, and trading without intermediaries.
● WealthTech: Technology-driven solutions for wealth management, personalized financial
advice, and robo-investing.

Additionally:
● FinTech Hubs: Concentrated ecosystems of FinTech companies and startups fostering
innovation and collaboration.
● FinTech Unicorns: Privately held FinTech companies valued at over $1 billion.
● RegTech Sandbox: Controlled environment for testing and developing innovative RegTech
solutions under regulatory supervision.
● InsurTech Sandbox: Similar to RegTech Sandbox, but tailored for testing innovative
insurance solutions.
● Financial Inclusion: Making financial services accessible and affordable to everyone,
including underserved communities.

The following terms primarily fall under statistics and machine learning, specifically
model evaluation and regression analysis.

Metrics like MSE, RMSE, MAE, MdAE, R-squared, Adjusted R-squared, and the concept of residuals
are all fundamental in evaluating the goodness-of-fit of a model and understanding its prediction
accuracy.

Terms like loss function, overfitting, and underfitting are crucial in the training process of machine
learning models to optimize their performance

While the terminologies have a strong foundation in statistics and machine learning, their specific
application lies in the field of finance.

Metrics like MSE and R-squared are applied to evaluate the accuracy of financial predictions,
particularly in areas like price forecasting, portfolio optimization, and risk management.

Financial-specific terms like Sharpe Ratio, Sortino Ratio, and Calmar Ratio build upon these statistical
fundamentals to measure the performance and risk of investments and financial strategies.

Error Metrics:
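● Mean Squared Error (MSE): Average of the squared differences between predicted and actual
values; large errors are penalized heavily.
● Root Mean Squared Error (RMSE): Square root of MSE, making it easier to interpret because it is
in the same units as the target variable.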
● Mean Absolute Error (MAE): Average absolute difference between predicted and actual values,
less sensitive to outliers than MSE.
● Median Absolute Error (MdAE): Median of absolute differences, even less sensitive to outliers
than MAE.
● R-squared: Proportion of variance in the target variable explained by the model, typically between
0 and 1 (it can be negative when a model fits worse than predicting the mean), with higher values
indicating better fit.
● Adjusted R-squared: Adjusts R-squared for the number of features in the model, so adding
features that do not improve the fit is penalized.
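The error metrics above can be computed by hand on a tiny example. This is a sketch with made-up actual and predicted values, and it assumes a model with a single feature (p = 1) for the adjusted R-squared.

```python
from math import sqrt
from statistics import mean, median

actual = [3.0, 5.0, 7.0, 9.0]       # made-up observed values
predicted = [2.0, 5.0, 8.0, 11.0]   # made-up model predictions

# Residuals taken here as actual minus predicted
residuals = [a - p for a, p in zip(actual, predicted)]

mse = mean([r ** 2 for r in residuals])      # mean squared error
rmse = sqrt(mse)                             # same units as the target
mae = mean([abs(r) for r in residuals])      # mean absolute error
mdae = median([abs(r) for r in residuals])   # median absolute error

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((a - mean(actual)) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

# Adjusted R-squared for n observations and p features (p = 1 assumed)
n, p = len(actual), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mse, mae, mdae)                                  # 1.5 1.0 1.0
print(round(rmse, 4), round(r2, 2), round(adj_r2, 2))  # 1.2247 0.7 0.55
```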
Other Relevant Terms:
● Loss function: Function that measures the discrepancy between predicted and actual values, used to
optimize models during training.
● Residuals: Differences between actual and predicted values (usually actual minus predicted).
● Goodness-of-fit: How well a model fits the data, often measured by error metrics like MSE or R-
squared.
● Overfitting: Model memorizes the training data instead of generalizing to unseen data, leading to
poor performance on new data.
● Underfitting: Model fails to capture the underlying patterns in the data, leading to poor predictive
accuracy.
Specific Financial Terms:
● Sharpe Ratio: Measures the excess return of an investment above a risk-free rate relative to its
volatility.
● Sortino Ratio: Similar to Sharpe Ratio, but divides by downside volatility only, so upside swings
are not penalized.
● Calmar Ratio: Similar to Sharpe Ratio, but uses maximum drawdown instead of standard deviation
for a more extreme risk measure.
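A simplified sketch of the three ratios on made-up periodic returns. Definitions vary in practice (annualization, target return, sample vs. population deviation), so the choices below are assumptions: a risk-free rate of 0, the sample standard deviation for Sharpe, a downside deviation measured against 0, and total (not annualized) return for Calmar.

```python
from math import sqrt
from statistics import mean, stdev

returns = [0.02, -0.01, 0.03, -0.02, 0.01]  # made-up periodic returns
risk_free = 0.0                             # assumed per-period risk-free rate

excess = [r - risk_free for r in returns]

# Sharpe: mean excess return divided by its (sample) standard deviation
sharpe = mean(excess) / stdev(excess)

# Sortino: divide by downside deviation, which ignores positive excess returns
downside_dev = sqrt(mean([min(0.0, e) ** 2 for e in excess]))
sortino = mean(excess) / downside_dev

# Maximum drawdown: worst peak-to-trough fall of the cumulative value
value, peak, max_drawdown = 1.0, 1.0, 0.0
for r in returns:
    value *= 1 + r
    peak = max(peak, value)
    max_drawdown = max(max_drawdown, (peak - value) / peak)

# Calmar (simplified): total return over the period divided by max drawdown
calmar = (value - 1.0) / max_drawdown

print(round(sharpe, 3), round(sortino, 3))
print(round(max_drawdown, 4), round(calmar, 3))
```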

Based on exam feedback, the following content has been added:

Total: 3 questions
1 & 2: Theory
3: MCQ (25)

- Blockchain terminologies
- Information Security/ Cyber security Terminologies
- Python Question
- Import python libraries

a) Check data type of all columns


import pandas as pd
# Assuming df is your DataFrame
data = {
'A': [1, 2, 3],
'B': ['a', 'b', 'c'],
'C': [1.1, 2.2, 3.3]
}
df = pd.DataFrame(data)
# Check data types of all columns
print(df.dtypes)

b) Count non-null values in each column


You can get the number of non-null values in each column of a Pandas DataFrame using the
count() method. Here's how you can do it:

import pandas as pd
# Assuming df is your DataFrame
data = { 'A': [1, 2, 3],
'B': ['a', None, 'c'],
'C': [1.1, 2.2, 3.3] }
df = pd.DataFrame(data)
# Count non-null values in each column
print(df.count())

c) Rename a Column Name


You can rename a column in a Pandas DataFrame using the rename() method.
Here's how you can do it:

import pandas as pd
# Assuming df is your DataFrame
data = {
'Old_Column_Name': [1, 2, 3],
'B': ['a', 'b', 'c'],
'C': [1.1, 2.2, 3.3] }
df = pd.DataFrame(data)
# Rename a column
df = df.rename(columns={'Old_Column_Name': 'New_Column_Name'})
# Print the DataFrame to see the changes
print(df)
This will rename the column 'Old_Column_Name' to 'New_Column_Name' in your
DataFrame.

d) To delete a column in a Pandas DataFrame, you can use the drop() method.

import pandas as pd
# Example DataFrame
data = {
'A': [1, 2, 3],
'B': ['a', 'b', 'c'],
'C': [1.1, 2.2, 3.3] }
df = pd.DataFrame(data)
# Delete a column
df.drop(columns=['B'], inplace=True)
# Print the DataFrame to see the changes
print(df)

This code will delete the column 'B' from your DataFrame. The inplace=True parameter
ensures that the change is applied to the DataFrame directly, without the need to assign the
result back to df. If you don't want to modify the original DataFrame, you can omit
inplace=True and assign the result to a new variable.

e) Delete a row

To delete a specific row in a Pandas DataFrame, you can use the drop() method by
specifying the index label or index position of the row you want to delete. Here's how you can
do it:

import pandas as pd
# Example DataFrame
data = {
'A': [1, 2, 3],
'B': ['a', 'b', 'c'],
'C': [1.1, 2.2, 3.3] }
df = pd.DataFrame(data)

# i) Delete a specific row by index label


index_to_delete = 1 # For example, delete the row with index label 1
df = df.drop(index_to_delete)
# Print the DataFrame to see the changes
print(df)
This code deletes the row with index label 1 from the DataFrame. drop() works with index
labels (axis=0, the default, means rows), so to delete a row by its position, look up the label
at that position with df.index[position] and pass it to drop().
# ii) Delete a specific row by index position
# Note: if run after step i, position 1 now holds the row labelled 2
row_position_to_delete = 1 # For example, delete the row at position 1
df = df.drop(df.index[row_position_to_delete])
# Print the DataFrame to see the changes
print(df)
Both approaches remove the specified row from the DataFrame.

f) Group Data
Grouping data in Pandas is a powerful operation that allows you to split the data into groups
based on one or more criteria, apply a function to each group independently, and then
combine the results. You can group data using the groupby() function.

import pandas as pd

# Example DataFrame
data = {
'Category': ['A', 'B', 'A', 'B', 'A'],
'Value': [10, 20, 30, 40, 50] }
df = pd.DataFrame(data)

# i) Group the data by 'Category'


grouped = df.groupby('Category')
# Apply a function to each group, for example, calculate the sum of values in each group
sum_by_category = grouped.sum()
# Print the result
print(sum_by_category)

In this example, the DataFrame is grouped by the 'Category' column, and then the sum of
values in each group is calculated using the sum() function. You can replace sum() with any
other aggregation function such as mean(), count(), max(), min(), etc., or you can even apply
your custom function using the apply() method.

You can also group by multiple columns by passing a list of column names to the groupby()
function:

# ii) Group the data by multiple columns
# (assumes the DataFrame also has a 'Subcategory' column; the example above does not)
grouped = df.groupby(['Category', 'Subcategory'])
This will group the data by both the 'Category' and 'Subcategory' columns.

g) Import and export csv file in python

i) Importing CSV File:

import pandas as pd
# Import CSV file into DataFrame
df = pd.read_csv('input.csv')
# Print the DataFrame to see the data
print(df)
This code reads the data from a CSV file named 'input.csv' into a Pandas DataFrame df.
ii) Exporting DataFrame to CSV:

import pandas as pd
# Example DataFrame
data = {
'ID': [1, 2, 3],
'Name': ['John', 'Alice', 'Bob'],
'Value': [10.5, 20.3, 15.7]
}
df = pd.DataFrame(data)

# Export DataFrame to CSV
df.to_csv('output.csv', index=False)

This code creates a DataFrame from a dictionary, then exports it to a CSV file named
'output.csv'. The index=False argument ensures that the index is not included in the exported
CSV file.

Make sure you have pandas installed. If not, you can install it using pip:

pip install pandas

These examples provide a straightforward way to import and export CSV files in Python using
pandas.
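
To try the export and import steps without creating files, a round trip can be done in memory. This is a sketch under the assumption that the text-based forms of to_csv() and read_csv() are acceptable for practice; the variable names are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['John', 'Alice', 'Bob'],
    'Value': [10.5, 20.3, 15.7],
})

# With no file name, to_csv() returns the CSV text as a string
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv() accepts file-like objects as well as file names
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back)

# Two common read_csv options: load only some columns, and use one as the index
names_by_id = pd.read_csv(io.StringIO(csv_text), usecols=['ID', 'Name'], index_col='ID')
print(names_by_id)
```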

h) Line chart in Python

Below is an example of how to create a line chart in Python using Matplotlib with markers,
legends, and country names displayed in different colors:

import matplotlib.pyplot as plt

# Sample data
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
countries = ['USA', 'China', 'India', 'Russia']
data = {
'USA': [15, 16, 18, 20, 22, 23, 25, 27, 28, 30],
'China': [10, 12, 14, 16, 18, 20, 22, 24, 26, 28],
'India': [5, 6, 8, 10, 12, 14, 16, 18, 20, 22],
'Russia': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
}

# Plotting
plt.figure(figsize=(10, 6))

for country in countries:
    # Draw one line per country; the label is what appears in the legend
    plt.plot(years, data[country], marker='o', label=country)

plt.title('GDP Growth Over Years')
plt.xlabel('Year')
plt.ylabel('GDP (Trillions USD)')
plt.xticks(years) # Show all years on x-axis
plt.grid(True)
plt.legend()
plt.show()
This code will generate a line chart showing the GDP growth over the years for four different
countries (USA, China, India, Russia). Each country's data will be represented by a colored
line with markers, and the country names will be displayed in the legend with different colors.

i) Encryption in Blockchain
Blockchain technology doesn't directly involve encryption in the same sense as encrypting
data for confidentiality. Instead, it relies on cryptographic techniques to ensure the integrity,
immutability, and security of transactions and data stored on the blockchain.

Here's how encryption and cryptography are used in blockchain technology:

Hash Functions: Blockchain uses cryptographic hash functions like SHA-256 to convert input
data of any size into a fixed-size string of characters that serves as a practically unique
representation of the original data. Hashing is a one-way operation: it is easy to compute the
hash value from the input data, but computationally infeasible to reverse the process and derive
the original data from the hash. Hash functions ensure the integrity of data by providing a
digital fingerprint for each block in the blockchain. Each block also stores the hash of the
previous block, so changing old data breaks every later link.
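
The fingerprinting idea can be sketched with Python's built-in hashlib module. This is a simplified illustration, not how any particular blockchain is implemented: the block layout, the function names, and the sample records are assumptions made for the example.

```python
import hashlib

def block_hash(previous_hash, data):
    """Return the SHA-256 hex digest of a block's contents."""
    return hashlib.sha256((previous_hash + data).encode('utf-8')).hexdigest()

def build_chain(records):
    """Link records into a chain: each block stores the previous block's hash."""
    chain = []
    previous_hash = '0' * 64  # placeholder hash for the first block
    for data in records:
        current_hash = block_hash(previous_hash, data)
        chain.append({'data': data, 'previous_hash': previous_hash, 'hash': current_hash})
        previous_hash = current_hash
    return chain

def is_valid(chain):
    """Recompute every hash and check that each link still matches."""
    previous_hash = '0' * 64
    for block in chain:
        if block['previous_hash'] != previous_hash:
            return False
        if block['hash'] != block_hash(block['previous_hash'], block['data']):
            return False
        previous_hash = block['hash']
    return True

chain = build_chain(['Alice pays Bob 5', 'Bob pays Carol 2', 'Carol pays Dan 1'])
print(is_valid(chain))   # True

chain[1]['data'] = 'Bob pays Carol 200'   # tamper with a stored record
print(is_valid(chain))   # False: the stored hash no longer matches the data
```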

Public-Key Cryptography: Blockchain also employs asymmetric cryptography, commonly known
as public-key cryptography, for secure digital signatures and identity verification. Each
participant in the blockchain network has a pair of cryptographic keys: a public key and a
private key. The public key is shared openly, while the private key is kept secret. Digital
signatures are created using the private key and verified using the corresponding public key.
This ensures that transactions are authentic and that any tampering can be detected.

Consensus Mechanisms: Blockchain networks use consensus mechanisms, such as Proof of
Work (PoW) or Proof of Stake (PoS), to achieve agreement among participants on the validity
of transactions and the order in which they are added to the blockchain. These mechanisms
often involve cryptographic puzzles or algorithms that require participants to expend
computational resources or stake cryptocurrency as collateral, thus preventing malicious
actors from altering the blockchain's history.
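
The "cryptographic puzzle" behind Proof of Work can be illustrated with a toy miner. This is a rough sketch only: the function name mine, the difficulty rule (a hash starting with a given number of zeros) and the sample data are assumptions for the example, and real networks use far harder targets.

```python
import hashlib

def mine(block_data, difficulty=3):
    """Try nonces 0, 1, 2, ... until the SHA-256 hash starts with `difficulty` zeros."""
    prefix = '0' * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f'{block_data}{nonce}'.encode('utf-8')).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Finding the nonce takes many attempts (several thousand on average here)
nonce, digest = mine('Alice pays Bob 5', difficulty=3)
print(nonce, digest)

# Checking someone else's answer takes a single hash computation
check = hashlib.sha256(f'Alice pays Bob 5{nonce}'.encode('utf-8')).hexdigest()
print(check == digest)   # True
```

The asymmetry is the point: the answer is expensive to find but cheap to verify, which is what makes rewriting past blocks costly.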

Encryption for Privacy: While the data stored on the blockchain itself is typically transparent
and immutable, sensitive information can be encrypted off-chain before being stored on the
blockchain. Participants with the appropriate decryption keys can access this data as needed
while maintaining confidentiality and privacy.

Overall, encryption and cryptographic techniques are fundamental to the security and
trustworthiness of blockchain networks, ensuring that transactions are secure, verifiable, and
resistant to tampering.

j) Types of Viruses:
Viruses are a type of malicious software (malware) designed to infect computers, networks,
and devices, causing harm to data, systems, or users. There are various types of viruses,
each with its own characteristics and methods of spreading. Here are some common types:
1. File Infector Viruses: These viruses attach themselves to executable files, such as .exe or .dll
files. When the infected file is executed, the virus activates and infects other files on the
system.
Example: One famous example is the CIH virus (also known as the Chernobyl virus), which
infected executable files on Windows systems and caused damage by overwriting data on the
hard drive and corrupting system BIOS.

2. Boot Sector Viruses: These viruses infect the boot sector of a storage device, such as a hard
drive or a USB flash drive. They activate when the infected device is booted, allowing the
virus to spread to other devices connected to the system.
Example: The Stoned virus is a well-known boot sector virus that infected the master boot
record (MBR) of floppy disks and hard drives. It displayed the message "Your PC is now
Stoned!" when the infected system booted.

3. Macro Viruses: Macro viruses infect documents and spreadsheets that support macros, such
as Microsoft Word or Excel files. They use macros to execute malicious code when the
infected document is opened.
Example: The Melissa virus is a notable example of a macro virus that spread via infected
Word documents attached to emails. When opened, the virus used macros to replicate itself
and send infected emails to the victim's contacts.

4. Polymorphic Viruses: These viruses mutate or change their code to avoid detection by
antivirus software. Each instance of the virus appears slightly different from the others,
making it challenging to detect and remove.
Example: The Storm Worm is an example of a polymorphic virus that constantly changed its
code to evade detection. It spread through malicious email attachments and infected
computers to form a botnet used for sending spam emails and launching distributed denial-of-
service (DDoS) attacks.

5. Resident Viruses: Resident viruses embed themselves in the system's memory and can
execute malicious code whenever the infected system is booted or certain actions are
performed.
Example: CIH (detected as Win95.CIH and also known as Spacefiller), the file infector described
above, was also a resident virus: it remained in memory and infected executable files as they
were opened or executed.

6. Multipartite Viruses: These viruses combine characteristics of multiple virus types, making
them versatile and capable of infecting different types of files and system components.
Example: The Tequila virus is an example of a multipartite virus that infected both the master
boot record and executable files on DOS systems, making it difficult to detect and
remove.

7. Worms: While not strictly viruses, worms are a type of malware that replicates itself to spread
to other computers or networks. Unlike viruses, worms do not require a host file to propagate
and can spread independently.
Example: The Conficker worm is a prominent example of a worm that spread across networks
by exploiting vulnerabilities in Windows systems. It infected millions of computers worldwide
and caused widespread disruption.

8. Trojans: Trojans disguise themselves as legitimate software or files to trick users into
downloading and executing them. Once installed, they can perform various malicious actions,
such as stealing data, spying on users, or providing backdoor access to attackers.
Example: The Zeus Trojan (also known as Zbot) is a notorious example of a Trojan that
targeted online banking users. It stole login credentials and financial information by
intercepting user keystrokes and capturing screenshots.

9. Ransomware: Ransomware encrypts the victim's files or locks them out of their system,
demanding a ransom payment in exchange for decryption keys or restoring access.
Example: WannaCry is a well-known example of ransomware that spread rapidly in 2017,
encrypting files on infected systems and demanding ransom payments in Bitcoin for
decryption. It exploited a vulnerability in the Windows SMB protocol to propagate across
networks.

10. Spyware: Spyware secretly monitors and collects information about a user's activities, such
as browsing habits, keystrokes, or login credentials, without their consent.
Example: The CoolWebSearch spyware is an example of spyware that infected computers via
drive-by downloads or bundled with other software. It monitored user browsing habits,
redirected search queries, and displayed unwanted advertisements.

These examples demonstrate the diversity and sophistication of viruses and malware that have posed
significant threats to computer systems and users over the years.

j) Encryption in Blockchain
In blockchain technology, cryptography is primarily used to ensure the security, integrity, and authenticity
of data and transactions. The two main cryptographic techniques used in blockchain are:

1. Hash Functions: Hash functions play a crucial role in blockchain by generating unique digital
fingerprints (hash values) for data stored in each block. These hash values are deterministic
and unique for each block and are calculated using cryptographic hash functions like SHA-
256 (Secure Hash Algorithm 256-bit). Hash functions have several properties that make them
ideal for blockchain applications:

● Deterministic: Given the same input, a hash function will always produce the same
output.
● Fast computation: Hash functions are computationally efficient, allowing for quick
verification of data integrity.
● Collision-resistant: It's computationally infeasible to find two different inputs that
produce the same hash output, so hash values are unique in practice.
● Irreversibility: It's practically impossible to derive the original input data from its hash
value, making hash functions one-way.

2. Public-Key Cryptography: Public-key cryptography, also known as asymmetric cryptography,
is used in blockchain for digital signatures and identity verification. Each participant in the
blockchain network has a pair of cryptographic keys: a public key and a private key. The
public key is shared openly, while the private key is kept secret. Public-key cryptography
relies on the mathematical relationship between these keys:
● Digital Signatures: Participants use their private keys to sign transactions, generating
a unique digital signature. Anyone with access to the corresponding public key can
verify the signature's authenticity, ensuring that the transaction was created by the
holder of the private key.
● Identity Verification: Public keys serve as unique identifiers for participants in the
blockchain network. Participants can encrypt messages using the recipient's public
key, ensuring that only the holder of the corresponding private key can decrypt and
read the message.
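
The sign-with-private-key, verify-with-public-key flow described above can be sketched with textbook RSA and deliberately tiny numbers. This is a toy for revision purposes only: the primes 61 and 53, the function names, and the step of shrinking the hash to fit the small modulus are all assumptions of this example, and real blockchains use vetted libraries and much larger keys (often elliptic-curve signatures).

```python
import hashlib

# Toy RSA key pair built from the tiny primes 61 and 53 (insecure, illustration only)
N = 61 * 53                 # 3233, the modulus shared by both keys
PUBLIC_EXPONENT = 17        # public key is (N, PUBLIC_EXPONENT)
PRIVATE_EXPONENT = 2753     # private key; 17 * 2753 leaves remainder 1 modulo 60 * 52

def digest_as_int(message):
    """Hash the message with SHA-256 and reduce it to fit under the toy modulus."""
    digest = hashlib.sha256(message.encode('utf-8')).digest()
    return int.from_bytes(digest, 'big') % N

def sign(message):
    """Only the holder of the private exponent can compute this value."""
    return pow(digest_as_int(message), PRIVATE_EXPONENT, N)

def verify(message, signature):
    """Anyone with the public key can check the signature."""
    return pow(signature, PUBLIC_EXPONENT, N) == digest_as_int(message)

transaction = 'Alice pays Bob 5'
signature = sign(transaction)
print(verify(transaction, signature))             # True
print(verify(transaction, (signature + 1) % N))   # False: forged signature
```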
These encryption techniques, along with consensus mechanisms and cryptographic puzzles,
form the foundation of blockchain security, enabling trustless transactions and decentralized
consensus.
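
The hash-function properties listed under item 1 can be checked directly with Python's hashlib module. The helper name sha256_hex and the sample strings are assumptions made for this sketch.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()

# Deterministic: the same input always gives the same output
same = sha256_hex('blockchain') == sha256_hex('blockchain')
print(same)   # True

# Fixed size: 64 hex characters whether the input is 1 or 10,000 characters long
short_len = len(sha256_hex('a'))
long_len = len(sha256_hex('a' * 10_000))
print(short_len, long_len)   # 64 64

# A one-character change in the input gives a completely different digest
h1 = sha256_hex('Pay 100')
h2 = sha256_hex('Pay 101')
differing = sum(c1 != c2 for c1, c2 in zip(h1, h2))
print(h1)
print(h2)
print(differing, 'of 64 hex characters differ')
```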

k) Types of Encryption in general

There are two main categories of encryption methods: symmetric and asymmetric. Each has
its own popular algorithms, and here's a breakdown of a few common ones:

I) Symmetric Encryption

● Advanced Encryption Standard (AES): This is the current gold standard for
symmetric encryption, trusted by governments and widely used. AES is considered
highly secure with key lengths of 128, 192, or 256 bits. Its main advantage is speed
and efficiency. The downside is that both parties need to share the same secret key,
which requires secure key exchange.
● Triple Data Encryption Standard (3DES): This is an older symmetric algorithm that
applies DES encryption three times for increased security. While more secure than
DES, 3DES is considerably slower than AES and works on small 64-bit blocks. It survives
in some legacy systems but is not recommended for new deployments.

Advantages of Symmetric Encryption:

● Fast and efficient
● Relatively simple to implement

Disadvantages of Symmetric Encryption:

● Key management can be complex, especially for large deployments
● Not suitable for scenarios where key distribution is challenging
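
The defining feature of symmetric encryption, one shared secret key for both directions, can be shown with a toy cipher built from the standard library. This is NOT AES or 3DES and must not be used for real data: it is a simple XOR stream cipher whose keystream is derived with SHA-256, and the function names and sample key are assumptions made for this sketch.

```python
import hashlib

def keystream(key, length):
    """Derive `length` pseudo-random bytes from the key by hashing key + counter."""
    stream = b''
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return stream[:length]

def xor_cipher(key, data):
    """Encrypt or decrypt: XOR-ing with the same keystream twice restores the data."""
    return bytes(b ^ k for b, k in zip(data, keystream(key, len(data))))

shared_key = b'secret shared by both parties'

ciphertext = xor_cipher(shared_key, b'Transfer 100 USD to Bob')
print(ciphertext.hex())

# The receiver needs exactly the same key to get the message back
plaintext = xor_cipher(shared_key, ciphertext)
print(plaintext)

# With a different key the output is unreadable
wrong = xor_cipher(b'wrong key', ciphertext)
print(wrong)
```

The example also makes the key management problem visible: whoever learns shared_key can both read and forge messages, so the key itself has to be exchanged over a secure channel.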

II) Asymmetric Encryption

● Rivest-Shamir-Adleman (RSA): This is a widely used public-key encryption system. It
uses a key pair: a public key for encryption and a private key for decryption. RSA is
widely used to secure data transfer over the internet. However, it is much slower than
symmetric encryption and is not suited to encrypting large amounts of data, so it is
typically used to protect a short symmetric key that then encrypts the data itself.
(RSA keys of at least 2048 bits are the commonly recommended minimum today; 1024-bit keys
are no longer considered secure. The trade-off is that larger keys make encryption and
decryption more computationally expensive.)
● Elliptic Curve Cryptography (ECC): This is a newer and more efficient public-key
cryptography method compared to RSA. It offers similar security levels with smaller
key sizes, making it faster. ECC is becoming increasingly popular for digital
signatures and secure communications.

Advantages of Asymmetric Encryption:

● More secure key distribution
● Well-suited for digital signatures and authentication
Disadvantages of Asymmetric Encryption:
● Slower than symmetric encryption
● More complex to implement
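
The public-key-encrypts, private-key-decrypts idea behind RSA can be sketched with textbook arithmetic on tiny numbers. The primes 61 and 53 and the message value 65 are illustrative assumptions; real RSA uses primes hundreds of digits long plus padding, and should come from a vetted library. The three-argument pow() with exponent -1 needs Python 3.8 or later.

```python
# Toy RSA key generation with tiny primes (insecure, illustration only)
p, q = 61, 53
n = p * q                    # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent: public key is (n, e)
d = pow(e, -1, phi)          # private exponent: modular inverse of e, here 2753

message = 65                 # the message must be a number smaller than n

# Anyone can encrypt with the public key
ciphertext = pow(message, e, n)
print(ciphertext)            # 2790

# Only the private key holder can decrypt
recovered = pow(ciphertext, d, n)
print(recovered)             # 65
```

Because each operation is a large modular exponentiation, real systems usually combine both families: RSA or ECC protects a short symmetric key, and a symmetric cipher such as AES encrypts the bulk data.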
