0% found this document useful (0 votes)

9 views

Startup Ecosystem Analysis Model

The document provides a comprehensive analysis of the Indian startup funding ecosystem, detailing various aspects such as funding trends over time, industry preferences, and the role of location in startup growth. It includes a project description, dataset information, and methodologies for data cleaning and statistical analysis. The insights aim to guide new investors in making informed decisions based on data-driven visualizations and trends.

Uploaded by

nermine.limem.tbs

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views

Startup Ecosystem Analysis Model

Uploaded by

nermine.limem.tbs

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 21

1. Understanding The Data

1.1. Project Description

1.2. About the Datasets

1.3. Import The Libraries

1.4. Performing Essential Statistical Analysis on the Dataset

1.5. Data Cleaning and Preparation

1.6. Feature Engineering

2. How Does the Funding Ecosystem changes with respect to Time?

3. What is the General Amount that Startups get in India?

4. Which Kind of Industries are more preferred for Startups?

5. Does Location also play a role, In determining the Growth of a Startup?

6. Who plays the main role in Indian Startups Ecosystem?

7. What are the different Types of Funding for Startups?

1. Understanding The Data

1.1. Project Description
As the startup world keeps changing, figuring out how funding works is essential. This information collection
gives a complete picture of how startups get funded in India, including all the different ways it's been done
over time. From the kind of investments available to who the major players are and what industries get
funded the most, this data is like a detailed map that can help people involved in startups make smart
choices and spot new trends.

#### Purpose: To provide strategic guidance to new investors looking to invest in the Indian startup
ecosystem by analyzing data and visualizing which sectors, cities, and types of investments have the
highest potential

#### Project Importance: Examining this project helps new investors make more informed and strategic
decisions in the Indian startup ecosystem. Data-driven insights and visualizations enable investors to
minimize risks and capitalize on high-potential opportunities.
1.2. About the Datasets
- Dataset Descriptions: 'stocks_daily_prices.csv' / 'stocks_daily_returns.csv.

Content: Daily stock prices for various companies.

Rows: 3044
Columns-: 10
Sr No: Serial number.A unique identifier for each re-cord.
Date dd/mm/yyyy: The date when the funding event took place.
Startup Name: The name of the startup receiving the funding.
Industry Vertical: The primary industry to which the startup belongs.
SubVertical: A more specific category within the primary industry.
City Location: The city where the startup is headquartered.
Investors Name: The names of the investors or investment firms involved in the funding.
InvestmentnType: The type of investment (e.g., Seed, Series A, Series B).
Amount in USD: The amount of funding received in US dollars.
Remarks: Additional comments or details about thels about the funding event.

1.3. Import The Libraries

In [46]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import seaborn as sns
import folium
from folium.plugins import MarkerCluster
from wordcloud import WordCloud
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
warnings.warn("this will not show")
pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows', None)

In [47]: df0=pd.read_csv('startup_funding.csv')
df=df0.copy()

1.4. Performing Essential Statistical Analysis on the Dataset

In [48]: # Dimensions of the Data Set - (rows, columns)
df.shape

(3044, 10)
Out[48]:

In [49]: # Preview of Data Set

df.head()

Out[49]: Sr Date Startup Industry City Investors Amou

SubVertical InvestmentnType
No dd/mm/yyyy Name Vertical Location Name

Tiger Global Private Equity

0 1 9/1/2020 BYJU’S E-Tech E-learning Bengaluru 20,00,00
Management Round

1 2 13/01/2020 Shuttl Transportation App based Gurgaon Susquehanna Series C 80,48

shuttle Growth
service Equity

Retailer of
baby and Sequoia
2 3 9/1/2020 Mamaearth E-commerce Bengaluru Series B 1,83,58
toddler Capital India
products

Online Vinod
3 4 2/1/2020 Wealthbucket FinTech New Delhi Pre-series A 30,00
Investment Khatumal

Embroiled Sprout
Fashion and
4 5 2/1/2020 Fashor Clothes For Mumbai Venture Seed Round 18,00
Apparel
Women Partners

In [50]: # Data Type Properties

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3044 entries, 0 to 3043
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sr No 3044 non-null int64
1 Date dd/mm/yyyy 3044 non-null object
2 Startup Name 3044 non-null object
3 Industry Vertical 2873 non-null object
4 SubVertical 2108 non-null object
5 City Location 2864 non-null object
6 Investors Name 3020 non-null object
7 InvestmentnType 3040 non-null object
8 Amount in USD 2084 non-null object
9 Remarks 419 non-null object
dtypes: int64(1), object(9)
memory usage: 237.9+ KB

In [51]: df.describe(include="object").T

Out[51]: count unique top freq

Date dd/mm/yyyy 3044 1035 2/2/2015 11

Startup Name 3044 2459 Ola Cabs 8

Industry Vertical 2873 821 Consumer Internet 941

SubVertical 2108 1942 Online Lending Platform 11

City Location 2864 112 Bangalore 700

Investors Name 3020 2412 Undisclosed Investors 39

InvestmentnType 3040 55 Private Equity 1356

Amount in USD 2084 464 10,00,000 165

Remarks 419 71 Series A 175

In [52]: # Checking Null Values

(df.isnull().sum() / df.shape[0] *
100).sort_values(ascending=False).round(2).astype(str) + ' %'

Remarks 86.24 %
Out[52]:
Amount in USD 31.54 %
SubVertical 30.75 %
City Location 5.91 %
Industry Vertical 5.62 %
Investors Name 0.79 %
InvestmentnType 0.13 %
Sr No 0.0 %
Date dd/mm/yyyy 0.0 %
Startup Name 0.0 %
dtype: object

1.5. Data Cleaning and Preparation

In [53]: #Changing commas in the 'Amount in USD' column
df['Amount in USD']=df['Amount in USD'].apply(lambda x: str(x).replace(',',''))

In [54]: # Correction of incorrect values from 'Amount in USD' column

replace_map={
"Undisclosed":"0",
"unknown":"0",
"undisclosed":"0",
"\\xc2\\xa020000000":"0",
"N/A":"0",
"nan":"0",
"\\xc2\\xa020000000":"0"
}
df['Amount in USD']=df['Amount in USD'].apply(lambda x: replace_map.get(str(x),x))

In [55]: # Conversion to digital data

df['Amount in USD'] = pd.to_numeric(df['Amount in USD'])

In [56]: # Replacing 'Amount in USD' 0 values with empty values

df["Amount in USD"]=df["Amount in USD"].replace(0,np.nan)

In [57]: # Replace empty values with average

df["Amount in USD"].fillna(df["Amount in USD"].mean(), inplace=True)

In [58]: # Correcting incorrect date values

data_replace_map={
'12/05.2015':'12/05/2015',
'13/04.2015':'13/04/2015',
'15/01.2015':'15/01/2015',
'22/01//2015':'22/01/2015/',
'05/072018':'05/07/2018',
'01/07/015':'01/07/2015',
'\\xc2\\xa010/7/2015': '10/07/2015',
'\\\\xc2\\\\xa010/7/2015': '10/07/2015'
}
df['Date dd/mm/yyyy']=df['Date dd/mm/yyyy'].apply(lambda x:data_replace_map.get(x,x))

In [59]: # Convert to datetime type by specifying the date format

df['Date dd/mm/yyyy'] = pd.to_datetime(df['Date dd/mm/yyyy'],
format='%d/%m/%Y',
errors='coerce')

In [60]: # 86.24% of the 'Remarks' column consists of empty values, so we remove this line
df.drop('Remarks', axis=1, inplace=True)

In [61]: # Replacing 'Bengaluru' used in the data set with the more common name 'Bangalore'
df['City Location'][df['City Location'] ==
'Bengaluru'] = 'Bangalore'

In [62]: # Change the name in the 'Undisclosed investors' column to 'Undisclosed Investors'
investor_replace_map={
'Undisclosed investors': 'Undisclosed Investors',
'Undisclosed Investor': 'Undisclosed Investors',
'undisclosed investors': 'Undisclosed Investors',
'Undisclosed': 'Undisclosed Investors'
}
df['Investors Name']=df['Investors Name'].apply(lambda x:investor_replace_map.get(x,x))

In [63]: # Removal of the gap in 'Ola Cabs'.

df['Startup Name'][df['Startup Name'] == 'Ola Cabs'] = 'OlaCabs'

In [64]: # Replace with a more commonly used word

investment_type_replace_map = {
'Seed/ Angel Funding': 'Seed / Angel Funding',
'Seed\\nFunding': 'Seed Funding',
'Seed/Angel Funding': 'Seed / Angel Funding',
'Angel / Seed Funding': 'Seed / Angel Funding'
}
df['InvestmentnType']=df['InvestmentnType'].apply(lambda x:investment_type_replace_map.g

In [65]: # Standardizing common industry terms using regex and string replacement
replacements = {
r'\be[ -]?commerce\b': 'e-commerce',
r'\bfintech\b': 'fintech',
r'\bhealth[ -]?tech\b': 'healthtech',
r'\bedu[ -]?tech\b': 'edtech',
r'\bfood[ -]?(tech|delivery)\b': 'food & beverage',
r'\btransportation|logistics\b': 'transportation & logistics',
r'\bconsumer internet\b': 'consumer internet',
r'\btechnology\b': 'technology',
r'\bagri[ -]?tech\b': 'agritech',
r'\bauto[ -]?tech\b': 'autotech',
r'\bmedia\b': 'media',
r'\bfinance\b': 'finance',
r'\bunknown\b': 'other'
}
# Applying replacements
for pattern, replacement in replacements.items():
df['Industry Vertical']=df['Industry Vertical'].str.replace(pattern, replacement,reg

In [66]: df['Industry Vertical']=df['Industry Vertical'].fillna('unknown')

df['IndustY Vertical']=df['Industry Vertical'].str.lower()
def clean_industry(industry):
parts=industry.split('&')
cleaned_parts=[]
for part in parts:
if part not in cleaned_parts:
cleaned_parts.append(part)
if len(cleaned_parts)==2:
break
return '&'.join(cleaned_parts)
df['Industry Vertical']=df['Industry Vertical'].apply(clean_industry)

1.6. Feature Engineering

In [67]: # Create the 'Year Month' column
df['Year Month']=(df['Date dd/mm/yyyy'].dt.year*100+df['Date dd/mm/yyyy'].dt.month)
# Let's check that the conversion was successful
df[['Date dd/mm/yyyy','Year Month']].head()

Out[67]: Date dd/mm/yyyy Year Month

0 2020-01-09 202001.0

1 2020-01-13 202001.0

2 2020-01-09 202001.0
3 2020-01-02 202001.0

4 2020-01-02 202001.0

In [68]: df.head()

Out[68]: Sr Date Startup Industry City Investors Amou

SubVertical InvestmentnType
No dd/mm/yyyy Name Vertical Location Name

Tiger Global Private Equity

0 1 2020-01-09 BYJU’S E-Tech E-learning Bangalore 2000000
Management Round

App based Susquehanna

1 2 2020-01-13 Shuttl Transportation shuttle Gurgaon Growth Series C 80483
service Equity

Retailer of
baby and Sequoia
2 3 2020-01-09 Mamaearth E-commerce Bangalore Series B 183588
toddler Capital India
products

Online Vinod
3 4 2020-01-02 Wealthbucket FinTech New Delhi Pre-series A 30000
Investment Khatumal

Embroiled Sprout
Fashion and
4 5 2020-01-02 Fashor Clothes For Mumbai Venture Seed Round 18000
Apparel
Women Partners

2. How Does the Funding Ecosystem changes with respect to Time?

In [69]: # Convert 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date dd/mm/yyyy'], format='%d/%m/%Y')
# Add Year and Month columns
df['Year']=df['Date'].dt.year
df['Month']=df['Date'].dt.month
# Yearly funding trend
funding_trend_yearly=df.groupby('Year')['Amount in USD'].sum().reset_index()
# Monthly funding trend
funding_trend_monthly=df.groupby(['Year','Month'])['Amount in USD'].sum().reset_index()
funding_trend_monthly['Month']=funding_trend_monthly['Month'].astype(str)
plt.figure(figsize=(10,6))
plt.plot(funding_trend_yearly['Year'], funding_trend_yearly['Amount in USD'], marker='o'
plt.title('Yearly Funding Trend')
plt.xlabel('Year')
plt.ylabel('Total Funding(USD)')
plt.grid(True)
plt.show()
fig=make_subplots(rows=1, cols=1, subplot_titles=('Monthly Funding Trend'))
# Monthly funding trend with selected years highlighted
colors={2015:'red', 2017:'green', 2019:'purple'}
for year in funding_trend_monthly['Year'].unique():
monthly_data=funding_trend_monthly[funding_trend_monthly['Year']==year]
color=colors.get(year,'gray')
fig.add_trace(
go.Scatter(x=monthly_data['Month'], y=monthly_data['Amount in USD'], mode='lines
name=str(year), line=dict(color=color)),
row=1, col=1
)
# Update layout
fig.update_layout(title_text='Monthly Funding Trend', height=600)
fig.update_xaxes(title_text='Month',row=1,col=1)
fig.update_yaxes(title_text='Total Funding Amount (USD)',row=1,col=1)
# Show the plot
fig.show()
Findings:
Yearly Funding Pattern:
The funding amounts demonstrate significant fluctuation year-over-year.
Funding reached a peak in 2017, nearing $10 billion total.
A substantial decrease followed in 2018, with another increase observed in 2019.
The data for 2020 shows a sharp decline, but it's likely incomplete because the year is partial or
ongoing. #### Potential Causes:

Economic cycles, investor sentiment, and macroeconomic factors might be influencing these variations.

Large funding rounds or significant investments in specific years can cause surges.

Monthly Funding Pattern:

The monthly funding trend exposes finer details, with numerous peaks and valleys.
There are substantial spikes in certain months, particularly in mid-2017 and mid-2019.
There appears to be some seasonality, with certain periods consistently exhibiting higher funding
activity.

In [70]: # Extract year and month for monthly funding analysis

df['Year']=df['Date dd/mm/yyyy'].dt.year
df['Month']=df['Date dd/mm/yyyy'].dt.month
# Group by Year and Month to get monthly funding amounts
monthly_funding=df.groupby(['Year','Month'])['Amount in USD'].sum().reset_index()
# Unique years in the dataset
years=monthly_funding['Year'].unique()
# Plotting monthly funding amounts for each year using subplots
fig, axs=plt.subplots(len(years)//2,2,figsize=(14,7*(len(years)//2)))
color='#4878A2'
for i,year in enumerate(years):
row=i//2
col=i%2
sns.barplot(x='Month',
y='Amount in USD',
data=monthly_funding[monthly_funding['Year']==year],
color=color,
ax=axs[row,col])
axs[row,col].set_title(f'Monthly Funding in {year}')
axs[row,col].set_xlabel('Month')
axs[row,col].set_ylabel('Funding Amount (USD)')
axs[row,col].grid(True)
plt.tight_layout()
plt.show()
Findings:
Seasonal Distribution:
Certain months, like January, July, and August, consistently exhibit higher funding levels, hinting at
potential seasonal patterns. Yearly Variations:

Peak funding:
months differ across years, suggesting that specific events or investments may influence funding
activity.

Funding Spikes:
Substantial surges in funding can be explained by large investment rounds or prominent startups
receiving funding.

Analysis Insight:
This examination provides a clear picture of how funding is dispersed throughout the year, emphasizing
significant periods of investment activity.

3. What is the General Amount that Startups get in India?

In [71]: # Preview of the details of the 10 most funded Initiatives
df.sort_values('Amount in USD',ascending=False).head(10)

Out[71]: Sr Date Startup Industry City Investors Am

SubVertical InvestmentnType
No dd/mm/yyyy Name Vertical Location Name

Rapido Bike Westbridge

60 61 2019-08-27 Transportation Bike Taxi Bangalore Series B 3.900
Taxi Capital

Online
651 652 2017-08-11 Flipkart eCommerce Bangalore Softbank Private Equity 2.500
Marketplace

Microsoft,
ECommerce eBay,
966 967 2017-03-21 Flipkart eCommerce Bangalore Private Equity 1.400
Marketplace Tencent
Holdings

Mobile
Wallet & SoftBank
830 831 2017-05-18 Paytm ECommerce Bangalore Private Equity 1.400
ECommerce Group
platform

Vijay
Mobile
31 32 2019-11-25 Paytm FinTech Noida Shekhar Funding Round 1.000
Wallet
Sharma

Steadview
Online Capital and
2648 2649 2015-07-28 Flipkart.com NaN Bangalore Private Equity 7.000
Marketplace existing
investors

Alibaba
2459 2460 2015-09-29 Paytm E-Commerce NaN New Delhi Group, Ant Private Equity 6.800
Financial

Private
188 189 2018-08-30 True North Finance Mumbai NaN Private Equity 6.000
Equity Firm

33 34 2019-10-02 Udaan B2B Business Bangalore Altimeter Series D 5.850

development Capital,
DST
Global

Baillie
Gifford,
Car Falcon
2244 2245 2015-11-18 Ola NaN Bangalore Private Equity 5.000
Aggregator Edge
Capital,
Tiger Gl...

In [72]: # Preview of the least funded initiatives

df.sort_values(by='Amount in USD').head(10)

Out[72]: Sr Date Startup Industry City Investors Amount

SubVertical InvestmentnType
No dd/mm/yyyy Name Vertical Location Name in USD

Hyderabad
Angels (at
3020 3021 2015-01-19 Enabli unknown NaN NaN Startup Seed Funding 16000.0 u
Heroes
event)

Hyderabad
Angels (at
3021 3022 2015-01-19 CBS unknown NaN NaN Startup Seed Funding 16000.0 u
Heroes
event)

Hyderabad
Angels (at
3019 3020 2015-01-19 Yo Grad unknown NaN NaN Startup Seed Funding 16000.0 u
Heroes
event)

Hyderabad
Angels (at
Play your
3018 3019 2015-01-19 unknown NaN NaN Startup Seed Funding 16000.0 u
sport
Heroes
event)

Hyderabad
Angels (at
3017 3018 2015-01-19 Hostel Dunia unknown NaN NaN Startup Seed Funding 16000.0 u
Heroes
event)

Group of
2933 2934 2015-02-02 Faaya unknown NaN NaN Angel Seed\\nFunding 16600.0 u
Investors

Group of
2934 2935 2015-02-02 InstaBounce unknown NaN NaN Angel Seed\\nFunding 16600.0 u
Investors

Group of
Chloroplast
2935 2936 2015-02-02 unknown NaN NaN Angel Seed\\nFunding 16600.0 u
Foods
Investors

Group of
2936 2937 2015-02-02 Dealwithus unknown NaN NaN Angel Seed\\nFunding 16600.0 u
Investors

Group of
2937 2938 2015-02-02 CleverSharks unknown NaN NaN Angel Seed\\nFunding 16600.0 u
Investors

In [73]: # Calculate the average funding amount

average_funding=df['Amount in USD'].mean()
# Log-transform the funding amounts for better visualization
df['Log Amount in USD']=np.log10(df['Amount in USD']+1)
# Plot the log-transformed funding amount distribution
plt.figure(figsize=(14, 8))
sns.histplot(df['Log Amount in USD'], bins=50, kde=True, color='#4878A2')
plt.axvline(np.log10(average_funding + 1), color='r', linestyle='--', label=f'Log Averag
plt.title('Log-Scaled Distribution of Funding Amounts for Startups in India')
plt.xlabel('Log Funding Amount (USD)')
plt.ylabel('Number of Startups')
plt.legend()
plt.grid(True)
plt.show()

4. Which Kind of Industries are more preferred for Startups?

In [74]: # Identify the top 10 industries
top_industries=df[(df['Industry Vertical']!='unknown')]['Industry Vertical'].value_count
top_industries.columns=['Industry Vertical','Count']
# Create the bar plot
plt.figure(figsize=(14,8))
sns.barplot(x='Count',y='Industry Vertical', data=top_industries, color='#4878A2')
plt.title('Top 10 Preferred Indusries for Startups')
plt.xlabel('Number of Starups')
plt.ylabel('Industry Vertical')
plt.grid(True)
plt.show()
In [75]: # Count the number of startups in each city
top_cities_count = df['City Location'].value_counts().head(10).reset_index()
top_cities_count.columns = ['City', 'Count']
# Plot the number of startups by city
plt.figure(figsize=(14, 8))
sns.barplot(x='Count', y='City', data=top_cities_count, color='#4878A2')
plt.title('Top 10 Cities by Number of Startups')
plt.xlabel('Number of Startups')
plt.ylabel('City')
plt.grid(True)
plt.show()

Output: Industry Preferences Analysis

Number of Funding Rounds per Industry
Top Industries:
Consumer Internet: Leading with the highest number of funding rounds (589 rounds).
Technology: Second most active with 310 funding rounds.
E-commerce: Significant presence with 170 funding rounds.

Total Funding Amount per Industry

Top Funded Industries:

E-commerce: Secured the highest total funding amount, indicating large investments in this sector
(7.16 billion Dollars).
Consumer Internet: Close behind with substantial funding (6.25 billion Dollars).
Technology: Also received considerable funding (2.23 billion Dollars). #### Insights

Active Sectors: Consumer Internet and Technology sectors are highly active in terms of funding rounds.

High Investment Sectors: E-commerce and Consumer Internet attract the highest total funding,
reflecting investor confidence and market potential in these sectors.
Industry Dynamics: The analysis highlights which industries are more preferred by investors and which
sectors secure larger investments.

In [76]: # Ensure 'Date' column is in datetime format

df['Date'] = pd.to_datetime(df['Date dd/mm/yyyy'], format='%d/%m/%Y')

# Extract year and group by year and industry

df['Year'] = df['Date'].dt.year
yearly_industry_count = df.groupby(['Year', 'Industry Vertical']).size().unstack().filln
# Plotting the line plots for top industries over time

top_industries_list = top_industries['Industry Vertical'].head(5)

plt.figure(figsize=(14, 8))
for industry in top_industries_list:
plt.plot(yearly_industry_count.index, yearly_industry_count[industry], marker='o', l
plt.title('Number of Startups Founded Over Time by Industry')
plt.xlabel('Year')
plt.ylabel('Number of Startups')
plt.legend()
plt.grid(True)
plt.show()
Output:Top 5 Industry Choice Analysis
Consumer Internet:

Peak in 2016: The number of consumer internet startups saw a significant peak in 2016 with over 500
startups founded. Sharp Decline: After 2016, there is a sharp decline, indicating a reduction in new
consumer internet startups over the subsequent years. Technology:

Steady Growth and Decline: Technology startups grew steadily, peaking in 2016 with around 200 startups,
followed by a decline similar to the consumer internet trend. Consistency: Despite the decline, the number
of technology startups remains relatively consistent compared to other industries.

E-Commerce:

Initial Growth: E-commerce startups showed initial growth, peaking in 2016 with about 150 startups.
Gradual Decline: There is a gradual decline after 2016, but not as steep as the consumer internet sector.

Healthcare:

Stability: The healthcare industry shows relative stability with slight fluctuations, peaking modestly in 2016
and maintaining a lower but consistent presence.

Insights
2016 as a Pivotal Year: Most industries, especially consumer internet, technology, and e-commerce, peaked
in 2016. This indicates a significant year for startup formations across these sectors. Post-2016, there is a
noticeable decline in new startup formations, which could be due to market saturation, changing investment
climates, or shifts in entrepreneurial focus.

Consumer Internet and Technology Leading: These two sectors have the highest peaks, indicating high
interest and investment in these areas during their peak years. The sharp decline post-2016 suggests
potential over-saturation or a shift in investor interest.

Steady but Low Growth in Healthcare: Healthcare startups show steady but lower growth compared to other
sectors, suggesting a more stable but less explosive industry.

Potential Reasons for Trends:

Economic Factors: Changes in the economic environment, funding availability, and investor sentiment
could explain the peak and subsequent decline. Market Saturation: High initial growth could lead to
market saturation, causing a drop in new startup formations in subsequent years.
Shifts in Focus: Emerging technologies and changing market demands might shift entrepreneurial
focus to other areas over time.

5. Does Location also play a role, In determining the Growth

of a Startup?
In [77]: top_cities_funding = df.groupby('City Location')['Amount in USD'].sum().reset_index()
top_cities_funding = top_cities_funding.sort_values(by='Amount in USD', ascending=False)
top_cities_funding.columns = ['City', 'Total Funding Amount']

# Plot the total funding amount by city

plt.figure(figsize=(14, 8))
sns.barplot(x='Total Funding Amount', y='City', data=top_cities_funding, color='#4878A2'
plt.title('Top 10 Cities by Total Funding Amount')
plt.xlabel('Total Funding Amount (USD)')
plt.ylabel('City')
plt.grid(True)
plt.show()

Output:Analysis of the Top 10 Cities by Number of Startups

Bangalore as the Primary Hub:
Bangalore's significant lead in the number of startups highlights its role as the primary tech and innovation
hub in India. The city's infrastructure, talent pool, and supportive ecosystem attract a large number of
startups.

Mumbai and New Delhi's Strong Presence:

Mumbai and New Delhi's high ranks underscore their importance in the Indian startup ecosystem. Mumbai's
financial prowess and New Delhi's political and incubator support contribute to their strong startup cultures.

Emergence of Other Cities:

Cities like Gurgaon, Pune, and Hyderabad show significant numbers of startups, indicating the
diversification of the startup ecosystem beyond the primary hubs. These cities offer favorable conditions
such as talent availability, infrastructure, and government support.

Regional Clusters:

The presence of multiple cities from the National Capital Region (NCR) like New Delhi, Gurgaon, and Noida
highlights the region's attractiveness for startups. Proximity to the capital and good connectivity are key
factors.

Supporting Infrastructure and Ecosystems:

The distribution of startups across these cities suggests that supporting infrastructure, educational
institutions, corporate presence, and government policies play crucial roles in fostering startup growth.

Conclusion
The chart indicates that location significantly influences startup growth. Cities with strong ecosystems,
infrastructure, and support systems tend to have higher concentrations of startups. Understanding these
dynamics can help stakeholders, including investors, entrepreneurs, and policymakers, make informed
decisions about where to focus their efforts and resources.

6. Who plays the main role in Indian Startups Ecosystem?

In [78]: # Investor analysis
investor_funding = df['Investors Name'].value_counts().reset_index()
investor_funding.columns = ['Investor', 'Number of Investments']

# Top 10 most invested investors

top_investors = investor_funding.head(10)

plt.figure(figsize=(14, 7))
plt.barh(top_investors['Investor'],
top_investors['Number of Investments'],
color='#4878A2')
plt.xlabel('Number of Investments')
plt.ylabel('Investor')
plt.title('Top 10 Investors in Indian Startup Ecosystem')
plt.gca().invert_yaxis()
plt.show()
In [79]: # Extract the most active startup founders (this is a simplification)
# Split multiple founders in a single row (assuming founders are listed in 'Startup Name
df['Founders'] = df['Startup Name'].fillna('Unknown').str.split(',')

# Explode the list of founders into separate rows

founders_exploded = df.explode('Founders')
top_founders = founders_exploded['Founders'].value_counts().head(10).reset_index()
top_founders.columns = ['Founder', 'Number of Startups']

# Plot the top founders

plt.figure(figsize=(14, 8))
sns.barplot(x='Number of Startups', y='Founder', data=top_founders, color='#4878A2')
plt.title('Top 10 Founders by Number of Startups')
plt.xlabel('Number of Startups')
plt.ylabel('Founder')
plt.grid(True)
plt.show()
7. What are the different Types of Funding for Startups?
In [40]: df.head()

Out[40]: Sr Date Startup Industry City Investors Amou

SubVertical InvestmentnType
No dd/mm/yyyy Name Vertical Location Name

Tiger Global Private Equity

0 1 2020-01-09 BYJU’S E-Tech E-learning Bangalore 2000000
Management Round

App based Susquehanna

1 2 2020-01-13 Shuttl Transportation shuttle Gurgaon Growth Series C 80483
service Equity

Retailer of
baby and Sequoia
2 3 2020-01-09 Mamaearth E-commerce Bangalore Series B 183588
toddler Capital India
products

Online Vinod
3 4 2020-01-02 Wealthbucket FinTech New Delhi Pre-series A 30000
Investment Khatumal

Embroiled Sprout
Fashion and
4 5 2020-01-02 Fashor Clothes For Mumbai Venture Seed Round 18000
Apparel
Women Partners

In [41]: from wordcloud import WordCloud

investment_types=df['InvestmentnType'].value_counts().reset_index()
investment_types.columns=['Investment Type','Number of Investments']
# Convert the investment types to a dictionary
investment_dict=dict(zip(investment_types['Investment Type'],investment_types['Number of
#generate a word cloud
wordcloud=WordCloud(width=800, height=400,background_color='white',colormap='coolwarm').
# Plot the word cloud
plt.figure(figsize=(14, 8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Types of Funding for Startups')
plt.show()

In [42]: # Assuming df is the cleaned dataframe with startup data

# Count the number of each investment type
investment_types = df['InvestmentnType'].value_counts().head(10).reset_index()
investment_types.columns = ['Investment Type', 'Number of Investments']

# Plot the investment types

plt.figure(figsize=(14, 8))
sns.barplot(x='Number of Investments', y='Investment Type', data=investment_types, color
plt.title('Types of Funding for Startups')
plt.xlabel('Number of Investments')
plt.ylabel('Investment Type')
plt.grid(True)
plt.show()

Output:Analysis of the Types of Funding for Startups

Private Equity: Most Common Funding Type: Private Equity is the most common funding type, with
nearly 1,400 instances. This indicates that many startups in the dataset have reached a level of
maturity where they can attract significant private equity investments.

Seed Funding: Second Most Common: Seed Funding is close behind Private Equity, with a similar
number of instances. This suggests that many startups are in the early stages of their lifecycle, seeking
initial capital to develop their ideas and products.

Seed / Angel Funding: Early-Stage Investments: Seed / Angel Funding is also prominent, with a
significant number of instances. This type of funding is crucial for startups to get off the ground and
demonstrates the active role of angel investors in the ecosystem.

Diverse Funding Landscape: The chart demonstrates a diverse landscape of funding types, from early-
stage seed funding to later-stage private equity. This diversity is crucial for catering to the varying
needs of startups at different stages of their growth.

Importance of Early-Stage Funding: The high frequency of Seed Funding and Seed / Angel Funding
underscores the importance of early-stage investments in nurturing new startups. These funding types
are critical for startups to develop their initial ideas and products.
Private Equity's Dominance: The dominance of Private Equity highlights the significant role of large-
scale investments in the startup ecosystem. It suggests that many startups in the dataset have
achieved substantial growth and maturity, making them attractive targets for private equity investors.

Growth Funding Rounds: The presence of Series A, B, C, and D funding rounds, although less
frequent, indicates a structured path for startups to secure additional capital as they grow. Each
subsequent round typically involves larger amounts of funding and is aimed at scaling the business.

Role of Debt Funding: While less common, Debt Funding provides an alternative financing route for
startups. This can be particularly useful for startups that want to avoid equity dilution or have specific
capital requirements that debt can fulfill

Data Cleaning - Cheatsheet
100% (2)
Data Cleaning - Cheatsheet
8 pages
The Philippine Administrative System Module
No ratings yet
The Philippine Administrative System Module
16 pages
The Mean Activity Coefficients of HBR in Three Dil...
100% (1)
The Mean Activity Coefficients of HBR in Three Dil...
2 pages
Learning Pandas PDF
No ratings yet
Learning Pandas PDF
171 pages
The DAP Strategy: A New Way of Working to De-Risk & Accelerate Your Digital Transformation
From Everand
The DAP Strategy: A New Way of Working to De-Risk & Accelerate Your Digital Transformation
Raj Sundarason
No ratings yet
Startup Case Study
No ratings yet
Startup Case Study
5 pages
Eval Plus Notes
No ratings yet
Eval Plus Notes
69 pages
Case Study 2
No ratings yet
Case Study 2
13 pages
CST 383 Start-Up Success Failure - Colaboratory
No ratings yet
CST 383 Start-Up Success Failure - Colaboratory
32 pages
ds
No ratings yet
ds
114 pages
Christian Mayer, Lukas Rieger, Kyrylo Kravets - Coffee Break Pandas - 74 Pandas Puzzles To Build Your Pandas Data Science Superpower-Finxter - Com (2020)
No ratings yet
Christian Mayer, Lukas Rieger, Kyrylo Kravets - Coffee Break Pandas - 74 Pandas Puzzles To Build Your Pandas Data Science Superpower-Finxter - Com (2020)
156 pages
Pandas Notes
No ratings yet
Pandas Notes
3 pages
Assvid
No ratings yet
Assvid
13 pages
Pandas Roadmap
No ratings yet
Pandas Roadmap
6 pages
Pandas Data Manipulation Extended CheatSheet 1731972219
No ratings yet
Pandas Data Manipulation Extended CheatSheet 1731972219
9 pages
Data Science - A First Introduction With Python (Z-Lib - Io)
No ratings yet
Data Science - A First Introduction With Python (Z-Lib - Io)
452 pages
1.2.1. Retrieving Data - 1.2.2. Cleaning Data
No ratings yet
1.2.1. Retrieving Data - 1.2.2. Cleaning Data
35 pages
Supermarket Sales Data analysis
No ratings yet
Supermarket Sales Data analysis
6 pages
ML Report Miniproject
No ratings yet
ML Report Miniproject
11 pages
Data Manipulation in Python Using Pandas
No ratings yet
Data Manipulation in Python Using Pandas
12 pages
Importing & Managing Financial Data in Python: Aggregate Your Data by Category
No ratings yet
Importing & Managing Financial Data in Python: Aggregate Your Data by Category
32 pages
Practicals
No ratings yet
Practicals
42 pages
Asar Project
No ratings yet
Asar Project
14 pages
Oxy Metre
No ratings yet
Oxy Metre
17 pages
14oct Pandas 2024
No ratings yet
14oct Pandas 2024
13 pages
Statistical Transform Data Cleaning
No ratings yet
Statistical Transform Data Cleaning
30 pages
Pyspark Basics
No ratings yet
Pyspark Basics
16 pages
;;;;;;;;;;;;;;;;;;;;;5
No ratings yet
;;;;;;;;;;;;;;;;;;;;;5
19 pages
Learninng Plan
No ratings yet
Learninng Plan
6 pages
Ipo Eda
No ratings yet
Ipo Eda
76 pages
Final Funda
No ratings yet
Final Funda
22 pages
Project Report
No ratings yet
Project Report
37 pages
Python For Data Analysis
67% (3)
Python For Data Analysis
39 pages
Data Analysis Roadmap
No ratings yet
Data Analysis Roadmap
17 pages
SalesMgmtSystem XII IP Projectreport 2022 23
No ratings yet
SalesMgmtSystem XII IP Projectreport 2022 23
18 pages
10 - Jayesh - Prakash - Rane
No ratings yet
10 - Jayesh - Prakash - Rane
26 pages
EDA With Pandas CheatSheet
No ratings yet
EDA With Pandas CheatSheet
3 pages
Important Pandas Operations 1697910759
No ratings yet
Important Pandas Operations 1697910759
6 pages
stoke market
No ratings yet
stoke market
13 pages
Course_ Introduction to Data Science (SD211105)
No ratings yet
Course_ Introduction to Data Science (SD211105)
10 pages
Chapter 4
No ratings yet
Chapter 4
34 pages
Python - Data Analysis
No ratings yet
Python - Data Analysis
11 pages
EDA Python for Data Analsis
No ratings yet
EDA Python for Data Analsis
10 pages
Reading and Plotting Stock Data Notes
No ratings yet
Reading and Plotting Stock Data Notes
2 pages
profitanalysis
No ratings yet
profitanalysis
18 pages
Mohit 1
No ratings yet
Mohit 1
28 pages
Beeplov Sharma
No ratings yet
Beeplov Sharma
5 pages
Pandas_Notes_Design
No ratings yet
Pandas_Notes_Design
5 pages
lab 1 ML lab
No ratings yet
lab 1 ML lab
15 pages
Data Exploration Preparation
No ratings yet
Data Exploration Preparation
12 pages
Python Data Wrangling Tutorial With Pandas
No ratings yet
Python Data Wrangling Tutorial With Pandas
15 pages
@Bhaveshvaishnav844 Commented Video Timeline Topic
No ratings yet
@Bhaveshvaishnav844 Commented Video Timeline Topic
3 pages
Chapter 1
No ratings yet
Chapter 1
24 pages
Supermarket Sales Analysis Project
No ratings yet
Supermarket Sales Analysis Project
8 pages
R Code
No ratings yet
R Code
5 pages
Automl Code
No ratings yet
Automl Code
3 pages
Pandas Cheatsheet DF
No ratings yet
Pandas Cheatsheet DF
1 page
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
58 pages
Lab 2 More Python
No ratings yet
Lab 2 More Python
28 pages
Gold Price Analysis (Neural Network)
No ratings yet
Gold Price Analysis (Neural Network)
44 pages
BigData - W1 - Practice - Data Acquisition - HoangVu
No ratings yet
BigData - W1 - Practice - Data Acquisition - HoangVu
50 pages
Capitalizing Data Science: A Guide to Unlocking the Power of Data for Your Business and Products (English Edition)
From Everand
Capitalizing Data Science: A Guide to Unlocking the Power of Data for Your Business and Products (English Edition)
Mathangi Sri Ramachandran
No ratings yet
AddressingBiasandDataPrivacyConcernsinAI-DrivenCreditScoringSystemsThroughCybersecurityRiskAssessment
No ratings yet
AddressingBiasandDataPrivacyConcernsinAI-DrivenCreditScoringSystemsThroughCybersecurityRiskAssessment
25 pages
Using Data Mining to Improve Assessment
No ratings yet
Using Data Mining to Improve Assessment
10 pages
RST-MLP method
No ratings yet
RST-MLP method
11 pages
AIandCreditScoring
No ratings yet
AIandCreditScoring
13 pages
Book Credit Scoring
No ratings yet
Book Credit Scoring
382 pages
Business-Plan EasyBank[1]
No ratings yet
Business-Plan EasyBank[1]
16 pages
page_de_garde (1)
No ratings yet
page_de_garde (1)
3 pages
Bridging+the+Gap-+The+Impact+of+Open+Banking+on+Traditional+Banking+and+FinTech+Collaboration+(5)
No ratings yet
Bridging+the+Gap-+The+Impact+of+Open+Banking+on+Traditional+Banking+and+FinTech+Collaboration+(5)
11 pages
futuresvaluation
No ratings yet
futuresvaluation
2 pages
Financial Market Project Report
No ratings yet
Financial Market Project Report
46 pages
Enhancing portfolio management using artificial intelligence
No ratings yet
Enhancing portfolio management using artificial intelligence
20 pages
PFE Report
No ratings yet
PFE Report
10 pages
2-FCFF-FCFE-Valuation-Models-Blank
No ratings yet
2-FCFF-FCFE-Valuation-Models-Blank
2 pages
houses prices prediction model
No ratings yet
houses prices prediction model
11 pages
Ranking Ratios based on weighted scores
No ratings yet
Ranking Ratios based on weighted scores
2 pages
Enhancing Financial Decision-Making and Education in Fintech with Data Analytics and Information Technology
No ratings yet
Enhancing Financial Decision-Making and Education in Fintech with Data Analytics and Information Technology
9 pages
Curriculum Vitae Example Nederlands
100% (2)
Curriculum Vitae Example Nederlands
8 pages
Unit 7 Designing The Innovative Process
No ratings yet
Unit 7 Designing The Innovative Process
13 pages
Forum S11 Thread Soal Valuation of Inventories 1-FIFO, LIFO, and AVERAGE - Diaz Hesron Deo Simorangkir - 2602202526
No ratings yet
Forum S11 Thread Soal Valuation of Inventories 1-FIFO, LIFO, and AVERAGE - Diaz Hesron Deo Simorangkir - 2602202526
2 pages
Quality Control Assignment 1
No ratings yet
Quality Control Assignment 1
11 pages
Total Productive Maintenance (TPM) Analysis on Lathe Machines using the Overall Equipment Effectiveness Method and Six Big Losses
No ratings yet
Total Productive Maintenance (TPM) Analysis on Lathe Machines using the Overall Equipment Effectiveness Method and Six Big Losses
6 pages
Class Routine (PMBA) Summer 2022 (Online)
No ratings yet
Class Routine (PMBA) Summer 2022 (Online)
1 page
Session 1 AUDITING AND ASSURANCE PRINCIPLES
89% (9)
Session 1 AUDITING AND ASSURANCE PRINCIPLES
55 pages
Leadership Curriculum V5
No ratings yet
Leadership Curriculum V5
17 pages
Cambridge IGCSE ™: Accounting
No ratings yet
Cambridge IGCSE ™: Accounting
15 pages
43 - Factors That Influence The Utilization of Digital Transactions Among The Micro-Business Enterprises in General Santos City
No ratings yet
43 - Factors That Influence The Utilization of Digital Transactions Among The Micro-Business Enterprises in General Santos City
15 pages
CV Board of Commissioners Board of Directors Sharia Supervisory Board
No ratings yet
CV Board of Commissioners Board of Directors Sharia Supervisory Board
13 pages
As A Principle, IFRS Standards Should Not Be Able To Be Applied Without The Accompanying Material. (Should Be Able)
No ratings yet
As A Principle, IFRS Standards Should Not Be Able To Be Applied Without The Accompanying Material. (Should Be Able)
11 pages
Menstrual Cups Report Studying The Market
No ratings yet
Menstrual Cups Report Studying The Market
48 pages
BTS - BE (DELUXE EDITION) - SubK Shop PDF
No ratings yet
BTS - BE (DELUXE EDITION) - SubK Shop PDF
1 page
2019-12-02 14 35 23 - خاص اقتصاد هندسي
No ratings yet
2019-12-02 14 35 23 - خاص اقتصاد هندسي
88 pages
Investing for Dummies 3rd Edition Tony Levene - The ebook is ready for instant download and access
100% (1)
Investing for Dummies 3rd Edition Tony Levene - The ebook is ready for instant download and access
61 pages
Who Owns Big Pharma + Big Media - You'Ll Never Guess. - Children's Health Defense
100% (1)
Who Owns Big Pharma + Big Media - You'Ll Never Guess. - Children's Health Defense
12 pages
SAP PP Questionnaire: The Following Are Some SAP PP Questionnaires Which You Can Try
No ratings yet
SAP PP Questionnaire: The Following Are Some SAP PP Questionnaires Which You Can Try
5 pages
1) - D7072 - EMERSON - (ALL-EEB05-0002-0-01) - 100216-NOI-1 Rev.1
No ratings yet
1) - D7072 - EMERSON - (ALL-EEB05-0002-0-01) - 100216-NOI-1 Rev.1
4 pages
Bapa 2020
No ratings yet
Bapa 2020
130 pages
Call For Papers - SPGC Summit 2024 - Amalfi Coast - May 15-17
No ratings yet
Call For Papers - SPGC Summit 2024 - Amalfi Coast - May 15-17
8 pages
GRR Basic Training
No ratings yet
GRR Basic Training
16 pages
RISK MANAGEMENT PLAN V2.0
No ratings yet
RISK MANAGEMENT PLAN V2.0
105 pages
Me Mod1@Azdocuments - in
No ratings yet
Me Mod1@Azdocuments - in
25 pages
Nps Charges
No ratings yet
Nps Charges
1 page
FundsIndia Wealth Conversations
No ratings yet
FundsIndia Wealth Conversations
105 pages
Test Bank for Auditing A Risk Based Approach to Conducting a Quality Audit 10th Edition Johnstone Gramling Rittenberg 1305080572 9781305080577 - Instant Download To Read The Complete Content
100% (19)
Test Bank for Auditing A Risk Based Approach to Conducting a Quality Audit 10th Edition Johnstone Gramling Rittenberg 1305080572 9781305080577 - Instant Download To Read The Complete Content
53 pages
Translate SPM Teori Strategic Planning Huha
No ratings yet
Translate SPM Teori Strategic Planning Huha
31 pages