
Framing an Analytics Problem - Case 2

Uploaded by

Akshit Dahiya
Copyright
© All Rights Reserved
Framing an Analytics
Problem

This file is meant for personal use by [email protected] only.


Sharing or publishing the contents in part or full is liable for legal action.
Case 2 – Solar Electric Incentive Program –
Project cost analysis case study

• The objective of this second case is to understand the factors that can affect or contribute to the cost of a project.
• The dataset used for this case study has been sourced from the New York State Energy Research and Development Authority (NYSERDA). It includes the data points for solar electric projects in the Incentive Program that started in December 2000.

Data Information
Variable | Description
City | Name of city for project location
County | Name of county for project location
Sector | Name of project sector. The sectors in this dataset are either Residential or Non-Residential
Program Type | Name of program type; either Residential/Small Commercial, Commercial/Industrial (Competitive), or Commercial/Industrial (MW Block)
Electric Utility | Name of electric utility for project location
Purchase Type | Solar photovoltaic project purchase agreement type. The purchase types are either Lease, Purchase or Power Purchase Agreement.
Date Application Received | Date project application was received by the program
Date Completed | Date NYSERDA recognized the project as interconnected and operational, and closed out the project application.
Project Status | Either Complete or Pipeline. Complete indicates projects that are interconnected and operational, and closed out the project application. Pipeline indicates projects with an active application that are not yet complete.
Total Inverter Quantity | Quantity of all inverters installed for project.
Total PV Module Quantity | Quantity of all photovoltaic (PV) modules installed for project.
Project Cost | Expected project installation cost in US dollars (USD), as reported by the solar project contractor.
$Incentive | Amount of project incentives paid by the program in USD.
Total Nameplate kW DC | The sum of kilowatt (kW) DC capacity ratings of the installed photovoltaic modules
Expected KWh Annual Production | Expected annual electricity production in kilowatt-hours (kWh)
Affordable Solar | Indicates if project is part of Affordable Solar program
Community Distributed Generation | Indicates if project is Community Distributed Generation (Shared Solar)
Contractor | Name of entity responsible for installation of the project.
Our first task here would be to look at the type of data that has been
made available to us for analyzing. Let's take a look at the datatypes
of the variables.
Categorical Variables:
Binary:
• Sector
• Program Type
• Project Status
• Affordable Solar
• Community Distributed Generation
Multi-level:
• City
• County
• Electric Utility
• Purchase Type

Date Time Type Variables:
• Date Application Received
• Date Completed

Continuous Variables:
• Total Inverter Quantity
• Total PV Module Quantity
• Project Cost
• Incentive
• Total Nameplate kW DC
• Expected KWh Annual Production
Task at Hand
• The data scientists on the team have access to this data and need to work out how to use it to identify the factors that contribute to the overall project cost.
• The intent is to identify avenues where there is scope for reducing the cost of the project.
Some questions that can be raised initially and can act as a starting point for analysing the dataset:
• What is the total project cost that has been estimated in Solar Electric projects since the beginning of this incentive program?
• How are the projects distributed across different cities?
• What is the current status of the projects? What proportion of projects are completed and up and running?
• The more photovoltaic modules installed, the greater the DC generating capacity, and subsequently the greater the expected annual electricity production in kilowatt-hours (kWh). Which plot can we use to verify this claim?
• Are higher-cost projects estimated to produce more electricity in terms of annual production?
• Amongst the completed projects, which project contractor seems to have exceeded the timeline of 5 years to complete the project since the date of application?
• Which type of program seems to have a higher project cost? Is this consistent with the expected annual electricity production?
• Do incentives depend on the cost of the project?
• Every city can have multiple electric utilities. How can we visualize the contribution of different electric utilities to the cost of the project for the top 10 cities we previously identified?
Let's start answering these questions using the data at our disposal.

What is the total project cost that has been estimated in Solar Electric projects since the beginning of this incentive program?

Approximately $7.3 billion has been invested in Solar Electric projects since the beginning of the incentive program.
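
A total like this comes from summing the Project Cost column. The sketch below shows one way to do that using only the Python standard library. The rows are made-up placeholders, not NYSERDA data, and the helper name `total_project_cost` is hypothetical. Blank cost entries are skipped, which is an assumption about how missing values should be treated.

```python
# Made-up sample rows; only the "Project Cost" column name comes from the
# data dictionary. Costs are kept as strings, as they would be in a CSV.
projects = [
    {"City": "Albany", "Project Cost": "25000.50"},
    {"City": "Buffalo", "Project Cost": ""},  # missing cost
    {"City": "Staten Island", "Project Cost": "18000"},
]


def total_project_cost(rows):
    """Sum 'Project Cost' over all rows, skipping blank entries."""
    total = 0.0
    for row in rows:
        raw = row.get("Project Cost", "").strip()
        if raw:  # assumption: a blank cell means the cost is unknown
            total += float(raw)
    return total


total = total_project_cost(projects)
print(f"Total project cost: ${total:,.2f}")
```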

How are the projects distributed across different cities?
This question can be narrowed to identifying the top 10 cities with the most projects implemented or in progress, since the City column has many distinct entries and visualizing all of them in a single graph becomes difficult.
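
Restricting the view to the most frequent cities amounts to counting rows per city and keeping the largest counts. A minimal standard-library sketch follows. The city list is an invented stand-in for the City column and `top_cities` is a hypothetical helper name.

```python
from collections import Counter

# Made-up city labels standing in for the "City" column.
cities = [
    "Staten Island", "Albany", "Staten Island", "Buffalo",
    "Albany", "Staten Island", "Middletown",
]


def top_cities(city_column, n=10):
    """Return the n most frequent cities as (city, project_count) pairs."""
    return Counter(city_column).most_common(n)


top = top_cities(cities, n=3)
for city, count in top:
    print(city, count)
```

The resulting pairs can be passed straight to a bar chart, with the city on one axis and the project count on the other.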

Among the top 10 cities with the most projects, which cities have the highest average project cost?

Even though Staten Island has the most projects, the mean project cost is highest in the city of Schenectady.
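
The comparison of average cost across the top cities is a group-wise mean. The sketch below illustrates it with invented numbers chosen only to mirror the pattern described here. `mean_cost_by_city` is a hypothetical helper, and `keep` is assumed to hold the top-city names found in the previous step.

```python
from collections import defaultdict

# Made-up (city, project cost) pairs for illustration only.
rows = [
    ("Staten Island", 20000.0), ("Staten Island", 30000.0),
    ("Schenectady", 90000.0), ("Schenectady", 110000.0),
    ("Albany", 40000.0),
]


def mean_cost_by_city(pairs, keep):
    """Average project cost per city, restricted to the cities in `keep`."""
    costs = defaultdict(list)
    for city, cost in pairs:
        if city in keep:
            costs[city].append(cost)
    return {city: sum(vals) / len(vals) for city, vals in costs.items()}


means = mean_cost_by_city(rows, keep={"Staten Island", "Schenectady"})
highest = max(means, key=means.get)  # city with the largest mean cost
print(highest, means[highest])
```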
What is the current status of the projects? How many projects are completed and up and running in each sector?

Approximately 95% of projects have been completed. The majority of the completed projects are from the Residential sector.
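
Both figures come from simple counts: the share of rows whose Project Status is Complete, and a per-sector count of those completed rows. A minimal sketch on made-up rows (the column names and category labels follow the data dictionary, the values do not come from the dataset):

```python
from collections import Counter

# Made-up rows using the "Project Status" and "Sector" columns.
rows = [
    {"Project Status": "Complete", "Sector": "Residential"},
    {"Project Status": "Complete", "Sector": "Residential"},
    {"Project Status": "Complete", "Sector": "Non-Residential"},
    {"Project Status": "Pipeline", "Sector": "Residential"},
]

# Share of all projects that are complete.
status_counts = Counter(r["Project Status"] for r in rows)
completed_share = status_counts["Complete"] / len(rows)

# Completed projects broken down by sector.
completed_by_sector = Counter(
    r["Sector"] for r in rows if r["Project Status"] == "Complete"
)

print(f"Completed: {completed_share:.0%}")
print(dict(completed_by_sector))
```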
The more photovoltaic modules installed, the greater the DC generating capacity, and subsequently the greater the expected annual electricity production in kilowatt-hours (kWh). Use an appropriate plot to verify this claim.


Plot 1

A statistical test can verify this further, with hypotheses formulated as follows:
H0 (Null Hypothesis): correlation for plot 1 = correlation for plot 2 = correlation for plot 3
Ha (Alternative Hypothesis): at least one of the three correlations differs from the others
Note that ANOVA compares group means rather than correlation coefficients, so these hypotheses call for a test designed to compare correlations.
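
Before any formal test, the strength of each relationship in the claim can be quantified with a Pearson correlation coefficient. The sketch below computes it by hand for three pairings of the columns named in the claim. The numbers are invented to look roughly linear, and which pairing corresponds to which plot is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values standing in for three columns of the dataset.
modules = [20, 24, 30, 40, 100]            # Total PV Module Quantity
kw_dc = [5.0, 6.1, 7.4, 10.2, 25.0]        # Total Nameplate kW DC
kwh = [5900, 7100, 8800, 12000, 29500]     # Expected KWh Annual Production

r_modules_kw = pearson(modules, kw_dc)
r_kw_kwh = pearson(kw_dc, kwh)
r_modules_kwh = pearson(modules, kwh)
print(r_modules_kw, r_kw_kwh, r_modules_kwh)
```

Coefficients close to 1 on the real data would support the claim that the three quantities rise together. They would not by themselves show that the three correlations are equal.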

Are higher-cost projects estimated to produce more electricity in terms of annual production?

Looking at the plot, there appears to be a positive correlation between project cost and expected annual electricity production, so higher-cost projects may well produce more electricity. But it's also important to keep in mind that “Correlation does not imply Causation”.
Amongst the completed projects, which project contractors seem to have exceeded the deadline of 5 years to complete the project since the date of application?
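
Answering this means subtracting Date Application Received from Date Completed for each completed project and keeping the contractors whose duration exceeds five years. A minimal sketch under stated assumptions: the contractor names and dates are hypothetical, `slow_contractors` is an invented helper, and five years is approximated as 5 × 365.25 days.

```python
from datetime import date

# Made-up rows; contractor names and dates are hypothetical.
rows = [
    {"Contractor": "Contractor A", "Project Status": "Complete",
     "Date Application Received": date(2010, 1, 15),
     "Date Completed": date(2016, 3, 1)},
    {"Contractor": "Contractor B", "Project Status": "Complete",
     "Date Application Received": date(2015, 6, 1),
     "Date Completed": date(2016, 5, 20)},
    {"Contractor": "Contractor C", "Project Status": "Pipeline",
     "Date Application Received": date(2012, 2, 1),
     "Date Completed": None},
]

FIVE_YEARS_IN_DAYS = 5 * 365.25  # approximation of a 5-year limit


def slow_contractors(projects, limit_days=FIVE_YEARS_IN_DAYS):
    """Map each contractor to its longest completed-project duration in days,
    keeping only durations that exceed limit_days."""
    late = {}
    for p in projects:
        if p["Project Status"] != "Complete" or p["Date Completed"] is None:
            continue  # pipeline projects have no completion date yet
        days = (p["Date Completed"] - p["Date Application Received"]).days
        if days > limit_days:
            late[p["Contractor"]] = max(days, late.get(p["Contractor"], 0))
    return late


late = slow_contractors(rows)
print(late)
```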

How much time did similar projects take for completion on average?

The contractor named "Solar Liberty Energy Systems, Inc." took the longest time (more than 6 years) to install its project and get it up and running. Similar projects take only about 1 year to implement.
How much money was invested in similar projects on average?

On average, approximately 233 thousand dollars were invested in similar projects, whereas about 320 thousand dollars were invested in this specific project.

The reasons for the delay in completion, as well as for the higher cost of the project, should be discussed with the contractor.
Which type of program seems to have a higher project cost? Is this consistent with the expected annual electricity production?

As the plots show, the project cost is highest for the "Commercial/Industrial (MegaWatt Block)" program type, which is not surprising, as this program type is also expected to generate the highest annual electricity production.
Do incentives depend on the cost of the project?

There seems to be a positive correlation between the project cost and the incentives, but cost might not be the only criterion for deciding the incentive amount. Remember the phrase, "Correlation does not imply causation". It's highly likely that other factors are involved in deciding the incentive amount.
Every city can have multiple electric utilities. How can we visualize
the contribution of different electric utilities to the cost of the
project for the top 10 cities we previously identified?


From the above stacked bar plot, we can clearly see that the most used electric utility is “Consolidated Edison”. “Orange and Rockland Utilities” is the only electric utility available in Middletown. “National Grid” is the only utility available in Schenectady and Albany, and the most widely used utility in Buffalo.
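
A stacked bar plot of this kind needs the project cost summed for every city and electric utility pair, with one bar per city and one segment per utility. The sketch below builds that table with the standard library. The costs are invented, "Utility X" is a hypothetical placeholder, and `cost_by_city_and_utility` is an assumed helper name.

```python
from collections import defaultdict

# Made-up (city, electric utility, project cost) records.
rows = [
    ("Staten Island", "Consolidated Edison", 20000.0),
    ("Staten Island", "Consolidated Edison", 30000.0),
    ("Buffalo", "National Grid", 40000.0),
    ("Buffalo", "Utility X", 10000.0),  # hypothetical utility name
    ("Middletown", "Orange and Rockland Utilities", 15000.0),
]


def cost_by_city_and_utility(records):
    """Sum project cost for each (city, utility) pair as a nested dict."""
    table = defaultdict(lambda: defaultdict(float))
    for city, utility, cost in records:
        table[city][utility] += cost
    return {city: dict(utilities) for city, utilities in table.items()}


table = cost_by_city_and_utility(rows)
for city, utilities in table.items():
    # Each inner dict holds the stacked segments for one city's bar.
    segments = ", ".join(f"{u}: {c:,.0f}" for u, c in utilities.items())
    print(f"{city} -> {segments}")
```

A city whose inner dictionary has a single key is served by only one utility in the data, which is how a single-colour bar in the plot should be read.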
• This data can be refined further and used to predict the project cost as well as the incentives.
• This dataset offers wide scope for drilling down into a specific case or situation and analyzing the data around it, as we did when identifying the project contractor who did not meet the 5-year completion deadline.
• This is just an introduction to the possibilities of exploring a dataset to find meaningful insights with respect to cost analysis. If you have an objective in mind, you can narrow the analysis down to the path that reaches that objective, which will also save a lot of time.
Thank You!

END!
