Internship Report - Software - Salaries Predictions
Chapter 1
Company profile
and employees, so that future human resources will be beneficial, purposeful, and
profitable to the nation.
1.5 Objectives
• AAPL trusts the Skill India mission and vision; hence our utmost priority is to add skills to the
young generation and make them profitable and productive for the nation.
• We aim to provide industrial automation training skill-module kits to institution, university,
and college lab facilities at the lowest possible price, for the benefit of technical students.
• Identifying young entrepreneurs, then motivating and training them to establish start-ups that create
employment as well as prosperity for the nation.
• Consultation, sourcing, and supplying of highly skilled manpower to industry for better efficiency
and productivity.
• Very eager to find solutions to the most complex industrial problems.
Organization structure
The organization structure has three departments: the design department, the software
department, and sales and marketing.
• All types of automation projects for companies, using PLCs, SCADA, and embedded systems.
• We provide robots and robotic solutions to small and medium scale companies.
Chapter 2
Introduction
Data analysis is the process of inspecting, cleansing, transforming, and modelling data with the goal of
discovering useful information, informing conclusions, and supporting decision making. Every
product that is manufactured is supposed to have distinguishing physical characteristics that make
it attractive and provide usefulness and value to customers; these characteristics are known as the
design, and the process employed in this regard is known as product design. Product design clearly
defines a problem, develops a proper solution for that problem, and validates the solution with
real users. Product design is the process of creating a new product to be sold by a business to its
customers. It is essentially the efficient and effective generation and development of ideas through a
process that leads to new products.
Data mining is a particular data analysis technique that focuses on statistical modelling and
knowledge discovery for predictive rather than purely descriptive purposes, while business
intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business
information. In statistical applications, data analysis can be divided into descriptive statistics,
exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering
new features in the data, while CDA focuses on confirming or falsifying existing hypotheses.
Predictive analytics focuses on the application of statistical models for predictive forecasting or
classification, while text analytics applies statistical, linguistic, and structural techniques to extract
and classify information from textual sources, a species of unstructured data. All of the above are
varieties of data analysis.
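As a minimal illustration of the descriptive statistics that EDA typically starts from, the Python standard library can compute the common summaries directly. The salary figures below are hypothetical values chosen for the sketch, not data from the report:

```python
import statistics

# Hypothetical annual salaries (in thousands), for illustration only
salaries = [42, 55, 61, 48, 75, 52, 95, 58]

mean = statistics.mean(salaries)      # central tendency
median = statistics.median(salaries)  # robust to the outlier at 95
spread = statistics.stdev(salaries)   # sample standard deviation

print(f"mean={mean:.2f} median={median:.2f} stdev={spread:.2f}")
```

Comparing the mean and the median is a quick first check for skew: here the single high salary pulls the mean above the median.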
Chapter 3
Tools exposed
The notebook dashboard is the component shown first when you launch the Jupyter
Notebook App. The notebook dashboard is mainly used to open notebook documents and
manage the running kernels. The Jupyter notebook extends the console-based approach to
interactive computing in a qualitatively new direction, providing a web-based application
suitable for capturing the whole computation process: developing, computing, and executing
code as well as communicating the results. The Jupyter notebook combines two components: a
web application and notebook documents.
A web application: a web-browser-based tool for interactive authoring of documents
which combine explanatory text, mathematics, computations, and their rich media output.
Notebook documents: a representation of all content visible in the web application, including
inputs and outputs of the computations, explanatory text, mathematics, images, and rich media
representations of objects.
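A typical notebook cell makes this concrete: the code, its printed output, and any rich output are all stored together in the notebook document. The computation below is a hypothetical example of the kind of cell a notebook might contain:

```python
import math

# In a notebook, this code and its output are saved together in the
# notebook document; a script prints, while a cell also renders the
# value of its last expression in the web application.
radii = [1.0, 2.5, 4.0]
areas = [round(math.pi * r ** 2, 2) for r in radii]
print(areas)
```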
Colaboratory, or Colab for short, is a product from Google Research. Colab allows anybody to write
and execute arbitrary Python code through the browser and is especially well suited to machine
learning, data analysis, and education. More technically, Colab is a hosted Jupyter notebook service
that requires no setup to use, while providing access free of charge to computing resources, including
GPUs.
Colab resources are not guaranteed and not unlimited, and the usage limits sometimes
fluctuate. This is necessary for Colab to be able to provide resources free of charge. Resources in Colab
are prioritized for interactive use cases. Google prohibits actions associated with bulk compute, actions
that negatively impact others, and actions associated with bypassing its policies. Jupyter is the
open-source project on which Colab is based. Colab allows you to use and share Jupyter notebooks
with others without having to download, install, or run anything.
You can search for Colab notebooks using Google Drive. Clicking the Colab logo at the top left of the
notebook view shows all notebooks in Drive. You can also find notebooks that you have
opened recently by clicking File and then Open notebook. Google Drive operations can time out
when the number of folders or subfolders in a folder grows too large. If thousands of items are directly
contained in the top-level "My Drive" folder, then mounting the drive will likely time out. Repeated
attempts may eventually succeed, as failed attempts cache partial state locally before timing out.
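Mounting Drive from a notebook uses the `google.colab` helper module. The sketch below guards the import so it degrades gracefully outside Colab, where the module does not exist:

```python
# Sketch: mounting Google Drive from inside a Colab notebook.  The
# google.colab module exists only in the Colab runtime, so the import
# is guarded to degrade gracefully elsewhere.
try:
    from google.colab import drive
    drive.mount('/content/drive')  # prompts for authorization in Colab
    mounted = True
except ImportError:
    mounted = False
    print("Not running in Colab; skipping Drive mount.")
```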
Colab is able to provide resources free of charge in part by having dynamic usage limits that sometimes
fluctuate; this means that overall usage limits as well as idle timeout periods, maximum VM lifetime,
GPU types available, and other factors vary over time. Colab does not publish these limits, in part
because they can vary quickly. This is necessary for Colab to be able to provide access to these resources
free of charge. Colab works with most major browsers and is most thoroughly tested with the
latest versions of Chrome, Firefox, and Safari.
Chapter 4
Task performed: Data analysis of software salaries
While analysing data sets, it is important to define the objectives so that the further steps become clearer.
Analysis lets us pose questions about data, and for questioning data it is important to have a data
collection on which further operations will be carried out. After the above steps, data analysis comes into
the picture. Data analysis is the process of cleaning and converting raw data so that further operations
become easier to carry out and conclusions can then be drawn from the results. Today, data
has become the backbone of research in almost every field. Research and analysis is no longer
limited to just the sciences, but has grown to be a part of businesses (start-ups and established
organisations), government work, and more.
Data set:
A data set is a collection of similar and related data or information, organised for better
accessibility. Data sets are used for data analytics as they provide related information in
a unified form. A data set can be structured or unstructured.
Data set link: https://www.kaggle.com/code/sagarvarandekar/software-salaries-eda
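The cleaning-and-conversion step described above can be sketched with the standard library alone. The column names and rows below are hypothetical stand-ins for the real Kaggle file, which would normally be read from disk with `open("salaries.csv")`:

```python
import csv
import io

# Hypothetical sample of a salaries CSV; one row has a missing salary
# value that the cleaning step filters out before analysis.
raw = io.StringIO(
    "job_title,salary\n"
    "Data Analyst,52000\n"
    "Software Engineer,\n"
    "ML Engineer,90000\n"
)

rows = [r for r in csv.DictReader(raw) if r["salary"].strip()]
salaries = [int(r["salary"]) for r in rows]
print(len(rows), sum(salaries) / len(salaries))
```

Dropping or imputing missing values like this is usually the first conversion applied before any plots or models are built.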
Chapter 5
Results and discussions
1. Scatter plot
2. Bar plot
3. Count plot
4. Bar plot
5. Bar plot
6. Heat map
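A count plot such as the one listed above tallies how often each category occurs. The same frequencies can be computed with `collections.Counter`; the job titles below are hypothetical examples, not values from the report's data set:

```python
from collections import Counter

# Hypothetical job-title column from a salaries data set
titles = ["Data Analyst", "Software Engineer", "Data Analyst",
          "ML Engineer", "Software Engineer", "Data Analyst"]

counts = Counter(titles)  # frequency of each category
for title, n in counts.most_common():
    print(f"{title:18s} {'#' * n}")  # crude text version of a count plot
```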
Chapter 6
Reflection notes
• Machine learning involves computation on large data sets, so we built strong fundamental
knowledge such as computer architecture, algorithms, and data-structure complexity, while getting
deeper into the Python language and exploring new commands.
• Synthesize visual perception skills along with drawing skills to visually communicate ideas;
deconstruct designs to understand their motives and inspirations; learn to synthesize data and make
connections between data points using the available frameworks.
• Frame an appropriate, actionable problem statement with reference to user needs and contextual
alignments.
• Analyse different data sets and understand the concepts in a real-world setting, in order to implement
and make use of AI/ML in our upcoming careers.
• Train different models, confirm the requirements of the respective clients, and implement
a model according to those requirements.
Time management helps you allocate time for the most important tasks. When we follow a schedule,
we do not have to spend time and energy deciding what to do; instead we can focus on what matters and do
it well. The quality of the work will suffer if we are constantly worrying about meeting deadlines.
Time management helps us prioritize tasks, so we have enough time to focus on each project,
put in the effort, and produce high-quality outcomes.
Many software companies have to work against tight timelines. Proper time management allows
us to allocate enough time to meet each deadline. Planning ahead also keeps us calm and lets us think
freely and work more efficiently.
Confidence is the key to a positive personality. Exude confidence and a positive aura wherever you go.
Personality development teaches you to be calm and composed even in stressful situations. Never
overreact. Avoid finding faults in others. Learn to be a little broad-minded and flexible.
Chapter 7
Conclusion
In conclusion, this internship has been a very useful experience for me. I can safely say that my
understanding of the job environment has increased greatly. However, I do think that there are some
aspects of the job that I could have done better and that I need to work on. I have built more confidence
in the use of software tools. The two main things I learnt from my experience in this firm are time
management and being self-motivated. I have gained new knowledge and skills and met new people.
Usage of big data tools can improve operational efficiency. Data analysis helps companies make
informed decisions, create more efficient marketing strategies, improve customer experience, and
streamline operations, among many other things. Charts, maps, and other visual representations
of data help present findings in an easy-to-understand way; improving data visualisation
skills often means learning visualisation software.