Machine Learning Internship Report
Technofly Solutions & Consulting was founded in 2017 by a team with 14+ years of experience in the embedded systems domain. Technofly Solutions focuses globally on automotive embedded technologies, VLSI design, corporate training and consulting. To date we have delivered more than 15 corporate trainings for companies working in embedded automotive technologies in India, and we are also involved in the development of an OBD2 (On-Board Diagnostics) product for passenger cars for clients in India.
Technical Expertise
1. Microcontroller drivers
2. Model-based software development: modeling, simulation, auto-coding and reverse engineering
3. HVAC systems
4. Seat modules
Expertise in ASIC VLSI:
1. Verilog courses
2. Functional verification
Process Quality:
Technologies:
1. Embedded C, Python, IoT (PHP front end and MySQL back end), wireless – Bluetooth, GPS, GPRS, Wi-Fi
Management:
The management team is a mixture of technical and business development expertise with 14+ years of experience in the information technology field.
At present the company is developing a GPS tracking system for two-wheelers with its associated partners, is focusing more on corporate trainings in automotive embedded systems, and provides ASIC solutions involving design and verification IPs and functional verification of designs.
Company Profile:
TechnoFly was formed by professionals with formal qualifications and industrial experience
in the fields of embedded systems, real-time software, process control and industrial
electronics. The company is professionally managed and supported by qualified experienced
specialists and consultants with experience in embedded systems – including hardware and
software.
Initially, the company developed system software tools; these included C compilers for microcontrollers and other supporting tools such as an assembler, linker, simulator and integrated development environment. Later, Single Board Computers (SBCs) were developed and are still manufactured. These hardware boards support a broad range of processors, including 8-, 16- and 32-bit processors.
Since 2015, the company has also offered design and development services. This covers the complete spectrum of activities in the product development life cycle, from idea generation and requirements gathering to prototyping, testing and manufacturing. The company has so far provided product design services for various sectors, including industrial automation, instrumentation, automotive, consumer and defense.
Services of Technofly:
When you don’t have enough time, or the right skills on hand, you can supplement your team
with expert embedded engineers from Technofly, who can tackle your projects with
confidence, take out the risk, and hit your milestones. We’ll take as much ownership as you
want us to, and make sure your project is done right, on time and on budget. Go ahead, check
our reputation for on-time, on-budget delivery. We've earned it, time and again.
We can help you cut risk on embedded systems R&D, and accelerate time to market.
Technofly is your best choice for designing and developing embedded products from
concept to delivery. Our team is well-versed in product life cycles. We build complex
software systems for real-time environments and have unique expertise and core
competencies in the following domains: Wireless, Access and IOT/Cloud.
The department is currently developing and evaluating optimal solutions for maximizing network data rate in both cooperative and non-cooperative network user scenarios involving cognitive (secondary user, SU) and non-cognitive (primary user, PU) devices. The work is mainly concentrated on:
1. Resource management (spectrum management as well as power management)
2. Efficiency analysis
The department is actively involved in acquiring projects related to the latest technologies in low-power VLSI and the wireless domain; these projects are well thought out and their implementations are carried out in detail. Projects are mainly done on the Verilog and MATLAB (MathWorks) platforms, and may also use NS2, NetSim and Xilinx platforms as per the requirements of the project in progress.
The current internship involves the study, implementation and analysis of a high-speed, energy-efficient Carry Skip Adder (CSKA) with a hybrid model for achieving high speed and reducing power consumption.
1. Study requirements: low-power VLSI design and fundamentals of digital circuits
Technofly Solutions offers services in the areas of real-time embedded systems, low-power VLSI design, verification and software engineering. Its strong team of around 30 engineers is equipped with the right tools and the right processes to deliver the best. Technofly Solutions also offers customization of its products.
Real-Time Embedded Systems and Low-Power VLSI Design Department:
1. Design Services
2. Product Realization
Design Services:
The hardware design and development follow stringent life cycle guidelines laid out at Technofly Solutions while accomplishing the following –
Design Assurance
1. Signal integrity
2. Cross-talk
3. Power supply design, with due emphasis on low-power, battery-operated applications
4. Thermal analysis
5. Clock distribution
6. Timing analysis
Design optimization
1. Cost, size
PCB design
1. Pilot production
2. PCB assembly
Software Development
1. Board bring-up
ASIC
1. Design IPs
Skill Set
Tools
FPGA Tools
1. Back-end design: Xilinx ISE 9.1.03i, Actel Libero 6.0, Altera MAX+Plus II
Simulation:
1. Xilinx ModelSim SE
2. Altera MAX+Plus II
Coverage Analysis:
1. TransEDA VN-Cover
Debugging:
1. ChipScope
Hardware Tools:
1. Spectrum analyzers
2. Signal generators
3. Logic analyzers
4. Multifunction counters
5. Development tools and in-circuit emulators for all ADI DSPs and TI DSPs
Product Realization
1. Consumer Electronics
2. Automotive
3. Space
4. Defense
5. Simulation/Emulation
Our workgroup productivity software suite, Smart Works, consists of software applications that help you plan and track your projects, manage meetings and track various issues to closure. Smart Works is affordably priced and uses a TCP/IP-based client-server architecture at its core. The Smart Works server runs on all Windows platforms (Windows 95/98/NT/2000/ME). Efforts are under way to make Smart Works available on other platforms as well.
Following are the skill sets Technofly Solutions has garnered in the area of software:
1. Programming languages: C, C++, VC++, Java, C#, ASP.NET, PHP, Lex & Yacc, Perl, Python, Assembly language and Ada
Abstract
Heart disease is a major life-threatening disease that can cause either death or serious long-term disability. However, there is a lack of effective tools to discover hidden relationships and trends in e-health data. Medical diagnosis is a complicated task that plays a vital role in saving human lives, so it needs to be executed accurately and efficiently. An appropriate and accurate computer-based automated decision support system is required to reduce the cost of clinical tests. This report provides an insight into machine learning techniques used in diagnosing various diseases, and discusses various data mining classifiers that have emerged in recent years for efficient and effective disease diagnosis.
Using data mining techniques can reduce the number of tests that are required. In order to reduce deaths from heart disease, there has to be a quick and efficient detection technique. Decision Tree is one of the effective data mining methods used. This research compares different classification algorithms, seeking better performance in heart disease diagnosis. The algorithms tested are the SVM algorithm, the K-Nearest Neighbour algorithm and the Random Forest algorithm. The dataset consists of 303 instances and 76 attributes. Subsequently, the classification algorithm with the best potential will be suggested for use on sizeable data. The goal of this study is to extract hidden patterns relevant to heart disease by applying data mining techniques, and to predict the presence of heart disease in patients, where this presence is valued from no presence to likely presence.
The process of training and prediction involves the use of specialized algorithms. We feed the training data to an algorithm, and the algorithm uses this training data to make predictions on new test data.
There are various machine learning algorithms, such as Decision Trees, Naive Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbours, K-Means clustering, etc.
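A minimal sketch of this train-then-predict flow using scikit-learn (the tiny dataset below is made up purely for illustration; it is not the heart dataset used later in this report):

from sklearn.tree import DecisionTreeClassifier

# Training data: each row is a sample, each column a feature.
X_train = [[25, 120], [54, 160], [61, 150], [33, 118]]
y_train = [0, 1, 1, 0]              # labels the algorithm learns from

model = DecisionTreeClassifier()
model.fit(X_train, y_train)         # learn from the training data

X_test = [[45, 140]]                # a new, unseen sample
print(model.predict(X_test))        # predicted label for the test sample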
Machine Learning is the art (and science) of enabling machines to learn things which are not
explicitly programmed.
It involves as much mathematics as it involves computer science. Most often, people (myself included, at times) are so put off by the sheer number of mathematical equations and concepts in machine learning papers or articles that we ditch the entire article without reading it.
In this series, I will (possibly with the help of my friends, which will be duly noted in the
respective articles) talk about machine learning and deep learning math-free.
Purists might argue that learning is incomplete without the math behind it. I agree. But this is not intended to be a complete reference to machine learning concepts; this series intends to start a conversation, or encourage thought, in this direction.
Python is a widely used, general-purpose, high-level programming language. It was initially designed by Guido van Rossum in 1991 and is developed by the Python Software Foundation. It was mainly developed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code.
Python is a programming language that lets you work quickly and integrate systems more efficiently. Python was designed for readability, and has some similarities to the English language with influence from mathematics. Python uses new lines to complete a command, as opposed to other programming languages, which often use semicolons or parentheses.
The most recent major version of Python is Python 3, which we shall be using in this tutorial. Python can be used on a server to create web applications. It can be used alongside software to create workflows. It can connect to database systems and also read and modify files.
Python can be used to handle big data and perform complex mathematics. It can be
used for rapid prototyping, or for production ready software development. It works on
different platforms (Windows, Mac, Linux, Raspberry Pi, etc). It runs on an interpreter
system, meaning that code can be executed as soon as it is written. This means that
prototyping can be very quick.
2. Easy to learn: Learning Python is easy, as it is an expressive, high-level programming language, which means it is easy to understand and thus easy to learn.
3. Cross platform: Python is available and can run on various operating systems such as
Mac, Windows, Linux, Unix etc. This makes it a cross platform and portable language.
5. Large standard library: Python comes with a large standard library that has many handy modules and functions which we can use while writing code in Python.
6. Free: Python is free to download and use. This means you can download it for free and use
it in your application. See: Open Source Python License. Python is an example of a FLOSS
(Free/Libre Open Source Software), which means you can freely distribute copies of this
software, read its source code and modify it.
7. Supports exception handling: If you are new, you may wonder what an exception is. An exception is an event that can occur during program execution and can disrupt the normal flow of the program. Python supports exception handling, which means we can write less error-prone code and can handle the various scenarios that might cause an exception later on (see the sketch after this list).
8. Advanced features: Supports generators and list comprehensions. We will cover these
features later.
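A minimal illustration of exception handling in Python (a generic sketch, not tied to any particular application in this report):

try:
    result = 10 / int("0")               # int("0") is 0, so this raises ZeroDivisionError
except ZeroDivisionError:
    print("Cannot divide by zero")       # handle that specific error gracefully
except ValueError:
    print("Input was not a number")      # would catch e.g. int("abc")
finally:
    print("This runs whether or not an exception occurred")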
Applications of Python
1. Web development – Web frameworks like Django and Flask are based on Python. They help you write server-side code, which lets you manage databases, write back-end programming logic, map URLs, etc.
2. Machine learning – There are many machine learning applications written in Python. Machine learning is a way to write logic so that a machine can learn and solve a particular problem on its own. For example, product recommendation on websites like Amazon, Flipkart, eBay, etc. is a machine learning algorithm that recognises a user's interests. Face recognition and voice recognition on your phone are other examples of machine learning.
3. Data analysis – Data analysis and data visualisation in the form of charts can also be developed using Python.
4. Scripting – Scripting is writing small programs to automate simple tasks such as sending automated response emails. Such applications can also be written in the Python programming language.
7. Desktop applications – You can develop desktop applications in Python using libraries like Tkinter or Qt.
Python is increasingly being used as a scientific language. Matrix and vector manipulations are extremely important for scientific computations. Both NumPy and Pandas have emerged as essential libraries for any scientific computation, including machine learning, in Python due to their intuitive syntax and high-performance matrix computation capabilities.
In this post, we will provide an overview of the common functionality of NumPy and Pandas, and we will see the similarity of these libraries with existing toolboxes in R and MATLAB. This similarity and added flexibility have resulted in wide acceptance of Python in the scientific community lately. Topics covered are:
Overview of NumPy
Overview of Pandas
Using Matplotlib
Pandas:
Similar to NumPy, Pandas is one of the most widely used Python libraries in data science. It provides high-performance, easy-to-use data structures and data analysis tools.
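A minimal sketch of typical NumPy and Pandas usage (the column names and values below are made up purely for illustration):

>>> import numpy as np
>>> import pandas as pd
>>> a = np.array([1, 2, 3, 4])        # a NumPy array (vector)
>>> a.mean()                          # fast numeric operations on the whole array
2.5
>>> df = pd.DataFrame({'age': [63, 45, 58], 'chol': [233, 204, 283]})
>>> df['age'].max()                   # column-wise operations on a DataFrame
63
>>> df.describe()                     # summary statistics for each column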
Matplotlib:
>>> import matplotlib.pyplot as plt
>>> x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37, 18, 49, 50, 100]
>>> num_bins = 5
>>> plt.hist(x, num_bins, facecolor='blue')
>>> plt.show()
Objectives:
The Heart Disease Prediction application is an end-user support and online consultation project. Here, we propose a web application that allows users to get instant guidance on their heart disease through an intelligent online system. The application is fed with various details and the heart disease associated with those details. The application allows users to share their heart-related issues. It then processes user-specific details to check for various illnesses that could be associated with them. Here we use some intelligent data mining techniques to guess the most accurate illness that could be associated with the patient's details. Based on the result, the user can contact a doctor accordingly for further treatment. The system also allows the user to view doctors' details. The system can be used for free heart disease consultation online.
Heart disease is the leading cause of death in the world over the past 10 years
(World Health Organization 2007). The European Public Health Alliance reported that
heart attacks, strokes and other circulatory diseases account for 41% of all deaths
(European Public Health Alliance 2010). Several different symptoms are associated with heart disease, which makes it difficult to diagnose quickly and accurately.
Methodology:
1. SVM (Support Vector Machine)
SVM is a classification method in which we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. For example, if we only had two features, such as the height and hair length of an individual, we would first plot these two variables in two-dimensional space, where each point has two coordinates (these coordinates are known as support vectors).
Now, we will find some line that splits the data between the two differently classified groups
of data. This will be the line such that the distances from the closest point in each of the two
groups will be farthest away.
For instance, suppose the orange frontier is closest to the blue circles, and the closest blue circle is 2 units away from that frontier. Once we have these distances for all the frontiers, we simply choose the frontier with the maximum distance from the closest support vector. Out of three candidate frontiers, we would pick the one that is farthest from its nearest support vector (say, 15 units away).
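A minimal sketch of training an SVM classifier with scikit-learn (the two-feature data below is made up for illustration and is not the heart dataset):

from sklearn.svm import SVC

# Two made-up features per sample, e.g. [height in cm, hair length in cm].
X = [[180, 2], [175, 4], [160, 35], [155, 40]]
y = [0, 0, 1, 1]                      # two classes to separate

clf = SVC(kernel='linear')            # linear SVM: finds the maximum-margin frontier
clf.fit(X, y)
print(clf.predict([[170, 30]]))       # classify a new point
print(clf.support_vectors_)           # the training points that define the margin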
2. Decision Tree
This is one of my favorite algorithms and I use it quite frequently. It is a type of supervised learning algorithm that is mostly used for classification problems. Surprisingly, it works for both categorical and continuous dependent variables. In this algorithm, we split the population into two or more homogeneous sets. This is done based on the most significant attributes/independent variables, to make the groups as distinct as possible. For more details, you can read: Decision Tree Simplified.
For example, a population can be classified into four different groups based on multiple attributes to identify 'whether they will play or not'. To split the population into such distinct groups, the algorithm uses various techniques like Gini, information gain, chi-square and entropy.
The best way to understand how a decision tree works is to play Jezzball – a classic game from Microsoft. Essentially, you have a room with moving balls and you need to create walls such that the maximum area gets cleared off without the balls.
So, every time you split the room with a wall, you are trying to create two different populations within the same room. Decision trees work in a very similar fashion, by dividing a population into groups that are as different as possible.
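A minimal sketch of a decision tree classifier in scikit-learn (the features, encoding and data are made up for illustration):

from sklearn.tree import DecisionTreeClassifier

# Made-up samples: [outlook (0 = sunny, 1 = rainy), temperature in degrees C]
X = [[0, 30], [0, 22], [1, 18], [1, 25]]
y = ['play', 'play', 'no play', 'no play']

tree = DecisionTreeClassifier(criterion='gini')   # 'entropy' is the other common criterion
tree.fit(X, y)
print(tree.predict([[0, 27]]))                    # predict the class for a new day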
3. kNN (k-Nearest Neighbours)
kNN can be used for both classification and regression problems. However, it is more widely used for classification problems in industry. k-Nearest Neighbours is a simple algorithm that stores all available cases and classifies new cases by a majority vote of their k nearest neighbours. The case is assigned to the class most common amongst its k nearest neighbours, as measured by a distance function.
These distance functions can be the Euclidean, Manhattan, Minkowski and Hamming distances. The first three are used for continuous variables and the fourth (Hamming) for categorical variables. If k = 1, then the case is simply assigned to the class of its nearest neighbour. At times, choosing k turns out to be a challenge while performing kNN modelling.
kNN can easily be mapped to our real lives. If you want to learn about a person of whom you have no information, you might like to find out about their close friends and the circles they move in, and gain access to their information that way.
It also helps to work more on the pre-processing stage (for example, outlier and noise removal) before applying kNN.
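A minimal sketch of kNN classification with scikit-learn (made-up data; the value of k and the distance metric are arbitrary illustrative choices):

from sklearn.neighbors import KNeighborsClassifier

X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]   # made-up feature vectors
y = [0, 0, 1, 1]

# k = 3 neighbours; metric='minkowski' with p=2 is the Euclidean distance.
knn = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)
knn.fit(X, y)
print(knn.predict([[1.2, 1.9]]))   # majority vote among the 3 nearest training points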
4. Random Forest
Random Forest is a trademarked term for an ensemble of decision trees. In a Random Forest, we have a collection of decision trees (hence the name 'forest'). To classify a new object based on its attributes, each tree gives a classification, and we say the tree 'votes' for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
If the number of cases in the training set is N, then a sample of N cases is taken at random, but with replacement. This sample will be the training set for growing the tree.
If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m variables is used to split the node. The value of m is held constant while the forest grows.
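A minimal sketch of a random forest in scikit-learn (the number of trees, the max_features setting and the data are illustrative choices, not values taken from this report):

from sklearn.ensemble import RandomForestClassifier

X = [[63, 1, 233], [45, 0, 204], [58, 1, 283], [40, 0, 199]]   # made-up samples
y = [1, 0, 1, 0]

# 100 trees; at each split only sqrt(M) randomly chosen features are considered.
forest = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=0)
forest.fit(X, y)
print(forest.predict([[50, 1, 240]]))   # each tree votes; the majority class wins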
Task Performed:
Dataset Description
We performed computer simulation on one dataset, a heart disease dataset available from the UCI Machine Learning Repository [10]. The dataset contains 303 samples with 13 input features and 1 output feature. The input features describe clinical and demographic attributes of the patients, and the output feature is the decision class indicating the presence or absence of heart disease. The dataset contains features expressed on nominal, ordinal, or interval scales. A list of all these features is given in the table below.
1. Age – age in years
2. Sex – sex of the patient
3. Cp – chest pain type
4. Trestbps – resting blood pressure
5. Chol – serum cholesterol
6. Fbs – fasting blood sugar
7. Restecg – resting electrocardiographic results
8. Thalach – maximum heart rate achieved
9. Exang – exercise-induced angina
10. Oldpeak – ST depression induced by exercise relative to rest
11. Slope – slope of the peak exercise ST segment
12. Ca – number of major vessels coloured by fluoroscopy
13. Thal – thalassemia test result
14. Num – diagnosis of heart disease (the output class)
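A minimal sketch of how such a dataset could be loaded and the four classifiers compared (the file name heart.csv and the column layout are assumptions made for illustration; adjust them to the actual data files used):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Assumed file and layout: 13 feature columns plus a 'num' target column.
df = pd.read_csv("heart.csv")
X = df.drop(columns=["num"])
y = df["num"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))   # fraction of correct test predictions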
Conclusion:
The project involved analysis of the heart disease patient dataset with proper data processing.
Then, 4 models were trained and tested with maximum scores as follows:
References:
[1] C. S. Dangare and S. S. Apte, "Improved study of heart disease prediction system using data mining classification techniques," International Journal of Computer Applications, vol. 47, no. 10, pp. 44–48, 2012.
[2] S. Palaniappan and R. Awang, "Intelligent heart disease prediction system using data mining techniques," pp. 108–115, 2008.
[3] Y. E. Shao, C.-D. Hou, and C.-C. Chiu, "Hybrid intelligent modelling schemes for heart disease classification," Applied Soft Computing, vol. 14, pp. 47–52, 2014.
[4] M. Shouman, T. Turner, and R. Stocker, "Using data mining techniques in heart disease diagnosis and treatment," pp. 173–177, 2012.
[6] J. Nahar, T. Imam, K. S. Tickle, and Y.-P. P. Chen, "Computational intelligence for heart disease diagnosis: a medical knowledge driven approach," Expert Systems with Applications, vol. 40, no. 1, pp. 96–104, 2013.
[7] Y. Xing, J. Wang, Z. Zhao, and Y. Gao, "Combination data mining methods with new medical data to predicting outcome of coronary heart disease," in Convergence Information Technology, 2007 International Conference on. IEEE, 2007, pp. 868–872.
[9] Y. E. Shao, C.-D. Hou, and C.-C. Chiu, "Hybrid intelligent modelling schemes for heart disease classification," Applied Soft Computing, vol. 14, pp. 47–52, 2014.