Instructions
At Sahaj, we strive to build high-quality software that has strong aesthetics (is
readable and maintainable), has an extensive safety net to safeguard quality, handles
errors gracefully, and works as expected without breaking down.
We are looking for people who combine data science knowledge with the pragmatism
needed to deliver implementations to production business environments. The data
scientist should understand the domain and build models that can deal with
real-life constraints.
The following is a list of things to keep in mind before you submit your solution, to
ensure that your model focuses on the attributes we are looking for:
Problem Statement
Your goal is to build a model to predict the outcome of a football match, given data for the past 9
years. All the football matches from 2009 to 2017 are covered in the dataset.
Based on the dataset provided, your goal should be to come up with an optimal solution to
predict whether the home team will win, lose, or draw a game (column name FTR) for the
2017-18 season.
We are looking to compare multiple approaches and choose the one that performs the best.
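When comparing approaches, it helps to have a trivial reference point. The following is a minimal sketch of a majority-class baseline, not a recommended model. It assumes FTR takes the values H (home win), D (draw) and A (away win), which is the football-data.co.uk convention and is not stated in this document; the label lists are made-up illustration data.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return the most frequent FTR label seen in the training data."""
    most_common, _count = Counter(train_labels).most_common(1)[0]
    return most_common

def accuracy(predicted, actual):
    """Fraction of matches whose predicted label equals the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical labels, assuming H = home win, D = draw, A = away win.
train_ftr = ["H", "H", "D", "A", "H", "A"]
test_ftr = ["H", "A", "D", "H"]

guess = majority_baseline(train_ftr)
baseline_predictions = [guess] * len(test_ftr)
baseline_accuracy = accuracy(baseline_predictions, test_ftr)
print(guess, baseline_accuracy)
```

Any candidate model can then be judged by how far it improves on this baseline under the same evaluation split.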
Use of external data sources is encouraged, with some recommendations. Of course, the actual
results of these matches can be easily downloaded from the web. However, this problem
statement is intended for fun and learning. You can choose to enrich the dataset with
other publicly available European football league datasets, e.g. http://www.football-data.co.uk/
or http://football-data.mx-api.enetscores.com/
The training and test datasets are not randomly sampled.
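Since the split is not random, one plausible way to mirror it locally is a chronological holdout: train on earlier matches and validate on later ones. The sketch below is only an illustration under stated assumptions. The column names Date, HomeTeam and AwayTeam, the cutoff date, and the sample rows are all hypothetical; only FTR is named in this document.

```python
from datetime import date

# Hypothetical rows; only the FTR column name comes from the problem statement.
matches = [
    {"Date": date(2016, 8, 13), "HomeTeam": "A", "AwayTeam": "B", "FTR": "H"},
    {"Date": date(2017, 9, 9), "HomeTeam": "B", "AwayTeam": "C", "FTR": "D"},
    {"Date": date(2017, 3, 4), "HomeTeam": "C", "AwayTeam": "A", "FTR": "A"},
    {"Date": date(2018, 1, 20), "HomeTeam": "A", "AwayTeam": "C", "FTR": "H"},
]

def chronological_split(rows, cutoff):
    """Split rows by date: strictly before cutoff for training, the rest held out."""
    ordered = sorted(rows, key=lambda r: r["Date"])
    train = [r for r in ordered if r["Date"] < cutoff]
    holdout = [r for r in ordered if r["Date"] >= cutoff]
    return train, holdout

# Assumed cutoff: roughly the start of the 2017-18 season.
train, holdout = chronological_split(matches, cutoff=date(2017, 8, 1))
print(len(train), len(holdout))
```

Keeping every training match earlier than every validation match avoids leaking future results into the features, which a random shuffle would allow.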
About the Data
The data is collected from http://www.football-data.co.uk/ and covers several leagues.
Column Details
Name Description
Expectations from the submission
1. Approach to the dataset.
2. Choice of the model(s).
3. Model validation approach (used locally).
4. Choice of features used for the model creation.