EE2211 CheatSheet
Lecture 7
Supervised learning: given feature(s) x, we want to predict a target y.
Lecture 10
Useful Methods for EE2211:
scipy.integrate.quad(function, lower_limit, upper_limit)
• numerically integrates the function between the two limits; returns (value, estimated_error)
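A quick check (integrating x^2 from 0 to 1):
from scipy.integrate import quad
value, error = quad(lambda x: x**2, 0, 1)  # quad returns (value, estimated_error)
print(value)  # ~0.3333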
scipy.stats.norm
• the normal (Gaussian) distribution object; provides .pdf, .cdf and .ppf among other methods
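For example:
from scipy.stats import norm
print(norm.pdf(0))      # density of N(0,1) at 0, ~0.3989
print(norm.cdf(1.96))   # P(X <= 1.96), ~0.975
print(norm.ppf(0.975))  # inverse CDF, ~1.96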
collections.defaultdict
• Allows you to group a sequence of key-value pairs into a dictionary of lists
• This means that if a key appears multiple times, its values are appended to a list, and that list
becomes the value of the key
• You can choose what you want the key to be
• You first have to instantiate it:
• d = defaultdict(default_factory)
• default_factory is a callable such as set, tuple or list, depending on the type you want the
values arranged in
• ‘d’ is still a dictionary and the usual dict methods apply to it
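A minimal sketch of the grouping idiom:
from collections import defaultdict
pairs = [('a', 1), ('b', 2), ('a', 3)]
d = defaultdict(list)     # missing keys start as an empty list
for key, value in pairs:
    d[key].append(value)  # group values under each key
print(dict(d))            # {'a': [1, 3], 'b': [2]}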
matplotlib.pyplot
• This module lets you plot graphs
• .plot(x_axis_values, y_axis_values, marker_characters) draws the data
• .xlabel(label) labels the x-axis
• .ylabel(label) labels the y-axis
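E.g.:
import matplotlib.pyplot as plt
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]
plt.plot(x, y, 'o-')   # circles joined by lines
plt.xlabel('x')
plt.ylabel('y = x^2')
plt.show()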
numpy.ones(shape)
● takes in the shape of the new array, e.g. (3, 1)
● returns an array of that shape, filled with ones
numpy.sign(array)
● takes in an array and returns the sign (-1, 0 or +1) of each element
numpy.argmax(array, axis)
● takes in array to find the index of the maximum elements
● by default, array is flattened
● if axis=0, the maximum element in each column is found and its row index is returned
● if axis=1, the maximum element in each row is found and its column index is returned
numpy.argsort(array)
● returns the indices that would sort the array: applying them in that order to the original
list yields the sorted list
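A quick sketch exercising the four helpers above:
import numpy as np
print(np.ones((3, 1)))        # 3x1 array of ones
print(np.sign([-2.5, 0, 7]))  # [-1.  0.  1.]
a = np.array([[1, 9], [8, 2]])
print(np.argmax(a))           # 1 (index of 9 in the flattened array)
print(np.argmax(a, axis=0))   # [1 0], row index of each column's max
print(np.argsort([3, 1, 2]))  # [1 2 0]; applying it sorts the list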
numpy.array(list)
• Takes in a list and stores it as an array, which can be treated as a matrix
• You can access matrix elements with array indexing
• E.g: nparray[:,0] - returns elements in the first column
• nparray[[0,1,2], [0,1,0]] - returns the elements in the [0,0], [1,1] and [2,0] positions
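For example:
import numpy as np
m = np.array([[1, 2], [3, 4], [5, 6]])
print(m[:, 0])                  # first column: [1 3 5]
print(m[[0, 1, 2], [0, 1, 0]])  # elements at (0,0), (1,1), (2,0): [1 4 5]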
numpy.where(condition[,x,y])
● returns elements chosen from x or y depending on condition
● condition: where True, yield x, otherwise yield y
● returns an array with elements from x where condition is True and elements from y elsewhere
● e.g. with a = np.arange(10), np.where(a < 5, a, 10*a) gives
array([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90])
pandas.read_csv(file_path, delimiter)
• Allows you to read in comma separated values
• the delimiter/separator defaults to a comma, but another one can be specified
CSV dataframes
• When you read in CSV data, it is put into a DataFrame
• It's basically a miniature Excel-type table
• Its fields can be accessed with dataframe[‘field_name’]
• To access rows and columns by label, use dataframe.loc[‘what_you_want_to_find’]
• .index returns the row labels, usually the left-most column
• DataFrames are made up of an index and columns
Dataframe.describe()
• Gives all the descriptive statistics of the dataframe
Dataframe.replace(original, new)
• Replace the old values with new ones
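A minimal sketch tying the pandas calls above together (the file name and column name here are hypothetical placeholders):
import pandas as pd
df = pd.read_csv('heights.csv')   # comma-separated by default
print(df['height'])               # access a field by name
print(df.loc[0])                  # access a row by label
print(df.describe())              # descriptive statistics
df = df.replace('unknown', 'NA')  # swap old values for new ones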
sklearn.preprocessing.scale(dataframe/ndarray)
• Sometimes we need to scale our values into a common range
• Otherwise the machine learning model will be biased towards the larger values
• This one function does it all for us
• The scaled data has zero mean and unit variance
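For example:
import numpy as np
from sklearn.preprocessing import scale
X = np.array([[1., 100.], [2., 200.], [3., 300.]])
Xs = scale(X)           # standardize each column
print(Xs.mean(axis=0))  # ~[0. 0.]
print(Xs.std(axis=0))   # [1. 1.]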
sklearn.metrics.mean_squared_error(y_true, y_pred)
● returns the mean squared error as a float (or an array of per-output values when multioutput='raw_values')
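E.g.:
from sklearn.metrics import mean_squared_error
print(mean_squared_error([1, 2, 3], [1.1, 1.9, 3.2]))  # 0.02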
sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100, ), activation='relu', *,
solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001,
power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False,
warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False,
validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10,
max_fun=15000)
● a multilayer perceptron classifier
● hidden_layer_sizes: the i-th element gives the number of neurons in the i-th hidden layer
(length = total number of layers - 2, since the input and output layers are excluded)
● activation: to specify the activation function
● solver: the solver for weight optimization
● alpha: L2 penalty parameter
● batch_size: size of minibatches (not used by the ‘lbfgs’ solver)
● learning_rate: learning-rate schedule (only used with the ‘sgd’ solver)
● max_iter: the maximum number of iterations
● .loss_: the current loss computed with the loss function
● .classes_: class labels for each output
● .fit(X, y): fit the model to data matrix X and target y
● .predict(X): predict the class labels for samples in X
sklearn.neural_network.MLPRegressor()
● same as above
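A minimal sketch on the iris dataset (hidden_layer_sizes and max_iter here are chosen just for illustration):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(X_tr, y_tr)           # train on the training split
print(mlp.predict(X_te[:5]))  # predicted class labels
print(mlp.classes_)           # [0 1 2]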
bias_variance_decomp(estimator, X_train, y_train, X_test, y_test, loss='0-1_loss',
num_rounds=200, random_seed=None)
● from the mlxtend.evaluate module
● estimator: a classifier or regressor object or class implementing a fit/predict method similar
to the scikit-learn API
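A sketch assuming the mlxtend package is installed (the toy data via make_regression is chosen just for illustration):
from mlxtend.evaluate import bias_variance_decomp
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
avg_loss, bias, var = bias_variance_decomp(
    DecisionTreeRegressor(random_state=0), X_tr, y_tr, X_te, y_te,
    loss='mse', num_rounds=50, random_seed=1)
print(avg_loss, bias, var)  # for mse: expected loss = bias + variance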
sklearn.ensemble.RandomForestRegressor(n_estimators=100, *, criterion='mse',
max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0,
max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None,
verbose=0, warm_start=False, ccp_alpha=0.0, max_samples=None)
● n_estimators: the number of trees in the forest
● max_depth: the maximum depth of the trees, default None
● criterion: the function to measure the quality of the split. “mse” or “mae”
● max_features: the number of features to consider when looking for the best split: “auto”,
“sqrt”, “log2”
● a minimum impurity decrease required for a split can be specified (min_impurity_decrease)
● .fit(X, y): build a forest from the training set X, y (reshape X if necessary; the trees expect a
2-D array)
● .predict(X): predicts the regression target values for X
● .score(X,y): returns the coefficient of determination R^2 of the prediction
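A minimal sketch (toy 1-D data, chosen just for illustration):
import numpy as np
from sklearn.ensemble import RandomForestRegressor
X = np.arange(10).reshape(-1, 1)  # reshape: the trees expect a 2-D array
y = np.sin(X).ravel()
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)
print(rf.predict([[2.5]]))        # predicted target for a new sample
print(rf.score(X, y))             # R^2 on the training data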
numpy.matlib.repmat(a,m,n)
● tiles the matrix a in an m-by-n arrangement
● m, n: the number of times a is repeated along the rows and columns respectively
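E.g. (numpy.matlib is a legacy module; np.tile gives the same tiling):
import numpy as np
from numpy.matlib import repmat
a = np.array([[1, 2]])
print(repmat(a, 2, 3))     # a repeated 2x along rows, 3x along columns
print(np.tile(a, (2, 3)))  # same result without numpy.matlib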
If your array has weights (i.e. your distribution is not uniform), you can use
numpy.average(array, weights=weights) to calculate the expected value of such a distribution.
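For example:
import numpy as np
values = np.array([1, 2, 3])
probs = np.array([0.2, 0.3, 0.5])         # the distribution's weights
print(np.average(values, weights=probs))  # expected value: 2.3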
numpy.dot(a, b)
● for taking matrix multiplication of matrix a and matrix b
● returns the computed matrix product
● input must be of numpy.array type
numpy.linalg.inv(a)
● returns the inverse of matrix a
● input must be a square, invertible array of numpy.array type
numpy.linalg.pinv(a)
● takes in a matrix and returns its pseudo inverse
● for an overdetermined system, multiply the pseudo-inverse of X with y to get the least-squares
solution: w = pinv(X) @ y
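E.g.:
import numpy as np
# overdetermined system X w = y (more equations than unknowns)
X = np.array([[1., 1.], [1., 2.], [1., 3.]])
y = np.array([1., 2., 2.])
w = np.linalg.pinv(X) @ y  # least-squares solution
print(w)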
numpy.unique(array, axis)
● this method returns the unique elements of the input array
● the result is sorted
● axis is the axis to work on: if axis=None, the input array is flattened; if 0, it looks for unique
rows; if 1, it looks for unique columns
numpy.reshape(a, newshape) / a.reshape(newshape)
● returns the array rearranged into the new shape (rows and columns)
● a is the array
● if newshape is an integer, it returns a 1-D array of that length; it can also be a tuple
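E.g.:
import numpy as np
a = np.arange(6)
print(np.reshape(a, (2, 3)))  # 2 rows, 3 columns
print(a.reshape(-1))          # back to a flat 1-D array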
Mean squared error: the average squared difference between the estimated values and the true
values, MSE = (1/n) Σ (y_i - ŷ_i)^2.
numpy.random.randn(d0,d1….)
● returns a sample from the “standard normal” distribution
● d0,d1 the dimensions of the returned array
● to sample from N(mu, sigma^2), use: sigma * np.random.randn(...) + mu
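For example:
import numpy as np
mu, sigma = 5.0, 2.0
samples = sigma * np.random.randn(3, 2) + mu  # 3x2 draws from N(5, 4)
print(samples)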
numpy.c_[]
● translates slice objects to concatenation along the second axis
numpy.column_stack(tup)
● stacks 1-D arrays as columns into 2-D array
● tup: tuple of arrays to stack, must have the same first dimension
● returns the stacked 2-D array
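For example:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.c_[a, b])              # columns a and b side by side
print(np.column_stack((a, b)))  # same 3x2 result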
In linear regression, our goal is to find a line that minimizes the sum of squared distances
(residuals) to the data points.
Confusion Matrix
A table comparing predicted class labels against the true labels; diagonal entries count correct
predictions, off-diagonal entries count misclassifications.
Leaf Node
Leaf nodes are the nodes of the tree that have no additional nodes coming off them. They
don't split the data any further
Decision trees are examples of models with low bias and high variance. The tree makes
almost no assumptions about the target function, but it is highly susceptible to variance in the
data. Ensemble algorithms, such as bootstrap aggregation (bagging) and random forests,
aim to reduce variance at a small cost in bias relative to single decision trees.
Relations:
1. D
2. (1) 0.125, (2) 0.649
3. False
4. B
5.
6. False
7. False
8.
9.
10. a) w0 = 0, w1 = 0.240 and w2 = -0.420
b) girl
c) 1.862 m
11. C
12.
13. False
14.
15.
16.
17. D
18. a) w0 ≈ 0.915 and w1 ≈ 1.139
b) w0 = 2.935 and w1 = 1.471
c) ynew = 3.638
19. False
20. False