Machine Learning Philosophy

ifeelfree
3 min read · Apr 5, 2021


Part 1: Roles of data scientist, data engineer, and machine learning engineer

  • Data scientist: focuses on data collection, interpretation, and processing (feature engineering, model building).
  • Data engineer: focuses on infrastructure and workflow.
  • Machine learning engineer: builds production systems that handle updating models, model versioning, and serving predictions to end users.

Part 2: General consensus

## 2.1 What is a good algorithm?

sophisticated algorithm < simple learning algorithm + good training data.

## 2.2 How do practical applications differ from research papers?

The message to take away, especially in practical applications, is that we want both better algorithms and better training data. It’s fine to look for better algorithms, but make sure you’re not focusing on them to the exclusion of the easy wins from getting more or better training data.

Part 3: Deep Learning

## 3.1 The hardest part of deep learning

1. The hardest part of deep learning is artisanal:

* How do you know if you have enough data?

* Is it in the right format?

* Is your model training properly, and if it isn’t, what should you do about it? That is why we believe in learning by doing (a minimal diagnostics sketch follows this list).

2. Black-box, non-interpretable solutions
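To make the “is my model training properly” question concrete, here is a minimal sketch, assuming a PyTorch-style setup; the model, data loaders, and loss function are hypothetical placeholders you would supply. Tracking train and validation loss side by side is the usual first diagnostic:

```python
import torch

def run_epoch(model, loader, loss_fn, optimizer=None):
    """Average loss over one pass; updates weights only if an optimizer is given."""
    total, n = 0.0, 0
    for x, y in loader:
        loss = loss_fn(model(x), y)
        if optimizer is not None:
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        total += loss.item() * len(x)
        n += len(x)
    return total / n

def fit(model, train_loader, val_loader, loss_fn, optimizer, epochs=10):
    history = []
    for epoch in range(epochs):
        model.train()
        train_loss = run_epoch(model, train_loader, loss_fn, optimizer)
        model.eval()
        with torch.no_grad():
            val_loss = run_epoch(model, val_loader, loss_fn)
        history.append((train_loss, val_loss))
        print(f"epoch {epoch}: train {train_loss:.4f}, val {val_loss:.4f}")
    # Rules of thumb: train loss not falling -> check the data and learning rate;
    # val loss rising while train loss keeps falling -> overfitting, so get more
    # data or regularize.
    return history
```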

## 3.2 Should we focus on architecture?

Most of the time, however, picking an architecture isn’t a very important part of the deep learning process. It’s something that academics love to talk about, but in practice it is unlikely to be something you need to spend much time on. There are some standard architectures that work most of the time.
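As an illustration (not from the original text), a typical workflow simply reuses a stock architecture, here a ResNet from torchvision, assumed installed, and only replaces the classification head for the task at hand:

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical; set to your own task

# Standard, well-tested backbone with pretrained weights
model = models.resnet18(pretrained=True)

# Swap the final fully connected layer for our number of classes
model.fc = nn.Linear(model.fc.in_features, num_classes)
```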

## 3.3 Neural network essentials

* large dataset

* varied dataset

* real dataset

## 3.4 Why should we use deeper models?

We already know that a single nonlinearity with two linear layers is enough to approximate any function. So why would we use deeper models? The reason is performance. With a deeper model (that is, one with more layers) we do not need as many parameters; it turns out that we can use smaller matrices, with more layers, and get better results than we would with larger matrices and fewer layers.
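A quick parameter count makes this concrete. In the sketch below the layer sizes are illustrative assumptions, not from the text: a shallow, very wide MLP next to a deeper, narrower one of comparable expressive power.

```python
import torch.nn as nn

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# One very wide hidden layer
shallow = nn.Sequential(
    nn.Linear(784, 4096), nn.ReLU(),
    nn.Linear(4096, 10),
)

# Four narrow hidden layers
deep = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

print(n_params(shallow))  # ~3.26M parameters
print(n_params(deep))     # ~0.34M parameters: ~10x fewer, more layers
```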

## 3.5 Why should we use bigger neural networks?

The takeaway is that you should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting.
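A minimal sketch of what that looks like in practice, assuming PyTorch and with purely illustrative sizes and hyperparameters: keep the network big and pair it with dropout plus weight decay, rather than shrinking it.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 1024), nn.ReLU(), nn.Dropout(p=0.5),   # dropout regularizes
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(1024, 10),
)

# weight_decay adds L2 regularization to every parameter update
optimizer = optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
```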

## 3.6 What are the shortcomings of CNNs?

Convolutional inductive biases, though, lack a global understanding of the image itself. They are great at extracting visual features, but they cannot model the dependencies between them.

For example, a convolutional layer of a model trained to recognize faces can encode whether the features “eyes”, “nose”, or “mouth” are present in the input image, but those representations will not capture relationships such as “eyes above nose” or “mouth below nose”, because each convolutional kernel is not large enough to process multiple of these features at once.

Large receptive fields are required to track long-range dependencies within an image, which in practice means using large kernels or long sequences of convolutional layers, at the cost of efficiency and of making the model extremely complex, or even impossible to train.
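A back-of-the-envelope calculation shows why. The sketch below uses the standard formula for stacked stride-1 convolutions: each extra 3×3 layer grows the receptive field by only 2 pixels.

```python
def receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked stride-1 convolutions with a given kernel size."""
    return 1 + n_layers * (kernel - 1)

for n in (1, 5, 10, 50):
    print(n, "layers ->", receptive_field(n), "pixels")
# 1 -> 3, 5 -> 11, 10 -> 21, 50 -> 101: even a 224-pixel image needs a very
# deep stack (or pooling / larger kernels) before any single unit sees all of it.
```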

Part 4: Traditional methods

## 4.1 What are the assumptions of common machine learning models?

If you plan to use Regression or any of the Generalized Linear Models (GLM), there are model assumptions you must validate before building your model.

For SVM or tree-based models, there aren’t any model assumptions to validate.
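As a hedged illustration of what “validate the assumptions” can look like in practice (synthetic data; assumes statsmodels and scipy are installed), here is a sketch that fits an OLS regression and runs two standard diagnostics: residual normality and constant error variance.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Synthetic stand-in data; replace with your own X and y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=200)

exog = sm.add_constant(X)
model = sm.OLS(y, exog).fit()

# Normality of residuals (Shapiro-Wilk): small p-value -> assumption violated
stat, p = stats.shapiro(model.resid)
print("shapiro p =", p)

# Homoscedasticity (Breusch-Pagan): small p-value -> heteroscedastic errors
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(model.resid, exog)
print("breusch-pagan p =", lm_p)
```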

See [Back to Basics: Assumptions of Common Machine Learning Models](https://towardsdatascience.com/back-to-basics-assumptions-of-common-machine-learning-models-e43c02325535)

Part 5: Data

- “Torture the data long enough, and it will confess”

It means that if you look at the data from enough different perspectives and ask enough questions, you will almost invariably find a statistically significant effect.
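A small simulation makes the point (pure synthetic noise, illustrative only): test unrelated noise enough times and roughly 5% of the tests come out “significant” at p < 0.05 by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests, hits = 100, 0
for _ in range(n_tests):
    a = rng.normal(size=50)   # both samples drawn from the SAME distribution
    b = rng.normal(size=50)
    _, p = stats.ttest_ind(a, b)
    hits += p < 0.05
print(f"{hits}/{n_tests} spurious 'significant' results")  # expect around 5
```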

- “Garbage in, garbage out”
