GUIDEDECK · for engineers meeting ML for the first time

Machine Learning
Fundamentals
patterns, not rules.

A 34-minute working session on the ideas under every ML system: how a model learns from data instead of hand-written rules, the main families of algorithms, how you train and evaluate one — and the honest question of when ML is the wrong tool.

~34 MIN · BEGINNER → INTERMEDIATE · LANGUAGE-AGNOSTIC + PYTHON
01 · What machine learning is 4 min

Learn the rules from data —
don't write them by hand.

In ordinary programming you write the rules and the computer follows them. Machine learning flips that around: you feed it examples — inputs and the answers you want — and it figures out the rule that connects them. The output is a model you can run on new, unseen inputs.

Machine learning: programs that improve at a task by learning patterns from data rather than following rules a human typed out. You supply labeled or raw examples; a training process searches for a model that maps inputs to outputs; then you use that model to predict on data it has never seen. The whole bet is generalization: that patterns learned from the past hold up in the future.
[Diagram] TRADITIONAL PROGRAMMING: rules + data → program → answers. MACHINE LEARNING: data + answers → training → model (rules).

Same pieces, reversed arrows: classic code consumes rules; ML produces them from data plus answers.

The mental switch

  • You stop asking "what rule should I write?" and start asking "what data shows the pattern?"
  • The model is only as good as its data — garbage in, garbage out is not a slogan here, it is the dominant failure mode.
  • It is statistical, not exact. A model gives likely answers with error bars, not guarantees.
  • Success is measured on data it has never seen, not on the examples it trained on.
Hand-written rules — brittle
    # hand-written rules: endless edge cases
    def is_spam(email):
        if "free money" in email:
            return True
        if email.count("!") > 5:
            return True
        return False
    # spammers adapt → you patch forever
Learned model — adapts
    # learn the pattern from labeled examples
    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression()
    model.fit(X_train, y_train)   # X = numeric features, y = spam?
    model.predict(X_new)          # X_new = features of unseen emails, not raw text
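To make "the rule comes from the data" concrete without any library, here is a minimal sketch in plain Python. It learns the exclamation-mark cutoff from labeled examples instead of hard-coding `> 5`. The toy data and the names `learn_threshold` and `predict_spam` are hypothetical, and a real spam model would use far more than one feature.

```python
# Hypothetical toy data: (number of "!" in the email, is it spam?).
examples = [(0, False), (1, False), (2, False), (3, False),
            (4, True), (6, True), (7, True), (9, True)]


def learn_threshold(rows):
    """Pick the cutoff that misclassifies the fewest training rows.

    The learned rule is: predict spam when count > threshold.
    """
    candidates = sorted({count for count, _ in rows})
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in candidates:
        errors = sum((count > threshold) != label for count, label in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


threshold = learn_threshold(examples)


def predict_spam(exclamation_count):
    # Same shape as the hand-written rule, but the cutoff came from data.
    return exclamation_count > threshold
```

If spammers change their habits, you relabel fresh examples and rerun `learn_threshold`; nobody edits the `if` by hand.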

The honest question: when ML is the wrong tool

ML is not a default. It adds data pipelines, training cost, drift, and opacity. Reach for plain rules or a lookup table when:

  • The rule is known and stable — tax brackets, password length, business policy. Just write the if.
  • You have little or no data — no examples means nothing to learn from.
  • Mistakes are unacceptable or must be explained — a model gives probabilities, not proofs; some decisions need an auditable rule.
  • A simple heuristic already works. "Flag orders over $10k" may beat a model nobody can debug at 2 a.m.

Rule of thumb: reach for ML when the rules are too many, too fuzzy, or change too often to write by hand, as in recognizing spam, ranking results, or forecasting demand.

02 · The landscape 5 min

Three ways a machine
can learn.

Almost everything in classical ML falls into one of three setups, defined by what kind of feedback the model gets while learning: full answers, no answers, or rewards.

Supervised

Learn from labeled answers

Every training example comes with the correct output (the label). The model learns the mapping features → label, then predicts labels for new inputs.

  • Tasks: regression (a number) & classification (a category).
  • Examples: spam detection, price prediction, credit scoring.
  • Needs: labeled data — often the expensive part.
Unsupervised

Find structure, no labels

You hand the model raw data with no answers and ask it to find structure on its own — natural groups, or a simpler representation.

  • Tasks: clustering & dimensionality reduction.
  • Examples: customer segments, anomaly detection, topic discovery.
  • Catch: no ground truth, so "good" is harder to define.
Reinforcement

Learn by trial & reward

An agent takes actions in an environment and gets rewards or penalties. Over many tries it learns a policy that maximizes long-term reward.

  • Tasks: sequential decisions & control.
  • Examples: game play, robotics, RLHF that aligns chatbots.
  • Catch: data-hungry and tricky to get stable.
[Diagram] Train: labeled data (features, label) → model learns f(x) → y. Predict: new input x → model → prediction.

Supervised flow: labeled examples train the model once; afterward it labels brand-new inputs.

How to tell them apart

  • Got labels? → supervised. Each row has the right answer attached.
  • No labels, want groups? → unsupervised. You are exploring structure.
  • An agent acting over time for reward? → reinforcement.
  • A middle ground, semi-supervised, mixes a few labels with lots of unlabeled data when labeling is costly.

Most production ML you will meet is supervised — so the rest of this deck leans there.

03 · Core tasks & algorithms 7 min

Three tasks, a handful
of workhorse algorithms.

The task is what you are predicting; the algorithm is how. Walk through the three core tasks, then skim the cheat-sheet of algorithms that earn their keep on everyday tabular data.

Regression — predict a continuous number

    from sklearn.linear_model import LinearRegression

    model = LinearRegression().fit(X_train, y_train)
    model.predict(X_new)   # → a number per row: price, temperature, demand…

Fit a line (or curve) through the data; read off a number for any new input.

Use when
The answer is a quantity — house price, delivery time, next-week sales.
Go-to algorithms
Linear regression first; gradient boosting when relationships are non-linear.
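For a single feature, "fit a line through the data" has a closed-form answer. The sketch below is a from-scratch illustration of ordinary least squares in plain Python, not how scikit-learn implements `LinearRegression`; the data and the names `fit_line` and `predict_price` are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area (m²) vs. price (thousands); price = 3 * area.
areas = [50, 60, 80, 100]
prices = [150, 180, 240, 300]

slope, intercept = fit_line(areas, prices)


def predict_price(area):
    # Read a number off the fitted line for any new input.
    return slope * area + intercept
```

Here `predict_price(70)` lands on the line between the known points. Real data is noisy, so the fitted line passes near the points rather than through them.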

Classification — predict a category

    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(n_estimators=300)
    clf.fit(X_train, y_train)
    clf.predict(X_new)   # → a label per row: spam / not-spam, fraud / ok

Learn a boundary that separates the classes; new points are labeled by which side they land on.

Use when
The answer is a class — yes/no, or one of several categories.
Go-to algorithms
Logistic regression as a baseline; random forest or XGBoost to push accuracy.

Clustering — group without labels

    from sklearn.cluster import KMeans

    km = KMeans(n_clusters=4).fit(X)   # no labels!
    km.labels_                         # → which group each point landed in

No labels given — the algorithm groups points by similarity and you interpret the groups.

Use when
You want to discover segments or structure you did not pre-define.
Go-to algorithms
k-means for round, well-separated groups; DBSCAN for odd shapes & outliers.
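The idea behind k-means fits in a few lines: assign each point to its nearest center, move each center to the mean of its points, repeat. This is a simplified sketch in plain Python. It assumes hand-picked starting centers and a fixed number of iterations, whereas library implementations choose starting centers more carefully and stop on convergence. The points and function names are hypothetical.

```python
def nearest_center(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return distances.index(min(distances))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points and moving centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Move each center to the mean of its cluster; keep it if empty.
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centers)
        ]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(points, centers=[(1, 1), (8, 8)])
```

No labels went in; `labels` comes out as group numbers that you then have to interpret yourself.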

Algorithm cheat-sheet

Linear regression

Fits a straight-line relationship. Fast, interpretable baseline for predicting a number.

+ simple, explainable

− misses non-linear patterns

Logistic regression

Despite the name, a classifier: outputs a probability for a class. The default first model.

+ calibrated, fast

− linear boundaries only

Decision tree

A flowchart of yes/no splits. Human-readable but prone to memorizing the training set.

+ easy to read

− overfits alone

Random forest

Many trees on random subsets, votes averaged (bagging). Strong, forgiving default.

+ robust, little tuning

− larger, less readable

Gradient boosting

Trees built in sequence, each fixing the last one's errors. XGBoost / LightGBM dominate tabular contests.

+ top accuracy on tables

− tuning matters

k-NN

Predict by looking at the k nearest known examples. No real training — it just remembers.

+ dead simple

− slow at predict time
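A minimal from-scratch sketch shows why k-NN has "no real training": the stored examples are the model, and all the work happens at predict time. The data and the name `knn_predict` are hypothetical; Euclidean distance and majority vote are common choices, assumed here.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to query.

    train is a list of (features, label) pairs.
    """
    def squared_distance(row):
        features, _ = row
        return sum((a - b) ** 2 for a, b in zip(features, query))

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points.
train = [((1, 1), "ok"), ((1, 2), "ok"), ((2, 2), "ok"),
         ((8, 8), "fraud"), ((9, 8), "fraud"), ((8, 9), "fraud")]

near_ok = knn_predict(train, (2, 1))
near_fraud = knn_predict(train, (7, 9))
```

Every prediction sorts the whole training set, which is the "slow at predict time" cost listed above.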

k-means

Unsupervised: partitions points into k clusters around moving centers.

+ fast clustering

− pick k; round blobs only

SVM

Finds the widest-margin boundary between classes; kernels handle non-linear splits.

+ strong on small data

− scales poorly to huge sets

These shine on structured / tabular data — rows and columns. When the input is images, audio, or free text, hand-made features break down and you move to Deep Learning & Neural Networks, where the model learns its own features.

04 · Training basics 5 min

Features in, and a fair way
to measure what comes out.

Two ideas do most of the work: turning raw data into good features, and splitting your data so you measure the model on examples it never trained on. Skip the split and you will fool yourself every time.

Feature: one measurable input column the model learns from (age, price, words-in-email). Turning messy raw data into useful features (scaling numbers, encoding categories, combining fields) is feature engineering, and on tabular data it usually moves the needle more than swapping algorithms.
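Two of the feature-engineering steps named above, scaling numbers and encoding categories, can be sketched in plain Python. The column names and values are invented for illustration, and min-max scaling and one-hot encoding are just one common choice for each step.

```python
def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1          # avoid dividing by zero on a constant column
    return [(v - lo) / span for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


# Hypothetical raw columns.
ages = [20, 30, 40, 60]
plans = ["free", "pro", "free", "team"]

scaled_ages = min_max_scale(ages)
encoded_plans, plan_names = one_hot(plans)

# One numeric feature row per example: [age_scaled, is_free, is_pro, is_team]
rows = [[age] + plan for age, plan in zip(scaled_ages, encoded_plans)]
```

The model never sees "pro" or "60"; it sees rows of numbers, which is why the quality of this translation matters so much.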

The train / validation / test split

  • Training set — the model fits its parameters on this.
  • Validation set — used to tune choices (which algorithm, which settings / hyperparameters).
  • Test set — touched once, at the very end, for an honest estimate of real-world performance.

The cardinal sin is data leakage: letting any test information sneak into training. The test score then lies, and production disappoints.
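Leakage is often subtle. One common form, assumed here as the example, is computing preprocessing statistics over all rows before splitting. The sketch below contrasts the two approaches using standardization; the numbers and helper names are hypothetical.

```python
def mean_std(values):
    """Population mean and standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


def standardize(values, mean, std):
    return [(v - mean) / std for v in values]


train = [10.0, 12.0, 14.0, 16.0]
test = [30.0, 50.0]

# Correct: statistics come from the training rows only, then are reused on test.
train_mean, train_std = mean_std(train)
train_scaled = standardize(train, train_mean, train_std)
test_scaled = standardize(test, train_mean, train_std)

# Leaky: statistics over train + test let test values shape the training features.
leaky_mean, leaky_std = mean_std(train + test)
```

The leaky mean (22) differs from the training-only mean (13), so any model trained on leaky features has already "seen" something about the test rows.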

[Diagram] ALL LABELED DATA: training · 60% (fit the model) | val · 20% (tune) | test · 20% (final check)

A typical 60 / 20 / 20 split. Exact ratios vary; the principle — never test on what you trained on — does not.
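A three-way split is just a shuffle followed by two cuts. This is a plain-Python sketch under the assumption that rows are independent, so shuffling is safe. The function name `three_way_split` and its defaults are illustrative, not a library API.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then cut into train / validation / test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split reproducible
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


# 100 hypothetical row ids → 60 / 20 / 20.
train, val, test = three_way_split(range(100))
```

Each row lands in exactly one of the three sets, which is the property that keeps the final test score honest.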

    from sklearn.model_selection import (
        train_test_split, cross_val_score)

    # hold out a test set first; it is not touched during tuning
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # model: any scikit-learn estimator, e.g. LogisticRegression()
    scores = cross_val_score(model, X_train, y_train, cv=5)
    scores.mean()   # avg score across 5 folds
[Diagram] 5-FOLD CROSS-VALIDATION: five rounds, each holding out a different fold as val and training on the other four; average the 5 scores.

Cross-validation rotates the validation fold so every row is used for validation exactly once, which gives a steadier score than a single split.
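The fold rotation can be sketched as an index generator in plain Python. This is an illustration of the idea rather than scikit-learn's implementation; it uses contiguous folds with no shuffling, and `k_fold_indices` is a hypothetical name.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, val_idx) pairs; each row is in val exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, k=5))
all_val_rows = [i for _, val_idx in folds for i in val_idx]
```

For each pair you would fit on `train_idx`, score on `val_idx`, and average the five scores.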

05 · The central problem 5 min

Memorizing the past vs.
generalizing to the future.

Every model lives between two failure modes. Learn too little and it misses the pattern; learn too much and it memorizes noise. Managing that tension — the bias-variance trade-off — is the heart of practical ML.

Overfitting: the model fits the training data so closely that it captures its noise and then fails on new data. Its mirror image is underfitting: the model is too simple to capture the real pattern, so it does poorly on both training and new data. The tell: great training score, poor test score = overfit; both scores poor = underfit.
[Figure] Three panels: underfit (too simple) · good fit (captures trend) · overfit (memorizes noise)

Same points, three models. The straight line misses the trend; the wiggly line chases every dot; the smooth curve generalizes.

[Figure] Error vs. model complexity: curves for bias², variance, and total error, with the sweet spot between the underfit and overfit regions.

Bias is error from being too simple; variance is error from being too sensitive. Total error bottoms out in between.
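For squared-error loss the trade-off can be written down exactly. The decomposition below is the standard textbook form, stated under the assumption that $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and that the expectation is taken over training sets (and noise). The symbols $f$, $\hat{f}$, and $\sigma^2$ are introduced here and do not appear elsewhere in this deck.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The noise term is a floor no model can get under; tuning complexity only trades the first two terms against each other.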

Symptoms of overfitting
  • Training accuracy near-perfect, test accuracy much lower.
  • A big gap between train and validation scores.
  • Wild predictions on slightly different inputs.
Ways to fix it
  • More / cleaner data, or fewer features. Simpler model.
  • Regularization (L1 / L2), tree depth limits, dropout.
  • Cross-validation and early stopping to catch it sooner.
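The train/test gap is easy to reproduce on a toy problem. In this plain-Python sketch a 1-nearest-neighbor "memorizer" is compared with a single-cutoff rule on data whose training labels contain two deliberate mistakes. The data is hypothetical and built to exaggerate the effect, and the cutoff of 5 is assumed rather than fitted.

```python
# True pattern: label is True when x >= 5. Two training labels are noisy.
train = [(0, False), (1, False), (2, True),    # x=2 is mislabeled
         (3, False), (4, False), (5, True), (6, True),
         (7, False),                           # x=7 is mislabeled
         (8, True), (9, True)]
test = [(1.9, False), (2.2, False), (4.4, False),
        (5.6, True), (6.8, True), (7.3, True)]   # clean labels


def memorizer(x):
    """1-nearest neighbor: copy the label of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def simple_rule(x):
    """One cutoff; too simple to memorize the noisy labels."""
    return x >= 5


def accuracy(predict, rows):
    return sum(predict(x) == label for x, label in rows) / len(rows)


memorizer_scores = (accuracy(memorizer, train), accuracy(memorizer, test))
simple_scores = (accuracy(simple_rule, train), accuracy(simple_rule, test))
```

The memorizer is perfect on training data and poor on test data, the overfitting signature from the list above. The simple rule scores lower on training data but holds up on unseen points.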
06 · Evaluating a model 5 min

Accuracy alone
can lie to you.

Pick the metric that matches the cost of being wrong. On a dataset that is 99% "not fraud", a model that always says "not fraud" scores 99% accuracy and catches zero fraud. The confusion matrix is where you start.

Confusion matrix: a table of predictions vs. actual labels with four cells: true positives (TP), true negatives (TN), false positives (FP, false alarms), and false negatives (FN, misses). The classification metrics below are all ratios built from these four cells.

                     PREDICTED positive    PREDICTED negative
ACTUAL positive      TP (hit)              FN (miss)
ACTUAL negative      FP (false alarm)      TN (correct reject)

precision = TP / (TP + FP) · recall = TP / (TP + FN)

Four outcomes. Which one hurts more — a false alarm or a miss — decides whether you optimize precision or recall.

Accuracy
share of all predictions that were correct

(TP + TN) / everything. Fine when classes are balanced — misleading when one class is rare (fraud, disease).

Precision
of the ones flagged positive, how many really were

TP / (TP + FP). Optimize when a false alarm is costly — e.g. blocking a legit payment.

Recall
of the real positives, how many you caught

TP / (TP + FN). Optimize when a miss is costly — e.g. failing to detect cancer or fraud.

F1 score
the balance of precision and recall

The harmonic mean of precision and recall — one number when you need both to be decent and the classes are imbalanced.
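All four metrics follow directly from the confusion-matrix cells, which a short plain-Python sketch makes explicit. The counts are hypothetical, chosen to mirror the 99% "not fraud" scenario, and returning 0.0 when a ratio is undefined is a convention assumed here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


# 1,000 transactions, 10 of them fraud (hypothetical counts).
always_not_fraud = classification_metrics(tp=0, fp=0, fn=10, tn=990)
real_model = classification_metrics(tp=8, fp=4, fn=2, tn=986)
```

The do-nothing model scores 0.99 accuracy with zero recall. The second model is only slightly more accurate, yet it catches 8 of the 10 frauds, and recall and F1 are what reveal the difference.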

    from sklearn.metrics import (
        classification_report, confusion_matrix)

    preds = model.predict(X_test)
    print(confusion_matrix(y_test, preds))
    print(classification_report(y_test, preds))
    # precision, recall, F1 per class in one call

Regression has its own metrics

  • MAE — mean absolute error. Average miss, in the original units; easy to explain.
  • RMSE — root mean squared error. Like MAE but punishes big misses harder.
  • R² — share of variance the model explains; 1.0 is perfect, 0 is no better than guessing the mean.
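The three regression metrics are a few lines each in plain Python. This sketch uses made-up numbers with one deliberately large miss, to show how RMSE reacts more strongly than MAE; R² here is the coefficient of determination, computed as 1 minus residual over total sum of squares.

```python
def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R²) for paired lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = (ss_res / n) ** 0.5
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                       # 0 = as good as the mean
    return mae, rmse, r2


# Hypothetical values; the last prediction misses by 10.
actual = [10, 20, 30, 40]
predicted = [12, 18, 30, 50]

mae, rmse, r2 = regression_metrics(actual, predicted)
```

With these numbers MAE is 3.5 while RMSE is about 5.2: the single miss of 10 dominates the squared error, which is exactly the "punishes big misses harder" behavior.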

The discipline is the same: choose the metric before you train, tied to what a mistake actually costs the business.

07 · Tooling landscape & honest trade-offs 3 min

Pick the right library —
and know when not to use ML.

A small, stable set of libraries covers almost everything. Then the harder call: ML is powerful, but it is also a liability. Here is the honest balance sheet.

scikit-learn

Classical ML on tabular data

The standard toolkit: regression, classification, clustering, splits, metrics — one clean API.

+ batteries-included, easy to learn

− not for deep learning / GPUs / huge data

Choose when you have rows-and-columns data and want a strong model fast. Start here.

XGBoost / LightGBM

Gradient boosting for tables

Specialized boosted-tree libraries that win most tabular benchmarks and Kaggle competitions.

+ top accuracy, handles mixed features

− more hyperparameters to tune

Choose when scikit-learn baselines plateau and you need the last few points of accuracy on tables.

PyTorch

Deep learning & research

The dominant framework for neural networks — flexible, Pythonic, behind most modern research and LLMs.

+ flexible, huge ecosystem

− overkill for small tabular problems

Choose when your data is images, audio, or text — see Deep Learning.

TensorFlow / Keras

Deep learning at production scale

The other major deep-learning stack; Keras gives a friendly high-level API, with strong mobile / serving tooling.

+ mature deployment path

− steeper, heavier than scikit-learn

Choose when you need deep nets plus a battle-tested serving and edge story.

  • Tabular data? scikit-learn first, then XGBoost / LightGBM for more accuracy.
  • Images, audio, text? PyTorch or TensorFlow — go deep.
  • Generative AI / chatbots? Usually you do not train from scratch — you build on a foundation model: see Building LLM Apps.
  • Shipping & monitoring it? That is its own discipline — MLOps & Model Deployment.

ML vs. plain rules — the honest balance

ML earns its keep when
  • The rules are too many, fuzzy, or always changing to hand-code.
  • You have plenty of representative, labeled data.
  • The pattern is real and reasonably stable over time.
  • Being approximately right at scale beats being exactly right slowly.
Plain rules win when
  • The logic is known, simple, and stable — just write it.
  • Data is scarce, biased, or expensive to label.
  • Decisions must be explained, audited, or guaranteed.
  • A cheap heuristic already meets the bar — start there.
The real costs of ML — beyond the model: data pipelines, labeling, drift (the world changes, accuracy decays), bias (a model learns the prejudices in its data), opacity (hard to explain a decision), and ongoing monitoring. A model is a liability you maintain, not a feature you finish.

Five things to walk out with

1. ML learns rules from data. Use it when rules are too many or too fuzzy to write, not as a default.
2. Know the setup. Supervised (labels), unsupervised (structure), reinforcement (reward): pick by the feedback you have.
3. Test on unseen data. Train / validation / test splits and cross-validation keep you honest; leakage makes scores lie.
4. Mind the trade-off. Fight both underfitting and overfitting; generalization, not training accuracy, is the goal.
5. Measure what matters. Choose precision, recall, F1 or RMSE by the cost of being wrong; accuracy alone can deceive.