Here’s a Data Science Anki deck with a load of questions covering some of the usual suspects.
I’ll add more later, but at present the cards are tagged with the following topics:
- L1
- L2
- MLE
- SSE
- Sklearn
- accuracy
- adaboost
- advanced
- advantages
- aggregation
- algorithm
- applications
- assumptions
- augmentation
- automation
- averaging
- bagging
- baselines
- basics
- bayes-theorem
- benefits
- bias-variance
- bias_variance
- binary-classification
- binning
- blending
- boosting
- bootstrapping
- brier-score
- cart
- categorical
- categorical_features
- class-imbalance
- classification
- closed_form
- comparison
- computation
- concept
- concepts
- confusion-matrix
- continuous_features
- core-components
- cost-sensitive
- cross-entropy
- cross-validation
- cross_validation
- data
- data-issues
- data-leakage
- data-limitations
- data-prep
- data-preprocessing
- data-splitting
- decision-boundaries
- decision-boundary
- decision-making
- decision-trees
- decision_tree
- definition
- derivation
- derivatives
- design_matrix
- diagnostics
- dimensions
- disadvantages
- discriminative-classifiers
- distance-metrics
- diversity
- efficiency
- ensemble
- ensemble-methods
- entropy
- equation
- equations
- error
- evaluation
- evaluation-metrics
- examples
- explainability
- f1
- feature-analysis
- feature-engineering
- feature-importance
- feature-quality
- feature-scaling
- feature_engineering
- feature_importance
- feature_selection
- features
- formula
- fundamentals
- gaussian
- generalisation
- generative-classifiers
- generative-models
- gini
- gini_gain
- gini_vs_entropy
- gradient
- gradient-boosting
- gradient-descent
- gradient_descent
- greedy
- homoscedasticity
- hyperparameter
- hyperparameter-tuning
- hyperparameters
- imbalance
- implementation
- independence
- inference
- information-gain
- information_gain
- instability
- interactions
- interpretability
- interpretation
- intuition
- knn
- leaf
- lightgbm
- limitations
- linear-regression
- linear-vs-logistic
- linear_models
- linear_regression
- linearity
- local
- log-loss
- log-odds
- logistic-regression
- loss
- loss-function
- loss_functions
- loss_functions::mse
- machine-learning
- machine_learning
- matrix
- max_depth
- methods
- metric
- metrics
- min_samples_leaf
- min_samples_split
- mitigation
- ml
- ml::bias-variance
- ml::foundations
- ml::generalisation
- ml::linear-regression
- ml::metrics
- ml::regularisation
- ml::supervised-learning
- ml::validation
- model
- model-behaviour
- model-evaluation
- model-formulation
- model-improvement
- model-validation
- model_quality
- models
- multi-class
- multiclass
- multicollinearity
- multivariate
- naive-bayes
- node
- non_parametric
- normality
- numerical-stability
- objective
- ols
- optimisation
- outliers
- overfitting
- overview
- ovo
- ovr
- parameters
- performance
- performance-metrics
- polynomial
- precision
- prediction
- preprocessing
- probabilistic-models
- probabilities
- probability
- pruning
- purity
- purpose
- r2
- random-forest
- random_forest
- recall
- regression
- regularisation
- regularization
- residuals
- robustness
- roc
- roc-auc
- sampling
- sigmoid
- sinusoidal
- sklearn
- solution
- splines
- splitting
- stacking
- standardisation
- stopping
- structure
- supervised
- supervised-classification
- theory
- tools
- training
- transformations
- tree-structure
- tuning
- uncertainty
- underfitting
- univariate
- update_rules
- validation
- variance
- variants
- visualization
- weak-learners
- weighting
- workflow
The post thumbnail image was made by Gemini, obviously.

