Elena's AI Blog

Machine Learning Series

Elena Daehnhardt

Image credit: Illustration created with Midjourney, prompt by the author.
Image prompt: “An illustration representing cloud computing”

Machine Learning: From Foundations to Fine-Tuning

This series is ordered from beginner concepts to advanced practice. Start at Part 1 and build up.

Series Progress

27 of 27 posts published


Learning Path

Part 1: Deep Learning vs Machine Learning

Artificial Intelligence (AI) is a field of computer science that provides methods and algorithms to mimic human intelligence, reasoning, and decision-making. Businesses and researchers can use the resulting insights to build innovative products and services. Machine Learning (ML) is a subset of AI whose algorithms learn from data. In this post, we sort out the differences between AI and ML.

Deep Learning vs Machine Learning

Part 1: Deep Learning with DataCamp and Twitter

I have some machine learning experience with scikit-learn, but I have always been interested in Deep Learning. The plan is to learn the basic concepts and apply the algorithms to a real-life situation, something I have always enjoyed.

Deep Learning with DataCamp and Twitter

Part 2: Tools and Data to Experiment with Machine Learning

The open-source Python library scikit-learn provides a comprehensive selection of machine learning techniques (regression, classification, clustering), feature selection, metrics, preprocessing, and other functionality. At the moment, scikit-learn lacks deep learning functionality; however, we can use TensorFlow with the Scikit Flow wrapper to create neural networks in the scikit-learn style.

Tools and Data to Experiment with Machine Learning

Part 3: TensorFlow on M1

TensorFlow is a free open-source library for machine learning created by Google Brain. TensorFlow has excellent functionality for building deep neural networks. I have chosen TensorFlow because it is robust, efficient, and can be used with Python. Since I like Jupyter Notebooks and Conda, I installed them on my system as well. In this post, I go through simple steps to install TensorFlow and the packages above on an M1 Mac running macOS Monterey.

TensorFlow on M1

Part 3: Feature preprocessing

Machine Learning algorithms often require data of a specific type. For instance, some algorithms accept only numerical data. In other cases, ML algorithms perform better or converge faster when we preprocess the data first. Since we do this step before training the model, we call it preprocessing.

Feature preprocessing
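
As a rough illustration of what preprocessing can involve, the sketch below rescales a numeric column to the 0 to 1 range and one-hot encodes a categorical column in plain Python. The helper names `min_max_scale` and `one_hot` and the sample values are assumptions made for this example, not code from the post; in practice a library such as scikit-learn would usually do this job.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Turn category labels into 0/1 vectors; columns follow sorted label order."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


ages = [20, 30, 40]                # hypothetical numeric feature
colours = ["red", "blue", "red"]   # hypothetical categorical feature

scaled_ages = min_max_scale(ages)
encoded_colours = one_hot(colours)

print(scaled_ages)       # each age rescaled into [0, 1]
print(encoded_colours)   # one row per sample, one column per category
```

Scaling puts features on a comparable range, and one-hot encoding gives an algorithm that accepts only numbers a way to consume categories.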

Part 4: Tensors in TensorFlow

TensorFlow is a free open-source library for machine learning created by Google Brain. TensorFlow has excellent functionality for building deep neural networks. I have chosen TensorFlow because it is robust, efficient, and can be used with Python. In this post, I use simple examples to show how we can create tensors, shuffle them, index them, and get information about them.

Tensors in TensorFlow

Part 4: TensorFlow: Regression Model

In this post, I describe regression modelling in TensorFlow. We predicted a numerical value and adjusted hyperparameters to improve model performance with a simple neural network. We generated a dataset, demonstrated a simple split into training and testing sets, visualised our data and the neural network we created, and evaluated our model using a testing dataset.

TensorFlow: Regression Model

Part 5: TensorFlow: Global and Operation-level Seeds

When training Machine Learning models, we want to avoid any ordering biases in the data. In some cases, such as Cross-Validation experiments, it is essential to shuffle the data while ensuring that the resulting order is the same across runs or system restarts. We can use operation-level and global seeds to make results reproducible.

TensorFlow: Global and Operation-level Seeds
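
The idea of seeding can be shown without TensorFlow. This sketch uses Python's standard `random` module as a stand-in: the module-wide seed plays the role of a global seed, and a dedicated `random.Random(seed)` generator plays the role of an operation-level seed. It is an analogy under those assumptions, not TensorFlow's own seeding API, and the function names are hypothetical.

```python
import random

data = list(range(10))


def shuffled_with_global_seed(items, seed):
    """Seed the shared module-level generator, then shuffle a copy."""
    random.seed(seed)
    copy = list(items)
    random.shuffle(copy)
    return copy


def shuffled_with_local_seed(items, seed):
    """Use a private generator so other random calls cannot disturb the result."""
    rng = random.Random(seed)
    copy = list(items)
    rng.shuffle(copy)
    return copy


# Repeating a run with the same seed gives the same order.
run_a = shuffled_with_global_seed(data, 42)
run_b = shuffled_with_global_seed(data, 42)

local_a = shuffled_with_local_seed(data, 7)
local_b = shuffled_with_local_seed(data, 7)

print(run_a == run_b, local_a == local_b)
```

The private generator is the safer choice when unrelated code also draws random numbers, because its sequence depends only on its own seed.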

Part 5: TensorFlow: Evaluating the Regression Model

In this post, we evaluated four regression models using TensorFlow. The MAE and MSE error metrics were used to compare the Sequential models while searching for the best neural network architecture with respect to the defined hyperparameters.

TensorFlow: Evaluating the Regression Model
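
For reference, the two metrics named above can be computed by hand. This is a minimal plain-Python sketch; the sample targets and predictions are made up for illustration and are not results from the post.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: the average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: the average of the squared differences; large errors weigh more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical targets
y_pred = [2.0, 5.0, 9.0, 9.0]   # hypothetical predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(mae, mse)
```

Because MSE squares each error, the single prediction that is off by 2 dominates it, while MAE treats all errors proportionally.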

Part 6: TensorFlow: Multiclass Classification Model

In Machine Learning, classification means categorising input data into different classes. For instance, we can categorise email messages into two groups, spam or not spam. When we have two classes, we talk about binary classification. When we have more than two classes, we talk about multiclass classification. In this post, I address the latter, multiclass classification, using the example of categorising clothing items into clothing types.

TensorFlow: Multiclass Classification Model
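
A multiclass model commonly ends by turning one raw score per class into probabilities with softmax and picking the most probable class. The sketch below assumes such a softmax output; the class names and scores are hypothetical and are not taken from the post.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["T-shirt", "Trouser", "Sneaker"]  # hypothetical clothing types
scores = [2.0, 1.0, 0.1]                         # hypothetical model outputs

probabilities = softmax(scores)
predicted = class_names[probabilities.index(max(probabilities))]
print(predicted, probabilities)
```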

Part 7: TensorFlow: Convolutional Neural Networks for Image Classification

In this post, I demonstrate using a CNN for bird recognition with TensorFlow and the Kaggle 400 bird species dataset. We observe how the model works with the original and augmented images.

TensorFlow: Convolutional Neural Networks for Image Classification

Part 8: TensorFlow: Transfer Learning (Feature Extraction) in Image Classification

Image classification is a complex task. However, we can approach the problem by reusing state-of-the-art pre-trained models. Reusing patterns previously learned by other models is called "Transfer Learning." This way, we can efficiently apply well-tested models, potentially leading to excellent performance.

TensorFlow: Transfer Learning (Feature Extraction) in Image Classification

Part 9: TensorFlow: Transfer Learning (Fine-Tuning) in Image Classification

We used a 400-species bird dataset to build bird species prediction models based on EfficientNetB0 from Keras. The baseline model already showed an excellent accuracy of 0.9845. Data augmentation did not improve accuracy, which dropped slightly to 0.9690. This model with a data augmentation layer was then partially unfrozen, retrained with a lower learning rate, and reached an accuracy of 0.9850.

TensorFlow: Transfer Learning (Fine-Tuning) in Image Classification

Part 10: TensorFlow: Evaluating the Saved Bird Species Prediction Model

In this post, I describe the process of in-depth model evaluation. I reuse the previously created EfficientNetB0 model, which was fine-tuned on the 400 Bird Species Kaggle dataset. As a result, I found out which bird species are not well predicted.

TensorFlow: Evaluating the Saved Bird Species Prediction Model

Part 11: Data exploration and analysis with Python Pandas

In Data Science, we have so many terms for concepts and techniques that it is easy to get confused and hard to get a clear understanding of all the components and steps. In this post, I fill the gap by explaining two essential components of data science: data analysis and data exploration. To clarify things, I show both approaches, compare them, and provide Python code using Pandas DataFrames and plots.

Data exploration and analysis with Python Pandas

Part 11: Machine Learning Tests using the Titanic dataset

In this post, we created and evaluated several machine learning models using the Titanic dataset. We compared the performance of Logistic Regression, Decision Tree, and Random Forest models from the Python library scikit-learn with a Neural Network created with TensorFlow. The Random Forest performed the best!

Machine Learning Tests using the Titanic dataset

Part 12: Machine-Learning Process

The machine learning process involves a series of steps and activities designed to develop and deploy machine learning models that solve specific problems or make predictions. To put it simply, in machine learning we create programs that take in data and produce desired results. In this post, we briefly describe the stages of the machine learning process.

Machine-Learning Process

Part 12: Decision Tree versus Random Forest, and Hyperparameter Optimisation

Decision trees, with their elegant simplicity and transparency, stand in stark contrast to the robust predictive power of Random Forest, an ensemble of trees. In this post, we compare the key distinctions, advantages, and trade-offs between these two approaches. We use scikit-learn to train and test both models, and perform hyperparameter optimisation to find parameters that improve the performance of both.

Decision Tree versus Random Forest, and Hyperparameter Optimisation

Part 13: TensorFlow: Romancing with TensorFlow and NLP

In this post, we will create a simple poem generation model with the Keras Sequential API.

TensorFlow: Romancing with TensorFlow and NLP

Part 13: Cross-Validation Techniques

Building a machine learning model is easy; proving it actually works on unseen data is the hard part. In this post, we cover cross-validation techniques—from traditional K-Fold to Stratified and Time-Series splits—using hands-on examples in scikit-learn.

Cross-Validation Techniques
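
To make the traditional K-Fold idea concrete, here is a minimal plain-Python sketch that splits sample indices into k contiguous folds, each used once as the test set. The function name `k_fold_indices` is an assumption for this example; it does no shuffling or stratification, which scikit-learn's `KFold` and `StratifiedKFold` handle.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample lands in exactly one test fold, so each one contributes to the estimate of performance on unseen data exactly once.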

Part 14: LoRA fine-tuning wins

You no longer need to retrain entire language models. LoRA allows you to teach new capabilities via tiny adapters. Here are the architectural code, deployment cheat sheets, and production pitfalls.

LoRA fine-tuning wins
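
Why the adapters are tiny can be seen with some arithmetic. In the standard LoRA formulation, a frozen weight matrix W of shape (d_out, d_in) is adjusted by a low-rank product B·A scaled by alpha / rank, where B has shape (d_out, rank) and A has shape (rank, d_in). The sketch below assumes that formulation; the layer sizes, helper names, and toy matrices are hypothetical and are not code from the post.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full weight parameter count, LoRA adapter parameter count)."""
    full = d_out * d_in
    adapter = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, adapter


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha, rank):
    """Compute W + (alpha / rank) * B @ A without modifying W."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


full, adapter = lora_param_counts(4096, 4096, 8)  # hypothetical layer size
print(full, adapter, full // adapter)

w = [[1.0, 0.0], [0.0, 1.0]]   # frozen weight (toy 2 x 2)
b = [[1.0], [2.0]]             # trainable, 2 x 1
a = [[0.5, 0.0]]               # trainable, 1 x 2
merged = lora_effective_weight(w, b, a, alpha=2, rank=1)
print(merged)
```

For this hypothetical layer the adapter holds 256 times fewer parameters than the full matrix, which is the saving the summary refers to.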

Part 17: Floating-point format and Mixed Precision in TensorFlow

When creating large Machine Learning models, we want to minimise the training time. In TensorFlow, it is possible to train models with mixed precision, which can improve performance significantly because it combines lower-precision 16-bit operations (such as float16) with single-precision operations (for instance, using the float32 data type). Google TPUs and NVIDIA GPUs can perform operations on 16-bit data types much faster.

Floating-point format and Mixed Precision in TensorFlow
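
The precision trade-off behind mixed precision can be seen with Python's standard `struct` module, which can pack a number as a 16-bit float (format `"e"`) or a 32-bit float (format `"f"`). This is a stdlib sketch of the number formats only, not TensorFlow's mixed precision API; the helper names are assumptions.

```python
import struct


def round_to_float16(x):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("e", struct.pack("e", x))[0]


def round_to_float32(x):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


# float16 has an 11-bit significand, so not every integer above 2048 fits.
value = 2049.0
as_half = round_to_float16(value)
as_single = round_to_float32(value)

print(as_half, as_single)
print(round_to_float16(0.1), round_to_float32(0.1))
```

The 16-bit value drifts from the original while the 32-bit value does not, which is why mixed precision keeps float32 for the parts of training that are sensitive to rounding.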

Part 18: Audio Signal Processing with Python's Librosa

In this post, I focus on audio signal processing and working with WAV files. I apply Python's Librosa library for extracting wave features commonly used in research and application tasks such as gender prediction, music genre prediction, and voice identification. To succeed in these complex tasks, we need a clear understanding of how WAV files can be analysed, which I cover in detail with handy Python code snippets.

Audio Signal Processing with Python's Librosa

Part 19: Bias-Variance Challenge

In machine learning, we usually start from a simple baseline model and progressively adjust its complexity until we reach the sweet spot with the best model performance. How can we do this? Let's go through the most essential machine learning concepts and the bias-variance challenge.

Bias-Variance Challenge

Part 20: How recommendation engines actually work (with Python code)

Ever wondered how Netflix or Spotify manages to guess what you want to watch or listen to next? The secret lies in recommendation algorithms. Here is a look at the math behind collaborative and content-based filtering, and how to implement them in Python.

How recommendation engines actually work (with Python code)
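
As a small taste of the math, user-based collaborative filtering often starts by measuring how similar two users' rating vectors are, for example with cosine similarity. The sketch below assumes that approach; the users, ratings, and helper names are hypothetical, and treating 0 as "not rated" is a simplification.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_a * norm_b)


# Hypothetical ratings for four items; 0 means "not rated".
ratings = {
    "alice": [5, 4, 0, 1],
    "bob": [4, 5, 1, 0],
    "carol": [0, 1, 5, 4],
}


def most_similar_user(target, all_ratings):
    """Return the other user whose ratings point in the most similar direction."""
    others = [name for name in all_ratings if name != target]
    return max(others, key=lambda name: cosine_similarity(all_ratings[target],
                                                          all_ratings[name]))


nearest = most_similar_user("alice", ratings)
print(nearest)
```

Items that the nearest user rated highly and the target has not yet rated then become candidate recommendations.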

Part 21: OpenAI's Model Show-off

OpenAI's GPT models are highly sophisticated machine learning models that are used in various fields such as natural language processing, coding assistance, and content creation. OpenAI's newest video-generating model, Sora, sets a new benchmark in video generation technology, which I quickly explore in this post.

OpenAI's Model Show-off

Part 22: Apache-Licensed Summarizers

Looking for summarisation models you can safely use in your app? Here is a definitive guide to Apache-licensed transformer models, complete with selection matrices and production gotchas.

Apache-Licensed Summarizers

Start Here

Start with Part 1: Deep Learning vs Machine Learning.
