Multi-Class Imbalanced Classification

Last Updated on August 21, 2020

Imbalanced classification refers to those prediction tasks where the distribution of examples across class labels is not equal.

Most imbalanced classification examples focus on binary classification tasks, yet many of the tools and techniques for imbalanced classification also directly support multi-class classification problems.

In this tutorial, you will discover how to use the tools of imbalanced classification with a multi-class dataset.

After completing this tutorial, you will know:

  • About the glass identification standard imbalanced multi-class prediction problem.
  • How to use SMOTE oversampling for imbalanced multi-class classification.
  • How to use cost-sensitive learning for imbalanced multi-class classification.

Kick-start your project with my new book Imbalanced Classification with Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Multi-Class Imbalanced Classification
Photo by istolethetv, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  • Glass Multi-Class Classification Dataset
  • SMOTE Oversampling for Multi-Class Classification
  • Cost-Sensitive Learning for Multi-Class Classification

    Glass Multi-Class Classification Dataset

    In this tutorial, we will focus on the standard imbalanced multi-class classification problem referred to as “Glass Identification” or simply “glass.”

    The dataset describes the chemical properties of glass and involves classifying samples of glass using their chemical properties as one of six classes. The dataset was credited to Vina Spiehler in 1987.

    Ignoring the sample identification number, there are nine input variables that summarize the properties of the glass dataset; they are:

    • RI: Refractive Index
    • Na: Sodium
    • Mg: Magnesium
    • Al: Aluminum
    • Si: Silicon
    • K: Potassium
    • Ca: Calcium
    • Ba: Barium
    • Fe: Iron

    The chemical compositions are measured as the weight percent in corresponding oxide.

    There are seven types of glass listed; they are:

    • Class 1: building windows (float processed)
    • Class 2: building windows (non-float processed)
    • Class 3: vehicle windows (float processed)
    • Class 4: vehicle windows (non-float processed)
    • Class 5: containers
    • Class 6: tableware
    • Class 7: headlamps

    Float glass refers to the process used to make the glass.

    There are 214 observations in the dataset and the number of observations in each class is imbalanced. Note that there are no examples for class 4 (non-float processed vehicle windows) in the dataset.

    • Class 1: 70 examples
    • Class 2: 76 examples
    • Class 3: 17 examples
    • Class 4: 0 examples
    • Class 5: 13 examples
    • Class 6: 9 examples
    • Class 7: 29 examples

    Although there are minority classes, all classes are equally important in this prediction problem.

    The dataset can be divided into window glass (classes 1-4) and non-window glass (classes 5-7). There are 163 examples of window glass and 51 examples of non-window glass.

    • Window Glass: 163 examples
    • Non-Window Glass: 51 examples

    Another division of the observations would be between float processed glass and non-float processed glass, in the case of window glass only. This division is more balanced.

    • Float Glass: 87 examples
    • Non-Float Glass: 76 examples
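    The counts quoted above can be checked with a few lines of Python. This is a minimal sketch that hard-codes the per-class counts listed in this section rather than loading the data file; the variable names are illustrative only.

```python
# Per-class example counts as listed above (class 4 has no examples).
counts = {1: 70, 2: 76, 3: 17, 4: 0, 5: 13, 6: 9, 7: 29}

total = sum(counts.values())

# Percentage of the dataset that falls in each class.
percent = {label: 100.0 * n / total for label, n in counts.items()}

# Window glass is classes 1-4; non-window glass is classes 5-7.
window = sum(counts[c] for c in (1, 2, 3, 4))
non_window = sum(counts[c] for c in (5, 6, 7))

# Among window glass, classes 1 and 3 are float processed, 2 and 4 are not.
float_glass = counts[1] + counts[3]
non_float_glass = counts[2] + counts[4]

print("total:", total)
for label, n in counts.items():
    print("Class %d: n=%d (%.1f%%)" % (label, n, percent[label]))
print("window=%d, non-window=%d" % (window, non_window))
print("float=%d, non-float=%d" % (float_glass, non_float_glass))
```

    The smallest class (tableware) makes up only about 4 percent of the rows, which is why the per-class view matters more here than the window vs. non-window split.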

    You can learn more about the dataset here:

    No need to download the dataset; we will download it automatically as part of the worked examples.

    Below is a sample of the first few rows of the data.


    We can see that all inputs are numeric and the target variable in the final column is the integer encoded class label.

    You can learn more about how to work through this dataset as part of a project in the tutorial:

    Now that we are familiar with the glass multi-class classification dataset, let’s explore how we can use standard imbalanced classification tools with it.



    SMOTE Oversampling for Multi-Class Classification

    Oversampling refers to copying or synthesizing new examples of the minority classes so that the number of examples in the minority class better resembles or matches the number of examples in the majority classes.

    Perhaps the most widely used approach to synthesizing new examples is called the Synthetic Minority Oversampling TEchnique, or SMOTE for short. This technique was described by Nitesh Chawla, et al. in their 2002 paper titled “SMOTE: Synthetic Minority Over-sampling Technique.”
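    To make the idea of synthesizing examples concrete, the core step of SMOTE can be sketched with NumPy: pick a minority example, pick one of its nearest minority neighbors, and create a new point somewhere on the line between them. This is a simplified illustration, not the imbalanced-learn implementation; the function name smote_sketch and the toy data are assumptions made for this sketch, and it assumes at least two minority examples.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=1):
    """Create n_new synthetic rows by interpolating between minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    # Pairwise Euclidean distances between all minority examples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    # A point is not its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest minority neighbors of each example.
    k = min(k, n - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                 # random minority example
        nb = neighbors[base, rng.integers(k)]  # one of its k nearest neighbors
        gap = rng.random()                     # position along the segment
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic

# Nine 2-D minority examples (made-up values), oversampled with 67 new rows.
X_minority = np.random.default_rng(0).normal(size=(9, 2))
X_new = smote_sketch(X_minority, n_new=67)
print(X_new.shape)
```

    Because every synthetic row lies between two real minority rows, the new examples stay inside the region already occupied by the minority class rather than being exact copies.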

    You can learn more about SMOTE in the tutorial:

    The imbalanced-learn library provides an implementation of SMOTE that we can use that is compatible with the popular scikit-learn library.

    First, the library must be installed. We can install it using pip as follows:

    sudo pip install imbalanced-learn

    We can confirm that the installation was successful by printing the version of the installed library:


    Running the example will print the version number of the installed library; for example:


    Before we apply SMOTE, let’s first load the dataset and confirm the number of examples in each class.


    Running the example first downloads the dataset and splits it into train and test sets.

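    The number of rows in each class is then reported. Note that the six observed class labels are integer encoded as 0 to 5 in the examples, so these numbers differ from the class numbers 1 to 7 listed earlier. The report confirms that some classes, such as 0 and 1, have many more examples (70 or more) than other classes, such as 3 and 4 (fewer than 15).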
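    The class numbers used from here on run from 0 to 5 because the six observed labels (1, 2, 3, 5, 6, 7) are encoded as consecutive integers. The sketch below assumes scikit-learn's LabelEncoder is used for that step and rebuilds the label column from the counts listed in the dataset section instead of downloading the file, so it shows the encoding and the summary only.

```python
from collections import Counter

import numpy as np
from sklearn.preprocessing import LabelEncoder

# Rebuild the target column from the class counts listed earlier
# (an assumption for illustration; the real file is not loaded here).
raw_counts = {1: 70, 2: 76, 3: 17, 5: 13, 6: 9, 7: 29}
y_raw = np.concatenate([np.full(n, label) for label, n in raw_counts.items()])

# Encode the six observed labels as consecutive integers 0..5.
encoder = LabelEncoder()
y = encoder.fit_transform(y_raw)

# Summarize the class distribution after encoding.
counter = Counter(y)
for label, n in sorted(counter.items()):
    print("Class=%d, n=%d (%.3f%%)" % (label, n, 100.0 * n / len(y)))
```

    The encoder's classes_ attribute keeps the original labels, so encoded class 3 can be traced back to original class 5 (containers) when interpreting results.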


    A bar chart is created providing a visualization of the class breakdown of the dataset.

    This gives a clearer idea that classes 0 and 1 have many more examples than classes 2, 3, 4 and 5.

    Histogram of Examples in Each Class in the Glass Multi-Class Classification Dataset

    Next, we can apply SMOTE to oversample the dataset.

    By default, SMOTE will oversample all classes to have the same number of examples as the class with the most examples.

    In this case, class 1 has the most examples with 76, therefore, SMOTE will oversample all classes to have 76 examples.

    The complete example of oversampling the glass dataset with SMOTE is listed below.


    Running the example first loads the dataset and applies SMOTE to it.

    The distribution of examples in each class is then reported, confirming that each class now has 76 examples, as we expected.


    A bar chart of the class distribution is also created, providing a strong visual indication that all classes now have the same number of examples.

    Histogram of Examples in Each Class in the Glass Multi-Class Classification Dataset After Default SMOTE Oversampling

    Instead of using the default strategy of SMOTE to oversample all classes to the number of examples in the majority class, we could instead specify the number of examples to oversample in each class.

    For example, we could oversample to 100 examples in classes 0 and 1 and 200 examples in remaining classes. This can be achieved by creating a dictionary that maps class labels to the number of desired examples in each class, then specifying this via the “sampling_strategy” argument to the SMOTE class.
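    A sketch of such a dictionary is shown below. The encoded class counts are derived from the counts listed in the dataset section, and the helper name oversample is hypothetical. The SMOTE call uses the imbalanced-learn API (SMOTE with the sampling_strategy argument, then fit_resample) and is only executed if you call the helper with your own X and y.

```python
from collections import Counter

# Class counts after label encoding, derived from the counts listed earlier.
counter = Counter({0: 70, 1: 76, 2: 17, 3: 13, 4: 9, 5: 29})

# Target sizes: 100 for the two large classes, 200 for the rest.
strategy = {label: (100 if label in (0, 1) else 200) for label in counter}

def oversample(X, y, strategy):
    """Apply SMOTE with a per-class target size (needs imbalanced-learn)."""
    # Imported here so the dictionary above can be inspected without the library.
    from imblearn.over_sampling import SMOTE
    return SMOTE(sampling_strategy=strategy).fit_resample(X, y)

print(strategy)
```

    Each target must be at least as large as the current class size, since SMOTE only adds examples and never removes them.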


    Tying this together, the complete example of using a custom oversampling strategy for SMOTE is listed below.


    Running the example creates the desired sampling and summarizes the effect on the dataset, confirming the intended result.


    Note: you may see warnings that can be safely ignored for the purposes of this example, such as:


    A bar chart of the class distribution is also created confirming the specified class distribution after data sampling.

    Histogram of Examples in Each Class in the Glass Multi-Class Classification Dataset After Custom SMOTE Oversampling

    Note: when using data sampling like SMOTE, it must only be applied to the training dataset, not the entire dataset. I recommend using a Pipeline to ensure that the SMOTE method is correctly used when evaluating models and making predictions with models.
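    The principle behind this note can be sketched without any extra library: resample inside each cross-validation fold, using the training rows only, and score on the untouched test rows. In the sketch below, simple random duplication stands in for SMOTE, the dataset is synthetic, and the helper name random_oversample is an assumption; an imbalanced-learn Pipeline applies the same rule automatically.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

def random_oversample(X, y, rng):
    """Duplicate rows at random until every class matches the largest one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for label, n in zip(classes, counts):
        rows = np.flatnonzero(y == label)
        extra = rng.choice(rows, size=target - n, replace=True)
        idx.append(np.concatenate([rows, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# Synthetic imbalanced three-class data (made up for illustration).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=1)

rng = np.random.default_rng(1)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores, test_sizes = [], []
for train_ix, test_ix in cv.split(X, y):
    # Resample the training fold only; the test fold keeps its natural balance.
    X_train, y_train = random_oversample(X[train_ix], y[train_ix], rng)
    model = RandomForestClassifier(n_estimators=50, random_state=1)
    model.fit(X_train, y_train)
    scores.append(model.score(X[test_ix], y[test_ix]))
    test_sizes.append(len(test_ix))

print("Mean accuracy: %.3f" % np.mean(scores))
```

    Resampling before the split would let copies or near-copies of test rows leak into the training data and inflate the reported score.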

    You can see an example of the correct usage of SMOTE in a Pipeline in this tutorial:

    Cost-Sensitive Learning for Multi-Class Classification

    Most machine learning algorithms assume that all classes have an equal number of examples.

    This is not the case in multi-class imbalanced classification. Algorithms can be modified to change the way learning is performed to bias towards those classes that have fewer examples in the training dataset. This is generally called cost-sensitive learning.

    For more on cost-sensitive learning, see the tutorial:

    The RandomForestClassifier class in scikit-learn supports cost-sensitive learning via the “class_weight” argument.

    By default, the random forest class assigns equal weight to each class.

    We can evaluate the classification accuracy of the default random forest class weighting on the glass imbalanced multi-class classification dataset.

    The complete example is listed below.


    Running the example evaluates the default random forest algorithm with 1,000 trees on the glass dataset using repeated stratified k-fold cross-validation.

    The mean and standard deviation classification accuracy are reported at the end of the run.
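    The evaluation procedure described above can be sketched as follows. The number of folds and repeats, the reduced number of trees, and the synthetic dataset are all assumptions made to keep the sketch short and self-contained; substitute the glass data and 1,000 trees to mirror the tutorial.

```python
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def evaluate_model(X, y, model, n_splits=5, n_repeats=3):
    """Return one accuracy score per fold of repeated stratified k-fold CV."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=1)
    return cross_val_score(model, X, y, scoring="accuracy", cv=cv)

# Synthetic stand-in for the glass data: 214 rows, 9 inputs, 6 uneven classes.
X, y = make_classification(n_samples=214, n_features=9, n_informative=6, n_redundant=0,
                           n_classes=6, n_clusters_per_class=1,
                           weights=[0.33, 0.35, 0.08, 0.06, 0.04, 0.14], random_state=1)

# Fewer trees than the 1,000 used in the tutorial, to keep the sketch quick.
model = RandomForestClassifier(n_estimators=100, random_state=1)
scores = evaluate_model(X, y, model)
print("Mean Accuracy: %.3f (%.3f)" % (mean(scores), std(scores)))
```

    Stratified folds keep roughly the same class proportions in every split, which matters when the smallest class has only a handful of rows.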

    Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

    In this case, we can see that the default model achieved a classification accuracy of about 79.6 percent.


    We can set the “class_weight” argument to the value “balanced”, which automatically calculates a weight for each class that is inversely proportional to its frequency, so that each class contributes equally during the training of the model.
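    To see what the “balanced” setting produces, the weights can be computed directly with scikit-learn's compute_class_weight() function, which uses n_samples / (n_classes * class_count). The label vector below is rebuilt from the encoded class counts derived earlier rather than loaded from the file.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Encoded labels rebuilt from the class counts reported earlier.
counts = [70, 76, 17, 13, 9, 29]
y = np.repeat(np.arange(len(counts)), counts)

# "balanced" uses n_samples / (n_classes * count_of_class).
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

for label, w in zip(classes, weights):
    print("Class=%d, weight=%.3f" % (label, w))
```

    The two large classes receive weights below 1.0 and the four small classes receive weights above 1.0, so that weight multiplied by count is the same for every class.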


    Tying this together, the complete example is listed below.


    Running the example reports the mean and standard deviation classification accuracy of the cost-sensitive version of random forest on the glass dataset.

    Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

    In this case, we can see that the cost-sensitive model achieved a lift in classification accuracy over the cost-insensitive version of the algorithm, with 80.2 percent classification accuracy vs. 79.6 percent.


    The “class_weight” argument takes a dictionary of class labels mapped to a class weighting value.

    We can use this to specify a custom weighting, such as a default weighting of 1.0 for classes 0 and 1 that have many examples and a double class weighting of 2.0 for the other classes.
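    A minimal sketch of this custom weighting is shown below. The dictionary follows the description above; the synthetic six-class dataset and the reduced number of trees are assumptions used only to show that RandomForestClassifier accepts the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Weight 1.0 for the two large classes (0 and 1), 2.0 for the other four.
weights = {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 2.0, 5: 2.0}

# Synthetic six-class data, used only to show that the dictionary is accepted.
X, y = make_classification(n_samples=300, n_features=9, n_informative=6, n_redundant=0,
                           n_classes=6, n_clusters_per_class=1, random_state=1)

model = RandomForestClassifier(n_estimators=50, class_weight=weights, random_state=1)
model.fit(X, y)
print(model.classes_)
```

    The dictionary keys must match the encoded class labels seen during fitting, which is why the weights are keyed 0 to 5 rather than 1 to 7.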


    Tying this together, the complete example of using a custom class weighting for cost-sensitive learning on the glass multi-class imbalanced classification problem is listed below.


    Running the example reports the mean and standard deviation classification accuracy of the cost-sensitive version of random forest on the glass dataset with custom weights.

    Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

    In this case, we can see that we achieved a further lift in accuracy from about 80.2 percent with balanced class weighting to 80.8 percent with a more biased class weighting.


    Summary

    In this tutorial, you discovered how to use the tools of imbalanced classification with a multi-class dataset.

    Specifically, you learned:

    • About the glass identification standard imbalanced multi-class prediction problem.
    • How to use SMOTE oversampling for imbalanced multi-class classification.
    • How to use cost-sensitive learning for imbalanced multi-class classification.

    Do you have any questions?
    Ask your questions in the comments below and I will do my best to answer.

    How to Use AutoKeras for Classification and Regression

    AutoML refers to techniques for automatically discovering the best-performing model for a given dataset.

    When applied to neural networks, this involves both discovering the model architecture and the hyperparameters used to train the model, generally referred to as neural architecture search.

    AutoKeras is an open-source library for performing AutoML for deep learning models. The search is performed using so-called Keras models via the TensorFlow tf.keras API.

    It provides a simple and effective approach for automatically finding top-performing models for a wide range of predictive modeling tasks, including tabular or so-called structured classification and regression datasets.

    In this tutorial, you will discover how to use AutoKeras to find good neural network models for classification and regression tasks.

    After completing this tutorial, you will know:

    • AutoKeras is an implementation of AutoML for deep learning that uses neural architecture search.
    • How to use AutoKeras to find a top-performing model for a binary classification dataset.
    • How to use AutoKeras to find a top-performing model for a regression dataset.

    Let’s get started.

    How to Use AutoKeras for Classification and Regression
    Photo by kanu101, some rights reserved.

    Tutorial Overview

    This tutorial is divided into three parts; they are:

  • AutoKeras for Deep Learning
  • AutoKeras for Classification
  • AutoKeras for Regression

    AutoKeras for Deep Learning

    Automated Machine Learning, or AutoML for short, refers to automatically finding the best combination of data preparation, model, and model hyperparameters for a predictive modeling problem.

    The benefit of AutoML is allowing machine learning practitioners to quickly and effectively address predictive modeling tasks with very little input, e.g. fire and forget.

    Automated Machine Learning (AutoML) has become a very important research topic with wide applications of machine learning techniques. The goal of AutoML is to enable people with limited machine learning background knowledge to use machine learning models easily.

    — Auto-keras: An efficient neural architecture search system, 2019.

    AutoKeras is an implementation of AutoML for deep learning models using the Keras API, specifically the tf.keras API provided by TensorFlow 2.

    It uses a process of searching through neural network architectures to best address a modeling task, referred to more generally as Neural Architecture Search, or NAS for short.

    … we have developed a widely adopted open-source AutoML system based on our proposed method, namely Auto-Keras. It is an open-source AutoML system, which can be downloaded and installed locally.

    — Auto-keras: An efficient neural architecture search system, 2019.

    In the spirit of Keras, AutoKeras provides an easy-to-use interface for different tasks, such as image classification, structured data classification or regression, and more. The user is only required to specify the location of the data and the number of models to try and is returned a model that achieves the best performance (under the configured constraints) on that dataset.

    Note: AutoKeras provides a TensorFlow 2 Keras model (e.g. tf.keras) and not a Standalone Keras model. As such, the library assumes that you have Python 3 and TensorFlow 2.1 or higher installed.

    To install AutoKeras, you can use pip, as follows:


    You can confirm the installation was successful and check the version number as follows:


    You should see output like the following:


    Once installed, you can then apply AutoKeras to find a good or great neural network model for your predictive modeling task.

    We will take a look at two common examples where you may want to use AutoKeras, classification and regression on tabular data, so-called structured data.

    AutoKeras for Classification

    AutoKeras can be used to discover a good or great model for classification tasks on tabular data.

    Recall tabular data are those datasets composed of rows and columns, such as a table or data as you would see in a spreadsheet.

    In this section, we will develop a model for the Sonar classification dataset for classifying sonar returns as rocks or mines. This dataset consists of 208 rows of data with 60 input features and a target class label of 0 (rock) or 1 (mine).

    A naive model can achieve a classification accuracy of about 53.4 percent via repeated 10-fold cross-validation, which provides a baseline that any useful model should beat. A good model can achieve an accuracy of about 88.2 percent, which gives a rough target for strong performance.

    You can learn more about the dataset here:

    No need to download the dataset; we will download it automatically as part of the example.

    First, we can download the dataset and split it into a randomly selected train and test set, holding 33 percent for test and using 67 percent for training.
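    The split itself can be sketched with scikit-learn's train_test_split() function. Random numbers with the same shape as the Sonar dataset (208 rows, 60 inputs, a binary target) stand in for the real file here, so only the resulting shapes are meaningful; the random_state value is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in with the Sonar dataset's shape: 208 rows, 60 inputs, binary target.
rng = np.random.default_rng(1)
X = rng.random((208, 60))
y = rng.integers(0, 2, size=208)

# Hold back 33 percent of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

    With 208 rows, a 33 percent hold-out leaves 139 rows for training and 69 rows for testing.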

    The complete example is listed below.


    Running the example first downloads the dataset and summarizes the shape, showing the expected number of rows and columns.

    The dataset is then split into input and output elements, then these elements are further split into train and test datasets.


    We can use AutoKeras to automatically discover an effective neural network model for this dataset.

    This can be achieved by using the StructuredDataClassifier class and specifying the number of models to search. This defines the search to perform.


    We can then execute the search using our loaded dataset.


    This may take a few minutes and will report the progress of the search.

    Next, we can evaluate the model on the test dataset to see how it performs on new data.


    We then use the model to make a prediction for a new row of data.


    We can retrieve the final model, which is an instance of a TensorFlow Keras model.


    We can then summarize the structure of the model to see what was selected.


    Finally, we can save the model to file for later use, which can be loaded using the TensorFlow load_model() function.


    Tying this together, the complete example of applying AutoKeras to find an effective neural network model for the Sonar dataset is listed below.


    Running the example will report a lot of debug information about the progress of the search.

    The models and results are all saved in a folder called “structured_data_classifier” in your current working directory.


    The best-performing model is then evaluated on the hold-out test dataset.

    Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

    In this case, we can see that the model achieved a classification accuracy of about 82.6 percent.


    Next, the architecture of the best-performing model is reported.

    We can see a model with two hidden layers with dropout and ReLU activation.


    AutoKeras for Regression

    AutoKeras can also be used for regression tasks, that is, predictive modeling problems where a numeric value is predicted.

    We will use the auto insurance dataset that involves predicting the total payment from claims given the total number of claims. The dataset has 63 rows and one input and one output variable.

    A naive model can achieve a mean absolute error (MAE) of about 66 using repeated 10-fold cross-validation, which provides a baseline that any useful model should beat. A good model can achieve an MAE of about 28, which gives a rough target for strong performance.

    You can learn more about this dataset here:

    We can load the dataset and split it into input and output elements and then train and test datasets.

    The complete example is listed below.


    Running the example loads the dataset, confirming the number of rows and columns, then splits the dataset into train and test sets.


    AutoKeras can be applied to a regression task using the StructuredDataRegressor class and configured for the number of models to trial.


    The search can then be run and the best model saved, much like in the classification case.


    We can then use the best-performing model and evaluate it on the hold out dataset, make a prediction on new data, and summarize its structure.


    Tying this together, the complete example of using AutoKeras to discover an effective neural network model for the auto insurance dataset is listed below.


    Running the example will report a lot of debug information about the progress of the search.

    The models and results are all saved in a folder called “structured_data_regressor” in your current working directory.


    The best-performing model is then evaluated on the hold-out test dataset.

    Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

    In this case, we can see that the model achieved a MAE of about 24.


    Next, the architecture of the best-performing model is reported.

    We can see a model with two hidden layers with ReLU activation.


    Summary

    In this tutorial, you discovered how to use AutoKeras to find good neural network models for classification and regression tasks.

    Specifically, you learned:

    • AutoKeras is an implementation of AutoML for deep learning that uses neural architecture search.
    • How to use AutoKeras to find a top-performing model for a binary classification dataset.
    • How to use AutoKeras to find a top-performing model for a regression dataset.

    Do you have any questions?
    Ask your questions in the comments below and I will do my best to answer.

    Multi-Label Classification with Deep Learning

    Last Updated on August 31, 2020

    Multi-label classification involves predicting zero or more class labels.

    Unlike normal classification tasks where class labels are mutually exclusive, multi-label classification requires specialized machine learning algorithms that support predicting multiple mutually non-exclusive classes or “labels.”

    Deep learning neural networks are an example of an algorithm that natively supports multi-label classification problems. Neural network models for multi-label classification tasks can be easily defined and evaluated using the Keras deep learning library.

    In this tutorial, you will discover how to develop deep learning models for multi-label classification.

    After completing this tutorial, you will know:

    • Multi-label classification is a predictive modeling task that involves predicting zero or more mutually non-exclusive class labels.
    • Neural network models can be configured for multi-label classification tasks.
    • How to evaluate a neural network for multi-label classification and make a prediction for new data.

    Let’s get started.

    Multi-Label Classification with Deep Learning
    Photo by Trevor Marron, some rights reserved.

    Tutorial Overview

    This tutorial is divided into three parts; they are:

    • Multi-Label Classification
    • Neural Networks for Multiple Labels
    • Neural Network for Multi-Label Classification

    Multi-Label Classification

    Classification is a predictive modeling problem that involves outputting a class label given some input.

    It is different from regression tasks that involve predicting a numeric value.

    Typically, a classification task involves predicting a single label. Alternately, it might involve predicting the likelihood across two or more class labels. In these cases, the classes are mutually exclusive, meaning the classification task assumes that the input belongs to one class only.

    Some classification tasks require predicting more than one class label. This means that class labels or class membership are not mutually exclusive. These tasks are referred to as multiple label classification, or multi-label classification for short.

    In multi-label classification, zero or more labels are required as output for each input sample, and the outputs are required simultaneously. The assumption is that the output labels are a function of the inputs.

    We can create a synthetic multi-label classification dataset using the make_multilabel_classification() function in the scikit-learn library.

    Our dataset will have 1,000 samples with 10 input features. The dataset will have three class label outputs for each sample and each class will have one of two values (0 or 1, e.g. present or not present).
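    A sketch of creating such a dataset is shown below. The sample, feature, and class counts follow the description above; the n_labels value (the average number of labels per sample) and the random_state are assumptions chosen for illustration.

```python
from sklearn.datasets import make_multilabel_classification

# 1,000 samples, 10 inputs, 3 possible labels; n_labels is the average
# number of labels per sample (2 is an assumed setting, not from the text).
X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=3,
                                      n_labels=2, random_state=1)

# Summarize the shapes and show the first few rows.
print(X.shape, y.shape)
for i in range(5):
    print(X[i], y[i])
```

    Each row of y is a vector of three 0/1 flags, and a row may contain all zeros, which corresponds to a sample with no labels.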

    The complete example of creating and summarizing the synthetic multi-label classification dataset is listed below.


    Running the example creates the dataset and summarizes the shape of the input and output elements.

    We can see that, as expected, there are 1,000 samples, each with 10 input features and three output features.

    The first 10 rows of inputs and outputs are summarized and we can see that all inputs for this dataset are numeric and that output class labels have 0 or 1 values for each of the three class labels.


    Next, let’s look at how we can develop neural network models for multi-label classification tasks.

    Neural Networks for Multiple Labels

    Some machine learning algorithms support multi-label classification natively.

    Neural network models can be configured to support multi-label classification and can perform well, depending on the specifics of the classification task.

    Multi-label classification can be supported directly by neural networks simply by setting the number of nodes in the output layer to the number of target labels in the problem. For example, a task that has three output labels (classes) will require a neural network with three nodes in the output layer.

    Each node in the output layer must use the sigmoid activation. This will predict a probability of class membership for the label, a value between 0 and 1. Finally, the model must be fit with the binary cross-entropy loss function.

    In summary, to configure a neural network model for multi-label classification, the specifics are:

    • Number of nodes in the output layer matches the number of labels.
    • Sigmoid activation for each node in the output layer.
    • Binary cross-entropy loss function.
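    The effect of these three choices can be illustrated with plain NumPy, independent of any deep learning library. The sketch below applies a sigmoid to made-up output-layer activations and computes the mean binary cross-entropy; the function names and values are illustrative assumptions, not Keras code.

```python
import numpy as np

def sigmoid(z):
    """Squash each output node independently into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all samples and all labels."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

# Raw output-layer activations for two samples and three labels (made-up values).
logits = np.array([[2.0, -1.0, 0.5],
                   [-3.0, 0.0, 4.0]])
y_true = np.array([[1, 0, 1],
                   [0, 0, 1]])

y_prob = sigmoid(logits)
print(y_prob.round(3))
print("Loss: %.3f" % binary_cross_entropy(y_true, y_prob))
```

    Unlike a softmax output, the sigmoid probabilities in each row do not need to sum to one, which is what allows several labels to be predicted as present at the same time.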

    We can demonstrate this using the Keras deep learning library.

    We will define a Multilayer Perceptron (MLP) model for the multi-label classification task defined in the previous section.

    Each sample has 10 inputs and three outputs; therefore, the network requires an input layer that expects 10 inputs specified via the “input_dim” argument in the first hidden layer and three nodes in the output layer.

    We will use the popular ReLU activation function in the hidden layer. The hidden layer has 20 nodes that were chosen after some trial and error. We will fit the model using binary cross-entropy loss and the Adam version of stochastic gradient descent.

    The definition of the network for the multi-label classification task is listed below.


    You may want to adapt this model for your own multi-label classification task; therefore, we can create a function to define and return the model where the number of input and output variables is provided as arguments.


    Now that we are familiar with how to define an MLP for multi-label classification, let’s explore how this model can be evaluated.

    Neural Network for Multi-Label Classification

    If the dataset is small, it is good practice to evaluate neural network models repeatedly on the same dataset and report the mean performance across the repeats.

    This is because of the stochastic nature of the learning algorithm.

    Additionally, it is good practice to use k-fold cross-validation instead of a single train/test split to get a more reliable estimate of model performance when making predictions on new data. Again, this is only practical if the dataset is small enough that the process can be completed in a reasonable time.

    Taking this into account, we will evaluate the MLP model on the multi-label classification task using repeated k-fold cross-validation with 10 folds and three repeats.
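
    As a quick illustration of what this procedure implies, the sketch below uses scikit-learn's RepeatedKFold class to enumerate the splits. The placeholder arrays and the row count of 1,000 are assumptions for illustration; only the 10 inputs, three labels, 10 folds, and three repeats come from the text.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Placeholder data: 10 input variables and 3 labels per row.
# The 1,000 rows are an assumed size, used only to show the split sizes.
X = np.zeros((1000, 10))
y = np.zeros((1000, 3))

cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

n_evaluations = 0
for train_ix, test_ix in cv.split(X):
    n_evaluations += 1

print(n_evaluations)                # 30 (10 folds x 3 repeats)
print(len(train_ix), len(test_ix))  # 900 100 for the last split
```

    In other words, one model is fit and scored 30 times, each time training on 90 percent of the rows and testing on the remaining 10 percent.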

    The MLP model will predict the probability for each class label by default. This means it will predict three probabilities for each sample. These can be converted to crisp class labels by rounding the values to either 0 or 1. We can then calculate the classification accuracy for the crisp class labels.
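
    The rounding step can be sketched with a few made-up predictions. The probability values and targets below are hypothetical. Note that scikit-learn's accuracy_score() computes subset accuracy for multi-label targets: a row only counts as correct if every one of its labels matches.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical predicted probabilities for three samples and three labels
yhat_prob = np.array([[0.9, 0.2, 0.7],
                      [0.4, 0.6, 0.1],
                      [0.8, 0.3, 0.3]])

# Hypothetical true labels for the same samples
y_true = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 0, 0]])

# Round each probability to a crisp 0 or 1 label
yhat = yhat_prob.round().astype(int)

# Subset accuracy: the second row misses its third label, so 2 of 3 rows match
acc = accuracy_score(y_true, yhat)
print(yhat)
print('%.3f' % acc)  # 0.667
```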


    The scores are collected and can be summarized by reporting the mean and standard deviation across all repeats and cross-validation folds.
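
    For example, given a list of per-fold accuracy scores, the summary might be computed as follows. The scores here are invented for illustration and are not results from the tutorial.

```python
from numpy import mean, std

# Hypothetical accuracy scores from four evaluation runs
scores = [0.80, 0.82, 0.79, 0.83]

mean_acc = mean(scores)
std_acc = std(scores)
print('Accuracy: %.3f (%.3f)' % (mean_acc, std_acc))
```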

    The evaluate_model() function below takes the dataset, evaluates the model, and returns a list of evaluation scores, in this case, accuracy scores.
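
    A sketch of how such a function could be written is shown here. It bundles a get_model() helper so that it runs on its own. The n_splits, n_repeats, and epochs arguments and the default of 100 training epochs are assumptions added for illustration.

```python
from sklearn.model_selection import RepeatedKFold
from sklearn.metrics import accuracy_score
from keras.models import Sequential
from keras.layers import Dense

def get_model(n_inputs, n_outputs):
    """Define and compile an MLP for multi-label classification."""
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, activation='relu'))
    model.add(Dense(n_outputs, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

def evaluate_model(X, y, n_splits=10, n_repeats=3, epochs=100):
    """Evaluate the MLP with repeated k-fold cross-validation.

    Returns a list with one accuracy score per fold and repeat.
    """
    results = []
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=1)
    for train_ix, test_ix in cv.split(X):
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        # Fit a fresh model on the training folds
        model = get_model(n_inputs, n_outputs)
        model.fit(X_train, y_train, verbose=0, epochs=epochs)
        # Predict probabilities, then round to crisp 0/1 labels
        yhat = model.predict(X_test, verbose=0).round().astype(int)
        acc = accuracy_score(y_test, yhat)
        print('>%.3f' % acc)
        results.append(acc)
    return results
```

    Calling evaluate_model(X, y) with the default arguments fits 30 models, so it can take a while on larger datasets; smaller values of n_splits, n_repeats, or epochs give a faster but noisier estimate.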


    We can then load our dataset and evaluate the model and report the mean performance.

    Tying this together, the complete example is listed below.


    Running the example reports the classification accuracy for each fold and each repeat, to give an idea of the evaluation progress.

    Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

    At the end, the mean and standard deviation of the accuracy are reported. In this case, the model is shown to achieve an accuracy of about 81.2 percent.

    You can use this code as a template for evaluating MLP models on your own multi-label classification tasks. The number of nodes and layers in the model can easily be adapted and tailored to the complexity of your dataset.


    Once a model configuration is chosen, we can use it to fit a final model on all available data and make a prediction for new data.

    The example below demonstrates this by first fitting the MLP model on the entire multi-label classification dataset, then calling the predict() function on the fit model in order to make a prediction for a new row of data.
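
    A sketch of this workflow follows. The dataset is assumed to be a synthetic one created with scikit-learn's make_multilabel_classification() (1,000 rows, 10 inputs, three labels); the number of training epochs and the values in the new row are likewise assumptions chosen for illustration.

```python
from numpy import asarray
from sklearn.datasets import make_multilabel_classification
from keras.models import Sequential
from keras.layers import Dense

def get_dataset():
    # Assumed synthetic dataset: 1,000 rows, 10 inputs, 3 labels
    X, y = make_multilabel_classification(n_samples=1000, n_features=10,
                                          n_classes=3, n_labels=2,
                                          random_state=1)
    return X, y

def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, activation='relu'))
    model.add(Dense(n_outputs, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

# Fit the final model on all available data
X, y = get_dataset()
n_inputs, n_outputs = X.shape[1], y.shape[1]
model = get_model(n_inputs, n_outputs)
model.fit(X, y, verbose=0, epochs=100)

# Hypothetical new row with 10 input values
row = [3, 3, 6, 7, 8, 2, 11, 11, 1, 3]
newX = asarray([row], dtype='float32')

# One probability per label for the new row
yhat = model.predict(newX, verbose=0)
print('Predicted: %s' % yhat[0])
```

    The predicted probabilities can be rounded to 0 or 1 if crisp labels are needed.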


    Running the example fits the model and makes a prediction for a new row. As expected, the prediction contains three output variables required for the multi-label classification task: the probabilities of each class label.


    Summary

    In this tutorial, you discovered how to develop deep learning models for multi-label classification.

    Specifically, you learned:

    • Multi-label classification is a predictive modeling task that involves predicting zero or more mutually non-exclusive class labels.
    • Neural network models can be configured for multi-label classification tasks.
    • How to evaluate a neural network for multi-label classification and make a prediction for new data.

    Do you have any questions?
    Ask your questions in the comments below and I will do my best to answer.
