Sentiment analysis with ML.NET explained

ML.NET is a machine learning library that is part of the .NET ecosystem. It provides a set of APIs for building, training, and deploying machine learning models in .NET applications. ML.NET supports a wide range of machine learning scenarios, including classification, regression, clustering, and recommendation systems, among others.

The library is built on top of the .NET runtime and is designed to work seamlessly with other .NET technologies, such as ASP.NET, Azure, and Xamarin. It can consume data from a variety of sources, including delimited text files such as CSV and TSV, in-memory .NET collections, and databases.

ML.NET provides a high-level API that simplifies the process of building machine learning models. It includes pre-built transforms and trainers that can be chained together into machine learning pipelines, and it streams data from disk during training, which lets you work with datasets that do not fit in memory.
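To give a flavor of that pipeline style before diving into the approaches below, here is a minimal, self-contained sketch; the Sample class and the tiny in-memory dataset are purely illustrative:

using System.Collections.Generic;
using Microsoft.ML;

public class Sample
{
    public string Text { get; set; }
    public bool Sentiment { get; set; }
}

class QuickStart
{
    static void Main()
    {
        var mlContext = new MLContext();

        // Any in-memory collection can be wrapped as an IDataView
        var samples = new List<Sample>
        {
            new Sample { Text = "loved it", Sentiment = true },
            new Sample { Text = "hated it", Sentiment = false }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Transforms and a trainer are appended into a single pipeline,
        // and Fit produces a reusable model
        var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Sample.Text))
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: nameof(Sample.Sentiment), featureColumnName: "Features"));

        ITransformer model = pipeline.Fit(data);
    }
}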

ML.NET is an open-source library available on GitHub. It is actively maintained by Microsoft and the .NET community, which keeps it up to date with the latest developments in the field of machine learning.

Let us look at three different approaches for getting started; you can pick any of them to learn from.

Approach 1

As a first trial, let us understand the code by creating our own small dataset. This is a hands-on way to get started:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Define the input data schema
public class InputData
{
    [LoadColumn(0)]
    public string Text;

    [LoadColumn(1)]
    public bool Sentiment;
}

// Define the output data schema
public class OutputData
{
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
}

class Program
{
    static void Main(string[] args)
    {
        // Step 1: Generate the dataset
        var dataPath = Path.Combine(Environment.CurrentDirectory, "sentiment.csv");
        GenerateDataset(dataPath);

        // Step 2: Load the dataset
        var mlContext = new MLContext();
        var data = mlContext.Data.LoadFromTextFile<InputData>(
            dataPath, separatorChar: ',', hasHeader: true);

        // Step 3: Prepare the data for training
        var trainTestSplit = mlContext.Data.TrainTestSplit(data, testFraction: 0.3);

        // Featurize the raw text into a numeric feature vector
        var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(InputData.Text));

        // Step 4: Train the model (the Boolean Sentiment column is used as the label)
        var trainer = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
            labelColumnName: nameof(InputData.Sentiment), featureColumnName: "Features");
        var trainingPipeline = pipeline.Append(trainer);
        var model = trainingPipeline.Fit(trainTestSplit.TrainSet);

        // Step 5: Evaluate the model
        var predictions = model.Transform(trainTestSplit.TestSet);
        var metrics = mlContext.BinaryClassification.Evaluate(
            predictions, labelColumnName: nameof(InputData.Sentiment));
        Console.WriteLine($"Accuracy: {metrics.Accuracy}");

        // Step 6: Test the model
        var predictionEngine = mlContext.Model.CreatePredictionEngine<InputData, OutputData>(model);
        var testData = new InputData { Text = "This is a great movie!", Sentiment = true };
        var prediction = predictionEngine.Predict(testData);
        Console.WriteLine($"Prediction: {prediction.Sentiment}");
    }

    private static void GenerateDataset(string filePath)
    {
        var data = new List<InputData>
        {
            new InputData { Text = "I love this product", Sentiment = true },
            new InputData { Text = "This is the worst product ever", Sentiment = false },
            new InputData { Text = "The customer service was amazing", Sentiment = true },
            new InputData { Text = "I will never buy from this company again", Sentiment = false },
            new InputData { Text = "The quality of the product was poor", Sentiment = false },
            new InputData { Text = "This is the best restaurant in town", Sentiment = true },
            new InputData { Text = "I had a terrible experience at this hotel", Sentiment = false },
            new InputData { Text = "The movie was fantastic", Sentiment = true },
            new InputData { Text = "I would not recommend this book", Sentiment = false },
            new InputData { Text = "The concert was a complete disappointment", Sentiment = false }
        };

        // Write the samples out as a comma-separated file with a header row,
        // so that LoadFromTextFile can read it back in
        var lines = new List<string> { "Text,Sentiment" };
        lines.AddRange(data.Select(d => $"{d.Text},{d.Sentiment.ToString().ToLowerInvariant()}"));
        File.WriteAllLines(filePath, lines);
    }
}

Running the above should give you a good sense of how it all comes together.

Approach 2

Let’s create a console application in Visual Studio and add the following NuGet packages: Microsoft.ML and Microsoft.ML.FastTree.

Next, let’s create a class to represent our data:

public class SentimentData
{
    public string SentimentText { get; set; }
    public bool Sentiment { get; set; }
}

The SentimentData class has two properties: SentimentText, which holds the text to analyze, and Sentiment, which holds the sentiment label (true for positive, false for negative).

Next, let’s create a method to load the dataset from a file:

public static IEnumerable<SentimentData> LoadData(string path)
{
    return File.ReadAllLines(path)
        .Select(line => line.Split('\t'))
        .Select(line => new SentimentData { SentimentText = line[0], Sentiment = bool.Parse(line[1]) });
}

This method reads the data from a tab-separated file and returns an IEnumerable of SentimentData objects.

Now, let’s create a method to prepare the data for training:

public static IDataView PrepareData(MLContext mlContext, IEnumerable<SentimentData> data)
{
    // Turn the in-memory collection into an IDataView, the tabular
    // format that ML.NET pipelines consume
    return mlContext.Data.LoadFromEnumerable(data);
}

This method takes an MLContext object and an IEnumerable of SentimentData objects as inputs and converts the collection into an IDataView, the tabular format that ML.NET pipelines consume. The text featurization is done as part of the training pipeline in the next step, so that the transforms are fit on the training data only and the same fitted transforms are later applied to the test data.

Now, let’s create a method to train the model:

public static ITransformer TrainModel(MLContext mlContext, IDataView data)
{
    // Featurize the text, copy the Boolean sentiment into the "Label" column,
    // and append a FastTree binary classification trainer.
    // FastTreeBinaryTrainer.Options comes from Microsoft.ML.Trainers.FastTree
    // (the Microsoft.ML.FastTree package added earlier).
    var pipeline = mlContext.Transforms.Text
        .FeaturizeText("Features", nameof(SentimentData.SentimentText))
        .Append(mlContext.Transforms.CopyColumns("Label", nameof(SentimentData.Sentiment)))
        .Append(mlContext.BinaryClassification.Trainers.FastTree(
            new FastTreeBinaryTrainer.Options
            {
                NumberOfLeaves = 50,
                NumberOfTrees = 100,
                LabelColumnName = "Label",
                FeatureColumnName = "Features"
            }));

    return pipeline.Fit(data);
}

This method takes an MLContext object and an IDataView object as inputs. It builds a pipeline that featurizes the text, copies the Boolean sentiment into the Label column, and appends a FastTree binary classification trainer configured with a few hyperparameters, and then fits the pipeline to the input data to train the model.

Finally, let’s create a method to test the model:

public static void TestModel(MLContext mlContext, ITransformer model, IDataView data)
{
    var predictions = model.Transform(data);

    var metrics = mlContext.BinaryClassification.Evaluate(predictions);

    Console.WriteLine($"Accuracy: {metrics.Accuracy}");
    Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve}");
}

This method takes an MLContext object, the trained model, and the test data as inputs. It applies the model to the test data to generate predictions and evaluates those predictions using binary classification metrics.

Now, let’s put it all together in a Main method:

static void Main(string[] args)
{
    var mlContext = new MLContext();

    var trainData = LoadData("train.txt");
    var testData = LoadData("test.txt");

    var preparedTrainData = PrepareData(mlContext, trainData);
    var preparedTestData = PrepareData(mlContext, testData);

    var model = TrainModel(mlContext, preparedTrainData);

    TestModel(mlContext, model, preparedTestData);
}

This method creates an MLContext object, loads the training and test data, prepares the data, trains the model, and tests the model.
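If you also want to score a single sentence with the trained model, you can create a prediction engine from it. Here is a minimal sketch; the SentimentPrediction class is introduced here only to hold the predicted label and is not part of the code above:

public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Sentiment { get; set; }
}

// After TrainModel has produced the model
// (the ColumnName attribute requires using Microsoft.ML.Data)
var predictionEngine = mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(model);
var sample = new SentimentData { SentimentText = "The support team was very helpful" };
var result = predictionEngine.Predict(sample);
Console.WriteLine($"Positive: {result.Sentiment}");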

To generate your own dataset, you can simply create a text file with tab-separated values, where the first column represents the text of the sentiment and the second column represents the sentiment label (either true for positive or false for negative). You can then use this file to train and test your sentiment analysis model.
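For example, train.txt (and test.txt) would simply contain lines like the following, where the gap between the text and the label is a single tab character:

I love this product	true
This is the worst product ever	false
The delivery was quick and painless	true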

Approach 3

As a standard example we can also use the IMDB movie review dataset, which contains 50,000 movie reviews labeled as positive or negative. You can download the dataset from here: https://ai.stanford.edu/~amaas/data/sentiment/.

The raw download is organized as individual text files in positive and negative folders, so for this example assume the reviews have been consolidated into a single tab-separated file. Once that is done, you can split it into a training set and a test set; here we'll use 40,000 reviews for training and 10,000 for testing.

Next, we'll load the data into memory using the text loader that ML.NET exposes through LoadFromTextFile. Here's the code to load the data:

using Microsoft.ML;
using Microsoft.ML.Data;
using System;

namespace SentimentAnalysis
{
    class Program
    {
        static void Main(string[] args)
        {
            var context = new MLContext();

            // Load data
            var data = context.Data.LoadFromTextFile<MovieReview>(
                @"C:\path\to\imdb\dataset\train\imdb_train.txt",
                hasHeader: true,
                separatorChar: '\t');

            // Split data
            var trainTestData = context.Data.TrainTestSplit(data, testFraction: 0.2);

            // Prepare training data
            var trainData = trainTestData.TrainSet;

            // Prepare test data
            var testData = trainTestData.TestSet;
        }
    }

    public class MovieReview
    {
        [LoadColumn(0)]
        public string Label { get; set; }

        [LoadColumn(1)]
        public string Review { get; set; }
    }
}

In this code, we use the LoadFromTextFile method to load the data from a text file. We pass in the path to the file, set hasHeader to true because our file has a header row, and set separatorChar to ‘\t’ because the data is tab-separated.

We then use the TrainTestSplit method to split the data into a training set and a test set. We pass in the data and set testFraction to 0.2 to use 20% of the data for testing.

Next, we prepare the training and test data by setting trainData and testData to the training and test sets returned by TrainTestSplit.

Now that we have our data, we can create a machine learning pipeline to train our model. Here’s the code to create the pipeline:

// Create pipeline
var pipeline = context.Transforms.Conversion.MapValueToKey("Label", nameof(MovieReview.Label))
    .Append(context.Transforms.Text.FeaturizeText("Features", nameof(MovieReview.Review)))
    .Append(context.MulticlassClassification.Trainers.SdcaMaximumEntropy())
    .Append(context.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

// Train model
var model = pipeline.Fit(trainData);

In this code, we use MapValueToKey to convert the string Label column into a numeric key, and FeaturizeText to convert the text data in the Review column into a numerical vector of features. FeaturizeText already produces word and character n-grams internally, so no separate n-gram step is needed.

We then append an SDCA maximum entropy classifier as the trainer and use MapKeyToValue to map the predicted key back to the original string label. Calling Fit on the pipeline trains the model on the training data.
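From here, the trained model can be evaluated on the held-out test set and saved to disk for reuse. Below is a minimal sketch, continuing with the context, model, trainData, and testData variables from above (the output file name is just an example):

// Evaluate on the test set
var predictions = model.Transform(testData);
var metrics = context.MulticlassClassification.Evaluate(predictions);
Console.WriteLine($"Macro accuracy: {metrics.MacroAccuracy}");
Console.WriteLine($"Log-loss: {metrics.LogLoss}");

// Persist the trained model so it can be loaded later in another application
context.Model.Save(model, trainData.Schema, "sentiment-model.zip");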

To conclude, ML.NET is a set of libraries and tools provided by Microsoft that allows developers to easily build machine learning models in .NET languages such as C#. With ML.NET, developers can build and train a wide variety of machine learning models, including classification, regression, clustering, and recommendation systems. ML.NET also provides tools for data preparation, model evaluation, and deployment. By using ML.NET, developers can leverage the power of machine learning to solve complex problems and create intelligent applications.
