Weka

Language: Java

Category: Machine Learning (ML/AI)

Weka was developed at the University of Waikato to provide an accessible platform for teaching, research, and experimentation in machine learning. It offers a wide range of algorithms and evaluation methods, making it popular in academia and for rapid prototyping of ML models in Java.

Weka is a collection of machine learning algorithms for data mining tasks in Java. It provides tools for classification, regression, clustering, association rule mining, and data preprocessing, with both a GUI and programmatic API.

Installation

Maven:

<dependency>
  <groupId>nz.ac.waikato.cms.weka</groupId>
  <artifactId>weka-stable</artifactId>
  <version>3.8.6</version>
</dependency>

Gradle:

implementation 'nz.ac.waikato.cms.weka:weka-stable:3.8.6'

Usage

Weka provides an API for loading datasets, applying machine learning algorithms, evaluating models, and exporting results. It supports ARFF, CSV, and database inputs, and integrates preprocessing filters, feature selection, and model evaluation methods.

Loading a dataset and printing summary

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Load the ARFF file and, if no class attribute is set, use the last attribute
DataSource source = new DataSource("data/iris.arff");
Instances data = source.getDataSet();
if (data.classIndex() == -1) {
    data.setClassIndex(data.numAttributes() - 1);
}
System.out.println(data);

Loads an ARFF dataset and prints a summary of instances and attributes.
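
The Usage section also mentions CSV input; a minimal sketch of loading a CSV file with CSVLoader (the data/iris.csv path is just an example):

import weka.core.Instances;
import weka.core.converters.CSVLoader;

import java.io.File;

// Load a CSV file; here the last attribute is assumed to be the class
CSVLoader loader = new CSVLoader();
loader.setSource(new File("data/iris.csv"));
Instances csvData = loader.getDataSet();
csvData.setClassIndex(csvData.numAttributes() - 1);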

Training a simple classifier

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;

// Build a J48 decision tree and estimate its accuracy with
// 10-fold cross-validation (fixed seed for reproducibility)
J48 tree = new J48();
tree.buildClassifier(data);
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(tree, data, 10, new java.util.Random(1));
System.out.println(eval.toSummaryString());

Trains a J48 decision tree on the dataset and evaluates it using 10-fold cross-validation.
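
Once built, the tree can also classify individual instances. A minimal sketch, reusing tree and data from above and classifying the first training instance purely for illustration:

import weka.core.Instance;

// Predict the class of a single instance and map the index back to a label
Instance inst = data.instance(0);
double predictedIndex = tree.classifyInstance(inst);
String predictedLabel = data.classAttribute().value((int) predictedIndex);
System.out.println("Predicted class: " + predictedLabel);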

Data preprocessing with filters

import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

// Rescale all numeric attributes into the [0, 1] range
Normalize normalize = new Normalize();
normalize.setInputFormat(data);
Instances normalizedData = Filter.useFilter(data, normalize);

Applies normalization preprocessing to the dataset before training.
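
If zero-mean, unit-variance scaling is preferred over min-max normalization, the Standardize filter can be applied the same way; a minimal sketch:

import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Standardize;

// Scale numeric attributes to zero mean and unit variance
Standardize standardize = new Standardize();
standardize.setInputFormat(data);
Instances standardizedData = Filter.useFilter(data, standardize);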

Clustering

import weka.clusterers.SimpleKMeans;

// Group the instances into three clusters (k-means cannot handle a class attribute)
SimpleKMeans kMeans = new SimpleKMeans();
kMeans.setNumClusters(3);
kMeans.buildClusterer(data);

Performs k-means clustering on the dataset. SimpleKMeans refuses data with a class attribute set, so if one was assigned during loading, remove it first, as sketched below.
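
A minimal sketch of stripping the class attribute before clustering, assuming data and kMeans from above (the clusterData name is just for illustration):

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Drop the class attribute (the filter's attribute indices are 1-based)
Remove remove = new Remove();
remove.setAttributeIndices("" + (data.classIndex() + 1));
remove.setInputFormat(data);
Instances clusterData = Filter.useFilter(data, remove);
kMeans.buildClusterer(clusterData);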

Evaluating classifiers with ROC

// Use a fresh Evaluation object so the cross-validation statistics above are not mixed in
Evaluation rocEval = new Evaluation(data);
rocEval.evaluateModel(tree, data);
System.out.println(rocEval.areaUnderROC(1));

Evaluates the classifier on the training data (an optimistic estimate; prefer cross-validation or a held-out test set) and prints the area under the ROC curve for the class with index 1.
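
Evaluation also exposes per-class statistics and the confusion matrix; a short sketch, assuming the rocEval object from the snippet above:

// Per-class precision, recall and F-measure, plus the confusion matrix
System.out.println(rocEval.toClassDetailsString());
System.out.println(rocEval.toMatrixString());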

Feature selection

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;

// Rank all attributes by their information gain with respect to the class
AttributeSelection selector = new AttributeSelection();
selector.setEvaluator(new InfoGainAttributeEval());
selector.setSearch(new Ranker());
selector.SelectAttributes(data);

Performs feature selection using information gain and ranks attributes.
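
The results can then be inspected or used to build a reduced dataset; a minimal sketch, assuming the selector and data objects from above:

// Print the ranking produced by the search
System.out.println(selector.toResultsString());

// Build a copy of the dataset containing only the selected attributes
Instances reduced = selector.reduceDimensionality(data);
System.out.println(reduced.numAttributes() + " attributes after selection");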

Error Handling

Exception: No class attribute assigned: Ensure the dataset has the class attribute set before training classifiers.
ArrayIndexOutOfBoundsException: Check dataset formatting and ensure ARFF/CSV files are correctly structured.
Exception: Cannot handle string attributes: Convert string attributes to nominal or numeric using preprocessing filters (see the sketch below).
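
As an example of the last point, a minimal sketch using the StringToNominal filter (the range is passed via the generic -R option; "first-last", covering all attributes, is just an example):

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToNominal;

// Convert string attributes in the given range to nominal
StringToNominal toNominal = new StringToNominal();
toNominal.setOptions(new String[] { "-R", "first-last" });
toNominal.setInputFormat(data);
Instances nominalData = Filter.useFilter(data, toNominal);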

Best Practices

Normalize or standardize data when needed.

Use cross-validation to assess model performance reliably.

Filter irrelevant or redundant features to improve accuracy.

Try multiple algorithms and compare results (see the sketch after this list).

Leverage Weka’s evaluation metrics to understand classifier performance.
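
As a sketch of the "try multiple algorithms" advice, the following compares a few classifiers under the same 10-fold cross-validation, assuming data from the loading example (the chosen algorithms are just examples):

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.J48;

// Cross-validate each candidate with the same folds and report accuracy
Classifier[] candidates = { new J48(), new NaiveBayes(), new IBk() };
for (Classifier c : candidates) {
    Evaluation e = new Evaluation(data);
    e.crossValidateModel(c, data, 10, new java.util.Random(1));
    System.out.printf("%s: %.2f%% correct%n", c.getClass().getSimpleName(), e.pctCorrect());
}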