Language: Java
ML/AI / Machine Learning
Weka was developed at the University of Waikato to provide an accessible platform for teaching, research, and experimentation in machine learning. It offers a wide range of algorithms and evaluation methods, making it popular in academia and for rapid prototyping of ML models in Java.
Weka is a collection of machine learning algorithms for data mining tasks in Java. It provides tools for classification, regression, clustering, association rule mining, and data preprocessing, with both a GUI and programmatic API.
<dependency>
  <groupId>nz.ac.waikato.cms.weka</groupId>
  <artifactId>weka-stable</artifactId>
  <version>3.8.6</version>
</dependency>
implementation 'nz.ac.waikato.cms.weka:weka-stable:3.8.6'
Weka provides an API for loading datasets, applying machine learning algorithms, evaluating models, and exporting results. It supports ARFF, CSV, and database inputs, and integrates preprocessing filters, feature selection, and model evaluation methods.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
DataSource source = new DataSource("data/iris.arff");
Instances data = source.getDataSet();
if (data.classIndex() == -1) data.setClassIndex(data.numAttributes() - 1);
System.out.println(data.toSummaryString());
Loads an ARFF dataset, uses the last attribute as the class if none is set, and prints a summary of instances and attributes.
import weka.classifiers.trees.J48;
import weka.classifiers.Evaluation;
J48 tree = new J48();
tree.buildClassifier(data);
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(tree, data, 10, new java.util.Random(1));
System.out.println(eval.toSummaryString());
Trains a J48 decision tree on the dataset and evaluates it using 10-fold cross-validation.
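To make the idea behind 10-fold cross-validation concrete, the following is a minimal plain-Java sketch of how instance indices can be shuffled with a seed and dealt into folds. It is not Weka's implementation: the class `KFoldSketch` and its `folds` method are hypothetical names chosen for illustration, and Weka's own procedure may differ in details such as stratifying folds by class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative helper, not part of Weka.
class KFoldSketch {
    /** Shuffles the indices 0..n-1 with the given seed and deals them into k folds. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        // Round-robin dealing keeps fold sizes within one element of each other.
        for (int i = 0; i < n; i++) folds.get(i % k).add(indices.get(i));
        return folds;
    }

    public static void main(String[] args) {
        int n = 150;
        List<List<Integer>> folds = folds(n, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            // Fold f serves as the test set; the remaining folds form the training set.
            int testSize = folds.get(f).size();
            System.out.println("fold " + f + ": test=" + testSize + ", train=" + (n - testSize));
        }
    }
}
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for training in the other rounds.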
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;
Normalize normalize = new Normalize();
normalize.setInputFormat(data);
Instances normalizedData = Filter.useFilter(data, normalize);
Rescales the numeric attributes of the dataset to the [0, 1] range before training.
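The arithmetic behind min-max normalization can be sketched in plain Java as follows. This is an illustration of the technique only, not Weka's code: `MinMaxSketch` and `normalize` are hypothetical names, the input is assumed to be a non-empty rectangular array of numeric values, and mapping constant columns to 0 is a choice made here for simplicity.

```java
// Illustrative helper, not part of Weka.
class MinMaxSketch {
    /** Rescales each column of rows to [0, 1]; constant columns are mapped to 0. */
    static double[][] normalize(double[][] rows) {
        int cols = rows[0].length;
        double[][] out = new double[rows.length][cols];
        for (int c = 0; c < cols; c++) {
            // Find the observed minimum and maximum of this column.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : rows) {
                min = Math.min(min, row[c]);
                max = Math.max(max, row[c]);
            }
            double range = max - min;
            for (int r = 0; r < rows.length; r++) {
                out[r][c] = range == 0 ? 0.0 : (rows[r][c] - min) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] scaled = normalize(new double[][] {{1, 10}, {3, 10}, {5, 10}});
        for (double[] row : scaled) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Because the minimum and maximum come from the data passed in, the same statistics should be reused for any later test data rather than recomputed on it.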
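import weka.clusterers.SimpleKMeans;
Instances unlabeled = new Instances(data);
unlabeled.setClassIndex(-1);
unlabeled.deleteAttributeAt(data.classIndex());
SimpleKMeans kMeans = new SimpleKMeans();
kMeans.setNumClusters(3);
kMeans.buildClusterer(unlabeled);
Performs k-means clustering with three clusters on a copy of the dataset with the class attribute removed, because SimpleKMeans does not accept data that has a class attribute set.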
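For readers who want to see what k-means does under the hood, here is a small plain-Java sketch of the standard assign-then-update loop (Lloyd's algorithm). It is not Weka's SimpleKMeans: `KMeansSketch` and `cluster` are hypothetical names, the distance is squared Euclidean, and the initial centroids are simply the first k points, which is a simplification made for determinism.

```java
// Illustrative helper, not part of Weka.
class KMeansSketch {
    /**
     * Returns the cluster index of each point.
     * Assumes points.length >= k and that all points have the same dimension.
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();

        int[] assignment = new int[points.length];
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dist = 0;
                    for (int d = 0; d < dims; d++) {
                        double diff = points[p][d] - centroids[c][d];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            // Stop once assignments are stable (after at least one update).
            if (!changed && iter > 0) break;

            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid in place
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 10}, {0, 1}, {10, 11}, {1, 0}, {11, 10}};
        System.out.println(java.util.Arrays.toString(cluster(points, 2, 100)));
    }
}
```

Because the result depends on the starting centroids, practical implementations usually randomize the initialization with a seed and may run several restarts.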
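Evaluation trainEval = new Evaluation(data);
trainEval.evaluateModel(tree, data);
System.out.println(trainEval.areaUnderROC(1));
Evaluates the trained classifier on the training data and prints the area under the ROC curve for the class at index 1. A fresh Evaluation object is used so the cross-validation statistics are not mixed in, and scores measured on training data tend to be optimistic.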
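One way to read the ROC area is as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The plain-Java sketch below computes that quantity directly by comparing all positive/negative pairs. It is meant only to illustrate the interpretation and is not how Weka computes the value internally; `AucSketch` and `auc` are hypothetical names.

```java
// Illustrative helper, not part of Weka.
class AucSketch {
    /**
     * Fraction of (positive, negative) pairs in which the positive instance
     * has the higher score; tied scores count as half a win.
     * Returns NaN when either class is absent.
     */
    static double auc(double[] scores, boolean[] isPositive) {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!isPositive[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (isPositive[j]) continue;
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.4, 0.6, 0.1};
        boolean[] isPositive = {true, true, false, false};
        // Three of the four positive/negative pairs are ordered correctly.
        System.out.println(auc(scores, isPositive));
    }
}
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of positives above negatives.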
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
weka.attributeSelection.AttributeSelection selector = new weka.attributeSelection.AttributeSelection();
selector.setEvaluator(new InfoGainAttributeEval());
selector.setSearch(new Ranker());
selector.SelectAttributes(data);
Performs feature selection using information gain and ranks attributes.
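Information gain measures how much knowing an attribute's value reduces the entropy of the class. The plain-Java sketch below computes it for a single nominal attribute, as an illustration of the formula rather than Weka's implementation; `InfoGainSketch`, `entropy`, and `infoGain` are hypothetical names, and attribute values and class labels are represented as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative helper, not part of Weka.
class InfoGainSketch {
    /** Shannon entropy in bits of a class distribution given as counts. */
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0;
        for (int c : counts.values()) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** H(class) minus the weighted entropy of the class within each attribute value. */
    static double infoGain(String[] attribute, String[] labels) {
        int n = labels.length;
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (int i = 0; i < n; i++) {
            classCounts.merge(labels[i], 1, Integer::sum);
            byValue.computeIfAbsent(attribute[i], v -> new HashMap<>())
                   .merge(labels[i], 1, Integer::sum);
        }
        double conditional = 0;
        for (Map<String, Integer> subset : byValue.values()) {
            int size = 0;
            for (int c : subset.values()) size += c;
            conditional += ((double) size / n) * entropy(subset, size);
        }
        return entropy(classCounts, n) - conditional;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "no", "no"};
        // An attribute that separates the classes perfectly, and one that does not.
        System.out.println(infoGain(new String[] {"a", "a", "b", "b"}, labels));
        System.out.println(infoGain(new String[] {"a", "b", "a", "b"}, labels));
    }
}
```

Ranking attributes by this score puts the ones that best separate the classes first, which is the ordering a ranker-style search reports.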
Normalize or standardize data when needed.
Use cross-validation to assess model performance reliably.
Filter irrelevant or redundant features to improve accuracy.
Try multiple algorithms and compare results.
Leverage Weka’s evaluation metrics to understand classifier performance.