New dataset health metrics help you assess and manage your datasets, and a new option lets you choose which labels to use for training so you can improve model performance.
By Philip Adzanoukpe · Feb 29, 2024

Dataset Quality Checks and Label Specification

At Epigos AI, we're committed to empowering our users with cutting-edge tools and features to enhance their AI journey. Today, we're thrilled to announce the release of our newest feature: Dataset Quality Checks and Label Specification.

What's New?

Our latest feature introduces comprehensive dataset health metrics, enabling users to assess the quality of their datasets effortlessly. These metrics include:

  1. Class Imbalance Analysis: Understand the distribution of classes within your dataset to identify any imbalance issues that may affect model performance.

  2. Missing Annotations Detection: Quickly identify images that have no annotations, so you can label or remove them before training.

  3. Image Dimensions and Pixels Analysis: Gain insights into image dimensions and pixel values, helping you identify any anomalies or inconsistencies that may impact model training.

Figure: Dataset quality checks
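To make the three metrics concrete, here is a minimal sketch of how they could be computed over a small in-memory dataset. It is an illustration only, not the Epigos AI implementation: the record layout (`file`, `width`, `height`, `labels`) and all function names are assumptions made for this example, and the pixel analysis is reduced to pixel counts rather than pixel values.

```python
from collections import Counter
from statistics import median

# Hypothetical dataset: one record per image. The field names are
# assumptions for this sketch, not the platform's actual schema.
images = [
    {"file": "a.jpg", "width": 640, "height": 480, "labels": ["cat", "cat"]},
    {"file": "b.jpg", "width": 640, "height": 480, "labels": ["dog"]},
    {"file": "c.jpg", "width": 1920, "height": 1080, "labels": []},
    {"file": "d.jpg", "width": 640, "height": 480, "labels": ["cat"]},
]


def class_counts(records):
    """Count how many annotations each class has across the dataset."""
    return Counter(label for r in records for label in r["labels"])


def imbalance_ratio(counts):
    """Largest class count divided by the smallest (1.0 means balanced)."""
    if not counts:
        return 1.0
    return max(counts.values()) / min(counts.values())


def missing_annotations(records):
    """Return the files that have no annotations at all."""
    return [r["file"] for r in records if not r["labels"]]


def dimension_summary(records):
    """Tally distinct image sizes and the median size in megapixels."""
    sizes = Counter((r["width"], r["height"]) for r in records)
    megapixels = [r["width"] * r["height"] / 1e6 for r in records]
    return {"sizes": sizes, "median_megapixels": median(megapixels)}


counts = class_counts(images)
print("class counts:", dict(counts))
print("imbalance ratio:", imbalance_ratio(counts))
print("missing annotations:", missing_annotations(images))
print("dimensions:", dimension_summary(images))
```

A high imbalance ratio, a non-empty list of unannotated files, or an image size that appears only once are the kinds of signals these checks are meant to surface.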

Label Specification for Model Training

In addition to dataset quality checks, we've incorporated a new option for specifying labels to use during model training. This empowers users to tailor their training data by excluding underrepresented labels, thereby optimizing model quality and performance.
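As a rough illustration of the idea, the sketch below keeps only labels that meet a minimum annotation count and drops the rest from each image's annotations. The threshold, the record layout, and the function names `select_labels` and `filter_annotations` are hypothetical; on the platform the choice of labels is made through the training options rather than through code like this.

```python
from collections import Counter

# Hypothetical annotated records; field names are assumptions for this sketch.
records = [
    {"file": "a.jpg", "labels": ["cat", "cat"]},
    {"file": "b.jpg", "labels": ["dog", "cat"]},
    {"file": "c.jpg", "labels": ["bird"]},
    {"file": "d.jpg", "labels": ["bird"]},
]


def select_labels(counts, min_count):
    """Keep labels that have at least `min_count` annotations."""
    return {label for label, n in counts.items() if n >= min_count}


def filter_annotations(records, keep):
    """Return copies of the records with only the kept labels."""
    return [
        {**r, "labels": [label for label in r["labels"] if label in keep]}
        for r in records
    ]


counts = Counter(label for r in records for label in r["labels"])
keep = select_labels(counts, min_count=2)  # "dog" has one annotation and is excluded
training_records = filter_annotations(records, keep)
print(sorted(keep))
print(training_records)
```

Note that filtering can leave some images with no labels, so it is worth re-running a missing-annotations check on the filtered set before training.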

How Does It Help?

The addition of dataset health metrics serves as a valuable guide for users, informing them about the quality of the dataset at hand. By understanding these metrics, users can make informed decisions on how to improve their datasets, ultimately leading to better model performance and accuracy.

Why It Matters

Letting users specify which labels to train on keeps underrepresented classes, which often have too few examples to learn from reliably, out of the training set. This can improve model accuracy on the remaining labels and focuses training on the classes that matter most for the task.

