What is Ontology in Machine Learning Data Annotation?

What is ontology in machine learning data annotation? What does an annotation ontology contain? Why does it matter? How to do ontology-based annotation on the BasicAI platform?

Admon W.

Ontology started in ancient philosophy as the study of the nature of existence.

Computer scientists later adopted the term. In machine learning, an ontology is a structured framework that defines the entities, concepts, relationships, and constraints in a domain or problem space.

Applied to data annotation, ontology embeds rich semantic and hierarchical information into annotations. This helps keep labeling consistent across large teams and long-running projects. Downstream AI systems can also use the added structure to improve reasoning and decision-making.

BasicAI’s Xtreme1 was the first open-source multimodal training data platform to bring ontology into the core annotation workflow. The same system is also integrated into the BasicAI Data Annotation Platform, helping AI teams maintain high label consistency, reuse schemas across projects, and capture domain knowledge as an asset.

In this post, we'll explain what an annotation ontology contains, why it matters, and how to build an ontology on the BasicAI platform.

What elements make up an ontology in data annotation?

An effective ontology for machine learning data annotation consists of several connected elements that together form a domain knowledge model:

Class, Sub-Class, and Instance

In ontology-based data annotation, a class is a basic category of “things” in the domain. It defines a general concept that can apply to many entities. In an autonomous driving dataset, for example, “vehicle” can be a top-level class covering motor vehicles.

Classes usually sit in a hierarchy. A sub-class is a specialized version of a parent class. It inherits broad properties from the parent, and adds features that distinguish it from sibling classes. “Sedan” can be a sub-class of “vehicle”, and may be further split into “luxury sedan” or “mid-size sedan”.

An instance (often called an entity or object) is a specific, located occurrence of a class or sub-class in actual data.

This hierarchy matters for deep learning. It allows models to learn visual features at different levels of abstraction. If low light or heavy occlusion prevents a model from deciding whether an object is a sedan or an SUV, the perception stack can still recognize it as a vehicle from more general shape cues and trigger braking.
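As a rough illustration of that fallback, the sketch below stores a small class hierarchy as child-to-parent links and picks the most specific class whose confidence clears a threshold. The class names, the `PARENT` table, the score format, and the 0.6 threshold are all assumptions made for this example, not part of any platform or model API.

```python
from typing import Optional

# Hypothetical parent links for a small driving ontology (child -> parent).
PARENT = {
    "vehicle": None,
    "sedan": "vehicle",
    "suv": "vehicle",
    "luxury_sedan": "sedan",
    "mid_size_sedan": "sedan",
}

def ancestors(cls: Optional[str]) -> list[str]:
    """Return the class followed by its ancestors, most specific first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain

def most_specific_confident(scores: dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Pick the deepest class in the hierarchy whose score clears the threshold.

    `scores` maps class names to model confidences; the threshold is illustrative.
    """
    candidates = [c for c, s in scores.items() if s >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(ancestors(c)))
```

With scores such as `{"vehicle": 0.9, "sedan": 0.45, "suv": 0.4}`, the function returns `"vehicle"`: the model cannot commit to a sub-class, but the parent class is still usable.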

Classification

Classes and sub-classes define and locate objects (entities) in the data. Classification, in the context of an annotation ontology, describes global properties of the broader context, the environment, or the data itself.

Classification ontologies often split into data-level and scene-level ones. Scene-level classification assigns semantic labels to describe the full environment or context in a frame, rather than any single local object within it. These labels provide key metadata so AI models can interpret objects under the conditions in which they appear.

In autonomous driving, classifications might include weather (sunny/rainy/snowy/foggy) and time context (day/night/dawn/dusk). Other scene labels can include road type, traffic density, or the presence of pedestrians and cyclists.

Attributes

A class says what something is. An attribute describes how it looks, what state it's in, or what properties it has. Attributes capture nuance that a simple class label cannot. That nuance often affects real-world performance.

Two especially important attributes in computer vision and spatial annotation are:

Occlusion: how much an object is hidden by other objects or the environment.
Truncation: whether part of the object lies outside the sensor’s field of view. This tells the model the missing pixels are due to framing, not physical obstruction.

Other common attributes include color (e.g., red, blue, silver), motion state (e.g., stationary, moving), or physical condition (e.g., intact, damaged, severely deformed).
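One way to picture the split between class and attributes is a small record type. This is a minimal sketch: the `Occlusion` levels, the field names, and the `is_fully_visible` helper are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Occlusion(Enum):
    """Illustrative occlusion levels; a real ontology may define others."""
    NONE = "none"
    PARTIAL = "partial"
    HEAVY = "heavy"

@dataclass(frozen=True)
class VehicleInstance:
    class_name: str                   # what the object is
    occlusion: Occlusion              # how much is hidden by other objects
    truncated: bool                   # True if part lies outside the field of view
    motion_state: str = "stationary"  # e.g. "stationary" or "moving"

def is_fully_visible(obj: VehicleInstance) -> bool:
    """An object is fully visible only if nothing hides it and the frame does not cut it off."""
    return obj.occlusion is Occlusion.NONE and not obj.truncated
```

Keeping occlusion and truncation as separate fields preserves the distinction the text draws: one is physical obstruction, the other is framing.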

Relation

A relation defines semantic, logical, spatial, or causal connections between two or more instances. Relation annotation is widely used in NLP and knowledge extraction, and the same idea applies to multimodal data.

A standard format in text annotation is the entity–relation–entity triple. Entities are recognized concepts such as people, organizations, locations, or events. The relation encodes the meaningful link between them.
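The triple format can be sketched in a few lines. The relation names, the entity types, and the idea of checking a triple against a type schema are assumptions for this example; an actual project would define its own relation set.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    head: str       # first entity
    relation: str   # link between the two entities
    tail: str       # second entity

# Hypothetical relation schema: relation -> (allowed head type, allowed tail type).
RELATION_SCHEMA = {
    "works_for": ("person", "organization"),
    "located_in": ("organization", "location"),
}

def is_valid(triple: Triple, entity_types: dict[str, str]) -> bool:
    """Check that a triple's entity types match what its relation allows."""
    expected = RELATION_SCHEMA.get(triple.relation)
    if expected is None:
        return False  # relation is not defined in the schema
    actual = (entity_types.get(triple.head), entity_types.get(triple.tail))
    return actual == expected
```

Under this schema, `Triple("Ada", "works_for", "Acme")` is valid when "Ada" is typed as a person and "Acme" as an organization, while the reversed triple is rejected.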

Constraints

Attributes describe properties. Constraints set validity bounds to ensure annotations conform to domain knowledge and physical reality.

For example, when annotating sedans with a 3D cuboid tool in LiDAR point clouds, predefined constraints might specify that no sedan instance may exceed 6.0 m long × 2.5 m wide × 2.0 m tall.

Raw LiDAR point clouds can be sparse. Annotators may accidentally extend a 3D cuboid (3D bounding box) into background noise or a nearby car. Ontology constraints act as guardrails, so the model learns correct spatial dimensions.
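A size constraint of this kind amounts to a per-dimension comparison. The sketch below uses the sedan bounds from the example above; the `Cuboid` type and the `violations` helper are hypothetical names, and a real tool would likely check minimum bounds as well.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cuboid:
    length: float  # metres
    width: float   # metres
    height: float  # metres

# Upper bounds for a sedan, taken from the example in the text (metres).
SEDAN_MAX = Cuboid(length=6.0, width=2.5, height=2.0)

def violations(box: Cuboid, limit: Cuboid) -> list[str]:
    """Return the names of the dimensions where the box exceeds the limit."""
    return [
        dim for dim in ("length", "width", "height")
        if getattr(box, dim) > getattr(limit, dim)
    ]
```

A cuboid of 7.2 m × 1.9 m × 1.5 m would be flagged on `length` only, which is the kind of error that appears when a box is stretched into a neighboring car.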

Why does ontology matter for machine learning data annotation?

In computer science, one early large-scale use of ontology was to connect web hypertext documents by giving precise definitions to underlying concepts.

Modern deep learning, particularly in computer vision and NLP, requires training datasets that can reach petabyte scale. The semantic framework that was built to organize the web maps well onto the task of organizing training data, and in practice it becomes necessary.

Ontology has moved from a pure knowledge-representation idea to a practical backbone for data-centric AI. Integrating formal ontology into ML data annotation has far-reaching implications.

The traditional approach based on class labels alone (“flat taxonomy”) breaks down at scale:

Different annotators interpret the same guideline differently, which introduces bias and inconsistency.
As applications grow more complex and datasets reach millions of images, teams spend heavily on relabeling, reconciling conflicting tags, and dealing with a combinatorial explosion in flat label sets.
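The combinatorial explosion is easy to see with a little arithmetic. The counts below are invented for illustration: a flat label set needs one label per combination of class and attribute values, while a factored ontology defines each class and each attribute option once.

```python
from math import prod

# Illustrative sizes only; a real ontology will differ.
n_classes = 10
attribute_options = {"occlusion": 4, "truncation": 2, "motion_state": 2}

# Flat taxonomy: one label per combination of class and attribute values.
flat_labels = n_classes * prod(attribute_options.values())

# Factored ontology: classes and attribute options are each defined once.
factored_definitions = n_classes + sum(attribute_options.values())
```

With these numbers the flat set has 160 labels against 18 definitions in the factored form, and each new attribute multiplies the first figure while only adding to the second.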

Higher annotation quality

Ontology-based annotation standardizes work through a rigorous framework. It ensures an annotator in one office labels a heavily occluded vehicle the same way as an annotator on another continent. That reduces error rates, narrows data variance, and cuts expensive rework.

Scalability and reuse

Many datasets are labeled as one-off efforts for a single, narrow model task. The result is data silos that are hard to reuse. Ontology abstracts the problem definition away from one model’s immediate needs. A well-built ontology becomes a central, evolving knowledge base that you can reuse, extend, and apply to future datasets.

How to start annotating with ontology? (BasicAI platform example)

Ontology-based annotation requires robust software infrastructure. Here we use the BasicAI data annotation platform as an example to show how to build and manage ontologies. The platform provides an intuitive interface designed for computer vision and multimodal data, so project owners can create, deploy, and enforce complex ontologies.

The Ontology Center

Ontology Center is the central repository for ontology assets on the BasicAI platform. You can access it from the left-side Ontology tab on the homepage. It separates ontology creation from any single dataset (though you can also create ontologies inside a dataset).

Ontology Center lets teams build an ontology once and apply it across multiple projects and datasets. This reduces setup cost when starting new labeling work.

Creating Classifications

Let's start with classification ontology.

1. Open a dataset and switch to the Ontology tab.
2. Under the tab, go to the Classification section.
3. Click Create to define a new classification.
4. Fill in key fields:
- Name: The classification identifier. For this example, we create a “time_of_day” classification.
- Target On: Choose whether the classification applies at the scene level or data level. For scene-level classification, select Scene.
- Options: Add valid values. For time of day, options might be Day, Night, Dawn, Dusk. Each scene should map to exactly one option, with no ambiguity.
5. Repeat the same flow to create more classifications, such as Weather with options like Sunny, Rainy, Snowy, Foggy, Overcast.

During labeling, you can switch the right-side panel to Classification annotation and see the ontologies you created.

Creating a Classification Ontology on BasicAI Platform
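To make the Name, Target On, and Options fields concrete, here is a minimal sketch of how such definitions could be mirrored in code and used to check scene labels. The `Classification` type and `label_scene` helper are hypothetical and do not reflect the platform's actual data model or export format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Classification:
    name: str                  # classification identifier
    target: str                # "scene" or "data", mirroring the Target On field
    options: tuple[str, ...]   # valid values

    def validate(self, value: str) -> str:
        """Accept a value only if it is one of the declared options."""
        if value not in self.options:
            raise ValueError(f"{value!r} is not a valid option for {self.name}")
        return value

time_of_day = Classification("time_of_day", "scene", ("Day", "Night", "Dawn", "Dusk"))
weather = Classification("weather", "scene", ("Sunny", "Rainy", "Snowy", "Foggy", "Overcast"))

def label_scene(values: dict[str, str], schema: list[Classification]) -> dict[str, str]:
    """Require exactly one valid option for every classification in the schema."""
    missing = [c.name for c in schema if c.name not in values]
    if missing:
        raise ValueError(f"missing classifications: {missing}")
    return {c.name: c.validate(values[c.name]) for c in schema}
```

Because each classification maps to a single string, the "exactly one option per scene" rule is enforced by the data shape itself; a value outside the option list, such as "Noon", raises a `ValueError`.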

Creating Classes

A class defines the object types that must be labeled in a scene (and learned by the model). Whether you create classes globally in Ontology Center or locally inside a dataset, the steps are the same.

Click Create to open the class configuration panel. It includes:

Name: The official machine-readable name (for example, vehicle_sedan). This is the exact string downstream training receives.
Alias: A human-friendly name shown in the UI when the machine-readable name is long or unclear. Annotators can toggle between name and alias with the “O” shortcut to stay oriented during fast labeling.
Number: An integer ID (for example, 1 for Pedestrian, 2 for Vehicle). If downstream pipelines use numeric IDs, this avoids extra parsing or mapping scripts.
Color: Display color for this class in the labeling view. Pick colors that are easy to tell apart so annotators can visually verify categories.
Tool Type: The annotation tool for this class, such as bounding box, polygon, mask, cuboid, or skeleton. If you choose skeleton, you must also define keypoint order and connections before saving. For other modalities, the platform provides tool types such as Clip for temporal segmentation in video or audio.
Tags: Domain tags or special handling markers. Useful for search and organization in large ontologies.
Size Limit: Size constraints, especially for 3D LiDAR annotation. You can set strict min/max bounds for width, height, area, and more.

Creating an Ontology Class on BasicAI Platform
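The Name and Number fields matter most downstream, so it can help to see them as a lookup table. The sketch below is an assumption-laden illustration: the `OntologyClass` type, its defaults, and `build_id_map` are hypothetical and only loosely mirror the fields listed above.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyClass:
    name: str               # machine-readable name passed to training
    number: int             # integer ID used by downstream pipelines
    tool_type: str          # e.g. "bounding_box", "polygon", "cuboid"
    alias: str = ""         # human-friendly display name
    color: str = "#808080"  # display color in the labeling view
    tags: list[str] = field(default_factory=list)

def build_id_map(classes: list[OntologyClass]) -> dict[str, int]:
    """Map class names to numeric IDs, rejecting duplicate names or numbers."""
    names = [c.name for c in classes]
    numbers = [c.number for c in classes]
    if len(set(names)) != len(names):
        raise ValueError("duplicate class name")
    if len(set(numbers)) != len(numbers):
        raise ValueError("duplicate class number")
    return dict(zip(names, numbers))
```

Checking for duplicates at definition time is a design choice in this sketch: two classes sharing a number would silently merge in any pipeline that trains on numeric IDs.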

Setting Attributes

Attributes add the state and nuance a class needs. In the class configuration window, click Manage Attributes to open the attribute panel.

You can add multiple attributes to a class. Each attribute has an input type that controls how annotators enter values:

Radio: Mutually exclusive choice. Annotators must pick exactly one option.
- Example: occlusion > None / Partial (<40%) / Heavy (40-70%) / Extreme (>70%)
Multi-selection: Multiple overlapping states can be selected.
- Example: damage types such as “windshield cracked” and “bumper dented” at the same time.
Dropdown: Like radio logic, but more compact. Useful for attributes with long lists.
- Example: “vehicle brand” with dozens of manufacturers.
Text: Free-form input for transcription, unique strings, or formulas.
- Example: license plate transcription.
Rank: A numeric severity or ordering.
- Example: medical imaging grades for lesion severity, readability, or artifact level.

Example of different input types for Ontology Attributes
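The five input types differ mainly in what counts as a valid value. The following sketch encodes one plausible reading of each; the type keys, the decision to treat Rank as an integer, and the `validate_attribute` helper are assumptions for illustration rather than the platform's validation rules.

```python
def validate_attribute(input_type: str, value, options=()) -> bool:
    """Return True if `value` is acceptable for the given input type."""
    if input_type in ("radio", "dropdown"):
        # Exactly one value, and it must be a declared option.
        return value in options
    if input_type == "multi_selection":
        # Any subset of the declared options, without repeats.
        return len(set(value)) == len(value) and set(value) <= set(options)
    if input_type == "text":
        # Free-form, but not empty or whitespace only.
        return isinstance(value, str) and value.strip() != ""
    if input_type == "rank":
        # Assumed here to be a plain integer grade.
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"unknown input type: {input_type}")
```

For instance, `validate_attribute("radio", "Heavy", ["None", "Partial", "Heavy", "Extreme"])` passes, while a value outside the list fails.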

Creating Sub-Classes

After you define the class configuration, save it to create the ontology class. To add hierarchy, hover over an existing class entry in the Ontology tab and use the Sub-Class entry point to create a more specific class under it.

Ontology management

Once an ontology is created and refined through real labeling work, you need a clear process for versioning, sharing, and evolution.

In Ontology Center, use the three-dot menu in the top-right to import/export ontology JSON for version control, sharing, or archival.
In a dataset’s Ontology tab, the three-dot menu also provides Copy from Ontology Center, which enables project-level management while staying connected to the central library.

If an organization maintains a detailed city-driving ontology, any newly uploaded LiDAR dataset can be configured for annotation in seconds by importing that master ontology.
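Once an ontology lives in a JSON file, ordinary version-control habits apply. The sketch below uses a made-up JSON layout (a top-level `classes` list with `name` and `number` keys), which is an assumption and not the platform's export schema; it shows a stable serialization and a simple comparison between two versions.

```python
import json

def export_ontology(classes: list[dict]) -> str:
    """Serialize with a stable order so diffs between versions stay small."""
    ordered = sorted(classes, key=lambda c: c["name"])
    return json.dumps({"classes": ordered}, indent=2, sort_keys=True)

def diff_class_names(old_json: str, new_json: str) -> dict[str, list[str]]:
    """Report class names added or removed between two exported versions."""
    old = {c["name"] for c in json.loads(old_json)["classes"]}
    new = {c["name"] for c in json.loads(new_json)["classes"]}
    return {"added": sorted(new - old), "removed": sorted(old - new)}

# Two hypothetical versions of the same ontology.
v1 = export_ontology([{"name": "vehicle", "number": 1}])
v2 = export_ontology([
    {"name": "vehicle", "number": 1},
    {"name": "pedestrian", "number": 2},
])
```

Calling `diff_class_names(v1, v2)` reports "pedestrian" as added and nothing removed, which is the kind of summary worth reviewing before a revised ontology is copied into active projects.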

Conclusion

Modern ML systems have moved from isolated perception tasks to multimodal reasoning and interaction in dynamic environments. The methods used to annotate and manage training data must evolve with that shift.

Adopting a formal ontology in machine learning annotation is a practical change in how you enforce consistency and capture knowledge. Ontology-based labeling reduces disagreement between annotators and produces cleaner training signals, which helps models learn more reliably.

Ontology assets also compound in value across projects. Reuse speeds up new project setup and ensures new work benefits from what the organization has already learned.

Modern platforms such as Xtreme1 (open source) and the BasicAI enterprise platform make these ontology-driven workflows accessible in day-to-day annotation work.
