top of page

Annotate Smarter

BasicAI Cloud v1.1: Data Annotation for Large Models Training

BasicAI Cloud is now upgraded to v1.1, with annotation tools and workflows tailored for Generative AI and large language model training.




Admon W.

It is a truth universally acknowledged, that the natural language processing (NLP) landscape is undergoing a seismic shift, largely driven by the emergence of large language models (LLMs). These powerful models have showcased their ability to tackle a wide array of tasks, with Generative AI (Gen AI) built on top of them propelling virtual assistants from the likes of "Eliza" to the realm of "Jarvis."

As the era of large models arrives, data annotation, a crucial cog in the AI machine, is evolving to keep pace. At BasicAI, we've been quick to identify this trend and have proactively incorporated annotation tools and workflows tailored for Generative AI into BasicAI Cloud's roadmap. We strive to help our customers achieve AI success in the new wave – as always.

This brings us to the launch of BasicAI Cloud v1.1. Here is a brief introduction to this update:

Introducing New Annotation Tools for SFT and RLHF Tasks

Supervised Fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) are two critical stages in the supervised training pipeline of large language models. SFT provides the model with suggested answers, while RLHF fine-tunes the model's output to align with human preferences through a reward-punishment mechanism (via scoring or ranking).

BasicAI v1.1 Now Supports Generative AI Data Annotation

BasicAI Cloud now supports both of these annotation task types. It also enables the annotation of Generative AI data that combines images and text, facilitating the creation of training datasets for large multimodal models.

New Ontology and Tools for Generative AI Data

Accordingly, we've introduced the Generative AI Ontology type, which comes with two annotation tools: Dialog Evaluation and Dialog Response.

Generative AI Ontology with Two Annotation Types

Dialog Response Tool: Enabling annotators to continue the dialogue with suitable responses by different roles based on the given context. By providing high-quality, human-generated responses, annotators create a rich SFT dataset that serves as a reference for the model during the fine-tuning process.

Dialog Evaluation Tool: Designed for RLHF tasks. The tool allows annotators to label dialogue content generated by models, providing valuable feedback on the quality and appropriateness of the generated responses. This labeled data is then used to fine-tune the model, reinforcing desirable behaviors and penalizing suboptimal ones, thereby aligning the model's output with human preferences.

Classification Tool: Used to assign labels or categories to the entire dialogue, providing valuable metadata for the model, like language, domain, or formality level. This contextual information enables the model to generate more appropriate and targeted responses, enhancing its overall performance and usability.

Dialog Response Tool for SFT Tasks and Dialog Evaluation for RLHF Tasks

Collaboration System for GenAI Annotation

At the same time, the collaboration system has been updated to support SFT and RLHF tasks, offering features like custom workflows, task distribution, members management, and automatic QA – similar to other task types. The performance statistics logic remains largely unchanged. Users can batch modify the Ontologies, workflow, basic settings, and QA rules for Generative AI tasks.

Annotation Interface

The image below shows the annotation interface for Generative AI data. Users can upload and process .json, .csv, .xlsx, .xls files and .zip, .gzip, .tar, and .rar archives containing valid files.

Annotation Interface of Generative AI Data

The platform automatically parses the data and renders it as dialogue bubbles, differentiating between User and Bot roles. Similar to text annotation, the canvas supports content search, font size scaling, and toggling between edit / read-only modes. Ontology attributes can be expanded or collapsed. The "Class Inheritance" feature allows quick application of the same Class labels to different dialogue content.

We've also introduced a nifty Pin feature that lets users pin up to 4 selected bubbles on the interface.

Start Your LLM Training Journey with BasicAI Cloud

From computer vision data like images, videos, and point clouds, to NLP data spanning text, speech, and now LLM data support, BasicAI Cloud remains committed to being the go-to annotation platform for algorithm experts and businesses across diverse domains. We firmly believe that data is the bedrock of any model's success, and we aim to accelerate this success by simplifying the building of high-quality training datasets.

Click the button below to create your first GenAI dataset now.


History View & Restore for Annotation Tools

History tracking is a vital feature in tool-based software (e.g., document management, data management) to monitor data changes. We've incorporated this into our platform, enabling users to view and restore point cloud and image annotation tool results. In case of annotation data loss due to human error or system issues, the nearest version can be swiftly restored based on the history.

History View & Restore for Annotation Tools

The annotation modules automatically save every 5 minutes when not in a Paused state. Manual operations that alter the data state also trigger the saving of history records. The entry point for history records can be found at the top of the annotation interface. Upon entering, users can view operation details and preview or restore them to a specific version.

If you are a project manager, you can freely configure team members' permissions to view and restore task history records.


Enhanced Upload/Export Progress Tracking with Stage-wise Breakdown

During user research, we learned that the platform's display of total progress during upload and export could sometimes be perplexing, especially when data remained stuck at 40% or 70% for extended periods. To address this, we've introduced a more detailed breakdown, splitting the progress bar into distinct stages to provide users with clearer insights into the current upload/export progress.

Stage Breakdown of Data Import/Export

For data uploads, the stages may include: Uploading, Pulling, Unzipping, Data Format Conversion, and Parsing. Hovering over the progress bar reveals the time spent on each stage. For batch uploads, the queue status is also displayed, indicating the position of specific data in the queue or whether it's being processed. In case of upload failures, a specific reason is provided, enabling users to make necessary adjustments and re-upload the data. We've also expanded the size limit of uploaded files to 100GB (free for a limited time).

Data exports involve stages such as Standard Format Processing, Script Conversion, Zipping, and Transferring. The export process also showcases the queue, progress, and processing status.


Explore BasicAI Cloud v1.1 Now

These are the three key updates in BasicAI Cloud v1.1. Additionally, we've made a host of UX optimizations and squashed some bugs. Now, with BasicAI Cloud v1.1, you can confidently hop on the express of AI and make strides toward a future where your bold innovation is accessible to all.

Get Project Estimates
Get a Quote Today

Get Essential Training Data
for Your AI Model Today.

bottom of page