Computer Vision

Is Data Annotation Obsolete with Meta's Segment Anything

Meta released Segment Anything on April 5th, announcing the arrival of the image-based large model era. Do we still need manual labeling?


Basic Marketing

In the last decade, machine learning has ignited a new wave of artificial intelligence, culminating in the launch of ChatGPT late last year, which fully showcased the power of large models. Some even claim that "after ChatGPT, NLP will no longer exist." In the past, natural language experts each specialized in their own subfield, such as text classification, information extraction, question answering, or reading comprehension. With the advent of large models, the prompt paradigm from NLP has begun to spread into the computer vision (CV) domain: by supplying a "prompt," a large model can achieve good zero-shot and few-shot results on new datasets. A few weeks ago, everyone was looking forward to the arrival of an "ImageGPT" or a "multimodal GPT."

On April 5th, Meta released Segment Anything [1], announcing the arrival of the image-based large model era. Has the ground truth produced by manual annotation, in both academia and commercial applications, become a relic of the past?

A Glimpse at Semantic Segmentation

Semantic segmentation is an advanced image processing technique that classifies each pixel to create semantically meaningful regions. It goes beyond merely outlining objects, providing precise labeling of their specific locations and shapes. Semantic segmentation has widespread applications in CV, medical image processing, and digital art.
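To make "classifies each pixel" concrete, here is a minimal sketch. It assumes a model has already produced a score for every class at every pixel; the `label_map` function and the nested-list layout are illustrative choices, not part of any library. The label map is simply the per-pixel argmax of those scores, and pixels that share a label form the semantically meaningful regions.

```python
from typing import List

# Hypothetical layout: scores[c][y][x] is the model's score for class c at pixel (y, x).
ScoreMap = List[List[List[float]]]


def label_map(scores: ScoreMap) -> List[List[int]]:
    """Assign each pixel the index of its highest-scoring class (per-pixel argmax)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Two classes (0 = background, 1 = car) on a 2x3 image.
background = [[0.9, 0.8, 0.2], [0.7, 0.1, 0.3]]
car = [[0.1, 0.2, 0.8], [0.3, 0.9, 0.7]]
print(label_map([background, car]))  # [[0, 0, 1], [0, 1, 1]]
```

Real models do the same argmax over a tensor of logits; nested lists are used here only to keep the sketch dependency-free.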

Unveiling the Segment Anything Project

Meta's Segment Anything project introduces a new image segmentation task, model, and dataset. The dataset is the largest segmentation collection to date, with more than 1.1 billion masks on 11 million licensed images. The "promptable" model can transfer zero-shot to new image tasks, and Meta reports impressive results that are often competitive with, or even better than, previous fully supervised results. Both the "Segment Anything Model (SAM)" and the corresponding dataset (SA-1B) have been released.

Creating an accurate segmentation model typically "requires highly specialized work by technical experts with access to AI training infrastructure and large volumes of carefully annotated in-domain data." With SAM, Meta aims to make segmentation more accessible by reducing the need for specialized training and expertise, promoting further development in computer vision research.

SAM: A Game Changer?

Meta says SAM has learned a general notion of what objects are, and that it can generate masks for any object in any image or video, including objects it did not encounter during training. This versatility suits a wide range of use cases and enables zero-shot transfer to new image domains.

SAM vs. Manual Annotation

Challenges & Opportunities

Similar technologies are already on the market

We've already seen many capable segmentation tools, such as Photoshop and the image cutout feature built into iOS. They produce good results and speed up image processing through a smooth interactive experience.

In iOS 16 and later, you can isolate the subject of a photo from the rest of the photo and then copy or share it [2]

“Ground truth” in open-source datasets is full of errors

In commercial projects, image algorithm engineers often ask annotators to re-annotate open-source datasets, which incurs substantial costs. The main reason is that the ground truth in open-source datasets is often subpar and riddled with annotation errors.

For niche research scenarios, existing manual annotations may not meet data quality requirements. For example, a researcher working on traffic lights discovered glaring annotation errors in COCO [3], and another audit reported nearly 300,000 errors in MS COCO [4]. Similar issues persist in other renowned datasets, such as CIFAR-100 and ImageNet. Data annotation is challenging and error-prone: errors stem from ambiguous requirement documents and inconsistent human judgment.

Incorrect annotations for traffic lights in the COCO dataset
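One simple way to surface such errors is to let an independent model vote against the ground truth and send confident disagreements back for human review. The sketch below illustrates that heuristic under stated assumptions: the `Annotation` record, the `flag_suspects` function, and the 0.9 threshold are hypothetical, and this is not the specific method used in [3] or [4].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    image_id: str
    given_label: str      # label stored in the dataset's ground truth
    predicted_label: str  # label predicted by an independent model
    confidence: float     # model's confidence in predicted_label, 0..1


def flag_suspects(annotations: List[Annotation], threshold: float = 0.9) -> List[str]:
    """Return IDs where a confident model disagrees with the ground truth.

    Flagged items are candidates for human re-review, not confirmed errors.
    """
    return [
        a.image_id
        for a in annotations
        if a.predicted_label != a.given_label and a.confidence >= threshold
    ]


anns = [
    Annotation("img1", "traffic light", "traffic light", 0.97),  # agreement
    Annotation("img2", "traffic light", "street lamp", 0.95),    # confident disagreement
    Annotation("img3", "traffic light", "stop sign", 0.55),      # unsure model, ignored
]
print(flag_suspects(anns))  # ['img2']
```

The threshold trades recall for reviewer workload: lowering it flags more candidates, but more of them will turn out to be model mistakes rather than annotation mistakes.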

Not accurate enough in specialized fields

When SAM's output closely matches or even surpasses human annotations on some segmentation tasks, that partly reflects the poor quality of open-source ground truth rather than SAM's readiness for professional work. No one would paste ChatGPT's responses into an article unchecked, since we must accept that they occasionally contain "nonsense" that distorts facts. Likewise, a large general model trained on insufficient domain data cannot meet professional project requirements. In rigorous disciplines such as medical diagnosis, autonomous driving, and security, errors are unacceptable.

SAM performs worse on medical data annotation; this work still requires professionals with medical backgrounds.
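Statements that a model "matches" human annotation are usually backed by an overlap metric. A minimal sketch, assuming both masks are binary and the same size, computes intersection over union (IoU); the `mask_iou` name and nested-list masks are illustrative. Note that a high IoU against flawed ground truth only measures agreement with the annotators, not correctness, which is exactly the caveat raised above.

```python
from typing import List

Mask = List[List[int]]  # 1 = object pixel, 0 = background


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union (IoU) of two binary masks of the same size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                inter += 1  # pixel marked as object in both masks
            if pa or pb:
                union += 1  # pixel marked as object in at least one mask
    # Two empty masks agree perfectly; this also avoids division by zero.
    return inter / union if union else 1.0


model_mask = [[0, 1, 1], [0, 1, 1]]
human_mask = [[0, 1, 0], [0, 1, 1]]
print(mask_iou(model_mask, human_mask))  # 0.75
```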

Other Challenges

Other challenges mirror those facing ChatGPT, such as computing power and data security. In practical deployments, many online services built on small models cannot absorb the much higher operating costs of a large model.

Opportunities we found

Indeed, as artificial intelligence technology continues to advance, traditional NLP and CV techniques may gradually become obsolete. Future research should focus on deeper, more abstract frameworks for thinking and exploration.

Embrace cutting-edge technology: Revolutionary technologies shouldn't bring despair; we should strive to understand and employ them. New paradigms improve production efficiency and lay a solid foundation for future innovation. Professionals in niche areas must continue digging deep.
Open-source software: AI technology's rapid development benefits from open source, which lets everyone stand on the shoulders of giants. This is the motivation behind our open-source Xtreme1 project.
Open-source data: AI requires excellent data to work correctly. We're currently working on the world's first multimodal dataset covering the latest sensor devices, with accurate human annotations. Stay tuned!

Lastly, we'd like to share some open-source image segmentation datasets for tech enthusiasts like you:

ADE20K (82.6k, 3.9GB):

https://opendatalab.com/ADE20K_2016

The ADE20K dataset offers benchmark scene data along with object-part segmentation data. Each folder holds images sorted by scene category, with object and part segmentations stored in separate PNG files. Every instance is annotated individually for precision.

Medical Segmentation Decathlon (4.4k, 354.9GB):

https://opendatalab.com/Medical_Segmentation_Decathlon

MSD is an extensive collection of medical image segmentation datasets, comprising 2,633 three-dimensional images collected across multiple anatomical areas of interest, modalities, and sources. Specifically, it includes data for brain, heart, liver, hippocampus, prostate, lung, pancreas, hepatic vessels, spleen, and colon.

Panoptic Agricultural Satellite TIme Series (14.6k, 36.8GB):

https://opendatalab.com/PASTIS

PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural plots in satellite time series. It features 2,433 patches within metropolitan French territory, each with panoptic annotations (an instance index and a semantic label per pixel). Each patch consists of a variable-length Sentinel-2 multispectral image time series.
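The exact on-disk format of PASTIS annotations is not covered here. As a generic illustration of what "an instance index and a semantic label per pixel" means, the sketch below fuses the two into a single panoptic ID using the common `semantic * OFFSET + instance` convention; `OFFSET`, `encode_panoptic`, and `decode_panoptic` are assumptions for illustration, not the PASTIS API.

```python
from typing import List, Tuple

OFFSET = 1000  # assumes fewer than 1000 instances per class within one patch


def encode_panoptic(semantic: List[List[int]], instance: List[List[int]]) -> List[List[int]]:
    """Fuse per-pixel semantic labels and instance indices into one panoptic ID map."""
    out = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for s, i in zip(sem_row, inst_row):
            if not 0 <= i < OFFSET:
                raise ValueError(f"instance index {i} does not fit under OFFSET={OFFSET}")
            row.append(s * OFFSET + i)
        out.append(row)
    return out


def decode_panoptic(pid: int) -> Tuple[int, int]:
    """Recover (semantic_label, instance_index) from a panoptic ID."""
    return divmod(pid, OFFSET)


semantic = [[0, 3], [3, 5]]  # e.g. 0 = background, 3 and 5 = two crop types
instance = [[0, 1], [2, 1]]  # two separate parcels of class 3, one of class 5
print(encode_panoptic(semantic, instance))  # [[0, 3001], [3002, 5001]]
print(decode_panoptic(3002))                # (3, 2)
```

Packing both labels into one integer keeps a panoptic map the same shape as a semantic map, while still distinguishing two adjacent parcels of the same crop.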

3D Lane Synthetic Dataset (30k, 17.8GB):

https://opendatalab.com/3D_Lane_Synthetic_Dataset

This synthetic dataset is designed to promote the development and evaluation of 3D lane detection methods. It expands upon the Apollo Synthetic Dataset. For detailed construction strategies and evaluation methods, refer to the ECCV 2020 paper: "Gen-LaneNet: a generalized and scalable approach for 3D lane detection," Y. Guo, et al., ECCV, 2020.

We hope that these resources will be helpful for researchers and practitioners working on image segmentation and other computer vision tasks. As AI technology continues to advance, it is essential to collaborate and share knowledge to promote further innovation and breakthroughs in the field.

References:

[1] Segment Anything. https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation

[2] Create and share photo cutouts on your iPhone. https://support.apple.com/en-us/HT213459

[3] The Mislabelled Objects in COCO. https://www.neuralception.com/mislabelled-traffic

[4] How I found nearly 300,000 errors in MS COCO. https://medium.com/@jamie_34747/how-i-found-nearly-300-000-errors-in-ms-coco-79d382edf22b

[5] Many thanks to OpenDataLab for providing dataset support. For more datasets, please visit: https://opendatalab.com

Cover image from Segment Anything | Meta. https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation
