
Voxel51, an Ann Arbor-based visual AI data platform company, has released new research showing that auto-labeling technology can achieve accuracy nearly equivalent to human labeling (up to 95 percent) while operating 5,000 times faster than traditional annotation methods.
As a result, labeling costs can be cut by a factor of up to 100,000, potentially saving millions of dollars in AI development costs, according to the company.
Data labeling, the manual annotation of images used to train machine learning models, is essential to computer vision but has traditionally been tedious, slow, and expensive.
For years, AI hype reinforced the idea that more labels meant better models.
“Our research shows that data annotation no longer has to be a multi-million-dollar line item,” says Jason Corso, co-founder and chief science officer at Voxel51. “While previous research has qualitatively claimed auto-labeling reduces annotation costs, our study provides concrete figures that have significant implications.
“The findings reflect the potential for a massive reduction in costs for data labeling, enabling AI developers to invest more of their budget and human workforce on more effective data curation, quality assurance, model and edge-case analysis, and strategic dataset expansion.”
To determine whether auto-labels alone could produce high-performing models in real-world scenarios, Voxel51's Auto-Labeling Data for Object Detection study benchmarks leading foundation models, including YOLOE, YOLO-World, and Grounding DINO, across four widely used datasets:
- Berkeley DeepDrive (BDD), autonomous driving scenes
- Common Objects in Context (COCO), everyday objects
- Large Vocabulary Instance Segmentation (LVIS), high-complexity, long-tail categories
- Visual Object Classes (VOC), general imagery
Together, the datasets span basic object categories to challenging, long-tail distributions.
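In practice, auto-labeling with such models amounts to running a zero-shot, text-prompted detector over unlabeled images and keeping its confident boxes as machine-generated training labels. The sketch below is illustrative rather than the study's actual pipeline: it uses the Hugging Face transformers zero-shot object detection pipeline with an OWL-ViT checkpoint (Grounding DINO follows the same pattern), and the image path, class list, and confidence threshold are all assumptions.

```python
# Minimal auto-labeling sketch: prompt a zero-shot detector with class
# names and keep confident boxes as machine-generated training labels.
# Assumes: pip install transformers pillow torch; "street.jpg" is a stand-in image.
from transformers import pipeline
from PIL import Image

detector = pipeline(
    "zero-shot-object-detection",
    model="google/owlvit-base-patch32",  # a Grounding DINO checkpoint can be swapped in
)

classes = ["car", "pedestrian", "traffic light", "bicycle"]  # illustrative label set
image = Image.open("street.jpg")

auto_labels = []
for det in detector(image, candidate_labels=classes):
    if det["score"] >= 0.3:  # confidence threshold; tune per dataset
        auto_labels.append(
            {"label": det["label"], "bbox": det["box"], "score": det["score"]}
        )

print(f"{len(auto_labels)} auto-labels generated")
```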
Using mean Average Precision (mAP), a key real-world metric for object detection accuracy, the study found that models trained solely on auto-labels performed as well as, and sometimes better than, models trained on traditional human labels.
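For readers unfamiliar with the metric, mAP averages detection precision across recall levels, object classes, and (in the COCO variant) IoU thresholds. A minimal sketch of how it is typically computed, using the torchmetrics library with toy boxes (all values here are illustrative):

```python
# Computing COCO-style mAP with torchmetrics on toy predictions.
# Assumes: pip install torchmetrics torch; boxes are [xmin, ymin, xmax, ymax].
import torch
from torchmetrics.detection import MeanAveragePrecision

preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
targets = [{
    "boxes": torch.tensor([[12.0, 11.0, 49.0, 52.0]]),
    "labels": torch.tensor([0]),
}]

metric = MeanAveragePrecision()  # averages AP over IoU thresholds 0.50:0.95
metric.update(preds, targets)
print(metric.compute()["map"])  # overall mAP for this toy example
```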
Voxel51 research findings include:
- Results showed AI-generated labels can achieve about 90–95 percent of the performance of human labeling while cutting costs by as much as 100,000 times and reducing annotation time by a factor of 5,000.
For example, labeling 3.4 million objects on a single NVIDIA L40S GPU cost only $1.18 and took just over an hour. By comparison, manually labeling the same dataset via AWS SageMaker, one of the least expensive annotation options, would cost roughly $124,092 and take nearly 7,000 hours (a back-of-the-envelope check of these multipliers follows this list).
- In certain cases, such as detecting rare classes in COCO or VOC, auto-label-trained models occasionally outperformed those trained on human labels. This may occur because foundation models, trained on massive datasets, can generalize better than humans across diverse objects or more consistently label challenging edge cases.
- While auto-labels come close to human-level labeling performance in many practical scenarios, careful consideration of dataset complexity and class definitions remains essential. For specialized or particularly challenging categories, teams should consider hybrid annotation strategies that combine auto-labeling's scalability with targeted human expertise.
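As a sanity check on the headline multipliers, the figures quoted in the GPU example above can be combined directly; the GPU run's duration is approximated here from "just over an hour":

```python
# Back-of-the-envelope check of the cost and speed multipliers,
# using the figures quoted in the study's example above.
auto_cost_usd, auto_hours = 1.18, 1.4         # GPU run; hours approximated
human_cost_usd, human_hours = 124_092, 7_000  # AWS SageMaker estimate

print(f"cost reduction: ~{human_cost_usd / auto_cost_usd:,.0f}x")  # ~105,163x
print(f"speedup:        ~{human_hours / auto_hours:,.0f}x")        # ~5,000x
```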
Voxel51 provides an AI development platform that helps organizations unlock the full potential of their visual data.
By leveraging innovative techniques to explore, refine, and enhance large-scale datasets and models, Voxel51 enables organizations to deliver accurate and reliable AI models that drive real-world impact.
The full research report is available for download on the Voxel51 website.