An artificial intelligence software company started by University of Michigan professors has launched the computer vision industry’s first open-source rapid data experimentation tool.
FiftyOne, the new tool from Voxel51, addresses what the company defines as the most fundamental pain point for machine learning scientists – performance-limiting datasets.
Its interactive dashboard and Python library, the creators say, allow users to easily visualize, explore, and analyze their data and to curate superior datasets that can scale to the size and accuracy required for complex, real-world applications.
“Nothing hinders the success of machine learning systems more than poor-quality data,” says Jason Corso, co-founder and CEO of Voxel51. “Yet the process of continuous data quality management is incredibly challenging and time consuming.
“We created this tool, which brings over 15 years of academic research and experience in creating computer vision and machine learning systems to offer engineers a better toolbox and a more efficient way to improve the quality, accuracy, and diversity of image datasets in order to mitigate the consequences of bad data and to improve the predictive performance of production models.”
According to the company, FiftyOne makes it easy to search, filter, and sort images by their predicted and ground truth classifications as well as their related hardness, mistakenness, and uniqueness.
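To make the idea concrete, the short Python sketch below filters a handful of samples to those where the predicted label disagrees with the ground truth, then sorts them by model confidence so that the most confident disagreements, which are often worth checking for labeling errors, come first. It is an illustration only and does not use FiftyOne's API: the `Sample` class, its fields, the helper functions, and the file names are all hypothetical, and raw confidence is a simplified stand-in for the tool's hardness and mistakenness measures.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record for one image; not a FiftyOne class.
    filepath: str
    ground_truth: str   # label assigned by the annotator
    predicted: str      # label assigned by the model
    confidence: float   # model confidence in its prediction, in [0, 1]


def misclassified(samples):
    """Keep only samples whose prediction disagrees with the ground truth."""
    return [s for s in samples if s.predicted != s.ground_truth]


def sort_by_confidence(samples, reverse=True):
    """Sort samples by model confidence, highest first by default."""
    return sorted(samples, key=lambda s: s.confidence, reverse=reverse)


# Made-up data for illustration.
samples = [
    Sample("img_001.jpg", "cat", "cat", 0.97),
    Sample("img_002.jpg", "dog", "cat", 0.91),
    Sample("img_003.jpg", "dog", "dog", 0.55),
    Sample("img_004.jpg", "cat", "dog", 0.62),
]

# Confident disagreements first: candidates for a closer look at the label.
suspects = sort_by_confidence(misclassified(samples))
for s in suspects:
    print(s.filepath, s.ground_truth, s.predicted, s.confidence)
```

A confident prediction that contradicts the label can mean either a model error or an annotation error, so a ranking like this only suggests where to look.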
It supports the most commonly used image dataset formats, including Berkeley DeepDrive, COCO, CVAT, KITTI, TFRecords, and VOC. A collection of open-source datasets and support for the TensorFlow and PyTorch ML frameworks are also built into FiftyOne, along with helper functions and tutorials.
Users are able to view subsets of dataset samples; remove duplicate and near-duplicate images; clean up labeling mistakes; recommend samples for annotation; rank samples by representativeness; and discover the most unique samples, a step Voxel51 says is essential to improving model performance.
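One common way to approach uniqueness ranking and near-duplicate removal is to compare image embeddings. The sketch below assumes each image has already been reduced to a numeric vector and scores uniqueness as the distance to the nearest other image. This is an assumption made for illustration, not a description of how FiftyOne computes its scores; the function names, the two-dimensional vectors, and the threshold are hypothetical.

```python
import math


def nearest_neighbor_distances(embeddings):
    """Map each image to the distance of its closest other image.

    Assumes at least two images. A larger distance means a more unique image.
    """
    distances = {}
    for name, vec in embeddings.items():
        distances[name] = min(
            math.dist(vec, other)
            for other_name, other in embeddings.items()
            if other_name != name
        )
    return distances


def drop_near_duplicates(embeddings, threshold):
    """Greedily keep an image only if it is farther than `threshold`
    from every image kept so far."""
    kept = {}
    for name, vec in embeddings.items():
        if all(math.dist(vec, k) > threshold for k in kept.values()):
            kept[name] = vec
    return kept


# Made-up 2-D embeddings; real ones would come from a model and be much longer.
embeddings = {
    "a.jpg": (0.0, 0.0),
    "b.jpg": (0.01, 0.0),   # nearly identical to a.jpg
    "c.jpg": (1.0, 1.0),
    "d.jpg": (5.0, 5.0),    # far from everything else
}

scores = nearest_neighbor_distances(embeddings)
ranked = sorted(scores, key=scores.get, reverse=True)  # most unique first
kept = drop_near_duplicates(embeddings, threshold=0.05)

print(ranked)
print(list(kept))
```

The pairwise comparison here is quadratic in the number of images, which is fine for a toy example but would need an approximate nearest-neighbor index for large datasets.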
“There’s no other solution that’s as easy to use to rapidly experiment with datasets and to identify limitations that stifle model performance,” says Brian Moore, co-founder and chief technical officer of Voxel51. “We’re transforming the art of dataset curation into a science and we are eager to share the results of our work with developers and machine learning scientists across industry and academia.”