Synthetic data generation startup Infinity AI Inc. said today it’s hoping to expand its business after closing on a $5 million seed funding round led by Matrix.
Founders and operators from companies that include Google LLC, Tesla Inc. and Snorkel AI Inc. also participated in the round.
The startup is aiming to tackle one of the most pressing problems in the field of artificial intelligence development. As Infinity AI explains, AI models are only as good as the data they have been trained on.
So one of the challenges of making better AI models is data collection — a process that’s notoriously expensive and slow because all data collected must be labeled and annotated. According to Infinity AI’s studies, many enterprise data scientists spend as much as 80% of their time gathering, organizing and labeling AI training data.
Infinity AI, like many others, believes this problem can be solved using what’s known as synthetic data — that is, data that’s generated through simulation, rather than collected by a sensor. The startup points out that training AI models on synthetic data has become the established best practice at big tech firms such as Tesla, Microsoft Corp. and Amazon.com Inc. Moreover, Gartner Inc. stressed the importance of synthetic data in its 2022 predictions, saying that it will be used to power the development of the majority of AI projects by 2024.
It’s this demand for synthetic data that prompted Infinity AI to create its platform. The platform lets users upload a single real-world video and transform it into hundreds of similar, perfectly labeled synthetic videos.
On its website, the company shows example synthetic videos of shoppers in a supermarket and workers in a warehouse. The startup explained that it uses a combination of physics-based simulations and generative algorithms to create these videos. Its tools include a self-serve application programming interface, available now in beta, which makes it possible to create hundreds of videos that meet desired statistical distributions for camera location, lighting conditions, avatar appearance and other parameters.
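To make the idea of "desired statistical distributions" concrete, here is a minimal Python sketch of how per-video scene parameters might be drawn before rendering. It is not Infinity AI's API: the `SceneParams` fields, the `sample_scene_params` function, the lighting categories and every numeric range are hypothetical values chosen purely for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical parameter set; these names are assumptions, not Infinity AI's API.
@dataclass
class SceneParams:
    camera_height_m: float
    camera_yaw_deg: float
    lighting: str
    avatar_id: int

LIGHTING_OPTIONS = ["daylight", "fluorescent", "dim"]
LIGHTING_WEIGHTS = [0.5, 0.3, 0.2]  # assumed target mix of lighting conditions

def sample_scene_params(n, seed=0):
    """Draw n parameter sets from the assumed target distributions."""
    rng = random.Random(seed)  # seeded so a batch can be reproduced
    batch = []
    for _ in range(n):
        # Camera height: normal around 2.5 m, clamped to a plausible range.
        height = min(max(rng.gauss(2.5, 0.5), 1.0), 4.0)
        batch.append(SceneParams(
            camera_height_m=height,
            camera_yaw_deg=rng.uniform(0.0, 360.0),
            lighting=rng.choices(LIGHTING_OPTIONS, weights=LIGHTING_WEIGHTS, k=1)[0],
            avatar_id=rng.randrange(100),  # uniform over 100 assumed avatars
        ))
    return batch
```

Each sampled parameter set would then drive one rendered video, so the batch as a whole approximates the distribution the user asked for.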
Are you working on pose estimation? Or computer vision models for remote fitness/physical therapy? Check out the new open-source InfiniteRep dataset!
— Infinity AI (@toinfinityai) February 9, 2022
Infinity AI has also created a Stable Diffusion-based inpainting tool that can be used to augment scenes, while a third tool can add clothing textures to avatars. These will be launched in early 2023, the company said.
According to Infinity AI, the real advantage of its synthetic data generation platform is that it spans the entire lifecycle for machine learning model development. Its off-the-shelf training datasets can be used to explore new model concepts. Then, when customers are ready, they can graduate to the self-serve API to create their own, tailor-made data. Last, they can adopt Infinity AI’s data flywheel technology that feeds failure cases into the Infinity engine to output infinite, labeled versions of that data.
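The article does not describe how the data flywheel is implemented, but the general pattern can be sketched in a few lines of Python: find the evaluation samples a model gets wrong, then generate labeled variants of those samples and add them to the training set. The function names, the dictionary layout and the simple jitter-based variant generator below are all assumptions made for illustration, and a real system would render new simulated scenes rather than perturb a number.

```python
def collect_failures(samples, predict):
    """Return the samples whose prediction disagrees with the label."""
    return [s for s in samples if predict(s["x"]) != s["label"]]

def make_variants(sample, count, jitter=0.05):
    """Toy stand-in for a synthetic-data engine: perturb the failing
    input slightly and keep its known label."""
    return [
        {"x": sample["x"] + jitter * (i + 1), "label": sample["label"]}
        for i in range(count)
    ]

def flywheel_round(train_set, eval_set, predict, variants_per_failure=3):
    """One pass of the loop: find failures, expand them, grow the training set."""
    failures = collect_failures(eval_set, predict)
    new_samples = [
        variant
        for failure in failures
        for variant in make_variants(failure, variants_per_failure)
    ]
    return train_set + new_samples, failures
```

In practice the enlarged training set would be used to retrain the model, and the loop would repeat until the failure list stops shrinking.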
Andy Thurai, Vice President and Principal Analyst at Constellation Research Inc., told SiliconANGLE that the accuracy of AI models is directly correlated with having lots of training data. The problem is that real-world data can be prohibitively expensive, he said, due to the difficulty of collecting it. After the data is collected, it then has to be properly classified and annotated before it can be used for AI model training. Another issue with real-world data is that it can be risky to use from a compliance standpoint, because even anonymized information can be reverse-engineered to get the original data back.
“This is why synthetic data companies are taking off,” Thurai explained. “Many synthetic data companies are trying to substitute real-world data with synthetic data that can be used in model training. Especially if your data acquisition and model training costs are limited, synthetic data can give you the most bang for the buck.”
Besides being cost-effective, a second advantage of synthetic data is that it can accelerate the speed at which AI models are trained and tested, especially in “what if” scenarios and A/B testing, the analyst said. “If a situation arises where a model turns out to be not fully-trained, it can be quick and easy to produce synthetic data to retrain that model and do what-if analysis to see how it will perform in real-world scenarios,” he explained.
However, Thurai warned against companies relying on synthetic data exclusively for both model creation and testing, saying that this would increase the likelihood of false positives occurring. “A good blend of real-world data and synthetic data will provide the best cost efficiency,” he said. “Don’t go cheap and use all synthetic data.”
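Thurai does not give a specific ratio, but the blending he recommends can be sketched as a simple sampling step. The Python example below is a minimal illustration under assumed names: `blend_datasets` and its `synthetic_fraction` parameter are hypothetical, and the right fraction for any given project would have to be found by experiment.

```python
import random

def blend_datasets(real, synthetic, synthetic_fraction, total, seed=0):
    """Build a shuffled training set of `total` samples in which roughly
    `synthetic_fraction` of the samples are synthetic."""
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")
    n_synthetic = round(total * synthetic_fraction)
    n_real = total - n_synthetic
    if n_synthetic > len(synthetic) or n_real > len(real):
        raise ValueError("not enough samples to reach the requested mix")
    rng = random.Random(seed)  # seeded so the mix can be reproduced
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(mixed)
    return mixed
```

Holding out a test set made up of real-world data only is one way to check whether a model trained on such a mix still performs on the data it will actually see.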
Infinity AI’s products have been available in beta for a while, and the company claims to have picked up a number of customers, including Voxel Safety Inc., a creator of AI-powered workplace safety systems.
“Using Infinity AI, we can get new products out of the door faster,” said Harishima Dayanidhi, co-founder and vice president of engineering at Voxel. “Our ML engineers are happier because they get to spend more time on the fun part of model development.”
Interestingly, Infinity AI claims to have been boosted by the economic downturn this year, saying the downturn has prompted more customers to turn to synthetic data as they tighten their budgets for AI projects. According to the company, synthetic data is orders of magnitude cheaper than sensor-collected data. It said that in the last six months, customers have generated more than 5 million synthetic data frames using the self-serve API.
Along with today’s funding round, Infinity AI announced the launch of its Infinity Marketplace, which it hopes will become the world’s largest open-source marketplace for synthetic datasets. Already, it’s offering companies more than a million free frames that can be used for research or commercial purposes.
Its datasets cover areas such as fitness, robotics, smart retail, industrial safety and more. Each month, Infinity AI plans to grow its marketplace by adding yet more free datasets.
Infinity AI founder Lina Colucci said the startup simply wants to make it easy for machine learning development teams to start working with synthetic data. “The ML community has a scarcity mentality with regards to data today,” she said. “Synthetic data turns this into an abundance mentality. Infinity AI is democratizing access to training data since this is the biggest roadblock to progress in ML today.”