The ROI of High-Quality AI Training Data 2023

Introduction

Just seven months ago we saw the release of ChatGPT, a transformational generative AI tool that has taken the larger business and technology industry by storm. It has spurred a frenzy of additional generative AI applications and the integration of the technology into leading platforms. This will have huge implications for the way organizations approach their AI strategy and investments into the future.

Our Path to AI Maturity research shows that even before the shockwave-like launch of ChatGPT, natural language processing (NLP) and conversational AI (CAI) were two of the top three most widely deployed AI applications. The range of AI applications is large, spanning everything from predictive analytics to security and robotics, which highlights the wide applicability of AI, and now generative AI, across the enterprise.

Conversational AI and natural language processing were also at the top of the list of AI applications to deliver ROI in the enterprise, which may be due in part to being more widely deployed. The rise of generative AI has turbo-charged these applications and their potential ROI.

What is fueling those deployments? Enterprises are using CAI to build chatbots, digital assistants and other automated interfaces that allow them to connect with customers, build relationships and reduce costs. They are taking advantage of natural language processing and speech/voice recognition technologies to unlock the potential of their language data for use cases that include knowledge management, sentiment analysis, contract reviews and more.

Earlier this year, LXT published its second annual report: “The Path to AI Maturity”. The report summarizes the findings of LXT’s survey of 315 senior executives with artificial intelligence (AI) experience, drawn from a variety of industries at mid-to-large US organizations. Two-thirds of respondents were C-Suite executives.

This year’s report revealed increased levels of AI investment despite the general downturn in the economy, and a greater proportion of organizations assessing themselves as operating at higher levels of AI maturity. The findings also show that amidst the current macroeconomic conditions, business agility has come to the forefront as a driver of AI strategy.

In today’s climate, companies are increasingly evaluating their investments to determine which ones are driving value. This follow-on report from the same research study provides a view into how enterprises evaluate the return on investment of the training data that fuels their AI projects, and how they value these investments.

AI maturity of US organizations today

In the survey, which was fielded in the fall of 2022, respondents were asked to indicate the level of AI maturity currently achieved by their organizations. The results show that 48% of US organizations consider themselves to have reached the three highest levels of AI maturity. This means that they have moved from the awareness and experimentation phases of their AI deployments to achieving demonstrable ROI from AI in production. Since last year we’ve seen an 8% increase in companies that have moved from the experimenter phase to the maturing phase, with more organizations successfully transitioning from AI experiments to AI in production.

Image
Survey Q5. Looking at the Gartner AI Maturity Model diagram below, at what level of AI maturity is your organization today? n=315; weighted to NAICS US industry split
It’s definitely encouraging to see the progress in AI maturity within US enterprises year over year. However, with just 9% of companies at the transformational stage, where AI is a part of business DNA, there is still a way to go before the majority of enterprises in the US have significantly embraced AI across their organizations.
Image
[2022] Q7. Looking at the Gartner AI Maturity Model diagram below, at what level of AI maturity is your organization today? n=315; weighted to NAICS US industry split
[2021] Q7. Looking at the Gartner AI Maturity Model diagram below, at what level of AI maturity is your organization today? n=200

AI data investment

While the current economic climate is creating a high level of overall uncertainty, investment in AI continues to be strong as organizations view the technology as an important lever for business growth. Our study showed that close to half (49%) of organizations surveyed have AI budgets of $76M or more.
Image
Survey Q8. In which range approximately is your organization’s total budget for AI? n=315; weighted to NAICS US industry split

AI budget distribution

When asked how AI budgets are allocated, we found that investments are relatively evenly distributed across categories. Training data and product development ranked highest; however, the spread across categories suggests that companies are still trying to determine where to place their bets as they continue to mature their AI strategies.
Image
Survey Q9. What percentage of your investment in AI is dedicated to each of the following? (Average) n=315; weighted to NAICS US industry split

Investing in high-quality training data

The organizations surveyed place great importance on the quality of their AI training data. When asked if they would be willing to spend more for higher-quality data to support their AI initiatives, 87% agreed. No respondents indicated that they would not at all be willing to spend more for higher-quality data. Organizations understand the value of using high-quality training data, as it helps them achieve higher accuracy and generate overall ROI for their AI projects faster.
Image
Survey Q13. How willing is your organization to spend more for higher quality training data for AI? n=315; weighted to NAICS US industry split

How training data is sourced

When it comes to sourcing data for AI, companies use various methods. Building data sets internally and working with third-party providers rose to the top of the list for survey respondents. The results here demonstrate that companies are actively pursuing a variety of methods of obtaining the data they need to power their AI projects.
Image
Q15. How does your organization source training data for AI? n=315; weighted to NAICS US industry split

Training data challenges

Data security is currently by far the most challenging aspect for organizations when it comes to training data for AI. Companies are also struggling with the availability and quality of their training data. When comparing these challenges by maturity level, experimenter companies are most concerned with secure data storage and data leaks, while maturing companies are more likely to face challenges with cybersecurity concerns and lack of quality data.
Image
Q18. What are the biggest challenges your organization is facing right now when it comes to training data for AI? n=315; weighted to NAICS US industry split.
Image
Q18. What are the biggest challenges your organization is facing right now when it comes to training data for AI? n=315; weighted to NAICS US industry split.

Measuring training data ROI

Companies investing in AI evaluate the ROI of the training data that is used to support their AI projects in several ways, with operational efficiency rising to the top of the list. Organizations see high-quality training data as a way to improve their productivity and streamline their AI projects. High-quality training data also helps companies improve the success rate and reduce the costs of their AI programs.
Image
Q12. What is the ROI for good quality training data for AI? n=315; weighted to NAICS US industry split
When we look at how the ROI of high-quality training data is evaluated by experimenters vs. maturing companies, we see that maturing companies rate every ROI category higher than experimenters do, except for “improved reputation”, which is more commonly cited by experimenter companies. The biggest differences between how experimenters and maturing companies measure the ROI of high-quality training data are in operational efficiency, higher success rates of AI programs, cost savings of AI programs and reduced error rates of AI models.
Image
Q12. What is the ROI for good quality training data for AI? n=315; weighted to NAICS US industry split
When we break this out more granularly by maturity stage we find that companies at the Awareness stage of their AI journey are more likely to measure the ROI of high-quality training data by how it helps accelerate their time to market. Companies at the Active and Operational phases are most likely to measure ROI in terms of operational efficiency, and companies at the highest levels of AI maturity — Systemic and Transformational — are most likely to measure ROI by the increased success rate of their AI programs.
Image
Q12. What is the ROI for good quality training data for AI? n=315; weighted to NAICS US industry split
The graphic below is another way to illustrate how companies evaluate the ROI of high-quality training data according to their phase of AI maturity. In the early stages of their AI journey, using high-quality training data allows companies to accelerate their time to market, which is critical as they attempt to outperform competitors. As they move into the Active and Operational phases, they see the use of high-quality training data as a way to improve their operational efficiency and productivity as they scale. At the highest levels of AI maturity, using high-quality training data is most closely tied to the increased success rate of their AI initiatives. Ensuring the use of high-quality training data supports these organizations in their AI journeys, regardless of the maturity phase they have currently reached.
Image

Enterprise training data needs

Two-thirds of respondents stated that their need for AI training data will increase in the next two to five years. No respondents expect training data expenditure to decrease in that timeframe. Organizations that have reached the highest levels of maturity indicate the strongest need to increase their training data volumes over this time period. As companies deploy more AI models across functions and business processes, more training data is needed to support initial model training and periodic model updates.
Image
Image
Q19. Do you expect your organization’s needs for training data to increase, decrease, or remain the same in the next two to five years? n=315; weighted to NAICS US industry split

Conclusion

LXT’s research findings illustrate the important role that high-quality training data plays in an organization’s AI maturity journey. The ROI of high-quality training data is evaluated by the enterprise in multiple ways, including operational efficiency, higher success rates for AI programs and cost reduction. This research study found that companies are overwhelmingly willing to pay more for their training data to ensure high quality, so they can build a strong foundation for their AI programs and set them up for success.

Training data is a strong need across US organizations, particularly those that have reached the highest levels of AI maturity. Companies looking to drive successful AI projects should thoroughly evaluate their training data investments to make sure they are building the data pipeline needed to ensure their projects reach their goals.

Behind the research

LXT commissioned a survey of 315 senior decision-makers working for US organizations. 46% of respondents were from the C-Suite and all those who took part had verified AI experience. Only 25% of those who applied met the criteria required for participation, which included their level of AI knowledge and experience.

Contributors were engaged using online surveys, answering on behalf of a range of business sizes, revenues and industries. Each participant represents a US organization with at least $100 million in annual revenue and over 500 employees.

The results were weighted to match the North American Industry Classification System split of companies with 500+ employees in an effort to reflect an accurate representation of the mix of US companies by industry.
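To illustrate the kind of weighting described above, here is a minimal Python sketch of post-stratification weighting, one common way to align a survey sample with a known industry mix. It is not necessarily the procedure used for this study, which the report does not detail. The function names, industry labels, shares and answers are hypothetical and chosen only for the example.

```python
from collections import Counter


def poststratification_weights(sample_industries, population_shares):
    """Return one weight per respondent: the target (population) share of
    the respondent's industry divided by that industry's share of the sample."""
    n = len(sample_industries)
    counts = Counter(sample_industries)
    weights = []
    for industry in sample_industries:
        sample_share = counts[industry] / n
        weights.append(population_shares[industry] / sample_share)
    return weights


def weighted_share(answers, weights):
    """Weighted proportion of respondents whose answer is True."""
    total = sum(weights)
    return sum(w for a, w in zip(answers, weights) if a) / total


# Hypothetical toy data: 10 respondents, two made-up industry groups.
sample = ["manufacturing"] * 6 + ["finance"] * 4
# Hypothetical target mix (a real study would take this from NAICS statistics).
population = {"manufacturing": 0.5, "finance": 0.5}

weights = poststratification_weights(sample, population)

# Hypothetical yes/no answers, in the same order as `sample`.
answers = [True] * 3 + [False] * 3 + [True] * 4

unweighted = sum(answers) / len(answers)
weighted = weighted_share(answers, weights)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this toy case the over-represented group is weighted down and the under-represented group is weighted up, so the weighted result differs from the raw proportion. The same idea scales to any number of industry groups, provided every group in the target mix has at least one respondent.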

The research was conducted from September 29th to October 21st, 2022, by Reputation Leaders, an independent research organization. Interviews were conducted in the US by online survey using the panel services of Borderless Access.

For questions about the research findings, please reach out to us at info@lxt.ai.

About LXT

LXT is an emerging leader in AI training data to power intelligent technology for global organizations. In partnership with an international network of contributors, LXT collects and annotates data across multiple modalities with the speed, scale and agility required by the enterprise. Our global expertise spans more than 145 countries and over 1000 language locales. Founded in 2010, LXT is headquartered in Toronto, Canada with presence in Australia, Egypt, Turkey and the United States. The company serves customers in North America, Europe, Asia Pacific and the Middle East.

To learn more about LXT, visit lxt.ai.