IDC’s 2013 study sized the digital universe of data at 4.4 trillion gigabytes at the time. Just five years later, PwC predicted that the accumulated digital universe would reach 44 trillion gigabytes by 2020. Despite this exponential growth, the same experts estimated that only 3% of potentially useful data would be tagged with metadata.
With 2020 approaching, this big data gap coincides with the increasing popularity of data science and its promise to solve big data problems. Enterprising data scientists have therefore begun applying AI and machine learning to the metadata tagging problem.
Auto-tagging efforts have so far faced their share of machine learning challenges. For starters, the keywords suggested for digital assets are often still inaccurate. Suggested keyword sets are also limited to the metadata taxonomy the company uses; while taxonomies are customizable, keeping them current is difficult in an increasingly change-management-driven business climate. Additionally, most API providers on the market don’t allow for learning feedback loops, and those that do remain generic in approach rather than custom and client-specific. On the other end of the spectrum, custom training, which can offer client-specific feedback loops, presents an automation challenge that hurts the ability to scale such efforts.
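To make the two pain points above concrete, here is a minimal, hypothetical sketch of an auto-tagger: suggestions are constrained to a company taxonomy, and a simple client-specific feedback loop (the piece most generic APIs lack) adjusts future suggestions from accepted and rejected tags. All names, the toy taxonomy, and the scoring scheme are illustrative assumptions, not any vendor’s actual API.

```python
from collections import defaultdict

# Hypothetical company taxonomy: tag -> seed keywords (illustrative only).
TAXONOMY = {
    "marketing": {"campaign", "brand", "advert"},
    "finance": {"invoice", "budget", "quarterly"},
    "legal": {"contract", "compliance", "clause"},
}

class AutoTagger:
    def __init__(self, taxonomy):
        # Copy the seed sets so feedback never mutates the shared taxonomy.
        self.taxonomy = {tag: set(kws) for tag, kws in taxonomy.items()}
        # Per-tag weights, adjusted by client feedback over time.
        self.weights = defaultdict(lambda: 1.0)

    def suggest(self, text, threshold=0.0):
        """Score each taxonomy tag by keyword overlap with the asset text.

        Only tags from the company taxonomy can ever be suggested,
        which is exactly the limitation described above.
        """
        tokens = set(text.lower().split())
        scores = {
            tag: len(tokens & kws) * self.weights[tag]
            for tag, kws in self.taxonomy.items()
        }
        return [tag for tag, score in
                sorted(scores.items(), key=lambda kv: -kv[1])
                if score > threshold]

    def feedback(self, text, accepted, rejected):
        """Client-specific feedback loop: reinforce accepted tags and
        fold the asset's tokens into their keyword sets; penalize rejects."""
        tokens = set(text.lower().split())
        for tag in accepted:
            self.weights[tag] *= 1.1
            self.taxonomy[tag] |= tokens
        for tag in rejected:
            self.weights[tag] *= 0.9

tagger = AutoTagger(TAXONOMY)
print(tagger.suggest("quarterly budget review for the brand campaign"))
```

Even in this toy form, the trade-off from the paragraph above is visible: the taxonomy constraint keeps suggestions on-vocabulary but inaccurate for assets outside it, and the feedback loop is what makes the tagger client-specific rather than generic.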
Despite these hurdles, the need for useful data is only going to grow. And because converting inert data into actionable insight at scale is only possible through machine learning and AI, businesses seeking to stand out will ultimately have no choice but to find a way to make metadata auto-tagging work.