Fuel For Digital Future: Does India Have Enough Streamlined Data To Feed Its AI Ambitions?

As the country races to build homegrown AI models, its vast digital footprint could become a powerful advantage, but only if the data behind it is made accessible, standardised, and responsibly shared

“The key thing that remains is data,” said Abhishek Singh, Additional Secretary, MeitY, and CEO of the IndiaAI Mission, at the India Mobile Congress 2025 a few days ago.

India, with its billion-plus population and sprawling digital economy, appears to generate data by the terabyte every day. Yet, ask how much of it is actually usable, accessible, and structured well enough to build artificial intelligence models, and the answer grows more complicated.

Compute, Data, And Talent

Singh laid out the triad needed to build AI at a population scale: “Compute, data, and talent.” 

In its bid to build homegrown foundational AI models, India ticks two off the list. “When it comes to talent, India has strength in the digital human capital that we have,” said Singh. “Every BigTech company does have people of Indian origin. Indian engineers are building solutions at scale, both in India and abroad.”

Compute power, too, is becoming more accessible. “Today, we have almost 40,000 GPUs that are available at a very low cost of less than a dollar per GPU per hour so that developers, startups, and researchers can train models and build applications,” he added. 

But data — the third pillar — is where the cracks begin to show.

Data, Data Everywhere, Not A Byte To Use

India’s digital footprint is immense. Aadhaar enrolment, health records, financial transactions, land registries, weather sensors, and satellite images collectively form a data ocean that only a few countries can rival. 

Yet, most of this information lies trapped in silos, locked behind departmental boundaries or incompatible systems. “We have done a lot of digitisation in the last few years,” Singh said. “But even though we generate a lot of data, with a billion-plus people using various services, most of the data that we have is not in a form in which it can be used for AI applications.”

“The problem is not one of scarcity, but structure. Many government datasets are published in PDF reports rather than in machine-readable formats. Others are scattered across portals that follow their own standards and naming conventions,” a senior economist told The Secretariat.

Without mechanisms for interoperability, such as APIs (Application Programming Interfaces that allow different systems to exchange information), this data cannot be easily combined or analysed.
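To make the interoperability problem concrete, here is a minimal Python sketch of what combining two departmental datasets involves once they are machine-readable. Everything in it is an assumption made for illustration: the two CSV extracts, their column names, the figures, and the `COLUMN_MAP` lookup are invented and do not come from any real portal.

```python
import csv
import io

# Hypothetical extracts from two departments that describe the same
# districts but use different column names and spellings.
# All names and values are invented for illustration.
HEALTH_CSV = """District Name,Clinics
Pune ,120
NASHIK,45
"""
AGRI_CSV = """dist,crop_area_ha
pune,98000
nashik,152000
"""

# Assumed mapping from each portal's local column name to a shared one.
COLUMN_MAP = {
    "District Name": "district",
    "dist": "district",
    "Clinics": "clinics",
    "crop_area_ha": "crop_area_ha",
}


def load_standardised(text):
    """Parse CSV text and rename columns to the shared vocabulary."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {COLUMN_MAP[key]: value.strip() for key, value in raw.items()}
        row["district"] = row["district"].lower()  # normalise the join key
        rows.append(row)
    return rows


def merge_on_district(left, right):
    """Inner-join two lists of rows on the standardised 'district' key."""
    index = {row["district"]: row for row in right}
    return [
        {**row, **index[row["district"]]}
        for row in left
        if row["district"] in index
    ]


combined = merge_on_district(
    load_standardised(HEALTH_CSV), load_standardised(AGRI_CSV)
)
for row in combined:
    print(row)
```

The join itself takes a few lines. The effort lies in agreeing on the shared column names and key spellings, which is the standardisation work the economist describes.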

National Data Sharing And Accessibility Policy 2012

“India’s National Data Sharing and Accessibility Policy of 2012 was conceived to change the siloed nature, to open up non-sensitive government data for public use,” said the economist.

However, policy implementation has been inconsistent. Portals like data.gov.in remain underused, sometimes outdated, and datasets often lack the metadata that makes them discoverable or usable for machine learning.

The result is a digital paradox: a country rich in data, yet poor in data usability.

AI Kosh, A Step Towards A Well-Defined Structure

The government’s latest attempt to bridge these silos is AI Kosh, a platform under the IndiaAI Mission that aims to consolidate and standardise datasets. 

By mid-2025, AI Kosh hosted more than 3,000 datasets and 243 models across 20 sectors – from agriculture and transport to health. The goal is not only to make data available for research but also to support practical AI tools tailored to Indian conditions. These include chatbots in local languages, crop prediction models, and disease surveillance systems.

Framework Of Data Economy 

Singh says the platform gives private entities “the freedom to lay down their own policies with regard to consumption of that data,” and, thereby, creates what he calls “a whole framework of data economy.”

But challenges remain. The quality and completeness of datasets vary widely. Only a few ministries have uploaded time-series or geospatial data that the AI models require. Private entities remain cautious about sharing proprietary or commercially valuable data, despite assurances of anonymisation. 

Data Require Refinement

“Very often we refer to data as the new oil, but just like oil requires processing, data also requires being shared through APIs, subjected to privacy preservation and anonymisation tools, and then made available to various stakeholders, which can really impact people,” Singh says.
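As a rough illustration of the kind of "processing" Singh describes, the Python sketch below prepares one record for sharing. It is not a description of any tool used by AI Kosh or the IndiaAI Mission. The field names, the sample record, and the secret key are all hypothetical. The technique shown is pseudonymisation (a keyed hash in place of the identifier) plus generalisation of age, which is weaker than full anonymisation.

```python
import hashlib
import hmac

# Assumption: a secret key held only by the data custodian.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymise(record, key=SECRET_KEY):
    """Return a reduced copy of a record that is safer to share.

    The direct identifier is replaced by a keyed hash, the name is
    dropped, and the exact age is coarsened into a ten-year band.
    """
    token = hmac.new(
        key, record["id_number"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    decade = (record["age"] // 10) * 10
    return {
        "person_token": token[:16],  # stable per person, not reversible without the key
        "age_band": f"{decade}-{decade + 9}",
        "district": record["district"],
        "diagnosis": record["diagnosis"],
    }


# Invented sample record; no real person or identifier is used.
sample = {
    "id_number": "0000-1111-2222",
    "name": "Example Person",
    "age": 47,
    "district": "Pune",
    "diagnosis": "dengue",
}

shared = pseudonymise(sample)
print(shared)
```

Because the same identifier always maps to the same token, records can still be linked across datasets for analysis. The same property means combinations of the remaining fields may still re-identify people, so real deployments usually need further safeguards.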

If India manages to create a trustworthy, standardised data ecosystem, the effects could transform governance. Predictive analytics could help anticipate disease outbreaks, streamline traffic flows, or plan irrigation in drought-prone districts. AI could improve welfare delivery by helping identify eligible beneficiaries and track leakages. 

India’s engineers are ready. The compute is coming online. What remains is the task of turning scattered bytes into coherent knowledge – refining the oil before it powers the engine.
