IndiaAI Expert Group Report - Oct. 2023
TABLE OF CONTENTS
• Objective
• Introduction
• AI Investment Trends
• Scheme Guidelines
• Introduction
• Annexure – I: Qualification Packs (QPs) and National Occupational Standards (NOS) for AI & Big Data
• Driving AI Transformation: Mechanisms, Metrics, Measures and Estimation for AI Compute Capacity
• Preamble
• Recommendations for AI Chipsets for High-performance Compute (HPC) and related areas
CONTEXT AND BACKGROUND
AI will be the kinetic enabler of India's digital economy and make governance smarter and more data-led. AI is expected to add USD 967 billion to the Indian economy by 2035 and USD 450–500 billion to India's GDP by 2025, accounting for 10% of the country's USD 5 trillion GDP target. AI has become a top priority on policy agendas worldwide, given its power to foster innovation, generate employment opportunities, and contribute to the growth of the country.
Over the past several years, the Government of India has taken concrete steps to encourage the adoption of AI in a
responsible manner and build public trust in its use, placing the idea of ‘AI for All’ at its very core. Favourable policies and
continuous interventions strive to harness the potential of AI for social development and inclusive growth, in line with our
Hon’ble Prime Minister’s inclusive development philosophy of ‘Sabka Saath, Sabka Vikas and Sabka Prayas’.
The Ministry of Electronics and Information Technology (MeitY) is committed to the PM's vision of fostering and promoting cutting-edge technology use cases in the country. While MeitY has undertaken the implementation of a 'National Program on AI' encompassing four components: Data Management Office, National Centre for AI, Skilling on AI, and Responsible AI, IndiaAI is crucial to complement the ongoing 'National Program on AI' by establishing a focused and comprehensive framework that addresses specific gaps in the AI ecosystem. As one of the largest Global South economies leading the AI race, India was elected Council Chair of the Global Partnership on AI (GPAI), winning more than two-thirds of the first-preference votes.
The success of the government’s initiatives and efforts is evident from India's rising position in global AI rankings and
indexes. We have been ranked 1st in AI Skill Penetration and 1st in the Number of GitHub AI Projects as per the Stanford
AI Index report 2023. The same Index places India 5th in the value of Private Investment in AI and Number of Newly
Funded AI Companies. Also, India has been ranked 1st in all 5 Pillars of Peak AI’s Decision Intelligence Maturity Scale,
which assesses a business's commercial AI readiness. Further, the 2023 NASSCOM report on the State of Data Science & AI Skills in India ranked India 1st in terms of AI skill penetration and 1st in AI talent concentration.
IndiaAI Vision
IndiaAI has a mission-centric approach that ensures a precise and cohesive strategy to bridge the gaps in the existing AI ecosystem vis-à-vis Compute infrastructure, Data, AI financing, Research and Innovation, targeted Skilling, and institutional capacity for Data, in order to maximise the potential of AI to advance India's progress.
Furthering our commitment to building AI in India and for India, MeitY has set up seven working groups to collectively
brainstorm on the vision, objectives, outcomes, and design for each of the IndiaAI pillars.
After months of deliberative and collective brainstorming, the working groups have studied and shared their reports on the following aspects of AI:
The objective of this working group is to detail the operational aspects of establishing three Centres of Excellence (CoEs)
that will strive to leverage India's distinct strengths in AI to tackle critical problems of society. The proposed CoEs will lay special emphasis not just on foundational and multidisciplinary research in AI but also on the development and adoption of indigenous AI technologies at the national and international levels.
The proposed CoEs will bring together experts from academia, industry, and research entities to work on cutting-edge
research to create high-quality AI solutions and to develop scalable solutions across sectors. Through these Centres, India
will foster a culture of creativity, experimentation, and entrepreneurship to unleash the full potential of AI in India and to
establish itself as a world leader in AI innovations.
a. Conduct foundational research in broad areas of AI to generate new knowledge in the field relevant to the unique
advantages and challenges of Indian society.
b. Facilitate global knowledge exchange and capacity building through collaborations with the global academic and
innovation ecosystems.
c. Provide training and development opportunities to a new generation of researchers, innovators, and entrepreneurs, and promote collaboration between academia, industry, and research entities, enabling rapid translation of candidate AI technologies from TRL 3 to TRL 7 prototypes.
d. Create an industry-academia-startup ecosystem to develop technology modules both vertically within the
organisations and horizontally across the sectors.
e. Develop and scale innovative, cost-effective, and efficient solutions to address challenges in several national critical
sectors.
f. Aid and facilitate the commercialization of research and development outputs by leveraging existing national
schemes and by working with industry partners to take new products and services to the market.
g. Partner with and strengthen the existing incubation facilities to support AI start-ups in developing their ideas,
nurturing talent, accessing networks, etc.
h. Increase the penetration of AI solutions both domestically and internationally to expand the range and depth of AI applications and to cater to new markets.
The CoEs will play a pivotal role in positioning India as a global leader in a fast-evolving AI landscape through impactful
research and innovative solutions to critical problems faced by the country. They will be responsible for setting up the
agenda and priorities of AI in the sector as well as identifying the key challenges and opportunities in the field. They will
engage with the stakeholders to implement solutions relevant to different societal sectors. The CoEs will play a critical role in developing and nurturing the AI ecosystem in India, moving it closer to being a powerhouse in AI innovation and its responsible adoption.
The objective of this working group is to collaboratively conceptualize the architecture of the India Dataset Platform (IDP)
and Datasets Program. The working group aims to define the objectives of the IDP and highlight its role in creating a
process for identifying potential datasets in government ministries and departments for better decision-making strategies
and artificial intelligence applications. The working group will also encourage non-government entities to contribute
datasets to IDP. It seeks to serve as a guiding framework for the development, implementation, and utilisation of the IDP.
The following are the salient features of IDP:
• Federal Structure: The IDP is being developed as a federal structure to accommodate data providers from various
ministries and departments. The federated data approach allows organisations to maintain autonomy and
control over their data while enabling collaborative analysis across different datasets.
• Data Discovery: The IDP aims to build a single platform where data from various sources can be accessed and
linked. This enables easier discovery and cataloguing of datasets, along with promoting efficient data utilisation.
• Data Value-Added Services: There is a need for a central agency or organisation to provide curated value-added services that enable government ministries and departments to solve challenges in the initial stages of dataset formation and to manage the data. This will ensure data quality and consistency, as not all data providers may have the expertise to curate their data effectively.
• PaaS Architecture: The suggestion is that the India Dataset Platform should act as a platform-as-a-service (PaaS) or architecture-as-a-service (AaaS) for various organisations. This approach will allow each domain group to manage and upload datasets that can be further used for research and innovation purposes.
• Establish clear governance: Define roles, responsibilities, and decision-making processes for the data exchange
platform, ensuring compliance with the data protection regulations.
• Identify and onboard the data providers: Educate government departments about the benefits of participation
and foster collaboration for data sharing.
• Define data standards and formats: Establish guidelines for consistent data formats, metadata, and quality
standards.
• Develop data security and privacy measures: Implement robust security measures and ensure compliance with
data protection regulations.
• Build technical infrastructure: Create a scalable and secure platform with APIs for data access and high
availability.
• Implement data governance and access control: Define data stewardship, access controls, and usage agreements.
• Foster collaboration and user adoption: Engage with data consumers, build user-friendly interfaces, and promote
the platform's benefits.
• Continuously monitor and improve: Regularly evaluate performance, gather feedback, and iterate on the
platform to address emerging needs.
Prioritising these areas can go a long way in establishing a solid foundation for a data management structure that will
facilitate effective data exchange and utilisation for national benefits.
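
To make the federated-catalogue idea concrete, the following minimal sketch shows one possible shape for an IDP dataset catalogue entry, built from the metadata elements described above (provider, format, source, access terms). The field names, the `register_dataset` helper, and all values are illustrative assumptions, not a prescribed IDP schema.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: the IDP schema has not been finalised. These
# field names are assumptions based on the metadata elements described
# above (structure, format, source, provider, access terms).
@dataclass
class DatasetCatalogEntry:
    dataset_id: str          # unique identifier assigned on registration
    title: str               # human-readable dataset name
    provider: str            # ministry/department or non-government entity
    domain: str              # e.g. "health", "agriculture"
    data_format: str         # e.g. "CSV", "JSON", "Parquet"
    access_url: str          # API endpoint exposed by the providing organisation
    licence: str             # usage terms under the governance framework
    tags: list[str] = field(default_factory=list)

# In a federated design, each provider keeps custody of the data itself and
# publishes only this metadata to the central catalogue for discovery.
catalogue: dict[str, DatasetCatalogEntry] = {}

def register_dataset(entry: DatasetCatalogEntry) -> None:
    """Add a provider's dataset metadata to the central discovery index."""
    catalogue[entry.dataset_id] = entry

register_dataset(DatasetCatalogEntry(
    dataset_id="moa-crop-2023-001",
    title="District-level crop yield estimates",
    provider="Ministry of Agriculture (hypothetical example)",
    domain="agriculture",
    data_format="CSV",
    access_url="https://example.gov.in/api/datasets/crop-yield",  # placeholder URL
    licence="restricted-research",
    tags=["agriculture", "yield", "district"],
))
```

In this design, deleting or revising a dataset remains the provider's responsibility; the catalogue only indexes metadata, which is what lets providers retain autonomy and control while still enabling cross-dataset discovery.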
To realise the full potential of India’s digital government vision, maximise the efficiency of data-led governance & public
service delivery, and catalyse data-based research and innovation, MeitY has released the Draft National Data Governance
Policy. This working group will work in alignment with the policy to achieve tangible, quantifiable yearly targets. The National Data Governance Policy provides an institutional framework for the governance of data collection, management, processing, storage, and access processes and systems by the National Data Management Office (NDMO).
The working group’s report provides recommendations around the objectives and functions of NDMO and its governance
structure. The report provides its recommendations across the below-mentioned pillars:
• Institutionalising NDMO: A detailed mechanism to establish the NDMO as a non-statutory independent agency
has been proposed. The NDMO will work like a regulator in several respects of its operations.
• Governance and Structure of NDMO: The organisational structure and governance of the NDMO have been elucidated. To provide general direction on the management of affairs and operations, the NDMO is proposed to be headed by a CEO, whose roles and responsibilities have been detailed. The CEO could be supported by the heads of six functional divisions: Standards and Policies, Platforms and IT, Grievance Redressal, Legal, Audit and Compliance, and HR & Finance. Experts from the project management unit could be assigned to relevant divisions to provide technical and operational support.
• Functions of NDMO: In alignment with the National Data Governance Policy, guidelines have been created for executing the functions of the NDMO.
• Data Management Units: As proposed under the NDGP, an institutional mechanism of Data Management Units
(DMU) may be established within each Ministry/Department. The roles and responsibilities of the DMU would
be to qualify the expected outcomes and to standardise operations across all line ministries/departments. The
DMU structure has also been created, detailing the functional and technical requirements of each division/officer in the DMU, including the Data fellows supported by MeitY.
• Schedule for Implementation: Tangible and quantifiable outputs and milestones for the NDMO have been
identified across specified timelines which may be further updated as the ecosystem matures.
The objective of this working group is to assess and design funding mechanisms for AI startups to enhance AI innovation
in India. The vision is to build the next 100 AI unicorns in the country through the India AI program. The objectives of the
scheme are as follows:
• Empower AI startups to make AI-enabled products/solutions for India and the world under Make AI in India &
Make AI Work for India initiatives.
• Develop & utilise the available R&D ecosystem and promote innovation in AI and other emerging technologies.
• Establish a funding mechanism for promising AI Startups and leverage transformative technologies to foster
inclusion, innovation, and economic growth.
• Enable access to state-of-the-art AI infrastructure through CoEs and develop new support infrastructure for various use cases of emerging technologies/Deep Tech startups.
• Initiate collaboration with central government organisations, states, industry, academia, and international
organisations for the development and deployment of emerging technologies, skilling, and capacity-building
activities.
• Enable the funding mechanism for Early-stage startups for commercialization and growth of AI startups in the
country.
• Strengthen community-building initiatives including workshops, capacity-building activities, conferences, etc. to
strengthen the ecosystem for the recognition & promotion of AI Startups.
o Measuring, monitoring, and reporting compliance and performance of MeitY-related Assets, Programs, and
schemes.
● Challenge Execution:
o Sourcing, executing, monitoring, and reporting on challenges and providing support to AI startups.
o Building links with global AI startup ecosystems and aggregating the resources from global and local
industries, institutions, and agencies for the benefit of Indian AI startups.
o Optimising and enhancing the performance of MeitY-related Assets, Programs, and schemes via capacity-building programs.
o Funding of AI startups through a pitching program to attract capital into MeitY-supported AI startups.
o Creating an awareness program and engaging a larger number of AI startups via effective and persuasive
media marketing programs.
The Future Design IndiaAI Scheme has significant potential to empower AI startups and foster the development and
availability of AI-enabled products and solutions in India and globally. By providing a comprehensive framework, the
scheme aims to create a conducive environment for AI innovation, entrepreneurship, and market expansion, driving
India's growth as a leading AI hub.
This working group has reviewed various existing AI-based curricula across the globe and has emphasised an AI research-based model curriculum framework involving K-12 interventions and graduate/postgraduate-level interventions. To realise the vision of "Make AI in India" and to keep pace with AI advancements, the Indian workforce must have the requisite skill sets. The working group has categorised its key recommendations as follows:
• Model Curriculum & Repository: A comprehensive AI curriculum is recommended, covering the fundamentals of AI, mathematics and statistics, machine learning, deep learning, NLP, computer vision, reinforcement learning, and AI ethics, along with real-world problems and continual learning.
• Framework: The framework categorises courses and programs as per their focus areas: technology-specific (algorithms, LLMs, etc.), infrastructure-specific (GPUs, specialised accelerators, Cloud, HPC, etc.), and application-specific (sectoral, domain, etc.), among others.
• Collaborative & Competitive ecosystem: Building a collaborative and competitive ecosystem among Schools,
Universities, and Research institutes through various AI-based government interventions.
• Research for Startups and MSMEs: To prioritise innovation among Indian startups and MSMEs, there is a need to
focus on AI-related research, encourage academic collaboration, and upskill the non-IT workforce.
• Research fellowships: Initiatives like research fellowships and grants for building research capability in tier 2, 3, and 4 institutions, supporting students at international AI conferences/journals, research mentorship for selected scholars, and encouraging students to collaborate on AI-related theses and datasets.
• Faculty training in AI: It is recommended that educational institutes be mandated to re-skill their teaching staff on new trends and research in the field of AI at least once every two years. Periodic incentives for faculty training can also be considered. Teaching staff should be encouraged to explore collaboration opportunities on AI research with industry through joint AI projects and industry internships.
• Career Path Mapping: University courses should be mapped to diverse career fields in AI, such as AI researcher, machine learning engineer, data scientist, AI architect, NLP engineer, computer vision engineer, AI product manager, AI ethicist, AI consultant, and AI entrepreneur, and students should be skilled in line with their preferred job roles.
• India-specific AI community: Creating an India-specific AI community is important to address national challenges,
to promote national data sharing, for effective use of AI compute infrastructure, and for collaborations among
researchers on research findings and results, etc. A national platform would help in building collaborations on
Made in India AI algorithms, Indian datasets, and AI computing, along with relevant training, and courses for
capacity building. The National platform would also be beneficial in posting real-world challenges for AI startups
along with financial aid for developing indigenous capability in diverse aspects of AI for India.
These recommendations aim to address the growing demand for AI-related skills and prepare the Indian workforce and
students to work on AI and other emerging technologies in the future. This Working Group’s vision for “A Transformative
Approach: From Job Takers to Job Providers” is to keep Indian human resources in AI educational and research
organisations up to date with the latest AI technologies and to make them competitive in the global AI job market.
The objective of this working group is to provide a comprehensive overview of the current state of AI computational
resources and their limitations in India. It also expands on the challenges and opportunities associated with intensifying
AI compute capacity and its potential impact on AI applications in reshaping India’s future. Following are some of the
actionable recommendations:
• Infrastructure and Compute Capacity: Establish best-in-class AI compute infrastructure at five locations, with a capacity of 3000 AI Petaflops, along with an Inference Farm (2500 AI PF) and Edge Compute (500 AI PF) systems. There is also a need to set up AI innovation hubs with Secure Distributed Data Grids (200/400 Gbps) across the country to support startups and build academic and industrial collaborations. Improving digital infrastructure and attracting private sector investment in AI infrastructure should be the priority.
• AI Use Cases: AI use cases should be prioritised in governance, agriculture, health, education, and finance. The aim is to support AI adoption in these domains by stakeholders such as government ministries, academia, research labs, and startups. Additionally, strategies will be developed to ensure inclusive AI, making technology accessible and beneficial to all, including specially-abled individuals and people in remote areas of the country.
• Evaluation and Impact Measurement: SPMIND and AIMIND framework implementations will enable infrastructure impact assessment for startups and AI innovation across the country. An evaluation framework metric will be developed to measure this impact. Additionally, model benchmarking and real-time data monitoring systems will be established to compare the performance of various AI models and to track the effectiveness of AI systems across sectors, providing valuable insights for improvement.
• Marketplaces and Open-source AI: To enhance AI accessibility, AI marketplaces will be leveraged to build service models for AI as a Service (AIaaS), Infrastructure as a Service (IaaS), and Platform as a Service (PaaS), offering pre-trained models to users. Open-source AI frameworks and libraries, supported by diverse vendors and communities, will be utilised to promote vendor-agnostic AI development, fostering innovation and collaboration.
• AI Membership Subscription and Repository: A nationwide AI membership subscription program will be launched for government employees and civil servants to promote AI literacy and skills development. A dedicated platform will be established to store pre-trained models, reducing redundant training costs and providing a model repository for others to utilise and fine-tune under various licensing options.
• Security and UI Challenges: There is a need to prioritise the vetting of AI developers to safeguard against data leakage and misuse of AI. This will also address UI challenges, ensure user-friendly and accessible AI solutions, and establish regulatory guidelines for data privacy, AI ethics, and interoperability standards.
• API Management: To address the issue of varying API costs and ensure effective cost management in AI marketplaces, transparent pricing models will be implemented, along with clear guidelines for users. This will promote fairness and transparency in pricing while enabling users to manage their costs efficiently (see the illustrative sketch after this list).
• Capacity Building and Collaboration: To drive AI readiness, the government will promote AI education, facilitate cross-sector collaborations, and enhance public awareness and trust in AI technologies through training programs in partnership with educational institutions and other public engagement initiatives.
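
As a hedged illustration of the transparent pricing guideline above, the sketch below computes a user's monthly marketplace bill from a published per-call rate card. The API names, rates, and free-tier allowance are invented for illustration and are not actual IndiaAI pricing.

```python
# Illustrative sketch of a transparent, metered pricing calculation for an
# AI marketplace. The rate card and free-tier allowance below are invented
# for illustration; actual marketplace pricing is yet to be defined.
RATE_CARD_INR_PER_1000_CALLS = {
    "translation-api": 40.0,
    "speech-to-text-api": 75.0,
    "vision-api": 60.0,
}
FREE_CALLS_PER_MONTH = 10_000  # hypothetical subsidised allowance

def monthly_bill(usage: dict[str, int]) -> float:
    """Compute a bill from per-API call counts against the published rates."""
    billable_total = 0.0
    free_remaining = FREE_CALLS_PER_MONTH
    for api, calls in sorted(usage.items()):
        covered = min(calls, free_remaining)   # apply the free allowance first
        free_remaining -= covered
        billable = calls - covered
        billable_total += billable / 1000 * RATE_CARD_INR_PER_1000_CALLS[api]
    return round(billable_total, 2)

# Example: 25,000 translation calls and 8,000 vision calls in one month.
print(monthly_bill({"translation-api": 25_000, "vision-api": 8_000}))  # 1080.0
```

Publishing the rate card and the billing rule itself, as here, is one way to meet the "clear guidelines for users" requirement: a user can reproduce their bill exactly from their own usage logs.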
The objective of the Group is to conceptualise the design for AI Compute; assess the requirements for technical capabilities, latency, and specifications; detail the scalability and flexibility of the infrastructure to meet evolving needs; and elaborate on the pricing models (including applicability, eligibility for subsidy, if any, etc.).
The FutureDesign Design Linked Incentive (DLI) Scheme aims to offer financial incentives as well as design infrastructure
support to domestic companies and start-ups/ MSMEs across various stages of design, development, and deployment of
semiconductor design(s) for Integrated Circuits (ICs), Chipsets, System on Chips (SoCs), Systems & IP Cores for AI over a
period of 5 years.
The objective of this exercise is to undertake a comprehensive study of the pillars of IndiaAI and to identify tangible action
items that need to be worked on to achieve the goal of “AI for all”. In line with our Hon’ble Prime Minister’s inclusive
development philosophy of ‘Sabka Saath, Sabka Vikas, and Sabka Prayas’, the recommendations in the report will go a
long way in harnessing the potential of AI for social development and inclusive growth.
The Reports of each of the Working Groups are detailed in the sections that follow.
WORKING GROUP 1:
CHAIRMAN:
MEMBERS:
INTRODUCTION AND CONTEXT SETTING
Recognising AI’s potential to catalyse large-scale socio-economic transformation and the need for India to build its
foundational, applied and translational AI capabilities, the Hon’ble Finance Minister, in her budget speech for 2023–2024,
announced the establishment of three Centres of Excellence (CoEs) for Artificial Intelligence in top educational institutions
in a hub and spoke model. These CoEs will focus on one or more socially and economically critical application sectors, such
as Governance, Healthcare, Agriculture, Manufacturing, FinTech, etc.
The objective of this initiative is to create a world-class AI ecosystem that pushes the frontiers of AI knowledge creation
and innovation in collaboration with industry, national and international academia, start-ups, etc. The CoEs will promote
foundational and applied research in AI to address critical challenges across sectors as well as help commercialise and
contextualise existing solutions. They will be responsible for setting the agenda and priorities of AI in the sector as well as
identifying the key challenges and opportunities in the field.
The network of CoEs forms an essential part of the larger IndiaAI initiative to position the country as a global leader in AI-related areas. It will provide intellectual impetus to the broad AI ecosystem by contributing to and benefitting from the
datasets created, the Data Management Framework, the computing infrastructure, the startup promotion schemes,
skilling efforts, etc., that are being built in parallel. Overall, it will foster a culture of creativity, experimentation, and entrepreneurship to unleash the full potential of AI in India and pave the way towards becoming a world leader in AI innovation.
OBJECTIVES
The functions of CoEs may include, but are not limited to, the following:
1. Foundational Research: The CoEs will focus on outcome-driven foundational research to support knowledge
generation in the field of AI. The CoEs will augment in-house research with partnerships with industry and
academia in the country, as well as develop global connects, participate in flagship conferences and events, etc.
The CoEs will also strive to undertake research in other constituent areas of AI such as Computer Vision, Robotics,
Machine Learning, NLP, etc. as needed.
2. Technology Development: The CoEs will focus on the identified core areas to develop and scale efficient solutions
to address challenges in the sector. Technology development will be done in partnership with industry partners,
government departments, MSMEs, NGOs, academic spoke institutions, and other relevant stakeholders.
3. Promoting Innovation & Entrepreneurship: The CoEs will partner with industry and the startup ecosystem to
facilitate the scaling of technology solutions and increase the penetration of AI solutions both domestically and
internationally to expand the range and depth of AI applications.
4. AI Skill Development: The CoEs will provide training and development opportunities to a new generation of
researchers, entrepreneurs, and practitioners of AI through in-person and online engagements, focussed
workshops, etc.
SCOPE
The CoEs seek to enable knowledge creation, knowledge exchange, and capacity building for a new generation of researchers, innovators, and entrepreneurs, along with promoting collaboration among Indian and global academia, industry, and research entities. This includes the following aspects:
Startups are the backbone of the Indian economy and contribute a lion’s share to India’s GDP and employment
opportunities. The proposed CoEs will play a key role in empowering startups and fostering entrepreneurship in the field
of AI to enable businesses of all sizes to benefit from this transformative technology. The CoEs will equip them with cutting-
edge AI technology, technical expertise, and training. The initiatives envisaged in this direction include:
1. Supporting industry-academia-government collaborative efforts as well as leveraging existing programmes to foster the start-up ecosystem.
2. Enhancing the commercial viability of applications through interactive discussions with industry and sectoral
ministries.
3. Providing handholding support to innovators and start-ups and helping them introduce new technologies within
their current facilities.
4. Developing and deploying products for the users as well as providing funding support to encourage such
development.
5. Organising innovation challenges and hackathons to boost the start-up culture and create competition among
start-ups to develop cost-effective and deployable solutions.
6. Establishing pre-incubators to facilitate pilot-scale research projects that can be commercialised by start-ups and
involve business/management schools in promoting entrepreneurship.
INTERNATIONAL COLLABORATION
To achieve the vision of becoming the global hub for AI research and innovation, India can leverage its position of
leadership in important multilateral forums to position India’s CoEs as global CoEs and to increase collaboration between
domestic and global AI experts. This could include initiatives to encourage global experts to spend time as visiting
researchers at the CoEs, facilitating the exchange of PhD students, collaborative research projects, joint solution
development, etc.
The network of CoEs can be proposed as Global Partnership on AI (GPAI) National AI Institutes to support GPAI's Expert Working Groups focused on interdisciplinary and applied AI research. This would augment the support provided by the current GPAI expert support centres in Montreal and Paris, which have expertise in Responsible AI and Data Governance, and in Future of Work and Innovation & Commercialization, respectively. The CoEs may also support other multilateral and bilateral partnerships on AI, such as the proposed AI-Engage partnership between the QUAD countries.
The CoEs will focus on the application of AI in areas with high potential impact, leveraging India's unique strengths, such as the availability of diverse data at scale, and focusing on India's socio-economic needs. Each CoE will focus on one or more application sectors based on its vision and strengths.
The potential application areas will include AI in Governance, Healthcare, Agriculture, Sustainability and Manufacturing,
among others. Going forward, aspects such as the availability of reliable data, domain expertise, and effective partnerships
will play a role in selecting future application sectors.
As an application area, governance holds tremendous potential given the strides made by India in this regard. The
Government of India has embarked on a massive effort to collect data on all aspects of basic governance in the country
and make it available for its responsible use. This makes governance a prime sector for AI interventions. Similarly, other
socially and economically relevant sectors such as healthcare, agriculture, sustainability, manufacturing, etc. stand at the
forefront of AI adoption, showcasing their potential to revolutionise industries and address critical challenges.
The proposed Centres of Excellence will be established at leading academic institutions. Each CoE will have one or more
relevant industries as partners. Strong collaboration with global academia, research entities, and industry in a hub and
spoke model is also expected from each CoE. To leverage the interdisciplinary nature of AI, active participation from
domestic and international experts in both academia and industry will be encouraged.
THE COES WILL BE SET UP ON THE BELOW PRINCIPLES:
1. The CoEs will be established in leading academic institutions selected based on the strength of proposed activities as
well as on their expertise in foundational and applied research, track record of success in identified sectors, and the
readiness to collaborate with other relevant entities.
2. The CoEs will foster a conducive environment to nurture aspiring entrepreneurs and equip them with the necessary
tools and knowledge through an active and robust incubation centre.
3. Appropriate models will be established for the faculty and other personnel of the academic institutions working in
and for the CoEs, safeguarding the interests of all parties.
4. Each CoE will be set up as a Section 8 company with its independent and autonomous governing structure to facilitate
agile decision-making.
5. The Government funding for CoEs will be available for 5 years initially, with the industry and other partners starting
to contribute from the third year.
6. The CoEs will aim for long-term self-sustainability by generating revenue from lab facilities, incubation, training,
research, products, and consulting services. Each CoE will be expected to be 50% self-supporting within 3 years of
their establishment, and 100% self-supporting within 5 years.
7. The CoEs will have a strong guidance framework consisting of a Governance Council, a PRSG (Project Review &
Steering Group), a Steering Committee, and other mechanisms as necessary.
8. Apart from the anchor academic institution and industry partners, each CoE will work with several spoke institutions
from academic/research institutions, industry and start-ups. The hub-and-spoke model will provide wider access to
resources economically, in addition to improving collaborations across different entities.
The CoEs will be set up in a Hub & Spoke model, with one CoE academic institution operating as the Hub and a diverse and vibrant set of spoke institutions working closely with the Hub. The spokes could consist of other academic or research institutions, start-ups, and industry players, and will be included with a well-defined role in the overall operations of the CoE. The objective of including spokes is to augment and complement the expertise available at the hub.
Each CoE will ideally start its operations with 4 or more spokes that provide supplementary and complementary strengths. Additional spokes may be added subsequently as needed, with the concurrence of the Governing Council of the CoEs.
The CoE hub institution will be responsible for setting the agenda and priorities of AI in the sector as well as identifying,
in collaboration with the spokes, the key challenges and opportunities in the field. The hub institution will take the overall
leadership of the CoE with the spoke institutions helping critically with different aspects. It will coordinate the activities
of the CoE overall and be the primary interface with the governance structure of the CoEs. The hub will also play a crucial
role in managing the finances of the CoE.
Each of the spoke institutions will sign an MoU with the respective CoE to detail their roles and responsibilities, details of
the activities, deliverables, funding etc. The spokes may play different roles within the structure and goals of the CoE
ranging from participation in technology development, translation, user studies, field trials, training, etc., based on the
needs of the activity and the expertise of the spoke institution. The spokes will be responsible for engaging with the users
and implementing the solutions developed by the CoE as well as bringing the sectoral expertise that can inform AI
solutions.
IMPLEMENTATION MECHANISM
The CoEs will be set up as independent Section 8 companies for effective and agile governance. Each CoE will be anchored
at the Hub academic institution and will have Spokes that may include other academic or research institutions and industry
representatives. These CoEs will serve as centres for research, innovation, entrepreneurship, and state-of-the-art
technology infrastructure. They will act as technology resource centres that encourage interdisciplinary research as well
as translate cutting-edge research into practical use cases.
A 2-step process is envisaged to select the CoE institutions, based on the strength of their proposals. The overview of the
process is given in Annexure A.
In the first step, qualifying academic institutions will be invited to submit a short concept proposal – no more than 5
pages in length – on the broad outline of the CoE they have in mind. The concept proposal should indicate the
capability, track record, and capacity of the institution in different aspects of AI: foundational research, global
connections, application development, industry engagements, innovation promotion, incubation, skill development,
etc. It should also outline the plans for the proposed CoE along these dimensions and indicate potential industry
partners and spoke institutions.
A committee of experts constituted by MeitY with members from academia, industry, and research institutions will
scrutinise all concept proposals and rank them based on their suitability to the intended goals. The top 10-12 concept
proposals will be selected, and the shortlisted institutions will be invited to submit a detailed proposal within two
months.
In the second step, the shortlisted institutions or teams of institutions will submit detailed proposals on their vision for the CoE and detailed plans to achieve it. The proposal must identify academic and industry partners, spoke institutions, etc. Details of the possible foundational problems and the problems in the application sectors to be addressed by the CoE must also be spelt out in the proposal. The key dimensions of evaluation for step 2 of the process will broadly include:
• Academic experience, record of successful translational research, and impactful real-world outcomes of the institution and its partners.
• Quality and impact of the proposed research, applications in sectoral areas, and the readiness of the institution
in carrying them out.
• Potential national and international impact on the AI ecosystem.
Detailed parameters to be used for the selection of the CoE hub institution at Step 1 are outlined in Annexure B along with
the evaluation criteria for the same. Similarly, the selection parameters and evaluation criteria for Step 2 are outlined in
Annexure C.
The performance of the CoEs will be evaluated regularly by mapping their performance to the deliverables and timelines
as well as the identified set of KPIs/Critical Success Factors to ensure that the objective of the CoEs is achieved as
envisioned. The evaluation parameters and scoring mechanism for periodic monitoring are outlined in Annexure D. The
overall principles include:
1. Excellence and Impact of Research: CoEs will be assessed on their ability to generate high-quality research
outputs in core AI and application areas.
2. Impactful Technology Development: CoEs will be evaluated on their ability and success in developing relevant
innovative technologies and solutions that address real-world challenges in the identified sectors.
3. Revenue Generation: CoEs will be evaluated on the revenue generated from their technologies, IPRs etc. The
revenue stream will ideally be established within a year.
4. Fostering Innovation: CoEs will be evaluated on their contribution to nurturing entrepreneurship, facilitating
collaboration with industry partners, and promoting an ecosystem that promotes innovation.
5. Collaboration and Knowledge Sharing: CoEs will be assessed on their ability to foster collaboration and
knowledge sharing with stakeholders such as research institutions, industry partners, etc.
6. Capacity Building: CoEs will be evaluated on their efforts to nurture human capital and build a skilled workforce
in the field of AI.
GOVERNANCE MECHANISM
Each CoE will be set up with an independent and autonomous governing structure to facilitate agile decision-making. The
statutory board of these companies will include the head of the host academic institution as its Chair with members drawn
from academia, industry, and government.
GOVERNING COUNCIL (GC) AND PROJECT REVIEW & STEERING GROUP (PRSG)
A Governing Council (GC) and a Project Review & Steering Group (PRSG) will be constituted to set policies and oversee the
operations of CoEs periodically. The composition of the GC and the PRSG will be decided keeping in mind the objectives
and aspirations of the initiative.
A common Governing Council is proposed to provide high-level direction to the network of CoEs and to establish the required policies under which they operate. The GC will be chaired by the Secretary, MeitY, with the CEO, IndiaAI, serving as its Secretary. Other members will include the heads of the Hub institutions, senior officials from Science and Technology, and industry and academic experts. The GC will meet with the PRSG and the CoE leadership at least once every 6 months. The Project Review & Steering Group will provide more detailed guidance to the CoEs. The PRSG will periodically evaluate the progress of the individual CoEs. It will be chaired by the CEO of IndiaAI and will comprise officials from different
government departments, academia, and industry.
The CoEs will also have individual internal mechanisms to oversee and monitor the activities undertaken by the institution.
Each CoE will have an Executive Council with members from the Section 8 company led by a Chief Executive Officer (CEO)
along with other partners and outside experts. Further, the technical team for each CoE will include AI programmers,
engineers, domain experts, program managers, designers, etc. The foundational and applied research in each of the CoEs
will be carried out by faculty members from academic institutions involved, post-doctoral researchers, PhD students, and
other research manpower like engineers and interns.
To ensure the sustainability of the CoEs, robust collaboration with industry, and the creation of commercially viable outputs, MeitY will not own any IP generated by the CoEs. The IP generated can be retained by the idea generator or by the CoE. Based on the nature of the innovation, two IPR models may be considered:
1. Licensing agreements: CoEs can license their technology and intellectual property to industry partners for
commercialization. In these agreements, the CoE retains ownership of the intellectual property but grants the
industry partner the right to use it for a specific purpose or period of time. The terms of the licensing agreement,
including payment and royalties, can be negotiated between the CoE and the industry partner. This may be more
suitable for early-stage innovations by student entrepreneurs/start-ups.
2. Joint ventures: CoEs can form joint ventures with industry partners to develop and commercialise new AI
technologies and solutions. In these partnerships, both parties contribute funding, resources, and expertise, and
share ownership of the resulting intellectual property. The terms of the joint venture are typically negotiated in
a joint venture agreement that outlines the responsibilities, rights, and obligations of each party. This may be
more suitable for collaboration with existing industry partners for scaling AI innovations.
The exact nature of the IPR model may be determined on a case-to-case basis in consultation with the Governing Council.
SUSTAINABILITY MODEL [1]
The CoEs will be required to start generating revenue so as to attain sustainability after the initial 5 years of funding support. They can do so in the following ways:
1. Commercialisation of IPR - The CoEs will be able to enter into licensing agreements/JVs with innovators, enabling them to retain ownership while granting the industry partner the right to use the IP for a specific purpose or period of time. This will enable the CoEs to earn royalties on the IPR.
2. Solutions/Technologies Developed - The CoEs will be able to provide the solutions/technologies developed in
the CoEs to the relevant industry/start-up/MSME partners for a one-time fee or a subscription fee. CoEs may also
deploy solutions themselves.
3. Consultancy - The CoEs will be able to provide consultancy and mentorship support to MSMEs to equip them with the latest AI technology; a nominal fee may be charged for such services.
4. Capacity Building Events - The CoEs will be able to organise events such as Hackathons, boot camp seminars,
workshops, and investor connects for outreach and to attract better applications for the CoE. State and/or
national-level events will be organised to showcase the various advancements and achievements of the CoE.
5. Training programs and workshops- The CoEs will be able to provide training programs and workshops to start-
ups, industry partners and MSMEs to learn and upskill themselves on the latest technological advancements in
the AI and allied sectors.
6. Industrial Projects - The state-of-the-art infrastructure may be utilised for developing projects from scratch. The development of such products/services may be done by the CoE team in collaboration with industry partners.
[1] https://community.nasscom.in/communities/industry-40/india-industry-40-adoption-case-mature-manufacturing-digitalization-20
ANNEXURE A: OVERALL SELECTION PROCESS
The selection of Centres of Excellence (CoEs) in Artificial Intelligence is a critical step towards realizing the vision of the
India AI initiative. This 2-step selection strategy has been carefully crafted to ensure a rigorous, transparent, and objective
process in identifying the institutions best suited to establish world-class AI ecosystems. The goal is to create a
transformative environment for cutting-edge research, innovation, and collaboration with industry, academia, and
startups.
1. Expression of Interest (EOI) Call: Issue a public Expression of Interest (EOI) call inviting leading academic
institutions to express their interest in hosting a CoE. The EOI should include broad guidelines, objectives, and
the application process.
2. Evaluation of EOIs: A committee of experts from academia, industry, and government should evaluate the
received EOIs. The evaluation should focus on the capability, track record, and capacity of the institutions in
different aspects of AI, as mentioned in the EOI guidelines.
3. Shortlisting of Institutions: Based on the evaluation of EOIs, shortlist the top 10-12 academic institutions that
show the most potential to host a CoE. The shortlisted institutions will be invited to submit detailed proposals.
4. Detailed Proposal Submission: The shortlisted institutions will be given a specific timeframe (e.g., two months)
to submit a detailed proposal outlining their vision for the CoE and how they plan to achieve the objectives. The
proposal should include plans for research, innovation, industry collaboration, technology transfer, sustainability,
and outreach.
5. Expert Review of Detailed Proposals: An empowered committee comprising AI experts, industry representatives,
and government officials should review the detailed proposals. The evaluation should assess the institutions'
academic experience, research quality, and potential national and international impact on the AI ecosystem.
6. Presentation and Q&A: The shortlisted institutions should make presentations to the evaluation committee to
elaborate on their proposals and respond to any questions or clarifications sought by the committee.
7. Scoring and Ranking: Each proposal should be scored based on the evaluation criteria and guidelines provided
to the evaluation committee. The scores should be used to rank the proposals.
8. Selection and Announcement: Based on the rankings and evaluation scores, the top three institutions should be
selected to host the CoEs. The selection should be announced publicly, and the chosen institutions should be
informed officially.
9. Signing of MoUs: After the selection, the Ministry of Electronics and Information Technology (MeitY) should enter
into Memorandums of Understanding (MoUs) with the selected institutions, clearly outlining the roles,
responsibilities, funding, and governance structure.
10. Monitoring and Periodic evaluation: Regularly monitor the progress of the selected CoEs through periodic
reviews and evaluations. Hold the CoEs accountable for achieving their proposed objectives and delivering
impactful outcomes. Conduct comprehensive evaluations after specific periods to assess their achievements,
impact, and adherence to the original proposals and objectives. The evaluation parameters listed in Annexure D
are subject to revision and change every 2 years, depending on the progress of the mission and future context.
ANNEXURE B: TEMPLATE FOR SUBMISSION OF SHORT CONCEPT PROPOSAL
A. Organisation Details
● Computing resources: Specify the available computing resources for AI research, including the number of servers, the number of GPUs, and other relevant hardware.
● Laboratories: Provide information about the number and details of research laboratories or facilities available for
AI research within the institution.
● Data storage: Provide information about the capacity and availability of data storage systems for AI research.
● Software licences: Mention any specialised software licences or tools available that support AI research activities.
● Access to datasets: Mention whether the institution has access to high-quality datasets that can be used for AI
research. Provide details of the same.
● Incubation centre:
○ Indicate whether the institution has an active and functioning incubation centre focused on fostering
entrepreneurship and supporting startups in AI-related fields:
○ Provide an overview of the activities undertaken by the incubation centre in the past 3 years:
C. Expertise and Personnel
D. Research Capacity
E. Collaboration Potential
S.No   Parameter   Number   Details (Stakeholders, objective, outcome, impact, length of collaboration)
PART II - SHORT CONCEPT PROPOSAL
A. How does the establishment of the AI-CoE align with the larger goals of your academic institution? (500 words)
B. Provide an overview of the unique advantages that your academic institution offers to the establishment of the
AI-CoE (500 words)
C. What is the potential sectoral focus envisioned for the CoE and what makes your academic institution a good fit
for the identified sector? (500 words)
The evaluation parameters for Step I are outlined in the Table below:
S.No   Evaluation Category      Evaluation Parameters                                                                              Weightage
2      Expertise & Personnel    Track record and achievements of faculty and staff in AI research, publications, or collaborations   20%
4      Research Capacity        Evaluation of the quality and impact of ongoing and completed projects showcasing AI research        20%
ANNEXURE C: TEMPLATE FOR SUBMISSION OF DETAILED PROPOSAL
○ Provide an overview of the sectoral focus as well as the research priorities within the chosen sector(s).
Explain the significance and relevance of these research priorities in addressing real-world challenges and
opportunities. Highlight how these priorities align with the broader goals of the CoE and contribute to
advancements in AI technology and applications.
○ Describe the institute’s efforts to foster innovation and entrepreneurship in the field of AI. Highlight
initiatives, programs, and support mechanisms put in place to promote the development and growth of AI
startups. The following aspects should be covered:
■ Number of incubated startups in the last 5 years - List the names of AI startups that have been
incubated by the institute in the past five years. Include information about their respective sectors
and websites to showcase the diversity and scope of the startups supported by the institute.
■ Average amount of funding raised by start-ups - Include information on the total amount of
funding raised by AI startups that received support from the institute. Highlight any success stories
of startups that secured substantial funding from investors.
■ Placement rate in the past 5 years - Provide information on the placement rate of students and
researchers who have been associated with the institute in leading AI companies or research
institutions. This may include data on job placements, internships, or further academic pursuits.
○ Describe the capacity-building programs and initiatives implemented by the institution to enhance the
skills and expertise of its stakeholders in AI research and innovation in the last 5 years.
○ Describe different proposed activities the CoE will be undertaking to build the capacity of stakeholders
in the next 5 years and the proposed impact of the same.
○ List the institutions that the CoE seeks to identify as "spoke" institutions collaborating with the CoE.
These institutions may include universities, research centres, or industry partners that play a key role in
supporting the CoE's research priorities and initiatives in the AI sector.
○ Given that the CoEs will be required to start generating revenue to attain sustainability after the initial 5 years of funding support, outline the proposed sustainability model for the CoE to ensure its long-term viability. This model should include funding sources, revenue generation strategies, partnerships with industry, and the proposed half-yearly/yearly targets to attain sustainability.
● Quarterly and yearly milestones
○ Provide a list of the key milestones planned for the CoE on a quarterly and yearly basis. These milestones
should align with the sectoral focus and research priorities, as well as indicate the progress and
achievements of the CoE over time. They may include activities such as research projects, publications, collaborations, capacity building, and entrepreneurship activities that the CoE plans to undertake.
The evaluation parameters for Step II are outlined in the Table below:

S.No   Evaluation Category                       Evaluation Parameters
1      Sectoral focus and research priorities    Assessment of the relevance of the sectoral focus and research priorities in the identified application sector(s)
4      Proposed Sustainability Model             Assessment of the clarity of funding sources and the revenue generation strategy for long-term viability, including proposed targets
6      Quarterly and yearly milestones           Alignment of milestones with the sectoral research priorities and tangible progress and achievements of the CoE over time
For the selection of the Centres of Excellence (CoEs) in Artificial Intelligence, a rigorous and transparent process should be followed to ensure that the chosen institutions are best suited to fulfil the objectives of the India AI vision.
The value of “x” in the table below is 0 for the first two years, 5 for the next two years, and 10 from the fifth year onward.
S.No   Main Metric                                    Sub-Metric   Description   Score   Weightage
7      Revenue Generation and Self-Sustainability                                        (10+2x)%
The scoring system will be on a scale of 1 to 100, where 1 represents the lowest performance and 100 represents the
highest performance for each sub-metric. The overall scores for the main metrics will be calculated based on the average
scores of their corresponding sub-metrics.
Scoring Guide:
A. Score
● 21-40: Below average performance. Minor evidence of achievement, but significant improvements are needed.
● 41-60: Average performance. Some achievements in the sub-metric, but further enhancements are required.
Note: The evaluators will assign scores based on the evidence provided by the Center of Excellence in each sub-metric.
The individual metrics have been assigned weights in sync with the vision.
The overall scores for the main metrics will be calculated as the sum of the products of individual metrics and weightage.
Example: Overall Score = 85
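
To make the calculation concrete, the sketch below computes an overall score as the weighted sum described above. The 1-100 scale, the sum-of-products rule, and the (10+2x)% weightage for Revenue Generation and Self-Sustainability come from this annexure; the other metric names, their weights, and the scores themselves are hypothetical placeholders.

```python
# Sketch of the CoE overall-score calculation described above. Scores are on
# a 1-100 scale; the overall score is the sum of (score x weightage) over the
# main metrics. Only the (10 + 2x)% weight for Revenue Generation and
# Self-Sustainability is taken from the report, with x = 0, 5, or 10
# depending on the CoE's age; everything else is a hypothetical placeholder.
def x_for_year(year: int) -> int:
    """x is 0 for years 1-2, 5 for years 3-4, and 10 from year 5 onward."""
    if year <= 2:
        return 0
    if year <= 4:
        return 5
    return 10

def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of main-metric scores (weights expressed as fractions)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must total 100%"
    return sum(scores[m] * weights[m] for m in scores)

year = 5
x = x_for_year(year)                     # year 5 onward -> x = 10
weights = {
    "Revenue Generation and Self-Sustainability": (10 + 2 * x) / 100,  # 30%
}
# The remaining weight is split evenly across five placeholder metrics so
# that the total stays at 100%; the real table assigns its own weights.
remaining = 1.0 - weights["Revenue Generation and Self-Sustainability"]
for metric in ["Research Excellence", "Technology Development",
               "Innovation", "Collaboration", "Capacity Building"]:
    weights[metric] = remaining / 5

scores = {m: 80.0 for m in weights}      # hypothetical sub-metric averages
scores["Revenue Generation and Self-Sustainability"] = 90.0

print(round(overall_score(scores, weights), 1))  # 83.0
```

Note how the revenue metric's influence grows with the CoE's age: at year 1 it carries 10% of the overall score, while from year 5 onward it carries 30%, matching the expectation that CoEs become self-supporting over time.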
WORKING GROUP 2 :
CHAIRMAN:
MEMBERS:
OBJECTIVE
The objective of this report is to outline the key goals and objectives of the India Datasets Platform (IDP) and provide a
comprehensive understanding of its purpose, functionality, and potential benefits. This document highlights IDP’s role in
framing a process of identifying and providing access to potential datasets for data-driven governance as well as to
catalyze a robust data and AI-based startup and research innovation ecosystem. It will serve as a guiding framework for
the development, implementation, and utilization of the IDP, ensuring that all stakeholders have a common understanding
of its objectives and the desired outcomes it seeks to achieve.
Datasets are structured collections of data that are organized and presented in a specific format or structure. They consist
of individual data points or records that are related to a particular subject or domain. Datasets can include various types
of data, such as numerical, textual, categorical, or multimedia data. Government departments or agencies serve as data
providers and upload datasets onto the data exchange platform. Research organisations and private entities may also
contribute datasets to the platform. These datasets contain valuable information collected by the government and other
entities as expanded under Section H, such as demographic data, economic indicators, environmental data, or public
health statistics. Datasets act as the core offering of data providers and form the basis for data sharing and utilization.
Datasets are subject to data governance principles and guidelines. Data governance ensures that datasets are managed,
stored, and shared in a secure and compliant manner. It establishes rules for data quality, privacy protection, access
control, and metadata management, ensuring the integrity and usability of datasets within the data exchange platform.
Datasets form the core building blocks of the IDP, serving as the foundation for dataset sharing, analysis, and monetization.
They play a crucial role in enabling collaboration between dataset providers and consumers, driving innovation, and
unlocking the value of data for societal benefits.
The IDP will play a crucial role in boosting the AI ecosystem in India by providing a robust foundation for data-driven
innovation and development. Here's how IDP may contribute to the growth of the AI ecosystem:
IDP plays a pivotal role in creating an ecosystem where AI researchers, start-ups, and organizations can access high-quality
datasets, collaborate, and leverage data-driven approaches to drive AI innovation and development in India.
GLOSSARY
Data Exchange: A platform that facilitates the sharing and exchange of data between data providers and data consumers.
Data Provider: A government department or agency that uploads and shares data on the data exchange platform for consumption by data consumers.
Data Consumer: An entity, such as a research institution or startup, that utilizes data from the data exchange platform for applications, innovation, or research purposes.
API: Application Programming Interface. A set of rules and protocols that allows software applications to communicate and access data from the data exchange platform.
Dataset: A collection of related data provided by data providers, which is made available for consumption by data consumers on the data exchange platform.
Data Governance: The overall management, control, and protection of data, including policies, procedures, and practices related to data sharing, security, privacy, and compliance.
Metadata: Descriptive information about a dataset, including its structure, format, source, and other relevant details that provide context and facilitate data discovery and understanding.
Data Security: Measures and practices implemented to protect data from unauthorized access, breaches, and other security threats, ensuring the confidentiality, integrity, and availability of data.
Data Monetization: The process of generating revenue or economic value from data assets, typically through selling data, providing data-related services, or leveraging data for business opportunities.
Governance Framework: A set of guidelines, rules, and principles that govern the management, use, and sharing of data within the data exchange platform, ensuring compliance and accountability.
EXECUTIVE SUMMARY
Datasets from government ministries and departments play a fundamental role in artificial intelligence (AI) as they serve
as the primary building blocks for training, validating, and improving AI models. Datasets provide the necessary
information for AI algorithms to learn patterns, correlations, and relationships within the data, enabling models to make
accurate predictions or decisions when presented with new, unseen data. They are particularly vital in supervised learning,
where labelled examples help AI models understand the desired outputs, and unsupervised learning, where datasets
enable models to discover hidden patterns and structures within the data. Additionally, datasets are essential for
validating and testing AI models, ensuring their performance and generalization capabilities. Continuous learning and
improvement of AI models also rely on updated datasets. Therefore, acquiring and utilizing high-quality datasets are
critical steps in developing effective AI solutions that can drive innovation and solve complex problems across various
domains. The IDP Working Group, chaired by Mr. Amitabh Nag, CEO of Digital India Bhashini Division, was set up with a mandate to develop the blueprint for IDP's design, architecture and working models.
In a world inundated with information, data provides valuable insights that help individuals, organizations, and
governments make well-informed choices. From market trends and consumer preferences to economic indicators and
social patterns, data analysis allows us to identify opportunities, mitigate risks, and optimize outcomes.
Data fuels emerging technologies like AI, enabling them to improve efficiency, enhance personalization, and create intelligent systems that revolutionize industries and enhance user experiences. This can be achieved by forming datasets from the data already available within an organization.
In the realm of healthcare and research, data plays a crucial role in driving progress and improving patient care. Health
records, clinical trials, and medical research data provide valuable insights that aid healthcare professionals in diagnosing
diseases, identifying patterns, and developing effective treatments.
• Leveraging data in healthcare systems allows tracking of epidemics, identifying risk factors, and finding innovative
solutions to health challenges.
• Datasets have the potential to foster social and economic development.
• Governments can use datasets to identify societal trends, formulate evidence-based policies, and address social
issues.
• Dataset-driven decision-making leads to better resource allocation, improved governance, and effective
solutions.
• Datasets have become an indispensable asset, empowering informed decision-making, and driving technological
advancements.
• Responsible and ethical use of data is crucial for shaping a better future for individuals, organizations, and society
as a whole.
The digital era has presented governments worldwide with a remarkable opportunity for the transformation of data in
governance. Data has become a valuable asset that, when effectively harnessed, can revolutionize decision-making, public
service delivery, and overall governance practices. Governments now have access to vast amounts of data generated by
citizens, businesses, and various public systems, offering unprecedented insights and possibilities for improvement. By
leveraging data analytics, governments can extract meaningful information, identify patterns, and gain valuable insights
into the needs and preferences of their constituents. This data-driven approach allows for evidence-based policymaking,
enabling governments to address societal challenges more effectively and efficiently. For instance, data analysis can aid
in identifying areas of high demand for public services, optimizing resource allocation, and improving service delivery.
Government Data can empower citizens, fostering a participatory and inclusive governance environment. By making data
accessible to the relevant stakeholders, governments can promote transparency, accountability, and citizen engagement.
Data initiatives provide opportunities for collaboration, innovation, and informed decision-making by citizens, businesses,
researchers, and non-governmental organizations. Citizens can actively contribute to governance processes, provide
feedback, and co-create solutions based on data-driven insights. The transformation of data in government also opens
avenues for improving the overall quality of public services. Through predictive analytics and machine learning,
governments can anticipate and prevent issues before they arise. For example, data analysis can identify potential
infrastructure failures, anticipate disease outbreaks, or detect fraudulent activities, enabling proactive interventions and
resource optimization.
Harnessing the full potential of data in government requires several considerations. Governments must establish robust
data governance frameworks, including privacy protection, security measures, and ethical guidelines. Effective data
management practices ensure that data is collected, stored, and used in a responsible and secure manner, respecting
privacy rights, and safeguarding against misuse. The opportunity for the transformation of data in government is immense.
By embracing data-driven governance practices, governments can enhance decision-making, increase transparency, and
engage citizens more effectively. The utilization of data analytics, open data initiatives, and advanced technologies
empower governments to address societal challenges and deliver improved public services. As governments continue to
explore and invest in data-driven approaches, the potential for positive and transformative change in governance is
significant.
The implementation of the India Datasets Platform (IDP) presents a significant opportunity for government departments to unlock the value of their data and leverage it for evidence-based decision-making, improved governance, and efficient service delivery.
“India Datasets Platform (IDP) is conceptualised as a platform which makes all available datasets discoverable and usable for research and start-ups, with role-based access.”
Identifying potential data that can be used for national benefits requires a systematic approach. By following the outlined
steps, the platform can strategically gather diverse datasets from government ministries, research institutions, and private
organizations, fostering collaboration and unlocking valuable insights through emerging technologies like AI and data
analytics. Here are some steps that a government department can take to identify such data.
• Clearly articulate the objectives and priorities of the government department. Identify the areas where data can
play a significant role in addressing societal challenges, improving public services, or supporting policy
formulation.
• Engage with internal and external stakeholders to understand their data needs, challenges, and potential data
sources. This can involve consultations with other government departments, industry experts, research
institutions, and citizens.
• Conduct an inventory of the data already available within the department. Evaluate the quality, relevance, and
potential utility of the data for addressing the identified objectives. Consider both structured data (e.g.,
databases, spreadsheets) and unstructured data (e.g., documents, text, images).
• Look beyond the department's internal data holdings and explore external sources of data that could be relevant
to the objectives. This can include data collected by other government agencies, research institutions, private
organizations, or even publicly available data.
• Foster collaboration with other government departments and agencies to identify and share data that can be
mutually beneficial. Establish data-sharing agreements and protocols to facilitate the exchange of data while
ensuring privacy and security.
• Stay updated on emerging technologies, such as AI, machine learning, and data analytics, that can unlock insights
from large and complex datasets. These technologies can help identify patterns, correlations, and trends in data
that may not be immediately apparent.
• Ensure that any data collection and usage adhere to privacy regulations and ethical guidelines. Protecting the
privacy of individuals and sensitive information is crucial in handling data for national benefits.
• Conduct pilot projects or proof of concepts to test the feasibility and value of using specific datasets for national
benefits. These small-scale initiatives can provide insights into the potential impact and challenges associated
with utilizing the data.
• Regularly evaluate the effectiveness and impact of using data for national benefits. Monitor progress, gather
feedback from stakeholders, and make necessary adjustments to data strategies and priorities.
As standard data management practice recognises, data that is managed through proper channels and segmentation is more valuable. IDP is conceptualised on a federated approach. In a federated data approach, data remains distributed across multiple independent systems or organizations. Each system retains control and ownership over its own data, and data is accessed and queried in a decentralized manner while enabling collaborative analysis across different ministry and department datasets. In this approach, there is no central data repository or consolidation of data. Instead, when a query is executed, it is sent to the relevant systems that hold the data being queried. These systems independently process the query and return the results to the user.
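To illustrate the mechanics described above, here is a minimal sketch of how such a query fan-out might work; the node names, endpoints and payloads are hypothetical illustrations, not part of any IDP specification.

```python
import concurrent.futures

# Hypothetical registry of independent ministry/department data systems;
# the endpoints below are placeholders, not real IDP URLs.
FEDERATED_NODES = {
    "health":    "https://data.health.example.in/query",
    "transport": "https://data.transport.example.in/query",
}

def query_node(name: str, endpoint: str, query: dict) -> dict:
    """Each node processes the query against its own store and returns results."""
    # Placeholder: a real node would authenticate the caller and execute the query.
    return {"node": name, "endpoint": endpoint, "rows": []}

def federated_query(query: dict) -> list[dict]:
    """Fan the query out to all relevant nodes in parallel; no central copy of the data exists."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_node, name, ep, query)
                   for name, ep in FEDERATED_NODES.items()]
        return [f.result() for f in futures]

results = federated_query({"indicator": "monthly_totals", "year": 2023})
```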
The federated data approach allows organizations to maintain autonomy and control over their data (managed by the DMO) while enabling collaborative analysis across different ministry datasets. However, it can be challenging to ensure data consistency, security, and coordination across multiple systems. In IDP's federated structure, the following data activities can be carried out easily:
• Datasets Catalog: A centralized repository or inventory that provides a comprehensive view of available datasets
within an organization. It includes metadata about the datasets, such as descriptions, attributes, data sources,
and access permissions.
• Smart Contract: It provides a secure and transparent mechanism for data transactions, ensuring trust and
authenticity in data exchange between providers and consumers. These self-executing contracts enable
automated agreement enforcement, streamline processes, and enhance data governance within the IDP
ecosystem.
• Datasets Discovery: The process of locating and exploring relevant datasets within a data catalogue or data
repository. It involves searching, filtering, and evaluating datasets to find the most suitable ones for specific
purposes.
• Data Lineage: The ability to track and trace the origin, transformations, and movement of data throughout its
lifecycle. Data lineage helps in understanding data dependencies, ensuring data quality, and complying with
regulatory requirements.
• Datasets Access Control: The mechanisms and policies that control who can access, modify, or view specific
datasets. It involves defining user roles, permissions, and authentication mechanisms to protect sensitive or
restricted data.
• Datasets Versioning: The practice of maintaining different versions of a dataset to track changes over time. It
allows users to refer to specific versions of a dataset and compare differences between versions.
• Datasets Collaboration: The process of enabling multiple users or teams to collaborate and work together on
shared datasets. It involves providing mechanisms for data sharing, data synchronization, and collaborative data
analysis.
• Datasets Privacy: The protection of personal or sensitive data to ensure compliance with privacy regulations and
ethical considerations. It involves implementing measures like anonymization, pseudonymization, or data
masking techniques to safeguard individual privacy.
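As a rough illustration of several of the activities listed above (cataloguing, versioning, lineage and access control), the following sketch shows one way a dataset record might tie them together; the fields and values are assumptions for illustration, not a prescribed IDP schema.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    dataset_id: str
    version: str                                           # datasets versioning
    lineage: list[str] = field(default_factory=list)       # origin and transformations
    allowed_roles: set[str] = field(default_factory=set)   # datasets access control

    def can_access(self, role: str) -> bool:
        """Role-based access check against the record's permitted roles."""
        return role in self.allowed_roles

# Hypothetical catalogue entry for one dataset.
record = DatasetRecord(
    dataset_id="rainfall-district-monthly",
    version="2.1.0",
    lineage=["raw gauge readings", "cleaned 2023-06", "aggregated to district"],
    allowed_roles={"registered", "restricted-approved"},
)
print(record.can_access("registered"))  # True
```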
IDP enables input (i.e. sharing) and output (i.e. accessing) of data in supported file formats (like CSV, JSON and XML) and
via API. In other words, Data Providers can share data on IDP either by uploading a file(s) or via API, while Data Consumers
can access data on IDP either by downloading a file(s) or via API. Further, Data Providers can specify whether a shared
dataset can be accessed by Data Consumers as a downloadable file(s) and/or via API.
The preferred mode of exchanging data on IDP will be via APIs. However, the platform will also make uploaded dataset files available to Data Consumers via APIs.
PRICING MODEL
A pricing model in IDP will play a crucial role in determining the appropriate pricing for a product or service. IDP will
implement a comprehensive pricing and payment management module. This Payment module aims to empower Data
Providers to specify the pricing details for the datasets they share on the platform, enabling a seamless and transparent
transaction process with Data Consumers.
Features of the Pricing and Payment Management Module
The pricing and payment management module will provide Data Providers with the ability to set the price for the datasets
they share on IDP. Data Providers can determine the value of their data products based on various factors, such as the
dataset's uniqueness, relevance, and demand.
The module will offer flexibility in pricing options, allowing Data Providers to choose between different pricing models.
They can set a fixed price for a dataset for a specific period, a one-time payment for a certain number of accesses, or even
offer subscription-based pricing models.
• Terms of Payment:
Data Providers will have the freedom to define the terms of payment associated with accessing and consuming their
datasets. They can specify whether the payment is a one-time transaction, periodic subscription, or based on the number
of accesses.
Once the Data Provider specifies the pricing details, the module will facilitate a streamlined payment process for Data
Consumers. After selecting the desired dataset, Data Consumers will be guided through a secure and efficient payment
gateway to complete the transaction.
Data Consumers can access the dataset concerned immediately after completing the payment process as specified by the
Data Provider. This ensures a seamless experience for users and encourages more data transactions on the platform.
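The sketch below illustrates how the pricing options described above might be represented; the models, rates and function names are hypothetical.

```python
from enum import Enum

# Hypothetical pricing options mirroring those described above.
class PricingModel(Enum):
    FIXED_PERIOD = "fixed price for a specific period"
    PER_ACCESS = "one-time payment for a certain number of accesses"
    SUBSCRIPTION = "recurring subscription"

def amount_due(model: PricingModel, base_price: float,
               accesses: int = 0, months: int = 0) -> float:
    """Compute what a Data Consumer owes under each model (illustrative rates)."""
    if model is PricingModel.FIXED_PERIOD:
        return base_price                # flat fee for the agreed period
    if model is PricingModel.PER_ACCESS:
        return base_price * accesses     # priced per access
    return base_price * months           # monthly subscription

print(amount_due(PricingModel.PER_ACCESS, base_price=5.0, accesses=100))  # 500.0
```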
By building an interoperable, robust, and secure digital platform, the IDP strives to foster the growth of the data-driven digital economy and governance systems. The IDP will be developed using modular software components based on a microservices architecture, incorporating open-source tools and technologies. Through its robust federated framework, the IDP aims to promote the efficient utilization of data for various purposes that address citizen-centric problems and research.
Features of IDP are:
• Open-Source Architecture & Codebase: The IDP is built on an open-source architecture, ensuring transparency,
flexibility, and community collaboration. The use of open-source codebase promotes accessibility, allowing
developers to contribute, customize, and enhance the platform. This approach fosters innovation, encourages
collaboration, and enables continuous improvement of the IDP.
• Dataset Catalogue & Standards for Integration: The IDP establishes data registries and standards that enable
seamless integration of various data types shared by both government and private entities. These registries and
standards ensure data consistency, quality, and compatibility, allowing for effective data integration and analysis.
By providing a unified platform for diverse datasets, the IDP promotes data-driven decision-making and enables
innovative solutions across sectors.
• Data Exchange Ecosystem: The IDP fosters a robust data exchange ecosystem by facilitating secure and
standardized data sharing between different stakeholders. It promotes collaborations and partnerships, allowing
data contributors to share their datasets with other users within a trusted environment. The data exchange
ecosystem encourages knowledge sharing, innovation, and the development of new applications and services.
• User Created Data Artefacts: The IDP enables users to contribute and share user-generated data artefacts, such
as code, scrapers, annotations, analysis, visualizations, AI/ML models, and more. This feature encourages
collaboration, knowledge exchange, and the reuse of data artifacts within the IDP community. It empowers users
to leverage and build upon existing resources, fostering a culture of open innovation and accelerating the
development of data-driven solutions.
• Data API & Catalogue: The IDP features a comprehensive data and API catalogue that serves as a centralized
repository of available datasets and APIs. It provides a searchable interface for users to discover, access, and
utilize data resources. The catalogue includes detailed descriptions, metadata, and access mechanisms,
facilitating efficient data exploration and integration.
• Open Metadata Standard: The IDP incorporates an open metadata standard, ensuring consistency and
interoperability across diverse datasets. This standard provides a common framework for describing and
organizing data, facilitating easy discovery, understanding, and integration of different data sources. It enables
data consumers to efficiently navigate and access relevant datasets within the IDP ecosystem.
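As an illustration of what an entry conforming to such an open metadata standard might look like, the record below uses fields loosely modelled on common catalogue standards such as DCAT; the field names and values are assumptions, not the IDP's actual schema.

```python
import json

# Illustrative open-metadata entry; all identifiers and values are hypothetical.
metadata = {
    "identifier": "idp-agri-prices-2023",
    "title": "Daily Wholesale Agricultural Prices",
    "description": "Mandi-level wholesale prices reported daily.",
    "publisher": "Hypothetical Department of Agriculture",
    "format": ["CSV", "JSON"],
    "accessType": "Registered",
    "license": "Government Open Data License - India",
    "updated": "2023-09-30",
}
print(json.dumps(metadata, indent=2))
```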
FUNCTIONS OF IDP
India Datasets Platform (IDP) is a unified national data sharing exchange platform for all stakeholders to share, explore,
discover, and consume data/metadata/ artefacts/APIs without compromising their business or social goals, or on privacy,
security, and other concerns.
IDP is envisaged to include a Data Provider Workflow for sharing of data/metadata via API or file upload; a Data Consumer Workflow for discovering and accessing datasets available on IDP; a Community Forum to provide a space for Data Providers and Consumers to share their analyses, code, findings and views with other community members and stakeholders; a Data Visualization Engine for users without programming skills to create charts and maps using data available from the platform; a Developers' Sandbox for users with programming skills to use open-source libraries/software like Python and R to clean, analyze, annotate, anonymize, compile, restructure, visualize and perform related activities on data available from the platform and elsewhere; and other features and functionalities as per the requirements of stakeholders and technical feasibility.
IDP will primarily enable the exchange of data and metadata, following common and/or domain-specific standards as
finalised by competent authority(ies), by the following types of source entities (referred to as ‘Data Providers’):
• Data generated using public funds by Indian Central and State government entities, Union Territories, Urban Local
Bodies and other government/government-owned entities. Chief Data Officers associated with government
entities will be responsible for sharing of data on IDP on behalf of the entity concerned. IDP can be used by
government entities for both Government-to-Government (G2G) and Government-to-Any (G2X) data exchange
use cases.
• Data generated by Indian private entities can be enabled to be shared, discovered, accessed and used via IDP as
per NDGP and other Acts/Rules/Policies. Such private entities include but are not limited to academic institutions
and universities; citizens and individuals; civil society and research organisations; industry bodies, private
companies and start-ups; media organisations; and open data/technology communities and initiatives.
• Data collected/compiled/generated by international government, multilateral, multistakeholder or private
entities can also be enabled to be shared, discovered, accessed, and used via IDP as per applicable Indian and
international law and under approval from competent authority(ies).
Further, IDP will enable the exchange of data sourced by Data Providers through any of the following ways:
• Data collected or generated by the Data Provider, subject to data quality criteria and review process framed for
collected or generated data (if and as applicable) by competent authority(ies)
• Data crowdsourced by the Data Provider, subject to data quality criteria and review process framed for
crowdsourced data (if and as applicable) by competent authority(ies)
• Data compiled/harvested/scraped by the Data Provider, subject to data quality criteria and review process
framed for harvested data (if and as applicable) by competent authority(ies)
• Central Metadata Catalogue/Registry of all shareable, non-personal data/APIs managed/owned by various
entities can be created by enabling Data Provider entities, which prefer to host and serve datasets
managed/owned by them on their independent domain/entity-specific data portals, to only make metadata of
their data/API collections available for discovery and access on IDP. This will ensure much wider discoverability,
and thus access and usage, of the data/API collections concerned without necessitating migration of the actual
datasets to IDP.
Data Providers may share a dataset on IDP either by uploading the same as a file (in an open format like CSV, JSON and
XML) or via APIs. A copy of the shared data is only kept with IDP when the dataset is uploaded as a file and not when it is
shared via API.
Under no circumstances does IDP access, analyse, or modify datasets shared with IDP. IDP makes the shared datasets
available to Data Consumers as-is-shared by the Data Provider and under the terms of sharing specified by the Data
Provider.
Alternatively, a Data Provider may decide to only share metadata on IDP for cataloguing and improved discoverability and
keep the datasets available only on their portal/website.
The India Datasets Platform (IDP) serves as a platform for enabling the sharing and accessing of data from various data
sources. It supports multiple file formats, such as CSV, JSON, and XML, and also provides Application Programming
Interfaces (APIs) to facilitate data exchange.
(A) Input to the India Datasets Platform (IDP): Data Providers can input data into the IDP using two main methods:
Uploading files: Data Providers can share datasets by uploading files in supported formats like CSV, JSON, or XML.
This means they can submit the data in a structured file format directly to the platform.
Via API: Data Providers can also share data by connecting their systems to the IDP through APIs. This allows for
a more dynamic and real-time exchange of data, where the IDP can directly retrieve data from the Data Providers'
systems as needed.
(B) Output from the India Datasets Platform (IDP): Data Consumers can access data from the IDP using two primary
methods:
Downloading files: Data Consumers can download datasets from the IDP in the supported file formats like CSV,
JSON, or XML. This allows them to get a copy of the data in a structured file format that they can analyze or use
offline.
Via API: Data Consumers can also access data in real time by connecting their applications to the IDP through
APIs. This enables them to query the IDP and receive data directly into their applications without the need to
download the entire dataset.
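A minimal sketch of both consumer-side access routes follows, assuming a hypothetical IDP base URL, dataset identifier and endpoint layout; these are illustrative only.

```python
import requests  # widely used third-party HTTP client

BASE = "https://idp.example.in"  # hypothetical IDP endpoint, for illustration only

# File route: download a dataset as CSV for offline analysis.
resp = requests.get(f"{BASE}/datasets/agri-prices-2023/download",
                    params={"format": "csv"}, timeout=30)
with open("prices.csv", "wb") as f:
    f.write(resp.content)

# API route: query the same dataset in real time, without downloading all of it.
resp = requests.get(f"{BASE}/api/v1/datasets/agri-prices-2023/records",
                    params={"state": "Karnataka", "limit": 100},
                    headers={"Authorization": "Bearer <api-key>"},
                    timeout=30)
records = resp.json()
```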
Flexibility for Data Providers and Data Consumers
The IDP offers flexibility to both Data Providers and Data Consumers:
Data Providers can choose how they want their data to be accessed. They can specify whether a dataset can be
downloaded as a file(s) or accessed only through APIs. This allows them to control the level of access to their datasets.
Data Consumers can choose their preferred method of access. If they want real-time data and have the capability to
integrate with APIs, they can use APIs to access the data. On the other hand, if they need a local copy for analysis, they
can download the dataset as a file.
The IDP encourages the use of APIs as the preferred method for exchanging data. APIs enable a more efficient and
seamless data exchange process. However, the platform also recognizes that some users may still prefer working with
files, so it allows uploaded datasets to be made accessible via APIs as well.
“The India Datasets Platform (IDP) provides a robust infrastructure for data sharing and access. It supports various file
formats for uploading and downloading datasets and encourages the use of APIs for real-time data exchange. Data
Providers have control over how their datasets are accessed, and Data Consumers have the flexibility to choose the
method that best suits their needs.”
IDP is a unified national data sharing and exchange platform to enable various data sharing and exchange use cases of all
stakeholders including but not limited to Central/ State/UT Governments, public sector undertakings, private sector
companies, industry bodies, MSMEs and startups, academia and researchers, civil society and media organisations, open
technology communities, etc.
IDP is inclusive of all types of stakeholders, while offering customisable options to Data Providers to define and control
how data/metadata/artefact/APIs shared by them can be accessed and consumed by interested Data Consumers.
Metadata of all datasets shared on IDP by different stakeholders are discoverable by all users on IDP. However, access to
datasets shared on IDP depends on the terms of sharing defined for the dataset concerned by the Data Provider
concerned.
A Digital Agreement is generated in real time for a Data Consumer who is approved/authorised by the Data Provider concerned to access a dataset. The Agreement contains the terms of sharing of the dataset, including the policy and licence under which the dataset is shared as well as additional terms and conditions (if any) specified by the Data Provider, along with the general terms and conditions for the Data Provider and the Data Consumer.
The Data Provider can define the terms of sharing a dataset by selecting from the pre-approved options for policy, license
and additional terms and conditions of sharing available on IDP. If needed, a request to make other policy and license
terms available for selection on IDP may be submitted by a Data Provider to be processed and approved by the competent
authority(ies). Once approved, the Data Provider concerned will be able to select the requested policy/license on IDP to
define the terms of sharing of datasets shared by them.
IDP enables sharing of non-personal data through the following Access Types:
• Open Access to shareable, non-sensitive data without any requirement of registration or authorisation of the
Data Consumer.
• Registered Access to data that requires registration of an account on IDP by the Data Consumer before accessing
the data concerned.
• Restricted Access to data that requires registration of an account on IDP by the Data Consumer and approval of
a data access request submitted by the Data Consumer before accessing and/or consuming the dataset.
A Data Provider can decide to either make the dataset available to all registered and non-registered users of IDP under
Open Access or make it available only to Data Consumers registered on IDP under Registered Access. The data provider
may also choose to make the dataset available only to Data Consumers who are registered on IDP and who are
approved/authorised by the Data Provider to access the data concerned under Restricted Access.
In other words, Data Consumers will have to register on IDP to access Registered Access datasets, and additionally submit a data access request to be approved by the Data Provider concerned for Restricted Access datasets.
Data Consumers may download a dataset in open formats like CSV, JSON and XML or get API-based access to the dataset,
as enabled and specified by the Data Provider concerned.
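A compact sketch of how the three Access Types might be enforced is shown below; the function and flags are illustrative assumptions, not IDP's actual implementation.

```python
from enum import Enum

class AccessType(Enum):
    OPEN = 1        # no registration needed
    REGISTERED = 2  # IDP account required
    RESTRICTED = 3  # account plus provider-approved access request required

def may_access(access_type: AccessType, is_registered: bool, is_approved: bool) -> bool:
    """Hypothetical gate applying the three access types described above."""
    if access_type is AccessType.OPEN:
        return True
    if access_type is AccessType.REGISTERED:
        return is_registered
    return is_registered and is_approved  # RESTRICTED

assert may_access(AccessType.RESTRICTED, is_registered=True, is_approved=False) is False
```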
IDP, as a unified national data sharing and exchange platform, is not going to access, analyze or modify any datasets shared
on the platform by any Data Provider and will make the datasets available to Data Consumers as-is-shared by the Data
Provider under the terms of sharing specified by the Data Provider.
Management of integrity of shared datasets may be divided into three stages of the dataset collection/generation,
management and sharing process:
• Data integrity during the collection/generation stage will be the sole responsibility of the Data Providers
concerned since IDP will not have a role in this stage of the data life cycle.
• Data integrity during the management stage will also be the sole responsibility of the Data Providers concerned
as all data management activities, including but not limited to annotation, anonymization, cleaning and
organizing, are expected to be undertaken by the Data Providers concerned (or by agencies selected/deployed
by them).
IDP strongly recommends hosting and management of shared data at source, i.e. in a data storage/processing facility
owned/managed by the Data Provider concerned. Data hosted/ managed by a Data Provider may be made available on
IDP for discovery, sharing and usage via API while the data is managed/maintained/updated at source to ensure data
integrity and Single Source of Truth.
Data integrity during the exchange stage will be ensured by IDP by administering all data sharing and exchange flows,
between the Data Provider and IDP as well as between the Data Provider and Data Consumer (once the Data Consumer
has gained access to the data concerned), through a dedicated secure channel/tunnel to prevent any data corruption or
loss while in transit. Data shared by Data Providers on IDP will only be stored on IDP storage if the data concerned has
been uploaded as file(s) and not via API. Strict access control and security protocols will govern the management of shared
data kept on IDP storage to ensure zero unauthorized access, modification or sharing of the same.
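One common way to detect corruption or loss in transit, consistent with the integrity goals described above, is for both ends to compare cryptographic checksums; the sketch below is a generic illustration, not IDP's specified mechanism.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute a SHA-256 digest so both ends can confirm a file survived transit intact."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# A Data Provider could publish the digest alongside an uploaded file;
# a Data Consumer recomputes it after download and compares the two values.
```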
Under the upcoming National Data Governance Policy (NDGP), MeitY is to establish a Data Management Office (DMO).
DMO will prescribe Guidelines, Rules, and Standards to govern data collected/managed by all government entities,
including but not limited to quality monitoring, control, and assurance processes. IDP is being developed with an
extensible, modular, and robust microservices-based architecture so as to incorporate and implement the upcoming
Guidelines, Rules and Standards as may be prescribed by DMO as per NDGP and other Acts/Rules/Policies.
“Maintaining data integrity in IDP offers multiple benefits, including ensuring the accuracy and reliability of shared
datasets, fostering trust among data providers and consumers, and complying with data governance regulations.
Emphasizing the role of Data Providers in data integrity during the collection and management stages, along with IDP's
secure exchange processes, will enhance the platform's effectiveness in facilitating seamless and trustworthy data
sharing and usage for various national initiatives and decision-making processes.”
• Data Marketplace
IDP can serve as a data marketplace where data providers can offer their datasets for sale or subscription. The platform
can provide a user-friendly interface for data consumers to browse and discover available datasets. Data providers can
set pricing models, such as one-time purchases, subscriptions, or licensing fees, based on the value and uniqueness of
their datasets. The central platform facilitates the transactional process, handling payments, licensing agreements, and
data access controls.
The central platform can enable data providers to specify access levels and permissions for their datasets. This allows data
providers to control who can access their data and under what conditions. They can choose to make certain datasets
publicly available for free, while charging a fee for more exclusive or premium datasets. The platform ensures that only
authorized data consumers who have paid or subscribed to the datasets can access them, protecting the interests and
revenue potential of the data providers.
IDP works as a systematic structure designed to address the challenges related to data generation and curation. The functioning of IDP is split into segments that are further divided into sub-segments. To operate efficiently, IDP deals with the following verticals:
• Data Provider: Government departments and ministries upload curated datasets on the IDP platform. Research organisations and private entities may also contribute datasets to the platform.
• Open-source Datasets Exchange: The IDP incorporates an open-source data exchange model, which forms the
foundation of its data-sharing infrastructure. This model promotes transparency, interoperability, and
community participation. The open-source approach allows for the development and sharing of standardized
protocols, formats, and interfaces, ensuring seamless integration and exchange of data within the platform. It
encourages the active involvement of developers, data scientists, and other stakeholders in improving and
expanding the functionalities of the data exchange ecosystem. The open-source nature of the model empowers
users to customize, enhance, and build upon the existing framework, fostering a collaborative environment for
data sharing and utilization.
• Data Consumers: Startups and Research institutions that will consume data for building applications.
DATA PROVIDERS
As the backbone of the data exchange platform, government departments play a crucial role in supplying valuable data
for research institutions to consume and utilise in building applications. Government departments encompass a wide
range of sectors, including healthcare, education, transportation, environment, and more. These departments possess
extensive datasets that hold immense potential for driving innovation and addressing societal challenges. Research
organisations and private entities may also contribute datasets to the platform. Data providers, in this context, refer to
the government departments responsible for uploading and sharing their data on the platform. They act as the custodians
of valuable information gathered through their operations, policies, and programs. These departments recognize the
importance of data-driven decision-making and understand the value of making their datasets accessible to external
stakeholders, such as research institutions and start-ups.
Data providers are responsible for creating datasets that are suitable for consumption by research institutions. This
involves several steps, including data collection, cleaning, transformation, and annotation. Government departments have
established processes to ensure data quality, accuracy, and relevance. Data providers employ various methods to curate
and prepare datasets. They collect data from multiple sources, such as internal systems, surveys, and third-party data
providers. Once collected, the data goes through a rigorous cleaning process to remove errors, inconsistencies, and
duplicate entries. Data providers also ensure compliance with privacy regulations and take measures to anonymize or de-
identify sensitive information.
After the cleaning process, data providers transform the data into a format that is easily consumable by research
institutions. This may involve structuring the data into standardized schemas, organizing it in a tabular or hierarchical
format, and applying relevant metadata. Data annotation is another crucial step where data providers add descriptive
labels, tags, or annotations to enhance the understanding and usability of the datasets.
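As a small illustration of the cleaning and transformation steps described above, the sketch below applies the pandas library to a hypothetical departmental extract; the column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical raw extract from a departmental system.
raw = pd.DataFrame({
    "district": ["Pune", "Pune", "Nashik", None],
    "cases":    [120, 120, 95, 40],
})

clean = (
    raw.drop_duplicates()              # remove duplicate entries
       .dropna(subset=["district"])    # drop records missing key fields
       .reset_index(drop=True)
)
clean["district"] = clean["district"].str.strip().str.title()  # normalize text values
```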
To facilitate seamless access to data, data providers prepare Application Programming Interfaces (APIs) at their end. APIs
act as the interface through which research institutions can retrieve specific datasets or interact with the data in real time.
Data providers design APIs to expose relevant endpoints that allow authorized access to their datasets. When preparing
APIs, data providers define the functionalities and endpoints that researchers can utilize to retrieve data. They specify the
format in which data will be delivered, such as JSON or XML, and outline any authentication or authorization mechanisms
required for access. Additionally, data providers may establish rate limits or usage policies to ensure fair and responsible
use of the APIs.
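A minimal provider-side sketch follows, using the Flask web framework to suggest how authentication and a usage policy might be enforced at an API endpoint; the route, header name and limits are illustrative assumptions.

```python
from flask import Flask, jsonify, request, abort

app = Flask(__name__)
API_KEYS = {"demo-key-123"}   # hypothetical issued keys
RATE_LIMIT = 1000             # requests per key (illustrative policy)
usage: dict[str, int] = {}

@app.get("/api/v1/datasets/<dataset_id>/records")
def records(dataset_id: str):
    key = request.headers.get("X-API-Key", "")
    if key not in API_KEYS:
        abort(401)            # authentication required for authorized access
    usage[key] = usage.get(key, 0) + 1
    if usage[key] > RATE_LIMIT:
        abort(429)            # enforce the provider's fair-usage policy
    return jsonify({"dataset": dataset_id, "records": []})  # data lookup omitted
```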
By exposing APIs, data providers enable research institutions to access and consume data programmatically, streamlining
the process of data retrieval and integration into their applications. This seamless integration facilitates faster
development and innovation, enabling research institutions to leverage the government's datasets effectively. Data
providers, as government departments, play a critical role in the data exchange platform. They create, curate, and
annotate datasets to ensure data quality and relevance. Additionally, they prepare APIs that allow research institutions to
access and consume the data in a standardized and efficient manner. Collaborating with data providers empowers
research institutions to leverage valuable government data and drive innovation in various domains.
In cases where government departments may not have the resources or expertise to directly curate, label, or filter their
data for creating useful datasets, partner agencies empanelled with the IDP can play a vital role. These partner agencies,
equipped with domain knowledge and technical capabilities, collaborate with government departments to assist in the
data preparation process. Partner agencies work closely with the department to understand their data requirements and
objectives. They employ their expertise in data management and analysis to curate datasets that align with the
department's needs. This involves identifying relevant data sources, cleaning and transforming the data, and applying
appropriate labelling or filtering techniques.
The partner agency works in collaboration with the department to define the criteria for labelling or filtering the data.
They may employ machine learning or manual annotation techniques to accurately label the data based on specific
attributes or categories. By leveraging their expertise, partner agencies can help ensure that the curated datasets are
accurate, relevant, and useful for research institutions.
To maintain data quality and reliability, it is crucial to verify the datasets created by partner agencies. The verification
process involves several steps to ensure the accuracy, completeness, and integrity of the datasets before they are made
available on the data exchange platform.
Verification starts with a thorough review of the dataset by subject matter experts within the government department.
They assess the data against predefined criteria, such as data quality standards, relevance to the department's objectives,
and alignment with regulatory requirements. Any discrepancies or anomalies are identified and addressed through further
collaboration with the partner agency.
The involvement of partner agencies brings valuable expertise and resources to government departments, enabling them
to curate and filter data effectively. Through thorough verification processes, the government department can maintain
data quality and ensure that the datasets created by partner agencies are accurate, reliable, and suitable for consumption
by research institutions.
DATA CONSUMERS
In IDP, data consumers are entities such as research institutions, startups, or organizations that utilize the data provided
by government departments for application building, innovation, or research purposes. They can be identified through a
process of registration or onboarding on the data exchange platform. Interested parties would submit their details,
including their organization's profile, intended use of the data, and any relevant expertise or credentials. The central body
overseeing the data exchange platform would review these submissions and verify the authenticity and suitability of the
data consumers based on their stated objectives and capabilities.
Data consumption by data consumers occurs using Application Programming Interfaces (APIs) provided by the data
providers. The APIs serve as the interface through which data consumers can access and retrieve the desired data. When
a data consumer registers on the platform and obtains access to specific APIs, their usage is recorded and monitored by
the central body overseeing the data exchange ecosystem. The central body tracks API usage metrics, including the
frequency and volume of data accessed, to ensure fair usage and compliance with any established policies or agreements.
This enables the central body to recognize and validate the usage of APIs by data consumers, providing insights into the
overall data consumption patterns.
Smart contracts can play a crucial role in ensuring data authenticity and establishing a trustworthy relationship between
data consumers and data providers. Smart contracts are self-executing contracts with predefined conditions written into
code, and they can be deployed on a blockchain network. These contracts can be utilized in the data exchange ecosystem
to enforce data usage agreements, track data access, and ensure compliance with data sharing policies. Smart contracts
can include clauses related to data confidentiality, usage restrictions, and intellectual property rights. They can automate
the verification and validation processes, ensuring that data consumers adhere to the agreed-upon terms and that data
providers have control and visibility over their data. By leveraging smart contracts, the data exchange platform can
establish a transparent and secure environment that fosters trust between data consumers and data providers.
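To make the idea concrete, the sketch below models a self-enforcing usage agreement in plain Python; an actual deployment might express equivalent logic as a contract on a blockchain network. All names and limits are hypothetical.

```python
from dataclasses import dataclass
import time

@dataclass
class DataUsageAgreement:
    consumer: str
    dataset_id: str
    max_accesses: int
    expires_at: float          # epoch seconds
    accesses: int = 0

    def record_access(self) -> bool:
        """Allow access only while the agreed-upon terms still hold."""
        if time.time() > self.expires_at or self.accesses >= self.max_accesses:
            return False       # terms exhausted: access denied automatically
        self.accesses += 1     # every access is tracked for the provider
        return True

deal = DataUsageAgreement("startup-42", "idp-agri-prices-2023",
                          max_accesses=3, expires_at=time.time() + 86400)
print(deal.record_access())    # True until the terms are exhausted
```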
Data consumers are identified through a registration process, where their credentials and intended data usage are
reviewed by the central body overseeing the data exchange platform. Data consumption occurs through APIs provided by
the data providers, with usage recognition monitored by the central body. Smart contracts can be employed to ensure
data authenticity, establish a trustworthy relationship, and enforce data usage agreements between data consumers and
data providers. These measures contribute to a transparent, compliant, and secure data exchange ecosystem.
The building blocks of IDP combine data providers, data consumers and the core exchange structure. They involve a complex architecture of modules operating within the system, with an overall dependency on the governance framework responsible for carrying out operations for maximum output. From data consumer to data provider, information flows through interconnected nodes along multi-directional pathways, ensuring that data consumers, as valid stakeholders, can put the data to effective use.
Blocks are explained below with the basic nature of their work within the IDP ecosystem:
Data Providers Registry: A registry that keeps track of registered data providers, storing their information and credentials to facilitate data contribution and management within the platform.
Data Consumer Registry: A registry that maintains records of registered data consumers, allowing them to access and utilize available datasets while ensuring secure and authorized data consumption.
Portal Interface: The user interface of the data exchange platform, providing a user-friendly and intuitive interface for data providers and consumers to interact with the system, access datasets, and manage their accounts.
Data Catalogue: The process of organizing and categorizing datasets within the platform, making them easily discoverable and searchable by data consumers.
Metadata Exchange: A mechanism for exchanging metadata, including dataset descriptions, attributes, and other relevant information, to facilitate understanding and discovery of available datasets.
Access Model: A defined set of rules and permissions that govern data access, specifying who can access which datasets and under what conditions.
Policy/License Management: The management of policies and licenses that govern the usage, sharing, and redistribution of datasets within the platform, ensuring compliance with legal and regulatory requirements.
Payment Management: A system for managing financial transactions related to data access and usage, including billing, invoicing, and payment processing.
Identity & Account Management Module: A module that handles user identity verification, registration, and account management, ensuring secure and authorized access to the platform.
Transaction & Payment Module: A module that facilitates secure and transparent transactions between data consumers and providers, ensuring smooth financial exchanges for accessing datasets.
High-value Datasets Repository: A dedicated repository for storing and managing datasets of high importance or strategic value, ensuring their prioritized availability and accessibility.
Datasets Classifier: A mechanism that classifies datasets based on predefined criteria or categories, enabling efficient dataset organization and discovery.
Pricing Model – Monetization Framework: A framework that defines the pricing models and strategies for monetizing datasets within the platform, allowing data providers to earn revenue from their data contributions.
Data Versioning: A system that manages and tracks different versions of datasets, enabling data providers to update and maintain dataset versions over time.
Dataset Engine: The core engine responsible for data storage, retrieval, and processing within the platform, ensuring efficient and reliable dataset management.
API Catalog Module: A module that provides a catalogue of available APIs within the platform, allowing data consumers to discover and utilize APIs for accessing and interacting with datasets.
Cross Domain Data Linkage: Provides access to a process of connecting and combining datasets from different domains or fields to gain new insights and create a more comprehensive understanding of complex phenomena.
Developer Sandbox: A dedicated environment within the IDP that allows developers to experiment, test, and build applications using the available datasets and APIs without affecting the production environment.
The institutionalization of the India Datasets Platform (IDP) is a crucial process that involves formalizing and integrating
the platform within the existing institutional framework. To ensure its long-term viability and effectiveness, several key
aspects need to be addressed. Firstly, a well-defined legal and regulatory framework should be established to govern the
operations of the IDP, ensuring compliance with data privacy, security, and intellectual property rights while allowing for
flexibility and future adaptations. Secondly, a dedicated organizational structure with appropriate leadership roles and
governance bodies should be created to effectively manage and govern the IDP's operations. This includes establishing
clear decision-making processes and coordination mechanisms.
Sustainable funding mechanisms are also essential for the IDP's success. This can involve a combination of government
funding, public-private partnerships, grants, and revenue generation models to ensure ongoing maintenance,
technological upgrades, and capacity-building initiatives. Active engagement with relevant stakeholders, such as
government agencies, private organizations, research institutions, and citizen groups, is crucial during the
institutionalization process. Their perspectives and expertise should be sought to shape the IDP's governance, policies,
and strategic directions. Collaborative partnerships should be fostered to leverage collective knowledge and resources.
Capacity building and training programs are vital for maximizing the benefits of the IDP. These initiatives can focus on
enhancing data management, data sharing, data analysis, and data governance skills among government officials, data
contributors, researchers, and other stakeholders. By building a skilled workforce, the IDP can be effectively utilized,
promoting data-driven decision-making across various sectors. Finally, regular monitoring and evaluation mechanisms
should be established to assess the performance, impact, and effectiveness of the IDP. This feedback-driven approach
helps identify areas for improvement, address challenges, and optimize the platform's functionalities, ensuring its
continued growth and success. The envisaged organization structure is described below:
The structure represents a hierarchical organization with the CEO at the top, followed by key positions in different
departments. Each department includes senior roles responsible for overseeing specific aspects of the IDP's operations,
as well as junior and mid-level positions contributing to the day-to-day functioning of the platform. The data department
(CDO) includes roles such as Senior Data Analysts, Data Governance Managers, and Data Operations Supervisors, while
the technology department (CTO) has positions like Senior Software Engineers, Database Administrators, and Data
Integration Specialists.
The Chief Partnerships Manager department focuses on strategic partnerships and collaborations, with roles such as
Senior Partnership Specialists and Business Development Managers. This listed structure provides a clear overview of the
organizational hierarchy and the various roles within the IDP.
Initially, key officers that will enable the IDP performance are:
(A) Chief Executive Officer (Higher Authority for Managing IDP from Top to Bottom):
The Chief Executive Officer is the ultimate authority responsible for overseeing and managing all aspects of the IDP,
ensuring effective leadership and coordination from top to bottom, and driving the organization towards its strategic goals
and vision.
(B) Chief Data Officer (Manages entire Data operations of IDP):
The Chief Data Officer is responsible for overseeing and managing the complete data operations of the IDP, including data collection, analysis, storage, and security, ensuring the availability of accurate datasets to support the research & innovation ecosystem in India.
(C) Chief Technology Officer (Manages entire Technology operations of IDP):
The Chief Technology Officer is tasked with managing and upgrading the technology operations of the IDP, overseeing the development and implementation of technology strategies, infrastructure management, and the efficient delivery of technical solutions aligned with organizational objectives. The CTO is also responsible for conducting workshops along with the DMO to educate government ministries on data management and on how datasets can be put to further use in artificial intelligence applications.
(D) Chief Business Officer (Responsible for Revenue Generation and Client Management):
The Chief Business Officer plays a crucial role in driving revenue generation and client management for the IDP, developing
and executing business strategies, fostering partnerships, and ensuring client satisfaction to achieve sustainable growth
and profitability.
(E) Chief Operating Officer (Responsible for Detailed operations management of IDP):
The Chief Operating Officer is responsible for the comprehensive management of operations within the IDP, overseeing
day-to-day activities, optimizing processes, and ensuring operational efficiency to deliver high-quality services to all
stakeholders of IDP.
(F) Chief Partnerships Manager (Responsible for Strategic Partnerships):
The Chief Partnerships Manager is responsible for developing and nurturing strategic partnerships for the IDP, identifying collaboration opportunities, negotiating agreements, and fostering relationships with external stakeholders to drive innovation and growth, and to expand the organization's reach by onboarding more data organizations as part of IDP.
“To ensure the successful institutionalization of IDP, it is crucial to establish a robust legal and regulatory framework,
secure diverse and sustainable funding sources, and actively engage with relevant stakeholders. Furthermore,
conducting capacity building and training programs will enhance the skills of the workforce, and implementing a
comprehensive monitoring and evaluation system will ensure continuous improvement and effectiveness of the
platform. A well-structured and coordinated organizational framework under India AI as IBD and National Data
Management Office may be required for achieving the IDP's objectives and promoting data-driven decision-making
across sectors in India.”
Data-sharing practices through marketplaces have gained significant traction internationally, revolutionizing the way data
is exchanged and utilized across borders. These marketplaces serve as intermediaries, facilitating seamless transactions
between data providers and data consumers from different countries and industries. International data marketplaces
have become a pivotal force in promoting data-driven decision-making, powering innovation, and driving economic
growth. Through secure and standardized platforms, organizations and individuals can now share, buy, and sell datasets
on a global scale, breaking down traditional barriers and enabling a more interconnected and collaborative approach to
data sharing.
One of the key advantages of international data sharing through marketplaces is access to a vast and diverse pool of
datasets from various regions and domains. Organizations can tap into valuable data sources that might have been
previously inaccessible due to geographic or industry-specific constraints. This democratization of data fosters a richer
understanding of global challenges and opportunities, supporting cross-border research, market analysis, and data-driven
policymaking. Moreover, data marketplaces often prioritize data privacy and security, adhering to international data
protection standards, and implementing encryption and access controls, ensuring that sensitive data is shared responsibly
and ethically. As these marketplaces continue to evolve and gain trust, they hold the potential to revolutionize industries,
spur innovation, and lead the way in shaping the future of international data collaboration.
Turning to the crux of data marketplaces for public data sharing, several international examples stand out, each
managed by various entities such as private companies, academic institutions, and government bodies. These platforms
offer valuable datasets for a fee, enabling access to a wide range of information for research, analysis, and decision-making
purposes.
Microsoft Azure Cloud Marketplace is one of the leading data marketplaces, offering an extensive array of datasets across
various domains. Managed by Microsoft, this platform caters to both private and public sector organizations. Data
available on this marketplace includes demographic information, geospatial data, financial data, health data, and more.
Users can access these datasets on a pay-per-use basis, allowing them to obtain valuable insights and leverage Microsoft's
data processing capabilities.
Amazon Web Services (AWS) also hosts a paid data marketplace that serves as a hub for diverse datasets. The AWS Data
Marketplace is a part of the AWS ecosystem and is managed by Amazon. It provides users with access to datasets on
topics such as weather patterns, economic indicators, market trends, and more. With the pay-as-you-go pricing model,
users can access these datasets seamlessly through AWS infrastructure, empowering them to draw meaningful
conclusions and drive innovation.
The World Bank, a prominent international financial institution, manages a data marketplace known as the World Bank
Data Catalog. This platform offers a wide range of public datasets related to global development, economics, social
indicators, and environmental statistics. These datasets are openly accessible, and researchers, policymakers, and
businesses can utilize them to gain insights into global trends, challenges, and opportunities.
Google Cloud Platform provides the Google Cloud Public Datasets, a collection of publicly available datasets that cover
various fields such as genomics, climate, geospatial data, and more. Managed by Google, this platform encourages data
sharing for research and innovation, and the datasets are often made available at no cost or with minimal fees to cover
storage and processing expenses.
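To illustrate how such public datasets are consumed programmatically, the short Python sketch below queries one of the openly hosted BigQuery public datasets. It assumes the google-cloud-bigquery client library is installed and that application credentials for a Google Cloud project are configured; query charges, where applicable, are billed to that project.

    # Sketch: querying a Google Cloud public dataset via BigQuery.
    # Assumes `pip install google-cloud-bigquery` and that
    # GOOGLE_APPLICATION_CREDENTIALS points at a service-account key.
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 5
    """
    for row in client.query(query).result():
        print(row["name"], row["total"])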
The European Union (EU) operates the European Union Open Data Portal, which acts as a central repository for open data
from EU institutions and agencies. Managed by the European Commission, this portal offers a vast range of datasets
covering topics like agriculture, economy, environment, and social issues. These datasets are freely accessible, promoting
transparency, collaboration, and data-driven policy-making within the EU.
The information given above represents just a fraction of the thriving data marketplaces globally, each contributing to the
democratization of data and enabling data-driven decision-making across sectors. As data becomes an increasingly
valuable asset, these platforms play a critical role in facilitating data sharing and advancing knowledge and innovation in
the digital age.
REGULATIONS LEADING TO DATA SHARING – INTERNATIONAL EXAMPLES
The following table provides examples of private data sharing across different countries and sectors. The list is not
exhaustive and is indicative of examples that have been encountered. In all these cases the government is involved and
no charge is levied for the data.
Sector      Country     Data Shared                                                        Supporting Regulation
Automobile  Australia   Automobile seller companies should share complete service and     Right to Equitable and Professional
                        repair information about their cars with 30k+ independent auto    Auto Industry Repair (REPAIR) Act
                        repair and servicing companies
Transport   Singapore   Availability of taxis across the city of any company;             Singapore Open Data License (SODL)
                        availability of car parking places
Housing     Singapore   Registration information on the sale and rental of properties     Singapore Open Data License (SODL)
                        in Singapore
Healthcare  Singapore   Sharing healthcare data between private hospitals and clinics:    EMRX (Electronic Medical Record
                        discharge summary, test reports, medical operations, drug         Exchange)
                        allergies, medicines prescribed, cardiac reports, etc.
Healthcare  USA         Healthcare insurance data sharing in the USA between parties.     Health Insurance Portability and
                                                                                          Accountability Act (HIPAA)
WORKING GROUP 3:
CHAIRMAN:
MEMBERS:
INTRODUCTION
In the context of emerging technologies, data is a foundational building block. The availability of quality datasets, as
indicated by the collection, management and processing methods and practices in use, can strongly influence the efficacy
of data-based innovation such as artificial intelligence (AI) and improve public service delivery.
To realise the full potential of India’s digital government vision of maximising the efficiency of data-led public service
delivery and catalyse data-based research and innovation, the Ministry of Electronics and Information Technology (MeitY)
released the Draft National Data Governance Policy (NDGP). The policy provides an institutional framework for governing
data collection, management, processing, storage, and access processes and systems through the National Data
Management Office (NDMO).
INSTITUTIONALISING NDMO
The executive office of the NDMO will be set up through an official gazette notification under Article 73, on the lines of
the Pension Fund Regulatory and Development Authority (PFRDA) and the Securities and Exchange Board of India (SEBI).
Both these regulators were initially established as executive agencies, even before the PFRDA and SEBI Acts were passed.
The same legal power will be exercised here.
A draft office order, as detailed below, expands on the functions of the NDMO in alignment with the NDGP, along with
the processes it will follow to discharge these functions, and the powers of the MeitY to create standard terms, and
conditions of service for NDMO and Data Management Unit (DMU) officials, make financial allocations, etc. The office
order may include the below-mentioned details:
Term of CEO:
5 years, which may be extended with the approval of the Central Government.
Functions of the NDMO:
To set up appropriate data governance frameworks in alignment with the National Data Governance Policy with a focus
on three central functions – Data Integrity & Audit, Data Regulation and Data Management.
• Data Integrity & Audit: To conduct periodic audits of systems and processes related to the collection, storage,
and use of data within the Central government and ensure compliance against rules/standards/guidelines issued
by the NDMO; to prepare annual reports reviewing NDMO and DMU’s performances in the previous financial
year.
• Data Management: To create and manage the institutional Framework for non-personal data collection,
processing, and access; to prescribe guidelines and standards for data storage & retention management
practices; to create, operate and maintain the India Datasets Platform; to develop, in consultation with the
Central Government, metrics for measuring outputs and outcomes under the NDGP; to prepare whitepapers,
consultation papers, and reviews pertaining to any of its functions.
• Data Regulation: To prescribe meta-data and data standards and oversee their adoption; to prescribe standards
and processes for non-personal data identification, classification, and access; to create standards and processes
for data security, privacy preservation, and anonymization standards; to develop a grievance redressal
mechanism related to the discharge of its functions; to provide dedicated data management capacity for
ministries/departments; to conduct training, courses, workshops, and seminars on data management, standards,
and best practices relating to the collection, storage, and use of non-personal data for government officials.
Staffing:
• The NDMO will be staffed by, among others: [CEO, Head of the Standards and Policies Division, Head of the HR
& Finance Division, Head of the Platforms and IT Division, Head of the Legal Division, Head of the Audit and
Compliance Division, consultants]
• The NDMO shall have the power to employ individuals or engage consultants to discharge its functions.
• The terms and conditions of service of employees and consultants shall be framed by the NDMO in consultation
with the Central Government.
Processes:
• The NDMO shall, from time to time, prepare a plan for discharging its functions in consultation with the Central
Government, including the priority of its objectives, and the approach towards establishing the DMUs.
• The NDMO shall follow a consultative process while framing or creating standards and guidelines with the public
and within government departments.
• All standards and guidelines, the initial drafts for each, and the inputs received, where appropriate, shall be
published on the website of the NDMO.
Procurement and contracting: The NDMO shall follow the General Financial Rules of the Government of India for
purchasing any goods or services required to discharge its functions.
The business which the NDMO may not transact: Except in the pursuit of its objectives under this notification or any other
law for the time being in force, the NDMO must not –
a. purchase any capital, including any shares of any bank or any other person, or grant loans against such capital or
shares; or
b. advance money on mortgage or otherwise against the collateral of immovable property or documents of title
relating to such immovable property, or become the owner of immovable property.
c. The provisions of clauses a and b will not prevent the NDMO from acquiring or holding property necessary for its
business or residential premises for its use.
• The NDMO must constitute a fund, to which shall be credited, among other amounts, all sums received by the
NDMO from any source. The fund may be applied towards –
i. the salaries, allowances, & other remuneration of members & employees of the NDMO;
ii. expenses incurred by the NDMO for the discharge of its functions; and
iii. expenses incurred by the NDMO in furthering the objectives of this notification.
• The Central Government may make grants to NDMO pursuant to the preparation of an annual budget by the
NDMO, and its consideration by the Central Government.
• The NDMO must maintain accounts and other relevant records and prepare an annual statement of accounts in
such form as may be prescribed by the Central Government in consultation with the Comptroller and Auditor-
General of India.
• The accounts of the NDMO must be audited annually by the government auditor.
This report has been prepared to provide recommendations and inputs around the objectives and functions of NDMO and
its governance structure. The report also details potential outcomes that NDMO should target in its initial years of
operations. Finally, the report also details the structure of the DMUs to be set up under the NDMO.
The NDMO shall be set up with the objective of streamlining data governance and enhancing the quality of data for
catalysing data-based innovation and research. The NDMO will coordinate closely with government ministries,
departments, states, and other relevant stakeholders (within and outside the government) to streamline data quality and
accelerate its use for large-scale social transformation. Through its efforts, NDMO will not only enable informed policy-
making and efficient governance but will also allow a new generation of startups and researchers to bolster innovation
for high-priority use cases, enter new markets and drive growth in the Indian economy.
NDMO should be set up as an independent and separate entity funded by MeitY, as per Section 6 ‘Institutional Framework’
of the National Data Governance Policy.
a. Chief Executive Officer (CEO): The NDMO may be headed by a senior executive with over 25 years of experience
spread across technology, technology policy and industry, with demonstrated expertise in data management and
instituting data governance frameworks. The CEO's key responsibilities include:
i. Provide strategic direction in terms of framing and execution of the implementation strategy for the
NDMO with a focus on three functions – Data Integrity & Audit, Data Regulation and Data Management.
ii. Oversee overall governance of the NDMO and ensure adherence to its implementation as stated in the
National Data Governance Policy.
iii. Oversee Quarterly Reviews of the performance and progress of various DMUs and provide strategic
direction.
iv. Collaborate with external stakeholders to harmonise existing efforts being undertaken on data
governance and ensure implementation of the India Datasets Program and India Datasets Platform
(IDP).
b. Functional Divisions: The CEO could be supported by heads of respective functional divisions to implement the
operations of the NDMO effectively and build dedicated technical and functional capacity. The following divisions are
suggested for efficient management of the ecosystem:
i. Data Integrity & Audit: This division will be responsible for conducting periodic audits of systems and
processes related to the collection, storage, and use of data within the Central government and ensure
compliance against rules/standards/guidelines issued by the NDMO. This division will also be
responsible for preparing annual reports assessing NDMO and DMU’s performances in the previous
financial year.
ii. Data Management: This division will be responsible for creating and managing the institutional
Framework for non-personal data collection, processing, and access along with developing, in
consultation with the central government, metrics for measuring outputs and outcomes under the
NDGP. It will also prescribe guidelines and standards for data storage & retention management practices
within the central government, and prepare whitepapers, consultation papers, and reviews pertaining
to any of its functions. This division will also be responsible for creating, operating and maintaining the
India Datasets Platform.
iii. Data Regulation: This division will prescribe meta-data and data standards and oversee their adoption
for non-personal data identification, classification and access and for data security, privacy preservation,
and anonymization. It will also develop a grievance redressal mechanism related to the discharge of the
functions of NDMO and DMU.
iv. HR & Finance: A dedicated team to be onboarded to ensure adherence to project budgets and
undertake hiring activities.
v. Legal: Address legal matters pertaining to non-personal data governance. Additionally, it will provide
legal counsel to the NDMO CEO and other functional divisions including assisting them with the
legislative drafting of standards/policies/guidelines, MoUs etc.
c. Project Management Unit (PMU): The PMU should comprise experts equipped to undertake due diligence on
technical and sectoral matters. Each divisional head will be supported by these experts from the PMU depending on
their functional and technical expertise. These experts should actively track the global developments in the data
sharing and management ecosystem, recent efforts by states to ease G2G data sharing for improving public service
delivery, and related trends to provide timely information for suitable actions by NDMO. The experts will undertake
relevant functional responsibilities for their assigned division, including:
i. Create sector-agnostic frameworks and SoPs for data sharing along with necessary tools and guidelines
for Data Anonymization, Metadata & Quality Standards etc.
ii. Support the onboarding of stakeholders for the India Datasets Program
iii. Oversee the overall implementation of the India Datasets Platform as envisioned under the NDGP.
iv. Monitor and review the performance of DMUs and provide interventions for their capacity building.
v. Collaborate with and guide the ecosystem partners in converging ongoing initiatives on data
governance.
vi. Undertake capacity-building initiatives for government officials and citizens for the promotion of data
governance.
FUNCTIONS OF NDMO
The central functions of NDMO are divided into three pillars – Data Integrity & Audit, Data Regulation and Data
Management.
a) Data Integrity & Audit
i. The NDMO shall ensure that adequate processes (including checklists and standards) are developed to
facilitate the conducting of independent, periodic data and system audits of each ministry/department by
the DMU in the assigned timeline and process. These audits shall ensure ministries/departments comply
with the rules/standards/guidelines on data governance issued by the NDMO from time to time.
ii. The NDMO shall also issue detailed guidelines on the corrective action to be taken in case of non-
compliance, along with the time frame for compliance. It shall also conduct a post-compliance review to
verify that the non-compliance has been rectified.
iii. For new and upcoming systems, it should be mandatory to obtain a data compliance certificate from the
designated audit agency(s) before the ‘go-live’ of the systems. Even projects being executed on a pilot
basis shall be required to follow this.
iv. Real-time monitoring of uploaded and shared datasets and visualisation of metadata from
Ministries/Departments is critical to track progress. A unified tracking dashboard will be built as part of the
India Datasets Platform to enable oversight and monitoring of the data-sharing activities across all Ministries
to enhance transparency and accountability on the shared datasets.
b) Data Regulation
i. The NDMO shall create standards in accordance with principles of privacy by design, promote the use of
privacy-enhancing technologies and systems, and adapt global best practices and standards as feasible to
India.
ii. Data dissemination will be streamlined through standards that make data portals, dashboards, repositories,
etc. discoverable. NDMO will publish meta-data and data standards that cut across sectors (an illustrative
sketch of such a metadata record follows this list).
iii. With a focus on machine readability and searchability, the standards will provide the salient features and
standards that every repository/portal should adhere to and allow departments/schemes/states/urban local
bodies to customise their approach to data dissemination. This federated approach will be key to the India
Datasets Platform’s approach of being an umbrella platform that improves discoverability for all public data
in the country. The NDMO will also create standards for the architecture and design of the India Datasets
Platform.
iv. NDMO will build on MeitY’s metadata and e-governance standards to develop a standard vocabulary
applicable across sectors for geographic identifiers, among others. This vocabulary will enrich MeitY’s Digital
Service Standard and enable government stakeholders to design data systems for high-quality data
collection, which will impact subsequent sharing/dissemination.
v. NDMO will define the conditions for access and use of data within and beyond the government ecosystem.
The NDMO will notify protocols that will determine access to non-personal datasets under restricted access
while ensuring privacy, accountability, security, and trust for accessing and using data.
vi. The NDMO will decide the genuineness and validity of data usage requests. Requests for data on any of the
government’s data portals will be addressed based on the final classification guidelines issued by the NDMO.
vii. The NDMO shall create standards and principles for the ethical and fair use of data, based on global standards
and best practices, which would be developed through public consultation.
viii. With the new data governance regime, there will arise a need for a redressal mechanism. This would be
established in coordination with the Digital Personal Data Protection Bill (DPDP Bill) to address overlaps and
ensure a transparent and accountable data ecosystem. NDMO would also coordinate with other
agencies/data protection boards/bodies across states, centre, and ministries to ensure harmony across
various policies.
ix. The NDMO will institute a mechanism for the requesting entities to request datasets, register grievances and
establish the responsibility of DMUs under the NDMO to respond in a time-bound manner. This will
supersede existing mechanisms used in the government data portals to harmonise the process and provide
a reliable process to request data.
x. The NDMO shall take effective steps to build capacity across DMUs to ensure compliance with its standards
around anonymisation, data protection, privacy-by-design, and allied issues. This shall require different kinds
of training for different levels of officials within the NDMO and the Government.
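The NDMO's cross-sector meta-data standard is yet to be published, so the Python sketch below is purely illustrative: the field names are hypothetical placeholders showing the kind of record such a standard might require, together with a minimal conformance check of the sort a portal could run before accepting a dataset listing.

    # Illustrative only: field names are hypothetical placeholders, not the
    # NDMO's actual (yet-to-be-published) meta-data standard.
    REQUIRED_FIELDS = {
        "dataset_id", "title", "publishing_ministry", "keywords",
        "geographic_identifier", "update_frequency", "access_level",
        "anonymisation_status", "last_updated",
    }

    def validate_metadata(record: dict) -> list:
        """Return a list of problems; an empty list means the record conforms."""
        problems = ["missing field: " + f
                    for f in sorted(REQUIRED_FIELDS - record.keys())]
        if "access_level" in record and record["access_level"] not in {"open", "restricted"}:
            problems.append("access_level must be 'open' or 'restricted'")
        return problems

    record = {
        "dataset_id": "agri-rainfall-2023",
        "title": "District-wise Rainfall, 2023",
        "publishing_ministry": "Ministry of Agriculture & Farmers Welfare",
        "keywords": ["rainfall", "district", "agriculture"],
        "geographic_identifier": "IN-DISTRICT",
        "update_frequency": "monthly",
        "access_level": "open",
        "anonymisation_status": "not-required",
        "last_updated": "2023-09-30",
    }
    print(validate_metadata(record))  # -> [] (record conforms)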
c) Data Management
i. NDMO will prescribe a comprehensive and evolving set of guidelines and frameworks that will act as the
common minimum sector-agnostic frameworks for data storage and retention across sectors. The NDMO
shall require DMUs to file periodic reports with updates on the datasets stored and maintained by them, in
a manner and timeline established by the NDMO.
ii. The NDMO will create guidelines for securing data storage and retention against potential malicious attacks,
including data breaches, distortion of datasets, and de-anonymization with support from the Data Protection
Board of India (DPBI) and Indian Computer Emergency Response Team (CERT-in). Other public authorities
(such as the National Health Authority) will be required to use the NDMO’s standards and data retention
policies to build upon sector-specific guidelines and harmonise data storage and retention frameworks
across Ministries/Departments.
iii. To ensure better government-to-government coordination, the NDMO shall issue guidelines to
Ministries/Departments for the creation of searchable data inventories and data dictionaries with clear
meta-data standards on the India Datasets Platform. All datasets in the India Datasets Program will only be
accessed through the IDP and any other NDMO-designated and authorised platforms. NDMO-authorised
platforms will integrate with the IDP to promote data discoverability.
iv. A separate and secure module within the IDP will be developed to enable seamless data sharing between
Ministries/Departments. This will allow Ministries/Departments visibility into the data inventories within the
government and streamline data access for governance and public service delivery. Such a module will allow
government users to explore a catalogue of available data, select the desired subset, make an access request,
and receive data through an online workflow. Each request shall have its audit trail; a minimal sketch of
such a workflow appears after this list.
v. The DMUs may require the Chief Data Officer to coordinate with other Ministries/Departments and the
NDMO in addition to their Ministries/Departments to secure alignment.
vi. The NDMO shall create rules for access to datasets from different Ministries/Departments. These rules shall,
inter alia, identify datasets, different user groups with whom data may be shared, the extent of sharing
access to restricted datasets for specific user groups, and bulk sharing norms. Classification criteria become
integral for the proper and ethical usage of the data.
vii. The data classification rules create standards that will work in concert with the processes and rules laid out
in the DPDP Bill and the NDGP.
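As flagged in item iv above, the request-and-receive workflow with an audit trail can be sketched in a few lines. Everything below (class and method names, catalogue contents) is hypothetical, since the IDP module is yet to be designed; the point is only that every action taken on a request is appended to an append-only log.

    # Hypothetical sketch of the inter-ministry data-request workflow; class
    # and method names are illustrative, not a real IDP API.
    import uuid
    from datetime import datetime, timezone

    class DataExchange:
        def __init__(self):
            self.catalogue = {"ds-001": "Vehicle registrations (non-personal)"}
            self.audit_log = []  # append-only: every action on a request lands here

        def _audit(self, request_id, action, actor):
            self.audit_log.append({
                "request_id": request_id, "action": action, "actor": actor,
                "at": datetime.now(timezone.utc).isoformat(),
            })

        def request_access(self, dataset_id, requesting_dept):
            if dataset_id not in self.catalogue:
                raise KeyError("unknown dataset")
            request_id = str(uuid.uuid4())
            self._audit(request_id, "requested " + dataset_id, requesting_dept)
            return request_id

        def approve(self, request_id, approving_dmu):
            self._audit(request_id, "approved", approving_dmu)

    exchange = DataExchange()
    req = exchange.request_access("ds-001", "Requesting Ministry")
    exchange.approve(req, "Owning Ministry DMU")
    print(exchange.audit_log)  # complete trail for this request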
The Development Monitoring and Evaluation Office (DMEO) is the monitoring and evaluation division of NITI Aayog and
should continue to monitor and evaluate different initiatives. DMEO initiated the exercise of the Data Governance Quality
Index (DGQI) by seeking information from Ministries/Departments on various schemes/initiatives. DMEO may share the
learnings of their DGQI initiative with NDMO as it shall help in building up its processes. Moreover, with the formation of
NDMO, there is a government body which shall be responsible for policies, guidelines, rules and standards related to data
management in the whole of the government ecosystem. Hence, going forward all initiatives undertaken by the DMEO
should align with the NDMO directives. Once the guidelines and procedures related to data management have been
released by NDMO, the DMEO may realign the DGQI questionnaire and exercise as per the standards defined by NDMO
and submit its findings to NDMO to inform corrective actions.
The National Data Sharing and Accessibility Policy (NDSAP) governs the current data governance ecosystem and access
practices across government ministries/departments. The NDSAP implementation guidelines mandate the setting up of
NDSAP cells (data offices) under each line ministry/department/agency. While the guidelines briefly detail the roles and
responsibilities of Chief Data Officers (CDOs) and Data Contributors (DCs) within NDSAP cells, they do not provide a
detailed structure for NDSAP cells.
In the absence of defined guidelines on the structure of the NDSAP Cell and its composition, NDSAP cells are not consistent
across line ministries/departments/agencies. While some ministries have well-structured NDSAP cells with CDOs,
designated data contributors for different programs/applications/projects, analytics officers, platform managers, etc.,
other cells have only a few designated officers within the data office. This inconsistency also translates into the
performance/contribution of line ministries towards the data objectives of the country. For instance, capacity limitations
of CDOs and data contributors have often resulted in the slow pace of tasks focused on open government data (OGD).
To streamline the accessibility and availability of quality data, as indicated by its completeness, accuracy and reliability,
an institutional mechanism of DMU within each Ministry/Department is proposed under the NDGP. The DMU will
operationalise efforts towards transforming the data management practices of the Ministries/ Departments to improve
the quality and use of public sector data and catalyse the data-based research and innovation ecosystem.
The key role of the DMU would be to break silos within the Ministry/Department to ensure the creation of synchronised
compliance with relevant data policies/guidelines/rules as set forth by the NDMO, streamlining data collection practices
and improving the quality of data.
STRUCTURE OF DATA MANAGEMENT UNITS
Studying the existing institutional structures for data management, and recognising both their limitations and their
advantages, MeitY has proposed the following structure for DMUs. The best practices adopted to set up Data Strategy Units and
structure data offices within the Ministry of Housing and Urban Affairs have been given due consideration to specify the
roles and structure of DMUs.
It is proposed that the NDMO create a CDO council which can have rotating conveners and members from DMUs of
different Ministries which can help in the exchange of best practices and effective coordination between the Ministries.
In principle, each DMU should reflect the following structure specified in Figure 2.
a. Chief Data Officer (CDO): The DMUs may be headed by an Additional Secretary/Joint Secretary level officer as the
designated CDO, who shall work closely with the NDMO to ensure the implementation of this policy.
b. Data Monitoring, Technology, Statistics and Data Analytics units: The existing structure of Data Strategy Units may
be realigned to enable the effective implementation of the NDGP through the below-mentioned verticals:
• Data Monitoring Unit: The unit will lead a synergistic approach to enable the monitoring and
management of data-related initiatives and activities.
• Statistics Unit: To support the overall statistical needs of the Ministry/Department as well as to enable
leveraging data for decision-making around policy and program design, evaluation, etc.
• Technology Unit: For ensuring 100% digitization and integrating siloed Management Information
Systems/dashboards/data systems of the Ministry/Department.
• Data Analytics Unit: For undertaking and promoting data analysis on collected data into meaningful
insights which may aid decision making. The Unit would foster a culture of evidence-based policy-making
within the Ministry/Department.
Each DMU should have an identified NIC nodal officer attached to it in its respective Ministry/Department, as most
of the applications in Ministries/Departments are under the purview of NIC. This shall also facilitate efficient operations
and quick redressal of concerns.
c. Data Scientist and Data Analyst: To augment dedicated data capacity within line Ministries/ Departments and deliver
against outcomes of effective and accelerated data-based innovation with public data, two data analysts and one
data scientist will be placed as data fellows in Ministries/Departments.
As the DMUs build the internal infrastructure and practices for the use and access of data, Data Fellows may be leveraged
to focus on potential/novel use cases and proof-of-concept studies that are within the scope of the Ministry/Department’s
data. They may also be able to work across multiple teams/ministries to spot opportunities for merging data and
improvements to the existing data infrastructure, quality and processes.
However, depending on requirements and resource availability, the size of each DMU could vary in terms of positions and
personnel. Many of the functions detailed below could be performed by one person, especially in DMUs situated within
non-data-intensive ministries/departments. Additionally, some Ministries/Departments have statistical and economic
advisors to facilitate the use of data for policy formulation; these officers may be included in the proposed DMU structure.
At a minimum, each DMU must include 1 CDO, 2 Data Analysts, 1 Data Scientist and 1 IT support person.
Based on insights from past studies, specific medium-term targets and transparency are key to ensuring Ministries/
Departments are able to realise value from participating in the process and have the necessary support to succeed.
To achieve the objectives as envisaged under the policy, the key outputs and milestones for the NDMO have been given
below. This may, however, be reviewed or modified from time to time based on the maturity of the data ecosystem:
- Stage 1: Establishment of the National Data Management Office; hiring of personnel to staff DMUs; setting up of
DMUs in Ministries/Departments; preparation of a Data Strategy by each DMU for its Ministry/Department.
- Stage 2: Publish meta-data standards & data quality standards (Phase 1); issue the Data Identification &
Classification Framework; issue guidelines for disclosure norms; launch the India Datasets Platform; launch the
India Datasets Program; draft guidelines for the creation of searchable data inventories at the
Ministries/Departments.
- Stage 3: Issue Data Access & Licensing Agreements; publish Data Anonymisation Guidelines; publish Meta-Data
& Data Quality Standards (Phase 2).
- Stage 4: Publish standards & principles for the Fair & Ethical Use of Data.
WORKING GROUP 4:
CHAIRMAN:
MEMBERS:
BACKGROUND & INTRODUCTION:
The proposed scheme seeks to enhance the outcomes of the MeitY scheme by providing maximum support to the
growing AI startup ecosystem. The startup ecosystem comprises various stakeholders that play a vital role in fostering
growth and achieving success. Since the AI startup ecosystem comprises industry, government institutions, academia,
international partners/ecosystems, investors, etc., it is critically important to involve all of them proactively to support
the overall ecosystem development. With the vision to build the next 100 AI unicorns in the country, IndiaAI is envisaged
as an umbrella program to catalyze the AI innovation ecosystem in the country. For realizing the vision of "Make AI in
India and Make AI work for India", funding mechanisms for AI innovation need to be explored.
1. Artificial Intelligence is a constellation of technologies that enable machines to act with higher levels of
intelligence and emulate the human capabilities of sensing, comprehending and acting. Thus, computer vision and audio
processing systems can actively perceive the world around them by acquiring and processing images, sound and speech.
The natural language processing and inference engines can enable AI systems to analyze and understand the
information collected. Moreover, AI has evolved in ways that far exceed its original conception. With incredible
advances made in data collection, processing and computation power, intelligent systems can now be deployed
to take over a variety of tasks, enable connectivity and enhance productivity.
2. The artificial intelligence (AI) sector is growing rapidly, with global spending on AI solutions expected to reach
$407 billion by 2027. This growth is being driven by a number of factors, including the increasing availability of
data, the development of more powerful computing platforms, and the growing demand for AI-powered
solutions across a wide range of industries.
3. AI has the potential to transform various industries by providing tailored solutions to specific challenges.
Industries such as healthcare, finance, retail, manufacturing, transportation, and agriculture can benefit from AI-
powered applications and systems designed to address their unique needs. Developing industry-specific AI
solutions can lead to improved efficiency, cost savings, enhanced decision-making, and better customer
experiences. AI can also be used to optimize production processes, predict maintenance needs, and
improve quality control.
4. Startups are leveraging artificial intelligence and machine learning technologies to develop ground-breaking
solutions across various industries. AI startups have also driven automation, personalized learning
experiences and adaptive learning systems, creating new opportunities across a variety of industries.
5. AI as a platform spans hardware, software, and on-demand services. All three categories have very different
players, although there is some overlap between hardware and software players. AI-as-a-Service, provided by
the top cloud providers, allows companies to make AI a part of their existing applications to make more accurate
predictions, automate decision-making processes, and obtain optimized solutions.
6. In recent years, there have been significant advances in machine learning, deep learning, and natural language
processing. These technologies are making it possible to develop AI algorithms that can perform tasks that were
previously impossible. The key opportunities for AI startups lie in addressing industry-specific challenges through
intelligent automation, predictive analytics, and data-driven decision-making. AI-powered solutions have the
potential to streamline operations, optimize processes, and enhance productivity across various sectors, such as
healthcare, finance, manufacturing, retail, and more.
7. India recognizes the transformative potential of artificial intelligence and the importance of nurturing a robust AI
ecosystem to drive innovation, economic growth, and societal development. Initiatives like the National AI
Strategy, Startup India, and Digital India have been launched to foster AI innovation, provide funding and
resources to startups, and promote collaborations between industry, academia, and government institutions.
8. The Government of India has taken a number of initiatives to promote the development and use of artificial
intelligence (AI). These initiatives include:
i. The National Artificial Intelligence (AI) Strategy, launched in 2020, aims to make India a global leader in
AI by 2030.
ii. The National AI Mission, launched in 2021, aims to promote the development of AI-based solutions for a
wide range of sectors, including healthcare, agriculture, and education.
iii. The National AI Platform, launched in 2022, provides a common infrastructure for AI researchers and
developers to share data, tools, and resources.
iv. The National AI Fellowship, launched in 2022, provides funding for early-career AI researchers to pursue
research in India.
v. The National AI Ethics Guidelines, released in 2022, provide a framework for the responsible
development and use of AI in India.
DATA
Data is the fuel that powers artificial intelligence (AI). Without data, AI systems would not be able to learn or make
predictions. The quality and quantity of data available to AI systems is essential to their success. AI systems can be
designed to learn continuously from new data over time, enabling them to adapt and improve their performance as they
come across new samples and patterns. Further, data collected from sensors is crucial to train the AI models that power
autonomous vehicles. Data-driven AI techniques have already made significant contributions in fields such as healthcare,
fraud detection, speech recognition, and customer service.
Startups need diverse and representative data sets to evaluate the performance, robustness, and reliability of their
algorithms. By using comprehensive data sets, they can identify and address potential limitations, biases, or errors in their
models, ensuring they deliver accurate and trustworthy results. High-quality data sets are of paramount importance
for AI startups. They serve as the foundation for training, validating, and improving AI models. Access to diverse, high-
quality, and representative data sets enables startups to develop accurate, reliable, and innovative AI solutions that meet
the requirements of their target markets.
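As a concrete illustration of the representativeness point above, the short sketch below (with made-up sample counts and assumed population shares) compares a dataset's group composition against reference shares; large gaps flag under-represented groups before any model is trained.

    # Sketch with synthetic numbers: checking how representative a sample is
    # against assumed population shares (all figures here are made up).
    import pandas as pd

    df = pd.DataFrame({"state": ["UP"] * 50 + ["MH"] * 30 + ["KL"] * 5 + ["TN"] * 15})
    population_share = {"UP": 0.35, "MH": 0.25, "KL": 0.15, "TN": 0.25}  # assumed

    report = pd.DataFrame({
        "sample": df["state"].value_counts(normalize=True),
        "population": pd.Series(population_share),
    })
    report["gap"] = report["sample"] - report["population"]
    print(report.round(2))  # a large negative gap flags an under-represented group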
COMPUTE/INFRA
Compute is one of the most important infrastructure components for AI startups. These startups have specific compute
and infrastructure requirements to support their operations and the development of AI solutions. The operation of AI
models can be computationally intensive, and the volume of compute resources needed will depend on the size and
complexity of the model, as well as the number of users accessing it. In addition to training and running AI models,
startups may also need compute resources for other tasks, such as data processing, visualization, and debugging.
Further, cloud-based services offer flexibility in cost management by allowing startups to pay for resources on demand
or based on usage. Additionally, startups can explore options like serverless computing or resource optimization
techniques to minimize costs without compromising performance.
The specific compute and infrastructure requirements may vary depending on the AI startup's focus, scale, and industry.
Evaluating the needs of AI algorithms, data processing requirements, scalability, security, and cost considerations are
essential for AI startups to build a robust and efficient infrastructure.
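A rough, commonly used rule of thumb (not an official sizing method) makes the dependence on model size concrete: training a dense model in fp32 with the Adam optimizer needs on the order of 16 bytes per parameter (weights, gradients and two optimizer moments), before activations, while fp16 inference needs roughly 2 bytes per parameter.

    # Back-of-the-envelope sizing sketch; a rule of thumb, not a guarantee.
    def training_memory_gb(n_params, bytes_per_param=16):
        # fp32 weights (4) + gradients (4) + Adam moments (8), excl. activations
        return n_params * bytes_per_param / 1e9

    def inference_memory_gb(n_params, bytes_per_param=2):
        # fp16 weights only
        return n_params * bytes_per_param / 1e9

    for n in (125e6, 1.3e9, 7e9):
        print("%.2fB params: ~%.0f GB train, ~%.0f GB infer"
              % (n / 1e9, training_memory_gb(n), inference_memory_gb(n)))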
KNOWLEDGE/EXPERTISE
AI startups often deal with large volumes of data, requiring expertise in data engineering. Expertise in the specific domain
or industry that the AI startup is targeting is also highly valuable: understanding the nuances, challenges, and
requirements of the domain helps in developing AI solutions that are relevant, impactful, and aligned with customer
needs. Further, AI startups are also collaborating with premier institutions and incubation centers to access state-of-
the-art research facilities and a pool of talented faculty and students who are experts in AI. This can help AI startups to
develop and deploy their products and services more quickly and efficiently.
FUNDING
The funding of AI startups consists of investment from venture capitalists, angel investors, or through other funding
channels to procure financial resources necessary for the startup's operations, research, and development activities. AI
startups can pitch their ideas, demonstrate their technology and market potential to secure funding from VC firms
specializing in technology, AI, or deep tech. VC funding provides financial resources along with guidance and industry
connections. Additionally, angel investors can offer financial support, mentorship, expertise, and networking
opportunities.
Large corporations, particularly those with a strong interest in AI, create dedicated accelerator programs or venture
arms to provide funding and support to AI startups. These programs offer access to resources, mentoring, and potential
collaborations with established industry players. Funding is essential to fuel their research and development efforts,
acquire and manage data, access infrastructure and computing resources, attract and retain talent, drive marketing and
business development, protect intellectual property, and scale their operations. Adequate funding provides startups with
the necessary resources to innovate, compete in the market, and bring their AI solutions to market.
INDUSTRY
AI startups are emerging in various other industries, including education, entertainment, insurance, real estate, legal
services, and more. AI startups in agriculture are leveraging AI to enhance crop yield prediction, smart logistics, route
optimization, and supply chain optimization. In manufacturing, AI is improving efficiency in production and supply chain
operations through predictive maintenance, quality control, robotics and demand forecasting. In healthcare, AI is driving
a revolution by aiding medical diagnosis, drug discovery, personalized medicine, telemedicine, and health monitoring.
The potential applications of AI are vast, and startups are continuously
exploring new ways to leverage AI technology to solve complex problems and drive innovation in different sectors.
PROCUREMENT
Procurement for AI startups refers to the process of acquiring the necessary resources, technologies, and services
to support the development and operations of the startup. Additionally, startups may need to procure data management
and analysis tools, data labeling platforms, visualization tools, and project management software. AI startups may also
need to procure external services or expertise to augment their capabilities. This could involve engaging consultants, data
scientists, or subject matter experts on a project basis.
Centres of Excellence (CoEs): To achieve the vision of becoming the global hub for AI research and innovation, it is vital to
position India’s CoEs as global CoEs and to increase collaboration between domestic and global AI experts. The CoEs are
being set up to bring together experts from academia, industry, and research entities to work on foundational AI research,
cutting-edge AI applications and scalable solutions. These CoEs will focus strongly on societal application sectors starting
with healthcare, agriculture, and manufacturing. They will work towards creating frontier research as well as developing
and scaling innovative solutions to address critical challenges in the sectors as well as help commercialize and
contextualize existing solutions.
Further, AI-CoEs will also enable knowledge creation, exchange and capacity building to a new generation of researchers,
innovators, and entrepreneurs. They will also promote collaboration among Indian and global academia, industry, and
research entities. In addition to acting as knowledge hubs, they will also provide incubation facilities to enable start-ups to
develop their ideas. CoEs will strengthen India’s position in the fast-changing global AI landscape by furthering research
and fostering innovative applications while addressing ethical concerns and ensuring that the technology is used for the
betterment of society.
● Develop and scale innovative cost-effective and efficient solutions to address critical challenges in nationally
critical sectors, including healthcare, agriculture, and manufacturing.
● Facilitate global knowledge exchange and capacity building through research collaborations.
● Provide training and development opportunities to the next generation of researchers, innovators,
entrepreneurs, and promote collaboration between academia, industry, and research entities.
● Aid and facilitate the commercialization of research and development outputs and create new revenue streams
by working with industry partners to bring new products and services to the market.
● Strengthen existing incubation facilities for the AI start-ups to develop their ideas, nurture talent and provide
resources such as mentorship, technical assistance, funding, and access to networks.
● Enhance the domestic AI innovation ecosystem by granting access to world-class AI infrastructure and experts
through different schemes.
● Increase the penetration of AI solutions both domestically and internationally to expand the range and depth of
AI applications and create new markets.
● Strengthen partnerships with global AI centres, as well as industry, start-up ecosystems, MSMEs, and other
stakeholders.
SCHEME OBJECTIVES
• Scheme will empower AI startups to make AI enabled products / solutions for India and the world.
• It is aimed at developing & utilizing the available R&D ecosystem and promoting innovation in AI and related
emerging technology space.
• Establishing a funding mechanism for the comprehensive AI startups program and leveraging transformative
technologies to foster inclusion, innovation, and economic growth.
• Enabling access to state-of-the-art AI infrastructure through CoEs and developing new infrastructure for
emerging technologies/deep-tech startups.
• The scheme shall also initiate collaboration with central government organizations, states, industry, academia and
international organizations for development and deployment of emerging technologies, skilling, and capacity
building activities.
• The scheme will enable the funding mechanism to support investment in early-stage startups, and the
commercialization and growth of AI startups.
• The scheme shall also strengthen community-building initiatives, including workshops, capacity-building activities,
conferences, etc., to strengthen the ecosystem and the recognition & promotion of AI startups.
• There is a great opportunity for IndiaAI scheme to cater to both the external as well as internal stakeholders of
the ecosystem through extensive performance monitoring, quality indicators, measurement, and reporting.
SALIENT FEATURES
● Measuring, monitoring, and reporting compliance and performance of MeitY Assets, Programs, & Schemes.
● Sourcing, executing, monitoring, and reporting on challenges and supporting AI startups, etc.
● Building linkages with global AI startup ecosystems and aggregating the resources from global and local
industries, institutions, and agencies for the benefit of AI startups
● Optimizing and enhancing the performance of MeitY Assets, Programs, & Schemes via capacity-building
programs
● Creating awareness and engagement of a larger number of AI startups via effective and pervasive media
and marketing programs.
SCHEME OUTPUTS/DELIVERABLES
MeitY Startup Hub (MSH) has envisaged a Future Design IndiaAI scheme with a budget of Rs 945 crore from MeitY and
matching funding of Rs 3,000 crore over 5 years to discover, support, grow and accelerate successful AI startups with emphasis on
collaborative engagement among AI startups, government, and corporates for promoting and scaling up new technology
and innovation.
The scheme envisages scaling up and sustaining the tech ecosystem, especially to discover, support, grow and create successful
AI startups and unicorns. The program envisages impacting and consolidating 725 tech start-ups over the course of the
next 5 years to pave the road for an inclusive AI startup ecosystem, one that evenly represents the aspirations of our
ambitious entrepreneurs for inclusive techno-socio-economic development of the country.
MeitY, through the IndiaAI startup support scheme, aims to support approx. 725 AI startups via incubators, accelerators,
matching funding, and a fund-of-funds basis. These initiatives are expected to strengthen innovation, entrepreneurship,
and economic growth in the fields of emerging & niche technology areas.
Future Design IndiaAI Startup Scheme: Funding Amount & Allocation

Investment Mechanism                       Via MeitY AI CoEs   Via MeitY AI CoEs   Via Digital Innovation Fund   Total
Investment Round                           Pre-seed            Seed                Post-Seed                     -
Max Investment Amount/Startup (Rs. Cr)     0.30                -                   5.00                          -
Max Investment Amount/Challenge (Rs. Cr)   0                   3.00                -                             -
Max Number of Grand Challenges/Year        -                   3                   -                             -
Total Startups Supported/Year              100                 15                  30                            145
Total Startups Supported (5 Years)         500                 75                  150                           725
Total MeitY Funding/Year (Rs. Cr)          30.00               9.00                150.00                        189.00
Total MeitY Funding, 5 Years (Rs. Cr)      150.00              45.00               750.00                        945.00
Matching Contribution, 5 Years (Rs. Cr)    -                   -                   3,000.00                      3,000.00
Total Scheme Corpus, 5 Years (Rs. Cr),     150.00              45.00               3,750.00                      3,945.00
including Matching Contribution
The IndiaAI scheme funding is designed to unlock the enormous potential of India’s startup ecosystem, to catalyse AI-led
innovation, by providing necessary support at different stages in a startup lifecycle like product development, business
development and growth stage, to support them in translating ideas to reality. Financing AI Innovation shall unlock Indian
startup ecosystem’s potential to rise as a leader in Artificial Intelligence.
Product Development Stage: An AI Ignition Grant Program is proposed to support 500 AI/deep-tech startups at product
development stage with a grant of Rs 30 lakh each. It would support innovators and budding entrepreneurs in transitioning
from the ideation stage to the minimum viable product stage. The grant amount shall be based on a cost break-up, as
demonstrated by the applicant. Such grants will be channelled through MeitY's existing incubation support located in
Tier-1, Tier-2 and Tier-3 cities for inclusive and sustainable growth across India.
Commercialization Stage: At the commercialization stage, 3 grand challenges of Rs 3 Cr. each across various sectors
(Agriculture, Health, Cybersecurity, Weather Forecasting, etc.) will be conducted each year, i.e. 15 challenges over a span
of 5 years with a total budget of Rs 45 Cr.
Growth Stage: In the growth stage the Fund of Funds is proposed to support around 150 startups over a tenure of 5 years,
through daughter funds. The Government of India will be the anchor investor and provide total budgetary support of Rs.
750 crore over a period of 5 years. The investment managers of the funds/VCs shall be tasked with raising Rs 3,000
crore or more. The MeitY contribution per startup will be a maximum of Rs 5 crore.
ABOUT INDIAAI FUTURE DESIGN FUNDING MECHANISM
● A total budget outlay of Rs 945 Cr. over 5 years from MeitY, with a matching contribution of Rs 3,000 Cr. over the
same period.
● Each startup will receive a maximum grant of Rs 30 lakh as pre-seed, i.e. product development, funding from MeitY
(via AI CoEs). In the product development stage, a total of 500 AI startups will be supported with total funding of
Rs 150 Cr. over 5 years. For these early-stage startups, funding will be provided without any matching fund
requirement.
● In the growth stage, under the fund of funds, each AI startup will receive an investment of Rs 5 Cr. with matching
funding of 1:4. The 150 AI startups will receive maximum funding of Rs 750 Cr. from MeitY (via the Digital
Innovation Fund) and matching funding of Rs 3,000 Cr. (via fund managers/VCs).
● The total MeitY contribution across the product development, commercialization and growth stages will be Rs 945
Cr. over the 5-year period, of which the product development and growth stages will support 650 startups directly.
● The scheme is expected to generate significant product development and IP creation, generating 3-6 times
returns over a period of 5 years.
● For the product development and commercialization stages, a MeitY/MSH investment committee comprising
investment experts and domain experts will evaluate and select the startups to be supported.
● A budget of Rs 3 Cr. per challenge is proposed, with 3 challenges conducted per year across key sectors; a total of
15 grand challenges over the 5-year period with a total budget of Rs 45 Cr.
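The allocations above can be cross-checked with simple arithmetic; the short calculation below reproduces the Rs 945 Cr. MeitY outlay and the Rs 3,945 Cr. total corpus from the per-startup and per-challenge figures.

    # Arithmetic check of the scheme corpus (figures in Rs crore).
    pre_seed   = 500 * 0.30   # 500 startups x Rs 0.30 Cr grant      = 150
    challenges = 15 * 3.0     # 15 grand challenges x Rs 3 Cr        = 45
    growth     = 150 * 5.0    # 150 startups x Rs 5 Cr investment    = 750
    meity_total = pre_seed + challenges + growth
    matching    = growth * 4  # 1:4 matching at the growth stage     = 3000

    print(meity_total)              # 945.0
    print(meity_total + matching)   # 3945.0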
AI is a rapidly growing field with the potential to transform many industries. The scheme may initially focus on the
following 3 priority sectors: governance (DPI), healthcare and agriculture. The challenge statements and calls for
applications shall initially be designed and implemented for these three sectors. MSH will design and implement the
challenges and calls for applications in consultation with the line Ministries, along with CDAC, for execution of the
Future Design IndiaAI Scheme. The scheme will preferably support AI startups in the fields of governance, healthcare,
and agriculture via incubators, accelerators, matching funding, and a fund-of-funds basis. The scheme aims to address
key challenges across sectors and further enable new technology practices, sustainable implementation, and efficient
governance. Some recent uses of AI in healthcare, agriculture and governance are described below:
HEALTHCARE
AI is revolutionizing healthcare delivery, ranging from diagnostics and personalized medicine to patient monitoring and
data analytics. Startups focusing on AI-driven medical imaging, telemedicine, disease prediction, and drug discovery have
great potential in the Indian healthcare sector.
• Predictive analytics for Public Health: AI tools can be used to inform public health policy. Predictive analytics
algorithms can detect patterns and risk factors that help identify individuals at high risk of developing certain
conditions and intervene early with targeted prevention strategies, improving outcomes and reducing overall
healthcare costs (see the sketch after this list).
• Faster and better Diagnosis: AI algorithms can analyse and detect patterns and connections that may not be
apparent. This process can amplify the efforts of experts such as radiologists and pathologists to perform better
diagnosis.
• Predictive Analysis: AI can analyse a patient's complete medical history in real-time and link it with symptoms
and chronic illnesses that run in the family. The outcome can be transformed into a predictive analytics tool that
can identify and treat a disease before it becomes fatal.
• AI-powered drug discovery: AI can be used to accelerate the drug discovery process by identifying potential new
drug candidates and predicting their efficacy and safety. This can significantly reduce the time and cost involved
in bringing new drugs to market.
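As referenced in the predictive-analytics bullet above, the sketch below trains a simple risk-stratification model on synthetic data (no real patient records); it stands in for the de-identified clinical features a startup would actually use and is illustrative only.

    # Illustrative risk-stratification sketch on synthetic data (scikit-learn).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Stand-in for de-identified features (age, vitals, history, ...); y=1
    # marks a high-risk outcome, kept rare as in real cohorts.
    X, y = make_classification(n_samples=5000, n_features=10,
                               weights=[0.9], random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    risk = model.predict_proba(X_te)[:, 1]           # per-patient risk score
    print("AUC:", round(roc_auc_score(y_te, risk), 3))
    print("flagged for early outreach:", int((risk > 0.5).sum()))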
AGRICULTURE
AI solutions can enhance agricultural productivity, optimize resource utilization, and enable precision farming. Agriculture
is impacted by a variety of factors from weather patterns and precipitation to irrigation models and crop-patterns. This is
where AI can be used to learn, interpret and analyse different data points to make accurate predictions that can maximise
productivity.
• Precision Farming: AI algorithms can analyse data from various sources, such as sensors, weather stations, and
drones, to provide farmers with real-time insights into crop health, soil moisture, and nutrient levels. This
information can enable farmers to optimise resource utilisation, such as water, fertiliser, and pesticides, resulting
in increased yields, reduced costs, and improved food security.
• Crop Monitoring: AI can be used to monitor crops throughout the growing season, allowing farmers to detect
and respond quickly to changes in conditions. For example, satellite imagery analysis can help identify crop
health, predict yields, and estimate water requirements, enabling farmers to make informed decisions about
planting, fertilising, irrigating, and harvesting crops.
• Supply Chain Management: Artificial intelligence technologies have the potential to dramatically improve the management of input supplies, such as seeds and fertiliser, as well as optimise the processing of output produce. AI algorithms can examine historical data, market trends, and other pertinent factors to predict the demand for agricultural products with accuracy (see the sketch after this list). Farmers and consumers can track the origin, production methods, and quality of agricultural products using technologies like blockchain, sensors, and IoT devices.
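As a hedged illustration of the demand-prediction point above, the sketch below fits a gradient-boosting regressor on lagged monthly demand to forecast a held-out year. The dataset, column names, and lag choices are assumptions for illustration only; a real system would add weather, price, and market covariates.

```python
# Illustrative demand-forecasting sketch (assumes pandas and scikit-learn).
# "crop_sales.csv" with monthly "demand_tonnes" figures is hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_csv("crop_sales.csv", parse_dates=["month"]).sort_values("month")

# Simple lag features: demand in each of the previous three months.
for lag in (1, 2, 3):
    df[f"lag_{lag}"] = df["demand_tonnes"].shift(lag)
df = df.dropna()

X = df[["lag_1", "lag_2", "lag_3"]]
y = df["demand_tonnes"]

# Train on all but the last 12 months; forecast the held-out year.
X_train, X_test = X.iloc[:-12], X.iloc[-12:]
y_train, y_test = y.iloc[:-12], y.iloc[-12:]

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
forecast = model.predict(X_test)
print("Mean absolute error (tonnes):", abs(forecast - y_test).mean())
```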
GOVERNANCE (DPI)
The involvement of AI startups has enabled robust and efficient digital infrastructure development. Moreover, AI startups have played a significant role in integrating AI capabilities into the DPI ecosystem, enhancing its effectiveness and impact. The use of AI startups in governance has been gaining traction in recent years, owing to the potential benefits for decision-making processes, optimizing public services, and deploying new and innovative solutions.
AI INVESTMENT TRENDS
In recent years, AI has emerged as a highly attractive investment opportunity, garnering substantial financial support from venture capital firms, corporate investors, and governments. Large investments have been directed towards AI startups and research initiatives. This surge in investment is motivated by the recognition of AI's potential to disrupt industries and revolutionize business processes. The following are key trends underpinning the growth of AI.
Increased Focus towards AI Startups: The investment landscape has witnessed a surge in AI-focused startups. Companies
are developing innovative AI technologies, applications, and platforms across various sectors, including healthcare,
finance, retail, transportation, and more. Investors are actively seeking out promising AI startups to fund their growth and
development.
Increase in Investment for AI in Healthcare: The healthcare industry has seen a substantial increase in AI investments. AI
has the potential to revolutionize healthcare delivery, diagnostics, drug discovery, patient monitoring, and personalized
medicine. Investors are recognizing the transformative power of AI in improving patient outcomes, reducing costs, and
enhancing efficiency.
Significant Rise of AI in Finance: The financial sector has also witnessed significant AI investments. Financial institutions
are leveraging AI to automate processes, enhance fraud detection, optimize trading strategies, and improve customer
experience. AI-powered chatbots and virtual assistants are being used to provide personalized financial advice and
support.
Use of AI in Research & Development: Besides startups, significant investments are being made in AI research and
development. Technology giants are investing heavily in AI research labs and initiatives. These investments aim to push
the boundaries of AI capabilities, develop new algorithms, and drive innovation across various domains.
Industry-specific AI solutions: There is a growing trend of investors looking for AI solutions tailored to specific industries.
Rather than generic AI platforms, companies that provide industry-specific AI applications are attracting attention. These
solutions address the unique challenges and requirements of sectors like healthcare, finance, manufacturing, retail, and
agriculture.
Personalized customer experience: Delivering personalized customer experiences has become a top priority for many
businesses. AI enables companies to analyze vast amounts of customer data and provide tailored recommendations,
product suggestions, and targeted marketing campaigns. Investors are interested in AI startups that specialize in customer
analytics, recommendation engines, and personalized marketing platforms.
AI in Edge Computing: Edge computing, which processes data closer to the source rather than relying on centralized cloud infrastructure, has gained prominence. AI algorithms are being deployed at the edge to enable real-time data
analysis, reduce latency, and enhance privacy. Investors are looking for AI startups that combine edge computing and AI
to create efficient and intelligent edge devices and applications.
SCHEME BENEFICIARIES
Beneficiaries include all the stakeholders engaged in the creation, promotion, and acceleration of the startup ecosystem
in India. Some of the key stakeholders which shall be benefited from the scheme are as follows:
● Financial incentives and design infrastructure support will be extended to domestic startups.
● Startups shall be defined as per the DPIIT notification dated 19th February 2019 or extant norms.
● Startups should primarily be working on AI technologies across areas such as vision, language, text, and large datasets, building products that solve large problems using AI.
● The approved applicants that claim incentives under the scheme shall retain their domestic status (i.e., more than 50% of the capital in them is beneficially owned by resident Indian citizens and/or Indian companies that are ultimately owned and controlled by resident Indian citizens) for a period of three years after claiming incentives under the scheme.
GOVERNANCE MECHANISM
The scheme will be implemented through MeitY Startup Hub (MSH). MSH will receive the applications under the scheme
and carry out financial and technical appraisal of such applications. It will implement the scheme, submit periodic reports
to Ministry of Electronics and Information Technology regarding the progress and performance of the scheme and carry
out other responsibilities as assigned by Ministry of Electronics and Information Technology from time to time. The
functions and responsibilities of MSH under this scheme will be elaborated in the Scheme Guidelines to be issued by
Ministry of Electronics and Information Technology separately. For carrying out activities related to implementation of
the Scheme, MSH will inter-alia:
● Receive the applications, issue acknowledgements, verify eligibility of the applicants for support under the
Scheme and issue approvals under the Scheme.
● Empanel agency / agencies or consultants as deemed necessary to carry out the technical and financial appraisal
of the applications as well as evaluate expertise of the applicants in the field of AI startups.
● Establish AI CoEs under the scheme, either by itself or through other incubator(s), to provide such services to the startups.
● Examine claims eligible for disbursement of fiscal support/grants under the scheme and disburse them as per eligibility.
● Operational oversight will be provided by a Project Review and Steering Group (PRSG) headed by JS (SIIPR Division) and comprising members from the ministry and nominees from the private sector.
● The overall scheme will be managed by the Governing Council, chaired by Secretary, MeitY, with JS (SIIPR Division) as member secretary.
Nodal Agency/Project Management Agency: The Future Design IndiaAI Scheme is proposed to be run jointly by C-DAC (Centre for Development of Advanced Computing), a scientific society operating under MeitY, and MSH. C-DAC shall act as the Nodal Agency/Programme Management Agency (PMA) for implementing, executing, and monitoring the progress of AI startups under the CoEs. Further, C-DAC, on behalf of MSH, shall receive and appraise applications, verify eligibility, and examine claims/funds through any method/document deemed appropriate. C-DAC shall also contribute towards AI startup ecosystem development, engaging with startups, global technology majors, industry associations, academia, and sector experts.
Empowered Committee (EC): For the overall administration and management of the IndiaAI scheme, an Empowered Committee will be constituted by MeitY under the chairmanship of Secretary, MeitY, comprising officers of MeitY along with representatives from C-DAC, industry, and leaders from the AI startup fraternity. The Empowered Committee may review the status/progress of the scheme every three to six months.
Scheme Management Committee (SMC): An SMC, under the chairmanship of JS/GC (Startups, Innovation & IPR) and comprising officials/representatives from MeitY, MSH, industry experts, and representatives from the startup ecosystem, will be constituted by MSH with the approval of MeitY for the selection of IAs and the shortlisting of AI startups. This committee will be tasked with evaluating and onboarding IAs and startups as per the scheme guidelines. Key recommendations, including the selection of IAs, will be submitted to the Empowered Committee for approval.
The PRSG will be constituted under the chairmanship of JS/GC (Startups, Innovation, and IPR Division) to periodically review the technical and financial progress of the scheme. The PRSG is also entrusted with the responsibility of disbursing funds and reappropriating budget heads within the approved outlay of the scheme.
To ensure adequate governance and compliance, MSH will require the following information and rights from its empanelled implementing agencies for performance evaluation on a periodic and as-needed basis:
▪ Quarterly Portfolio Report: A quarterly portfolio performance report will be provided by the Implementing Agency to facilitate quantification and assessment of the impact being generated by the investment fund or the AI startups.
▪ Board Meeting and Investment Committee Meeting Minutes: The Implementing Agency will have to share the minutes of board meetings as well as investment committee meetings so that there is transparency and accountability about deliberations and investment/grant decisions.
▪ Investment Decision Memorandums: Investment memorandums prepared by the Implementing Agency for
supporting their investment decisions will need to be shared for record-keeping and to understand the rationale
behind investment decision-making.
SCHEME TENURE
Applications under the Scheme will be open for five (5) years from 01.07.2023. The applications received under the
Scheme will be appraised on an ongoing basis and implementation will continue as per the approvals accorded under the
Scheme.
A mid-term appraisal of the scheme will be done after two years of its implementation, or as per the recommendations of MSH, to assess the impact of the scheme, offtake by the applicants, and economy in terms of the stated objectives. Based on such impact assessment, a decision will be taken on extending the tenure of the scheme and changing its financial outlay with the approval of the Minister of Electronics and Information Technology.
APPROVAL
● The applications received under the scheme will be appraised on an ongoing basis by MSH.
● Approval to the selected applicants shall be accorded by MSH and communicated to the applicant under intimation to the Ministry of Electronics and Information Technology.
● Ministry of Electronics and Information Technology shall make budgetary provisions for disbursal of fiscal
support/grants to approved startups under the scheme. The disbursement shall be done by MSH based on
approval conditions. MSH will submit budgetary requirements to the Ministry of Electronics and Information Technology as a consolidated amount on a regular basis, not on a project-by-project basis.
DISBURSEMENT PROCESS
The fiscal support against the different categories (product development, commercialization, growth) of the scheme shall
be released after the approval of the application and achievement of milestones as included in the approval letter.
GeM or Government e-Marketplace is a completely paperless, cashless and system-driven e-marketplace that enables
procurement of everyday use goods and services with minimal human interface. GeM offers a wide variety of advantages
for buyers and sellers. It eliminates human interface in vendor registration, order placement and payment processing to
a great extent. GeM is an open platform that offers no entry barriers to bona fide suppliers who wish to do business with
the government. At every step, an SMS and e-Mail notification are sent to buyers, their head of the organization, paying
authorities, and sellers.
After sandbox validation, DPIIT- and MSH-registered startups may also be enrolled to provide their services and offerings on the GeM portal, as this enables the procurement of AI startups' services by government and public buyers.
SCHEME GUIDELINES
The Scheme Guidelines shall be issued by Ministry of Electronics and Information Technology (MeitY) separately with the
approval of competent authority.
The scheme and its guidelines shall be reviewed and amended periodically or as per requirement with the approval of
competent authority.
WORKING GROUP 5:
IndiaAI FutureSkills
CHAIRMAN:
MEMBERS:
EXECUTIVE SUMMARY
Artificial Intelligence has enabled computers to identify intricate patterns that were previously difficult to detect. In recent
times, there has been a surge in the complexity of such machine learning models, aided by the advancements in deep
learning. The growth of these models has been so remarkable that around 84% of users who interact with AI remain
unaware that they are engaging with an AI system. This highlights the increasing sophistication of AI systems and their
ability to seamlessly integrate into our daily lives. Additionally, AI has not only enabled systems to learn from pre-existing
examples but also to generate new examples from scratch, further expanding their capabilities.
AI has diverse applications across various sectors and industries including Healthcare, Finance, Manufacturing,
Transportation, Retail, Education, Agriculture, Energy and Utilities, eGovernance, Media and entertainment, Sports,
Judiciary, Real estate, Construction, Hospitality and Tourism. Rapid automation and emergence of new AI-based tools and
technologies have had a significant cross-sectoral impact on skilling. The World Economic Forum predicts AI to create over
97 million new jobs by 2025. This underscores the significant role AI is envisaged to play in shaping the future of work,
with AI-related jobs and skill sets spanning across various industries and sectors. As AI technology continues to evolve and
become more advanced, it is likely to create even more job opportunities in India and globally.
As AI is becoming increasingly integrated into various industries, there is a growing demand for workers with AI-related
skills. This has led to the need for upskilling and reskilling programs to help workers adapt to the changing skill
requirements in AI. Additionally, the development of AI-enabled education platforms and the integration of AI into formal
training programs are becoming essential to prepare professionals for the jobs of the future.
To build such a focused skilling ecosystem in AI, various strategies are to be implemented, which include enhancement of AI-based skilling curricula, encouraging AI-focused entrepreneurship development and innovation for startups and MSMEs, fostering partnerships and collaborations between industry, academia, and research to leverage training and learning opportunities, promoting continuous learning through reskilling/upskilling platforms, providing mentorship, and facilitating AI-based platforms for high-end research and development.
Keeping these aspects in view, the working group has reviewed various existing AI-based curricula and has proposed a model curriculum framework involving K-12 interventions, graduate/postgraduate-level interventions, and AI-based research, along with competency/design considerations. Further, the working group has evolved and categorised key recommendations, which include:
• Model Curriculum & Repository: A comprehensive AI curriculum covering the fundamentals of AI, mathematics and statistics, machine learning, deep learning, NLP, computer vision, reinforcement learning, AI ethics, practical projects, and continual learning is recommended. To ensure that curricula keep pace with rapid technological change: (a) a central curriculum repository needs to be part of the IndiaAI portal; (b) institutions should share their curricula, and users should be able to provide comments and suggestions; (c) model curricula should be evolved from this pool and incentivized; (d) a curriculum committee consisting of experts from academia and industry should make recommendations on changes to curricula. Given the thrust on Gen AI, it is suggested that relevant tools and their usage be made an integral component of the model curriculum.
• Framework: The framework categorises courses/programs under key focus areas, including technology-specific
(algorithms, LLMs, etc.), infrastructure-specific (GPUs, specialised accelerators, Cloud, HPC, etc.), application-
specific (sectorial, domain, etc.), best practices (ethical/responsible AI), AI users (non-technical manpower
trained to understand/use/contribute to datasets), and awareness (sectoral opportunities, threats, etc.).
• Collaborative & Competitive Ecosystem: Leverage a collaborative and competitive ecosystem among schools, graduates, postgraduates, and researchers through various AI-based interventions.
• Resource/Talent for Startups and MSME: To leverage innovation among Indian startups and MSMEs, there is a
need to prioritise AI-related research, encourage academic collaboration, and upskill non-IT workforce. Efficient
mechanisms to share data for research and academic purposes should be established, and academia should be
encouraged to pursue patents and IPR in AI. Cloud-based AI Compute facilities should be made available for joint
research with startups and MSMEs, and higher education curricula should include advanced AI-related
technologies. Talent creation in AI infrastructure and sector-specific use cases should also be given due
weightage.
• Research fellowships: Initiatives like research fellowships, grants for building research capability in tier 2-3-4
institutions, international conferences/journals in AI, research mentorship for selected scholars, and sharing of
AI-related thesis and datasets are encouraged.
• Faculty training in AI: To ensure the quality of faculty/trainers in AI, they need to receive authentic training and
industry exposure, and engage in regular knowledge upgradation. Institutions should mandate that faculty
engage in knowledge upgradation once every two years, and incentivization for doing so can be considered.
Additionally, a scheme to facilitate "industry internship" for faculty dealing with AI topics can be implemented.
• Career Path Mapping: Map diverse career paths in AI, including AI researcher, machine learning engineer, data scientist, AI architect, NLP engineer, computer vision engineer, AI product manager, AI ethicist, AI consultant, and AI entrepreneur.
These recommendations aim to address the growing demand for AI-related skills and to prepare the Indian workforce and students for the future of work in AI and related areas. By adopting and implementing these strategies, the Working Group expects that Indian organisations and individuals will stay up to date with the latest AI technologies and remain competitive in the AI-specific job market, in keeping with its new vision: “A Transformative Approach: From Job Takers to Job Providers”.
INTRODUCTION
AI SCENARIO (2023)
Artificial Intelligence will occupy the centre stage across all industries and sectors from pharma to defence, automobile,
environment and sustainability, healthcare, and IT to name a few. Consumer behaviour will drive brand experience using
AI. According to IDC research, worldwide spending by governments and businesses on AI technology is expected to top
$500 billion in 2023.
With AI in every corner of the enterprise, regulation is mounting and the demand for responsible AI is increasing. AI governance will become a board-level topic together with cybersecurity and compliance, encompassing explainability, fairness audits of high-impact algorithmic decision-making, the environmental impacts of AI, and more. While globally, AI is worth
$120 billion as a market, it is growing by more than 20% each year, and is expected to reach a total of $1.5 trillion by 2030.
AI is also expected to create more than 97 million new jobs by 2025. It is also indicated that around 35% of businesses are
already using AI tools in their everyday operations. It is also estimated that around 84% of people who use AI are unaware
that they are interacting with an AI. Further, the global market for AI-specialised hardware will grow 9x to $90 billion by
2030.
GENERATIVE AI
Generative AI is becoming increasingly popular in the business world, due to three primary factors: improvements in
models, better and more data, and greater computing capacity. Recent years have seen an increase in sophistication in
machine learning models. By using deep learning, computers can now learn complex patterns in data that were previously
difficult to discover. This has allowed systems to not only learn from existing examples but also to create new examples
from scratch. It can be used for a variety of tasks such as automating customer service, improving product
recommendations, and creating personalised content. The generative AI technologies currently in focus, such as ChatGPT,
DALL-E, and LaMDA, are distinguished by three main characteristics: 1) their generalised rather than specialised use cases,
2) their ability to generate novel, human-like output rather than merely describe or interpret existing information, and 3)
their approachable interfaces that both understand and respond with natural language, images, audio, and video.
Just as the migration from command line programming (e.g., MS-DOS) to graphical user interfaces (e.g., Windows) enabled
the development of programs (e.g., Office) that brought the power of the personal computer to the masses, the intuitive
interfaces of the current generation of AI technologies could significantly increase their speed of adoption. For example,
ChatGPT surpassed 1mn users in just 5 days, the fastest that any company has ever reached this benchmark. On the whole,
AI with its generative counterpart is the talk of the town today, with huge expectations on various aspects of life and work
in the time to come. Countries are investing efforts to understand and exploit this technology, and human resources is a
major challenge in this journey.
Surveys indicate that AI has a diverse impact on jobs, creating jobs in some spaces and eliminating them in others: about 38% of employees expect some or all aspects of their jobs to be automated by the end of 2023, and roughly 13% of people think that they will eventually lose their jobs to automation. Despite that, AI is expected to create more jobs than it replaces: 97 million new AI-related jobs are projected to be created by 2025, while 85 million jobs are expected to be displaced by AI. Job losses due to AI-driven automation are more likely to affect low-skilled workers, leading to a growing income gap and reduced opportunities for social mobility.
Institutes such as the Indian Institute of Science (IISc), IIT Madras, the Indian Statistical Institute (ISI) Kolkata, the National Centre for Software Technology (NCST) Mumbai, and the Tata Institute of Fundamental Research (TIFR) were set up as nodal agencies leading the development of critical aspects of AI in India. From a national effort around 1985, several AI-based applications emerged, including IIT Madras' 'Eklavya', a knowledge-based program designed to support community health workers in dealing with symptoms of illness in toddlers; C-DAC's 'Sarani', a flight-scheduling expert system; and IISc's computer-vision-based image processing facility. India's R&D capabilities in AI have since been growing steadily. Between 2010 and 2016, national institutes of importance such as IISc, IIT Bombay, IIT Delhi, IIT Madras, IIIT Hyderabad, IIT Kanpur, IIT Kharagpur, and ISI Kolkata featured among the top universities/research institutes for AI in India. India ranks 10th globally in the number of PhDs in AI, and 13th in presentations at top AI research conferences.
An international conference series called KBCS was created by this effort and sustained by NCST; through it, India successfully brought the prestigious international conference IJCAI to India in 2007. Much work in areas such as natural language processing (including machine translation) and applications of AI to practical problems like scheduling was initiated as part of this effort. However, significant challenges remain in developing, adopting, and using AI in India, as will be spelled out in subsequent sections of this report; that discussion explains India's position behind current world leaders such as the US and China.
AI MARKET IN INDIA
(Figure: India's AI market and talent demand-supply snapshot. Source: NASSCOM Report: India Data Science & AI Skills Report: Demand-Supply Analysis, https://nasscom.in/system/files/publication/data-science-and-ai-skills-feb-2023-final-new.pdf)
ETHICS & THREATS
AI ethics refers to the issues that stakeholders (from engineers to government officials) need to consider ensuring artificial
intelligence technology is developed and used responsibly. This means taking a safe, secure, humane, and environmentally
friendly approach to AI. Poorly constructed AI projects built on biased or inaccurate data can have harmful consequences
on minority groups and individuals. The threats of AI also include job displacement, privacy concerns, and the potential
for AI systems to be used for malicious purposes. There are also concerns about the potential for AI systems to be biased
or discriminatory. Search-engine technology is not neutral as it processes big data and prioritises results with the most
clicks relying both on user preferences and location. Thus, a search engine can become an echo chamber that upholds
biases of the real world and further entrenches these prejudices and stereotypes online.
As can be seen, ethics is a major concern in the AI space. Human resource development in the field of AI needs to equip candidates with a balanced outlook on and understanding of this field, and this will be one of our concerns in curriculum design.
TALENT REQUIREMENTS
AI SKILLING IMPACTS
The cross-sectoral impact of AI on skilling in India is significant and transformative. Some key points highlighting the impact
of AI on skilling in India include:
• Upskilling and Reskilling Opportunities: AI provides opportunities for individuals to upskill and reskill themselves in emerging areas of technology. As AI technologies become more prevalent across industries, there is a growing demand for skilled professionals who can develop, implement, and maintain AI systems. This creates a need for training programs and courses that can equip individuals with the necessary AI skills. As mentioned earlier, AI is expected to disrupt the job market, making some jobs disappear and creating new ones. It is widely believed that this loss can be made up by reskilling displaced workers for AI-related jobs; similarly, those with basic knowledge of computing can be upskilled to AI capabilities. Reskilling and upskilling are thus two important channels for human resource development.
• AI-enabled Education Platforms: AI can be leveraged to enhance education platforms and make learning more personalised and adaptive. AI algorithms can analyse learner data and provide customised recommendations, adaptive assessments, and intelligent tutoring systems. This enables learners to acquire new skills at their own pace and receive targeted guidance based on their individual needs (a minimal illustration follows this list).
• Automation and Job Disruption: AI-powered automation has the potential to disrupt certain job roles and tasks,
particularly those that are repetitive and rule-based. This can impact the skill requirements in affected sectors.
However, it also creates new opportunities as tasks that can be automated are replaced by more complex and
creative roles that require human skills, such as problem-solving, critical thinking, and emotional intelligence.
• Emerging AI-related Job Roles: The adoption of AI in various sectors leads to the emergence of new job roles
related to AI development, data science, machine learning, and AI ethics. Skilling initiatives in India need to
address the demand for these specialised roles by providing relevant training programs and courses. This helps
individuals acquire the necessary skills to excel in these emerging fields.
• AI in Vocational Training: AI is substantially altering labour markets, industrial services, agricultural processes, value chains, and in particular the organisation of workplaces. Given the growing significance of artificial intelligence, the labour market will place stricter requirements on employees and demand graduates with strong professional and well-rounded skills. Therefore, vocational training programmes must correctly understand AI trends and connect AI with the reform and innovation of vocational education.
• Ethical and Responsible AI Skills: As AI becomes more pervasive, the need for ethical and responsible AI
development and deployment grows. Skilling programs should incorporate ethical considerations, data privacy,
and algorithmic bias awareness to ensure that AI systems are developed and used responsibly.
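As one hedged illustration of how an AI-enabled platform might generate customised recommendations, the sketch below matches each learner's skill gaps to the course whose emphasis best covers them via a nearest-neighbour search. The skill space, learner profiles, and course names are entirely synthetic assumptions, not drawn from any actual platform.

```python
# Illustrative learner-recommendation sketch (assumes numpy and scikit-learn).
# Learner profiles and course descriptors are synthetic placeholders.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows: learners; columns: proficiency (0-1) in five hypothetical skills.
learners = np.array([[0.9, 0.2, 0.1, 0.0, 0.4],
                     [0.1, 0.8, 0.7, 0.2, 0.0],
                     [0.5, 0.5, 0.2, 0.9, 0.1]])

# Courses described in the same skill space; a course's vector marks the
# skills it emphasises.
courses = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.8, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0, 0.5]])
course_names = ["ML basics", "NLP", "AI ethics"]

# Recommend the course closest to each learner's skill-gap vector.
gaps = 1.0 - learners
nn = NearestNeighbors(n_neighbors=1).fit(courses)
_, idx = nn.kneighbors(gaps)
for i, j in enumerate(idx[:, 0]):
    print(f"learner {i}: recommend '{course_names[j]}'")
```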
(Figure: AI skilling demand-supply indicators. Source: NASSCOM Report: India Data Science & AI Skills Report: Demand-Supply Analysis, https://nasscom.in/system/files/publication/data-science-and-ai-skills-feb-2023-final-new.pdf)
As the complexity of customer churn grows, retention approaches are also evolving to tackle the churn risk and protect
customer revenue—and AI can play an instrumental role in that. For many leading recurring revenue businesses, AI is
transforming retention by leveraging customer data, advanced analytics and machine learning to extract actionable
intelligence and drive multichannel retention actions. In order to allay the fears/concerns about GenAI amongst both high-
tech IT and non-IT employees, it would be important to train them on AI basics and prompt engineering.
AI ENABLERS
AI enablers in India refer to the factors and resources that contribute to the growth and development of artificial
intelligence technologies in the country. Some key AI enablers in India include:
• Technological Infrastructure: India's rapidly expanding digital infrastructure, including the availability of high-
speed internet connectivity, cloud computing services, and advanced hardware, provides a strong foundation for
the development and deployment of AI technologies.
• Skilled Workforce: India has a large pool of technically skilled professionals, including engineers, data scientists,
and software developers. The country's strong IT industry and educational institutions produce a steady supply
of talent with expertise in AI-related fields, contributing to the growth of the AI ecosystem.
• Research and Development: India has several research institutes, universities, and technology companies that
actively engage in AI research and development. These institutions and organisations conduct cutting-edge
research, publish papers, and collaborate with global AI communities, fostering innovation and knowledge
sharing.
• Start-up Ecosystem: India has a thriving start-up ecosystem, with numerous AI-focused start-ups emerging in
recent years. These start-ups leverage AI technologies to develop innovative solutions across various sectors,
including healthcare, finance, agriculture, and e-commerce. The presence of venture capital firms and incubators
further supports the growth of AI start-ups in the country.
• Government Initiatives: The Government of India has recognized the potential of AI and has taken several
initiatives to promote its adoption and development. The National AI Strategy, launched in 2018, outlines the
government's vision for AI implementation in key sectors. Additionally, programs like the Digital India initiative
and the Atal Innovation Mission provide support and funding for AI-driven projects and start-ups.
• Data Availability: India's vast population and digital transformation efforts have resulted in a significant amount
of data being generated across various domains. Access to diverse and large-scale datasets is crucial for training
and developing sound AI models. The availability of data, combined with advancements in data analytics and
processing techniques, fuels AI development in India.
• Collaborative Ecosystem: Collaboration between academia, industry, and the government plays a crucial role in
fostering AI innovation. India has collaborative platforms such as research centres, innovation labs, and
technology hubs that bring together stakeholders from different domains to work on AI projects, share
knowledge, and drive interdisciplinary research.
These AI enablers collectively contribute to the growth and advancement of AI technologies in India. By leveraging these
resources and fostering a conducive environment, India has the potential to become a global AI hub and harness the
benefits of AI in various sectors, leading to economic growth and societal impact.
AI END-USERS
Potential AI end-users will play a crucial role in driving the adoption and impact of artificial intelligence technologies across various sectors. Some key points highlighting the significance of AI end users in India include:
• Problem Solving and Innovation: AI end users in India, such as businesses, organisations, and individuals, identify
challenges and opportunities where AI can be applied to solve problems and drive innovation. By understanding
their specific needs and requirements, AI solutions can be developed and deployed to address those challenges
effectively. End users act as catalysts for AI-driven innovation by providing real-world use cases and driving the
demand for AI technologies.
• Sectoral Transformation: AI end users span across sectors such as healthcare, finance, agriculture, education,
manufacturing, and more. Their adoption of AI technologies can lead to transformative changes within these
sectors. For example, AI can improve diagnosis and treatment outcomes in healthcare, enable efficient financial
services, optimise crop yield in agriculture, enhance personalised learning in education, and streamline
production processes in manufacturing. The active engagement of end users in implementing AI solutions is
essential for sectoral transformation and progress.
• Feedback and Improvement: Effective end users provide valuable feedback based on their experience with AI
systems. This feedback helps developers and AI experts refine and improve the performance, usability, and
effectiveness of AI technologies. Continuous interaction between end users and AI developers fosters a cycle of
iterative improvement, resulting in more robust and user-friendly AI solutions.
• User-Centric Design: AI end users' needs and preferences shape the design and development of AI systems. By
understanding the requirements, challenges, and context of end users, AI technologies can be tailored to provide
intuitive interfaces, personalised experiences, and relevant insights. User-centric design ensures that AI solutions
are effectively integrated into users' workflows and deliver tangible benefits.
• Ethical Considerations: AI end users play a vital role in promoting ethical AI practices. They can influence the
responsible use of AI by ensuring transparency, fairness, and accountability in AI systems. By demanding and
supporting ethical guidelines and regulations, end users contribute to the development of AI technologies that
align with societal values, respect privacy, and minimise biases.
• Market Growth and Economic Impact: The adoption of AI technologies by end users drives market growth and
economic impact. AI solutions create new business opportunities, enhance productivity, and improve efficiency
across sectors. This, in turn, fosters economic growth, job creation, and competitiveness. End users, by embracing
and implementing AI, contribute to India's digital transformation and its position in the global AI landscape.
Thus, the end-users in India would help in shaping the adoption, impact, and responsible use of AI technologies. By actively
participating in the AI ecosystem, end users can contribute to the development of AI technologies that address real-world
challenges and deliver tangible benefits to individuals, organisations, and society as a whole.
AI adoption and employability hold significant promise and potential. Expected trends and opportunities include:
• Increased AI Adoption: Industries such as healthcare, finance, retail, manufacturing, agriculture, and education
are likely to leverage AI to improve efficiency, enhance decision-making, and drive innovation. As AI becomes
more accessible and cost-effective, smaller businesses and startups are also expected to embrace AI solutions,
contributing to widespread adoption.
• Job Transformation: AI adoption will lead to a transformation in the job market, where certain tasks and job roles may be automated or augmented by AI technologies. However, it is important to note that AI is more likely to augment jobs than replace them entirely. While some routine and repetitive tasks may be automated,
new job roles will emerge, requiring a combination of technical and soft skills to collaborate with AI systems
effectively.
• Demand for AI Skills: The demand for AI-related skills is expected to rise significantly. There will be a need for
professionals with expertise in areas such as machine learning, data science, natural language processing,
robotics, and AI ethics. Additionally, individuals with domain-specific knowledge who can apply AI techniques to
solve industry-specific problems will be in high demand. Skilled professionals with the ability to develop, deploy,
and maintain AI systems will have stronger employability prospects.
• Skill Development Initiatives: To meet the demand for AI skills, there will be an increased focus on such initiatives in India.
Educational institutions, government bodies, and private organisations are likely to offer training programs,
courses, and certifications to equip individuals with AI competencies. Upskilling and reskilling programs will help
professionals adapt to the changing job market and acquire the necessary AI-related skills.
• Entrepreneurial Opportunities: The growth of AI adoption in India will also create opportunities for individuals to start AI-focused
ventures. The startup ecosystem in India is already witnessing the emergence of AI-driven startups, and this trend
is expected to continue. Entrepreneurs can leverage AI technologies to develop innovative solutions, products,
and services across sectors, contributing to economic growth and job creation.
• Ethical and Responsible AI: As AI becomes more pervasive, the importance of ethical and responsible AI practices will also increase. There
will be a greater emphasis on developing AI systems that are transparent, unbiased, and accountable. The
demand for professionals who can navigate the ethical challenges of AI, ensure privacy protection, and address
algorithmic biases will rise, leading to new job roles and opportunities.
As AI technologies continue to advance and mature, there will be a need for a skilled workforce that can harness the
potential of AI and drive innovation. With the right focus on skill development, education, and fostering an enabling
ecosystem, India has the potential to become a global leader in AI adoption, creating opportunities for economic growth
and societal transformation.
The industry plays a crucial role in skilling and reskilling individuals in AI, and key aspects to promote skill development in
AI include:
• Identifying Skill Requirements: The industry has a deep understanding of the skill requirements for AI roles and
can provide valuable insights into the technical and domain-specific skills needed to excel in AI-related job roles.
By working closely with academia and training providers, industry stakeholders can contribute to the design of
relevant and industry-aligned AI skilling programs.
• Collaboration with Educational Institutions: The industry can collaborate with educational institutions to
develop AI-focused curriculum and programs that align with industry needs. This can involve providing guest
lectures, workshops, internships, and industry projects to students, giving them exposure to real-world AI
applications and fostering a practical understanding of AI concepts, as well as involvement in boards of studies and advisory boards.
• Providing Training and Learning Opportunities: Industries can offer training programs and learning opportunities
to upskill and reskill their existing workforce in AI. This can be done through in-house training programs, online
courses, workshops, or partnerships with external training providers. By investing in their employees' AI skills,
industries ensure a competent workforce that can effectively leverage AI technologies.
• Establishing Centers of Excellence: Industries can establish centres of excellence in AI, where research,
development, and training in AI technologies take place. These centres can act as hubs for AI skill development,
bringing together industry experts, researchers, and educators to collaborate, exchange ideas, and create a talent
pipeline in AI.
• Supporting Startups and Innovation: Industries can support AI-focused startups and innovation by providing
mentorship, funding, and collaboration opportunities. This fosters an ecosystem of innovation and
entrepreneurship, driving AI skill development and creating a platform for aspiring AI professionals to thrive.
• Addressing Ethical Considerations: The industry has a responsibility to address ethical considerations in AI
development and deployment. By integrating ethical guidelines, privacy standards, and responsible AI practices
into their AI initiatives, industries can ensure that AI systems are developed and used in an ethical and socially
responsible manner.
By providing guidance, collaborating with educational institutions, and offering training opportunities, including continuous learning, the industry can contribute to the growth and advancement of AI skills, creating a talent pool that can drive innovation and competitiveness in the AI landscape.
AI SKILLING STRATEGY
Scale of Impact | Target Segment | Details | Learning Outcome | Assessment
Many | Across Schools | Structured learning path on AI, Generative AI and its applications, and Responsible AI at the school level | Comfort with the AI interface, what is AI, basics of AI, AI-based tools and applications for school projects | Badges/Quizzes
Many | Across Industry | Structured learning path on AI, Generative AI and its applications, and Responsible AI | Comfort with the AI interface, what is AI, basics of AI, AI-based tools and applications connected to the field of study | Badges
Few | Engineering colleges and STEM (IT and related) | Advanced courses on Large Language Models, AI models, AI tools & frameworks, and Responsible AI | Ability to design and build AI models and solutions | Badges and completions
Few | Technical Professionals | Advanced courses on AI tools & models, Generative AI, NLP, and Responsible AI; AI product creation (detection/prevention) | Ability to design and build AI models and solutions | Badges and completions
Artificial intelligence (AI) is a rapidly evolving field that requires collaboration among industry, academia, and government to drive innovation and improvement. The interaction between these sectors can be instrumental in advancing AI technology and applications.
• Industry: Industries can play a critical role in AI development by driving research and development efforts, creating innovative AI-based products and services, and applying AI technologies to enhance efficiency and competitiveness. They can bring financial resources, expertise, and real-world challenges to the table. Industry-led initiatives can include collaborations with academic institutions, sponsoring AI projects, organising AI competitions, and setting up AI-centred innovation centres. Industry can also offer internships and training programs to students and researchers, providing practical experience and fostering talent development.
• Academia: Academic institutions are at the leading edge of AI research, conducting fundamental research, developing algorithms, and educating the next generation of AI specialists. They contribute to the theoretical and technical understanding of AI, exploring new ideas and pushing the boundaries of knowledge. Academic institutions collaborate with industry partners through joint projects and other programmes that foster collaboration, build resilience, and strengthen capacity. They offer educational programs in AI, including undergraduate and graduate courses, specialised AI degrees, and research opportunities for students. Academia can additionally play an important role in raising awareness about the ethical concerns, privacy, and societal impact of AI.
• Government: Governments act as enablers and regulators in the AI ecosystem. They provide a conducive environment for AI development by establishing guidelines and funding mechanisms. Governments can invest in AI research and development, offer incentives to AI startups and companies, and support infrastructure development. They also have a role in shaping ethical and privacy guidelines and requirements for AI applications.
• Industry-led Internships: Industry-led internships can provide valuable opportunities for students and researchers to gain practical experience, work on real-world AI projects, and learn from industry experts. These internships offer hands-on training, exposure to industry practices and challenges, and potential career opportunities. Industry partners can collaborate with academic institutions to design internship programs, provide mentors and supervisors, and offer financial support to interns. Such collaborations strengthen the bonds between academia and industry, bridging the gap between theoretical knowledge and practical implementation.
● National AI Strategy: In 2018, the Government of India released a national strategy on artificial intelligence. The
strategy aims to position India as a global leader in AI and leverage AI technologies for social and economic
benefits. It focuses on five key areas: healthcare, agriculture, education, smart cities and infrastructure, and smart mobility and transportation. The
strategy emphasises the need for skilling and capacity building in AI.
● National AI Portal: The Ministry of Electronics and Information Technology (MeitY) launched the National AI
Portal in 2020. The portal serves as a comprehensive resource for AI-related news, policies, initiatives, and
research in India. It provides information on AI adoption, training programs, and collaborations in the country.
● AI for All: The NITI Aayog, a government policy think tank, released the discussion paper "AI for All" in 2018. The
paper highlights the potential of AI in India and discusses its implications for various sectors, including healthcare,
agriculture, education, and governance. It emphasises the need for inclusive and responsible AI development.
● Draft National Strategy on AI: In June 2018, the NITI Aayog released a draft national strategy on AI, inviting public
comments and suggestions. The strategy focuses on leveraging AI for social inclusion, economic growth, and
national development. It outlines key objectives, policy recommendations, and implementation strategies.
● AI Task Force Report: The Ministry of Commerce and Industry constituted an AI Task Force in 2017 to develop a
roadmap for AI adoption in India. The task force released a report in 2018, highlighting the potential impact of AI
on various sectors and recommending strategies to promote AI research, development, and deployment in India.
PPP (public-private partnership) Models: Public-private partnerships (PPPs) are collaborative models that bring together
the resources, expertise, and capabilities of both public and private sectors to address complex challenges and drive
innovation. In the context of artificial intelligence (AI), PPP models can be utilised to promote AI research, development,
and deployment, while ensuring alignment with societal needs and ethical considerations. Here are some common PPP
models relevant to AI:
● Research Collaborations: Public and private organisations can form partnerships to conduct collaborative
research in AI. This involves joint projects, sharing of research facilities, and pooling of expertise and resources.
These collaborations can focus on developing new AI algorithms, models, or technologies that have potential
applications across various domains. This can also be used to make the research practically relevant, by focusing
on real problems with real data.
● Innovation Clusters and Centers: Governments, academic institutions, and industry can collaborate to establish
AI-focused innovation clusters or centres. These initiatives bring together researchers, entrepreneurs, and
industry experts to foster collaboration, knowledge exchange, and commercialization of AI technologies. These
clusters can provide shared infrastructure, funding support, and networking opportunities for AI startups and
companies.
● Data Sharing Initiatives: Public-private partnerships can facilitate the sharing of data between government
entities, private organisations, and research institutions. Quality data in good measure is the most critical
requirement for quality work in AI. This collaboration allows access to diverse datasets, which are crucial for
training and improving AI models. Data sharing initiatives should uphold privacy and security protocols while
promoting responsible and ethical use of data. Open Data Sets are critical to conduct research or build an
application to solve real world problems in multiple domains. Successful models of PPP data sharing enablement
include AWS data exchange programs making available free datasets on Environment, Social & Corporate
Governance (ESG), public sector, weather, air quality, satellite imagery and many more third-party datasets - that
can otherwise be hard for researchers to access or analyse.
● Standards Development: PPPs can play a role in setting industry standards for AI technologies. These
partnerships involve collaborations between industry, government agencies, and standardisation bodies to
establish guidelines, protocols, and benchmarks for AI development and deployment. This improves
interoperability, fairness, transparency, and accountability in AI systems.
● Skills Development and Training Programs: PPPs can support initiatives aimed at developing a skilled AI
workforce. This involves collaborative efforts to design and implement training programs, workshops, and
educational courses that equip individuals with AI-related skills. By combining the expertise of academia,
industry, and government, these programs can address the evolving demands of the AI job market.
● Policy and Regulatory Frameworks: Public-private partnerships can contribute to the development of AI policies
and regulations, which is emerging as a serious concern across the globe. Governments can collaborate with
industry stakeholders, research institutions, and civil society organisations to establish frameworks that address
ethical, legal, and social implications of AI. These partnerships ensure that AI technologies are deployed
responsibly, protect privacy rights, and mitigate potential biases or discriminatory outcomes.
Qualification Packs (QPs) and National Occupational Standards (NOSs) are frameworks that outline the skills, knowledge, and competencies required for specific job roles in a particular industry. Some in-demand job roles relevant to AI include:
• Artificial Intelligence Engineer: A QP or NOS for an AI engineer might cover areas such as machine learning algorithms, deep learning, natural language processing, computer vision, data pre-processing, model evaluation, and deployment of AI solutions. It could also include skills in programming languages like Python, knowledge of AI frameworks like TensorFlow or PyTorch, and expertise in working with large datasets (a minimal illustration follows this list).
• Data Scientist: A QP or NOS for a data scientist in the AI field could encompass skills in data analysis, statistical
modelling, data visualisation, feature engineering, predictive modelling, and machine learning techniques. It may also
include proficiency in programming languages such as Python or R, knowledge of database querying languages like
SQL, and experience with tools like Jupyter Notebook or Tableau.
• AI Ethics and Policy Specialist: With the increasing importance of ethical considerations in AI, a QP or NOS for an AI
ethics and policy specialist might cover topics like responsible AI development, fairness, transparency, privacy, and
bias mitigation in AI systems. It could involve understanding legal and regulatory frameworks related to AI, evaluating
ethical implications of AI algorithms, and developing policies and guidelines for AI governance.
• AI Project Manager: A QP or NOS for an AI project manager might focus on skills related to project management
methodologies, resource allocation, risk assessment, stakeholder management, and budgeting specific to AI projects.
It could also include knowledge of AI technologies, familiarity with agile development practices, and the ability to
oversee the implementation of AI solutions within a project context.
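To ground the competencies listed above, the following is a minimal, hedged PyTorch sketch of the kind of workflow an AI engineer QP envisages: defining a small classifier, training it, and evaluating it. The synthetic data, architecture, and hyperparameters are illustrative assumptions, not part of any QP or NOS.

```python
# Minimal, illustrative PyTorch workflow: define, train, evaluate.
# Synthetic data stands in for a real, pre-processed dataset.
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(1000, 20)              # 1000 samples, 20 features
y = (X[:, 0] + X[:, 1] > 0).long()     # synthetic binary labels

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):               # simple full-batch training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():                  # model evaluation
    accuracy = (model(X).argmax(dim=1) == y).float().mean()
print(f"final loss {loss.item():.3f}, accuracy {accuracy:.2%}")
```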
The specific QPs and NOSs available for AI-related roles can vary by country, region, and industry. Some of the QP & NOSs
for Indian Industries are at Annexure-I.
As the nature of work changes with automation, millions of people may need to switch their profiles and acquire new
skills. This will impact the structure of India’s existing labour force. While research has established that AI has the potential
to create net new jobs, a lack of relevant skills might mean that a majority of the displaced workforce may remain
unemployed. Countries such as Australia and the US have launched programs that actively seek to encourage the
migration of skilled professionals in science, technology, engineering and mathematics (STEM). Specific profiles that are
likely to be replaced include data entry clerks, cashiers, financial analysts, telemarketers, customer-service executives, etc. Focused collaborations at the sector level, engaging students with corporates, can help build directly
adaptable skills for the industry.
Currently, about 23 institutes in India offer B.Tech programs in AI. There remains tremendous potential to offer more widespread training in AI technology. Moreover, AI needs to be understood as socially embedded, interacting with and affecting individuals and communities in myriad ways. It is therefore suggested that AI training go beyond technology curricula to include the social sciences, so that practitioners can contribute both to the process of constructing algorithms and to conducting algorithmic impact assessments.
It is estimated that about 30% of the existing workforce would require re-skilling/up-skilling to stay relevant, and about 50-60% of the workforce would require re-skilling/up-skilling on a continual basis in emerging technologies, so as to retain the edge that India has in the IT sector amid growing automation and the emergence of disruptive as well as new technologies. It would be challenging to create and maintain physical training infrastructure to match emerging technical requirements, given the high rate of technological obsolescence and the faster peaking of newer technologies appearing on the horizon, practically within about 6-12 months. Also, since working professionals may find it difficult to attend regular classroom-based training programmes, offering them a choice of online learning platforms with multiple skilling options would encourage any-time, any-where, self-paced learning for acquiring newer, industry-relevant skill sets. To cater to this huge demand and the requirements of the industry at large, a technology-enabled 'Re-skilling/Up-skilling Framework' for employees of the IT sector has been evolved in active consultation with NASSCOM, which is also the IT-ITeS Sector Skills Council. Through this initiative (titled FutureSkills PRIME), industry and the Government are aiming to re-skill/up-skill a total of about 1.4 million employees over a period of five years (1 million through company-supported skilling (B2B) and another 0.4 million through the Government-supported/facilitated scheme (B2C)).
A sample GenAI graduate-level course provides a 16-week curriculum covering a wide range of topics related to generative AI models across different modalities, including text, image, audio, VR/AR, and robotics. More details are at Annexure-II.
Modes of education in AI can vary depending on the candidate profile, the institution, and the specific program or course being offered. Common modes of education in AI include:
• Bootcamps and Intensive Programs: AI bootcamps and intensive programs offer focused, immersive learning
experiences. These programs are typically short-term and intensive, providing practical hands-on training in AI
skills and technologies. They often incorporate real-world projects and industry collaborations.
• Corporate Training and In-house Programs: Many companies offer AI education and training programs for their
employees. These programs may be conducted in-house or outsourced to training providers. They aim to upskill
employees in AI technologies relevant to their work domains.
• Research and Academic Institutions: Universities, research institutions, and academic centres play a significant
role in AI education. They offer undergraduate and postgraduate degree programs, specialised research
opportunities, and access to AI laboratories and resources.
• AI Competitions and Hackathons: Competitions and hackathons provide a hands-on learning experience, where
participants work on AI challenges and projects within a limited timeframe. These events foster collaboration,
problem-solving, and practical application of AI techniques.
Depending on factors such as learner profile, nature and duration of the topic, and access to quality teachers and programming environments, one should choose a mix of these modes for the various requirements. For example, short-term knowledge-upgradation programmes are attractive in online form, as this enables access to good experts and minimises logistics for participants. On the other hand, long-term degree programmes are best delivered on campus.
TRAINING OF TRAINERS
Training of Trainers (ToT) in Artificial Intelligence plays a vital role in building a teaching community that can effectively educate the next generation of AI professionals. The quality and quantity of teachers in the engineering field have often been a concern in India. The ToT ecosystem in AI can encompass the following:
• Subject Matter Expertise: Trainers ought to possess deep knowledge and expertise in AI concepts, algorithms, and applications. They should be well-versed in the various subfields of AI, including machine learning, computer vision, natural language processing, robotics, etc., as applicable.
• Pedagogical Skills: For interactive and engaging training sessions, a trainer needs strong pedagogical skills. To meet the needs of diverse learners, trainers must recognise different learning styles, select effective teaching methods, and adapt content accordingly.
• Curriculum Development: Training programs should be designed with a focus on comprehensive and up-to-date AI curricula. This includes defining learning objectives, preparing effective lesson plans, selecting suitable knowledge resources, and incorporating practical hands-on exercises and projects.
• Hands-on Experience: Trainers should have practical experience in AI, including implementing AI algorithms, working with AI tools and frameworks, and applying AI techniques to real-world problems. Hands-on experience enables trainers to offer practical insights to learners.
• Continuous Professional Development: Training programs must emphasise continuous professional development for trainers. This includes staying updated with recent developments in AI, taking part in projects, attending conferences and workshops, and engaging in collaborative learning with other AI educators.
• Teaching Methodologies and Tools: Trainers need to be acquainted with the various teaching methodologies and tools that enhance the learning experience. This includes using visualisations, interactive demos, simulations, and educational technologies specifically designed for teaching AI concepts.
• Ethical Considerations: AI trainers need to address ethical implications, privacy concerns, and responsible AI practices in their training programs.
• Collaboration and Networking: Building a vibrant teaching community entails fostering collaboration and networking amongst AI trainers. This can be achieved through regular meetings, workshops, online forums, and sharing of best practices.
• Evaluation and Assessment: Training programs must include mechanisms for evaluating performance. This may involve peer reviews, learner feedback, and ongoing assessments to monitor trainers' effectiveness and to offer targeted support and professional development opportunities.
• Industry Engagement: Trainers need to establish connections with industry experts and organisations to stay updated on industry requirements and emerging trends. Industry collaborations can provide trainers with real-world insights, guest lectures, internship opportunities, and industry-relevant projects for learners.

Chapter 4: Model Curriculum Framework
Despite a large number of engineering and science colleges starting streams in AI-related topics, and thereby addressing the human-resource needs of much of the industry, India's profile in world-class research is limited. Barring the tier-1 institutions and a few select institutions (e.g. Amrita University, VIT, etc.), most are focused on teaching only. This has two implications which require attention in the context of this working group.
Firstly, the sole focus on teaching impacts the level of maturity, experience and expertise that the faculty has in fast-
changing areas like AI. Depending solely on continuous faculty training and upgrade is a costly way to address this problem.
Building a research ambience in the college can help improve this situation, by encouraging connection with the state-of-
the-art work in this field.
Secondly, India needs to stay abreast of the technological developments in this fast-evolving world, to sustain its efforts.
More importantly, we need to be in a leading position in research in this strategically important area. Compared to other economies like China, we have a long way to go in improving our publication and patent counts in this area.
Lack of adequate research culture is also impacting the level and quality of PhD work in many areas. Hence, just starting
more PhD programs will not solve this problem.
Building a publicly visible review process, publicly accessible research repository, high quality datasets, ontologies, and
corpora with easy availability, and recognition of high-quality work being done irrespective of the institutions will
contribute positively to this cause.
It may be noted that while the current state of the art in AI promises effective solutions to a number of important problems – ranging from medical diagnosis to engineering – many challenges, such as explainability, security, and privacy, remain open for big advances. The non-statistical aspects of AI also require attention from sharp research minds.
Therefore, attention to build a broad-based research culture in this area needs to be a key element of our skill
development under IndiaAI.
As mentioned earlier, a number of engineering colleges have started AI-based degree courses, along the lines of BE(IT) and BE(CSE). These are designed with inputs from industry and academia, through their Boards of Studies. A sample curriculum for data science and AI/ML based computer science degrees, used by an engineering college with which one of the committee members is associated, is placed in Annexure-III. This is typical of the special engineering streams related to AI introduced by many colleges. Streams are available today under names such as Artificial Intelligence, Data Science, Business Data Analytics, Data Analytics, Big Data Analytics, etc. Many of them focus on the applicative aspects of AI, which is quite relevant for our purposes, particularly for startups, and their syllabi often cover the latest subjects and topics.
One challenge that many of these organisations face is in the area of teaching resources – curated quality datasets, software systems, computing facilities, etc. Other committees under IndiaAI are already looking into datasets and compute provisioning, and we hope these will address the gap. An equally strong – perhaps stronger – concern is the availability of quality faculty with relevant experience. To ensure quality human resources coming out of these degree programs, ensuring quality faculty is critical. IndiaAI could help address this gap through recorded course material, teacher guidance/mentoring mechanisms, industry internships for teachers, etc.
Promoting industry interventions and academia-industry partnerships would be highly beneficial at the graduation stage. For example, AWS Academy has enabled over 32,000 students across higher education institutions in India with a ready-to-teach cloud computing curriculum – preparing students to pursue industry-recognized certifications and in-demand cloud jobs, enabling learners to select and apply machine learning models and services to solve business problems, and following up with virtual internship opportunities – and a similar model can be adopted for an AI curriculum.2
There are a few recent initiatives to map or create AI curriculum frameworks for grades K-12. These include 1) AI Literacy: Competencies and Design Considerations, 2) the AI4K12 K-12 AI Guidelines (https://ai4k12.org/), and 3) the Machine Learning Education Framework discussed in K-12 AI Curricula: A Mapping of Government-Endorsed AI Curricula by UNESCO (https://unesdoc.unesco.org/ark:/48223/pf0000380602).
The Atal Innovation Mission (AIM) was initiated in 2016 by the Government of India to create a space for innovation and
an entrepreneurial ecosystem, and focuses on empowering the youth with twenty-first-century skills such as creativity,
innovation, critical thinking, design thinking, social- and cross-cultural collaboration and ethical leadership. AIM
established Atal Tinkering Labs (ATLs) across India with a vision of cultivating one million children in India as Neoteric
innovators. The Tinkering Labs’ main objectives include creating workspaces where young minds can innovate by sculpting
ideas through hands-on experience and work, and by learning in a flexible environment. ATL also includes topics from AI,
ML, etc.
In 2021, CBSE collaborated with Intel to launch an initiative called the AI Student Community (AISC). Following the
framework of the National Education Policy (NEP) 2020, the AISC focused on preparing a workforce for economic
development using AI and the development of technical skills such as data analysis and computational thinking. The AISC
organises webinars by Intel’s AI-certified coaches and experts, as well as boot camps and hackathons, for students. In
addition, the AISC provides opportunities for students to access curated audio-visual learning resources related to AI.
In 2022, Intel partnered with CBSE to launch the ‘AI for All’ initiative to create a basic understanding of AI among the
general public. AI for All is a four-hour self-paced course available in eleven Indian languages and accessible to visually
impaired learners. The course is divided into two sections: (I) AI Awareness (1.5 hours), which includes a basic
understanding of AI and of misconceptions about AI and its application, and (II) AI Appreciation (2.5 hours), which involves
understanding the key domains of AI, the impact of AI across different industries and the principles of responsible AI and
AI ethics.
Work on AI literacy based on a scoping study of existing research has sought to determine emerging themes in (a) what AI experts believe a non-technical audience should know, and (b) common perceptions and misconceptions among learners. The scoping study reveals 17 competencies and 15 design considerations, which focus not only on pedagogical and learning methods but also on social and interpersonal elements. Overall, they emphasise experiential learning and relevant material, an appreciation for cognitive demands and child development theory, and the positioning of AI within learner contexts. The specific design considerations that the researchers present include:
2 https://d1.awsstatic.com/training-and-certification/ramp-up_guides/Ramp Up_Guide_Machine_Learning.pdf
• Explainability: Include graphical visualisations, simulations, explanations of agents’ decision-making processes,
or interactive demonstrations in order to aid learners’ understanding of AI.
• Embodied interactions: Design interventions in which individuals can act as or follow the agent, as a way of
making sense of the agent’s reasoning process. This may involve embodied simulations of algorithms and/or
hands-on physical experimentation with AI technology.
• Contextualising data: Encourage learners to investigate who created the dataset, how the data was collected,
and what the limitations of the dataset are. This may involve choosing datasets that are relevant to learners’
lives, are low-dimensional and are ‘messy’ (i.e. not cleaned or neatly categorizable).
• Promote transparency: Promote transparency in all aspects of AI design (i.e. eliminating black-box functionality,
sharing creator intentions and funding/ data sources, etc.).
• Unveil gradually: To prevent cognitive overload, give users the option to inspect and learn about different system
components; explain only a few components at a time; or introduce scaffolding that fades as the user learns
more about the system’s operations.
• Opportunities to program: Provide ways for individuals to program and/or teach AI agents. Keep coding prerequisites to a minimum by focusing on visual/auditory elements and/or incorporating strategies like Parsons problems and fill-in-the-blank code (a sample exercise of this kind is sketched after this list).
• Milestones: Consider how perceptions of AI are affected by developmental milestones (e.g. theory of mind
development), age, and prior experience with technology.
• Critical Thinking: Encourage learners to be critical consumers of AI technologies by questioning the intelligence
and trustworthiness of AI applications.
• Identities, values and backgrounds: Consider how learners’ identities, values, and backgrounds affect their
interest in and preconceptions of AI. Learning interventions that incorporate personal identity or cultural values
may encourage their interest and motivation.
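To make the 'opportunities to program' consideration concrete, the following is a minimal, hypothetical fill-in-the-blank exercise in Python. The fruit data, the blank markers, and the nearest-neighbour approach are all invented for illustration; any low-prerequisite activity, including ones in visual programming languages, would serve the same purpose.

```python
# A hypothetical fill-in-the-blank exercise: learners "teach" the computer to
# label fruit by example, then let it guess a new fruit. Only the two lines
# marked "FILL IN" are left blank in the exercise; they are completed here.

examples = [
    # (weight in grams, colour score 0 = green .. 10 = red, label)
    (150, 8, "apple"),
    (120, 2, "guava"),
    (160, 9, "apple"),
    (110, 3, "guava"),
]

def distance(fruit_a, fruit_b):
    # FILL IN: compare two fruits by weight and colour (answer shown).
    return abs(fruit_a[0] - fruit_b[0]) + abs(fruit_a[1] - fruit_b[1])

def guess(new_fruit):
    # FILL IN: find the closest example and reuse its label (answer shown).
    closest = min(examples, key=lambda ex: distance(ex, new_fruit))
    return closest[2]

print(guess((155, 7)))  # expected: apple
print(guess((115, 2)))  # expected: guava
```

An exercise like this keeps syntax out of the way while still letting learners experience teaching a system by example.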
The Machine Learning Education Framework provides a structured approach to learning and acquiring skills in machine
learning (ML). It outlines the key components and stages involved in building a solid foundation in ML education. The
Machine Learning Education Framework serves as a guideline for designing comprehensive ML education programs.
However, it is important to tailor the curriculum to specific learning goals, target audience, and available resources. The
framework emphasises a holistic understanding of ML concepts, practical hands-on experience, and awareness of ethical
considerations in the field. While there isn't a specific universally accepted framework, the following elements are
commonly considered when designing ML education programs:
• Fundamentals of ML: This stage covers the foundational concepts and principles of ML, including supervised
learning, unsupervised learning, reinforcement learning, feature engineering, model evaluation, and bias-
variance trade-off. It provides an understanding of the core algorithms, mathematical concepts, and statistical
techniques used in ML.
• Programming and Tools: Proficiency in programming languages commonly used in ML, such as Python, R, or Julia,
is essential. This stage focuses on developing programming skills, familiarity with ML libraries and frameworks
(e.g., TensorFlow, PyTorch, scikit-learn), and data manipulation and pre-processing techniques.
• Data Collection and Preparation: ML heavily relies on high-quality data. This stage covers data collection
methodologies, data cleaning, data pre-processing techniques (e.g., feature scaling, handling missing values,
outlier detection), and data visualisation to gain insights into the dataset.
• Model Development and Training: This stage delves into the process of designing ML models, selecting
appropriate algorithms, model architecture, hyperparameter tuning, and optimization techniques. It also covers
strategies for model training, cross-validation, regularisation, and ensemble methods.
• Evaluation and Validation: Understanding how to evaluate ML models is crucial. This stage focuses on performance metrics (e.g., accuracy, precision, recall, F1-score), model validation techniques (e.g., train-test split, k-fold cross-validation), and addressing overfitting or underfitting issues (a minimal illustration follows this list).
• Deployment and Productionisation: This stage emphasises the practical aspects of deploying ML models into production environments. It covers considerations like scalability, latency, model serving, versioning, monitoring, and maintaining model performance over time.
• Ethical and Responsible AI: ML education should include discussions on ethical considerations, bias detection and
mitigation, fairness, transparency, and accountability in ML models. Understanding the social implications and
responsible use of ML is crucial.
• Continuous Learning and Exploration: ML is a rapidly evolving field, and staying updated is essential. Encouraging
a mindset of continuous learning, exploration of new algorithms, research papers, attending conferences, and
engaging in ML communities helps individuals stay current and adapt to new advancements.
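As a minimal sketch of the evaluation and validation stage above, the following snippet assumes scikit-learn is installed and uses its bundled Iris dataset purely for illustration; the model and split ratio are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# A small, well-known dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Hold-out validation: train on 70% of the data, test on the remaining 30%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Accuracy, precision, recall, and F1-score on the held-out test set.
print(classification_report(y_test, model.predict(X_test)))

# 5-fold cross-validation gives a more robust estimate of generalisation.
scores = cross_val_score(model, X, y, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Note the `stratify=y` argument, which preserves class proportions in both splits; comparing the hold-out report with the cross-validation score is a simple way to surface overfitting.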
RESPONSIBLE AI / ETHICAL AI
AI has a wide range of applications and many demonstrable benefits. For instance, AI provided important insights and
issued alerts early in the COVID-19 pandemic. However, the use of AI also raises a number of ethical considerations. Bias
can be introduced into AI through the datasets used and the choices of developers, leading to discrimination. Due to
elements such as the hidden layers of some types of AI, the processes and factors in AI decision-making cannot be seen,
checked or redressed by humans, raising issues in terms of explainability and transparency. Other challenges include
balancing the use of personal data with the individual right to privacy; the security of data and potential exposure to
cyber-crime; and the reinforcement of prior beliefs by AI algorithms based on user interest, which can limit people’s
exposure to ideas and information and, some argue, infringe on an individual’s right to freedom of expression (UNDESA
et al., 2021). The First Draft of the Recommendation on the Ethics of Artificial Intelligence (UNESCO, 2020) highlights some
of the key ethical challenges of AI, and sets out ten principles for ethical AI:
● Proportionality and do no harm suggest that AI should have legitimate objectives and aims that are appropriate to the context and based on rigorous scientific foundations.
● Safety and security suggest that AI should not cause damage and must protect against security risks throughout
its life cycle.
● Fairness and non-discrimination suggest that AI systems should avoid bias, and that access to AI and its benefits should be shared at national, local and international levels, and be equally distributed without preference for 'sex; gender; language; religion; political or other opinion; national, ethnic, indigenous or social origin; sexual orientation; gender identity; property; birth; disability; age; or other status' (a minimal per-group metric check is sketched after this list).
● Sustainability suggests that the social, cultural, economic and environmental impact of AI technologies should be
continuously assessed in the context of shifting goals.
● Privacy suggests that data for AI is collected, used, shared, archived and deleted in ways that protect the
individual agency of data subjects, and that ‘legitimate aims’ and a ‘valid legal basis’ are in place for processing
personal data.
● Human oversight and determination suggest that humans or other legal entities bear responsibility for AI ethically
and in law.
● Transparency and explainability suggests that people should be aware of when decisions are based on AI
algorithms, and that individuals and social entities should be able to request and receive explanations for those
decisions, including insights into factors and decision trends. Explainability is detailed further: ‘outcomes, and
the sub-processes leading to outcomes, should be understandable and traceable, appropriate to the use context’.
● Responsibility and accountability reinforce the principle of human oversight and determination, and suggest that impact assessment, monitoring, and due diligence mechanisms should be in place to ensure accountability for AI systems. Auditability must be ensured.
● Awareness and literacy refer to the responsibilities of governments as well as the public sector, academia and
civil society to promote open and accessible education and other initiatives focused on the intersections of AI
and human rights, in order to ensure that ‘all members of society can take informed decisions about their use of
AI systems and be protected from undue influence’.
● Multi-stakeholder and adaptive governance and collaboration suggests that states should regulate data
generated within and passing through their territories; that stakeholders from a broad range of civil
organisations, and the public and private sector should be engaged throughout the AI life cycle; and that
measures need to be adopted to allow for meaningful intervention by marginalised groups, communities and
individuals.
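As one illustration of how the fairness and non-discrimination principle can be checked quantitatively, the sketch below computes per-group selection rates and accuracy on synthetic data, along with the demographic parity difference. The group attribute, scores, and decision threshold are fabricated for demonstration; a real audit would use real data and domain-appropriate fairness metrics.

```python
import numpy as np

# Synthetic audit: compare a toy model's decisions across two groups.
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)     # hypothetical protected attribute (0 or 1)
y_true = rng.integers(0, 2, n)    # ground-truth outcomes
# A toy "model" whose scores are slightly shifted for group 1, creating bias.
scores = rng.random(n) + 0.4 * y_true + 0.1 * group
y_pred = (scores > 0.8).astype(int)

for g in (0, 1):
    mask = group == g
    print(f"group {g}: selection rate={y_pred[mask].mean():.2f}, "
          f"accuracy={(y_pred[mask] == y_true[mask]).mean():.2f}")

# Demographic parity difference: gap between the groups' selection rates.
dpd = abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())
print(f"demographic parity difference: {dpd:.2f}")
```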
AI FOR ALL
● India’s National Strategy for Artificial Intelligence, by NITI Aayog, is built on the philosophy of ‘AI For All’. The
approach is to leverage exponential technologies toward inclusive growth, develop population-scale AI solutions
for societal needs and leverage India's unique position to be a global leader for inclusive technologies.
● The National Education Policy 2020 acknowledges the importance of Artificial Intelligence (AI) and emphasises preparing students for an AI-driven economy. The policy focuses on making education more experiential and future-oriented, building crucial skills such as data analysis and computational thinking from a young age through the teaching of contemporary subjects such as Artificial Intelligence.
● An AI driven economy requires rigorous programs and interventions to address current and future AI skill gaps
and empower a larger non-technical audience with AI social skills and tech skills for real world applications.
● Hon’ble Prime Minister of India, Shri Narendra Modi, launched ‘AI For All’, a program for public awareness, on Thursday, July 29, 2021. This program is driven by the Central Board of Secondary Education, Ministry of Education, Government of India, in collaboration with Intel India. ‘AI For All’ is a 4-hour, self-paced, micro-learning program aimed at helping demystify AI for the general public. The program is divided into two sections, AI Aware and AI Appreciate, which together can be completed in about four hours. AI Aware introduces the participant to AI and helps them realise that AI mimics human intelligence. More at https://ai-for-all.in/#/home.
According to a recent McKinsey survey, 56% of organisations are using AI in at least one business function. To integrate AI, there is a need to identify how AI can serve the business, including the possible use cases. Some common use cases cover marketing, sales, customer service, security, data, technology, and other processes; details of these are given in Annexure-IV.
AI-BASED PROFILES
The future career prospects in AI are highly promising and diverse. As AI continues to advance and permeate various
industries, the demand for professionals with AI skills is growing rapidly. As in the case of software development, which involves different languages, software engineering, quality assurance, etc., work in AI also requires different profiles.
Depending on the competences acquired, the academic path followed and one’s own interest, suitable profiles can be
adopted. Here are some relevant profiles in AI:
• AI Researcher/Scientist: This role involves pushing the boundaries of AI by conducting cutting-edge research,
developing new algorithms, and creating innovative AI models.
• Machine Learning Engineer: ML engineers develop and deploy machine learning models, optimise algorithms,
and work on data pipelines to ensure efficient and effective machine learning systems.
• Data Scientist: Data scientists analyse and interpret complex data sets, apply statistical and machine learning
techniques to derive insights, and develop predictive models for business applications.
• AI Architect: AI architects design and develop scalable and efficient AI systems, including selecting appropriate
algorithms, defining the system architecture, and ensuring seamless integration with existing infrastructure.
• Natural Language Processing (NLP) Engineer: NLP engineers focus on developing AI systems that can understand,
interpret, and generate human language. They work on tasks like sentiment analysis, language translation,
chatbots, and text generation.
• Computer Vision Engineer: These professionals work on developing AI systems capable of analysing and
understanding visual information, such as image recognition, object detection, and video analysis.
• Speech Technologist: These professionals develop AI systems for speech recognition and speech synthesis, enabling applications such as voice assistants, transcription, and spoken-language translation.
• AI Product Manager: AI product managers oversee the development and implementation of AI solutions. They
collaborate with cross-functional teams, define product strategy, and ensure that AI applications align with user
needs and business goals.
• AI Ethicist: With the growing ethical concerns surrounding AI, AI ethicists help organisations navigate the ethical
implications of AI technologies, ensuring responsible and unbiased development and deployment.
• AI Consultant: AI consultants provide guidance to organisations on implementing AI strategies, identifying AI
opportunities, and leveraging AI technologies to optimise business processes and decision-making.
• AI Entrepreneur: Some individuals choose to start their own AI-focused ventures, developing innovative AI
solutions, products, or services to address specific market needs.
AI TALENT FRAMEWORK
In order to formulate the curricula for different courses and the depth of knowledge in various subjects, we categorise
various courses/ programmes under the following key focus areas:
● Application Specific (Sectoral, Domain, etc.) – hybrid professionals who are specialists in their domain and AI
● Awareness (Sectoral Opportunities, Threats, etc.)
The design of curricula for specific courses should consider these categories and choose topics based on the candidate's prior background, the job profiles intended as outcomes, and the competence available to the institution. However, core topics should not be dropped due to resource and other constraints; efforts should be made to upgrade institutional mechanisms so that this can be achieved.
[Chart: AI course/programme categories (including Quantum AI) mapped to NSQF levels as per the existing NOS/QPs]
Notes:
● The above chart indicates NSQF levels as per the existing NOS/QPs. However, considering that AI skilling is to
start from school levels, there is a need to enable lower NSQF levels. Also, for non-STEM graduates, lower levels
might be more appropriate. Those pursuing a specialization other than AI would also need AI skills at appropriate
levels.
● The National Credit Framework creates an equivalence between skilling and academic credentials towards
earning credits. With a design that is flexible, an always-updated list of qualifications and NOSs can be made
available on IndiaAI to enable institutions, organizations and training providers to align to national standards.
The framework should be flexible enough to incorporate industry courses and those recognized as relevant by
industry.
An AI curriculum can vary depending on specific learning goals, backgrounds, and available resources. What a forum like IndiaAI can reasonably do is provide specific guidelines and bring out the relevant topics that are appropriate to cover. Note that this needs to be updated periodically, to cater to changes taking place in the field. With that caveat, a comprehensive AI curriculum typically covers the following key areas:
• Fundamentals of AI: This includes understanding the basic concepts, principles, and techniques of artificial
intelligence, machine learning, and deep learning.
• Mathematics and Statistics: A strong foundation in mathematics and statistics is crucial for understanding the
algorithms and models used in AI. Topics such as linear algebra, calculus, probability, and statistics are important.
• Machine Learning: Supervised and unsupervised learning techniques, regression, classification, clustering,
dimensionality reduction, and evaluation methods, including popular algorithms like decision trees, support
vector machines, and neural networks.
• Deep Learning: Studying neural networks in depth, including architectures like convolutional neural networks
(CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs), including learning about
frameworks such as TensorFlow, Keras and PyTorch.
• Natural Language Processing (NLP): Techniques for understanding and processing human language, including
text classification, sentiment analysis, named entity recognition, and machine translation. This is important,
considering that a large proportion of data one encounters in many domains is in textual form.
• Computer Vision: Understanding image and video processing techniques, object detection, image recognition,
and semantic segmentation, including learning about popular libraries like OpenCV.
• Reinforcement Learning: Delving into the area of training agents to make sequential decisions through rewards and punishments, including the study of algorithms such as Q-learning and policy gradients (a minimal Q-learning sketch follows this list).
• AI Ethics and Responsible AI: Ethical implications of AI technologies, bias in data, and the responsible
development and deployment of AI systems.
• Practical Projects: Hands-on projects for applying the knowledge gained, including working on real-world AI
problems and building new models and applications.
• Continual Learning: AI is a rapidly evolving field, so staying up-to-date with the latest research papers,
conferences, and developments is also important.
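As a minimal illustration of the reinforcement learning topic above, the self-contained sketch below runs tabular Q-learning on an invented toy 'corridor' environment; the states, rewards, and hyperparameters are assumptions chosen only for demonstration.

```python
import random

# Tabular Q-learning on a toy 1-D corridor: states 0..4, actions left/right,
# reward 1 for reaching state 4. Purely illustrative.
N_STATES, ACTIONS = 5, (-1, +1)           # -1 = step left, +1 = step right
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(500):
    state = 0
    while state != N_STATES - 1:          # episode ends at the goal state
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # The Q-learning update rule.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# The learned policy should prefer moving right (+1) in every state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```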
The recommendations of the Committee with respect to the competencies/curriculum at various levels, viz., School, Vocational, Diploma, Minor Degree, B.Tech, M.Tech and Ph.D., including training for lawmakers and Government officials and mass awareness, are as under:
DIPLOMA (NON-ENGINEERING)
PG LEVEL & RESEARCH (M.TECH/MS/PHD)
LAWMAKERS / GOVT. OFFICERS TRAINING
CURRICULUM REPOSITORY
In fields like this, the curriculum needs rapid upgradation. For reference, sample curricula currently in use are shared in this report.
● A central curriculum repository could be part of the IndiaAI portal, where institutions can share their curricula.
● Any valid user can log in and share comments/suggestions.
● A curriculum committee may be formed with experts from academia, industry, etc., which may post recommendations on changes in curricula.
LEARNING SPECTRUM - ENHANCEMENTS
● Application-oriented skilling for India-specific problem solving to be given priority at the higher education level.
● Higher education students to be encouraged to carry out projects in AI start-ups to create a win-win ecosystem
for Indian start-ups and students.
● Organizations like CDAC, NIELIT can conduct short term courses for reskilling those who want to move into AI,
and also to contribute to faculty upgradation.
● As mentioned earlier, quality of faculty involved in AI related subjects is a significant concern. Unless this is
brought under control, large scale development of quality human resources could be a challenge. The following
are some recommendations towards improving and sustaining this. We need to ensure that the training received
by potential faculty is authentic and coupled with a good amount of industry exposure.
● Providing a curated list of certified /recognized training programs as part of IndiaAI is recommended. Such
certifications should have a time validity, to ensure these institutions update their curriculum, pedagogy, tools
and faculty. It is recommended that faculty teaching AI related courses recalibrate and upgrade themselves every
2 years. It will be the responsibility of the parent institution to do this. Accrediting agencies can be recommended
to consider this suitably for AI related accreditations.
● A mechanism to provide industry internships for faculty can be formulated, through a faculty-internship-portal.
A 2-month internship once every 2-3 years can help build synergy between industry and academia. Depending
on the work carried out by the faculty in such internship, joint IPR and publications can be targeted. This activity
could also be brought under CSR permissible schemes.
RESEARCH IN AI
● Research fellowships: about 1,000 annually; alternatively, about 100 (tier-2/3/4) institutions could receive grants to build research capability.
● All AI-related theses at Masters and PhD level to be made available through the IndiaAI portal (motivation, quality, feedback, etc.).
● All AI scholars to be encouraged to build good datasets and share them with the community.
● The IndiaAI portal and initiative can become an India-specific AI community, along the lines of Hugging Face, to address various national challenges, data sharing and associated constraints, effective infrastructure utilisation mechanisms, research result sharing, etc.
WAY FORWARD
Artificial intelligence has now become an integral part of our everyday lives and a variety of jobs. Recognising this growth and projected impact, AI education is being integrated into all offerings, such as engineering, business, sciences, humanities, and even the arts. As AI permeates all aspects of modern life, it will be incorporated into all academic disciplines and will become an essential component of all education.
The recommendations in this report are intended to better prepare the Indian workforce in general, and students in particular, for opportunities in AI and related fields. Since AI differs from traditional engineering in many ways, we have included lawmakers, bureaucrats and common citizens in the scope of AI awareness, to ensure a balanced vision of this field. By adopting and implementing these strategies, it is expected that all stakeholders would benefit through the proposed “Framework” that categorises courses/programmes under key focus areas, such as application-specific skilling and awareness.
In this regard, it is envisioned that the “Model Curriculum & Repository” provided in this report covering the fundamentals
of AI, mathematics and statistics, machine learning, deep learning, NLP, computer vision, reinforcement learning, AI ethics,
practical projects, and continual learning aspects, would be of immense value to all stakeholders, including the MoE; UGC;
AICTE; MoSDE; CBSE/ State Education Boards; Sector Skill Councils (SSCs); Startups/ MSMEs/ Industries, etc.
Through synergy of capabilities and capacities at HEIs (Higher Education Institution); TVETs (technical and vocational
education and training); TAFEs (technical and further education), EdTech Institutions & Schools, including the network of
ITIs and polytechnics across India, it is expected that an institutional mechanism would be established at IndiaAI to
promote a collaborative and competitive ecosystem for AI education and research in India.
Annexure – I: Qualification Packs (QPs) and National Occupational Standards (NOS) for AI & Big Data
Level 6:
● NOS 8101 – Import data as per specifications: Variety of techniques to import data into datasets or data frames.
● NOS 8102 – Preprocess data as per specifications: Variety of techniques to preprocess data, i.e. clean and transform the data.
● NOS 8115 – Create new databases: Creating new databases for internal and external clients.
● NOS 8116 – Maintain existing databases: Maintaining existing databases for internal and external clients.
● NOS 8117 – Manage database access and configuration: Managing access rules and configurations for databases for internal and external clients.
● NOS 8118 – Manage computing cluster administration: Managing access rules and configurations for computing clusters for internal and external clients.
● NOS 8120 – Develop tools, processes and mechanisms for continuous integration and delivery: Developing tools, processes and mechanisms to assist continuous delivery and integration of developed solutions.
The following Qualification Packs for AI roles are available from the IT-ITeS Sector Skills Council (SSC NASSCOM), with links to the QP document and the corresponding Model Curriculum:

● SSC/Q8111 – AI Data Steward
QP: https://nsdcindia.org/sites/default/files/SSCQ8111_AI-Data_Steward_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8111_V1.0_AI-Data%20Steward_26.10.2018.pdf

● SSC/Q8112 – AI DevOps Engineer
QP: https://nsdcindia.org/sites/default/files/SSCQ8112_AI-DevOps_Engineer_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8112_V1.0_AI-DevOps%20Engineer_26.10.2018.pdf

● SSC/Q8113 – AI Machine Learning Engineer
QP: https://nsdcindia.org/sites/default/files/SSCQ8113_AI-Machine_Learning_Engineer_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8113_AI-Machine%20Learning%20Engineer_26.10.2018.pdf

● SSC/Q8113 – AI Machine Learning Engineer v2
QP: https://nsdcindia.org/sites/default/files/SSC_Q8113_AI_Machine_Learning_Engineer_V2_16_04_2020.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8113_AI%20-%20Machine%20Learning%20Engineer_QP-V2.0_22092020.pdf

● SSC/Q8114 – AI Hardware Engineer
QP: https://nsdcindia.org/sites/default/files/SSCQ8114_AI-Hardware_Engineer_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8114_V1.0_AI-Hardware%20Engineer_26.10.2018.pdf

● SSC/Q8115 – AI Integration Engineer
QP: https://nsdcindia.org/sites/default/files/SSCQ8115_AI-Integration_Engineer_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8115_V1.0_AI-Integration%20Engineer_Model%20Curriculum_29.10.2018.pdf

● SSC/Q8116 – AI Test Engineer
QP: https://nsdcindia.org/sites/default/files/SSCQ8116_AI-Test_Engineer_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8116_AI-Test%20Engineer_26.10.18.pdf

● SSC/Q8104 – AI Data Scientist
QP: https://nsdcindia.org/sites/default/files/SSCQ8104_AI-Data_Scientist_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8104_V1.0_AI%20-%20Data%20Scientist_Elective%20%E2%80%93%20Model%20Risk%20Assessment%20%20Model%20Business%20Performance%20%20Visualizations_26.10.2018.pdf

● SSC/Q8107 – AI Data Architect
QP: https://nsdcindia.org/sites/default/files/SSCQ8107_AI-Data_Architect_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8107_V1.0_AI%20-%20Data%20Architect_Elective%20%E2%80%93%20Data%20Storage%20Networking%20%20Data%20Integrations_26.10.2018_0.pdf

● SSC/Q8107 – AI Data Architect v2
QP: https://nsdcindia.org/sites/default/files/SSC_Q8107_AI_Data_Architect_V2_16_04_2020.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8107_AI%20-%20Data%20Architect_QP-V2.0_22092020.pdf

● SSC/Q8108 – AI Solution Architect
QP: https://nsdcindia.org/sites/default/files/SSCQ8108_AI-Solution_Architect_V1_21_01_2019.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8108_V1.0_AI-Solution%20Architect_26.10.2018.pdf

● SSC/Q8108 – AI Solution Architect v2
QP: https://nsdcindia.org/sites/default/files/SSC_Q8108_AI_Solution_Architect_V2_16_04_2020.pdf
Model Curriculum: https://nsdcindia.org/sites/default/files/MC_SSCQ8108_AI%20-%20Solution%20Architect_QP-V2.0_22092020.pdf
ANNEXURE – II: A SAMPLE GENAI GRADUATE LEVEL COURSE
This annexure presents a detailed 16-week course curriculum for a graduate student to learn how to build generative AI models for various modalities, along with the prerequisites for the course. The curriculum covers a wide range of topics related to generative AI models across different modalities, including text, image, audio, VR/AR, and robotics. Students will gain a solid understanding of the underlying concepts and techniques involved in building generative models and will have the opportunity to apply this knowledge to real-world problems.
Prerequisites:
Week 1-2: Introduction to Generative Models
- Introduction to generative models, including generative adversarial networks (GANs), variational autoencoders (VAEs), and autoregressive models
- Fundamentals of probability theory, statistical inference, and optimization
- Training basic generative models for text, image, and audio data using PyTorch (a minimal GAN training sketch follows this curriculum outline)

Week 3-4: Advanced Generative Models
- Advanced generative models, including deep convolutional GANs (DCGANs), style-based GANs (StyleGANs), and BigGANs
- Techniques for stabilizing and improving GAN training, such as spectral normalization, hinge loss, and progressive growing
- Building generative models for video and VR/AR applications, including temporal GANs and video prediction models
- Building generative models for audio data, including WaveGAN and SampleRNN

Week 5-6: Natural Language Processing with Generative Models
- Introduction to natural language processing (NLP) with generative models
- Building language models, including recurrent neural networks (RNNs) and transformers
- Text generation using GPT-2 and GPT-3

Week 7-8: Image and Video Generation
- Building image generation models using variational autoencoders (VAEs)
- Image-to-image translation using cycle-consistent adversarial networks (CycleGANs)
- Applications of image and video generation, such as super-resolution, image inpainting, and video prediction

Week 9-10: Audio Generation
- Building audio generation models using WaveNet and GANs
- Music generation using LSTM and transformers
- Applications of audio generation, such as audio style transfer and voice conversion

Week 11-12: VR/AR and Robotics
- Introduction to virtual reality (VR), augmented reality (AR), and robotics with generative models
- Building generative models for 3D objects and environments
- Applications of generative models in VR/AR, such as VR/AR content creation and telepresence
- Building generative models for robotics, including robot motion planning and control

Week 15-16: Project
- Final project involving the application of generative models to a real-world problem, such as image or audio generation, text or language modeling, or VR/AR content creation
- Students will be required to present their work in a final project report and demo
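To give a flavour of the Week 1-2 topic of training basic generative models in PyTorch, here is a minimal GAN sketch that learns a one-dimensional Gaussian; the architecture, target distribution, and hyperparameters are illustrative assumptions, not part of the course material.

```python
import torch
import torch.nn as nn

# Minimal GAN learning a 1-D Gaussian N(4.0, 1.5), to show the alternating
# generator/discriminator training loop in its simplest form.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 1.5 + 4.0   # samples from the target distribution
    noise = torch.randn(64, 8)
    fake = G(noise)

    # Discriminator step: label real samples 1, generated samples 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    opt_g.zero_grad()
    loss_g = bce(D(G(noise)), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

with torch.no_grad():
    samples = G(torch.randn(1000, 8))
print(f"generated mean={samples.mean():.2f}, std={samples.std():.2f} (target: 4.00, 1.50)")
```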
A second sample 16-week course focuses on generative AI models for NLP tasks.

Prerequisites:
Week 3-4: Recurrent Neural Networks for NLP
- Introduction to recurrent neural networks (RNNs) and their applications in NLP tasks
- Long short-term memory (LSTM) and gated recurrent unit (GRU) cells
- Training RNNs for language modeling and sequence classification tasks

Week 5-6: Attention Mechanisms and Transformer Models
- Introduction to attention mechanisms and their applications in NLP tasks
- Attention-based models, including Transformer and BERT
- Training Transformer-based models for text classification, sentiment analysis, and question-answering tasks
- Applications of Transformer-based models in NLP, such as language modeling and text generation

Week 7-8: Generative Language Models
- Introduction to generative language models, including n-gram models and sequence models
- Auto-regressive models for text generation, including GPT-2 and GPT-3
- Conditional language modeling with transformers and GANs
- Applications of generative language models, such as text completion, language translation, and question-answering

- Applications of speech recognition and synthesis in NLP, such as speech translation and voice assistants

Week 15-16: Project
- Final project involving the application of generative NLP models to a real-world problem, such as language translation, chatbot design, or speech recognition
- Students will be required to present their work in a final project report and demo
This 16-week course curriculum covers a wide range of topics related to generative AI models for NLP tasks. Students will
gain a solid understanding of the underlying concepts and techniques involved in building generative NLP models and will
have the opportunity to apply this knowledge to real-world problems.
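As a quick illustration of the GPT-2 text generation topic covered in both courses, the following sketch uses the Hugging Face transformers library's text-generation pipeline (assuming the package is installed and the model weights are available); the prompt and sampling settings are arbitrary choices for demonstration.

```python
from transformers import pipeline

# Sample two continuations of a prompt from GPT-2 using nucleus sampling.
generator = pipeline("text-generation", model="gpt2")
outputs = generator(
    "Artificial intelligence education in India",
    max_new_tokens=40,       # length of the generated continuation
    num_return_sequences=2,  # sample two alternative continuations
    do_sample=True,
    top_p=0.9,               # nucleus sampling: keep the top 90% probability mass
)
for i, out in enumerate(outputs, 1):
    print(f"--- sample {i} ---")
    print(out["generated_text"])
```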
ANNEXURE – III: A SAMPLE CURRICULUM FOR DATA SCIENCE
The curriculum used in one of the engineering colleges in the area of data science is enclosed as a reference. It shows an indicative list of topics in such courses, and a neat way to categorise the courses to give the student a perspective of the various tracks involved. In most engineering colleges, semesters 1 and 2 follow a discipline-independent curriculum to allow movement across disciplines, and hence these semesters have been omitted here; they usually contain very few domain courses.
A second curriculum, for a specialisation in AI/ML, is also included. Note the number of application streams featuring in the syllabus. Different institutions may substitute specific verticals, depending on local context, access to data sources, faculty competence, market requirements, etc. Here also, colour-coded streams are used to indicate the various tracks for ease of student understanding and planning.
ANNEXURE – IV: SOME DOMAIN SPECIFIC USE CASES
OPERATIONS
• Cognitive / Intelligent Automation: Combine robotic process automation (RPA) with AI to automate complex processes involving unstructured information. Digitize your processes in weeks without replacing legacy systems, which can take years; bots can operate on legacy systems, learning from your personnel’s instructions and actions. This increases speed, precision, efficiency, and profitability ratios.
• Robotic Process Automation (RPA) Implementation: Implementing RPA solutions requires effort. Suitable
processes need to be identified. If a rules-based robot will be used, the robot needs to be programmed.
Employees’ questions need to be answered. That is why most companies get some level of external help.
Generally, outsourcing companies, consultants, and IT integrators are happy to provide temporary labor to
undertake this effort.
• Process Mining: Leverage AI algorithms to mine your processes and understand your actual processes in detail. Process mining tools can provide the fastest time-to-insight about your as-is processes, as demonstrated in case studies.
• Predictive Maintenance: Predictively maintain your robots and other machinery to minimize disruptions to operations. Implement big data analytics to estimate the factors that are likely to impact your future cash flow, and optimize PP&E spending by gaining insight into those factors (a minimal failure-prediction sketch is given after this list).
• Manufacturing Analytics: Also called industrial analytics systems, these systems allow you to analyze your
manufacturing process from production to logistics to save time, reduce cost, and increase efficiency. Keep your
industry effectiveness at optimal levels.
• Inventory & Supply Chain Optimization: Leverage machine learning to take your inventory & supply chain optimization to the next level. Explore the possible scenarios under different customer demands. Reduce your stock-keeping spending and maximize your inventory turnover ratios, increasing your impact in the value chain.
• Robotics: Factory floors are changing with programmable collaborative bots that can work next to employees to take over more repetitive tasks. Automate physical processes such as manufacturing or logistics with the help of advanced robotics. Increase connectivity by centralizing the whole manufacturing process, and lower your exposure to human error.
• Collaborative Robot: Cobots provide a flexible method of automation. Cobots are flexible robots that learn by
mimicking human workers’ behavior.
• Cashierless Checkout: Self-checkout systems have many names. They are called cashierless, cashier-free, or
automated checkout systems. They allow retail companies to serve customers in their physical stores without
the need for cashiers. Technologies that allowed users to scan and pay for their products have been used for
almost a decade now, and those systems did not require great advances in AI. However, these days we are
witnessing systems powered by advanced sensors and AI to identify purchased merchandise and charge
customers automatically.
• Invoicing: Invoicing is a highly repetitive process that many companies perform manually. This causes human
errors in invoicing and high costs in terms of time, especially when a high volume of documents needs to be
processed. Thus, companies can handle these repetitive tasks with AI, automate invoicing procedures, and save
significant time while reducing invoicing errors.
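To make the predictive maintenance use case above concrete, here is a minimal sketch that trains a classifier to flag likely machine failures. The sensor features, the synthetic failure rule, and the model choice are all illustrative assumptions; a real deployment would use historical sensor logs and maintenance records.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Generate synthetic sensor readings for 2,000 machine snapshots.
rng = np.random.default_rng(42)
n = 2000
temperature = rng.normal(70, 10, n)          # degrees Celsius
vibration = rng.normal(0.3, 0.1, n)          # g RMS
hours_since_service = rng.uniform(0, 500, n)

# Invented ground truth: failures become likelier with heat, vibration, wear.
risk = (0.02 * (temperature - 70) + 3.0 * (vibration - 0.3)
        + 0.002 * hours_since_service)
y = (risk + rng.normal(0, 0.2, n) > 0.5).astype(int)
X = np.column_stack([temperature, vibration, hours_since_service])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), target_names=["healthy", "failure"]))
```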
GENERAL SOLUTIONS
• Analytics Platform: Empower your employees with unified data and tools to run advanced analyses. Quickly
identify problems and provide meaningful insights.
• Analytics Services: Satisfy your custom analytics needs with these e2e solution providers. Vendors are there to
help you with your business objectives by providing turnkey solutions.
• Automated Machine Learning (autoML): Machines helping data scientists optimize machine learning models.
With the rise of data and analytics capabilities, automation is needed in data science. AutoML automates time
consuming machine learning tasks, enabling companies to deploy models and automate processes faster.
SPECIALIZED SOLUTIONS
• Geo-Analytics Platform: Enables analysis of granular satellite imagery for predictions. Leverage spatial data for
your business goals. Capture the changes in any landscape on the fly.
• Conversational Analytics: Use conversational interfaces to analyze your business data. Natural Language
Processing is there to help you with voice data and more. Automated analysis of reviews and suggestions.
• Real-Time Analytics: Real-time analytics for your time-sensitive decisions. Act in a timely manner and keep your KPIs intact. Use machine learning to explore unstructured data without any disruptions.
• Image Recognition and Visual Analytics: Analyze visual data with advanced image and video recognition systems.
Meaningful insights can be derived from the data piles of images and videos.
• E-Commerce Analytics: Specialized analytics systems designed to deal with the explosion of e-commerce data.
Optimize your funnel and customer traffic to maximize your profits.
MARKETING
• Marketing analytics: AI systems learn from, analyze, and measure marketing efforts. These solutions track media
activity and provide insights into PR efforts to highlight what is driving engagement, traffic, and revenue. As a
result, companies can provide better and more accurate marketing services to their customers. Besides PR
efforts, AI-powered marketing analytics can lead companies to identify their customer groups more accurately.
By discovering their loyal customers, companies can develop accurate marketing strategies and also retarget
customers who have expressed interest in products or services before.
• Personalized Marketing: The more companies understand their customers, the better they serve them. AI can
assist companies in this task and support them in giving personalized experiences for customers. As an example,
suppose you visited an online store and looked at a product but didn’t buy it. Afterward, you see that exact
product in digital ads. More than that, companies can send personalized emails or special offers and recommend
new products that go along with customers’ tastes.
• Context-Aware Marketing: You can leverage machine vision and natural language processing (NLP) to
understand the context where your ads will be served. With context-aware advertising, you can protect your
brand and increase marketing efficiency by ensuring your message fits its context, making static images on the
web come alive with your messages.
PRE-SALES / SALES
• Sales Forecasting: AI allows automatic and accurate sales forecasts based on all customer contacts and previous
sales outcomes. Automatically forecast sales accurately based on all customer contacts and previous sales
outcomes. Give your sales personnel more sales time while increasing forecast accuracy. Hewlett Packard
Enterprise indicates that it has experienced a 5x increase in forecast simplicity, speed, and accuracy with Clari’s
sales forecasting tools.
• Lead generation: Use a comprehensive data profile of your visitors to identify which companies your sales reps need to connect with. Generate leads for your sales reps by leveraging databases and social networks.
• Sales Data Input Automation: Data from various sources will be effortlessly and intelligently copied into your
CRM. Automatically sync calendar, address book, emails, phone calls, and messages of your salesforce to your
CRM system. Enjoy better sales visibility and analytics while giving your sales personnel more sales time.
• Predictive sales/lead scoring: Use AI to enable predictive sales. Score leads to prioritize sales rep actions based on lead scores and contact factors. Sales forecasting is automated with increased accuracy thanks to systems’ granular access to lead scores and sales rep performance. For scoring leads, these systems leverage anonymized transaction data from their customers as well as the sales data of the specific customer. For assessing contact factors, these systems leverage anonymized data and analyze all customer contacts, such as emails and calls (a minimal lead-scoring sketch is given after this list).
• AI-based agent coaching: Both AI and emotion AI can be leveraged to coach sales reps and customer service employees by suggesting responses during live conversations or written exchanges with leads. Bots will listen in on agents’ calls, suggesting best-practice answers to improve sales effectiveness.
• Sales Rep Next Action Suggestions: Your sales reps’ actions and leads will be analyzed to suggest the next best action. This situation-aware solution will help your representatives find the right way to deal with each issue. Historical data and the agent’s profile will help you achieve better results, leading to greater customer satisfaction.
• Sales Content Personalization and Analytics: Preferences and browsing behavior of high priority leads are
analyzed to match them with the right content, aimed to answer their most important questions. Personalize
your sales content and analyze its effectiveness allowing continuous improvement.
• Retail Sales Bot: Use bots on your retail floor to answer customers’ questions and promote products. Engage the right customer by analyzing their profile. Computer vision will help you take the right action based on the customer’s characteristics and expressions.
• Meeting Setup Automation (Digital Assistant): Leave a digital assistant to set up meetings, freeing your sales reps’ time. Decide on the targets to prioritize and keep your KPIs high.
• Prescriptive Sales: Most sales processes exist in the minds of your sales reps, who interact with customers based on their different habits and observations. Prescriptive sales systems prescribe the content, interaction channel, frequency, and price, based on data on similar customers.
• Sales Chatbot: Chatbots are ideal to answer first customer questions. If the chatbot decides that it cannot
adequately serve the customer, it can pass those customers to human agents. Let 24/7 functioning, intelligent,
self-improving bots handle making initial contacts to leads. High value, responsive leads will be called by live
agents, increasing sales effectiveness.
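To illustrate predictive lead scoring, the sketch below ranks synthetic leads by conversion probability using logistic regression; the CRM-style features and the conversion rule are fabricated for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for what a CRM would record about each lead.
rng = np.random.default_rng(7)
n = 1500
emails_opened = rng.poisson(3, n)
site_visits = rng.poisson(5, n)
demo_requested = rng.integers(0, 2, n)

# Invented conversion process: engagement raises conversion probability.
logit = 0.4 * emails_opened + 0.2 * site_visits + 1.5 * demo_requested - 3.5
converted = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([emails_opened, site_visits, demo_requested])

X_train, X_test, y_train, y_test = train_test_split(X, converted, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Score each lead between 0 and 1; reps would call the highest-scoring first.
scores = model.predict_proba(X_test)[:, 1]
print(f"ROC-AUC: {roc_auc_score(y_test, scores):.3f}")
print("top-5 lead scores:", np.round(np.sort(scores)[-5:], 3))
```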
SALES ANALYTICS
• Customer Sales Contact Analytics: Analyze all customer contacts, including phone calls or emails, to understand
what behaviors and actions drive sales. Advanced analytics on all sales call data to uncover insights to increase
sales effectiveness.
• Sales Call Analytics: Advanced analytics on call data to uncover insights to increase sales effectiveness. See how
well your conversation flow performs. Integrating data on calls will help you to identify the performance of each
component in your sales funnels.
• Sales attribution: Leverage big data to attribute sales to marketing and sales efforts accurately. See which step of your sales funnel performs better. Pinpoint low-performing parts using the insights provided by the analysis.
• Sales Compensation: Determine the right compensation levels for your sales personnel. Decide on the right incentive mechanism for your sales representatives. Use sales data to provide objective measures and continuously improve your sales representatives’ performance.
CUSTOMER SERVICE
• Social Listening & Ticketing: Leverage Natural Language Processing and machine vision to identify customers to contact and respond to them automatically or assign them to relevant agents, increasing customer satisfaction. Use the data available in social networks to uncover whom to sell to and what to sell.
• Intelligent Call Routing: Route calls to the most capable agents available. Intelligent routing systems incorporate data from all customer interactions to optimize customer satisfaction. Based on the customer profile and your agents’ performance, you can deliver the right service with the right agent and achieve superior net promoter scores.
• Call Classification: Leverage natural language processing (NLP) to understand what the customer wants to achieve so your agents can focus on higher value-added activities. Before channeling the call, identify the nature of your customers’ needs and let the right department handle the problem. Increase efficiency with higher satisfaction rates (a toy classification sketch appears at the end of this section).
• Voice Authentication: Authenticate customers without passwords, leveraging voice biometrics to improve customer satisfaction and reduce issues related to forgotten passwords. A customer’s unique voice ID becomes their most secure key for accessing confidential information. Instead of the last four digits of an SSN, customers gain access by using their voice.
• Call Intent Discovery: Leverage Natural Language Processing and machine learning to estimate and manage customer intent (e.g., churn) to improve customer satisfaction and business metrics. Sentiment analysis of the customer’s voice level and pitch can detect the micro-emotions that drive the decision-making process.
• Customer Service Response Suggestions: Bots will listen in on agents’ calls, suggesting best-practice answers to improve customer satisfaction and standardize the customer experience. Increase upsells and cross-sells by giving the right suggestion. Responses are standardized, and the best possible approach serves the customer’s benefit.
• Chatbot: Chatbots can understand more complicated queries as AI algorithms improve. Businesses thus understand their customers better, since chatbots collect information from customers while interacting with them and spot their pain points. There are other benefits, like 24/7 availability and reduced costs, as bots can handle more tasks as they learn more. All these benefits significantly improve businesses’ customer satisfaction.
• Customer Service Chatbot (Self-Service Solution): Build your own 24/7 functioning, intelligent, self-improving chatbots to handle most queries and transfer customers to live agents when needed. Reduce customer service costs and increase customer satisfaction. Reduce the load on your existing customer representatives and let them focus on the more specific needs of your customers.
• Call Analytics: Advanced analytics on call data to uncover insights to improve customer satisfaction and increase efficiency. Find patterns and optimize your results. Analyze customer reviews through voice data and pinpoint where there is room for improvement. Sestek indicates that ING Bank observed a 15% increase in sales quality score and a 3% decrease in overall silence rates after they integrated AI into their call systems.
• Survey & Review Analytics: Leverage Natural Language Processing to analyze text fields in surveys and reviews to uncover insights to improve customer satisfaction and increase efficiency. Automate the process by mapping the right keywords to the right scores, reducing the time needed to generate reports. Protobrand states that it used to perform review analytics manually through hand-coding of the data, but now automates much of the analytical work with Gavagai. This helps the company collect larger quantitative volumes of qualitative data and still complete the analytical work in a timely and efficient manner.
• Customer Contact Analytics: Advanced analytics on all customer contact data to uncover insights to improve
customer satisfaction and increase efficiency. Utilize Natural Language Processing for higher customer
satisfaction rates.
• Chatbot Analytics: Analyze how customers are interacting with your chatbot. See the overall performance of your chatbot, pinpoint its shortcomings, and improve it. Detect your customers’ overall satisfaction rate with the chatbot.
• Chatbot testing: Semi-automated and automated testing frameworks facilitate bot testing. See the performance
of your chatbot before deploying. Save your business from catastrophic chatbot failures. Detect the shortcomings
of your conversational flow.
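To make the call classification and intent discovery ideas above concrete, here is a toy Python sketch of NLP-based intent classification. The utterances and intent labels are invented for illustration; a production system would train on transcribed call logs.

    # Toy intent classifier for routing customer contacts.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = [
        "I want to cancel my subscription", "how do I close my account",
        "my invoice amount looks wrong", "I was charged twice this month",
        "the app crashes when I log in", "I cannot reset my password",
    ]
    intents = ["churn", "churn", "billing", "billing", "support", "support"]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, intents)

    # A new contact gets routed to the predicted department.
    print(clf.predict(["I was charged the wrong amount"]))  # likely ['billing']

The predicted intent can then drive routing (which department or agent handles the contact) or trigger retention workflows when churn intent is detected.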
DATA
• Data Visualization: Visualize your data for better analytics and decision-making. Let the dashboards speak. Convey your message more easily and more aesthetically.
• Data Management & Monitoring: Keep your data high quality for advanced analytics. Adjust the quality by
filtering the incoming data. Save time by automating manual and repetitive tasks.
• Data Integration: Combine your data from different sources into meaningful and valuable information. Data
traffic depends on multiple platforms. Therefore, managing this huge traffic and structuring the data into a
meaningful format will be important. Keep your data lake available for further analysis.
• Data Preparation Platform: Prepare your data from raw formats with data quality problems to a clean, ready-to-
analyze format. Use extract, transform, and load (ETL) platforms to fine-tune your data before placing it into a
data warehouse.
• Data Cleaning & Validation Platform: Avoid garbage in, garbage out by ensuring the quality of your data with
appropriate data cleaning processes and tools. Automate the validation process by using external data sources.
Regular maintenance cleaning can be scheduled, and the quality of the data can be increased.
• Data Transformation: Transform your data to prepare it for advanced analytics. If it is unstructured, adjust it for
the required format.
• AppDev: App development platforms for your custom projects. Your in-house development team can create
original projects for your specific business needs. These platforms will help your team with the necessary tools.
• Data Labeling: Unless you use unsupervised learning systems, you need high-quality labeled data. Label your data to train your supervised learning systems. Human-in-the-loop systems auto-label your data and crowdsource the labeling of data points that cannot be auto-labeled with confidence.
• Synthetic Data: Computers can artificially create synthetic data to perform certain operations. Synthetic data is usually used to test new products and tools, validate models, and satisfy AI needs. Companies can simulate conditions not yet encountered and take precautions accordingly with the help of synthetic data. It also overcomes privacy limitations, as it does not expose any real data. Synthetic data is thus a smart AI solution for companies to simulate future events and consider future possibilities (a toy generation sketch follows this list).
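The following minimal Python sketch illustrates one simple way synthetic tabular data can be generated: by sampling from distributions whose parameters were fitted to real data. The field names and statistics here are made up for illustration.

    # Toy synthetic-data generator; the moments below are invented stand-ins
    # for statistics that would be estimated from a real (private) dataset.
    import numpy as np

    rng = np.random.default_rng(42)

    mean = np.array([35.0, 52_000.0])      # e.g., age, annual income
    cov = np.array([[40.0, 1.5e4],
                    [1.5e4, 9.0e7]])       # covariance between the two fields

    synthetic = rng.multivariate_normal(mean, cov, size=1000)
    print("First synthetic record:", synthetic[0])

Because the generated records are drawn from a model rather than copied from real individuals, they can be shared for testing and validation with lower privacy risk, which is the benefit highlighted above.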
HR
• Hiring: Hiring is a prediction game: which candidate, starting at a specific position, will contribute more to the company? Machines’ and recruiting chatbots’ superior data-processing capabilities augment HR employees in various parts of hiring, such as finding qualified candidates, interviewing them with bots to understand their fit, or evaluating assessment results to decide whether they should receive an offer.
• Performance Management: Manage your employees’ performance effectively and fairly without hurting their motivation. Follow their KPIs on your dashboard and provide real-time feedback. This increases employee satisfaction and lowers your organization’s employee turnover. Actualize your employees’ maximum professional potential with the right tools.
• HR Retention Management: Predict which employees are likely to churn and improve their job satisfaction to retain them. Detect the underlying reasons motivating them to seek new opportunities. By keeping them at your organization, you lower your human-capital loss.
• HR Analytics: HR analytics services act as a voice-of-employee analysis. Look at your workforce analytics and make better HR decisions. Gain actionable insights and impactful suggestions for higher employee satisfaction.
• Digital Assistant: Digital assistants are mature enough to replace real assistants in email communication. Include them in your emails to schedule meetings; they have already scheduled hundreds of thousands of meetings. Use the power of artificial intelligence in your day-to-day activities, with your own on-demand, AI-backed assistant helping you 24/7.
• Employee Monitoring: Monitor your employees for better productivity measurement. Provide objective
metrics to see how well they function. Forecast their overall performance with the availability of massive
amounts of data.
• Building Management: Sensors and advanced analytics improve building management. Integrate IoT systems in your building for lower energy consumption and more. Increase the available data by implementing the right data collection tools for effective building management.
TECH
• Analytics & Predictive Intelligence for Security: Analyze data feeds about broad cyber activity as well as behavioral data inside an organization’s network to produce actionable insights that help analysts predict and thwart impending attacks. Integrate external data sources to watch for global cyber threats and act in time. Keep your tech infrastructure intact or minimize losses.
• Knowledge Management: Enterprise knowledge management enables effective and effortless storage and retrieval of enterprise data, ensuring organizational memory. It increases collaboration by ensuring the right people work with the right data, and enables seamless organizational integration through knowledge management platforms.
• Natural Language Processing Library/ SDK/ API: Leverage Natural Language Processing libraries/SDKs/APIs to
quickly and cost-effectively build your custom NLP powered systems or to add NLP capabilities to your existing
systems. An in-house team will gain experience and knowledge regarding the tools. Increased development
and deployment capabilities for your enterprise.
• Image Recognition Library/ SDK/ API: Leverage image recognition libraries/SDKs/APIs to quickly and cost-
effectively build your custom image processing systems or to add image processing capabilities to your existing
systems.
• Secure Communications: Protect employee communications like emails or phone conversations with advanced
multilayered cryptography & ephemerality. Keep your industry secrets safe from corporate espionage.
• Deception Security: Deploy decoy-assets in a network as bait for attackers to identify, track, and disrupt
security threats such as advanced automated malware attacks before they inflict damage. Keep your data and
traffic safe by keeping them engaged in decoys. Enhance your cyber security capabilities against various forms
of cyber attacks.
• Autonomous Cybersecurity Systems: Utilize learning systems to efficiently and instantaneously respond to security threats, often augmenting the work of security analysts. Lower your risk of human error by providing greater autonomy for your cybersecurity. AI-backed systems can check compliance with standards.
• Smart Security Systems: AI-powered autonomous security systems. Functioning 24/7 for achieving maximum
protection. Computer vision for detecting even the tiniest anomalies in your environment. Automate
emergency response procedures by instant notification capabilities.
• Machine Learning Library/ SDK/ API: Leverage machine learning libraries/SDKs/APIs to quickly and cost-
effectively build your custom learning systems or to add learning capabilities to your existing systems.
• AI Developer: Develop your custom AI solutions with companies experienced in AI development. Create
turnkey projects and deploy them to the specific business function. Best for companies with limited in-house
capabilities for artificial intelligence.
• Deep Learning Library/ SDK/ API: Leverage deep learning libraries/SDKs/APIs to quickly and cost-effectively
build your custom learning systems or to add learning capabilities to your existing systems.
• Developer Assistance: Assist your developers using AI to help them intelligently access coding knowledge on the web and learn from suggested code samples. See the best practices for specific development tasks and formulate your custom solution, with real-time feedback drawn from a huge history of developer mistakes and best practices.
• AI Consultancy: Provides consultancy services to support your in-house AI development, including machine learning and data science projects. See which units can benefit most from AI deployment. Optimize your artificial intelligence spending for the best results using the insights provided by a consultant.
AUTONOMOUS THINGS
• Self-Driving Cars: From mining to manufacturing, self-driving cars/vehicles are increasing the efficiency and
effectiveness of operations. Integrate them into your business for greater efficiency. Leverage the power of
artificial intelligence for complex tasks.
• Vehicle Cybersecurity: Secure connected and autonomous cars and other vehicles with intelligent cybersecurity solutions. Guarantee safety through hack-resistant mechanisms. Protect your intelligent systems from attacks.
• Vision Systems: Vision systems for self-driving cars. Integrate vision sensing and processing in your vehicle.
Achieve your goals with the help of computer vision.
• Driving Assistant: Required components and intelligent solutions to improve riders’ experience in the car. Implement AI-powered vehicle perception solutions for the ultimate driving experience.
HEALTHTECH
• Patient Data Analytics: Analyze patient and/or third-party data to discover insights and suggest actions. Achieve greater accuracy through assisted diagnostics. Lower mortality rates and increase patient satisfaction by using all available diagnostic data to detect the underlying causes of symptoms.
• Personalized Medications and Care: Find the best treatment plans according to patient data. Provide custom-tailored solutions for your patients. By using their medical history and genetic profile, you can create a custom medication or care plan.
• Drug Discovery: Find new drugs based on previous data and medical intelligence. Lower your R&D cost and
increase the output — all leading to greater efficiency. Integrate FDA data, and you can transform your drug
discovery by locating market mismatches and FDA approval or rejection rates.
• Real-Time Prioritization and Triage: Prescriptive analytics on patient data enabling accurate real-time case prioritization and triage. Manage your patient flow through automation. Integrate your call center and use language processing tools to extract information, prioritize patients that need urgent care, and lower your error rates. Eliminate error-prone decisions by optimizing patient care.
• Early Diagnosis: Analyze chronic conditions leveraging lab data and other medical data to enable early
diagnosis. Provide a detailed report on the likelihood of the development of certain diseases with genetic data.
Integrate the right care plan for eliminating or reducing the risk factors.
• Assisted or Automated Diagnosis & Prescription: Suggest the best treatment based on the patient’s complaint and other data. Put in place control mechanisms that detect and prevent possible diagnosis errors. Find out which active compound is most effective for that specific patient. Get the right statistics for superior care management.
• Pregnancy Management: Monitor mother and fetus health to reduce mothers’ worries and enable early
diagnosis. Use machine learning to uncover potential risks and complications quickly. Lower the rates of
miscarriage and pregnancy-related diseases.
• Medical Imaging Insights: Advanced medical imaging to analyze and transform images and model possible
situations. Use diagnostic platforms equipped with high image processing capabilities to detect possible
diseases.
• Healthcare Market Research: Prepare hospital competitive intelligence by tracking market prices. See available insurance plans, drug prices, and other public data to optimize your services. Leverage NLP tools to analyze vast amounts of unstructured data.
• Healthcare Brand Management and Marketing: Create an optimal marketing strategy for the brand based on
market perception and target segment. Tools that offer high granularity will allow you to reach the specific
target and increase your sales.
• Gene Analytics and Editing: Understand genes and their components and predict the impact of gene edits.
• Device and Drug Comparative Effectiveness: Analyze drug and medical device effectiveness. Rather than just using simulations, test on other patients’ data to see the effectiveness of the new drug, and compare your results with benchmark drugs to maximize the drug’s impact.
• Healthcare chatbot: Use a chatbot to schedule patient appointments, give information about certain diseases
or regulations, fill in patient information, handle insurance inquiries, and provide mental health assistance. You
can also use intelligent automation with chatbot capabilities.
FINTECH
• Fraud Detection: Leverage machine learning to detect fraudulent and abnormal financial behavior, and/or use AI to improve general regulatory compliance matters and workflows. Lower your operational costs by limiting your exposure to fraudulent documents.
• Insurance & InsurTech: Leverage machine learning to process underwriting submissions efficiently and
profitably, quote optimal prices, manage claims effectively, and improve customer satisfaction while reducing
costs. Detect your customer’s risk profile and provide the right plan.
• Financial Analytics Platform: Leverage machine learning, Natural Language Processing, and other AI techniques
for financial analysis, algorithmic trading, and other investment strategies or tools.
• Travel & expense management: Use deep learning to improve data extraction from receipts of all types, including hotel, gas station, taxi, and grocery receipts. Use anomaly detection and other approaches to identify fraud and non-compliant spending. Reduce approval workflows and per-unit processing costs.
• Credit Lending & Scoring: Use AI for robust credit lending applications. Use predictive models to uncover
potentially non-performing loans and act. See the potential credit scores of your customers before they apply
for a loan and provide custom-tailored plans.
• Billing: Leverage accessible billing services that remind your customers to pay. Increase your loan recovery
ratios. Use automated invoice systems for your business.
• Robo-Advisory: Use AI finance chatbot and mobile app assistant applications to monitor personal finances. Set
your target savings or spending rates for your own goals. Your finance assistant will handle the rest and provide
you with insights to reach financial targets.
• Regulatory Compliance: Use Natural Language Processing to quickly scan legal and regulatory text for compliance issues, and do so at scale. Handle thousands of documents without any human interaction.
• Data Gathering: Use AI to efficiently gather external data such as sentiment and other market-related data.
Wrangle data for your financial models and trading approaches.
• Debt Collection: Leverage AI to ensure a compliant and efficient debt collection process. Effectively handle any dispute and track your success rate in debt collection.
• Conversational banking: Financial institutions engage with their customers on a variety of communication
platforms (WhatsApp, mobile app, website etc.) via conversational AI tools to increase customer satisfaction
and automate many tasks like customer onboarding.
WORKING GROUP 6:
CHAIRMAN:
MEMBERS:
INTRODUCTION
“AI computing resources (‘AI Compute’) include one or more stacks of hardware and software used to support specialized AI workloads and applications in an efficient manner.”

Artificial intelligence (AI) has become an increasingly important technology for individuals, organizations, and governments around the world. AI enables machines to learn from data and experiences, identify patterns, and make decisions with minimal human intervention. AI applications are increasingly being used in various industries, such as healthcare, finance, transportation, and agriculture, to name a few. In India, AI is being seen as a potential solution to some of the country’s most pressing challenges in healthcare, education, agriculture, and e-governance. AI compute capacity refers to the infrastructure, hardware, and software needed to process, analyze, and store data required for AI applications. This includes high-performance computing (HPC) clusters, specialized accelerators (such as Graphics Processing Units and Tensor Processing Units), data storage systems, and networking infrastructure. AI compute capacity also includes the software tools and frameworks necessary to develop and run AI models efficiently, including programming languages such as Python and specialized software libraries such as TensorFlow and PyTorch. In other words, AI compute capacity is the backbone of any AI system, and without sufficient compute capacity, it becomes impossible to build and deploy complex AI models.
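To make the software layer concrete, the sketch below shows, in a few lines of PyTorch, the two workloads this report repeatedly distinguishes: a training step (forward and backward passes) and an inference call (forward pass only). The model and data here are dummies chosen purely for illustration.

    # Minimal PyTorch sketch: one training step, then one inference call.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    loss = loss_fn(model(x), y)   # forward pass (training workload)
    loss.backward()               # backward pass: the compute-hungry part
    opt.step()                    # weight update

    with torch.no_grad():         # inference workload: forward pass only
        pred = model(torch.randn(1, 10)).argmax(dim=1)
    print("Predicted class:", pred.item())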
Although India has made significant progress in developing its computational capacity, for example through the National Supercomputing Mission (NSM), there is still much work to be done. The central and state governments, as well as private organizations, have taken initiatives to support the growth of the AI community in the country. However, challenges such as the need for more investment in research and development, a shortage of skilled talent, and gaps in infrastructure remain.
AI has the potential to transform multiple sectors of the Indian economy, from healthcare and education to agriculture
and transportation. In healthcare, AI can be used to analyse vast amounts of medical data, making it possible to identify
disease patterns and develop personalized treatments. AI models can be used to analyse soil samples and weather data
to determine the best time to plant crops, optimize irrigation, and predict crop yield. As India’s cities grow, AI can help
cities become more sustainable and habitable by analysing data on traffic patterns, energy usage, and air quality, among
others.
The Government of India, in recent years, has amplified its efforts to improve its AI infrastructure. In June 2018, the Indian government released the country’s national strategy for Artificial Intelligence, which clearly identified healthcare, agriculture, education, smart cities, and mobility as priority sectors for AI deployment. India has been an active participant in many of the new multilateral efforts around AI, such as the Global Partnership on Artificial Intelligence (GPAI), and is likely to play a significant role in setting AI standards [1]. As the Indian Government moves forward with its plans to invigorate AI-based innovation and build AI-ready infrastructure, appreciating the complete scope of India’s AI capabilities, today and in the near future, becomes increasingly important. Without adequate AI compute capacity, India will struggle to leverage the potential of AI, and it may fall behind other countries in terms of innovation and competitiveness.
The purpose of this report is to provide a comprehensive overview of the current state of computational resources and limitations in India, the challenges and opportunities associated with expanding AI compute capacity and research, and the potential impact of AI applications in reshaping India’s future. Finally, the report seeks to serve as a guide for stakeholders interested in supporting the growth of AI compute capacity in India and realizing the full potential of AI in India’s future.
AI compute infrastructure plays a crucial role in the advancement and deployment of artificial intelligence (AI) technologies worldwide. It encompasses the hardware, software, and cloud resources necessary to support AI workloads efficiently. The availability of robust and scalable AI compute infrastructure is vital for enabling complex AI computations, such as training deep learning models and running AI algorithms in real time.
There has been a huge acceleration in AI infrastructure over the last few years, driven largely by the rise of GPUs and cloud-scale infrastructure. Specialized hardware and software stacks are being built to support the training and inference of large language models (LLMs).
Availability of AI compute infrastructure varies across the globe. For instance, the United States is at the forefront with numerous supercomputers and high-performance computing (HPC) facilities, including those operated by government institutions, academic research centers, and private entities. China has also made significant strides in AI compute infrastructure, boasting several supercomputers and state-of-the-art facilities. According to the November 2022 Top500 list, there are 34 economies that possess a "top supercomputer" based on the Top500 methodology (Figure 1).
Figure 1: Economies with a top supercomputer. Source: OECD (2023), "A blueprint for building national compute capacity for artificial intelligence", OECD Digital Economy Papers.
The People’s Republic of China has the largest share, accounting for 32% of the top supercomputers. Following China, the United States holds 25%, Germany 7%, Japan 6%, France 5%, and the United Kingdom 3% of the top supercomputers.
In terms of market share of compute infrastructure and AI, companies like NVIDIA, Intel, AMD, IBM, and Google are leading
this domain. These industry giants invest heavily in research and development to create high-performance processors,
graphics processing units (GPUs), Application Specific Integrated Circuit (ASIC) AI chips, and software frameworks. In
addition to on-premises installations, the hardware is available through cloud service providers (CSPs) like Amazon Web Services, Microsoft Azure, and Google Cloud Platform. These CSPs enable users to access scalable AI compute resources on-demand, reducing the need for extensive infrastructure investments and facilitating widespread adoption of AI technologies. These advancements are forecast to bring a more than fivefold increase in the global AI infrastructure market (Figure 2).
Figure 2: Global AI compute infrastructure market size
The AI infrastructure market was valued at USD 36.14 billion in 2022 and is expected to reach USD 222.42 billion by 2030, growing at a CAGR of 25.5% over the forecast period 2023-2030 [3].
India has a large and young population, a strong digital footprint, a remarkably large startup ecosystem, a robust
telecommunications network, and a growing economy. These factors make India a promising market for AI systems and
compute infrastructure. The Indian government has also taken steps to promote the development of AI. It has published a national strategy for AI and created policies and regulations for the proliferation of AI and its usage, with the objective of making India a global leader in AI by 2030. Some salient aspects of India’s policy landscape, digital adoption, workforce AI literacy, etc., are mentioned below.
● National AI strategy and private sector AI investments: The Indian government published the National Strategy for Artificial Intelligence (#AIforAll) in June 2018. The strategy aims to make India a global leader in AI by 2030 and includes a number of initiatives to support the development of AI talent, research, and infrastructure. The private sector in India is also investing in AI: in 2022, Indian companies invested $1.5 billion in AI startups. The Government also launched the “National AI Portal” [4], a single-point repository of Artificial Intelligence (AI) based initiatives in the country.
● Telecommunications network maturity: India has a mature telecommunications network. The country has a high
penetration of mobile phones and fixed-line broadband. In 2022, there were 1.15 billion active mobile phone
subscribers in India [6], and there were 31.7 million fixed-line broadband subscribers.
● Level of digital adoption: The country has a large and growing digital economy. In 2022, the Indian digital
economy was worth $1.5 trillion. The high level of digital adoption in India provides a strong foundation for the
development and use of AI.
● Data and Workforce AI literacy: The country has a large and diverse population, with a growing economy. This
diversity and growth are generating a large amount of “raw” data, which when curated can be used in AI
applications. The workforce AI literacy in India is growing as well. The country has many universities and colleges
that offer courses in AI. In addition, the Indian government is providing training programs to help workers develop
AI skills.
● Geography and access to supply chains: India is a member of various international groupings consisting of major emerging economies, as well as the WTO and the G20. The country’s geographic location and institutional memberships provide it with access to global markets and supply chains.
India has witnessed significant growth in its AI compute infrastructure in recent years. Several leading technology companies, research institutions, and startups have established data centers and computing clusters equipped with high-performance computing resources. Major cloud service providers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform, offer AI-specific services and computing instances tailored for machine learning and deep learning workloads. India has also witnessed the emergence of specialized AI startups and research labs focused on developing AI compute solutions.
The National Supercomputing Mission (NSM), launched by the Government of India, aims to enhance the country’s
compute capabilities by establishing a network of 24 supercomputers and high-performance computing facilities [7].
Under this mission, several supercomputing facilities have been deployed across India, supporting scientific research and
AI-related applications.
AIRAWAT, an AI-specific cloud computing infrastructure approach paper [8], outlines a strategy to build a robust, scalable
AI ecosystem in India. PARAM Siddhi, on the other hand, is one of India’s fastest supercomputers specifically designed for
AI applications. At the time of writing, this facility is ranked 75th in the top 500 list of Supercomputers in the world [9].
With this background, we propose a Nation-wide AI infrastructure in the subsequent chapter.
• AIRAWAT-PSAI: A 410 AI Petaflops AI-specific cloud computing infrastructure. It emphasizes the need for scalable compute and storage resources, efficient data handling, robust privacy and security protocols, and extensive software libraries for AI.
• PARAM Siddhi-AI: PARAM Siddhi is a 210 AI Petaflops high-performance computing resource that can handle
complex AI workloads. It has a multitude of cores and a massive amount of memory, making it ideal for training
large, complex AI models.
CHALLENGES AND LIMITATIONS FACED IN CURRENT AI COMPUTE CAPACITY
Despite the progress in AI compute infrastructure, there are several challenges and limitations that need to be addressed:
● Infrastructure Accessibility: Rural areas and smaller cities face challenges in accessing advanced computing
resources, limiting their ability to participate fully in AI-driven initiatives.
FUTURELAB AI ARCHITECTURE
Setting up FutureLab AI compute infrastructure through public-private partnerships (PPPs) can be a strategic
approach to leverage the expertise and resources of both the government and private sector.
The overarching infrastructure comprises three layered systems, high-end compute infrastructure, an inference arm, and edge compute, that are strategically distributed to meet users’ computational requirements efficiently. This distributed architecture ensures that users can seamlessly transition between these resources while harnessing their distinct capabilities.
To facilitate smooth data exchange and collaboration across these distributed components, the secure distributed data
grid plays a pivotal role. The data grid acts as a robust and secure framework, enabling users to upload, download, and
exchange large datasets and AI models seamlessly.
The AI infrastructure can be established with current/latest-generation x86 processors, latest-generation accelerators with chip-to-chip interconnects (CPU-GPU and GPU-GPU links with more than 400 Gb/s bandwidth), high-throughput storage, a high-speed network, cloud-enabled software infrastructure, and a secure Data Centre ecosystem.
FutureLabs AI Compute Infrastructure is essentially a 3-tier architecture in which the infrastructure is divided into three distinct layers, each with its specific roles and responsibilities. This approach allows for efficient scaling, management, and maintenance of the AI infrastructure.
The three tiers are High-End Compute, Mid-Range Compute, and Edge Compute.
Design: Complementing the high-end compute infrastructure, the inference platform consists of a cohesive ensemble
of servers. Collaboratively, these servers deploy AI services and execute AI models, facilitating applications. These
servers are designed to handle concurrent inference requests, enabling the efficient execution of AI models at scale.
A collective configuration of GPUs from Tier 2 and Tier 3 centers, optimized for real-time inferencing tasks.
Capacity: The combined inferencing platform is engineered to handle concurrent inferencing requests from up to one
lakh (1,00,000) users in real-time.
Function: The inferencing platform serves AI applications that require quick responses, catering to a large user base
across the country.
By implementing this 3-tier Distributed Grid Model, India can establish a robust and scalable AI Compute
Infrastructure to support various AI applications and cater to the country's growing demand for advanced AI
capabilities.
Table 1: FutureLabs AI Compute Infrastructure Setup
AI Training: 10,000 GPUs per centre; 1,250 nodes per centre; 40 EF performance per centre; 1 centre (totals: 10,000 GPUs, 1,250 nodes, 40 EF).
High Performance Storage: 200 PB capacity per centre; 20 MW power requirement; 1 centre (total: 200 PB).
OVERALL AI COMPUTE INFRASTRUCTURE SETUP
WHY AI INFRASTRUCTURE NEEDS BOTH TRAINING AND INFERENCE
While AI is referred to as a solution in a general sense, there are two distinct functions within any AI solution: the “training” of an artificial neural network and “inference” processing. Training is the process of creating a neural network model by processing large amounts of relevant raw data (for example, pictures of cows) through a particular neural network algorithm, which creates the weighted layers of the neural network model, and, finally, fine-tuning the model by paring and optimization. The inference part of AI is the deployment of the trained neural network model for use in real-time (<10 ms) or near real-time (10 ms to 100 ms) evaluation of new data (for example, determining whether a new picture is of a cow); inference, in other words, is the actual use of a neural network model. Both training and inference are required for the deployment of an AI solution and both require extensive resources. However, the resource requirements of each are quite different and there is a need for dedicated resources for each.
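A simple way to relate a model to the latency budgets above is to measure its per-call inference time directly. The following illustrative Python/PyTorch sketch does so for a small dummy model; real numbers depend entirely on the model and hardware used.

    # Measure mean per-call inference latency of a toy model.
    import time
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
    x = torch.randn(1, 128)

    with torch.no_grad():
        model(x)                      # warm-up run
        t0 = time.perf_counter()
        for _ in range(100):
            model(x)
        per_call_ms = (time.perf_counter() - t0) / 100 * 1e3

    print(f"Mean inference latency: {per_call_ms:.2f} ms")  # compare to the 10/100 ms budgets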
Table 2: Task-wise compute usage
Source: https://epochai.org/blog/estimating-training-compute#example-cnn-lstm-fcn-model
WORKLOADS
Training a neural network model can be one of the most challenging data center workloads. It requires dedicated
networks, storage, memory, and servers to process in parallel a single large data set, to produce the neural network
model. Then the neural network model must be optimized for deployment as an inference model. However, training is
not a one-time act. The model is often continuously retrained and modified as new and different data is added to the
raw data set. As a result, training typically requires dedicated resources that cannot easily be shared with other functions
within a data center. The neural network training performance is determined by the quality of the raw data set, training
time, and required level of accuracy. Like training, inference requires dedicated resources, but the workload processes
multiple inputs through a trained model to produce a result. Inference requires multiple instances, just like productivity
applications or web services, to handle the constant flow of new data in a timely manner. Neural network inference
performance is determined by throughput at the desired level of accuracy and response time to the input data.
Note that neural network performance is also the result of the overall network configuration and performance, not just
the neural network model, which must be considered in the system design.
PRECISION
Another difference between training and inference is the level of mathematical precision required to achieve the same level of accuracy. Because training involves the development of the complex neural network model, it requires the processing of a large data set to achieve a high level of accuracy. As a result, AI training platforms typically use 32-bit floating point (single-precision) or 64-bit floating point (double-precision) data formats in the matrix multiplication operations to maintain a high level of accuracy. Since inference processing can only result in a limited number of outcomes, lower and more efficient precision data formats can be used to accelerate inference without a significant drop in the accuracy of the prediction. Current inference platforms typically use 8-bit or 16-bit integer data formats, but many companies are evaluating even lower precision data formats, such as 4-bit integer, when the reduction does not result in a significant loss of accuracy. This reduction in the data format also reduces the processing and power requirements. While the same processing core can be used for both training and inference, using cores optimized for the different precision levels of matrix multiplication can increase the throughput, reduce power, and increase the overall efficiency of the platform.
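As an illustration of lower-precision inference, the following PyTorch sketch applies post-training dynamic quantization so that linear-layer weights are stored as 8-bit integers. The model is a dummy, and the exact APIs and supported formats vary across frameworks and hardware.

    # Post-training dynamic quantization: FP32 weights -> INT8 weights.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 256)
    print(model(x)[0, :3])       # FP32 result
    print(quantized(x)[0, :3])   # INT8-weight result: close, but not identical

The quantized model is smaller and typically faster on CPUs, at the cost of a small numerical deviation, which is exactly the precision-for-efficiency trade-off described above.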
SCALING CHALLENGES
The greatest challenge for AI solutions is scaling.
For training, the resources required, such as memory, storage, and compute, increase with both the size of the data set and the complexity of the neural network model. There are two key measures in training: time and accuracy. If there were no impact on the resulting neural network model, just doubling the data size would double the operations required to process the data, requiring a proportional increase in resources to process the extra data in a fixed time. However, more data also increases the accuracy, which translates into additional layers or nodes per layer and additional resources. As a result, doubling the model data can increase the operations required to process the data in a fixed time by up to the square of the original number of operations (n^2). The result is a superlinear, up to quadratic, increase in the resources required relative to the size of the data sets. With extremely large data sets, this can require hundreds or thousands of servers to process the data in a timely manner (typically a few hours to a few days is acceptable, depending on the application).
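A rough sense of this scaling can be obtained with the commonly used back-of-envelope estimate that training a dense model costs roughly 6 floating-point operations per parameter per training token. This is an approximation, not an exact law, and the figures below are hypothetical.

    # Back-of-envelope training-compute estimate (FLOPs ~ 6 x params x tokens).
    def training_flops(params: float, tokens: float) -> float:
        return 6.0 * params * tokens

    # Hypothetical 7-billion-parameter model trained on 1 trillion tokens:
    flops = training_flops(7e9, 1e12)
    print(f"~{flops:.1e} FLOPs")  # ~4.2e22 FLOPs
    # "AI exaflops" are often quoted at reduced precision, so this is
    # indicative only; at 40 EF (see Table 1) and full efficiency:
    print(f"~{flops / 40e18:.0f} seconds of sustained compute")  # ~1050 s

Such estimates are useful for sizing infrastructure: doubling either the parameter count or the token count doubles the compute, and doing both quadruples it, which is the superlinear growth discussed above.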
In contrast, the resource requirements for inference are typically not as great for a single instance of a model. However, the deployment of a model may require hundreds or thousands of instances of the neural network model to support multiple incoming data sources or to be closer to the data source. Each instance will require the same platform resources, which increases the operations by up to the number of inputs. As an example, millions of smartphone users issue voice commands to their smartphones every minute. While a single group of servers executing a natural language processing inference model can handle several users, the number it can process per minute is limited by the latency requirements of the application, which in this case is only a few seconds. As a result, the amount of data or number of users an inference platform can support is limited by the performance of the platform and the application requirements. The same is true for other applications, such as event detection by security cameras, anomaly detection in manufacturing, or any other industrial or consumer application processed using on-premises or cloud resources.
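The capacity arithmetic behind such sizing can be sketched as follows. All of the numbers are illustrative assumptions, with the one-lakh user figure taken from the inferencing platform description earlier in this chapter.

    # Rough sizing of an inference farm under a latency budget.
    requests_per_user_per_min = 2         # assumed usage pattern
    users = 100_000                       # one-lakh concurrent user target
    latency_s = 0.1                       # 100 ms per request (near real-time)
    concurrency_per_instance = 8          # assumed batched requests per instance

    req_per_s = users * requests_per_user_per_min / 60
    throughput_per_instance = concurrency_per_instance / latency_s
    instances = req_per_s / throughput_per_instance
    print(f"{req_per_s:.0f} req/s -> ~{instances:.0f} model instances needed")

Under these assumptions the farm needs on the order of a few dozen model instances; tightening the latency budget or raising per-user demand scales the instance count proportionally, which is why inference capacity is bounded by platform performance and application requirements.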
The vast majority of AI training computation is currently performed on Graphics Processing Units (GPUs), and most inference processing is performed on traditional Central Processing Units (CPUs) with support from other architectures, including GPUs, Field Programmable Gate Arrays (FPGAs), and Digital Signal Processors (DSPs). These architectures are also evolving with additional features, such as matrix multiplication engines and new instructions to improve AI processing. Traditional processor architectures also continue to evolve around semiconductor design, semiconductor manufacturing technology, and AI software frameworks.
DATA CENTER
In addition to the processors used for training and deploying a neural network model, the different workloads also place
unique requirements on the rest of the data center resources. Everything from the I/O, network, storage, and memory
subsystem should be designed around the intended function. As mentioned above, training is performed using large
amounts of data, typically historical data, and processing it in a highly parallel fashion. This requires large amounts of
storage and memory close to the processing cores to limit the latency associated with reading data from storage. It also
requires a configurable high-speed network between processing cores. Because inference is processing new data through
a trained model, it requires high I/O bandwidth combined with memory large enough to hold the models and data
without calls to storage. It also requires a configurable network to support each neural network instance. Essentially, the difference between the two is raw execution (training) vs. throughput (inference). Because of the unique processing requirements of both AI training and inference, there is a need for dedicated resources for each form of processing within an AI platform or data center.
Figure 6: Secure Distributed Data Grid Architecture
It is a well-established fact that large quantities of quality data, when used in training models, result in highly accurate models. By and large, this model building needs very large compute infrastructure complemented by large volumes of high-quality, annotated data. The idea of the distributed data grid is to establish the right infrastructure to facilitate all aspects of a secure, robust ecosystem in which it becomes easy to develop economical and very accurate AI-based products and businesses.
Thus, we arrive at the first requirement: we need mechanisms to transfer large quantities of data. Since the data sizes are different for training, for inferencing, and for providing various AI-based services, the right network infrastructure with optimal bandwidth allocation is needed. It is equally important to appreciate that the compute requirements for training models, for inferencing, and for providing services are different. Hence, we need rightly sized compute and storage infrastructure.
Transferring Huge Data Sets: Huge amounts of data are being transferred for many applications even today. Typical applications that come to mind are the huge volumes of data that the Large Hadron Collider (CERN, Geneva) generates and transfers all over the world for specialized use, and the more generic case of Netflix movie streaming. Then, what is the challenge? The point to be appreciated is that this data, however big, is not changing very often in its entirety. Hence the data can be distributed and accessed through data mirrors, and the minimal incremental changes can be transferred with minimal bandwidth requirements. Further, the end-point applications are fairly simple, such as a parallel application or a simple rendering application, which can be very effectively distributed to obtain very high throughput. This essentially means that the computation needs are rather modest for getting one instance of throughput. However, for the AI infrastructure being considered here, we need to appreciate that the data required for training a model may change very often depending on the domain, and the models themselves may be different. Hence, we need very high data transfer capabilities, along with huge compute infrastructures that may run into a few AI exaflops with adequate storage capabilities. Only then can our entrepreneurs develop well-trained, very accurate models that give them a competitive advantage. It is also a fact that there can be only a few such computing facilities, both from an economic perspective and from the perspective of establishing electrical and cooling infrastructure in the country. Considering the vast expanse of our country, it is essential that we establish very high-speed, optimal connectivity between the AI developer and the compute infrastructure to facilitate development at all stages of AI-based product creation.
Technology Perspective:
In any large computing center, InfiniBand connectivity is used for interconnecting processors and storage, as the bandwidth and latency offered are very good. Another advantage of InfiniBand connectivity is the use of the ‘RDMA’ protocol, which facilitates direct data transfer into remote memory without the intervention of the CPU and without the multiple copying operations otherwise needed between the Network Interface Card (NIC)/Host Channel Adapter (HCA) buffers and the kernel buffers. However, the distance over which InfiniBand connectivity can transfer data is limited to about 80 km (long-haul InfiniBand). On the positive side, data transfers can happen at about 400 Gbps, and one can directly mount a storage device that is about 80 km away. Hence, there would be no significant performance degradation between a directly attached storage disk and a storage device that is about 80 km away. This is a good solution when one is located in the vicinity of a large computing resource, and it can act as a near active-active Disaster Recovery site as well.
However, for distances larger than 80 km, a modified version of the very simple and popular Ethernet protocol, referred to as ‘RDMA over Converged Ethernet’ (RoCE), can be used. It is worth noting that driving optical signals over distances of 80 km would entail repeating the signals. However, this is just a matter of using the right technology and developing the right software to effect data transfers at 400 Gbps. Many factors, like the availability of dark fibre, the availability of wavelengths, and the choice of the right protocol stack, need to be considered; however, this is a matter of technical design and implementation. All major countries are experimenting with implementing faster data transfer modes of this kind. Transferring raw data bits is one thing; transferring files/records/objects/databases is another ball game altogether. We need to develop the right ecosystem with correct protocols depending on our needs. Depicted below are the data transfer scheme and the protocol stack that will have to be utilized for developing the transfer mechanisms.
Thus, we have an efficient and fast way of transferring data between two entities at about 400 Gbps. Further, the technologies for 10 Gbps and 1 Gbps are well established. Hence, we have connectivity of different bandwidths and can use the right bandwidth between the components of interest (refer to Figure 10).
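The practical effect of these bandwidth tiers can be seen with simple transfer-time arithmetic. The dataset size and link efficiency below are illustrative assumptions.

    # Transfer-time estimate for different link speeds.
    def transfer_hours(size_bytes: float, link_gbps: float,
                       efficiency: float = 0.8) -> float:
        bits = size_bytes * 8
        return bits / (link_gbps * 1e9 * efficiency) / 3600

    dataset_tb = 50                       # hypothetical 50 TB training set
    size = dataset_tb * 1e12
    for gbps in (400, 100, 10, 1):
        print(f"{gbps:>4} Gbps: {transfer_hours(size, gbps):6.1f} hours")

At an assumed 80% link efficiency, the same 50 TB set takes roughly a third of an hour at 400 Gbps but nearly six days at 1 Gbps, which is why the architecture reserves the highest-bandwidth links for the Large System, Repository, and DR paths and allocates lower tiers to developers and mirrors.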
Further, it is important to ensure the privacy and security of the data of individual entrepreneurs and companies. We need to ensure that their data is presented in decipherable form only when they decide to allow it. Interesting, secure algorithms can be designed, including key-less encryption methodologies that are immune to quantum attacks.
As discussed earlier in this chapter, the compute and storage requirements are different for training models, for inferencing, and for providing AI services. Depicted below is the typical architecture that caters to the majority of workflows for designing, developing, deploying, and commercially exploiting typical AI models.
LET US CONSIDER A TYPICAL WORKFLOW:
Use case 1:
● An AI developer would need authentic, tagged data from his/her own sources and from large repositories developed by various agencies (Ministries of Health, Education, Commerce, Environment, etc.). He/she can access these repositories directly, or access the same data from mirror locations.
● He/she can initiate a script on the large system containing the paths to the data on the repository and the model he/she wishes to train. The script will get executed only when all desired data is present on the local storage of the large system; once training completes, the trained model will be ploughed back onto the repository.
● Inferencing too needs reasonable but modest resources. Hence, we need to establish a farm of servers with modest computing and storage capacities. The AI developer can upload the desired inputs to this farm of servers and get the results to test his/her trained model/product. Depending on the complexity of the model/product being developed, this step can be optional; if the developer has sufficient computing capacity of his/her own, a laptop or desktop can be used as an edge device.
● He/she can download the model for his/her own use and/or may upload it for commercial exploitation to the AI-Market-Place, which is a cloud-based platform.
Use case 2:
1. The AI developer can get the data from the mirrors and curate and tag it.
2. He/she can commercially exploit it by uploading it onto the AI-Market-Place.
Other Use cases: There can be many marketing services, AI services, legal services, payment services, maintenance and warranty services, etc. Imagination will be the only limit to the kinds of services that may come up on the AI Market Place.
Secure Distributed Data Grid Components: Let us define the functionality of the components listed in the architecture diagram.
● Large System: It will have a compute capability of 500 to 3,000 AI Petaflops, large storage to the tune of about 50 to 100 Petabytes, and an adequate number of inference engines. It shall also have a DR site with similar storage and mid-range compute capability.
o Functionality: The storage attached to this large system will access/copy data and the models to be trained from the repositories, and shall plough the trained models back onto the repository. Connectivity therefore needs to be very fast between the Large Machine, the Repository, and the DR site.
o Connectivity: 400 Gbps (both input and output)
● DR site: The DR site will be an active-active one, with the main emphasis on the safety of data; it shall be of the same capacity as the Large System but with less compute capability.
o Functionality: As specified earlier, this shall be an active-active DR and will hold a near real-time state of the storage of the Large System.
o Connectivity: 100 Gbps (DR/NDR), both input and output
● Repository: The Repository will have 25 to 50 PB of storage along with medium-range compute (250 AI PF) capability.
o Functionality: The repository shall contain curated, tagged, and anonymized data from different agencies/ministries. Further, it may also temporarily host the data from a consumer, do a quality check, and keep it curated. It will also temporarily host the model that a consumer wishes to train. The repository will also receive the trained models from the large computer and host them temporarily to be copied back by the consumer. The consumer may be connected to the repository at about 10 Gbps/1 Gbps speed. The data and models that a consumer provides shall be under the total control of the consumer, and the security of the data will be guaranteed.
o Connectivity: Both input and output 400 Gbps
● Mirror Site: Mirror site will have curated, anonymized and tagged data provided by contributors such as
Government Agencies and Voluntary organizations.
o Functionality: This will contain the data from the repository; it does not contain model information uploaded by a consumer, but may contain trained models that the Government wishes to share.
o Connectivity: This site will be connected to the repository at 100/10 Gbps, both input and output.
● Inferencing Farm: This will comprise a collection of modest-capacity servers with low-end GPGPUs.
o Functionality: The inferencing farm will facilitate Inferencing/ AI Product development/Testing activities.
o Connectivity: The Input and output connectivity can both be pegged at 100 Gbps.
● AI Developer: The AI Developer shall have total control over the data he wishes to upload and the model he
wishes to train. He can do the uploads only to the repository. He can be connected to the repository at 10/ 1
Gbps.
o Functionality: He/she can retrieve the trained models, after processing on the large machine, only from the repository, and can access the data from the mirrors for research work. The connectivity to the mirrors can be at 10/1 Gbps. The developer shall have very limited access to the Large Computer: jobs there can use data and models from the repository, and the output/trained model will be staged back to the repository and kept for a finite amount of time to be downloaded by the AI Developer. The AI Developer thus has very limited access to the large system and can only run some scripts. Normal 1 to 10 Gbps connectivity will suffice.
● Consumer: Consumer can avail the AI services from the AI Market place cloud as if he is accessing a typical web-
service on the device.
o Connectivity: The connectivity can be through any of the media like wired/ Wireless/ Satellite at the speeds
he wishes.
Sovereign cloud refers to a cloud computing infrastructure that is owned, managed, and operated within the legal
jurisdiction of a specific country. This type of cloud computing setup often adheres to the regulations, standards, and
legal requirements of that country. For an Indian sovereign cloud, let us focus on its potential structure and management within the context of India’s regulatory environment and technological capabilities.
Figure 8: AI Cloud Market Place
PRE-REQUISITES
To start with, there would be several key pre-requisites for setting up an Indian sovereign cloud:
• Legal Framework: It would be important to have a comprehensive legal and regulatory framework for data protection and cybersecurity in place. India has been putting such a framework in place; the Digital Personal Data Protection Bill, since enacted as the Digital Personal Data Protection Act, 2023, is one such legislation.
• Infrastructure: Robust physical and digital infrastructure would be crucial. This would include data centers
located within India, as well as high-speed internet connectivity and reliable power supply.
• Technological Capabilities: Setting up a sovereign cloud would require significant technical expertise. This would
include expertise in cloud computing technologies, cybersecurity, and data management, among others.
• Funding: The establishment and maintenance of a sovereign cloud could be a costly endeavour. Securing
adequate funding would therefore be crucial.
• Framework: The sovereign cloud could be set up as a multi-cloud architecture. This would involve utilising a
combination of different cloud services, potentially from different providers, in a way that they can be managed
as a single, unified system.
• Public-Private Partnerships: The establishment of a sovereign cloud might involve partnerships between the
Indian government and private companies, particularly cloud service providers. These partnerships could help to
leverage the technical expertise and resources of the private sector.
• Data Localization: One of the key aspects of a sovereign cloud would be data localization. This means that all
data stored in the cloud would have to be physically located within India. This could be a requirement both for
government data and for private sector data.
• Security: Security would be a key concern for the sovereign cloud. This would involve implementing strong
cybersecurity measures, as well as regular audits and checks to ensure the security and integrity of the data.
• Accessibility: Accessibility should be a key consideration in the design and implementation of the sovereign
cloud. This aspect involves ensuring the infrastructure can be accessed by various stakeholders from different
parts of the country and at different times, considering the geographic diversity and time zone differences in
India. Accessibility also extends to the user interface, which should be user-friendly and designed in a way that
users with varying levels of technical expertise can utilize the services provided.
• Monitoring: Regular monitoring of the cloud infrastructure would be crucial to ensure that it is functioning
properly, and that data is being managed appropriately.
• Maintenance: Regular maintenance of the physical and digital infrastructure would be crucial to ensure the
reliability and performance of the cloud.
• Security Management: Regular security checks and updates would be necessary to protect against cyber threats.
This could also involve monitoring for potential security breaches and responding to them promptly.
• Compliance: Ensuring compliance with the relevant regulations and standards would be a key aspect of the day-
to-day management of the sovereign cloud. This could involve regular audits and checks.
• Resource Management: Managing the resources of the cloud, such as storage and processing power, would be
a key task. This could involve allocating resources to different users and applications and adjusting these
allocations as necessary.
• Disaster Recovery and Continuity Planning: Planning for potential disasters, such as natural disasters or cyber-
attacks, would be a crucial aspect of managing the sovereign cloud. This could involve developing and
implementing disaster recovery plans, as well as strategies for maintaining continuity of service in the event of a
disaster.
To conclude, while the establishment and management of an Indian sovereign cloud would be a complex and potentially costly endeavour, it could offer significant benefits in terms of data protection, cybersecurity, and digital sovereignty, provided a strong legal framework underpins it.
The cloud management layer for such an environment should provide:
● A centralized access point for users, cloud administrators, and tenants to carry out cloud management and provisioning functions.
● Provisioning and orchestration of HPC-AI clusters and conventional compute for inference and hosting, in the form of containers and VMs.
● Orchestration capability for priority-based resource allocation, workload scheduling, and pre-emption if needed, as well as pre-specified segregation of workloads arising from security/privacy mandates.
● Capabilities for auditing security vulnerabilities and compliance requirements.
● Telemetry that, unlike on traditional cloud platforms, can capture utilization metrics for virtual GPUs, the InfiniBand fabric, and HPC storage.
● The ability for tenants to dynamically provision and scale HPC-AI clusters up or down in terms of nodes/GPUs, HPC storage, and the compute communication fabric.
The proposed cloud environment can be established with an OpenStack-based open-source cloud suite.
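As a hedged illustration of what tenant-side provisioning on such a suite might look like, the sketch below uses the openstacksdk Python client; the cloud profile, flavor, image, and network names are illustrative assumptions, not actual IndiaAI resources.

```python
# Minimal sketch: provisioning a GPU worker VM on an OpenStack-based cloud
# with the openstacksdk client. The cloud profile "indiaai" and the flavor,
# image, and network names below are assumed placeholders.
import openstack

conn = openstack.connect(cloud="indiaai")  # credentials read from clouds.yaml

server = conn.compute.create_server(
    name="hpc-ai-worker-01",
    flavor_id=conn.compute.find_flavor("gpu.large").id,      # assumed flavor
    image_id=conn.image.find_image("ubuntu-22.04-cuda").id,  # assumed image
    networks=[{"uuid": conn.network.find_network("tenant-net").id}],
)
server = conn.compute.wait_for_server(server)  # block until the VM is ACTIVE
print(server.status)
```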
● AI Platform as a Service (AI PaaS)
AI PaaS provides a managed platform with pre-configured tools and services for building, training, and deploying AI models.
o Consuming User: Users benefit from a higher level of abstraction with AI PaaS, allowing them to focus on building AI models and applications without the need to manage the infrastructure. However, this comes with less control over the underlying resources.
o Implementing Agency: The agency’s responsibilities extend to managing both infrastructure and AI
development tools & services. They offer a platform with pre-configured AI tools and libraries, ensuring
that users have access to the latest AI technologies.
● AI Services
AI Services (or AI as a Service - AIaaS) are pre-built AI solutions that can be integrated into applications without
the need to build models from scratch. Examples include natural language processing, image recognition, and
chatbots.
o Consuming User: Users can readily incorporate AI functionality into their applications with AI Services.
This model requires the least technical expertise, as it eliminates the need for AI model development
and training.
o Implementing Agency: The agency not only manages the infrastructure and platform but also builds,
trains, and maintains the AI models. They provide APIs or SDKs for users to integrate the AI services
into their applications.
A typical Service Level Agreement (SLA) between the cloud provider and its users should cover the following:
● Availability: This is often referred to as "uptime" and is usually expressed as a percentage. For example, a provider may guarantee that services will be available 99.9% of the time, meaning services may be down for the remaining 0.1%, roughly 8.8 hours per year (a worked example follows this list).
● Performance: The SLA may specify performance metrics that the services are expected to meet, such as
response times or throughput.
● Security: The SLA should outline the security measures that the provider has in place to protect your data. This
could include details about data encryption, firewalls, and compliance with security standards.
● Data Protection and Backup: The SLA should explain how the provider will protect your data, including details
about data backup and recovery procedures.
● Support: The SLA should specify the level of support provided, including response times for support requests
and hours of availability.
● Penalties for Non-Compliance: The SLA should define the penalties that apply if the provider fails to meet its terms.
● Disaster Recovery: The SLA should outline the provider's disaster recovery plan, including how quickly services
can be restored after a disaster.
A significant challenge is providing the necessary infrastructure and compute resources to support AI applications and
large-scale data processing.
• Countermeasures:
o Establish public-private partnerships to share the costs and resources needed for building data centers and AI
compute infrastructure.
o Leverage cloud-based AI services and platforms to distribute compute resources and minimize the upfront
investment in hardware.
o Invest in the development of domestic semiconductor and hardware industries to reduce dependency on
imports and enhance the country’s AI capabilities.
Ensuring data privacy and security is crucial when dealing with sensitive information in sectors like healthcare, finance,
and government.
• Countermeasures:
o Implement strong data protection regulations and guidelines to ensure data privacy and security.
o Develop secure data storage and processing facilities with robust access control mechanisms.
o Encourage the use of privacy-preserving AI techniques, such as federated learning or differential privacy, to minimize the risk of data breaches and protect sensitive information (a minimal illustration follows this list).
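As a minimal illustration of one such technique, the sketch below applies the Laplace mechanism from differential privacy to a count query; the count and parameters are illustrative assumptions.

```python
# Laplace mechanism (epsilon-differential privacy): add noise calibrated to
# the query's sensitivity before releasing a statistic, so that no individual
# record can be confidently inferred from the output.
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a noisy count; a smaller epsilon gives stronger privacy."""
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# e.g., number of patients with a given condition in a health dataset (assumed)
print(dp_count(true_count=1342, epsilon=0.5))
```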
A lack of skilled AI professionals and researchers can hinder the development and adoption of AI technologies in the
country.
• Countermeasures:
o Strengthen AI education and training programs at universities and technical institutes.
o Collaborate with international research institutions and organizations to promote knowledge exchange and
capacity building.
o Establish dedicated AI research labs and innovation centers to attract top talent and foster a culture of
innovation.
Integrating AI solutions across different sectors and organizations requires seamless interoperability and adherence to
common standards.
• Countermeasures:
o Develop and promote the adoption of open standards and APIs for AI applications and data sharing.
o Encourage the use of open-source AI frameworks and platforms to ensure compatibility and ease of
integration.
o Create cross-sector working groups to identify common challenges and collaboratively develop AI solutions.
Addressing ethical and legal concerns around AI technologies is vital to ensure their responsible and fair use.
• Countermeasures:
o Develop a national AI strategy and ethical guidelines to govern the development, deployment, and use of
AI technologies.
o Establish regulatory bodies and oversight mechanisms to monitor AI applications and ensure compliance
with ethical principles and legal requirements.
o Encourage public dialogue and stakeholder engagement to create awareness about AI ethics and promote
responsible AI development.
Current Landscape
• Intel: Intel introduced oneAPI, a unified programming model intended to simplify development across diverse architectures. The goal is to support not only Intel's CPUs, GPUs, and FPGAs, but also devices from other vendors. However, its adoption and efficiency on non-Intel hardware remain to be seen.
• NVIDIA: NVIDIA's CUDA is a proprietary parallel computing platform and API model. It is currently the most popular framework for GPU-accelerated applications, but it supports only NVIDIA GPUs.
• AMD: AMD backs the open-source ROCm platform aimed at HPC and AI applications, and is also a strong supporter of OpenCL, a framework for writing programs that execute across heterogeneous platforms.
The development of a universal GPU API would require agreement and cooperation between these leading vendors (OpenCL already provides a portable baseline; see the sketch after this list). Such an API would need to:
• Be Hardware Agnostic: The API should allow developers to write code that can run on GPUs from NVIDIA, Intel,
AMD, and potentially others without requiring significant modifications.
• Maintain High Performance: One of the challenges is ensuring the API can extract high performance from all
supported GPUs, which each have different architectures and features.
• Support Advanced Features: The API should support the latest features and capabilities of modern GPUs, such
as ray tracing or tensor cores, which can vary significantly between vendors.
High-Performance Computing (HPC) and Artificial Intelligence (AI) are converging to provide powerful solutions for
complex applications, data analysis, and decision-making processes. To harness the full potential of this convergence,
developers rely on AI frameworks, libraries, and Software Development Kits (SDKs) to build and deploy efficient AI models
on HPC infrastructure.
AI Frameworks: AI frameworks are software libraries that provide developers with an environment to create, train, and
deploy AI models. These frameworks simplify the development process by offering pre-built components, abstracting
complex mathematical operations, and optimizing model performance on HPC infrastructure. Examples of popular AI
frameworks include TensorFlow, PyTorch, and Apache MXNet.
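For instance, a few lines of PyTorch, one of the frameworks named above, define and train a small network while the framework abstracts differentiation, optimization, and device placement; a minimal sketch on synthetic data:

```python
# Minimal PyTorch training step: the framework handles gradients, the
# optimizer update, and CPU/GPU placement.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 10, device=device)  # synthetic input batch
y = torch.randn(64, 1, device=device)   # synthetic targets

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()                          # autograd computes all gradients
optimizer.step()
print(f"loss: {loss.item():.4f}")
```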
AI Libraries: AI libraries are specialized collections of algorithms and functions that developers can use to implement AI
functionality in their applications. These libraries often focus on specific domains, such as machine learning, deep
learning, computer vision, or natural language processing. By leveraging these libraries, developers can accelerate the
development process and ensure compatibility with HPC systems. Examples of AI libraries include scikit-learn, OpenCV,
and NLTK.
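For example, with scikit-learn, one of the libraries named above, a developer can train and evaluate a classical model in a handful of lines; a minimal sketch on a bundled dataset:

```python
# Train and evaluate a classifier without implementing any algorithm by hand.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```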
Software Development Kits (SDKs): SDKs are comprehensive packages that include the libraries, tools, and documentation needed to develop applications for a specific platform or service. In the context of HPC-AI systems, SDKs provide developers with the resources to build and optimize AI applications for HPC environments, taking advantage of the underlying hardware and software infrastructure. Examples of AI-related SDKs include CUDA, the Intel AI Analytics Toolkit, and OpenCL. Leveraging open-source SDKs over proprietary software should be a high priority, given their transparency, community collaboration, and continuous improvement.
• Vendor Cooperation: NVIDIA, Intel, and AMD are direct competitors, and each has invested heavily in their
proprietary or favored technologies. Coordinating between them to support a single API could be difficult.
• Scalability: The framework must scale up or down instantly to accommodate workload changes; incompatibility between different hardware architectures can make it less dynamic.
• Performance Optimization: Creating a truly "write once, run anywhere" API that also delivers optimal
performance on all platforms is a complex task due to the architectural differences between GPUs.
• Adoption: Even if a single API could be developed, it would need to be adopted by developers, which could be
a slow process if the new API doesn’t offer clear advantages over existing solutions.
Mechanisms for AI adoption among relevant stakeholders like start-ups, research labs, and academic institutions: The widespread adoption of AI compute infrastructure is essential for driving innovation and progress in various fields. Various mechanisms can be conceptualized to ensure the adoption of compute infrastructure, as summarized in Table 3.
Table 3: AI adoption mechanisms for stakeholders
• Access to infrastructure and tools (start-ups, research labs, academic institutions): Providing access to compute infrastructure and tools can help to reduce the cost of developing and deploying AI models.
• Data sets (start-ups, research labs, academic institutions): Ensuring the availability of, and providing access to, high-quality labelled data sets is essential for the development and training of AI models.
• Financial incentives (start-ups, research labs, academic institutions): Providing financial incentives, such as grants and tax breaks, can help to reduce the cost of adopting AI and make it more feasible for businesses and organizations to invest in AI.
• Regulatory and policy support (government): Governments can support the adoption of AI compute infrastructure by providing regulatory and policy support, such as tax breaks for businesses that invest in AI or regulations that promote the use of AI in specific sectors.
• Competitions and challenges (start-ups, research labs): Hosting competitions and challenges can help to stimulate innovation in the field of AI.
• Innovation hubs and centers of excellence (start-ups, research labs, academic institutions): Creating innovation hubs and centers of excellence can help to bring together stakeholders from different disciplines to collaborate on AI projects and accelerate the development of new AI solutions.
• Awareness and education (start-ups, research labs, academic institutions): Raising awareness of the potential benefits of AI, and providing education and training on how to use AI, can help to encourage more businesses and organizations to adopt AI.
• Training and capacity building (start-ups, research labs, academic institutions): Providing training and capacity building on how to develop and deploy AI models can help to ensure that stakeholders have the skills and knowledge they need to use AI effectively.
These mechanisms are essential for ensuring the adoption of AI compute infrastructure by relevant stakeholders. By
ensuring that these mechanisms are in place, organizations can increase the likelihood of success with their AI initiatives.
Metrics and evaluation frameworks to measure the impact of the AI compute infrastructure on start-ups and innovation.
There are a number of metrics and evaluation frameworks that can be used to measure the impact of compute
infrastructure on start-ups and innovation. Some of the most common metrics include:
SPMIND is a solution provider maturity index for AI. It helps start-ups and SMEs identify trusted AI solution providers who can help them implement AI solutions [12].
The SPMIND index consists of six pillars, each with requirements to assess whether the solution provider has achieved a satisfactory level of AI maturity.
Source: https://gpaidemo2.kinsta.cloud/spmind/
AIMIND: Artificial Intelligence Maturity Index
AIMIND is a measure of AI readiness assessment for SMEs. It helps SMEs assess their current AI maturity and identify areas where they can improve [13]. Its dimensions include:
- Organisational Readiness
- Data Readiness
- Infrastructure Readiness
Measures for utilizing AI compute for scaling up and commercialization of AI solutions.
• Distributed and parallel computing: Implement distributed and parallel computing to accelerate AI model training and deployment.
• Hardware acceleration: Leverage specialized AI hardware like GPUs, TPUs, or FPGAs for faster training and inferencing.
• Pre-trained models and transfer learning: Utilize pre-trained models and transfer learning to reduce training time and computational resources.
• AutoML and hyperparameter optimization: Employ AutoML tools and hyperparameter optimization techniques for improved model performance and resource efficiency.
• Containerization and microservices: Adopt containerization and microservices for enhanced scalability and maintainability of AI solutions.
• API-driven integration: Develop API-driven AI solutions for seamless integration with existing systems and faster commercialization.
• Monitoring and management tools: Utilize monitoring and management tools provided by AI compute platforms to track resource usage, performance, and costs.
• Collaboration and partnerships: Foster collaborations and partnerships with industry players and research institutions to drive AI commercialization.
MONETIZATION & COMMERCIALIZATION MODEL FOR AI COMPUTE INFRASTRUCTURE
Setting up an AI compute infrastructure to establish a secure distributed data grid requires significant resources, both in
terms of capital expenditure and operational costs. However, once established, there are several ways to monetize and
commercialize this infrastructure.
AI AS A SERVICE (AIAAS)
AI as a Service (AIaaS) is a cloud-based service that provides businesses with access to artificial intelligence capabilities without the need to invest in or maintain their own AI infrastructure. AIaaS can be used for a variety of purposes, including natural language processing, computer vision, and speech recognition.
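A consuming application would typically integrate such a service through a simple HTTP API. The sketch below is purely illustrative: the endpoint URL, authentication header, and response fields are hypothetical placeholders, not an actual IndiaAI or vendor API.

```python
# Hypothetical AIaaS call: send text to a hosted sentiment-analysis endpoint.
import requests

API_URL = "https://api.example-aiaas.in/v1/sentiment"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                                 # placeholder credential

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": "The new service portal is fast and easy to use."},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g., {"label": "positive", "score": 0.97}
```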
PRICING MODEL
The pricing model for the National AI Compute Grid should consider the cost of infrastructure, maintenance, support, and upgrades. Different pricing models can be adopted (a simple cost comparison follows this list):
● Pay-as-you-go: Users pay only for the compute resources they use. This model is flexible and can accommodate
users with varying needs.
● Subscription-based: Users pay a fixed amount periodically to access the grid’s resources. This model can provide
a steady income stream and is suitable for users with consistent usage patterns.
● Tiered pricing: Different levels of service are provided at different price points. This model can cater to a broad
range of users, from small-scale to large-scale users.
● Subsidies
Subsidies can encourage the use of the AI compute grid, particularly for startups, research institutions, and small
businesses. These could include:
o Reduced Pricing: Offering discounted prices for eligible users can make the grid more accessible.
o Grants: Providing grants for specific projects can encourage innovative uses of the grid.
o Free Trial Period: Allowing users to try the grid for free for a limited period can attract new users.
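To illustrate how the models compare, the sketch below computes a user's monthly bill under each scheme; all tariffs and tier boundaries are hypothetical placeholders, since actual grid pricing would be set by policy.

```python
# Compare hypothetical monthly costs under the three pricing models above.
PAYG_RATE = 150.0            # assumed ₹ per GPU-hour, pay-as-you-go
SUBSCRIPTION_FEE = 80_000.0  # assumed ₹/month covering up to 1,000 GPU-hours

def monthly_cost(gpu_hours: float) -> dict:
    payg = gpu_hours * PAYG_RATE
    # Subscription: flat fee, with overage billed at the pay-as-you-go rate.
    overage = max(0.0, gpu_hours - 1000) * PAYG_RATE
    subscription = SUBSCRIPTION_FEE + overage
    # Tiered: cheaper unit rates at higher usage volumes (assumed tiers).
    tiers = [(500, 140.0), (2000, 120.0), (float("inf"), 100.0)]
    rate = next(r for cap, r in tiers if gpu_hours <= cap)
    return {"pay_as_you_go": payg, "subscription": subscription,
            "tiered": gpu_hours * rate}

for hours in (200, 1000, 5000):
    print(hours, monthly_cost(hours))
```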
After thorough research, analysis, and collaboration among the working group members, we have identified key insights
and propose the following actionable recommendations:
HIGH-END COMPUTE
o AI Training: 1 Centre with 10,000 GPUs with 40 Exaflops performance.
o High-Performance Storage: 200 PB.
MID-RANGE COMPUTE
o AI Training: 4 Centres with 750 GPUs each, totalling 3,000 GPUs with 12 Exaflops performance.
o AI Inferencing: 4 Centres with 1,000 GPUs each, totalling 4,000 GPUs with 8.8 Exaflops performance.
o High-Performance Storage: 400 PB.
EDGE COMPUTE
o AI Training: 12 Centres with 125 GPUs each, totalling 1,500 GPUs with 6 Exaflops performance.
o AI Inferencing: 12 Centres with 500 GPUs each, totalling 6,000 GPUs with 13.2 Exaflops performance.
o High-Performance Storage: 240 PB
TOTAL (cross-checked in the short calculation below)
o AI Training Performance: 14,500 GPUs with 58 Exaflops.
o AI Inferencing Performance: 10,000 GPUs with 22 Exaflops.
o High-Performance Storage: 840 PB.
o Power Requirement: 45 MW.
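As a consistency check, the aggregate figures above can be re-derived from the per-centre numbers; a short calculation:

```python
# Re-derive the totals from the per-tier figures quoted above:
# (number of centres, GPUs per centre, total Exaflops for the tier).
training = {"high_end": (1, 10_000, 40.0),
            "mid_range": (4, 750, 12.0),
            "edge": (12, 125, 6.0)}
inferencing = {"mid_range": (4, 1_000, 8.8),
               "edge": (12, 500, 13.2)}

train_gpus = sum(c * g for c, g, _ in training.values())     # 14,500
train_ef = sum(e for _, _, e in training.values())           # 58.0
infer_gpus = sum(c * g for c, g, _ in inferencing.values())  # 10,000
infer_ef = sum(e for _, _, e in inferencing.values())        # 22.0
storage_pb = 200 + 400 + 240                                 # 840

print(train_gpus, train_ef, infer_gpus, infer_ef, storage_pb)
```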
• Ensure scalable and robust infrastructure to handle high compute requirements for AI workloads.
• Establish Secure Distributed Data Grid with a speed of 200/400 Gbps, operated under strict SLAs using a PPP model.
• Establish AI innovation hubs in various parts of the country to nurture AI startups, encourage AI innovations, and
promote collaborations between researchers, entrepreneurs, and businesses.
• Improve the underlying digital infrastructure, such as broadband connectivity, to support the use of AI technologies,
especially in remote and rural areas.
• Attract private sector investment in AI-based infrastructure. This could be achieved through incentives such as tax
breaks, subsidies, or public-private partnership models.
● It is recommended to prioritize AI use cases for Governance, Agriculture, Health, Education, and Finance, in that order.
● Support AI adoption by stakeholders such as government (ministries and departments), academia, research labs, and startups in these domains.
● Inclusive AI: Develop strategies to ensure that AI technologies are accessible and beneficial to all sections of society,
including those with disabilities or those living in remote areas.
● Real-time Data Monitoring: Implement real-time monitoring systems to track the performance and impact of AI
systems on different sectors. This will provide valuable insights for improving AI systems and strategies.
API MANAGEMENT
● Address the issue of differing costs of APIs and management of per API costs for users in AI marketplaces.
● Implement transparent pricing models and clear guidelines for cost management.
CAPACITY BUILDING AND COLLABORATION
● Facilitate AI Education and Training: Promote AI learning and development programs to cultivate AI-ready talent.
Engage with educational institutions to incorporate AI studies in their curriculum. Establish training programs to
upskill the existing workforce.
● Cross-Sector Collaboration: Foster collaborations between different sectors, such as healthcare, agriculture, and
transportation, to leverage AI technologies across various applications. This can lead to the development of cross-
sector AI solutions.
● Public Awareness and Trust: Implement initiatives to increase public awareness and understanding of AI
technologies, promoting a more AI-accepting culture. This could involve information campaigns, workshops, and
seminars.
REFERENCES
WORKING GROUP 7:
CHAIRMAN:
MEMBERS:
PREAMBLE
Artificial Intelligence (AI) is profoundly shaping society, revolutionizing industries and daily life. AI-driven automation
enhances productivity but raises concerns about job displacement. In healthcare, AI improves diagnostics and
personalized treatment, though ethical issues like data privacy persist. Communication is redefined by AI-powered
language translation and recommendation systems, but authenticity concerns emerge. AI's role in autonomous vehicles
offers safer transportation, yet regulatory adjustments are vital. Ethical considerations, such as algorithmic bias,
necessitate responsible AI development. Overall, AI's far-reaching impact underscores the need for balanced integration
to ensure a prosperous and equitable future for society.
AI compute, the computational power harnessed by artificial intelligence systems, is a driving force behind the rapid
advancements in AI technology. It refers to the capacity of hardware, such as processors and GPUs, to perform the
complex calculations required for AI tasks. The evolution of AI compute has brought about transformative changes,
enabling AI models to process massive amounts of data and perform intricate tasks that were previously unimaginable.
The importance of AI compute cannot be overstated. It has propelled AI research and development, allowing for the
training of deep neural networks and the creation of sophisticated AI models. More AI compute power has led to
breakthroughs in natural language processing, image recognition, autonomous vehicles, drug discovery, and more.
AI compute has also democratized AI innovation by making powerful hardware more accessible, enabling researchers,
startups, and organizations of varying sizes to contribute to the field. Additionally, the increase in AI compute has
facilitated the scaling of AI applications, leading to broader adoption in industries ranging from healthcare and finance to
entertainment and manufacturing.
As AI tasks become more complex and datasets grow larger, the ongoing advancement of AI compute remains crucial.
Continued improvements in hardware and computing architectures will fuel further innovation, drive AI capabilities to
new heights, and unlock novel applications that have the potential to reshape industries and societies.
AI chipsets, often referred to as AI accelerators or neural processing units (NPUs), are specialized hardware components
designed to efficiently perform the complex computations required for artificial intelligence tasks. These chipsets are
engineered to handle the unique demands of AI workloads, such as deep learning, machine learning, and neural network
processing.
The significance of AI chipsets lies in their ability to significantly enhance the performance and efficiency of AI applications.
Unlike traditional CPUs and GPUs, AI chipsets are optimized for parallel processing and matrix calculations, which are
fundamental to AI computations. This specialization results in faster inference and training times, enabling real-time
decision-making and accelerating the development of AI models.
AI chipsets have far-reaching implications across various sectors. In healthcare, they power image analysis and diagnostic tools. In autonomous vehicles, they enable rapid and accurate perception of the environment. In financial services, AI chipsets facilitate fraud detection and algorithmic trading. Moreover, they contribute to advancements in natural language processing, recommendation systems, and robotics.
The evolution of AI chipsets drives innovation by making AI technology more accessible and efficient. As the demand for
AI applications continues to grow, AI chipsets play a pivotal role in enabling the deployment of AI solutions at scale. Their
ongoing development and integration into a wide range of devices and systems promise to reshape industries and improve
our daily lives.
India today is witnessing a surge in AI startups working on innovative solutions across different industries. These startups are developing AI-powered products and services, ranging from chatbots and virtual assistants to chipsets, for almost every sector.
The purpose of this report is to provide a comprehensive overview of the Chipsets for AI and the sectors of opportunities
for Indian start-ups in this area.
BACKGROUND
The historical patterns of scaling in semiconductor process technology are experiencing a notable deceleration. Following decades of impressive adherence to Moore's law, which holds that the number of transistors on an integrated circuit approximately doubles every two years, the rate of transistor scaling has decreased in recent times.
Moreover, the concept of Dennard scaling, which anticipated a consistent power consumption per unit chip area with
rising transistor density, is also faltering.
During their 2018 Turing lecture, renowned computer architects John Hennessy and David Patterson noted that slowdowns in process technology innovation would steadily increase the incentive for architectural innovations, that is, in the way integrated circuits are designed to perform computational tasks. They posited that the inherent inefficiencies of broad-spectrum compute architectures, such as central processing units (CPUs), would gradually give way to, or harmonize with, the enhanced computational potency and cost-effectiveness offered by architectures fine-tuned for particular computational tasks. These specialized architectures, referred to as domain-specific architectures (DSAs), are optimized to efficiently handle specific computational workloads. Hennessy and Patterson's observations underscore the evolving landscape of computer design, where the focus shifts towards creating purpose-built solutions tailored to distinct tasks, potentially unlocking higher performance and efficiency for specific applications.
Simultaneously, the widespread adoption of computing and digital transformation is permeating numerous sectors, including cloud computing (spanning AI and high-performance tasks), networking, edge computing, the Internet of Things (IoT), and self-driving systems. In this context, finely tuned computational tasks specific to each domain are on the rise, presenting domain-specific architectures (DSAs) with ample opportunities to deliver substantial performance enhancements.
Consider the significant impact of large language models, which serve as pivotal components in generative AI. These models, like ChatGPT, bring about heightened specialization within AI workloads, operating at a remarkably large scale.
This trend could potentially drive a deeper level of hardware customization and specialization, further refining the
hardware components to cater to the unique demands of these specialized AI tasks.
The result is a dynamic landscape where DSAs stand to become even more vital, offering the potential to unlock
remarkable levels of efficiency and effectiveness in various computational contexts.
The substantial commercial promise of Domain-Specific Accelerators (DSAs), encompassing both hardware and software tailored for particular application domains, cannot be overstated. Notably, Graphics Processing Units (GPUs) have captured a significant portion of the market from the edge to data centres, with most of the techniques explained in the previous chapter supported on their platform, with major contributions from open-source developers (i.e., the same GPU, or versions with its features scaled up or down, can now be used for AI, HPC, IoT/edge, analytics, autonomous vehicles/robotics, and quantum simulations). The same holds for tensor processing units (TPUs), which scale well for AI but are limited to one platform, available for consumption only on one cloud service provider. Their dominance stems from their ability to surpass CPUs in scenarios demanding extensive parallelization, as observed in AI workloads involving learning and inferencing. The enhancements in performance are remarkable, often resulting in workload-specific accelerations ranging from 15 to 500 times the norm. In the automotive sector, preeminent providers offer tailored solutions that furnish the low-latency, high-performance inference capabilities essential for the safe advancement of autonomous driving.
Observing the proliferation of DSAs into diverse application domains, the projections indicate that the revenue attributed
to DSAs is poised to reach around $90 billion by 2026. This impressive figure represents approximately 10 to 15 percent
of the global semiconductor market, a notable ascent from the roughly $40 billion recorded in 2022. Consequently, the
notable surge in venture capital investments directed towards domain-specific design start-ups is hardly unexpected. A
staggering $18 billion in funding has provided sustenance to approximately 150 start-ups collectively over the past decade.
This stands in stark contrast to the preceding ten years, during which hardware investments were overshadowed by a
preference for software-focused ventures.
Over the past decade, the landscape of AI has been significantly shaped by the integration of machine learning, particularly
through the utilization of intricate deep neural networks. The successful deployment of deep neural networks took root
in the early 2010s, made possible by the escalated computational prowess of contemporary computing hardware.
Pioneering this path, AI hardware has emerged as a groundbreaking iteration of custom-designed hardware tailored
exclusively to meet the demands of machine learning tasks.
With the continued proliferation of artificial intelligence and its myriad applications, the trajectory becomes evident: a
heightened competition is poised to intensify among industry behemoths, each vying to engineer more cost-effective and
swifter chips. This pursuit of innovation in chip technology is an inevitable consequence of AI's expanding reach across
various domains.
AI chips, often referred to as AI hardware or AI accelerators, constitute a distinct category of purpose-built accelerators
meticulously crafted to cater to the requirements of applications centered around artificial neural networks (ANNs). The
focal point of these AI chips predominantly aligns with the realm of deep learning applications, which overwhelmingly
dominates the landscape of commercial ANN implementations.
General-purpose hardware uses arithmetic blocks for basic in-memory calculations. Its largely serial processing does not deliver sufficient performance for deep learning techniques: neural networks require many simple arithmetic operations in parallel, and powerful general-purpose chips cannot support a high number of simple, simultaneous operations. AI-optimized hardware instead includes numerous less powerful processing elements, which enables parallel processing.
AI accelerators bring the following advantages over general-purpose hardware: (a) Faster computation: Artificial intelligence applications typically require parallel computational capabilities in order to run sophisticated training models and algorithms. AI hardware provides more parallel processing capability, estimated to deliver up to 10 times more computing power in ANN applications than traditional semiconductor devices at similar price points. (b) High-bandwidth memory: Specialized AI hardware is estimated to allocate 4-5 times more bandwidth than traditional chips. This is necessary because, due to the need for parallel processing, AI applications require significantly more bandwidth between processors for efficient performance.
AI chips represent a class of highly specialized silicon chips meticulously fine-tuned to excel in the realm of AI workloads.
Their architecture is finely attuned to swiftly handle substantial volumes of data, rendering them exceptionally well-suited
for intricate tasks like machine learning, deep learning, and the processing of neural networks. Unlike their general-
purpose counterparts, the conventional CPUs, AI chips are meticulously crafted to tackle the distinctive intricacies of AI
computations. This strategic specialization empowers them to exhibit unparalleled performance in the arena of AI
applications, frequently outshining CPUs by a substantial degree.
Beyond their role in optimizing computing system performance, AI chips also play a pivotal role in enhancing energy
efficiency. Traditional CPUs are notorious for their power-hungry nature, leading to elevated operational expenses and
environmental repercussions stemming from increased power consumption. In stark contrast, AI chips are meticulously
crafted with energy efficiency in mind, curbing power usage while maintaining robust performance. This distinctive
feature positions them as a sustainable alternative for high-performance computing needs.
Despite the array of benefits they offer, AI chips do encounter certain obstacles. A significant hurdle pertains to the
substantial expenses linked with their development and manufacturing, rendering them potentially unaffordable for
certain applications. Nonetheless, with the onward march of technology and the eventual realization of economies of
scale, the cost associated with AI chips is projected to decrease. This anticipated reduction in cost will ultimately render
AI chips more attainable, catering to a broader spectrum of users.
AI chips employ innovative architectures to enhance their performance, which has been categorized from the most
prevalent to the emerging methodologies as below:
I. GPUs: Graphics Processing Units were originally designed to accelerate graphics processing via parallel computing. The same approach is also effective for training and inferencing deep learning applications, and GPUs are currently among the most common hardware used by deep learning and machine learning software developers. The most popular globally referenced AI benchmarks, such as MLPerf from MLCommons, have demonstrated record results on a variety of AI workload benchmarks run on large GPU clusters connected through high-speed, low-latency networks.
II. Reconfigurable neural processing unit (NPU): This architecture provides parallel computing and pooling to increase overall performance. It is specialized for Convolutional Neural Network (CNN) applications, a popular architecture for Artificial Neural Networks (ANNs) in image recognition. Kneron, a San Diego and Taipei based low-power edge-AI startup, licenses the reconfigurable NPU architecture on which its chips are based. Because the architecture can be reconfigured to switch between models in real time, the hardware can be optimized to the needs of the application. The US National Institute of Standards and Technology (NIST) recognized Kneron's facial recognition model as the best-performing model under 100 MB.
III. Neuromorphic chip architectures: These are an attempt to mimic brain cells using novel approaches from
adjacent fields such as materials science and neuroscience. These chips can have an advantage in terms of speed
and efficiency on training neural networks. Intel has been producing such chips for the research community since
2017 under the names Loihi and Pohoiki.
IV. Analog memory-based technologies: Digital systems built on 0’s and 1’s dominate today’s computing world.
However, analog techniques contain signals that are constantly variable and have no specific ranges. IBM
research team demonstrated that large arrays of analog memory devices achieve similar levels of accuracy as
GPUs in deep learning applications.
V. Wafer-scale chips: For example, Cerebras is building wafer-scale chips: a 46,225-square-millimetre (~72 square inch) silicon wafer containing 1.2 trillion transistors on a single chip. Thanks to its large area, the chip has 400,000 processing cores. Such large chips exhibit economies of scale but present novel materials science and physics challenges.
AI chipsets are used across a wide range of industries and applications, where they bring significant performance
improvements and efficiency gains to various tasks. Here are some of the key areas where AI chipsets are used:
1. Data Centers and Cloud Computing: AI chipsets are extensively used in data centers to accelerate AI workloads,
including training and inference for deep learning models. They enable faster and more efficient processing of
large datasets, making cloud-based AI services more powerful and responsive.
2. Edge Computing: AI chipsets are deployed in edge devices such as smartphones, smart cameras, drones, and IoT
devices. This allows real-time processing of AI tasks directly on the device, reducing latency and enhancing privacy
by minimizing the need for data transmission to centralized servers.
3. Autonomous Vehicles: AI chipsets play a crucial role in enabling advanced driver assistance systems (ADAS) and
self-driving cars. They process sensor data in real time, making split-second decisions for navigation, collision
avoidance, and situational awareness.
4. Healthcare: AI chipsets are used in medical imaging for tasks like image analysis, diagnosis, and anomaly
detection. They also enable personalized medicine by analyzing patient data to predict and recommend
treatments.
5. Natural Language Processing (NLP): AI chipsets power language processing tasks such as speech recognition,
sentiment analysis, language translation, and chatbots, making interactions with computers and devices more
natural and intuitive.
6. Finance: AI chipsets are employed in algorithmic trading, fraud detection, credit scoring, and risk assessment.
They analyze vast amounts of financial data to make informed decisions and predictions.
7. Manufacturing and Industry 4.0: AI chipsets enhance automation and predictive maintenance in manufacturing,
optimizing production processes and minimizing downtime through real-time data analysis.
8. Agriculture: AI chipsets are used for precision agriculture, where they analyze data from sensors, drones, and
satellites to optimize crop management, irrigation, and yield prediction.
9. Energy Management: AI chipsets contribute to smart energy grids by analyzing data from sensors to optimize
energy distribution, monitor usage, and predict maintenance needs.
10. Retail: AI chipsets power recommendation systems, inventory management, customer analytics, and cashierless
checkout systems in retail environments.
11. Security and Surveillance: AI chipsets enable intelligent video analytics for security and surveillance, identifying
suspicious activities and objects in real time.
12. Environmental Monitoring: AI chipsets process data from sensors to monitor and manage pollution levels,
weather patterns, and natural disasters.
13. Scientific Research: AI chipsets aid in data analysis and simulations in scientific research, contributing to fields
like astronomy, physics, and drug discovery.
14. Entertainment and Gaming: AI chipsets enhance graphics rendering, enable realistic virtual environments, and
improve character animations in video games and virtual reality applications.
15. Education: AI chipsets support personalized learning experiences, adapt educational content to individual student needs, and provide tools for interactive learning platforms. The growing trend of outsourced labeling and annotation work also relies on AI chipsets and the technologies around them.
16. HPC: AI chipsets have found significant applications in various High-Performance Computing (HPC) areas,
enhancing computational capabilities, accelerating data processing, and enabling advanced simulations. Some of
the key HPC areas where AI chipsets are used include- Scientific Simulations, Genomics and Bioinformatics,
Material Science, Seismic Imaging, Astrophysics and Astronomy, Particle Physics, Drug Discovery,
Neuroscience, Quantum Chemistry, Cryptography, Environmental Modeling, etc.
Some of the important criteria for assessing the performance of the AI Hardware include but are not limited to the
following:
I. Processing speed: AI hardware enables faster training and inference using neural networks. (a) Faster training enables machine learning engineers to try different deep learning approaches or to optimize the structure of their neural network (hyperparameter optimization); (b) faster inference (e.g., predictions) is critical for applications like autonomous driving.
II. Development platforms: Building applications on a standalone chip is challenging as the chip needs to be
supported by other hardware and software for developers to build applications on them using high level
programming languages. An AI accelerator which lacks a development board would make this device challenging
to use in the beginning and difficult to benchmark.
III. Power requirements: Chips that will function on battery need to be able to work with limited power consumption
to maximize device lifetime.
IV. Size: In IoT applications, device size may be important in applications like mobile phones or small devices.
V. Cost: As always, Total Cost of Ownership of the device is critical for any procurement decision.
VI. Scale-up and scale-out: In-node and inter-node communication are key criteria today for AI training workloads, and they determine how quickly a model can be trained and taken to market. Low-latency interconnects are increasingly becoming an integral part of AI compute infrastructure.
Because of their unique features, AI chips are tens or even thousands of times faster and more efficient than CPUs for training and inference of AI algorithms. State-of-the-art AI chips are also dramatically more cost-effective than state-of-the-art CPUs as a result of their greater efficiency for AI algorithms. AI chips include three classes: graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).
GPUs were originally designed for image-processing applications that benefited from parallel computation. In 2012, GPUs
started seeing increased use for training AI systems and by 2017, were dominant. GPUs are also used for inference.
There are several global leaders as well as start-ups involved in designing & innovating the AI Chipsets including but not
limited to the following:
Big Enterprises
I. NVIDIA: a pioneer in the field of AI hardware with its GPUs, which have been widely adopted for AI and deep learning tasks. NVIDIA introduced specialized Tensor Cores, first in the V100 generation and now in their 4th generation in the NVIDIA H100 GPU. Tailored for high-performance AI workloads, NVIDIA H100 GPUs today also support Transformer Engines. NVIDIA's chipsets are designed to solve business problems in various industries (the company claims 4M+ developers, 500+ SDKs/libraries, and 1,000+ open-source containers for AI applications on its website). Orin, for example, is the basis for an autonomous driving solution, while Hopper and the DGX™ H100 (NVIDIA's flagship GPU system) target data centres, and the L40S targets inferencing workloads. The DGX H100 system, a supercomputer-in-a-box, integrates 8 GPUs and up to 640 GB of GPU memory. NVIDIA Grace is the company's new CPU, released for the HPC market in 2023, with claims of the lowest power and highest performance in the CPU market segment targeting HPC/AI. Its unified memory architecture between CPU and GPU is well suited to maximizing performance for AI models, alongside a high-throughput, low-latency InfiniBand interconnect fabric. For AI workloads on the cloud, NVIDIA has a near monopoly, with most cloud players offering NVIDIA GPUs as cloud GPUs. NVIDIA, together with Intel and Arm, also co-authored a new floating-point format (FP8) and has already introduced it into its chips; FP8 is attracting strong developer interest as a way to save power and compute cost.
II. Intel: Xeon processors, appropriate for a variety of jobs including data-centre processing, offer up to 40+ cores, 1.6 times greater memory bandwidth, and 2.66 times greater memory capacity compared to the previous generation. Gaudi is Intel's neural network training accelerator; for inferencing, Intel has Goya, optimized for throughput and latency. The Intel® NCS2 is an AI chip from Intel developed specifically for deep learning.
III. Google Alphabet: Google Cloud TPU is the purpose-built machine learning accelerator chip that powers Google
products like Translate, Photos, Search, Assistant, and Gmail. It can be used via the Google Cloud implementation.
Edge TPU, another accelerator chip from Google Alphabet, is smaller than a one-cent coin and is designed for
edge devices such as smartphones, tablets, & IoT devices.
IV. Advanced Micro Devices (AMD): designs CPUs, GPUs, and AI accelerators. AMD's Alveo U50 data-centre accelerator card has 50 billion transistors and can run 10 million+ embedding datasets and perform graph algorithms in milliseconds. AMD launched the MI300 for AI training workloads in December 2023.
V. IBM: launched its neuromorphic chip, TrueNorth, in 2014. TrueNorth contains 5.4 billion transistors, 1 million neurons, and 256 million synapses, so it can efficiently perform deep network inference and deliver high-quality data interpretation. In April 2022, IBM launched its new hardware, the IBM Telum Processor, suited to missions such as immediate fraud prevention thanks to its improved processor core and memory compared to the company's previous AI chips.
Start-ups
I. SambaNova Systems: founded in 2017, has developed the SN10 processor chip.
II. Cerebras Systems: founded in 2015, has developed the WSE-2 chip, which has 850,000 cores and 2.6 trillion transistors; it has effective use cases in accelerating genetic and genomic research and shortening the time for drug discovery.
III. Graphcore: founded in 2015, has developed the IPU-POD256 and works with research institutes around the globe; the Oxford-Man Institute of Quantitative Finance, the University of Bristol, and the University of California, Berkeley are among the research organizations that use Graphcore's AI chips.
IV. Groq: represents a new model for AI chip architecture that aims to make it easier for companies to adopt its systems, with products such as the GroqChip™ Processor and GroqCard™.
V. Mythic: founded in 2012, has developed products such as M1076 AMP, MM1076 key card, etc.
The landscape of global AI chipset developers includes, but is not limited to, the following:
• IC Vendors: Intel, Qualcomm, Nvidia, Samsung, AMD, Xilinx, IBM, STMicroelectronics, NXP, Marvell, MediaTek, HiSilicon, Rockchip
• Tech Giants & HPC Vendors: Google, Amazon AWS, Microsoft, Apple, Aliyun, Alibaba Group, Tencent Cloud, Baidu, Baidu Cloud, HUAWEI Cloud, Fujitsu, Nokia, Facebook, HPE, Tesla
• Startups in China: Cambricon, Horizon Robotics, Bitmain, Chipintelli, Thinkforce, Unisound, AlSpeech, Rokid, NextVPU
• Startups Worldwide: Cerebras, Wave Computing, Graphcore, PEZY, KnuEdge, Tenstorrent, ThinCI, Koniku, Adapteva, Knowm, Mythic, Kalray, BrainChip, Almotive, DeepScale, Leepmind, Krtkl, NovuMind, REM, TERADEEP, DEEP VISION, Groq, KAIST DNPU, Kneron, Esperanto Technologies, Gyrfalcon Technology, SambaNova Systems, GreenWaves Technology, Lightelligence, Lightmatter, ThinkSilicon, Innogrit, Kortiq, Hailo, Tachyum, AlphaICs, Syntiant, Habana
While over 30 companies support and work on these benchmarks, only a few could submit all or some of them. Notably, it is a data-centre-scale exercise (i.e., CPU, GPU, network, storage, cooling, etc.), and one needs to consider all of these aspects when designing a national LLM for citizens and local industries. Below are the benchmarks that chip vendors are asked to submit on various AI use cases; NVIDIA, with its full-stack, open-source approach backed by millions of developers, is clearly leading the way.
MLPerf Training v3.0 benchmark suite:
• Large Language Model: GPT-3 175B (new in v3.0)
• Natural Language Processing: BERT
• Recommendation: DLRM-DCNv2 (new in v3.0)
• Speech Recognition: RNN-T
• Medical Imaging: 3D U-Net
• Object Detection, Heavy-Weight: Mask R-CNN
• Object Detection, Light-Weight: RetinaNet
• Image Classification: ResNet-50 v1.5
Per-accelerator performance is not a primary metric of MLPerf ™ Training v3.0. Results normalized to NVIDIA platform solution (DGX or OEM partner),
using scale common to most submitters and/or closest chip count to the NVIDIA platform result. | Format: Chip count, Submitter, MLPerf-ID | ResNet-
50 v1.5: 8x Dell 3.0-2053, 8x Intel-HabanaLabs 3.0-2017, 32x Intel 3.0-2011 | 3D U-Net: 8x GIGABYTE 3.0-2054, 8x Intel-HabanaLabs 3.0-2016 |
BERT-Large: 8x GIGABYTE 3.0-2055, 8x Supermicro 3.0-2025, 16x Intel 3.0-2012 | RetinaNet: 64x NVIDIA 3.0-2073, 32x Intel 3.0-2011 | Mask R-
CNN: 8x NVIDIA 3.0-2064 | RNN-T: 8x NVIDIA 3.0-2064 | GPT-3: 512x NVIDIA 3.0-2069, 384x Intel-HabanaLabs 3.0-2014 | DLRM-DCNv2: 8x
NVIDIA 3.0-2063. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved.
Unauthorized use strictly prohibited.
RECOMMENDATIONS FOR AI CHIPSETS FOR HIGH-PERFORMANCE COMPUTE (HPC) AND RELATED AREAS
The SemiconIndia futureDESIGN Design Linked Incentive (DLI) Scheme aims to offer financial incentives as well as design
infrastructure support to domestic companies and start-ups/ MSMEs across various stages of development and
deployment of semiconductor design(s) for Integrated Circuits (ICs), Chipsets, System on Chips (SoCs), Systems & IP
Cores and semiconductor linked design(s) over a period of 5 years. This includes the following support categories:
(i) National EDA Grid, (ii) IP Core Repository, (iii) Prototyping, and (iv) Post-Silicon Validation.
The scheme offers two incentives:
• Product Design Linked Incentive (P-DLI): Reimbursement of 50% of the eligible expenditure, subject to a ceiling of ₹15 Crore incentive per application.
• Deployment Linked Incentive: Reimbursement of 6% to 4% of net sales over 5 years, subject to a ceiling of ₹30 Crore incentive per application.
In the realm of High-Performance Computing (HPC), AI chipsets play a critical role in enhancing computational capabilities
and accelerating data-intensive tasks. The specific AI chipsets required for HPC applications depend on the nature of the
workloads and the computational demands of the tasks.
Here are some AI chipsets that could be relevant for HPC & related areas; the design of which by domestic Start-ups/
MSMEs may be supported under the SemiconIndia futureDESIGN Design Linked Incentive (DLI) Scheme:
1. GPU Accelerators: Graphics Processing Units (GPUs) are widely used in HPC for AI-related workloads. They excel
at parallel processing and are highly effective for training deep neural networks and conducting large-scale
simulations.
2. Neuromorphic Processors: Neuromorphic processors emulate the architecture of the human brain, which can
be advantageous for certain types of AI simulations and brain-inspired computations.
3. Tensor Processing Units (TPUs): TPUs are AI accelerators designed to handle the matrix computations prevalent in neural network training and inference. They are used to speed up AI workloads in data centres and cloud environments.
4. Quantum AI Processors: Quantum computing technologies, although still in the experimental stage, have the
potential to accelerate certain AI and optimization tasks in HPC. Quantum AI processors could become relevant
as quantum computing matures.
5. Intelligence Processing Units (IPUs): IPUs accelerate training and inference tasks for Deep Neural Networks (DNNs).
6. Language Processing AI Chipsets: India has a diverse linguistic landscape, making natural language processing
(NLP) in various languages a significant challenge. Developing AI chipsets optimized for multilingual NLP
applications, including speech recognition, translation, and sentiment analysis, could address local needs.
7. FPGAs: Field-Programmable Gate Arrays (FPGAs) are reconfigurable hardware devices that can be customized
for specific AI and HPC tasks. They offer flexibility and can be optimized for a range of applications, including AI
workloads.
8. AI-Specific ASICs: Application-Specific Integrated Circuits (ASICs) tailored for AI tasks can provide high
performance and energy efficiency. ASICs designed to accelerate AI computations can offer significant speedup
for specific HPC workloads.
9. AI-Optimized Processors: Some processors are designed with AI acceleration features, such as specific vector
instructions and hardware support for neural network computations. These processors can enhance AI-related
tasks in HPC environments.
10. Hybrid AI Chipsets: Some chipsets combine traditional high-performance computing components with AI
acceleration units. These hybrid chipsets can provide the benefits of both worlds, enhancing HPC tasks with AI
capabilities.
11. Energy-Efficient AI Chipsets: Energy efficiency is crucial in HPC environments. AI chipsets designed to provide
high performance while minimizing power consumption can be highly valuable for resource-intensive
computations.
The choice of AI chipsets for HPC depends on factors such as the specific workloads being performed, the available infrastructure, budget considerations, and the desired performance improvements. It is important to assess the compatibility of AI chipsets with existing HPC systems and software frameworks to ensure seamless integration and optimal utilization of computational resources.
MEITY.GOV.IN