Global Scales for Early Development v1.0

Technical report
Images ©WHO
Design and layout: büro svenja
WHO/MSD/GSED package v1.0/2023.1

Global Scales for Early Development v1.0 Technical report – Global Scales for Early
Development v1.0 Short Form (caregiver-reported) – Global Scales for Early Development v1.0
Short Form (caregiver-reported). Item guide – Global Scales for Early Development v1.0 Short
Form (caregiver-reported). User manual – Global Scales for Early Development v1.0 Long
Form (directly administered) – Global Scales for Early Development v1.0 Long Form (directly
administered). Item guide – Global Scales for Early Development v1.0 Long Form (directly
administered). User manual – Global Scales for Early Development v1.0 Scoring guide – Global
Scales for Early Development v1.0 Adaptation and translation guide

© World Health Organization 2023. Some rights reserved. This work is available under the
CC BY-NC-SA 3.0 IGO licence.

Selected questions and descriptions for the GSED measures have been reproduced or adapted
from the following tools/assessments: Ages and Stages Questionnaire, third edition (ASQ-3);
Bayley Scales of Infant Development (Bayley); Bayley Scales of Infant Development, second
edition (Bayley II); Caregiver-Reported Early Development Instruments (CREDI); Denver
Developmental Screening Test (DDST); Denver Developmental Screening Test, second edition
(DDST II); Developmental Milestones Checklist (DMC); Developmental Milestones Checklist II
(DMC II); Dutch Development Instrument (DDI); Griffiths Mental Development Scales (GMDS);
Griffiths Mental Development Scales – South African version (GMDS-SA); Kilifi Developmental
Inventory (KDI); Malawi Developmental Assessment Tool (MDAT); Preschool Pediatric
Symptoms Checklist (PPSC); Saving Brains Early Childhood Development Scale (SBECD);
Stanford-Binet Intelligence Scales, fifth edition (SBIS-5); Test de Desarrollo Psicomotor
[Psychomotor Development Test] (TEPSI); and Vineland Adaptive Behavior Scales (Vineland)
(see Bibliography for details).

Contents
Acknowledgements
Abbreviations

Executive summary

1. Introduction

2. GSED: overview
The D-score
The GSED measures

3. Development of GSED Short Form and GSED Long Form measures v1.0
Step 1. Conceptual framework: target product profile
Step 2. Harmonization of existing datasets on child development
Step 3. Creation of the GSED SF and GSED LF measures
Step 3.1 Item matching
Step 3.2 Statistical modelling
Step 3.3 Item feasibility
Step 3.4 Item domaining
Step 3.5 Item shortlisting
Step 3.6 Creation of R Shiny App for information display
Step 3.7 Item selection for each measure
Step 3.8 GSED SF and LF measures finalization
Step 4. GSED App development

4. GSED validation
Preparation and feasibility phase
Translations and adaptation
Training
Feasibility of implementation processes
Validation phase
Study population
Sample size and sampling scheme
Study measures
Visit schedule and quality control

5. GSED psychometric properties
Internal reliability
External reliability
Concurrent validity
Convergent validity
Known groups validity
Short-term predictive validity

6. GSED package v1.0
Scoring
Other components of the GSED package v1.0

7. Next steps

References

Bibliography

Annex 1. Early childhood development dataset for creation of GSED measures
Annex 2. GSED study validation measures
Annex 3. Validity results by GSED measure

Acknowledgements

Vision and conceptualization
The Global Scales for Early Development (GSED) package v1.0 was developed under the overall
guidance of and conceptualization by Tarun Dua and Dévora Kestel of the Department of Mental
Health and Substance Use of the World Health Organization (WHO).

Project coordination
The development of the GSED package was coordinated by Vanessa Cavallera, Brain Health Unit,
Department of Mental Health and Substance Use, WHO.

Writing of the Technical Report
The writing team included (in alphabetical order): Salahuddin Ahmed (Projahnmo Research
Foundation, Bangladesh), Romuald Kouadio E Anago (Innovations for Poverty Action [IPA],
Côte d’Ivoire), Abdullah H Baqui (Johns Hopkins University, United States of America [USA]),
Maureen Black (RTI International and University of Maryland School of Medicine, USA),
Alexandra Brentani (University of São Paulo Medical School, Brazil), Kieran Bromley (Keele
University, United Kingdom), Stef van Buuren (Netherlands Organisation for Applied Scientific
Research [TNO] and Utrecht University, Netherlands), Vanessa Cavallera (WHO headquarters,
Switzerland), Symone Detmar (TNO, Netherlands), Tarun Dua (WHO headquarters, Switzerland),
Arup Dutta (Center for Public Health Kinetics [CPHK], United Republic of Tanzania), Iris
Eekhout (TNO, Netherlands), Melissa Gladstone (University of Liverpool, United Kingdom),
Katelyn Hepworth (University of Nebraska, USA), Andreas Holzinger (IPA, Côte d’Ivoire),
Magdalena Janus (McMaster University, Canada), Fyezah Jehan (Aga Khan University [AKU],
Pakistan), Fan Jiang (Shanghai Jiao Tong University School of Medicine, China), Patricia
Kariger (Independent Consultant, USA), Raghbir Kaur (WHO headquarters, Switzerland), Rasheda
Khanam (Johns Hopkins University, USA), Gillian Lancaster (Keele University, United Kingdom),
Dana McCoy (Harvard Graduate School of Education, USA), Gareth McCray (Keele University,
United Kingdom), Imran Nisar (AKU, Pakistan), Ambreen Nizar (AKU, Pakistan), Mariana Pacifico
Mercadante (University of São Paulo Medical School, Brazil), Michelle Pérez Maillard (WHO
headquarters, Switzerland), Abbie Raikes (University of Nebraska Medical Center, USA),
Arunangshu Dutta Roy (Projahnmo Research Foundation, Bangladesh), Marta Rubio-Codina
(Inter-American Development Bank, USA), Sunil Sazawal (CPHK, United Republic of Tanzania),
Yvonne Schönbeck (TNO, Netherlands), Jonathan Seiden (Harvard Graduate School of Education,
USA), Fahmida Tofail (International Centre for Diarrhoeal Disease Research, Bangladesh),
Marcus Waldman (University of Nebraska Medical Center, USA), Ann M Weber (University of
Nevada, USA), Yunting Zhang (Shanghai Jiao Tong University School of Medicine, China),
Arsène Zongo (IPA, Côte d’Ivoire).

Technical contribution and review
Valuable input was received from WHO staff at headquarters and in regional and country
offices, as well as from many international experts and programme implementers. These
technical and/or data contributions were central to the development of the GSED package.
Those who provided input include (in alphabetical order): Amina Abubakar (AKU, Kenya), Fahad
Aftab (Center for Public Health Kinetics, United Republic of Tanzania), Claudia Regina
Lindgren Alves (Universidade Federal de Minas Gerais, Brazil), Omar Ali Ali (WHO Country
Office, United Republic of Tanzania), Elisa Altafim (Universidade de São Paulo, Brazil),
Maria Caridad Araujo (Inter-American Development Bank, USA), Orazio Attanasio (Institute for
Fiscal Studies, United Kingdom), Farzana Begum (AKU, Pakistan), Florence Baingana (WHO
Regional Office for Africa, Congo), Molly Biel (WHO headquarters, Switzerland), Geoffrey
Bisoborwa (WHO Regional Office for Africa, Congo), Andrea Bruni (WHO Regional Office for
South-East Asia, India), Betzabe Butron Riveros (WHO Regional Office for the Americas, USA),
Claudia Cappa (UNICEF, USA), Andreana Castellanos (Afinidata, Guatemala), Susan M Chang
(University of the West Indies, Jamaica), Alexandra Chen (Harvard University, USA), Anne
Marie Chomat (McGill University, Canada), Marie-Helene Cloutier (The World Bank, USA),
Bernadette Daelmans (WHO headquarters, Switzerland), Sandra Dao (WHO Country Office, Côte
d’Ivoire), Teshome Desta Woldehanna (WHO Country Office, Kenya), Erinna Dia (UNICEF, USA),
Bernice M Doove (Maastricht University, Netherlands), Dan Fang (WHO Country Office, China),
Wafaie Fawzi (Harvard University, USA), Lia CH Fernald (University of California, USA),
Emanuela Galasso (The World Bank, USA), Sally M Grantham-McGregor (Emeritus Professor of
International Child Health at University College London, United Kingdom), Shuchita Gupta
(WHO Regional Office for South-East Asia, India), Cristina Gutierrez de Piñeres (United Way,
Colombia), Jena Derakhshani Hamadani (International Centre for Diarrheal Disease Research,
Bangladesh), Charlotte Hanlon (Addis Ababa University, Ethiopia), Natalia Henao (Genesis
Foundation, Colombia), Alaka Holla (The World Bank, USA), Sidra Inayat (AKU, Pakistan),
Matias Irarrazaval (WHO Regional Office for the Americas, USA), Nemes Joseph Iriya (WHO
Country Office, United Republic of Tanzania), Charles Newton (Kenya Medical Research
Institute, Kenya), Ledia Lazeri (WHO Regional Office for Europe, Denmark), Zhao Li (WHO
Regional Office for the Western Pacific, Philippines), Pamela Jervis (Universidad de Chile,
Chile), Codie Kane (Temple University, USA), Simone M Karam (Federal University of Rio
Grande, Brazil), Janet Kayita (WHO Regional Office for Africa, Congo), Jamila Khalfan (Public
Health Laboratory-IdC, United Republic of Tanzania), Neema Kileo (WHO Country Office, United
Republic of Tanzania), Betsy Lozoff (University of Michigan, USA), Diego Luna (The World
Bank, USA), Raquel Dulce Maguele (WHO Country Office, Mozambique), Limbika Agatha Maliwichi
(University of Malawi, Malawi), Sheila Manji (WHO headquarters, Switzerland), Susanne Martin
Herz (University of California San Francisco, USA), Jeffrey Measelle (University of Oregon,
USA), Girmay Medhin (Aklilu Lemma Institute of Pathobiology, Ethiopia), Patricia Medrano
(University of Chile, Chile), Junaid Mehmood (AKU, Pakistan), Rajesh Mehta (WHO Consultant,
India), Ana MB Menezes (Universidade Federal de Pelotas, Brazil), Assumpta Muriithi (WHO
Regional Office for Africa, Congo), Néllia Mutisse (WHO Country Office, Mozambique),
Alphoncina Nanai (WHO Country Office, United Republic of Tanzania), Renato Oliveira e Souza
(WHO Regional Office for the Americas, USA), Nicole Petrowski (UNICEF, USA), Lauren Pisani
(Save the Children, United Kingdom), Helen Pitchik (University of California at Berkeley
School of Public Health, USA), Chemba Raghavan (UNICEF, USA), Muneera Rasheed (AKU,
Pakistan), Lisy Ratsifandrihamanana (Centre Médico-Educatif, Madagascar), Sarah Reynolds
(University of California at Berkeley School of Public Health, USA), Linda Richter
(University of the Witwatersrand, South Africa), Peter C Rockers (Boston University School
of Public Health, USA), Marta Rubio-Codina (Inter-American Development Bank, USA), Norbert
Schady (The World Bank, Washington), Khalid Saeed (WHO Regional Office for the Eastern
Mediterranean, Egypt), Makeba Shiroya (WHO Country Office, Kenya), Kathleen Louise Strong
(WHO headquarters, Switzerland), Christopher R Sudfeld (Harvard University, USA), Edwin Exaud
Swai (WHO Country Office, United Republic of Tanzania), Safila Telatela (WHO Country Office,
United Republic of Tanzania), Daisy Trovada (WHO Country Office, Mozambique), Martin
Vandendyck (WHO Regional Office for the Western Pacific, Philippines), Paul H Verkerk (TNO,
Netherlands), Susan P Walker (Caribbean Institute for Health Research, Jamaica), Christine
Wong (Hong Kong University, China), Dorianne Wright (Oregon Health & Science University, USA)
and Aisha K Yousafzai (Harvard University, USA).

Information technology programming
We acknowledge UniversalDoctors (Jordi Serrano Pons, Fernando Vaquero, Jeannine Lemaire and
Montse Garcia) for conceptualization and initial information technology support to the GSED
application creation, and the CPHK (Arup Dutta, Vishi Saxena, Waseem Ali, Poonam Rathore) for
further conceptual development and operationalization of the GSED App as well as data
management throughout implementation of the project.

Financial support
Support for this work was received from (in alphabetical order): the Bernard van Leer
Foundation, Bill & Melinda Gates Foundation, Children’s Investment Fund Foundation, Jacobs
Foundation, King Baudouin Foundation United States and the United States Agency for
International Development.

Abbreviations
2PL         two-parameter logistic
AKU         Aga Khan University
ASQ         Ages and Stages Questionnaire
Bayley-III  Bayley Scales of Infant and Toddler Development, Third Edition
BRS         Brief Resilience Scale
CAT         computerized adaptive tests
CB          combined format
CI          confidence interval
CPAS        Childhood Psychosocial Adversity Scale
CPHK        Center for Public Health Kinetics
CREDI       Caregiver Reported Early Development Instruments
D-score     Developmental score
DAZ         Development-for-Age z-score
DDI         Dutch Development Instrument
DHS         Demographic Health Survey
ECD         early childhood development
ECDI2030    Early Childhood Development Index 2030
EDC         electronic data capture
FCI         Family Care Indicator
FGD         focus group discussion
FSS         Family Support Scale
GMDS        Griffiths Mental Development Scales
GSED        Global Scales for Early Development
HAZ         height-for-age z-score
HF          Household Form
HOME        Home Observation Measurement of the Environment
ICC         intraclass correlation coefficient
IPA         Innovations for Poverty Action
IRT         item response theory
IYCD        WHO Indicators of Infant and Young Child Development
LF          Long Form
ODK         Open Data Kit
PF          Psychosocial Form
PHQ9        Patient Health Questionnaire-9
PRIDI       Regional Project on Child Development Indicators
SD          standard deviation
SDG         Sustainable Development Goal
SEE         standard error of estimation
SES         socioeconomic status
SF          Short Form
SME         subject matter expert
TNO         Netherlands Organisation for Applied Scientific Research
TPP         Target Product Profile
WAZ         weight-for-age z-score
WHO         World Health Organization


Executive summary
This Technical Report documents the development and validation of the Global Scales for Early
Development (GSED). The GSED package v1.0 includes open-access measures that provide a
standardized method for measuring the development of children up to 36 months of age across
diverse cultures and contexts. It has been created to serve as a population-level assessment of
early childhood development (ECD) (up to 36 months) for the global community that may be
used for comparisons across countries. In contrast to growth, which is measured by changes
in children’s weight in grams and height in centimetres, there has been no uniform scale for
children’s early development. The GSED uses an innovative metric, the Developmental score
(D-score), a scale with interval properties, to measure children’s development.

The package includes the GSED measures v1.0 as well as accompanying materials to facilitate
their implementation and use. The GSED measures are meant to collect population-level data on
ECD and are designed to be used primarily for research and programmatic evaluations. They
comprise a Short Form (SF), a caregiver-reported measure; and a Long Form (LF), comprised of
items administered directly to children by a trained assessor.

Current evidence indicates that the psychometric properties of the GSED SF and LF are
comparable. The choice of one or the other, or the two together, to measure child development
should be dictated by: i) the purpose of the evaluation and/or specific research question
(e.g. type of intervention); ii) preferred modality of administration (caregiver report
versus direct administration); and iii) the capacity and expertise of the team. A combined
format (CB) of GSED SF together with GSED LF may be used to increase measurement precision.
Further evidence on sensitivity to potential changes after interventions and increase in
precision of the GSED CB is currently being tested and will be made available in the near
future.

The GSED measures also include a Household Form (HF), a caregiver-reported measure designed
to be integrated into large-scale and national-level surveys for monitoring child
development; and a Psychosocial Form (PF), a caregiver-reported measure of children’s
psychosocial behaviours.¹

The GSED measures were created through an integrated empirical-conceptual approach. The
empirical approach included statistical modelling of a data set of 100 153 observations. The
conceptual approach included subject matter experts (SMEs) identifying conceptually relevant
and globally feasible items of child development for children under 36 months of age.

A rigorous and standardized method has been used to evaluate the psychometric properties
of the newly-created GSED measures in seven countries. A prospective cross-sectional design
was implemented, including a six-month longitudinal follow up, with an age- and sex-stratified
sample of children.

This interim package (v1.0) is based on validation data from three countries: Bangladesh,
Pakistan and the United Republic of Tanzania. The results demonstrate statistically significant
reliability and validity of the D-score to measure child development under 36 months of age at
the population level. Specifically, convergent validity measured through contextual measures
likely to be related to child development (e.g. socioeconomic status [SES] and exposure to
adversity) and concurrent validity with the Bayley Scales of Infant and Toddler Development
(Bayley-III) were statistically significant. Additional analyses have shown that GSED measures
are culturally neutral, have good content validity and are easy to implement at scale. Revisions
to the package are planned for 2024 when data are available from four additional countries:
Brazil, China, Côte d’Ivoire and the Netherlands.

1 Not part of this Technical Report in detail as they are being further tested, but can be made available on request.


1. Introduction
The pathways to adult health and well-being begin in childhood and are often measured by
children’s growth and development. Both are products of children’s specific genetic blueprint
and influenced by environmental factors that begin prenatally (1). Children in the early years
and, in particular, the first 1000 days (from conception to age 24 months) are highly sensitive to
environmental conditions due to the rapid brain development that occurs during this period.
Monitoring of child development at this time is important to track progress toward global and
national policy goals for children and provides a critical reference to plan and evaluate services
to support healthy development. However, there have been no universal measures designed to
quantify children’s development during the earliest years at population level (2). Without such
measures, countries are unable to monitor children’s progress and determine how to allocate
resources to provide the necessary support for children to reach their developmental potential.

Measures of stunting and severe poverty have been effective indicators of the proportion of
children at risk of not reaching their developmental potential (3) and have contributed to
advances in global policies and programmes for young children (4). However, they have limited
predictive ability (5) and lack sensitivity, and as such they are not suitable to measure
change in response to interventions or environmental conditions. Countries have traditionally
used physical growth as a proxy based on findings that chronic linear growth faltering (i.e.
stunting: height-for-age more than 2 standard deviations [SDs] below WHO growth standards) is
associated with fewer years of adult schooling, poorer economic indices and greater
likelihood of experiencing poverty (6,7). A reduction in the proportion of childhood stunting
is frequently used as a national and global target, often also as a proxy for improvement in
developmental outcomes.


The urgent need for a comprehensive measure and reporting system for child development has
been reinforced by the United Nations Sustainable Development Goals (SDGs). Specifically, SDG4
(education) calls for ensuring inclusive and equitable quality education and promoting
lifelong learning opportunities. SDG4 includes an early childhood-specific indicator, 4.2.1,
which mandates measurement of the proportion of children under 5 years of age who are
developmentally on track in health, learning and psychosocial well-being, by sex (8). WHO’s
Multicentre Growth Reference Study has shown that under optimal conditions, children’s early
growth (up to age 2 years) is comparable across countries (9). These findings form the basis
for global growth standards that have been adopted by 125 countries. Similarly, emerging
evidence suggests universality in multiple domains of children’s development across countries
during the first 2 to 3 years of life (10,11). Additionally, in contrast to growth, which is
measured universally by changes in children’s weight in grams and height in centimetres,
there is no unit (or metric) for measuring child development.

To address the lack of a population or programmatic measure or metric of ECD, WHO assembled
an interdisciplinary and multi-country team to develop the GSED. These measures of ECD for
children up to 36 months of age provide a metric of child development (the D-score) at both
the population and programmatic level as well as a system for interpreting scores.

This Technical Report summarizes the process of creating the GSED and related measures, and
describes the validation methodology and psychometric properties in three countries
(Bangladesh, Pakistan and the United Republic of Tanzania). Data from additional countries
(Brazil, China, Côte d’Ivoire and the Netherlands) (currently being collected) will address
broader global validity and inform potential revisions of the measures. The report concludes
by describing the release of the package v1.0, the next steps for GSED, and the process of
dissemination and continuous feedback to enable the use of GSED to monitor progress towards
global goals and inform programmes and policy to promote the health and development of young
children globally.


2. GSED: overview
The GSED is an open-access package specifically designed to provide a standardized method
for measuring development of children up to 36 months of age at population level globally. The
GSED meets that objective by incorporating all domains of ECD through a common scale for
measurement translated into a single score, the D-score, that represents holistic development.
This section of the report provides an overview of the D-score approach and the GSED measures.

The D-score
The D-score is a unit of measurement with an interval scale representing child development
by a single number (12,13). As height (in centimetres) and weight (in grams) change over time
with the growth of the child, development (measured in D-score units) also increases with age
as the child acquires more skills. The D-score is calculated from Yes/No responses on a set
of age-appropriate developmental items (e.g. “Can the child stack two blocks?”, or “Does the
child use two-word sentences?”). Conceptually, a child’s D-score falls along a developmental
continuum, beginning with simple skills and behaviours that the child is able to perform and
progressing through the child’s repertoire until reaching skills and behaviours that the
child has yet to acquire (see Figure 1). It is calculated as the mean of the posterior
distribution conditional on the responses, the items’ difficulty and the child’s age. The
D-score may also be transformed into the Development-for-Age z-score (DAZ). The DAZ is
age-independent and is scaled such that at each age, the scores are normally distributed with
a mean of 0 and a variance of 1. Since DAZ adjusts for the natural increase in D-score with
age, it helps ease the comparison between samples from different ages or countries. Similar
to height-for-age z-score (HAZ) and weight-for-age z-score (WAZ), the DAZ is calculated
relative to a reference population.

FIGURE 1. D-SCORE AND DAZ
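
The two calculations described in this section (a posterior mean given the responses, the item difficulties and the child's age, and a z-score relative to a reference population) can be illustrated with a minimal Python sketch. It assumes a Rasch-type response model, a normal prior whose mean and SD would be looked up from the child's age, and age-specific reference values for the DAZ. The function names, the grid settings and every numeric value are hypothetical and are not taken from the GSED scoring guide.

```python
import math


def rasch_prob(ability, difficulty):
    """Probability of a Yes (1) response under a Rasch-type model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def posterior_mean_score(responses, difficulties, prior_mean, prior_sd, n_grid=401):
    """Posterior mean of the latent score, computed on a fixed grid.

    responses    : list of 0/1 item scores
    difficulties : item difficulties on the same scale as the score
    prior_mean, prior_sd : assumed age-dependent normal prior (hypothetical inputs)
    """
    lo = prior_mean - 6.0 * prior_sd
    hi = prior_mean + 6.0 * prior_sd
    step = (hi - lo) / (n_grid - 1)
    weighted_sum = 0.0
    total_weight = 0.0
    for k in range(n_grid):
        theta = lo + k * step
        # Unnormalised normal prior density at this grid point
        weight = math.exp(-0.5 * ((theta - prior_mean) / prior_sd) ** 2)
        # Multiply by the likelihood of each observed response
        for x, b in zip(responses, difficulties):
            p = rasch_prob(theta, b)
            weight *= p if x == 1 else (1.0 - p)
        weighted_sum += theta * weight
        total_weight += weight
    return weighted_sum / total_weight


def daz(d_score, ref_mean, ref_sd):
    """z-score of a D-score relative to assumed reference values for the child's age."""
    return (d_score - ref_mean) / ref_sd


# Hypothetical example: three items with made-up difficulties on a made-up scale
d = posterior_mean_score([1, 1, 0], [-1.0, 0.0, 1.0], prior_mean=0.0, prior_sd=1.0)
z = daz(d, ref_mean=0.0, ref_sd=1.0)
```

With no responses the estimate falls back to the prior mean, which is one way to see how the child's age enters the calculation in this sketch.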


The GSED measures


The GSED SF and LF measures were created as a response to the need for a population-based
measure of ECD up to 36 months of age. Criteria for the measure were that it should be
reliable, valid globally, easy to administer in field conditions, require limited training,
be easy to interpret and openly accessible. The GSED measures capture the underlying construct
of child development which includes the motor, language, cognitive and socio-emotional
domains. The measures comprise a caregiver report SF and a direct administration LF. Each GSED
measure (SF or LF) can be stand-alone, or used in combination with each other for a more
comprehensive and precise assessment of child development, depending on the requirements of
the measurement effort (e.g. research, programmatic evaluation, etc.).

The GSED measures are designed for use at population and programmatic level. Even though they
are collected for individual children, the results are not validated to be interpreted for
any specific child. The GSED measures have not yet been tested within the context of clinical
use and should therefore not be used for screening individual children for developmental
delays or impairments nor for diagnosis.

Given the need for an easy-to-interpret score, as well as evidence that in the first years of
life development is integrated, overlapping and linked across multiple areas (14,15),
developmental domain sub-scores are not provided. Rather an aggregate score comprises the
essence of each domain at each point in time. The intended use includes the evaluation of the
impact of interventions applied to groups, and the comparison of groups of children (for
example, for monitoring or research purposes). The GSED measures may be administered using the
GSED App on a tablet or as a paper version. The development and validation process of the GSED
measures as well as a detailed description of each measure are covered in Sections 3 and 4 of
this report.

Two additional caregiver report forms are being further developed and tested. The GSED HF is
being proposed to be used for inclusion in large-scale data collection efforts, including
existing multi-topic household surveys (such as Multiple Indicator Cluster Surveys or
Demographic Health Surveys [DHS]). The GSED PF aims to capture psychosocial skills and
behaviours or non-normative developmental patterns that may indicate challenges in children’s
mental health. The GSED PF’s items are not age-ordinal, and hence their results are not scored
on the D-score scale. The creation and validation of the GSED HF and PF across different
cultural contexts are ongoing and are part of forthcoming work (see Section 6 and Boxes 4 and
5 for an overview).

3. Development of GSED v1.0 SF and LF measures
This section describes the development process of the GSED SF and LF measures, from
conceptualization through statistical analysis and expert consensus, to the prototypes for
validation (16). The GSED development followed a rigorous methodological process which
required a multi-step approach as summarized in Figure 2.

FIGURE 2. GSED MEASURES CREATION FLOWCHART

Step 1. Conceptual framework
Step 2. Harmonization of existing datasets; item bank creation (807 development items
pertinent to 0 to < 36-month-olds)
Step 3. Item matching; statistical modelling (2PL vs Rasch); item feasibility and item
domaining; item shortlisting; creation of R Shiny App; GSED SF creation (Short Form,
caregiver report) and GSED LF creation (Long Form, direct observation)

Step 1. Conceptual framework: Target Product Profile (TPP)
A conceptual framework was developed by a group of international ECD and measurement
experts and researchers (13, 17-18) through consensus around the assumption that ECD
refers to various skills and behaviours that emerge sequentially early in life and are influenced
by gene-environment interactions occurring between conception and 3 years of age (19,20).
Therefore, the first step in the GSED measures development process focused on identifying
target properties and implementation characteristics that the package of measures needed
to achieve. The output of this process was the TPP listing minimum and optimal target
characteristics.

Step 2. Harmonization of existing datasets on child development
Once the TPP was in place, an extensive collection of ECD datasets was brought together into
an item bank. The common dataset included data from 18 instruments, 51 cohorts, 32 countries
(of which 30 are low- or middle-income), 66 075 unique children, 100 153 observations (child/
age combinations) and 4 314 146 scores (see Annex 1 for details). The total number of items
was 2221. Sixteen studies included longitudinal data. Most instruments collected dichotomous
Yes/No scores, but a few used additional categories, such as “Sometimes” or “Not yet”. In
consultation with SMEs, these responses were re-coded into binary 0/1 scores.
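
The recoding step can be sketched as a simple lookup. The report does not state which binary value each additional category received, so the table below (including treating “Sometimes” as 1) is a hypothetical assumption made only for illustration, as are the names `EXAMPLE_RECODE` and `recode_binary`.

```python
# Hypothetical recoding table: the report does not say how each extra
# category was mapped, so these assignments are assumptions for illustration.
EXAMPLE_RECODE = {"yes": 1, "no": 0, "sometimes": 1, "not yet": 0}


def recode_binary(responses, recode=EXAMPLE_RECODE):
    """Map raw response labels to 0/1; unknown or missing labels become None."""
    out = []
    for r in responses:
        # Normalise case and surrounding whitespace before the lookup
        key = r.strip().lower() if isinstance(r, str) else None
        out.append(recode.get(key))
    return out


scores = recode_binary(["Yes", "No", "Sometimes", "Not yet", None])
```

Keeping the table as a parameter would let each instrument carry its own mapping, which matches the idea that the decision was made per response category in consultation with SMEs.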

Step 3. Creation of the GSED SF and GSED LF measures


Step 3.1 Item matching
The item-matching component was designed to establish the strength of the relations between
pairs of items across different instruments in the GSED item bank that measure the same or
very similar skills and behaviours in slightly different ways. SMEs in child development classified
such similar items from different instruments into clusters or equate groups. A subset of these
groups was used to connect every instrument to a shared latent factor (child development
construct). Additionally, these clusters helped link instruments and avoided duplication in the
final measures (see Figure 3 for the data collection matrix).
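
One way to picture how equate groups link instruments is to treat each group as a bridge between the instruments whose items it contains, and then check whether every instrument is reachable from every other. The sketch below is an illustration only: the data structure, the instrument labels and the item identifiers are hypothetical, and the report does not describe the linking check in these terms.

```python
def linked_instruments(equate_groups):
    """Group instruments that are connected through shared equate groups.

    equate_groups: dict mapping a group name to a list of (instrument, item_id) pairs.
    Returns a list of sets of instrument names. A single set means that every
    instrument is linked, directly or indirectly, to every other one.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for members in equate_groups.values():
        instruments = [inst for inst, _ in members]
        for inst in instruments:
            find(inst)  # register every instrument, even if it links to nothing
        for other in instruments[1:]:
            parent[find(instruments[0])] = find(other)

    components = {}
    for inst in parent:
        components.setdefault(find(inst), set()).add(inst)
    return list(components.values())


# Hypothetical equate groups with made-up instrument and item labels
example_groups = {
    "stacks_two_blocks": [("A", "a1"), ("B", "b7")],
    "two_word_sentences": [("B", "b9"), ("C", "c3")],
    "walks_alone": [("D", "d2"), ("E", "e5")],
}
clusters = linked_instruments(example_groups)
```

In this made-up example instruments A, B and C form one linked cluster and D and E another, so an additional equate group spanning the two clusters would be needed before all five could be placed on one scale.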

FIGURE 3. DATA COLLECTION MATRIX AND CRITERIA FOR MATCHING EXERCISE

Callouts shown in the figure:
These questions stay locked in place
Choose the strength of match via the dropdown menu
Note: items are marked as not matched by default
Use the age range and domains for quicker navigation
Tip: use “page down” to efficiently skim through the items to be matched
Do not put anything in red cells

Step 3.2 Statistical modelling


The Rasch model (21) was used to place children and items on the same scale. Rasch
constructed a method to map and scale the responses on different, overlapping attainment
tests given to different school classes. This method uses one parameter to quantify the child’s
ability at a given age and one parameter to quantify the difficulty of a test item. Rasch placed
both parameters on the same scale and showed that this scale adheres to the principle
of invariant comparison. Briefly, the principle implies that the comparison between two
individuals, or between the same individual at two time points, should be independent of the
test administered. The Rasch model has been highly influential in educational research and is
slowly being adopted in health, agricultural and market research fields. The approach for child
development data is very similar to Rasch’s application in the education field, which makes it a
natural fit (22).
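
The principle of invariant comparison can be illustrated with a short sketch. It assumes the standard logistic form of the Rasch model on a plain logit scale; the function names and the numbers are illustrative and are not taken from the GSED analysis, and the D-score itself is reported on its own scale.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability that a child passes an item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(probability: float) -> float:
    """Natural log of the odds of passing."""
    return math.log(probability / (1.0 - probability))


def log_odds_gap(ability_a: float, ability_b: float, difficulty: float) -> float:
    """Difference in log-odds of passing the same item for two children."""
    return (log_odds(rasch_probability(ability_a, difficulty))
            - log_odds(rasch_probability(ability_b, difficulty)))


# Two children whose abilities differ by one logit, compared on an easy item
# and on a hard item (all values are made up for illustration).
gap_on_easy_item = log_odds_gap(1.5, 0.5, difficulty=-2.0)
gap_on_hard_item = log_odds_gap(1.5, 0.5, difficulty=3.0)
```

Both gaps equal the ability difference regardless of which item is used, which is the sense in which the comparison between two children does not depend on the test administered.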


The difficulty parameters were estimated for a subset of 818 items that fitted the Rasch model
as judged by infit and outfit¹ criteria using 17 equate groups² that anchored all instruments
to the D-score scale. Scores extracted from the Rasch model were compared to scores from a
more general two-parameter logistic (2PL) model, which uses one additional discrimination
parameter per item. Extremely strong correlations between scores from the two models were
observed (r = 0.97), and so the more parsimonious Rasch model was selected over the 2PL.³
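
For orientation, the two models and the fit statistics mentioned in this step can be written in standard item response theory notation. This is a sketch, not a reproduction of the GSED estimation: the symbols β_n (ability of child n), δ_i (difficulty of item i, the tau referred to in this report) and α_i (discrimination of item i) are notation assumed here, and the fit statistics are the usual mean-square definitions; the report does not state the cut-offs that were applied.

```latex
\begin{align}
  \text{Rasch:}\quad P(X_{ni}=1) &= \frac{\exp(\beta_n-\delta_i)}{1+\exp(\beta_n-\delta_i)} \\
  \text{2PL:}\quad P(X_{ni}=1) &= \frac{\exp\{\alpha_i(\beta_n-\delta_i)\}}{1+\exp\{\alpha_i(\beta_n-\delta_i)\}} \\
  \text{Outfit}_i &= \frac{1}{N}\sum_{n=1}^{N}\frac{(x_{ni}-P_{ni})^2}{P_{ni}(1-P_{ni})} \\
  \text{Infit}_i &= \frac{\sum_{n=1}^{N}(x_{ni}-P_{ni})^2}{\sum_{n=1}^{N}P_{ni}(1-P_{ni})}
\end{align}
```

Here x_{ni} is the observed 0/1 score, P_{ni} is the Rasch probability for child n on item i, and N is the number of children who responded to item i. The Rasch model is the special case of the 2PL in which α_i = 1 for every item, which is why it needs one parameter fewer per item.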

Step 3.3 Item feasibility


The item feasibility component was designed to provide judgement data by SMEs on the
appropriateness of each item for capturing development across various geographic, cultural
and language contexts. The data were used to identify for removal any items that were in
the item bank but were judged to be unsuitable for the diverse contexts in which the GSED
instruments would be used (see Figure 4 for the survey on feasibility).

FIGURE 4. SURVEY ON FEASIBILITY INFORMATION

¹ Infit and outfit are “fit” statistics. In a Rasch context they indicate how accurately or predictably data fit the model.
² Equate refers to groups of items that were held to have a constant difficulty (tau) across tools, based on evidence-led analysis of item similarity.
The equate groups permitted the different instruments used for different children to be on the same scale.
³ For a deeper discussion on model choice, see the AERA 1992 debate between Hambleton and Wright
(https://www.rasch.org/rmt/rmt61a.htm and https://www.rasch.org/rmt/rmt62d.htm).


Step 3.4 Item domaining


The domaining exercise was designed to provide expert judgement on precisely which
domains of development each item measures. The taxonomy of domains was adapted from
the Caregiver-Reported Early Development Instruments (CREDI) project (23). See Table 1 for the
taxonomy of domains.

TABLE 1. TAXONOMY OF DEVELOPMENTAL DOMAINS

Primary domain      Sub-domain

Motor               • Gross
                    • Fine

Language            • Receptive
                    • Expressive

Cognition           • Problem-solving/reasoning
                    • Executive function (e.g. attention, memory, inhibition)
                    • Pre-academic knowledge (e.g. letters, numbers, colours)

Socio-emotional     • Emotional and behavioural self-regulation (e.g. controlling emotions and behaviours)
                    • Emotion knowledge (e.g. identifying emotions)
                    • Social competence (e.g. getting along with others)
                    • Behaviour challenges/issues – internalizing (e.g. withdrawal)
                    • Behaviour challenges/issues – externalizing (e.g. hitting, kicking, biting)

Adaptive            • Life skills (e.g. using toilet, dressing)


Step 3.5 Item shortlisting


As a guiding principle and to avoid burden on respondents, two separate forms were created,
one fully caregiver reported and one fully directly administered, that could potentially be
combined for assessment depending on the target use. Additionally, for each of the forms, it
was decided that it would be preferable to test more items during the validation process to
allow a better-informed item selection after field-testing in multiple contexts and comparison of
which administration modality (caregiver reported versus directly administered) would better
capture specific skills and behaviours (to inform the final item selection for the additional CB,
avoiding repetition).

Step 3.6 Creation of R Shiny App for information display


All items were initially collated in spreadsheets and then combined with the psychometric
data available on each item from existing datasets and their estimated difficulties (taus) in
an interactive dashboard built with an R Shiny App (24). Shiny is
an R package that enables building interactive applications that execute R code on the back
end, for example stand-alone applications on a webpage with interactive charts or dynamic
dashboards. The interactive dashboard facilitated the SMEs’ final item selection, as it provided
real-time evaluations of domain coverage, reliability statistics and measurement performance
by age group for any selected items.

Step 3.7 Item selection for each measure


The creation of the GSED measures for validation comprised four steps, which were all
processed in the interactive R Shiny App dashboard: i) selection of a single item from a group
of matched items for measuring a given behaviour (which included choosing the item that was
performing best across countries and wording of the item in the simplest and most culturally-
neutral way possible); ii) allocation of items to GSED SF or GSED LF (and potential adaptation
of the item from caregiver reported to directly administered and vice versa as needed); iii)
evaluation of the age and domain coverage of the list of items selected for each measure; and
iv) evaluation of the psychometric properties of the measure based on available data and
simulations. SMEs iterated over item selection and evaluation until a suitable final set of items
was identified to contribute to each form.

Items in the GSED SF were ordered by level of difficulty, based on data available on each item
reflecting children’s emerging skills. The order was slightly revised by SMEs to ensure it was
consistent with child development theory. Items in the GSED LF were first grouped in three
streams to facilitate administration flow during direct administration to the child and then
ordered following the same process described for the GSED SF. Each stream uses specific
materials to facilitate administration. Stream A includes items related to physical activity and
movement, Stream B uses tablet-based images and Stream C uses materials from the GSED LF
Kit (see below).

One hundred thirty-nine items were shortlisted for the GSED SF (caregiver response options of Yes,
No and Don’t know) and 155 items for the GSED LF, which are either observed incidentally or elicited
by the assessor, or both (response options are binary: skill observed or not). “Start” and “stop” rules
are used for both forms’ administration based on the child’s age and performance.


Step 3.8 GSED SF and LF measures finalization


Once the items and sequences were finalized for each form, the SMEs reviewed the items
to identify where audio or visual prompts might be beneficial to facilitate training, caregiver
understanding (for GSED SF) and administration of items for assessors (for GSED LF). All media
files were created to accurately capture the skill or behaviour described in the items as well as
to be as culturally neutral as possible. Lastly, GSED LF items were reviewed to identify those
requiring physical materials for administration (e.g. stacking blocks, responding to rattle,
drawing on paper). A detailed description of the materials needed to form the GSED LF Kit was
created (including objects such as a timer, small 2 cm square blocks, a spoon, a plate, a cup,
a crayon/pencil, etc.) and guided by the principle that the kit should be assembled locally
and at a low cost. For the GSED LF items requiring children to look at images, efforts were
made to include drawings and pictures that could be displayed on a tablet. The principle was
to reduce the need for additional materials and printing and to streamline and standardize
administration. The GSED measures were tested and revised during the training and feasibility
phases of the validation study (see Section 4).

Step 4. GSED App development


To facilitate administration of the GSED measures, which require “start” and “stop” rules based
on age and performance, limit data entry errors, and standardize data collection, an electronic
data capture (EDC) system was designed. The EDC system uses Open Data Kit (ODK), a free
and open-source software used widely for collecting, managing and using data in resource-
constrained environments. The ODK Collect App (see Figure 5), an Android-based data
collection app, was customized to create a grid-based user-friendly interface for administering
the GSED LF. The GSED App was programmed to automatically determine the age-based
starting question and to apply all the required “start” criteria and “stop” rules. For the GSED LF,
the app enabled the assessors to score items during the direct observation of the child’s performance
while also providing administration instructions. The GSED measures were created as XLSForms,
which were then converted to ODK XForms. For the specific purpose of data collection in the
validation sites (see Section 4), in addition to GSED measures, all other study data collection
instruments were designed and incorporated in the same app. ODK Aggregate with a MySQL database
was used as the aggregator. In addition, a separate dedicated data management and monitoring tool
was designed, enabling the study team to effectively manage and monitor data collection and to
generate analysable output files in a standardized manner.


FIGURE 5. GSED APP

The GSED App prototype was pre-tested through Google Play with two rounds of feedback. The key
feedback received focused on the visual interface, colours and fonts, number of questions per
screen, and the functioning of the administration rules as intended, as well as the ease of using
the media files (GSED SF) and in-built administration instructions support (GSED LF). The GSED App
was revised and tested in the feasibility phase of the validation (see Section 4).


4. GSED validation
This section describes the methodology undertaken for the validation of the GSED measures.
The GSED validation study was planned in seven countries varying in geography, language,
culture and income, and implemented in two rounds (since funding was received in two
stages): Round 1 included Bangladesh, Pakistan and the United Republic of Tanzania
(validation completed; results informed this technical report); and Round 2 included
Brazil, China, Côte d’Ivoire and the Netherlands (data collection for the validation ongoing).
Data from Round 2 countries will further expand the generalizability of the results for global
use and are expected to be published in early 2024. Figure 6 shows the GSED validation partners
in each country.

The study utilized a rigorously standardized protocol across countries with a mixed qualitative and
quantitative methods approach combining cross-sectional and longitudinal approaches to evaluate
the psychometric properties of the GSED SF and LF: reliability (inter-rater and test-retest),
concurrent validity, convergent validity, short-term predictive validity at six months and
responsiveness (25). This study received ethical approval from WHO (protocol GSED validation 004583
20.04.2020) and approval in each site.

The work included a preparation and feasibility phase and the main validation data collection
phase. The methodology of both phases is described below. The results presented in this report are
limited to Round 1 countries where data collection has been completed.

In addition to the main validation effort, external research groups have proposed supporting
generation of further GSED validation data through the inclusion of the GSED package as secondary
research outcomes in ongoing studies (see Box 1).


FIGURE 6. GSED VALIDATION STUDY SITES AND PARTNERS

Round 1 [completed]

Country                        Partner institution
Bangladesh                     Projahnmo Research Foundation, Dhaka, Bangladesh
                               Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
Pakistan                       AKU, Karachi, Pakistan
United Republic of Tanzania    CPHK Global, Zanzibar, United Republic of Tanzania
                               Public Health Laboratory – Ivo de Carneri Pemba, Zanzibar, United Republic of Tanzania

Round 2 [ongoing]

Country                        Partner institution
Brazil                         University of São Paulo Medical School, São Paulo, Brazil
China                          National Children’s Medical Center/Shanghai Children’s Medical Center, Shanghai, China
Côte d’Ivoire                  IPA, Abidjan, Côte d’Ivoire
Netherlands                    TNO, Sylviusweg, Leiden, the Netherlands


BOX 1. TESTING OF GSED THROUGH ONGOING STUDIES


The inclusion of GSED in external studies is a form of testing the GSED measures (either SF and/or LF and/or
PF) at sites of ongoing or newly-launched research projects with the aim of producing further validation and
feasibility evidence on GSED that is otherwise not covered by WHO’s main validation efforts. This approach
provides an opportunity to leverage existing resources by partnering with external research groups to
access a wider range of countries and populations, and to broaden the range of entry points and settings for
administering the GSED. Specifically, this work aims at generating additional GSED validation and feasibility
data for:
i. global relevance (increasing generalizability);
ii. sensitivity to change (programmatic evaluation);
iii. biological markers convergence;
iv. predictive validity;
v. feasibility of ongoing survey integration;
vi. developmental trajectories exploration.
Depending on the original study design and population, the additional data generated contribute to one
or more of the objectives above. Through the inclusion of GSED in external research protocols, additional
evidence is expected to be collected for the use of GSED for specific purposes in diverse contexts that should
increase its global uptake.

Preparation and feasibility phase


The preparation and feasibility phase included translation and local adaptation of study
measures, defining and writing standard operating procedures, recruitment and training of local
teams, and testing of implementation processes to ensure feasibility. This section describes the
activities conducted as well as the results and related decisions for main validation.

Translations and adaptation


The GSED measures followed a rigorous and
standardized translation and adaptation process to
capture linguistic and cultural nuances, without losing
the essential conceptual focus of the items and forms.
In brief, each GSED measure was translated into the
local languages (Bangla for Bangladesh, Urdu and
Sindhi for Pakistan and Swahili for the United Republic
of Tanzania), by two independent and qualified
translators. These translations were reviewed by the
local study team to agree on a translation by consensus.
This agreed-upon local language translation was then
back-translated into English by two other independent


translators. This back-translation was then shared with the WHO team, who initiated an iterative
process of revisions with the local team before approval. During the preparation and feasibility
phase the approved translations were tested (see below) so that final modifications, if necessary,
could be made before data collection. This process led to adaptation of the GSED measures to
include cultural nuances for language and context; few items were revised with minor rephrasing
in each country.

Training

One joint in-person training of trainers course was conducted for the study teams. The training
covered implementation processes and all data collection forms. The training included workshops
on child development principles, detailed review of administration processes and item-by-item
review. It included practical sessions with live demonstrations, role-plays, and practice in pairs
and with both caregivers and children. Participants in the training were certified as local master
trainers and were then responsible for training the field assessors in their country. For the local
assessors’ training, a structured two-week training programme was designed by the local master
trainers, in consultation with the WHO team, with thematic/didactic sessions in classrooms and
practice sessions. To be able to collect data, field assessors completed a certification process,
which entailed achieving an agreement of 90% on the forms’ scoring between the assessor and the
local master trainer while administering the GSED forms.

Feasibility of implementation processes

Data were collected from a minimum of 32 children per country, stratified by age group and sex, to
test feasibility and acceptability of administration of the study instruments, as well as to
finalize implementation processes, such as visit schedules including time required, and the use of
tablets. The sample adequately enabled all implementation procedures to be tested such as for
reliability testing processes, checks carried out on completeness of data collection and whether
any items from the measures, both GSED and contextual, had a significant number of missing or
otherwise invalid responses in any of the countries.

Some of the key findings from the preparatory and feasibility phase are provided below.

Visit schedules. Data were collected in two visits (in Bangladesh both a one-visit and two-visit
option were tested) with the sample divided into subgroups to test the feasibility of different
visit schedules in terms of order of administration of study measures and settings for the visits.
The first visit was done at home to test the GSED SF in the setting for which it was intended for
future use (i.e. household surveys) and to administer the study measures (see Annex 2) aimed at
capturing the child’s everyday home living environment. In Pakistan and the United Republic of
Tanzania, the second visit was carried out in a mobile clinic/clinic setting to facilitate
anthropometric testing and standardize the setting for concurrent validation (with Bayley-III). In
Bangladesh, the second visit was done at home due to transportation and travel times between the
sites. This process worked well and was confirmed for the validation. The two-visit model was also
confirmed in all countries, including Bangladesh, due to time needed for administration.
Fine-tuning of the sequence in which the study measures were administered took into account how to
optimize the privacy needed for some of them.

Reliability testing. Two reliability testing processes were carried out in a subsample of a minimum
of 16 children per country to explore the feasibility of conducting multiple administrations of
study measures within fixed time frames. First, for GSED LF, both a video recording approach
(Bangladesh N=15 and Pakistan N=12) of the GSED LF assessment was tested (with videos watched and
independently scored by other assessors and the study master trainers) as well as a simultaneous
coding by another assessor (N=8 per country). For GSED SF (and GSED PF) audio recording (Bangladesh
and Pakistan, N=16) the administration was carried out and independently scored by other assessors
on the study team. In Pemba, United Republic of Tanzania, the reliability testing was completed via
live coding of the administration of the test by another assessor (GSED SF N=22, GSED LF N=23).
Several limitations were encountered with the video and audio recordings. As the camera was placed
in a fixed


location for the video recordings, the motor component of the GSED LF assessment was difficult to
capture consistently and be coded. In addition, some sites faced issues with providing sufficient
lighting during recording to enable later coding. The audio recording method for GSED SF posed a
challenge pertaining to the quality of the voice recordings. The in-person approach with an
independent assessor coding the observations simultaneously with another assessor administering
the GSED measures was found to be feasible and of higher quality for inter-rater reliability and
was therefore implemented in all sites during the validation phase.

Quality control. Ten per cent of the scheduled visits of each assessor in each site were randomly
selected for a quality control visit by the master trainer or local supervisor. During these
visits, the master trainer or supervisor completed a checklist for the administration of the tests
that included verifying the child’s and mother’s ages, date of birth, consent, rapport building
and accuracy in administration of the study measures. In this testing phase, at least 10% of the
total visits at each site were also video recorded. The direct administration approach for quality
control was found to be feasible and reliable as compared to video assessment, due to challenges
with the video recording mentioned above.

Qualitative data – exit interviews. During the feasibility phase, exit interviews with caregivers
were conducted to understand the caregiver experience in the consent process, the acceptability
of GSED administration, visit schedules (N=63) and the acceptability of the GSED LF items,
materials, instructions and procedures (N=72). The assessors recorded the responses and any
narrative comments by caregivers on paper, and these were later translated into English. From
these exit interviews, it emerged that most (> 90%) respondents said that the various aspects of
the implementation process (e.g. comfort with items asked, where and when they were asked to
respond to items) were acceptable. However, 14% of respondents said that the duration of the visit
was very long. The duration of the total interview time was expected to decrease with familiarity
with the study measures; nonetheless, the training process was reinforced with a focus on
supervised practice to ensure that data collectors were sufficiently comfortable with and
efficient about administration of the study measures. Additionally, emphasis was placed on further
clarifying the time commitment for study participation at the consent stage and deciding to
implement data collection in two separate visits at all sites. Some concerns were raised by
caregivers and assessors about the sensitivity of some of the questions (see Annex 2 for study
measures), which were addressed by ensuring adequate privacy for conversations with the caregiver
(e.g. arranged at a health clinic or outside of the home, or during a time when a private setting
could be found at home). In the GSED LF exit interviews, some parents said the skills tested (28%)
and materials administered (43%) were unfamiliar to children in their community. Based on the
preliminary quantitative analyses the items in question (e.g. those including blocks, shape board,
peg board) seemed to perform well and were kept without significant change. Moreover, 15% of
respondents also specifically offered positive comments about their experiences with the GSED LF,
such as learning what their children could do and the need for more education on ECD.

Qualitative data – focus group discussions (FGDs). At the end of the feasibility phase, virtual
structured FGDs with the assessors, supervisors and study managers from each country were
conducted (N=42). The purpose was to elicit local field team feedback on implementation processes
(consent process, ease of administration of study measures, visit schedules, use of the GSED App,
training needs, comprehension of the items by assessors and caregivers), and cultural
appropriateness of GSED SF and GSED LF items and the GSED LF Kit materials. FGDs were
audio-recorded, transcribed and translated by local staff with specialized training in qualitative
methods. These data were analysed using Dedoose (26). Codes were created for the items included in
the FGDs and applied to the responses from participants. Themes were identified for summary
analysis. Overall, the results indicated that the assessors’ experiences with the administration
of the study measures were positive. The main concerns expressed were, consistent with caregivers’
experiences shared in exit interviews, about ensuring privacy for


measures with sensitive items (such as caregiver depression or exposure to violence, see Annex 2).
The FGDs also prompted changes in how to introduce the GSED SF (e.g. more information to be
provided about the fact that, by design, the validation of the measures implies that some items
may seem repetitive to the respondent) and tips for optimizing administration of the GSED LF (e.g.
offering materials, such as blocks, to children to familiarize them with the objects prior to
administration). The FGDs were also useful for teams to strategize how to manage implementation
challenges during GSED LF administration (e.g. keeping children engaged, avoiding distractions,
etc.).

Based on the exit interviews and FGDs, along with feedback from the translation process, 13 GSED LF
items were identified as difficult to administer or understand. These items were adapted to
facilitate administration, or rephrased to improve ease of translation and/or clarity of
administration instructions. For example, the GSED LF item to assess the child’s understanding of
the concepts of “more” and “less” initially asked the child to indicate which of two cups contained
more water. This item was reportedly hard for children to understand and complete as some children
wanted to play with the water or drink from the cups. This item was adapted by asking the child to
indicate which of two piles of blocks had more blocks (in place of the cups with water).

Validation phase
Study population

The study population included children 2 weeks to 41 months of age (inclusive) living in the study
areas. Children up to 41 months were included to ensure that measure parameters could be estimated
with adequate precision at 36 months. Inclusion criteria were the child’s age, the child’s primary
caregiver (person most familiar with the child and spending most time with him/her) was available
to participate in the study, and the family spoke to the child in one of the GSED translated
languages. Children were excluded if gestational age or birth weight data were missing.

A subsample of children was used to estimate preliminary reference scores and is henceforth
referred to as the “reference” sample. Additional exclusion criteria were applied to this subsample
to exclude children who were low birth weight (< 2500 g); born preterm or late term (gestational
age < 37 or ≥ 42 completed weeks); undernourished (weight-for-age, length-for-age or
weight-for-height z-score < –2 SD based on the WHO Child Growth Standards) at the time of
developmental assessment; had a known severe congenital birth defect, history of birth asphyxia
or neonatal sepsis requiring hospitalization, known neurodevelopmental disorder or disability, or
other chronic health problem; or primary caregiver had less than secondary level education.

Sample size and sampling scheme

The target sample size per site was 1248 children (total 3744 in the three countries). In each
site, children were sampled from a list of potentially eligible caregiver-child dyads residing in
the defined study areas. Only one child


per caregiver or multi-family household was selected and the target children’s primary caregiver
approached for consent and enrolment. Children who were acutely unwell at the time of assessment
were rescheduled after seven days. Refusals to participate and drop-outs were registered and
replaced. After consent was obtained, children were allocated to sex and age groups using the
sampling scheme in Figure 7. Larger quotas were set for the youngest age groups where rates of
development are steepest. Out of the full sample of 1248 children per site, 154 were randomly
allocated to either inter-rater (N=99) or test-retest (N=55) reliability testing; 166 to
concurrent validity testing with the order of GSED and Bayley-III administrations counter-balanced
(N=83 with GSED administered first and vice versa); and 504 to re-evaluation six months later for
predictive validity.

Within the predictive validity subsample, children were further divided into groups that also
received the GSED adaptive measure to determine whether adaptive testing is a feasible and valid
option to measure child development within the GSED (see Box 2 for further details), or the
UNICEF’s Early Childhood Development Index 2030 (ECDI2030) measure to inform harmonization of
measurement of child development up to 5 years at population level (see Box 3 for further
details).

FIGURE 7. GSED SAMPLING SCHEME IN EACH COUNTRY

Full sample N = 1248

Reliability: N=154 (140 + 10% LTF – STOP at 140)
    Inter-rater: N=99 (90) [1]
    Test-retest: N=55 (50) [1]

Concurrent: N=166 (150 + 10% LTF – STOP at 150) [1]
    Concurrent 1: N=83 (75) [1]
    Concurrent 2: N=83 (75) [1]

Predictive: N=504 [2]
    Adaptive: N=432 + 72 new [3]
    ECDI: N=230 [4]

[1] The number inside parentheses is the number collected and the number outside is the number randomized to account for loss to follow-up.
[2] Two additional participants have been added to the predictive to have equal numbers in each experimental group.
[3] 72 new children between 2 weeks and 6 months of age have been added to the adaptive sample to ensure coverage at the lower ages.
[4] ECDI was only done on N=230 children of 24 months and above at the time of the predictive data collection.
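
The allocation numbers in Figure 7 can be cross-checked with a few lines of arithmetic. This is a sketch under one assumption that the report does not spell out: that each target is inflated by 10% for loss to follow-up and rounded up to a whole child, arm by arm. The function name inflate_for_loss is introduced here for illustration only.

```python
def inflate_for_loss(target: int, loss_percent: int = 10) -> int:
    """Smallest whole number of children covering `target` plus `loss_percent` extra.

    Integer arithmetic is used so that, for example, 90 * 1.1 is not pushed
    above 99 by floating-point error before rounding up.
    """
    return -(-target * (100 + loss_percent) // 100)


inter_rater = inflate_for_loss(90)        # children randomized to inter-rater testing
test_retest = inflate_for_loss(50)        # children randomized to test-retest testing
concurrent_arm = inflate_for_loss(75)     # each of the two counter-balanced arms

reliability_total = inter_rater + test_retest
concurrent_total = 2 * concurrent_arm

per_site = 1248
all_round_1_sites = 3 * per_site
```

Rounding each concurrent arm up separately is what gives 166 rather than 165 (150 plus 10%) for the concurrent validity subsample.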



BOX 2. GSED ADAPTIVE TESTING
Computerized adaptive tests (CAT) are widely used in education. They tailor the order of administration
for each participant based on the participant’s responses to the prior questions. The GSED adaptive test
selects a first item (i.e. starting item) based on an external estimate of the child’s proficiency, for example,
based on age. After recording the response to that item, the underlying algorithm calculates the D-score and
determines whether the result is precise enough to meet the “stop” criterion. If this is not met, the algorithm
continues to select the next item, such that it provides maximal information given all previous responses,
and the cycle continues. The process ends once the predefined “stop” criterion is satisfied. Figure 8 visualizes
the algorithm. Adaptive testing is a modern, flexible and personalized method to quantify each child’s
D-score. As yet, there is little published research and experience utilizing adaptive testing for measuring child
development, beyond the efficiency of adaptive testing in simulation studies (27,28) or clinical settings (27).
The adaptive testing approach for GSED SF and GSED LF has been tested within the GSED validation study
and the results are under review.
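
The cycle described above can be sketched in a few functions. This is a generic adaptive-testing illustration, not the GSED implementation: it assumes Rasch items on a plain logit scale, item information p(1 − p), and an ability estimate taken as the posterior mean on a grid with a normal prior centred on the starting estimate. All names, the prior width and the thresholds are assumptions made for this example.

```python
import math
from typing import Callable, Dict, List, Tuple


def rasch_probability(ability: float, difficulty: float) -> float:
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_information(ability: float, difficulty: float) -> float:
    """Fisher information of a Rasch item at the given ability."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)


def estimate_ability(responses: List[Tuple[float, int]], prior_mean: float,
                     prior_sd: float = 2.0, step: float = 0.05,
                     span: float = 8.0) -> Tuple[float, float]:
    """Posterior mean and SD of ability on a grid (assumed normal prior)."""
    n_points = int(round(2 * span / step)) + 1
    grid = [prior_mean - span + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        log_w = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for difficulty, score in responses:
            p = rasch_probability(theta, difficulty)
            log_w += math.log(p if score == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_adaptive_test(bank: Dict[str, float], answer: Callable[[str], int],
                      start_ability: float, max_sem: float = 0.5,
                      max_items: int = 20) -> Tuple[float, float, List[str]]:
    """Administer items until the SEM target or the item limit is reached."""
    remaining = dict(bank)  # item id -> difficulty
    responses: List[Tuple[float, int]] = []
    administered: List[str] = []
    ability, sem = start_ability, float("inf")
    while remaining and len(administered) < max_items and sem > max_sem:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda k: item_information(ability, remaining[k]))
        difficulty = remaining.pop(item)
        responses.append((difficulty, answer(item)))
        administered.append(item)
        ability, sem = estimate_ability(responses, prior_mean=start_ability)
    return ability, sem, administered
```

For example, with a hypothetical bank such as `{"item_a": -1.0, "item_b": 0.0, "item_c": 1.0}` and a starting estimate of 0.0, the first item administered is the one whose difficulty is closest to the starting estimate, because a Rasch item is most informative where the probability of passing is 0.5.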

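FIGURE 8. CAT ALGORITHM FOR THE D-SCORE

[Flowchart: Start item → Answer → Calculate D-score → Evaluate stop rule (minimum standard error of
measurement, or maximum number of items). If the stop rule is not met (No): Next item → Answer. If
the stop rule is met (Yes): Final D-score.]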


BOX 3. LINKING GSED AND ECDI2030


Population-level assessments of ECD outcomes allow countries to track their progress in achieving SDGs
while also generating data for advocacy and policy purposes. One of the most relevant for ECD is SDG
indicator 4.2.1 (the proportion of children aged 24–59 months who are developmentally on track in health,
learning and psychosocial well-being). UNICEF’s ECDI2030 is recognized as a viable measure for assessing
progress towards this target. However, research indicates that the first 1000 days of life provide an important
window of opportunity to build a strong foundation for future development. Therefore, an open question
remains of how to harmoniously and meaningfully track the development of children from birth to 2
years, when inequities in health and education begin. Establishing the feasibility of connecting population-
level monitoring in the first 24 months of life with that of older children and improving the evidence-building
capacity in this area (e.g. providing metrics that allow for comparison of programme/policy impact across
the first 5 years of life) is clearly imperative for advancing the child well-being and equity agenda beyond
2030. Considering the urgency to address the current gap in measurement of ECD in children less than 2
years of age, WHO and UNICEF advanced the dialogue on how to align the GSED and ECDI2030, to enable
governments and partners to plan for continuity in measurement of ECD across the 0 – 59 months age range,
in an interpretable and psychometrically valid manner.
To explore this objective and given the complementarity, scope and format of the two tools, leveraging the
ongoing validation efforts, WHO collected data on 628 children aged 24 – 41 months across three countries
where both GSED SF and ECDI2030 were administered in their entirety for the same child. The specific aim of
the work was to explore how scores from the two measures link together and relate to one another so that
continuity can be established in measuring children’s development from birth. The results from this work will
be published separately.

Study measures

The GSED SF, GSED LF (and GSED PF) measures, as well as measures of children’s growth (anthropometry) and nutrition, health, environmental and contextual information such as family and home environment through Home Observation Measurement of the Environment (HOME) or Family Care Indicator (FCI), Childhood Psychosocial Adversity Scale (CPAS), Patient Health Questionnaire-9 (PHQ9), Brief Resilience Scale (BRS) and Family Support Scale (FSS) (see Annex 3) were administered to all children to examine convergent and discriminant validity and to identify a subsample of children with minimal constraints on their development in order to create preliminary reference scores. The selection of measures was based on known biological and social determinants of development (29), the demonstrated validity of the contextual measures in at least one low- and middle-income country, and efficiency for data collection. While there is no universal gold standard that can be recommended for concurrent validity testing, the Bayley-III measures a similar construct to the GSED and had previously been used by the country teams. The Bayley-III was therefore administered in the concurrent validity subsample and the ECDI2030 (see Box 3) to inform harmonization of measurement beyond 3 years in a subgroup of the predictive validity subsample.


Visit schedule and quality control

Data collection was scheduled over one to three visits depending on the study site and subsample. The first administration of the GSED SF was completed at home to test it in the setting intended for future use (e.g. household surveys) and prior to administration of the GSED LF to avoid influencing caregiver responses. The GSED LF and Bayley-III were administered in a controlled environment (e.g. clinic or quiet residential area) to match the required testing protocols of the concurrent validity measure. To ensure high quality, 10% of all study visits were observed in person by study supervisors, covering each child age band and certified assessor. The supervisors had at least five years of experience in community-based research and/or formal education in fields related to ECD (e.g. teaching, nursing, psychology). Supervisors independently completed questionnaires administered by the assessor and completed a fidelity checklist to provide feedback to assessors. Supervisors reviewed quality assurance findings with WHO biweekly. The GSED App for data collection provided built-in data range and consistency checks. Data managers reviewed and resolved issues daily in consultation with the local field and/or WHO team.

Data management

As described above, to optimize ease of administration of the GSED measures and minimize data entry errors, the GSED App was designed, which also improved standardization of data collection across study sites. In addition to GSED measures, all other study measures were designed and incorporated in the same app. ODK Aggregate with a MySQL database was used as an aggregator. Lastly, a separate data management and monitoring module was designed, enabling the study team to effectively manage, monitor and generate analysable output files. The module for data management allowed data managers to check for the completion status of the forms, flagged missing data, status of the visit schedule and the visit windows for the participants for scheduling, data collection status for the study age and gender bins and completion status of the participants in the study. The data from the module were reviewed weekly by the country teams and also with WHO for study status monitoring.

The prototype was pre-tested through Google Play with two rounds of feedback by the field teams, SMEs and the statistics teams. The key feedback received focused on the visual interface, colours and fonts, number of questions per screen, and the functioning of the administration rules as intended, as well as the ease of use of the media files (GSED SF) and in-built administration instructions support (GSED LF). The GSED App was then revised and field-tested in the feasibility phase of the validation for ease and accuracy of data collection and transfer. Following the feasibility phase, the revised GSED App version was released and tested for the following features: placement of media files and questions on screen for improved speed in using the App, inclusion of a pop-up screen to inform the assessor of the age of the child to be assessed, and inclusion of a pop-up asking if all questions have been completed for the child’s age as per the rules prior to saving the forms.

Once collected, the data were stored in local password-protected user authenticated servers. The de-identified data were securely transferred to the WHO central data repository by each site. They were transferred weekly for the first month of data collection and then bi-monthly. The weekly data collected in the first month were reviewed for consistency with the data dictionary, checks for missing data, data formatting and diagnosing any potential problems (missing or non-sensical data). Teams maintained detailed logs related to procedures for rescheduling or incomplete visits. These were reviewed weekly for the resolution of queries with the WHO team. The final data set was transmitted by each country to the WHO repository for analysis.


5. GSED psychometric properties
This section addresses various aspects of the GSED’s psychometric performance in the three
Round 1 countries (Bangladesh, Pakistan and the United Republic of Tanzania). The analyses
in this section, with the exception of the analysis of reliability, are performed on the combined
measure, i.e. the GSED SF and GSED LF data together. This is to reflect that primarily the scale
from which the measures are drawn is being validated, rather than the individual tools. However,
in order to show the limited differences between the psychometric properties of the GSED LF
versus the GSED SF versus the CB, concurrent, convergent and short-term predictive validity for all
three forms of the measure are presented in Annex 3.

In total, data were collected on 4452 children across the three countries. Some countries collected more data than the specified sample size in order to: i) meet the minimum quota of the reference sample; and ii) ensure every age group stratum was sampled to the specified level. Data were analysed on 4349 children, with randomly selected children contributing to the various reliability and validity subsamples. Table 2 shows the numbers collected for each of the sites by measure. Data from 41 children were removed from the analysis based on notes provided by the countries that the data were invalid for various reasons (e.g. duplicate entries, withdrew from the study, etc.), and 62 participants were removed as they had neither GSED LF nor GSED SF data available at baseline due to incomplete administration of the battery of tools.

TABLE 2. NUMBERS OF CHILDREN BY STUDY MEASURE COLLECTED AND BY COUNTRY

Study measure                               Bangladesh  Pakistan  United Republic of Tanzania  Total
GSED SF sample                              1336        1663      1350                         4349
GSED LF sample                              1332        1642      1344                         4318
Test-retest reliability GSED SF subsample   48          59        52                           159
Test-retest reliability GSED LF subsample   48          59        52                           159
Inter-rater reliability GSED SF subsample   95          100       96                           291
Inter-rater reliability GSED LF subsample   95          99        95                           289
Predictive validity subsample               472         455       469                          1396
Concurrent validity subsample (Bayley-III)  159         158       161                          478


Table 3 presents the demographic characteristics of the sample. The mean age of the children is higher in Pakistan than in the other countries because more older children were recruited to ensure a sufficient sample for the predictive validity subsample after 6-month follow-up challenges. The large sample size, relative to similar studies, and the standardized implementation of the tools across multiple countries lend strength to the validity inferences and robustness of the results of the study.

TABLE 3. DEMOGRAPHIC CHARACTERISTICS BY COUNTRY

Characteristic                Bangladesh      Pakistan        United Republic of Tanzania  Total
Male (%)                      50              50              50                           50
Mean age in days (SD)         432 (375)       475 (381)       432 (377)                    448 (378)
Mean age in months (SD)       14.20 (12.32)   15.61 (12.52)   14.19 (12.38)                14.72 (12.42)
GSED DAZ (SD)                 0.24 (0.75)     -0.34 (0.85)    0.07 (0.81)                  -0.03 (0.84)
Gestational age – weeks (SD)  38.67 (1.71)    38.41 (2.00)    38.71 (1.76)                 38.59 (1.85)
Birthweight – grams (SD)      2921 (422)      3251 (723)      3226 (513)                   3143 (599)
Anthropometry – HAZ (SD)      -1.30 (1.09)    -1.33 (1.17)    -1.27 (1.12)                 -1.30 (1.13)
Anthropometry – WAZ (SD)      -1.09 (1.08)    -1.45 (1.02)    -0.77 (1.07)                 -1.13 (1.09)
Maternal education – N (%)
  No schooling                25 (2%)         517 (32%)       163 (12%)                    705
  Primary                     299 (22%)       224 (14%)       343 (25%)                    866
  Secondary                   735 (55%)       358 (22%)       795 (59%)                    1888
  Post-secondary              277 (21%)       528 (32%)       49 (4%)                      854
PHQ9 – N (%)*
  Minimal                     587 (44%)       1215 (74%)      808 (61%)                    2610
  Mild                        437 (33%)       220 (12%)       422 (32%)                    1079
  Moderate                    121 (9%)        89 (5%)         52 (4%)                      262
  Moderate-severe             84 (6%)         52 (3%)         27 (2%)                      163
  Severe                      103 (8%)        65 (4%)         19 (1%)                      187
HOME stimulation score (SD)   40.63 (3.77)    38.27 (5.44)    38.70 (3.81)                 39.13 (4.61)

* For more information on definitions of these categories, see https://www.med.umich.edu/1info/FHP/practiceguides/depress/score.pdf.


Internal reliability

The precision of the CB at any given age is greater than that of the individual forms from which it is constituted because of the larger number of items. Figure 9 gives the standard error of estimation (SEE) (30) of the D-scores obtained for the combined GSED SF and GSED LF under the Rasch model. As there are varying numbers of items pertinent to different sections of the scale, the precision of an estimate can vary at different points along the scale. The y-axis gives the SEE, the bottom x-axis gives the D-score scale, and the top x-axis gives the average D-score for various child ages, expressed in months. Note that the average D-scores for any given age are non-linearly related to age, reflecting the decreasing rate of development seen as children become older.

FIGURE 9. SEE PLOT BY GSED FORM

Figure 10 shows the pseudo-reliability [Reliability = 1 − (1/Test information) (31)] along the D-score scale. This value, at any point on the scale, can be interpreted in the same way as traditional measures of internal reliability (e.g. Kuder-Richardson 20 or Cronbach’s alpha), which indicate the consistency of scores across items. For the GSED LF the internal reliability is above 0.8 for almost all of the scale, for the GSED SF it is above 0.8, and for the GSED CB above 0.9 for almost all of the scale. Lower reliability at a point on the scale indicates a lower density of items at a given point.

FIGURE 10. PSEUDO-RELIABILITY PLOT BY GSED FORM
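
The relationship between item density, SEE and pseudo-reliability can be illustrated with a short sketch. The Python example below is illustrative only: it assumes a Rasch model on a logit scale standardized to unit latent variance (which is what makes 1 − 1/information comparable to a reliability coefficient), and the item difficulties and function names are hypothetical rather than taken from the GSED item bank.

```python
import math

def rasch_information(theta, difficulties):
    """Test information at theta: sum of p * (1 - p) over all items (Rasch model)."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    return info

def see(theta, difficulties):
    """Standard error of estimation: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(rasch_information(theta, difficulties))

def pseudo_reliability(theta, difficulties):
    """Pseudo-reliability as given in the text: 1 - 1 / test information."""
    return 1.0 - 1.0 / rasch_information(theta, difficulties)

# Hypothetical single form: 41 items spread evenly from -5 to 5 logits.
single_form = [d / 4.0 for d in range(-20, 21)]
# Hypothetical combined form: the same items plus 41 more, slightly offset.
combined_form = single_form + [d / 4.0 + 0.1 for d in range(-20, 21)]

for label, items in (("single", single_form), ("combined", combined_form)):
    print(label, round(see(0.0, items), 3), round(pseudo_reliability(0.0, items), 3))
```

With more items near a given point of the scale the information rises, so the SEE falls and the pseudo-reliability increases, which is the pattern described for the combined form relative to the individual forms.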


External reliability
All the reliability metrics are excellent on the D-score scale. External reliability is the extent to which a measure produces the same score over theoretically identical administrations. Inter-rater reliability is the measurement of agreement between different raters and test-retest reliability is the measurement of agreement for the same rater at two different time points. Reliability is expressed here (see Table 4) as an intraclass correlation coefficient (ICC), where a value of 0 represents no reliability and a value of 1 represents perfect reliability. Common benchmark values suggest that 0 - 0.2 indicates poor reliability; 0.2 - 0.4 indicates fair reliability; 0.4 - 0.6 indicates moderate reliability; and values > 0.8 indicate excellent reliability (32).
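
As an illustration of how an ICC can be obtained from paired scores, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1). The report does not state which ICC form was used, so this choice is an assumption, and the function name and D-score values are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per child and one column per rater or occasion."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: children (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical D-scores for five children scored by two raters.
rater_a = [32.1, 45.0, 51.3, 60.2, 68.9]
rater_b = [31.8, 45.6, 50.9, 60.8, 69.1]
icc = icc_2_1([list(pair) for pair in zip(rater_a, rater_b)])
print(round(icc, 3))
```

For test-retest reliability the same calculation would apply with the two columns holding the scores from the two time points.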

TABLE 4. INTRACLASS CORRELATIONS FOR RELIABILITY OF GSED D-SCORE (95% CONFIDENCE INTERVALS [CIs])

D-score scale

Reliability  Form     Bangladesh               Pakistan                 United Republic of Tanzania  Total
Inter-rater  GSED CB  0.99 (0.99-1.00), N=95   0.98 (0.97-0.99), N=99   0.99 (0.98-0.99), N=95       0.99 (0.98-0.99), N=289
Inter-rater  GSED SF  0.99 (0.99-0.99), N=95   0.97 (0.96-0.98), N=100  0.99 (0.99-0.99), N=96       0.99 (0.98-0.99), N=291
Inter-rater  GSED LF  0.99 (0.98-0.99), N=95   0.98 (0.96-0.98), N=99   0.97 (0.95-0.99), N=95       0.98 (0.97-0.99), N=289
Test-retest  GSED CB  1.00 (0.98-1.00), N=48   0.99 (0.98-0.99), N=59   0.99 (0.99-1.00), N=52       0.99 (0.99-1.00), N=159
Test-retest  GSED SF  0.99 (0.99-1.00), N=48   0.99 (0.98-1.00), N=59   0.99 (0.99-1.00), N=52       0.99 (0.99-1.00), N=159
Test-retest  GSED LF  0.99 (0.97-1.00), N=48   0.98 (0.96-0.99), N=59   0.99 (0.98-1.00), N=52       0.99 (0.98-0.99), N=159

A value of 0 represents no reliability and a value of 1 represents perfect reliability.


Concurrent validity
Concurrent validity, a type of criterion validity, is the extent to which a measure correlates with another measure of the same construct, possibly a gold standard, given at the same time. Here the criterion measure is the Bayley-III, a widely-used measure that frequently acts as a reference (33-35). To assess the correlation of the Bayley-III raw scores with GSED DAZ on the same scale, a 2PL item response theory (IRT) model was fitted to the Bayley-III item responses and a Generalized Additive Models for Location Scale and Shape (GAMLSS) model (36) was used to remove the effect of age, in line with the methodology used to construct the GSED DAZ. Sample-specific norms were constructed to ensure that the age adjustments were pertinent to the specific sample, as far as possible. Age-adjusted z-scores were only generated for the total score, as the individual domains did not contain enough items to calculate the age standardization robustly.

Table 5 gives the Pearson’s correlations between the Bayley-III individual domains and total scores, with the GSED D-scores. The GSED D-score correlates > 0.90 with the domains of the Bayley-III in most domains and countries. The correlations are higher for the cognitive and motor items than for the communication items, although in the total score the correlation is very high (0.98).

TABLE 5. CONCURRENT VALIDITY FOR GSED BY BAYLEY-III AND BAYLEY-III DOMAINS FOR EACH COUNTRY AND OVERALL – CORRELATION COEFFICIENT (95% CIs)

GSED D-score

Bayley-III domain                         Bangladesh          Pakistan            United Republic of Tanzania  Total
                                          (N=159)             (N=158)             (N=161)                      (N=478)
D-score scale
Cognitive                                 0.98 (0.97 - 0.98)  0.95 (0.93 - 0.96)  0.97 (0.96 - 0.98)           0.97 (0.96 - 0.97)
Receptive communication                   0.91 (0.88 - 0.93)  0.88 (0.84 - 0.91)  0.91 (0.88 - 0.94)           0.90 (0.88 - 0.92)
Expressive communication                  0.94 (0.92 - 0.96)  0.87 (0.83 - 0.90)  0.89 (0.85 - 0.92)           0.90 (0.88 - 0.91)
Fine motor                                0.97 (0.96 - 0.98)  0.97 (0.95 - 0.97)  0.97 (0.95 - 0.97)           0.97 (0.96 - 0.97)
Gross motor                               0.98 (0.97 - 0.98)  0.97 (0.97 - 0.98)  0.98 (0.97 - 0.98)           0.98 (0.97 - 0.98)
Overall Bayley-III score                  0.99 (0.98 - 0.99)  0.97 (0.96 - 0.98)  0.98 (0.98 - 0.99)           0.98 (0.98 - 0.98)
DAZ scale
Overall Bayley-III age-adjusted z-score*  0.55 (0.44-0.65)    0.26 (0.11-0.41)    0.56 (0.44-0.66)             0.53 (0.47-0.6)

* DAZ domain scores were not produced at a domain level as insufficient data existed to do this robustly.
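
A correlation coefficient with a 95% CI of the kind reported in Table 5 can be sketched as follows. The report does not state how its CIs were computed; the Fisher z-transformation used here is an assumption, the function names are hypothetical, and the resulting interval may therefore differ slightly from the published one.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation r from n pairs (Fisher z-transformation)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Inputs taken from Table 5 (Bangladesh, DAZ scale): r = 0.55, N = 159.
lower, upper = fisher_ci(0.55, 159)
print(round(lower, 2), round(upper, 2))
```

The same two functions apply to any row of the table once the paired scores, or the reported coefficient and sample size, are available.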


Convergent validity
Convergent validity is the assessment of how closely a measure is correlated with other variables where correlation is expected. Table 6 gives a selection of variables that were, a priori, expected to correlate with the GSED DAZ score based on evidence from the literature. The table contains Pearson’s correlation coefficients with 95% CIs, unless otherwise specified. Some variables were ordinal in nature and required the use of Spearman’s correlation coefficient.

For several of the multi-item variables which contained large amounts of missing data, a unidimensional 2PL IRT model was fitted to extract summary scores, under the Missing at Random missingness assumption. The 2PL model may also be more appropriate and accurate for the DHS Wealth Index based on generated internal scores rather than relying on the published quintiles, according to weights established some years ago. A total score (i.e. all countries combined) was not generated for the DHS Wealth Index and maternal education because the items and categories were not comparable across countries. For the total scores, all measures differ significantly from zero in the hypothesized directions at the 5% level of significance. However, some measures do not differ significantly from zero in the expected directions in the country level analyses. Table 6 compares the results of the GSED SF and GSED LF when administered in a CB against the convergent contextual variables.
The 2PL model may also be more appropriate and

TABLE 6. CONVERGENT VALIDITY WITH GSED DAZ – PEARSON’S CORRELATION COEFFICIENT (95% CIs)

Variable                    Bangladesh              Pakistan                United Republic of Tanzania  Total
SES - DHS Wealth Index***   0.10 (0.05 to 0.16)     0.14 (0.09 to 0.19)     0.15 (0.10 to 0.20)          NA
Anthropometry - HAZ         0.21 (0.16 to 0.26)     0.18 (0.13 to 0.23)     0.21 (0.16 to 0.26)          0.19 (0.16 to 0.22)
Anthropometry - WAZ         0.21 (0.16 to 0.26)     0.17 (0.12 to 0.22)     0.17 (0.12 to 0.23)          0.23 (0.20 to 0.26)
Birth weight                0.16 (0.10 to 0.21)     0.03 (-0.02 to 0.08)    0.20 (0.14 to 0.25)          0.04 (0.01 to 0.07)
Gestational age             0.11 (0.05 to 0.16)     0.16 (0.11 to 0.21)     0.21 (0.15 to 0.26)          0.17 (0.14 to 0.20)
Maternal education*, ***    0.14 (0.08 to 0.19)     0.21 (0.16 to 0.26)     0.06 (0.01 to 0.11)          NA
PHQ9 category*              -0.05 (-0.10 to 0.01)   -0.04 (-0.08 to 0.02)   0.02 (-0.03 to 0.07)         0.05 (0.02 to 0.08)
HOME                        0.21 (0.16 to 0.26)     0.17 (0.12 to 0.21)     0.21 (0.16 to 0.26)          0.23 (0.20 to 0.26)
CPAS**                      -0.05 (-0.1 to 0.01)    -0.05 (-0.10 to 0.01)   -0.01 (-0.06 to 0.04)        -0.07 (-0.1 to -0.04)
FSS**                       0.11 (0.06 to 0.16)     0.07 (0.02 to 0.12)     0.03 (-0.02 to 0.09)         0.22 (0.19 to 0.25)
BRS*                        -0.1 (-0.15 to -0.05)   -0.09 (-0.14 to -0.04)  -0.01 (-0.06 to 0.04)        -0.09 (-0.12 to -0.06)

* Spearman’s correlation: maternal education [no schooling, primary, secondary, higher], PHQ9 [minimal, mild, moderate, moderate-severe, severe depression].
** Scale created via a unidimensional 2-parameter IRT model.
*** For these variables a cross-national scale was not considered appropriate.


Known groups validity


To assess whether the GSED scores were associated with known factors which influence development, a series of logistic regression models, corrected for differential prevalence across countries, were fitted to predict the probability of a child having a lower-than-average DAZ given their membership of one of the groups in Table 7. None of the CIs for the odds ratios contain one, indicating statistical significance at the 5% level. Children who are stunted or underweight (HAZ or WAZ more than 2 SD below the mean) and children with low birth weight all have about twice the odds of a lower-than-average DAZ score. Premature children and those whose mothers used tobacco during pregnancy have 1.77 and 1.34 times the odds, respectively, of a lower-than-average DAZ score. Children whose mothers took supplements during pregnancy have 25% lower odds of a lower-than-average DAZ score.

TABLE 7. KNOWN GROUP VALIDITY ODDS RATIOS (95% CIs)

Known group                                 Odds ratio (95% CI)
Stunted HAZ                                 2.12 (1.83-2.46)
Underweight WAZ                             2.15 (1.81-2.54)
Low birth weight (< 2500 g)                 2.24 (1.73-2.90)
Premature (< 38 weeks gestation)            1.77 (1.51-2.08)
Maternal supplement use during pregnancy    0.75 (0.66-0.86)
Maternal tobacco use during pregnancy       1.34 (1.05-1.70)
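
To show how an odds ratio and its CI are read, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2×2 table. This is a simplification and an assumption: the analysis in the report used logistic regression corrected for country, which this example does not reproduce, and the counts and function name are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z_crit=1.959964):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    a: in group, lower-than-average DAZ      b: in group, not lower-than-average
    c: not in group, lower-than-average DAZ  d: not in group, not lower-than-average
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(estimate) - z_crit * se_log)
    upper = math.exp(math.log(estimate) + z_crit * se_log)
    return estimate, lower, upper

# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio_ci(60, 40, 45, 75)
significant = not (lower <= 1.0 <= upper)  # the null value for an odds ratio is 1
print(round(estimate, 2), round(lower, 2), round(upper, 2), significant)
```

An interval that lies entirely above 1 indicates higher odds in the group, and an interval entirely below 1 (as for maternal supplement use in Table 7) indicates lower odds.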


Short-term predictive validity


Short-term predictive validity was assessed via the correlation of the GSED measure at baseline with a GSED measure taken six months (± two weeks) later. Overall, the correlation between GSED DAZ scores in a six-month interval was 0.59 with similar values across countries (see Table 8).

TABLE 8. PREDICTIVE VALIDITY – CORRELATION COEFFICIENT (95% CIs)

Comparison                                    Bangladesh          Pakistan            United Republic of Tanzania  Total
GSED DAZ at baseline vs GSED DAZ at 6 months  0.55 (0.48 - 0.61)  0.57 (0.51 - 0.63)  0.57 (0.50 - 0.63)           0.59 (0.56 - 0.63)


6. GSED package v1.0


This section describes all components of the GSED package v1.0. The package has been tested in
three countries and met the target criteria of concurrent, convergent and short-term predictive
validity and reliability. Further testing is underway in four additional countries to extend the
validity checks (see Section 7).

The package has been created to serve as an open-access population-level measure of ECD
for the global community that is comparable across countries. There are no fees or royalties
involved when using it, and it was designed and tested to be linguistically and culturally
neutral. It includes: i) GSED SF and GSED LF measures as both a paper version and app; ii) GSED
measures Item Guides; iii) GSED measures Administration Manuals; iv) Adaptation and Translation
Guide; and v) Scoring Guide.

GSED SF and LF measures v1.0


The GSED SF is a caregiver-reported interview, while the GSED LF is comprised of items that are directly administered (e.g. making sounds, fixing gaze and following an object, sitting and standing or identifying objects provided in a picture on a tablet). They both have administration rules (“start” and “stop” rules) based on the age of the child and responses to the items. Table 9 summarizes the key aspects of the GSED SF and GSED LF. Both forms are available in multiple languages, and more translations will become available.

The GSED SF includes media files to be used as prompts to facilitate understanding of the item by the caregiver, such as images and short animations showing actions and skills being asked about (e.g. walking, kicking a ball) or audio files for caregivers to hear the sounds related to the questions (e.g. giggling or cooing).

The GSED LF is organized into three streams which group tasks that are likely observed together in order to streamline and facilitate administration. The GSED LF administration relies partially on low-cost materials organized into a kit that is assembled locally by users following detailed guidance found in the GSED LF User Manual.

The GSED measures should be administered using the GSED App on a smart device that includes the in-built administration rules and a user-friendly interface for both caregiver report and direct administration (through a grid-based interface which includes instructions for the assessor) data collection. The GSED App is supported by any mobile device running on the Android operating system, and can be downloaded from Google Play Store. The GSED XLSForm and Xform (xml) version of the GSED SF and GSED LF measures along with the associated media files can be uploaded and configured to any aggregator server (ODK Aggregate/Central, Kobo, etc.) of choice for data collection using the GSED App.

If necessary, a paper version of GSED measures may be used as long as administration rules are followed and assessors have access to the accompanying materials specific to each measure. Alternatively, these forms can be printed. For the GSED LF, the kit must be complemented by printing the components that are in-built in the GSED App (i.e. images and booklets). Self-administration and remote administration (e.g. via phone) of the GSED SF are being tested. Current recommendations to maximize the quality of data generated are to use face-to-face administration of the GSED SF and GSED LF with the GSED App.

The GSED HF (Box 4) and GSED PF (Box 5) are not yet fully tested but can be made available on request.

TABLE 9. GSED MEASURES DESCRIPTION

Primary purpose
  GSED SF: Large-scale data collection and monitoring efforts; research and programme evaluation
  GSED LF: Research and programme evaluation

Score interpretation
  GSED SF and GSED LF: Population-level; NOT for individual-level interpretation

Target age
  GSED SF and GSED LF: < 36 months

Format
  GSED SF: Caregiver report
  GSED LF: Direct administration

Form structure
  GSED SF: Ordered questions to caregiver
  GSED LF: Items grouped in three streams to facilitate form administration; each stream organized into grid (items do not need to be scored in sequential order)

Total number of items
  GSED SF: 139
  GSED LF: 155

Average number of items administered per child/caregiver*
  GSED SF: 30 - 50
  GSED LF: 45 - 60 (15 - 20 per stream)

Response options
  GSED SF: Binary (Yes/No) + “Don’t Know” response option (to be used only when absolutely necessary)
  GSED LF: Binary (Yes/No); only observed items qualify to be scored as Yes

Administration rules
  GSED SF: “Start” rule based on age of child and “stop” rule based on reported abilities; “Go back” rule to allow measure to be administered to children of all abilities and developmental levels; items should not be skipped to complete the form
  GSED LF: “Start” and “stop” rules based on the age of child and performance; “Bookends” to allow measure to be administered to children of all abilities and developmental levels; all streams must be administered to complete the form

Time range to administer measure*
  GSED SF: 15 - 25 minutes
  GSED LF: 30 - 75 minutes

Administration modality
  GSED SF and GSED LF: GSED App (recommended); paper based (also available)

Materials needed to administer instrument
  GSED SF: GSED App: tablet or similar device. Administration on paper: printed paper form with a tablet or similar device for audio/visual prompts (may also be printed)
  GSED LF: GSED LF Kit. GSED App: tablet or similar device. Administration on paper: printed paper form and instruction manual

* GSED measures length and administration time are intended to be reduced by revisions planned once data from Round 2 countries are available (see Section 7). The same is true when the adaptive version is available.

Dedicated training courses are required to learn to administer the measures and to train others to administer them. These courses are available in English in person or via Zoom upon request. The suggested length of the training is seven to 10 days (for both measures); however, the training sessions may be tailored according to the users’ experience with child development-related tools and depending on whether the GSED SF or GSED LF only is to be used. Additionally, a series of self-paced online courses is in development. Training courses include resources, such as the GSED Training Manual (available for trainees), which provides guidelines for standardized administration to ensure that the same procedures are used consistently by all assessors. Individuals administering the GSED must familiarize themselves thoroughly with the guidelines and follow them carefully.


BOX 4. GSED HOUSEHOLD FORM [FOR FURTHER TESTING]


The GSED HF is a population-level, caregiver-reported data collection measure designed to be suitable
for integration into multi-topic household surveys to monitor child development globally. It measures the
proportion of children up to 24 months of age who are developmentally on track. The GSED HF uses the
same metric, the D-score, as the GSED SF and GSED LF. The GSED HF captures the youngest children’s
achievement of key developmental milestones. It is comprised of 55 items organized by five age bands (0 to
< 3 months; 3 to < 6 months; 6 to < 12 months; 12 to < 18 months; 18 to < 24 months). Depending on the age
of the child, primary caregivers are asked a set of 20 questions (tailored to the age band, but overlapping
across bands) about their children’s behaviour, skills and knowledge. The total of 55 questions and their
allocation to the specific age bands are the result of a rigorous methodological process to identify the
shortest and most informative sets of items to capture child development. The questions were intentionally
selected to reflect the increasing complexity of skills children acquire as they become older. Therefore, some
questions may seem too easy or too difficult for some children. The GSED HF is accompanied by a package
of implementation tools. It is specifically designed to be used in surveys that also collect a wide spectrum of
additional data on other family members, and whose focus may not necessarily be ECD. The GSED HF allows
inclusion of an ECD component for children < 24 months in multi-topic investigations with minimal additional
burden. When used in surveys that are adequately designed and implemented, it allows for the generation
of data that are comparable across countries. The GSED HF will be tested in multi-topic household surveys
before being released for scale up.

BOX 5. GSED PSYCHOSOCIAL FORM [FOR FURTHER TESTING]


Understanding the emergence of early mental health challenges, including disorders in sleeping,
eating and emotion regulation, is an important component of tracking young children’s development.
Because the GSED was designed to track normative development rather than the emergence of mental
health challenges, an additional form intended to be included in the package, called the PF, was created to
focus specifically on young children’s mental and behavioural well-being. Using items from existing tools for
emotion regulation and behaviour problems, an initial set of 49 items was selected to index difficulties in eating,
sleeping, internalizing and externalizing behaviours, and social competence. In Bangladesh, Pakistan
and the United Republic of Tanzania, qualitative data through exit interviews and cognitive testing were
collected from a sample of 16 caregiver-child dyads. The qualitative information was used to evaluate the
cultural relevance of the GSED measures, caregiver understanding of the items, and feedback on the study
implementation processes. In one additional site, the USA, cognitive testing was conducted in both Spanish
and English to inform final item selection. In the three Round 1 countries the PF was administered alongside
the GSED SF and GSED LF, and in the USA, as part of an ongoing study, it was administered online together
with the GSED SF to a sample of approximately 1000 parents (Marcus Waldman and colleagues, University of
Nebraska, unpublished observations, 2023).


Scoring

Once the GSED SF, GSED LF or both have been administered to one or more children, the next step is to calculate the D-score and the DAZ for each child. This step is known as scoring. The present section provides instructions on how to calculate these scores. Either of two methods may be used:

1. online calculator. The online Shiny App (https://tnochildhealthstatistics.shinyapps.io/dcalculator/) is a convenient option for users not familiar with R. The app contains online documentation and instructions;

2. R package dscore (https://CRAN.R-project.org/package=dscore) is a flexible option with all the tools needed to calculate the D-score. It is an excellent choice for users familiar with R and users who like to incorporate D-score calculations into a workflow.

Revisions to the GSED measures planned for 2024 will not impact the interpretation of the D-scores calculated on previous versions. Scoring for previous versions of the GSED instruments will continue to be available. While procedures for future versions may change, they will continue to produce scores on the same standard D-score scale and thus will remain comparable.

Detailed instructions on how to calculate the D-score and DAZ with the above methods can be found in the GSED Scoring Guide.

Because development naturally occurs with age, it can be difficult to compare D-scores for children of different ages. To help solve this problem, the package also calculates preliminary DAZ scores. These DAZ scores are calculated in reference to same-age children from the Round 1 GSED data from both the GSED SF and GSED LF in Bangladesh, Pakistan and United Republic of Tanzania to estimate the age-conditional distributions of scores. Using this reference group, the D-scores of new data can then be converted into standardized Z scores (with a mean of 0 and SD of 1) at all ages.

While these preliminary norms are useful to adjust scores to remove the age effect, DAZ scores with the current reference population should not be interpreted as representative of any specific population or hold any special normative importance. They are calculated on a non-representative convenience sample. The main utility of these preliminary reference scores is to provide estimates of the stability of GSED and the D-score over time, without artificially inflating correlations due to the strong association between D-scores and age. They can also be used to provide a rough estimate of the association between D-scores and other concurrent and predictive measures.

The DAZ is not currently an appropriate basis for determining whether children are on or off track developmentally. A Norms and Standards study will be carried out by WHO (see Section 7) which aims to create a better estimate of how D-scores vary by age in a restricted population of children living without major constraints on their development. This updated DAZ will be the focus of ongoing norms and standards work and provide a better justification of cut-off points.
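
The idea of converting a D-score into an age-standardized Z score can be illustrated with a minimal Python sketch. It is not the scoring procedure of the dscore package or the Shiny app. It assumes a hypothetical reference table of age-specific means and SDs (the numbers are invented for illustration and are not the GSED reference), and it assumes simple linear interpolation between reference ages. The names `REFERENCE`, `reference_at` and `daz` are likewise illustrative.

```python
from bisect import bisect_right

# Hypothetical reference table: (age in months, mean D-score, SD of D-score).
# These values are invented for illustration; they are NOT the GSED reference.
REFERENCE = [
    (0.0, 15.0, 3.0),
    (12.0, 45.0, 4.0),
    (24.0, 60.0, 4.5),
    (36.0, 70.0, 5.0),
]


def reference_at(age_months):
    """Linearly interpolate the reference mean and SD at a given age."""
    ages = [row[0] for row in REFERENCE]
    if not ages[0] <= age_months <= ages[-1]:
        raise ValueError("age is outside the range of the reference table")
    # Index of the upper bracketing row, clamped so the last age is handled.
    hi = min(bisect_right(ages, age_months), len(ages) - 1)
    lo = hi - 1
    a0, m0, s0 = REFERENCE[lo]
    a1, m1, s1 = REFERENCE[hi]
    w = (age_months - a0) / (a1 - a0)
    return m0 + w * (m1 - m0), s0 + w * (s1 - s0)


def daz(d_score, age_months):
    """Age-standardized Z score: distance from the reference mean in SD units."""
    mean, sd = reference_at(age_months)
    return (d_score - mean) / sd


# A child whose D-score equals the reference mean for their age has DAZ 0;
# one SD above the reference mean gives DAZ 1.
print(daz(45.0, 12.0), daz(49.0, 12.0))
```

Because the Z score is computed relative to same-age reference values, two children of different ages can be compared on the same scale, which is the purpose the text describes for the DAZ.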


Other components of the GSED package v1.0

The GSED package includes the GSED measures as well as accompanying materials to facilitate their implementation and use.

A detailed item-by-item description is available through the GSED SF and LF Item Guides. They can be used as a resource for translation (to ensure that the translations reflect the original purpose of the questions), adaptation (to ensure instructions are relevant for the context) and training (to ensure that assessors have clear instructions on how to administer and score items). The Item Guides include instructions on how to administer, assess and score each item. In particular, the GSED SF Item Guide further clarifies the purpose of each item, and the GSED LF Item Guide includes indications for methods and props to use, referring to whether the item should be administered by observation, by listening, by demonstrating, etc., and whether any particular GSED LF Kit tool should be used as well.

Each GSED measure is accompanied by a User Manual to guide assessors’ understanding and use of the measures. Assessors require training and certification to administer the GSED measures, with the manuals and item guides as support. The manuals are organized into four main sections: (a) description of the measure; (b) administration of the measure; (c) what to do and what not to do when administering the measure; and (d) how to address possible challenging situations when administering the measures.

To generate high-quality comparable data, the GSED measures should be used in their entirety (no item should be removed or added) without modifications to the item wording and sequence or to the response options. Only the specific adaptation options indicated are acceptable, as well as best practices for translation. If needed, guidance found in the Adaptation and Translation Guide should be followed.


7. Next steps
The GSED package aims to provide a feasible and reliable means of collecting population-level
data on early development that could be used to monitor progress and policy-level changes, and
evaluate programmes and interventions. Data from the GSED will be useful for policy-makers
and governments in deciding priorities for funding. Global organizations will be able to use the
data for cross-country comparisons and trend analyses. The GSED measures are expected to
provide countries with an indication of how the youngest children are developing and become a
motivation to invest in and promote healthy development.

This Technical Report provides an overview of the GSED creation and validation methodology with results from three countries. The GSED measures have been shown to be valid and reliable for measurement of child development up to 36 months at the population level. Additional work is ongoing to expand evidence of global validity and reliability (in Round 2 countries and inclusion of GSED in external studies). The analysis related to the work conducted for field-testing of the adaptive testing approach as well as the psychometric property description of the GSED PF across different cultures and contexts, and linkages of GSED SF with ECDI2030 will be finalized and disseminated as soon as available. Similarly, the results of the testing of the GSED measures within the context of programmatic evaluations and use of GSED HF within multi-topic household surveys are expected to be made available soon.

Moreover, the GSED project has been expanded, under GSED 2.0, to answer further research questions. Firstly, additional validation evidence will be generated for: i) predictive validity until 5 years of age by following the cohorts from Round 1 countries; and ii) assessing the association of the GSED measures with biomarkers (including brain imaging). Secondly, a global age-normed distribution of GSED scores through 36 months will be created, based on a rigorous methodology, among children raised with minimal constraints on their development. While the existing D-score package calculates DAZ relative to three possible references (of the Round 1 countries validation data), they are considered an interim reference. Round 2 countries validation data will replace these references using data from all seven countries, but a final revision will be provided when data from the normative sample (under GSED 2.0) are available. These norms will then be used as references to set standards for on- and off-track development, including exploratory adjustment for moderate-to-late preterm babies. Thirdly, both conceptual and field work will address the adaptation of GSED for individual-level identification of children at risk for neurodevelopmental impairment.

Lastly, the D-score approach may be used to harmonize measurements across ages and instruments. Scores from multiple instruments can be translated into D-scores and compared to scores from a different instrument. Future work will evaluate the extension of the D-score to instruments for children beyond 36 months of age. Extending the age limit of the D-score will provide improved guidance to users on tracking children’s development over time.

As with other measures of child development, the GSED will continue to evolve as more knowledge is acquired about the capabilities and learning processes that occur in the earliest years of life and about environmental influences on children’s early development. It is also expected that technical innovations (e.g. adaptive testing) will facilitate future measurements. Consistent with the SDGs, equity is strived for by developing measurements that provide data to enable government leaders and programme planners to implement strategies that enable all children to reach their developmental potential.

References
1. Clark H, Coll-Seck AM, Banerjee A, Peterson S, Dalglish SL, Ameratunga S et al. A future for the world’s children? A WHO–UNICEF–Lancet
Commission. Lancet. 2020;395(10224):605-58.
2. Daelmans B, Darmstadt GL, Lombardi J, Black MM, Britto PR, Lye S et al. Early childhood development: the foundation of sustainable
development. Lancet. 2017;389(10064):9–11.
3. Lu C, Black MM, Richter LM. Risk of poor development in young children in low-income and middle-income countries: an estimation and
analysis at the global, regional, and country level. Lancet Global Health. 2016;4:e916-e22.
4. Black MM, Walker SP, Fernald LCH, Anderson CT, DiGirolamo A, Lu C et al. Early child development coming of age: science through the
life-course. Lancet. 2017;389(10064):77-90. doi:10.1016/S0140-6736(16)31389-7.
5. Rubio-Codina M, Grantham-McGregor S. Predictive validity in middle childhood of short tests of early childhood development used in
large scale studies compared to the Bayley-III, the Family Care Indicators, height-for-age, and stunting: a longitudinal study in Bogota,
Colombia. PLOS ONE. 2020;15(4): e0231317.
6. Hoddinott J, Behrman JR, Maluccio JA, Melgar P, Quisumbing AR, Ramirez-Zea M et al. Adult consequences of growth failure in early
childhood. Am J Clin Nutr. 2013;98(5):1170-8. doi:10.3945/ajcn.113.064584.
7. Fink G, Peet E, Danaei G, Andrews K, McCoy DC, Sudfeld CR et al. Schooling and wage income losses due to early-childhood growth
faltering in developing countries: national, regional, and global estimates. Am J Clin Nutr. 2016;104(1):104-12.
8. United Nations. Sustainable Development Goal 4 (Education) [Online] (https://www.sdg4education2030.org/the-goal#:~:text=Sustainable%20Development%20Goal%204%20(SDG%204)%20is%20the%20education%20goal,lifelong%20learning%20opportunities%20for%20all.%E2%80%9D).
9. WHO Multicentre Growth Reference Study Group, de Onis M. WHO Child Growth Standards based on length/height, weight and age. Acta
Paediatrica. 2006;95(S450):76–85.
10. Ertem IO, Krishnamurthy V, Mulaudzi MC, Sguassero Y, Balta H, Gulumser O et al. Similarities and differences in child development from
birth to age 3 years by sex and across four countries: a cross-sectional, observational study. Lancet Glob Health. 2018;6(3):e279-e91.
11. Fernandes M, Villar J, Stein A, Staines Urias E, Garza C, Victora CG et al. INTERGROWTH-21st Project international INTER-NDA standards
for child development at 2 years of age: an international prospective population-based study. BMJ Open. 2020;10(6):e035258.
doi:10.1136/bmjopen-2019-035258.
12. Jacobusse G, van Buuren S, Verkerk PH. An interval scale for development of children aged 0–2 years. Stat Med. 2006;25(13):2272–83.
13. Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor SM et al. The D-score: a metric for interpreting the
early development of infants and toddlers across global settings. BMJ Glob Health. 2019;4(6):e001724.
14. Ayoub CC, Fischer KW. Developmental pathways and intersections among domains of development. In: McCartney K, Phillips D, editors.
Blackwell handbook of early childhood development. Massachusetts: Blackwell Publishing; 2006:62-81.
15. Fischer KW, Bidell TR. Dynamic development of action and thought. In: Lerner RM, Damon W, editors. Handbook of child psychology:
theoretical models of human development. New York: John Wiley & Sons Inc.; 2006:313-99.
16. McCray G, McCoy D, Kariger P, Janus M, Black MM, Chang-Lopez S et al. The creation of the Global Scales for Early Development (GSED)
for children aged 0-3 years: combining subject matter expert judgements with big data. BMJ Global Health. 2023;8:e009827.
17. McCoy DC, Waldman M, CREDI Field Team, Fink G. Measuring early childhood development at a global scale: evidence from the
Caregiver-Reported Early Development Instruments. Early Childhood Research Quarterly. 2018;45:58–68.
18. Gladstone M, Lancaster G, McCray G, Cavallera V, Alves CRL, Maliwichi L et al. Validation of the Infant and Young Child Development (IYCD)
indicators in three countries: Brazil, Malawi and Pakistan. Int. J. Environ. Res. Public Health. 2021;18(11):6117.
19. Aylward GP. Brain, environment, and development: a synthesis and a conceptual model. In: Aylward GP, editor. Bayley 4 clinical use and
interpretation. London: Academic Press; 2020:1-19.
20. Gesell AL, Halverson HM, Amatruda C. The first five years of life; a guide to the study of the pre-school child. New York: Harper & Brothers;
1940.
21. Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press; 1980.
22. van Buuren S, Eekhout I. Child development with the D-score: turning milestones into measurement [version 1]. Gates Open Res.
2021;5:81 (https://doi.org/10.12688/gatesopenres.13222.1).
23. McCoy DC, Waldman M, CREDI Field Team, Fink G. Measuring early childhood development at a global scale: evidence from the
Caregiver-Reported Early Development Instruments. Early Child. Res. Quart. 2018;45:58–68.
24. Chang W, Cheng J, Allaire JJ, Sievert C, Schloerke B, Xie Y et al. shiny: web application framework for R. R package version 1.7.1. 2021
(https://CRAN.R-project.org/package=shiny).


25. Cavallera V, Lancaster G, Gladstone M, Black MM, McCray G, Nizar A et al. Protocol for validation of the Global Scales for Early
Development (GSED) for children under 3 years of age in seven countries. BMJ Open. 2023;13:e062562.
26. Salmona M, Lieber E, Kaczynski D. Qualitative and mixed methods data analysis using Dedoose: A practical approach for research
across the social sciences. Thousand Oaks: Sage Publications; 2019.
27. Huang CY, Tung LC, Chou YT, Wu HM, Chen KL, Hsieh CL. Development of a computerized adaptive testing of children’s gross motor
skills. Archives of Physical Medicine & Rehabilitation. 2018;99:512-20.
28. Jacobusse G, van Buuren S. Computerized adaptive testing for measuring development of young children. Statistics in Medicine.
2007;26(13):2629–38 (https://stefvanbuuren.name/publication/2007-01-01_jacobusse2007/).
29. Walker SP, Wachs TD, Gardner JM, Lozoff B, Wasserman GA, Pollitt E et al. Child development: risk factors for adverse outcomes in
developing countries. Lancet. 2007;369(9556):145-57.
30. De Ayala RJ. The theory and practice of item response theory. New York: Guilford Publications; 2013.
31. Thissen D. Reliability and measurement precision. In: Wainer H, editor. Computerized adaptive testing: a primer (2nd ed.). New Jersey:
Lawrence Erlbaum Associates Publishers; 2000:159-84.
32. Landis R, Koch G. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
33. Fernald Lia CH, Prado E, Kariger P, Raikes A. A toolkit for measuring early childhood development in low and middle-income countries.
Washington, DC: The World Bank; 2017.
34. Lennon EM, Gardner JM, Karmel BZ, Flory MJ. Bayley Scales of Infant Development. In: Haith MM, Benson JB. Encyclopedia of infant
and early childhood development. Massachusetts: Academic Press; 2008:145-56.
35. Armstrong KH, Agazzi HC. The Bayley-III cognitive scale. In: Weiss LG, Oakland T, Aylward GP, editors. Practical resources for the mental
health professional, Bayley-III clinical use and interpretation. Massachusetts: Academic Press; 2010:29-45.
36. Stasinopoulos DM, Rigby RA. Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software.
2008;23:1-46.
37. Sudfeld CR, McCoy DC, Fink G, Muhihi A, Bellinger DC, Masanji H et al. Malnutrition and its determinants are associated with suboptimal
cognitive, communication, and motor development in Tanzanian children. J Nutr. 2015;145(12):2705-14.


Bibliography

Ages and Stages Questionnaire, third edition (ASQ-3)
Squires J, Bricker D. Ages and Stages Questionnaire (ASQ): a parent-completed child monitoring system, third edition. Baltimore (MD): Brooks Publishing Company; 2009.

Bayley Scales of Infant Development (Bayley)
Bayley N. Bayley Scales of Infant Development. San Antonio (TX): The Psychological Corporation; 1969.

Bayley Scales of Infant Development, second edition (Bayley-II)
Bayley N. Bayley Scales of Infant Development, second edition. San Antonio (TX): The Psychological Corporation; 1993.

Caregiver-Reported Early Development Instruments (CREDI)
McCoy DC, Waldman M, CREDI Field Team, Fink G. Measuring early childhood development at a global scale: evidence from the Caregiver-Reported Early Development Instruments. Early Child Res Q. 2018;45(4):58–68. doi.org/10.1016/j.ecresq.2018.05.002.

Denver Developmental Screening Test (DDST)
Frankenburg WK. The Denver Developmental Screening Test. J Pediatr. 1967;71(2):181–91. doi:10.1016/S0022-3476(67)80070-2.

Denver Developmental Screening Test, second edition (DDST II)
Frankenburg WK, Dodds J, Archer P, Shapiro H, Bresnick B. The Denver II: a major revision and restandardization of the Denver Developmental Screening Test. Pediatrics. 1992;89(1):91–7. PMID: 1370185.

Developmental Milestones Checklist (DMC)
Abubakar A, Holding P, van de Vijver FJ, Bomu G, Van Baar A. Developmental monitoring using caregiver reports in a resource-limited setting: the case of Kilifi, Kenya. Acta Paediatr. 2010;99(2):291–7. doi.org/10.1111/j.1651-2227.2009.01561.x.

Developmental Milestones Checklist II (DMC II)
Prado EL, Abubakar AA, Abbeddou S, Jimenez EY, Somé JW, Ouédraogo JB. Extending the Developmental Milestones Checklist for use in a different context in sub-Saharan Africa. Acta Paediatr. 2014;103(4):447–54. doi:10.1111/apa.12540.

Dutch Development Instrument (DDI)
Laurent de Angulo MS, Brouwers-de JEA, Bijlsma-Schlösser JFM, Bulk-Bunschoten AMW, Pauwels JH, Steinbuch-Linstra I. Ontwikkelingsonderzoek in de Jeugdgezondheidszorg. Het Van Wiechenonderzoek. De Baecke-Fassaert Motoriektest. Assen: Van Gorcum; 2008.

Griffiths Mental Development Scales (GMDS)
Huntley M. Griffiths Mental Development Scales from birth to 2 years – manual. Oxford: Association for Research in Infant & Child Development; 1996. doi.org/10.1037/t03301-000.

Griffiths Mental Development Scales – South African Version (GMDS-SA)
Luiz DM, Kotras N, Barnard A, Knoesen N. Technical manual of the Griffiths Mental Development Scales – Extended Revised (GMDS-ER). Amersham: Association for Research in Infant & Child Development; 2004.

Kilifi Developmental Inventory (KDI)
Abubakar A, Holding P, van Baar A, Newton CR, van de Vijver FJR. Monitoring psychomotor development in a resource-limited setting: an evaluation of the Kilifi Developmental Inventory. Ann Trop Paediatr. 2008;28(3):217–26. doi.org/10.1179/146532808X335679.

Malawi Developmental Assessment Tool (MDAT)
Gladstone M, Lancaster GA, Umar E, Nyirenda M, Kayira E, Van Den Broek NR et al. The Malawi Developmental Assessment Tool (MDAT): the creation, validation, and reliability of a tool to assess child development in rural African settings. PLoS Med. 2010;7:e1000273. doi:10.1371/journal.pmed.1000273.

Preschool Pediatric Symptoms Checklist (PPSC)
Sheldrick RC, Henson BS, Merchant S, Neger EN, Murphy JM, Perrin EC. The Preschool Pediatric Symptom Checklist (PPSC): development and initial validation of a new social/emotional screening instrument. Acad Pediatr. 2012;12(5):456–67. doi:10.1016/j.acap.2012.06.008.

Saving Brains Early Childhood Development Scale (SBECD)
McCoy DC, Sudfeld CR, Bellinger DC, Muhihi A, Ashery G, Weary TE et al. Development and validation of an early childhood development scale for use in low-resourced settings. Popul Health Metr. 2017;15(1):3. doi:10.1186/s12963-017-0122-8.

Stanford-Binet Intelligence Scales, fifth edition (SBIS-5)
Roid GH. Stanford-Binet Intelligence Scales, fifth edition. Itasca (IL): Riverside Publishing; 2003.

Test de Desarrollo Psicomotor [Psychomotor Development Test] (TEPSI)
Haeussler IM, Marchant T. Elaboración y estandarización del Test de Desarrollo Psicomotor 2-5 años TEPSI [Development and standardization of the Psychomotor Development Test 2-5 years]. Rev Educ. 1989.

Vineland Adaptive Behavior Scales (Vineland)
Sparrow SS, Cicchetti DV. The Vineland Adaptive Behavior Scales. In: Newmark CS, editor. Major psychological assessment instruments, Vol. 2. Boston (MA): Allyn & Bacon; 1989:199–231.


Annex 1. Early childhood development dataset for creation of GSED measures

Table A.1.1 lists the cohorts that contributed information to the dataset for creation of GSED with details on
number of visits by country, age group and instruments administered.

TABLE A.1.1. COHORTS CONTRIBUTING TO GSED DATASET

| Country | Cohort¹ | 0–<1 year | 1–<2 years | 2–<3 years | 3+ years | Individual children (N) | Instruments |
|---|---|---|---|---|---|---|---|
| Bangladesh | CREDI-BGD | 49 | 202 | 29 | 0 | 280 | CREDI |
| Bangladesh | GCDG-BGD-7MO | 0 | 1807 | 20 | 0 | 1827 | Bayley-II |
| Bangladesh | IYCD-BGD-ASQVAL | 127 | 132 | 88 | 101 | 448 | Ages and Stages Questionnaire (ASQ)-3, Bayley-III |
| Brazil | CREDI-BRA-ONLINE | 113 | 287 | 224 | 49 | 673 | CREDI |
| Brazil | CREDI-BRA-SP | 472 | 426 | 688 | 65 | 1651 | CREDI |
| Brazil | GCDG-BRA-1 | 1875 | 899 | 0 | 0 | 2774 | Denver-II |
| Brazil | GCDG-BRA-2 | 3208 | 4015 | 551 | 0 | 7774 | Battelle Developmental Inventory and Screener-2 |
| Brazil | IYCD-BRA-FPS2017 | 48 | 26 | 11 | 12 | 97 | WHO Indicators of Infant and Young Child Development (IYCD) |
| Cambodia | CREDI-KHM | 126 | 123 | 161 | 83 | 493 | CREDI |
| Chile | CREDI-CHL | 85 | 88 | 71 | 0 | 244 | CREDI |
| Chile | GCDG-CHL-1 | 1483 | 537 | 0 | 0 | 2020 | Bayley-I |
| Chile | GCDG-CHL-2 | 312 | 1185 | 5166 | 16675 | 23338 | Test de Desarrollo Psicomotor |
| China | GCDG-CHN | 0 | 982 | 0 | 0 | 982 | Bayley-III |
| Colombia | CREDI-COL | 17 | 121 | 143 | 4 | 285 | CREDI |
| Colombia | GCDG-COL-LT42M | 215 | 417 | 450 | 229 | 1311 | Bayley-III |
| Colombia | GCDG-COL-LT45M | 53 | 632 | 257 | 393 | 1335 | Bayley-III, Denver Developmental Screening Test-II, ASQ-3 |
| Costa Rica | IYCD-CRI-PRIDI | 0 | 0 | 618 | 1186 | 1804 | Regional Project on Child Development Indicators (PRIDI) |
| Ecuador | GCDG-ECU | 186 | 259 | 222 | 0 | 667 | Barrera |
| Ethiopia | GCDG-ETH | 115 | 75 | 440 | 456 | 1086 | Bayley-III |
| Ghana | CREDI-GHA | 575 | 541 | 426 | 23 | 1565 | CREDI |
| Guatemala | CREDI-GTM | 67 | 73 | 57 | 8 | 205 | CREDI |
| India | CREDI-IND-ONLINE | 85 | 41 | 74 | 0 | 200 | CREDI |
| India | IYCD-IND-ASQ | 1367 | 1627 | 17 | 0 | 3011 | ASQ-3 |
| Indonesia | IYCD-IDN-ASQ | 757 | 1006 | 0 | 0 | 1763 | ASQ-3 |
| Jamaica | GCDG-JAM-LBW | 0 | 327 | 116 | 0 | 443 | Griffiths Mental Development Scales (GMDS) |
| Jamaica | GCDG-JAM-STUNTED | 5 | 144 | 151 | 177 | 477 | GMDS |
| Jordan | CREDI-JOR | 114 | 98 | 66 | 37 | 315 | CREDI |
| Kenya | IYCD-KEN-DID | 79 | 148 | 196 | 0 | 423 | Kilifi Developmental Inventory |
| Kenya | IYCD-KEN-DMC | 188 | 96 | 0 | 0 | 284 | Developmental Milestone Chart |
| Lao People's Democratic Republic | CREDI-LAO | 16 | 18 | 9 | 3 | 46 | CREDI |
| Lebanon | CREDI-LBN | 181 | 118 | 84 | 41 | 424 | CREDI |
| Madagascar | GCDG-MDG | 0 | 0 | 18 | 187 | 205 | Stanford Binet Test |
| Malawi | IYCD-MWI-FPS2017 | 39 | 20 | 9 | 9 | 77 | IYCD |
| Malawi | IYCD-MWI-MDAT | 687 | 276 | 130 | 353 | 1446 | Malawi Development Assessment Tool |
| Nepal | CREDI-NPL | 227 | 136 | 0 | 0 | 363 | CREDI |
| Netherlands | GCDG-NLD-2 | 0 | 262 | 1253 | 2130 | 3645 | Dutch Development Instrument (DDI) |
| Netherlands | GCDG-NLD-SMOCC | 10110 | 5120 | 1308 | 0 | 16538 | DDI |
| Nicaragua | IYCD-NIC-PRIDI | 0 | 0 | 583 | 1251 | 1834 | PRIDI |
| Pakistan | CREDI-PAK | 85 | 80 | 76 | 9 | 250 | CREDI |
| Pakistan | IYCD-PAK-FPS2017 | 48 | 23 | 12 | 12 | 95 | IYCD |
| Paraguay | IYCD-PRY-PRIDI | 0 | 2 | 456 | 1044 | 1502 | PRIDI |
| Peru | IYCD-PER-ASQ | 1261 | 1654 | 3 | 0 | 2918 | ASQ-3 |
| Peru | IYCD-PER-PRIDI | 0 | 0 | 825 | 1742 | 2567 | PRIDI |
| Philippines | CREDI-PHL | 198 | 351 | 170 | 1 | 720 | CREDI |
| South Africa | GCDG-ZAF | 490 | 796 | 1275 | 1614 | 4175 | Bayley-I, Vineland Adaptive Behavior Scales, GMDS |
| United Republic of Tanzania | CREDI-TZA-MALARIA | 0 | 56 | 132 | 9 | 197 | CREDI |
| United Republic of Tanzania | CREDI-TZA-NEOVITA | 0 | 938 | 1467 | 76 | 2481 | CREDI |
| USA | CREDI-USA-BOS | 61 | 56 | 37 | 2 | 156 | CREDI |
| USA | CREDI-USA-ONLINE | 336 | 188 | 221 | 0 | 745 | CREDI |
| Zambia | CREDI-ZMB-CHIPATA | 223 | 591 | 236 | 0 | 1050 | CREDI |
| Zambia | CREDI-ZMB-CHOMA | 519 | 378 | 47 | 0 | 944 | CREDI |

¹ Cohort name is an internal coding representing original group, country and number.

Annex 2. GSED study validation measures


Table A.2.1 lists and describes the study measures in addition to GSED that were collected for validation processes.
They capture children’s growth and nutrition, health, environmental and contextual information.

TABLE A.2.1. STUDY MEASURES USED FOR VALIDATION PROCESSES

| Construct | What the measure captures | Measure | Administration mode | Time to administer (minutes) |
|---|---|---|---|---|
| Child health and household SES | Eligibility (exclusion criteria); demographic information; information about acute child health; delivery and perinatal conditions; breastfeeding; child’s health history; household SES*; caregiver education; maternal health/chronic illness; COVID-19 exposure | Eligibility and contextual form (specifically developed for the study) | Caregiver report | 35 |
| Anthropometry | Weight at time of assessment; infant length/child height at time of assessment; child’s mid-upper arm circumference at time of assessment; child’s head circumference at time of assessment | Anthropometry form | Child assessment | 15 |
| Family/home environment | Home environment (HOME only); play/stimulation/interactions between the child and other family members in the home (HOME and FCI) | HOME OR FCI | HOME: caregiver report & observation; FCI: caregiver report | HOME: 45; FCI: 15 |
| Family/home environment | Child neglect/abuse; exposure to violence or conflict | CPAS† | Caregiver report | 15 |
| Family/home environment | Family resilience | BRS† | Caregiver report | 1 |
| Family/home environment | Family social support | FSS† | Caregiver report | 5 |
| Caregiver health and well-being | Caregiver depressive symptoms | PHQ9 | Caregiver report | 5 |
| Child development | Global child development (0–41 months) | Bayley-III OR GMDS‡ | Direct child assessment | 45–60 |
| Child development | Global child development (24–41 months) | ECDI2030§ | Caregiver report | 10 |

* SES information on this form comes from the standard DHS multiple assets index; however, some sites have adapted the items to better fit their contexts.
† These measures have been slightly adapted for the purpose of the study.
‡ In a subsample (N=150).
§ In a subsample (all children of 24–41 months within the predictive validity subsamples in three countries).


Annex 3. Validity results by GSED measure


The tables in this Annex present the validity and reliability results for each GSED measure individually as well as for
the scale as a whole (i.e. the CB).

TABLE A.3.1. CONCURRENT VALIDITY (WITH BAYLEY-III)

D-score scale

| GSED measure | Bayley-III domain | Bangladesh | Pakistan | United Republic of Tanzania | Combined |
|---|---|---|---|---|---|
| CB | Cognitive | 0.98 (0.97 to 0.98) | 0.94 (0.92 to 0.95) | 0.97 (0.96 to 0.98) | 0.96 (0.95 to 0.97) |
| SF | Cognitive | 0.97 (0.96 to 0.98) | 0.93 (0.91 to 0.95) | 0.96 (0.95 to 0.97) | 0.95 (0.94 to 0.96) |
| LF | Cognitive | 0.98 (0.97 to 0.98) | 0.94 (0.92 to 0.96) | 0.97 (0.96 to 0.98) | 0.96 (0.96 to 0.97) |
| CB | Receptive communication | 0.91 (0.88 to 0.93) | 0.87 (0.83 to 0.91) | 0.91 (0.88 to 0.93) | 0.90 (0.88 to 0.92) |
| SF | Receptive communication | 0.90 (0.87 to 0.93) | 0.87 (0.82 to 0.90) | 0.90 (0.86 to 0.92) | 0.89 (0.87 to 0.91) |
| LF | Receptive communication | 0.91 (0.88 to 0.93) | 0.88 (0.84 to 0.91) | 0.92 (0.89 to 0.94) | 0.90 (0.88 to 0.92) |
| CB | Expressive communication | 0.94 (0.92 to 0.96) | 0.87 (0.82 to 0.90) | 0.89 (0.85 to 0.92) | 0.90 (0.88 to 0.91) |
| SF | Expressive communication | 0.93 (0.91 to 0.95) | 0.86 (0.81 to 0.89) | 0.87 (0.82 to 0.90) | 0.88 (0.86 to 0.90) |
| LF | Expressive communication | 0.94 (0.92 to 0.96) | 0.87 (0.83 to 0.90) | 0.90 (0.87 to 0.93) | 0.90 (0.88 to 0.92) |
| CB | Fine motor | 0.97 (0.96 to 0.98) | 0.96 (0.94 to 0.97) | 0.96 (0.95 to 0.97) | 0.96 (0.96 to 0.97) |
| SF | Fine motor | 0.96 (0.95 to 0.97) | 0.95 (0.94 to 0.97) | 0.95 (0.94 to 0.97) | 0.96 (0.95 to 0.96) |
| LF | Fine motor | 0.97 (0.96 to 0.98) | 0.96 (0.94 to 0.97) | 0.96 (0.94 to 0.97) | 0.96 (0.95 to 0.97) |
| CB | Gross motor | 0.98 (0.97 to 0.98) | 0.97 (0.95 to 0.98) | 0.98 (0.97 to 0.98) | 0.97 (0.97 to 0.98) |
| SF | Gross motor | 0.97 (0.95 to 0.97) | 0.96 (0.95 to 0.97) | 0.97 (0.96 to 0.98) | 0.97 (0.96 to 0.97) |
| LF | Gross motor | 0.97 (0.97 to 0.98) | 0.96 (0.95 to 0.97) | 0.97 (0.96 to 0.98) | 0.97 (0.96 to 0.97) |
| CB | Overall Bayley-III score | 0.99 (0.98 to 0.99) | 0.96 (0.95 to 0.97) | 0.98 (0.97 to 0.99) | 0.98 (0.97 to 0.98) |
| SF | Overall Bayley-III score | 0.97 (0.95 to 0.97) | 0.96 (0.95 to 0.97) | 0.97 (0.96 to 0.98) | 0.97 (0.96 to 0.97) |
| LF | Overall Bayley-III score | 0.99 (0.98 to 0.99) | 0.96 (0.95 to 0.97) | 0.98 (0.97 to 0.98) | 0.98 (0.97 to 0.98) |

DAZ scale

| GSED measure | Bayley-III domain | Bangladesh | Pakistan | United Republic of Tanzania | Combined |
|---|---|---|---|---|---|
| CB | Overall Bayley-III score | 0.55 (0.44 to 0.65) | 0.26 (0.11 to 0.41) | 0.56 (0.44 to 0.66) | 0.53 (0.47 to 0.60) |
| SF | Overall Bayley-III score | 0.37 (0.23 to 0.50) | 0.18 (0.03 to 0.33) | 0.40 (0.26 to 0.52) | 0.35 (0.27 to 0.43) |
| LF | Overall Bayley-III score | 0.59 (0.48 to 0.68) | 0.31 (0.16 to 0.44) | 0.60 (0.49 to 0.69) | 0.58 (0.52 to 0.64) |
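
Each cell in Table A.3.1 reports a correlation with an interval in parentheses. The report does not state here how the intervals were obtained. The Python sketch below uses the Fisher z-transformation, one common way to attach a confidence interval to a Pearson correlation. It is an assumption chosen for illustration and not necessarily the method used for the table. The function names and the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two sequences of equal length with n >= 4")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation via the Fisher z-transformation.

    Assumes |r| < 1 and n > 3; math.atanh raises ValueError when |r| == 1.
    """
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Toy data standing in for D-scores and a criterion score (invented values).
d_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
criterion = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(d_scores, criterion)
low, high = fisher_ci(r, len(d_scores))
print(f"r = {r:.2f} ({low:.2f} to {high:.2f})")
```

The interval is asymmetric around r on the correlation scale, which is why intervals for correlations close to 1, as in the D-score rows of the table, are narrow and skewed towards lower values.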


TABLE A.3.2. CONVERGENT VALIDITY

D-score scale

| GSED measure | Comparison measure | Bangladesh | Pakistan | United Republic of Tanzania | Combined |
|---|---|---|---|---|---|
| CB | SES-DHS Wealth Index** | 0.10 (0.05 to 0.16) | 0.14 (0.09 to 0.19) | 0.15 (0.10 to 0.20) | NA |
| SF | SES-DHS Wealth Index** | 0.07 (0.02 to 0.12) | 0.11 (0.61 to 0.16) | 0.10 (0.05 to 0.15) | NA |
| LF | SES-DHS Wealth Index** | 0.07 (0.02 to 0.13) | 0.12 (0.08 to 0.17) | 0.14 (0.09 to 0.19) | NA |
| CB | Anthro-HAZ | 0.21 (0.16 to 0.26) | 0.18 (0.13 to 0.23) | 0.21 (0.16 to 0.26) | 0.19 (0.16 to 0.22) |
| SF | Anthro-HAZ | 0.16 (0.10 to 0.21) | 0.13 (0.08 to 0.18) | 0.14 (0.08 to 0.19) | 0.14 (0.11 to 0.17) |
| LF | Anthro-HAZ | 0.20 (0.15 to 0.25) | 0.18 (0.13 to 0.23) | 0.22 (0.17 to 0.27) | 0.19 (0.16 to 0.22) |
| CB | Anthro-WAZ | 0.21 (0.16 to 0.26) | 0.17 (0.12 to 0.22) | 0.17 (0.12 to 0.23) | 0.23 (0.20 to 0.26) |
| SF | Anthro-WAZ | 0.16 (0.11 to 0.21) | 0.11 (0.06 to 0.16) | 0.17 (0.11 to 0.22) | 0.16 (0.13 to 0.19) |
| LF | Anthro-WAZ | 0.22 (0.16 to 0.27) | 0.19 (0.15 to 0.24) | 0.16 (0.11 to 0.21) | 0.24 (0.21 to 0.27) |
| CB | Birthweight | 0.16 (0.10 to 0.21) | 0.03 (-0.02 to 0.08) | 0.20 (0.14 to 0.25) | 0.04 (0.01 to 0.07) |
| SF | Birthweight | 0.09 (0.04 to 0.14) | 0.00 (-0.05 to 0.05) | 0.13 (0.07 to 0.18) | 0.03 (0.00 to 0.06) |
| LF | Birthweight | 0.16 (0.10 to 0.21) | 0.06 (0.01 to 0.11) | 0.19 (0.14 to 0.24) | 0.03 (0.00 to 0.06) |
| CB | Gestational age | 0.11 (0.05 to 0.16) | 0.16 (0.11 to 0.21) | 0.21 (0.15 to 0.26) | 0.17 (0.14 to 0.20) |
| SF | Gestational age | 0.06 (0.00 to 0.11) | 0.13 (0.08 to 0.17) | 0.14 (0.09 to 0.19) | 0.12 (0.09 to 0.15) |
| LF | Gestational age | 0.13 (0.07 to 0.18) | 0.12 (0.07 to 0.17) | 0.18 (0.13 to 0.23) | 0.16 (0.13 to 0.19) |
| CB | Maternal education* | 0.14 (0.08 to 0.19) | 0.21 (0.16 to 0.26) | 0.06 (0.01 to 0.11) | NA |
| SF | Maternal education* | 0.12 (0.07 to 0.18) | 0.18 (0.14 to 0.23) | 0.05 (0.00 to 0.10) | NA |
| LF | Maternal education* | 0.09 (0.41 to 0.15) | 0.15 (0.10 to 0.20) | 0.04 (-0.01 to 0.10) | NA |
| CB | PHQ9 category | -0.05 (-0.10 to 0.01) | -0.04 (-0.08 to 0.02) | 0.02 (-0.03 to 0.07) | 0.05 (0.02 to 0.08) |
| SF | PHQ9 category | -0.05 (-0.10 to 0.01) | -0.01 (-0.06 to 0.04) | 0.01 (-0.05 to 0.06) | 0.01 (-0.02 to 0.04) |
| LF | PHQ9 category | -0.02 (-0.08 to 0.03) | -0.08 (-0.13 to -0.03) | 0.02 (-0.03 to 0.08) | 0.07 (0.04 to 0.10) |
| CB | HOME | 0.21 (0.16 to 0.26) | 0.17 (0.12 to 0.21) | 0.21 (0.16 to 0.26) | 0.23 (0.20 to 0.26) |
| SF | HOME | 0.18 (0.13 to 0.23) | 0.15 (0.10 to 0.20) | 0.22 (0.17 to 0.27) | 0.19 (0.16 to 0.22) |
| LF | HOME | 0.14 (0.08 to 0.19) | 0.12 (0.07 to 0.17) | 0.09 (0.04 to 0.15) | 0.18 (0.15 to 0.21) |
| CB | FSS** | 0.11 (0.06 to 0.16) | 0.07 (0.02 to 0.12) | 0.03 (-0.02 to 0.09) | 0.22 (0.19 to 0.25) |
| SF | FSS** | 0.06 (0.00 to 0.11) | 0.04 (-0.01 to 0.09) | 0.07 (0.01 to 0.12) | 0.11 (0.08 to 0.14) |
| LF | FSS** | 0.14 (0.08 to 0.19) | 0.09 (0.05 to 0.14) | 0.01 (-0.05 to 0.06) | 0.28 (0.25 to 0.30) |
| CB | BRS** | -0.10 (-0.15 to -0.05) | -0.09 (-0.14 to -0.04) | -0.01 (-0.06 to 0.04) | -0.09 (-0.12 to -0.06) |
| SF | BRS** | -0.03 (-0.08 to 0.02) | -0.05 (-0.10 to -0.01) | 0.00 (-0.06 to 0.05) | -0.04 (-0.07 to -0.01) |
| LF | BRS** | -0.11 (-0.16 to -0.06) | -0.10 (-0.15 to -0.06) | 0.01 (-0.04 to 0.06) | -0.10 (-0.13 to -0.07) |
| CB | CPAS** | -0.05 (-0.10 to 0.01) | -0.05 (-0.10 to 0.01) | -0.01 (-0.06 to 0.04) | -0.07 (-0.10 to -0.04) |
| SF | CPAS** | -0.03 (-0.08 to 0.02) | -0.05 (-0.10 to 0.00) | -0.01 (-0.06 to 0.04) | -0.05 (-0.08 to -0.02) |
| LF | CPAS** | -0.05 (-0.11 to 0.00) | -0.03 (-0.07 to 0.02) | 0.00 (-0.05 to 0.06) | -0.06 (-0.09 to -0.03) |

* Spearman’s correlation: maternal education (no schooling, primary, secondary, higher), PHQ9 (none, mild, moderate, moderate-severe, severe depression).
** Scale created via a unidimensional 2-parameter IRT model.
*** For these variables a cross-national scale was not considered appropriate.
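
The first footnote states that Spearman’s correlation was used for the ordered categories (maternal education and PHQ9 category). As a minimal Python sketch of that statistic, the code below ranks both variables, giving tied values their average rank, and then computes a Pearson correlation on the ranks. The numeric coding of the categories and the example values are assumptions made for illustration and do not come from the study data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical coding: 0 = no schooling, 1 = primary, 2 = secondary, 3 = higher.
education = [0, 1, 1, 2, 3, 3]
d_scores = [40.0, 42.5, 41.0, 47.0, 46.0, 50.0]  # invented values
print(round(spearman_rho(education, d_scores), 3))
```

Working on ranks means only the ordering of the categories matters, so the result does not depend on the particular numbers chosen to code the education or depression levels.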

TABLE A.3.3. SHORT-TERM PREDICTIVE VALIDITY (AT 6 MONTHS)

D-score scale

| Comparison | Bangladesh | Pakistan | United Republic of Tanzania | Combined |
|---|---|---|---|---|
| GSED CB DAZ at baseline vs GSED DAZ at 6 months | 0.55 (0.48 to 0.61) | 0.57 (0.51 to 0.63) | 0.57 (0.50 to 0.63) | 0.59 (0.56 to 0.63) |
| GSED SF DAZ at baseline vs GSED DAZ at 6 months | 0.53 (0.46 to 0.59) | 0.56 (0.50 to 0.62) | 0.58 (0.52 to 0.64) | 0.57 (0.53 to 0.60) |
| GSED LF DAZ at baseline vs GSED DAZ at 6 months | 0.38 (0.30 to 0.46) | 0.43 (0.35 to 0.50) | 0.38 (0.30 to 0.46) | 0.48 (0.43 to 0.52) |

FOR MORE INFORMATION PLEASE CONTACT:

Brain Health Unit


Department of Mental Health and Substance Use
World Health Organization
Avenue Appia 20
CH-1211 Geneva 27
Switzerland

Email: [email protected]

Website: https://www.who.int/teams/mental-health-and-substance-use/data-research/global-scale-for-early-development
