WHO MSD GSEDpackage v1.0 2023.1 Eng PDF
Global Scales for Early Development v1.0
Technical report
Images ©WHO
Design and layout: büro svenja
WHO/MSD/GSED package v1.0/2023.1
Global Scales for Early Development v1.0 Technical report – Global Scales for Early
Development v1.0 Short Form (caregiver-reported) – Global Scales for Early Development v1.0
Short Form (caregiver-reported). Item guide – Global Scales for Early Development v1.0 Short
Form (caregiver-reported). User manual – Global Scales for Early Development v1.0 Long
Form (directly administered) – Global Scales for Early Development v1.0 Long Form (directly
administered). Item guide – Global Scales for Early Development v1.0 Long Form (directly
administered). User manual – Global Scales for Early Development v1.0 Scoring guide – Global
Scales for Early Development v1.0 Adaptation and translation guide
© World Health Organization 2023. Some rights reserved. This work is available under the
CC BY-NC-SA 3.0 IGO licence.
Selected questions and descriptions for the GSED measures have been reproduced or adapted
from the following tools/assessments: Ages and Stages Questionnaire, third edition (ASQ-3);
Bayley Scales of Infant Development (Bayley); Bayley Scales of Infant Development, second
edition (Bayley II); Caregiver-Reported Early Development Instruments (CREDI); Denver
Developmental Screening Test (DDST); Denver Developmental Screening Test, second edition
(DDST II); Developmental Milestones Checklist (DMC); Developmental Milestones Checklist II
(DMC II); Dutch Development Instrument (DDI); Griffiths Mental Development Scales (GMDS);
Griffiths Mental Development Scales – South African version (GMDS-SA); Kilifi Developmental
Inventory (KDI); Malawi Developmental Assessment Tool (MDAT); Preschool Pediatric
Symptoms Checklist (PPSC); Saving Brains Early Childhood Development Scale (SBECD);
Stanford-Binet Intelligence Scales, fifth edition (SBIS-5); Test de Desarrollo Psicomotor
[Psychomotor Development Test] (TEPSI); and Vineland Adaptive Behavior Scales (Vineland)
(see Bibliography for details).
Contents
Acknowledgements
Abbreviations
Executive summary
1. Introduction
2. GSED: overview
The D-score
The GSED measures
4. GSED validation
Preparation and feasibility phase
Translations and adaptation
Training
Feasibility of implementation processes
Validation phase
Study population
Sample size and sampling scheme
Study measures
Visit schedule and quality control
7. Next steps
References
Bibliography
Acknowledgements
Vision and conceptualization
The Global Scales for Early Development (GSED) package v1.0 was developed under the overall guidance of and conceptualization by Tarun Dua and Dévora Kestel of the Department of Mental Health and Substance Use of the World Health Organization (WHO).

Project coordination
The development of the GSED package was coordinated by Vanessa Cavallera, Brain Health Unit, Department of Mental Health and Substance Use, WHO.

Writing of the Technical Report
The writing team included (in alphabetical order): Salahuddin Ahmed (Projahnmo Research Foundation, Bangladesh), Romuald Kouadio E Anago (Innovations for Poverty Action [IPA], Côte d’Ivoire), Abdullah H Baqui (Johns Hopkins University, United States of America [USA]), Maureen Black (RTI International and University of Maryland School of Medicine, USA), Alexandra Brentani (University of São Paulo Medical School, Brazil), Kieran Bromley (Keele University, United Kingdom), Stef van Buuren (Netherlands Organisation for Applied Scientific Research [TNO] and Utrecht University, Netherlands), Vanessa Cavallera (WHO headquarters, Switzerland), Symone Detmar (TNO, Netherlands), Tarun Dua (WHO headquarters, Switzerland), Arup Dutta (Center for Public Health Kinetics [CPHK], United Republic of Tanzania), Iris Eekhout (TNO, Netherlands), Melissa Gladstone (University of Liverpool, United Kingdom), Katelyn Hepworth (University of Nebraska, USA), Andreas Holzinger (IPA, Côte d’Ivoire), Magdalena Janus (McMaster University, Canada), Fyezah Jehan (Aga Khan University [AKU], Pakistan), Fan Jiang (Shanghai Jiao Tong University School of Medicine, China), Patricia Kariger (Independent Consultant, USA), Raghbir Kaur (WHO headquarters, Switzerland), Rasheda Khanam (Johns Hopkins University, USA), Gillian Lancaster (Keele University, United Kingdom), Dana McCoy (Harvard Graduate School of Education, USA), Gareth McCray (Keele University, United Kingdom), Imran Nisar (AKU, Pakistan), Ambreen Nizar (AKU, Pakistan), Mariana Pacifico Mercadante (University of São Paulo Medical School, Brazil), Michelle Pérez Maillard (WHO headquarters, Switzerland), Abbie Raikes (University of Nebraska Medical Center, USA), Arunangshu Dutta Roy (Projahnmo Research Foundation, Bangladesh), Marta Rubio-Codina (Inter-American Development Bank, USA), Sunil Sazawal (CPHK, United Republic of Tanzania), Yvonne Schönbeck (TNO, Netherlands), Jonathan Seiden (Harvard Graduate School of Education, USA), Fahmida Tofail (International Centre for Diarrhoeal Disease Research, Bangladesh), Marcus Waldman (University of Nebraska Medical Center, USA), Ann M Weber (University of Nevada, USA), Yunting Zhang (Shanghai Jiao Tong University School of Medicine, China), Arsène Zongo (IPA, Côte d’Ivoire).

Technical contribution and review
Valuable input was received from WHO staff at headquarters and in regional and country offices, as well as from many international experts and programme implementers. These technical and/or data contributions were central to the development of the GSED package. Those who provided input include (in alphabetical order): Amina Abubakar (AKU, Kenya), Fahad Aftab (Center for Public Health Kinetics, United Republic of Tanzania), Claudia Regina Lindgren Alves (Universidade Federal de Minas Gerais, Brazil), Omar Ali Ali (WHO Country Office, United Republic of Tanzania), Elisa Altafim (Universidade de São Paulo, Brazil), Maria Caridad Araujo (Inter-American Development Bank, USA), Orazio Attanasio (Institute for Fiscal Studies, United Kingdom), Farzana Begum (AKU, Pakistan), Florence Baingana (WHO Regional Office for Africa, Congo), Molly Biel (WHO headquarters, Switzerland), Geoffrey Bisoborwa (WHO Regional Office for Africa, Congo), Andrea Bruni (WHO Regional Office for South-East Asia, India), Betzabe Butron Riveros (WHO Regional Office for the Americas, USA), Claudia Cappa (UNICEF, USA), Andreana Castellanos (Afinidata, Guatemala), Susan M Chang (University of the West Indies, Jamaica), Alexandra Chen (Harvard University, USA), Anne Marie Chomat (McGill University, Canada), Marie-Helene Cloutier (The World Bank, USA), Bernadette Daelmans (WHO headquarters, Switzerland), Sandra Dao (WHO Country Office, Côte d’Ivoire), Teshome Desta Woldehanna (WHO Country Office, Kenya), Erinna Dia (UNICEF, USA), Bernice M Doove (Maastricht University, Netherlands), Dan Fang (WHO Country Office, China), Wafaie Fawzi (Harvard University, USA), Lia CH Fernald (University of California, USA), Emanuela Galasso (The World Bank, USA), Sally M Grantham-McGregor (Emeritus Professor of International Child Health at University College London, United Kingdom), Shuchita Gupta (WHO Regional Office for South-East Asia, India), Cristina Gutierrez de Piñeres (United Way, Colombia), Jena Derakhshani Hamadani (International Centre for Diarrheal Disease Research, Bangladesh), Charlotte Hanlon (Addis Ababa University, Ethiopia), Natalia Henao (Genesis Foundation, Colombia), Alaka Holla (The World Bank, USA), Sidra Inayat (AKU, Pakistan), Matias Irarrazaval (WHO Regional Office for the Americas, USA), Nemes Joseph Iriya (WHO Country Office, United Republic of Tanzania), Charles Newton (Kenya Medical Research Institute, Kenya), Ledia Lazeri (WHO Regional Office for Europe, Denmark), Zhao Li (WHO Regional Office for the Western Pacific, Philippines), Pamela Jervis (Universidad de Chile, Chile), Codie Kane (Temple University, USA), Simone M Karam (Federal University of Rio Grande, Brazil), Janet Kayita (WHO Regional Office for Africa, Congo), Jamila Khalfan (Public Health Laboratory-IdC, United Republic of Tanzania), Neema Kileo (WHO Country Office, United Republic of Tanzania), Betsy Lozoff (University of Michigan, USA), Diego Luna (The World Bank, USA), Raquel Dulce Maguele (WHO Country Office, Mozambique), Limbika Agatha Maliwichi (University of Malawi, Malawi), Sheila Manji (WHO headquarters, Switzerland), Susanne Martin Herz (University of California San Francisco, USA), Jeffrey Measelle (University of Oregon, USA), Girmay Medhin (Aklilu Lemma Institute of Pathobiology, Ethiopia), Patricia Medrano (University of Chile, Chile), Junaid Mehmood (AKU, Pakistan), Rajesh Mehta (WHO Consultant, India), Ana MB Menezes (Universidade Federal de Pelotas, Brazil), Assumpta Muriithi (WHO Regional Office for Africa, Congo), Néllia Mutisse (WHO Country Office, Mozambique), Alphoncina Nanai (WHO Country Office, United Republic of Tanzania), Renato Oliveira e Souza (WHO Regional Office for the Americas, USA), Nicole Petrowski (UNICEF, USA), Lauren Pisani (Save the Children, United Kingdom), Helen Pitchik (University of California at Berkeley School of Public Health, USA), Chemba Raghavan (UNICEF, USA), Muneera Rasheed (AKU, Pakistan), Lisy Ratsifandrihamanana (Centre Médico-Educatif, Madagascar), Sarah Reynolds (University of California at Berkeley School of Public Health, USA), Linda Richter (University of the Witwatersrand, South Africa), Peter C Rockers (Boston University School of Public Health, USA), Marta Rubio-Codina (Inter-American Development Bank, USA), Norbert Schady (The World Bank, Washington), Khalid Saeed (WHO Regional Office for the Eastern Mediterranean, Egypt), Makeba Shiroya (WHO Country Office, Kenya), Kathleen Louise Strong (WHO headquarters, Switzerland), Christopher R Sudfeld (Harvard University, USA), Edwin Exaud Swai (WHO Country Office, United Republic of Tanzania), Safila Telatela (WHO Country Office, United Republic of Tanzania), Daisy Trovada (WHO Country Office, Mozambique), Martin Vandendyck (WHO Regional Office for the Western Pacific, Philippines), Paul H Verkerk (TNO, Netherlands), Susan P Walker (Caribbean Institute for Health Research, Jamaica), Christine Wong (Hong Kong University, China), Dorianne Wright (Oregon Health & Science University, USA) and Aisha K Yousafzai (Harvard University, USA).

Information technology programming
We acknowledge UniversalDoctors (Jordi Serrano Pons, Fernando Vaquero, Jeannine Lemaire and Montse Garcia) for conceptualization and initial information technology support to the GSED application creation, and the CPHK (Arup Dutta, Vishi Saxena, Waseem Ali, Poonam Rathore) for further conceptual development and operationalization of the GSED App as well as data management throughout implementation of the project.

Financial support
Support for this work was received from (in alphabetical order): the Bernard van Leer Foundation, Bill & Melinda Gates Foundation, Children’s Investment Fund Foundation, Jacobs Foundation, King Baudouin Foundation United States and the United States Agency for International Development.
Abbreviations
2PL two-parameter logistic
AKU Aga Khan University
ASQ Ages and Stages Questionnaire
Bayley-III Bayley Scales of Infant and Toddler Development, Third Edition
BRS Brief Resilience Scale
CAT computerized adaptive tests
CB combined format
CI confidence interval
CPAS Childhood Psychosocial Adversity Scale
CPHK Center for Public Health Kinetics
CREDI Caregiver Reported Early Development Instruments
D-score Developmental score
DAZ Development-for-Age z-score
DDI Dutch Development Instrument
DHS Demographic Health Survey
ECD early childhood development
ECDI2030 Early Childhood Development Index 2030
EDC electronic data capture
FCI Family Care Indicator
FGD focus group discussion
FSS Family Support Scale
GMDS Griffiths Mental Development Scales
GSED Global Scales for Early Development
HAZ height-for-age z-score
HF Household Form
HOME Home Observation Measurement of the Environment
ICC intraclass correlation coefficient
IPA Innovations for Poverty Action
IRT item response theory
IYCD WHO Indicators of Infant and Young Child Development
LF Long Form
ODK Open Data Kit
PF Psychosocial Form
PHQ9 Patient Health Questionnaire-9
PRIDI Regional Project on Child Development Indicators
SD standard deviation
SDG Sustainable Development Goal
SEE standard error of estimation
SES socioeconomic status
SF Short Form
SME subject matter expert
TNO Netherlands Organisation for Applied Scientific Research
TPP Target Product Profile
WAZ weight-for-age z-score
WHO World Health Organization
© WHO / Alasdair Bell
Executive summary
This Technical Report documents the development and validation of the Global Scales for Early
Development (GSED). The GSED package v1.0 includes open-access measures that provide a
standardized method for measuring the development of children up to 36 months of age across
diverse cultures and contexts. It has been created to serve as a population-level assessment of
early childhood development (ECD) (up to 36 months) for the global community that may be
used for comparisons across countries. In contrast to growth, which is measured by changes
in children’s weight in grams and height in centimetres, there has been no uniform scale for
children’s early development. The GSED uses an innovative metric, the Developmental score
(D-score), a scale with interval properties, to measure children’s development.
The package includes the GSED measures v1.0 as well as accompanying materials to facilitate their implementation and use. The GSED measures are meant to collect population-level data on ECD and are designed to be used primarily for research and programmatic evaluations. They comprise a:
• Global Scales for Early Development v1.0 Short Form (caregiver-reported)
• Global Scales for Early Development v1.0 Long Form (directly administered)
• Household Form
• Psychosocial Form
Current evidence indicates that the psychometric properties of the GSED SF and LF are comparable. The choice of one or the other, or the two together, to measure child development should be dictated by: i) the purpose of the evaluation and/or specific research question (e.g. type of intervention); ii) preferred modality of administration (caregiver report versus direct administration); and iii) the capacity and expertise of the team. A combined format (CB) of GSED SF together with GSED LF may be used to increase measurement precision. Further evidence on sensitivity to potential changes after interventions and increase in precision of the GSED CB is currently being tested and will be made available in the near future.
The GSED measures were created through an integrated empirical-conceptual approach. The
empirical approach included statistical modelling of a data set of 100 153 observations. The
conceptual approach included subject matter experts (SMEs) identifying conceptually relevant
and globally feasible items of child development for children under 36 months of age.
A rigorous and standardized method has been used to evaluate the psychometric properties
of the newly-created GSED measures in seven countries. A prospective cross-sectional design
was implemented, including a six-month longitudinal follow up, with an age- and sex-stratified
sample of children.
This interim package (v1.0) is based on validation data from three countries: Bangladesh,
Pakistan and the United Republic of Tanzania. The results demonstrate statistically significant
reliability and validity of the D-score to measure child development under 36 months of age at
the population level. Specifically, convergent validity measured through contextual measures
likely to be related to child development (e.g. socioeconomic status [SES] and exposure to
adversity) and concurrent validity with the Bayley Scales of Infant and Toddler Development
(Bayley-III) were statistically significant. Additional analyses have shown that GSED measures
are culturally neutral, have good content validity and are easy to implement at scale. Revisions
to the package are planned for 2024 when data are available from four additional countries:
Brazil, China, Côte d’Ivoire and the Netherlands.
1 Not part of this Technical Report in detail as they are being further tested, but can be made available on request.
1. Introduction
The pathways to adult health and well-being begin in childhood and are often measured by
children’s growth and development. Both are products of children’s specific genetic blueprint
and influenced by environmental factors that begin prenatally (1). Children in the early years
and, in particular, the first 1000 days (from conception to age 24 months) are highly sensitive to
environmental conditions due to the rapid brain development that occurs during this period.
Monitoring of child development at this time is important to track progress toward global and
national policy goals for children and provides a critical reference to plan and evaluate services
to support healthy development. However, there have been no universal measures designed to
quantify children’s development during the earliest years at population level (2). Without such
measures, countries are unable to monitor children’s progress and determine how to allocate
resources to provide the necessary support for children to reach their developmental potential.
Measures of stunting and severe poverty have been effective indicators of the proportion of children at risk of not reaching their developmental potential (3) and have contributed to advances in global policies and programmes for young children (4). However, they have limited predictive ability (5) and lack sensitivity, and as such they are not suitable to measure change in response to interventions or environmental conditions. Countries have traditionally used physical growth as a proxy based on findings that chronic linear growth faltering (i.e. stunting: height-for-age more than 2 standard deviations [SDs] below the median of the WHO growth standards) is associated with fewer years of adult schooling, poorer economic indices and greater likelihood of experiencing poverty (6,7). Reduction in the proportion of childhood stunting is frequently used as a national and global target, often also as a proxy for improvement in developmental outcomes.
The urgent need for a comprehensive measure and reporting system for child development has been reinforced by the United Nations Sustainable Development Goals (SDGs). Specifically, SDG4 (education) calls for ensuring inclusive and equitable quality education and promoting lifelong learning opportunities. SDG4 includes an early childhood-specific indicator, 4.2.1, which mandates measurement of the proportion of children under 5 years of age who are developmentally on track in health, learning and psychosocial well-being, by sex (8). WHO’s Multicentre Growth Reference Study has shown that under optimal conditions, children’s early growth (up to age 2 years) is comparable across countries (9). These findings form the basis for global growth standards that have been adopted by 125 countries. Similarly, emerging evidence suggests universality in multiple domains of children’s development across countries during the first 2 to 3 years of life (10,11). Additionally, in contrast to growth, which is measured universally by changes in children’s weight in grams and height in centimetres, there is no unit (or metric) for measuring child development.

To address the lack of a population or programmatic measure or metric of ECD, WHO assembled an interdisciplinary and multi-country team to develop the GSED. These measures of ECD for children up to 36 months of age provide a metric of child development (the D-score) at both the population and programmatic level as well as a system for interpreting scores.

This Technical Report summarizes the process of creating the GSED and related measures, and describes the validation methodology and psychometric properties in three countries (Bangladesh, Pakistan and the United Republic of Tanzania). Data from additional countries (Brazil, China, Côte d’Ivoire and the Netherlands) (currently being collected) will address broader global validity and inform potential revisions of the measures. The report concludes by describing the release of the package v1.0, the next steps for GSED, and the process of dissemination and continuous feedback to enable the use of GSED to monitor progress towards global goals and inform programmes and policy to promote the health and development of young children globally.
2. GSED: overview
The GSED is an open-access package specifically designed to provide a standardized method
for measuring development of children up to 36 months of age at population level globally. The
GSED meets that objective by incorporating all domains of ECD through a common scale for
measurement translated into a single score, the D-score, that represents holistic development.
This section of the report provides an overview of the D-score approach and the GSED measures.
The D-score
The D-score is a unit of measurement with an interval scale representing child development by a single number (12,13). As height (in centimetres) and weight (in grams) change over time with the growth of the child, development (measured in D-score units) also increases with age as the child acquires more skills. The D-score is calculated from Yes/No responses on a set of age-appropriate developmental items (e.g. “Can the child stack two blocks?”, or “Does the child use two-word sentences?”). Conceptually, a child’s D-score falls along a developmental continuum, beginning with simple skills and behaviours that the child is able to perform and progressing through the child’s repertoire until reaching skills and behaviours that the child has yet to acquire (see Figure 1). It is calculated as the mean of the posterior distribution conditional on the responses, the items’ difficulty and the child’s age. The D-score may also be transformed into the Development-for-Age z-score (DAZ). The DAZ is age-independent and is scaled such that at each age, the distribution of scores is normally distributed with a mean of 0 and a variance of 1. Since DAZ adjusts for the natural increase in D-score with age, it helps ease the comparison between samples from different ages or countries. Similar to height-for-age z-score (HAZ) and weight-for-age z-score (WAZ), the DAZ is calculated relative to a reference population.
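The posterior-mean calculation and the z-score transformation described above can be illustrated with a minimal sketch. This is not the scoring algorithm distributed with the GSED package: the function names, the normal prior (whose mean would depend on the child's age), the item difficulties and the reference mean and SD are all hypothetical assumptions, and the sketch works on a plain logit scale rather than the published D-score scale.

```python
import math

def log_lik(theta, response, difficulty):
    """Log-likelihood of one Yes (1) / No (0) response under a Rasch model."""
    z = theta - difficulty
    # log P(pass) = -log(1 + exp(-z)); log P(fail) = -log(1 + exp(z))
    return -math.log1p(math.exp(-z)) if response == 1 else -math.log1p(math.exp(z))

def eap_score(responses, difficulties, prior_mean, prior_sd, n_grid=201):
    """Posterior mean and posterior SD of the score, evaluated on a grid.

    prior_mean and prior_sd describe an assumed normal prior for the
    child's age; the values used here are illustrative only.
    """
    lo, hi = prior_mean - 5 * prior_sd, prior_mean + 5 * prior_sd
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    weights = []
    for theta in grid:
        log_w = -0.5 * ((theta - prior_mean) / prior_sd) ** 2  # log prior (unnormalized)
        for x, delta in zip(responses, difficulties):
            log_w += log_lik(theta, x, delta)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def daz(score, ref_mean, ref_sd):
    """Age-standardized z-score relative to an assumed reference mean and SD at the child's age."""
    return (score - ref_mean) / ref_sd

# Hypothetical example: three items, the two easier ones passed.
score, sd = eap_score([1, 1, 0], [-1.0, 0.0, 1.0], prior_mean=0.0, prior_sd=1.0)
```

Using the posterior mean rather than a maximum-likelihood estimate means a score is still defined when a child passes (or fails) every administered item, because the age-based prior keeps the estimate finite.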
3. Development of GSED v1.0 SF and LF measures
This section describes the development process of the GSED SF and LF measures, from
conceptualization through statistical analysis and expert consensus, to the prototypes for
validation (16). The GSED development followed a rigorous methodological process which
required a multi-step approach as summarized in Figure 2.
[Figure 2. Summary of the GSED development process. Recoverable labels: Item matching; Creation of R Shiny App; Step 3: Item shortlisting; GSED SF creation (Short Form, Caregiver Report); GSED LF creation (Long Form, Direct Observation).]
[Figure: annotated screenshot of the item matching tool. Annotations: “Use the age range and domains for quicker navigation”; “Tip: use “page down” to efficiently skim through the items to be matched”; “Do not put anything in red cells”.]
The difficulty parameters were estimated for a subset of 818 items that fitted the Rasch model
as judged by infit and outfit¹ criteria using 17 equate groups² that anchored all instruments
to the D-score scale. Scores extracted from the Rasch model were compared to scores from a
more general two-parameter logistic (2PL) model, which uses one additional discrimination
parameter per item. Extremely strong correlations between scores from the two models were
observed (r = 0.97), and so the more parsimonious Rasch model was selected over the 2PL.³
¹ Infit and outfit are “fit” statistics. In a Rasch context they indicate how accurately or predictably data fit the model.
² Equate refers to groups of items that were held to have a constant difficulty (tau) across tools, based on evidence-led analysis of item similarity.
The equate groups permitted the different instruments used for different children to be on the same scale.
³ For a deeper discussion on model choice, see the AERA 1992 debate between Hambleton and Wright
(https://www.rasch.org/rmt/rmt61a.htm and https://www.rasch.org/rmt/rmt62d.htm).
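As a rough illustration of the quantities named above, the sketch below writes the 2PL item response function (the Rasch model is the special case with discrimination fixed at 1) and the commonly used infit and outfit mean-square statistics for a single item. It is a simplified sketch, not the analysis code used for the GSED: the function names are hypothetical, abilities and difficulties are assumed to be on a plain logit scale, and the ability estimates are assumed to be given.

```python
import math

def irt_prob(theta, difficulty, discrimination=1.0):
    """Pass probability under the 2PL model; discrimination=1.0 gives the Rasch model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one item under the Rasch model.

    responses: 0/1 scores from the children who took the item.
    abilities: assumed ability estimates for the same children.
    """
    sq_resid, variances, z_sq = [], [], []
    for x, theta in zip(responses, abilities):
        p = irt_prob(theta, difficulty)      # model-expected score
        var = p * (1.0 - p)                  # model variance of the response
        sq_resid.append((x - p) ** 2)
        variances.append(var)
        z_sq.append((x - p) ** 2 / var)      # squared standardized residual
    infit = sum(sq_resid) / sum(variances)   # information-weighted mean square
    outfit = sum(z_sq) / len(z_sq)           # unweighted mean square
    return infit, outfit

# Hypothetical example: four children answering an item of difficulty 0.5.
infit, outfit = item_fit([0, 0, 1, 1], [-1.0, 0.0, 1.0, 2.0], difficulty=0.5)
```

Values of both statistics near 1 indicate responses about as predictable as the Rasch model expects. Outfit is more sensitive to unexpected responses from children far from the item's difficulty, while infit weights responses from children near it more heavily.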
Motor • Gross
• Fine
Language • Receptive
• Expressive
• Problem-solving/reasoning
Items in the GSED SF were ordered by level of difficulty, based on data available on each item
reflecting children’s emerging skills. The order was slightly revised by SMEs to ensure it was
consistent with child development theory. Items in the GSED LF were first grouped in three
streams to facilitate administration flow during direct administration to the child and then
ordered following the same process described for the GSED SF. Each stream uses specific
materials to facilitate administration. Stream A includes items related to physical activity and
movement, Stream B uses tablet-based images and Stream C uses materials from the GSED LF
Kit (see below).
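The ordering logic described above (group GSED LF items by stream, then order each group by difficulty) can be sketched as follows. The item identifiers, difficulty values and dictionary layout are hypothetical, and the sketch covers only the data-driven ordering, not the subsequent review by SMEs.

```python
from collections import defaultdict

def order_items(items):
    """Group items by stream, then sort each group by ascending difficulty."""
    by_stream = defaultdict(list)
    for item in items:
        by_stream[item["stream"]].append(item)
    return {
        stream: sorted(group, key=lambda it: it["difficulty"])
        for stream, group in sorted(by_stream.items())
    }

# Hypothetical items: identifiers and difficulties are made up for illustration.
example = [
    {"id": "item_b1", "stream": "B", "difficulty": 30.0},
    {"id": "item_a2", "stream": "A", "difficulty": 20.0},
    {"id": "item_a1", "stream": "A", "difficulty": 10.0},
]
ordered = order_items(example)
```

For the GSED SF, which has no streams, the same sort applied to a single group gives the difficulty-based order.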
One hundred thirty-nine items were shortlisted for the GSED SF (caregiver response options of Yes,
No and Don’t know) and 155 items for the GSED LF, which are either observed incidentally or elicited
by the assessor, or both (response options are binary: skill observed or not). “Start” and “stop” rules
based on the child’s age and performance are used in the administration of both forms.
4. GSED validation
This section describes the methodology undertaken for the validation of the GSED measures.
The GSED validation study was planned in seven countries varying in geography, language,
culture and income, and implemented in two rounds (since funding was received in two
stages): Round 1 included Bangladesh, Pakistan and the United Republic of Tanzania
(validation completed and results informed this Technical Report); and Round 2 included
Brazil, China, Côte d’Ivoire and the Netherlands (data collection for the validation ongoing).
Data from Round 2 countries will further expand the generalizability of the results for global
use and are expected to be published in early 2024. Figure 6 shows the GSED validation partners
in each country.
The study utilized a rigorously standardized protocol across countries with a mixed qualitative and quantitative methods approach combining cross-sectional and longitudinal approaches to evaluate the psychometric properties of the GSED SF and LF: reliability (inter-rater and test-retest), concurrent validity, convergent validity, short-term predictive validity at six months and responsiveness (25). This study received ethical approval from WHO (protocol GSED validation 004583 20.04.2020) and approval in each site.

The work included a preparation and feasibility phase and the main validation data collection phase. The methodology of both phases is described below. The results presented in this report are limited to Round 1 countries where data collection has been completed.

In addition to the main validation effort, external research groups have proposed supporting generation of further GSED validation data through the inclusion of the GSED package as secondary research outcomes in ongoing studies (see Box 1).
[Figure 6. GSED validation partners in each country. Recoverable label: China National Children’s Medical Center/Shanghai Children’s Medical Center, Shanghai, China.]
translators. This back-translation was then shared with Some of the key findings from the preparatory and
the WHO team, who initiated an iterative process of feasibility phase are provided below.
revisions with the local team before approval. During
the preparation and feasibility phase the approved Visit schedules Data were collected in two visits (in
translations were tested (see below) so that final Bangladesh both a one-visit and two-visit option were
modifications, if necessary, could be made before data tested) with the sample divided into subgroups to test
collection. This process led to adaptation of the GSED the feasibility of different visit schedules in terms of
measures to include cultural nuances for language and order of administration of study measures and settings
context; few items were revised with minor rephrasing for the visits. The first visit was done at home to test
in each country. the GSED SF in the setting for which it was intended for
future use (i.e. household surveys) and to administer the
Training study measures (see Annex 2) aimed at capturing the
child’s everyday home living environment. In Pakistan
One joint in-person training of trainers course was
and the United Republic of Tanzania, the second visit
conducted for the study teams. The training covered
was carried out in a mobile clinic/clinic setting to
implementation processes and all data collection forms.
facilitate anthropometric testing and standardize the
The training included workshops on child development
setting for concurrent validation (with Bayley-III). In
principles, detailed review of administration processes
Bangladesh, the second visit was done at home due
and item-by-item review. It included practical sessions
to transportation and travel times between the sites.
with live demonstrations, role-plays, and practice in
pairs and with both caregivers and children. Participants in the training were certified as local master trainers and were then responsible for training the field assessors in their country. For the local assessors’ training, a structured two-week training programme was designed by the local master trainers, in consultation with the WHO team, with thematic/didactic sessions in classrooms and practice sessions. To be able to collect data, field assessors completed a certification process, which entailed achieving an agreement of 90% on the forms’ scoring between the assessor and the local master trainer while administering the GSED forms.

Feasibility of implementation processes
Data were collected from a minimum of 32 children per country, stratified by age group and sex, to test feasibility and acceptability of administration of the study instruments, as well as to finalize implementation processes, such as visit schedules including time required, and the use of tablets. The sample was adequate to test all implementation procedures, such as the reliability testing processes, and to check the completeness of data collection and whether any items from the measures, both GSED and contextual, had a significant number of missing or otherwise invalid responses in any of the countries. This process worked well and was confirmed for the validation. The two-visit model was also confirmed in all countries, including Bangladesh, due to time needed for administration. Fine-tuning of the sequence in which the study measures were administered took into account how to optimize the privacy needed for some of them.

Reliability testing
Two reliability testing processes were carried out in a subsample of a minimum of 16 children per country to explore the feasibility of conducting multiple administrations of study measures within fixed time frames. First, for GSED LF, two approaches were tested: a video recording of the GSED LF assessment (Bangladesh N=15 and Pakistan N=12), with videos watched and independently scored by other assessors and the study master trainers, and simultaneous coding by another assessor (N=8 per country). For GSED SF (and GSED PF), the administration was audio recorded (Bangladesh and Pakistan, N=16) and independently scored by other assessors on the study team. In Pemba, United Republic of Tanzania, the reliability testing was completed via live coding of the administration of the test by another assessor (GSED SF N=22, GSED LF N=23). Several limitations were encountered with the video and audio recordings. As the camera was placed in a fixed
location for the video recordings, the motor component of the GSED LF assessment was difficult to capture consistently and to code. In addition, some sites faced issues with providing sufficient lighting during recording to enable later coding. The audio recording method for GSED SF posed a challenge pertaining to the quality of the voice recordings. The in-person approach with an independent assessor coding the observations simultaneously with another assessor administering the GSED measures was found to be feasible and of higher quality for inter-rater reliability and was therefore implemented in all sites during the validation phase.

Quality control
Ten per cent of the scheduled visits of each assessor in each site were randomly selected for a quality control visit by the master trainer or local supervisor. During these visits, the master trainer or supervisor completed a checklist for the administration of the tests that included verifying the child’s and mother’s ages, date of birth, consent, rapport building and accuracy in administration of the study measures. In this testing phase, at least 10% of the total visits at each site were also video recorded. The direct administration approach for quality control was found to be feasible and reliable as compared to video assessment, due to challenges with the video recording mentioned above.

Qualitative data – exit interviews
During the feasibility phase, exit interviews with caregivers were conducted to understand the caregiver experience in the consent process, the acceptability of GSED administration, visit schedules (N=63) and the acceptability of the GSED LF items, materials, instructions and procedures (N=72). The assessors recorded the responses and any narrative comments by caregivers on paper, and these were later translated into English. From these exit interviews, it emerged that most (> 90%) respondents said that the various aspects of the implementation process (e.g. comfort with items asked, where and when they were asked to respond to items) were acceptable. However, 14% of respondents said that the duration of the visit was very long. The duration of the total interview time was expected to decrease with familiarity with the study measures; nonetheless, the training process was reinforced with a focus on supervised practice to ensure that data collectors were sufficiently comfortable with, and efficient in, administration of the study measures. Additionally, emphasis was placed on further clarifying the time commitment for study participation at the consent stage, and the decision was made to implement data collection in two separate visits at all sites. Some concerns were raised by caregivers and assessors about the sensitivity of some of the questions (see Annex 2 for study measures), which were addressed by ensuring adequate privacy for conversations with the caregiver (e.g. arranged at a health clinic or outside of the home, or during a time when a private setting could be found at home). In the GSED LF exit interviews, some parents said the skills tested (28%) and materials administered (43%) were unfamiliar to children in their community. Based on the preliminary quantitative analyses, the items in question (e.g. those including blocks, shape board, peg board) seemed to perform well and were kept without significant change. Moreover, 15% of respondents also specifically offered positive comments about their experiences with the GSED LF, such as learning what their children could do and the need for more education on ECD.

Qualitative data – focus group discussions (FGDs)
At the end of the feasibility phase, virtual structured FGDs with the assessors, supervisors and study managers from each country were conducted (N=42). The purpose was to elicit local field team feedback on implementation processes (consent process, ease of administration of study measures, visit schedules, use of the GSED App, training needs, comprehension of the items by assessors and caregivers), and cultural appropriateness of GSED SF and GSED LF items and the GSED LF Kit materials. FGDs were audio-recorded, transcribed and translated by local staff with specialized training in qualitative methods. These data were analysed using Dedoose (26). Codes were created for the items included in the FGDs and applied to the responses from participants. Themes were identified for summary analysis. Overall, the results indicated that the assessors’ experiences with the administration of the study measures were positive. The main concerns expressed were, consistent with caregivers’ experiences shared in exit interviews, about ensuring privacy for
measures with sensitive items (such as caregiver depression or exposure to violence, see Annex 2). The FGDs also prompted changes in how to introduce the GSED SF (e.g. more information to be provided about the fact that, by design, the validation of the measures implies that some items may seem repetitive to the respondent) and tips for optimizing administration of the GSED LF (e.g. offering materials, such as blocks, to children to familiarize them with the objects prior to administration). The FGDs were also useful for teams to strategize how to manage implementation challenges during GSED LF administration (e.g. keeping children engaged, avoiding distractions, etc.).

Based on the exit interviews and FGDs, along with feedback from the translation process, 13 GSED LF items were identified as difficult to administer or understand. These items were adapted to facilitate administration, or rephrased to improve ease of translation and/or clarity of administration instructions. For example, the GSED LF item to assess the child’s understanding of the concepts of “more” and “less” initially asked the child to indicate which of two cups contained more water. This item was reportedly hard for children to understand and complete as some children wanted to play with the water or drink from the cups. This item was adapted by asking the child to indicate which of two piles of blocks had more blocks (in place of the cups with water).
Validation phase
Study population
The study population included children 2 weeks to 41 months of age (inclusive) living in the study areas. Children up to 41 months were included to ensure that measure parameters could be estimated with adequate precision at 36 months. Inclusion criteria were: the child’s age; the child’s primary caregiver (person most familiar with the child and spending most time with him/her) being available to participate in the study; and the family speaking to the child in one of the GSED translated languages. Children were excluded if gestational age or birth weight data were missing.

A subsample of children was used to estimate preliminary reference scores and is henceforth referred to as the “reference” sample. Additional exclusion criteria were applied to this subsample to exclude children who were low birth weight (< 2500 g); born preterm or late term (gestational age < 37 or ≥ 42 completed weeks); undernourished (weight-for-age, length-for-age or weight-for-height z-score < –2 SD based on the WHO Child Growth Standards) at the time of developmental assessment; had a known severe congenital birth defect, history of birth asphyxia or neonatal sepsis requiring hospitalization, known neurodevelopmental disorder or disability, or other chronic health problem; or primary caregiver had less than secondary level education.

Sample size and sampling scheme
The target sample size per site was 1248 children (total 3744 in the three countries). In each site, children were sampled from a list of potentially eligible caregiver-child dyads residing in the defined study areas. Only one child
per caregiver or multi-family household was selected and the target children’s primary caregiver approached for consent and enrolment. Children who were acutely unwell at the time of assessment were rescheduled after seven days. Refusals to participate and drop-outs were registered and replaced. After consent was obtained, children were allocated to sex and age groups using the sampling scheme in Figure 7. Larger quotas were set for the youngest age groups where rates of development are steepest. Out of the full sample of 1248 children per site, 154 were randomly allocated to either inter-rater (N=99) or test-retest (N=55) reliability testing; 166 to concurrent validity testing with the order of GSED and Bayley-III administrations counter-balanced (N=83 with GSED administered first and vice versa); and 504 to re-evaluation six months later for predictive validity.

Within the predictive validity subsample, children were further divided into groups that also received the GSED adaptive measure to determine whether adaptive testing is a feasible and valid option to measure child development within the GSED (see Box 2 for further details), or the UNICEF’s Early Childhood Development Index 2030 (ECDI2030) measure to inform harmonization of measurement of child development up to 5 years at population level (see Box 3 for further details).
Reliability: N=154 (140 + 10% LTF – STOP at 140)
Concurrent: N=166 (150 + 10% LTF – STOP at 150) [1]
Predictive: N=504 [2]
Adaptive: N=432 + 72 new [3]
ECDI: N=230 [4]
[1] The number inside parentheses is the number collected and the number outside is the number randomized to account for loss to follow-up.
[2] Two additional participants have been added to the predictive to have equal numbers in each experimental group.
[3] 72 new children between 2 weeks and 6 months of age have been added to the adaptive sample to ensure coverage at the lower ages.
[4] ECDI was only done on N=230 children of 24 months and above at the time of the predictive data collection.
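
The subsample sizes above follow from inflating a target number of completed assessments by 10% for loss to follow-up (LTF). The sketch below reproduces that arithmetic. The function names are illustrative, and rounding the concurrent subsample up to an even number (so that the two counter-balanced arms are equal) is an assumption that is consistent with the reported 83 + 83 split but is not stated in the report.

```python
def inflate_for_ltf(target, ltf_percent=10):
    """Number to randomize so that `target` children remain after expected loss.

    Integer ceiling division is used to avoid floating-point surprises.
    """
    extra = -(-target * ltf_percent // 100)  # ceil(target * ltf_percent / 100)
    return target + extra


def round_up_to_even(n):
    """Assumed rule: make the group divisible into two equal arms."""
    return n + (n % 2)


reliability = inflate_for_ltf(140)                    # 140 + 14
concurrent = round_up_to_even(inflate_for_ltf(150))   # 150 + 15, then made even
per_arm = concurrent // 2                             # counter-balanced arms
total_target = 3 * 1248                               # three sites

print(reliability, concurrent, per_arm, total_target)
```

With these inputs the sketch gives 154 and 166 children for the reliability and concurrent subsamples, 83 per counter-balanced arm, and a three-site target of 3744, which are the figures quoted in the text.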
[Figure: adaptive testing flowchart. Start item → Answer → Calculate D-score → Evaluate stop rule (min standard error of measurement | max items). No: Next item. Yes: Final D-score.]
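
The adaptive loop in the flowchart (administer an item, re-estimate the score, check a stop rule based on a minimum standard error or a maximum number of items) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GSED adaptive algorithm: it works on a plain Rasch logit scale rather than the D-score scale, uses a normal prior with a grid-based posterior mean, selects the unused item whose difficulty is closest to the current estimate, and all item difficulties and thresholds are made up.

```python
import math


def p_pass(theta, b):
    """Rasch probability of passing an item of difficulty b at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap(responses, grid, prior_mean=0.0, prior_sd=2.0):
    """Posterior mean and SD of ability given (difficulty, score) pairs."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)
        for b, score in responses:
            p = p_pass(t, b)
            w *= p if score == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def adaptive_test(item_bank, answer, min_sem=0.5, max_items=10):
    """Administer items until the SEM target or the item limit is reached."""
    grid = [x / 10.0 for x in range(-60, 61)]
    remaining = dict(item_bank)
    responses = []
    estimate, sem = eap(responses, grid)
    while remaining and len(responses) < max_items and sem > min_sem:
        # Next item: the unused item closest in difficulty to the estimate.
        item = min(remaining, key=lambda k: abs(remaining[k] - estimate))
        b = remaining.pop(item)
        responses.append((b, answer(item)))      # Answer
        estimate, sem = eap(responses, grid)     # Calculate, then Evaluate
    return estimate, sem, len(responses)


# Hypothetical bank of 13 items and a child who passes every item easier than 1.0.
bank = {f"item{i}": -3.0 + 0.5 * i for i in range(13)}
estimate, sem, n_used = adaptive_test(bank, lambda item: 1 if bank[item] < 1.0 else 0)
print(round(estimate, 2), round(sem, 2), n_used)
```

Choosing the item nearest the current estimate is a common heuristic under the Rasch model because such items carry the most information. The stop rule mirrors the two criteria named in the flowchart.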
Visit schedule and quality control
Data collection was scheduled over one to three visits depending on the study site and subsample. The first administration of the GSED SF was completed at home to test it in the setting intended for future use (e.g. household surveys) and prior to administration of the GSED LF to avoid influencing caregiver responses. The GSED LF and Bayley-III were administered in a controlled environment (e.g. clinic or quiet residential area) to match the required testing protocols of the concurrent validity measure. To ensure high quality, 10% of all study visits were observed in person by study supervisors, covering each child age band and certified assessor. The supervisors had at least five years of experience in community-based research and/or formal education in fields related to ECD (e.g. teaching, nursing, psychology). Supervisors independently completed questionnaires administered by the assessor and completed a fidelity checklist to provide feedback to assessors. Supervisors reviewed quality assurance findings with WHO biweekly. The GSED App for data collection provided built-in data range and consistency checks. Data managers reviewed and resolved issues daily in consultation with the local field and/or WHO team.

Data management
As described above, to optimize ease of administration of the GSED measures and minimize data entry errors, the GSED App was designed, which also improved standardization of data collection across study sites. In addition to GSED measures, all other study measures were designed and incorporated in the same app. ODK Aggregate with a MySQL database was used as an aggregator. Lastly, a separate data management and monitoring module was designed enabling the study team to effectively manage, monitor and generate analysable output files. The module for data management allowed data managers to check for the completion status of the forms, flagged missing data, status of the visit schedule and the visit windows for the participants for scheduling, data collection status for the study age and gender bins and completion status of the participants in the study. The data from the module were reviewed weekly by the country teams and also with WHO for study status monitoring.

The prototype was pre-tested through Google Play with two rounds of feedback by the field teams, SMEs and the statistics teams. The key feedback received focused on the visual interface, colours and fonts, number of questions per screen, and the functioning of the administration rules as intended, as well as the facility of use of the media files (GSED SF) and in-built administration instructions support (GSED LF). The GSED App was then revised and field-tested in the feasibility phase of the validation for ease and accuracy of data collection and transfer. Following the feasibility phase, the revised GSED App version was released and tested for the following features: placement of media files and questions on screen for improved speed in using the App, inclusion of a pop-up screen to inform the assessor of the age of the child to be assessed, and inclusion of a pop-up asking if all questions have been completed for the child’s age as per the rules prior to saving the forms.

Once collected, the data were stored in local password-protected user authenticated servers. The de-identified data were securely transferred to the WHO central data repository by each site. They were transferred weekly for the first month of data collection and then bi-monthly. The weekly data collected in the first month were reviewed for consistency with the data dictionary, checks for missing data, data formatting and diagnosing any potential problems (missing or non-sensical data). Teams maintained detailed logs related to procedures for rescheduling or incomplete visits. These were reviewed weekly for the resolution of queries with the WHO team. The final data set was transmitted by each country to the WHO repository for analysis.
5. GSED psychometric properties
This section addresses various aspects of the GSED’s psychometric performance in the three
Round 1 countries (Bangladesh, Pakistan and the United Republic of Tanzania). The analyses
in this section, with the exception of the analysis of reliability, are performed on the combined
measure, i.e. the GSED SF and GSED LF data together. This is to reflect that primarily the scale
from which the measures are drawn is being validated, rather than the individual tools. However,
in order to show the limited differences between the psychometric properties of the GSED LF
versus the GSED SF versus the CB, concurrent, convergent and short-term predictive validity for all
three forms of the measure are presented in Annex 3.
In total, data were collected on 4452 children across the three countries. Some countries collected more data than the specified sample size in order to: i) meet the minimum quota of the reference sample; and ii) ensure every age group stratum was sampled to the specified level. Data were analysed on 4349 children, with randomly selected children contributing to the various reliability and validity subsamples. Table 2 shows the numbers collected for each of the sites by measure. Data from 41 children were removed from the analysis based on notes provided by the countries that the data were invalid for various reasons (e.g. duplicate entries, withdrew from the study, etc.), and 62 participants were removed as they had neither GSED LF nor GSED SF data available at baseline due to incomplete administration of the battery of tools.
Table 3 presents the demographic characteristics of the sample. The mean age of the children is higher in Pakistan than in the other countries because more older children were recruited to ensure a sufficient sample for the predictive validity subsample after 6-month follow-up challenges. The large sample size, relative to similar studies, and the standardized implementation of the tools across multiple countries lend strength to the validity inferences and robustness of the results of the study.
                                  Bangladesh        Pakistan          United Republic   Total
                                                                      of Tanzania
Male (%)                          50                50                50                50
Mean age in days (SD)             432 (375)         475 (381)         432 (377)         448 (378)
Mean age in months (SD)           14.20 (12.32)     15.61 (12.52)     14.19 (12.38)     14.72 (12.42)
GSED DAZ (SD)                     0.24 (0.75)       -0.34 (0.85)      0.07 (0.81)       -0.03 (0.84)
Gestational age – weeks (SD)      38.67 (1.71)      38.41 (2.00)      38.71 (1.76)      38.59 (1.85)
Birthweight – grams (SD)          2921 (422)        3251 (723)        3226 (513)        3143 (599)
Anthropometry – HAZ (SD)          -1.30 (1.09)      -1.33 (1.17)      -1.27 (1.12)      -1.30 (1.13)
Anthropometry – WAZ (SD)          -1.09 (1.08)      -1.45 (1.02)      -0.77 (1.07)      -1.13 (1.09)
PHQ9 – N (%)*
  Minimal                         587 (44%)         1215 (74%)        808 (61%)         2610
  Mild                            437 (33%)         220 (12%)         422 (32%)         1079
  Moderate                        121 (9%)          89 (5%)           52 (4%)           262
  Moderate-severe                 84 (6%)           52 (3%)           27 (2%)           163
  Severe                          103 (8%)          65 (4%)           19 (1%)           187
HOME stimulation score (SD)       40.63 (3.77)      38.27 (5.44)      38.70 (3.81)      39.13 (4.61)
* For more information on definitions of these categories, see https://www.med.umich.edu/1info/FHP/practiceguides/depress/score.pdf.
Internal reliability
The precision of the CB at any given age is greater than that of the individual forms from which it is constituted because of the larger number of items. Figure 9 gives the standard error of estimation (SEE) (30) of the D-scores obtained for the combined GSED SF and GSED LF under the Rasch model. As there are varying numbers of items pertinent to different sections of the scale, the precision of an estimate can vary at different points along the scale. The y-axis gives the SEE, the bottom x-axis gives the D-score scale, and the top x-axis gives the average D-score for various child ages, expressed in months. Note that the average D-scores for any given age are non-linearly related to age, reflecting the decreasing rate of development seen as children become older.

FIGURE 9. SEE PLOT BY GSED FORM

FIGURE 10. PSEUDO-RELIABILITY PLOT BY GSED FORM
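
Why a combined form is more precise than either form alone, and why precision varies along the scale, can be illustrated with the Rasch test information function. The sketch below is an assumption-laden illustration: it uses the standard information-based approximation of the standard error (one over the square root of the summed item information), which may differ from the SEE definition in (30), it works on a logit scale rather than the D-score scale, and the item difficulties are made up.

```python
import math


def rasch_information(theta, difficulties):
    """Test information at ability theta: the sum of p * (1 - p) over items."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    return info


def standard_error(theta, difficulties):
    """Approximate standard error of the ability estimate at theta."""
    return 1.0 / math.sqrt(rasch_information(theta, difficulties))


# Hypothetical item difficulties for two forms and their combination.
short_form = [-2.0, -1.0, 0.0, 1.0, 2.0]
long_form = [-1.5, -0.5, 0.5, 1.5]
combined = short_form + long_form

for theta in (0.0, 4.0):
    print(theta,
          round(standard_error(theta, short_form), 2),
          round(standard_error(theta, long_form), 2),
          round(standard_error(theta, combined), 2))
```

Because item information is additive, the combined set always has a smaller standard error than either subset at the same point. The error also grows where few items are targeted, here at the upper end of the made-up scale.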
External reliability
All the reliability metrics are excellent on the D-score scale. External reliability is the extent to which a measure produces the same score over theoretically identical administrations. Inter-rater reliability is the measurement of agreement between different raters and test-retest reliability is the measurement of agreement for the same rater at two different time points. Reliability is expressed here (see Table 4) as an intraclass correlation coefficient (ICC), where a value of 0 represents no reliability and a value of 1 represents perfect reliability. Common benchmark values suggest that 0 - 0.2 indicates poor reliability; 0.2 - 0.4 indicates fair reliability; 0.4 - 0.6 indicates moderate reliability; and values > 0.8 indicate excellent reliability (32).
D-score scale
                                   Bangladesh          Pakistan            United Republic     Total
                                                                           of Tanzania
Inter-rater reliability, GSED SF   0.99 (0.99-0.99)    0.97 (0.96-0.98)    0.99 (0.99-0.99)    0.99 (0.98-0.99)
                                   N=95                N=100               N=96                N=291
Test-retest, GSED SF               0.99 (0.99-1.00)    0.99 (0.98-1.00)    0.99 (0.99-1.00)    0.99 (0.99-1.00)
                                   N=48                N=59                N=52                N=159
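
For readers who want to see how an ICC of this kind is obtained from paired scores, the following is a minimal sketch. The report does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is an assumption here, and the scores are made up.

```python
def icc_2_1(scores):
    """ICC(2,1) from a list of rows: one row per child, one column per rater."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical D-scores from two raters for four children.
paired = [[10, 12], [20, 19], [30, 33], [40, 38]]
print(round(icc_2_1(paired), 3))

# Identical ratings give perfect reliability.
print(icc_2_1([[10, 10], [20, 20], [30, 30]]))
```

Small rater disagreements relative to large between-child differences give values close to 1, which is the pattern Table 4 reports.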
Concurrent validity
Concurrent validity, a type of criterion validity, is the extent to which a measure correlates with another measure of the same construct, possibly a gold standard, given at the same time. Here the criterion measure is the Bayley-III, a widely-used measure that frequently acts as a reference (33-35). To assess the correlation of the Bayley-III raw scores with GSED DAZ on the same scale, a 2PL item response theory (IRT) model was fitted to the Bayley-III item responses and a Generalized Additive Models for Location Scale and Shape (GAMLSS) model (36) was used to remove the effect of age, in line with the methodology used to construct the GSED DAZ. Sample-specific norms were constructed to ensure that the age adjustments were pertinent to the specific sample, as far as possible. Age-adjusted z-scores were only generated for the total score, as the individual domains did not contain enough items to calculate the age standardization robustly.

Table 5 gives the Pearson’s correlations between the Bayley-III individual domains and total scores, with the GSED D-scores. The GSED D-score correlates > 0.90 with the domains of the Bayley-III in most domains and countries. The correlations are higher for the cognitive and motor items than for the communication items, although in the total score the correlation is very high (0.98).
TABLE 5. CONCURRENT VALIDITY FOR GSED BY BAYLEY-III AND BAYLEY-III DOMAINS FOR EACH COUNTRY AND OVERALL – CORRELATION COEFFICIENT (95% CIs)

GSED D-score

D-score scale
Bayley-III domain                       Bangladesh           Pakistan             United Republic      Total
                                                                                  of Tanzania
                                        (N=159)              (N=158)              (N=161)              (N=478)
Cognitive                               0.98 (0.97 - 0.98)   0.95 (0.93 - 0.96)   0.97 (0.96 - 0.98)   0.97 (0.96 - 0.97)
Receptive communication                 0.91 (0.88 - 0.93)   0.88 (0.84 - 0.91)   0.91 (0.88 - 0.94)   0.90 (0.88 - 0.92)
Expressive communication                0.94 (0.92 - 0.96)   0.87 (0.83 - 0.90)   0.89 (0.85 - 0.92)   0.90 (0.88 - 0.91)
Fine motor                              0.97 (0.96 - 0.98)   0.97 (0.95 - 0.97)   0.97 (0.95 - 0.97)   0.97 (0.96 - 0.97)
Gross motor                             0.98 (0.97 - 0.98)   0.97 (0.97 - 0.98)   0.98 (0.97 - 0.98)   0.98 (0.97 - 0.98)
Overall Bayley-III score                0.99 (0.98 - 0.99)   0.97 (0.96 - 0.98)   0.98 (0.98 - 0.99)   0.98 (0.98 - 0.98)

DAZ scale
Overall Bayley-III age-adjusted         0.55 (0.44 - 0.65)   0.26 (0.11 - 0.41)   0.56 (0.44 - 0.66)   0.53 (0.47 - 0.60)
z-score*

* DAZ domain scores were not produced at a domain level as insufficient data existed to do this robustly.
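
A Pearson correlation with a 95% CI of the kind reported in Table 5 can be computed as sketched below. The report does not say how its intervals were derived; the Fisher z-transformation used here is one standard choice and is an assumption, as are the function names.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Interval for a correlation of 0.98 observed in 478 children.
low, high = fisher_ci(0.98, 478)
print(round(low, 3), round(high, 3))
```

For r = 0.98 and N = 478 both limits round to 0.98, which is consistent with the very narrow interval shown for the overall Bayley-III score in the Total column.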
Convergent validity
Convergent validity is the assessment of how closely a measure is correlated with other variables where correlation is expected. Table 6 gives a selection of variables that were, a priori, expected to correlate with the GSED DAZ score based on evidence from the literature. The table contains Pearson’s correlation coefficients with 95% CIs, unless otherwise specified. Some variables were ordinal in nature and required the use of Spearman’s correlation coefficient.

For several of the multi-item variables which contained large amounts of missing data, a unidimensional 2PL IRT model was fitted to extract summary scores, under the Missing at Random missingness assumption. The 2PL model may also be more appropriate and accurate for the DHS Wealth Index based on generated internal scores rather than relying on the published quintiles, according to weights established some years ago. A total score (i.e. all countries combined) was not generated for the DHS Wealth Index and maternal education because the items and categories were not comparable across countries. For the total scores, all measures are statistically significantly different from zero in the hypothesized directions at the 5% level of significance. However, some measures do not differ significantly from zero in the expected directions in the country level analyses. Table 6 compares the results of the GSED SF and GSED LF when administered in a CB against the convergent contextual variables.
SES - DHS Wealth Index*** 0.10 (0.05-0.16) 0.14 (0.09-0.19) 0.15 (0.10-0.20) NA
Anthropometry - HAZ 0.21 (0.16-0.26) 0.18 (0.13-0.23) 0.21 (0.16-0.26) 0.19 (0.16-0.22)
Anthropometry - WAZ 0.21 (0.16-0.26) 0.17 (0.12-0.22) 0.17 (0.12-0.23) 0.23 (0.20-0.26)
Birth weight 0.16 (0.10-0.21) 0.03 (-0.02-0.08) 0.20 (0.14-0.25) 0.04 (0.01-0.07)
Gestational age 0.11 (0.05-0.16) 0.16 (0.11-0.21) 0.21 (0.15-0.26) 0.17 (0.14-0.20)
PHQ9 category* -0.05 (-0.10-0.01) -0.04 (-0.08-0.02) 0.02 (-0.03-0.07) 0.05 (0.02-0.08)
GSED DAZ at baseline vs GSED DAZ at 6 months   0.55 (0.48 - 0.61)   0.57 (0.51 - 0.63)   0.57 (0.50 - 0.63)   0.59 (0.56 - 0.63)
The package has been created to serve as an open-access population-level measure of ECD for the global community that is comparable across countries. No fees or royalties are involved in using it, and it was designed and tested to be linguistically and culturally neutral. It includes: i) GSED SF and GSED LF measures as both a paper version and app; ii) GSED measures Item Guides; iii) GSED measures Administration Manuals; iv) Adaptation and Translation Guide; and v) Scoring Guide.
The GSED LF is organized into three streams which group tasks that are likely observed together in order to streamline and facilitate administration. The GSED

If necessary, a paper version of GSED measures may be used as long as administration rules are followed and assessors have access to the accompanying
materials specific to each measure. Alternatively, these forms can be printed. For the GSED LF, the kit must be complemented by printing the components that are in-built in the GSED App (i.e. images and booklets). Self-administration and remote administration (e.g. via phone) of the GSED SF are being tested. Current recommendations to maximize the quality of data generated are to use face-to-face administration of the GSED SF and GSED LF with the GSED App.

The GSED HF (Box 4) and GSED PF (Box 5) are not yet fully tested but can be made available on request.
Primary purpose
  GSED SF: Large-scale data collection and monitoring efforts; research and programme evaluation
  GSED LF: Research and programme evaluation
Score interpretation
  GSED SF and GSED LF: Population-level; NOT for individual-level interpretation
Response options
  GSED SF: Binary (Yes/No) + “Don’t Know” response option (to be used only when absolutely necessary)
  GSED LF: Binary (Yes/No); only observed items qualify to be scored as Yes
Materials needed to administer instrument
  GSED SF: GSED App: tablet or similar device. Administration on paper: printed paper form with a tablet or similar device for audio/visual prompts (may also be printed)
  GSED LF: GSED LF Kit. GSED App: tablet or similar device. Administration on paper: printed paper form and instruction manual
*GSED measures length and administration time are intended to be reduced by revisions planned once data from Round 2 countries are available (see Section 7). The same is true
when the adaptive version is available.
Dedicated training courses are required to learn to administer the measures and to train others to administer them. These courses are available in English in person or via Zoom upon request. The suggested length of the training is seven to 10 days (for both measures); however, the training sessions may be tailored according to the users’ experience with child development-related tools and depending on whether the GSED SF or GSED LF only is to be used. Additionally, a series of self-paced online courses is in development. Training courses include resources, such as the GSED Training Manual (available for trainees), which provides guidelines for standardized administration to ensure that the same procedures are used consistently by all assessors. Individuals administering the GSED must familiarize themselves thoroughly with the guidelines and follow them carefully.
Scoring
Once the GSED SF, GSED LF or both have been administered to one or more children, the next step is to calculate the D-score and the DAZ for each child. This step is known as scoring. The present section provides instructions on how to calculate these scores. Either of two methods may be used:

1. online calculator. The online Shiny App (https://tnochildhealthstatistics.shinyapps.io/dcalculator/) is a convenient option for users not familiar with R. The app contains online documentation and instructions;

2. R package dscore (https://CRAN.R-project.org/package=dscore) is a flexible option with all the tools needed to calculate the D-score. It is an excellent choice for users familiar with R and users who like to incorporate D-score calculations into a workflow.

Revisions to the GSED measures planned for 2024 will not impact the interpretation of the D-scores calculated on previous versions. Scoring for previous versions of the GSED instruments will continue to be available. While procedures for future versions may change, they will continue to produce scores on the same standard D-score scale and thus will remain comparable.

Detailed instructions on how to calculate the D-score and DAZ with the above methods can be found in the GSED Scoring Guide.

calculates preliminary DAZ scores. These DAZ scores are calculated in reference to same-age children from the Round 1 GSED data from both the GSED SF and GSED LF in Bangladesh, Pakistan and United Republic of Tanzania to estimate the age-conditional distributions of scores. Using this reference group, the D-scores of new data can then be converted into standardized Z scores (with a mean of 0 and SD of 1) at all ages.

While these preliminary norms are useful to adjust scores to remove the age effect, DAZ scores with the current reference population should not be interpreted as representative of any specific population or hold any special normative importance. They are calculated on a non-representative convenience sample. The main utility of these preliminary reference scores is to provide estimates of the stability of GSED and the D-score over time, without artificially inflating correlations due to the strong association between D-scores and age. They can also be used to provide a rough estimate of the association between D-scores and other concurrent and predictive measures.

The DAZ is not currently an appropriate basis for determining whether children are on or off track developmentally. A Norms and Standards study will be carried out by WHO (see Section 7) which aims to create a better estimate of how D-scores vary by age in a restricted population of children living without major constraints on their development. This updated DAZ will be the focus of ongoing norms and standards work and provide a better justification of cut-off points.
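
The idea of converting a D-score into an age-standardized DAZ can be illustrated with a simple sketch. This is not how the online calculator or the dscore package computes the DAZ: the reference values below are made-up placeholders rather than GSED references, the reference distribution is assumed to be normal at each age, and linear interpolation between tabulated ages is a simplification.

```python
from bisect import bisect_right

# (age in months, reference mean D-score, reference SD): illustrative values only.
REFERENCE = [(0, 10.0, 3.0), (12, 45.0, 4.0), (24, 60.0, 4.5), (36, 70.0, 5.0)]


def reference_at(age_months):
    """Linearly interpolate the reference mean and SD at a given age."""
    ages = [row[0] for row in REFERENCE]
    if not ages[0] <= age_months <= ages[-1]:
        raise ValueError("age outside reference range")
    i = min(bisect_right(ages, age_months), len(ages) - 1)
    a0, m0, s0 = REFERENCE[i - 1]
    a1, m1, s1 = REFERENCE[i]
    w = (age_months - a0) / (a1 - a0)
    return m0 + w * (m1 - m0), s0 + w * (s1 - s0)


def daz(d_score, age_months):
    """Z-score of a D-score relative to same-age children in the reference."""
    mean, sd = reference_at(age_months)
    return (d_score - mean) / sd


print(daz(49.0, 12))   # one reference SD above the made-up mean at 12 months
print(daz(52.5, 18))   # exactly at the interpolated mean at 18 months
```

A child scoring at the reference mean for their age has a DAZ of 0 regardless of age, which is what removes the strong age trend from the D-score.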
[Cover images: Global Scales for Early Development v1.0 Short Form (caregiver-reported) Item guide; Long Form (directly administered) Item guide; Short Form (caregiver-reported) User manual; Long Form (directly administered) User manual]
7. Next steps
The GSED package aims to provide a feasible and reliable means of collecting population-level
data on early development that could be used to monitor progress and policy-level changes, and
evaluate programmes and interventions. Data from the GSED will be useful for policy-makers
and governments in deciding priorities for funding. Global organizations will be able to use the
data for cross-country comparisons and trend analyses. The GSED measures are expected to
provide countries with an indication of how the youngest children are developing and become a
motivation to invest in and promote healthy development.
This Technical Report provides an overview of the GSED creation and validation methodology with
results from three countries. The GSED measures have been shown to be valid and reliable for
measurement of child development up to 36 months at the population level. Additional work is ongoing
to expand evidence of global validity and reliability (in Round 2 countries and inclusion of GSED in
external studies). The analysis related to the work conducted for field-testing of the adaptive
testing approach as well as the psychometric property description of the GSED PF across different
cultures and contexts, and linkages of GSED SF with ECDI2030 will be finalized and disseminated as
soon as available. Similarly, the results of the testing of the GSED measures within the context of
programmatic evaluations and use of GSED HF within multi-topic household surveys are expected to be
made available soon.

Moreover, the GSED project has been expanded, under GSED 2.0, to answer further research questions.
Firstly, additional validation evidence will be generated for: i) predictive validity until 5 years
of age by following the cohorts from Round 1 countries; and ii) assessing the association of the GSED
measures with biomarkers (including brain imaging). Secondly, a global age-normed distribution of
GSED scores through 36 months will be created, based on a rigorous methodology, among children raised
with minimal constraints on their development. While the existing D-score package calculates DAZ
relative to three possible references (of the Round 1 countries validation data), they are considered
an interim reference. Round 2 countries validation data will replace these references using data from
all seven countries, but a final revision will be provided when data from the normative sample (under
GSED 2.0) are available. These norms will then be used as references to set standards for on- and
off-track development, including exploratory adjustment for moderate-to-late preterm babies. Thirdly,
both conceptual and field work will address the adaptation of GSED for individual-level
identification of children at risk for neurodevelopmental impairment.

Lastly, the D-score approach may be used to harmonize measurements across ages and instruments.
Scores from multiple instruments can be translated into D-scores and compared to scores from a
different instrument. Future work will evaluate the extension of the D-score to instruments for
children beyond 36 months of age. Extending the age limit of the D-score will provide improved
guidance to users on tracking children’s development over time.

As with other measures of child development, the GSED will continue to evolve as more knowledge is
acquired about the capabilities and learning processes that occur in the earliest years of life and
about environmental influences on children’s early development. It is also expected that technical
innovations (e.g. adaptive testing) will facilitate future measurements. Consistent with the SDGs,
equity is strived for by developing measurements that provide data to enable government leaders and
programme planners to implement strategies that enable all children to reach their developmental
potential.
References
1. Clark H, Coll-Seck AM, Banerjee A, Peterson S, Dalglish SL, Ameratunga S et al. A future for the world’s children? A WHO–UNICEF–Lancet
Commission. Lancet. 2020;395(10224):605-58.
2. Daelmans B, Darmstadt GL, Lombardi J, Black MM, Britto PR, Lye S et al. Early childhood development: the foundation of sustainable
development. Lancet. 2017;389(10064):9–11.
3. Lu C, Black MM, Richter LM. Risk of poor development in young children in low-income and middle-income countries: an estimation and
analysis at the global, regional, and country level. Lancet Global Health. 2016;4:e916-e22.
4. Black MM, Walker SP, Fernald LCH, Anderson CT, DiGirolamo A, Lu C et al. Early child development coming of age: science through the
life-course. Lancet. 2017;389(10064):77-90. doi:10.1016/S0140-6736(16)31389-7.
5. Rubio-Codina M, Grantham-McGregor S. Predictive validity in middle childhood of short tests of early childhood development used in
large scale studies compared to the Bayley-III, the Family Care Indicators, height-for-age, and stunting: a longitudinal study in Bogota,
Colombia. PLOS ONE. 2020;15(4): e0231317.
6. Hoddinott J, Behrman JR, Maluccio JA, Melgar P, Quisumbing AR, Ramirez-Zea M et al. Adult consequences of growth failure in early
childhood. Am J Clin Nutr. 2013;98(5):1170-8. doi:10.3945/ajcn.113.064584.
7. Fink G, Peet E, Danaei G, Andrews K, McCoy DC, Sudfeld CR et al. Schooling and wage income losses due to early-childhood growth
faltering in developing countries: national, regional, and global estimates. Am J Clin Nutr. 2016;104(1):104-12.
8. United Nations. Sustainable Development Goal 4 (Education) [Online]
(https://www.sdg4education2030.org/the-goal#:~:text=Sustainable%20Development%20Goal%204%20(SDG%204)%20is%20the%20education%20goal,lifelong%20learning%20opportunities%20for%20all.%E2%80%9D).
9. WHO Multicentre Growth Reference Study Group, de Onis M. WHO Child Growth Standards based on length/height, weight and age. Acta
Paediatrica. 2006;95(S450):76–85.
10. Ertem IO, Krishnamurthy V, Mulaudzi MC, Sguassero Y, Balta H, Gulumser O et al. Similarities and differences in child development from
birth to age 3 years by sex and across four countries: a cross-sectional, observational study. Lancet Glob Health. 2018;6(3):e279-e91.
11. Fernandes M, Villar J, Stein A, Staines Urias E, Garza C, Victora CG et al. INTERGROWTH-21st Project international INTER-NDA standards
for child development at 2 years of age: an international prospective population-based study. BMJ Open. 2020;10(6):e035258.
doi:10.1136/bmjopen-2019-035258.
12. Jacobusse G, van Buuren S, Verkerk PH. An interval scale for development of children aged 0–2 years. Stat Med. 2006;25(13):2272–83.
13. Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor SM et al. The D-score: a metric for interpreting the
early development of infants and toddlers across global settings. BMJ Glob Health. 2019;4(6):e001724.
14. Ayoub CC, Fischer KW. Developmental pathways and intersections among domains of development. In: McCartney K, Phillips D, editors.
Blackwell handbook of early childhood development. Massachusetts: Blackwell Publishing; 2006:62-81.
15. Fischer KW, Bidell TR. Dynamic development of action and thought. In: Lerner RM, Damon W, editors. Handbook of child psychology:
theoretical models of human development. New York: John Wiley & Sons Inc.; 2006:313-99.
16. McCray G, McCoy D, Kariger P, Janus M, Black MM, Chang-Lopez S et al. The creation of the Global Scales for Early Development (GSED)
for children aged 0-3 years: combining subject matter expert judgements with big data. BMJ Global Health. 2023;8:e009827.
17. McCoy DC, Waldman M, CREDI Field Team, Fink G. Measuring early childhood development at a global scale: evidence from the Caregiver-
Reported Early Development Instruments. Early Childhood Research Quarterly. 2018;45:58–68.
18. Gladstone M, Lancaster G, McCray G, Cavallera V, Alves CRL, Maliwichi L et al. Validation of the Infant and Young Child Development (IYCD)
indicators in three countries: Brazil, Malawi and Pakistan. Int. J. Environ. Res. Public Health. 2021;18(11):6117.
19. Aylward GP. Brain, environment, and development: a synthesis and a conceptual model. In: Aylward GP, editor. Bayley 4 clinical use and
interpretation. London: Academic Press; 2020:1-19.
20. Gesell AL, Halverson HM, Amatruda C. The first five years of life; a guide to the study of the pre-school child. New York: Harper & Brothers;
1940.
21. Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press; 1980.
22. van Buuren S, Eekhout I. Child development with the D-score: turning milestones into measurement [version 1]. Gates Open Res.
2021;5:81 (https://doi.org/10.12688/gatesopenres.13222.1).
23. McCoy DC, Waldman M, CREDI Field Team, Fink G. Measuring early childhood development at a global scale: evidence from the
Caregiver-Reported Early Development Instruments. Early Child. Res. Quart. 2018;45:58–68.
24. Chang W, Cheng J, Allaire JJ, Sievert C, Schloerke B, Xie Y et al. shiny: web application framework for R. R package version 1.7.1. 2021
(https://CRAN.R-project.org/package=shiny).
25. Cavallera V, Lancaster G, Gladstone M, Black MM, McCray G, Nizar A et al. Protocol for validation of the Global Scales for Early
Development (GSED) for children under 3 years of age in seven countries. BMJ Open. 2023;13:e062562.
26. Salmona M, Lieber E, Kaczynski D. Qualitative and mixed methods data analysis using Dedoose: A practical approach for research
across the social sciences. Thousand Oaks: Sage Publications; 2019.
27. Huang CY, Tung LC, Chou YT, Wu HM, Chen KL, Hsieh CL. Development of a computerized adaptive testing of children’s gross motor
skills. Archives of Physical Medicine & Rehabilitation. 2018;99:512-20.
28. Jacobusse G, van Buuren S. Computerized adaptive testing for measuring development of young children. Statistics in Medicine.
2007;26(13):2629–38 (https://stefvanbuuren.name/publication/2007-01-01_jacobusse2007/).
29. Walker SP, Wachs TD, Gardner JM, Lozoff B, Wasserman GA, Pollitt E et al. Child development: risk factors for adverse outcomes in
developing countries. Lancet. 2007;369(9556):145-57.
30. De Ayala RJ. The theory and practice of item response theory. New York: Guilford Publications; 2013.
31. Thissen D. Reliability and measurement precision. In: Wainer H, editor. Computerized adaptive testing: a primer (2nd ed.). New Jersey:
Lawrence Erlbaum Associates Publishers; 2000:159-84.
32. Landis R, Koch G. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
33. Fernald Lia CH, Prado E, Kariger P, Raikes A. A toolkit for measuring early childhood development in low and middle-income countries.
Washington, DC: The World Bank; 2017.
34. Lennon EM, Gardner JM, Karmel BZ, Flory MJ. Bayley Scales of Infant Development. In: Haith MM, Benson JB. Encyclopedia of infant
and early childhood development. Massachusetts: Academic Press; 2008:145-56.
35. Armstrong KH, Agazzi HC. The Bayley-III cognitive scale. In: Weiss LG, Oakland T, Aylward GP, editors. Practical resources for the mental
health professional, Bayley-III clinical use and interpretation. Massachusetts: Academic Press; 2010:29-45.
36. Stasinopoulos DM, Rigby RA. Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software.
2008;23:1-46.
37. Sudfeld CR, McCoy DC, Fink G, Muhihi A, Bellinger DC, Masanji H et al. Malnutrition and its determinants are associated with suboptimal
cognitive, communication, and motor development in Tanzanian children. J Nutr. 2015;145(12):2705-14.
Bibliography
Ages and Stages Questionnaire, third edition (ASQ-3)
Squires J, Bricker D. Ages and Stages Questionnaire (ASQ): a parent-completed child monitoring
system, third edition. Baltimore (MD): Brooks Publishing Company; 2009.

Bayley Scales of Infant Development (Bayley)
Bayley N. Bayley Scales of Infant Development. San Antonio (TX): The Psychological Corporation; 1969.

Bayley Scales of Infant Development, second edition (Bayley-II)
Bayley N. Bayley Scales of Infant Development, second edition. San Antonio (TX): The Psychological
Corporation; 1993.

Caregiver-Reported Early Development Instruments (CREDI)
McCoy DC, Waldman M, CREDI Field Team, Fink G. Measuring early childhood development at a global
scale: evidence from the Caregiver-Reported Early Development Instruments. Early Child Res Q.
2018;45(4):58–68. doi.org/10.1016/j.ecresq.2018.05.002.

Denver Developmental Screening Test (DDST)
Frankenburg WK. The Denver Developmental Screening Test. J Pediatr. 1967;71(2):181–91.
doi:10.1016/S0022-3476(67)80070-2.

Denver Developmental Screening Test, second edition (DDST II)
Frankenburg WK, Dodds J, Archer P, Shapiro H, Bresnick B. The Denver II: a major revision and
restandardization of the Denver Developmental Screening Test. Pediatrics. 1992;89(1):91–7.
PMID: 1370185.

Developmental Milestones Checklist (DMC)
Abubakar A, Holding P, van de Vijver FJ, Bomu G, Van Baar A. Developmental monitoring using caregiver
reports in a resource-limited setting: the case of Kilifi, Kenya. Acta Paediatr. 2010;99(2):291–7.
doi.org/10.1111/j.1651-2227.2009.01561.x.

Developmental Milestones Checklist II (DMC II)
Prado EL, Abubakar AA, Abbeddou S, Jimenez EY, Somé JW, Ouédraogo JB. Extending the Developmental
Milestones Checklist for use in a different context in sub-Saharan Africa. Acta Paediatr.
2014;103(4):447–54. doi: 10.1111/apa.12540.

Dutch Development Instrument (DDI)
Laurent de Angulo MS, Brouwers-de JEA, Bijlsma-Schlösser JFM, Bulk-Bunschoten AMW, Pauwels JH,
Steinbuch-Linstra I. Ontwikkelingsonderzoek in de Jeugdgezondheidszorg. Het Van Wiechenonderzoek.
De Baecke-Fassaert Motoriektest. Assen: Van Gorcum; 2008.

Griffiths Mental Development Scales (GMDS)
Huntley M. Griffiths Mental Development Scales from birth to 2 years – manual. Oxford: Association
for Research in Infant & Child Development; 1996. doi.org/10.1037/t03301-000.

Griffiths Mental Development Scales – South African Version (GMDS-SA)
Luiz DM, Kotras N, Barnard A, Knoesen N. Technical manual of the Griffiths Mental Development
Scales – Extended Revised (GMDS-ER). Amersham: Association for Research in Infant & Child
Development; 2004.

Kilifi Developmental Inventory (KDI)
Abubakar A, Holding P, van Baar A, Newton CR, van de Vijver FJR. Monitoring psychomotor development
in a resource-limited setting: an evaluation of the Kilifi Developmental Inventory. Ann Trop
Paediatr. 2008;28(3):217–26. doi.org/10.1179%2F146532808X335679.

Malawi Developmental Assessment Tool (MDAT)
Gladstone M, Lancaster GA, Umar E, Nyirenda M, Kayira E, Van Den Broek NR et al. The Malawi
Developmental Assessment Tool (MDAT): the creation, validation, and reliability of a tool to assess
child development in rural African settings. PLoS Med. 2010;7:e1000273.
doi:10.1371/journal.pmed.1000273.

Preschool Pediatric Symptoms Checklist (PPSC)
Sheldrick RC, Henson BS, Merchant S, Neger EN, Murphy JM, Perrin EC. The Preschool Pediatric Symptom
Checklist (PPSC): development and initial validation of a new social/emotional screening instrument.
Acad Pediatr. 2012;12(5):456–67. doi: 10.1016/j.acap.2012.06.008.

Saving Brains Early Childhood Development Scale (SBECD)
McCoy DC, Sudfeld CR, Bellinger DC, Muhihi A, Ashery G, Weary TE et al. Development and validation of
an early childhood development scale for use in low-resourced settings. Popul Health Metr.
2017;15(1):3. doi: 10.1186/s12963-017-0122-8.

Stanford-Binet Intelligence Scales, fifth edition (SBIS-5)
Roid GH. Stanford-Binet Intelligence Scales, fifth edition. Itasca (IL): Riverside Publishing; 2003.

Test de Desarrollo Psicomotor [Psychomotor Development Test] (TEPSI)
Haeussler IM, Marchant T. Elaboración y estandarización del Test de Desarrollo Psicomotor 2-5 años
TEPSI [Development and standardization of the Psychomotor Development Test 2-5 years]. Rev Educ. 1989.

Vineland Adaptive Behavior Scales (Vineland)
Sparrow SS, Cicchetti DV. The Vineland Adaptive Behavior Scales. In: Newmark CS, editor. Major
psychological assessment instruments, Vol. 2. Boston (MA): Allyn & Bacon; 1989:199–231.
1 Cohort name is an internal coding representing original group, country and number.
• Caregiver education
• Maternal health/chronic illness
• COVID-19 exposure

Anthropometry | Weight at time of assessment; infant length/child height at time of assessment; child’s mid-upper arm circumference at time of assessment; child’s head circumference at time of assessment | Anthropometry form | Child assessment | 15
Family/home environment | Home environment (HOME only); play/stimulation/interactions between the child and other family members in the home (HOME and FCI) | HOME or FCI | HOME: caregiver report & observation; FCI: caregiver report | HOME: 45; FCI: 15
Family/home environment | Child neglect/abuse; exposure to violence or conflict | CPAS† | Caregiver report | 15
Family/home environment | Family resilience | BRS† | Caregiver report | 1
Family/home environment | Family social support | FSS† | Caregiver report | 5
Caregiver health and well-being | Caregiver depressive symptoms | PHQ9 | Caregiver report | 5
* SES information on this form comes from the standard DHS multiple assets index; however, some sites have adapted the items to better fit their contexts.
† These measures have been slightly adapted for the purpose of the study.
‡ In a subsample (N=150).
§ In a subsample (all children of 24 - 41 months within the predictive validity subsamples in three countries).
D-score scale
GSED measure | Bayley-III domain | Bangladesh | Pakistan | United Republic of Tanzania | Combined
SF | Receptive communication | 0.90 (0.87-0.93) | 0.87 (0.82-0.90) | 0.90 (0.86-0.92) | 0.89 (0.87-0.91)
LF | Receptive communication | 0.91 (0.88-0.93) | 0.88 (0.84-0.91) | 0.92 (0.89-0.94) | 0.90 (0.88-0.92)
CB | Expressive communication | 0.94 (0.92-0.96) | 0.87 (0.82-0.90) | 0.89 (0.85-0.92) | 0.90 (0.88-0.91)
SF | Expressive communication | 0.93 (0.91-0.95) | 0.86 (0.81-0.89) | 0.87 (0.82-0.90) | 0.88 (0.86-0.90)
LF | Expressive communication | 0.94 (0.92-0.96) | 0.87 (0.83-0.90) | 0.90 (0.87-0.93) | 0.90 (0.88-0.92)
CB | Fine motor | 0.97 (0.96-0.98) | 0.96 (0.94-0.97) | 0.96 (0.95-0.97) | 0.96 (0.96-0.97)
SF | Fine motor | 0.96 (0.95-0.97) | 0.95 (0.94-0.97) | 0.95 (0.94-0.97) | 0.96 (0.95-0.96)
LF | Fine motor | 0.97 (0.96-0.98) | 0.96 (0.94-0.97) | 0.96 (0.94-0.97) | 0.96 (0.95-0.97)
CB | Gross motor | 0.98 (0.97-0.98) | 0.97 (0.95-0.98) | 0.98 (0.97-0.98) | 0.97 (0.97-0.98)
SF | Gross motor | 0.97 (0.95-0.97) | 0.96 (0.95-0.97) | 0.97 (0.96-0.98) | 0.97 (0.96-0.97)
LF | Gross motor | 0.97 (0.97-0.98) | 0.96 (0.95-0.97) | 0.97 (0.96-0.98) | 0.97 (0.96-0.97)
CB | Overall Bayley-III score | 0.99 (0.98-0.99) | 0.96 (0.95-0.97) | 0.98 (0.97-0.99) | 0.98 (0.97-0.98)
SF | Overall Bayley-III score | 0.97 (0.95-0.97) | 0.96 (0.95-0.97) | 0.97 (0.96-0.98) | 0.97 (0.96-0.97)
LF | Overall Bayley-III score | 0.99 (0.98-0.99) | 0.96 (0.95-0.97) | 0.98 (0.97-0.98) | 0.98 (0.97-0.98)

DAZ scale
GSED measure | Bayley-III domain | Bangladesh | Pakistan | United Republic of Tanzania | Combined
CB | Overall Bayley-III score | 0.55 (0.44-0.65) | 0.26 (0.11-0.41) | 0.56 (0.44-0.66) | 0.53 (0.47-0.60)
SF | Overall Bayley-III score | 0.37 (0.23-0.50) | 0.18 (0.03-0.33) | 0.40 (0.26-0.52) | 0.35 (0.27-0.43)
LF | Overall Bayley-III score | 0.59 (0.48-0.68) | 0.31 (0.16-0.44) | 0.60 (0.49-0.69) | 0.58 (0.52-0.64)
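
For readers who want to produce interval estimates of this kind from their own data, one common approach for a Pearson correlation is the Fisher z-transformation. The sketch below is an assumption about method and does not describe how the intervals in this report were computed; the function name `pearson_ci` and the example inputs are illustrative only.

```python
import math


def pearson_ci(r, n, z_crit=1.959963984540054):
    """Approximate 95% CI for a Pearson correlation via Fisher's z.

    r: sample correlation, n: number of paired observations.
    z_crit defaults to the two-sided 95% normal quantile.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)              # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Example with made-up inputs: r = 0.90 from 200 pairs.
lower, upper = pearson_ci(0.90, 200)
```

Because the interval is built on the transformed scale, it is asymmetric around r when r is far from zero, which matches the shape of intervals reported for correlations close to 1.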
D-score scale
GSED measure | Bayley-III domain | Bangladesh | Pakistan | United Republic of Tanzania | Combined
CB | Gestational age | 0.11 (0.05 - 0.16) | 0.16 (0.11 - 0.21) | 0.21 (0.15 - 0.26) | 0.17 (0.14 - 0.20)
SF | Gestational age | 0.06 (0.00 - 0.11) | 0.13 (0.08 - 0.17) | 0.14 (0.09 - 0.19) | 0.12 (0.09 - 0.15)
LF | Gestational age | 0.13 (0.07 - 0.18) | 0.12 (0.07 - 0.17) | 0.18 (0.13 - 0.23) | 0.16 (0.13 - 0.19)
CB | PHQ9 category | -0.05 (-0.10 - 0.01) | -0.04 (-0.08 - 0.02) | 0.02 (-0.03 - 0.07) | 0.05 (0.02 - 0.08)
SF | PHQ9 category | -0.05 (-0.10 - 0.01) | -0.01 (-0.06 - 0.04) | 0.01 (-0.05 - 0.06) | 0.01 (-0.02 - 0.04)
LF | PHQ9 category | -0.02 (-0.08 - 0.03) | -0.08 (-0.13 - -0.03) | 0.02 (-0.03 - 0.08) | 0.07 (0.04 - 0.10)
CB | CPAS** | -0.05 (-0.10 - 0.01) | -0.05 (-0.10 - 0.01) | -0.01 (-0.06 - 0.04) | -0.07 (-0.10 - -0.04)
SF | CPAS** | -0.03 (-0.08 - 0.02) | -0.05 (-0.10 - 0.00) | -0.01 (-0.06 - 0.04) | -0.05 (-0.08 - -0.02)
LF | CPAS** | -0.05 (-0.11 - 0.00) | -0.03 (-0.07 - 0.02) | 0.00 (-0.05 - 0.06) | -0.06 (-0.09 - -0.03)
* Spearman’s correlation: maternal education (no schooling, primary, secondary, higher), PHQ9 (none, mild, moderate, moderate-severe, severe depression).
** Scale created via a unidimensional 2-parameter IRT model.
*** For these variables a cross-national scale was not considered appropriate.
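
The footnote on the CPAS scale refers to a unidimensional two-parameter (2PL) item response model. As a reminder of what that model assumes, the sketch below evaluates the standard 2PL item response function. The function name and the parameter values are illustrative and unrelated to the actual CPAS item estimates, which are not given here.

```python
import math


def irt_2pl_probability(theta, discrimination, difficulty):
    """P(endorsing an item | theta) under the standard 2PL logistic model.

    theta: latent trait level of the respondent
    discrimination: item slope (how sharply the item separates respondents)
    difficulty: item location (trait level with a 50% endorsement probability)
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Illustrative values only: an item located at 0.0 with slope 1.5.
p_at_location = irt_2pl_probability(0.0, 1.5, 0.0)
p_above_location = irt_2pl_probability(1.0, 1.5, 0.0)
```

When the latent trait equals the item difficulty the probability is one half, and a larger discrimination makes the probability rise more steeply around that point.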
D-score scale
 | Bangladesh | Pakistan | United Republic of Tanzania | Combined
GSED CB DAZ at baseline vs GSED DAZ at 6 months | 0.55 (0.48-0.61) | 0.57 (0.51-0.63) | 0.57 (0.50-0.63) | 0.59 (0.56-0.63)
GSED SF DAZ at baseline vs GSED DAZ at 6 months | 0.53 (0.46-0.59) | 0.56 (0.50-0.62) | 0.58 (0.52-0.64) | 0.57 (0.53-0.60)
GSED LF DAZ at baseline vs GSED DAZ at 6 months | 0.38 (0.30-0.46) | 0.43 (0.35-0.50) | 0.38 (0.30-0.46) | 0.48 (0.43-0.52)
FOR MORE INFORMATION PLEASE CONTACT:
Email: [email protected]
Website: https://www.who.int/teams/mental-health-and-substance-use/data-research/global-scale-for-early-development