Week_09_json

The document provides an overview of structured data with a focus on JSON, including its syntax, data types, and comparison with YAML and XML. It also covers key concepts such as serialization, schema validation, and the limitations of JSON. Additionally, it includes mid-term statistics and a brief introduction to a lab assignment involving CSV to JSON conversion.

Uploaded by Yash Soni
Copyright: © All Rights Reserved

Structured Data: JSON deep dive
COMP 1238 - week 9

COMP1238 - Intro to Data Management
Week 9 - Tue, Oct 29
Structured data: JSON deep dive
Start at: 16:05
AtKlass code:
COMP 1238 – Introduction to Data Management

Starting Zoom recording


The first part of the course was about our main tools of the trade:

● Keyboards
● Text editors
● Git & GitHub
● Command line & Terminal window
Second half of the course

An introduction to databases and SQL.
SQL is a language used to interact with databases.
Agenda for today
● Mid-term stats
● What is Structured Data
● Spreadsheets and JSON as examples
● JSON Data Types
○ strings, numbers, booleans, arrays and objects (and null)
● YAML as an extension of JSON
● XML & Binary representations - brief mention

Objective: Introduce JSON and the basic principles behind it and behind other text-based formats.
Midterm stats
Class Average - 88%
Median - 92%
Bottom 3 questions - git commit
What is the function of the 'commit' command in Git?
A. To delete files from the repository
B. To save changes to your local repository (commit)
C. To upload changes to the remote repository (push / sync)
D. To compare differences between files
Bottom 3 questions - character escaping \#
In Markdown we sometimes want to "escape" some characters so that they are displayed literally, as-is, rather than interpreted as special. How would one escape the # character in Markdown so it's not interpreted as the beginning of a title?
A. ##
B. \#
C. <#>
D. escape("#")
Bottom 3 questions - word processor
Which of the following tools is better described as a "word
processor" rather than a text editor?
A. Vim
B. Microsoft Word
C. VS Code
D. Notepad
Structured data
Structured data … is the boring type of data, like your spreadsheets
(Example card: Ann, 5 years old)
Some terminology
Each card in the catalogue is called a Record or an Entry

A form or a card has individual Fields such as “Name” or “Age”

In the computer world, the structure, like the one defined by a form, is called a data model or schema
Spreadsheet terminology
We often use these terms interchangeably:
Field / Column / Property

Row / Entry / Record


JSON - why talk about it
● Probably the most popular format for exchanging data on the web
● It’s very simple - a great example to start from
JSON stands for JavaScript Object Notation and looks like this:
{
  "title": "Romanian Furrow",
  "published": 1933,
  "author": {
    "name": "Hall, Donald J.",
    "born": 1903
  }
}
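To see what a program does with this text, here is a minimal sketch in Python (our choice for illustration; the slides don't prescribe a language) using the standard-library `json` module. `json.loads` turns JSON text into nested Python dictionaries, and nested objects are reached one key at a time.

```python
import json

# The book record from the slide, as a JSON string
text = """
{
  "title": "Romanian Furrow",
  "published": 1933,
  "author": {
    "name": "Hall, Donald J.",
    "born": 1903
  }
}
"""

book = json.loads(text)  # parse JSON text into Python objects

print(book["title"])           # Romanian Furrow
print(book["author"]["name"])  # Hall, Donald J.

# Numbers stay numbers, so we can compute with them
print(book["published"] - book["author"]["born"])  # 30
```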
A JSON object contains key-value pairs:

{"name": "Hall, Donald J.", "born": 1903}

Keys: "name" and "born". Values: "Hall, Donald J." and 1903.
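A short Python sketch (Python chosen only for illustration) of how the key-value pairs of this object come back out after parsing with the standard `json` module: each pair becomes one item of a Python dict, in the original order.

```python
import json

author = json.loads('{"name": "Hall, Donald J.", "born": 1903}')

# Walk over every key-value pair of the object
for key, value in author.items():
    print(key, "->", value)
# name -> Hall, Donald J.
# born -> 1903

keys = list(author.keys())      # ['name', 'born']
values = list(author.values())  # ['Hall, Donald J.', 1903]
```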
Value types - scalars & objects

A scalar data type, or just scalar, is any non-composite value. Generally, all basic primitive data types like numbers and booleans are considered scalar.

{
  "stringExample": "Hello World",
  "numberExample": 3.14,
  "booleanExample": true,
  "nullExample": null
}
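As an illustration (a Python sketch; other languages have their own mappings), this shows which Python type each JSON scalar becomes when parsed with the standard `json` module. Note that a JSON number with a decimal point becomes a `float`, while a whole number like 1933 becomes an `int`.

```python
import json

doc = json.loads("""
{
  "stringExample": "Hello World",
  "numberExample": 3.14,
  "booleanExample": true,
  "nullExample": null
}
""")

# Print each value together with the Python type it was parsed into
for key, value in doc.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
# stringExample: 'Hello World' (str)
# numberExample: 3.14 (float)
# booleanExample: True (bool)
# nullExample: None (NoneType)

print(type(json.loads("1933")).__name__)  # int
```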
JSON array

{
  "arrayExample1": [1, 2, 3],
  "arrayExample2": ["a", "b", 3, false],
  "arrayExample3": [{ "x": 3, "y": 4}, { "x": 7, "y": 9}]
}
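A minimal Python sketch (illustration only) of working with these arrays after parsing: JSON arrays become ordered Python lists, one array may mix value types, and an array of objects lets us pull the same field out of every element.

```python
import json

doc = json.loads("""
{
  "arrayExample1": [1, 2, 3],
  "arrayExample2": ["a", "b", 3, false],
  "arrayExample3": [{"x": 3, "y": 4}, {"x": 7, "y": 9}]
}
""")

# Arrays become Python lists: ordered and indexed from 0
numbers = doc["arrayExample1"]
print(numbers[0], sum(numbers))  # 1 6

# One array may mix value types
mixed = doc["arrayExample2"]
print([type(v).__name__ for v in mixed])  # ['str', 'str', 'int', 'bool']

# An array of objects: take one field from every element
points = doc["arrayExample3"]
xs = [p["x"] for p in points]
print(xs)  # [3, 7]
```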
Array of Objects
[
  {
    "title": "Romanian Furrow",
    "published": 1933
  },
  {
    "title": "The Great Gatsby",
    "published": 1925
  },
  {
    "title": "The Grapes of Wrath",
    "published": 1939
  }
]
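An array of objects behaves a lot like a spreadsheet: each object is a row (record) and each key is a column (field). This Python sketch (illustration only) sorts and filters the records the way you might sort or filter rows.

```python
import json

books = json.loads("""
[
  {"title": "Romanian Furrow", "published": 1933},
  {"title": "The Great Gatsby", "published": 1925},
  {"title": "The Grapes of Wrath", "published": 1939}
]
""")

# Sort the records by one field, like sorting a spreadsheet column
by_year = sorted(books, key=lambda b: b["published"])
for book in by_year:
    print(book["published"], book["title"])
# 1925 The Great Gatsby
# 1933 Romanian Furrow
# 1939 The Grapes of Wrath

# Keep only the records that match a condition, like filtering rows
from_1930s = [b["title"] for b in books if 1930 <= b["published"] < 1940]
print(from_1930s)  # ['Romanian Furrow', 'The Grapes of Wrath']
```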
Key-value data - other names
Data structures that, like a JSON object, hold key-value pairs go by many names:
● Associative array
● Dictionary
● Map
● Hash Map / Hash Table
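In Python this structure is called a dictionary (`dict`). A small sketch with made-up sample data shows the basic operations and one catch when converting to JSON: JSON object keys are always strings, so Python's `json.dumps` turns a non-string key such as `1` into `"1"`.

```python
import json

# Made-up sample data: name -> age
ages = {"Ann": 5, "Bob": 7}
ages["Cleo"] = 6        # add a key-value pair
print(ages["Bob"])      # 7 (look up a value by key)
print("Dan" in ages)    # False (check whether a key exists)

# Serializing a dict produces a JSON object
print(json.dumps(ages))  # {"Ann": 5, "Bob": 7, "Cleo": 6}

# JSON keys must be strings, so an integer key comes back as a string
roundtrip = json.loads(json.dumps({1: "one"}))
print(roundtrip)  # {'1': 'one'}
```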
JSON limitations
● Very strict - must use quotation marks and commas
● Writing JSON by hand is notoriously annoying with all the braces, quotation marks and commas
● No support for comments

VSCode can format JSON files and find problems


Or use a JSON validator like jsonlint.com
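A strict parser can also act as a quick validator. This Python sketch (illustration only; `check` is a hypothetical helper name) feeds a few snippets to the standard `json` module: the valid one parses, while a trailing comma, single quotes, a comment, or an unquoted key each raise `json.JSONDecodeError` with a line and column. The exact error wording differs between Python versions, so no sample messages are shown.

```python
import json

samples = {
    "valid": '{"title": "Romanian Furrow", "published": 1933}',
    "trailing comma": '{"title": "Romanian Furrow", "published": 1933,}',
    "single quotes": "{'title': 'Romanian Furrow'}",
    "comment": '{"title": "Romanian Furrow" // a note\n}',
    "unquoted key": '{title: "Romanian Furrow"}',
}

def check(text):
    """Hypothetical helper: None if text is valid JSON, else a short error description."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        # msg, lineno and colno describe the first problem the parser hit
        return f"{err.msg} (line {err.lineno}, column {err.colno})"

results = {name: check(text) for name, text in samples.items()}
for name, error in results.items():
    print(f"{name}: {error or 'OK'}")
```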
JSON vs. YAML
JSON (json.org):
{
  "title": "Romanian Furrow",
  "published": 1933,
  // Comment - non-standard JSON syntax
  "author": {
    "name": "Hall, Donald J.",
    "born": 1903
  }
}

YAML (yaml.org):
title: Romanian Furrow
published: 1933
# Comment - part of YAML
author:
  name: Hall, Donald J.
  born: 1903
YAML
● YAML is like JSON with quotation marks and
brackets omitted and replaced with indentation
● YAML is almost a superset of JSON - in most (but
not all) cases code that accepts YAML will also
accept JSON
● YAML is way more convenient to write by hand
● Often used for config files, like _config.yml for GitHub Pages
● Rarely used for storing or sending large amounts of
data
https://yaml.org
https://en.wikipedia.org/wiki/YAML
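To make "quotation marks and brackets replaced with indentation" concrete, here is a simplified Python sketch of a JSON-to-YAML printer. `to_yaml` and `scalar` are hypothetical helpers written for this illustration, not a real library API: they handle nested objects, arrays and scalars, but do not quote strings that YAML would misread (such as "yes", "1.0", or text containing ": " or "#") and do not handle empty objects or arrays. In practice you would use a YAML library (for example PyYAML) or a converter tool.

```python
import json

def scalar(v):
    """Render a JSON scalar the way YAML writes it (simplified, no quoting)."""
    if v is None:
        return "null"
    if isinstance(v, bool):  # check bool before str(): True must become "true"
        return "true" if v else "false"
    return str(v)

def to_yaml(value, indent=0):
    """Hypothetical simplified printer: nesting is shown by indentation only."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}{key}: {scalar(item)}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}-")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}- {scalar(item)}")
    else:
        lines.append(pad + scalar(value))
    return "\n".join(lines)

book = json.loads('{"title": "Romanian Furrow", "published": 1933, '
                  '"author": {"name": "Hall, Donald J.", "born": 1903}}')
print(to_yaml(book))
# title: Romanian Furrow
# published: 1933
# author:
#   name: Hall, Donald J.
#   born: 1903
```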
JSON vs. XML
JSON:
{
  "title": "Romanian Furrow",
  "published": 1933,
  "author": {
    "name": "Hall, Donald J.",
    "born": 1903
  }
}

XML:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <title>Romanian Furrow</title>
  <published>1933</published>
  <author>
    <name>Hall, Donald J.</name>
    <born>1903</born>
  </author>
</root>
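A Python sketch (illustration only) that reads the XML version with the standard-library `xml.etree.ElementTree`. One visible difference from JSON: XML has no number type, so 1933 comes back as the string '1933'. `element_to_dict` is a hypothetical, simplified helper; it ignores attributes and would overwrite repeated tags.

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <title>Romanian Furrow</title>
  <published>1933</published>
  <author>
    <name>Hall, Donald J.</name>
    <born>1903</born>
  </author>
</root>"""

root = ET.fromstring(xml_text.encode("utf-8"))

# Every element's text is a string; there is no separate number type
published = root.find("published").text
print(repr(published))  # '1933'

def element_to_dict(elem):
    """Hypothetical helper: leaf elements become their text, others become dicts."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_dict(child) for child in children}

print(element_to_dict(root))
# {'title': 'Romanian Furrow', 'published': '1933', 'author': {'name': 'Hall, Donald J.', 'born': '1903'}}
```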
Serialization
To “serialize” data is to convert it from its representation in RAM into a file format like JSON, YAML or XML.

It becomes a series of bytes.

The reverse process of going from JSON back to RAM is called “parsing” or “deserializing”.
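The full round trip, sketched in Python (illustration only; the file name `book.json` is just an example and is written to a temporary folder): a dict in memory is serialized to JSON text, encoded to bytes, saved to a file, and then read back and parsed into an equal dict.

```python
import json
import os
import tempfile

# In RAM: a Python dict
book = {"title": "Romanian Furrow", "published": 1933}

# Serialize: in-memory object -> JSON text -> series of bytes
data = json.dumps(book).encode("utf-8")
print(data)  # b'{"title": "Romanian Furrow", "published": 1933}'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "book.json")
    with open(path, "wb") as f:
        f.write(data)                # bytes go to disk
    with open(path, "rb") as f:
        loaded = json.loads(f.read())  # parse (deserialize) back into RAM

print(loaded == book)  # True
```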
Schema validation
A JSON document can contain any information.

Sometimes we want it to conform to a specific data model or schema.

A schema validator is like the government official who looks at a form and tells us that the zip code is missing and the date should be in a different format.

The schema is described in a separate file.
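The idea behind a schema validator can be sketched in a few lines of Python. The schema format here (field name mapped to expected type and whether it is required) and the `validate` function are hypothetical, hand-rolled for illustration; real projects describe the schema in a separate file in a standard schema language and use a ready-made validator. The sample form mirrors the "government official" example: a missing zip code and a date in the wrong format.

```python
import json
from datetime import date

# Hypothetical, hand-rolled schema: field name -> (expected Python type, required?)
SCHEMA = {
    "name": (str, True),
    "zip_code": (str, True),
    "birth_date": (str, True),   # expected as YYYY-MM-DD
    "email": (str, False),
}

def validate(record, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, (expected, required) in schema.items():
        if field not in record:
            if required:
                problems.append(f"{field} is missing")
            continue
        if not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    # One format rule: the date must be in ISO format
    bd = record.get("birth_date")
    if isinstance(bd, str):
        try:
            date.fromisoformat(bd)
        except ValueError:
            problems.append("birth_date should look like YYYY-MM-DD")
    return problems

form = json.loads('{"name": "Ann", "birth_date": "29/10/2019"}')
print(validate(form, SCHEMA))
# ['zip_code is missing', 'birth_date should look like YYYY-MM-DD']
```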


Binary serialization formats
JSON is wasteful (every record repeats the keys "title" and "published", and numbers are stored as text) - can we do better?
[
  {
    "title": "Romanian Furrow",
    "published": 1933
  },
  {
    "title": "The Great Gatsby",
    "published": 1925
  },
  {
    "title": "The Grapes of Wrath",
    "published": 1939
  }
]
Binary used to be the only option
XML and JSON came along when disks and network bandwidth became much cheaper

All images and videos are stored in binary formats

gRPC & ProtoBuf (Protocol Buffers) - a popular modern binary alternative to JSON
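To get a feel for the savings, here is a Python sketch of a tiny hand-made binary format for the book list, using the standard `struct` module. The format is hypothetical and invented only for this comparison (it is not ProtoBuf): per record, 1 byte for the title length, the UTF-8 title bytes, then the year as a 2-byte unsigned integer. Field names are not stored at all; both sides must agree on the layout in advance, which is the trade-off binary formats make.

```python
import json
import struct

books = [
    {"title": "Romanian Furrow", "published": 1933},
    {"title": "The Great Gatsby", "published": 1925},
    {"title": "The Grapes of Wrath", "published": 1939},
]

def to_binary(records):
    """Hypothetical layout: [title length: 1 byte][title bytes][year: 2 bytes, big-endian]."""
    out = bytearray()
    for r in records:
        title = r["title"].encode("utf-8")
        out += struct.pack(">B", len(title))  # fails for titles over 255 bytes
        out += title
        out += struct.pack(">H", r["published"])  # fails for years over 65535
    return bytes(out)

def from_binary(data):
    """Read records back using the same agreed-upon layout."""
    records, pos = [], 0
    while pos < len(data):
        (n,) = struct.unpack_from(">B", data, pos)
        pos += 1
        title = data[pos:pos + n].decode("utf-8")
        pos += n
        (year,) = struct.unpack_from(">H", data, pos)
        pos += 2
        records.append({"title": title, "published": year})
    return records

json_bytes = json.dumps(books).encode("utf-8")
bin_bytes = to_binary(books)
print(len(json_bytes), len(bin_bytes))  # 152 59
print(from_binary(bin_bytes) == books)  # True
```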


Lab intro
● Local VSCode encouraged but not mandatory, github.dev ok
● Lab: https://github.com/kamrik/IntroText/blob/main/labs/lab9-json.md
● Convert a CSV to JSON & YAML
● Using this tool https://www.bairesdev.com/tools/json2yaml/
● You can use github.dev editor, but local VSCode install is
encouraged
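The lab uses an online converter, but the same CSV-to-JSON step can be sketched with Python's standard `csv` and `json` modules (illustration only, not the lab's required method). The sample CSV is the book data from these slides; for the lab you would open the provided file instead. Note that CSV has no types, so every value arrives as a string and numbers must be converted explicitly.

```python
import csv
import io
import json

# The book data from the slides, written as CSV
csv_text = """title,published
Romanian Furrow,1933
The Great Gatsby,1925
The Grapes of Wrath,1939
"""

# DictReader uses the header row as keys: each row becomes one JSON object
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every CSV value is a string; convert the year to a number
for row in rows:
    row["published"] = int(row["published"])

print(json.dumps(rows, indent=2))
```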
Links and references
● Structured vs Unstructured Data (1.5 minute video)
● json.org