Exercise 2 en

Uploaded by

TOÀN VÕ VĂN

LANGUAGE MODELLING

Lecturer: Doctor Bui Thanh Hung


Data Science Laboratory
Faculty of Information Technology
Industrial University of Ho Chi Minh city
Email: [email protected]
Website: https://sites.google.com/site/hungthanhbui1980/

Exercise 1:
Write a function to reverse a string.
Reverse the string “stressed” (from end to beginning).
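One possible approach, sketched in Python, uses slicing with a negative step. The function name reverse_string is only an illustrative choice, not part of the exercise.

```python
def reverse_string(text: str) -> str:
    """Return text with its characters in reverse order."""
    # A slice with step -1 walks the string from the last character to the first.
    return text[::-1]


print(reverse_string("stressed"))  # desserts
```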

Exercise 2:
Write a function to extract characters from a string.
From the string “MpyaktQrBoilk RCSahr”, extract the characters at positions
2,4,6,8,10,12,14,16,18,20 and combine them in that order to form a new string
(space characters are also counted, characters are numbered from 1).
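A minimal sketch of this extraction, assuming the positions are passed in as any iterable of 1-based indices. The name extract_positions is hypothetical.

```python
def extract_positions(text, positions):
    """Join the characters of text found at the given 1-based positions."""
    # Positions are numbered from 1, so subtract 1 to get Python's 0-based index.
    return "".join(text[p - 1] for p in positions)


source = "MpyaktQrBoilk RCSahr"
# range(2, 21, 2) yields 2, 4, ..., 20; the space at position 14 is kept.
print(extract_positions(source, range(2, 21, 2)))  # patrol Car
```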

Exercise 3:
Write a function to combine the two strings “Patrol” and “Car” to form the new string
“PatrolCar”.
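Since the expected result is the two inputs placed end to end, a plain concatenation is one way to read this exercise. The sketch below takes the first input as "Patrol"; the name combine is hypothetical.

```python
def combine(first: str, second: str) -> str:
    """Return the two strings joined end to end."""
    return first + second


print(combine("Patrol", "Car"))  # PatrolCar
```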

Exercise 4:
Write a function to tokenize a sentence and count the number of characters in each word
(assume words are separated by spaces).
1. Tokenize the following sentence: “Now I need a drink, alcoholic of course, after
the heavy lectures involving quantum mechanics.”
2. Generate a list of the number of alphabetic characters in each word in the order in
which the word appears in the sentence.
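A possible sketch: split on whitespace, then count only the alphabetic characters of each token so that commas and the final period are ignored. The name word_lengths is an illustrative choice.

```python
def word_lengths(sentence):
    """Split on whitespace and count the alphabetic characters in each token."""
    # str.isalpha() is False for punctuation, so "drink," counts as 5.
    return [sum(ch.isalpha() for ch in token) for token in sentence.split()]


sentence = ("Now I need a drink, alcoholic of course, after "
            "the heavy lectures involving quantum mechanics.")
print(word_lengths(sentence))
# [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```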

Exercise 5:
Component Characters
1. Tokenize the following sentence: “Hi He Lied Because Boron Could Not Oxidize
Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.”
2. Extract the first character of the words at positions 1, 5, 6, 7, 8, 9, 15, 16, 19; for
the remaining words, extract the first 2 characters. Create a map from the extracted
character strings to the word positions in the sentence.
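One way to build the map is sketched below, assuming words are split on whitespace and numbered from 1. The names symbol_map and single_positions are hypothetical.

```python
def symbol_map(sentence, single_positions):
    """Map the leading 1 or 2 characters of each word to its 1-based position."""
    result = {}
    for position, word in enumerate(sentence.split(), start=1):
        # Take one character at the listed positions, two characters elsewhere.
        length = 1 if position in single_positions else 2
        result[word[:length]] = position
    return result


sentence = ("Hi He Lied Because Boron Could Not Oxidize Fluorine. "
            "New Nations Might Also Sign Peace Security Clause. Arthur King Can.")
symbols = symbol_map(sentence, {1, 5, 6, 7, 8, 9, 15, 16, 19})
print(symbols)
```

Trailing periods do not affect the result here, because only the first one or two characters of each word are used.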

Exercise 6:
n-gram
1. Write a function to generate all n-grams from a given sequence (string or list).
2. Using the above function, generate word bi-grams and character bi-grams from
the following sentence: “I am an NLPer”
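A minimal sketch of an n-gram generator that works on both strings and lists, since both support slicing. The name ngrams is an illustrative choice.

```python
def ngrams(sequence, n):
    """Return every run of n consecutive items in sequence, in order."""
    # Slicing works for both str and list, so one function covers both cases.
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


sentence = "I am an NLPer"
print(ngrams(sentence.split(), 2))  # [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
print(ngrams(sentence, 2))          # character bi-grams, spaces included
```

Word bi-grams come out as two-item lists and character bi-grams as two-character strings, because slicing preserves the type of the input.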

Exercise 7:
Set
1. Generate sets X and Y which are the sets of character bi-grams from the two
character strings “paraparaparadise” and “paragraph” respectively.
2. Generate the union, intersection and difference sets of X and Y
3. Check whether the bi-gram ‘se’ belongs to set X, and whether it belongs to set Y.
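The three steps above can be sketched with Python's built-in set type. The helper char_bigrams is hypothetical and is defined here so the example is self-contained.

```python
def char_bigrams(text):
    """Return the set of character bi-grams in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


X = char_bigrams("paraparaparadise")
Y = char_bigrams("paragraph")

print(sorted(X | Y))         # union
print(sorted(X & Y))         # intersection: ['ap', 'ar', 'pa', 'ra']
print(sorted(X - Y))         # in X but not in Y: ['ad', 'di', 'is', 'se']
print(sorted(Y - X))         # in Y but not in X: ['ag', 'gr', 'ph']
print("se" in X, "se" in Y)  # True False
```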

Exercise 8:
Generate sentences from template
Write a function that takes 3 variables x, y, z and returns the string “y at x time is
z”. Generate the result for the values x = “12”, y = “Temperature”,
z = 22.4.
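A possible sketch using an f-string, which converts the numeric z to text automatically. The name fill_template is hypothetical.

```python
def fill_template(x, y, z):
    """Return the sentence 'y at x time is z' with the values substituted."""
    return f"{y} at {x} time is {z}"


print(fill_template("12", "Temperature", 22.4))  # Temperature at 12 time is 22.4
```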

Exercise 9:
Cipher String
From the characters of a given string, implement a function named cipher to encrypt
the string as follows:
• If it is lower-case English characters, convert it to a character with the code (219 –
character code).
• Keep the other characters the same.
Use the written function to encrypt and decrypt English character strings.
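A minimal sketch of the cipher follows. Because the codes of 'a' (97) and 'z' (122) sum to 219, the mapping sends a to z, b to y, and so on, and applying it twice restores the original text, so the same function serves for decryption. The sample sentence is an arbitrary choice.

```python
def cipher(text):
    """Replace each lower-case English letter c with chr(219 - ord(c))."""
    return "".join(
        chr(219 - ord(ch)) if "a" <= ch <= "z" else ch  # other characters unchanged
        for ch in text
    )


message = "Hello, World!"
encrypted = cipher(message)
print(encrypted)          # Hvool, Wliow!
print(cipher(encrypted))  # Hello, World!
```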

Exercise 10:
Typoglycemia
Given an English sentence consisting of words separated by spaces, write a program
that does the following:
• For each word, keep the first and last characters in place and randomly rearrange
the remaining characters (words with fewer than 4 characters are left
unchanged).
• Given a valid English sentence, for example “I couldn’t believe that I could actually
understand what I was reading: the phenomenal power of the human mind.”, run the
program to output the result.
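One way to sketch this uses random.shuffle on the inner characters of each word. The names typoglycemia and scramble are hypothetical, and punctuation attached to a word is treated as part of that word, which is an assumption of this sketch. The output differs from run to run.

```python
import random


def scramble(word):
    """Shuffle the inner characters of word, keeping the first and last fixed."""
    if len(word) < 4:
        return word  # too short to rearrange
    middle = list(word[1:-1])
    random.shuffle(middle)  # shuffles the list in place
    return word[0] + "".join(middle) + word[-1]


def typoglycemia(sentence):
    """Apply scramble to every space-separated word of sentence."""
    return " ".join(scramble(word) for word in sentence.split(" "))


text = ("I couldn’t believe that I could actually understand what "
        "I was reading: the phenomenal power of the human mind.")
print(typoglycemia(text))
```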

Exercise 11:
A Markov chain
After reading an input text file, construct a dictionary that represents a Markov chain.
We set the number of previous words observed in this chain to two. Thus, for every
pair of consecutive words in the document, we record the pair as the key and the
word that follows it as the value.
Example: The quick brown fox jumps over the lazy dog.
markovChain['The quick'] = ['brown']
markovChain['quick brown'] = ['fox']
… The created dictionary must contain a list of words for each value. In our program,
most two-word keys will have only a single next word, but since the same key may be
followed by different words, we must maintain a list. (For example, in “to be or not
to be that is the question” the key ‘to be’ has two possible next words:
[‘or’, ‘that’].)
To create a new text document, we begin by selecting the first two words of the
original text file as the first two words of our new document. These first two words
are our initial key. The value associated with that key will be the third word in the
new text document. If there is more than one word in the dictionary value, one of the
words is chosen randomly. A new key is formed from the second and third words,
and a word from its value is added as the fourth word of the document. The algorithm
repeats this cycle: randomly pick a word from the list stored under the current
two-word key, append it to the new text document, and update the key.

Replacement of a key word is always in the following form.
Key = Word1 + ' ' + Word2 # put back the space that split() removed
Word3 = markovChain[Key] # randomly chosen word in this list
Key = Word2 + ' ' + Word3
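The procedure above can be sketched as follows. The function names, the max_words limit, and the rule of stopping when the current key has no entry are assumptions of this sketch; the exercise does not say when generation should end. The text is passed in as a string, so reading the input file is left to the caller.

```python
import random
from collections import defaultdict


def build_markov_chain(text):
    """Map each pair of consecutive words to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        # Put back the space that split() removed when forming the key.
        chain[first + " " + second].append(third)
    return dict(chain)


def generate_text(text, max_words=50):
    """Generate new text starting from the first two words of the original."""
    words = text.split()
    if len(words) < 2:
        return text
    chain = build_markov_chain(text)
    output = words[:2]
    key = output[0] + " " + output[1]
    # Assumed stopping rule: the key is unknown or max_words is reached.
    while len(output) < max_words and key in chain:
        output.append(random.choice(chain[key]))  # pick one follower at random
        key = output[-2] + " " + output[-1]
    return " ".join(output)


chain = build_markov_chain("to be or not to be that is the question")
print(chain["to be"])  # ['or', 'that']
print(generate_text("The quick brown fox jumps over the lazy dog."))
```

In the fox sentence every key has exactly one follower, so the generated text reproduces the input; randomness only shows up when a key such as 'to be' has several followers.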
