MarketPsych Indices V3.3 User Guide
MARKETPSYCH
INDICES
USER GUIDE
MARKETPSYCH INDICES VERSION 3.3
CONTENTS
About This Document
Intended Readership
In This Guide
Feedback
Chapter 1 Introduction
Overview
Contact Information
Documentation and Notifications
Chapter 7 Currencies
Currencies Assets
Chapter 10 Countries
Countries Assets
Countries TRMI Indices
Chapter 12 Cryptocurrencies
Cryptocurrency Assets
Cryptocurrency TRMI Indices
IN THIS GUIDE
This document describes the format of TRMI output images. It is applicable to both the live data and the archive files.
Except for cryptocurrencies, all assets referenced in this guide are generated using TRMI version 3.2. The
cryptocurrencies asset class is reflective of updates to TRMI version 3.3.
FEEDBACK
If you have any comments on this document, please contact the Thomson Reuters Machine Readable News team at
[email protected].
CHAPTER 1 INTRODUCTION
OVERVIEW
Thomson Reuters MarketPsych Indices (TRMI) analyze news and social media in real time to convert the volume and
variety of professional news and the internet into manageable information flows that drive sharper decisions. The
indices are delivered as real-time data series that can easily be incorporated into your investment and trading decision
processes, whether quantitative or qualitative.
Three types of indicators are provided:
• Emotional indicators such as Anger, Fear and Joy
• Macroeconomic metrics including Earnings Forecast, Interest Rate Forecast, and Long vs. Short
• Buzz metrics on the asset level, i.e., Buzz, and on market-moving topics for that asset, such as Litigation,
Regulatory Crackdown, Mergers and Volatility
The indicators are updated every minute for companies, sectors, regions, countries, commodities and energy topics,
indices and currencies. They can be translated directly into spreadsheets or charts that can be monitored by traders,
risk managers or analysts – or they can be plugged straight into your algorithms for low frequency or longer term asset
allocation or sector rotation decisions.
CONTACT INFORMATION
W[Period Length]_U[Period Length], where W denotes window length and U denotes update frequency.
AVAILABLE COMBINATIONS
Window Length              Update Frequency              FTP File Abbreviation
1 minute                   1 minute                      W01M_U01M
1440 minutes / 24 hours    1 hour                        WDAI_UHOU
1440 minutes / 24 hours    Daily, at 3:30 Eastern time   WDAI_UDAI
EXAMPLE
To illustrate with the WDAI_UHOU data, one set of TRMI scores is computed from content collected between December 22,
2016 15:00:00 UTC and December 23, 2016 15:00:00 UTC, a 1440-minute (24-hour) window. The next set of
scores is computed from content collected between December 22, 2016 16:00:00 UTC and December 23, 2016 16:00:00
UTC, because the update frequency is 60 minutes.
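The sliding-window arithmetic in this example can be sketched in Python. This is a minimal illustration, assuming the window ends exactly at the reported timestamp; the function and constant names are hypothetical and not part of any TRMI product.

```python
from datetime import datetime, timedelta, timezone

# Window length and update frequency, in minutes, for the WDAI_UHOU
# combination in the table above (24-hour window, hourly update).
WINDOW_MINUTES = 1440
UPDATE_MINUTES = 60


def window_bounds(window_end: datetime, window_minutes: int = WINDOW_MINUTES):
    """Return (start, end) of the trailing window that ends at window_end."""
    return window_end - timedelta(minutes=window_minutes), window_end


def next_window(window_end: datetime,
                window_minutes: int = WINDOW_MINUTES,
                update_minutes: int = UPDATE_MINUTES):
    """Return (start, end) of the window produced by the next update."""
    return window_bounds(window_end + timedelta(minutes=update_minutes),
                         window_minutes)


end = datetime(2016, 12, 23, 15, 0, tzinfo=timezone.utc)
first = window_bounds(end)   # December 22 15:00 UTC to December 23 15:00 UTC
second = next_window(end)    # December 22 16:00 UTC to December 23 16:00 UTC
```

Consecutive hourly windows therefore overlap by 23 hours, which is worth keeping in mind when treating successive WDAI_UHOU scores as independent observations.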
If no relevant text was identified for a particular index over that window, the correct index value is “NA”, not
zero, and the index will appear blank. For example, this could happen for the carryTrade index on the AUD (Australian
Dollar) in news if no discussion of the carry trade involving the Australian dollar appeared in the news-based text
collected in that window.
NA differs in meaning from true zero in that true zero represents the presence of text corresponding to positive and
negative values that add up to zero. In other words, a zero value reflects that relevant text was found and its sentiment
implications net to zero. In contrast, NA represents the absence of any relevant text and of any resultant measurement.
Note that when the Buzz is zero, this means that no values were detected for any of the indices and thus all index
values necessarily will be NA. See Chapter 5 for more information on Buzz.
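The distinction between NA and zero matters when loading the data. The Python sketch below keeps the two apart; it assumes that NA arrives either as an empty field or as the literal text "NA", which should be checked against the files actually received. Both function names are hypothetical.

```python
from typing import Optional


def parse_index_value(raw: str) -> Optional[float]:
    """Convert a raw index field to a float, or None when it is NA.

    Assumption: NA is delivered either as an empty field or as the
    literal text "NA".
    """
    text = raw.strip()
    if text == "" or text.upper() == "NA":
        return None  # no relevant text was found; not the same as 0.0
    return float(text)


def mean_ignoring_na(values):
    """Average only the measured values; return None if all are NA."""
    measured = [v for v in values if v is not None]
    return sum(measured) / len(measured) if measured else None
```

Replacing NA with zero before averaging would pull the result toward neutral, because it treats "nothing was said" as "the sentiment netted to zero".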
Nonetheless, in practice unipolar indices are positive over 90% of the time, because language typically reflects positive
assertions.
NEWS
Reuters news is present in the entire historical news dataset, as are a host of mainstream news sources collected by
MarketPsych Data. During 2005, the archive began including Internet news content collected by LexisNexis. The
LexisNexis content is restricted to content from top international and business news sources, top regional news sources,
and leading industry sources. In 2017 MarketPsych Data added cryptocurrency-specific sources to the
cryptocurrencies asset class feed.
SOCIAL MEDIA
The social media collection process is more diverse. It begins in 1998 with Internet forum and message board content.
Starting in late 2008, LexisNexis social media content was added. Starting in late 2009, tweets were included. Ranked
by popularity, as measured by incoming links, the collection generally includes the top 20% of blogs, microblogs, and
other financial social media content. MarketPsych Data also included content from hundreds of less-popular
asset-specific blogs and forums.
Note that the entire LexisNexis social media content set for top source ranks is included in the country (COU) social
media dataset, while other assets contain specifically selected social media content sources.
Other Components
See Incoming Sources section below on the dataType field and the Asset Code section for a list of assetClass values.
ASSET CODE
The assetCode field represents the code used to denote the asset scored, where the asset code type varies by Asset
Class.
WINDOW TIMESTAMP
As mentioned in Chapter 2, TRMIs are based on content aggregated over a trailing window. The windowTimestamp
field represents the endpoint of that period.
It is expressed in ISO-8601 format: “yyyymmddThh:mm:ss.000Z”, where “000” represents the milliseconds and “Z”
represents the time zone. The time zone is GMT/UTC.
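A windowTimestamp value can be turned into a timezone-aware datetime with the Python standard library. The sketch below is hedged: it assumes the date part may appear either with or without hyphens and tries both layouts; the function name is hypothetical.

```python
from datetime import datetime, timezone

# Assumption: the date part may or may not contain hyphens, so both
# layouts are tried in turn. "%f" accepts the three millisecond digits.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y%m%dT%H:%M:%S.%fZ")


def parse_window_timestamp(value: str) -> datetime:
    """Parse a windowTimestamp string into a timezone-aware UTC datetime."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # The trailing "Z" is matched literally, so UTC is attached here.
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unrecognized windowTimestamp: {value!r}")
```

Attaching UTC explicitly avoids accidental comparison with naive local-time datetimes elsewhere in a pipeline.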
INCOMING SOURCES
The dataType field represents the type of content source(s) on which these TRMIs are based.
SYSTEM VERSION
The systemVersion field represents an overall build for the MarketPsych Indices. Different versions may lead to different
scores as enhancements are made and bugs are fixed. It is thus used for version control.
It consists of a three-level version string preceded by an “MP” prefix denoting the MarketPsych scoring engine, e.g.,
“MP:3.2.09”.
The third digit of the System Version increases by one, for all asset classes, in conjunction with monthly updates to the
companies coverage list.
BUZZ
The buzz field represents a sum of entity-specific words and phrases used in TRMI computations. It can be non-integer
when any of the words/phrases are described with a “minimizer”, which reduces the intensity of the primary word or
phrase. For example, in the phrase “less concerned” the score of the word “concerned” is minimized by “less”.
Additionally, common words such as “new” may have a minor but significant contribution to the Innovation TRMI. As a
result, the scores of common words/phrases with minor TRMI contributions can be minimized.
A list of the covered companies and monthly changes in coverage is available on the MRN FTP site. See Chapter 14 for
information on the files and how to access them.
Many of these correspond to grouping in the hierarchical Thomson Reuters Business Classification (TRBC) system. In
descending order of hierarchy, the four levels are economic sector, business sector, industry group and industry. For
more information on TRBC, please see
http://thomsonreuters.com/products_services/financial/thomson_reuters_indices/trbc/. All TRBC codes below are extant
in the TRBC 2012 system.
The 61 assets can be characterized in a few ways. 26 are composed of US-based companies, while 17 are for non-US
companies and another 18 are global. 21 resemble equity indices and are filtered chiefly by market cap ranks, while the
other 40 are composed according to a combination of TRBC code, domicile, and market cap above $100 million USD.
Note that because these groups are calculated with approximate point-in-time composition, all the asset codes used
here were invented for this application. They are not RICs.
MPTRXUSTEC Technology 57
MPTRXUSCOM Telecommunications Services 58
MPTRXUSUTL Utilities 59
CMPNY_AMER
Companies included in the Americas (CMPNY_AMER) sub-grouping include those domiciled in the following
countries.
Asset Code  Country
AR Argentina
BB Barbados
BR Brazil
CA Canada
CL Chile
CO Colombia
CW Curacao
EC Ecuador
JM Jamaica
KY Cayman Islands
MX Mexico
PA Panama
PE Peru
PR Puerto Rico
PY Paraguay
US United States
VE Venezuela
VG Virgin Islands (British)
CMPNY_APAC
Companies included in the Asia and Pacific (CMPNY_APAC) sub-grouping include those domiciled in the following
countries.
Asset Code  Country
AU Australia
AZ Azerbaijan
BD Bangladesh
CN China
FJ Fiji
GE Georgia
HK Hong Kong
ID Indonesia
IN India
JP Japan
KG Kyrgyzstan
KR South Korea
KZ Kazakhstan
LK Sri Lanka
MH Marshall Islands
MN Mongolia
MO Macau
MY Malaysia
NZ New Zealand
PG Papua New Guinea
PH Philippines
PK Pakistan
SG Singapore
TH Thailand
TW Taiwan
VN Vietnam
CMPNY_EMEA
Companies included in the Europe, Middle East, and Africa (CMPNY_EMEA) sub-grouping include those domiciled
in the following countries.
Asset Code  Country
AE United Arab Emirates
AT Austria
BE Belgium
BH Bahrain
BM Bermuda
CH Switzerland
CY Cyprus
CZ Czech Republic
DE Germany
DK Denmark
EG Egypt
ES Spain
FI Finland
FR France
GB United Kingdom
GG Guernsey
GI Gibraltar
GR Greece
HR Croatia
HU Hungary
IE Ireland
IL Israel
IM Isle of Man
IQ Iraq
IS Iceland
IT Italy
JE Jersey
KE Kenya
KW Kuwait
LB Lebanon
LU Luxembourg
MA Morocco
MC Monaco
MT Malta
NG Nigeria
NL Netherlands
NO Norway
OM Oman
PL Poland
PT Portugal
QA Qatar
RO Romania
RU Russia
SA Saudi Arabia
SE Sweden
SK Slovakia
TG Togo
TR Turkey
TZ Tanzania
UA Ukraine
ZA South Africa
ZM Zambia
In the live data, the systemVersion value will increase with each monthly update.
CHAPTER 7 CURRENCIES
CURRENCIES ASSETS
There are 45 currencies, listed in the table below.
Asset Code  Currency                Asset Code  Currency
ARS         Argentine Peso          NOK         Norwegian Krone
AUD         Australian Dollar       PKR         Pakistani Rupee
BDT         Bangladeshi Taka        PEN         Peruvian Sol
BTC         Bitcoin                 PHP         Philippine Peso
BRL         Brazilian Real          PLN         Polish Zloty
CAD         Canadian Dollar         RON         Romanian Leu
CLP         Chilean Peso            RUB         Russian Ruble
CNY         Chinese Yuan Renminbi   SAR         Saudi Riyal
COP         Colombian Peso          SGD         Singapore Dollar
CZK         Czech Koruna            ZAR         South African Rand
EGP         Egyptian Pound          KRW         South Korean Won
EUR         Euro                    SEK         Swedish Krona
HKD         Hong Kong Dollar        CHF         Swiss Franc
HUF         Hungarian Forint        TWD         Taiwanese Dollar
INR         Indian Rupee            THB         Thai Baht
IDR         Indonesian Rupiah       TRY         Turkish Lira
IRR         Iranian Rial            USD         U.S. Dollar
ILS         Israeli Shekel          UAH         Ukrainian Hryvnia
JPY         Japanese Yen            AED         United Arab Emirates Dirham
MYR         Malaysian Ringgit       GBP         United Kingdom Pound Sterling
MXN         Mexican Peso            VEF         Venezuelan Bolívar
NZD         New Zealand Dollar      VND         Vietnam Dong
NGN         Nigerian Naira
Asset Code  Commodity
RAPOIL Canola
CTTL Cattle
COC Cocoa
COF Coffee
COR Corn
COT Cotton
HOGS Hogs
ORJ Orange Juice
POIL Palm Oil
RICE1 Rice
SOIL Soybean Oil
SOY1 Soybeans
SUG Sugar
WHT Wheat
Asset Code  Commodity
ALU Aluminum
BIOF Biofuels
COA Coal
CPPR Copper
CRU Crude Oil
BIOETH Ethanol
MOG Gasoline
GOL Gold
HOIL Heating Oil
IRN Iron
JET Jet Fuel
LNG Liquefied Natural Gas
NAP Naphtha
NGS Natural Gas
NKL Nickel
NSEA North Sea Oil
PALL Palladium
PLAT Platinum
RAREE Rare Earths
SLVR Silver
STEE Steel
URAN Uranium
CHAPTER 10 COUNTRIES
COUNTRIES ASSETS
There are 187 countries or regions being scored. Note that these scores are based not only on text about the
country/region itself, but also on text about cities and other place names inside those geographies.
The asset codes correspond to Thomson Reuters topic codes for geopolitical units. The table below is sorted by
Country/Region.
CHAPTER 12 CRYPTOCURRENCIES
CRYPTOCURRENCY ASSETS
TRMI Cryptocurrencies data covers more than 150 active cryptocurrencies. Cryptocurrencies were selected for
inclusion based on listed market capitalization and/or technological significance. Coverage includes all cryptocurrencies
present in the coinmarketcap.com ranking of the top 100 cryptocurrencies by market capitalization starting in November
2017. Coverage is updated monthly to include new entrants from the top 100 list. Once covered, cryptocurrencies
remain in the feed until trading activity ceases.
A list of the covered cryptocurrencies and monthly changes in coverage is available on the MRN FTP site. See Chapter
14 for information on the files and how to access them.
For example, for a given company (assetCode), content source (dataType) and datetime (windowTimestamp), let
$\mathrm{Buzz}_{0}, \mathrm{Buzz}_{-1}, \ldots, \mathrm{Buzz}_{-(N-1)}$ and
$\mathrm{TRMI}_{0}, \mathrm{TRMI}_{-1}, \ldots, \mathrm{TRMI}_{-(N-1)}$ represent the corresponding minutely Buzz and
TRMI data over the trailing N minutes. Then the Buzz-weighted average TRMI over the trailing N-minute window length
may be explicitly calculated as:
$$\overline{\mathrm{TRMI}} = \frac{\sum_{i=0}^{N-1} \mathrm{Buzz}_{-i}\,\mathrm{TRMI}_{-i}}{\sum_{i=0}^{N-1} \mathrm{Buzz}_{-i}}$$
1 See “Daily Update Frequency Anomaly” section above for exceptions in which WDAI_UDAI data does not cover a
24-hour window.
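The Buzz-weighted average TRMI over a trailing window, as described above, can be sketched in Python. This is an illustration rather than the production aggregation: it assumes that minutes whose TRMI is NA are skipped entirely, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def buzz_weighted_trmi(buzz: Sequence[float],
                       trmi: Sequence[Optional[float]]) -> Optional[float]:
    """Buzz-weighted average of minutely TRMI values over a trailing window.

    Assumption: minutes whose TRMI is NA (None) contribute neither to the
    numerator nor to the denominator.
    """
    if len(buzz) != len(trmi):
        raise ValueError("buzz and trmi must have the same length")
    weighted_sum = 0.0
    total_buzz = 0.0
    for b, t in zip(buzz, trmi):
        if t is None:
            continue
        weighted_sum += b * t
        total_buzz += b
    if total_buzz == 0:
        return None  # nothing measured in the window: NA, not zero
    return weighted_sum / total_buzz
```

Returning None rather than 0.0 for an empty window keeps the result consistent with the NA convention used elsewhere in the data.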
ACCESS
Production and most trial users are granted access to the FTP site. Trial users who convert to paid customers should
have their credentials upgraded to full-history access. Users should contact their sales specialist or account manager to
obtain login credentials.
Clients may use their credentials to connect via plain FTP and also secure FTP (FTPS) via explicit FTP over TLS, using
TLS version 1.2.
The FTP site is available at ftp://mrn-ftp.thomsonreuters.com, or 54.243.148.106. The site is accessible via FTP client
only. Please set your FTP client to passive mode, although in some cases active mode will work instead. FTP/S is
available on port 21, and data ports are 60022-60100.
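The connection settings above (explicit FTP over TLS, passive mode, port 21) can be sketched with Python's standard ftplib module. The host name and the /TRMI_LIVE path come from this guide; the function names and the user and password parameters are hypothetical placeholders. The sketch sets TLS 1.2 as the minimum protocol version, which is an interpretation of the requirement rather than a confirmed server setting.

```python
import ssl
from ftplib import FTP_TLS

HOST = "mrn-ftp.thomsonreuters.com"  # host name given in this guide
PORT = 21


def make_client() -> FTP_TLS:
    """Build an unconnected explicit-FTPS client in passive mode."""
    context = ssl.create_default_context()
    # Assumption: TLS 1.2 is treated as the minimum acceptable version.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    client = FTP_TLS(context=context)
    client.set_pasv(True)
    return client


def list_directory(user: str, password: str, path: str = "/TRMI_LIVE"):
    """Connect, log in, protect the data channel and list one directory."""
    client = make_client()
    client.connect(HOST, PORT)
    client.login(user, password)  # login() upgrades the control channel to TLS
    client.prot_p()               # encrypt data connections as well
    try:
        return client.nlst(path)
    finally:
        client.quit()
```

A firewall between the client and the server would need to allow the passive data port range stated above in addition to port 21.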
LIVE FILES
Each live file contains all the scores for an asset class, for a certain window length and update frequency.
Directory Structure
/TRMI_LIVE/{Asset Class}/{Time Abbreviation}/[LastTwoHours/LastTwoDays/Older]
{Asset Class} can take one of the following values:
o CMPNY: all individual companies
File Naming
TRMI.Live.{Asset Class}.{Time Abbreviation}.{Time Period}.{System Version, converted}.txt, such that
Fields
See Chapter 5 for information on fields that are generic to all TRMI asset classes.
id
assetCode
ticker: only for Companies. Note that ticker is generally not point-in-time, although it can vary across System
Version values.
windowTimestamp
dataType: “News”, “Social”, or “News_Social”
systemVersion, excluding its prefix
buzz
Sort Order
assetCode
dataType
ARCHIVE FILES
When aggregation time is used, single-minute differences between the live feed and archive buzz remain, but they affect
well under 1% of buzz scores in the minute-window data.
Archive Backfills
As might be expected for data derived from numerous textual sources, the processing of MarketPsych Indices
undergoes intermittent changes to its content inputs and filtering algorithms. These changes generally are introduced in
the live data and in monthly archive updates, on a pro forma basis. Full-history TRMI archives are backfilled to reflect
these content changes on an occasional basis, often in tandem with a major version upgrade.
For users who prefer to work with fewer files, archive files are also provided as a small number of large files,
sized according to what can be zipped. These files are stored in the “Bulk” subdirectory.
Bulk files contain scores on an entire asset class over a given time period. Standard files may cover only a single asset,
depending on the window length and update frequency.
Directory Structure
Because of the large number of Companies assets, the directory structure for Companies adds two or three extra levels
for organization.
/TRMI/{Asset Class}/{Time Abbreviation}/[Historical/Recent/Bulk] [[/{Country}/{Economic Sector code}] /{Asset Code }]
File Naming
TRMI.Archive.{Asset Class}.[[.{Country}.{Economic Sector code}] /.{Asset Code }]{Time Abbreviation}.{Time
Period}.{System Version, converted}.zip/txt, such that
{Asset Class} as per above
{Country}.{Economic Sector code} as per above. This applies only to CMPNY WDAI_UHOU and WDAI_UDAI
Historical/Recent files.
{Asset Code }. This level is only present for CMPNY W01M_U01M non-Bulk.
{Time Abbreviation} as per above
{Time Period} can take one of the following values, depending on its directory:
o Bulk: “yyyy-yyyy” for multi-year data, as per “Bulk File Details” table above
o Historical:
“yyyy” format for annual data.
“yyyyQq” format for quarterly data. This applies only to hourly (WDAI_UHOU) data.
o Recent: “yyyymm” format for monthly data
{System Version, converted}. The systemVersion value, “MP:3.0.0” will be converted to “0300”. Note that this
string’s length reaches five characters when the third number in the systemVersion exceeds 9, e.g., “3.0.10” will
be converted to “03010”.
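The conversion of a systemVersion value into the string used in file names can be sketched in Python. The rule is inferred from the two examples given above and is an assumption: the first number is zero-padded to two digits, and the second and third numbers are written without padding. The function name is hypothetical.

```python
def convert_system_version(system_version: str) -> str:
    """Convert a systemVersion value to its file-name form.

    Based on the two documented examples: "MP:3.0.0" becomes "0300" and
    "3.0.10" becomes "03010".
    """
    text = system_version.split(":", 1)[-1]  # drop an optional "MP:" prefix
    major, minor, patch = (int(part) for part in text.split("."))
    # Assumption: major is zero-padded to two digits; minor and patch are
    # written without padding, so a leading zero in the input is dropped.
    return f"{major:02d}{minor}{patch}"
```

Because the result has variable length, sorting file names as plain strings may not order versions correctly once the third number exceeds 9; comparing the parsed integers is safer.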
Fields
See Chapter 5 for information on fields that are generic to all TRMI asset classes.
id
assetCode
ticker: only for Companies. Note that ticker is generally not point-in-time, although it can vary across System
Version values.
windowTimestamp
dataType: “News”, “Social”, or “News_Social”
systemVersion, excluding its prefix
buzz
non-generic indices: sentiment, optimism, et al.
VERSION UPDATES
The file system version is updated in one of three digits. The first digit of the version is updated (e.g., from version 2 to
3) approximately every three years due to advancements in the NLP and overhauls in source constituents. Such major
versions are run in production in parallel while customers upgrade over 12 to 18 months. For the most recent major
versions, version 2 NLP was frozen in March 2014, and version 3 NLP was frozen in March 2017.
The second digit in the version number is updated if an urgent patch is made. For example, between versions
3.0/3.1 and 3.2/3.3 a source-level change affecting media acquisition led to excessive buzz in some assets. The
change was identified, its effect was quantified, and a minor version update restored consistency in the feed from
that source. The second digit is also updated when a forked sub-feed is launched. For example, the cryptocurrency
sentiment feed was forked off of v3.0 and launched as v3.1 due to minor NLP changes to account for cryptocurrency
source-specific linguistic differences.
The third digit of the version number is updated monthly to reflect the inclusion of IPOs, company additions,
corporate name changes, mergers, and buyout activity. These minor version updates include additional corporate
aliases and changes in corporate aliasing. Updates reflected in second and third version number digits do not reflect
NLP or text analysis alterations.
Please contact your Account Manager or Sales Specialist if you are interested in viewing these third-party identifiers for
companies and have a requisite license. Thomson Reuters will contact the identifier issuer(s) to verify the license(s).
Directories
Companies
/TRMI/CMPNY/[BASIC/CUSIPISIN/SEDOL/CUSIPISINSEDOL]/
Notes:
The third-level directory is permissioned according to the user’s combination of licenses for third-party identifiers.
Access will be given to exactly one such directory. By default, users are granted access to the BASIC directory.
Cryptocurrencies
/TRMI/CRYPTO/[BASIC]/
Companies
TRMI.Companies.[BASIC/CUSIPISIN/SEDOL/CUSIPISINSEDOL].{System Version, converted}.txt
Notes:
{System Version, converted}. The systemVersion value, “MP:3.0.0” will be converted to “0300”. Note that this
string’s length reaches five characters when the third number in the systemVersion exceeds 9, e.g., “3.0.10”
will be converted to “03010”.
Fields:
PermID: Thomson Reuters organizational identifier.
companyName
countryOfDomicile: two-character ISO 3166-1 country code
TRBCEconomicSector: plain-text description of Thomson Reuters Business Classification (TRBC) economic
sector
status: “active” if the PermID may be scored in a live feed. Otherwise, “inactive”.
RIC: main RIC for this company.
ticker: Because ticker is in the score files, it shall be populated for all companies. Private companies shall
have a MarketPsych-designated ticker beginning with “PVT-”.
marketMIC: ISO 10383 code for market or exchange identification. Value may differ from similar value
maintained by London Stock Exchange.
CUSIP: only available in files with “CUSIPISIN” in the file name
ISIN: only available in files with “CUSIPISIN” in the file name
Values on inactive companies shall attempt to represent their most recent values, including after delisting. Some values
may be blank.
Cryptocurrencies
TRMI.Crypto.[BASIC].{System Version, converted}.txt
Notes:
{System Version, converted}. The systemVersion value, “MP:3.0.0” will be converted to “0300”. Note that this
string’s length reaches five characters when the third number in the systemVersion exceeds 9, e.g., “3.0.10”
will be converted to “03010”.
For cryptocurrencies, the System Version begins at MP:3.1.05.
Fields:
assetCode
cryptocurrencyName
status: “active” if the cryptocurrency may be scored in a live feed; otherwise “inactive”.
Changes File
The second is a file of changes since the previous System Version.
Companies
TRMI.Companies.Changes.{System Version, converted}.txt, such that
{System Version, converted}. The systemVersion value, “MP:3.0.0” will be converted to “0300”. Note that this
string’s length reaches five characters when the third number in the systemVersion exceeds 9, e.g., “3.0.10” will
be converted to “03010”.
File contents:
First line: “TRMI Companies list for systemVersion [systemVersion] was created on [yyyy-mm-dd].”
Further lines:
o “PermID [xxxxxxxxxxx], [companyName], became active.”
o “PermID [xxxxxxxxxxx], [companyName], became inactive.”
Cryptocurrencies
TRMI.Crypto.Changes.{System Version, converted}.txt, such that
{System Version, converted}. The systemVersion value, “MP:3.0.0” will be converted to “0300”. Note that this
string’s length reaches five characters when the third number in the systemVersion exceeds 9, e.g., “3.0.10” will
be converted to “03010”.
File contents:
First line: “TRMI Cryptocurrencies list for systemVersion [systemVersion] was created on [yyyy-mm-dd].”
Further lines:
o “assetCode [xxxxxxxxxxx], [cryptocurrencyName], became active.”
o “assetCode [xxxxxxxxxxx], [cryptocurrencyName], became inactive.”
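Lines in a Changes file can be parsed with a regular expression. The Python sketch below assumes that the square brackets in the templates above are placeholders rather than literal characters, and that names may themselves contain commas; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: bracketed items in the documented templates are placeholders,
# so a real line looks like "PermID 12345678901, Some Name, became active."
_CHANGE_LINE = re.compile(
    r"^(?P<id_type>PermID|assetCode) (?P<identifier>[^,]+), "
    r"(?P<name>.+), became (?P<status>active|inactive)\.$"
)


class Change(NamedTuple):
    id_type: str     # "PermID" for companies, "assetCode" for cryptocurrencies
    identifier: str
    name: str
    status: str      # "active" or "inactive"


def parse_change_line(line: str) -> Optional[Change]:
    """Parse one change line; return None for the header or unknown lines."""
    match = _CHANGE_LINE.match(line.strip())
    if match is None:
        return None
    return Change(**match.groupdict())
```

The name group is greedy and the status phrase is anchored at the end of the line, so a name such as "Example Holdings, Inc." is kept whole.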