Jargon Buster Memoq 2019 Web
Gábor Ugray
memoQ founder
[alignment]
Often, when you get a document to translate, you receive a
set of previously translated documents along with it, or you
can find matching pages on websites. Those may contain a
whole lot of translations you could use either as TM match-
es or through concordancing. Problem is, the TM needs
segments, and you have whole documents. Alignment means
splitting source and target documents into segments, and al-
gorithmically finding out which target segment corresponds
to which source segment. Not a straightforward thing to do!
Advanced CAT tools have a function to automate the bulk of
the work and help you correct the rest.
» See also LiveAlign
[analysis]
Before you accept a job, you need to know how much text
there is to translate. But you already have TMs with past
translations, so you also want to know how much new text
there is, and how many fuzzy or exact matches you can expect.
That’s what analysis does: it compares your text against
your TMs and corpora, and gives you a neat breakdown expressed
in segment, word and character counts. Analysis is
sometimes used interchangeably with statistics, which has
absolutely no fancy scientific meaning in this context.
the concise CAT dictionary | 5
[API; application programming interface]
A nerdy term to say that a program allows other programs to
use its functions, just as if a human was clicking its buttons. If
a program has no API, then it’s impossible to integrate it with
other systems, and humans end up with tendonitis from lots
of completely unnecessary clicking. It is particularly impor-
tant to make sure a cloud-based tool you’re considering has
an API. If it does not, you may get locked in, with no easy way
to retrieve your data if you want to switch.
[automatic concordance]
If you want to see how an expression has been translated,
concordance gives you exactly that. But a good CAT tool can
do more: by looking at the source segment, it can find the
parts that occur in a lot of other segments, and point you to
them. That’s effectively saying, “Hey! These phrases seem to
be all over the place, it’s probably a good idea to concordance
them right now!” And if you’re very lucky, those phrases also
occur as entire source segments, and the TM or corpus will
give you their translation right away.
[AutoPick]
I don’t know about you, but I hate to type numbers as I’m
translating, and I also hate to lose the flow to select, copy
and paste something over from the source. In addition to
numbers, source segments also contain other things that
can go straight into your translation: tags, non-translatables,
terms. If you just press and release Ctrl (in memoQ), AutoPick
highlights all the special entities in your source, lets you cycle
through them with the arrow keys, and insert the next one
with a single keystroke. It also re-formats numbers to match
your target language’s conventions.
[auto-propagation]
Almost every text you translate has repetitions: segments
that occur multiple times. With some technical texts these
may even make up the majority. One responsibility of TMs
is to pick these up, but CAT tools can do even better. If you
enable auto-propagation, then as soon as you confirm a seg-
ment, the tool immediately populates all the other occurrenc-
es in the document and marks them as confirmed.
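As a rough illustration of the idea, here is a minimal Python sketch. The segment dictionaries, the status names and the `confirm` function are all made up for this example; they are not how any particular CAT tool stores its data.

```python
def confirm(segments, index, translation):
    """Confirm one segment and copy its translation to every identical source."""
    source = segments[index]["source"]
    for segment in segments:
        if segment["source"] == source:
            segment["target"] = translation
            segment["status"] = "confirmed"
    return segments


# A toy document with one repetition (hypothetical data).
document = [
    {"source": "Click OK.", "target": "", "status": "empty"},
    {"source": "Restart the device.", "target": "", "status": "empty"},
    {"source": "Click OK.", "target": "", "status": "empty"},
]

confirm(document, 0, "Klicken Sie auf OK.")
for segment in document:
    print(segment["status"], "|", segment["source"], "|", segment["target"])
```

Confirming the first segment also fills in and confirms the third one, while the unrelated second segment stays untouched.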
[auto-translatables]
Most texts (particularly technical, legal and financial) have
recurring entities that follow some pattern. Think of a date
05/27/1978, to be “translated” as 27.5.1978. Auto-translation
rules allow you to create regular expressions that recognize
and transform such patterns in an incredibly flexible way.
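The date example can be sketched with Python's `re` module. This only shows what such a rule does; the pattern and the `auto_translate_dates` helper are assumptions for this sketch and not memoQ's own rule syntax.

```python
import re

# Hypothetical rule: MM/DD/YYYY in the source becomes D.M.YYYY in the target.
DATE_RULE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def auto_translate_dates(text):
    """Rewrite every MM/DD/YYYY date as D.M.YYYY, dropping leading zeros."""
    def convert(match):
        month, day, year = match.groups()
        return f"{int(day)}.{int(month)}.{year}"

    return DATE_RULE.sub(convert, text)


print(auto_translate_dates("Date of birth: 05/27/1978"))
# Date of birth: 27.5.1978
```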
{B}
[BiDi]
Short for bidirectional text, i.e., the right-to-left scripts used to
write languages like Arabic, Hebrew or Farsi. It’s bidirectional
because numbers and some proper names in Latin letters
are written from the left within the overall right-to-left text
flow.
[bilingual Excel]
» See multilingual Excel
[bilingual RTF]
A specially formatted Word document that contains a trans-
lation’s source and target segments, and often also com-
ments and other information. This way, a translator can share
work in progress with a client or a domain expert who has
no CAT tool, only a word processor. CAT tools, in turn, can
read an edited bilingual RTF with changes and comments,
and bring the updates back into the translation environment.
Some old formats relied on hidden text and were very easy to
ruin with a single misplaced edit. These days it’s more com-
mon to see a table with three or more columns.
[CCJK]
Stands for the three East Asian languages, Chinese, Japanese
and Korean. There are two Cs because Chinese can be writ-
ten either with simplified characters (PRC and Singapore) or
traditional ones (Hong Kong and Taiwan).
[check out]
When a translator or reviewer checks out an online project
in memoQ, the tool downloads the assigned documents and
sets up a correctly configured working environment. This
eliminates saving email attachments and going through an
error-prone series of steps, saving time and ensuring all pro-
ject participants work with the right resources and settings.
[comment]
In a CAT tool you can mark entire documents, source or tar-
get segments, or just a short part within a segment. You can
add a remark, or use the function to simply highlight some-
thing. This way you can communicate with other translators,
reviewers or even clients, or just bookmark something for
yourself to return to later. You can keep comments private,
or choose to export them as part of the finished translation.
[concordance]
A function of translation memories and LiveDocs corpora
that allows you to search for a word or expression, retrieving
all translated segments where it occurs. This is nothing short
of a small wonder, allowing you to “Google” existing transla-
tions. memoQ also highlights the expression’s most probable
translation within the target segments, just like Linguee, but
from your own private data.
[confirmed]
» See segment status
[context ID]
Usually a short machine-readable text that identifies a string
that belongs to a specific place in an app or a game. It’s cru-
cial to distinguish between, say, “Open” on a label (translated
into German as “Offen”) or on a button (translated as “Öff-
nen”). The TM stores the ID and returns a context match if
the same text occurs with the same ID later.
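A toy model of the idea in Python: the TM is keyed by the text together with its context ID. The IDs (`status.label`, `file.button`) and the function names are invented for this sketch.

```python
def store(tm, source, context_id, target):
    """Remember a translation together with the context ID it was made for."""
    tm[(source, context_id)] = target


def lookup(tm, source, context_id):
    """Prefer a hit with the same context ID; fall back to the same text alone."""
    if (source, context_id) in tm:
        return "context match", tm[(source, context_id)]
    for (stored_source, _stored_id), target in tm.items():
        if stored_source == source:
            return "exact match", target
    return "no match", None


tm = {}
store(tm, "Open", "status.label", "Offen")
store(tm, "Open", "file.button", "Öffnen")

print(lookup(tm, "Open", "file.button"))      # same text, same ID
print(lookup(tm, "Open", "toolbar.tooltip"))  # same text, unknown ID
```

The second lookup still finds a translation, but the tool can no longer promise it is the right one for that place in the interface.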
Illustration (concordance): just as you might ask where “misericordia” occurs in the Bible, you can ask where “linear regression” occurs in your translation memory and how it was translated. You can search for words or expressions and use wildcards. Concordance is awesome because it lets you “Google” past translations; you don’t need to research the same expression again.
[context match]
» See TM match types
[custom fields]
» See metadata
[dictation]
Technology that allows you to dictate text, instead of typing it
on a keyboard. Dictation is preferred by a minority of trans-
lators; they, however, report a productivity boost of 50% or
more over typists.
[DITA]
Short for Document Information Typing Architecture, DITA
is exactly as unsexy as it sounds, but tremendously useful.
It is an open standard that defines how to structure and
reuse content in CMS systems. The format is based on
XML, and if your CAT tool supports it, you can deal with a
huge share of the content coming from several different
CMSes.
[DTP; desktop publishing]
DTP tools include the likes of FrameMaker and InDesign, used
to produce professionally typeset printed documents. In the
industry DTP typically means an activity after translation and
review. Translated text looks really bad in the original format
unless you adjust the typesetting to accommodate longer
paragraphs, different special characters, or even a complete
left/right directional swap.
[ELM license]
» See CAL license
[exact match]
» See TM match types
[export]
» See file format filter
{F}
[file format filter]
One key benefit of CAT tools is that you always translate in
the same familiar editor, regardless of the file format your
text came in. That means CAT tools must somehow extract
the text from all the different file formats. The component
that does this is called a file format filter: it “filters” text from
all the other stuff in the file. Bringing the text into the CAT
tool is called importing a file; retrieving the translation in the
original format is called exporting it.
Every filter comes with its own options that affect how it
works (“Do you want to extract the hidden text from this
Word file?”), and for some formats, notably XML, these set-
tings make an enormous difference.
Illustration (file format filter): when you import a file, even some weird one, you use a filter in your translation tool to extract the source text as beautiful, clean text. When you export, you get your translation back in the original file format.
[font substitution]
Many file formats, particularly from DTP tools, tend to use
fonts that look really good, but cannot draw a lot of special
characters. If your target language happens to have a lot of
these, the translated file will look ugly, or skip letters outright.
Font substitution is a function of file format filters that tweaks
the file, replacing the original font with one that has the right
glyphs for your target language.
[fragment assembly]
If there is no exact or fuzzy match for a segment in your TMs
or corpora, a lot of the segment’s parts may still have a match
from a term base, non-translatables or auto-translatables.
Fragment assembly takes all of these and just replaces them
with their target equivalents, giving you a patchwork segment
that might still take a lot less work to brush up than translat-
ing it from scratch.
[fuzzy]
First impressions are correct here: this is one of the fuzziest
words in the entire industry jargon. Initially a fuzzy match
was used in contrast to an exact match from a TM: you get
a translation that is fully legit, except it’s the translation of
something more or less different from your current source
segment. Just how different is expressed by the fuzzy match
rate. Eventually fuzzy matching was also extended to termi-
nology, where it can be pretty useful if your language is in the
habit of changing letters in the middle of words.
» See also TM match types
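To make the match rate less abstract, here is a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure over words. Real CAT tools use their own scoring, so the numbers are illustrative only, and `match_rate` is a name made up for this example.

```python
from difflib import SequenceMatcher


def match_rate(new_source, tm_source):
    """Return a 0-100 similarity score between two source segments, word by word."""
    new_words = new_source.lower().split()
    tm_words = tm_source.lower().split()
    return round(100 * SequenceMatcher(None, new_words, tm_words).ratio())


print(match_rate("Press the red button.", "Press the red button."))    # 100
print(match_rate("Press the red button.", "Press the green button."))  # 75
```

The second pair shares three of four words, so the stored translation is probably a good starting point that needs one word changed.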
[glossary]
» See term base
[homogeneity]
A garden-variety analysis tells you how much of your text has
fuzzy or exact matches from your existing TMs and corpora.
But even if you start with an empty TM, as you progress in
a document, you will start getting matches from your own
new translations! The homogeneity function quantifies these
“internal” matches as part of the analysis, going beyond the
mere detection of repetitions.
[internationalization; #i18n]
Localizing a product entails more than just translation: it includes
things like showing dates in the right format, displaying
temperatures in Celsius vs. Fahrenheit, writing first name last
or vice versa, and the like. It requires extra effort to enable a
product to do all this; that effort is called internationalization.
[interoperability]
The ability of CAT tools to understand each other’s formats
and APIs, and to support standard formats well, so that peo-
ple using software from different manufacturers can work
together without drama, tears and major tragedies.
{K}
[KWIC; keyword in context]
A layout for concordance results where the search term is
in the middle, with preceding and following text on both sides,
row after row.
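The layout is easy to picture with a few lines of Python. This is only a sketch: the `kwic` function and its `width` parameter are invented here, and it shows just the first hit in each segment.

```python
def kwic(segments, term, width=20):
    """Return one line per hit, with the search term lined up in the same column."""
    lines = []
    for text in segments:
        start = text.lower().find(term.lower())
        if start == -1:
            continue  # the term does not occur in this segment
        end = start + len(term)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # Right-align the left context so every hit starts at the same position.
        lines.append(f"{left:>{width}} [{text[start:end]}] {right}")
    return lines


samples = [
    "Run a linear regression on the data.",
    "The model is a simple linear regression.",
    "No hit in this one.",
]
for line in kwic(samples, "linear regression"):
    print(line)
```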
{L}
[leverage]
To “leverage” past translations is fancy talk for “the tool gives
me what I already translated, so I don’t need to do it again.”
“Leverage” as a noun is fancy talk for the extent to which that
happens: if a tool promises to enhance leverage, you should expect
to type fewer new characters while translating the same text.
[light resources]
This is memoQ lingo for things like non-translatables,
segmentation rules, and a lot more. As opposed to heavy
resources, which mean TMs, LiveDocs corpora and Muses,
light resources hold much less data. But while in many other
tools they are “settings,” in memoQ they are resources: they
have a name; they can be exported and imported; you can re-
use them in different projects; and they can be shared online
through memoQ server.
[LiveAlign]
memoQ’s approach to alignment, where you simply throw
a bag of source and target documents at the tool, and start
translating. The tool first aligns the documents, then their
segments, and indexes them in the background so they immediately
give you lookup results in the editor. There will inevitably
be errors, but you only spend time fixing those that actually
give you matches.
[LiveDocs corpus]
memoQ’s alternative to TMs. While a TM holds a homogeneous
mass of translated segments in no particular order, a
LiveDocs corpus preserves entire translated documents, but
gives you the same kinds of matches. If you want to check
the context of a past translation, you can jump directly to
the full document from the translation editor. TMs have one
big advantage: they store every translation only once. If your
content has a lot of repetitions, LiveDocs can become cumbersome.
[localization; #l10n]
Sometimes used as a synonym for translation, localization
entails a bit more: it includes showing dates in the right format,
money in the right currency, and the like. Before you can localize
a product, the product itself must be able to do all this; making
that possible is called internationalization.
[LSC]
» See automatic concordance
[master TM]
» See working, master and reference TM
[match rate]
» See TM match types
[MatchPatch]
A memoQ function that improves fuzzy matches from a
TM or a corpus by replacing the phrases that are different,
relying on term base matches, auto-translatables and
non-translatables.
[metadata; metainformation]
Additional details about a piece of stored information, like
“who translated this, when, and for what client” in the case of
a translation unit, or “what source did this come from and
did the client approve it” for a term base entry. CAT tools
usually support a set of standard fields like the ones above,
but also allow users to define their own custom fields and
categories for more detail.
Why do you want to bring such changes back into the CAT
tool? To make sure your TM contains only final, approved
translations. Otherwise you may end up with trash in, trash
out.
[MQXLIFF]
An XLIFF file that contains additional, non-standard information
specific to memoQ, such as segment statuses, QA
warnings, LQA errors, comments etc.
[multilingual Excel]
An Excel file with source text, translations, comments and
other information. Sometimes it’s a small, innocuous file with
two columns for source and target text, but we have reliable
eyewitness reports of files out there with 50,000 rows and 25
columns for different languages. Such monstrous files often
come from computer games.
[MultiTerm XML]
The XML-based format used by SDL MultiTerm to export
and import terminology. Although not an official standard, it
is widely used for terminology exchange even between com-
pletely different systems.
[Muse]
One of the resources powering predictive typing in memoQ. A
Muse is built by analyzing existing TMs and corpora, with the aim
of extracting words and phrases that correspond to each other
in the two languages. When you translate a new source segment,
the Muse looks at the phrases in it and gives you a list of sugges-
tions that might be the translation of a phrase in the source text.
[non-breaking space]
A special character that looks like a normal space but acts
differently because it doesn’t allow a line break to intervene
between the word on its left and right. A non-breaking space
is a must before a colon in French (you don’t want “:” to start a
line), and between a number and a unit of measurement (you
don’t want “cm” to start a line either). In most word processors
you can type it by pressing Ctrl+Shift+Space.
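For the number-and-unit case, a small Python sketch shows what an automatic fix could look like. The unit list and the `protect_units` name are assumptions made for this example, and the French colon case is left out.

```python
import re

NBSP = "\u00a0"  # Unicode NO-BREAK SPACE
UNITS = ("mm", "cm", "m", "km", "kg", "g")  # illustrative, far from complete

# A digit, one normal space, then one of the units as a whole word.
UNIT_RULE = re.compile(r"(\d) (" + "|".join(UNITS) + r")\b")


def protect_units(text):
    """Replace the normal space between a number and its unit with a non-breaking one."""
    return UNIT_RULE.sub(lambda m: m.group(1) + NBSP + m.group(2), text)


fixed = protect_units("Leave a 5 cm gap every 2 km.")
print(repr(fixed))
```

Printing with `repr` makes the otherwise invisible character show up as `\xa0`.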
[non-printing characters]
Spaces, non-breaking spaces, tabs, and newlines. Also, a few
other invisible characters used in bidirectional text. The point
is, they are all blanks and you normally don’t see them. Just
like Word, CAT tools have an option to show them, so that you
don’t accidentally type two spaces, or a normal space where a
non-breaking one is warranted.
[non-translatables]
Somewhat similar to terms, except that they are identical in
all languages. Most often they are brand names that are to
be left alone.
[online project]
A memoQ project that stores documents in a server, allowing
multiple people to simultaneously translate and review them,
working together in real time. Online projects also make it
really simple to assign work because they eliminate sending
files around in email, and they prevent trivial errors because
they make sure everyone in the project uses the right settings
and resources.
[online TM]
A translation memory shared through a server or in the cloud.
They allow organizations to store their translations central-
ly (and always find them when they are needed). They also
make sure that translators working together in real time on
different parts of a project get to see each other’s translations
instantly, ensuring their work will be consistent.
[on-the-fly filter]
A function present in all advanced CAT tools (though usually
called differently) that allows you to filter the segments of the
document you’re working in. It is “find” on steroids: you can
quickly skim segments that contain a particular word or expression,
and make changes if you change your mind about a
translation. It’s also useful to eliminate, say, segments that are
already confirmed so you can just focus on what needs work.
[penalty]
Some translations are to be trusted less than others. They
may be too old, come from the wrong translator, or apply
to a different client or domain. A penalty means reducing
the translation’s natural match rate so it gets ranked lower
than others.
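In code, the idea boils down to a subtraction before sorting. The sketch below is a simplification with invented names and numbers; how a real tool combines several penalties is not covered here.

```python
def rank_matches(matches):
    """Sort TM hits by match rate after subtracting each hit's penalty.

    Each hit is a (match_rate, penalty, target_text) tuple.
    """
    scored = [(max(0, rate - penalty), target) for rate, penalty, target in matches]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


hits = [
    (100, 10, "translation from an old, unreviewed TM"),
    (95, 0, "translation from the trusted TM"),
]
for rate, target in rank_matches(hits):
    print(f"{rate}% {target}")
```

The exact match from the less trusted TM ends up at 90% and is ranked below the 95% fuzzy match from the trusted one.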
[perpetual license]
A license that allows you to use a piece of software you have
purchased forever. Perpetual licenses typically belong to a
specific version of the software: to stay in business, the
developer needs to finance its work by charging an
upgrade fee for new versions.
[pre-translated]
» See segment status and pre-translation
[pre-translation]
A function that processes every segment in a document and
automatically inserts the best translation from the project’s
TMs and corpora.
Illustration (pre-translation): 1. Put documents in hat. 2. Click pre-translate. 3. Check documents.
[Project home]
The screen in memoQ where you can add or remove docu-
ments to translate, pick TMs, term bases, Muses and other
resources, and fiddle with your working environment in count-
less other ways, whenever you have an urge to procrastinate.
[pseudo-translation]
Translation is the fun part, but if you’re dealing with complex
file formats from esoteric systems, you need to make sure
your work will also make it back to the original system at the
end and not crash your client’s multimillion-dollar flagship
app right before the deadline. Pseudo-translation allows you
to test the whole process without actually translating an-
ything. It replaces source text with funny characters, words
spelled backwards, and made-up stuff to inflate strings.
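A bare-bones version fits in a few lines of Python. This sketch only swaps vowels for accented ones and pads the string; the `expansion` parameter, the brackets and the `~` padding are choices made for this example, not what any specific tool does.

```python
# Map plain vowels to accented ones so untranslated strings stand out.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def pseudo_translate(text, expansion=0.3):
    """Accent the vowels and pad the string to simulate a longer target text."""
    padding = "~" * round(len(text) * expansion)
    # The brackets make it easy to spot strings that get cut off in the UI.
    return "[" + text.translate(ACCENTED) + padding + "]"


print(pseudo_translate("Open file"))
# [Ópén fílé~~~]
```

If the pseudo-translated string survives the round trip and shows up intact in the app, the real translation most likely will too.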
[repetition]
Any segment that occurs at least twice in your source text
is a repetition. They are a delight because usually you need
to translate the same thing only once. That’s how repetitions
gave rise to auto-propagation and exact matches. And for
the cases where the same thing must be translated different-
ly, you have context IDs to differentiate.
[RTL; right-to-left]
» See BiDi
[segment]
When we translate text, we almost always proceed sentence
by sentence. If you try to get to the bottom of it, however,
nobody really knows what a sentence precisely is. Also, when
you translate a single word in a bullet-point list, is that a sen-
tence? CAT tools decided to sidestep this can of worms alto-
gether, so we speak about segments instead.
Generally (though not always), a segment is the essential unit
of translation: you proceed segment by segment in the editor,
and you store the translation of segments in the TM. Your
TM and corpus matches also refer to the segment you are
translating at the moment.
Segments are born with the active cooperation of regular
expressions, in a special incarnation called segmentation
rules. Like all regexes, they look like gibberish to the uninitiated eye,
but they basically elaborate a single theme: “If you find sentence-final
punctuation like a period followed by one or more
spaces followed by a capital letter, start a new segment right
there. Except if the last word before the period is a known ab-
breviation.” Segmentation normally happens quietly, behind
the scenes, when your CAT tool’s file format filter imports a
source document.
No matter how elaborate, segmentation rules will inevitably
get it wrong from time to time. To help get around this, CAT
tools have a function to join neighboring segments, and to
split a single segment into two.
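The rule quoted above can be sketched in Python. This is a toy version of the technique: the abbreviation list and the `segment` function are made up for this example, and real segmentation rules handle many more cases.

```python
import re

ABBREVIATIONS = {"Dr.", "etc.", "ca.", "Dec."}  # illustrative list

# Sentence-final punctuation, then whitespace, then a capital letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def segment(text):
    """Split text at sentence boundaries, except after a known abbreviation."""
    segments, start = [], 0
    for match in BOUNDARY.finditer(text):
        last_word = text[start:match.start()].split()[-1]
        if last_word in ABBREVIATIONS:
            continue  # "Dr. Smith" must stay in one piece
        segments.append(text[start:match.start()])
        start = match.end()
    segments.append(text[start:])
    return segments


print(segment("Dr. Smith arrived. He was late."))
# ['Dr. Smith arrived.', 'He was late.']
```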
[segmentation]
» See segment
[simultaneous translation and review]
A function of online collaborative CAT tools that allows sev-
eral people to edit the same document together in real time.
You can think of this as Google Docs on steroids, customized
for the two-column, source-and-target world of translation.
[SMA; support & maintenance agreement]
While a license agreement entitles you to use a piece of soft-
ware, the SMA that usually goes along with it grants you ac-
cess to support from a human and to new versions of the
software. Normally, perpetual licenses have a one-off fee;
SMA, in contrast, is charged on an annual basis.
Illustration (sentences vs. segments): why the weird name? Ask a computer or a professor what a “sentence” even is, and the answer is “dunno.” Abbreviations such as “etc.”, “ca.”, “Dr.” and “Dec.” complicate things further. Don’t worry: regular expressions take care of it.
[split segment]
» See segment
[subsegment leverage]
This is a strong contender for the industry’s most fuzzy word,
right there after fuzzy itself. When a CAT tool vendor uses it,
they basically want to say, “We’re doing something extremely
advanced and useful here.” In prosaic terms it means lookup
results and suggestions (aka leverage) that refer to a short-
er bit of the source segment. In all earnestness, often the
machinery that generates such matches really is pretty ad-
vanced, extrapolating knowledge from past translations in
ways that are far from obvious.
[statistics]
» See analysis
[string]
In developer-talk, a string is a sequence of characters. When
you translate the user interface of a software application or a
game, all the chunks of text that appear in different places are
called “strings.” Typically, a string shows up as a single seg-
ment, and it has an associated context ID to disambiguate it.
[structural tags]
» See tags
[synchronize]
When you work in a memoQ online project, you have the
option not to save every translated segment in the server
immediately, but instead gather a lot of changes locally, and
exchange news with the server in one go. That action is called
synchronizing the project.
[tag error]
Some inline tags are optional: maybe that bold formatting in
the source text is not needed in your translation at all. Others,
however, are mission-critical: if they represent N in the sentence
“You have N enemies left”, then if you omit the tag, the
translated game will crash and the outrage of gamers will put
your client out of business. To avoid such an outcome, the QA
module of CAT tools gives you a tag error right in the editor,
and won’t let you deliver your translation until you fix it.
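A simplified version of such a check in Python: the tag syntax (`{1}` placeholders and simple `<b>`-style tags) and the `missing_tags` function are assumptions for this sketch, since every file format has its own way of writing tags.

```python
import re

# Hypothetical tag syntax: numbered placeholders like {1} and simple tags like <b>.
TAG = re.compile(r"\{\d+\}|</?\w+>")


def missing_tags(source, target):
    """Return the tags that occur in the source but are absent from the target."""
    target_tags = TAG.findall(target)
    missing = []
    for tag in TAG.findall(source):
        if tag in target_tags:
            target_tags.remove(tag)  # each target tag can cover only one source tag
        else:
            missing.append(tag)
    return missing


print(missing_tags("You have {1} enemies left", "Noch Feinde übrig"))
# ['{1}']
print(missing_tags("You have {1} enemies left", "Noch {1} Feinde übrig"))
# []
```

An empty list means the translation is safe to deliver as far as tags are concerned.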
[tag insertion mode]
Tags can be a real nuisance as you translate: you need to
think about where they must go, you need special shortcuts
to insert them, and generally, they throw you out of the flow.
So in memoQ you can just focus on translating a segment’s
text first, then activate tag insertion mode and sprinkle your
target segment with tags in the right places.
[tag soup]
An unfortunate but all too frequent situation when a docu-
ment that you have just imported is chock full of tags that
are unexpected, pointless, or both. This most often happens
with Word documents that an OCR tool produced from a PDF
because it wanted to make sure everything is shown exactly
in the right place, down to a hundredth of a millimeter. You
can make things better by tweaking the OCR tool’s settings,
running a cleanup macro like Dave Turner’s CodeZapper, or
pestering your CAT tool’s developers to do something about
it. Only the first two have been conclusively shown to work.
[TC match]
A bit of a schizophrenic creature that cannot completely
make up its mind whether it’s a match rate or a segment sta-
tus. It rears its head in the complicated scenario when you
need to translate a source segment that contains tracked
changes, which you need to reproduce in the translation too.
A TC match is basically an exact match for the original form
of the source segment, pretending those tracked changes
were never put in there.
» See also track changes
[term base]
A “database” or a component of CAT tools that allows users
to store important words/expressions and their equivalents. It
saves the hassle of researching the same term twice. It also
helps translators adhere to terminology mandated by their
clients, or at least stay consistent with themselves. In fact, it’s
indispensable for consistency if different people are translat-
ing the same large text simultaneously, collaborating online
from different locations.
Often used interchangeably with glossary, but they’re not
quite the same. A glossary is usually just a word list in two
languages, while term bases have structure and metadata
too.
[term extraction]
A function of advanced CAT tools that looks at new source
text or a body of existing translations and extracts important
words and expressions. The output typically contains a lot of
“false positives,” but it allows a translator to research impor-
tant terms before starting to translate, include them in a term
base, and make sure they are then translated both correctly
and consistently.
[terminology database]
» See term base
Illustration (term base): “I know I researched this before...” No need to ask: your CAT tool has highlighted all these other terms for you already. Your glossaries are your most valuable asset.

A term base is way more than an Excel sheet!
1. Metadata. Just a fancy way of saying: for each term, you know which client it is for, where you found it, etc.
2. Speed. CAT tools have auto-complete. If you’ve stored a term, you never need to type it out again.
3. Suggestions. No need to remember and search. Your CAT tool finds and highlights every term in the sentence you are translating.
4. Value. You can share it with your customer. They will love it and hire you again!
[TMX; Translation Memory eXchange]
An XML-based format to, well, exchange translation memo-
ries. The adoption of this standard was a crucial step in the
industry towards interoperability, and at this point virtually
all tools support it.
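To give a feel for the format, here is a hand-written, minimal TMX sample and a short Python sketch that reads it with the standard library. The sample content, the header values and the `read_units` helper are made up for illustration; real exports carry a lot more metadata.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written TMX document with a single translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open</seg></tuv>
      <tuv xml:lang="de"><seg>Öffnen</seg></tuv>
    </tu>
  </body>
</tmx>"""

# ElementTree expands the xml: prefix to the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_units(tmx_text):
    """Return one {language: segment text} dictionary per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        units.append({tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")})
    return units


print(read_units(SAMPLE_TMX))
# [{'en': 'Open', 'de': 'Öffnen'}]
```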
[track changes]
Many regulated industries (like pharma) are required by law
to track every change they make to crucial documents, such
as the usage instructions and side effects of a medicament.
Not only that, but when they sell to multiple markets, they
must reproduce all these changes in translated materials too.
As a translator or LSP, the only way to achieve this without
losing your sanity and/or getting sued out of your profits
is if your CAT tool has special functions to both cope with
change-tracked documents and preserve the benefits of
TMs, term bases, QA and everything else.
» See also TC match
[translation unit]
In a CAT tool, you translate documents segment by seg-
ment. Once you store the translation of a source segment
in your TM, the two together, plus some metadata like “who
translated this and when,” are bundled up and transmogrified
into a translation unit.
[two-column Excel]
» See multilingual Excel
[UTF-8]
» See Unicode
{V}
[vendor]
In our industry, a person or a business that offers translation
services to other persons or businesses.
[view]
Since CAT tools are apparently great fans of deconstructiv-
ism and start their day by tearing text into chunks called seg-
ments, you might as well max this out by slicing and dicing the
living daylight out of those poor segments. As in “I have just
turned this User Guide into 1300 segments and pre-trans-
lated them. Now give me those segments that have no TM
match, occur at least twice, and have the words ‘squinting
squirrels’ in them. Also, show me each segment only once,
and order them alphabetically.” That is the kind of thing that
views allow you to do.
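The request quoted above translates almost word for word into a filter. The Python sketch below uses invented segment dictionaries with a hypothetical `tm_match` field (best match rate in percent, 0 meaning no match).

```python
from collections import Counter


def build_view(segments, phrase):
    """Unmatched segments that repeat and contain the phrase, once each, sorted."""
    counts = Counter(seg["source"] for seg in segments)
    selected = {
        seg["source"]
        for seg in segments
        if seg["tm_match"] == 0             # no TM match
        and counts[seg["source"]] >= 2      # occurs at least twice
        and phrase in seg["source"]         # contains the words we care about
    }
    return sorted(selected)  # the set shows each segment once; sorting orders them


guide = [
    {"source": "Beware of squinting squirrels.", "tm_match": 0},
    {"source": "Beware of squinting squirrels.", "tm_match": 0},
    {"source": "Feed the squinting squirrels.", "tm_match": 0},
    {"source": "Click OK.", "tm_match": 100},
    {"source": "Click OK.", "tm_match": 100},
]
print(build_view(guide, "squinting squirrels"))
# ['Beware of squinting squirrels.']
```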
Illustration (translation memory): but wait, it gets better. The translation memory also gives you sentences that are only similar (fuzzy matches) and shows you all occurrences of a particular expression (concordance).
{W}
[web editor]
A component of CAT tools that allows translators and re-
viewers in an online project to work from a browser, without
installing software on their own computer. A web editor is to
traditional desktop tools as Google Docs is to Word, except
advanced CAT tools offer both options (even within the same
project) and don’t force you to choose between two incom-
patible companies.
[word count]
» See analysis
[working, master and reference TM]
Keeping stuff organized is an age-old challenge. If you don’t
get it right, you end up with trash in, trash out. One way to
stay on top of data within a translation project is to desig-
nate one TM as the master (translations coming from there
get precedence over others); another one as the working TM
(new, as-yet unrevised translations get stored there, keeping
the master pristine); and the rest as reference (to fill in the
gaps that the master does not cover).