Turkish Segmentation Rules for Better TM Matching in OmegaT

Rules with the Break/Exception (Kesme/İstisna) box checked, i.e. break rules:

Pattern Before (Örüntü Öncesi) column:
:
;
[\.\?\!][‘”]
Örneğin,
Ancak,
Ama,
Bu nedenle,
Ek olarak,
Bu çalışmada,
Bu makalede,

Pattern After (Örüntü Sonrası) column:
\s

Rules with the Break/Exception (Kesme/İstisna) box unchecked, i.e. exception rules:

Pattern Before (Örüntü Öncesi) column:
[0-9]:[0-9]
[1-9]\.
vb\.
vs\.
vd\.
Dr\.
Doç\.
Yrd\.
Prof\.

Pattern After (Örüntü Sonrası) column:
\s

Here, the [0-9]:[0-9] exception prevents breaks inside time expressions such as 12:21 PM. The [1-9]\. exception keeps numbered items from being split after the number. The [\.\?\!][‘”] rule breaks after a quotation mark that closes a sentence. Rules such as “Bu nedenle,” (“Therefore,”) break after common conjunctions at the beginning of sentences, so that the rest of the sentence has a better chance of matching the TM. You should add similar frequent sentence openers (“However,”, “In addition to this,”, “In this study,”, etc.) for English as well.
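To see how break rules and exception rules interact, here is a minimal Python sketch that applies the patterns listed above to a sample sentence. It is not OmegaT's implementation. The function name `segment`, the sample sentences, and the generic sentence-end rule `[\.\?\!]+` are assumptions added for illustration, and the sketch assumes that an exception always wins over a break rule at the same position.

```python
import re

AFTER = re.compile(r"\s")  # every rule above uses \s as its "pattern after"

# "Pattern before" values of the break rules listed above.
BREAK_BEFORE = [
    r":", r";", r"[\.\?\!][‘”]",
    r"Örneğin,", r"Ancak,", r"Ama,", r"Bu nedenle,",
    r"Ek olarak,", r"Bu çalışmada,", r"Bu makalede,",
    r"[\.\?\!]+",  # ASSUMPTION: generic sentence-end rule, not in the list above
]

# "Pattern before" values of the exception rules listed above.
EXCEPTION_BEFORE = [
    r"[0-9]:[0-9]", r"[1-9]\.", r"vb\.", r"vs\.", r"vd\.",
    r"Dr\.", r"Doç\.", r"Yrd\.", r"Prof\.",
]


def _compile(patterns):
    # Anchor each pattern so that it must end exactly at the candidate position.
    return [re.compile(rf"(?:{p})\Z") for p in patterns]


BREAKS = _compile(BREAK_BEFORE)
EXCEPTIONS = _compile(EXCEPTION_BEFORE)


def _ends_with_any(rules, left):
    return any(rule.search(left) for rule in rules)


def segment(text):
    """Split text at positions where a break rule matches and no exception does."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        if not AFTER.match(text, pos):
            continue  # "pattern after" does not match here
        left = text[:pos]
        if _ends_with_any(EXCEPTIONS, left):
            continue  # an exception blocks the break
        if _ends_with_any(BREAKS, left):
            segments.append(text[start:pos].strip())
            start = pos
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


sample = "Dr. Kaya geldi. Bu nedenle, toplantı başladı; herkes hazırdı."
print(segment(sample))
# ['Dr. Kaya geldi.', 'Bu nedenle,', 'toplantı başladı;', 'herkes hazırdı.']
```

Note that “Bu nedenle,” ends up in its own segment, so “toplantı başladı;” can match the TM regardless of which conjunction precedes it.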

Translation Memory of Dates for the Years 2010-2020

There are many de facto formats for writing dates in every language, and Turkish is no exception. But translators today need standards to keep up with the breathless pace. Any non-standard term or phrase causes a TM miss, and therefore a loss of time.

The following translation memory in TMX format contains English translations of approximately 14,000 dates written in various Turkish date formats. Importing it into your CAT software helps ensure that dates are translated in one standard form. For example, 21.12.2012, 21/12/2012 and 21 Aralık 2012 all have the single corresponding translation December 21, 2012 in this TMX.
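The many-to-one mapping can be sketched in a few lines of Python. This is only an illustration of the idea, not the script used to build the TMX. The function names are hypothetical, only the three formats from the example above are covered, and zero-padding of single-digit days and months in the numeric formats is an assumption.

```python
from datetime import date

TURKISH_MONTHS = [
    "Ocak", "Şubat", "Mart", "Nisan", "Mayıs", "Haziran",
    "Temmuz", "Ağustos", "Eylül", "Ekim", "Kasım", "Aralık",
]
ENGLISH_MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def turkish_variants(d):
    """Return the Turkish spellings of one date (only the three formats shown above)."""
    return [
        f"{d.day:02d}.{d.month:02d}.{d.year}",  # ASSUMPTION: zero-padded
        f"{d.day:02d}/{d.month:02d}/{d.year}",  # ASSUMPTION: zero-padded
        f"{d.day} {TURKISH_MONTHS[d.month - 1]} {d.year}",
    ]


def english_standard(d):
    """Return the single English target form, e.g. 'December 21, 2012'."""
    return f"{ENGLISH_MONTHS[d.month - 1]} {d.day}, {d.year}"


def date_pairs(d):
    """Pair every Turkish variant with the same English translation."""
    target = english_standard(d)
    return [(source, target) for source in turkish_variants(d)]


for source, target in date_pairs(date(2012, 12, 21)):
    print(f"{source} -> {target}")
# 21.12.2012 -> December 21, 2012
# 21/12/2012 -> December 21, 2012
# 21 Aralık 2012 -> December 21, 2012
```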

You can download this Turkish-English translation memory here: tren-tarihler-2010-2019.rar

P.S. This TMX maps many source forms to one target form, so it is not suitable for use in the other direction. For the English-to-Turkish version, please check the website.

Informatics Terminology Database (English-Turkish)

Turkish language service provider EC Tercume has just released a new terminology database for the informatics domain. This database, in TMX format, can be downloaded here:

http://www.sipsakceviri.com/indir/en/6/5/1/terminology_database/Informatics_Terminology_Database

It is stated that the database can be used bidirectionally after importing it into your own Computer Aided Translation (CAT) software (e.g. Déjà Vu X2, Trados, memoQ, OmegaT).
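If you want to inspect such a file outside a CAT tool, a TMX can be read with Python's standard library. The sketch below builds lookup tables in both directions from a tiny inline TMX. The sample terms and the function name `load_term_pairs` are made up for illustration and are not taken from the actual database, which may use different language codes (for example `en-US`).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# ASSUMPTION: a tiny hand-written sample, not content from the real database.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="phrase"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>database</seg></tuv>
      <tuv xml:lang="tr"><seg>veritabanı</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>software</seg></tuv>
      <tuv xml:lang="tr"><seg>yazılım</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_term_pairs(tmx_text, lang_a, lang_b):
    """Return (a_to_b, b_to_a) dictionaries built from the translation units."""
    root = ET.fromstring(tmx_text)
    a_to_b, b_to_a = {}, {}
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                lang = tuv.get(XML_LANG, "").lower()
                segs[lang] = "".join(seg.itertext()).strip()
        if lang_a in segs and lang_b in segs:
            a_to_b[segs[lang_a]] = segs[lang_b]
            b_to_a[segs[lang_b]] = segs[lang_a]
    return a_to_b, b_to_a


en_tr, tr_en = load_term_pairs(SAMPLE_TMX, "en", "tr")
print(en_tr["software"])    # yazılım
print(tr_en["veritabanı"])  # database
```

A one-to-one term list like this works in both directions, whereas a many-to-one memory would silently lose entries in the reverse dictionary.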

These computer terms come from the informatics glossary prepared by the Turkish Informatics Association.

Regular Translation Memories for Turkish English Translators

EC Tercume has just introduced its new translation resources page. It currently publishes some of the company's internal translation memories and dictionaries in TMX, the translation industry's standard format. There are currently 30,000 segments, which were released last week.

These Turkish-English translation memories can be found on the EC Tercume Translation Resources web page: http://www.sipsakceviri.com/indir/en/ The page itself is also bilingual.

BabelUnity – Translation Resources News

In BabelUnity we plan to inform you about new resources for the translation business. In short, here is what you will find:

  • Translation Memories
  • Terminology Databases
  • News on Computer Aided Translation tools
  • Innovative approaches in translation workflow
  • Standardized glossaries and dictionaries
  • Translation Quality performance metrics
  • Online linguistic tools
  • Articles on Natural Language Processing (NLP)

Enjoy…