Summary

The purpose of this project is to develop a corpus platform for language and linguistics researchers working on Turkish. The platform can be configured by each researcher according to his or her own research questions. It is flexible, easily accessible and database-supported, and within it the results obtained for those research questions can be properly reported.

As is well known, text-processing and corpus software developed for Western languages has data-processing features that are specific to those languages. Some of this software can be used for Turkish to a certain extent, because it supports Turkish characters. Its greatest drawback, however, is that the processed data and the reports obtained for particular research questions can be neither stored nor organized in a useful way. Being proprietary can be considered another of its shortcomings. For instance, frequently used software packages such as AntConc, WordSmith Tools, MonoConc Pro, TextStat and NooJ provide outputs such as frequency lists, concordances, keyword lists, n-grams and collocations for the languages they were developed for. These outputs can, to some extent, be exported as text files, but the researcher then ends up with yet another unfiltered and unanalyzed text collection derived from the original text collections. Another drawback is that this software is tied to particular operating systems. Even though some packages have versions for different operating systems such as UNIX, Mac and Windows, this dependence can make the programs harder to access and use.

Today, language and linguistics studies increasingly seek answers to more detailed research questions. At the same time, the limitations of existing corpus studies on Turkish have steadily strengthened the tendency to examine Turkish with the methods of corpus linguistics. Moreover, learning to use text-processing and corpus software is a problem for researchers in its own right. There is therefore a need for user-friendly, easily accessible and flexible platforms that can be shaped around research questions. The user-friendly “Do It Yourself Corpora Platform” (DIYCP), which will be used entirely through a web browser, can turn all of the drawbacks mentioned above to the user's advantage. After all, it does not seem practical for every linguist to acquire the software skills needed to build and process corpora.

The proposed “Do It Yourself Corpus Platform” will be built on the theoretical and practical principles and methods of corpus linguistics, a field whose relevance grows every day. In this context, rather than offering researchers the existing standard corpus outputs, the project aims to provide a flexible, easily accessible, database-supported corpus platform that each researcher can shape according to his or her own research questions, and in which research results can be reported in full. The project team has already worked with the components of the integrated system that will form the basis of this platform, obtaining successful results with them in different corpus projects. Through this project, the team intends to make that groundwork available to other linguists.

In recent years, advances in information technology worldwide have increased both the quality and the quantity of scientific studies in language and linguistics. The proposed project is expected to bring a similar increase in the quality and quantity of studies on Turkish. Researchers will be able to carry out corpus-based studies in the sub-branches of linguistics that fall within applied linguistics, such as grammar, dialectology, translation studies, historical grammar and language variation, language learning and teaching, semantics, pragmatics, sociolinguistics, discourse analysis and poetics (McEnery et al., 2006). For instance, a researcher interested in the semantics of a word or sentence will be able to decide which layers and texts his or her own corpus will contain, enter the metadata of those layers and texts, and mark the language units of interest with labels of his or her own choosing, in line with the research questions. Finally, he or she will be able to report all of this properly.
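The workflow described above (choosing layers and texts, entering their metadata, marking language units with researcher-defined labels, and reporting) can be pictured as a small data model. The Python sketch below is purely illustrative and hypothetical: the class names `Corpus`, `Layer`, `Text` and `Annotation`, the token-span annotation scheme and the label report are assumptions made for this example, not the actual design of DIYCP, which is a browser-based, database-supported platform.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical: a researcher-defined label attached to a span of tokens."""
    start: int   # index of the first token in the marked unit
    end: int     # index one past the last token (exclusive)
    label: str   # free-form label chosen by the researcher


@dataclass
class Text:
    title: str
    tokens: list[str]
    metadata: dict[str, str] = field(default_factory=dict)
    annotations: list[Annotation] = field(default_factory=list)

    def mark(self, start: int, end: int, label: str) -> None:
        """Attach a label to tokens[start:end] after checking the span is valid."""
        if not 0 <= start < end <= len(self.tokens):
            raise ValueError(f"invalid span {start}:{end} for {self.title!r}")
        self.annotations.append(Annotation(start, end, label))


@dataclass
class Layer:
    name: str
    metadata: dict[str, str] = field(default_factory=dict)
    texts: list[Text] = field(default_factory=list)


@dataclass
class Corpus:
    name: str
    layers: list[Layer] = field(default_factory=list)

    def label_report(self) -> Counter:
        """Count occurrences of each label, keyed by (layer name, label)."""
        report = Counter()
        for layer in self.layers:
            for text in layer.texts:
                for ann in text.annotations:
                    report[(layer.name, ann.label)] += 1
        return report

    def examples(self, label: str) -> list[str]:
        """Return the text of every span marked with the given label."""
        return [
            " ".join(text.tokens[a.start:a.end])
            for layer in self.layers
            for text in layer.texts
            for a in text.annotations
            if a.label == label
        ]


if __name__ == "__main__":
    # Hypothetical layer and text; metadata values are illustrative only.
    news = Layer("news", {"period": "2010-2015"})
    sample = Text("sample-1", "Ali eve geldi ve kapıyı açtı .".split(), {"source": "example"})
    sample.mark(0, 1, "subject")
    sample.mark(2, 3, "predicate")
    sample.mark(5, 6, "predicate")
    news.texts.append(sample)

    corpus = Corpus("my-corpus", [news])
    print(corpus.label_report())
    print(corpus.examples("predicate"))
```

In this sketch the labels are plain strings, so each researcher can use a tag set suited to his or her own research questions; a real platform would presumably persist these objects in a database rather than in memory.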

Project Calendar:

Start Date: 01/06/2015

Due Date: 01/06/2016

This project is supported within the scope of TUBITAK 1005 – National New Ideas and Products Research Support Program
Project No: 114E791