Projects
2025-... TDK Ulusal Türkiye Derlemi Projesi (Turkish Language Association National Turkish Corpus Project)
Role: Academic Coordinator/Consultant for the Spoken sub-Corpora
Project Team: DISCORE members
News: https://www.instagram.com/p/DF8BxtmCwwd/
2025-... English-Medium Instruction Corpus (EMIC) Project
The English-Medium Instruction Corpus (EMIC) is a multimodal and interdisciplinary spoken academic corpus being developed at Middle East Technical University (METU) by the METU EMI Research Network, whose members are also part of DISCORE.
EMIC offers an extensive and principled collection of naturally occurring EMI classroom interactions across a wide range of disciplines and instructional formats. Spanning data collected between 2021 and 2025, with collection ongoing, EMIC currently includes full transcriptions of over 90 hours of video-recorded EMI classroom interactions. The speech event types represented in EMIC include lectures, seminars, and active learning environments such as labs and studios, capturing the diversity of instructional discourse and pedagogical practices in real time. The corpus currently includes data from more than 30 departments across six major disciplinary areas: Arts and Humanities; Design and Architecture; Educational and Applied Sciences; Engineering and Technology; Natural and Life Sciences; and Social Sciences and Management. These disciplinary categories were defined by cross-referencing course codes, instructional practices, and disciplinary discourse norms, acknowledging the complexity and intersectionality of disciplinary classification (see Işık-Güler, Turan, Şimşek-Tontuş & Köse, 2024).
With its comprehensive scope, EMIC is comparable to other available academic corpora (e.g., BASE, MICASE, ELFA, EmiBO) but differs from them in its recency and in its distinctive approach to representativeness, interactivity, and multimodality. All EMIC data are transcribed using Jeffersonian transcription conventions. A selected portion is currently being annotated for gestures, spatial movement, and other non-verbal modes (see EMIGeCo, Şimşek-Tontuş, 2025). Our approach to transcription and annotation supports not only corpus linguistic analyses but also multimodal conversation analysis (CA) of academic communication. EMIC offers researchers and educators insights into: (a) real-time EMI instructional language and pedagogical discourse, (b) disciplinary variation in EMI classroom interaction, (c) patterns of student participation and lecturer questioning, (d) turn-taking, word count, and interactional density, and (e) the integration of multimodal teaching resources. Through its robust data architecture and interdisciplinary design, EMIC aims to contribute to a deeper understanding of EMI discourse and to provide an empirical foundation for pedagogy, policy, and further corpus-based research in multilingual higher education.
2022-2025 (42 months) TUBITAK 1001 (The Turkish Scientific and Technological Research Projects Funding Program)
Project code 121K227: Developing an in-service training framework for faculty members in English-medium universities based on classroom interaction data.
Role: Project Manager (Research team from METU, Bosphorus, Bilkent and Kadir Has Universities)
This project investigated teaching and learning procedures at English-medium instruction (EMI) universities in Turkey, with the aim of identifying effective and ineffective practices through corpus-assisted discourse analysis. A total of 150 hours of video recordings were obtained from over 40 disciplines/undergraduate programs in 10 faculties at five higher education institutions in Turkey that use English as the medium of instruction. Student surveys and stimulated recall protocols with instructors were used as supplementary data alongside the English-medium classroom interaction data. Based on the effective and ineffective practices identified in this evidence-based, authentic data, an EMI classroom interaction framework was developed. The outputs include practical suggestions for institutions in Turkey and abroad that currently implement, or wish to implement, EMI as their institutional language policy.
(Grant awarded: 1,826,158.00 TL)
For more information, please visit the METU-EMI Network's website.
2022-2024 The Corpus of Turkish Youth Language (CoTY) (Esranur Efeoğlu, Advisor: Hale Işık Güler)
CoTY was compiled with the following aims:
- describing the architecture of Turkish youth language in terms of its macro and micro structures,
- exploring the socio-pragmatic dynamics and patterns in dyadic and multi-party interaction,
- identifying and discussing the discursive strategies employed in the co-construction of specific interactional events,
- highlighting potential linguistic and discursive trends in contemporary Turkish.
The current version of the Corpus of Turkish Youth Language (CoTY) comprises 168,748 tokens of 24,736 word types within the single domain of informal conversation exclusively among friends. The corpus has 123 unique speakers (62 female, 61 male) and consists of 49 conversations totaling 26 hours and 11 minutes of interaction. The conversations are in Turkish, with occasional code-switches to English and some words or expressions from French, Russian, and Japanese. For more information, please visit: https://www.esraefeogluozcan.com/coty/
2020-2023 Turkish Social Media Influencer Corpus (SMIC) (Hülya Mısır, Advisor: Hale Işık Güler)
SMIC includes multimodal transcriptions of 30 vlogs by six Turkish macro-influencers (12 hours 37 minutes) and contains 120,906 tokens. The corpus was built in ELAN, in which textual, semiotic, and multimodal elements were annotated on hierarchically interconnected tiers. SMIC was used to examine the characteristics of the vlog genre and the speakers' translanguaging practices. Ad hoc annotation identified patterns of translanguaging in which influencers seamlessly blended languages and created hybrid linguistic repertoires. The findings illustrate the co-occurrence of standardized linguistic codes and non-standardized forms, the organic evolution of lexical innovations (such as net neologisms and genre-related digital lexis), phonetic transliterations, idiosyncratic expressions, and marketing terminology in Turkish influencer talk.
2018-2022 Corpus of Student Written Language Project/Türkçe Söz Varlığı Projesi (TSVP)
Aim: Building a 'Turkish Nation-wide Corpus for Student Written Language' for Grades 2 through 12 in Turkey (project led by the Ministry of National Education of Turkey, Board of Education (Talim ve Terbiye Kurulu Başkanlığı, TTKB))
A news article about the project: https://www.aa.com.tr/tr/egitim/meb-1-milyon-ogrenciden-turkce-soz-varli...
Scientific Advisor & Academic Coordinator for the province of Ankara (Hale Işık-Güler; Esranur Efeoğlu-Özcan)
2019-2022 Call Center Interaction Project (Lead: Hale Işık Güler; Researchers: Merve Bozbıyık, Esranur Efeoğlu)
A Conversation Analysis (CA)-informed investigation of call center operator-client interactions in Turkey.