Schweizer Textkorpus

The Swiss Text Corpus is dedicated to the standard German language of Switzerland in the 20th and 21st centuries. The texts are chosen based on criteria concerning form, content and time of publication. The digital collection currently includes 23.5 million words (tokens). The text corpus constitutes a balanced representation of Swiss German vocabulary and can serve as a basis for lexicographical questions specific to German-speaking Switzerland. Further texts from the 21st century will be added by 2025.

The Swiss Text Corpus project was developed as part of the international research project Korpus C4. The collaboration with partner projects from Germany, Austria and Italy aimed to record and publish online a balanced snapshot of 20th-century standard German. For this purpose, German texts of every kind (newspaper articles, advertisements, forms, manuals, guidebooks, popular technical literature, literature for young people, light fiction, fiction etc.) were digitised. A first version of Korpus C4 has been online since April 2009, although it has not yet reached its full size. It is nevertheless the first balanced corpus for 20th-century standard German that considers regional variation and can be used for different linguistic research questions.

The Swiss Text Corpus was built by a research group of the Deutsches Seminar of the University of Basel and was funded mainly by the Swiss National Science Foundation. Since 2014 it has been hosted by the Schweizerisches Idiotikon with financial support from the Swiss Academy of Humanities and Social Sciences.
