The Swiss Text Corpus is part of an international research project that aims to record and publish online a balanced snapshot of 20th-century Standard German. To this end, German-language texts of every kind (newspaper articles, advertisements, forms, manuals, guidebooks, popular technical literature, literature for young people, light fiction, fiction, etc.) are digitised. The Swiss sub-project, the Swiss Text Corpus, contains German-language texts written by Swiss authors in the 20th century. This digital collection is structured analogously to those of the partner projects in Germany, Austria and Italy, using the same formal, temporal and content criteria. It offers a balanced representation of Standard German vocabulary in Switzerland and can serve as a base resource for specifically Swiss lexicographical needs.
The digital text corpus developed jointly with our partner projects in Germany, Austria and Italy is called Korpus C4. Its planned size is 80 million words. A first version of Korpus C4 has been online since April 2009 (http://www.korpus-c4.org); however, it has not yet reached its full size. For the first time, there is a balanced corpus of 20th-century Standard German that takes regional variation into account and can be used for a range of linguistic research questions.
The corpus is currently being expanded to include Swiss Standard German texts from the 21st century. By the end of 2018, it will encompass an additional 3.8 million words of text from the 21st century, and it will then be expanded on a regular basis.
The Swiss Text Corpus was built by a research group at the Deutsches Seminar of the University of Basel and was funded mainly by the Swiss National Science Foundation. Since 2014, it has been hosted by the Schweizerisches Idiotikon with financial support from the Swiss Academy of Humanities and Social Sciences.