
From the very beginning, the Swiss Text Corpus was designed to cover the vocabulary of 20th-century Standard German in Switzerland as broadly as possible. The corpus consists of printed and typewritten texts in a wide range of production and publication forms. It is balanced according to criteria of form, time and content:

  • Text class: formal criterion
  • Quarter of century: time criterion
  • Domain: content criterion

With this structure, the Swiss Text Corpus is a balanced data resource for all kinds of linguistic research questions.

The Swiss Text Corpus contains the following amounts of text, broken down by text class and quarter century:

 

                    1900-1924             1925-1949             1950-1974             1975-1999                 total
Text class          documents      words  documents      words  documents      words  documents      words  documents
functional texts        1'042  1'170'099      1'465  1'267'731        969  1'193'200      1'417  1'087'395      4'893
factual texts             167  1'450'562        433  2'052'909        804  1'954'529        276  1'891'373      1'680
fiction                   188  1'116'820         50  1'248'911        159  1'122'447         59  1'149'111        456
journalistic texts        833    513'728      1'107  1'020'160        993    982'098      1'929  1'135'426      4'862
total                   2'230  4'251'209      3'055  5'589'711      2'925  5'252'274      3'681  5'263'305     11'891
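
The figures above can be cross-checked with a few lines of Python. The following is only a minimal sketch: the names `PERIODS`, `COUNTS`, `totals_by_class` and `totals_by_period` are illustrative assumptions and not part of any official Swiss Text Corpus tooling. The numbers are transcribed from the table, with the apostrophes used as thousands separators dropped.

```python
# Quarter centuries covered by the corpus, in table order.
PERIODS = ["1900-1924", "1925-1949", "1950-1974", "1975-1999"]

# (documents, words) per text class and period, transcribed from the table.
COUNTS = {
    "functional texts": [(1042, 1170099), (1465, 1267731), (969, 1193200), (1417, 1087395)],
    "factual texts": [(167, 1450562), (433, 2052909), (804, 1954529), (276, 1891373)],
    "fiction": [(188, 1116820), (50, 1248911), (159, 1122447), (59, 1149111)],
    "journalistic texts": [(833, 513728), (1107, 1020160), (993, 982098), (1929, 1135426)],
}


def totals_by_class(counts):
    """Sum documents and words over all periods for each text class."""
    return {
        name: (sum(d for d, _ in cells), sum(w for _, w in cells))
        for name, cells in counts.items()
    }


def totals_by_period(counts, periods):
    """Sum documents and words over all text classes for each period."""
    return {
        period: (
            sum(cells[i][0] for cells in counts.values()),
            sum(cells[i][1] for cells in counts.values()),
        )
        for i, period in enumerate(periods)
    }


if __name__ == "__main__":
    # Print per-class totals and the average document length in words.
    for name, (docs, words) in totals_by_class(COUNTS).items():
        print(f"{name}: {docs} documents, {words} words, "
              f"{words / docs:.0f} words per document")
```

Summing this way reproduces the document totals given in the last column and the per-period totals in the last row. It also makes the balancing visible: the word counts per text class and period are far more even than the document counts, because document length varies strongly between text classes.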