
From the very beginning, the Swiss Text Corpus was structured to cover the vocabulary of 20th-century standard German in Switzerland as broadly as possible. The corpus consists of printed and typewritten texts that differ widely in how they were produced and published. It is balanced according to time, form and content criteria:

  • Text class: formal criterion
  • Quarter of century: time criterion
  • Domain: content criterion
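As a rough illustration of how the time and form criteria partition the corpus, the sketch below assigns a document to one cell of the breakdown by text class and period, assuming its publication year and text class are already known. The period boundaries and the four text-class labels are taken from the corpus table; the helper names `period_label` and `stratum` are hypothetical, and the domain criterion is left out because its categories are not listed on this page.

```python
# Period bins as used in the corpus table. The last bin (2000-2018) is
# shorter than a full quarter century.
PERIODS = [
    (1900, 1924),
    (1925, 1949),
    (1950, 1974),
    (1975, 1999),
    (2000, 2018),
]

# Text classes (formal criterion) as labelled in the corpus table.
TEXT_CLASSES = ("functional texts", "factual texts", "fiction", "journalistic texts")


def period_label(year: int) -> str:
    """Return the period label (e.g. '1925-1949') that a publication year falls into."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    raise ValueError(f"year {year} is outside the corpus range 1900-2018")


def stratum(year: int, text_class: str) -> tuple:
    """Hypothetical helper: the (period, text class) cell a document is counted in.

    The domain (content criterion) is not modelled, since its categories
    are not given here.
    """
    if text_class not in TEXT_CLASSES:
        raise ValueError(f"unknown text class: {text_class!r}")
    return period_label(year), text_class
```

For example, `stratum(1968, "fiction")` yields `("1950-1974", "fiction")`.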

Thanks to this structure, the Swiss Text Corpus is a balanced data resource for a wide range of linguistic research questions.

The Swiss Text Corpus contains the following amounts of text, broken down by text class and quarter century (number of documents / number of words):

Text class          1900-1924           1925-1949           1950-1974           1975-1999           2000-2018           Total
                    documents / words   documents / words   documents / words   documents / words   documents / words   documents
functional texts    1'042 / 1'170'099   1'465 / 1'267'731   969 / 1'193'200     1'417 / 1'087'395   1'238 / 962'316     6'131
factual texts       167 / 1'450'562     433 / 2'052'909     804 / 1'954'529     276 / 1'891'373     898 / 980'125       2'578
fiction             188 / 1'116'820     50 / 1'248'911      159 / 1'122'447     59 / 1'149'111      40 / 944'405        496
journalistic texts  833 / 513'728       1'107 / 1'020'160   993 / 982'098       1'929 / 1'135'426   1'267 / 970'559     6'129
total               2'230 / 4'251'209   3'055 / 5'589'711   2'925 / 5'252'274   3'681 / 5'263'305   3'443 / 3'857'405   15'334
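The figures in the table can be cross-checked with a few lines of code. The sketch below is a hypothetical helper, not a tool distributed with the corpus: it stores the per-class counts from the table as (documents, words) pairs and sums them per period and per text class, which should reproduce the total row and the total column.

```python
# Counts copied from the table: for each text class, one (documents, words)
# pair per period, in the order of PERIODS.
PERIODS = ["1900-1924", "1925-1949", "1950-1974", "1975-1999", "2000-2018"]

COUNTS = {
    "functional texts": [(1042, 1170099), (1465, 1267731), (969, 1193200), (1417, 1087395), (1238, 962316)],
    "factual texts": [(167, 1450562), (433, 2052909), (804, 1954529), (276, 1891373), (898, 980125)],
    "fiction": [(188, 1116820), (50, 1248911), (159, 1122447), (59, 1149111), (40, 944405)],
    "journalistic texts": [(833, 513728), (1107, 1020160), (993, 982098), (1929, 1135426), (1267, 970559)],
}


def period_totals(counts):
    """Sum documents and words over all text classes, one (docs, words) pair per period."""
    totals = []
    for i in range(len(PERIODS)):
        docs = sum(rows[i][0] for rows in counts.values())
        words = sum(rows[i][1] for rows in counts.values())
        totals.append((docs, words))
    return totals


def class_doc_totals(counts):
    """Total number of documents per text class across all periods."""
    return {cls: sum(docs for docs, _ in rows) for cls, rows in counts.items()}


if __name__ == "__main__":
    for period, (docs, words) in zip(PERIODS, period_totals(COUNTS)):
        print(f"{period}: {docs} documents, {words} words")
    print("documents per class:", class_doc_totals(COUNTS))
    print("all documents:", sum(class_doc_totals(COUNTS).values()))
    print("all words:", sum(words for _, words in period_totals(COUNTS)))
```

Summing the words over all periods in the same way gives 24'213'904 words for the corpus as a whole, a figure the table itself does not list.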