SciELO - Scientific Electronic Library Online




Revista signos

On-line version ISSN 0718-0934


OLMOS, Ricardo; LEON, José Antonio; ESCUDERO, Inmaculada and JORGE-BOTANA, Guillermo. An analysis of size and specificity of corpora in the assessment of summaries using LSA: A comparative study between LSA and human raters. Rev. signos [online]. 2009, vol.42, n.69, pp.71-81. ISSN 0718-0934.

Latent Semantic Analysis (LSA) is an automatic statistical method for representing the meanings of words and text passages. A growing body of evidence supports the reliability of LSA as a tool for assessing semantic similarity between units of discourse, and its similarity judgments have proved comparable to those of human raters. The tool acquires its mathematical representation of texts by first analyzing a linguistic corpus composed of digitized documents. The main objective of this study was to determine what properties (general, condensed, diversified, and base corpus) a linguistic corpus should have so that LSA's assessment of summaries is as similar as possible to the assessment made by four human raters. Three hundred and ninety Spanish middle and high school students (14-16 years old) and undergraduate students read a narrative text and later summarized it. Findings indicate that, for the assessment to be satisfactorily efficient, the corpora need not be as general or as large as those used in Boulder (made up of millions of texts and over one million words), nor should they be too specific (fewer than 300 texts and 5000 words).

Keywords: Latent Semantic Analysis (LSA); summary; discourse assessment; linguistic corpus; university students.
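
To make the procedure described in the abstract concrete, the following is a minimal sketch of how LSA can score a summary against its source text: a term-by-document matrix is built from a corpus, reduced with a truncated singular value decomposition, and texts are then compared by cosine similarity in the reduced space. This is not the authors' pipeline. The four-document toy corpus, the raw word counts (no term weighting), the choice of k = 2 dimensions, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword list.
    return text.lower().split()

def build_term_doc_matrix(docs):
    """Return a (terms x documents) count matrix and a word -> row index map."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[index[w], j] += 1.0
    return A, index

def fit_lsa(A, k):
    """Keep the k left singular vectors with the largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

def project(text, index, U_k):
    """Map a text into the k-dimensional latent space; unknown words are ignored."""
    q = np.zeros(len(index))
    for w in tokenize(text):
        if w in index:
            q[index[w]] += 1.0
    return q @ U_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy corpus with two topics (animals, finance).
corpus = [
    "cat chased mouse",
    "mouse hid cat",
    "stocks fell market",
    "market rallied stocks",
]
A, index = build_term_doc_matrix(corpus)
U_k = fit_lsa(A, k=2)

source_text = "cat chased mouse mouse hid cat"
good_summary = "the cat chased a mouse"   # on-topic summary
bad_summary = "stocks fell"               # off-topic summary

src_vec = project(source_text, index, U_k)
sim_good = cosine(src_vec, project(good_summary, index, U_k))
sim_bad = cosine(src_vec, project(bad_summary, index, U_k))
print(sim_good, sim_bad)
```

In this sketch the on-topic summary scores higher than the off-topic one. The corpus is the only source of semantic knowledge here, which is why its size and specificity, the variables examined in the study, directly shape the scores that are compared with human ratings.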



Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.