The Factorial Structure of Written Hebrew and its Application to AES
Anat Ben-Simon & Yael Safran
- Abstract -
In 2000, NITE launched the Hebrew Language Project (HLP), whose goal is to develop computational tools for the analysis and evaluation of Hebrew texts. This paper summarizes the initial development, analysis, and organization of machine-generated statistical and natural language processing (NLP) text features, as well as the mapping of the underlying structure of written Hebrew through an analysis of how these features are structured. To this end, the paper reports the results of two successive studies.
The purpose of the first study was to examine the characteristics of 133 machine-generated quantified features, to identify those most relevant to text difficulty and writing quality, and to combine them into empirically based, theoretically meaningful linguistic categories. The study also examined how the text-feature clustering model affects the accuracy of the automated score. To attain these goals, a three-stage analysis was carried out on two text corpora and two essay corpora.
The second study focuses on the factorial structure of writing ability and on validating the machine-generated text features used to predict it. A factor analysis of the selected automated essay scoring (AES) features, applied to five essay corpora, revealed three AES dimensions: lexical complexity (fluency), topical analysis (content), and vocabulary. However, these AES dimensions did not align with raters' scores on compatible or closely related dimensions.