The SVM produces two weighted sets of words, male and female, which, taken together, discriminate between texts from the two corpora as effectively as the algorithm is able to manage. Words that might exhibit interesting distributions but that do not fit well into a particular model will not be assigned high weights and will escape our notice.
Therefore, it is useful to perform a variety of machine learning runs, find what works, and search for common threads in the results.
Ultimately, results must find support from a knowledgeable reading of the texts and be fitted with a critical hypothesis to be of great interest from the literary scholar's point of view, although predictive models may have practical uses, such as adding guessed metadata to unclassified documents, independent of their critical value or validity.

Experimental Design

SVM has proven to be a model well-suited for text classification, and our initial tests showed that SVMLight achieved the best classification accuracy among the learning algorithm implementations at our disposal, including naive Bayesian and decision tree learners.
The SVMLight implementation is freely available and includes key capabilities such as cross-validated accuracy measures via leave-one-out estimation and the ability to extract the weights assigned to each feature. The ability to interrogate the model in this way is essential, because without it we would learn nothing about what word usage patterns distinguish male writing from female writing, merely that such a distinction can be learned with a particular degree of accuracy.
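To make the idea of leave-one-out estimation concrete, here is a minimal sketch of the naive procedure: each document is held out in turn, a classifier is trained on the rest, and the held-out document is predicted. A nearest-centroid classifier stands in for the SVM purely to keep the example self-contained. It is not the learner used in the study, and SVMLight computes its own leave-one-out estimates internally rather than by this brute-force loop. The function names and toy data are assumptions.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train_x, train_y, x):
    """Nearest-centroid stand-in classifier (assumption, not an SVM): returns 1 or -1."""
    pos = centroid([v for v, y in zip(train_x, train_y) if y == 1])
    neg = centroid([v for v, y in zip(train_x, train_y) if y == -1])
    return 1 if sq_dist(x, pos) <= sq_dist(x, neg) else -1

def leave_one_out_accuracy(xs, ys):
    """Hold out each example once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Toy, clearly separable data: 1 for male-authored, -1 for female-authored.
xs = [[2.0, 0.0], [3.0, 1.0], [2.5, 0.5], [-2.0, 0.0], [-3.0, -1.0], [-2.5, 0.5]]
ys = [1, 1, 1, -1, -1, -1]
accuracy = leave_one_out_accuracy(xs, ys)
```

Because every document is scored by a model that never saw it, the resulting figure estimates how well the learned distinction generalizes rather than how well the model memorized its training set.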
A black-box model may be adequate for industrial applications, where the goal is to classify unclassified instances with a certain accuracy, but in this experiment, where the correct classification is already known for all texts, we are far more interested in picking apart the constructed model to determine the orientation and magnitude of the weights of individual words.
For our preliminary experiments, we prepared 8 sets of vectors, covering two collections (the full document corpus and a document subset) in four versions each: the surface forms of the words, the lemmas, the parts of speech (POS) of the words as assigned by TreeTagger, and a simplified part-of-speech grouping with broader categories (POSgroup).
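As a rough illustration of how one such set of vectors might be assembled, the sketch below builds one vector per document over a shared vocabulary of surface forms. The tokenizer, the relative-frequency weighting, the function names, and the toy documents are all assumptions made for illustration; the text does not specify how the features were weighted. The lemma, POS, and POSgroup versions would substitute tagger output for the surface tokens.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters (a simplistic, assumed tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def build_vectors(documents):
    """Return (vocabulary, vectors); each vector holds relative word frequencies."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero for empty documents
        vectors.append([c[w] / total for w in vocabulary])
    return vocabulary, vectors

# Toy documents with labels: 1 for male-authored, -1 for female-authored.
docs = ["Le roi parle au roi", "Elle aime son amie"]
labels = [1, -1]
vocabulary, vectors = build_vectors(docs)
```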
Each matrix consisted of either or vectors, labeled with 1 for male-authored and -1 for female-authored documents.

Machine Learning Runs

We then trained SVMLight on each matrix, and obtained the accuracies given in Tables 1 and 2, after cross-validation. This is a significant result and indicates that the model has indeed found generalizable differences between the texts in the two corpora. The differences in accuracy between the surface and lemma forms of the words are insignificant, and the POS and POSgroup accuracy differences are generally quite slight as well.
Naturally, the more accurate our model is, the more importance we can attach to the words the model weights toward each author gender. In order to test whether our accuracies were an artifact of the classifier used, rather than demonstrative of true differences between our corpora, we performed the same experiment but with each document randomly labeled as male or female, regardless of true author gender.
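The random-labeling control described above can be sketched as follows. Only the label assignment is shown; the classifier would then be retrained and cross-validated on these labels exactly as before. The function names, the seed, and the toy label counts are assumptions for illustration.

```python
import random

def random_labels(n_documents, seed=0):
    """Assign 1 (male) or -1 (female) to each document at random,
    ignoring the true author gender."""
    rng = random.Random(seed)  # fixed seed so the control run is repeatable
    return [rng.choice([1, -1]) for _ in range(n_documents)]

def agreement(labels_a, labels_b):
    """Fraction of positions at which two label lists agree."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical corpus of 100 documents, half per class.
true_labels = [1] * 50 + [-1] * 50
shuffled = random_labels(len(true_labels), seed=42)
overlap = agreement(true_labels, shuffled)
```

If cross-validated accuracy on randomly assigned labels stays close to chance while accuracy on the true labels is well above it, the classifier is less likely to be manufacturing distinctions on its own.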
We can try to learn from our failures here. The fact that SVMLight cannot construct a very accurate prediction model based on POS vectors is a kind of weak evidence against any theory of gendered authorship that holds that men and women speak radically different languages. If, in fact, men and women used the basic building blocks of language in substantially different ways, we might expect to see strong mechanical differences between male and female writing reflected in POS usage rates that the model could exploit to make accurate classifications.
That such differences do not widely obtain in this corpus is strongly suggested by the inability of SVMLight to construct a very accurate model to distinguish between the gendered corpora on that basis. Of course, this does not rule out mechanical and stylistic differences that aren't reflected in the simple metric of POS frequencies, but it does suggest a base level of linguistic similarity between the two classes. Based on these initial results, we decided to proceed with further experiments using the surface forms of the words, that being the simplest method and tied for most accurate with the lemmatized forms.
Now that we were comfortable that the accuracy of our models was high enough to indicate real differences between our corpora, we investigated the internals of those models to determine where they get their predictive power.
We began by extracting the weights assigned to each word in the 2 x surface form features SVMLight model, and sorting them in descending order of magnitude. Words oriented toward male authorship are scored as positive decimals, while those pointing toward female authorship are negative decimals.
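The ranking step can be sketched as below, assuming the per-word weights have already been exported from the trained model into a dictionary. The words and values here are placeholders invented for illustration, not results from the study, and the function names are hypothetical.

```python
def rank_by_magnitude(weights):
    """Sort (word, weight) pairs by absolute weight, largest first."""
    return sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)

def split_by_orientation(ranked):
    """Positive weights point toward male authorship, negative toward female."""
    male = [(word, w) for word, w in ranked if w > 0]
    female = [(word, w) for word, w in ranked if w < 0]
    return male, female

# Placeholder weights, invented purely for illustration.
weights = {"alpha": 0.12, "beta": -0.40, "gamma": 0.05, "delta": -0.02}
ranked = rank_by_magnitude(weights)
male_words, female_words = split_by_orientation(ranked)
```

Sorting by absolute value first keeps the strongest features of both orientations at the top of a single list, which can then be split by sign for reading.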
We obtained the weights of the most influential words in the model, given in Table 5. Such terms are gifts to the machine learner, greedily seized upon by our classification model but unlikely to generate any penetrating insight for the scholar. Proper names are the prime example of such features, and we saw several in Table 5, Consuelo being the highest-ranked of these. We eliminated terms like Consuelo (present in a number of works by Sand) from the input our model receives by stipulating that we would only use words that occur in more than a certain percentage of documents in the corpora.
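A document-frequency cutoff of this kind might look like the following sketch. The threshold value, the function name, and the toy documents are assumptions; the text says only that words had to occur in more than a certain percentage of documents.

```python
def document_frequency_filter(tokenized_docs, min_fraction):
    """Keep words that occur in more than min_fraction of the documents."""
    n_docs = len(tokenized_docs)
    doc_freq = {}
    for tokens in tokenized_docs:
        for word in set(tokens):  # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w for w, df in doc_freq.items() if df / n_docs > min_fraction}

# Toy tokenized documents; "consuelo" appears in only one of them.
docs = [
    ["consuelo", "aime", "la", "musique"],
    ["la", "guerre", "et", "la", "paix"],
    ["il", "aime", "la", "mer"],
    ["la", "nuit", "tombe"],
]
kept = document_frequency_filter(docs, min_fraction=0.25)
```

With a cutoff of 0.25, a word confined to one of the four documents is dropped however often it occurs there, which is the behavior wanted for character names concentrated in a few works.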
Several patterns are evident. This is not an unexpected finding given the observation of Olsen [Olsen] of a usage rate for these terms among female authors that is nearly 1. These results are striking in that they replicate almost exactly those of a similar analysis of female- and male-authored texts in the British National Corpus (BNC) [Argamon]. Although reflexive pronouns are not expressed by a single word in French as they are in English, and hence do not show up distinctly in our analysis, the rest of the findings match almost exactly.
The issue of reflexive pronouns might be investigated in subsequent tests by using word bigrams as features rather than, or in addition to, single words. The strong agreement between these two experiments is all the more remarkable for the very different texts involved in these two studies.
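A minimal sketch of word-bigram extraction is given below, assuming the documents are already tokenized. The example sentence and the function name are illustrative assumptions; the point is only that a pronoun and the verb that follows it become a single countable feature.

```python
from collections import Counter

def word_bigrams(tokens):
    """Return the list of adjacent word pairs in a token sequence."""
    return list(zip(tokens, tokens[1:]))

# Toy token sequence containing repeated pronoun sequences.
tokens = ["il", "se", "lave", "et", "il", "se", "repose"]
bigram_counts = Counter(word_bigrams(tokens))
```

The resulting counts can be appended to, or used in place of, the single-word features when building the document vectors.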
Argamon et al. This cross-linguistic similarity could be supported with further research in additional languages.