DEVELOPMENT OF A MACHINE LEARNING ALGORITHM TO PREDICT AUTHOR’S AGE FROM TEXT
DOI:
https://doi.org/10.29121/granthaalayah.v7.i10.2019.408
Keywords:
Author Profiling, Machine Learning, Binary Classification, Natural Language Processing
Abstract [English]
Author age prediction is the task of determining an author's age from the texts they have written. Predicting an author's age can shed light on the trends, opinions, and social and political views of an age group. Marketers often use this information to promote a product or service to an age group according to its expressed interests and opinions. Methods in natural language processing have made it possible to predict an author's age from text by examining variation in linguistic characteristics, and many machine learning algorithms have been applied to the task. In social networks, however, computational linguists face numerous challenges, and machine learning techniques, being performance driven, have their own challenges in realistic scenarios. This work developed a model that predicts an author's age from text using a machine learning algorithm (Naïve Bayes) and three types of features: content-based, style-based, and topic-based. The trained model achieved a prediction accuracy of 80%.
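As a rough illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of a multinomial Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation. The abstract does not specify the dataset, the age groups, or the exact features, so everything here is an assumption made for illustration: the toy training sentences, the binary labels "younger" and "older", the helper `extract_features`, and the two style markers. Topic-based features are omitted for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Turn a text into feature tokens (hypothetical feature set).

    Content-based: the lowercased words themselves.
    Style-based: two illustrative markers, not taken from the paper.
    """
    words = re.findall(r"[a-z']+", text.lower())
    features = list(words)
    if "!" in text:
        features.append("__HAS_EXCLAMATION__")
    if words and sum(len(w) for w in words) / len(words) > 5:
        features.append("__LONG_WORDS__")
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()               # documents per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for feature in extract_features(text):
                self.feature_counts[label][feature] += 1
                self.vocab.add(feature)
        return self

    def log_scores(self, text):
        """Return log prior + summed log likelihoods for each class."""
        total_docs = sum(self.class_counts.values())
        features = [f for f in extract_features(text) if f in self.vocab]
        scores = {}
        for label, n_docs in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feature in features:
                count = self.feature_counts[label][feature]
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data invented for this sketch; real work would need a labeled corpus.
train_texts = [
    "omg this party was so lit lol!",
    "cant wait for the weekend lol, school is boring!",
    "my homework is due tomorrow omg",
    "The quarterly mortgage statement arrived yesterday.",
    "Our grandchildren visited during retirement planning week.",
    "The committee reviewed the pension investment proposal.",
]
train_labels = ["younger"] * 3 + ["older"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("lol school was boring omg"))
print(model.predict("the pension committee reviewed the mortgage"))
```

Encoding style markers as extra tokens keeps all features inside a single multinomial model, which is one simple design choice among several; a fuller treatment might model numeric style features separately and would evaluate accuracy on held-out data rather than on hand-picked sentences.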
License
Under the CC-BY licence, authors retain copyright while allowing anyone to download, reuse, reprint, modify, distribute, and/or copy their contribution. The work must be properly attributed to its author.
It is not necessary to ask for further permission from the author or journal board.
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.