RCR Forum: Clustering and Classifying Textual Corpora (GS717.10)

This workshop will equip students with a general understanding of document clustering and classification techniques for research. We'll use the Orange data analysis platform and the open-source MALLET toolkit to explore ways of characterizing or sorting large corpora. Participants will learn how to build workflows for classifying texts, how to interpret the results of document classification and clustering, and how to apply such techniques to their own research. 

Professional Development, Professionalism and Scholarly Integrity, Responsible Conduct of Research (RCR)