By Associate Professor Rodney Clarke
All major approaches used to analyse tweets, whether statistical or machine learning, proceed from the segmentation and classification of lexical items (words). An alternative approach involving computer-based grammars is rarely used because such grammars do not scale well to the dimensions necessary for the analysis of PetaJakarta tweet traffic. All current approaches are syntactic and asemantic. Because they cannot explain how language is structured for use, they cannot address questions about how citizens are actually using social media platforms during emergencies.
The PetaJakarta project is providing the proving ground for a new kind of approach to analysing tweets, functional and semantic analysis (FSA), that uses sociolinguistic grammatical features associated with representing experiences. Doing so overcomes a major hurdle in processing tweets in emergency situations: FSA can exclude the vast majority of tweets that do not encode meanings related to experiences based on happenings and events. This excludes, for instance, messages in tweets that describe what people say, think, or believe.
By identifying experiences based on happenings and events, we consider only tweets that are relevant to emergencies as experienced by citizens, for example flood events, inundation, hazards, medical emergencies or evacuation. Further analysis refines these results by considering additional situational or conditional information about when, where, how or why the happenings or events took place or are taking place. This enables us to distinguish events or happenings by, for example, duration, distance, time, and place.
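The two steps described above, excluding messages that do not encode happenings or events and then refining the remainder by circumstance, can be illustrated with a minimal Python sketch. It is not the project's FSA implementation. It assumes that some upstream grammatical analysis has already labelled each message with the kind of experience it represents and with any circumstances it mentions. The class `AnnotatedTweet`, the labels `EVENT`, `SAYING` and `THINKING`, the function names and the sample messages are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical labels. The article only distinguishes happenings and events
# from what people say, think, or believe.
EVENT = "event"
SAYING = "saying"
THINKING = "thinking"


@dataclass
class AnnotatedTweet:
    """A message plus annotations assumed to come from an upstream analysis."""
    text: str
    experience: str  # one of EVENT, SAYING, THINKING (assumed labels)
    circumstances: Dict[str, str] = field(default_factory=dict)


def keep_events(tweets: List[AnnotatedTweet]) -> List[AnnotatedTweet]:
    """Step 1: keep only messages that represent happenings or events."""
    return [t for t in tweets if t.experience == EVENT]


def group_by_circumstance(
    tweets: List[AnnotatedTweet], kind: str
) -> Dict[str, List[AnnotatedTweet]]:
    """Step 2: group messages by one circumstance, such as 'place' or 'time'.

    Messages that do not mention that circumstance are left out.
    """
    groups: Dict[str, List[AnnotatedTweet]] = {}
    for tweet in tweets:
        value = tweet.circumstances.get(kind)
        if value is not None:
            groups.setdefault(value, []).append(tweet)
    return groups


# Invented sample messages, not taken from PetaJakarta data.
sample = [
    AnnotatedTweet("Water is rising at the market since this morning",
                   EVENT, {"place": "market", "time": "morning"}),
    AnnotatedTweet("The road near the school flooded last night",
                   EVENT, {"place": "school", "time": "last night"}),
    AnnotatedTweet("They said the flood would be gone by now", SAYING),
    AnnotatedTweet("I think it will rain again", THINKING),
]

events = keep_events(sample)
by_place = group_by_circumstance(events, "place")
```

Here `events` holds only the two messages labelled as happenings, and `by_place` groups them under the places they mention. The labelling itself, which is the hard part that FSA addresses, is deliberately left outside the sketch.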
We are currently analysing all confirmed tweets gathered during the monsoon season in Jakarta (January to March 2015). These are being manually translated and analysed in order to develop reliable algorithms for FSA and for the visualisation of PetaJakarta tweet traffic.