Big Data is characterized by the three V's:
- the unprecedented volume of data
- the velocity of data generation
- the variety of data structures
To a large degree, these data result from the unprecedented scale of user-generated content, the increasing availability of open governmental data, and the declining cost of collecting and storing enterprise data and scientific data sets. A major challenge is how to understand and gain value from Big Data. Techniques for achieving this can broadly be divided into those based on data mining and machine learning (e.g., prediction) and those based on query processing (e.g., aggregation and ranking). In ExiBiDa, we will study this challenge from a query processing point of view, with the aim of developing efficient algorithms and index structures.
A large fraction of Big Data is textual and has spatial and temporal dimensions. A prime example is textual social media data, where each message may be associated with the location of the user at the time of writing and the timestamp at which it was posted. In ExiBiDa, we will focus on exploratory analysis of data with such spatiotemporal-textual (STT) content, and develop frameworks and scalable techniques (i.e., efficient algorithms and index structures) for supporting analytical queries on such data.
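To make the notion of an analytical STT query concrete, the sketch below evaluates one hypothetical example: count the messages that contain all given keywords, lie within a latitude/longitude bounding box, and were posted within a time window. The names `Post` and `stt_count`, the whitespace tokenization, and the sample messages are illustrative assumptions, not part of ExiBiDa. The query is evaluated by a brute-force linear scan over all messages, which is exactly the baseline cost that dedicated STT index structures aim to avoid.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """A hypothetical social media message with text, location, and timestamp."""
    text: str
    lat: float
    lon: float
    posted: datetime


def stt_count(posts, keywords, bbox, t_start, t_end):
    """Count posts inside bbox = (min_lat, min_lon, max_lat, max_lon),
    posted in the half-open interval [t_start, t_end), whose text
    contains every keyword. Brute-force scan, no index."""
    min_lat, min_lon, max_lat, max_lon = bbox
    wanted = {k.lower() for k in keywords}
    count = 0
    for p in posts:
        # Spatial predicate: point-in-rectangle test.
        if not (min_lat <= p.lat <= max_lat and min_lon <= p.lon <= max_lon):
            continue
        # Temporal predicate: timestamp within the query window.
        if not (t_start <= p.posted < t_end):
            continue
        # Textual predicate: simplistic lowercase whitespace tokenization.
        if wanted <= set(p.text.lower().split()):
            count += 1
    return count


# Illustrative toy data (hypothetical messages).
posts = [
    Post("Wildfire smoke over the city", 55.68, 12.57, datetime(2016, 8, 1, 10, 0)),
    Post("smoke alarm test", 55.70, 12.55, datetime(2016, 8, 1, 12, 0)),      # lacks "wildfire"
    Post("Wildfire smoke reported", 56.16, 10.20, datetime(2016, 8, 1, 11, 0)),  # outside bbox
    Post("wildfire smoke again", 55.67, 12.58, datetime(2016, 8, 2, 9, 0)),   # outside window
]
bbox = (55.6, 12.4, 55.8, 12.7)
result = stt_count(posts, ["wildfire", "smoke"], bbox,
                   datetime(2016, 8, 1), datetime(2016, 8, 2))
print(result)  # 1
```

An index combining spatial, temporal, and textual pruning would let such a query skip most messages instead of scanning all of them, and approximate variants could return an estimated count faster still.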
During the project, we expect to contribute to the following research topics:
- efficient algorithms and index structures for processing spatiotemporal-textual queries on Big Data
- techniques for fast, approximate, and indicative answers to exploratory STT queries
- methods for parallel/distributed execution that can be used as part of Big Data frameworks (Spark, MapReduce, etc.)
- use of STT query techniques in related areas, e.g., recommender systems
The contributions will be disseminated at international refereed conferences and in international refereed journals, and the ExiBiDa software will be released as open source for the research community. Additionally, two doctoral theses and at least four master's theses related to ExiBiDa will be completed.