Exploiting Open Data for Improving Spatial Keyword Query Applications
spatial data, spatial query, linked open data, query evaluation, query processing, personalization
Information is now an essential resource in many sectors of the economy. The popularization of social networks, smartphone applications, and online services has increased the volume of data available online. Within this large volume of information, there is a specific data type called spatial data, which represents a physical object using its spatial coordinates (e.g. latitude and longitude). Spatial data is critical in a large number of application domains (e.g. land use, transportation planning). For instance, spatial data applications such as web search engines or emergency response applications let users find points of interest (POIs) or warn them of critical situations. It has been asserted that 80% of all business data has some locational reference. Spatial queries are widely employed to manipulate spatial data more efficiently. However, the user plays a crucial role in the spatial information retrieval process, since the user must express the information needed as a query. For decades, researchers have proposed techniques to help users express their information needs, such as Boolean models, pattern matching operators, and query expansion. Despite the existence of relevant alternatives in the field, there is still a lack of solutions applied to keyword preference queries. The Spatial Keyword Preference Query (SKPQ) arises as a potential solution to assist users in finding POIs. SKPQ selects POIs based on the descriptions of features in their neighborhood. In essence, the user defines a spatial constraint (i.e. a radius) and a textual constraint (i.e. query keywords) to be satisfied. In this context, this thesis proposes strategies to improve SKPQ results. The contribution is threefold. First, two Linked Open Data (LOD) repositories (i.e. DBpedia and LinkedGeoData) are exploited to improve the feature descriptions. Feature descriptions in LOD contain more information than those in traditional spatial databases, leading to more detailed descriptions.
Second, the query results are personalized to present the best POIs for the underlying user. By exploiting reviews of POIs, the system identifies the object that best satisfies the user and re-orders the ranking according to the user's preferences. Third, we model the user's preference for visiting locations near each other using a probabilistic function. This function is incorporated into the ranking function so that POIs are retrieved with this preference taken into account. We evaluate each technique employed in this proposal separately. The first technique achieves a relative NDCG improvement of 20% when using random query keywords. It also finds POIs that SKPQ is unable to find. The second technique further improves the relative NDCG by 92%. Finally, the third technique improves the rank consistency, achieving a Tau performance of 52%.
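
To make the SKPQ idea concrete, the following is a minimal sketch, not the implementation evaluated in this thesis. It assumes that a POI is scored by the best textual match among the features lying within the query radius, and that textual relevance is simply the fraction of query keywords found in a feature description. The function names (haversine_km, text_score, skpq), the dictionary layout, and the sample data are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def text_score(description, keywords):
    """Toy relevance (an assumption): fraction of query keywords in the description."""
    words = set(description.lower().split())
    kws = [k.lower() for k in keywords]
    if not kws:
        return 0.0
    return sum(k in words for k in kws) / len(kws)


def skpq(pois, features, keywords, radius_km, k):
    """Return the top-k (score, POI name) pairs.

    A POI's score is the highest text_score among the features located
    within radius_km of it (0.0 if no feature is in range).
    """
    scored = []
    for poi in pois:
        best = 0.0
        for feat in features:
            dist = haversine_km(poi["lat"], poi["lon"], feat["lat"], feat["lon"])
            if dist <= radius_km:
                best = max(best, text_score(feat["description"], keywords))
        scored.append((best, poi["name"]))
    # Highest score first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical sample data: two hotels (POIs) and two nearby features.
pois = [
    {"name": "Hotel A", "lat": 25.2048, "lon": 55.2708},
    {"name": "Hotel B", "lat": 25.2500, "lon": 55.3000},
]
features = [
    {"description": "italian pizza restaurant", "lat": 25.2050, "lon": 55.2710},
    {"description": "bookstore and stationery", "lat": 25.2501, "lon": 55.3001},
]

print(skpq(pois, features, ["italian", "restaurant"], radius_km=0.5, k=2))
# → [(1.0, 'Hotel A'), (0.0, 'Hotel B')]
```

In this sketch, a richer feature description (for example, one drawn from an LOD repository) would only change the text passed to text_score; the spatial constraint is unaffected.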