Research Summary: Information Retrieval and Semantic Web

My Ph.D. research, completed in 2010, addressed semantic heterogeneity, information search and data integration in the context of the Semantic Web and ontology. The research included the development of test ontologies relevant to specific queries and query topics; these were specified using XML-based RDF, RDF Schema and OWL, together with other relevant Semantic Web technologies, including the Protégé ontology editor. Useful Ontology links are shown below.

The main areas of review and research activity were:

  • An examination of ontology mapping issues at both Upper (foundational) and Domain/Context levels.
  • Consideration of modularity within the context of small "geographical" OWL ontologies.
  • Formulation of theory regarding module reuse and the identification of primary and secondary contexts.  The issues of effectiveness and efficiency in ontology design have been considered, with particular relevance to minimising ontology specification redundancy.
  • Consideration of how such contexts can be best applied in a motivating application ontology.
  • Preliminary development of a Jena Ontology-API based Java applet to support:
    • OWL Ontology query.
    • Ontology search using a developed web-crawler process.
    • Selective dynamic importation of the results of ontology searches into an OWL Ontology.
  • The Java applet (shown below) was subsequently further developed as a semantic search tool, called SemSeT, for Ontology-based query expansion (OQE) and IR on Web document collections.
  • Preliminary analysis of the issues involved in Web document searches using a test data population of 100 Web pages.  More extensive test search experiments were then conducted using the TREC WT2g document corpus and assessed using predetermined query relevance judgements.
  • Completion of formal and wide-ranging OQE experiments using a corpus of ¼ million Web documents; these were based on three search contexts, defined by the TREC-8 query topics T401 (Foreign Minorities in Germany), T416 (Three Gorges Dam) and T438 (Tourism Increases).  Three OWL-based ontologies were developed to support the queries; the OWL files (named immigration.owl, hydro-electric.owl and tourism-uk.owl respectively) can be found on the Resource page.
  • Calculation of document relevance rankings using a classic VSM tf-idf based document term weighting algorithm, followed by measurement and comparison of search effectiveness based on precision and recall measures.
  • Comparison of the OQE-based search results against conventional keyword results generated by the search tool, to determine relative search effectiveness of OQE-based search against keyword-driven search.
  • Presentation of the results of the query experiments in my thesis; these show that OQE, enabled by query topic-specific ontology contexts, can double the search precision rate when compared to traditional keyword-only search. An example of the results is shown in the graph below.
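
The ranking and evaluation steps listed above can be illustrated with a minimal sketch. This is not the SemSeT code: the whitespace tokeniser, the tf × log(N/df) weighting variant, the cosine scoring and the four sample documents are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, for illustration only (not taken from WT2g).
DOCS = [
    "three gorges dam hydroelectric project china",
    "tourism increases in the uk",
    "dam construction on the yangtze river",
    "immigration policy in germany",
]


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately simple tokeniser)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return ({term: weight} per document, idf table) using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return indices of documents with a non-zero score, best match first."""
    vectors, idf = tfidf_vectors(docs)
    q_vec = {t: tf * idf.get(t, 0.0)
             for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0.0]


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, `rank("gorges dam", DOCS)` retrieves the two documents that mention "dam", and `precision_recall` then scores that result list against a set of document indices judged relevant for the query.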

SemSeT: semantics-based search tool

An example of the prototype SemSeT query interface.


Below is an example P&R graph of Ontology-based query expansion (OQE) comparison outcomes; this was based on IR data generated using SemSeT with the TREC WT2g query topic 416 (Three Gorges Dam) and a query topic-specific hydro-electric ontology context.

The graph shows macro-evaluation (MEA) based average precision and recall outcomes for a set of 10 queries. All query terms were optional, i.e. no must-have terms were specified. The graph therefore compares three cases: optional keywords (Ko); Ko plus ontology sub and super classes (Oo); and Oo plus ontology relation classes (Oro).
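
The three expansion levels and the macro-averaging step can be sketched as follows. This is an illustrative sketch only, not the SemSeT implementation: the toy class tables are hypothetical and are not taken from hydro-electric.owl, "relation classes" is assumed to mean classes reached through ontology relations, and the substring-based matching is a simplification.

```python
# Hypothetical ontology fragment, for illustration only.
SUBCLASSES = {"dam": ["gravity dam", "arch dam"]}
SUPERCLASSES = {"dam": ["barrier"]}
RELATED = {"dam": ["reservoir", "turbine"]}  # assumed "relation classes"


def expand(keywords, mode):
    """Expand keywords for mode "Ko", "Oo" or "Oro".

    Ko  = keywords only
    Oo  = Ko plus sub and super classes
    Oro = Oo plus classes linked by ontology relations
    """
    if mode not in ("Ko", "Oo", "Oro"):
        raise ValueError("mode must be 'Ko', 'Oo' or 'Oro'")
    terms = list(keywords)
    if mode in ("Oo", "Oro"):
        for k in keywords:
            terms += SUBCLASSES.get(k, []) + SUPERCLASSES.get(k, [])
    if mode == "Oro":
        for k in keywords:
            terms += RELATED.get(k, [])
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(terms))


def matches(doc_text, terms):
    """Optional-term matching: a document matches if ANY term occurs in it.

    Substring matching is a simplification (e.g. "dam" also hits "damage").
    """
    text = doc_text.lower()
    return any(term in text for term in terms)


def macro_average(per_query):
    """Average (precision, recall) pairs so that every query counts equally."""
    n = len(per_query)
    mean_precision = sum(p for p, _ in per_query) / n
    mean_recall = sum(r for _, r in per_query) / n
    return mean_precision, mean_recall
```

With these toy tables, a document that mentions only "reservoir" is missed by the Ko and Oo term sets for the keyword "dam" but matched by the Oro set, which is the kind of difference the Ko, Oo and Oro curves compare.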

P&R graph of OQE comparisons using SemSeT, TREC WT2g topic 416 and hydro-electric ontology context.

As mentioned above, ontology-based query expansion doubled the query precision outcomes.

Presentations, Papers and Posters

  • "Examining the Application of Modular and Contextualised Ontology in Query Expansions for Information Retrieval", Thesis, September 2010, pdf.
  • "Using OWL Contexts and OQE to Improve Web Document Search Precision and Recall", Journal submission paper, January 2010, pdf.
  • "An Ontology-based Semantic Web Search Engine to improve Precision and Recall", poster presented at Faculty of Science and Technology Annual Research Conference, UCLan, 18 June 2008, jpg.
  • "Semantic Web and Efficient Reuse of Ontology Modules", presentation, MSc CO3701 Advanced Database Systems Research Topics, 5 March 2008, ppt.
  • "Geographical Ontology Modules for Efficient Semantic Web Reuse", presentation, Ordnance Survey Research Labs, Southampton, 28 June 2007, ppt.
  • "Ontology Modules by Layering - Facilitating Reuse in a Geographical Semantic Web Context", presentation, SEVENTH Conference of Dept. of Computing, UCLan, 20 June 2007, ppt.
  • "MSc Database Systems Research Topics", presentation, MSc Database Systems, 23 March 2007, ppt.
  • "CO3709 Research Topics in Computing 2007", presentation, BSc Research Topics, 6 March 2007, ppt.
  • "Developing Ontologies based on RDF-OWL Semantic Web languages", presentation to Graduate School How-to@2, UCLan, April 2006, pps.
  • "Semantic Web, Ontology Integration, and Web Query", presentation at Dept. of Computing Seminar, UCLan, March 2006, pps.
  • "Developing 'Geo' Ontology Layers for Web Query", presentation to Graduate School Conference, UCLan, December 2005, ppt.
  • "Understanding Structural and Semantic Heterogeneity in the Context of Database Schema Integration", In Proceedings of the SIXTH Conference in the Dept. of Computing, (Journal of the Dept. of Computing, UCLan, Issue Number 4, pp. 29-44, ISSN 1476-9069), May 2005, pdf. Citations: [1][2][3][4][5][6][7][8].
  • "Schema and Semantic Heterogeneity in Database Schema Integration", presentation, SIXTH Conference of Dept. of Computing, UCLan, May 2005, ppt.
  • "Data Modelling for Data Integration", research topics presentation, March 2005, ppt.
  • "Information Dynamics, Perspectives, and Risks", short paper, February 2005, pdf.

Useful Ontology Links

