Information Retrieval: A Comparative Study of Textual Indexing using an Oriented Object Database (DB4O) and the Inverted File

Mohammed ERRITALI, TIAD Laboratory, Computer Sciences Department, Faculty of Sciences and Techniques, Sultan Moulay Slimane University, Beni-Mellal, Morocco
Keywords: Information Retrieval, indexing, object-oriented database, inverted file.

Abstract

For centuries, the growing volume of textual data, such as the books and articles held in libraries, has made it necessary to establish effective mechanisms for locating them. Early techniques such as abstracting, indexing and the use of classification categories marked the birth of a new research field called "Information Retrieval". Information Retrieval (IR) can be defined as the design of models and systems that facilitate access to a collection of documents in electronic form (a corpus), so that users can find the documents relevant to them, that is, the content that matches their information needs. Most IR models index a corpus with a specific data structure called the "inverted file" (or "inverted index"). For every term in the corpus, the inverted file records the identifiers of the documents that contain the term, the frequency of the term in each of those documents, and the positions of its occurrences.
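
To make the structure described above concrete, here is a minimal Python sketch of an in-memory inverted file that stores, for each term, the identifiers of the documents containing it and the positions of its occurrences, from which the per-document frequency follows. The tokenizer, the function names `build_inverted_file` and `lookup`, and the toy corpus are hypothetical illustrations, not the implementation used in this paper; a real IR system would typically add stemming, stop-word removal and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase word characters only,
    # no stemming and no stop-word removal.
    return re.findall(r"\w+", text.lower())


def build_inverted_file(corpus):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def lookup(index, term):
    """Return (doc_id, frequency, positions) for each document containing term."""
    # dict.get does not trigger the defaultdict factory, so unknown terms
    # are not inserted into the index as a side effect.
    postings = index.get(term.lower(), {})
    return [(doc_id, len(positions), positions)
            for doc_id, positions in sorted(postings.items())]


corpus = {
    1: "Information retrieval finds relevant documents",
    2: "An inverted file indexes documents; documents contain terms",
}
index = build_inverted_file(corpus)
print(lookup(index, "documents"))
```

Running it prints `[(1, 1, [4]), (2, 2, [4, 5])]`: "documents" occurs once in document 1 and twice in document 2. In this sketch the frequency is derived from the length of the position list rather than stored separately, which keeps the two values consistent.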

In this paper, we replace the inverted file with an object-oriented database (db4o): instead of looking a term up in the inverted file, we search for it in the db4o database.

The purpose of this work is to carry out a comparative study, on a large volume of data, to determine whether object-oriented databases can compete with the inverted index in terms of access speed and resource consumption.

Published
2015-01-15
How to Cite
ERRITALI, M. (2015). Information Retrieval: A Comparative Study of Textual Indexing using an Oriented Object Database (DB4O) and the Inverted File. Journal of Information Sciences and Computing Technologies, 1(1), 59-68. Retrieved from http://scitecresearch.com/journals/index.php/jisct/article/view/22