Pretreatment of web log files

  • Mohammed ERRITALI, TIAD Laboratory, Computer Sciences Department, Faculty of Sciences and Techniques, Sultan Moulay Slimane University, Beni-Mellal, BP 523, Morocco
  • Hanane EZZIKOURI, LMACS Laboratory, Morocco
  • Mohamed OUKESSO, LMACS Laboratory, Morocco

Abstract

Pretreatment of web data is often the most laborious and time-consuming step, mainly because raw data lack structure and contain a large amount of noise. Pretreatment of web log files consists of cleaning and organizing the data they contain so that they are ready for later analysis. Since web log files are usually plain text, one objective of the pretreatment step is to transfer the data into an environment that is easier to work with (e.g. a database).
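As an illustration of moving raw text logs into a database, the following Python sketch parses lines in the Apache/NCSA Common and Combined Log Formats with a regular expression and loads them into an SQLite table. The choice of log format, the `requests` table schema and the names `parse_line` and `load_into_db` are assumptions for illustration, not the authors' implementation. Lines that do not match the pattern are simply skipped.

```python
import re
import sqlite3
from datetime import datetime

# Assumed input: Apache/NCSA Combined Log Format (Common Log Format plus quoted
# referrer and user agent). The last two fields are optional, so plain Common
# Log Format lines also parse.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700; store them as ISO 8601.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z").isoformat()
    rec["status"] = int(rec["status"])
    # A "-" size means no body was sent.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


def load_into_db(lines, conn):
    """Parse raw log lines and insert the valid ones into a hypothetical 'requests' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        "host TEXT, user TEXT, time TEXT, method TEXT, url TEXT, "
        "status INTEGER, size INTEGER, referrer TEXT, agent TEXT)"
    )
    rows = []
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            rows.append((rec["host"], rec["user"], rec["time"], rec["method"],
                         rec["url"], rec["status"], rec["size"],
                         rec["referrer"], rec["agent"]))
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For example, `load_into_db(open("access.log"), sqlite3.connect("logs.db"))` would load a whole file (the file names are placeholders). Once the data sit in a table, the cleaning and identification steps can be expressed as ordinary SQL queries.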

In this paper, we first present the different formats of web log files, then the pretreatment methods we used: cleaning of web robot requests, removal of requests for resources such as scripts, stylesheets and Flash files (.js, .css, .swf), and identification of users, sessions and visits.
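The cleaning and identification steps listed above can be sketched in Python as follows. This is a minimal sketch built on common heuristics from the web usage mining literature, not the authors' method. A client that requests `/robots.txt`, or whose user agent contains "bot", "crawler" or "spider", is treated as a robot. Requests whose path ends in .js, .css or .swf are dropped. A user is approximated by the (host, user agent) pair, and a new session starts after more than 30 minutes of inactivity. The keyword list, the time-out value and the names `Request`, `clean` and `sessionize` are all hypothetical, and the paper's own definitions of sessions and visits may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed filtering rules, for illustration only; extend as needed.
RESOURCE_EXTENSIONS = (".js", ".css", ".swf")
ROBOT_KEYWORDS = ("bot", "crawler", "spider")


@dataclass
class Request:
    host: str
    agent: str
    time: datetime
    url: str


def clean(requests):
    """Drop robot traffic and requests for embedded resources."""
    # Any client that asked for /robots.txt is treated as a robot for all its requests.
    robot_clients = {(r.host, r.agent) for r in requests if r.url == "/robots.txt"}
    kept = []
    for r in requests:
        if (r.host, r.agent) in robot_clients:
            continue
        if any(k in r.agent.lower() for k in ROBOT_KEYWORDS):
            continue
        # Ignore the query string when checking the file extension.
        path = r.url.split("?", 1)[0].lower()
        if path.endswith(RESOURCE_EXTENSIONS):
            continue
        kept.append(r)
    return kept


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Map each user (host, user agent) to a list of sessions.

    A session is a list of requests in time order; a new one starts when the
    gap since the user's previous request exceeds `timeout`.
    """
    sessions = {}
    for r in sorted(requests, key=lambda req: req.time):
        user = (r.host, r.agent)
        user_sessions = sessions.setdefault(user, [])
        if user_sessions and r.time - user_sessions[-1][-1].time <= timeout:
            user_sessions[-1].append(r)
        else:
            user_sessions.append([r])
    return sessions
```

With this design, a robot identified by a single `/robots.txt` request has all of its requests removed, not just that one. Treating the (host, user agent) pair as a user is a coarse approximation: proxies and shared machines merge several people, while dynamic IP addresses can split one person into several users.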



Published
2015-02-13
How to Cite
ERRITALI, M., EZZIKOURI, H., & OUKESSO, M. (2015). Pretreatment of web log files. Journal of Information Sciences and Computing Technologies, 2(1), 108-121. Retrieved from http://scitecresearch.com/journals/index.php/jisct/article/view/29