Data Extraction and Scratching Information Using R

G Midhu Bala; K Chitra

doi:10.34293/sijash.v8i3.3588

G Midhu Bala, Assistant Professor, Department of Computer Science, Mangayarkarasi College of Arts and Science for Women, Madurai, Tamil Nadu, India
K Chitra, Assistant Professor, Department of Computer Science, Government Arts College, Melur, Madurai, Tamil Nadu, India

Keywords: Web scraping, Web mining, Locating files in websites, R programming, rvest, Web crawling

Abstract

Web scraping is the automated process of extracting information from multiple web pages on the World Wide Web. It is a field under active development that shares common goals with text processing, the semantic web vision, semantic understanding, machine learning, artificial intelligence, and human-computer interaction. Current web scraping solutions range from ad hoc approaches that require human effort to fully automated systems that can extract the required unstructured information and convert it into structured information, with some restrictions. This paper explains a method for building a web scraper in R that locates files on a website, extracts the filtered data, and stores it. It also describes the modules and the algorithm used to automate navigation of a website through its links. The stored data can then be used for data analytics.