Sunday, June 25, 2017
Bilingual Intelligent Search & Analytics Solution for the Legislative Assembly of Uttar Pradesh
The Bilingual Intelligent Search & Analytics Solution provides a complete searchable digital archival and retrieval solution for Vidhan Sabha proceeding books and video recordings. A searchable digital library has been created to meet the instantaneous, online demand for archived information from the existing library. The solution consists of intelligent digitization of proceedings; an integrated bilingual search platform for house proceedings in books and videos; a web portal; a data center and a DR (disaster recovery) site for providing uninterrupted services over the internet; and the operation, management and maintenance of the entire software.
WHY THIS CONCEPT:
The Uttar Pradesh Vidhan Sabha faces an ever-growing stack of paper house proceedings and associated documents. It is becoming humanly impossible to search instantly for relevant information when it is requested, especially while the house is in session. The benefits of digitization and search for the Vidhan Sabha span three dimensions: the transparency of governmental activities, the delivery of e-government services, and the participation of the public at large.
PROCESS (HOW IS IT IMPLEMENTED?):
This system consists of multiple systems and components integrated on one platform.
Freezing Requirements & Sign off SRS > Approval of Prototype > Gap Analysis > Development > Auditing > User Acceptance Test (UAT) > Hosting > Launch of Web Portal
System Components
1) CMS (Content Management System)
The CMS is used by the digitization team to digitize proceeding books and videos.
2) Annotation Tool
The annotation tool is used by the digitization team to run OCR (Optical Character Recognition) and extract proceeding headline text from proceeding books. Proceeding book metadata such as proceeding date, question type, page type, and key persons are tagged to make them searchable.
3) Database
The database is used to store proceeding data and metadata. For security purposes, application-level authentication and authorization mechanisms are configured.
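To make the tagged metadata concrete, here is a minimal sketch of how one annotated page might be represented before it is stored. The class name, field names, and the example values are assumptions for illustration only; the actual schema used by the solution is not described here.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List


@dataclass
class ProceedingPage:
    """One annotated page of a proceeding book (hypothetical schema)."""
    book_id: str
    page_number: int
    proceeding_date: date
    question_type: str          # assumed free-text tag
    page_type: str              # assumed free-text tag
    headline: str               # headline text extracted by OCR
    key_persons: List[str] = field(default_factory=list)

    def to_document(self) -> dict:
        """Flatten the record to a plain dict for storage or indexing."""
        doc = asdict(self)
        # Dates are stored as ISO strings so the record is JSON-friendly.
        doc["proceeding_date"] = self.proceeding_date.isoformat()
        return doc


page = ProceedingPage(
    book_id="book-001",
    page_number=12,
    proceeding_date=date(2016, 3, 14),
    question_type="question",
    page_type="debate",
    headline="Budget discussion",
    key_persons=["Member A", "Member B"],
)
document = page.to_document()
```

Keeping the tags as explicit fields, rather than as free text, is what would let each of them be searched or filtered on separately.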
4) Web Portal
The web portal is used by the Vidhan Sabha and other government departments to search Vidhan Sabha proceedings and to view Vidhan Sabha members' profiles and department details. The web portal is accessible to everyone over the internet.
5) Video Streaming Server
A media server is installed and configured for on-demand streaming of the proceeding videos displayed in the search results. Specific video segments can be viewed based on the headlines searched, so there is no need to download or view the video of the entire proceeding day. This feature is available to registered users only.
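One way to picture headline-based playback is as a lookup from a headline to a time range inside the full-day recording. The sketch below assumes each headline has been tagged with start and end offsets during digitization; the `VideoSegment` type and its fields are hypothetical and do not reflect the media server's actual interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VideoSegment:
    """A headline mapped to a time range of a day's recording (assumed model)."""
    video_id: str
    headline: str
    start_seconds: int
    end_seconds: int


def find_segments(segments: List[VideoSegment], query: str) -> List[VideoSegment]:
    """Return segments whose headline contains the query, ignoring case."""
    needle = query.casefold()
    return [s for s in segments if needle in s.headline.casefold()]


def as_timestamp(seconds: int) -> str:
    """Format a second offset as HH:MM:SS for display next to a result."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


segments = [
    VideoSegment("day-2016-03-14", "Budget discussion", 3725, 5400),
    VideoSegment("day-2016-03-14", "Question hour", 0, 3600),
]
matches = find_segments(segments, "budget")
```

A player could then seek directly to `start_seconds` of the matching segment instead of loading the whole day's video.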
6) Search Engine
Proceeding data extracted from proceeding books is indexed in the Solr search engine, with linguistic capabilities for the Hindi language such as homophone matching, lemmatization, and synonyms.
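As a rough illustration of how a portal could query such an index, the sketch below builds a request URL for Solr's standard select handler using the common `q`, `fq`, `rows` and `wt` parameters. The host, the core name `proceedings`, and the field names `headline` and `question_type` are assumptions; the Hindi linguistic processing is assumed to be configured on the Solr side and is not shown.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical Solr location and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/proceedings/select"


def build_search_url(text: str, question_type: Optional[str] = None, rows: int = 10) -> str:
    """Build a select URL that searches headlines, optionally filtered by type.

    Escaping of Solr query syntax characters in `text` is omitted for brevity.
    """
    params = [
        ("q", f"headline:({text})"),   # main query against the headline field
        ("rows", str(rows)),           # number of results to return
        ("wt", "json"),                # ask for a JSON response
    ]
    if question_type is not None:
        # Filter query narrows results without affecting relevance scoring.
        params.append(("fq", f'question_type:"{question_type}"'))
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_search_url("budget", question_type="starred", rows=5)
```

Because `urlencode` uses UTF-8 by default, the same function can be called with Hindi query text.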
7) Analytics/Dashboard System
An analytics server is installed and used to generate reports such as the most discussed debates and the most active persons. Other dashboards for analytics purposes can be created easily.
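Reports of this kind amount to frequency counts over the tagged metadata. The following is a minimal sketch, assuming each indexed record is a dict with hypothetical `headline` and `key_persons` fields; it is not the analytics server's actual implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple


def most_active_persons(documents: List[Dict], top_n: int = 3) -> List[Tuple[str, int]]:
    """Count how many records each person appears in and return the top ones."""
    counts = Counter()
    for doc in documents:
        # A set avoids counting a person twice within the same record.
        counts.update(set(doc.get("key_persons", [])))
    return counts.most_common(top_n)


def most_discussed_debates(documents: List[Dict], top_n: int = 3) -> List[Tuple[str, int]]:
    """Count how many records share each headline and return the top ones."""
    counts = Counter(doc["headline"] for doc in documents if "headline" in doc)
    return counts.most_common(top_n)


records = [
    {"headline": "Budget discussion", "key_persons": ["Member A", "Member B"]},
    {"headline": "Budget discussion", "key_persons": ["Member A"]},
    {"headline": "Question hour", "key_persons": ["Member A", "Member C", "Member C"]},
]
top_person = most_active_persons(records, top_n=1)
top_debate = most_discussed_debates(records, top_n=1)
```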
8) News Crawling (RSS Based)
Apache Nutch is configured to crawl online news RSS feeds. A news classification workflow is implemented for categorization. A reporter user is created to review the crawled news data; based on the approval or rejection action by library staff, the news is indexed and made available for search.
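The review step can be thought of as a small state machine: a crawled item starts as pending, a reviewer approves or rejects it, and only approved items go on to indexing. The sketch below illustrates that idea with hypothetical names (`NewsItem`, `ReviewStatus`, `review`, `items_to_index`); it does not use or represent any Apache Nutch API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class NewsItem:
    """A crawled and categorized news entry awaiting review (assumed model)."""
    title: str
    category: str
    status: ReviewStatus = ReviewStatus.PENDING


def review(item: NewsItem, approve: bool) -> NewsItem:
    """Record the reviewer's decision; an item can only be reviewed once."""
    if item.status is not ReviewStatus.PENDING:
        raise ValueError("item has already been reviewed")
    item.status = ReviewStatus.APPROVED if approve else ReviewStatus.REJECTED
    return item


def items_to_index(items: List[NewsItem]) -> List[NewsItem]:
    """Only approved items are passed on to the search index."""
    return [item for item in items if item.status is ReviewStatus.APPROVED]


crawled = [
    NewsItem("Assembly session begins", "politics"),
    NewsItem("Unrelated advertisement", "other"),
    NewsItem("New bill tabled", "legislation"),
]
review(crawled[0], approve=True)
review(crawled[1], approve=False)
indexable = items_to_index(crawled)
```

The third item is left pending in this example, so it is neither indexed nor discarded until someone reviews it.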
· The time required to get information, compared with the manual process, is reduced from 20 days to a few seconds.
· More cost-effective in terms of effort and time than a physical book library.
· Making documents available to the authorized person as and when required.
· Easy and accurate information retrieval, eliminating the human error that occurs in the manual process.
· Digitization helps keep records safe and usable for data archival.
· Scalable storage as per requirement.
· 24/7 availability of access over the Internet for everyone.
· Great saving of physical space due to Digital Archival of past books.
· A Comprehensive and integrated digitized search solution so that information is quickly accessed.
· English and Hindi language support.
· Visually appealing look to web portal.
· Necessary security features against hacking & defacement.
· Support on mobile devices.
· Annotation of the search tags on the images.
· Seamless video viewing, with support for pausing, seeking and stopping the video.
· Image enhancement and noise removal from raw images.
· OCR to retrieve text from image.
· Intelligent Digitization of Video files.
· Tag cleaning, spelling correction and modification, and linking of tags to the relevant portion of the document.
· Creation of book index and storage of images along with associated tags in new e-Book format.
· Storage and archival.
CHALLENGES, IF ANY:
We faced a few challenges, but through research, enhancement and modification of our software components we were able to implement the solution properly and effectively. Some of the challenges faced are as follows.
· Converting different video formats to the software-specific format.
· Old cassettes with poor video and audio quality.
· Poor quality of a few books because of the natural aging process.
· Torn pages, dark marks and ink spots on Book Pages are removed by image enhancement tools.
· Software is reconfigured to incorporate different fonts and text size.