Friday, November 14, 2014

Nov. 18 Required Readings

1. Paepcke, A. et al. (July/August 2005). Dewey meets Turing: librarians, computer scientists and the digital libraries initiative. D-Lib Magazine. 11(7/8). http://www.dlib.org/dlib/july05/paepcke/07paepcke.html

Seeing that this article is from 2005, I'm hesitant to believe that much of what the authors describe is still relevant. The content is very interesting, though, and I wonder what a more current article on the topic would reveal.

This article discusses the intersection of the expectations and opinions of librarians and computer scientists on their collaboration in digital library initiatives. Both groups were very excited about the opportunities presented to them by the emergence of digital libraries. Computer scientists were excited about the chance to bridge the gap between conducting research and having a day-to-day impact on society. Librarians were excited to offset some burdensome costs and to use information technologies to ensure the library's continued impact on scholarly work.

The article goes on to discuss the interesting union of computer scientists and librarians. It also discusses how this union changed the face of libraries and publishing in many ways. Both librarians and computer scientists were greatly affected by the emergence of the world wide web, and I might try to find an article that discusses the current standing of the two groups on this topic, since this article interested me so much.

2. Lynch, Clifford A. "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age" ARL, no. 226 (February 2003): 1-7. http://www.arl.org/storage/documents/publications/arl-br-226.pdf


This reading covers the emergence of institutional repositories as a new strategy that allows universities to apply serious systematic leverage to accelerate changes taking place in scholarship and scholarly communication.

An institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. An essential goal is the long-term preservation of these digital materials.


Scholarship and scholarly communication have radically changed with the implementation of new technologies, and digital repositories are a popular way for the university and university library to stay relevant. No two digital repositories are the same, and many must continue to adapt to stay current with constant technological updates and their effects on how scholars communicate.

3. Hawking, D. Web Search Engines: Part 1 and Part 2 IEEE Computer, June 2006. 
Part 1: http://web.mst.edu/~ercal/253/Papers/WebSearchEngines-1.pdf


This article explains how the data processing behind web indexing works, with an emphasis on the search tools and portal search interfaces that make it happen. The actual process of crawling the web in order to index it is very demanding: the indexable web amounts to roughly 400 terabytes of data. Such processing requires a good network infrastructure and servers that can handle the workload.

Crawling also requires algorithms that visit sites and determine whether a page has already been seen, and this is done with a queue initialized with one or more seed URLs. These crawls run continuously so that responses to queries are quick and up to date.
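As I understand it, the queue-plus-seen-check idea could be sketched like this (a toy sketch of my own, not Hawking's actual implementation; `fetch_links` is a hypothetical stand-in for fetching a page and parsing out its links):

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=100):
    """Toy frontier-queue crawl: a queue seeded with starting URLs
    plus a 'seen' set so no page is queued twice."""
    frontier = deque(seed_urls)   # queue initialized with seed URLs
    seen = set(seed_urls)         # the "already seen" check
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # only enqueue unseen pages
                seen.add(link)
                frontier.append(link)
    return visited
```

A real crawler would of course fetch over the network, respect robots.txt, and re-crawl pages continuously, which this sketch leaves out.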


The reading goes on to explain spamming and how search engines must maneuver through web landscapes of domains, servers, links, and pages to avoid spammers' calculated techniques for getting through.
Part 2: http://web.mst.edu/~ercal/253/Papers/WebSearchEngines-2.pdf

Part two of the reading spends more time explaining the indexing process that follows the crawl. Much like crawling, indexing requires special algorithms for scanning and inversion. An index keeps track of all of the documents gathered by a crawl and organizes them for easy retrieval in response to a query. There are different steps and options in the process, such as scaling up, term lookup, compression, phrases, anchor text, link popularity scores, and query-independent scores. Other algorithms are involved as well, such as query processing algorithms, which handle the most common types of search engine queries. The reading also explains problems with queries, such as poor search results. Quality, speed, skipping, early termination, and caching are all ways to help get better search results.

4. Shreeves, S. L., Habing, T. O., Hagedorn, K., & Young, J. A. (2005). Current developments and future trends for the OAI protocol for metadata harvesting. Library Trends, 53(4), 576-589.

This reading discusses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and gives an overview of the OAI environment. The focus seems to be on community- and domain-specific OAI services and how metadata plays into them. For example, metadata variation and metadata formats are two challenges these communities face. Metadata variation refers to the difficulty of normalizing a subject element when different data providers draw on many different controlled vocabularies. Metadata formats present their own challenge, since each new format means adding new paths to the data processing routines. The reading does mention that the OAI community has a future if guidelines are created to address these issues and problems.
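One thing that helped me picture the protocol: an OAI-PMH request is just an HTTP GET with a `verb` parameter, and `ListRecords` with `metadataPrefix=oai_dc` asks a repository for its records in unqualified Dublin Core. A minimal sketch (the base URL here is a made-up example, not a real repository endpoint):

```python
from urllib.parse import urlencode

def oai_request_url(base_url, verb, **kwargs):
    """Build an OAI-PMH request URL: base URL + verb + extra parameters."""
    return base_url + "?" + urlencode({"verb": verb, **kwargs})

# Hypothetical example endpoint; a harvester would GET this URL
# and parse the XML response it gets back.
url = oai_request_url("http://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc")
```

The metadata-format challenge the authors describe shows up right here: whatever `metadataPrefix` a harvester asks for, the repository has to have a processing path that can emit it.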
