LIS2600 Course Blog
Friday, November 14, 2014
Muddiest Point from Nov. 11 Class
Are there any good sites to go to (other than the ones in the required readings) that can help troubleshoot HTML and CSS coding problems that might come up in the assignment?
Nov. 18 Required Readings
1. Paepcke, A. et al. (July/August 2005). Dewey meets Turing: librarians, computer scientists and the digital libraries initiative. D-Lib Magazine, 11(7/8). http://www.dlib.org/dlib/july05/paepcke/07paepcke.html
This article discusses the intersection of the expectations and opinions of librarians and computer scientists in their collaboration on digital library initiatives. Both groups were very excited about the opportunities presented to them by the emergence of digital libraries. Computer scientists were excited about the prospect of conducting research that would impact day-to-day society. Librarians were excited about offsetting some burdensome costs and about using information technologies to ensure the library's impact on scholarly work.
The article goes on to discuss the interesting union of computer scientists and librarians, and how this union changed the face of libraries and publishing in many ways. Both librarians and computer scientists were greatly affected by the emergence of the World Wide Web, and I might try to find an article that discusses the current standing of the two groups on this topic, since this one interested me so much.
Seeing that this article is from 2005, I'm hesitant to believe that much of what the author describes is still particularly relevant. The content is very interesting, and I wonder what a more current article on the topic would reveal.
2. Lynch, Clifford A. "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age" ARL, no. 226 (February 2003): 1-7. http://www.arl.org/storage/documents/publications/arl-br-226.pdf
This reading covers the emergence of institutional repositories as a new strategy that allows universities to apply serious systematic leverage to accelerate changes taking place in scholarship and scholarly communication.
An institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. Its essential goal is that of long term preservation of digital materials.
Scholarship and scholarly communication have radically changed with the implementation of new technologies, and digital repositories are a popular response that lets the university and university library stay relevant. No two digital repositories are the same, and many must continue to adapt to stay current with constant technological updates and their subsequent effects on how scholars communicate.
3. Hawking, D. Web Search Engines: Part 1 and Part 2 IEEE Computer, June 2006.
Part 1: http://web.mst.edu/~ercal/253/Papers/WebSearchEngines-1.pdf
This article explains how the data processing behind web indexing works, with an emphasis on the search tools and portal search interfaces that make it happen. The actual process of crawling the web in order to index it is very involved, covering roughly 400 terabytes of web data, and such processing requires a good network infrastructure and servers that can handle the workload.
Crawling also requires algorithms to visit sites and determine whether each page has already been seen, and this is done with a queue initialized by one or more seed URLs. These crawls are continuous so that responses to queries are quick and up to date.
The reading goes on to explain spamming and how search engines must maneuver through web landscapes of domains, servers, links, and pages to avoid spammers' calculated techniques for getting through.
Part 2: http://web.mst.edu/~ercal/253/Papers/WebSearchEngines-2.pdf
Part two of the reading spends more time explaining the indexing process that follows the crawl. Much like crawling, indexing requires special algorithms for scanning and inversion. Indexing keeps track of all of the documents found in a crawl and organizes them for easy retrieval in response to a query. There are different steps or options in the process, such as scaling up, term lookup, compression, phrases, anchor text, link popularity score, and query-independent score. There are other algorithms involved, such as query processing algorithms, which handle the most common types of search engine queries. The reading also explains problems with queries, such as poor search results. Quality, speed, skipping, early termination, and caching are all ways the article discusses for getting better, faster search results.
4. Shreeves, S. L., Habing, T. O., Hagedorn, K., & Young, J. A. (2005). Current developments and future trends for the OAI protocol for metadata harvesting. Library Trends, 53(4), 576-589.
This reading discusses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and gives an overview of the OAI environment. The focus seems to be on community- and domain-specific OAI services and how metadata plays into them. For example, metadata variation and metadata formats are two challenges that exist for these communities: metadata variation means normalizing subject elements across the many different controlled vocabularies used by different data providers, while new metadata formats mean adding new paths to the routines that process the data. The reading does conclude that the OAI community has a future if guidelines are created to address these issues and problems.
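As a concrete sketch of what harvesting looks like on the wire (my own minimal example, not taken from the article; the repository address is made up), a harvester sends an HTTP request carrying an OAI-PMH verb and receives XML in return. Unqualified Dublin Core (oai_dc) is the baseline format every data provider must support, which is part of why variation across richer metadata formats becomes a challenge:
  Request (a plain HTTP GET):
    http://archive.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
  Response (abridged; a real response also carries responseDate and request elements):
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      <ListRecords>
        <record>
          <header>
            <identifier>oai:archive.example.org:101</identifier>
            <datestamp>2005-01-15</datestamp>
          </header>
          <metadata>
            <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
              <dc:title>A sample record</dc:title>
              <dc:subject>Digital libraries</dc:subject>
            </oai_dc:dc>
          </metadata>
        </record>
      </ListRecords>
    </OAI-PMH>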
Friday, November 7, 2014
Muddiest Point from Nov. 4 Class
I know I'm going to be asking about next week's lecture, but the readings for next week talk a lot about embedding a DTD in an XML document or keeping it separate but linking it. How does that work? I really can't picture this in my head.
Nov. 11 Required Readings: XML
1) Martin Bryan. Introducing the Extensible Markup Language (XML): http://www.is-thought.co.uk/xmlintro.htm
I found this reading hard to follow because of all of the coding jargon and because I knew nothing about XML before reading it. I also feel that the reading could have been organized a little better, but the following are the main points I took away:
An XML file normally consists of:
1. An XML processing instruction
2. A document type declaration
3. A fully-tagged document instance whose root element type name matches the document type name
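To make those three parts concrete, here is a minimal sketch of my own (the memo tags are invented for illustration, not taken from the reading):
  <?xml version="1.0"?>               <!-- 1. the XML processing instruction -->
  <!DOCTYPE memo SYSTEM "memo.dtd">   <!-- 2. the document type declaration -->
  <memo>                              <!-- 3. the fully-tagged document instance; -->
    <to>Staff</to>                    <!--    the root element name (memo) matches -->
    <body>Meeting at noon.</body>     <!--    the declared document type name -->
  </memo>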
- XML- a subset of the Standard Generalized Markup Language that is designed to make it easy to interchange structured documents over the Internet
- After defining the role of each element of text in a formal model (a DTD), users of XML can check that each component of a document occurs in a valid place within the exchanged data stream. A DTD is not required, and if no DTD is available, an XML system can assign a default definition for undeclared components of the markup.
- XML is not a standardized way of coding text; it is a formal language that can be used to pass information about the component parts of a document to another computer system. XML is flexible enough to describe any logical text structure.
- To use XML, users need to know how the markup tags are delimited from normal text and in which order the various elements should be used. Systems that understand XML can provide a list of valid elements and will add the required delimiters. When the system does not understand XML, users can enter tags manually for later validation.
- Elements are entered between matched angle brackets
- Entity references start with & and end with ;
- A Document Type Definition (DTD) must be created to define tag sets
- If attributes of elements are not given in a start tag, the program will use the default values declared for them
- Commonly used text can be declared within the DTD as a text entity
- XML provides many techniques for special elements; usually a notation declaration is required to tell the program what to do with the referenced file's unparsed data
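Tying several of these points together, here is a small made-up example (mine, not the reading's) of a DTD that defines a tag set, declares a default attribute value, and declares commonly used text as a text entity:
  <?xml version="1.0"?>
  <!DOCTYPE memo [
    <!ELEMENT memo (to, body)>
    <!ELEMENT to   (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
    <!ATTLIST memo status (draft|final) "draft">  <!-- used if the start tag omits status -->
    <!ENTITY disclaimer "For internal use only."> <!-- commonly used text as a text entity -->
  ]>
  <memo>
    <to>Staff</to>
    <body>Meeting at noon. &disclaimer;</body>
  </memo>
Because no status attribute appears in the memo start tag, a validating parser fills in the default value draft, and &disclaimer; expands to the declared text.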
3) Extending Your Markup: an XML tutorial by Andre Bergholz: http://xml.coverpages.org/BergholzTutorial.pdf
This article did a much better job of explaining XML jargon and made the first reading much easier to understand. Unlike the first reading, this one defines and explains acronyms before using them throughout. The first reading is so much more confusing than this one that it shouldn't be a required reading.
But what I took away the most from this reading includes:
- XML is a semantic language that lets you meaningfully annotate text.
- XML documents look a lot like HTML documents
- XML elements can be nested, and attributes can be attached to them; attribute values must be in quotes, and tags must be balanced or explicitly closed
- DTDs define the structure of XML documents; users can specify the set, order, and attributes of tags
- When an XML document conforms to its DTD, it is called valid. A DTD can be included in the XML document or contained in a separate file that the document links to (see the sketch after this list)
- DTD elements can be nonterminal or terminal, and elements can have zero to many attributes; attribute definitions don't impose an order in which the attributes occur, and the expressive power of DTDs is limited
- XML extensions let you link to more than one source
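Since the embedded-versus-linked DTD point is exactly what my muddiest point above was asking about, here is a sketch of both options using a made-up note document (my own example, not the tutorial's):
  Embedded in the document itself:
    <?xml version="1.0"?>
    <!DOCTYPE note [ <!ELEMENT note (#PCDATA)> ]>
    <note>Remember the quiz.</note>
  Kept separate but linked by a system identifier:
    <?xml version="1.0"?>
    <!DOCTYPE note SYSTEM "note.dtd">
    <note>Remember the quiz.</note>
In the second version, the file note.dtd would contain the same <!ELEMENT note (#PCDATA)> declaration; either way, the parser ends up with the same definitions.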
Just like with all the previous tutorials, at first I was a little lost, but I caught on. I liked how this week we did the readings about the code first and then did the tutorial.
Friday, October 31, 2014
Muddiest Point from Oct. 28 Class
Given how extensive HTML code is, do web designers/coders memorize HTML, or is there a go-to site or book that they use?
Nov. 4 Required Readings: Cascading Style Sheets
1) W3 School Cascading Style Sheet Tutorial: http://www.w3schools.com/css/
I thought this tutorial did a better job of explaining the coding style and what I can do in the tutorial than last week's HTML tutorial. This site also tried to make it a little more fun by offering a quiz. But I really liked the CSS examples it gave.
2) CSS tutorial: starting with HTML + CSS http://www.w3.org/Style/Examples/011/firstcss
I thought that the local links this tutorial provided were excellent. They did a really good job of breaking the process down step-by-step so I could understand all the layers of creating the page using the code.
3) Chapter 2 of the book Cascading Style Sheets: Designing for the Web by Håkon Wium Lie and Bert Bos (2nd edition, 1999, Addison Wesley, ISBN 0-201-59625-3): http://www.w3.org/Style/LieBos2e/enter/
CSS works with HTML to create a web page/sheet/document, but CSS gives the creator a bit more editorial control and allows the creator to be more creative with the end result. For CSS to work, though, it must be used in a browser that supports CSS, and even in the right browser there will be bugs and limitations.
Some key terms from the chapter:
CSS rule- a statement about one stylistic aspect of one or more elements -- a rule consists of two parts:
Selector- the part before the left brace that links the HTML document and style
Declaration- the part inside the braces that sets forth the effect. The declaration has two parts separated by a colon- the property and value
CSS style sheet- a set of one or more rules that apply to an HTML document -- for it to affect the HTML document, it must be glued to it; for example, you can put the style sheet inside a style element at the top of the document.
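Putting those terms together, here is a tiny page of my own (not from the chapter) where the style sheet is glued to the HTML with a style element; h1 is the selector, and color: green is the declaration, with color as the property and green as the value:
  <html>
    <head>
      <title>Gluing a style sheet to a document</title>
      <style type="text/css">
        h1 { color: green; }  /* one rule: selector, then declaration in braces */
      </style>
    </head>
    <body>
      <h1>This heading comes out green</h1>
    </body>
  </html>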
What I really liked about this reading were the explanations of common tasks. I liked how easy the explanations were to understand and that the author showed an example of each task.
Friday, October 24, 2014
Muddiest Point from Oct. 21 Class
I'm unsure whether my posts for the first two required readings for the upcoming week are enough. I wasn't sure what to write since it was just "practice" with no content to summarize or review. Were there specific expectations to write more about these first two readings? Or should we just write about our thoughts?