August 2010
- Aug 25
- Aug 19
May 2010
- May 29
- May 22
- May 22
February 2010
- Feb 24
- Feb 23, dltj: EAD and MARC sitting in a tree: D-R-U-P-A-L
# EAD and MARC sitting in a tree: D-R-U-P-A-L
Mark Matienzo, Yale University Library (ex-New York Public Library)
# background: migration/redeployment
# launched new site January 6: www.nypl.org
# components: Drupal, XSLT, Solr, Shrew
# browse/search/view: http://www.nypl.org/find-archival-materials
# drupal-shrew: liberating yr data from III http://github.com/anarchivist/drupal-shrew
# III into Drupal...
# ...to the rest of the world
- Feb 23, dltj: MarcXimiL - bibliographic similarity analysis
MarcXimiL is a free, flexible, fully standards-compliant and efficient bibliographic similarity analysis framework.
Similarity analyses may be set up at all levels of the process, run in batch or through the API. Options include:
* the order of comparisons between and within collections
* for each field, the selection of a parsing function
* for each field, the selection of a comparison function from a wide range: vectorial (Dice, Jaccard, Salton's cosine), probabilistic (OKAPI BM25), Levenshtein-based, Shingling, Authors, Date, and others.
* the global record similarity strategy (integration of field similarities)
* the output format (XML, spreadsheet)
* thresholds at different levels
- Feb 23, dltj: Matching Dirty Data
A description of a method for matching bibliographic records when the only common identifiers are strings that are not exact matches.
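As a rough illustration of matching strings that are not exact matches, the sketch below implements three of the comparison functions named in the MarcXimiL entry (Jaccard, Dice, Levenshtein) and combines per-field scores into one record score. It is not MarcXimiL's code or the method from the linked post: the function names, the field weights, and the 0.8 threshold are all assumptions chosen for the example.

```python
def tokens(text):
    """Lowercase, replace punctuation with spaces, return the set of words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def jaccard(a, b):
    """Jaccard coefficient on word sets: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def dice(a, b):
    """Dice coefficient on word sets: 2|A & B| / (|A| + |B|)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Scale edit distance to a 0..1 similarity by the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical per-field setup: (comparison function, weight).
FIELD_COMPARATORS = {
    "title": (levenshtein_similarity, 0.6),
    "author": (jaccard, 0.4),
}

MATCH_THRESHOLD = 0.8  # assumed cut-off, to be tuned on real data


def record_similarity(rec_a, rec_b, comparators=FIELD_COMPARATORS):
    """Weighted average of the per-field similarities."""
    total_weight = sum(weight for _, weight in comparators.values())
    score = sum(weight * func(rec_a.get(field, ""), rec_b.get(field, ""))
                for field, (func, weight) in comparators.items())
    return score / total_weight


def is_match(rec_a, rec_b):
    """True when the combined score reaches the assumed threshold."""
    return record_similarity(rec_a, rec_b) >= MATCH_THRESHOLD


if __name__ == "__main__":
    a = {"title": "Moby Dick", "author": "Melville, Herman"}
    b = {"title": "Moby-Dick", "author": "Herman Melville"}
    print(record_similarity(a, b), is_match(a, b))
```

Word-set measures such as Jaccard ignore word order, which suits inverted author names, while the edit-distance measure tolerates small spelling or punctuation differences in titles. Which function fits which field, and where the threshold sits, would need checking against a sample of known matches.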
- Feb 08
January 2010
- Jan 28
- Jan 28
- Jan 28
- Jan 25
- Jan 20
- Jan 09
December 2009
- Dec 15
- Dec 15
- Dec 15
- Dec 14

