These are chat archives for weaverbel/LibraryCarpentry

7th
Sep 2016
Belinda Weaver
@weaverbel
Sep 07 2016 03:32
@cmacdonell Cataloguing was not even mentioned at the library conference I was just at. And the only talk about metadata was around repositories, not libraries.
Belinda Weaver
@weaverbel
Sep 07 2016 05:20
and @cmacdonell I am a 'lapsed' librarian. Really I haven't done 'library' work since 2002 - since then I've built repositories and set up data management services and now I'm working in cloud provisioning
James Baker
@drjwbaker
Sep 07 2016 12:57
On MARC: I was just at a metadata conference. They still love it (and even have a much loved tool for using it http://marcedit.reeset.net/)
Owen Stephens
@ostephens
Sep 07 2016 12:59
MARC is still the dominant method for recording bibliographic data in the higher education and research library world. However we have to be slightly careful when talking about MARC as a single thing because it can mean different things to different people
James Baker
@drjwbaker
Sep 07 2016 12:59
On XSLT: it was suggested to me (by none other than Alan Danskin at the British Library, a man very well known in metadataland) that XSLT would be well received by librarians as translating data between schemas is a common task often given to XSLT 'gurus' in a black box / wizard / automagic kinda way. This may be a start https://twitter.com/ndalyrose/status/773198896604520449?s=03
Jez Cope
@jezcope
Sep 07 2016 14:06
MARC has a number of storage formats, including a text-based one (mostly for humans), a binary one (for computers) and an XML form
Presumably MARCXML is where XSLT would be useful
Owen Stephens
@ostephens
Sep 07 2016 14:07
I’m a bit sceptical about XSLT with MARCXML
Jez Cope
@jezcope
Sep 07 2016 14:07
Fair enough - it's not my area of expertise!
What are the issues?
Owen Stephens
@ostephens
Sep 07 2016 14:08
If you want to manipulate MARC, it's much better to focus on MARC-specific tools
MARCXML is pretty horrible - not really well designed XML, but MARC stuffed into XML
Jez Cope
@jezcope
Sep 07 2016 14:10
That's been the impression I've had when looking at it too!
Owen Stephens
@ostephens
Sep 07 2016 14:10
Rather than having e.g. XML tag for a specific field like <245>Title</245>
You get <datafield tag="245" ind1=" " ind2=" "><subfield code="a">Title</subfield></datafield>
This makes the XML verbose and difficult to parse
If you use MARC specific code libraries or tools then (a) it doesn’t care whether you have MARCXML or MARC binary, and (b) it handles all the complexity of parsing the data out of the MARC structure
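To make the structure issue concrete, here is a minimal Python sketch using only the standard library. The sample record and the helper name `subfield_values` are made up for illustration; the point is that the field and subfield have to be found by attribute value rather than by element name, which is the lookup a MARC-specific library would normally hide.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARCXML_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical one-field record, invented for this example.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Title</subfield>
  </datafield>
</record>"""


def subfield_values(record, tag, code):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    # Every field is a generic <datafield>, so we filter on attributes.
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, MARCXML_NS)]


record = ET.fromstring(SAMPLE)
titles = subfield_values(record, "245", "a")
print(titles)
```

The same question in a hypothetical "one element per field" design would be a plain element lookup, which is roughly the contrast being drawn here.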
Jez Cope
@jezcope
Sep 07 2016 14:12
Yuk:
Owen Stephens
@ostephens
Sep 07 2016 14:12
exactly
So - I’m not against an XSLT module, but I think it's likely to be most useful to those dealing with non-MARC data
Jez Cope
@jezcope
Sep 07 2016 14:13
Cool, makes sense
So what XSLT use cases might there be that we can use as examples for the module?
Owen Stephens
@ostephens
Sep 07 2016 14:14
If we want something for “MARC” then either pymarc (which implies Python) or MarcEdit (GUI-based MARC editor, with support for regular expressions etc)
So XSLT might be relevant for those working with TEI, MODS
Jez Cope
@jezcope
Sep 07 2016 14:15
There are some LMSs that will dump out XML reports for further analysis too
Owen Stephens
@ostephens
Sep 07 2016 14:15
Also maybe repositories
TEI is a good “DH” thing if we want that focus
Oh - also anyone doing work with Archives (although maybe that needs to be Archives Carpentry!) - EAD is XML
Jez Cope
@jezcope
Sep 07 2016 14:16
OAI-PMH too
Our digital preservation folk do stuff with METS
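As a rough illustration of the OAI-PMH case, here is a small Python sketch that pulls identifiers and titles out of a `ListRecords` response carrying simple Dublin Core. The response text, the identifier and the function name `list_titles` are all invented for the example; a real harvest would also need to fetch the XML over HTTP and follow resumption tokens, which is left out.

```python
import xml.etree.ElementTree as ET

# Namespaces for OAI-PMH 2.0 and its simple Dublin Core (oai_dc) payload.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, heavily trimmed response used only for this example.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_titles(xml_text):
    """Return (identifier, title) pairs for each record in the response."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS)
        rows.append((identifier, title))
    return rows


rows = list_titles(SAMPLE_RESPONSE)
print(rows)
```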
Owen Stephens
@ostephens
Sep 07 2016 14:17
Ah yes - should have said METS - often used to wrap MODS
Jez Cope
@jezcope
Sep 07 2016 14:18
Time to shift this to a GH issue?
Ryan Johnson
@remerjohnson
Sep 07 2016 14:27
MARCXML is handy in order to run the MARC to MODS XSL on it. I'm one of those repository people you speak of
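For a feel of what a MARC to MODS transform does, here is a hand-rolled Python sketch that maps just 245 $a to a MODS `titleInfo/title`. This is not the stylesheet mentioned above and not XSLT at all (Python's standard library has no XSLT processor); it only imitates in plain code what a single template in such a stylesheet might do. The sample record and the function name `marc_title_to_mods` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

MARC = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace
MODS = "http://www.loc.gov/mods/v3"      # MODS version 3 namespace

# Hypothetical one-field record, invented for this example.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Title</subfield>
  </datafield>
</record>"""


def marc_title_to_mods(marcxml_text):
    """Build a <mods> element holding one titleInfo per 245 $a found."""
    record = ET.fromstring(marcxml_text)
    path = f"{{{MARC}}}datafield[@tag='245']/{{{MARC}}}subfield[@code='a']"
    mods = ET.Element(f"{{{MODS}}}mods")
    for sub in record.findall(path):
        title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS}}}title").text = sub.text
    return mods


mods = marc_title_to_mods(SAMPLE)
print(ET.tostring(mods, encoding="unicode"))
```

A real crosswalk also has to deal with indicators, the other subfields and punctuation, which is where most of the effort goes.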
Ryan Johnson
@remerjohnson
Sep 07 2016 14:36
The XML is verbose because for the web <245>Title </245> isn't helpful
Owen Stephens
@ostephens
Sep 07 2016 14:38
Agreed that <245> isn’t meaningful - I was trying to exemplify the structure issue
Of course you can use XSL with MARCXML - I just think it's hard work!
Ryan Johnson
@remerjohnson
Sep 07 2016 14:42
Also agreed XSLT tasks are often handed off in a black box sort of way.
James Baker
@drjwbaker
Sep 07 2016 15:24
The case study I was given (by Alan) was manipulating LOD into other formats. No idea if enough people care about this!
Owen Stephens
@ostephens
Sep 07 2016 15:36
I’d argue that LOD is another area where it's better to use specific tools - otherwise you reduce interacting with LOD to interacting with RDF/XML - which is another horrible abuse of XML IMHO!
Ryan Johnson
@remerjohnson
Sep 07 2016 15:53
It is definitely not the best serialization of RDF, agreed
@JulianeS_twitter and I work in RDF metadata for our repository. We only touch it in RDF/XML when things are broken
Ryan Johnson
@remerjohnson
Sep 07 2016 17:01
OpenRefine (and its fork LODRefine, along with extensions and scripts that handle reconciliation and NER) is indispensable in working with LOD, but the inputs tend to still be spreadsheets, with sometimes some need to parse JSON responses from APIs
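As a sketch of the "parse JSON responses" step, the Python below picks the best-scoring candidate for each query from a reconciliation-style response. The JSON shape (query keys, each with a `result` list of candidates carrying `id`, `name`, `score` and `match`) is modelled loosely on what reconciliation services return and should be treated as an assumption. The data, the identifiers and the `min_score` threshold are invented for illustration.

```python
import json

# Hypothetical response text; real services may add or rename fields.
SAMPLE_RESPONSE = """{
  "q0": {"result": [
    {"id": "ex:1", "name": "Austen, Jane, 1775-1817", "score": 98.0, "match": true},
    {"id": "ex:2", "name": "Austen, Jane (Spirit)", "score": 41.5, "match": false}
  ]},
  "q1": {"result": []}
}"""


def best_matches(response_text, min_score=50.0):
    """Map each query key to the id of its top candidate, or None."""
    data = json.loads(response_text)
    best = {}
    for key, payload in data.items():
        # Keep only candidates at or above the (assumed) score threshold.
        candidates = [c for c in payload.get("result", []) if c["score"] >= min_score]
        if candidates:
            best[key] = max(candidates, key=lambda c: c["score"])["id"]
        else:
            best[key] = None
    return best


matches = best_matches(SAMPLE_RESPONSE)
print(matches)
```

Queries with no candidate above the threshold come back as `None`, so they can be flagged for manual review rather than silently matched.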
Ryan Johnson
@remerjohnson
Sep 07 2016 17:11
Juliane did a great demo during her lesson of the reconciliation services, which the catalogers were very impressed with