2017-06-08 Meeting notes

Date

Attendees



Agenda


1) Where we are on our road map. 

2) Open Action Items

3) JIRA Issues Review - https://jira.edmcouncil.org/projects/RDFKIT/issues

4) Today's content discussion.

Look at the current HTML in the spec. DN says it is the wrong one and should not be called a glossary anyway; that has always been the plan.

Command line in Protege to do the HTML build. Dean says that Matthew Horridge says it is possible to use the open-source software and build that into some Java program. We'd have to make it part of the RDF-Toolkit. - DA and TC

How do we deliver N-Quads - DA

How do we deliver Linked Data Fragments - Omar

DN says we should remove history before the 30 June publication. FPT and KT seem to think it should be there. If yes, then in a separate place so as not to confuse casual users?

Discussion of JG's plan to replace the ugly directory listings we have now with proper HTML pages (generated in the browser by JavaScript).

UML for 30 June - MB, CC, JL



5) For next week.

Proceedings:

20170608 FIBO FPT RDF ToolKit


Today's Content Discussion (not in order shown)


N-Quads - DA has written a script as part of the Publish process that produces an N-Quads file. This will be merged into master on FIBO-INFRA as soon as it is tested alongside other things. DA has verified with the Jena command line that these are valid N-Quads files. He will then turn this over to Kevin Tyson to verify whether this is what he was asking for; DA is confident that it should be. This is a provisional version until we have verified it with KT. Other minor changes are also to be made.
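A minimal sketch of the kind of Jena-based check described here, using the Jena API rather than the command-line tools; the file name fibo.nq is hypothetical:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.riot.RDFDataMgr;

    public class CheckNQuads {
        public static void main(String[] args) {
            // RIOT throws a RiotException if the file is not syntactically valid N-Quads.
            Dataset ds = RDFDataMgr.loadDataset("fibo.nq"); // hypothetical file name
            // Listing the named graphs is a quick sanity check on the content.
            ds.listNames().forEachRemaining(System.out::println);
        }
    }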


OK: interested in picking up linked data fragments from this, not as interested as KT in the versioning aspects. Would like to consume the N-Quads file for this; that hadn't been clear. So OK should also go to Jenkins Job 75, go to its workspace, and look at the "provisional NQuads" that appear there. These are in the workspace, not yet on the site; they won't be pushed out onto Spec until one small bug has been addressed. Also, OK should use the ZIP file to look at the results. OK will first work with the unzipped version.


Fragments progress: OK has sent information to the Git people and JG and others. 


JG: the formal extension for N-Quads files is .nq. Jena is pretty forgiving about these format variations; DA can change this if needed. JG: the .nq extension is what is in the standard, so we should change to using that.


OK: with the N-Quads, when something is published, is there a different version for each iteration? DA: at present, no. We only change the URIs for the ontologies, not for the classes and properties. The stuff you are interested in in the N-Quads is not the structure but the content, so it is not affected by this.

We need to put something into the process at a different spot.


JG: if you add something in a triple store and there is a previous version, you get all sorts of complications. DA: would need to rearrange when this happens so that we get the versioned IRIs for the graphs. JG: this IRI is not a valid IRI because it ends with a slash; that makes it a namespace URI, not a proper URL. DA: actually this is allowed as a resource URI. We decided long ago to go with slash rather than hash, so that is the policy.


CC: we seem to be accumulating lots of different formats (DA: generating these intentionally in response to demand); we are accumulating requirements for different formats. We have different versions, at different levels of maturity. This makes for a lot of permutations. Would it make sense, instead of generating them all, to have a download web page that dynamically produces a download set in a person's preferred format, for the preferred version, across their preferred scope, and so on? That would let us add things more flexibly. DA: the Publish process that generates things takes 5+ minutes. That covers up to 36 files for which the different formats are generated, IRIs minted, and so on.


CC: if we did this it would need to be rolled into a server. DA: could be done in Stardog or maybe the fragment server. Then instead of parsing 36 files we would have 36 graphs loaded into a high-performance database on which queries could be made. JG: the downside is these all have to be kept online; with files, disk space is free. CC is thinking about flexibility for the consumer.
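A rough sketch of the server-side idea, using Jena in place of Stardog; the file names and graph IRIs below are hypothetical stand-ins for the real published files and minted IRIs:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSetFormatter;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.RDFDataMgr;

    public class GraphStoreSketch {
        public static void main(String[] args) {
            Dataset ds = DatasetFactory.create();
            // Hypothetical file list; in practice this would be the ~36 published files.
            String[] files = { "FND.ttl", "FBC.ttl" };
            for (String f : files) {
                Model m = RDFDataMgr.loadModel(f);
                // One named graph per file; the real graph IRIs would be the minted ones.
                ds.addNamedModel("https://spec.edmcouncil.org/fibo/" + f, m);
            }
            // A download page could then run queries like this on demand.
            String q = "SELECT (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, ds)) {
                ResultSetFormatter.out(qe.execSelect());
            }
        }
    }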


JG: the TTL, N-Quads, and so on are W3C standards; people can always convert to anything else themselves. The technical reason for a ZIP file is that the nginx server can serve the zipped version (which he would name with a different suffix). Then if you click on a URL and do not specify the zip version, you would still get it served to you because it is there.


DA: we use zips because of the maturity levels. For a release version we needed a kind of selection of things in the release folder, so we wanted, as a zip, a subset of files in a given formula as defined in a given script, which might involve greps and so on. So that complicated formula is the input to ZIP.


Where is that documented? JG: for unzipping, use tar.gz rather than zip. DA: tried tar.gz and Windows didn't like it, and this would have required writing some horrible instructions for users. Our existing instructions work for users on all platforms and tools. JG: for the nginx server we might not be able to avoid tar.gz. The existing instructions have been tested, e.g. by Bobbin using them.


Next item: Command line in Protege (for Tony)


DA: one direction we are exploring is to use the OWLDoc that is found in Protege to provide human-readable navigation output for FIBO. Right now you can load FIBO into Protege, hit a button, and generate those docs. We would like to be able to do that as part of a Jenkins job. However, Protege does not work as a headless server. So DA asked Matthew Horridge whether you can use the program for OWLDoc directly. MH sent back a response indicating that OWLDoc is not very dependent on Protege, so a competent Java dev could easily turn this into a program. Tony is someone who might be able to do that. Can Tony take this on?

TC: yes; as soon as I can get the Toolkit code running again, I would look at this as the next thing. We think OWLDoc uses the OWL API. TC: the Toolkit has the OWL API there, so that will be OK.


ACTION:  DA to forward email from Matthew Horridge, to TC. 
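As a rough illustration of the wrapper TC might write, here is a minimal OWL API command-line loader; the actual OWLDoc entry point is not yet known, so that step is only a placeholder comment:

    import java.io.File;
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.model.OWLOntologyManager;

    public class OwlDocRunner {
        public static void main(String[] args) throws Exception {
            // args[0] is the path to a FIBO ontology file.
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            OWLOntology ontology = manager.loadOntologyFromOntologyDocument(new File(args[0]));
            System.out.println("Loaded: " + ontology.getOntologyID());
            // Placeholder: hand 'ontology' to the OWLDoc export classes here,
            // once the email from Matthew Horridge clarifies the entry point.
        }
    }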


Fifth task: removing the history lines about the conversion. DA can do this. Does not require discussion.


TC (separate matter): would N-Quads require something that is not in our files? TC: if we are setting up N-Quads, even if we only put them out a graph at a time, we potentially want to put in some metadata (triples with the graph as the subject) to identify which graph a graph is and how it relates to other graphs. DA: we could have, e.g., "in FIBO, here's what I need"? TC: similar but different: "I am FIBO and here are the parts I am made from." DA: we can have metadata that points to the ontology. What TC is saying is that if I load graphs into my triple store today, there is no way to know what is FIBO and what is not. Also, if I have multiple versions of FIBO, it is hard to tell them apart. DA: we can only tell them apart by URL smashing, which is a bad idea.


This is a new thing to consider, and an appropriate thing for us to do. The first task is to write out what it should look like, e.g., what graph the metadata goes into.


ACTION:  Run it by KT and establish how important it would be for him.  TC anticipates that KT would need it.  First action is to get feedback from KT about whether we have identified a requirement of his and characterized it correctly.  The thing to be identified is where to put the triples, what to call the property, how to describe the things that we want to keep separate and so on. Implementation is easy but specifying is not trivial.
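As a sketch only, since the property names and placement are exactly what this specification task has to decide, the graph-level metadata could be triples in the default graph with the graph IRI as subject; the IRIs below are hypothetical and Dublin Core is used purely as a stand-in vocabulary:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.vocabulary.DCTerms;

    public class GraphMetadataSketch {
        public static void main(String[] args) {
            Dataset ds = DatasetFactory.create();
            // Hypothetical graph and release IRIs, for illustration only.
            String graphIri = "https://spec.edmcouncil.org/fibo/FND/Example/";
            String releaseIri = "https://spec.edmcouncil.org/fibo/master/latest/";
            Model dft = ds.getDefaultModel();
            // A triple whose subject is the graph IRI, saying which FIBO release it belongs to.
            dft.createResource(graphIri)
               .addProperty(DCTerms.isPartOf, dft.createResource(releaseIri));
            RDFDataMgr.write(System.out, ds, Lang.NQUADS);
        }
    }

A consumer loading the N-Quads into a triple store could then query these triples to tell FIBO graphs, and FIBO versions, apart.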


Next: Current HTML - See link to "glossary" on spec


Background: a month back, DN ran OWLDoc on something in Protege, on the basis that we need something on our website that is the HTML for FIBO.

The question was whether to use that or run it again. It turns out DN ran it only for FBC, not the whole thing. Also, it turns out there is a lot of stuff just from that FBC run, e.g. individual countries and individual languages, brought in from LCC. So this is not FIBO at all. So we would need to leave out LCC, or alternatively have an abridged version of LCC.


JG: let's not generate the HTML files, but instead generate JSON files that hold the information that would be on those pages, and have a modern website that reads those JSON files and generates the pages on the fly in the client browser. DA: we need something for June 30th.

Can we do that now? DA does not have the resources to do this; does JG? JG: possibly has the resources. Also, we need the ability to do that anyway. DA: we can generate the OWLDoc right now. If JG can put together what he described above by June 30, we should do that. Otherwise, DN is annoyed that we do not have the right thing as of now. So we need something sooner.


JG: the files you have in OWLDoc don't have the same file-naming scheme. These are a static picture of some specific version of FIBO. There are people who do not know or care about the versions or maturity levels; they just want to browse. We need something for them to see. We could have a subdirectory under master/latest/OWLDoc, for example. We would need to do something on the command line to do that. Hence DA mentioned the programming task that TC has taken on, to put OWLDoc into the toolkit so that it can run on the command line, so we can also do what JG is describing. TC would be doing a little bit of code to make this work from the command line, using the OWL API if needed. Then Jenkins can have a step added to do this. DA: this is itself only a stopgap, because the ideal is what JG describes, where we generate a data payload and generate the rest from that. That cannot be done by June 30.


JG: based on the OWL API and based on reasoning, we could generate the stuff. So the EDM Council site should only have the data payload, and no HTML, and have something that runs in the user's own client that interprets the data we published, in the various branch and tag directories etc., for each of those versions. JG: we can make people log in using some authentication server; not to put it behind a firewall, but to know who they are and their settings and preferences, and thereby generate a better page for them to see.
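One way the data payload could be produced, assuming JSON-LD as the JSON flavor (a choice this meeting did not settle) and a hypothetical file name:

    import java.io.FileOutputStream;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFDataMgr;

    public class JsonPayload {
        public static void main(String[] args) throws Exception {
            // Load a published ontology file (hypothetical name) ...
            Model m = RDFDataMgr.loadModel("FND.ttl");
            // ... and write it out as JSON-LD for a client-side site to render.
            try (FileOutputStream out = new FileOutputStream("FND.json")) {
                RDFDataMgr.write(out, m, Lang.JSONLD);
            }
        }
    }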


DA: we need something for people to look at by June 30.


Another comment on the agenda, about the HTML now on Spec: we should not call this Glossary. MB: the Glossary should be term, definition, and synonym; whether or not it is HTML is separate from that. DA: where should a person click to see the HTML? MB: I thought we were going to generate HTML for various kinds of audience, of which this OWL thing is one, namely the view for the technical ontologist audience.

DA: there is a separate meeting to be had about what is to be shown to whom and when. This is a separate conversation from this group; this is what the PR meetings are about. We also need a FIBO-specific digestion of the PR meetings, and this is not that meeting. So there are several kinds of view for different audiences, of which David's Protege HTML view is one. JG: we can specify different flavors of what people want to see for a given concept. MB: this was the outcome of the PR calls and the original CC slides for FIBO specifically. The idea is that there is one page for one concept (one URI), and at that page people can opt to see different views. Later we can think about retaining information about people's preferences.


CCM model status: where are we and what blocks us? DA needs to finish the round trip of the OWL back into CCM, to see if the diagrams work fine with the OWL as it is in GitHub. Because of the maturity levels, DA needed to update all of the files and generate the appropriate stuff. We are now at the point where we think we can round-trip the OWL. This pass will be a pair-programming task between MB and DA. Once this is done and checked, we can proclaim the start point of the live process. They will do this on Monday; DA and MB to agree a time.

Decisions:

Action items