...
Attendees
- Dean Allemang
- Cory Casanave
- Anthony Coates
- Bobbin Teegarden (Unlicensed)
- Pete Rivett
- David Newman
- jacobus.geluk@gmail.com [X]
- Elie Abi-Lahoud (Unlicensed)
- Mike Bennett
Agenda
1) Where we are on our road map.
...
Code that KT gave to JG and TC to build an N-Quads file from FIBO
5) For next week.
Proceedings:
20170511 FIBO FPT RDF ToolKit
Dean raised with TC the publishing of catalog files, which is now blocking MU. Cory wrote a Python script that does this; it is a hack, but it works. TC: we need something simple that runs on the developer's local computer. A Git hook is the obvious point at which to format and save, if this can be done neatly. DA: we could do all of this in Jenkins at commit time, and it could be done quickly. Then anyone who checks in, and so anyone who edits FIBO, would have this. TC: Git comes with its own BASH shell, so catalog files could be generated from there as necessary. Generation does not need to be done often, and it takes compute time that would slow down commits. CC's Python script is fast, but Python is not installed by default on Windows. Java is easier, since it is already installed for the Serializer. OK: can the Python be ported to BASH? Omar likes BASH. TC: yes, this could happen. DA agrees. TC: let's start with DA's script. DN: we can't allow developers to check in catalog files. DA: that is our policy. Conclusion: for the time being, use the BASH script in a Git hook; longer term, catalog generation becomes an on-demand part of the RDF ToolKit. DN: this has to be more than policy; we have to block such check-ins automatically.
ACTION: Dean and TC to work out how the RDF ToolKit can enable FIBO developers to generate the catalog files automatically, on demand, and prevent those who are not authorized from doing the same.
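The conclusion above (generate catalog files on demand, and block check-ins of them automatically) can be sketched with the Python standard library. This is a minimal sketch, not the script Cory or DA wrote: it assumes Protege-style OASIS XML catalogs whose file names start with `catalog` and end in `.xml`, and it takes the ontology-IRI-to-file mapping as input rather than extracting it from the ontologies. `build_catalog`, `blocked_catalog_files` and `main` are hypothetical names.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# OASIS XML Catalogs namespace, as used by Protege's catalog-v001.xml files.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(mapping: dict[str, str]) -> str:
    """Return catalog XML mapping each ontology IRI to a file path.

    `mapping` is ontology IRI -> path relative to the catalog file
    (assumed to be produced elsewhere, e.g. by scanning the repository).
    """
    ET.register_namespace("", CATALOG_NS)  # serialize without an ns0: prefix
    root = ET.Element(f"{{{CATALOG_NS}}}catalog", {"prefer": "public"})
    for iri, path in sorted(mapping.items()):  # sorted for stable output
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri", {"name": iri, "uri": path})
    return ET.tostring(root, encoding="unicode")


def blocked_catalog_files(staged_paths: list[str]) -> list[str]:
    """Return staged paths that look like generated catalog files."""
    return [
        p for p in staged_paths
        if PurePosixPath(p).name.startswith("catalog") and p.endswith(".xml")
    ]


def main() -> int:
    """Pre-commit check: refuse the commit if a catalog file is staged."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    blocked = blocked_catalog_files(staged)
    if blocked:
        print("Catalog files are generated, not committed:", *blocked,
              sep="\n  ", file=sys.stderr)
        return 1
    return 0
```

To use `main` as a hook, the file would be saved as `.git/hooks/pre-commit` with a `#!/usr/bin/env python3` line, made executable, and end with `sys.exit(main())`. A client-side hook can be skipped with `git commit --no-verify`, so DN's "more than policy" point would still call for the same check in Jenkins or on the server.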
2nd blocker for Dean, a question for Jacobus: DA has been working on making FIBO run out of the box in both TBC and Protege. He is going to work with Jim to make that happen for CCM, and with EK for VOM. DA has made changes to the FIBO publishing shell script, and DW has merged these back into /dev. There is a job that runs this script, but DA is not seeing the updates make their way to spec.edmcouncil.org; there is supposed to be a job that takes the output from the workspace and sends it there, and this is not happening. This is the main job, not DA's test version; he expects it to put the files there, and it doesn't. JG: there is another job, triggered by the first one, which resides in Stardog and triggers something via Jenkins that moves the artifacts on from there. See "downstream projects" on the initial job page; this is how it happens. DA had not understood that part. This downstream job is what copies the files from the workspace to where they need to be. (There is also a program called set the trigger, which is less relevant here.) The script includes the directory path for the files hosted on the nginx server. That server also has a pre-staging area holding everything that is to be published; go to the place called Workspace to see what will appear on the web. DA's puzzle started with a pull request. A pull request still open on DA's branch would not be expected to hook into that copy, so that was not the cause. Then there is the pull request that DW actioned: DA added a header to all the TTL files so that the tooling can work out the right version IRI, and JG then did some unspecified "cleaning up" of that. DA had looked for something earlier that was not there, and that was the cause of the confusion. Meanwhile, some files are ending up with control-M (carriage return) characters at the ends of their lines, which might break the Turtle parser. This is ugly and needs cleaning up.
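The stray control-M characters are Windows CRLF line endings. A minimal cleanup sketch, assuming the affected files all end in `.ttl` under one directory tree; `normalize_line_endings` and `fix_ttl_files` are hypothetical helpers, not part of the existing publishing job. It works on bytes so the file encoding is never touched, and it only rewrites CRLF pairs.

```python
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving everything else unchanged."""
    return data.replace(b"\r\n", b"\n")


def fix_ttl_files(root: Path) -> list[Path]:
    """Rewrite every .ttl file under `root` with LF endings; return the files changed."""
    changed = []
    for path in sorted(root.rglob("*.ttl")):
        data = path.read_bytes()  # bytes, so the file's encoding is left untouched
        fixed = normalize_line_endings(data)
        if fixed != data:
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```

As a complementary measure, a `.gitattributes` entry such as `*.ttl text eol=lf` would make Git normalize Turtle files to LF.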
JG has a question: should we move to Turtle 1.1, since this is now the most recent version of that standard? DA: yes, we should. However, TopBraid does not support it, and even if support is added, there are people using older versions. So we should support both the old and the new versions. TC: the serializer puts in both the special comment for TopBraid and another special construct that is part of Turtle. DA: we need to add a key to the serializer to tell it about this. TC: the base URI is not always inferable, so you can either tell the serializer what the base URI is or tell it to use the ontology URI. A separate possible enhancement would be to tell it to use the versionIRI. DA: at present there are a lot of DirtyPink ontologies that do not have versionIRIs. One possible solution is to ask DirtyPink to fix that; an alternative is to infer the versionIRI from the branch name and the ontology IRI. As TC reminded him above, DA can specify the base URI on the command line. This is a better approach than the fragile post-processing DA is currently doing, and he will switch to the command-line option. For versionIRIs, the choice is either a policy of always including them, or a policy of never including them and having them computed. All agree to follow the second option, so there is no need for DirtyPink to add versionIRIs to the CCM model or the OWL. DA now has the knowledge he needed.
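The agreed option (never store versionIRIs; compute them from the branch name and the ontology IRI) could look like the sketch below. The IRI layout `root/<branch>/<rest of ontology IRI>` and the `FIBO_ROOT` value are assumptions for illustration only; the minutes do not fix a scheme, and `infer_version_iri` is a hypothetical name.

```python
# Hypothetical root under which FIBO ontology IRIs are assumed to live.
FIBO_ROOT = "https://spec.edmcouncil.org/fibo/"


def infer_version_iri(ontology_iri: str, branch: str, root: str = FIBO_ROOT) -> str:
    """Insert the branch name after `root` to form a versionIRI.

    The layout root/<branch>/<rest of ontology IRI> is an assumed convention.
    """
    if not ontology_iri.startswith(root):
        raise ValueError(f"{ontology_iri!r} is not under {root!r}")
    segment = branch.strip("/").replace("/", "-")  # e.g. feature/x -> feature-x
    if not segment:
        raise ValueError("branch name is empty")
    return f"{root}{segment}/{ontology_iri[len(root):]}"
```

Because the versionIRI is derived at publish time, nothing has to be added to the source ontologies, which is the point of the decision above.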
JG: but this is still under development. The last job needs to be replaced by something that publishes to S3, as we will soon run out of space with what we are doing. The future variant would put everything on S3, using the nginx server as a proxy that forwards requests there. Part of DA's confusion was that he expected to see S3 in the loop, and it wasn't. Does DA now have what he needs? Yes, but he would really like to understand the flow of the files, all the way from the script he writes to where they end up. MB: is that documented?
ACTION: DA to document the current state, and update it when S3 comes in, and so on. DA and JG will continue to maintain this going forward. It will live in the Hosting task in the RDF ToolKit Roadmap Wiki (S3 vs. nginx), which will be an evolution.
What we would announce at the NYC members' brief (June 8): DW has added a User Story to support this release, and has extended hosting out to June 30. That should be right. Are the remaining task bars all to be extended? Yes.
Other agenda item for today: Roadmap Review. We do not have subtasks in these activity bars, hence DW is writing user stories for them. The rule is to connect each User Story to a JIRA issue, so DW will raise one for each user story he writes for these subtasks / milestones. Make the title of the document the same as the "feature", and ideally also the same as the name of the JIRA issue. For example, "I want to have access to all the FIBO products" is both the Feature and the JIRA title, and the Outcome of the User Story is e.g. "The bank has access to all the files it needs". We expect hundreds of User Stories, clustered per Person and per Outcome.
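The naming and clustering rules for User Stories can be captured in a small data structure. This is only an illustrative sketch: the `UserStory` class, its field names and the `cluster` helper are hypothetical, and the person "bank data architect" in the usage below is an invented example, not one of DW's stories.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserStory:
    person: str                       # who wants it, e.g. a bank data architect
    feature: str                      # also the title of the User Story document
    outcome: str                      # the result the person gets
    jira_title: Optional[str] = None  # title of the linked JIRA issue, if raised

    def follows_naming_rule(self) -> bool:
        """True if a JIRA issue exists and is titled after the feature."""
        return self.jira_title == self.feature


def cluster(stories: list[UserStory]) -> tuple[dict, dict]:
    """Group feature titles per person and per outcome."""
    by_person = defaultdict(list)
    by_outcome = defaultdict(list)
    for story in stories:
        by_person[story.person].append(story.feature)
        by_outcome[story.outcome].append(story.feature)
    return dict(by_person), dict(by_outcome)
```

A story such as `UserStory("bank data architect", "I want to have access to all the FIBO products", "The bank has access to all the files it needs", "I want to have access to all the FIBO products")` satisfies the naming rule; one without a JIRA title does not.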
Next agenda item: slash versus hash, which was hashed out again last Thursday. Differences remain, and DW would like to get past this. MB recalls that we already decided this many years ago; Monday's problem was the implementation of the policy on external ones. Where is this documented? Does this now need to be adopted as our own policy, whereas previously it was presented as an OMG policy? PR: there is no OMG position on this; the policy was our own.
ACTION: DW will look for the # / policy and task MU with adding it to the Policies place, or create one if it does not exist.
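For reference, the practical difference between the two IRI styles can be shown in a few lines. With a hash IRI the fragment is never sent in the HTTP request, so every term in the namespace resolves to the same document; with a slash IRI each term can be served on its own. This sketch only illustrates the distinction and does not encode which policy the group adopted; `split_iri` and `requested_document` are hypothetical helpers, while `urllib.parse.urldefrag` is a standard-library function.

```python
from urllib.parse import urldefrag


def split_iri(iri: str) -> tuple[str, str]:
    """Split a term IRI into (namespace, local name), hash or slash style."""
    sep = "#" if "#" in iri else "/"
    namespace, _, local = iri.rpartition(sep)
    return namespace + sep, local


def requested_document(iri: str) -> str:
    """The URL a client actually requests: the fragment is never sent."""
    return urldefrag(iri).url
```

For example, `requested_document("http://www.w3.org/2002/07/owl#Class")` is `"http://www.w3.org/2002/07/owl"`, whereas a slash IRI is requested unchanged.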
Next: from the FIBO-V meeting with KT. Kevin needed to produce N-Quads, and KT gave Tony (or someone) a script to make one giant N-Quads file for FIBO. TC: we may or may not use that ourselves; it is not clear. If we integrated it with the Serializer we could do it there. Is this for the OWL or for FIBO SKOS? DN: this is for the FIBO OWL ontology; however, if we do it for the FIBO OWL, we would do it for FIBO-V as well. JG: then what would the URLs be? DN: perhaps we would define a named graph for each module, or have a default graph for all of FIBO. KT had suggested a particular named graph that cross-references prefixes and namespaces (URIs), i.e. a lookup that captures the prefix for each URI. DN: there are many details still to explore, but we like the idea that we can take FIBO and load the TBox (and maybe some of the ABox items we include) into a graph immediately. CC: instead of publishing FIBO in a different form, we could provide in the toolkit the ability for people to publish it in that form themselves. We could do the same for other flavors. JG: we could add this to the Publish function in the ToolKit, but we would want to know what the URLs would be. He is loading FIBO into Stardog himself, with their own internal URL for it (at BNY-M).
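What such a file might contain can be sketched as an N-Quads writer that puts each ontology's triples in its own named graph and adds a separate prefix lookup graph. This is not KT's script. Using the VANN vocabulary (`preferredNamespacePrefix`, `preferredNamespaceUri`) for the prefix cross-reference is an assumption, as is the idea of a dedicated prefix graph; `to_nquads`, `iri` and `literal` are hypothetical helpers, and only plain string literals are handled.

```python
# VANN is a real vocabulary for preferred prefixes; using it here is an
# assumption, not the structure KT proposed.
VANN = "http://purl.org/vocab/vann/"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    """Plain string literal with N-Quads escaping for \\, ", LF and CR."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_nquads(graphs, prefixes, prefix_graph):
    """Serialize triples as N-Quads.

    graphs: graph IRI -> list of (subject, predicate, object) terms that are
            already in N-Triples form (see iri() and literal()).
    prefixes: prefix -> namespace IRI, written to a separate lookup graph.
    """
    lines = []
    for graph, triples in graphs.items():
        for s, p, o in triples:
            lines.append(f"{s} {p} {o} {iri(graph)} .")
    for prefix, namespace in sorted(prefixes.items()):
        g = iri(prefix_graph)
        lines.append(f"{iri(namespace)} {iri(VANN + 'preferredNamespacePrefix')} {literal(prefix)} {g} .")
        lines.append(f"{iri(namespace)} {iri(VANN + 'preferredNamespaceUri')} {literal(namespace)} {g} .")
    return "\n".join(lines) + "\n"
```

Each output line is one quad, so the file can be streamed into a store and every triple keeps the graph it came from.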
DA: what problem does this solve? JG is happy to generate whatever quad file we want, but for what purpose? DN: it gives an end user an easy way to load FIBO into a triple store. Not into a desktop tool like Protege or TopBraid, for which we already have ways of doing this, but into other triple stores; an N-Quads file makes no sense for Protege. DA: similarly, in the CS PoC for CFTC, CS won't follow your imports, so you have to load the whole thing as a single graph. DA responded with a recipe he has for putting all of FIBO into a single file, but it is a bit sloppy and not production ready. So we do need to turn FIBO into a single file, but that is a triples file, not a quads file. So we can see the advantage of doing this. DW heard the same thing from KT: he is happy with a giant quad file now, but in future will also want a quad file for each domain. As JG said, the Publisher can easily do any of this. The question is what the URIs will be. Every domain has its own URI, so we could use the domain URI as the graph URI; this could also be done at the ontology level. So the options range from the ontology level, via the domain, to the "file" level, i.e. all of FIBO. The one-file approach avoids the need to resolve imports, which different triple stores handle differently (OK). Resolution: will we build these as artifacts in the Publisher? If we do it in the Publisher, the giant file would be a triples file, not a quads file. DN: one thing KT asked for (and the reason for N-Quads) is a cross-reference between namespace prefix and URI for the ontologies, which needed to be in a named graph. Another structure for that would also be OK, but we haven't developed one, and someone would need to define a basic semantic structure for it anyway. This also opens new discussion points about structures and annotations: the structure of an ontology, of a graph itself, and so on. It is not clear whether there is precedent for this.
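The range of graph granularities discussed (ontology, domain, all of FIBO) amounts to choosing a graph name for each ontology. A minimal sketch, assuming ontology IRIs of the form `ROOT + "<DOMAIN>/<Module>/<Ontology>/"`; the root value, the `Level` enum and both functions are hypothetical, and a module level could be added the same way.

```python
from enum import Enum

# Hypothetical root; FIBO IRIs are assumed to look like ROOT + "<DOMAIN>/<Module>/<Ontology>/".
FIBO_ROOT = "https://spec.edmcouncil.org/fibo/"


class Level(Enum):
    ONTOLOGY = "ontology"  # one graph per ontology (thought too fine grained)
    DOMAIN = "domain"      # one graph per domain, e.g. FND
    ALL = "all"            # one graph for all of FIBO


def graph_name(ontology_iri: str, level: Level, root: str = FIBO_ROOT) -> str:
    """Name of the graph an ontology's triples go into at the given level."""
    if not ontology_iri.startswith(root):
        raise ValueError(f"{ontology_iri!r} is not under {root!r}")
    if level is Level.ONTOLOGY:
        return ontology_iri
    if level is Level.DOMAIN:
        domain = ontology_iri[len(root):].split("/", 1)[0]
        return f"{root}{domain}/"
    return root


def group_by_graph(ontology_iris, level, root=FIBO_ROOT):
    """Map each graph name to the ontologies whose triples it will hold."""
    groups = {}
    for o in ontology_iris:
        groups.setdefault(graph_name(o, level, root), []).append(o)
    return groups
```

At `Level.ALL` everything collapses into one graph, which is the single-file case where, as noted above, a plain triples file would do.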
CC: we are trying to make decisions for a wide range of people. Why not provide a utility that reads FIBO in its standard syntax and lets users specify what output they want? That could include JSON-LD (which we already provide), N-Quads, and so on. This relates to the question of distributing utilities: are these developer utilities? Is loading things into a triple store also a developer activity? Or is there an audience that will expect something on the website that they can use to do these things?
OK: his preference would be quads if he were to consume an artifact. The tendency is to load things into a default graph; using quads keeps them separate. PR: a policy decision is needed on what level to provide a graph at, e.g. File / Product, Domain or Module, versus Ontology level (which all think is too fine grained). DN thinks KT would be happy with various levels, and we can improve this incrementally; the main thing is that KT needs to be able to load FIBO into a graph easily. PR: what is the "cross reference" thing? JG: there is a standard SPARQL operation, LOAD, that loads an RDF document into whatever named graph you want. DA: this doesn't address KT's problem. He can get everything into the graph, but for a given triple he then wants to know which ontology it came from. JG: that is a separate question; a named graph is one of many possible solutions, and metadata is another. PR: there is a standard annotation for that. DA: the issue is that to put that on triples you would need to reify them, hence the need for a quad-based solution. DN: we need to bring Kevin into this discussion; he seems to be onto something.
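If each ontology is loaded into its own named graph, DA's provenance question becomes a `GRAPH` query rather than something requiring reification. The sketch below only builds SPARQL 1.1 strings; the helper names are hypothetical, the document URLs and graph IRIs are whatever the Publisher ends up producing, and whether a given store can `LOAD` from a remote URL is store-specific.

```python
def load_update(documents: dict[str, str]) -> str:
    """SPARQL 1.1 Update request loading each document into its own named graph.

    documents: URL of an RDF document -> IRI of the named graph to load it into.
    """
    ops = [f"LOAD <{url}> INTO GRAPH <{graph}>" for url, graph in documents.items()]
    return " ;\n".join(ops)


def source_graph_query(s: str, p: str, o: str) -> str:
    """SPARQL query listing the named graphs that contain the triple (s p o).

    All three terms are assumed to be IRIs.
    """
    return f"SELECT DISTINCT ?g WHERE {{ GRAPH ?g {{ <{s}> <{p}> <{o}> }} }}"
```

With one graph per ontology, the `?g` returned by the query is the ontology the triple came from.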
ACTION: DW to reach out to KT about joining this call next week if possible.
DN has an issue to add, not yet in JIRA: introduce an annotation at the element level (at least in FIBO-V, maybe elsewhere) to identify the lifecycle state of the element. E.g. Contract is Green in OMG, but a new version of Contract might be Yellow or Pink. Everyone using FIBO for data stewardship will need to know this; Wells, for example, is loading FIBO-V into Collibra and needs the lifecycle state of each element. PR is not sure this can be done at the element level rather than the ontology level: if two elements are already Green and you add something that changes the relations between them, e.g. an equivalence, you can't usefully track that at the element level; it needs to be at the ontology level. DW: this is part of the action on JG and DN on tracking maturity levels, and is on the agenda for Tuesday.
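To make DN's proposal concrete, an element-level lifecycle annotation could be emitted as one triple per element. The property IRI below is a placeholder invented for this sketch (no property was agreed in this meeting), and `Lifecycle` and `lifecycle_annotations` are hypothetical names; the output is N-Triples.

```python
from enum import Enum

# Hypothetical annotation property; no such property was agreed in this meeting.
LIFECYCLE_PROPERTY = "https://example.org/fibo-annotations/lifecycleState"


class Lifecycle(Enum):
    PINK = "Pink"
    YELLOW = "Yellow"
    GREEN = "Green"


def lifecycle_annotations(states: dict[str, Lifecycle]) -> str:
    """N-Triples lines annotating each IRI (element or ontology) with its state."""
    return "\n".join(
        f'<{subject}> <{LIFECYCLE_PROPERTY}> "{state.value}" .'
        for subject, state in sorted(states.items())
    )
```

The same function can annotate an ontology IRI, which is the level PR argued for; on its own it cannot record that a new axiom changed the relations between two Green elements, which is the gap PR pointed out.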
Decisions:
Action items
- Dean Allemang, Anthony Coates: work out how the RDF ToolKit can enable FIBO developers to generate the catalog files automatically, on demand, and prevent those who are not authorized from doing the same.
- Dean Allemang: document the current state of how catalog files are generated, and update it when S3 comes in, and so on. Dean Allemang, jacobus.geluk@gmail.com [X]: DA and JG will continue to maintain this going forward. It will be in the Hosting task in the RDF ToolKit Roadmap Wiki (S3 vs. nginx), which will be an evolution.
- Dennis Wisnosky: look for the # / policy and task Michael MU with adding it to the Policies place, or create one if it does not exist.
- Dennis Wisnosky, kptyson: reach out to KT about joining this call next week if possible.