Date
Attendees
...
Agenda
1) Where we are on our road map.
2) Open Action Items
3) JIRA Issues Review - https://jira.edmcouncil.org/projects/RDFKIT/issues
4) Today's content discussion.
5) For next week.
...
Agenda
This was a special FPT meeting in Cambridge England because Dean was in town.
Proceedings: 20170714 FIBO FPT RDF ToolKit.docx
FPT Meeting Notes
Friday 14 July, Cambridge, UK
Present
- Tony Coates (hosting)
- Dean Allemang
- Jacobus Geluk
- Mike Bennett
Summary
Delivery
Among other things, we considered the precise nature of what should be published at spec, in terms of the maturity levels and their timeliness. The possible options considered were:
1. Production: snapshot as of the quarterly date
2. Development: snapshot as of the quarterly date
3. Development: continually updated
We decided that (2) is not something people have a need for, at least until we hear otherwise. Therefore the aim is to have Production updated to spec once a quarter, and Development updated continually via the GitHub and Publisher mechanisms.
So we will do (1) and (3) only going forward.
Tests were carried out and actions identified to support this.
Meanwhile, what we published on 30 June (updated as of today, 14 July) was (1) and (2) above.
HTML Structure
There is an impact on the required web structure. At present you go to /fibo and select one Product type, e.g. Vocabulary, and then have a choice between Production or Development (or in some cases just one or the other) at the Product level.
To support delivery of (1) and (3), we need to have the distinction between Production and Development at the very top, with users selecting between Products after that. Some products may not exist for Development on this basis. Further work is needed to ensure that the CCM-derived products are updated in real time for Development alongside the RDF-generated products.
Apart from schema.org, all our Products are generated either from RDF or from CCM.
Detailed Discussion Notes
Discussion - what exactly do we want to put out for the wide world?
For example:
Now:
- Production at a quarterly date
- Development at a quarterly date
Possible:
- Production - quarterly
- Development - always latest
Another possible:
- Production (quarterly)
- Development (quarterly snapshot)
- Development (real time)
The question is whether there is any merit to the middle option. So far we see none.
This being the case, we have one deliverable that is updated quarterly and another that is updated all the time.
Latest URI fragment and usage
This also raises questions about "latest", e.g. the notion that there is a URI type with /latest/ in it and, if so, where this comes from, e.g. a specific version like /20170701/.
Background details:
DA's scripts work on this:
Upstream:
- Release
- Provisional
- Informative
Downstream:
- Production
- Development
There is no longer a Git branch for these; they are recorded in the Artifact ontology.
Suggest:
Make a new branch that is a copy of the master branch, filtered so that it only has the things that are for Production. These are copied across into it as a separate GitHub branch.
Then you don't see the word production in the URL; you see master and latest in the URL.
Question on URLs
Do we need differing URLs? If so what should they be e.g. latest?
What about FYN?
If I load a Production ontology on a Tuesday and on Wednesday I want to use Development, will it load the same ontologies on both days? It should, but as described above it won't.
Tags
Attaching a tag to the quarterly release:
Investigation: Using Commit and Tag in GitHub
e.g. do you need to change something before you get such a thing as a Tag or can it just be set as an environment variable by Jenkins, (or is this sentence nonsense – please check!) ?
Now figure out what we want to put in this tag.
Test
Here Dean carried out a test for the above.
Results:
By the time we get to here Git is working from the tag and the branch is a thing in the background and we don't know what the branch is. So the branch name is gobbledegook. This appears in the URI.
So that is useless.
What we Need
So we need to split the build process. This is already split.
Push the commit, then have another job that is triggered on the tag. This goes to the workspace of the previous one, plucks that out, transforms it, and that is the one that publishes to the website.
For this you need to figure out that the thing you are pushing is the tag. This would happen if someone is working on something while you were doing this.
Each commit had to be on a branch when you did it.
The workspace for that job may have moved on. So you have to have done something that puts this out into a staging area.
The issue here is you want to have done the commit and done the tests in order to know if it is the one that you now wish you had made the extra copy of. That is, when you start this process you don’t yet know if it will be the thing you want to end up with, since there are still tests to be done on it. Once you have done those tests, you know whether or not you should have done something earlier based on the success of those tests. So there is a chicken and egg situation.
Proposals
So we do this:
3 Jenkins jobs.
1st Jenkins job: the one we currently have, which builds all the products.
It then files the result under the unique ID of the build (a new addition to the Jenkins1 job).
2nd Jenkins job: triggered on tags only. This uses the unique ID of the commit that was tagged to cross-reference to the first job. If it finds such a cross-reference, it proceeds using the tag name, following the rules JG wrote up.
If Jenkins2 succeeds then it triggers the current Jenkins job that copies things to the website.
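The three-job arrangement above can be sketched as follows. This is a minimal illustration in Python, not the actual Jenkins configuration. The staging directory layout and the function names `stage_build`, `find_staged_build` and `publish_tagged_build` are hypothetical, and the sketch assumes each build is filed under the ID of the commit it was built from, so that the tag-triggered job can find it.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def stage_build(staging_root: Path, commit_id: str, products_dir: Path) -> Path:
    """Job 1 (sketch): file the built products under the commit's unique ID."""
    target = staging_root / commit_id
    if target.exists():
        shutil.rmtree(target)  # replace an earlier build of the same commit
    shutil.copytree(products_dir, target)
    return target


def find_staged_build(staging_root: Path, commit_id: str) -> Optional[Path]:
    """Job 2 (sketch): cross-reference the tagged commit to a staged build."""
    candidate = staging_root / commit_id
    return candidate if candidate.is_dir() else None


def publish_tagged_build(staging_root: Path, commit_id: str, tag_name: str,
                         website_root: Path) -> bool:
    """Jobs 2 and 3 (sketch): copy the staged build to the site under the tag name."""
    staged = find_staged_build(staging_root, commit_id)
    if staged is None:
        return False  # nothing was filed for this commit, so nothing is published
    destination = website_root / tag_name
    if destination.exists():
        shutil.rmtree(destination)
    shutil.copytree(staged, destination)
    return True


if __name__ == "__main__":
    # Hypothetical commit ID, tag name and file, used only to exercise the sketch.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        products = root / "products"
        products.mkdir()
        (products / "example.rdf").write_text("placeholder")
        stage_build(root / "staging", "abc123", products)
        print(publish_tagged_build(root / "staging", "abc123",
                                   "master_2017Q2", root / "site"))
```

Filing the products in a staging area by ID is what removes the dependency on the first job's workspace, which may have moved on by the time the tag is pushed.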
Products Origin
What is autogenerated by the scripts?
SKOS and OWL only.
The other deliverables are Glossary and SMIF diagrams (and SMIF repository, eventually). These all originate from SMIF.
Short Term
So we do need a snapshot of Development, that includes SMIF diagrams and the SMIF-generated Glossary.
Longer term plan:
With the exception of schema.org, all our Products are generated either from OWL in GitHub or from CCM. If we are maintaining an up-to-date set of Products in real time, then the arrangements for updating all of these need to be in sync for both CCM- and RDF-generated material.
We need Headless CCM, which is used to autogenerate Glossary and (maybe) output the SMIF diagrams.
So we have 2 temporalities for real time Development.
Diagrams curation:
For curated (SMIF) diagrams, when there is a change in a class we will have a process that someone (Ashley) right clicks that class and exposes every connection (except Comment and Containment) so that they are rendered on the diagram, and then makes the layout look OK. This is a repeatable process that will work with someone following simple instructions.
Current Diagrams Dispensation
Problem for Web: the diagrams on Alfresco intercept their URI and take you into the Alfresco environment no matter what you do.
Alternative would be to generate a report a lot like this, using the report generator.
Can we just export the diagrams anyway? Such that the diagram image filename is the diagram name.
These diagrams have a naming convention for the defining diagrams at least. Export all that, drop them in nGenux and use the URIs.
Drop the " (AB)" from all diagram names.
ACTION: MB to do that.
Can we get REST APIs for diagrams, so as to get a predictable set of URLs?
- alternatively can download a picture and publish it somewhere else with its URL.
Main topic on this (questions)
1. How do we keep our image server up to date?
2. How do we deal with those images when we publish to the server?
Create a CCM artifact ontology.
Then if you have an API that can fetch those or go to the database or something.
For this you need a copy of the CCM file on your own machine. So we need a headless thing that can save an offline copy of the repository.
Note that you can't save locally into a new project without changing all the GUIDs. However there is an "Offline Copy" file that is used and seen only when you think you are looking at the TWC online copy. If we can use and reference this with the headless version of NoMagic then we should be good.
You need to open the TWC and select “update”, so that the Offline Copy file is up to date.
Ideally the headless can even do the above check-in and update.
Remember that changes in your own JIRA branch do nothing, so only when you check in a completed change do we trigger everything, including re-ingesting the OWL into CCM.
Q: Would there be a problem with automatically ingesting all the RDF/OWL back into the CCM file?
A: This is not a problem since, as a minimum, it should do nothing. This is even a good test that it has gone OK.
The Headless thing will also need to create the Glossary.
Conclusion
So we can have a real time view of Development. There is no need for a snapshot of Development as we have now, unless people ask for it.
Result of DA test just now:
Gets something with a long unusual IRI based on the branch name or something.
Has hit Build, see what happens.
This also runs the thing for all the branches that currently exist in GitHub e.g. INFRA-123 is a branch named for a JIRA.
In the test above, the only thing that was generated was the first of the 2 tags that was pushed.
e.g. Q2Test and Q2Test2.
Some bug existed, relating to Jenkins behavior.
Recommend: have a naming convention for the tag whereby it starts with branch name followed by tag name.
- This requires a change to the script as this now looks at the branch environment variable. The latter has some fragility e.g. if there are too many dashes in the environment variable then it decides it is not valid. So this is not good.
Suggest we not have the branch name and the tag name in the URL. But we want to set tags in multiple branches so we need to be consistent with our naming.
This is something we need to address now rather than for Q3.
Currently all the links on the front go to master/latest - we need to do something now to not cause our users problems. The current arrangement is not working.
Conclusions:
We need real time for Development and we need snapshot once a quarter of Release.
At present we have neither:
- Today we have a snapshot of Production as of today (not 30 June)
- Today we have a snapshot of Development
We need to get to where we use the tag to produce a snapshot for each quarter for Release (Production).
We need to get to where Development is updated whenever we update the master branch on GitHub.
For this, we need a structure and a naming convention.
The up to date updating of master to the website is all working already.
Summary:
- Production (Release) = snapshot, based on a tag. Changed quarterly
- Development (Release + Provisional + Informative): No tag. Updated whenever Master branch changes.
For now:
Published FIBO script will figure out if it is coming from a tag or a branch.
If it is a branch the tag is nothing.
If the branch is Master, then for the Development stuff it does exactly as now, no change to the code.
For the Prod side, when it is a tag, we want to do the same thing, and when it is a branch we do nothing.
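The "For now" rules above can be written down as a small decision function. This is a sketch only, not the publishing script itself: the function name `products_to_publish` is hypothetical, and returning nothing for branches other than master is an assumption, since the notes only state the behaviour for master and for tags.

```python
from typing import List, Optional


def products_to_publish(branch: Optional[str], tag: Optional[str]) -> List[str]:
    """Decide which product set a build should publish (illustrative only)."""
    if tag:
        # The build comes from a tag: publish the Production snapshot.
        return ["Production"]
    if branch is not None and branch.lower() == "master":
        # The build comes from the master branch: refresh Development as now.
        return ["Development"]
    # Assumption: any other branch publishes nothing in this sketch.
    return []
```

For example, `products_to_publish("master", None)` selects Development only, while a build triggered by a tag such as `Master_2017Q2` selects Production only.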
JG: If you are on the Master branch you generate 2 sets of files (Dev and Prod). Propose we have another branch called Production that is automatically built out of this. Now and then merge Master into Production and set off a tag.
Q: Are we maintaining these with humans or can it be set up automatically?
A: Automatically.
Back to the issue of how we were to synch up the branches - last time we spoke of this we decided not to have 2 branches but to have one branch and use tags. We can revisit this but that's how it is now done.
Removing everything that is not production from a Production branch in Git is a problem. If you do this on the publisher side then that is fine.
What does that mean? Publisher looks at Master and does this twice: once looking at everything, and once filtering out only what is at release quality.
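The two publisher passes can be sketched like this. It is an illustration under stated assumptions: the `OntologyFile` record and the function `build_product_sets` are hypothetical, and the maturity level is passed in as a plain string, whereas in practice it would be read from the ontology metadata.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OntologyFile:
    path: str
    maturity: str  # "Release", "Provisional" or "Informative"


def build_product_sets(files: List[OntologyFile]) -> Dict[str, List[str]]:
    """Run both passes over the same Master content."""
    # Pass 1: Development sees everything (Release + Provisional + Informative).
    development = [f.path for f in files]
    # Pass 2: Production keeps only what is at Release quality.
    production = [f.path for f in files if f.maturity == "Release"]
    return {"Development": development, "Production": production}
```

Because both sets are derived from the same files, a file that appears in Production is by construction the same version as the one in Development, which is the condition the next paragraph relies on.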
This depends on the notion that for an ontology file that is in Production, the same version of that file is in Dev, i.e. the Dev additional scope refers to the production level versions of the ontologies that are in production.
Changes in Production-level Ontology files
What happens if / when an ontology file in Production, has changes to it that are in Development (separate branch in Git)?
- we need to discuss this. Meanwhile the above idea depends on Production never changing.
Actually this is not an issue because when you take a real time view of all FIBO in Development, you know not to assume that you are seeing the Release variant of those files that were published as a snapshot as Production - if you want those go to the snapshot from the quarterly release.
Can give this a name e.g. for release or something or Q32017 etc.
Considered whether to have Jenkins trigger based on something being the Release. The answer is no; actually committing to do a release is a human decision.
Policy: the development stuff is carrying on as it is but the Release stuff needs to be set at the previous tag.
Naming Conventions
Proposed naming convention:
2017Q3
Year, Q, quarter number.
This is based on us never having subsequent snapshots.
What if we want to do a release within a quarter, either for patches or for a big conference or something? Say we have a SmartData Release.
Use S for special e.g. S2.1
2017Q2
2017Q2S1
Only a single digit. This is because we should never be considering doing large numbers of snapshot updates, these are intended to be a quarterly matter plus special editions.
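A release name in this convention can be checked mechanically. The sketch below follows the `2017Q2` and `2017Q2S1` forms shown above. The names `ReleaseName` and `parse_release_name` are hypothetical, and restricting the quarter to 1 through 4 is an assumption.

```python
import re
from typing import NamedTuple, Optional


class ReleaseName(NamedTuple):
    year: int
    quarter: int
    special: Optional[int]  # None for the regular quarterly release


# Four-digit year, "Q", quarter number, then an optional "S" plus one digit.
_RELEASE_RE = re.compile(r"([0-9]{4})Q([1-4])(?:S([0-9]))?")


def parse_release_name(name: str) -> Optional[ReleaseName]:
    """Return the parts of a release name, or None if it breaks the convention."""
    match = _RELEASE_RE.fullmatch(name)
    if match is None:
        return None
    year, quarter, special = match.groups()
    return ReleaseName(int(year), int(quarter),
                       int(special) if special is not None else None)
```

Under these assumptions `2017Q3` and `2017Q2S1` are accepted, while a name with a two-digit special number such as `2017Q2S12` is rejected, in line with the single-digit rule.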
Can also do e.g. SmartData as special copy of Development for e.g. stuff we want to show off or play with.
Tag names are slightly different:
branchname_2017Q2
- this is the Git tag
e.g. Master_2017Q2
SmartData
(nothing after that one, SmartData is the whole of the tag)
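The tag convention can be split back into its parts in the same way. This is a sketch with a hypothetical function name, `split_tag`. It assumes that a tag ending in an underscore plus a release name is a branch-qualified tag, and that anything else (such as `SmartData`) is a special name taken as the whole of the tag.

```python
import re
from typing import Optional, Tuple

# Branch name, underscore, then a release name such as 2017Q2 or 2017Q2S1.
_TAG_RE = re.compile(r"(?P<branch>.+)_(?P<release>[0-9]{4}Q[1-4](?:S[0-9])?)")


def split_tag(tag: str) -> Tuple[Optional[str], str]:
    """Split a Git tag into (branch name, release name)."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        # No release suffix: the whole tag is the name, with no branch part.
        return (None, tag)
    return (match.group("branch"), match.group("release"))
```

With this split, the branch name is available for consistency checks across branches without having to appear in the published URL.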
Conclusions
JG understands the changes that need to be done with the scripting.