Proposal: LOSHv1 as the base (for merging the different standards)

In the pros and cons below (comparing LOSHv1 to V1),
I will ignore the actual fields (with one exception, for Kaspar :wink: ),
because that is very subjective;
Martin Haeuer could give the reasoning for why LOSHv1 has better fields.



LOSHv1 pros

  1. Is a bit more strictly documented
  2. Has better machine-readability
  3. Has more projects, including all V1 projects (as of 2. August 2022)
  4. Has an RDF ontology and a mapping of manifests to that ontology
  5. Developers were/are “deeper into it”, as in:
    they spent more hands-on time with the tools and data, consistently, over years.
    This applies both to the tech devs and to Martin,
    who came up with the new set of fields.

RDF (point 4. above) opens up a plethora of possibilities and advantages,
without any disadvantage,
as the simpler manifest representation still exists,
and nobody has to write RDF by hand.



LOSHv1 cons

  1. Has not been developed by a larger community
  2. Has (probably) had fewer people using/involved with it so far
  3. Does not further categorize source and export files into CAD, Electronics and so on
    (though one could argue this is possible with the file extension)
  4. Has no source and export file data yet

This last point is an issue for Kaspar right now with LOSH,
and it is twofold:

  1. The okh-tool does not currently
    map V1 design-files and schematics to LOSHv1 source and export files.
    This would be less than one day of work for me,
    which I’d like to do, and will (hopefully) at some point.
    This only matters for V1 projects, when converting them to LOSHv1.
  2. The LOSHv1 data crawled directly from platforms
    would need adjustments in the Krawler
    (I am not sure if this is being worked on, or even done already by now).

The list of OSH file types with their properties
should help in both cases.

RDF pros

RDF is the de-facto Open Linked-Data standard.
This means that, if we do it right (by linking the ontology to commonly used RDF schemas),
RDF-ready consumers just need to link to the source of the data,
and can then access all fields that are linked to those commonly used RDF schemas,
without knowing anything about OKH, or even that the project uses OKH.
For example, it would make it easy (or even a no-op) for search engines,
to send the user accurate results including OKH projects,
or it might allow libraries to index OKH projects/user-manuals alongside their books.
RDF is also a DB-ready format, so when aggregating projects in RDF form,
one can add them to an RDF-DB, and run queries on it, using SPARQL (quite similar to SQL).

While it takes some time to wrap one’s head around RDF at first,
it really is very simple, which comes with a lot of benefits.
It is just triples of the form:

Subject --Property--> Object


Jake --hasA--> Dog
ExampleProject --isA--> Project
ExampleProject --name--> "Example Project"
ExampleProject --license--> CC-BY-SA
ExampleProject --licensor--> JohnDoe
JohnDoe --name--> "John Doe"
JohnDoe --email--> ""

This is almost as complex as it gets;
in reality there are just also namespaces (based on URIs) involved.

One can then run queries like:

  • Give me all projects that have a GNU approved license and at least one image defined
  • Give me all people that are authors of more than one project
  • Count the number of uses of each license over all projects

RDF DBs are very simple to set up and use.
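To make the triple idea concrete, here is a minimal sketch in Python, using plain tuples as a stand-in for a real RDF store. The project names and fields are the hypothetical ones from the example above, and the queries are expressed as plain filters rather than SPARQL:

```python
from collections import Counter

# A tiny in-memory "triple store": each entry is (subject, property, object).
# In real RDF, all three would be namespaced URIs; plain strings suffice here.
triples = [
    ("ExampleProject", "isA", "Project"),
    ("ExampleProject", "name", "Example Project"),
    ("ExampleProject", "license", "CC-BY-SA"),
    ("ExampleProject", "licensor", "JohnDoe"),
    ("OtherProject", "isA", "Project"),
    ("OtherProject", "license", "MIT"),
    ("JohnDoe", "name", "John Doe"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Give me all projects"
projects = [s for (s, _, _) in match(p="isA", o="Project")]

# "Count the number of uses of each license over all projects"
license_counts = Counter(o for (_, _, o) in match(p="license"))
```

A SPARQL query against a real RDF DB looks different on the surface, but underneath it is the same pattern matching over triples.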


A Note on WikiBase

WikiBase (the software used by WikiData) is NOT RDF (based)!
You can not judge RDF based on any experience you had with WikiBase!

The only thing they have in common
is that they both use the triple-store concept.
WikiBase can not be used for Linked-Data.

Getting LOSH data onto WikiBase took two devs weeks of full-time,
very frustrating work, stretched over months.
Achieving the same, much more stable and feature-rich, on an RDF DB,
took one dev 30 minutes (without ever having used an RDF DB before).

Never use WikiBase.


Thanks for doing this @hoijui. Here are my thoughts:


What do you mean by “strictly”? I’ve clicked through the provided links but nowhere does there seem to be a document that outlines how I would use this as a project creator or a platform developer. I do see the JSON schema and we should really provide an official one for OKH, but that doesn’t replace documentation. I think okh-v1, for all its flaws, is much more clearly documented.


I’m still really not convinced RDF solves any problems that I have and requiring “time to wrap your head around” is a massive red flag for me. It’s going to hamper adoption unless we can make it an optional add-on. To me, we would need to be able to say: “by the way it’s also available as RDF if you are into that, but it’s not required”, is that possible @hoijui?

I think this does come down to a fundamental difference: I really don’t care about the data-scientist use-case. I care about the end-users creating and re-building projects and thus indirectly about the platforms providing hosting to those users.


One thing that struck me about the whole LOSH approach is this part of the architecture, where the crawler is collecting all this heterogeneous data (“various file formats, mostly .json”)


vs. how I have been thinking about it with okh-search, where individual instances communicate via a standardised API (note the key: “OKH compliant API”)

(adapted from

I think the LOSH approach is fine for what LOSH wanted to achieve but I really don’t think anything was ever attempted to be standardised. So when you say:

It’s nice to see so many OSHW projects but these projects and platforms never adopted a standard. The little adoption we have for okh-v1 is at least individuals and platforms actually adopting our standard.


To me the actual schema changes are the most valuable from an OKH perspective and I think they should be turned into change proposals on this forum.


To summarise my points:

  1. I don’t think LOSH-v1 is well documented compared to OKH-v1
  2. I don’t think we should use RDF unless we can make it optional (I don’t care about data scientists :laughing: )
  3. I disagree with the LOSH approach of crawling heterogeneous APIs and don’t think counting the number of projects crawled this way is worth comparing.
  4. Schema improvements made in LOSH would be really valuable to be included in OKH.
  1. Ups … just saw that the first link (main repository) was wrong;
    I fixed it now.
    This first link is the standard,
    and it explains the fields of the manifest.

    I agree that this (How to use OKH LOSH) is not easy to find or figure out from the main repo, though.
    Could you point me to the exact point(s) in the V1 docu that do that?

  2. I can tell you right now: “by the way it’s also available as RDF if you are into that, but it’s not required”
    (See the 4th link in the original post: LOSH data)
    I thought I had explained this a few times already, and also in this post, but obviously I was not clear enough about that (probably also related to 1.).
    The user/machine designer does NOT write RDF, but TOML (though by now we also support YAML and JSON in the Krawler-crawled repos).
    I also listed the repo with all the data, which contains both the TOML and the RDF representation of the data for each project.
    They are not complementary; they contain the same data. You can use whichever of the two you prefer, and forget about the other.

  3. This, to me, is clearly an argument pro LOSH: We support projects that do not put effort into supporting our standard.
    Of course the support there would be limited in some sense,
    but not necessarily more limited than for platforms that specifically support our standard themselves.

  4. :slight_smile:


Ups … just saw that the manifest files are there as YML files, not TOML, and not for all platforms … That’s a bug, will fix it, sorry!

  1. It’s not the easiest language to understand, but I would say it’s described in section 4.3. I had also added something to the original wiki, which is now gone. It’s archived here. We should add something like this again. I’m not saying these instructions are adequate, but they are something to build on.

  2. Is it an option then not to talk about RDF at all in our standard?

  3. The thing I don’t understand is: what role does the standard play if you are the only ones using it? What’s the purpose of standardising the LOSH format?

The crucial thing to me is that when you say “use LOSH as a base” I don’t know what that means and this post and the linked documentation doesn’t help. I think it would be incredibly valuable to go through every change that LOSH made to OKHv1 and the rationale and examine it and build OKHv2 with these changes. I suggest we go through that process by LOSH submitting change proposals.

  1. Thanks! :slight_smile: … I will have a look.

  2. … That makes no sense. RDF is a benefit to some, that comes at no cost to anyone who does not want to use it. I recommend you use grep -v -i RDF on all that is LOSH, for your personal convenience.

  3. This argument comes down to: We should not use LOSH, because we do not use LOSH.
    2 years ago, this community (IoPA and co.) stopped working on and using OKH.
    Since then, Martin and cohort have put a lot of work into it, improving it (in my opinion).
    Now IoPA comes back, and you say (or so it sounds in my ears): “we do not use it, so we should ignore it.”
    I do not understand.

As I understand it, and as I would like it to be, OKH is meant to be an open standard. Meaning, it does not belong to the IoPA, nor to all the members of the IoPA, or anything of the like.
If the IoPA stops developing it, and someone else comes along and continues, and then later the IoPA comes back … why would all the work done by the other party/community have to be revised piece by piece? That would take at least another 2 years, during which further development would be severely hindered, if not halted … it just makes absolutely no sense for us (the Open Source people) to fail in such a horrible, bureaucratic way, out of a sense of entitlement.
Using LOSH as a base makes sense, because it builds on OKHv1. Also, it was mainly led by Martin, who was already a crucial person in the development of OKHv1.
Not using LOSH as a base would mean at least one of these two:

  • a LOT of work done the last 2 years, in the best intentions, would be lost
  • a LOT of time would be lost in the future, making individual proposals for all the changes LOSH made on top of V1, and deciding on each of them (realistically, that is not going to happen anyway)

To me it seems like you feel that LOSH did a hostile takeover and messed up your OKH standard.
In reality, you left it to dry in the hot sun, and we gave it water, removed the weeds, and added plants that collaborate with it at its side. You are angry because it is different from how you left it.
… Maybe I am wrong, but … I just don’t understand your sentiment.

TOML files are now online at LOSH data.

I think there’s a major misunderstanding which is making our discussion much more adversarial than it needs to be. It’s not about LOSH vs OKH; it’s simply about finding a way to merge the two. I think we should make use of all the work and of the improvements LOSH has made over these two years, but the way this needs to be done is to take each improvement in isolation, document it, discuss it, and adopt it into the standard.

Your proposal here is to use LOSH as the base and then go from there but it seems like a much less clear and actionable proposal to me. You would still need to go through every part of LOSH in isolation and write up the standard for it before we can make any improvements to it. Why not go the more clear and understandable route?

I don’t think the end results of these two routes are even that different, so if you wanted to go what I think is the hard way, use LOSH as the base, and say: here’s the document that we call OKHv2, then I think I could actually accept that. I think it’s just a lot more work for you and much harder for everyone to review.


As said, doing it like you propose is MUCH more work. LOSH is already a standard; it does not need to be made into one. It surely needs some adjustments and additions in the introduction text, as you suggested. But how do you imagine this to be more work than dividing LOSH into individual changes and writing forum proposals for each one of them, each with its own discussion? That would result in more work than doing LOSH was in the first place, as I already explained, and it would thus obviously mean discarding all the work that LOSH did, not to speak of (also, as already said) halting further development for most likely 2+ years, and generating a LOT of frustration, most likely leading to the whole LOSH team abandoning OKH completely.

The scenario you picture is completely fictional. If this scenario were chosen, either LOSH would actually be ignored for the most part, or further development would be halted, and in either case, there would be lots of frustration.

Of course you can draw a diagram with 4 bubbles and 4 arrows, and thus illustrate that the two approaches lead to the same end result. But that simply does not map to reality. We are not machines that can do everything flawlessly and in constant time.