Minutes June 2025 Community Call
Open Know-How: Announcing New Tools
Towards ontology-based, distributed hardware search
Introduction
This NGI Search project is presented by Open Source Ecology Germany:
- Robin Vobruba
- Timm Wille
- Pieter Hijma
Overview of the work
Robin presents the main graphic, focusing mostly on the bottom part: the scraping.

Robin uses the following notes for presenting the work.
Open Know-How is a standard for metadata about open source hardware projects. The metadata can be specified by humans in OKH manifest files, but the project also includes an updated tool, a scraper, that searches for open source hardware projects on well-known platforms and produces OKH manifest files automatically.
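As a rough illustration (not part of the call), writing such a manifest from Python might look like the sketch below; the field names and URL are assumptions chosen for illustration, not the authoritative OKH schema.

```python
# Hypothetical sketch: writing a minimal OKH-style manifest from Python.
# The field names below are illustrative assumptions, not the official
# OKH schema; consult the standard for the exact keys it defines.
import yaml  # PyYAML

manifest = {
    "title": "Example Open Hardware Widget",
    "description": "A fictional project used only to illustrate a manifest.",
    "license": "CERN-OHL-S-2.0",
    "repo": "https://example.org/widget",  # placeholder URL
    "version": "0.1.0",
}

with open("okh-example.yml", "w") as f:
    yaml.safe_dump(manifest, f, sort_keys=False)
```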
A different tool converts the manifest files to RDF data, or “linked data”. The benefit of this linked data is that it defines a standard and allows linking to other standards as well. An example is Open Know-Where: metadata that describes the location of production facilities and their capabilities.
The metadata is human-readable and can be presented on a website by means of one of the developed tools. In addition, the data is also machine-readable, allowing computers to reason about it and make connections.
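To make the idea of the conversion concrete, here is a minimal sketch (not the project’s actual converter) that turns manifest-style metadata into RDF triples with rdflib; the namespace and property names are assumptions for illustration only.

```python
# Minimal sketch of turning manifest-style metadata into RDF triples.
# This is NOT the project's converter; the namespace and property names
# below are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("https://example.org/okh#")  # placeholder namespace

g = Graph()
project = URIRef("https://example.org/projects/widget")  # placeholder IRI

g.add((project, RDF.type, EX.Module))
g.add((project, EX.title, Literal("Example Open Hardware Widget")))
g.add((project, EX.license, Literal("CERN-OHL-S-2.0")))

print(g.serialize(format="turtle"))
```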
Details of changes to Open Know-How
Complexity
The standard has unfortunately become more complex, but for a good reason: it is now possible to interlink standards. The ultimate goal is distributed manufacturing, and we believe that to be efficient at distributed manufacturing, we need an interlinked network as a foundation.
Scraping
The scraping has improved greatly, from 35 thousand projects in 2022 to 1 million projects now.
Unfortunately, GitHub is no longer scraped; in fact, this never worked well in the first place. Wikifactory is not scraped either, because its API is no longer available.
The scraping continues to use Appropedia, with whom we are in close collaboration. We also scrape the projects indexed by OSHWA. Finally, we scrape Thingiverse projects, which is still possible. Many thanks to everyone helping out with the scraping!
There are also new sources, for example manifest-repos, a project from Mairin O’grady from Public Inventions. We currently don’t use manifest-lists by Kaspar Emmanuel, but it is in the testing phase.
Images
A change to the standard is that images can now have tags and other metadata.
Ontology
The ontologies (which define the standard) now have permanent URLs and follow best practices. A collaboration with Linked Open Vocabularies on their tool OOPS! was instrumental in this.
Server hosting
For the first time, there is now a server hosting the project data to which the community can send queries.
We have funding to keep the server and the hosting running for 10 years.
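As a rough sketch of what sending such queries could look like, the following uses the standard SPARQL HTTP protocol via the requests library; the endpoint URL and the vocabulary used in the query are placeholders, not the project’s published interface.

```python
# Sketch of querying a SPARQL endpoint over the standard SPARQL protocol.
# The endpoint URL and the property IRI in the query are placeholders.
import requests

ENDPOINT = "https://example.org/sparql"  # placeholder, not the real endpoint

query = """
SELECT ?project ?title WHERE {
  ?project <https://example.org/okh#title> ?title .
}
LIMIT 10
"""

resp = requests.post(
    ENDPOINT,
    data={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["project"]["value"], "-", row["title"]["value"])
```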
Learnings
It is a challenge to acquire high-quality data. First we have to collect all the data, then find the overarching concepts in that data, and then define those concepts in the schema in order to make all data comparable and of high quality.
Another issue is the so-called “distributed identification problem”. The same project may be mirrored or forked. How do we know whether a project is a mirror, or a fork and thus a new project?
In IT in general, it is well recognized that “naming things” is challenging. This is even more challenging for standards, where it has to be done properly.
Questions from the audience
Timm invites the audience to ask the questions that came up in the chat. He shares the project’s website.
Scraping GitHub
Scraping GitHub is not something GitHub allows. It is, however, possible to maintain a list of repositories that contain open hardware and to update the database based on that.
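A hypothetical sketch of that approach: given a plain list of repository URLs, one could try to fetch a manifest from each repository’s default branch. The repository URLs, branch name, and manifest file name below are assumptions.

```python
# Hypothetical sketch: update the database from a maintained list of
# GitHub repositories instead of scraping GitHub itself.
# Repository URLs, branch name, and manifest file name are assumptions.
import requests

repo_list = [
    "https://github.com/example-org/example-widget",  # placeholder entry
]

for repo in repo_list:
    owner_and_name = repo.removeprefix("https://github.com/")
    raw_url = f"https://raw.githubusercontent.com/{owner_and_name}/main/okh.yml"
    resp = requests.get(raw_url, timeout=10)
    if resp.ok:
        print(f"Found a manifest in {repo}")
    else:
        print(f"No manifest found in {repo}")
```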
Julian Stirling asks whether we have considered a one-click GitHub Action install that pushes itself to a global list. Victoria Jaqua would be in favor of that. Martin Hauer is reminded of Zenodo’s DOI system, which is automatically updated on a new version. Robin thinks it is a good idea, and perhaps this is possible with the earlier-mentioned recursive lists.
Robert Reed asks when this is going to be live; it would have been good to have a minimal version live and to improve on it incrementally. Timm mentions that there is an easter egg at the end. Apart from the SPARQL query service (which is not human-friendly), the project was not in a position to put this live any earlier.
Outlook
The following tasks are on our radar:
- Update the IoPA website with references to the new OKH standard
- Solve the distributed identification problem
- Establish a standard for Bills of Materials
- Continue to work on “naming things”
  - There is already a fruitful collaboration with Lynn Foster from Value Flows.
- LinkML
  - Another, potentially improved way to link the data
- Scraper improvements
  - Robin gives a demo of the current scraper, which is too slow.
- Ask ourselves the big question of whether standards are the ideal solution here.
  - Although standards clearly have benefits, there are also drawbacks. We should be open to asking ourselves whether we are on the right path.
Demo of the website
Timm shows a demo of the search UI, including the filtering options and a couple of projects.
Linked Data explanation
Linked data uses the internet as a distributed database. Instead of websites that present things to humans, the data is made accessible to computers.
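One concrete way computers access such data is HTTP content negotiation: asking a URL for a machine-readable representation instead of an HTML page. The sketch below illustrates this; the resource URL is a placeholder, not one published by the project.

```python
# Sketch of content negotiation: request a machine-readable (Turtle)
# representation of a resource instead of an HTML page.
# The URL is a placeholder, not one published by the project.
import requests

resource = "https://example.org/projects/widget"
resp = requests.get(resource, headers={"Accept": "text/turtle"})
print(resp.headers.get("Content-Type"))
print(resp.text[:500])  # first part of the returned RDF, if any
```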
Final Questions
Victoria Jaqua, as curator of Open Source Medical Supplies, wonders whether there is a way to filter for medical projects. Robin and Martin Hauer answer that they use the CPC classification that is also used in patents; currently, this information has to be provided by the projects themselves. Victoria stresses that many universities and projects work on the same thing without knowing about each other; discoverability is important.
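As a toy illustration of such a filter (assuming each record carries a CPC code, and that CPC class A61, medical or veterinary science and hygiene, is the relevant prefix), a filter could look like this:

```python
# Toy sketch of filtering projects by CPC classification.
# Assumes each record carries a "cpc" field; the example records and the
# choice of prefix "A61" (medical or veterinary science; hygiene) are
# illustrative assumptions.
records = [
    {"title": "Open ventilator", "cpc": "A61M16/00"},
    {"title": "3D-printed wrench", "cpc": "B25B13/00"},
]

medical = [r for r in records if r.get("cpc", "").startswith("A61")]
for r in medical:
    print(r["title"])
```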
Another question from Victoria is whether HardwareX, the leading journal on open source hardware, should not be included in the scraping. Martin Hauer answers that HardwareX is aware of the project, since they hosted a podcast on the subject. Perhaps we should show a stable interface so they can recognize the value of this work.
Repositories
Open Know-How
- The new Open Know-How standard
- The crawler for OKH, updated
- Validate and convert OKH metadata
- An improved scraper with higher performance (WIP); it will one day replace the Krawler
RDF/Ontology
- A service for hosting ontologies according to best-practice
- A linter for RDF
- A pretty printer for RDF/Turtle files