Work on implementing the recommendations from the DAPSI Phase 2 R&D final report (available here) has continued, led by @hoijui of Open Source Ecology (OSE) Germany. Below are:
- updates regarding the current work,
- outstanding issues that need to be addressed,
- recommendations from Robin on where the community should focus efforts to support the success of the OKH-LOSH crawler, and
- next steps.
Current Work
- Cleaning up the OKH specification, focusing on OKH-LOSH, per the recommendations in the DAPSI Phase I Use Case Analysis Report on making OKH project manifest files discoverable by a crawler (Use Case 3: OPEN-NEXT/OKH-LOSH, p4).
- Testing the current state of the crawler and applying adjustments.
- The crawler is currently configured to locate all OKH manifest files in .toml and .yml; .ttl is being added based on the MVP and UX testing recommendations for design file portability made in the DAPSI Phase II User Testing and Technical Report.
- Ability to fetch project manifests from GitHub: confirmed; a full crawl of GitHub is now being tested.
- Platforms currently crawled are listed at https://github.com/OPEN-NEXT/LOSH-krawler/tree/main/krawl/fetcher
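As a rough illustration of the manifest discovery step described above, the following Python sketch walks a local checkout and collects files with the three extensions mentioned (.toml, .yml, .ttl). The function name `find_manifests` and the assumption that manifests are named `okh.<ext>` are illustrative only; the actual crawler may match files differently.

```python
from pathlib import Path

# Extensions named in this update: .toml and .yml are located today, .ttl is being added.
MANIFEST_SUFFIXES = {".toml", ".yml", ".ttl"}

# Assumption (hypothetical): manifests are named "okh.<ext>".
MANIFEST_STEM = "okh"


def find_manifests(root: Path) -> list[Path]:
    """Return all files under root that look like OKH manifests, sorted by path."""
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.stem.lower() == MANIFEST_STEM
        and p.suffix.lower() in MANIFEST_SUFFIXES
    )
```

Calling `find_manifests(Path("path/to/checkout"))` on a cloned repository would return the candidate manifest paths; parsing and validating them is a separate step not shown here.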
Outstanding Items/Issues
- Crawler cron job
  - The manually created list of projects to crawl (a .csv file) was deleted by user kasbah in the commit "Make a data directory" (iop-alliance/okh-search@d3a2504 on GitHub). Robin Vobruba is restoring it.
  - A regular schedule (cron job) needs to be defined for the crawler:
    - How often should the cron job run (> 1 min)?
    - Who owns/shepherds the cron job (IoPA OKH WG Chair)?
- Agreements (MOUs preferred) need to be established with the crawled platforms to ensure a shared understanding of:
  - What information is needed/shared
  - Speed of the cron run (some fetches currently take up to 3 days, and the information is incomplete)
- @hoijui will approach this work by separating platforms into two groups going forward:
  - Well supported, such as Appropedia, GitHub, and OSHWA
    - These are platforms that are either file-system-based (e.g. git-based) and have a useful search function, like GitHub, or that give easy access to all their relevant data and generally host fewer projects, like Appropedia and OSHWA.
    - Well supported means we will try to keep a relatively up-to-date list of the projects hosted and their current metadata.
  - Low supported, such as Thingiverse and Wikifactory
    - For these platforms, we basically just maintain what is already there in the crawler and focus instead on manually requested, one-time exports of the data of specific projects, to give projects an easy way to move away from these platforms.
    - @hoijui chose this separation because platforms like Thingiverse have no interest in letting third parties export metadata, or even whole projects, from their platforms; it is not part of their business model. We should focus our efforts where they may bear fruit and avoid frustration.
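The two-group split described above could be captured in the crawler's configuration roughly as follows. This is only a sketch: the names `Support`, `Platform`, and `scheduled_platforms` are hypothetical, and it assumes that only well-supported platforms take part in the regularly scheduled crawl, which the post implies but does not state outright.

```python
from dataclasses import dataclass
from enum import Enum


class Support(Enum):
    WELL = "well supported"  # keep a relatively up-to-date project list and metadata
    LOW = "low supported"    # keep the existing fetcher; export on manual request


@dataclass(frozen=True)
class Platform:
    name: str
    support: Support


# Example platforms named in this post, grouped as described above.
PLATFORMS = [
    Platform("Appropedia", Support.WELL),
    Platform("GitHub", Support.WELL),
    Platform("OSHWA", Support.WELL),
    Platform("Thingiverse", Support.LOW),
    Platform("Wikifactory", Support.LOW),
]


def scheduled_platforms(platforms):
    """Names of platforms included in the regular crawl (assumed: well supported only)."""
    return [p.name for p in platforms if p.support is Support.WELL]
```

Keeping the tier next to the platform name means that moving a platform between groups is a one-line change and the schedule follows automatically.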
Next Steps
- Restore the file deleted from GitHub in the commit "Make a data directory" (iop-alliance/okh-search@d3a2504) - Owner: @hoijui - DONE
- Schedule OKH WG meeting - Owner: @schutton (DONE, indicate availability here), with agenda items pending:
  - Define cron schedule - Owner: OKH WG
  - Define cron owner/shepherd - Owner: OKH WG
  - Define MOU for platform agreement(s) - Owner: OKH WG
  - Identify a community member to negotiate MOUs with the identified platforms, following a vote - Owner: OKH WG
- Review platform crawl list - Owner: @schutton (post vote to community forum, following OKH WG meeting)