The resource we are using as our UHB is the Open Scriptures Hebrew Bible. This project is the Westminster Leningrad Codex with Strong's lexical data and morphological data marked up in OSIS files.
See the parsing status for the whole Old Testament. Or use the book by book links below.
Get tC to support OSIS XML files like https://github.com/openscriptures/morphhb/blob/master/wlc/Ruth.xml
lemma attribute, which is the word's Strong's number
morph attribute, key here
May as well read the files directly from https://github.com/openscriptures/morphhb/blob/master/wlc/ unless we want to create a process to put this into our container format.
Currently, I'm only seeing about 1% of the words in those files as having morphological data.
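A coverage figure like this can be checked with a short script. The following is a minimal Python sketch, assuming each word is a `w` element carrying `lemma` and an optional `morph` attribute as described above. The `SAMPLE` fragment and the helper names are illustrative assumptions, not copied from the real files.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only (assumed shape); not copied from the real Ruth.xml.
SAMPLE = """<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace">
  <verse osisID="Ruth.1.1">
    <w lemma="c/1961" morph="HC/Vqw3ms">first-word</w>
    <w lemma="b/3117">second-word</w>
  </verse>
</osis>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_words(root):
    """Yield (text, lemma, morph) for every <w> element; morph is None when absent."""
    for el in root.iter():
        if local_name(el.tag) == "w":
            yield "".join(el.itertext()), el.get("lemma"), el.get("morph")


def morph_coverage(root):
    """Return (words with a morph attribute, total words)."""
    words = list(iter_words(root))
    parsed = sum(1 for _, _, morph in words if morph)
    return parsed, len(words)


root = ET.fromstring(SAMPLE)
parsed, total = morph_coverage(root)
print(f"{parsed}/{total} words have a morph attribute")
```

For a real book file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. Matching on the local tag name avoids hard-coding the namespace URI.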
Write a comparer script that can verify our proposed parsings from http://hb.openscriptures.org/OshbParse/ against an existing dataset (such as https://shebanq.ancient-data.org/shebanq/static/docs/tools/shebanq/plain.html). If they check out then they can be marked as verified and included in the XML files.
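A rough sketch of what such a comparer could look like in Python. It assumes both datasets have already been keyed by the same word reference and converted to one shared morphology code scheme. That conversion is the hard part and is not shown. The key format (`Ruth.1.1!1`) and the codes are hypothetical.

```python
def compare_parsings(proposed, reference):
    """Split proposed parsings into verified, conflicting, and unchecked.

    Both arguments map a hypothetical word key (e.g. "Ruth.1.1!1") to a
    morphology code that has already been normalised to one shared scheme.
    """
    verified, conflicts, unchecked = {}, {}, []
    for key, code in proposed.items():
        if key not in reference:
            unchecked.append(key)          # nothing to compare against
        elif reference[key] == code:
            verified[key] = code           # both datasets agree
        else:
            conflicts[key] = (code, reference[key])  # needs an editor
    return verified, conflicts, unchecked


# Illustrative data only.
proposed = {
    "Ruth.1.1!1": "HC/Vqw3ms",
    "Ruth.1.1!2": "HR/Ncmpc",
    "Ruth.1.1!3": "HVqc",
}
reference = {
    "Ruth.1.1!1": "HC/Vqw3ms",
    "Ruth.1.1!2": "HR/Ncmsc",
}

verified, conflicts, unchecked = compare_parsings(proposed, reference)
print(len(verified), "verified,", len(conflicts), "conflicting,", len(unchecked), "unchecked")
```

Only the `verified` entries would be candidates for inclusion in the XML files. Conflicts and unchecked words would still go to Editors.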
Create a process that takes verified parsings from https://github.com/openscriptures/morphhb/blob/master/wlc/ and programmatically guesses at the rest of the words in the OT (e.g. strip cantillation and find and replace for unknowns). Feed these back into the parsing system at http://hb.openscriptures.org/OshbParse/ and verify them against an existing dataset and/or Editors.
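One possible reading of "strip cantillation and find and replace" is sketched below in Python. It treats Unicode code points U+0591 to U+05AF (the Hebrew accents) as cantillation and only proposes a parsing when the accent-free form has exactly one verified parsing. Both choices are assumptions, as are the function names and the sample data.

```python
import unicodedata
from collections import defaultdict


def strip_cantillation(word):
    """Remove Hebrew accents (U+0591..U+05AF); keep consonants and vowel points."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not "\u0591" <= ch <= "\u05AF")


def build_lookup(verified):
    """Map each accent-free form to the set of morph codes verified for it."""
    lookup = defaultdict(set)
    for word, morph in verified:
        lookup[strip_cantillation(word)].add(morph)
    return lookup


def guess_parsings(unknown_words, lookup):
    """Propose a morph code only when the accent-free form is unambiguous."""
    guesses = {}
    for word in unknown_words:
        candidates = lookup.get(strip_cantillation(word), set())
        if len(candidates) == 1:
            guesses[word] = next(iter(candidates))
    return guesses


# Illustrative data: the same pointed word with two different accents.
with_revia = "\u05D5\u05B7\u05D9\u05B0\u05D4\u05B4\u0597\u05D9"   # accent U+0597
with_munah = "\u05D5\u05B7\u05D9\u05B0\u05D4\u05B4\u05A3\u05D9"   # accent U+05A3

lookup = build_lookup([(with_revia, "HC/Vqw3ms")])
guesses = guess_parsings([with_munah], lookup)
print(guesses[with_munah])
```

Guesses produced this way are still only proposals. They would need to go through the same verification step before being marked as verified.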
If we can make this an iterative process then we would be able to cut down the amount of manual intervention necessary to get the morph data.
After the morphology data is complete, the UHB project will effectively be complete. At the moment there are no further plans to mark up the text with other information.