Create results
Step 1: Download and install this repository
Create and activate a virtual environment and clone this repository (11.38 MB) by running:
python3 -m venv venv && source venv/bin/activate
git clone https://github.com/LoanpyDataHub/GothicHungarian.git
Next, clone the two repositories containing Hungarian and Gothic language data (10.65 MB + 8.75 MB):
git clone https://github.com/LoanpyDataHub/gerstnerhungarian
git clone https://github.com/LoanpyDataHub/koeblergothic
Next, from the same directory, run:
pip install -e GothicHungarian
This installs a command-line interface for running the analysis, along with two dependencies, loanpy and spaCy. spaCy additionally needs a pretrained German word-vector model. Several models are listed on the spaCy website; the following 500 MB model currently seems the most suitable. Be sure to use the same model as gerstnerhungarian and koeblergothic, because entries missing from this particular word-vector model were filtered out of those repositories:
python3 -m spacy download de_core_news_lg
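To confirm that the model ended up in the active virtual environment, a quick check along the following lines may help. This is a minimal sketch, assuming the model is installed as an importable package called de_core_news_lg; the helper name is_installed is hypothetical and not part of this repository.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # de_core_news_lg is the model name downloaded in the step above
    print(is_installed("de_core_news_lg"))
```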
To deactivate the virtual environment run:
deactivate
and to remove it run:
rm -r venv
Step 2: Load the relevant data in the right format
From your command line, run:
loadinput
Load and transform input data for loanfinder, save to raw folder.
- gothuncommands.loadinput.main()
Read the filenames with the argparse library
Assign the file contents to variables
Create four dictionaries from them for later use
Create input for loanpy.loanfinder.phonetic_matches.
Write the files to the folder raw.
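The first steps above (reading file names with argparse, loading the file contents, and building lookup dictionaries) could be sketched roughly as follows. The argument names, the helper names and the column names passed to build_lookup are assumptions made for illustration; the actual command may expect different inputs.

```python
import argparse
import csv


def parse_args(argv=None):
    """Collect input paths from the command line (argument names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Load input tables")
    parser.add_argument("forms", help="path to a TSV table of word forms")
    parser.add_argument("--outdir", default="raw", help="folder for prepared files")
    return parser.parse_args(argv)


def read_tsv(path):
    """Read a tab-separated file into a list of dictionaries, one per row."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def build_lookup(rows, key, value):
    """Map one column to another, for example a row ID to its form."""
    return {row[key]: row[value] for row in rows}
```

Keeping the lookups as plain dictionaries makes later steps cheap, since each ID can be resolved without scanning the table again.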
Step 3: Search for phonetic matches
From your command line, run:
phonmatch
Read the prepared input files in folder raw and search for phonetic
matches between Gothic and Hungarian. Write result as phonetic_matches.tsv
to folder out.
- gothuncommands.phonmatch.main()
Read the input data
Pass it on to loanpy
End the function since loanpy writes the file
Step 4: Search for semantic matches
From your command line, run:
semmatch
Read the phonetic matches file in folder out and search for
semantic matches among them. Write results as semantic_matches.tsv
to folder out.
- gothuncommands.semmatch.main()
Read phonetic matches file with csv library
Read related tables that contain the meanings
Grab meanings from related tables and create new input table
Pass the table to loanpy.loanfinder.semantic_matches (https://loanpy.readthedocs.io/en/latest/documentation.html#loanpy.loanfinder.semantic_matches)
End the function since loanpy writes the file
- gothuncommands.semmatch.semsim(meaning1, meaning2)
Convert each meaning to a spaCy object
Create the Cartesian product of both meaning lists with a nested for-loop
Return the similarity of the most similar pair
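The three steps of semsim can be illustrated with the sketch below. It is not the repository's implementation: the third parameter nlp is an assumption added here so that the example runs without a downloaded model, and returning 0.0 for empty input is likewise an assumption. Both arguments are taken to be lists of strings.

```python
def semsim(meanings1, meanings2, nlp):
    """Return the highest pairwise similarity between two lists of meanings.

    nlp is any callable that turns a string into an object offering a
    similarity(other) method, such as a loaded spaCy pipeline.
    """
    docs1 = [nlp(m) for m in meanings1]
    docs2 = [nlp(m) for m in meanings2]
    scores = []
    # nested loop over every pair, i.e. the Cartesian product of both lists
    for d1 in docs1:
        for d2 in docs2:
            scores.append(d1.similarity(d2))
    # assumption: an empty list on either side counts as no similarity
    return max(scores, default=0.0)
```

With spaCy, nlp could be the pipeline returned by spacy.load("de_core_news_lg"), so that a call looks like semsim(["Haus"], ["Hütte", "Gebäude"], nlp).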
Step 5: Load columns for manual inspection
From your command line, run:
loadcols
Merge IDs in out/semantic_matches.tsv with relevant columns for manual
inspection.
- gothuncommands.loadcols.main()
Read the semantic matches file
Read the related tables
Stitch the desired columns together
Overwrite the input file
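Stitching columns together amounts to looking up each ID in a related table and appending the value found there. A minimal sketch, assuming the rows are dictionaries as produced by csv.DictReader; the function name stitch and the column names used when calling it are hypothetical:

```python
def stitch(matches, lookup, id_column, new_column):
    """Return copies of the match rows with one extra column looked up by ID."""
    stitched = []
    for row in matches:
        enriched = dict(row)  # copy so the input rows stay unchanged
        # IDs missing from the lookup get an empty cell instead of raising
        enriched[new_column] = lookup.get(row[id_column], "")
        stitched.append(enriched)
    return stitched
```

Calling the function once per desired column builds up the table for inspection step by step, after which it can be written back with csv.DictWriter.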
Step 6: Manually inspect the results
Open the file in a spreadsheet program and sort the rows by
semantic similarity (column semsim) and, within that, by
cognate ID (column ID_s). Carefully look at the matches:
pick candidate loanwords where both the phonetic match and the semantic shift
look plausible.
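If a spreadsheet program is not at hand, the same ordering can be produced in Python before inspecting the rows. This is a sketch under two assumptions: the most similar pairs should come first, and the IDs in column ID_s are compared as strings. The function name sort_for_inspection is hypothetical.

```python
def sort_for_inspection(rows):
    """Sort rows by semsim (highest first), then by ID_s among equal scores."""
    return sorted(rows, key=lambda r: (-float(r["semsim"]), r["ID_s"]))
```

The rows are expected as dictionaries, for instance as read from out/semantic_matches.tsv with csv.DictReader and delimiter="\t".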