Create results

Step 1: Download and install this repository

Create and activate a virtual environment, then clone this repository (11.38 MB) by running:

python3 -m venv venv && source venv/bin/activate
git clone https://github.com/LoanpyDataHub/GothicHungarian.git

Next, clone the two repositories containing Hungarian and Gothic language data (10.65 MB + 8.75 MB):

git clone https://github.com/LoanpyDataHub/gerstnerhungarian
git clone https://github.com/LoanpyDataHub/koeblergothic

Next, from the same directory, run:

pip install -e GothicHungarian

This installs a command-line interface for running the analysis, along with two dependencies: loanpy and spaCy. spaCy additionally needs a pretrained German word-vector model. Several models are listed on the spaCy website; the 500 MB model below currently seems the most suitable. Make sure to use the same model as in gerstnerhungarian and koeblergothic, because entries missing from this particular word-vector model were filtered out of those repositories:

python3 -m spacy download de_core_news_lg
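
Since spaCy models are installed as regular Python packages, one way to confirm that the download succeeded is to check whether the package can be found. The following is a minimal sketch using only the standard library; the helper name model_is_installed is hypothetical and not part of this repository.

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "de_core_news_lg" is the model name used in the download command above.
    name = "de_core_news_lg"
    status = "installed" if model_is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Run it inside the activated virtual environment, since the model is installed there and not system-wide.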

To deactivate the virtual environment run:

deactivate

and to remove it run:

rm -r venv

Step 2: Load the relevant data in the right format

From your command line, run:

loadinput

Load and transform input data for loanfinder, save to raw folder.

gothuncommands.loadinput.main()

Read the filenames with the argparse library
Assign the file contents to variables
Create four dictionaries from them for later use
Create input for loanpy.loanfinder.phonetic_matches
Write the files to the raw folder
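
The steps above could be sketched roughly as follows. This is not the code of gothuncommands.loadinput; the argument names (recipient, donor, --outdir), the helper names, and the idea of indexing rows by two columns are assumptions made for illustration. Only the standard-library calls (argparse, csv, pathlib) are real.

```python
import argparse
import csv
from pathlib import Path


def parse_args(argv=None):
    # Hypothetical argument names; the real CLI may define others.
    parser = argparse.ArgumentParser(description="Prepare loanfinder input")
    parser.add_argument("recipient", help="TSV file with recipient-language forms")
    parser.add_argument("donor", help="TSV file with donor-language forms")
    parser.add_argument("--outdir", default="raw", help="folder for the output")
    return parser.parse_args(argv)


def read_tsv(path):
    """Read a tab-separated file into a list of dictionaries, one per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def index_by(rows, key, value):
    """Build a lookup dictionary that maps row[key] to row[value]."""
    return {row[key]: row[value] for row in rows}


def write_tsv(path, header, rows):
    """Write a header and rows to a tab-separated file, creating the folder."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

Building the lookup dictionaries once up front keeps later steps cheap, because each ID can then be resolved without scanning the tables again.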

Step 3: Search for phonetic matches

From your command line, run:

phonmatch

Read the prepared input files in folder raw and search for phonetic matches between Gothic and Hungarian. Write result as phonetic_matches.tsv to folder out.

gothuncommands.phonmatch.main()

Import loanpy.loanfinder.phonetic_matches
Read the input data
Pass it on to loanpy
End the function since loanpy writes the file

Step 4: Search for semantic matches

From your command line, run:

semmatch

Read the phonetic matches file in folder out and search for semantic matches among them. Write results as semantic_matches.tsv to folder out.

gothuncommands.semmatch.main()

Import loanpy.loanfinder.semantic_matches
Read phonetic matches file with csv library
Read related tables that contain the meanings
Grab meanings from related tables and create new input table
Pass the table to loanpy.loanfinder.semantic_matches (https://loanpy.readthedocs.io/en/latest/documentation.html#loanpy.loanfinder.semantic_matches)
End the function since loanpy writes the file

gothuncommands.semmatch.semsim(meaning1, meaning2)

Convert each meaning to a spaCy object
Create the Cartesian product of both meaning lists with a nested for-loop
Return the similarity of the most similar pair
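
The best-pair logic described above can be illustrated with a small sketch. It is not the repository's semsim: the function name best_pair_similarity and the extra similarity parameter are assumptions, itertools.product stands in for the nested for-loop, and word_overlap is a toy replacement for word-vector similarity so that the example runs without a model.

```python
from itertools import product


def best_pair_similarity(meanings1, meanings2, similarity):
    """Return the highest similarity over all pairs from the two lists."""
    return max(
        (similarity(a, b) for a, b in product(meanings1, meanings2)),
        default=0.0,  # no pairs to compare if either list is empty
    )


def word_overlap(a, b):
    """Toy stand-in for vector similarity: Jaccard overlap of word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

With spaCy, the similarity argument could be lambda a, b: nlp(a).similarity(nlp(b)), after loading the model with nlp = spacy.load("de_core_news_lg"). Taking the maximum means that two words count as semantically close as soon as any one of their translations matches well.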

Step 5: Load columns for manual inspection

From your command line, run:

loadcols

Merge IDs in out/semantic_matches.tsv with relevant columns for manual inspection.

gothuncommands.loadcols.main()

Read the semantic matches file
Read the related tables
Stitch the desired columns together
Overwrite the input file
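
Stitching columns together amounts to a lookup by ID in each related table. The sketch below shows one possible way to do this. The key names ID_rc and ID_ad, the _rc and _ad suffixes, and the function name stitch_columns are hypothetical and may differ from what gothuncommands.loadcols actually uses.

```python
def stitch_columns(matches, recipient_table, donor_table, columns):
    """Append the chosen columns from both related tables to every match row.

    matches: list of dicts with the hypothetical keys "ID_rc" and "ID_ad"
    recipient_table, donor_table: dicts mapping an ID to a row dict
    columns: names of the columns to copy over
    """
    merged = []
    for match in matches:
        row = dict(match)  # copy, so the input rows stay unchanged
        rc = recipient_table.get(match["ID_rc"], {})
        ad = donor_table.get(match["ID_ad"], {})
        for col in columns:
            # Missing IDs or columns yield an empty cell instead of an error.
            row[f"{col}_rc"] = rc.get(col, "")
            row[f"{col}_ad"] = ad.get(col, "")
        merged.append(row)
    return merged
```

The merged rows can then be written back with csv.DictWriter, using the keys of the first row as field names.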

Step 6: Manually inspect the results

Open out/semantic_matches.tsv in spreadsheet software and sort the rows by semantic similarity (column semsim) and, within that, by cognate ID (column ID_s). Look carefully at the matches and pick candidate loanwords for which both the phonetic match and the semantic shift look plausible.
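
If you prefer to pre-sort the file before opening it, the same ordering can be sketched in Python. The column names semsim and ID_s come from the description above. Putting the highest similarity first and comparing ID_s as text are assumptions, as are the helper names.

```python
import csv


def read_matches(path):
    """Read a tab-separated matches file into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def sort_matches(rows):
    """Sort by semsim (highest first), then by ID_s among equal semsim values."""
    return sorted(rows, key=lambda r: (-float(r["semsim"]), r["ID_s"]))
```

For example, sort_matches(read_matches("out/semantic_matches.tsv")) returns the rows in inspection order without modifying the file.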