Michael Kane - Yale University and Phronesis
Contextualize the exploration (and analysis) of text
Demonstrate a preliminary text exploration using announcements from the G77 and the UN
Define and demonstrate exploratory concepts pioneered by John Tukey
Python with the selenium package
R
G77 - loose coalition of developing nations promoting its members' collective economic interests
The date of a statement
Who the statement came from (G77, General Assembly, etc.)
The text in the statement
Let's look at statement volume over time
Data spans different periods. (Validation)
We may be seeing a periodic effect of statement releases, or we may be seeing a relationship between G77 and EASC. (Hypothesis generation)
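As a rough illustration of counting statement volume over time, the sketch below tallies statements per source and month. The records, the field names (`date`, `source`, `text`), and the sample values are hypothetical stand-ins for the scraped data, not the actual data set.

```python
from collections import Counter
from datetime import date

# Hypothetical records: each statement has a date, a source, and its text.
statements = [
    {"date": date(2015, 1, 12), "source": "G77", "text": "statement on trade"},
    {"date": date(2015, 1, 28), "source": "G77", "text": "statement on debt"},
    {"date": date(2015, 2, 3), "source": "General Assembly", "text": "resolution text"},
    {"date": date(2015, 3, 9), "source": "G77", "text": "statement on climate"},
]

# Count statements per (source, year-month) pair.
volume = Counter(
    (s["source"], s["date"].strftime("%Y-%m")) for s in statements
)

# Print the counts in source/month order, ready for plotting.
for (source, month), n in sorted(volume.items()):
    print(source, month, n)
```

Months with no statements from a source simply do not appear in the counter, which is one way to spot that the sources cover different periods.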
Let's look at content volume over time
A consistent constellation of words over time constitutes normalcy.
When the words from a new statement are different from what is normal, something has changed.
The proportion of new words that appear at time t compared to the words at time t-1.
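One way to make this measure concrete is sketched below. It assumes "new words" means distinct words present at time t but absent at time t-1, divided by the number of distinct words at time t. The tokenizer and the sample statements are illustrative assumptions, not the definition used in the talk.

```python
import re

def tokenize(text):
    """Return the set of distinct lowercase words in a text (a simple assumed tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def new_word_proportion(previous_text, current_text):
    """Share of distinct words at time t that did not appear at time t-1."""
    prev_words = tokenize(previous_text)
    curr_words = tokenize(current_text)
    if not curr_words:
        return 0.0  # no words at time t, so nothing can be new
    return len(curr_words - prev_words) / len(curr_words)

# Hypothetical statements ordered by time.
statements = [
    "the group calls for fair trade",
    "the group calls for debt relief",
    "sanctions threaten regional stability",
]

# Compare each statement with the one before it.
proportions = [
    new_word_proportion(prev, curr)
    for prev, curr in zip(statements, statements[1:])
]
print(proportions)
```

Here the second statement shares most of its vocabulary with the first, while the third shares none with the second, so its proportion of new words is 1.0.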
Consider a corpus from a "language" with only two words:
he, runs
1, 1
4, 1
3, 1
1, 3
2, 3
Mathematical tools that can reduce the dimensionality while trying to preserve the salient differences.
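For the two-word corpus above, a very simple reduction is to replace each document's two counts with a single number, the share of words that are "he". This is only an illustrative sketch under that assumption, not necessarily the technique used in the talk. Tools such as principal component analysis generalize the idea to many words.

```python
# Word counts (he, runs) for the five documents in the two-word corpus.
counts = [(1, 1), (4, 1), (3, 1), (1, 3), (2, 3)]

# Reduce two dimensions to one: the share of each document made up of "he".
# The share of "runs" is then 1 minus this value, so the mix of the two
# words is preserved; only document length is discarded.
he_share = [he / (he + runs) for he, runs in counts]

for (he, runs), share in zip(counts, he_share):
    label = "he-heavy" if share > 0.5 else "runs-heavy or balanced"
    print(he, runs, round(share, 2), label)
```

On this single axis, the second and third documents sit close together at the "he" end, and the fourth and fifth sit at the "runs" end.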
Other tools find clusters and establish relationships between documents and other values of interest.
We've seen how cognostics help us prioritize our investigation of graphs.
We've generated hypotheses.
This is usually where we propose models for testing hypotheses.
Unsupervised
Supervised
Casey King (casey@gophronesis.com) has been looking at the relationship between U.N. voting records, G77 statements, and U.S. Aid.