An Exploratory Analysis of Text Trends in the G77 (and rest of the UN)
Michael Kane - Yale University and Phronesis
What are the goals for this talk?
Contextualize the exploration (and analysis) of text
Demonstrate a preliminary text exploration using announcements from the G77 and UN
Define and demonstrate exploratory concepts pioneered by John Tukey
Technologies Used
Python with the selenium package
R
- tm (Feinerer, Hornik, Artifex Software, Inc.)
- trelliscope/datadr (Hafen)
- irlba (Lewis, Baglama, Reichel)
- foreach (Weston, Revolution Analytics)
- iotools (Urbanek)
- lubridate (Wickham)
How much can we know about a collection of texts without reading the content?
U.N. and G77 Statements
G77 - loose coalition of developing nations promoting its members' collective economic interests
- General Assembly - the main deliberative assembly
- Secretary General - provides studies, information, and facilities needed by the UN
- Security Council - decides on resolutions for peace and security
- Economic and Social Council - promotes international economic and social co-operation and development
The data
The date of a statement
Who the statement came from (G77, General Assembly, etc.)
The text in the statement
Let's look at statement volume over time
Taking a step back
Data spans different periods. (Validation)
We may be seeing a periodic effect in statement releases, or we may be seeing a relationship between the G77 and ECOSOC. (Hypothesis generation)
Let's look at content volume over time
"There's nothing better than a picture for making you think of all the questions you forgot to ask" -John Tukey
Are changes in statement frequency and content volume related to narrative changes?
Defining a change in narrative
A consistent constellation of words over time constitutes normalcy.
When the words in a new statement differ from what is normal, something has changed.
A measure of narrative novelty
The proportion of new words that appear at time t compared to the words at time t-1.
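The novelty measure above can be sketched in a few lines. This is an illustrative implementation, not the talk's actual code; the `novelty` function name and the example statements are invented.

```python
# Sketch of the narrative-novelty measure: the share of word types in the
# statement at time t that did not appear in the statement at time t-1.

def novelty(prev_text, curr_text):
    """Proportion of words at time t that are new relative to t-1."""
    prev_words = set(prev_text.lower().split())
    curr_words = set(curr_text.lower().split())
    if not curr_words:
        return 0.0
    return len(curr_words - prev_words) / len(curr_words)

# Toy sequence of statements (invented for illustration).
statements = [
    "trade development cooperation",
    "trade development cooperation",   # identical vocabulary: novelty 0
    "sanctions security resolution",   # entirely new vocabulary: novelty 1
]

scores = [novelty(a, b) for a, b in zip(statements, statements[1:])]
print(scores)  # [0.0, 1.0]
```

A spike in this score flags a statement whose vocabulary breaks with the recent past, which is exactly the kind of computed diagnostic Tukey's "cognostics" quote below refers to.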
"It seems natural to call such computer guided diagnostics cognostics. We must learn to choose them, calculate them, and use them. Else we drown in a sea of many displays" -John Tukey
What happened in August 2012?
What if we want to compare individual statements?
A toy example
Consider a corpus from a "language" with only two words:
- "He runs."
- "He he he he runs."
- "He he he runs."
- "He runs runs runs."
- "He he runs runs runs."
Create a Term-Document Matrix
doc, he, runs
1, 1, 1
2, 4, 1
3, 3, 1
4, 1, 3
5, 2, 3
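The counts above can be reproduced directly. The talk builds its term-document matrix with R's tm package; this is just a minimal plain-Python sketch of the same construction.

```python
# Build the term-document counts for the toy two-word corpus.
from collections import Counter

corpus = [
    "He runs.",
    "He he he he runs.",
    "He he he runs.",
    "He runs runs runs.",
    "He he runs runs runs.",
]

terms = ["he", "runs"]
tdm = []
for doc in corpus:
    counts = Counter(doc.lower().replace(".", "").split())
    tdm.append([counts[t] for t in terms])

print(tdm)  # [[1, 1], [4, 1], [3, 1], [1, 3], [2, 3]]
```

Each row is a document expressed as a point in the two-dimensional word space, which is what the next slide plots.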
Plot the documents in the word space
Going beyond the toy example
Mathematical tools can reduce the dimensionality while trying to preserve the salient differences between documents.
Other tools find clusters and establish relationships between documents and other values of interest.
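A common such tool is the truncated SVD, which is the computation R's irlba package (listed earlier) performs at scale. Below is a small sketch applied to the toy counts, using NumPy as an assumed dependency.

```python
# Dimensionality reduction via truncated SVD, applied to the toy
# term-document counts (documents as rows, terms "he"/"runs" as columns).
import numpy as np

X = np.array([[1, 1], [4, 1], [3, 1], [1, 3], [2, 3]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 1  # keep only the leading singular dimension
X_reduced = U[:, :k] * s[:k]  # each document as a single coordinate

print(X_reduced.ravel())
```

With real corpora the matrix has thousands of term columns rather than two, and keeping only the leading singular dimensions gives a compact representation in which clustering and comparison become tractable.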
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." - John Tukey
What comes next?
We've seen how cognostics help us prioritize our investigation of graphs.
We've generated hypotheses.
This is usually where we propose models for testing hypotheses.
If you are interested in the modeling portion...
Unsupervised
- (Probabilistic) Latent Semantic Analysis
- Latent Dirichlet Allocation
Supervised
- Supervised Latent Dirichlet Allocation
- Support Vector Machine
If you are interested in further analysis of the G77 Statements...
Casey King (casey@gophronesis.com) has been looking at the relationship between U.N. voting records, G77 statements, and U.S. Aid.
Thanks!