Concepts are Key, Not Words

Some form of textual analysis has become a standard feature among services that offer summaries of large volumes of documents.  Natural Language Processing (NLP), deep learning and neural nets are buzzwords we often hear.  But when you look under the hood, most of the functionality is based on keywords, word counts and rigid taxonomies.  That is a basic first step, and it does not get you very far toward an understanding of “context”, “themes” or “concepts”.

Our partner at BEA takes textual analysis a step further and teaches artificial intelligence (AI) software to find concepts and themes, not just words.  It’s one thing to find all occurrences of the word “decline” in an earnings call transcript.  It is another thing altogether to understand the concept of “decline” within the context of a paragraph.  Is it a decline in sales?  Or a decline in bad accounts?
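To make the “decline” example concrete, here is a minimal Python sketch. It is purely illustrative and is not BEA’s software: the function name find_keyword and the two sample sentences are made up for this example. A keyword search reports one hit in each sentence, and only the words around the hit show that the two declines mean opposite things for the company.

```python
import re

def find_keyword(text, keyword, window=3):
    """Return, for each occurrence of `keyword`, the `window` words that follow it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            hits.append(tokens[i + 1 : i + 1 + window])
    return hits

# Hypothetical sentences, not taken from a real transcript.
sentences = [
    "We saw a decline in sales this quarter.",
    "We saw a decline in bad accounts this quarter.",
]

for sentence in sentences:
    context = find_keyword(sentence, "decline")
    # Both sentences produce exactly one keyword hit; only the context differs.
    print(len(context), context)
```

A word count treats the two sentences as identical. The context words are where the difference between bad news and good news shows up.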

In another guest article, Tom Marsh, CTO at Boulder Equity Analytics (BEA), talks about keyword vs. theme detection and why “concepts are key, not words”.

A critical skill for the analyst during earnings season is detecting changes in the key indicators or themes for a company and its peers. Keyword detection is often passed off as theme detection, but it is not, and the difference matters.  Here at BEA, teaching software (AI) to find themes buried in SEC filings, earnings calls and press coverage from investor relations is a key technology advantage.

First, understand the terms. Analysts tell us that with all the buzzwords and claims by vendors, it’s hard to understand the difference between real and apparent performance.  For us, the goal is to replicate an expert analyst’s ability to read and understand a document, whether it’s a filing, an earnings call or an interview.

There is more to cover, but in this post I want to make sure we understand each other when we use the terms “theme”, “topic” or “concept”.

What is a “concept”?

Since we claim to teach software agents to find “concepts”, let’s check the definition of the term “concept” to make sure we are using it correctly. While I didn’t expect this to lead me all the way back to Philosophy class with references to Kant, Locke, Mill etc., our approach and use of this term are fundamentally consistent with the excerpts below from Wikipedia.

Concept definition from Wikipedia

A concept is a general idea, or something conceived in the mind.

Notable definitions:

John Locke’s description of a general idea corresponds to a description of a concept. According to Locke, a general idea is created by abstracting, drawing away, or removing the uncommon characteristic or characteristics from several particular ideas. The remaining common characteristic is that which is similar to all of the different individuals.

John Stuart Mill argued that general conceptions are formed through abstraction. A general conception is the common element among the many images of members of a class. “…When we form a set of phenomena into a class, that is, when we compare them with one another to ascertain in what they agree, some general conception is implied in this mental operation” (A System of Logic, Book IV, Ch. II).

Philosopher Arthur Schopenhauer argued that concepts are “mere abstractions from what is known through intuitive perception, and they have arisen from our arbitrarily thinking away or dropping of some qualities and our retention of others.” (Parerga and Paralipomena, Vol. I, “Sketch of a History of the Ideal and the Real”).

By contrast to the above philosophers, Immanuel Kant held that the account of the concept as an abstraction of experience is only partly correct. He called those concepts that result from abstraction “a posteriori concepts”.

A concept is a common feature or characteristic. Kant investigated the way that empirical a posteriori concepts are created.

“The logical acts of the understanding by which concepts are generated as to their form are:

  1. comparison, i.e., the likening of mental images to one another in relation to the unity of consciousness;

  2. reflection, i.e., the going back over different mental images, how they can be comprehended in one consciousness; and finally

  3. abstraction or the segregation of everything else by which the mental images differ …

In order to make our mental images into concepts, one must thus be able to compare, reflect, and abstract, for these three logical operations of the understanding are essential and general conditions of generating any concept whatever. For example, I see a fir, a willow, and a linden. In firstly comparing these objects, I notice that they are different from one another in respect of trunk, branches, leaves, and the like; further, however, I reflect only on what they have in common, the trunk, the branches, the leaves themselves, and abstract from their size, shape, and so forth; thus I gain a concept of a tree.”

— Logic, §6

Optimization of AI software

We worked with our ai-one partner to optimize their AI for this task. 

At the core, our application processes each line of text much the way our brains do it, learning the patterns of language, the “key” words, their importance and the words most closely associated with them. The AI provides commands to extract those key words and associations, along with their direction and values (strengths), as an array.
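As a rough illustration of what such an array of associations could look like, the sketch below counts directed word pairs in each line of text and turns the counts into strengths. This is an assumption-laden stand-in, not the ai-one API: the function name associations, the window size and the sample lines are all hypothetical, and simple co-occurrence counting is a swapped-in technique chosen only because it is easy to follow.

```python
import re
from collections import Counter

def associations(lines, window=2):
    """Return (source, target, strength) rows, strongest first.

    A pair is counted when `target` follows `source` within `window`
    words on the same line, so the association has a direction.
    Strength is the pair's share of all counted pairs.
    """
    counts = Counter()
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        for i, source in enumerate(tokens):
            for target in tokens[i + 1 : i + 1 + window]:
                counts[(source, target)] += 1
    total = sum(counts.values())
    rows = [(s, t, c / total) for (s, t), c in counts.items()]
    # Sort by descending strength, then alphabetically for a stable order.
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))

# Hypothetical input lines.
lines = ["sales decline continued", "sales decline slowed"]
for source, target, strength in associations(lines):
    print(f"{source} -> {target}: {strength:.2f}")
```

In this toy input the pair “sales -> decline” appears in both lines, so it comes out with the highest strength, while the one-off pairs fall to the bottom of the array.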

While the ability to score the similarity of concepts is important, my observation from years of applying this technology to problems from NASA (Topic Mapping pg.198) to SwissRe is that it’s even more proficient at filtering out the noise, giving the lowest values to the unimportant words and associations.

Filtering is fundamental to our brain’s ability to find the topic that’s important.

Locke describes a concept as an idea “created by abstracting, drawing away, or removing the uncommon characteristic or characteristics”. This is very close to the way our solution builds a model of a concept after learning from the examples provided to teach it. The fingerprint we extract from the example text is an array that represents the concept in the same way. Comparing that fingerprint with the fingerprint of a new document produces a similarity score, a powerful attribute we use in a number of ways to deliver a great user experience.
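Locke’s description translates into code quite directly. The sketch below is a toy version under stated assumptions, not our production method: the names concept_fingerprint and cosine, the bag-of-words representation and the sample sentences are all hypothetical. It keeps only the words common to every teaching example (dropping the “uncommon characteristics”) and then scores new text against that fingerprint with cosine similarity.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def concept_fingerprint(examples):
    """Keep only words shared by every example, weighted by average count."""
    counts = [Counter(tokenize(example)) for example in examples]
    common = set(counts[0])
    for count in counts[1:]:
        common &= set(count)
    return {word: sum(c[word] for c in counts) / len(counts) for word in common}

def cosine(a, b):
    """Cosine similarity between two word-to-weight mappings."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical teaching examples for the concept "sales decline".
fingerprint = concept_fingerprint([
    "sales decline in europe",
    "sales decline in asia",
])

for text in ["a decline in sales", "bad accounts decline", "hiring plans expanded"]:
    score = cosine(fingerprint, Counter(tokenize(text)))
    print(f"{score:.2f}  {text}")
```

The region names drop out of the fingerprint because they are not shared by both examples. A bag of words like this ignores word order and still keeps filler words such as “in”, which is exactly the kind of noise a real system has to give low values to.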

Building powerful qualitative analytics for financial analysts and investors starts with the right core technology. Finding concepts buried inside documents is the first part and foundation of extracting actionable insight.

Now that I think about it, maybe Philosophy 101 wasn’t a liberal arts waste of money after all.

Tom

@tom_semantic

KDD Analytics and Boulder Equity Analytics are partnering to deliver collaborative artificial intelligence to the financial and competitive analysis industries.