2021-01-05 | Xiaohang Zhao: A Deep Learning Approach to Industry Classification
Industry classification systems (ICSs), which identify economically related firms as peer firms, play a central role in business research and practice. Traditional expert-driven ICSs are designed manually and therefore have limitations, including high maintenance costs and coarse granularity in the firm relatedness they identify. To circumvent these limitations, recent research takes an algorithm-driven approach: it represents firms' 10-K reports with a bag-of-words method and uses these representations to identify economically related firms. Although firms' 10-K reports are highly informative for this task, the bag-of-words method represents them poorly because it ignores the rich semantic information encoded in word context and order, which yields a less effective ICS. Recent developments in deep-learning-based document embedding provide powerful tools for document representation. However, existing document embedding models (DEMs) are not well suited to capturing the rich semantics of 10-K reports because of their challenging nature: they are long documents featuring heterogeneous and shifting concepts. We propose a novel DEM that addresses these challenges through an innovative adaptive gating mechanism and its associated gating function. In addition, we develop a new ICS that takes firms' 10-K reports as input, employs the proposed DEM to represent the semantics of these reports, and identifies economically related firms based on the similarities between their 10-K representations. Extensive empirical evaluations demonstrate that our proposed ICS is superior both to representative existing ICSs and to ICSs constructed with state-of-the-art DEMs. This study contributes to business research and practice with a novel ICS that can effectively identify economically related firms.
It also contributes to the field of deep-learning-based document embedding with an innovative DEM that can capture the semantics of a broad variety of long documents with shifting concepts, such as 10-K reports, legal documents, and patent documents.
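The baseline pipeline the abstract criticizes (bag-of-words vectors for each report, with related firms identified by vector similarity) can be sketched in a few lines of Python. This is an illustrative sketch, not the speaker's model: the whitespace tokenizer, the choice of cosine similarity, the helper names `bag_of_words`, `cosine`, and `top_peers`, and the one-line "report" snippets are all hypothetical. The first check shows the limitation the abstract points out: two sentences with the same words in a different order get identical bag-of-words vectors.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens; word order is discarded."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_peers(vectors, firm, k=2):
    """Rank the other firms by similarity of their document vectors to `firm`."""
    scores = [
        (other, cosine(vectors[firm], vec))
        for other, vec in vectors.items()
        if other != firm
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical one-line stand-ins for 10-K business descriptions (not real filings).
reports = {
    "FirmA": "we design semiconductors and sell chips to device makers",
    "FirmB": "we manufacture chips and semiconductors for device makers",
    "FirmC": "we operate retail grocery stores",
}
vectors = {name: bag_of_words(text) for name, text in reports.items()}

# Bag-of-words cannot tell these apart: same word counts, different meaning.
print(bag_of_words("bank lends to firm") == bag_of_words("firm lends to bank"))  # True
print(top_peers(vectors, "FirmA"))  # FirmB ranks above FirmC
```

Replacing `bag_of_words` with a function that returns a dense vector from a document embedding model (and adapting `cosine` to dense vectors) would give an embedding-based variant; the similarity-ranking step that identifies peer firms stays the same.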
Xiaohang Zhao is a Ph.D. candidate in Financial Service Analytics at the Alfred Lerner College of Business & Economics, University of Delaware. His primary research interest is designing novel methods that leverage deep learning, machine learning, and natural language processing to solve problems in financial technology, social network analytics, and health care analytics. He holds a bachelor's degree in Financial Engineering from Renmin University of China.