By contrast, machine learning models are flexible algorithms
that grow and change with exposure to new data. Such modeling
methods can be adept at crunching through large volumes of
data to identify characteristics and their interrelationships that
help to predict credit behavior. Machine learning methods allow the model to build and update itself. The automated model
determines which variables are useful and how to combine them
to best predict behavior based on the latest data available. The
model determines which applicants should be approved based
on targets for business objectives (e.g., charge-off rates or profitability), without necessarily computing a numeric credit score.
In some cases, machine learning may combine the results of
multiple models to increase predictive power. Finally, machine
learning methods tend to select attributes and combinations of attributes based purely on the strength of their correlations to credit
outcomes. Less emphasis is placed on whether there are logical
economic or behavioral reasons underlying those correlations.
A key motivation for machine learning is the desire to identify
and exploit subtle and difficult-to-observe relationships among
disparate data elements from many different sources that can
be combined to better predict consumer behavior. As a simple
example, default risk may differ markedly between two consumers
who each have several recent credit card inquiries and a recent
payday loan application, where one has multiple recent delinquencies
on a single credit account and the other has delinquencies spread
across multiple accounts. It would be prohibitively time-consuming or impossible
for a human analyst to evaluate all possible interrelationships
among all available data elements to identify the characteristics
that best predict default. Machine learning techniques, however,
can allow an analyst to consider an arbitrary number of complex
interrelationships among hundreds or thousands of variables.
Beyond that, because the process of building the model is
automated, it can be updated frequently as new data on actual
loan performance becomes available. This is especially valuable
for new products and new lenders, for which loan performance
experience is very limited.
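To give a rough sense of why a human analyst cannot review every interrelationship among hundreds or thousands of variables, the sketch below counts how many distinct pairs and triples of variables exist for a given number of inputs. The function name and the choice to stop at three-way combinations are illustrative assumptions, not something the discussion above prescribes.

```python
from math import comb

def interaction_counts(n_variables, max_order=3):
    """Count distinct variable combinations of each order from 2 up to max_order.

    Illustrative helper (hypothetical name): order 2 = pairs, order 3 = triples.
    """
    return {k: comb(n_variables, k) for k in range(2, max_order + 1)}

for n in (100, 1000):
    print(n, interaction_counts(n))
# 100 {2: 4950, 3: 161700}
# 1000 {2: 499500, 3: 166167000}
```

Even when only three-way combinations are counted, a thousand candidate variables yield more than 166 million triples, which is why this search is left to automated methods.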
Fair Lending Benefits and Risks
Big Data and new modeling approaches have the potential to
provide new insights into consumer behavior that could improve
profitability for lenders and broaden credit access for consumers. Alternative data sources and modeling methods could allow
lenders to better serve consumer segments that historically have
been underserved, such as consumers who are unbanked, have
low or moderate incomes, do not use traditional credit products,
are self-employed or have little established credit history.
For example, a 2015 study by the Bureau estimated that about
15% of Blacks and Hispanics are “credit invisible”—meaning that
they have no records at the national credit reporting bureaus—
compared to about 9% of Whites and Asians (“Data Point: Credit
Invisibles,” Consumer Financial Protection Bureau, Office of
Research, May 2015). The study also found that a further 13%
of Blacks and 12% of Hispanics have credit bureau records that
cannot be assigned a traditional credit score because of insufficient
credit history or insufficient recent credit activity, compared to
about 7% of Whites and Asians. Even without understanding
the assumptions, data attributes, or motivations in the study, the
results suggest that mining alternative data sources for information about consumer payment behavior or risk characteristics
could potentially broaden access to credit for minority consumers.
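The two sets of figures quoted above can be added together to estimate the share of each group that cannot be given a traditional credit score at all. The short tally below is a sketch that assumes the "credit invisible" and "unscorable" categories do not overlap and are measured against the same population base, as the phrase "a further" suggests; the variable and function names are illustrative.

```python
# Percentages as quoted from the cited 2015 study.
credit_invisible = {"Black": 15, "Hispanic": 15, "White and Asian": 9}
unscorable = {"Black": 13, "Hispanic": 12, "White and Asian": 7}

def share_without_score(group):
    """Percent of a group with no traditional score: invisible plus unscorable.

    Assumes the two categories are mutually exclusive (an assumption).
    """
    return credit_invisible[group] + unscorable[group]

for group in credit_invisible:
    print(f"{group}: about {share_without_score(group)}%")
# Black: about 28%
# Hispanic: about 27%
# White and Asian: about 16%
```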
Easy and low-cost access through an online platform generally
results in faster credit decisions and funding, reduced shopping
costs, reduced geographic boundaries, and increased choice and flexibility for consumers seeking credit, and it provides opportunities
to build good credit management habits. Also, the automation
of credit application and decision processes reduces the risk of
disparate treatment on a prohibited basis that can arise in manual
or judgmental processes—assuming the inputs to the automated
decision are not problematic.
Fair lending risk can still arise with automated and machine
learning processes, and managing that risk becomes
more challenging with machine learning and data analytics. It’s
probably safe to say that credit risk specialists do not fully
appreciate how fair lending risk may arise in automated,
model-driven processes. Modelers are likely to say, “We don’t
discriminate. Our models don’t consider prohibited factors.” While
that’s a big step in the right direction, the risk of disparate impact
(from particular attributes, alone or in combination with other attributes)
may get insufficient attention, and some aspects of the credit process
may not be evaluated for fair lending risk.
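A model that never sees a prohibited factor can still approve groups at different rates. As an illustration only, the sketch below compares approval rates across two groups using a simple ratio. The article does not name a specific metric; this ratio is one commonly used screening measure, and the function names, group labels "A" and "B", and the decision data are all hypothetical and synthetic.

```python
from collections import defaultdict

def approval_rates(decisions):
    """Map each group to its approval rate, given (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}

def adverse_impact_ratio(decisions, protected, reference):
    """Approval rate of the protected group divided by that of the reference group."""
    rates = approval_rates(decisions)
    return rates[protected] / rates[reference]

# Synthetic toy data (not real lending outcomes): 100 decisions per group.
toy = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 60 + [("B", False)] * 40
)

ratio = adverse_impact_ratio(toy, protected="B", reference="A")
print(round(ratio, 2))  # 0.75
```

A ratio well below 1 does not by itself establish disparate impact; it flags a difference in outcomes that warrants a closer look at which attributes drive it and whether they have a legitimate business justification.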
There are various potential sources of fair lending risk that should
be considered in the use of alternative data and automated decision
processes. First of all, the Equal Credit Opportunity Act (ECOA) and
Regulation B prohibit lenders from discriminating on the basis of
prohibited characteristics in any aspect of a credit transaction. This
means that the full credit lifecycle must be evaluated for fair lending
risk, including marketing, underwriting, fraud risk detection, setting
terms and conditions (pricing, credit line/limit determination, etc.),
servicing and collections. Each of these stages in the process may
involve different data sources, decision criteria and models with
different fair lending risk potential, and each should be evaluated.
Next, the risk of a disparate impact on a prohibited basis should
be evaluated. Ostensibly neutral variables that predict credit be-
Unlike credit history data, which has long been accepted by regulatory
agencies as having a legitimate business justification notwithstanding its
correlations with prohibited bases, alternative credit attributes have yet
to gain widespread acceptance and some are viewed with suspicion.