How Amazon Uses Machine Learning to Drive the Customer Experience

I read all 22 Amazon Shareholder letters. 

I wanted to understand how Amazon used machine learning to drive the customer experience. Here's what I learned.

Amazon is a Platform-as-a-Service (PaaS) company that just happens to be a retailer

Jeff Bezos has been consistent about Amazon's goal since the beginning - deliver an amazing customer experience. That means providing vast selection, fast and convenient service, and price reductions. Amazon has invested in building massive platforms - whether fulfillment centers or cloud computing infrastructure - to support the pillars of selection and convenience. The last pillar, price reductions, is a result of the efficiencies that come from scaling.

Here's their playbook: (a) create a platform for their own business needs; (b) formalize this platform into an ecosystem by opening it up to third parties; (c) refine this platform into an easy-to-use, self-service interface. Lather, rinse, repeat.

An example is Amazon's distribution centers, which they built to sell their own products,

We opened distribution and customer service centers in the U.K. and Germany, and in early 1999, announced the lease of a highly-mechanized distribution center of approximately 323,000 square feet in Fernley, Nevada. This latest addition will more than double our total distribution capacity and allows us to even further improve time-to-mailbox for customers. - 1998 Shareholder Letter

Then they launched Fulfillment by Amazon (FBA) which opened up their distribution centers to 3rd-party sellers,

Fulfillment by Amazon is a set of web services APIs that turns our 12 million square foot fulfillment center network into a gigantic and sophisticated computer peripheral. Pay us 45 cents per month per cubic foot of fulfillment center space, and you can stow your products in our network. You make web services calls to alert us to expect inventory to arrive, to tell us to pick and pack one or more items, and to tell us where to ship those items. - 2006 Shareholder Letter
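
The letter is describing a programmable warehouse. Purely as an illustration - these endpoints and payloads are hypothetical, not Amazon's actual FBA API - the workflow might look something like this in Python:

    import requests

    FBA_API = "https://fba.example.com/v1"  # hypothetical endpoint, for illustration only

    # 1. Alert the network to expect inbound inventory
    requests.post(f"{FBA_API}/inbound-shipments", json={
        "seller_id": "SELLER-123",
        "items": [{"sku": "WIDGET-01", "quantity": 500}],
    })

    # 2. Ask the network to pick, pack, and ship an order
    requests.post(f"{FBA_API}/fulfillment-orders", json={
        "seller_id": "SELLER-123",
        "items": [{"sku": "WIDGET-01", "quantity": 2}],
        "ship_to": {"name": "Jane Doe", "address": "123 Main St, Toronto, ON"},
    })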

And finally, they evolved FBA into a self-service user interface that anyone can use,

...when sellers use FBA, their items become eligible for Amazon Prime, for Super Saver Shipping, and for Amazon returns processing and customer service. FBA is self-service and comes with an easy-to-use inventory management console as part of Amazon Seller Central. - 2011 Shareholder Letter

Amazon has mastered the ability to invent, refine and scale platforms that enable an unparalleled customer experience. 

Amazon has focused on customer personalization since the beginning

Nowadays, we often take product recommendations for granted. We hear about Netflix's famous recommendation engine, which predicts the shows that keep you binge-watching, or Spotify's ability to introduce you to new artists you'll love.

Bezos knew that Amazon.com couldn't win against physical bookstores if they tried to duplicate the same experience online. The two distribution channels are different, which means that the customer experience needs to be different. He talks about customer personalization in his very first shareholder letter,

Today, online commerce saves customers money and precious time. Tomorrow, through personalization, online commerce will accelerate the very process of discovery. - 1997 Shareholder Letter

Amazon's vast selection meant that they had a larger product catalog than any physical store. More choice is great, but not practical if I can't find what I'm looking for on the website. They've managed to develop the right assets over time to scale product recommendations: unparalleled access to historical data, cloud computing resources, and machine learning recommendation engines.

First, they started with recommending new books. A decade later, they added features such as customer reviews & product discovery like 'customers who bought this item also bought'. In the past decade, the rise of cloud computing allowed Amazon to put their product recommendation engine on steroids - building a search engine that returns fast, relevant results and running hundreds of software programs to personalize a product page,

...to construct a product detail page for a customer visiting Amazon.com, our software calls on between 200 and 300 services to present a highly personalized experience for that customer. - 2009 Shareholder Letter

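A feature like "customers who bought this item also bought" can be sketched with simple item-to-item co-occurrence counting. This is a generic approach for illustration, not Amazon's actual implementation:

    from collections import Counter, defaultdict

    # Each order is the set of products a customer bought together
    orders = [
        {"book_a", "book_b"},
        {"book_a", "book_b", "book_c"},
        {"book_b", "book_c"},
    ]

    # Count how often each pair of products appears in the same order
    co_counts = defaultdict(Counter)
    for order in orders:
        for item in order:
            for other in order - {item}:
                co_counts[item][other] += 1

    def also_bought(item, n=5):
        """Top-n products most often bought alongside `item`."""
        return [other for other, _ in co_counts[item].most_common(n)]

    print(also_bought("book_a"))  # ['book_b', 'book_c']
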
Amazon has relentlessly focused on turning their vast product selection into the killer customer experience feature of personalization. 

Amazon has evolved from using machine learning to support their products to machine learning as the product

Over time, Amazon's machine learning capabilities went from a support role that augmented decision making to becoming a product in itself.

For example, machine learning for order fulfillment represents augmented decision-making: computers label and categorize products, which minimizes errors that cost money and time. More recently, however, Amazon has created machine learning products. The best examples are Alexa, which uses natural language processing to understand human speech, and Amazon Go, stores that eliminate checkout lines with the aid of computer-vision cameras.

True to Amazon form, they're running the playbook I described earlier and turning machine learning into a platform service. They created machine learning models to support their own infrastructure; then they formalized those models into a platform that people can access, such as Alexa's API, the Alexa Voice Service (AVS); and now they're offering pre-packaged deep learning models-as-a-service - Amazon SageMaker - for developers to run on their cloud platform.
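
Here's a minimal sketch of what that looks like for a developer, assuming the SageMaker Python SDK (v2), an AWS execution role, and a hypothetical S3 bucket - details vary by setup and SDK version:

    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

    # Pick one of SageMaker's pre-packaged algorithm containers (XGBoost here)
    image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type="ml.m5.large",
        output_path="s3://my-bucket/models/",  # hypothetical bucket
    )

    # Train on managed infrastructure, then deploy behind a hosted endpoint
    estimator.fit({"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")})
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")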

It's been decades in the making, but it looks like machine learning modelling may emerge as their next big platform. One can imagine SageMaker setting them up to remain a top company for the next decade. As Jeff Bezos says, at Amazon, every day is Day 1.

Mythbusters: Were Overzealous Algorithms Responsible for Slow Sales at Loblaw Companies Ltd?

Well, this is new.

Retailers have blamed bad weather for poor sales before, but I don't think I've ever seen a retailer blame bad algorithms.

Loblaw Companies Ltd, Canada's largest grocery and pharmacy chain, which owns the mainstream brand Loblaws and the discount brand No Frills, had a soft Q2 performance, with same-store revenue growing 0.6%. Their President, Sarah Davis, blamed the performance on algorithms that prioritized improving profit margins over the promotional pricing that attracts foot traffic. That is, Loblaw chose to increase revenue from each customer instead of focusing on increasing the number of customers. She says:

We know exactly what we did and what we did was we focused on going for margin improvements...And in the excitement of seeing margin improvements in certain categories as we started to implement some of the algorithms, people were overzealous...You end up with fewer items on promotion in your flyer. 

Are the data scientists at Loblaw really running wild with their overzealous algorithms, cutting the number of items on promotion in flyers and, ultimately, softening sales?

Davis' statement was a Mythbusters challenge that I couldn't resist, so I did some research. 

I looked at the Ontario flyers for No Frills and its competitors, Food Basics (Metro) and FreshCo (Empire), over the same time period in May 2019 and May 2018. I selected a total of 10 items - a mix of produce, meat, and dairy - that were posted in the flyers of all 3 grocers so that I could compare prices. I consider an item on promotion if it has the lowest price of the three grocers. The goal was to determine for how many items No Frills was the price leader in 2019 versus 2018.
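
Here's a minimal sketch of that comparison logic, with made-up prices rather than the real flyer data; it simply counts how many items each grocer led on:

    # Flyer prices per item for each grocer (illustrative numbers only)
    flyers_2019 = {
        "corn":     {"No Frills": 0.99, "Food Basics": 0.89, "FreshCo": 1.29},
        "tomatoes": {"No Frills": 1.49, "Food Basics": 1.99, "FreshCo": 1.69},
        "steak":    {"No Frills": 7.99, "Food Basics": 6.99, "FreshCo": 7.49},
    }

    def price_leader_counts(flyers):
        """Count how many items each grocer leads on (lowest price = on promotion)."""
        counts = {}
        for item, prices in flyers.items():
            leader = min(prices, key=prices.get)
            counts[leader] = counts.get(leader, 0) + 1
        return counts

    print(price_leader_counts(flyers_2019))  # {'Food Basics': 2, 'No Frills': 1}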

Results

Out of the 3 discount grocers, No Frills ranked 2nd place as price leader in both 2019 and 2018. This finding leads me to believe that they did not run more promotions last year than this year, and that the impact of the algorithms may well be overstated.

There is, however, one qualifier: in 2018, No Frills ran a promotion for their loyalty program, PC Optimum, offering a bonus for spending a certain amount. Could the absence of this promotion have caused slower sales this year?

Well, taking a look at Q2 same-store sales growth for the past 2 years, it's clear that Loblaw Companies was slowing down even in 2018 when compared to Metro & Empire. 

At this point, I would say: myth busted. There was no difference in flyer promotions between last year and this year, and same-store sales growth has been slowing since 2018...which means the data scientists and their algorithms can breathe a sigh of relief.

So, how can Loblaw Companies turn this around? Here are 3 flyer suggestions.

Be price comparable or price match

Metro and Empire have trained customers over the past year to expect the best prices at their stores. The discount grocery channel is a growth area in food retailing, and it's becoming even more competitive as Empire aggressively converts under-performing Sobeys stores into FreshCo locations. No Frills needs to be price comparable.

Highlight fresh food offering

When I reviewed the flyers, I got the sense that Food Basics had a more complete grocery offering, focusing on produce and fresh foods, whereas No Frills focused on packaged dry goods. The drop-off between the primary brand and the discount brand seems larger for Loblaw, which is a challenge if consumers are switching to discount brands.

Continue leaning into the direct-to-consumer channel, the PC Optimum loyalty program

PC Optimum may be one of the strongest loyalty programs in Canada and, from my perspective, has surpassed Metro's Air Miles program. A direct-to-consumer brand relationship is quickly becoming one of the few remaining competitive advantages in markets where products are commoditized. Loblaw Companies should continue promoting offers in flyers to acquire new users - whether by mobile app or points card. The data generated will put their data scientists in a position to drive top-line business outcomes.


Disclosures

  • I own a long position in Loblaw Companies Ltd.
  • Basket items
    • 2019: Steak, Chicken Breast, Corn, Tomatoes, Watermelon, Sweet Peppers, Mushrooms, Blueberries, Ice Cream, Yogourt
    • 2018: Steak, Chicken Breast, Ribs, Corn, Tomatoes, Watermelon, Sweet Pepper, Mushrooms, Cucumber, Strawberries
  • Empire's quarterly end dates are mid-quarter when compared to the normal fiscal end dates, so I averaged the two quarters that represented Q2
  • I recognize that the flyer data is limited because it's only from 1 week and doesn't include Walmart or Costco

The One Skill That Data Scientists Are Not Being Trained On

After attending the Toronto Machine Learning Micro-Summit this past week, I noticed one theme come up repeatedly during the presentations - communicate with the business team early and often, or you'll need to go back and redo your work.

There was the story of an insurance company that created a model to recommend whether to replace or fix a car after a damage claim. It sounded great - the Data Scientists got a prototype up and running and had business team buy-in. The problem was that their models weren't very accurate. Usually, that means the data is noisy or the algorithm isn't powerful enough. But when they went back to their business team, it turned out they had missed 2 key features: the age of the vehicle and whether it was a luxury model.

Another example was a telecom that built a model to optimize call center efficiency. The data science team spent a month building the model, and everyone was excited to get it into production. Then they were told that the call center runs on an outdated application. It turned out that integrating with the application would cost more than the ROI of the project.

I think these situations are happening for 2 reasons: (i) companies are still learning to develop machine learning as a core competency, and (ii) they don't always have a clear agenda because they don't know what's feasible. As a result, Data Scientists are being hired for their laundry list of hard skills and educational background, but don't always have the domain expertise to understand the business. Even in my Data Science Certificate courses, the focus is on programming tools, algorithms, and statistics. So we're seeing Data Scientists join companies, actively look for problems where they can apply machine learning, and then jump into building models too quickly so that they can show traction to management.

The one skill that Data Scientists are not being trained on is Product Discovery - the ability to validate ideas in the cheapest, fastest way possible. It's about prototyping - starting with low fidelity and getting feedback at every stage. Stakeholder feedback and buy-in are just as much part of the solution as the outcome decided by a model.

I can relate. 

As a Product Manager, much of my time is spent evangelizing and educating stakeholders about the products I'm building, and trying to understand how my work impacts them. We are all inherently visual, so, at the very beginning, I use the most basic prototype - a flow diagram. It's just so much easier to walk someone through a diagram on a call than to describe a solution in words. And after almost every call, I get asked to send over my diagram so they can look it over again.

I think the examples I described earlier will become less of an issue as the field of machine learning matures and Data Scientists gain more domain expertise. It does go to show, though, why soft skills and communication are still the most important skills in a workplace.

Are Data Scientists Actually Surveillance Scientists? - Part 1

There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time. But at any rate they could plug in your wire whenever they wanted to. You had to live—did live, from habit that became instinct—in the assumption that every sound you made was overheard, and, except in darkness, every movement scrutinized. - 1984, George Orwell

Last summer I had a conversation with an acquaintance who had recently visited China, and we discussed China's Social Credit System (SCS) and its impact on people's daily lives. The SCS assigns each citizen a score based on their reputation, and that score can affect their ability to be outside in the evening, their eligibility to book a travel ticket, or their suitability for a loan. It's similar to the credit scores North Americans are familiar with, but more encompassing, as the SCS takes non-financial data into account. My acquaintance said that the initial feedback was positive - her friends and family felt safer walking the streets at night knowing that people deemed dangerous wouldn't be allowed outside.

At roughly the same time I had this conversation, I was reading a great technical book on how to build big data systems: Designing Data-Intensive Applications. The last chapter, surprisingly, was not technical - rather, it was a commentary on data and society:  

As a thought experiment, try replacing the word data with surveillance, and observe if common phrases still sound so good. How about this: “In our surveillance-driven organization we collect real-time surveillance streams and store them in our surveillance warehouse. Our surveillance scientists use advanced analytics and surveillance processing in order to derive new insights.” [1]

Now, I work in technology and advertising, so it's not lost on me that the industry is built around collecting data on users to provide a more customized experience. Big data and machine learning systems are tools, and their application is really in the hands of those who wield them. I think we need to ask ourselves at what point data collection turns into surveillance, and what the implications are.

Taking a step back might help in answering these questions. One of the big breakthroughs fueling the current artificial intelligence gold rush is something called deep learning. In 2012, Canadian researchers discovered a way to significantly reduce the error rates of computers classifying images. We're seeing the applications of deep learning pop up everywhere - from self-driving cars, to identifying diseases from medical images, to reading legal documents faster than humans ever could.

Another use, however, is authenticating people's faces through video cameras. 

Depending on your perspective, it's a way to conveniently unlock your phone or it's a way to conveniently monitor humans. I think many people don't believe they have anything to hide and so it doesn't register as a concern. However, machine learning systems are still built by humans, which means errors still happen.

Let's walk through an example: you buy a subway ticket to a certain neighbourhood, but you have to submit a face scan. The system decides your Social Credit Score is too low to be trustworthy, and you're denied. You have no transparency into why your score is low, so you try to think of any possible reasons. What about that bill you paid late one time a few years ago? Could that be affecting you? Even worse, what if there's a mistake that's attributed to you? In machine learning systems, there's always a tradeoff between prioritizing false positives and false negatives.
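
To make that tradeoff concrete, here's a toy sketch with made-up scores; moving the decision threshold trades false positives (trustworthy people denied) against false negatives (risky people approved):

    # Made-up risk scores (higher = model thinks less trustworthy), with true labels
    people = [
        (0.2, "trustworthy"), (0.4, "trustworthy"), (0.55, "trustworthy"),
        (0.5, "risky"), (0.7, "risky"), (0.9, "risky"),
    ]

    def errors_at(threshold):
        # Deny anyone whose score exceeds the threshold
        fp = sum(1 for s, label in people if s > threshold and label == "trustworthy")
        fn = sum(1 for s, label in people if s <= threshold and label == "risky")
        return fp, fn

    for t in (0.3, 0.6):
        fp, fn = errors_at(t)
        print(f"threshold={t}: {fp} trustworthy denied, {fn} risky approved")
    # A low threshold denies more innocent people; a high one lets more risk through.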

Another challenge is that these systems "merely extrapolate from the past; if the past is discriminatory, they codify that discrimination." [2] Are the data scientists building these systems removing the bias in the datasets? I worry about these downstream effects and their impact on our daily lives.

One promising area is organizations' increased focus on privacy and on how they collect and use your data. We're seeing countries in the EU and the US legislate privacy in favor of the user. Where there's a gap, though, is the governance of the data systems themselves. In the future, privacy policies could evolve to apply an organization's values to its data systems too. I'll discuss what a framework for privacy-focused data systems could look like in a follow-up Part 2.

 

[1] Designing Data-Intensive Applications, Martin Kleppmann

[2] Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Cathy O'Neil

Lessons from University of Toronto's Data Science Statistics Course (3251)

I recently completed a Statistics for Data Science course at the University of Toronto for Continuing Studies and I wanted to share my reflections about the experience. 

Overall, the course was mostly interesting, some parts boring, and always challenging. 

I should begin by admitting that I managed to avoid any heavy math or statistics classes in university, even though I completed science and business degrees. I certainly felt behind when math notation and equations started popping up in class. But where there's a will, there's a way (mostly).

Most lectures were challenging to follow because of the pace of learning. I even tried to read ahead for some lectures to be more prepared, but that only marginally helped. As a result, I often sat in class with material that was way over my head, questioning whether spending a weekday night on campus was a better use of my time than self-studying at home. A younger me might have panicked, wondered what I had gotten myself into, or worried about being outed as a fake. And I don't think I was the only one, as I saw classmates drop out in the first few weeks.

There is, however, one skill that I've developed since finishing my college studies that came in handy - learning how to learn. So, I tackled this problem like any other and focused on finding the best resources suitable for me. 

I started with the required statistics textbook and found the explanations and sample problems inviting to beginners. The lectures mostly followed the chapters in the textbook, and I actually ignored the course lecture slides for the first 3/4 of the course. I also supplemented with Khan Academy, which does a great job of breaking material down into small snippets with videos. It's incredible that all of this material is available free and online - I highly recommend it. Lastly, I had a friend who was a bit more experienced whom I would ask for help when needed. Friendships like that are the biggest reason why I always recommend in-person courses over online studies.

So...was it worth it? Yes, but your mileage may vary. 

In the working world, the satisfaction you get out of something is a reflection of the effort you put in. No one is watching over your shoulder, nor cares if you slack. The rewards are also more abstract - there are no incredible fitness-transformation photos to show everyone after training for 12 weeks.

What I did get out of the course was a revised mental framework with which to examine decisions, better language for working with data scientists, and, as mentioned, new friendships.

How many of us have to make decisions every day based on performance reports, trend analysis reports, or present business cases to internal stakeholders or clients? Would you feel more prepared if you had a framework to critically assess the underlying data in order to make a decision? 

Not to get too technical, but I've stepped back and started considering outcomes based on conditional probabilities - the heart of Bayesian statistics. This change also coincides with a book I was reading recently, Principles, by Ray Dalio, who talks about expected value - weighing risks and rewards against probabilities. Many of us already make decisions taking probabilities and expected value into account, but I think about these factors more explicitly now.
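
For instance, here's a quick sketch of both ideas with made-up numbers - Bayes' rule to update a belief, then expected value to weigh a decision:

    # Bayes' rule: P(success | good pilot) = P(good pilot | success) * P(success) / P(good pilot)
    p_success = 0.30              # prior belief that a project succeeds
    p_pilot_if_success = 0.80     # P(good pilot result | success)
    p_pilot_if_failure = 0.20     # P(good pilot result | failure)

    p_pilot = p_pilot_if_success * p_success + p_pilot_if_failure * (1 - p_success)
    posterior = p_pilot_if_success * p_success / p_pilot
    print(f"P(success | good pilot) = {posterior:.2f}")  # 0.63

    # Expected value: weigh each outcome's payoff by its probability
    ev = posterior * 100_000 + (1 - posterior) * -20_000
    print(f"Expected value of proceeding: ${ev:,.0f}")  # $55,789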

Another benefit is the ability to better relate to data scientists and, ultimately, develop more effective working relationships. I may not know the intricacies of Markov Chain Monte Carlo, but I have at least been exposed to the topic. Actually, one of my metrics for how well I can relate to a functional role - whether sales, marketing, client services, development, or data science - is whether I can get a person to laugh at a joke about their discipline. I suppose I'm always building up a joke supply.

Lastly, friendships naturally develop over the course of the program through familiar faces and group projects. It's fascinating meeting working professionals from different backgrounds who have opted to drag themselves to a stats class, after work, for 3 months during the winter. There's a shared bond, and it's certainly one of the most enduring satisfactions of the course.


Kerry's Recommendations for Data Science Statistics Resources

Good

Not Great


How studying data science lets me design better customer solutions

If data is the new oil, then data science is the new refinery.

I was recently asked whether studying Data Science has helped me in my day-to-day job. My response was yes, but not in an obvious way - it's resulted in better designed customer solutions by improving my empathy.

Let me take a step back. For the past few years, I've been leading Software-as-a-Service (SaaS) platform integrations for enterprise clients. I often describe the work as similar to being a clothing tailor. If a software consultancy is a bespoke tailor that customizes every detail at a premium price, then a SaaS platform is a made-to-measure tailor who cuts from an existing pattern at an economical price. Over time, I've learned how to measure and cut software for customers of all shapes, sizes, and levels of sophistication.

However, the analogy ends there: cloth is something we can touch and see, so we naturally understand its limitations; software architecture, on the other hand, exists in our minds, and most of us aren't able to judge its quality. With fabric, we don't question why it can't be made from a liquid, but I often find myself explaining to customers why our platform can't do what they want because of how data is stored.

So how does studying data science fit into all of this?

A large part of pragmatic data science involves the process of Extracting, Transforming, and Loading (ETL) data. Extract data from a source (e.g. a database, API, or CSV), Transform the data by cleaning it up - such as removing outliers and incomplete records - and Load the new data into your machine learning model for training. Rinse, lather, and repeat for every project.
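
Here's a minimal pandas sketch of that loop (the file and column names are hypothetical):

    import pandas as pd

    # Extract: pull raw records from a source (a CSV here; could be a database or API)
    df = pd.read_csv("orders.csv")  # hypothetical file

    # Transform: drop incomplete records and obvious outliers
    df = df.dropna(subset=["customer_id", "order_total"])
    df = df[df["order_total"].between(0, df["order_total"].quantile(0.99))]

    # Load: write the cleaned data out for the model-training step
    df.to_csv("orders_clean.csv", index=False)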

Let me give some examples. If I have a task that requires bulk automation, a developer will likely prefer a well-formatted CSV from which they can easily extract the information. If I have a task that requires a computer to read thousands of records, a developer will likely prefer an input with standardized punctuation and identifiers (e.g. JSON). If I have a task that requires loading data into a new table or database, a developer will likely prefer working with someone who weighs the risks to the existing data.

All the hands-on ETL practice I've done over the past year has honed my compass for how to work with data - whether that's a better grasp of what's a reasonable request of a developer, or being more articulate with a customer about what's possible with their data. It leads to improved communication, credibility, faster decision-making, and, ultimately, a timely, well-designed solution.

Why soft skills will win in the age of machine learning

Back in college, I had a summer job completing research for a clinical health professor. She was a leading expert in diagnosing and treating open human wounds. My job was to survey other experts, get them to examine photos of open wounds, and then recommend a treatment.

A few months ago, I discovered a smartphone app that replaces this work.* You take a photo of an open wound and upload it to the cloud. I suspect the photo is run through an image recognition model, called a Convolutional Neural Network (CNN), that identifies specific features of the wound for treatment. Current machine learning is very good at completing narrowly defined tasks, such as analyzing a specific type of medical image, because the models have millions of previous examples to train from. It is not good at handling non-standard cases.
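
As a rough illustration of the kind of model I suspect is involved - a toy CNN in PyTorch, not the app's actual architecture:

    import torch
    import torch.nn as nn

    class WoundClassifier(nn.Module):
        """Toy CNN: conv layers learn visual features, a linear layer classifies."""
        def __init__(self, num_classes=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 16 * 16, num_classes)

        def forward(self, x):
            x = self.features(x)        # extract spatial features
            x = x.flatten(start_dim=1)  # flatten for the linear layer
            return self.classifier(x)   # a score for each wound/treatment class

    model = WoundClassifier()
    photo = torch.randn(1, 3, 64, 64)   # stand-in for one 64x64 RGB wound photo
    print(model(photo).shape)           # torch.Size([1, 4])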

Jobs that require experts will increasingly be impacted as cloud storage and machine learning services come down in cost and become more accessible.

For example, we have seen generalists such as nurse practitioners, paralegals, and dental hygienists offer services that used to be available only through doctors, lawyers, and dentists. Machine learning allows these generalists to offer even more of these services. As a result, specialists will be left handling the non-standard cases.

Making specialist services more affordable means that those who were under-served before now have access. The total market size grows and consumers benefit from this outcome. 

Reflecting on my daily work, I can see the hard skills such as reporting and analysis being automated. However, I just can't see the softer skills of project management, architecture design, and relationship building being automated anytime soon. Context for how to operate in a certain industry won't be easily captured either. These soft skills are the ones that will win in the age of machine learning.

*Company is called: https://swiftmedical.com

How Data Scientists Are Controlling Your Life

My daily experience with recommendation systems is seamless. They recommend what to read on Apple News, listen to on Spotify, order on Uber Eats, purchase on Amazon, and watch on Netflix. These software programs take millions of data points, clean and segment the data, weigh different variables, and output recommendations that keep us engaged with the platform for the next selection. As much as we want to believe that machines make all these decisions, data scientists are the ones deciding the inputs for these models. Ultimately, these choices introduce bias.

What if I'm missing out on an incredible book or song because the inputs don't capture interests of mine that I didn't even know existed?

Six years ago, I switched my book purchases to Google Play Books. I loved the convenience of having my book highlights stored online and available for quick future reference. Google's recommendation system has an endless list of books for me to discover, and for many years I happily obliged and purchased its recommendations. What I've noticed recently, however, is that the recommendations just aren't interesting anymore. I had a narrowly defined set of book topics I was looking for, and I've read them all.

Lately, I've been visiting independent bookstores and have discovered many new, interesting books that I had never heard of - books on topics I found interesting but didn't know I wanted. One bookstore had a quote that's stuck with me:

On the internet you can find what you're looking for; in our store you can find what you are not looking for. - benmcnallybooks

When you walk into a store, you're not browsing the store's products; you're browsing the store owner's taste. A store owner has carefully curated their selection based on years of expertise and can filter the top selections from many categories. I may discover an interesting book not because I'm interested in the topic, but because an expert knows a quality selection regardless of topic.

As our lives become increasingly focused around digital platforms and their recommendation systems, we all start consuming similar lifestyles. We may not be discovering parts of ourselves because we didn't know they existed. So get out into the real world, and don't let data scientists control your life.