How YouTube Recommends Videos

Introduction

Back in 2008, YouTube had passed Yahoo! to become the second largest search engine in the world, behind only Google. Today, we can ask a related question: “Is YouTube about to pass Amazon as the largest scaled and most sophisticated industrial recommendation system in existence?” This question isn’t rhetorical – because we don’t know the answer as YouTube fiercely competes with the Amazon recommendation system.

YouTube suggested videos are a force multiplier for YouTube’s search algorithm that we would need to understand.

Earlier YouTube Recommendation Process

To maximize your presence in YouTube search and suggested videos, you need to make sure your metadata is well-optimized. This includes your video’s title, description, and tags. Most SEOs focus on the search results – because that’s what matters in Google.

How to create metadata tags in YouTube?

We need to look at the relevant top-ranking video and then use as many of the tags as we could that were also relevant for our video.

Recent YouTube Recommendation Behaviour

The scenario with the YouTube Recommendation approach is changed now. To get repeated viewers, the video must be recognized by the YouTube Recommendation Process. But, most YouTube marketers know that appearing in suggested videos can generate almost as many views as appearing in YouTube’s search results.

Why? Because viewers tend to watch multiple videos during sessions that last about 40 minutes, on average. So, a viewer might conduct one search, watch a video, and then go on to watch a suggested video. In other words, you might get two or more videos viewed for each search that’s conducted on YouTube. That’s what makes suggested videos a force multiplier for YouTube’s search algorithm.

How does YouTube Recommend Videos – Lighter Approach

There is a video in YouTube on the YouTube Creators channel entitled “How YouTube’s Suggested Videos Work”.

As the video’s 300-word description explains:

“Suggested Videos are a personalized collection of videos that an individual viewer may be interested in watching next, based on prior activity.”

“Studies of YouTube consumption have shown that viewers tend to watch a lot more when they get recommendations from a variety of channels and suggested videos do just that. Suggested Videos are ranked to maximize engagement for the viewer.”

So, optimizing your metadata still helps, but you also need to create a compelling opening to your videos, maintain and build interest throughout the video, as well as engage your audience by encouraging comments and interacting with your viewers as part of your content.

How YouTube Recommends Videos – Recommender Systems

Recommender Systems are among the most common forms of Machine Learning that users will encounter, whether they’re aware of it or not. It powers curated timelines on Facebook and Twitter, and “suggested videos” on YouTube.

Previously formulated as a matrix factorization problem that attempts to predict a movie’s ratings for a particular user, many are now approaching this problem using Deep Learning; the intuition is that non-linear combinations of features may yield a better prediction than a traditional matrix factorization approach can.

In 2016, Covington, Adams, and Sargin demonstrated the benefits of this approach with “Deep Neural Networks for YouTube Recommendations”, making Google one of the first companies to deploy production-level deep neural networks for recommender systems.

Given that YouTube is the second most visited website in the United States, with over 400 hours of content uploaded per minute, recommending fresh content poses no straightforward task. In their research paper, Covington et al. demonstrate a two-stage information retrieval approach, where one network generates recommendations, and a second network ranks these generated recommendations. This approach is quite thoughtful; since recommending videos can be posed as an extreme multiclass classification problem, having one network to reduce the cardinality of the task from a few million data points into a few hundred data points permits the ranking network to take advantage of more sophisticated features which may have been too minute for the candidate generation model to learn.

Background

There were two main factors behind YouTube’s Deep Learning approach towards Recommender Systems:

  • Scale: Due to the immense sparsity of these matrices, it’s difficult for previous matrix factorization approaches to scale amongst the entire feature space. Additionally, previous matrix factorization approaches have a difficult time handling a combination of categorical and continuous variables.
  • Consistency: Many other product-based teams at Google have switched to deep learning as a general framework for learning problems. Since Google Brain has released TensorFlow, it is sufficiently easy to train, test, and deploy deep neural networks in a distributed fashion.

Network Structure

There are two networks at play:

  • The candidate generation network takes the user’s activity history ****(eg. IDs of videos being watched, search history, and user-level demographics) and outputs a few hundred videos that might broadly apply to the user. The general idea is that this network should optimize for precision; each instance should be highly relevant, even if it requires forgoing some items which may be widely popular but irrelevant.
  • In contrast, the ranking network takes a richer set of features for each video, and score each item from the candidate generation network. For this network, it’s important to have a high recall; it’s okay for some recommendations to not be very relevant as long as you’re not missing the most relevant items*.***

On the whole, this network is trained end-to-end; the training and test set consists of hold-out data. In other words, the network is given a user’s time history until some time t, and the network is asked what they would like to watch at time t+1! The authors believe this was among the best ways to recommend videos provided the episodic nature of videos on YouTube.

Performance Hacks

In both the candidate generation and candidate ranking networks, the authors leverage various tricks to help reduce dimensionality or performance from the model. We discuss these here, as they’re relevant to both models.

First, they trained a subnetwork to transform sparse features (such as video IDs, search tokens, and user IDs) into dense features by learning an embedding for these features. This embedding is learned jointly with the rest of the model parameters via gradient descent.

Secondly, to aid against the exploitation/exploration problem, they feed the age of the training example as a feature. This helps overcome the implicit bias in models which tend to recommend stale content, as a result of the average watch likelihood during training time. At serving time, they simply set the age of the example to be zero to compensate for this factor.

Ranking the Predictions

The fundamental idea behind partitioning the recommender system into two networks is that this provides the ability for the ranking network to examine each video with a finer tooth comb than the candidate generation model was able to.

For example, the candidate generation model may only have access to features such as video embedding, and the number of watches. In contrast, the ranking network can take features such as the thumbnail image and the interest of their peers to provide a much more accurate scoring.

The objective of the ranking network is to maximize the expected watch time for any given recommendation. Covington et al. decided to attempt to maximize watch time over the probability of a click, due to the common “clickbait” titles in videos.

Similar to the candidate generation network, the authors use embedding spaces to map sparse categorical features into dense representations. Any features which relate to multiple items (i.e. searches over multiple video IDs, etc) are averaged before being fed into the network. However, categorical features which depend upon the same underlying feature (i.e. video IDs of the impression, last video ID watched, etc) are shared between these categories to preserve memory and runtime requirements.

As far as continuous features go, they’re normalized in two ways.

  • First, it follows the standard normalization between [0, 1), using a cumulative uniform distribution.
  • Secondly, in addition to the standard normalization x, the form sqrt(x) and  are also fed. This permits the model to create super and sub-linear functions of each feature, which is crucial to improving offline accuracy.

To predict expected watch time, the authors used logistic regression. Clicked impressions were weighed with the observed watch time, whereas negative examples all received unit weight. In practice, this is a modeled probability ET, where E[T] models the expected watch time of the impression, and P models the probability of clicking the video.

Finally, the authors demonstrated the impact of a wider and deeper network on per-user loss. The per-user loss was the total amount of mispredicted watch time, against the total watch time on held-out data. This permits the model to predict something that is a proxy to a good recommendation; rather than predicting a good recommendation itself.

Conclusion

“Deep Neural Networks for YouTube Recommendations” was one of the first papers to highlight the advancements that Deep Learning may provide for Recommender Systems, and appeared in ACM’s 2016 Conference on Recommender Systems. It laid the foundation for many papers afterward. So, it has been a fantastic journey for the YouTube in the past decade to improve the recommendation process which in turn helps to keep the viewers intact. There are statistics that YouTube app in mobiles has replaced watching television to a great extent around the world. Not at all a simple task, we must sincerely appreciate the people behind it to happen.

“We will soon trade in our clunky flat screens for its handheld cousin, the smartphone and its YouTube app.”

Why Edge Computing is gaining popularity

Edge Computing

Edge Computing brings computation and data storage closer to the devices where it’s being gathered, rather than relying on a central location that can be thousands of miles away. This is done so that data, especially real-time data, does not suffer latency issues that can affect an application’s performance. In addition, companies can save money by having the processing done locally, reducing the amount of data that needs to be processed in a centralized or cloud-based location.

Gartner defines edge computing as “a part of a distributed computing topology in which information processing is located close to the edge – where things and people produce or consume that information.”

Ubiquitous Computing

Ubiquitous computing is a concept in software engineering and computer science where computing is made to appear anytime and everywhere. In contrast to desktop computing, ubiquitous computing can occur using any device, in any location, and across any format.

And we are probably seeing this in our own everyday life. For example, at home, we might be using an Alexa device from Amazon or we might be using Google home. We might even have an intelligent fridge or a car we can talk to.

As companies increasingly leverage ubiquitous computing to support multiple types of applications and systems, a massive amount of data is generated for decision making. However, sending all the data to the cloud can result in latency. Edge computing can drive sub-second responses by moving both computing and data closer to the user. This will reduce latency, minimize data threats, and boost bandwidth. Here are some interesting use cases across industries:

Evolution of Computing

To understand Edge Computing, we need to travel back a few decades and see how Computing has evolved in the past 50 years. The below picture provides a quick recap of the evolution of Computing.

How Edge Computing works

Edge computing was developed due to the exponential growth of IoT devices, which connect to the internet for either receiving information from the cloud or delivering data back to the cloud. And many IoT devices generate enormous amounts of data during the course of their operations.

Think about devices that monitor manufacturing equipment on a factory floor or an internet-connected video camera that sends live footage from a remote office. While a single device producing data can transmit it across a network quite easily, problems arise when the number of devices transmitting data at the same time grows. Instead of one video camera transmits live footage, multiply that by hundreds or thousands of devices. Not only will quality suffer due to latency, but the costs in bandwidth can be tremendous.

Edge-computing hardware and services help solve this problem by being a local source of processing and storage for many of these systems. An edge gateway, for example, can process data from an edge device and then send only the relevant data back through the cloud, reducing bandwidth needs. Or it can send data back to the edge device in the case of real-time application needs.

These edge devices can include many different things, such as an IoT sensor, an employee’s notebook computer, their latest smartphone, the security camera, or even the internet-connected microwave oven in the office break room. Edge gateways themselves are considered edge devices within an edge-computing infrastructure.

Why does Edge Computing matter

For many companies, the cost savings alone can be a driver towards deploying an edge-computing architecture. Companies that embraced the cloud for many of their applications may have discovered that the costs in bandwidth were higher than they expected.

Increasingly, though, the biggest benefit of edge computing is the ability to process and store data faster, enabling more efficient real-time applications that are critical to companies. Before edge computing, a smartphone scanning a person’s face for facial recognition would need to run the facial recognition algorithm through a cloud-based service, which would take a lot of time to process. With an edge computing model, the algorithm could run locally on an edge server or gateway, or even on the smartphone itself, given the increasing power of smartphones. Applications such as virtual and augmented reality, self-driving cars, smart cities, even building-automation systems require fast processing and response.

Computing as close as possible to the point of use has always been important for applications requiring low-latency data transmission, very high bandwidth, or powerful local processing capabilities — particularly for machine learning (ML) and other analytics.

Here are some interesting use cases across industries:

Use Case (a) Autonomous vehicles

One of the leading current uses is for autonomous vehicles, which need data from the cloud. If access to the cloud is denied or slowed, they must continue to perform; there is no room for latency. The amount of data produced by all sensors on a vehicle is prodigious and must not only be processed locally, but anything sent up to the cloud must be compressed and transmitted on an as-needed basis to avoid overwhelming available bandwidth and taking precious time. IoT applications in general are important drivers of edge computing because they share a similar profile.

Use Case (b) In-hospital patient monitoring

Healthcare contains several edge opportunities. Currently, monitoring devices (e.g. glucose monitors, health tools, and other sensors) are either not connected, or where they are, large amounts of unprocessed data from devices would need to be stored on a 3rd party cloud. This presents security concerns for healthcare providers.

An edge on the hospital site could process data locally to maintain data privacy. Edge also enables right-time notifications to practitioners of unusual patient trends or behaviours (through analytics/AI), and the creation of 360-degree view patient dashboards for full visibility.

Use Case (c) Remote monitoring of assets in the oil and gas industry

Oil and gas failures can be disastrous. Their assets, therefore need to be carefully monitored.

However, oil and gas plants are often in remote locations. Edge computing enables real-time analytics with processing much closer to the asset, meaning there is less reliance on good quality connectivity to a centralized cloud.

Privacy and Security

However, as is the case with many new technologies, solving one problem can create others. From a security standpoint, data at the edge can be troublesome, especially when it’s being handled by different devices that might not be as secure as a centralized or cloud-based system. As the number of IoT devices grows, it’s imperative that IT understand the potential security issues around these devices, and make sure those systems can be secured. This includes making sure that data is encrypted, and that the correct access-control methods are implemented.

What about 5G

Around the world, carriers are deploying 5G wireless technologies, which promise the benefits of high bandwidth and low latency for applications, enabling companies to go from a garden hose to a firehose with their data bandwidth. Instead of just offering faster speeds and telling companies to continue processing data in the cloud, many carriers are working edge-computing strategies into their 5G deployments to offer faster real-time processing, especially for mobile devices, connected cars, and self-driving cars.

The Future of Edge Computing

Shifting data processing to the edge of the network can help companies take advantage of the growing number of IoT edge devices, improve network speeds, and enhance customer experiences. The scalable nature of edge computing also makes it an ideal solution for fast-growing, agile companies, especially if they are already making use of colocation data centers and cloud infrastructure.

By harnessing the power of edge computing, companies can optimize their networks to provide flexible and reliable service that bolsters their brand and keeps customers happy.

Edge computing offers several advantages over traditional forms of network architecture and will surely continue to play an important role for companies going forward. With more and more internet-connected devices hitting the market, innovative organizations have likely only scratched the surface of what’s possible with edge computing.

Big Data with AI for Business is just compelling

What is Big Data

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analysed for insights that lead to better decisions and strategic business moves.

Big Data refers to our ability to make sense of the vast amount of data that we generate every single second. In recent times, our world has become increasingly digitized, we produce more data than ever before. The amount of data in the world are simply exploding at the moment.

With the internet, more powerful computing and cheaper data storage helped to use data much better than ever before. Big Data means companies like Google can personalize our search results, Netflix and Amazon can understand our choices as a customer and recommend the right things for us. And we can use Big Data to even analyse the entire social media traffic around the world to spot trends.

Benefits of Big Data with AI

By bringing together big data and AI technology, companies can improve business performance and efficiency by:

  • Anticipating and capitalizing on emerging industry and market trends
  • Analyzing consumer behavior and automating customer segmentation
  • Personalizing and optimizing the performance of digital marketing campaigns
  • Using intelligent decision support systems fuelled by big data, AI, and predictive analytics

Realtime Examples of AI and Big Data in Business

Here are the examples of companies that use AI with Big Data and seen enormous success in their fields.

Case Study (a): Netflix – Big Data and AI

Netflix uses AI and Big Data extensively and achieved great success as an organization. It has over 200 million subscribers around the world.

  • Generate Content: AI with big data helps Netflix in understanding consumers more and more granular level, thereby it helps Netflix to generate ‘content’ that matches the consumers taste to a large extent. Other competitors have a 40% success rate, whereas Netflix enjoys an 80% success rate.
  • Recommend Programmes: Netflix uses AI to recommend new movies and television programmes to consumers. 80% of what the consumers watch is driven by their AI recommendations. Netflix fine-tunes their algorithms in understanding the consumers and provides recommendations to the consumers about their programmes and movies.
  • Auto Generate Thumbnails: Netflix uses AI to auto-generate thumbnails. Consumers spend limited time choosing the films on seeing just the thumbnails for few seconds to minutes. Netflix understood the importance of thumbnails for consumers choosing their favourite programmes. Using Artificial Intelligence, thumbnails are generated dynamically based on the consumers’ interests.
  • Vary Streaming Speed: Netflix uses AI for Predicting the internet based on the consumers’ internet speed. AI algorithms help to scale up or scale down the streaming of movies based on the consumers’ real-time internet bandwidth.
  • Assist Pre-production: Netflix uses AI in pre-production activities. It helps to find location spots to shoot a movie (based on actors availability, actors location, etc)
  • Assist Post-production: Netflix uses AI widely in post-production activities as well. Although editing is manual, quality checks are driven by AI to avoid mistakes in post-production. There were several mistakes that happened due to negligence or lack of time, resources during post-production activities. But with the usage of AI algorithms, Netflix could eradicate these problems to a great extent.

Case Study (b): Disney (Theme Park and Cinemas) – Big Data and AI

Disney uses Big Data and AI to give customers a more magical experience. Disney has always been a tech innovator in both Theme Park and in Cinemas to give the customer a wonderful experience.

  • Magic band: Disney offers magic band to its customers while they enter the theme park. Its kind of fitness watch which helps to open hotel room, allows the customers to pay. It has a GPS tracker in the band, which keeps tracking the customers where there are walking within Disney Theme Park. It is to ensure, where they are going within the park, which rides they are spending time, how much time they spend in restaurants.
  • Better Operational Management: It helps to schedule the workers to manage over crowding at one ride or at a single restaurant with in the park.
  • Better Customer Experience: Better management of crowd, giving proper assistance within the park gives the customer a better experience. They might direct the customers to other rides, other restaurants to avoid delay in one place.
  • Realtime Sentiment Analysis: Disney research team started using AI to understand real-time reactions when people watch in the live show or in the cinema. How they do is they are using ‘Machine Vision’ – AI coupled with a Camera, a night vision Camera looking at the audience. They do Sentiment analysis with the people in the show. Cameras will interpret the facial expressions by looking at how the people are responding to the shows or movies to see if they are sad, scared, having fun, etc. This would in turn help Disney to generate quality content based on the customers for their shows and movies.

Case Study (c): Big Data and AI with Motor Insurance

Motor Insurance providers have started using AI with Big Data to provide a dynamic flexible insurance plan that will suit different customers based on their driving skills, ability and composure at different times.

  • Motor Insurance companies generally determine the premium based on the age of the vehicle. The insurance providers then started to understand the Customer based on how they drive by considering the age factor. This gave the perception a person aged 18 would drive rashly on comparing with a person aged 55 who will show maturity in driving.
  • Tracking Card: Motor Insurance providers started providing a tracking card to insert in the vehicle, which helps them to track and understand about the driving ability of the customer. This helped the provider to understand the customer better.
  • Mobile App: Now replacing the card with the mobile connected with GPS, it just needs the providers to install a mobile app within the customers mobile. This helps the providers to collect information about the customer driving. With the implementation of AI with Big Data, the providers can study the customer to a granular level. It helps the provider to understand how the customer is driving in a highway, during a rainy day, or on a hilly mountain road. Also, the question comes, they are people aged 18 who can drive better than the people with higher age. With the AI algorithms, over a period of time, the providers can understand each individual, how he is driving in the morning or in the late night, during a rainy day or during peak hours. Hence the data with the granular detail of the customer helps the Insurance providers to provide flexibility based on their driving skills not just merely on the age of the vehicle or the age of the customer.

Conclusion

It’s no hype that AI with big data are another set of high five technologies just to boast with for the IT giants. It has been used widely in several sectors and industries starting from big organizations to small business. The implementation of AI with Big Data in every industry has proved a great success and has helped the company business to a great extent. As said in the beginning, the world is exploding with data at the moment. Big Data with AI is really making sense of the huge data with the internet, more powerful computing and cheaper data storage.