How Airbnb leverages ML to derive guest interest from unstructured text data and provide personalized recommendations to Hosts
At Airbnb, we endeavor to build a world where anyone can belong anywhere. We strive to understand what our guests care about and match them with Hosts who can provide what they’re looking for. What better source for guest preferences than the guests themselves?
We built a system called the Attribute Prioritization System (APS) to listen to our guests’ needs in a home: What are they requesting in messages to Hosts? What are they commenting on in reviews? What are common requests when calling customer support? And how do these needs differ by the home’s location, property type, and price, as well as by guests’ travel needs?
With this personalized understanding of which home amenities, facilities, and location features (i.e. “home attributes”) matter most to our guests, we advise Hosts on which home attributes to acquire, merchandize, and verify. We can also display to guests the home attributes that are most relevant to their destination and needs.
We do this through a scalable, platformized, and data-driven engineering system. This blog post describes the science and engineering behind the system.
What do guests care about?
First, to determine what matters most to our guests in a home, we look at what guests request, comment on, and contact customer support about the most. Are they asking a Host whether the home has wifi, free parking, a private hot tub, or access to the beach?
To parse this unstructured data at scale, Airbnb built LATEX (Listing ATtribute EXtraction), a machine learning system that extracts home attributes from unstructured text data such as guest messages and reviews, customer support tickets, and listing descriptions. LATEX accomplishes this in two steps:
- A named entity recognition (NER) module extracts key phrases from unstructured text data
- An entity mapping module then maps these key phrases to home attributes
The named entity recognition (NER) module uses textCNN (convolutional neural network for text) and is trained and fine-tuned on human-labeled text data from various data sources within Airbnb. In the training dataset, we label each phrase that falls into one of the following five categories: Amenity, Activity, Event, Specific POI (e.g. “Lake Tahoe”), or Generic POI (e.g. “post office”).
The entity mapping module uses an unsupervised learning approach to map these phrases to home attributes. To achieve this, we compute the cosine distance between the candidate phrase and the attribute label in the fine-tuned word embedding space. We consider the closest mapping to be the referenced attribute, and we can calculate a confidence score for the mapping.
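The nearest-attribute lookup can be sketched as follows. This is a minimal illustration, not Airbnb’s implementation: the three-dimensional vectors, the attribute names, and the helper `map_phrase_to_attribute` are hypothetical, and using the cosine similarity itself as the confidence score is an assumption, since the post does not say how confidence is computed.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def map_phrase_to_attribute(phrase_vec, attribute_vecs):
    """Return (attribute, similarity) for the closest attribute label.

    Highest cosine similarity is equivalent to lowest cosine distance.
    """
    best = max(
        attribute_vecs,
        key=lambda name: cosine_similarity(phrase_vec, attribute_vecs[name]),
    )
    return best, cosine_similarity(phrase_vec, attribute_vecs[best])

# Hypothetical toy embeddings; real embeddings would come from the
# fine-tuned word embedding space and have many more dimensions.
ATTRIBUTE_VECS = {
    "hot tub": [0.9, 0.1, 0.0],
    "wifi": [0.0, 0.2, 0.9],
    "free parking": [0.1, 0.9, 0.1],
}

phrase_vec = [0.8, 0.2, 0.1]  # stand-in for an extracted phrase such as "jacuzzi"
attribute, confidence = map_phrase_to_attribute(phrase_vec, ATTRIBUTE_VECS)
print(attribute, round(confidence, 3))
```

In practice a minimum-similarity threshold would likely be needed so that phrases far from every known attribute are flagged as new entities rather than force-mapped.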
We then calculate how frequently an entity is referenced in each text source (i.e. messages, reviews, customer service tickets), and aggregate the normalized frequency across text sources. Home attributes with many mentions are considered more important.
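One way to picture the aggregation step is the sketch below. The mention lists are made up, and weighting every text source equally is an assumption; the post does not state how sources are weighted or normalized.

```python
from collections import Counter

def normalized_frequencies(mentions):
    """Share of a source's mentions that refer to each attribute."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()}

def aggregate_importance(mentions_by_source):
    """Average the normalized frequencies across sources (equal weights assumed)."""
    scores = Counter()
    n_sources = len(mentions_by_source)
    for mentions in mentions_by_source.values():
        for attr, freq in normalized_frequencies(mentions).items():
            scores[attr] += freq / n_sources
    return dict(scores)

# Hypothetical attribute mentions already extracted from each text source.
mentions_by_source = {
    "messages": ["wifi", "wifi", "parking", "hot tub"],
    "reviews": ["hot tub", "hot tub", "wifi", "patio"],
    "support_tickets": ["wifi", "pool"],
}

importance = aggregate_importance(mentions_by_source)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

Normalizing within each source before combining keeps a high-volume source, such as messages, from drowning out a low-volume one, such as support tickets.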
With this system, we’re able to gain insight into what guests are interested in, even highlighting new entities that we may not yet support. The scalable engineering system also allows us to improve the model by onboarding additional data sources and languages.
What do guests care about for different types of homes?
What guests look for in a mountain cabin is different from what they look for in an urban apartment. Gaining a more complete understanding of guests’ needs in an Airbnb home enables us to provide more personalized guidance to Hosts.
To achieve this, we calculate a unique ranking of attributes for each home. Based on the characteristics of a home (location, property type, capacity, luxury level, and so on), we predict how frequently each attribute will be mentioned in messages, reviews, and customer service tickets. We then use these predicted frequencies to calculate a customized importance score that is used to rank all possible attributes of a home.
For example, consider a mountain cabin that can host six people with an average daily price of $50. To determine what is most important to potential guests, we learn from what is most talked about for other homes that share these same characteristics. The result: hot tub, fire pit, lake view, mountain view, grill, and kayak. In contrast, what matters for an urban apartment is parking, restaurants, grocery stores, and subway stations.
We could directly aggregate the frequency of keyword usage among similar homes. But this approach would run into issues at scale: the number of home segments grows exponentially with the number of dimensions, leaving sparse data in highly specific segments. Instead, we built an inference model that uses the raw keyword frequency data to infer the expected frequency for a segment. This inference approach stays scalable as we use more, and finer, dimensions to characterize our homes. It allows us to help our Hosts best highlight their unique and diverse collection of homes.
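The post does not describe the inference model itself, so the sketch below swaps in a deliberately simple stand-in: an additive model that learns one effect per feature value instead of one estimate per segment. The observations, feature names, and functions are hypothetical. It only illustrates why a model over features can produce an estimate for a segment with no data, which direct aggregation cannot.

```python
from collections import defaultdict

def fit_feature_means(observations):
    """observations: list of (features dict, observed mention frequency)."""
    global_mean = sum(freq for _, freq in observations) / len(observations)
    sums, counts = defaultdict(float), defaultdict(int)
    for features, freq in observations:
        for key_value in features.items():
            sums[key_value] += freq
            counts[key_value] += 1
    feature_means = {kv: sums[kv] / counts[kv] for kv in sums}
    return global_mean, feature_means

def predict_frequency(features, global_mean, feature_means):
    """Global mean plus each known feature value's deviation from it, floored at 0."""
    prediction = global_mean
    for key_value in features.items():
        if key_value in feature_means:
            prediction += feature_means[key_value] - global_mean
    return max(prediction, 0.0)

# Hypothetical "hot tub" mention frequencies for four observed segments.
observations = [
    ({"location": "mountain", "type": "cabin"}, 0.30),
    ({"location": "mountain", "type": "apartment"}, 0.20),
    ({"location": "urban", "type": "apartment"}, 0.02),
    ({"location": "urban", "type": "house"}, 0.08),
]
global_mean, feature_means = fit_feature_means(observations)

# "urban cabin" never appears in the data, yet the model still yields an estimate.
unseen_segment = {"location": "urban", "type": "cabin"}
predicted = predict_frequency(unseen_segment, global_mean, feature_means)
print(round(predicted, 3))
```

The number of parameters here grows with the number of feature values rather than with the number of feature combinations, which is the scaling property the paragraph above describes.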
How can guests’ preferences help Hosts improve?
Now that we have a granular understanding of what guests want, we can help Hosts showcase what guests are looking for by:
- Recommending that Hosts acquire an amenity guests often request (e.g. a coffee maker)
- Merchandizing an existing home attribute that guests tend to comment favorably on in reviews (e.g. a patio)
- Clarifying popular facilities that may end up in requests to customer support (e.g. the privacy of a pool and the ability to access it)
But to make these recommendations relevant, it’s not enough to know what guests want. We also need to be sure about what’s already in the home. This turns out to be trickier than simply asking the Host, because of the 800+ home attributes we collect. Most Hosts aren’t able to immediately and accurately add all of the attributes their home has, especially since amenities like a crib mean different things to different people. To fill in some of the gaps, we leverage guest feedback on amenities and facilities they have seen or used. In addition, some home attributes are available from trustworthy third parties, such as real estate or geolocation databases that can provide square footage, bedroom count, or whether the home overlooks a lake or beach. We’re able to build a truly complete picture of a home by leveraging data from our Hosts, guests, and trustworthy third parties.
We utilize several different models, including a Bayesian inference model that increases in confidence as more guests confirm that the home has an attribute. We also leverage WiDeText, a supervised neural network model that uses features about the home to predict the likelihood that the next guest will confirm the attribute’s existence.
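The idea of confidence growing with guest confirmations can be illustrated with a Beta-Bernoulli update. This is an assumed formulation chosen for simplicity; the post does not specify the prior, the likelihood, or how guest responses are weighted. The class `AttributeBelief` and the uniform Beta(1, 1) prior are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AttributeBelief:
    """Beta(alpha, beta) belief that a home has a given attribute."""
    alpha: float = 1.0  # prior pseudo-count of confirmations (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of denials

    def update(self, confirmed: bool) -> None:
        """Fold in one guest response."""
        if confirmed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        """Posterior probability that the attribute exists."""
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        """Posterior variance; it shrinks as more responses arrive."""
        total = self.alpha + self.beta
        return self.alpha * self.beta / (total ** 2 * (total + 1))

belief = AttributeBelief()
prior_variance = belief.variance

# Hypothetical guest responses to "Did this home have a hot tub?"
for confirmed in [True, True, True, False, True]:
    belief.update(confirmed)

print(round(belief.mean, 3), round(belief.variance, 4))
```

The posterior mean moves toward the observed confirmation rate while the variance falls, which is one concrete sense in which such a model “increases in confidence” with more responses.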
By combining our estimate of how important each home attribute is for a home with the likelihood that the attribute already exists or needs clarification, we’re able to give Hosts personalized and relevant recommendations on what to acquire, merchandize, and clarify when promoting their home on Airbnb.
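As a rough illustration of how the two signals might be combined, the sketch below applies a simple threshold rule. The thresholds, the example scores, and the `recommend` function are all hypothetical; the post does not describe the actual decision logic.

```python
def recommend(importance, p_exists, min_importance=0.1, low=0.2, high=0.8):
    """Pick a Host recommendation for one attribute.

    importance: predicted importance of the attribute for this home.
    p_exists:   estimated probability that the home already has it.
    All thresholds are illustrative assumptions.
    """
    if importance < min_importance:
        return None            # not important enough to surface
    if p_exists >= high:
        return "merchandize"   # likely present: highlight it in the listing
    if p_exists <= low:
        return "acquire"       # likely absent: suggest adding it
    return "clarify"           # uncertain: ask the Host to confirm details

# Hypothetical (importance, p_exists) pairs for a single home.
attributes = {
    "hot tub": (0.30, 0.95),
    "coffee maker": (0.15, 0.05),
    "pool": (0.25, 0.50),
    "fax machine": (0.01, 0.10),
}
recommendations = {
    name: recommend(importance, p_exists)
    for name, (importance, p_exists) in attributes.items()
}
print(recommendations)
```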
This is the first time we’ve known what attributes our guests want down to the home level. What’s important varies greatly based on home location and trip type.
This full-stack prioritization system has allowed us to give more relevant and personalized advice to Hosts, to merchandize what guests are looking for, and to accurately represent popular and contentious attributes. When Hosts accurately describe their homes and highlight what guests care about, guests can find their perfect vacation home more easily.
We’re currently experimenting with highlighting the amenities that are most important for each type of home (e.g. kayak for a mountain cabin, parking for an urban apartment) on the home’s product description page. We believe we can leverage the knowledge gained to improve search and to determine which home attributes are most important for different categories of homes.
On the Host side, we’re expanding this prioritization methodology to encompass additional tips and insights into how Hosts can make their listings even more desirable. This includes actions like freeing up popular nights, offering discounts, and adjusting settings. By leveraging unstructured text data to help guests connect with their perfect Host and home, we hope to foster a world where anyone can belong anywhere.
If this type of work interests you, check out some of our related positions at Careers at Airbnb!
It takes a village to build such a robust full-stack platform. Special thanks to (alphabetical by last name) Usman Abbasi, Dean Chen, Guillaume Guy, Noah Hendrix, Hongwei Li, Xiao Li, Sara Liu, Qianru Ma, Dan Nguyen, Martin Nguyen, Brennan Polley, Federico Ponte, Jose Rodriguez, Peng Wang, Rongru Yan, Meng Yu, and Lu Zhang for their contributions, dedication, expertise, and thoughtfulness!