Curated Resource

Mapping Medium’s Tags - Medium Engineering

my notes

there are big issues with tags that limit their usefulness... tags are scattered... over 1 million unique tags. Many ... duplicates ... or so close that they have the same audience...
represent each tag by a vector of numbers in a multi-dimensional vector space...find the meaning of these thousands of tags in a way that can be represented by vectors ... I pretended each post’s tag list was a “sentence” ... fed into a training algorithm which usually takes real sentences... it figures out a tag’s vector values by looking at the tags that are used along with it ... these kinds of vectors are also known as “embeddings”...
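
To make the "tag list as sentence" idea concrete, here is a minimal sketch. It is not the training algorithm from the post: it swaps in plain co-occurrence counts as a much simpler stand-in for a learned embedding, and the `posts` data and the `cooccurrence_vectors` helper are hypothetical. The point it illustrates is the same one as in the notes above: a tag's vector is determined by the tags used along with it.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical tag lists, one per post; each list plays the role of a "sentence".
posts = [
    ["tech", "education", "edtech"],
    ["edtech", "education", "online-learning"],
    ["tech", "startup", "edtech"],
    ["agriculture", "farming", "agtech"],
    ["agtech", "tech", "farming"],
]


def cooccurrence_vectors(tag_lists):
    """Map each tag to a sparse vector: counts of the tags used alongside it."""
    vectors = defaultdict(Counter)
    for tags in tag_lists:
        # Every ordered pair of distinct tags on one post is one co-occurrence.
        for tag, neighbour in permutations(set(tags), 2):
            vectors[tag][neighbour] += 1
    return dict(vectors)


vectors = cooccurrence_vectors(posts)

# The sparse "vector" for one tag: which tags it appears with, and how often.
print(dict(vectors["edtech"]))
```

A learned embedding compresses this kind of co-occurrence information into a few hundred dense dimensions instead of one dimension per tag, which is what makes the distance and arithmetic tricks in these notes practical.
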
the dimensions work in concert to represent information about the tags... we can find tag vectors that are close to each other... We can also do arithmetic on our vectors... eg average the “Tech” vector with “Education” to land in the vicinity of EdTech tags... We can also solve analogies... essentially "Education" is to "EdTech" as "Agriculture" is to "_"... We can also plot the tag vectors... interpreting them as points in space... a myriad of ways to do this “dimensionality reduction”...
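
The closeness, averaging, and analogy operations can be sketched in a few lines. Everything here is an assumption made for illustration: the 3-D `tag_vectors` are hand-picked toy values (real embeddings are learned and have many more dimensions), cosine similarity is one common choice of closeness measure, and the helper names are hypothetical.

```python
import math

# Hypothetical toy vectors; axes loosely mean (tech, education, agriculture).
tag_vectors = {
    "Tech": (1.0, 0.0, 0.0),
    "Education": (0.0, 1.0, 0.0),
    "Agriculture": (0.0, 0.0, 1.0),
    "EdTech": (1.0, 1.0, 0.0),
    "AgTech": (1.0, 0.0, 1.0),
    "Teaching": (0.1, 1.0, 0.0),
    "Farming": (0.0, 0.1, 1.0),
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def nearest(vector, exclude=()):
    """Return the tag whose vector is most similar to `vector`."""
    candidates = [tag for tag in tag_vectors if tag not in exclude]
    return max(candidates, key=lambda tag: cosine(vector, tag_vectors[tag]))


def average(a, b):
    """Element-wise mean of two vectors."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by computing b - a + c."""
    va, vb, vc = tag_vectors[a], tag_vectors[b], tag_vectors[c]
    target = tuple(y - x + z for x, y, z in zip(va, vb, vc))
    return nearest(target, exclude=(a, b, c))


closest_to_education = nearest(tag_vectors["Education"], exclude=("Education",))
blend = nearest(
    average(tag_vectors["Tech"], tag_vectors["Education"]),
    exclude=("Tech", "Education"),
)
answer = analogy("Education", "EdTech", "Agriculture")
print(closest_to_education, blend, answer)
```

With these toy values, the tag closest to "Education" is "Teaching", averaging "Tech" with "Education" lands on "EdTech", and the analogy resolves to "AgTech". Excluding the input tags from the search matters, since otherwise an input is often its own nearest neighbour.
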
identify duplicate tags and normalize them... solve other prediction problems with machine learning... these “dense” expressive vectors ... should improve the algorithms’ ability to predict...
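
One way duplicate detection on top of tag vectors might look is sketched below. It is not a method described in the post: the 2-D vectors, the usage counts, the 0.99 similarity threshold, and the rule "the most-used tag in a group becomes the canonical one" are all assumptions chosen for illustration.

```python
import math

# Hypothetical tag vectors and usage counts.
tag_vectors = {
    "Machine Learning": (0.90, 0.10),
    "machine-learning": (0.88, 0.12),
    "ML": (0.91, 0.09),
    "Cooking": (0.10, 0.90),
    "Recipes": (0.40, 0.80),
}
usage = {
    "Machine Learning": 5000,
    "machine-learning": 300,
    "ML": 1200,
    "Cooking": 800,
    "Recipes": 650,
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def duplicate_groups(vectors, threshold=0.99):
    """Group tags whose vectors are nearly parallel, using union-find."""
    parent = {tag: tag for tag in vectors}

    def find(tag):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    tags = list(vectors)
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for tag in tags:
        groups.setdefault(find(tag), []).append(tag)
    return list(groups.values())


def canonical_map(groups, counts):
    """Map every tag to the most-used tag in its group."""
    mapping = {}
    for group in groups:
        canonical = max(group, key=counts.get)
        for tag in group:
            mapping[tag] = canonical
    return mapping


normalized = canonical_map(duplicate_groups(tag_vectors), usage)
print(normalized)
```

In this toy data the three machine-learning variants collapse onto "Machine Learning", while "Cooking" and "Recipes" stay separate because their similarity falls below the threshold. The pairwise comparison is quadratic in the number of tags, so at the scale mentioned in the notes (over 1 million unique tags) an approximate nearest-neighbour index would likely be needed instead.
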
Here are some things we’d need to consider... “word-sense disambiguation”... De-biasing embeddings... Associating tags across languages... train a cross-lingual model that embeds the tags from both languages in a shared vector space, so they could be directly compared...

The above notes were curated from the full post

More Stuff tagged nlp, machine learning, semantic, tag, word2vec
