In this blog, we will cover a brief history of emoji, share some recent statistics, delve into current challenges that pertain to eDiscovery, and close by citing developments in deciphering emoji with a few examples as to how various factors can affect the meaning of this evolving, relatively new digital language.
Emoji Statistics
2021 saw the release of 217 new emoji, and while the total for 2022 decreased to just over 100, an additional 151 emoji are set to be added or recommended in September 2023. As a result, the total number of emoji registered at unicode.org will soon exceed 3,700, and is on track to surpass 4,000 over the next 2 to 3 years.
The continued growth of emoji can be directly tied to the increased global use of social media over the past decade; however, the recent increase in messaging app usage for business purposes has become an additional driver for the expanded use of existing and new emoji.
On the business-use front, current metrics indicate that the use of messaging apps, such as Teams and Slack, has increased substantially over the past few years, such that users today are just as likely to prefer these apps as they are to prefer email. This, of course, has many contributing factors that range from industry type and company culture to departmental and individual preferences. Nevertheless, the fact that messaging apps are now on par with email as a primary form of communication is exceptionally important to the legal community, given the various responsibilities legal professionals have for understanding, administering and managing data preservation requirements as they relate to eDiscovery.
Emoji in eDiscovery – Technical Considerations
It has become a priority for legal teams to better understand how emoji are preserved in accordance with the law and jurisdictional requirements since emoji characters are not only used to replace text, but are also effective at accompanying text to communicate emotion or sentiment. With this in mind, below are three technology related items that can cause the same emoji to vary in appearance and in some cases result in the rendering of ambiguous symbols such as 🤩 or
Below is a chart that illustrates how the same emoji can differ across devices, operating systems and corresponding version releases:
There are also distinctions across software programs with regard to storing and exporting emoji characters, which poses another key difference that legal teams must be mindful of. It is critical that legal teams become familiar with the standard method in which a given software indexes emoji as it can have an impact on eDiscovery workflows later on. For example, Google Chat and WhatsApp store emoji characters in Unicode in a manner that enables the ability to search for emoji using the actual emoji character, provided your eDiscovery vendor’s platform maintains a search index that supports all Unicode characters. On the other hand, Teams and Slack currently index emoji by short code (or short name), therefore searching for emoji across chats exported from these platforms would require searching for the corresponding short code text. Short code details for approved Unicode emoji can be obtained from unicode.org.
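As a rough illustration of the difference described above, the sketch below searches a set of exported messages for one emoji in both storage forms. The short-code table and the `find_emoji` helper are hypothetical and exist only for this example; actual short-code names depend on the platform and should be confirmed against the export itself or unicode.org.

```python
# Hypothetical short-code table for illustration only; real short codes
# vary by platform and should be taken from the export or unicode.org.
SHORTCODES = {
    ":thumbsup:": "\U0001F44D",  # thumbs up
    ":fire:": "\U0001F525",      # fire
}


def find_emoji(messages, emoji_char):
    """Return messages containing the emoji as a Unicode character or as a short code."""
    codes = [code for code, char in SHORTCODES.items() if char == emoji_char]
    hits = []
    for msg in messages:
        if emoji_char in msg or any(code in msg for code in codes):
            hits.append(msg)
    return hits


messages = [
    "Deal approved \U0001F44D",   # emoji stored as a Unicode character
    "Deal approved :thumbsup:",   # emoji stored as a short code
    "No emoji here",
]
matches = find_emoji(messages, "\U0001F44D")
```

Searching for the character alone would miss the second message, and searching for the short code alone would miss the first, which is why it helps to know how each source platform stores emoji before building search terms.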
The following are important steps to consider when handling and managing ‘modern’ data for eDiscovery purposes:
It is wise to consider the development of workflows that support the various nuances specific to short message data due to the technical factors outlined above that can compromise intended search results and the document review process. More importantly, always be prepared to ask questions. Emoji have an ever-evolving nature, which we will cover in more detail, so be ready to inquire with your service provider if emoji do not appear to render in the review platform correctly or if search results are not returning as expected.
Compartmentalizing the management and subsequent review of modern data is an excellent approach to ensure you are taking appropriate steps to mitigate the chance that data is converted incorrectly for review.
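One simple way to spot a faulty conversion is to compare a message before and after processing. The sketch below is a minimal check under stated assumptions: it treats characters outside the Basic Multilingual Plane as a rough proxy for emoji (most, but not all, emoji fall there), and the `flag_conversion_issues` helper is hypothetical rather than a feature of any review platform.

```python
REPLACEMENT_CHAR = "\uFFFD"  # Unicode replacement character left behind by failed decoding


def flag_conversion_issues(original, converted):
    """Compare a message before and after conversion and report likely emoji loss."""
    issues = []
    if REPLACEMENT_CHAR in converted:
        issues.append("replacement character present")
    # Assumption: code points above U+FFFF are a rough proxy for emoji.
    before = sum(1 for ch in original if ord(ch) > 0xFFFF)
    after = sum(1 for ch in converted if ord(ch) > 0xFFFF)
    if after < before:
        issues.append(f"{before - after} supplementary-plane character(s) lost")
    return issues


original = "Great job \U0001F929"
# Simulate a lossy export by forcing the text through ASCII.
lossy = original.encode("ascii", errors="replace").decode("ascii")
```

Running a check like this on a sample of messages before full processing can surface a conversion problem early, when it is still cheap to raise with the service provider.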
Current eDiscovery Challenges Reviewing Emoji
After taking appropriate measures to prevent the occurrence of technology related issues with modern data, you will now be ready to address the challenges that legal practitioners currently encounter with interpreting emoji contained in relevant communication.
Several eDiscovery service providers have in recent years introduced technologies that leverage AI/machine learning capabilities to identify emotional tone behind document text. This technology, in general terms, assigns weight to a glossary of words linked to specific sentiments, such as anger or desire, and tallies the number of occurrences to score the emotional content of a document. This technology is usually referred to as sentiment analysis and can be very effective at locating documents likely to be responsive.
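The glossary-and-tally idea can be sketched in a few lines. The glossary terms, sentiment labels and weights below are invented for illustration and do not come from any vendor's product; commercial tools are considerably more sophisticated.

```python
import re
from collections import Counter

# Hypothetical glossary: term -> (sentiment, weight). Values are illustrative only.
GLOSSARY = {
    "furious": ("anger", 2.0),
    "angry": ("anger", 1.0),
    "want": ("desire", 1.0),
    "crave": ("desire", 2.0),
}


def score_document(text):
    """Tally weighted glossary hits per sentiment for one document."""
    scores = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in GLOSSARY:
            sentiment, weight = GLOSSARY[word]
            scores[sentiment] += weight
    return dict(scores)


result = score_document("I am angry. No, I am furious, and I want answers.")
```

Here the document scores higher for anger than for desire because two weighted anger terms appear against a single desire term.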
The increased use of emoji in communication that is subject to discovery has drawn the attention of linguists, technologists and developers, who are working to extend existing solutions so they can identify the emotional tone of messages that combine individual words and emoji. However, since the text that accompanies emoji tends to be far shorter than an email or business document, the challenge eDiscovery providers face is deriving the underlying emotion from a small set of words and emoji, often no longer than a very short sentence.
Although this effort will require a lot of work and several rounds of testing, it is by no means insurmountable since emoji are primarily used to convey emotion which, in and of itself, helps add context to an otherwise minimal amount of text for sentiment analysis purposes. In fact, we will briefly discuss a recent study that focused on analyzing emoji across short messages (from Twitter) to successfully create an emoji lexicon for sentiment analysis purposes.
Emoji - Sentiment Analysis
Context is key to understanding the emotion or meaning that messages with emoji are intended to carry. However, emoji differ from text or individual words in that they can express emotional communication without accompanying text, and can also facilitate the communication of subtle emotional cues such as irony, sarcasm and playfulness, that may be difficult to communicate using traditional text-based communication. For example, the message “you deserve it” could be intended positively or negatively, but an accompanying emoji could help clarify the emotional content of this text. Consider if this phrase were to be followed by a smiley face or balloon, you might conclude it was sent with positive intent; however, if the phrase were to be followed by an angry face or a balance scale, perhaps the message was sent with negative intent.
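The “you deserve it” example can be expressed as a small sketch in which the accompanying emoji, rather than the words, determines the label. The valence values are hypothetical and are not taken from any published lexicon.

```python
# Hypothetical emoji valence values in [-1, 1]; not from any published lexicon.
EMOJI_VALENCE = {
    "\U0001F642": 0.6,   # slightly smiling face
    "\U0001F388": 0.5,   # balloon
    "\U0001F620": -0.7,  # angry face
}


def classify_intent(message):
    """Label a message by the summed valence of the emoji it contains."""
    total = sum(EMOJI_VALENCE.get(ch, 0.0) for ch in message)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "ambiguous"


plain = classify_intent("you deserve it")
happy = classify_intent("you deserve it \U0001F642")
angry = classify_intent("you deserve it \U0001F620")
```

The same three words are labeled ambiguous, positive or negative depending solely on the emoji that follows them.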
Recent findings have demonstrated that emoji contain information necessary to interpret the emotional and semantic content of digital communication accurately and effectively. However, to date, the vast majority of research that has examined this emotional content has failed to incorporate emoji into its analyses. This is likely the result of a dearth of standardized tools available to researchers to evaluate the contribution of emoji in modern communication.
The original emoji lexicon developed for sentiment analysis was created by Novak et al. (2015) and relied on human raters who classified tweets as positive, negative, or neutral. Sentiment scores for each emoji’s positivity and negativity were calculated as the proportion of tweets containing the emoji that were rated positive and negative, respectively. This approach is still widely used, but today researchers are using newly developed Natural Language Processing (NLP) tools for emotion detection to investigate a more expansive lexicon, such as one that could factor in 8 basic human emotions (anger, fear, sadness, disgust, joy, surprise, anticipation and trust) in addition to positivity and negativity when analyzing text that includes emoji.
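The proportion-based scoring described above can be sketched as follows. This is a simplified illustration, not the published method's code: the toy tweets are invented, and emoji are detected with a rough assumption (any character above U+FFFF) rather than a proper emoji list.

```python
from collections import defaultdict


def emoji_sentiment_scores(labeled_tweets):
    """Compute per-emoji positivity and negativity as proportions of rated tweets.

    labeled_tweets: iterable of (text, label), label in
    {"positive", "negative", "neutral"}.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for text, label in labeled_tweets:
        # Assumption: code points above U+FFFF are a rough proxy for emoji.
        # Each emoji is counted once per tweet, however often it repeats.
        for emoji in {ch for ch in text if ord(ch) > 0xFFFF}:
            counts[emoji][label] += 1
    scores = {}
    for emoji, c in counts.items():
        total = sum(c.values())
        scores[emoji] = {
            "positivity": c["positive"] / total,
            "negativity": c["negative"] / total,
        }
    return scores


# Invented toy data for illustration only.
tweets = [
    ("Love it \U0001F602", "positive"),
    ("So tired \U0001F602", "negative"),
    ("lol \U0001F602", "positive"),
    ("ok \U0001F602", "neutral"),
]
scores = emoji_sentiment_scores(tweets)
```

With this toy data, the face-with-tears-of-joy emoji appears in four tweets, two rated positive and one rated negative, giving it a positivity of 0.5 and a negativity of 0.25.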
One particular study that drew data from Twitter on 3 separate dates over a 17-month period found that emoji are an important component of nonverbal, emotional communication, and led to the creation of an emoji lexicon that can be used independently or in conjunction with existing sentiment analysis tools. This particular emoji lexicon contains 359 of the most commonly used emoji and provides numerical sentiment ratings for each. It is worth noting that the lexicon’s ratings remained impressively stable over time despite a rapidly evolving landscape of digital communication, which supports the view that it, or for that matter any new lexicon, can retain relevance and utility in the future. More insights from this research can be found in the study itself, listed under Sources as The Multidimensional Lexicon of Emojis.
This study also raised another important point pertaining to the sentiment analysis of emoji. Beyond the variation in an emoji’s meaning that can be caused by different devices and software, or the dynamic of time due to factors such as the political climate and social trends, there are 4 other variables that can alter the meaning of an emoji as it relates to legal matters. All 6 are listed below for ease of reference:
Emoji are dynamic symbols whose meaning depends on both context and interpreter. For this reason, it is suggested that legal professionals make an effort to untangle the historical, social, cultural and legal contexts and weave them into the interpretation of an emoji’s meaning.
To further this point, below are a few examples of how emoji may be interpreted differently:
One final instance that underscores how social groups can alter the meaning of emoji involves a recent warning issued by the Ohio Narcotics Intelligence Center, which reported that emoji are being used in connection with drug activity. These emoji tend to refer to the physical, psychological, or physiological characteristics of the referenced drugs. For example, a peeled banana is used to refer to oxycodone/Percocet, a step ladder to alprazolam/Xanax, a snail to fentanyl, and a palm tree to marijuana.
Emoji are also used by this community in generic ways, such as the use of an electrical outlet plug to refer to a drug dealer, or the use of a concert ticket stub emoji to symbolize the price of a drug. Other general references include a flame, gasoline pump, or goat to depict the high potency of a drug, and an astronaut, rocket, or face with an exploding brain to describe the euphoria of drug use.
There are many contextual constraints that can impact an emoji’s meaning that collectively cause interpretative challenges for legal decision-making. However, as evidenced by the emoji lexicon referenced in this article, experts in this field are making great strides at developing solutions that can evolve and scale with emoji’s changing landscape.
Conclusion
It bears repeating that emoji, as the first language born of the digital age, will continue to pose challenges as the Unicode Consortium adds new ones and as existing emoji evolve to represent an altered version of their prior meaning, or a different meaning altogether, as in the drug-use example. That said, with the recent global emergence of generative AI/machine learning initiatives, many organizations have begun examining how this latest technology can help address the various challenges that emoji interpretation presents today.
Gone is the idea that emoji are used to decorate messages – they are a complex, robust form of digital language that will only increase in use and continue to evolve.
Sources:
Unicode – Emoji
https://home.unicode.org/emoji/
The WIRED Guide to Emoji
https://www.wired.com/story/guide-emoji/
In 2023, Global Emoji Count Could Grow to 3,491
https://www.statista.com/chart/17275/number-of-emojis-from-1995-bis-2019/
Deciphering Emoji Variation in Courts – A Social Semiotic Perspective
https://www.researchgate.net/publication/366224130_Deciphering_emoji_variation_in_courts_a_social_semiotic_perspective
The Multidimensional Lexicon of Emojis: A New Tool to Assess the Emotional Content of Emojis
https://www.frontiersin.org/articles/10.3389/fpsyg.2022.921388/full
Court finds “thumbs-up” emoji can constitute an electronic signature
https://www.mltaikins.com/corporate-commercial/court-finds-thumbs-up-emoji-can-constitute-an-electronic-signature/
Ohio Narcotics Intelligence Center Warns of Emojis Symbolizing Potential Drug Activity
https://publicsafety.ohio.gov/home/news-and-events/all-news/onic_032323
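Summary
With the spread of messaging apps, chat messages containing emoji are increasingly subject to discovery. eDiscovery service providers have introduced sentiment analysis that uses AI/machine learning to tally emoji occurrences and score emotional content, while researchers are using natural language processing (NLP) tools to investigate the creation of emoji lexicons. Many contextual constraints can affect an emoji’s meaning and give rise to interpretive challenges in legal settings, but experts are making great strides toward solutions that can evolve and scale as emoji change.
FRONTEO has also newly developed and released emoji search and chat deduplication features.
Reference link: https://www.fronteo.com/20230904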