Are algorithms fair?
With algorithms taking a major role in the decision-making of common life situations, it’s natural to ask if they are fair. The issue is that, with great fairness, comes bad performance… or does it?
In this article, we’ll explore a specific use case and dissect it to understand whether algorithms can be constructed to achieve positive results without negatively affecting the target user in the real world.
If you have ever requested a loan or a mortgage, applied for a job, called customer service or used a credit card, then your life, for better or worse, was influenced by the output of a data science algorithm.
Let me start by saying that algorithms cannot be classified as “fair” or “not fair”. There is no such concept in the algorithms themselves, because they don’t think. Algorithms have inputs and outputs and, normally, a function that they’re optimising for. For example, if the algorithm says that “you are eligible for a credit card”, then it is very likely that you’ll get it, although a human might still have some input in accepting or rejecting the application.
What needs to be discussed is whether the way the algorithm is planned to be used is fair, and how to make sure that the algorithm is designed to be fair to everyone and not just to benefit a minority or even a majority group.
Let’s focus on a specific yet unexpected example where life decisions are being affected by algorithms: data science algorithms are being used to shape the studies of children in some schools in Hong Kong. Let’s summarise the system based on information from the above article:
What does the program do?
One of the objectives of the algorithm is to accurately determine emotions using data extracted from video recordings, focusing on the students’ facial movements. Let’s start by noting that the creators of the algorithm claim not to keep the footage of the children, just the data generated from it, for example the distance between the eyes and the eyebrows. Those emotions, in conjunction with data about the student (how long it takes to answer questions, marks and performance history), are used to generate reports that highlight the student’s strengths and weaknesses. The algorithm is also used to forecast their grades.
What is used for evaluation? What was the research behind it?
As with many of these tools, the details of how it operates are unknown. Reading through the information provided by the creator company, it is not fully apparent how the algorithm works, so it is difficult to evaluate which specific inputs are being used and what background research led the company to this method.
Going by what the article mentions, the main idea seems to be to combine facial expressions of emotion with the student’s history (grades are mentioned). These tools shape something as important as the choices made by and for the student, based on the data the algorithm outputs. For that reason, the way they operate, their weaknesses and how those are being addressed, the evaluation procedure and the background research should all be clear and open to the public. But as Cathy O’Neil mentions in her informative talk Weapons of Math Destruction, these tools are rarely transparent.
How is it used?
It helps teachers to create personalised tests and assignments based upon the emotional state of the student.
How are they determining value?
Students perform 10% better in exams if they have used the system. It also detects emotions such as happiness and sadness with 90% accuracy; for complex emotions such as confusion and anger, the accuracy drops to 60-70%.
That’s all for the summary. I hope you now have as many questions as I did when I finished the article!
First of all, let me state something obvious:
Even if the algorithm accurately detects emotions, that only validates the performance of the model, not the assumption for which it was created. It’s extremely short-sighted to think that the emotions of students depend primarily on their life at school, and that their mood and academic success can therefore be rectified by altering their learning program. Perhaps the student has a difficult home life or is a victim of bullying, unbeknownst to the school system.
Let me expand upon that:
Testing of artificial intelligence systems that shape our lives needs to be focused on proving the assumptions that inspired them, not on the performance of the metrics that the machine learning algorithms target.
If you look at articles about systems using artificial intelligence, most of the time they make claims in this kind of language: “the algorithm has a 90% accuracy on detecting risky applicants” or “happiness/sadness is detected with 90% accuracy”. Those are metrics that the algorithms optimise for, not information that substantially helps us to understand how including artificial intelligence algorithms is improving a process. How are they making it better? Why are they needed?
We should be seeing other information. For a system like the one in the above example, I would love to get the following information:
Since the recommendations start to shape the life of the students, what process do they have in place to check that their life actually improves? Assuming the recommendations are based on emotions, what do they do when the emotions are not caused by the class itself, but by something outside the class? Are they tracking this and finding ways to improve the system?
Let’s expand on the last point. If we separate the students into two groups:
Group 1: Students whose strong emotions are mainly caused by the learning experience.
Group 2: Students whose strong emotions are mainly caused by external life circumstances.
Which group would benefit more from the system?
Without a doubt, Group 1. Group 2, on the other hand, will most likely not benefit much from the platform. Let’s examine the two possibilities for Group 2:
- The student has a happy life outside class.
Then, the teacher will have no clue when a student is not understanding something unless they have input from other tests. In any case, emotion recognition wouldn’t be beneficial here.
- The student has difficulties outside of class which are producing strong negative emotions.
Here is where things get unfair. Students with acute sadness tend to underperform, and often the root issue must be addressed before any good progress can be made at school. What is the system going to do when these situations happen? Is it going to try to recommend further exercises? I don’t know, but I couldn’t find any guidance on their website that details what process might be in place in these situations.
To elaborate on this, there are two challenges I see with the system:
Number one is that children’s emotions and inner state are not a sole, direct consequence of what happens in the classroom, let alone the fact that the inner state is not always reflected in what our faces show. Their life outside academic study is often much more indicative of their feelings. Even if it is true that the algorithm is “accurately detecting emotions”, there will be a huge misunderstanding on the teachers’ part if they assume that difficulties are caused by, or can be cured by, adjustments in the class.
The second point, a consequence of the first, is this: what do we do when the suggestions are based on emotions but the children have a heightened, negative inner state due to a personal situation, or their grades are generally lower because their day-to-day life is more complicated than that of their peers?
This makes the system unfair to those who actually need it the most and, at the same time, makes the system “perform better”, according to their definition of performance, for students who would probably have performed well regardless.
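To make that argument concrete, here is a minimal sketch of how a single headline figure can hide a split between the two groups. Every number and name in it is invented for illustration (the group labels, the score changes and the `overall_and_per_group` helper are assumptions, not data or code from the company or the article).

```python
from statistics import mean

# Hypothetical changes in exam scores (percentage points) after using the system.
# These values are made up for illustration only.
score_changes = {
    "group_1_classroom_emotions": [14, 16, 15, 15],
    "group_2_external_emotions": [1, -2, 0, 1],
}


def overall_and_per_group(changes_by_group):
    """Return the pooled mean change and the mean change for each group."""
    pooled = [c for changes in changes_by_group.values() for c in changes]
    per_group = {group: mean(changes) for group, changes in changes_by_group.items()}
    return mean(pooled), per_group


overall, per_group = overall_and_per_group(score_changes)
print(f"overall: {overall:+.1f}")
for group, value in per_group.items():
    print(f"{group}: {value:+.1f}")
```

In this toy data the pooled average looks like a healthy improvement, yet all of it comes from Group 1, while Group 2 sees no change at all. A report that only quotes the pooled number would not reveal that.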
All systems will have flaws. Our focus and criticism need to be centred on whether these algorithms are actually needed, and on finding ways to evaluate whether they do in fact make an improvement in real-world terms, rather than simply reinforcing the data the program was created to produce.
Unfortunately, transparency is rarely present in artificial intelligence systems. You can find many situations analogous to the one we explored in this article. However, I am hopeful, because there are some efforts to make these systems as fair as possible. For example, in 2018, Google organised a competition to improve inclusion in image classification.
We are entering an era where algorithms will influence our day-to-day life. The challenge is that we are also entering a phase where we are not auditing them enough to understand if they create a true benefit.
If you are designing a system using machine learning algorithms, I urge you to include fairness in your design. Even if you’re not, I urge you to become aware of the processes behind these algorithms, so we can all hold companies and institutions accountable for lapses in fair and equal treatment.
To programmers, I would suggest always going deeper than average metrics and really understanding how your system affects different groups of people. It’s particularly important to take note of those who might need the system the most and, as much as possible, to use data that represents those groups fairly.
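As a rough sketch of what going deeper than an average metric can look like in practice, the snippet below breaks a classifier’s accuracy down by group and reports the gap between the best and worst served groups. The records, the group names and the `accuracy_by_group` helper are hypothetical; a real audit would need real labelled data and a considered choice of groups.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group from (group, predicted, actual) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical predictions; invented purely for illustration.
records = [
    ("group_a", "happy", "happy"),
    ("group_a", "sad", "sad"),
    ("group_a", "happy", "happy"),
    ("group_a", "sad", "sad"),
    ("group_b", "happy", "sad"),
    ("group_b", "sad", "sad"),
    ("group_b", "happy", "happy"),
    ("group_b", "sad", "happy"),
]

overall_accuracy = sum(p == a for _, p, a in records) / len(records)
per_group_accuracy = accuracy_by_group(records)
gap = max(per_group_accuracy.values()) - min(per_group_accuracy.values())

print(f"overall accuracy: {overall_accuracy:.2f}")
for group, accuracy in per_group_accuracy.items():
    print(f"{group}: {accuracy:.2f}")
print(f"gap between best and worst group: {gap:.2f}")
```

The overall figure alone would suggest a reasonably good model, while the per-group view shows that one group is served far worse than the other. Which gap is acceptable is a design decision, not something the code can answer.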
The team working and designing the system must be diverse and bring in many ideas and viewpoints on how the process may impact people. That’s a pivotal way to really understand if the way it is coded is fair.