Predicting Sentiment from Text

After scraping and analyzing some professor review data, I wanted to know whether I could predict sentiment from the text. Reading the reviews as a human, it certainly seems possible to tell a good review from a bad one without looking at the overall score, but could a machine learning algorithm do the same? First, I used the overall score to separate good reviews from bad ones (see the spread here). Since there are so many “5” scores, those became the good reviews. I counted “1” and “2” scores as bad reviews and threw out reviews with any other score because they were ambiguous.
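A minimal sketch of this labeling rule, assuming each review is available as a (text, overall score) pair. The function names are hypothetical, and since the post doesn't say how half-point scores such as 1.5 were handled, this sketch simply drops them as ambiguous.

```python
from typing import List, Optional, Tuple


def label_review(overall_score: float) -> Optional[int]:
    """Map an overall score to a sentiment label.

    Returns 1 (good) for a score of 5, 0 (bad) for a score of 1 or 2,
    and None for every other score, which is treated as ambiguous.
    """
    if overall_score == 5:
        return 1
    if overall_score in (1, 2):
        return 0
    return None


def keep_labeled(reviews: List[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Keep (text, label) pairs for unambiguous reviews; drop the rest."""
    labeled = []
    for text, score in reviews:
        label = label_review(score)
        if label is not None:
            labeled.append((text, label))
    return labeled


# Hypothetical reviews: the middle score is dropped.
print(keep_labeled([("Great lectures", 5), ("It was fine", 3), ("Worst class ever", 1)]))
# [('Great lectures', 1), ('Worst class ever', 0)]
```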

To predict sentiment from the text, I decided to use a “bag of words,” in which I disregard grammar and word order and just count how often each word appears in a review. I also threw out “stopwords,” which are common words like “these” and “am.” This discards a lot of information, but it makes the analysis problem much more computationally tractable. Each remaining word then becomes a feature that can help us predict the sentiment.
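A small standard-library sketch of the bag-of-words step. The stopword list here is a tiny hypothetical one for illustration (real lists, such as the English list built into scikit-learn's `CountVectorizer`, are much longer), and the regex tokenizer is an assumption rather than the tokenizer used in the notebook.

```python
import re
from collections import Counter
from typing import List

# A tiny, hypothetical stopword list for illustration; real lists are much longer.
STOPWORDS = {"a", "am", "an", "and", "i", "is", "of", "the", "these", "to", "was"}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into words, drop stopwords, and count the rest.

    Word order and grammar are discarded; only the counts survive.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in STOPWORDS)


def to_feature_vector(counts: Counter, vocabulary: List[str]) -> List[int]:
    """Turn word counts into a fixed-length vector, one slot (feature) per vocabulary word."""
    return [counts[word] for word in vocabulary]


counts = bag_of_words("The worst class. The worst professor I am aware of!")
print(to_feature_vector(counts, ["worst", "great", "professor"]))  # [2, 0, 1]
```

Each position in the vector is one feature; a word that never appears in a review simply gets a count of 0.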

One way to predict sentiment from these features is to build a decision tree. For example, I could predict sentiment with the tree below, which predicts sentiment correctly about half the time. To do better than this, we can sample the reviews (and the features) and use each sample to build a different decision tree. We train a decision tree by splitting the space of features in the best way possible, so that the bad and good reviews are separated and each group is as unmixed as possible. To find a split, we look at the values of each feature, decide where to place the split, evaluate how good that split is, and move on to the next candidate. Once all the candidate splits have been evaluated, we choose the best one. For example, for a particular sample, the best first split could turn out to be whether the review contains the word “worst.” Then we proceed iteratively, looking at the two buckets of reviews that we now have and deciding how best to split each of those, and so on. This trains a single tree (like the tree pictured).

Sample decision tree
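The post doesn't say how “how good the split is” was measured. A common choice, and the default criterion in scikit-learn's decision trees, is Gini impurity, so this sketch uses it to pick the best single split. It only tests whether a word is present or absent rather than searching over count thresholds, and the function names and toy reviews are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels: 0.0 means perfectly unmixed."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1 - p) * (1 - p)


def best_word_split(docs: List[Dict[str, int]], labels: List[int],
                    candidate_words: List[str]) -> Tuple[Optional[str], float]:
    """Try the split "does the review contain word w?" for every candidate word.

    Returns the word whose two buckets have the lowest size-weighted Gini
    impurity, or None if no split beats leaving the reviews unsplit.
    """
    best_word, best_score = None, gini(labels)  # score of not splitting at all
    n = len(labels)
    for word in candidate_words:
        with_word = [y for d, y in zip(docs, labels) if d.get(word, 0) > 0]
        without = [y for d, y in zip(docs, labels) if d.get(word, 0) == 0]
        score = (len(with_word) * gini(with_word) + len(without) * gini(without)) / n
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Toy bag-of-words reviews: 0 = bad, 1 = good.
docs = [{"worst": 1}, {"worst": 2, "boring": 1}, {"great": 1}, {"great": 1, "boring": 1}]
labels = [0, 0, 1, 1]
print(best_word_split(docs, labels, ["boring", "worst", "great"]))  # ('worst', 0.0)
```

Growing a full tree means applying the same search again inside each bucket until the buckets are pure or too small to split.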

But one tree isn’t good enough, so we select another sample of reviews and another sample of features and do this recursive “best” splitting again and again, with each sample producing a new tree. We end up with a whole “forest” of trees. To classify a new review, we run it through each tree, record whether each tree says the review is good or bad, and go with the majority vote of the trees as the predicted sentiment.
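A compact sketch of the forest idea using only the standard library; it is not scikit-learn's implementation. To keep it short, each “tree” is a depth-1 stump with a single presence/absence split rather than a fully grown recursive tree, and features are sampled once per tree as described above (scikit-learn's `RandomForestClassifier` instead samples `max_features` candidate features at every split). All names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple

Doc = Dict[str, int]                     # word -> count for one review
Stump = Tuple[Optional[str], int, int]   # (split word, label if present, label if absent)


def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 labels (0.0 means the bucket is unmixed)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1 - p) * (1 - p)


def majority(labels: List[int], default: int) -> int:
    """Most common label, or `default` for an empty bucket."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(docs: List[Doc], labels: List[int], words: List[str]) -> Stump:
    """Fit a depth-1 tree: split on the presence of the single best word."""
    overall = majority(labels, 0)
    best_score, best = gini(labels), (None, overall, overall)
    for w in words:
        yes = [y for d, y in zip(docs, labels) if d.get(w, 0) > 0]
        no = [y for d, y in zip(docs, labels) if d.get(w, 0) == 0]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if score < best_score:
            best_score, best = score, (w, majority(yes, overall), majority(no, overall))
    return best


def predict_stump(stump: Stump, doc: Doc) -> int:
    word, if_present, if_absent = stump
    if word is None:  # no useful split: both labels are the overall majority
        return if_present
    return if_present if doc.get(word, 0) > 0 else if_absent


def fit_forest(docs: List[Doc], labels: List[int], vocabulary: List[str],
               n_trees: int = 25, n_words: int = 2, seed: int = 0) -> List[Stump]:
    """Each tree sees a bootstrap sample of reviews and a random subset of words."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(docs)) for _ in docs]                 # reviews, with replacement
        words = rng.sample(vocabulary, min(n_words, len(vocabulary)))  # features, without replacement
        forest.append(fit_stump([docs[i] for i in idx], [labels[i] for i in idx], words))
    return forest


def predict_forest(forest: List[Stump], doc: Doc) -> int:
    """Run the review through every tree and return the majority vote."""
    votes = Counter(predict_stump(stump, doc) for stump in forest)
    return votes.most_common(1)[0][0]


bad = [{"worst": 1, "boring": 1}, {"worst": 2, "boring": 1},
       {"worst": 1, "boring": 3}, {"boring": 2, "worst": 1}]
good = [{"great": 1, "helpful": 1}, {"great": 2, "helpful": 1},
        {"great": 1, "helpful": 2}, {"helpful": 1, "great": 3}]
forest = fit_forest(bad + good, [0] * 4 + [1] * 4, ["worst", "boring", "great", "helpful"])
print(predict_forest(forest, {"worst": 1, "boring": 1}))  # 0 (bad)
```

Because each tree sees a different sample, individual trees make different mistakes, and the majority vote tends to cancel them out.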

I implemented this with the RandomForestClassifier from sklearn, and it is pretty accurate: around 94% on the data I set aside for testing the model. You can find the code in my GitHub (look at sentimentFromText.ipynb).
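The post doesn't say how large the held-out set was, so the 25% test fraction below is an assumption. This is a standard-library sketch of setting data aside and scoring predictions on it; in scikit-learn the same jobs are done by `train_test_split` (in `sklearn.model_selection`) and `accuracy_score` (in `sklearn.metrics`). The helper names here are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(items: Sequence[T], test_fraction: float = 0.25,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data and set aside a fraction of it for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of held-out reviews whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


train, test = holdout_split(range(100))
print(len(train), len(test))                 # 75 25
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

An accuracy of 0.94 means that about 94 of every 100 held-out reviews were labeled correctly.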

Professor Reviews and Learning Python

Last fall, I decided to learn Python because I wanted to analyze text and implement some machine learning. I started by learning BeautifulSoup and using it to scrape a professor rating site. The project went well, and I was able to write some code (which you can find on my GitHub). I got the hang of scraping and wrote code to collect numeric and text information from reviews of professors by school or by state.

Next, I began to analyze those reviews. I started the project intending to look at gender differences in ratings, following earlier reports of differences, such as Sidanius & Crane, 1989, and Anderson & Miller, 1997. That meant I needed the gender of each professor, which was not available in the dataset I had scraped. I decided to use pronouns in the reviews to assess gender, and when there were no pronouns in the text or the pronoun use was unclear, I assigned gender based on the professor’s name.
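A sketch of the pronoun-first, name-fallback rule. The post doesn't say when pronoun use counted as “unclear,” so the 80% cutoff (`min_share`) is an assumption, and `name_lookup` stands in for a hypothetical first-name-to-gender table.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def gender_from_pronouns(texts: Iterable[str], min_share: float = 0.8) -> Optional[str]:
    """Return "F" or "M" if one set of pronouns clearly dominates the reviews.

    Returns None when there are no pronouns or neither gender reaches
    `min_share` of the pronoun count (an assumed cutoff for "unclear").
    """
    counts: Counter = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in FEMALE_PRONOUNS:
                counts["F"] += 1
            elif token in MALE_PRONOUNS:
                counts["M"] += 1
    total = counts["F"] + counts["M"]
    if total == 0:
        return None
    for gender in ("F", "M"):
        if counts[gender] / total >= min_share:
            return gender
    return None


def assign_gender(first_name: str, texts: Iterable[str],
                  name_lookup: Dict[str, str]) -> Optional[str]:
    """Use pronouns first; fall back to a (hypothetical) first-name lookup table."""
    return gender_from_pronouns(texts) or name_lookup.get(first_name.lower())


names = {"maria": "F", "pat": "M"}  # hypothetical lookup table
print(assign_gender("Pat", ["She explains things clearly.", "Her exams are fair."], names))  # F
print(assign_gender("Pat", ["Great class."], names))  # M (no pronouns, so the name decides)
```

Professors for whom neither rule gives an answer come back as `None`, matching the reviews that were later discarded for having no gender assigned.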

I compiled reviews for schools in Rhode Island, New Hampshire, and Maine, discarded reviews with no gender assigned, and began by looking at differences in the numeric ratings, which include an overall score and a difficulty score. Male and female professors have nearly identical mean scores: women’s mean overall score is 3.71 and men’s is 3.75, while women’s mean difficulty score is 2.91 and men’s is 2.90. The overall and difficulty means for each professor are correlated, as you can see here:

Plot of histograms, scatter plot with linear fit, and residuals
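The figure combines a correlation, a fitted line, and the residuals around it. This sketch computes all three from per-professor means with the standard library. The post doesn't report the coefficient or the fitted line, so the numbers below are toy values that only check the arithmetic, not results from the dataset.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of per-professor means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def residuals(xs: Sequence[float], ys: Sequence[float],
              slope: float, intercept: float) -> List[float]:
    """Vertical distance of each point from the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs, ys = [1.0, 2.0, 3.0], [3.0, 5.0, 7.0]  # toy values only
slope, intercept = linear_fit(xs, ys)
print(pearson_r(xs, ys), slope, intercept)  # 1.0 2.0 1.0
print(residuals(xs, ys, slope, intercept))  # [0.0, 0.0, 0.0]
```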

Interestingly, female professors seem to have fewer reviews than male professors. On average, female professors have about 9 reviews and male professors have about 12. This difference seems to be stable when looking at each individual year, and it could be due to male professors teaching larger classes (but I have no data on that). The end result is that there are far fewer reviews of female professors: in my dataset, there are 107,930 reviews of male professors and just 64,799 reviews of female professors.
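The per-professor averages come from counting reviews for each professor and then averaging those counts within each gender. A sketch, assuming one hypothetical (professor ID, gender) record per review:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def mean_reviews_per_professor(reviews: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average number of reviews per professor, grouped by the professor's gender.

    `reviews` holds one (professor_id, gender) pair per review.
    """
    per_professor: Counter = Counter()
    gender_of: Dict[str, str] = {}
    for professor_id, gender in reviews:
        per_professor[professor_id] += 1
        gender_of[professor_id] = gender
    total_reviews: Dict[str, int] = defaultdict(int)
    n_professors: Dict[str, int] = defaultdict(int)
    for professor_id, n in per_professor.items():
        g = gender_of[professor_id]
        total_reviews[g] += n
        n_professors[g] += 1
    return {g: total_reviews[g] / n_professors[g] for g in total_reviews}


toy = [("p1", "F"), ("p1", "F"), ("p2", "F"), ("p3", "M"), ("p3", "M"), ("p3", "M")]
print(mean_reviews_per_professor(toy))  # {'F': 1.5, 'M': 3.0}
```

Averaging per professor (rather than dividing total reviews by total professors across both groups) is what lets a gap like 9 versus 12 show up.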

The dataset also includes a self-reported grade from most reviewers. Overall scores go down when reviewers report getting a bad grade, but women seem to be hit harder by this than men.

Bar graph showing overall mean score by grade and gender.
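The bar graph is a grouped mean: average overall score for each combination of reported grade and professor gender. A sketch, assuming each review is a dictionary with hypothetical `grade`, `gender`, and `overall` keys, and skipping reviewers who didn't report a grade:

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def mean_overall_by_grade_and_gender(
        reviews: Iterable[Mapping]) -> Dict[Tuple[str, str], float]:
    """Average overall score for each (reported grade, professor gender) pair.

    Reviews with no self-reported grade are skipped.
    """
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for review in reviews:
        grade = review.get("grade")
        if grade is None:
            continue
        key = (grade, review["gender"])
        sums[key] += review["overall"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


toy = [
    {"grade": "A", "gender": "F", "overall": 5},
    {"grade": "A", "gender": "F", "overall": 4},
    {"grade": "C", "gender": "M", "overall": 3},
    {"gender": "M", "overall": 1},  # no grade reported: skipped
]
print(mean_overall_by_grade_and_gender(toy))  # {('A', 'F'): 4.5, ('C', 'M'): 3.0}
```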

Note also that far more reviewers report receiving high grades. In fact, over 160,000 of the 173,000 reviews in my dataset report getting A’s.

The overall scores show a bimodal distribution, as you can see in the histogram of overall scores (reviewers can report scores from 0 to 5, with half-points possible). The next thing I decided to do was to categorize these reviews as positive or negative, getting rid of reviews in the middle, and then to analyze the text of those reviews. I’ll report on that next.

Histogram of overall scores, showing bimodal distributions for both men and women.


Reading Mathematics: Click/Clunk

Years ago, I did some work to help students read mathematics textbooks. I gave a presentation on the material at an NCTM conference and wrote a piece for students that is still in use by the Harvard Bureau of Study Council. I was recently reminded of the work by an email I received about it, so I plan to revisit it, perhaps write something more about it, and share it more broadly. The work is based on the “click/clunk” method, which is used in some approaches to teaching reading.

Makers, Doers, and Liberation Math

There’s a growing interest out in the world in making cool things, particularly with technology. Commonly called the “maker movement,” this trend has its roots in tinkering with technology and computing in ways that move the creation of things out of the hands of manufacturers and into the hands of real people. There is a magazine, MAKE, devoted to this movement, and there are Maker Faires all over the place where people come together in community to learn, share, and show off things made by real people. The president of the United States even mentioned 3D printing in his State of the Union address in February. This used to be a technology that existed only in the manufacturing sector, but MakerBot, Shapeways, and others have brought it to makers so that we can all play. And making isn’t just for people who already know what they are doing: thirteen-year-old Lauren Rojas recently gained YouTube fame for her video of a rocket she built and launched.

I’ve been starting to ask myself who gets to be a maker. Yes, I know, it’s a grassroots movement, so of course the answer is “anyone.” But it isn’t really anyone. At the right are the demographics that the Maker Faire people put out to attract sponsors, and you can see that, as you might have guessed, this movement is fairly male and pretty well-funded. MAKE magazine is even more extreme, with subscribers being 90% male. So we should be talking about access, equity, and justice issues. Some people are talking (for instance here and here), but we certainly need more.

Maker Faire demographics

But there are more than just access issues involved in who becomes a maker, or, more broadly, a doer. I started thinking through a mind map of the issues last night in my weekly Liberation Math class, and it morphed into the diagram below.

Diagram of Resources, Community, and Self

Now, you might be wondering what place this all has in my math class. Certainly the maker movement is exerting an influence on STEM (“science, technology, engineering, and math”) education, so that’s part of it. But more than that, liberation math is all about becoming a maker and a doer. I’m trying to fight against the idea that students are empty and powerless vessels for the knowledge and excitement that I already have. I want people to rise up and take charge of their own mathematics, to become powerful doers and makers of mathematics. Part of this power is the power to decide. People might decide to do very little math, and there’s nothing wrong with that. So long as people are in community and have resources, they can step back from math without losing their ability to make and do. When people need the math, it is always there, and they can access it again through their networks, taking advantage of learning and skill-building opportunities. It is only when students are stuck in the middle of an oppressive curriculum that they fall off the edge of the world, doomed to be lost forever if they step away from math. People who are standing in a place of power, safety, and courage can always find a route to access and use the mathematics that they need.

Disappointment and Hiding in the Classroom

I’ve been noticing my disappointment in students lately. I don’t want to feel disappointed in students. Honestly, I don’t want to feel disappointed in anyone. Who does? But you might argue that we have certain expectations for how the people around us will act, and that people don’t always meet those expectations. When they don’t, I am justified in feeling disappointed, at least provided that my expectations were reasonable. The trouble is that disappointment is counterproductive, and for me it is part of an overall tendency I have to disconnect from people.

Let me look at this a little closer. I have certain expectations for my students. I set those out for the students by giving them specific assignments (“turn this worksheet in on Monday” or “write a blog post about your problem-solving process”), and I lay them out on the course syllabus by telling students to come to class, check their email regularly, participate, and so forth. There is also a collection of expectations that I leave unspoken. I expect that students will be thinking about what they need to do to prepare for upcoming exams, even if I don’t give them explicit assignments. I expect that students will ask for help and support when they still don’t understand something after class. I expect that students will monitor what they do and don’t understand. I expect that students will give me their best work and won’t piece something together at the last minute. I often say things that imply these expectations, but I’m not always explicit about them. Also notice that not all of these expectations are realistic.

If a student doesn’t meet these expectations, I get cranky. In between classes, if I am expecting work and participation from students that I don’t see, I start to worry, and to run my “disappointment tape.” Typically it involves me getting frustrated and making up a lot of things that I imagine to be happening with the students. I imagine them as uninterested in the course, not dedicated, not hard-working, wanting to get away with not doing work, not caring about thinking deeply, not caring about interacting with me or other students. Yes, there’s some really ugly stuff hiding in there. The thing is that I don’t know that any of that is really happening. Mostly, I think what is happening with me is that I want this connection with students, and most of what I have to connect with is their work. When the work isn’t there, I feel rejected. I imagine the students pulling away from me, and I rush to pull away from them first, by getting “disappointed” in them. Most of the time, I can get back my connection with the students simply by being around them — it is the time in between classes that provides a space for these feelings to grow.

Students don’t always do what we teachers want them to do. In fact, people in general don’t always do what other people want them to do. So we get anxious about our relationships and our standing with other people. In school, this means teachers get frustrated with and disappointed in students. What do students do? Students learn to hide from the disappointment of teachers. They hide and they lie so they can save themselves from the consequences of unmet expectations. Students hide so that their grades aren’t in jeopardy, and they hide so that they can maintain positive relationships with the powerful people who are important to them. Students get into a habit of hiding, so that it seems as natural as breathing. I remember it well from the last time I was a student: doing work I wasn’t proud of and hoping it would slip by without notice, making up excuses for doing work late or stretching excuses that were technically true but not really accurate, trying to look good in order to get away with things. As a teacher, I know that students are doing these things, but I ignore it, acting as if students are going to meet all of my expectations, and then getting disappointed when they don’t. Because I am required to assign grades to students, I maintain and perpetuate the fiction that grades mean something objective, when the reality is that they’re just a somewhat arbitrary record of how well a student met my somewhat arbitrary standards about a somewhat arbitrary collection of activities and topics.

What if I stopped doing this? It’s hard to imagine. Could I stop having expectations of students? What would happen to me and to the students if I did? What if I kept having my expectations, but was more honest about the fact that I know students won’t always meet them? What’s so bad about the students not meeting them anyway? Could I keep the expectations, but let go of the disappointment, simply connecting with students about what happened and deciding what to do next? Could I let my students be honest with me about the unrealistic nature of my expectations and about what really happens for them in a class? Could I let students formulate their own expectations, help them to make those expectations realistic, and then help them to live up to those expectations? Could I create a classroom environment in which I helped my students evaluate themselves? Wouldn’t this cause the very foundation of objective and rational subjects like math and science to crumble, because students would start writing expressive poetry about how math makes them feel and giving themselves an A++ on every assignment?


Emotional Cycle of Teaching

I’m now in the second week of classes, and today I noticed how much my emotions have been fluctuating over the last week. I’ve experienced excitement, tension, anxiety/worry, happiness, connection, and isolation. For me, what primarily drives these emotions is how connected I feel and how exposed I feel. As I gear up for a class, I think about what I want to do and what the students might want, and my anxiety and excitement both go up. I want the class to go well, and I manage the anxiety around that by preparing. Sometimes my preparation is great, and sometimes I over-prepare, repeatedly messing with my plans and making them more elaborate or complicated than they need to be. Essentially, the anxiety is about exposure and vulnerability. Teaching leaves you very vulnerable, and we all deal with that vulnerability in different ways. The more I can just be OK with the vulnerability, the better things tend to go, because when I do that I leave plenty of room for the students. When I get too tense and over-prepare, I tend to shut the students out, trying to control everything about the class. There’s a sweet spot to preparation, where I feel safe enough but let myself be vulnerable enough to the students to make real connections. It’s often a hard spot for me to reach!

During class, my emotions all depend on what I get back from the students. If I’m getting a lot back from the students, I feel connected and less exposed, so I relax and take more risks. When I get less back from students, I talk more and feel more exposed and anxious. I want to focus this semester on watching the students more, no matter my mood, setting aside whatever anxiety I feel to really see what they are doing. It’s harder than it sounds, at least for me.

After a class, I tend to get a dip where I worry about both my performance and the students’ performance. What did they get out of the class? Are we moving in the right direction? Here I find that minute responses can help, because at least I have information from students, and for me data is often an antidote to anxiety and that feeling of exposure. Even better are real conversations with students directly after class, and I want to make more of those happen. Checking in with students after class can lead to a great dialogue and a chance to offer support. I also feel relief after teaching: another class is over, and I don’t have to start that cycle of planning, execution, and evaluation for another couple of days.

Why Shame? Why Mathematics?

Shame is a painful and disruptive emotion in which a person feels a deep-seated failure or flaw in their core self; it is often experienced as feeling exposed, small, or worthless, or as wanting to withdraw or even die. Although shame can occur in private or in public, it is an emotion that signals a threat to our social being, and it can be characterized as feeling unworthy of human connection. Scheff and Retzinger make a case that shame is the “dominant emotion in social interactions,” but note that this shame is often unacknowledged and unclaimed.1 They note that, “Since one’s relationships and emotions don’t show up on a résumé, they have been de-emphasized to the point of disappearance. But shame and relationships don’t disappear, they just assume hidden and disguised forms.”2

Shaming experiences can happen in all school learning, but students learning mathematics may be particularly vulnerable to such experiences. In a traditional mathematics classroom there is little ambiguity or room for interpretation in problems, and the learning is focused on products, rules, and algorithms. This “right or wrong” nature of mathematics can prevent students from saving face, or otherwise deflecting shame experiences, and can trap students who are struggling in a repeated cycle of negative experiences that are eventually felt as a flawed self. Doing mathematics requires a student to perform in ways that call into question not just her memory, but also her understanding and intelligence, both because mathematics requires the performance of mental skill and because mathematical competence is seen as a stand-in for overall intelligence and ability. As Tamara Bibby says in her paper on shame in mathematics, “It is important to be seen to be able to do/perform mathematics, i.e. ‘do it’ right quickly and efficiently—preferably mentally or with a neat paper and pencil algorithm with as little mark making as possible and with an exact answer.”3

Mathematics is seen as an objective judge, and this aspect of judgment may contribute to the experience of shame. Unlike in other subjects, in mathematics there is often no room for other points of view. In science, the interpretation of data may lead to different conclusions, and theories change as new information comes to light. In history, there are some immutable facts, but there is plenty of room for interpretation through different lenses. In English, interpretation of and interaction with the subject are everything. School mathematics also generally requires students to make a permanent record of their answers as well as the work behind those answers, both of which can make students vulnerable to judgment.

It is clear why anxiety, panic, and fear were first identified as barriers to doing mathematics. Many people feel a crippling panic as they sit down to do math. Laurie Buxton separates this anxiety into what she calls “mind chaos” and what she says is more common in math class, a “paralysis” of the mind.4 Fear is the presenting emotion, but shame is the core emotion, since the fear is that “through an unwitting self-disclosure, you will allow someone to see your ineptitude and so open yourself to ridicule.”5

Many people feel silenced by mathematics, lacking the vocabulary and voice to discuss their ideas and feelings. In mathematics classrooms, the discourse is generally out of the student’s control. In everyday conversation, students can manage their own self-disclosure and are likely to be engaged with a supportive other who will acknowledge the separate reality of the speaker. But in a mathematics classroom, it may be impossible to keep some aspects of work private, and the discussions are around things that are right or wrong, with no room for management or hiding.
