I feel the need to have a place to collect what little I can learn about COVID-19. I am on an immunosuppressive therapy (Ocrevus) that increases my risk for pneumonia and other illnesses. Many people with MS are on immunosuppressive therapies, but not all of us. You should know your own situation and ask your neuro if you need help assessing risk. If you get other recommendations or have other sources of info please comment!

Some links

Good summary of COVID-19:

Informative article about severe cases:

Dr. Aaron Boster on COVID-19 (leaves out Ocrevus), and here’s COVID-19 and Ocrevus:

Barts MS on COVID-19: https://multiple-sclerosis-research….0/02/pandemic/

Barts MS on Italian response to community infection by MS neuros: https://multiple-sclerosis-research….over-covid-19/

WHO situation reports on COVID-19 — here you can get info on spread by country, recommendations, updated regularly:…ation-reports/

CreakyJoints COVID-19 page with info and advice:

How to wash hands from WHO (great video but leaves out the step to open door with paper towel):

CDC on COVID-19:

Live data about COVID-19:

Possible recommendations

I have gleaned these from a variety of sources and I am following them because I am on ocrelizumab and I believe that community transmission here will be happening very soon (I’m in Massachusetts in the US). These are specifically for people who are immunocompromised or have other conditions that put them at risk for more serious responses to COVID-19. Of course, the advice of your own doctor or public health department is what you should really listen to!

  1. Consider delaying start of lymphodepleting DMTs such as Ocrelizumab, Alemtuzumab, Rituximab or Cladribine. Talk to your doctor before you stop any DMT — depending on your disease stopping a DMT could be an awful idea!
  2. Avoid crowded places such as cinemas, theatres, schools, etc. Avoid travel on public transit and airplanes if possible. Stay at least six feet away from other people whenever possible.
  3. Work from home if you work!
  4. Italian neuros are recommending that those on immunosuppressive infusion therapies use protective surgical-grade masks. I don’t see how I could get one of those, but if you’ve got one, use it.
  5. Use caution when touching doors, light switches, handrails, etc. outside of the home. I have latex gloves that I wear when out. Hand sanitizer also helps when you’re out in the world, as does carrying tissues or paper towels for opening doors.
  6. Use disinfectant wipes around the house and out in the world. I wipe my computer, surfaces, lightswitches, door handles, phone twice a day. Wipe your mobility aids too!
  7. Wash hands with soap whenever you return home, before eating or cooking, after bathroom.
  8. Carry hand sanitizer and use it at home and while out. We will run out in another couple of weeks, so I have alcohol and aloe on order to make my own (WHO’s directions here:…Production.pdf)
  9. Cough or sneeze into a disposable tissue and discard. Your sleeve can keep germs for a looooong time.
  10. If I had a disposable mask I’d be using it to stop myself from touching my face (but not to protect me from germs, because that doesn’t work).
  11. I have some zinc lozenges and will probably start taking them at some point due to paranoia. There is some evidence that zinc lozenges help with coronaviruses in general, but they are not a cure-all.
  12. Stock up on food that is shelf-stable.
  13. Gyms and swimming. I can’t really find any good info about swimming risk, but did find this:…irus-in-water/. Overall, gyms and pools are probably best avoided for now if you have immune issues. I found a second source that says pools should be chlorinated enough to stop spread.
  14. Should members of your family that live with you also restrict their travel and movements? I have not found any articles discussing this. My husband is working from home right now, my kids are still going to school, and the whole family is restricting travel. If things get more intense here we might do some further restriction.
  15. Don’t forget to take care of yourself. I found this article for cancer patients that talked some about exercise, getting outside, good nutrition, and sleeping. From the article: “‘Sleep deprivation is one of the most potent ways of suppressing the immune system,’ Lyman said. ‘Everybody has a different threshold but if you’re not getting a minimum of six or seven or, ideally, eight hours of sleep a night, there’s demonstrable scientific evidence that the immune system may be compromised.’”
  16. If you have cold/flu symptoms, call your doctor and talk to them. If you are immunocompromised, make sure they know that as they advise you on what to do next. If your symptoms start to get more serious, escalate your calls accordingly. Here’s what I found about what they might do: “There is currently no specific treatment or vaccine against coronavirus-caused respiratory illness. Supportive care is the mainstay of management for all patients including the ones confirmed with SARS, MERS or 2019-nCoV. Oxygen, IV fluids and possible mechanical ventilation may be warranted for all patients with severe clinical presentation. Several antiviral treatments including ribavirin, interferons and the anti-HIV combination lopinavir/ritonavir or remdesivir are under investigation for use against MERS-CoV infection and have been initiated against the 2019-nCoV” (from…-coronaviruses).



New Data Science Bot

Just wanted to post this quick tweet from the new data science Twitter bot I created. I’ll be posting some details about its creation later this week. For now, just enjoy the absurdity.

troll on a computer

Finding Trolls on Twitter

A couple of months ago, I left my job as a math professor and joined Insight Data Science in order to start a new career in data science. Why on earth would I do that, you ask? I enjoyed my job as a professor, and I was even (arguably) good at it. I got to do fun things with students. The truth is simply that I wanted to work on real problems with good people, and Insight has been so much fun that I am sad now that it is coming to a close (but I am excited to see where I will land next).

Our first task at Insight is to select a project. There were some consulting projects available, but I ended up deciding to strike out on my own, trying to better understand trolls on Twitter. I’ve been a Twitter user for several years, and I am well aware of the problems that trolls cause for users of Twitter and other social media.

I decided I wanted to take a machine learning approach to finding trolls. My first stab at the problem was to use shared block lists. Twitter allows users to block other users, and in the past couple of years, Twitter has started allowing users to share these block lists. There is one service helping with this sharing called @blocktogether. So I collected multiple blocklists (as a result, I currently block 250K users), and my hope was that I could use these to curate a list of trolls. I was able to identify a number of interesting things about blocked users vs non-blocked users. For instance, blocked users tweeted less per day on average but had more followers, so apparently we are not following the rule of “don’t feed the trolls.”

A few things got in the way — I found that the lists coming from @blocktogether cut off the number of users they would allow me to download. But even worse, I began to suspect that even without my download problems, I would still struggle to find a sufficient number of users on large numbers of blocklists (twitter is a very big place after all). I would also have no way of knowing if a user appeared on multiple blocklists simply because the blocklists were being shared, rather than because users independently decided to block them.

So, I decided I had to come up with my own definition of trolling and to use a rules-based approach to identify trolls. My first stab at that was to say that trolling was repeatedly mentioning a user in a tweet with a negative sentiment. I was able to identify these trolling tweets, and I made a machine learning model to predict which users were trolls. The model did fine, but as I looked through the users identified as trolls, I realized that I was really doing a good job of finding arguments on twitter, but a less good job of finding users I would truly consider trolls.

As I thought about the problem, I landed on one older example of trolling that helped me change my perspective. This was the case of Robin Williams’s daughter, who posted a tribute to her father and was set upon by a pair of trolls telling her things like “I hope you choke and have to scream out for help but no one helps you and they just watch you choke.” I looked back on other examples I had of real trolling and realized that most of them involved saying “you”: these were personal attacks that needed the second-person pronoun. At that point, I changed my criteria for trolling to require at least two mentions of the same person, negative sentiment, and use of the word “you” (or similar words like “your”).

With these criteria, I was able to find a number of users that I consider trolls. These were users on my blocklist who also engaged in trolling behavior in their last 200 tweets. That is, they mentioned a specific user at least twice, used “you” language in those tweets, and those tweets had negative sentiment (I used the VADER Python package to get the sentiment). Out of about 10K users on blocklists, I found that 44% had trolled by this definition. I also needed “human” tweets, so I grabbed a random collection of people who were on positive lists (so that they were about as engaged as the trolls) using the Twitter API. It turns out that about 7% of those had engaged in trolling behavior, so I threw those users out.
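The trolling rule can be sketched roughly like this. The sentiment check here is a crude keyword stand-in for the VADER compound scores the project actually used, and all names and thresholds are illustrative:

```python
import re
from collections import Counter

def is_negative(text):
    # Crude stand-in for a real sentiment model; the project used VADER's
    # compound score (e.g. treating scores below a threshold as negative).
    negative_words = {"hate", "worst", "choke", "stupid", "awful"}
    return any(word in text.lower() for word in negative_words)

YOU_WORDS = re.compile(r"\b(you|your|yours|you're)\b", re.IGNORECASE)

def trolled(tweets, min_mentions=2):
    """True if some single user is @-mentioned at least min_mentions times
    in tweets that are negative AND use 'you' language."""
    counts = Counter()
    for text in tweets:
        mentions = re.findall(r"@(\w+)", text)
        if mentions and YOU_WORDS.search(text) and is_negative(text):
            counts.update(set(mentions))
    return any(c >= min_mentions for c in counts.values())

print(trolled(["@ann I hope you choke", "@ann you are the worst"]))  # True
```

A tweet has to pass all three filters (mention, “you” language, negative sentiment) before it counts toward the per-target tally.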

From these categorized users, I collected tweets without mentions (since I had used the mentions to determine whether they were trolls or not). I split those tweets into a training and a test set so that I could see if I could predict the trolls from their no-mention tweets. I turned their tweets into a bag of words and bigrams, vectorized them using TF-IDF, and fed the vectors into a logistic regression model. My model was a pretty good predictor, and it gave me the collection of words that were most predictive, which was illuminating. Basically, trolls are talking about Gamergate (still!) and politics.

Now, here are my words of caution. I need to update this model regularly, because the topics trolls are talking about are going to change, so my model needs to change to catch today’s trolls. I also still need to refine the model a bit. My model looks for negative sentiment, but saying “I’m really angry at you” isn’t trolling, it’s expressing a negative feeling. I need to search for hate speech, and I have not yet implemented this in the model. I would also love to do some tests to see how little text I can use to identify a troll. What if I just grabbed a couple of tweets plus the user’s description? Is that enough? And finally, if I wanted to really deploy this model to catch trolls, I would need to be sure that it errs on the side of avoiding false positives in order to protect the free speech of Twitter users. I have not explored different thresholds for this model (although note that it has a nice ROC curve).

If you are interested in using the app I created, find it at You can also follow me on twitter (I’m not a troll, I promise).

Predicting Sentiment from Text

After having scraped and analyzed some professor review data, I wanted to know if I could predict sentiment from the text. Reading the reviews as a human, it certainly seems like you can tell a good review from a bad review without looking at the overall score, but could I do this through a machine learning algorithm? First, I used overall score to distinguish good reviews from bad (see the spread here). Since there are so many “5” scores, those became the good reviews. Then I counted “1” and “2” scores as bad reviews and threw out the rest of the scores because they were ambiguous.

To use the text to predict the sentiment, I decided to use a “bag of words,” in which I would disregard grammar and word order and just count how often a word appeared in a review. I also threw out “stopwords,” which are common words like “these” and “am.” This loses a lot of information, but it can make the analysis problem much more computationally tractable. Each of these words then becomes a feature that can help us predict the sentiment.
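As a bare-bones illustration of the bag-of-words step (the stopword list here is a tiny sample; real pipelines use a much longer one from a library like scikit-learn or NLTK):

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real lists have hundreds of entries.
STOPWORDS = {"these", "am", "the", "a", "is", "and", "this", "i"}

def bag_of_words(review):
    # Lowercase, split into word tokens, drop stopwords, count the rest.
    tokens = re.findall(r"[a-z']+", review.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

print(bag_of_words("This professor is the best. Best lectures, and great labs!"))
```

Each remaining word count becomes one feature column that the classifier can use.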

One way to predict the sentiment from these features is to form a decision tree. For example, I could predict sentiment with the tree below. This predicts sentiment correctly about half the time. To do a better job than this, we can sample the reviews (and the features) and use each sample to make a different decision tree. We train a decision tree by splitting the space of features up in the best way possible (so that the bad and good reviews are separated, as unmixed as possible). We do this splitting by looking at all of the values of the features and deciding where to place the split, then we evaluate how good the split is and move on to the next candidate. Eventually we are able to choose among the splits, selecting the best one. For example, for a particular sample, it could end up that the best first split is whether the review contains the word “worst.” Then we proceed iteratively, looking at the two buckets of reviews that we have and deciding how to best split those, and so on. This will train a single tree (for instance, like the sample decision tree pictured).

But one tree isn’t good enough, so we select another sample of reviews and a sample of features and do this recursive “best” splitting again and again, with each sample making a new tree. We end up with a whole “forest” of trees, and we use this forest by running a new review through each tree, determining whether each tree says the review is bad or good, and then going with the majority of trees to predict the sentiment of this review.

I implemented this with the RandomForestClassifier from sklearn and it is pretty accurate, around 94% on the data I set aside for testing the model. You can find the code in my GitHub (look at sentimentFromText.ipynb).
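A small sketch of that setup, with invented toy reviews standing in for the real scraped data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "worst class ever avoid at all costs",
    "boring lectures and unfair grading the worst",
    "great professor with clear and helpful lectures",
    "amazing class learned so much highly recommend",
]
sentiment = [0, 0, 1, 1]  # 0 = bad review, 1 = good review

# Bag of words with English stopwords removed.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)

# Each tree fits a bootstrap sample of reviews, considering a random subset
# of features at each split; the forest predicts by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, sentiment)

print(forest.predict(vectorizer.transform(["boring and unfair the worst"])))
```

The real model was, of course, trained on the full set of good and bad reviews rather than four toy examples.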

Professor Reviews and Learning Python

Last fall, I decided to learn Python, with a desire to analyze text and implement some machine learning. So I started by learning BeautifulSoup and using it to scrape a professor rating site. The project went well, and I was able to write some code (that you can find on my GitHub). I got the hang of scraping and wrote code to collect numeric and text information from reviews of professors by school or by state.
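A minimal BeautifulSoup sketch of the kind of parsing involved; the HTML snippet and class names below are invented for illustration, not the real site’s markup:

```python
from bs4 import BeautifulSoup

# Invented markup standing in for a fetched review page.
html = """
<div class="review">
  <span class="score">4.5</span>
  <p class="comment">Great lectures, tough exams.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
for review in soup.find_all("div", class_="review"):
    score = float(review.find("span", class_="score").get_text())
    comment = review.find("p", class_="comment").get_text()
    print(score, comment)
```

In practice you fetch each page (e.g. with the requests library), hand the response text to BeautifulSoup, and loop over whatever elements actually hold the scores and comments.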

Next, I began to analyze those reviews. I started the project intending to look at gender differences in ratings, following other reports of differences, such as Sidanius & Crane, 1989, and Anderson & Miller, 1997. So I had to have the gender of the professors, something that was not available in the dataset I had scraped. I decided to use pronouns to assess gender, and in cases where there were no pronouns in the text or the pronoun use was unclear, I assigned gender based on name.
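Roughly, that assignment logic looks like this; the name table here is a tiny hypothetical stand-in for a real name-frequency dataset:

```python
import re

# Hypothetical stand-in for a real name-to-gender lookup table.
NAME_GENDER = {"susan": "F", "james": "M"}

def assign_gender(review_text, first_name):
    # Count gendered pronouns in the review text.
    she = len(re.findall(r"\b(she|her|hers)\b", review_text, re.IGNORECASE))
    he = len(re.findall(r"\b(he|him|his)\b", review_text, re.IGNORECASE))
    if she > he:
        return "F"
    if he > she:
        return "M"
    # No pronouns, or ambiguous pronoun use: fall back to the name.
    return NAME_GENDER.get(first_name.lower())

print(assign_gender("She explains everything clearly.", "Susan"))  # F
print(assign_gender("Great class, fair exams.", "James"))          # M
```

Reviews where neither the pronouns nor the name settled the question were the ones discarded from the analysis.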

I compiled reviews for schools in Rhode Island, New Hampshire, and Maine, discarded reviews with no gender assigned, and began by looking at differences in numeric rankings, which include an overall score and a difficulty score. Male and female professors have nearly identical mean scores: women’s mean overall is 3.71 and men’s is 3.75, while women’s mean difficulty is 2.91 and men’s is 2.90. The overall and difficulty means for each professor are correlated, as you can see here:

Plot of histograms, scatter with linear, and residuals

Interestingly, women seem to have fewer reviews than men. On average, female professors have about 9 reviews and male professors have about 12. This difference seems to be stable when looking at each individual year, and could be due to male professors teaching larger classes (but I have no data on that). The end result is that there are far fewer reviews of female professors. In my dataset, there are 107,930 reviews of male professors and just 64,799 reviews of female professors.

The data set also has a self-report of grades from most reviewers. You can see in the data that overall scores go down when reviewers get a bad grade, but women seem to be hit harder by this than men.

Bar graph showing overall mean score by grade and gender.

Note also that far more reviewers report receiving high grades. In fact, over 160,000 of the 173,000 reviews in my dataset report getting A’s.

The overall scores show a bimodal distribution, as you can see in the histogram of overall scores (reviewers can report scores from 0 to 5 with half-points possible). The next thing I decided to do was to categorize these reviews as positive or negative, getting rid of reviews in the middle, and then to do some analysis of the text in those reviews. I’ll report on that next.

Histogram of overall scores, showing bimodal distributions for both men and women.


Design for Learning Stats

From my course blog for Math | Art | Design.


A student at Brown, Daniel Kunin, has created a terrific visual resource for explaining statistics. It is called Seeing Theory, and it is hosted at Brown. Under the hood, it features Mike Bostock’s JavaScript library for creating visualizations, D3. For anyone interested in visualizing quantitative information, it’s a delight!


Reading Mathematics: Click/Clunk

Years ago, I did some work to help students read mathematics textbooks. I gave a presentation on the material at an NCTM conference and wrote a piece for students that is still in use by the Harvard Bureau of Study Council. I was recently reminded of the work because of an email I received about it, so I’m going to look at making use of the work again, perhaps writing up something additional about it and getting the work out more broadly. It is based on the “click/clunk” method, which is used in some approaches to reading instruction.


I’m learning about various machine learning algorithms, so I want to record what I have learned and where I learned it, in part so that I can relearn it after I inevitably forget it!

The perceptron is “baby’s first neural network.” It can be used successfully for learning binary classification of data that is linearly separable. The basic idea is that you have some training data that comes to you as vectors. You can start by guessing a weighting for those vectors, which is basically a guess at the subspace that separates your data (the weighting gives the normal vector for the subspace), or you can just initialize the weights to 0. Then you look at a random data point. First, you have to see how your current perceptron categorizes the data point, which you can do by taking a dot product of the weight vector with the data point vector and just looking at its sign.

If it is incorrectly classified, you need to adjust your weighting, which you do by adding or subtracting (a scaling of) the current data point’s vector, depending on its true label. This gives the normal vector a bit of a bump, moving you toward correctly classifying that point. Then you pick another point and do the whole thing again. You are continuously adjusting your weights, so presumably your perceptron is getting better all the time. It is also useful to note that you need some kind of activation function to distinguish between correctly and incorrectly classified data points, and it seems pretty typical to use a threshold step function.
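Putting the loop above into code, a minimal perceptron sketch (labels ±1, threshold activation, learning rate as the scaling factor) might look like:

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_iter=1000):
    w = np.zeros(X.shape[1])  # initialize the weights (normal vector) to 0
    b = 0.0                   # bias term
    for _ in range(max_iter):
        errors = 0
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else -1  # threshold activation on the dot product
            if pred != yi:
                # Misclassified: bump the normal vector toward the point's true side.
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return w, b

# Linearly separable toy data: class +1 above the line y = x, class -1 below.
X = np.array([[0, 1], [1, 2], [1, 0], [2, 1]], dtype=float)
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(all((1 if xi @ w + b > 0 else -1) == yi for xi, yi in zip(X, y)))  # True
```

Here every training point ends up correctly classified, which only happens because the toy data is linearly separable.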

Does this process ever end? Yes, it will, provided that your data is linearly separable. You can even find the proof here. How long does it really take? I don’t know. Presumably it’s not the worst thing to do, since lots of people reference it. What if your data isn’t really linearly separable? Well, it will go on forever, so you had better pick a maximum number of iterations. Will it give you something reasonable after a reasonable number of iterations if your data is linearly separable-ish? No idea, but it seems like it might.

I read several useful pieces to figure out what I do know. I found this material from a presentation in a graduate course on machine learning (there’s a lot of other interesting stuff on the webpage for the 2007 course). I also relied heavily on this material on the perceptron from a CMU course in computer vision. Both of these sources have useful illustrations that I decided not to replicate here, so you should go look at them. The Wikipedia page on the perceptron had some good material, and I got a little curious about the history, so I read

Woman sitting in a motorized chair

#9: Park on a Hill

When I was writing my dissertation I was part of the PhinisheD community, and there I learned the concept of “parking on a hill.” The idea is to not end your day’s work with the completion of a task, but rather to end it with a task started, ready to move on the next day. Much in the way that gravity will help you get started if you park a car on a hill, the gravity of having a task in progress can help you get started when you park a project on a hill.

This is a technique I could use more of in my life right now, so I am going to record some of my thoughts on this technique here. Right now I am working on a paper that I think I’ve been writing for three years. I’m very slow. This is a “back burner” project which means I often go for days or weeks without touching it. So if I can be sure to have it set up with something that will move me forward whenever I leave it then I think two things will happen. First, part of my brain will already be engaged in the next idea or task, which may push me toward getting back to the project and may help good ideas to percolate in my brain. Second, if I have jotted down something that will help me get started I may find it easier to remember and reconnect with the project the next time I sit down.

Here are some ideas for parking on a hill:

  • Write half a sentence to give yourself a jumping off point. This can produce some anxiety — you probably have an idea of how you are going to finish the sentence, but what if you never get that idea back? But the fact is that leaving openness allows something new to come into your work. If you over-plan your next steps you may miss the exciting accidents and new ideas that come to you when you least expect it.
  • Briefly outline the thing you want to do or write. Don’t over-plan since that won’t produce energy or excitement, but do be sure you have a next job that is easy and fun to dive into.
  • Ask a question. For instance, I am working on developing some materials for a class this fall, so I might end my work day with this question: “How can I refresh students’ understanding of percentages without talking down to them and ignoring the understanding they already have?” Then I can start my day with some freewriting in response to that question, which may help me set up the activities I want to develop for the class.