Syndromic Surveillance & Covid-19 Caveats

This article is part of the Syndromic Surveillance and Covid-19 collection on Thoughtfaucet.

There are caveats to all data projects. I do not believe any of these undermine the work and thinking. But they are important to note (and I hope you mention other caveats as well–it improves the project) and discuss if necessary. But all the same, remember that John Snow removed the pump handle for a reason.

Syndromic Surveillance Factors

  • Syndromic surveillance tools such as EpiQuery track what people say upon arrival at the emergency department. They specifically do not show the diagnosis or what happens after patients arrive. Neither Influenza-like Illness nor Respiratory cases are necessarily Covid-19 cases.

Human factors

  • I entered this data by hand into a spreadsheet. I could have made an error. I double-check everything, but I am still human.
  • The data itself has the caveats of data entry and maintenance. While working on this project I noticed that the download CSV version differs from the web interface data. Not by a giant amount, but it’s definitely not the same. This project uses the web interface version which I can confirm is updated after it is initially entered.
  • The data is encoded by humans in the emergency departments.

Disease Factors

  • Some of the aspects of Covid-19 present in ways which may result in their encoding going into various parts of EpiQuery that are not being observed by my methods.
  • The things we know about coronavirus are still emerging. Some of the key data for these graphs may change, such as the doubling rate. How the disease progresses appears to be very dependent on the human behavior in relation to it and how well medical systems cope with the surge in demand.

Society Factors

  • Increases after March 1 and March 10 could be people getting inspired by announcements, of first case and first local death respectively, to check into the ER. This is a critically important caveat for this and all syndromic surveillance methods.
  • Communication about what to do, when to do it, whether to do anything at all has been mixed and sometimes conflicting at the federal, state, and local level. These communication issues likely have an influence on whether people show up at the emergency department or not.
  • Maybe it’s all just some artifact of an aging population.

Unknown Factors

  • The increase we see may not be Covid-19. Maybe it’s a fluke. Maybe it’s some other influenza-like thing which has different characteristics.

Syndromic Surveillance: Tracking Covid-19 via Respiratory ER Visits

Growth of reported cases of Respiratory issues in the ERs of NYC, most likely related to Covid-19

The above graph uses the NYC Health EpiQuery data (Respiratory case counts) as of 11:50pm EST March 27, 2020 and the NYC Health & Mental Hygiene’s 2019 Novel Coronavirus (Covid-19) Daily Data Summary (tested positive Covid-19 and Covid-19 deaths) as of 4:00pm March 27, 2020. This page is updated regularly with new data as it becomes available.

This article is part of the Syndromic Surveillance and Covid-19 collection on Thoughtfaucet. Please see the section on Caveats if nothing else in the collection.

Definitions related to Estimating Future ER Load graph:

The actual data is the moving, lighter weight solid line beneath the curved trend line. These points are calculated for each day based on the following from EpiQuery: ((Respiratory 18-64 2020)+(Respiratory 65+ 2020))-Average(((Respiratory 18-64 2017)+(Respiratory 65+ 2017)),((Respiratory 18-64 2018)+(Respiratory 65+ 2018)),((Respiratory 18-64 2019)+(Respiratory 65+ 2019))). Triangular points on the lighter weight line enclose weekends.
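
For readers who want to reproduce the calculation, here is a minimal sketch of the formula above in Python. The function names and the layout of the input dictionary are my own assumptions for illustration, and the numbers are made up; they are not real EpiQuery values.

```python
from statistics import mean

def adults(counts, year):
    """Sum the two adult age buckets (18-64 and 65+) for one year."""
    return counts[year]["18-64"] + counts[year]["65+"]

def additional_cases(counts, current_year=2020, baseline_years=(2017, 2018, 2019)):
    """Adult count for the current year minus the mean adult count of the baseline years."""
    baseline = mean(adults(counts, year) for year in baseline_years)
    return adults(counts, current_year) - baseline

# Hypothetical counts for one calendar day (NOT real EpiQuery data).
one_day = {
    2017: {"18-64": 500, "65+": 100},
    2018: {"18-64": 520, "65+": 110},
    2019: {"18-64": 540, "65+": 120},
    2020: {"18-64": 900, "65+": 200},
}
excess = additional_cases(one_day)
print(excess)
```

Running the same function once per calendar day would give the series of points that the lighter weight line connects.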

The bar chart shows the confirmed cases per day, made by taking a given day’s total case count as reported in NYC Health & Mental Hygiene’s 2019 Novel Coronavirus (Covid-19) Daily Data Summary and subtracting the total reported for the previous day.
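
As a rough sketch of that subtraction in Python (the function name and the sample totals are hypothetical, not figures from the Daily Data Summary):

```python
def daily_new_cases(cumulative):
    """Turn running totals into per-day counts.

    The first day has no previous total to subtract, so the result
    is one element shorter than the input.
    """
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]

# Hypothetical running totals for four consecutive days.
totals = [100, 150, 230, 400]
new_per_day = daily_new_cases(totals)
print(new_per_day)
```

Each value in the result corresponds to the height of one bar.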

Dots along the bottom indicate one death each. These are currently gathered via NYC Health & Mental Hygiene’s 2019 Novel Coronavirus (Covid-19) Daily Data Summary.

The loadband is a series of shaded bands. The darkness of each band indicates how many days within the Dec 12, 2019-February 15, 2020 timeframe had that many cases of ILI, all ages. A median line is also present. This element is then transposed to the relevant date range and baseline. This loadband summarizes the peak months of the 2019/20 flu season activity for emergency departments in NYC. Data source is EpiQuery. See discussion of the baseline below re: why this is angled slightly.

The baseline is an average of people 18 and older presenting with Respiratory issues at a NYC emergency department 2017-2019. All lines and bars are relative to this figure in order to show variation from normal seasonality. Note that the number of people reporting respiratory problems increases as the Spring comes on (allergies, etc). You can see the mild spring increase in the slight angle of the loadband and the peak performance line, as these both represent static numbers vs the increasing level of the baseline.

FAQ on the Coronavirus/Covid-19/ER graph

  1. Didn’t you used to chart vs Influenza-like Illness? Why did you switch to Respiratory? The messaging within NYC is now that if a patient has a cough or fever they should stay home. But if they have breathing problems, they should come in. As New Yorkers begin following that instruction the Influenza-like Illness case count may be decreasing, indicating a false decline in severity of the Covid-19 threat. Respiratory issues continue to climb and so I have switched. In addition, the aspect of Covid-19 that is most dangerous is respiratory as machines become scarce.
  2. Why do some of the lines slope up? This chart shows the increased workload in the emergency department. If Covid-19 were not in NYC people would still be coming to the emergency department for respiratory problems like allergies etc. The baseline of the chart is the average of 2017-2019 of each day. As spring arrives, slightly more people came in with respiratory issues in 2017-2019. The sloped lines of the loadband and the peak performance line are based on static numbers, so in order to show how much more than any given day’s daily average, they slope upwards.
  3. Why is the loadband talking about influenza-like illness when this chart is showing respiratory issues? Treating and interacting with a Covid-19 patient involves a workload similar to an influenza-like illness in that measures to prevent contagious spread must be taken, in fact far more so than with the usual influenza-like illness. This chart element shows just how far above a recent past experience the emergency department is working.
  4. Is this all the deaths caused by Covid-19 in NYC? No. Covid-19 response takes up staff resources and ventilators etc. There are people who do not have Covid-19 but will die because there are no more ventilators left for them, no clean rooms to perform emergency procedures, no staff available to take them in. Additionally, there are people who will die because they do not go to the emergency department out of fear of catching Covid-19.
  5. What do you mean “additional?” The curved trend line and wavy actual data line represent how many more cases are showing up this year vs the average of 2017-2019. These numbers are only the additional–more than the average–that show up for the 18 and older people arriving at the ER reporting a respiratory (aka breathing) problem.
  6. What do you mean “Respiratory” symptoms? This is defined by the data set I’m using to make this graph, EpiQuery: “Respiratory includes ED chief complaint mention of bronchitis, chest cold, chest congestion, chest pain, cough, difficulty breathing, pneumonia, shortness of breath, and upper respiratory infection.”
  7. So it isn’t coronavirus for certain? No. It could be people just coming in because they are worried, for example. This project was initiated due to the lack of access, material, and permission for people in the United States to test and screen for coronavirus. This project is not helpful in a test/trace/treat system, unfortunately. It can only indicate what the burden on our medical infrastructure will be if it follows a given pattern.
  8. Is this the entirety of the increase we should expect to see? No. This is only people 18 and older. There will be a few people that are younger who show up as well (See Daniel Weinberger’s project in “Resources” below to look into different age and borough configurations of this data). Also, this is only people who show up in the emergency department. There will be people who come in for treatment via other channels. Also, this is only people who report a respiratory (breathing) problem. There will almost certainly be people who show up with other concerns such as shortness of breath. This graph shows only one, very narrow and specific group of people.
  9. I saw a different version of this graph where the wavy line was different, how come that is? After this data is entered it is continually refined for up to two weeks. Though I do not know why this is, my experience with data projects leads me to believe that any exhaustive data-gathering activity finds and corrects errors afterwards. Each time I update the graph I use the most current data available in the web interface.

Syndromic Surveillance and Covid-19

During the Covid-19 outbreak of early 2020 I started working on some graphs with data about the NYC emergency department case counts. This page is a collection of the different aspects of that project.

Why is it important to understand Covid-19 burden on NYC emergency departments?

The concern with Covid-19 or any pandemic is that it will overwhelm a health system. Once a health system is overwhelmed then people die not only from the pandemic illness. They also die because dealing with the pandemic takes up all of the available beds, equipment, and personnel.

If you get in a car accident but all of the ICU beds are taken up with pandemic illness patients, your chances of dying increase. Or if you are in need of pediatric ICU but one of the doctors in the unit is infected, the unit is now quarantined and this will affect the level of your medical care. Or if the emergency department is crowded with worried people who are not ill and you arrive with a broken foot, your treatment might be delayed due to the heavier administrative burden.

With Covid-19 we don’t have much insight into how prevalent it is in NYC. This is because we have been unable to access testing in a significant way. As a result, we aren’t sure how intense the wave of illness will be. The examples of Italy, South Korea, and China suggest that the wave of illness will be quite intense.

Syndromic Surveillance & Covid-19 Resources

In the process of developing the NYC Emergency Dept Load graph I gathered and read a variety of resources. Some of these are for specialist audiences and others are for more general audiences. This page is an annotated bibliography of the medical journal articles, Twitter threads, and news reports related to the project.

Bernard-Stoecklin, Sibylle and Patrick Rolland, Yassoungo Silue, Alexandra Mailles, Christine Campese, Anne Simondon, Matthieu Mechain, Laure Meurice, Mathieu Nguyen, Clément Bassi, Estelle Yamani, Sylvie Behillil, Sophie Ismael, Duc Nguyen, Denis Malvy, François Xavier Lescure, Scarlett Georges, Clément Lazarus, Anouk Tabaï, Morgane Stempfelet, Vincent Enouf, Bruno Coignard, Daniel Levy-Bruhl. “First cases of coronavirus disease 2019 (COVID-19) in France: surveillance, investigations and control measures, January 2020.” Euro Surveill, February 13, 2020. https://www.eurosurveillance.org/docserver/fulltext/eurosurveillance/25/6/eurosurv-25-6-4.pdf
A study and discussion of the first three cases of Covid-19 in France. This article outlines a standard detection and contact tracing method as employed in France at the time of the Coronavirus outbreak.
expert audience, study, process, medical authorities, covid-19, SARS-CoV-2
Bhadelia, Dr Nahid. (@BhadeliaMD) “Just to break down some #COVID19 issues for those not working in a hospital. First: even if the number of cases in many major US cities are in the 100s (not yet 1000s), hospitals and clinics in those cities are already becoming very busy and stretched. Why is this? [THREAD]” Twitter, March 15, 2020 10:03PM. https://twitter.com/BhadeliaMD/status/1239371589386371078
A thread on the load put on emergency departments when testing isn’t available during a pandemic.
general audience, medical authorities, emergency services, workflow
Christakis, Nicholas A. (@NAChristakis) “Flu pandemics recur reliably but unpredictably every decade or so, and their extent and intensity varies.” Twitter, March 13, 2020 12:21PM. https://twitter.com/NAChristakis/status/1238934000187707400
A thread on seasonal flu patterns. Focused on the 1957 Flu.
general audience, 1957 Flu, flu patterns
Maxmen, Amy. “How much is coronavirus spreading under the radar?”. Nature, March 13, 2020. https://www.nature.com/articles/d41586-020-00760-8
Discussion with experts at WHO, USA CDC, and Wellcome re: estimating amount of spread of the coronavirus which causes Covid-19.
general audience, interview, medical authorities, WHO, Wellcome, USA CDC, spread estimate methods, containment, covid-19, coronavirus
Newman, Kira MD PhD. (@KiraNewmanMDPhD) “After a week of working in the #ICU here in #Seattle caring for patients with #COVID19 (and other illnesses), here are some thoughts.” Twitter, March 15, 2020 3:34PM https://twitter.com/KiraNewmanMDPhD/status/1239273774341447680
A thread of practical lessons learned and general observations from a Seattle ICU doctor.
general audience, emergency medicine, coronavirus testing, patterns of care
Marks, Clifford. “The wait is endless. Supplies are gone. My New York hospital is on the brink.” The Washington Post Washington DC, March 25, 2020. https://www.washingtonpost.com/outlook/2020/03/25/wait-is-endless-supplies-are-gone-my-new-york-hospital-is-melting-down/
A second-year emergency medicine resident discusses the situation in his hospital.
general audience, NYC emergency department, ER burden
Momennejad, Ida. “COVID-19 watch USA: NY, WA, CA, & other states” Medium, March 24, 2020. https://medium.com/@idamomennejad/covid-19-watch-usa-ny-wa-ca-other-states-39117de734ae
A computational neuroscientist compares data from WA, NY, and CA to see what there is to watch in containment strategies and outcomes.
general audience, data, comparisons
Murthy, Vivek. (@vivek_murthy) “Yesterday, I spoke with doctors from one of the nation’s leading academic hospitals located in a state where #COVID19 cases are increasing quickly. This is what they told me: They’ve been seeing *many* patients with symptoms concerning for COVID19 who need testing.” Twitter, March 13, 2020 12:21PM. https://twitter.com/vivek_murthy/status/1238500475068125192
Former Surgeon General Murthy describing needs of medical systems based on conversation with people who run leading health care systems.
general audience, medical authorities, health institution needs, medical supplies
Offenhartz, Jake. “‘I Am Dreading My Next Shift’: An NYC E.R. Doctor Speaks Out About The Escalating Coronavirus Crisis” Gothamist New York, March 13, 2020. https://gothamist.com/news/nyc-emergency-room-doctor-coronavirus-crisis-interview
An anonymous interview with an ER Doctor about conditions in NYC emergency departments as of March 13, 2020.
general audience, interview, NYC emergency department, ER burden
Reich, Nicholas G. and Evan L. Ray, Graham C. Gibson, Estee Cramer, Caitlin Rivers. “Looking for evidence of a high burden of COVID-19 in the United States from influenza-like illness data.” Reichlab UMass-Amherst, March 13, 2020. https://github.com/reichlab/ncov/blob/master/analyses/ili-labtest-report.pdf
A study of publicly available data from a variety of health providers on Influenza-like Illness (ILI), all age groups, to see if non-influenza ILI is making a larger portion of ILI visits vs the same time previous years. Data is through March 7, 2020.
expert audience, study, data, medical authorities, influenza-like illness, ILI, covid-19, SARS-CoV-2
Rivers, Caitlin. (@cmyeaton) “I’ve been working with @reichlab and colleagues weekly to assess whether levels of ‘influenza like illness’ that are NOT due to influenza are higher than usual in the US.” Twitter, March 13, 2020 7:50PM. https://twitter.com/cmyeaton/status/1238613507559624709
Brief Twitter thread of the Reich et al study with a focus on the primary chart involved.
general audience, chart, medical authorities, influenza-like illness, ILI, covid-19
Weinberger, Daniel. “NYC ED syndromic surveillance” Weinberger Lab Yale School of Public Health, March 14, 2020. https://weinbergerlab.shinyapps.io/NYC_syndromic/
An interactive tool to view EpiQuery data with graphs for each age bucket and pull-downs to select ILI or Resp(iratory illness) and specific borough. Compare this year against a model adjusting for seasonality, influenza activity, and RSV activity.
data tool, medical authorities, influenza-like illness, ILI, respiratory illness
Wu, Zunyou, MD, PhD and Jennifer M. McGoogan, PhD. “Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention” JAMA, February 24, 2020. https://jamanetwork.com/journals/jama/fullarticle/2762130
Data re: the Wuhan outbreak. A clear display of the lag-time between true cases and confirmed cases. Also it shows the relationship between measures taken to “flatten the curve” and outcomes. Additional statistics on case fatality rates and other characteristics of Covid-19.
expert audience, study, data, medical authorities, Wuhan, covid-19

Loadband, an information design pattern for showing intensity of real world factors

While working on my “Estimating Future ER Load” information design I wanted a way to show how many cases of influenza-like illness an emergency department was capable of handling over a given time.

One way of doing this might be to examine previous flu seasons and show what that workload was like relative to the current environment.

With this goal and data in mind I crafted what I’m going to call a loadband. Not to be confused with the excellent chamber music group Loadbang (recommended listening when you want something other than all of this to think about).

Loadband shows an area of workload activity to visualize a sense of the relative difficulty.

To make this loadband I took the data for the peak of the previous flu season: Dec 12, 2019-Feb 15, 2020 from EpiQuery (NYC Health & Mental Hygiene). I found the median case count for that period and set a solid line.

The case counts per day for this time period ranged from the 400s to the 1100s. I made bands that were 100 cases tall and set the darkness of each band relative to how many days had cases/day which fell within that range.
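
The construction described above could be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the function name is hypothetical, the sample counts are invented, and scaling darkness against the busiest band is one possible choice, not necessarily the one used for the published graph.

```python
from collections import Counter
from statistics import median

BAND_HEIGHT = 100  # each band is 100 cases tall, as described above

def loadband(daily_counts, band_height=BAND_HEIGHT):
    """Return (median, bands).

    bands maps the lower edge of each band (400, 500, ...) to a darkness
    between 0 and 1. Darkness here is the number of days in the band
    divided by the number of days in the busiest band (an assumption).
    """
    days_per_band = Counter((count // band_height) * band_height for count in daily_counts)
    busiest = max(days_per_band.values())
    bands = {low: days / busiest for low, days in sorted(days_per_band.items())}
    return median(daily_counts), bands

# Hypothetical cases-per-day values, NOT real EpiQuery data.
sample = [450, 620, 910, 930, 980, 1010, 1120, 940]
mid, bands = loadband(sample)
print(mid, bands)
```

Bands with no days in them are simply absent from the result, which corresponds to leaving that range unshaded.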

In this way, readers can see that the peak of the immediately preceding flu season had quite a few cases per day and that at the peak they tended to range more in the 900s and up.

My hope with the loadband is that it gives a sense of what an emergency department has done in the past over a sustained period of time so we know when they are exceeding past performance and in entirely new territory.

Coronavirus: Estimating future ER load in NYC

A chart comparing additional cases per day for Influenza-like Illness, 18 and older, visiting NYC emergency departments with tested cases of Covid-19 March 24, 2020
Additional cases per day for Influenza-like Illness, 18 and older, visiting NYC emergency departments. March 24, 2020

The above graph uses the NYC Health EpiQuery data (ILI case counts) as of 1:47pm EST March 24, 2020 and the NYC Health & Mental Hygiene’s 2019 Novel Coronavirus (Covid-19) Daily Data Summary (tested positive Covid-19 and Covid-19 deaths) as of 9:45am March 24, 2020.

This article is part of the Syndromic Surveillance and Covid-19 collection on Thoughtfaucet. In particular see the Caveats section.

Discussion

While case counts are up today vs the weekend (which is common given how we interact with the medical system culturally), the rise isn’t as steep as it was after the previous weekend. Today is the first day since this project began that estimates of crossing the record-breaking levels of ILI have moved further away, into the future.

This follows on Dan Weinberger making note that there was a change in the data available via his modeling yesterday:

Respiratory cases, however, continue to increase. So perhaps this decline in rate of change is a result of the messaging re: staying home for cough/fever getting through.

At the current pace, sometime around March 26th emergency departments could begin seeing back-to-back record numbers of case counts per day.

As of March 18, 2020 emergency departments were operating at the peak of this year’s regular influenza season. Importantly, the work required–staff protective equipment, room and equipment cleaning, etc–of the emergency department is much more intense than a regular influenza season (see Offenhartz and Newman in resources below).

If the experience of Wuhan or Italy is an accurate guide, the case count will continue to increase for 12 days after drastic social distancing (i.e. lockdown) conditions are in place in NYC.

In this graph we see two predictions based on data from NYC Health and Mental Hygiene.

  • Predicted case count above average (2017-2019) if cases before the first NYC-area death was announced are Covid-19 related: dotted curved line. This prediction did not come to pass.
  • Predicted case count above average (2017-2019) if case count grows according to the trend without any multiplier: solid gray line.

For reference, the case count of the heaviest day for Influenza-like Illness is included.

There is also a loadband showing the typical cases-per-day of ILI, all ages, during the peak of the 2019/2020 flu season (which actually ends just as this chart is beginning).

These two elements provide a measure of how burdened the emergency department is relative to other challenges they’ve overcome.

Worried people

Even if the unusual lift in Influenza-like Illness, 18+ this year is not related to Covid-19 (for example, if it is people who are worried but do not in fact have Covid-19) it will still place a burden on the system. These patients will still take up time with staff administration, being examined, processing paperwork, and being nervous. Given the current lack of Covid-19 testing access these individuals may also face unnecessary quarantine, follow-up, etc.

“It’s important to know that in a busy New York City emergency department, it can be a huge drain on resources to have all these people show up at the ER at once to get tested.”

—Anonymous ER Doctor, interviewed in Gothamist, March 13, 2020

If cases presenting at NYC emergency departments are, in fact, worried people without the requisite symptoms then it would make sense to examine measures which relieve unnecessary public worry and the burden on the emergency department.

Ways to determine this might include:

  • Qualitative assessment of emergency department: talk to staff about the nature of the current rise in case count.
  • Examine diagnosis/outcomes of cases. EpiQuery tells us that someone has checked in but does not tell us if they turned out to be sick with Covid-19 or any other illness.

Covid-19 Patients

The graph includes a bar chart section which shows the positives count of the current testing being done in NYC. It is to scale with the other elements of the chart.

Soon, NYC emergency departments may be at record-breaking loads for Influenza-like Illness. Unlike a normal record-breaking day for an influenza season, each day will be followed by another record-breaking day. It will continue like this for 12 or so days after meaningful social distancing is practiced by everyone who is in or enters NYC.

During this time, patients would then filter further into the health system absorbing resources, space, and staff. How many record-breaking days of Influenza-like Illness intakes in a row can the medical system absorb?

This is the trajectory that was faced by Italy, which led to collapse of medical services and complete shutdown of the country. We know from the experience of Wuhan that once a complete shutdown occurs case counts will continue to rise for 12 days (JAMA February 24, 2020).

Importantly, Mayor de Blasio notes that at the current rate NYC will run out of ventilators in the next ten days. Had “extreme” social distancing measures been implemented even two or three days ago the turnaround in cases would begin in time. Now we’ll have something truly extreme: people who would have lived otherwise will die because states will be unable to obtain ventilators. There are no more ventilators to be had nor will they be built in time.

Definitions related to Estimating Future ER Load graph:

The heavy solid curved line is a trend line of current actual data. It’s a 3rd order polynomial. The actual data is the moving, lighter weight solid line beneath it. These points are calculated for each day based on the following from EpiQuery: ((ILI 18-64 2020)+(ILI 65+ 2020))-Average(((ILI 18-64 2017)+(ILI 65+ 2017)),((ILI 18-64 2018)+(ILI 65+ 2018)),((ILI 18-64 2019)+(ILI 65+ 2019))). Triangular points on the lighter weight line enclose weekends.

The dotted curved line is a prediction based on data through March 8, 2020 (two days before the first NYC-area death) and assuming that Covid-19 cases double every six days (The Lancet, Jan 31, 2020). This line is made by taking the March 8, 2020 data point from EpiQuery and doubling it every six days. It is a 3rd order polynomial trend line.
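
To make the doubling step concrete, here is a small Python sketch. The function name and the starting value are hypothetical, and it interpolates smoothly between doubling points, which is my assumption; it does not include the polynomial trend line that the graph then draws through the projected points.

```python
def project_doubling(start_value, days_ahead, doubling_days=6):
    """Project a starting count forward, doubling every `doubling_days` days.

    Returns one value per day, starting with day 0 (the starting value itself).
    Days between doubling points are interpolated exponentially (an assumption).
    """
    return [start_value * 2 ** (day / doubling_days) for day in range(days_ahead + 1)]

# Hypothetical starting point of 100 additional cases, NOT the real March 8 value.
projection = project_doubling(100, 12)
print(projection[0], projection[6], projection[12])
```

With a six-day doubling period, the value on day 6 is twice the starting value and the value on day 12 is four times it.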

The gray solid line is a prediction based on all current data but makes no assumption related to Covid-19. It is a 2nd order polynomial trend line.

The bar chart shows the confirmed cases per day, made by taking a given day’s total case count as reported in NYC Health & Mental Hygiene’s 2019 Novel Coronavirus (Covid-19) Daily Data Summary and subtracting the total reported for the previous day.

Dots along the bottom indicate one death each. These are currently gathered via NYC Health & Mental Hygiene’s 2019 Novel Coronavirus (Covid-19) Daily Data Summary.

The loadband is a series of shaded bands. The darkness of each band indicates how many days within the Dec 12, 2019-February 15, 2020 timeframe had that many cases of ILI, all ages. A median line is also present. This element is then transposed to the relevant date range and baseline. This loadband summarizes the peak months of the 2019/20 flu season activity for emergency departments in NYC. Data source is EpiQuery.

The baseline is an average of people 18 and older presenting with Influenza-like Illness at a NYC emergency department 2017-2019. All lines and bars are relative to this figure in order to show variation from normal seasonality.

FAQ on the Coronavirus/Covid-19/ER graph

  1. Is this all the deaths caused by Covid-19 in NYC? No. Covid-19 response takes up staff resources and ventilators etc. There are people who do not have Covid-19 but will die because there are no more ventilators left for them, no clean rooms to perform emergency procedures, no staff available to take them in. Additionally, there are people who will die because they do not go to the emergency department out of fear of catching Covid-19.
  2. What do you mean “additional?” The curved trend line and wavy actual data line represent how many more cases are showing up this year vs the average of 2017-2019. These numbers are only the additional–more than the average–that show up for the 18 and older people arriving at the ER with influenza-like illness.
  3. What do you mean “influenza-like illness?” This is defined by the data set I’m using to make this graph, EpiQuery: “chief complaint mention of flu, fever, and sore throat.”
  4. So it isn’t coronavirus for certain? No. It could be people just coming in because they are worried, for example. This project was initiated due to the lack of access, material, and permission for people in the United States to test and screen for coronavirus. This project is not helpful in a test/trace/treat system, unfortunately. It can only indicate what the burden on our medical infrastructure will be if it follows a given pattern.
  5. Is this the entirety of the increase we should expect to see? No. This is only people 18 and older. There will be a few people that are younger who show up as well (See Daniel Weinberger’s project in “Resources” below to look into different age and borough configurations of this data). Also, this is only people who show up in the emergency department. There will be people who come in for treatment via other channels. Also, this is only people who report influenza-like illness. There will almost certainly be people who show up with other concerns such as shortness of breath. This graph shows only one, very narrow and specific group of people.
  6. Doesn’t this chart overstate reported cases? The statistic for the six day doubling rate comes from the reported cases in the Wuhan outbreak, the largest data set available on coronavirus. If the people coming in are not a result of Covid-19 then the doubling etc will not happen. So far however, each day I’ve filled out the chart the graph gets steeper.
  7. Doesn’t this chart understate reported cases? Possibly. This chart backs out the average, only counting cases that are above the average for that day from the past three years. This is because alongside Covid-19 there are still people who are getting the regular flu. People getting the regular flu will not multiply at the same rate as those with Covid-19. The graph accounts for this by showing how many more cases than average we are seeing in the data.
  8. I saw a different version of this graph where the wavy line was different, how come that is? After this data is entered it is continually refined for up to two weeks. Though I do not know why this is, my experience with data projects leads me to believe that any exhaustive data-gathering activity finds and corrects errors afterwards. Each time I update the graph I use the most current data available in the web interface.
  9. Wait, you update this by hand? Yes. There are a few reasons for this. One, it was simpler and easier to get it up and running. I began on March 10th, 2020 when there was still time to avert many of the deaths, so getting it running quickly was important to me. Also, by manually entering the data I gain a granular, intuitive familiarity with the patterns which I would not have were I to pull the data in spreadsheet form or through an API. This familiarity with the data has allowed me the space and time to develop further insights. Each day I enter data and check the most recent two weeks for changes. This helps me keep the data correct, though errors could certainly be introduced this way as well (see Caveats, below).
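
The daily check of the most recent two weeks is done by hand in this project, but for anyone who wants to automate a similar check, it might look something like this Python sketch. The function name, the date-keyed dictionaries, and the sample numbers are all hypothetical.

```python
def changed_days(previous, current, window=14):
    """List the dates in the most recent `window` days whose count differs
    between yesterday's copy of the data and today's.

    Both arguments map ISO date strings (which sort chronologically) to counts.
    A date that is new in `current` is reported as changed.
    """
    recent = sorted(current)[-window:]
    return [day for day in recent if previous.get(day) != current[day]]

# Hypothetical snapshots, NOT real EpiQuery data.
yesterday_copy = {"2020-03-20": 800, "2020-03-21": 760}
today_copy = {"2020-03-20": 812, "2020-03-21": 760, "2020-03-22": 790}
revised = changed_days(yesterday_copy, today_copy)
print(revised)
```

Any date in the result would be one to re-enter or double-check before redrawing the graph.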

For regular people: what to do with this chart?

Hopefully this chart helps to impress upon you that the emergency services are likely going to be very busy in the next few weeks. Avoid doing dangerous stunts. Be careful with your physical self to avoid unnecessary injuries. Please continue to follow health leaders’ advice re:

  • Stay safe, avoid unnecessary trips to the emergency department.
  • Wash your hands regularly.
  • Disinfect your living quarters regularly.
  • Encourage everyone to be safe.
  • Call your friends, avoid physical gatherings.
  • You can help medical professionals by maintaining your own safety and the safety of those in your community.

For emergency services workers:

This chart suggests what the work load is going to be like in the next few weeks. I do not know what it will mean if 800 additional people show up for influenza-like illnesses in terms of staffing and resources, the flow of patients and staff. You do, however. Hopefully this can help you prepare.

For decision-makers:

If you are in a position to make a decision, how far up that dashed line we go is up to you. So far the societies that encountered Covid-19 have done one or more of the following:

  • Instituted extraordinary lockdown efforts early: Wuhan and the rest of China. It’s no longer early for NYC. If the NYC caseload increase began around February 17, then on a comparable timeline Wuhan was locked down by March 15.
  • Massively increased the available hospital beds in a short period of time: Wuhan built two 1000 bed facilities in two weeks.
  • Quickly and aggressively instituted test, trace, treat: South Korea.
  • Belatedly instituted extraordinary lockdown: Italy.
  • Ignored, dithered, and white-washed: Iran.

Each of these leadership decisions has resulted in different levels of death and disruption. Some of these responses are not appropriate for the US. Some of them are no longer available to us.

Hopefully the chart can give you an indication of what kind of support the emergency medical system is going to need in order to avoid being overwhelmed as it has been in Italy.

In 2013, Gahlord wrote about the Flowgraph. Though his initial use-case was relatively whimsical, it is an effective way to communicate contact tracing as well.

Seeing Covid-19 in Influenza-like Illness, NYC 2020

This article is part of the Syndromic Surveillance and Covid-19 collection on Thoughtfaucet.

This document outlines the thinking behind my initial chart, trying to see if there was anything to the idea that Influenza-like Illness could be used to spot Covid-19 in NYC.

A thread on Twitter by Farzad Mostashari mentioned EpiQuery, a data collection maintained by NYC Health. The data consists of emergency department visits broken down by date, coarse age groups, borough of New York City, and what sort of symptoms the person was reporting.

At the time (and continuing as of this writing) there has been a severe bottleneck in US residents having permission, access, and proper resources to test for Covid-19, the disease caused by the 2019 novel coronavirus. One hypothesis is that by using what we know about the disease combined with the data in EpiQuery, we might be able to see the presence of it in NYC.

While using this method wouldn’t be useful for individuals–it doesn’t actually screen people for coronavirus–it can be useful to see how long it might be present and perhaps how prevalent.

I wanted to do a quick test and see if it was worth getting deeper into the NYC Health data to find Covid-19 and predict impact on emergency services. My concern was that we don’t turn out like Italy and face a collapse of emergency health in NYC.

Summary:

By using Covid-19’s observed bias towards being harsher on older people, I combined EpiQuery’s age data with its Influenza-like Illness syndromic surveillance data and found what appears to be an increase in people showing up at NYC emergency departments in mid-February.

What we know about Covid-19

First off, some characteristics of Covid-19 that we might be able to use in examining the public health data provided by EpiQuery.

  • Covid-19 is more aggressive on older people. Young people do not develop as many serious cases of Covid-19.
  • It takes about 5 days from being infected by coronavirus for symptoms to arise. (BBC)
  • It takes between 5 and 10 days for symptoms to become serious enough for someone to seek medical help.
  • 6.1% of cases were administered mechanical ventilation. (NEJM)
  • Median duration of hospitalization was 12 days. (NEJM)
  • Once Wuhan, the city where Covid-19 first became prevalent, went on complete city-wide lockdown, it took about 12 days for the new cases per day to decrease. (JAMA, see figure 1.)
  • Covid-19 cases double about every 6 days. If you have 100 cases today, you will have 200 cases 6 days from now, 400 after 12 days, 800 after 18 days, 1600 after 24 days and so on.
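
As a rough illustration of the doubling arithmetic in the last bullet, here is a minimal Python sketch. The function name, the default of 6 days, and counting days from the starting count are assumptions made for illustration. It is not a model of the disease, just the arithmetic.

```python
def projected_cases(starting_cases, days_elapsed, doubling_days=6):
    """Cases after days_elapsed, assuming they double every doubling_days.

    Assumption: clean, uninterrupted doubling. Real case counts will
    depend on human behavior and testing, so treat this as a rough guide.
    """
    return starting_cases * 2 ** (days_elapsed / doubling_days)


# Reproduce the example from the bullet: 100 cases, checked every 6 days.
projection = {day: round(projected_cases(100, day)) for day in (0, 6, 12, 18, 24)}
print(projection)
```

Changing doubling_days shows how sensitive the projection is to that single number, which is one reason the doubling rate is listed among the caveats.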

The EpiQuery Data

It’s worth knowing about the EpiQuery data as well. It comes with its own stated limitations, which are important to review. In addition, the data has the following features, characteristics, and limitations:

  • There are four age buckets: 0-4 (infants), 5-17 (children), 18-64 (working-age adults), 65+ (seniors).
  • There is data specifically consisting of people who came to the emergency department with Influenza-like Illness symptoms: fever, coughing, etc.
  • The data is posted two days after it is collected and then updated for the following few weeks.
  • The data is 100% of NYC emergency department visits back through mid 2016.

What we know about the world during the 2019/2020 Coronavirus Pandemic

There are things about the world that are relevant to this data search as well:

  • NYC has 5000 ventilators. (Politico, March 13, 2020)
  • Flu season tends to be between December and February.
  • Italy did not lock down their country until around March 10. In terms of raw, reported case numbers they are about 10 days ahead of the United States.
  • Italy has a very slightly older average population age than we do.
  • Italy has more hospital beds per capita than we do.
  • Italy’s medical services are collapsing resulting in a higher rate of death than in post-Wuhan China. (The Atlantic)

Important dates related to coronavirus

Here are some important dates for coronavirus and NYC:

  • Dec 31, 2019: China notifies WHO of existence of coronavirus. Wuhan is locked down on Jan 23, 2020.
  • Jan 21, 2020: First case of Covid-19 detected in the USA in Washington state.
  • March 1, 2020: First case of Covid-19 detected in NY state.
  • March 10, 2020: First death related to Covid-19 in NJ reported in Bergen County which is next to NYC.
  • March 12, 2020: 95 confirmed cases of Covid-19 in NYC (NYTimes). First death in NYC (Brooklyn).
  • March 13, 2020: 154 confirmed cases of Covid-19 in NYC (NYTimes). Second death in NYC (Manhattan).
  • March 14, 2020: 184 confirmed cases of Covid-19 in NYC (NYC Health)

Combining these things into something we can see

I wanted to know if we would really see anything at all with the data. The data contains case count, by age, of people who reported Influenza-like Illness at the emergency dept in NYC. I gathered this data for Nov 1-March 9 for both 2019 and 2020.

In a spreadsheet, I entered the case counts for each age bracket within the ILI syndromic surveillance data provided by EpiQuery. Then I calculated the percentage of the ILI syndromic surveillance counts for each age bracket. This is necessary because the ratio number provided by EpiQuery is relative to all other emergency department visits. I wanted to know just within ILI, are the older people coming in more than they have in the past? My thinking is that since Covid-19 disproportionately affects older people the percentage of the Influenza-like Illness group would start to skew older as coronavirus spreads in NYC.
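
For readers who would rather see the percentage step as code than as a spreadsheet, here is a minimal Python sketch. The work described here was done by hand in a spreadsheet, so the function names, the bucket labels used as dictionary keys, and the counts are all made up for illustration.

```python
def age_shares(ili_counts):
    """Percent of one day's ILI visits that falls in each age bracket."""
    total = sum(ili_counts.values())
    return {bracket: count * 100 / total for bracket, count in ili_counts.items()}


def adult_share(ili_counts, adult_brackets=("18-64", "65+")):
    """Percent of one day's ILI visits from people 18 and older."""
    shares = age_shares(ili_counts)
    return sum(shares[bracket] for bracket in adult_brackets)


# Hypothetical counts for a single day (not real EpiQuery numbers).
one_day = {"0-4": 100, "5-17": 100, "18-64": 250, "65+": 50}
print(age_shares(one_day))
print(adult_share(one_day))
```

The point of dividing by the ILI total, rather than by all emergency department visits, is that the result only moves when the age mix within ILI changes.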

Using 2019 as a reference, I compared the percentage of people 18 and older seeking care for ILI symptoms to see if it deviated. Here is that chart:

Percent Influenza-like Illness Emergency Department visits, 18 and older, for NYC 2020 vs 2019

What’s in this initial, preliminary Coronavirus/NYC chart?

First obvious thing is a spike in the percentage on Jan 29. This was driven by a large number of cases reported that particular day (over 600 when typically it was in the 400s for the days preceding it). Not sure what that’s about.

After noticing that, we can also see the “normal” flu season take off for this group around December and start to level off in January. This is visible in 2019 as well as 2020.

In mid-February, however, there is a second rise of cases for the 2020 18 and older age group while the 2019 data shows declining case percentage in February and March. The difference is pronounced by the time the first case in NYC is officially called on March 1, 2020.

While there are a number of caveats–the largest being that these are not diagnoses, these are just people self-reporting after arriving at emergency services–the data suggested it was worth going deeper to learn more and perhaps predict future impact on emergency services in NYC.

For my next chart I wanted to get a little deeper into the data so that something could be predicted about the future. In order to do that I needed to:

Clarify the pattern. Flu season creates a moving trend line as cases pick up for this group in December and begin to recede in January. I want to know how much additional strain is being put on the NYC emergency departments. Visualizing the difference across moving and waving lines can be difficult for some people. The new chart creates a straight base line consisting of the average. That way the curved line can tell us “what is the difference between what we see now and a normal flu season.”

Refine the question. A good information graphic deals with a specific question. My question for the previous chart was “Can we see anything that might be Covid-19 in the emergency department data?” and I felt like we could. This time I wanted to answer: “What will the impact be on the emergency department, for the coming weeks?”

Establish a better baseline. I made an average of our 18 and older, Influenza-like Illness emergency department visitors for 2017, 2018, 2019. That’s three years of data. 2017 and 2019 were relatively mild flu seasons. 2018 was not as mild. Between the three I figured this gives us a little more data to work with to establish what a flu season looks like without coronavirus. Having a better baseline increases trust that what we’re looking at isn’t just a random fluke of a particularly bad or good year.

Focus the timeline on the relevant dates. Since the obvious split of the data wasn’t happening until after Valentine’s Day I decided to start with Jan 30 (which also passes over that spike in the data on Jan 29).

The impact can be discussed in terms of more or fewer cases than the average of 2017-2019. To do this we subtract the average number of cases 2017-2019 from this year’s cases. That lets us know if we’re seeing more cases or fewer cases than average. We can show the change in case load in the past. And perhaps, using other information, predict what it might be in the future.
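
The baseline and subtraction steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above. The function names are assumptions and every count below is made up.

```python
def baseline_average(counts_by_year):
    """Average the same-day counts across the reference years.

    counts_by_year: one list of daily counts per year, aligned by date.
    """
    return [sum(same_day) / len(same_day) for same_day in zip(*counts_by_year)]


def excess_cases(current_counts, baseline):
    """This year's count minus the baseline average, day by day."""
    return [current - base for current, base in zip(current_counts, baseline)]


# Hypothetical 18+ ILI counts for three consecutive dates (not real data).
reference_years = [
    [400, 410, 390],  # stands in for 2017
    [460, 470, 450],  # stands in for 2018
    [430, 410, 420],  # stands in for 2019
]
this_year = [500, 520, 610]

baseline = baseline_average(reference_years)
excess = excess_cases(this_year, baseline)
print(baseline)
print(excess)
```

A positive number means more cases than an average flu season on that date, a negative number means fewer.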

Using the fact that Covid-19 cases double about every 6 days, I can take the most recent number of people showing up at the ER who are 18+ with influenza-like illness and double that number. Then go forward six days and subtract the average for that date from my doubled number. This gives me an estimate of what the difference might be.

With those targets fixed out in the future I built a trend line that matches those targets.
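
Here is a minimal Python sketch of how such targets could be computed. It assumes the doubling is simply repeated once per six-day period to get more than one target, which is my reading of "targets" rather than something spelled out above. The function name and all of the numbers are hypothetical, and the sketch stops at the targets because the method used to draw the trend line through them is not described.

```python
def projected_excess_targets(latest_count, future_baseline, doubling_days=6, steps=3):
    """Return (days_ahead, projected_excess) pairs, one per doubling period.

    latest_count: most recent 18+ ILI count.
    future_baseline: dict of days ahead -> baseline average for that date.
    Assumption: the whole count doubles every doubling_days.
    """
    targets = []
    for step in range(1, steps + 1):
        days_ahead = step * doubling_days
        projected_count = latest_count * 2 ** step
        targets.append((days_ahead, projected_count - future_baseline[days_ahead]))
    return targets


# Hypothetical inputs (not real EpiQuery numbers).
targets = projected_excess_targets(500, {6: 420, 12: 400, 18: 380})
print(targets)
```

Each pair is a point out in the future that a trend line would need to pass through.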

Why do ad clicks not match up with Google Analytics data?

This is a question I get just about every week. Someone has an advertising distributor that charges by and reports by clicks. Those clicks don’t match up to the number in Google Analytics. Much worry ensues.

Here’s my standard answer on this topic. It covers most situations:

 

Ad clicks and GA data:

Differences between clicks and what shows up in GA often can be accounted for in the following ways (in order of how I most often see them):

  1. Make sure the timeframes are the same and that they do not include the present day (GA can have a 24 hr lag in number crunching and thus underreport if comparing with current-day-included click data).
  2. Different measures: for lead gen businesses I typically measure Users. Click data is more closely matched to Sessions. A single User could, in theory, click an ad multiple times resulting in multiple clicks but only one User.
  3. Clicks are measured at the ad distributor’s site. Users/sessions are measured at your website by GA once the page has loaded enough to get the GA script. In theory, people can click an ad and decide to bail before your page loads enough to run the script. In these cases, a click will be measured by the ad distributor but no users/sessions will be measured by GA. Sites with slow load times are especially at risk for this situation; users lack patience for the page to load.
  4. Humans clicking ads twice because of the urge to double click everything. Not much to do here, but it will result in more ad clicks than sessions/users. This tends to be a constant level of noise across the internet.
  5. Old fashioned bot-driven click fraud. GA will not record data about spiders and bots as a user or a session. Some ad systems will register clicks from bots. This is pretty rare. I include it mostly for completeness.
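
The first two checks in the list above (same timeframe, exclude the present day, compare clicks with Sessions) can be sketched in Python. This assumes you have already exported daily clicks from the ad distributor and daily sessions from GA into two date-keyed dictionaries. The function name and the numbers are made up, and nothing here calls a real GA or ad platform API.

```python
from datetime import date


def compare_clicks_to_sessions(clicks_by_day, sessions_by_day, today):
    """Total clicks and sessions over days both sources cover, excluding today.

    Returns (clicks, sessions, gap_pct), where gap_pct is the share of
    clicks that did not show up as sessions.
    """
    shared_days = [d for d in clicks_by_day if d in sessions_by_day and d < today]
    clicks = sum(clicks_by_day[d] for d in shared_days)
    sessions = sum(sessions_by_day[d] for d in shared_days)
    gap_pct = (clicks - sessions) * 100 / clicks if clicks else 0.0
    return clicks, sessions, gap_pct


# Hypothetical exports (not real campaign data).
today = date(2020, 3, 3)
clicks = {date(2020, 3, 1): 100, date(2020, 3, 2): 100, today: 50}
sessions = {date(2020, 3, 1): 90, date(2020, 3, 2): 70, today: 10}
result = compare_clicks_to_sessions(clicks, sessions, today)
print(result)
```

If the gap stays large after the timeframes are aligned, the remaining items in the list are the next places to look.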

Dealing with the difference between ad reporting and GA reporting:

There are a few things that can be done to help with this. One is to make a dashboard that gives you sessions data so you can compare that more easily with clicks.

Another, if the difference is still great, is to simply evaluate the performance of the campaign based on the data in GA. So take the ad spend and distribute it by the actual users or sessions that arrive on site. Be certain to let your ad representative know that this is how you are evaluating performance. This is a good practice regardless, though it does require a spreadsheet.

The clicks are ultimately not the important thing. Ad sellers want you to think about the clicks because that is what they sell. They want to sell you traffic because that’s all they can do. For lead gen businesses, like most b2b and many b2c businesses such as real estate or main street businesses, you want to buy the opportunity to shake someone’s hand.

The best way to stay focused on what’s important for your business is to assemble the data in a way that is meaningful for your own goals. It’s a little bit of sausage-making and inevitably involves some manual labor.

Doing this work helps to answer the real question which is: How much money does an email address cost from this ad distributor? If we do this for each of our different efforts we can discover which is providing the most value per dollar. That knowledge can then be used to inform decisions about ad spends etc.
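
The spreadsheet work described above boils down to a couple of divisions. Here is a minimal Python sketch with made-up distributor names and numbers. The function name and the choice to return None when there are no results are assumptions for illustration.

```python
def cost_per_result(ad_spend, results):
    """Ad spend divided by results (sessions, users, or email addresses).

    Returns None when there were no results, since the cost is undefined.
    """
    if results == 0:
        return None
    return ad_spend / results


# Hypothetical campaigns: spend from the invoice, sessions and emails from GA.
campaigns = {
    "Distributor A": {"spend": 500.0, "sessions": 400, "emails": 10},
    "Distributor B": {"spend": 300.0, "sessions": 150, "emails": 12},
}

report = {
    name: {
        "per_session": cost_per_result(c["spend"], c["sessions"]),
        "per_email": cost_per_result(c["spend"], c["emails"]),
    }
    for name, c in campaigns.items()
}

# Cheapest source of email addresses, ignoring campaigns with none.
with_emails = {n: r["per_email"] for n, r in report.items() if r["per_email"] is not None}
best = min(with_emails, key=with_emails.get)
print(report)
print(best)
```

In this made-up case Distributor A looks cheaper per session but Distributor B is cheaper per email address, which is the kind of difference that clicks alone would hide.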

Napkin sketch: A news business model free from advertising

(Originally published in Sept 2015 but sadly still relevant.)

We know that advertising in all its forms is cluttering and degrading our experience of the web. Great stories and bits of news are accompanied by six-packs of “26 gay celebrities, you’ll never believe #4!” and “Stop bellyfat with this great tip to end credit card debt!”

It’s ridiculous. Even the “good” ads aren’t that good. The advertising is abysmal, a joke.

Aside from the quality of ads, the technology is abysmal as well. How much of your wifi and cellular time is spent waiting for a garbage ad to load? How often does the ad tech break the system entirely, forcing reloads, etc.?

By listening to media companies, we learn that …

Failure and imagination

There are a large number of things wrong re: the situation with young electronics enthusiast Ahmed Mohamed being taken from school in handcuffs for bringing in an electronic clock. Those who are interested in innovation in America should watch this story closely.

The thing I want to focus on in this article is the following quote from Irving police spokesman James McLellan in the Dallas News:

“It could reasonably be mistaken as a device if left in a bathroom or under a car. The concern was, what was this thing built for? Do we take him into custody?”

This statement is important because it is an excellent example of …