Researchers at the University of Sheffield used several CDRC datasets as part of their research into digital poverty in South Yorkshire. The South Yorkshire Mayoral Combined Authority (SYMCA) wanted a better understanding of the role digital poverty and digital exclusion play in the region, and of how to build digital capability there, particularly for social groups identified as at risk.
The research team mapped digital poverty across the South Yorkshire region. This allowed them to identify the areas at greater risk of digital poverty by highlighting where the different inequalities and barriers experienced by social groups in the region intersect. The result is a nuanced, place-based understanding of which populations and areas are most affected, and therefore potentially excluded from the labour market, education and services, through digital exclusion.
The findings are helping SYMCA create a positive impact in the region: they will inform the region’s COVID-19 recovery plan and the agenda for implementing its Inclusion Plan, and they are providing feasible, research- and evidence-informed pathways towards alleviating digital exclusion and digital poverty. Beyond supporting the most disadvantaged citizens, the findings of this project will leave SYMCA better able and prepared to create further digital development opportunities in the region, e.g. by supporting the development of literacies, entrepreneurship and talent, thus supporting its economic recovery.
The project was funded through the Knowledge Exchange Support Fund (QR Policy and Covid Recovery) and supported by the South Yorkshire Office for Data Analytics Pilot.
Analysing COVID-19 Mobility Responses through Passively Collected App Data (Case Study)
Using smartphone GPS mobility data to understand population-scale responses to COVID-19 ‘lockdown’ policies in England.
Project overview
COVID-19 has prompted wider use of novel mobility data in public life, offering fascinating insights into population-wide behavioural responses to Non-Pharmaceutical Interventions (NPIs) such as ‘lockdown’ stay-at-home orders. Here, we use privacy-preserving smartphone data to understand these trends across England at a regional scale, over a longitudinal period spanning January 2020 to May 2021, with a specific focus on adherence to policy measures restricting household visitation.
Adherence to, and fatigue with, ‘lockdowns’ are highly debated concepts with limited observational evidence, despite their key role in supporting current policy assumptions. The SAGE report of 16th March 2020 underscored this when it noted “(limited) evidence on whether the public will comply with the interventions in sufficient numbers and over time” (p.2) with respect to COVID-19 measures. Our study uses a novel measure of ‘house visits’ activity to cut out general mobility noise, and is explicitly designed to better inform health policy interventions during a public health emergency.
Data and methods
According to UK Government polling for the Centre for Data Ethics and Innovation (CDEI), 58% of over 2,000 UK adults surveyed in September 2020 were either ‘quite comfortable’ or ‘very comfortable’ with “researchers using data to improve knowledge to help keep the public safe” during COVID-19, with just 14% ‘quite’ or ‘very uncomfortable’. This finding was positive overall across all UK regions, age groups, income levels and education levels, and regardless of whether people were worried about COVID-19 itself. There were also 16.5 million voluntary downloads of the NHS COVID-19 app for modern smartphones in England and Wales in 2021. Together, these figures suggest public demand for harnessing data to help tackle COVID-19.
Our study used anonymous, privacy-enhanced GPS smartphone mobility data from users who opted in to data collection for research purposes under a GDPR-compliant framework. The data were supplied by the American and Italian location intelligence company Cuebiq, under its Data for Good program. We use an unsupervised machine learning method (DBSCAN) to assign home and work areas, which are then removed from each user’s activity. Through a validated ‘process of elimination’ using point-of-interest (POI) analysis, we then generate an aggregate measure of the proportion of de-identified users making a house visit in a given county area on a given day. The output data are aggregated to strict temporal and spatial privacy requirements set by Cuebiq before analysis, while still harnessing the precision inherent in such emerging data streams to inform public health policy under COVID-19. Limitations of the methods and data, including a potential lack of representativeness, are discussed extensively in the published findings. Importantly, the data could not accurately distinguish between visits inside homes and visits to outside garden areas.
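To make the home-assignment step more concrete, here is a minimal, plain-Python sketch of DBSCAN-based home detection. It is not Cuebiq's or the study's pipeline: it assumes pings already projected to planar coordinates in metres, an `eps` of 50 m, `min_samples` of 3 and a night window of 22:00 to 06:00, all chosen purely for illustration, and it treats the cluster with the most night-time pings as home. The function names `dbscan` and `infer_home` are hypothetical.

```python
import math
from collections import Counter


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN on planar (x, y) points; returns a cluster label per point, -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"
    cluster = -1

    def neighbours(i):
        # Brute-force eps-neighbourhood (includes the point itself), O(n) per call.
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


def infer_home(pings, eps=50.0, min_samples=3, night_start=22, night_end=6):
    """pings: list of (x_m, y_m, hour). Returns the centroid of the cluster with the
    most night-time pings, or None. All parameter values are illustrative assumptions."""
    points = [(x, y) for x, y, _ in pings]
    labels = dbscan(points, eps, min_samples)
    night = Counter(
        label for label, (_, _, hour) in zip(labels, pings)
        if label != -1 and (hour >= night_start or hour < night_end)
    )
    if not night:
        return None
    home = night.most_common(1)[0][0]
    members = [p for p, label in zip(points, labels) if label == home]
    return (sum(x for x, _ in members) / len(members),
            sum(y for _, y in members) / len(members))


# Hypothetical pings: a night-time cluster near (0, 0), a daytime one near (1000, 1000).
pings = [(0, 0, 23), (10, 5, 1), (5, 10, 2), (8, 8, 3),
         (1000, 1000, 10), (1010, 1000, 11), (1000, 1010, 14)]
print(infer_home(pings))  # (5.75, 5.75)
```

In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` (which takes `eps` and `min_samples`) would replace the quadratic neighbour search written out here; a work area could be assigned analogously from daytime pings.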
Key findings
This LIDA project led to the publication of an original research paper ‘Household visitation during the COVID-19 pandemic’ in the Nature journal Scientific Reports in November 2021, detailing both methods and results.
Our results track the evolution of a measure of household visitation in English LTLAs (Lower-Tier Local Authorities) over time, denoted $H_{\text{England},t}$ throughout the study. This national index was calculated as the mean of weekly levels across England’s 315 LTLA areas, excluding the Isles of Scilly due to sample size issues. Weekly household visitation was measured against a pre-pandemic baseline covering 13th January 2020 to 2nd March 2020. The baseline was specific both to each LTLA and to each day of the week, so that changes are measured relative to each locality’s own normal pattern.
Figure 1 from the paper shows the evolution of $H_{\text{England},t}$ across the full study period, alongside recorded COVID-19 cases. Household visitation dropped dramatically in late March 2020, reaching its lowest level of the pandemic period, −56.4% relative to the pre-pandemic baseline, on 29th March 2020. In Figure 1 we have marked ‘national lockdown’ periods as those when stay-at-home orders were in place, during which household visitation was prohibited in almost all cases. Taking means across these periods, household visitation averaged −39.33% relative to baseline during the 1st National Lockdown (23/03/20 – 12/05/20), compared with only −15.28% during the 2nd National Lockdown (05/11/20 – 01/12/20). We did not observe a large jump in household visitation immediately after ‘support bubble’ exemptions were introduced in mid-June 2020.
Heading into the 3rd National Lockdown (06/01/21 – 07/03/21), mobility activity fell markedly ahead of the imposition of national restrictions, perhaps reflecting COVID-19 risk perception and/or the new tier restrictions announced on 19th December 2020 in response to the detection of the Alpha variant in South-East England. These trends were reinforced by the 3rd National Lockdown from 6th January 2021, which held household visitation at an average of −26.22% relative to baseline (06/01/21 – 14/02/21), between the levels of the 1st and 2nd National Lockdowns, until approximately mid-February 2021.
At this point, the Prime Minister announced in a televised Downing Street coronavirus briefing to the nation that 15 million people from the most vulnerable categories, JCVI Priority Groups 1-4, had received a first dose of COVID-19 vaccine. Almost immediately, $H_{\text{England},t}$ recorded a significant rise in household visitation across England, such that by 7th March 2021 household visitation was comfortably above the pre-pandemic baseline, even though coronavirus regulations had not changed.
Figure 2 illustrates the geographical variation in household visitation across Local Authority Districts at LTLA scale, as means over a) the entire COVID-19 period and b) to d) each of the three National Lockdown periods. These are presented as hex cartograms, prepared with assistance from the UK House of Commons Library. Some regional disparities are apparent, notably between North and South and between urban and rural areas. London boroughs, in particular, appear to have consistently higher visitation relative to the pre-pandemic baseline than elsewhere in England.
Figure 3 completes this summary of our key results by applying the measure to two local authority areas that experienced specific, rigorous local restrictions to tackle sudden outbreaks over summer 2020, known in England as ‘local lockdowns’. Leicester and Liverpool appear to have shown different profiles of adherence to ‘local lockdown’ measures on household visitation. In Leicester, despite a large reduction in visitation relative to the national trajectory while the local lockdown was at its strictest, household visits rose sharply (to above the national level for England) around the first relaxation on 1st August 2020, even though this relaxation did not lift the prohibition on house visits. By contrast, house visits in Liverpool stayed meaningfully below the national figure for England throughout the summer period, including after regional measures were introduced on 22nd September 2020.
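The baseline logic described above can be sketched as follows, assuming one daily house-visit value per LTLA keyed by `datetime.date`. The paper works with weekly levels; this sketch stays at daily resolution for brevity. The unweighted mean across LTLAs, the input values and all function names are illustrative assumptions.

```python
from datetime import date
from statistics import mean


def weekday_baseline(daily, start, end):
    """Mean value per weekday (0 = Monday) for one LTLA over the baseline window [start, end]."""
    by_weekday = {}
    for day, value in daily.items():
        if start <= day <= end:
            by_weekday.setdefault(day.weekday(), []).append(value)
    return {wd: mean(values) for wd, values in by_weekday.items()}


def pct_change(daily, baseline):
    """Percentage change of each day against the same-weekday baseline for that LTLA.
    Assumes every weekday that occurs in `daily` also appears in the baseline window."""
    return {day: 100.0 * (value - baseline[day.weekday()]) / baseline[day.weekday()]
            for day, value in daily.items()}


def national_index(changes_by_ltla, day):
    """Unweighted mean of LTLA-level changes on one day (the weighting is an assumption)."""
    return mean(ch[day] for ch in changes_by_ltla.values() if day in ch)


BASELINE = (date(2020, 1, 13), date(2020, 3, 2))  # baseline window stated in the text

# Hypothetical daily house-visit values for two LTLAs (all Mondays, for brevity).
ltlas = {
    "A": {date(2020, 1, 13): 20, date(2020, 1, 20): 20, date(2020, 3, 30): 10},
    "B": {date(2020, 1, 13): 40, date(2020, 1, 20): 40, date(2020, 3, 30): 30},
}
changes = {name: pct_change(d, weekday_baseline(d, *BASELINE)) for name, d in ltlas.items()}
print(national_index(changes, date(2020, 3, 30)))  # -37.5
```

Comparing each day with the same weekday's baseline stops ordinary weekly rhythms (for example, more visiting at weekends) from showing up as spurious change.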
Value of the research
The research was designed to inform public policy directly, in line with LIDA’s commitment to using data for public good. Understanding actual levels of likely aggregate adherence to pandemic policy was highlighted as an area of importance in the joint report of the House of Commons Health and Social Care, and Science and Technology Committees into the UK coronavirus response, “Coronavirus: lessons learned to date”, published in September 2021.
Many activities driving virus transmission are intimately connected to the mixing and mobility of individuals. Our observational findings on house visits will therefore help public sector agencies understand how English populations responded to a range of lockdown impositions and relaxations, and how these responses may have been complicated or influenced by concurrent public messaging and prevalent COVID-19 risks. Our results can therefore be used to evaluate the mix of past national and local lockdown policies and to inform how such a mix might be optimised. The Scientific Reports paper disseminating the results was highlighted in the UK Health Security Agency (UKHSA) ‘Behavioural Science and Insights Unit Weekly Literature Report’ in late November 2021.
The findings received significant coverage in the British national press, featuring in Metro, The Daily Telegraph, The Independent, the Daily Express, the Daily Mail and the i, as well as in other publications including the Yorkshire Evening Post, The Conversation and the current affairs magazine The Week. This was supplemented internationally by extensive online coverage on Yahoo! and MSN. According to Altmetric, as of 20th January 2022, the paper had been shared on Twitter by accounts with a combined total of 2.69 million followers.
Quote from project partner Cuebiq
“We’re proud of the exceptional and novel research led by University of Leeds, not only because it created impactful public goods, but also because it was achieved with an uncompromising commitment to data privacy and governance.”
Insights
Measures indicate adherence to household visitation restrictions was relatively high overall but waned both within and between subsequent National Lockdowns in England. This is rare observational evidence for shorter- and longer-term ‘fatigue’ in compliance with COVID-19 restrictions, at various stages of the pandemic lifecycle.
Around 15th February 2021, when the Prime Minister informed the nation that 15 million of the most vulnerable people in JCVI Priority Groups 1-4 had received a first vaccine dose, a significant and unprecedented rise in household visitation was observed nationally, to above pre-pandemic baseline rates, despite lockdown regulations staying the same. This suggests that people may have paid meaningful attention to the level of protection held by the most vulnerable members of British communities when deciding on visits, and/or adhered far less to relevant pandemic regulations once vaccinated.
Measures indicate that household visitation was responsive to prevalent COVID-19 risk, changing both ahead of the implementation of restrictions (i.e. the Alpha variant in December 2020) and before restrictions were officially lifted (1st and 3rd National Lockdowns). This offers evidence that individuals may respond to a perceived personal and/or collective risk of COVID-19 infection over and above current government policy or guidance.
Local lockdowns in Leicester and Liverpool suggested contrasting profiles of adherence over time to ‘local lockdown’ measures prohibiting household visitation. They also highlight the potential of smartphone mobility data to indicate waning population-wide adherence within a single aggregated local authority area (where a sample size of N > 10 is consistently satisfied, to protect against the risk of statistical disclosure).
Cuebiq mobility data for England is geographically representative across a series of temporal and spatial aggregations, and across several points in the pandemic for our sample, even if other factors of social representativeness remain rightly unknown.
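The aggregation and disclosure-control steps described in this study, a proportion of users making a house visit per area and day that is released only where N > 10, could look something like the sketch below. The record layout `(area, day, user_id, made_visit)` and the function name are hypothetical; the actual aggregation rules were set by Cuebiq.

```python
from collections import defaultdict

MIN_USERS = 10  # cells are released only when N > 10, the threshold stated in the text


def visit_proportions(records, min_users=MIN_USERS):
    """records: iterable of (area, day, user_id, made_visit).
    Returns {(area, day): share of distinct users making a house visit},
    dropping any cell with min_users or fewer distinct users."""
    users = defaultdict(set)
    visitors = defaultdict(set)
    for area, day, user_id, made_visit in records:
        users[(area, day)].add(user_id)
        if made_visit:
            visitors[(area, day)].add(user_id)
    return {cell: len(visitors[cell]) / len(ids)
            for cell, ids in users.items() if len(ids) > min_users}


# Hypothetical input: 12 users in area "X" (3 made a visit) and 5 users in area "Y".
records = [("X", 1, f"u{i}", i < 3) for i in range(12)]
records += [("Y", 1, f"v{i}", True) for i in range(5)]
print(visit_proportions(records))  # {('X', 1): 0.25}
```

Area "Y" is suppressed entirely rather than reported, because a proportion computed from only five users could disclose individual behaviour.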
Research theme
Health informatics & urban analytics.
People
Mr Stuart Ross, LIDA Data Scientist Intern
Mr George Breckenridge, LIDA Data Scientist Intern
Dr Mengdie Zhuang, Lecturer in Data Science, University of Sheffield
Prof Ed Manley, Professor of Urban Analytics & LIDA Fellow
Partners
Data provider: Cuebiq Inc., NYC, Milan
Funders: LIDA intern work was funded by the CDRC (Consumer Data Research Centre), and in turn by the ESRC (Grant ES/L011891) of UKRI. The broader research project was also supported by i-sense, and in turn by the EPSRC (Grant EP/R00529X/1) of UKRI.
GOLIATH: Geographies of Lifestyle, Activity, Transport and Health (Case Study)
Consumer data can provide insight into a wide range of human activity, but there is a trade-off between the privacy and the utility of the data.
Project overview
Consumer data collected by commercial providers have huge potential for a range of research purposes, but can be challenging to access as they are often held in secure environments. Secure handling of these datasets is crucial, as consumer data may contain personally sensitive attributes (e.g. addresses) or be commercially sensitive (e.g. because they were purchased or contain licensed information). This project provides a proof of concept for creating enhanced, aggregated versions of consumer datasets for research purposes, together with a dashboard for exploring those data.
Data and methods
Starting from securely held consumer datasets within the Consumer Data Research Centre (CDRC), the project aimed to produce non-disclosive, aggregated versions of the data while maintaining their unique characteristics and value. An R Shiny app visualising the aggregated data has been developed to showcase the utility of non-disclosive datasets for research. Based on a randomised sample of Whenfresh/Zoopla consumer data, key metrics such as median price and affordability are calculated for different property types at Middle Layer Super Output Area (MSOA) level. Open data are also used to calculate further metrics, for example the attractiveness of an area based on Census flow data. Next steps include improving the efficiency and the loading and updating times of the R Shiny app so that it can be populated with additional datasets.
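A minimal sketch of this kind of MSOA-level aggregation, assuming sales records of `(msoa_code, property_type, price)`, is shown below. The case study does not define affordability; here it is assumed to be the ratio of median price to a median annual income supplied per MSOA, and groups with fewer than `min_count` sales are suppressed as a simple non-disclosure rule. Both choices, the threshold of 3, the example values and the function name are illustrative assumptions, not the GOLIATH method (which was built in R).

```python
from collections import defaultdict
from statistics import median


def msoa_metrics(sales, income_by_msoa, min_count=3):
    """sales: iterable of (msoa_code, property_type, price).
    income_by_msoa: {msoa_code: median annual income} (hypothetical input).
    Returns {(msoa_code, property_type): {"median_price", "affordability"}} for groups
    with at least min_count sales; smaller groups are suppressed as potentially disclosive."""
    groups = defaultdict(list)
    for msoa, ptype, price in sales:
        groups[(msoa, ptype)].append(price)
    metrics = {}
    for (msoa, ptype), prices in groups.items():
        if len(prices) < min_count:
            continue
        med = median(prices)
        income = income_by_msoa.get(msoa)
        metrics[(msoa, ptype)] = {
            "median_price": med,
            # Assumed definition: price-to-income ratio (None if income is unknown).
            "affordability": med / income if income else None,
        }
    return metrics


sales = [("E02000001", "flat", 200_000), ("E02000001", "flat", 300_000),
         ("E02000001", "flat", 250_000), ("E02000001", "detached", 500_000)]
print(msoa_metrics(sales, {"E02000001": 50_000}))
# {('E02000001', 'flat'): {'median_price': 250000, 'affordability': 5.0}}
```

The single detached sale is dropped rather than reported, since a "median" of one transaction would reveal an individual price.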
Key findings
Drawing on existing data, especially anonymised and aggregated consumer data, this project can be seen as a proof of concept for an ‘alternative’ or ‘big data’ census. Different data types, e.g. time series, static and origin-destination flow data, have been successfully combined and can be explored by the user in a dashboard (Figure 1).
Value of the research
The prototype R Shiny app forms the basis for further work in providing a dashboard for exploring local area statistics. Moving forward, other consumer data could be included as part of GOLIATH, for example, transport and lifestyle datasets. Utilising consumer data in addition to traditional census counts contributes to efforts to create an ‘alternative’ or ‘big data’ census.
Insights
Devised methods for the aggregation and calculation of metrics for secure consumer data
Developed a prototype R Shiny App for the visualisation of spatially disaggregated information
Research theme
Urban analytics
People
Maike Gatzlaff, LIDA Data Scientist Intern
Dr Nik Lomax, Co-Director of the Consumer Data Research Centre
Professor Mark Birkin, Co-Director of the Leeds Institute for Data Analytics
Dr Will James, Research Fellow, University of Leeds
Partners
The Consumer Data Research Centre
Funders
The data for this research have been provided by the Consumer Data Research Centre, an ESRC Data Investment, under project ID CDRC [Project Number], ES/L011840/1; ES/L011891/1.
Measuring Ambient Populations during COVID-19 in Leeds City Centre (Case Study)
The COVID-19 pandemic led to lockdowns being implemented around the world, including in the UK. The project aimed to investigate relevant data sources for modelling the ambient population of Leeds City Centre during COVID-19, and to analyse the impacts that lockdown policies had on urban footfall. The research builds on previous work with Leeds City Council by intersecting key dates from the English lockdowns with footfall trends and integrating these into machine learning models to assess the importance of different aspects of lockdown. It also predicts what “business as usual” might have looked like had there been no pandemic.
Leeds City Council have been collecting footfall data for more than a decade. The data were wrangled and aggregated to create a history going back to 2008, then analysed alongside key lockdown dates to determine where trends in urban footfall intersected with them, raising questions about which aspects of these policies might have had the most impact. The data cover a relatively small geographical area of Leeds City Centre and reflect only pedestrian traffic passing the camera locations. There are data quality issues, such as potential double counting, periods of missing data and inconsistent file formatting; however, the data cover a long period and many of the problems can be worked around.
Google COVID-19 Community Mobility data was analysed as a potential alternative data source to the Council data. It shows changes in mobility from a baseline for six different destinations (see the website for more details). The smallest relevant spatial coverage is the Leeds City Region. This was considered too large to isolate any changes impacting the city centre, making comparison of trends difficult.
The Council footfall data were resampled to daily counts, on which the analysis was then conducted. Visual analysis was undertaken to identify footfall trends over the course of the pandemic against key dates for the implementation and lifting of particular COVID-19 restrictions. These key dates were identified by researching when major legislation came into force or when government announcements about restrictions were made. The questions raised by this initial analysis were then explored by building a series of machine learning models using Random Forest Regression in the Python scikit-learn package.
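The resampling step might look like the following stdlib sketch, which collapses camera records into daily totals. The record layout `(timestamp, camera_id, count)` is an assumption, and dropping repeated `(timestamp, camera_id)` rows is just one simple guard against the double counting mentioned above, not necessarily how the Council data were cleaned. With pandas, the equivalent on a DataFrame indexed by timestamp would be `df.resample("D").sum()`.

```python
from collections import defaultdict
from datetime import datetime


def to_daily(records):
    """records: iterable of (timestamp, camera_id, count).
    Returns {date: total count}, sorted by date, after dropping repeated
    (timestamp, camera_id) rows (a simple, assumed guard against double counting)."""
    seen = set()
    daily = defaultdict(int)
    for ts, camera_id, count in records:
        key = (ts, camera_id)
        if key in seen:
            continue
        seen.add(key)
        daily[ts.date()] += count
    return dict(sorted(daily.items()))


# Hypothetical hourly camera records.
records = [
    (datetime(2020, 3, 16, 9), "C1", 100),
    (datetime(2020, 3, 16, 9), "C1", 100),   # duplicate row, ignored
    (datetime(2020, 3, 16, 10), "C1", 50),
    (datetime(2020, 3, 17, 9), "C2", 30),
]
print(to_daily(records))
# {datetime.date(2020, 3, 16): 150, datetime.date(2020, 3, 17): 30}
```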
The first model included a series of input variables to represent different aspects of society that had restrictions placed on them alongside other external conditions (such as weather, school/bank holidays, day of week, etc). Variable importance was used to identify what (if any) aspects of lockdown might be significant in predicting future changes in footfall. The second model omitted any lockdown related inputs and was designed to make predictions on what “business as usual” might have been like had the pandemic not happened.
Because time series data are inherently ordered, both models were validated using “Walk-Forward Validation” instead of the standard cross-validation included in scikit-learn and often used with random forest ensembles. Walk-Forward Validation retrains the model after every prediction on the validation dataset, essentially “walking forward” through the time series. This avoids potential data leakage: standard cross-validation, particularly with randomised folds, can train the model on observations that occur after those it is being tested on.
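Walk-forward validation with an expanding training window can be sketched as below. Any estimator with scikit-learn's `fit`/`predict` interface can be plugged in through `make_model`; a naive last-value stand-in is used here so the sketch runs without scikit-learn. The function names, the stand-in model and the choice of mean absolute error are assumptions rather than the project's exact setup.

```python
from statistics import mean


class LastValueModel:
    """Naive stand-in with the scikit-learn fit/predict interface: predicts the last training target."""

    def fit(self, X, y):
        self.last_ = y[-1]
        return self

    def predict(self, X):
        return [self.last_ for _ in X]


def walk_forward(X, y, n_train, make_model):
    """Expanding-window walk-forward validation.
    For each t >= n_train, a fresh model is fitted on rows [0, t) only and predicts row t,
    so no observation later than t is ever used to predict t.
    Returns (predictions, mean absolute error)."""
    preds = []
    for t in range(n_train, len(y)):
        model = make_model()
        model.fit(X[:t], y[:t])
        preds.append(model.predict([X[t]])[0])
    mae = mean(abs(p - actual) for p, actual in zip(preds, y[n_train:]))
    return preds, mae


X = [[day] for day in range(5)]  # hypothetical single feature
y = [1, 2, 3, 4, 5]              # hypothetical daily footfall
print(walk_forward(X, y, n_train=2, make_model=LastValueModel))  # ([2, 3, 4], 1)
```

With scikit-learn installed, passing `make_model=lambda: RandomForestRegressor(n_estimators=500)` (from `sklearn.ensemble`) would refit a 500-tree forest at every step, which matches the description above but is computationally expensive; scikit-learn's `TimeSeriesSplit` offers a cheaper, fold-based alternative that also respects time order.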
Key findings
The chart below shows the resampled footfall data intersecting with key dates from COVID-19 restrictions.
Key dates are shown as dotted lines, numbered to match a key. Red zones indicate “official” lockdowns, whilst orange represents periods when a variety of restrictions were in place but were being lifted or introduced individually. A summary of how this affected footfall follows:
Footfall started to drop immediately after the announcement on 16th March 2020, even though no official restrictions had yet been implemented.
After non-essential shops and schools reopened on 15th June 2020, footfall started to rise again.
Footfall continues to rise through summer until around 22nd September 2020, when some restrictions were announced.
Footfall rises whilst Leeds is in Tiers 2 and 3, potentially because gatherings are only permitted in public spaces.
The second and third lockdowns drive footfall back down again until restrictions begin to ease again in April 2021.
The first machine learning model was intended to explore whether any lockdown variables would be significant in predicting future changes in footfall. Variable importance (top 10) is shown below.
The most important lockdown-related features were indoor entertainment and non-essential retail. Whilst this is only an initial model and not a definitive conclusion, it does help indicate what aspects of lockdown might have impacted pedestrian traffic in the city centre more than others.
The second model was designed to test how useful the data would be in predicting what “business as usual” may have been like.
There was little difference in error scores across different numbers of trees, so 500 trees was chosen as a compromise between the best score and the least processing power. The model predictions using this hyperparameter are shown below.
Results from this initial model are by no means definitive; however, they show the potential to quantify how much footfall has been lost. For example:
Average daily footfall in the lead up to Christmas (taken as 30th November to 24th December 2020) was approximately 36% lower than predicted.
Average daily footfall over the school holidays was approximately 63% lower than predicted.
Approximate footfall loss was also calculated for individual Bank Holidays. Most recorded footfall over 90% lower than predicted, except for the August Bank Holiday, which was around 22% lower.
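These loss figures compare actual and predicted daily footfall over a date window. A minimal sketch of that arithmetic, assuming both series are dictionaries keyed by `datetime.date` (a hypothetical layout) and that the comparison is a ratio of the two means (an assumption about how the averages were combined), is:

```python
from datetime import date
from statistics import mean


def pct_below_prediction(actual, predicted, start, end):
    """Percentage by which mean actual daily footfall fell below mean predicted footfall
    over [start, end], using only days present in both series (ratio of means, assumed)."""
    days = [d for d in actual if start <= d <= end and d in predicted]
    mean_actual = mean(actual[d] for d in days)
    mean_predicted = mean(predicted[d] for d in days)
    return 100.0 * (mean_predicted - mean_actual) / mean_predicted


# Hypothetical two-day window inside the pre-Christmas period used in the text.
actual = {date(2020, 12, 1): 60, date(2020, 12, 2): 40}
predicted = {date(2020, 12, 1): 100, date(2020, 12, 2): 100}
print(pct_below_prediction(actual, predicted, date(2020, 11, 30), date(2020, 12, 24)))  # 50.0
```

For a single Bank Holiday, `start` and `end` are simply the same date.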
Value of the research
Initial analysis has already been delivered to Leeds City Council. An aggregated dataset of the footfall camera data has been created and is available on the Consumer Data Research Centre (CDRC) Data Store for future research. Future researchers can use and refine the initial models to develop more accurate predictions, and more specialised time series packages could also be explored.
Insights
Urban footfall and ambient population were significantly impacted by COVID-19 lockdown policies (as was intended).
Closures of indoor entertainment and non-essential retail appear to be the most important lockdown-related factors in predicting footfall change.
Consideration must be given to how time series data is processed in classic machine learning models such as Random Forests.
Research theme
Urban analytics
People
Tom Albone – Data Scientist Intern (LIDA)
Dr Nick Malleson – Professor of Spatial Science
Professor Alison Heppenstall – Professor in Geocomputation