Technology

What is Data-to-Text

Data-to-Text technology allows computer programs to automatically generate summaries in English (or other human languages) of numerical and other data sets, including time-series and geospatial data. These summaries help people understand and make the best use of the data available to them by presenting the information as a narrative (story).

Applications we have worked on include

  • Generating textual weather forecasts from numerical weather simulation data, for the offshore oil industry and also for road maintenance engineers.
  • Generating summaries of patient data from electronic health records, for doctors, nurses, and patients.
  • Generating summaries of sensor data from a gas turbine, for maintenance engineers.
  • Generating summaries of educational assessment data, to help test takers understand their strengths and weaknesses.
  • Generating summaries of statistical data for blind users, who cannot see maps and tables.

Examples

A simple example (demo) of a data-to-text system is our pollen forecast generator for Scotland. This system takes as input six numbers, which give predicted pollen levels in different parts of Scotland, and from these numbers produces a summary text such as:

Grass pollen levels for Tuesday have decreased from the high levels of yesterday with values of around 4 to 5 across most parts of the country. However, in South Eastern areas, pollen levels will be high with values of 6.
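
As a rough illustration of the idea (not the actual generator, whose internals are not described here), a toy summariser for a handful of regional values might look like the Python sketch below. The region names, the threshold of 6 for "high", and the sentence templates are all assumptions made for this example, and the sketch ignores the comparison with the previous day that appears in the text above.

```python
from typing import Dict, Iterable


def _span(values: Iterable[int]) -> str:
    """Render a collection of values as '6' or '4 to 5'."""
    ordered = sorted(set(values))
    if len(ordered) == 1:
        return str(ordered[0])
    return f"{ordered[0]} to {ordered[-1]}"


def summarise_pollen(levels: Dict[str, int], high_threshold: int = 6) -> str:
    """Summarise regional pollen values in one or two sentences.

    Assumption: values at or above `high_threshold` count as "high".
    """
    if not levels:
        raise ValueError("at least one regional value is required")

    high = {region: v for region, v in levels.items() if v >= high_threshold}
    other = {region: v for region, v in levels.items() if v < high_threshold}

    # Every region is high: one sentence is enough.
    if not other:
        return (
            "Grass pollen levels will be high across the country "
            f"with values of {_span(high.values())}."
        )

    where = "most parts of the country" if high else "the country"
    text = f"Grass pollen levels will be around {_span(other.values())} across {where}."

    # Report the exceptional regions in a second, contrasting sentence.
    if high:
        regions = " and ".join(sorted(high))
        text += (
            f" However, in {regions} areas, pollen levels will be high "
            f"with values of {_span(high.values())}."
        )
    return text


# Hypothetical input: six regional values, as in the demo described above.
example = {
    "North": 4, "North West": 5, "Central": 4,
    "North East": 5, "South West": 4, "South East": 6,
}
print(summarise_pollen(example))
```

With the hypothetical input shown, the sketch prints "Grass pollen levels will be around 4 to 5 across most parts of the country. However, in South East areas, pollen levels will be high with values of 6."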

A more complex example is the BabyTalk system, which generates summaries of medical data about babies in a neonatal intensive care unit. BabyTalk's input consists of second-by-second readings from medical sensors (heart rate, blood oxygen level, etc.), plus a record of medical actions performed by doctors and nurses. From this input data, it generates summary texts such as:

Over the next 24 minutes there were a number of successive desaturations down to 0. Fraction of Inspired Oxygen (FIO2) was raised to 100%. There were 3 successive bradycardias down to 69. Neopuff ventilation was given to the baby a number of times. The baby was re-intubated successfully.
 
Another complex example is the data-to-text system that generates weather forecast texts for winter road maintenance engineers. The input to this system consists of numerical weather predictions for thousands of locations on a road network. The system textually summarises the weather for the entire region covered by the road network as shown below:

    Overview:       Road surface temperature will fall below zero on all routes during the late evening until around midnight.

    Wind (mph):     NE 15-25 gusts 50-55 this afternoon in most places, backing NNW and easing 10-20 tomorrow morning, gusts 30-35 during this evening until tomorrow morning in areas above 200M

    Weather:        Snow will affect all routes at first, clearing at times then turning moderate during tonight and the early morning in all areas, and persisting until end of period. Ice will affect all routes from the late evening until early morning. Hoar frost will affect some southwestern and central routes by early morning. Road surface temperatures will fall slowly during the evening and tonight, reaching zero in some far southern and southwestern places by 21:00. Fog will affect some northeastern and southwestern routes during tonight and the early morning, turning freezing in some places above 400M.



One important aspect of the above system is its ability to compute interesting spatial weather patterns from the input weather prediction data and to describe these patterns using an appropriate geographical frame of reference.



One of the earliest data-to-text systems we built produced marine weather forecast texts for the offshore oil industry. The output of this system was found to be better than human-written forecasts in an evaluation study involving oil company staff who routinely use weather forecasts to make operational decisions on offshore oil rigs. An example forecast text generated by this system is:

1. INFERENCE (Inference not machine generated) 0300 GMT, SUNDAY, 10-Sep 2000
LOW (989 MB),SW OF THE LOFOTENS,WILL MOVE SLOWLY ENE TO REACH
NORTHERN FINLAND EARLY MONDAY MORNING. A SHALLOW ATLANTIC LOW
WILL MOVE NE, EXPECTED OVER EIRE (1017 MB) BY MIDDAY SUNDAY THEN
DRIFTING EAST TO REACH THE GERMAN BIGHT (1012 MB) BY LATE MONDAY.
A RIDGE OF HIGH PRESSURE WILL BUILD NORTHEAST FROM THE CENTRAL
ATLANTIC FORMING A HIGH CELL (1023 MB) TO THE NORTH OF SCOTLAND
BY LATE THIS EVENING,REACHING HALTENBANKEN,(1026 MB),BY MIDDAY
TUESDAY WITH A RIDGE COVERING THE NORTH SEA.

2. FORECAST 6 - 24 GMT, Sun 10-Sep 2000
WIND(KTS)
10M: WNW 13-18 gradually veering NW 5-10
50M: WNW 20-25 gradually easing NW 10-15
WAVES(M)
  SIG HT:  3-4 mainly WNW swell. 
  MAX HT:  5.5-6.5 mainly WNW swell. 
  PER(SEC):  4-6 predominantly WNW 10-11 second swell. 
WEATHER: Some clear intervals through partly broken cloud becoming overcast with light to moderate rain.
VIS(NM): Greater than 10 but reduced in precipitation at times. 
TEMP(C): 8-10 rising 12 falling 8 later. 
CLOUD(OKTAS/FT): 2-4CU/SC 1000-1500 lowering 6-8ST/SC 400-900 lowering 5-7ST/SC 350-850. 
 
3. FORECAST 0 - 24 GMT, Mon 11-Sep 2000
WIND(KTS)
   10M: NW 5-10 gusts 20 in showers, gradually veering NE. 
   50M: NW 10-15 gusts 25 in showers, gradually veering NE. 
WAVES(M)
   SIG HT: 2.5-3.5 mainly NW swell. 
   MAX HT: 5-6 mainly NW swell. 
   PER(SEC): 4-6 predominantly WNW 10 second swell falling 2-4 predominantly NW 9 second swell. 
WEATHER: Mainly cloudy with light to moderate rain showers becoming partly cloudy later. 
VIS(NM): Greater than 10  but reduced in precipitation at times. 
TEMP(C): 6-8 rising 10 falling 8 later.
CLOUD(OKTAS/FT): 5-7ST/SC 350-850 lifting 3-5CU/SC 1000-1500. 
 
4. FORECAST 0 - 24 GMT, Tue 12-Sep 2000
WIND(KTS)
     10M: NE 5-10 gradually veering ESE 8-13 gusts 20-25 in showers. 
     50M: NE 10-15 gradually veering ESE 13-18 gusts 25 in showers. 
WAVES(M)
     SIG HT: 2-3 mainly NW swell falling 2 or less mainly WNW swell. 
     MAX HT: 3.5-4.5 mainly NW swell falling 2-3 mainly WNW swell. 
     PER(SEC): 3-4 sec sea,  predominantly NW 9 second swell falling 1-3 predominantly WNW 8 second swell. 
WEATHER: A few clear intervals but otherwise rather cloudy becoming overcast with rain later.
VIS(NM): Greater than 10 but reduced in precipitation at times. 
TEMP(C): 8 rising 9-11 then decreasing 7-9. 
CLOUD(OKTAS/FT): 3-5CU/SC 1000-1500 lowering 7-8ST/SC 0-500.

Benefits of Data-to-Text

If a company already needs to produce textual summaries of data, then using our software can enhance the quality of the texts while reducing the time taken to write them. For example, two Aberdeen meteorological companies used our software to generate draft weather forecasts, which human forecasters checked and (if necessary) edited before releasing them to customers. This resulted in better forecasts from their customers' perspective (in part because of more consistent use of standard terminology), and also reduced the time human forecasters spent writing forecasts.

If a company's staff need to make decisions based on data, then our software can assist the decision-making process by providing short (one page or less) narrative summaries of the data sets. This is the goal of the BabyTalk project. We will begin deploying BabyTalk in a hospital in late 2009; by mid-2010 we should have solid data about BabyTalk's effectiveness as a decision-support aid.

We also believe that our technology can support people with visual and cognitive impairments; for example, it can be used to make data sets accessible to visually impaired users, which is a legal requirement in many cases.

How does data-to-text work

Data-to-text systems generate texts in four stages:

  • Signal Analysis: Basic temporal and spatial patterns, such as spikes and trends, are identified in the input data. This is done using standard pattern detection and noise-suppression algorithms, with some modifications to enhance effectiveness in a data-to-text context.
  • Data Interpretation: Patterns and other data are linked together; important abstractions and causal links are identified. This is done using a knowledge-based system, which uses rules acquired from client domain experts.
  • Document Planning: Key messages are extracted from the previous steps, and linked together into a narrative structure. This is done using our document-planning engine, which we customise based on analysis of example "ideal" texts provided by clients.
  • Language Production: Actual texts are produced. This is done using our language production engine; usually it is necessary to supplement the core engine with domain-specific terminology.
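
The four stages above can be sketched as a chain of small functions. The following Python example is a minimal illustration under stated assumptions, not our production engines: the input format (hourly road surface temperatures), the pattern types, the single "ice risk" rule, the importance scores, and the sentence templates are all hypothetical, and the signal analysis stage here does no noise suppression.

```python
from typing import Any, Dict, List, Tuple

Reading = Tuple[int, float]  # (hour of day, road surface temperature in deg C)


def signal_analysis(readings: List[Reading]) -> List[Dict[str, Any]]:
    """Stage 1: find basic patterns (overall trend, first drop below zero)."""
    first, last = readings[0][1], readings[-1][1]
    if last < first - 1.0:
        direction = "fall"
    elif last > first + 1.0:
        direction = "rise"
    else:
        direction = "stay steady"
    patterns: List[Dict[str, Any]] = [{"kind": "trend", "direction": direction}]

    # Look for the first pair of consecutive readings that crosses zero downwards.
    for (_, t0), (h1, t1) in zip(readings, readings[1:]):
        if t0 >= 0.0 > t1:
            patterns.append({"kind": "zero_crossing", "hour": h1})
            break
    return patterns


def data_interpretation(patterns: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Stage 2: apply (hypothetical) expert rules to turn patterns into messages."""
    messages = []
    for p in patterns:
        if p["kind"] == "zero_crossing":
            # Assumed rule: sub-zero road surface implies a risk of ice.
            messages.append({"type": "ice_risk", "hour": p["hour"], "importance": 2})
        elif p["kind"] == "trend":
            messages.append(
                {"type": "temperature_trend", "direction": p["direction"], "importance": 1}
            )
    return messages


def document_planning(messages: List[Dict[str, Any]], max_messages: int = 2) -> List[Dict[str, Any]]:
    """Stage 3: keep the key messages and order them, most important first."""
    return sorted(messages, key=lambda m: m["importance"], reverse=True)[:max_messages]


def language_production(plan: List[Dict[str, Any]]) -> str:
    """Stage 4: realise each planned message as a sentence using templates."""
    sentences = []
    for m in plan:
        if m["type"] == "ice_risk":
            sentences.append(f"Ice may affect routes from around {m['hour']:02d}:00.")
        elif m["type"] == "temperature_trend":
            sentences.append(
                f"Road surface temperatures will {m['direction']} during the period."
            )
    return " ".join(sentences)


def generate(readings: List[Reading]) -> str:
    """Run the four stages in order."""
    return language_production(
        document_planning(data_interpretation(signal_analysis(readings)))
    )


# Hypothetical evening readings for a single location.
evening = [(18, 3.0), (19, 2.0), (20, 1.0), (21, 0.0), (22, -1.0), (23, -2.0)]
print(generate(evening))
```

Keeping each stage as a separate function mirrors the description above: the rules in the interpretation stage and the templates in the language stage can be replaced for a new domain without touching the other stages.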