BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Limics - ECPv6.16.3//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Limics
X-ORIGINAL-URL:https://www.limics.fr
X-WR-CALDESC:Events for Limics
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20240331T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20241027T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20250330T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20251026T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20260329T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20261025T010000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Europe/Paris:20250428T140000
DTEND;TZID=Europe/Paris:20250428T150000
DTSTAMP:20260614T123608
CREATED:20241220T155826Z
LAST-MODIFIED:20250422T112658Z
UID:1619-1745848800-1745852400@www.limics.fr
SUMMARY:Ariel Cohen\, Introduction to Weak Supervision & Applications
DESCRIPTION:Introduction to Weak Supervision & Applications \nThe recent digitization of patient health records and their near-real-time collection in Clinical Data Warehouses (CDWs) offer new perspectives for research\, steering activities and policy making. Although promising\, taking advantage of Electronic Health Records (EHRs) remains a challenge. In particular\, textual data are very rich in information\, but exploiting them remains extremely difficult. Developing efficient methods to extract information from unstructured data for further use is therefore essential.\nNatural language processing (NLP) techniques applied to health care notes have already shown satisfactory results in the literature\, especially with supervised learning approaches. However\, this good performance depends strongly on the availability of many annotated records\, and these annotations must be produced by domain experts. In practice\, this annotation task is a bottleneck for research because the experts’ available time is a scarce and expensive resource. Furthermore\, most annotated datasets derived from clinical notes cannot be shared or reused because of patient privacy regulations. \nThe challenge of acquiring labelled training data has driven the search for alternatives to traditional supervised machine learning. New engineering and mathematical methodologies focus on minimising the expert annotation task\, especially weak supervision approaches. Programmatic weak supervision encompasses a wide range of techniques that aim to learn from data where the supervision comes from labelling functions. Among those techniques\, the distant supervision approach uses multiple data sources to build annotated datasets automatically and\, consequently\, much faster than manual annotation.
However\, this programmatic annotation is imperfect\, producing “silver standard” datasets with partially unreliable labels\, also called noisy labels. Many machine learning algorithms\, including recent ones such as deep neural networks (DNNs)\, are prone to overfitting on noisy labels. Several methods have therefore been developed to learn from noisy labels with DNNs. \nThere is also increasing interest in using Large Language Models (LLMs) to solve information extraction tasks in the medical domain without an expert-labelled training set. To date\, however\, they present several limitations. First\, it has been shown that these models do not perform as well as smaller supervised contextual models (e.g. BERT). Second\, the operational cost of deploying these huge\, resource-demanding models in a CDW with more than 11M patients is not conceivable from an industrial perspective: the need for dedicated\, state-of-the-art hardware and its energy consumption currently make large-scale use of this technology for inference prohibitively expensive. On the other hand\, recent publications suggest that these models are suitable for the labelling task and could accelerate the development of smaller specialized models. \nThe primary goal of our work is to explore how weak supervision approaches can be developed within a CDW to reduce the annotation workload for medical professionals and speed up NLP model development\, while addressing the constraints typical of this industrial environment. Our research will be developed using multiple real-world use cases and aims to answer the following research questions: Can weak supervision methods be applied in a Clinical Data Warehouse context to accelerate the development of NLP models?
How can we leverage the information redundancy present in certain portions of Electronic Health Records within a CDW to create labelling functions that yield a programmatically annotated corpus (silver standard) on which a model can be fitted using distant supervision? How can we take advantage of Large Language Models in the annotation phase of training sets\, and how can we use these datasets to develop smaller\, specialized models ready for deployment in a Clinical Data Warehouse? Which training techniques are the most effective for handling these silver standard datasets?
URL:https://www.limics.fr/event/ariel-cohen/
END:VEVENT
END:VCALENDAR