If you work with English search, standard text analysis usually just works. You index “running,” the analyzer strips the suffix to store “run,” and a user searching for “run” finds the document. Simple.
But if you work with languages like Hebrew, Arabic, German, or Polish, you know that standard rule-based analyzers often fail. They either under-analyze (missing relevant matches) or over-analyze (returning garbage results).

Wrong results when searching for “A black and white carpet” in a morphologically complex language. (Photo by yaed on Unsplash.)
For years, we’ve had to rely on complex dictionaries and fragile regex rules. Today, we can do better. By replacing rule-based logic with neural models for text analysis (small, efficient language models that understand context), we can drastically improve search quality.
Here’s how to solve the morphology challenge by using the Elasticsearch inference API and a custom model service.
The problem: Why rules fail
Most standard analyzers are context-free. They look at one word at a time and apply a static set of rules.
- Algorithmic analyzers (like Snowball) strip suffixes based on patterns.
- Dictionary analyzers (like Hunspell) look up words in a list.
This approach breaks down when the structure of a word (its root and affixes) changes based on the sentence it lives in.
1. The Semitic ambiguity (roots versus prefixes)
Semitic languages, like Hebrew and Arabic, are built on root systems and often attach prepositions (such as in, to, or from) directly to the word. This creates ambiguous tokens that rule-based systems cannot resolve.
- Word: בצל (B-Tz-L).
- Context A: “The soup tastes better with onion (batzal).”
- Context B: “We sat in the shadow (ba-tzel) of the tree.”
In Context A, בצל is a noun (onion). In Context B, it’s the preposition ב (in) attached to the noun צל (shadow).
A standard analyzer is forced to guess. If it aggressively strips the ב prefix, it turns "onion" into "shadow." If it’s conservative and leaves it alone, a user searching for "shadow" (tzel) will fail to find documents containing "in the shadow" (batzel). Neural models solve this by reading the sentence to determine whether the ב is part of the root or a separate preposition.
2. The compound problem (German, Dutch, and more)
Languages like German, Dutch, Swedish, and Finnish concatenate nouns without spaces to form new concepts. This results in a theoretically infinite vocabulary. To search effectively, you must split (decompound) these words.
- Word: Wachstube.
- Split A: Wach (guard) + Stube (room) = guardroom.
- Split B: Wachs (wax) + Tube (tube) = wax tube.
A dictionary-based decompounder acts blindly. If both “Wach” and “Wachs” are in its dictionary, it might pick the wrong split, polluting your index with irrelevant tokens.
To see this problem in English: A naive algorithm might split “carpet” into “car” + “pet.” Without understanding meaning, rules fail.

Photo by Bob Brewer on Unsplash.
The solution: “Neural analyzers” (neural models for text analysis)
We don’t need to abandon the inverted index. We just need to feed it better tokens.
Instead of a regex rule, we use a neural model (like BERT or T5) to perform the analysis. Because these models are trained on massive datasets, they understand context. They look at the surrounding words to decide whether בצל means "onion" or "in shadow" or if Wachstube belongs in a military or cosmetic context.
Architecture: The inference sidecar
We can integrate these Python-based models directly into the Elasticsearch ingestion pipeline using the inference API.
The pattern:
- External model service: A simple Python service (for example, FastAPI) hosts the model.
- Elasticsearch inference API: Defines this service as a custom model within Elasticsearch.
- Ingest pipeline: Sends text to the inference processor, which calls your Python service.
- Index mapping: Create a target field for the analyzed text that uses the whitespace analyzer.
- Indexing: The service returns the cleaned text, which Elasticsearch stores in the target field.
- Search: Queries are analyzed via the inference API before matching.

Implementation guide
Let’s build this for Hebrew (using DictaBERT) and German (using CompoundPiece).
To follow along, you’ll need:
- Python 3.10+.
- Elasticsearch 8.9.x+.
Install the Python dependencies:
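A typical dependency set for a FastAPI-based model service looks like this (exact packages and versions are an assumption; adjust to your environment):

```shell
pip install fastapi "uvicorn[standard]" transformers torch
```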
Step 1: External model service
To connect Elasticsearch to our neural model, we need a simple API service that:
- Receives text from the Elasticsearch inference API.
- Passes it through the neural model.
- Returns analyzed text in a format Elasticsearch understands.
This service interfaces Elasticsearch with the neural model. At ingest time, the Elasticsearch pipeline calls this API to analyze and store document fields; at search time, the application calls it to process the user's query. You can deploy this on any infrastructure, including EC2, Lambda, or SageMaker.
The code below loads both models at startup and exposes /analyze/hebrew and /analyze/german endpoints:
Save the code above to a file (for example, analyzer_service.py), and run:
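```shell
uvicorn analyzer_service:app --host 0.0.0.0 --port 8000
```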
Wait for “Models loaded successfully!” (takes ~30–60 seconds for models to download on first run).
Test locally:
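Assuming the service exposes an /analyze/hebrew endpoint that accepts an `input` field, a quick smoke test looks like:

```shell
curl -s -X POST http://localhost:8000/analyze/hebrew \
  -H "Content-Type: application/json" \
  -d '{"input": "ישבנו בצל העץ"}'
```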
Expected output:
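The exact lemmas depend on the model version; what matters is the envelope, which the json_parser on the Elasticsearch side will read. It should look roughly like:

```
{
  "choices": [
    {
      "message": {
        "content": "ישב צל עץ"
      }
    }
  ]
}
```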
Step 2: Configure Elasticsearch inference API
We’ll use the custom inference endpoint. This allows us to define exactly how Elasticsearch talks to our Python endpoint.
Note: Use response.json_parser to extract the content from the normalized JSON structure. You don’t need to stick with the OpenAI output format; we use it here for consistency, since the completion task type is text to text.
Exposing your local service
For testing, we’ll use ngrok to expose the local Python service to the internet. This allows any Elasticsearch deployment (self-managed, Elastic Cloud, or Elastic Cloud Serverless) to reach your service.
Install and run ngrok:
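On macOS, for example (other platforms can download a binary from ngrok.com):

```shell
brew install ngrok
ngrok config add-authtoken <YOUR_AUTHTOKEN>
```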
Expose your local service:
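```shell
ngrok http 8000
```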
ngrok will display a forwarding URL like:
Forwarding https://abc123.ngrok.io -> http://localhost:8000
Copy the HTTPS URL. You’ll use this in the Elasticsearch configuration.
Configure the inference endpoint
Replace https://abc123.ngrok.io with your actual ngrok URL.
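A sketch of the endpoint definition follows. The endpoint name hebrew-analyzer is illustrative, and the request template and json_parser path assume the service returns an OpenAI-style envelope; check the put-custom API reference for your Elasticsearch version:

```
PUT _inference/completion/hebrew-analyzer
{
  "service": "custom",
  "service_settings": {
    "url": "https://abc123.ngrok.io/analyze/hebrew",
    "headers": {
      "Content-Type": "application/json"
    },
    "request": "{\"input\": ${input}}",
    "response": {
      "json_parser": {
        "completion_result": "$.choices[*].message.content"
      }
    }
  }
}
```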
Note: ngrok is used here for fast testing and development. The free tier has request limits, and URLs change on restart. For production, deploy your service to a persistent infrastructure.
For production (with API Gateway)
In production, deploy your Python service to a secure, persistent endpoint (such as AWS API Gateway + Lambda, EC2, ECS, or any cloud provider). Use secret_parameters to securely store API keys:
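A sketch of a production endpoint definition; the host, endpoint name, and the x-api-key header (the AWS API Gateway convention) are illustrative:

```
PUT _inference/completion/hebrew-analyzer-prod
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "<YOUR_API_GATEWAY_KEY>"
    },
    "url": "https://<your-api-gateway-host>/analyze/hebrew",
    "headers": {
      "Content-Type": "application/json",
      "x-api-key": "${api_key}"
    },
    "request": "{\"input\": ${input}}",
    "response": {
      "json_parser": {
        "completion_result": "$.choices[*].message.content"
      }
    }
  }
}
```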
Step 3: Ingest pipeline
Create a pipeline that passes the raw text field to our model and stores the result in a new field.
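For example, assuming a raw content field, an inference endpoint named hebrew-analyzer, and a content_analyzed target field (all illustrative names):

```
PUT _ingest/pipeline/hebrew-analysis-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "hebrew-analyzer",
        "input_output": {
          "input_field": "content",
          "output_field": "content_analyzed"
        }
      }
    }
  ]
}
```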
Step 4: Index mapping
This is the most critical step. The output from our neural model is already analyzed. We do not want a standard analyzer to mess it up again. We use the whitespace analyzer to simply tokenize the text we received.
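Assuming the same illustrative field names, the mapping keeps the raw text in content and the model output in a whitespace-analyzed content_analyzed field:

```
PUT hebrew-docs
{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "content_analyzed": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}
```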
Step 5: Indexing
Option A: Single document.
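Index one document through the pipeline by naming it in the request (index and pipeline names are the illustrative ones used above):

```
POST hebrew-docs/_doc?pipeline=hebrew-analysis-pipeline
{
  "content": "ישבנו בצל העץ"
}
```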
Option B: Reindex existing data.
If you have existing data in another index, reindex it through the pipeline:
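A sketch with an assumed source index name:

```
POST _reindex
{
  "source": { "index": "old-hebrew-docs" },
  "dest": {
    "index": "hebrew-docs",
    "pipeline": "hebrew-analysis-pipeline"
  }
}
```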
Option C: Set pipeline as default for index.
Make all future documents automatically use the pipeline:
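```
PUT hebrew-docs/_settings
{
  "index.default_pipeline": "hebrew-analysis-pipeline"
}
```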
Then index normally (no ?pipeline= needed):
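```
POST hebrew-docs/_doc
{
  "content": "המרק טעים יותר עם בצל"
}
```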
Step 6: Search
Search with a neural analyzer in Elasticsearch is a two-step process: first analyze the query using the inference API, then search with the result:
A. Analyze the query.
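Call the inference endpoint with the raw query (endpoint name as defined earlier; the analyzed text comes back in the completion result):

```
POST _inference/completion/hebrew-analyzer
{
  "input": "בצל"
}
```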
B. Search with the result.
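Supposing the analysis step returned צל (shadow) for this query, match it against the pre-analyzed field:

```
GET hebrew-docs/_search
{
  "query": {
    "match": {
      "content_analyzed": "צל"
    }
  }
}
```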
In production, wrap these two calls in your application code for a seamless experience.
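A minimal sketch of such a wrapper, using only the standard library; the cluster URL, endpoint name, index, and field names are illustrative assumptions:

```python
# Application-side wrapper for the two-step search flow.
import json
from urllib import request as urlrequest

ES_URL = "http://localhost:9200"   # assumption: local, unsecured cluster
INFERENCE_ID = "hebrew-analyzer"   # the custom inference endpoint
INDEX = "hebrew-docs"


def build_search_body(analyzed_text: str) -> dict:
    """Match against the pre-analyzed field; the whitespace analyzer on that
    field keeps the model's tokens intact."""
    return {"query": {"match": {"content_analyzed": analyzed_text}}}


def es_post(path: str, body: dict) -> dict:
    """POST a JSON body to the cluster and decode the JSON response."""
    req = urlrequest.Request(
        f"{ES_URL}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)


def neural_search(user_query: str) -> dict:
    # Step A: analyze the query via the inference endpoint.
    analyzed = es_post(f"_inference/completion/{INFERENCE_ID}", {"input": user_query})
    analyzed_text = analyzed["completion"][0]["result"]
    # Step B: search the pre-analyzed field with the analyzed query.
    return es_post(f"{INDEX}/_search", build_search_body(analyzed_text))
```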
Available models
The architecture above works for any language. You simply swap the Python model and adjust the post-processing of the output. Here are verified models for common complex languages:
- Hebrew: dicta-il/dictabert-lex. Context-aware lemmatization; handles prefix ambiguity (ב, ה, ל, and more).
- German: benjamin/compoundpiece. Generative decompounding; supports 56 languages, including Dutch, Swedish, Finnish, and Turkish.
- Arabic: CAMeL Tools. BERT-based disambiguation and lemmatization for Modern Standard Arabic.
- Polish: amu-cai/polemma-large. Case-sensitive lemmatization for Polish inflections.
Conclusion
You don’t need to choose between the precision of lexical search and the intelligence of AI. By moving the “smart” part of the process into the analysis phase using the inference API, you fix the root cause of poor search relevance in complex languages.
The tools are here. The models are open-source. The pipelines are configurable. It’s time to teach our search engines to read.
Code
All code snippets from this article are available at https://github.com/noamschwartz/neural-text-analyzer.
References:
- https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-custom
- https://www.elastic.co/docs/manage-data/ingest/transform-enrich/ingest-pipelines
- https://ngrok.com
- https://huggingface.co/dicta-il/dictabert-lex
- https://huggingface.co/benjamin/compoundpiece
- https://arxiv.org/pdf/2305.14214




