AI & Research

Modern BERTic - the language model that reads your region’s languages.

An encoder model trained on 60 billion tokens in Bosnian, Croatian, Montenegrin, and Serbian. Built for understanding, not generation. Fast, local, and ready for the work that actually moves the needle in your stack.

Important distinction

Modern BERTic is not ChatGPT.

It does not write. It does not chat. It does not generate emails.

It reads. Specifically: it converts any sentence in BCMS into a 768-dimensional vector that captures meaning - so two sentences that say the same thing in completely different words end up in the same place. That is the foundation for search, matching, classification, and extraction at scale.
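The idea in the paragraph above can be sketched with cosine similarity over embedding vectors. This is a toy illustration only: the 4-dimensional vectors below are hand-made stand-ins for real 768-dimensional encoder outputs, and the example sentences are invented, not taken from the model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity in [-1, 1]; values near 1 mean the sentences 'say the same thing'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made stand-ins for encoder outputs (real vectors are 768-dimensional).
emb_a = np.array([0.9, 0.1, 0.0, 0.2])  # "Tražim posao u Zagrebu." (I'm looking for a job in Zagreb.)
emb_b = np.array([0.8, 0.2, 0.1, 0.3])  # "Zanima me zaposlenje u Zagrebu." (same meaning, different words)
emb_c = np.array([0.0, 0.9, 0.8, 0.1])  # "Danas pada kiša." (unrelated: It's raining today.)

print(cosine_similarity(emb_a, emb_b))  # high: paraphrases end up close together
print(cosine_similarity(emb_a, emb_c))  # low: unrelated sentences end up far apart
```

Everything downstream - search, matching, classification - is built on comparing vectors like these.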

Generative models write text. Encoder models understand it. For most enterprise problems involving regional-language text, an encoder is what you actually need - faster, cheaper, and able to run on your infrastructure.

                Generative LLMs (ChatGPT, etc.)       Modern BERTic (encoder)
Output          Generates new text                    Understands existing text
Best at         Writing, summarizing, conversation    Search, matching, classification, extraction
Speed           Slower                                Faster
Cost            Higher per request                    Lower per request
Deployment      Usually cloud-only                    Can run locally
BCMS quality    Translated training data              Native - 60B tokens

Why it matters

Three things you get.

Native, not translated.

60 billion tokens of training data in BCMS - not English models with translation layers. The difference shows up immediately on regional content.

Fast and local.

Runs on your infrastructure. No data leaves your network. Privacy-by-default for regulated sectors - banking, public administration, healthcare.

In-domain by design.

Trained on the kind of text people actually write - CVs, contracts, requests, news, support tickets. Not just Wikipedia.

Use cases

Where it is used.

  • Public sector: auto-routing citizen requests by intent - no manual triage.
  • Legal: semantic search across case law - finds relevant rulings keyword search misses.
  • Banking & insurance: automatic classification of customer requests, no rule maintenance.
  • Media & PR: sentiment analysis across thousands of articles per day.
  • E-commerce: intent-based product search ("something warm for the mountains" finds jackets and hoodies).
  • Inside Recrewty: candidate-role matching, assessment recommendation, structured CV extraction (NER).
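Several of the use cases above - request routing, candidate matching, intent-based search - reduce to the same pattern: embed everything once, then rank by cosine similarity. A minimal sketch of that pattern, with toy vectors standing in for real encoder outputs (the catalog and query embeddings here are entirely made up for illustration):

```python
import numpy as np

def rank_by_similarity(query_vec: np.ndarray, items: dict) -> list:
    """Return item names sorted by cosine similarity to the query, best match first."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(items, key=lambda name: cos(query_vec, items[name]), reverse=True)

# Toy product embeddings (in practice these come from the encoder, computed once).
catalog = {
    "winter jacket": np.array([0.9, 0.1, 0.1]),
    "hoodie":        np.array([0.7, 0.2, 0.2]),
    "swim shorts":   np.array([0.1, 0.9, 0.1]),
}

# Toy embedding of the query "something warm for the mountains".
query = np.array([0.85, 0.05, 0.15])

print(rank_by_similarity(query, catalog))  # warm items rank first, no keyword overlap needed
```

The same ranking step powers routing (compare a request against intent embeddings) and matching (compare a CV against role embeddings); only the items being embedded change.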

Open source

BalkanBench - the open benchmark for BCMS language models.

We released BalkanBench as an open-source initiative because regional NLP needs a shared, transparent yardstick - not vendor-curated numbers. Datasets, tasks, and evaluation code are public.

See how Modern BERTic stacks up against XLM-RoBERTa, mBERT, and the legacy BERTic on real BCMS tasks - and run the suite on your own model.

Explore BalkanBench

What is in the box

  • Native BCMS evaluation tasks - classification, NER, semantic similarity, retrieval.
  • Reproducible scripts and a public leaderboard.
  • Permissive license. Contributions welcome.
  • Maintained by Recrewty, scored independently.
balkanbench.com →

Partnerships

Two ways to work with Modern BERTic.

Build on it.

AI teams and ISVs: API access, fine-tuning support, and joint roadmap on regional NLP.

Reach out to the research team →

Co-build with us.

HR tech vendors, recruitment agencies, and consultancies: reseller and co-build partnerships where Modern BERTic becomes the language layer in your product.

Start a partnership conversation →

Want the technical brief?

We share the model card, evaluation methodology, and deployment patterns under NDA with research and engineering teams. Tell us briefly what you are building.

Talk to the research team