Index file for Samba Server, quick find

Hello people.
I have the latest version of NethServer installed.
My data set has millions of small files, and searching it is slow.

I would like to have an index file for faster searching when users look for a file on the Samba server.
I used to run netatalk (AFP) from a Linux RPM, which has its own index, but AFP has fallen out of favor.
When I installed the AFP version for Windows Server, AFP used the native Windows feature “Windows Search”. There was always some problem with reindexing, but at least there was something.
I haven’t found anything on Linux. I know XFS works with journaling, but that doesn’t even come close to having an index file.
I don’t know if NethServer already has some complementary feature for what I need that integrates with Samba, or if there is any option I can enable in smb.conf.

I have an SSD for the operating system, where I intend to store this index in order to speed up searching it. The server also has 32 GB of RAM, and I only use it in terminal mode (no GUI).
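To make the idea of an "index file" concrete, here is a minimal sketch in Python, assuming a simple filename-only index kept in a single SQLite file (for example on the SSD). The function names, the table layout, and the paths in the usage note are hypothetical. This is not a Samba or NethServer feature, and it does not make Windows or Mac clients use the index; it only illustrates the principle of walking the tree once and searching the index afterwards.

```python
import os
import sqlite3


def build_index(root, db_path):
    """Walk `root` once and store every file name and full path in SQLite."""
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS files")
    con.execute("CREATE TABLE files (name TEXT, path TEXT)")
    rows = (
        (name.lower(), os.path.join(dirpath, name))
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    )
    con.executemany("INSERT INTO files VALUES (?, ?)", rows)
    con.commit()
    return con


def search(con, text):
    """Return paths whose file name contains `text` (case-insensitive)."""
    cur = con.execute(
        "SELECT path FROM files WHERE instr(name, ?) > 0 ORDER BY path",
        (text.lower(),),
    )
    return [row[0] for row in cur]
```

A hypothetical use would be `con = build_index("/path/to/share", "/path/on/ssd/index.db")` followed by `search(con, "interview")`. The query still scans the table, but it reads one compact file instead of touching millions of directory entries on the data volume. The index also goes stale as files change, so it would have to be rebuilt periodically.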

I read the documentation on https://www.samba.org/ but found no parameter to specify an index file.

Does anyone in the community know anything about this and could help me?
Thank you in advance.
Paulo

Just for the sake of clarity, journaling for a filesystem and indexing files have nothing at all to do with each other.

Maybe overkill, but this could be an option to try to get running on NS: https://github.com/shirosaidev/diskover

Hi danb35 and robb,
thanks for the quick reply.

About XFS, I understand, thanks for pointing that out. Sorry if I mixed up the subjects; my intention was to refer to how files are organized.

About https://github.com/shirosaidev/diskover, I’ll have to read it carefully; I don’t understand much about it yet.

I was reading about the Elasticsearch-plugin smb.
Does that make sense?

As I understand it, diskover uses Elasticsearch as its base.

I will study how to implement it on Linux without going against the NethServer concept.

If I’m not mistaken, Elasticsearch, Docker and others should be shipped with version 8.

Still, if anyone has a simpler idea, please share it.

If it is for a Mac environment, check https://wiki.samba.org/index.php/Spotlight (I don’t know if the package we use needs to be recompiled).

There are other tuning options in “Directories with a Large Number of Files”, but they are annoying as they require changing how filename case is handled.

EDIT: other than the proprietary Windows Server solution, a project to index Samba shares in a way compatible with the Windows Search Protocol (Samba WSP) was started by SUSE, but it is not finished and there is no estimated timeline that I’m aware of:


Docker has been discussed here a bit in the past, but I don’t recall much recently.

My understanding is that both networking and templating have been issues, though I haven’t been following the situation closely.

We already have Cockpit, and afaik Cockpit is a very nice interface to manage Docker instances.
https://cockpit-project.org/guide/133/feature-docker.html

Hello people,
A few days ago I asked, both here on NethServer and in other communities, about an index-based search solution for Linux / SMB.
After some frustrating research (reading samba.org, searching communities, asking expert friends), I found that no one had ever come across this need.
There is a reason: the Linux file systems (ext3/4, XFS, etc.) and the find-based search method are very fast.
But it is not me searching at the console; it is my end users on Windows and/or Apple workstations.
With up to 500 thousand files we do not even notice, but once it passes 5 million files the scenario begins to change.
I used to use AFP (netatalk) on CentOS, and it has its own indexing mechanism.
But as AFP is discontinued, SMB is doing well on Microsoft, Linux and Apple platforms, and its auditing helps a lot, I set out to solve the problem.

These 5 million files are small: at most 10 MB, usually 5 MB. They are audio files. A few reach a gigabyte, but those are exceptions.

Finally I found a package that creates an index on Linux and also indexes file content: Recoll.
https://centos.pkgs.org/7/epel-testing-x86_64/recoll-1.23.1-2.el7.x86_64.rpm.html

However, from what I understand it only works with a GUI, and I haven’t seen how I can integrate it with SMB.

I am going to research whether it is possible to integrate the feature without a GUI and without going against the NethServer concept, and I invite everyone in the community to join the journey.

It would be amazing to have this package bundled with NethServer version 8 as an add-on.

Hug,
Paulo Basque

Hi, Paulo. Edited my previous answer adding some info on status of indexing of samba shares for Windows clients.

Hello @dnutan,
I’m sorry for the late reply.
I searched the web so much that I ended up at the same pages you indicated.
I hadn’t seen the article “Directories with a Large Number of Files”, thanks.
I added the parameters below. It improved things, but it still hasn’t solved the problem.

“**************************************************************************************************”
# SMB.CONF

case sensitive = true
default case = lower
preserve case = no
short preserve case = no

# Apple extensions (“AAPL”) run under the SMB2 protocol
max protocol = SMB2
client max protocol = SMB2

# Enabling Spotlight Search
spotlight = yes

# Apple extensions require support for extended attributes (xattr) - the default is yes in Samba 4.9+:
ea support = yes

# How to store OS X metadata:
fruit:metadata = stream
fruit:model = MacPro

# TCP
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=131072 SO_SNDBUF=131072
"**********************************************************************************************"

On Mac OS X I opened the terminal and made OS X create the index on the machine.
Enabling indexing for the network volumes:
mdutil /Volumes/DEPARTMENTS -i on
mdutil /Volumes/JOBS_OLD -i on
mdutil /Volumes/WORKS -i on

(reindex)
mdutil -E

"**********************************************************************************************"

Regarding Tracker for Spotlight, I am having trouble installing the Tracker packages and the additional libraries.

I don’t know if there is a repository we can add that contains the packages.

I can’t make progress without the packages.

If anyone knows where I can download them, I would be grateful for the information.

If Samba needs compiling, does it report that something is missing?
Sorry, I don’t know what is required. tracker (and tracker-devel) is in the default CentOS repos (with libtracker-sparql being part of it, if I’m not mistaken).


P.S.: If it’s of any use, here’s an OMV (Debian-based) forum thread with a similar goal:

Hello @dnutan,
We are having trouble finding the packages and libraries that the article asks for.

The objective is to have an index on SMB for OS X (Spotlight).

I will restart the process and post the errors and difficulties.

Could you please share a video capture of the problem?
I’m not sure about the issue, I can say that we never had performance problems with millions of files.

When searching from a Windows or Apple client, no results come back.
We have already reinstalled NethServer and tested it on other networks and servers.
Note: it is a Dell T440 server.
System volume: SSD
Data volume: RAID 5

Is there any procedure I can follow to send you a test?
Basically, I go to Find on the Mac or on Windows and it doesn’t work.

We tested on Synology’s DSM system and it works very fast.

I have a script to count the number of files and another script to count files by size.
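For readers who want similar numbers from their own share, here is a minimal sketch in Python of counting files by size. It is not the script mentioned above; the bucket limits and the function names are assumptions chosen to match the file sizes described in this thread.

```python
import os
from collections import Counter

# Hypothetical size buckets: (upper bound in bytes, label). Adjust as needed.
BUCKETS = [
    (1 * 1024**2, "< 1 MB"),
    (5 * 1024**2, "1-5 MB"),
    (10 * 1024**2, "5-10 MB"),
    (1024**3, "10 MB-1 GB"),
]


def bucket_for(size):
    """Return the label of the first bucket whose upper bound exceeds `size`."""
    for limit, label in BUCKETS:
        if size < limit:
            return label
    return ">= 1 GB"


def count_by_size(root):
    """Walk `root` and count regular directory entries per size bucket."""
    counts = Counter()
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            try:
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            counts[bucket_for(size)] += 1
    return counts
```

Calling `count_by_size("/path/to/share")` (a placeholder path) returns a `Counter`, and `sum(counts.values())` gives the total number of files. On a tree with millions of files the walk itself takes a while, which is the same cost a non-indexed search pays on every query.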

We are currently using a Synology NAS out of necessity, but we want to use NethServer.

The server in the screenshot I’m sending is being reinstalled to become the main one.

I will try to record a video showing how it returns no results to the clients.

@pbasque, I spent a lot of time investigating your request. I guess you need the Spotlight search feature on shared samba folders.
It has been introduced in a samba version newer than what is provided by Red Hat / CentOS, so we can’t support it in NethServer 7.

But you mention audio files, where Spotlight is ineffective.
Maybe I completely misunderstood your request.

I was looking to solve the problem too and ended up migrating my user base to… nextcloud.

Users subscribe to the folders they need and sync them on their computers. For searching, they can use their local filesystem indexer or go to nextcloud’s UI and search there.

Users are happy. They can even access their file on the go. I ditched SMB.

Matt

But now everyone needs a local copy of their data. That’s kind of a big benefit to network storage that Nextcloud doesn’t give you (yes, they’ve promised this for a couple of years; like a lot of their other headline features, it seems relegated to wallow in limited-release, pre-alpha status).