Index file for Samba Server, quick find

Maybe overpowered, but could this be an option to try to get to run on NS: https://github.com/shirosaidev/diskover

Hi danb35 and robb
thanks for the quick return

About XFS, I understand, thanks for clarifying. Sorry if I mixed the subjects; my intention was to refer to file organization.

About https://github.com/shirosaidev/diskover, I’ll have to read up on it; I don’t know much about it.

I was reading about an Elasticsearch plugin for SMB.
Does that make sense?

As I understand it, diskover uses Elasticsearch as its base.

I will study how to implement it on Linux without going against the NethServer concept.

If I’m not mistaken, Elasticsearch, Docker and others should ship with version 8.

Still, if anyone has a simpler idea, please share it.

If it is for a Mac environment, check https://wiki.samba.org/index.php/Spotlight (I don’t know if the package we use needs to be recompiled).

There are other tuning options for “Directories with a Large Number of Files”, but they are annoying because they require changing how filename case is handled.
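Since that tuning relies on all names being stored in one case, it may help to check a share before switching. Below is a minimal dry-run sketch, assuming the target is all-lowercase names; the function names `plan_lowercase_renames` and `scan_share` are made up for illustration. It only reports what would change and which names would collide; it renames nothing.

```python
import os
from collections import defaultdict


def plan_lowercase_renames(names):
    """Plan lowercasing for the entries of ONE directory.

    Returns (renames, collisions):
      renames    -- dict old_name -> lowercase name, for names that would change
      collisions -- dict lowercase name -> sorted originals that would clash
    """
    by_lower = defaultdict(list)
    for name in names:
        by_lower[name.lower()].append(name)

    renames = {}
    collisions = {}
    for lower, originals in by_lower.items():
        if len(originals) > 1:
            # Two or more entries differ only by case: needs a manual decision.
            collisions[lower] = sorted(originals)
        elif originals[0] != lower:
            renames[originals[0]] = lower
    return renames, collisions


def scan_share(root):
    """Walk a share (dry run) and yield (directory, renames, collisions)."""
    for dirpath, dirnames, filenames in os.walk(root):
        renames, collisions = plan_lowercase_renames(dirnames + filenames)
        if renames or collisions:
            yield dirpath, renames, collisions
```

Running `scan_share` on the server itself (not through the SMB mount) and reviewing the collisions first seems the safer order, since colliding names cannot be lowercased automatically.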

EDIT: other than the proprietary Windows Server solution, a project to index Samba shares in a way compatible with the Windows Search Protocol (Samba WSP) was started by SUSE, but it is not finished and there is no estimated timeline that I’m aware of:


Docker has been discussed here a bit in the past, but I don’t recall much recently.

My understanding is that both networking and templating have been issues, though I haven’t been following the situation closely.

We already have Cockpit, and afaik Cockpit is a very nice interface to manage Docker instances:
https://cockpit-project.org/guide/133/feature-docker.html

Hello people
A few days ago I got in touch, both on NethServer and in other communities, about an index-based search solution for Linux / SMB.
After some frustrating research (reading smb.org, asking communities and expert friends), it seems no one had ever come across this need.
There is a reason: Linux file systems (ext3/4, XFS, etc.) and their find-based search method are very fast.
But it is not me at the console performing the search; it is my end user on a Windows and/or Apple workstation.
With up to 500 thousand files we do not even notice, but once it passes 5 million files the scenario begins to change.
I used to use AFP (netatalk) on CentOS, which has its own proprietary indexing mechanism.
But as AFP is discontinued and SMB is doing well on Microsoft, Linux and Apple platforms, and the audit helps a lot … I started looking into how to solve the problem.

These 5 million files are small: at most 10 MB, usually 5 MB. They are audio files. A few reach the gigabyte range, but those are exceptions.
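To illustrate what an index buys at this scale, here is a minimal sketch of the idea. It is not a NethServer or Samba feature, and the table layout and the function names `iter_files`, `build_index` and `search` are assumptions for illustration. It walks the share once, stores the file names in SQLite, and answers name searches from that single database instead of walking millions of directory entries for every query.

```python
import os
import sqlite3


def iter_files(root):
    """Yield (lowercase name, full path, size) for every file below root."""
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield name.lower(), full, size


def build_index(root, db_path):
    """(Re)build the index from scratch and return the open connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS files")
    conn.execute("CREATE TABLE files (name TEXT, path TEXT, size INTEGER)")
    # executemany consumes the generator lazily, so the whole file list
    # never has to sit in memory at once.
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", iter_files(root))
    conn.commit()
    return conn


def search(conn, term):
    """Return paths whose file name contains term (case-insensitive)."""
    # Note: % and _ inside term act as LIKE wildcards in this sketch.
    pattern = "%" + term.lower() + "%"
    cur = conn.execute(
        "SELECT path FROM files WHERE name LIKE ? ORDER BY path", (pattern,)
    )
    return [row[0] for row in cur]
```

A substring match like this still scans the whole table, but that is one local file on the server rather than a directory walk over SMB. The index also goes stale as files change, so it would need to be rebuilt on a schedule.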

Finally I found a package that creates an index on Linux and also indexes file contents: Recoll.
https://centos.pkgs.org/7/epel-testing-x86_64/recoll-1.23.1-2.el7.x86_64.rpm.html

However, from what I understand, it only works with a GUI, and I haven’t seen how I can integrate it with SMB.

I am going to research whether it is possible to integrate the feature without a GUI, without going against the concept of NethServer, and I invite everyone in the community to join the journey.

It would be amazing to have this package bundled with NethServer version 8 as an addon.

Hug,
Paulo Basque

Hi, Paulo. I edited my previous answer, adding some info on the status of indexing Samba shares for Windows clients.

Hello @dnutan
I’m sorry for the late reply.
I searched the web so much that I arrived at the same sources you indicated.
I hadn’t seen the article “Directories with a Large Number of Files”, thanks.
I added the parameters below; it improved, but it still hasn’t solved the problem.

“**************************************************************************************************”
# SMB.CONF

case sensitive = true
default case = lower
preserve case = no
short preserve case = no

# Apple extensions (“AAPL”) run under the SMB2 protocol
max protocol = SMB2
client max protocol = SMB2

# Enabling Spotlight search
spotlight = yes

# Apple extensions require support for extended attributes (xattr) - the default is yes in Samba 4.9+:
ea support = yes

# How to store macOS metadata:
fruit:metadata = stream
fruit:model = MacPro

# TCP
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=131072 SO_SNDBUF=131072
"**********************************************************************************************"

On macOS I opened the terminal and made macOS create the index on the machine.
Enabling indexing for the network volumes:
mdutil -i on /Volumes/DEPARTMENTS
mdutil -i on /Volumes/JOBS_OLD
mdutil -i on /Volumes/WORKS

(reindex)
mdutil -E

"**********************************************************************************************"

Regarding the Spotlight “tracker” I am having trouble installing Tracker packages and additional libraries.

I don’t know if there is a repository that we can add that contains the packages.

I can’t make progress due to the lack of packages.

If anyone knows where I can download the packages, I would be grateful for the information.

If Samba needs compiling, does it report anything missing?
Sorry, I don’t know what is required. tracker (and tracker-devel) are in the default CentOS repos (with libtracker-sparql being part of it, if I’m not mistaken).


P.S.: If it’s of any use, here’s an OMV (Debian-based) forum thread with a similar goal:

hello @dnutan
We are having trouble finding the packages and libraries that the article asks for.

The objective is to have an index on SMB for macOS (Spotlight).

I will restart the process and post the errors / difficulties.

Could you please share a video capture of the problem?
I’m not sure about the issue; I can say that we have never had performance problems with millions of files.

When searching with a Windows or Apple client, it doesn’t return any results.
We have already reinstalled NethServer and tested it on other networks and servers.
Note: it is a DELL T-440 server
System volume: SSD
Data volume: RAID 5

Is there any procedure I can follow to send a test?
Basically I go to Find on Mac or Windows and it doesn’t work.

We tested on Synology’s DSM System and it works very fast.

I have a script to count the number of files and another script to count files by size.
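For anyone who wants to gather comparable numbers, here is a rough sketch of such a count. This is not the poster’s actual script; the bucket limits and the names `BUCKETS`, `bucket_for` and `count_by_size` are assumptions chosen to match the sizes mentioned earlier in the thread (mostly under 10 MB, a few in the gigabyte range).

```python
import os
from collections import Counter

# Hypothetical size buckets: (upper limit in bytes, label).
BUCKETS = [
    (1 * 1024**2, "< 1 MB"),
    (10 * 1024**2, "1-10 MB"),
    (1024**3, "10 MB-1 GB"),
]


def bucket_for(size):
    """Return the label of the first bucket whose limit is above size."""
    for limit, label in BUCKETS:
        if size < limit:
            return label
    return ">= 1 GB"


def count_by_size(root):
    """Walk root and count files per size bucket; the sum is the file total."""
    counts = Counter()
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat does not follow symlinks, so a link counts as itself.
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # skip entries that vanish or cannot be read
            counts[bucket_for(size)] += 1
    return counts
```

`sum(count_by_size("/path/to/share").values())` then gives the total number of files; the path here is only a placeholder.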

We are currently using a Synology NAS out of necessity, but we want to use NethServer.

The server in the screenshot I’m sending is being reinstalled to become the main one.

I will try to record a video showing how it does not return results to clients.

@pbasque, I spent a lot of time investigating your request. I guess you need the Spotlight search feature on shared samba folders.
It has been introduced in a samba version newer than what is provided by Red Hat / CentOS, so we can’t support it in NethServer 7.

But you mention audio files, where spotlight is ineffective.
Maybe I completely misunderstood your request.

I was looking to solve the problem too and ended up migrating my user base to… nextcloud.

Users subscribe to the folders they need and sync them on their computers. For searching, they can use their local filesystem indexer or go to nextcloud’s UI and search there.

Users are happy. They can even access their file on the go. I ditched SMB.

Matt

But now everyone needs a local copy of their data. That’s kind of a big benefit to network storage that Nextcloud doesn’t give you (yes, they’ve promised this for a couple of years; like a lot of their other headline features, it seems relegated to wallow in limited-release, pre-alpha status).

@danb35

Hi

Quite agree, but there’s always more than one vantage point.
-> “A glass can be half empty or half full” is a common saying…

Advantages: You do have x copies floating around, in case sh*t happens, and you find out your backup device just dumps core…

And users can work when the Internet is flaky or not available at all.

There are more here…

True, if you have a lot of users, the cumulative disk space waste gets high, along with other “issues”.

My 2 cents
Andy

That’s why you keep more than one backup, on more than one device, in more than one physical location. And that’s a lot easier to organize from one centralized location (the server) than from a bunch of individual PCs.

Agreed. There are definite benefits to having a local copy of your data, and if it’s stored on a SMB share, you don’t. But there are also definite benefits to having it stored on a remote server (ideally with a mechanism to store local copies of an index, frequently-accessed files, perhaps thumbnails or lower-resolution photos–a lot like what Apple does with iCloud), not hogging storage space on your client machine (which may not have a lot of it to spare). That’s a large part of why I store my video files on my FreeNAS box rather than my desktop computer–I don’t have 30 TB of storage to spare in my desktop.