[Ann] CintaNotes 2.0.2 Beta 1

Thomas Lohrum
Posts: 1324
Joined: Tue Mar 08, 2011 11:15 am

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby Thomas Lohrum » Mon Jan 14, 2013 12:33 pm

CintaNotes Developer wrote:
Thomas wrote:The problem is with the Link-field, not the Tag-field!
It's essentially the same problem.
It's the difference between:
SELECT * FROM Notes WHERE text MATCH 'engine google' OR link MATCH 'engine google'
and
SELECT * FROM Notes WHERE (text MATCH 'engine' OR link MATCH 'engine') AND (text MATCH 'google' OR link MATCH 'google')

It is obvious that the second query will be much slower than the first.

If I want speed over hits, I can select title+text. If I go for "Anywhere", I want maximum hits. The help says, "In the Anywhere-mode, (...), CintaNotes includes all fields into the search." My expectation as a user is that in this case the search rules always behave identically. That is, a search for "word1 word2" should give me all notes containing both words "anywhere" in the note, regardless of whether they are in the title, text or link!
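To make the two behaviours concrete, here is a minimal sketch in plain Python. The notes, the two-field layout and the crude tokenizer are all made up for illustration; this is not how CintaNotes implements its search.

```python
import re

# Hypothetical sample notes; only the field names come from the discussion.
NOTES = [
    {"id": 1, "text": "a fast search engine", "link": "http://www.google.com"},
    {"id": 2, "text": "google is a search engine", "link": ""},
    {"id": 3, "text": "steam engine history", "link": "http://example.org"},
]
FIELDS = ("text", "link")


def words_in(value):
    """Split a field value into lowercase words (stand-in for a real tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def match_same_field(note, terms):
    """Every term must occur inside one and the same field."""
    return any(all(t in words_in(note[f]) for t in terms) for f in FIELDS)


def match_across_fields(note, terms):
    """Each term may occur in any field of the note."""
    return all(any(t in words_in(note[f]) for f in FIELDS) for t in terms)


terms = ["engine", "google"]
same = [n["id"] for n in NOTES if match_same_field(n, terms)]
across = [n["id"] for n in NOTES if match_across_fields(n, terms)]
print(same, across)
```

Note 1 has "engine" in its text and "google" only in its link, so it is found by the across-fields rule but missed by the same-field rule.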

As for speed, how about the following suggestion: when storing the note in the database, append the link to the note's text, using an internal syntax to separate the two. The search query can then be executed in the best way in terms of both hits and speed. When reading the note for editing or displaying, the link value must be stripped from the note's text. The appended link value is redundant and exists for search only. The use and storage of the link field itself are not affected.
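A rough sketch of that idea, assuming a hypothetical separator character and helper names (nothing here is taken from the actual CintaNotes code):

```python
# Hypothetical internal separator: a control character unlikely to be typed in a note.
LINK_MARKER = "\x1f"


def pack_for_storage(text, link):
    """Append the link to the note text so a single search on the text column covers both."""
    return f"{text}{LINK_MARKER}{link}" if link else text


def unpack_for_display(stored):
    """Strip the appended link again before the note is displayed or edited."""
    text, _sep, _link = stored.partition(LINK_MARKER)
    return text


stored = pack_for_storage("a fast search engine", "http://www.google.com")
print(repr(stored))
print(unpack_for_display(stored))
```

Whether the search tokenizer treats the chosen separator as a word boundary would have to be checked; the link column itself stays untouched and remains the source of truth.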

Thomas
CintaNotes Developer
Site Admin
Posts: 5002
Joined: Fri Dec 12, 2008 4:45 pm

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby CintaNotes Developer » Mon Jan 14, 2013 12:59 pm

Thomas Lohrum wrote:If I want speed over hits, I can select title+text. If I go for "Anywhere", I want maximum hits. The help says, "In the Anywhere-mode, (...), CintaNotes includes all fields into the search." My expectation as a user is that in this case the search rules always behave identically. That is, a search for "word1 word2" should give me all notes containing both words "anywhere" in the note, regardless of whether they are in the title, text or link!

I understand. However, please consider that there are 6 fields overall: title, text, tags, link, time created and time modified. They all need to be combined using the OR operator, which gives us, taking N as the number of words in the query, 6*N MATCH clauses. Unfortunately, this will get very slow very quickly!

Thomas Lohrum wrote:As for speed, how about the following suggestion: when storing the note in the database, append the link to the note's text, using an internal syntax to separate the two. The search query can then be executed in the best way in terms of both hits and speed. When reading the note for editing or displaying, the link value must be stripped from the note's text. The appended link value is redundant and exists for search only. The use and storage of the link field itself are not affected.

Yes, this is a solution. Unfortunately, it horribly violates an important law of good database design, namely the very first normal form: "column values should be atomic, no column should contain multiple values". In practical terms, it means that many database operations on these columns, such as sorting by link, will be severely hampered.

To summarize, I totally accept the usefulness of your request and suggest the following:
1) Add a special search mode "Search across field boundaries", which would apply F*N matching (F fields times N query words) to the SQL query.
2) Use only 4 fields for this matching: title, text, tags (stringized), and link. There's no real need to search "engine google 10.12.2012". Or is there?.. :)
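As an illustration of what F*N matching means for the query text, the following sketch only builds the SQL string in the shape used earlier in the thread. The function name is hypothetical, treating tags as a searchable column is an assumption, and whether a given full-text engine accepts OR-combined MATCH clauses is not checked here.

```python
def build_cross_field_query(terms, fields=("title", "text", "tags", "link")):
    """Return (sql, params): one AND group per term, one OR'ed MATCH clause per field."""
    groups = []
    params = []
    for term in terms:
        clauses = [f"{field} MATCH ?" for field in fields]
        groups.append("(" + " OR ".join(clauses) + ")")
        params.extend([term] * len(fields))
    # With no terms at all, fall back to a condition that is always true.
    where = " AND ".join(groups) if groups else "1"
    return f"SELECT * FROM Notes WHERE {where}", params


sql, params = build_cross_field_query(["engine", "google"])
print(sql)
print(sql.count("MATCH"), "match clauses")
```

With 4 fields and 2 query words this yields 4*2 = 8 MATCH clauses; with all 6 fields it would be 12.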

Also, an altogether different idea: use some hints for the parser, like: engine title:google link:cintanotes.com (somewhat similar to Google search syntax).
Alex
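The parser-hint idea could look roughly like this. It is a sketch only: the set of recognized prefixes and the function name are assumptions, and quoting or phrases are not handled.

```python
# Assumed set of field prefixes; the thread only mentions title: and link: as examples.
KNOWN_FIELDS = {"title", "text", "tags", "link"}


def parse_query(query):
    """Split a query into (field, term) pairs; field is None for terms without a hint."""
    parsed = []
    for token in query.split():
        prefix, sep, rest = token.partition(":")
        if sep and rest and prefix.lower() in KNOWN_FIELDS:
            parsed.append((prefix.lower(), rest))
        else:
            # Unknown prefixes (e.g. "http://...") are kept as ordinary search terms.
            parsed.append((None, token))
    return parsed


parsed = parse_query("engine title:google link:cintanotes.com")
print(parsed)
```

Terms without a hint would then be searched in whatever fields the current search mode selects.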
ChrisCN
Posts: 223
Joined: Wed Jul 04, 2012 10:20 am

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby ChrisCN » Mon Jan 14, 2013 1:15 pm

Thomas, I agree: there are times when it is necessary to find that one note you are searching for, regardless of speed.
Alex, I agree: I would not start messing up the database design; that will come back to haunt you.

As far as I have heard (although I am not an expert), there are still other possible ways to accomplish the task.
What about building up some additional look-up tables for the search which could be optimized for this task?
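One possible reading of such a look-up table is a word index next to the notes table, sketched below with Python's sqlite3 module. The table and column names are invented for the example and the word splitting is deliberately crude.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Notes (id INTEGER PRIMARY KEY, title TEXT, text TEXT, link TEXT);
    -- Hypothetical look-up table: one row per distinct (word, note) pair.
    CREATE TABLE SearchWords (
        word TEXT NOT NULL,
        note_id INTEGER NOT NULL,
        PRIMARY KEY (word, note_id)
    );
""")


def add_note(note_id, title, text, link):
    """Store the note and index every word from title, text and link."""
    conn.execute("INSERT INTO Notes VALUES (?, ?, ?, ?)", (note_id, title, text, link))
    words = set(re.findall(r"[a-z0-9]+", f"{title} {text} {link}".lower()))
    conn.executemany("INSERT INTO SearchWords VALUES (?, ?)",
                     [(word, note_id) for word in words])


def search(terms):
    """Return ids of notes that contain every term, in any field."""
    terms = sorted({t.lower() for t in terms})
    if not terms:
        return []
    marks = ", ".join("?" * len(terms))
    rows = conn.execute(
        f"SELECT note_id FROM SearchWords WHERE word IN ({marks}) "
        "GROUP BY note_id HAVING COUNT(DISTINCT word) = ? ORDER BY note_id",
        (*terms, len(terms)),
    )
    return [row[0] for row in rows]


add_note(1, "Search", "a fast search engine", "http://www.google.com")
add_note(2, "Trains", "steam engine history", "")
print(search(["engine", "google"]))
```

The trade-off is that the index duplicates every word, so the database file grows and the index has to be updated whenever a note changes.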
Thomas Lohrum
Posts: 1324
Joined: Tue Mar 08, 2011 11:15 am

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby Thomas Lohrum » Mon Jan 14, 2013 2:29 pm

CintaNotes Developer wrote:
Thomas Lohrum wrote:If I want speed over hits, I can select title+text. If I go for "Anywhere", I want maximum hits. The help says, "In the Anywhere-mode, (...), CintaNotes includes all fields into the search." My expectation as a user is that in this case the search rules always behave identically. That is, a search for "word1 word2" should give me all notes containing both words "anywhere" in the note, regardless of whether they are in the title, text or link!

I understand. However, please consider that there are 6 fields overall: title, text, tags, link, time created and time modified. They all need to be combined using the OR operator, which gives us, taking N as the number of words in the query, 6*N MATCH clauses. Unfortunately, this will get very slow very quickly!

Nevertheless, imo the current design is not correct. It does not give me the results I expect. It also conflicts with the help.

CintaNotes Developer wrote:
Thomas Lohrum wrote:As for speed, how about the following suggestion: when storing the note in the database, append the link to the note's text, using an internal syntax to separate the two. The search query can then be executed in the best way in terms of both hits and speed. When reading the note for editing or displaying, the link value must be stripped from the note's text. The appended link value is redundant and exists for search only. The use and storage of the link field itself are not affected.

Yes, this is a solution. Unfortunately, it horribly violates an important law of good database design, namely the very first normal form: "column values should be atomic, no column should contain multiple values". In practical terms, it means that many database operations on these columns, such as sorting by link, will be severely hampered.

I am aware of the "very first normal form" :) Still, CN cannot be compared to a CRM system, where a proper database design is much more important. As for the database operations: my suggestion is to use the largest field (note text) and append the short fields internally. The design of the database does not need to change; that is, the short fields (title, link, tags (?)) remain separate fields.

CintaNotes Developer wrote:To summarize, I totally accept the usefulness of your request and suggest the following:
1) Add a special search mode "Search across field boundaries", which would apply F*N matching (F fields times N query words) to the SQL query.
2) Use only 4 fields for this matching: title, text, tags (stringized), and link. There's no real need to search "engine google 10.12.2012". Or is there?.. :)

Thank you for considering my suggestions. I guess selecting all notes that contain some text and have a modification date before some given date is a rather theoretical case. If I really need this, I can search for the word and scroll down the list to find the given date. Imo, title, text + link is a must. I'd also be pleased to see the tags field included. However, stringizing the field also violates 1NF of database design ;) I think a join should be used for tags, even if this is slower/the slowest.

Limiting the search to the four fields as you described makes sense to me. Having one more option, "Search across field boundaries", complicates things a little, but I agree it is a good compromise between speed and hits.

There is also another suggestion regarding the offered field selection. Right now we have title+text, text only, title only, link only, etc. What about having each field as a checkable selection, so that fields can be combined, e.g. [X] title [X] link? What do you think about this? I am considering making this a suggestion on the roadmap.

Thomas
jkoerner
Posts: 8
Joined: Mon Dec 31, 2012 3:52 am

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby jkoerner » Tue Jan 15, 2013 11:33 pm

CintaNotes Developer wrote:Alex
I wrote a lengthy reply on the subject of how to notify users about very bad bugs, but you did not respond to it.
In the 201 final topic.
Jon
CintaNotes Developer
Site Admin
Posts: 5002
Joined: Fri Dec 12, 2008 4:45 pm

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby CintaNotes Developer » Thu Jan 17, 2013 4:38 pm

jkoerner wrote:
CintaNotes Developer wrote:Alex
I wrote a lengthy reply on the subject of how to notify users about very bad bugs, but you did not respond to it.
In the 201 final topic.
Jon


Jon, I didn't forget about it; I've just taken some time to think the issue over. Will reply soon!
Alex
CintaNotes Developer
Site Admin
Posts: 5002
Joined: Fri Dec 12, 2008 4:45 pm

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby CintaNotes Developer » Thu Jan 17, 2013 4:41 pm

ChrisCN wrote:As far as I have heard (although not an expert) there are still possible ways to accomplish the task.
What about building up some additional look-up tables for the search which could be optimized for this task?


Good idea, but many people already complain that the .db file is bigger than the exported XML. Imagine what would be said if it became even 10% bigger :)
Alex
CintaNotes Developer
Site Admin
Posts: 5002
Joined: Fri Dec 12, 2008 4:45 pm

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby CintaNotes Developer » Thu Jan 17, 2013 4:57 pm

Thomas Lohrum wrote:Nevertheless, imo the current design is not correct. It does not give me the results I expect. It also conflicts with the help.

I agree. However, there must also be a way to specify that all search terms must be in the same field.

Thomas wrote:I am aware of the "very first normal form" :) Still, CN cannot be compared to a CRM system, where a proper database design is much more important. As for the database operations: my suggestion is to use the largest field (note text) and append the short fields internally. The design of the database does not need to change; that is, the short fields (title, link, tags (?)) remain separate fields.

Well, the instant search is one of the main wow-factors of CN, and I'd really hate to lose it. However, duplicating content is not a solution either. Just think: all this cached data would then need to be kept in sync.
I say we test the naive implementation first to see just how bad it is, and then think of ways to optimize.

Thomas wrote:Thank you for considering my suggestions. I guess selecting all notes that contain some text and have a modification date before some given date is a rather theoretical case. If I really need this, I can search for the word and scroll down the list to find the given date. Imo, title, text + link is a must. I'd also be pleased to see the tags field included. However, stringizing the field also violates 1NF of database design ;) I think a join should be used for tags, even if this is slower/the slowest.

Don't worry, I meant stringizing just as a metaphor :) Real tags will remain in the Tags table :)
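For illustration, searching tags through a join could look like the sketch below. The three-table schema is an assumption (only a Tags table is mentioned in the thread), and LIKE stands in for the full-text MATCH so that the example runs with any SQLite build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: notes, tags, and a link table between them.
    CREATE TABLE Notes (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE Tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE NoteTags (note_id INTEGER, tag_id INTEGER, PRIMARY KEY (note_id, tag_id));
    INSERT INTO Notes VALUES (1, 'a fast search engine');
    INSERT INTO Notes VALUES (2, 'steam engine history');
    INSERT INTO Tags VALUES (1, 'google');
    INSERT INTO Tags VALUES (2, 'trains');
    INSERT INTO NoteTags VALUES (1, 1);
    INSERT INTO NoteTags VALUES (2, 2);
""")


def notes_matching(term):
    """Ids of notes where the term occurs in the text or equals one of the note's tags."""
    rows = conn.execute(
        """
        SELECT DISTINCT n.id
        FROM Notes n
        LEFT JOIN NoteTags nt ON nt.note_id = n.id
        LEFT JOIN Tags t ON t.id = nt.tag_id
        WHERE n.text LIKE '%' || ? || '%' OR t.name = ?
        ORDER BY n.id
        """,
        (term, term),
    )
    return [row[0] for row in rows]


# Across-fields search: every term must match somewhere, text or tag.
both = sorted(set(notes_matching("engine")) & set(notes_matching("google")))
print(both)
```

Here the tags stay normalized in their own table, and the join is only paid for at search time.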

Thomas wrote:Limiting the search to the four fields as you described makes sense to me. Having one more option, "Search across field boundaries", complicates things a little, but I agree it is a good compromise between speed and hits.

Good! Hopefully it won't complicate the UI too much. But it is conceptually similar to "Search inside words", so I hope that the two options will be seen just as search fine-tuning.

Thomas wrote:There is also another suggestion regarding the offered field selection. Right now we have title+text, text only, title only, link only, etc. What about having each field as a checkable selection, so that fields can be combined, e.g. [X] title [X] link? What do you think about this? I am considering making this a suggestion on the roadmap.

For me it would be too cumbersome, because anytime you'd want to switch to another field, you'd have to also deactivate the old field.
Alex
Thomas Lohrum
Posts: 1324
Joined: Tue Mar 08, 2011 11:15 am

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby Thomas Lohrum » Thu Jan 17, 2013 9:20 pm

CintaNotes Developer wrote:I say we test the naive implementation first to see just how bad it is, and then think of ways to optimize.

OK :)

CintaNotes Developer wrote:
Thomas wrote:There is also another suggestion regarding the offered field selection. Right now we have title+text, text only, title only, link only, etc. What about having each field as a checkable selection, so that fields can be combined, e.g. [X] title [X] link? What do you think about this? I am considering making this a suggestion on the roadmap.
For me it would be too cumbersome, because anytime you'd want to switch to another field, you'd have to also deactivate the old field.

You're probably right. So we should leave the existing options, but enhance "Anywhere" in the way discussed, i.e. with "Search across field boundaries".
CintaNotes Developer
Site Admin
Posts: 5002
Joined: Fri Dec 12, 2008 4:45 pm

Re: [Ann] CintaNotes 2.0.2 Beta 1

Postby CintaNotes Developer » Fri Jan 18, 2013 10:20 am

Thomas Lohrum wrote:You're probably right. So we should leave the existing options, but enhance "Anywhere" in the way discussed, i.e. with "Search across field boundaries".
Agreed 8-)
Alex
